  1. May 2025
    1. Author response:

      eLife Assessment

      This manuscript introduces a useful protein-stability-based fitness model for simulating protein evolution and unifying non-neutral models of molecular evolution with phylogenetic models. The model is applied to four viral proteins that are of structural and functional importance. The justification of some hypotheses regarding fitness is incomplete, as well as the evidence for the model's predictive power, since it shows little improvement over neutral models in predicting protein evolution.

      We are grateful for the constructive comments that helped improve our study. Regarding the comment about the justification of fitness, we will include in the revised manuscript additional information to support the relevance of modeling protein evolution while accounting for protein folding stability. We agree that increasing the parameterization of the developed birth-death model is interesting, provided it does not lead to overfitting. The model presented uses the fitness of protein variants to determine their reproductive success through the corresponding birth and death rates, which vary among lineages, and it is biologically meaningful and technically correct (Harmon 2019). Following a suggestion of the first reviewer to allow variation of the global birth-death rate among lineages, we will additionally incorporate this aspect into the model and evaluate its performance with the data used for the evaluation of the models. The integration of structurally constrained substitution models of protein evolution, as Markov models, into the birth-death process followed standard approaches of molecular evolution in population genetics (Yang 2006; Carvajal-Rodriguez 2010; Arenas 2012; Hoban, et al. 2012), and we will provide more information about it in the revised manuscript. Regarding the predictive power, our study showed good accuracy in predicting the real folding stability of forecasted protein variants. On the other hand, predicting the exact sequences proved to be more challenging, which points to the need for more accurate substitution models of molecular evolution. Altogether, we believe our findings provide a significant contribution to the field, as accurately forecasting the folding stability of future real proteins is fundamental for predicting their protein function and enabling a variety of applications. Additionally, we implemented the models into a freely available computer framework, with detailed documentation and diverse practical examples.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ferreiro et al. present a method to simulate protein sequence evolution under a birth-death model where sequence evolution is constrained by structural constraints on protein stability. The authors then use this model to explore the predictability of sequence evolution in several viral structural proteins. In principle, this work is of great interest to molecular evolution and phylodynamics, which have struggled to couple non-neutral models of sequence evolution to phylodynamic models like birth-death. Unfortunately, though, the model shows little improvement over neutral models in predicting protein evolution, and this ultimately appears to be due to fundamental conceptual problems with how fitness is modeled and linked to the phylodynamic birth-death model.

      We thank the reviewer for the positive comments about our work.

      Regarding predictive power, the study showed good accuracy in predicting the real folding stability of forecasted protein variants under a selection model, but not under a neutral model. However, predicting the exact sequences was more challenging. For example, amino acids with similar physicochemical properties can result in similar folding stability while differing in the specific sequence, so more accurate substitution models of molecular evolution are required in the field. We consider that forecasting the folding stability of future real proteins is an important advancement in forecasting protein evolution, given the essential role of folding stability in protein function and its variety of applications. Regarding the conceptual concerns related to fitness modeling, we clarify this issue in detail in our responses to the specific comments below.

      Major concerns:

      (1) Fitness model: All lineages have the same growth rate r = b-d because the authors assume b+d=1. But under a birth-death model, the growth r is equivalent to fitness, so this is essentially assuming all lineages have the same absolute fitness since increases in reproductive fitness (b) will simply trade off with decreases in survival (d). Thus, even if the SCS model constrains sequence evolution, the birth-death model does not really allow for non-neutral evolution such that mutations can feed back and alter the structure of the phylogeny.

      We thank the reviewer for this comment, which aims to improve the realism of our model. In the model presented (but see below for another model, derived from the reviewer's proposal, that we are now implementing into the framework and applying to the data used for the evaluation of the models), the fitness predicted from a protein variant is used to obtain the corresponding birth rate of that variant. In this way, protein variants with high fitness have high birth rates, leading to overall more birth events, while protein variants with low fitness have low birth rates, resulting in overall more extinction events, which has biological meaning for the study system. The statement “All lineages have the same growth rate r = b-d” is incorrect for our model because b and d can vary among lineages according to the fitness. For example, a lineage might have b=0.9, d=0.1, r=0.8, while another lineage could have b=0.6, d=0.4, r=0.2. Likewise, the statement “this is essentially assuming all lineages have the same absolute fitness” is incorrect. Assuming that all lineages have the same fitness would not make sense: in that situation the folding stability of the forecasted protein variants would be similar under any model, which is not the case, as shown in the results. In our model, fitness affects reproductive success: protein variants with high fitness have higher birth rates, leading to more birth events, while those with lower fitness have higher death rates, leading to more extinction events. This parameterization is meaningful for protein evolution because the fitness of a protein variant can affect its survival (birth or extinction) without necessarily affecting its rate of evolution. While a faster growth rate can sometimes be associated with higher fitness, a variant with high fitness does not necessarily accumulate substitutions at a faster rate.
      Regarding the phylogenetic structure, the model presented considers variable birth and death events across different lineages according to the fitness of the corresponding protein variants, and this alters the derived phylogeny (i.e., protein variants selected against can go extinct while others with high fitness can produce descendants). We are not sure about the meaning of the term “mutations can feed back” in the context of our system. Note that we use Markov models of evolution, which are well-established in the field (despite their limitations), and substitutions are fixed mutations, which could still be reverted later if selected by the substitution model (Yang 2006). Altogether, we find that the presented birth-death model is technically correct and appropriate for modeling our biological system. Its integration with structurally constrained substitution (SCS) models of protein evolution, as Markov models, follows general approaches of molecular evolution in population genetics (Yang 2006; Carvajal-Rodriguez 2010; Arenas 2012; Hoban, et al. 2012). We will provide a more detailed description of the model in the revised manuscript.
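
      The arithmetic behind the example rates can be checked with a few lines of Python. This is a minimal sketch, not code from the framework: the function name is hypothetical, and setting the birth rate equal to a fitness value in [0, 1] is an assumption made only for illustration. With b + d = 1, the growth rate is r = b - d = 2b - 1, so r differs among lineages whenever b does.

```python
def constrained_rates(fitness):
    """Return (birth, death, growth) under the constraint b + d = 1.

    Assumption for illustration only: the birth rate is set equal to a
    fitness value in [0, 1]; the framework's actual mapping may differ.
    """
    if not 0.0 <= fitness <= 1.0:
        raise ValueError("fitness must lie in [0, 1]")
    birth = fitness
    death = 1.0 - birth
    growth = birth - death  # algebraically equal to 2 * birth - 1
    return birth, death, growth


# The two lineages quoted in the response above.
for f in (0.9, 0.6):
    b, d, r = constrained_rates(f)
    print(f"b={b:.1f} d={d:.1f} r={r:.1f}")
```

      The two printed lines reproduce the quoted values (r = 0.8 and r = 0.2), even though the sum b + d is fixed at 1 for both lineages.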

      Apart from these clarifications about the birth-death model used, we understand the point of the reviewer and, following the suggestion, we are now incorporating an additional birth-death model that accounts for a variable global birth-death rate among lineages. Specifically, we are following the model proposed by Neher et al. (2014), where the death rate is set to 1 and the birth rate is modeled as 1 + fitness. In this model, the global birth-death rate varies among lineages. We are now implementing this model into the computer framework and applying it to the data used for the evaluation of the models. Preliminary results, which will be presented in full in the revised manuscript, indicate that this model yields similar predictive accuracy compared to the previous birth-death model. If this is confirmed, accounting for variability in the global birth-death rate does not appear to play a major role in the studied systems of protein evolution. We will present this additional birth-death model and its results in the revised manuscript.
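
      For comparison, the alternative parameterization described in this paragraph (death rate 1, birth rate 1 + fitness) can be sketched in the same way. The function names and the example fitness values are hypothetical, and the bound fitness >= -1 is an assumption added only to keep the birth rate non-negative.

```python
def unconstrained_rates(fitness):
    """Rates for the alternative model: d = 1 and b = 1 + fitness."""
    if fitness < -1.0:
        raise ValueError("fitness below -1 would give a negative birth rate")
    death = 1.0
    birth = 1.0 + fitness
    growth = birth - death  # equals the fitness itself
    total = birth + death   # overall event rate, now lineage-specific
    return birth, death, growth, total


def birth_probability(birth, death):
    """Probability that the next event on a lineage is a birth, not a death."""
    return birth / (birth + death)


for f in (0.0, 0.2, 0.8):
    b, d, r, total = unconstrained_rates(f)
    print(f"fitness={f:.1f} b={b:.1f} d={d:.1f} r={r:.1f} "
          f"b+d={total:.1f} P(birth)={birth_probability(b, d):.3f}")
```

      Here both the growth rate and the total event rate b + d change with fitness, which is the sense in which the global birth-death rate varies among lineages.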

      (2) Predictive performance: Similar performance in predicting amino acid frequencies is observed under both the SCS model and the neutral model. I suspect that this rather disappointing result owes to the fact that the absolute fitness of different viral variants could not actually change during the simulations (see comment #1).

      The study shows similar performance in predicting the sequences of the forecasted proteins under both the SCS model and the neutral model, but shows differences in predicting the folding stability of the forecasted proteins between these models. Indeed, as explained in the previous answer, the birth-death model accounts for variation in fitness among lineages, leading to differences among lineages in reproductive success. The new birth-death model that we are now implementing, which incorporates variation of the global birth-death rate among lineages, is producing similar preliminary results. In addition to these considerations, it is known that SCS models applied to phylogenetics (such as ancestral molecular reconstruction) can model protein evolution with high accuracy in terms of folding stability. However, inferring sequences (i.e., ancestral sequences) is considerably more challenging even for ancestral molecular reconstruction (Arenas, et al. 2017; Arenas and Bastolla 2020). The observed sequence diversity is much greater than the observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions among amino acids with similar physicochemical properties can result in protein variants with similar folding stability but different specific amino acid sequences; further work is needed in the field of substitution models of molecular evolution. We will expand the discussion of this aspect in the revised manuscript.

      (3) Model assessment: It would be interesting to know how much the predictions were informed by the structurally constrained sequence evolution model versus the birth-death model. To explore this, the authors could consider three different models: 1) neutral, 2) SCS, and 3) SCS + BD. Simulations under the SCS model could be performed by simulating molecular evolution along just one hypothetical lineage. Seeing if the SCS + BD model improves over the SCS model alone would be another way of testing whether mutations could actually impact the evolutionary dynamics of lineages in the phylogeny.

      In the present study, we compare the neutral model + birth-death (BD) with the SCS model + BD. Markov substitution models Q are applied over an evolutionary time (i.e., branch length, t), which allows one to determine the probability of substitution events during that time period [P(t) = exp(Qt)]. This approach is traditionally used in phylogenetics to model the accumulation of substitutions over time. Therefore, to compare the neutral and SCS models, an evolutionary time is required, and in this case it is provided by the birth-death process. The suggested models 1) and 2) cannot be compared without an underlying evolutionary history. However, comparisons in terms of likelihood, and other aspects, between models that ignore the protein structure and the implemented SCS models are already available in our previous studies based on coalescent simulations or given phylogenetic trees (Arenas, et al. 2013; Arenas, et al. 2015). There, SCS models produced proteins with more realistic folding stability than models that ignore evolutionary constraints from the protein structure, and those findings are consistent with the results from the present study, where we explore the application of these models to forecasting protein evolution. We would like to emphasize that forecasting the folding stability of future real proteins is a significant and novel finding, because folding stability is fundamental to protein function and has diverse implications. While accurately forecasting the exact sequences would indeed be ideal, this remains a challenging task with current substitution models. In this regard, we will discuss in the revised manuscript the need to develop more accurate substitution models.
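
      The relation P(t) = exp(Qt) can be made concrete with a toy example. The sketch below is illustrative only: it uses a hypothetical two-state rate matrix rather than a 20-state amino acid model, and it approximates the matrix exponential in pure Python by scaling and squaring with a truncated Taylor series. All names and parameter values are assumptions, not part of the framework.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(q, t, squarings=10, terms=12):
    """Approximate P(t) = exp(Q t) for a rate matrix Q and branch length t."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    # Taylor series of exp(A) for the scaled matrix: I + A + A^2/2! + ...
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    # Undo the scaling: exp(Q t) = (exp(Q t / 2^s))^(2^s).
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


# Toy symmetric two-state rate matrix; each row sums to zero.
Q = [[-1.0, 1.0],
     [1.0, -1.0]]
P = transition_matrix(Q, t=0.5)
for row in P:
    print([round(x, 4) for x in row])
```

      For this symmetric matrix the closed form is P_00(t) = 1/2 + exp(-2t)/2, which offers a quick sanity check on the approximation. In practice an established routine such as `scipy.linalg.expm` would be the more usual choice.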

      (4) Background fitness effects: The model ignores background genetic variation in fitness. I think this is particularly important as the fitness effects of mutations in any one protein may be overshadowed by the fitness effects of mutations elsewhere in the genome. The model also ignores background changes in fitness due to the environment, but I acknowledge that might be beyond the scope of the current work.

      This comment made us realize that more information about the features of the implemented SCS models should be included in the manuscript. In particular, the implemented SCS models consider a negative design based on the observed residue contacts in nearly all proteins available in the Protein Data Bank (Arenas, et al. 2013; Arenas, et al. 2015). This data is provided as an input file and it can be updated to incorporate new structures (see the framework documentation and the practical examples). Therefore, the prediction of folding stability is a combination of positive design (direct analysis of the target protein) and negative design (consideration of background proteins to reduce biases), thus incorporating background molecular diversity. This important feature was not sufficiently described in the manuscript, and we will add more details in the revised version. Regarding the fitness caused by the environment, we agree with the reviewer. This is a challenge for any method aiming to forecast evolution, as future environmental shifts are inherently unpredictable and may impact the accuracy of the predictions. Although one might attempt to incorporate such effects into the model, doing so risks overparameterization, especially when the additional factors are uncertain or speculative. We will include a discussion in the revised manuscript about our perspective on the potential effects of environmental changes on forecasting evolution.

      (5) In contrast to the model explored here, recent work on multi-type birth-death processes has considered models where lineages have type-specific birth and/or death rates and therefore also type-specific growth rates and fitness (Stadler and Bonhoeffer, 2013; Kühnert et al., 2017; Barido-Sottani, 2023). Rasmussen & Stadler (eLife, 2019) even consider a multi-type birth-death model where the fitness effects of multiple mutations in a protein or viral genome collectively determine the overall fitness of a lineage. The key difference with this work presented here is that these models allow lineages to have different growth rates and fitness, so these models truly allow for non-neutral evolutionary dynamics. It would appear the authors might need to adopt a similar approach to successfully predict protein evolution.

      We agree with the reviewer that robust birth-death models have been developed using statistical approaches and that, in many cases, the primary aim of those studies is the development and refinement of the model itself. The study by Rasmussen and Stadler 2019 incorporates an external evaluation of mutation events where the fitness used is specific to the proteins investigated in that study, which may pose challenges for users interested in analyzing other proteins. Our study takes a different approach. We implement a fitness function that can be predicted and evaluated for any type of protein (Goldstein 2013), making it broadly applicable. In addition, we provide a freely available and well-documented computational framework to facilitate its use. The primary aim of our study is not the development of novel or complex birth-death models. Rather, we aim to explore the integration of a standard birth-death model with structurally constrained substitution models for the purpose of predicting protein evolution. In the context of protein evolution, substitution models are a critical factor (Liberles, et al. 2012; Wilke 2012; Bordner and Mittelmann 2013; Echave, et al. 2016; Arenas, et al. 2017; Echave and Wilke 2017), and their combination with a birth-death model constitutes a first approximation upon which future studies can build to better understand this biological system. We will include these considerations in the revised manuscript.
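
      To illustrate what a stability-based fitness function can look like, the sketch below uses the fraction of folded protein in a two-state thermodynamic model, f = 1 / (1 + exp(ΔG / kT)). Whether this is the exact expression used in the framework is an assumption; the function name, the value of kT, and the example ΔG values are all hypothetical.

```python
import math

# Approximate k_B * T near 298 K in kcal/mol (assumed value for illustration).
KT_KCAL_PER_MOL = 0.593


def folded_fraction_fitness(delta_g, kt=KT_KCAL_PER_MOL):
    """Fitness as the folded fraction in a two-state model.

    delta_g is the folding free energy in kcal/mol; more negative values
    mean a more stable fold and therefore a fitness closer to 1.
    """
    x = delta_g / kt
    if x > 700.0:  # avoid overflow in math.exp for extremely unstable variants
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


for dg in (-8.0, -2.0, 0.0, 2.0):
    print(f"dG={dg:+.1f} kcal/mol -> fitness={folded_fraction_fitness(dg):.4f}")
```

      Because this function depends only on ΔG, it can be evaluated for any protein with a stability estimate, which is the property of general applicability emphasized in the response above.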

      Reviewer #2 (Public review):

      Summary:

      In this study, "Forecasting protein evolution by integrating birth-death population models with structurally constrained substitution models", David Ferreiro and co-authors present a forward-in-time evolutionary simulation framework that integrates a birth-death population model with a fitness function based on protein folding stability. By incorporating structurally constrained substitution models and estimating fitness from ΔG values using homology-modeled structures, the authors aim to capture biophysically realistic evolutionary dynamics. The approach is implemented in a new version of their open-source software, ProteinEvolver2, and is applied to four viral proteins from HIV-1 and SARS-CoV-2.

      Overall, the study presents a compelling rationale for using folding stability as a constraint in evolutionary simulations and offers a novel framework and software to explore such dynamics. While the results are promising, particularly for predicting biophysical properties, the current analysis provides only partial evidence for true evolutionary forecasting, especially at the sequence level. The work offers a meaningful conceptual advance and a useful simulation tool, and sets the stage for more extensive validation in future studies.

      We also thank this reviewer for the positive comments on our study. Regarding the predictive power, our results showed good accuracy in predicting the folding stability of the forecasted protein variants. However, predicting the specific sequences of these variants is more challenging. For example, substitutions among amino acids with similar physicochemical properties can result in different sequences with similar folding stability. We believe that these findings are realistic and interesting, as they indicate that while forecasting folding stability is feasible, forecasting the specific sequence evolution is more complex than one could anticipate.

      Strengths:

      The results demonstrate that fitness constraints based on protein stability can prevent the emergence of unrealistic, destabilized variants - a limitation of traditional, neutral substitution models. In particular, the predicted folding stabilities of simulated protein variants closely match those observed in real variants, suggesting that the model captures relevant biophysical constraints.

      We agree with the reviewer and appreciate the consideration that forecasting the folding stability of future real proteins is a relevant finding. For instance, folding stability is fundamental for protein function and affects several other molecular properties.

      Weaknesses:

      The predictive scope of the method remains limited. While the model effectively preserves folding stability, its ability to forecast specific sequence content is not well supported.

      It is known that structurally constrained substitution (SCS) models applied to phylogenetics (such as ancestral molecular reconstruction) can model protein evolution with high accuracy in terms of folding stability, while inferring sequences (i.e., ancestral sequences) remains considerably more challenging (Arenas, et al. 2017; Arenas and Bastolla 2020). The observed sequence diversity is much higher than the observed structural diversity (Illergard, et al. 2009; Pascual-Garcia, et al. 2010), and substitutions between amino acids with similar physicochemical properties can result in protein variants with similar folding stability but with different specific amino acid composition. We will expand the discussion of this aspect in the manuscript.

      Only one dataset (HIV-1 MA) is evaluated for sequence-level divergence using KL divergence; this analysis is absent for the other proteins. The authors use a consensus Omicron sequence as a representative endpoint for SARS-CoV-2, which overlooks the rich longitudinal sequence data available from GISAID. The use of just one consensus from a single time point is not fully justified, given the extensive temporal and geographical sampling available. Extending the analysis to include multiple timepoints, particularly for SARS-CoV-2, would strengthen the predictive claims. Similarly, applying the model to other well-sampled viral proteins, such as those from influenza or RSV, would broaden its relevance and test its generalizability.

      The evaluation of forecasting evolution using real datasets is complex due to several conceptual and practical aspects. In contrast to traditional phylogenetic reconstruction of past evolutionary events and ancestral sequences, forecasting evolution often begins with a variant that is evolved forward in time and requires a rough fitness landscape to select among possible future variants (Lässig, et al. 2017). Another concern for validating the method is the need to know the initial variant that gives rise to the corresponding forecasted variants, which is not always known. Thus, we investigated systems where the initial variant, or a close approximation, is known, such as scenarios of in vitro monitored evolution. In the case of SARS-CoV-2, the Wuhan variant is commonly used as the starting variant of the pandemic. Next, since forecasting evolution is highly dependent on the model of evolution used, unexpected external factors can dramatically affect the predictions. For this reason, systems with minimal external influences provide a more controlled context for evaluating forecasting evolution. For instance, scenarios of in vitro monitored virus evolution avoid some external factors such as the host immune response. Another important aspect is the availability of data at two (i.e., present and future) or more time points along the evolutionary trajectory, with sufficient genetic divergence between them to identify clear evolutionary signatures. Additionally, using consensus sequences can help mitigate effects from unfixed mutations, which should not be modeled by a substitution model of evolution. Altogether, not all datasets are appropriate to properly evaluate forecasting evolution. We will include these considerations in the revised manuscript.

      Sequence comparisons based on the KL divergence require, at the studied time point, an observed distribution of amino acid frequencies among sites and an estimated distribution of amino acid frequencies among sites. In the study datasets, this is only the case for the HIV-1 MA dataset, which belongs to a previous study from one of us and collaborators where we obtained at least 20 independent sequences at each sampling point (Arenas, et al. 2016). We will provide additional information on this aspect in the manuscript.
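
      A minimal sketch of such a comparison is given here, assuming aligned sequences of equal length, a small pseudocount to avoid zero frequencies, and an average of the per-site divergences. These choices, the function names, and the toy sequences are assumptions for illustration and may differ from the procedure used in the study.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_frequencies(column, pseudocount=1e-3):
    """Amino acid frequencies at one alignment site, with a pseudocount."""
    counts = {aa: pseudocount for aa in AMINO_ACIDS}
    for residue in column:
        if residue in counts:  # gaps and ambiguous symbols are ignored
            counts[residue] += 1.0
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}


def kl_divergence(p, q):
    """KL divergence D(p || q) between two frequency dictionaries."""
    return sum(p[aa] * math.log(p[aa] / q[aa]) for aa in AMINO_ACIDS)


def mean_site_kl(observed_seqs, predicted_seqs):
    """Average per-site KL divergence between observed and predicted sets."""
    length = len(observed_seqs[0])
    total = 0.0
    for site in range(length):
        p = site_frequencies([s[site] for s in observed_seqs])
        q = site_frequencies([s[site] for s in predicted_seqs])
        total += kl_divergence(p, q)
    return total / length


observed = ["MKVL", "MKIL", "MRVL"]   # toy sequences, not real data
predicted = ["MKVL", "MKVL", "MKLL"]
print(f"mean site KL = {mean_site_kl(observed, predicted):.4f}")
```

      Both distributions must be estimated from a sample of sequences, which is why a dataset with many independent sequences per time point is needed for this kind of evaluation.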

      Regarding the Omicron dataset, we used 384 curated sequences of the Omicron variant of concern to construct the study dataset, and we believe that it is a representative sample. The sequence used for the initial time point was the Wuhan variant (Wu, et al. 2020), which is commonly assumed to be the origin of the pandemic in SARS-CoV-2 studies. As previously indicated, the use of consensus sequences is convenient to avoid variants with unfixed mutations. Regarding extending the analysis to other time points (other variants of concern), we respectfully disagree because Omicron is the variant of concern with the highest genetic distance to the Wuhan variant, and a high genetic distance is required to properly evaluate the prediction method. We noted that earlier variants of concern show a small number of fixed mutations in the study proteins, despite the availability of large numbers of sequences in databases such as GISAID.
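
      The idea of a consensus sequence can be sketched as a majority vote at each alignment site. The function name, the gap handling, and the toy alignment are assumptions for illustration, not the procedure used to build the study dataset.

```python
from collections import Counter


def consensus(aligned_seqs, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.

    Gaps are ignored when counting; a site made only of gaps stays a gap.
    Ties are resolved in favor of the residue seen first at that site.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same aligned length")
    out = []
    for site in range(length):
        counts = Counter(s[site] for s in aligned_seqs if s[site] != gap)
        out.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(out)


toy_alignment = ["MKVL", "MKIL", "MRVL", "MKV-"]  # toy data, not real sequences
print(consensus(toy_alignment))
```

      Residues carried by only a minority of sequences, such as unfixed mutations, do not appear in the consensus, which is the motivation given above for using it.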

      Additionally, we investigated the evolutionary trajectories of HIV-1 protease (PR) in 12 intra-host viral populations.

      Next, following the proposal of the reviewer, we will incorporate the analysis of an additional viral dataset (probably influenza following the suggestion of the reviewer) to further assess the generalizability of the method. Still, as previously indicated, not all datasets are suitable for a proper evaluation of forecasting evolution. Factors such as the shape of the fitness landscape and the amount of genetic variation over time can influence the accuracy of predictions. We will present the results of the analysis of the new data in the revised manuscript.

      It would also be informative to include a retrospective analysis of the evolution of protein stability along known historical trajectories. This would allow the authors to assess whether folding stability is indeed preserved in real-world evolution, as assumed in their model.

      Our present study is not focused on investigating the evolution of the folding stability over time, although it provides this information indirectly at the studied time points. Instead, the present study shows that the folding stability of the forecasted protein variants is similar to the folding stability of the corresponding real protein variants for diverse viral proteins, which is an important evaluation of the method. Next, the folding stability can indeed vary over time in both real and modeled evolutionary scenarios, and our present study is not in conflict with this. Although this is not the aim of our present study, some previous phylogenetic-based studies have reported temporal fluctuations in folding stability for diverse data (Arenas, et al. 2017; Olabode, et al. 2017; Arenas and Bastolla 2020; Ferreiro, et al. 2022).

      Finally, a discussion on the impact of structural templates - and whether the fixed template remains valid across divergent sequences - would be valuable. Addressing the possibility of structural remodeling or template switching during evolution would improve confidence in the model's applicability to more divergent evolutionary scenarios.

      This is an important point. For the datasets that required homology modeling (in several cases it was not necessary because the sequence was present in a protein structure of the PDB), the structural templates were selected using SWISS-MODEL, and we applied the best-fitting template. We will include additional details about the parameters of the homology modeling in the revised version. Indeed, our method assumes that the protein structure is maintained over the studied evolutionary time, which can be generally reasonable for short timescales where the structure is conserved (Illergard, et al. 2009; Pascual-Garcia, et al. 2010). Over longer evolutionary timescales, structural changes may occur, and in such cases, modeling the evolution of the protein structure would be necessary. To our knowledge, modeling the evolution of the protein structure remains a challenging task that requires substantial methodological developments. Recent advances in artificial intelligence, particularly in protein structure prediction from sequence, may offer promising tools for addressing this challenge. However, we believe that evaluating such approaches in the context of structural evolution would be difficult, especially given the limited availability of real data with known evolutionary trajectories involving structural change. In any case, this is probably an important direction for future research. We will include this discussion in the revised manuscript.

      Cited references

      Arenas M. 2012. Simulation of Molecular Data under Diverse Evolutionary Scenarios. PLoS Comput Biol 8:e1002495.

      Arenas M, Bastolla U. 2020. ProtASR2: Ancestral reconstruction of protein sequences accounting for folding stability. Methods Ecol Evol 11:248-257.

      Arenas M, Dos Santos HG, Posada D, Bastolla U. 2013. Protein evolution along phylogenetic histories under structurally constrained substitution models. Bioinformatics 29:3020-3028.

      Arenas M, Lorenzo-Redondo R, Lopez-Galindez C. 2016. Influence of mutation and recombination on HIV-1 in vitro fitness recovery. Molecular Phylogenetics and Evolution 94:264-270.

      Arenas M, Sanchez-Cobos A, Bastolla U. 2015. Maximum likelihood phylogenetic inference with selection on protein folding stability. Molecular Biology and Evolution 32:2195-2207.

      Arenas M, Weber CC, Liberles DA, Bastolla U. 2017. ProtASR: An Evolutionary Framework for Ancestral Protein Reconstruction with Selection on Folding Stability. Systematic Biology 66:1054-1064.

      Bordner AJ, Mittelmann HD. 2013. A new formulation of protein evolutionary models that account for structural constraints. Molecular Biology and Evolution 31:736-749.

      Carvajal-Rodriguez A. 2010. Simulation of genes and genomes forward in time. Current Genomics 11:58-61.

      Echave J, Spielman SJ, Wilke CO. 2016. Causes of evolutionary rate variation among protein sites. Nature Reviews Genetics 17:109-121.

      Echave J, Wilke CO. 2017. Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence. Annu Rev Biophys 46:85-103.

      Ferreiro D, Khalil R, Gallego MJ, Osorio NS, Arenas M. 2022. The evolution of the HIV-1 protease folding stability. Virus Evol 8:veac115.

      Goldstein RA. 2013. Population Size Dependence of Fitness Effect Distribution and Substitution Rate Probed by Biophysical Model of Protein Thermostability. Genome Biol Evol 5:1584-1593.

      Harmon LJ. 2019. Introduction to birth-death models. In: Phylogenetic Comparative Methods. https://lukejharmon.github.io/pcm/chapter10_birthdeath/.

      Hoban S, Bertorelle G, Gaggiotti OE. 2012. Computer simulations: tools for population and evolutionary genetics. Nature Reviews Genetics 13:110-122.

      Illergard K, Ardell DH, Elofsson A. 2009. Structure is three to ten times more conserved than sequence--a study of structural response in protein cores. Proteins 77:499-508.

      Lässig M, Mustonen V, Walczak AM. 2017. Predicting evolution. Nature Ecology & Evolution 1:0077.

      Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E, Colwell LJ, de Koning AP, Dokholyan NV, Echave J, et al. 2012. The interface of protein structure, protein biophysics, and molecular evolution. Protein Science 21:769-785.

      Neher RA, Russell CA, Shraiman BI. 2014. Predicting evolution from the shape of genealogical trees. eLife 3.

      Olabode AS, Kandathil SM, Lovell SC, Robertson DL. 2017. Adaptive HIV-1 evolutionary trajectories are constrained by protein stability. Virus Evol 3:vex019.

      Pascual-Garcia A, Abia D, Mendez R, Nido GS, Bastolla U. 2010. Quantifying the evolutionary divergence of protein structures: the role of function change and function conservation. Proteins 78:181-196.

      Wilke CO. 2012. Bringing molecules back into molecular evolution. PLoS Comput Biol 8:e1002572.

      Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, Hu Y, Tao ZW, Tian JH, Pei YY, et al. 2020. A new coronavirus associated with human respiratory disease in China. Nature 579:265-269.

      Yang Z. 2006. Computational Molecular Evolution. Oxford, England: Oxford University Press.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this work, the authors have developed SPLASH+, a micro-assembly and biological interpretation framework that expands on their previously published reference-free statistical approach (SPLASH) for sequencing data analysis.

      Thank you for this thorough overview of our work.

      Strengths:

      (1) The methodology developed by the authors seems like a promising approach to overcome many of the challenges posed by reference-based single-cell RNA-seq analysis methods.

      Thank you for your positive comment on the potential of our approach to address the limitations of reference-based methods for scRNA-Seq analysis.

      (2) The analysis of the RNU6 repetitive small nuclear RNA provides a very compelling example of a type of transcript that is very challenging to analyze with standard reference-based methods (e.g., most reads from this gene fail to align with STAR, if I understood the result correctly).

      We thank the reviewer for their positive comment. We agree that the variation in RNU6 detected by SPLASH+ underscores the potential of our reference-free method to make discoveries in cases where reference-based approaches fall short.

      Weaknesses:

      (1) The manuscript presents a number of case studies from very diverse domains of single-cell RNA-seq analysis. As a result, the manuscript has been challenging to review, because it requires domain expertise in centromere biology, RNA splicing, RNA editing, V(D)J transcript diversity, and repeat polymorphisms.

      We appreciate the reviewer’s effort in thoroughly evaluating this manuscript, especially given the broad range of biological domains discussed. Our main goal in presenting a wide range of applications was to highlight the key strength of the SPLASH+ framework: its ability to unify diverse biological discoveries within a single method that operates directly on sequencing reads.

      (2) Although the paper focuses on SmartSeq2 full-length single-cell RNA-seq data analysis, the vast majority of single-cell RNA-seq data that is currently being generated comes from droplet-based methods (e.g., 10x Genomics) that sequence only the 3' or 5' ends of transcripts. As a result, it is unclear if SPLASH+ is also applicable to these types of data.

      We thank the reviewer for this comment. Due to the specific data format of barcoded single-cell sequencing platforms such as 10x Genomics, extending the SPLASH framework to support 10x analysis required engineering a specialized preprocessing tool. We have addressed this in a recent work, which is now available as a preprint (https://doi.org/10.1101/2024.12.24.630263).

      (3) The criteria used for the selection of the 10 'core genes' have not been sufficiently justified.

      We chose these genes because SPLASH+ detected regulated splicing for them in nearly all tissues analyzed in our study (18 out of 19), i.e., it identified anchors classified as splicing anchors in those tissues. Our subsequent analysis showed that all these genes are involved in either splicing regulation or histone modification. We will further clarify this selection criterion in the revision.

      (4) It is currently unclear how the splicing diversity discovered in this paper relates to the concept of noisy splicing (i.e., there are likely many low-frequency transcripts and splice junctions that are unlikely to have a significant functional impact beyond triggering nonsense-mediated decay).

      In our analysis, to ensure sufficient read coverage, we considered significant anchors supported by more than 50 reads and detected in over 10 cells. Additionally, our downstream analyses (including splicing analysis) are based on assembled sequences (compactors) generated through our micro-assembly step. This process effectively acts as a denoising step by filtering out sequences likely caused by sequencing errors or with very low read support. However, we agree that the detected splice variants have not been fully functionally characterized, and further functional experiments may be needed.
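      The read-support criterion described above can be sketched as a simple filter. This is only an illustration of the two stated thresholds (more than 50 reads, more than 10 cells); the record type, field names, and function names are hypothetical and are not taken from the SPLASH+ code base.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AnchorSupport:
    """Hypothetical summary record for one significant anchor."""
    anchor: str   # anchor sequence or identifier
    n_reads: int  # total reads containing the anchor
    n_cells: int  # number of cells in which the anchor was observed


# Thresholds quoted in the response; both comparisons are strict ("more than").
MIN_READS = 50
MIN_CELLS = 10


def passes_coverage_filter(a: AnchorSupport,
                           min_reads: int = MIN_READS,
                           min_cells: int = MIN_CELLS) -> bool:
    """Return True if the anchor has more than min_reads reads and more than min_cells cells."""
    return a.n_reads > min_reads and a.n_cells > min_cells


def filter_anchors(anchors: Iterable[AnchorSupport]) -> List[AnchorSupport]:
    """Keep only anchors that satisfy both coverage thresholds."""
    return [a for a in anchors if passes_coverage_filter(a)]
```

      Under these assumptions, an anchor with exactly 50 reads or exactly 10 cells would be excluded, since the response describes both thresholds as strict.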

      (5) The paper presents only a very superficial discussion of the potential weaknesses of the SPLASH+ method.

      We discussed two potential limitations of SPLASH+ in the Conclusions section: (1) it is not suitable for differential gene expression analysis, and (2) although we provide a framework for interpreting and analyzing SPLASH results, further work is still needed to improve the annotation of calls lacking BLAST matches. We will add more discussion for these in the revision. 

      (6) The cursory mention of metatranscriptome in the conclusion of the paper is confusing, as it might suggest the presence of microbial cells in sterile human tissues (which has recently been discredited in cancer, see e.g. https://www.science.org/content/article/journal-retracts-influential-cancer-microbiome-paper).

      We will remove the mention of metatranscriptome in the revised manuscript.

      Reviewer #2 (Public review):

      The authors extend their SPLASH framework with single-cell RNA-seq in mind, in two ways. First, they introduce "compactors", which are possible paths branching out from an anchor. Second, they introduce a workflow to classify compactors according to the type of biological sequence variation represented (splicing, SNV, etc). They focus on simulated data for fusion detection, and then focus on analyzing the Tabula sapiens Smart-seq2 data, showing extensive results on alternative splicing analysis, VDJ, and repeat elements.

      This is strong work with an impressive array of biological investigations and results for a methods paper. I have various concerns about terminology and comparisons, as follows (in a somewhat arbitrary order, apologies).

      Thank you for this thorough overview of our work and your positive comment on the strength of our work.

      (1) The discussion of the weaknesses of the consensus sequence approach of SPLASH is an odd way to motivate SPLASH+ in my opinion, in that SPLASH is not yet so widely used, so the baseline for SPLASH+ is really standard alignment-based approaches. It is fine to mention consensus sequence issues briefly, but it felt belabored.

      We thank the reviewer and agree that the primary comparison for SPLASH+ is with reference-based methods. However, since SPLASH+ builds upon SPLASH, we also aimed to highlight the limitations of the consensus step in original SPLASH and how SPLASH+ addresses them. To maintain the main focus of the paper on comparison with reference-based methods and biological investigations, this discussion with consensus was provided in a Supplementary Figure. We will shorten this discussion in the revision.

      (2) Regarding compactors reducing alignment cost: the comparison should really be between compactor construction and alignment vs read alignment (and maybe vs modern contig construction algorithms and alignment).

      Since the SPLASH framework is fundamentally reference-free and does not require read alignment, we compared the number of compactor alignments with the total number of read alignments required by a reference-based method. Although compactors are aligned to the reference, the number of alignments needed is still orders of magnitude smaller than for a reference-based approach, which requires alignment of all the reads.

      (3) The language around "compactors" is a bit confusing, where the authors sometimes refer to the tree of possibilities from an anchor as a "compactor", and sometimes a compactor is a single branch. Presumably, ideally, compactors should be DAGs, not trees, i.e., they can connect back together. Perhaps the authors could comment on whether this matters/would be a valuable extension.

      We thank the reviewer for their comment. We refer to each generated assembled sequence as “a compactor”, and we attempted to make this clear in the paper. We will review the text further to ensure this definition is clear in the revised version.

      (4) The main oddness of the splicing analysis to me is not using cell-type/state in any way in the statistical testing. This need not be discrete cell types: psiX, for example, tested whether exonic PSI was variable with reference to a continuous gene expression embedding. Intuitively, such transcriptome-wide signal should be valuable for a) improving power and b) distinguishing cell-type intrinsic/"noisy" from cell-type specific splicing variation. A straightforward way of doing this would be pseudobulking cell types. Possibly a more sophisticated hierarchical model could be constructed also.

      We appreciate the reviewer’s concern regarding SPLASH+ not using cell type metadata. SPLASH, which performs the core statistical inference in SPLASH+, is an unsupervised tool specifically designed to make biological discoveries without relying on metadata (such as cell type annotations in scRNA-Seq). This is particularly useful in scRNA-seq, where cell type labels could be missing, imprecise, or may miss important within-cell-type variation. As shown in the paper, even without using metadata, SPLASH+ outperformed both SpliZ and Leafcutter (two metadata-dependent tools), achieving higher concordance and identifying more differentially spliced genes. Regarding pseudobulking, as has been shown in the SpliZ paper (https://doi.org/10.1038/s41592-022-01400-x), pseudobulking requires multiple pseudobulked replicates per cell type for reliable inference, which is often not feasible in scRNA-seq settings, making such methods statistically suboptimal for single-cell studies. We will add a discussion on pseudobulking in the revision.

      (5) A secondary weakness is that some informative reads will not be used, for example, unspliced reads aligning to alternative exons. This relates to the broader weakness of SPLASH that it is blind to changes in coverage that are not linked to a specific anchor (which should be acknowledged somewhere, maybe in the Discussion). In the deeply sequenced SS2 data, this is likely not an issue, but might be more limiting in sparser data. A related issue is that coverage change indicative of, e.g., alternative TSS or TES (that do not also include a change in splice junction use) will not be detected. In fairness, all these weaknesses are shared by LeafCutter. It would be valuable to have a comparison to a more "traditional" splicing analysis approach (pick your favorite of rMATS, MISO, SUPPA).

      We thank the reviewer for their comment. As noted in the Conclusion, the SPLASH framework is not designed for differential gene expression analysis, which relies on quantifying read coverage. Rather, it focuses on detecting differential sequence diversity arising from mechanisms like alternative splicing or RNA editing. We will clarify this limitation further in the revised Conclusion. 

      Regarding splicing evaluation, we have performed extensive comparisons with two widely used and recent methods (SpliZ and Leafcutter) for both bulk and single-cell splicing analysis. While we appreciate the reviewer’s suggestion to include an additional method, given the current length of the paper and the fact that Leafcutter has previously been shown to outperform rMATS, MAJIQ, and Cufflinks2 (https://www.nature.com/articles/s41588-017-0004-9), we believe the current comparisons provide sufficient support for the evaluation of the splicing detection by SPLASH+.

      (6) "We should note that there is no difference between gene fusions and other RNA variants (e.g., RNA splicing) from a sequence assembly viewpoint". Maybe this is true in an abstract sense, but I don't think it is in reality. AS can produce hundreds of isoforms from the same gene, and be variable across individual cells. Gene fusions are generally less numerous/varied and will be shared across clonal populations, so the complexity is lower. That simplicity is balanced against the challenge that any genes could, in principle, fuse.

      We selected the fusion benchmarking dataset solely to evaluate how well compactors reconstruct sequences. Since our goal was to assess the accuracy of reconstructed compactor sequences, we needed a benchmarking dataset with ground truth sequences, which this dataset provides. We explained our reason for selecting the fusion dataset in the text, but we will clarify it further in the revision.

      (7) For the fusion detection assessment, SPLASH+ is given the correct anchor for detection. This feels like cheating since this information wouldn't usually be available. Can the authors motivate this? Are the other methods given comparable information? Also, TPM>100 seems like a very high expression threshold for the assessment.

      We agree with the reviewer that the fusion benchmarking dataset should not be used to assess the entire SPLASH+ framework. In fact, we did not use this dataset to evaluate SPLASH+; it was used exclusively to evaluate the performance of compactors as a standalone module. Specifically, we tested how well compactors can reconstruct fusion sequences when provided with seed sequences corresponding to fusion junctions. This aligns with our expectation from compactors in SPLASH+, that they should correctly reconstruct the sequence context for the detected anchors. As noted in our previous response, since our goal was to assess the accuracy of reconstructed compactor sequences, we required a benchmarking dataset with ground truth sequences, which this dataset provides. We will clarify this further in the revision.

      We appreciate the reviewer’s concern that a TPM of 100 is high. In Figure 1C, we presented the full TPM distribution for fusions missed or detected by compactors. The 100 threshold was an arbitrary benchmark to illustrate the clear difference in TPM profiles between these two sets of fusions. We will clarify this point in the revised manuscript.

      (8) Why are only 3'UTRs considered and not 5'? Is this because the analysis is asymmetric, i.e., only considering upstream anchors and downstream variation? If so, that seems like a limitation: how much additional variation would you find if including the other direction?

      We thank the reviewer for their comment. SPLASH+ can, in principle, detect variation in 5’ UTR regions, as demonstrated by the variations observed in the 5’ UTRs of the genes ANPC16 and ARPC2. If sequence variation exists in the 5′ UTR, SPLASH+ can still detect it by identifying an anchor upstream of the variable region, as it directly parses sequencing reads to find anchors with downstream sequence diversity. Even when the variation occurs near the 5′ end of the 5′ UTR, SPLASH+ can still capture this diversity if the user selects a shorter anchor length.

      (9) I don't find the theoretical results very meaningful. Assuming independent reads (equivalently binomial counts) has been repeatedly shown to be a poor assumption in sequencing data, likely due to various biases, including PCR. This has motivated the use of overdispersed distributions such as the negative Binomial and beta binomial. The theory would be valuable if it could say something at a specified level of overdispersion. If not, the caveat of assuming no overdispersion should be clearly stated.

      We appreciate the reviewer’s comment. We will clarify this in the revised paper.
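      For context, the reviewer's caveat can be made concrete with a standard textbook comparison that is not taken from the manuscript. The notation is assumed here purely for illustration: $n$ is the number of reads, $p$ the mean success probability, and $\rho$ the intra-class correlation of a beta-binomial model with the same mean.

```latex
\begin{align}
X &\sim \mathrm{Binomial}(n, p)
  &\operatorname{Var}(X) &= n p (1 - p), \\
X &\sim \mathrm{BetaBinomial}(n, \alpha, \beta),\quad
  p = \frac{\alpha}{\alpha + \beta},\quad
  \rho = \frac{1}{\alpha + \beta + 1}
  &\operatorname{Var}(X) &= n p (1 - p)\,\bigl[1 + (n - 1)\rho\bigr].
\end{align}
```

      Setting $\rho = 0$ recovers the binomial variance, so any result derived under independent reads corresponds to the no-overdispersion case; for $\rho > 0$ the variance is inflated by the factor $1 + (n - 1)\rho$.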

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for the careful review of our manuscript. Overall, they were positive about our use of cutting-edge methods to identify six inversions segregating in Lake Malawi. Their distribution across ~100 Lake Malawi species demonstrated that they were differentially segregating in different ecogroups/habitats and could potentially play a role in local adaptation, speciation, and sex determination. Reviewers were positive about our finding that the chromosome 10 inversion was associated with sex determination in a deep benthic species and its potential role in regulating traits under sexual selection. They agree that this work is an important starting point in understanding the role of these inversions in the amazing phenotypic diversity found in the Lake Malawi cichlid flock.

      There were two main criticisms that were made which we summarize:

      (1) Lack of clarity. It was noted that the writing could be improved to make many technical points clearer. Additionally, certain discussion topics were not included that should be.

      We will rewrite the text and add additional figures and tables to address the issues that were brought up in a point-by-point response. We will improve or include (1) the nomenclature for the inversions in different lineages, (2) the descriptions of the various genomic approaches, (3) a figure to document the samples and technologies used for each ecogroup, and (4) the integration of LR sequences to identify inversion breakpoints to the finest resolution possible.

      (2) We overstate the role that selection plays in the spread of these inversions and neglect other evolutionary processes that could be responsible for their spread.

      We agree with the overarching point. We did not show that selection is involved in the spread of these inversions, and other forces can be at play. Additionally, there were concerns about our model in which the inversions introgressed from a Diplotaxodon ancestor into benthic ancestors, since incomplete lineage sorting or balancing selection (via sex determination) could instead be at play. Overall, we agree with the reviewers, with the following caveats. (1) Our analysis of the genetic distance between Diplotaxodons and benthic species in the inverted regions is more consistent with their spread through introgression than with incomplete lineage sorting or balancing selection. (2) Further, the role of these inversions is likely different in different species. For example, the inversions on chromosomes 10 and 11 play a role in sex determination in some species but not others, and the potential pressures acting on the inverted and non-inverted haplotypes will be very different. These are very interesting and important questions, both for understanding the adaptive radiations in Lake Malawi and in general, and we are actively studying crosses to understand the role of these inversions in phenotypic variation between two species. We will modify the text to make all of these points clearer.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using high-quality genomic data (long-reads, optical maps, short-reads) and advanced bioinformatic analysis, the authors aimed to document chromosomal rearrangements across a recent radiation (Lake Malawi Cichlids). Working on 11 species, they achieved a high-resolution inversion detection and then investigated how inversions are distributed within populations (using a complementary dataset of short-reads), associated with sex, and shared or fixed among lineages. The history and ancestry of the inversions is also explored.

      On one hand, I am very enthusiastic about the global finding (many inversions well-characterized in a highly diverse group!) and impressed by the amount of work put into this study. On the other hand, I have struggled so much to read the manuscript that I am unsure about how much the data supports some claims. I'm afraid most readers may feel the same and really need a deep reorganisation of the text, figures, and tables. I reckon this is difficult given the complexity brought by different inversions/different species/different datasets but it is highly needed to make this study accessible.

      The methods of comparing optical maps, and looking at inversions at macro-evolutionary scales can be useful for the community. For cichlids, it is a first assessment that will allow further tests about the role of inversions in speciation and ecological specialisation. However, the current version of the manuscript is hardly accessible to non-specialists and the methods are not fully reproducible.

      Strengths:

      (1) Evidence for the presence of inversion is well-supported by optical mapping (very nice analysis and figure!).

      (2) The link between sex determination and inversion in chr 10 in one species is very clearly demonstrated by the proportion in each sex and additional crosses. This section is also the easiest to read in the manuscript and I recommend trying to rewrite other result sections in the same way.

      (3) A new high-quality reference genome is provided for Metriaclima zebra (and possibly other assemblies? - unclear).

      (4) The sample size is great (31 individuals with optical maps if I understand well?).

      (5) Ancestry at those inversions is explored with outgroups.

      (6) Polymorphism for all inversions is quantified using a complementary dataset.

      Weaknesses:

      (1) Lack of clarity in the paper: As it currently reads, it is very hard to follow the different species, ecotypes, samples, inversions, etc. It would be useful to provide a phylogeny explicitly positioning the samples used for assembly and the habitat preference. Then the text would benefit from being organised either by variant or by subgroups rather than by successive steps of analysis.

      We have extensively rewritten the paper to improve the clarity. With respect to this point, we moved Figure 6 to Figure 1, which places the phylogeny of Lake Malawi cichlids at the beginning of the paper. We incorporated information about samples/technologies by ecogroup into this figure to help the reader gain an overview of the technologies involved. We added information about habitat for each ecogroup as well. While we considered a change to the text organization suggested here, we thought it was clearer to keep the original headings.

      (2) Lack of information for reproducibility: I couldn't find clearly the filters and parameters used for the different genomic analyses for example. This is just one example and I think the methods need to be re-worked to be reproducible. Including the codes inside the methods makes it hard to follow, so why not put the scripts in an indexed repository?

      We now provide a link to a GitHub repository (https://github.com/ptmcgrat/CichlidSRSequencing/tree/Kumar_eLife) containing the scripts used for the major analyses in the paper. Because our data is behind a secure Dropbox account, readers will not be able to run the analyses; however, they can see the exact programs, filters, and parameters used for the manuscript embedded within each script.

      (3) Further confirmation of inversions and their breakpoints would be valuable. I don't understand why the long-reads (that were available and used for genome assembly) were not also used for SV detection and breakpoint refinement.

      We did use long reads to confirm the presence of the inversions by creating five new genome assemblies from the PacBio HiFi reads: two additional Metriaclima zebra samples and three Aulonocara samples. Alignment of these five genomes to the MZ_GT3 reference is shown in Figures S2 – S7. These genome assemblies were also used to identify the breakpoints of the inversions. However, because of the extensive amount of repetitive DNA at the breakpoints (which is known to be important for the formation of large inversions), our ability to resolve the breakpoints was limited.

      (4) Lack of statistical testing for the hypothesis of introgression: Although cichlids are known for high levels of hybridization, inversions can also remain balanced for a long time. What could allow us to differentiate introgression from incomplete lineage sorting?

      The coalescent time of the inverted regions between Diplotaxodons and benthics should allow us to distinguish these two mechanisms. Our finding that the genetic distance, which is related to coalescent time, is smaller within the inversions than across the whole genome is supportive of introgression. However, we did not perform any simulations or statistical tests. We make it clearer in the text that incomplete lineage sorting remains a possible mechanism for the distribution of inversions within these ecogroups.
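      The kind of comparison described in this response can be illustrated with a minimal sketch. It assumes per-window genetic distances between two lineages have already been computed along one chromosome; the input layout, the function names, and the use of flanking windows as a stand-in for the genomic background are all assumptions made for illustration, not the analysis performed in the paper, and the sketch is not a statistical test.

```python
from statistics import mean
from typing import List, Sequence, Tuple

# Each window is (start, end, distance); coordinates are hypothetical.
Window = Tuple[int, int, float]


def split_windows_by_region(windows: Sequence[Window],
                            region_start: int,
                            region_end: int) -> Tuple[List[float], List[float]]:
    """Separate window distances into those fully inside and fully outside a region."""
    inside: List[float] = []
    outside: List[float] = []
    for start, end, dist in windows:
        if start >= region_start and end <= region_end:
            inside.append(dist)
        elif end <= region_start or start >= region_end:
            outside.append(dist)
        # Windows that straddle a breakpoint are ignored.
    return inside, outside


def distance_ratio(windows: Sequence[Window],
                   region_start: int,
                   region_end: int) -> float:
    """Mean distance inside the region divided by mean distance outside it."""
    inside, outside = split_windows_by_region(windows, region_start, region_end)
    if not inside or not outside:
        raise ValueError("need at least one window inside and one outside the region")
    return mean(inside) / mean(outside)
```

      A ratio well below 1 would indicate that the two lineages are more similar inside the inverted region than in the flanking sequence, which is the pattern the response describes as supportive of introgression. Distinguishing it from incomplete lineage sorting would still require simulations or a formal test.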

      (5) The sample size is unclear: possibly 31 for Bionano, 297 for short-reads, how many for long-reads or assemblies? How is this sample size split across species? This would deserve a table.

      We have included this information in the new Figure 1.

      (6) Short read combines several datasets but batch effect is not tested.

      We do not test for batch effects. However, we note that all of the datasets were analyzed by the same pipeline starting from alignment, so batch effects would be restricted to aspects of the reads themselves. Additionally, samples from the different datasets clustered as expected by lineage and inferred inversion, so batch effects are unlikely to have affected the analysis for these purposes.

      (7) It is unclear how ancestry is determined because the synteny with outgroups is not shown.

      Ancestry analysis was determined using the genome alignments of two outgroups from outside of Lake Malawi. This is shown in Figure S8.

      (8) The level of polymorphism for the different inversions is difficult to interpret because it is unclear whether replicates are different species within an eco-group or different individuals from the same species. How could it be that homozygous references are so spread across the PCA? I guess the species-specific polymorphism is stronger than the ancestral order but in such a case, wouldn't it be worth re-doing the PCA on a subset?

      The genomic PCA plots reflect the evolutionary histories that are observed in the whole genome phylogenies. Because the distribution of the inverted alleles violates the species tree, they form separate clusters on the PCA plots that can be used to genotype specific species. We have also performed this analysis on benthics (utaka/shallow benthics/deep benthics) and the distribution matches the expectation.

      Reviewer #2 (Public review):

      Summary:

      Chromosomal inversions have been predicted to play a role in adaptive evolution and speciation because of their ability to "lock" together adaptive alleles in genomic regions of low recombination. In this study, the authors use a combination of cutting-edge genomic methods, including BioNano and PacBio HiFi sequencing, to identify six large chromosomal inversions segregating in over 100 species of Lake Malawi cichlids, a classic example of adaptive radiation and rapid speciation. By examining the frequencies of these inversions present in species from six different lineages, the authors show that there is an association between the presence of specific inversions with specific lineages/habitats. Using a combination of phylogenetic analyses and sequencing data, they demonstrate that three of the inversions have been introduced to one lineage via hybridization. Finally, genotyping of wild individuals as well as laboratory crosses suggests that three inversions are associated with XY sex determination systems in a subset of species. The data add to a growing number of systems in which inversions have been associated with adaptation to divergent environments. However, like most of the other recent studies in the field, this study does not go beyond describing the presence of the inversions to demonstrate that the inversions are under sexual or natural selection or that they contribute to adaptation or speciation in this system.

      Strengths:

      All analyses are very well done, and the conclusions about the presence of the six inversions in Lake Malawi cichlids, the frequencies of the inversions in different species, and the presence of three inversions in the benthic lineages due to hybridization are well-supported. Genotyping of 48 individuals resulting from laboratory crosses provides strong support that the chromosome 10 inversion is associated with a sex-determination locus.

      Weaknesses:

      The evidence supporting a role for the chromosome 11 inversion and the chromosome 9 inversion in sex determination is based on relatively few individuals and therefore remains suggestive. The authors are mostly cautious in their interpretations of the data. However, there are a few places where they state that the inversions are favored by selection, but they provide no evidence that this is the case and there is no consideration of alternative hypotheses (i.e. that the inversions might have been fixed via drift).

      We have removed mention of chromosome 9’s potential role in sex determination from the paper. While our analysis of sex association with chromosome 11 was limited compared to our analysis of chromosome 10, it was still statistically significant, and we believe it should be left in the paper. The role of 11 (and 9 and 10) in sex determination was also demonstrated using an independent dataset by Blumer et al. (https://doi.org/10.1101/2024.07.28.605452).

      We agree that we did not properly consider alternative hypotheses in the original submission and have rewritten the Discussion substantially to consider various alternative hypotheses.

      Reviewer #3 (Public review):

      This is a very interesting paper bringing truly fascinating insight into the genomic processes underlying the famous adaptive radiation seen in cichlid fishes from Lake Malawi. The authors use structural and sequence information from species belonging to distinct ecotypic categories, representing subclades of the radiation, to document structural variation across the evolutionary tree, infer introgression of inversions among branches of the clade, and even suggest that certain rearrangements constitute new sex-determining loci. The insight is intriguing and is likely to make a substantial contribution to the field and to seed new hypotheses about the ecological processes and adaptive traits involved in this radiation.

      I think the paper could be clarified in its prose, and that the discussion could be more informative regarding the putative roles of the inversions in adaptation to each ecotypic niche. Identifying key, large inversions shared in various ways across the different taxa is really a great step forward. However, the population genomics analysis requires further work to describe and decipher in a more systematic way the evolutionary forces at play and their consequences on the various inversions identified.

      The model of evolution involving multiple inversions putatively linking together co-adapted "cassettes" could be better spelled out since it is not entirely clear how the existing theory on the recruitment of inversions in local adaptation (e.g. Kirkpatrick and Barton) operates on multiple unlinked inversions. How such loci correspond to distinct suites of integrated traits, or not, is not very easy to envision in the current state of the manuscript.

      This is a very interesting point, and we agree that it creates complications for a simple model of local adaptation. We imagine, though, that the actual evolutionary history was much more complicated than a single Rhamphochromis-type species separating from a single Diplotaxodon-type species and could have occurred sequentially, involving multiple species that are now extinct. A better understanding of the role each of these inversions plays in phenotypic diversity could potentially help us determine if different inversions carry variation that could be linked to distinct habitat differences. We have added a line to the discussion.

      The role of one inversion in sex determination is apparent and truly intriguing. However, the implication of such locus on ecological adaptation is somewhat puzzling. Also, whether sex determination loci can flow across species via introgression seems quite important as a route to chromosomal sex determination, so this could be discussed further.

      Another very interesting point. If the inversions are involved in ecological adaptation (an important caveat), then potentially the inverted and non-inverted haplotypes play dual roles in the Aulonocara animals with the inverted haplotype carrying adaptive alleles to deep water and the non-inverted haplotype carrying alleles resolving sexual conflict. We have broadened our discussion about their function at the origin including non-adaptive roles.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Overall, the paper is well-written and clear. I do have a few suggestions for changes that would help the reader:

      (1) Figure 1: the figure legend could be expanded here to help the reader; what are the blue and yellow lines? Why are there two lines for the GT3a assembly? And, I had to somehow read the legend a few times to understand that the top line is the UMD2a reference assembly, and the next line is the new Bionano map.

      Fixed in what is now Figure 2

      (2) Paragraph starting on line 133: you use the word "test" to refer to the Bionano analyses; it is not clear whether anything is being tested. Perhaps "analyse the maps" or just "map" would be more clear? Or more explanation?

      The text has been modified to address this point

      (3) L145-146: perhaps change "a single inversion" and "a double inversion" to "single inversions" and "double inversions".

      The text has been modified to address this point

      (4) L157: suppression of recombination in inversion heterozygotes is "textbook" material and perhaps does not need a reference. Or, you could reference an empirical paper that demonstrates this point. Though I love the Kirkpatrick and Barton paper, it certainly is not the correct reference for this point.

      The Kirkpatrick reference was incorrectly included here. The correct reference was an empirical demonstration (Conte) that there were regions of suppressed recombination that have been observed in the location of the inversions. We have also moved this reference further up in the sentence to a more appropriate position

      (5) L173: how do you know this is an assembly error and not polymorphism?

      The text has been modified to address this point

      (6) L277(?): "currently growing in the lab" is probably unnecessary.

      The text has been modified to address this point

      (7) L298: "the inversion on 10 acts as an XY sex determiner": the inversion itself is not the sex determination gene; rather, it is linked. I think it would be more precise, here and throughout the paper, to say that these inversions likely harbor the sex determination locus (for example, the wording on lines 369-370 is misleading).

      We agree with the larger point that the inversion might not be causal for sex determination, however, it could still be causal through positional effects. We have modified the text to make it clear that it could also carry the causal locus (or loci).

      (8) Figure 6: overall, this figure is very helpful! However, it contains several problematic statements. In no case do you have evidence that these inversions are "favored by selection"; such statements should be deleted. Also, in point 3, you state that inversions 9, 11, and 20 are transferred to benthic lineages, and then that these inversions are involved in sex determination. But, your data suggests that it is chromosomes 9, 10, and 11 that are linked to sex determination.

      This figure is now Figure 1. We have removed these problematic statements.

      (9) L356-360: I would move the references that are currently at the end of the sentence to line 357 after the statement about the previous work on hybridization. Otherwise, it reads as if these previous papers demonstrated what you have demonstrated in your work.

      The text has been modified to address this point

      (10) Overall, the discussion focuses completely on adaptive explanations for your results, and I would like to see at least an acknowledgement that drift could also be involved unless you have additional data to support adaptive explanations.

      We have rewritten the text to account for the possibility of drift (lines 404 and 405).

      Reviewer #3 (Recommendations for the authors):

      The paper utilizes heterogeneous datasets coming from different sources, and it is not always clear which specimens were used to generate structural information (bionano) or sequence information. A diagram summarizing the sequence data, methodologies, and research questions would be beneficial for the reader to navigate in this paper.

      Much of this information has been added to what is now Figure 1. All of this data is also found in Table S2.

      The authors performed genome alignments to analyze and homologize inversion, but this process is not clearly described. For the PCA, SNP information likely involves mapping onto a common reference genome. However, it is not clear how this was achieved given the different species and varying divergence times involved.

      We now include a link to the GitHub repository that contains the commands that were run. Because the overall level of sequence divergence between cichlid species is quite low (2 × 10^-3; Malinsky et al.), mapping different species onto a common reference is commonly performed in Lake Malawi cichlids.

      The introgression scenario is very intriguing but its role in local adaptation of the ecogroup types is not easy to understand. I understand this is still an outstanding question, but it is unclear how the directionality of introgressions was estimated. This can be substantiated using tree topology analysis, comparative estimates of sequence divergence, and accumulation of DNA insertions. The diagram does not clearly indicate which ones are polymorphic. In some cases, polymorphic inversions could result from the coexistence of native and introgressed haplotypes.

      We agree that this analysis would be interesting but is beyond the scope of this paper.

      The alternative model of introgression proposed in the cited preprint is interesting and should deserve a formal analysis here. The authors consider unclear what would drive "back" introgressions of non-inverted haplotypes, but this would depend on the selection regimes acting on the inversions themselves, which can include forms of balancing selection and a role for recessive lethals (heterozygote advantage). For instance, a standard haplotype could be favored if it shelters deleterious mutations carried by an inversion. Testing the introgression history over a wider range of branches and directions would provide further insights.

      We agree that this analysis would be interesting but is beyond the scope of this paper.

      The prose in the paper is occasionally muddled and somewhat unclear. Referring to chromosomes solely by their numbers (e.g., "inversion on 11") complicates readability.

      This is the standard way to refer to chromosomes in cichlids, and we believe that, while it complicates readability, any other method would be inconsistent with other papers. Changes to nomenclature might improve the readability of this paper, but would make it more difficult to compare results for these chromosomes from other papers with what we have found.

    1. Billionaires

      I believe that harassment is never justified. Harassment involves actions like online insults, cyberstalking, and invasion of personal information to harm a user. Some people may think that harassment is acceptable when directed at extremists such as racists, white supremacists, or sexists. I strongly disagree: there are clearly better ways to address such issues than resorting to harassment. For example, we can use facts and logic to refute their views instead of launching personal attacks, or report their behavior through legal and official channels.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The Authors investigated the anatomical features of the excitatory synaptic boutons in layer 1 of the human temporal neocortex. They examined the size of the synapse, the macular or the perforated appearance and the size of the synaptic active zone, the number and volume of the mitochondria, the number of the synaptic and the dense core vesicles, also differentiating between the readily releasable, the recycling and the resting pool of synaptic vesicles. The coverage of the synapse by astrocytic processes was also assessed, and all the above parameters were compared to other layers of the human temporal neocortex. The Authors conclude that the subcellular morphology of the layer 1 synapses is suitable for the functions of the neocortical layer, i.e. the synaptic integration within the cortical column. The low glial coverage of the synapses might allow the glutamate spillover from the synapses enhancing synaptic crosstalk within this cortical layer.

      Strengths:

      The strengths of this paper are the abundant and very precious data about the fine structure of the human neocortical layer 1. Quantitative electron microscopy data (especially those derived from the human brain) are very valuable, since this is highly time- and energy-consuming work. The techniques used to obtain the data, as well as the analyses and the statistics performed by the Authors, are all solid, strengthen this manuscript, and mainly support the conclusions drawn in the discussion.

      Comments on latest version:

      The corrected version of the article titled “Ultrastructural sublaminar specific diversity of excitatory synaptic boutons in layer 1 of the adult human temporal lobe neocortex" has been improved thanks to the comments and suggestions of the reviewers. The Authors implemented several of my comments and suggestions. However, many of them were not addressed. It is understandable that the Authors did not start a whole new series of experiments investigating inhibitory synapses (as it was a misunderstanding affecting two of the three reviewers). But the English text is still very hard to understand and has many mistakes, although I suggested an extensive review of the use of English. Furthermore, my suggestions about avoiding many abbreviations in the abstract, analysing and discussing the perforated synapses in more depth, the figure presentation (Figure 3), and including data about the astrocytic coverage in the Results section were not implemented. My questions about the number of docked vesicles and p10 vesicles, as well as about the different categories of the vesicle pools, have not been answered either. Many other minor comments and suggestions were answered, corrected and implemented, but I think the manuscript could have been improved more if the Authors had taken into account all of the reviewers' suggestions, not only some of them. I still have several main and minor concerns, including a few new ones that I did not notice earlier but still think are important.

      We would like to thank the reviewer for the comments.

      - We worked on the English again and tried to improve the language.

      - We avoided using too many abbreviations in the Abstract and reduced them to a minimum.

      - We included a small paragraph about non-perforated vs. perforated active zones in both the Results and Discussion sections. However, since the majority of active zones in all cortical layers of the human TLN were of the macular type, we concluded that it is not relevant to describe their function in more detail.

      - In Figure 3 A-C we added contour lines to the boutons to make their outlines more visible.

      - We completed the data about the astrocytic coverage in the Results section (see also below).

      - Concerning the vesicle pools please see below.

      Main concerns:

      (1) Epileptic patients:

      As all patients were epileptic, it is not correct to state in the abstract that non-epileptic tissue was investigated. Even if the seizure onset zone was not in the region investigated, seizures usually invade the temporal lobe in TLE. If you can prove that no spiking activity occurred in the sample you investigated and the seizures did not invade that region, then you can write that it is presumably non-epileptic. I would suggest writing “L1 of the human temporal lobe neocortical biopsy tissue". See also Methods lines 608-612. Write “non-epileptic" or “non-affected" only if you verified it with ECoG. If this was the case, please write a few sentences about it in the Methods.

      We rephrased Material and Methods concerning this point and added that patients were monitored with EEG, MRI and multielectrode recordings. In addition, we stated that the epileptic focus was always far away from the neocortical tissue samples. Furthermore, we added a small paragraph that functional studies using the same methodology have shown that neocortical access tissue samples taken from epilepsy surgery do not differ in electrophysiological properties and synaptic physiology when compared with acute slice preparations in experimental animals and we quoted the relevant papers.

      We hope that the reviewer is now convinced that our tissue samples can be regarded as non-affected.

      (2) About the inhibitory/excitatory synapses.

      “Since our focus was on excitatory synaptic boutons, as already stated in the title, we have not analyzed inhibitory SBs.” Now I do understand that only excitatory synapses were investigated. Although it was written in the title, I did not realize this, since throughout the manuscript the Authors wrote “synapses”, distinguished between inhibitory and excitatory synapses in the text, showed numerous excitatory and inhibitory synapses in Figure 2, and discussed inhibitory interneurons in the Discussion as well. Maybe this was the reason why two of the three reviewers (including myself) thought you investigated both types of synapses but did not differentiate between them. So please emphasize in the Abstract (line 40), Introduction (e.g., lines 92-97) and the Discussion (line 369) that only excitatory synaptic boutons were investigated.

      As this paper investigated only excitatory synaptic boutons, I think it is irrelevant to write such a long section in the Discussion about inhibitory interneurons and their functions in L1 of the human temporal lobe neocortex. The same applies to the schematic drawing of the possible wiring of L1 (Figure 7). As neither inhibitory interneurons nor the connections between the different excitatory cells were examined, only the morphology of single synaptic boutons without any reference to their origin, I think this figure does not illustrate the work done in this paper. It could be a figure in a review paper about human L1, but it is inappropriate in this study.

      We followed the reviewer’s suggestion and pointed out explicitly that we only investigated excitatory synaptic boutons. We also changed the Discussion and focused more on circuitry in L1 and the role of CR-cells.

      (3) Perforated synapses

      “The findings of the Geinisman group suggesting that perforated synapses are more efficient than non-perforated ones is nowadays very controversially discussed.” I did not ask the Authors to say that perforated synapses are more efficient. However, based on the literature (for ex. Harris et al, 1992; Carlin and Siekievitz, 1982; Nieto-Sampedro et al., 1982) the presence of perforated synapses is indeed a good sign of synapse division/formation - which in turn might be coupled to synaptic plasticity (Geinisman et al, 1993), increased synaptic activity (Vrensen and Cardozo, 1981), LTP (Geinisman et al, 1991, Harris et al, 2003), pathological axonal sprouting (Frotscher et al, 2006), etc. I think it is worth mentioning this at least in the Discussion.

      We agree with the reviewer and added a small paragraph in the Results section about the two types of AZs in L1 of the human TLN. We pointed out that there are both types, macular non-perforated and perforated AZs, but the majority in all layers were of the non-perforated type. In the Discussion we added some papers pointing out the role of perforated synapses.

      (4) Question about the vesicle pools

      Results, Line 271: It is still not understandable why the RRP was defined as ≤10 nm and ≤20 nm. Why did you use two categories? One would be sufficient (for example ≤20 nm). Or were the vesicles between 10 and 20 nm considered to be part of the RRP? In this case there is a typo; it should be ≥10 nm and ≤20 nm.

      The Authors' answer to my question was: “We decided that also those very close within 10 and 20 nm away from the PreAZ, which is less than a SV diameter may also contribute to the RRP since it was shown that SVs are quite mobile.”

      This does not clarify why you used two categories. Furthermore, I (like Referee #2) did not receive an answer to my question on how you could have 3x as many docked vesicles as vesicles ≤10 nm. The category ≤10 nm should also contain the docked vesicles. If this is not the case, please clarify what your categories were.

      We thank the reviewer for pointing out that mentioning two distance criteria (p10 and p20) to define one physiological entity (RRP) is somewhat confusing and we acknowledge that the initial response to the reviewers falls short of explaining this choice. This is indeed only understandable in the context of the original paper by Sätzler et al. 2002, where these criteria were first introduced. We therefore referenced this publication more prominently in the paragraph in question.

      So to explain this, we first would like to clarify the definition of the two RRP classification criteria used (p10 and p20), which has caused some confusion amongst the reviewers as to which vesicles were included or not:

      - p10 criterion: p ≤ 10 nm (SVs have a minimum distance less than or equal to 10 nm from the PreAZ), including ‘docked’ vesicles which have a distance of zero or less (p0)

      - p20 criterion: p ≤ 20 nm (SVs have a minimum distance less than or equal to 20 nm from the PreAZ), including vesicles of the p10 criterion.
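      The nesting of these criteria (p0 within p10 within p20) can be illustrated with a minimal sketch. It assumes only what is stated above: each vesicle has a minimum distance to the PreAZ in nm, and a distance of zero or less counts as docked. The function names and the example distances are hypothetical and are not taken from the analysis pipeline or the data of the study.

```python
from collections import Counter

def classify_vesicle(distance_nm):
    """Return every nested criterion that a vesicle satisfies.

    distance_nm is the minimum distance from the vesicle to the PreAZ;
    zero or negative values are treated as 'docked' (p0).
    """
    labels = []
    if distance_nm <= 0:
        labels.append("p0")   # docked
    if distance_nm <= 10:
        labels.append("p10")  # includes p0
    if distance_nm <= 20:
        labels.append("p20")  # includes p10 and p0
    return labels

def pool_counts(distances_nm):
    """Count vesicles per criterion for one bouton (hypothetical helper)."""
    counts = Counter({"p0": 0, "p10": 0, "p20": 0})
    for d in distances_nm:
        counts.update(classify_vesicle(d))
    return dict(counts)

# Made-up distances in nm, for illustration only.
example = [0.0, -1.5, 4.0, 10.0, 12.5, 20.0, 35.0]
print(pool_counts(example))
```

      Because the criteria are cumulative, a count obtained in this way can never be smaller for p10 than for p0, or smaller for p20 than for p10, when all three are measured on the same set of vesicles.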

      As mentioned, these criteria were introduced first in Sätzler et al. 2002 looking at the Calyx of Held synapse. In that paper, we tried to establish a morphological correlate to existing physiological measurements, which included the RRP. As there is no known marker that would allow one to discriminate anatomically between vesicles that contribute to the RRP and those that do not, we looked at existing physiological experiments such as Schneggenburger et al. 1999; Wu and Borst 1999; Sun and Wu 2001 and compared their total numbers to our measurements. As the number of docked vesicles (p0, see above) was on the lower side of these physiological estimates, we also looked at vesicles close to the AZ, which we think could be recruited within a short time (≤ 10 msec). Comparing with existing literature, we found that at p20 we get pool sizes comparable to midrange estimates of reported RRP sizes. In order to account for the variability of the observed physiological pool sizes, we reported all three measurements (p0, p10, p20) not only in the original Calyx of Held study, but in all subsequent studies of different CNS synapses of our group since then.

      As it remains uncertain whether such a correlate indeed exists, we followed the suggestion to rephrase RRP and RP to putative RRP and putative RP (see also Rollenhagen et al. 2007). We thank both reviewers for pointing out this omission.

      Concerning the difference between ‘docked’ vesicles and vesicles within the p10 perimeter criterion: first of all, the reviewer is right in saying that the category p10 (≤10 nm) should also contain the docked vesicles (see above). The fact that we found 3x as many ‘docked’ vesicles in our TEM tomography as in the p10 distance analysis could be partly explained, on the one hand, by a very high variability between patients (as expressed by the high SD, Table 1) and, on the other hand, by a high intraindividual synaptic bouton variability. In both sublayers, there is a huge difference in the number of vesicles within the p10 criterion of individual synaptic boutons ranging from 0 to ~40 with a mean value of ~1 to ~4 (calculated per patient), the upper level being close to the values calculated with TEM tomography for the ‘docked’ vesicles.

      (5) Astrocytic coverage

      On Fig. 6 data are presented on the astrocytic coverage derived from L1 and L4. In my previous review I asked to include this in the text of the Results as well, but I still do not see it. It is also lacking from the Results how many samples from which layer were investigated in this analysis. Only percentages are given, and only for L1 (but how many patients, L1a and/or L1b and/or L4 is not provided). In contrast, Figure 6 and Supplementary Table 2 (patient table) contains the information that this analysis has been made in L4 as well. Please, include this information in the text as well (around lines 348-360).

      In our previous revised version, we had included the values shown in Fig. 6 for both L1 and L4 in the Results section (L4: lines 352 – 355: ‘The findings in L1…’). However, we agree with the reviewer and have now also added the number of patients and synapses investigated (now lines 359 – 365).

      About how to determine glial elements. I cannot agree with the Authors that glial elements can be determined with high certainty based only on the anatomical features of the profiles seen in the EM. “With 25 years of experience in (serial) EM work" I would say, that glial elements can be very similar to spine necks and axonal profiles.

      All in all, if similar methods were used to determine the glial coverage in the different layers of the human neocortex, then it can be compared (I guess this is the case). However, I would say in the text that proper determination would need immunostaining and a new analysis. This only gives an estimation with the possibility of a certain degree of error.

      We do not entirely agree with the reviewer on this point. As stated in the text, there are structural criteria to identify astrocytic elements (see citations quoted). These gold standard criteria are commonly used also by other well-known groups (DeFelipe and co-workers, Francisco Clasca and co-workers, the late Michael Frotscher and co-workers, etc.). However, in a past paper about astrocytic coverage of synaptic complexes in L5 of the human TLN, immunohistochemistry against glutamine synthetase, a key enzyme in astrocytes, was carried out to describe the coverage. This experiment supports our findings in the other cortical layers of the human TLN. As the reviewer might know, immunohistochemistry always leads to a reduction in ultrastructural preservation, so we decided not to use immunohistochemistry for the further publications on the other cortical layers. We added a short note on this in the Material and Methods section.

      (6) Large interindividual differences in the synapse density should be discussed in the Discussion.

      As suggested by the reviewer, we have included a sentence in the Discussion stating that interindividual differences can be related to differences in age, gender, or the use of different methodology, as suggested by DeFelipe and co-workers (1999).

      Reviewer #2 (Public review):

      Summary:

      The study of Rollenhagen et al examines the ultrastructural features of Layer 1 of human temporal cortex. The tissue was derived from drug-resistant epileptic patients undergoing surgery, and was selected as being distant from the epilepsy focus, and as such considered to be non-epileptic. The analysis included 4 patients of different age, sex, medication and onset of epilepsy. The MS is a follow-on study to 3 previous publications from the same authors on different layers of the temporal cortex:

      Layer 4 - Yakoubi et al 2019 eLife

      Layer 5 - Yakoubi et al 2019 Cerebral Cortex,

      Layer 6 - Schmuhl-Giesen et al 2022 Cerebral Cortex

      They find that the L1 synaptic boutons mainly have a single active zone and a very large pool of synaptic vesicles, and are mostly devoid of astrocytic coverage.

      Strengths:

      The MS is well written and easy to read. The Results section gives a detailed set of figures showing many morphological parameters of synaptic boutons and surrounding glial elements. The authors provide comparative data of all the layers examined by them so far in the Discussion. Given that anatomical data in human brain are still very limited, the current MS has substantial relevance.

      The work appears to be generally well done, and the EM and EM tomography images are of very good quality. The analysis is clear and precise.

      Weaknesses:

      The authors made all the corrections required, answered most of my concerns, included additional data sets, and clarified statements where needed.

      My remaining points are:

      Synaptic vesicle diameter (which has been established to be ~40 nm independent of species) can properly be measured with EM tomography only, as it provides the possibility to find the largest diameter of every given vesicle. Measuring it in 50 nm thick sections results in underestimation (just like here, where the values are ~25 nm), as the measured diameter will be smaller than the true diameter if the vesicle is not cut in the middle (which is the least probable scenario). The authors have the EM tomography data set for measuring the vesicle diameter properly.

      We thank the reviewer for the helpful comments. We followed the recommendation to measure the vesicle diameter using our TEM tomography tilt series, but came to similar results concerning this synaptic parameter. As stated in our Material and Methods section, we only counted (measured) clear ring-like structures according to a paper by Abercrombie (1963). Since our results are similar for both methods, we do believe that our measurements are correct. Even random single measurements on the original 3D tilt-series yielded comparable results (Lübke and co-workers, personal observation). Furthermore, our results are within ranges, although with high variability, also described by other groups (see discussion lines 436 - 449). We therefore hope that the reviewer will now accept our measurements.

      It is a bit misleading to call vesicle populations at certain arbitrary distances from the presynaptic active zone the readily releasable pool, recycling pool and resting pool, as these are functional categories and cannot directly be translated to vesicles at certain distances. It is even debated whether the morphologically docked vesicles are the ones that are readily releasable, as further molecular steps, such as proper priming, are also a prerequisite for release.

      It would help to call these pools "putative" correlates of the morphological categories.

      We followed the suggestion by the reviewer and renamed our vesicle pools as putative RRP, putative RP and putative resting pools.

      Reviewer #3 (Public review):

      Summary:

      Rollenhagen et al. offer a detailed description of layer 1 of the human neocortex. They use electron microscopy to assess the morphological parameters of presynaptic terminals, active zones, vesicle density/distribution, mitochondrial morphology and astrocytic coverage. The data is collected from tissue from four patients undergoing epilepsy surgery. As the epileptic focus was localized in all patients to the hippocampus, the tissue examined in this manuscript is considered non-epileptic (access) tissue.

      Strengths:

      The quality of the electron microscopic images is very high, and the data is analyzed carefully. Data from human tissue is always precious and the authors here provide a detailed analysis using adequate approaches, and the data is clearly presented.

      Weaknesses:

      The text connects functional and morphological characteristics in a very direct way. For example, connecting plasticity to any measurement the authors present would be rather difficult without any additional functional experiments. References to various vesicle pools based on the location of the vesicles is also more complex than it is suggested in the manuscript. The text should better reflect the limitations of the conclusions that can be drawn from the authors' data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Astrocytic coverage

      On Fig. 6 data are presented on the astrocytic coverage derived from L1 and L4. In my previous review I asked to include this in the text of the Results as well, but I still do not see it. It is also lacking from the Results how many samples from which layer were investigated in this analysis. Only percentages are given, and only for L1 (but how many patients, L1a and/or L1b and/or L4 is not provided). In contrast, Figure 6 and Supplementary Table 2 (patient table) contains the information that this analysis has been made in L4 as well. Please, include this information in the text as well (around lines 348-360).

      See above.

      About how to determine glial elements. I cannot agree with the Authors that glial elements can be determined with high certainty based only on the anatomical features of the profiles seen in the EM. “With 25 years of experience in (serial) EM work" I would say, that glial elements can be very similar to spine necks and axonal profiles. Please, see the photos below, out of the 16 circled profiles (2nd picture, very similar to each other) only 3 belong to an astroglial cell (last picture, purple profiles-purple cell), 10 are spines/spine necks/small caliber dendrites of pyramidal cells, 3 are axonal profiles (last but one picture, blue profiles, marked with arrows on the right side). If you follow in your serial sections those elements which you think are glial processes and indeed they are attached to a confidently identifiable glial cell, I agree, it is a glial process. But identifying small, almost empty profiles without any specific staining, from one single EM section, as glial process is very uncertain. Please, check the database of the Allen Institute made from the V1 visual cortex of a mouse. It is a large series of EM sections where they reconstructed thousands of neurons, astroglial and microglial cells. It is possible to double click on the EM picture on a profile and it will show the cell to which that profile belongs. https://portal.brain-map.org/connectivity/ultrastructural-connectomics Pictures included here: https://elife-rp.msubmit.net/eliferp_files/2024/11/25/00132644/02/132644_2_attach_21_29456_convrt.pdf

      All in all, if similar methods were used to determine the glial coverage in the different layers of the human neocortex, then it can be compared (I guess this is the case). However, I would say in the text that proper determination would need immunostaining and a new analysis. This only gives an estimation with the possibility of a certain degree of error.

      As stated above, we carried out glutamine synthetase immunohistochemistry in L5 of the human TLN and came to the same results. However, we added a sentence on this in the chapter on astrocytic coverage in the Material and Methods section. Additionally, we modified this chapter according to the reviewer’s suggestion.

      Minor comments

      Introduction: Last sentence is not understandable (lines 101-103), please rephrase. (contribute to understand or contribute in understanding or contribute to the understanding of..., but definitely not contribute to understanding). The authors should check and review extensively for improvements to the use of English, or use a program such as Grammarly.

      Results: Grammar (line 107): L1 in the adult mammalian neocortex represents a relatively...

      Line 173: “Some SBs in both sublaminae were seen to establish either two or three SBs on the same spine, spines of other origin or dendritic shafts." - Some SBs established two or three SBs? I would write: Some SBs established two or three synapses on...

      Line 243: “The synaptic cleft size were slightly, but non-significantly different"

      Line 260: “DCVs play an important role in endo- and exocytosis, the build-up of PreAZs by releasing Piccolo and Bassoon (Schoch and Gundelfinger 2006; Murkherjee et al. 2010)," - please, correct this.

      We have made the corrections suggested by the reviewer.

      Line 374: No point at the end of the last phrase.

      Discussion:

      Lines 400-404: “The majority of SBs in L1 of the human TLN had a single at most three AZs that could be of the non perforated macular or perforated type comparable with results for other layers in the human TLN but by ~1.5-fold larger than in rodent and non-human primates." - What is comparable with the other layers, but different from animals? Please rephrase this sentence, it is not understandable. I already mentioned this sentence in my previous review, but nothing happened.

      Lines 435-437: “Remarkably, the total pool sizes in the human TLN were significantly larger by more than 6-fold (~550 SVs/AZ), and ~4.7-fold (~750 SVs/AZ;) than those in L4 and L5 (Yakoubi et al. 2019a, b; see also Rollenhagen et al. 2018) in rats." Please rethink what you wished to say and compare to the sentence meaning. I think you wanted to compare human TLN L1 pool size to L4 and L5 in the human TLN (Yakoubi 2019a and b) and to rat (Rollenhagen 2018). Instead, you compared all layers of the human TLN to L4 and L5 in rats (with partly wrong references). Please rephrase this.

      Lines 483-484: “Astrocytes serve as both a physical barrier to glutamate diffusion and as mediate neurotransmitter uptake via transporters".

      This sentence is grammatically incorrect, please rephrase.

      We corrected the sentences as suggested by the reviewer.

      Methods:

      In the text, there are only 4 patients (lines 603-604), but in the supplementary table there are 9 patients (5 new included for L4 astrocytic coverage). Please, correct it in the text.

      Lines 608-609: “neocortical access tissue samples were resected to control the seizures for histological inspection by neuropathologists." - What is the meaning of this? Please, rephrase.

      We thank the reviewer for the comment and have added the 5 patients used for L4 to the Material and Methods section, as well as to the Results section.

      The reviewer is right, and we rephrased and corrected the sentence concerning the inspection by neuropathologists.

      Figures

      Figure 5B: The legend says “SB (sb) synapsing on a stubby spine (sp) with a prominent spine apparatus (framed area) and a thick dendritic segment (de) in L1b" - In my opinion this is not one synaptic bouton, but two. Clearly visible membranes separate them, close to the spine.

      Supplemental Table 2 (patient table). If there is no information about Hu_04 patient's epilepsy, please write N/A (= not available) instead of - (which means it does not exist).

      The reviewer is right, and we corrected the figure and the legend, as well as the table accordingly.

      Reviewer #2 (Recommendations for the authors):

      The authors addressed almost all of my concerns; only this one remained:

      If there is, however, relevant literature on "methods based on EM tomography" and "stereological methods to estimate both types of error" (over- and underestimates) that we are missing out on, we would appreciate the reviewer providing us with the corresponding references so that we can include such calculations in our paper.

      There is a very detailed new study on calculating 2D-to-3D corrections for TEM, Rothman et al. 2023, PLOS One, that addresses most of these issues.

      We thank the reviewer for drawing our attention to the publication by Rothman et al. 2023, which is a very detailed and comprehensive study looking at accurately estimating distributions of 3D size and densities of particles from 2D measurements using – amongst others – ET and TEM images as well as synaptic vesicles for validating their method. However, we do not see how this would be relevant to the reported mean diameters and their corresponding variances. And even if we had reported on vesicle size/diameter distributions (referred to as G(d) in Rothman et al. 2023), the authors themselves state that “… the results from our ET and TEM image analysis highlight the difficulty in computing a complete G(d) of MFT vesicles due to their small size…

    1. In addition, homophobia has diverse roots, so being more aware of the different biases and anxieties behind its expressions can be key to challenging it and to challenging transphobia and other forms of exclusion as well. Even in the midst of thinking about bias and ensuring a fully educational response, there is a danger in letting homophobia define how and why lessons on sexual minorities are included in school. Institutional and legal restrictions have shaped the lives of sexual minority people, yet it would be a vast oversimplification to say that is the only reality of their lives. Sexuality, as discussed in Chapter 1, has a long and varied history; indeed, histories of identities and subjectivities may bear little resemblance to the categories by which we currently define sexual identity. As much as those communities and identity formations were related to restrictions on individuals' ability to live, they nonetheless formed cultures and associations, and, like other minorities living in a cultural context shaped by bias, reshaped their worlds. Tactically, it may be possible to convince people who initially do not want to include sexual minority issues in schooling that to do so would help address the risks that LGBTQ students face. However, we also need to be careful not to frame LGBTQ issues as only risk or deficit ones. We need to provide the opportunity to examine the positive aspects of LGBTQ communities and cultures and the abilities of sexuality and gender diverse people to live lives beyond institutional constraints.

      This section really made me think about how LGBTQ topics are often framed around danger, risk, or trauma. While those realities are important, it's limiting if that’s all we focus on. I like how the reading reminds us that LGBTQ communities also have resilience, joy, and rich cultural histories. Including those aspects in education helps move the conversation from tolerance to genuine respect.

    2. particular relationship to one another? How are sexual identities also defined by intense relationships, desires that may not be acted upon? How are attractions defined through ideas about gender, race, and class? In other words, as we think about making schools safer for sexual minorities, how do we even begin to address important issues, for instance, whether racial harassment is part of homophobia?

      This reaffirms that sexuality and gender are far more slippery and complex than categories can imply. It reinforces that even when schools try to place "normal" expectations upon them, people's experiences of identity cannot be constrained within firm boxes. By inquiring how sexuality intersects with race, class, and gender, the book highlights that safe schools for LGBTQ students require responding to broader systems of oppression rather than discrete cases of bullying and discrimination. It challenges us to examine more thoroughly how all students, regardless of identity, do well when schools push back on narrow definitions of what is "normal."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Using a cross-modal sensory selection task in head-fixed mice, the authors attempted to characterize how different rules reconfigured representations of sensory stimuli and behavioral reports in sensory (S1, S2) and premotor cortical areas (medial motor cortex or MM, and ALM). They used silicon probe recordings during behavior, a combination of single-cell and population-level analyses of neural data, and optogenetic inhibition during the task.

      Strengths:

      A major strength of the manuscript was the clarity of the writing and motivation for experiments and analyses. The behavioral paradigm is somewhat simple but well-designed and well-controlled. The neural analyses were sophisticated, clearly presented, and generally supported the authors' interpretations. The statistics are clearly reported and easy to interpret. In general, my view is that the authors achieved their aims. They found that different rules affected preparatory activity in premotor areas, but not sensory areas, consistent with dynamical systems perspectives in the field that hold that initial conditions are important for determining trial-based dynamics.

      Weaknesses:

      The manuscript was generally strong. The main weakness in my view was in interpreting the optogenetic results. While the simplicity of the task was helpful for analyzing the neural data, I think it limited the informativeness of the perturbation experiments. The behavioral read-out was low dimensional (a change in hit rate or false alarm rate), but it was unclear what perceptual or cognitive process was disrupted that led to changes in these read-outs. This is a challenge for the field, and not just this paper, but was the main weakness in my view. I have some minor technical comments in the recommendations for authors that might address other minor weaknesses.

      I think this is a well-performed, well-written, and interesting study that shows differences in rule representations in sensory and premotor areas and finds that rules reconfigure preparatory activity in the motor cortex to support flexible behavior.

      Reviewer #2 (Public Review):

      Summary:

      Chang et al. investigate neuronal activity firing patterns across various cortical regions in an interesting context-dependent tactile vs visual detection task, developed previously by the authors (Chevee et al., 2021; doi: 10.1016/j.neuron.2021.11.013). The authors report the important involvement of a medial frontal cortical region (MM, probably a similar location to wM2 as described in Esmaeili et al., 2021 & 2022; doi: 10.1016/j.neuron.2021.05.005; doi: 10.1371/journal.pbio.3001667) in mice for determining task rules.

      Strengths:

      The experiments appear to have been well carried out and the data well analysed. The manuscript clearly describes the motivation for the analyses and reaches clear and well-justified conclusions. I find the manuscript interesting and exciting!

      Weaknesses:

      I did not find any major weaknesses.

      Reviewer #3 (Public Review):

      This study examines context-dependent stimulus selection by recording neural activity from several sensory and motor cortical areas along a sensorimotor pathway, including S1, S2, MM, and ALM. Mice are trained to either withhold licking or perform directional licking in response to visual or tactile stimulus. Depending on the task rule, the mice have to respond to one stimulus modality while ignoring the other. Neural activity to the same tactile stimulus is modulated by task in all the areas recorded, with significant activity changes in a subset of neurons and population activity occupying distinct activity subspaces. Recordings further reveal a contextual signal in the pre-stimulus baseline activity that differentiates task context. This signal is correlated with subsequent task modulation of stimulus activity. Comparison across brain areas shows that this contextual signal is stronger in frontal cortical regions than in sensory regions. Analyses link this signal to behavior by showing that it tracks the behavioral performance switch during task rule transitions. Silencing activity in frontal cortical regions during the baseline period impairs behavioral performance.

      Overall, this is a superb study with solid results and thorough controls. The results are relevant for context-specific neural computation and provide a neural substrate that will surely inspire follow-up mechanistic investigations. We only have a couple of suggestions to help the authors further improve the paper.

      (1) We have a comment regarding the calculation of the choice CD in Fig S3. The text on page 7 concludes that "Choice coding dimensions change with task rule". However, the motor choice response is different across blocks, i.e. lick right vs. no lick for one task and lick left vs. no lick for the other task. Therefore, the differences in the choice CD may be simply due to the motor response being different across the tasks and not due to the task rule per se. The authors may consider adding this caveat in their interpretation. This should not affect their main conclusion.

      We thank the Reviewer for the suggestion. We have discussed this caveat and performed a new analysis to calculate the choice coding dimensions using right-lick and left-lick trials (Fig. S3h) on page 8. 

      “Choice coding dimensions were obtained from left-lick and no-lick trials in respond-to-touch blocks and right-lick and no-lick trials in respond-to-light blocks. Because the required lick directions differed between the block types, the difference in choice CDs across task rules (Fig. S4f) could have been affected by the different motor responses. To rule out this possibility, we did a new version of this analysis using right-lick and left-lick trials to calculate the choice coding dimensions for both task rules. We found that the orientation of the choice coding dimension in a respond-to-touch block was still not aligned well with that in a respond-to-light block (Fig. S4h;  magnitude of dot product between the respond-to-touch choice CD and the respond-to-light choice CD, mean ± 95% CI for true vs shuffled data: S1: 0.39 ± [0.23, 0.55] vs 0.2 ± [0.1, 0.31], 10 sessions; S2: 0.32 ± [0.18, 0.46] vs 0.2 ± [0.11, 0.3], 8 sessions; MM: 0.35 ± [0.21, 0.48] vs 0.18 ± [0.11, 0.26], 9 sessions; ALM: 0.28 ± [0.17, 0.39] vs 0.21 ± [0.12, 0.31], 13 sessions).”
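
The alignment measure quoted above (magnitude of the dot product between two choice CDs) can be illustrated with a small sketch. It assumes, as is common but not stated in this response, that a coding dimension is the unit-normalized difference between the mean population activity vectors of two trial types. The function names and the toy data are hypothetical, and the shuffle control is omitted.

```python
import math

def mean_vector(trials):
    """Average a list of population activity vectors (one vector per trial)."""
    n_trials = len(trials)
    n_units = len(trials[0])
    return [sum(trial[i] for trial in trials) / n_trials for i in range(n_units)]

def coding_dimension(trials_a, trials_b):
    """Assumed definition: unit vector along the difference of trial-type means."""
    diff = [a - b for a, b in zip(mean_vector(trials_a), mean_vector(trials_b))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("trial types have identical mean activity")
    return [d / norm for d in diff]

def alignment(cd_1, cd_2):
    """Magnitude of the dot product: 0 means orthogonal, 1 means collinear."""
    return abs(sum(x * y for x, y in zip(cd_1, cd_2)))

# Toy example with two "neurons": the two CDs point along different axes.
cd_touch = coding_dimension([[2.0, 0.0], [4.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
cd_light = coding_dimension([[0.0, 3.0]], [[0.0, 1.0]])
print(alignment(cd_touch, cd_light))
```

Taking the absolute value makes the measure insensitive to the arbitrary sign of each coding dimension, which is presumably why a magnitude is reported.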

      We have also included the caveats for using right-lick and left-lick trials to calculate choice coding dimensions on page 13.

      “However, we also calculated choice coding dimensions using only right- and left-lick trials. In S1, S2, MM and ALM, the choice CDs calculated this way were also not aligned well across task rules (Fig. S4h), consistent with the results calculated from lick and no-lick trials (Fig. S4f). Data were limited for this analysis, however, because mice rarely licked to the unrewarded water port (# of licks_{unrewarded port} / # of licks_{total}, respond-to-touch: 0.13, respond-to-light: 0.11). These trials usually came from rule transitions (Fig. 5a) and, in some cases, were potentially caused by exploratory behaviors. These factors could affect choice CDs.”

      (2) We have a couple of questions about the effect size on single neurons vs. population dynamics. From Fig 1, about 20% of neurons in frontal cortical regions show task rule modulation in their stimulus activity. This seems like a small effect in terms of population dynamics. There is somewhat of a disconnect from Figs 4 and S3 (for stimulus CD), which show remarkably low subspace overlap in population activity across tasks. Can the authors help bridge this disconnect? Is this because the neurons showing a difference in Fig 1 are disproportionally stimulus selective neurons?

      We thank the Reviewer for the insightful comment and agree that it is important to link the single-unit and population results. We have addressed these questions by (1) improving our analysis of task modulation of single neurons  (tHit-tCR selectivity) and (2) examining the relationship between tHit-tCR selective neurons and tHit-tCR subspace overlaps.  

      Previously, we averaged the AUC values of time bins within the stimulus window (0-150 ms, 10 ms bins). If the 95% CI on this averaged AUC value did not include 0.5, this unit was considered to show significant selectivity. This approach was highly conservative and may underestimate the percentage of units showing significant selectivity, particularly any units showing transient selectivity. In the revised manuscript, we now define a unit as showing significant tHit-tCR selectivity when three consecutive time bins (>30 ms, 10 ms bins) of AUC values were significant. Using this new criterion, the percentage of tHit-tCR selective neurons increased compared with the previous analysis. We have updated Figure 1h and the results on page 4:

      “We found that 18-33% of neurons in these cortical areas had area under the receiver-operating curve (AUC) values significantly different from 0.5, and therefore discriminated between tHit and tCR trials (Fig. 1h; S1: 28.8%, 177 neurons; S2: 17.9%, 162 neurons; MM: 32.9%, 140 neurons; ALM: 23.4%, 256 neurons; criterion to be considered significant: Bonferroni corrected 95% CI on AUC did not include 0.5 for at least 3 consecutive 10-ms time bins).”
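
The revised criterion (the CI on the AUC excludes 0.5 for at least three consecutive 10-ms bins) can be sketched as a simple run-length check. This is a minimal illustration under stated assumptions: the per-bin confidence bounds are taken as given, the function name is hypothetical, and bins above and below chance are counted in the same run, which the response does not specify.

```python
def is_selective(ci_low, ci_high, chance=0.5, min_run=3):
    """Return True if the CI excludes `chance` in at least `min_run` consecutive bins.

    ci_low and ci_high hold the lower and upper confidence bounds of the AUC,
    one value per time bin, in temporal order.
    """
    run = 0
    for low, high in zip(ci_low, ci_high):
        if low > chance or high < chance:
            run += 1  # this bin is significant; extend the current run
            if run >= min_run:
                return True
        else:
            run = 0  # run is broken by a non-significant bin
    return False

# Bins 1-3 are significant (lower bound above 0.5), so the unit qualifies.
transient = is_selective([0.45, 0.55, 0.56, 0.58, 0.48],
                         [0.60, 0.70, 0.70, 0.70, 0.60])
# Two separate runs of two significant bins never reach three in a row.
broken = is_selective([0.55, 0.55, 0.45, 0.55, 0.55],
                      [0.70, 0.70, 0.60, 0.70, 0.70])
print(transient, broken)
```

Unlike averaging the AUC over the whole stimulus window, a run-based criterion can detect a unit whose selectivity is brief, which matches the motivation given for the change.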

      Next, we have checked how tHit-tCR selective neurons were distributed across sessions. We found that the percentage of tHit-tCR selective neurons in each session varied (S1: 9-46%, S2: 0-36%, MM: 25-55%, ALM: 0-50%). We examined the relationship between the numbers of tHit-tCR selective neurons and tHit-tCR subspace overlaps. Sessions with more neurons showing task rule modulation tended to show lower subspace overlap, but this correlation was modest and only marginally significant (r = -0.32, p = 0.08, Pearson correlation, n = 31 sessions). While we report the percentage of neurons showing significant selectivity as a simple way to summarize single-neuron effects, this does neglect the magnitude of task rule modulation of individual neurons, which may also be relevant.

      In summary, the apparent disconnect between the effect sizes of task modulation of single neurons and of population dynamics could be explained by three factors: (1) the percentages of tHit-tCR selective neurons were underestimated in our old analysis, (2) tHit-tCR selective neurons were not uniformly distributed among sessions, and (3) the percentages of tHit-tCR selective neurons were only weakly correlated with tHit-tCR subspace overlaps.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      For the analysis of choice coding dimensions, it seems that the authors are somewhat data limited in that they cannot compare lick-right/lick-left within a block. So instead, they compare lick/no lick trials. But given that the mice are unable to initiate trials, the interpretation of the no lick trials is a bit complicated. It is not clear that the no lick trials reflect a perceptual judgment about the stimulus (i.e., a choice), or that the mice are just zoning out and not paying attention. If it's the latter case, what the authors are calling choice coding is more of an attentional or task engagement signal, which may still be interesting, but has a somewhat different interpretation than a choice coding dimension. It might be worth clarifying this point somewhere, or if I'm totally off-base, then being more clear about why lick/no lick is more consistent with choice than task engagement.

      We thank the Reviewer for raising this point. We have added a new paragraph on page 13 to clarify why we used lick/no-lick trials to calculate choice coding dimensions, and we now discuss the caveat regarding task engagement.  

      “No-lick trials included misses, which could be caused by mice not being engaged in the task. While the majority of no-lick trials were correct rejections (respond-to-touch: 75%; respond-to-light: 76%), we treated no-licks as one of the available choices in our task and included them to calculate choice coding dimensions (Fig. S4c,d,f). To ensure stable and balanced task engagement across task rules, we removed the last 20 trials of each session and used stimulus parameters that achieved similar behavioral performance for both task rules (Fig. 1d; ~75% correct for both rules).”

      In addition, to address a point made by Reviewer 3 as well as this point, we performed a new analysis to calculate choice coding dimensions using right-lick vs left-lick trials. We report this new analysis on page 8:

      “Choice coding dimensions were obtained from left-lick and no-lick trials in respond-to-touch blocks and right-lick and no-lick trials in respond-to-light blocks. Because the required lick directions differed between the block types, the difference in choice CDs across task rules (Fig. S4f) could have been affected by the different motor responses. To rule out this possibility, we did a new version of this analysis using right-lick and left-lick trials to calculate the choice coding dimensions for both task rules. We found that the orientation of the choice coding dimension in a respond-to-touch block was still not aligned well with that in a respond-to-light block (Fig. S4h;  magnitude of dot product between the respond-to-touch choice CD and the respond-to-light choice CD, mean ± 95% CI for true vs shuffled data: S1: 0.39 ± [0.23, 0.55] vs 0.2 ± [0.1, 0.31], 10 sessions; S2: 0.32 ± [0.18, 0.46] vs 0.2 ± [0.11, 0.3], 8 sessions; MM: 0.35 ± [0.21, 0.48] vs 0.18 ± [0.11, 0.26], 9 sessions; ALM: 0.28 ± [0.17, 0.39] vs 0.21 ± [0.12, 0.31], 13 sessions).” 

      We added discussion of the limitations of this new analysis on page 13:

      “However, we also calculated choice coding dimensions using only right- and left-lick trials. In S1, S2, MM and ALM, the choice CDs calculated this way were also not aligned well across task rules (Fig. S4h), consistent with the results calculated from lick and no-lick trials (Fig. S4f). Data were limited for this analysis, however, because mice rarely licked to the unrewarded water port (# of licks_{unrewarded port} / # of licks_{total}, respond-to-touch: 0.13, respond-to-light: 0.11). These trials usually came from rule transitions (Fig. 5a) and, in some cases, were potentially caused by exploratory behaviors. These factors could affect choice CDs.”

      The authors find that the stimulus coding direction in most areas (S1, S2, and MM) was significantly aligned between the block types. How do the authors interpret that finding? That there is no major change in stimulus coding dimension, despite the change in subspace? I think I'm missing the big picture interpretation of this result.

      That there is no significant change in stimulus coding dimensions but a change in subspace suggests that the subspace change largely reflects a change in the choice coding dimensions.

      As I mentioned in the public review, I thought there was a weakness with interpretation of the optogenetic experiments, which the authors generally interpret as reflecting rule sensitivity. However, given that they are inhibiting premotor areas including ALM, one might imagine that there might also be an effect on lick production or kinematics. To rule this out, the authors compare the change in lick rate relative to licks during the ITI. What is the ITI lick rate? I assume pretty low, once the animal is well-trained, in which case there may be a floor effect that could obscure meaningful effects on lick production. In addition, based on the reported CI on delta p(lick), it looks like MM and ALM did suppress lick rate. I think in the future, a task with richer behavioral read-outs (or including other measurements of behavior like video), or perhaps something like a psychological process model with parameters that reflect different perceptual or cognitive processes could help resolve the effects of perturbations more precisely.

      Eighteen and ten percent of trials had at least one lick in the ITI in respond-to-touch and respond-to-light blocks, respectively. These relatively low rates of ITI licking could indeed make an effect of optogenetics on lick production harder to observe. We agree that future work would benefit from more complex tasks and measurements, and have added the following to make this point (page 14):

      “To more precisely dissect the effects of perturbations on different cognitive processes in rule-dependent sensory detection, more complex behavioral tasks and richer behavioral measurements are needed in the future.”

      Reviewer #2 (Recommendations For The Authors):

      I have the following minor suggestions that the authors might consider in revising this already excellent manuscript:

      (1) In addition to showing normalised z-score firing rates (e.g. Fig 1g), I think it is important to show the grand-average mean firing rates in Hz.

      We thank the Reviewer for the suggestion and have added the grand-average mean firing rates as a new supplementary figure (Fig. S2a). To provide more details about the firing rates of individual neurons, we have also added to this new figure the distribution of peak responses during the tactile stimulus period (Fig. S2b).

      (2) I think the authors could report more quantitative data in the main text. As a very basic example, I could not easily find how many neurons, sessions, and mice were used in various analyses.

      We have added relevant numbers at various points throughout the Results, including within the following examples:

      Page 3: “To examine how the task rules influenced the sensorimotor transformation occurring in the tactile processing stream, we performed single-unit recordings from sensory and motor cortical areas including S1, S2, MM and ALM (Fig. 1e-g, Fig. S1a-h, and Fig. S2a; S1: 6 mice, 10 sessions, 177 neurons, S2: 5 mice, 8 sessions, 162 neurons, MM: 7 mice, 9 sessions, 140 neurons, ALM: 8 mice, 13 sessions, 256 neurons).”

      Page 5: “As expected, single-unit activity before stimulus onset did not discriminate between tactile and visual trials (Fig. 2d; S1: 0%, 177 neurons; S2: 0%, 162 neurons; MM: 0%, 140 neurons; ALM: 0.8%, 256 neurons). After stimulus onset, more than 35% of neurons in the sensory cortical areas and approximately 15% of neurons in the motor cortical areas showed significant stimulus discriminability (Fig. 2e; S1: 37.3%, 177 neurons; S2: 35.2%, 162 neurons; MM: 15%, 140 neurons; ALM: 14.1%, 256 neurons).”

      Page 6: “Support vector machine (SVM) and Random Forest classifiers showed similar decoding abilities (Fig. S3a,b; medians of classification accuracy [true vs shuffled]; SVM: S1 [0.6 vs 0.53], 10 sessions, S2 [0.61 vs 0.51], 8 sessions, MM [0.71 vs 0.51], 9 sessions, ALM [0.65 vs 0.52], 13 sessions; Random Forests: S1 [0.59 vs 0.52], 10 sessions, S2 [0.6 vs 0.52], 8 sessions, MM [0.65 vs 0.49], 9 sessions, ALM [0.7 vs 0.5], 13 sessions).”

      Page 6: “To assess this for the four cortical areas, we quantified how the tHit and tCR trajectories diverged from each other by calculating the Euclidean distance between matching time points for all possible pairs of tHit and tCR trajectories for a given session and then averaging these for the session (Fig. 4a,b; S1: 10 sessions, S2: 8 sessions, MM: 9 sessions, ALM: 13 sessions, individual sessions in gray and averages across sessions in black; window of analysis: -100 to 150 ms relative to stimulus onset; 10 ms bins; using the top 3 PCs; Methods).” 

      Page 8: “In contrast, we found that S1, S2 and MM had stimulus CDs that were significantly aligned between the two block types (Fig. S4e; magnitude of dot product between the respond-to-touch stimulus CDs and the respond-to-light stimulus CDs, mean ± 95% CI for true vs shuffled data: S1: 0.5 ± [0.34, 0.66] vs 0.21 ± [0.12, 0.34], 10 sessions; S2: 0.62 ± [0.43, 0.78] vs 0.22 ± [0.13, 0.31], 8 sessions; MM: 0.48 ± [0.38, 0.59] vs 0.24 ± [0.16, 0.33], 9 sessions; ALM: 0.33 ± [0.2, 0.47] vs 0.21 ± [0.13, 0.31], 13 sessions).”

      Page 9: “For respond-to-touch to respond-to-light block transitions, the fractions of trials classified as respond-to-touch for MM and ALM decreased progressively over the course of the transition (Fig. 5d; rank correlation of the fractions calculated for each of the separate periods spanning the transition, Kendall’s tau, mean ± 95% CI: MM: -0.39 ± [-0.67, -0.11], 9 sessions, ALM: -0.29 ± [-0.54, -0.04], 13 sessions; criterion to be considered significant: 95% CI on Kendall’s tau did not include 0).”

      Page 11: “Lick probability was unaffected during S1, S2, MM and ALM experiments for both tasks, indicating that the behavioral effects were not due to an inability to lick (Fig. 6i, j; 95% CI on Δ lick probability for cross-modal selection task: S1/S2 [-0.18, 0.24], 4 mice, 10 sessions; MM [-0.31, 0.03], 4 mice, 11 sessions; ALM [-0.24, 0.16], 4 mice, 10 sessions; Δ lick probability for simple tactile detection task: S1/S2 [-0.13, 0.31], 3 mice, 3 sessions; MM [-0.06, 0.45], 3 mice, 5 sessions; ALM [-0.18, 0.34], 3 mice, 4 sessions).”
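
The rank correlation used for the block-transition analysis quoted above (Kendall's tau between period order and the fraction of trials classified as respond-to-touch) can be sketched as follows. This is the textbook tau-a statistic, in which tied pairs count as neither concordant nor discordant; whether the authors used this variant is an assumption, and the example fractions are invented for illustration only.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    concordant = 0
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
            # sign == 0 is a tie and contributes to neither count
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical session: fraction classified as respond-to-touch in four
# successive periods spanning a respond-to-touch -> respond-to-light switch.
periods = [1, 2, 3, 4]
fractions = [0.9, 0.7, 0.8, 0.4]
tau = kendall_tau(periods, fractions)
print(round(tau, 3))
```

A negative tau indicates that later periods tend to have fewer trials classified as respond-to-touch, which is the direction reported for MM and ALM.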

      (3) Please include a clearer description of trial timing. Perhaps a schematic timeline of when stimuli are delivered and when licking would be rewarded. I may have missed it, but I did not find explicit mention of the timing of the reward window or if there was any delay period.

      We have added the following (page 3): 

      “For each trial, the stimulus duration was 0.15 s and an answer period extended from 0.1 to 2 s from stimulus onset.”

      (4) Please include a clear description of statistical tests in each figure legend as needed (for example please check Fig 4e legend).

      We have added details about statistical tests in the figure legends:

      Fig. 2f: “Relationship between block-type discriminability before stimulus onset and tHit-tCR discriminability after stimulus onset for units showing significant block-type discriminability prior to the stimulus. Pearson correlation: S1: r = 0.69, p = 0.056, 8 neurons; S2: r = 0.91, p = 0.093, 4 neurons; MM: r = 0.93, p < 0.001, 30 neurons; ALM: r = 0.83, p < 0.001, 26 neurons.” 

      Fig. 4e: “Subspace overlap for control tHit (gray) and tCR (purple) trials in the somatosensory and motor cortical areas. Each circle is a subspace overlap of a session. Paired t-test, tCR – control tHit: S1: -0.23, 8 sessions, p = 0.0016; S2: -0.23, 7 sessions, p = 0.0086; MM: -0.36, 5 sessions, p < 0.001; ALM: -0.35, 11 sessions, p < 0.001; significance: ** for p<0.01, *** for p<0.001.”

      Fig. 5d,e: “Fraction of trials classified as coming from a respond-to-touch block based on the pre-stimulus population state, for trials occurring in different periods (see c) relative to respond-to-touch → respond-to-light transitions. For MM (top row) and ALM (bottom row), progressively fewer trials were classified as coming from the respond-to-touch block as analysis windows shifted later relative to the rule transition. Kendall’s tau (rank correlation): MM: -0.39, 9 sessions; ALM: -0.29, 13 sessions. Left panels: individual sessions, right panels: mean ± 95% CI. Dashed lines are chance levels (0.5). e, Same as d but for respond-to-light → respond-to-touch transitions. Kendall’s tau: MM: 0.37, 9 sessions; ALM: 0.27, 13 sessions.”

      Fig. 6: “Error bars show bootstrap 95% CI. Criterion to be considered significant: 95% CI did not include 0.”

      (5) P. 3 - "To examine how the task rules influenced the sensorimotor transformation occurring in the tactile processing stream, we performed single-unit recordings from sensory and motor cortical areas including S1, S2, MM, and ALM using 64-channel silicon probes (Fig. 1e-g and Fig. S1a-h)." Please specify if these areas were recorded simultaneously or not.

      We have added “We recorded from one of these cortical areas per session, using 64-channel silicon probes.”  on page 3.  

      (6) Figure 4b - Please describe what gray and black lines show.

      The gray traces are the distance between tHit and tCR trajectories in individual sessions and the black traces are the averages across sessions in different cortical areas. We have added this information on page 6 and in the Figure 4b legend. 

      Page 6: “To assess this for the four cortical areas, we quantified how the tHit and tCR trajectories diverged from each other by calculating the Euclidean distance between matching time points for all possible pairs of tHit and tCR trajectories for a given session and then averaging these for the session (Fig. 4a,b; S1: 10 sessions, S2: 8 sessions, MM: 9 sessions, ALM: 13 sessions, individual sessions in gray and averages across sessions in black; window of analysis: -100 to 150 ms relative to stimulus onset; 10 ms bins; using the top 3 PCs; Methods).”

      Fig. 4b: “Distance between tHit and tCR trajectories in S1, S2, MM and ALM. Gray traces show the time varying tHit-tCR distance in individual sessions and black traces are session-averaged tHit-tCR distance (S1:10 sessions; S2: 8 sessions; MM: 9 sessions; ALM: 13 sessions).”
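      The distance measure described above can be sketched as follows. This is an illustration under stated assumptions and not the authors' code: each trajectory is assumed to be a list of time bins, each bin holding the coordinates of the population state on the top 3 PCs, and the function name and toy values are hypothetical.

```python
import math
from itertools import product


def mean_pairwise_distance(thit_trajs, tcr_trajs):
    """Average tHit-tCR Euclidean distance at each time bin for one session.

    Each trajectory is a sequence of time bins; each bin is a sequence of
    PC coordinates. All tHit x tCR pairs are compared at matching bins.
    """
    n_bins = len(thit_trajs[0])
    pairs = list(product(thit_trajs, tcr_trajs))

    def euclidean(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    return [
        sum(euclidean(a[t], b[t]) for a, b in pairs) / len(pairs)
        for t in range(n_bins)
    ]


# Toy example: one tHit and two tCR trajectories, two time bins, 3 PCs.
thit = [[(0, 0, 0), (1, 0, 0)]]
tcr = [[(0, 0, 0), (1, 3, 4)], [(0, 0, 0), (1, 0, 0)]]
print(mean_pairwise_distance(thit, tcr))
```

      Averaging the per-session curves returned by such a function across sessions would correspond to the black traces in Fig. 4b, with the individual session curves as the gray traces.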

      (7) In addition to the analyses shown in Figure 5a, when investigating the timing of the rule switch, I think the authors should plot the left and right lick probabilities aligned to the timing of the rule switch time on a trial-by-trial basis averaged across mice.

      We thank the Reviewer for suggesting this addition. We have added a new figure panel to show the probabilities of right- and left-licks during rule transitions (Fig. 5a).

      Page 8: “The probabilities of right-licks and left-licks showed that the mice switched their motor responses during block transitions depending on task rules (Fig. 5a, mean ± 95% CI across 12 mice).” 

      (8) P. 12 - "Moreover, in a separate study using the same task (Finkel et al., unpublished), high-speed video analysis demonstrated no significant differences in whisker motion between respond-to-touch and respond-to-light blocks in most (12 of 14) behavioral sessions.". Such behavioral data is important and ideally would be included in the current analysis. Was high-speed videography carried out during electrophysiology in the current study?

      Finkel et al. has been accepted in principle for publication and will be available online shortly. Unfortunately we have not yet carried out simultaneous high-speed whisker video and electrophysiology in our cross-modal sensory selection task.

      Reviewer #3 (Recommendations For The Authors):

      (1) Minor point. For subspace overlap calculation of pre-stimulus activity in Fig 4e (light purple datapoints), please clarify whether the PCs for that condition were constructed in matched time windows. If the PCs are calculated from the stimulus period 0-150ms, the poor alignment could be due to mismatched time windows.

      We thank the Reviewer for the comment and clarify our analysis here. We previously used time-matched windows to calculate subspace overlaps. However, the pre-stimulus activity was much weaker than the activity during the stimulus period, so the subspaces of reference tHit were subject to noise and we were not able to obtain reliable PCs. This caused the subspace overlap values between the reference tHit and control tHit to be low and variable (mean ± SD, S1: 0.46 ± 0.26, n = 8 sessions, S2: 0.46 ± 0.18, n = 7 sessions, MM: 0.44 ± 0.16, n = 5 sessions, ALM: 0.38 ± 0.22, n = 11 sessions). Therefore, we used the tHit activity during the stimulus window to obtain PCs and projected pre-stimulus and stimulus activity in tCR trials onto these PCs. We have now added a more detailed description of this analysis in the Methods (page 32).

      “To calculate the separation of subspaces prior to stimulus delivery, pre-stimulus activity in tCR trials (-100 to 0 ms from stimulus onset) was projected to the PC space of the tHit reference group and the subspace overlap was calculated. In this analysis, we used tHit activity during stimulus delivery (0 to 150 ms from stimulus onset) to obtain reliable PCs.”

      We acknowledge this time alignment issue and have now removed the reported subspace overlap between tHit and tCR during the pre-stimulus period from Figure 4e (light purple). However, we think the correlation between pre- and post- stimulus-onset subspace overlaps should remain similar regardless of the time windows that we used for calculating the PCs. For the PCs calculated from the pre-stimulus period (-100 to 0 ms), the correlation coefficient was 0.55 (Pearson correlation, p <0.01, n = 31 sessions). For the PCs calculated from the stimulus period (0-150 ms), the correlation coefficient was 0.68 (Figure 4f, Pearson correlation, p <0.001, n = 31 sessions). Therefore, we keep Figure 4f.  

      (2) Minor point. To help the readers follow the logic of the experiments, please explain why PPC and AMM were added in the later optogenetic experiment since these are not part of the electrophysiology experiment.

      We have added the following rationale on page 9.

      “We recorded from AMM in our cross-modal sensory selection task and observed visually-evoked activity (Fig. S1i-k), suggesting that AMM may play an important role in rule-dependent visual processing. PPC contributes to multisensory processing51–53 and sensory-motor integration50,54–58.  Therefore, we wanted to test the roles of these areas in our cross-modal sensory selection task.”

      (3) Minor point. We are somewhat confused about the timing of some of the example neurons shown in figure S1. For example, many neurons show visually evoked signals only after stimulus offset, unlike tactile evoked signals (e.g. Fig S1b and f). In addition, the reaction time for visual stimulus is systematically slower than tactile stimuli for many example neurons (e.g. Fig S1b) but somehow not other neurons (e.g. Fig S1g). Are these observations correct?

      These observations are all correct. We have a manuscript from a separate study using this same behavioral task (Finkel et al., accepted in principle) that examines and compares (1) the onsets of tactile- and visually-evoked activity and (2) the reaction times to tactile and visual stimuli. The reaction times to tactile stimuli were slightly but significantly shorter than the reaction times to visual stimuli (tactile vs visual, 397 ± 145 vs 521 ± 163 ms, median ± interquartile range [IQR], Tukey HSD test, p = 0.001, n = 155 sessions). We examined how well activity of individual neurons in S1 could be used to discriminate the presence of the stimulus or the response of the mouse. For discriminability for the presence of the stimulus, S1 neurons could signal the presence of the tactile stimulus but not the visual stimulus. For discriminability for the response of the mouse, the onsets for significant discriminability occurred earlier for tactile compared with visual trials (two-sided Kolmogorov-Smirnov test, p = 1 × 10⁻¹⁶, n = 865 neurons with DP onset in tactile trials, n = 719 neurons with DP onset in visual trials).

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study combines a range of advanced ultrastructural imaging approaches to define the unusual endosomal system of African trypanosomes. Compelling images show that instead of a distinct set of compartments, the endosome of these protists comprises a continuous system of membranes with functionally distinct subdomains as defined by canonical markers of early, late and recycling endosomes. The findings suggest that the endocytic system of bloodstream stages has evolved to facilitate the extraordinarily high rates of membrane turnover needed to remove immune complexes and survive in the blood, which is of interest to anyone studying infectious diseases.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Bloodstream stages of the parasitic protist, Trypanosoma brucei, exhibit very high rates of constitutive endocytosis, which is needed to recycle the surface coat of Variant Surface Glycoproteins (VSGs) and remove surface immune complexes. While many studies have shown that the endo-lysosomal systems of T. brucei BF stages contain canonical domains, as defined by classical Rab markers, it has remained unclear whether these protists have evolved additional adaptations/mechanisms for sustaining these very high rates of membrane transport and protein sorting. The authors have addressed this question by reconstructing the 3D ultrastructure and functional domains of the T. brucei BF endosome membrane system using advanced electron tomography and super-resolution microscopy approaches. Their studies reveal that, unusually, the BF endosome network comprises a continuous system of cisternae and tubules that contain overlapping functional subdomains. It is proposed that a continuous membrane system allows higher rates of protein cargo segregation, sorting and recycling than can otherwise occur when transport between compartments is mediated by membrane vesicles or other fusion events.

      Strengths:

      The study is a technical tour-de-force using a combination of electron tomography, super-resolution/expansion microscopy, immune-EM of cryo-sections to define the 3D structures and connectivity of different endocytic compartments. The images are very clear and generally support the central conclusion that functionally distinct endocytic domains occur within a dynamic and continuous endosome network in BF stages.

      Weaknesses:

      The authors suggest that this dynamic endocytic network may also fulfil many of the functions of the Golgi TGN and that the latter may be absent in these stages. Although plausible, this comment needs further experimental support. For example, have the authors attempted to localize canonical markers of the TGN (e.g. GRIP proteins) in T. brucei BF and/or shown that exocytic carriers bud directly from the endosomes?

      We agree with the criticism and have shortened the discussion accordingly and clearly marked it as speculation. However, we do not want to completely abandon our hypothesis.

      The paragraph now reads:

      Lines 740 – 751:

      “Interestingly, we did not find any structural evidence of vesicular retrograde transport to the Golgi. Instead, the endosomal ‘highways’ extended throughout the posterior volume of the trypanosomes approaching the trans-Golgi interface. It is highly plausible that this region represents the convergence point where endocytic and biosynthetic membrane trafficking pathways merge. A comparable merging of endocytic and biosynthetic functions has been described for the TGN in plants. Different marker proteins for early and recycling endosomes were shown to be associated and/ or partially colocalized with the TGN suggesting its function in both secretory and endocytic pathways (reviewed in Minamino and Ueda, 2019). As we could not find structural evidence for the existence of a TGN we tentatively propose that trypanosomes may have shifted the central orchestrating function of the TGN as a sorting hub at the crossroads of biosynthetic and recycling pathways to the endosome. Although this is a speculative scenario, it is experimentally testable.”

      Furthermore, we removed the lines 51 - 52, which included the suggestion of the TGN as a master regulator, from the abstract.

      Reviewer #2 (Public Review):

      The authors suggest that the African trypanosome endomembrane system has unusual organisation, in that the entire system is a single reticulated structure. It is not clear if this is thought to extend to the lysosome or MVB. There is also a suggestion that this unusual morphology serves as a trans-(post)Golgi network rather than the more canonical arrangement.

      The work is based around very high-quality light and electron microscopy, as well as utilising several marker proteins, Rab5A, 11 and 7. These are deemed as markers for early endosomes, recycling endosomes and late or pre-lysosomes. The images are mostly of high quality but some inconsistencies in the interpretation, appearance of structures and some rather sweeping assumptions make this less easy to accept. Two perhaps major issues are claims to label the entire endosomal apparatus with a single marker protein, which is hard to accept as certainly this reviewer does not really even know where the limits to the endosomal network reside and where these interface with other structures. There are several additional compartments that have been defined by Rab proteins as well, and which are not even mentioned. Overall I am unconvinced that the authors have demonstrated the main things they claim.

      The endomembrane system in bloodstream form T. brucei is clearly delimited. Compared to mammalian cells it is tidy and confined to the posterior part of the spindle-shaped cell. The endoplasmic reticulum is linked to one side of the longitudinal cell axis, marked by the attached flagellum, while the mitochondrion locates to the opposite side. Glycosomes are easily identifiable as spheres, as are acidocalcisomes, which are smaller than glycosomes and – in electron micrographs – are characterized by high electron density. All these organelles extend beyond the nucleus, which is not the case for the endosomal compartment, the lysosome and the Golgi. The vesicles found in the posterior half of the trypanosome cell are quantitatively identifiable as COP1, CCVI or CCVII vesicles, or exocytic carriers. The lysosome has a higher degree of morphological plasticity, but this is not the topic of the present work. Thus, the endomembrane system in T. brucei is comparatively well structured and delimited, which is why we have chosen trypanosomes as a cell biological model.

      We have published EP1::GFP as marker for the endosome system and flagellar pocket back in 2004. We have defined the fluid phase volume of the trypanosome endosome in papers published between 2002 and 2007. This work was not intended to represent the entirety of RAB proteins. We were only interested in 3 canonical markers for endosome subtypes. We do not claim anything that is not experimentally tested, we have clearly labelled our hypotheses as such, and we do not make sweeping assumptions.

      The approaches taken are state-of-the-art but not novel, and because of the difficulty in fully addressing the central tenet, I am not sure how much of an impact this will have beyond the trypanosome field. For certain this is limited to workers in the direct area and is not a generalisable finding.

      To the best of our knowledge, there is no published research that has employed 3D Tokuyasu or expansion microscopy (ExM) to label endosomes. The key takeaway from our study, which is the concept that "endosomes are continuous in trypanosomes" certainly is novel. We are not aware of any other report that has demonstrated this aspect.

      The doubts formulated by the reviewer regarding the impact of our work beyond the field of trypanosomes are not timely. Indeed, our results, and those of others, show that the conclusions drawn from work with just a few model organisms are not generalisable. We are finally on the verge of a new cell biology that considers the plethora of evolutionary solutions beyond opisthokonts. We believe that this message should be widely acknowledged and considered. And we are certainly not the only ones who are convinced that the term "general relevance" is unscientific and should no longer be used in biology.

      Reviewer #3 (Public Review):

      Summary:

      As clearly highlighted by the authors, a key plank in the ability of trypanosomes to evade the mammalian host’s immune system is its high rate of endocytosis. This rapid turnover of its surface enables the trypanosome to ‘clean’ its surface removing antibodies and other immune effectors that are subsequently degraded. The high rate of endocytosis is likely reflected in the organisation and layout of the endosomal system in these parasites. Here, Link et al., sought to address this question using a range of light and three-dimensional electron microscopy approaches to define the endosomal organisation in this parasite.

      Before this study, the vast majority of our information about the make-up of the trypanosome endosomal system was from thin-section electron microscopy and immunofluorescence studies, which did not provide the necessary resolution and 3D information to address this issue. Therefore, it was not known how the different structures observed by EM were related. Link et al., have taken advantage of the advances in technology and used an impressive combination of approaches at the LM and EM level to study the endosomal system in these parasites. This innovative combination has now shown the interconnected-ness of this network and demonstrated that there are no ‘classical’ compartments within the endosomal system, with instead different regions of the network enriched in different protein markers (Rab5a, Rab7, Rab11).

      Strengths:

      This is a generally well-written and clear manuscript, with the data well-presented supporting the majority of the conclusions of the authors. The authors use an impressive range of approaches to address the organisation of the endosomal system and the development of these methods for use in trypanosomes will be of use to the wider parasitology community.

      I appreciate their inclusion of how they used a range of different light microscopy approaches even though for instance the dSTORM approach did not turn out to be as effective as hoped. The authors have clearly demonstrated that trypanosomes have a large interconnected endosomal network, without defined compartments and instead show enrichment for specific Rabs within this network.

      Weaknesses:

      My concerns are:

      i) There is no evidence for functional compartmentalisation. The classical markers of different endosomal compartments do not fully overlap but there is no evidence to show a region enriched in one or other of these proteins has that specific function. The authors should temper their conclusions about this point.

      The reviewer is right in stating that Rab presence does not necessarily mean Rab function. However, this assumption is as old as the Rab literature. That is why we have focused on the 3 most prominent endosomal marker proteins. We report that for endosome function you do not necessarily need separate membrane compartments. This is backed by our experiments.

      ii) The quality of the electron microscopy work is very high but there is a general lack of numbers. For example, how many tomograms were examined? How often were fenestrated sheets seen? Can the authors provide more information about how frequent these observations were?

      The fenestrated sheets can be seen in the majority of the 37 tomograms recorded of the posterior volume of the parasites. Furthermore, we have randomly generated several hundred tiled (= very large) electron micrographs of bloodstream form trypanosomes for unbiased analyses of endomembranes. In these 2D-datasets the “footprint” of the fenestrated flat and circular cisternae is frequently detectable in the posterior cell area.

      We now have included the corresponding numbers in all EM figure legends.

      iii) The EM work always focussed on cells which had been processed before fixing. Now, I understand this was important to enable tracers to be used. However, given the dynamic nature of the system these processing steps and feeding experiments may have affected the endosomal organisation. Given their knowledge of the system now, the authors should fix some cells directly in culture to observe whether the organisation of the endosome aligns with their conclusions here.

      This is a valid criticism; however, it is the cell culture that provides an artificial environment. As for a possible effect of cell harvesting by centrifugation on the integrity and functionality of the endosome system, we consider this very unlikely for one simple reason. The mechanical forces acting in and on the parasites as they circulate in the extremely crowded and confined environment of the mammalian bloodstream are obviously much higher than the centrifugal forces involved in cell preparation. This becomes particularly clear when one considers that the mass of the particle to be centrifuged determines the actual force exerted by the g-forces. Nevertheless, the proposed experiment is a good control, although much more complex than proposed, since tomography is a challenging technique. We have performed the suggested experiment and acquired tomograms of unprocessed cells. The corresponding data is now included as supplementary movie 2, 3 and 4. We refer to it in lines 202 – 206: To investigate potential impacts of processing steps (cargo uptake, centrifugation, washing) on endosomal organization, we directly fixed cells in the cell culture flask, embedded them in Epon, and conducted tomography. The resulting tomograms revealed endosomal organization consistent with that observed in cells fixed after processing (see Supplementary movie 2, 3, and 4).

      We furthermore thank the reviewer for the experiment suggestion in the acknowledgments.

      iv) The discussion needs to be revamped. At the moment it is just another run through of the results and does not take an overview of the results presenting an integrated view. Moreover, it contains reference to data that was not presented in the results.

      We have improved the discussion accordingly.

      Recommendations for the authors:

      The reviewers concurred about the high calibre of the work and the importance of the findings.

      They raised some issues and made some suggestions to improve the paper without additional experiments - key issues include

      (1) Better referencing of the trypanosome endocytosis/ lysosomal trafficking literature.

      The literature, especially the experimental and quantitative work, is very limited. We now provide a more complete set of references. However, we would like to mention that we had cited a recent review that critically references the trypanosome literature with emphasis on the extensive work done with mammalian cells and yeast.

      (2) Moving the dSTORM data that detracts from otherwise strong data to a supplementary figure.

      We have done this.

      (3) Removal of the conclusion that the continuous endosome fulfils the functions of TGN, without further evidence.

      As stated above, this was not a conclusion in our paper, but rather a speculation, which we have now more clearly marked as such. Lines 740 to 751 now read:

      “Interestingly, we did not find any structural evidence of vesicular retrograde transport to the Golgi. Instead, the endosomal ‘highways’ extended throughout the posterior volume of the trypanosomes approaching the trans-Golgi interface. It is highly plausible that this region represents the convergence point where endocytic and biosynthetic membrane trafficking pathways merge. A comparable merging of endocytic and biosynthetic functions was already described for the TGN in plants. Different marker proteins for early and recycling endosomes were shown to be associated and/ or partially colocalized with the TGN suggesting its function in both secretory and endocytic pathways (reviewed in Minamino and Ueda, 2019). As we could not find structural evidence for the existence of a TGN we tentatively propose that trypanosomes may have shifted the central orchestrating function of the TGN as a sorting hub at the crossroads of biosynthetic and recycling pathways to the endosome. Although this is a speculative scenario, it is experimentally testable.”

      (4) Broader discussion linking their findings to other examples of organelle maturation in eukaryotes (e.g cisternal maturation of the Golgi)

      We have improved the discussion accordingly.

      Reviewer #1 (Recommendations For The Authors):

      What are the multi-vesicular vesicles that surround the marked endosomal compartments in Fig 1? Do they become labelled with fluid phase markers with longer incubations (e.g late endosome/ lysosomal)?

      The function of MVBs in trypanosomes is still far from being clear. They are filled with fluid phase cargo, especially ferritin, but are devoid of VSG. Hence it is likely that MVBs are part of the lysosomal compartment. In fact, this part of the endomembrane system is highly dynamic. MVBs can be physically connected to the lysosome or can form elongated structures. The surprising dynamics of the trypanosome lysosome will be published elsewhere.

      Figure 2. The compartments labelled with EP1::Halo are very poorly defined due to the low levels of expression of the reporter protein and/or sensitivity of detection of the Halo tag. Based on these images, it would be hard to conclude whether the endosome network is continuous or not. In this respect, it is unclear why the authors didn't use EP1-GFP for these analyses? Given the other data that provides more compelling evidence for a single continuous compartment, I would suggest removing Fig 2A.

      We have used EP1::GFP to label the entire endosome system (Engstler and Boshart, 2004). Unfortunately, GFP is not suited for dSTORM imaging. By creating the EP1::Halo cell line, we were able to utilize the most prominent dSTORM fluorescent dye, Alexa 647. This was not primarily done to generate super resolution images, but rather to measure the dynamics of the GPI-anchored, luminal protein EP with single molecule precision. The results from this study will be published separately. But we agree with the reviewer and have relocated the dSTORM data to the supplementary material.

      The observation that Rab5a/7 can be detected in the lumen of lysosome is interesting. Mechanistically, this presumably occurs by invagination of the limiting membrane of the lysosome. Is there any evidence that similar invagination of cytoplasmic markers occurs throughout or in subdomains of the endocytic network (possibly indicative of a 'late endosome' domain)?

      So far, we have not observed this. The structure of the lysosome and the membrane influx from the endosome are currently being investigated.

      The authors note that continuity of functionally distinct membrane compartments in the secretory/endocytic pathways has been reported in other protists (e.g T. cruzi). A particular example that could be noted is the endo-lysosomal system of Dictyostelium discoideum which mediates the continuous degradation and eventual expulsion of undigested material.

      We tried to include this in the discussion but ultimately decided against it because the Dictyostelium system cannot be easily compared to the trypanosome endosome.

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Not sure that 'common' is the correct term here. Frequent, near-universal..... it would be true that endocytosis is common across most eukaryotes.

      We have changed the sentence to “common process observed in most eukaryotes” (line 33).

      Immune evasion - the parasite does not escape the immune system, but does successfully avoid its impact, at least at the population level.

      We have replaced the word “escape” with “evasion” (line 35).

      The third sentence needs to follow on correctly from the second. Also, more than Igs are internalised and potentially part of immune evasion, such as C3, Factor H, ApoL1 etcetera.

      We believe that there may be a misunderstanding here. The process of endocytic uptake and lysosomal degradation has so far only been demonstrated in the context of VSGbound antibodies, which is why we only refer to this. Of course, the immune system comprises a wide range of proteins and effector molecules, all of which could be involved in immune evasion.

      I do not follow the logic that the high flux through the endocytic system in trypanosomes precludes distinct compartmentalisation - one could imagine a system where a lot of steps become optimised for example. This idea needs expanding on if it is correct.

      Membrane transport by vesicle transfer between several separate membrane compartments would be slower than the measured rate of membrane flux.

      Again I am not sure 'efficient' on line 40. It is fast, but how do you measure efficiency? Speed and efficiency are not the same thing.

      We have replaced the word “efficient” with “fast” (line 42).

      The basis for suggesting endosomes as a TGN is unclear. Given that there are AP complexes, retromer, exocyst and other factors that are part of the TGN or at least post-G differentiation of pathways in canonical systems, this seems a step too far. There really is no evidence in the rest of the MS that seems to support this.

      Yes, we agree and have clarified the discussion accordingly. We have not completely removed the discussion on the TGN but have labelled it more clearly as speculation.

      I am aware I am being pedantic here, but overall the abstract seems to provide an impression of greater novelty than may be the case and makes several very bold claims that I cannot see as fully valid.

      We are not aware of any claim in the summary that we have not substantiated with experiments, or any hypothesis that we have not explained.

      Moreover, the concept of fused or multifunctional endosomes (or even other endomembrane compartments) is old, and has been demonstrated in metazoan cells and yeast. The concept of rigid (in terms of composition) compartments really has been rejected by most folks with maturation, recycling and domain structures already well-established models and concepts.

      We agree that the (transient) presence of multiple Rab proteins decorating endosomes has been demonstrated in various cell types. This finding formed the basis for the endosomal maturation model in mammals and yeast, which has replaced the previous rigid compartment model.

      However, we do not appreciate attempts to question the originality of our study by claiming that similar observations have been made in metazoans or yeast. This is simply wrong. There are no reports of a functionally structured, continuous, single and large endosome in any other system. The only membrane system that might be similar was described in the American parasite Trypanosoma cruzi, however, without the use of endosome markers or any functional analysis. We refer to this study in the discussion.

      In summary, the maturation model falls short in explaining the intricacies of the membrane system we have uncovered in trypanosomes. Therefore, one plausible interpretation of our data is that the overall architecture of the trypanosome endosomes represents an adaptation that enables the remarkable speed of plasma membrane recycling observed in these parasites. In our view, both our findings and their interpretation are novel and worth reporting. Again, modern cell biology should recognize that evolution has developed many solutions for similar processes in cells, about whose diversity we have learned almost nothing because of our reductionist view. A remarkable example of this are the Picozoa, tiny bipartite eukaryotes that pack the entire nutritional apparatus into one pouch and the main organelles with the locomotor system into the other. Another one is the “extreme” cell biology of many protozoan parasites such as Giardia, Toxoplasma or Trypanosoma.

      Higher plants have been well characterised, especially at the level of Rab/Arf proteins and adaptins.

      We now mention plant endosomes in our brief discussion of the trypanosome TGN. Lines 744 – 747:

      “A comparable merging of endocytic and biosynthetic functions was already described for the TGN in plants. Different marker proteins for early and recycling endosomes were shown to be associated and/ or partially colocalized with the TGN suggesting its function in both secretory and endocytic pathways (reviewed in Minamino and Ueda, 2019).”

      The level of self-citing in the introduction is irritating and unscholarly. I have no qualms with crediting the authors with their own excellent contributions, but work from Dacks, Bangs, Field and others seems to be selectively ignored, with an awkward use of the authors' own publications. Diversity between organisms for example has been a mainstay of the Dacks lab output, Rab proteins and others from Field and work on exocytosis and late endosomal systems from Bangs. These efforts and contributions surely deserve some recognition?

      This is an original article and not a review. For a comprehensive overview the reviewer might read our recent overview article on exo- and endocytic pathways in trypanosomes, in which we have extensively cited the work of Mark Field, Jay Bangs and Joel Dacks. In the present manuscript, we have cited all papers that touch on our results or are otherwise important for a thorough understanding of our hypotheses. We do not believe that this approach is unscientific, but rather improves the readability of the manuscript. Nevertheless, we have now cited additional work.

      For the uninitiated, the posterior/anterior axis of the trypanosome cell as well as any other specific features should be defined.

      In lines 102 - 110 we wrote:

      “This process of antibody clearance is driven by hydrodynamic drag forces resulting from the continuous directional movement of trypanosomes (Engstler et al., 2007). The VSG-antibody complexes on the cell surface are dragged against the swimming direction of the parasite and accumulate at the posterior pole of the cell. This region harbours an invagination in the plasma membrane known as the flagellar pocket (FP) (Gull, 2003; Overath et al., 1997). The FP, which marks the origin of the single attached flagellum, is the exclusive site for endo- and exocytosis in trypanosomes (Gull, 2003; Overath et al., 1997). Consequently, the accumulation of VSG-antibody complexes occurs precisely in the area of bulk membrane uptake.”

      We think this sufficiently introduces the cell body axes.

      I don't understand the comment concerning microtubule association. In mammalian cells, such association is well established, but compartments still do not display precise positioning. This likely then has nothing to do with the microtubule association differences.

      We have clarified this in the text (lines 192 – 199). There is no report of cytoplasmic microtubules in trypanosomes. All microtubules appear to be either subpellicular or within the flagellum. To maintain the structure and position of the endosomal apparatus, they should be associated either with subpellicular microtubules, as is the case with the endoplasmic reticulum, or with the more enigmatic actomyosin system of the parasites. We have been working on the latter possibility and intend to publish a follow-up paper to the present manuscript.

      The inability to move past the nucleus is a poor explanation. These compartments are dynamic. Even the nucleus does interesting things in trypanosomes and squeezes past structures during development in the tsetse fly.

      The distance between the nucleus and the microtubule cytoskeleton remains relatively constant even in parasites that squeeze through microfluidic channels. This is not unexpected as the nucleus can be highly deformed. A structure the size of the endosome will not be able to physically pass behind the nucleus without losing its integrity. In fact, the recycling apparatus is never found in the anterior part of the trypanosome, most probably because the flagellar pocket is located at the posterior cell pole.

      L253 What is the evidence that EP1 labels the entire FP and endosomes? This may be extensive, but this claim requires rather more evidence. This is again suggested at l263. Again, please forgive me for being pedantic, but this is an overstatement unless supported by evidence that would be incredibly difficult to obtain. This is even sort of acknowledged on l271 in the context of non-uniform labelling. This comes again in l336.

      The evidence that EP1 labels the entire FP and endosomes is presented in Engstler and Boshart (2004; 10.1101/gad.323404).

      Perhaps I should refrain from comments on the dangers of expansion microscopy, or asking what has actually been gained here. Oddly, the conclusion on l290 is a fair statement that I am happy with.

      An in-depth discussion regarding the advantages and disadvantages of expansion microscopy is beyond the manuscript's intended scope. Our approach involved utilizing various imaging techniques to confirm the validity of our findings. We appreciate that our concluding sentence is pleasing.

      F2 - The data in panel A seem quite poor to me. I also do not really understand why the DAPI stain in the first and second columns fails to coincide or why the kinetoplast is so diffuse in the second row. The labelling for EP1 presents as very small puncta, and hence is not evidence for a continuum. What is the arrow in A IV top? The data in panel B are certainly more in line with prior art, albeit that there is considerable heterogeneity in the labelling and of the FP for example. Again, I cannot really see this as evidence for continuity. There are gaps.... Albeit I accept that labelling of such structures is unlikely to ever be homogenous.

      We agree that the dSTORM data represents the least robust aspect of the findings we have presented, and we concur with relocating it to the supplementary material.

      F3 - Rather apparent, and specifically for Rab7, that there is differential representation - for example, Cell 4 presents a single Rab7 structure while the remaining examples demonstrate more extensive labelling. Again, I am content that these are highly dynamic structures but this needs to be addressed at some level and commented upon. If the claim is for continuity, the dynamics observed here suggest the usual; some level of obvious overlap of organellar markers, but the representation in F3 is clever but not sure what I am looking at. Moreover, the title of the figure is nothing new. What is also a bit odd is that the extent of the Rab7 signal, and to some extent the other two Rabs used, is rather variable, which makes this unclear to me as to what is being detected. Given that the Rab proteins may be defining microdomains or regions, I would also expect a region of unique staining as well as the common areas. This needs to at least be discussed.

      The differences in the representation result from the dynamics of the labelled structures. Therefore, we have selected different cells to provide examples of what the labelling can look like. We now mention this in the results section.

      The overlap of the different Rab signals was perhaps to be expected, but we now have demonstrated it experimentally. Importantly, we performed a rigorous quantification by calculating the volume overlaps and the Pearson correlation coefficients.

      In previous studies the data were presented as maximal intensity projections, which inherently lack the complete 3D information.

      We found that Rab proteins define microdomains and that there are regions of unique staining as well as common areas, as shown in Figure 3. The volumes do not completely overlap. This is now more clearly stated in lines 315 – 319:

      “These objects showed areas of unique staining as well as partially overlapping regions. The pairwise colocalization of different endosomal markers is shown in Figure 3 A, XI - XIII and 3 B. The different cells in Figure 3 B were selected to represent the dynamic nature of the labelled structures. Consequently, the selected cells provide a variety of examples of how the labelling can appear.”

      This had already been stated in lines 331 – 336:

      “In summary, the quantitative colocalization analyses revealed that on the one hand, the endosomal system features a high degree of connectivity, with considerable overlap of endosomal marker regions, and on the other hand, TbRab5A, TbRab7, and TbRab11 also demarcate separated regions in that system. These results can be interpreted as evidence of a continuous endosomal membrane system harbouring functional subdomains, with a limited amount of potentially separated early, late or recycling endosomes.”

      F4-6 - Fabulous images. But a couple of issues here; first, as the authors point out, there is distance between the gold and the antigen. So, this of course also works in the z-plane as well as the x/y-planes and some of the gold may well be associated with membraneous figures that are out of the plane, which would indicate an absence of colinearity on one specific membrane. Secondly, in several instances, we have Rab7 essentially mixed with Rab11 or Rab5 positive membrane. While data are data and should be accepted, this is difficult to reconcile when, at least to some level, Rab7 is a marker for a late-endosomal structure and where the presence of degradative activity could reside. As division of function is, I assume, the major reason for intracellular compartmentalisation, such a level of admixture is hard to rationalise. A continuum is one thing but the data here seem to be suggesting something else, i.e. almost complete admixture.

      We are grateful for the positive feedback regarding the image quality. It is true that the "linkage error", representing the distance between the gold and the antigen, also applies to some extent along the z-axis. However, it is important to note that the z-dimension of the section in these figures is 55 nm. Nevertheless, it is interesting to observe that, for membranes that may not be visible within the section itself, the corresponding Rab antigen is likely still discernible in Figure 4C (indicated by arrows).

      We have clarified this in lines 397 – 400:

      “Consequently, gold particles located further away may represent cytoplasmic TbRab proteins or, as the “linkage error” can also occur in the z-plane, correspond to membranes that are not visible within the 55 nm thickness of the cryosection (Figure 4, panel C, arrows).”

      The coexistence of different Rabs is most likely concentrated in regions where transitions between different functions are likely. Our focus was primarily on imaging membranes labelled with two markers. We wanted to show that the prevailing model of separate compartments in the trypanosome literature is not correct.

      F7 - Not sure what this adds beyond what was published by Grunfelder.

      First, this figure is an important control that links our results to published work (Grünfelder et al. (2003)). Second, we include double staining of cargo with Rab5, Rab7, and Rab11, whereas Grünfelder focused only on Rab11. Therefore, our data is original and of such high quality that it warrants a main figure.

      F8 - and l583. This is odd as the claim is 'proof' which in science is a hard thing to claim (and this is definitely not at a six sigma level of certainty, as used by the physics community). However, I am seeing structures in the tomograms which are not contiguous - there are gaps here between the individual features (Green in the figure).

      We have replaced the term "proof". It is important to note that the structures in individual tomograms cannot all be completely continuous because the sections are limited to a thickness of 250 nm. Therefore, it is likely that they have more connectivity above and below the imaged section. Nevertheless, we believe that the quality of the tomograms is satisfactory, considering that 3D Tokuyasu is a very demanding technique and the production of serial Tokuyasu tomograms is not feasible in practice.

      Discussion - Too long and the self-citing of four papers from the corresponding author to the exclusion of much prior work is again noted, with concerns about this as described above. Moreover, at least four additional Rab proteins are known associated with the trypanosome endosomal system, 4, 5B, 21 and 28. These have been completely ignored.

      We have outlined our position on referencing in original articles above. We also explained why we focused on the key marker proteins associated with early (Rab5), late (Rab7) and recycling endosomes (Rab11). We did not ignore the other Rabs, we just did not include them in the present study.

      Overall this is disappointing. I had expected a more robust analysis, with a clearer discussion and placement in context. I am not fully convinced that what we have here is as extreme as claimed, or that we have a substantial advance. There is nothing here that is mechanistic or the identification of a new set of gene products, process or function.

      We do not think that this is constructive feedback.

      This MS suggests that the endosomal system of African trypanosomes is a continuum of membrane structures rather than representing a set of distinct compartments. A combination of light and electron microscopy methods are used in support. The basic contention is very challenging to prove, and I'm not convinced that this has been. Furthermore, I am also unclear as to the significance of such an organisation; this seems not really addressed.

      We acknowledge and respect varying viewpoints, but we hold a differing perspective in this matter. We are convinced that the data decisively supports our interpretation. May future work support or refute our hypothesis.

      Reviewer #3 (Recommendations For The Authors):

      Line 81 - delete 's

      Done.

      Generally, the introduction was very well written and clearly summarised our current understanding but the paragraph beginning line 134 felt out of place and repeated some of the work mentioned earlier.

      We have removed this paragraph.

      For the EM analysis throughout quantification would be useful as highlighted in the public review. How many tomograms were examined, and how often were types of structures seen? I understand the sample size is often small but this would help the reader appreciate the diversity of structures seen.

      We have included the numbers.

      Following on from this how were the cells chosen for tomogram analysis? For example, the dividing cell in 1D has palisades associating with the new pocket - is this commonly seen? Does this reflect something happening in dividing cells. This point about endosomal division was picked up in the discussion but there was little about in the main results.

      This issue is undoubtedly inherent to the method itself, and we have made efforts to mitigate it by generating a series of tomograms recorded randomly. We have refrained from delving deeper into the intricacies of the cell cycle in this manuscript, as we believe that it warrants a separate paper.

      As the authors prosecute, the co-localisation analysis highlights the variable nature of the endosome and the overlap of different markers. When looking at the LM analysis, I was struck by the variability in the size and number of labelled structures in the different cells. For example, in 3A Rab7 is 2 blobs but in 3B Cell 1 it is 4/5 blobs. Is this just a reflection of the increase in the endosome during the cell cycle?

      The variability in representation is a direct consequence of the dynamic nature of the labelled structures. For this reason, we deliberately selected different cells to represent examples of what the labelling can look like. We have decided not to mention the dynamics of the endosome during the cell cycle. This will be the subject of a further report.

      Moreover, Rab 11 looks to be the marker covering the greatest volume of the endosomal system - is this true? I think there's more analysis of this data that could be done to try and get more information about the relative volumes etc of the different markers that haven't been drawn out. The focus here is on the co-localisation.

      Precisely because we recognize the importance of this point, we intend to turn our attention to the cell cycle in a separate publication.

      I appreciate that it is an awful lot of work to perform the immuno-EM and the data is of good quality but in the text, there could be a greater effort to tie this to the LM data. For example, from the Rab11 staining in LM you would expect this marker to be the most extensive across the networks - is this reflected in the EM?

      For the immuno-EM there were no numbers, the authors had measured the position of the gold but what was the proportion of gold that was in/near membranes for each marker? This would help the reader understand both the number of particles seen and the enrichment of the different regions.

      Our original intent was to perform a thorough quantification (using stereology) of the immuno-EM data. However, we later realized that the necessary random imaging approach is not suitable for Tokuyasu sections of trypanosomes. In short, the cells are too far apart, and the cell sections are only occasionally cut so that the endosomal membranes are sufficiently visible. Nevertheless, we continue to strive to generate more quantitative data using conventional immuno-EM.

      The innovative combination of Tokuyasu tomograms with immuno-EM was great. I noted though that there was a lack of fenestration in these models. Does this reflect the angle of the model or the processing of these samples?

      We are grateful to the referee, as we have asked ourselves the same question. However, we do not attribute the apparent lack of fenestration to the viewing angle, since we did not find fenestration in any of the Tokuyasu tomograms. Our suspicion is more directed towards a methodological problem. In the Tokuyasu workflow, all structures are mainly fixed with aldehydes. As a result, lipids are only effectively fixed through their association with membrane proteins. We suggest that the fenestration may not be visible because the corresponding lipids may have been lost due to incomplete fixation.

      We now clearly state this in lines 563 – 568:

      “Interestingly, these tomograms did not exhibit the fenestration pattern identified in conventional electron tomography. We suspect that this is due to methodological reasons. The Tokuyasu procedure uses only aldehydes to fix all structures. Consequently, effective fixation of lipids occurs only through their association with membrane proteins. Thus, the lack of visible fenestration is likely due to possible loss of lipids during incomplete fixation.”

      The discussion needs to be reworked. Throughout it contains references to results not in the main results section such as supplementary movie 2 (line 735). The explicit references to the data and figures felt odd and more suited to the results rather than the discussion. Currently, each result is discussed individually in turn and more effort needs to be made to integrate the results from this analysis here but also with previous work and the data from other organisms, which at the moment sits in a standalone section at the end of the discussion.

      We have improved the discussion and removed the previous supplementary movies 2 and 3. Supplementary movie 1 is now mentioned in the results section.

      Line 693 - There was an interesting point about dividing cells describing the maintenance of endosomes next to the old pocket. Does that mean there was no endosome by the new pocket and if so where is this data in the manuscript? This point relates back to my question about how cells were chosen for analysis - how many dividing cells were examined by tomography?

      The fate of endosomes during the cell cycle is not the subject of this paper. In this manuscript we show only one dividing cell using tomography. An in-depth analysis focusing on what happens during the cell cycle will be published separately.

      Line 729 - I'm unclear how this represents a polarization of function in the flagellar pocket. The pocket I presume is included within the endosomal system for this analysis but there was no specific mention of it in the results and no marker of each position to help define any specialisation. From the results, I thought the focus was on endosomal co-localisation of the different markers. If the authors are thinking about specialisation of the pocket this paper from Mark Field shows there is evidence for the exocyst to be distributed over the entire surface of the pocket, which is relevant to the discussion here. Boehm, C.M. et al. (2017) The trypanosome exocyst: a conserved structure revealing a new role in endocytosis. PLoS Pathog. 13, e1006063

      We have formulated our statement more cautiously. However, we are convinced that membrane exchange cannot physically work without functional polarization of the pocket. We know that Rab11, for example, is not evenly distributed on the pocket. By the way, in Boehm et al. (2017) the exocyst is not shown to cover the entire pocket (as shown in Supplementary Video 1).

      We now refer to Boehm et al. (Lines 700 – 703):

      “Boehm et al (2017) report that in the flagellar pocket endocytic and exocytic sites are in close proximity but do not overlap. We further suggest that the fusion of EXCs with the flagellar pocket membrane and clathrin-mediated endocytosis take place on different sites of the pocket. This disparity explains the lower colocalization between TbRab11 and TbRab5A.”

      Line 735 - link to data not previously mentioned I think. When I looked at this data I couldn't find a key to explain what all the different colours related to.

      We have removed the previous supplementary movies 2 and 3. We now reference supplementary movie 1 in the results section.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Du et al. address the cell cycle-dependent clearance of misfolded protein aggregates mediated by the endoplasmic reticulum (ER) associated Hsp70 chaperone family and ER reorganisation. The observations are interesting and impactful to the field.

      Strength:

      The manuscript addresses the connection between the clearance of misfolded protein aggregates and the cell cycle using a proteostasis reporter targeted to ER in multiple cell lines. Through imaging and some biochemical assays, they establish the role of BiP, an Hsp70 family chaperone, and Cdk1 inactivation in aggregate clearance upon mitotic exit.

      Furthermore, the authors present an initial analysis of the role of ER reorganisation in this clearance. These are important correlations and could have implications for ageing-associated pathologies. Overall, the results are convincing and impactful to the field.

      Weakness:

      The manuscript still lacks a mechanistic understanding of aggregate clearance. Even though the authors have provided the role of different cellular components, such as BiP, Cdk1 and ATL2/3 through specific inhibitors, at least an outline establishing the sequence of events leading to clearance is missing. Moreover, the authors show that the levels of ER-FlucDM-eGFP do not change significantly throughout the cell cycle, indicating that protein degradation is not in play. Therefore, addressing/elaborating on the mechanism of disassembly can add value to the work. Also, the physiological relevance of aggregate clearance upon mitotic exit has not been tested, nor have the cellular targets of this mode of clearance been identified or discussed.

      Thank you for your suggestions. 

      We have added descriptions about the sequence of events leading to clearance in the abstract (line 33) and discussion (line 316). 

      We have commented on the future work that could address the molecular mechanisms behind the aggregate clearance in the discussion (line 388). 

      It has been difficult to address the physiological relevance of aggregate clearance during cell division, as inhibition of BiP or depletion of ATL2/3, either of which prevents aggregate clearance, causes cellular consequences not specific to aggregate clearance. Future work that leads to an understanding of aggregate clearance at the molecular level may allow us to address this more specifically. Furthermore, we have commented on the potential defects that could arise in cells expressing ER-FlucDM-eGFP, which have perturbed cellular health based on the proteomic analysis (line 359).

      To identify pathological targets that undergo clearance like ER-FlucDM-eGFP, we tested three pathological mutants (CFTR-∆F508, AAT S and Z variants) that are known to mis-fold and accumulate in the ER. Unfortunately, expression of these mutants did not result in the confinement of aggregates in the nucleus. The data related to this have been added as Figure S1E and S1F (line 102) in this revised manuscript. We have also commented in the discussion that pathological targets are yet to be identified and could be a part of future work (line 392).

      Reviewer #2 (Public review):

      This paper describes an interesting observation that ER-targeted misfolded proteins are trapped within vesicles inside the nucleus to facilitate quality control during cell division. This work supports the concept that transient sequestration of misfolded proteins is a fundamental mechanism of protein quality control. The authors satisfactorily addressed several points raised in the review of the first submission. The manuscript is improved but still unable to fully address the mechanisms.

      Strengths:

      The observations in this manuscript are very interesting and open up many questions on proteostasis biology.

      Weaknesses:

      Despite inclusions of several protein-level experiments, the manuscript remained a microscopy-driven work and missed the opportunity to work out the mechanisms behind the observations.

      Thank you for your suggestions. We believe that our study has provided a genetic basis for the involvement of ER reorganization and BiP during cell division in aggregate clearance, which is a new observation. We have also commented in this revised manuscript about the future work that could address the molecular mechanisms behind the aggregate clearance in the discussion (line 388).  

      Reviewer #3 (Public review):

      This paper describes a new mechanism for the clearance of protein aggregates associated with the endoplasmic reticulum re-organization that occurs during mitosis.

      Experimental data showing clearance of protein aggregates during mitosis is solid, statistically significant, and very interesting. The authors made several new experiments included in the revised version to address the concerns raised by reviewers. A new proteomic analysis, co-localization of the aggregates with the ER membrane Sec61beta protein, expression of the aggregate-prone protein in the nucleus does not result in accumulation of aggregates, detection of protein aggregates in the insoluble fraction after cell disruption and most importantly knockdown of ATL proteins involved in the organization of ER shape and structure impaired the clearance mechanism. This last observation addresses one of the weakest points of the original version which was the lack of experimental correlation between ER structure capability to re-shape and the clearance mechanism.

      In conclusion, this new mechanism of protein aggregate clearance from the ER was not completely understood in this work but the manuscript presented, particularly in the revised version, an ensemble of solid observations and mechanistic information to scaffold future studies that clarify more details of this mechanism. As stated by the authors: "How protein aggregates are targeted and assembled into the intranuclear membranous structure waits for future investigation". This new mechanism of aggregate clearance from the ER is not expected to be fully understood in a single work but this paper may constitute one step to better comprehend the cell capability to resolve protein aggregates in different cell compartments.

      We thank the reviewer for the comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The manuscript presents a very interesting set of observations that could have significant implications on age-related protein misfolding and aggregate clearance. There are a few places in the manuscript that still need more clarity. Some are listed below, which I think can improve the manuscript.

      - The new data associated with proteomic analysis is appreciated, but the information gained has not been explored or elaborated sufficiently in the manuscript. Based on the differential expression of cell cycle proteins, how the authors interpret cellular health is unclear. Also, the physiological role of this mode of aggregate clearance remains unclear.

      We have added our interpretation of perturbed cellular health in cells expressing ER-FlucDM-eGFP in the discussion (line 359).

      It has been difficult to address the physiological relevance of aggregate clearance during cell division, as inhibition of BiP or depletion of ATL2/3, either of which prevents aggregate clearance, causes cellular consequences not specific to aggregate clearance. Future work that leads to an understanding of aggregate clearance at the molecular level may allow us to address this more specifically.

      - In Figure 3A, have the authors measured the total GFP intensity from interphase through early G1? Even though the number and area of the aggregates decrease significantly, the cytoplasmic GFP signal does not seem to increase. Considering new CHX chase experiments and total Fluorescence intensity calculations (Figure S7D), which indicate no difference, one would expect an increase in cytoplasmic signal upon the disassembly of aggregates. Therefore, the data from Figures 3A and 7D seem contradictory. Can the authors please explain?

      We apologize for the confusion. The images in Figure 3A were derived from fixed cells, so different cells are shown for each cell cycle phase and the images are not suitable for quantification. Fluorescence intensity changes can be better appreciated in Figure 3C or 4D, as these are time-lapse microscopy images of live cells progressing through mitosis and cytokinesis. Data used in the quantification of fluorescence intensity in Figure S7D were derived from live cells taken from specific time points to avoid unwanted fluorescence bleaching during time-lapse microscopy.

      - Do the authors expect a similar clearance of pathological aggregates such as mutant FUS or TDP43 condensates? Showing aggregate disassembly of disease-relevant aggregates would be an excellent addition to the manuscript, but it might be beyond the scope of the current version. However, the authors can comment/speculate how their study might extend to pathological condensates.

      We tested three pathological mutants (CFTR-∆F508, AAT S and Z variants) that are known to mis-fold and accumulate in the ER. Unfortunately, expression of these mutants did not result in the confinement of aggregates in the nucleus. The data related to this have been added as Figure S1E and S1F (line 102) in this revised manuscript. We have commented that pathological targets are yet to be identified and could be a part of future work (line 392).

      - The presence of ER membrane around these aggregates is an interesting observation. This membrane is retained even after nuclear membrane breakdown. What could be the relevance of membrane-bound aggregates, especially since the membrane can limit the access of chaperones involved in disassembly? This observation becomes more important since the depletion of ER membrane fusion proteins also leads to the accumulation of aggregates. Are the membranes a beacon for disassembly? The authors may comment/ speculate. This could also be an important aspect of the mechanism of clearance.

      We think that the ER membranes around the aggregates are disassembled when the ER networks reorganize during mitotic exit and this may allow accessibility of BiP to disaggregate the aggregates. We have added this in the discussion (line 316).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      **Reviewer #1 (Evidence, reproducibility and clarity (Required)):**

      Summary

      In this work, the authors present a careful study of the lattice of the indirect flight muscle (IFM) in Drosophila using data from a morphometric analysis. To this end, an automated tool is developed for precise, high-throughput measurements of sarcomere length and myofibril width, and various microscopy techniques are used to assess sub-sarcomeric structures. These methods are applied to analyze sarcomere structure at multiple stages in the process of myofibrillogenesis. In addition, the authors present various factors and experimental methods that may affect the accurate measurement of IFM structures. Although the comprehensive structural study is appreciated, there are major issues with the presentation/scope of the work that need to be addressed:

      Major Comments

      1. The main weakness of the paper is in its claim of presenting a model of the sarcomere. Indeed, the paper reports a structural study that is drawn onto a 3D schematic. There is no myofibrillogenesis model that would provide insights into mechanisms. Therefore, the use of the word model is grossly overstated.

      In biology, the term “model” is used in various contexts, but it generally refers to a simplified representation of a biological system, structure or process. Accordingly, we consider “model” the most fitting term for what we present in Figure 4 (Figure 7 in the revised manuscript). These are not arbitrary 3D schematics; they are scaled representations in which the length, the number and the relative three-dimensional arrangement of thin and thick filaments are based on measurements. These measurements are primarily based on our own data (presented in the main text and provided in the supplementary materials), as published data were either lacking or inconsistent. Moreover, we would like to highlight that we do not claim to present a conceptual or mechanistic model of myofibrillogenesis; rather, we present structural reconstructions, or models, for four developmental time points. Therefore, we disagree with the remark that “the use of the word model is grossly overstated”, as our wording is fully consistent with common usage.

      In general, the major focus and contribution of the work is unclear. How does the comprehensive nature of the measurements contribute to existing literature?

      We significantly revised the text to highlight the main points more clearly, and added a section to help non-specialist readers better understand our aims and findings.

      Figure labels are often rather confusing - for example it is unclear why there is a B, B', B' etc instead of B,C,D, etc.

      The figure labels have been revised in accordance with the reviewer’s recommendation.

      Some comments in the text are not clearly tied to the figures. For example, in lines 108-109, are the authors referring to the shadow along the edges of the myofibril when saying they are not clearly defined (Figure 1C)?

      The lines refer to the fact that identifying the boundary of an “object” in a fluorescence microscopy image is inherently challenging - even under ideal conditions where the object’s image is not affected by nearby signals or background noise. To improve clarity, we revised this section and now it reads: The other key parameter - myofibril diameter - is typically measured using phalloidin staining. However, accurately delineating their boundaries in micrographs is difficult - even under optimal conditions (high signal‑to‑noise ratio, no overlapping fibers, etc.; Fig. 1C). This limitation arises from the fundamental nature of light microscopy as the image produced is a blurred version of the actual structure, due to convolution with the microscope’s point spread function.
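
      The effect described above can be illustrated with a minimal sketch. It assumes a one-dimensional intensity profile and uses a Gaussian as a stand-in for the point spread function; the object width, the kernel width and all function names are hypothetical and are not taken from the manuscript or from IMA. The sketch shows that, once an object with sharp edges is blurred, its apparent width depends on the intensity threshold chosen to define the boundary.

```python
import math

def gaussian_kernel(sigma_px: float) -> list[float]:
    """Normalized 1D Gaussian, used here only as a stand-in for the PSF."""
    radius = int(math.ceil(4 * sigma_px))
    weights = [math.exp(-0.5 * (i / sigma_px) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(profile: list[float], kernel: list[float]) -> list[float]:
    """Convolve a profile with a symmetric kernel (zero padding at both ends)."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(profile)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(profile):
                acc += w * profile[j]
        out.append(acc)
    return out

def width_above(profile: list[float], fraction: float) -> int:
    """Number of pixels whose intensity is at least `fraction` of the maximum."""
    threshold = fraction * max(profile)
    return sum(1 for v in profile if v >= threshold)

# Hypothetical object: 100 px wide, uniform intensity, on a dark background.
true_profile = [0.0] * 100 + [1.0] * 100 + [0.0] * 100
blurred = blur(true_profile, gaussian_kernel(sigma_px=10.0))

# The sharp profile gives the same width at every threshold; the blurred one does not.
for fraction in (0.1, 0.5, 0.9):
    print(fraction, width_above(true_profile, fraction), width_above(blurred, fraction))
```

      In this toy case the sharp profile reports the same width at every threshold, whereas the blurred profile appears wider at a low threshold and narrower at a high one, which is why the boundary criterion has to be fixed before widths are compared.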

      In line 116, it is unclear what "surrounding structures" the authors are referring to if the myofibrils are isolated.

      We revised the text for clarity. It now states: Once isolated, myofibrils lie flat on the coverslip, aligning with the focal plane of the objective lens. This orientation allows for high-resolution, undistorted imaging and accurate two-dimensional measurements, free from interference by neighboring biological structures (e.g., other myofibrils).

      In lines 141-142, there is no reference of data to back up the claim of validation.

      We addressed this mistake by including a reference to Fig. S1E (Fig. S1D in the revised manuscript).

      In line 170, the authors mention the mef2-Gal4/+ strain as a Gal4 driver line but do not clearly state how this strain is different from the wildtypes or how this impacts their results.

      Mef2-Gal4 is a muscle-specific Gal4 driver that is often used in Drosophila muscle studies. By convention among Drosophila geneticists, the presence of a transgene (here, Mef2-Gal4) changes the genetic background. Although it does not necessarily cause any phenotypic effect, it is clearly distinguished from the wild-type situation, and whenever relevant, Mef2-Gal4/+ is the preferred (if not the correct) control instead of wild type. As is clear from our data, the presence of the Mef2-Gal4 driver does not affect the length or width of IFM sarcomeres compared to wild type.

      In lines 182-185, the authors discuss the effects of tissue embedding on morphometrics. Were factors such as animal sex, age, fiber type, etc. conserved in these experiments? If not, any differences in results may be confounding.

      We fully agree with the reviewer that when testing the effect of a single variable, all other variables should remain constant. This is actually one of the main points emphasized in the results section. Additionally, this information is already provided in the Source Data files for each panel.

      In lines 199-201, the authors discuss results of myofibril diameter using different preparation methods, yet no data is cited to support the claims. In line 220, the phrase "6 independent experiments" is unclear. Is each independent experiment performed using a different animal? Furthermore, are 6 experiments performed for each time point?

      We substantially revised the relevant paragraphs and ensured that the corresponding data (Figure 2A in the revised manuscript) is cited each time when it is discussed. We conducted six independent experiments at each time point. This is consistently indicated in the figures and can be verified in the SourceData files (specifically, Fig3SourceData in this case). To clarify what we mean by "independent experiments," we added the following sentence to the Methods section: Experiments were considered independent when specimens came from different parental crosses, and each experiment included approximately six animals to capture individual variability.

      In line 254, the authors refer to "number of sarcomeres". It must be clearly stated if this refers to sarcomeres per myofibril, image area, etc.

      It is now clearly stated as: "number of sarcomeres per myofibril".

      In line 274, the authors refer to "myofilament number". It must be clearly stated if this refers to myofilaments per myofibril, image area, etc.

      We counted the number of myofilaments in developing myofibrils, and this is now clearly stated in the text and in the legend of Figure 3 (Figure 4 in the revised manuscript).

      In line 299, the authors mention that thin filaments measured less than 560 nm in length, yet no data is cited to support this.

      The previously missing reference to Figure 4 (Figure 7 in the revised manuscript) has now been added in addition to the revised Supplementary Figure 5.

      In the "Quantifying sarcomere growth dynamics" section of the summary (starting from line 402) the authors introduce data that would be more naturally placed in the results and discussion section.

      As suggested by the reviewer, we incorporated the key aspects of sarcomere growth dynamics into the Results and Discussion section.

      In lines 422-423, it is not mentioned what the controls are for.

      This was already explained in the main text between lines 167 and 173.

      In the caption of Figure 1C, it is not mentioned what the red dashed lines in the microscope images represent.

      The caption has been updated to include the following clarification: The red dashed lines border the ROI used for generating the intensity profiles.

      In the caption of Figure 1D, the difference between the lighter and darker grey points is not mentioned.

      This was already explained in each relevant figure legend. In this specific case, it is stated between lines 850 and 852: “Light gray dots represent individual measurements of sarcomere length and myofibril diameter, while the larger dots indicate the mean values from independent experiments.”

      In line 849, the stated p-value (0.003) does not match that mentioned in the figure (0.0003).

      We thank the reviewer for noticing this small mistake; we corrected it so that the accurate p-value of 0.0003 now appears in both places.

      In line 874, it is not clear what an "independent experiment" refers to (different animal, etc.?).

      We refer the reviewer to point 9, where this question has already been addressed.

      Figure 2A is hard to read. Using different colored dots for different time points might help.

      As suggested by the reviewer, we generated a plot with the individual points color-coded by time.

      The significant figures presented in Figure 4 give a completely inaccurate representation of the variability of the measurements achieved with these techniques.

      Certainly, each measured parameter exhibits inherent biological and technical variability. We have made all the raw data available to the reader through the SourceData files, and this variability is also evident in Figures 1, 2, 3, Supplementary Figure 1, 3, and 5 (Figure 1, 2, 3, 4, 6, and Supplementary Figure 1 in the revised manuscript). We have also included an additional plot (Supplementary Figure 5 in the revised manuscript) that presents the calculated thin and thick filament lengths and their uncertainty. However, in Figure 4 (Figure 7 in the revised manuscript), our goal was to present an easily understandable visual representation of the sarcomeric structures for each time point, based on the averages of the relevant measurements.

      In line 877, it should be mentioned that the number of filaments is counted per myofibril. The y-axes in the figure should also be adjusted to clarify this.

      As suggested by the reviewer, both the figure legend and the plot have been updated to clearly indicate that the filament count refers to the number per myofibril.

      In line 883, it is not clear what an "independent experiment" refers to (different animal, etc.?).

      We refer the reviewer to point 9, where this question has already been addressed.

      The statement of sample sizes in all figures is a little confusing.

      Following general guidelines, we used SuperPlots to effectively present the data, as nicely demonstrated in the JCB viewpoint article by Lord et al., 2020 (PMID: 32346721). Individual measurements are shown as pooled data points, allowing readers to appreciate the spread, distribution and number of measurements. Overlaid on these pooled dot plots are the mean values from each independent experiment, with error bars representing variability between independent experiments. Sample sizes are provided for both individual measurements and independent experiments. This is now clearly explained in the Materials and Methods section, and we corrected the legends to improve clarity (“n” indicates the number of independent experiments/individual measurements).
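
      The summary statistics behind such a plot can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used for the figures: the function name and the example values are hypothetical, and the standard deviation of the experiment means is used as one possible choice for the between-experiment error bar.

```python
from statistics import mean, stdev

def superplot_summary(measurements: dict[str, list[float]]) -> dict[str, float]:
    """Summarize pooled measurements grouped by independent experiment.

    Keys are experiment labels; values are the individual measurements.
    The central value and its spread are computed from the experiment means,
    not from the pooled individual measurements.
    """
    experiment_means = [mean(values) for values in measurements.values()]
    return {
        "n_experiments": len(experiment_means),
        "n_measurements": sum(len(values) for values in measurements.values()),
        "mean_of_means": mean(experiment_means),
        "sd_between_experiments": stdev(experiment_means),
    }

# Hypothetical sarcomere lengths (in micrometres) from three independent experiments.
example = {
    "experiment_1": [3.1, 3.3, 3.2],
    "experiment_2": [3.4, 3.5],
    "experiment_3": [3.0, 3.2, 3.1, 3.3],
}
print(superplot_summary(example))
```

      Reporting both counts keeps the two meanings of “n” separate: the individual measurements show the distribution, while the independent experiments are the units on which the error bars are based.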

      In lines 1007-1008, the authors imply that the lattice model is needed for calculation of myofilament length. However, from the equations and previous data, it seems that this can be estimated using the confocal and dSTORM images.

      As the reviewer correctly noted, myofilament length can be estimated using measurements from confocal and dSTORM images, following the equations provided. However, constructing even a simplified model requires multiple constraints to be defined and applied in a specific order. In practice, one must first determine the number and arrangement of myofilaments in a cross-sectional view of an “average sarcomere” before attempting to build a longitudinal model, where length calculations become relevant. This is now clarified in the text.

      A more specific discussion of future directions is needed to put this paper in context. For example: Can anything from the overall process be used to better understand sarcomere dynamics in larger animals/humans? Can this be applied to disease modelling?

      To address these questions, we have added a section titled STUDY LIMITATIONS, which states: “Our study is focused on describing the growth of IFM sarcomeres during myofibrillogenesis at the level of individual myofilaments. Additionally, we developed a user-friendly software tool for precise sarcomere size measurements and demonstrate that these measurements are sensitive to varying conditions. Although this tool can also be used successfully on whole muscle fiber preparations, our pipeline was intentionally optimized for individual IFM myofibrils, ensuring higher measurement precision in our hands than other types of preparation. Thus, we predict that future work will be required to extend it to sarcomeres from other muscle tissues or species. Nevertheless, our study exemplifies a workflow for measuring sarcomere dimensions precisely. With some variations, it should be possible to adopt it for other muscles, including vertebrate and human striated muscles. To facilitate this and to enhance the accessibility and usability of this dataset, we welcome any feedback and suggestions from researchers in the field.”

      One of the major claims of the paper is that there is a measurable variability with sex and other parameters. However, this data is never clearly summarized, presented (except for supplement), or discussed for its implications.

      We followed the suggestion of the reviewer, and we moved this supplementary data into a main figure, and thoroughly revised the corresponding paragraphs to present and discuss the findings more clearly.

      Minor Comments: 1. Lines 60-65 seem to break the flow of the introduction. As the authors discuss existing methods in literature for IFM analysis in the previous couple sentences, the following sentences should clearly state the limitations of existing methods/current gap in literature and a general idea of what the current work is contributing.

      We agree with this remark, and we substantially revised the Introduction to clearly define the existing gap in the literature and to articulate how our work addresses this gap.

      In line 104, the acronym for ZASPs is not spelled out.

      The acronym has now been spelled out for clarity.

      **Referee Cross-commenting**

      I agree as well.

      Reviewer #1 (Significance (Required)):

      In summary, this paper provides a multi-scale characterization of Drosophila flight muscle sarcomere structure under a variety of conditions, which is potentially a significant contribution for the field. However, the paper scope is overstated in that it does not provide an actual sarcomere model. Further, there are multiple issues with data presentation that impact the readability of the manuscript.

      Although it is somewhat unclear what the reviewer would consider “an actual sarcomere model”, we cannot accept that we made an overstatement by using the word “model”, because one of the main outcomes of our work is indeed the set of myofilament-level sarcomere models depicted in Figure 4 (Figure 7 in the revised manuscript). As stated above, we do not claim that these are molecular, mechanistic or developmental models, but it makes no sense (even in common terms) to argue that our scaled graphical representations, which are based on a wealth of measurements, should not or cannot be called models.

      As to the comments on data presentation, we thank the reviewer for the numerous suggestions, and we substantially revised the manuscript to increase clarity and overall readability.

      **Reviewer #2 (Evidence, reproducibility and clarity (Required)):**

      Summary: In this manuscript titled "A myofilament lattice model of Drosophila flight muscle sarcomeres based on multiscale morphometric analysis during development," Görög et al. perform a detailed analysis of morphological parameters of the indirect flight muscle (IFM) of D. melanogaster. The authors start by illustrating the range of measurements reported in the literature for mature IFM sarcomere length and width, showing a need to revisit and determine a standardized measurement. They develop a new Python-based tool, IMA, to analyze sarcomere lengths from confocal micrographs of isolated myofibrils stained with phalloidin and a z-disc marker. Using this tool, they demonstrate that sample preparation (especially mounting medium), as well as fiber type, sex, and age influence sarcomere measurements. Combining IMA, TEM, and STORM data, they measure sarcomere parameters across development, providing a comprehensive and up-to-date set of "standardized" sarcomere measurements. Using these data, they generate a model integrating all of the parameters to model sarcomeres at four discrete timepoints of development, recapitulating key phases of sarcomere formation and growth.

      Major comments: Line 200 & 901 - Figure S1B - The authors make a strong statement about the use of liquid versus hardening media, and it is clear from the image provided in Figure S1 that there is a difference in the apparent sarcomere width. The identity of the "liquid media" versus the "hardening media" should be clearly identified in the Results, in addition to the legend for Figure S1. The authors show that "glycerol-based solutions" increase sarcomere width, but the Materials only list 90% glycerol and PBS. However, a frequently used liquid mounting media is Vectashield. Based on the literature, measurements in liquid Vectashield show diameters significantly less than 2.2 microns observed here with presumably 90% glycerol or PBS. Can the authors qualify this statement, or provide data that all forms of liquid mounting media cause this effect? Does this also apply to hemi-thorax and sectioned preparations, or just isolated myofibrils?

      We used a PBS-based solution containing 90% glycerol as our liquid medium, as now stated in the main text. In response to the reviewer’s suggestion, we also tested a non-hardening version of Vectashield (H-1000). Myofibrils in Vectashield were significantly thicker than those in ProLong Gold but still thinner than those in the 90% glycerol–PBS solution, shown in Figure 2B. The mechanisms that could potentially explain these observations have been described in several studies (Miller et al., 2008; Tanner et al., 2011, 2012). Briefly, IFM is a densely packed macromolecular assembly. Upon removal of the cell membrane, myofibrillar proteins attract water, leading to overhydration of the myofilament lattice. This increases the spacing between filaments, resulting in an expansion of overall myofibril diameter. The extent of hydration depends on the osmolarity of the surrounding medium, as the system eventually reaches osmotic equilibrium. While both liquid media induced significant swelling, the observed differences likely reflect variations in their osmotic properties. In contrast, dehydration - an essential step in electron microscopy sample preparation - reduces the spacing between filaments, making myofibrils appear thinner. This explains why EM micrographs consistently show significantly smaller myofibril diameters (Chakravorty et al., 2017).

      Hardening media such as ProLong Gold introduce additional artifacts: during polymerization, these media shrink, exerting compressive forces on the tissue (Jonkman et al., 2020). We therefore propose that isolated myofibrils first expand due to overhydration in the dissection solution, and are then compressed back toward their *in vivo* dimensions during incubation in ProLong Gold. The average *in vivo* diameter of IFM myofibrils can be estimated without direct measurements, as it is determined by two key factors: (i) the number of myofilaments, which has been quantified in EM cross-sections in several studies (Fernandes & Schöck, 2014; Shwartz et al., 2016; Chakravorty et al., 2017) including our own, and (ii) the spacing between filaments, which can be measured by X-ray diffraction even in live *Drosophila* or under various experimental conditions (Irving & Maughan, 2000; Miller et al., 2008; Tanner et al., 2011, 2012). Our findings suggest that the effects of lattice overhydration and media-induced shrinkage are most pronounced in isolated myofibrils. In larger tissue preparations, the inter-myofibrillar space likely acts as a mechanical and osmotic buffer, reducing the extent of such distortions.
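
      The estimate from filament number and filament spacing can be sketched as a short calculation. The sketch rests on simplifying assumptions that are ours for illustration only: the thick filaments form a perfect hexagonal lattice, the myofibril cross-section is treated as a circle of equal area, and edge effects are ignored. The function names and the input values are hypothetical placeholders, not measurements from the manuscript.

```python
import math

def spacing_from_d10(d10_nm: float) -> float:
    """Convert the d(1,0) lattice-plane spacing to centre-to-centre filament spacing."""
    return d10_nm * 2 / math.sqrt(3)

def hexagonal_lattice_diameter_nm(n_thick_filaments: int, spacing_nm: float) -> float:
    """Diameter of a circle with the same area as a hexagonal lattice of n filaments.

    spacing_nm is the centre-to-centre distance between neighbouring thick filaments.
    Each lattice point of a hexagonal lattice occupies an area of (sqrt(3)/2) * spacing^2.
    """
    area_per_filament = (math.sqrt(3) / 2) * spacing_nm ** 2
    total_area = n_thick_filaments * area_per_filament
    return 2 * math.sqrt(total_area / math.pi)

# Placeholder inputs chosen only to show the call pattern.
spacing = spacing_from_d10(d10_nm=48.0)
print(hexagonal_lattice_diameter_nm(n_thick_filaments=800, spacing_nm=spacing))
```

      Under these assumptions the diameter scales linearly with the filament spacing and with the square root of the filament number, so a change in lattice hydration alters the diameter in direct proportion to the change in spacing.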
      

      Can the authors comment on whether the length of fixation or fixation buffer solution, in addition to the mounting medium, make a difference on sarcomere length and diameter measurements? This is another source of variation in published protocols.

      The effect of fixation time on sarcomere morphometrics in whole-mount IFM preparations has been previously demonstrated by DeAguero et al. (2019), as briefly noted in our manuscript. To extend these findings, we performed a comparison using isolated myofibrils, assessing morphometric parameters after fixation for 10, 20 (standard) and 60 minutes. We found no difference between the 10- and 20-minute fixation conditions; however, fixation for 60 minutes resulted in significantly increased myofibril diameter (and these data are now shown in Supplementary Figure 1C). A comparable increase in thickness was also observed when using a glutaraldehyde-based fixative. These results suggest that more extensively fixed myofibrils may better resist the compressive forces exerted by hardening media.

      Line 237-238. The authors conclude that premyofibrils are much thinner than previously measured. The use of Airyscan to more accurately measure myofibril width at this timepoint is a good contribution, as indeed diffraction and light scatter likely contribute to increased width measured in light microscopy images. I also wonder, though, how well the IMA software performs in measuring width at 36h APF, given how irregular the isolated myofibrils at this stage look (wide z-lines but thinner and weaker H and I bands as shown in Fig. 2B)?

      The reviewer is correct that measurements during the early stages of myofibrillogenesis require additional effort. However, in addition to its automatic mode, IMA can also operate in semi-automatic or manual modes, ensuring complete control over the measurements. Myofibril width is determined from the phalloidin channel at the Z-line (as described in the software’s User Guide and Supplementary Figure 2), where it is at its thickest.

      Also, how much of the difference in sarcomere width arises due to effects of "stripping" components off of the sarcomere at the earliest timepoint (for example alpha-actinin or Zasp proteins)?

      A comparison between isolated myofibrils and those from microdissected muscles (Supplementary Figure 3B, Figure 3C in the revised manuscript) shows that the isolation process does not alter the morphometric measurements of sarcomeres. Moreover, the measured myofibril width aligns well with what we expect based on the number of myofilaments observed in TEM cross-sections of myofibrils at 36 hours APF (Figure 3A, now Figure 4A in the revised manuscript), supporting the consistency of our model.

      Myofibrils at early timepoints do contain more than 4-12 sarcomeres in a line (they extend the full length of the myofiber), so it is possible they are breaking due to the detergent and mechanical disruption induced by the isolation method.

      The reviewer is correct - myofibrils likely span the full length of the myofiber from the onset of myofibrillogenesis. However, during the isolation of individual myofibrils, they often break, and even mature myofibrils typically fragment into pieces of about 300 µm in length (illustrated in Figure 1E, now Figure 2A in the revised manuscript). Importantly, our measurements show that this fragmentation does not affect the assessed sarcomere length or width (as shown in Supplementary Figure 3B, now Figure 3C in the revised manuscript).

      Line 312 - What does "stable association" mean in this context? The authors mention early timepoints lack stable association of alpha-Actinin or Zasp52, and they reference Fig. S4C, but this figure only shows 72h and 24 AE, not 36h and 48 h APF. Previous reports have seen localization of both alpha-Actinin and Zasp52, so presumably the detergent or mechanical isolation is stripping these components off of the isolated myofibrils up until 72h.

      In agreement with previous reports, we also detected both α-Actinin (as shown in former Supplementary Figure 3B, now Figure 3C) and Zasp52 in microdissected IFM starting from 36 hours APF. However, these markers were largely absent from the isolated myofibrils of young pupae (36 to 60 hours APF). By 60 hours APF, strong α-Actinin and Zasp52 staining became evident in isolated myofibrils, whereas dTitin epitopes were clearly detectable from the earliest time point examined. This indicates that some proteins, such as α-Actinin and Zasp52, can be lost during the isolation process, whereas others like dTitin are retained and this differential sensitivity appears to depend on developmental stage. A likely explanation is that α-Actinin and Zasp52 are recruited early to Z-bodies but are only fully incorporated as more mature Z-disks form between 48 and 60 hours APF. This incomplete incorporation at the earlier stages could account for their loss during the isolation process. This interpretation is supported by our morphological analysis of the Z-discs, as shown in the dSTORM dataset (former Figure 3B, B’’, now Figure 4C, E) and in longitudinal TEM sections (former Supplementary Figure 5B, now in Figure 6B). Because α-Actinin and Zasp52 are not detected in isolated myofibrils at 36 and 48 hours APF, they are not included in Figure S4C (Figure 5C in the revised manuscript). This is explained in the updated figure legend.

      This same type of issue comes up again in Lines 325-334, where the authors talk about 3E8 and MAC147. They state that 3E8 signal significantly declines in later stages and that MAC147 is not suitable to label myofibrils in young pupae, but they only show data from 72 APF and 24 AE (which looks to have decent staining for both 3E8 and MAC147). A clearer explanation here would be helpful.

      To put it simply: we used one myosin antibody to label the A-band in the IFM of 36h APF and 48h APF animals, and a different antibody for the 72h APF and 24h AE stages. In more detail: Myosin 3E8 is a monoclonal antibody targeting the myosin heavy chain and labels the entire length of mature thick filaments except for the bare zone (former Supplementary Figure 4D, now in Figure 5D), suggesting its epitope is near the head domain. As a result, we expect a uniform A-band staining - excluding the bare zone - which is exactly what we observe in the IFM of young pupae (36h APF and 48h APF; formerly Figure 3B, now Figure 4C in the revised manuscript). However, at 72h APF and 24h AE, Myosin 3E8 produces a different staining pattern: two narrow stripes flanking the bare zone and two broader, more diffuse stripes near the A/I band junction (former Supplementary Figure 4D, now Figure 5D). This change is likely due to restricted antigen accessibility at these later developmental stages - a common issue in the densely packed IFM - making this antibody unsuitable for reliably measuring thick filament length in these stages.

      MAC147 is another monoclonal antibody against Mhc that recognizes an epitope near the head domain. However, it only works reliably in more mature myofibrils (72h APF and 24h AE; formerly Figure 3B, now Figure 4C in the revised manuscript), likely due to its specificity for a particular Mhc isoform. This is why we do not include images from earlier developmental stages using this antibody. We added a revised, concise explanation in the main text for general readers, and provided a more detailed description for specialist readers in the legend of Supplementary Figure 4D (updated as Figure 5D in the revised manuscript).

      Figure 3B. The authors show the H, Z, and I lengths in B', B', and B' and discuss these lengths in the text (lines 305-320). It would also be nice to actually have the plots showing the measured/calculated lengths for thin and thick filaments. These are mentioned in the results, but I cannot find the plots in the figures and there is no panel reference.

      A summary table of the measured and calculated parameters is provided in Fig4SourceData (Fig7SourceData in the revised manuscript). However, following the reviewer’s suggestion, we also generated an additional plot (Supplementary Figure 5 in the revised manuscript) that displays the calculated thin and thick filament lengths.

      Line 400. Does the model in Figure 4 actually have molecular resolution as the authors claim? From these views, thick and thin filaments appear to be represented by cylindrical objects. Localization of specific molecules would require further modeling with individual proteins. Or do the authors mean localization from STORM imaging relative to the ends of the thick and/or thin filaments? The model itself is a useful contribution, but based on Figure 4, resolution of individual molecules is not evident.

      The reviewer is correct, and we fully agree that we do not present a molecular model of sarcomeres in this study, nor do we claim to. Instead, we present a myofilament-level model. Nevertheless, the scaled myofilament lattice model we introduce could serve as a geometric constraint when constructing supramolecular models of sarcomeres. As the reviewer rightly notes, implementing such an approach would require additional effort.

      The main Results section of the text is condensed into 4 figures. However, I found myself flipping back and forth between the main figures and the supplement continuously, especially parts of Supplemental Figures 1, 3, 4, and 5. With such large amounts of detail in the Results relying on the supplement, it may be worth considering reorganizing the main and supplemental figures, and having 7 main figures, to include important panels that are currently in the supplement (esp. Fig S1B, S1C, S1D, S3B, S4, S5).

      We found this a very useful suggestion, and we substantially reorganized the figures in the revised manuscript according to the reviewer's recommendations.

      Minor comments: On the plots in Fig. S1B, D, and F, it is hard to see the color of the dots because the red error bars are on top of them. Can the other distribution dots be tinted the correct color or the x-axis labels be added, so it is clear which dataset is which?

      We significantly enlarged the dots to enhance visual clarity.

      Line 142 needs a reference to Figure S1, Panel E, which shows the accuracy and precision measurements.

      The requested panel reference has now been included in the revised manuscript.

      Lines 198 - is this range from the above publications? Needs to be clearly cited.

      The range has indeed been estimated using measurements from the aforementioned publications, and this point is now further clarified in the revised text.

      Figure S3B is confusing - why do the blow-ups overlap both the top (presumably microdissected) and the bottom (presumably isolated) images? The identity of microdissected images should be labeled, as they are hard to see underneath of the blown-up images and the identity of individual image planes wasn't immediately obvious.

      We refined the panel structure of Figure S3B (Figure 3C in the revised manuscript) to enhance clarity as the reviewer suggested.

      Line 298. By "misaligned," do the authors mean the pointed ends are not uniformly anchored in the z-disc, leading to the wide z-disc measurements? At this early stage, I'm not sure "misaligned" is the right word - perhaps "were not yet aligned in register at the z-disc" or something similar.

      We revised the text for clarity. It now reads: At 36 hours APF, thin filaments had not yet aligned in perfect register at the Z-disc, with most measuring less than 560 nm in length - and exhibiting considerable variability.

      Figure S6 - spelling mistake in label of panel A, "sarcomer" should be "sarcomere"

      The typo is corrected.

      Line 487. Spelling "Zaps52" should be "Zasp52"

      The typo is corrected.

      Line 887. Spelling "Myofilement" should be "Myofilament"

      The typo is corrected.

      Line 946-947. In the legend for Supp. Fig. 3., the authors should specify which published datasets on sarcomere length are shown in the figure by including the references in the legend. Presumably the "isolated individual myofibrils" are the blue "this study" lines, leaving the "microdissected muscles" as the magenta "previous reports" on the figure. Without the reference, it is not clear if these are microdissected, isolated myofibrils, hemi-thorax sections, cryosections, or another preparation method for the "previous reports" data.

      The references have now been added to both the figure and its legend.

      **Referee Cross-commenting**

      I agree with the comments from the other reviewers. Many of the major themes are consistent across the reviews, including regarding the model, preparation methods, and the software tool.

      Reviewer #2 (Significance (Required)):

      Strengths: This manuscript is an important contribution to the field of sarcomere development. The authors use modern technologies to revisit variation in morphometric measurements in the literature, and they identify parameters that influence this variation. Notably, sex-specific differences, DLM versus DVM measurements, and mounting media are potential contributors to the variability. Combining TEM and STORM with a confocal timecourse of isolated myofibrils, they refine previously published values of sarcomere length and width, and add more comprehensive data for filament length, number and spacing. This highly accurate timecourse demonstrates continual growth of sarcomeres after 48 h APF, and corrects some inconsistencies from previous large-scale timecourse datasets. These data are very valuable to the field, especially Drosophila muscle biologists, and will serve as a comparative resource for future studies. Weaknesses: At early timepoints, loss of sarcomere components through mechanical or detergent-mediated artifacts may influence the authors' measurements. In addition, isolating myofibrils is not always the most ideal approach, as it loses information on myofiber structure as well as organization and structure of the myofibrils in vivo.

      We believe that the control experiments we presented here adequately demonstrate that sarcomere measurements are not affected by the myofibril isolation process at early timepoints (Figure 3C). Nevertheless, we certainly agree with the reviewer that isolated myofibrils alone cannot capture the entire complexity of muscle tissues, and additional approaches should also be applied in complex projects. Yet, we are confident that our approach offers the most reliable and efficient method for precise morphometric analysis of the sarcomeres, and although alone it is very unlikely to be sufficient to address all questions of a muscle development project, it can still be applied as a very useful and robust tool.

      The point regarding liquid versus hardening mounting media is valuable, but remains to be tested and validated with the diverse liquid and hardening media used by other labs.

      While it would not be feasible for us to test all liquid and hardening media used by others under all possible conditions, we tested the effect of Vectashield (the most commonly used liquid medium) as the reviewer suggested, and the results are now included in the manuscript. We think that this is a valuable extension of the list of materials and conditions we tested. We need to point out, however, that our primary goal was not to test as many conditions as possible (because the number of those conditions is virtually endless), but rather to raise awareness among colleagues that these variables can significantly impact the data obtained and affect their comparability.

      The IMA software seems to be designed specifically for analysis of isolated myofibrils, and it is unclear if it would work for other types of IFM preparations.

      As stated in the manuscript, IMA is a specialized tool designed for the analysis of individual myofibrils. While it can also process other types of IFM preparations in semi-automatic or manual modes, we believe these approaches compromise both efficiency and accuracy. This is further clarified in the revised manuscript.

      A last point is that TEM and STORM may not be available on a regular basis to many labs, hindering wide implementation of the approach used in this manuscript to generate very accurate and detailed measurements of sarcomere morphometrics.

      Regarding the availability of TEM and STORM, we acknowledge that these techniques are not universally accessible. However, this is precisely one major value of our work: our open-source software tool now allows researchers to generate valuable data using only a confocal microscope in combination with our published datasets.

      Audience: Scientists who study sarcomerogenesis or Drosophila muscle biology.

      My expertise: I study muscle development in the Drosophila model.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: This manuscript presents a computational tool to quantify sarcomere length and myofibril width of the Drosophila indirect flight muscles, including developmental samples. This tool was applied to confocal and STORM super-resolution images of isolated myofibrils from adult and developing flight muscles. Thick filament numbers per myofibril were counted during development of flight muscles. A myofilament model of developing flight muscle myofibrils is presented that remains speculative for the early developmental stages.

      Major comments: 1. The title of the manuscript appears unclear. What is a lattice model? Lattice is an ordered array. The filament array parameters for mature flight muscles were already measured. It appears that the authors speculate how this order might be generated during sarcomere assembly, which is not studied in this manuscript as it is limited to periodic arrays after 36h APF.

      As the reviewer correctly points out, a lattice refers to an ordered array - in the case of IFM sarcomeres, this includes both thin and thick filaments. Therefore, the phrase "myofilament lattice model of Drosophila flight muscle sarcomeres" specifically describes a model representing the spatial organization of these filament arrays within the sarcomere. To provide additional clarity for readers, we have revised the title to include more context. It now reads: Developmental Remodeling of Drosophila Flight Muscle Sarcomeres: A Scaled Myofilament Lattice Model Based on Multiscale Morphometrics

      To create a model of these arrays, three essential pieces of information are required:

      1) The length of the filaments,

      2) The number of filaments, and

      3) The relative position of the filaments.

      While some direct measurements are available in the literature, and others can be used to calculate the necessary values, the available data are often contradictory or simply inconsistent with each other (as described in our manuscript), making them unsuitable for constructing scaled models of the myofilament arrays. In contrast, here we present a comprehensive and consistent set of measurements that enabled us to build models not only of mature sarcomeres but also of sarcomeres at three other significant developmental time points.

      Regarding the mention of "sarcomere assembly" in line 37, we intended it to refer to the growth of the sarcomeres, not their initial formation. We do not speculate about sarcomere assembly anywhere in the text. In fact, we have clearly stated multiple times that our focus is on the growth of the IFM myofilament array during myofibrillogenesis. Nevertheless, to avoid confusion, we revised the phrase in line 37 to "sarcomere growth".

      The authors review the flight muscle sarcomere length literature and conclude it is variable because of imprecise measurements. Likely this is partially true, however, more importantly is that the sarcomere length and width changes during isolation methods of the myofibrils, as well as by various embedding methods, as the authors show here as well in Figure 1B-E.

      We dedicated two sections of the Results - “An automated method to accurately measure sarcomeric parameters” and “IFM sarcomere morphometrics are affected by sex, age, fiber type, and sample preparation” - to exploring potential sources of variability in published IFM sarcomere measurements. Based on these analyses, we conclude that such variability stems from both measurement imprecision and biological or technical factors, including sex, age, fiber type and, foremost, sample preparation. Because it is difficult to quantify the relative impact of each variable across published studies, we have refrained from speculating about the relative contribution of the different factors in the revised manuscript.

      Hence, I find the strong claims the authors make here surprising, while they are isolating the myofibrils. Hence, these myofibrils are ruptured at the ends, relaxed or contracted, depending on buffer choice and passive tension is released. On page 8, the authors correctly state that the embedding medium causes shrinkage of the myofibrils. While isolation is state of the art for electron microscopy techniques, other methods including sectioning or even whole mount preparation have been developed for high resolution microscopy of IFMs that avoid these artifacts. Unfortunately, this manuscript only uses isolated myofibrils that were fixed and then mechanically dissociated by pipetting. This method likely induces variations as seen by the large spread of sarcomere length reported in Figure 1C (2.8-3.9µm?) and even bigger spreads for myofibril widths. Are these also seen in tissue without dissections? Unfortunately, no comparison to intact flight muscles is reported with the here presented quantification tool. The sarcomere length spread in the developmental samples is even larger.

      The major issue raised in this paragraph is the use of isolated myofibrils versus intact flight muscle preparations. The reviewer claims that the latter might be superior because the isolated myofibrils are ruptured at their ends. Intact IFMs cannot be imaged in vivo by light microscopy because the adult fly cuticle is opaque. To visualize these muscles, one must open the thorax, but neither microdissection nor sectioning preserves them perfectly: even the cleanest longitudinal cuts sever some myofibrils, and dissection itself can damage the tissue. Although published images often show only the most pristine regions, the practice of selective cropping cannot be taken as a scientific argument. Here, by comparing sarcomere lengths measured in isolated myofibrils with those from whole-mount longitudinal DLM sections and microdissected IFM myofibers, we demonstrate that isolation does not alter sarcomere length (Figure 1E, now Figure 2A in the revised manuscript). As to myofibril width, it is determined by two parameters: the number of myofilaments and the spacing between them. In vivo filament spacing has been measured directly, and filament counts can be obtained from EM cross-sections of DLM fibers. Combining these values gives an expected in vivo myofibril diameter. While isolated myofibrils measure thinner than those in whole-mount or microdissected samples (Figure 1E, now Figure 2A in the revised manuscript), their diameter closely matches this in vivo estimate (see manuscript, lines 187–198). Therefore, we conclude that isolated myofibrils (even if it seems counterintuitive to this reviewer) are superior to whole-mount preparations for sarcomere measurements - and that is why we primarily rely on them here.
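
      To illustrate how a filament count and a filament spacing can be combined into an expected myofibril diameter, the following is a minimal geometric sketch. It is not the calculation used in the manuscript (which is described in lines 187–198 of the original submission). It assumes a perfectly regular hexagonal lattice of thick filaments and a circular myofibril cross-section, and the function name and the example input values are hypothetical placeholders rather than measured data.

```python
import math


def expected_myofibril_diameter_nm(n_thick_filaments, spacing_nm):
    """Estimate myofibril diameter from thick filament count and spacing.

    Assumptions (not taken from the manuscript):
    - thick filaments form a regular hexagonal lattice, so each filament
      occupies an area of (sqrt(3) / 2) * spacing^2;
    - the myofibril cross-section is a circle of the same total area.
    """
    area_per_filament = (math.sqrt(3) / 2.0) * spacing_nm ** 2
    cross_section_area = n_thick_filaments * area_per_filament
    return 2.0 * math.sqrt(cross_section_area / math.pi)


# Hypothetical placeholder inputs, for illustration only.
example_count = 1000
example_spacing_nm = 50.0
print(expected_myofibril_diameter_nm(example_count, example_spacing_nm))
```

      Under these assumptions the diameter scales linearly with the spacing and with the square root of the filament number, so a modest error in spacing affects the estimate more than a comparable relative error in the count.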

      Despite that, we certainly recognize that isolated myofibrils cannot recapitulate every aspect of an IFM fiber, and the need for whole-mount preparations during our IFM studies is not questioned by us.

      In addition to this general answer to the issues raised in the above paragraph, we would like to respond specifically to some of the reviewer's remarks:

      “Unfortunately, this manuscript only uses isolated myofibrils that were fixed and then mechanically dissociated by pipetting.”

      The statement that “this manuscript only uses isolated myofibrils” is false, as we used different preparation methods for initial comparisons (see Figure 1E, now Figure 2A in the revised manuscript). Additionally, contrary to the reviewer's assumption, the myofibrils were first dissociated and then fixed, not vice versa (as described in the Materials and Methods section).

      “This method likely induces variations as seen by the large spread of sarcomere length reported in Figure 1C (2.8-3.9µm?) and even bigger spreads for myofibril widths. Are these also seen in tissue without dissections?”

      This remark makes absolutely no sense, as we do not report sarcomere length values in Figure 1C at all. Even assuming that the reviewer meant to refer to Figure 1B, it remains a misunderstanding or a false statement, because that panel shows the variation found in published data (not in our current data), and this is clearly explained both in the figure legend and the main text. Regardless, the stated spread does not appear unusual. In the article by Spletter et al. (2018), the authors report a similar spread (2.576–3.542 µm) for sarcomere length in mature IFM using whole-mount DLM cross-sections. As to the second question, we do observe a comparable spread in other preparations as well (see Figure 1E, now Figure 2A in the revised manuscript), which again contradicts the (clearly false) assumption of the reviewer.

      “Unfortunately, no comparison to intact flight muscles is reported with the here presented quantification tool.”

      This is also a false statement, as we do report a comparison to whole-mount cross-sections, which we believe the reviewer considers “intact”, in Figure 1E (Figure 2A in the revised manuscript).

      “The sarcomere length spread in the developmental samples is even larger.”

      The spread is not larger at all than in previous reports, as clearly shown in Supplementary Figure 3A.

      The authors suggest that there are sex differences in sarcomere length and pupal development duration. This is potentially interesting, unfortunately they then use mixed sex samples to analyse sarcomeres during flight muscle development.

      In the revised manuscript, we now provide a more detailed description of a subtle post-eclosion difference in IFM sarcomere metrics between male and female Drosophila. We attribute this variation to the well-established observation that female pupae develop slightly faster than males, a difference that may persist until shortly after eclosion. Confirming this experimentally would require considerable effort with limited scientific benefit. Nonetheless, the subtle nature of this sex-linked variation reinforced our decision to include IFM sarcomeres from both male and female flies in our comprehensive developmental analysis.

      The IMA software tool lacks critical assessment of its performance compared to other tools and the validation presented is too limited. IMA seems to generate systematic errors, based on Fig S1E, as it does not report the ground truth. These have to be discussed and compared to available tools. The principles of fitting used in IMA seem well adapted to IFM myofibrils in low noise conditions, but may not be usable in other situations. This should be assessed and discussed.

      IMA is a specialized software tool developed to address a specific need, notably, to accurately and efficiently measure sarcomere length and myofibril diameter in individual IFM myofibril images labeled with both phalloidin and Z-disc markers. For our purposes, it remains the most suitable and reliable option, and we are confident that IMA outperforms all other available tools. To demonstrate this, we have included a table comparing the few alternatives (MyofibrilJ, SarcGraph, and sarcApp) capable of both measurements, which further supports our conclusion. Given IMA's focused application, extensive validation under artificially low signal-to-noise conditions is unnecessary. While IMA may introduce minor systematic errors (~0.01 µm for sarcomere length and ~0.03 µm for myofibril diameter), these are negligible errors relative to the limitations of the simulated ground truth data used for benchmarking. This point is now addressed in the manuscript.

      It is claimed that validation was achieved on simulated IFM images: do the authors rather mean simulated isolated IFM myofibril images? This is not quite the same in terms of algorithm complexity and this should be corrected if this is the case.

      Indeed, we used simulated individual IFM myofibril images, where both phalloidin labeling and Z-disc labeling are present. This is clearly shown in Supplementary Figure 1A, and stated in the text when first introduced: “we generated artificial images of IFM myofibrils with known dimensions, simulating the image formation process”.

      The authors need to revise their comparison to other tools. It is incomplete and seemingly incorrect. It should be clearly stated that IMA is limited to isolated myofibrils, which is a far easier segmentation task than what other tools can do, such as sarcApp (Neininger-Castro et al. 2023, PMID: 37921850). Defining the acronym would be valuable in that sense. The claim line 129-130 "none can adequately measure myofibril diameter from regular side view images" is unclear. What do the authors refer to as "side view images"? Sarc-Graph from Zhao et al 2021, PMID: 34613960, and sarcApp from Neininger-Castro et al. 2023 provide sarcomere width, in conditions that are very similar to what IMA does, e.g. on xy images based on the documentation provided on github. A performance comparison with these tools would be valuable. Does installation and use of IMA require computational skills?

      Motivated by the reviewer’s comments, we revised the section introducing IMA. However, we chose not to include an extensive comparison with other software tools, as this would divert the manuscript’s focus without impacting the main conclusions. Instead, we added a summary table highlighting the key requirements for analyzing IFM sarcomere morphometrics from Z-stacks of phalloidin- and Z-line-labeled individual myofibrils and compared the available tools accordingly. In our experience, most software tools are developed to address very specific problems, even those marketed as general-purpose solutions. Consequently, applying them beyond their intended scope often results in reduced efficiency and suboptimal performance.

      Although sarcApp was initially available as a free tool, one of its dependencies (PySimpleGUI 5) has since adopted a commercial license model. Using a trial version of PySimpleGUI 5, we evaluated sarcApp on our dataset. The software is limited to single-plane image input; hence, raw image stacks must be preprocessed into a suitable format, which is a time-consuming step. Furthermore, implementation requires basic programming proficiency, as parameter adjustments must be performed directly within the source code to accommodate dataset-specific configurations. Once appropriately configured, sarcApp reliably quantifies both sarcomere length and myofibril width with accuracy comparable to that of IMA. However, it lacks built-in diagnostic feedback or visualization tools to facilitate measurement verification or troubleshooting during batch processing.

      SarcGraph also supports only single-plane image inputs and requires prior image preprocessing. Additionally, images must be loaded manually one by one, which further reduces processing efficiency. Parameter optimization relies on direct code modification through a trial-and-error process, demanding a certain level of programming proficiency. Even with these adjustments, the software frequently introduces artifacts - such as Z-line splitting - when applied to our dataset. Even when segmentation is successful, sarcomere length is often overestimated, whereas myofibril diameter is consistently underestimated.

      In contrast, IMA was designed for ease of use and does not require any programming experience to install or operate. It can automatically handle raw microscopic image formats without the need for preprocessing. Segmentation is fully automated, with no requirement for parameter tuning. The tool provides visual feedback during both the segmentation and fitting steps, allowing users to confidently assess and validate the results. IMA produces accurate and precise measurements of sarcomere length and diameter. Batch processing is enabled by default, significantly improving efficiency when analyzing multiple images. Finally, contrary to the reviewer's statement, IMA is not limited to isolated myofibrils. It is optimized for isolated myofibrils (i.e. full performance is achieved on these samples), but it can also work on whole-mount preparations in semi-automatic and manual modes, which still allow precise measurements (with some reduction in processing efficiency).

      As to the minor comments, the acronym IMA was already defined in lines 541 and 917–918 of the original submission, as well as on the software’s GitHub page. Additionally, we replaced the phrase "side view images" with "longitudinal myofibril projections" to improve clarity.

      How do the authors know that the bright phalloidin signal visible at the Z-disc at 36h and 48h APF is due to actin filament overlap, as suggested? An alternative explanation would be more short actin filaments at the early Z-discs.

      It is widely accepted that the bright phalloidin signal at the Z-line in mature sarcomeres reflects actin filament overlap (e.g., Littlefield and Fowler, 2002; PMID: 11964243). Accordingly, in slightly stretched myofibrils, this bright signal diminishes, and in more significantly stretched myofibrils, a small gap appears (e.g., Kulke et al., 2001; PMID: 11535621). The width of this bright phalloidin signal corresponds to the electron-dense band seen in longitudinal EM sections (Figure 3B and Supplementary Figure 5B, now Figure 4B and Figure 6B in the revised manuscript) and matches the actin filament overlap observed in Z-disc cryo-EM reconstructions from other species (Yeganeh et al., 2023; Rusu et al., 2017), where individual thin filaments can be resolved. By extension, we interpret the bright phalloidin signals at the Z-discs observed at 36 h and 48 h APF as arising from similar actin filament overlaps, given their comparable width to the electron-dense Z-bodies described both in our study (Supplementary Figure 5B, now Figure 6B in the revised manuscript) and by Reedy and Beall (1993). While we cannot fully rule out the reviewer’s alternative interpretation, for the time being it remains a bold speculation without supporting evidence, and therefore we prefer to stay with the conventional view.

      The authors seem to doubt their own interpretation that actin filaments shrink when reading line 304 and following. This is obviously critical for the "model" presented.

      Contrary to what the reviewer implies, we certainly do not doubt our own interpretation. To avoid confusion, however, we revised the corresponding paragraph in the manuscript and provided more details on our explanation, and we also provide a brief overview of it here. Between 36 h and 48 h APF we observe a pronounced structural transition in the IFM sarcomeres. In EM cross-sections, the previously irregular myofilament lattice becomes organized into a regular hexagonal pattern (Figure 3A, now Figure 4A in the revised manuscript) with filament spacing typical of mature myofibrils (Supplementary Figure 5A, now Figure 6A in the revised manuscript). In longitudinal EM sections, the elongated, amorphous Z-bodies condense along the myofibril axis to form well-defined, adult-like Z-discs (Supplementary Figure 5B, now Figure 6B in the revised manuscript). Similarly, dSTORM imaging shows that the Z-disc associated D-Titin epitopes become more compact and organized during this period (Supplementary Figure 4E, now Figure 5E in the revised manuscript). The edges of the thick filament arrays also become more sharply defined, and the appearance of a distinct bare zone indicates the establishment of a regular register (Figure 3B, now Figure 4B in the revised manuscript). Assuming that a similar reorganization occurs within the thin filament array, the apparent length of the thin filament array would decrease, not due to shortening of individual filaments but rather due to improved alignment. Although we cannot directly resolve single thin filaments, this reorganization offers the most plausible explanation for the observed change.

      Minor comments: 1. Figure S1B is not called out in the text.

      The reviewer might have missed this, but in fact, it is explicitly called out in line 181.

      Fig. 1: Please state whenever images are simulations?

      We appreciate the reviewer’s observation that the simulated IFM myofibril images are indistinguishable from the real ones, as this confirms the adequacy of these images for testing our software tool. However, this is already clearly indicated: Figure 1B features simulated images, as noted in the figure legend (line 824), and Supplementary Figure 1A similarly shows simulated images, as stated both in the legend (line 886) and in the figure.

      Fig. 2: Length-width correlation - please provide individual points color-coded by time point?

      As suggested by the reviewer, we generated a plot with the individual points color-coded by time.

      "newly eclosed males and females, we observed that males have slightly shorter sarcomeres and narrower myofibrils". Please provide a statistical test supporting the difference.

      In the revised manuscript, we compared sarcomere length and myofibril width between males and females from 0 to 96 hours AE using a two-way ANOVA with Sidak’s multiple comparisons test. We expanded our description of these observations in the main text, and details of the statistical analysis are now included in the revised figure legend (Figure 1E). Briefly, newly eclosed males showed slightly shorter sarcomeres than females - a consistent but non-significant trend (p = 0.9846) - which resolved by 12 h AE, with sarcomere lengths remaining similar thereafter (p = 0.1533; Figure 1E). In contrast, myofibril width was significantly narrower in the newly eclosed males (p = 0.0374), but this difference disappeared between 24 and 48 h AE as myofibrils expanded in diameter during post-eclosion development (p

      Were statistical tests performed using animals as sample numbers? Please clarify in the images what are animal and what are sarcomere numbers.

      Following standard guidelines, statistical tests were performed using the means of independent experiments, as noted in the figure legends. For each experiment, we used approximately 6 animals, and this information is now included in the Materials and Methods section.

      mef2-Gal4 should be spelled Mef2-GAL4 according to Flybase.

      This has been corrected in the revised text and figures.

      Are the images shown in Figure 2B representative? 96h AE appears thicker than 24h AE but the graph reports no difference.

      We aimed to show representative images; however, in the case of 96h APF we may have selected an unrepresentative example. We have now replaced the image with a more appropriate one.

      The authors only found Zasp52 and alpha-Actinin at the Z-discs from 72h APF onwards, which is different to what others have reported.

      Consistent with previous reports, we detected both α-Actinin (see Supplementary Figure 3B, now Figure 3C in the revised manuscript) and Zasp52 in microdissected IFMs as early as 36 hours APF. However, these markers were largely absent in isolated myofibrils from the early pupal stages (36–60 hours APF). By 60 hours APF, strong α-Actinin and Zasp52 signals were clearly visible in isolated myofibrils (the closest timepoint captured by dSTORM is 72h APF). As discussed in the manuscript, a likely explanation is that α-Actinin and Zasp52 are recruited to developing Z-bodies early on but are only fully incorporated into mature Z-discs between 48 and 60 hours APF. Their incomplete integration at earlier stages may lead to their loss during the isolation procedure.

      Thick filament length during development has also been estimated by Orfanos and Sparrow, which should be cited (PMID: 23178940)

      Contrary to the reviewer’s claim, the article 'Myosin isoform switching during assembly of the Drosophila flight muscle thick filament lattice' does not provide any measurements or estimates of thick filament length; it only includes a schematic illustration where the length of the thick filaments is not based on empirical data.

      **Referee Cross-commenting**

      I also agree with my colleagues' comments, which are largely consistent.

      Reviewer #3 (Significance (Required)):

      This paper introduces a tool to measure sarcomere length. Easy to use tools that do this as well already exist. The tool can also measure sarcomere width, which it claims as a unique point, which is not the case, see above comment.

      We are aware that other tools exist to measure sarcomere parameters (and we did not claim otherwise in our manuscript); nevertheless, we need to emphasize that, based on our comparisons, IMA is superior to all three alternatives. Three software tools could, in principle, be used to measure both sarcomere length and myofibril diameter: MyofibrilJ, SarcGraph, and sarcApp. However, two of them - MyofibrilJ and SarcGraph - consistently under- or overestimate these values. The only tool capable of performing these measurements reliably, sarcApp, is no longer freely available, requires programming expertise, and does not support raw image file formats, making it difficult to use in practice (see above comments for more details). In contrast, IMA is user-friendly and does not require any programming expertise to install or operate. It can automatically process raw microscopic image formats without the need for preprocessing. Segmentation is fully automated, and no parameter tuning is necessary. The tool offers visual feedback on both the segmentation and fitting processes, enabling users to validate results with confidence. IMA delivers accurate and precise measurements of sarcomere length and diameter. Additionally, batch processing is enabled by default, significantly enhancing workflow efficiency.

      This manuscript shows that, depending on the isolation and embedding media, sarcomere and myofibril width changes and hence artifacts can be introduced. While this is not surprising, it has not been well controlled in a number of previous publications.

      Furthermore, this paper measures sarcomere length and width during flight muscle development and consolidates what was already known from previous publications. Sarcomeres are added until 48 h APF, then they grow in diameter. Despite strong claims in the text, I do not see any significant novel findings how sarcomeres grow in length or width or any significant deviations from what has been published before. This is even documented in the supplementary graphs by comparing to published data. It is close to identical.

      The overall process has been quantitatively described in four previous studies (Reedy and Beall, 1993, Orfanos et al., 2015, Spletter et al., 2018, Nikonova et al., 2024). While there is general agreement on the pattern of sarcomere development, significant discrepancies exist among these datasets; differences that become particularly problematic when attempting to build structural models. More specifically: Reedy and Beall (1993) report substantially shorter sarcomeres compared to all other datasets, including ours. This discrepancy likely stems from two factors: (i) their use of longitudinal EM sections, where sample preparation is known to cause considerable tissue shrinkage; and (ii) the maintenance of their flies at 23 °C, a temperature that clearly delays development relative to the more commonly used 25 °C. Interestingly, Spletter et al. (2018) and Nikonova et al. (2024) conducted their experiments at 27 °C, which also deviates from standard conditions and may complicate comparisons. Orfanos et al. (2015) suggested that mature sarcomere length is reached by approximately 88 hours after puparium formation (APF). In contrast, our measurements show that sarcomeres continue to elongate beyond this point, reaching mature length between 12 and 24 hours post-eclosion. All four earlier studies report a mature sarcomere length around 3.2-3.3 µm, only slightly longer than the ~3.2 µm length of thick filaments (Katzemich et al., 2012; Gasek et al., 2016). This would imply an I-band length below ~100 nm, which is an implausibly short distance. In contrast, our data, along with several recent studies (González-Morales et al., 2019; Deng et al., 2021; Dhanyasi et al., 2020; DeAguero et al., 2019), support a mature sarcomere length of approximately 3.45 µm, placing the length of the I-band at around 250 nm. 
This estimate is more consistent with high-resolution structural observations from longitudinal EM sections and fluorescent nanoscopy (Szikora et al., 2020; Schueder et al., 2023). Although Reedy and Beall (1993) provide limited data on myofibril diameter during myofibrillogenesis, a more detailed quantitative analysis is presented by Spletter et al. (2018) and by Nikonova et al. (2024). Interestingly, Spletter et al. report two separate datasets - one based on longitudinal sections and another on cross-sections of DLM fibers. While the measurements are consistent during early pupal stages, they diverge significantly in mature IFMs (1.116 ± 0.1025 µm vs. 1.428 ± 0.0995 µm), a discrepancy that is not addressed in their publication. Nikonova et al. (2024) report even narrower myofibril widths (0.9887 ± 0.1273 µm). Moreover, the reported diameters of early myofibrils in all three datasets are nearly twice as large as those reported by Reedy and Beall (1993) and in our own measurements, directly contradicting the reviewer's claim that the values are “close to identical.” Finally, our data clearly demonstrate that both the length and diameter of IFM sarcomeres reach a plateau in young adults, which is a key developmental feature not examined in previous studies.
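
      The I-band estimates quoted above follow from simple arithmetic. As a minimal sketch, assuming the I-band length per sarcomere is just the sarcomere length minus the thick filament length (a simplification; the function name is illustrative and not part of IMA or the manuscript), the two cases can be compared as follows:

```python
def i_band_length_nm(sarcomere_um: float, thick_filament_um: float) -> int:
    """Return the I-band length per sarcomere in nanometres.

    Assumption (illustrative only): I-band length equals sarcomere
    length minus thick filament length, both given in micrometres.
    """
    return round((sarcomere_um - thick_filament_um) * 1000)


THICK_FILAMENT_UM = 3.2  # thick filament length cited in the response

# Upper end of the 3.2-3.3 um range reported by the earlier studies
earlier = i_band_length_nm(3.3, THICK_FILAMENT_UM)

# Mature sarcomere length of ~3.45 um supported by the present data
present = i_band_length_nm(3.45, THICK_FILAMENT_UM)

print(f"Earlier studies: I-band <= {earlier} nm")
print(f"Present data:    I-band ~ {present} nm")
```

      Under this assumption, the earlier range leaves at most about 100 nm for the I-band, whereas a 3.45 µm sarcomere leaves about 250 nm.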

      In summary, we did not and we do not intend to claim that our conclusions are novel as to the general mechanisms of myofibril and sarcomere growth. Rather, our contribution lies in providing a high-precision, robust analysis of the growth process using a state-of-the-art toolkit, resulting in a comprehensive description that aligns with structural data obtained from TEM and dSTORM. We therefore believe that expert readers will recognize numerous valuable aspects of our approaches that will advance research in the field.

      Counting the total number of thick filaments during myofibril development is nice, however, this also has been done (REEDY, M. C. & BEALL, C. 1993, PMID: 8253277). In this old study, the authors reported the amount of filament across one myofibril. How does this compare to the new data here counting all filaments? Unfortunately, this is not discussed.

      Indeed, the study by Reedy and Beall (1993) was primarily based on longitudinal DLM sections, which were used to estimate myofibril width and to count the number of thick filaments in these lateral-view images (e.g., ~15 thick filaments wide at 75 hours APF), but total thick filament numbers were not provided. While such data could theoretically be used to estimate the number of myofilaments per myofibril, these estimations would depend on the unverified assumption that the section includes the full width of the myofibril. Additionally, the study did not provide standard deviations or the number of measurements, limiting the interpretability and reproducibility of their findings. These points highlight the need for a more rigorous and quantitative approach. For these reasons, we chose to quantify myofilament number using cross-sections, providing more accurate and reliable assessments.

      Besides the difference between the lateral versus cross sections, a direct comparison of our studies is further complicated by differences in the developmental time points and experimental conditions used. Reedy and Beall (1993) report data from pupae aged 42, 60, 75 and 100 hours, as well as from adults, whereas we present data from 36, 48, and 72 hours APF, and from 24 hours after eclosion, which corresponds to approximately 124 hours APF. Moreover, their experiments were carried out at 23 °C, a temperature that somewhat slows down pupal development and results in adult eclosion at around 112 hours APF, as stated in their study. In contrast, our experiments were carried out at the more commonly used 25 °C, where adults typically emerge around 100 hours APF.

      Collectively, these differences prevented meaningful comparisons between the two datasets, and therefore we preferred to avoid lengthy discussions on this issue.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2025-02953

      Corresponding author(s): Andreas Villunger

      1. General Statements [optional]

      We would like to thank the reviewers for their constructive input and overall support. We appreciate the opportunity to provide a provisional revision plan, as outlined here, and are happy to engage in additional communication with journal editors via video call, in case further clarifications are needed.

      2. Description of the planned revisions

      Insert here a point-by-point reply that explains what revisions, additional experimentations and analyses are planned to address the points raised by the referees.

      Reviewer #1

      Evidence, reproducibility and clarity

      Summary: This manuscript by Leone et al describes the role of the PIDDosome in cardiomyocytes. Using a series of whole body and cardiomyocyte specific knockouts, the authors show that the PIDDosome maintains correct ploidy in these cells. It achieves this through inducing cell cycle arrest in cardiomyocytes in a p53 dependent manner. Despite this effect on ploidy, PIDDosome-deficient hearts show no structural or functional defects. Statistics and rigor appear to be adequate.

      We thank this referee for taking the time to evaluate our work and for their valuable comments. We assume that the reviewer mistakenly states that the phenomenon we describe depends on p53. As outlined in the abstract and throughout the manuscript, the effect is independent of p53, but may additionally still involve p21, acting along or parallel to the PIDDosome.

      Major comments: 1. Figure 1 uses fluorescent intensity of a nuclear stain to determine ploidy per nucleus and they further separate the results into mononucleated, binucleated or multinucleated cells. It is hard to know how to interpret these results without further information or controls. Is there a good positive control that can be used to help to show whether this assay is quantitative? The differences are larger with the Raidd and caspase-2 knockouts than with the Pidd knockouts but this is not addressed.

      We appreciate this concern. Regarding a "good positive control", we follow the state of the art in the cardiomyocyte field: studies by the Evans (PMID: 36622904), Kuhn (PMID: 32109383), Bergmann (PMID: 26544945) and Patterson labs (PMID: 28783163, 36912240) all use the identical approach to discriminate 2n from 4n nuclei in microscopy images at the cellular level. The fact that the majority of rodent CM nuclei are indeed diploid (PMID: 31175264, 31585517 and 32078450) and that a large number of nuclei have been evaluated to assess their mean fluorescence intensity (MFI) reduces the risk of a systematic bias in our analysis. Moreover, we have used an orthogonal approach that is indeed quantitative to define DNA content, i.e., flow cytometry-based evaluation of DNA content in isolated CM nuclei (Fig. 1C). We are hence confident that our assays are quantitative.

      Regarding the fact that loss of Pidd1 causes a more subtle phenotype, we can offer to discuss this in light of the fact that Pidd1 has additional functions outside the PIDDosome (PMID: 35343572), and that we made similar observations when analyzing ploidy in hepatocytes (PMID: 31983631). Given that all components of the PIDDosome show a similar phenotype, and that this phenotype is mimicked by loss of the protein that connects PIDD1 and centrosomes, ANKRD26 (Fig. 4a), we are confident that this biological variation in our analysis is not affecting our conclusions.

      On line 459 the authors state that the increase in polyploidy in PIDDosome knockouts occurs in adulthood but this is not directly tested. In fact, in the next section the polyploidy is assessed in early postnatal development. This statement should be explained or removed.

      We see that we have made an unclear statement here. In fact, we first noted increases in ploidy in the adult heart and then defined the time window in development when this happens. This sentence will be rephrased.

      In Figure 4. The authors obtained RNAseq data for P1, P7 and P14 but only show the differences with and without caspase-2 at P7. Given that the differences in ploidy are more significant at P14 (Fig 3D), all the comparisons should be shown along with analysis of whether the same genes/gene families are altered in the absence of caspase-2.

      The reason why we focus on postnatal day 7 (P7) is that data from Alkass et al. (PMID: 26544945) and other labs (PMID: 31175264) document that the initial wave of binucleation peaks on this day. Hence, we hypothesized that the PIDDosome must be active in most CM, which aligns well with the increased mRNA levels of all of its components (Figure 3). Interestingly, it seems that its action is tightly regulated, as mRNA levels of PIDDosome components drop on P10, suggesting PIDDosome shut-down or downregulation. Similar findings have been noted in the liver (PMID: 31983631). Alkass and colleagues also show that very few CMs enter another round of DNA synthesis between P7 and P14, and hence possible transcriptome changes in the absence of the PIDDosome will be strongly diluted.

      Please note that on P1, no difference between genotypes is to be expected, as all CM are mononucleated diploids and cytokinesis competent, as previously demonstrated (PMID: 26544945). Moreover, PIDDosome expression levels are extremely low (Fig. 3A). In addition, on P14 the ploidy phenotype observed in PIDDosome knockout mice reaches its maximum and ploidy increases are comparable to adult tissue. Thus, at this time the trigger for PIDDosome activation (cytokinesis failure) is no longer observed, as the majority of CMs are post-mitotic (PMID: 26247711). As such, the impact of PIDDosome activation on the P14 transcriptome is most likely negligible. However, if desired, we can expand our bioinformatics analysis summarizing findings made related to DEGs over time in wt animals by comparing genotypes also on day 1 and day 14. In light of the above, the comparison between genotypes on P7 still appears to be the most meaningful one.

      Some validation of the RNAseq and/or proteomics results would be an important addition to this study

      We agree with this notion and propose to validate key candidates related to cardiomyocyte proliferation and polyploidization, some of which we found to be differentially expressed at the mRNA level on day 7 in the RNAseq data (e.g., p21, Foxm1, Kif18a, Lin37 and others).

      Regarding the proteomics results, we face the challenge that we can only try to confirm in silico, using DeepCleave, whether candidate proteins are likely caspase substrates, and potentially pick one or two candidates linked to CM differentiation for further analysis in vitro and in heterologous cell-based assays (e.g., 293T cells), as no bona fide ventricular cardiomyocyte cell lines exist. Primary postnatal CMs are extremely difficult to transfect, and they neither proliferate without drug treatment nor fail cytokinesis ex vivo.

      Figure 4D: the authors make the conclusion that p21 is downstream of PIDD (et p53 independent). However, this is not supported by the data because the increase in 4N cells/decrease in 2N cells, although statistically significant, is nowhere near that of caspase-2 KO and caspase-2/p21 KO. Statistics should also compare p32KO with c2KO. In the absence of any other data, the more likely conclusion is that p21 is not involved.

      We agree that the findings related to the impact seen upon loss of p21 suggest that it is not the only effector involved in ploidy control, and it may not even be an effector engaged by caspase-2, as C2/p21 DKO mice have an even higher ploidy increase, albeit not statistically significant. However, it is important to highlight that p21 (Cdkn1a) was found to be downregulated in our transcriptomic analysis, suggesting an involvement in the caspase-2 cascade. We are happy to highlight this when presenting the results and in the discussion.

      We assume that this referee refers to p73 KO data that should be compared to Casp2 KO data (the comment could be read as p73 or p53, but we already compare the latter side by side with Casp2 in Fig. 4). As p73 KO mice were not found to be viable beyond day 7 (our attempt to find animals on day 10 failed, in line with published literature (PMID: 24500610, 10716451)), we can only offer to compare this data set to the data presented in Figure 3C, where we have analyzed ploidy increases on day 7 from wt and PIDDosome mutant mice. This re-analysis will show that only Caspase-2 mutant mice display a significant ploidy increase on P7 when compared to wt or p73 mutant animals, while no differences are noted between wt and p73 mutant mice (to be included in new Suppl. Fig. 3C).

      Minor comments: Suggest moving Figure 4A to Figure 3 as it seems to fit better there based on the citation of this figure in the text

      We can see some benefit in this recommendation and have now included panel 4A in an updated version of Figure 3.

      Recommend enhancing the brightness of microscopy images in Figure 1E and 2D

      We will try to improve the image quality; the low brightness may have been due to PDF conversion.

      Significance

      This study provides interesting information for the role of the PIDDosome in protecting from polyploidy and adds to the body of work by this same group studying this pathway in the liver.

      The main weakness in terms of significance is the lack of a phenotype in the hearts of these animals. Therefore, it is clear that ploidy (or at least PIDDosome dependent ploidy) has minimal impact on cardiac development.

      We respectfully disagree with the comment that the lack of impact on cardiac function constitutes a weakness of our findings. Several studies on ploidy control in the liver (PMID: 34228992), but importantly also in the heart (PMID: 36622904), have failed to document a clear impact of increased ploidy on organ function. This does not imply insignificance, but rather that the context in which this becomes relevant may not yet have been identified. We are happy to expand on this in our discussion.

      The authors mention that they have not tried giving these mice a myocardial infarct (MI) or inducing any other type of cardiac damage. Although it is understood that these experiments are likely outside of the scope of the present study, without this information the impact of this study is moderate. I recommend expanding the discussion to provide a more in-depth possible rationale as to why ploidy perturbations do not lead to structural changes like in the liver.

      Despite this, the insights to the pathway itself are interesting to investigators in the caspase-2 field if a little underdeveloped, especially concerning the role of p21.

      My expertise is in cell death and caspase biology (especially caspase-2). I have sufficient expertise to evaluate all parts of this paper.

      As mentioned above, we will amend our conclusions on p21 in light of potential findings made when validating DEG candidates.

      We hope that the changes and amendments proposed here will be satisfactory to this referee to recommend publication of a revised manuscript.

      Reviewer #2

      Evidence, reproducibility and clarity:

      Summary:

      In this study, the authors investigated the role of the PIDDosome during cardiomyocyte polyploidization. The PIDDosome is a multi-protein complex activating the endopeptidase Caspase-2, and has been shown to be involved in eliminating cells with extra centrosomes or in response to genotoxic stress (Burigotto & Fava, 2021, Sladky and Villunger, 2020). In both cases, the PIDDosome is recruited in an ANKRD26-dependent manner at the centrosomes leading to p53 stabilization and cell death (Burigotto & Fava, 2021; Evans et al., 2020; Burigotto et al., 2021).

      Here, by studying mouse cardiomyocyte differentiation, the authors showed that PIDDosome is imposing ploidy restriction during cardiomyocyte differentiation. Importantly, in contrast to a previous report in the liver (Sladky et al., 2020), they showed that PIDDosome acts in a p53-independent manner in cardiomyocytes. Indeed, they suggested that PIDDosome controls ploidy in cardiomyocytes through p21 activation.

      We want to thank this reviewer for the time taken to evaluate our work and provide critical feedback that will help to improve our revised manuscript.

      Major comments:

      In general the conclusions of the authors are well supported by the experiments. However, I would suggest the following experiments/analysis to strengthen the paper:

      The authors should improve the Figure 1 to help the readers who are not familiar with cardiomyocyte polyploidization. For instance, I would suggest to add a scheme to summarize cardiomyocyte polyploidization (in terms of nuclear size, mono vs multi and so on).

      We agree that a visual summary of the postnatal timing of CM polyploidization will be helpful for the generalist not familiar with the topic and have added a scheme, adapted from a study by Alkass et al. (PMID: 26544945), who elegantly defined the timing of this process during postnatal mouse life (now Fig. 1A).

      Based on the images they presented in 1B, the authors should also measure the nuclear area or volume in the different conditions in which components of the PIDDosome were depleted. Indeed, these two parameters should be easier to conceptualize for the readers (instead of the fluorescence nuclear intensity). This could help to understand if the nuclear size is maintained between the different conditions and if this is comparable between mono, bi or multinucleated cardiomyocytes.

      We have acquired this data and it can be used to provide additional information on nuclear area and/or volume. We propose to focus on re-analyzing data from wt, Casp2 and XMLC2CRE/Casp2f/f mice. The additional information can be included in Figures 1 & 2, respectively.

      In Figure 2A, the authors presented cross sections of hearts from animals showing that PIDDosome depletion has no effect on heart size. This is a surprising result since cardiomyocytes have higher ploidy levels and this could have an effect on their function. Given the importance of this observation, the authors should present a quantification of the heart size in the different conditions shown in Figure 2A.

      We agree with this comment. We can measure the heart vs. body weight ratio or tibia length in adult Casp2-/- vs. WT (3 month old) in order to indirectly evaluate possible increases in CM size linked to increased ploidy.

      Also, the percentage of cardiomyocytes presenting higher levels of ploidy seems quite low. The authors should discuss this point. In particular because this could explain the absence of consequences on heart size and function at steady state.

      We agree with this conclusion and will expand on this in our discussion. It is important to note that, as opposed to findings made in the liver (PMID: 31983631), genetic manipulation of ploidy regulators such as E2f7/8 (PMID: 36622904) only led to modest changes in CM ploidy, suggesting either that only a small bandwidth is compatible with normal heart function, or that additional mechanisms exist that take control when the thresholds set by the PIDDosome or E2f7/8 are exceeded. These mechanisms could involve Cyclin G (PMID: 20360255) or TNNI3K (PMID: 31589606). Importantly, a recent publication has shown that overexpression of Plk1(T210D) and Ect2 from birth causes increased heart weight coupled with a minor decrease in CM size. These mice die prematurely (PMID: 39912233), suggesting that CM polyploidization is a tightly regulated process controlled by several independent mechanisms during heart development.

      In Figure 2D, the authors measured the cardiomyocyte cross-sectional area and concluded that removing PIDDosome components has no effect on cardiomyocyte cell size. Since it has been shown that ploidy increase is normally associated with an increase in cell area, the authors should measure cell area of cardiomyocytes analyzed in Figure 1B. It could be then interesting to establish a correlation with nuclear area and the mono, bi or multinucleated status. This will strengthen the results showing that ploidy increases without affecting cell area.

      Indeed, studies in PIDDosome deficient livers suggest that the tissue contains fewer but larger cells (PMID: 31983631). As opposed to the liver, the percentage of cardiomyocytes presenting higher levels of ploidy is relatively low. Thus, a possible increase in CM size in PIDDosome deficient mice may be masked in heart cross-sections. In order to better correlate ploidy with cell size, we propose to reanalyze the microscopy images used to extract the data displayed in Fig. 1D. We may run into the problem, though, that the number of cells acquired may become limiting to achieve sufficient statistical power. In this case we could pool data from different PIDDosome mutant CM to increase statistical power. Again, we propose to initially prioritize wt vs. Casp2 vs. XMLC2/Casp2f/f mice. In addition, we can offer to quantify heart to body weight ratio or tibia length as an additional read-out (see answer to a previous reviewer comment).

      The authors should discuss the fact that PIDDosome depletion leads only to a mild increase in ploidy levels (4N) in a small percentage of cardiomyocytes. If the PIDDosome is controlling ploidy, one could expect that removing it should lead to a drastic increase in the ploidy levels. Is PIDDosome depletion leading to cell death in some cardiomyocytes? The authors should discuss this point in the discussion or if relevant show a staining with an apoptosis marker. Is another mechanism compensating to prevent higher ploidy levels in cardiomyocytes?

      These are valid thoughts, some of which we contemplated before. In part, we have addressed them in our response to Reviewer #1, above, discussing similar findings made in E2f7/8 deficient hearts (PMID: 36622904) or Cyclin G overexpressing hearts (PMID: 20360255), where also only modest changes in ploidy were achieved. Together, these observations suggest either that alternative control mechanisms are able to act, or that tolerance towards larger shifts in ploidy is limited because such shifts are incompatible with proper cell function and survival. Towards this end, we can offer to test whether we find increased signs of cell death in PIDDosome mutant hearts by TUNEL staining of histological sections. Of note, we did not find evidence for such a phenomenon in the liver (PMID: 31983631).

      Even if the authors presented RNAseq data suggesting that the PIDDosome is activated during cardiomyocyte differentiation, they should clearly demonstrate this point to strengthen the message of the paper. Indeed, the conclusions are based on the absence of PIDDosome components triggering higher ploidy in cardiomyocytes. However, we don't know whether (and when) the PIDDosome is activated during cardiomyocyte differentiation to control their ploidy levels. I would suggest to analyze PIDDosome activation markers by immunofluorescence in cardiomyocytes at different developmental stages.

      We agree with this referee that direct proof of PIDDosome activation would be helpful and that we only infer from loss-of-function phenotypes when and where the PIDDosome becomes activated. However, several technical issues prevent us from collecting more direct evidence of PIDDosome activation in the developing heart. 1) Polyploidization in heart CM appears to happen gradually from day 3 on, with a peak at day 7 (PMID: 26544945). Hence, this is not a synchronous process, where we could pinpoint simultaneous activation of the PIDDosome in all cells at the same time, which would facilitate biochemical analysis, e.g., by western blotting for signs of Caspase-2 activation (i.e., the loss of its pro-form, PMID: 28130345). 2) Our most reliable readout, MDM2 cleavage by caspase-2 giving rise to specific fragments detectable by western blotting, is not applicable to mouse tissue, as the antibody we use only detects human MDM2 (PMID: 28130345) and no other MDM2 Ab we tested gave satisfactory results. Independent of that, 3) we do not see involvement of p53 in CM ploidy control (arguing against a role of MDM2).

      As such, we can only offer to look at extra centrosome clustering in postnatal binucleated CM (as also suggested further below), as a putative trigger for PIDDosome activation. However, this has been published by the first author of this study before (PMID: 31301302). Given that we have made the significant effort to time-resolve the increase in ploidy in postnatal mice (please note that several hearts needed to be pooled for each time point, analyzed in multiple biological replicates), we think that our conclusions are well-justified based on the genetic data provided.

      Concerning the methods, the authors must add the references for each product they used and not only the origin. When relevant, the RRID should be indicated. Without this information the method and the data cannot be reproduced.

      We will update this information where relevant to reproduce our results.

      Minor comments:

      In general, the text and the figures are clear. Nevertheless, I would suggest the following changes:

      • Figures 1B, 2B and 2C: the y-axis must start at 0.

      We will adapt the axes accordingly.

      Figure 4A: The authors should stain centrosomes in cardiomyocytes. This should strengthen the conclusion drawn by the authors based on the results obtained in mice depleted for ANKRD26. Indeed, for the moment they are insufficient to conclude about the role of the centrosomes. The authors should show that centrosomes cluster in cardiomyocytes (a condition necessary for PIDDosome activation in polyploid cells) and if possible that components of the PIDDosome are recruited there.

      This point is well taken and addressed in part above. Clustering of extra centrosomes has been documented and published by the first author of this study in rat polyploid cardiomyocytes (PMID; cited). We can offer to show clustering of centrosomes in mouse CM isolated from day 7 hearts, but while PIDD1 can be detected well in MEF, we repeatedly failed to stain for PIDD1 in primary CMs.

      Figure 4F: I would suggest to modify the working model to emphasize more the differences between WT and PIDDosome KO.

      We will aim to improve this cartoon/graphical abstract.

      The prior studies are referenced appropriately.

      Reviewer #2 (Significance (Required)):

      How polyploid cells control their ploidy levels during differentiation remains poorly understood. The data presented here thus represent an advance concerning this question. The current model of PIDDosome activation relies on the presence of extra centrosomes that drive the ANKRD26-dependent recruitment of the PIDDosome. Then, Caspase 2 is activated, leading to a p53-p21 dependent cell cycle arrest (Burigotto & Fava, 2021, Sladky and Villunger, 2020; Janssens & Tinel, 2012; Evans et al., 2020; Burigotto et al., 2021). In this study, the authors showed that a similar pathway takes place during cardiomyocyte differentiation to control ploidy levels. These data are reminiscent of previous work showing PIDDosome involvement during hepatocyte polyploidization (Sladky et al. 2020). Together, these data highlight the prominent role of the PIDDosome complex in controlling ploidy levels in a physiological context. Importantly, this study identified that the classical p53-dependent cell cycle arrest described after PIDDosome activation is not involved here. Instead, the data established that, independently of p53, p21 contributes to the control of cardiomyocyte ploidy. In consequence, this study extends the initial pathway associated with PIDDosome activation and suggests that other mechanisms could take place to restrain cell proliferation upon PIDDosome activation. Overall, this makes this paper significant and of interest for the following fields: polyploidy, heart/cardiomyocyte development and PIDDosome.

      My field of expertise includes polyploidy, cell cycle and genetic instability.

      We thank this reviewer for the time taken and the positive feedback provided.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.

      N/A

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

      • As outlined above, limited tools are available to validate putative caspase-2 substrates, identified in proteomics analysis, in an impactful manner.
      • Also, as discussed above, we deem myocardial infarction experiments in mice unsuitable to improve our work, as in all likelihood they will yield negative results.
    2. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Manuscript number: RC-2025-02953

      Corresponding author(s): Andreas Villunger

      [The “revision plan” should delineate the revisions that authors intend to carry out in response to the points raised by the referees. It also provides the authors with the opportunity to explain their view of the paper and of the referee reports.

      The document is important for the editors of affiliate journals when they make a first decision on the transferred manuscript. It will also be useful to readers of the reprint and help them to obtain a balanced view of the paper.

      If you wish to submit a full revision, please use our "Full Revision" template. It is important to use the appropriate template to clearly inform the editors of your intentions.]

      1. General Statements [optional]

      We would like to thank the reviewers for their constructive input and overall support. We appreciate the opportunity to provide a provisional revision plan, as outlined here, and are happy to engage in additional communication with journal editors via video call, in case further clarifications are needed.

      2. Description of the planned revisions

      Insert here a point-by-point reply that explains what revisions, additional experimentations and analyses are planned to address the points raised by the referees.

      Reviewer #1

      Evidence, reproducibility and clarity

      Summary: This manuscript by Leone et al describes the role of the PIDDosome in cardiomyocytes. Using a series of whole body and cardiomyocyte specific knockouts, the authors show that the PIDDosome maintains correct ploidy in these cells. It achieves this through inducing cell cycle arrest in cardiomyocytes in a p53 dependent manner. Despite this effect on ploidy, PIDDosome-deficient hearts show no structural or functional defects. Statistics and rigor appear to be adequate.

      We thank this referee for taking the time to evaluate our work and for their valuable comments. We assume that this reviewer indicated by mistake that the phenomenon we describe depends on p53. As outlined in the abstract and throughout the manuscript, the effect is independent of p53, but may additionally involve p21, acting along or parallel to the PIDDosome.

      Major comments: 1. Figure 1 uses fluorescent intensity of a nuclear stain to determine ploidy per nucleus and they further separate the results into mononucleated, binucleated or multinucleated cells. It is hard to know how to interpret these results without further information or controls. Is there a good positive control that can be used to help to show whether this assay is quantitative? The differences are larger with the Raidd and caspase-2 knockouts than with the Pidd knockouts but this is not addressed.

      We appreciate this concern. Regarding a “good positive control”, we follow the state of the art in the cardiomyocyte field: studies by the Evans (PMID: 36622904), Kuhn (PMID: 32109383), Bergmann (PMID: 26544945) and Patterson labs (PMID: 28783163, 36912240) all use the identical approach to discriminate 2n from 4n nuclei in microscopy images at the cellular level. The fact that the majority of rodent CM nuclei are indeed diploid (PMID: 31175264, 31585517 and 32078450), and that a large number of nuclei have been evaluated to assess their mean fluorescence intensity (MFI), reduces the risk of a systematic bias in our analysis. Moreover, we have used an orthogonal, quantitative approach to define DNA content, i.e., flow-cytometry-based evaluation of DNA content in isolated CM nuclei (Fig. 1C). We are hence confident that our assays are quantitative.

      Regarding the fact that loss of Pidd1 causes a more subtle phenotype, we can offer to discuss this in light of the fact that Pidd1 has additional functions outside the PIDDosome (PMID: 35343572), and that we made similar observations when analyzing ploidy in hepatocytes (PMID: 31983631). Given that loss of any PIDDosome component causes a similar phenotype, and that this phenotype is mimicked by loss of ANKRD26, the protein that connects PIDD1 and centrosomes (Fig. 4a), we are confident that this biological variation in our analysis does not affect our conclusions.

      On line 459 the authors state that the increase in polyploidy in PIDDosome knockouts occurs in adulthood, but this is not directly tested. In fact, in the next section the polyploidy is assessed in early postnatal development. This statement should be explained or removed.

      We see that we have made an unclear statement here. In fact, we first noted increases in ploidy in the adult heart and then defined the time window in development when this happens. This sentence will be rephrased.

      In Figure 4. The authors obtained RNAseq data for P1, P7 and P14 but only show the differences with and without caspase-2 at P7. Given that the differences in ploidy are more significant at P14 (Fig 3D), all the comparisons should be shown along with analysis of whether the same genes/gene families are altered in the absence of caspase-2.

      The reason why we focus on postnatal day 7 (P7) is that data from Alkass et al. (PMID: 26544945) and other labs (PMID: 31175264) document that the initial wave of binucleation peaks on this day. Hence, we hypothesized that the PIDDosome must be active in most CMs, which aligns well with the increased mRNA levels of all of its components (Figure 3). Interestingly, its action seems to be tightly regulated, as mRNA levels of PIDDosome components drop on P10, suggesting PIDDosome shut-down or downregulation. Similar findings have been noted in the liver (PMID: 31983631). Alkass and colleagues also show that very few CMs enter another round of DNA synthesis between P7 and P14, and hence possible transcriptome changes in the absence of the PIDDosome will be strongly diluted.

      Please note that on P1, no difference between genotypes is to be expected, as all CMs are mononucleated diploids and cytokinesis competent, as previously demonstrated (PMID: 26544945). Moreover, PIDDosome expression levels are extremely low at this stage (Fig. 3A). In addition, on P14 the ploidy phenotype observed in PIDDosome knockout mice reaches its maximum, and ploidy increases are comparable to adult tissue. Thus, at this time the trigger for PIDDosome activation (cytokinesis failure) is no longer observed, as the majority of CMs are post-mitotic (PMID: 26247711). As such, the impact of PIDDosome activation on the P14 transcriptome is most likely negligible. However, if desired, we can expand our bioinformatics analysis, which summarizes DEGs over time in wt animals, by also comparing genotypes on day 1 and day 14. In light of the above, the comparison between genotypes on P7 still appears to be the most meaningful one.

      Some validation of the RNAseq and/or proteomics results would be an important addition to this study

      We agree with this notion and propose to validate key candidates related to cardiomyocyte proliferation and polyploidization, some of which we found to be differentially expressed at the mRNA level on day 7 in the RNAseq data (e.g., p21, Foxm1, Kif18a, Lin37 and others).

      Regarding the proteomics results, we face the challenge that we can only try to confirm in silico, using DeepCleave, whether candidate proteins are likely caspase substrates, and potentially pick one or two candidates linked to CM differentiation for further analysis in vitro and in heterologous cell-based assays (e.g., 293T cells), as no bona fide ventricular cardiomyocyte cell lines exist. Primary postnatal CMs are extremely difficult to transfect, and they neither proliferate without drug treatment nor fail cytokinesis ex vivo.

      Figure 4D: the authors make the conclusion that p21 is downstream of PIDD (and p53 independent). However, this is not supported by the data because the increase in 4N cells/decrease in 2N cells, although statistically significant, is nowhere near that of caspase-2 KO and caspase-2/p21 KO. Statistics should also compare p32KO with c2KO. In the absence of any other data, the more likely conclusion is that p21 is not involved.

      We agree that the findings related to the impact seen upon loss of p21 suggest that it is not the only effector involved in ploidy control, and that it may not even be an effector engaged by caspase-2, as C2/p21 DKO mice show an even higher ploidy increase, albeit not a statistically significant one. However, it is important to highlight that p21 (Cdkn1a) was found to be downregulated in our transcriptomic analysis, suggesting an involvement in the caspase-2 cascade. We are happy to highlight this when presenting the results and in the discussion.

      We assume that this referee refers to p73 KO data that should be compared to Casp2 KO data (the comment could be read as p73 or p53, but the latter we already compare side by side with Casp2 in Fig. 4). As p73 KO mice were not found to be viable beyond day 7 (our attempt to find animals on day 10 failed, in line with published literature (PMID: 24500610, 10716451)), we can only offer to compare this data set to the data presented in Figure 3C, where we have analyzed ploidy increases on day 7 in wt and PIDDosome mutant mice. This re-analysis will show that only Caspase-2 mutant mice display a significant ploidy increase on P7 when compared to wt or p73 mutant animals, while no differences are noted between wt and p73 mutant mice (to be included in new Suppl. Fig. 3C).

      Minor comments: Suggest moving Figure 4A to Figure 3 as it seems to fit better there based on the citation of this figure in the text

      We can see some benefit in this recommendation and have now included panel 4A in an updated version of Figure 3.

      Recommend enhancing the brightness of microscopy images in Figure 1E and 2D

      We will try to improve image quality; the low brightness may have been due to PDF conversion.


      Significance

      This study provides interesting information for the role of the PIDDosome in protecting from polyploidy and adds to the body of work by this same group studying this pathway in the liver.

      The main weakness in terms of significance is the lack of a phenotype in the hearts of these animals. Therefore, it is clear that ploidy (or at least PIDDosome dependent ploidy) has minimal impact on cardiac development.

      We respectfully disagree with the comment that the lack of impact on cardiac function constitutes a weakness of our findings. Several studies on ploidy control in the liver (PMID: 34228992), but importantly also in the heart (PMID: 36622904), have failed to document a clear impact of increased ploidy on organ function. This does not imply insignificance, but rather that the context in which this becomes relevant may not yet have been identified. We are happy to expand on this in our discussion.

      The authors mention that they have not tried giving these mice a myocardial infarct (MI) or inducing any other type of cardiac damage. Although it is understood that these experiments are likely outside of the scope of the present study, without this information the impact of this study is moderate. I recommend expanding the discussion to provide a more in-depth possible rationale as to why ploidy perturbations do not lead to structural changes like in the liver.

      Despite this, the insights to the pathway itself are interesting to investigators in the caspase-2 field if a little underdeveloped, especially concerning the role of p21.

      My expertise is in cell death and caspase biology (especially caspase-2). I have sufficient expertise to evaluate all parts of this paper.

      As mentioned above, we will amend our conclusions on p21 in light of potential findings made when validating DEG candidates.

      We hope that the changes and amendments proposed here will be satisfactory to this referee to recommend publication of a revised manuscript.


      Reviewer #2

      Evidence, reproducibility and clarity:

      Summary:

      In this study, the authors investigated the role of the PIDDosome during cardiomyocyte polyploidization. The PIDDosome is a multi-protein complex that activates the endopeptidase Caspase-2 and has been shown to be involved in eliminating cells with extra centrosomes or in response to genotoxic stress (Burigotto & Fava, 2021; Sladky and Villunger, 2020). In both cases, the PIDDosome is recruited in an ANKRD26-dependent manner at the centrosomes, leading to p53 stabilization and cell death (Burigotto & Fava, 2021; Evans et al., 2020; Burigotto et al., 2021).

      Here, by studying mouse cardiomyocyte differentiation, the authors showed that PIDDosome is imposing ploidy restriction during cardiomyocyte differentiation. Importantly, in contrast to a previous report in the liver (Sladky et al., 2020), they showed that PIDDosome acts in a p53-independent manner in cardiomyocytes. Indeed, they suggested that PIDDosome controls ploidy in cardiomyocytes through p21 activation.

      We want to thank this reviewer for the time taken to evaluate our work and provide critical feedback that will help to improve our revised manuscript.

      Major comments:

      In general the conclusions of the authors are well supported by the experiments. However, I would suggest the following experiments/analysis to strengthen the paper:

      The authors should improve the Figure 1 to help the readers who are not familiar with cardiomyocyte polyploidization. For instance, I would suggest to add a scheme to summarize cardiomyocyte polyploidization (in terms of nuclear size, mono vs multi and so on).

      We agree that a visual summary of the postnatal timing of CM polyploidization will be helpful for generalists not familiar with the topic and have added a scheme, adapted from a study by Alkass et al. (PMID: 26544945), who elegantly defined the timing of this process during postnatal mouse life (now Fig. 1A).

      Based on the images they presented in 1B, the authors should also measure the nuclear area or volume in the different conditions in which components of the PIDDosome were depleted. Indeed, these two parameters should be easier to conceptualize for the readers (instead of the fluorescence nuclear intensity). This could help to understand if the nuclear size is maintained between the different conditions and if this is comparable between mono, bi or multinucleated cardiomyocytes.

      We have acquired this data and it can be used to provide additional information on nuclear area and/or volume. We propose to focus on re-analyzing data from wt, Casp2 and XMLC2CRE/Casp2f/f mice. The additional information can be included in Figures 1 & 2, respectively.

      • In Figure 2A, the authors presented cross sections of hearts from animals showing that PIDDosome depletion has no effect on heart size. This is a surprising result since cardiomyocytes have higher ploidy levels and this could have an effect on their function. Given the importance of this observation, the authors should present a quantification of the heart size in the different conditions shown in Figure 2A.

      We agree with this comment. We can measure the heart vs. body weight ratio or tibia length in adult Casp2-/- vs. WT (3 month old) in order to indirectly evaluate possible increases in CM size linked to increased ploidy.

      Also, the percentage of cardiomyocytes presenting higher levels of ploidy seems quite low. The authors should discuss this point. In particular because this could explain the absence of consequences on heart size and function at steady state.

      We agree with this conclusion and will expand on this in our discussion. It is important to note that, as opposed to findings made in liver (PMID: 31983631), genetic manipulation of ploidy regulators such as E2f7/8 (PMID: 36622904) only led to modest changes in CM ploidy, suggesting either that only a narrow range of ploidy is compatible with normal heart function, or that additional mechanisms take control when the thresholds set by the PIDDosome or E2f7/8 are exceeded. These mechanisms could involve Cyclin G (PMID: 20360255) or TNNI3K (PMID: 31589606). Importantly, a recent publication has shown that overexpression of Plk1(T210D) and Ect2 from birth causes increased heart weight coupled with a minor decrease in CM size. These mice die prematurely (PMID: 39912233), suggesting that CM polyploidization is a tightly regulated process controlled by several independent mechanisms during heart development.

      In Figure 2D, the authors measured the cardiomyocyte cross-sectional area and concluded that removing PIDDosome components have no effect on cardiomyocyte cell size. Since it has been shown that ploidy increase is normally associated with an increase in cell area, the authors should measure cell area of cardiomyocytes analyzed in Figure 1B. It could be then interesting to establish a correlation with nuclear area and the mono, bi or multinucleated status. This will strengthen the results showing that ploidy increases without affecting cell area.

      Indeed, studies in PIDDosome-deficient livers suggest that the tissue contains fewer but bigger cells (PMID: 31983631). As opposed to the liver, the percentage of cardiomyocytes presenting higher levels of ploidy is relatively low. Thus, a possible increase in CM size in PIDDosome-deficient mice may be masked in heart cross-sections. In order to better correlate ploidy with cell size, we propose to reanalyze the microscopy images used to extract the data displayed in Fig. 1D. We may run into the problem, though, that the number of cells acquired may become limiting to achieve sufficient statistical power. In this case, we could pool data from different PIDDosome mutant CMs to increase statistical power. Again, we propose to initially prioritize wt vs. Casp2 vs. XMLC2/Casp2f/f mice. In addition, we can offer to quantify heart-to-body-weight ratio or tibia length as an additional read-out (see answer to a previous reviewer comment).

      The authors should discuss the fact that PIDDosome depletion leads only to a mild increase in ploidy levels (4N) in a small percentage of cardiomyocytes. If the PIDDosome is controlling ploidy, one could expect that removing it should lead to a drastic increase in the ploidy levels. Is PIDDosome depletion leading to cell death in some cardiomyocytes? The authors should address this point in the discussion or, if relevant, show a staining with an apoptosis marker. Is another mechanism compensating to prevent higher ploidy levels in cardiomyocytes?

      These are valid thoughts, some of which we contemplated before. In part, we have addressed them in our response to Reviewer #1, above, discussing similar findings made in E2f7/8-deficient hearts (PMID: 36622904), or Cyclin G-overexpressing hearts (PMID: 20360255), where also only modest changes in ploidy were achieved. Together, these observations suggest either that alternative control mechanisms are able to act, or that there is limited tolerance towards larger shifts in ploidy, which would be incompatible with proper cell function and survival. To this end, we can offer to test for increased signs of cell death in PIDDosome mutant hearts by TUNEL staining of histological sections. Of note, we did not find evidence for such a phenomenon in the liver (PMID: 31983631).

      Even if the authors presented RNAseq data suggesting that the PIDDosome is activated during cardiomyocyte differentiation, they should clearly demonstrate this point to strengthen the message of the paper. Indeed, the conclusions are based on the absence of PIDDosome components triggering higher ploidy in cardiomyocytes. However, we don't know whether (and when) the PIDDosome is activated during cardiomyocyte differentiation to control their ploidy levels. I would suggest to analyze PIDDosome activation markers by immunofluorescence in cardiomyocytes at different developmental stages.

      We agree with this referee that direct proof of PIDDosome activation would be helpful and that we only infer from loss-of-function phenotypes when and where the PIDDosome becomes activated. However, several technical issues prevent us from collecting more direct evidence of PIDDosome activation in the developing heart. 1) Polyploidization in heart CMs appears to happen gradually from day 3 on, with a peak at day 7 (PMID: 26544945). Hence, this is not a synchronous process in which we could pinpoint simultaneous activation of the PIDDosome in all cells, which would facilitate biochemical analysis, e.g., by western blotting for signs of Caspase-2 activation (i.e., the loss of its pro-form, PMID: 28130345). 2) Our most reliable readout, MDM2 cleavage by caspase-2 giving rise to specific fragments detectable by western blotting, is not applicable to mouse tissue, as the antibody we use only detects human MDM2 (PMID: 28130345) and no other MDM2 antibody we tested gave satisfactory results. Independent of that, 3) we do not see involvement of p53 in CM ploidy control (arguing against a role of MDM2).

      As such, we can only offer to look at extra centrosome clustering in postnatal binucleated CMs (as also suggested further below) as a putative trigger for PIDDosome activation. However, this has been published before by the first author of this study (PMID: 31301302). Given that we have made a significant effort to time-resolve the increase in ploidy in postnatal mice (please note that several hearts needed to be pooled for each time point, analyzed in multiple biological replicates), we think that our conclusions are well justified based on the genetic data provided.

      Concerning the methods, the authors must add the references for each product they used and not only the origin. When relevant, the RRID should be indicated. Without this information the method and the data cannot be reproduced.

      We will update this information wherever it is needed to reproduce our results.

      Minor comments:

      In general, the text and the figures are clear. Nevertheless, I would suggest the following changes:

      • Figures 1B, 2B and 2C: the y-axis must start at 0.

      We will adapt the axes accordingly.

      Figure 4A: The authors should stain centrosomes in cardiomyocytes. This should strengthen the conclusion taken by the authors based on the results obtained in mice depleted for ANKRD26. Indeed, for the moment they are insufficient to conclude about the role of the centrosomes. The authors should show that centrosomes cluster in cardiomyocytes (a condition necessary for PIDDosome activation in polyploid cells) and if possible that component of the PIDDosome are recruited here.

      This point is well taken and addressed in part above. Clustering of extra centrosomes in rat polyploid cardiomyocytes has been documented and published by the first author of this study (PMID; cited). We can offer to show clustering of centrosomes in mouse CMs isolated from day 7 hearts, but while PIDD1 can be detected well in MEFs, we repeatedly failed to stain for PIDD1 in primary CMs.

      Figure 4F: I would suggest to modify the working model to emphasize more the differences between WT and PIDDosome KO.

      We will aim to improve this cartoon/graphical abstract.

      The prior studies are referenced appropriately.

      Reviewer #2 (Significance (Required)):

      How polyploid cells control their ploidy levels during differentiation remains poorly understood. The data presented here thus represent an advance concerning this question. The current model of PIDDosome activation relies on the presence of extra centrosomes that drives the ANKRD26-dependent recruitment of the PIDDosome. Then, Caspase 2 is activated, leading to a p53-p21 dependent cell cycle arrest (Burigotto & Fava, 2021; Sladky and Villunger, 2020; Janssens & Tinel, 2012; Evans et al., 2020; Burigotto et al., 2021). In this study, the authors showed that a similar pathway takes place during cardiomyocyte differentiation to control ploidy levels. These data are reminiscent of previous work showing PIDDosome involvement during hepatocyte polyploidization (Sladky et al. 2020). Together, these data highlight the prominent role of the PIDDosome complex in controlling ploidy levels in a physiological context. Importantly, this study identified that the classical p53-dependent cell cycle arrest described after PIDDosome activation is not involved here. Instead, the data established that, independently of p53, p21 contributes to the control of cardiomyocyte ploidy. In consequence, this study extends the initial pathway associated with PIDDosome activation and suggests that other mechanisms could take place to restrain cell proliferation upon PIDDosome activation. Overall, this makes this paper significant and of interest for the following fields: polyploidy, heart/cardiomyocyte development and PIDDosome.

      My field of expertise includes polyploidy, cell cycle and genetic instability.

      We thank this reviewer for the time taken and the positive feedback provided.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.

      N/A

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

      • As outlined above, limited tools are available to validate putative caspase-2 substrates, identified in proteomics analysis, in an impactful manner.

      • Also, as discussed above, we deem myocardial infarction experiments in mice unsuitable to improve our work, as in all likelihood they will yield negative results.
    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study investigates the potential of targeting specific regions within the RNA genome of the Porcine Epidemic Diarrhea Virus (PEDV) for antiviral drug development. The authors used SHAPE-MaP to analyze the structure of the PEDV RNA genome in infected cells. They categorized different regions of the genome based on their structural characteristics, focusing on those that might be good targets for drugs or small interfering RNAs (siRNAs).

      They found that dynamic single-stranded regions can be stabilized by compounds (e.g., to form G-quadruplexes), which inhibit viral proliferation. They demonstrated this by targeting a specific G4-forming sequence with a compound called Braco-19. The authors also describe stable (structured) single-stranded regions that they used to design siRNAs showing that they effectively inhibited viral replication.

      Strengths:

      There are a number of strengths to highlight in this manuscript.

      (1) The study uses a sophisticated technique (SHAPE-MaP) to analyze the PEDV RNA genome in situ, providing valuable insights into its structural features.

      (2) The authors provide a strong rationale for targeting specific RNA structures for antiviral development.

      (3) The study includes a range of experiments, including structural analysis, compound screening, siRNA design, and viral proliferation assays, to support their conclusions.

      (4) Finally, the findings have potential implications for the development of new antiviral therapies against PEDV and other RNA viruses.

      Overall, this interesting study highlights the importance of considering RNA structure when designing antiviral therapies and provides a compelling strategy for identifying promising RNA targets in viral genomes.

      Weaknesses:

      I have some concerns about the utility of the 3D analyses, the effects of their synonymous mutants on expression/proliferation, a potentially missed control for studies of mutants, and the therapeutic utility of the compound they tested vs. G-quadruplexes.

      We thank the reviewer for their positive assessment and insightful comments. Below, we address each point of concern:

      (1) The utility of the 3D analyses:

      In the revised manuscript, we have toned down this discussion and moved Figure 3A to the supplementary materials to reduce any sense of fragmentation in the overall story. While SHAPE-MaP technology is mature and convenient to use, and can indeed capture some RNA structural elements with special functions in certain cases, we acknowledge that its application to 3D analyses requires further validation. We believe this approach will become more prevalent in future research.

      (2) The effects of synonymous mutants on expression/proliferation:

      In the PEDV genome, the PQS1 mutation site encodes lysine (AAG). Given that lysine has only two codons (AAG and AAA), the G3109A synonymous mutation represented our sole viable option. Published studies (Ding et al., 2024) confirm that neither AAG nor AAA are classified as rare or dominant codons in mammalian cells. Therefore, the observed changes in viral proliferation levels are likely to stem from alterations in RNA secondary structure rather than codon usage effects.

      REFERENCES:

      Ding W, Yu W, Chen Y, et al. Rare codon recoding for efficient noncanonical amino acid incorporation in mammalian cells. Science. 2024;384(6700):1134-1142. 
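
The synonymous-mutation argument in point (2) can be checked against the standard genetic code. The following is a minimal sketch, not the authors' analysis: the helper names are hypothetical, and it only confirms that AAG and AAA both encode lysine and that AAA is the sole synonymous alternative to AAG. It says nothing about codon usage or RNA structure.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_alternatives(codon):
    """Return all other codons that encode the same amino acid as `codon`."""
    amino_acid = CODON_TABLE[codon]
    return sorted(
        other
        for other, aa in CODON_TABLE.items()
        if aa == amino_acid and other != codon
    )


# Lysine (K) is encoded by AAG and AAA only, so a G-to-A change at the
# third codon position is the single synonymous option for an AAG codon.
print(CODON_TABLE["AAG"], synonymous_alternatives("AAG"))
```

For an AAG codon the helper returns a single alternative, which matches the statement that G3109A was the only available synonymous substitution at this site.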

      (3) Potentially missed control for studies of mutants:

      In the revised manuscript, we have incorporated additional control experiments evaluating Braco-19's therapeutic effects on the PQS3 mutant strain (Figure 4 – figure supplement 3).

      (4) The therapeutic utility of Braco-19 vs. G-quadruplexes:

      While Braco-19 is indeed a broad-spectrum G4 ligand, our data clearly show that not all PQSs in the viral genome can form G4 structures. Our findings primarily provide proof-of-concept that sequences with high G4-forming potential in viral genomes represent viable targets for antiviral therapy. Future studies could leverage SHAPE-guided structural insights to design ligands with enhanced specificity for viral G4s, potentially improving therapeutic utility while minimizing off-target effects.

      Reviewer #2 (Public review):

      Summary:

      Luo et al. use SHAPE-MaP to find suitable RNA targets in Porcine Epidemic Diarrhoea Virus. Results show that dynamic and transient structures are good targets for small molecules, and that exposed strand regions are adequate targets for siRNA. This work is important to segment the RNA targeting.

      Strengths:

      This work is well done and the data supports its findings and conclusions. When possible, more than one technique was used to confirm some of the findings.

      Weaknesses:

      The study uses a cell line that is not porcine (not the natural target of the virus).

      We thank the reviewer for their insightful comments and recognition of our study's value. The most commonly employed cell models for in vitro PEDV studies are monkey-derived Vero E6 cells and porcine PK1 cells. However, PEDV (particularly our strain) exhibits significantly lower replication efficiency in PK1 cells compared to Vero cells, and no cytopathic effects were observed in PK1 cells. In our preliminary attempts to perform SHAPE-MaP experiments using infected PK1 cells, the sequencing data showed less than 0.03% alignment to the PEDV genome, rendering subsequent analysis and downstream experiments unfeasible.

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Luo et al. applied SHAPE-MaP to analyze the secondary structure of the Porcine Epidemic Diarrhoea Virus (PEDV) RNA genome in infected cells. By combining SHAPE reactivity and Shannon entropy, the study indicated that the folding of the PEDV genomic RNA was nonuniform, with the 5' and 3' untranslated regions being more compactly structured, which revealed potentially antiviral targetable RNA regions. Interestingly, the study also suggested that compounds bound to well-folded RNA structures in vitro did not necessarily exhibit antiviral activity in cells, because the binding of these compounds did not necessarily alter the functions of the well-folded RNA regions. Later in the manuscript, the authors focus on guanine-rich regions, which may form G-quadruplexes and be potential targets for small interfering RNA (siRNA). The manuscript shows the binding effect of Braco-19 (a G-quadruplex-binding ligand) to a predicted G4 region in vitro, along with the inhibition of PEDV proliferation in cells. This suggests that targeting high SHAPE-high Shannon G4 regions could be a promising approach against RNA viruses. Lastly, the manuscript identifies 73 single-stranded regions with high SHAPE and low Shannon entropy, which demonstrated high success in antiviral siRNA targeting.

      Strengths:

      The paper presents valuable data for the community. Additionally, the experimental design and data analysis are well documented.

      Weakness:

      The manuscript presents the effect of Braco-19 on PQS1, a single G4 region with high SHAPE and high Shannon entropy, to suggest that "the compound can selectively target the PQS1 of the high SHAPE-high Shannon region in cells" (lines 625-626). While the effect of Braco-19 on PQS1 is supported by strong evidence in the manuscript, the conclusion regarding the G4 region with high SHAPE and high Shannon entropy is based on a single target, PQS1.

      We thank the reviewer for their positive assessment of our methodology and dataset. We propose that dynamic RNA structures in high SHAPE-high Shannon regions, when stabilized by small molecules, can serve as viable targets for antiviral therapy. G-quadruplexes represent a characteristic type of such dynamic structures that compete with local stem-loop formations in the genome. While we identified seven highly conserved PQSs in the PEDV genome, only PQS1 was located within a high SHAPE-high Shannon region. To further validate this concept, we have supplemented the revised manuscript with Thioflavin T (ThT) fluorescence turn-on assays (Figures 3D, 3E, and Figure 3 – figure supplement 6), which provide additional evidence for the differential G4-forming capabilities of PQSs across regions with distinct structural features.
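For readers unfamiliar with how putative quadruplex-forming sequences (PQSs) are typically located, the following is a minimal sketch of a motif scan. It is not the authors' pipeline: the motif definition (four G-tracts of at least two guanines separated by loops of 1 to 7 nucleotides) and the names `PQS_PATTERN` and `find_pqs` are assumptions chosen for illustration, and published tools use a range of tract and loop lengths.

```python
import re

# Assumed motif (not taken from the manuscript): four G-tracts of >= 2 Gs,
# separated by three loops of 1-7 nt. The lookahead lets overlapping
# candidates starting at different positions all be reported.
PQS_PATTERN = re.compile(
    r"(?=(G{2,}[ACGU]{1,7}?G{2,}[ACGU]{1,7}?G{2,}[ACGU]{1,7}?G{2,}))"
)


def find_pqs(sequence):
    """Return (0-based start, matched motif) for each putative PQS."""
    rna = sequence.upper().replace("T", "U")  # accept DNA or RNA input
    return [(m.start(), m.group(1)) for m in PQS_PATTERN.finditer(rna)]


if __name__ == "__main__":
    # Toy sequence, purely illustrative.
    for start, motif in find_pqs("AAGGAGGAGGAGGAA"):
        print(start, motif)
```

A sequence match only indicates G4-forming potential; as the response above stresses, whether a PQS actually folds into a G4 depends on competition with local secondary structure and has to be tested experimentally.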

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major Comments:

      (1) It could be valuable for the authors to spend some more effort comparing their approach to siRNA target discovery and design to current methods for siRNA design. It would be good to highlight which components are novel, and which might offer superior performance with respect to other existing methods.

      We thank the reviewer for highlighting this important point. In response, we have rewritten the relevant section in the discussion:

      “Our approach uniquely integrates in situ RNA structural data (SHAPE reactivity and Shannon entropy) to prioritize siRNA targets within stable single-stranded regions (high SHAPE reactivity, low Shannon entropy), which are experimentally validated as accessible in infected cells. This represents a significant departure from traditional siRNA design methods that rely primarily on sequence conservation, thermodynamic rules (e.g., Tuschl rules), or in vitro structural predictions (Ali Zaidi et al., 2023; Qureshi et al., 2018; Tang and Khvorova, 2024), which may not accurately reflect intracellular RNA accessibility. Bowden-Reid et al. designed 39 antiviral siRNAs against various SARS-CoV-2 variants based on sequence conservation, ultimately identifying 8 highly effective sequences (Bowden-Reid et al., 2023). Notably, five of these effective sequences targeted regions that were located in high SHAPE-high Shannon regions according to SARS-CoV-2 SHAPE datasets (Supplementary Table 8) (Manfredonia et al., 2020). This independent finding aligns perfectly with our conclusions and demonstrates that SHAPE-based siRNA design outperforms sequence/structure-agnostic approaches, at least in terms of significantly improving antiviral siRNA screening efficiency. Given the growing availability of SHAPE datasets for numerous viruses, we are confident that our methodology will facilitate more precise design of antiviral siRNAs.”

      (2) The section targeting their discovered G4 structure with Braco-19 is interesting, particularly showing effects on viral proliferation; however, it's not clear to me how this compound could be used therapeutically against PEDV, as it is a non-selective binder of G4 structures. Their results are good support for the presence and functionality of a G4 structure in PEDV, but I don't see any strategy outlined in the manuscript on how this could be specifically targeted with Braco-19.

      While Braco-19 is indeed a broad-spectrum G4 ligand, our data demonstrate that not all PQSs in the viral genome can form G4 structures under physiological conditions. Our results specifically show that Braco-19 exerts its anti-PEDV activity by targeting PQS1, which is located in a high SHAPE-high Shannon entropy region. This target specificity was further confirmed by the complete resistance of the PQS1mut strain (lacking G4-forming ability) to Braco-19 treatment in our in vitro assays. 

      Additionally, previous studies have reported that during rapid viral replication, viral RNA accumulates to levels that significantly exceed host RNA concentrations. This "concentration advantage" suggests that G4 ligands like Braco-19 would preferentially bind viral G4 structures over host targets, thereby enhancing their antiviral specificity in vivo. In summary, our data provide proof-of-concept that viral genomic regions with high G4-forming potential - particularly those in high SHAPE-high Shannon entropy regions - represent promising targets for antiviral therapy.

      (3) The section where they proposed 3D RNA structures based on sequence similarity feels "tacked on" and I don't see how it adds to the overall story. The authors identify a short RNA hairpin in the PEDV genome with some sequence similarity to the CPEB3 nuclease P4 hairpin. However, they don't provide any evidence that this motif functions in a similar way or that it's important for the virus's life cycle. They also don't explain how this similarity could be exploited for antiviral drug development. It's not clear whether targeting this motif would have any effect on the virus. It's interesting that these two sequences share nucleotides, but it's unlikely that they share any homology...perhaps they convergently evolved (or were captured), but the similarity could also be coincidental.

      We appreciate the reviewer's insightful observation regarding this section. While our intention was to demonstrate that flexible conformations in high SHAPE-high Shannon regions could potentially be targeted, we acknowledge that extensive discussion of these motifs' functions would exceed the scope of this study, resulting in some disconnection from the main narrative. In response to this valuable feedback, we have removed this section from the manuscript.

      (4) The authors should consider the optimality of the synonymous mutation (G3109A) that they introduced, as G3109A could swap a rare codon for a more optimal one. Even though the protein sequence is unaffected, the translation rate (and ability to proliferate) could be very different due to altered codon optimality. Additionally, to show the inactivity of the PQS3 mutant, the Braco-19 treatment studies performed on the PQS1 mutants could be repeated with PQS3 - using this as a control for these experiments.

      We appreciate the reviewer's insightful comment regarding codon optimization. In the PEDV genome, the PQS1 mutation site encodes lysine (AAG). Since lysine has only two codons (AAG and AAA), the G3109A synonymous mutation was our only viable option. Published literature (Ding et al. 2024) confirms that neither AAG nor AAA are classified as either preferred or rare codons in mammalian cells. Therefore, this substitution should have minimal direct impact on translation efficiency. Compared to nonsynonymous mutations that would alter amino acid sequences, we believe this synonymous mutation represents the optimal approach for maintaining native protein function while introducing the desired structural modification.
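The claim that AAG to AAA is the only synonymous option at this site can be checked with a short enumeration. The sketch below assumes the standard genetic code and considers single-nucleotide substitutions only; the names `CODON_TABLE` and `synonymous_single_nt_variants` are illustrative and not part of the study.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G ordering
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_single_nt_variants(codon):
    """List codons one substitution away that encode the same amino acid."""
    codon = codon.upper().replace("U", "T")
    residue = CODON_TABLE[codon]
    variants = []
    for position in range(3):
        for base in BASES:
            if base == codon[position]:
                continue
            variant = codon[:position] + base + codon[position + 1:]
            if CODON_TABLE[variant] == residue:
                variants.append(variant)
    return variants


if __name__ == "__main__":
    print(synonymous_single_nt_variants("AAG"))
```

For the lysine codon AAG the enumeration returns a single variant, AAA, which corresponds to the G-to-A change at the third codon position described above.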

      REFERENCES:

      Ding W, Yu W, Chen Y, et al. Rare codon recoding for efficient noncanonical amino acid incorporation in mammalian cells. Science. 2024;384(6700):1134-1142.

      In the revised version, we have added control experiments showing the inhibitory activity of Braco-19 against the PQS3 mutant strain (Figure 4—figure supplement 3C) and discussed it in the results section.

      “Furthermore, as a control, we observed nearly identical inhibitory activity of Braco-19 against both the PQS3 mutant strain (AJ1102-PQS3mut) and wild-type virus (Figure 4—figure supplement 3C), demonstrating the specificity of Braco-19's action on PQS1.”

      Minor Comments:

      (5) The authors' description of the Shannon Entropy could be improved. The current description makes it seem like the Shannon Entropy only provides information on base pairing, however, the Shannon entropy quantifies the uncertainty of structural states at each position and is calculated based on the probabilities of the different states (paired or unpaired) that a nucleotide can adopt.

      We have revised the description of Shannon entropy in the manuscript:

      "The pairing probability of each nucleotide derived from SHAPE reactivities was subsequently used to calculate Shannon entropy. Regions with high Shannon entropy may adopt alternative conformations, while those with low Shannon entropy correspond to either well-defined RNA structures or persistently single-stranded regions (MATHEWS, 2004; Siegfried et al., 2014)."
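As a rough illustration of the calculation described in the revised text, per-nucleotide Shannon entropy can be computed from a base-pairing probability matrix. The sketch below uses one common formulation, summing over candidate pairing partners j as H_i = -Σ_j p_ij log10(p_ij); the log base, the omission of an explicit unpaired-state term, and the function name `shannon_entropy` are assumptions and may differ from the authors' implementation.

```python
import math


def shannon_entropy(pair_probs, base=10.0):
    """Per-nucleotide entropy from a matrix of pairing probabilities.

    pair_probs[i][j] is the probability that nucleotide i pairs with j.
    Zero-probability entries contribute nothing (the limit of p*log p is 0).
    """
    entropies = []
    for row in pair_probs:
        h = -sum(p * math.log(p, base) for p in row if p > 0)
        entropies.append(h)
    return entropies


if __name__ == "__main__":
    # Toy 4-nt example: nucleotide 0 is split evenly between two partners,
    # nucleotide 1 never pairs.
    probs = [
        [0.0, 0.0, 0.5, 0.5],
        [0.0, 0.0, 0.0, 0.0],
        [0.5, 0.0, 0.0, 0.0],
        [0.5, 0.0, 0.0, 0.0],
    ]
    print(shannon_entropy(probs))
```

Under this formulation a nucleotide that is always unpaired and one that always pairs with the same partner both score zero, which is why low entropy alone cannot distinguish well-defined structures from persistently single-stranded regions and has to be read together with SHAPE reactivity.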

      (6) The overall writing of the manuscript is very good, but there are some minor grammatical issues throughout, e.g., here are some of the ones that I caught:

      a) Lines 71-3: "various types of RNA structures such as hairpin structure, RNA single-strand, RNA pseudoknot and RNA G-quadruplex (G4)" - the examples should be plural and, rather than "hairpins" (or in addition), perhaps add "helixes" to be more generically correct(?).

      We have revised the relevant description: 

      "various types of RNA structures such as stem-loop structures (with double-helical stems), RNA single-strand, RNA pseudoknot and RNA G-quadruplex (G4)"

      b) Lines 74-5: "Of these, RNA G4 has shown considerable promise because of the high stability and modulation by small molecules" should be "Of these, RNA G4 has shown considerable promise because of its high stability and ability for modulation by small molecules."

      We have revised the sentence:

      “Of these, RNA G4 has shown considerable promise because of its high stability and ability for modulation by small molecules.”

      c) Line 76: "have" should be "has".

      We have revised the sentence.

      d) Lines 104-5 (and elsewhere): "frameshift stimulation element (FSE)" should be "frameshift stimulatory element (FSE)".

      We have revised the sentence.

      e) Lines 428-9: "following the Manfredonia's methods" should be "following Manfredonia's method" or "following the Manfredonia method".

      We have made the appropriate edit.

      These edits ensure grammatical accuracy and consistency with standard scientific terminology. We appreciate the reviewer's attention to detail, which has significantly improved the clarity of our manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) There are some important references missing, on shape-seq from Julius Lucks.

      We have added citations to the foundational work by Lucks et al. (2011, PNAS) that pioneered in vitro RNA structure probing using SHAPE-seq.

      (2) Describe the acronym "SHAPE",

      We have now included the full name of SHAPE: “Selective 2’-Hydroxyl Acylation and Primer Extension”.

      (3) Line 81: 2"-hydroxyl-selective - the prime is incorrect.

      We thank the reviewer for catching this technical error. We have corrected "2"hydroxyl" to "2'-hydroxyl".

      (4) Explaining a bit better how shape reagent works would be beneficial (one sentence should suffice).

      We have revised the Introduction section:

      “SHAPE reagents like NAI selectively modify flexible, unpaired 2′-OH groups in RNA, and these modifications are detected as mutations during reverse transcription, enabling precise mapping of RNA secondary structures through sequencing.”

      (5) Line 128: cite the paper that introduced NAI.

      We have now properly cited the original publication introducing NAI (Spitale et al., 2012).

      (6) Line 243: Can you describe what the compound is?

      The compound is Braco-19. This has now been included in the methods section. 

      (7) Line 272: describe what 3Dpol is and the source of it.

      We have supplemented the relevant information as follows:

      "3Dpol (recombinant RNA-dependent RNA polymerase; Abcam, ab277617, 0.02 mg/reaction)"

      (8) Figure 1 legend: For both C and D, the explanation of the G4 structure and the RISC complex should be added, otherwise, it becomes unclear why they are there.

      We have revised the captions for Figure 1 as follows:

      "(A) Well-folded regions (low SHAPE reactivity and low Shannon entropy; 26.40% of genome). These regions represent stably folded RNA structures with minimal conformational flexibility, likely serving as structural scaffolds or functional elements in viral replication. (B) Dynamic structured regions (low SHAPE reactivity and high Shannon entropy; 11.70% of genome). These conformationally plastic domains likely mediate regulatory switches between alternative secondary structures during infection. (C) Dynamic unpaired regions (high SHAPE reactivity and high Shannon entropy; 26.90% of genome). These regions are prone to form non-canonical nucleic acid structures (e.g., G-quadruplexes), which can be stabilized by small-molecule ligands to inhibit viral replication. (D) Persistent unpaired regions (high SHAPE reactivity and low Shannon entropy; 9.67% of genome). These regions are more accessible for siRNA binding, facilitating recruitment of Argonaute proteins and Dicer to form the RNA-induced silencing complex (RISC) for targeted cleavage."
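The four-way partition in this legend can be sketched as a simple two-threshold classification. This is only an illustration: the cutoffs are assumed to default to the medians of the supplied values, which may not match the thresholds or windowing used in the study, and `classify_regions` and `REGION_LABELS` are hypothetical names.

```python
from statistics import median

# Keys are (SHAPE is high, Shannon entropy is high); labels follow the legend.
REGION_LABELS = {
    (False, False): "well-folded",
    (False, True): "dynamic structured",
    (True, True): "dynamic unpaired",
    (True, False): "persistent unpaired",
}


def classify_regions(shape, entropy, shape_cut=None, entropy_cut=None):
    """Label each position or window by its SHAPE/entropy quadrant.

    Cutoffs default to the medians of the inputs (an assumption made
    for this sketch, not a value taken from the manuscript).
    """
    if len(shape) != len(entropy):
        raise ValueError("shape and entropy must have the same length")
    if shape_cut is None:
        shape_cut = median(shape)
    if entropy_cut is None:
        entropy_cut = median(entropy)
    return [
        REGION_LABELS[(s > shape_cut, h > entropy_cut)]
        for s, h in zip(shape, entropy)
    ]


if __name__ == "__main__":
    labels = classify_regions(
        [0.1, 0.2, 0.9, 0.8], [0.01, 0.5, 0.6, 0.02],
        shape_cut=0.5, entropy_cut=0.1,
    )
    print(labels)
```

In this scheme the "dynamic unpaired" quadrant corresponds to the candidate small-molecule targets and the "persistent unpaired" quadrant to the candidate siRNA targets discussed in the responses above.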

      (9) Figure S2 panel A should be in Figure 1. This is a nice picture showing the backbone of the research.

      In the revised manuscript, we have reorganized Figure 1 and Figure S2 by incorporating the SHAPE-MaP workflow diagram (previously Figure S2A) into Figure 1 as panel (A).

      (10) Please add the citation to Braco-19.

      We have now added the appropriate citation for Braco-19 (Gowan et al., 2002) in the revised manuscript.

      (11) Figure 5 legend: could you add in parentheses what ds means (and call Figure S28).

      We appreciate the reviewer's attention to detail. In the revised manuscript, we have clarified the abbreviations in the Figure 5 legend: ss (single-stranded targeting siRNAs); ds (dual-stranded targeting siRNAs). 

      (12) Line 107: I would argue that the "stabilization of a G4" inhibited viral proliferation. And that supports the point of the paper, that a small molecule that stabilizes the G4 can be used to reduce viral replication. I suggest emphasizing this throughout the paper.

      We fully concur with the reviewer's insightful perspective. In the revised manuscript, we have comprehensively strengthened the point of 'G4 stabilization' as an antiviral mechanism through the following enhancements:

      (1) In the Results section: We present Thioflavin T (ThT) fluorescence assays demonstrating the G4-forming capability of PQSs in the full-length PEDV genomic RNA context:

      “These findings indicate that although most PQSs can form G4 structures in vitro, PQS1—located in the high SHAPE-high Shannon entropy region—demonstrates the most robust G4-forming capability when competing with local secondary structures in the genomic context.”

      (2) In the Results section: The inclusion of Braco-19 inhibition assays using PQS3 mutant virus as control provides robust evidence that Braco-19 exerts its antiviral effects specifically through PQS1 stabilization:

      “Furthermore, as a control, we observed nearly identical inhibitory activity of Braco-19 against both the PQS3 mutant strain (AJ1102-PQS3mut) and wild-type virus, demonstrating the specificity of Braco-19's action on PQS1.”

      (3) In the Discussion section: We have rewritten the mechanistic interpretation to emphasize: 

      "Crucially, Braco-19 showed no inhibitory activity against the PQS1-mutant strain while maintaining potent activity against the PQS3-mutant strain (Figure 4E, Figure 4—figure supplement 3C). This suggests that the compound can selectively target the PQS1 of the high SHAPE-high Shannon region in cells." 

      (13) For PQS1, it's suggested that it is indeed a competing and transient conformation that forms the G4. I wonder if using an extended PQS1 (perhaps what is shown in Figure 3E) and using fluorescence, and/or K+ vs Li+, and/or in-vitro SHAPE could tell us more about this dynamic structure. Thioflavin T or any other fluorescent molecule that binds to G4s could be easily used to show how the formation of G4 may happen or not. In addition, how Braco-19 could really lock the dynamic structure in-vitro as well. I think the field would benefit from a deeper investigation of it.

      To address the dynamic competition between G4 and alternative RNA conformations, we performed Thioflavin T (ThT) fluorescence turn-on assays (now in Figure 3D-E and Figure 3—figure supplement 6) under physiological K+ conditions (100 mM), with PRRSV-G4 RNA as a positive control. This reads as:

      “To validate whether SHAPE analysis could reflect the competitive conformational folding of PQSs in the PEDV genome, we performed in vitro transcription to obtain local intact structures containing PQSs within dynamic single-stranded regions and stable double-stranded regions (Table S6). Thioflavin T (ThT) fluorescence turn-on assays were conducted under physiological K+ conditions (100 mM), with the G4 sequence of porcine reproductive and respiratory syndrome virus (PRRSV) serving as a positive control (Control-G4) (Fang et al., 2023). The results demonstrated that for short PQS sequences containing only G4-forming motifs (Table S7), PQS1, PQS3, PQS4, and PQS6 all induced significant ThT fluorescence enhancement (Figure 3D-E, Figure 3—figure supplement 6), confirming their ability to form G4 structures. However, in long RNA fragments encompassing PQSs and their flanking sequences, only PQS1 and PQS4 exhibited pronounced ThT fluorescence responses (Figure 3D-E), whereas PQS2, PQS3, and PQS6 showed negligible signals (Figure 3E, Figure 3—figure supplement 6). Notably, the PQS1-long chain displayed the strongest fluorescence signal, while its mutant counterpart (PQS1mut-long chain) exhibited the lowest background fluorescence (Figure 3D). These findings indicate that although most PQSs can form G4 structures in vitro, PQS1—located in the high SHAPE-high Shannon entropy region—demonstrates the most robust G4-forming capability when competing with local secondary structures in the genomic context. Therefore, PQS1 was selected for further structural and functional validation.”

      (14) Figure S29 is nice and informative. Consider moving it to the main text.

      We appreciate the reviewer's positive assessment of Figure S29. Now we have renamed this figure as "Figure 5—Supplement 2".

    1. Reviewer #1 (Public review):

      This is a very interesting paper addressing the hierarchical nature of the mammalian auditory system. The authors use an unconventional technique to assess brain responses -- functional ultrasound imaging (fUSI). This measures blood volume in the cortex at a relatively high spatial resolution. They present dynamic and stationary sounds in isolation and together, and show that the effect of the stationary sounds (relative to the dynamic sounds) on blood volume measurements decreases as one ascends the auditory hierarchy. Since the dynamic/stationary nature of sounds is related to their perception as foreground/background sounds (see below for more details), this suggests that neurons in higher levels of the cortex may be increasingly invariant to background sounds.

      The study is interesting, well conducted, and well written. I am broadly convinced by the results. However, I do have some concerns about the validity of the results, given the unconventional technique. fUSI is convenient because it is much less invasive than electrophysiology, and can image a large region of the cortex in one go. However, the relationship between blood volume and neuronal activity is unclear, and blood volume measurements are heavily temporally averaged relative to the underlying neuronal responses. I am particularly concerned about the implications of this for a study on dynamic/stationary stimuli in auditory cortical hierarchy, because the time scale of the dynamic sounds is such that much of the dynamic structure may be affected by this temporal averaging. Also, there is a well-known decrease in temporal following rate that is exhibited by neurons at higher levels of the auditory system. This means that results in different areas will be differently affected by the temporal averaging. I would like to see additional control models to investigate the impact of this.

      I also think that the authors should address several caveats: the fact that their measurements heavily spatially average neuronal responses, and therefore may not accurately reflect the underlying neuronal coding; that the perceptual background/foreground distinction is not identical to the dynamic/stationary distinction used here; and that ferret background/foreground perception may be very different from that in humans.

      Major points

      (1) Changes in blood volume due to brain activity are indirectly related to neuronal responses. The exact relationship is not clear, however, we do know two things for certain: (a) each measurable unit of blood volume change depends on the response of hundreds or thousands of neurons, and (b) the time course of the volume changes are slow compared to the potential time course of the underlying neuronal responses. Both of these mean that important variability in neuronal responses will be averaged out when measuring blood changes. For example, if two neighbouring neurons have opposite responses to a given stimulus, this will produce opposite changes in blood volume, which will cancel each other out in the blood volume measurement due to (a). This is important in the present study because blood volume changes are implicitly being used as a measure of coding in the underlying neuronal population. The authors need to acknowledge that this is a coarse measure of neuronal responses and that important aspects of neuronal responses may be missing from the blood volume measure.

      (2) More importantly for the present study, however, the effect of (b) is that any rapid changes in the response of a single neuron will be cancelled out by temporal averaging. Imagine a neuron whose response is transient, consisting of rapid excitation followed by rapid inhibition. Temporal averaging of these two responses will tend to cancel out both of them. As a result, blood volume measurements will tend to smooth out any fast, dynamic responses in the underlying neuronal population. In the present study, this temporal averaging is likely to be particularly important because the authors are comparing responses to dynamic (nonstationary) stimuli with responses to more constant stimuli. To a first approximation, neuronal responses to dynamic stimuli are themselves dynamic, and responses to constant stimuli are themselves constant. Therefore, the averaging will mean that the responses to dynamic stimuli are suppressed relative to the real responses in the underlying neurons, whereas the responses to constant stimuli are more veridical. On top of this, temporal following rates tend to decrease as one ascends the auditory hierarchy, meaning that the comparison between dynamic and stationary responses will be differently affected in different brain areas. As a result, the dynamic/stationary balance is expected to change as you ascend the hierarchy, and I would expect this to directly affect the results observed in this study.

      It is not trivial to extrapolate from what we know about temporal following in the cortex to know exactly what the expected effect would be on the authors' results. As a first-pass control, I would strongly suggest incorporating into the authors' filterbank model a range of realistic temporal following rates (decreasing at higher levels), and spatially and temporally average these responses to get modelled cerebral blood flow measurements. I would want to know whether this model showed similar effects as in Figure 2. From my guess about what this model would show, I think it would not predict the effects shown by the authors in Figure 2. Nevertheless, this is an important issue to address and to provide control for.

      (3) I do not agree with the equivalence that the authors draw between the statistical stationarity of sounds and their classification as foreground or background sounds. It is true that, in a common foreground/background situation - speech against a background of white noise - the foreground is non-stationary and the background is stationary. However, it is easy to come up with examples where this relationship is reversed. For example, a continuous pure tone is perfectly stationary, but will be perceived as a foreground sound if played loudly. Background music may be very non-stationary but still easily ignored as a background sound when listening to overlaid speech. Ultimately, the foreground/background distinction is a perceptual one that is not exclusively determined by physical characteristics of the sounds, and certainly not by a simple measure of stationarity. I understand that the use of foreground/background in the present study increases the likely reach of the paper, but I don't think it is appropriate to use this subjective/imprecise terminology in the results section of the paper.

      (4) Related to the above, I think further caveats need to be acknowledged in the study. We do not know what sounds are perceived as foreground or background sounds by ferrets, or indeed whether they make this distinction reliably to the degree that humans do. Furthermore, the individual sounds used here have not been tested for their foreground/background-ness. Thus, the analysis relies on two logical jumps - first, that the stationarity of these sounds predicts their foreground/background perception in humans, and second, that this perceptual distinction is similar in ferrets and humans. I don't think it is known to what degree these jumps are justified. These issues do not directly affect the results, but I think it is essential to address these issues in the Discussion, because they are potentially major caveats to our understanding of the work.

    2. Author response:

      Reviewer #1:

      (1) Changes in blood volume due to brain activity are indirectly related to neuronal responses. The exact relationship is not clear, however, we do know two things for certain: (a) each measurable unit of blood volume change depends on the response of hundreds or thousands of neurons, and (b) the time course of the volume changes are slow compared to the potential time course of the underlying neuronal responses. Both of these mean that important variability in neuronal responses will be averaged out when measuring blood changes. For example, if two neighbouring neurons have opposite responses to a given stimulus, this will produce opposite changes in blood volume, which will cancel each other out in the blood volume measurement due to (a). This is important in the present study because blood volume changes are implicitly being used as a measure of coding in the underlying neuronal population. The authors need to acknowledge that this is a coarse measure of neuronal responses and that important aspects of neuronal responses may be missing from the blood volume measure.

      The reviewer is correct: we do not measure neuronal firing, but use blood volume as a proxy for bulk local neuronal activity, which does not capture the richness of single neuron responses. We will highlight this point in the manuscript. This is why the paper focuses on large-scale spatial representations as well as cross-species comparison. For this latter purpose, fMRI responses are on par with our fUSI data, with both neuroimaging techniques showing the same weakness.

      (2) More importantly for the present study, however, the effect of (b) is that any rapid changes in the response of a single neuron will be cancelled out by temporal averaging. Imagine a neuron whose response is transient, consisting of rapid excitation followed by rapid inhibition. Temporal averaging of these two responses will tend to cancel out both of them. As a result, blood volume measurements will tend to smooth out any fast, dynamic responses in the underlying neuronal population. In the present study, this temporal averaging is likely to be particularly important because the authors are comparing responses to dynamic (nonstationary) stimuli with responses to more constant stimuli. To a first approximation, neuronal responses to dynamic stimuli are themselves dynamic, and responses to constant stimuli are themselves constant. Therefore, the averaging will mean that the responses to dynamic stimuli are suppressed relative to the real responses in the underlying neurons, whereas the responses to constant stimuli are more veridical. On top of this, temporal following rates tend to decrease as one ascends the auditory hierarchy, meaning that the comparison between dynamic and stationary responses will be differently affected in different brain areas. As a result, the dynamic/stationary balance is expected to change as you ascend the hierarchy, and I would expect this to directly affect the results observed in this study.

      It is not trivial to extrapolate from what we know about temporal following in the cortex to know exactly what the expected effect would be on the authors' results. As a first-pass control, I would strongly suggest incorporating into the authors' filterbank model a range of realistic temporal following rates (decreasing at higher levels), and spatially and temporally average these responses to get modelled cerebral blood flow measurements. I would want to know whether this model showed similar effects as in Figure 2. From my guess about what this model would show, I think it would not predict the effects shown by the authors in Figure 2. Nevertheless, this is an important issue to address and to provide control for.

      We understand the reviewer’s concern about potential differences in response dynamics to stationary vs non-stationary sounds. In particular, it seems that the reviewer is concerned that responses to foregrounds may be suppressed in non-primary fields because foregrounds are not stationary, and non-primary regions could struggle to track and respond to these sounds. Nevertheless, we observed the contrary, with non-primary regions over-representing non-stationary (dynamic) sounds relative to stationary ones. For this reason, we are inclined to think that this explanation cannot account for our findings.

      Furthermore, background sounds are not completely constant: they are still dynamic sounds, but their temporal modulation rates are usually faster (see Figure 3B). Similarly, neural responses to both types of sounds are dynamic (see for example Hamersky et al., 2025, Figure 1). Thus, we are not sure that blood volume would transform the responses to these types of sounds non-linearly.

      We understand the comment that temporal following rates might differ across regions in the auditory hierarchy and agree. In fact, we show that tuning to temporal rates differs across regions and partly explains the differences in background invariance we observe. We think the reviewer’s suggestion is already implemented by our spectrotemporal model, which incorporates the full range of realistic temporal following rates (up to 128 Hz). The temporal averaging is done as we take the output of the model (which varies continuously through time) and average it in the same window as we used for our fUSI data. When we fit this model to the ferret data, we find that voxels in non-primary regions, especially VP (tertiary auditory cortex), tend to be more tuned to low temporal rates (Figure 2F, G), and that background invariance is stronger in voxels tuned to low rates. This is, however, not true in humans, suggesting that background invariance in humans relies on different computational mechanisms.
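
      As an illustration only (not the analysis code used in the study), the window-averaging step described above can be sketched as follows. The sampling rate, the 2-4.8 s window (borrowed from the window quoted by Reviewer #2), and the two toy "channel outputs" are assumptions made for this sketch.

```python
import numpy as np

def window_average(x, fs, t_start, t_end):
    """Average a time-varying output over [t_start, t_end) seconds.

    x  : array whose last axis is time (e.g. channels x time)
    fs : sampling rate in Hz (assumed value, not taken from the paper)
    """
    t = np.arange(x.shape[-1]) / fs
    mask = (t >= t_start) & (t < t_end)
    return x[..., mask].mean(axis=-1)

fs = 100.0                                   # hypothetical sampling rate
t = np.arange(int(6 * fs)) / fs              # 6 s of simulated output

# Two toy channel outputs with the same mean level (1.0) but
# fluctuating at a slow (0.5 Hz) or a fast (32 Hz) rate.
slow = 1.0 + np.sin(2 * np.pi * 0.5 * t)
fast = 1.0 + np.sin(2 * np.pi * 32.0 * t)

slow_avg, fast_avg = window_average(np.stack([slow, fast]), fs, 2.0, 4.8)
print(f"slow channel: {slow_avg:.3f}, fast channel: {fast_avg:.3f}")
```

      In this toy case the windowed mean of the fast channel stays close to its average level, because fluctuations much faster than the window largely cancel, whereas the slow channel's mean still depends on where the window falls.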

      (3) I do not agree with the equivalence that the authors draw between the statistical stationarity of sounds and their classification as foreground or background sounds. It is true that, in a common foreground/background situation - speech against a background of white noise - the foreground is non-stationary and the background is stationary. However, it is easy to come up with examples where this relationship is reversed. For example, a continuous pure tone is perfectly stationary, but will be perceived as a foreground sound if played loudly. Background music may be very non-stationary but still easily ignored as a background sound when listening to overlaid speech. Ultimately, the foreground/background distinction is a perceptual one that is not exclusively determined by physical characteristics of the sounds, and certainly not by a simple measure of stationarity. I understand that the use of foreground/background in the present study increases the likely reach of the paper, but I don't think it is appropriate to use this subjective/imprecise terminology in the results section of the paper.

      We appreciate the reviewer’s comment that the classification of our sounds into foregrounds and backgrounds is not verified by any perceptual experiments. We use those terms to be consistent with the literature, including the paper we derived this definition from (Kell et al., 2019). These terms are widely used in studies where no perceptual or behavioral experiments are included, and even when animals are anesthetized. However, we will emphasize the limits of this definition when introducing it, as well as in the discussion.

      (4) Related to the above, I think further caveats need to be acknowledged in the study. We do not know what sounds are perceived as foreground or background sounds by ferrets, or indeed whether they make this distinction reliably to the degree that humans do. Furthermore, the individual sounds used here have not been tested for their foreground/background-ness. Thus, the analysis relies on two logical jumps - first, that the stationarity of these sounds predicts their foreground/background perception in humans, and second, that this perceptual distinction is similar in ferrets and humans. I don't think it is known to what degree these jumps are justified. These issues do not directly affect the results, but I think it is essential to address these issues in the Discussion, because they are potentially major caveats to our understanding of the work.

      We agree with the reviewer that the foreground-background distinction might be different in ferrets. In anticipation of that issue, we had enriched the sound set with more ecologically relevant sounds, such as ferret and other animal vocalizations. Nevertheless, the point remains valid and is already raised in the discussion. We will emphasize this limitation in addition to the limitation of our definition of foregrounds and backgrounds.

      Reviewer #2:

      (1) Interpretation of the cerebral blood volume signal: While the results are compelling, more caution should be exercised by the authors in framing their results, given that they are using an indirect measure of neural activity. This is the difference between stating "CBV in area MEG was less background invariant than in higher areas" vs. saying "MEG was less background invariant than other areas". Beyond framing, the basic properties of the CBV signal should be better explored:

      a) Cortical vasculature is highly structured (e.g. Kirst et al. (2020) Cell). One potential explanation for the results is simply differences in vasculature and blood flow between primary and secondary areas of auditory cortex, even if fUS is sensitive to changes in blood flow, changes in capillary beds, etc. (Mace et al. (2011) Nat. Methods). This concern could be addressed by either analyzing spontaneous fluctuations in the CBV signal during silent periods or computing a signal-to-noise ratio of voxels across areas across all sound types. This is especially important given the complex 3D geometry of gyri and sulci in the ferret brain.

      We agree with the reviewer that there could be differences in vasculature across subregions of the auditory cortex. We will run analyses providing comparisons of basic signal properties across our different regions of interest. We note that this point would also be valid for the human fMRI data, for which we cannot run these controls. Nevertheless, this should not affect our analyses and results, which should be independent of local vascular density. First, we normalize the signal in each voxel before any analysis, so that the absolute strength of the signal, or blood volume in a given voxel, does not matter. Second, we do see sound-evoked responses in all regions (Figure S2) and only focus on reliable voxels in each region. Third, our analysis mostly relies on voxel-based correlation across sounds, which is independent of the mean and variance of the voxel responses. Thus, we believe that differences in vascular architecture across regions are unlikely to affect our results.
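
      The third argument rests on a general property of the Pearson correlation: it is unchanged when a voxel's responses are multiplied by a positive gain and shifted by an offset. A minimal sketch with simulated data (the response vectors, gain, and offset are hypothetical and not taken from the study):

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two response vectors (one entry per sound)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

rng = np.random.default_rng(0)
n_sounds = 40

# Hypothetical responses of one voxel to isolated sounds and to mixtures.
resp_isolated = rng.normal(size=n_sounds)
resp_mixture = resp_isolated + 0.5 * rng.normal(size=n_sounds)

r_raw = pearson(resp_isolated, resp_mixture)

# The same voxel as if it sat in a region with a different vascular
# gain and baseline (both values are arbitrary assumptions).
gain, offset = 3.7, 12.0
r_scaled = pearson(gain * resp_isolated + offset, gain * resp_mixture + offset)

print(r_raw, r_scaled)  # identical up to floating-point error
```

      This only covers overall gain and baseline. A region-specific difference in measurement noise (signal-to-noise ratio) would still change the correlation, which is why the reliability-based voxel selection matters as a separate step.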

      b) Figure 1 leaves the reader uncertain what exactly is being encoded by the CBV signal, as temporal responses to different stimuli look very similar in the examples shown. One possibility is that the CBV is an acoustic change signal. In that case, sounds that are farther apart in acoustic space from previous sounds would elicit larger responses, which is straightforward to test. Another possibility is that the fUS signal reflects time-varying features in the acoustic signal (e.g. the low-frequency envelope). This could be addressed by cross-correlating the stimulus envelope with fUS waveform. The third possibility, which the authors argue, is that the magnitude of the fUS signal encodes the stimulus ID. A better understanding of the justification for only looking at the fUS magnitude in a short time window (2-4.8 s re: stimulus onset) would increase my confidence in the results.

      We thank the reviewer for raising that point as it highlights that the layout of Figure 1 is misleading. While Figure 1B shows an example snippet of our sound streams, Figure 1D shows the average timecourse of CBV time-locked to a change in sound (foreground or background, isolated or in a mixture). This is the average across all voxels and sounds, and the point is just to illustrate the dynamics for the three broad categories. In Figure 1E, however, we show the cross-validated cross-correlation of CBV across sounds (and different time lags). To obtain this, we compute for each voxel the response to each sound at each time lag, thus obtaining two vectors per lag (one per repeat), each with one entry per sound. Then, we correlate all these vectors across the two repeats, obtaining one cross-correlation matrix per voxel. We finally average these matrices across all voxels. The fact that you see red squares demonstrates that the signal encodes sound identity, since CBV is more similar across two repeats of the same sound (e.g., in the foreground-only matrix, 0-5 s vs 0-5 s) than across two different sounds (0-5 s vs. 7-12 s). We will modify the figure layout as well as the legend to improve clarity.
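
      The procedure described for Figure 1E can be sketched roughly as below. This is a hedged reconstruction from the description in the text, not the authors' code: the array layout, the function name `crossval_lag_correlation`, and the simulated data (a sound-specific signal present only at three lags) are all assumptions.

```python
import numpy as np

def crossval_lag_correlation(resp):
    """Cross-validated lag-by-lag correlation across sounds.

    resp : array of shape (n_voxels, 2, n_sounds, n_lags), i.e. two
           repeats per voxel (layout assumed for this sketch).
    Returns an (n_lags, n_lags) matrix averaged over voxels, where entry
    (i, j) correlates repeat 1 at lag i with repeat 2 at lag j.
    """
    def unit_center(a):
        # Center and normalize across sounds so dot products are Pearson r.
        a = a - a.mean(axis=1, keepdims=True)
        return a / np.linalg.norm(a, axis=1, keepdims=True)

    z1 = unit_center(resp[:, 0])            # (n_voxels, n_sounds, n_lags)
    z2 = unit_center(resp[:, 1])
    per_voxel = np.einsum("vsi,vsj->vij", z1, z2)
    return per_voxel.mean(axis=0)

# Simulated data: each voxel has a sound-specific response that is
# shared across repeats but present only at lags 2..4.
rng = np.random.default_rng(0)
n_voxels, n_sounds, n_lags = 50, 30, 6
tuning = rng.normal(size=(n_voxels, 1, n_sounds, 1))
lag_gain = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0])
noise = 0.5 * rng.normal(size=(n_voxels, 2, n_sounds, n_lags))
resp = tuning * lag_gain + noise

C = crossval_lag_correlation(resp)
print(np.round(C, 2))
```

      In the simulated matrix, a block of high values appears only for lag pairs where the sound-specific signal is present in both repeats, which is the kind of pattern the text refers to as red squares.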

      (2) Interpretation of the human data: The authors acknowledge in the discussion that there are several differences between fMRI and fUS. The results would be more compelling if they performed a control analysis where they downsampled the ferret fUS data spatially and temporally to match the resolution of fMRI and demonstrated that their ferret results hold with lower spatiotemporal resolution.

      We agree with the reviewer that the use of different techniques might get in the way of cross-species comparison. We will add additional discussion on this point. We already control for the temporal aspect by using the average of stimulus-evoked activity across time (note that due to scanner noise, sounds are presented cut into small pieces in the fMRI experiments). Regarding the spatial aspect, there are several things to consider. First, the two species have brains of very different sizes, a factor that is conveniently compensated for by the higher spatial resolution of fUSI compared to fMRI (0.1 vs 2 mm). Downsampling to fMRI resolution would lead to having one voxel per region per slice, which is not feasible. We also summarize results with one value per region, which is a form of downsampling that is fairer across species. Furthermore, we believe that we already established in a previous study (Landemard et al., 2021, eLife) that fUSI and fMRI data are comparable signals. Indeed, we could predict human fMRI responses to most sounds from ferret fUSI responses to the same sounds.

      Reviewer #3:

      As mentioned above, interpretation of the invariance analyses using predictions from the spectrotemporal modulation encoding model hinges on the model's ability to accurately predict neural responses. Although Figure S5 suggests the encoding model was generally able to predict voxel responses accurately, the authors note in the introduction that, in human auditory cortex, this kind of tuning can explain responses in primary areas but not in non-primary areas (Norman-Haignere & McDermott, PLOS Biol. 2018). Indeed, the prediction accuracy histograms in Figure S5C suggest a slight difference in the model's ability to predict responses in primary versus non-primary voxels. Additional analyses should be done to a) determine whether the prediction accuracies are meaningfully different across regions and b) examine whether controlling for prediction accuracy across regions (i.e., sub-selecting voxels across regions with matched prediction accuracy) affects the outcomes of the invariance analyses.

      The reviewer is correct: the spectrotemporal model tends to perform less well in human non-primary cortex. We believe this does not contradict our results but goes in the same direction: while there is a gradient in invariance in both ferrets and humans, this gradient is predicted by the spectrotemporal model in ferrets, but not in humans (possibly indeed because predictions are less good in human non-primary auditory cortex). Regardless of the mechanism, this result points to a difference across species. We will clarify these points by quantifying potential differences in prediction accuracy in both species and comment on those in the manuscript.

      A related concern is the procedure used to train the encoding model. From the methods, it appears that the model may have been fit using responses to both isolated and mixture sounds. If so, this raises questions about the interpretability of the invariance analyses. In particular, fitting the model to all stimuli, including mixtures, may inflate the apparent ability of the model to "explain" invariance, since it is effectively trained on the phenomenon it is later evaluated on. Put another way, if a voxel exhibits invariance, and the model is trained to predict the voxel's responses to all types of stimuli (both isolated sounds and mixtures), then the model must also show invariance to the extent it can accurately predict voxel responses, making the result somewhat circular. A more informative approach would be to train the encoding model only on responses to isolated sounds (or even better, a completely independent set of sounds), as this would help clarify whether any observed invariance is emergent from the model (i.e., truly a result of low-level tuning to spectrotemporal features) or simply reflects what it was trained to reproduce.

      We thank the reviewer for this suggestion and will run an additional prediction using only the sounds presented in isolation. This will be included in the next version of the manuscript.

      Finally, the interpretation of the foreground invariance results remains somewhat unclear. In ferrets (Figure 2I), the authors report relatively little foreground invariance, whereas in humans (Figure 5G), most participants appear to show relatively high levels of foreground invariance in primary auditory cortex (around 0.6 or greater). However, the paper does not explicitly address these apparent cross-species differences. Moreover, the findings in ferrets seem at odds with other recent work in ferrets (Hamersky et al. 2025 J. Neurosci.), which shows that background sounds tend to dominate responses to mixtures, suggesting a prevalence of foreground invariance at the neuronal level. Although this comparison comes with the caveat that the methods differ substantially from those used in the current study, given the contrast with the findings of this paper, further discussion would nonetheless be valuable to help contextualize the current findings and clarify how they relate to prior work.

      We thank the reviewer for this point. We will indeed add further discussion of the difference between ferrets and humans in foreground invariance in primary auditory cortex. In addition, while we found a trend for higher background invariance than foreground invariance in ferret primary auditory cortex, this difference was not significant and many voxels exhibit similar levels of background and foreground invariance (for example in Figure 2D, G). Thus, we do not think our results are inconsistent with Hamersky et al., 2025, though we agree the bias towards background sounds is not as strong in our data. This might indeed reflect differences in methodology, both in the signal that is measured (blood volume vs spikes) and in the sound presentation paradigm. We will add this point to our discussion.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      This is a very interesting paper investigating the fitness and cellular effects of mutations that drive dihedral protein complexes into forming filaments. The Levy group have previously shown that this can happen relatively easily in such complexes and this paper now investigates the cellular consequences of this phenomenon. The study is very rigorous biophysically and very surprisingly comes up empty in terms of an effect: apparently this kind of self-assembly can easily be tolerated in yeast, which was certainly not my expectation. This is a very interesting result, because it implies that such assemblies may evolve neutrally because they fulfill the two key requirements for such a trajectory: They are genetically easily accessible (in as little as a single mutation), and they have perhaps no detrimental effect on fitness. This immediately poses two very interesting questions: Are some natural proteins that are known to form filaments in the cell perhaps examples of such neutral trajectories? And if this trait is truly neutral (as long as it doesn't affect the base biochemical function of the protein in question), why don't we observe more proteins forming these kinds of ordered assemblies?

      I have no major comments about the experiments, as I find them in general very carefully carried out. I have three more general comments:

      1. The fitness effect of these assemblies, if one exists, seems very small. I think it's worth remembering that even very small fitness effects beyond even what competition experiments can reveal could in principle be enough to keep assembly-inducing alleles at very low frequencies in natural populations. Perhaps this could be acknowledged in the paper somewhere.
      2. The proteins used in this study, I think, were chosen such that they do not have an important function in yeast that could be disrupted by assembly. This allows the effect of the large-scale assemblies to be measured in isolation. If I deduced this correctly, this should probably be pointed out again in this paper (I apologise if I missed this).
      3. The model system in which these effects were tested for is yeast. This organism has a rigid cell wall and I was wondering if this makes it more tolerant to large scale assemblages than wall-less eukaryotes. Could the authors comment on this?

      Minor points:

      In Figure 2D, what are the fits? And is there any analysis that rules out expression effects on the mutant caused by higher levels of the wild-type? The error bars in Figure 2E are not defined.

      Significance

      This is a remarkably rigorous paper that investigates whether self-assembly into large structures has any fitness effect on a single-celled organism. This is very relevant, because a landmark paper from the Levy group showed that many proteins are very close in genetic terms to forming such assemblies. The general expectation I think would have been that this phenomenon is pretty harmful. This would have explained why such filaments are relatively rare as far as we know. This paper now does a large number of highly rigorous experiments to first prove beyond doubt that a range of model proteins really can be coaxed into forming such filaments in yeast cells through a very small number of mutations. Its perhaps most surprising result is that this does not negatively affect yeast cells.

      From an evolutionary perspective, this is a very interesting and highly surprising result. It forces us to rethink why such filaments are not more common in Nature. Two possible answers come to mind: First, it's possible that filamentation is not directly harmful to the cell, but that assembling proteins into filaments can interfere with their basic biochemical function (which was not tested for here).

      Second, perhaps assembly does cause a fitness defect, but one so small that it is hard to measure experimentally. Natural selection is very powerful, and even fitness coefficients we struggle to measure in the laboratory can have significant effects in the wild. If this is true, we might expect such filaments to be more common in organisms with small effective population sizes, in which selection is less effective.

      A third possibility is of course that the prevalence of such self-assembly is under-appreciated. Perhaps more proteins than we currently know assemble into these structures under some conditions without any benefit or detriment to the organism.

      These are all fascinating implications of this work that straddle the fields of evolutionary genetics and biochemistry and are therefore relevant to a very wide audience. My own expertise is in these two fields. I also think that this work will be exciting for synthetic biologists, because it proves that these kinds of assemblies are well tolerated inside cells.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper is an elegant, mostly observational work, detailing observations that polysome accumulation appears to drive nucleoid splitting and segregation. Overall I think this is an insightful work with solid observations.

      Thank you for your appreciation and positive comments. In our view, an appealing aspect of this proposed biophysical mechanism for nucleoid segregation is its self-organizing nature and its ability to intrinsically couple nucleoid segregation to biomass growth, regardless of nutrient conditions.

      Strengths:

      The strengths of this paper are the careful and rigorous observational work that leads to their hypothesis. They find the accumulation of polysomes correlates with nucleoid splitting, and that the nucleoid segregation occurring right after splitting correlates with polysome segregation. These correlations are also backed up by other observations:

      (1) Faster polysome accumulation and DNA segregation at faster growth rates.

      (2) Polysome distribution negatively correlating with DNA positioning near asymmetric nucleoids.

      (3) Polysomes form in regions inaccessible to similarly sized particles.

      The above points are observational; I have no comments on these observations leading to their hypothesis.

      Thank you!

      Weaknesses:

      It is hard to state weaknesses in any of the observational findings, and furthermore, their two tests of causality, while not being completely definitive, are likely the best one could do to examine this interesting phenomenon.

      It is indeed difficult to prove causality in a definitive manner when the proposed coupling mechanism between nucleoid segregation and gene expression is self-organizing, i.e., does not involve a dedicated regulatory molecule (e.g., a protein, RNA, metabolite) that we could have eliminated through genetic engineering to establish causality. We are grateful to the reviewer for recognizing that our two causality tests are the best that can be done in this context.

      Points to consider / address:

      Notably, demonstrating causality here is very difficult (given the coupling between transcription, growth, and many other processes) but an important part of the paper. They do two experiments toward demonstrating causality that help bolster - but not prove - their hypothesis. These experiments have minor caveats, my first two points.

      (1) First, "Blocking transcription (with rifampicin) should instantly reduce the rate of polysome production to zero, causing an immediate arrest of nucleoid segregation". Here they show that adding rifampicin does indeed lead to polysome loss and an immediate halting of segregation - data that does fit their model. This is not definitive proof of causation, as rifampicin also (a) stops cell growth, and (b) stops the translation of secreted proteins. Neither of these two possibilities is ruled out fully.

      That’s correct; cell growth also stops when gene expression is inhibited, which is consistent with our model in which gene expression within the nucleoid promotes nucleoid segregation and biomass growth (i.e., cell growth), inherently coupling these two processes. This said, we understand the reviewer’s point: the rifampicin experiment doesn’t exclude the possibility that protein secretion and cell growth drive nucleoid segregation. We are assuming that the reviewer is envisioning an alternative model in which sister nucleoids would move apart because they would be attached to the membrane through coupled transcription-translation-protein secretion (transertion) and the membrane would expand between the separating nucleoids, similar to the model proposed by Jacob et al. in 1963 (doi:10.1101/SQB.1963.028.01.048). There are several observations arguing against cell elongation/transertion acting as a predominant mechanism of nucleoid segregation.

      (1) For this alternative mechanism to work, membrane growth must be localized at the middle of the splitting nucleoids (i.e., midcell position for slow growth and ¼ and ¾ cell positions for fast growth) to create a directional motion. To our knowledge, there is no evidence of such localized membrane incorporation. Furthermore, even if membrane growth were localized at the right places, the fluidity of the cytoplasmic membrane (PMID: 6996724, 20159151, 24735432, 27705775) would be problematic. To circumvent the membrane fluidity issue, one could potentially invoke an additional connection to the rigid peptidoglycan, but then again, peptidoglycan growth would have to be localized at the middle of the splitting nucleoid. However, peptidoglycan growth is dispersed early in the cell division cycle when the nucleoid splitting happens in fast growing cells and only appears to be zonal after the onset of cell constriction (PMID: 35705811, 36097171, 2656655).

      (2) Even if we ignore the aforementioned caveats, Paul Wiggins’s group ruled out the cell elongation/transertion model by showing that the rate of cell elongation is slower than the rate of chromosome segregation (PMID: 23775792). In our revised manuscript, we clarify this point and provide confirmatory data showing that the cell elongation rate is indeed slower than the nucleoid segregation rate (Figure 1H and Figure 1 - figure supplement 5A), indicating that it cannot be the main driver.

      (3) The asymmetries in nucleoid compaction that we described in our paper are predicted by our model. We do not see how they could be explained by cell growth or protein secretion.

      (4) We also show that polysome accumulation at ectopic sites (outside the nucleoid) results in correlated nucleoid dynamics, consistent with our proposed mechanism. It is not clear to us how such nucleoid dynamics could be explained by cell growth or protein secretion (transertion).

      (1a) As rifampicin also stops all translation, it also stops translational insertion of membrane proteins, which in many old models has been put forward as a possible driver of nucleoid segregation, and perhaps independent of growth. This should at least be mentioned in the discussion, or if there are past experiments that rule this out it would be great to note them.

      It is not clear to us how the attachment of the DNA to the cytoplasmic membrane could alone create a directional force to move the sister nucleoids. We agree that old models have proposed a role for cell elongation (providing the force) and transertion (providing the membrane tether). Please see our response above for the evidence (from the literature and our work) against it. This was mentioned in the Introduction and Results section, but we agree that this was not well explained. We have now put emphasis on the related experimental data (Figure 1H, Figure 1 – figure supplement 5A) and revised the text (lines 199 - 210) to clarify these points.

      (1b) They address at great length in the discussion the possibility that growth may play a role in nucleoid segregation. However, this is testable - by stopping surface growth with antibiotics. Cells should still accumulate polysomes for some time, it would be easy to see if nucleoids are still segregated, and to what extent, thereby possibly decoupling growth and polysome production. If successful, this or similar experiments would further validate their model.

      We reviewed the literature and could not find a drug that stops cell growth without stopping gene expression. Any drug that affects the integrity or potential of the membrane depletes cells of ATP; without ATP, gene expression is inhibited. However, our experiment in which we drive polysome accumulation at ectopic sites decouples polysome accumulation from cell growth. In this experiment, by redirecting most chromosomal gene expression to a single plasmid-encoded gene, we reduce the rate of cell growth but still create a large accumulation of polysomes at an ectopic location. This ectopic polysome accumulation is sufficient to affect nucleoid dynamics in a correlated fashion. In the revised manuscript, we have clarified this point and added model simulations (Figure 7 – figure supplement 2) to show that our experimental observations are predicted by our model.

      (2) In the second experiment, they express excess TagBFP2 to delocalize polysomes from midcell. Here they again see the anticorrelation of the nucleoid and the polysomes, and in some cells, it appears similar to normal (polysomes separating the nucleoid) whereas in others the nucleoid has not separated. The one concern about this data - and the differences between the "separated" and "non-separated" nuclei - is that the over-expression of TagBFP2 has a huge impact on growth, which may also have an indirect effect on DNA replication and termination in some of these cells. Could the authors demonstrate these cells contain 2 fully replicated DNA molecules that are able to segregate?

      We have included new flow cytometry data of fluorescently labeled DNA to show that DNA replication is not impacted.

      (3) What is not clearly stated, and is needed in this paper, is an explanation of how polysomes do (or could) "exert force" in this system to segregate the nucleoid: what a "compaction force" is by definition, and what mechanism causes it to arise (what causes the "force"), given that the "compaction force" arises from new polysomes being added into the gaps between them caused by thermal motions.

      They state, "polysomes exert an effective force", and they note their model requires "steric effects (repulsion) between DNA and polysomes" for the polysomes to segregate, which makes sense. But this makes it unclear to the reader what is giving the force. As written, it is unclear if (a) these repulsions alone are making the force, or (b) it is the accumulation of new polysomes in the center, adding more "repulsive" material, that causes the nucleoids to move. If polysomes are concentrated more between nucleoids, and the polysome concentration does not increase, the DNA will not be driven apart (as in the first case). However, in the second case (which seems to be their model), the addition of new material (new polysomes) into a sterically crowded space is not exerting force - it is filling in the gaps between the molecules in that region, space that needs to arise somehow (like via Brownian motion). In other words, if the polysome region is crowded with polysomes, space must be made between these polysomes for new polysomes to be inserted, and this space must be made by thermal (or ATP-driven) fluctuations of the molecules. Thus, if polysome accumulation drives the DNA segregation, it is not "exerting force", but rather the addition of new polysomes is iteratively rectifying gaps being made by Brownian motion.

      We apologize for the understandable confusion. In our picture, the polysomes and DNA (conceptually considered as small plectonemic segments) basically behave as dissolved particles. If these particles were noninteracting, they would simply mix. However, both polysomes and DNA segments are large enough to interact sterically. So as density increases, steric avoidance implies a reduced conformational entropy and thus a higher free energy per particle. We argue (based on Miangolarra et al. 2021 PMID: 34675077 and Xiang et al. 2021 PMID: 34186018) that the demixing of polysomes and DNA segments occurs because DNA segments pack better with each other than they do with polysomes. This raises the free energy cost associated with DNA-polysome interactions compared to DNA-DNA interactions. We model this effect by introducing a term in the free energy, χ_np, which we refer to as a repulsion between DNA and polysomes, though, as explained above, it arises from entropic effects. At realistic cellular densities of DNA and polysomes, this repulsive interaction is strong enough to cause the DNA and polysomes to phase separate.

      This same density-dependent free energy that causes phase separation can also give rise to forces, just in the way that a higher pressure on one side of a wall can give rise to a net force on the wall. Indeed, the “compaction force” we refer to is fundamentally an osmotic pressure difference. At some stages during nucleoid segregation, the region of the cell between nucleoids has a higher polysome concentration, and therefore a higher osmotic pressure, than the regions near the poles. This results in a net poleward force on the sister nucleoids that drives their migration toward the poles. This migration continues until the osmotic pressure equilibrates. Therefore, both phase separation (due to the steric repulsion described above) and nonequilibrium polysome production and degradation (which creates the initial accumulation of polysomes around midcell) are essential ingredients for nucleoid segregation.
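
      The pressure argument above can be sketched numerically. The snippet below uses the osmotic pressure of a non-interacting lattice gas as a stand-in; this functional form, the function names, and the example volume fractions are all assumptions chosen for illustration and are not taken from the manuscript.

```python
import math


def osmotic_pressure(phi: float) -> float:
    """Osmotic pressure of a lattice gas, in units of kT per site volume.

    phi is the local polysome volume fraction (hypothetical values).
    The pressure rises monotonically with phi.
    """
    if not 0.0 <= phi < 1.0:
        raise ValueError("phi must lie in [0, 1)")
    return -math.log(1.0 - phi)


def net_poleward_pressure(phi_midcell: float, phi_pole: float) -> float:
    """Pressure difference across a nucleoid; positive values push it poleward."""
    return osmotic_pressure(phi_midcell) - osmotic_pressure(phi_pole)


if __name__ == "__main__":
    # Polysomes enriched between nucleoids: net poleward push.
    print(net_poleward_pressure(0.3, 0.1) > 0)
    # Equal concentrations on both sides: no net push.
    print(net_poleward_pressure(0.2, 0.2) == 0)
```

      The difference vanishes once the two concentrations match, mirroring the statement that migration continues until the osmotic pressure equilibrates.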

      This has been clarified in the revised text, with the support of additional simulation results showing how the asymmetry in polysome distribution causes a compaction force (Figure 4A).

      The authors use polysome accumulation and phase separation to describe what is driving nucleoid segregation. Both terms are accurate, but it might help the less physically inclined reader to have one term, or have what each of these means explicitly defined at the start. I say this most especially in terms of "phase separation", as the currently huge momentum toward liquid-liquid interactions in biology causes the phrase "phase separation" to often evoke a number of wider (and less defined) phenomena and ideas that may not apply here. Thus, a simple clear definition at the start might help some readers.

      In our case, phase separation means that the DNA-polysome steric repulsion is strong enough to drive their demixing, which creates a compact nucleoid. As mentioned in a previous point, this effect is captured in the free energy by the χ_np term, which is an effective repulsion between DNA and polysomes, though it arises from entropic effects.

      In the revised manuscript, we now illustrate this with our theoretical model by initializing a cell with a diffuse nucleoid and low polysome concentration. For the sake of simplicity, we assume that the cell does not elongate. We observe that the DNA-polysome steric repulsion is sufficient to compact the nucleoid and place it at mid-cell (new Figure 4A).

      (4) Line 478. "Altogether, these results support the notion that ectopic polysome accumulation drives nucleoid dynamics". Is this right? Should it not read "results support the notion that ectopic polysome accumulation inhibits/redirects nucleoid dynamics"?

      We think that the ectopic polysome accumulation drives nucleoid dynamics. In our theoretical model, we can introduce polysome production at fixed sources to mimic the experiments where ectopic polysome production is achieved by high plasmid expression. The model is able to recapitulate the two main phenotypes observed in experiments (Figure 7). These new simulation results have been added to the revised manuscript (Figure 7 – figure supplement 2).

      (5) It would be helpful to clarify what happens as the RplA-GFP signal decreases at midcell in Figure 1- is the signal then increasing in the less "dense" parts of the cell? That is, (a) are the polysomes at midcell redistributing throughout the cell? (b) is the total concentration of polysomes in the entire cell increasing over time?

      It is a redistribution—the RplA-GFP signal remains constant in concentration from cell birth to division (Figure 1 – Figure Supplement 1E). This is now clarified in the revised text.

      (6) Line 154. "Cell constriction contributed to the apparent depletion of ribosomal signal from the mid-cell region at the end of the cell division cycle (Figure 1B-C and Movie S1)" - It would be helpful if when cell constriction began and ended was indicated in Figures 1B and C.

      Good idea. We have added markers in Figure 1C to indicate the average start of cell constriction. This relative time from birth to division was estimated as described in the new Figure 1 – figure supplement 2. We have also indicated that cell birth and division correspond to the first and last images/timepoints in Figure 1B and C, respectively. The two-dimensional average cell projections presented in Figure 3D also indicate the average timing of cell constriction, consistent with our analysis in Figure 1 – figure supplement 2.

      (7) In Figure 7 they demonstrate that radial confinement is needed for longitudinal nucleoid segregation. It should be noted (and cited) that past experiments of Bacillus L-forms in microfluidic channels showed a clear requirement for rod shape (and a given width) in the positioning and spacing of the nucleoids.

      Wu et al, Nature Communications, 2020. "Geometric principles underlying the proliferation of a model cell system" https://dx.doi.org/10.1038/s41467-020-17988-7

      Good point! We have revised the text to mention this work. Thank you.

      (8) "The correlated variability in polysome and nucleoid patterning across cells suggests that the size of the polysome-depleted spaces helps determine where the chromosomal DNA is most concentrated along the cell length. This patterning is likely reinforced through the displacement of the polysomes away from the DNA dense region"

      It should be noted this likely functions not just in one direction (polysomes dictating DNA location), but also in the reverse - as the footprint of compacted DNA should also exclude (and thus affect) the location of polysomes

      We agree that the effects could go both ways at this early stage of the story. We have revised the text accordingly.

      (9) Line 159. "Rifampicin is a transcription inhibitor that causes polysome depletion over time. This indicates that all ribosomal enrichments consist of polysomes and therefore will be referred to as polysome accumulations hereafter". Here and throughout this paper they use the term polysome, but cells also have monosomes (and 2 somes, etc). Rifampicin stops the assembly of all of these, and thus the loss of localization could occur from both. Thus, is it accurate to state that all transcription events occur in polysomes? Or are they grouping all of the n-somes into one group?

      In the original discussion, we noted that our term “polysomes” also includes monosomes for simplicity, but we agree that the term should have been defined much earlier. The manuscript has been revised accordingly. Furthermore, in the revised manuscript, we have included additional simulation results with three different diffusion coefficients that reflect different polysome sizes to show that different polysome species with fewer or more ribosomes give similar results (Figure 4 – figure supplement 4). This shows that the average polysome description in our model is sufficient.

      Thank you for the valuable comments and suggestions!

      Reviewer #2 (Public review):

      Summary:

      The authors perform a remarkably comprehensive, rigorous, and extensive investigation into the spatiotemporal dynamics between ribosomal accumulation, nucleoid segregation, and cell division. Using detailed experimental characterization and rigorous physical models, they offer a compelling argument that nucleoid segregation rates are determined at least in part by the accumulation of ribosomes in the center of the cell, exerting a steric force to drive nucleoid segregation prior to cell division. This evolutionarily ingenious mechanism means cells can rely on ribosomal biogenesis as the sole determinant for the growth rate and cell division rate, avoiding the need for two separate 'sensors,' which would require careful coupling.

      Terrific summary! Thank you for your positive assessment.

      Strengths:

      In terms of strengths; the paper is very well written, the data are of extremely high quality, and the work is of fundamental importance to the field of cell growth and division. This is an important and innovative discovery enabled through a combination of rigorous experimental work and innovative conceptual, statistical, and physical modeling.

      Thank you!

      Weaknesses:

      In terms of weaknesses, I have three specific thoughts.

      Firstly, my biggest question (and this may or may not be a bona fide weakness) is how unambiguously the authors can be sure their ribosomal labeling is reporting on polysomes, specifically. My reading of the work is that the loss of spatial density upon rifampicin treatment is used to infer that spatial density corresponds to polysomes, yet this feels like a relatively indirect way to get at this question, given rifampicin targets RNA polymerase and not translation. It would be good if a more direct way to confirm polysome dependence were possible.

      The heterogeneity of ribosome distribution inside E. coli cells has been attributed to polysomes by many labs (PMID: 25056965, 38678067, 22624875, 31150626, 34186018, 10675340). The attribution is also consistent with single-molecule tracking experiments showing that slow-moving ribosomes (polysomes) are excluded by the nucleoid whereas fast-diffusing ribosomes (free ribosomal subunits) are distributed throughout the cytoplasm (PMID: 25056965, 22624875). These points are now mentioned in the revised manuscript.

      Second, the authors invoke a phase separation model to explain the data, yet it is unclear whether there is any particular evidence supporting such a model, whether they can exclude simpler models of entanglement/local diffusion (and/or perhaps this is what is meant by phase separation?) and it's not clear if claiming phase separation offers any additional insight/predictive power/utility. I am OK with this being proposed as a hypothesis/idea/working model, and I agree the model is consistent with the data, BUT I also feel other models are consistent with the data. I also very much do not think that this specific aspect of the paper has any bearing on the paper's impact and importance.

      We appreciate the reviewer’s comment, but the output of our reaction-diffusion model is a bona fide phase separation (spinodal decomposition). So, we feel that we need to use the term when reporting the modeling results. Inside the cell, the situation is more complicated. As the reviewer points out, there are likely entanglements (not considered in our model) and other important factors (please see our discussion on the model limitations). This said, we have revised our text to clarify our terms and proposed mechanism.

      Finally, the writing and the figures are of extremely high quality, but the sheer volume of data here is potentially overwhelming. I wonder if there is any way for the authors to consider stripping down the text/figures to streamline things a bit? I also think it would be useful to include visually consistent schematics of the question/hypothesis/idea each of the figures is addressing to help keep readers on the same page as to what is going on in each figure. Again, there was no figure or section I felt was particularly unclear, but the sheer volume of text/data made reading this quite the mental endurance sport! I am completely guilty of this myself, so I don't think I have any super strong suggestions for how to fix this, but just something to consider.

      We agree that there is a lot to digest. We could not come up with great ideas for visuals other than the schematics we already provide. However, we have revised the text to clarify our points and added a simulation result (Figure 4A) to help explain biophysical concepts.

      Reviewer #3 (Public review):

      Summary:

      Papagiannakis et al. present a detailed study exploring the relationship between DNA/polysome phase separation and nucleoid segregation in Escherichia coli. Using a combination of experiments and modelling, the authors aim to link physical principles with biological processes to better understand nucleoid organisation and segregation during cell growth.

      Strengths:

      The authors have conducted a large number of experiments under different growth conditions and physiological perturbations (using antibiotics) to analyse the biophysical factors underlying the spatial organisation of nucleoids within growing E. coli cells. A simple model of ribosome-nucleoid segregation has been developed to explain the observations.

      Weaknesses:

      While the study addresses an important topic, several aspects of the modelling, assumptions, and claims warrant further consideration.

      Thank you for your feedback. Please see below for a response to each concern.

      Major Concerns:

      Oversimplification of Modelling Assumptions:

      The model simplifies nucleoid organisation by focusing on the axial (long-axis) dimension of the cell while neglecting the radial dimension (cell width). While this approach simplifies the model, it fails to explain key experimental observations, such as:

      (1) Inconsistencies with Experimental Evidence:

      The simplified model presented in this study predicts that translation-inhibiting drugs like chloramphenicol would maintain separated nucleoids due to increased polysome fractions. However, experimental evidence shows the opposite-separated nucleoids condense into a single lobe post-treatment (Bakshi et al 2014), indicating limitations in the model's assumptions/predictions. For the nucleoids to coalesce into a single lobe, polysomes must cross the nucleoid zones via the radial shells around the nucleoid lobes.

      We do not think that the results from chloramphenicol-treated cells are inconsistent with our model. Our proposed mechanism predicts that nucleoids will condense in the presence of chloramphenicol, consistent with experiments. It also predicts that nucleoids that were still relatively close at the time of chloramphenicol treatment could fuse if they eventually touched through diffusion (thermal fluctuation) to reduce their interaction with the polysomes and minimize their conformational energy. Fusion is, however, not expected for well-separated nucleoids since their diffusion is slow in the crowded cytoplasm. This is consistent with our experimental observations: In the presence of a growth-inhibitory concentration of chloramphenicol (70 μg/mL), nucleoids in relatively close proximity can fuse, but well-separated nucleoids condense and do not fuse. Since the growth rate inhibition is not immediate upon chloramphenicol treatment, many cells with well-separated condensed nucleoids divide during the first hour. As a result, the non-fusion phenotype is more obvious in non-dividing cells, achieved by pre-treating cells with the cell division inhibitor cephalexin (50 μg/mL). In these polyploid elongated cells, well-separated nucleoids condensed but did not fuse, not even after an hour in the presence of chloramphenicol. We have revised the manuscript to add these data (illustrative images + a quantitative analysis) in Figure 4 – figure supplement 1.

      (2) The peripheral localisation of nucleoids observed after A22 treatment in this study and others (e.g., Japaridze et al., 2020; Wu et al., 2019), which conflicts with the model's assumptions and predictions. The assumption of radial confinement would predict nucleoids to fill up the volume or ribosomes to go near the cell wall, not the nucleoid, as seen in the data.

      The reviewer makes a good point that DNA attachment to the membrane through transertion could contribute to the nucleoid being peripherally localized in A22 cells. We have revised the text to add this point. However, we do not think that this contradicts the proposed nucleoid segregation mechanism described in our model. On the contrary, by attaching the nucleoid to the cytoplasmic membrane along the cell width, transertion might help reduce the diffusion and thus exchange of polysomes across nucleoids. We have revised the text to discuss transertion over radial confinement.

      (3) The radial compaction of the nucleoid upon rifampicin or chloramphenicol treatment, as reported by Bakshi et al. (2014) and Spahn et al. (2023), also contradicts the model's predictions. This is not expected if the nucleoid is already radially confined.

      We originally evoked radial confinement to explain the observation that polysome accumulations do not equilibrate between DNA-free regions. We agree that transertion is an alternative explanation. Thank you for bringing it to our attention. However, please note that this does not contradict the model. In our view, it actually supports the 1D model by providing a reasonable explanation for the slow exchange of polysomes across DNA-free regions. The attachment of the nucleoid to the membrane along the cell width may act as a diffusion barrier. We have revised the text and the title of the manuscript accordingly.

      (4) Radial Distribution of Nucleoid and Ribosomal Shell:

      The study does not account for well-documented features such as the membrane attachment of chromosomes and the ribosomal shell surrounding the nucleoid, observed in super-resolution studies (Bakshi et al., 2012; Sanamrad et al., 2014). These features are critical for understanding nucleoid dynamics, particularly under conditions of transcription-translation coupling or drug-induced detachment. Work by Yongren et al. (2014) has also shown that the radial organisation of the nucleoid is highly sensitive to growth and the multifork nature of DNA replication in bacteria.

      We have revised the manuscript to discuss the membrane attachment. Please see the previous response.

      The omission of organisation in the radial dimension and the entropic effects it entails, such as ribosome localisation near the membrane and nucleoid centralisation in expanded cells, undermines the model's explanatory power and predictive ability. Some observations have been previously explained by the membrane attachment of nucleoids (a hypothesis proposed by Rabinovitch et al., 2003, and supported by experiments from Bakshi et al., 2014, and recent super-resolution measurements by Spahn et al.).

      We agree—we have revised the text to discuss membrane attachment in the radial dimension. See previous responses.

      Ignoring the radial dimension and membrane attachment of nucleoid (which might coordinate cell growth with nucleoid expansion and segregation) presents a simplistic but potentially misleading picture of the underlying factors.

      Please see above.

      This reviewer suggests that the authors consider an alternative mechanism, supported by strong experimental evidence, as a potential explanation for the observed phenomena:

      Nucleoids may transiently attach to the cell membrane, possibly through transertion, allowing for coordinated increases in nucleoid volume and length alongside cell growth and DNA replication. Polysomes likely occupy cellular spaces devoid of the nucleoid, contributing to nucleoid compaction due to mutual exclusion effects. After the nucleoids separate following ter separation, axial expansion of the cell membrane could lead to their spatial separation.

      This “membrane attachment/cell elongation” model is reminiscent of the hypothesis proposed by Jacob et al. in 1963 (doi:10.1101/SQB.1963.028.01.048). There are several lines of evidence arguing against it as the major driver of nucleoid segregation:

      (Below is a slightly modified version of our response to a comment from Reviewer 1—see page 3)

      (1) For this alternative model to work, axial membrane expansion (i.e., cell elongation) would have to be localized at the middle of the splitting nucleoids (i.e., midcell position for slow growth and ¼ and ¾ cell positions for fast growth) to create a directional motion. To our knowledge, there is no evidence of such localized membrane incorporation. Furthermore, even if membrane growth were localized at the right places, the fluidity of the cytoplasmic membrane (PMID: 6996724, 20159151, 24735432, 27705775) would be problematic. To get around this fluidity issue, one could evoke a connection to the rigid peptidoglycan, but then again, peptidoglycan growth would have to be localized at the middle of the splitting nucleoid to “push” the sister nucleoids apart from each other. However, peptidoglycan growth is dispersed prior to cell constriction (PMID: 35705811, 36097171, 2656655).

      (2) Even if we ignore the aforementioned caveats, Paul Wiggins’s group ruled out the cell elongation/transertion model by showing that the rate of cell elongation is slower than the rate of chromosome segregation (PMID: 23775792). In the revised manuscript, we confirm that the cell elongation rate is indeed overall slower than the nucleoid segregation rate (see Figure 1 - figure supplement 5A where the subtraction of the cell elongation rate from the nucleoid segregation rate at the single-cell level leads to positive values).

      (3) Furthermore, our correlation analysis comparing the rate of nucleoid segregation to the rate of either cell elongation or polysome accumulation argues that polysome accumulation plays a larger role than cell elongation in nucleoid segregation. These data were already shown in the original manuscript (Figure 1I and Figure 1 – figure supplement 5B) but were not highlighted in this context. We have revised the text to clarify this point.

      (4) The membrane attachment/cell elongation model does not explain the nucleoid asymmetries described in our paper (Figure 3), whereas they can be recapitulated by our model.

      (5) The cell elongation/transertion model cannot predict the aberrant nucleoid dynamics observed when chromosomal expression is largely redirected to plasmid expression (Figure 7). In the revised manuscript, we have added simulation results showing that these nucleoid dynamics are predicted by our model (Figure 7 – figure supplement 2).
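
      The single-cell comparison described in point (2) amounts to subtracting each cell's elongation rate from its nucleoid segregation rate and checking the sign. A minimal sketch follows; the function names and the rate values in the usage example are hypothetical placeholders, not measurements from the study.

```python
from typing import List, Sequence


def rate_differences(
    segregation_rates: Sequence[float],
    elongation_rates: Sequence[float],
) -> List[float]:
    """Per-cell difference: nucleoid segregation rate minus cell elongation rate.

    Both sequences must list the same cells in the same order and use the
    same units (for example, micrometers per minute).
    """
    if len(segregation_rates) != len(elongation_rates):
        raise ValueError("need exactly one pair of rates per cell")
    return [s - e for s, e in zip(segregation_rates, elongation_rates)]


def fraction_positive(differences: Sequence[float]) -> float:
    """Fraction of cells in which segregation outpaces elongation."""
    if not differences:
        raise ValueError("no cells to summarize")
    return sum(1 for d in differences if d > 0) / len(differences)


if __name__ == "__main__":
    # Hypothetical rates for four cells, for illustration only.
    segregation = [0.040, 0.035, 0.050, 0.020]
    elongation = [0.020, 0.025, 0.030, 0.022]
    diffs = rate_differences(segregation, elongation)
    print(fraction_positive(diffs))
```

      Differences that are positive in most cells would indicate that elongation alone is too slow to account for the observed segregation, which is the argument made in point (2).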

      Based on these arguments, we do not believe that a mechanism based on membrane attachment and cell elongation is the major driver of nucleoid segregation. However, we do believe that it may play a complementary role (see “Nucleoid segregation likely involves multiple factors” in the Discussion). We have revised the text to clarify our thoughts and mention the potential role of transertion.

      Incorporating this perspective into the discussion or future iterations of the model may provide a more comprehensive framework that aligns with the experimental observations in this study and previous work.

      As noted above, we have revised the text to mention transertion.

      Simplification of Ribosome States:

      Combining monomeric and translating ribosomes into a single 'polysome' category may overlook spatial variations in these states, particularly during ribosome accumulation at the mid-cell. Without validating uniform mRNA distribution or conducting experimental controls such as FRAP or single-molecule measurements to estimate the proportions of ribosome states based on diffusion, this assumption remains speculative.

      Indeed, for simplicity, we adopt an average description of all polysomes with an average diffusion coefficient and interaction parameters, which is sufficient for capturing the fundamental mechanism underlying nucleoid segregation. To illustrate that considering multiple polysome species does not change the physical picture, we have considered an extension of our model, which contains three polysome species, each with a different diffusion coefficient (D<sub>P</sub> = 0.018, 0.023, or 0.028 μm<sup>2</sup>/s), reflecting that polysomes with more ribosomes will have a lower diffusion coefficient. Simulation of this model reveals that the different polysome species have essentially the same concentration distribution, suggesting that the average description in our minimal model is sufficient for our purposes. We present these new simulation results in Figure 4 – figure supplement 4 of the revised manuscript.
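
      As a rough way to compare the three diffusion coefficients quoted above, the sketch below estimates the time for the root-mean-square displacement of each species to reach a given distance in one dimension, using t = L^2 / (2 D). The 1 μm distance and the function names are assumptions for illustration; this is a back-of-the-envelope estimate, not the reaction-diffusion simulation reported in the manuscript.

```python
from typing import List

# Diffusion coefficients quoted in the response, in square micrometers per second.
DIFFUSION_COEFFICIENTS = (0.018, 0.023, 0.028)


def diffusion_time(distance_um: float, diffusion_coefficient: float) -> float:
    """Time (s) for the 1D root-mean-square displacement to reach distance_um.

    Uses <x^2> = 2 D t, so t = L^2 / (2 D).
    """
    if diffusion_coefficient <= 0:
        raise ValueError("diffusion coefficient must be positive")
    return distance_um ** 2 / (2.0 * diffusion_coefficient)


def timescales(distance_um: float = 1.0) -> List[float]:
    """Diffusion times for the three polysome species over the same distance."""
    return [diffusion_time(distance_um, d) for d in DIFFUSION_COEFFICIENTS]


if __name__ == "__main__":
    # The 1 micrometer distance is an assumed, illustrative length scale.
    for d, t in zip(DIFFUSION_COEFFICIENTS, timescales(1.0)):
        print(f"D = {d} um^2/s -> t = {t:.1f} s")
```

      Because the fastest and slowest coefficients differ by less than a factor of 1.6, the estimated timescales are of the same order, which fits with the simulation result that the three species share essentially the same concentration distribution.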

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Does the polysome density correlate with the origins? If the majority of ribosomal genes are expressed near the origins,

      This is indeed an interesting point that we mention in the discussion. The fact that the chromosomal origin is surrounded by highly expressed genes (PMID: 30904377) and is located near the middle of the nucleoid prior to DNA replication (PMID: 15960977, 27332118, 34385314, 37980336) can only help the model that we propose by increasing the polysome density at the mid-nucleoid position.

      (2) Red lines in 3C are hard to resolve - can the authors make them darker?

      Absolutely. Sorry about that.

      Reviewer #2 (Recommendations for the authors):

      The authors use rifampicin treatment as a mechanism to trigger polysome disassembly and show this leads to homogenous RplA distribution. This is a really important experiment as it is used to link RplA localization to polysomes, and to argue that RplA density is reporting on polysomes. Given rifampicin inhibits RNA polymerase, and given the only reference of the three linking rifampicin to polysome disassembly is (the 1971 Blundell and Wild ref), it would perhaps be useful to more conclusively show that polysome depletion (as opposed to inhibition of mRNA synthesis, which is upstream of polysome assembly) by using an alternative compound more commonly linked to polysome disassembly (e.g., puromycin) and show timelapse loss of density as a function of treatment time. This is not a required experiment, but given the idea that RplA density reports on polysomes is central to the authors' interpretation, it feels like this would be a thing worth being certain of. An alternative model is that ribosomes undergo self-assembly into local storage depots when not being used, but those depots are not translationally active/lack polysomes. I don't know if I think this is likely, but I'm not convinced the rifampicin treatment + waiting for a relatively long period of time unambiguously excludes other possible mechanisms given the large scale remodeling of the intracellular environment upon mRNA inhibition. I 100% buy the relationship between ribosomal distribution and nucleoid segregation (and the ectopic expression experiments are amazing in this regard), so my own pause for thought here is "do we know those ribosomes are in polysomes in the ribosome-dense regions". I'm not sure the answer to this question has any bearing on the impact and importance of this work (in my mind, it doesn't, but perhaps there's a reason it does?). The way to unambiguously show this would really be to do CryoET and show polysomes in the dense ribosomal regions, but I would never suggest the authors do that here (that's an entire other paper!).

      We agree that mRNAs play a role, as mRNAs are major components of polysomes and most mRNAs are expected to be in the form of polysomes (i.e., in complex with ribosomes). In addition, as mentioned above, the enrichments of ribosome distribution are known to be associated with polysomes (PMID: 25056965, 38678067, 22624875, 31150626, 34186018, 10675340). The attribution is consistent with single-molecule tracking experiments showing that slow-moving ribosomes (polysomes) are excluded by the nucleoid whereas fast-diffusing ribosomes (free ribosomal subunits) are distributed throughout the cytoplasm (PMID: 25056965, 22624875). This is also consistent with cryo-ET results that we actually published (see Figure S5, PMID: 34186018). We have added this information to the revised manuscript. Thank you for alerting us to this oversight.

      On line 320 the authors state "Our single-cell studies provided experimental support that phase separation between polysomes and DNA contributes to nucleoid segregation." - this comes pretty out of left field? I didn't see any discussion of this hypothesis leading up to this sentence, nor is there evidence I can see that necessitates phase separation as a mechanistic explanation unless we are simply using phase separation to mean cellular regions with distinct cellular properties (which I would advise against). If the authors really want to pursue this model I think much more support needs to be provided here, including (1) defining what the different phases are, (2) providing explicit description of what the attractive/repulsive determinants of these different phases could be/are, and (3) ruling out a model where the behavior observed is driven by a combination of DNA / polysome entanglement + steric exclusion; if this is actually the model, then being much more explicit about this being a locally arrested percolation phenomenon would be essential. Overall, however, I would probably dissuade the authors from pursuing the specific underlying physics of what drives the effects they're seeing in a Results section, solely because I think ruling in/out a model unambiguously is very difficult. Instead, this would be a useful topic for a Discussion, especially couched under a "our data are consistent with..." if they cannot exclude other models (which I think is unreasonably difficult to do).

      Thank you for your advice. We have revised the text to more carefully choose our words and define our terms.

      Minor comments:

      The results in "Cell elongation may also contribute to sister nucleoid migration near the end of the division cycle" are really interesting, but this section is one big paragraph, and I might encourage the authors to divide this paragraph up to help the reader parse this complex (and fascinating) set of results!

      We have revised this section to hopefully make it more accessible.

      Reviewer #3 (Recommendations for the authors):

      Technical Controls:

      The authors should conduct a photobleaching control to confirm that the perceived 'higher' brightness of new ribosomes at the mid-cell position is not an artefact caused by older ribosomes being photobleached during the imaging process. Comparing results at various imaging frequencies and intensities is necessary to address this issue.

      The ribosome localization data across 30 nutrient conditions (Figure 2, Figure 1 – figure supplement 6, Figure 2 – Figure supplement 1, Figure 2 – Figure supplement 3 and Figure 5) are from snapshot images, which do not have any photobleaching issue. They confirm the mid-cell accumulation seen by time-lapse microscopy. We have revised the text to clarify this point.

      Novelty of Experimental Measurements:

      While the scale of the study is unprecedented, claims of novelty (e.g., line 142) regarding ribosome-nucleoid segregation tracking are overstated. Similar observations have been made previously (e.g., Bakshi et al., 2012; Bakshi et al., 2014; Chai et al., 2014).

      Our apologies. The text in line 142 oversimplified our rationale. This has been corrected in the revised manuscript.

    1. As I was preparing to present the first iteration of this paper, I worried I might be attributing inaccurate feelings to her so I asked her how she felt about being labeled as a child with special needs. She fired back with no hesitation, "I hate it!"

      I think this paragraph is really true and powerful. We often think that "identity" is a label that others put on us, but in fact, we ourselves are constantly participating in, responding to, and even internalizing these labels to some extent. Lydia's sentence "I hate it!" really made me feel the conflict - she was given a label that "helped" her, but her feelings were not really understood. It is too easy for us to use words like "special needs" as neutral words, but ignore the oppression that may be brought to the person involved. I like the author's reminder that identity is complex and dynamic, not a fixed definition.

    2. I also want to point out that despite the many challenges we face, our lives are no doubt much easier than those without our many privileges of skin color, social class, and language:

      Sometimes the advantages we have are "invisible". Things like skin color, social class and language, which we may not pay much attention to in our daily life, do quietly influence our experiences at school, such as whether we are misunderstood or easily understood and supported by teachers. The author's admission of her privilege is not to deny the difficulties she is facing, but to present a more comprehensive and honest educational perspective. I think this kind of self-awareness is also very important in the school environment, especially for us students. Only by learning to recognize our own position can we better understand the situation of others.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Public review:

      In this study, Porter et al report on outcomes from a small, open-label, pilot randomized clinical trial comparing dornase-alfa to the best available care in patients hospitalized with COVID-19 pneumonia. As the number of randomized participants is small, investigators describe also a contemporary cohort of controls and the study concludes about a decrease of inflammation (reflected by CRP levels) after 7 days of treatment but no other statistically significant clinical benefit.

      Suggestions to the authors:

      • The RCT does not follow CONSORT statement and reporting guidelines

      We thank you for this suggestion and have now amended the order and content of the manuscript to follow the CONSORT statement as closely as possible.

      • The authors have chosen a primary outcome that cannot be at least considered as clinically relevant or interesting. After 3 years of the pandemic with so much research, why investigate if a drug reduces CRP levels as we already have marketed drugs that provide beneficial clinical outcomes such as dexamethasone, anakinra, tocilizumab and baricitinib?

      We thank the reviewer for bringing up this central topic. The answer to this question has both a historical and practical component. This trial was initiated in June of 2020 and was completed in June of 2021. At that time there were no known treatments for the severe immune pathology of COVID-19 pneumonia. In June 2020, dexamethasone data came out and we incorporated dexamethasone into the study design. It took much longer for all other anti-inflammatories to be tested. Hence, our decision to trial an approved endonuclease was based purely on basic science work on the pathogenic role of cell-free chromatin and NETs in murine sepsis and flu models and the ability of DNase I to clear them and reduce pathology in these animal models. In addition, evidence for the presence of cell-free chromatin components in COVID-19 patient plasma had already been communicated in a pre-print. Finally, several studies had reported the anti-inflammatory effects of dornase treatment in CF patients. Hence there was a strong case for a cheap, safe, pulmonary noninvasive treatment that could be self-administered outside the clinical setting.

      The identification of novel/repurposed treatments effective for COVID-19 was hampered by patient recruitment to competing studies during a pandemic. This resulted in small studies with inconclusive or contrary findings. In general, effective treatments were only picked up in very large RCTs. For example, demonstrating dexamethasone as effective in COVID-19 required recruitment of 6,425 patients into the RECOVERY study. Multiple trials with anti-IL-6 gave conflicting evidence until RECOVERY recruited 4,116 adults with COVID-19 (n=2,022, tocilizumab and 2,094, control), and similarly for baricitinib (4,148 randomised to treatment and 4,008 to standard care). Anakinra is approved for patients with elevated suPAR, based on data from one randomized clinical trial of 594 patients, of whom 405 had active treatment (PMID: 34625750). However, a systematic review analysing over 1,627 patients (anakinra 888, control 739) with COVID-19 showed no benefit (PMID: 36841793). Regarding the choice of the primary endpoint, there is a wealth of clinical evidence to support the relevance of CRP as a prognostic marker for COVID-19 pneumonia patients and it is a standard diagnostic and prognostic clinical parameter in infectious disease wards. This choice in March 2020 was supported by evidence of the prognostic value of IL-6; CRP is a surrogate of IL-6. We also provide our own data from a large study of severe COVID-19 pneumonia in figure 1, showing how well CRP correlates with survival.

      In summary, our data suggest that Dornase yields an anti-inflammatory effect that is comparable or potentially superior to cytokine-blocking monotherapies at a fraction of the cost and potentially without the additional adverse effects such as the increase for co-infections.

      We now provide additional justification on these points in the introduction on pg.4 as follows:

      “The trial was initiated in June 2020 and was completed in September of 2021. At the start of the trial only dexamethasone had been proven to benefit hospitalized COVID-19 pneumonia patients and was thus included in both arms of the trial. To increase the chance of reaching significance under challenging constraints in patient access, we opted to increase our sample size by using a combination of randomized individuals and available CRP data from matched contemporary controls (CC) hospitalized at UCL but not recruited to a trial. These approaches demonstrated that when combined with dexamethasone, nebulized DNase treatment was an effective anti-inflammatory treatment in randomized individuals with or without the implementation of CC data.”

      We also added the following explanation in the discussion on pg. 16:

      “Our study design offered a solution to the early screening of compounds for inclusion in larger platform trials. The study took advantage of frequent repeated measures of quantifiable CRP in each patient, to allow a smaller sample size to determine efficacy/futility than if powered on clinical outcomes. We applied a CRP-based approach that was similar to the CATALYST and ATTRACT studies. CATALYST showed in much smaller groups (usual care, 54, namilumab, 57 and infliximab, 35) that namilumab, an antibody that blocks the cytokine GM-CSF, reduced CRP even in participants treated with dexamethasone, whereas infliximab, which targets TNF-α, had no significant effect on CRP. This led to a suggestion that namilumab should be considered as an agent to be prioritised for further investigation in the RECOVERY trial. A direct comparison of our results with CATALYST is difficult due to the different nature of the modelling employed in the two studies. However, in general Dornase alfa exhibited comparable significance in the reduction in CRP compared to standard of care as described for namilumab at a fraction of the cost. Furthermore, endonuclease therapies may prove superior to cytokine blocking monotherapies, as they are unlikely to increase the risk for microbial co-infections that have been reported for antibody therapies that neutralize cytokines that are critical for immune defence such as IL-1β, IL-6 or GM-CSF.”

      • Please provide in Methods the timeframe for the investigation of the primary endpoint

      This information is provided in the analysis on pg. 8:

      “The primary outcome was the least square (LS) mean CRP up to 7 days or at hospital discharge whichever was sooner.”

      • Why day 35 was chosen for the read-out of the endpoint?

      We now state on pg. 8 that “Day 35 was chosen as being likely to include most early mortality due to COVID-19 being 4 weeks after completion of a week of treatment (i.e. d7 of treatment + 28 (4 x 7 days)).”

      • The authors performed an RCT but in parallel chose to compare also controls. They should explain their rationale as this is not usual. I am not very enthusiastic to see mixed results like Figures 2c and 2d.

      We initially aimed at a fully randomized trial. However, the swift implementation of trial prioritization strategies towards large and pre-established trial platforms in the UK made the recruitment of COVID-19 patients to small studies extremely challenging. Thus, we struggled to gain access to patients. Our power calculations suggested that a mixed trial with randomized and contemporary controls was the best way forward under these restrictions in patient access that could provide sufficient power.

      That being said, we also provide the primary endpoint (CRP) results in Fig. 3B as well as the results for the length of hospitalization (Fig. S3D) for the randomized subjects only.

      • Analysis is performed in mITT; this is a major limitation. The authors should provide at least ITT results. And they should describe in the main manuscript why they chose mITT analysis.

      We apologize if this point was confusing. We performed the analysis on the ITT as defined in our SAP: “The primary analysis population will be all evaluable patients randomised to BAC + dornase alfa or BAC only who have at least one post-baseline CRP measurement, as well as matched historical comparators.”

      We understand that the reason this might be mistaken as an mITT is because the N in the ITT (39) doesn’t match the number randomised and because we had stated on pg. 8 that “Efficacy assessments of primary and secondary outcomes in the modified intention-to-treat population were performed.”

      However, we did randomise 41 participants, but:

      One participant in the DA arm never received treatment. The individual withdrew consent and was replaced. We also have no CRP data for this participant in the database, so they were unevaluable, and we couldn’t include them in the baseline table even if we wanted to. In addition, one participant in the BAC arm had only a baseline CRP measurement available and was therefore not evaluable.

      We have corrected the confusing statement on pg. 8 and added an additional explanation.

      “Efficacy assessments of primary and secondary outcomes in the intention-to-treat (ITT) population were performed on all randomised participants who had received at least one dose of dornase alfa if randomized to treatment. For full details see Statistical Analysis Plan. The ITT was adjusted to mitigate the following protocol violations where one participant in the BAC arm and one in the DA arm withdrew before they received treatment and provided only a baseline CRP measurement. The participant in the DA arm was replaced with an additional recruited patient. Exploratory endpoints were only available in randomised participants and not in the CC. In this case, a post hoc within group analysis was conducted to compare baseline and post-baseline measurements.”

      • It is also not usual to exclude patients from analysis because investigators just do not have serial measurements. This is lost to follow up and investigators should have pre-decided what to do with lost-to-follow-up.

      Our protocol pre-specified that the primary analysis population should have at least one post-baseline CRP measurement (pg. 13 of protocol). The patient that was excluded was one that initially joined the trial but withdrew consent after the first treatment but before the first post-treatment blood sample could be drawn. Hence, the pre-treatment CRP of this patient alone provided no useful information.

      • In Table 1 I would like to see all randomized patients (n=39), which is missing. There are also baseline characteristics that are missing, like which other treatments as BAT received by those patients except for dexamethasone.

      Table 1 includes all 39 patients plus 60 CCs. Table 2 shows additional treatments given for COVID-19 as part of BAC.

      • In the first paragraph of clinical outcomes, the authors refer to a cohort that is not previously introduced in the manuscript. This is confusing. And I do not understand why this analysis is performed in the context of this RCT although I understand its pilot nature.

      One of the main criticisms we have encountered in this study has been the choice of the primary endpoint. The best way to respond to these questions was to provide data to support the prognostic relevance of CRP in COVID-19 pneumonia from a separate independent study where no other treatments such as dexamethasone, anakinra or anti-IL6 therapies were administered. We think this is a very useful analysis that provides essential context for the trial and the choice of the primary endpoint, indicating that CRP has good enough resolution to predict clinical outcomes.

      • Propensity-score selected contemporary controls may introduce bias in favor of the primary study analysis, since controls are already adjusted for age, sex and comorbidities.

      The contemporary controls were selected to best match the characteristics of the randomized patients including that the first CRP measurement upon admission surpassed the trial threshold, so we do not see how this selection process introduces biases, as it was blinded with regards to the course of the CRP measurements. Given that this was a small trial, matching for baseline characteristics is necessary to minimize confounding effects.

      • The authors do not clearly present numerically survivors and non-survivors at day 34, even though this is one of the main secondary outcomes.

      We now provide the mortality numbers in the following paragraph on pg. 13.

      “Over 35 days follow up, 1 person in the BAC + dornase-alfa group died, compared to 8 in the BAC group. The hazard ratio observed in the Cox proportional hazards model (95% CI) was 0.47 (0.06, 3.86), which estimates that throughout 35 days follow-up, there was a 53% reduced chance of death at any given timepoint in the BAC + dornase-alfa group compared to the BAC group, though the confidence intervals are wide due to a small number of events. The p-value from a log-rank test was 0.460, which does not reach statistical significance at an alpha of 0.05.”
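
      The step from the reported hazard ratio to the quoted 53% figure is plain arithmetic. The short Python sketch below illustrates it, assuming the reduction is computed as one minus the hazard ratio. The function names are illustrative only and are not taken from the trial's Statistical Analysis Plan. The sketch also checks that the reported interval spans 1 and that the point estimate lies close to the geometric midpoint of the interval, as would be expected if the interval is symmetric on the log scale.

```python
import math


def percent_reduction(hazard_ratio):
    """Relative reduction in hazard implied by a hazard ratio, in percent."""
    return (1.0 - hazard_ratio) * 100.0


def log_scale_midpoint(lower, upper):
    """Geometric midpoint of an interval, i.e. its centre on the log scale."""
    return math.exp((math.log(lower) + math.log(upper)) / 2.0)


# Values quoted in the response above.
hr, ci_low, ci_high = 0.47, 0.06, 3.86

reduction = percent_reduction(hr)
midpoint = log_scale_midpoint(ci_low, ci_high)
ci_includes_one = ci_low < 1.0 < ci_high  # True means "no effect" is not excluded

print(f"Reduction in hazard: {reduction:.0f}%")
print(f"Log-scale midpoint of CI: {midpoint:.2f}")
print(f"CI includes 1: {ci_includes_one}")
```

      Because the interval includes 1, the point estimate alone should not be read as evidence of benefit, which matches the non-significant log-rank p-value quoted above.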

      • It is unclear why another cohort (Berlin) was used to associate CRP with mortality. CRP association with mortality should (also) be performed within the current study.

      As we explained above, the Berlin cohort CRP data serve to substantiate the relevance of CRP as a primary endpoint in a cohort that experienced sufficient mortality as this cohort did not receive any approved anti-inflammatory therapy. Mortality in our COVASE trial was minimal, since all patients were on dexamethasone and did not reach the highest severity grade, since we opted to treat patients before they deteriorated further. The overall mortality was 8% across all arms of our study, which does not provide enough events for mortality measurements. In contrast the Berlin cohort did not receive dexamethasone and all patients had reached a WHO severity grade 7 category with mortality at 30%.

      My other concerns are:

      • This report is about an RCT and the authors should follow the CONSORT reporting guidelines. Please amend the manuscript and Figure 1b accordingly and provide a CONSORT checklist.

      We now provide a CONSORT checklist and have amended the CONSORT diagram accordingly.

      • Please provide in brief the exclusion criteria in the main manuscript

      We have now included the exclusion criteria in the manuscript on pg. 6.

      “1.1.1 Exclusion criteria

      1. Females who are pregnant, planning pregnancy or breastfeeding

      2. Concurrent and/or recent involvement in other research or use of another experimental investigational medicinal product that is likely to interfere with the study medication within (specify time period e.g. last 3 months) of study enrolment

      3. Serious condition meeting one of the following:

      a. Respiratory distress with respiratory rate >=40 breaths/min

      b. Oxygen saturation <=93% on high-flow oxygen

      4. Require mechanical invasive or non-invasive ventilation at screening

      5. Concurrent severe respiratory disease such as asthma, COPD and/or ILD

      6. Any major disorder that in the opinion of the Investigator would interfere with the evaluation of the results or constitute a health risk for the trial participant

      7. Terminal disease and life expectancy <12 months without COVID-19

      8. Known allergies to dornase alfa and excipients

      9. Participants who are unable to inhale or exhale orally throughout the entire nebulisation period

      So, briefly, patients were excluded if they were:

      1. Pregnant, planning pregnancy or breastfeeding

      2. Serious condition meeting one of the following:

      a. Respiratory distress with respiratory rate >=40 breaths/min

      b. Oxygen saturation <=93% on high-flow oxygen

      3. Require ventilation at screening

      4. Concurrent severe respiratory disease such as asthma, COPD and/or ILD

      5. Terminal disease and life expectancy <12 months without COVID-19

      6. Known allergies to dornase alfa and excipients

      7. Participants who are unable to inhale or exhale orally throughout the entire nebulisation period”

      • "The final trial visit occurred at day 35." "Analysis included mortality at day 35". I am not sure I understand why. In clinicaltrials.gov all endpoints are meant to be studies at day 7 except for mortality rate day 28. Why day 35 was chosen? Please be consistent.

      Thank you for identifying this inconsistency. We have amended the record on clinicaltrials.gov to read “the time to event data was censored at 28 days post last dose (up to d35) for the randomised participants and at the date of the last electronic record for the CC.”

      • Please provide in Methods the timeframe for the investigation of the primary endpoint

      • The authors performed an RCT but in parallel chose to compare also controls. They should explain their rationale as this is not usual. I am not very enthusiastic to see mixed results like Figures 2c and 2d.

      • Analysis is performed in mITT; this is a major limitation. The authors should provide at least ITT results. And they should describe in the main manuscript why they chose mITT analysis.

      • It is also not usual to exclude patients from analysis because investigators just do not have serial measurements. This is lost to follow up and investigators should have pre-decided what to do with lost-to-follow-up.

      • Figure 1b as in CONSORT statement, please provide reasons why screened patients were not enrolled.

      • In Table 1 I would like to see all randomized patients (n=39), which is missing. There are also baseline characteristics that are missing, like which other treatment as BAT received those patients except for dexamethasone.

      • In the first paragraph of clinical outcomes, the authors refer to a cohort that is not previously introduced in the manuscript. This is confusing. And I do not understand why this analysis is performed in the context of this RCT although I understand its pilot nature.

      • In Figure 2 the authors draw results about ITT although in methods describe that they performed an mITT analysis. Please be consistent.

      Please see answers provided to these queries above.

      Reviewer #2 (Recommendations For The Authors):

      1) Suppl Figure 2B would be more informative if presented as a Table with N of patients with per day sampling

      We now provide the primary end point daily sampling table in Table 3.

      2) The numbers at risk should figure under the KM curves

      The numbers at risk for figures 1E, 2C, 2D have been added as graphs either in the main figures or in the supplement.

      3) HD in Supplementary figure 3 should be explained

      We apologize for this omission. We now provide a description for the healthy donor samples that we used in the cell-free DNA measurements in figure S3B on pg. 14:

      “Compared to the plasma of anonymized healthy donor volunteers at the Francis Crick Institute (HD), plasma cf-DNA levels were elevated in both BAC and DA-treated COVASE participants.”

      4) Presentation is inappropriate for Table S4

      We thank the reviewer for pointing out this issue. We have now formatted Table S4 to be consistent with all other tables.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript is a focused investigation of the phospho-regulation of a C. elegans kinesin-2 motor protein, OSM-3. In C. elegans sensory cilia, the kinesin-2 motor proteins Kinesin-II complex and OSM-3 homodimer transport IFT trains anterogradely to the ciliary tip. Kinesin-II carries OSM-3 as an inactive passenger from the ciliary base to the middle segment, where kinesin-II dissociates from IFT trains and OSM-3 gets activated and transports IFT trains to the distal segment. Therefore, activation/inactivation of OSM-3 plays an essential role in its ciliary function.

      Strengths:

      In this study, using mass spectrometry, the authors have shown that the NEKL-3 kinase phosphorylates a serine/threonine patch at the hinge region between coiled coils 1 and 2 of an OSM-3 dimer, referred to as the elbow region in ubiquitous kinesin-1. Phosphomimic mutants of these sites inhibit OSM-3 motility both in vitro and in vivo, suggesting that this phosphorylation is critical for the autoinhibition of the motor. Conversely, phospho-dead mutants of these sites hyperactivate OSM-3 motility in vitro and affect the localization of OSM-3 in C. elegans. The authors also showed that an alanine-to-threonine mutation (A489T) at one of the phosphorylation sites rescues OSM-3 function in live worms.

      Weaknesses:

      Collectively, this study presents evidence for the physiological role of OSM-3 elbow phosphorylation in its autoregulation, which affects ciliary localization and function of this motor. Overall, the work is well performed, and the results mostly support the conclusions of this manuscript. However, the work will benefit from additional experiments to further support conclusions and rule out alternative explanations, filling some logical gaps with new experimental evidence and in-text clarifications, and improving writing before I can recommend publication.

      We appreciate Reviewer #1’s comments and suggestions. We have now provided additional evidence and discussion to further support our conclusions and fill the logical gaps. We have also provided alternative explanations for our data and improved the writing.

      Reviewer #2 (Public review):

      Summary:

      The regulation of kinesin is fundamental to cellular morphogenesis. Previously, it has been shown that OSM-3, a kinesin required for intraflagellar transport (IFT), is regulated by autoinhibition. However, it remains totally elusive how the autoinhibition of OSM-3 is released. In this study, the authors have shown that NEKL-3 phosphorylates OSM-3 and releases its autoinhibition.

      The authors found NEKL-3 directly phosphorylates OSM-3 (although the method is not described clearly) (Figure 1). The phosphorylated residues are in the "elbow" of OSM-3. The authors introduced phospho-dead (PD) and phospho-mimic (PM) mutations by genome editing and found that the OSM-3(PD) protein does not form cilia, and instead, accumulates at the axonal tips. The phenotype is similar to another constitutively active mutant of OSM-3, OSM-3(G444E) (Imanishi et al., 2006; Xie et al., 2024). osm-3(PM) has shorter cilia, which resembles loss-of-function mutants of osm-3 (Figure 3). The authors did structural prediction and showed that G444E and PD mutations change the conformation of OSM-3 protein (Figure 3). In the single-molecule assays, G444E and PD mutations exhibited increased landing rate (Figure 4). By unbiased genetic screening, the authors identified a suppressor mutant of osm-3(PD), in which A489T occurs. The result confirms the importance of this residue. Based on these results, the authors suggest that NEKL-3 induces phosphorylation of the elbow domain and inactivates OSM-3 motor when the motor is synthesized in the cell body. This regulation is essential for proper cilia formation.

      Strengths:

      The finding is interesting and gives new insight into how the IFT motor is regulated.

      Weaknesses:

      The methods section has not presented sufficient information to reproduce this study.

      We appreciate that Reviewer #2 is also positive about our study. We have now provided sufficient information in the revised Methods section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major Concerns

      (1) Why do the authors think that NEKL-3 phosphorylates OSM-3 in the first place? This seems to come out of nowhere and prior evidence indicating that NEKL-3 may be phosphorylating OSM-3 is not even mentioned in the Introduction.

      We thank the Reviewer for raising this important point. Our hypothesis that NEKL-3 phosphorylates OSM-3 stems from prior findings in our lab. In a previous study (Yi et al., Traffic, 2018, PMID: 29655266), we identified NEKL-4, a member of the NIMA kinase family, as a suppressor of the OSM-3(G444E) hyperactive mutation. This discovery prompted us to explore the broader role of NIMA kinases in regulating OSM-3. Subsequent genetic screens (Xie et al., EMBO J, 2024, PMID: 38806659) revealed that both NEKL-3 and NEKL-4 suppress multiple OSM-3 mutations, further supporting their functional interaction. Given the established role of NIMA kinases in phosphorylation-dependent processes (Fry et al., JCS, 2012, PMID: 23132929; Chivukula et al., Nat. Med., 2020, PMID: 31959991; Thiel, C. et al. Am. J. Hum. Genet. 2011, PMID: 21211617; Smith, L. A. et al., J. Am. Soc. Nephrol., 2006, PMID: 16928806), we hypothesized that NEKL-3/4 may directly phosphorylate OSM-3 to modulate its activity.

      To test this hypothesis, we expressed recombinant C. elegans NEKL-3 and OSM-3 proteins and conducted in vitro phosphorylation assays. While we were unable to obtain active recombinant NEKL-4 (limitations noted in the revised text), our experiments with NEKL-3 revealed phosphorylation at residues 487-490 (YSTT motif) in OSM-3’s tail region, as confirmed by mass spectrometry. These findings are now explicitly contextualized in the Introduction and Results sections of the revised manuscript.

      Page #4, Line #11:

      “...In our previous study (Yi et al., Traffic, 2018, PMID: 29655266), a genetic screen targeting the OSM-3(G444E) hyperactive mutation identified NEKL-4, a member of the NIMA kinase family, as a suppressor of this phenotype. This finding, combined with reports that NIMA kinases regulate ciliary processes independently of their canonical mitotic roles (Fry et al., JCS, 2012, PMID: 23132929; Chivukula et al., Nat. Med., 2020, PMID: 31959991; Thiel, C. et al. Am. J. Hum. Genet. 2011, PMID: 21211617; Smith, L. A. et al., J. Am. Soc. Nephrol., 2006, PMID: 16928806), prompted us to investigate whether NIMA kinases modulate OSM-3-driven intraflagellar transport. We hypothesized that NEKL-3/4, as paralogs within this family, might directly phosphorylate OSM-3 to regulate its motility...”

      Page #4, line #26:  

      “... To determine whether NIMA kinase family members could directly phosphorylate OSM-3, we purified prokaryotic recombinant C. elegans NEKL-3/NEKL-4 and OSM-3 protein in order to perform in vitro phosphorylation assays. We were able to obtain active recombinant NEKL-3 but not NEKL-4. The in vitro phosphorylation assays showed that NEKL-3 directly phosphorylates OSM-3 (Fig. 1A-B, Appendix Table S1). Subsequent mass spectrometric analysis revealed phosphorylation at residues 487-490, which localize to the conserved "YSTT" motif within OSM-3’s C-terminal tail region ...”

      (2) The authors need to characterize the proteins they expressed and purified for in vitro ATPase and motility assays. Are these proteins monomers or dimers?

      For our in vitro ATPase and motility assays, OSM-3 was expressed in E. coli BL21(DE3) and purified using established protocols (Xie et al., EMBO J, 2024, PMID: 38806659; Imanishi et al., JCB, 2006, PMID: 17000874). To confirm its oligomeric state, we analyzed recombinant OSM-3 by size-exclusion chromatography coupled with multiangle light scattering (SEC-MALS). As reported in Xie et al. (2024), OSM-3 (~80 kDa monomer) elutes with a molecular weight of 173–193 kDa under physiological buffer conditions, consistent with a homodimeric assembly. These findings confirm that the functional unit used in our assays is the biologically relevant dimer. This characterization has been added to the revised manuscript on Page #35, Line #7.

      “…OSM-3 was expressed in E. coli BL21(DE3) and purified for in vitro assays using established protocols (REFs). Size-exclusion chromatography coupled with multiangle light scattering (SEC-MALS) (Xie et al., EMBO J., 2024) confirmed that recombinant OSM-3 forms a homodimer (173–193 kDa) under physiological conditions, ensuring its dimeric state remained intact....” 
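
      The oligomeric-state argument above rests on a simple ratio of the measured mass to the monomer mass. The Python sketch below shows that check using only the figures quoted in the response (an approximate 80 kDa monomer and a measured range of 173–193 kDa). The function name and the rounding to the nearest whole number of subunits are illustrative assumptions, not the authors' analysis procedure.

```python
def subunits_per_complex(measured_kda, monomer_kda):
    """Ratio of measured complex mass to monomer mass (number of subunits)."""
    return measured_kda / monomer_kda


# Figures quoted in the response; the monomer mass is approximate (~80 kDa).
monomer_kda = 80.0
measured_range_kda = (173.0, 193.0)

ratios = [subunits_per_complex(m, monomer_kda) for m in measured_range_kda]
nearest_states = [round(r) for r in ratios]

for mass, ratio, state in zip(measured_range_kda, ratios, nearest_states):
    print(f"{mass:.0f} kDa / {monomer_kda:.0f} kDa = {ratio:.2f} -> nearest whole number: {state}")
```

      Both ends of the measured range round to two subunits, in line with the homodimer interpretation. The sketch makes no attempt to explain why the measured mass exceeds twice the approximate monomer mass.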

      (3) The authors primarily used PD and PM mutations, which affect all four amino acids in the region. This may or may not be physiologically relevant. Figure 5 indicates that T489 is a critical regulatory site. However, this conclusion is undermined by reliance on PD mutations, which affect all four amino acids. Creating PM (T489E) and PD (T489A) mutations based on WT OSM-3 would better reflect physiological relevance. In vitro assays with a single phosphomimic or phospho-dead mutation at residue 489 are missing at the end of this story. This would better link Figure 5 with the rest of the manuscript.

      We thank the reviewer for this constructive critique. Below, we address the concerns and integrate new data to strengthen the link between T489 and autoinhibition:

      To probe the regulatory role of T489 phosphorylation, we generated osm-3(T489E) (phosphomimetic, PM) and osm-3(T489A) (phospho-dead, PD) mutant animals. Strikingly, both mutants formed axonal puncta (Figure S7), recapitulating the hyperactive phenotype of the OSM-3(G444E) mutant. While the similar puncta formation in PM and PD mutants initially appeared paradoxical, this observation underscores the necessity of dynamic phosphorylation cycling at T489 for proper autoinhibition. Specifically, the PD mutant (T489A) likely disrupts phosphorylation-dependent autoinhibition stabilization, leading to constitutive activation, whereas the PM mutant (T489E) may mimic a "locked" phosphorylated state, preventing dephosphorylation-dependent release of autoinhibition in cilia and trapping OSM-3 in an aggregation-prone conformation. These results highlight T489 as a structural linchpin whose post-translational modification dynamically regulates motor activity. While the precise molecular mechanism, such as how phosphorylation modulates tail-motor domain interactions, remains to be elucidated, our data conclusively demonstrate that perturbing T489 (even in isolation) destabilizes autoinhibition, driving puncta formation and constitutive activity.

      We have integrated the above paragraph in the revised manuscript on page #8, line #27.

      (4) There seems to be a disconnect between the MT gliding assays in Figure 4C and single molecule motility assays in Figure 4E. The gliding assays show that all constructs can glide microtubules at near WT speeds. Yet, the motility assays show that WT and PM cannot land or walk on MTs. The authors need to explain why this is the case. Is this because surface immobilization of kinesin from its tail disrupts autoinhibition? Alternatively, the protein preparation may include monomers that cannot be autoinhibited and cannot land and processively walk on surface-immobilized microtubules (because they only have one motor domain) but can glide microtubules when immobilized on the surface from their tail.

      The surface immobilization of OSM-3 via its tail domain disrupts autoinhibition, a phenomenon previously observed in other kinesins such as kinesin-1 (Nitzsche et al, Methods Cell Biol., 2010, PMID: 20466139). In our assays, OSM-3 was nonspecifically immobilized on glass surfaces, enabling microtubule gliding by motors whose autoinhibition was relieved through tail anchoring. Critically, the PD and PM mutations reside in the tail region and do not alter the intrinsic properties of the motor head domain. Consequently, once autoinhibition is released via immobilization, the gliding velocities reflect the conserved motor head activity, which is expected to remain comparable across all constructs. While we cannot entirely rule out the presence of monomeric OSM-3 in solution, several lines of evidence argue against this possibility. First, the mutations are located in the elbow region, which is dispensable for motor dimerization. Second, SEC-MALS analysis from prior studies confirms that purified OSM-3 exists predominantly as dimers in solution. 

      We have discussed these issues in the revised text on page #10, line #18: 

      “…In our gliding assays, OSM-3-PM has an increased gliding speed of 0.69 ± 0.07 μm/s (Fig. 4 C-D), similar to the PD mutant. PD and PM mutations are confined to the elbow region, leaving the motor head’s mechanochemical properties intact. Upon tail immobilization, which releases autoinhibition, the gliding speeds reflect motor head activity. Single-molecule assays, however, directly resolve their native regulatory states: PD mutants are constitutively active, whereas PM mutants persist in an autoinhibited state (Fig. 4E-G). Although monomeric OSM-3 could theoretically mediate single-motor gliding, the previous SEC-MALS data demonstrate that OSM-3 purifies as stable dimers (Xie et al., EMBO J, 2024, PMID: 38806659). Thus, dimeric OSM-3 is likely the predominant functional species in our assays…”

      (5) An alternative explanation for the data is that both PD and PM mutations result in loss-of-function effects, disrupting OSM-3 activity. For instance:

      a) In Figure 2C, both mutations cause shorter cilia than the wild type (WT).

      b) In Figure 4A, both mutations result in higher ATPase activity than WT.

      c) In Figure 4D, both mutations show increased gliding velocity compared to WT. These results suggest the observed effects could stem from loss of function rather than phosphorylation-specific regulation.

      Although PD and PM mutations exhibit superficially similar "loss-of-function" phenotypes in certain assays, they mechanistically disrupt motor regulation in distinct ways:

      a) Ciliary Length (Figure 2C)

      PD Mutants: Hyperactivation causes OSM-3-PD to prematurely aggregate into axonal puncta, preventing ciliary entry. Consequently, cilia are built solely by the weaker Kinesin-II motor, which only constructs shorter middle segments.

      PM Mutants: OSM-3-PM retains autoinhibition during transport (enabling ciliary entry) but cannot be dephosphorylated in cilia. This blocks activation, leaving OSM-3-PM partially functional and resulting in cilia intermediate in length between WT and PD.

      We have discussed this issue in the revised text on page #5, line #30:

      “…These findings indicate that OSM-3-PM is in an autoinhibited state capable of ciliary delivery, yet fails to achieve full activation due to defective dephosphorylation. This incomplete activation results in suboptimal motor function and intermediate ciliary length phenotypes (Fig.2 B-C). In contrast, OSM-3-PD exhibits constitutive activation leading to aggregation into axonal puncta, which completely abolishes its ciliary entry capacity (Fig.2 A-B)...”

      b) ATPase Activity (Figure 4A)

      PD Mutants: Fully autoinhibition-released (98.15% of KHC ATPase activity), consistent with constitutive activation.

      PM Mutants: Show partial ATPase activity (34.28% of KHC), reflecting imperfect phosphomimicry. While the DDEE substitution introduces negative charges, it fails to fully replicate the steric/kinetic effects of phosphorylated tyrosine (Y487; phenyl ring absent), resulting in incomplete autoinhibition stabilization. Despite this, the residual inhibition is sufficient to phenocopy shorter cilia in vivo.

      We have discussed this issue in the revised text on page #7, line#19:

      “…The PM mutant’s partial ATPase activity (34.28% of KHC) might arise from imperfect phosphomimicry: while the DDEE substitution introduces negative charges, it lacks the steric bulk of phosphorylated tyrosine (pY487). This incomplete mimicry allows residual autoinhibition, sufficient to limit ciliary construction in vivo...”

      c) Microtubule Gliding Velocity (Figure 4D)

      Gliding Assay Limitation: Tail immobilization artificially releases autoinhibition, masking regulatory differences. Thus, all constructs (PD, PM) exhibit similar velocities (~0.7 µm/s), reflecting conserved motor head activity.

      Single-Molecule Assay (Figure 4E): Directly resolves native autoinhibition states:

      PD mutants show robust motility (autoinhibition released).

      PM mutants remain largely inactive (autoinhibition retained).

      We have discussed this issue in the revised text on page #10, line#18:

      “…In our gliding assays, OSM-3-PM has an increased gliding speed of 0.69 ± 0.07 μm/s (Fig. 4 C-D), similar to the PD mutant. PD and PM mutations are confined to the elbow region, leaving the motor head’s mechanochemical properties intact. Upon tail immobilization, which releases autoinhibition, the gliding speeds reflect motor head activity. Single-molecule assays, however, directly resolve their native regulatory states: PD mutants are constitutively active, whereas PM mutants persist in an autoinhibited state (Fig. 4E-G)...”

      Minor Suggestions and Concerns

      (1) Lines 60-66: References that support these observations are missing from this section.

      We have added the relevant references.

      (2) Lines 66-67: I would revise this sentence as "It remains unclear how OSM-3 becomes enriched...".

      We have made the changes.

      (3) Line 85: The authors should describe how they perform these assays (i.e. recombinantly expressed NEKL-3 and OSM-3, are these C. elegans proteins, and which expression system was used...).

      We have described them in the main text and Methods.

      Page #4 line #26

      “...To determine whether NIMA kinase family members could directly phosphorylate OSM-3, we purified prokaryotic recombinant C. elegans NEKL-3/NEKL-4 and OSM-3 protein in order to perform in vitro phosphorylation assays...”

      Page #35 line#12

      “...Briefly, point mutations were introduced into the pET.M.3C OSM-3-eGFP-His6 plasmid for prokaryotic expression. Plasmid-transformed E. coli (BL21) were cultured at 37°C and induced overnight at 23°C with 0.2 mM IPTG. Cells were lysed in lysis buffer (50 mM NaPO4 pH8.0, 250 mM NaCl, 20 mM imidazole, 10 mM bME, 0.5 mM ATP, 1 mM MgCl2, Complete Protease Inhibitor Cocktail (Roche)) and Ni-NTA beads were applied for affinity purification. After incubation, beads were washed with wash buffer (50 mM NaPO4 pH6.0, 250 mM NaCl, 10 mM bME, 0.1 mM ATP, 1 mM MgCl2) and eluted with elution buffer (50 mM NaPO4 pH7.2, 250 mM NaCl, 500 mM imidazole, 10 mM bME, 0.1 mM ATP, 1 mM MgCl2). Protein concentration was determined by standard Bradford assay. C. elegans nekl-3 cDNA was cloned into the pGEX-6P GST vector, expressed in E. coli BL21 (DE3), and purified for in vitro phosphorylation assays. Plasmid-transformed E. coli (BL21) were cultured at 37°C and induced overnight at 18°C with 0.5 mM IPTG. Cells were lysed in lysis buffer (50 mM NaPO4 pH8.0, 250 mM NaCl, 1 mM DTT, Complete Protease Inhibitor Cocktail (Roche)) and GST beads were applied for affinity purification. After incubation, beads were washed with wash buffer (50 mM NaPO4 pH6.0, 250 mM NaCl, 1 mM DTT) and eluted with elution buffer (50 mM NaPO4 pH7.2, 150 mM NaCl, 10 mM GSH, 1 mM DTT). Purified proteins were dialyzed against storage buffer (50 mM Tris-HCl, pH 8.0, 150 mM NaCl). Protein concentration was determined by standard Bradford assay...”

      (4) Line 141: The first sentence of this paragraph lacks motivation. I would start this sentence with "To directly observe the effects of phosphor mutants in the elbow region in microtubule binding and motility of OSM-3, we...".

      We have made the change.

      (5) Figure 1B: The mass spectrometry data in Figure 1B lacks adequate explanation. The Methods section should detail the experimental protocol, data interpretation, and any databases used. Additionally, the manuscript should list all identified phosphorylation sites on OSM-3 to provide context, including whether Y487_T490 is the major site.

      We have provided the detailed experimental protocol, data interpretation, and databases used in methods. We have provided all identified sites as Appendix table S1.

      (6) Figure 1C: Is it possible to model the effect of PM and PD mutations using AlphaFold? The authors should also show PAE or pLDDT scores of their model.

      AlphaFold does not reliably model the effects of point mutations, so we used Rosetta relax to capture the possible conformational changes, as shown in the revised Figure 3. We have provided PAE and pLDDT scores as a new figure, Figure S2.

      (7) Figure 2D: The unit for speed should use a lowercase "s" for seconds.

      We have fixed it.

      (8) Figure 3: I am not sure whether this figure stands for a main text figure on its own, as it is only a Rosetta prediction and is not supported by any experimental data. In addition, it remains unclear what the labels on the x-axis mean.

      We have updated the figure and explained the x-axis labels in Figure S4 to make it more reader-friendly.

      (9) Figure 4: NEKL-3-treated OSM-1 should be included as a positive control in the in vitro experiments.

      We suspect that the Reviewer asked for NEKL-3-treated OSM-3. 

      In a separate study that has just been accepted by the Journal of Cell Biology, NEKL-3 treatment significantly reduced the affinity of the OSM-3 motor for microtubules, and the treated motor showed very low ATPase activity. We have cited and discussed this in the revised text on page #10, line #28:

      “…As demonstrated in our recent study (Huang et al., JCB, 2025, In press, attached), phosphorylation of OSM-3 by NEKL-3 at two distinct regions, Ser96 and the conserved "elbow" motif, differentially regulates its activity and localization. Phosphorylation at Ser96 reduces OSM-3’s ATPase activity and alters its ciliary distribution from the distal segment to a uniform localization, while elbow phosphorylation induces autoinhibition, retaining OSM-3 in the cell body. Strikingly, in vitro phosphorylation of OSM-3 by NEKL-3 significantly reduces its microtubule-binding affinity, likely arising from combined modifications at both sites. We propose a model wherein elbow phosphorylation ensures anterograde ciliary transport, while Ser96 phosphorylation fine-tunes distal segment targeting. This multistep regulation may involve distinct phosphatases to reverse phosphorylation at specific sites, a hypothesis warranting further investigation…”

      (10) Figure 4C, D, and F: The unit of velocity is wrong. The authors should use the same units they used in the table shown in Figure 4B.

      We have fixed these errors.

      (11) Figure 4F: The velocity of PD is a lot lower than G444E. Therefore, it would be more appropriate to refer to PD as partially active, rather than hyperactive.

      We have made the change. 

      (12) Figure 5: There is too much genetics jargon on this figure (EMF, F2, 100%Dyf,...). How are the alleles numbered? Is it OK to refer to them as Alleles 1 and 2 for simplicity?

      According to the established C. elegans allele nomenclature, each worm allele has a unique number named after the lab code for identification. We have simplified the labels and updated the figure to make it more reader-friendly.

      (13) Figure 5E: A plot would be more reader-friendly than a table. Additionally, the legend for Fig. 5E mistakenly refers to it as "D."

      We have changed the table to a plot and fixed the mistakes. We thank the Reviewer for pointing them out.

      Reviewer #2 (Recommendations for the authors):

      (1) The model appears as if NEKL-3 induces dephosphorylation of OSM-3 (Figure 6). This is not consistent with the conclusions described in the Discussion and is confusing.

      We have updated the model figure and fixed the error.

      (2) It should be described why the authors hypothesized that NEKL-3 phosphorylates OSM-3. Was there genetic evidence? Did the authors screen cilia-related kinases, or did they identify it incidentally? Providing this information would help readers understand the context of the research.

      We thank both Reviewers for pointing out this issue.

      Our hypothesis that NEKL-3 phosphorylates OSM-3 stems from prior findings in our lab. In a previous study (Yi et al., Traffic, 2018, PMID: 29655266), we identified NEKL-4, a member of the NIMA kinase family, as a suppressor of the OSM-3(G444E) hyperactive mutation. This discovery prompted us to explore the broader role of NIMA kinases in regulating OSM-3. Subsequent genetic screens (Xie et al., EMBO J, 2024, PMID: 38806659) revealed that both NEKL-3 and NEKL-4 suppress multiple OSM-3 mutations, further supporting their functional interaction. Given the established role of NIMA kinases in phosphorylation-dependent processes (Fry et al., JCS, 2012, PMID: 23132929; Chivukula et al., Nat. Med., 2020, PMID: 31959991; Thiel, C. et al. Am. J. Hum. Genet. 2011, PMID: 21211617; Smith, L. A. et al., J. Am. Soc. Nephrol., 2006, PMID: 16928806), we hypothesized that NEKL-3/4 may directly phosphorylate OSM-3 to modulate its activity.

      To test this hypothesis, we expressed recombinant C. elegans NEKL-3 and OSM-3 proteins and conducted in vitro phosphorylation assays. While we were unable to obtain active recombinant NEKL-4 (limitations noted in the revised text), our experiments with NEKL-3 revealed phosphorylation at residues 487-490 (YSTT motif) in OSM-3’s tail region, as confirmed by mass spectrometry. These findings are now explicitly contextualized in the Introduction and Results sections of the revised manuscript.

      Page #4, Line #11:

      “... In our previous study (Yi et al., Traffic, 2018, PMID: 29655266), a genetic screen targeting the OSM-3(G444E) hyperactive mutation identified NEKL-4, a member of the NIMA kinase family, as a suppressor of this phenotype. This finding, combined with reports that NIMA kinases regulate ciliary processes independently of their canonical mitotic roles (Fry et al., JCS, 2012, PMID: 23132929; Chivukula et al., Nat. Med., 2020, PMID: 31959991; Thiel, C. et al. Am. J. Hum. Genet. 2011, PMID: 21211617; Smith, L. A. et al., J. Am. Soc. Nephrol., 2006, PMID: 16928806), prompted us to investigate whether NIMA kinases modulate OSM-3-driven intraflagellar transport. We hypothesized that NEKL-3/4, as paralogs within this family, might directly phosphorylate OSM-3 to regulate its motility...”

      Page #4, line #26: 

      “... To determine whether NIMA kinase family members could directly phosphorylate OSM-3, we purified prokaryotic recombinant C. elegans NEKL-3/NEKL-4 and OSM-3 protein in order to perform in vitro phosphorylation assays. We were able to obtain active recombinant NEKL-3 but not NEKL-4. The in vitro phosphorylation assays showed that NEKL-3 directly phosphorylates OSM-3 (Fig. 1A-B, Appendix Table S1). Subsequent mass spectrometric analysis revealed phosphorylation at residues 487-490, which localize to the conserved "YSTT" motif within OSM-3’s C-terminal tail region...”

      (3) It is curious that the authors have not addressed the cilia phenotype and the localization of OSM-3 in nekl-3 mutants. Regardless of whether these observations agree with the proposed mechanisms, it is essential for the authors to show and discuss the cilia phenotype and OSM-3 localization in nekl-3 mutants.

      We thank the Reviewer for highlighting this critical point. Indeed, nekl-3 null mutants are inviable due to essential mitotic roles (Barstead et al., 2012, PMID: 23173093), precluding direct analysis of ciliary phenotypes. To bypass this limitation, we recently generated nekl-3 conditional knockouts (cKOs) in ciliated neurons (Huang et al., JCB, 2025 in press, attached). In these mutants, OSM-3, which is normally enriched in the ciliary distal segment, becomes uniformly distributed along the cilium. This redistribution correlates with premature activation of OSM-3-driven anterograde motility in the ciliary middle region, consistent with our proposed model in which NEKL-3 phosphorylation suppresses OSM-3 activity. We have now integrated this result and discussion into the revised manuscript, reinforcing the physiological relevance of NEKL-3-mediated regulation in ciliary transport.

      Page #6 line #10

      “… While nekl-3 null mutants are inviable due to essential mitotic roles (Barstead et al., 2012, PMID: 23173093), conditional knockout (cKO) of nekl-3 in ciliated neurons (Huang et al., JCB, 2025 in press, attached) revealed its critical role in regulating OSM-3 dynamics. In nekl-3 cKO animals, OSM-3, normally enriched in the ciliary distal segment, redistributed uniformly along the cilium, concomitant with premature activation of anterograde motility in the middle ciliary region. This phenotype aligns with our model wherein NEKL-3 phosphorylation suppresses OSM-3 activity, ensuring spatiotemporal regulation of IFT…”

      (4) The methods section lacks some information, which is critical to reproducing this study.

      We have now provided detailed information in the methods section in the revised manuscript.

      (a) It is not described how the authors determined phosphorylation of OSM-3 by NEKL-3. In methods, nothing is described about the assay.

      We performed in vitro phosphorylation assays using recombinant OSM-3 and NEKL-3 purified from bacteria. We then used LC-MS/MS to identify phosphorylation sites. We have now updated the methods section to include all the information.

      Page #4 line #26

      “... To determine whether NIMA kinase family members could directly phosphorylate OSM-3, we purified prokaryotic recombinant C. elegans NEKL-3/NEKL-4 and OSM-3 protein in order to perform in vitro phosphorylation assays. We were able to obtain active recombinant NEKL-3 but not NEKL-4. The in vitro phosphorylation assays showed that NEKL-3 directly phosphorylates OSM-3 (Fig. 1A-B, Appendix Table S1). Subsequent mass spectrometric analysis revealed phosphorylation at residues 487-490, which localize to the conserved "YSTT" motif within OSM-3’s C-terminal tail region...”

      Page #36, line #19

      “In vitro phosphorylation assay

      20 μM purified OSM-3 was incubated with 1 μM GST-NEKL-3 at 30 °C in 100 μL reaction buffer (50 mM Tris-HCl pH 8.0, 10 mM MgCl2, 150 mM NaCl, and 2 mM ATP) for 30 min. The reaction was terminated by boiling for 5 min in SDS sample buffer.

      Mass spectrometry

      Following NEKL-3 treatment, OSM-3 proteins were resolved by SDS-PAGE and visualized with Coomassie Brilliant Blue staining. Protein bands corresponding to OSM-3 were excised and subjected to digestion using the following protocol: reduction with 5 mM TCEP at 56°C for 30 min; alkylation with 10 mM iodoacetamide in darkness for 45 min at room temperature, and tryptic digestion at 37°C overnight with a 1:20 enzyme-to-protein ratio. The resulting peptides were subjected to mass spectrometry analysis. Briefly, the peptides were analyzed using an UltiMate 3000 RSLCnano system coupled to an Orbitrap Fusion Lumos mass spectrometer (Thermo Fisher Scientific). We applied an in-house proteome discovery searching algorithm to search the MS/MS data against the C. elegans database. Phosphorylation sites were determined using PhosphoRS algorithm with manual validation of MS/MS spectra.”
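
As a hedged illustration of the reaction conditions quoted above, the following Python sketch works through the amounts implied by the stated concentrations and the 100 μL reaction volume. Only the concentrations and the volume come from the quoted Methods text; the helper name `amount_nmol` and the variable names are hypothetical and are used here purely for the arithmetic.

```python
# Worked arithmetic for the quoted in vitro phosphorylation reaction.
# Inputs (20 uM OSM-3, 1 uM GST-NEKL-3, 2 mM ATP, 100 uL) are from the quoted text;
# the helper and variable names are illustrative assumptions.

def amount_nmol(conc_uM: float, volume_uL: float) -> float:
    """Return the amount in nmol for a concentration in uM and a volume in uL."""
    # uM * uL = 1e-6 mol/L * 1e-6 L = 1e-12 mol = 1 pmol; divide by 1000 to get nmol.
    return conc_uM * volume_uL / 1000.0

VOLUME_UL = 100.0
OSM3_UM = 20.0
NEKL3_UM = 1.0
ATP_UM = 2000.0  # 2 mM expressed in uM

osm3_nmol = amount_nmol(OSM3_UM, VOLUME_UL)
nekl3_nmol = amount_nmol(NEKL3_UM, VOLUME_UL)
atp_nmol = amount_nmol(ATP_UM, VOLUME_UL)

# Molar ratios follow directly from the concentrations (same volume for all components).
substrate_per_kinase = OSM3_UM / NEKL3_UM
atp_per_substrate = ATP_UM / OSM3_UM

print(f"OSM-3: {osm3_nmol:g} nmol, GST-NEKL-3: {nekl3_nmol:g} nmol, ATP: {atp_nmol:g} nmol")
print(f"substrate:kinase = {substrate_per_kinase:g}:1, ATP:substrate = {atp_per_substrate:g}:1")
```

Under these stated conditions the substrate is in 20-fold molar excess over the kinase, and ATP is in 100-fold molar excess over the substrate.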

      (b) The method of structural prediction by AlphaFold2 and LocalColabFold needs clarification. In general, the prediction gives several candidates. How did the authors choose one of these candidates?

      We generated five candidate models, all of which showed similar conformations. We therefore chose the model with the highest confidence. We have provided PAE and pLDDT scores as additional data in Figure S2 and discussed them in the revised text on page #4, line #32:

      “...To gain structural insights from this motif, we employed LocalColabFold based on AlphaFold2 to predict the dimeric structure of OSM-3 (Evans et al., 2022; Jumper et al., 2021; Mirdita et al., 2022). The highest-confidence model was selected for further analysis (Fig. 1C, Fig. S2)...”

      (c) The methods to predict conformational changes by introducing various point mutations are interesting (Figure 3). However, the methods require more detailed descriptions. In the current form, the manuscript only lists the tools used. The pipelines and parameters need to be described. This information is important because AlphaFold-based predictions often give folded conformations because the training data are mainly composed of folded proteins. It is surprising that the methods applied here give open conformations induced by point mutations.

      We have described the pipelines in the revised Methods section on page#34, line#25: 

      “…The OSM-3 model was predicted using LocalColabFold (Evans et al., 2022; Jumper et al., 2021; Mirdita et al., 2022). Mutant models were built in PyMOL 2.6, choosing for each mutated residue in the G444E, PM and PD models the rotamer with the fewest clashes as the initial conformation. To predict mutation-induced conformational changes, the initial models were processed with PyRosetta (Chaudhury et al., 2010). The energies of the pre-relaxed models were evaluated with the Rosetta Energy Function 2015 (Alford et al., 2017), and the classic relax mover was then applied with default settings to minimize the energy of the models. The relaxed models were visualized in PyMOL. The relax script has been uploaded to GitHub: https://github.com/young55775/RosettaRelax_for_OSM3...”

      (5) The authors have purified proteins. Do they show different properties in gel filtration that are consistent with the structural prediction? It is anticipated that open-form mutants are eluted from earlier than closed forms.

      We thank the reviewer for this insightful suggestion. Indeed, our recent study showed that the open form of the active OSM-3 G444E mutant eluted earlier than the closed wild-type form (Xie et al., EMBO J., 2024). While the current study did not use gel filtration chromatography (SEC) to directly compare the hydrodynamic properties of the OSM-3 mutants, our functional assays provide robust evidence for the conformational changes predicted by structural modeling. For example, ATPase activity assays revealed that the open-state mutants (e.g., G444E and PD mutants) exhibited significantly enhanced enzymatic activity (Figure 4A), consistent with structural predictions of an active, destabilized autoinhibitory interface (Figure 3A). These functional readouts collectively validate the predicted structural states. While SEC could further corroborate these findings by distinguishing compact (closed) versus extended (open) conformations, we prioritized assays that directly link structural predictions to in vitro enzymatic activity and in vivo ciliary transport dynamics. Future studies incorporating SEC or cryo-EM will provide additional biophysical validation of these states.

      We have revised the text in the manuscript (Page #7, Line #22):

      “…Notably, the open-state OSM-3 mutants (e.g., G444E) displayed elevated ATPase activity, consistent with structural predictions of autoinhibition release (Fig. 3A, Fig. 4A) (Xie et al., 2024). While hydrodynamic profiling (e.g., SEC) could further resolve conformational states, our functional assays directly connect predicted structural changes to altered biochemical and cellular activity...”

      Minor point

      (1) Line 85 "MIMA kinase family" should be "NIMA kinase family".

      We have corrected the typo and thank the Reviewer for pointing it out.

      (2) M.S. and D.S. need to be defined in Figure 2D.

      We have updated the figures.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary: The global decline of amphibians is primarily attributed to deadly disease outbreaks caused by the chytrid fungus, Batrachochytrium dendrobatidis (Bd). It is unclear whether and how skin-resident immune cells defend against Bd. Although it is well known that mammalian mast cells are crucial immune sentinels in the skin and play a pivotal role in the immune recognition of pathogens and orchestrating subsequent immune responses, the roles of amphibian mast cells during Bd infections are largely unknown. The current study developed a novel way to enrich X. laevis skin mast cells by injecting the skin with recombinant stem cell factor (SCF), a KIT ligand required for mast cell differentiation and survival. The investigators found an enrichment of skin mast cells provides X. laevis substantial protection against Bd and mitigates the inflammation-related skin damage resulting from Bd infection. Additionally, the augmentation of mast cells leads to increased mucin content within cutaneous mucus glands and shields frogs from the alterations to their skin microbiomes caused by Bd.

      Strengths: This study underscores the significance of amphibian skin-resident immune cells in defenses against Bd and introduces a novel approach to examining interactions between amphibian hosts and fungal pathogens.

      Weaknesses: The main weakness of the study is the lack of functional analysis of X. laevis mast cells. Upon activation, mast cells have the characteristic feature of degranulation to release histamine, serotonin, proteases, cytokines, chemokines, etc. The study should determine whether X. laevis mast cells can be degranulated by two commonly used mast cell activators, IgE and compound 48/80, for IgE-dependent and IgE-independent pathways. This can be easily done in vitro. It is also important to assess whether these mast cells are degranulated in vivo upon Bd infection, using avidin staining to visualize vesicle release from mast cells. Figure 3 only showed that rSCF injection caused an increase in mast cells in naïve skin. The authors need to show whether Bd infection can induce a mast cell increase and whether rSCF injection under Bd infection causes a mast cell increase in the skin. In addition, it is unclear how the enrichment of mast cells provides protection against Bd infection and against alterations to skin microbiomes after infection. It is important to determine whether skin mast cells release any of the contents mentioned above.

      We would like to thank the reviewer for taking the time to review our work and for providing us with valuable feedback.

      Please note that amphibians do not possess the IgE antibody isotype [1].

      To our knowledge there have been no published studies using approaches for studying mammalian mast cell degranulation to examine amphibian mast cells. Notably, several studies suggest that amphibian mast cells lack histamine [2-5] and serotonin [2, 6]. While there are commercially available kits and reagents for examining mammalian mast cell granule content, most of these reagents may not cross-react with their amphibian counterparts. This is especially true of cytokines and chemokines, which diverged quickly with evolution and thus do not share substantial protein sequence identity across species as divergent as frogs and mammals. Respectfully, while following up on these findings is possible, it would involve considerable additional work to find reagents that would detect amphibian mast cell contents.

      We would also like to respectfully point out that while mast cell degranulation is a feature most associated with mammalian mast cells, this is not the only means by which mammalian mast cells confer their immunological effects. While we agree that defining the biology of amphibian mast cell degranulation is important, we anticipate that since the anti-Bd protection conferred by enriching frog mast cells is seen after 21 days of enrichment, it is quite possible that degranulation may not be the central mechanism by which the mast cells are mediating this protection.

      As noted in our manuscript, frog mast cells upregulate their expression of interleukin-4 (IL4), which is a hallmark cytokine associated with mammalian mast cells [7]. We are presently exploring the role of the frog IL4 in the observed mast cell anti-Bd protection. Should we generate meaningful findings in this regard, we will add them to the revised version of this manuscript.

      We are also exploring the heparin content of frog mast cells and capacities of these cells to degranulate in vitro in response to compound 48/80. In addition, we are exploring in vivo mast cell degranulation via histology and avidin-staining. Should these studies generate significant findings, we will include them in the revised version of this manuscript.

      Per the reviewer’s suggestion, in our revised manuscript we also plan to include data showing whether Bd infections affect skin mast cell numbers and how rSCF injection impacts skin mast cell numbers in the context of Bd infections.

      In regard to how mast cells impact Bd infections and skin microbiomes, our data indicate that mast cells are augmenting skin integrity during Bd infections and promoting mucus production, as indicated by the findings presented in Figure 4A-C and Figure 5A-C, respectively. There are several mammalian mast cell products that elicit mucus production. In mammals, this mucus production is mediated by goblet cells while the molecular control of amphibian skin mucus gland content remains incompletely understood. Interleukin-13 (IL13) is the major cytokine associated with mammalian mucus production [8], while to our knowledge this cytokine is either not encoded by amphibians or else has yet to be identified and annotated in these animals’ genomes. IL4 signaling also results in mucus production [9] and we are presently exploring the possible contribution of the X. laevis IL4 to skin mucus gland filling. Any significant findings on this front will be included in the revised manuscript. Histamine release contributes to mast cell-mediated mucus production [10], but as we outline above, several studies indicate that amphibian mast cells may lack histamine [2-5]. Mammalian mast cell-produced lipid mediators also play a critical role in eliciting mucus secretion [11] and our transcriptomic analysis indicates that frog mast cells express several enzymes associated with production of such mediators. We will highlight this observation in our revised manuscript.

      We anticipate that X. laevis mast cells influence skin integrity, microbial composition and Bd susceptibility in a myriad of ways. Considering the substantial differences between amphibian and mammalian evolutionary histories and physiologies, we anticipate that many of the mechanisms by which X. laevis mast cells confer anti-Bd protection will prove to be specific to amphibians and some even unique to X. laevis. We are most interested in deciphering what these mechanisms are but foresee that they will not necessarily reflect what one would expect based on what we know about mammalian mast cells in the context of mammalian physiologies.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Hauser et al. investigate the role of amphibian (Xenopus laevis) mast cells in cutaneous immune responses to the ecologically important pathogen Batrachochytrium dendrobatidis (Bd) using novel methods of in vitro differentiation of bone marrow-derived mast cells and in vivo expansion of skin mast cell populations. They find that bone marrow-derived myeloid precursors cultured in the presence of recombinant X. laevis Stem Cell Factor (rSCF) differentiate into cells that display hallmark characteristics of mast cells. They inject their novel (r)SCF reagent into the skin of X. laevis and find that this stimulates the expansion of cutaneous mast cell populations in vivo. They then apply this model of cutaneous mast cell expansion in the setting of Bd infection and find that mast cell expansion attenuates the skin burden of Bd zoospores and pathologic features including epithelial thickness and improves protective mucus production and transcriptional markers of barrier function. Utilizing their prior expertise with expanding neutrophil populations in X. laevis, the authors compare mast cell expansion using (r)SCF to neutrophil expansion using recombinant colony-stimulating factor 3 (rCSF3) and find that neutrophil expansion in Bd infection leads to greater burden of zoospores and worse skin pathology.

      Strengths:

      The authors report a novel method of expanding amphibian mast cells utilizing their custom-made rSCF reagent. They rigorously characterize expanded mast cells in vitro and in vivo using histologic, morphologic, transcriptional, and functional assays. This establishes solid footing with which to then study the role of rSCF-stimulated mast cell expansion in the Bd infection model. This appears to be the first demonstration of the exogenous use of rSCF in amphibians to expand mast cell populations and may set a foundation for future mechanistic studies of mast cells in the X. laevis model organism.

      We thank the reviewer for recognizing the breadth and extent of the undertaking that culminated in this manuscript. Indeed, this manuscript would not have been possible without considerable reagent development and adaptation of techniques that had previously not been used for amphibian immunity research. In line with the reviewer’s sentiment, to our knowledge this is the first report of using molecular approaches to augment amphibian mast cells, which we hope will pave the way for new areas of research within the fields of comparative immunology and amphibian disease biology.

      Weaknesses:

      The conclusions regarding the role of mast cell expansion in controlling Bd infection would be stronger with a more rigorous evaluation of the model, as there are some key gaps and remaining questions regarding the data. For example:

      1. Granulocyte expansion is carefully quantified in the initial time courses of rSCF and rCSF3 injections, but similar quantification is not provided in the disease models (Figures 3E, 4G, 5D-G). A key implication of the opposing effects of mast cell vs neutrophil expansion is that mast cells may suppress neutrophil recruitment or function. Alternatively, mast cells also express notable levels of csf3r (Figure 2) and previous work from this group (Hauser et al, Facets 2020) showed rG-CSF-stimulated peritoneal granulocytes express mast cell markers including kit and tpsab1, raising the question of what effect rCSF3 might have on mast cell populations in the skin. Considering these points, it would be helpful if both mast cells and neutrophils were quantified histologically (based on Figure 1, they can be readily distinguished by SE or Giemsa stain) in the Bd infection models.

      We thank the reviewer for this insightful suggestion. We are performing a further examination of skin granulocyte content during Bd infections and plan on including any significant findings in our revised manuscript.

      We predict that rSCF administration results in the accumulation of mast cells that are polarized such that they ablate the inflammatory response elicited by Bd infection. Mammalian mast cells, including peritonea-resident mast cells, express csf3r12, 13. Although the X. laevis animal model does not permit nearly the degree of immune cell resolution afforded by mammalian animal models, we do know that the adult X. laevis peritonea contain heterogenous leukocyte populations. We anticipate that the high kit expression reported by Hauser et al., 2020 in the rCSF3-recruited peritoneal leukocytes reflects the presence of mast cells therein. As such and in acknowledgement of the reviewer’s suggestion, we also think that the cells recruited by rCSF3 into the skin may include not only neutrophils but also mast cells. Possibly, these mast cells have distinct polarization states from those enriched by rSCF. While the lack of antibodies against frog neutrophils or mast cells has limited our capacity to address this question, we will attempt to reexamine by histology the proportions of skin neutrophils and mast cells in the skins of frogs under the conditions described in our manuscript. Any new findings in this regard will be included in the revised version of this work.

      2. Epithelial thickness and inflammation in Bd infection are reported to be reduced by rSCF treatment (Figure 3E, 5A-B) or increased by rCSF3 treatment (Figure 4G) but quantification of these critical readouts is not shown.

      We thank the reviewer for this suggestion. We will score epithelial thickness under the distinct conditions described in our manuscript and present the quantified data in the revised paper.

      3. Critical time points in the Bd model are incompletely characterized. Mast cell expansion decreases zoospore burden at 21 dpi, while there is no difference at 7 dpi (Figure 3E). Conversely, neutrophil expansion increases zoospore burden at 7 dpi, but no corresponding 21 dpi data are shown for comparison (Figure 4G). Microbiota analysis is performed at a third time point, 10 dpi (Figure 5D-G), making it difficult to compare with the data from the 7 dpi and 21 dpi time points. Reporting consistent readouts at these three time points is important to draw solid conclusions about the relationship of mast cell expansion to Bd infection and shifts in microbiota.

      Because there were no significant effects of mast cell enrichment at 7 days post Bd infection, we chose to look at the microbiome composition in a subsequent experiment at 10 days and 21 days post Bd infection, with 10 days being a bit more of a midway point between the initial exposure and day 21, when we see the effect on Bd loads. We will clarify this rationale in the revised manuscript.

      The enrichment of neutrophils in frog skins resulted in prompt (12 hours post enrichment) skin thickening (in the absence of Bd infection) and increased frog Bd susceptibility by 7 days of infection. Conversely, mast cell enrichment stabilized the skin mucosal and symbiotic microbial environments, presumably accounting at least in part for the lack of further Bd growth on mast cell-enriched animals by 21 days of infection. Our question regarding the roles of inflammatory granulocytes/neutrophils during Bd infections was that of ‘how’ rather than ‘when’ these cells affect Bd infections. Because the central focus of this work was mast cells and not other granulocyte subsets, when we saw that rCSF3-recruited granulocytes adversely affected Bd infections at 7 days post infection, we did not pursue the kinetics of these responses further. We plan to explore the roles of inflammatory mediators and disparate frog immune cell subsets during the course of Bd infections, but we feel that these future studies are more peripheral to the central thesis of the present manuscript regarding the roles of frog mast cells during Bd infections.

      4. Although the effect of rSCF treatment on Bd zoospores is significant at 21 dpi (Figure 3E), bacterial microbiota changes at 21 dpi are not (Figure S3B-C). This discrepancy, how it relates to the bacterial microbiota changes at 10 dpi, and why 7, 10, and 21 dpi time points were chosen for these different readouts (Figure 5F-G), is not discussed.

      Our results indicate that after 10 days of Bd infection, control Bd-challenged animals exhibited reduced microbial richness, while skin mast cell-enriched Bd-infected frogs were protected from this disruption of their microbiome. The amphibian microbiome serves as a major barrier to these fungal infections14, and we anticipate that Bd-mediated disruption of microbial richness and composition facilitates host skin colonization by this pathogen. Control and mast cell-enriched animals had similar skin Bd loads at 10 days post infection. However, by 21 days of Bd infection the mast cell-enriched animals maintained their Bd loads at the levels observed at 10 days post infection, whereas the control animals had significantly greater Bd loads. Thus, we anticipate that frog mast cells are conferring the observed anti-Bd protection in part by preventing microbial disassembly and thus interfering with optimal Bd colonization and growth on frog skins. In other words, maintained microbial composition at 10 days of infection may be preventing additional Bd colonization/growth, as seen when comparing skins of control and mast cell-enriched frogs at 21 days post infection. By 21 days of infection, control animals rebounded from the Bd-mediated reduction in bacterial richness seen at 10 days. The fact that after 21 days of infection control animals also had significantly greater Bd loads than mast cell-enriched animals suggests that there may be a critical earlier window during which microbial composition is able to counteract Bd growth.

      While the current draft of our manuscript has a paragraph to this effect (see below), we appreciate the reviewer conveying to us that our perspective on the relationship between skin mast cells and the kinetics of microbial composition and Bd loads could be better emphasized. We plan to revise our manuscript to include the above discussion points.

      Bd infections caused major reductions in bacterial taxa richness, changes in composition and substantial increases in the relative abundance of Bd-inhibitory bacteria early in the infection. Similar changes to microbiome structure occur during experimental Bd infections of red-backed salamanders and mountain yellow-legged frogs15, 16. In turn, progressing Bd infections corresponded with a return to baseline levels of Bd-inhibitory bacteria abundance and rebounding microbial richness, albeit with dissimilar communities to those seen in control animals. These temporal changes indicate that amphibian microbiomes are dynamic, as are the effects of Bd infections on them. Indeed, Bd infections may have long-lasting impacts on amphibian microbiomes15. While Bd infections manifested in these considerable changes to frog skin microbiome structure, mast cell enrichment appeared to counteract these deleterious effects to their microbial composition. Presumably, the greater skin mucosal integrity and mucus production observed after mast cell enrichment served to stabilize the cutaneous environment during Bd infections, thereby ameliorating the Bd-mediated microbiome changes. While this work explored the changes in established antifungal flora, we anticipate the mast cell-mediated inhibition of Bd may be due to additional, yet unidentified bacterial or fungal taxa. Intriguingly, while mammalian skin mast cell functionality depends on microbiome-elicited SCF production by keratinocytes17, our results indicate that frog skin mast cells in turn impact skin microbiome structure and likely their function. It will be interesting to further explore the interdependent nature of amphibian skin microbiomes and resident mast cells.

      5. The time course of rSCF or rCSF3 treatments relative to Bd infection in the experiments is not clear. Were the treatments given 12 hours prior to the final analysis point to maximize the effect? For example, in Figure 3E, were rSCF injections given at 6.5 dpi and 20.5 dpi? Or were treatments administered on day 0 of the infection model? If the latter, how do the authors explain the effects at 7 dpi or 21 dpi given mast cell and neutrophil numbers return to baseline within 24 hours after rSCF or rCSF3 treatment, respectively?

      Please find the schematic of the immune manipulation, Bd infection, and sample collection times below. We will include a figure like this in our revised manuscript.

      The title of the manuscript may be mildly overstated. Although Bd infection can indeed be deadly, mortality was not a readout in this study, and it is not clear from the data reported that expanding skin mast cells would ultimately prevent progression to death in Bd infections.

      We acknowledge this point. The revised manuscript will be titled: “Amphibian mast cells: barriers to chytrid fungus infections”.

      Reviewer #3 (Public Review):

      Summary:

      Hauser et al. provide an exceptional study describing the role of resident mast cells in amphibian epidermis that produce anti-inflammatory cytokines that prevent Batrachochytrium dendrobatidis (Bd) infection from causing harmful inflammation, and also protect frogs from changes in skin microbiomes and loss of mucin in glands and loss of mucus integrity that otherwise cause changes to their skin microbiomes. Neutrophils, in contrast, were not protective against Bd infection. Beyond the beautiful cytology and transcriptional profiling, the authors utilized elegant cell enrichment experiments to enrich mast cells by recombinant stem cell factor, or to enrich neutrophils by recombinant colony-stimulating factor-3, and examined respective infection outcomes in Xenopus.

      Strengths:

      Through the use of recombinant IL4, the authors were able to test and eliminate the hypothesis that mast cell production of IL4 was the mechanism of host protection from Bd infection. Instead, impacts on the mucus glands and interaction with the skin microbiome are implicated as the protective mechanism. These results will press disease ecologists to examine the relative importance of this immune defense among species, the influence of mast cells on the skin microbiome and mucosal function, and open the potential for modulating mucosal defense.

      We thank the reviewer for recognizing the significance and utility of the findings presented in our manuscript.

      Weaknesses:

      A reduction of bacterial diversity upon infection, as described at the end of the results section, may not always be an "adverse effect," particularly given that anti-Bd function of the microbiome increased. Some authors (see Letourneau et al. 2022 ISME, or Woodhams et al. 2023 DCI) consider these short-term alterations as encoding ecological memory, such that continued exposure to a pathogen would encounter an enriched microbial defense. Regardless, mast cell-initiated protection of the mucus layer may negate the need for this microbial memory defense.

      We thank the reviewer for their insightful comment. We will revise our discussion to include this possible interpretation.

      While the description of the mast cell location in the epidermal skin layer in amphibians is novel, it is not known how representative these results are across species ranging in chytridiomycosis susceptibility. No management applications are provided such as methods to increase this defense without the use of recombinant stem cell factor, and more discussion is needed on how the mast cell component (abundance, distribution in the skin) of the epidermis develops or is regulated.

      We appreciate the reviewer’s comment and would like to point out that the work presented in our manuscript was driven by comparative immunology questions more than by conservation biology.

      We thank the reviewer for suggesting expanding our discussion to include potential management applications and potential mechanisms for regulating frog skin mast cells. While any content to these effects would be highly speculative, we agree that it may spark new interest and pave new avenues for research. To this end, our revised manuscript will include a paragraph to this effect.

      References:

      1. Flajnik, M.F. A cold-blooded view of adaptive immunity. Nat Rev Immunol 18, 438-453 (2018).

      2. Mulero, I., Sepulcre, M.P., Meseguer, J., Garcia-Ayala, A. & Mulero, V. Histamine is stored in mast cells of most evolutionarily advanced fish and regulates the fish inflammatory response. Proc Natl Acad Sci U S A 104, 19434-19439 (2007).

      3. Reite, O.B. A phylogenetical approach to the functional significance of tissue mast cell histamine. Nature 206, 1334-1336 (1965).

      4. Reite, O.B. Comparative physiology of histamine. Physiol Rev 52, 778-819 (1972).

      5. Takaya, K., Fujita, T. & Endo, K. Mast cells free of histamine in Rana catasbiana. Nature 215, 776-777 (1967).

      6. Galli, S.J. New insights into "the riddle of the mast cells": microenvironmental regulation of mast cell development and phenotypic heterogeneity. Lab Invest 62, 5-33 (1990).

      7. Babina, M., Guhl, S., Artuc, M. & Zuberbier, T. IL-4 and human skin mast cells revisited: reinforcement of a pro-allergic phenotype upon prolonged exposure. Archives of dermatological research 308, 665-670 (2016).

      8. Lai, H. & Rogers, D.F. New pharmacotherapy for airway mucus hypersecretion in asthma and COPD: targeting intracellular signaling pathways. J Aerosol Med Pulm Drug Deliv 23, 219-231 (2010).

      9. Rankin, J.A. et al. Phenotypic and physiologic characterization of transgenic mice expressing interleukin 4 in the lung: lymphocytic and eosinophilic inflammation without airway hyperreactivity. Proc Natl Acad Sci U S A 93, 7821-7825 (1996).

      10. Church, M.K. Allergy, Histamine and Antihistamines. Handb Exp Pharmacol 241, 321-331 (2017).

      11. Nakamura, T. The roles of lipid mediators in type I hypersensitivity. J Pharmacol Sci 147, 126-131 (2021).

      12. Aponte-Lopez, A., Enciso, J., Munoz-Cruz, S. & Fuentes-Panana, E.M. An In Vitro Model of Mast Cell Recruitment and Activation by Breast Cancer Cells Supports Anti-Tumoral Responses. Int J Mol Sci 21 (2020).

      13. Jamur, M.C. et al. Mast cell repopulation of the peritoneal cavity: contribution of mast cell progenitors versus bone marrow derived committed mast cell precursors. BMC Immunol 11, 32 (2010).

      14. Walke, J.B. & Belden, L.K. Harnessing the Microbiome to Prevent Fungal Infections: Lessons from Amphibians. PLoS Pathog 12, e1005796 (2016).

      15. Jani, A.J. et al. The amphibian microbiome exhibits poor resilience following pathogen-induced disturbance. ISME J 15, 1628-1640 (2021).

      16. Muletz-Wolz, C.R., Fleischer, R.C. & Lips, K.R. Fungal disease and temperature alter skin microbiome structure in an experimental salamander system. Mol Ecol 28, 2917-2931 (2019).

      17. Wang, Z. et al. Skin microbiome promotes mast cell maturation by triggering stem cell factor production in keratinocytes. J Allergy Clin Immunol 139, 1205-1216 e1206 (2017).

    1. When looking at who contributes in crowdsourcing systems, or with social media in general, we almost always find that we can split the users into a small group of power users who do the majority of the contributions, and a very large group of lurkers who contribute little to nothing. For example

      It is interesting to see that a majority of people are lurkers. As someone who uses social media, it seems to me like many people contribute, so this actually shocks me. I guess you have to look at it in terms of scale: if an app has 100 million users, even 10% of that is 10 million, which is a lot of users. That's why people may think there are a lot of engagers when in fact a good chunk of us are just lurkers.

    1. Author response:

      The following is the authors’ response to the original reviews

      Thank you for your valuable comments, which helped us improve our manuscript. We will make the following modifications in the revised manuscript:

      (1) In the first paragraph of the Results section, we will provide a summary of trimeric G proteins in Ciona and explain why we focused on Gαs and Gαq in the initial phase of this study.

      We added a summary of trimeric G proteins in Ciona in the initial part of the Results section (page 6, line 23 to page 8, line 5). In this summary, we added the following sentence explaining the reason we focused on Gαs and Gαq in the initial phase of this study: "Among them, we prioritized examining the Gα proteins having an excitatory function (Gαq and Gαs) rather than inhibitory roles since previous studies suggested that excitatory events like Ca<sup>2+</sup> transient and neuropeptide secretion occur when Ciona metamorphose."

      (2) As the reviewer 1 suggests, the polymodal roles of papilla neurons are interesting. Although we could not address this through functional analyses in this study, we will add a discussion regarding this aspect. The sentences will be something like the following:

      “The recent study (Hoyer et al., 2024) provided several lines of evidence suggesting that PSNs can serve as the sensors of several chemicals in addition to the mechanical stimuli. This finding and our model could be mutually related because these chemicals could modify Ca<sup>2+</sup> and cAMP production. The use of G protein signaling allows Ciona to reflect various environmental stimuli to initiate metamorphosis in the appropriate situation, both mechanically and chemically.”

      We added a discussion related to the recent publication by Hoyer and colleagues on page 23, lines 13-18: "A recent study[19] provided several lines of evidence suggesting that PNs can serve as the sensors of several chemicals in addition to mechanical stimuli. This finding and our model could be mutually related because these chemicals could modify Ca<sup>2+</sup> and cAMP production. G protein signaling allows Ciona to reflect various environmental stimuli to initiate metamorphosis either mechanically or chemically according to the situation."

      (3) As both reviewers suggested, imaging cAMP on the backgrounds of some G protein knockdowns is essential, and we will conduct the experiments.

      We added the data on cAMP imaging in Gαs, Gαq, and dvGαi_Chr2 knockdown larvae in Supplementary Figure S4C-D and Figure 6E.

      (4) We will carefully modify the text throughout the manuscript so that the descriptions suitably reflect the results.

      We modified the descriptions of experimental results so that the text reflects the results more precisely.

      Reviewer #1:

      Pg1 - need to add an additional '6' to the author list to clarify which two or more authors contributed equally.

      We added a 6 as suggested. Thank you for pointing this out.

      Pg3 - note that the larval adhesive organ applies not to all benthic adults, but to benthic sessile adults. This makes it sound like the adhesive organ can trigger metamorphosis, but has that been shown? In Ciona or others? Need to specify the role of cells secreting adhesive vs. sensory cells that trigger metamorphosis.

      We divided the corresponding sentence into two to clearly state that adhesion and triggering metamorphosis are related but could be different events. Moreover, we modified the sentence to state that physical contact is one example of a cue triggering metamorphosis. We then added another example of a factor triggering metamorphosis—i.e., chemicals from the organisms surrounding the adherence site (page 3, lines 16-20 of the revised version):

      "Many marine invertebrates exhibit a benthic lifestyle at the adult stage[4]. Their planktonic larvae have an adhesive organ that secretes adhesives and adheres to a substratum. The cues associated with the adhesion, such as the physical contact with the substratum and a chemical from organisms surrounding the adherence site, can trigger their metamorphosis."

      Pg 4 - although mechanosensation is the focus here, could there also be chemoreception/chemoreceptors involved in Ciona metamorphosis? For example, Hoyer et al. 2024 (Current Biology 34(6):1168-1182) concluded that some palp sensory neurons were multimodal and could be both chemo- and mechano-sensory.

      We added statements about this recent finding in the Introduction and Discussion sections. In the Introduction (page 4, lines 16-18), however, we also stated that a mechanical stimulus can trigger metamorphosis in the lab without the need to supply these chemicals. This is to emphasize that the mechanical stimulus is the focus of this study. In the Discussion, we added a statement that G-protein signaling could also be used to receive the chemical stimuli (page 23, lines 13-18).

      Pg 6 - Before starting functional characterizations, it would be useful to give an overview (table?) of the G proteins found in papillae, and what receptor they are suspected of binding to, or if this is completely unknown, and which downstream pathways they likely activate. That is, to show some results about which G proteins are found in Ciona, and which are found in papillae. In this way, it will make more sense for readers when the Gai is suddenly introduced later, following the sections of Gaq and Gas.

      Thank you for your idea to improve the readability of this manuscript. In the initial part of the Results section (page 6, line 22 to page 8, line 5), we added descriptions of the repertoire of trimeric G-proteins in Ciona, including phylogenetic analyses, and expression in the papillae based on RNA-seq data, followed by the reason why we initially focused on Gαq and Gαs. The data are displayed in Supplementary Figure S1. The phylogenetic analyses were modified from those shown in Supplementary Figure S5 of the previous version. We also added the general downstream activities of Gαs, Gαi and Gαq in the Introduction section (page 6, lines 10-12). Considering the contents, the general function of Gα12/13 was stated in the Results section (page 8, lines 2-3).

      We did not add the information about their partner receptors in this early section. This is because there are many candidates, and we could not pick some of them. Instead, we described our current suppositions about their possible partners in the Discussion (page 23, line 22 to page 24, line 19). However, we suspect that there are more candidates, and we wish to promote unbiased research in the future.

      Pg 9 - would be good to know the timing of this PF fluorescence increase and the timing of stimulation in the text here, relevant to the 30-min gap before metamorphosis initiation

      We added the start times for the cAMP reduction and re-upregulation in the following sentence (page 11, lines 17-18): "The cAMP reduction and increase respectively started at 35 seconds and 4 min 40 seconds after stimulation on average."

      Pg 28 - Phylogenetic analysis: Given that the results may be of interest to metamorphosis in other marine invertebrates as discussed in the last paragraph of the paper, it would be useful to include G proteins from these other animal phyla where available in the phylogenetic tree. Similarly, in Figure S5A it would be useful to highlight further all the different Ciona G proteins, and the different protein families, through the use of additional colour/labelling (regardless of whether this remains Fig S5A, or becomes part of the main figures)

      We drew a phylogenetic tree of G-proteins including those in some sessile and benthic animals (barnacle, sea anemone, hydra, sponge, sea urchin and shell). However, we decided not to add the tree in the revised version because, unfortunately, the bootstrap values of many branches were not high enough to have confidence in the results. We hope you understand our decision. Ciona divergent G-proteins are likely to be specific to Ciona.

      According to your comment, we highlighted all Ciona G alpha proteins in red in Figure S5A, which is now Figure S1A in the revised version.

      Figure 3E and Figure S3 - is the data shown as an average of all larvae measured (n=5 and n=4) or is it data from one representative larva out of the 4-5 measured? This needs clarification.

      The original graphs in Figure 3E and Figure S3 are typical examples. We added the graphs summarizing data of all larvae in each experimental condition in Supplementary Figure S4 (corresponding to Supplementary Figure S3 of the original version). Figure 3E remains as a typical example of the result of a single larva to explain our data analysis in detail.

      Experimental suggestion - As mentioned above, one missing detail seems to be the need for evidence that cAMP is elevated in the papillae directly as a result of Gs activation- this could be shown with measurement of cAMP via PF in Gs knockdown larvae that are mechanically stimulated compared to wildtype stimulated and non-stimulated?

      Thank you for your suggestion. The experiments are indeed important. We added the data of Pink Flamindo imaging in the Gαs, Gαq and dvGαi_Chr2 knockdown conditions. The results of Gαs and Gαq knockdowns are described in page 11, line 24 to page 12, line 5, and are displayed in Supplementary Figure S4C-D. The result of dvGαi_Chr2 knockdown is given on page 16, lines 20-22 and shown in Figure 6E.

      In order to insert the data of cAMP imaging of dvGαi_Chr2 knockdown larvae, we transferred some panels of Figure 6 to Supplementary Figure S6. In addition, the knockdown data of dvGαi_Chr4 and double knockdowns of Gαi genes are also included in Supplementary Figure S6.

      Reviewer #2:

      Page 6, line 3-4 in the first paragraph of the "Results"; the authors state "Neither morphant showed any signature of metamorphosis even though both were allowed to adhere to the base of culture dishes...". However, judging from Fig. 1E, "the percentage of metamorphosis initiation" (indicated by the initiation of tail regression) in Gαq morphants is not close to 0 (average about 40%), thus I am not convinced this observation can be described as "Neither morphant showed any signature of metamorphosis..." in this sentence.

      Thank you for your suggestion. In writing the original text, we oversimplified some of the descriptions when trying to improve the readability. We agree this resulted in imprecision in places. We have revised all these passages in our revision. In this particular case, we softened the overly emphatic statement to better reflect the results, changing “... any signature of metamorphosis...” to “... reduced rate of metamorphosis initiation...” In addition, we stated that the effect of Gαq MO was weaker than that of Gαs MO on page 8, lines 10-12. The weaker effect of Gαq MO was due to the redundant role of the Gi pathway, which is shown on page 17, lines 10-17, and in Figure 6G-H.

      Similarly, in the next paragraph describing the knockdown of PLCβ1/2/3, PLCβ4, and IP3R genes, the authors appear to neglect that there is a weaker effect of the PLCβ4 MO, and simply described the results as "The knockdown larvae of these three genes failed to start metamorphosis". Based on Fig. 1H, about 30% of the PLCβ4 MO-injected animals still initiated tail regression. This difference may have some biological meaning and thus should be described more precisely.

      We added the following sentence on page 8, lines 18-19 of the revised version: “The effect of PLCβ4 MO was weaker than those of the other MOs, suggesting that this PLC plays an auxiliary role.”

      Page 7, second paragraph, on the description of GCaMP8 fluorescence and also at the end of Fig. 1O legend, the citation to "Figure S1" is confusing; Fig. S1 is the phylogenetic tree of PLCβ proteins. Is there additional data regarding this Gαq MO plus GCaMP8 mRNA injection experiment?

      Figure S1 of the original version corresponds to Figure S2 of the revised version. To avoid confusion, we deleted this citation from the legend of Figure 1O. With this modification, the sentence stating the repertoire of PLCβ and IP3R in Ciona (page 8, lines 15-16) is the only sentence citing Figure S2 in the revised version.

      Page 8, first sentence; The purpose of theophylline treatment is not to prevent larvae from adhesion, thus I would suggest modifying this sentence to: "We treated wild-type larvae with theophylline after tail amputation, and we observed that most theophylline-treated larvae completed tail regression without adhesion (Figure 2D-F)".

      We modified the sentence according to your comment. Thank you for your suggestion.

      Page 9, second paragraph; judging from the data presented in Fig. 3C, I think this description: "when papillae were removed from larvae, theophylline failed to induce metamorphosis" is not accurate, because about ~30% of the Papilla cut +Theophylline-treated larvae still initiated their tail regression. This needs to be explained clearly.

      We modified the sentence (page 11, lines 2-3) as follows: “...the average rate of metamorphosis induction by theophylline was reduced from 100% to 30%...”

      Similarly in the next few sentences regarding the results presented in Fig. 3D, the effects of overexpressing those genes are not uniform. While amputation of papillae in larvae overexpressing caPLCβ1/2/3 could inhibit metamorphosis almost completely, papilla cut seems to have a weaker effect on caGαq, caGαs, and bPAC-overexpressing larvae.

      We added a description explaining that caPLCβ1/2/3 was the most sensitive to papilla amputation, and the possibility that PLCβ1/2/3 works specifically in the papillae (page 11, lines 9-11): “Among these experiments, caPLCβ1/2/3 overexpression was the most sensitive to papilla amputation, suggesting that PLCβ1/2/3 acts specifically in the papillae during metamorphosis.”

      Page 9, the paragraph on using the fluorescent cAMP indicator; there is a discrepancy between the described developmental time when the authors conducted this experiment and the metamorphosis competent timing (after 24hpf) described on page 7. On page 26, the authors describe "The Pink Flamindo mRNA-injected larvae were immobilized on Poly L lysine-coated glass bottom dishes at 20-21 hpf...". Did the authors start stimulating the larvae to observe the fluorescent signal soon after immobilization, or wait several hours until the larvae passed 24hpf and then conduct the experiment?

      The latter is the case. The immobilized larvae were kept until they acquired the competence for metamorphosis and then stimulation/recording was carried out. This point is described in the Materials and Methods section of the revised version (page 29, lines 16-18):

      "The Pink Flamindo mRNA-injected larvae were immobilized on Poly L lysine-coated glass-bottom dishes at 20-21 hpf, and stimulated their adhesive papillae around 25 hpf."

      Page 10, the description "...Gαq morphants initiated metamorphosis when caGαs was overexpressed in the nervous system (Figure 4F)". It should be noted that the result is only a partial rescue. To be precise, this description needs to be modified.

      We changed the sentence to reflect the results more precisely (page 14, lines 2-3): “Moreover, caGαs overexpression in the nervous system significantly, although not perfectly, ameliorated the effect of Gαq MO (Figure 4F).”

      Page 12-13, This description and Figure 5E are a bit confusing to me. The figure legend for 5E: "GABA is necessary for Ca2+ transient in the adhesive papillae (arrow)" But the arrow in this image points to a place with no fluorescent signal, and in the upper corner it is labeled "29% (n=17)". Does that mean the proportion of "no Ca2+ increase after stimulation" was 29% among the 17 samples examined? Or is it the other way around, i.e., 71% of the examined larvae did not show a Ca2+ signal increase after stimulation?

      The latter is the case. We added a caption explaining this clearly in the Figure legend: “The percentage and number exhibit the rate of animals showing Ca2+ transient in the papillae.”

      Page 13, second paragraph; I do not agree with the overly simplified description that "GABA significantly ameliorated the metamorphosis-failed phenocopies of Gαq, PLCβ, and Gαs morphants". As shown in Fig. 5F-H, adding GABA exerts different levels of partial rescue effect on each morphant, and thus should be described clearly.

      When the outliers are neglected, the effect of GABA is most evident in Gαs knockdowns. This suggests that the target(s) of GABA signaling is more likely to be Gq pathway components. We added the following sentence to the revised version (page 15, lines 14-16):

      “Among the three morphants, GABA exhibited the most effective rescues in Gαs knockdowns than Gαq and PLCβ.”

      In addition, we think this sentence establishes a more logical connection with the sentence that follows it: “These results could be explained by assuming enhancement of the Gq pathway by GABA through PLCβ and another GABA-mediated metamorphic pathway bypassing Gq components.” Thank you for your suggestion.

      The section "Contribution of Gi to metamorphosis" confirmed the possibility that GABA signaling targets Gq pathway components.

      Page 13, the first paragraph on "Contribution of Gi to metamorphosis"; the description that "The knockdown of this gene (Gαi) exhibited a significantly reduced rate of metamorphosis;..." is misleading. I would suggest modifying the entire sentence as "The knockdown of this gene (Gαi) exhibited a moderate (although statistically significant) reduction of metamorphosis rate, suggesting the presence of another Gαi regulating metamorphosis".

      Thank you for your suggestion. We modified the sentence (page 16, lines 2-4 in the revised version) as recommended. We believe the description is much improved.

      Page 20, the last sentence about Ciona papilla neurons expressing transcription factor Islet; the authors seem to attempt to make some comparison with the vertebrate pancreatic beta cells in this paragraph, but the comparison and the argument are not fully developed in this current format.

      To deepen this discussion, we added the following sentence (page 23, lines 10-12): “The atypical secretion of GABA might depend on the transcription factor like Islet shared between Ciona papilla neurons and vertebrate beta cells.”

      However, we would like to limit the depth of our discussion on this point, as we hope to expand on it further in future studies.

      Other suggestions:

      Page 3, second paragraph: as they become unable to "move" after metamorphosis -> "relocate"

      We corrected the word as suggested.

      Page 4, second paragraph: In the first sentence, the author states the current understanding of chordate phylogeny and cites the Delsuc et al. 2006 Nature paper at the end of this sentence. However, in this paper cephalochordates were erroneously grouped with echinoderms, and thus chordates did not form a monophyletic clade. A later paper by Bourlat et al. (Nature 444:85-88, 2006) corrected this problem, and subsequently Delsuc et al. also published another paper (genesis, 46:592-604, 2008) with broader sampling to overcome this problem. These later publications need to be included for the sake of correctness.

      We added this reference.

      Page 14, regarding the redundant function of the typical Gαi protein in the papillae; the authors may try double KD of Gαi and dvGαi_Chr2 in their experimental system to test this idea.

      We carried out double knockdown of typical Gαi and dvGαi_Chr2. However, we could not address their redundant role sufficiently because most of the double knockdown larvae exhibited severe shape malformation.

      dvGαi_Chr4 is also expressed in the papillae. We carried out knockdown of this gene and found that it resulted in a very minor but statistically significant reduction of the metamorphosis rate, suggesting that this Gαi also plays a supportive role in metamorphosis. We also carried out double knockdown of dvGαi_Chr2 and dvGαi_Chr4. The double KD larvae exhibited responsiveness to GABA, probably because of the presence of typical Gαi.

      These results are described on page 16, lines 2-18, and the data are shown in Supplementary Figure S6A-D of the revised version.

      Responses to the Reviewing editor's comments:

      "Larvae of the ascidian Ciona initiate metamorphosis tens of minutes after adhesion to a substratum via its adhesive organ." - Larvae is plural so change to 'via their adhesive organ'

      The sentence was corrected as suggested.

      "Metamorphosis is a widespread feature of animal development that allows them" - revise the sentence, e.g. "Metamorphosis is a widespread feature of development that allows animals"

      The sentence was corrected as suggested.

      "GABA synthase (GAD)" GAD is not called GABA synthase but glutamate decarboxylase - clarify, e.g. encoding the enzyme synthesizing GABA called glutamate decarboxylase (GAD)

      This part was corrected exactly as suggested. Thank you.

      "IP3 is received by its receptor on the endoplasmic reticulum (ER) and releases calcium ion (Ca2+ )" revise to "IP3 is received by its receptor on the endoplasmic reticulum (ER) that releases calcium ion (Ca2+ )"

      The sentence was corrected as suggested.

      "Moreover, GPCR is implicated as the mediator of settlement" - GPCRs are implicated

      This sentence was modified as suggested.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We thank the reviewers for providing valuable comments and suggestions for improving the manuscript.

      Response to reviewer comments:

      Reviewer-1

      Comment 1: Major concern is the study lacks rigor in several areas where n=2 and results are not quantified with statistics. They need to run power analysis and increase their sample sizes. Please include statistics on all measurements. Filamentous actin staining and alpha-sma is used to visualize mechanosensing but also in other cell activities such as cell contractility for movement, cell to substrate adhesion, cell division, etc. They need to query more mechanosensing related pathways (Piezo1/2, Yap/taz-Hippo, integrin-Focal Adhesion Kinase, etc) to show that mechanosensing changed.

      Response: We have increased the sample size to a minimum of n=3 in most cases. However, a few experiments will require more time to increase sample size, as mentioned below.

      Our data emphasized the role of Rac1 and SRF. We understand that other molecular players may also be involved in sensing or responding to mechanical forces, but surveying multiple families of candidates without a specific hypothesis or functional experiment is beyond the scope of this study.

      Comment 2: Fig. 1: In panel E, the cranial bone area measurement is not normalized to mitigate the possible effect of individual differences.

      Response: We have re-quantified the data with normalization to the length of the skull.

      Comment 3: In Fig. 2 the authors mentioned many phenotypical changes (bone length changes, gap thickness change, apex thickness change, etc.) based on histology stain, none of them are quantified to show a significant difference between Rac1-WT and Rac1-KO.

      Response: In Fig. 2A, we present the gross morphology of the Rac1-KO embryos and only discuss the tissue defects like edema, hematoma, and hypoplasia, confirmed through H&E as shown in Fig. 2C. We also show the apical limits of the intact calvaria in Fig. 2D, consistent with the calvaria defects observed at birth. In fact, we do not discuss any “bone length changes, gap thickness, or apex thickness change” in this section as suggested by the reviewer. To address the request for more quantification we have added measurement of the edematous area of the apical mesenchyme at E14.5 (Fig. 2C), and this is now shown in Suppl. Fig. 1E. We also added quantification of embryo genotypes and Chi-square tests, now shown in Suppl. Fig. 1D.

      Comment 4: Fig. 2 In panel D, with only 2 embryos per group is not enough for quantitation

      Response: We plan to increase the number of embryos during the revision period.

      Comment 5: Fig. 2 In panel D, the two arrows in the Rac1-KO mutants are not easy to catch.

      Response: We made the arrows bigger and bolder.

      Comment 6: Fig. 3 The thickness quantification is not performed.

      Response: We added quantification in Fig. 3D.

      Comment 7: Fig. 3 The images show an obvious curve change of the apex between the control and mutant. Such change is not discussed in the results. Is it due to histology issue?

      Response: We do not think it is due to technical issues but reflects a real change in the shape of the apex of the head. We modified the graphical representation in Figure 3E to reflect this change in curvature. We also added the following sentence to the results on page 7: “We also noted a loss of curvature in the apex of the Rac1-KO head at E13.5, which correlated with loss of aSMA+ mesenchymal cells and thinning of the EMM (Fig. 3E).”

      Comment 8: The merged layer did not show S100a6. While the authors are showing apical expansion of the mesenchyme toward the dermis and meninges, it is hard to track where they are without a merged image.

      Response: We added merged images.

      Comment 9: Fig. 4 In panel B, 2 biological replicates per genotype are very low.

      Response: The effect of Rac1-KO on cell cycle is already known (Moore et al. 1997; Nikolova et al. 2007; Gahankari et al. 2021), and our result is supported by in vivo quantification of Tom+Edu+ cells in different regions of the embryonic head shown in Fig. 4A. We prefer not to repeat this assay.

      Comment 10: Fig. 4 There is no cell death data.

      Response: We will generate data on cell death during the revision period.

      Comment 11: Fig. 5 In panel B, the GAPDH western blot bands in the mutants seem to be thinner than those of controls.

      Response: We verified equal loading with a Ponceau stain, so this minor change in the GAPDH level could be due to biological differences in the protein level. Nevertheless, by our estimation this minor difference does not explain away the major difference in Rac1 and Srf levels.

      Comment 12: Though the immunostain showed a decrease in signal intensity, it is hard to know whether the decrease is significant enough across all Rac1-KO mutants. They need to measure the fluorescence intensity and perform statistics.

      Response: We will generate better images of SRF staining and quantify the difference between Rac1-WT and Rac1-KO during the revision period.

      Comment 13: Fig. 6: Similar as Fig. 2, there is no quantification and n=1 per genotype is not enough

      Response: During the revision period we will increase the number of E12.5 Srf-KO and Srf-WT embryos to n=3 for Figure 6G. All other panels currently have n=7 or greater.

      Comment 14: Fig. 7: Need quantification between Srf-KO and Rac1-KO with statistics to show they are not different, but both significantly different from WTs

      Response: In Figure 7D we have added quantification of aSMA area in Srf-KO and Rac1-KO. These results show that both mutants have a similar phenotype with reduced aSMA expression compared to their respective WT littermates, which supports the conclusion that they work in the same pathway. We do not agree with the reviewer that the two mutants should show no statistical difference, because Rac1 and Srf are different genes with overlapping but also non-overlapping functions. During the revision period we will add more Srf-KO embryos and repeat the statistical analysis.

      Comment 15: Supplement Fig.2: No image showing the time point before E11.5.

      Response: We will add an E10.5 time point during the revision period.

      Comment 16: Supplement Fig.3: The ventral view of Rac1-WT does not have the same angle as it shows in Rac1-KO. Makes harder to see the difference between control and mutant.

      Response: We adjusted the brightness/contrast to make the difference clearer.

      Comment 17: Supplement Fig.4 &7: The alkaline phosphatase stained area needs to be normalized to some other metric because the embryos could be different size.

      Response: We normalized the stained area to the width of the eye; the result is now shown in Suppl. Fig. 4 and 7.

      Comment 18: Supplement Fig 6 A: The legend and figure don't match. Is it E13.5 or 14.5. Panel 6B needs better images without curling of the tissue.

      Response: This has been fixed. The immunostaining images in Suppl. Fig. 6A are from E14.5. Panel B has been replaced with better images in the revised manuscript.


      Reviewer-2

      Comment 1.1: In Fig. 5, links between Rac1, SRF, αSMA, and contractility in mesenchymal cells are shown. Molecular analyses (Western blot and qPCR) were performed using primary cultured mesenchymal cells (prepared after freed from the epidermal population). Although use of cells prepared from E18.5 embryos may have been chosen by the authors for the safe isolation of the mesenchymal population without contamination of epidermal cells, this reviewer finds that anti-SRF immunoreactivity is weaker at E13.5 than at E12.5 (throughout the section including the mesencephalic wall) and therefore wonder whether SRF expression changes in a stage-dependent manner. So, simply borrowing results obtained from E18.5-derived cells for describing the scenario around E12.5 and E13.5 is a little disappointing point found only here in this study.

      Response: In fact, the reason we chose E18.5 was to get enough cells to do the experiments in Figure 5A-D without extensive passaging and/or immortalization, which would undoubtedly cause the cells to deviate from their in vivo character as they become adapted to growing on plastic with 10% serum. Therefore, we prefer not to change the cells as suggested by the reviewer.

      Comment 1.2: In Fig. 5F, it is difficult to clearly see "reduction" of SRF immunoreactivity in Rac1-KO. Therefore, quantification of %SRF+/totalTomato+ would be desired.

      Response: We will generate better images of SRF staining and quantify the difference between Rac1-WT and Rac1-KO during the revision period.

      Comment 1.3: Separately, direct comparison of spontaneous centripetal shrinkage of the apical/dorsal scalp tissues, which will occur in 30 min when prepared at E12.5 or E13.5 (Tsujikawa et al., 2022), between WT and Rac1-KO would strengthen the results in Fig. 5D. As KO is specific to the mesenchyme, the authors do not have to worry about removal of the epidermal layer (which would be much more difficult at E12.5-13.5 than E18.5). If the degree of centripetal shrinkage of the "epidermis plus mesenchyme" layers were smaller in Rac1-KO, it would be interpreted to be mainly due to poorer recoiling activity and contractility of the Rac1-KO mesenchymal tissue.

      Response: We will try to perform the centripetal shrinkage assays as shown by Tsujikawa et al. during the revision period.

      Comment 2: The authors favor "apical" vs. "basolateral" to tell the relative positions in the embryonic head, not only in the adult head. But "apical" vs. "basolateral" should be accompanied with dorsal vs. ventral at least at the first appearance. Apical-to-basal axis or apex vs. basolateral by itself can provide, in many contexts, impressions that epithelial layers/cells are being discussed. Please note that the authors also use "caudal" (in the embryonic head). Usually, a universally defined anatomical axis perpendicular to the rostral-to-caudal axis is the dorsal-to-ventral axis.

      Response: Apologies for confusing terminology. The terminology is now defined uniformly according to the anatomical axis.

      Comment 3: One of the authors' statements in ABSTRACT "In control embryos, α-smooth muscle actin (αSMA) expression was spatially restricted to the apical mesenchyme, suggesting a mechanical interaction between the growing brain and the overlying mesenchyme" and a similar one in RESULTS "αSMA was not detected in the basolateral mesenchyme of either genotype from E12.5-E14.5 (Suppl. Fig. 4A), suggesting restriction of the mechanosensitive cell state to the apical mesenchyme" need to be at least partly revised, taking previous publication about the normal αSMA pattern in the embryonic head into account more carefully. Tsujikawa et al. (2022) described "Low-magnification observations showed superficial immunoreactivity for alpha smooth muscle actin (αSMA), which has been suggested to function in cells playing force-generating and/or constricting roles; this immunoreactivity was continuously strong throughout the dorsal (calvarial) side of the head but not ventrally toward the face, producing a staining pattern similar to a cap (Figure 2A)" . Therefore, in this new paper, descriptions like "we observed ...., consistent with ....(2022)" or "we confirmed .... (2022)" would be more accurate and appropriate regarding this specific point. Such a minor change does not reduce this study's overall novelty at all.

      Response: Thank you for the correction. We have replaced the terminology and cited the article (Tsujikawa et al., 2022) appropriately, crediting their finding.

      Comment 4: It would be very helpful if the authors provide a schematic illustration in which physiological and pathological scenarios (at the molecular, cellular, and tissue levels found or suggested by this study) are shown.

      Response: We have added a schematic representation of the molecular changes happening in the apical head development because of Rac1- and Srf-KO, and it is represented in Suppl. Fig. 7C.


      Comment 5: Despite being put in the title, "mechanosensing" by mesenchymal cells is not directly assessed in this study. If appropriate, something like "mechano-functioning" would be closer to what the authors demonstrated.

      Response: We changed the title to refer to “mechano-responsive mesenchyme”. We think this is appropriate because the cells of interest have reduced aSMA and reduced proliferation, both of which are known to occur, at least in part, as responses to mechanical inputs.

      Reviewer-3

      Comment 1: Prrx1-Cre targets calvarial mesenchyme and Suzuki et al., 2009 showed that Prrx1-Cre mediated loss of Rac1 led to a calvarial bone phenotype due to incomplete fusion of the skull. While this phenotype was not studied in detail, the statement in the intro and discussion that the calvarial phenotype has not been recapitulated in mice is incorrect.

      Response: Suzuki et al showed incomplete fusion of the skull. Although the skull is a tissue that is affected in AOS, it is not akin to the scalp and calvaria aplasia that typifies AOS. Our result stands apart from this. We clarified our position as such:

      Introduction (page 4): “Nevertheless, the calvaria phenotype seen in AOS individuals has not been explored in detail or fully recapitulated in mice.”

      Discussion (page 11): “Previous studies have demonstrated the role of Rac1 in mesenchyme-derived tissues, but they did not recapitulate AOS phenotypes.”

      Comment 2: The authors show that Pdgfra-Cre induced knockout of Rac1 leads to lower-than-expected numbers of Rac1-cKO embryos at E18.5 and P1. Phenotypic analysis shows that the earliest phenotype is blebbing and hematoma in the nasal region at E11.5/12.5. It is stated that this was resolved at E18.5. It is unclear if this is truly a resolution of the phenotype or that these embryos fail to survive until E18.5. Do 100% of the Rac1-cKO embryos exhibit the blebbing/hematoma at E11.5/12.5? What is the observed number/percentage of Rac1-cKO embryos at E11.5/12.5? If the observed percentage of Rac1-cKO is similar to that at E18.5 (lower than the expected 25%), this would support resolution. If the observed ratio is as expected at E11.5/12.5, then this would support embryonic loss before E18.5 rather than phenotypic resolution.

      Response: Please note that 100% (n=12) of E12.5 Rac1-KO embryos displayed nasal and mild caudal edema as exhibited in Fig. 2A, but none (n=16) had blebbing/hematoma by E18.5. We added tables for the number of embryos recovered at E12.5 and E18.5 to Supplemental Figure 1. These results show that the percentage of mutants at E12.5 was 21.42%, not significantly different from the expected frequency (p = 0.5371). At E18.5, the percentage dropped slightly to 18.3%, but still not significantly different from expected (p = 0.1545). The significant change in frequency of blebbing/hematoma from E12.5 to E18.5, without any significant change in the frequency of mutants, supports phenotypic resolution of the early blebbing/hematoma.
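      The genotype-frequency comparison described above can be illustrated with a minimal sketch of a chi-square goodness-of-fit test against the expected 25% Mendelian frequency. The counts used here (12 mutants among 56 embryos at E12.5) are an assumption chosen only because they agree with the reported 21.42%; the response does not state the total embryo count, and the function name is hypothetical. The sketch also assumes a two-class comparison (mutant vs. non-mutant, one degree of freedom), which may differ from the test design actually used.

```python
import math


def chi_square_gof_two_classes(observed_mutants, total, expected_fraction=0.25):
    """Chi-square goodness-of-fit test for mutant vs. non-mutant counts (df = 1)."""
    observed = [observed_mutants, total - observed_mutants]
    expected = [total * expected_fraction, total * (1.0 - expected_fraction)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With one degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)), so no external library is needed.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 12 mutants out of 56 embryos (assumed, not from the source).
chi2, p = chi_square_gof_two_classes(observed_mutants=12, total=56)
print(f"mutant fraction = {12 / 56:.4f}, chi2 = {chi2:.4f}, p = {p:.4f}")
```

      Under these assumed counts, a p-value well above 0.05 would indicate that the observed mutant frequency is not distinguishable from the expected 25%, which is the logic used to argue for phenotypic resolution rather than embryonic loss.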

      Comment 3: It is stated that brain shape is altered in Rac1-cKO embryos at E14.5 and E18.5 and concluded that these shape differences are secondary to the cranial defects. Pdgfra+ cells give rise to the meninges and if the Pdgfra-Cre line recapitulates this expression, then loss of the ubiquitously expressed Rac1 in the meninges could lead to a primary defect in the brain, which may lead to secondary defects in the calvarium and scalp. Their conclusion should recognize other possibilities.

      Response: We agree it is possible that there are meninges defects that secondarily change the shape of the brain, and we added a mention of this possibility. It is highly unlikely that scalp defects are only secondary to brain changes because the first observable phenotypes are in the EMM that gives rise to the scalp.

      Comment 4: The TdTom staining in wholemount at E13.5 (Supplemental Figure 2B) is difficult to appreciate in the image shown.

      Response: At E11.5 there is good contrast between labeled cranial structures and non-labeled body. At E13.5, Tomato appears in most of the mesenchymal cells in the embryo, so there is not as much contrast. The lack of contrast at E13.5 may cause the reviewer to think there is something wrong with the image.

      Comment 5: The idea that the EMM laminates into the meninges and scalp layers is not new and should be properly cited (Vu et al., 2021, Scientific Reports). The following paper should also be cited on the use of alpha-SMA (Acta2) as a marker of the anterior calvaria mesenchyme: Holms et al., 2020 Cell Reports.

      Response: Thank you. We are happy to add those citations.

      Comment 6: It is concluded that meningeal development is maintained in the cKO; however, this conclusion was based on a single marker (S100a6) that is both expressed in the presumptive meninges and dermis and greatly reduced overall in the cKO. This conclusion should be softened or other markers used to show that the meninges is indeed normal.

      Response: We softened the conclusion on the meninges in the revised manuscript, as this part of the phenotype was not our focus, but it would be worth examining in the future.

      Comment 7: The overlap of S100a6 and alpha-SMA is difficult to appreciate in the images shown in Figure 3. Since this is important to the conclusion, co-staining should be done. If co-staining cannot be done due to the primary antibodies' origins, then ISH should be done.

      Response: We added merged images.

      Comment 8: It is concluded that reduced alpha-SMA suggests an early failure of Rac-cKO cells to respond to the mechanical environment. While this is one possibility, the reduction of alpha-SMA may simply be due to a reduction of these cells resulting from failed differentiation, decreased proliferation, or increased apoptosis.

      Response: We think the fact that aSMA is downregulated in cultured cells strongly argues against it being a trivial consequence of reduced proliferation etc. Nevertheless, we softened our conclusion to allow for some of these things to also contribute to the reduced aSMA expression. We will check apoptosis during the revision period.

      Comment 9: The conclusion that alpha-SMA is a transient population only present in apical cranial mesenchyme between E12.5-14.5 is not consistent with prior studies: Holms et al., 2020 Cell Reports; Holms et al., 2021 Nature Communications; Farmer et al., 2021 Nature Communications; Takeshita et al., 2016 JBMR.

      Response: There is no contradiction. Our statements are based on antibody staining where it is very evident that aSMA-expressing cells are detectable throughout the apical mesenchyme between E12.5 and E14.5. But at E18.5 we do not see this kind of broad aSMA expression in the apical head, suggesting a transient and spatially restricted population of cells in the apical mesenchyme. This is consistent with the studies from Tsujikawa et al., 2022 and Angelozzi et al., 2022. The papers mentioned by the reviewer are only focused on the suture mesenchyme. They do not claim there is broad aSMA/Acta2 expression in the apical head, but only in a spatially restricted subpopulation of suture mesenchymal cells.

      Comment 10: In the SRF immunostaining results in control and Rac1-cKO embryos, it is difficult to appreciate the nuclear localization at E12.5 in Figure 5E, as the DAPI is over saturated, and the image quality is poor. The image quality is also poor in Figure 5F.

      Response: We will generate better images of SRF staining and quantify the difference between Rac1-WT and Rac1-KO during the revision period.

      Comment 11: To what extent is the expression/localization of MRTF, the transcriptional co-activator of SRF, altered in the calvarial mesenchyme of Rac1-cKO embryos? Changes in MRTF would strengthen the link between Rac1 and SRF.

      Response: We do not know how MRTF expression/localization changes in the embryo tissue, but western blot data on Rac1-KO fibroblasts revealed a reduction in expression/nuclear localization of MRTF-A/B that mirrored the changes in SRF. We added these blots to Figure 5A. However, as noted at the end of the discussion, MRTF is not always required for SRF function in vivo (Dinsmore, eLife 2022). The MRTFA/B-KO is a possibility for future work.

      Comment 12: Hypoplasia of the apical mesenchyme (Figure 6G, inset 1) in Srf-cKO is difficult to see.

      Response: During the revision period we will increase the number of E12.5 Srf-KO and Srf-WT embryos to n=3 for Figure 6G and replace the picture with a better one.

      Comment 13: Generally, the organization of the data into many main and supplemental Figures makes the flow difficult to follow.

      Response: We understand the concern, but we have tried our best to organize the most important data into main figures and the relevant but less essential data into supplemental figures.

      Comment 14: Pdgfra interacts genetically with Srf in neural crest cells in craniofacial development, with Srf being a target of PDGFRa signaling (Vasudevan and Soriano, 2015, Dev Cell). Since the Pdgfra-Cre line used here is hemizygous, it is important that the control used to look at SRF expression in the Rac1-cKO is Pdgfra-Cre+.

      Response: It is standard practice to include some Cre+ mice in the control set to reveal whether Cre has toxic effects in the cells of interest. To the reviewer’s concern about genetic interactions between the Pdgfra gene and Srf, this should not be relevant here because the Pdgfra-Cre used in our study is a transgene and does not affect the endogenous Pdgfra gene.

      Comment 15: The text size in all figures is too small and varies throughout, making it difficult to read.

      Response: The figures were resized to fit the panels into the Word document. This should not be an issue in the final manuscript.

      Comment 16: Details about the pulse-chase timing of the EdU experiments should be included in the results. Also, does n = 3 for each stage and each genotype? It would be helpful to include a representative section for a control and cKO littermate pair.

      Response: The details are now included in the methods section. Yes, n=3 in each stage and genotype (Fig. 4A). The representative images are also included.

      Comment 17: The relative sizing of the panels within and between figures is haphazard. Some are very large and others very small (Figure 2, 6, Supplemental Figure 1, 2, 6, 7).

      Response: The image panels are fixed in the revised manuscript.

      Comment 18: In Figure 5A and F, the titles "E12.5" and "E13.5" are in italics.

      Response: The fonts for the figures are fixed in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      We appreciate the reviewers' thoughtful comments and suggestions. Below, we provide point-by-point responses to the recommendations and outline the updates made to the manuscript.

      (1) [Discussion, "the obvious experiment is to manipulate a neuron's anatomical embedding while leaving stimulus information intact."] The epiphenomenon can arise from the placement and types of a neuron's neurotransmitters and neuromodulators, too.

      The content of vesicles released by a neuron is obviously of great importance in determining postsynaptic impact. However, we’re suggesting that (assuming vesicular content is held constant) the anatomically-relevant patterning of spiking might additionally affect the postsynaptic neuron’s integration of the presynaptic input. To avoid confusion, we updated the text accordingly: “the obvious experiment is to manipulate a neuron's anatomical embedding while minimally impacting external and internal variables, such as stimulus information and levels of neurotransmitters or neuromodulators” (Line 594 - 596).

      (2) "In all conditions, the slope of the input duration versus sensitivity line was still positive at 1,800 seconds (Fig. 3B)". This may suggest that the estimate of the calculated statistics (ISI, PSTH) is more reliable with more data, rather than (or in addition to) specific information being extracted from faraway time points. Another potential confound is that the training statistics were calculated from all training data, so the test data is a better match to training data when test statistics are calculated from more data. Overall, the validity of the conclusions following this observation is not clear to me.

      This is a great point. Accordingly, we revised the text to include this possibility: “Because the training data were of similar duration, this could be explained by either of two possibilities. First, the signal is relatively short, but noisy—in this case, extended sampling will increase reliability. Second, the anatomical signal is, itself, distributed over time scales of tens to hundreds of seconds.” (Line 252 - 255).

      (3) "This further suggests that there is a latent neural code for anatomical location embedded within the spike train, a feature that could be practically applied to determining the brain region of a recording electrode without the need for post-hoc histology". The performance of the model at the subregion level, which is a typical level of desired precision in locating cells, does not seem to support such a practical application. Please clarify to avoid confusion.

      The current model should not be considered a replacement for traditional methods, such as histology. Our intention is to convey that, with the inclusion of multimodal data and additional samples, a computational approach to anatomical localization has great promise. We updated the manuscript to clarify this point: “While significantly above chance, the structure-level model still lacks the accuracy for immediate practical application. However, it is highly likely that the incorporation of datasets with diverse multi-modal features and alternative regions from other research groups will increase the accuracy of such a model. In addition, a computational approach can be combined with other methods of anatomical reconstruction.” (Line 355 - 359).

      Additionally, we directly addressed this point in our original manuscript (Discussion section: Line 498 - 505 in the current version). Furthermore, following the release of our preprint, independent efforts have adopted a multimodal strategy with qualitatively similar results (Yu et al., 2024). Other recent work expands on the idea of utilizing single-neuron features for brain region/structure characterization (Le Merre et al., 2024).

      Yu, H., Lyu, H., Xu, E. Y., Windolf, C., Lee, E. K., Yang, F., ... & Hurwitz, C. (2024). In vivo cell-type and brain region classification via multimodal contrastive learning. bioRxiv, 2024-11.

      Le Merre, P., Heining, K., Slashcheva, M., Jung, F., Moysiadou, E., Guyon, N., ... & Carlén, M. (2024). A Prefrontal Cortex Map based on Single Neuron Activity. bioRxiv, 2024-11.

      (4) "These results support the notion the meaningful computational division in murine visuocortical regions is at the level of VISp versus secondary areas.". The use of the word "meaningful" is vague and this conclusion is not well justified because it is possible that subregions serve different functional roles without having different spiking statistics.

      Precisely! It is well established that different subregions serve different functional purposes - but they do not necessitate different regional embeddings. It is important to note the difference between stimulus encoding and the embedding that we are describing. As a rough analogy, the regional embedding might be considered a language, while the stimulus is the content of the spoken words. However, to avoid vague words, we revised the sentence to “These results suggest that the computational differentiability of murine visuocortical regions is at the level of VISp versus secondary areas.” (Line 380 - 381)

      (5) Figure 3D left/right halves look similar. A measure of the effect size needs to accompany these p-values.

      We assume the reviewer is referring to Figure 3E. Although some of the violin plots in Figure 3E look similar, they are not identical. In the revision, we include effect sizes in the caption.

      (6) Figure 3A, 3F: Could uncertainty estimates be provided?

      Yes. We added uncertainty estimates to the text (Line 272 - 294) and to the caption of Figure S2, which displays confusion matrices corresponding to Figure 3A. The inclusion of similar estimates for 3F would be so unwieldy as to be a disservice to the reader—there are 240 unique combinations of stimulus parameters and structures. In the context of the larger figure, 3F serves to illustrate a relationship between stimulus, region, and the anatomical embedding.

      (7) Page 21. "semi-orthogonal". Please reword or explain if this usage is technical.

      We replaced “semi-orthogonal” with “dissociable” (Line 549).

      (8) [Page 11, "This approach tested whether..."] Unclear sentence. Please reword.

      We changed “This approach tested whether the MLP’s performance depended on viewing the entire ISI distribution or was enriched in a subset of patterns” to “This approach identified regions of the ISI distribution informative for classification” (Line 261).

      Reviewer #2 (Recommendations for the authors):

      We appreciate the reviewer’s comments and summary of the results. We agree that the introductory results (Figs. 1-3) are not particularly compelling when considered in isolation. They provide a baseline of comparison for the subsequent results. Our intention was to approach the problem systematically, progressing from well-established, basic methods to more advanced approaches. This allows us to clearly test a baseline and avoid analytical leaps or untested assumptions. Specifically:

      ● Figure 1 provides an evaluation of the standard dimensionality reduction methods. As expected, these methods yield minimal results, serving as a clear baseline. This is consistent, for example, with an understanding of single units as rate-varying Poisson processes.

      ● Figures 2 and 3 then build upon these results, first applying linear supervised methods to spiking features common in the neuroscience literature (e.g., firing rate, coefficient of variation), and then applying nonlinear supervised machine learning methods to more detailed spiking features, such as the ISI distribution.

      By starting from the standpoint of the status quo, we are better able to contextualize the significance of our later findings in Figures 4–6.

      Response to Specific Points in the Summary

      (6) Separability of VISp vs. Secondary Visual Areas

      I found the entire argument about visual areas somewhat messy and unclear. The stimuli used might not drive the secondary visual areas particularly well and might necessitate task engagement.

      We appreciate your feedback that the dissection of visual cortical structures is unclear. To summarize, as shown in the bottom three rows of Figure 6, there is a notable lack of diagonality in visuocortical structures. This means that our model was unable to learn signatures to reliably predict these classes. In contrast, visuocortical layer is returned well above chance, and superstructures (primary and secondary areas) are moderately well identified, albeit still well above chance.

      Consider a thought experiment, if Charlie Gross had not shown faces to monkeys to find IT, or Newsome and others shown motion to find MT and Zeki and others color stimuli to find V4, we would conclude that there are no differences.

      The thought experiment is misleading. The results specifically do not arise from stimulus selectivity—much of Newsome’s own work suggests that the selectivity of neurons in IT etc. is explained by little more than rate varying Poisson processes. In this case, there should be no fundamental anatomical difference in the “language” of the neurons in V4 and IT, only a difference in the inputs driving those neurons. In contrast, our work suggests that the “language” of neurons varies as a function of some anatomical divisions. In other words, in contrast to a Poisson rate code, our results predict that single neuron spike patterns might be remarkably different in MT and IT— and that this is not a function of stimulus selectivity. Notably, the anatomical (and functional) division between V1 and secondary visual areas does not appear to manifest in a different “language”, thus constituting an interesting result in and of itself.

      We regret a failure to communicate this in a tight and compelling fashion on the first submission, but hope that the revision is limpid and accessible.

      Barberini, C. L., Horwitz, G. D., & Newsome, W. T. (2001). A comparison of spiking statistics in motion sensing neurones of flies and monkeys. Motion Vision: Computational, Neural, and Ecological Constraints, 307-320.

      Bair, W., Zohary, E., & Newsome, W. T. (2001). Correlated firing in macaque visual area MT: time scales and relationship to behavior. Journal of Neuroscience, 21(5), 1676-1697.

      Similarly, why would drifting gratings be a good example of a stimulus for the hippocampus, an area thought to be involved in memory/place fields?

      The results suggest that anatomical “language” is not tied to stimuli. It is imperative to recall that neurons are highly active absent experimentally imposed stimuli, such as when an animal is at rest, when an animal is asleep, and when an animal is in the dark (relevant to visual cortices). With this in mind, also recall that, despite the lack of stimuli tailored to the hippocampus, neurons therein were still reliably separable from neurons in seven nuclei in the thalamus, six of which are not classically considered visual regions. Had these regions (including hippocampus) been inert during the presentation of visual stimuli, there would have been very little separability.

      (7) Generalization across laboratories

      “[C]omparison across laboratories was somewhat underwhelming. It does okay but none of the results are particularly compelling in terms of performance.”

      Any result above chance is a rejection of the null hypothesis: that a model trained on a set of animals in Laboratory A will be ineffective in identifying brain regions when tested on recordings collected in Laboratory B (in different animals and under different experimental conditions). As an existence proof, the results suggest conserved principles (however modest) that constrain neuronal activity as a function of anatomy. That models fail to achieve high accuracy in this context is not surprising, given the limitations of available recordings; that models achieve anything above chance, however, is.

      Thus, after reading the paper many times, I think part of the problem is that the study is not cohesive, and the authors need to either come up with a tool or demonstrate a scientific finding.

      We demonstrate that neuronal spike trains carry robust anatomical information. We developed an ML architecture for this and that architecture is publicly available.

      They try to split the middle and I am left somewhat perplexed about what exact scientific problem they or other researchers are solving.

      We humbly suggest that the question of a neuron’s “language” is highly important and central to an understanding of how brains work. From a computational perspective, there is no reason for a vast diversity of cell types, nor a differentiation of the rules that dictate neuronal activity in one region versus another. A Turing-complete system can be trivially constructed from a small number of simple components, such as an excitatory and an inhibitory cell type. This is the basis of many machine learning tools.

      Please do not confuse stimulus specificity with the concept of a neuron’s language. Neurons in VISp might fire more in response to light, while those in auditory cortex respond to sound. This does not mean that these neurons are different - only that their inputs are. Given the lack of a literature describing our main effect—that single neuron spiking carries information about anatomical location—it is difficult to conclude that our results are either commonplace or to be expected.

      I am also unsure why the authors think some of these results are particularly important.

      See above.

      For instance, has anyone ever argued that brain areas do not have different spike patterns?

      Yes. In effect, by two avenues. The first is a lack of any argument otherwise (please do not conflate spike patterns with stimulus tuning), and the second is the preponderance of, e.g., rate codes across many functionally distinct regions and circuits.

      Is that not the premise for all systems neuroscience?

      No. The premise for all systems neuroscience (from our perspective) is that the brain is a) a collection of interacting neurons and b) the collective system of neurons gives rise to behavior, cognition, sensation, and perception. As stated above, these axiomatic first principles fundamentally do not require that neurons, as individual entities, obey different rules in different parts of the brain.

      I could see how one could argue no one has said ISIs matter but the premise that the areas are different is a fundamental part of neuroscience.

      Based on logic and the literature, we fundamentally disagree. Consider: while systems neuroscience operates on the principle that brain regions have specialized functions, there is no a priori reason to assume that these functions must be reflected in different underlying computational rules. The simplest explanation is that a single language of spiking exists across regions, with functional differences arising from processing distinct inputs rather than fundamentally different spiking rules. For example, an identical spike train in the amygdala and Layer 5 of M1 would have profoundly different functional impacts, yet the spike timing itself could be identical (even as stimulus response). Until now, evidence for region-specific spiking patterns has been lacking, and our work attempts to begin addressing this gap. There is extensive further work to be conducted in this space, and it is certain that models will improve, rules will be clarified, and mechanisms will be identified.

      Detailed major comments

      (1) Exploratory trends in spiking by region and structure across the population:

      The argument in this section is that unsupervised analyses might reveal subtle trends in the organization of spiking patterns by area. The authors show 4 plots from t-SNE and claim to see subtle organization. I have concerns. For Figure 1C, it is nearly impossible to see if a significant structure exists that differentiates regions and structures. So this leads certain readers to conclude that the authors are looking at the artifactual structure (see Chari et al. 2024) - likely to contribute to large Twitter battles. Contributing to this issue is that the hyperparameter for tSNE was incorrectly chosen. I do think that a different perplexity should be used for the visualization in order to better show the underlying structure; the current visualization just looks like a single "blob". The UMAP visualizations in the supplement make this point more clearly. I also think the authors should include a better plot with appropriate perplexity or not include this at all. The color map of subtle shades of green and yellow is hard to see as well in both Figure S1 and Figure 1.

      In response to the feedback, we replaced t-SNE/UMAP with LDA, while keeping PCA for dimensionality reduction.

      As stated in the original methods, t-SNE/UMAP hyperparameters were chosen based on the combination that led to the greatest classifiable separability of the regions/structures in the space (across a broad range of possible combinations). It just so happens that the maximally separable structure from a regions/structures perspective is the “blob”. This suggests that perhaps the predominant structure the t-SNE finds in the data is not driven by anatomy. If we selected hyperparameters in some other way that was not based specifically on regions/structures (e.g. simple visual inspection of the plots) the conformation would of course be different and not blob-like. However, we removed the t-SNE and UMAP to avoid further confusion.

      The “muddy appearance” is not an issue with the color map. As seen in Figure 1B, the chosen colors are visibly distinct. Figure 1C (previous version) appeared muddy yellow/green because of points that overlap with transparency, resulting in a mix of clearly defined classes (e.g., a yellow point on top of a blue point creating green). This overlap is a meaningful representation of the separability observed in this analysis. We also tried using 2D KDE for visualization, but it did not improve the impression of visual separability.

      We are removing p-values from the figures because they lead to the impression that we over-interpret these results quantitatively. However, we calculated p-values based on label permutation similar to the way R2 suggests (see previous methods). The conflation with the Wasserstein distances is an understandable misunderstanding. These are unrelated to p-values and used for the heatmaps in S1 only (see previous methods).

      Instead of p-values, we now use the adjusted Rand index, which measures how accurately neurons within the same region are clustered together (see Line 670 - 671, Figure 1C, and Figure S1) (Hubert & Arabie 1985). This quantifies the extent to which the distribution of points in dimensionally-reduced space is shaped by region/structure.

      Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218. https://doi.org/10.1007/BF01908075
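
      As a rough illustration of this metric (a standard-library sketch, not the analysis code behind Figure 1C), the adjusted Rand index can be computed from the contingency counts between anatomical labels and cluster assignments. The function name and the toy labels below are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index (Hubert & Arabie, 1985) between two labelings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two items: the partitions are trivially identical.
        return 1.0

    # Contingency counts: items sharing each (true label, predicted label) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs that fall together under each grouping.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)       # agreement for a perfect match
    if max_index == expected:
        # Degenerate case, e.g. every item in a single group in both labelings.
        return 1.0
    return (sum_pairs - expected) / (max_index - expected)


# Hypothetical toy data: six neurons, two regions, three clusters.
regions = ["VISp", "VISp", "VISp", "HPC", "HPC", "HPC"]
clusters = [0, 0, 1, 1, 2, 2]
ari = adjusted_rand_index(regions, clusters)
```

      A value of 1 indicates that the clusters reproduce the anatomical labels exactly (up to renaming), while values near 0 indicate agreement no better than chance.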

      (2) Logistic classifiers:

      The results in this section are somewhat underwhelming. Accuracy is around 40% and yes above chance but I would be very surprised if someone is worried about separating visual structures from the thalamus. Such coarse brain targeting is not difficult. If the authors want to include this data, I recommend they show it as a control in the ISI distribution section. The entire argument here is that perhaps one should not use derived metrics and a nonlinear classifier on more data is better, which is essentially the thrust of the next section.

      As outlined above, our work systematically increases in model complexity. The logistic result is an intermediate model, and it returns intermediate results. This is an important stepping stone between the lack of a result based on unsupervised linear dimensionality reduction and the performance of supervised nonlinear models.

      From a purely utilitarian perspective, the argument could be framed as “one should not use derived metrics, and a nonlinear classifier on more data is better.” However, please see all of our notes above.

      (3) MLP classifiers:

      Even in this section, I was left somewhat underwhelmed that a nonlinear classifier with large amounts of data outperforms a linear classifier with small amounts of data. I found the analysis of the ISIs and which timescales are driving the classifier interesting but I think the classifier with smoothing is more interesting. So with a modest chance level decodability of different brain areas in the visual system, I found it somewhat grandiose to claim a "conserved" code for anatomy in the brain. If there is conservation, it seems to be at the level of the coarse brain organization, which in my opinion is not particularly compelling.

      The sample size used for both the linear and nonlinear classifiers is the same; however, the nonlinear classifier leverages the detailed spiking time information from ISIs. Our goal here was to systematically evaluate how classical spike metrics compare to more detailed temporal features in their ability to decode brain areas. We chose a linear classifier for spike metrics because, with fewer features, nonlinear methods like neural networks often offer only modest advantages over linear methods, are less interpretable, and are prone to overfitting.

      Respectfully, we stand by our word choice. The term “conserved” is appropriate given that our results hold appreciably, i.e., statistically above chance, across animals.

      (4) Generalization section:

      The authors suggest that a classifier learned from one set of data could be used for new data. I was unsure if this was a scientific point or the fact that they could use it as a tool.

      It can be both. We are more driven by the scientific implications of a rejection of the null.

      Is the scientific argument that ISIs are similar across areas even in different tasks?

      It appears so - despite heterogeneity in the tuning of single neurons, their presynaptic inputs, and stimuli, there is identifiable information about anatomical location in the spike train.

      Why would one not learn a classifier from every piece of available data: like LFP bands, ISI distributions, and average firing rates, and use that to predict the brain area as a comparison?

      Because this would obfuscate the ability to conclude that spike trains embed information about anatomy.

      Considering all features simultaneously and adding additional data modalities—such as LFP bands and spike waveforms—has potential to improve classification accuracy at the cost of understanding the contribution of each feature. The spike train as a time series is the most fundamental component of neuronal communication. As a result, this is the only feature of neuronal activity of concern for the present investigation.

      Or is the argument that the ISIs are a conserved code for anatomy? Unfortunately, even in this section, the data are underwhelming.

      We appreciate the reviewer’s comments, but arrive at a very different conclusion. We were quite surprised to find any generalizability whatsoever.

      Moreover, for use as a tool, I think the authors need to seriously consider a control that is either waveforms from different brain areas or the local field potentials. Without that, I am struggling to understand how good this tool is. The authors said "because information transmission in the brain arises primarily from the timing of spiking and not waveforms (etc)., our studies involve only the timestamps of individual spikes from well-isolated units ". However, we are not talking about information transmission and actually trying to identify and assess brain areas from electrophysiological data.

      While we are not blind to the “tool” potential that is suggested by our work, this is not the primary motivation or content in any section of the paper. As stated clearly in the abstract, our motivation is to ask “whether individual neurons [...] embed information about their own anatomical location within their spike patterns”. We go on to say “This discovery provides new insights into the relationship between brain structure and function, with broad implications for neurodevelopment, multimodal integration, and the interpretation of large-scale neuronal recordings. Immediately, it has potential as a strategy for in-vivo electrode localization.” Crucially, the last point we make is a nod to application. Indeed, our results suggest that in-vivo electrode localization protocols may benefit from the incorporation of such a model.

      In light of the reviewer’s concerns, we have further dampened the weight of statements about our model as a consumer-ready tool.

      Example 1: The final sentence of the abstract now reads: “Computational approximations of anatomy have potential to support in-vivo electrode localization.”

      Example 2: The Results section now contains the following text: “While significantly above chance, the structure-level model still lacks the accuracy for immediate practical application. However, it is highly likely that the incorporation of datasets with diverse multi-modal features and alternative regions from other research groups will increase the accuracy of such a model. In addition, a computational approach can be combined with other methods of anatomical reconstruction.” (Line 355 - 359).

      Example 3: We replaced the phrase “because information transmission in the brain arises primarily from the timing of spiking and not waveforms (etc)” with the phrase “because information is primarily encoded by the firing rate or the timing of spiking and not waveforms (etc)” (Line 116 - 118).

      (5) Discussion section:

      In the discussion, beginning with "It is reasonable to consider . . ." all the way to the penultimate paragraph, I found the argumentation here extremely hard to follow. Furthermore, the parts of the discussion here I did feel I understood, I heavily disagreed with. They state that "recordings are random in their local sampling" which is almost certainly untrue when it comes to electrophysiology which tends to oversample task-modulated excitatory neurons (https://elifesciences.org/articles/69068). I also disagree that "each neuron's connectivity is unique, and vertebrate brains lack 'identified neurons' characteristic of simple organisms." While brains are eutelic and "nameable" in only the simplest organisms (C. elegans), cell types are exceedingly stereotyped in their connectivity even in mammals and such connectivity defines their computational properties. Thus I don't find the premise the authors state in the next sentence to be undermined ("it seems unlikely that a single neuron's happenstance imprinting of its unique connectivity should generalize across stimuli and animals"). Overall, I found this subsection to rely on false premises and in my opinion it should be removed.

      At the suggestion of R2, we removed the paragraph in question. However, we would like to address some points of disagreement:

      We agree that electrophysiology, along with spike-sorting, quality metrics, and filtering of low-firing neurons, leads to oversampling of task-modulated neurons. However, when we stated that recordings are random in their local sampling, we were referring to structural (anatomical) randomness, not functional randomness. In other words, the recorded neurons were not specifically targeted (see below).

      Electrode arrays, such as Neuropixels, record from hundreds of neurons within a small volume relative to the total number of neurons and the volume of a given brain region. For instance, the paper R2 referenced includes a statement supporting this: “... assuming a 50-μm ‘listening radius’ for the probes (radius of half-cylinder around the probe where the neurons’ spike amplitude is sufficiently above noise to trigger detection) …, the average yield of 116 regular-spiking units/probe (prior to QC filtering) would imply a density of 42,000 neurons/mm³, much lower than the known density of ~90,000 neurons/mm³ for excitatory cells in mouse visual cortex….”

      If we take the estimated volume of V1 to be approximately 3 mm³, this region could theoretically be subdivided into multiple cylinders with a 100-μm diameter. While stereotaxic implantation of the probe mitigates some variability, the natural anatomical variability across individual animals introduces spatially random sampling. This was the randomness we were referring to, and thus, we disagree with the assertion that our claim is “almost certainly untrue.”
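
      The arithmetic behind this argument can be made explicit. The back-of-the-envelope sketch below uses only the figures quoted above (116 units per probe, an implied density of 42,000 neurons/mm³, a known density of ~90,000 neurons/mm³, and a V1 volume of roughly 3 mm³); the variable and function names are illustrative, and every derived number is approximate.

```python
# Figures quoted in the text; all derived quantities are rough estimates.
UNITS_PER_PROBE = 116      # regular-spiking units per probe, prior to QC filtering
IMPLIED_DENSITY = 42_000   # neurons/mm^3 implied by that yield
KNOWN_DENSITY = 90_000     # neurons/mm^3, excitatory cells in mouse visual cortex
V1_VOLUME_MM3 = 3.0        # approximate V1 volume assumed in the text


def sampled_volume_mm3(units, density):
    """Volume a probe effectively samples, given its yield and the implied density."""
    return units / density


volume = sampled_volume_mm3(UNITS_PER_PROBE, IMPLIED_DENSITY)
fraction_of_v1 = volume / V1_VOLUME_MM3
detected_fraction = IMPLIED_DENSITY / KNOWN_DENSITY

print(f"sampled volume per probe: {volume:.5f} mm^3")
print(f"fraction of V1 volume sampled: {fraction_of_v1:.3%}")
print(f"fraction of local excitatory cells detected: {detected_fraction:.0%}")
```

      Under these assumptions, a single probe samples on the order of 0.003 mm³, or roughly 0.1% of the V1 volume, and detects fewer than half of the excitatory cells within that volume.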

      Additionally, each cortical pyramidal neuron is understood to have ~10,000 presynaptic partners. It is highly unlikely that these connections are entirely pre-specified, perfectly replicated within the same animal, and identical across all members of a species. Further, there is enormous diversity in the activity properties of even neighboring cells of the same type. Consider pyramidal neurons in V1. Single-neuron firing rates are log-normally distributed, there are many combinations of tuning properties (i.e., direction, orientation) that must occupy each point in retinotopic space, and there is powerful experience-dependent change in the connectivity of these cells. We suggest that it is inconceivable that any two neurons, even within a small region of V1, have identical connectivity.

      Minor Comments:

      (1) Although the description of confusion matrices is good from a didactic perspective, some of this could be moved to methods to simplify the paper.

      We thank the reviewer for the suggestion. However, given the broad readership of eLife, we gently suggest that confusion matrices are not a trivial and universally appreciated plotting format. For the purpose of accessibility, a brief and didactic 2-sentence description will make the paper far more comprehensible to many readers at little cost to experts.

      (2) Figure 3A: It is concluded in their subsequent figure that the longer the measured amount of time, the better the decoding performance. Thus it makes sense why the average PSTHs do not show significant decoding of areas or structures

      That is a good observation. However, all features were calculated from the same duration of data, except in Figure 3B, where we tested the effect of duration. The averaged PSTH was calculated from the same length of data as the ISI distribution and binned to yield the same number of features as the ISI distribution (refer to Methods section). Therefore, we interpreted this as an indication of information degradation through averaging, rather than an effect of data length (Line 234 - 237).

      (3) Figure 3D: A Gaussian is used to fit the ISI distributions here but ISI distributions do not follow a normal distribution, they follow an inverse gamma distribution.

      We agree with the reviewer, and we are familiar with the literature showing that the ISI distribution is best fitted by a gamma-family distribution (a recent, though not the earliest, example: Li et al. 2018). However, we did not fit a Gaussian (or any other distribution) to the data; we simply calculated the sample mean and variance. Reporting the sample mean and variance (or standard deviation) is not restricted to Gaussian distributions: these are broadly used metrics that simply have additional intrinsic meaning in the Gaussian case. We used the schematic illustration in Fig 3D because mean and variance are much more familiar in a Gaussian context, but ultimately this does not affect our analyses in Fig 3E-F. Alternatively, the alpha and beta parameters of a gamma distribution could have been used, but they are familiar to a much smaller portion of neuroscientists.

      Li, M., Xie, K., Kuang, H., Liu, J., Wang, D., Fox, G. E., ... & Tsien, J. Z. (2018). Spike-timing pattern operates as gamma-distribution across cell types, regions and animal species and is essential for naturally-occurring cognitive states. bioRxiv, 145813. doi:10.1101/145813.
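
      To make the distinction above concrete, the following is a minimal sketch (not the analysis code used in the study) of how the sample mean and variance of inter-spike intervals can be computed without fitting any distribution, and how the same two numbers map onto gamma shape and rate parameters by moment matching (mean = alpha/beta, variance = alpha/beta^2). The function name `isi_summary` and the example spike times are hypothetical and chosen only for illustration.

```python
from statistics import fmean, variance


def isi_summary(spike_times):
    """Summarise inter-spike intervals (ISIs) from a list of spike times (seconds).

    Returns the sample mean and sample variance of the ISIs, plus the gamma
    shape (alpha) and rate (beta) obtained by moment matching. No distribution
    is fitted to the data; alpha and beta are just a re-expression of the
    first two sample moments.
    """
    ordered = sorted(spike_times)
    isis = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(isis) < 2:
        raise ValueError("need at least three spikes to estimate a variance")

    mean = fmean(isis)
    var = variance(isis)  # unbiased sample variance
    if var == 0:
        raise ValueError("ISIs are all identical; gamma parameters are undefined")

    alpha = mean ** 2 / var  # shape, from mean = alpha / beta
    beta = mean / var        # rate, from variance = alpha / beta**2
    return {"mean": mean, "variance": var, "alpha": alpha, "beta": beta}


if __name__ == "__main__":
    # Hypothetical spike train, for illustration only.
    print(isi_summary([0.0, 0.1, 0.3, 0.6, 1.0]))
```

      Either pair of numbers (mean and variance, or alpha and beta) carries the same information under this mapping, which is why the choice between them is one of familiarity rather than of modelling assumptions.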

      (4) Figure 3G: Something is wrong with this figure as each vertical bar is supposed to represent a drifting grating onset but yet, they are all at 5 hz despite the PSTH being purportedly shown at many different frequencies from 1 to 15 hz.

      We appreciate your attention to detail, but we are not representing the onset of individual drifting gratings in this figure. The vertical bars are meant only to mark the overall start and end of the drifting grating session. We did not intend them to signal the temporal frequency of the drifting gratings (or the spatial frequency, orientation, or contrast).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      The mechanism, as I understand it, is different from what the authors described before in the RNN with tonic gain changes. As uncertainty increases, the network enters a regime in which the two excitatory populations start to oscillate. My intuition is that this oscillation arises from the feedback loop created by the new gain control mechanism. If my intuition is correct, I think it would be worth to explain this mechanism in the paper more explicitly.

      While interesting, this intuition is not correct. The oscillations are generated by the interaction between excitatory and inhibitory nodes in the network and occur in the model even with stationary gain. All of the plots in figure 3 exploring the dynamical regime of the network at different input x gain combinations (i.e., where the oscillatory regime is characterised) are simulations run with stationary gain.

      To ensure that this intuition is more clearly presented in the manuscript, we have edited the description in the text.

      P. 12: “Because of the large size of the network, we could not solve for the fixed points or study their stability analytically. Instead, we opted for a numerical approach and characterised the dynamical regime (i.e. the location and existence of approximate fixed-point attractors) across all combinations of (static) gain and  visited by the network.”

      Reviewer #2 (Public review):

      - The demonstration of the causal role of gain modulation in perceptual switches is partial. This causality is clearly demonstrated in the simulation work with the RNN. However, it is not fully demonstrated in the pupil analysis and the fMRI analysis. One reason is that this work is correlative (which is already very informative). An analysis of the timing of the effect might have overcome this limitation. For example, in a previous study, the same group showed that fMRI activity in the LC region precedes changes in the energy landscape of fMRI dynamics, which is a step towards investigating causal links between gain modulation, changes in the energy landscape and perceptual switches.

      Thank you for the suggestion, which we considered in detail. Unfortunately, the temporal and spatial resolution of the fMRI data collected for this study precluded the analyses we have run in previous work. However, this is an important question for future work.

      - Some effects may reflect the expectation of a perceptual switch rather than the perceptual switch itself. To mitigate this risk, the design of the fMRI task included catch trials, in which no switch occurs, to reduce the expectation of a switch. The pupil study, however, did not include such catch trials.

      We agree that this is a limitation of the current study, which we previously highlighted in the methods section.

      - The paper uses RNN-based modelling to provide mechanistic insight into the role of gain modulation in perceptual switches. However, the RNN solves a task that differs markedly from that performed by human participants, which may limit the explanatory value of the model. The RNN is provided with two inputs characterising the sensory evidence supporting the first and last image category in the sequence (e.g. plane and shark). In contrast, observers in the task were naïve as to the identity of the last image at the beginning of the sequence. The brain first receives sensory evidence about the image category (e.g. plane) with which the sequence begins, which is very easy to recognise, then it sees a sequence of morphed images and has to discover what the final image category will be. To discover the final image category, the brain has to search a vast space of possible second images (it is a shark?, a frog?, a bird?, etc.), rather than comparing the likelihood of just two categories. This search process and the perceptual switch in the task appear to be mechanistically different from the competition between two inputs in the RNN.

      We appreciate the critical analysis of the experimental paradigm but disagree with the reviewer’s conclusions for two key reasons: 1) participants had prior exposure to the images, such that they could form an expectation about which stimulus category a particular image would transition into (i.e., the image could not switch into any possible category); and 2) even if the reviewer’s concern were well founded, models of K winner-take-all decision making are structured identically irrespective of whether there are 2 or K options; all that changes is the simulated reaction time, which depends linearly on K (for an example model see Hugh Wilson’s textbook Spikes, Decisions, and Actions, 1999, p. 89-91). For these reasons, we maintain that the RNN is a sensible representation of the behavioural task.
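
      The structural point can be illustrated with a minimal sketch of a winner-take-all rate network in which the same function handles 2 or K options. This is not the RNN from the manuscript, nor the model from Wilson's textbook: the rectified-linear activation, the parameter values, and the name `winner_take_all` are assumptions made purely for illustration, and the sketch makes no claim about how reaction time scales with K.

```python
def winner_take_all(inputs, inhibition=2.0, tau=10.0, dt=1.0, steps=2000):
    """Simulate K mutually inhibiting rate units with forward-Euler integration.

    Each unit i follows
        tau * dx_i/dt = -x_i + max(0, input_i - inhibition * sum_{j != i} x_j)
    The code is identical for any number of options; only len(inputs) changes.
    Returns (index of the winning unit, list of final activities).
    """
    activity = [0.0] * len(inputs)
    for _ in range(steps):
        total = sum(activity)
        # Rectified drive: external input minus inhibition from all other units.
        drive = [
            max(0.0, inp - inhibition * (total - act))
            for inp, act in zip(inputs, activity)
        ]
        activity = [
            act + (dt / tau) * (-act + d) for act, d in zip(activity, drive)
        ]
    winner = max(range(len(inputs)), key=activity.__getitem__)
    return winner, activity


if __name__ == "__main__":
    # Two options and five options run through exactly the same code path.
    print(winner_take_all([1.0, 0.8]))
    print(winner_take_all([0.5, 0.9, 0.7, 0.6, 0.4]))
```

      With inhibition stronger than the self-decay, the unit receiving the largest input suppresses the others regardless of how many competitors there are, which is the sense in which the 2-option and K-option cases share one architecture.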

      - Another aspect of the motivation for the RNN model remains unclear. The authors introduce dynamic gain modulation in the RNN, but it is not clear what the added value of dynamic gain modulation is. Both static (Fig. S1) and dynamic (Fig. 2F) gain modulation lead to the predicted effect: faster switching when the gain is larger.

      While we agree that the effect is observable with both static and dynamic gain, the dynamic approach has stronger construct validity, including a closer link with the observed pupil dynamics and with the rich literature on modelling the behavioural consequences of surprise/uncertainty. This led us to conclude that the dynamic approach was a better representation of our hypothesis.

      - Fig 1C: I don't see a "top grey bar" indicating significance.

      Thank you for catching this, the caption has been amended. The text was from an older version of the manuscript.

      - p. 10, reference to fig 3F seems incorrect: there is Fig 3F upper and Fig 3F lower, and nothing on Fig 3 and its legend mention the lesion of units

      This has been amended. We meant to refer to 2F.

      - In the response letter you mention a MATLAB tutorial, but I could not find it.

      This has been amended. The GitHub repository can be found at https://github.com/ShineLabUSYD/AmbiguousFigures

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript reports that expression of the E. coli operon topAI/yjhQ/yjhP is controlled by the translation status of a small open reading frame, that authors have discovered and named toiL, located in the leader region upstream of the operon. Authors propose the following model for topAI activation: Under normal conditions, toiL is translated but topAI is not expressed because of Rho-dependent transcription termination within the topAI ORF and because its ribosome binding site and start codon are trapped in an mRNA hairpin. Ribosome stalling at various codons of the toiL ORF, prompted in this work by some ribosome-targeting antibiotics, triggers an mRNA conformational switch which allows translation of topAI and, in addition, activation of the operon's transcription because presence of translating ribosomes at the topAI ORF blocks Rho from terminating transcription. The model is appealing and several of the experimental data mainly support it. However, it remains unanswered what is the true trigger of the translation arrest at toiL and what is the physiological role of the induced expression of the topAI/yjhQ/yjhP operon.

      Reviewer #2 (Public review):

      Summary:

      Baniulyte and Wade describe how translation of an 8-codon uORF denoted toiL upstream of the topAI-yjhQP operon is responsive to different ribosome-targeting antibiotics, consequently controlling translation of the TopAI toxin as well as Rho-dependent termination within the gene.

      Strengths:

      The authors used multiple different approaches such as a genetic screen to identify factors such as 23S rRNA mutations that affect topAI expression and ribosome profiling to examine the consequences of various antibiotics on toiL-mediated regulation.

      Weaknesses:

      Future experiments will be needed to better understand the physiological role of the toiL-mediated regulation and elucidate the mechanism of specific antibiotic sensing.

      The results are clearly described, and the revisions have helped to improve the presentation of the data.

      Reviewer #3 (Public review):

      In this revised manuscript, the authors provide convincing data to support an elegant model in which ribosome stalling by ToiL promotes downstream topAI translation and prevents premature Rho-dependent transcription termination. However, the physiological consequences of activating topAI-yjhQP expression upon exposure to various ribosome-targeting antibiotics remain unresolved. The authors have satisfactorily addressed all major concerns raised by the reviewers, particularly regarding the SHAPE-seq data. Overall, this study underscores the diversity of regulatory ribosome-stalling peptides in nature, highlighting ToiL's uniqueness in sensing multiple antibiotics and offering significant insights into bacterial gene regulation coordinated by transcription and translation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      - Showing the ribosome density profiles of topAI/yjhQP and toiL in control and tetracycline treated cells is necessary to support that ribosome arrest at toiL increases translation of topAI/yjhQP.

      Figure 7B shows ribosome density around the start of toiL. Ribosome density increases across topAI in the presence of tetracycline, but we have opted not to show this region because we cannot say whether the increase in ribosome occupancy (represented in Figure 7A) is due to an increase in translation efficiency, RNA level, or both.

      - The subinhibitory antibiotic concentrations used in the reporter assays were based on MICs reported in the literature. This is not appropriate since MICs can greatly vary between strains, antibiotic solution stocks, and experimental conditions.

      Reported MICs were used as an initial guide for selecting antibiotic concentrations to test in our reporter assays. We have added text to indicate this, and to highlight that MICs vary considerably between strains.

      - toiL sequence may have evolved to maintain base-pairing with the topAI upstream region rather than, as authors suggest in Discussion, to respond to antibiotic-mediated arrest in an amino acid sequence specific manner.

      We have chosen to frame this as speculation.

      - Authors may consider commenting on the possibility that chloramphenicol does not induce because ToiL lacks alanine residues, whose presence at specific places of a nascent protein have been shown to promote chloramphenicol action (2016 PNAS 113:12150; 2022 NSMB 29:152).

      This is a great point as none of our stalling reporters included an ORF with alanine. We now include a short paragraph in the Discussion section to raise this possibility.

      - Tetracycline was added at the "subinhibitory concentration" of 8 ug/mL for the reporter assays but at 1 ug/mL for the ribosome profiling experiments. Authors should explain what the rationale for this was.

      We think the reviewer is mixing up the epidemiological cut-off value of 8 ug/mL with the concentration used in experiments (0.5-1 ug/mL for reporter assays and ribosome profiling). The text was confusing, so we have added a sentence to the Methods section to indicate that epidemiological cut-off values and MICs were only a guide for selecting antibiotic concentrations to test.

      Reviewer #2 (Recommendations for the authors):

      I wish the authors had been slightly less dismissive of the reviewers' comments. At a minimum, it would be nice if the authors could be consistent about the ribosome representation throughout the manuscript;

      We apologize if our previous responses gave the impression of being dismissive. That was certainly not our intention. We greatly value the reviewers' feedback, and we appreciate the opportunity to clarify any misunderstandings. We believe the reviewer is referring to the different shape and color of the ribosome in Figures 8 and 9, and Figure 8 figure supplement 2, which we have now corrected.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      This important and creative study finds that the uplift of the Qinghai-Tibet Plateau, via its resultant monsoon system rather than solely its high elevation, has shifted avian migratory directions from a latitudinal to a longitudinal orientation. However, the main claims are incomplete and only partially supported, as the reliance on eBird data, which lacks the resolution to capture population-specific teleconnections, combined with a limited tracking dataset covering only seven species, leaves key aspects of the argument underdetermined, and the critical assumption of niche conservatism is not sufficiently foregrounded in the manuscript. More clearly communicating these limitations would significantly enhance the interpretability of the results, ensuring that the major conclusions are presented in the context of these essential caveats.

      We appreciate your positive comments and constructive suggestions. We fully acknowledge your concerns about clearly communicating the limitations associated with the data used and analytical assumptions. We will try to get more satellite tracking data of birds migrating across the plateau. We will carefully consider the insights that our paper can deliver and make sure the limitations of our datasets and the critical assumption of niche conservatism are clearly presented. By explicitly clarifying these caveats, we believe the transparency and interpretability of the findings will be much improved.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors have done a good job of responding to the reviewer's comments, and the paper is now much improved.

      Again, we thank the reviewer for constructive comments during review.

      Reviewer #2 (Public review):

      I would like to thank the authors for the revision and the input they invested in this study.

      We are grateful for your thoughtful feedback and enthusiasm, which will help us improve our manuscript.

      With the revised text of the study, my earlier criticism holds, and your arguments about the counterfactual approach are irrelevant to that. The recent rise of the counterfactual approach might likely mirror the fact that there are too many scientists behind their computers, and few go into the field to collect in situ data. Studies like the one presented here are a good intellectual exercise but the real impact is questionable.

      We understand your question about the relevance of the counterfactual approach used in our study. Our intent in using a counterfactual scenario (reconstructing migration patterns assuming pre-uplift conditions on the QTP) was to isolate the potential influence of the plateau’s geological history on current migration routes. We agree that such an approach must be used properly. In the revision, we will explicitly clarify why this counterfactual comparison is useful – namely, it provides a theoretical baseline to test how much the QTP’s uplift (and the associated monsoon system) might have redirected migration paths. We acknowledge that the counterfactual results are theoretical and will explicitly emphasise the assumptions involved (e.g. that species–environment relationships hold between pre- and post-uplift environments) in the main text. Nonetheless, we defend the approach as a valuable study design: it helps generate testable hypotheses about migration (for instance, that the plateau’s monsoon-driven climate, rather than just its elevation, introduces an east–west shift en route). We will also tone down the language around this analysis to avoid overstating its real-world relevance. In summary, we will clarify that the counterfactual analysis is meant to complement, not replace, empirical observations, and we will discuss its limitations so that its role is appropriately bounded in the paper.

      All your main conclusions are inferred from published studies on 7! bird species. In addition, spatial sampling in those seven species was not ideal in relation to your target questions. Thus, no matter how fancy your findings look, the basic fact remains that your input data were for 7 bird species only! Your conclusion, “our study provides a novel understanding of how QTP shapes migration patterns of birds” is simply overstretching.

      Thank you for your comments. We apologise for any confusion regarding the scope of our dataset. Our main conclusions are not solely derived from seven bird species. Rather, we integrated a full list of 50 bird species that migrate across the QTP and analysed their migratory patterns with eBird data. We studied the factors influencing their choice of migratory routes using seven species that were among the few with available tracking data across the QTP. In this revision, we will clarify the role of these seven species and the rationale for their selection. Additionally, we will attempt to include more satellite tracking data to improve spatial coverage, as recommended by the reviewer and editor. Based on discussions with potential collaborators, we hope to include at least 10 more species with available tracking data.

      The way you respond to my criticism on L 81-93 is something different than what you admit in the rebuttal letter. The text of the ms is silent about the drawbacks and instead highlights your perspective. I understand you; you are trying to sell the story in a nice wrapper. In the rebuttal you state: “we assume species' responses to environments are conservative and their evolution should not discount our findings.” But I do not see that clearly stated in the main text.

      Thanks, as suggested we will clearly state the assumptions of niche conservatism in the Introduction.

      In your rebuttal, you respond to my criticism of "No matter how good the data eBird provides is, you do not know population-specific connections between wintering and breeding sites" when you responded: ... "we can track the movement of species every week, and capture the breeding and wintering areas for specific populations" I am having a feeling that you either play with words with me or do not understand that from eBird data nobody will be ever able to estimate population-specific teleconnections between breeding and wintering areas. It is simply impossible as you do not track individuals. eBird gives you a global picture per species but not for particular populations. You cannot resolve this critical drawback of your study.

      We agree that inferring population-specific migratory connections (teleconnections) from eBird data is challenging and inherently limited. eBird provides occurrence records for species, but it generally cannot distinguish which breeding population an individual bird came from or exactly where it goes for winter. However, in this study we intend to infer broad-scale movement patterns (e.g. general directions and stopover regions) rather than precise one-to-one population linkages. In the revision, we will carefully rephrase those sections to make clear that our inferences are at the species level and at large spatial scales. We will also explicitly state in the Discussion that confirming population connectivity would require targeted tracking or genetic studies, and that our eBird-based analysis can only suggest plausible routes and region-to-region linkages. We will contrast migratory routes identified by using eBird data and satellite tracking for the same species to check their similarity. We argue that, even with its limits, the eBird dataset can still yield useful insights (such as identifying major flyway corridors over the QTP).

      I am sorry that you invested so much energy into this study, but I see it as a very limited contribution to understanding the role of a major barrier in shaping migration.

      Thank you for recognising our efforts in the study. By integrating both satellite tracking and community-contributed data, we explored how the uplift of the QTP could shape avian migration across the area. We believe our findings provide important insights into how birds balance their responses to large-scale climate change and geological barriers, yielding the most comprehensive picture to date of how the QTP uplift shapes the migratory patterns of birds. We will also acknowledge the study’s limitations to ensure that readers understand the context and constraints of our findings.

      My modest suggestion for you is: go into the field. Ideally use bird radars along the plateau to document whether the birds shift the directions when facing the barrier.

      We appreciate your suggestions to incorporate field tracking or radar studies to strengthen our results. All coauthors have years of field experience, including on the QTP and in the Arctic. For example, the tracking data of peregrine falcons (Falco peregrinus) that we will incorporate in the revision were collected during our own fieldwork in the Arctic over more than six years. We agree that more direct tracking (through GPS tagging or radar) would be an ideal way to validate migration pathways and population connectivity. In this revision, as stated above, we will try to include more species with satellite tracking data. We will also note that future studies should build on our findings by using dedicated tracking of more individual birds and radar monitoring of migration over the QTP. We will cite recent advances in these techniques and suggest that incorporating more tracking data could further test the hypotheses generated by our analyses.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      L55 "an important animal movement behaviour is.." Is there any unimportant animal movement? I mean this sentence is floppy, empty.

      We will rewrite this sentence to remove any ambiguous phrasing.

      L 152-154 This sentence is full of nonsense or your misinterpretation. First of all, the issue of inflexible initiation of migration was related to long-distance migrants only! The way you present it mixes apples and oranges (long- and short-distance migrants). It is not "owing to insufficient responses" but due to inherited patterns of when to take off, photoperiod and local conditions.

      We will remove the sentence to avoid misinterpretation.

      L 158 what is a migration circle? I do not know such a term.

      We will amend it as “annual migration cycle”, which is a more common way to describe the yearly round-trip journey between breeding and wintering grounds of birds.

      L 193 The way you present and mix capital and income breeding theory with your simulation study is quite tricky and super speculative.

      We will present this idea as an inference rather than a conclusion: “This pattern could be consistent with a ‘capital breeding’ strategy — where birds rely on energy reserves acquired before breeding — rather than an ‘income’ strategy that depends on food acquired during breeding. However, we note that this interpretation would require further study.” By adding this caution, we will make it clear that we are not asserting this link as proven fact, only suggesting it as one possible explanation. We will also double-check that the rest of the discussion around this point is framed appropriately.


      The following is the authors’ response to the previous reviews

      eLife Assessment

      This study addresses a novel and interesting question about how the rise of the Qinghai-Tibet Plateau influenced patterns of bird migration, employing a multi-faceted approach that combines species distribution data with environmental modeling. The findings are valuable for understanding avian migration within a subfield, but the strength of evidence is incomplete due to critical methodological assumptions about historical species-environment correlations, limited tracking data, and insufficient clarity in species selection criteria. Addressing these weaknesses would significantly enhance the reliability and interpretability of the results.

      We would like to thank you and the two anonymous reviewers for your careful, thoughtful, and constructive feedback on our manuscript. These reviews made us revisit many of our assumptions, and we believe the paper is much improved as a result. In addition to minor points, we have made three main changes to our manuscript in response to the reviews. First, we addressed the concerns about the assumptions of historical species-environment correlations from the perspectives of both theoretical and empirical evidence. Second, we discussed the benefits and limitations of using tracking data in our study and demonstrated how the findings of our study are consolidated by the results of previous studies. Third, we clarified our criteria for selecting species in terms of both eBird and tracking data.

      Below, we respond to each comment in turn. Once again, we thank you all for your feedback.

      Public Reviews:

      Reviewer #1 (Public review):

      Strengths:

      This is an interesting topic and a novel theme. The visualisations and presentation are to a very high standard. The Introduction is very well-written and introduces the main concepts well, with a clear logical structure and good use of the literature. The methods are detailed and well described and written in such a fashion that they are transparent and repeatable.

      We are appreciative of the reviewer’s careful reading of our manuscript, encouraging comments and constructive suggestions.

      Weaknesses:

      I only have one major issue, which is possibly a product of the structure requirements of the paper/journal. This relates to the Results and Discussion, line 91 onwards. I understand the structure of the paper necessitates delving immediately into the results, but it is quite hard to follow due to a lack of background information. In comparison to the Methods, which are incredibly detailed, the Results in the main section reads as quite superficial. They provide broad overviews of broad findings but I found it very hard to actually get a picture of the main results in its current form. For example, how the different species factor in, etc.

      Yes, the journal requires this format (Methods following the Results and Discussion) for the short report article type. As suggested, in the revision we have elaborated on the details of our findings, in terms of (i) shifts in the distribution of avian breeding and wintering areas under the influence of the uplift of the Qinghai-Tibet Plateau (Lines 102-116), and (ii) the major factors that shape current migration patterns of birds in the plateau (Lines 118-138). We have also better referenced the approaches we used in the study.

      Reviewer #2 (Public review):

      Summary:

      The study tries to assess how the rise of the Qinghai-Tibet Plateau affected patterns of bird migration between their breeding and wintering sites. They do so by correlating the present distribution of the species with a set of environmental variables. The data on species distributions come from eBird. The main issue lies in the problematic assumption that species correlations between their current distribution and environment were about the same before the rise of the Plateau. There is no ground truthing and the study relies on Movebank data of only 7 species which are not even listed in the study. Similarly, the study does not outline the boundaries of breeding sites NE of the Plateau. Thus it is absolutely unclear potentially which breeding populations it covers.

      We are very grateful for the careful review and helpful suggestions. We have revised the manuscript carefully in response to the reviewer’s comments and believe that it is much improved as a result. Below are our point-by-point replies to the comments.

      Strengths:

      I like the approach for how you combined various environmental datasets for the modelling part.

      We appreciate the reviewer’s encouragement.

      Weaknesses:

      The major weakness of the study lies in the assumption that species correlations between their current distribution and environments found today are back-projected to the far past before the rise of the Q-T Plateau. This would mean that species responses to the environmental cues do not evolve which is clearly not true. Thus, your study is a very nice intellectual exercise of too many ifs.

      This is a valid concern. We have addressed this from both the perspectives of the theoretical design of our study and empirical evidence.

      First, we agree with the reviewer that species responses to environmental cues might vary over time. Nonetheless, the simulated environments before the uplift of the plateau serve as a counterfactual state in our study. The counterfactual is an important concept for supporting causal claims by comparing what happened with what would have happened in a hypothetical situation: “If event X had not occurred, event Y would not have occurred” (Lewis 1973). Recent years have seen an increasing application of the counterfactual approach to detect biodiversity change, i.e., comparing diversity between the counterfactual state and real estimates to attribute the factors causing such changes (e.g., Gonzalez et al. 2023). Whilst we do not aim to provide causal inferences for avian distributional change, the counterfactual approach allows us to estimate the influence of the plateau uplift by detecting changes in avian distributions, i.e., by comparing where the birds would have been distributed without the plateau with where they are currently distributed. We regard the counterfactual environments as a powerful tool for eliminating vagueness to the extent possible, as opposed to simply describing the current distributions of birds. Therefore, we assume species’ responses to environments are conservative and their evolution should not discount our findings. We have clarified this in the Introduction (Lines 81-93).

      Second, we used species distribution modelling to contrast the distributions of birds before and after the uplift of the plateau under the assumption that species tend to keep their ancestral ecological traits over time (i.e., niche conservatism). This indicates a high probability for species to distribute in similar environments wherever suitable. Particularly, considering bird distributions are more likely to be influenced by food resources and vegetation distributions (Qu et al. 2010, Li et al. 2021, Martins et al. 2024), and the available food and vegetation before the uplift can provide suitable habitats for birds (Jia et al. 2020), we believe the findings can provide valuable insights into the influence of the plateau rise on avian migratory patterns. Having said that, we acknowledge other factors, e.g., carbon dioxide concentrations (Zhang et al. 2022), can influence the simulations of environments and our prediction of avian distribution. We have clarified the assumptions and evidence we have for the modelling in Methods (Lines 362-370).

      The second major drawback lies in the way you estimate the migratory routes of particular birds. No matter how good the data eBird provides is, you do not know population-specific connections between wintering and breeding sites. Some might overwinter in India, some populations in Africa and you will never know the teleconnections between breeding and wintering sites of particular species. The few available tracking studies (seven!) are too coarse and with limited aspects of migratory connectivity to give answer on the target questions of your study.

      We agree with the reviewer that establishing interconnections for birds is important for estimating the migration patterns of birds. We employed a dynamic model to assess their weekly distributions. Thus, we can track the movement of species every week, and capture the breeding and wintering areas for specific populations. That being said, we acknowledge that our approach is subject to the patchy sampling of eBird data. In contrast, tracking data can provide detailed information on the movement patterns of species but are limited to small numbers of species due to the considerable costs and time needed. We aimed to use tracking data to examine the influence of focal factors on avian migration patterns, but, despite our best efforts, we could acquire data for only seven species. Moreover, similar results were found in studies that used tracking data to estimate the distribution of breeding and wintering areas of birds in the plateau (e.g., Prosser et al. 2011, Zhang et al. 2011, Zhang et al. 2014, Liu et al. 2018, Kumar et al. 2020, Wang et al. 2020, Pu and Guo 2023, Yu et al. 2024, Zhao et al. 2024). We believe the conclusions based on seven species are rigorous, but their implications could be restricted by the number of tracked species we obtained. We have better demonstrated how our findings on breeding and wintering areas of birds are reinforced by other studies reporting the locations of those areas. We have also added a separate caveat section to discuss the limitations stated above (Lines 202-215).

      Your set of species is unclear, selection criteria for the 50 species are unknown and variability in their migratory strategies is likely to affect the direction of the effects.

      In this revision, we have clarified the selection criteria for the 50 species and outlined the boundaries of the breeding areas of all birds (Lines 243-249). Briefly, we first obtained a full list of birds in the plateau from Prins and Namgail (2017). We then extracted species identified as full migrants in Birdlife International (https://datazone.birdlife.org/species/spcdistPOS) from the full list. Migratory birds may follow a capital or income migratory strategy depending on how much they rely on endogenous energy reserves gained prior to reproduction. We have added discussions on how these migratory strategies might influence the effects of environment on migratory direction (Lines 183-200).

      In addition, the position of the breeding sites relative to the Q-T plate will affect the azimuths and resulting migratory flyways. So in fact, we have no idea what your estimates mean in Figure 2.

      We calculated the azimuths not only from the angles between breeding sites and wintering sites but also from the angles between the stopovers of birds. Therefore, the azimuths are influenced by the relative positions of breeding, wintering and stopover sites. This minimizes the errors that could arise from using breeding areas alone, such as the biases caused by the locations of breeding areas relative to the QTP, as the reviewer pointed out. We have better explained this in the Introduction, the Methods and the legend of Figure 2.
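
      The azimuth calculation described above can be sketched as follows. This is a minimal illustration, assuming great-circle initial bearings between consecutive sites; the function names and the example coordinates are hypothetical and are not taken from the manuscript, whose exact formula may differ.

```python
import math


def initial_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2.

    Inputs are decimal degrees; the result is in degrees clockwise
    from north, in the range [0, 360).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    east = math.sin(dlon) * math.cos(phi2)
    north = (math.cos(phi1) * math.sin(phi2)
             - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(east, north)) % 360.0


def route_azimuths(sites):
    """Azimuths between each pair of adjacent sites on an ordered route.

    `sites` is a list of (lat, lon) tuples ordered from breeding area,
    through stopovers, to wintering area (an assumed input layout).
    """
    return [initial_bearing(*a, *b) for a, b in zip(sites, sites[1:])]


# Hypothetical route: breeding site, one stopover, wintering site.
example_route = [(40.0, 90.0), (30.0, 90.0), (20.0, 90.0)]
example_azimuths = route_azimuths(example_route)
```

      Because every adjacent pair of sites contributes one azimuth, a route with stopovers yields several directions per species rather than a single breeding-to-wintering angle.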

      There is no way one can assess the performance of your statistical exercises, e.g. performances of the models.

      As suggested, we now report the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) to assess the performance of the models (Table S1). AUC is a threshold-independent measure of the ability to discriminate between presence and random points (Phillips et al. 2006). A model was considered good when its AUC value was higher than 0.75 (Elith et al. 2006) (Lines 379-383).
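
      To illustrate why AUC is threshold-independent, the sketch below computes it directly as the probability that a randomly chosen presence point receives a higher suitability score than a randomly chosen random (background) point, with ties counted as one half. The function name and the scores are hypothetical; only the 0.75 cut-off comes from the text above, and this is not the code used in the study.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: P(presence score > background score), ties = 0.5."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical suitability scores predicted by a distribution model.
presence = [0.9, 0.8, 0.4]
background = [0.5, 0.3, 0.2]

model_auc = auc(presence, background)
is_good = model_auc > 0.75  # cut-off quoted in the response above
```

      No classification threshold appears anywhere in the calculation: only the relative ordering of the scores matters.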

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This is an interesting topic and a novel theme. The visualisations and presentation are to a very high standard. The Introduction is very well-written and introduces the main concepts well, with a clear logical structure and good use of the literature. The Methods are detailed and well described and written in such a fashion that they are transparent and repeatable.

      I only have one major issue, which is possibly a product of the structure requirements of the paper/journal. With the Results and Discussion, line 91 onwards. I understand the structure of the paper necessitates delving immediately into the results, but it is quite hard to follow due to a lack of background information. In comparison to the Methods, which are incredibly detailed, the Results in the main section read quite superficial. They provide broad overviews of broad findings but I found it very hard to actually get a picture of the main results in its current form. For example, how the different species factor in, etc.

      Please see our responses above.

      Reviewer #2 (Recommendations for the authors):

      Methodological issues:

      Line 219 Why have you selected only 64 species and what were the selection criteria?

      We have clarified the selection criteria (Lines 243-248). Briefly, we first obtained a full list of birds in the plateau from Prins and Namgail (2017). We then extracted species identified as full migrants in Birdlife International (https://datazone.birdlife.org/species/spcdistPOS) from the full list.

      Minor:

      Line 219 eBird has very uneven distribution, especially in vast areas of Russia. How can your exercise on Lines 232-238 overcome this issue?

      Yes, eBird data can be biased due to patchy sampling and variation in observers’ skills in identifying species. To address this issue, we have developed an adaptive spatio-temporal modelling framework (stemflow; Chen et al. 2024) to correct the imbalanced distribution of data, and we modelled observer experience to address the bias in recognising species. stemflow was developed based on a machine learning modelling framework (AdaSTEM) which leverages the spatio-temporal adjacency information of sample points to model occurrence or abundance of species at different scales. It has been frequently used in modelling eBird data (Fink et al. 2013, Johnston et al. 2015, Fink et al. 2020) and has been proven to be efficient and advanced in multi-scale spatiotemporal data modelling. We have better explained this (Lines 251-270; Lines 307-321).

      Line 54 This sentence sounds very empty and in fact does not tell us much.

      We have adjusted this sentence to “Animal movement underpins species’ spatial distributions and ecosystem processes”.

      Line 55 Again a sentence that implies a causality of the annual cycle to make the species migrate. It does not make sense.

      We have revised this sentence as “An important animal movement behaviour is migrating between breeding and wintering grounds”.

      Line 58 How is our fascination with migratory journeys related to the present article? I think this line is empty.

      We have changed this sentence to “Those migratory journeys have intrigued a body of different approaches and indicators to describe and model migration, including migratory direction, speed, timing, distance, and staging periods”.

      Figure 1 - ABC insets are OK, but a combination of lati- and longitudinal patterns is possible, e.g. in species with conservative strategies or for whatever other reason.

      Thank you for the suggestion. We kept the ABC insets rather than combining them together as we believe this can deliver a clear structure of influence of QTP uplift under different scenarios.

      The legend to Figure 2 is not self-explanatory. Please make it clear what the response variable is and its units. The first line of the legend should read something like The influence of environmental factors on the direction of avian migration.

      Thank you. We have amended the legends of Figure 2 as suggested:

      “Figure 2. The influence of environmental factors on the direction of avian migration. Migratory directions are calculated based on the azimuths between each adjacent stopover, breeding and wintering areas for each species. We employ multivariate linear regression models under the Bayesian framework to measure the correlation between environmental factors and avian migratory directions. Wind represents the wind cost calculated by wind connectivity. Vegetation is measured by the proportion of average vegetation cover in each pixel (~1.9° in latitude by 2.5° in longitude). Temperature is the average annual temperature. Precipitation is the average yearly precipitation. All environmental layers are obtained using the Community Earth System Model. West QTP, central QTP, and East QTP denote the areas west (longitude < 73°E), central (73°E ≤ longitude < 105°E), and east of (longitude ≥ 105°E) the Qinghai-Tibet Plateau, respectively.”

      References

      Chen, Y., Z. Gu, and X. Zhan. 2024. stemflow: A Python Package for Adaptive Spatio-Temporal Exploratory Model. Journal of Open Source Software 9:6158.

      Elith, J., C. H. Graham, R. P. Anderson, M. Dudík, S. Ferrier, A. Guisan, R. J. Hijmans, F. Huettmann, J. R. Leathwick, A. Lehmann, J. Li, L. G. Lohmann, B. A. Loiselle, G. Manion, C. Moritz, M. Nakamura, Y. Nakazawa, J. McC. M. Overton, A. Townsend Peterson, S. J. Phillips, K. Richardson, R. Scachetti-Pereira, R. E. Schapire, J. Soberón, S. Williams, M. S. Wisz, and N. E. Zimmermann. 2006. Novel methods improve prediction of species' distributions from occurrence data. Ecography 29:129-151.

      Fink, D., T. Auer, A. Johnston, V. Ruiz-Gutierrez, W. M. Hochachka, and S. Kelling. 2020. Modeling avian full annual cycle distribution and population trends with citizen science data. Ecological Applications 30:e02056.

      Fink, D., T. Damoulas, and J. Dave. 2013. Adaptive Spatio-Temporal Exploratory Models: Hemisphere-wide species distributions from massively crowdsourced eBird data. Pages 1284-1290 in Proceedings of the AAAI Conference on Artificial Intelligence.

      Gonzalez, A., J. M. Chase, and M. I. O'Connor. 2023. A framework for the detection and attribution of biodiversity change. Philosophical Transactions of the Royal Society B: Biological Sciences 378.

      Jia, Y., H. Wu, S. Zhu, Q. Li, C. Zhang, Y. Yu, and A. Sun. 2020. Cenozoic aridification in Northwest China evidenced by paleovegetation evolution. Palaeogeography, Palaeoclimatology, Palaeoecology 557:109907.

      Johnston, A., D. Fink, M. D. Reynolds, W. M. Hochachka, B. L. Sullivan, N. E. Bruns, E. Hallstein, M. S. Merrifield, S. Matsumoto, and S. Kelling. 2015. Abundance models improve spatial and temporal prioritization of conservation resources. Ecological Applications 25:1749-1756.

      Kumar, N., U. Gupta, Y. V. Jhala, Q. Qureshi, A. G. Gosler, and F. Sergio. 2020. GPS-telemetry unveils the regular high-elevation crossing of the Himalayas by a migratory raptor: implications for definition of a “Central Asian Flyway”. Scientific Reports 10:15988.

      Lewis, D. 1973. Counterfactuals. Oxford: Blackwell.

      Li, S.-F., P. J. Valdes, A. Farnsworth, T. Davies-Barnard, T. Su, D. J. Lunt, R. A. Spicer, J. Liu, W.-Y.-D. Deng, J. Huang, H. Tang, A. Ridgwell, L.-L. Chen, and Z.-K. Zhou. 2021. Orographic evolution of northern Tibet shaped vegetation and plant diversity in eastern Asia. Science Advances 7:eabc7741.

      Liu, D., G. Zhang, H. Jiang, and J. Lu. 2018. Detours in long-distance migration across the Qinghai-Tibetan Plateau: individual consistency and habitat associations. PeerJ 6:e4304.

      Martins, L. P., D. B. Stouffer, P. G. Blendinger, K. Böhning-Gaese, J. M. Costa, D. M. Dehling, C. I. Donatti, C. Emer, M. Galetti, R. Heleno, Í. Menezes, J. C. Morante-Filho, M. C. Muñoz, E. L. Neuschulz, M. A. Pizo, M. Quitián, R. A. Ruggera, F. Saavedra, V. Santillán, M. Schleuning, L. P. da Silva, F. Ribeiro da Silva, J. A. Tobias, A. Traveset, M. G. R. Vollstädt, and J. M. Tylianakis. 2024. Birds optimize fruit size consumed near their geographic range limits. Science 385:331-336.

      Phillips, S. J., R. P. Anderson, and R. E. Schapire. 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190:231-259.

      Prins, H. H. T., and T. Namgail. 2017. Bird migration across the Himalayas: wetland functioning amidst mountains and glaciers. Cambridge University Press, Cambridge.

      Prosser, D. J., P. Cui, J. Y. Takekawa, M. Tang, Y. Hou, B. M. Collins, B. Yan, N. J. Hill, T. Li, Y. Li, F. Lei, S. Guo, Z. Xing, Y. He, Y. Zhou, D. C. Douglas, W. M. Perry, and S. H. Newman. 2011. Wild Bird Migration across the Qinghai-Tibetan Plateau: A Transmission Route for Highly Pathogenic H5N1. Plos One 6:e17622.

      Pu, Z., and Y. Guo. 2023. Autumn migration of black-necked crane (Grus nigricollis) on the Qinghai-Tibetan and Yunnan-Guizhou plateaus. Ecology and Evolution 13:e10492.

      Qu, Y., F. Lei, R. Zhang, and X. Lu. 2010. Comparative phylogeography of five avian species: implications for Pleistocene evolutionary history in the Qinghai-Tibetan plateau. Molecular Ecology 19:338-351.

      Wang, Y., C. Mi, and Y. Guo. 2020. Satellite tracking reveals a new migration route of black-necked cranes (Grus nigricollis) in Qinghai-Tibet Plateau. PeerJ 8:e9715.

      Yu, X., G. Song, H. Wang, Q. Wei, C. Jia, and F. Lei. 2024. Migratory flyways and connectivity of Brown Headed Gulls (Chroicocephalus brunnicephalus) revealed by GPS tracking. Global Ecology and Conservation 56:e03340.

      Zhang, G.-G., D.-P. Liu, Y.-Q. Hou, H.-X. Jiang, M. Dai, F.-W. Qian, J. Lu, T. Ma, L.-X. Chen, and Z. Xing. 2014. Migration routes and stopover sites of Pallas’s Gulls Larus ichthyaetus breeding at Qinghai Lake, China, determined by satellite tracking. Forktail 30:104-108.

      Zhang, G.-G., D.-P. Liu, Y.-Q. Hou, H.-X. Jiang, M. Dai, F.-W. Qian, J. Lu, Z. Xing, and F.-S. Li. 2011. Migration Routes and Stop-Over Sites Determined with Satellite Tracking of Bar-Headed Geese (Anser indicus) Breeding at Qinghai Lake, China. Waterbirds 34:112-116.

      Zhang, R., D. Jiang, C. Zhang, and Z. Zhang. 2022. Distinct effects of Tibetan Plateau growth and global cooling on the eastern and central Asian climates during the Cenozoic. Global and Planetary Change 218:103969.

      Zhao, T., W. Heim, R. Nussbaumer, M. van Toor, G. Zhang, A. Andersson, J. Bäckman, Z. Liu, G. Song, M. Hellström, J. Roved, Y. Liu, S. Bensch, B. Wertheim, F. Lei, and B. Helm. 2024. Seasonal migration patterns of Siberian Rubythroat (Calliope calliope) facing the Qinghai–Tibet Plateau. Movement Ecology 12:54.

    1. Author response:

      We thank the reviewers for their thoughtful and generous assessment of our work. Overall, the reviewers found our work to be novel and relevant. In particular, Reviewer #1 found that our manuscript “is timely and highly valuable for the telomere field”; Reviewer #2 stated, “Overall, I find this manuscript worthy of publication, as the optimized END-seq methods described here will likely be widely utilized in the telomere field”; and Reviewer #3 stated that “The study is original, the experiments were well-controlled and excellently executed.”

      We are extremely grateful for these comments and want to thank all the reviewers and the editors for their time and effort in reviewing our work.

      The reviewers had a number of suggestions to improve our work. We have addressed all the points as highlighted in the point-by-point responses below.

      Reviewer 1:

      One minor question would be whether the authors could expand more on the application of END-Seq to examine the processive steps of the ALT mechanism? Can they speculate if the ssDNA detected in ALT cells might be an intermediate generated during BIR (i.e., is the ssDNA displaced strand during BIR) or a lesion? Furthermore, have the authors assessed whether ssDNA lesions are due to the loss of ATRX or DAXX, either of which can be mutated in the ALT setting?

      We appreciate the reviewer’s insightful questions regarding the application of our assays to investigate the nature of the ssDNA detected in ALT telomeres. Our primary aim in this study was to establish the utility of END-seq and S1-END-seq in telomere biology and to demonstrate their applicability across both ALT-positive and -negative contexts. We agree that exploring the mechanistic origins of ssDNA would be highly informative, and we anticipate that END-seq–based approaches will be well suited for such future studies. However, it remains unclear whether the resolution of S1-END-seq is sufficient to capture transient intermediates such as those generated during BIR. We have now included a brief speculative statement in the revised discussion addressing the potential nature of ssDNA at telomeres in ALT cells.

      Reviewer #2:

      How can we be sure that all telomeres are equally represented? The authors seem to assume that END-seq captures all chromosome ends equally, but can we be certain of this? While I do not see an obvious way to resolve this experimentally, I recommend discussing this potential bias more extensively in the manuscript.

      We thank the reviewer for raising this important point. END-seq and S1-END-seq are unbiased methods designed to capture either double-stranded or single-stranded DNA that can be converted into blunt-ended double-stranded DNA and ligated to a capture oligo. As such, if a subset of telomeres cannot be processed using this approach, it is possible that these telomeres may be underrepresented or lost. However, to our knowledge, there are no proposed telomeric structures that would prevent capture using this method. For example, even if a subset of telomeres possesses a 5′ overhang, it would still be captured by END-seq. Indeed, we observed the consistent presence of the 5′-ATC motif across multiple cell lines and species (human, mouse, and dog). More importantly, we detected predictable and significant changes in sequence composition when telomere ends were experimentally altered, either in vivo (via POT1 depletion) or in vitro (via T7 exonuclease treatment). Together, these findings support the robustness of the method in capturing a representative and dynamic view of telomeres across different systems.

      That said, we have now included a brief statement in the revised discussion acknowledging that we cannot fully exclude the possibility that a subset of telomeres may be missed due to unusual or uncharacterized structures.

      I believe Figures 1 and 2 should be merged.

      We appreciate the reviewer’s suggestion to merge Figures 1 and 2. However, we feel that keeping them as separate figures better preserves the logical flow of the manuscript and allows the validation of END-seq and its application to be presented with appropriate clarity and focus. We hope the reviewer agrees that this layout enhances the clarity and interpretability of the data.

      Scale bars should be added to all microscopy figures.

      We thank the reviewer for pointing this out. We have now added scale bars to all the microscopy panels in the figures and included the scale details in the figure legends.

      Reviewer #3:

      Overall, the discussion section is lacking depth and should be expanded and a few additional experiments should be performed to clarify the results.

      We thank the reviewer for the suggestions. Based on this reviewer’s comments and comments from the other reviewers, we incorporated several points into the discussion. We hope that these additions give more depth to our conclusions.

      (1) The finding that the abundance of variant telomeric repeats (VTRs) within the final 30 nucleotides of the telomeric 5' ends is similar in both telomerase-expressing and ALT cells is intriguing, but the authors do not address this result. Could the authors provide more insight into this observation and suggest potential explanations? As the frequency of VTRs does not seem to be upregulated in POT1-depleted cells, what then drives the appearance of VTRs on the C-strand at the very end of telomeres? Is CST-Pola complex responsible?

      The reviewer raises a very interesting and relevant point. We are hesitant at this point to speculate on why we do not see a difference in variant repeats in ALT versus non-ALT cells, since additional data would be needed. One possibility is that variant repeats in ALT cells accumulate stochastically within telomeres but are selected against when they are present at the terminal portion of chromosome ends. However, to prove this hypothesis, we would need error-free long-read technology combined with END-seq. We feel that developing this approach would be beyond the scope of this manuscript.

      (2) The authors also note that, in ALT cells, the frequency of VTRs in the first 30 nucleotides of the S1-END-SEQ reads is higher compared to END-SEQ, but this finding is not discussed either. Do the authors think that the presence of ssDNA regions is associated with the VTRs? Along this line, what is the frequency of VTRs in the END-SEQ analysis of TRF1-FokI-expressing ALT cells? Is it also increased? Has TRF1-FokI been applied to telomerase-expressing cells to compare VTR frequencies at internal sites between ALT and telomerase-expressing cells?

      Similarly to what is discussed above, short reads have the advantage of being very accurate but do not provide sufficient length to establish the relative frequency of VTRs across the whole telomere sequence. The TRF1-FokI experiment is a good suggestion, but it would still be biased toward non-variant repeats due to the TRF1-binding properties. We plan to address these questions in a future study involving long-read sequencing and END-seq capture of telomeres.

      Finally, in these experiments (S1-END-SEQ or END-SEQ in TRF1-Fok1), is the frequency of VTRs the same on both the C- and the G-rich strands? It is possible that the sequences are not fully complementary in regions where G4 structures form.

      We thank the reviewer for this observation. While we do observe a higher frequency of variant telomeric repeats (VTRs) in the first 30 nucleotides of S1-END-seq reads compared to END-seq in ALT cells, we are currently unable to determine whether this difference is significant, as an appropriate control or matched normalization strategy for this comparison is lacking. Therefore, we refrain from overinterpreting the biological relevance of this observation.

      (3) Based on the ratio of C-rich to G-rich reads in the S1-END-SEQ experiment, the authors estimate that ALT cells contain at least 3-5 ssDNA regions per chromosome end. While the calculation is understandable, this number could be discussed further to consider the possibility that the observed ratios (of roughly 0.5) might result from the presence of extrachromosomal DNA species, such as C-circles. The observed increase in the ratio of C-rich to G-rich reads in BLM-depleted cells supports this hypothesis, as BLM depletion suppresses C-circle formation in U2OS cells. To test this, the authors should examine the impact of POLD3 depletion on the C-rich/G-rich read ratio. Alternatively, they could separate high-molecular-weight (HMW) DNA from low-molecular-weight DNA in ALT cells and repeat the S1-END-SEQ in the HMW fraction.

      The reviewer is absolutely correct. Our calculation did not exclude the possibility of extrachromosomal DNA as a source of telomeric ssDNA. We have now addressed this point in our discussion.

      (4) What is the authors' perspective on the presence of ssDNA at ALT telomeres? Do they attribute this to replication stress? It would be helpful for the authors to repeat the S1-END-SEQ in telomerase-expressing cells with very long telomeres, such as HeLa1.3 cells, to determine if ssDNA is a specific feature of ALT cells or a result of replication stress. The increased abundance of G4 structures at telomeres in HeLa1.3 cells (as shown in J. Wong's lab) may indicate that replication stress is a factor. Similar to Wong's work, it would be valuable to compare the C-rich/G-rich read ratios in HeLa1.3 cells to those in ALT cells with similar telomeric DNA content.

      The reviewer is correct in pointing out that we still do not know what causes ssDNA at telomeres in ALT cells. Replication stress seems the most logical explanation based on the work of many labs in the field. However, our data did not reveal any significant difference in the levels of ssDNA at telomeres in non-ALT cells based on telomere length. We used the HeLa1.2.11 cell line (now clarified in the Materials section), which is the parental line of HeLa1.3 and has similarly long telomeres (~20 kb vs. ~23 kb). Despite their long telomeres and potential for replication-associated challenges such as G-quadruplex formation, HeLa1.2.11 cells did not exhibit the elevated levels of telomeric ssDNA that we observed in ALT cells (Figure 4B). Additional experiments are needed to map the occurrence of ssDNA at telomeres in relation to progression toward ALT.

      Finally, Reviewer #3 raises a list of minor points:

      (1) The Y-axes of Figure 4 have been relabeled to account for the G-strand reads.

      (2) Statistical analyses have been added to the figures where applicable.

      (3) The manuscript has been carefully proofread to improve clarity and consistency throughout the text and figure legends.

      (4) We have revised the text to address issues related to the lack of cross-referencing between the supplementary figures and their corresponding legends.

    1. The rubric is intended to be used as a stand-alone resource. The following is an explanation of each category and how we framed it to meet our development goals.

      This rubric is supposed to highlight the basic needs of all students and how well a certain e-learning tool fits into these needs. I think this rubric is a good start for analyzing digital tools that you may bring into the classroom, but the true test is seeing how much your students have learned from these tools after they are used. Just because a tool may fit perfectly in this rubric does not mean it will educate students perfectly in the classroom.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study examined the interaction between two key cortical regions in the mouse brain involved in goal-directed movements, the rostral forelimb area (RFA) - considered a premotor region involved in movement planning, and the caudal forelimb area (CFA) - considered a primary motor region that more directly influences movement execution. The authors ask whether there exists a hierarchical interaction between these regions, as previously hypothesized, and focus on a specific definition of hierarchy - examining whether the neural activity in the premotor region exerts a larger functional influence on the activity in the primary motor area than vice versa. They examine this question using advanced experimental and analytical methods, including localized optogenetic manipulation of neural activity in either region while measuring both the neural activity in the other region and EMG signals from several muscles involved in the reaching movement, as well as simultaneous electrophysiology recordings from both regions in a separate cohort of animals.

      The findings presented show that localized optogenetic manipulation of neural activity in either RFA or CFA resulted in similarly short-latency changes in the muscle output and in firing rate changes in the other region. However, perturbation of RFA led to a larger absolute change in the neural activity of CFA neurons. The authors interpret these findings as evidence for reciprocal, but asymmetrical, influence between the regions, suggesting some degree of hierarchy in which RFA has a greater effect on the neural activity in CFA. They go on to examine whether this asymmetry can also be observed in simultaneously recorded neural activity patterns from both regions. They use multiple advanced analysis methods that either identify latent components at the population level or measure the predictability of firing rates of single neurons in one region using firing rates of single neurons in the other region. Interestingly, the main finding across these analyses seems to be that both regions share highly similar components that capture a high degree of variability of the neural activity patterns in each region. Single units' activity from either region could be predicted to a similar degree from the activity of single units in the other region, without a clear division into a leading area and a lagging area, as one might expect to find in a simple hierarchical interaction. However, the authors find some evidence showing a slight bias towards leading activity in RFA. Using a two-region neural network model that is fit to the summed neural activity recorded in the different experiments and to the summed muscle output, the authors show that a network with constrained (balanced) weights between the regions can still output the observed measured activities and the observed asymmetrical effects of the optogenetic manipulations, by having different within-region local weights. 
These results call into question whether previous and current findings that demonstrate asymmetry in the output of regions can be interpreted as evidence for asymmetrical (and thus hierarchical) inputs between regions, emphasizing the challenges in studying interactions between any brain regions.

      Strengths:

      The experiments and analyses performed in this study are comprehensive and provide a detailed examination and comparison of neural activity recorded simultaneously using dense electrophysiology probes from two main motor regions that have been the focus of studies examining goal-directed movements. The findings showing reciprocal effects from each region to the other, similar short-latency modulation of muscle output by both regions, and similarity of neural activity patterns without a clear lead/lag interaction, are convincing and add to the growing body of evidence that highlight the complexity of the interactions between multiple regions in the motor system and go against a simple feedforward-like network and dynamics. The neural network model complements these findings and adds an important demonstration that the observed asymmetry can, in theory, also arise from differences in local recurrent connections and not necessarily from different input projections from one region to the other. This sheds an important light on the multiple factors that should be considered when studying the interaction between any two brain regions, with a specific emphasis on the role of local recurrent connections, that should be of interest to the general neuroscience community.

      Weaknesses:

      While the similarity of the activity patterns across regions and lack of a clear leading/lagging interaction are interesting observations that are mostly supported by the findings presented (however, see comment below for lack of clarity in CCA/PLS analyses), the main question posed by the authors - whether there exists an endogenous hierarchical interaction between RFA and CFA - seems to be left largely open. 

      The authors note that there is currently no clear evidence of asymmetrical reciprocal influence between naturally occurring neural activity patterns of the two regions, as previous attempts have used non-natural electrical stimulation, lesions, or pharmacological inactivation. The use of acute optogenetic perturbations does not seem to be vastly different in that aspect, as it is a non-natural stimulation of inhibitory interneurons that abruptly perturbs the ongoing dynamics.

      We do believe that our optogenetic inactivation identifies a causal interaction between the endogenous activity patterns in the excitatory projection neurons, which we have largely silenced, and the downstream endogenous activity that is perturbed. The effect in the downstream region results directly from the silencing of activity in the excitatory projection neurons that mediate each region’s interaction with other regions. Here we have performed a causal intervention common in biology: a loss-of-function experiment. Such experiments generally reveal that a causal interaction of some sort is present, but often do not clarify much about the nature of the interaction, as is true in our case. By showing that a silencing of endogenous activity in one motor cortical region causes a significant change to the endogenous activity in another, we establish a causal relationship between these activity patterns. This is analogous to knocking out the gene for a transcription factor and observing causal effects on the expression of other genes that depend on it. 

      Moreover, our experiments are, to our knowledge, the first that localize a causal relationship to endogenous activity in motor cortex at a particular point during a motor behavior. Lesion and pharmacological or chemogenetic inactivation have long-lasting effects, and so their consequences on firing in other regions cannot be attributed to a short-latency influence of activity at a particular point during movement. Moreover, the involvement of motor cortex in motor learning and movement preparation/initiation complicates the interpretation of these consequences in relation to movement execution, as disturbance to processes on which execution depends can impede execution itself. Stimulation experiments generate spiking in excitatory projection neurons that is not endogenous.

      That said, we would agree that the form of the causal interaction between RFA and CFA remains unaddressed by our results. These results do not expose how the silenced activity patterns affect activity in the downstream region, just as knocking out a transcription factor gene does not expose how the transcription factor influences the expression of other genes. To show evidence for a specific type of interaction dynamics between RFA and CFA, a different sort of experiment would be necessary. See Jazayeri and Afraz, Neuron, 2017 for more on this issue.

      Furthermore, the main finding that supports a hierarchical interaction is a difference in the absolute change of firing rates as a result of the optogenetic perturbation, a finding that is based on a small number of animals (N = 3 in each experimental group), and one which may be difficult to interpret. 

      Though N = 3, we do show statistical significance. Moreover, using three replicates is not uncommon in biological experiments that require a large technical investment.

      As the authors nicely demonstrate in their neural network model, the two regions may differ in the strength of local within-region inhibitory connections. Could this theoretically also lead to a difference in the effect of the artificial light stimulation of the inhibitory interneurons on the local population of excitatory projection neurons, driving an asymmetrical effect on the downstream region? 

      We (Miri et al., Neuron, 2017) and others (Guo et al., Neuron, 2014) have shown that the effect of this inactivation on excitatory neurons in CFA is a near-complete silencing (90-95% within 20 ms). There thus is not much room for the effects on projection neurons in RFA to be much larger. We have measured these local effects in RFA as part of other work (Kristl et al., bioRxiv, 2025), verifying that the effects on RFA projection neuron firing are not larger.

      Moreover, the manipulation was performed upon the beginning of the reaching movement, while the premotor region is often hypothesized to exert its main control during movement preparation, and thus possibly show greater modulation during that movement epoch. It is not clear if the observed difference in absolute change is dependent on the chosen time of optogenetic stimulation and if this effect is a general effect that will hold if the stimulation is delivered during different movement epochs, such as during movement preparation.

      We agree that the dependence of RFA-CFA interactions on movement phase would be interesting to address in subsequent experiments. While a strong interpretation of lesion results might lead to a hypothesis that premotor influence on primary motor cortex is local to, or stronger during, movement preparation as opposed to execution, at present there is to our knowledge no empirical support from interventional experiments for this hypothesis. Moreover, existing analyses of activity in these two regions have produced conflicting results on the strength of their interaction during preparation. Compare, for example, Bachschmid-Romano et al., eLife, 2023 to Kaufman et al., Nature Neuroscience, 2014.

      That said, this lesion interpretation would predict the same asymmetry we have observed from perturbations at the beginning of a reach - a larger effect of RFA on CFA than vice versa.

      Another finding that is not clearly interpretable is in the analysis of the population activity using CCA and PLS. The authors show that shifting the activity of one region compared to the other, in an attempt to find the optimal leading/lagging interaction, does not affect the results of these analyses. Assuming the activities of both regions are better aligned at some unknown ground-truth lead/lag time, I would expect to see a peak somewhere in the range examined, as is nicely shown when running the same analyses on a single region's activity. If the activities are indeed aligned at zero, without a clear leading/lagging interaction, but the results remain similar when shifting the activities of one region compared to the other, the interpretation of these analyses is not clear.

      Our results in this case were definitely surprising. Many share the intuition that there should be a lag at which the correlations in activity between regions may be strongest. The similarity in alignment across lags we observed might be expected if communication between regions occurs over a range of latencies as a result of dependence on a broad diversity of synaptic paths that connect neurons. In the Discussion, we offer an explanation of how to reconcile these findings with the seemingly different picture presented by DLAG.
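
      The lag-shifting analysis discussed in this exchange can be illustrated with a minimal sketch. It is not the pipeline used in the manuscript: the function names `canonical_correlations` and `cca_across_lags` are hypothetical, the inputs are assumed to be (time bins x units) arrays with more time bins than units, and the summary statistic (mean canonical correlation) is an assumption chosen for simplicity. With a single fixed delay between the two populations the profile peaks at that delay; a roughly flat profile across lags, as reported above, would instead return similar values for every lag.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between X (T x p) and Y (T x q); rows are time bins."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

def cca_across_lags(X, Y, lags):
    """Mean canonical correlation for each lag (in bins).

    A positive lag pairs X[t] with Y[t + lag], i.e. X leads Y by `lag` bins.
    """
    T = X.shape[0]
    profile = {}
    for lag in lags:
        if lag >= 0:
            xs, ys = X[:T - lag], Y[lag:]
        else:
            xs, ys = X[-lag:], Y[:T + lag]
        profile[lag] = canonical_correlations(xs, ys).mean()
    return profile
```

      In use, `X` and `Y` would hold the activity of the two regions, and `lags` would span the range of interest converted to bins; plotting the returned values against lag gives the alignment profile.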

      Reviewer #2 (Public review):

      Summary:

      While technical advances have enabled large-scale, multi-site neural recordings, characterizing inter-regional communication and its behavioral relevance remains challenging due to intrinsic properties of the brain such as shared inputs, network complexity, and external noise. This work by Saiki-Ishikawa et al. examines the functional hierarchy between premotor (PM) and primary motor (M1) cortices in mice during a directional reaching task. The authors find some evidence consistent with an asymmetric reciprocal influence between the regions, but overall, activity patterns were highly similar and equally predictive of one another. These results suggest that motor cortical hierarchy, though present, is not fully reflected in firing patterns alone.

      Strengths:

      Inferring functional hierarchies between brain regions, given the complexity of reciprocal and local connectivity, dynamic interactions, and the influence of both shared and independent external inputs, is a challenging task. It requires careful analysis of simultaneous recording data, combined with cross-validation across multiple metrics, to accurately assess the functional relationships between regions. The authors have generated a valuable dataset simultaneously recording from both regions at scale from mice performing a cortex-dependent directional reaching task.

      Using electrophysiological and silencing data, the authors found evidence supporting the traditionally assumed asymmetric influence from PM to M1. While earlier studies inferred a functional hierarchy based on partial temporal relationships in firing patterns, the authors applied a series of complementary analyses to rigorously test this hierarchy at both individual neuron and population levels, with robust statistical validation of significance.

      In addition, recording combined with brief optogenetic silencing of the other region allowed authors to infer the asymmetric functional influence in a more causal manner. This experiment is well designed to focus on the effect of inactivation manifesting through oligosynaptic connections to support the existence of a premotor to primary motor functional hierarchy.

      Subsequent analyses revealed a more complex picture. CCA, PLS, and three measures of predictivity (Granger causality, transfer entropy, and convergent cross-mapping) emphasized similarities in firing patterns and cross-region predictability. However, DLAG suggested an imbalance, with RFA capturing CFA variance at a negative time lag, indicating that RFA 'leads' CFA. Taken together these results provide useful insights for current studies of functional hierarchy about potential limitations in inferring hierarchy solely based on firing rates.

      While I would detail some questions and issues on specifics of data analyses and modeling below, I appreciate the authors' effort in training RNNs that match some behavioral and recorded neural activity patterns including the inactivation result. The authors point out two components that can determine the across-region influence - 1) the amount of inputs received and 2) the dependence on across-region input, i.e., the relative importance of local dynamics, providing useful insights in inferring functional relationships across regions.

      Weaknesses:

      (1) Trial-averaging was applied in CCA and PLS analyses. While trial-averaging can be appropriate in certain cases, it leads to the loss of trial-to-trial variance, potentially inflating the perceived similarities between the activity in the two regions (Figure 4). Do authors observe comparable degrees of similarity, e.g., variance explained by canonical variables? Also, the authors report conflicting findings regarding the temporal relationship between RFA and CFA when using CCA/PLS versus DLAG. Could this discrepancy be due to the use of trial-averaging in former analyses but not in the latter?

      We certainly agree that the similarity in firing patterns is higher in trial averages than on single trials, given the variation in single-neuron firing patterns across trials. Here, we were trying to examine the similarity of activity variance that is clearly movement dependent, as trial averages are, and to use an approach aligned with those applied in the existing literature. We would also agree that there is more that can be learned about interactions from trial-by-trial analysis. It is possible that the activity components identified by DLAG as being asymmetric somehow are not reflected strongly in trial averages. In our Discussion we offer another potential explanation that is based on other differences in what is calculated by DLAG and CCA/PLS.

      We also note here that all of the firing pattern predictivity analysis we report (Figure 6) was done on single trial data, and in all cases the predictivity was symmetric. Thus, our results in aggregate are not consistent with symmetry purely being an artifact of trial averaging.
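
      The general point conceded above, that trial averaging raises apparent similarity by averaging out trial-to-trial variability, can be shown on synthetic data. The sketch below is illustrative only and uses no data from the study: the function name `single_trial_vs_average_corr` is hypothetical, and the signals are assumed to be a shared movement-locked component plus independent noise on every trial.

```python
import numpy as np

def single_trial_vs_average_corr(a, b):
    """a, b: (trials x time) arrays for two simultaneously recorded signals.

    Returns (mean single-trial correlation, correlation of the trial averages).
    """
    single = np.mean([np.corrcoef(x, y)[0, 1] for x, y in zip(a, b)])
    averaged = np.corrcoef(a.mean(axis=0), b.mean(axis=0))[0, 1]
    return single, averaged

# Synthetic example: a shared, movement-locked component plus independent noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
shared = np.sin(2 * np.pi * t)
a = shared + rng.standard_normal((50, 100))
b = shared + rng.standard_normal((50, 100))
single, averaged = single_trial_vs_average_corr(a, b)
```

      Here `averaged` exceeds `single` because the independent noise shrinks with the number of trials averaged while the shared component does not. This is why analyses run on single trials, such as the predictivity measures mentioned above, are a useful complement.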

      (2) A key strength of the current study is the precise tracking of forelimb muscle activity during a complex motor task involving reaching for four different targets. This rich behavioral data is rarely collected in mice and offers a valuable opportunity to investigate the behavioral relevance of the PM-M1 functional interaction, yet little has been done to explore this aspect in depth. For example, single-trial time courses of inter-regional latent variables acquired from DLAG analysis can be correlated with single-trial muscle activity and/or reach trajectories to examine the behavioral relevance of inter-regional dynamics. Namely, can trial-by-trial change in inter-regional dynamics explain behavioral variability across trials and/or targets? Does the inter-areal interaction change in error trials? Furthermore, the authors could quantify the relative contribution of across-area versus within-area dynamics to behavioral variability. It would also be interesting to assess the degree to which across-area and within-area dynamics are correlated. Specifically, can across-area dynamics vary independently from within-area dynamics across trials, potentially operating through a distinct communication subspace?

      These are all very interesting questions. Our study does not attempt to parse activity into components predictive of muscle activity and others that may reflect other functions. Distinct components of RFA and CFA activity may very well rely on distinct interactions between them.

      (3) While network modeling of RFA and CFA activity captured some aspects of behavioral and neural data, I wonder if certain findings such as the connection weight distribution (Figure 7C), across-region input (Figure 7F), and the within-region weights (Figure 7G), primarily resulted from fitting the different overall firing rates between the two regions with CFA exhibiting higher average firing rates. Did the authors account for this firing rate disparity when training the RNNs?

      The key comparison in Figure 7 is shown in 7F, where the firing rates are accounted for in calculating the across-region input strength. Equalizing the firing rates in RFA and CFA would effectively increase RFA rates. If the mean firing rates in each region were appreciably dependent on across-region inputs, we would then expect an off-setting change in the RFA→CFA weights, such that the RFA→CFA distributions in 7F would stay the same. We would also expect the CFA→RFA weights would increase, since RFA neurons would need more input. This would shift the CFA→RFA (blue) distributions up. Thus, if anything, the key difference in this panel would only get larger. 

      We also generally feel that it is a better approach to fit the actual firing rates, rather than normalizing, since normalizing the firing rates would take us further from the actual biology, not closer.

      (4) Another way to assess the functional hierarchy is by comparing the time courses of movement representation between the two regions. For example, a linear decoder could be used to compare the amount of information about muscle activity and/or target location as well as time courses thereof between the two regions. This approach is advantageous because it incorporates behavior rather than focusing solely on neural activity. Since one of the main claims of this study is the limitation of inferring functional hierarchy from firing rate data alone, the authors should use the behavior as a lens for examining inter-areal interactions.

      As we state above, we agree that examining interactions specific to movement-related activity components could reveal interesting structure in interregional interactions. Since it remains a challenge to rigorously identify a subset of neural activity patterns specifically related to driving muscle activity, any such analysis would involve an additional assumption. It remains unclear how well the activity that decoders use for predicting muscle activity matches the activity that actually drives muscle activity in situ.

      To address this issue, which is related to one raised by Reviewer #3 below, we have added an additional paragraph to the Discussion (see “Manifestations of hierarchy in firing patterns”).
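
      For readers unfamiliar with the decoder comparison the reviewer proposes, a minimal sketch follows. It is not an analysis from the manuscript: the function name `ridge_decoder_r2`, the ridge penalty `lam`, and the train/test split are all assumptions for illustration. As the response notes, high decoding accuracy would not by itself show that the decoded activity is what drives muscles in situ.

```python
import numpy as np

def ridge_decoder_r2(neural_train, emg_train, neural_test, emg_test, lam=1.0):
    """Fit a ridge-regression decoder from neural activity (time x units) to a
    single EMG channel (time,), and return R^2 on held-out data."""
    mu_x = neural_train.mean(axis=0)
    mu_y = emg_train.mean()
    Xc = neural_train - mu_x
    # Closed-form ridge solution: (X^T X + lam * I) w = X^T y.
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(Xc.shape[1]),
                        Xc.T @ (emg_train - mu_y))
    pred = (neural_test - mu_x) @ w + mu_y
    ss_res = np.sum((emg_test - pred) ** 2)
    ss_tot = np.sum((emg_test - emg_test.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

      Running the same function separately on activity from each region, with matched unit counts and identical splits, would give the kind of region-by-region comparison the reviewer describes.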

      Reviewer #3 (Public review):

      This study investigates how two cortical regions that are central to the study of rodent motor control (rostral forelimb area, RFA, and caudal forelimb area, CFA) interact during directional forelimb reaching in mice. The authors investigate this interaction using

      (1) optogenetic manipulations in one area while recording extracellularly from the other, (2) statistical analyses of simultaneous CFA/RFA extracellular recordings, and (3) network modeling.

      The authors provide solid evidence that asymmetry between RFA and CFA can be observed, although such asymmetry is only observed in certain experimental and analytical contexts.

      The authors find asymmetry when applying optogenetic perturbations, reporting a greater impact of RFA inactivation on CFA activity than vice-versa. The authors then investigate asymmetry in endogenous activity during forelimb movements and find asymmetry with some analytical methods but not others. Asymmetry was observed in the onset timing of movement-related deviations of local latent components with RFA leading CFA (computed with PCA) and in a relatively higher proportion and importance of cross-area latent components with RFA leading than CFA leading (computed with DLAG). However, no asymmetry was observed using several other methods that compute cross-area latent dynamics, nor with methods computed on individual neuron pairs across regions. The authors follow up this experimental work by developing a two-area model with asymmetric dependence on cross-area input. This model is used to show that differences in local connectivity can drive asymmetry between two areas with equal amounts of across-region input.

      Overall, this work provides a useful demonstration that different cross-area analysis methods result in different conclusions regarding asymmetric interactions between brain areas and suggests careful consideration of methods when analyzing such networks is critical. A deeper examination of why different analytical methods result in observed asymmetry or no asymmetry, analyses that specifically examine neural dynamics informative about details of the movement, or a biological investigation of the hypothesis provided by the model would provide greater clarity regarding the interaction between RFA and CFA.

      Strengths:

      The authors are rigorous in their experimental and analytical methods, carefully monitoring the impact of their perturbations with simultaneous recordings, and providing valid controls for their analytical methods. They cite relevant previous literature that largely agrees with the current work, highlighting the continued ambiguity regarding the extent to which there exists an asymmetry in endogenous activity between RFA and CFA.

      A strength of the paper is the evidence for asymmetry provided by optogenetic manipulation. They show that RFA inactivation causes a greater absolute difference in muscle activity than CFA inactivation (deviations begin 25-50 ms after laser onset, Figure 1) and that RFA inactivation causes a relatively larger decrease in CFA firing rate than CFA inactivation causes in RFA (deviations begin <25 ms after laser onset, Figure 3). The timescales of these changes provide solid evidence for an asymmetry in the impact of inactivating RFA/CFA on the other region that could not be driven by differences in feedback from disrupted movement (which would appear with a ~50 ms delay).

      The authors also utilize a range of different analytical methods, showing an interesting difference between some population-based methods (PCA, DLAG) that observe asymmetry, and single neuron pair methods (Granger causality, transfer entropy, and convergent cross mapping) that do not. Moreover, the modeling work presents an interesting potential cause of "hierarchy" or "asymmetry" between brain areas: local connectivity that impacts dependence on across-region input, rather than the amount of across-region input actually present.

      Weaknesses:

      There is no attempt to examine neural dynamics that are specifically relevant/informative about the details of the ongoing forelimb movement (e.g., kinematics, reach direction). Thus, it may be premature to claim that firing patterns alone do not reflect functional influence between RFA/CFA. For example, given evidence that the largest component of motor cortical activity doesn't reflect details of ongoing movement (reach direction or path; Kaufman, et al. PMID: 27761519) and that the analytical tools the authors use likely isolate this component (PCA, CCA), it may not be surprising that CFA and RFA do not show asymmetry if such asymmetry is related to the control of movement details.

      An asymmetry may still exist in the components of neural activity that encode information about movement details, and thus it may be necessary to isolate and examine the interaction of behaviorally-relevant dynamics (e.g., Sani, et al. PMID: 33169030).

      To clarify, we are not claiming that firing patterns in no way reflect the asymmetric functional influence that we demonstrate with optogenetic inactivation. Instead, we show that certain types of analysis that we might expect to reflect such influence, in fact, do not. Indeed, DLAG did exhibit asymmetries that matched those seen in functional influence (at least qualitatively), though other methods we applied did not.

      As we state above, we do think that there is more that can be gleaned by looking at influence specifically in terms of activity related to movement. However, if we did find that movement-related activity exhibited an asymmetry following functional influence, our results imply that the remaining activity components would exhibit an opposite asymmetry, such that the overall balance is symmetric. This would itself be surprising. We also note that the components identified by CCA and PLS do show substantial variation across reach targets, indicating that they are not only reflecting condition-invariant components. These analyses were performed on components accounting for well over 90% of the total activity variance, suggesting that both condition-dependent and condition-invariant components should be included.

      To address the concern about condition-dependent and condition-invariant components, we have added a sentence to the Results section reporting our CCA and PLS results: “Because our results here involve the vast majority of trial-averaged activity variance, we expect that they encompass both components of activity that vary for different movement conditions (condition-dependent), and those that do not (condition-invariant).” To address the general concerns about potential differences in activity components specifically related to muscle activity, we have also added an additional paragraph to the Discussion (see “Manifestations of hierarchy in firing patterns”).

      The idea that local circuit dynamics play a central role in determining the asymmetry between RFA and CFA is not supported by experimental data in this paper. The plausibility of this hypothesis is supported by the model but is not explored in any analyses of the experimental data collected. Given the focus on this idea in the discussion, further experimental investigation is warranted.

      While we do not provide experimental support for this hypothesis, the data we present also do not contradict it. Here we used modeling as it is often used - to capture experimental results and generate hypotheses about potential explanations. We do feel that our Discussion makes clear where the hypothesis derives from and does not misrepresent the lack of experimental support. We expect readers will take our engagement with this hypothesis with the appropriate grain of salt. The imaginable experiments to support such a hypothesis would constitute another substantial study, requiring numerous controls - a whole other paper in itself.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      (1) There are a few small text/figure caption modifications that can be made for clarity of reading:

      (2) Unclear sentence in the second paragraph of the introduction: "For example, stimulation applied in PM has been shown to alter the effects on muscles of stimulation in M1 under anesthesia, both in monkeys and rodents."

      This sentence has been rephrased for clarity: “For example, in anesthetized monkeys34 and rodents35, stimulation in PM alters the effects of stimulation in M1 on muscles.”

      (3) The first section of the results presents the optogenetic manipulation. However, the critical control that tests whether this was strictly a local manipulation that did not affect cells in the other region is introduced only much later. It may be helpful to add a comment in this section noting that such a control was performed, even if it is explained in detail later when introducing the recordings.

      We have added the following to the first Results section: “we show below that direct optogenetic effects were only seen in the targeted forelimb area and not the other.”

      (4) Figure 1D - I imagine these averages are from a single animal, but this is not stated in the figure caption.

      “For one example mouse,” has been added to the beginning of the Figure 1D legend.

      (5) Figure 2F - N=6 is not stated in the panel's caption (though it can make it clearer), while it is stated in the caption of 2H.

      “n = 6 mice” has been added to the Figure 2F legend.

      (6) There's some inconsistency with the order of RFA/CFA in the figures, sometimes RFA is presented first (e.g., Figure 1D and 1F), and sometimes CFA is presented first (e.g., panels of Figure 2).

      We do not foresee this leading to confusion.

      (7) "As expected, the majority of recorded neurons in each region exhibited an elevated average firing rate during movement as compared to periods when forelimb muscles were quiescent (Figure 2D,E; Figure S1A,B)" - Figure S1A,B show histograms of narrow vs. wide waveforms, is this the relevant figure here?

      We apologize for the cryptic reference. The waveform width histograms were referred to here because they enabled the separation of narrow- and wide-waveform cells shown in Figure 2D,E. We have added the following clause to the referenced sentence to make this explicit:  “, both for narrow-waveform, putative interneurons and wide-waveform putative pyramidal neurons.”

      (8) Figure 2I caption - "The fraction of activity variance from 150 ms before reach onset to 150 ms after it that occurs before reach onset" - this sentence is not clear.

      The Figure 2I legend has been updated to “The activity variance in the 150 ms before muscle activity onset, defined as a fraction of the total activity variance from 150 ms before to 150 ms after muscle activity onset, for each animal (circles) and the mean across animals (black bars, n = 6 mice).”
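
      The quantity described in the revised legend can be written out as a short sketch. The exact definition used in the manuscript is not given here, so this version rests on stated assumptions: `activity` is a (time bins x neurons) array spanning the full window, "activity variance" is taken to be the summed squared deviation of each neuron from its mean over that window, and `onset_idx` is the index of the first bin at or after muscle activity onset. The function name is hypothetical.

```python
import numpy as np

def pre_onset_variance_fraction(activity, onset_idx):
    """Fraction of total activity variance that falls before onset.

    activity: (time bins x neurons) array covering the full window,
              e.g. 150 ms before to 150 ms after muscle activity onset.
    onset_idx: index of the first time bin at or after onset.
    """
    # Deviation of each neuron from its own mean over the whole window.
    centered = activity - activity.mean(axis=0)
    squared = centered ** 2
    return squared[:onset_idx].sum() / squared.sum()
```

      Under this definition, a population whose activity changes mostly before onset yields a fraction near 1, while one that changes mostly afterward yields a fraction near 0.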

      (9) Figure 4B-G - is this showing results across the 6 animals? Not stated clearly.

      Yes - the 21 sessions we had referred to are drawn from all six mice. We have updated the legend here to make this explicit.

      (10) DLAG analysis - is there any particular reasoning behind choosing four across-region and four within-region components?

      In actuality, we completed this analysis for a broad range of component numbers and obtained similar results in all cases. Four fell in the center of our range, and so we focused the illustrations shown in the figure on this value. In general, the number of components is arbitrary. The original paper from Gokcen et al. describes a method for identifying a lower bound on the number of distinct components the method can identify. However, this method yields different results for each individual recording session. For the comparisons we performed, we needed to use the same range of values for each session.

      (11) Figure 5A seems to show 11 across-session components, it's unclear from the caption but I imagine this should show 12 (4 components times 3 sessions?)

      As we state in the Methods, any across-region latent variable with a lag that failed to converge between the boundary values of ±200 ms was removed from the analysis. In the case illustrated in this panel, the lag for one of the components failed to converge and is not shown. We have now clarified this both in the relevant Results paragraph and in the figure legend.

      (12) Figure 5B - is each marker here the average variance explained by all across/within components that were within the specified lag criteria across sessions per mouse? In other words, what does a single marker here stand for?

      We apologize for the lack of clarity here. These values reflect the average across sessions for each mouse. We have updated the legend to make this explicit.

      Reviewer #2 (Recommendations for the authors):

      As I have addressed most of my major recommendations in the public review, I will use this section to include relatively minor points for the authors to consider.

      (1) The EMG data in Figure 1C shows distinct patterns across spouts, both in the magnitude and complexity of muscle activations. It would be interesting to investigate whether these differences in muscle activity lead to behavioral variations (e.g., reaction time, reach duration) and how they relate to the relative involvement of the two areas.

      We agree that it would be interesting to examine how the interactions between areas vary as behavior varies. While the differences between reaches here are limited, we have addressed this question for two substantially different motor behaviors (reaching and climbing) in a follow-up study that was recently preprinted (Kristl et al., bioRxiv, 2025).

      (2) How do the authors account for the lingering impact of RFA inactivation on muscle activity, which persists for tens of milliseconds after laser offset? Could this effect be due to compensatory motor activity following the perturbation? A further illustration of how the raw limb trajectories and/or muscle activity are perturbed and recovered would help readers better understand the impact of motor cortical inactivation.

      To clarify the effects of inactivation on a longer timescale, we have added a new supplemental figure showing the plots from Figure 1D over a longer time window extending to 500 ms after trial onset (new Figure S1). Lingering effects do persist, at least in certain cases. In general, we find it hard to ascertain the source of optogenetic effects on longer timescales like this. On the shortest timescales, effects will be mediated by relatively direct connections between regions. However, on these longer timescales, effects could be due to broader changes in brain and behavioral state that can influence muscle activity. For example, attempts to compensate for the initial disturbance to muscle activity could cause divergence from controls on these longer timescales. Muscle tissue itself is also known to have long-timescale relaxation dynamics, and it would not be surprising if the relevant control circuits here also had long-timescale dynamics, such that we would not expect an immediate return to control when the light pulse ends. Because of this ambiguity, we generally avoid interpretation of optogenetic effects on these longer timescales.

      Reviewer #3 (Recommendations for the authors):

      (1) Page 9: ". We measured the time at which the activity state deviated from baseline preceding reach onset," - I cannot find how this deviation was defined (neither the baseline nor the threshold).

      We have added text to the Figure 2G legend that explicitly states how the baseline and activity onset time were defined.

      (2) Given the shape of the curves in Figure 2G, the significance of this result seems susceptible to slight modifications of what defines a baseline or a deviation threshold. For example, it looks like the circle for CFA has a higher y-axis value, suggesting the baseline deviance is higher, but it is unclear why that would be from the plot. If the threshold for deviation in neural activity state were held uniform between CFA and RFA is the difference still significant across animals?

      We have repeated the analysis using the same absolute threshold for each region. We used the higher of the two thresholds from each region. The difference remains significant. This is now described in the last paragraph of the Results section for Figure 2.

      (3) Since summed deviation of the top 3 PCs is used to show a difference in activity onset between CFA/RFA, but only a small proportion of variance is explained pre-movement (<2% in most animals), it seems relevant to understand what percentage of CFA/RFA neuron activity actually is modulated and deviates from baseline prior to movement and to show the distribution of activity onsets at the single neuron level in CFA/RFA. Can an onset difference only be observed using PCA? 

      Because many neurons have low firing rates, estimating the time at which their firing rate begins to rise near reach onset is difficult to do reliably. It is also true that not all neurons show an increase around onset - some show a decrease and others show no discernible change. Using PCs to measure onset avoids both of these problems, since they capture both increases and decreases in individual neuron firing rates and are much less noisy than individual neuron firing rates. 

      However, based on this comment, we have repeated this analysis on a single-neuron level using only neurons with relatively high average firing rates. Specifically, we analyzed neurons with mean firing rates above the 90th percentile across all sessions within an animal. Neurons whose activity never crossed threshold were excluded. Results matched those using PCs, with RFA neurons showing an earlier average activity onset time. This is now described in the last paragraph of the Results section for Figure 2.

      (4) It is stated that to study the impact of inactivation on CFA/RFA activity, only the 50 highest average firing rate neurons were used (and maybe elsewhere too, e.g., convergent cross mapping). It is unclear why this subselection is necessary. It is justified by stating that higher firing rate neurons have better firing rate estimates. This may be supportable for very low firing rate units that spike sorting tools have a hard time tracking, but I don't think this is supported by data for most of the distribution of firing rates. It therefore seems like the results might be biased by a subselection of certain high firing rate neuron populations. It would be useful to also compute and mention if the results for all neurons/neuron pairs are the same. If there is worry about low-quality units being those with low firing rates, a threshold for firing rate as used elsewhere in the paper (at least 1 spike / 2 trials) seems justified.

      The issue here is that as firing rates decrease and firing rate estimates get noisier, estimates of the change in firing rate get more variable. Here we are trying to estimate the fraction of neurons for which firing rates decreased upon inactivation of the other region. Variability in estimates of the firing rate change will bias this estimate toward 50%, since in the limit when the change estimates are entirely based on noise, we expect 50% to be decreases. As expected, when we use increasingly liberal thresholds for this analysis, the fraction of decreases trends closer to 50%. 

      As a consequence of this, we cannot easily distinguish whether higher firing rate neurons might for some reason have a greater tendency to exhibit decreases in firing compared to lower firing rate neurons. However, we see no positive reason to expect such a difference. We have added a sentence noting this caveat in interpreting our findings to the relevant paragraph of the Results.
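
      The bias toward 50% described in this response can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the analysis used in the manuscript: the function name, the Gaussian noise model, and all numerical values are hypothetical. Every simulated neuron has the same true decrease, and only the noise in the estimated change varies.

```python
import random


def fraction_decreased(n_neurons, true_change, noise_sd, seed=0):
    """Fraction of neurons whose *estimated* firing-rate change is negative.

    Hypothetical model: estimate = true change + Gaussian estimation noise.
    """
    rng = random.Random(seed)
    decreases = 0
    for _ in range(n_neurons):
        estimate = true_change + rng.gauss(0.0, noise_sd)
        if estimate < 0:
            decreases += 1
    return decreases / n_neurons


# Assumed values: every neuron truly decreases by 1 spike/s.
low_noise = fraction_decreased(20000, true_change=-1.0, noise_sd=0.5)
high_noise = fraction_decreased(20000, true_change=-1.0, noise_sd=20.0)
print(f"low noise: {low_noise:.2f}, high noise: {high_noise:.2f}")
```

      With these assumed settings, the low-noise fraction stays close to 1, whereas the high-noise fraction drifts toward 0.5 even though every simulated neuron truly decreased its rate.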

      The lack of min/max axis values in Figure 3B-F makes it hard to interpret - are these neurons almost silent when near the bottom of the plot or are they still firing a substantial # of spikes?

      To aid interpretation of the relative magnitude of firing rate changes, we have added minimum firing rates for the averages depicted in Figure 3B,C,E and F to the legend. Our original thinking was that the plots in Figure 3G and H would provide an indication of the relative changes in firing.

      It would be interesting to know if the impact of optogenetic stimulation changed with exposure to the manipulation. Are all results presented only from the first X number of sessions in each animal? Or is the effect robust over time and (within the same animal) you can get the same results of optogenetic inactivation over time? This information seems critical for reproducibility.

      We have now performed brief optogenetic inactivations in several brain areas in several different behavioral paradigms, and have found that inactivation effects are stable both within and across sessions, almost surprisingly so. This includes cases where the inactivations were more frequent (every ~1.25 s on average) and more numerous (>15,000 trials per animal) than in the present manuscript. Thus we did not restrict our analysis here to the first X sessions or trials within a session. We have added additional plots as Figure S3T-AA showing the stability of optogenetic effects both within and across sessions.

      Given that it can be difficult to record from interneurons (as the proportion of putative interneurons in Figure S1 attests), the SALT analyses would be more convincing if a few recordings had been performed in the same region as optogenetic stimulation to show a "positive control" of what direct interneuron stimulation looks like. Could also use this to validate the narrow/wide waveform classification.

      We have verified that using SALT as we have in the present manuscript does detect vGAT+ interneurons directly responding to light. This is included in a recent preprint from the lab (Kristl et al., bioRxiv, 2025). We (Warriner et al., Cell Reports, 2022) and others (Guo et al., Neuron, 2014) have previously used direct ChR2 activation to validate waveform-based classification.

      Simultaneous CFA/RFA recordings during optogenetic perturbation would also allow for time courses of inhibition to be compared in RFA/CFA. Does it take 25ms to inhibit locally, and the cross-area impact is fast, or does it inactivate very fast locally and takes ~25ms to impact the other region?

      Latencies of this sort are difficult to precisely measure given the statistical limits of this sort of data, but there does appear to be some degree of delay between local and downstream effects. We do not have a statistical foundation as of yet for concluding that this is the case. It will be interesting to examine this issue more rigorously in the future.

      Given the difference in the analytical methods, the authors should share data in a relatively unprocessed format (e.g., spike times from sorted units relative to video tracking + behavioral data), along with analysis code, to allow others to investigate these differences.

      We plan to post the data and code to our lab’s GitHub site once the Version of Record is online.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:  

      Reviewer #1 (Public Review):

      Summary:

      This paper reports an intracranial SEEG study of speech coordination, where participants synchronize their speech output with a virtual partner that is designed to vary its synchronization behavior. This allows the authors to identify electrodes throughout the left hemisphere of the brain that have activity (both power and phase) that correlates with the degree of synchronization behavior. They find that high-frequency activity in the secondary auditory cortex (superior temporal gyrus) is correlated to synchronization, in contrast to primary auditory regions. Furthermore, activity in the inferior frontal gyrus shows a significant phase-amplitude coupling relationship that is interpreted as compensation for deviation from synchronized behavior with the virtual partner.

      Strengths:

      (1) The development of a virtual partner model trained for each individual participant, which can dynamically vary its synchronization to the participant's behavior in real-time, is novel and exciting.

      (2) Understanding real-time temporal coordination for behaviors like speech is a critical and understudied area.

      (3) The use of SEEG provides the spatial and temporal resolution necessary to address the complex dynamics associated with the behavior.

      (4) The paper provides some results that suggest a role for regions like IFG and STG in the dynamic temporal coordination of behavior both within an individual speaker and across speakers performing a coordination task.

      We thank the Reviewer for their positive comments on our manuscript.

      Weaknesses:

      (1) The main weakness of the paper is that the results are presented in a largely descriptive and vague manner. For instance, while the interpretation of predictive coding and error correction is interesting, it is not clear how the experimental design or analyses specifically support such a model, or how they differentiate that model from the alternatives. It's possible that some greater specificity could be achieved by a more detailed examination of this rich dataset, for example by characterizing the specific phase relationships (e.g., positive vs negative lags) in areas that show correlations with synchronization behavior. However, as written, it is difficult to understand what these results tell us about how coordination behavior arises.

      We understand the reviewer’s comment. It is true that this work, being the first in the field to use real-time adaptive synchronous speech and intracerebral neural data, is descriptive, and we hope it will pave the way for further studies. We have now added more statistical analyses (see point 2) to go beyond a descriptive approach, and we have also rewritten the discussion to clarify how this work can help disentangle different models of language interaction. Most importantly, we have also run new analyses taking into account the specific phase relationship, as suggested.

      We already had an analysis using instantaneous phase difference in the phase-amplitude coupling approach, that bridges phase of behaviour to neural responses (amplitude in the high-frequency range). However, this analysis, as the reviewer noted, does not distinguish between positive and negative lags, but rather uses the continuous fluctuations of coordinative behaviour. Following the reviewer’s suggestion, we have now run a new analysis estimating the average delay (between virtual partner speech and patient speech) in each trial, using a cross-correlation approach. This gives a distribution of delays across trials that can then be “binned” as positive or negative. We have thus rerun the phase-amplitude coupling analyses on positive and negative trials separately, to assess whether the phase amplitude relationship depends upon the anticipatory (negative lags) or compensatory (positive lags) behaviour. Our new analysis (now in the supplementary, see figure below) does not reveal significant differences between positive and negative lags. This lack of difference, although not easy to interpret, is nonetheless interesting because it seems to show that the IFG does not have a stronger coupling for anticipatory trials. Rather the IFG seems to be strongly involved in adjusting behaviour, minimizing the error, independently of whether this is early or late.

      We have updated the “Coupling behavioural and neurophysiological data” section in Materials and methods as follows:  

      “In the third approach, we assessed whether the phase-amplitude relationship (or coupling) depends upon the anticipatory (negative delays) or compensatory (positive delays) behaviour between the VP and the patients’ speech. We computed the average delay in each trial using a cross-correlation approach on speech signals (between patient and VP) with the MATLAB function xcorr. A median split (patient-specific; average median split = 0 ms, average SD = 24 ms) was applied to conserve a sufficient amount of data, classifying trials below the median as “anticipatory behaviour” and trials above the median as “compensatory behaviour”. Then we conducted the phase-amplitude coupling analyses on positive and negative trials separately.”
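
      The delay estimation and median split described in this paragraph can be sketched as follows. This is a minimal illustration and not the authors' MATLAB xcorr pipeline: the function names, the toy pulse signals, and the choice to drop trials that fall exactly on the median are all assumptions made for the example.

```python
from statistics import median


def estimate_delay(reference, signal, max_lag):
    """Lag (in samples) at which `signal` best matches `reference`.

    A positive lag means `signal` trails `reference`; negative means it leads.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_value in enumerate(reference):
            j = i + lag
            if 0 <= j < len(signal):
                score += ref_value * signal[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def median_split(delays):
    """Indices of trials below (anticipatory) and above (compensatory) the median."""
    cut = median(delays)
    anticipatory = [i for i, d in enumerate(delays) if d < cut]
    compensatory = [i for i, d in enumerate(delays) if d > cut]
    return cut, anticipatory, compensatory


# Toy envelopes: the "patient" pulse trails the "partner" pulse by 3 samples.
partner = [0.0] * 20
partner[8:11] = [1.0, 2.0, 1.0]
patient_late = [0.0] * 20
patient_late[11:14] = [1.0, 2.0, 1.0]
patient_early = [0.0] * 20
patient_early[6:9] = [1.0, 2.0, 1.0]

late_delay = estimate_delay(partner, patient_late, max_lag=5)
early_delay = estimate_delay(partner, patient_early, max_lag=5)
split = median_split([-2, 3, 1, -4])
```

      The two index lists returned by median_split would then select the trials on which the phase-amplitude coupling analysis is run separately.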

      We also added a paragraph on this finding in the Discussion:

      “Our results highlight the involvement of the inferior frontal gyrus (IFG) bilaterally, in particular the BA44 region, in speech coordination. First, trials with a weak verbal coordination (VCI) are accompanied by more prominent high frequency activity (HFa, Fig.4; Fig.S4). Second, when considering the within-trial time-resolved dynamics, the phase-amplitude coupling (PAC) reveals a tight relation between the low frequency behavioural dynamics (phase) and the modulation of high-frequency neural activity (amplitude, Fig.5B; Fig.S5). This relation is strongest when considering the phase adjustments rather than the phase of speech of the VP per se: larger deviations in verbal coordination are accompanied by an increase in HFa. Additionally, we also tested for potential effects of different asynchronies (i.e., temporal delay) between the participant's speech and that of the virtual partner but found no significant differences (Fig.S6). While the lack of a delay effect does not allow conclusions about the sensitivity of BA44 to the absolute timing of the partner’s speech, its neural dynamics are linked to the ongoing process of resolving phase deviations and maintaining synchrony.”

      (2) In the results section, there's a general lack of quantification. While some of the statistics reported in the figures are helpful, there are also claims that are stated without any statistical test. For example, in the paragraph starting on line 342, it is claimed that there is an inverse relationship between rho-value and frequency band, "possibly due to the reversed desynchronization/synchronization process in low and high frequency bands". Based on Figure 3, the first part of this statement appears to be true qualitatively, but is not quantified, and is therefore impossible to assess in relation to the second part of the claim. Similarly, the next paragraph on line 348 describes optimal clustering, but statistics of the clustering algorithm and silhouette metric are not provided. More importantly, it's not entirely clear what is being clustered - is the point to identify activity patterns that are similar within/across brain regions? Or to interpret the meaning of the specific patterns? If the latter, this is not explained or explored in the paper.

      The reviewer is right. We have now added statistical analyses showing that:

      (1) the ratio between synchronization and desynchronization evolves across frequencies (as often reported in the literature).

      (2) the sign of rho values also evolves across frequencies.

      (3) the clustering does indeed differ when taking into account behaviour. We have also clarified the use of clustering and the reasoning behind it.

      We have updated the Materials and methods section as follows:

      “The statistical difference between spatial clustering in global effect and brain-behaviour correlation was estimated with a linear model using the R function lm (stats package); post-hoc comparisons were corrected for multiple comparisons using the Tukey test (lsmeans R package; Lenth, 2016). The statistical difference between clustering in global effect and behaviour correlation across the number of clusters was estimated using permutation tests (N=1000) by computing the silhouette score difference between the two conditions.”

      We have updated the Results section as follows:

      (1) “This modulation between synchronization and desynchronization across frequencies was significant (F(5) = 6.42, p < .001 ; estimated with linear model using the R function lm).”

      (2) “The first observation is a gradual transition in the direction of correlations as we move up frequency bands, from positive correlations at low frequencies to negative ones at high frequencies (F(5) = 2.68, p = .02). This effect, present in both hemispheres, mimics the reversed desynchronization/synchronization process in low and high frequency bands reported above.”

      (3) “Importantly, compared to the global activity (task vs rest, Fig 3A), the neural spatial profile of the behaviour-related activity (Fig 3B) is more clustered, in the left hemisphere. Indeed, silhouette scores are systematically higher for behaviour-related activity compared to global activity, indicating greater clustering consistency across frequency bands (t(106) = 7.79, p < .001, see Figure S3). Moreover, silhouette scores are maximal, in particular for HFa, for five clusters (p < .001), located in the IFG BA44, the IPL BA 40 and the STG BA 41/42 and BA22 (see Figure S3).”
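
      The permutation test on silhouette score differences mentioned above can be sketched as follows. This is a hedged illustration only: it assumes the silhouette scores have already been computed for each condition, and the function name, the label-shuffling scheme, the add-one correction, and the example scores are all hypothetical rather than taken from the manuscript.

```python
import random
from statistics import mean


def permutation_p_value(scores_a, scores_b, n_permutations=1000, seed=0):
    """Two-sided permutation test on the difference of mean scores.

    Condition labels are shuffled across the pooled scores; the p-value is
    the share of shuffles with a difference at least as large as observed.
    """
    rng = random.Random(seed)
    observed = mean(scores_a) - mean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_permutations + 1)


# Made-up silhouette scores for the two conditions.
behaviour_scores = [0.60, 0.62, 0.58, 0.65, 0.61, 0.63]
global_scores = [0.30, 0.32, 0.28, 0.35, 0.31, 0.29]
observed_diff, p_value = permutation_p_value(behaviour_scores, global_scores)
```

      With 1000 permutations and the add-one correction, the smallest p-value this sketch can return is 1/1001.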

      (3) Given the design of the stimuli, it would be useful to know more about how coordination relates to specific speech units. The authors focus on the syllabic level, which is understandable. But as far as the results relate to speech planning (an explicit point in the paper), the claims could be strengthened by determining whether the coordination signal (whether error correction or otherwise) is specifically timed to e.g., the consonant vs the vowel. If the mechanism is a phase reset, does it tend to occur on one part of the syllable?

      Thank you for this thoughtful feedback. We agree that the relationship between speech coordination and specific speech units, such as consonants versus vowels, is an intriguing question. However, in our study, both interlocutors (the participant and the virtual partner) are adapting their speech production in real-time. This interactive coordination makes it difficult to isolate neural signatures corresponding to precise segments like consonants or vowels, as the adjustments occur in a continuous and dynamic context.

      The VP's ability to adapt depends on its sensitivity to spectral cues, such as the transition from one phonetic element to another. This is likely influenced by the type of articulation, with certain transitions being more salient (e.g., between a stop consonant like "p" and a vowel like "a") and others being less distinct (e.g., between nasal consonants like "m" and a vowel). Thus, the VP’s spectral adaptation tends to occur at these transitions, which are more prominent in some cases than in others.

      For the participants, previous studies have shown a greater sensitivity during the production of stressed vowels (Oschkinat & Hoole, 2022; Li & Lancia, 2024), which may reflect a heightened attentional or motor adjustment to stressed syllables.

      Here, we did not specifically address the question of coordination at the level of individual linguistic units. Moreover, even if we attempted to focus on this level, it would be challenging to relate neural dynamics directly to specific speech segments. The question of how synchronization at the level of individual linguistic units might relate to neural data is complex. The lack of clear, unit-specific predictions makes it difficult to parse out distinct neural signatures tied to individual segments, particularly when both interlocutors are continuously adjusting their speech in relation to one another.

      Therefore, while we recognize the potential importance of examining synchronization at the level of individual phonetic elements, the design of our task and the nature of the coordination in this interactive context (real-time bidirectional adaptation) led us to focus more broadly on the overall dynamics of speech synchronization at the syllabic level, rather than on specific linguistic units.

      We now state at the end of the Discussion section:

      “It is worth noting that the influence of specific speech units, such as consonants versus vowels, on speech coordination remains to be explored. In non-interactive contexts, participants show greater sensitivity during the production of stressed vowels, possibly reflecting heightened attentional or motor adjustments (Oschkinat & Hoole, 2022; Li & Lancia, 2024). In this study, the VP’s adaptation relies on sensitivity to spectral cues, particularly phonetic transitions, with some (e.g., formant transitions) being more salient than others. However, how these effects manifest in an interactive setting remains an open question, as both interlocutors continuously adjust their speech in real time. Future studies could investigate whether coordination signals, such as phase resets, preferentially align with specific parts of the syllable.”

      References cited:

      – Oschkinat, M., & Hoole, P. (2022). Reactive feedback control and adaptation to perturbed speech timing in stressed and unstressed syllables. Journal of Phonetics, 91, 101133.

      – Li, J., & Lancia, L. (2024). A multimodal approach to study the nature of coordinative patterns underlying speech rhythm. In Proc. Interspeech, 397-401.

      (4) In the discussion the results are related to a previously-described speech-induced suppression effect. However, it's not clear what the current results have to do with SIS, since the speaker's own voice is present and predictable from the forward model on every trial. Statements such as "Moreover, when the two speech signals come close enough in time, the patient possibly perceives them as its own voice" are highly speculative and apparently not supported by the data.

      We thank the reviewer for raising thoughtful concerns about our interpretation of the observed neural suppression as related to speaker-induced suppression (SIS). We agree that our study lacks a passive listening condition, which limits direct comparisons to the original SIS effect, traditionally defined as the suppression of neural responses to self-produced speech compared to externally-generated speech (Meekings & Scott, 2021).

      In response, we have reconsidered our terminology and interpretation. In the revised Discussion section, we refer to our findings as a "SIS-related phenomenon specific to the synchronous speech context". Unlike classic SIS paradigms, our interactive task involves simultaneous monitoring of self- and externally-generated speech, introducing additional attentional and coordinative demands.

      The revised Discussion also incorporates findings by Ozker et al. (2022, 2024), which link SIS and speech monitoring, suggesting that suppressing responses to self-generated speech facilitates error detection. We propose that the decrease in high-frequency activity (HFa) as verbal coordination increases reflects reduced error signals due to closer alignment between perceived and produced speech. Conversely, HFa increases with reduced coordination may signify greater prediction error.

      Additionally, we relate our findings to the "rubber voice" effect (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021), where temporally and phonetically congruent external speech can be perceived as self-generated. We speculate that this may occur in synchronous speech tasks when the participant's and VP's speech signals closely align. However, this interpretation remains speculative, as no subjective reports were collected to confirm this perception. Future studies could include participant questionnaires to validate this effect and relate subjective experience to neural measures of synchronization.

      Overall, our findings extend the study of SIS to dynamic, interactive contexts and contribute to understanding internal forward models of speech production in more naturalistic scenarios.

      We have now added these points to the discussion as follows:

      “The observed negative correlation between verbal coordination and high-frequency activity (HFa) in STG BA22 suggests a suppression of neural responses as the degree of behavioural synchrony increases. This result is reminiscent of findings on speaker-induced suppression (SIS), where neural activity in auditory cortex decreases during self-generated speech compared to externally-generated speech (Meekings & Scott, 2021; Niziolek et al., 2013). However, our paradigm differs from traditional SIS studies in two critical ways: (1) the speaker's own voice is always present and predictable from the forward model, and (2) no passive listening condition was included. Therefore, our findings cannot be directly equated with the original SIS effect.

      Instead, we propose that the suppression observed here reflects a SIS-related phenomenon specific to the synchronous speech context. Synchronous speech requires simultaneous monitoring of self- and externally-generated speech, a task that is both attentionally demanding and coordinative. This aligns with evidence from Ozker et al. (2024, 2022), showing that the same neural populations in STG exhibit SIS and heightened responses to feedback perturbations. These findings suggest that SIS and speech monitoring are related processes, where suppressing responses to self-generated speech facilitates error detection. In our study, suppression of HFa as coordination increases may reflect reduced prediction errors due to closer alignment between perceived and produced speech signals. Conversely, increased HFa during poor coordination may signify greater mismatch, consistent with prediction error theories (Houde & Nagarajan, 2011; Friston et al., 2020). Furthermore, when self- and externally-generated speech signals are temporally and phonetically congruent, participants may perceive external speech as their own. This echoes the "rubber voice" effect, where external speech resembling self-produced feedback is perceived as self-generated (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021). While this interpretation remains speculative, future studies could incorporate subjective reports to investigate this phenomenon in more detail.”

      References cited:

      – Franken, M. K., Hartsuiker, R. J., Johansson, P., Hall, L., & Lind, A. (2021). Speaking With an Alien Voice: Flexible Sense of Agency During Vocal Production. Journal of Experimental Psychology-Human perception and performance, 47(4), 479-494. https://doi.org/10.1037/xhp0000799

      – Houde, J. F., & Nagarajan, S. S. (2011). Speech production as state feedback control. Frontiers in human neuroscience, 5, 82.

      – Lind, A., Hall, L., Breidegard, B., Balkenius, C., & Johansson, P. (2014). Speakers' acceptance of real-time speech exchange indicates that we use auditory feedback to specify the meaning of what we say. Psychological Science, 25(6), 1198-1205. https://doi.org/10.1177/0956797614529797

      – Meekings, S., & Scott, S. K. (2021). Error in the Superior Temporal Gyrus? A Systematic Review and Activation Likelihood Estimation Meta-Analysis of Speech Production Studies. Journal of Cognitive Neuroscience, 33(3), 422-444. https://doi.org/10.1162/jocn_a_01661

      – Niziolek, C. A., Nagarajan, S. S., & Houde, J. F. (2013). What does motor efference copy represent? Evidence from speech production. Journal of Neuroscience, 33, 16110–16116.

      – Ozker, M., Doyle, W., Devinsky, O., & Flinker, A. (2022). A cortical network processes auditory error signals during human speech production to maintain fluency. PLoS Biology, 20.

      – Ozker, M., Yu, L., Dugan, P., Doyle, W., Friedman, D., Devinsky, O., & Flinker, A. (2024). Speech-induced suppression and vocal feedback sensitivity in human cortex. eLife, 13, RP94198. https://doi.org/10.7554/eLife.94198

      – Zheng, Z. Z., MacDonald, E. N., Munhall, K. G., & Johnsrude, I. S. (2011). Perceiving a Stranger's Voice as Being One's Own: A 'Rubber Voice' Illusion? PLOS ONE, 6(4), e18655.

      (5) There are some seemingly arbitrary decisions made in the design and analysis that, while likely justified, need to be explained. For example, how were the cutoffs for moderate coupling vs phase-shifted coupling (k ~0.09) determined? This is noted as "rather weak" (line 212), but it's not clear where this comes from. Similarly, the ROI-based analyses are only done on regions "recorded in at least 7 patients" - how was this number chosen? How many electrodes total does this correspond to? Is there heterogeneity within each ROI?

      The reviewer is correct, and we apologize for this missing information. We now specify that the coupling values were empirically determined on the basis of a pilot experiment in order to induce more or less synchronization, while keeping the phase-shifted coupling at a rather implicit level.

      Concerning the definition of coupling as weak, one should consider that, in the Kuramoto model, the strength of coupling (k) is relative to the spread of the natural frequencies (Δω) in the system. In our study, the natural frequencies of syllables range approximately from 2 Hz to 10 Hz, resulting in a frequency spread of Δω = 8 Hz. For coupling to strongly synchronize oscillators across such a wide range, k must be comparable to or exceed Δω. Since k = 0.1 is much smaller than Δω, it is classified as weak coupling.
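
      The argument above can be illustrated with the textbook case of two symmetrically coupled Kuramoto oscillators, whose phase difference φ obeys dφ/dt = Δω - 2k sin(φ) and can lock only when |Δω| ≤ 2k. The sketch below is not the virtual partner implementation: the function name, the Euler integration, the treatment of Δω and k in the same arbitrary units, and the strong-coupling value k = 5 are assumptions chosen for contrast.

```python
import math


def phase_difference_after(delta_omega, k, duration=20.0, dt=0.001):
    """Euler-integrate d(phi)/dt = delta_omega - 2*k*sin(phi) from phi = 0.

    Returns the unwrapped phase difference reached after `duration`.
    """
    phi = 0.0
    for _ in range(int(duration / dt)):
        phi += (delta_omega - 2.0 * k * math.sin(phi)) * dt
    return phi


# Weak coupling (k far below the spread): the phase difference keeps drifting.
weak = phase_difference_after(delta_omega=8.0, k=0.1)
# Assumed strong coupling (2k > spread): the phase difference settles.
strong = phase_difference_after(delta_omega=8.0, k=5.0)
print(f"weak coupling drift: {weak:.1f} rad, strong coupling: {strong:.3f} rad")
```

      In this toy setting the weakly coupled pair accumulates a large phase drift, whereas the strongly coupled pair settles at a fixed phase difference of asin(Δω / 2k).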

      We have now modified the Materials and methods section as follows:

      “More precisely, for a third of the trials the VP had a neutral behaviour (close to zero coupling: k = +/- 0.01). For a third it had a moderate coupling, meaning that the VP synchronised more to the participant speech (k = -0.09). And for the last third of the trials the VP had a moderate coupling but with a phase shift of pi/2, meaning that it moderately aimed to speak in between the participant syllables (k = + 0.09). The coupling values were empirically determined on the basis of a pilot experiment in order to induce more or less synchronization but keeping the phase-shifted coupling at a rather implicit level. In other terms, while participants knew that the VP would adapt, they did not necessarily know in which direction the coupling went.”

      Regarding the criterion of including regions recorded in at least 7 patients, our goal was to balance data completeness with statistical power. Given our total sample of 16 patients, this threshold ensures that each included region is represented in at least ~44% of the cohort, reducing the likelihood of spurious findings due to extremely small sample sizes. This choice also aligns with common neurophysiological analysis practices, where a minimum number of subjects (at least 2 in extreme cases) is required to achieve meaningful interindividual comparisons while avoiding excessive data exclusion. Additionally, this threshold maintains a reasonable tradeoff between maximizing patient inclusion and ensuring that statistical tests remain robust.

      We have now added more information in the Results section “Spectral profiles in the language network are nuanced by behaviour” on this point as follows:

      “To balance data completeness and statistical power, we included only brain regions recorded in at least 7 patients (~44% of the cohort) for the left hemisphere and at least 5 patients for the right hemisphere (~31% of the cohort), ensuring sufficient representation while minimizing biases due to sparse data.”
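
      As a quick check of the percentages quoted above, the inclusion rule can be sketched as follows. This is only an illustration: the helper name, the region labels and the per-region counts are hypothetical placeholders, not data from the study. Only the thresholds (7 and 5 patients) and the cohort size (16) come from the text.

```python
def included_regions(patients_per_region, min_patients, cohort_size):
    """Return {region: fraction of cohort} for regions recorded in at
    least `min_patients` patients."""
    return {
        region: n / cohort_size
        for region, n in patients_per_region.items()
        if n >= min_patients
    }


# Hypothetical counts, for illustration only.
example_counts = {"region_A": 9, "region_B": 7, "region_C": 3}
kept = included_regions(example_counts, min_patients=7, cohort_size=16)

# Threshold as a share of the 16-patient cohort.
left_share = round(7 / 16 * 100)   # 44
right_share = round(5 / 16 * 100)  # 31
```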

      Reviewer #2 (Public Review):

      Summary:

      This paper investigates the neural underpinnings of an interactive speech task requiring verbal coordination with another speaker. To achieve this, the authors recorded intracranial brain activity from the left hemisphere in a group of drug-resistant epilepsy patients while they synchronised their speech with a 'virtual partner'. Crucially, the authors were able to manipulate the degree of success of this synchronisation by programming the virtual partner to either actively synchronise or desynchronise their speech with the participant, or else to not vary its speech in response to the participant (making the synchronisation task purely one-way). Using such a paradigm, the authors identified different brain regions that were either more sensitive to the speech of the virtual partner (primary auditory cortex), or more sensitive to the degree of verbal coordination (i.e. synchronisation success) with the virtual partner (secondary auditory cortex and IFG). Such sensitivity was measured by (1) calculating the correlation between the index of verbal coordination and mean power within a range of frequency bands across trials, and (2) calculating the phase-amplitude coupling between the behavioural and brain signals within single trials (using the power of high-frequency neural activity only). Overall, the findings help to elucidate some of the left hemisphere brain areas involved in interactive speaking behaviours, particularly highlighting the high-frequency activity of the IFG as a potential candidate supporting verbal coordination.

      Strengths:

      This study provides the field with a convincing demonstration of how to investigate speaking behaviours in more complex situations that share many features with real-world speaking contexts e.g. simultaneous engagement of speech perception and production processes, the presence of an interlocutor, and the need for inter-speaker coordination. The findings thus go beyond previous work that has typically studied solo speech production in isolation, and represent a significant advance in our understanding of speech as a social and communicative behaviour. It is further an impressive feat to develop a paradigm in which the degree of cooperativity of the synchronisation partner can be so tightly controlled; in this way, this study combines the benefits of using prerecorded stimuli (namely, the high degree of experimental control) with the benefits of using a live synchronisation partner (allowing the task to be truly two-way interactive, an important criticism of other work using pre-recorded stimuli). A further key strength of the study lies in its employment of stereotactic EEG to measure brain responses with both high temporal and spatial resolution, an ideal method for studying the unfolding relationship between neural processing and this dynamic coordination behaviour.

      We sincerely appreciate the Reviewer's thoughtful and positive feedback on our manuscript.

      Weaknesses:

      One major limitation of the current study is the lack of coverage of the right hemisphere by the implanted electrodes. Of course, electrode location is solely clinically motivated, and so the authors did not have control over this. However, this means that the current study neglects the potentially important role of the right hemisphere in this task. The right hemisphere has previously been proposed to support feedback control for speech (likely a core process engaged by synchronous speech), as opposed to the left hemisphere which has been argued to underlie feedforward control (Tourville & Guenther, 2011). Indeed, a previous fMRI study of synchronous speech reported the engagement of a network of right hemisphere regions, including STG, IPL, IFG, and the temporal pole (Jasmin et al., 2016). Further, the release from speech-induced suppression during a synchronous speech reported by Jasmin et al. was found in the right temporal pole, which may explain the discrepancy with the current finding of reduced leftward high-frequency activity with increasing verbal coordination (suggesting instead increased speech-induced suppression for successful synchronisation). The findings should therefore be interpreted with the caveat that they are limited to the left hemisphere, and are thus likely missing an important aspect of the neural processing underpinning verbal coordination behaviour.

      We have now included, in the supplementary materials, data from the right hemisphere, although the coverage is a bit sparse (Figures S2, S4, S5, see our responses in the ‘Recommendation for the authors’ section, below). We have also revised the Discussion section to add the putative role of right temporal regions (see below as well).

      A further limitation of this study is that its findings are purely correlational in nature; that is, the results tell us how neural activity correlates with behaviour, but not whether it is instrumental in that behaviour. Elucidating the latter would require some form of intervention such as electrode stimulation, to disrupt activity in a brain area and measure the resulting effect on behaviour. Any claims as to the specific role of brain areas in verbal coordination (e.g. the role of the IFG in supporting online coordinative adjustments to achieve synchronisation) are therefore speculative.

      We appreciate the reviewer’s observation regarding the correlational nature of our findings and agree that this is a common limitation of neuroimaging studies. While elucidating causal relationships would indeed require intervention techniques such as electrical stimulation, our study leverages the unique advantages of intracerebral recordings, offering the best available spatial and temporal resolution alongside a high signal-to-noise ratio. These attributes ensure that our data accurately reflect neural activity and its temporal dynamics, providing a robust foundation for understanding the relationship between neural processes and behaviour. Therefore, while causal claims are beyond the scope of this study, the precision of our methodology allows us to make well-supported observations about the neural correlates of synchronous speech tasks.

      Recommendations for the authors:

      Reviewing Editor Comment:

      After joint consultation, we are seeing the potential for the report to be strengthened and the evidence here to be deemed ultimately at least 'solid': to us (editors and reviewers) it seems that this would require both (1) clarifying/acknowledging the limitations of not having right hemisphere data, and (2) running some of the additional analyses the reviewers suggest, which should allow for richer examination of the data e.g. phase relationships in areas that correlate with synchronisation.

      We have now added data on the right hemisphere (RH) that we did not previously report due to a rather sparse sampling of the RH. These results are now reported in the Results section as well as in the Supplementary section, where we put all right hemisphere figures for all analyses (Figure S2, S4, S5). We have also run additional analyses digging into the phase relationship in areas that correlate with synchronisation (Figure S6). These additional analyses allowed us to improve the Discussion section as well.

      Reviewer #1 (Recommendations For The Authors):

      In some sections, the writing is a bit unclear, with both typos and vague statements that could be fixed with careful proofreading.

      We thank the reviewer for pointing out areas where the writing could be improved. We carefully proofread the manuscript to address typos and clarify any vague statements. Specific sections identified as unclear have been rephrased for better precision and readability.

      In Figure 1, the colors repeat, making it impossible to tell patients apart.

      We have now updated Figure 1 colormap to avoid redundancy and added the right hemisphere.

      Line 132: "16 unilateral implantations (9 left, 7 bilateral implantations)". Should this say 7 right hemisphere? If so, the following sentence stating that there was "insufficient cover [sic] of the right hemisphere" is unclear, since the number of patients between LH and RH is similar.

      The confusion arose because the lateralization refers exclusively to the presence or absence of electrodes in Heschl’s gyrus (left: H’; right: H).

      We have thus changed this section as follows:

      “16 patients (7 women, mean age 29.8 y, range 17 - 50 y) with pharmacoresistant epilepsy took part in the study. They were included if their implantation map covered at least partially the Heschl's gyrus and had sufficiently intact diction to support relatively sustained language production.” The relevant part (previously line 132) now states:

      “Sixteen patients with a total of 236 electrodes (145 in the left hemisphere) and 2395 contacts (1459 in the left hemisphere, see Figure 1). While this gives a rather sparse coverage of the right hemisphere, we decided, due to the rarity of this type of data, to report results for both hemispheres, with figures for the left hemisphere in the main text and figures for the right hemisphere in the supplementary section.”

      Reviewer #2 (Recommendations For The Authors):

      (1) To address the concern regarding the absence of data from the right hemisphere, I would advise the authors to directly acknowledge this limitation in their Discussion section, citing relevant work suggesting that the right hemisphere has an important role to play in this task (e.g. Jasmin et al., 2016). You should also make this clear in your abstract e.g. you could rewrite the sentence in line 40 to be: "Then, we recorded the intracranial brain activity of the left hemisphere in 16 patients with drug-resistant epilepsy...".

      We are grateful to the reviewer for this comment, which prompted us to look into the right hemisphere data. We have now included results for the right hemisphere, although the coverage is a bit sparse. We have also revised the Discussion section to add the putative role of right temporal regions. Interestingly, our results show, as suggested by the reviewer, a clear involvement of the RH in this task.

      First, the full brain analyses show an involvement of the RH very similar to that of the LH (see Figure below). We have now added in the Results section:

      “As expected, the whole language network is strongly involved, including both dorsal and ventral pathways (Fig 3A). More precisely, in the left temporal lobe the superior, middle and inferior temporal gyri, in the left parietal lobe the inferior parietal lobule (IPL) and in the left frontal lobe the inferior frontal gyrus (IFG) and the middle frontal gyrus (MFG). Similar results are observed in the right hemisphere, neural responses being present across all six frequency bands with medium to large modulation in activity compared to baseline (Figure S2A) in the same regions. Desynchronizations are present in the theta, alpha and beta bands while the low gamma and HFa bands show power increases.”

      Compared to the left hemisphere, assessing brain-behaviour correlations in the right hemisphere does not provide the same statistical power, because some anatomical regions have very few electrodes. Nonetheless, we observe a strong correlation in the right IFG, similar to the one we previously reported in the left hemisphere, and we now report in the Results section:

      “The decrease in HFa along the dorsal pathway is replicated in the right hemisphere (Figure S4). However, while both the right STG BA41/42 and STG BA22 present a power increase (compared to baseline), with a stronger increase for the STG BA41/42, neither shows a significant correlation with verbal coordination (t(45)=-1.65, p=.1 ; t(8)=-0.67, p=.5 ; Student’s T test, FDR correction). By contrast, results in the right IFG BA44 are similar to those observed in the left hemisphere with a significant power increase associated with a negative brain-behaviour correlation (t(17) = -3.11, p = .01 ; Student’s T test, FDR correction).”

      Interestingly, the phase-amplitude coupling analysis yields very similar results in both hemispheres (with the exception of BA22). We have thus updated the Results section as follows:

      “Notably, when comparing – within the regions of interest previously described – the PAC with the virtual partner speech and the PAC with the phase difference, the coupling relationship changes when moving along the dorsal pathway: a stronger coupling in the auditory regions with the speech input, no difference between speech and coordination dynamics in the IPL and a stronger coupling for the coordinative dynamics compared to speech signal in the IFG (Figure 5B). When looking at the right hemisphere, we observe the same changes in the coupling relationship when moving along the dorsal pathway, except that no difference between speech and coordination dynamics is present in the right secondary auditory regions (STG BA22; Figure S5).”
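
      For readers less familiar with phase-amplitude coupling, the following sketch shows one common estimator, the mean vector length. The passage above does not state which PAC estimator was used, so this is an assumption made purely for illustration. The function name and the synthetic signals are hypothetical. In a real analysis the phases would come from the behavioural signal (the VP speech or the phase difference) and the amplitudes from the high-frequency activity envelope, usually with a surrogate-based normalisation that is omitted here.

```python
import cmath
import math


def mean_vector_length(phases, amplitudes):
    """Mean vector length: |mean(A_t * exp(i * phi_t))|.
    Large values mean the amplitude is systematically higher
    at a preferred phase; values near zero mean no coupling."""
    if not phases or len(phases) != len(amplitudes):
        raise ValueError("phases and amplitudes must be non-empty and equal in length")
    total = sum(a * cmath.exp(1j * p) for p, a in zip(phases, amplitudes))
    return abs(total) / len(phases)


# Synthetic example: phases sampled uniformly over one full cycle.
n = 1000
phases = [2 * math.pi * i / n for i in range(n)]

coupled = [1 + math.cos(p) for p in phases]  # amplitude peaks at phase 0
uncoupled = [1.0] * n                        # amplitude ignores the phase

mvl_coupled = mean_vector_length(phases, coupled)
mvl_uncoupled = mean_vector_length(phases, uncoupled)
```

      Comparing the estimate obtained with two different phase signals for the same amplitude series mirrors, in a very simplified way, the contrast between coupling to the speech input and coupling to the coordination dynamics.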

      We also included the right hemisphere results in the Discussion section, mentioning the previous work of Guenther and of Jasmin. In the section “Left secondary auditory regions are more sensitive to coordinative behaviour” one can read:

      “Furthermore, the absence of correlation in the right STG BA22 (Figure S4) seems at first glance to challenge influential speech production models (e.g. Guenther & Hickok, 2016) that propose that the right hemisphere is involved in feedback control. However, one needs to consider that the task at stake relied heavily upon temporal mismatches and adjustments. In this context, the left-lateralized sensitivity to verbal coordination is reminiscent of the work of Floegel and colleagues (2020, 2023), suggesting that both hemispheres are involved depending on the type of error: the right auditory association cortex monitoring preferentially spectral speech features and the left auditory association cortex monitoring preferentially temporal speech features. Nonetheless, the right temporal pole seems to be sensitive to speech coordinative behaviour, confirming previous findings using fMRI (Jasmin et al., 2016) and thus showing that the right hemisphere has an important role to play in this type of task (e.g. Jasmin et al., 2016).”

      References cited:

      – Floegel, M., Fuchs, S., & Kell, C. A. (2020). Differential contributions of the two cerebral hemispheres to temporal and spectral speech feedback control. Nature Communications, 11(1), 2839.

      – Floegel, M., Kasper, J., Perrier, P., & Kell, C. A. (2023). How the conception of control influences our understanding of actions. Nature Reviews Neuroscience, 24(5), 313-329.

      – Guenther, F. H., & Hickok, G. (2016). Neural models of motor speech control. In Neurobiology of language (pp. 725-740). Academic Press.

      (2) When discussing previous work on alignment during synchronous speech, you may wish to include a recently published paper by Bradshaw et al (2024); this manipulated the acoustics of the accompanist's voice during a synchronous speech task to show interactions between speech motor adaptation and phonetic convergence/alignment.

      We thank the reviewer for pointing to this recent and interesting paper. We added the article as a reference as follows:

      “Furthermore, synchronous speech favors the emergence of alignment phenomena, for instance of the fundamental frequency or the syllable onset (Assaneo et al., 2019 ; Bradshaw & McGettigan, 2021 ; Bradshaw et al., 2023; Bradshaw et al., 2024).”

      (3) Line 80: "Synchronous speech resembles to a certain extent to delayed auditory feedback tasks"- I think you mean "altered auditory feedback tasks" here.

      In the case of synchronous speech, the issue is more about timing than about altered speech signals, which is why the comparison is made with delayed rather than altered auditory feedback. Nonetheless, we understand the Reviewer’s point and have now changed the sentence as follows:

      “Synchronous speech resembles, to a certain extent, delayed/altered auditory feedback tasks”

      (4) When discussing superior temporal responses during such altered feedback tasks, you may also want to cite a review paper by Meekings and Scott (2021).

      We thank the reviewer for this suggestion, indeed this was a big oversight!

      The paper is now quoted in the introduction as follows:

      “Previous studies have revealed increased responses in the superior temporal regions compared to normal feedback conditions (Hirano et al., 1997 ; Hashimoto & Sakai, 2003 ; Takaso et al., 2010 ; Ozker et al., 2022 ; Floegel et al., 2020 ; see Meekings & Scott, 2021 for a review of error-monitoring and feedback control in the STG during speech production).”

      Furthermore, we updated the part of the discussion concerning the speaker-induced suppression phenomenon (see our response to point 10 below).

      (5) Line 125: "The parameters and sound adjustment were set using an external low-latency sound card (RME Babyface Pro Fs)". Can you please report the total feedback loop latency in your set-up? Or at the least cite the following paper which reports low latencies with this audio device.

      Kim, K. S., Wang, H., & Max, L. (2020). It's About Time: Minimizing Hardware and Software Latencies in Speech Research With Real-Time Auditory Feedback. Journal of Speech, Language, and Hearing Research, 63(8), 2522-2534. https://doi.org/10.1044/2020_JSLHR-19-00419

      We now report the total feedback loop latency (~5 ms) and also cite the relevant paper (Kim et al., 2020).

      (6) Line 127 "A calibration was made to find a comfortable volume and an optimal balance for both the sound of the participant's own voice, which was fed back through the headphones, and the sound of the stimuli." What do you mean here by an 'optimal balance'? Was the participant's own voice always louder than the VP stimuli? Can you report roughly what you consider to be a comfortable volume in dB?

      This point was indeed unclear. We have now changed the text as follows:

      “A calibration was made to find a comfortable volume and an optimal balance for both the sound of the participant's own voice, which was fed back through the headphones, and the sound of the stimuli. The aim of this procedure was that the patient would subjectively perceive their voice and the VP voice in equal measure. The VP voice was delivered at approximately 70 dB.”

      (7) Relatedly, did you use any noise masking to mask the air-conducted feedback from their own voice (which would have been slightly out of phase with the feedback through the headphones, depending on your latency)?

      Given the low latency afforded by the sound card (RME Babyface Pro Fs), we did not use noise masking to mask the air-conducted feedback from the patients' own voice.

      (8) Line 141: "four short sentences were pre-recorded by a woman and a man." Did all participants synchronise with both the man and woman or was the VP gender matched to that of the participant/patient?

      We thank the reviewer for pointing out this important missing detail. We have now changed the text as follows:

      “Four stimuli corresponding to four short sentences were pre-recorded by both a female and a male speaker. This allowed us to adapt to the natural gender differences in fundamental frequency (i.e. so that the VP gender matched that of the patients). All stimuli were normalised in amplitude.”

      (9) Can you clarify what instructions participants were given regarding the VP? That is, were they told that this was a recording or a real live speaker? Were they naïve to the manipulation of the VP's coupling to the participant?

      We have now added this information to the task description as follows:

      “Participants, comfortably seated in a medical chair, were instructed that they would perform a real-time interactive synchronous speech task with an artificial agent (Virtual Partner, henceforth VP, see next section) that can modulate and adapt to the participant’s speech in real time.”

      “The third step was the actual experiment. This was identical to the training but consisted of 24 trials (14s long, speech rate ~3Hz, yielding ~1000 syllables). Importantly, the VP varied its coupling behaviour to the participant. More precisely, for a third of the sequences the VP had a neutral behaviour (close to zero coupling : k = +/- 0.01). For a third it had a moderate coupling, meaning that the VP synchronised more to the participant speech (k = - 0.09). And for the last third of the sequences the VP had a moderate coupling but with a phase shift of pi/2, meaning that it moderately aimed to speak in between the participant syllables (k = + 0.09). The coupling values were empirically determined on the basis of a pilot experiment in order to induce more or less synchronization, but keeping the phase-shifted coupling at a rather implicit level. In other terms, while participants knew that the VP would adapt, they did not necessarily know in which direction the coupling went.”  

      (10) The paragraph from line 438 entitled "Secondary auditory regions are more sensitive to coordinative behaviour" includes an interesting discussion of the relation of the current findings to the phenomenon of speech-induced suppression (SIS). However, the authors appear to equate the observed decrease in high-frequency activity as speech coordination increases with the phenomenon of SIS (in lines 456-457), which is quite a speculative leap. I would encourage the authors to temper this discussion by referring to SIS as a potentially related phenomenon, with a need for more experimental work to determine if this is indeed the same phenomenon as the decreases in high-frequency power observed here. I believe that the authors are arguing here for an interpretation of SIS as reflecting internal modelling of sensory input regardless of whether this is self-generated or other-generated; if this is indeed the case, I would ask the authors to be more explicit here that these ideas are not a standard part of the traditional account of SIS, which only includes internal modelling of self-produced sensory feedback.

      As stated in the public review, we thank both reviewers for raising thoughtful concerns about our interpretation of the observed neural suppression as related to speaker-induced suppression (SIS). We agree that our study lacks a passive listening condition, which limits direct comparisons to the original SIS effect, traditionally defined as the suppression of neural responses to self-produced speech compared to externally-generated speech (Meekings & Scott, 2021).

      In response, we have reconsidered our terminology and interpretation. In the revised discussion, we refer to our findings as a "SIS-related phenomenon specific to the synchronous speech context." Unlike classic SIS paradigms, our interactive task involves simultaneous monitoring of self- and externally-generated speech, introducing additional attentional and coordinative demands.

      The revised discussion also incorporates findings by Ozker et al. (2024, 2022), which link SIS and speech monitoring, suggesting that suppressing responses to self-generated speech facilitates error detection. We propose that the decrease in high-frequency activity (HFa) as verbal coordination increases reflects reduced error signals due to closer alignment between perceived and produced speech. Conversely, HFa increases with reduced coordination may signify greater prediction error.

      Additionally, we relate our findings to the "rubber voice" effect (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021), where temporally and phonetically congruent external speech can be perceived as self-generated. We speculate that this may occur in synchronous speech tasks when the participant's and VP's speech signals closely align. However, this interpretation remains speculative, as no subjective reports were collected to confirm this perception. Future studies could include participant questionnaires to validate this effect and relate subjective experience to neural measures of synchronization.

      Overall, our findings extend the study of SIS to dynamic, interactive contexts and contribute to understanding internal forward models of speech production in more naturalistic scenarios.

      We have now added these points to the discussion as follows:

      “The observed negative correlation between verbal coordination and high-frequency activity (HFa) in STG BA22 suggests a suppression of neural responses as the degree of synchrony increases. This result aligns with findings on speaker-induced suppression (SIS), where neural activity in auditory cortex decreases during self-generated speech compared to externally-generated speech (Meekings & Scott, 2021; Niziolek et al., 2013). However, our paradigm differs from traditional SIS studies in two critical ways: (1) the speaker's own voice is always present and predictable from the forward model, and (2) no passive listening condition was included. Therefore, our findings cannot be directly equated with the original SIS effect.

      Instead, we propose that the suppression observed here reflects a SIS-related phenomenon specific to the synchronous speech context. Synchronous speech requires simultaneous monitoring of self- and externally generated speech, a task that is both attentionally demanding and coordinative. This aligns with evidence from Ozker et al. (2024, 2022), showing that the same neural populations in STG exhibit SIS and heightened responses to feedback perturbations. These findings suggest that SIS and speech monitoring are related processes, where suppressing responses to self-generated speech facilitates error detection.

      In our study, suppression of HFa as coordination increases may reflect reduced prediction errors due to closer alignment between perceived and produced speech signals. Conversely, increased HFa during poor coordination may signify greater mismatch, consistent with prediction error theories (Houde & Nagarajan, 2011; Friston et al., 2020).”

      (11) Within this section, you also speculate in line 460 that "Moreover, when the two speech signals come close enough in time, the patient possibly perceives them as its own voice." I would recommend citing studies on the 'rubber voice' effect to back up this claim (e.g. Franken et al., 2021; Lind et al., 2014; Zheng et al., 2011).

      We are grateful to the Reviewer for this interesting suggestion. Directly following the previous comment, the section now states:

      “Furthermore, when self- and externally-generated speech signals are temporally and phonetically congruent, participants may perceive external speech as their own. This echoes the "rubber voice" effect, where external speech resembling self-produced feedback is perceived as self-generated (Zheng et al., 2011; Lind et al., 2014; Franken et al., 2021). While this interpretation remains speculative, future studies could incorporate subjective reports to investigate this phenomenon in more detail.”

      (12) As noted in my public review, since your methods are correlational, you need to be careful about inferring the causal role of any brain areas in supporting a specific aspect of functioning e.g. line 501-504: "By contrast, in the inferior frontal gyrus, the coupling in the high-frequency activity is strongest with the input-output phase difference (input of the VP - output of the speaker), a metric that reflects the amount of error in the internal computation to reach optimal coordination, which indicates that this region optimises the predictive and coordinative behaviour required by the task." I would argue that the latter part of this sentence is a conclusion that, although consistent with, goes beyond the current data in this study, and thus needs tempering.

      We agree with the Reviewer and changed the sentence as follows:

      “By contrast, in the inferior frontal gyrus, the coupling in the high-frequency activity is strongest with the input-output phase difference (input of the VP - output of the speaker), a metric that could possibly reflect the amount of error in the internal computation to reach optimal coordination. This indicates that this region could have an implication in the optimisation of the predictive and coordinative behaviour required by the task.”

    1. In 2019 the company Facebook (now called Meta) presented an internal study that found that Instagram was bad for the mental health of teenage girls, and yet they still allowed teenage girls to use Instagram. So, what does social media do to the mental health of teenage girls, and to all its other users?

      I think it’s a great question whether we should still allow people to use Instagram if it’s bad for us. If Meta’s own study shows that Instagram can be bad for teenage girls, shouldn’t we find ways to make it beneficial instead of just banning it? I would consider that a way to stop Instagram from taking customers away from Facebook. Also, the study indicated that in general this could be harmful, but there are people who benefit from this platform. It is unfair to close it just because it may be bad for teenage girls’ mental health. Finding ways to make it beneficial to mental health would be the right solution.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2025-02888

      Corresponding author(s): Christian Fankhauser

      General Statements

      We were pleased to see that the three reviewers found our work interesting and provided supportive and constructive comments.

      Our answers to their comments and/or how we propose to address them in a revised manuscript are included in bold.

      1. Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Plant systems sense shading by neighbors via the phytochrome signaling system. In the shade, PHYTOCHROME-INTERACTING FACTORS (PIFs) accumulate and are responsible for transcriptional reprogramming that enable plants to mobilize the "shade-avoidance response". Here, the authors have sought to examine the role of chromatin in modulating this response, specifically by examining whether "open" or "closed" chromatin regions spanning PIF target genes might explain the transcriptional output of these genes. They used a combination of ATAC-seq/CoP-qPCR (to detect open regions of chromatin), ChIP (to assay PIF binding) and RNA-seq (to measure transcript abundance) to understand how these processes may be mechanistically linked in Arabidopsis wild-type and pif mutant lines. They found that some chromatin accessibility changes do occur after LRFR (shade) treatment (32 regions after 1h and 61 after 25 h). While some of these overlap with PIF-binding sites, the authors found no correlation between open chromatin states and high levels of transcription. Because auxin is an important component of the shade-avoidance response and has been shown to control chromatin accessibility in other contexts, they examined whether auxin might be required for opening these regions of chromatin. They find that in an auxin biosynthesis mutant, there is a small subset of PIF target genes whose chromatin accessibility seems altered relative to the wild-type. Likewise, they found that chromatin accessibility for certain PIF targets is altered in phyB and pif mutants and propose that PIFs are necessary for changing the accessibility of chromatin in these genes. The authors thus propose that PIF occupancy of already open regions, rather than increased accessibility, underlies the increase in transcript abundance of these target genes in response to shade.

      Major comments: • I find that the data generally support the hypothesis presented in the manuscript that chromatin accessibility alone does not predict transcription of PIF target genes in the shade. That said, I think that a paragraph from the discussion (lines 321-332) would benefit from some careful rephrasing. I think it is perfectly reasonable to propose that PIF occupancy is more predictive of shade-induced transcriptional output than chromatin accessibility, but I think that calling PIF occupancy "the key drivers" (line 323) or "the main driving force" (line 76) risks ignoring the observation that levels of PIF occupancy specifically do not predict expression levels of PIF target genes (Pfeiffer et al., 2014, Mol Plant). For PIL1 and HFR1, the authors have shown that PIF promoter occupancy and transcript levels are correlated, but the central finding of Pfeiffer et al. was that this pattern does not apply to the majority of PIF direct target genes. Finding factors (i.e. histone marks) that convert PIF-binding information into transcriptional output appears to have been the impetus for the experiments devised in Willige et al., 2021 and Calderon et al., 2022. It is great that the authors have outlined in the discussion that there are a number of factors that modulate PIF transcriptional activating activity but I think that the emphasis on PIF-binding explaining transcript abundance should be moderated in the text.

      We appreciate the reviewer’s comment and will address it by introducing appropriate changes to the discussion. One element that should be pointed out is that the study of Willige et al., 2021 allows us to look at sites where PIF7 is recruited in response to the shade stimulus (a low R/FR treatment) and relate this to higher transcript abundance of the nearby genes. The study of Pfeiffer et al., 2014, which analyses PIF ChIP studies from several labs, does not include this dynamic view of PIF recruitment in response to a stimulus. For example, this study re-analyses data from our lab (Hornitschek et al., 2012), in which we did PIF5 ChIP in low R/FR, but we did not compare that to high R/FR to enable an analysis of sites where we see recruitment of PIF5 in response to a shade cue. In the revised manuscript we will also include a new figure comparing PIF7 recruitment and changes in gene expression at direct PIF target genes.

      • I think that the hypothesis could be further supported by incorporating the previously published ChIP-seq data on PIF1, PIF3 and PIF5 binding. Given these data are published/publicly available, I think it would be helpful to note which of the 72 DARs are bound by PIF1, PIF3 and/or PIF5. Especially so given that PIF5 (Lorrain et al., 2008, Plant J) and PIF1/PIF3 (Leivar et al., 2012, Plant Cell) contribute at least in some capacity to transcriptional regulation in response to shade. At the very least, it might help explain some of the observed increases in nucleosome accessibility observed for genes that don't have PIF4 or PIF7-binding.

      This is a thoughtful suggestion. Our choice to focus on PIF7 target genes is dictated by two reasons. First, amongst all tested PIFs, PIF7 is the major contributor to the control of low R/FR (neighbor proximity) induced responses in seedlings (e.g. Li et al., 2012; de Wit et al., 2016; Willige et al., 2021). Second, the PIF7 ChIP-seq and gene expression data from Willige et al., 2021 were obtained using growth conditions very similar to the ones we used, hence allowing us to compare them to our data. As the reviewer suggests, other PIFs also contribute to the low R/FR response and hence looking at ChIP-seq for those PIFs in publicly available data is also informative. One limitation of these data is that ChIP-seq was not always done in seedlings grown in conditions directly comparable to the conditions we used (except for PIF5, see above). Nevertheless, we have performed this analysis with the available data suggested by the reviewer and intend to include the results in the revised version of the manuscript, presumably in an updated Figure 4B.

      • In the manuscript, there are several instances where separate col-0 (wild type) controls have been used for identical experiments. Specifically, qPCR (Fig 3C, Fig S7C/D and Fig S8C/D), CoP-qPCR (Fig 5B/5C and Fig S8E/F) and hypocotyl measurements (Fig S7A/B and Fig S8A/B). In the cases of the hypocotyl measurements, there appear to be hardly any differences between col-0 controls indicating the measurements can be confidently compared between genotypes.

      We appreciate this comment but to be comprehensive, we like to include a Col-0 control for each experiment (whenever possible) and hence also include the data when available.

      • In some cases of qPCR and CoP-qPCR experiments however, the differences in values obtained from col-0 samples that underwent identical experimental treatments appear to vary significantly. In Figure 3C for example, the overall trend for PIL1 expression in col-0 is the same (e.g. HRFR levels are low, LRFR1 levels are much higher and LRFR25 levels drop down to some intermediate level) but the expression levels themselves appear to differ almost two-fold for the LRFR 1h timepoint (~110 on the left panel vs ~60 for the right panel). Given the size of the error bars, it appears that these data represent the mean from only one biological replicate. PIL1 expression levels at LRFR 1h as reported in Fig S7C and D also show similar ~2-fold differences.

      This is a good comment. Having looked at PIL1 gene induction by low R/FR in dozens of similar experiments, we have realized that while the PIL1 induction is always massive, its extent is somewhat variable. Based on the data that we have (including from RNA-seq), we are convinced that this is due to the very low level of PIL1 expression in high R/FR conditions. Given that induction by low R/FR is expressed as fold increase relative to baseline high R/FR expression, small changes in the lowly expressed PIL1 in high R/FR lead to seemingly significant differences in its induction by low R/FR across experiments.

      All qPCR data is represented by three biological replicates, and the variation between them per experiment is low, which is reflected in the size of the SD error bars. Data on technical and biological replicates in each panel will be clearly indicated in the revised figure legends.
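The baseline argument made above for PIL1 can be illustrated with simple arithmetic. The sketch below uses made-up numbers (they are not measurements from the manuscript) to show how the same induced expression level yields a roughly two-fold difference in fold induction when a very low baseline shifts slightly.

```python
def fold_induction(induced: float, baseline: float) -> float:
    """Expression after low R/FR relative to the high R/FR baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return induced / baseline


# Hypothetical values in arbitrary units, chosen only for illustration.
induced_level = 55.0  # identical induced level in both experiments
baselines = {"experiment_A": 0.5, "experiment_B": 1.0}  # low, slightly different baselines

folds = {name: fold_induction(induced_level, b) for name, b in baselines.items()}
for name, fold in folds.items():
    print(f"{name}: baseline={baselines[name]}, fold induction={fold:.0f}")
```

A baseline difference of only 0.5 arbitrary units doubles the apparent induction here, which is the kind of effect the response attributes to the low high-R/FR expression of PIL1.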

      • I would recommend that the authors explicitly describe the number of biological replicates used for each experiment in the methods section. If indeed these experiments were only performed once, I think the authors should be very careful in the language used in describing their conclusions and in assigning statistical significance. One possibility that could also be helpful would be normalizing LRFR 1h and LRFR 25h values to HRFR values and plotting these data somewhere in the supplemental data. If, for example, the relative levels of PIL1 are different between replicates but the fold-induction between HRFR and LRFR 1h are the same, this would at least allay any concerns that the experimental treatments were not the same. I understand that doing so precludes comparison between genotypes, but I do think it's important to show that at least the control data are comparable between experiments.

      All qPCR and CoP-qPCR experiments have been performed with three biological replicates as described in the Materials and Methods section, and these are represented in the Figures. Relative gene expression in the qPCR experiments was normalized to two housekeeping genes, YLS8 and UBC21, and afterwards to one biological replicate of the Col-0 control in HRFR. As indicated for the previous comment, information about replicates will be included in the updated figure legends.
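The two-step normalization described above (reference genes first, then a Col-0 HRFR calibrator) can be sketched as follows. This is a minimal illustration that assumes a ΔΔCt-style calculation with 100% amplification efficiency; the response does not state the exact formula used, and all Ct values and function names below are hypothetical.

```python
from statistics import mean


def delta_ct(ct_target: float, ct_references: list[float]) -> float:
    """Target Ct minus the mean Ct of the reference genes (e.g. YLS8 and UBC21)."""
    return ct_target - mean(ct_references)


def relative_expression(sample_dct: float, calibrator_dct: float) -> float:
    """Expression relative to the calibrator, assuming perfect doubling per cycle."""
    return 2.0 ** -(sample_dct - calibrator_dct)


# Hypothetical Ct values: [YLS8, UBC21] are the two reference genes.
calibrator_dct = delta_ct(30.0, [20.0, 22.0])  # one Col-0 HRFR replicate
shade_dct = delta_ct(24.0, [20.0, 22.0])       # a Col-0 sample after 1h LRFR

shade_relative = relative_expression(shade_dct, calibrator_dct)
calibrator_relative = relative_expression(calibrator_dct, calibrator_dct)
print(shade_relative, calibrator_relative)
```

By construction the calibrator replicate has a relative expression of 1, so every other value reads directly as fold change against that single Col-0 HRFR replicate.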

      • Similarly, for the CoP-qPCR experiments presented in Fig 5B and 5C, the col-0 values for region P2 between Fig 5B and 5C shows that while HRFR and LRFR 1h look comparable, the values presented for LRFR 25h are quite different.

      This comment of the reviewer prompted us to propose a different, clearer way of representing the data (new Figure 5B and 5C). We believe that this facilitates the comparison between the genotypes. Enrichment over the input was calculated for the chromatin accessibility of each region. Chromatin accessibility was further normalized against two open control regions on the promoters of ACT2 (AT3G18780, region chr3:6474579:6474676) and RNA polymerase II transcription elongation factor (AT1G71080, region chr1:26811833:26811945). The difference from the previous representation is that the Col-0 HRFR value is no longer additionally subtracted from each region. We will update the Materials and Methods and figure legend sections with this information.
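As a rough sketch of the calculation described above, enrichment over input can be computed per region and then divided by the enrichment of the open control regions. Two details are assumptions made here, since the response does not specify them: enrichment is taken as 2^(Ct_input - Ct_sample) without a dilution correction, and the two control regions are combined by an arithmetic mean. All Ct values are hypothetical.

```python
from statistics import mean


def enrichment_over_input(ct_input: float, ct_sample: float) -> float:
    """Signal in the accessible-chromatin sample relative to its input."""
    return 2.0 ** (ct_input - ct_sample)


def normalized_accessibility(region: tuple[float, float],
                             controls: list[tuple[float, float]]) -> float:
    """Region enrichment divided by the mean enrichment of the open control regions."""
    region_enrichment = enrichment_over_input(*region)
    control_enrichment = mean(enrichment_over_input(*c) for c in controls)
    return region_enrichment / control_enrichment


# Hypothetical (Ct_input, Ct_sample) pairs.
test_region = (25.0, 27.0)                    # e.g. a region of interest
control_regions = [(24.0, 25.0), (26.0, 27.0)]  # e.g. the ACT2 and AT1G71080 promoter regions

accessibility = normalized_accessibility(test_region, control_regions)
print(accessibility)
```

Because no genotype or light condition is used as a further reference point in this scheme, values from different genotypes remain directly comparable on the same scale.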

      Minor comments: • Presentation of Supplemental Figure 7A/7B and Supplemental Figure 8A/8B could be changed to make the data more clear (i.e. side-by-side rather than superimposed).

      We propose changing the presentation of the hypocotyl length data to show the values for days side-by-side as the Reviewer suggests.

      • I think that the paragraph introducing auxin (lines 25-37) could be reduced to 1-2 sentences and merged into a separate introductory paragraph given that the SAV3 work makes up a relatively minor component of the manuscript.

      We agree with the reviewer and will reduce the paragraph about auxin and merge it with the previous paragraph about transcription.

        • For Figure 3A, I would strongly encourage the authors to show some of the raw western blot data for PIF4, PIF5 and PIF7 protein abundance (and loading control), not just the normalized values. This could be put into supplemental data, but I think it should accompany the manuscript.

      We agree that presenting the raw data that was used for quantification is important. We will include the western blots used for quantifying PIF4, PIF5 and PIF7 protein abundance (and the loading control DET3). This information will presumably be included in Supplementary Figure 3C (figure number to be confirmed once we decide on all new data to be presented).

      • Lines 145-147 "we observed a strong correlation between PIF4 protein levels (Figure 3A) and PIL1 promoter occupancy (Figure 3B), and a similar behavior was seen with PIF7 (Figure 3B)." According to Fig 3A, there is no statistically significant increase in PIF7 abundance after 1h shade. There is an apparent increase in PIF7 promoter occupancy, but the variation appears too large for it to be statistically significant. I agree that qualitatively there is a correlation for PIF4 but I think the description of the behavior of PIF7 should be rephrased.

      As suggested by the reviewer, we will rephrase this paragraph to more accurately account for our data and also for what was reported by others (e.g. Willige et al., 2021; Li et al., 2012) regarding the regulation of PIF7 levels and phosphorylation in response to a low R/FR treatment.

      • There appear to be issues in the coloring of the labels (light blue dots vs dark blue dots) for the PIF7 panels of Fig 3B and Supplemental Fig 3B.

      We thank the reviewer for pointing this out. This will be clarified by appropriate changes in the figure to avoid confusion in the revised version of Figure 3B.

      Reviewer #1 (Significance (Required)):

      The authors here have sought to examine the possibility that the transcriptional responses to shade mediated by the phy-PIF system might involve large-scale opening or closing of chromatin regions. This is an important and unanswered question in the field despite several studies that have looked at the role of histone variants (H2A.Z) and modifications (H3K4me3 and H3K9ac) in modulating PIF transcriptional activating activity. The authors have shown that, at least in the case of the transcriptional response to shade mediated by PIF7 (and to an extent PIF4), large-scale changes in chromatin accessibility are not occurring in response to shade treatment.

      The results presented in this study support the hypothesis that large-scale changes in chromatin accessibility may have already occurred before plants see shade. This opens the possibility that perhaps the initial perception of light by etiolated (dark-grown seedlings) might trigger changes in chromatin accessibility, opening up chromatin in regions encoding "shade-specific" genes and/or closing chromatin in regions encoding "dark-specific" genes.

      The audience for this particular manuscript encompasses a fairly broad group of biologists interested in understanding how environmental stimuli can trigger changes in chromatin reorganization and transcription. The results here are important in that they rule out chromatin accessibility changes as underlying the changes in transcription between the short-term and long-term shade responses. They also reveal that there are a few cases in which chromatin accessibility does change in a statistically-significant manner in response to shade. These regions, and the molecular players which regulate their accessibility, merit further exploration.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The study by Paulisic et al. explores the variations in chromatin accessibility landscape induced by plant exposure to light with low red/far-red ratios (LRFR), which mimics neighbor shade perception. The authors further compare these changes with the genome association of PIF4 and PIF7 transcription factors - two major actors of gene expression regulation in response to LRFR. While this is not highlighted in the main text, the analyses of chromatin accessibility are performed on INTACT-mediated nucleus sorting, presumably to ensure proper and clean isolation of nuclei.

      Major comments

      • Why is the experimental setup exposing plants to darkness overnight? Does this affect the response to LRFR, by a kind of reset of phytochrome signaling? I guess this choice was made to maintain a strong circadian rhythm. Yet, given that PIF genes are clock-regulated, I am afraid that this choice complicates data interpretation concerning the specific effects of LRFR exposure.

      There appears to be some confusion which prompts us to better explain our protocol both by changing Figure 1A (that outlines the experimental conditions) and in the text.

      Seedlings are grown in long day conditions because this is more physiologically relevant than growing them in constant light, which is a rather unnatural condition.

      The reviewer is correct that PIF transcription is under circadian control and the shade avoidance response is gated by the circadian clock (e.g. Salter et al., 2003). To prevent conflating circadian and light quality effects, all samples that are compared are harvested at the same ZT (Zeitgeber time, i.e. hours after dawn). This allows us to focus our analysis on light quality effects specifically. We are therefore convinced that our protocol does not complicate the interpretation of the LRFR effects reported here.

      • As a result of this setup, the 1h exposure to LRFR immediately follows HRFR while the 3h final LRFR exposure of the « 25h LRFR » samples immediately follows a long period of darkness. Can this explain why in several instances (e.g., at the ATHB2 gene) 1h LRFR seems to have stronger effects than 25h LRFR on chromatin accessibility?

      Please check the explanation above. Both samples are harvested at the same ZT (ZT3, meaning 3 hours after dawn). The 1h LRFR seedlings went through the night, then had 2 hours of HRFR followed by 1h of LRFR. The 25h LRFR seedlings are harvested at the very same ZT, meaning 3h after dawn. Importantly, the HRFR control was also harvested at ZT3. As indicated above, this protocol allows us to focus on the light quality effects by comparing samples that are all harvested at the same ZT.

      We expect that the changes in Fig. 1A and associated text changes will clarify this issue.

      • Line 42 cites the work by Calderon et al 2022 as « Transcript levels of these genes increase before the H3K4me3 levels, implying that H3K4me3 increases as a consequence of active transcription ». Despite this previous study being reviewed and published, such a strong conclusion should be taken cautiously, and I disagree with it. The study by Calderon et al compares RNA-seq with ChIP-seq data, two methodologies with very different sensitivity, especially when employing bulk cells/whole seedlings as starting materials. For example, a gene strongly induced in a few cells will give a good Log2FC in RNA-seq data analysis (as new transcripts are produced after a low level of transcripts before shade) but, even though its chromatin variations would follow the same temporality or would even precede gene induction, this would be invisible in bulk ChIP-seq data analysis (which averages the signal of all cells together). I understand the rationale for relying on the conclusions made in an excellent lab with strong expertise in light signaling, but I recommend being cautious when relying on these conclusions to interpret new data.

      We agree with this comment, and we will change the text to reflect this.

      • The problem is that the same issue holds true when comparing ATAC-seq and RNA-seq data. ATAC signals reflect average levels over all cells while RNA-seq data can be influenced by a few cells highly expressing a given gene. Even though authors carefully sorted nuclei using an INTACT approach, this should be discussed, in particular when gene clusters (such as cluster C-D) show no match between chromatin accessibility and transcript level variations. In this regard, is PIF7 expressed in many cells or a small niche of cells upon LRFR exposure? The conclusions on its role in chromatin accessibility, analyzed here as mean levels of many different seedling cells, could be affected by PIF7 activity pattern (e.g., at line 293).

      This is a good comment. PIF7 is expressed in the cotyledons and leaves in LD conditions (Kidokoro et al., 2009; Galvao et al., 2019), and the few available scRNA-seq datasets indicate an enrichment of PIF7 in the epidermis (Kim et al., 2021; Lopez-Anido et al., 2021). LRFR exposure only mildly represses PIF7 expression as seen in Figure 3A and also in our bulk RNA-seq study (Table S4). We will discuss this potential limitation of our study in a revised version of the manuscript.

      • Line 89, the conclusion linking DNA methylation and DNA accessibility is unclear to me, this may be rephrased. Also, it should be noted that in gene-rich regions, most DNA methylation is located along the body of moderately to highly transcribing genes (gene-body methylation) while promoters of active and inactive genes are most frequently un-methylated.

      We will rephrase to better reflect the presence or absence of DNA methylation on promoter regions of shade regulated genes that contain accessible sites.

      • Figure 3B shows a few ChIP-qPCR results with important conclusions. Why not sequence the ChIPped DNA to obtain a genome-wide view of the PIF4-PIF7 relationships at chromatin, and also consequently a more robust genome-wide normalization?

      Several studies have shown that in the conditions we studied here, namely transfer of seedlings from high R/FR (simulated sun) to low R/FR (neighbor proximity), PIF7 plays the most dominant role amongst all PIFs (e.g. Li et al., 2012; de Wit et al., 2016; Willige et al., 2021). PIF4 and PIF5 also contribute but to a lesser extent. Given that Willige et al., 2021 did extensive ChIP-seq studies for PIF7 using similar conditions to the ones we used, we decided to rely on their data (that we re-analyzed), rather than performing our own PIF7 ChIP-seq analysis. While also performing a ChIP-seq analysis for PIF4 in similar conditions might be useful (this data is not available as far as we know), we are not convinced that doing that experiment would substantially modify the message. In the revised version we will also include analysis of the data from Pfeiffer et al., 2014, which comprises a ChIP-seq dataset for PIF5 (the closest paralog of PIF4) initially generated by Hornitschek et al., 2012 in low R/FR conditions (see comment to reviewer 1 above). For new ChIP-seq, we would have to perform this experiment from scratch with substantially more material than what we used for the targeted ChIP-qPCR analyses. We thus do not feel that such an investment (time and money) is warranted.

      • Given the known functional interaction between PIF7 and INO80, it would be relevant to test whether changes in chromatin accessibility at ATHB2 and other genes are affected in ino80 mutant seedlings.

      We agree with the reviewer that this is potentially an interesting experiment. This will allow us to determine whether the nucleosome histone composition has an influence on nucleosome positioning at selected shade-regulated genes (e.g. ATHB2). We note that according to available data, the effect of INO80 would be expected once PIF7 has started transcribing shade-induced genes. We therefore propose comparing the WT with an ino80 mutant for their seedling growth phenotype, expression of selected shade marker genes (e.g. ATHB2) and chromatin accessibility before (high R/FR) and after low R/FR treatment at selected shade marker genes. This will allow us to determine whether INO80 influences chromatin accessibility prior to a low R/FR treatment and/or once the treatment has started. Our plan is to include this data in a revised version of the manuscript.

      • On the same line, it would be interesting to test whether PIF7 target regions with pre-existing accessible chromatin would exist in ino80 mutant plants. In other words, testing a model in which chromatin remodeling by INO80 defines accessibility under HRFR to enable rapid PIF recruitment and DNA binding upon LRFR exposure.

      See our answer just above.

      Minor comments

      • In Figure 1C, it seems that PIF7 target genes do not match the set of LRFR-downregulated genes (even less than at random). Why not exclude these 4 genes from the analyses?

      This is correct. There are indeed only 4 downregulated PIF7 target genes as we define them. Removing these genes from the analyses does not change our interpretation of the data and hence, for completeness, we propose keeping them in a revised version of the manuscript.

      • Figure 3A shows the quantification of protein blots, but I did not find the corresponding images. These should be shown in the figure or as a supplementary figure with proper controls.

      We will include the raw Western blots used for quantification of PIF4, PIF5 and PIF7 in the revised version of the manuscript.

      • Line 102, it is unclear why PIF7 target genes were defined as the -3kb/TSS domains while Arabidopsis intergenic regions are on average much shorter. Gene regulatory regions, or promoters, are typically called within -1kb/TSS regions to avoid annotating a ChIP peak to the upstream gene or TE. A better proxy of PIF7 typical binding sites in gene regulatory regions could be determined by analysing the mean distance between PIF7 peak coordinates and the closest TSS. Typically, a gene meta-plot would give this information.

      We agree that the majority of PIF7 binding peaks are close to the 5’ of the TSS based on the PIF7 binding distribution meta-plot. But several known PIF binding sites are actually further upstream than 1kb 5’ of the TSS (e.g. ATHB2 and HFR1). However, we re-analyzed the data using your suggestion with -2kb/TSS and -1kb/TSS and while the number of target genes is reduced, it does not change our conclusions about PIF7 binding sites being located on accessible chromatin regions. Importantly, some well characterized LRFR induced genes such as HFR1 would not be annotated correctly if only peaks closest to the gene TSS were taken into account, without flanking genes. In this case only the neighboring AT1G02350 would be annotated, hence missing some important PIF7 target genes. Taking this into consideration we will not modify this part of the analysis in a revised manuscript.

      • Figure 4B, what's represented in the ATAC-seq heatmap: does a positive z-score represent high accessibility?

      On the ATAC-seq heatmap we have represented z-scores of the average CPM (counts per million) for accessible chromatin regions. For each accessible chromatin region, the z-score of a group is calculated by subtracting the median of the averaged CPMs from that group's average CPM and then dividing by the standard deviation (SD) of the averaged CPMs across all groups (in our case a group is an average of three biological replicates for either HRFR, 1h or 25h of LRFR). In that sense, the z-score indicates a change in accessibility, where a higher z-score indicates opening of the region and a lower z-score indicates a region becoming more closed when compared among the three light treatments (HRFR, 1h or 25h of LRFR). We will make sure that this is clear in the revised manuscript.

      Reviewer #2 (Significance (Required)):

      Contradicting the naive hypothesis that PIFs may target shade-inducible genes to « open » chromatin of shade-inducible genes with the help of chromatin remodelers, such as INO80, the study highlights that PIF7 typically associates with pre-existing accessible chromatin states. Thus, even though this is not stated, results from this study indicate that PIF7 is not a pioneer transcription factor. The data seem very robust, and while some conclusions need clarification, it should be of great interest to the community of scientists studying plant light signaling and shade responses.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In their manuscript, Paulisic et al. investigate whether the transcriptional response of Arabidopsis seedlings to shade depends on chromatin accessibility, with a specific focus on PIF7-regulated genes. To this end, they perform ATAC-seq and RNA-seq, along with other experiments, on seedlings exposed to short and long shade and correlate the results with previously reported PIF7 and PIF4 ChIP-seq data. Based on their findings, they propose that shade-mediated transcriptional regulation may not require extensive remodeling of DNA accessibility. Specifically, they suggest that the open chromatin conformation allows PIFs to easily access and recognize their binding motifs, rapidly initiating gene expression in response to shade. This transcriptional response primarily depends on a transient increase in PIF stability and gene occupancy, with changes in chromatin accessibility occurring in only a small number of genes.

      Major comments: • I have one issue that, in my opinion, requires more attention. To define the PIF7 target genes, which were later used to estimate whether PIF7 binds to open or closed chromatin and affects DNA accessibility after its binding, the authors compared the 4h LRFR data point from Willige et al. (2021) ChIP-seq with their 1h RNA-seq data point. This comparison might have missed early genes where PIF7 binds before the 1h time point but is no longer present on DNA at 4h. I understand the decision to choose the 4h Willige et al. ChIP-seq data point, performed under LD conditions, as it matches the data in this study, rather than the 5min-30min data points, which were conducted in constant light. However, if possible, it would be interesting to also compare the RNA-seq data with the early PIF7 binding genes to assess how many additional PIF7 target genes could be identified based on that comparison and whether this might alter the conclusions. If the authors do not agree with this point, it should at least be emphasized that the ChIP-seq data and the RNA-seq/ATAC-seq data were performed under different LRFR conditions (R/FR 0.6 vs. 0.1), which may lead to the misidentification of PIF7 target genes in the manuscript.

      1) This is an interesting suggestion. We therefore reanalyzed the 5, 10 and 30 min ChIP-seq timepoints from Willige et al., 2021 and compared them to 4h of LRFR (ZT4). We have crossed these lists of potential PIF7 targets with our 1h LRFR PIF457 dependent genes based on our RNA-seq. While some PIF7 targets appear only in the early time points (5-10 min of LRFR exposure), overall the number and composition of PIF7 target genes is rather constant across these timepoints. We propose to include these additional analyses in a revised version of the manuscript as a supplemental figure. However, these additional analyses do not influence our general conclusions.

      2) The comment regarding the R/FR ratio is important. We will point out that although the conditions used by Willige et al., 2021 and the ones we used are similar, they are not exactly the same in terms of R/FR ratio. Importantly, in both studies the early transcriptional response largely depends on the same PIFs, many of the same response genes are induced (e.g. PIL1, AtHB2, HFR1, YUC8, YUC9 and many others) and the physiological response (hypocotyl elongation) is similar. This shows that the low R/FR treatment yields robust responses.

      Minor comments: • In Fig. 1D, please describe the meaning of the blue shaded areas and the blue lines under the ATAC-seq peaks, as they do not always correlate.

      The shaded areas and the bars define the extension of the ATAC-seq accessible chromatin peaks. We will add the meaning of the shaded areas and the blue bars in the Figure legend and correct the colors in a revised manuscript.

      • In Fig. 1E, it could be helpful to note that the 257 peaks in the right bar correspond to the peaks associated with the 177 genes in the left bar.

      We will update Figure 1E and Figure legends for better understanding as the Reviewer suggested.

      • In lines 116, 119, and 122, I believe it should read "Fig. 2" instead of "Fig. 2A."

      We thank the Reviewer for noticing the error that we will correct.

      • Lines 138-139: "PIF7 total protein levels were overall more stable, and only a mild and non-significant increase of PIF7 levels was seen at 1 h of LRFR." Since PIF7 usually appears as two bands in HRFR and only one band in LRFR, how was the protein level of PIF7 quantified in Fig. 3A? Additionally, I was wondering about the authors' thoughts on the discrepancy with Willige et al. (2021, Extended Data Fig. 1d), where PIF7 abundance seems to increase after 30 min and 2 h of LRFR.

      PIF7 protein levels were quantified by considering both the upper and the lower band in HRFR (total PIF7) and normalizing its levels to the DET3 loading control. We still observe an increase in the total PIF7 protein levels at 1h of LRFR; however, this change was not statistically significant in these experiments. In our conditions, as in Willige et al., 2021, the increase in PIF7 protein levels in response to short-term shade seems consistent, as is the pronounced shift or disappearance of the upper band (phosphorylated form) on the Western blots (raw data will be available in the revised manuscript). We will introduce text changes referring to the phosphorylation status of PIF7 in our conditions.

      • Line 150: "... many early PIF target genes (Figure 3C)." Since only PIL1 is shown in Fig. 3C, I would recommend revising this sentence. Alternatively, the data could be presented, as in Fig. 2, for all the PIF7 target genes with transient expression patterns.

      We will introduce changes in the text to reflect that we only show PIL1 in the main Figure 3C.

      • Line 204: I'm not sure if Supplementary Fig. 7C-D is correct here. If it is, could the order of the figures be changed so that Supplementary Fig. 7C-D becomes Supplementary Fig. 7A-B?

      The order of the panels A-B in the Supplementary Figure 7 follows the order of the text in the manuscript and is mentioned before panels C-D. It refers to the sentence “Overexpression of phyB resulted in a strong repression of hypocotyl elongation in both HRFR and LRFR, while the absence of phyB promoted hypocotyl elongation (Supplementary Figure 7A-B).”

      • Line 208: "In all three cases...". Please clarify what the three cases refer to.

      We will change the text to refer more explicitly to the differentially accessible regions (DARs) of the genes ATHB2 and HFR1 shown in Figure 5A.

      • Line 231: Should Fig. 5C also be cited here in addition to Supplementary Fig. 7?

      We will add the reference to Figure 5C that was missing.

      • In Supplementary Table 3, more information is needed. For example, it could mention: "This data is presented in Fig. 3 and is based on datasets from ChIP-seq, RNA-seq, etc."

      The table will be updated with more information as suggested by the Reviewer.

      • In the figure legend of Fig. 4B, please check the use of "( )".

      We will correct the error and include the references inside the parentheses.

      Reviewer #3 (Significance (Required)):

      Paulisic et al. present novel discoveries in the field of light signaling and shade avoidance. Their findings extend our understanding of how DNA organization, prior to shade, affects PIF binding and how PIF binding remodels DNA accessibility. The data presented support the conclusions well and are backed by sufficient experimental evidence.

      2. Description of the revisions that have already been incorporated in the transferred manuscript

      The manuscript has not been modified yet.

      3. Description of analyses that authors prefer not to carry out


      Reviewer 2 asked for new ChIP-seq analyses for PIF7 and PIF4. For reasons that we outlined above, we believe that such analyses are not required, and we currently do not intend performing these experiments.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Plant systems sense shading by neighbors via the phytochrome signaling system. In the shade, PHYTOCHROME-INTERACTING FACTORS (PIFs) accumulate and are responsible for transcriptional reprogramming that enables plants to mobilize the "shade-avoidance response". Here, the authors have sought to examine the role of chromatin in modulating this response, specifically by examining whether "open" or "closed" chromatin regions spanning PIF target genes might explain the transcriptional output of these genes. They used a combination of ATAC-seq/CoP-qPCR (to detect open regions of chromatin), ChIP (to assay PIF binding) and RNA-seq (to measure transcript abundance) to understand how these processes may be mechanistically linked in Arabidopsis wild-type and pif mutant lines. They found that some chromatin accessibility changes do occur after LRFR (shade) treatment (32 regions after 1 h and 61 after 25 h). While some of these overlap with PIF-binding sites, the authors found no correlation between open chromatin states and high levels of transcription. Because auxin is an important component of the shade-avoidance response and has been shown to control chromatin accessibility in other contexts, they examined whether auxin might be required for opening these regions of chromatin. They found that in an auxin biosynthesis mutant, there is a small subset of PIF target genes whose chromatin accessibility seems altered relative to the wild type. Likewise, they found that chromatin accessibility for certain PIF targets is altered in phyB and pif mutants and propose that PIFs are necessary for changing the accessibility of chromatin in these genes. The authors thus propose that PIF occupancy of already open regions, rather than increased accessibility, underlies the increase in transcript abundance of these target genes in response to shade.

      Major comments:

      I find that the data generally support the hypothesis presented in the manuscript that chromatin accessibility alone does not predict transcription of PIF target genes in the shade. That said, I think that a paragraph from the discussion (lines 321-332) would benefit from some careful rephrasing. I think it is perfectly reasonable to propose that PIF occupancy is more predictive of shade-induced transcriptional output than chromatin accessibility, but I think that calling PIF occupancy "the key drivers" (line 323) or "the main driving force" (line 76) risks ignoring the observation that levels of PIF occupancy specifically do not predict expression levels of PIF target genes (Pfeiffer et al., 2014, Mol Plant). For PIL1 and HFR1, the authors have shown that PIF promoter occupancy and transcript levels are correlated, but the central finding of Pfeiffer et al. was that this pattern does not apply to the majority of PIF direct target genes. Finding factors (i.e. histone marks) that convert PIF-binding information into transcriptional output appears to have been the impetus for the experiments devised in Willige et al., 2021 and Calderon et al., 2022. It is great that the authors have outlined in the discussion that there are a number of factors that modulate PIF transcriptional activating activity but I think that the emphasis on PIF-binding explaining transcript abundance should be moderated in the text.

      I think that the hypothesis could be further supported by incorporating the previously published ChIP-seq data on PIF1, PIF3 and PIF5 binding. Given these data are published/publicly available, I think it would be helpful to note which of the 72 DARs are bound by PIF1, PIF3 and/or PIF5. Especially so given that PIF5 (Lorrain et al., 2008, Plant J) and PIF1/PIF3 (Leivar et al., 2012, Plant Cell) contribute at least in some capacity to transcriptional regulation in response to shade. At the very least, it might help explain some of the observed increases in nucleosome accessibility observed for genes that don't have PIF4 or PIF7-binding.

      In the manuscript, there are several instances where separate col-0 (wild type) controls have been used for identical experiments. Specifically, qPCR (Fig 3C, Fig S7C/D and Fig S8C/D), CoP-qPCR (Fig 5B/5C and Fig S8E/F) and hypocotyl measurements (Fig S7A/B and Fig S8A/B). In the cases of the hypocotyl measurements, there appear to be hardly any differences between col-0 controls indicating the measurements can be confidently compared between genotypes.

      In some cases of qPCR and CoP-qPCR experiments however, the differences in values obtained from col-0 samples that underwent identical experimental treatments appear to vary significantly. In Figure 3C for example, the overall trend for PIL1 expression in col-0 is the same (e.g. HRFR levels are low, LRFR1 levels are much higher and LRFR25 levels drop down to some intermediate level) but the expression levels themselves appear to differ almost two-fold for the LRFR 1h timepoint (~110 on the left panel vs ~60 for the right panel). Given the size of the error bars, it appears that these data represent the mean from only one biological replicate. PIL1 expression levels at LRFR 1h as reported in Fig S7C and D also show similar ~2-fold differences.

      I would recommend that the authors explicitly describe the number of biological replicates used for each experiment in the methods section. If indeed these experiments were only performed once, I think the authors should be very careful in the language used in describing their conclusions and in assigning statistical significance. One possibility that could also be helpful would be normalizing LRFR 1h and LRFR 25h values to HRFR values and plotting these data somewhere in the supplemental data. If, for example, the relative levels of PIL1 are different between replicates but the fold-induction between HRFR and LRFR 1h are the same, this would at least allay any concerns that the experimental treatments were not the same. I understand that doing so precludes comparison between genotypes, but I do think it's important to show that at least the control data are comparable between experiments.

      Similarly, for the CoP-qPCR experiments presented in Fig 5B and 5C, the col-0 values for region P2 between Fig 5B and 5C shows that while HRFR and LRFR 1h look comparable, the values presented for LRFR 25h are quite different.
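
      The normalization suggested above (expressing LRFR 1h and LRFR 25h values relative to HRFR within each experiment) can be sketched as follows. This is only a minimal illustration: the function name `fold_induction` and all expression values are hypothetical and are not taken from the manuscript. They are chosen so that two experiments differ about two-fold in raw values while sharing the same fold-induction.

```python
# Hypothetical relative expression values (arbitrary units).
# These numbers are illustrative assumptions, NOT data from the manuscript.
experiment_a = {"HRFR": 2.0, "LRFR_1h": 110.0, "LRFR_25h": 30.0}
experiment_b = {"HRFR": 1.1, "LRFR_1h": 60.5, "LRFR_25h": 16.5}


def fold_induction(values, baseline="HRFR"):
    """Divide every condition by the baseline condition of the same experiment."""
    base = values[baseline]
    if base <= 0:
        raise ValueError("baseline expression must be positive")
    return {condition: value / base for condition, value in values.items()}


fold_a = fold_induction(experiment_a)
fold_b = fold_induction(experiment_b)

# Compare the two Col-0 controls on the fold-induction scale.
for condition in experiment_a:
    print(condition, round(fold_a[condition], 2), round(fold_b[condition], 2))
```

      If the fold-induction values of the controls agree in this way, differences in raw values between experiments would more plausibly reflect run-to-run scaling than a difference in treatment.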

      Minor comments:

      Presentation of Supplemental Figure 7A/7B and Supplemental Figure 8A/8B could be changed to make the data more clear (i.e. side-by-side rather than superimposed).

      I think that the paragraph introducing auxin (lines 25-37) could be reduced to 1-2 sentences and merged into a separate introductory paragraph given that the SAV3 work makes up a relatively minor component of the manuscript.

      For Figure 3A, I would strongly encourage the authors to show some of the raw western blot data for PIF4, PIF5 and PIF7 protein abundance (and loading control), not just the normalized values. This could be put into supplemental data, but I think it should accompany the manuscript.

      Lines 145-147 "we observed a strong correlation between PIF4 protein levels (Figure 3A) and PIL1 promoter occupancy (Figure 3B), and a similar behavior was seen with PIF7 (Figure 3B)." According to Fig 3A, there is no statistically significant increase in PIF7 abundance after 1h shade. There is an apparent increase in PIF7 promoter occupancy, but the variation appears too large for it to be statistically significant. I agree that qualitatively there is a correlation for PIF4 but I think the description of the behavior of PIF7 should be rephrased.

      There appear to be issues in the coloring of the labels (light blue dots vs dark blue dots) for the PIF7 panels of Fig 3B and Supplemental Fig 3B.

      Significance

      The authors have sought to examine the possibility that the transcriptional responses to shade mediated by the phy-PIF system might involve large-scale opening or closing of chromatin regions. This is an important and unanswered question in the field despite several studies that have looked at the role of histone variants (H2A.Z) and modifications (H3K4me3 and H3K9ac) in modulating PIF transcriptional activating activity. The authors have shown that, at least in the case of the transcriptional response to shade mediated by PIF7 (and to an extent PIF4), large-scale changes in chromatin accessibility are not occurring in response to shade treatment.

      The results presented in this study support the hypothesis that large-scale changes in chromatin accessibility may have already occurred before plants see shade. This opens the possibility that perhaps the initial perception of light by etiolated (dark-grown seedlings) might trigger changes in chromatin accessibility, opening up chromatin in regions encoding "shade-specific" genes and/or closing chromatin in regions encoding "dark-specific" genes.

      The audience for this particular manuscript encompasses a fairly broad group of biologists interested in understanding how environmental stimuli can trigger changes in chromatin reorganization and transcription. The results here are important in that they rule out chromatin accessibility changes as underlying the changes in transcription between the short-term and long-term shade responses. They also reveal that there are a few cases in which chromatin accessibility does change in a statistically significant manner in response to shade. These regions, and the molecular players which regulate their accessibility, merit further exploration.

      My fields of expertise are photobiology, photosynthesis and early seedling development.

    1. Reviewer #1 (Public review):

      This is a well-designed and very interesting study examining the impact of imprecise feedback on outcomes in decision-making. I think this is an important addition to the literature, and the results here, which provide a computational account of several decision-making biases, are insightful and interesting.

      I do not believe I have substantive concerns related to the actual results presented; my concerns are more related to the framing of some of the work. My main concern is regarding the assertion that the results prove that non-normative and non-Bayesian learning is taking place. I agree with the authors that their results demonstrate that people will make decisions in ways that demonstrate deviations from what would be optimal for maximizing reward in their task under a strict application of Bayes' rule. I also agree that they have built reinforcement learning models that do a good job of accounting for the observed behavior. However, the Bayesian models included are, per the authors' descriptions, rather simple applications of Bayes' rule with either fixed or learned credibility for the feedback agents. In contrast, several versions of the RL models are used, each modified to account for different possible biases. More complex Bayes-based models exist, notably active inference, but also the hierarchical Gaussian filter. These formalisms are able to accommodate more complex behavior, such as affect and habits, which might make them more competitive with RL models. I think it is entirely fair to say that these results demonstrate deviations from an idealized and strict Bayesian context; however, the equivalence here of Bayesian and normative is, I think, misleading or at least requires better justification/explanation. This is because a great deal of work has been done to show that Bayes optimal models can generate behavior or other outcomes that are clearly not optimal to an observer within a given context (consider hallucinations for example) but which make sense in the context of how the model is constructed as well as the priors and desired states the model is given.

      As such, I would recommend that the language be adjusted to carefully define what is meant by normative and Bayesian and to recognize that work that is clearly Bayesian could potentially still be competitive with RL models if implemented to model this task. An even better approach would be to directly use one of these more complex modelling approaches, such as active inference, as the comparator to the RL models, though I would understand if the authors would want this to be a subject for future work.

      Abstract:

      The abstract is lacking in some detail about the experiments done, but this may be a limitation of the required word count. If word count is not an issue, I would recommend adding details of the experiments done and the results.

      One comment is that there is an appeal to normative learning patterns, but this suggests that learning patterns have a fixed optimal nature, which may not be true in cases where the purpose of the learning (e.g. to confirm the feeling of safety of being in an in-group) may not be about learning accurately to maximize reward. This can be accommodated in a Bayesian framework by modelling priors and desired outcomes. As such, the central premise that biased learning is inherently non-normative or non-Bayesian, I think, would require more justification. This is true in the introduction as well.

      Introduction:

      As noted above, the conceptualization of Bayesian learning as equivalent to normative learning, I think, requires further justification. Bayesian belief updating can be biased and non-optimal from an observer perspective, while being optimal within the agent doing the updating if the priors/desired outcomes are set up to advantage these "non-optimal" modes of decision making.

      Results:

      I wonder why the agent was presented before the choice, since the agent is only relevant to the feedback after the choice is made. I wonder if that might have induced any false association between the agent identity and the choice itself. This is by no means a critical point, but it would be interesting to get the authors' thoughts.

      The finding that positive feedback increases learning is one that has been shown before and depends on valence, as the authors note. They expanded their reinforcement learning model to include valence, but they did not modify the Bayesian model in a similar manner. This lack of a valence or recency effect might also explain the failure of the Bayesian models in the preceding section, where the contrast effect is discussed. It is not unreasonable to imagine that if humans do employ Bayesian reasoning that this reasoning system has had parameters tuned based on the real world, where recency of information does matter; affect has also been shown to be incorporable into Bayesian information processing (see the work by Hesp on affective charge and the large body of work by Ryan Smith). It may be that the Bayesian models chosen here require further complexity to capture the situation, just like some of the biases required updates to the RL models. This complexity, rather than being arbitrary, may be well justified by decision-making in the real world.

      The methods mention several symptom scales; it would be interesting to have the results of these and any interesting correlations noted. It is possible that some of the individual variability here could be related to these symptoms, which could introduce precision parameter changes in a Bayesian context and things like reward sensitivity changes in an RL context.

      Discussion:

      (For discussion, not a specific comment on this paper): One wonders also about participants' beliefs about the experiment or the intent of the experimenters. I have often had participants tell me they were trying to "figure out" a task or find patterns even when this was not part of the experiment. This is not specific to this paper, but it may be relevant in the future to try and model participant beliefs about the experiment especially in the context of disinformation, when they might be primed to try and "figure things out".

      As a general comment, in the active inference literature, there has been discussion of state-dependent actions, or "habits", which are learned in order to help agents more rapidly make decisions, based on previous learning. It is also possible that what is being observed is that these habits are at play, and that they represent the cognitive biases. This is likely especially true given, as the authors note, the high cognitive load of the task. It is true that this would mean that full-force Bayesian inference is not being used in each trial, or in each experience an agent might have in the world, but this is likely adaptive on the longer timescale of things, considering resource requirements. I think in this case you could argue that we have a departure from "normative" learning, but that is not necessarily a departure from any possible Bayesian framework, since these biases could potentially be modified by the agent or eschewed in favor of more expensive full-on Bayesian learning when warranted.

      Indeed, in their discussion on the strategy of amplifying credible news sources to drown out low-credibility sources, the authors hint at the possibility of longer-term strategies that may produce optimal outcomes in some contexts, but which were not necessarily appropriate to this task. As such, the performance on this task, and the consideration of true departure from Bayesian processing, should be considered in this wider context.

      Another thing to consider is that Bayesian inference is occurring, but that priors present going in produce the biases, or these biases arise from another source, for example, factoring in epistemic value over rewards when the actual reward is not large. This again would be covered under an active inference approach, depending on how the priors are tuned. Indeed, given the benefit of social cohesion in an evolutionary perspective, some of these "biases" may be the result of adaptation. For example, it might be better to amplify people's good qualities and minimize their bad qualities in order to make it easier to interact with them; this entails a cost (in this case, not adequately learning from feedback and potentially losing out sometimes), but may fulfill a greater imperative (improved cooperation on things that matter). Given the right priors/desired states, this could still be a Bayes-optimal inference at a social level and, as such, may be ingrained as a habit that requires effort to break at the individual level during a task such as this.

      The authors note that this task does not relate to "emotional engagement" or "deep, identity-related issues". While I agree that this is likely mostly true, it is also possible that just being told one is being lied to might elicit an emotional response that could bias responses, even if this is a weak response.

    2. Reviewer #3 (Public review):

      Summary

      This paper investigates how disinformation affects reward learning processes in the context of a two-armed bandit task, where feedback is provided by agents with varying reliability (with lying probability explicitly instructed). They find that people learn more from credible sources, but also deviate systematically from optimal Bayesian learning: They learned from uninformative random feedback, learned more from positive feedback, and updated too quickly from fully credible feedback (especially following low-credibility feedback). Overall, this study highlights how misinformation could distort basic reward learning processes, without appeal to higher-order social constructs like identity.

      Strengths

      (1) The experimental design is simple and well-controlled; in particular, it isolates basic learning processes by abstracting away from social context.

      (2) Modeling and statistics meet or exceed the standards of rigor.

      (3) Limitations are acknowledged where appropriate, especially those regarding external validity.

      (4) The comparison model, Bayes with biased credibility estimates, is strong; deviations are much more compelling than e.g., a purely optimal model.

      (5) The conclusions are interesting, in particular the finding that positivity bias is stronger when learning from less reliable feedback (although I am somewhat uncertain about the validity of this conclusion).

      Weaknesses

      (1) Absolute or relative positivity bias?

      In my view, the biggest weakness in the paper is that the conclusion of greater positivity bias for lower-credibility feedback (Figure 5) hinges on the specific way in which positivity bias is defined. Specifically, we only see the effect when normalizing the difference in sensitivity to positive vs. negative feedback by the sum. I appreciate that the authors present both and add the caveat whenever they mention the conclusion (with the crucial exception of the abstract). However, what we really need here is an argument that the relative definition is the *right* way to define asymmetry.

      Unfortunately, my intuition is that the absolute difference is a better measure. I understand that the relative version is common in the RL literature; however, previous studies have used standard TD models, whereas the current model updates based on the raw reward. The role of the CA parameter is thus importantly different from a traditional learning rate - in particular, it's more like a logistic regression coefficient (as described below) because it scales the feedback but *not* the decay. Under this interpretation, a difference in positivity bias across credibility conditions corresponds to a three-way interaction between the exponentially weighted sum of previous feedback of a given type (e.g., positive from the 75% credible agent), feedback positivity, and condition (dummy coded). This interaction corresponds to the non-normalized, absolute difference.

      Importantly, I'm not terribly confident in this argument, but it does suggest that we need a compelling argument for the relative definition.
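
      To make the distinction between the two definitions concrete, here is a minimal sketch. It assumes that the relative index divides the difference between the positive and negative CA parameters by their sum; the function names and all parameter values are hypothetical and are not estimates from the paper.

```python
def absolute_bias(ca_pos, ca_neg):
    """Non-normalized asymmetry: CA+ minus CA-."""
    return ca_pos - ca_neg


def relative_bias(ca_pos, ca_neg):
    """Asymmetry normalized by the sum of the two parameters (assumed definition)."""
    total = ca_pos + ca_neg
    if total == 0:
        raise ValueError("relative bias is undefined when CA+ + CA- equals zero")
    return (ca_pos - ca_neg) / total


# Made-up (CA+, CA-) pairs with the same absolute gap in both conditions.
conditions = {
    "low_credibility": (0.3, 0.1),
    "high_credibility": (0.9, 0.7),
}

for name, (ca_pos, ca_neg) in conditions.items():
    print(name,
          round(absolute_bias(ca_pos, ca_neg), 3),
          round(relative_bias(ca_pos, ca_neg), 3))
```

      In this made-up example the absolute difference is identical in the two conditions, while the relative index is larger for the low-credibility condition simply because the overall sensitivity is smaller there. The relative index also becomes unstable as the sum approaches zero, which may matter if CA- is fit to be negative for some participants.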

      (2) Positivity bias or perseveration?

      A key challenge in interpreting many of the results is dissociating perseveration from other learning biases. In particular, a positivity bias (Figure 5) and perseveration will both predict a stronger correlation between positive feedback and future choice. Crucially, the authors do include a perseveration term, so one would hope that perseveration effects have been controlled for and that the CA parameters reflect true positivity biases. However, with finite data, we cannot be sure that the variance will be correctly allocated to each parameter (cf. collinearity in regressions). The fact that CA- is fit to be negative for many participants (a pattern shown more strongly in the discovery study) is suggestive that this might be happening. A priori, the idea that you would ever increase your value estimate after negative feedback is highly implausible, which suggests that the parameter might be capturing variance other than what it is intended to capture.

      The best way to resolve this uncertainty would involve running a new study in which feedback was sometimes provided in the absence of a choice - this would isolate positivity bias. Short of that, perhaps one could fit a version of the Bayesian model that also includes perseveration. If the authors can show that this model cannot capture the pattern in Figure 5, that would be fairly convincing.

      (3) Veracity detection or positivity bias?

      The "True feedback elicits greater learning" effect (Figure 6) may be simply a re-description of the positivity bias shown in Figure 5. This figure shows that people have higher CA for trials where the feedback was in fact accurate. But, assuming that people tend to choose more rewarding options, true-feedback cases will tend to also be positive-feedback cases. Accordingly, a positivity bias would yield this effect, even if people are not at all sensitive to trial-level feedback veracity. Of course, the reverse logic also applies, such that the "positivity bias" could actually reflect discounting of feedback that is less likely to be true. This idea has been proposed before as an explanation for confirmation bias (see Pilgrim et al, 2024 https://doi.org/10.1016/j.cognition.2023.105693 and much previous work cited therein). The authors should discuss the ambiguity between the "positivity bias" and "true feedback" effects within the context of this literature.

      The authors get close to this in the discussion, but they characterize their results as differing from the predictions of rational models, the opposite of my intuition. They write:

      Alternative "informational" (motivation-independent) accounts of positivity and confirmation bias predict a contrasting trend (i.e., reduced bias in low- and medium credibility conditions) because in these contexts it is more ambiguous whether feedback confirms one's choice or outcome expectations, as compared to a full-credibility condition.

      I don't follow the reasoning here at all. It seems to me that the possibility for bias will increase with ambiguity (or perhaps will be maximal at intermediate levels). In the extreme case, when feedback is fully reliable, it is impossible to rationally discount it (illustrated in Figure 6A). The authors should clarify their argument or revise their conclusion here.
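
      The confound raised at the start of this point (true feedback tends to also be positive feedback) can be illustrated with a small exact calculation. This is a sketch under stated assumptions, not a description of the authors' task: rewards are binary, the chosen option truly pays out with probability `q`, the agent reports the true outcome with probability `c`, and a lie simply flips the outcome independently of its value. The function name and the example values of `q` and `c` are hypothetical.

```python
from itertools import product


def positive_rate_by_veracity(q, c):
    """Return P(feedback is positive | feedback is true / false).

    q: probability that the chosen option truly pays out (assumed binary reward).
    c: probability that the agent reports the true outcome (assumed 0 < c < 1).
    """
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie strictly between 0 and 1")
    joint = {}  # (truthful, feedback) -> probability
    for reward, truthful in product((1, 0), (True, False)):
        p = (q if reward == 1 else 1.0 - q) * (c if truthful else 1.0 - c)
        feedback = reward if truthful else 1 - reward  # a lie flips the outcome
        joint[(truthful, feedback)] = joint.get((truthful, feedback), 0.0) + p
    rates = {}
    for truthful in (True, False):
        total = joint[(truthful, 1)] + joint[(truthful, 0)]
        rates[truthful] = joint[(truthful, 1)] / total
    return rates


# Hypothetical values: the participant mostly picks the better option.
rates = positive_rate_by_veracity(q=0.7, c=0.75)
print("P(positive | true feedback): ", round(rates[True], 3))
print("P(positive | false feedback):", round(rates[False], 3))
```

      Under these assumptions, true feedback is positive with probability `q` and false feedback with probability `1 - q`, regardless of `c`. Whenever choices favor the better option (`q` above 0.5), true-feedback trials are therefore enriched for positive feedback, so a positivity bias alone could produce a higher CA for true feedback.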

      (4) Disinformation or less information?

      Zooming out, from a computational/functional perspective, the reliability of feedback is very similar to reward stochasticity (the difference is that reward stochasticity decreases the importance/value of learning in addition to its difficulty). I imagine that many of the effects reported here would be reproduced in that setting. To my surprise, I couldn't quickly find a study asking that precise question, but if the authors know of such work, it would be very useful to draw comparisons. To put a finer point on it, this study does not isolate which (if any) of these effects are specific to *disinformation*, rather than simply *less information*. I don't think the authors need to rigorously address this in the current study, but it would be a helpful discussion point.

      (5) Over-reliance on analyzing model parameters

      Most of the results rely on interpreting model parameters, specifically, the "credit assignment" (CA) parameter. Exacerbating this, many key conclusions rest on a comparison of the CA parameters fit to human data vs. those fit to simulations from a Bayesian model. I've never seen anything like this, and the authors don't justify or even motivate this analysis choice. As a general rule, analyses of model parameters are less convincing than behavioral results because they inevitably depend on arbitrary modeling assumptions that cannot be fully supported. I imagine that most or even all of the results presented here would have behavioral analogues. The paper would benefit greatly from the inclusion of such results. It would also be helpful to provide a description of the model in the main text that makes it very clear what exactly the CA parameter is capturing (see next point).

      (6) RL or regression?

      I was initially very confused by the "RL" model because it doesn't update based on the TD error. Consequently, the "Q values" can go beyond the range of possible reward (SI Figure 5). These values are therefore *not* Q values, which are defined as expectations of future reward ("action values"). Instead, they reflect choice propensities, which are sometimes notated $h$ in the RL literature. This misuse of notation is unfortunately quite common in psychology, so I won't ask the authors to change the variable. However, they should clarify when introducing the model that the Q values are not action values in the technical sense. If there is precedent for this update rule, it should be cited.

      Although the change is subtle, it suggests a very different interpretation of the model.

      Specifically, I think the "RL model" is better understood as a sophisticated logistic regression, rather than a model of value learning. Ignoring the decay term, the CA term is simply the change in log odds of repeating the just-taken action in future trials (the change is negated for negative feedback). The PERS term is the same, but ignoring feedback. The decay captures that the effect of each trial on future choices diminishes with time. Importantly, however, we can re-parameterize the model such that the choice at each trial is a logistic regression where the independent variables are an exponentially decaying sum of feedback of each type (e.g., positive-cred50, positive-cred75, ... negative-cred100). The CA parameters are simply coefficients in this logistic regression.

      Critically, this is not meant to "deflate" the model. Instead, it clarifies that the CA parameter is actually not such an assumption-laden model estimate. It is really quite similar to a regression coefficient, something that is usually considered "model agnostic". It also recasts the non-standard "cross-fitting" approach as a very standard comparison of regression coefficients for model simulations vs. human data. Finally, using different CA parameters for true vs false feedback is no longer a strange and implausible model assumption; it's just another (perfectly valid) regression. This may be a personal thing, but after adopting this view, I found all the results much easier to understand.
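
      The re-parameterization described above can be checked numerically with a minimal sketch. The update rule used here is an assumption made for illustration and may differ from the authors' model: both choice propensities decay by a fixed factor on every trial, and the propensity of the chosen option then moves by the CA coefficient for that feedback type (negated for negative feedback). The perseveration term is omitted, and all names and parameter values are hypothetical.

```python
import math
import random

# Hypothetical CA coefficients keyed by (valence, credibility); values are made up.
CA = {
    ("pos", 50): 0.2, ("neg", 50): 0.1,
    ("pos", 75): 0.6, ("neg", 75): 0.4,
    ("pos", 100): 1.0, ("neg", 100): 0.9,
}
DECAY = 0.2  # hypothetical per-trial decay applied to both propensities


def sign(valence):
    """+1 for positive feedback, -1 for negative feedback."""
    return 1.0 if valence == "pos" else -1.0


def recursive_propensity(trials, ca=CA, decay=DECAY):
    """Trial-by-trial update; returns propensity of option 1 minus option 0."""
    h = [0.0, 0.0]
    for choice, valence, cred in trials:
        h = [(1.0 - decay) * x for x in h]             # decay both options
        h[choice] += ca[(valence, cred)] * sign(valence)  # update chosen option
    return h[1] - h[0]


def regression_propensity(trials, ca=CA, decay=DECAY):
    """Same quantity as a weighted sum of exponentially decaying regressors."""
    regressors = {key: 0.0 for key in ca}
    n = len(trials)
    for t, (choice, valence, cred) in enumerate(trials):
        weight = (1.0 - decay) ** (n - 1 - t)           # older trials count less
        direction = 1.0 if choice == 1 else -1.0        # which option was chosen
        regressors[(valence, cred)] += weight * direction * sign(valence)
    # The CA parameters act as the regression coefficients.
    return sum(ca[key] * regressors[key] for key in ca)


def p_choose_option_1(log_odds):
    """Logistic link: log odds of choosing option 1 over option 0."""
    return 1.0 / (1.0 + math.exp(-log_odds))


rng = random.Random(0)
trials = [
    (rng.randint(0, 1), rng.choice(["pos", "neg"]), rng.choice([50, 75, 100]))
    for _ in range(40)
]
a = recursive_propensity(trials)
b = regression_propensity(trials)
print("formulations agree:", abs(a - b) < 1e-9)
print("P(choose option 1):", p_choose_option_1(a))
```

      Under these assumptions the recursive update and the regression form give the same log odds, because each trial's contribution is its CA coefficient times a weight that shrinks geometrically with the number of trials since it occurred.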

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary: 

      The manuscript by Nicoletti et al. presents a minimal model of habituation, a basic form of non-associative learning, addressing from both dynamical and information-theoretic perspectives how habituation can be realized. The authors identify that negative feedback combined with a slow storage mechanism is sufficient to explain habituation.

      Strengths: 

      The authors combine the identification of the dynamical mechanism with information-theoretic measures to determine the onset of habituation and provide a description of how the system can gain maximum information about the environment.

      We thank the reviewer for highlighting the strength of our work and for their comments, which we believe have been instrumental in significantly improving our work and its scope. Below, we address all their concerns.

      Weaknesses: 

      I have several main concerns/questions about the proposed model for habituation and its plausibility. In general, habituation does not only refer to a decrease in the responsiveness upon repeated stimulation but as Thompson and Spencer discussed in Psych. Rev. 73, 16-43 (1966), there are 10 main characteristics of habituation, including (i) spontaneous recovery when the stimulus is withheld after response decrement; dependence on the frequency of stimulation such that (ii) more frequent stimulation results in more rapid and/or more pronounced response decrement and more rapid spontaneous recovery; (iii) within a stimulus modality, the less intense the stimulus, the more rapid and/or more pronounced the behavioral response decrement; (iv) the effects of repeated stimulation may continue to accumulate even after the response has reached an asymptotic level (which may or may not be zero, or no response). This effect of stimulation beyond asymptotic levels can alter subsequent behavior, for example, by delaying the onset of spontaneous recovery. 

      These are only a subset of the conditions that have been experimentally observed and therefore a mechanistic model of habituation, in my understanding, should capture the majority of these features and/or discuss the absence of such features from the proposed model. 

      We are really grateful to the reviewer for pointing out these aspects of habituation that we overlooked in the previous version of our manuscript. Indeed, our model is able to capture most of these 10 observed behaviors, specifically: 1) habituation; 2) spontaneous recovery; 3) potentiation of habituation; 4) frequency sensitivity; 5) intensity sensitivity; 6) subliminal accumulation. Here, we are following the same terminology employed in Eckert et al., Current Biology 34, 5646–5658 (2024), the paper highlighted by the reviewer. We have dedicated a section of the revised version of the manuscript to these hallmarks, substantiating the validity of our framework as a minimal model of habituation. We remark that these are the sole hallmarks that can be discussed by considering one single external stimulus and that can be identified without ambiguity in a biochemical context. This observation is again in line with Eckert et al., Current Biology 34, 5646–5658 (2024).

      In the revised version, we employ the same strategy as the aforementioned work to determine when the system can be considered “habituated”. Specifically, we introduce a response threshold that is now discussed in the manuscript. We also included a note in the Discussion stating that, since any biochemical model will eventually reach a steady state, subliminal accumulation, for example, can only be seen with the use of a threshold. The introduction of different storage mechanisms, ideally more detailed at a molecular level, can shed light on this conceptual gap. This is an interesting direction of research.

      Furthermore, the habituated response in steady-state is approximately 20% less than the initial response, which seems to be achieved already after 3-4 pulses; the subsequent change in response amplitude seems to be negligible, although the authors state "after a large number of inputs, the system reaches a time-periodic steady-state". How do the authors justify these minimal decreases in the response amplitude? Does this come from the model parametrization and is there a parameter range where more pronounced habituation responses can be observed?

      The reviewer is correct, but this is solely a consequence of the specific set of parameters we selected. We made this choice purely for visualization purposes in the previous version. In the revised version, in the section discussing the hallmarks of habituation, we also show other parameter choices for which the response decrement is more pronounced. Moreover, we remark that the contour plot of \Delta⟨U⟩ clearly shows that the decrement can largely exceed the 20% threshold presented in the previous version.

      In the revised version, also in light of the works highlighted by the reviewer, we decided to move the focus of the manuscript to the information-theoretic advantage of habituation. As such, we modified several parts of the main text. Also, in the region of optimal information gain, habituation is at an intermediate level. For this reason, we decided to keep the same parameter choice as the previous version in Figure 2.

      We stated that the time-periodic steady-state is reached “after a large number of stimuli” from a mathematical perspective. However, by using a habituation threshold, as done in Eckert et al., Current Biology 34, 5646–5658 (2024), we can state that the system is habituated after a few stimuli for each set of parameters. This aspect is highlighted in the revised version of the manuscript (see also the point above).
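
      As an illustration of such a threshold criterion, the short Python sketch below returns the first stimulus at which the relative response decrement reaches a chosen threshold. Both the criterion (decrement measured relative to the first response) and the numerical values are assumptions made for illustration; they are not taken from the manuscript or from Eckert et al.

```python
def stimuli_to_habituation(peak_responses, threshold):
    """Return the 1-based index of the first stimulus whose peak response has
    dropped by at least `threshold` (as a fraction) relative to the first
    response, or None if the threshold is never reached.

    peak_responses: peak readout per stimulus (hypothetical values here).
    threshold:      assumed habituation criterion, e.g. 0.15 for a 15% drop.
    """
    if not peak_responses or peak_responses[0] <= 0:
        raise ValueError("need a positive first response")
    first = peak_responses[0]
    for n, response in enumerate(peak_responses, start=1):
        if (first - response) / first >= threshold:
            return n
    return None


# Hypothetical peak responses that saturate about 20% below the first one.
peaks = [1.00, 0.90, 0.84, 0.81, 0.80, 0.80]
print(stimuli_to_habituation(peaks, threshold=0.15))
```

      With a criterion of this kind, a system whose response saturates within a few pulses is classified as habituated after those few pulses, even though the time-periodic steady state is reached only asymptotically.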

      The same is true for the information content (Figure 2f) - already at the first pulse, I_{U,H} ~ 0.7 and only negligibly increases afterwards. In my understanding, during learning, the mutual information between the input and the internal state increases over time and the system extracts from these predictions about its responses. In the model presented by the authors, it seems the system already carries information about the environment which hardly changes with repeated stimulus presentation. The complexity of the signal is also limited, and it is very hard to clarify from the presented results, whether the proposed model can actually explain basic features of habituation, as mentioned above.

      As for the response decrement of the readout, we can certainly choose a set of parameters for which the information gain is higher. In the revised version, we also report the information at the first stimulation and when the system is habituated to give a better idea of the range of these quantities. At any rate, as the referee correctly points out, it is difficult to give an intuitive interpretation of the information in our minimal model.

      It is also important to remark that, since the readout population and the receptor both undergo fast dynamics (with appropriate timescales as discussed in the text), we are not observing the transient gain of information associated with the first stimulus. As such, the mutual information presents a discontinuous behavior that resembles the dynamics of the readout, thereby starting at a non-zero value already at the first stimulus.

      Additionally, there have been two recent models on habituation and I strongly suggest that the authors discuss their work in relation to recent works (bioRxiv 2024.08.04.606534; arXiv:2407.18204).

      We thank the reviewer for pointing out these relevant references. In the revised version, we highlighted that we discuss the information-theoretic aspects of habituation, while the aforementioned references focus on the dynamics of this phenomenon.

      Reviewer #1 (Recommendations for the authors):

      I would also like to note here the simplification of the proposed biological model - in particular, that the receptor can be in an active/passive state, as well as proposing the NF-κB signaling module as a possible molecular realization. Generally, a large number of cell surface receptors, including RTKs or GPCRs, have much more complex dynamics, including autocatalytic activation that generally leads to bistability, and NF-κB has been demonstrated to have oscillatory and even chaotic dynamics (works of Savas Tay, Mogens Jensen and others). Considering this, the authors should at least discuss under which conditions TNF-alpha signaling could potentially serve as a molecular realisation for habituation.

      We thank the reviewer for bringing this to our attention. In the previous version, we reported the TNF signaling network only to show a similar coarse-grained modular structure. However, following a suggestion of reviewer #2, we decided to change Figure 1 to include a simplified molecular scheme of chemotaxis rather than TNF signaling, to avoid any source of confusion about this issue.

      Also, a minor point: Figures 2d-e are cited before 2a-c. 

      We apologize for the oversight. The structure of the Figures and their order is now significantly different, and they are now cited in the correct order. 

      Reviewer #2 (Public review):

      In this study, the authors aim to investigate habituation, the phenomenon of increasing reduction in activity following repeated stimuli, in the context of its information-theoretic advantage. To this end, they consider a highly simplified three-species reaction network where habituation is encoded by a slow memory variable that suppresses the receptor and therefore the readout activity. Using analytical and numerical methods, they show that in their model the information gain, the difference between the mutual information between the signal and readout after and before habituation, is maximal for intermediate habituation strength. Furthermore, they demonstrate that the Pareto front corresponds to an optimization strategy that maximizes the mutual information between signal and readout in the steady state, minimizes some form of dissipation, and also exhibits similar intermediate habituation strength. Finally, they briefly compare predictions of their model to whole-brain recordings of zebrafish larvae under visual stimulation. 

      The authors' simplified model might serve as a solid starting point for understanding habituation in different biological contexts as the model is simple enough to allow for some analytic understanding but at the same time exhibits all basic properties of habituation in sensory systems. Furthermore, the authors' finding of maximal information gain for intermediate habituation strength via an optimization principle is, in general, interesting. However, the following points remain unclear or are weakly explained:

      We thank the reviewer for deeming our work interesting and for considering it a solid starting point for understanding habituation in biological systems.

      (1) It is unclear what the meaning of the finding of maximal information gain for intermediate habituation strength is for biological systems. Why is information gain as defined in the paper a relevant quantity for an organism/cell? For instance, why is a system with low mutual information after the first stimulus and intermediate mutual information after habituation better than one with consistently intermediate mutual information? Or, in other words, couldn't the system try to maximize the mutual information acquired over the whole time series, e.g., the time series mutual information between the stimulus and readout?

      This is a delicate aspect to discuss and we thank the referee for the comment. In the revised version, we report information gain, initial and final information, highlighting that both gain and final information are higher in regions where habituation is present. They have qualitatively similar behavior and highlight a clear information-theoretic advantage of this dynamical phenomenon. An important point is that, to determine the optimal Pareto front, we consider a prolonged stimulus and its associated steady-state information. Therefore, from the optimization point of view, there is no notion of “information gain” or “final information”, which are intrinsically dynamical quantities. As a result, the fact that the optimal curve lies in the region of optimal information gain is not expected a priori and hints at the potentially crucial role of this feature. In the revised version, we elucidate this aspect with several additional analyses.

      We would like to add that, from a naive perspective, while the first stimulation will necessarily trigger a certain (non-zero) mutual information, multiple observations of the same stimulus must translate into accumulated information that consequently drives the onset of observed dynamical behaviors, such as habituation.

      (2) The model is very similar to (or a simplification of) previous models for adaptation in living systems, e.g., for adaptation in chemotaxis via activity-dependent methylation and demethylation. This should be made clearer.

      We apologize for having missed this point. Our choice was motivated by the fact that we wanted to avoid confusion between the usual definition of (perfect) adaptation and habituation. However, we believe that this risk of confusion no longer arises in the revised manuscript, and we now include chemotaxis as an example in Figure 1.

      (3) It remains unclear why this optimization principle is the most relevant one. While it makes sense to maximize the mutual information between stimulus and readout, there are various choices for what kind of dissipation is minimized. Why was \delta Q_R chosen and not, for instance, \dot{\Sigma}_int or the sum of both? How would the results change in that case? And how different are the results if the mutual information is not calculated for the strong stimulation input statistics but for the background one?

      We thank the reviewer for the suggestion. We agree that a priori, there is no reason to choose \delta Q_R or a function of the internal energy flux J_int (that, in the revised version, we are using in place of \dot\Sigma_int following the suggestion of reviewer #3). The rationale was to minimize \delta Q_R since this dissipation is unavoidable and stems from the presence of the storage inhibiting the receptor through the internal pathway. Indeed, considering the existence of two different pathways implementing sensing and feedback, the presence of any input will result in a dissipation produced by the receptor. This energy consumption is reflected in \delta Q_R.

      In the revised version, we now include in the optimization principle two energy contributions (see Eq. (14) of the revised manuscript): \delta Q_R and E_int, which is the energy consumption associated with the driven storage production per unit energy. All Figures have been updated accordingly. The results remain similar, as \delta Q_R still represents the main contribution, especially at high \beta.

      Furthermore, in the revised version, we include examples of the Pareto optimization for different values of input strength. As detailed both in the main text and the Supplementary Information, changing the value of ⟨H⟩ moves the Pareto frontier in the (\beta, \sigma) space, since the signal needs to be strong enough for the system to distinguish it from the intrinsic thermal noise (controlled by \beta). We also show that if the system is able to tune the inhibition strength \kappa, the Pareto frontiers at different ⟨H⟩ collapse into a single curve. This shows that, although the values of, e.g., the mutual information, depend on ⟨H⟩, the qualitative behavior of the system in this regime is effectively independent of it. We also added more details about this in the Supplementary Information.
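
      As a generic illustration of what a Pareto front means in this setting (not the optimization code used in the manuscript), the Python sketch below extracts the non-dominated points from a list of candidate parameter choices, each summarized by a mutual information to be maximized and an energy cost to be minimized. The single `cost` value and all numbers are hypothetical stand-ins for the combined energetic contributions discussed above.

```python
def pareto_front(points):
    """Return the non-dominated (info, cost) pairs, sorted by info.

    A point is dominated if another point has information at least as high
    and cost at least as low, with at least one of the two strictly better.
    """
    front = []
    for i, (info_i, cost_i) in enumerate(points):
        dominated = any(
            info_j >= info_i and cost_j <= cost_i
            and (info_j > info_i or cost_j < cost_i)
            for j, (info_j, cost_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((info_i, cost_i))
    return sorted(front)


# Hypothetical (information, cost) values for five parameter choices.
candidates = [(0.2, 1.0), (0.5, 2.0), (0.4, 3.0), (0.9, 5.0), (0.9, 6.0)]
print(pareto_front(candidates))
```

      Points on the returned front cannot improve the information without paying a higher cost, which is the trade-off the Pareto frontier in the (\beta, \sigma) space is meant to capture.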

      (4) The comparison to the experimental data is not too strong of an argument in favor of the model. Is the agreement between the model and the experimental data surprising? What other behavior in the PCA space could one have expected in the data? Shouldn't the 1st PC mostly reflect the "features", by construction, and other variability should be due to progressively reduced activity levels? 

      The agreement between data and model is not surprising - we agree on this - since the data exhibit habituation. However, we believe that the fact that our minimal model is able to capture the features of a complex neural system just by looking at the PCs, without any explicit biological details, is non-trivial. We also stress that the 1st PC only reflects the feature that captures most of the variance of the data and, as such, it is difficult to have a priori expectations of what it should represent. In the case of the data generated from the model, most of the variance of the activity comes from the switching signal, and similar considerations can be made for the looming stimulations in the data. We updated the manuscript to clarify this point.

      Reviewer #2 (Recommendations for the authors):

      (1) The abstract makes it sound like a new finding is that habituation is due to a slow, negative feedback mechanism. But, as mentioned in the introduction, this is a well-known fact. 

      We agree with the reviewer. We have revised the abstract.

      (2) Figure 2c: Why does the range of Delta Delta I_f include negative values if the corresponding region is shaded (right-tilted stripes)?

      The negative values in the range are those attained in the shaded region with right-tilted stripes. We decided to include them in the colorbar for clarity, since Delta Delta I_f is also plotted in the region where it attains negative values.

      (3) What does the Pareto front look like if the optimization is done for input statistics given by ⟨H⟩_min? 

      In the revised version, we include examples of the Pareto optimization for different values of input strength. As detailed both in the main text and the Supplementary Information, changing the value of ⟨H⟩ moves the Pareto frontier in the (\beta, \sigma) space, since the strength of the signal is crucial for the system to discriminate input and thermal noise (see also the answers above).

      In particular, in Figure 4 we explicitly compare the results of the Pareto optimization (which is done with a static input of a given statistics) with the dynamics of the model for different values of ⟨H⟩ in two scenarios, i.e., adaptive and non-adaptive inhibition strength (see answers above for details).

      We also remark that ⟨H⟩_min represents the background signal that the system is not trying to capture, which is why we never used it for optimization.

      (4) From the main text, it is rather difficult to understand how the comparison to the experimental data was performed. How was the PCA done exactly? What are the "features" of the evoked neural response? 

      The PCA on data is performed starting from the single-neuron calcium dynamics. To perform a fair comparison, we reconstruct a similar but extremely simplified dynamics using our model, as explained in Methods, and perform the PCA on the analogous simulated data. We added a comment on this in the revised version. While these components capture most of the variance in the data, their specific interpretation is usually out of reach and we believe that it lies beyond the scope of this theoretical work. We also remark that the model does not contain all these biological details - a strong aspect in our opinion - and, as such, it cannot capture specific biological features.

      Reviewer #3 (Public review):

      The authors use a generic model framework to study the emergence of habituation and its functional role from information-theoretic and energetic perspectives. Their model features a receptor, readout molecules, and a storage unit, and as such, can be applied to a wide range of biological systems. Through theoretical studies, the authors find that habituation (reduction in average activity) upon exposure to repeated stimuli should occur at intermediate degrees to achieve maximal information gain. Parameter regimes that enable these properties also result in low dissipation, suggesting that intermediate habituation is advantageous both energetically and for the purpose of retaining information about the environment. 

      A major strength of the work is the generality of the studied model. The presence of three units (receptor, readout, storage) operating at different time scales and executing negative feedback can be found in many domains of biology, with representative examples well discussed by the authors (e.g. Figure 1b). A key takeaway demonstrated by the authors that has wide relevance is that large information gain and large habituation cannot be attained simultaneously. When energetic considerations are accounted for, large information gain and intermediate habituation appear to be a favorable combination. 

      We thank the reviewer for this positive assessment of our work and its generality.

      While the generic approach of coarse-graining most biological detail is appealing and the results are of broad relevance, some aspects of the conducted studies, the problem setup, and the writing lack clarity and should be addressed: 

      (1) The abstract can be further sharpened. Specifically, the "functional role" mentioned at the end can be made more explicit, as it was done in the second-to-last paragraph of the Introduction section ("its functional advantages in terms of information gain and energy dissipation"). In addition, the abstract mentions the testing against experimental measurements of neural responses but does not specify the main takeaways. I suggest the authors briefly describe the main conclusions of their experimental study in the abstract.

      We thank the reviewer for raising this point. In the revised version, we have changed the abstract to reflect the reviewer’s points and the new structure and results of the manuscript.

      (2) Several clarifications are needed on the treatment of energy dissipation. 

      -   When substituting the rates in Eq. (1) into the definition of δQ_R above Eq. (10), "σ" does not appear on the right-hand side. Does this mean that one of the rates in the lower pathway must include σ in its definition? Please clarify.

      We apologize to the reviewer for this typo. Indeed, \sigma sets the energy scale of feedback and, as such, it appears in the energetic driving given by the feedback on the receptor, i.e., in Eq. (1) together with \kappa. This typo has been corrected in the revised manuscript, and all subsequent equations are consistent.

      -   I understand that the production of storage molecules has an associated cost σ and hence contributes to dissipation. The dependence of receptor dissipation on ⟨H⟩, however, is not fully clear. If the environment were static and the memory block was absent, the term with ⟨H⟩ would still contribute to dissipation. What would be the nature of this dissipation?

      In the spirit of building a paradigmatic minimal model with a thermodynamic meaning, we considered H to act as an external thermodynamic driving. Since this driving acts on a different pathway with respect to the one affected by the storage, the receptor is driven out of equilibrium by its presence.

      By eliminating the memory block, we would also be necessarily eliminating the presence of the pathway associated with the storage effect (“internal pathway” in the manuscript), since its presence is solely due to the existence of a storage population. Therefore, in this case, the receptor would be a 2-state, 1-pathway system and, as such, it would always satisfy an effective detailed balance. As a consequence, the definition of \delta Q_R reported in the manuscript would not hold anymore and the receptor would not exhibit any dissipation. Thus, in a static environment and without a memory block, no receptor dissipation would be present. We would also like to stress that our choice to model two different pathways has been motivated by the observation that the negative feedback acts along a different pathway in several biochemical and biological examples. We made some changes to the model description in the revised version and we hope that this aspect has been clarified.

      -   Similarly, in Eq. (9) the authors use the ratio of the rates Γ_{s → s+1} and Γ_{s+1 → s} in their expression for internal dissipation. The first rate corresponds to the synthesis reaction of memory molecules, while the second corresponds to a degradation reaction. Since the second reaction is not the microscopic reverse of the first, what would be the physical interpretation of the log of their ratio? Since the authors already use σ as the energy cost per storage unit, why not use σ times the rate of producing S as a metric for the dissipation rate?

      We agree with the referee that the reverse reaction we considered is not the microscopic reverse of the storage production. In the case of a fast readout population, we employed a coarse-grained view to compute this entropy production. To be more precise, we gladly welcomed the referee’s suggestion in the revised version and modified the manuscript accordingly. As suggested, we now employ the energy flux associated with the storage production to estimate the internal dissipation (see new Fig. 3). 

      In the revised version, we also use this quantity in the optimization procedure in combination with \delta Q_R (see new Fig. 4) to have a complete characterization of the system’s energy consumption. The conclusions are qualitatively identical to before, but we believe that now they are more solid from a theoretical perspective. For this important advance in the robustness and quality of our work, we are profoundly grateful to the referee.

      (3) Impact of the pre-stimulus state. The plots in Figure 2 suggest that the environment was static before the application of repeated stimuli. Can the authors comment on the impact of the pre-stimulus state on the degree of habituation and its optimality properties? Specifically, would the conclusions stay the same if the prior environment had stochastic but aperiodic dynamics? 

      The initial stimulus is indeed stochastic with an average constant in time and mimics the background (small) signal. We apply the (strong) stimulation when the system has already reached a stationary state with respect to the background. As can be appreciated in Fig. 2 of the revised version, the model response depends on the pre-stimulus level, since it sets the storage concentration before the stimulation arrives and, as such, the subsequent habituation dynamics. This dependence is important from a dynamical perspective. The information-theoretic picture has been developed, as said above, by letting the system relax before the first stimulus. This eliminates this arbitrary dependence and provides a clearer idea of the functional advantages of habituation. Moreover, the optimization procedure is performed in a completely different setting, with no pre-stimulus at all, since we only have one prolonged stimulation. We hope that the revised version is clearer on all these points.

      (4) Clarification about the memory requirement for habituation. Figure 4 and the associated section argue for the essential role that the storage mechanism plays in habituation. Indeed, Figure 4a shows that the degree of habituation decreases with decreasing memory. The graph also shows that in the limit of vanishingly small Δ⟨S⟩, the system can still exhibit a finite degree of habituation. Can the authors explain this limiting behavior; specifically, why does habituation not vanish in the limit Δ⟨S⟩ -> 0?

      We apologize for the lack of clarity and we thank the reviewer for spotting this issue. In Figure 4 (now Figure 5 in the revised manuscript) Δ⟨S⟩ is not exactly zero, but equal to 0.15% at the final point. It appeared as 0% in the plot due to an unwanted rounding in the plotting function that we missed. This has been fixed in the revised version, thank you.

      Reviewer #3 (Recommendations for the authors):

      (1) Page 2 | "Figure 1b-e" should be "Figure 1b-d" since there is no panel (e) in Figure 1. 

      (2) Figure 1a | In the top schematic, the symbol "k" is used, while in the rest of the text, the proportionality constant is denoted by κ. 

      We thank the reviewer for pointing this out. Figure 1 has been revised and the panels are now consistent. The proportionality constant (the inhibition strength) has also been fixed.

      (3) Figure 1a | I find the upper part of the schematic for Storage hard to perceive. I understand the lower part stands for the degradation reaction for storage molecules. The upper part stands for the synthesis reaction catalyzed by the readout population. I think the bolded upper arrow would explain it sufficiently well; the left/right arrows, together with the crossed green circle make that part of the figure confusing. Consider simplifying. 

      We decided to remove the left/right arrows, as suggested by the reviewer, as we agree that they were unnecessarily complicating the schematic. We hope that the revised version will be easier to understand.

      (4) Page 3 | It would be helpful to tell what the temporal statistics of the input signal $p_H(h,t)$ is, i.e. ⟨h(t) h(t')⟩. Looking at the example trajectory in Figure 1a, consecutive signal values do not seem correlated.

      We agree with the reviewer that this is an important detail and worth mentioning. We now explicitly state that consecutive values are not correlated, for simplicity. 

      (5) Figure 2 | I believe the label "EXTERNAL INPUT" refers to the *average* external input, not one specific realization (similar to panels (d) and (e) that report on average metrics). I suggest you indicate this in the label, or, what may be even better, add one particular realization of the stochastic input to the same graph.

      We thank the reviewer for spotting this. We now write that what we show is the average external signal. We prefer this solution rather than showing a realization of the stochastic input, since it is more consistent with the rest of the plots, where we always show average quantities. We also note that Figure 2 is now Figure 3 in the revised manuscript.

      (6) Figure 2d | The expression of Δ⟨U⟩ is the negative of the definition in Eq. (5). It should be corrected.

      In the revised version, both the definitions in Figure 2 (now Figure 3) and in the text (now Eq. (11)) are consistent.

      (7) Figure 3(d-e) caption | "where ⟨U⟩ starts to be significantly smaller than zero." There, it should be Δ⟨U⟩ instead of ⟨U⟩. 

      Thanks again, we corrected this typo.

    1. Now, there are a million implications to outsourcing our first drafts to AI. We know people anchor on the first idea they see, influencing their future work, so even drafts that are completely rewritten will be AI-tinged. People may not be as thoughtful about what they write, or the lack of effort may mean they don’t think through problems as deeply.

      The starting point can no longer be a draft, must be a conversation?

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      Wang et al. identify Hamlet, a PR-containing transcription factor, as a master regulator of reproductive development in Drosophila. Specifically, the fusion between the gonad and genital disc is necessary for the development of continuous testes and seminal vesicle tissue essential for fertility. To do this, the authors generate novel Hamlet null mutants by CRISPR/Cas9 gene editing and characterize the morphological, physiological, and gene expression changes of the mutants using immunofluorescence, RNA-seq, cut-tag, and in-situ analysis. Thus, Hamlet is discovered to regulate a unique expression program, which includes Wnt2 and Tl, that is necessary for testis development and fertility. 

      Strengths: 

      This is a rigorous and comprehensive study that identifies the Hamlet-dependent gene expression program mediating reproductive development in Drosophila. The Hamlet transcription targets are further characterized by Gal4/UAS-RNAi confirming their role in reproductive development. Finally, the study points to a role for Wnt2 and Tl as well as other Hamlet transcriptionally regulated genes in epithelial tissue fusion. 

      We appreciate that the reviewer thinks our study is rigorous.

      Weaknesses: 

      The image resolution and presentation of figures is a major issue in this study. As a nonexpert, it is nearly impossible to see the morphological changes as described in the results. Quantification of all cell biological phenotypes is also lacking therefore reducing the impact of this study to those familiar with tissue fusion events in Drosophila development. 

      In the revised version, we have improved the image presentation and resolution. For all images with more than two channels, we included single-channel images, changed the green color to lime and the red to magenta, and highlighted the testis (TE) and seminal vesicles to make morphological changes more visible.

      The original version included quantification of marker gene expression; we have now also included quantification of the cell biological phenotypes, which generally show 100% penetrance.

      Reviewer #2 (Public review): 

      Strengths: 

      Wang and colleagues successfully uncovered an important function of the Drosophila PRDM16/PRDM3 homolog Hamlet (Ham) - a PR domain-containing transcription factor with known roles in the nervous system in Drosophila. To do so, they generated and analyzed new mutants lacking the PR domain, and also employed diverse preexisting tools. In doing so, they made a fascinating discovery: They found that PR-domain containing isoforms of ham are crucial in the intriguing development of the fly genital tract. Wang and colleagues found three distinct roles of Ham: (1) specifying the position of the testis terminal epithelium within the testis, (2) allowing normal shaping and growth of the anlagen of the seminal vesicles and paragonia and (3) enabling the crucial epithelial fusion between the seminal vesicle and the testis terminal epithelium. The mutant blocks fusion even if the parts are positioned correctly. The last finding is especially important, as there are few models allowing one to dissect the molecular underpinnings of heterotypic epithelial fusion in development. Their data suggest that they found a master regulator of this collective cell behavior. Further, they identified some of the cell biological players downstream of Ham, like for example E-Cadherin and Crumbs. In a holistic approach, they performed RNAseq and intersected them with the CUT&TAG-method, to find a comprehensive list of downstream factors directly regulated by Ham. Their function in the fusion process was validated by a tissue-specific RNAi screen. Meticulously, Wang and colleagues performed multiplexed in situ hybridization and analyzed different mutants, to gain a first understanding of the most important downstream pathways they characterized, which are Wnt2 and Toll. 

      This study pioneers a completely new system. It is a model for exploring a process crucial in morphogenesis across animal species, yet not well understood. Wang and colleagues not only identified a crucial regulator of heterotypic epithelial fusion but took on the considerable effort of meticulously pinning down functionally important downstream effectors by using many state-of-the-art methods. This is especially impressive, as the dissection of pupal genital discs before epithelial fusion is a time-consuming and difficult task. This promising work will be the foundation future studies build on, to further elucidate how this epithelial fusion works, for example on a cell biological and biomechanical level. 

      We appreciate that the reviewer thinks our study is original and important.

      Weaknesses: 

      The developing testis-genital disc system has many moving parts. Myotube migration was previously shown to be crucial for testis shape. This means, that there is the potential of non-tissue autonomous defects upon knockdown of genes in the genital disc or the terminal epithelium, affecting myotube behavior which in turn affects fusion, as myotubes might create the first "bridge" bringing the epithelia together. The authors clearly showed that their driver tools do not cause expression in myoblasts/myotubes, but this does not exclude non-tissue autonomous defects in their RNAi screen. Nevertheless, this is outside the scope of this work. 

      We thank the reviewer for considering non-tissue autonomous defects upon gene knockdown. The driver, hamRSGal4, drives reporter gene expression mainly in the RS epithelia, but we did observe weak expression of the reporter in the myoblasts before they differentiate into myotubes. Thus, we could not rule out a non-tissue autonomous effect in the RNAi screen. We have therefore included a statement in the Results: “Given that the hamRSGal4 driver is highly expressed in the TE and SV epithelia, we expect highly effective knockdown occurs only in these epithelial cells. However, hamRSGal4 also drives weak expression in the myoblasts before they differentiated into myotubes (Supplementary Fig. 5B), which may result in a non-tissue autonomous effect when knocking down the candidate genes expressed in myoblasts.”

      However, one point that could be addressed in this study: the RNAseq and CUT&TAG experiments would profit from adding principal component analyses, elucidating similarities and differences of the diverse biological and technical replicates. 

      Thanks for the suggestion. We have now included the PCA in Supplementary Figure 6A-B and the corresponding description in the text. The PCA graphs validated the consistency between biological replicates of the RNA-seq samples. The Cut&Tag graphs confirm the consistency between the two biological replicates from the GFP samples, but show a higher variability between the w1118 replicates. Importantly, we only considered the overlapping peaks pulled down by the GFP antibody from the ham_GFP genotype and by the Ham antibody from the wildtype (w1118) sample as true Ham binding sites.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      Major Concern: 

      (1) The image resolution and presentation of figures (Figures 2, 5, 6, and 7) is a major issue in this study. As a non-expert, it is nearly impossible to see the morphological changes as described in the results. Images need to be captured at higher resolution and zoomed in with arrows denoting changes as described. Individual channels, particularly for intensity measurement need to be shown in black and white in addition to merged images. Images also need pseudo-colored for color-blind individuals (i.e. no red-green staining). 

      The images were captured at a high resolution, but the resolution was dramatically reduced in the bioRxiv PDF. We tried to overcome this by submitting the PDF directly through the eLife submission system. In the revised version, we have included single-channel images and changed the green and red colors to lime and magenta for color-blind readers. We also highlighted the testis (TE) and seminal vesicle structures in the images to make morphological changes more visible.

      (2) The penetrance of morphological changes observed in RT development is also unclear and needs to be rigorously quantified for data in Figures 2, 5, and 7. 

      We have now included quantification of the cell biological phenotypes, which generally show 100% penetrance. The percentage penetrance and the number of animals used are indicated in each corresponding image.

      Reviewer #2 (Recommendations for the authors): 

      Major Points 

      (1) Lines 193- 220 I would strongly suggest pointing out the obvious shape defects of the testes visible in Figure 2A ("Spheres" instead of "Spirals"). These are probably a direct consequence of a lack in the epithelial connection that myotubes require to migrate onto the testis (in a normal way) as depicted in the cartoons, allowing the testis to adopt a spiral shape through myotube-sculpting (Bischoff et al., 2021), further confirming the authors' findings! 

      Good point. In the revised text, we have added more description of the testis shape defects and pointed out a potential contribution from compromised myotube migration.   

      (2) Line 216: "Often separated from each other". Here it would be important to mention how often. If the authors cannot quantify that from existing data, I suggest carrying it out in adult/pharate adult genital tracts (if there is no strong survivor bias due to the lethality of stronger affected animals), as this is much easier than timing prepupae. This should be a quick and easy experiment. 

      Because it is hard to tell whether the separation of the SV and TE was caused by developmental defects or was sometimes due to technical issues (bad dissection), we have changed the description to, “control animals always showed connected TE and SV, whereas ham mutant TE and SV tissues were either separated from each other, or appeared contacted but with the epithelial tubes being discontinuous (Fig. 2B).” Additionally, we quantified the disconnection phenotype, which shows 100% penetrance in 18 mutant animals. This quantification is now included in the figure.

      (3) Lines 289-305, Figure 3. I could only find how many replicates were analyzed in the RNAseq/CUT&Tag experiments in the Material & Methods section. I would add that at least in the figure legends, and perhaps even in the main text. Most importantly, I would add a Principal Component Analysis (one for RNAseq and one for the CUT&TAG experiment), to demonstrate the similarity of biological replicates (3x RNaseq, 4x Cut&Tag) but also of the technical replicates (RNAseq: wt & wt/dg, ham/ham & ham/df, GD & TE; CUT&TAG: Antibody & GFP-Antibody, TG&TE...). This should be very easy with the existing data, and clearly demonstrate similarities & differences in the different types of replicates and conditions. 

      Principal component analysis and its description are now added to Supplementary Fig. 6 and the main text, respectively.

      (4) Line 321; Supplementary Table 1: In the table, I cannot find which genes are down- or upregulated - something that I think is very important. I would add that, and remove the "color" column, which does not add any useful information. 

      In Supplementary table 1, the first sheet includes upregulated genes while the second sheet includes downregulated genes. We removed the column “color” as suggested.  

      (5) Line 409: SCRINSHOT was carried out with candidate genes from the screen. One gene I could not find in that list was the potential microtubule-actin crosslinker shot. If shot knockdown caused a phenotype, then I would clearly mention and show it. If not, I would mention why a shot is important, nonetheless. 

      shot is one of the candidate target genes selected from our RNA-seq and Cut&Tag data. However, in the RNAi screen, knocking down shot with the available RNAi lines did not cause any obvious phenotype. This could be due to inefficient RNAi knockdown or redundancy with other factors. We nevertheless wanted to examine the shot expression pattern in the developing RS, given the important role of shot in epithelial fusion (Lee S., 2002). Using SCRINSHOT, we could detect epithelial-specific expression of shot, implying its potential function in this context. We have now revised the text to clarify this point.

      Minor points 

      (1) Cartoons in Figure 1: The cartoons look like they were inspired by the cartoon from Kozopas et al., 1998 Fig. 10 or Rothenbusch-Fender et al., 2016 Fig 1. I think the manuscript would greatly profit from better cartoons, that are closer to what the tissue really looks like (see Figure 1H, 2G), to allow people to understand the somewhat complicated architecture. The anlagen of the seminal vesicles/paragonia looks like a butterfly with a high columnar epithelium with a visible separation between paragonia/seminal vesicles (upper/lower "wing" of the "butterfly"). Descriptions like "unseparated" paragonia/seminal vesicle anlagen, would be much easier to understand if the cartoons would for example reflect this separation. It would even be better to add cartoons of the phenotypic classes too, and to put them right next to the micrographs. (Another nitpick with the cartoons: pigment cells are drastically larger and fewer in number (See: Bischoff et al., 2021 Figure 1E & MovieM1).) 

      Thanks for the suggestion. We have updated Figure 1 by adding additional illustrations showing the accessory gland and seminal vesicle structures in the pupal stage and changing the size of pigment cells.

      (2) Line 95-121 I would also briefly introduce PR domains, here. 

      We have added a brief description of the PR domains.

      (3) Line 152, 158, 160, 162. When first reading it, I was a bit confused by the usage of the word sensory organ. I would at least introduce that bristles are also known as external mechanosensory organs. 

      We have now revised the description to “mechano-sensory organ”.

      eg. Line 184, 194, and many more. Most times, the authors call testis muscle precursors "myoblasts". This is correct sometimes, but only when referring to the stage before myoblast-fusion, which takes place directly before epithelial fusion (28 h APF). Post-myoblast-fusion (eg. during migration onto the testis), these cells should be called myotubes or nascent myotubes, as the fly muscle community defined the term myoblast as the single-nuclei precursors to myotubes.

      We have now revised the description accordingly.  

      (4) Line 217/Figure 2B. It looks like there is a myotube bridge between the testis and the genital disc. I would point that out if it's true. If the authors have a larger z-stack of this connection, I suggest creating an MIP, and checking if there are little clusters of two/three/four nuclei packed together. This would clearly show that the cells in between are indeed myotubes (granted that loss of ham does not introduce myoblast-fusion-defects). 

      We do not have a Z-stack of this connection, and thus cannot confirm whether the cells in this image are myotubes. However, we found that myotubes can migrate onto the testis and form the muscular sheet in the ham mutant despite reduced myotube density. At the junction there are myotubes, suggesting that loss of ham does not introduce myoblast-fusion defects. These results are now included in the revised manuscript, Supplementary Fig. 5C-D.

      (5) Line 231/Supplementary Fig. 3C-G: I would add to the cartoons, where the different markers are expressed. 

      We have added marker gene expression in the cartoons.

      (6) Line 239. I don't see what Figure 1A/1H refers to, here. I would perhaps just remove it. 

      Yes, we have removed it.

      (7) Line 232. I would rephrase the beginning of the sentence to: Our data suggest Ham to be... 

      Yes, we have revised it.

      (8) Line 248-250/Figure 2F. Clonal analyses are great, but I think single channels should be shown in black and white. Also, a version without the white dashed line should be shown, to clearly see the differences between wt and ham-mutant cells. 

      Now single channel images from the green and red images are presented in Supplementary Figures. This particular one is in Supplementary Figure 3B. 

      (9) Line 490. The Toll-9 phenotype was identified on the sterility effect/lack-of-sperm phenotype alone, and it was deduced, that this suggests connection defects. By showing the right focus plane in Fig S8B (lower right), it should be easy to directly show whether there is a connection defect or not. Also, one would expect clearer testis-shaping defects, like in ham-mutants, as a loss of connection should also affect myotube migration to shape the testis. This is just a minor point, as it only affects supplementary data with no larger impact on the overall findings, even if Toll-9 is shown not to have a defect, after all.

      We find that scoring defects at the junction site at the adult stage is difficult and may not always be accurate. Instead, we scored the presence of sperm in the SV, which indirectly but firmly indicates a successful connection between the TE and SV. We have now included a quantification graph showing the penetrance of the phenotype in the new Supplementary Fig. 14C. There were indeed morphological defects of the TE in Toll-9 RNAi animals. We have now included the image and quantification in the new Supplementary Fig. 14B.

    1. Author response:

      The following is the authors’ response to the original reviews

      Response to the public reviews:

      We are very pleased to see these positive reviews of our preprint.

      Reviewers 1 and 3 raise issues around PIP-PP1 interactions.

      (1) Role of the “RVxF-ΦΦ-R-W string”

      Most PIPs interact with the globular PP1 catalytic core through short linear interaction motifs (SLiMs), and Choy et al (PNAS 2014) previously showed that many PIPs interact with PP1 through a conserved trio of SLiMs, RVxF-ΦΦ-R, which is also present in the Phactrs.

      Previous structural analysis showed that the trajectory of the PPP1R15A/B, Neurabin/Spinophilin (PPP1R9A/B), and PNUTS (PPP1R10) PIPs across the PP1 surface encompasses not only the RVxF-ΦΦ-R trio, but also additional sequences C-terminal to it (Chen et al, eLife, 2015). This extended trajectory is maintained in the Phactr1-PP1 complex (Fedoryshchak et al, eLife, 2020). Based on structural alignment we proposed the existence of an additional hydrophobic “W” SLiM that interacts with the PP1 residues I133 and Y134.

      The extended “RVxF-ΦΦ-R-W” interaction brings sequences C-terminal to the “W” SLiM into the vicinity of the hydrophobic groove that adjoins the PP1 catalytic centre. In the Phactr1/PP1 complex, these sequences remodel the groove, generating a novel pocket that facilitates sequence-specific substrate recognition.

      This raises the possibility that sequences C-terminal to the extended “RVxF-ΦΦ-R-W string” in the other complexes also confer sequence-specific substrate recognition, and our study aims to test this hypothesis. Indeed, the hydrophobic groove structures of the Neurabin/Spinophilin/PP1 and Phactr1/PP1 complexes differ significantly (Ragusa et al, 2010; see Fedoryshchak et al 2020, Figure 2 figure supplement 1).

      (2) Orientation of the W side chain

      Reviewer 1 points out that in the substrate-bound PP1/PPP1R15A/Actin/eIF2 pre-dephosphorylation complex the W sidechain is inverted with respect to its orientation in the PP1-PPP1R15B complex (Yan et al, NSMB 2021). The authors proposed that this may reflect the role of actin in assembly of the quaternary complex. This does not necessarily invalidate the notion that sequences C-terminal to the “W” motif might play a role in actin-independent substrate recognition, and we therefore consider our inclusion of the R15A/B fusions in our analysis to be reasonable.

      (3) Conservation of W

      The motif ‘W’ does not mandate tryptophan - Phactrs and PPP1R15A/B indeed have W at this position but Neurabin/spinophilin contain VDP, which makes similar interactions. Similarly the “RVxF” motifs in Phactr1, Neurabin/Spinophilin, PPP1R15A/B and PNUTS are LIRF, KIKF, KV(R/T)F and TVTW respectively.

      In our revision, we will present comparisons of the differentially remodelled/modified PP1 hydrophobic groove in the various complexes, and discuss the different orientations of the tryptophan in the previously published PPP1R15A/PP1 and PPP1R15B/PP1 structures. We will also address the other issues raised by the referees.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comments and suggestions for revisions

      (1) The authors do not provide strong evidence that the interactions of the 'W' of the RVxF- øø -R-W string with the hydrophobic groove of PP1 is conserved in PIPs. Whereas the RVxF motif is well conserved and validated since its discovery in 1997, as are the øø - (an extension of the RVxF motif), and the 'R', the conservation of the Trp residue in the RVxF-øø-R-W string is not conserved.

      We did not mean to imply that the W motif is conserved amongst all PIPs.

      Most PIPs interact with the globular PP1 catalytic core through short linear interaction motifs (SLiMs). Choy et al (PNAS 2014) previously showed that many PIPs interact with PP1 through a conserved trio of SLiMs, RVxF-ΦΦ-R, which is also present in the Phactrs.

      Previous structural analysis showed that the PPP1R15A/B, Neurabin/Spinophilin (PPP1R9A/B), and PNUTS (PPP1R10) PIPs share a trajectory across the PP1 surface that encompasses not only the RVxF-ΦΦ-R SLiMs, but also additional sequences C-terminal to the R SLiM (Chen et al, eLife, 2015). This trajectory is also shared by the Phactr1-PP1 complex (Fedoryshchak et al, eLife, 2020). Based on this structural alignment we proposed the existence of an additional hydrophobic “W” SLiM that interacts with the PP1 residues I133 and Y134 (see Fedoryshchak et al, 2020, Figure 1 figure supplement 2).

      Introduction, paragraph 2 is rewritten to make this clearer.

      The sequence and positions of W differ in amino acid type and position relative to the RVxF-øø-R string.

      The motif ‘W’ does not mandate tryptophan; it is our name for a common structurally aligned motif: although the Phactrs and PPP1R15A/B indeed have W at this position, Neurabin and spinophilin contain VDP, which nevertheless makes similar interactions. Similarly, the “RVxF” motifs in Phactr1, Neurabin/Spinophilin, PPP1R15A/B and PNUTS are LIRF, KIKF, KV(R/T)F and TVTW respectively.

      In the Discussion the authors state that the hydrophobic groove of PP1 is remodelled by Neurabin. However, details of this are not described or shown in the manuscript.

      The shared trajectory determined by the RVxF-øø-R-W string brings the sequences C-terminal to the W SLiM into the vicinity of the PP1 hydrophobic groove. In the Phactr1/PP1 holoenzyme this generates a novel pocket required for substrate recognition (Fedoryshchak et al, 2020). These observations raised the possibility that sequences C-terminal to the “W” motif in the other RVxF-øø-R-W PIPs also play a role in substrate recognition.

      Introduction paragraph 3 now cites a new Figure 1-S2, which shows how the hydrophobic groove is remodelled in the various different PIP/PP1 complexes. A revised Figure 1A now indicates the hydrophobic residues defining the hydrophobic groove by grey shading.

      (2) To add to the confidence of the structure, the authors should include a 2Fo-Fc simulated annealing omit map, perhaps showing the R and W interactions of the RVxF-øø-R-W string.

      This is now included as new Figure 6 Figure supplement 1. Note that in Neurabin, the W motif is VDP, where the valine and proline sidechains interact similarly to the tryptophan (see also new Figure 1-S2G,H).

      We also add a new supplementary Figure 6-S1 comparing our PBM-liganded Neurabin PDZ domain with the previously published unliganded structure (Ragusa et al 2010).

      (3) Page 16. The authors state that spinophilin remodels the PP1 hydrophobic groove differently from Phactrs. Arguably spinophilin does not remodel the PP1 hydrophobic groove at all. There are no contacts between spinophilin and the PP1 hydrophobic groove in the spinophilin-PP1 structure, correlating with the absence of 'W" in the RVxF-øø-R-W string in spinophilin.

      The VDP sequence corresponding to the W motif in spinophilin and neurabin makes analogous contacts to those made by the W in Phactr1 (see Fedoryshchak et al 2020).

      Remodelling is meant in the sense of altering the structure of the major groove by bringing new sequences into its vicinity rather than necessarily directly interacting with it. The spinophilin/PP1 and Phactr/PP1 hydrophobic grooves are compared in new Figure 1-S2 (see also Fedoryshchak et al 2020, Figure 2 figure supplement 1).

      (4) Page 8. For the cell-based/proteomics-dephosphorylation assay in Figure 2, it isn't clear why there were no dephosphorylation sites detected for the PPP1R15A/B-PP1 fusion (except PPP6R1 S531 for PPP1R15B). One might have expected a correlation with PP1 alone. Does this imply that PPP1R15A/B are inhibiting PP1 catalytic activity? Was the activity tested in vitro?

      The R15A/B data are compared to the average abundance of all the phosphosites in the dataset, including those of PP1.

      We have not tested for a general inhibitory effect of R15A/B on PP1 activity. Many PIPs, including R15A/B, do occlude one or more of the PP1 substrate grooves and therefore generally act as inhibitors of PP1 activity against some potential substrates, while enhancing activity against others.

      Other points 

      (4) Figure S1: Colour sequence similarities/identities.

      Done

      (6) Figures: Structure figures lacked labels:

      Figure 1A, label PP1, Phactrs etc.

      Done

      Figure 6, label PP1, Neurabin, previous Neurabin structure (Fig. 6C), hydrophobic groove, PDZ domain, etc.

      Done

      (7) Statistical analysis. p values should be shown for data in:

      Figure 5.

      To avoid cluttering the Figure, a new sheet, “statistical significance” has been added to Supplementary Table 3, summarizing the analysis.

      Figure 1.

      Figure amended (now figure 1-S1).

      (8) Some inconsistency with labels, eg '34-WT' used in Fig. 5C, whereas '34A-WT' (better) in Methods.

      Now changed to 34A etc where used.

      (9) Page 6. PPP1R9A/B is not shown in Figure 1A and Figure S1A.

      PPP1R9A/B are Neurabin and spinophilin - now clarified in Introduction paragraph 2, Results paragraph 1, Discussion paragraph 1.

      (10) Page 7: lines 4, 'site' not 'side'.

      Done

      (11) Page 9: DTL and CAMSAP3 were found to be dephosphorylated in the PP1-Neurabin/spinophilin screen. Are these PDZ-binding proteins?

      Neither DTL nor CAMSAP3 contains the C-terminal hydrophobic residues characteristic of classical PBMs. A sentence has been added in Discussion, paragraph 5.

      (12) Page 12 and Figure 5 and S5: The synthetic p4E-BP1 and IRSp53WT peptides with PBM should be given more specific names to indicate the presence of the PBM.

      We have renamed 4E-BP1<sup>WT</sup> and IRSp53<sup>WT</sup> to 4E-BP1<sup>PBM</sup> and  IRSp53<sup>PBM</sup> respectively, emphasising the inclusion of the wildtype or mutated PBM from 4E-BP1 on these peptides.

      Text, Figure 5, and Figure S5 all revised accordingly.

      (13) Give PDB code for spinophilin-PP1 complex coordinates shown in Figure 6C.

      PDB codes for the various PIP/PP1 complexes now given in new Figure 1-S2 and revised Figure 6C.

      Reviewer #2 (Recommendations for the authors):

      The work undertaken by the authors is extensive and robust, however, I believe that some improvement in the writing and some detailed explanation of certain results sections would help with the presentation of the work and clarity for the readers.

      (1) The introduction should contain more information about the interaction between PP1 and Neurabin, given that this is the focus of the paper. This would give the reader the necessary background required to follow the paper.

      Introduction paragraph 2 revised to describe the different SLiMs in more detail. New Figure 1-S2 shows detail of the different remodelled hydrophobic grooves in the various PIP/PP1 complexes.

      (2) More information on PP1-IRSp53L460A has to be added before discussing results in S1B.

      Sentence explaining that IRSp53 L460 docks with the remodelled PP1 hydrophobic groove in the Phactr1/PP1 holoenzyme added in Results paragraph 2.

      (3) Page 6: "as expected, the +5 residue L460A mutation, which impairs dephosphorylation by the intact Phactr1/PP1 holoenzyme, impaired sensitivity to all the fusions, indicating that they recognise phosphorylated IRSp53 in a similar way (Figure S1B)". Statistics between IRSp53 and IRSp53L460A across PP1-PIPs need to be conducted before concluding the above. From the graph and the images, the impairment to dephosphorylation is not convincing.

      For each of the four PP1-Phactr fusions, the IRSp53 L460A peptide shows significantly less reactivity than the IRSp53WT peptide (p<0.05 for each fusion).

      Since the proteomics studies in Figure 2 show that the substrate specificity of the four PP1-Phactr1 fusions is virtually identical, we combined the data for the four different fusions. The IRSp53 L460A peptide shows significantly less reactivity than the IRSp53WT peptide in this analysis (p< 0.0001). This result is shown in revised Figure S1B and legend.

      (4) mCherry-4E-BP1(118+A), in which an additional C-terminal alanine should still allow TOS-mediated phosphorylation, but prevent PDZ interaction. Does 4EBP1 (118+A) actually prevent interaction between PP1-Neurabin? This interaction needs to be validated, especially since spinophilin was shown to bind to multiple regions of PP1.

      It is not clear what the referee is asking for here. The biochemical analysis in Figure 4C shows that the C-terminus of 4E-BP1 constitutes a classical PBM. The X-ray crystallography in Figure 6 confirms this, demonstrating H-bond interactions between the 4E-BP1 C-terminal carboxylate and main chain amides of L514, G515 and I516.

      We consider the possibility that the 4E-BP1(118+A) mutant inhibits the activity of PP1-neurabin via a mechanism other than directly blocking the 4E-BP1/PDZ interaction to be unlikely for the following reasons:

      (1) Addition of a C-terminal alanine will disrupt the PBM interaction because the extra residue sterically blocks access to the PBM-binding groove. This is the most parsimonious explanation, and is based on our solid structural and biochemical evidence that the 4E-BP1 C-terminus is a classical PBM.

      (2) Alphafold3 modelling predicts Neurabin PDZ / 4E-BP1 PBM interaction with high confidence (shown in Figure 6-S2E), but it does not predict any PDZ interaction with 4E-BP1(118+A). Note added in Figure 6-S2 legend.

      (3) Recognition of the 4E-BP1(118+A) mutation without loss of binding affinity would require that the mutant be capable of binding in a manner formally equivalent to recognition of an “internal” PDZ-binding peptide. Recognition of such “internal peptides” is dependent on their adopting a specifically constrained conformation, which typically requires reorganisation of the PDZ carboxylate-binding GLGF loop. Such “internal site” recognition typically involves more than one residue C-terminal to the conventional PDZ “0” position (see Penkert et al NSMB 2004, doi:10.1038/nsmb839; Gee et al JBC 1998, DOI: 10.1074/jbc.273.34.21980; Hillier et al 1999, Science PMID: 10221915).

      (5) It is nice to see that the various PP1-Phactr fusions have around 60% substrate overlap between them. Would it be possible to compare these results with previously published mass spec data of Phactr1XXX from the group? There is mention of some substrates being picked up, but a comparison much like in Figure 2E would be more informative about the extent to which the described method captures relevant information.

      This is difficult to do directly, as the PP1-Phactr fusion data are from human cells while those in Fedoryshchak et al 2020 are from mouse.

      However, manual curation shows that of the 28 top hits seen in our previous analysis of Phactr1XXX in NIH3T3 cells, 18 were also detectable in the HEK293 system; of these, 13 were also detected as PP1-Phactr fusion hits. Data summarised in new Figure 2-S1C. Text amended in Results, “Proteomic analysis...”, paragraph 2.

      (6) Figure 3D Why are the levels of pT70, pT37/46 and total protein in vector controls much lower as compared to 0nM Tet in PP1-Neurabin conditions? It is also weird that given total protein is so low, why are the pS65/101 levels high compared to the rest?

      We think it likely these phenomena reflect a low level expression of PP1-Neurabin expression in uninduced cells. Now noted in Figure 3D legend, basal PP1-Neurabin expression shown in new Figure 3-S1C. This alters the relative levels of the different species detected by the total 4E-BP1 antibody in favour of the faster migrating forms, which are less phosphorylated than the slower ones, and the total amount increases about 2-fold (Figure 3D, compare 0nM Tet lanes).

      The altered pS65/101-pT70 ratio is also likely to reflect the leaky PP1-Neurabin expression, since the relative intensities of the various phosphorylated species are dependent on both the relative rates of phosphorylation and dephosphorylation. Expression of a phosphatase would therefore be expected to differentially affect the phosphorylation levels of different sites according to their reactivity.

      (7) Figure 3E: Does inhibiting mTORC further reduce translation when PP1-Neurabin is expressed? If this is the case, this might suggest that they might not necessarily be mTORC inhibitors?

      We have not done this experiment. Since Rapamycin cannot be guaranteed to completely block 4E-BP1 phosphorylation, and PP1-Neurabin cannot be guaranteed to completely dephosphorylate 4E-BP1, any further reduction upon their combination would be hard to interpret.

      (8) Substrate interactions with the remodelled PP1 hydrophobic groove do not affect PP1-Neurabin specificity. Is there evidence that PP1-Neurabin remodels the hydrophobic groove? Is it not possible that Neurabin does not remodel the PP1 groove to begin with and hence there is no effect observed with the various mutants? If this is not the case, it should be explained in a bit more detail.

      Comparison of the Neurabin/PP1 and Phactr1/PP1 structures shows that the hydrophobic groove is remodelled differently in the two complexes. Now shown in new Figure 1-S2B,C,G.

      (9) Figure 5B has a lot of interesting information, which I believe has not been discussed at all in the results section.

      To help interpretation of the enzymology in Figure 5 we have renamed 4E-BP1WT and IRSp53WT to 4E-BP1PBM and IRSp53PBM respectively, emphasising the inclusion of the wildtype or mutated PBM from 4E-BP1 on these peptides. Text in Results, “PDZ domain interaction…”, paragraph 1, and Figures 5 and S5 revised accordingly.

      Why does the 4E-BP1Mut affect catalytic efficiency of PP1 alone when compared with WT, while no difference is observed with IRSp53WT and mutant?

      We do not understand the basis for the differential reactivity of 4E-BP1PBM and 4E-BP1MUT with PP1 alone; we suspect that it reflects the hydrophobicity change resulting from the MDI -> SGS substitution. However, this is unlikely to be biologically significant as PP1 is sequestered in PIP-PP1 complexes.

      Importantly, the two PP1 fusion proteins behave consistently in this assay – the presence of the intact PBM increases reactivity with PP1-Neurabin, but has no effect on dephosphorylation by PP1-Phactr1.

      Why does PP1 alone not have a difference between IRSp53WT and mutant, while PP1-Neurabin does have a difference?

      This is due to the presence of the PBM in IRSp53WT (now renamed IRSp53PBM), which increases affinity for PP1-Neurabin, but not PP1 alone. Likewise, PP1-Phactr1, which does not possess a PDZ domain, is also unaffected by the integrity of the PBM.

      (7) “Strikingly, alanine substitutions at +1 and +2 in 4E-BP1WT increased catalytic efficiency by both fusions, perhaps reflecting changes at the catalytic site itself (Figure 5E, Figure S5E)”. This could be expanded upon, because this suggests a mechanism that makes the substrate refractory to PDZ/hydrophobic groove remodelling?

      We favour the idea that this reflects a requirement to balance dephosphorylation rates between the multiple 4E-BP1 phosphorylation sites, especially if multiple rounds of dephosphorylation occur for each PBM—PDZ interaction. Additional sentences added in Discussion paragraph 7.

      (8) Typographical errors and minor comments:

      a) PIPs can target PP1 to specific subcellular locations, and control substrate specificity through autonomous substrate-binding domains, occupation or extension of the substrate grooves, or modification of PP1 surface electrostatics.

      b) Phosphophorylation side site abundances within triplicate samples from the same cell line were comparable between replicates (Figure 2B).

      c) While the alanine substitutions had little effect, conversion of +4 to +6 to the IRSp534E-BP1 sequence LLD increased catalytic efficiency some 20-fold (Figure 5C, Figure S5C). 

      d) Figure 3E labels are not clear. The graph can be widened to make the labels of the conditions clearer.

      All corrected

      Reviewer #3 (Recommendations for the authors):

      This was a very well-written manuscript.

      However, I was looking for a summary mechanistic figure or cartoon to help me navigate the results.

      I noted a few typos in the text.

      New summary Figure 5-S2 added, cited in Results, and discussed in Discussion paragraphs 6 and 7.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This article presents a meta-analysis that challenges established abundance-occupancy relationships (AORs) by utilizing the largest known bird observation database. The analysis yields contentious outcomes, raising the question of whether these findings could potentially refute AORs.

      We thank the Reviewer for their positive comments.

      Strengths:

      The study employed an extensive aggregation of datasets to date to scrutinize the abundance-occupancy relationships (AORs).

      We thank the Reviewer for their positive comments.

      Weaknesses:

      While the dataset employed in this research holds promise, a rigorous justification of the core assumptions underpinning the analytical framework is inadequate. The authors should thoroughly address the correlation between checklist data and global range data, ensuring that the foundational assumptions and potential confounding factors are explicitly examined and articulated within the study's context.

      We thank the Reviewer for these comments. We agree that more justification and transparency are needed regarding the core assumptions that form the foundation of our methods. In our revised version, we have taken the following steps to achieve this:

      - Altered the title to be more explicit about the core assumptions, which now reads: “Local-scale relative abundance is decoupled from global range size”

      - We have added more details on why and how we treat global range size as a measure of ‘occupancy.’

      - We have added a section that discusses the limitations of using eBird relative abundance

      Reviewer #2 (Public Review):

      Summary:

      The goal is to ask if common species when studied across their range tend to have larger ranges in total. To do this the authors examined a very large citizen science database which gives estimates of numbers, and correlated that with the total range size, available from Birdlife. The average correlation is positive but close to zero, and the distribution around zero is also narrow, leading to the conclusion that, even if applicable in some cases, there is no evidence for consistent trends in one or other direction.

      We thank the Reviewer for these comments.

      Strengths:

      The study raises a dormant question, with a large dataset.

      We thank the Reviewer for these comments. We intended to take a longstanding question and attempt to apply novel datasets that were not available mere decades ago. While we do not imply that we have ‘solved’ the question, we hope this work highlights the potential for further interrogation using these large datasets.

      Weaknesses:

      This study combines information from across the whole world, with many different habitats, taxa, and observations, which surely leads to a quite heterogeneous collection.

      We agree that there is a heterogeneous collection of data across many habitats, taxa, and observations. However, rather than a weakness, we see this as a significant strength. Our work averages over this variability to assess a large-scale pattern in the relationship, something that was potentially a limitation of previous work, as earlier datasets often focused on particular contexts (e.g., much work focused solely on the UK), which we believe could limit the generalizability of that work. However, the reviewer makes a fair point in regard to the heterogeneity of data collection. We have now added some text in the discussion which is explicit about this; see the new section named “Potential limitations of current work and future work”: “Although our findings challenge some long-held assumptions about the consistency of the abundance-occupancy relationship, our work only deals with interspecific AORs among birds, synthesizing observations of potentially heterogeneous locations, context and quality”.

      First, scale. Many of the earlier analyses were within smaller areas, and for example, ranges are not obviously bounded by a physical barrier. I assume this study is only looking at breeding ranges; that should be stated, as 40% of all bird species migrate, and winter limitation of populations is important. Also are abundances only breeding abundances or are they measured through the year? Are alien distributions removed?

      Second, consider various reasons why abundance and range size may be correlated (sometimes positively and sometimes negatively) at large scales. Combining studies across such a large diversity of ecological situations seems to create many possibilities to miss interesting patterns. For example:

      (1) Islands are small and often show density release.

      See comment below.

      (2) North temperate regions have large ranges (Rapoport's rule) and higher population sizes than the tropics.

      See comment below.

      (3) Body size correlates with global range size (I am unsure if this has recently been tested but is present in older papers) and with density. For example, cosmopolitan species (barn owl, osprey, peregrine) are relatively large and relatively rare.

      See comment below.

      (4) In the consideration of alien species, it certainly looks to me as if the law is followed, with pigeon, starling, and sparrow both common and widely distributed. I guess one needs to make some sort of statement about anthropogenic influences, given the dramatic changes in both populations and environments over the past 50 years.

      See comment below. We also added a sentence in the methods that highlights that we did not remove alien ranges and provides reasons why. Still, we do acknowledge the dramatic changes in populations and environments over the past 50 years (see the new section “Potential limitations of current work and future work”).

      (5) Wing shape correlates with ecological niche and range size (e.g. White, American Naturalist). Aerial foraging species with pointed wings are likely to be easily detected, and several have large ranges reflecting dispersal (e.g. barn swallow).

      We agree that all of the points above are interesting data explorations. As said above, our main purpose was to highlight the potential for further interrogation using these large datasets. However, we have added some additional text in the discussion that explicitly mentions/encourages these additional data explorations. We hope people will pick up on the potential for these data and explore them further.

      Third, biases. I am not conversant with ebird methodology, but the number appearing on checklists seems a very poor estimate of local abundance. As noted in the paper, common species may be underestimated in their abundance. Flocking species must generate large numbers, skulking species few. The survey is often likely to be in areas favorable to some species and not others. The alternative approach in the paper comes from an earlier study, based on ebird but then creating densities within grids and surely comes with similar issues.

      We agree that if we were interested in the absolute abundance of a given species, the local number on an eBird checklist would be a poor representation. However, our study aims not to estimate absolute abundance but to examine relative abundance among species on each checklist. By focusing on relative abundance, we leverage eBird data's strengths in detecting the presence and frequency of species across diverse locations and times, thereby capturing community composition trends that can provide meaningful insights despite individual checklist biases. This approach allows us to assess the comparative prominence of species in the community as reported by the observer, providing a consistent metric of relative abundance. Despite detectability biases, the structure of eBird checklists reflects the observer’s encounter rates with each species under similar conditions, offering a valuable snapshot of relative species composition across sites and times. The key to our assumption is that these biases discussed are not directional and, therefore, random throughout the sampling process, which would translate to no ‘real’ bias in our effect size of interest.

      Range biases are also present. Notably, tropical mountain-occupying species have range sizes overestimated because holes in the range are not generally accounted for (Ocampo-Peñuela et al., Nature Communications). These species are often quite rare, too.

      We thank the reviewer for pointing to this issue and reference. We included a discussion of these biases in our limitations section and referenced Ocampo-Peñuela et al. to emphasize the need for improved spatial resolution in range data for more accurate AOR assessments: “More precise range-size estimates would also improve the accuracy of AOR assessments, since species range data are often overestimated due to the failure to capture gaps in actual distributions.”

      Fourth, random error. Random error in ebird assessments is likely to be large, with differences among observers, seasons, days, and weather (e.g. Callaghan et al. 2021, PNAS). Range sizes also come with many errors, which is why occupancy is usually seen as the more appropriate measure.

      If we consider both range and abundance measurements to be subject to random error in any one species list, then the removal of all these errors will surely increase the correlation for that list (the covariance shouldn't change but the variances will decrease). I think (but am not sure) that this will affect the mean correlation because more of the positive correlations appear 'real' given the overall mean is positive. It will definitely affect the variance of the correlations; the low variance is one of the main points in the paper. A high variance would point to the operation of multiple mechanisms, some perhaps producing negative correlations (Blackburn et al. 2006).

      We agree that random errors can affect estimates but, as we wrote above, random errors, regardless of their magnitude, would not bias estimates. After accounting for sampling error (one component of random error), little variance is left to be explained, as we have shown in the MS. This suggests that much of the random error was captured as sampling error. And this is where meta-analysis really shines.
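
      As a rough illustration of what "accounting for sampling error" means here, the sketch below pools a handful of correlations on the Fisher z scale with inverse-variance weights and then asks how much variation remains beyond sampling error (Cochran's Q and I²). It is a simplified fixed-effect calculation, not the multilevel model used in the manuscript; the function name `pool_correlations` and all input values are hypothetical.

```python
import math

def fisher_z(r):
    """Fisher's z-transform of a correlation coefficient."""
    return math.atanh(r)

def pool_correlations(rs, ns):
    """Fixed-effect pooling of correlations on the Fisher z scale.

    rs: per-checklist correlations (hypothetical values)
    ns: number of species behind each correlation (each must exceed 3)
    Returns (pooled r, Cochran's Q, I^2 as a proportion).
    """
    zs = [fisher_z(r) for r in rs]
    # The sampling variance of z is approximately 1 / (n - 3), so the
    # inverse-variance weight is simply n - 3.
    ws = [n - 3 for n in ns]
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    df = len(rs) - 1
    # I^2: share of total variation beyond what sampling error predicts.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return math.tanh(z_bar), q, i2

# Hypothetical checklists: the extreme correlations come from short lists.
rs = [0.60, -0.50, 0.05, 0.02, -0.01]
ns = [6, 7, 120, 300, 500]
pooled_r, q, i2 = pool_correlations(rs, ns)
print(f"pooled r = {pooled_r:.3f}, Q = {q:.2f}, I2 = {i2:.2f}")
```

      In this toy input the two large correlations carry almost no weight because they rest on very few species, so the pooled estimate stays close to zero and Q does not exceed its degrees of freedom.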

      On P.80 it is stated: "Specifically, we can quantify how AOR will change in relation to increases in species richness and sampling duration, both of which are predicted to reduce the magnitude of AORs" I haven't checked the references that make this statement, but intuitively the opposite is expected? More species and longer durations should both increase the accuracy of the estimate, so removing them introduces more error? Perhaps dividing by an uncertain estimate introduces more error anyway. At any rate, the authors should explain the quoted statement in this paper.

      It would be of considerable interest to look at the extreme negative and extreme positive correlations: do they make any biological sense?

      Extremely high correlations would not make any biological sense if these observations were based on large sample sizes. However, as shown in Figure 2, all extreme correlations come from small sample sizes (i.e., low precision), as sampling theory predicts (indeed, our Figure 2 is a textbook example of the funnel shape). Therefore, we do not need to invoke any biological explanations here.
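
      The funnel shape can be sketched from sampling theory alone. Assuming a true correlation of zero and the Fisher z approximation (standard error 1/sqrt(n - 3)), the snippet below gives an approximate 95% bound on |r| for a checklist of n species. The helper name `null_r_bound` and the example values of n are hypothetical, and the approximation is crude for very small n.

```python
import math

def null_r_bound(n, z_crit=1.96):
    """Approximate 95% bound on |r| when the true correlation is zero.

    Uses the Fisher z approximation, in which z = atanh(r) has standard
    error 1 / sqrt(n - 3) for a sample of n paired observations (n > 3).
    """
    if n <= 3:
        raise ValueError("n must exceed 3 for the Fisher z approximation")
    return math.tanh(z_crit / math.sqrt(n - 3))

# Hypothetical checklist lengths (number of species per checklist).
for n in (5, 10, 50, 200, 1000):
    print(f"n = {n:4d}: |r| rarely exceeds {null_r_bound(n):.2f} by chance")
```

      Under these assumptions the bound shrinks steadily as n grows, which is the pattern a funnel plot displays.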

      Discussion:

      I can see how publication bias can affect meta-analyses (addressed in the Gaston et al. 2006 paper) but less easily see how confirmation bias can. It seems to me that some of the points made above must explain the difference between this study and Blackburn et al. 2006's strong result.

      We agree. We have now extended our explanation of why confirmation bias could result in positive AORs. We also point out that confirmation bias is a very common phenomenon, for which we cited relevant work in the original MS. The only way to avoid confirmation bias is to conduct a study blind, but this is often not possible in ecological work.

      “Meta-research on behavioural ecology identified 79 studies on nestmate recognition, 23 of which were conducted blind. Non-blind studies confirmed a hypothesis of no aggression towards nestmates nearly three times more often. It is possible that confirmation bias was at play in earlier AOR studies.”

      Certainly, AOR really does seem to be present in at least some cases (e.g. British breeding birds) and a discussion of individual cases would be valuable. Previous studies have also noted that there are at least some negative and some non-significant associations, and understanding the underlying causes is of great interest (e.g. Kotiaho et al. Biology Letters).

      We agree. And yes, we pointed these out in our introduction.

      Reviewer #3 (Public Review):

      Summary:

      This paper claims to overturn the longstanding abundance occupancy relationship.

      Strengths:

      (1) The above would be important if true.

      (2) The dataset is large.

      We have clarified this point by changing the title to emphasize that we do not suggest overturning AORs entirely but instead provide a refined view of the relationship at a global scale. Our results suggest a weaker and more context-dependent AOR than previously documented. We hope our revised title and additional clarifications in the text convey our intent to contribute to a more nuanced understanding rather than a whole overturning of the AOR framework.

      Weaknesses:

      (1) The authors are not really measuring the abundance-occupancy relationship (AOR). They are measuring abundance-range size. The AOR typically measures patches in a metapopulation, i.e. at a local scale. Range size is not an interchangeable notion with local occupancy.

      We have refined this in our revision to be more explicitly focused on global range size. However, we note that the classic paper by Bock and Ricklefs (1983, Am Nat) also refers to global (species’ entire) range size in the context of the AOR. Importantly, Bock and Ricklefs pointed out the importance of using species’ entire ranges; without them, the arbitrary geographical ranges used in some AOR studies create sampling artifacts that produce positive AORs. We therefore highlight that our work is well in line with the previous work, allowing us to question the longstanding macroecological work. One of the issues with the AOR has been how to define occupancy; global range size provides a relatively unambiguous measure, which is why we used it.

      (2) Ebird is a poor dataset for this. The sampling unit is non-standard. So abundance can at best be estimated by controlling for sampling effort. Comparisons across space are also likely to be highly heterogenous. They also threw out checklists in which abundances were too high to be estimated (reported as "X"). As evidence of the biases in using eBird for this pattern, the North American Breeding Bird Survey, a very similar taxonomic and geographic scope but with a consistent sampling protocol across space does show clear support for the AOR.

      Yes, we agree the sampling unit is non-standard. However, this is a significant strength in that it samples across much heterogeneity (as discussed in response to Reviewer 2, above). We were interested in relative abundance rather than absolute abundance per se, and we consider our relative measure appropriate, especially since we controlled for sampling effort.

      We appreciate the reviewer’s attention to our data selection criteria. We excluded checklists containing ‘X’ entries to minimize biases in our abundance estimates. The ‘X’ notation is often used for the most common species, reflecting the observer's identification of presence without specifying a count. This approach was chosen to avoid disproportionately inflating presence data for these abundant species, which could distort the relative abundance calculations in our analysis. By excluding such checklists, we aimed to retain consistency and ensure that local abundance estimates were representative across all species on each checklist. We have revised our manuscript to clarify this methodological choice and hope this explanation addresses the reviewer’s concern. We modified our text in the methods to make the treatment of ‘X’ entries clearer (see the Methods section).
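
      The exclusion rule and the per-checklist effect size can be sketched as follows. This is a minimal illustration only: the log transformation, the three-species minimum, the function names, and all counts and range sizes are assumptions made for the example, not details taken from the manuscript or its code repository.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns None if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def checklist_correlation(counts, range_km2):
    """Correlate log abundance with log range size within one checklist.

    counts: species -> reported count, or the string "X" for
            'present but not counted'.
    range_km2: species -> global range size (hypothetical values).
    Returns None when the checklist contains any "X" entry or has
    fewer than three species (an assumed minimum for this sketch).
    """
    if any(c == "X" for c in counts.values()) or len(counts) < 3:
        return None
    species = sorted(counts)
    log_abund = [math.log(counts[s]) for s in species]
    log_range = [math.log(range_km2[s]) for s in species]
    return pearson(log_abund, log_range)

# Hypothetical range sizes and two hypothetical checklists.
ranges = {"barn_swallow": 5.0e7, "house_sparrow": 3.0e7,
          "island_endemic": 2.0e3, "osprey": 6.0e7}
complete = {"barn_swallow": 12, "house_sparrow": 30,
            "island_endemic": 4, "osprey": 1}
with_x = {"barn_swallow": 12, "house_sparrow": "X", "osprey": 1}

print(checklist_correlation(complete, ranges))  # a value in [-1, 1]
print(checklist_correlation(with_x, ranges))    # None: checklist excluded
```

      On the log scale, dividing every count by the checklist total only subtracts a constant, so the correlation is the same whether raw counts or within-checklist proportions are used.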

      (3) In general, I wonder if a pattern demonstrated in thousands of data sets can be overturned by findings in one data set. It may be a big dataset but any biases in the dataset are repeated across all of those observations.

      Overturning a major conclusion requires careful work. This paper did not rise to this level.

      We appreciate the reviewer’s caution regarding broad conclusions based on a single dataset, even one as large as eBird. Our intention was not to definitively overturn the abundance-occupancy relationship (AOR) but to re-evaluate it with the most extensive and globally representative dataset currently available. We recognise that potential biases in citizen science data, such as observer variation, may influence our findings, and we have taken steps to address these in our methodology and limitations sections. We see this work as a contribution to an ongoing discourse, suggesting that AOR may be less universally consistent than previously believed, particularly when tested with large-scale citizen science data. We hope this study will encourage additional research that tests AORs using other expansive datasets and approaches, further refining our understanding of this classic macroecological relationship. However, we have left our broad message about instigating credible revolution and also re-examining ecological laws.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The investigation focuses solely on interspecific relationships among birds; thus, the extrapolation of these conclusions to broader ecological contexts requires further validation.

      We have now added this point to our new section: “Although our findings challenge some long-held assumptions about the consistency of the abundance-occupancy relationship, our work only deals with interspecific AORs among birds, so we hope this work serves as a foundation for further investigations that utilize such comprehensive datasets.”

      (2) The rationale for combining data from eBird - a platform predominantly representing individual observations from urban North America - with the more globally comprehensive BirdLife International database needs to be substantiated. The potential underrepresentation of global abundance in the eBird checklist data could introduce a sampling bias, undermining the foundational premises of AORs.

      We agree that eBird sampling coverage is limited, but this should not bias our results. In the statistical sense, bias is directional; if an error is not directional, it becomes statistical noise, which makes the signal harder to detect. In fact, our meta-analyses adjust for what statisticians call sampling bias, and this is the strength of meta-analysis.

      (3) In the full mixed-effect model, checklist duration and sampling variance (inversely proportional to sample size N) are treated as fixed effects. However, these variables are likely to be negatively correlated, which could introduce multicollinearity, inflating standard errors and diminishing the statistical significance of other factors, such as the intercept. This calls into question the interpretation of insignificance in the results.

      Multicollinearity is mainly a problem at small sample sizes. For example, with small datasets, correlations of 0.5 between predictors could be an issue, and such an issue would usually show up as a large SE. We do not have such an issue with ~17 million data points. Please refer to the following paper:

      Freckleton, Robert P. "Dealing with collinearity in behavioural and ecological data: model averaging and the problems of measurement error." Behavioral Ecology and Sociobiology 65 (2011): 91-101.
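
      The argument above can be made concrete with the textbook variance inflation factor for ordinary least squares with two predictors. The sketch below is illustrative only: the authors' model is a mixed-effects meta-regression rather than OLS, and the helper names, the unit standard deviations, and the predictor correlation of -0.5 are assumptions.

```python
import math

def vif(r):
    """Variance inflation factor for two predictors with correlation r."""
    return 1.0 / (1.0 - r ** 2)

def slope_se(n, r, resid_sd=1.0, predictor_sd=1.0):
    """Approximate standard error of one slope in a two-predictor regression.

    SE = resid_sd / (predictor_sd * sqrt(n - 1)) * sqrt(VIF),
    where r is the correlation between the two predictors.
    """
    return resid_sd / (predictor_sd * math.sqrt(n - 1)) * math.sqrt(vif(r))

# Same hypothetical predictor correlation, very different sample sizes.
for n in (30, 17_000_000):
    print(f"n = {n:>10,d}: SE with r = -0.5 is {slope_se(n, -0.5):.6f}, "
          f"SE with r = 0 is {slope_se(n, 0.0):.6f}")
```

      A predictor correlation of 0.5 in absolute value inflates the standard error by the same factor at any sample size, but the absolute standard error is already tiny when n is in the millions.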

      (4) The observed low heterogeneity may stem from discrepancies in sampling for abundance versus occupancy, compounded by uncertainties in reporting behavior.

      If we assume everybody underreports common species or overreports rare species, this could happen. However, such an assumption is unlikely. If some people report accurately (but not others), we should see high heterogeneity, which we do not observe. We touched upon this point in our original MS.

      (5) The contribution and implementation of phylogenetic comparative analysis remain ambiguous and were not sufficiently clarified within the study.

      We have added more explanation for the global abundance analysis:

      “To statistically test whether there was an effect of abundance and occupancy at the macro-scale, we used phylogenetic comparative analysis. This analysis also addresses the issue of positive interspecific AORs potentially arising from not accounting for phylogenetic relatedness among species examined.”

      (6) The use of large N checklists could skew the perceived rarity or commonality of species, potentially diminishing the positive correlation observed in AORs. A consistent observer effect could lead to a near-zero effect with high precision.

      Regardless of the number of species (N) in a checklist (see Fig 2), correlations are distributed around zero. This means there is nothing special about large-N checklists.

      (7) The study should acknowledge and discuss any discrepancies or deviations from previous literature or expected outcomes.

      We felt we had already done this as we discussed the previous meta-analysis and what we expected from this meta-analysis.  Nevertheless, we have added some relevant sentences in the new version of MS.

      In addition to these major points, there are several minor concerns:

      (1) Figure 2B lacks discussion, and the metric for the number of observations is not clarified. Furthermore, the labeling of the y-axis appears to be incorrect.

      Thank you very much for pointing out this shortcoming. Now, the y-axis label has been fixed and we mention 2B in the main text.

      (2) The study should provide a clear, mathematical expression of the multilevel random effect models for greater transparency.

      Many thanks for this point, and now we have added relevant mathematical expressions in Table S6.

      (3) On Line 260, the term "number of species" should be refined to "number of species in a checklist," ideally represented by a formula for precision.

      This ambiguity has been mended as suggested.

      Please provide the data and R code linked to the outputs.

      The referee must have missed the link (https://github.com/itchyshin/AORs) in our original MS. In addition to our GitHub repository link, we now have added a link to our Zenodo repository (https://doi.org/10.5281/zenodo.14019900).

      Reviewer #3 (Recommendations For The Authors):

      The authors cite Rabinowitz's 7 forms of rarity paper as a suggestion that previous findings also break the AOR. In fact empirical studies of the 7 forms of rarity typically find that all three forms of rareness vs commonness are heavily correlated (e.g. Yu & Dobson 2000).

      We thank the reviewer for drawing attention to Yu & Dobson (2000) and similar studies that find positive correlations among the axes of rarity. Reviewer 3 is correct that Rabinowitz’s (1981) framework does not require that local abundance and geographic range size be uncorrelated for every species; instead, it highlights conceptual scenarios where a species may be common locally yet have a restricted distribution (or vice versa).

      Empirical analyses such as Yu & Dobson (2000) show that, on average, these axes can be correlated, which may align with conventional AOR findings in some taxonomic groups. However, Rabinowitz’s key insight was that exceptions do occur, and these exceptions demonstrate that strong positive AORs may not be universally applicable. Our results do not claim that Rabinowitz’s framework “breaks” the AOR outright; instead, we use it to underscore that local abundance can, in principle, be “decoupled” from global occupancy. Whether the correlation found by Yu & Dobson (2000) implies a positive AOR requires a detailed simulation study, which is an interesting avenue for future research.

      Thus, citing Rabinowitz serves to highlight the potential heterogeneity and complexity of abundance–occupancy relationships rather than to refute every positive correlation reported in the literature. Our findings suggest that when examined at large spatiotemporal scales (with unbiased sampling), the overall AOR signal may be less robust than traditionally believed. This is consistent with Rabinowitz’s view that local abundance and global range can vary along independent axes. We have now added:

      “Although studies using her framework found positive correlations between species range and local abundance.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript assesses the utility of spatial image correlation spectroscopy (ICS) for measuring physiological responses to DNA damage. ICS is a long-established (~1993) method similar to fluorescence correlation spectroscopy, for deriving information about the fluorophore density that underlies the intensity distributions of images. The authors first provide a technical but fairly accessible background to the theory of ICS, then compare it with traditional spot-counting methods for its ability to analyze the characteristics of γH2AX staining. Based on the degree of aggregation (DA) value, the authors then survey other markers of DNA damage and uncover some novel findings, such as that RPA aggregation inversely tracks the sensitivity to PARP inhibitors of different cell lines.

      The need for a more objective and standardized tool for analyzing DNA damage has long been felt in the field and the authors argue convincingly for this. The data in the manuscript are in general well-supported and of high quality, and show promise of being a robust alternative to traditional focus counting. However, there are a number of areas where I would suggest further controls and explanations to strengthen the authors' case for the robustness of their ICS method.

      Strengths:

      The spatial ICS method the authors describe and demonstrate is easy to perform and applicable to a wide variety of images. The DDR was well-chosen as an arena to showcase its utility due to its well-characterized dose-responsiveness and known variability between cell types. Their method should be readily useable by any cell biologist wanting to assess the degree of aggregation of fluorescent tags of interest.

      Weaknesses:

      The spatial ICS method, though of longstanding history, is not as intuitive or well-known as spot-based quantitation. While the Theory section gives a standard mathematical introduction, it is not as accessible as it could be. Additionally, the values of TNoP and DA shown in the Results are not discussed sufficiently with regard to their physical and physiological interpretation.

      We agree that a major limitation to the adoption of this approach is the lack of a deeper understanding of the theory and results. We have updated the Theory section to include further discussion (page 4, line 132).

      The correlation of TNoP with γH2AX foci is high (Figure 2) and suggestive that the ICS method is suitable for measuring the strength of the DDR. The authors correctly mention that the number of spots found using traditional means can vary based on the parameters used for spot detection. They contrast this with their ICS detection method; however, the actual robustness of spatial ICS is not given equal consideration.

      We found it difficult to give equal consideration of robustness to ICS. The major limitation of traditional approaches is proper selection of an intensity threshold that is necessary to define and separate foci from background intensity. However, ICS does not employ a threshold, therefore we could not test different thresholding applications in ICS as we did with traditional methods. In our view the absence of the need for a threshold is profoundly advantageous. The only inputs we employ in the ICS analysis are used to segment cell nuclei, yet these have no impact on the ICS calculation and are necessary for any analysis of the DDR.

      Reviewer #2 (Public review):

      Summary:

      Immunostaining of chromatin-associated proteins and visualization of these factors through fluorescence microscopy is a powerful technique to study molecular processes such as DNA damage and repair, their timing, and their genetic dependencies. Nonetheless, it is well-established that this methodology (sometimes called "foci-ology") is subject to biases introduced during sample preparation, immunostaining, foci visualization, and scoring. This manuscript addresses several of the shortcomings associated with immunostaining by using image correlation spectroscopy (ICS) to quantify the recruitment of several DNA damage response-associated proteins following various types of DNA damage.

      The study compares automated foci counting and fluorescence intensity to image correlation spectroscopy degree of aggregation to study the recruitment of DNA repair proteins to chromatin following DNA damage. After validating image correlation spectroscopy as a reliable method to visualize the recruitment of γH2AX to chromatin following DNA damage in two separate cell lines, the study demonstrates that this new method can also be used to quantify RPA1 and Rad51 recruitment to chromatin following DNA damage. The study further shows that RPA1 signal as measured by this method correlates with cell sensitivity to Olaparib, a widely used PARP inhibitor.

      Strengths:

      Multiple proof-of-concept experiments demonstrate that using image correlation spectroscopy degree of aggregation is typically more sensitive than foci counting or foci intensity as a measure of recruitment of a protein of interest to a site of DNA damage. The sensitivity of the SKOV3 and OVCA429 cell lines to MMS and the PARP inhibitors Olaparib and Veliparib as measured by cell viability in response to increasing amounts of each compound is a valuable correlate to the image correlation spectroscopy degree of aggregation measurements.

      Weaknesses:

      The subjectivity of foci counting has been well-recognized in the DNA repair field, and thus foci counts are usually interpreted relative to a set of technical and biological controls and across a meaningful time period. As such:

      (1) A more detailed description of the numerous prior studies examining the immunostaining of proteins such as γH2AX, RAD51, and RPA is needed to give context to the findings presented herein.

      We apologize for not providing enough detail. We have added further references and discussion (page 18, lines 513 and 517). γH2AX foci counting, in particular, has been used in thousands of previous studies.

      (2) The benefits of adopting image correlation spectroscopy should be discussed in comparison to other methods, such as super-resolution microscopy, which may also offer enhanced sensitivity over traditional microscopy.

      Thank you for raising this point. We have added this discussion (page 19, line 553). The limiting factor that ICS addresses is the partition coefficient of signal inside a focus or cluster versus outside it. Super-resolution will not necessarily improve this unless it resolves down to single-molecule counting. Even then, one would still need to decide how to define a cluster or focus against a background of non-clustered signal.

      (3) Additional controls demonstrating the specificity of their antibodies to detection of the proteins of interest should be added, or the appropriate citations validating these antibodies included.

      We have added text stating that we only use validated antibodies (page 6, line 193). One thing to note is that we are measuring differences between treatment conditions; thus, if an antibody non-specifically labels proteins or cellular structures that do not change upon treatment, our approach would overcome this limitation.

      Reviewer #3 (Public review):

      Summary:

      This paper described a new tool called "Image Correlation Spectroscopy" (ICS) to detect clustering fluorescence signals such as foci in the nucleus (or any other cellular structures). The authors compared ICS DA (degree of aggregation) data with Imaris Spots data (and ImageJ Find Maxima data) and found a comparable result between the two analyses and that the ICS sometimes produced a better quantification than the Imaris. Moreover, the authors extended the application of ICS to detect cell-cycle stages by analyzing the DAPI image of cells. This is a useful tool without the subjective bias of researchers and provides novel quantitative values in cell biology.

      Strengths:

      The authors developed a new tool to detect and quantify the aggregates of immunofluorescent signals, which is a center of modern cell biology, such as the fields of DNA damage responses (DDR), including DNA repair. This new method could detect the "invisible" signal in cells without pre-extraction, which could prevent the effect of extracted materials on the pre-assembled ensembles, a target for the detection. This would be an alternative method for the quantification of fluorescent signals relative to conventional methods.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) The ICS theory section is essential and based on an excellent review from one of the authors. It would benefit greatly from a diagram showing where the quantities 𝒈(𝟎, 𝟎), 𝝎𝟎, and 𝒈inf come from in the 2D Gaussian fit, ideally for two cases where these quantities differ (i.e., how they correspond to different DA or TNoP values). In my opinion, this addition would greatly increase the manuscript's accessibility for DDR researchers. The citation of the review at the beginning would also be a plus.

      We have added the review citation at the front of the theory section (page 3, line 87). We have highlighted where g(0,0), the most critical measurement for determination of TNoP and DA, derives from in Figure 2D. However, it is difficult to describe all the curve-fit parameters in one image because they are interdependent, so labeling one parameter in a single image would not independently capture how it might appear in a different curve fit.

      (2) The TNoP measured in Figure 2 is a quantity about 2000-3000 times greater than the number of "traditionally detected" foci by both methods and the linear relations have very low Y intercepts. Can the authors comment explicitly on the physical interpretation of this number - are 2 to 3 thousand independent particles present within each "focus" detected by traditional means? If so, then what might one "particle" correspond to? (a single secondary antibody or fluorophore? a nucleosome?). In a similar vein, the X intercepts lie at around 25 foci, meaning that in images with fewer than that number of foci detected by ImageJ or Imaris, the ICS method should detect zero TNoP - is this in line with the authors' predictions? Is it possible that a first-order line fit is not the most appropriate relation between the two methods?

      We apologize for our brevity here. Since DA proved to be a more useful metric we did not spend much effort discussing TNoP. TNoP correlates to the number of clustered particles, or non-diffuse fluorophores. TNoP is the inverse of the number of individual particles per nucleus, but the value is not a direct measure of foci. If a sample had no clustering at all, the number of individual particles would be at a maximum and the TNoP would be at a minimum. However, as fluorophores cluster, the number of individual particles (i.e., non-clustered fluorophores) decreases, which increases the TNoP value. Therefore, TNoP correlates with the number of foci detected through traditional measurements, as we found here. Yet TNoP is a relative measurement and cannot be compared across different conditions. Similar to foci counting, TNoP is unable to factor in the size or intensity of each cluster; thus, DA is a more appropriate quantification of the DNA damage response.

      The value of TNoP is dependent on the fitted point spread function and the area of the nucleus. The y=0 intercept of TNoP is defined by the optical setup and is not expected to necessarily go through x=0. Intriguingly, other groups have found that some foci identified through traditional measurements are actually clusters of multiple smaller foci, so what a focus represents is difficult to interpret. Here, we therefore aimed to show a general correlation of TNoP with foci count through traditional methods to reflect how ICS is similar to foci counting, then employed DA to overcome the limitations of defining a focus.

      We have tried to clarify this in the text (page 8, line 266).

      (3) Some suggestions to address the robustness of ICS:

      For a given sample (i.e. one segmented nucleus), the calculation of DA and TNoP should be similar between different images of that same nucleus taken at different times, similar to how the number of traditionally detected foci would be fairly invariant. In particular, it should be shown that these values are not just scaling with the higher normalized intensity seen in stronger DDR responses. In the same vein, the linear relationship between TNoP and "foci" should not change even if the confocal settings are slightly different (i.e., higher/lower illumination intensity) as long as the condition stipulated by the authors in the Discussion holds ("ICS can be implemented on any fluorescence image as long as the square relative fluorescence intensity fluctuations are detectable above noise fluctuations."). To show, as the title states, that spatial ICS is a robust tool, it would be desirable to demonstrate this with a series of images of the same cell at the same or varying excitation intensities.

      Thank you for your suggestions. Indeed, the calculation will be the same over sequential images of the same cell. The observation that dose-dependent DA does not correlate with intensity for RPA1 and RAD51 (Fig. S5) directly demonstrates that DA does not just scale with intensity.

      We would not expect the TNoP to change with confocal settings; however, we show in Figure 1 that the number of foci does indeed change with intensity settings as captured by thresholds. Therefore, any interpretation of TNoP vs. foci count would be very difficult to make at different microscope settings. To ensure we are fairly comparing ICS to existing analysis, we keep the settings the same and measure changes between conditions.

      (4) More information is needed on how intensity normalization was performed. The Methods states "Measurements across experiments were normalized by the control in each dataset." The DMSO (0mM drug) plots all appear to have a mean of 1.0, so it appears the values for each set of control nuclei were divided by their own mean, and then the values for each set of experimental nuclei were divided by the mean value of all 3 controls as an aggregate; is this correct?

      We apologize for not being clearer. Thank you for raising this point. We normalized data to a control from each experimental group. In Figures 3, 4, and 5, data were collected over multiple experiments with one control per experiment and each treatment condition included in each experiment. Therefore, we normalized each result to the corresponding control from that imaging session. However, in Figure 8 we ran experiments at much higher throughput with multiple controls per experiment, so the data were normalized to the overall average of the controls, which is why the control averages are not all at a value of 1. We have clarified this in the text (page 7, line 218).
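
      The two normalization schemes described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the dictionary layout are hypothetical, and the numbers used with them are placeholders rather than measured data.

```python
from statistics import mean


def normalize_per_experiment(values_by_experiment, control_by_experiment):
    """Divide each measurement by the single control from the same imaging session.

    Both arguments are dicts keyed by a (hypothetical) experiment label.
    """
    return {
        experiment: [value / control_by_experiment[experiment] for value in values]
        for experiment, values in values_by_experiment.items()
    }


def normalize_to_pooled_controls(values, controls):
    """Divide each measurement by the average of all controls in the experiment."""
    pooled_control = mean(controls)
    return [value / pooled_control for value in values]
```

      With one control per experiment, every control normalizes to exactly 1. With pooled controls, individual controls scatter around 1, which matches the explanation given for Figure 8.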

      (5) Some more information about the ICS analysis should be given if the full code is not provided - in particular, how the nucleus mask was implemented on the "signal" channel (were the edges abruptly set to zero or was a window function introduced to avoid edge effects in the discrete FFT?).

      Thank you for raising this point. We have added the code to GitHub - github.com/dubachLab/ics. The signal region was established by simply applying the nuclear mask from the DAPI channel to the IF channel. Each region is padded at the edges with the average intensity value to 2x the dimensions of the ROI to remove edge effects in the FFT.
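
      The masking and padding step described above can be illustrated with a short NumPy sketch. It is not taken from the authors' repository: the function name, the centred padding layout, and the normalization by the number of masked pixels are assumptions made for illustration. Pixels outside the nucleus and in the padding are set to the mean nuclear intensity, so their fluctuation is zero and they add nothing to the correlation.

```python
import numpy as np


def spatial_autocorrelation(image, mask):
    """Normalized autocorrelation of intensity fluctuations inside a nuclear mask.

    Returns an array of twice the input dimensions with zero lag at its centre.
    """
    image = np.asarray(image, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    mean_intensity = image[mask].mean()

    # Replace everything outside the nucleus with the mean nuclear intensity.
    filled = np.where(mask, image, mean_intensity)

    # Pad with the mean intensity to 2x the ROI dimensions (assumed: ROI centred).
    rows, cols = filled.shape
    pad_widths = ((rows // 2, rows - rows // 2), (cols // 2, cols - cols // 2))
    padded = np.pad(filled, pad_widths, mode="constant", constant_values=mean_intensity)

    # Wiener-Khinchin: autocorrelation is the inverse FFT of the power spectrum.
    fluctuations = padded - mean_intensity
    power_spectrum = np.abs(np.fft.fft2(fluctuations)) ** 2
    correlation_sum = np.real(np.fft.ifft2(power_spectrum))

    # Assumed normalization: masked pixel count times the squared mean.
    # This is exact at zero lag and underestimates the function at larger lags.
    g = correlation_sum / (mask.sum() * mean_intensity ** 2)
    return np.fft.fftshift(g)
```

      At zero lag this quantity equals the variance of the nuclear pixels divided by the squared mean. In practice the zero-lag point also contains detector noise, so the amplitude g(0,0) is usually taken from a 2D Gaussian fit to the surrounding lags, as the Theory section describes.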

      Minor comments:

      (1) Figure 3, 4, 5: I think it would aid figure readability if channels were labeled in the images themselves, not just in the legend.

      Thank you for the suggestion. We tried doing this and struggled to fit a label within the layout of the images. We were also concerned about interpretation of data in each column and the potential to assign data to each figure if they were so prominently labeled.

      (2) Supplemental Figures are mislabeled; the order given in the legends is S1, S2, S3, S2, S3. S4 is called out in the main text where it should be S5.

      Thank you for catching this error. We have made the necessary corrections. S4 contains data on cellular response to the drugs, while S5 contains intensity data in response to MMS.

      (3) It should be stated for each Figure what kind of microscopy was performed - I assume that it is confocal for everything except when widefield is explicitly stated, but for clarity please add this information.

      Indeed, this is correct; we have indicated which microscopy was used for each figure.

      (4) The MATLAB code and full (uncropped) Western blots should be provided as supplemental data if possible.

      We have included a GitHub link for the code and un-cropped western blots.

      (5) The p values from significance tests should indicate whether multiple comparisons correction was necessary (if suggested by Prism) and performed.

      Apologies for the lack of clarity, but this was not necessary: significance was calculated vs. the next lower dose (e.g., 10 micromolar vs. 1 micromolar). We have clarified this in the Methods (page 7, line 221).

      Reviewer #2 (Recommendations for the authors):

      Major points:

      In addition to the weaknesses noted above, to encourage widespread adoption of this method, the authors should make the tools that they used for their analysis publicly available. In a few instances (e.g., compare Figures 3J and 3L), other methods outperform DA. It would be meaningful to discuss when especially DA may be a better measure than others (such as intensity or number of foci).

      We have made the code available on GitHub. We expect that results such as those in Figures 3J and 3L, where intensity is significantly higher at the highest concentration but DA is not, reflect the underlying biology, and they may be interpreted differently under different experimental conditions. Imaris Spots (Fig. 3K) also does not capture a significant increase at the highest dose of olaparib, suggesting that intensity may rise without generating more foci. These results are likely highly dependent on the mechanism of olaparib at such a high concentration and the DDR response. We are hesitant to draw biological conclusions from these results and instead would like to highlight the capacity of ICS to evaluate the DDR; therefore, we don’t want to make any broad comments about different applications.

      Minor points:

      (1) Pg. 12: "We used MMS to induce DNA damage in SKOV3 and OVCA429 cells. As expected, normalized intensity for RPA1 and RAD51 values (Figure S5) did not display a dose dependence on MMS concentration."

      Please provide a citation for the claim that RPA1 and RAD51 normalized intensities do not display a dose dependence on MMS concentration.

      These were data that we generated. We were not expecting an intensity change as that would presumably require increased protein generation in response to MMS, compared to gH2AX where the phospho-specific H2AX is generated in the DDR.

      (2) Pg. 12: "Similar to RPA1, RAD51 does not form distinguishable foci in the nuclei in cells without preextraction (Fig. 5)." Please provide a citation for this claim.

      We did not do pre-extraction and our results don’t produce changes in distinguishable foci. We provided citations discussing how, without pre-extraction, foci formation for these proteins is not obvious (refs. 38 and 39).

      (3) I noted that the authors cite one paper [38] apparently showing that RPA and Rad51 do not always form foci, however, this is in the C. elegans germline in response to micro irradiation, therefore I am not sure that it is applicable to human cells.

      We apologize for referencing a paper on C. elegans. Most papers looking at RPA and RAD51 in the DDR use pre-extraction, as it seems necessary to observe foci. Therefore, we could find few papers that do not use pre-extraction. Reference 39 is in HeLa cells.

      Reviewer #3 (Recommendations for the authors):

      Major points:

      (1) Page 8, the second paragraph: In the Result section, it is better to describe how the authors carried out immuno-staining (without pre-extract subtraction) and ICS briefly, although the method is described in detail in the Method section.

      Thank you for the suggestion; we have added this description (page 8, line 259).

      (2) In Figure 5K-P: The authors analyzed "invisible" RAD51 foci on the image (Fig. 5L, M, O, and P) without pre-extraction. As a control experiment, it is useful to check whether pre-extraction would provide "visible" RAD51 foci and to examine the similar MMS concentration dependency shown in Figure 5R (or 5T). This would strengthen the power of the ICS analysis.

      Thank you for the suggestion. In our hands, pre-extraction is extremely subjective. We have tried performing pre-extraction but find highly variable results depending on conditions. Therefore, we did not include any pre-extraction here. We expect that performing these experiments may or may not agree with results in Figure 5 largely because we are unable to achieve repeatable pre-extraction foci counting.

      (3) Figure 6D (and 6C) looks very interesting. It would be important to show the interpretation of this correlation shown in the graph. Although the authors argued that ICS analysis results shown in the graph could provide new insight into the DDR (page 14, last line 5), as shown in another part, it is important to carry out the same analysis by using Imaris Spots. Moreover, it is interesting to apply the analysis to RAD51 foci (shown in Figure 5), given that the PARPi effect is enhanced in the absence of RAD51-mediated recombination.

      We completely agree that this analysis may generate interesting results to help interpret the DDR response to PARP inhibition. These experiments are part of an ongoing follow-up study where we extend the use of ICS to other parts of the DDR and investigate protein clustering across several proteins with impact on PARPi response. Therefore, since the focus of this manuscript is introducing ICS as a tool to study the DDR, we believe that omitting those data here does not detract from the central points of the manuscript. We included results in Figure 6 because we wanted to show how ICS could impact DDR research. Furthermore, combined with our advances shown in Figures 7 and 8, we are currently working on adapting ICS to be high-throughput and much simpler than Imaris Spots for handling the large datasets needed to generate results like those in Figure 6.

      Minor points:

      (1) Figure 1I, blue arrows: These showed an area with a higher background. Because of a low magnification, it is very hard to see the difference from the other areas of the background. It is better to show a magnified image of the representative region with a higher background.

      We hope that readers can see the higher intensity in the diffuse area. We attempted to construct a zoomed-in area, but that either blocked a significant portion of the non-zoomed image or added complexity to the figure. We have noted that images in Figure S1 are larger and more obviously capture an increase in background intensity.

      (2) Figure 2 legend, line 5, the same as "A)": This should be "B".

      Here, the number of independent particle clusters is intended to be the same as in A; the difference is that the independent particles are clusters in C and individual fluorophores in A.

      (3) Page 9, the first paragraph, last line, foci formation, and foci composition: These should be "focus formation and focus composition".

      We have changed this.

      (4) Page 15, the first paragraph, line 5, palbociclib, camptothecin, or etoposide: please explain what kinds of the drugs are.

      We have added that these drugs cause cells to stall at different cell cycle stages. Explaining the drugs would take considerable room in the text.

      (5) Page 16, the first paragraph, line 1, bleomycin: Please explain what this drug is.

      Similar to above, we have stated that this drug causes DNA damage; going into detail would take several sentences.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1:

      (1) This manuscript introduces a useful curation pipeline of antibody-antigen structures downloaded from the PDB database. The antibody-antigen structures are presented in a new database called AACDB, alongside annotations that were either corrected from those present in the PDB database or added de-novo with a solid methodology. Sequences, structures, and annotations can be very easily downloaded from the AACDB website, speeding up the development of structure-based algorithms and analysis pipelines to characterize antibody-antigen interactions. However, AACDB is missing some key annotations that would greatly enhance its usefulness.

      Here are detailed comments regarding the three strengths above:

      I think potentially the most significant contribution of this database is the manual data curation to fix errors present in the PDB entries, by cross-referencing with the literature. However, as a reviewer, validating the extent and the impact of these corrections is hard, since the authors only provided a few anecdotal examples in their manuscript.

      I have personally verified some of the examples presented by the authors and found that SAbDab appears to fix the mistakes related to the misidentification of antibody chains, but not other annotations.

      (a) "the species of the antibody in 7WRL was incorrectly labeled as "SARS coronavirus B012" in both PDB and SabDab" → I have verified the mistake and fix, and that SAbDab does not fix it, just uses the PDB annotation.

      (b) "1NSN, the resolution should be 2.9 Å, but it was incorrectly labeled as 2.8" → I have verified the mistake and fix, and that SAbDab does not fix it, just uses the PDB annotation.

      (c) "mislabeling of antibody chains as other proteins (e.g. in 3KS0, the light chain of B2B4 antibody was misnamed as heme domain of flavocytochrome b2)" → SAbDab fixes this as well in this case.

      (d) "misidentification of heavy chains as light chains (e.g. both two chains of antibody were labeled as light chain in 5EBW)" → SAbDab fixes this as well in this case.

      I personally believe the authors should make public the corrections made, and describe the procedures - if systematic - to identify and correct the mistakes. For example, what was the exact procedure (e.g. where were sequences found, how were the sequences aligned, etc.) to find mutations? Was the procedure run on every entry?

      We appreciate the reviewer’s valuable feedback. Our correction procedures combined manual curation with systematic sequence analysis. While most metadata discrepancies were resolved through cross-referencing original literature, we implemented a structured approach for identifying mutations in specific cases. For PDB entries labeled as variants (e.g., "Bevacizumab mutant" or "Ipilimumab variant Ipi.106") where the "Mutation(s)" field was annotated as "NO," we retrieved the canonical therapeutic antibody sequence from Thera-SAbDab, then performed pairwise sequence alignment against the PDB entry using the BLAST program to identify mutated residues.

      This procedure was not applied to all entries, as mutations are context-dependent. Therapeutic antibodies have well-defined reference sequences, enabling systematic alignment. For antibodies lacking unambiguous wild-type references (e.g., research-grade or non-therapeutic antibodies), mutation annotations were directly inherited from the PDB or literature.

      All corrections have been publicly archived in AACDB. We have added a detailed discussion of this issue in the section “2.3 Metadata” of revised manuscript.
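
      The final step of the mutation check, reading substitutions off a pairwise alignment, can be sketched as below. This is an illustrative helper, not the pipeline used for AACDB: it assumes the alignment has already been produced (for example by BLAST) and is supplied as two equal-length strings with `-` marking gaps, and the function name is hypothetical.

```python
def list_substitutions(reference_aligned, query_aligned, gap="-"):
    """Return substitutions such as 'V5A', numbered by reference position.

    Both inputs are rows of one pairwise alignment, so they must have equal length.
    Insertions and deletions are skipped; only residue changes are reported.
    """
    if len(reference_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have the same length")

    substitutions = []
    reference_position = 0
    for reference_residue, query_residue in zip(reference_aligned, query_aligned):
        if reference_residue != gap:
            reference_position += 1  # advance only on real reference residues
        if gap in (reference_residue, query_residue):
            continue
        if reference_residue != query_residue:
            substitutions.append(f"{reference_residue}{reference_position}{query_residue}")
    return substitutions
```

      Numbering by reference position keeps the reported mutations comparable across PDB entries that carry different tags or truncations.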

      (2) I believe the splitting of the pdb files is a valuable contribution as it standardizes the distribution of antibody-antigen complexes. Indeed, there is great heterogeneity in how many copies of the same structure are present in the structure uploaded to the PDB, generating potential artifacts for machine learning applications to pick up on. That being said, I have two thoughts both for the authors and the broader community. First, in the case of multiple antibodies binding to different epitopes on the same antigen, one should not ignore the potentially stabilizing effect that the binding of one antibody has on the complex, thereby enabling the binding of the second antibody. In general, I urge the community to think about what is the most appropriate spatial context to consider when modeling the stability of interactions from crystal structure data. Second, and in a similar vein, some antigens occur naturally as homomultimers - e.g. influenza hemagglutinin is a homotrimer. Therefore, to analyze the stability of a full-antigen-antibody structure, I believe it would be necessary to consider the full homo-trimer, whereas, in the current curation of AACDB with the proposed data splitting, only the monomers are present.

      We sincerely appreciate the reviewer’s insightful comments regarding the splitting of PDB files and we appreciate the opportunity to address the reviewer’s thoughtful concerns.

      Firstly, when two antibodies bind to distinct epitopes on the same antigen, we would like to clarify that this scenario can be divided into two cases based on the experimental context. Case 1: The two antibodies bind to distinct epitopes on the same antigen, and their complexes are determined in separate structures. For example, SAR650984 (PDB: 4CMH) and daratumumab (PDB: 7DHA) target CD38 at non-overlapping epitopes. These two antibody-antigen complexes were determined independently, and their structures do not influence each other. Case 2: The crystal structure contains a ternary complex with two antibodies and an antigen, as in the example of 6OGE discussed in Section 2.2 of our manuscript. After reviewing the original literature, we found that the experiment confirmed that the order of Fab binding does not affect the formation of the ternary complex, and the binding of one antibody does not enhance the binding of the other. This supports the rationale for splitting 6OGE into two separate structures. However, we acknowledge that not all ternary complexes in the PDB provide such detailed experimental descriptions in their original literature. We agree with the reviewer that in some cases, one antibody may stabilize the structure to facilitate the binding of a second antibody. For instance, in 3QUM, the 5D5A5 antibody stabilizes the structure, enabling the binding of the 5D3D11 antibody to human prostate-specific antigen. Such sandwich complexes are indeed valuable for identifying true epitopes and paratopes. Importantly, splitting the structure does not alter the interaction sites.

      Secondly, we fully agree with the reviewer that for antigens that naturally exist as homomultimers (e.g., influenza hemagglutinin as a homotrimer), the full multimeric structure should be considered when analyzing stability. In such cases, users can directly utilize the original PDB structures provided in their multimeric form. Our splitting approach is intended to provide an additional option for cases where monomeric analysis is sufficient or preferred, but it does not preclude the use of the original multimeric structures when necessary.

      (3) I think the manuscript is lacking in justification about the numbers used as cutoffs (1 Å² for change in SASA and 5 Å for maximum distance for contact). The authors just cite other papers applying these two types of cutoffs, but the underlying physico-chemical reasons are not explicit even in these papers. I think that, if the authors want AACDB to be used globally for benchmarks, they should provide direct sources of explanations of the cutoffs used, or provide multiple cutoffs. Indeed, different cutoffs are often used (e.g. ATOM3D uses 6 Å instead of 5 Å to determine contact between a protein and a small molecule https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/c45147dee729311ef5b5c3003946c48f-Abstract-round1.html). I think the authors should provide a figure with statistics pertaining to the interface atoms. I think showing any distribution differences between interface atoms determined according to either strategy (number of atoms, correlation between change in SASA and distance...) would be fundamental to understanding the two strategies. I think other statistics would constitute an enhancement as well (e.g. proportion of heavy vs. light chain residues).

      Some obvious limitations of AACDB in its current form include:

      AACDB only contains entries with protein-based antigens of at least 50 amino acids in length. This excludes non-protein-based antigens, such as carbohydrate- and nucleotide-based, as well as short peptide antigens.

      AACDB does not include annotations of binding affinity, which are present in SAbDab and have been proven useful both for characterizing drivers of antibody-antigen interactions (cite https://www.sciencedirect.com/science/article/pii/S0969212624004362?via%3Dihub) and for benchmarking antigen-specific antibody-design algorithms (cite https://www.biorxiv.org/content/10.1101/2023.12.10.570461v1).

      We thank the reviewer for raising this critical point about the cutoff values used in AACDB. In the current study, the selection of the threshold values is objective: the thresholds chosen in the manuscript summarize those used in the existing literature, and we have provided more literature support in the manuscript. The criteria for defining interacting amino acids in established tools typically do not set the ΔSASA threshold above 1 Å² or the distance threshold above 6 Å. While our manuscript emphasizes widely accepted thresholds for consistency with prior benchmarks, AACDB explicitly provides raw ΔSASA and distance values for all annotated residues. Users can filter the downloaded files by excluding entries that exceed their preferred thresholds (e.g., selecting 5 Å instead of 6 Å). This ensures adaptability to diverse research needs. In the revised version, we reset the distance threshold to 6 Å and recalculated the interacting amino acids to give the user a wider range of choices. In the section “3.2 Database browse and search” of the revised manuscript, we describe the flexible choice of thresholds for practical use.

      Furthermore, distance and ΔSASA are two distinct metrics for evaluating interactions. Distance directly quantifies spatial proximity between atoms, reflecting physical contacts such as van der Waals interactions or hydrogen bonds, and is ideal for identifying direct spatial adjacency. ΔSASA, on the other hand, measures changes in solvent accessibility of residues during binding, capturing the contribution of buried surfaces to binding free energy. Even for residues not in direct contact, reduced SASA due to conformational changes may indicate indirect functional roles.
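
      To make the distinction concrete, the two metrics can be sketched as follows. This is a minimal illustration under simple assumptions: distance is taken as the minimum over all atom pairs between two residues, ΔSASA as the unbound minus the bound accessible area, and all coordinates and area values are invented for the example.

```python
import math

def min_atom_distance(res_a, res_b):
    """Smallest distance (angstroms) between any atom of res_a and any atom of res_b."""
    return min(math.dist(a, b) for a in res_a for b in res_b)

def delta_sasa(sasa_unbound, sasa_complex):
    """Accessible surface area (square angstroms) buried when the complex forms."""
    return sasa_unbound - sasa_complex

# Illustrative atom coordinates for one antibody residue and one antigen residue.
antibody_res = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
antigen_res = [(4.5, 4.0, 0.0), (9.0, 0.0, 0.0)]

d = min_atom_distance(antibody_res, antigen_res)
in_contact = d < 6.0                      # distance criterion
buried_area = delta_sasa(42.0, 17.5)      # illustrative SASA values
is_buried = buried_area > 1.0             # ΔSASA criterion
```

      A residue can satisfy one criterion without the other, which is why the two interface definitions may differ by a few residues.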

      As demonstrated through comparisons on the detailed information pages, the sets of interacting amino acids defined by these two methods differ by only a few residues, with no significant variation in their overall distributions. However, since interaction patterns vary significantly across different complexes, analyzing residue distributions across all structures using both criteria is not feasible.

      We thank the reviewer for highlighting these limitations. AACDB currently focuses on protein-based antigens ≥50 amino acids to prioritize structural consistency, which excludes non-protein antigens and shorter peptides. While affinity annotations are critical for benchmarking antibody design tools, these data were not integrated in this release due to insufficient data verification caused by internal team constraints. We acknowledge these gaps and plan to expand antigen diversity and incorporate affinity metrics in future updates.

      Reviewer #2:

      Summary:

      Antibodies, thanks to their high binding affinity and specificity to cognate protein targets, are increasingly used as research and therapeutic tools. In this work, Zhou et al. have created, curated, and made publicly available a new database of antibody-antigen complexes to support research in the field of antibody modelling, development, and engineering.

      Strengths:

      The authors have performed a manual curation of antibody-antigen complexes from the Protein Data Bank, rectifying annotation errors; they have added two methods to estimate paratope-epitope interfaces; they have produced a web interface that is capable of both effective visualisation and of summarising the key useful information in one page. The database is also cross-linked to other databases that contain information relevant to antibody developability and therapeutic applications.

      Weaknesses:

      The database does not import all the experimental information from PDB and contains only complexes with large protein targets.

      Thank you for the valuable feedback. As noted in our response to Reviewer #1, due to limitations within our team, comprehensive data integration from the PDB has not been achieved in the current version. We acknowledge the significance of expanding the database to encompass a broader range of experimental information and complexes with diverse target sizes. Regrettably, immediate updates to address these limitations are not feasible at this time. Nevertheless, we are committed to enhancing the database in upcoming upgrades to provide users with a more comprehensive and inclusive resource.

      Recommendations for the authors:

      Reviewer #1:

      (1) Line 194: "produce" → "produced"

      We thank the reviewer for the feedback. We have checked the grammar and spelling carefully in the revised manuscript.

      (2) As mentioned in the public review, I think adding binding affinity annotations would greatly enhance the use cases for the database.

      We thank the reviewer for the suggestion. As noted in our response in the “Public review”, due to team constraints, these data are not integrated into this release but are being collated. We recognize these gaps and plan to expand antigenic diversity and incorporate affinity metrics in future updates.

      (3) I think adding a visualization of interface atoms and contacts on an entry's webpage would be useful for someone exploring specific entries. It also would be useful if the authors provided a pymol command to select interface residues since that's a procedure any structural biologist is likely to do.

      We sincerely appreciate the reviewer’s constructive suggestions. In response to the request for enhanced visualization and accessibility of interface residue information, we have implemented the following improvements: (1) Web Interface Visualization. On the entry-specific webpage, we have added an interactive visualization window that highlights the antigen-antibody interaction interface using distinct colors. The interaction interface visualization has been incorporated into Figure 5 of the revised manuscript, with a detailed description. (2) PyMOL Command Accessibility. The “Help” page now provides step-by-step PyMOL commands to select and visualize interface residues.

      (4) I think the authors should provide headers to the files containing interface residues according to the change-in-SASA criterion, as they do for those computed according to contact. This would avoid unnecessary confusion - however slight - and make parsing easier. I was initially confused by the meaning of the last column, though after a minute I understood it to be the change in SASA.

      We thank the reviewer for providing such detailed feedback and for the suggestion. We have provided headers for the files of the interacting residues defined by ΔSASA.

      (5) Line 233: "AACDB's data processing pipeline supports mmCIF files" → The meaning and implications of this statement are not obvious to me, and are mentioned nowhere else in the paper. Do you mean that in AACDB there are structure entries that the RCSB PDB database only has in mmCIF file format, and not .pdb format? So, effectively, there are some entries in AACDB that are not in any other antibody-specific database? I checked and, as of Dec 3rd, 2024, there are 41 structures in AACDB that are NOT in SAbDab. Manually checking 5 of those 41 structures, none are mmCIF-only structures.

      We thank the reviewer for the valuable comment. Some entries are too large to be represented in a single PDB-format file because of the number of atoms and polymer chains they contain. As a result, the PDB stores these structures only as mmCIF files. In AACDB, 47 entries, such as 7SOF, 7NKT, 7B27, and 6T9D, are available from the PDB only in mmCIF format. The “.pdb” and “.cif” files contain atomic coordinates in distinct text formats, and structure files are segmented automatically based on manually annotated antibody-antigen chains. We have built support for both formats into our file processing pipeline, enabling fully automated file segmentation. Additionally, we employed Naccess to calculate solvent accessibility. However, since this software only accepts .pdb format files as input, we also converted all split .cif files into .pdb format within our fully automated pipeline. We apologize for the lack of clarity in the original manuscript and have included a more detailed explanation in the "2.2 PDB Splitting" section of the revised manuscript.
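
      The reason a split sub-structure can be written as .pdb even when the parent entry cannot may be sketched as below. The limits reflect the fixed-column legacy PDB format (a five-column atom serial number and a single-character chain identifier); the entry, chain names, and atom counts are hypothetical, and the function names are illustrative rather than part of the AACDB pipeline.

```python
MAX_ATOMS = 99_999   # atom serial number occupies 5 columns in legacy PDB records
MAX_CHAINS = 62      # single-character chain IDs: A-Z, a-z, 0-9

def fits_legacy_pdb(n_atoms, chain_ids):
    """Return True if a structure could be written as one legacy PDB file."""
    single_char = all(len(c) == 1 for c in chain_ids)
    return n_atoms <= MAX_ATOMS and len(set(chain_ids)) <= MAX_CHAINS and single_char

def split_complex(chains, keep):
    """Select only the curated antibody/antigen chains from a larger entry.

    `chains` maps chain ID -> atom count; `keep` holds the annotated chain IDs.
    """
    return {cid: n for cid, n in chains.items() if cid in keep}

# Hypothetical large entry: 80 chains of 2,000 atoms each (not a real PDB entry).
entry = {f"C{i}": 2000 for i in range(80)}
whole_ok = fits_legacy_pdb(sum(entry.values()), list(entry))

# Keep one antibody (two chains) and one antigen chain, then assign
# single-character IDs so the subset is valid in the legacy format.
sub = split_complex(entry, keep={"C0", "C1", "C2"})
renamed = dict(zip("HLA", sub.values()))
sub_ok = fits_legacy_pdb(sum(renamed.values()), list(renamed))
```

      In this toy case the full entry exceeds both limits, while the three-chain subset fits comfortably, which is the situation that makes a .cif to .pdb conversion after splitting feasible.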

      Reviewer #2:

      (1) In SabDab and PDB, experimental binding affinities are also reported: could the authors comment on whether they also imported this information and double-checked it against the original paper? If it wasn't imported, that might discourage some users and should be considered as an extension for the future.

      We thank the reviewer for the comment and the suggestion. As noted in our response in the “Public review”, due to current resource constraints, quantitative affinity data have not been incorporated into this release but are undergoing systematic curation. We explicitly recognize these limitations and propose a two-pronged strategy for future iterations: (1) broadening antigen diversity coverage through expanded structural sampling, and (2) integrating quantitative binding affinity measurements. In the Discussion section, we have included a description outlining the planned enhancements.

      (2) Line 49-50: the references mentioned in connection to deep learning methods for antibody-antigen predictions seem a bit limited given the amount of articles in this field, with 3 of 4 references on one method only (SEPPA), could the authors expand this list to reflect a bit more the state of the art?

      We thank the reviewer for the suggestion. We agree that more relevant studies should be listed and therefore more references are provided in the revised manuscript.

      When mentioning the limitations of the existing databases, it feels a bit that the criticism is not fully justified. For instance:

      Line 52-53: could the authors elaborate on the reasons why such an identification is challenging? (Isn't it possible to make an efficient database-filtered search? Or rather, should one highlight that a more focussed resource is convenient and why?)

      Thank you for the feedback. In this study, the keywords "antibody complex," "antigen complex," and "immunoglobulin complex" were employed during data collection. The PDB returned over 30,000 results, of which only one-tenth met our criteria after rigorous filtering. This demonstrates that keyword searches, while useful, inherently limit result precision and introduce substantial redundancy, likely due to the PDB's search mechanism. This is why we highlighted the significant challenges in identifying antibody-antigen complexes among general protein structures in the PDB.

      Line 55: reading the website http://www.abybank.org/abdb/, it would be fairer to say that the web interface lacks updates, as the database and the code have gone through some updates. Could the authors provide a concrete example of the reason why: 'The AbDb database currently lacks proper organization and management of this valuable data.'?

      We thank the reviewer for highlighting this issue. In our original manuscript, the statement that the AbDb database "lacks proper organization and management" was based on the absence of an explicit statement regarding data updates on its official website at the time of submission, even though internal updates to its content may have occurred. We fully respect the long-standing contributions of AbDb to antibody structural research, and our comments were solely directed at the specific state of the database at that time. As the reviewer noted, following the release of our preprint, we have also taken note of AbDb's recent updates. To reflect the latest developments and avoid potential misinterpretation, we have revised the original statement in the revised manuscript.

      Also 'this rapid updating process may inadvertently overlook a significant amount of information that requires thorough verification,': it's difficult for me to understand what this means in practice. Could the authors clarify if they simply mean that SabDab collects information from PDB and therefore tends to propagate annotation errors from there? If yes, I think it's enough to state it in these terms, and for sure I agree that the reason is that correcting these annotation errors requires a substantial amount of work.

      We thank the reviewer for providing such detailed feedback on the manuscript. We acknowledge that SabDab represents a highly valuable contribution to the field, and its rapid update mechanism has significantly advanced related research areas. However, as stated by the reviewer, we aim to clarify that SabDab primarily relies on automated metadata extraction from the PDB for annotation, and its rapid update process inherits raw data from upstream sources. According to their paper, manual curation is only applied when the automated pipeline fails to resolve structural ambiguities. This workflow, which depends on PDB annotations with limited manual verification, may propagate errors present in the PDB. Examples include species misannotation and mutation status misinterpretation. We fully agree with the reviewer's observation that correcting errors in such cases necessitates labor-intensive manual curation, which is a core motivation for our study.

      Line 86: why 'Structures that consisted solely of one type of antibody were excluded'? Why exclude complexes with antigens shorter than 50 amino acids? These complexes are genuine antibody-antigen complexes.

      We thank the reviewer for the valuable question. The AACDB database is dedicated to curating structural data of antigen-antibody complexes. Structures featuring only a single antibody type are classified as free antibodies and systematically excluded from the database due to the absence of protein-bound partners. During data screening, we retained sequences shorter than 50 amino acids by categorizing them as peptides rather than eliminating them outright. The current release exclusively encompasses complexes with protein-based antigens. Meanwhile, complexes involving peptides, haptens, and nucleic acid antigens are undergoing systematic curation, with planned inclusion in future updates to broaden antigen category representation.

      Line 96 needs a capital letter at the beginning.

      Line 107: 'this would generate' → 'this generates' (given it is something that has been implemented, correct?).

      Line 124: missing an 'of'.

      Line 163: inspiring by -> inspired by.

      Thank you for the feedback. All of the above grammatical or spelling errors have been revised in the manuscript.

      Line 109-111: apart from the example, it would be good to spell out the general rule applied to anti-idiotypic antibodies.

      We thank the reviewer for the valuable feedback. For anti-idiotypic antibody complexes, the partner antibody is treated as a dual-chain antigen, necessitating individual evaluation of heavy-chain and light-chain interactions with the anti-idiotypic component. We have given a general rule for anti-idiotypic antibodies in section “2.2 PDB splitting” of the revised manuscript.

      Line 155-159: could the authors provide references for the two choices (based on sasa and any-atom distance) that they adopted to define interacting residues?

      We thank the reviewer for the comment and the suggestion. As in our response to Reviewer #1 in the Public review, the definition of interacting residues and the thresholds chosen in the manuscript are based on existing literature. We have added additional supporting references in section “1. Introduction”. Our resource does not provide a fixed amino acid list. Instead, all interacting residues are explicitly documented alongside their corresponding ΔSASA (solvent-accessible surface area changes) and intermolecular distances, allowing researchers to flexibly select residue pairs based on customized thresholds from downloadable datasets. Furthermore, to align with widely adopted criteria in current literature, where interactions are defined by ΔSASA >1 Ų and atomic distances <6 Å, we have recalibrated our analysis in the revised version. Specifically, we replaced the previous 5 Å distance threshold with a 6 Å cutoff to recalculate interacting residues.

      Line 176-178: could the authors re-phrase this sentence to clarify what they mean by 'change in the distribution'?

      We thank the reviewer for the suggestion. Our search was conducted with an end date of November 2023. However, Figure 3B includes an entry dated 2024. Upon reviewing this record, we identified that the discrepancy arises from the supersession of the 7SIX database entry (originally released in December 2022) by the 8TM1 version in January 2024. This version update explains the apparent chronological inconsistency. We regret any lack of clarity in our original description and have revised the corresponding section in the manuscript to explicitly clarify this change of database.

      Caption Figure 3: please spell out all the acronyms in the figure. Provide the date when the last search was performed (i.e., the date of the last update of these statistics).

      We thank the reviewer for the comment. We have systematically expanded all acronyms and included update dates for statistics in the legend of Figure 3. Corresponding changes have also been made to the statistical pages on the website.

      Finally, it would be advisable to do a general check on the use of the English language (e.g. I noted a few missing articles). In Figure 5 DrugBank contains typos.

      We sincerely appreciate the reviewer's meticulous attention to linguistic precision. We have corrected the typographical error in Figure 5 and conducted a comprehensive review of the entire manuscript to ensure accuracy and clarity.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigate how the viscoelasticity of the fingertip skin can affect the firing of mechanoreceptive afferents and they find a clear effect of recent physical skin state (memory), which is different between afferents. The manuscript is extremely well-written and well-presented. It uses a large dataset of low threshold mechanoreceptive afferents in the fingertip, where it is particularly noteworthy that the SA-2s have been thoroughly analyzed and play an important role here. They point out in the introduction the importance of the non-linear dynamics of the event when an external stimulus contacts the skin, to the point at which this information is picked up by receptors. Although clearly correlated, these are different processes, and it has been very well-explained throughout. I have some comments and ideas that the authors could think about that could further improve their already very interesting paper. Overall, the authors have more than achieved their aims, where their results very much support the conclusions and provoke many further questions. This impact of the previous dynamics of the skin affecting the current state can be explored further in so many ways and may help us to better understand skin aging and the effects of anatomical changes of the skin.

      At the beginning of the Results, it states that FA-2s were not considered as stimuli did not contain mechanical events with frequency components high enough to reliably excite them. Was this really the case, did the authors test any of the FA-2s from the larger dataset? If FA-2s were not at all activated, this is also relevant information for the brain to signal that it is not a relevant Pacinian stimulus (as they respond to everything). Further, afferent receptive fields that were more distant to the stimulus were included, which likely fired very little, like the FA-2s, so why not consider them even if their contribution was low?

      Thank you for bringing this up. We have now clarified in the text that while FA-2s did respond at a low rate during the experiment, their responses were not reliably driven by the force stimuli. In the Methods section we have included the following text:

      “Initially, 10 FA-2 neurons were also included in the analysis. But their responsiveness during the experiment was remarkably low, and unlike the other neuron types, their responses were rarely affected by force stimuli. Specifically, only one of the observed FA-2 neurons responded during the force protraction phases. Due to the lack of clear stimulus-driven responses, FA-2 neurons were subsequently excluded from further analysis.”

      One question that I wondered throughout was whether you have looked at further past history in stimulation, i.e. not just the preceding stimulus, but 2 or 3 stimuli back? It would be interesting to know if there is any ongoing change that can be related back further. I do not think you would see anything as such here, but it would be interesting to test and/or explore in future work (e.g. especially with sticky, forceful, or sharp indentation touch). However, even here, it could be that certain directions gave more effects.

      This is a very interesting question! A discernible effect from the previous stimulus could persist at the end of the current stimulation (see Figure 4C), potentially influencing the next one—a 2-stimuli-back effect. Unfortunately, our experimental design did not allow for rigorous testing of this effect. While all possible pairs of stimulus directions were included in immediately consecutive trials, this was not the case for pairs separated by additional trials. Hence, the combination of a likely weak effect and limited variation in history precluded a thorough analysis of a 2-stimuli-back effect. Future work should delve into the time course of the viscoelastic effect in greater detail.

      Did the authors analyze or take into account the difference between receptive field locations? For example, did afferents more on the sides have lower responses and a lesser effect of history?

      An investigation into the potential impact of the relationship between the receptive field location on the fingertip skin and the primary contact site of the stimulus surface revealed no discernible influence for SA-1 and SA-2 neurons. In contrast, FA-1 neurons, particularly those predominantly sensitive to the previous stimulation or displaying mixed sensitivity, exhibited a tendency to terminate near the primary stimulation site. We have added these observations to the text:

      “We found no straightforward relationship between a neuron's sensitivity to current and previous stimulation and its termination site in fingertip skin. Specifically, there was no statistically significant effect of the distance between a neuron's receptive field center and the primary contact site of the stimulus surface on whether neurons signaled current, prior, or mixed information for SA-1 (Kruskal-Wallis test H(2)=3.86, p= 0.15) or SA-2 neurons (H(2)=0.75, p=0.69). However, a significant difference emerged for FA-1 neurons (H(2)=8.66, p=0.01), indicating that neurons terminating closer to the stimulation site on the flat part of the fingertip were more likely to signal past or mixed information.”
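
      The reported p-values can be checked against the H statistics with a short calculation. This sketch assumes the usual chi-square approximation for the Kruskal-Wallis statistic with 2 degrees of freedom (three groups), for which the upper-tail probability has the closed form exp(−H/2); the function name is illustrative.

```python
import math

def kruskal_p_df2(h):
    """Upper-tail chi-square probability with 2 degrees of freedom: exp(-H/2)."""
    return math.exp(-h / 2.0)

# H statistics as reported in the quoted text above.
reported = {"SA-1": 3.86, "SA-2": 0.75, "FA-1": 8.66}
p_values = {name: round(kruskal_p_df2(h), 2) for name, h in reported.items()}
```

      Under this approximation the recomputed values round to 0.15, 0.69, and 0.01, matching those quoted for SA-1, SA-2, and FA-1 neurons respectively.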

      Was there anything different in the firing patterns between the spontaneous and non-spontaneously active SA-2s? For example, did the non-spontaneous show more dynamic responses?

      The firing patterns of both spontaneously and non-spontaneously active SA-2 neurons shared similarities in terms of adaptation and range of firing rate modulation in response to force stimuli, i.e., ‘dynamic response’. The distinction lay in the pattern of modulation of the firing rate associated with stimulus presentations. For spontaneously active SA-2 neurons, this modulation occurred around a significant background discharge, implying that a force stimulus could either decrease or increase the firing rate, depending on how it deformed the fingertip. This characteristic is well illustrated by the firing pattern of the neuron depicted in the lower panels of Figure 3D. Conversely, in non-spontaneously active SA-2 neurons, a force stimulus could only induce an increase in the firing rate or no change. Although the neuron depicted in the upper panels of Figure 3D exhibited some background activity, it serves to exemplify this characteristic. In the text, we have elucidated the dynamics of the SA-2 neuron response by highlighting that force stimulation can either decrease or increase the firing rate in neurons with spontaneous activity through the following addition/change:

      “This increased variability was most evident during the force protraction phase where most neurons exhibited the most intense responses. Increased variability was also observed in instances where the dynamic response to force stimulation involved a decrease in the firing rate (lower panels of Figure 3D). This phenomenon was observed in SA-2 neurons that maintained an ongoing discharge during intertrial periods (cf. Fig. 2A). In these cases, the response to a force stimulus constituted a modulation of the firing rate around the background discharge, signifying that a force stimulus could either decrease or increase the firing rate depending on the prevailing stimulus direction.”

      Were the spontaneously active SA-2 afferents firing all the time or did they have periods of rest - and did this relate to recent stimulation? Were the spontaneously active SA-2s located in a certain part of the finger (e.g. nail) or were they randomly spread throughout the fingertip? Any distribution differences could indicate a more complicated role in skin sensing.

      SA-2 neurons, in general, are well-known for undergoing significant post-stimulation depression (e.g., Knibestöl and Vallbo, 1970; Chambers et al., 1972; Burgess and Perl, 1973). In our force stimulations, this post-excitatory depression manifested as a reduced or absent response during the latter part of the stimulus retraction period for stimuli in directions that markedly excited the neuron. The excitability recovered when the fingertip relaxed during the subsequent intertrial period, and for "spontaneously active" neurons, the firing resumed (see examples in Figure 7A). Furthermore, some “spontaneously active” neurons could be silenced or exhibit a near-silent period during force stimulation for certain force directions, while the spontaneous firing returned during the upcoming intertrial period when the fingertip shape recovered (for example, see responses to stimulation in the proximal and especially ulnar directions in the top panel in Figure 7A).

      Regarding the location of the receptive field centres of spontaneously active and non-spontaneously active SA-2 neurons on the fingertip we did not observe any obvious spatial segregation. To illustrate this, we have revised Figure 1A by color-marking SA-2 neurons that exhibited ongoing activity in intertrial periods, and the figure caption has been modified accordingly:

      “Figure 1. Experimental setup. A. Receptive field center locations shown on a standardized fingertip for all first-order tactile neurons included in the study, categorized by neuron type. Purple symbols denote spontaneously active SA-2 neurons exhibiting ongoing activity without external stimulation.”

      Did the authors look to see if the spontaneous firing in SA-2s between trials could predict the extent to which the type 1 afferents encode the proceeding stimulus? Basically, does the SA-2 state relate to how the type 1 units fire?

      We found no clear indications that the responses of FA-1 and SA-1 could be readily anticipated based on the firing patterns of SA-2 neurons.

      In the discussion, it is stated that "the viscoelastic memory of the preceding loading would have modulated the pattern of strain changes in the fingertip differently depending on where their receptor organs are situated in the fingertip". Can the authors expand on this or make any predictions about the size of the memory effect and the distance from the point of stimulation?

      We have explored this topic further in the text, referring to recent studies modeling essential aspects of fingertip mechanics. However, in our view, current models lack the capability to predict the specific nature sought by the reviewer. These models should include a detailed understanding of the intricate networks of collagen fibers anchoring the pulp tissue at the distal phalangeal bone and the nail. They should also consider potential inherent directional preferences of the receptor organs, attributed to their microanatomy. The text modifications are as follows:

      “In addition to the receptor organ locations, the variation in sensitivity among neurons to fingertip deformations in response to both previous and current loadings would stem from the fingertip’s geometry and its complex composite material properties. Possible inherent directional preferences of the receptor organs, attributed to their microanatomy, could also be significant. However, mechanical anisotropy, particularly within the viscoelastic subcutaneous tissue of the fingertip induced by intricately oriented collagen fiber strands forming fat columns in the pulp (Hauck et al., 2004), is likely to play a crucial role. This anisotropy would shape the dynamic pattern of strain changes at neurons' receptor sites, intricately influencing a neuron's sensitivity not only to current but also to preceding loadings. Indeed, recent modeling efforts suggest that such mechanical anisotropy strongly influences the spatiotemporal distribution of stresses and strains across the fingertip (Duprez et al., 2024).”

      Relatedly, we have included additional text to provide a more comprehensive explanation of the “bulk deformation” of the fingertip that occurs during the loadings:

      “As pressure increases in the pulp, the pulp tissue bulges at the end and sides of the fingertip. Simultaneously, the tangential force component amplifies the bulging in the direction of the force while stretching the skin on the opposite side.”

      In the discussion, it would be good if the authors could briefly comment more on the diversity of the mechanoreceptive afferent firing and why this may be useful to the system.

      The diversity in responses among neurons is instrumental in enhancing the information transmitted to the brain by averting redundancy in information acquisition. This diversity thereby contributes to an overall increase in information. We've included a brief statement, along with several references, underscoring this concept:

      "The resulting diversity in the sensitivities of neurons might enhance the overall information collected and relayed to the brain by the neuronal population, facilitating the discrimination between tactile stimuli or mechanical states of the fingertip (see Rongala et al., 2024; Corniani et al., 2022; Tummala et al., 2023, for more extensive explorations of this idea)."

      Also, the authors could briefly discuss why this memory (or recency) effect occurs - is it useful, does it serve a purpose, or is it just a by-product of our skin structure? There are examples of memory in the other senses where comparisons could be drawn. Is it like stimulus adaptation effects in the other senses (e.g. aftereffects of visual motion)?

      We have expanded the concluding paragraph of the discussion, specifically delving into the question of whether the mechanical memory effect serves a deliberate purpose or is simply an incidental byproduct of our skin structure:

      “In any case, the viscoelastic deformability of the fingertips plays a pivotal role in supporting the diverse functions of the fingers. For example, it allows for cushioned contact with objects featuring hard surfaces and allows the skin to conform to object shapes, enabling the extraction of tactile information about objects' 3D shapes and fine surface properties. Moreover, deformability is essential for the effective grasping and manipulation of objects. This is achieved, among other benefits, by expanding the contact surface, thereby reducing local pressure on the skin under stronger forces and enabling tactile signaling of friction conditions within the contact surface for control of grasp stability. Throughout, continuous acquisition of information about various aspects of the current state of the fingertip and its skin by tactile neurons is essential for the functional interaction between the brain and the fingers. In light of this, the viscoelastic memory effect on tactile signaling of fingertip forces can be perceived as a by-product of an overall optimization process within prevailing biological constraints.”

      One point that would be nice to add to the discussion is the implications of the work for skin sensing. What would you predict for the time constant of relaxation of fingertip skin, how long could these skin memory effects last? Two main points to address here may be how the hydration of the skin and anatomical skin changes related to aging affect the results. If the skin is less viscoelastic, what would be the implications for the firing of mechanoreceptors?

      It is likely that the time constant depends to some extent on mechanical factors of the skin, which will likely change due to age or environmental factors. However, while these questions are intriguing, they fall outside the scope of the current study and we are not aware of studies that have addressed these issues directly in experiments either.

      How long does it take for the effect to end? Again, this will likely depend on the skin's viscoelasticity. However, could the authors use it in a psychophysical paradigm to predict whether participants would be more or less sensitive to future stimuli? In this way, it would be possible to test whether the direction modifies touch perception.

      Time constants for tissue viscoelasticity have been estimated to extend up to several seconds (see citations in the introduction). While direct perceptual effects could indeed be explored through psychophysical experimental paradigms, we are currently unaware of any studies specifically addressing the type of effect described in this study. In addition to the statement that, concerning manipulation and haptic tasks, "to our knowledge, a possible influence of fingertip viscoelasticity on task performance has not been systematically investigated," we have now also addressed tactile psychophysical tasks conducted during passive touch with the following sentence in the text:

      “Similarly, there is a lack of systematic investigation of potential effects of fingertip viscoelasticity on performance in tactile psychophysical tasks conducted during passive touch.”

      Reviewer #2 (Public Review):

      Summary:

      The authors sought to identify the impact skin viscoelasticity has on neural signalling of contact forces that are representative of those experienced during normal tactile behaviour. The evidence presented in the analyses indicates there is a clear effect of viscoelasticity on the imposed skin movements from a force-controlled stimulus. Both skin mechanics and evoked afferent firing were affected based on prior stimulation, which has not previously been thoroughly explored. This study outlines that viscoelastic effects have an important impact on encoding in the tactile system, which should be considered in the design and interpretation of future studies. Viscoelasticity was shown to affect the mechanical skin deflections and stresses/strains imposed by previous and current interaction force, and also the resultant neuronal signalling. The result of this was an impaired coding of contact forces based on previous stimulation. The authors may be able to strengthen their findings, by using the existing data to further explore the link between skin mechanics and neural signalling, giving a clearer picture than demonstrating shared variability. This is not a critical addition, but I believe would strengthen the work and make it more generally applicable.

      Strengths:

      - Elegant design of the study. Direct measurements have been made from the tactile sensory neurons to give detailed information on touch encoding. Experiments have been well designed and the forces/displacements have been thoroughly controlled and measured to give accurate measurements of global skin mechanics during a set of controlled mechanical stimuli.

      - Analytical techniques used. Analysis of fundamental information coding and information representation in the sensory afferents reveals dynamic coding properties to develop putative models of the neural representation of force. This advanced analysis method has been applied to a large dataset to study neural encoding of force, the temporal dynamics of this, and the variability in this.

      Weaknesses:

      - Lack of exploration of the variation in neural responses. Although there is a viscoelastic effect that produces variability in the stimulus effects based on prior stimulation, it is a shame that the variability in neural firing and force-induced skin displacements have been presented, and are similarly variable, but there has been no investigation of a link between the two. I believe with these data the authors can go beyond demonstrating shared variability. The force per se is clearly not faithfully represented in the neural signal, being masked by stimulation history, and it is of interest if the underlying resultant contact mechanics are.

      Thank you for this suggestion. We have added a new section investigating the link between skin deformation and neural firing in more depth via a simple neural model. Please see our answer below in the ‘Recommendations’ section for further details.

      Validity of conclusions:

      The authors have succeeded in demonstrating skin viscoelasticity has an impact on skin contact mechanics with a given force and that this impacts the resultant neural coding of force. Their study has been well-designed and the results support their conclusions. The importance and scope of the work are adequately outlined for readers to interpret the results and significance.

      Impact:

      This study will have important implications for future studies performing tactile stimulation and evaluating tactile feedback during motor control tasks. In detailed studies of tactile function, it illustrates the necessity to measure skin contact dynamics to properly understand the effects of a force stimulus on the skin and mechanoreceptors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (Very) minor comments

      - The authors say at the beginning of the Results that, "The fourth type of tactile neurons in the human glabrous skin, fast adapting type II neurons...". Although generally written that there are four types of afferent in the glabrous skin, it would be better to state that these are low-threshold A-beta myelinated mechanoreceptive afferents, at least one time, as there are other types of afferent in the glabrous skin that respond to mechanical stimulation (e.g. low and high threshold C-fibers).

      This is now clarified at the start of the Results section:

      “We recorded action potentials in the median nerve of individual low-threshold A-beta myelinated first-order human tactile neurons innervating the glabrous skin of the fingertip…”

      - Fig. 3: Could you add '(N)' as the measurement of force for Fig. 3A for Fx, Fy, and Fz? Also, please change 'Data was recorded' to 'Data were recorded' in the legend.

      Fixed.

      - At the beginning of the Methods, you say that your study conforms to the Declaration of Helsinki, which actually requires pre-registration in a database. If you did not pre-register your study, please can you add '... in accordance with the Declaration of Helsinki, apart from pre-registration in a database'.

      Thanks for making us aware of this. We have added the suggested qualifier to the ethics statement.

      Reviewer #2 (Recommendations For The Authors):

      The neural representation/encoding of the actual displacement vectors would be a useful addition to the analyses. These vectors have been demonstrated to systematically change with the condition in the irregular series (Figure 2E) and will thus significantly act on the dynamics of induced mechanical changes in the skin with a given interaction force. Thus, it could be examined how the neurons code the magnitude of displacements as well as their direction. An evaluation of the extent to which the imposed displacement magnitudes are encoded in the neural responses would be a useful addition in explaining the signalling of the force events and how the central nervous system decodes these. Evaluating an alternative displacement encoding for comparison to pure force encoding may reveal more about how contact events are represented in the tactile system, which must decode these variable afferent signals to reconstruct a percept of the interaction. It could then be explored how the central nervous system may then scale the dynamic afferent responses based on the background viscoelastic state likely to be present in the SA-II afferent signals (Figure 7) for a context in which to evaluate the dynamic contact forces. This may of course be a complex relationship for the type-I afferents, where the underlying mechanical events evoking the firing (microslips not represented in global forces) have not been measured here. Such a model could be more widely applicable, as the skin viscoelasticity and displacement magnitudes are a straightforward measurement metric and could perhaps be used as a better proxy for neural signalling. This would allow the investigation of a wider variety of forces, and the study of the timing of the viscoelastic effect, both of which have been fixed here. 
      This would give the work a broader impact: rather than just highlighting that this effect produces variability, it could reveal whether this mechanical feature is structured in the neural representation. The categorical encoding/decoding tested here is specific to the stimuli used (magnitudes, intervals), but there is the possibility that this may be more generally applicable (within the bounds of forces/speeds) if the underlying basis of the variability in the signalling produced by the viscoelasticity is identified. Since the time course of the viscoelasticity has not been measured here (fixed forces and intervals), further study is required to fully understand the implications this has for a wider variety of situations.

      We agree that a better understanding of how the mechanical deformations are reflected in the resulting spike trains would be valuable. While ultimately a full understanding will need precise measurements of skin deformation across the whole fingertip to account for mechanical propagation to mechanoreceptor locations, relating the deformations at the contact location with neural firing patterns directly can provide useful hints into which aspects of deformation are encoded and how. To this end, we ran a new analysis that aimed to predict the time-varying neural responses directly from the recorded mechanical movements of the contactor.

      Below we have reproduced the new results and methods text along with the additional figures for this analysis. Note that we have also added text in the Discussion to interpret these findings in the context of our other results.

      New section in Results titled Predicting neural responses from contactor movements: “The similarity in the history-dependent variation in neural firing and fingertip deformation at a given force stimulus suggests that neuronal firing is determined by how the fingertip deforms rather than the applied force itself. However, this similarity does not clarify the relationship between fingertip deformation dynamics and neural signaling. To investigate further, we fit cross-validated multiple linear regression models to evaluate how well distinct aspects of contactor movement could predict the time-varying firing rates of individual neurons during the protraction phases of the irregular sequence. The models used predictors based on (1) the three-dimensional position of the contactor, (2) its three-dimensional velocity, (3) a combination of position and velocity signals, and, finally, (4) position and velocity signals along with all possible two-way interactions between them, capturing potentially complex relationships between fingertip deformations and neural signaling.

      Comparing the variance explained (R<sup>2</sup>) by each regression model for each neuron type revealed clear differences between the models (Figure 5A). A two-way mixed design ANOVA, with regression model as within-group effects and neuron type as a between-group effect revealed a main effect of model on variance explained (F(3,462) = 815.5, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.84). Model prediction accuracy overall increased with the number of predictors, with the two-way interaction model outperforming all others (p < 0.001 for all comparisons, Tukey’s HSD). Additionally, a significant main effect of neuron type (F(2,154) = 29.8, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.28) and a significant interaction between regression model and neuron type were observed (F(6,462) = 50.8, p < 0.001, η<sub>p</sub><sup>2</sup> = 0.40).

      For neuron type, model predictions were most accurate for SA-2 neurons, followed by SA-1 neurons, with FA-1 neurons showing the lowest accuracy (p < 0.003 for all comparisons, Tukey’s HSD). The interaction between model and neuron type revealed distinct patterns. For SA-1 and SA-2 neurons, position-only and velocity-only models had similar prediction accuracy (p ≥ 0.996, Tukey’s HSD) with no significant differences between these neuron types (p ≥ 0.552, Tukey’s HSD). FA-1 neurons performed poorly with the position-only model but showed higher accuracy with the velocity-only model (p < 0.001, Tukey’s HSD) and better than SA-1 neurons (p = 0.006, Tukey’s HSD). Models combining position and velocity predictors (without interactions) surpassed both position-only and velocity-only models for SA-1 and SA-2 neurons (p < 0.001, Tukey’s HSD). Overall, the differences between neuron types broadly match their tuning to static and dynamic stimulus properties.

      The two-way interaction model, accounting for most variance in neural responses, produced mean R<sup>2</sup> values of 0.75 for FA-1, 0.88 for SA-1, and 0.91 for SA-2 neurons (Figure 5A). To evaluate the contribution of the different predictors, we ranked them using the permutation feature importance method, focusing on the six most important ones. Regression analyses using only these variables explained almost all of the variance explained by the full model, with a median R<sup>2</sup> reduction of just 0.055 across all neurons. Across all neuron types, at least half included all three velocity components (dPx, dPy, dPz) among the top six, with FA-1 neurons showing the highest prevalence (Figure 5B). Interactions between normal position (Pz) and each velocity component were also frequently observed, while interactions involving tangential position and velocity components were less common. Interactions among velocity components were relatively well represented, followed by interactions limited to position components. Position signals were generally less represented, except for normal position (Pz) in slowly adapting neurons, where it appeared in 50% of SA-1 and 68% of SA-2 neurons. Despite these broad trends, important predictors varied widely across ranks even within a given neuron class (see Figure 5-figure supplement 1), and even the most frequent variables appeared in only a subset of cases, suggesting broad variability in sensitivity across neurons.”
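
      The reported degrees of freedom and effect sizes can be cross-checked from the neuron counts given in the methods (68 SA-1, 38 SA-2, 51 FA-1) and the four regression models. The sketch below is an illustrative consistency check, not part of the authors' analysis. The helper name `partial_eta_squared` is hypothetical, and the check assumes the standard identity η_p² = F·df_effect / (F·df_effect + df_error).

```python
# Consistency check of the reported mixed-design ANOVA statistics.
# Neuron counts come from the methods text; everything else is derived here.
n_neurons = {"SA-1": 68, "SA-2": 38, "FA-1": 51}
n_models = 4  # position, velocity, combined, interaction

n_total = sum(n_neurons.values())               # total neurons
n_groups = len(n_neurons)                       # neuron types (between factor)

df_between = n_groups - 1                       # neuron type effect
df_between_error = n_total - n_groups           # between-subject error
df_within = n_models - 1                        # regression model effect
df_interaction = df_within * df_between         # model x neuron type
df_within_error = df_within * df_between_error  # within-subject error


def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values as reported in the quoted Results text.
reported = {
    "model": (815.5, df_within, df_within_error),
    "neuron type": (29.8, df_between, df_between_error),
    "model x neuron type": (50.8, df_interaction, df_within_error),
}
effect_sizes = {
    name: round(partial_eta_squared(*args), 2) for name, args in reported.items()
}
print(df_within, df_within_error, df_between, df_between_error, df_interaction)
print(effect_sizes)
```

      Under these assumptions, the derived degrees of freedom (3 and 462, 2 and 154, 6 and 462) and the rounded effect sizes agree with the values quoted above.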

      New methods paragraph titled Predicting time-varying firing rates from skin deformations:

      “This analysis was conducted in Python (v3.13) with pandas for data handling, numpy for numerical operations, and scikit-learn for model fitting and evaluation.

      To assess how well individual neurons' time-varying firing rates could be predicted from simultaneous contactor movements, we fitted multiple linear regression models (see Khamis et al., 2015, for a similar approach). This analysis focused on the force protraction phase of the irregular sequence, where neurons were most responsive and sensitive to stimulation history. Data from 100 ms before to 100 ms after the protraction phase (between -0.100 s and 0.225 s relative to protraction onset) were included for each trial. Neurons were included if they fired at least two action potentials during the force protraction phase and the following 100 ms in at least five of the 25 trials. This ensured sufficient variability in firing rates for meaningful regression analysis, resulting in 68 SA-1, 38 SA-2, and 51 FA-1 neurons being included.

      Contactor position signals digitized at 400 Hz were linearly interpolated to 1000 Hz. Instantaneous firing rates, derived from action potentials sampled at 12.8 kHz, were resampled at 1000 Hz to align with position signals. A Gaussian filter (σ = 10 ms, cutoff ~16 Hz) was applied to the firing rate as well as to the position signals before differentiation. To account for axonal conduction (8–15 ms) and sensory transduction delays (1–5 ms), firing rates were advanced by 15 ms to align approximately with independent variables.

      Regressions were performed using scikit-learn's Ridge and RidgeCV regressors, which apply L2 regularization to mitigate overfitting. Hyperparameter tuning for the regularization parameter (alpha) was performed using GridSearchCV with a predefined range (0.001–1000.0), incorporating five-fold cross-validation to select the best value. To minimize overfitting risks, model performance was further validated with independent five-fold cross-validation (KFold), and R<sup>2</sup> scores were computed using cross_val_score.

      We constructed four linear regression models with increasing complexity: (1) Position-only, using three-dimensional contactor positions (Px, Py, Pz); (2) Velocity-only, using three-dimensional velocities (dPx, dPy, dPz); (3) Combined, including all position and velocity signals (6 predictors); and (4) Interaction, including all signals and their two-way interactions (21 predictors). All features were standardized using StandardScaler to improve regularization and model convergence. PolynomialFeatures generated second-order interaction terms for the interaction model. Feature importance was evaluated with permutation_importance, and simpler models were built using the most important features. These models were validated through cross-validation to assess retained explanatory power.”

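      The signal preparation steps in the methods (interpolation to 1000 Hz, Gaussian smoothing, differentiation, and a 15 ms advance of the firing rate) can likewise be sketched in numpy. The function names, the kernel truncation at four sigmas, and the edge padding are assumptions made for this illustration. The last line uses one common convention, the frequency-domain standard deviation 1/(2πσ), which reproduces the quoted cutoff of about 16 Hz; whether the authors used this definition is not stated.

```python
import numpy as np

FS_IN, FS_OUT = 400.0, 1000.0  # sampling rates from the methods text (Hz)
SIGMA_S = 0.010                # Gaussian sigma, 10 ms
DELAY_S = 0.015                # firing-rate advance, 15 ms


def gaussian_kernel(sigma_s, fs, n_sigmas=4.0):
    """Unit-area Gaussian kernel truncated at +/- n_sigmas (assumed truncation)."""
    half = int(round(n_sigmas * sigma_s * fs))
    t = np.arange(-half, half + 1) / fs
    kernel = np.exp(-0.5 * (t / sigma_s) ** 2)
    return kernel / kernel.sum()


def smooth(x, sigma_s=SIGMA_S, fs=FS_OUT):
    """Gaussian-smooth a 1-D signal, holding edge values to keep the length."""
    kernel = gaussian_kernel(sigma_s, fs)
    padded = np.pad(x, len(kernel) // 2, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def preprocess_position(t_in, pos_in):
    """Upsample one position axis to 1 kHz, smooth it, then differentiate."""
    t_out = np.arange(t_in[0], t_in[-1], 1.0 / FS_OUT)
    pos = smooth(np.interp(t_out, t_in, pos_in))
    vel = np.gradient(pos, 1.0 / FS_OUT)
    return t_out, pos, vel


def advance(rate, delay_s=DELAY_S, fs=FS_OUT):
    """Shift a firing-rate trace earlier by delay_s, holding the last value."""
    n = int(round(delay_s * fs))
    return np.concatenate([rate[n:], np.full(n, rate[-1])])


# Demo on a synthetic ramp (assumption): position rising at 2 units per second.
t_in = np.arange(0.0, 0.325, 1.0 / FS_IN)
t_out, pos, vel = preprocess_position(t_in, 2.0 * t_in)
shifted = advance(np.arange(100.0))

# Frequency-domain sigma of the Gaussian filter, in Hz.
cutoff_hz = 1.0 / (2.0 * np.pi * SIGMA_S)
print(round(cutoff_hz, 1), len(t_out), shifted[0])
```

      Away from the trace edges, the recovered velocity of the ramp stays at its true slope, which is a quick sanity check that smoothing before differentiation does not bias slowly varying signals.
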
      Minor:

      - It would be useful to add a brief description of the material aspects of the contactor tip to the methods (as per Birznieks 2001).

      We have added the following statement:

      “To ensure that friction between the contactor and the skin was sufficiently high to prevent slips, the surface was coated with silicon carbide grains (50–100 μm), approximating the finish of smooth sandpaper.”

      - The axes labelling on Figure 3A and legend description is ambiguous, probably placing the Px, Py, and Pz labels on the far left axes and the Fx, Fy, and Fz on the right side of the far right axes would make this clearer.

      Label placement has been improved along with some other minor fixes.

      - For the quasi-static phase analysis, the phrase "absence of loading" used in reference to the interstimulus period and SA-II afferents does not seem to be a correct description. The finger is still loaded (at least in the normal direction), with a magnitude of imposed displacement that counteracts the viscoelastic force exerted by the skin mechanics of the fingertip. Although there is a zero net-force load, a mechanical stimulus is still being actively applied to the skin.

      We have changed the wording throughout the text and now consistently refer either to the “interstimulus period” directly or to an “absence of externally applied stimulation” to avoid confusion.

    1. If teachers and students can meet each other's needs, a comfortable life for all is the reward. Sizer believed that when one or the other breaks this unspoken contract, trouble is likely to follow.

      This passage really reflects the "tacit understanding" in many classrooms - if students don't cause trouble, the teacher can easily finish the class, and neither side makes things difficult for the other. I used to feel this atmosphere in high school. Some students in the class didn't study much, but as long as they didn't disturb others, the teacher would let them "slack off" by default. It's like an unspoken rule of "we don't undermine each other." The author quoted Sizer's "Let's Make a Deal" to satirize this arrangement, which seems calm but lacks in-depth communication in education. I think this "transactional" teaching atmosphere may seem easy in the short term, but in the long run, it will make teaching lose its real challenge and meaning.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This short report shows that the transcription factor gene mirror is specifically expressed in the posterior region of the butterfly wing imaginal disk, and uses CRISPR mosaic knock-outs to show it is necessary to specify the morphological features (scales, veins, and surface) of this area.

      Strengths:

      The data and figures support the conclusions. The article is swiftly written and makes an interesting evolutionary comparison to the function of this gene in Drosophila. Based on the data presented, it can now be established that mirror likely has a similar selector function for posterior-wing identity in a plethora of insects.

      We thank the reviewer for their feedback.

      Weaknesses:

      This first version has minor terminological issues regarding the use of the terms "domains" and "compartment".

      We acknowledge that the terminologies “domains” and “compartments” might lead to confusion. To avoid confusion we have removed the term “compartment” from the manuscript.

      Reviewer #2 (Public Review):

      This is a short and unpretentious paper. It is an interesting area and therefore, although much of this area of research was pioneered in flies, extending basic findings to butterflies would be worthwhile. Indeed, there is an intriguing observation but it is technically flawed and these flaws are serious.

      The authors show that mirror is expressed at the back of the wing in butterflies (as in flies). They present some evidence that it is required for the proper development of the back of the wing in butterflies (a region dubbed the vannus by the ancient guru Snodgrass). But there are problems with that evidence. First, concerning the method, using CRISPR they treat embryos and the expectation is that the mirror gene will be damaged in groups of cell lineages, giving a mosaic animal in which some lines of cells are normal for mirror and others are not. We do not know where the clones or patches of cells that are defective for mirror are because they are not marked. Also, we do not know what part of the wing is wild type and what part is mutant for mirror. When the mirror mutant cells colonise the back of the wing and that butterfly survives (many butterflies fail to develop), the back of the wing is altered in some selected butterflies. This raises a second problem: we do not know whether the rear of the wing is missing or transformed. From the images, the appearance of the back of the wing is clearly different from the wild type, but is that due to transformation or not? And then I believe we need to know specifically what the difference is between the rear of the wing and the main part. What we see is a silvery look at the back that is not present in the main part, is it the structure of the scales? We are not told.

      Thank you for this feedback. We appreciate that many readers may not be accustomed to looking at mosaic knockouts. As discussed in a previous review article (Zhang & Reed 2017), we rely on a combination of contralateral asymmetry and replicates to infer mutant phenotypes. For many genes (e.g. pigmentation enzymes) mutant clones are obvious, but for other types of genes (e.g. ligands) clone boundaries are sometimes not directly diagnosable. It is simply a limitation of our study system. Nonetheless, you can see for yourself that “the back of the wing is altered in some butterflies” – the effects of deleting mirror are clear and repeatable.

      In terms of interpreting mutant phenotypes, we agree that the paper would benefit from a better description of the specific effects. Therefore, we have included an improved, more systematic description of phenotypes, along with better-annotated figures showing changes in wing shape and venation, scale coloration, and color pattern transformation (e.g. posterior elongation of the orange marginal stripes).

      There are other problems. Mirror is only part of a group of genes in flies and in flies both iroquois and mirror are needed to make the back of the wing, the alula (Kehl et al). What is known about iro expression in butterflies?

      In Drosophila, mirror, araucan, and caupolican comprise the so-called Iroquois Complex of genes. As denoted in Figure S4 and in Kerner et al (doi: https://doi.org/10.1186/1471-2148-9-74) the divergence of araucan and caupolican into two separate paralogs is restricted to Drosophila. As in most insects, butterflies have only two Iroquois Complex genes: araucan and mirror. We tested the role of araucan in Junonia coenia as shown in our pre-print: https://doi.org/10.1101/2023.11.21.568172. Its expression appears to be restricted to early pupal wings where it is transcribed in all scale-forming cells. Mosaic araucan KOs resulted in a change in scale iridescent coloration associated with changes in the laminar thickness of scale cells.

      In flies, mirror regulates a late and local expression of dpp that seems to be responsible for making the alula. What happens in butterflies? Would a study of the expression of Dpp in wildtype and mirror compromised wings be useful?

      We thank the reviewer for the proposal and agree that a future study comparing Dpp in wild-type versus mirror KO butterflies would be useful to clarify the mechanism of Dpp signalling in wing development. It is not clear, however, that the results of a Dpp experiment would change the conclusions of our current study; therefore, we decided not to undertake these additional experiments for our revision.

      Thus, I find the paper to be disappointing for a general journal as it does little more than claim what was discovered in Drosophila is at least partly true in butterflies. 

      We respect that the reviewer does not have a strong interest in the comparative aspects of this study. Fair enough. This report is primarily aimed at biologists interested in the evolutionary history of insect wings.

      Also, it fails to explain what the authors mean by "wing domains" and "domain specification". They are not alone, butterfly workers, in general, appear vague about these concepts, their vagueness allowing too much loose thinking.

      A domain is “a region distinctively marked by some physical feature”. This term is used extensively in the developmental biology literature (e.g. “expression domain”, “embryonic domain”, “tissue domain”, “domain specification”) and is found throughout popular textbooks (e.g. Alberts et al. “The Cell”, Gilbert “Developmental Biology”). We prefer the term “domain” because of its association in the Drosophila literature with transcription factors that define fields of cells. We specifically avoided using the term “compartment” because of its association with cell lineage, which we have not tested. 

      Since these matters are at the heart of the purpose and meaning of the work reported here, we readers need a paper containing more critical thought and information. I would like to have a better and more logical introduction and discussion.

      We would like the very same thing, of course, and we hope the reviewer finds our revised manuscript to be more satisfying to read.

      The authors do define what they mean by the vannus of the wing. In flies the definition of compartments is clear and abundantly demonstrated, with gene expression and requirement being limited precisely to sets of cells that display lineage boundaries. It is true that domains of gene expression in flies, for example of the iroquois complex, which includes mirror, can only be related to patterns with difficulty. Some recap of what is known plus the opinion of the authors on how they interpret papers on possible lineage domains in butterflies might also be useful, as the reader is no wiser about what the authors might mean at the end of it!

      We thank the reviewer for this suggestion. However, our experiments have little to contribute to the topic of cell lineage compartmentalization. We have therefore opted to avoid speculating on this topic to prevent confusion and to keep the manuscript focused on our experimental results.

      The references are sometimes inappropriate. The discovery of the AP compartments should not be referred to Guillen et al 1995, but to Morata and Lawrence 1975. Proofreading is required.

      We thank the reviewer for suggesting this important reference. We have included it in our revision.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Chatterjee et al. examines the role of the mirror locus in patterning butterfly wings. The authors examine the pattern of mirror expression in the common buckeye butterfly, Junonia coenia, and then employ CRISPR mutagenesis to generate mosaic butterflies carrying clones of mirror mutant cells. They find that mirror is expressed in a well-defined posterior sector of final-instar wing discs from both hindwings and forewings and that CRISPR-injected larvae display a loss of adult wing structures presumably derived from the mirror expressing region of hindwing primordium (the case for forewings is a bit less clear since the mirror domain is narrower than in the hindwing, but there also do seem to be some anomalies in posterior regions of forewings in adults derived from CRISPR injected larvae). The authors conclude that the wings of these butterflies have at least three different fundamental wing compartments, the mirror domain, a posterior domain defined by engrailed expression, and an anterior domain expressing neither mirror nor engrailed. They speculate that this most posterior compartment has been reduced to a rudiment in Drosophila and thus has not been adequately recognized as such a primary regional specialization.

      Critique:

      This is a very straightforward study and the experimental results presented support the key claims that mirror is expressed in a restricted posterior section of the wing primordium and that mosaic wings from CRISPR-injected larvae display loss of adult wing structures presumably derived from cells expressing mirror (or at least nearby). The major issue I have with this paper is the strong interpretation of these findings that lead the authors to conclude that mirror is acting as a high-level gene akin to engrailed in defining a separate extreme posterior wing compartment. To place this claim in context, it is important in my view to consider what is known about engrailed, for which there is ample evidence to support the claim that this gene does play a very ancestral and conserved function in defining posterior compartments of all body segments (including the wing) across arthropods.

      (1) Engrailed is expressed in a broad posterior domain with a sharp anterior border in all segments of virtually all arthropods examined (broad use of a very good panspecies anti-En antibody makes this case very strong).

      (2) In Drosophila, marked clones of wing cells (generated during larval stages) strictly obey a straight anterior-posterior border indicating that cells in these two domains do not normally intermix, thus, supporting the claim that a clear A/P lineage compartment exists.

      In my opinion, mirror does not seem to be in the same category of regulator as engrailed for the following reasons:

      (1) There is no evidence that I am aware of, either from the current experiments, or others that the mirror expression domain corresponds to a clonal lineage compartment. It is also unclear from the data shown in this study whether engrailed is co-expressed with mirror in the posterior-most cells of J. coenia wing discs. If so, it does not seem justified to infer that mirror acts as an independent determinant of the region of the wing where it is expressed.

      (2) Mirror is not only expressed in a posterior region of the wing in flies but also in the ventral region of the eye. In Drosophila, mirror mutants not only lack the alula (derived approximately from cells where mirror is expressed), but also lack tissue derived from the ventral region of the eye disc (although this ventral tissue loss phenotype may extend beyond the cells expressing mirror).

      In summary, it seems most reasonable to me to think of mirror as a transcription factor that provides important development information for a diverse set of cells in which it can be expressed (posterior wing cells and ventral eye cells) but not that it acts as a high-level regulator as engrailed.

      Recommendation:

      While the data provided in this succinct study are solid and interesting, it is not clear to me that these findings support the major claim that mirror defines an extreme posterior compartment akin to that specified by engrailed. Minimally, the authors should address the points outlined above in their discussion section and greatly tone down their conclusion regarding mirror being a conserved selector-like gene dedicated to establishing posterior-most fates of the wing. They also should cite and discuss the original study in Drosophila describing the mirror expression pattern in the embryo and eye and the corresponding eye phenotype of mirror mutants: McNeill et al., Genes & Dev. 1997. 11: 1073-1082; doi:10.1101/gad.11.8.1073.

      We thank the reviewer for their summary, critique, and recommendations. We agree with everything the reviewer says. Honestly, however, we were surprised by these comments because we took great care in the paper to never refer to mirror as a compartmentalization gene or claim it has a function in cell lineage compartmentalization like engrailed. As pointed out, we lack clonal analyses to test for compartmentalization. This is why we used the term “domain” instead of “compartment” in the title and throughout the manuscript. Nevertheless, we have recrafted the discussion in the manuscript, including completely removing the term “compartment”, to better avoid implications that mirror plays a role in cell lineage compartmentalization. 

      We also thank the reviewer for recommending the paper about the role of mirror in eye development. For the sake of keeping the paper focused, however, we decided not to broach the topic of mirror functions outside the context of wing development.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have minor comments for improvement.

      The abstract and introductions are terminologically problematic when they refer to the concept of compartment and compartment boundaries. Allegedly this confusion has previously propagated in several articles related to butterfly wing development, which keeps alienating this literature from being taken seriously by fly specialists, for example. So it is important to use the right terms. I will try to explain point by point here, but I would appreciate it if the authors could undertake a significant rewrite taking these comments into account. The authors use the terms compartment and compartment boundary. This has a very specific use in developmental genetics: mitotic clones never cross a boundary (or compartment). I think the authors can keep referring to the equivalent of the A-P boundary, which is situated somewhere between M1-M2 based on unpublished data from the Patel Lab, and is not very well defined (Engrailed expression moves a little bit during development in this area). Domain is a looser term and can be used more liberally to describe genetically defined regions.

      - "Classical morphological work subdivides insect wings into several distinct domains along the antero-posterior (AP) axis, each of which can evolve relatively independently." Yes. This concept of domain and individuation seems important. You could make a proposed link to selector genes here.

      - "There has been little molecular evidence, however, for AP subdivision beyond a single compartment boundary described from Drosophila melanogaster." Incorrect, and this conflates "domain" and "compartment".

      Flies have wing AP domains too, that pattern their veins (see the cited Banerjee et al). 

      - "Our results confirm that insect wings can have more than one posterior developmental domain, and support models of how selector genes may facilitate evolutionarily individuation of distinct AP domains in insect wings". Yes, and I like the second part of the sentence. Still, I would recommend simply deleting "confirm that insect wings can have more than one posterior developmental domain, and" because this is neglecting previous work on AP genetic regionalization in both flies (vein literature) and butterflies (e.g. McKenna and Nijhout, Banerjee et al).

      - "Analyses of wing pattern diversity across butterflies, considering both natural variation and genetic mutants, suggest that wings can be subdivided into at least five AP domains, bounded by the M1, M3, Cu2, and 2A veins respectively, within each of which there are strong correlations in color pattern variation and wing morphology (Figure 1A)". Yes, and I would recommend emphasizing they correspond to well-defined gene expression domains as mentioned in Banerjee et al, or McKenna and Nijhout.

      - "The anterior-most of these domains, bordered by the M1 vein, appears to correspond to an AP compartment boundary originally described by cell lineage tracing in Drosophila melanogaster, and later supported in butterfly wings by expression of the Engrailed transcription factor. Interestingly, however, D. melanogaster work has yet to reveal clear evidence for additional AP domain boundaries in the wing." Confusingly, because the first sentence is about compartments while the second is about AP domains. I also think the claim that Dmel has no other known AP domains is dubious because Spalt is highly regionalized in flies.

      - "Previous authors have proposed the existence of such individuated domains, and speculated that they may be specified by selector genes.5,10 Our data provide experimental support for this model, and now motivate us to identify factors that specify other domain boundaries between the M1 and A2 veins." Yes, I completely agree with this way to emphasize the selector effect, and to link it to the concept of "individuated domain"

      We cannot thank the reviewer enough for the time and thought they devoted to giving helpful suggestions to improve our manuscript. We have applied all of the above recommendations to the revision.

      Fig. S1: the field needs to move away from Red/Green microscopy images, for accessibility reasons.

      The easiest fix here would be to change the red channels to magenta.

      Green/Magenta provides excellent contrast and accessibility in general in 2-channel images.

      We thank the reviewer for this suggestion. We have improved the color accessibility of Fig. S1.

    1. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.

      definition memex

    2. One cannot hope thus to equal the speed and flexibility with which the mind follows an associative trail,

      One cannot hope thus to equal the speed and flexibility with which the mind follows an associative trail, but it should be possible to beat the mind decisively in regard to the permanence and clarity of the items resurrected from storage.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Kv2 subfamily potassium channels contribute to delayed rectifier currents in virtually all mammalian neurons and are encoded by two distinct types of subunits: Kv2 alpha subunits that have the capacity to form homomeric channels (Kv2.1 and Kv2.2), and KvS or silent subunits (Kv5, Kv6, Kv8, Kv9) that can assemble with Kv2.1 or Kv2.2 to form heteromeric channels with novel biophysical properties. Many neurons express both types of subunits and therefore have the capacity to make both homomeric Kv2 channels and heteromeric Kv2/KvS channels. Determining the contributions of each of these channel types to native potassium currents has been very difficult because the differences in biophysical properties are modest and there are no Kv2/KvS-specific pharmacological tools. The authors set out to design a strategy to separate Kv2 and Kv2/KvS currents in native neurons based on their observation that Kv2/KvS channels have little sensitivity to the Kv2 pore blocker RY785 but are blocked by the Kv2 VSD blocker GxTx. They clearly demonstrate that Kv2/KvS currents can be differentiated from Kv2 currents in native neurons using a two-step strategy to first selectively block Kv2 with RY785, and then block both with GxTx. The manuscript is beautifully written; takes a very complex problem and strategy and breaks it down so both channel experts and the broad neuroscience community can understand it.

      Strengths:

      The compounds the authors use are highly selective and unlikely to have significant confounding cross-reactivity to other channel types. The authors provide strong evidence that all Kv2/KvS channels are resistant to RY785. This is a strength of the strategy - it can likely identify Kv2/KvS channels containing any of the 10 mammalian KvS subunits and thus be used as a general reagent on all types of neurons. The limitation then of course is that it can't differentiate the subtypes, but at this stage, the field really just needs to know how much Kv2/KvS channels contribute to native currents and this strategy provides a sound way to do so.

      Weaknesses:

      The authors are very clear about the limitations of their strategy, the most important of which is that they can't differentiate different subunit combinations of Kv2/KvS heteromers. This study is meant to be a start to understanding the roles of Kv2/KvS channels in vivo. As such, this is a minor weakness, far outweighed by the potential of the strategy to move the field through a roadblock that has existed since its inception.

      The study accomplishes exactly what it set out to do: provide a means to determine the relative contributions of homomeric Kv2 and heteromeric Kv2/KvS channels to native delayed rectifier K+ currents in neurons. It also does a fabulous job laying out the case for why this is important to do.

      Reviewer #2 (Public Review):

      Summary:

      Silent Kv subunits and the channels containing these Kv subunits (Kv2/KvS heteromers) are in the process of discovery. It is believed that these channels fine-tune the voltage-activated K+ currents that repolarize the membrane potential during action potentials, with a direct effect on cell excitability, mostly by determining action potentials firing frequency.

      Strengths:

      What makes silent Kv subunits even more important is that, by being expressed in specific tissues and cell types, different silent Kv subunits may have the ability to fine-tune the delayed rectifying voltage-activated K+ currents that are one of the currents that crucially determine cell excitability in these cells. The present manuscript introduces a pharmacological method to dissect the voltage-activated K+ currents mediated by Kv2/KvS heteromers as a means of starting to unveil their importance, together with Kv2-only channels, to the cells where they are expressed.

      Weaknesses:

      While the method is effective in quantifying these currents in any isolated cell under an electric voltage clamp, it is ineffective as a modulating maneuver to perhaps address these currents in an in vivo experimental setting. This is an important point but is not a claim made by the authors.

      We agree. We have now stated in the introduction that this study does not address the roles of Kv2/KvS currents in an in vivo setting.

      Manuscript revisions:

      While this study does not address the impact of GxTX or RY785 on action potentials or in vivo, the distinct pharmacology of Kv2/KvS heteromers presented here suggests that KvS conductances could be targeted to selectively modulate discrete subsets of cell types.  

      There are other caveats with the methods and data:

      (i) The need for a 'cocktail' of blockers to supposedly isolate Kv2 homomers and Kv2/KvS heteromers' currents from others may introduce errors in the quantification of Kv2/KvS heteromer-mediated K+ currents, owing to possible off-target effects of the blockers.

      We now point out that it is possible that off-target effects of blockers may introduce errors, include references that identify the selectivity of the blockers used in the cocktail, and specifically note that 4-aminopyridine in the cocktail is expected to block 2% of Kv2 homomers yet have a lesser impact on Kv2/KvS heteromers. Additionally, to test whether the KvS isolation strategy requires the cocktail in neurons, we performed new experiments on a different subclass of nociceptors without the blocker cocktail and identified a substantial KvS-like component (new Fig 7 Supplement 3).

      Manuscript revisions:

      “After whole-cell voltage clamp was established, non-Kv2/KvS conductances were suppressed by changing to an external solution containing a cocktail of inhibitors: 100 nM alpha-dendrotoxin (Alomone) to block Kv1 (Harvey and Robertson, 2004), 3 μM AmmTX3 (Alomone) to block Kv4 (Maffie et al., 2013; Pathak et al., 2016), 100 μM 4-aminopyridine to block Kv3 (Coetzee et al., 1999; Gutman et al., 2005), 1 μM TTX to block TTX sensitive Nav channels, and 10 μM A803467 (Tocris) to block Nav1.8 (Jarvis et al., 2007). It is possible that off target effects of blockers may introduce errors in the quantification of Kv2/KvS heteromer-mediated K<sup>+</sup> currents. For example, 4-aminopyridine is expected to block a small fraction, 2%, of Kv2 homomers and have a lesser impact on Kv2/KvS heteromers (Post et al., 1996; Thorneloe and Nelson, 2003; Stas et al., 2015) which could result in a slight overestimation of the ratio of Kv2/KvS heteromers to Kv2 homomers.”
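
      The size of the bias described in the passage above can be checked with a short calculation. The following is only an illustrative sketch, not the authors' analysis: the function name and the default block fractions are assumptions, with the 2% block of Kv2 homomers taken from the quoted text and the block of heteromers set to zero as the limiting case of a "lesser impact".

```python
def observed_ratio(true_ratio, kv2_block=0.02, kvs_block=0.0):
    """Heteromer-to-homomer conductance ratio measured in the presence of 4-AP.

    true_ratio: Kv2/KvS heteromer conductance divided by Kv2 homomer
                conductance in the absence of 4-AP (hypothetical input).
    kv2_block:  fraction of Kv2 homomers blocked by 4-AP (2% per the text).
    kvs_block:  fraction of heteromers blocked (assumed 0 here, the limiting
                case of a lesser impact on heteromers).
    """
    return true_ratio * (1.0 - kvs_block) / (1.0 - kv2_block)


# Hypothetical example: equal true conductances of the two channel types.
true_ratio = 1.0
obs = observed_ratio(true_ratio)
print(f"observed ratio: {obs:.4f}, overestimate: {obs / true_ratio - 1.0:.2%}")
```

      Under these assumptions the ratio is inflated by a factor of 1/0.98, roughly 2%, which is consistent with the "slight overestimation" described in the revision.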

      “We also tested the other major mouse C-fiber nociceptor population, peptidergic nociceptors, to determine if this subpopulation also has conductances resistant to RY785 yet sensitive to GxTX. We voltage clamped DRG neurons from a CGRP<sup>GFP</sup> mouse line that expresses GFP in peptidergic nociceptors (Gong et al., 2003). Deep sequencing has identified mRNA transcripts for Kv6.2, Kv6.3, Kv8.1 and Kv9.3 present in GFP+ neurons, an overlapping but distinct set of KvS subunits from the Mrgprd<sup>GFP</sup> non-peptidergic population (Zheng et al., 2019). In GFP+ neurons from CGRP<sup>GFP</sup> mice, we found that a fraction of outward current was inhibited by 1 µM RY785 and additional current inhibited by 100 nM GxTX (Fig 7 Supplement 3 A-C). In these experiments, 58 ± 2% (mean ± SEM) was KvS-like (Fig 7 Supplement 3 D) identifying that KvS-like conductances are present in these peptidergic nociceptors. For CGRP<sup>GFP</sup> neurons we did not include the Kv1, Kv3, Kv4, Nav and Cav channel inhibitor cocktail used for other neuron experiments, indicating that the cocktail of inhibitors is not required to identify KvS-like conductances.”
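
      The two-step subtraction behind a "KvS-like" percentage can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' analysis code: the function name, argument names, and example amplitudes are hypothetical, and the sketch assumes that RY785 removes only the Kv2-like component and that the subsequent GxTX application removes the remaining KvS-like component.

```python
def split_kv2_components(i_control, i_after_ry785, i_after_gxtx):
    """Split an outward current into Kv2-like and KvS-like parts.

    All arguments are current amplitudes in the same units, measured with
    the same voltage protocol (hypothetical inputs):
      i_control      before any Kv2 inhibitor
      i_after_ry785  after RY785 (assumed to block Kv2 homomers only)
      i_after_gxtx   after RY785 plus GxTX (assumed to block heteromers too)
    Returns (kv2_like, kvs_like, fraction_kvs_like).
    """
    kv2_like = i_control - i_after_ry785      # RY785-sensitive current
    kvs_like = i_after_ry785 - i_after_gxtx   # RY785-resistant, GxTX-sensitive
    total = kv2_like + kvs_like
    if total <= 0:
        raise ValueError("no GxTX-sensitive current to partition")
    return kv2_like, kvs_like, kvs_like / total


# Made-up amplitudes in nA, for illustration only.
kv2, kvs, frac = split_kv2_components(10.0, 6.0, 1.0)
print(f"Kv2-like {kv2:.1f} nA, KvS-like {kvs:.1f} nA, KvS-like fraction {frac:.0%}")
```

      Current remaining after both inhibitors is excluded from the total, so the fraction describes only the GxTX-sensitive conductance.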

      (ii) During the electrophysiology experiments, the authors use a holding potential that is not as negative as it is needed for the recording of the full population of the Kv2/KvS channels. Depolarized holding potentials lead to a certain level of inactivation of the channels, that vary according to the KvS involved/present in that specific population of channels. As a reminder, some KvS promote inactivation and others prevent inactivation. Therefore, the data must be interpreted as such.

      We agree. We now point out that the physiological holding potentials used are insufficiently negative to relieve inactivation from all Kv2/KvS heteromeric channels. We also note that the ratio of Kv2-like to KvS-like conductance is expected to vary with voltage protocols.

      Manuscript revisions:

      “Neurons were held at a membrane potential of –74 mV to mimic a physiological resting potential. KvS subunits can profoundly shift the voltage-inactivation relation (Salinas et al., 1997a; Kramer et al., 1998; Kerschensteiner and Stocker, 1999) and this potential is likely insufficiently negative to relieve inactivation from all Kv2/KvS heteromeric channels. Also, the activation membrane potential is close to the half-maximal point of Kv2/KvS conductances. Thus the ratio of Kv2-like to KvS-like conductance is expected to vary with voltage protocols.”

      (iii) The analysis of conductance activation by using tail currents is only accurate when dealing with non-inactivating conductances. Also, in dealing with a heterogenous population of Kv2/KvS heteromers, heterogenous K+ conductance deactivation kinetics is a must. Indeed, different KvS may significantly relate to different deactivation kinetics as well.

      We now discuss that the bi-exponential fit of tail currents is likely inadequate to capture the deactivation kinetics of all underlying components of a heterogenous population of Kv2/KvS heteromers.

      Manuscript revisions:

      “We note that the analysis of conductance activation by using tail currents is only accurate when dealing with non-inactivating conductances. We expect that inactivation of Kv2/KvS conductances during the 200 ms pre-pulse is minimal (Salinas et al., 1997a; Kramer et al., 1998; Kerschensteiner and Stocker, 1999) and did not notice inactivation during the activation pulse. Also, deactivation kinetics can vary in a heterogenous population of Kv2/KvS heteromers. While analysis of tail currents could skew the quantification of total Kv2-like and KvS-like conductances, our data supports that mouse nociceptors and human neurons have tail currents that are resistant to RY785 and sensitive to GxTX consistent with the presence of Kv2/KvS heteromers.”
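
      For readers unfamiliar with the bi-exponential description of tail currents mentioned in the response above, the functional form can be written out as a short sketch. The parameter names and example values below are assumptions for illustration; the text here does not specify the authors' fitting procedure, so none is implied.

```python
import math


def biexp_tail(t_ms, a_fast, tau_fast_ms, a_slow, tau_slow_ms, offset=0.0):
    """Bi-exponential tail current at time t_ms after the step to the tail voltage.

    a_fast and a_slow are component amplitudes, tau_fast_ms and tau_slow_ms
    their deactivation time constants, and offset is a steady-state current.
    All names are illustrative.
    """
    return (a_fast * math.exp(-t_ms / tau_fast_ms)
            + a_slow * math.exp(-t_ms / tau_slow_ms)
            + offset)


def initial_tail_amplitude(a_fast, a_slow):
    """Extrapolated amplitude at t = 0, often used as a measure of conductance."""
    return a_fast + a_slow


# Made-up parameters: a 2 ms fast component and a 20 ms slow component.
for t in (0.0, 2.0, 20.0):
    print(t, round(biexp_tail(t, 1.0, 2.0, 0.5, 20.0), 4))
```

      Only two time constants are available in this form, which is why a population of heteromers with more than two distinct deactivation rates cannot be fully captured by it.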

      (iv) Silent Kv subunits may be retained in the ER, in heterologous systems like CHO cells. This aspect may underestimate their expression in these systems. Nevertheless, the authors show similar data in CHO cells and in primary neurons.

      We agree. We now note that in heterologous systems, including CHO cells, transfection of KvS subunits can result in KvS subunits that are retained intracellularly.

      Manuscript revisions:

      “While a fraction of KvS subunits appear to be retained intracellularly, immunofluorescence for Kv5.1, Kv9.3 and Kv2.1 also appeared localized to the perimeter of transfected Kv2.1-CHO cells (Figure 1 Supplement).”

      (v) The hallmark of silent Kv subunits is their effect on the time inactivation of K+ currents. As such, data should be shown throughout, preferably, from this perspective, but it was only done so in Figure 4G.

      Indeed, effects on inactivation are a hallmark of KvS subunits. However, quantifying inactivation of Kv2/KvS channels requires steps to positive voltages for approximately 10 seconds. In neurons steps this long usually resulted in irreversible changes in leak currents/input resistance that degraded the accuracy of RY785/GxTX subtraction currents. Consequently, we did not acquire inactivation data in neurons, and we now explain in the manuscript why such data was not obtained.

      Manuscript revisions:

      “While changes in inactivation are prominent with KvS subunits, we did not investigate inactivation in neurons because the lengthy depolarizations required often resulted in irreversible leak current increases that degraded the accuracy of RY785/GxTX subtraction current quantification.”

      (vi) Functional characterization of currents only, as suggested by the authors as a bona fide of Kv2 and Kv2/KvS currents, should not be solely trusted to classify the currents and their channel mediators.

      We agree, and now state explicitly that functional characterization cannot be trusted to classify the channel mediators of conductances, and we try to be clear about this throughout the manuscript by using soft terms such as "KvS-like" when identity is uncertain.

      Manuscript revisions:

      “As functional characterization alone cannot be trusted to classify the channel mediators of conductances, we define conductances consistent with Kv2/KvS heteromers as 'KvS-like' and conductances consistent with Kv2 homomers as 'Kv2-like'.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There is not a lot to do here - this was a real pleasure to read and very easy to understand, as written. Here are a few minor things to consider:

      (1) The naming of the KvS subunits has always been confusing - it is not clear that Kv5,6,8,9 are members of the Kv2 subfamily from the names. KvS does a good job of differentiating them by assembly phenotype and has been used a lot in the literature, but it doesn't solve the misconception of what subfamily they belong to. This might not matter so much for mammals, where all KvS channels are in the Kv2 subfamily, but it makes it impossible to extend the naming system to other animals where subunits requiring heteromeric assembly are common in most subfamilies. How about trying the name Kv2S? It would have continuity with KvS in the reader's mind, make it clear that they are Kv2 subfamily, and make a naming system that could be extended beyond vertebrates. This is not a problem the authors created - just a completely optional suggestion on how to solve it if so inclined.

      We agree that naming conventions for these subunits are problematic, and agonized quite a bit about nomenclature. In the end we chose to stick with the precedent of KvS.

      (2) Another naming issue they should definitely change is the use of "subfamily" for the different KvS subtypes (Kv5, Kv6, Kv8, and Kv9). This really creates confusion with the higher-order subfamilies that have a very clear functional definition: a subfamily of Kv genes is a group of related genes that have assembly compatibility. Those are Kv1, Kv2, Kv3 and Kv4. KvS genes are assembly compatible with Kv2, evolutionarily derived from the Kv2 lineage, and thus clearly a part of the Kv2 subfamily. Using a subfamily for the next lower level of the naming hierarchy confuses this. The authors should use different terms like sub-type or class or subgroups for the divisions within KvS.

      Thank you. We have standardized to Kv2/KvS as a subfamily; Kv5, Kv6, Kv8, and Kv9 as subtypes; and individual proteins, e.g. Kv8.1, as subunits.

      (3) When you discuss whether the KvS subunit directly disrupts Ry785 binding in the pore or works allosterically and you said you know which KvS residues point into the pore from models, I thought that maybe you could tell from a sequence alignment whether the KvS channels you didn't test look the same in the conduction pathway as the ones you did test. If so, you could mention that if the binding site is the pore, they should all be resistant. Alternatively, if one you didn't test looks fundamentally more similar to the Kv2s in this region, then maybe it could be fingered as a possible exception that needs to be tested later.

      Great ideas. We now assess sequence KvS variability near the proposed RY785 binding site in all KvS subunits. We generated structural models of RY785 docking to Kv2.1 and Kv2.1/Kv8.1 and found that residues near RY785 are different in all KvS subunits.

      Manuscript revisions:

      “We analyzed computational structural models of RY785 docked to a Kv2.1 homomer and a 3:1 Kv2.1:Kv8.1 heteromer (Fig 9) to gain structural insight into how KvS subunits might interfere with RY785 binding. We used Rosetta to dock RY785 to a cryo-EM structure of a Kv2.1 homomer in an apparently open state (Fernández-Mariño et al., 2023). The top-scoring docking pose has RY785 positioned below the selectivity filter and off-axis of the pore (Fig 9 A), similar to a stable pose observed in molecular dynamic simulations (Zhang et al., 2024). In this pose, RY785 contacts a collection of Kv2.1 residues that vary in every KvS subtype (Fig 9 B,D,E). Notably, RY785 bound similarly to a 3:1 model of Kv2.1/Kv8.1, in contact with the three Kv2.1 subunits, yet avoided the Kv8.1 subunit (Fig 9C). This is consistent with RY785 binding less well to Kv2.1/Kv8.1 heteromers, and also suggests that a 3:1 Kv2:KvS channel could retain a RY785 binding site when open.”

      (4) Future suggestion or tip - not for this paper. Your data shows your isolation strategy works really well on Kv6 channels, and these are also the Kv2/KvS channels that have the most pronounced biophysical changes. Working on neurons that have a prominent Kv2/Kv6 component would really show how well the strategy outlined here works to describe the physiology of native neurons. The highest KvS expression I have seen in public data in a well-studied cell type is Kv6.4 in spinal motor neurons.

      Wonderful tip, thank you. We are indeed very interested in Kv6.4 in spinal motor neurons.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript makes a good contribution to the identification of Kv2/KvS channels in primary cells. The pharmacological method proposed by the authors to dissect the currents in an experimental setting seems proper. Although meritorious in themselves, the findings are heavily phenomenological in the opinion of this reviewer. The manuscript should be improved with some level of mechanistic data and/or the demonstration of different levels of expression in different cell types.

      Thank you for the suggestions. This manuscript now demonstrates strikingly higher levels of the KvS-like component of Kv2 currents in somatosensory (DRG nonpeptidergic and peptidergic nociceptor) versus autonomic (SCG) neuron types. The mechanistic question of what electrophysiological properties the KvS subunits are providing to the neuronal circuit is an exciting one that we are pursuing separately.

      Manuscript revisions:

      “While we found only RY785-sensitive Kv2-like conductances in SCG neurons, Kv2/KvS heteromer-like conductances were dominant in DRG neurons.”

      At present, the manuscript says that the combination of RY785 and guangxitoxin-1E can be used to define Kv2/KvS-mediated K+ currents. Importantly, this method cannot be used in a way that one can functionally determine the function of Kv2/KvS channels, since it depends on the pre-blocking of Kv2-mediated K+ currents prior. In the opinion of this reviewer, this fact decreases the attention of a potential reader.

      Indeed, our study is focused on revealing KvS heteromers by voltage clamp, and we now clarify in the introduction that we do not determine the function of Kv2/KvS channels in this study, so as not to lead the reader to expect studies of neuronal signaling.

      However, the selective pharmacology we identify suggests RY785 application could reveal the function of Kv2 homomers, and for RY785-insensitive signaling, GxTX application could reveal the function of Kv2/KvS heteromers. We now mention these possible applications in the Discussion.

      Manuscript revisions:

      “While this study does not address the impact of GxTX or RY785 on action potentials or in vivo, the distinct pharmacology of Kv2/KvS heteromers presented here suggests that KvS conductances could be targeted to selectively modulate discrete subsets of cell types.”

      Please find below suggestions for improving the manuscript:

      (1) The term "Kv2/KvS heteromers" should be used throughout instead of variations such as "Kv2/KvS channels", "Kv2/KvS" and others. Standardization of the term to refer to heteromers would make the manuscript easier to read.

      Thank you. We have standardized terms to consistently refer to Kv2/KvS heteromers.

      (2) Confusing terms like KvS conductances, KvS-like conductances, KvS-like (RY785-resistant, GxTX-sensitive) currents, and KvS channels should be avoided because they disregard the current belief that KvS cannot form functional homomeric channels. The term KvS-containing channels, and Kv2/KvS channels, seem more accurate. Uniformization in this regard will also make the manuscript more easily readable.

      Thank you. We have standardized terms to Kv2/KvS heteromers and KvS-containing channels when channel subunits are known, and use the terms KvS-like and Kv2-like for functionally identified endogenous conductances with unknown channel subunits.

      (3) Referring to KvS as a regulatory subunit is inaccurate. It is clear that KvS is part of, and it makes up the alpha pore. KvS therefore is a part of the conductive pathway and not a regulatory (suggesting accessory) subunit. KvS take part in selectivity filter (fully conserved), but they also make up an important part of the conducting pathway with non-conserved amino acid residues.

      We felt it important to include the descriptor “regulatory” to connect our nomenclature with prior use of the descriptor in the literature, and now only use the term at the start of the introduction.

      Manuscript revisions:

      “A potential source of molecular diversity for Kv2 channels are a group of Kv2-related proteins which have been referred to as regulatory, silent, or KvS subunits.”

      (4) The use of a cocktail of channel inhibitors may affect the quantification of Kv2/KvS heteromers-mediated K+ currents because they may interact with RY785 and/or GxTx or they may even interact with the sites for these two drugs on Kv2-containing channels.

      This is an interesting point worth considering, thank you. We now alert readers to this possibility in the discussion when considering the limitations of our approach.

      Manuscript revisions:

      “Also, the cocktail of inhibitors used in most neuron experiments here could potentially alter RY785 or GxTX action against KvS/Kv2 channels.”

      (5) The graphical representation of fractional blocking and other parameters (e.g., Fig 1D), is hard to read in these slim plots. In my opinion, tall bars would be more meaningfully visualized.

      Thank you for pointing out that the graphs were hard to read. We have made the graphs easier to read and added tall bars.

      (6) Vehicle control for IHC and electrophysiology. Please state what is the vehicle used in the electrophysiology experiments.

      Thank you. The composition of vehicle has now been stated in the methods.

      Manuscript revisions:

      “All RY785 solutions contained 0.1% DMSO. Vehicle control solutions also contained 0.1% DMSO but lacked RY785.”

      “Sections were incubated in vehicle solution (4% milk, 0.2% triton diluted in PB) for 1 hr at RT.”

      (7) The reference Trapani & Korn, 2003 (?) is not included in the list. This reference is important since it sets what are the Kv2.1-CHO cells. In this regard it is also important to mention, even better to address, the expressing qualities of this system in the face of a co-expression with a plasmid-based expression of silent Kv subunits. Are these two ways of expressing Kv subunits, meant to come together (or not) in heteromers, balanced? This question is critical here. Still, in regard to Kv2.1-CHO cells, it was not clear in the manuscript if the term "transfection" refers only to the plasmids used to temporarily induce the expression of silent Kv subunits and potentially Kv channels accessory subunits.

      We now include the Trapani & Korn, 2003 reference (thank you for pointing out this accidental omission), and better explain expression methods. The benefit of the inducible Kv2.1 expression is control of Kv conductance densities which can otherwise become so large as to be refractory to voltage clamp. The beauty of the expression system is that cells recently transfected with KvS subunits can be induced to express just enough Kv2.1 to get a substantial but not clamp-overwhelming RY785-resistant Kv2/KvS conductance. We also discuss that our expression methods are distinct from past studies. We stop short of comparing the expression systems, as this is beyond the scope of what we set out to study.

      Manuscript revisions: See next response

      (8) Kv2.1-CHO cells transfection procedures, induction, and validation are unclear. This validation is important here.

      We have clarified transfection procedures, induction, and validation in the methods section.

      Manuscript revisions:

      “The CHO-K1 cell line transfected with a tetracycline-inducible rat Kv2.1 construct (Kv2.1-CHO) (Trapani and Korn, 2003) was cultured as described previously (Tilley et al., 2014).”

      “Transfections were achieved with Lipofectamine 3000 (Life Technologies, L3000001). 1 μl Lipofectamine was diluted, mixed, and incubated in 25 μl of Opti-MEM (Gibco, 31985062).”

      “Concurrently, 0.5 μg of KvS or AMIGO1 or Navβ2, 0.5 μg of pEGFP, 2 μl of P3000 reagent and 25 μl of Opti-MEM were mixed. DNA and Lipofectamine 3000 mixtures were mixed and incubated at room temperature for 15 min. This transfection cocktail was added to 1 ml of culture media in a 24 well cell culture dish containing Kv2.1-CHO cells and incubated at 37 °C in 5% CO2 for 6 h before the media was replaced. Immediately after media was replaced, Kv2.1 expression was induced in Kv2.1-CHO cells with 1 μg/ml minocycline (Enzo Life Sciences, ALX380-109-M050), prepared in 70% ethanol at 2 mg/ml. Voltage clamp recordings were performed 12-24 hours later. We note that the expression method of Kv2/KvS heteromers used here is distinct from previous studies which show that the KvS:Kv2 mRNA ratio can affect the expression of functional Kv2/KvS heteromers (Salinas et al., 1997b; Pisupati et al., 2018). We validated the functional Kv2/KvS heteromer expression using voltage clamp to establish distinct channel kinetics and the presence of RY785-resistant conductance in KvS-transfected cells and using immunohistochemistry to label apparent surface localization of KvS subunits (Figure 4, Figure 1 Supplement, Figure 1 and Figure 5).”

      (9) It is important for readers to add some context to Kv2.1/Kv8.1 channels (and other Kv2/KvS heteromers) used to test the combination of RY785 and GxTx. In my opinion, this enriches the discussion.

      Good idea. We have added context about each of the KvS subunits we test.

      Manuscript revisions:

      “To test the pharmacological response of KvS we began with Kv8.1, a subunit that creates heteromers with biophysical properties distinct from Kv2 homomers (Salinas et al., 1997a), and modulates motor neuron vulnerability to cell death (Huang et al., 2024).

      Each of these KvS subunits creates Kv2/KvS heteromers that have distinct biophysical properties (Kramer et al., 1998; Kerschensteiner and Stocker, 1999; Bocksteins et al., 2012). Kv5.1/Kv2.1 heteromers play an important role in controlling the excitability of mouse urinary bladder smooth muscle (Malysz and Petkov, 2020), mutations in Kv6.4 have been shown to influence human labor pain (Lee et al., 2020b), and deficiency of Kv9.3 disrupts parvalbumin interneuron physiology in mouse prefrontal cortex (Miyamae et al., 2021).”

      (10) In general, the membrane potential used to activate Kv2 only channels and Kv2/KvS channels is too close to the activation V1/2. In case the comparing curves are displaced in their relative voltage dependence and voltage sensitivity, using that range of membrane potential may introduce a crucial error in the estimation of the conductance's relative amplitudes.

      We now note that the relative conductances of Kv2-only vs Kv2/KvS channels are expected to vary with voltage protocol, as KvS inclusion results in channels with altered voltage responses.

      Manuscript revisions:

      “…the activation membrane potential is close to the half-maximal point of Kv2/KvS conductances. Thus the ratio of Kv2-like to KvS-like conductance is expected to vary with voltage protocols.”

      (11) The use of tail currents to estimate conductance is problematic if i) lack of current inactivation is not assured, and ii) if the different currents, with possible different deactivation kinetics at the used membrane potential (e.g., mV), are not assured. Why was the activation peak used at times, and at different elapsed times the tail currents were used instead? These aspects of conductance's amplitude estimation methods should be well defined.

      In CHO cells peak currents were analyzed because outward currents seem to offer the best signal/noise. In neurons, we restricted analysis to tail currents at elapsed times to minimize complications from non-Kv2 endogenous voltage-gated channels which deactivate more quickly. We have clarified this analysis in the methods section.

      Manuscript revisions:

      “In CHO cells peak currents were analyzed because outward currents seem to offer the best signal/noise. In neurons, we restricted analysis to tail currents at elapsed times to minimize complications from non-Kv2 endogenous voltage-gated channels which deactivate more quickly. In neurons, voltage-gated currents remained in the toxin cocktail + RY785 and GxTX, that were sometimes unstable. To minimize complications from these currents, we restricted analysis of RY785 and GxTX subtraction experiments to tail currents at elapsed times to minimize complications from non-Kv2 endogenous voltage-gated channels which deactivate more quickly. We note that the analysis of conductance activation by using tail currents is only accurate when dealing with non-inactivating conductances. We expect that inactivation of Kv2/KvS conductances during the 200 ms pre-pulse is minimal (Salinas et al., 1997a; Kramer et al., 1998; Kerschensteiner and Stocker, 1999) and did not notice inactivation during the activation pulse. Also, deactivation kinetics can vary in a heterogeneous population of Kv2/KvS heteromers. While analysis of tail currents could skew the quantification of total Kv2-like and KvS-like conductances, our data supports that mouse nociceptors and human neurons have tail currents that are resistant to RY785 and sensitive to GxTX consistent with the presence of Kv2/KvS heteromers.”

      (12) Were the experiments including different conditions such as control, RY, and RY+GxTx done pairwise? This could potentially improve the statistics and strengthen the data and the conclusions drawn from them.

      The control, RY, and RY+GxTX in neurons were done pairwise and the statistical tests performed for these experiments were pairwise tests. We have clarified this in the figure legends.

      Manuscript revisions:

      “Wilcoxon rank tests were paired, except the comparison of RY785 to vehicle which was unpaired.”

      (13) The holding potential of the experiments, mostly -89 mV, may be biasing the estimation of Kv2 only channels vs. Kv2/KvS channels conductances. Figure 4I exemplifies this concern.

      We agree. Figure 4I reveals that a holding potential of -89 mV vs -129 mV reduces conductance of Kv2.1/Kv8.1 heteromers vs Kv2.1 homomers in CHO cells by ~20%. We have now alerted readers that the ratio of Kv2 only channels vs. Kv2/KvS conductances can vary with holding voltage.

      Manuscript revisions:

      “Under these conditions, 58 ± 3 % (mean ± SEM) of the delayed rectifier conductance was resistant to RY785 yet sensitive to GxTX (KvS-like) (Fig 7 F). We note that the ratio of KvS- to Kv2-like conductances is expected to vary with holding potential, as KvS subunits can change the degree and voltage-dependence of steady state inactivation (e.g. Fig 4I).”
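      The subtraction logic behind the "Kv2-like" and "KvS-like" fractions can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes that the Kv2-like component is the current removed by RY785 (control minus +RY), that the KvS-like component is the RY785-resistant current removed by GxTX (+RY minus +RY+GxTX), and that the fraction is taken over the sum of the two. The function name and the amplitudes are hypothetical.

```python
def conductance_fractions(control, after_ry, after_ry_gxtx):
    """Partition a tail-current amplitude into Kv2-like and KvS-like fractions.

    Assumed definitions (not taken from the manuscript's methods):
      Kv2-like = current blocked by RY785
      KvS-like = current resistant to RY785 but blocked by GxTX
    Any current remaining after RY785 + GxTX is excluded from the total.
    """
    kv2_like = control - after_ry
    kvs_like = after_ry - after_ry_gxtx
    total = kv2_like + kvs_like
    if total <= 0:
        raise ValueError("no RY785- or GxTX-sensitive current to partition")
    return kv2_like / total, kvs_like / total


# Hypothetical tail-current amplitudes (arbitrary units) from one cell.
kv2_frac, kvs_frac = conductance_fractions(
    control=10.0, after_ry=6.5, after_ry_gxtx=2.0
)
print(f"Kv2-like: {kv2_frac:.1%}, KvS-like: {kvs_frac:.1%}")
```

      Because both components are differences between recordings made at the same holding potential and test voltage, the resulting fractions would be expected to shift if either of those protocol parameters changes, which is the caveat raised in the response above.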

      (14) It is possible that Figure 6A (control trace) and Figure 6C ("Kv2-like" trace) are the same, by mistake, since their noise pattern looks too similar.

      Indeed, the noise patterns of Figure 6A (control trace) and Figure 6C ("Kv2-like" trace) are related, as they have inputs from the same trace: Figure 6C ("Kv2-like" trace) is a subtraction of Figure 6A (+RY trace) from Figure 6A (control trace).

      (15) For example, in Figure 7A, what is the identity of the current remaining after the RY+GxTx application? In Figure 7B, a supposed outlier in the group of data referring to "veh" in the right panel is what possibly is making this group different from +RY in the left panel (p=0.02, Wilcoxon rank test). I would recommend parametric tests only since the data is essentially quantitative.

      In Figure 7A, we do not know the identity of the current remaining after the RY+GxTX application. The kinetics of the residual current appeared distinct from the Kv2/KvS-like currents blocked by RY or GxTX, but we did not analyze these.

      The data in Figure 7B was indeed the positive outlier in the group of data referring to "veh" in the right panel and contributes to the p-value, but we saw no reason to exclude it. We have now replaced the representative trace in 7B with a non-outlier trace. We respectfully disagree with the suggestion to use parametric statistical tests, as we do not know the distribution underlying the variance of our data.

      Manuscript revisions:

      “Subsequent application of 100 nM GxTX decreased tail currents by 68 ± 5% (mean ± SEM) of their original amplitude before RY785. We do not know the identity of the outward current that remains in the cocktail of inhibitors + RY785 + GxTX.”

      (16) Please state the importance of using nonpeptidergic neurons to study silent Kv5.1 and Kv9.1 subunits. RNA data may not necessarily work to probe function or protein abundance, which is crucial in heteromeric complexes.

      We have now more thoroughly explained our rationale for choosing the nonpeptidergic neurons.

      RNA is not predictive of protein abundance, and we have not yet been successful in measuring KvS protein abundance in these neurons, so we've probed KvS abundance by assessing RY785 resistance.

      Manuscript revisions:

      “Mouse dorsal root ganglion (DRG) somatosensory neurons express Kv2 proteins (Stewart et al., 2024), have GxTX-sensitive conductances (Zheng et al., 2019), and express a variety of KvS transcripts (Bocksteins et al., 2009; Zheng et al., 2019), yet transcript abundance does not necessarily correlate with functional protein abundance. To record from a consistent subpopulation of mouse somatosensory neurons which has been shown to contain GxTX-sensitive currents and have abundant expression of KvS mRNA transcripts (Zheng et al., 2019), we used a Mrgprd<sup>GFP</sup> transgenic mouse line which expresses GFP in nonpeptidergic nociceptors (Zylka et al., 2005; Zheng et al., 2019). Deep sequencing identified that mRNA transcripts for Kv5.1, Kv6.2, Kv6.3, and Kv9.1 are present in GFP+ neurons of this mouse line (Zheng et al., 2019) and we confirmed the presence of Kv5.1 and Kv9.1 transcripts in GFP+ neurons from Mrgprd<sup>GFP</sup> mice using RNAscope (Fig 7 Supplement 1).”

      (17) In Figure 8B, were +RY data different from veh data? The figure shows no Wilcoxon (nonparametric) comparison and this is important to be stated. What conductance(s) is the vehicle solution blocking or promoting? What is RY dissolved in, DMSO? What is the DMSO final concentration?

      We now state that in Figure 8B, +RY amplitudes were not statistically different from veh data in this limited data set. However, the RY-subtraction currents always had Kv2-like biophysical properties, whereas vehicle-subtraction currents had variable properties precluding biophysical analysis for Fig 8D.

      In Figure 8B, we do not know what conductance(s) the vehicle solution is affecting. We think the changes observed are likely merely time dependent or due to the solution exchange itself. RY stock is in DMSO. All recording solutions have a final concentration of 0.1% DMSO; this is now noted in the methods.

      Manuscript revisions:

      “Unlike mouse neurons, we did not detect a significant difference in tail currents of RY785 versus vehicle controls. However, RY785-subtracted currents always had Kv2-like biophysical properties whereas vehicle-subtraction currents had variable properties that precluded the same biophysical analysis. Overall, these results show that human DRG neurons can produce endogenous voltage-gated currents with pharmacology and gating consistent with Kv2/KvS heteromeric channels.”

      “All RY785 solutions contained 0.1% DMSO. Vehicle control solutions also contained 0.1% DMSO but lacked RY785.”

      (18) METHODS. The electrophysiology approach should be unified in all aspects as applicable and possible.

      We have unified the mouse dorsal root ganglion and mouse superior cervical ganglion methods sections. We have kept CHO cells and mouse/human neurons section separate because the methods were substantially different.

      (19) DISCUSSION. The discussion section spends half of its space trying to elaborate on possible blocking/inhibiting/modulating mechanisms for RY785. The present manuscript shows no data, at least not that I have noticed, that would evoke such discussion.

      We have shortened this section and enhanced the discussion with structural models (new Fig 9) and our functional data indicating perturbed RY785 interaction with Kv2.1/8.1.

      Manuscript revisions:

      “In this pose, RY785 contacts a collection of Kv2.1 residues that vary in every KvS subtype (Fig 9 B,D,E). Notably, RY785 bound similarly to a 3:1 model of Kv2.1/Kv8.1, in contact with the three Kv2.1 subunits, yet avoided the Kv8.1 subunit (Fig 9C). This is consistent with RY785 binding less well to Kv2.1/Kv8.1 heteromers, and also suggests that a 3:1 Kv2:KvS channel could retain a RY785 binding site when open. However, the RY785 resistance of Kv2/KvS heteromers may primarily arise from perturbed interactions with the constricted central cavity of closed channels. In homomeric Kv2.1, RY785 becomes trapped in closed channels and prevents their voltage sensors from fully activating, indicating that RY785 must interact differently with closed channels (Marquis and Sack, 2022). Here we found that Kv2.1/Kv8.1 current rapidly recovers following washout of RY785, suggesting that Kv2.1/Kv8.1 heteromers do not readily trap RY785 (Figure 2 Supplement). Overall, the structural modeling suggests that KvS subunits sterically interfere with RY785 binding to the central cavity, while functional data suggest KvS subunits disrupt RY785 trapping in closed states.”

      (20) DISCUSSION. Topics like ER retention and release upon certain conditions would be a better enrichment for the manuscript in my opinion.

      ER retention of KvS subunits is indeed an important topic! However, we have opted not to delve into it here.

      (21) DISCUSSION. Speculation about the binding site for RY on Kv2/KvS channels is also not touched by the data shown in the manuscript.

      We have shortened this section of discussion, and now present this with structural models of RY785 docked to a Kv2.1 homomer and 3:1 Kv2.1:Kv8.1 heteromer (new Fig 9) to ground speculations. See manuscript changes noted in response to comment (19) above.

      (22) DISCUSSION. An important reference is missing in regard to stoichiometry: Bocksteins et al., 2017. This work is the only one using a non-optical technique to add knowledge to that question.

      Good point, and an excellent study we didn’t realize we’d not included before. We now include Bocksteins et al. 2017 as a reference in the Introduction.

      (23) In my opinion, allosterism and orthosterism are concepts not yet useful for the discussion of RY binding sites without even a general piece of data.

      We now include structural models of RY785 docked to a Kv2.1 homomer and 3:1 Kv2.1:Kv8.1 heteromer (new Fig 9) to ground blocking speculations. See manuscript changes noted in response to comment (19).

      (24) The term "homogeneously susceptible" associated with a Hill slope close to 1 needs to be more elaborated.

      Thank you, we have elaborated.

      Manuscript revisions:

      “Also, the degree of resistance to RY785 may vary if Kv2:KvS subunit stoichiometry varies. With high doses of RY785, we found that the concentration-response characteristics of Kv2.1/Kv8.1 in CHO cells revealed hallmarks of a homogeneous channel population with a Hill slope close to 1 (Fig 2B). However, other KvS subunits might assemble in multiple stoichiometries and result in pharmacologically-distinct heteromer populations.”
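      For readers less familiar with why a Hill slope near 1 points to a homogeneous population, the standard Hill form of a concentration-response curve is sketched below. The response above does not state the exact equation fitted in the manuscript, so the symbols here ($I/I_0$ for the fraction of current remaining, $IC_{50}$, $n_H$, and the population weights $f_i$) are assumptions used only for illustration.

```latex
% Single population: fraction of current remaining at inhibitor concentration [RY785]
\[
  \frac{I}{I_0} \;=\; \frac{1}{1 + \left( \dfrac{[\mathrm{RY785}]}{IC_{50}} \right)^{n_H}}
\]

% Mixture of populations i with fractional weights f_i (sum of f_i = 1),
% each inhibited with its own IC_{50,i} and a slope of 1
\[
  \frac{I}{I_0} \;=\; \sum_i \frac{f_i}{1 + \dfrac{[\mathrm{RY785}]}{IC_{50,i}}}
\]
```

      With $n_H = 1$, the first expression is what simple one-to-one binding to a single class of sites predicts. If several populations with well-separated $IC_{50,i}$ values contribute, as in the second expression, a single-Hill fit to the summed curve generally returns an apparent slope below 1, which is why a slope close to 1 is read as a hallmark of pharmacological homogeneity.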

      (25) Stating the KvS are resistant to RY785 is not proper in my opinion. This opinion relates to the fact that the RY binding site in the channels is certainly not restricted to a binding site residing only on the Kv subunit.

      Good point. We have now changed phrasing to convey that KvS subunits are a component of a heteromer that imbues RY785 resistance.

      Manuscript revisions:

      “These results show that voltage-gated outward currents in cells transfected with members from each KvS subtype have decreased sensitivity to RY785 but remain sensitive to GxTX. While we did not test every KvS subunit, the ubiquitous resistance suggests that all KvS subunits may provide resistance to 1 μM RY785 yet remain sensitive to GxTX, and that RY785 resistance is a hallmark of KvS-containing channels.”

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the role of the melanocortin system in puberty onset. They conclude that POMC neurons within the arcuate nucleus of the hypothalamus provide important but differing input to kisspeptin neurons in the arcuate or rostral hypothalamus.

      Strengths:

      Innovative and novel

      Technically sound

      Well-designed

      Thorough

      Weaknesses:

      There were no major weaknesses identified.

      Reviewer #2 (Public review):

      Summary:

      This interesting manuscript describes a study investigating the role of MC4R signalling on kisspeptin neurons. The initial question is a good one. Infertility associated with MC4 mutations in humans has typically been ascribed to the consequent obesity and impaired metabolic regulation. Whether there is a direct role for MC4 in regulating the HPG axis has not been thoroughly examined. Here, the researchers have assembled an elegant combination of targeted loss of function and gain of function in vivo experiments, specifically targeting MC4 expression in kisspeptin neurons. This excellent experimental design should provide compelling evidence for whether melanocortin signalling directly affects arcuate kisspeptin neurons to support normal reproductive function. There were definite effects on reproductive function (irregular estrous cycle, reduced magnitude of LH surge induced by exogenous estradiol). However, the magnitude of these responses and the overall effect on fertility were relatively minor. The mice lacking MC4R in kisspeptin neurons remained fertile despite these irregularities. The second part of the manuscript describes a series of electrophysiological studies evaluating the pharmacological effects of melanocortin signalling in kisspeptin cells in ex-vivo brain slices. These studies characterised interesting differential actions of melanocortins in two different populations of kisspeptin neurons. Collectively, the study provides some novel insights into how direct actions of melanocortin signalling via the MC4 receptor in kisspeptin neurons contribute to the metabolic regulation of the reproductive system. Importantly, however, it is clear that other mechanisms are also at play.

      Strengths:

      The loss of function/gain of function experiments provides a conceptually simple but hugely informative experimental design. This is the key strength of the current paper - especially the knock-in study that showed improved reproductive function even in the presence of ongoing obesity. This is a very convincing result that documents that reproductive deficits in MC4R knockout animals (and humans with deleterious MC4R gene variants) can be ascribed to impaired signalling in the hypothalamic kisspeptin neurons and not necessarily caused as a consequence of obesity. As concluded by the authors: "reproductive impairments observed in MC4R deficient mice, which replicate many of the conditions described in humans, are largely mediated by the direct action of melanocortins via MC4R on Kiss1 neurons and not to their obese phenotype." This is important, as it might change how such fertility problems are treated.

      I would like to see the validation experiments for the genetic manipulation studies given greater prominence in the manuscript because they are critical to interpretation. Presently, only single unquantified images are shown, and a much more comprehensive analysis should be provided.

      Weaknesses:

      (1) Given that mice lacking MC4R in kisspeptin neurons remained fertile despite some reproductive irregularities, this can be described as a contributing pathway, but other mechanisms must also be involved in conveying metabolic information to the reproductive system. This is now appropriately covered in the discussion.

      (2) The mechanistic studies evaluating melanocortin signalling in kisspeptin neurons were all completed in ovariectomised animals (with and without exogenous hormones) that do not experience cyclical hormone changes. Such cyclical changes are fundamental to how these neurons function in vivo and may dynamically alter how they respond to hormones and neuropeptides. Eliminating this variable makes interpretation difficult, but the authors have justified this as a reductionist approach to evaluate estradiol actions specifically. However, this does not reflect the actual complexity of reproductive function.

      For example, the authors focus on a reduced LH response to exogenous estradiol in ovariectomised mice as evidence that there might be a sub-optimal preovulatory LH surge. However, the preovulatory LH surge (in intact animals) was not measured.

      They have not assessed why some follicles ovulated, but most did not. They have focused on the possibility that the ovulation signal (LH surge) was insufficient rather than asking why some follicles responded and others did not. This suggests some issue with follicular development, likely due to changes in gonadotropin secretion during the cycle and not simply due to an insufficient LH surge.

      Reviewer #3 (Public review):

      The manuscript by Talbi R et al. generated transgenic mice to assess the reproduction function of MC4R in Kiss1 neurons in vivo and used electrophysiology to test how MC4R activation regulated Kiss1 neuronal firing in ARH and AVPV/PeN. This timely study is highly significant in neuroendocrinology research for the following reasons.

      (1) The authors' findings are significant in the field of reproductive research. Despite the known presence of MC4R signaling in Kiss1 neurons, the exact mechanisms of how MC4R signaling regulates different Kiss1 neuronal populations in the context of sex hormone fluctuations are not entirely understood. The authors reported that knocking out Mc4r from Kiss1 neurons replicates the reproductive impairment of MC4RKO mice, and Mc4r expression in Kiss1 neurons in the MC4R null background partially restored the reproductive impairment. MC4R activation excites Kiss1 ARH neurons and inhibits Kiss1 AVPV/PeN neurons (except for elevated estradiol).

      (2) Reproductive dysfunction is one of the comorbidities of obesity. MC4R loss-of-function mutations cause an obesity phenotype and impaired reproduction. However, it is hard to determine the causality. The authors carefully measured the body weight of the different mouse models (Figure 1C, Figure 2A, Figure 3B). For example, the Kiss1-MC4RKO females showed no body weight difference at puberty onset. This clearly demonstrated a direct function of MC4R signaling in reproduction that was not a consequence of excessive adiposity.

      (3) Gene expression findings in the "KNDy" system align with the reproduction phenotype.

      (4) The electrophysiology results reported in this manuscript are innovative and provide more details of MC4R activation and Kiss1 neuronal activation.

      Overall, the authors have presented sufficient background in a clear, logical, and organized structure, clearly stated the key question to be addressed, used the appropriate methodology, produced significant and innovative main findings, and made a justified conclusion.

      Comments on revisions:

      The authors have addressed my comments.

      Recommendations for the authors:

      The reviewers noted that they received comments in response to their concerns, and some improvements have been made to the manuscript. However, as described below, in some cases, a rebuttal was provided, but changes were not made to the manuscript. It is suggested that these issues be addressed to improve the quality of the manuscript.

      We thank the reviewers and editor for the assessment of the manuscript and recommendations for its improvement. We have addressed the remaining comments from reviewer #2 below, and hope that they find our revisions satisfactory.

      Reviewer #2 (Recommendations for the authors):

      The manuscript convincingly shows that MC4R in kisspeptin-producing cells can influence reproductive function. This suggests that fertility problems associated with melanocortin mutations are likely due to direct effects on the reproductive systems rather than simply being side effects of the resultant obesity.

      We are pleased that this reviewer finds the data convincing and thank them for the careful review of the manuscript, which has helped to improve its published version.

      The authors have responded to the reviewer's comments and made several improvements to the manuscript.

      The authors are correct in pointing out that the POMC-Cre animals should be fine for studies involving the administration of AAVs to adult animals. I have misinterpreted how these mice were being used, and this concern is fully addressed.

      Unfortunately, in some cases, the authors rebutted the reviewer's comments but did not change the manuscript. I suggest addressing several issues in the manuscript (after all, it is not the reviewer's opinion that counts; this process is about improving the manuscript).

      (1) Validation of the KO is insufficiently reported. From the methods, it appears that this was done thoroughly, but currently, only a single image of the arcuate nucleus is shown, and no image of the AVPV is shown. There is no quantitative information provided. The authors can keep these data as supplementary material, but they should be comprehensive and convincing, as so much depends on the degree of knockout in this model. One cannot assume complete KO based simply on the relevant genetics, as there are examples in this system where different Cre lines produce different outcomes with various floxed genes in the two major populations of kisspeptin neurons. This figure should show the quantitation of the RNAscope analysis from each of the two regions regarding the percentage of kisspeptin cells showing expression of MC4R mRNA. In addition, the lack of MC4 labelling in the arcuate nucleus, outside of kisspeptin neurons, is a concern. One would expect to see AgRP or POMC cells at this level, but are they still showing expression of MC4? A single image is insufficient to be convinced of the model's efficacy.

      We appreciate the reviewer’s concerns regarding the validation of the MC4RKO model. Below, we provide clarification and additional justification for our approach.

      (1) Quantification of MC4R in the Arcuate Nucleus (ARC): As noted by the reviewer, we were unable to detect sufficient MC4R signal in the ARC of KO mice to perform meaningful quantification. This is consistent with the expected outcome of a successful MC4R deletion. Given the low endogenous expression levels of MC4R in this region, even in control animals, and the technical limitations of RNAscope in detecting very low-abundance transcripts, especially for receptors, the absence of MC4R signal in the ARC of KO mice strongly supports effective deletion. Moreover, the MC4R loxP mouse has been published and validated by many labs, including Brad Lowell’s lab, which has done extensive work using these mice for selective deletion of Mc4r from various neuronal populations such as Sim1 and Vglut2 neurons (Shah et al., 2014, de Souza Cordeiro et al., 2020). To further strengthen our validation, we provide additional images from another animal (Fig_S1) to illustrate the consistency of the MC4R KO in the ARC. These will be included as supplementary material, as suggested. Regarding AgRP and POMC neurons, MC4R is not highly expressed in these neurons (as per previous literature, e.g., Garfield et al., Nat Neurosci. 2015; Padilla SL et al, Endocrinology 2012; Henry et al, Nature, 2015). Instead, MC4R is predominantly found in downstream neurons in the paraventricular nucleus (PVN) and other hypothalamic regions (which is intact in our KO mice as shown in our validation figure). Thus, the absence of MC4R labeling in AgRP or POMC cells in our images aligns with known expression patterns and does not contradict the validity of our model.

      (2) MC4R Expression in the AVPV and OVX Effect on Kiss1 Expression: We acknowledge the reviewer’s request for MC4R expression analysis in the anteroventral periventricular nucleus (AVPV). However, due to the timing of tissue collection after ovariectomy (OVX), Kiss1 expression in the AVPV is significantly suppressed, making it technically unfeasible to perform co-staining of MC4R with Kiss1 in this region. This is a well-documented effect of estrogen depletion following OVX (Smith et al., 2005; Lehman et al., 2010). While we acknowledge that an ideal validation would include AVPV co-labeling, the experimental constraints related to OVX preclude this analysis in our dataset.

      Given these considerations and validations, we are confident that the KO is effective and specific.

      (2) Line 88: "... however, conflicting reports exist". Expand on this sentence to describe what these conflicting reports show. The authors responded to my comment but made no changes to the introduction. As a reader, I dislike being told there are conflicting reports, but then I have to go and look up the reference to see what that actual point of conflict is.

      By "conflicting reports" we meant that other studies have shown no association between MC4R and reproductive disorders. This has now been included in the revised manuscript (Line 89).

      (3) Could the authors explain how a decrease in AgRP would be interpreted as a "decrease in hypothalamic melanocortin tone" in line 142 and line 364? These overly simplistic interpretations of qPCR data detract from the overall quality of the paper.

      The reference to a decrease in melanocortin tone referred to the decrease in the expression of melanocortin receptor signaling. This has been clarified in the revised manuscript (lines 142 and 360).

      (4) Please show the individual cycle patterns for all animals, as in Figure 2B. This can be a supplemental figure, but the current bar charts are not informative.

      We respectfully disagree that the bar charts are not informative, as they include the critical statistical analysis. We have now included all individual estrous cycle data in a new, separate supplemental figure (Sup. Figure 3). We have therefore removed the representative cycles from the main figures, as they now appear in the new supplemental figure. We have changed the order of the figures in the text accordingly.

      (5) In their rebuttal, the authors state: "Mice lack true follicular and luteal phases, and therefore, it is impossible to separate estrogen-mediated changes from progesterone-mediated changes (e.g., in a proestrous female). Therefore, we use an ovariectomized female model in which we can generate an LH surge with an E2-replacement regimen [1]. This model enables us to focus on estrogen effects, exclude progesterone effects, and minimize variability. Inclusion of cycling females would make interpretation much more difficult." I disagree, but the authors can take this position if they wish. However, they should not report the responses to exogenous estradiol in an ovariectomised mouse as a "preovulatory LH surge" (line 380). An ovariectomised mouse cannot ovulate, and the estrogen-induced LH surge is significantly different in magnitude and timing from the endogenous preovulatory LH surge (likely due to the actions of progesterone). One goal of these studies is to understand why the ovulation rate appears to be low in the MC4-KO animals. Hence, evaluating whether the preovulatory LH surge is typical is important. This has not been done. The authors have shown that the response to exogenous estradiol is sub-normal. Such an effect might lead to a reduced preovulatory LH surge, but this has not been measured.

      We appreciate this reviewer’s concern about the nature of the preovulatory LH surge. We have clarified this in the revised manuscript and described it as “an induced LH surge” throughout the text (Lines 163, 533, 6560).

      (6) I believe that the ovulation process should be considered "all or none," and I do not quite understand the rebuttal discussion. The authors describe that "numerous follicles mature at the same time....". That is not disputed. My point was that each mature follicle will receive the identical endocrine ovulatory signal (correct? Or do the authors believe something different?). If it were sufficient for one follicle to ovulate, then all of those mature follicles (the number of which will be variable between animals and between cycles) would be expected to undergo ovulation. The fact that they do not raises several possibilities. One that the authors favor is that an insufficient ovulatory signal might approach a threshold where some follicles ovulate and others do not. This possibility is supported by the apparent increase in cystic follicles, which might be preovulatory follicles that did not complete the ovulation process. Such variation might be stochastic, within normal variation for sensitivity to LH. However, it is also possible that the follicles have not matured at the same rate, perhaps influenced by abnormal secretion of LH or FSH during earlier phases of the cycle, and hence are not in the appropriate condition to respond to the ovulation signal when it arrives. Some may even have matured prematurely due to the elevated gonadotropins reported in this study. Given the data and the partial fertility, the most likely explanation is that the genetic manipulation has resulted in fewer follicles being available for ovulation due to changes in follicular development rather than a deficit of the ovulation signal, although the latter mechanism might also contribute. A third possibility is that genetic manipulation has directly affected the ovary. The authors did not answer whether Kiss1 and MC4 are co-expressed in the ovary. I think the authors might want to rule this out by showing no change in MC4R expression in the ovary.

      We thank the reviewer for this thoughtful comment and agree that these are possible outcomes. We have now acknowledged them in the Discussion.

      To answer the reviewer’s question, we have not investigated the co-expression of Kiss1 and Mc4r in the ovary. While MC4R has indeed been documented in the ovary (Chen et al. Reproduction, 2017), the changes in gonadotropin release and supporting in vitro data included in this manuscript clearly document a central effect. However, an additional effect at the level of the ovary cannot be completely ruled out. This has now been added to the Discussion (Lines 378-387).

      (7) Lines 390, 454 " impaired LH pulse" What was the evidence for impaired LH pulse (see figure 2D)?

      Thank you for pointing this out. This comment referred to augmented LH release. This has been corrected in the revised manuscript (Line 394).

      The paper's strengths remain, as outlined in my original review. The authors have addressed what I perceived to be weaknesses, predominantly by changing the tone of discussion and interpretation of the data. This is appropriate. I consider the focus on the LH surge as the primary mechanism too narrow, and the authors should be considering how other changes during the cycle might influence ovarian function.

      We sincerely appreciate the reviewer’s thoughtful evaluation of our manuscript and their constructive feedback. We are pleased that our revisions have addressed the perceived weaknesses and that the adjustments to the discussion and interpretation were deemed appropriate.

      We acknowledge the reviewer’s perspective on broadening the discussion beyond the LH surge to consider additional cycle-dependent influences on ovarian function. While our current study focuses on this specific mechanism, we recognize that ovarian function is influenced by multiple physiological changes throughout the cycle. We have refined our discussion to reflect this broader context and appreciate the suggestion to consider these additional factors in future studies.

      We have addressed all of the reviewer’s comments to the best of our ability and hope they find the revised manuscript satisfactory.

    1. we carefully considered and addressed the question of reliance, and whatever one may think about the extent of the legitimate reliance in that case, it is not in the same league as that present here. Abood had held that a public sector employer may require non-union members to pay a portion of the dues collected from union members.
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: In this paper, the authors perform a screen by feeding C. elegans different E. coli genetic mutants and examining the effect on the expression of fat-7, a stearoyl-CoA 9-desaturase, which has been associated with longevity. They identify 26 E. coli strains that decrease fat-7 expression, all of which slow development and increase lifespan. RNA sequencing of worms treated with 4 of these strains identified genes involved in defense against oxidative stress among those genes that are commonly upregulated. Feeding C. elegans these 4 bacterial strains results in increased ROS and activation of the mitochondrial unfolded protein response, which appears to contribute to lifespan extension as these bacterial strains do not increase lifespan when the mitochondrial unfolded protein response transcription factor ATFS-1 is disrupted. Finally, the authors demonstrate a role for iron levels in mediating these phenotypes: iron supplementation inhibits the phenotypes caused by the identified bacterial strains, while iron chelation mimics these phenotypes.

      Response: We thank the reviewer for an excellent summary of our work.

      Major comments: The proposed model involves an increase in ROS levels activating the UPRmt and then leading to lifespan extension. If the elevation in ROS levels is contributing, then treatment with antioxidants should prevent UPRmt activation and lifespan extension.

      Response: This is an excellent point. We will treat the FAT-7-suppressing diets with antioxidants and observe the effect on C. elegans UPRmt activation and lifespan.

      The authors suggest that iron depletion may disrupt iron-sulfur cluster proteins. The Rieske iron-sulfur protein ISP-1 from mitochondrial electron transport chain complex III has previously been associated with lifespan. Point mutations affecting the function of ISP-1 or RNAi decreasing the levels of ISP-1 both result in increased lifespan (PMID 20346072, 11709184). Thus, iron depletion may be increasing ROS, activating UPRmt and increasing lifespan through decreasing ISP-1 levels.

      Response: The reviewer has raised an intriguing possibility that the increased lifespan on the FAT-7-suppressing diets could be because of perturbation of ISP-1 function. While ISP-1 levels may not be directly affected by the mutant diets, ISP-1 function might be perturbed on these diets as ISP-1 function requires iron-sulfur clusters. Therefore, we will study the lifespan of isp-1(qm150) mutant on the FAT-7-suppressing diets to explore whether the lifespan extension on these diets is ISP-1 dependent.

      All of the Kaplan-Meier survival plots are missing statistical analyses. Please add p-values.

      Response: The p-values for all the survival plots are included in the respective figure legends.

      It would be helpful to include a model diagram of the proposed mechanisms in the main figures.

      Response: We will make a model diagram after completing the experiments suggested by the reviewers.

      Minor comments: Rather than "mutant diets" it would be more informative to call these "FAT-7-decreasing diets"

      Response: We have changed “mutant diets” to “FAT-7-suppressing diets” throughout the manuscript.

      Is it surprising that none of the bacterial strains increased FAT-7 levels? Why do you think this is?

      Response: Yes, it was indeed surprising to find only bacterial strains that reduced FAT-7 levels and none that increased them. One possible explanation is that these bacterial mutants may not directly regulate fat-7 expression. Instead, they might alter the overall dietary composition, which is known to influence fat-7 levels. It appears that none of the tested mutants modified the diet in a manner that would lead to fat-7 upregulation.

      Page 5. "We hypothesized that diets reducing FAT-7 might elevate oleic acid levels". Since FAT-7 converts stearic acid to oleic acid, wouldn't decreasing FAT-7 levels decrease oleic acid levels and increase stearic acid levels?

      Response: FAT-7 expression is regulated by a feedback mechanism and is sensitive to the fatty acid composition within host cells; elevated levels of unsaturated fatty acids, such as oleic acid, suppress FAT-7 expression. There are two possible ways bacterial mutants could lead to reduced FAT-7 levels: (1) by directly inhibiting FAT-7 expression, which would be expected to result in increased stearic acid levels; or (2) by supplying higher amounts of oleic acid through their composition, thereby suppressing FAT-7 expression via feedback regulation. We focused on the second possibility, as elevated oleic acid levels—like those seen with FAT-7-suppressing diets—are known to promote C. elegans lifespan. To avoid confusion, we have revised the statement to: “We hypothesized that bacterial diets might reduce FAT-7 expression because they have elevated levels of oleic acid”.

      Page 6. The authors cite Bennett et al. 2014 for the statement that "Activation of the UPRmt has been associated with lifespan extension". This paper reaches the opposite conclusion "Activation of the mitochondrial unfolded protein response does not predict longevity in Caenorhabditis elegans". Also, in the Bennett paper and PMID 34585931, it is shown that constitutive activation of ATFS-1 decreases lifespan. Thus, the relationship between the UPRmt and lifespan is not straightforward. These points should be mentioned.

      Response: The reviewer has raised an important point. We have now included a paragraph in the discussion to highlight these points. The revised manuscript reads: “All 26 FAT-7-suppressing diets identified in our study elevated hsp-6p::GFP expression and extended C. elegans lifespan. Although UPRmt activation and lifespan extension were consistently observed across these diets, there was no strong correlation between hsp-6p::GFP levels and the degree of lifespan extension. The role of the UPRmt in promoting longevity remains controversial (Bennett et al., 2014; Soo et al., 2021; Wu et al., 2018). For instance, gain-of-function mutations in atfs-1 have been shown to reduce lifespan (Bennett et al., 2014; Soo et al., 2021). However, a recent study demonstrated that mild UPRmt activation can extend lifespan, whereas strong activation has the opposite effect (Di Pede et al., 2025). These findings suggest that UPRmt contributes to longevity only under specific conditions and at specific activation levels. In our study, lifespan extension on FAT-7-suppressing diets was dependent on ATFS-1, indicating that UPRmt activation was necessary for this effect.”

      Page 6. "Our transcriptomic analysis suggested elevated ROS". Rather than refer to gene expression, it would be better to refer to the ROS measurements that were performed.

      Response: We have changed it to the following sentence: “Our ROS measurement analysis suggested elevated ROS levels in worms fed FAT-7-suppressing diets.”

      The long-lived mitochondrial mutants isp-1 and nuo-6 have increased ROS, UPRmt activation and increased lifespan. Multiple studies have examined gene expression in these long-lived mutant strains. How does gene expression in these mutants compare to worms treated with the FAT-7-decreasing E. coli mutants? While not necessary for this publication, it would be interesting to see whether the FAT-7-decreasing E. coli strains can increase isp-1 and nuo-6 lifespan.

      Response: We will compare the gene expression changes observed in isp-1 and nuo-6 mutants with those observed in worms exposed to FAT-7-suppressing diets. Additionally, we will examine the lifespan of isp-1 mutants on the FAT-7-suppressing diets. These data will be included in the revised manuscript.

      SEK-1 is also involved in the p38-mediated innate immune signaling pathway, which has been shown to contribute to longevity in C. elegans. In fact, disruption of sek-1 using RNAi decreased the lifespan of several long-lived mutant strains PMID 36514863.

      Response: We thank the reviewer for highlighting this point. We have now added that the role of SEK-1 in regulating lifespan on FAT-7-suppressing diets could also be because of its role in innate immunity. The revised manuscript reads: “Notably, SEK-1 also regulates innate immunity and is essential for the extended lifespan observed in several long-lived C. elegans mutants (Soo et al., 2023). Therefore, its effect on lifespan in response to FAT-7-suppressing diets may also stem from its role in innate immune regulation.”

      Figure 2. Why were cyoA and ycbk chosen to show the full Kaplan-Meier survival plot?

      Response: These were selected randomly to show the range of the lifespan phenotype observed.

      Figure 2, panel D. A better title may be "Mean Survival (Percent increase from control)"

      Response: We have made this change.

      While not necessary for this paper, it would be interesting to determine whether the FAT-7-decreasing E. coli strains alter resistance to oxidative stress.

      Response: We will study the survival of worms on these diets upon supplementation with paraquat.

      Figure 4. It may be interesting to include a correlation plot comparing hsp-6::GFP fluorescence and lifespan. It looks like the magnitudes of increase for each phenotype are not correlated.

      Response: We have added a new Figure (Figure S4) to show the correlation between hsp-6::GFP fluorescence levels and percent change in mean lifespan. Indeed, there is no correlation between these phenotypes.

      Reviewer #1 (Significance (Required)):

      Overall, this is an interesting paper and the experiments are rigorously performed. The bacterial screen was comprehensive and was followed up by careful mechanistic experiments. This paper will be of interest to researchers studying the biology of aging. A diagram of the working model of the underlying mechanisms would enhance the paper.

      Response: We thank the reviewer for highlighting the significance of the study. We will include a model in the revised manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript, Das et al. investigate how different bacterial mutants affect the lifespan of C. elegans. The authors screened a library of E. coli mutants using a fat-7 reporter and identified 26 strains that reduce fat-7 expression, cause developmental delay, induce the mitochondrial unfolded protein response (using hsp-6 reporter), and increase worm lifespan. Among these, they focused on four strains and demonstrated that the effects of these mutants on developmental delay, fat-7 expression, and hsp-6 induction could be suppressed by iron supplementation. Furthermore, they showed that iron depletion alone is sufficient to induce fat-7 expression in worms. The lifespan extension observed in worms fed these mutant bacterial strains depends on SKN-1, SEK-1, and HLH-30. Overall, this is a well-written manuscript that highlights the role of iron in regulating fat-7 expression. However, the findings from the initial screen do not significantly expand upon what is already known in the literature. Many of the identified hits overlap with those reported by Zhang et al. (2019), which also highlighted the role of iron in developmental delay and hsp-6 induction. While the lifespan data and the role of fat-7 are novel aspects of this study, the authors have not conducted detailed mechanistic investigations to address key questions, such as: 1) How does the deletion of these bacterial genes alter the metabolic state of the diet? 2) How do these metabolic changes influence fat-7 expression in worms? 3) How does the downregulation of fat-7 contribute to longevity? Addressing these points would strengthen the mechanistic insights of the study.

      Response: We thank the reviewer for a thoughtful summary of our work and for the valuable feedback provided to improve the manuscript. We would like to emphasize that the screening conditions and objectives of our study were fundamentally different from those of Zhang et al. (2019). Furthermore, Zhang et al. (2019) did not investigate the effects of the bacterial mutants identified in their screens on C. elegans lifespan. Notably, the 26 bacterial mutants identified in our screen do not overlap with those reported in previous studies that examined bacterial strains promoting C. elegans longevity. As detailed below, we will address the points raised by the reviewer that will certainly strengthen the mechanistic insights of the study.

      Here are my detailed comments: 1. Suppressing FAT-7 levels in C. elegans does not inherently increase lifespan. To directly attribute this effect to FAT-7, it would be important to attempt a rescue experiment to restore FAT-7 expression and assess whether the lifespan extension persists. Additionally, measuring oleic acid levels in these mutants would help determine whether a high-oleic-acid diet is suppressing FAT-7 expression. The role of oleic acid cannot be ruled out using fat-2 mutants (Fig. 3B), as fat-2 mutants accumulate oleic acid when fed WT bacteria, but this may not translate to endogenous oleic acid accumulation in conditions where FAT-7 is suppressed.

      Response: We thank the reviewer for these useful suggestions. We will overexpress FAT-7 under a pan-tissue promoter (eft-3) and study lifespan on FAT-7-suppressing diets. Moreover, to explore whether oleic acid has any role in enhancing lifespan on FAT-7-suppressing diets, we will study the lifespan of worms on these diets upon supplementation with oleic acid, alongside a wild-type bacterial control.

      To understand the host-microbe interaction in this study, it is important to determine what specific changes in the bacteria contribute to the observed phenotypes in worms. Identifying these bacterial factors will provide a clearer picture of their role in influencing worm stress signaling and lifespan.

      Response: The phenotypes observed in C. elegans across all the identified bacterial mutants are remarkably consistent, including increased UPRmt activation, reduced FAT-7 levels, delayed development, and extended lifespan. This consistency suggests that a common underlying factor is driving these effects. Although the bacterial mutants appear genetically diverse, gene expression data from C. elegans, along with comparisons to the findings of Zhang et al. (2019), indicate that elevated levels of reactive oxygen species (ROS) may represent this shared factor. These results suggest that bacterial ROS play a central role in mediating the host-microbe interactions underlying the observed phenotypes. To further support this hypothesis, we will directly measure ROS levels in the identified bacterial mutants. Additionally, we will test whether antioxidant treatment can suppress the C. elegans phenotypes, thereby establishing a causal role for bacterial ROS.

      It is important to rule out any changes in food consumption in worms fed these bacterial mutants, as differences in feeding amount could contribute to the observed lifespan effects.

      Response: We will carry out pharyngeal pumping rate measurements to study whether there is any difference in food consumption in worms fed these bacterial mutants.

      In Figures 5A to 5G, please include the same-day controls to help clarify how iron supplementation affects these phenotypes relative to the control. For example, in Fig. 5F, it appears that iron extends the lifespan of worms fed the control diet. It would be clearer if appropriate controls were included in all of these figures or summarized in a table to help understand the impact of iron.

      Response: We will include these controls in the revised manuscript.

      How does iron depletion affect the levels of fat-7, and how does this contribute to the activation of the longevity pathways discussed in the manuscript?

      Response: This is an intriguing question. There are at least two possible explanations: (1) oxidative stress may directly downregulate fat-7 expression, and (2) iron depletion could reduce ferroptosis, which in turn may influence fatty acid metabolism. In the revised manuscript, we will include data on how oxidative stress affects FAT-7 expression.

      Minor comments 1. Please include a detailed table of the lifespan data for all replicates as a supplementary table.

      Response: We have included the details of survival curves for all the data in the new Table S2.

      In the Methods section, specify at what stage the worms were exposed to iron and the iron chelator for the lifespan experiments.

      Response: The L1-synchronized worms were exposed to iron and iron chelator plates and allowed to develop till the late L4 stage before being transferred to lifespan assay plates that also contained the respective supplements. This information is now included in the Methods section.

      Please clarify whether equal optical density (O.D.) of cells was seeded for both the WT and mutant strains, and mention if the mutants exhibit any growth defects.

      Response: We have examined the growth of the bacterial mutants and found that they do not exhibit growth defects. Therefore, for all the assays, NGM plates were seeded with saturated cultures of all the bacterial strains. We have now included the growth curve data in the manuscript (Figure S4).

      Reviewer #2 (Significance (Required)):

      Significance General Assessment: This study by Das et al. explores the impact of bacterial mutants on C. elegans lifespan. A key strength of the study is the identification of bacterial mutants that influence the expression of the gene encoding fatty acid desaturase (fat-7) and lifespan in C. elegans. Furthermore, the study highlights the role of iron in regulating fat-7 expression, suggesting that iron imbalance may play a crucial role in modulating fatty acid metabolism. However, the study's main limitation is that it does not significantly extend the current understanding of the microbial modulation of host metabolism and aging, as many of the identified bacterial hits overlap with those previously reported in Zhang et al. (2019). The manuscript would benefit from more in-depth mechanistic exploration, especially with regard to how specific bacterial factors influence the metabolic state of the worms and how these changes ultimately modulate fat-7 expression and longevity.

      Response: We thank the reviewer for highlighting the significance of our study. Once again, we would like to emphasize that the screening conditions and objectives of our study differed fundamentally from those of Zhang et al. (2019). Furthermore, Zhang et al. did not investigate the impact of the bacterial mutants identified in their screen on C. elegans lifespan. As outlined above, we will address the reviewer’s comments, which will undoubtedly strengthen the mechanistic insights of our study.

      Advance: This study presents a conceptual advance by exploring the iron-dependent regulation of fat-7 expression and lifespan in C. elegans, linking bacterial mutations with key longevity pathways (SKN-1, SEK-1, and HLH-30). The novelty lies in the direct investigation of the bacterial-induced changes in fat-7 expression, though the role of iron in these mutants for development and induction of mito-UPR was previously shown in the literature. This study also adds to the growing body of work on C. elegans as a model for studying aging and host-microbe interactions, particularly in understanding how diet and microbial exposure affect metabolic processes and lifespan.

      Response: We thank the reviewer for highlighting the advancement made by our study.

      Audience: This research will primarily interest specialized audiences in aging research, microbiology, and metabolism, especially those focused on host-microbe interactions. Keywords of my expertise: Host-microbe interactions, metabolism, system biology, C. elegans, aging.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      In this paper, the authors perform a screen by feeding C. elegans different E. coli genetic mutants and examining the effect on the expression of fat-7, a stearoyl-CoA 9-desaturase, which has been associated with longevity. They identify 26 E. coli strains that decrease fat-7 expression, all of which slow development and increase lifespan. RNA sequencing of worms treated with 4 of these strains identified genes involved in defense against oxidative stress among those genes that are commonly upregulated. Feeding C. elegans these 4 bacterial strains results in increased ROS and activation of the mitochondrial unfolded protein response, which appears to contribute to lifespan extension as these bacterial strains do not increase lifespan when the mitochondrial unfolded protein response transcription factor ATFS-1 is disrupted. Finally, the authors demonstrate a role for iron levels in mediating these phenotypes: iron supplementation inhibits the phenotypes caused by the identified bacterial strains, while iron chelation mimics these phenotypes.

      Major comments:

      The proposed model involves an increase in ROS levels activating the UPRmt and then leading to lifespan extension. If the elevation in ROS levels is contributing, then treatment with antioxidants should prevent UPRmt activation and lifespan extension.

      The authors suggest that iron depletion may disrupt iron-sulfur cluster proteins. The Rieske iron-sulfur protein ISP-1 from mitochondrial electron transport chain complex III has previously been associated with lifespan. Point mutations affecting the function of ISP-1 or RNAi decreasing the levels of ISP-1 both result in increased lifespan (PMID 20346072, 11709184). Thus, iron depletion may be increasing ROS, activating UPRmt and increasing lifespan through decreasing ISP-1 levels.

      All of the Kaplan-Meier survival plots are missing statistical analyses. Please add p-values.

      It would be helpful to include a model diagram of the proposed mechanisms in the main figures.

      Minor comments:

      Rather than "mutant diets" it would be more informative to call these "FAT-7-decreasing diets"

      Is it surprising that none of the bacterial strains increased FAT-7 levels? Why do you think this is?

      Page 5. "We hypothesized that diets reducing FAT-7 might elevate oleic acid levels". Since FAT-7 converts stearic acid to oleic acid, wouldn't decreasing FAT-7 levels decrease oleic acid levels and increase stearic acid levels?

      Page 6. The authors cite Bennett et al. 2014 for the statement that "Activation of the UPRmt has been associated with lifespan extension". This paper reaches the opposite conclusion "Activation of the mitochondrial unfolded protein response does not predict longevity in Caenorhabditis elegans". Also, in the Bennett paper and PMID 34585931, it is shown that constitutive activation of ATFS-1 decreases lifespan. Thus, the relationship between the UPRmt and lifespan is not straightforward. These points should be mentioned.

      Page 6. "Our transcriptomic analysis suggested elevated ROS". Rather than refer to gene expression, it would be better to refer to the ROS measurements that were performed.

      The long-lived mitochondrial mutants isp-1 and nuo-6 have increased ROS, UPRmt activation and increased lifespan. Multiple studies have examined gene expression in these long-lived mutant strains. How does gene expression in these mutants compare to worms treated with the FAT-7-decreasing E. coli mutants? While not necessary for this publication, it would be interesting to see whether the FAT-7-decreasing E. coli strains can increase isp-1 and nuo-6 lifespan.

      SEK-1 is also involved in the p38-mediated innate immune signaling pathway, which has been shown to contribute to longevity in C. elegans. In fact, disruption of sek-1 using RNAi decreased the lifespan of several long-lived mutant strains PMID 36514863.

      Figure 2. Why were cyoA and ycbk chosen to show the full Kaplan-Meier survival plot?

      Figure 2, panel D. A better title may be "Mean Survival (Percent increase from control)"

      While not necessary for this paper, it would be interesting to determine whether the FAT-7-decreasing E. coli strains alter resistance to oxidative stress.

      Figure 4. It may be interesting to include a correlation plot comparing hsp-6::GFP fluorescence and lifespan. It looks like the magnitudes of increase for each phenotype are not correlated.

      Significance

      Overall, this is an interesting paper and the experiments are rigorously performed. The bacterial screen was comprehensive and was followed up by careful mechanistic experiments. This paper will be of interest to researchers studying the biology of aging. A diagram of the working model of the underlying mechanisms would enhance the paper.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Chitin is a critical component of the extracellular matrix of arthropods and plays an essential role in the development and protection of insects. There are two chitin synthases in insects: Type A (exoskeletons) and Type B (for the peritrophic matrix in the gut). The study aims to investigate the specificity and mechanisms of the two chitin synthases in D. melanogaster and to clarify whether they are functionally interchangeable. Various genetic manipulations and fluorescence-based labeling were used to analyze the expression, localization, and function of Kkv and Chs2 in different tissues. Chs2 is expressed in the PR cells of the proventriculus and is required for chitin deposition in the peritrophic matrix. Kkv can deposit chitin in ectodermal tissues but not in the peritrophic matrix, whereas Chs2 can deposit chitin in the peritrophic matrix but not in ectodermal tissues. The subcellular localization of chitin synthases is specific to the tissues in which they are expressed. Kkv localizes apically in ectodermal tissues, whereas Chs2 localizes apically in the PR cells of the proventriculus. Altogether, Kkv and Chs2 cannot replace each other. The specificity of chitin synthases in D. melanogaster relies on distinct cellular and molecular mechanisms, including intracellular transport pathways and the specific molecular machinery for chitin deposition.

      Congratulations on this incredible story and manuscript, which is straightforward and well-written. However, I have some comments that may help to improve it.

      We thank the reviewer for this very positive comment. We have addressed all comments to clarify and improve our manuscript.

      Major comments: 1.) Funny thing: the Chs2 mutant larva shows a magenta staining below the chitin accumulation of the esophagus, which looks like a question mark in 1H but cannot be found in control. Is that trachea reaching the pv?

      We assume that the reviewer refers to Fig 1N. As the reviewer suspects, this corresponds to a piece of trachea. Figure 1N shows a single section, making it difficult to identify what this staining corresponds to. We are providing below a projection of several sections where it is easier to identify the staining as tracheal tissue (arrow).

      We now label this pattern as trachea (tr) in Figure 1N of the manuscript.

      2.) Also, though it is evident that the PM chitin is lost in Chs2 mutants, could it be that the region is disturbed and cells express chitin somewhere else? There are papers by Fuß and Hoch (e.g., Mech of Dev, 79, 1998; Josten, Fuß et al., Dev. Biol. 267, 2004) using markers such as Dve, Fkh, Wg, Delta, and Notch, etc. for precisely marking the endodermal/ectodermal region in the embryonic foregut/proventriculus. It would be beneficial to show, along with chitin and Chs expression patterns, the ectoderm/endoderm cells. This is particularly important as the authors report endodermal expression of Chs2 in embryos but don't use co-markers of the endodermal cells.

      We agree with the reviewer that this is an important issue and we note that Reviewer 2 also raised the same point. Therefore, we have addressed this issue.

      We obtained an antibody against Dve, kindly provided by Dr. Hideki Nakagoshi. Dve marks the endodermal region in the proventriculus (Fuss and Hoch, 1998, Fuss et al., 2004, Nakagoshi et al., 1998). This antibody worked nicely in our dissected L3 digestive tracts and allowed us to mark the endodermal region. We also obtained an antibody against Fkh, kindly provided by Dr. Pilar Carrera. Fkh marks the ectodermal foregut cells (Fuss and Hoch, 1998, Fuss et al., 2004). While, in our hands, this antibody performed well in embryonic tissues, we observed no staining in our dissected L3 digestive tracts. The reason for this is unclear, but we suspect technical limitations may be responsible (the ectodermal region of the proventriculus is very internal, potentially hindering antibody penetration). To circumvent this problem, we tested a FkhGFP tagged allele available from the Bloomington Stock Center. Fortunately, we were able to detect GFP in ectodermal cells of L3 larvae carrying this allele. Using this approach, we conducted experiments to detect Fkh and Dve in the wild type or in Df(Chs2) conditions (Fig S1). In addition, we used these markers to map the expression of Kkv and Chs2 in the proventriculus (Fig 4).

      Altogether, the results using these endodermal/ectodermal markers confirmed the presence of a cuticle adjacent to the FkhGFP-positive cells and a PM adjacent to the PR cells, marked by Dve. This PM is absent in Df(Chs2) L3 escapers; however, the general pattern of Fkh/Dve expression is not affected. Finally, we show that Chs2-expressing cells are positive for Dve while Kkv-expressing cells are not. We were unable to conduct an experiment demonstrating Kkv and Fkh co-expression due to technical incompatibilities, as both genes require the use of GFP-tagged alleles to visualise their expression. However, we believe that our imaging of Dve/Kkv clearly shows that Kkv-expressing cells lack Dve expression and are localised in the internal (ectodermal) region of the proventriculus (Fig 4E).

      3.) The origin of midgut chitin accumulation is unclear. Chitin can come from yeast paste. Can the authors check kkv and chs2 mutants for food passage and test starving L1 larvae to detect chitin accumulation in the midgut without feeding them?

      This is a very interesting point that has also intrigued us.

      We observed that, in addition to the PM layer lining the midgut epithelium, CBP staining also revealed a distinct luminal pattern. Our initial hypothesis was that this pattern corresponded to the PM. However, its presence in Df(Chs2) larval escapers clearly indicates that this is not the case. Unfortunately, we cannot assess this pattern in kkv mutants, as these die at eclosion and do not proceed to larval stages.

      As the reviewer suggests, a likely possibility is that the luminal pattern originates from components in the food. These could correspond to yeast, as suggested by the reviewer, or possibly remnants of dead larvae present in the media (although Drosophila is considered herbivorous in the absence of nutritional stress).

      To assess whether the luminal pattern originates from the food we conducted two independent experiments. In experiment 1, we collected larvae reared under normal food conditions. Newly emerged L3 larvae were transferred in small numbers, to minimise cannibalism (Ahmad et al., 2015), to new Petri plates containing moist paper. Larvae were starved for 3, 4 or 5 days. Larvae starved for more than 5 days did not survive. We then dissected the guts and analysed CBP staining. We observed the presence of luminal CBP staining in these larvae, along with the typical PM signal in the proventriculus and along the midgut. In experiment 2, we collected larvae directly on plates containing only agar (without yeast or any other nutrients). We allowed the larvae to develop. These larvae showed minimal growth. We dissected the guts of these small larvae (which were challenging to dissect) and analysed CBP staining. Again, we detected the presence of luminal CBP staining.

      These experiments indicate that, despite starvation, a luminal chitin pattern is still detected, suggesting that it is unlikely to originate from food. However, we cannot unequivocally rule out the possibility that the cannibalistic, detritivorous or carnivorous behavior of the nutritionally stressed larvae (Ahmad et al., 2015) in our experiments may influence the results. Therefore, more experiments would be required to address this point.

      In summary, while we cannot provide a definitive answer to the reviewer's question, nor fully satisfy our own curiosity, we would like to note that this specific observation is unrelated to the main focus of our study, as we have confirmed that the luminal pattern is not dependent on Chs2 function.

      Portions of midgut of starved larvae under the regimes indicated, stained for chitin (CBP, magenta). Note the presence of the luminal chitin pattern in the midgut

      4.) Subcellular localization assays require improved analysis, such as a co-marker for the apical membrane and statistical analysis with co-localization tools, showing the overlap at the membrane and intracellularly with membrane co-markers and KDEL.

      We have addressed the point raised by the reviewer. To analyse and quantify Chs2 subcellular localisation, particularly considering the observed pattern, we decided to use both a membrane and an ER marker. As a membrane marker we used srcGFP expressed in tracheal cells (see answer to point 7 of Reviewer 1) and as an ER marker we used KDEL. In this analysis, tracheal cells also expressed Chs2, which was visualised using the Chs2 antibody generated in the lab.

      To assess the colocalisation of Chs2 with each marker we used the JACoP plugin in Fiji. We analysed individual cells from different embryos stained for membrane/ER/Chs2 using single confocal sections (to avoid artificial colocalisation). Images were processed as described in Materials and Methods. We obtained the Pearson's correlation coefficient (r), which measures the degree of colocalisation, for Chs2/srcGFP and Chs2/KDEL (n=36 cells from 9 different embryos). The average r value for Chs2/srcGFP was 0.064, while the average for Chs2/KDEL was around 0.7. r ranges between -1 and 1, where 1 indicates perfect correlation, 0 no correlation, and -1 perfect anti-correlation. Typically, an r value of 0.7 and above is considered a strong positive correlation, whereas a value below 0.1 is regarded as very weak or no correlation. Thus, our colocalisation analysis supports the hypothesis that Chs2 is primarily retained in the ER when expressed in non-endogenous tissues, likely unable to reach the membrane.
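
      For readers unfamiliar with the metric, the following is a minimal sketch of how a Pearson's correlation coefficient between two image channels can be computed from flattened pixel intensities. It is not the JACoP implementation; the function name `pearson_r` and the example intensity values are illustrative assumptions, not data from this study.

```python
import math


def pearson_r(channel_a, channel_b):
    """Pearson's r between two equally sized lists of pixel intensities."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    # Sum of products of deviations (numerator of r).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    # Sums of squared deviations for each channel (denominator of r).
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical pixel intensities from one cell (placeholder values only).
chs2 = [10, 40, 80, 120, 200, 30]
kdel = [12, 35, 90, 110, 190, 25]
print(round(pearson_r(chs2, kdel), 3))
```

      Values close to 1 mean that the two signals rise and fall together across pixels, which is how the Chs2/KDEL result is interpreted; values near 0, as for Chs2/srcGFP, mean that the intensities vary independently.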

      We have reorganised the figures and now present an example of Chs2/srcGFP/KDEL subcellular localisation in tracheal cells and the colocalisation analysis in Fig 5H. The colocalisation analysis is described in the Materials and Methods section.

      Minor comments:

      5.) The authors used "L3 larval escapers." It would be interesting to know if the lack of Chs2 and the peritrophic matrix cause any physiological defects or lethality.

      The point raised by the reviewer is very interesting and relevant. The peritrophic matrix is proposed to play several important physiological roles, including the spatial organisation of the digestive process, increasing digestive efficiency, protection against toxins and pathogens, and serving as a mechanical barrier. Therefore, it is expected that the absence of chitin in the PM of the Df(Chs2) larval escapers may cause various physiological effects.

      Analysing these effects is a complex task, and it constitutes an entire research project on its own. In addressing the physiological requirements of the PM, we aim to analyse adult flies and assess various parameters, including viability, digestive transit dynamics, gut integrity, resistance to infections, fitness and fertility.

      A critical initial challenge in conducting a comprehensive analysis of the physiological requirements of the PM is identifying a suitable condition to evaluate the absence of Chs2. In this work we are using a combination of two overlapping deficiencies that uncover Chs2, along with a few additional genes (as indicated in Fig S1F). This deficiency condition presents two major drawbacks: first, the observed defects could be caused or influenced by the absence of genes other than Chs2, preventing us from conclusively attributing the defects to Chs2 loss (unless we rescued the defects by adding Chs2 back, as we did in the manuscript). Second, the larval escapers, which are rare, do not survive to adulthood (indicating lethality but preventing us from analysing specific physiological aspects).

      To overcome these limitations, we are currently working to identify a genetic condition in which we can specifically analyse the absence of Chs2. We have identified several available RNAi lines and we are testing their efficiency in preventing chitin deposition in the PM. Additionally, we are characterising a putative null Chs2 allele, Chs2CR60212-TG4.0. This stock contains a Trojan-GAL4 gene trap sequence in the third intron, inserted via CRISPR/Cas9. As described in FlyBase (https://flybase.org/), the inserted cassette contains a 'Trojan GAL4' gene trap element composed of a splice acceptor site followed by the T2A peptide, the GAL4 coding sequence and an SV40 polyadenylation signal. When inserted in a coding intron in the correct orientation, the cassette should result in truncation of the trapped gene product and expression of GAL4 under the control of the regulatory sequences of the trapped gene. We already know that, when crossed to a reporter line (e.g. UAS-GFP or UAS-nlsCherry), this line reproduces the Chs2 expression pattern, suggesting that the insertion may generate a truncated Chs2 protein. This line would represent an ideal tool to assess the absence of Chs2, and we are currently characterising it for further analysis.

      In summary, we fully agree with the reviewer that investigating the physiological requirements of the PM is a compelling area of research, and we are actively addressing this question. However, this investigation constitutes a substantial and independent research effort that we believe is beyond the scope of the current manuscript at this stage.

      6.) The order identifiers are missing for materials and antibodies, e.g., anti-GFP (Abcam), but Abcam provides several anti-GFP antibodies; which was used? Please provide order numbers that guarantee the repeatability for others.

      We have now added all identifiers for materials and reagents used, in the materials and methods section.

      7.) Figure S5C, C', what marks GFP (blue) in the trachea? Maybe I have overlooked the description. What is UASsrcGFP? What is the origin of this line?

      We apologise for not providing a more detailed description of the UASsrcGFP line. This line corresponds to RRID BDSC#5432, as now indicated in Materials and Methods section.

      In this transgene, the UAS regulatory sequences drive the expression of GFP fused to Tag:Myr(v-src). As described in Flybase (https://flybase.org/), the P(UAS-srcEGFP) construct contains the 14 aa myristylation domain of v-src fused to EGFP. This tag is commonly used to target proteins of interest to the plasma membrane. The construct was generated by Eric Spana and is available in Drosophila stock centers.

      We typically use this transgene as a plasma membrane marker to outline cell membrane contours. In our experiments, srcGFP, under the control of the btlGal4 promoter, was used to visualise the membrane of tracheal cells in relation to Chs2 accumulation. As indicated in point 4, we have now transferred the images of srcGFP/Chs2/KDEL to the main Figures and used it for colocalisation analyses.

      8.) The authors claim that they validated the anti-Chs2 antibody. However, they show only that it recognizes a Chs2 epitope via ectopic expression. For more profound validation, immune staining is required in deletion mutants, upon knockdown, or upon expression of recombinant proteins, which is not shown.

      We generated an antibody against Chs2. We found that the antibody does not reliably detect the endogenous Chs2 protein, and so we find no pattern in the proventriculus or any other tissue in our immunostainings. It is very possible that the combination of low endogenous levels of Chs2 with a sub-optimal antibody (or low titer) leads to this result. In any case, as the antibody does not detect endogenous Chs2, it cannot be validated by analysing the expression upon Chs2 knockdown. In contrast, our antibody clearly detects specific staining in various tissues (e.g. trachea, salivary glands, gut) when Chs2 is expressed using the Gal4/UAS system, confirming its specificity for Chs2. It is worth pointing out that it is not unusual to find antibodies that are not sensitive enough to detect endogenous proteins but can detect overexpressed proteins (e.g. Lebreton and Casanova, 2016).

      As an additional way to validate the specificity of our antibody, we have used the chimeras generated, as suggested by the reviewer. As indicated in the Materials and Methods section, the anti-Chs2 antibody was generated against a region comprising aa 1222-1383 of Chs2, with low homology to Kkv. This region is present in the kkv-Chs2GFP chimera but absent in Chs2-KkvGFP (see Fig 7A). Accordingly, our antibody recognises kkv-Chs2GFP but does not recognise Chs2-KkvGFP (Fig S7).

      We have revised the text in chapter 6 (6. Subcellular localisation of Chs2 in endogenous and ectopic tissues) to clarify these points, and we have added the validation of the antibody using the chimeras in chapter 8 (8. Analysis of Chs2-Kkv chimeras) and Fig S7.

      9) The legend and text explaining Fig. 4 D-E' can be improved. The authors used the CRIMIC line, which is integrated into the third ("coding") intron. This orientation can lead to the expression of Gal4 and cause a truncated version of the protein (according to Flybase). Is Chs2 expression reduced in the CRIMIC mutant? If the mutation causes expression of a truncated version, the Chs2 antibody may not be able to detect it, as it recognizes a fragment between 1222 and 1383 aa. Also, I'm unsure whether the Chs2 antibody or GFP was used to detect expression in PR cells. The authors describe using Chs2CR60212>SrcGFP together with Chs2+ specific antibodies.

      We apologise for the confusion.

      As the reviewer points out, Chs2CR60212-TG4.0 contains a Trojan-GAL4 gene trap sequence in the third intron, inserted via CRISPR/Cas9. As described in FlyBase (https://flybase.org/), the inserted cassette contains a 'Trojan GAL4' gene trap element composed of a splice acceptor site followed by the T2A peptide, the GAL4 coding sequence and an SV40 polyadenylation signal. When inserted in a coding intron in the correct orientation, the cassette should result in truncation of the trapped gene product and expression of GAL4 under the control of the regulatory sequences of the trapped gene.

      We found that, when crossed to UAS-GFP or UAS-nlsCherry, this line reproduces an expression pattern that must correspond to Chs2. As the antibody that we generated is not suitable for detecting Chs2 endogenous expression, we resorted to using this combination, Chs2CR60212-TG4.0 crossed to a reporter line (such as UAS-GFP or UAS-nlsCherry), to visualise Chs2 expression by staining for GFP/Cherry in the intestinal tract and in the embryo (Figures 4 and S4).

      We realise that the Figure labelling we used in our original submission is very misleading, and we apologise for this. In the original figures we had labelled the staining combination with Kkv, Chs2, Exp as if we had used these antibodies. However, in all cases, we used GFP to visualise the pattern of these proteins in the genetic combinations indicated in the figures. We have corrected this in our revised version. We have also updated the text (Chapter 5), figures and figure legends.

      As the reviewer points out, the insertion in Chs2CR60212-TG4.0 is likely to generate a truncated Chs2 protein. We cannot confirm this using the Chs2 antibody we generated because it does not recognise the endogenous Chs2 pattern. Nevertheless, as indicated in point 5, we are currently characterising this line. Our preliminary results indicate a high complexity of effects from this allele that require thorough analysis, as it may be acting as a dominant negative.

      Reviewer #1 (Significance (Required)):

      Significance: The manuscript's strength and most important aspects are the genetic analysis, expression, and localization studies of the two Chitin synthases in Drosophila embryos and larvae. However, beyond this manuscript, the development of mechanistic details, such as interaction partners that trigger secretion and action at the apical membranes and the role of the coiled-coil domain, will be interesting.

      The manuscript uses "first-class" genetics to describe the different roles of the two Chitin synthases in Drosophila, comparing ectodermal chitin (tracheal and epidermal chitin) with endodermal (midgut) chitin. Such a precise analysis has not been investigated before in insects. Therefore, the study deeply extends knowledge about the role of Chitin synthases in insects.

      The audience will specialize in basic research in zoology, developmental biology, and cell biology regarding - how the different Chitin synthases produce chitin. Nevertheless, as chitin is relevant to material research and medical and immunological aspects, the manuscript will be fascinating beyond the specific field and thus for a broader audience.

      I'm working on chitin in the tracheal system and epidermis in Drosophila.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Drosophila have two different chitin synthase enzymes, Kkv and Chs2, and due to unique expression patterns and mutant phenotypes, it is relatively clear that they have different functions in producing either the cuticle-related chitin network (Kkv) or the chitin associated with the peritrophic matrix (PM). However, what is unknown is whether the different functions in making cuticle vs PM chitin are related to differences in cellular expression and/or enzyme properties within the cell. The authors exploit the genetic tractability of Drosophila and their ability to image cuticle vs PM chitin production to examine whether these 2 enzymes can substitute each other. They conclude that these two proteins are not equivalent in their capacity to generate chitin. The data are convincing; however, they are currently presented in a subjective fashion, which makes them difficult to interpret. Additionally, in my opinion, some of the interpretation requires softening or an alternative reading.

      We are pleased that the reviewer finds our data convincing. However, we acknowledge the reviewer's concern that our data was presented in a subjective manner, and we apologise for this. In response, we have carefully reviewed the entire manuscript and revised our data presentation to ensure a more objective tone. Numerous changes (including additional quantifications, new experiments and clarifications) have been incorporated throughout the text. These revisions are highlighted in the marked-up version. We hope that this revision provides a more accurate and objective presentation of our work.

      Major Comments:

      1- While the imaging is lovely, there are some things that are difficult to see in the figures. For example, the "continuous, thin and faint 'chitin' layer that lined the gut epithelium" is very difficult to visualise in the control images. Can they increase the contrast to help the reader appreciate this layer? This is particularly important as we are asked to appreciate a loss of this layer in the absence of Chs2.

      We have tried to improve the figures so that the PM layer in the midgut region is more clearly visible. We have added magnifications of small sections at the midgut lumen/epithelium border in grey to help visualise the PM. These improvements have been made in Figures 1,2,S1,S2,S3 and we believe that they better illustrate our results.

      2- All the mutant analysis is presented subjectively. For example, the authors state that they "found a consistent difference of CBP staining when they compared the 'Chs2' escapers to the controls". How consistent is consistent? Can this be quantified? What is the penetrance of this phenotype? They say that the thin layer is absent in the midgut and the guts are thinner. Could they provide more concrete data?

      As indicated above, we have reviewed the text to provide a more objective description of the phenotypes.

      We have quantified the defects in the Df(Chs2) mutant conditions. For this quantification we dissected intestinal tracts of control and Df(Chs2) larval escapers. We fixed, stained and mounted them together. The control guts expressed GFP in the midgut region as a way to distinguish control from mutants. We analysed the presence or absence of chitin in the PM. We found absence of chitin in the proventricular lumen and in the midgut in all Df(Chs2) guts and presence of chitin there in all control ones (n=12 Df(Chs2) guts, n=9 control guts, from 5 independent experiments). The results indicate a fully penetrant phenotype of lack of chitin in Df(Chs2) larval escapers (100% penetrance). We have added this quantification in the text, chapter 2 (2. Chs2 deposits chitin in the PM).

      To quantify the thickness of the guts, we took measurements of the diameter in control and Df(Chs2) guts at two comparable distance positions from the proventriculus (position 1, position 2, see image). Our quantifications indicated thinner tubes in mutant conditions.

      Image shows the anterior part of the intestinal tract, with the proventriculus encircled in white. Positions 1 and 2 indicate where the diameter quantifications were taken. Scatter plots quantifying the diameter at the two different positions in control and Chs2 larval escapers. Bars show mean ± SD. p = p value of unpaired two-tailed t test with Welch's correction.
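
      As a minimal sketch of what an unpaired t test with Welch's correction computes, the snippet below derives the t statistic and the Welch-Satterthwaite degrees of freedom for two groups with unequal variances. The function name `welch_t` and the diameter values are illustrative assumptions; the numbers are placeholders, not measurements from this study.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two observations")
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical gut diameters in arbitrary units (placeholder values only).
control = [102.0, 98.5, 110.2, 105.1, 99.8]
mutant = [81.3, 90.4, 77.9, 85.0, 88.6, 79.2]
t_stat, dof = welch_t(control, mutant)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

      Converting t and the degrees of freedom into a two-tailed p value requires the t distribution; SciPy's `scipy.stats.ttest_ind` with `equal_var=False` performs the same test and returns the p value directly.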

      However, we are aware that our analysis of the thickness of the gut is not accurate, because we have not used markers to precisely measure at the same position in all guts and because we have not normalised the measurement position in relation to the whole intestinal tract (mainly due to technical issues).

      In relation to the fragility, we noticed that the guts of Chs2 larval escapers tended to break more easily during dissection than control guts; however, we have not been able to quantify this parameter in a reliable and objective manner.

      Since we consider that the requirement of Chs2 for PM deposition is sufficiently demonstrated, and that aspects such as gut morphology or fragility relate to the physiological requirements of the PM, which we are beginning to address as a new independent project (see our response to point 5 of Reviewer 1), we have decided to remove the sentence 'We also noticed that the guts of L3 escapers were thinner and more fragile at dissection.' from the manuscript to avoid subjectivity.

      3- They state that Chs2 was able to restore accumulation of chitin in the PM of the proventriculus and the midgut. Please quantify. Additionally, does this restore the morphology of the guts (related to the comment above on the thinner guts in the absence of Chs2)?

      We have quantified the rescue of chitin deposition in the PM when Chs2 is expressed in PR cells in a Df(Chs2) mutant background. For this quantification we used the following genetic cross: PRGal4/Cyo; Df(Chs2)/TM6dfdYFP (females) crossed to UASChs2GFP or UASChs2/Cyo; Df(Chs2)/TM6dfdYFP. We selected Df(Chs2) larval escapers by the absence of TM6 (recognisable by the body shape). Among these larval escapers, we identified the presence of Chs2 in PR cells by the expression of GFP or Chs2. We found absence of chitin in the proventriculus and in the midgut in all Df(Chs2) guts that did not express Chs2 in PR cells (n=8/8 Df(Chs2)). In contrast, chitin was present in those intestinal tracts where Chs2 expression was detected in PR cells (n=8/8 PRGal4-UASChs2; Df(Chs2) guts, from 5 independent experiments). The results indicate a full rescue of chitin deposition by Chs2 expression in PR cells in Df(Chs2) mutant larvae. We have added this quantification in the text, chapter 2 (2. Chs2 deposits chitin in the PM).

      As requested by the reviewer, we have also conducted measurements to quantify gut thickness. We performed an analysis similar to the one described in point 2, this time comparing the diameter of Df(Chs2) and PRGal4-UASChs2;Df(Chs2) guts at positions 1 and 2 (see image in point 2 of Reviewer 2). Our quantifications indicated that guts were thicker when Chs2 is expressed in the PR region in Df(Chs2) larval escapers.

      As discussed in point 2, we have decided not to include these results in the manuscript, as this type of analysis requires a more comprehensive investigation.

      Scatter plots quantifying the diameter at the two different positions in Chs2 larval escapers and Chs2 larval escapers expressing Chs2 in PR cells. Bars show mean ± SD. p = p value of unpaired two-tailed t test with Welch's correction.

      4- This may be beyond the scope of this paper, but I find it interesting that the PM chitin is deposited in the proventricular lumen. Yet it forms a thin layer that lines the entire midgut? Any idea how this presumably dense chitin network gets transported throughout the midgut to line the epithelium? I imagine that this is unlikely due to diffusion, especially if they see an even distribution across the midgut. Do they see any evidence of a graded lining (i.e. is it denser in the midgut towards the proventriculus and does this progressively decrease as you look through the midgut?)?

      Insect peritrophic matrices have been classified into Type I and II (with some variations) depending on their origin (extensively reviewed in Peters, 1992; Hegedus et al., 2019). Type I PMs are typically produced by delamination as concentric lamellae along the length of the midgut. Type II PMs, in contrast, are produced in a specialised region of the midgut that corresponds to the proventriculus and are typically more organised than Type I. In Type II PMs, distinct layers originate from distinct cell clusters in the proventriculus. It has been proposed that as food passes, it becomes encased by the extruded PM, which then slides down to ensheath the midgut. Drosophila larvae have been proposed to secrete a type II PM: through PM implantation experiments, Rizki proposed that the proventriculus is required to generate the PM in Drosophila larvae (Rizki, 1956). Our experiments confirmed this hypothesis: we show that expressing Chs2 exclusively in PR cells is sufficient to produce a PM along the midgut. Furthermore, we also show that expressing Chs2 in the midgut is not sufficient to produce a PM layer lining the midgut, at least at larval stages.

      The type II PM in Drosophila is proposed to be fully organised into four layers in the proventricular region (also referred to as the PM formation zone) before reaching the midgut (Peters, 1992, King, 1988, Rizki, 1956, Zhu et al., 2024). However, the mechanism by which the PM is subsequently transported into the midgut remains unclear. Posterior movement of the PM is thought to depend on the pressure exerted by continuous secretion of PM material (Peters, 1992). Early work by Wigglesworth (1929, 1930) proposed that the PM is secreted into the proventricular lumen, becomes fully organised, and is then pushed down by a press mechanism involving the apposed ectodermal/endodermal walls of the proventriculus. Rizki suggested that muscular contractions of the proventriculus walls may play a role, and that peristaltic movements of the gut add a pulling force to push the PM into the midgut (Rizki, 1956). Nevertheless, to our knowledge, the exact mechanism is still not fully understood.

      In response to the reviewer's question, the level of resolution of our analysis does not allow us to determine whether there is a graded PM lining along the midgut. However, available data using electron microscopy approaches suggest that the PM is a fully organised structure composed of four layers that is secreted and transported to line the midgut (King, 1988, Zhu et al., 2024).

      5- The authors state that expression of kkv in tracheal cells of kkv mutants perfectly restores accumulation of chitin in the luminal filaments. Is this really 100% restoration? They also reference a paper here, which may have quantified this result.

      We previously reported that the expression of kkv in tracheal cells restores chitin deposition in kkv mutants (Moussian et al., 2015). However, our previous study did not quantify this rescue. As requested by the reviewer, we have now quantified the extent of the rescue.

      To perform this quantification, we used the following genetic cross:

      btlGal4/(Cyo); kkv/TM6dfdYFP (females) crossed to +/+; kkv UASkkvGFP/TM6dfdYFP (males)

      We stained the resulting embryos with CBP (to detect chitin) and GFP. GFP staining allowed us to identify the kkv mutants (by the absence of dfdYFP marker) and to simultaneously identify the embryos that expressed kkvGFP in tracheal cells (through btlGal4-driven expression). Since btlGal4 is homozygous viable, most females carried two copies of btlGal4.

      We compared the following embryo populations across 4 independent experiments:

      1. Cyo/+; kkv/kkv UASkkvGFP (kkv mutants not expressing kkv in the trachea)
      2. btlGal4/+; kkv/kkv UASkkvGFP (kkv mutants expressing kkv in the trachea)

      Results:

      1. Cyo/+; kkv/kkv UASkkvGFP ---- 0/6 embryos deposited chitin in trachea
      2. btlGal4/+; kkv/kkv UASkkvGFP ---- 27/27 embryos deposited chitin in trachea

      These results indicate complete restoration of chitin deposition in kkv mutants when kkv is expressed in tracheal cells (100% rescue).

      To further investigate whether Chs2 can compensate for kkv function in ectodermal tissues, we performed a similar quantification using the following genetic cross:

      btlGal4/(Cyo); kkv/TM6dfdYFP (females) crossed to UASChs2GFP/UASChs2GFP; kkv UASkkvGFP/TM6dfdYFP (males)

      We compared the following embryo populations across 2 independent experiments:

      1. Cyo/UASChs2GFP; kkv/kkv (kkv mutants not expressing Chs2 in the trachea)
      2. btlGal4/UASChs2GFP; kkv/kkv (kkv mutants expressing Chs2 in the trachea)

      Results:

      1. Cyo/UASChs2GFP; kkv/kkv ---- 0/4 embryos deposited chitin in trachea
      2. btlGal4/UASChs2GFP; kkv/kkv ---- 0/16 embryos deposited chitin in trachea

      These results indicate no restoration of chitin deposition in kkv mutants expressing Chs2 in the trachea (0% rescue).
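      For readers who wish to attach a statistic to the embryo counts reported above, a two-sided Fisher exact test on each 2x2 table (genotype by chitin deposition) is one option. The following is a minimal sketch, not an analysis performed in the manuscript; the function name and the choice of test are illustrative assumptions, and only the counts come from the response.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Illustrative helper (not from the manuscript). Rows are genotypes,
    columns are "chitin deposited" / "no chitin deposited".
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "deposited" embryos in row 1,
        # with all table margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# kkv rescue: btlGal4 embryos 27 deposited / 0 not; Cyo embryos 0 deposited / 6 not.
p_kkv = fisher_exact_two_sided(27, 0, 0, 6)

# Chs2 expression: btlGal4 embryos 0 deposited / 16 not; Cyo embryos 0 deposited / 4 not.
p_chs2 = fisher_exact_two_sided(0, 16, 0, 4)

print(f"kkv rescue:  p = {p_kkv:.2e}")
print(f"Chs2 rescue: p = {p_chs2:.2f}")
```

      With these counts the kkv comparison is the most extreme table possible for its margins, whereas the Chs2 comparison shows no difference between genotypes, which matches the 100% and 0% rescue reported.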

      We have now incorporated these quantifications in the text, chapter 4 (4. Chs2 cannot replace Kkv and deposit chitin in ectodermal tissues.)

      6- They ask whether Kkv overexpression in the proventriculus can rescue Chs2 mutants... and vice versa, whether Chs2 overexpression in ectodermal cells can rescue kkv mutants. They show that kkv overexpression leads to an intracellular accumulation of chitin in the proventriculus. However, Chs2 overexpression in the trachea did not lead to any accumulation of chitin in the cells. They tailored their experiments and the associated discussion to address the hypothesis that there is potentially some difference in trafficking of these components. However, another possibility, which they have not ruled out, is that the different ability of kkv and Chs2 to produce chitin inside cells of the proventriculus and ectoderm, respectively, is potentially related to different enzymatic activities and cofactors required for chitin formation in these different cell types. Is this another potential explanation for the differences that they observe?

      We note that Kkv overexpression in any cell type (e.g. ectoderm, endoderm) consistently leads to chitin polymerisation. In ectodermal tissues, Kkv expression, in combination with Exp/Reb activity, results in extracellular chitin deposition. In the absence of Exp/Reb, Kkv expression leads to the accumulation of intracellular chitin punctae (De Giorgio et al., 2023; Moussian et al., 2015; this work). This correlates with the accumulation of Kkv at the apical membrane and presence of Kkv-containing vesicles, regardless of the presence of Exp/Reb (De Giorgio et al., 2023; Moussian et al., 2015; Figure 6, S6). In endodermal tissues, regardless of the presence of Exp/Reb, Kkv cannot deposit chitin extracellularly and instead produces intracellular chitin punctae. This correlates with a diffuse accumulation of Kkv in the endodermal cells (PR cells, or gut cells in the embryo) but presence of Kkv-containing vesicles (Figure 6, S6).

      In previous work we showed that Kkv's ability to polymerise chitin is completely abolished when it is retained in the ER. Indeed, we found that a mutation in a conserved WGTRE region leads to ER retention, the absence of Kkv-containing vesicles in the cell, and absence of intracellular chitin punctae or chitin deposition (De Giorgio et al., 2023).

      These findings indicate a correlation between Kkv subcellular localisation and chitin polymerisation/extrusion. Therefore, we hypothesise that intracellular trafficking and subsequent subcellular localisation play a crucial role in regulating Kkv activity (De Giorgio et al., 2023; this work).

      We find that Chs2 is expressed in PR cells (Figure 4) and observe that only in these PR cells does Chs2 localise apically (Fig 5A-D, S5A,B). This localisation correlates with the ability of Chs2 to deposit chitin in the PM and the presence of intracellular chitin punctae in PR cells (Fig 1F). When Chs2 is expressed in other cell types, we detect it primarily in the ER and observe no Chs2-containing vesicles (vesicles are suggestive of trafficking). This localisation correlates with the inability of Chs2 to produce intracellular chitin punctae or extracellular chitin deposition.

      Again, these results suggest a correlation between Chs2 subcellular localisation and chitin polymerisation/extrusion, aligning with the results observed for Kkv. Therefore, we hypothesise in this work that the intracellular trafficking and subsequent subcellular localisation of Chs2 play a crucial role in regulating its activity.

      Our hypothesis is consistent with seminal work on yeast chitin synthases, which has demonstrated the critical role of intracellular trafficking, and particularly ER exit, in regulating chitin synthase activity (reviewed in Sanchez and Roncero, 2022).

      That said, we cannot exclude other explanations that are also compatible with the observed results. As pointed out by the reviewer, it is possible that Chs2 and Kkv require different enzymatic activities and/or cofactors for chitin polymerisation/deposition, which may be specific to different cell types. Indeed, we know that the auxiliary proteins Exp/Reb are specifically expressed in certain ectodermal tissues (Moussian et al., 2015). These mechanisms could act jointly or in parallel with the regulation of intracellular trafficking, or could even regulate this intracellular trafficking itself.

      Identifying the exact mechanisms controlling Kkv and Chs2 intracellular trafficking would be necessary to determine whether additional mechanisms (specific cofactors or enzymatic activities) are also involved or even serve as the primary regulatory elements.

      We have introduced these additional possibilities in the discussion section.

      7- They co-express Chs2 and Reb and show that this does not lead to chitin production or secretion. In the discussion they conclude that Chs2 does not "seem to be dependent on 'Reb' activity". I think that this statement potentially needs softening. They show that Reb is not sufficient to induce Chs2 chitin production in cells that do not normally make a PM. However, they do not show that it is not essential in cells that normally express Chs2 and make PM.

      We fully agree with the reviewer's observation and thank her/him for pointing it out.

      As indicated by the reviewer, we show that co-expression of Reb and Chs2 in different tissues does not lead to an effect distinct from that observed with Chs2 expression alone. In addition, in the discussion we mention that we could not detect expression of reb/exp in PR cells, which aligns with the findings of Zhu et al., 2024, indicating no expression of reb/exp in the midgut cells of the adult proventriculus, as assessed by scRNAseq. We found that exp is expressed in the ectodermal cells of the larval proventriculus (Fig S4D), correlating with kkv expression in this region and cuticle deposition. These findings led us to propose that Chs2 does not seem to be dependent on Exp/Reb activity.

      However, in our original manuscript, we did not directly address whether Exp/Reb are required in the cells that normally express Chs2. As a result, we could not conclude that Chs2 relies on a set of auxiliary proteins different from Exp/Reb, and therefore a different molecular mechanism to that of Kkv in regulating chitin deposition.

      To address this specific point, we have conducted a new experiment to test Exp/Reb requirement in PR cells. We co-expressed RNAi lines for Exp/Reb in these cells and found that chitin deposition in the PM was not prevented. This further supports the hypothesis that Exp/Reb activity is not necessary for Chs2 function. We have added this experiment to Chapter 4 and Fig S3I,J.

      8- They looked at the endogenous expression pattern of kkv and Chs2 and say that they found accumulation of Kkv in the proventriculus and no accumulation in the midgut. Similarly, they look at the expression of Chs2 and detect it in cells of the proventriculus. Are there markers of these different cell types that they could use to colocalize these enzymes?

      We agree with the reviewer that this is an important issue and we note that Reviewer 1 also raised the same point. Therefore, we have addressed this issue.

      We obtained an antibody against Dve, kindly provided by Dr. Hideki Nakagoshi. Dve marks the endodermal region in the proventriculus (Fuss and Hoch, 1998, Fuss et al., 2004, Nakagoshi et al., 1998). This antibody worked well in our dissected L3 digestive tracts and allowed us to mark the endodermal region. We also obtained an antibody against Fkh, kindly provided by Dr. Pilar Carrera. Fkh marks the ectodermal foregut cells (Fuss and Hoch, 1998, Fuss et al., 2004, Nakagoshi et al., 1998). While, in our hands, this antibody performed well in embryonic tissues, we observed no staining in our dissected L3 digestive tracts. The reason for this is unclear, but we suspect technical limitations may be responsible (the ectodermal region of the proventriculus is very internal, potentially hindering antibody penetration). To circumvent this problem, we tested a FkhGFP tagged allele available from the Bloomington Stock Center. Fortunately, we were able to detect GFP in ectodermal cells of L3 larvae carrying this allele. Using this approach, we conducted experiments to detect Fkh and Dve in relation to chitin accumulation in the wild type (Fig S1). In addition, we used these markers to map the expression of Kkv and Chs2 in the proventriculus (Fig 4). Our results using these endodermal/ectodermal markers confirmed the presence of a cuticle adjacent to the FkhGFP-positive cells and a PM adjacent to the PR cells, marked by Dve. Additionally, we show that Chs2-expressing cells are positive for Dve while Kkv-expressing cells are not. We could not conduct an experiment showing Kkv and Fkh co-expression due to technical incompatibilities, as we have to use GFP tagged alleles for both Kkv and Fkh to reveal their expression. However, we believe that our imaging of Dve/Kkv clearly shows that Kkv-expressing cells lack Dve expression and localise in the internal (ectodermal) region of the proventriculus (Fig 4E).

      9- They overexpress Chs2 in cells of the midgut and see that it colocalises with an ER marker. They conclude that it is retained in the ER, which again, for them suggests that it has a trafficking problem in these cells. However, they are overexpressing it in these cells and this strong accumulation that they observe in the ER could simply be due to the massive expression levels. Additionally, they cannot conclude that it doesn't get out of the ER at all. They could be correct in thinking that there may be a trafficking issue, but this experiment does not conclusively show that Chs2 is entirely retained in the ER when expressed in ectopic tissues. I wonder if their interpretation needs softening or whether they should potentially address alternative hypotheses.

      The reviewer raises two distinct issues: 1) the localisation of overexpressed proteins and 2) Chs2 ER retention.

      We agree that massive overexpression can lead to artifactual subcellular localisation due to saturation of the secretory pathway, causing ER accumulation. In our experiments, we overexpressed Kkv and Chs2 in different tissues (trachea, salivary glands, embryonic gut, and larval proventriculus), inducing high levels of both chitin synthases.

      For Kkv, we observed distinct subcellular localisation patterns in ectodermal versus endodermal tissues (illustrated in new Fig S6). In ectodermal tissues such as the trachea, large amounts of KkvGFP were detected, most of it localising apically. We also detected a more general KkvGFP distribution throughout the cell, including the ER, particularly at early stages. Additionally, we observed many KkvGFP-positive vesicles, reflecting exocytic and endocytic trafficking, as described previously (De Giorgio et al., 2023). The presence of these vesicles (as well as the apical localisation) indicates that KkvGFP is able to exit the ER. Indeed, our previous work demonstrated that when Kkv is retained in the ER, it does not localise apically or appear in vesicles (De Giorgio et al., 2023). In endodermal tissues, as described in our manuscript, KkvGFP did not exhibit polarised apical localisation and instead showed a diffuse pattern with some cortical enrichment. However, the presence of KkvGFP-containing vesicles still suggests that the protein is also capable of exiting the ER in these endodermal tissues.

      We observed a different subcellular pattern when we overexpressed Chs2GFP. In tissues where Chs2 is not normally expressed (e.g., trachea, salivary gland, embryonic gut), we did not detect apical or membrane accumulation (see Fig. 5, S5, S6 and response to point 4 of Reviewer #1). Nor did we observe accumulation of Chs2GFP in intracellular vesicles. Instead, Chs2GFP showed strong colocalisation with an ER marker (see Fig. 5, S5, S6 and response to point 4 of Reviewer #1). In contrast, when overexpressed in PR cells, we detected apical enrichment (Fig 5A-D, S5A,B). This indicates that despite massive expression levels, Chs2 can exit the ER in particular tissues.

      Taken together, our results strongly suggest that overexpressed Kkv can exit the ER in the different tissues analysed, whereas most Chs2GFP is retained in the ER in tissues other than PR cells. This correlates with the ability of overexpressed KkvGFP to polymerise chitin (either in intracellular puncta or deposited extracellularly depending on the presence of Exp/Reb) in all analysed tissues. Conversely, Chs2 was unable to polymerise chitin (either in intracellular puncta or extracellularly regardless of Exp/Reb presence) in tissues other than PR cells.

      Nevertheless, we acknowledge that we cannot definitively conclude that all Chs2 protein is entirely retained in the ER. We have included this caveat in our revised manuscript (Chapter 6 and Discussion section).

      Minor Comments:

      • No mention of Fig 3I in the results section and the order discussed in the results does not match the order in the figure.

      We apologise for these inconsistencies. We have addressed this issue in the text, figure legend, and the image order in Figure 3 and Figure S3.

      • In the results please provide some information on what the CRIMIC collection is and how it allows you to see Chs2 expression for non-experts.

      We have addressed this point in chapter 5 in the revised version, and we now provide a more detailed explanation of the CRIMIC Chs2CR60212-TG4.0 allele.

      Further details of this allele are also provided in our responses to points 5 and 9 of Reviewer 1.

      Reviewer #2 (Significance (Required)):

      Drosophila produce different types of chitinous structures that are required for either the exoskeleton of the animal or for proper gut function (peritrophic matrix). Additionally, most insects have two enzymes involved in the production of chitin and current data suggests that they have unique roles in producing either the exoskeleton or the peritrophic matrix. However, it is unclear whether their different functions are due to differences in cell type expression or differences in physiological activity of the enzymes. The authors exploit Drosophila to drive these 2 enzymes in different cell types that are known to produce the exoskeleton or the peritrophic matrix to determine whether they can functionally substitute mutant backgrounds. Their results give us a hint that these enzymes are not equivalent. What the authors were unable to address is why they are not equivalent. They hypothesise that the different physiological functions of the enzymes may be related to trafficking differences within their respective cell types. While this is an interesting hypothesis, the data are not really clear yet to make this conclusion.

      This work will be of interest to anyone interested in chitinous structures in insects and the cell biology of chitin-related enzymes.

      Literature


      AHMAD, M., CHAUDHARY, S. U., AFZAL, A. J. & TARIQ, M. 2015. Starvation-Induced Dietary Behaviour in Drosophila melanogaster Larvae and Adults. Sci Rep, 5, 14285.

      DE GIORGIO, E., GIANNIOS, P., ESPINAS, M. L. & LLIMARGAS, M. 2023. A dynamic interplay between chitin synthase and the proteins Expansion/Rebuf reveals that chitin polymerisation and translocation are uncoupled in Drosophila. PLoS Biol, 21, e3001978.

      FUSS, B. & HOCH, M. 1998. Drosophila endoderm development requires a novel homeobox gene which is a target of Wingless and Dpp signalling. Mech Dev, 79, 83-97.

      FUSS, B., JOSTEN, F., FEIX, M. & HOCH, M. 2004. Cell movements controlled by the Notch signalling cascade during foregut development in Drosophila. Development, 131, 1587-95.

      HEGEDUS, D. D., TOPRAK, U. & ERLANDSON, M. 2019. Peritrophic matrix formation. J Insect Physiol, 117, 103898.

      KING, D. G. 1988. Cellular organization and peritrophic membrane formation in the cardia (proventriculus) of Drosophila melanogaster. J Morphol, 196, 253-82.

      LEBRETON, G. & CASANOVA, J. 2016. Ligand-binding and constitutive FGF receptors in single Drosophila tracheal cells: Implications for the role of FGF in collective migration. Dev Dyn, 245, 372-8.

      MOUSSIAN, B., LETIZIA, A., MARTINEZ-CORRALES, G., ROTSTEIN, B., CASALI, A. & LLIMARGAS, M. 2015. Deciphering the genetic programme triggering timely and spatially-regulated chitin deposition. PLoS Genet, 11, e1004939.

      NAKAGOSHI, H., HOSHI, M., NABESHIMA, Y. & MATSUZAKI, F. 1998. A novel homeobox gene mediates the Dpp signal to establish functional specificity within target cells. Genes Dev, 12, 2724-34.

      PETERS, W. 1992. Peritrophic Membranes, Springer Berlin, Heidelberg.

      RIZKI, M. T. M. 1956. The secretory activity of the proventriculus of Drosophila melanogaster. Journal of Experimental Zoology, 131, 203-221.

      SANCHEZ, N. & RONCERO, C. 2022. Chitin Synthesis in Yeast: A Matter of Trafficking. Int J Mol Sci, 23.

      ZHU, H., LUDINGTON, W. B. & SPRADLING, A. C. 2024. Cellular and molecular organization of the Drosophila foregut. Proc Natl Acad Sci U S A, 121, e2318760121.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Drosophila have two different chitin synthase enzymes, Kkv and Chs2, and due to unique expression patterns and mutant phenotypes, it is relatively clear that they have different functions in producing either the cuticle-related chitin network (Kkv) or the chitin associated with the peritrophic matrix (PM). However, what is unknown is whether the different functions in making cuticle vs PM chitin are related to differences in cellular expression and/or enzyme properties within the cell. The authors exploit the genetic tractability of Drosophila and their ability to image cuticle vs PM chitin production to examine whether these 2 enzymes can substitute each other. They conclude that these two proteins are not equivalent in their capacity to generate chitin. The data are convincing; however, they are currently presented in a subjective fashion, which makes them difficult to interpret. Additionally, in my opinion there is some interpretation that requires softening or an alternative interpretation.

      Major Comments:

      • While the imaging is lovely, there are some things that are difficult to see in the figures. For example, the "continuous, thin and faint 'chitin' layer that lined the gut epithelium" is very difficult to visualise in the control images. Can they increase the contrast to help the reader appreciate this layer? This is particularly important as we are asked to appreciate a loss of this layer in the absence of Chs2.
      • All the mutant analysis is presented subjectively. For example, the authors state that they "found a consistent difference of CBP staining when they compared the 'Chs2' escapers to the controls". How consistent is consistent? Can this be quantified? What is the penetrance of this phenotype? They say that the thin layer is absent in the midgut and the guts are thinner. Could they provide more concrete data?
      • They state that Chs2 was able to restore accumulation of chitin in the PM of the proventriculus and the midgut. Please quantify. Additionally, does this restore the morphology of the guts (related to the comment above on the thinner guts in the absence of Chs2)?
      • This may be beyond the scope of this paper, but I find it interesting that the PM chitin is deposited in the proventricular lumen. Yet it forms a thin layer that lines the entire midgut? Any idea how this presumably dense chitin network gets transported throughout the midgut to line the epithelium? I imagine that this is unlikely due to diffusion, especially if they see an even distribution across the midgut. Do they see any evidence of a graded lining (i.e. is it denser in the midgut towards the proventriculus and does this progressively decrease as you look through the midgut?)?
      • The authors state that expression of kkv in tracheal cells of kkv mutants perfectly restores accumulation of chitin in the luminal filaments. Is this really 100% restoration? They also reference a paper here, which may have quantified this result.
      • They ask whether Kkv overexpression in the proventriculus can rescue Chs2 mutants... and vice versa, whether Chs2 overexpression in ectodermal cells can rescue kkv mutants. They show that kkv overexpression leads to an intracellular accumulation of chitin in the proventriculus. However, Chs2 overexpression in the trachea did not lead to any accumulation of chitin in the cells. They tailored their experiments and the associated discussion to address the hypothesis that there is potentially some difference in trafficking of these components. However, another possibility, which they have not ruled out, is that the different ability of kkv and Chs2 to produce chitin inside cells of the proventriculus and ectoderm, respectively, is potentially related to different enzymatic activities and cofactors required for chitin formation in these different cell types. Is this another potential explanation for the differences that they observe?
      • They co-express Chs2 and Reb and show that this does not lead to chitin production or secretion. In the discussion they conclude that Chs2 does not "seem to be dependent on 'Reb' activity". I think that this statement potentially needs softening. They show that Reb is not sufficient to induce Chs2 chitin production in cells that do not normally make a PM. However, they do not show that it is not essential in cells that normally express Chs2 and make PM.
      • They looked at the endogenous expression pattern of kkv and Chs2 and say that they found accumulation of Kkv in the proventriculus and no accumulation in the midgut. Similarly, they look at the expression of Chs2 and detect it in cells of the proventriculus. Are there markers of these different cell types that they could use to colocalize these enzymes?
      • They overexpress Chs2 in cells of the midgut and see that it colocalises with an ER marker. They conclude that it is retained in the ER, which again, for them suggests that it has a trafficking problem in these cells. However, they are overexpressing it in these cells and this strong accumulation that they observe in the ER could simply be due to the massive expression levels. Additionally, they cannot conclude that it doesn't get out of the ER at all. They could be correct in thinking that there may be a trafficking issue, but this experiment does not conclusively show that Chs2 is entirely retained in the ER when expressed in ectopic tissues. I wonder if their interpretation needs softening or whether they should potentially address alternative hypotheses.

      Minor Comments:

      • No mention of Fig 3I in the results section and the order discussed in the results does not match the order in the figure.
      • In the results please provide some information on what the CRIMIC collection is and how it allows you to see Chs2 expression for non-experts.

      Significance

      Drosophila produce different types of chitinous structures that are required for either the exoskeleton of the animal or for proper gut function (peritrophic matrix). Additionally, most insects have two enzymes involved in the production of chitin and current data suggests that they have unique roles in producing either the exoskeleton or the peritrophic matrix. However, it is unclear whether their different functions are due to differences in cell type expression or differences in physiological activity of the enzymes. The authors exploit Drosophila to drive these 2 enzymes in different cell types that are known to produce the exoskeleton or the peritrophic matrix to determine whether they can functionally substitute mutant backgrounds. Their results give us a hint that these enzymes are not equivalent. What the authors were unable to address is why they are not equivalent. They hypothesise that the different physiological functions of the enzymes may be related to trafficking differences within their respective cell types. While this is an interesting hypothesis, the data are not really clear yet to make this conclusion.

      This work will be of interest to anyone interested in chitinous structures in insects and the cell biology of chitin-related enzymes.

    1. Author response:

      The following is the authors’ response to the original reviews

      ANALYTICAL

      (1) A key claim made here is that the same relationship (including the same parameter) describes data from pigeons by Gibbon and Balsam (1981; Figure 1) and the rats in this study (Figure 3). The evidence for this claim, as presented here, is not as strong as it could be. This is because the measure used for identifying trials to criterion in Figure 1 appears to differ from any of the criteria used in Figure 3, and the exact measure used for identifying trials to criterion influences the interpretation of Figure 3***. To make the claim that the quantitative relationship is one and the same in the Gibbon-Balsam and present datasets, one would need to use the same measure of learning on both datasets and show that the resultant plots are statistically indistinguishable, rather than simply plotting the dots from both data sets and spotlighting their visual similarity. In terms of their visual characteristics, it is worth noting that the plots are in log-log axis and, as such, slight visual changes can mean a big difference in actual numbers. For instance, between Figure 3B and 3C, the highest information group moves up only "slightly" on the y-axis but the difference is a factor of 5 in the real numbers. Thus, in order to support the strong claim that the quantitative relationships obtained in the Gibbon-Balsam and present datasets are identical, a more rigorous approach is needed for the comparisons.

      ***The measure of acquisition in Figure 3A is based on a previously established metric, whereas the measure in Figure 3B employs the relatively novel nDKL measure that is argued to be a better and theoretically based metric. Surprisingly, when r and r2 values are converted to the same metric across analyses, it appears that this new metric (Figure 3B) does well but not as well as the approach in Figure 3A. This raises questions about why a theoretically derived measure might not be performing as well on this analysis, and whether the more effective measure is either more reliable or tapping into some aspect of the processes that underlie acquisition that is not accounted for by the nDKL metric.

      Figure 3 shows that the relationship between learning rate and informativeness for our rats was very similar to that shown with pigeons by Gibbon and Balsam (1981). We have used multiple criteria to establish the number of trials to learn in our data, with the goal of demonstrating that the correspondence between the data sets was robust. In the revised Figure 3, specifically 3C and 3D, we have plotted trials to acquisition using decision criteria equivalent to those used by Gibbon and Balsam. The criterion they used (at least one peck at the response key on at least 3 out of 4 consecutive trials) cannot be directly applied to our magazine entry data because rats make magazine entries during the inter-trial interval (whereas pigeons do not peck at the response key in the inter-trial interval). Therefore, evidence for conditioning in our paradigm must involve comparison between the response rate during the CS and the baseline response rate, rather than just counting responses during the CS. We have used two approaches to adapt the Gibbon and Balsam criterion to our data. One approach, plotted in Figure 3C, uses a non-parametric signed rank test for evidence that the CS response rate exceeds the pre-CS response rate, and adopts a statistical criterion equivalent to Gibbon and Balsam’s 3-out-of-4 consecutive trials (p<.3125). The second method (Figure 3D) estimates the nDkl for the criterion used by Gibbon and Balsam and then applies this criterion to the nDkl for our data. To estimate the nDkl of Gibbon and Balsam’s data, we have assumed there are no responses in the inter-trial interval and the response probability during the CS must be at least 0.75 (their criterion of at least 3 responses out of 4 trials). The nDkl for this difference is 2.2 (odds ratio 27:1). We have then applied this criterion to the nDkl obtained from our data to identify when the distribution of CS response rates has diverged by an equivalent amount from the distribution of pre-CS response rates. These two analyses have been added to the manuscript to replace those previously shown in Figures 3B and 3C.
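      The correspondence between the 3-out-of-4 criterion and the threshold p<.3125 can be checked with a short binomial calculation. The sketch below assumes that, under chance, each trial is equally likely to count for or against conditioning (probability 0.5); that assumption and the function name are illustrative and are not taken from the manuscript.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """Probability of at least k successes in n independent trials.

    Illustrative helper; p = 0.5 is the assumed chance level.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance probability of a response on at least 3 of 4 consecutive trials.
p_criterion = binomial_upper_tail(3, 4)
print(p_criterion)  # 0.3125, i.e. 5/16
```

      Under this reading, 4 of the 16 equally likely outcomes contain exactly three successes and 1 contains four, giving 5/16 = 0.3125.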

      (2) Another interesting claim here is that the rates of responding during ITI and the cue are proportional to the corresponding reward rates with the same proportionality constant. This too requires more quantification and conceptual explanation. For quantification, it would be more convincing to calculate the regression slope for the ITI data and the cue data separately and then show that the corresponding slopes are not statistically distinguishable from each other. Conceptually, it is not clear why the data used to test the ITI proportionality came from the last 5 conditioning sessions. What were the decision criteria used to decide on averaging the final 5 sessions as terminal responses for the analyses in Figure 5? Was this based on consistency with previous work, or based on the greatest number of sessions where stable data for all animals could be extracted?

      If the model is that animals produce response rates during the ITI (a period with no possible rewards) based on the overall rate of rewards in the context, wouldn't it be better to test this before the cue learning has occurred? Before cue learning, the animals would presumably only have attributed rewards in the context to the context and thus, produce overall response rates in proportion to the contextual reward rate. After cue learning, the animals could technically know that the rate of rewards during ITI is zero. Why wouldn't it be better to test the plotted relationship for ITI before cue learning has occurred? Further, based on Figure 1, it seems that the overall ITI response rate reduces considerably with cue learning. What is the expected ITI response rate prior to learning based on the authors' conceptual model? Why does this rate differ from pre and post-cue learning? Finally, if the authors' conceptual framework predicts that ITI response rate after cue learning should be proportional to contextual reward rate, why should the cue response rate be proportional to the cue reward rate instead of the cue reward rate plus the contextual reward rate?

      A single regression line, as shown in Figure 5, is the simplest possible model of the relationship between response rate and reinforcement rate and it explains approximately 80% of the variance in response rate. Fixing the log-log slope at 1 yields the maximally simple model. (This regression is done in the logarithmic domain to satisfy the homoscedasticity assumption.) When transformed into the linear domain, this model assumes a truly scalar relation (linear, intercept at the origin) and assumes the same scale factor and the same scalar variability in response rates for both sets of data (ITI and CS). Our plot supports such a model. Its simplicity is its own motivation (Occam’s razor).
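      As an illustration of what fixing the log-log slope at 1 entails, the sketch below fits the single scale factor k in the model response rate = k × reinforcement rate by averaging the log ratios, and reports the variance explained in the logarithmic domain. The data values and the function name are hypothetical and chosen only to make the example runnable; they are not the data plotted in Figure 5.

```python
import math


def fit_fixed_slope(x, y):
    """Fit log(y) = log(k) + log(x), i.e. a log-log line with slope fixed at 1.

    Returns the scale factor k and R^2 computed in the log domain.
    Illustrative helper, not the authors' analysis code.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    # With the slope fixed at 1, the least-squares intercept is the mean log ratio.
    log_k = sum(b - a for a, b in zip(lx, ly)) / len(lx)
    mean_ly = sum(ly) / len(ly)
    ss_res = sum((b - (log_k + a)) ** 2 for a, b in zip(lx, ly))
    ss_tot = sum((b - mean_ly) ** 2 for b in ly)
    return math.exp(log_k), 1 - ss_res / ss_tot


# Hypothetical reinforcement rates and response rates (arbitrary units).
reinforcement_rate = [0.01, 0.02, 0.05, 0.10, 0.20]
response_rate = [0.05, 0.11, 0.24, 0.52, 0.98]

k, r_squared = fit_fixed_slope(reinforcement_rate, response_rate)
print(f"scale factor k = {k:.2f}, R^2 (log domain) = {r_squared:.3f}")
```

      Because the slope is fixed, the only free parameter is k, which is what makes the model a strictly scalar (proportional) relation once transformed back to the linear domain.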

      If separate regression lines are fitted to the CS and ITI data, there is a small increase in explained variance (R<sup>2</sup> = 0.82). These regression lines have been added to the plot in the revised manuscript (Figure 5). We leave it to further research to determine whether such a complex model, with 4 parameters, is required. However, we do not think the present data warrant comparing the simplest possible model, with one parameter, to any more complex model for the following reasons:

      · When a brain—or any other machine—maps an observed (input) rate to a rate it produces (output rate), there is always an implicit scalar. In the special case where the produced rate equals the observed rate, the implicit scalar has value 1. Thus, there cannot be a simpler model than the one we propose, which is, in and of itself, interesting.

      · The present case is an intuitively accessible example of why the MDL (Minimum Description Length) approach to model complexity (Barron, Rissanen, & Yu, 1998; Grünwald, Myung, & Pitt, 2005; Rissanen, 1999) can yield a very different conclusion from the conclusion reached using the Bayesian Information Criterion (BIC) approach. The MDL approach measures the complexity of a model when given N data specified with precision of B bits per datum by computing (or approximating) the sum of the maximum likelihoods of the model’s fits to all possible sets of N data with B bits of precision per datum. The greater the sum over the maximum likelihoods, the more complex the model, that is, the greater its measured wiggle room, its capacity to fit data. Recall that von Neumann remarked to Fermi that with 4 parameters he could fit an elephant. His deeper point was that multi-parameter models bring neither insight nor predictive power; they explain only post hoc, after one has adjusted their parameters in the light of the data. For realistic data sets like ours, the sums of maximum likelihoods are finite but astronomical. However, just as the Stirling approximation allows one to work with astronomical factorials, it has proved possible to develop readily computable approximations to these sums, which can be used to take model complexity into account when comparing models. Proponents of the MDL approach point out that the BIC is inadequate because models with the same number of parameters can have very different amounts of wiggle room. A standard illustration of this point is the contrast between the logarithmic model and the power-function model. Log regressions must be concave, whereas power-function regressions can be concave, linear, or convex, yet they have the same number of parameters (one or two, depending on whether one counts the scale parameter that is always implicit). The MDL approach captures this difference in complexity because it measures wiggle room; the BIC approach does not, because it only counts parameters.

      · In the present case, one is comparing a model with no pivot and no vertical displacement at the boundary between the black dots and the red dots (the 1-parameter unilinear model) to a bilinear model that allows both a change in slope and a vertical displacement for both lines. The 4-parameter model is superior if we use the BIC to take model complexity into account. However, the 4-parameter model has ludicrously more wiggle room. It will provide excellent fits (high maximum likelihood) to data sets in which the red points have slope > 1, slope 0, or slope < 0 and in which it is also true that the intercept for the red points lies well below or well above the black points (non-overlap in the marginal distribution of the red and black data). The 1-parameter model, on the other hand, will provide terrible fits to all such data (very low maximum likelihoods). Thus, we believe the BIC does not properly capture the immense actual difference in complexity between the 1-parameter model (unilinear with slope 1) and the 4-parameter model (bilinear with neither the slope nor the intercept fixed in the linear domain).

      · In any event, because the pivot (change in slope between black and red data sets), if any, is small and likewise for the displacement (vertical change), it suffices for now to know that the variance captured by the 1-parameter model is only marginally improved by adding three more parameters. Researchers using the properly corrected measured rate of head poking to measure the rate of reinforcement a subject expects can therefore assume that they have an approximately scalar measure of the subject’s expectation. Given our data, they won’t be far wrong even near the extremes of the values commonly used for rates of reinforcement. That is a major advance in current thinking, with strong implications for formal models of associative learning. It implies that the performance function that maps from the neurobiological realization of the subject’s expectation is not an unknown function. On the contrary, it’s the simplest possible function, the scalar function. That is a powerful constraint on brain-behavior linkage hypotheses, such as the many hypothesized relations between mesolimbic dopamine activity and the expectation that drives responding in Pavlovian conditioning (Berridge, 2012; Jeong et al., 2022; Y.  Niv, Daw, Joel, & Dayan, 2007; Y. Niv & Schoenbaum, 2008).
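
      The idea of summing maximum likelihoods over all possible data sets can be made concrete with a toy case. The sketch below is not the authors' computation and does not involve their regression models. It assumes binary data sets of length n, a Bernoulli model with one free probability, and a comparison model whose probability is fixed in advance at 0.5; all function names are hypothetical.

```python
import math


def sum_max_likelihoods_free(n):
    """Sum, over all 2**n binary sequences, of the maximized Bernoulli likelihood.

    Sequences with the same number of successes k share the same maximum
    likelihood, so the sum is grouped by k with a binomial coefficient.
    """
    total = 0.0
    for k in range(n + 1):
        p_hat = k / n  # maximum-likelihood estimate for this data set
        # Python evaluates 0.0 ** 0 as 1.0, which is the convention needed here.
        max_lik = (p_hat ** k) * ((1.0 - p_hat) ** (n - k))
        total += math.comb(n, k) * max_lik
    return total


def sum_max_likelihoods_fixed(n, p=0.5):
    """Same sum for a model with no free parameter (p fixed before seeing data)."""
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(n + 1))


for n in (1, 10, 100):
    free_bits = math.log2(sum_max_likelihoods_free(n))
    fixed_bits = math.log2(sum_max_likelihoods_fixed(n))
    print(f"n = {n:3d}: free model {free_bits:.2f} bits, fixed model {fixed_bits:.2f} bits")
```

      A model with nothing to adjust assigns probabilities that sum to 1 over all data sets, so its complexity is 0 bits. A model that can adapt to each data set accumulates a larger sum, and the logarithm of that sum is the wiggle room that the MDL approach charges against it.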

      The data in Figures 4 and 5 are taken from the last 5 sessions of training. The exact number of sessions was somewhat arbitrary but was chosen to meet two goals: (1) to capture asymptotic responding, which is why we restricted this to the end of the training, and (2) to obtain a sufficiently large sample of data to estimate reliably each rat’s response rate. We have checked what the data look like using the last 10 sessions, and can confirm it makes very little difference to the results. We now note this in the revised manuscript. The data for terminal responding by all rats, averaged over both the last 5 sessions and last 10 sessions, can be downloaded from https://osf.io/vmwzr/

      Finally, as noted by the reviewers, the relationship between the contextual rate of reinforcement and ITI responding should also be evident if we had measured context responding prior to introducing the CS. However, there was no period in our experiment when rats were given unsignalled reinforcement (such as is done during “magazine training” in some experiments). Therefore, we could not measure responding based on contextual conditioning prior to the introduction of the CS. This is a question for future experiments that use an extended period of magazine training or “poor positive” protocols in which there are reinforcements during the ITIs as well as during the CSs. The learning rate equation has been shown to predict reinforcements to acquisition in the poor-positive case (Balsam, Fairhurst, & Gallistel, 2006).

      (3) There is a disconnect between the gradual nature of learning shown in Figures 7 and 8 and the information-theoretic model proposed by the authors. To the extent that we understand the model, the animals should simply learn the association once the evidence crosses a threshold (nDKL > threshold) and then produce behavior in proportion to the expected reward rate. If so, why should there be a gradual component of learning as shown in these figures? In terms of the proportional response rule to the rate of rewards, why is it changing as animals go from 10% to 90% of peak response? The manuscript would be greatly strengthened if these results were explained within the authors' conceptual framework. If these results are not anticipated by the authors' conceptual framework, this should be explicitly stated in the manuscript.

      One of us (CRG) has earlier suggested that responding appears abruptly when the accumulated evidence that the CS reinforcement rate is greater than the contextual rate exceeds a decision threshold (C.R. Gallistel, Balsam, & Fairhurst, 2004). The new, more extensive data require a more nuanced view. Evidence about the manner in which responding changes over the course of training is to some extent dependent on the analytic method used to track those changes. We presented two different approaches. The approach shown in Figures 7 and 8 (now 6 and 7), extending that developed by Harris (2022), assumes a monotonic increase in response rate and uses the slope of the cumulative response rate to identify when responding exceeds particular milestones (percentiles of the asymptotic response rate). This analysis suggests a steady rise in responding over trials. Within our theoretical model, this might reflect an increase in the animal’s certainty about the CS reinforcement rate with accumulated evidence from each trial. While this method should be able to distinguish between a gradual change and a single abrupt change in responding (Harris, 2022), it may not distinguish between a gradual change and multiple step-like changes in responding and cannot account for decreases in response rate.

      The other analytic method we used relies on the information-theoretic measure of divergence, the nDkl (Gallistel & Latham, 2023), to identify each point of change (up or down) in the response record. With that method, we discern three trends. First, the onset tends to be abrupt in that the initial step up is often large (an increase in response rate by 50% or more of the difference between its initial value and its terminal value is common, and there are instances where the initial step is to the terminal rate or higher). Second, there is marked within-subject variability in the response rate, characterized by large steps up and down in the parsed response rates following the initial step up, but this variability tends to decrease with further training (there tend to be fewer and smaller steps in both the ITI response rates and the CS response rate as training progresses). Third, the overall trend, seen most clearly when one averages across subjects within groups, is to a moderately higher rate of responding later in training than after the initial rise. We think that the first tendency reflects an underlying decision process whose latency is controlled by diminishing uncertainty about the two reinforcement rates and hence about their ratio. We think that decreasing uncertainty about the true values of the estimated rates of reinforcement is also likely to be an important part of the explanation for the second tendency (decreasing within-subject variation in response rates). It is less clear whether diminishing uncertainty can explain the trend toward a somewhat greater difference in the two response rates as conditioning progresses. It is perhaps worth noting that the distribution of the estimates of the informativeness ratio is likely to be heavy-tailed and have peculiar properties (as witness, for example, the distribution of the ratio of two gamma distributions with arbitrary shape and scale parameters), but we are unable at this time to propound an explanation of the third trend.
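
      To give a sense of how an nDKL statistic accumulates evidence, here is a minimal sketch under a simplifying assumption that is not taken from the manuscript: intervals between events are exponentially distributed, so the divergence of an estimated rate from a reference rate has a closed form. The function names are hypothetical, and the p-value uses the standard large-sample result that twice the log-likelihood ratio is approximately chi-square distributed with 1 degree of freedom. It does not reproduce the authors' parsing of response records.

```python
import math


def n_dkl_exponential(n, rate_hat, rate_ref):
    """n times the KL divergence (in nats) of Exp(rate_hat) from Exp(rate_ref).

    n is the number of intervals on which the estimate rate_hat is based.
    """
    ratio = rate_hat / rate_ref
    return n * (math.log(ratio) + 1.0 / ratio - 1.0)


def approx_p_value(n_dkl):
    """Large-sample tail probability, treating 2 * nDKL as chi-square with 1 df."""
    # P(chi2_1 > 2 * nDKL) = erfc(sqrt(nDKL)); guard against tiny negative rounding.
    return math.erfc(math.sqrt(max(n_dkl, 0.0)))


# Evidence for a hypothetical two-fold difference in rate, as observations accumulate.
for n in (5, 10, 20, 40):
    evidence = n_dkl_exponential(n, rate_hat=2.0, rate_ref=1.0)
    print(f"n = {n:2d}: nDKL = {evidence:.2f} nats, approx p = {approx_p_value(evidence):.4f}")
```

      For a fixed ratio of rates, the statistic grows in proportion to n, so any fixed decision threshold is crossed sooner when the two rates differ more.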

      (4) Page 27, Procedure, final sentence: The magazine responding during the ITI is defined as the 20 s period immediately before CS onset. The range of ITI values (Table 1) always starts as low as 15 s in all 14 groups. Even in the case of an ITI on a trial that was exactly 20 s, this would also mean that the start of this period overlaps with the termination of the CS from the previous trial and delivery (and presumably consumption) of a pellet. It should be indicated whether the definition of the ITI period was modified on trials where the preceding ITI was < 20 s, and if any other criteria were used to define the ITI. Were the rats exposed to the reinforcers/pellets in their home cage prior to acquisition?

      There was an error in the description provided in the original text. The pre-CS period used to measure the ITI responding was 10 s rather than 20 s. There was always at least a 5-s gap between the end of the previous trial and the start of the pre-CS period. The statement about the pre-CS measure has been corrected in the revised manuscript.

      (5) For all the analyses, the exact models that were fit and the software used should be provided. For example, it is not necessarily clear to the reader (particularly in the absence of degrees of freedom) that the model discussed in Figure 3 fits on the individual subject data points or the group medians. Similarly, in Figure 6 there is no indication of whether a single regression model was fit to all the plotted data or whether tests of different slopes for each of the conditions were compared. With regards to the statistics in Figure 6, depending on how this was run, it is also a potential problem that the analyses do not correct for the potentially highly correlated multiple measurements from the same subjects, i.e. each rat provides 4 data points which are very unlikely to be independent observations.

      Details about model fitting have been added to the revision. The question about fitting a single model or multiple models to the data in Figure 6 (now 5) is addressed in response 2 above. In Figure 5, each rat provides 2 behavioural data points (ITI response rate and CS response rate) and 2 values for reinforcement rate (1/C and 1/T). There is a weak but significant correlation between the ITI and CS response rates (r = 0.28, p < 0.01; log transformed to correct for heteroscedasticity). By design, there is no correlation between the log reinforcement rates (r = 0.06, p = .404).

      CONCEPTUAL

      (1) We take the point that where traditional theories (e.g., Rescorla-Wagner) and rate estimation theory (RET) both explain some phenomenon, the explanation in terms of RET may be preferred as it will be grounded in aspects of an animal's experience rather than a hypothetical construct. However, like traditional theories, RET does not explain a range of phenomena - notably, those that require some sort of expectancy/representation as part of their explanation. This being said, traditional theories have been incorporated within models that have the representational power to explain a broader array of phenomena, which makes me wonder: Can rate estimation be incorporated in models that have representational power; and, if so, what might this look like? Alternatively, do the authors intend to claim that expectancy and/or representation - which follow from probabilistic theories in the RW mould - are unnecessary for explanations of animal behaviour?

      It is important for the field to realize that the RW model cannot be used to explain the results of Rescorla’s (Rescorla, 1966; Rescorla, 1968, 1969) contingency-not-pairing experiments, despite what was claimed by Rescorla and Wagner (Rescorla & Wagner, 1972; Wagner & Rescorla, 1972) and has subsequently been claimed in many modelling papers and in most textbooks and reviews (Dayan & Niv, 2008; Y. Niv & Montague, 2008). Rescorla programmed reinforcements with a Poisson process. The defining property of a Poisson process is its flat hazard function; the reinforcements were equally likely at every moment in time when the process was running. This makes it impossible to say when non-reinforcements occurred and, a fortiori, to count them. The non-reinforcements are causal events in the RW algorithm and subsequent versions of it. Their effects on associative strength are essential to the explanations proffered by these models. Non-reinforcements (failures to occur, updates when reinforcement is set to 0, hence also the lambda parameter) can have causal efficacy only when the successes may be predicted to occur at specified times (during “trials”). When reinforcements are programmed by a Poisson process, there are no such times. Attempts to apply the RW formula to reinforcement learning soon foundered on this problem (Gibbon, 1981; Gibbon, Berryman, & Thompson, 1974; Hallam, Grahame, & Miller, 1992; L.J. Hammond, 1980; L. J. Hammond & Paynter, 1983; Scott & Platt, 1985). The enduring popularity of the delta-rule updating equation in reinforcement learning depends on “big-concept” papers that don’t fit models to real data and discretize time into states while claiming to be real-time models (Y. Niv, 2009; Y. Niv, Daw, & Dayan, 2005).
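
      The flat hazard function mentioned above follows directly from the exponential distribution of waiting times in a Poisson process. As a short worked derivation, writing the rate of the process as r (a symbol introduced here for illustration, chosen to avoid confusion with the lambda parameter of the RW model) and the waiting time to the next reinforcement as T:

```latex
f(t) = r\,e^{-rt}, \qquad
S(t) = \Pr(T > t) = e^{-rt}, \qquad
h(t) = \frac{f(t)}{S(t)} = r \quad \text{for all } t \ge 0 .
```

      Since the hazard h(t) does not depend on t, no moment is more likely than any other to contain a reinforcement, which is why there is no identifiable time at which a reinforcement can be said to have failed to occur.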

      The information-theoretic approach to associative learning, which sometimes historically travels as RET (rate estimation theory), is unabashedly and inescapably representational. It assumes a temporal map and arithmetic machinery capable in principle of implementing any implementable computation. In short, it assumes a Turing-complete brain. It assumes that whatever the material basis of memory may be, it must make sense to ask of it how many bits can be stored in a given volume of material. This question is seldom posed in associative models of learning, nor by neurobiologists committed to the hypothesis that the Hebbian synapse is the material basis of memory. Many, including the new Nobelist Geoffrey Hinton, would agree that the question makes no sense. When you assume that brains learn by rewiring themselves rather than by acquiring and storing information, it makes no sense.

      When a subject learns a rate of reinforcement, it bases its behavior on that expectation, and it alters its behavior when that expectation is disappointed. Subjects also learn probabilities when they are defined. They base some aspects of their behavior on those expectations, making computationally sophisticated use of their representation of the uncertainties (Balci, Freestone, & Gallistel, 2009; Chan & Harris, 2019; J. A. Harris, 2019; J.A. Harris & Andrew, 2017; J. A. Harris & Bouton, 2020; J. A. Harris, Kwok, & Gottlieb, 2019; Kheifets, Freestone, & Gallistel, 2017; Kheifets & Gallistel, 2012; Mallea, Schulhof, Gallistel, & Balsam, 2024 in press).

      (2) The discussion of Rescorla's (1967) and Kamin's (1968) findings needs some elaboration. These findings are already taken to mean that the target CS in each design is not informative about the occurrence of the US - hence, learning about this CS fails. In the case of blocking, we also know that changes in the rate of reinforcement across the shift from stage 1 to stage 2 of the protocol can produce unblocking. Perhaps more interesting from a rate estimation perspective, unblocking can also be achieved in a protocol that maintains the rate of reinforcement while varying the sensory properties of the US (Wagner). How does rate estimation theory account for these findings and/or the demonstrations of trans-reinforcer blocking (Pearce-Ganesan)? Are there other ways that the rate estimation account can be distinguished from traditional explanations of blocking and contingency effects? If so, these would be worth citing in the discussion. More generally, if one is going to highlight seminal findings (such as those by Rescorla and Kamin) that can be explained by rate estimation, it would be appropriate to acknowledge findings that challenge the theory - even if only to note that the theory, in its present form, is not all-encompassing. For example, it appears to me that the theory should not predict one-trial overshadowing or the overtraining reversal effect - both of which are amenable to discussion in terms of rates.

      I assume that the signature characteristics of latent inhibition and extinction would also pose a challenge to rate estimation theory, just as they pose a challenge to Rescorla-Wagner and other probability-based theories. Is this correct?

      The seemingly contradictory evidence of unblocking and trans-reinforcer blocking by Wagner and by Pearce and Ganesan cited above will be hard for any theory to accommodate. It will likely depend on what features of the US are represented in the conditioned response.

      RET predicts one-trial overshadowing. Because the theory has no free parameters, and hence no wiggle room, anyone may verify this prediction in a scientific programming language. Overtraining reversal effects appear to depend on aspects of the subjects’ experience other than the rate of reinforcement. It seems unlikely that RET can proffer an explanation of them.

      Various information-theoretic calculations give pretty good quantitative fits to the relatively few parametric studies of extinction and the partial-reinforcement extinction effect (see Gallistel, 2012, Figs 3 & 4; Wilkes & Gallistel, 2016, Fig 6; and Gallistel, 2025, under review, Fig 6). The information-theoretic approach has not been applied to latent inhibition, in part for want of parametric data. However, clearly one should not attribute a negative rate to a context in which the subject had never been reinforced. An explanation, if it exists, would have to turn on the effect of that long period on initial rate estimates AND on evidence of a change in rate, as of the first reinforcement.

      Recommendations for authors:

      MINOR POINTS

      (1) It is not clear why Figure 3C is presented but not analyzed, and why the data presented in Figure 4 to clarify the spread of the distribution of the data observed across the plots in Figure 3 uses the data from Figure 3C. This would seem like the least representative data to illustrate the point of Figure 4. It also appears that the data plotted in Figure 4 corresponds to Figure 3A and 3B rather than the odds 10:1 data indicated in the text.

      Figure 3 has changed as already described. The data previously plotted in Figure 4 are now shown in Figure 3B and correspond to those plotted in Figure 3A.

      (2) Log(T) was not correlated with trials to criterion. If trials to criterion is inversely proportional to log(C/T) and C is uncorrelated with T, shouldn't trials to criterion be correlated with log(T)? Is this merely a matter of low statistical power?

      Yes. There is a small, but statistically non-significant, correlation between log(T) and trials to criterion, r = 0.35, p = .22. That correlation drops to .08 (p = .8) after factoring out log(C/T), which demonstrates that the weak correlation between log(T) and trials to criterion is based on the correlation between log(T) and log(C/T).
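
      The response does not say how log(C/T) was factored out. Assuming it was a standard first-order partial correlation, the calculation can be sketched as below. The function name is hypothetical, and the two correlations involving log(C/T) in the example are invented for illustration, since they are not reported in the text.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# x = log(T), y = trials to criterion, z = log(C/T).
r_xy = 0.35   # reported in the response
r_xz = -0.40  # hypothetical: not reported
r_yz = -0.80  # hypothetical: not reported
print(f"partial r = {partial_correlation(r_xy, r_xz, r_yz):.2f}")
```

      When the product of the two correlations with z is close to the raw correlation, the partial correlation approaches zero, which is the pattern the response describes.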

      (3) The rationale for the removal of the high information condition samples in the Fig 8 "Slope" plot appears to be weak. Can the authors justify this choice better? If all data are included, the relationship is clearly different from that shown in the plot.

      We have now reported correlations that include those 3 groups but noted that the correlations are largely driven by the much lower slope values of those 3 groups which is likely an artefact of their smaller number of trials. We use this to justify a second set of correlations that excludes those 3 groups.

      (4) The discussion states that there is at most one free parameter constrained by the data - the constant of proportionality for response rate. However, there is also another free parameter constrained by data-the informativeness at which expected trials to acquisition is 1.

      I think this comment is referring to two different sets of data. The constant of proportionality of the response rate refers to the scalar relationship between reinforcement rate and terminal response rate shown in Figure 5. The other parameter, the informativeness when trials to acquisition equals 1, describes the intercept of the regression line in Figure 1 (and 3).

      (5) The authors state that the measurement of available information is not often clear. Given this, how is contingency measurable based on the authors' framework?

      (6) Based on the variables provided in Supplementary File 3, containing the acquisition data, we were unable to reproduce the values reported in the analysis of Figure 3.

      Figure 3 has changed, using new criteria for trials to acquisition that attempt to match the criterion used by Gibbon and Balsam. The data on which these figures are based have been uploaded to OSF.

      GRAPHICAL AND TYPOGRAPHICAL

      (1) Y-axis labels in Figure 1 are not appropriately placed. 0 is sitting next to 0.1. 0 should sit at the bottom of the y-axis.

      If this comment refers to the 0 sitting above an arrow in the top right corner of the plot, this is not misaligned. The arrow pointing to zero is used to indicate that this axis approaches zero in the upward direction. 0 should not be aligned to a value on the axis since a learning rate of zero would indicate an infinite number of learning trials. The caption has been edited to explain this more clearly.

      (2) Typo, Page 6, Final Paragraph, line 4. "Fourteen groups of rats were trained with for 42 session"

      Corrected. Thank you.

      (3) Figure 3 caption: Typo, should probably be "Number of trials to acquisition"?

      This change has now been made. The axis shows reinforcements to acquisition to be consistent with Gibbon and Balsam, but trials and number of reinforcements are identical in our 100% reinforcement schedule.

      (4) Typo Page 17 Line 1: "Important pieces evidence about".

      Corrected. Thank you.

      (5) Consider consistent usage of symbols/terms throughout the manuscript (e.g. Page 22, final paragraph: "iota = 2" is used instead of the corresponding symbol that has been used throughout).

      Changed.

      (6) Typo Page 28, Paragraph 1, Line 9: "We used a one-sample t-test using to identify when this".

      This section of text has been changed to reflect the new analysis used for the data in Figure 3.

      (7) Typo Page 29, Paragraph 1, Line 2: "problematic in cases where one of both rates are undefined" either typo or unclear phrasing.

      “of” has been corrected to “or”

      (8) Typo Page 30: Equation 3 appears to have an error and is not consistent with the initial printing of Equation 3 in the manuscript.

      The typo in the initial expression of Eq 3 (page 23) has been corrected.

      (9) Typo Page 33, Line 5: "Figures 12".

      Corrected.

      (10) Typo Page 34, Line 10: "and the 5 the increasingly"? Should this be "the 5 points that"?

      Corrected.

      (11) Typo Page 35, Paragraph 2: "estimate of the onset of conditioned is the trial after which".

      Corrected.

      (12) Clarify: Page 35, final paragraph: it is stated that four-panel figures are included for each subject in the Supplementary files, but each subject has a six-panel figure in the Supplementary file.

      The text now clarifies that the 4-panel figures are included within the 6-panel figures in the Supplementary materials.

      (13) It is hard to identify the different groups in Figure 2 (Plot 15).

      The figure is simply intended to show that responding across seconds within the trial is relatively flat for each group. Individuation of specific groups is not particularly important.

      (14) It appears that the numbering on the y-axis is misaligned in Figure 2 relative to the corresponding points on the scale (unless I have misunderstood these values and the response rate measure to the ITI can drop below 0?).

      The numbers on the Y axes had become misaligned. That has now been corrected.

      (15) Please include the data from Figure 3A in the spreadsheet supplementary file 3. If it has already been included as one of the columns of data, please consider a clearer/consistent description of the relevant column variable in Supplementary File 1.

      The data from Figure 3 are now available from the linked OSF site, referenced in the manuscript.

      (16) Errors in supplementary data spreadsheets such that the C/T values are not consistent with those provided in Table 1 (C/T values of 4.5, 54, 180, and 300 are slightly different values in these spreadsheets). A similar error/mismatch appears to have occurred in the C/T labels for Figures (e.g. Figure 10) and the individual supplementary figures.

      The C/T values on the figures in the supplementary materials have been corrected and are now consistent with those in Table 1.

      (17) Currently the analysis and code provided at https://osf.io/vmwzr/ are not accessible without requesting access from the author. Please consider making these openly available without requiring a request for authorization. As such, a number of recommendations made here may already have been addressed by the data and code deposited on OSF. Apologies for any redundant recommendations.

      Data and code are now available at the OSF site, which has been made public and no longer requires an access request.

      (18) Please consider a clearer and more specific reference to supplementary materials. Currently, the reader is required to search through 4 separate supplementary files to identify what is being discussed/referenced in the text (e.g. Page 18, final line: "see Supplementary Materials" could simply be "see Figure S1").

      We have added specific page numbers in references to the Supplementary Materials.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript describes a novel magnetic steering technique to target human adipose derived mesenchymal stem cells (hAMSC) or induced pluripotent stem cells to the TM (iPSC-TM). The authors show that delivery of the stem cells lowered IOP, increased outflow facility, and increased TM cellularity.

      Strengths:

      The technique is novel and shows promise as a novel therapeutic to lower IOP in glaucoma. hAMSC are able to lower IOP below the baseline as well as increase outflow facility above baseline with no tumorigenicity. These data will have a positive impact on the field and will guide further research using hAMSC in glaucoma models.

      Weaknesses:

      The transgenic mouse model of glaucoma the authors used did not show ocular hypertensive phenotypes at 6-7 months of age as previously reported. Therefore, if there is no pathology in these animals the authors did not show a restoration of function, but rather a decrease in pressure below normal IOP.

      We appreciate the reviewer’s feedback and agree with the statement of weakness. Accordingly, we have revised the language to improve clarity. Specifically, all references to "restoration of IOP" or "restoration of conventional outflow function" have been replaced with more precise phrases, in the following locations: 

      • lines 2-3 (title): Magnetically steered cell therapy for reduction of intraocular pressure as a treatment strategy for open-angle glaucoma

      • lines 36-8 (abstract): We observed a 4.5 [3.1, 6.0] mmHg or 27% reduction in intraocular pressure (IOP) for nine months after a single dose of only 1500 magnetically-steered hAMSCs, explained by increased conventional outflow facility and associated with higher TM cellularity.

      • lines 45-6 (one-sentence summary): A novel magnetic cell therapy provided effective intraocular pressure reduction in mice, motivating future translational studies.

      • lines 123-4 (introduction): Despite the absence of ocular hypertension in our MYOC<sup>Y437H</sup> mice, our data demonstrate sustained IOP lowering and a significant benefit of magnetic cell steering in the eye, particularly for hAMSCs, strongly indicating further translational potential.

      • line 207 (results): The observed reductions in IOP and increases in outflow facility after delivery of both cell types suggested functional changes in the conventional outflow pathway.

      • lines 509-10 (discussion): In summary, this work shows the effectiveness of our novel magnetic TM cell therapy approach for long-term IOP reduction through functional changes in the conventional outflow pathway.

      It is very important to note that at the 23rd annual Trabecular Meshwork Study Club meeting (San Diego, December 2024), Dr. Zode, the lead author of reference 26 originally describing the transgenic myocilin mouse model, announced during his talk that this model no longer demonstrates the glaucomatous phenotype in his hands, which incidentally has motivated him to create a new, CRISPR MYOC mouse model. Dr. Zode also stated that he was uncertain of the reason for this loss of phenotype. His observation is consistent with our report. However, other investigators continue to observe the desired phenotype in their colonies of this mouse (Dr. Wei Zhu, personal communication). Continued use of this mouse model should therefore be approached with caution. 

      Reviewer #2 (Public review):

      Summary:

      This observational study investigates the efficacy of intracameral injected human stem cells as a means to re-functionalize the trabecular meshwork for the restoration of intraocular pressure homeostasis. Using a murine model of glaucoma, human adipose-derived mesenchymal stem cells are shown to be biologically safer and functionally superior at eliciting a sustained reduction in intraocular pressure (IOP). The authors conclude that the use of human adipose-derived mesenchymal stem cells has the potential for long-term treatment of ocular hypertension in glaucoma.

      Strengths:

      A noted strength is the use of a magnetic steering technique to direct injected stem cells to the iridocorneal angle. An additional strength is the comparison of efficacy between two distinct sources of stem cells: human adipose-derived mesenchymal vs. induced pluripotent cell derivatives. Utilizing both in vivo and ex vivo methodology coupled with histological evidence of introduced stem cell localization provides a consistent and compelling argument for a sustainable impact exogenous stem cells may have on the refunctionalization of a pathologically compromised TM.

      Weaknesses:

      A noted weakness of the study, as pointed out by the authors, includes the unanticipated failure of the genetic model to develop glaucoma-related pathology (elevated IOP, TM cell changes). While this is most unfortunate, it does temper the conclusion that exogenous human adipose derived mesenchymal stem cells may restore TM cell function. Given that TM cell function was not altered in their genetic model, it is difficult to say with any certainty that the introduced stem cells would be capable of restoring pathologically altered TM function. A restoration effect remains to be seen. 

      We acknowledge that the phrase “restoration of TM function” is not fully supported by our results, given the absence of ocular hypertension in our animal model. Accordingly, we have revised the language to more precisely describe our findings. For specific details regarding these changes, please refer to our response to Reviewer 1’s public comments above.

      Another noted complication to these findings is the observation that sham intracameral-injected saline control animals all showed elevated IOP and reduced outflow facility, compared to WT or Tg untreated animals, which allowed for more robust statistically significant outcomes. Additional comments/concerns that the authors may wish to address are elaborated in the Private Review section.

      We agree that sham-injected animals tended to have higher average IOPs than transgenic animals in our study. However, these differences did not reach statistical significance and therefore remain inconclusive. Further, an increase in IOP following placebo injection has been previously reported (Zhu et al., 2016). 

      Prompted by the Referee’s comments and also a private comment from Referee 1, we further investigated this effect by analyzing IOP in uninjected contralateral eyes at the mid-term time point and comparing the IOPs in these eyes to other cohorts, as now presented as additional data in Supplementary Tables 1 and 2 and Supplementary Figure 4 (see below). In brief, the uninjected contralateral transgenic eyes (10 months old) showed an IOP of 16.5 [15.9, 17.1] mmHg, which was intermediate between the IOP levels of the 6–7-month-old Tg group (15.4 [14.7, 16.1] mmHg) and the sham group (16.9 [15.5, 18.2] mmHg). However, none of these differences reached statistical significance. Additionally, we cannot rule out potential contralateral effects induced by the injections.

      Regarding the best way to assess the effect of cell treatment, we feel very strongly that the most relevant IOP comparison is between cell-injected eyes and control (vehicle)-injected eyes, since this provides the most direct accounting for the effects of injection itself on IOP. Other comparisons, such as WT or untreated Tg eyes vs. cell-treated eyes, are interesting but harder to interpret. However, in response to the referee’s comment, we have added comparisons between cell-treated groups and untreated Tg eyes to Table 2, adjusting the post-hoc corrections accordingly. All hAMSC-treated groups show a statistically significant decrease in IOP even compared to untreated Tg eyes, while the iPSC-TM-treated groups fail to reach such significance.

      The following changes were made to the manuscript:

      Lines 326 et seq.: Eyes subjected to saline injection exhibited marginally higher IOPs and lower outflow facilities on average, in comparison to the transgenic animals at baseline. However, due to the lack of statistical significance in these differences and the inherent age difference between the saline-injected animals and the non-injected controls at baseline, no conclusive inference can be drawn regarding the effect of saline injection. To investigate this phenomenon further, we also analyzed IOPs in uninjected contralateral eyes at the midterm time point (Supplementary Tables 1 and 2, Supplementary Figure 4). The uninjected contralateral transgenic eyes (10 months old) showed an IOP of 16.5 [15.9, 17.1] mmHg, which was intermediate between the IOP levels of the 6–7-month-old Tg group (15.4 [14.7, 16.1] mmHg) and the sham-injected group (16.9 [15.5, 18.2] mmHg). However, none of these differences reached statistical significance. Of note, contralateral hypertension has been previously reported after subconjunctival and periocular injection of dexamethasone-loaded nanoparticles (34), and we similarly cannot definitively rule out potential contralateral effects induced by our stem cell injections. Thus, we cannot draw any definite conclusions from these additional IOP comparisons at this time.

      Reviewer #3 (Public review):

      Summary:

      The purpose of the current manuscript was to investigate a magnetic cell steering technique for efficiency and tissue-specific targeting, using two types of stem cells, in a mouse model of glaucoma. As the authors point out, trabecular meshwork (TM) cell therapy is an active area of research for treating elevated intraocular pressure as observed in glaucoma. Thus, further studies determining the ideal cell choice for TM cell therapy are warranted. The experimental protocol of the manuscript involved the injection of either human adipose-derived mesenchymal stem cells (hAMSCs) or induced pluripotent cell derivatives (iPSC-TM cells) into a previously reported mouse glaucoma model, the transgenic MYOC<sup>Y437H</sup> mice and wild-type littermates, followed by magnetic cell steering. Numerous outcome measures were assessed and quantified, including IOP, outflow facility, TM cellularity, retention of stem cells, and the inner wall BM of Schlemm's canal.

      Strengths:

      All of these analyses were carefully carried out and appropriate statistical methods were employed. The study has clearly shown that the hAMSCs are the cells of choice over the iPSC-TM cells, the latter of which caused tumors in the anterior chamber. The hAMSCs were shown to be retained in the anterior segment over time and this resulted in increased cellular density in the TM region and a reduction in IOP and outflow facility. These are all interesting findings and there is substantial data to support it.

      Weaknesses:

      However, where the study falls short is in the MYOC<sup>Y437H</sup> mouse model of glaucoma that was employed. The authors clearly state that a major limitation of the study is that this model, in their hands, did not exhibit glaucomatous features as previously reported, such as a significant increase in IOP, which was part of the overall purpose of the study. The authors state that it is possible that "the transgene was silenced in the original breeders". The authors did not show PCR, western blot, or immuno of angle tissue of the tg to determine transgenic expression (increased expression of MYOC was shown in the angle tissue of the transgenics in the original paper by Zode et al, 2011). This should be investigated given that these mice were rederived. Thus, it is clearly possible that these are not transgenic mice.

      All MYOC mice that were used in this study were genotyped and confirmed to carry the transgene, as noted in the original version of the paper (see lines 590-2). However, the transgene seems not to have been active, based on the lack of ocular hypertension as well as the lack of differences in supporting endpoints such as outflow facility and TM cellularity. While it would have been possible to carry out the recommended assays to investigate the root cause of this loss of phenotype, this was not an objective of our study. We therefore focus here simply on communicating the observed loss of phenotype to readers. We also refer the referee to the final paragraph of our response to Referee 1.

      If indeed they are transgenics, the authors may want to consider the fact that in the Zode paper, the most significant IOP elevation in the mutant mice was observed at night and thus this could be examined by the authors. 

      This is a good point. However, while the dark-phase IOP does exhibit a distinctly larger elevation (as previously observed in hypertonic saline sclerosis), Zode et al. also reported a notable 3 mmHg IOP increase during the light phase. The complete absence of such daytime (light phase) IOP elevation in our animals diminished our enthusiasm for pursuing dark-phase IOP measurements.

      Other glaucomatous features of these mice could also have been investigated such as loss of RGCs, to further determine their transgenic phenotype. 

      We agree that these other phenotypes could be studied, but in the absence of any detectable IOP elevation (and thus lack of mechanical insult on retinal ganglion cell (RGC) axons), loss of RGCs is extremely unlikely. We also note that the loss of RGCs in the Myocilin model remains a subject of controversy. For example, despite a significant increase in IOP (>10 mmHg) in this model across four mouse strains, three, including C57BL/6J, did not exhibit any signs of optic nerve damage (McDowell et al., 2012). In contrast, Zhu et al. observed considerable nerve damage in this model, which was reversed following iPSC-TM cell transplantation (Zhu et al., 2016). Given these conflicting findings, we directed our efforts toward outcome measures directly related to aqueous humor dynamics.

      Finally, while increased cellular density in the TM region was observed, proliferative markers could be employed to determine if the transplanted cells are proliferating.

      We agree that identifying the source of the increased trabecular meshwork (TM) cellularity we observed is interesting and we plan to pursue that in future studies. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The sham-injected transgenic animals showed elevated IOP 3-4 weeks after the baseline measurements in the transgenic mice. The authors justify this may be due to the increase in age in these animals. However, this seems unlikely due to the short duration of time between measurement of the baseline IOP and the Short time point (3-4 weeks). The authors do not provide IOP data for any WT sham injected eyes or naïve Tg eyes at these time points. These data are essential to determine if the elevation is due to the sham injection, age, or the transgene. Could it be that the IOP in this cohort of Tg mice didn't increase until 7-8 months of age instead of 6-7 months of age? The methods state only unilateral injections of the stem cells were done so it is assumed the contralateral eye was uninjected. What was the IOP in these eyes? These data would clarify the confusion in the data from sham-injected animals compared to baseline (naive) measurements.

      We agree that the average IOP in saline-injected groups is higher than in WT or non-treated Tg mice, although the difference is inconclusive due to a lack of statistical significance. It is important to note, however, that this difference is subtle and not comparable to the 3 mmHg light-phase IOP elevation previously observed in this model (Zode et al., 2011). 

      We appreciate the reviewer’s suggestion to include IOP data from the contralateral uninjected eyes, and we have now provided this information along with the comparative statistics in the supplementary materials. Additional details can be found in our response to a similar comment from Reviewer 2’s public review. In summary, the IOP difference in contralateral non-injected ten-month-old transgenic eyes was even smaller than in the original Tg group. IOP elevation following saline injection in mice has been reported previously (Zhu et al., 2016). As a potential confounding factor, we highlight possible contralateral effects of the injection itself (which is why we initially did not analyze IOP in the contralateral eyes).

      The hAMSC-treated eyes appear to lower IOP even from baseline (although stats were only provided compared to the sham-injected eyes, which as stated above appear to have increased).

      However, the iPSC-TM-treated eyes had IOPs equal to that of the baseline measurements taken 3 weeks prior. The significance is coming from the "sham-treated" eyes which had elevated IOPs. The controls listed above should be included to make these conclusions.

      The reviewer makes an astute observation. Please refer to our response to a similar observation by Reviewer 2 under public reviews, where we provide and discuss the comparative statistics noted by the reviewer. However, we feel very strongly that the most relevant IOP comparison is between cell-injected eyes and control-injected eyes. 

      If the transgenic mouse model truly did not have a phenotype, then the authors are testing the ability of the stem cells to lower IOP from baseline normal pressures. Therefore, the authors are not "restoring function of the conventional outflow pathway" as there is no damage to begin with. The language in the manuscript should be corrected to reflect this if the transgenics have no phenotype.

      We agree and have adjusted the language accordingly. For further details, please refer to our response to your public review.

      The authors noted in the iPSC-TM-treated eyes there was a high rate of tumorigenicity. If the magnetic steering of these cells is specific and targeted to the TM, why do the tumors form near the central iris?

      While magnetic steering is more specific to the trabecular meshwork (TM) than previously used approaches (Bahrani Fard et al., 2023), it is not perfect, and a modest amount of off-target delivery to the iris, including its central portion, still occurs. Apparently, it took only a few mis-directed iPSC-TM cells to lead to tumors in this work, which is a serious concern for future translational approaches.

      Reviewer #2 (Recommendations for the authors):

      (1) It appears that mice were injected unilaterally (Line 590). I may have missed this, but was the companion un-injected eye analyzed in this study? If not analyzed, was there a confounding concern or limitation that necessitated omitting this possible control option?

      Contralateral effects, such as hypertension in the untreated eye after subconjunctival and periocular injection of dexamethasone-loaded nanoparticles, have previously been reported in the literature (Li et al., 2019) and also reported anecdotally by other leaders in the field to the senior authors, which is why we did not initially analyze contralateral eyes in this study. However, prompted by this comment and others, we have now included the IOP measurements for contralateral uninjected ten-month-old transgenic eyes in the supplementary materials. For further details, please refer to our response to your public review.

      (2) Were all these mice the same gender? Would gender be expected to alter the findings of this study?

      Animals of both sexes were randomly chosen and included in the study. We added the following statement to the Materials and Methods section (line 530): After breeding and genotyping, mice, regardless of sex, were maintained to age 6-7 months, when transgenic animals were expected to have developed a POAG phenotype.

      (3) As noted in the public review, the use of PBS for a control seems to have resulted in a slight elevation in IOP (Figure 2) as well as a reduction in outflow facility (Figure 3B) when compared to WT or Tg mice. Was this difference statistically significant? 

      The differences between the sham (saline)-injected groups at any time point and untreated Tg mice did not reach statistical significance for IOP, facility, or TM cellularity and, for facility, did not even show clear trends. For example, WT mice had, on average, 0.2 mmHg higher IOP and 0.6 nl/min/mmHg greater facility than the Tg group. Meanwhile, on a similar scale, the long-term sham group exhibited 0.4 nl/min/mmHg higher facility compared to the Tg group. As the statistical tests indicate, these differences should be interpreted more as noise than as meaningful signal.

      If so, then it should be noted as to whether the observed decrease in IOP following stem cell injection remained statistically significant when compared to these un-injected control animals. If significance was lost, then this should be appropriately noted and discussed. It is not apparently obvious why sham controls should have elevated IOP. This is a design and statistical concern.

      Please refer to our response to a similar observation by Reviewer 1. We believe that comparing the treatment (cell suspension in saline) with its age-matched vehicle (saline) is the appropriate approach which maintains rigor by most directly accounting for the effects of injection. 

      (4) The tonicity of the PBS used as a vehicle control was not stated and I did not see within the methods whether the stem cells were suspended using this same PBS vehicle. I assume isotonic phosphate buffered saline was used and that the stem cells were resuspended using the same sterile PBS. 

      Thanks for catching this. We added “sterile PBS (1X, Thermo Fisher Scientific, Waltham, MA)” to the Methods section of the manuscript (line 567). 

      With regards to using PBS as an injection control, I wonder if a better comparable control might have been to use mesenchymal stem cells that were rendered incapable of proliferating prior to intracameral injection. This, of course, addresses the unexplained mechanism(s) by which mesenchymal stem cells elicit a decrease in IOP.

      This is an interesting idea and represents another level of control. However, we explicitly chose not to use non-proliferating hAMSCs as a control, for several reasons. First, a saline injection is the simplest control and, in this initial study with multiple groups, we did not feel another experimental group should be added. Second, this control would not rule out paracrine effects from injected cells, which our data suggested are an important effect. Third, rendering injected cells truly non-proliferative could introduce unwanted/unknown phenotypes in these cells that would need to be carefully characterized. That being said, if an efficient method could be developed to render an entire population of these cells irreversibly non-proliferating, the reviewer’s suggestion would be worth pursuing to better understand the mechanism of TM cell therapies.

      (5) As noted in Figure 4C, TM cellular density as quantified was not altered in the sham control, so a loss of cellular density cannot explain the elevated IOP with this group. Injecting viable (not determined?) mesenchymal stem cells did show, over the short term, a noted increase in TM cellular density.

      Thank you for noting this. We agree that changes in cell density do not explain the mild IOP elevation in the sham group. As the referee certainly is aware, there are multiple reasons that IOP can be elevated (changes in trabecular meshwork extracellular matrix, changes in trabecular meshwork stiffness) that are not necessarily related to cell density.  Since we do not know definitively the cause of this mild elevation, we would prefer to not speculate about it in the manuscript. 

      Thanks for pointing out our omission of a statement about injected cell viability. We have now included the following statement in the Materials and Methods section (564-566): “For all the experiments where animals received hAMSC, cell count and >90% viability was verified using a Countess II Automated Cell Counter (Thermo Fisher Scientific, Waltham, MA).”

      I'm confused, as clearly stated (Lines 431-432), mesenchymal stem cells accumulated close to, but not within, the TM. How is it that TM cellular density increased if these stem cells did not enter the TM? The authors may wish to clarify this distinction. Given that mesenchymal stem cells did not increase the risk of tumorigenicity, do the authors have any evidence that these cells actually proliferated post-injection, or did they undergo senescence, thereby displaying a senescence-associated secretory phenotype as a source of paracrine support?

      As the reviewer correctly noted, our observations show that hAMSCs primarily accumulated close to, but outside, the TM (likely caught up in the pectinate ligaments). Based on observations of increased TM cellularity, we think that the most likely explanation of these findings is paracrine signaling, as the reviewer suggests and which was discussed at length in the original version of the manuscript (lines 453-477). 

      We agree that, despite observing little signal from hAMSCs within the TM, labeling with proliferation markers (e.g., Ki-67) and searching for co-localization with exogenous cells, and/or labeling for senescence markers would have provided more mechanistic information. This is an excellent topic for future study, which we plan to pursue, but was outside the scope of this study. 

      (6) As noted in the public review, I think it is a bit of a stretch to even suggest that the findings of this study support stem cell restoration of TM function given that the model apparently did not produce TM cell dysfunction as anticipated. A restoration effect remains to be seen.

      We agree and have adjusted the language accordingly. For further details, please refer to our response to Reviewer 1’s public comment.

      Reviewer #3 (Recommendations for the authors):

      (1) Show PCR, western blot, or immuno of angle tissue of the MYOC tg to confirm transgenic expression.

      (2) Examine the IOP of mice at night.

      (3) Investigate other glaucomatous features in the mice to determine if they have any of the transgenic phenotypes previously reported.

      (4) Examine proliferative markers in the TM region of angles injected with stem cells.

      Please see our responses to all four of these comments in the public section.

      Bibliography (for this response letter only)

      Bahrani Fard, M.R., Chan, J., Sanchez Rodriguez, G., Yonk, M., Kuturu, S.R., Read, A.T., Emelianov, S.Y., Kuehn, M.H., Ethier, C.R., 2023. Improved magnetic delivery of cells to the trabecular meshwork in mice. Exp. Eye Res. 234, 109602. https://doi.org/10.1016/j.exer.2023.109602

      Li, G., Lee, C., Agrahari, V., Wang, K., Navarro, I., Sherwood, J.M., Crews, K., Farsiu, S., Gonzalez, P., Lin, C.-W., Mitra, A.K., Ethier, C.R., Stamer, W.D., 2019. In vivo measurement of trabecular meshwork stiffness in a corticosteroid-induced ocular hypertensive mouse model. Proc. Natl. Acad. Sci. U. S. A. 116, 1714–1722. https://doi.org/10.1073/pnas.1814889116

      Zhu, W., Gramlich, O.W., Laboissonniere, L., Jain, A., Sheffield, V.C., Trimarchi, J.M., Tucker, B.A., Kuehn, M.H., 2016. Transplantation of iPSC-derived TM cells rescues glaucoma phenotypes in vivo. Proc. Natl. Acad. Sci. 113, E3492–E3500.

      Zode, G.S., Kuehn, M.H., Nishimura, D.Y., Searby, C.C., Mohan, K., Grozdanic, S.D., Bugge, K., Anderson, M.G., Clark, A.F., Stone, E.M., Sheffield, V.C., 2011. Reduction of ER stress via a chemical chaperone prevents disease phenotypes in a mouse model of primary open angle glaucoma. J. Clin. Invest. 121, 3542–3553. https://doi.org/10.1172/JCI58183

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1:

      The authors attempted to replicate previous work showing that counterconditioning leads to more persistent reduction of threat responses, relative to extinction. They also aimed to examine the neural mechanisms underlying counterconditioning and extinction. They achieved both of these aims and were able to provide some additional information, such as how counterconditioning impacts memory consolidation. Having a better understanding of which neural networks are engaged during counterconditioning may provide novel pharmacological targets to aid in therapies for traumatic memories. It will be interesting to follow up by examining the impact of varying amounts of time between acquisition and counterconditioning phases, to enhance replicability to real-world therapeutic settings.

      Major strengths

      · This paper is very well written and attempts to comprehensively assess multiple aspects of counterconditioning and extinction processes. For instance, the addition of memory retrieval tests is not core to the primary hypotheses but provides additional mechanistic information on how episodic memory is impacted by counterconditioning. This methodical approach is commonly seen in animal literature, but less so in human studies.

      · The Group x Cs-type x Phase repeated measure statistical tests with 'differentials' as outcome variables are quite complex, however, the authors have generally done a good job of teasing out significant F test findings with post hoc tests and presenting the data well visually. It is reassuring that there is a convergence between self-report data on arousal and valence and the pupil dilation response. Skin conductance is a notoriously challenging modality, so it is not too concerning that this was placed in the supplementary materials. Neural responses also occurred in logical regions with regard to reward learning.

      · Strong methodology with regards to neuroimaging analysis, and physiological measures.

      · The authors are very clear on documenting where there were discrepancies from their pre-registration and providing valid rationales for why.

      We thank reviewer 1 for the positive feedback and for pointing out the strengths of our work. We agree that future research should investigate varying times between acquisition and counterconditioning to assess its success in real-life applications.

      Major Weaknesses

      (1) The statistics showing that counterconditioning prevents differential spontaneous recovery are the weakest p values of the paper (and using one-tailed tests, although this is valid due to directions being pre-hypothesized). This may be due to a relatively small number of participants and some variability in responses. It is difficult to see how many people were included in the final PDR and neuroimaging analyses, with exclusions not clearly documented. Based on Figure 3, there are relatively small numbers in the PDR analyses (n=14 and n=12 in counterconditioning and extinction, respectively). Of these, each group had 4 people with differential PDR results in the opposing direction to the group mean. This perhaps warrants mention as the reported effects may not hold in a subgroup of individuals, which could have clinical implications.

      General exclusion criteria are described on page 17. We have added more detailed information on the reasons for exclusion (see page 17). All exclusions were in line with pre-registered criteria. For the analysis the reviewer is referring to (the PDR analysis that investigated whether CC can prevent the spontaneous recovery of differential conditioned threat responses), 18 participants were excluded: 2 participants did not show evidence of successful threat acquisition, as was already indicated on page 17, and 16 participants were excluded due to (partially) missing data. We now explicitly mention the exclusion of the additional 16 participants on page 7 and have updated Figure 3 to improve visibility of the individual data points. Therefore, for this analysis both experimental groups consisted of 15 participants (total N=30).

      It is true that in both groups a few participants show the opposite pattern. Although this may also be due to measurement error, we agree that it is relevant to further investigate this in future studies with larger sample sizes. It will be crucial to identify who will respond to treatments based on the principles of standard extinction or counterconditioning. We have added this point in the discussion on page 14.

      Reviewer #2:

      Summary:

      The present study sets out to examine the impact of counterconditioning (CC) and extinction on conditioned threat responses in humans, particularly looking at neural mechanisms involved in threat memory suppression. By combining behavioral, physiological, and neuroimaging (fMRI) data, the authors aim to provide a clear picture of how CC might engage unique neural circuits and coding dynamics, potentially offering a more robust reduction in threat responses compared to traditional extinction.

      Strengths:

      One major strength of this work lies in its thoughtful and unique design - integrating subjective, physiological, and neuroimaging measures to capture the various aspects of counterconditioning (CC) in humans. Additionally, the study is centered on a well-motivated hypothesis and the findings have the potential to improve the current understanding of pathways associated with emotional and cognitive control. The data presentation is systematic, and the results on behavioral and physiological measures fit well with the hypothesized outcomes. The neuroimaging results also provide strong support for distinct neural mechanisms underlying CC versus extinction.

      We thank reviewer 2 for the feedback and for valuing the thoughtfulness that went into designing the study.

      Weaknesses:

      (1) Overall, this study is a well-conducted and thought-provoking investigation into counterconditioning, with strong potential to advance our understanding of threat modulation mechanisms. Two main weaknesses concern the scope and decisions regarding analysis choices. First, while the findings are solid, the topic of counterconditioning is relatively niche and may have limited appeal to a broader audience. Expanding the discussion to connect counterconditioning more explicitly to widely studied frameworks in emotional regulation or cognitive control would enhance the paper's accessibility and relevance to a wider range of readers. This broader framing could also underscore the generalizability and broader significance of the results. In addition, detailed steps in the statistical procedures and analysis parameters seem to be missing. This makes it challenging for readers to interpret the results in light of potential limitations given the data modality and/or analysis choices.

      In this updated version of the manuscript, we included the notion that extinction has been interpreted as a form of implicit emotion regulation. In addition to our discussion on active coping (avoidance), we believe that our discussion has an important link to the more general framework of emotion regulation, while remaining within the scope of relevance. Please see pages 14 and 15 for the changes. In addition to being informative to theories of emotion regulation, our findings are also highly relevant for forms of psychotherapy that build on principles of counterconditioning (e.g. the use of positive reinforcement in cognitive behavioral therapy), as we point out in the introduction. We believe this relevance shows that counterconditioning is more than a niche topic. In line with the recommendation from reviewer 2, we added more details and explanations to the statistical procedures and analyses where needed (see responses to recommendations).

      Reviewer #3:

      Summary:

      In this manuscript, Wirz et al use neuroimaging (fMRI) to show that counterconditioning produces a longer lasting reduction in fear conditioning relative to extinction and appears to rely on the nucleus accumbens rather than the ventromedial prefrontal cortex. These important findings are supported by convincing evidence and will be of interest to researchers across multiple subfields, including neuroscientists, cognitive theory researchers, and clinicians.

      In large part, the authors achieved their aims of giving a qualitative assessment of the behavioural mechanisms of counterconditioning versus extinction, as well as investigating the brain mechanisms. The results support their conclusions and give interesting insights into the psychological and neurobiological mechanisms of the processes that underlie the unlearning, or counteracting, of threat conditioning.

      Strengths:

      · Mostly clearly written with interesting psychological insights

      · Excellent behavioural design, well-controlled and tests for a number of different psychological phenomena (e.g. extinction, recovery, reinstatement, etc).

      · Very interesting results regarding the neural mechanisms of each process.

      · Good acknowledgement of the limitations of the study.

      We thank reviewer 3 for the detailed feedback and suggestions.

      Weaknesses:

      (1) I think the acquisition data belongs in the main figure, so the reader can discern whether or not there are directional differences prior to CC and extinction training that could account for the differences observed. This is particularly important for the valence data which appears to differ at baseline (supplemental figure 2C).

      Since our design is quite complex with a lot of results, we left the fear acquisition results as a successful manipulation check in the Supplementary Information to not overload the reader with information that is not the main focus of this manuscript. If the editor would like us to add the figure to the main text, we are happy to do so. During fear acquisition, both experimental groups showed comparable differential conditioned threat responses as measured by PDRs and SCRs. Subjective valence ratings indeed differed depending on CS category. Importantly, however, the groups only differed with respect to their rating to the CS- category, but not the CS+ category, which suggests that the strength of the acquired fear is similar between the groups. To make sure that these baseline differences cannot account for the differences in valence after CC/Ext, we ran an additional group comparison with differential valence ratings after fear acquisition added as a covariate. Results show that despite the baseline difference, the group difference in valence after CC/Ext is still significant (main effect Group: F<sub>(1,43)</sub>=7.364, p=0.010, η<sup>2</sup>=0.146). We have added this analysis to the manuscript (see page 7).
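
      As a quick consistency check on the statistics reported above, the effect size can be recomputed from the F value and its degrees of freedom. The sketch below assumes that the reported η<sup>2</sup> is a partial eta squared; the function name is illustrative and not taken from the manuscript.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    # eta_p^2 = (F * df_effect) / (F * df_effect + df_error)
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values reported for the main effect of Group: F(1,43) = 7.364
print(round(partial_eta_squared(7.364, 1, 43), 3))  # 0.146
```

      Under that assumption, the recomputed value agrees with the reported η<sup>2</sup> of 0.146.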

      (2) I was confused in several sections about the chronology of what was done and when. For instance, it appears that individuals went through re-extinction, but this is just called extinction in places.

      We understand that the complexity of the design may require a clearer description. We therefore made some changes throughout the manuscript to improve understanding. Figure 1 is very helpful in understanding the design and we therefore refer to that figure more regularly (see pages 6-7). We also added the time between tasks where appropriate (e.g. see page 7). Re-extinction after reinstatement was indeed mentioned once in the manuscript. Given that the reinstatement procedure was not successful (see page 9), we could not investigate re-extinction and it is therefore indeed not relevant to explicitly mention and may cause confusion. We therefore removed it (see page 12).

      (3) I was also confused about the data in Figure 3. It appears that the CC group maintained differential pupil dilation during CC, whereas extinction participants didn't, and the authors suggest that this is indicative of the anticipation of reward. Do reward-associated cues typically cause pupil dilation? Is this a general arousal response? If so, does this mean that the CSs become equally arousing over time for the CC group whereas the opposite occurs for the extinction group (i.e. Figure 3, bottom graphs)? It is then further confusing as to why the CC group lose differential responding on the spontaneous recovery test. I'm not sure this was adequately addressed.

      Indeed, reward and reward anticipation also evoke an increase in pupil dilation. This was an important reason for including a separate valence-specific response characterization task. Independently from the conditioning task, this task revealed that both threat and reward anticipation induced strong arousal-related PDRs and SCRs. This was also reflected in the explicit arousal ratings, which were stronger for both the shock-reinforced (negative valence) and reward-reinforced (positive valence) stimuli. Therefore, it is not surprising that reward anticipation leads to stronger PDRs for CS+ (which predict reward) compared to CS- stimuli (which do not predict reward) during CC, whereas this differential response is reduced during extinction due to a decrease in shock anticipation. During the spontaneous recovery test, a return of stronger PDRs for CS+ compared to CS- stimuli in the standard extinction group can only reflect a return of shock anticipation. Importantly, the CC group received no rewards during the spontaneous recovery task and was aware of this, so it is to be expected that the effect is weakened in the CC group. However, CS+ and CS- items were still rated as similar in valence and PDRs did not differ between CS+ and CS- items in the CC group, whereas the Ext group rated the CS+ significantly more negatively and threat responses to the CS+ did return. It therefore is reasonable to conclude that associating the CS+ with reward helps to prevent a return of threat responses. We have added some clarifications and conclusions to this section on page 8.

      (4) I am not sure that the memories tested were truly episodic

      In line with previous publications from Dunsmoor et al.[1-4], our task allows for the investigation of memory for elements of a specific episode. In the example of our task, retrieval of a picture probes retrieval of the specific episode, in which the picture was presented. In contrast, fear retrieval relies on the retrieval of the category-threat association, which does not rely on retrieval of these specific episodic elements, but could be semantic in nature, as retrieval takes place at a conceptual level. We have added a small note on what we mean with episodic in this context on page 4. We do agree that we cannot investigate other aspects of episodic memories here, such as context, as this was not manipulated in this experiment.

      (5) Twice as many female participants than males

      It is indeed unfortunate that there is no equal distribution between female and male participants. Investigating sex differences was not the goal of this study, but we do hope that future studies with the appropriate sample sizes are able to investigate this specifically. We have added this to the limitations of this study on page 17.

      (6) No explanation as to why shocks were varied in intensity and how (pseudo-randomly?)

      The shock determination procedure is explained on pages 18-19 (Peripheral stimulation). As is common in fear conditioning studies in humans (see references), an ascending staircase procedure was used. The goal of this procedure is to try and equalize the subjective experience of the electrical shocks to be “maximally uncomfortable but not painful”.

      Recommendations for the authors:

      Reviewer #1:

      Very well written. No additional comments

      We thank reviewer 1 for valuing our original manuscript version. To further improve the manuscript, we adapted the current version based on the reviewer’s public review (see response to reviewer #1 public review comment 1).

      Reviewer #2:

      (1) I feel that more justification/explanation is needed on why other regions highly relevant to different aspects of counterconditioning (e.g., threat, memory, reward processing) were not included in the analyses.

      We first performed whole-brain analyses to get a general idea of the different neural mechanisms of CC compared to Ext. Clusters revealing significant group differences were then further investigated by means of preregistered ROI analyses. We included regions that have previously been shown to be most relevant for affective processing/threat responding (amygdala), memory (hippocampus), reward processing (NAcc) and regular extinction (vmPFC). We restricted our analyses to these most relevant ROIs as preregistered to prevent inflated or false-positive findings[5]. Beyond these preregistered ROIs, we applied appropriate whole-brain FWE corrections. The activated regions are listed in Supplementary Table 1 and include additional regions that were expected, such as the ACC and insula.

      (2) Were there observed differences across participants in the experiment? Any information on variance in the data such as how individual differences might influence these findings would provide a richer understanding of counterconditioning and increase the depth of interpretation for a broad readership.

      We agree that investigating individual differences is crucial to gain a better understanding of treatment efficacy in the framework of personalized medicine. Specifically, future research should aim to identify factors that help predict which treatment will be most effective for a particular patient. The results of this study provide a good basis for this, as we could show that, in contrast to regular extinction, CC does not require the vmPFC to improve the retention of safety memory. Therefore, CC provides a viable option for patients who are not responding to treatments that rely on the vmPFC. In addition, as noted by Reviewer 1, in both groups a few participants show the opposite pattern (see Figure 3). It will be crucial to identify who will respond to treatments based on the principles of standard extinction or counterconditioning. We have added this point in the discussion on page 14.

      (3) While most figures are informative and clear, Figure 3 would benefit from detailed axis labels and a more descriptive caption. Currently, it is challenging to navigate the results presented to support the findings related to differential PDRs. A supplementary figure consolidating key patterns across conditions might also further facilitate understanding of this rather complicated result.

      We have made some changes to the figure to improve readability and understanding. Specifically, we changed the figure caption to “Change from last 2 trials CC/Ext to first 2 trials Spontaneous recovery test”, to give more details on what exactly is shown here. We also simplified the x-axis labels to “counterconditioning”, “recovery test” and “extinction”. With the addition of a clearer figure description, we hope to have improved understanding and do not think that another supplemental figure is needed.

      (4) Additional details on the statistical tests are needed. For example, please clarify whether p-values reported were corrected across all experimental conditions. Also, it would be helpful for the authors to discuss why for example repeated measures ANOVA or mixed-effects conditions were not used in this study. Might those tests not capture variance across participants' PDRs and SCRs over time better?

      We added that significant interactions were followed by Bonferroni-adjusted post-hoc tests where applicable (see page 21). We have used repeated measures ANOVAs to capture early versus late phases of acquisition and CC/extinction, as well as to compare late CC/extinction (last 2 trials) with early spontaneous recovery (first 2 trials), as is often done in the literature. A trial-level factor in a small sample would cost too many degrees of freedom and is not expected to provide more information. We have added this information and our reasoning to the methods section on page 21.
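
      For readers less familiar with the adjustment mentioned above, a Bonferroni correction multiplies each post-hoc p-value by the number of comparisons and caps the result at 1. The following is a minimal sketch with made-up p-values; it is not the analysis code used in the study, and the function name is illustrative.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values: p * m, capped at 1, for m comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical post-hoc p-values, for illustration only
raw = [0.01, 0.04, 0.5]
adjusted = bonferroni_adjust(raw)
significant = [p < 0.05 for p in adjusted]
```

      With three comparisons, only the first hypothetical p-value stays below 0.05 after adjustment.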

      Reviewer #3:

      (1) Suggest putting acquisition data into the main figures. In fact many of the supplemental figures could be integrated into the main figures in my opinion.

      See response to reviewer #3 public review comment 1.

      (2) Include explanations for why shock intensity was varied

      See response to reviewer #3 public review comment 6.

      (3) Include a better explanation for the change in differential responding from training to spontaneous recovery in the CC group (I think the loss of such responding in extinction makes more sense and is supported by the notion of spontaneous recovery, but I'm not sure about the loss in the CC group. There is some evidence from the rodent literature - which I am most familiar with - regarding a loss in contextual gradient across time which could account for some loss in specificity, could it be something like this?).

      See response to reviewer #3 public review comment 3.

      If we understand the reviewer correctly, the suggestion is that we see a loss of differential responding due to a generalization to the CS-. This would imply an increase in responding to the CS-, which is not what we see. Our data should therefore be correctly interpreted as a loss of the specific response to the CS+ from the CC phase to the recovery test. Therefore, there is no spontaneous recovery in the CC group, and also not a non-specific recovery. To clarify this we relabeled Figure 3 by indicating “recovery test” instead of “spontaneous recovery”.

      (4) Is there a possibility that baseline differences, particularly that in Supplemental Figure 2C, could account for later differences? If differences persist after some transformation (e.g. percentage of baseline responding) this would be convincing to suggest that it doesn't.

      See response to reviewer #3 public review comment 1.

      (5) As I mentioned, I got confused by the chronology as I read through. Maybe mention early on when reporting the spontaneous recovery results that testing occurred the next day and that participants were undergoing re-extinction when talking about it for the second time.

      See response to reviewer #3 public review comment 2.

      (6) Page 8 - I was confused as to why it is surprising that the CC group were more aroused than the extinction group, the latter have not had CSs paired with anything with any valence, so doesn't this make sense? Or perhaps I am misunderstanding the results - here in text the authors refer back to Figure 2B, but I'm not sure if this is showing data from the spontaneous recovery test or from CC/extinction. If it is the latter, as the caption suggests, why are the authors referring to it here?

      Participants in the CC group showed increased differential self-reported arousal after CC, whereas arousal ratings did not differ between CS+ and CS- items after extinction. We interpret this in line with the valence and PDR results as an indication of reward-induced arousal. At the start of the next day, however, participants from the CC and extinction groups gave comparable ratings. It may therefore be surprising that participants in the CC group do not still show stronger ratings, since nothing happened between these two ratings besides a night’s sleep (see design overview in Figure 1A). We removed the word “surprisingly” to prevent any confusion.

      (7) I suggest that the authors comment on whether there were any gender differences in their results.

      See response to reviewer #3 public review comment 5.

      (8) The study makes several claims about episodic memory, but how can the authors be sure that the memories they are tapping into are episodic? Episodic has a very specific meaning - a biographical, contextually-based memory, whereas the information being encoded here could be semantic. Perhaps a bit of clarification around this issue could be helpful.

      See response to reviewer #3 public review comment 4.

      References

      (1) Dunsmoor, J. E. & Kroes, M. C. W. Episodic memory and Pavlovian conditioning: ships passing in the night. Curr Opin Behav Sci 26, 32-39 (2019). https://doi.org/10.1016/j.cobeha.2018.09.019

      (2) Dunsmoor, J. E. et al. Event segmentation protects emotional memories from competing experiences encoded close in time. Nature Human Behaviour 2, 291-299 (2018). https://doi.org/10.1038/s41562-018-0317-4

      (3) Dunsmoor, J. E., Murty, V. P., Clewett, D., Phelps, E. A. & Davachi, L. Tag and capture: how salient experiences target and rescue nearby events in memory. Trends Cogn Sci 26, 782-795 (2022). https://doi.org/10.1016/j.tics.2022.06.009

      (4) Dunsmoor, J. E., Murty, V. P., Davachi, L. & Phelps, E. A. Emotional learning selectively and retroactively strengthens memories for related events. Nature 520, 345-348 (2015). https://doi.org/10.1038/nature14106

      (5) Gentili, C., Cecchetti, L., Handjaras, G., Lettieri, G. & Cristea, I. A. The case for preregistering all region of interest (ROI) analyses in neuroimaging research. Eur J Neurosci 53, 357-361 (2021). https://doi.org/10.1111/ejn.14954

    1. Author response:

      The following is the authors’ response to the previous reviews

      Recommendations for the authors:

      Reviewer #1:

      First, I thank the authors for clarifying some of the confusion I had in the previous comment and I appreciate the efforts the authors put into improving the quality of the manuscript. However, my concerns about the lack of novelty of the key findings are not perfectly addressed and there is no additional analysis done in this revision. Currently in this version of the manuscript, asserting that a p-value of 10<sup>-6</sup> is close to genome-wide significance may be considered an overstatement. Further analysis focusing on finding novel and additional discovery is very necessary.

      We thank the reviewer for their comments. Reviewer #2 also made a comment regarding the genome-wide threshold: “However, it remains unclear why the authors found it appropriate to apply STEAM to the LAAA model, a joint test for both allele and ancestry effects, which does not benefit from the same reduction in testing burden.” The reviewers have correctly identified our oversight, and we have amended the manuscript as follows:

      (1) The abstract, “We identified a suggestive association peak (rs3117230, p-value = 5.292 x 10<sup>-6</sup>, OR = 0.437, SE = 0.182) in the HLA-DPB1 gene originating from KhoeSan ancestry.”

      (2) From line 233 to 239: “The R package STEAM (Significance Threshold Estimation for Admixture Mapping) (Grinde et al., 2019) was used to determine the admixture mapping significance threshold given the global ancestral proportions of each individual and the number of generations since admixture (g = 15). For the LA model, a genome-wide significance threshold of p-value < 2.5 x 10<sup>-6</sup> was deemed significant by STEAM. The traditional genome-wide significance threshold of 5 x 10<sup>-8</sup> was used for the GA, APA and LAAA models, as recommended by the authors of the LAAA model (Duan et al., 2018).”

      (3) We excluded the results for the signal on chromosome 20, since this also did not reach the LAAA model genome-wide significance threshold.  

      (4) From line 296 to 308: “LAAA models were successfully applied for all five contributing ancestries (KhoeSan, Bantu-speaking African, European, East Asian and Southeast Asian). However, no variants passed the threshold for statistical significance. Although no variants reached genome-wide significance, a suggestive peak was identified in the HLA-II region of chromosome 6 when using the LAAA model and adjusting for KhoeSan ancestry (Figure 3). The QQ-plot suggested minimal genomic inflation, which was verified by calculating the genomic inflation factor (λ = 1.05289) (Supplementary Figure 1). The lead variants identified using the LAAA model whilst adjusting for KhoeSan ancestry in this region on chromosome 6 are summarised in Table 3. The suggestive peak encompasses the HLA-DPA1/B1 (major histocompatibility complex, class II, DP alpha 1/beta 1) genes (Figure 4). It is noteworthy that without the LAAA model, this suggestive peak would not have been observed for this cohort. This highlights the importance of utilising the LAAA model in future association studies when investigating disease susceptibility loci in admixed individuals, such as the SAC population.”
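
      To make the quantities in these amendments concrete, the sketch below shows one common way to compute a genomic inflation factor from association p-values (the median observed 1-df chi-square statistic divided by its expected median under the null) and compares the lead variant's p-value against the two thresholds quoted above. It uses only the Python standard library. The function names are illustrative, and this is not the pipeline used in the manuscript, whose exact computation is not described here.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median observed chi-square (1 df) divided by the expected median under the null."""
    norm = NormalDist()
    # A two-sided p-value p corresponds to a 1-df chi-square of z**2,
    # where z is the standard normal quantile at 1 - p/2.
    observed = [norm.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    expected_median = norm.inv_cdf(0.75) ** 2  # about 0.455
    return median(observed) / expected_median


def reaches_threshold(p_value, threshold):
    """True if the p-value is below the given significance threshold."""
    return p_value < threshold


# Thresholds and lead p-value as quoted in the amendments above
LA_THRESHOLD = 2.5e-6      # STEAM-derived threshold for the LA model
LAAA_THRESHOLD = 5e-8      # traditional genome-wide threshold (GA, APA, LAAA)
LEAD_P = 5.292e-6          # rs3117230

print(reaches_threshold(LEAD_P, LAAA_THRESHOLD))  # False
print(reaches_threshold(LEAD_P, LA_THRESHOLD))    # False
```

      Evenly spread p-values give a factor close to 1, so a value such as 1.05 indicates little inflation. The comparison also shows why the peak is described as suggestive: the lead p-value is above both thresholds.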

      We acknowledge that our results are not statistically significant. However, our study advances this area of research by identifying suggestive African-specific ancestry associations with TB in the HLA-II region. These findings build upon the work of the ITHGC, which did not identify any significant associations or suggestive peaks in their African-specific analyses. We have included this argument in our manuscript (from lines 425 to 432):

      “The ITHGC did not identify any significant associations or suggestive peaks in their African ancestry-specific analyses. Notably, the suggestive peak in the HLA-DPB1 region was only captured in our cohort using the LAAA model whilst adjusting for KhoeSan local ancestry. This underscores the importance of incorporating global and local ancestry in association studies investigating complex multi-way admixed individuals, as the genetic heterogeneity present in admixed individuals (produced as a result of admixture-induced and ancestral LD patterns) may cause association signals to be missed when using traditional association models (Duan et al., 2018; Swart, van Eeden, et al., 2022).”

      We appreciate the comment regarding additional analyses. We acknowledge that we did not validate our SNP peak in the HLA-II region through fine-mapping due to the lack of a suitable reference panel (see lines 490 to 500). Our long-term goal is to develop a HLA-imputation reference panel incorporating KhoeSan ancestry; however, this is beyond the scope and funding allowances of this study.

      Reviewer #2 (Recommendations for the authors):

      The authors we think have done an excellent job with their responses and the manuscript has been substantially improved.

      Thank you for taking the time to help us improve our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the public reviewers and editors for their insightful comments on the manuscript. We have made the following changes to address their concerns and think the manuscript is stronger as a result. Specifically, we have 1) added RNA FISH data of specific STB-2 and STB-3 RNA markers to confirm their distribution changes between STB<sup>in</sup> and STB<sup>out</sup> TOs, 2) removed language throughout the text that refers to STB-3 as a terminally differentiated nuclear subtype, and 3) generated CRISPR-mediated knock-outs of two genes identified by network analysis and validated their roles in mediating STB nuclear subtype gene expression.

      Reviewer #1 (Public review): 

      Strengths: 

      The study offers a comprehensive SC- and SN-based characterization of trophoblast organoid models, providing a thorough validation of these models against human placental tissues. By comparing the older STB<sup>in</sup> and newer STB<sup>out</sup> models, the authors effectively demonstrate the improvements in the latter, particularly in the differentiation and gene expression profiles of STBs. This work serves as a critical resource for researchers, offering a clear delineation of the similarities and differences between TO-derived and primary STBs. The use of multiple advanced techniques, such as high-resolution sequencing and trajectory analysis, further enhances the study's contribution to the field. 

      Thank you for your thoughtful review—we appreciate your recognition of our efforts to comprehensively validate trophoblast organoid models and highlight key advancements in STB differentiation and gene expression.

      Weaknesses: 

      While the study is robust, some areas could benefit from further clarification. 

      (1) The importance of the TO model's orientation and its impact on outcomes could be emphasized more in the introduction. 

      We agree that TO orientation may significantly influence STB nuclear subtype differentiation. As the STB is critical for both barrier formation and molecular transport in vivo, lack of exposure to the surrounding media in STB<sup>in</sup> TOs in vitro could compromise these functions and the associated environmental cues that influence STB nuclear differentiation. We have added text to the introduction to highlight this point (lines 117-120).

      (2) The differences in cluster numbers/names between primary tissue and TO data need a clearer explanation, and consistent annotation could aid in comparison. 

      Thank you for highlighting that the comparisons and cluster annotations need clarification. In Figure 1, we did not aim to directly compare CTB and STB nuclear subtypes between TOs and tissue. Each dataset was analyzed independently, with clusters determined separately and with different resolutions decided via a clustering algorithm (Zappia and Oshlack, 2018). For example, for the STB, this approach identified seven subtypes in tissue but only two in TOs, making direct comparison challenging. To address this challenge, we integrated the SN datasets from TOs and tissue in Figure 6. This integration allowed us to directly compare gene expression between the sample types and examine the proportions within each STB subtype. Similarly, in Figure 2, direct comparison of individual CTB or STB clusters across the separate datasets is challenging (Figures 2A-C) due to differences in clustering. To overcome this, we integrated the datasets to compare cluster gene expression and relative proportions (Figures 2D-E). Nonetheless, to address the reviewer's concern we have added text to the results section to clarify that subclusters of CTB and STB between datasets should not be directly compared until the datasets are integrated in Figure 2D-E and Figure 6 (lines 166-167).

      (3) The rationale for using SN sequencing over SC sequencing for TO evaluations should be clarified, especially regarding the potential underrepresentation of certain trophoblast subsets. 

      This is an important point, as the challenges of studying a giant syncytial cell are often underappreciated by researchers who study mononucleated cells. We have added text to the introduction to clarify why traditional single cell RNA sequencing techniques were inadequate to collect and characterize the STB (lines 91-93).

      (4) Additionally, more evidence could be provided to support the claims about STB differentiation in the STB<sup>out</sup> model and to determine whether its differentiation trajectory is unique or simply more advanced than in STB<sup>in</sup>. 

      Our original conclusion that STB<sup>out</sup> nuclei are more terminally differentiated than STB<sup>in</sup> was based on two observations: (1) STB<sup>out</sup> TOs exhibit increased expression of STB-specific pregnancy hormones and many classic STB marker genes and (2) STB<sup>out</sup> nuclei show an enrichment of the STB-3 nuclear subtype, which appears at the end of the slingshot pseudotime trajectory. However, upon consideration of the reviewer comments, we agree that this evidence is not sufficient to definitively distinguish if STB<sup>out</sup> nuclei are more advanced or follow a unique differentiation trajectory dependent on new environmental cues. Pseudotime analyses provided only a predictive framework for lineage tracing, and these predictions must be experimentally validated. Real-time tracking of STB nuclear subtypes in TOs would require a suite of genetic tools beyond the scope of this study. Therefore, to address the reviewers' concerns we have removed language suggesting that STB-3 is a terminally differentiated subtype or that STB<sup>out</sup> nuclei are more differentiated than STB<sup>in</sup> nuclei throughout the text until the discussion. Therein we present both our original hypothesis (that STB nuclei are further differentiated in STB<sup>out</sup>) and alternative explanations like changing trajectories due to local environmental cues (lines 619-625).

      Reviewer #2 (Public review): 

      Strengths: 

      (1) The use of SN and SC RNA sequencing provides a detailed analysis of STB formation and differentiation. 

      (2) The identification of distinct STB subtypes and novel gene markers such as RYBP offers new insights into STB development. 

      Thank you for highlighting these strengths—we appreciate your recognition of our use of SN and SC RNA sequencing to analyze STB differentiation and the discovery of distinct STB subtypes and novel gene markers like RYBP.

      Weaknesses: 

      (1) Inconsistencies in data presentation. 

      We address the individual comments of reviewer 2 later in this response.

      (2) Questionable interpretation of lncRNA signals: The use of long non-coding RNA (lncRNA) signals as cell type-specific markers may represent sequencing noise rather than true markers. 

      We appreciate the reviewer’s attention to detail in noticing the lncRNA signature seen in many STB nuclear subtypes. However, we disagree that these molecules simply represent sequencing noise. In fact, many studies have rigorously demonstrated that lncRNAs have both cell- and tissue-specific gene expression (e.g., Zhao et al 2022, Isakova et al 2021, Zheng et al 2020). Further, they have been shown to be useful markers of unique cell types during development (e.g., Morales-Vicente et al 2022, Zhou et al 2019, Kim et al 2015) and can enhance clustering interpretability in breast cancer (Malagoli et al 2024). Many lncRNAs have also been demonstrated to play a functional role in the human placenta, including H19, MEG3, and MEG8 (Adu-Gyamfi et al 2023), and differences are even seen in nuclear subtypes in trophoblast stem cells (Khan et al 2021). Therefore, we prefer to keep these lncRNA signatures included and let future researchers test their functional role.

      To improve the study's validity and significance, it is crucial to address the inconsistencies and to provide additional evidence for the claims. Supplementing with immunofluorescence staining for validating the distribution of STB_in, STB_out, and EVT_enrich in the organoid models is recommended to strengthen the results and conclusions. 

      Each general trophoblast cell type (CTB, STB, EVT) has been visualized by immunofluorescence by the Coyne laboratory in their initial papers characterizing the STB<sup>in</sup>, STB<sup>out</sup>, and EVT<sup>enrich</sup> models (Yang et al, 2022 and 2023). We agree that it is important to validate the STB nuclear subtypes found in our genomic study. However, one challenge in studying a syncytium is that immunofluorescence may not be a definitive method when the nuclei share a common cytoplasm. This is because protein products from mRNAs transcribed in one nucleus are translated in the cytoplasm and could diffuse beyond sites of transcription. Therefore, RNA fluorescence in situ hybridization (RNA-FISH) is needed instead. While a systematic characterization of the spatial distribution of the many marker genes found in each subtype is outside the scope of this study, we include RNA-FISH of one STB-2 marker (PAPPA2) and one STB-3 marker (ADAMTS6) in Figure 3F-G and Supplemental Figure 3.3. This demonstrates that there is an increase in STB-2 marker gene expression in STB<sup>in</sup> TOs and an increase in STB-3 marker gene expression in STB<sup>out</sup> TOs.

      Reviewer #3 (Public review):  

      The authors present outstanding progress toward their aim of identifying, "the underlying control of the syncytiotrophoblast". They identify the chromatin remodeler, RYBP, as well as other regulatory networks that they propose are critical to syncytiotrophoblast development. This study is limited in fully addressing the aim, however, as functional evidence for the contributions of the factors/pathways to syncytiotrophoblast cell development is needed. Future experimentation testing the hypotheses generated by this work will define the essentiality of the identified factors to syncytiotrophoblast development and function. 

      We thank the reviewer for their thoughtful assessment, constructive feedback, and encouraging comments. We acknowledge that the initial manuscript primarily presented analyses suggesting correlations between RYBP and other factors identified in the gene network analysis and STB function. Understanding how gene networks in the STB are formed and regulated is a long-term goal that will require many experiments with collaborative efforts across multiple research groups.

      Nonetheless, to address this concern we have knocked out two key genes, RYBP and AFF1, in TOs using CRISPR-Cas9-mediated gene targeting. Bulk RNA sequencing of STB<sup>in</sup> TOs from both wild-type (WT) and knockout strains revealed that deletion of either gene caused a statistically significant decrease in the expression of the pregnancy hormone human placental lactogen and an increase in the expression of several genes characteristic of the oxygen-sensing STB-2 subtype, including FLT-1, PAPPA2, SPON2, and SFXN3. These findings demonstrate that knocking out RYBP or AFF1 results in an increase in STB-2 marker gene expression, and therefore that both genes play a role in inhibiting the expression of these markers in WT TOs (Figure 5D-E and supplemental Figure 5.2). We also note that this is the first application of CRISPR-mediated gene silencing in a TO model.

      Future work will visualize the distribution of STB nuclear subtypes in these mutants and explore the mechanistic role of RYBP and AFF1 in STB nuclear subtype formation and maintenance. However, these investigations fall outside the scope of the current study.

      Localization and validation of the identified factors within tissue and at the protein level will also provide further contextual evidence to address the hypotheses generated. 

      We agree that visualizing STB nuclear subtype distribution is essential for testing the many hypotheses generated by our analysis. To address this, we have included RNA-FISH experiments for two STB subtype markers (PAPPA2 for STB-2 and ADAMTS6 for STB-3) in TOs. These experiments reveal an increase in PAPPA2 expression in STB<sup>in</sup> TOs and an increase in ADAMTS6 expression in STB<sup>out</sup> TOs (Figure 3F-G and Supplemental Figure 3.3). Genomic studies serve as powerful hypothesis generators, and we look forward to future work—both our own and that of other researchers—to validate the markers and hypotheses presented from our analysis.

      Recommendations for the authors: 

      Reviewing Editor Comments: 

      We strongly encourage the authors to further strengthen the study by addressing all reviewers' comments and recommendations, with particular attention to the following key aspects:

      (1) Clarifying the uniqueness of the STB differentiation trajectory between STB<sup>in</sup> and STB<sup>out</sup>, and determining whether STB<sup>out</sup> represents a more advanced stage of differentiation compared to STB<sup>in</sup>. It is also important to specify which developmental stage of placental villi the STB<sup>out</sup> and STB<sup>in</sup> are simulating. 

      We have revised the manuscript to remove definitive language claiming that STB-3 represents a terminally differentiated subtype or that STB<sup>out</sup> nuclei are more differentiated than STB<sup>in</sup> nuclei. Instead, we now present our hypothesis and alternative explanations in the discussion (lines 619-625), and emphasize the need for experimental validation of pseudotime predictions to test these hypotheses.

      (2) Utilizing immunofluorescence to validate the distribution of cell types in the organoid models. 

      The Coyne lab has previously performed immunofluorescence of CTB and STB markers in STB<sup>in</sup> and STB<sup>out</sup> TOs (Yang et al 2023). The syncytial nature of STBs complicates immunofluorescence-based validation of the STB nuclear subtypes because translated proteins share a single common cytoplasm and can therefore diffuse and mix. Instead, we performed RNA-FISH for two STB subtype markers (PAPPA2 for STB-2 and ADAMTS6 for STB-3), which showed subtype-specific nuclear enrichment in STB<sup>in</sup> and STB<sup>out</sup> TOs, respectively (Figure 3F-G and Supplemental Figure 3.3).

      (3) Addressing concerns regarding the use of lncRNA as cell marker genes. Employing canonical markers alongside critical TFs involved in differentiation pathways to perform a more robust cell-type analysis and validation is recommended.  

      As discussed in detail above, we maintain that lncRNAs are valuable markers, supported by their demonstrated roles in cell and tissue specificity and placental function. These signatures provide important insights and hypotheses for future research, and we have clarified this rationale in the revised manuscript.

      Reviewer #1 (Recommendations for the authors): 

      (1) The authors have presented an extensive SC- and SN-based characterization of their improved trophoblast TO model, including a comparison to human placental tissues and the previous TO iteration. In this way, the authors' work represents an invaluable resource for investigators by providing thorough validation of the TO model and a clear description of the similarities and differences between primary and TO-derived STBs. I would suggest that the authors reshape the study to further highlight and emphasize this aspect of the study. 

      We thank the reviewer for their thoughtful recommendation and agree that our datasets will serve as an invaluable resource for comparing in vitro models to in vivo gene expression. However, extensive validation is required to make definitive conclusions about the extent to which these systems mirror one another and where they diverge. For this reason, in this manuscript, we have focused on characterizing STB subtypes to provide a foundational understanding of the model and this poorly characterized subtype.

      (2) Introduction, Paragraph 3: What is the importance of orientation for the trophoblast TO model? The authors may consider removing some of the less important methodologic details from this paragraph and including more emphasis on why their TO model is an improvement. 

      Text has been added to this paragraph to highlight the importance of outward facing STB orientation, which is essential to mirror the STB’s transport function in vitro (lines 118-120).

      (3) Results, Figure 1: In addition to the primary placental tissue plots showing all cell populations, it may be useful to have side-by-side versions of similar plots showing only the trophoblast subsets, so that the primary and TO data could be more easily compared visually. 

      This has been implemented and added to the Supplemental Figure 1.4.

      (4) Results, Figure 1: In simple terms, what is the reason for ending up with different cluster numbers/names from the primary tissue and TO? Would it be possible to apply the same annotation to each (at least for trophoblast types) and thus allow direct comparison between the two? 

      As described above, each dataset was analyzed separately, and its clusters were defined using an algorithm that selects the optimal clustering resolution. Therefore, the number of clusters cannot be directly compared between datasets until the SN TO and tissue datasets are integrated in Figure 6. We have added text to the manuscript to make it clear that, until this point, the datasets should be compared only in terms of the bulk number of clusters (lines 230-232).

      (5) Results, Figure 2: For subsequent evaluation of different in vitro TO conditions, did the authors use only SN sequencing because they wanted to focus on STB? Based on Figure 1, it seems some CTB subsets would be underrepresented if using only SN. Given that the authors look at both STB and CTB in their different TOs, is this an issue? 

      The CTB clusters that showed the greatest divergence between SC and SN datasets were those associated with mitosis and the cell cycle, likely due to nuclear envelope breakdown interfering with capture by the 10x microfluidics pipeline. While cytoplasmic gene expression provides valuable insights into CTB function, our manuscript focuses on the STB starting from Figure 2. Since the STB is captured exclusively by the SN dataset, we concentrated on this approach to streamline our analysis.

      (6) Results, Figure 3: What do the authors consider to be the primary contributing factors for why the STB subsets display differential gene expression between STB<sup>in</sup> and STB<sup>out</sup>? Is this due primarily to the cultural conditions and/or a result of the differing spatial arrangement with CTBs? 

      This is an intriguing question that is challenging to disentangle because the culture conditions are integral to flipping the orientation. The two primary factors that differ between STB<sup>in</sup> and STB<sup>out</sup> TOs are the presence of extracellular matrix in STB<sup>in</sup> and direct exposure to the surrounding media in STB<sup>out</sup>. We believe these environmental cues play a significant role in shaping the gene expression of STB subsets. Fully disentangling this relationship would require a method to alter the TO orientation without changing the culture conditions. While this is an exciting direction for future research, it falls outside the scope of the present study.

      (7) Results, Figure 4: The authors' analysis indicates that the STB nuclei from the STB<sup>out</sup> TO are likely "more differentiated" than those in STB<sup>in</sup> TO. Could the authors provide some qualitative or quantitative support for this? Is the STB<sup>out</sup> differentiated phenotype closer to what would be observed in a fully formed placenta? 

      As discussed earlier, we agree with the reviewers that this claim should be removed from the text outside of the discussion.

      (8) Results, Figure 5: Based on the trajectory analysis, do the authors consider that the STB from STB<sup>out</sup> TO are simply further along the differentiation pathway compared to those from STB<sup>in</sup> TO, or do the STB from STB<sup>out</sup> TO follow a differentiation pathway that is intrinsically distinct from STB<sup>in</sup> TO? 

      We think the idea of an intrinsically distinct pathway is a fascinating alternative hypothesis and have added it into the discussion. We do not find the pseudotime currently allows us to answer this question without additional experiments, so we have removed claims that the STB<sup>out</sup> STB nuclei are further along the differentiation pathway.

      (9) Results, Figure 6: A notable difference between the STB<sup>out</sup> TO and the term tissue is that the CTB subsets are much more prevalent. Is this simply a scale difference, i.e. due to the size of the human placenta compared to the limited STB nuclei available in the STB<sup>out</sup> TO? Or are there other contributing factors? 

      The proportion of STB to CTB nuclei in our term tissue (9:1) aligns with expectations based on stereological estimates. We believe the relatively low number of CTB nuclei in our dataset is due to the need for a larger sample size to capture more of this less abundant cell type. Since the primary focus of this paper is on STB, and we analyzed over 4,000 STB nuclei, we do not view this as a limitation. However, future studies utilizing SN to investigate term tissue should account for the abundance of STB nuclei and plan their sampling carefully to ensure sufficient representation of CTB nuclei if this is a desired focus.

      Reviewer #2 (Recommendations for the authors): 

      (1) The color annotations for cell types in Figure 2 are inconsistent between the different panels, and the term "Prolif" in Figure 2E is not explained by the authors. 

      We chose colors to enhance visibility on the UMAP. We do not wish readers to make direct comparisons between the different CTB or STB subtypes of the sample types until the datasets are integrated in Figure 2D. This is because the clustering resolution was chosen algorithmically and independently for each dataset. Cluster proportions are better compared in the integrated datasets in Figure 2D. We have added text to the results section to make this clear to the reader (lines 166-167).

      (2) In Figure 3 and Supplementary Figures 1.3, the authors frequently present long non-coding RNA (lncRNA) signals as cell type-specific markers in the bubble plots. These signals are likely sequencing noise and may not accurately represent true markers for those cell types. It is recommended to revise this interpretation. 

      As referenced above, there are many examples of lncRNAs that have biological and pathological significance in the placenta (H19, Meg3, Meg8) and lncRNAs often have cell type specific expression that can enhance clustering. We prefer to keep these signatures included and let future researchers determine their biological significance.

      (3) In Figure 3C, the authors performed pathway enrichment analysis on the STB subtypes after integrating STB_in and STB_out organoids. The enrichment of the "transport across the blood-brain barrier" pathway in the STB-3 subtype does not align with the current understanding of STB cell function. Please provide corresponding supporting evidence. Additionally, please verify whether the other functional pathways represent functions specific to the STB subtypes. 

      Interestingly, many of the genes categorized under “transport across the blood-brain barrier” are transporters shared with “vascular transport.” These include genes involved in the transport of amino acids (SLC7A1, SLC38A1, SLC38A3, SLC7A8), molecules essential for lipid metabolism (SLC27A4, SLC44A1), and small molecule exchange (SLC4A4, SLC5A6). Given that the vasculature, the STB, and the blood-brain barrier all perform critical barrier functions, it is unsurprising that molecules associated with these GO terms are enriched in the STB-3 subtype, which expresses numerous transporter proteins. Since the transport of materials across the STB is a well-established function, we have not included additional supporting evidence but have clarified the genes associated with this GO term in the text (lines 392-394 and supplemental Table 9).

      (4) The pseudotime heatmap in Figure 4B is not properly arranged and is inconsistent with the differentiation relationships shown in Figure 4A. It is recommended to revise this. 

      We are uncertain which aspect of the heatmap in Figure 4A is perceived as inconsistent with Figure 4B. One distinction is that pseudotime in Figure 4A is normalized from 0 to 100 to fit the blue-to-yellow-to-red color scale, whereas in Figure 4B the color scale is not normalized and the color bar ranges from white to red. This difference reflects our intent to simplify Figure 4B-C, as the abundance of color between cell types and gene expression changes required a streamlined representation to ensure the figure remained clear and easy to interpret. This is standard practice in the field and consistent with the default code in the slingshot package.

      (5) In Figures 4C and 4D, although RYBP is highly expressed in STB, it is difficult to support the conclusion that RYBP shows the most significant expression changes. It is recommended to provide additional evidence. 

      The claim that RYBP exhibits the most significant expression changes was based on p-value ordering of genes associated with pseudotime via the associationTest function in slingshot and not with immunofluorescence data. The text has been revised to make this distinction clear (lines 390-393).

      (6) In Figure 4E, staining for CTB marker genes is missing, and in Figure 4F, CYTO is difficult to use as a classical STB marker. It is recommended to use the CGBs antibody from Figure 4E as a STB marker for staining to provide evidence.  

      We have revised Figure 5B-C to use e-Cadherin as a CTB marker in TOs and the CGB antibody as a marker of STB.

      In tissue, however, obtaining a good STB marker that does not overlap with the RYBP antibody (rabbit) in term tissue is difficult as the STB downregulates hCG expression closer to term to initiate contractions. SDC1 is often used but only labels the plasma membrane so does not help in distinguishing the STB cytoplasm. We have added an image of cytokeratin, e-Cadherin, and the STB marker ENDOU to validate that our current approach with e-Cadherin and cytokeratin allows us to accurately distinguish between CTB and STB cells.

      (7) The velocity results in Figure 5A do not align with the differentiation relationships between cells and contradict the pseudotime results presented in Figure 4 by the authors. 

      The reviewer raises an interesting observation regarding the velocity map in Figure 5A, which appears to show a bifurcation into two STB subtypes. This observation aligns with similar findings reported in tissue by our colleagues (Wang et al., 2024). However, given the low number of CTB cells in our tissue dataset, we were cautious about making definitive conclusions about pseudotime without a larger sample size. Notably, the RNA velocity map closely resembles the pseudotime trajectory in TOs, with CTB transitioning into the CTB-pf subtype and subsequently into the STB. One potential explanation for discrepancies between tissue and TOs is the difference in nuclear age: nuclei in tissue can be up to nine months old, whereas those in TOs are only hours or days old. It is possible that the lineage in TOs could bifurcate if cultured for longer than 48 hours, but our current dataset captures only the early stages of the STB differentiation process. While exploring these hypotheses is fascinating, they are beyond the scope of this current study.  

      Reviewer #3 (Recommendations for the authors): 

      Amazing work - I greatly enjoyed reading the manuscript. Here are a few questions and suggestions for consideration: 

      Evidence presented throughout the results sections hints that the organoids may represent an earlier stage of placental development compared to the term. Increased hCG gene expression is observed, but as noted expression is decreased in term STB. STB:CTB ratios are also higher at term compared to the first trimester, etc. It was difficult to conclude definitively based on how data is presented in Fig 6 and discussed. Maybe there is no clear answer. Perhaps the altered cell type ratios in the organoid models (e.g., few STB in EVT enrich conditions) impact recapitulation of the in vivo local microenvironment signaling. As such, can the authors speculate on whether cell ratios could be strategically leveraged to model different gestational time points? 

      Along these same lines, syncytiotrophoblast in early implantation (before proper villi development) is often described as invasive and later at the tertiary villi stage defined by hormone production, barrier function, and nutrient/gas exchange. Do the authors think the different STB subtypes captured in the organoid models represent different stages/functions of syncytiotrophoblast in placental development? 

      Minor Comments 

      (1) Please clarify what the third number represents in the STB:CTB ratio (e.g., 1:3:1 and 2:5:1). EVT? 

      The first separator is a decimal point, not a colon (i.e., 1.3 and 2.5). These numbers should therefore be read as an STB:CTB ratio of 1.3 to 1 or 2.5 to 1.

      (2) Could consider co-localizing RYBP in term tissue with a syncytio-specific marker like CGB used for organoids (Fig 4F). 

      We addressed this concern in comment 6 to reviewer 2.

      (3) Recommend defining colors-which colors represent which module in Figure 5C in the legend and main body text. I see the labels surrounding the heatmap in 5B, but defining colors in text (e.g. cyan, magenta, etc.) would be helpful. Do the gray circles represent targets that don't belong to a specific module? Are the bolded factor names based on a certain statistical cutoff/defining criteria or were they manually selected? 

      The text of both the results and figure legends has been revised to clarify these points.

      (4) Data Availability: It would be helpful to provide supplemental table files for analyses (e.g., 5C to list the overlapping relationships in TGs for each TF/CR (5C) and 3E/6F to list DEG genes in comparisons). 

      Supplemental files for each analysis have been added (Supplemental Tables 8-14). In addition, the raw and processed data are available on GEO, and we have created an interactive Shiny App so that people without coding experience can interact with each dataset (lines 917-919).

      (5) “...and found that each sample expressed these markers (Figure 6D), suggesting..." Consider clarifying "these". 

      Text has been added to refer to a few of these marker genes within the text (line 540).

      Citations

      (1) Zappia L, Oshlack A. Clustering trees: a visualization for evaluating clusterings at multiple resolutions. GigaScience. 2018;7(7):giy083. PMCID: PMC6057528

      (2) Zhou J, Xu J, Zhang L, Liu S, Ma Y, Wen X, Hao J, Li Z, Ni Y, Li X, Zhou F, Li Q, Wang F, Wang X, Si Y, Zhang P, Liu C, Bartolomei M, Tang F, Liu B, Yu J, Lan Y. Combined Single-Cell Profiling of lncRNAs and Functional Screening Reveals that H19 Is Pivotal for Embryonic Hematopoietic Stem Cell Development. Cell Stem Cell. 2019;24(2):285-298.e5. PMID: 30639035

      (3) Malagoli G, Valle F, Barillot E, Caselle M, Martignetti L. Identification of Interpretable Clusters and Associated Signatures in Breast Cancer Single-Cell Data: A Topic Modeling Approach. Cancers. 2024;16(7):1350. PMCID: PMC11011054

      (4) Adu-Gyamfi EA, Cheeran EA, Salamah J, Enabulele DB, Tahir A, Lee BK. Long non-coding RNAs: a summary of their roles in placenta development and pathology†. Biol Reprod. 2023;110(3):431–449. PMID: 38134961

      (5) Zheng M, Hu Y, Gou R, Nie X, Li X, Liu J, Lin B. Identification three LncRNA prognostic signature of ovarian cancer based on genome-wide copy number variation. Biomed Pharmacother. 2020;124:109810. PMID: 32000042

      (6) Khan T, Seetharam AS, Zhou J, Bivens NJ, Schust DJ, Ezashi T, Tuteja G, Roberts RM. Single Nucleus RNA Sequence (snRNAseq) Analysis of the Spectrum of Trophoblast Lineages Generated From Human Pluripotent Stem Cells in vitro. Front Cell Dev Biol. 2021;9:695248. PMCID: PMC8334858

      (7) Isakova A, Neff N, Quake SR. Single-cell quantification of a broad RNA spectrum reveals unique noncoding patterns associated with cell types and states. Proc Natl Acad Sci United States Am. 2021;118(51):e2113568118. PMCID: PMC8713755

      (8) Morales-Vicente DA, Zhao L, Silveira GO, Tahira AC, Amaral MS, Collins JJ, Verjovski-Almeida S. Single-cell RNA-seq analyses show that long non-coding RNAs are conspicuously expressed in Schistosoma mansoni gamete and tegument progenitor cell populations. Front Genet. 2022;13:924877. PMCID: PMC9531161

      (9) Kim DH, Marinov GK, Pepke S, Singer ZS, He P, Williams B, Schroth GP, Elowitz MB, Wold BJ. Single-Cell Transcriptome Analysis Reveals Dynamic Changes in lncRNA Expression during Reprogramming. Cell Stem Cell. 2015;16(1):88–101. PMCID: PMC4291542

      (10) Yang L, Liang P, Yang H, Coyne CB. Trophoblast organoids with physiological polarity model placental structure and function. bioRxiv. 2023;2023.01.12.523752. PMCID: PMC9882188

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      1. General Statements

      *We thank the reviewers for finding the manuscript enjoyable and well-written, with experiments that were performed well, show solid results and provide useful data for the community. The reviewers have provided meaningful feedback to improve this study. We have addressed the comments point-by-point below. The main text will also be further modified to incorporate new analysis where it has not yet been done. *

      2. Description of the planned revisions

      Reviewer 1:

      Summary: OTX2 is a pivotal transcription factor that regulates the fate choice between somatic and primordial germ cell (PGC) lineages in early mouse development. In the current study, the authors use in vitro stem cell models to demonstrate that OTX2 mediates this developmental fate decision through controlling chromatin accessibility, whereby OTX2 helps to activate putative enhancers that are associated with somatic fate. By extension, those somatic-associated regulatory regions therefore become inaccessible in cells adopting PGC identity in which Otx2 is downregulated.

      Comments: I enjoyed reading this manuscript. The experiments have been carried out well and for the most part the results provide convincing evidence to support the claims and conclusions in the manuscript. I particularly liked the experiments using the inducible Otx2 transgene to examine the acute changes in chromatin accessibility following restoration of OTX2.

      I include some suggestions below to the authors for additional analyses that I feel would further strengthen their study.

      I also felt that the authors focus almost exclusively on the subset of OTX2-bound sites that lose accessibility in the absence of OTX2. But, as they show in several figure panels, these sites tend to be the minority and that most OTX2-occupied sites do not lose accessibility in Otx2-null cells (actually, more sites tend to gain accessibility). I encourage the authors to modify the text and some of the analyses to give a better balance to their study.

      We are pleased that this reviewer enjoyed our manuscript. As suggested by the reviewer, we included analyses on the regions that are bound by OTX2 but do not show an increase in accessibility (see section 3 reviewer 1 point 6). The text will be expanded to include the new data and to include the description of the subset of OTX2 sites that do not show accessibility changes in the absence of OTX2. We have responded to the other points they raised, as detailed in the sections below.

      Figure 1: The authors write: "...OTX2 binds mostly to putative enhancers." Whether these distal sites are enhancers is not sufficiently evidenced in the manuscript, but it is important information to collect to support their model of OTX2 function. The authors should strengthen their analysis by examining whether OTX2 peaks are enriched at previously defined enhancer regions.

      We plan to compare OTX2 bound regions with defined lists of enhancers identified in ESCs grown in Serum/LIF (e.g. Whyte et al 2013) and, if available, in 2i/LIF and EpiLCs. We will also analyse publicly available datasets for H3K4me1 (enhancer marker) and H3K27ac (marker of active regulatory regions) at the regions bound by OTX2 in ESCs and EpiLCs.

      Figure 2: I'm still puzzled why the authors did not examine flow-sorted WT+cyto cells?

      *We agree with the reviewer that it would be interesting to examine flow-sorted WT +cyto PGCLCs. Unfortunately, the expression of CD61 and SSEA1 only becomes visible from day 4 of PGCLC differentiation. Therefore, we were not able to isolate PGCLC at day 2 from WT cells differentiated in the presence of cytokines. We then used OTX2-/- cells at day 2 to model PGCLCs. This is based on the assumption that because day 6 Otx2-/- PGCLCs are transcriptionally similar to sorted day 6 WT cells (Zhang, Zhang et al Nature 2018), the same will be true at day 2. We will modify the text in the final version of this manuscript to clarify this point that has also been raised by reviewers 2 and 3. *

      Figure 3: I would be tempted to put Figure S3A and S3B into Figure 3. It would be better to show all 1246 DARs together, either ordered by OTX2 CUT&RUN signal, or presented in two pre-defined groups (OTX2-bound vs unbound). I also suggest that the authors show OTX2 signals and ATAC-seq signals for the 3028 DARs that gain accessibility in Otx2-null EpiLCs (this could be added to a supplemental figure).

      Although the analysis has been carried out and the figures have been amended, the main text will be modified in a future updated version of the manuscript to incorporate these results.

      Figure 3: What is special about the 8% of OTX2-bound sites that lose accessibility, versus the 92% of sites that do not?

      *The 8% of the OTX2-bound regions that lose accessibility in the absence of OTX2 appear to be more sensitive to the loss of OTX2. One possible explanation is that the accessibility of the rest of OTX2 bound regions relies on other TFs, such as OCT4, that are expressed in EpiLCs. We will modify the main text to discuss this interesting point raised by the reviewer. *

      Figure 6F: If the 4221 sites are split into those bound by OTX2 versus those that are not (related to Figure 6C) then is there a difference? i.e. are the OTX2-bound sites opening up?

      We separated the 4,221 sites into OTX2-bound and unbound subsets. The result is reported below:

      *Although there is a slight increase in accessibility in the OTX2-bound subset, the average accessibility reaches less than ¼ of the accessibility of these regions when OTX2 is present from day 0 to day 4, while the OTX2-unbound regions do not show an increase in accessibility. Although we cannot rule out that a longer treatment with tamoxifen may lead to higher accessibility in the OTX2-bound subset, the dynamics are much slower than in the EpiLC regions, where accessibility reaches 50% of the d0-d4 sample in just 1 hour of tamoxifen treatment. *

      Is there any evidence that OTX2 binds and compacts PGCLC enhancers in somatic cells? I appreciate this is different to the main thrust of the authors' model, but being able to show that OTX2 does not compact these sites lends further support to their preferred model of OTX2 opening sites of somatic lineages.

      *Comparing the ATAC-seq in PGCLCs with ESCs and EpiLCs, we identify a subset of regions that are open in PGCLC only (PGCLC-specific accessible regions, see below). These regions do not show binding of OTX2 in WT EpiLCs or the d0-d2 Tam sample, suggesting that OTX2 does not bind and compact PGCLC-specific enhancers. *

      • PGCLC-specific regions showing high accessibility only in PGCLCs.

      • OTX2 CUT&RUN signal in WT EpiLC, OTX2-ERT2 PGCLCs in presence or absence of Tamoxifen, showing that OTX2 does not bind PGCLC-specific regions even when it is overexpressed in GK15 medium.

      *These analyses will be incorporated in the manuscript. *

      Discussion: Have prior studies established a connection between OTX2 and chromatin remodellers that can open chromatin? Or, if not, then perhaps this could be proposed as a line of future research.

      We thank the reviewer for suggesting that we expand the discussion of the possible connection between OTX2 and chromatin remodellers. Although there is no evidence in the literature of a direct interaction between OTX2 and chromatin remodellers, this cannot be excluded. The connection might also be indirect: OTX2 is known to interact with OCT4, which in turn interacts with BRG1, the catalytic subunit of the SWI/SNF complex, and recruits it to chromatin. This point will be discussed in a modified version of the manuscript.

      Reviewer 2:

      Barbieri and Chambers explore the role of OTX2 on mouse pluripotency and differentiation. To do so, they examine how the chromatin accessibility and OTX2 binding landscape changes across pluripotency, the exit of pluripotency towards formative and primed states, and through to PGCLC/somatic differentiation. The work mostly represents a resource for the community, with possible implications for our understanding of how OTX2 might mediate the germline-soma switch of fates. While the findings of the work are modest, the results seem solid and the manuscript is clear and well-written.

      *We are pleased that this reviewer found our results solid and the manuscript clear. *

      I have some comments as indicated below:

      1. The comparison between Otx2-/- cells in the presence of PGCLC cytokines compared to WT cells in the absence of cytokines seems like it is missing controls to me. I assume the authors wanted to enable homogeneous populations to facilitate their bulk sequencing methods, but it seems to me like they are comparing apples with oranges. It would have been better to have the reciprocal situations (Otx2-/- cells in basal differentiation medium, and WT cells in PGCLC cytokines) with a sorting strategy to better unpick the differences between the presence and absence of Otx2 in the 2 protocols. Having said that, the authors are careful not to draw many comparisons between those populations so I don't think this omission affects their current claims. They should however clarify whether the flow cytometry (Supp Fig2) was used for sorting cells or if all cells were taken for bulk sequencing.

      *We agree with the reviewer that it would be of interest to compare the PGCLC and somatic populations derived from the OTX2-/- cells in GK15 without cytokines with the same populations derived from WT cells differentiated in the presence of cytokines. Our work aims to identify what happens at the stages of PGCLC differentiation when cells are still competent for both germline and somatic differentiation. Previous work from the lab showed that this dual competence is lost after day 2; therefore, we focus our attention on this time of differentiation. Unfortunately, the two surface markers characteristic of PGCs (CD61 and SSEA1) are not expressed at day 2, and therefore we are not able to sort PGCLCs derived from OTX2-/- cells in GK15 without cytokines or WT cells differentiated in the presence of cytokines. As recognised by this reviewer, we aimed to obtain two homogeneous populations that can model PGCLCs and somatic cells. This is based on data obtained at day 6, when Otx2-/- PGCLCs show a similar transcriptome to sorted day 6 WT cells (Zhang, Zhang et al Nature 2018), and on the assumption that the same will be true at day 2. We will clarify that Supplementary Figure 2 is not a sorting strategy. As this point has been raised by reviewers 1 and 2 as well, we will modify the text to clarify the choice and the assumption behind using OTX2-/- cells in the presence of cytokines and WT cells in the absence of cytokines to model PGCLCs and somatic cells respectively.*

      2. Throughout the text, the authors subject cells (WT / Otx2-/- /Otx2ER ) to different protocols to look at accessibility and Otx2 binding, but with no mention of the cell fate differences that occur in these different conditions. For instance, it is unclear to me to which fate the WT cells without PGCLC cytokines go - I presume this is neural but perhaps this is a mixed fate, given that they are in GK15 rather than N2B27. Likewise, the OTX2ER experiments may promote a mixed population between PGCLC/somatic fates, and this is never described. Ideally transcriptomic data would be collected, but failing that, qPCR data should be obtained to examine this more closely.

      *We are planning to generate RT-qPCR data for germ layer markers (ectoderm, endoderm and mesoderm) in WT cells in GK15 without cytokines at day 2, as well as in OTX2-ERT2 cells with and without Tamoxifen at day 2 (no Tam, d0-d2) and day 4 (no Tam, d0-d4).*

      The authors also state that "OTX2 facilitates Fgf5 transcription" (page 10) but provide no transcriptional data to substantiate this claim. Again RT-qPCR would help make this point.

      *We will analyse the level of Fgf5 by RT-qPCR in OTX2-ERT2 EpiLCs treated for 1 hour and 6 hours with Tamoxifen to show the effect of OTX2 on Fgf5 transcription. *

      It is unclear to me what the 'increase[d] accessibility' (eg abstract final sentence, Figure 3E) really means at the cellular level. Does this indicate that more cells have this site open, and does this have implications for the heterogeneity of cell fates observed? Since the authors are concerned with fate decisions, this seems like an important consideration that should at least be discussed.

      The possibility that the increased accessibility is due to higher heterogeneity in the population is interesting and it will be included in the discussion in a revised version of the manuscript.

      Reviewer 3:

      In this manuscript, the authors perform OTX2 CUT&RUN and ATAC-seq in Otx2-null and WT ESCs, EpiLCs and PGCLCs to understand whether the role of OTX2 in restricting mouse germline entry that they previously described (Zhang Nature 2018) mechanistically depends on chromatin remodeling. They identify differentially accessible regions (DARs) between Otx2-null and WT cells at different stages of differentiation and show that many of these are OTX2 bound in WT. They then show using cells expressing OTX2-ER^T2 in Otx2-null Epiblast cells that when OTX2 is moved into the nucleus, the regions that were differentially closed in Otx2-null open within an hour, suggesting chromatin accessibility is directly controlled by OTX2 (rather than indirect effects involving transcription and translation which one would expect to take longer). The scope is narrow, but this is nice work and useful data for the mouse PGC field. However, there are a few places where the data could be strengthened, and the writing is a little confusing in places, for example by stating as fact in early sections what is not proven until later.

      We thank the reviewer for finding our work nice and useful for the mouse PGC field, and for the useful comments to improve the manuscript. We have included new analysis and modified the text as suggested to improve the writing, avoiding early statements that were not fully proven until later in the manuscript. We have responded to other points they raised as detailed below and in the next section.

      1) "we compared Otx2-/- cells cultured in the presence of PGC-promoting cytokines with wild-type cells cultured in the absence of PGC-promoting cytokines. Under these conditions Otx2-/- cells produce an essentially pure (>90%) CD61+/SSEA1+ population that we refer to as PGCLCs, while wild-type cells yield a cell population from which PGCLCs are absent"

      This is not a controlled comparison since one cannot separate the day 2 effect of cytokines from that of the Otx2 knockout. The manuscript would be strengthened if the authors include WT somatic and PGCLCs from the +cytokine conditions, which could be easily sorted out as shown in Supp. Fig. 2. Ideally they would also include Otx2-null somatic cells, although Supp. Fig. 2 shows those are rare under the conditions considered.

      *This work aimed to analyse early stages of EpiLC to PGCLC differentiation when cells are still competent for both somatic and germline differentiation. This stage has been described previously to be at day 2 of differentiation in GK15 + cytokines (PGCLC differentiation medium, Zhang, Zhang et al, Nature 2018). Unfortunately, CD61 and SSEA1 are not expressed at day 2 of PGCLC differentiation, and they start to be expressed on the cell surface by day 4. Consequently, it is impossible to sort cells at day 2 using the CD61+/SSEA1+ strategy. To overcome this problem, we used WT cells grown in GK15 without cytokines to model a population of somatic cells and OTX2-/- cells grown in GK15+ cytokines to model a homogeneous population of PGCLCs. As explained in a similar point raised by reviewers 2 and 3, we assumed that, as OTX2-/- cells grown in the presence of cytokines are transcriptionally similar to sorted WT cells at day 6 (Zhang, Zhang et al, Nature 2018), OTX2-/- cells at day 2 are similar to their WT counterpart at day 2. The main text will be modified to clarify that we are using homogeneous populations to model both PGCLC and somatic cells and that Figure S2 does not show a sorting strategy. *

      3) "In ESCs, OTX2 binds

      We are planning to perform a statistical analysis to ascertain whether the small number of DARs bound by OTX2 are bound by OTX2 by chance.
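
      One common way to ask whether an overlap of this kind exceeds chance is a one-sided hypergeometric test. The sketch below is illustrative only and is not the analysis planned here: the function name `overlap_p_value`, the choice of background region set, and every count in the usage line are hypothetical placeholders rather than values from the study.

```python
from math import comb


def overlap_p_value(k, background, bound, dars):
    """One-sided hypergeometric p-value P(X >= k).

    background : total number of regions considered (hypothetical universe)
    bound      : regions in the background that overlap an OTX2 peak
    dars       : number of differentially accessible regions drawn
    k          : DARs observed to overlap an OTX2 peak
    """
    upper = min(bound, dars)
    total = comb(background, dars)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(bound, i) * comb(background - bound, dars - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers, chosen only to show the call signature.
toy_p = overlap_p_value(k=1, background=10, bound=3, dars=4)
print(f"toy p-value: {toy_p:.4f}")
```

      A small p-value would indicate more overlap than expected by chance under the chosen background; the result is sensitive to how that background is defined, so it should be stated explicitly alongside any such test.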

      4) It would be good if the discussion was broadened to include both human and other transcription factors that are involved. How much of these conclusions could one expect to carry over to human or other mammals? There is some work from the Surani lab considering OTX2 in human. One could even look at published ATAC or OTX2 chip-seq data in hPSCs and potentially learn something interesting. Furthermore, there are studies on other transcription factors modulating chromatin accessibility in the decision between germline and somatic cells, for example PRDM1, PRDM14 (refs in e.g. Tang et al Nat Rev Gen 2016) or TFAP2A (at least in human (Chen et al Cell Rep 2019)). Do these factors affect the same genes? Is a coherent picture emerging of their respective roles in germline entry?

      *As suggested by the reviewer, we will discuss the role of OTX2 in human PGCLC formation and include studies on PGC-specific transcription factors concerning changes in chromatin accessibility in germline and somatic cells. This will be included in a revised version of the manuscript. *

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer 1:

      1. Figure 1: The authors report in the methods that they performed OTX2 CUT&RUN in biological duplicates. It would strengthen their results if they showed in Figure S1 some representative data from each replicate separately to show the consistency.

      As suggested by the reviewer, to show consistency between replicates, two representative tracks of the two CUT&RUN replicates at the Tet2 (ESCs) and Fgf5 (EpiLCs) loci have been included in Figure S1A. The corresponding tracks of the average bigwig files are reported in Figure 1E. The main text (page 5) and the figure legends have been amended to incorporate the new panels.

      2. Figure 2: I think it would be helpful to remind the reader here that Otx2 is normally downregulated in PGCs, and that Otx2 expression is maintained (at least initially) in somatic cells. This would help explain the logic behind the choice of samples that were profiled.

      We modified the text with the following sentence, as suggested by the reviewer, emphasising the level of OTX2 in early somatic vs early PGCLCs: “Otx2 expression is rapidly downregulated in the EpiLC to PGCLC transition while its expression is maintained longer in cells entering the somatic lineage [8]” (page 7).

      Figure 2D: I appreciate that the highlighted region at the Tet2 locus is a DAR, but from the genome tracks it looks as though the region still has high accessibility. Are there any other examples to exemplify a more obvious DAR? Additionally, since twice as many DARs gain accessibility in Otx2-null ESCs compared to lose accessibility, why not show examples of these as well? The same is true of EpiLCs. (Or alternatively, provide a good explanation for why not to show these other categories)

      We replaced the Tet2 DAR with a clearer example of an ESC DAR, located in the Hes1 locus, that shows low accessibility in Otx2-/- ESCs versus WT ESCs. Examples of ESC DARs and EpiLC DARs that show higher accessibility in Otx2-/- vs WT cells have been added as new panels 2E (DAR in Pebp4 locus) and 2G (DAR in Tdh locus). We also simplified the panels to show only ATAC-seq tracks in WT and OTX2-/- cells, either ESCs (2D-E) or EpiLCs (2G-H). Text and figure legends have been modified to accommodate the changes made in Figure 2.

      Figure 3: I would be tempted to put Figure S3A and S3B into Figure 3. It would be better to show all 1246 DARs together, either ordered by OTX2 CUT&RUN signal, or presented in two pre-defined groups (OTX2-bound vs unbound). I also suggest that the authors show OTX2 signals and ATAC-seq signals for the 3028 DARs that gain accessibility in Otx2-null EpiLCs (this could be added to a supplemental figure).

      Figures S3A and S3B have been moved to the main figure. Figure S3A is now part of Figure 3C, where all the 1,246 DARs are shown together, separated into two groups (OTX2-bound and -unbound). Figure S3B is now part of Figure 3F. A new heatmap showing the OTX2 and ATAC-seq signals for the 3,028 regions that gain accessibility in Otx2-/- EpiLCs has been added as new Figure S3B. Only 28 out of the 3,028 regions overlap an OTX2 peak, as shown in the new Figure S3A. These regions appear to be already open in ESCs (Figure S3C) and do not fully close when OTX2 is absent. This can be explained by either a) the lack of expression of an OTX2 target gene that represses these regions or b) the continuous expression of a gene that is usually repressed by OTX2 in the transition to EpiLCs. In both cases, OTX2 does not directly repress these regions. Figure legends have been amended to incorporate the new panels. The main text will be modified to incorporate these results.

      Figure 6: Do the PGCLCs with OTX2 expression have chromatin accessibility profiles similar to somatic cells? Consider adding WT somatic cell data to Figure 6A, which could be an interesting comparison with the Tam d0-d2 samples.

      *The heatmap showing the ATAC-seq signal at the additional OTX2-induced regions in somatic cells has been added to Figure 6A. The data show that the regions induced by OTX2 are not open in somatic cells generated in GK15. One possible explanation is that the overexpression of OTX2 induces the opening of neural-associated regions, but neural differentiation is not fully supported in GK15 medium (see reviewer 2, point 3). As suggested by reviewer 2, we will perform RT-qPCR of germ layer markers to analyse the identity of somatic cells grown in GK15 (without cytokines) and somatic cells induced by OTX2 overexpression.*

      Reviewer 2:

      The authors focus solely on the activating role of Otx2 in their data, but given the substantial proportion of DARs that decrease following Otx2 depletion, I presume it is possible that it also has a repressive effect? Either way, this should be discussed.

      *As also suggested by reviewer 1 (point 6), we analysed the accessibility level and the OTX2 signal at the 3,028 regions that gain accessibility in Otx2-/- EpiLCs (new Figure S3A-C). These regions show high accessibility in ESCs, suggesting that they are ESC regions that do not close properly in the transition to EpiLCs in the absence of OTX2. OTX2 CUT&RUN shows a low to absent signal at these regions, with just 28 of the EpiLC DARs that show higher accessibility in Otx2-/- cells overlapping an OTX2 peak, suggesting that OTX2 does not have a direct suppressive effect on them.*

      The authors state that d2 PGCLCs "show an intermediate position between ESCs and EpiLCs" based on the PCA location. They should be careful to qualify that this is only in the first 2 principal components, because it may well be the case (and is likely) that in other components the PGCLC population is far removed from the pluripotent states.

      *The text has been updated as follows: d2 PGCLCs “show an intermediate position between ESCs and EpiLCs on both PC1 and PC2”.*

      Reviewer2 Minor Suggestions:

      1. Presumably the regions bound by OTX2 in Tet2, Mycn and Fgf5 (Fig1E) are called enhancers because these are known from existing literature. It would be helpful to cite the relevant references to this in the text for those unfamiliar with these.

      References (Whyte et al, Cell, 2018 – Tet2 and Mycn, Buecker et al, Cell Stem Cell, 2013, Thomas et al, Mol Cell 2021 – Fgf5) have been added to the text and the figure legends.

      On page 13, the authors say "To determine whether OTX2 expression is essential to maintain chromatin accessibility in somatic cells..." but this does not seem to be what they test because they are using PGCLC medium. Perhaps I misunderstood, but this could be clarified.

      *Expression of OTX2 during the first 2 days of PGCLC differentiation blocks germline differentiation, as previously shown in Zhang, Zhang et al, Nature 2018. After 2 days of tamoxifen treatment, cells have acquired somatic fate and will undergo somatic differentiation even if tamoxifen is withdrawn after day 2. Nevertheless, we agree with the reviewer that the sentence is difficult to interpret, and we have modified it in the updated manuscript as follows: “To determine whether OTX2 expression is essential to maintain chromatin accessibility in cells differentiating in the presence of PGC-inducing cytokines after day 2” (page 12).*

      On page 14 the authors claim, "These results indicate that...the partner proteins that OTX2 act alongside differ...". While this may be the case, their results do not substantiate this, it is just speculation. Should be toned down.

      The text has been modified as follows: "These results suggest that...the partner proteins that OTX2 act alongside differ..."

      Page 18, PGCLC differentiation method sections needs to be described as such (ie. Add "For PGCLC differentiation..." before the second paragraph)

      *The text “For PGCLC differentiation” has been added at the beginning of the PGCLC differentiation method section. *

      It would be helpful to indicate time on the protocol schematics (eg Fig4A, 5A, 5D etc) as I had to keep checking the methods to find out how long the full differentiation time-course was.

      *Indication of time has been added to Figures 1, 2, 4, 5 and 6. *

      Since the authors compare between the Tam d0-d2 treatments assessed at d2 versus d4 (Figure5B vs 5E) it would be helpful to make the colourbars the same scale, for both ATAC and Cut&Run datasets.

      *The heatmap in Figure 5B has been modified. The colourbars of Figure 5B and 5E are now using the same scale. *

      Reviewer 3:

      1) As a minor point related to this, the second sentence is confusing since it kind of sounds like Otx2-/- and WT cells are compared under the same conditions unless one carefully reads the previous sentence.

      The text has been modified to clarify the different medium conditions for WT and OTX2-/- cells, as follows: “In the presence of PGC-inducing cytokines, Otx2-/- cells produce an essentially pure (>90%) CD61+/SSEA1+ population that we refer to as PGCLCs, while wild-type cells differentiated in GK15 medium without cytokines yield a cell population from which PGCLCs are absent” (page 7).

      2) "This suggests that OTX2 acts as a pioneer TF to regulate the accessibility of enhancers E1, E2 and E3."

      This is from the text corresponding to Fig. 2. That data actually only shows that Otx2-null cells have DARs, so somehow OTX2 affects chromatin accessibility but it could be indirect by controlling transcription of genes that modify chromatin accessibility. It is not until figure 4 that the data suggests that OTX2 directly affects accessibility, perhaps as a pioneer TF.

      The authors continue to make many statements about the direct action of OTX2 before the data supporting this is shown, on which I got hung up as a reader. I suggest the authors edit the manuscript to improve this. E.g. "OTX2 may directly control accessibility at these sites (Figure 3E)." and the fact that in 3E and other figure, it says "DARs increased by OTX2 binding" which at that point is not proven, so would better say "Otx2-null vs WT DARs" or something like that.

      The sentence “This suggests that OTX2 acts as a pioneer TF to regulate...” has been removed from the text (page 9). The sentence “OTX2 may directly control accessibility at these sites” has been changed to “suggesting that the presence of OTX2 affects accessibility at these sites” (page 9). The sentence “Together, these results suggest that OTX2 is required to open these chromatin regions” has been changed to “Together, these results suggest that OTX2 is required for the accessibility of these chromatin regions”.

      The subset of DARs that increase in WT EpiLC and are bound by OTX2 that was called “DARs increased by OTX2 binding” has been renamed as “DARs higher in WT with OTX2 binding”. For consistency, the subset of DARs showing increased accessibility in WT EpiLCs that are not bound by OTX2 are now called “DARs higher in WT without OTX2 binding” (Figure 3, Figure 4, main text and figure legends). We will further revise the manuscript to avoid statements or hypotheses that are not yet supported by data throughout the text.

      Reviewer 3 – minor comments:

      1) "Comparing wild-type and Otx2-/- ESCs identified 375 differentially accessible regions (DARs) with increased accessibility in wild-type cells, and 743 regions with higher accessibility in Otx2-/- ESCs (Figures 2C). An example of ESC DARs where accessibility is increased in cells expressing OTX2 is the intragenic enhancer of Tet2. Tet2 is expressed at high levels in ESCs but at low levels in EpiLCs."

      The authors compare Otx2-null and WT ESCs then proceed to give an example comparing ESCs to EpiLCs, instead of Otx2-null vs WT ESCs, which is confusing.

      Furthermore, here and in other places the authors describe ESCs as not expressing OTX2. However, they also show CUT&RUN data for OTX2 in ESCs etc, clearly indicating that it is expressed, just lower (otherwise how could one get anything?).

      *We originally chose the Tet2 enhancer as an example of the 375 ESC DARs with higher accessibility in WT vs Otx2-/- ESCs, as it shows a slightly decreased level of accessibility and OTX2 binding in ESCs. Therefore, the phrase “where accessibility is increased in cells expressing OTX2” refers to WT cells (expressing OTX2) when compared to Otx2-/- cells (OTX2-null). The text has been changed to describe the new panel. The rest of the main text will be checked and modified where appropriate to avoid possible misinterpretations.*

      *We also appreciate that the change in accessibility is not clearly visible in the original Figure 2, as also pointed out by Reviewer 1 (point 6). In the updated Figure 2, we show a region in the Hes1 locus as an example of the 375 ESC DARs. Moreover, we simplified the panels showing ATAC-seq tracks of WT and OTX2-/- ESCs (Fig. 2D-E) or EpiLCs (Fig. 2G-H).*

      2) "In contrast, in EpiLCs, OTX2 binds almost 40% (446 out of 1,246) of the DARs that are more accessible in wild-type than in Otx2-/- cells (Figure 3B-C). Notably, these regions are mainly located distal to genes (91%, Figure 3D), despite the increased fraction of promoter regions bound by OTX2 in EpiLCs (Figure S1A)."

      Are the authors rounding percentages with 2 significant digits, as suggested by the "91%"? If so, 446/1246 ~ 36%, not 40%.

      *The text has been modified from “OTX2 binds almost 40%” to “OTX2 binds 36%”. *
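
      As a quick check of the rounding discussed in this point, the short sketch below recomputes the fractions from the counts quoted in this response. It is illustrative only and not part of the analysis pipeline; the helper name `percent` is a hypothetical convenience.

```python
def percent(part, whole, digits=0):
    """Return part/whole as a percentage rounded to the given number of decimals."""
    return round(100 * part / whole, digits)


# Counts quoted in this response.
dars_higher_in_wt = 1246      # EpiLC DARs more accessible in WT than in Otx2-/-
otx2_bound = 446              # of those, DARs overlapping an OTX2 peak
dars_higher_in_ko = 3028      # EpiLC DARs more accessible in Otx2-/-
otx2_bound_in_ko_set = 28     # of those, DARs overlapping an OTX2 peak

bound_pct = percent(otx2_bound, dars_higher_in_wt)
unbound_count = dars_higher_in_wt - otx2_bound
ko_bound_pct = percent(otx2_bound_in_ko_set, dars_higher_in_ko, digits=1)

print(f"OTX2-bound DARs higher in WT: {bound_pct:.0f}%")
print(f"OTX2-unbound DARs higher in WT: {unbound_count}")
print(f"OTX2-bound DARs higher in Otx2-/-: {ko_bound_pct}%")
```

      The first value supports the corrected figure of 36%, and the unbound count matches the 800 regions referred to in the next point.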

      3) The results in Figure 4 are nice and the real meat of the paper. One suggestion: It would be helpful if Fig. 4B were split up between the 446 and 800 genes instead of showing all 1246, and if the WT control was shown in the same figure as well.

      *Panels with the 446 and 800 regions have been added to Figure 4 instead of the panels with all 1246 regions. WT control has been inserted in Figure 4. The main text and the figure legends have been updated accordingly. *

      4) "Enforced OTX2 expression opens additional somatic regulatory regions" - it would be clearer to say "OTX2 overexpression opens additional somatic regulatory regions", since this is really about DARs between EpiLCs that already express OTX2 and those forced to express higher than WT endogenous levels by the OTX2-ER system?

      We thank the reviewer for their suggestion. The text has been modified (page 12).

    1. Author response:

      General Statements

      In our manuscript, we demonstrate for the first time that RNA Polymerase I (Pol I) can prematurely release nascent transcripts at the 5' end of ribosomal DNA transcription units in vivo. This achievement was made possible by comparing wild-type Pol I with a mutant form of Pol I, hereafter called SuperPol, previously isolated in our lab (Darrière et al., 2019). By combining in vivo analysis of rRNA synthesis (using pulse-labelling of nascent transcripts and cross-linking of nascent transcripts, CRAC) with in vitro analysis, we could show that SuperPol reduced premature transcript release due to altered elongation dynamics and reduced RNA cleavage activity. Such premature release could reflect regulatory mechanisms controlling rRNA synthesis. Importantly, this increased processivity of SuperPol is correlated with resistance to BMH-21, a novel anticancer drug inhibiting Pol I, showing the relevance of targeting Pol I during transcriptional pauses to kill cancer cells. This work offers critical insights into Pol I dynamics, rRNA transcription regulation, and implications for cancer therapeutics.

      We sincerely thank the three reviewers for their insightful comments and recognition of the strengths and weaknesses of our study. Their acknowledgment of our rigorous methodology, the relevance of our findings on rRNA transcription regulation, and the significant enzymatic properties of the SuperPol mutant is highly appreciated. We are particularly grateful for their appreciation of the potential scientific impact of this work. Additionally, we value the reviewer’s suggestion that this article could address a broad scientific community, including in transcription biology and cancer therapy research. These encouraging remarks motivate us to refine and expand upon our findings further.

      All three reviewers acknowledged the increased processivity of SuperPol compared to its wild-type counterpart. However, two of the three question our claim that premature termination of transcription can regulate ribosomal RNA transcription. This conclusion is based on the SuperPol mutant increasing rRNA production. Proving that modulation of early transcription termination is used to regulate rRNA production under physiological conditions is beyond the scope of this study. Therefore, we propose to change the title of this manuscript to focus on what we have unambiguously demonstrated:

      “Ribosomal RNA synthesis by RNA polymerase I is subjected to premature termination of transcription”.

      Reviewer 1's main criticism centers on the use of the CRAC technique in our study. We address this point in detail below. Although we agree with the reviewer’s comments regarding its application to Pol II studies, we would like to emphasize that CRAC, by limiting contamination with mature rRNA, remains the only suitable method for studying Pol I elongation over the entire transcription unit. All other methods are massively contaminated with fragments of mature RNA, which prevents any quantitative analysis of read distribution within the rDNA. This perspective is widely accepted within the Pol I research community, as CRAC provides a robust approach to capturing transcriptional dynamics specific to Pol I activity.

      We hope that these findings will resonate with the readership of your journal and contribute significantly to advancing discussions in transcription biology and related fields.

      (1) Description of the planned revisions

      Despite numerous text modifications (see below), we agree that one major point of discussion is the consequence of the increased processivity of the SuperPol mutant for the “quality” of the rRNA produced. Reviewer 3 suggested comparisons with other processive alleles, such as the rpb1-E1103G mutant of the RNAPII subunit (Malagon et al., 2006). This comparison has already been addressed by the Schneider lab (Viktorovskaya OV, Cell Rep., 2013 - PMID: 23994471), which explored Pol II (rpb1-E1103G) and Pol I (rpa190-E1224G). The rpa190-E1224G mutant revealed enhanced pausing in vitro, highlighting key differences between the catalytic rate-limiting steps of Pol I and Pol II (see David Schneider's review on this topic for further details).

      Reviewers 2 and 3 suggested that a decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. Pol I mutants with decreased rRNA cleavage have been characterized previously and showed an increased error rate. We have already started to address this point. Preliminary results from in vitro experiments suggest that SuperPol exhibits an elevated error rate during transcription. However, these findings require further experimental validation to confirm their reproducibility and robustness. We propose to consolidate these data and incorporate them into the manuscript to address this question comprehensively. This could provide valuable insights into the mechanistic differences between SuperPol and the wild-type enzyme. SuperPol is the first Pol I mutant described with an increased processivity in vitro and in vivo, and we agree that this might come at the cost of decreased fidelity.

      Regulatory aspect of the process:

      To address the reviewers’ remarks, we propose to test our model by performing experiments that would evaluate PTT levels in Pol I mutants or under different growth conditions. These experiments would provide crucial data to support our model, which suggests that PTT is a regulatory element of Pol I transcription. By demonstrating how PTT varies with environmental factors, we aim to strengthen the hypothesis that premature termination plays an important role in regulating Pol I activity.

      We propose revising the title and conclusions of the manuscript. The updated version will better reflect the study's focus and temper claims regarding the regulatory aspects of termination events, while maintaining the value of our proposed model.

      (2) Description of the revisions that have already been incorporated in the transferred manuscript

      Some very important modifications have now been incorporated:

      Statistical Analyses and CRAC Replicates:

      Unlike reviewers 2 and 3, reviewer 1 suggests that we did not analyze the results statistically. In fact, the CRAC analyses were conducted in biological triplicate, ensuring robustness and reproducibility. The statistical analyses are presented in Figure 2C, which shows that the WT Pol I and SuperPol distribution profiles differ significantly. The CRAC replicates exhibit a high correlation, and we confirmed a significant effect in each region of interest (5’ETS, 18S.2, 25S.1 and 3’ ETS, Figure 1), which confirms consistency across experiments. Finally, we took care not to overinterpret the results, maintaining a rigorous and cautious approach in our analysis to ensure accurate conclusions.

      CRAC vs. Net-seq:

      Reviewer 1 asks us to comment on the differences between CRAC and Net-seq. The two methods complement each other but serve different purposes, depending on the biological question and the context of the transcription analysis. Net-seq was originally designed for Pol II analysis. It captures nascent RNAs but does not eliminate mature ribosomal RNAs (rRNAs), leading to high levels of contamination. While this is manageable for Pol II analysis (in silico elimination of reads corresponding to rRNAs), it poses a significant problem for Pol I due to the dominance of rRNAs (60% of total RNAs in yeast), which share sequences with nascent Pol I transcripts. As a result, large Net-seq peaks are observed at mature rRNA extremities (Clarke 2018, Jacobs 2022). This limits the interpretation of the results to the short-lived pre-rRNA species. In contrast, CRAC has been specifically adapted by the laboratory of David Tollervey to map Pol I distribution while minimizing contamination from mature rRNAs (the CRAC protocol used exclusively recovers RNAs with 3′ hydroxyl groups that represent endogenous 3′ ends of nascent transcripts, thus removing RNAs with a 3′-phosphate, found in mature rRNAs). This makes CRAC more suitable for studying Pol I transcription, including polymerase pausing and distribution along the rDNA, and it provides a quantitative dataset for the entire rDNA gene.

      CRAC vs. Other Methods:

      Reviewer 1 suggests using GRO-seq or TT-seq, but the experiments in Figure 2 aim to assess the distribution profile of Pol I along the rDNA, which requires a method optimized for this specific purpose. While GRO-seq and TT-seq are excellent for measuring RNA synthesis and cotranscriptional processing, they rely on Sarkosyl treatment to permeabilize cellular and nuclear membranes. Sarkosyl is known to artificially induce polymerase pausing and to inhibit the RNase activities involved in the process. CRAC avoids these artifacts because it is a direct and fully in vivo approach. In a CRAC experiment, cells are grown exponentially in rich media and arrested via rapid cross-linking, providing precise and artifact-free data on Pol I activity and pausing.

      Pol I ChIP Signal Comparison:

      The ChIP experiments previously published in Darrière et al. lack the statistical depth and resolution offered by our CRAC analyses. The detailed results obtained through CRAC would have been impossible to detect using classical ChIP. The current study provides a more refined and precise understanding of Pol I distribution and dynamics, highlighting the advantages of CRAC over traditional methods in addressing these complex transcriptional processes.

      BMH-21 Effects:

      As highlighted by Reviewer 1, the effects of BMH-21 observed in our study differ slightly from those reported in earlier work (Ref Schneider 2022), likely due to variations in experimental conditions, such as methodologies (CRAC vs. Net-seq), as discussed earlier. We also identified variations in the response to BMH-21 treatment associated with differences in cell growth phases and/or cell density. These factors likely contribute to the observed discrepancies, offering a potential explanation for the variations between our findings and those reported in previous studies. In our approach, we prioritized reproducibility by carefully controlling BMH-21 experimental conditions to mitigate these factors. These variables can significantly influence results, potentially leading to subtle discrepancies. Nevertheless, the overall conclusions regarding BMH-21's effects on WT Pol I are largely consistent across studies, with differences primarily observed at the nucleotide resolution. This is a strength of our CRAC-based analysis, which provides precise insights into Pol I activity.

      We will address these nuances in the revised manuscript to clarify how such differences may impact results and provide context for interpreting our findings in light of previous studies.

      Minor points:

      Reviewer #1:

      •  In general, the writing style is not clear, and there are some word mistakes or poor descriptions of the results, for example: 

      •  On page 14: "SuperPol accumulation is decreased (compared to Pol I)". 

      •  On page 16: "Compared to WT Pol I, the cumulative distribution of SuperPol is indeed shifted on the right of the graph." 

      We clarified and improved the overall writing style in response to the reviewer's comment.

      •  There are also issues with the literature, for example: Turowski et al, 2020a and Turowski et al, 2020b are the same article (preprint and peer-reviewed). Is there any reason to include both references? Please, double-check the references.  

      This was corrected in this version of the manuscript.

      •  In the manuscript, 5S rRNA is mentioned as an internal control for TMA normalisation. Why are Figure 1C data normalised to 18S rRNA instead of 5S rRNA? 

      Data are indeed normalized relative to the 5S rRNA, but the value for the 18S rRNA is arbitrarily set to 100%.

      •  Figure 4 should be a supplementary figure, and Figure 7D doesn't have a y-axis labelling. 

      The presence of all Pol I-specific subunits (Rpa12, Rpa34 and Rpa49) is crucial for the enzymatic assays we performed. In the absence of these subunits (whose presence can vary depending on the purification batch), Pol I pausing, cleavage and elongation are known to be affected. To strengthen our conclusion, we wanted to show the subunit composition of the purified enzyme. This important control should be shown, but can indeed be moved to a supplementary figure if desired.

      The y-axis in Figure 7D is now correctly labelled.

      •  In Figure 7C, BMH-21 treatment causes the accumulation of ~140bp rRNA transcripts only in SuperPol-expressing cells that are Rrp6-sensitive (line 6 vs line 8), suggesting that BHM-21 treatment does affect SuperPol. Could the author comment on the interpretation of this result? 

      The 140 nt product is a degradation fragment resulting from trimming, which explains its lower accumulation in the absence of Rrp6. BMH-21 significantly affects WT Pol I transcription but also has a mild effect on SuperPol transcription. As a result, the 140 nt product accumulates under these conditions.

      Reviewer #2:

      •  pp. 14-15: The authors note local differences in peak detection in the 5'-ETS among replicates, preventing a nucleotide-resolution analysis of pausing sites. Still, they report consistent global differences between wild-type and SuperPol CRAC signals in the 5'ETS (and other regions of the rDNA). These global differences are clear in the quantification shown in Figures 2B-C. A simpler statement might be less confusing, avoiding references to a "first and second set of replicates" 

      As suggested by the reviewer, the statement has been simplified in this version of the manuscript.

      •  Figures 2A and 2C: Based on these data and quantification, it appears that SuperPol signals in the body and 3' end of the rDNA unit are higher than those in the wild type. This finding supports the conclusion that reduced pausing (and termination) in the 5'ETS leads to an increased Pol I signal downstream. Since the average increase in the SuperPol signal is distributed over a larger region, this might also explain why even a relatively modest decrease in 5'ETS pausing results in higher rRNA production. This point merits discussion by the authors. 

      We agree that this is a very important point for the discussion of our results. Transcription is a very dynamic process in which paused polymerases are easily detected using the CRAC assay. Elongating polymerases are distributed over a much larger gene body, and even a small amount of polymerase detected in the gene body can represent a very large amount of rRNA synthesis. This point is of paramount importance and, as suggested by the reviewer, is now discussed in detail.

      •  A decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. Have the authors observed any evidence supporting this possibility? 

      The reviewer suggested that a decreased efficiency of cleavage upon backtracking might imply an increased error rate in SuperPol compared to the wild-type enzyme. We have already started to address this point. Preliminary results from in vitro experiments suggest that SuperPol mutants exhibit an elevated error rate during transcription. However, these findings remain preliminary and require further experimental validation to confirm their reproducibility and robustness. We propose to consolidate these data and incorporate them into the manuscript to address this question comprehensively.

      •  pp. 15 and 22: Premature transcription termination as a regulator of gene expression is well-documented in yeast, with significant contributions from the Corden, Brow, Libri, and Tollervey labs. These studies should be referenced along with relevant bacterial and mammalian research. 

      As the reviewer suggested, we now reference these studies.

      •  p. 23: "SuperPol and Rpa190-KR have a synergistic effect on BMH-21 resistance." A citation should be added for this statement. 

      This represents some unpublished data from our lab. KR and SuperPol are the only two known mutants resistant to BMH-21. We observed that resistance between both alleles is synergistic, with a much higher resistance to BMH-21 in the double mutant than in each single mutant (data not shown). Comparing their resistance mechanisms is a very important point that we could provide upon request. This was added to the statement.

      •  p. 23: "The released of the premature transcript" - this phrase contains a typo 

      This is now corrected.

      Reviewer #3:

      •  Figure 1B: it would be opportune to separate the technique's schematic representation from the actual data. Concerning the data, would the authors consider adding an experiment with rrp6D cells? Some RNAs could be degraded even in such short period of time, as even stated by the authors, so maybe an exosome depleted background could provide a more complete picture. Could also the authors explain why the increase is only observed at the level of 18S and 25S? To further prove the robustness of the Pol I TMA method could be good to add already characterized mutations or other drugs to show that the technique can readily detect also well-known and expected changes. 

      The precise objective of this experiment is to avoid the use of the Rrp6 mutant. Under these conditions, we prevent the accumulation of transcripts that would result from a maturation defect. While it is possible to conduct the experiment with the Rrp6 mutant, it would be impossible to draw reliable conclusions due to this artificial accumulation of transcripts.

      •  Figure 1C: the NTS1 probe signal is missing (it is referenced in Figure 1A but not listed in the Methods section or the oligo table). If this probe was unused, please correct Figure 1A accordingly. 

      We corrected Figure 1A.  

      •  Figure 2A: the RNAPI occupancy map by CRAC is hard to interpret. The red color (SuperPol) is stacked on top of the blue line, and we are not able to observe the signal of the WT for most of the position along the rDNA unit. It would be preferable to use some kind of opacity that allows to visualize both curves. Moreover, the analysis of the behavior of the polymerase is always restricted to the 5'ETS region in the rest of the manuscript. We are thus not able to observe whether termination events also occur in other regions of the rDNA unit. A Northern blot analysis displaying higher sizes would provide a more complete picture. 

      We addressed this point to make the figure more visually informative. In Northern Blot analysis, we use a TSS (Transcription Start Site) probe, which detects only transcripts containing the 5' extremity. Due to co-transcriptional processing, most of the rRNA undergoing transcription lacks its 5' extremity and is not detectable using this technique. We have the data, but it does not show any difference between Pol I and SuperPol. This information could be included in the supplementary data if asked.

      •  "Importantly, despite some local variations, we could reproducibly observe an increased occupancy of WT Pol I in 5'-ETS compared to SuperPol (Figure 1C)." should be Figure 2C. 

      Thank you for pointing out this mistake. It has been corrected.

      •  Figure 3D: most of the difference in the cumulative proportion of CRAC reads is observed in the region ~750 to 3000. In line with my previous point, I think it would be worth exploring also termination events beyond the 5'-ETS region. 

      We agree that such an analysis would have been interesting. However, with the exception of the pre-rRNA starting at the transcription start site (TSS) studied here, any cleaved rRNA at its 5' end could result from premature termination and/or abnormal processing events. Exploring the production of other abnormal rRNAs produced by premature termination is a project in itself, beyond this initial work aimed at demonstrating the existence of premature termination events in ribosomal RNA production.

      •  Figure 4: should probably be provided as supplementary material. 

      As mentioned earlier (see comments), the presence of all Pol I-specific subunits (Rpa12, Rpa34 and Rpa49) is crucial for the enzymatic assays we performed. This important control should be shown, but can indeed be moved to a supplementary figure if desired.

      •  "While the growth of cells expressing SuperPol appeared unaffected, the fitness of WT cells was severely reduced under the same conditions." I think the growth of cells expressing SuperPol is slightly affected. 

      We agree with this comment and we modified the text accordingly.

      •  Figure 7D: the legend of the y-axis is missing as well as the title of the plot. 

      The y-axis legend and the plot title are now present.

      •  The statements concerning BMH-21, SuperPol and Rpa190-KR in the Discussion section should be removed, or data should be provided.

      This was discussed previously. See comment above.

      •  Some references are missing from the Bibliography, for example Merkl et al., 2020; Pilsl et al., 2016a, 2016b. 

      The bibliography is now fixed.

      (3) Description of analyses that authors prefer not to carry out

      Does the SuperPol mutant produce more functional rRNAs?

      As Reviewer 1 requested, we agree that this point requires clarification. In cells expressing SuperPol, a higher steady-state level of (pre)-rRNAs is only observed in the absence of the degradation machinery, suggesting that overproduced rRNAs are rapidly eliminated. We know that (pre)-rRNAs are unable to accumulate in the absence of ribosomal proteins and/or Assembly Factors (AF). Consequently, overproducing rRNAs would not be sufficient to increase ribosome content. This specific point is being addressed further in our lab but is beyond the scope of this article.

      Is premature termination coupled with rRNA processing?

      We appreciate the reviewer’s insightful comments. The suggested experiments regarding the UTP-A complex's regulatory potential are valuable and ongoing in our lab, but they extend beyond the scope of this study and are not suitable for inclusion in the current manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The study investigates the relationship between replication timing (RT) and transcription. While there is evidence that transcription can influence RT, the underlying mechanisms remain unclear. To address this, the authors examined a single genomic locus that undergoes transcriptional activation during differentiation. They engineered the Pln locus by inserting promoters of varying strengths to modulate transcription levels and assessed the impact on replication timing using Repli-seq. Key Findings:

      •  Figure 1C and 1D: The data show that higher transcription levels correlate with an advanced RT, suggesting that transcriptional activity influences replication timing.

      •  Figure 2: To determine whether transcription alone is sufficient to alter RT, the authors inserted an hPGK reporter at different genomic locations. However, given the findings in Figure 1, which suggest that this is not the primary mechanism,

      •  Figure 3: The authors removed the marker to examine whether the observed effects were due to the promoter-driven Pln locus, which is significantly larger than the marker.

      •  Figure 4: The study explores the effect of increased doxycycline (Dox) treatment at the TRE (tetracycline response element), further supporting the role of transcription in RT modulation.

      •  Figure 5: The findings demonstrate that Dox-induced RT advancement occurs rapidly, is reversible, and correlates with transcription levels, reinforcing the hypothesis that transcription plays a direct role in influencing replication timing.

      •  Figure 6: Shows that during differentiation transcription of Pln is not required for RT advancement.

      Overall, the study presents a compelling link between transcription and replication timing, though some experimental choices warrant further clarification. I have no major comments.

      Minor Comments: Overall, the results are convincing, and the study appears to be well-conducted. In Figure 2, the authors use the hPGK promoter. However, it is unclear why they did not use the constructs from the previous experiments. Given that the hPGK promoter did not advance RT in Figure 1, the results in Figure 2 may not be entirely unexpected.

      We took advantage of previously published cell lines using a PiggyBac Vector designed to pepper the reporter gene at random sites throughout the genome; the point of the experiment was to acquire supporting evidence for the hypothesis that any vector with its selectable marker driven by the hPGK promoter will not advance RT no matter where it is inserted. Since there are reports concluding that transcription per se is sufficient to advance RT, it was important to confirm that there was nothing unique about the particular vector or locus into which we inserted our panel of vectors.

      ACTION DONE: We have now added the following sentence to the results describing this experiment: “By analyzing RT in these lines, we could evaluate the effect of a different hPGK vector on RT when integrated at many different chromosomal sites.”

      Additionally, the study does not formally exclude the possibility that Pln protein expression itself influences RT. In Figure 1, readthrough transcription at the Pln locus could potentially drive protein expression. It would be useful to know whether the authors address this point in the discussion.

      NOT DONE FOR NEED OF CLARIFICATION: It is unclear why a secreted neural growth factor would have a direct effect on replication timing in embryonic stem cells and, in particular, only in cis (remember there is a control allele that is unaffected). We would be happy to address this in the Discussion if we understood the reviewers’ hypothesis. We cannot respond to this comment without understanding the hypothesis being tested as we do not know how a secreted protein could affect the RT of one allele without affecting the other.

      Regarding the mechanism, if transcription across longer genomic regions contributes to RT changes, could transcription-induced DNA supercoiling play a role? For instance, could negative supercoiling generated by active transcription influence replication timing?

      Yes, many mechanisms are possible.

      ACTION DONE: We have added the following sentence to the discussion, referencing a seminal paper on that topic by Nick Gilbert: “For example, long transcripts could remodel a large segment of chromatin, possibly by creating domains of DNA supercoiling (Naughton et al., 2013).”

      It remains puzzling why Pln transcription does not contribute to replication timing during differentiation. Is there any evidence of chromatin opening during this process? For example, are ATAC-seq profiles available that could provide insights into chromatin accessibility changes during differentiation?

      We thank the reviewer for asking this, as we should have mentioned something very important here. Lack of necessity for transcription implies that independent mechanisms are functioning to elicit the RT switch. In other work (Turner et al., bioRxiv, provisionally accepted to EMBO J.), we have shown that specific cis elements (ERCEs) can function to maintain early replication in the absence of transcription.

      ACTION DONE: We now explicitly state in the Discussion: “This is not surprising, given that ERCEs can maintain early RT in the absence of transcription (Turner, bioRxiv).”

      ACTION TO BE DONE SOON: We will provide a new Figure 6D showing ATAC-seq changes upon differentiation of mESCs to mNPCs and their location relative to the promoter/enhancer deletion. As you will see, there is an ATAC-seq site that appears during differentiation, upstream of the deletion. We will hypothesize in the revised manuscript that these are the elements that drive the RT switch and that future studies need to investigate that hypothesis. We have also added the following sentences to the discussion after the sentence above, stating: “In fact, new sites of open chromatin, consistent with ERCEs, appear outside of the deleted Ptn transcription control elements after differentiation (soon to be revised Figure 6D). The necessity and sufficiency of these sites to advance RT independent of transcription will be important to follow up.”

      We also have preliminary data that are part of a separate project in the lab so they are not ready for publication, but are directly relevant to the reviewer’s question. This data shows evidence for a region upstream of the Ptn promoter/enhancer deletion described in Figure 6 that, when deleted, DOES have an effect on the RT switch during differentiation. This deletion overlaps an ATAC-seq site we will show in the new figure 6D.

      Reviewer #1 (Significance (Required)):

      This is a compelling basic single-locus study that systematically compares replication timing (RT) and transcription dynamics while measuring several key parameters of transcription.

      My relevant expertise lies in transcriptional regulation and understanding how noncoding transcription influences local chromatin and gene expression.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In the manuscript entitled "Transcription can be sufficient, but is not necessary, to advance replication timing", the authors use, as they state, a "reductionist approach" to address a long-standing question in the replication field: to what extent the process of transcription within a replication domain can alter the underlying replication timing of this domain. The authors use an elegant hybrid mouse embryonic stem cell line to discriminate the two allelic copies and focus on a specific replication domain harboring the neuronal Ptn gene that is only expressed upon differentiation. The authors first introduce four different promoters in the locus upstream of the Ptn gene that drive expression of small transgenes. Only the promoters with the highest transcriptional induction could advance RT. If the promoters are placed in such a way that they drive expression of the 96kb Ptn gene, then some of the weaker promoters can also drive RT advancement, suggesting that it is a combination of transcriptional strength and size of the transcribed domain that is important for RT changes. Using a DOX-inducible promoter, the authors show that this happens very fast (3-6h after transcription induction) and is reversible, as removal of DOX leads to slower RT again. Finally, deleting the promoter of the Ptn gene and driving cells into differentiation still advances RT, allowing the authors to conclude that "transcription can be sufficient but not necessary to advance replication timing."

      Major comments: Overall, this is a well designed study that includes all necessary controls to support the author's conclusions. I think it is a very interesting system that the authors developed. The weakness of the manuscript is that there is no mechanistic explanation how such RT changes are achieved on a molecular basis. But I'm confident that the system could be indeed used to further dissect the mechanistic basis for the transcription dependence of RT advancements.

      Therefore, I support publication of this manuscript if a few comments below can be addressed.

      1) Figure 4 shows a titration of different DOX concentrations and provides clear evidence that the degree of RT advancement tracks well with the level of transcription. As the doses of DOX are quite high in this experiment, have the authors checked on a global scale to what extent transcription might be deregulated in neighbouring genes or genome-wide?

      The DOX concentration that we use for all experiments other than the titration is 2 µg/ml, which is quite standard. The high concentrations (up to 16µg/ml) are only used in the titration experiments shown in Figure 4 to demonstrate that we have reached a plateau. In fact, we stated in Materials and Methods that high doses of Dox led to cell toxicity. Looking at the transcription datasets, there are no significant changes in transcription below 8µg/ml, a few dozen significant changes at 8 and more such changes at 16µg/ml of DOX. The tables of genome wide RT and transcription are provided in the manuscript for anyone wishing to investigate the effects of Dox on cellular physiology but at the concentration used in all other experiments (2µg/ml) there are no effects on transcription.

      ACTION DONE: We have now modified the statement in the Materials and Methods to read: “Mild toxicity and changes in genome-wide transcription were observed at 8µg/ml and more so at 16µg/ml”.

      2) One general aspect is that the whole study is only focused on the one single Ptn replication domain. Could the authors extend this rather narrow view a bit and also show RT data in the neighbouring domains. This would be particularly important for the DOX titration experiment that has the potential to induce transcriptional deregulation (see comment above).

      ACTION DONE: We have now added to revised Supplemental Figure 4 a zoom out of 10 Mb surrounding the Ptn gene showing no detectable effects on RT at any of the titration concentrations.

      ACTION TO BE DONE SOON: To address the generalization of the findings (length and strength matter), we have repeated the ESC to NPC differentiation and performed both Repli-seq and BrU-seq to evaluate RT changes relative to total genomic nascent transcriptional changes. The sequencing reads for this experiment are in our analyst’s hands so we expect this to be ready within a few weeks. We will provide a new Figure 7 comparing genome-wide changes in RT vs. transcription to determine the significance of length and strength of transcription induction to RT advances and the necessity of transcriptional induction for RT advances. We and other laboratories have performed many integrative analyses of RNA-microarray/RNA-seq data vs. RT changes, but not total genomic nascent transcription and not with a focus on the effect of length and strength of transcription. For example, outcomes that would be consistent with our reductionist findings at the Ptn locus would be if we find domains that are advanced for RT with no induction of transcription (transcription not necessary) and little to no regions showing significant induction of transcription without RT advances.

      3) Figure 5 shows that the full capacity to advance RT upon DOX induction of the Ptn gene is achieved after 3h to 6h of DOX induction, so substantially less than a full cell cycle in mESCs (12h). This result suggests that origin licensing/MCM loading cannot be the critical mechanism to drive the RT change because only a small fraction of the cells has undergone M/G1-phase where origins are starting to get loaded. As a large fraction of mESCs (60-70%) are S-phase cells in an asynchronous population, the mechanism is likely taking place directly in S-phase. Could the authors try to synchronize cells in G1/S using double-thymidine block, then induce DOX for 3h before allowing cells to reenter S-phase and then check replication timing of the domain? This can be compared to an alternative experiment where transcription is only induced for 3h upon release into S-phase. This could provide more mechanistic insights as to whether transcription is sufficient to drive RT changes in G1 versus S-phase cells.

      We agree that the timing of induction is such that it is very likely that alterations in RT can occur during S phase. The reviewer proposes a reasonable experiment that could be done, but it would require a long delay of this publication to develop and validate those synchronization protocols and we do not have personnel at this time to carry out the experiment. This would be a great initiating experiment for someone to pursue the mechanisms by which transcription can advance RT.

      ACTION DONE: We have added the following sentence to the Discussion section on mechanisms: “The rapid nature of the RT change after induction of transcription suggests that RT changes can occur after the functional loading of inactive MCM helicases onto chromatin in telophase/early G1 (Dimitrova, JCB, 1999; Okuno, EMBO J. 2001; Dimitrova, J. Cell Sci, 2002), and possibly after S phase begins.”

      Minor comments: • Figure 1B and Figure 6A. Quality of the genome browser snapshots could be improved and certain cryptic labelling such as "only Basic displayed by default" could be removed

      ACTION DONE: We have modified these figures.

      • The genome browser tracks appear a bit small across the figures and could be visually improved.

      ACTION DONE: We have modified the genome browser tracks to improve their presentation.

      • In figure 1E we see an advancement in RT in Ptn gene caused by nearby enhanced Hyg-TK gene expression induced by mPGK promoter. However, in figure 3D we see mPGK promoter has reduced ability to advance RT of Ptn gene. It would be nice to address this discrepancy in the results.

      The reviewer’s point is well taken. We are not sure of the answer. You can see that the transcription is very low in both cases, while the RT shift is greater in one replicate vs. the other.

      ACTION DONE: We have, rather unsatisfactorily, added the following sentence to the results section describing Figure 3: “We do not know why the mPGK promoter was so poor at driving transcription in this context.”

      Reviewer #2 (Significance (Required)):

      In my point of view, this is an important study that unifies a large amount of literature into a conceptual framework that will be interesting to a broad audience working on the intertwined fields of gene regulation, transcription and DNA replication, as well as cell fate switching and development.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      In their manuscript, "Transcription can be sufficient, but is not necessary, to advance replication timing," Vouzas et al. take a systematic and reductionist approach to investigate a late-replicating domain on chromosome VI. Here, they examine the effect of transcribing a single gene locus, Pleiotrophin, on replication timing. When inserting or manipulating promoters or transcript lengths using CRISPR-Cas9, replication timing was altered in mESCs as judged by a combination of Repli-Seq, Bru-Seq, and RNA-Seq. Importantly, they found that transcription can be sufficient to advance replication timing depending on the length and strength of the expression of an ectopically transcribed gene. Taken together, the manuscript presents a compelling argument that transcription can advance replication timing but is not necessary for it.

      Major comments • A schematic or conceptual model summarising the major findings of transcription-dependent and independent mechanisms of RT advancement should be included in the discussion to add to the conceptual framework

      NOT DONE: We discussed this at length between the two senior authors and the first author and we do not feel ready to draw a summary model. We do not know what is advancing RT when transcription is induced or not induced, and we are not comfortable choosing one possible model of many. We hope that the added speculations on mechanism in the Discussion will sufficiently convey the future research that we feel needs to be done.

      ACTIONS DONE: In addition to the speculation on mechanism that already was in our Discussion section, we have added the following.

      On mechanisms of rapid induction of RT change, we have added to the Discussion: “The rapid nature of the RT change after induction of transcription suggests that RT changes can occur after the functional loading of inactive MCM helicases onto chromatin in telophase/early G1 (Dimitrova, JCB, 1999; Okuno, EMBO J. 2001; Dimitrova, J. Cell Sci, 2002), and possibly after S phase begins.” And “For example, long transcripts could remodel a large segment of chromatin, possibly by creating domains of DNA supercoiling (Naughton et al., 2013, PMID 23416946).”

      On mechanisms of RT advance in the absence of transcription, we have added the following to the Discussion: “This is not surprising, given that ERCEs can maintain early RT in the absence of transcription (Turner, bioRxiv). In fact, chromatin features with the properties of ERCEs do appear outside of the deleted Ptn transcription control elements after differentiation (soon to be revised Figure 6C). The necessity and sufficiency of these new chromatin features to advance RT independent of transcription will be important to follow up.”

      • Vouzas et al. spend a substantial part of the manuscript to delve into the requirements to advance RT and even use a Doxycycline-based titration for temporal advancement of RT. Yet, all conclusions come from the use of hybrid-genome mouse embryonic stem cells (mESCs). Therefore, it remains speculative if and whether findings can be generalized to other cell types or organisms. The authors could include another organism/ cell type to strengthen the relevance of their findings to a broader audience, particular as they identified promoters that drive ectopic gene expression without affecting RT. Showcasing this in other model organisms would be of great interest.

      NOT DONE: To set this system up in another cell type or species would take a very long time. We also do not have the personnel to carry out that approach.

      ACTION TO BE DONE SOON: As an alternative approach that partially addresses this reviewer’s concern, we will provide a new Figure 7 with an analysis of RT changes vs. transcriptional changes when mESCs are differentiated to neural precursor cells. As described above in response to Reviewer #2's criticism #2, we have repeated the ESC to NPC differentiation and performed both Repli-seq and BrU-seq to evaluate RT changes relative to total genomic nascent transcriptional changes. The sequencing reads for this experiment are in our analyst’s hands so we expect this to be ready within a few weeks. We will compare genome-wide changes in RT vs. transcription to determine the significance of length and strength of transcription induction to RT advances and the necessity of transcriptional induction for RT advances. We and other laboratories have performed many integrative analyses of RNA-microarray/RNA-seq data vs. RT changes, but not total genomic nascent transcription and not with a focus on the effect of length and strength of transcription. For example, outcomes that would be consistent with our reductionist findings at the Ptn locus would be if we find domains that are advanced for RT with no induction of transcription (transcription not necessary) and little to no regions showing significant induction of transcription without RT advances.

      • OPTIONAL: as with the previous point, the authors went to great depth and length to show how ectopic manipulations affect RT changes on a single locus using genome-wide methods. In addition, the manuscript would benefit from the inclusion of other loci, particularly as transcription of the Ptn locus wasn't needed during differentiation to advance RT at all.

      NOT DONE: This rigorous reductionist approach is laborious, and setting it up one gene at a time at additional loci would be a huge effort taking quite a long time.

      ACTION TO BE DONE SOON: (same as response above) As an alternative approach that partially addresses this reviewer’s concern, we will provide a new Figure 7 with an analysis of RT changes vs. transcriptional changes when mESCs are differentiated to neural precursor cells. As described above in response to Reviewer #2's criticism #2, we have repeated the ESC to NPC differentiation and performed both Repli-seq and BrU-seq to evaluate RT changes relative to total genomic nascent transcriptional changes. The sequencing reads for this experiment are in our analyst’s hands so we expect this to be ready within a few weeks. We will compare genome-wide changes in RT vs. transcription to determine the significance of length and strength of transcription induction to RT advances and the necessity of transcriptional induction for RT advances. We and other laboratories have performed many integrative analyses of RNA-microarray/RNA-seq data vs. RT changes, but not total genomic nascent transcription and not with a focus on the effect of length and strength of transcription. For example, outcomes that would be consistent with our reductionist findings at the Ptn locus would be if we find domains that are advanced for RT with no induction of transcription (transcription not necessary) and little to no regions showing significant induction of transcription without RT advances.

      • The same point of Ptn not needing to be transcribed to advance RT of the respective domain, albeit being a very interesting observation, disturbs the flow of the manuscript, as the whole case was built around transcription and this particular locus-containing domain. Maybe one can adapt the storytelling to fit better within the overall framework.

      We would argue that demonstrating that induction of Ptn, the only gene in this domain, is sufficient to induce early RT is a logical segue to asking whether, in the natural situation, induction is correlated with advance in RT. Our results show that transcription is sufficient but not necessary, which is expected if there are other mechanisms that regulate RT.

      ACTION DONE: To make this transition smoother, we have added the following sentence to the beginning of the results section describing Figure 6: “This raises the question as to whether the natural RT advance that accompanies Ptn induction during differentiation requires Ptn transcription, or whether other mechanisms, such as ERCEs (Sima / Turner) can advance RT independent of transcription.”

      ACTION TO BE DONE SOON: To finish the workflow in a way that ties length, strength, and sufficiency but not necessity into the theme of natural cellular differentiation, we will provide a new Figure 7 with an analysis of RT changes vs. transcriptional changes when mESCs are differentiated to neural precursor cells, as described above.

      Minor comments • While citations are thorough, some references (e.g., "need to add Wang, Klein, Mol. Cell 2021") are incomplete.

      ACTION TO BE DONE SOON: We apologize that some references seemed to not be incorporated into the reference manager Mendeley. Since we are still planning to add one more figure soon and we will need to add some references for the datasets that will be shown in future Figure 6D, after that draft is ready, we will comb the manuscript for any references that were not entered and correct them.

      • The text corresponding to Figure 1C could use more explanation for readers not familiar with the depiction of Repli-Seq data.

      ACTION DONE: “Repli-seq labels nascent DNA with BrdU, followed by flow cytometry to purify cells in early vs. late S phase based on their DNA content, then BrdU-substituted DNA from each of these fractions is immunoprecipitated, sequenced and expressed as a log2 ratio of early to late synthesized DNA (log2E/L). BrU-seq labels total nascent RNA, which is then immunoprecipitated and expressed as reads per million per kilobase (RPMK).”
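      For readers unfamiliar with the log2E/L value described above, the following is a minimal illustrative sketch of how a per-window early-to-late log2 ratio could be computed from read counts. It is not the authors' pipeline: the function name `log2_early_late`, the pseudocount, and the example counts are all assumptions made for illustration.

```python
import math


def log2_early_late(early_counts, late_counts, pseudocount=1.0):
    """Return log2(early/late) for each genomic window.

    early_counts, late_counts: per-window read counts from the early and
    late S-phase fractions (hypothetical inputs for this sketch).
    pseudocount: assumed small constant that avoids division by zero.
    """
    if len(early_counts) != len(late_counts):
        raise ValueError("early and late fractions must cover the same windows")
    return [
        math.log2((early + pseudocount) / (late + pseudocount))
        for early, late in zip(early_counts, late_counts)
    ]


# Positive values indicate earlier replication, negative values later replication.
ratios = log2_early_late([3, 1, 7], [1, 3, 7])
```

      With these made-up counts, the first window replicates early (positive ratio), the second late (negative ratio), and the third shows no bias (ratio of zero).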

      • Figure 1C needs labelling of the x-axes.

      ACTION DONE: We have now labeled the X axes.

      • Statistical analyses should be used consistently throughout the manuscript and explained in more detail, i.e. significance levels, tests, instead of "Significant differences....calculated using x".

      We used the same analysis for all the Repli-seq data and the same analysis for all the BrU-seq data. We agree that we did not present this consistently in the figure legends and methods.

      ACTION DONE: To correct the confusion, we have clarified the statistical methods in the methods section and referred to methods in the figure legends as follows:

      The methods description of statistical significance for RT now reads: “Statistical significance of RT changes for all windows in each sample, relative to WT, was calculated using RepliPrint (Ryba et al., 2011), with a p-value of 0.01 used as the cut-off for windows with statistically significant differences.”

      The methods description of statistical significance for transcription now reads: “Differential expression analysis, including the calculation of statistically significant differences in expression, was conducted using the R package DESeq2. In Figure 1, statistical significance was calculated relative to HTK expression in the parental cell line, which is expected to be zero, since the parental line does not have an HTK insertion. In all other Figures significance was calculated relative to Ptn expression in the parental line, which is expected to be zero, since the parental line does not express Ptn.”

      The legend to Figure 1C now reads: “The red shading indicates 50kb windows with statistically significant differences in RT between WT castaneus and modified 129 alleles, determined as described in Methods.”

      The legend to Figure 1E now reads: “The asterisks indicate a significant difference in the levels of HTK expression relative to HTK expression in the parental cell line as described in Methods. There are no asterisks for the RT data, as statistical significance was calculated for individual 50kb windows as shown in panel (C).”

      Each time significance is measured in the subsequent legends, it is followed by the phrase “, determined as described in Methods” or “presented as in Figure 1C” or “presented as in Figure 1E” as appropriate.

      **Referees cross-commenting** Comment on Reviewer #1's review, comment mentioning ATAC-Seq: Another way to look at this could be to investigate origin usage changes (BrdU-Seq or GLOE-Seq) of chromosome 6 during differentiation.

      NOT DONE: Unfortunately we could not find any studies comparing origin mapping in mESCs and mNPCs.

      Comment on Reviewer#2's review, major comment 3: I do agree with their statement that origin loading cannot be the driver of RT change, as MCM2-7 double hexamer loading is strictly uncoupled from origin firing. Hence, any mechanism responsible for RT advance must happen at the G1/S phase transition or during S-phase, most likely due to the regulated activity of DDK/CDK or the limitation and preferred recruitment of firing factors to early origins. This could be tested through overexpression of said factors.

      NOT DONE: We agree that manipulating these factors would be a reasonable next approach to sort out mechanism. Due to limited resources and personnel, we will not be able to do this in a short period of time. We also argue that these are experiments for the next chapter of the story, likely requiring an entire PhD thesis (or multiple) to sort out.

      ACTION DONE: We have added the following sentence to the Discussion section on mechanisms: “The rapid nature of the RT change after induction of transcription suggests that RT changes can occur after the functional loading of inactive MCM helicases onto chromatin in telophase/early G1 (Dimitrova, JCB, 1999; Okuno, EMBO J. 2001; Dimitrova, J. Cell Sci, 2002), and possibly after S phase begins.”

      Reviewer #3 (Significance (Required)):

      General: This manuscript presents a compelling study investigating the relationship between transcription and replication timing (RT) using a reductionist approach. The authors systematically manipulated transcriptional activity at the Ptn locus to dissect the elements of transcription that influence RT. The study's strengths lie in its rigorous experimental design, clear results, and the reconciliation of seemingly contradictory findings in the existing literature. However, some aspects could be improved, particularly in exploring the mechanistic details of transcription-independent RT regulation at the investigated domain, the generalisability of the findings to other cells/organisms, and enhancing the presentation of certain data (explanation of e.g. Figure 1c, dense figure arrangement, lack of a summary figure illustrating key findings (e.g., correlation between transcription rate, readthrough effects, and RT advancement)).

      Advance: The manuscript directly addresses and reconciles contradictory findings in the literature regarding the effect of ectopic transcription on RT. Previous studies have reported varying effects, with some showing that transcription advances RT (Brueckner et al., 2020; Therizols et al., 2014), while others have shown no effect or only partial effects depending on the insertion site (Gilbert & Cohen, 1990; Goren et al., 2008). The current study conceptually advances the field by systematically testing different promoters and transcript lengths at a single locus (mechanistic insight), demonstrating that the length and strength of transcription, as well as promoter context, influence RT. This presents a unifying concept on how RT can be influenced. The authors also present a tunable system (technical advance) that allows rapid and reversible alterations of RT, which will certainly be useful for future studies and the field.

      Audience: The primary audience will be specialised researchers in the fields of replication timing, epigenetics, and gene regulation. This study may be of interest beyond the specific field of replication timing, such as cancer biology and developmental biology, particularly if a broader applicability of its tools and concepts can be shown.

      Expertise: origin licensing, origin activation, MCM2-7, yeast and human cell lines

    1. Another popular technique is called Wizard of Oz prototyping [1, 2]. ([1] Hoysniemi, J., Hamalainen, P., and Turkki, L. (2004). Wizard of Oz prototyping of computer vision based action games for children. Conference on Interaction Design and Children (IDC). [2] Hudson, S., Fogarty, J., Atkeson, C., Avrahami, D., Forlizzi, J., Kiesler, S., Lee, J. and Yang, J. (2003). Predicting human interruptibility with sensors: a Wizard of Oz feasibility study. ACM SIGCHI Conference on Human Factors in Computing (CHI).) This technique is useful when you’re trying to prototype some complex, intelligent functionality that does not yet exist or would be time consuming to create, and use a human mind to replicate it. For example, imagine prototyping a driverless car without driverless car technology: you might have a user sit in the passenger seat with a couple of designers in the back seat, while one of the designers in the back seat secretly drives the car by wire. In this case, the designer is the “wizard”, secretly operating the vehicle while creating the illusion of a self-driving car. Wizard of Oz prototypes are not always the best fidelity, because it may be hard for a person to pretend to act like a computer might. For example, here’s Kramer, from the sitcom Seinfeld, struggling to simulate a computer-based voice assistant for getting movie times:

      This is an intriguing technique that I have never heard of before. There are many ideas we can come up with, but we might not have the resources that we need to implement those ideas, so the Wizard of Oz technique can be extremely helpful. I think it will become more and more useful as designers try to create designs to get ahead. I feel like since technology is developing at such a fast rate, designers are looking for unique things that they can create that have never been done before, and it will require functionality that may not exist yet.

    1. Elizabeth R. Gordon Interviewed by Lilia Bierman TranscriptElizabeth R. Gordon Interviewed by Lilia Bierman00:00:00:00 - 00:00:37:24LILIA: Okay. I'm recording. ERG: Okay. As I'm scratching my head. Please edit that out. (Laughs)LILIA: (laughs) I will. Okay, our topic is on the transition from VCR, VHS, and DVD rentals to online streaming. The first question is, how old were you when VCR, VHS, and DVD became a thing, and later, when digital became a big thing? 00:00:37:24 - 00:01:05:21ERGSo, VCR, I was 14. Okay. DVD, I think, is probably like college. So maybe 21, 22. So that would have been like in 1993, but they still weren't affordable. Yeah. And then streaming. We probably didn't start streaming anything till about five years ago. I was in my late forties. 00:01:05:21 - 00:01:31:15LILIAOkay. What was your experience adapting to the transition to digital away from VHS, DVD, and VCR? And what did you think about these social changes?00:01:32:15 - 00:01:58:12ERGLike, when you have DVDs, when they get scratched, you would have to deal with that. And that was problematic. A lot of my videos are still on videotape. So my wedding is on tape. Oh my son, all his first moments are also on videotape.So I've got to get those transitioned—and then streaming and digital stuff. I mean like I said, because I came in the generation where we did not have personal computers in college. Everything has had to be self-taught. Luckily, my husband is very good about this, and he helps me out. But now I feel very confident in streaming and doing things like that and having apps on my phone—stuff like that.00:01:58:28 - 00:02:19:10Unknown(LILIA) Okay. (ERG) And then what was the second part of that. (Lilia) And what did you think about these social changes. (ERG) What do you mean by that. (LILIA)I mean it's just like how it it kind of ties into the next question, how it kind of changed your everyday lifestyle, if at all. 
If you noticed any changes, was it more difficult to adapt to.00:02:19:12 - 00:02:36:24ERGI mean, you made it easier because you didn't have to carry all this technology around. You have this I can stream Netflix on my phone now. And you don't have to keep up with X, Y and Z. It, I thought it made it very, it made it much easier and I definitely would not want to go backwards.00:02:38:18 - 00:03:09:11ERGBut I like my parents who are in their 80s. There's no way that they, they like the idea of probably have a Netflix or Amazon Prime, but there's no way that my dad could handle that. Yeah. He has a smartphone that, you know, it's, tech support. Yeah. Smartphone. LILIA Yep. I get it. Were there any challenges that you or others that you know, faced while adapting to these new technologies, whether it was learning it or just kind of want to throw your computer at the wall?00:03:09:16 - 00:03:30:01ERGYou know, because we didn't have any computer classes in high school. Yeah. I think they had one section. But the computers that we had or what we did, especially when I was in college, like I wanted C plus programing, I had never it was never taught like word processing Microsoft Word I learned how to type on a typewriter.00:03:30:22 - 00:03:51:21ERGSo again everything was self-taught. It was very hard to begin with and made me kind of nervous. I know a lot of people, think that they can mess something up and can't get it back, and, and there was a lot of anxiety, with that transition. But I feel, you know, again, like, I don't know everything.00:03:51:23 - 00:04:11:10ERGAnd I have children that can help me out, but, you know, I've had to learn a lot. My generation has had to learn a lot. Yeah. And most of us have adapted well, I think. Yes. I'm in Gen X, so that's 1965 to about 1980. And and we've learned a lot and adapted. You know. Yeah. The generation before us.00:04:11:12 - 00:04:38:29ERGNo they're not going to do that. No they're not. 
In retrospect what were the pros and cons of these shifts in technology? You can get more data on things. So I remember when I was writing my thesis in graduate school, and I was still we we didn't have a lot of memory on computers and had to save it on disks, and it took like 6 or 7 disks and it would be awful.00:04:38:29 - 00:05:01:07ERGAnd then I'd have to get another. So that was extremely frustrating. You know, being able to have things that are quicker and easier to access and knowing that I've got more space and understanding what a megabyte is, what a gigabyte is, and the storage, that is a lot, lot more helpful. But again, I, I, I've enjoyed the technology push.00:05:01:07 - 00:05:26:12ERGThe one thing I don't like about it is that, I'm glad that I raised my children before this. Because I think that kids that are now being raised, a lot of them, you know, this is, this is shoved in their direction in order to occupy them and they're missing out on reading books. They're missing out on dealing with time that you just have to entertain yourself.00:05:26:12 - 00:05:42:26ERGLike going to the doctor's office. We always read books, or we always did stories, or we always just talked about our day. And now I see, you know, like a two year old or one year old, the doctor's office and the parent says this. Yep, yep. And that is just. And then again, you know, my students, I say it's constant.00:05:42:28 - 00:06:08:01ERGYeah. They can't cut it all. No. Like you got to be professional and put it aside and make eye contact. So it's all like that. Yeah. No, I totally agree. Looking back, what are the biggest lasting impacts of this shift? I just like the fact that you have more information that's accessible. 
You do have to decipher what is true and what's not true.00:06:08:02 - 00:06:29:26ERG Yeah, but, you know, if I have a question, instead of having to go to a library and find the book or and I would have I mean, I've taken graduate classes since the shift and my papers, I can find so much more information to write about. Because it's more accessible than half in a way on interlibrary loan or going over there and looking something up.00:06:29:28 - 00:06:54:27ERGSo I do like that quick access to information. I do like the portability of it. And I think that has really changed. And then I mean, things like exposure, like medical records. And when I make a doctor's appointment, the reminder will shift through my cell phone, or I'll shift through the app and then I can find out my test, my blood test for that rather quickly, and have to rely on somebody to call me and tell.00:06:55:00 - 00:07:04:29LILIAYeah, I totally agree. So I love all that. Yeah, it is very helpful. How would you describe this shift in one word?00:07:05:24 - 00:07:10:15ERGOne word?00:07:11:18 - 00:07:35:04ERGI think it's exciting. Yeah, I think it really is. I mean, again, I've embraced it because I've been forced to embrace it as an educator. As a parent. So I've everything about I've like except for again that this is just steering people away from having relationships. Yeah. And learning how to deal with, you know just empty time.00:07:35:04 - 00:07:56:10ERGYou've, you've got to, I think, a lot of parents are missing out on that. They definitely are. LILIAYeah, I totally agree. Do you miss VCR, VHS or DVD? And if so, what aspects specifically do you miss?00:07:56:13 - 00:08:19:09ERGCan't miss it if it's never gone. And I still have all my children's Pixar stuff. We lived on it. They had portable DVD players that would hook into the car. Yeah. We had 13-hour (car) rides to go with it. 
LILIAI mean, you can't argue about that.00:08:19:15 - 00:08:40:27ERGNo, you cannot, but no, I don't miss this at all. You know, I need to get the one thing that I'm really concerned about, which is that I need to get all my son's videos transferred over, and I'm about to send them to somebody. Yeah. And then my wedding video. I need to get that transferred into something. So, no, I don't miss it.00:08:40:29 - 00:09:01:17ERGNo, I still have a bunch, and I still have a DVD player. We got rid of the VCR a couple of years ago. Oh, maybe we haven't. So I can't watch my wedding videos anymore. But now I don't miss this at all. Okay, well that's fair. I don't blame you, since it does, and there's nothing in your computer, so, like.00:09:01:23 - 00:09:37:29ERGNo, I can't know. And there used to be some laptops where you could plug in CD's. Yeah, I remember that. And then like, you know, in the cars when I was 16, you had just, you had a radio and then you had a tape. And then like if you're real fancy, you had a plug in DVD and you plug in a CD player, but like when you went over a bob it was and then came you know they installed and I think my car right now it's like a 2016 I think it has a cassette and a DVD player.00:09:38:12 - 00:09:54:03ERGMay not have the cassette probably then, but yeah, it's just and then all that trying to figure out your song that you want, I mean it's just so much easier. Yeah. Just to plug something in or auto-connect it. It's fantastic. LILIAYeah. Okay. Well, that was all of my questions.Steven Hawk Interviewed by Colby Hawk TranscriptDr. Steven Hawk Interviewed by Colby Hawk00:00:00:00 - 00:00:28:08 Steven: Okay. Go ahead. You can introduce yourself. Yes. My name is Doctor Steven Hawk and I am a licensed K through 12 English teacher. And I've been teaching in the public schools for eight years now. Colby:  Cool. So, about how old were you? 
When, you know, you grew up with the, you know, VHS, VCR and everything, what was it like with that being a big thing back in the day?  00:00:28:08 - 00:00:48:04 Colby: What was your experiences with everyday life and having it having this technology?  Steven: Yeah. From, from the age where I was able to really watch movies, I was watching VHS tapes. So, I had a very small collection of VHS tapes and pretty much just rewatched the same 2 or 3 movies again and again and again and again.  00:00:48:04 - 00:01:06:24 Steven: As my mom would tell you, she would say, I wore out Land Before Time on VHS and Home Alone. Those are my two movies that I pretty much would play ‘em rewind ‘em, play ‘em, rewind ‘em. So as a child, that was my experience was just VHS tapes. You could go to a blockbuster and rent a VHS tape at that point.  00:01:06:26 - 00:01:29:22 Steven: But you owned very few and you were able to rent very few. If you were able to rent, it was usually like once a week. So, you didn't watch a lot of movies. And when you did, hopefully it was something you really liked, and you just watched it again and again and again.  Colby: Cool. Yeah. And having the technology and everything and, you know, the, you know, VHS mainly for you.  00:01:29:24 - 00:01:53:16 Colby: what was it like transitioning, to this digital, you know, internet age when you have iPhones in your pocket, MacBooks and streaming and all of that?  Steven: Yeah. So, the, the, the chain for me, was we went from VHS to DVD probably when I was about 13 years old, around 13. We, we had DVDs and that was a big deal.  00:01:53:19 - 00:02:15:11 Steven: And then DVDs evolved into Blu rays. So, the quality of the DVD DVDs got better. I remember it was my sophomore year of high school when MP3's became a thing. So no longer do we have to carry Walkmans to listen to music, but which is like a DVD, right? we transitioned to MP3's, and so the digital age kind of came upon us.  
00:02:15:15 - 00:02:42:09 Steven: It wasn't until I was probably 22 that I had my first iPhone. So growing up, you know, we didn't have internet for the most part of my life. We didn't have any kind of apps or streaming until I was in my probably early 20s. And so that was a huge change because of the amount of things that you could be, I guess, exposed to through streaming.  00:02:42:12 - 00:03:07:12 Steven: It went from having to have a physical copy of a movie or a disc for music to being able to just choose from a vast digital library of different genres and different artists, to then seek out things which isn't something you were able to do. No more than just going to blockbuster and looking through the shelves, could you really seek out different genres and different types of things.  00:03:07:12 - 00:03:29:03 Steven: So, it in a lot of ways it was very freeing because it introduced you to a lot of new things, and you were able to discover a lot of new, tastes, genres, artists, things like that. So, yeah, I would say I was probably about 22 when streaming really caught on in the United States.  00:03:29:05 - 00:03:49:05 Colby: Now, if when you were 22, when you were 22, you would have just gotten out of college. So when you were still at UTK, what was that like, you know, going, you know, if you wanted to go watch something with your friends or, you know, catch up on the newest whatever, what what was that experience like before you had access to all this?  00:03:49:06 - 00:04:11:11 Steven: Yeah. So it was still DVDs were still the thing. You know, when I was in college, we hadn't moved to streaming quite yet. We had the internet age where you were streaming games online with friends and multiplayer and stuff like that. But not really movies. Movies and TV were not mainstream stream. They were not streamed to the mainstream yet.  
00:04:11:14 - 00:04:33:23 Steven: And so for me, it was still going to the movies, you know, my friends and I, we would go to the movie theater if there was a movie coming out. You knew the release date and you would you would set a date and a time to go see the movie with your friends physically at a theater. So it wasn't like we stayed in our dorms or apartments and were able to stream the newest movie or TV show.  00:04:33:25 - 00:05:03:12 Steven: So, for me, that was it was still kind of what you would consider an old school experience. I know I've told you Facebook came out in 2005 when I first went to college. And, you know, so social media and the evolution of all streaming from internet, computer platforms, to digital media, for movies, and games, and music, that all really, you know, came mainstream after my college experience. Not during.  00:05:03:15 - 00:05:25:03 Colby: Now, the one big thing I think, and most everybody knows about right is blockbuster.  Steven  Yeah.  Colby  So, can you tell me a little bit more about your experiences with blockbuster? You know, was there like a membership program? Was there like certain deals that they had? What was it like going into one of these stores and renting and picking out your favorite flicks?  00:05:25:05 - 00:05:51:07 Steven: Yeah. If there was a membership program, I'm not aware. As a small child, I don't remember if there was a membership program. But what I do remember, and I tell people often, it was always like Christmas morning for me. I loved blockbuster. I think everyone kind of had the same experience where it was 1 or 2 times a week that you might be fortunate enough to go to a blockbuster and get to rent a new movie that you had never seen.  
00:05:51:10 - 00:06:09:23 Steven: It was usually a Friday night, and you've been going to school all week and you're just looking forward to Friday night, because that's the one time your parents get to take you to blockbuster and you walk in the store, and it was like toys R us. You have all these movies, and it was just the covers of the movies with a DVD behind it.  00:06:09:25 - 00:06:32:09 Steven: And if you wanted to watch that movie, you had to take the cover out of the way and see if the DVD was still left. And if there was no DVD, then someone had already rented that movie. And if there were enough left, then you got to take one home. But very often they'd already been rented, and so some, some nights you would go for a certain movie, a new release, and it wasn't there.  00:06:32:14 - 00:06:50:03 Steven: And you'd be a little bummed, but you would just go pick out another movie and you would be excited because you didn't get to watch movies, but maybe once or twice a week. like, at all. You didn't get to watch any more than 1 or 2 movies a week. And so, it was a big deal to watch a movie back then, and it was a lot of fun.  00:06:50:04 - 00:07:15:08 Steven: It was something you really look forward to for Monday. You look forward to getting to Friday and Saturday so you could watch a movie and, and so yeah. It was really special back then. And, it had its. Looking back, you could say it had its difficulties. Like I said, you know, the movie may not be there for you to rent, but we dealt with that disappointment really well, I think, and just say, hey, maybe it'll be back by tomorrow.  00:07:15:08 - 00:07:36:02 Steven: Maybe we could rent it on Saturday night. If not, maybe next week. That'll be the movie. So, you know, we didn't get mad about it. It was part of the deal when you went to blockbuster. So I feel like, you know, movies were so much more special back then because they were so much more rare, and they're not rare anymore.  
00:07:36:05 - 00:07:56:08 Steven: And so, you know, I miss I miss blockbuster, I miss the excitement of going into the store and the excitement of seeing if the DVD is still there and the excitement of taking it home and watching it. In the VHSs, you had to be kind and rewind is what you had to do. You know, you rewound the tape for the next person to use it.  00:07:56:15 - 00:08:14:18 Steven: When DVDs came along, it was special because you no longer had to rewind the movie. You could just return the disc. So that was a big deal for us. And then of course, as it moved to streaming, you could watch whatever you wanted whenever, you know, whatever day of the week. You didn't have to worry about rewinding or anything.  00:08:14:18 - 00:08:37:21 Steven: So, it was definitely an evolution. But, for me, blockbuster was really special. And not just blockbuster, but, you know, even Redbox later and, you know, any form of renting a movie during the week was really special.  Colby: Yeah. And, you're talking about how, you know, now it's not as you know, it's not special. You know, it's not, you know, you have easy access to everything.  00:08:37:21 - 00:09:10:19 Colby: And, kind of on that note, like looking back at your experiences having, you know, dealt with DVDs, VHS, all this stuff, and then having Disney+ and Netflix, and, whatever, Hulu, whatever. You know, how has that changed, like your lifestyle or, you know, just society today and, and like what what would you say or like in some of the pros and cons with having this easy access through, you know, the internet or whatever, you know.  00:09:10:24 - 00:09:35:04 Steven: Yeah. Definitely, it's a double edged sword. To kind of go back to say, Netflix started as a DVD subscription process, and then that turned into a digital streaming process. I didn't jump into that process, probably for a couple of years into when Netflix became a digital subscription service. Netflix was the first one that I subscribed to.  
00:09:35:06 - 00:09:54:08 Steven: It was fairly cheap, and I thought, hey, this seems pretty neat, and I gave it a try. And that was my first foray into the digital streaming world. And I enjoyed it. You know, my first experience was, or my first thought was this, this is nice. This is a lot better than having to, you know, get out of my house and drive to a store and it may or may not be there.  00:09:54:08 - 00:10:20:06 Steven: And so, there were some pros there. There were some benefits to that process. But I think as time went on, and this is a year's process, right? As more and more things started to become, digital based, streaming based platforms, news, TV, movies, eventually, taking you out of the theater, even, and just leaving you in your living room.  00:10:20:08 - 00:10:50:07 Steven: Then the layers with Covid. You know, people not getting out of their house. They marketed streaming really heavily during the Covid years, and the years to follow Covid, as something to keep you safe. So it was a marketing ploy to really get you to binge watch and stream. So like I said, it became over time, I believe more of a negative thing had a negative impact on my life because it's so addictive.   00:10:50:09 - 00:11:27:02 Steven: Right? That word binge is probably not a positively connotated word in any other setting. If you binge on food per se, that would not be good. But to binge on Netflix has been marketed as a culturally positive thing. It's something that's good to do. And while it may seem good and may seem fun, and you may find a show or, you know, a series of shows that have five, seasons, and you can watch all of them in a matter of two weeks, I’m not sure that that’s healthy.   00:11:27:10 - 00:11:53:13 Steven: And, in my own life, personally, I think, I think it has had a negative impact to be totally honest. It’s much easier after a hard day of work to go to my bedroom and shut the door away from my kids and silence the house and just consume right? 
To not give anymore, but to just consume, to binge.   00:11:53:15 - 00:12:16:00 Steven: And that's not good. And I know that that's not good. And so, I feel like now I'm having to self-police. I'm having to say this much is okay, but this much is dangerous. This is not good, not healthy. And so, there's it's a fine line. I'm not exactly sure where the line is now because it's all an evolving process.    00:12:16:02 - 00:12:54:07 Steven: But for me personally, I know it's taking time from my kids, taking time from me reading books and things that I used to do more of, perhaps taking time away from, you know, talking to my wife and communicating. Giving myself a pass when things have been difficult to just sit there and binge and to stream. So, while there have been good things, I think you are, you're probably, kind of like the genres of music. You’re able to discover more through streaming, things that you didn't know existed or things that you didn't know perhaps you were interested in.  00:12:54:10 - 00:13:20:01 Steven: But the negative effect, I think, perhaps outweighs the positive. And that's just my experience. I know some people would disagree.  Colby: Yeah, there's a lot of differing opinions on, streaming and everything. And I think, I mean, I don't even have time to binge these days anymore, which is probably a good thing.  Steven: Yeah, I think so.  Colby: So we talked, you know, you touched on, like, the society and the shift and changes.  00:13:20:01 - 00:13:51:08 Colby: That was very good. With online and all that. Were there any, I guess, you kind of talked about this maybe a little bit, but like any challenges that you or any others that you observed or faced with this challenge of going away from, you know, more analog, whatever, to digital?   Steven: Yeah. I mean, nothing, nothing dramatic or drastic, but I think the first challenge was, of course, going from DVD to streaming because we were in an in-between stage there for a while.  
00:13:51:13 - 00:14:07:23 Steven: You had streaming apps out there, and you had Netflix and things that you could, you know, sign up for and partake of, but it's like you kind of had a toe in that world, but you were still stuck to DVDs and you rented from, you know, once blockbuster went out, it was Redbox or, you know, stuff like that.  00:14:07:23 - 00:14:30:20 Steven: And then when I went full into streaming, then, I guess the challenge is, you know, part of its financial, to be totally honest. You’re, you're paying for things regularly that you didn't used to pay for, you know. Monthly, you're paying at a minimum, People are probably paying for one streaming app. Lots of people are paying for five or more streaming apps.  00:14:30:22 - 00:14:57:01 Steven: So what used to be free through cable is now charged through apps. So that's been a struggle. Just a financial struggle is like, where's the line between what's an appropriate amount to spend on this form of entertainment and what's not? What’s healthy, what's not? I know this was not for me, but for for some elderly people, there was a huge problem trying to transition to the digital streaming apps.  00:14:57:01 - 00:15:19:13 Steven: And, you know, they they had their TVs that they liked, but they weren't smart TVs. So, you know, they had to figure that they needed a new TV and how to work a new remote and how to download apps and work apps. And that wasn't a problem for me. But I did deal and try to help a lot of elderly people through that transition process to understand how to stream content.  00:15:19:16 - 00:15:40:17 Steven: But for me, you know, like I said, it was just kind of a. It was a learning phase then followed by a self-policing phase of what's. What do I need and what do I not need? Because everyone who develops a streaming app tells you that you need it. And it's kind of hard to select the right service, you know? Do you go with Hulu?  
00:15:40:17 - 00:15:59:22 Steven: Do you go with, you know, Comcast? Which one do you go with? There are just so many to choose from that I had to do my research before I landed on the one that I would pay for. Yeah.  Colby: So I think we've already talked about, like, looking back, what were the big impacts on that.  00:15:59:22 - 00:16:29:29 Colby: I think we already touched that. Steven:  Yeah.  Colby:  How would you describe that shift in one word? Or that shift or like actually three things. How do you describe the shift?  The time before the like the VHS DVDs, all that. And then the time now after this shift. Like three, I know upped it but three.  Steven: Yeah. I would say for the time past, nostalgic. Nostalgic is my word because I miss it.  00:16:30:01 - 00:16:51:15 Steven: It's it's something you didn't know that you would miss when it when when it went away. there was sadness when blockbuster went out of business, but there was also an acceptance that this is just the new way of things. And sometimes the more we get into the new way, the more I wish it could become the old way.  00:16:51:18 - 00:17:19:01 Steven: So nostalgic would be that one. For the transition, I would say exciting would be the word I would use for that. I can remember being the only, high schooler, on the way to a baseball team with a new iPod that streamed. Or not streamed but you know, had the MP3 downloaded music that I could just select from a playlist, while all my friends had a Walkman disc that would skip if, you know, they didn't hold it right.  00:17:19:01 - 00:17:47:03 Steven: And so for me, it was exciting. It was a new frontier. It was a new challenge to learn the technology of it. What was for for the, what was the last question for now? I would say the word is dangerous. For the reasons I've stated already, you know, the, mainly the social reasons. What is marketed to us is that we, again, should binge these things.  
00:17:47:09 - 00:18:15:27 Steven: We need these things. We can't live without these things. There's a lot of clever marketing that goes into it, and a lot of people that are persuaded by that marketing, including me to some extent. Right. Because I stream. I do watch shows and a lot of it, a lot more than I used to. What used to be one movie a week has turned into ten movies a week. And  20 episodes a week. And that's dangerous.  00:18:15:28 - 00:18:38:02 Steven: It’s dangerous because it's taking me from things that are more important. And it's giving me a pass when I'm tired to say I don't have to struggle with difficult things. I can just. I deserve this. To just sit quietly in my room, away from my children, away from my wife, away from whomever, and reward myself. I think that's a dangerous notion.  00:18:38:04 - 00:18:50:15 Steven: So dangerous, I think, would be the word. Colby: Cool. Yeah. And then. Yeah my battery’s giving me the warning. I think I've got 1 or 2. One more question.  00:18:50:15 - 00:19:10:24 Colby: Okay, so that two part thing, I guess if you could give me one more comment, like do you miss it? You know, do you miss the VHS? You know, rewinding and you know, having, you know, all that the blockbuster and what do you. What, if anything, would you change today? And then what were your favorite, you know, tapes? Or your.  00:19:10:28 - 00:19:34:01 Steven: Yeah. Yeah. Yeah. So I mentioned earlier, my two favorites when I was young was Land Before Time. The original Land Before time. The first one. Petrie, Longneck, and all the, Sharptooth. That was, I've watched that on repeat, I think. And, and then later when I was a little older, it was, Home Alone, the original Home Alone with Macaulay Culkin. And I just thought that was hilarious.  00:19:34:04 - 00:19:53:05 Steven: It’s kind of slapstick humor, you know? And so those are the two that were my favorite. As far as, you know, do I miss it? Absolutely. 
I miss the way things were, because I think I missed the way I was, and my family was, and other people were. That's what I missed. It's not that I miss blockbuster itself.  00:19:53:07 - 00:20:21:08 Steven: I miss the type of world that we lived in when we still had a blockbuster. When movies were still special. I didn't say earlier, but you know, as a, as a ninth-grade high school teacher, when we, when I was young and we had a special movie day that was like the best day ever. And so, as a teacher, I thought, hey, when they've really worked hard, I'm going to give them a special movie day occasionally, because I love that when I was young. And I tried that.  00:20:21:11 - 00:20:45:07 Steven: And I've learned that you can't get these kids to focus on a movie anymore. They're so desensitized. They're so overstimulated. They won't even watch a movie anymore. They don't care about movies anymore. I miss how much people cared about movies. So, yeah, I miss it. It's not that I miss VHS again. It's just I miss the way people were.  00:20:45:10 - 00:21:03:00 Steven: And I don't think we can ever get that back. I think we're too far away from that. I don't think we get back to that. So as far as the second part, you know, what could, I what would I change if I could change something? What would I want to change I don't think I have the power to change.  00:21:03:02 - 00:21:23:03 Steven: I want families to sit together on a couch on a Friday night, like I did with a couple pizzas and a show and watch it together, and laugh together, and have time together like family should. That's what I want to happen. but I can't make that happen for other people. I can try to make it happen in my home.  00:21:23:05 - 00:21:47:25 Steven: And, and I've been trying to do that more, you know? I've been consciously trying to do that more in my own home. But I can't do it for other peoples. 
And so, what I'm seeing in our culture is a shift away from, from loving one another, from spending time, quality time together, and for giving ourselves, as parents, a pass for spending time with our kids.  00:21:47:25 - 00:22:08:07 Steven: And sometimes, even for parenting our kids. Because it's easier just to put them in front of an iPad or a TV screen and just let them watch a movie than it is to discipline, or to ask them how their day was, or to troubleshoot things in their lives, or to help them with their math homework.  00:22:08:09 - 00:22:28:24 Steven: It’s easier just to let them stream something. So I don't know how we fix that, Colby. That's that's something that I've thought about a lot lately. How do we, as a society, as a culture, get back to at least some part of what we used to be when blockbuster still existed? I don't know, I don't know the answer to that.  00:22:28:24 - 00:22:52:17 Steven: I think it's a. It’s a question that people have to challenge themselves with personally. They have to know who they are, what they've become, what they want to be, and then find a way to, to find that middle ground between what's enough streaming and what's too much streaming for themselves as parents, as adults, and also for their children.  00:22:52:19 - 00:23:00:15 Steven: And I just don't have a good answer to that, even though I wish I could. Colby:  Sweet. That was a very good answer.
Paul Navis  Interviewed by Cole Kennedy Transcript

      Good job running the interviews as conversations rather than spitting the questions out without any follow-up questions! I also appreciate that the transcripts were cleaned up and made easier to navigate.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity)

      *This study examines the reorganization of the microtubule (MT) cytoskeleton during early neuronal development, specifically focusing on the establishment of axonal and dendritic polarity. Utilizing advanced microscopy techniques, the authors demonstrate that stable microtubules in early neurites initially exhibit a plus-end-out orientation, attributed to their connection with centrioles. Subsequently, these microtubules are released and undergo sliding, resulting in a mixed-polarity orientation in early neurites. Furthermore, the study elegantly illustrates the spatial segregation of microtubules in dendrites based on polarity and stability. The experiments are rigorously executed, and the microscopy data are presented with exceptional clarity. The following are my primary concerns that warrant further consideration by the authors.*

      1. Potential Bias in the MotorPAINT Assay: Kinesin-1 and kinesin-3 motors exhibit distinct preferences for post-translationally modified (PTM) microtubules. Given that kinesin-1 preferentially binds to acetylated microtubules over tyrosinated microtubules in the MotorPAINT assay, the potential for bias in the results arises. Have the authors explored the use of kinesin-3, which favors tyrosinated microtubules, to corroborate the observed microtubule polarity?

      We thank the reviewer for the careful assessment of our manuscript. As the reviewer noted, it has indeed been demonstrated that kinesin-1 prefers microtubules marked by acetylation (Cai et al., PLoS Biol 2009; Reed et al., Curr Biol 2006) and kinesin-3 prefers microtubules marked by tyrosination in cells (Guedes-Dias et al., Curr Biol 2019; Tas et al., Neuron 2017); however, these preferences are limited in vitro, as demonstrated for example in Sirajuddin et al. (Nat Cell Biol 2014). When motor-PAINT was introduced, it was verified that purified kinesin-1 moves over both acetylated and tyrosinated microtubules with no apparent preference in this assay (Tas et al., Neuron 2017). This could be due to the more in vitro-like nature of the motor-PAINT assay (e.g. some MAPs may be washed away) and/or because of the addition of Taxol during the gentle fixation step, which converts all microtubules into those preferred by kinesin-1. We will clarify this in the text.

      Planned revisions:

      • We will clarify the lack of kinesin-1 selectivity in motor-PAINT assays in the text by adding the following sentence in the main text when introducing motor-PAINT: Importantly, while kinesin-1 has been shown to selectively move on stable, highly-modified microtubules in cells (Cai et al., PLoS Biol 2009; Reed et al., Curr Biol 2006), this is not the case after motor-PAINT sample preparation (Tas et al., Neuron 2017).

      Axon-Like Neurites in Stage 2b Neurons: The observation of axon-like neurites in Stage 2b neurons, characterized by an (almost) uniformly plus-end-out microtubule organization, is noteworthy. Have the authors confirmed this polarity using end-binding (EB) protein tracking (e.g., EB1, EB3) in Stage 2b neurons? Do these neurites display distinct morphological features, such as variations in width? Furthermore, do they consistently differentiate into axons when tracked over time using live-cell EB imaging, rather than the MotorPAINT assay? Could stable microtubule anchoring impede free sliding in these neurites or restrict sliding into them? Investigating microtubule sliding dynamics in these axon-like neurites would provide valuable insights.

      We thank the reviewer for highlighting this finding. Early in development, cultured neurons are known to transiently polarize and have axon-like neurites that may or may not develop into the future axon (Burute et al., Sci Adv 2022; Schelski & Bradke, Sci Adv 2022; Jacobson et al., Neuron 2006). In the absence of certain molecular or physical factors (e.g. Burute et al., Sci Adv 2022; Randlett et al., Neuron 2011), this transient polarization is seemingly random and as such, we do not expect the axon-like neurites in stage 2b neurons to necessarily become the axon. Interestingly, anchoring stable microtubules in a specific neurite using cortically-anchored StableMARK (Burute et al., Sci Adv 2022) or stabilizing microtubules in a specific neurite using Taxol (Witte et al., JCB 2008) has been shown to promote axon formation, but these stable microtubules have slower turnover (perhaps necessitating the use of laser severing as in Yau et al., J Neurosci 2016) and may not always bear EB comets given that EB comets are less commonly seen at the ends of stable microtubules (Jansen et al., JCB 2023).

      Planned revision:

      • We will add additional details to the text to clarify the likely transient nature of this polarization in agreement with previous literature and specify that they are otherwise not morphologically distinct.
      • We will perform additional EB3 tracking experiments in Stage 2b neurons to examine potential differences between neurites.

      *Taxol and Microtubule Sliding: Taxol-induced microtubule stabilization is known to induce the formation of multiple axons. Does Taxol treatment diminish microtubule sliding and prevent polarity reversal in minor neurites, thereby facilitating their development into axons?*

      We thank the reviewer for this interesting suggestion. Taxol converts all microtubules into stable microtubules. Given that the initial neurites tend to be of mixed polarity, having stable microtubules pointing the "wrong" way may impede sliding and polarity sorting. Alternatively, since it is precisely the stable microtubules that we see sliding between and within neurites using StableMARK, Taxol may also increase the fraction of microtubules undergoing sliding. Because of this, it is not straightforward to predict how Taxol affects microtubule (re-)orientation and sliding. Preliminary motor-PAINT experiments do suggest that the multiple axons induced by Taxol treatment all contain predominantly plus-end-out microtubules, as expected, and that this is the case from early in development. We will further develop these findings to include them in our manuscript.

      Planned revision:

      • We have already performed some experiments in which we treat neurons with 10 nM Taxol and verify that we observe the formation of multiple axons by motor-PAINT. We will perform additional experiments in which we add this low dose of Taxol to the cells and determine its effect on microtubule sliding dynamics.

      *Sorting of Minus-End-Out Microtubules (MTs) in Developing Axons: Traces of minus-end-out MTs are observed proximal to the soma in both Stage 2b axon-like neurites and Stage 3 developing axons (Figure S4). Does this indicate a clearance mechanism for misoriented MTs during development? If so, is this sorting mechanism specific to axons? Could dynein be involved? Pharmacological inhibition of dynein (e.g., ciliobrevin-D or dynarrestin) could assess whether blocking dynein disrupts uniform MT polarity and axon formation.*

      We indeed think that a clearance mechanism is involved for removing misoriented microtubules in the axon after axon specification. Many motor proteins have been implicated in the polarity sorting of microtubules in neurons and for axons, dynein is believed to play a role (Rao et al., Cell Rep 2017; del Castillo et al., eLife 2015; Schelski & Bradke, Sci Adv 2022). A few of these studies already employed ciliobrevin, noting that it increases the fraction of minus-end-out microtubules in axons (Rao et al., Cell Rep 2017) and reduces the rate of retrograde flow of microtubules in immature neurites (Schelski & Bradke, Sci Adv 2022). These findings are in line with the suggestion of the reviewer. Interestingly, however, as we highlight in the discussion, the motility we observe for polarity reversal is extremely slow on average (~60 nm/minute) because the microtubule end undergoes bursts of motility and periods in which it appears to be tethered and rather immobile. Given that most neurites are non-axon-like, we assume these sliding events are mostly not taking place in axons or axon-like neurites. These events may thus be orchestrated by other motor proteins (e.g. kinesin-1, kinesin-2, kinesin-5, kinesin-6, and kinesin-12) that have been implicated in microtubule polarity sorting in neurons. We do observe retrograde sliding of stable microtubules in these neurites at a median speed of ~150 nm/minute, which is again much slower than typical motor speeds and occurs in almost all neurites and not specifically in one or two axon-like neurites. It is thus unclear which motors may be involved, and it is difficult to predict how any drug treatments would affect microtubule polarity.

      Dissecting the mechanisms of microtubule sliding will require many more experiments and will first require the recruitment and training of a new PhD student or postdoc. Therefore, we feel this falls outside the scope of the current work, which carefully maps the microtubule organization during neuronal development and demonstrates the active polarity reversal of stable microtubules during this process.

      Planned revision:

      • We will expand our discussion of the potential mechanisms facilitating polarity sorting in axons and axon-like neurites in the discussion.

      Impact of Kinesin-1 Rigor Mutants on MT Polarity and Dynamics: Would the expression of kinesin-1 rigor mutants alter MT dynamics and polarity? Validation with alternative methods, such as microtubule photoconversion, would be beneficial.

      It is important to note that StableMARK and its effects on microtubule stability have been extensively verified in the paper in which it was introduced (Jansen et al., JCB 2023). At low expression levels (where StableMARK has a speckled distribution along microtubules), StableMARK does not alter the stability of microtubules (e.g., they are still disassembled in response to serum starvation), alter their post-translational modification status or their distribution in the cell, or impede the transport of cargoes along them. Given that we chose to image neurons with very low expression levels of StableMARK (as inferred by the speckled distribution along microtubules), we expect its effects on the microtubule cytoskeleton to be minimal.

      Planned revision:

      • We will clarify the potential effects of StableMARK in the manuscript. We will perform experiments with photoactivatable tubulin to examine whether we still see microtubules that live for over 2 hours. We will furthermore examine whether it allows us to see microtubule sliding between neurites similar to work performed in the Gelfand lab (Lu et al., Curr Biol 2013).

      *Molecular Motors Driving MT Sliding: Which specific motors drive MT sliding in the soma and neurites? If a motor drives minus-end-out MTs into neurites, it must be plus-end-directed. The discussion should clarify the polarity of the involved motors to strengthen the conclusions.*

      We thank the reviewer for highlighting this point and will improve our discussion to clarify the polarity of the involved motors.

      Planned revision:

      • We will expand our discussion of the motors potentially involved in sliding microtubules when revising the manuscript.

      Stability of Centriole-Derived Microtubules: Microtubules emanating from centrioles are typically young and dynamic. How do they acquire acetylation and stability at an early stage? Do centrioles exhibit active EB1/EB3 comets in Stage 1/2a neurons? If these microtubules are severed from centrioles, could knockdown of MT-severing proteins (e.g., Katanin, Spastin, Fidgetin) alter microtubule polarity during neuronal development? A brief discussion would be valuable.

      We thank the reviewer for raising these interesting questions and suggestions. As suggested, we will include a brief discussion of these issues. What is known about the properties of stable microtubules is limited, so it is currently unclear how they are made. For example, we do not know if they are converted from labile microtubules or nucleated by a distinct pathway. If they are nucleated by a distinct pathway, do these microtubules grow in a similar manner as labile microtubules and do they have EB comets at their plus-ends (given that EB compacts the lattice (Zhang et al., Cell 2015, PNAS 2018) and stable microtubules have an expanded lattice in cells (de Jager et al., JCB 2025))? If they are converted, does something first cap their plus-end to limit further growth (given that EB comets are rarely observed at the ends of stable microtubules (Jansen et al., JCB 2023))?

      We also do not know how the activity of the tubulin acetyltransferase αTAT1 is regulated. Is its access to the microtubule lumen regulated or is its enzymatic activity stimulated by some means (e.g., microtubule lattice conformation or a molecular factor)?

      We find the possibility that microtubule severing enzymes release these stable microtubules from the centrioles very exciting and hope to test the effects of their absence on microtubule polarity in the future. We will discuss this in the manuscript as suggested.

      Planned revision:

      • We will expand our discussion about the centriole-associated stable microtubules in the revised manuscript.

      Minor Points

      • In Movies 3 and 4, please use arrowheads or pseudo-coloring to highlight microtubules detaching from specific points. In Movie 5, please mark the stable microtubule that rotates within the neurite. These annotations would enhance clarity.

      Planned revision:

      • We will add arrowheads/traces to the movies to enhance clarity.

      The title states: 'Stable microtubules predominantly oriented minus-end-out in the minor neurites of Stage 2b and 3 neurons.' However, given that the minus-end-out percentage increases after nocodazole treatment but only reaches a median of 0.48, 'predominantly' may be an overstatement. Please consider rewording.

      We thank the reviewer for catching this mistake and will adjust the statement to better reflect the median value.

      Planned revision:

      • We will reword this statement in the revised text.

      *Please compare the StableMARK system with the K560Rigor-SunTag approach described by Tanenbaum et al. (2014). What are the advantages of StableMARK over the SunTag method?*

      While the SunTag is certainly a powerful tool to visualize molecules at low copy number, we believe that StableMARK is more appropriate than the K560Rigor-SunTag tool for our assays for two main reasons. Firstly, K560Rigor-SunTag is based on the E236A kinesin-1 mutation, while StableMARK is based on the G234A mutation. These are both rigor mutations of kinesin-1 but behave differently; the E236A mutant is strongly bound to the microtubule in an ATP-like state (neck linker docked), while the G234A mutant is also strongly bound, but not in an ATP-like state (Rice et al., Nature 1999). This means that they may have different effects on, or preferences for, the microtubule lattice. Indeed, while StableMARK (G234A) has been shown to preferentially bind microtubules with an expanded lattice (Jansen et al., JCB 2023; de Jager et al., JCB 2025), this may not be the case for the E236A mutant. In support of this, it has been shown that, while nucleotide-free kinesin-1 can expand the lattice of GDP-microtubules at high concentrations (>10% lattice occupancy) in vitro (Peet et al., Nat Nanotechnol 2018; Shima et al., JCB 2018), kinesin-1 in the ATP-bound state does not maintain this expanded lattice (Shima et al., JCB 2018). Thus, we expect the kinesin-1 rigor used by Tanenbaum et al. (Cell 2014) to not be specific for stable microtubules (with an expanded lattice) in cells. Secondly, given the dense packing of microtubules in neurites (not well-established in developing neurites, but with an inter-microtubule distance of ~25 nm in axons and ~65 nm in dendrites (Chen et al., Nature 1992)), the very large size of the SunTag could be problematic. The K560Rigor-SunTag tool from Tanenbaum et al. (Cell 2014) is bound by up to 24 copies of GFP (each ~3 nm in size), meaning that it may obstruct or be obstructed by the dense microtubule network in neurites.

      Planned revision:

      • Given that, unlike the K560Rigor-SunTag construct, StableMARK has been carefully validated as a live-cell marker for stable microtubules, we believe that the above discussion goes beyond the scope of the manuscript.

      Microscopy data (Movies 2, 3, and 4) show microtubule bundling with StableMARK labeling, which is absent in tubulin immunostaining. Could this be an artifact of ectopic StableMARK expression? If so, a brief note addressing this potential effect would be beneficial.

      As with any overexpression, there is a risk of artifacts. We feel that in the cells presented, the risk of artifacts is limited because we have chosen neurons expressing StableMARK at very low levels. Prior work has demonstrated that in cells where StableMARK has a speckled appearance on microtubules, it has limited undesired effects on stable microtubules or the cargoes moving along them (Jansen et al., JCB 2023). Perhaps some of the apparent differences in the amount of bundling can be explained in that the expansion microscopy images shown may have less apparent bundling because of the improved z-resolution and thus optical sectioning. Any z-slice imaged using expansion microscopy will contain fewer microtubules, so bundling may be less obvious. If we compare the amount of bundling seen in StableMARK expressing cells with the amount of bundling of acetylated microtubules (a marker for stable microtubules) in DMSO/nocodazole treated (non-electroporated) cells imaged by confocal microscopy in Figure S7, we feel that the difference is not so large. Nonetheless, we can briefly address this potential effect in the text.

      Planned revision:

      • We will improve the transparency of the manuscript by briefly mentioning this in the text.

      Reviewer #1 (Significance)

      It is an important paper challenging established ideas of microtubule organization in neurons. It is important to the wide audience of cell and neurobiologists.

      Reviewer #2 (Evidence, reproducibility and clarity)

      *The manuscript uses state-of-the-art microscopy (e.g. expansion microscopy, motorPAINT) to observe microtubule organization during early events of differentiation of cultured rat hippocampal neurons. The authors confirm previous work showing that microtubules in neurites and dendrites are of mixed polarity whereas they are of uniform plus-end-out polarity in axons. They show that stable microtubules (labeled with antibody against acetylated tubulin) are located in the central region of neurite cross-section across all differentiation stages. They show that acetylated microtubules are associated with centrioles early in differentiation but less so at later stages. And they show that stable microtubules can move from one neurite to another, presumably by microtubule sliding. *

      Comments

      1. *I found the manuscript difficult to read. There are lots of "segregations" of microtubules occurring over these stages of neuronal differentiation: segregation between the center of a neurite and the outer edge with respect to neurite cross-section, segregation between the region proximal to the cell body and the region distal to the cell body, and segregation over time (stages). The authors don't do a good job of distinguishing these and reporting the major findings in a way that is clear and straightforward. *

      We thank the reviewer for their feedback and will go over the text to make it easier to read. Within neurites, we use the word 'segregated' in the manuscript to mean that the microtubules form two spatially separate populations across the width of the neurites (i.e., their cross-section if viewed in 3D). Because of variability seen in the neurites of this stage, this segregation does not always present as a peripheral vs. central enrichment of the different populations of microtubules as we sometimes observed two side-by-side populations instead. We will make sure that we properly define this in the manuscript to avoid any confusion.

      When discussing other types of segregation, we tried to use different wording such as when discussing the proximal-distal distribution of microtubules with different orientations in axon-like neurites in this excerpt:

      Sometimes these axons and axon-like neurites had a small bundle of minus-end-out microtubules proximal to the soma (Figure S4). This suggests that plus-end-out uniformity emerges distally first in these neurites, perhaps by retrograde sliding of these minus-end-out microtubules (see Discussion).

      When discussing changes related to a particular stage, we instead aimed to list which stage we were talking about, such as seen in the discussion:

      Emerging neurites of early stage 2 neurons already contain microtubules of both orientations and these are typically segregated. These emerging neurites also contain segregated networks of acetylated (stable) and tyrosinated (labile) microtubules. In later stage 2, stage 3, and stage 4 neurons, stable (nocodazole-resistant) microtubules are oriented more minus-end-out compared to the total (untreated) population of microtubules; however, in early stage 2 neurons, stable microtubules are preferentially oriented plus-end-out, likely because their minus-ends are still anchored at the centrioles at this stage. The fraction of anchored stable microtubules decreases during development, while the appearance of short stumps of microtubules attached to the centrioles suggests that these microtubules may be released by severing.

      We appreciate the reviewer's concerns and will review the text carefully for clarity.

      Planned revision:

      • We will carefully go through the text when revising the manuscript to ensure that these distinctions are clear and consider using synonyms or other descriptors where they would enhance clarity.

      *The major focus is on microtubule changes between stages 2a and 2b. This is introduced in the text and in the methods but not reflected in Figure 1A which should serve as an orientation of what is to come. It would be helpful to move the information about stages to the main text and/or Figure 1A. *

      We thank the reviewer for pointing this out and will be more explicit about the distinction between stages 2a and 2b in the main text and make the suggested change to Figure 1A.

      Planned revision:

      • We will incorporate the suggested changes in the revised manuscript.

      For Figure 1, the conclusions are generally supported by the data with the exception of the data for stage 2b in 1D and 1H. The images in D and the line scan in H suggest that for stage 2b, minus-end-out are on one edge whereas the plus-end-out are on the other edge of the neurite cross-section. But this is only true for one region along this example neurite. If the white line in D was moved proximal or distal along the neurite, the line scan for stage 2b would look like those of stages 2a and 3.

      We thank the reviewer for noting this in the figure. For these earlier stages in neuronal development, the distribution of different types of microtubules within the neurite is more variable and does not always adhere to the central-peripheral distribution described for more mature neurons (Tas et al., Neuron 2017). We did not intend to suggest that neurites of stage 2b neurons consistently have a different radial distribution of microtubules of opposite orientation, but rather that microtubules of the same orientation tend to bundle together. Sometimes this bundling produces a central or peripheral enrichment, as described for mature neurons (Tas et al., Neuron 2017) and as seen in Figure 1D-F at certain points along the length of the neurites, and sometimes the bundling simply produces two side-by-side populations. To reflect this diversity, we chose two different examples in the figure. The line scans presented in Figure 1H were taken approximately at the midpoint of the presented ROIs. In addition, as our imaging in this case is two-dimensional, we do not want to make explicit claims about the radial distribution of the different populations of microtubules.

      Planned revision:

      • We will adjust our description of this figure in the main text to be more explicit about how we interpret these results. We will ensure that it is apparent that we do not think there is a specific radial distribution of microtubules depending on the developmental stage.

      *For Figure 2, I found it difficult to relate panels A-F to panels G-J. I recommend combining 2G-J with 3A-B for a separate figure focused on the orientation of stable microtubules across different stages. *

      We thank the reviewer for this suggestion and will take it into consideration when preparing the revised manuscript, making sure that our figure organization is well justified.

      For Figure 3, it is difficult to reconcile the traces with the corresponding images - that is, there are many acetylated microtubules in the top view image that appear to contact centrioles but are not in the tracing. Perhaps the tracings would more accurately reflect the localization of the acetylated microtubules in the top view images if a stack of images was shown rather than the max projections. Or if the authors were to stain for CAMSAPs to identify non-centrosomal microtubules. I find the data unconvincing but I do believe their conclusion because it is consistent with published data in the field. The data need to be quantified.

      We thank the reviewer for noting this. Importantly, the tracing was done on a three-dimensional stack of images, whereas we present maximum projections of a few slices in Figure 3C for easy visualization. Projection artifacts indeed make it look as though some additional microtubules are attached to the centrioles, whereas in the three-dimensional stacks it is apparent that they are not. We can include the z-stacks as supplementary material so that readers can also verify this themselves. We will additionally clarify that this is the case in the text related to Figure 3C.

      Planned revision:

      • We will better explain how the tracing was done in the methods section and make a brief note of the projection artifacts in the main text.
      • We will also include the z-stacks as supplementary data.

      *I have a major concern with the conclusions of Figure 4. Here the authors use StableMARK to argue that microtubules do not depolymerize in one neurite and then repolymerize in another neurite but rather can be moved (presumably by sliding) from one neurite to another. The problem is that StableMARK-decorated microtubules do not depolymerize. So yes, StableMARK-decorated microtubules can move from one neurite to another but that does not say anything about what normally happens to microtubules during neuronal differentiation. In addition, the text says that the focus on Figure 4 is on how microtubules change between stages 2a and 2b but data is only shown for stage 2b. *

      As noted by the reviewer, StableMARK can indeed hyperstabilize microtubules when over-expressed; however, it is important to note that this strongly depends on the level of overexpression of the marker. This is discussed in detail in the paper introducing StableMARK, where it is described that at low expression levels, StableMARK does not alter the stability of microtubules (i.e., StableMARK decorated microtubules can still depolymerize/disassemble and they are disassembled in response to serum starvation), alter their post-translational modification status or their distribution in the cell, or impede the transport of cargoes along them (Jansen et al. JCB 2023). Despite this, we agree that it is important to validate these findings in our experimental system (primary rat hippocampal neurons) and so we plan to perform experiments with photoactivatable tubulin to verify the long lifetime of stable microtubules and aim to also observe microtubule sliding (similar to assays performed in the Gelfand lab (Lu et al., Curr Biol 2013)) in the absence of StableMARK.

      Planned revision:

      • We will confirm our findings using photoactivatable tubulin. We hope to demonstrate the long lifetime of the microtubules in this case and observe the sliding of microtubules by another means.
      • We will also revise the text to better explain the potential impacts of StableMARK and that we chose the lowest expressing cells we could find so early after electroporation.

      *The data are largely descriptive and it is of course important to first describe things before one can dive into mechanism. But most of the findings confirm previous work and new findings are limited to showing that e.g. microtubule segregation appears earlier than previously observed. *

      Our study is the first to use Motor-PAINT to carefully map changes in microtubule orientations during neuronal development. Furthermore, it is the first to use the recently introduced live-cell marker for stable microtubules to directly demonstrate the active polarity reversal of stable microtubules during this process.

      Optional: It would be nice if the authors could investigate some potential mechanisms. For example, does knockdown or knockout of severing enzymes prevent the loss of centriolar microtubules shown in Figure 3? Does knockdown or knockout of kinesin-2 or EB1 prevent the reorientation of microtubules (Chen et al 2014)?

      We agree with the reviewer that these are exciting experiments to perform, and we hope to unravel the mechanisms underlying microtubule reorganization in future work. However, this will require many more experiments, as well as the recruitment and training of a new PhD student or postdoc, given that the first author has left the lab. Therefore, we feel that this falls outside the scope of the current work, which carefully maps the microtubule organization during neuronal development and demonstrates the active polarity reversal of stable microtubules during this process.

      *Overall, the methods are presented in such a way that they can be reproduced. One exception is in the motor paint sample prep section: is it three washes for 1 min each or three washes over 1 min? *

      We thank the reviewer for pointing out this mistake and will adjust this step in the methods section accordingly.

      Planned revision:

      • We will revise the methods section to read 'washed three times for 1 minute each'.

      *No statistical analysis is provided. The spread of the data in the violin plots is very large and it is difficult to ascertain how strongly one should make conclusions based on different data spreads between different conditions. *

      We thank the reviewer for noting this and will add statistical tests to the graphs showing the fraction of minus-end-out microtubules in different stages/conditions.

      Planned revision:

      • We will include statistical tests in the specified graphs.

      For Figure S5, the excluded data (axons and axon-like neurites) should also be shown.

      We thank the reviewer for this suggestion and will include this data.

      Planned revision:

      • We will adjust this supplemental figure to also include the specified data.

      *For the movies, it would be helpful to have the microtubule moving from one neurite to another identified in some way as it is difficult to tell what is going on. *

      We thank the reviewer for pointing this out.

      Planned revision:

      • We will trace the microtubule in this movie to enhance clarity.

      Reviewer #2 (Significance)

      A strength of the study is the state-of-the-art microscopy (e.g. expansion microscopy, motorPAINT) and its application to a classic experimental model (rat hippocampal neurons). The information will be useful to those interested in the details of neuronal differentiation. A limitation of the study is that it appears to mostly confirm previous findings in the field (microtubule segregation, loss of centriolar anchoring, microtubule sliding). The advance to the field is that the manuscript shows that these events occur earlier in differentiation than previously known.

      Reviewer #3 (Evidence, reproducibility and clarity)

      *The study by Iwanski and colleagues explores the establishment of the specific organisation of the neuronal microtubule cytoskeleton during neuronal differentiation. They use cultures of dissociated primary hippocampal rat neurons as a model system, and apply the optimised motor-PAINT technology, expansion microscopy/immunofluorescence and live cell imaging to investigate the polarity establishment and the distribution of differentially modified microtubules during early development. *

      They show that in young neurons microtubules are of mixed polarity, but at this stage already the stable (acetylated) microtubules are preferentially oriented plus-end-out, and are connected to the centrioles. In later stages, the stable microtubules are released from the centrioles and reverse their orientation by moving around inside the cell body and the neurites.

      *Overall, the conclusions are well supported by the presented data. The experiments are conducted thoroughly, the figures are clearly presented (for minor comments, see below) and the manuscript is well and clearly written. *

      Major comments

      1. What is the proportion of neurons with different types of neurites (axon-like, non-axon-like) in stage 2b? (middle paragraph page 5 and Fig 1E). Please provide a quantification.

      How was the quantification in Fig 2B-D-F done? Why do the curves all start at 0? Please provide a scheme explaining these measurements. Furthermore, the data in Fig 2B do not reflect the statement "the segregation (...) was less evident" than in later stages (top of page 6): while it is less evident than in stage 2b, it is extremely similar to stage 3. Please revise accordingly.

      We thank the reviewer for pointing out these important details. We will make the suggested changes in the text, adding the proportion of neurons with different types of neurites and adjusting the statement mentioned.

      The radial intensity distributions were quantified as described in Katrukha et al. (eLife 2021). In the methods section, we describe the process in brief:

      To analyze the radial distribution of acetylated and tyrosinated microtubules in expanded neurites, deconvolved image stacks were processed using custom scripts in ImageJ (v1.54f) and MATLAB (R2024b) as described in detail elsewhere (Katrukha et al., 2021). Briefly, on maximum intensity projections (XY plane), we drew polylines of sufficient thickness (300 px) to segment out neurite portions 44 µm (10 µm when corrected for expansion factor) in length proximal to the cell soma. Using Selection > Straighten on the corresponding z-stacks generated straightened B-spline interpolated stacks of the neurite sections. These z-stacks were then resliced perpendicularly to the neurite axis (YZ-plane) to visualize the neurite cross-section. From this, we could semi-automatically find the boundary of the neurite in each slice using first a bounding rectangle that encompasses the neurite (per slice) and then a smooth closed spline (approximately oval). To build a radial intensity distribution from neurite border to center, closed spline contours were then shrunken pixel by pixel in each YZ-slice while measuring ROI area and integrated fluorescence intensity. From this, we could ascertain the average fluorescence intensity per contour iteration, allowing us to calculate a radial intensity distribution by calculating the radius corresponding to each area (assuming the neurite cross-section is circular).

      The curves thus all start at 0 because no intensity "fits" into a circle of radius 0 and then gradually increase because very few microtubules "fit" into circles with the smallest radii.
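
      As an illustration of the final step only, the conversion from per-contour measurements to a radial profile can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the ImageJ/MATLAB scripts used for the analysis (Katrukha et al., 2021): the function name `radial_profile`, its argument names, and the choice to report both a cumulative fraction and a per-ring mean are hypothetical and chosen for illustration. The inputs are the ROI areas and integrated intensities of the successively shrunken contours, ordered from the neurite border inwards.

```python
import math


def radial_profile(areas, integrated):
    """Turn nested-contour measurements into radial profiles.

    areas, integrated: ROI area and integrated fluorescence intensity of each
    contour, ordered from the outermost contour (neurite border) inwards.
    Hypothetical helper; assumes a circular cross-section and a non-zero
    total intensity.
    """
    # Equivalent radius of a circle with the same area as each contour.
    radii = [math.sqrt(a / math.pi) for a in areas]

    # Fraction of the total intensity enclosed within each contour.
    total = integrated[0]
    cumulative = [i / total for i in integrated]

    # Mean intensity in the ring between consecutive contours.
    ring_mean = []
    for k in range(len(areas) - 1):
        d_area = areas[k] - areas[k + 1]
        d_intensity = integrated[k] - integrated[k + 1]
        ring_mean.append(d_intensity / d_area)

    return radii, cumulative, ring_mean
```

      In this reading, the cumulative fraction is the intensity enclosed within each equivalent radius, so it necessarily tends to 0 as the radius tends to 0, which matches the behaviour of the curves described above.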

      Planned revision:

      • We will revise the text to include the suggested changes and add a brief statement to the methods section to explain why the curves start at 0.

      *It should be stressed in the text, that the modification-specific antibodies only detect modified microtubules. Thus, in figure 3, in the absence of total tubulin staining, it is possible that there are more microtubules than revealed with the anti-acetylated tubulin antibody. A possible explanation should be discussed. *

      We thank the reviewer for highlighting this point and will adjust the text accordingly.

      Planned revision:

      • We will clarify this in the revised text by adding the following sentence: In addition, given that we specifically stained for acetylated tubulin (a marker for stable microtubules), it is possible that other non-acetylated and thus perhaps dynamic microtubules are also associated with the centrioles.

      *OPTIONAL: As discussed in the manuscript's discussion, testing some of the proposed mechanisms regulating microtubule cytoskeleton architecture in development (motors, crosslinkers, severing enzymes) would significantly increase the impact of this study. Exploring these phenomena in a more complex system (3D culture, brain explants) closer to the intricate character of the brain than the 2D dissociated neurons would be a real game-changer. *

      We agree that sorting out the mechanisms driving microtubule reorganization would be very exciting. However, this will require many more experiments, as well as the recruitment and training of a new PhD student or postdoc, given that the first author has left the lab. Therefore, we feel this falls outside the scope of the current work, which carefully maps the microtubule organization during neuronal development and demonstrates the active polarity reversal of stable microtubules during this process.

      Minor comments

      1. *It could be useful to write on each panel whether the images were obtained with expansion or motor-PAINT technique: the rendering of the figures is very similar, and despite the different colour scheme can be confusing. *

      We thank the reviewer for pointing this out.

      Planned revision:

      • We will incorporate this suggestion when revising our manuscript.

      Reviewer #3 (Significance)

      This manuscript provides insights into the establishment of the microtubule cytoskeleton architecture specific to highly polarised neurons. The imaging techniques used, improved from the ones published before (motor-PAINT: Kapitein lab in 2017, U-ExM: Hamel/Guichard lab in 2019), yield beautiful and convincing data, marking an improvement compared to previous studies.

      *However, the novelty of some of the findings is relatively limited. Indeed, a mixed microtubule orientation in very young neurites has already been shown (Yau et al, 2016, co-authored by Kapitein), as has the separate distribution of acetylated and tyrosinated / stable and labile / plus-end-out and plus-end-in microtubules in dendrites (Tas, ..., Kapitein, 2017). *

      *On the other hand, observation of the live movement of microtubules with the resolution allowing to see single (stable) microtubules is new and important. It provides an exciting setup to explore the mechanisms of polarity reversal of microtubules in neuronal development and it is regrettable that these mechanisms have not been explored further. *

      *The association of (stable) microtubules with the centrioles is also a technically challenging analysis. Despite not being able to visualise all microtubules, but only acetylated ones, these data are novel and exciting. *

      *This work will be of interest for neuronal cell biologists, developmental neurobiologists. The impact would be larger if the mechanistic questions were addressed using these sophisticated methodologies. *

      *This reviewer's expertise is the regulation of the microtubule cytoskeleton and its impact on molecular, cellular and organism levels. *


    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We would like to warmly thank all the reviewers for their helpful and fair comments which will increase the quality of our manuscript.

      We would like to inform the reviewers that the figure numbers have changed as follows:

      | Figure number in old version | Figure number in revised manuscript |
      | --- | --- |
      | 1B | S1C |
      | S1C | S1D |
      | 1C | S2A |
      | S1D | S2B |
      | S1E | S2C |
      | 1D | 1B |
      | S2 | S3 |
      | S3 | S4 |
      | S4 | S5 |

      1. Description of the planned revision

      Reviewer #1

      Major comments

      3) Upon food supplementation with 20E the authors could not measure a significant effect on systemic growth or midgut maturation (Fig. S3), whereas the dose of 20E they fed (20µg/ml) was already much higher than endogenous 20E level they measured in the midgut (Fig. 2B).

      We thank the reviewer #1 for this comment.

      Fig. S3 is now Fig. S4

      First, 20 µg/mL is the final concentration in the fly food; it differs from the levels of 20HE we measured in the organs and in the haemolymph because of cellular absorption and degradation of the compound.

      This concentration of 20 µg/mL corresponds to a molar concentration of approximately 0.04 mM, which is lower than the concentration of 20HE commonly used in food in the literature (1 mM).

      Tiffany V. Roach, Kari F. Lenhart; Mating-induced Ecdysone in the testis disrupts soma-germline contacts and stem cell cytokinesis. Development 1 June 2024; 151 (11): dev202542. doi: https://doi.org/10.1242/dev.202542

      Ahmed, S.M.H., Maldera, J.A., Krunic, D. et al. Fitness trade-offs incurred by ovary-to-gut steroid signalling in Drosophila. Nature 584, 415-419 (2020). https://doi.org/10.1038/s41586-020-2462-y
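
      For reference, the unit conversion behind the figure of approximately 0.04 mM can be sketched as follows. The molar mass of 20-hydroxyecdysone (C27H44O7, about 480.6 g/mol) is an assumption introduced here, as are the function and variable names; the 20 µg/mL and 1 mM values are those quoted above.

```python
# Assumed molar mass of 20-hydroxyecdysone (C27H44O7), in g/mol.
MOLAR_MASS_20E = 480.6


def ug_per_ml_to_millimolar(conc_ug_per_ml, molar_mass_g_per_mol):
    """Convert a mass concentration in µg/mL to a molar concentration in mM."""
    # 1 µg/mL equals 1 mg/L, and (mg/L) / (g/mol) gives mmol/L, i.e. mM.
    return conc_ug_per_ml / molar_mass_g_per_mol


food_mM = ug_per_ml_to_millimolar(20.0, MOLAR_MASS_20E)
fold_below_literature = 1.0 / food_mM  # relative to the 1 mM quoted above
print(f"{food_mM:.3f} mM, about {fold_below_literature:.0f}-fold below 1 mM")
```

      Under this assumption, the dose in the food is a little over twenty-fold lower than the 1 mM cited from the literature.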

      The authors should consider to feed larvae with RH5849 (Dr. Ehrenstorfer), which is an insecticide functioning as an ecdysone agonist and was designed for high stability (Wing et al, 1988). RH5849 was already successfully fed to adult Drosophila to investigate the impact of Ecdysone signalling on the adult midgut (Neophytou et al, 2023; Zipper et al, 2025; Zipper et al, 2020) and elicits 20E response. Furthermore, uptake of RH5849 is not limited by the levels of EcI.

      We thank the reviewer #1 for this comment. We have ordered this compound and expect to perform the experiment in July, since delivery is expected in late June.

      8) The authors should include a discussion of how Ecdysone signalling in postmitotic EC is regulating midgut size, which may include recent data from Edgar and Reiff labs (Ahmed et al, 2020; Zipper et al., 2025; Zipper et al., 2020).

      We thank the reviewer #1 for this comment. We are targeting the journal's report format, which constrains the number of words. Of course, if the editor allows us to exceed that limit, we would be delighted to cite and discuss these papers.

      9) There are several recent publications showing a role for gut microbiota in regulating oestrogen metabolism in humans, and implications in oestrogen-related diseases such as endometriosis (Baker et al, 2017; Xholli et al, 2023). More precisely bacteria including Lactobacilli strains produce gut microbial β-glucuronidase enzymes, which reactivate oestrogens (Ervin et al, 2019; Hu et al, 2023). As Drosophila ecdysone is the functional equivalent of mammalian oestrogens (Aranda & Pascual, 2001; Martinez et al, 1991; Oberdörster et al, 2001) these publications should be discussed by the authors.

      We thank the reviewer #1 for this comment. We are targeting the journal's report format, which constrains the number of words. In addition, the topics of these papers seem somewhat outside the scope of our manuscript, which focuses on the impact of the microbiota on midgut growth.

      Reviewer #2 Minor Comments

      Figure S2: columns A and B are box plots, while columns C and D are columns with error bars. Presentation of quantitative data should be uniform and ideally as box plots throughout.

      The authors thank the reviewer #2 for this advice and the figure will be further revised.

      Fig. S2 is now Fig. S3


      Reviewer #3

      Major comments:

      The study relies on loss-of-function experiments to manipulate ecdysone signaling; gain-of-function experiments would provide an informative complement. Does feeding ecdysone phenocopy Lp association in GF larvae? Would ecdysone feeding have an additive effect with Lp association? Given the pleiotropic effects of ecdysone on larval phenotypes, a more targeted approach could be used to overexpress transgenes to augment ecdysone signaling.

      We thank the reviewer #3 for this comment. This thought is shared with reviewer #1 and this experiment will be repeated with RH5849. The results are expected in July.

      Minor comments:

      1. For gut and carcass length analysis, the EcR-RNAi and shd-RNAi conditions look slightly smaller in both GF and Lp conditions. Is there a genetic background effect on larval size? It would be helpful to calculate the interaction score between genotype and microbiome status via a 2-way ANOVA with post hoc tests.

      The authors thank the reviewer #3 for this comment. We will analyse these differences statistically.
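
      To make the requested interaction analysis concrete, the sketch below computes the F statistic for the genotype × microbiome interaction in a balanced two-way design. It is an illustration under stated assumptions, not the analysis that will appear in the revision: the function name, the group labels, and the measurements are hypothetical, equal replicate numbers per group are assumed, and p-values and post hoc tests are left to a statistics package.

```python
from statistics import mean


def interaction_f(cells):
    """F statistic for the A x B interaction in a balanced two-way ANOVA.

    cells: dict mapping (level_of_A, level_of_B) -> list of measurements.
    Returns (F, df_interaction, df_within). Hypothetical helper; requires the
    same number of replicates (at least 2) in every cell.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("balanced design required")

    grand = mean(x for v in cells.values() for x in v)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    cell_mean = {key: mean(values) for key, values in cells.items()}

    # Interaction: what is left of each cell mean after removing both main effects.
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels
        for b in b_levels
    )
    # Within-cell (residual) variation.
    ss_within = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_ab = (len(a_levels) - 1) * (len(b_levels) - 1)
    df_within = len(a_levels) * len(b_levels) * (n - 1)
    return (ss_ab / df_ab) / (ss_within / df_within), df_ab, df_within


# Hypothetical lengths (arbitrary units), two replicates per group.
example = {
    ("control", "GF"): [1.0, 3.0],
    ("control", "Lp"): [5.0, 7.0],
    ("RNAi", "GF"): [0.0, 2.0],
    ("RNAi", "Lp"): [2.0, 4.0],
}
f_value, df_ab, df_within = interaction_f(example)
```

      A large interaction F would indicate that the effect of Lp association depends on genotype, whereas a uniformly smaller size in both GF and Lp conditions would appear as a main effect of genotype instead.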


      6) In Fig. 3 the authors added the values for numbers of biological replica within the graphs. In Fig. 4 M-P they added the values for number of technical replicas. They should apply adding these two types of values to all graphs and I would suggest to make the difference between biological replica 'n' and technical replica 'N' obvious in the figure.

      The authors thank the reviewer #3 for this comment. We will modify these numbers in the figures and/or clarify them in the legends so as not to overcrowd the figures.


      The bibliography seems limited in scope. As one example, Shin et al., 2011 seems quite relevant for this study.

      We thank the reviewer #1 for this comment. We are targeting the journal's report format, which constrains the number of words. Of course, if the editor allows us to exceed that limit, we would be delighted to cite and discuss this paper.

      2. Description of the revisions that have already been incorporated in the transferred manuscript

      All changes are visible in red in the text of the revised manuscript.

      Reviewer #1

      Major remarks

      1) In Fig.2 E - G there is a remarkable difference between controls in D compared to F and E compared to G. The difference between the controls in E and G is stronger than the shown significant difference of EcRRNAi to the control in E. How do the authors explain such a difference of the two (basically equal) controls and the high variance in control values shown in G?

      We thank the reviewer #1 for this comment. As mentioned in the Materials and Methods, the controls are different because the RNAi constructs are different, which can generate variability in this type of developmental experiment.

      Line 253: "UAS-EcRRNAi (BDSC 9327), UAS-dsmCherryRNAI (BDSC 35785), UAS-shadeRNAi (VDRC 108911), and respective RNAi control lines (KK60101)."

      Are the comparisons of control and EcRRNAi shown in D significantly different?

      As mentioned in the figure panel, the EcRRNAi GF and control GF are significantly different and this is discussed in the text as follows in Line 154: "This phenomenon could be explained by genetic background and/or by additional deleterious effect of germ-freeness, as well as a putative contribution of EcR to intestinal functions that are important for systemic growth independently of the contributions of microbiota to adaptive growth."


      4) Lines 167-169: the authors state that 'Size-matched Lp associated larvae, controlRNAi or EcRRNAi, show longer midguts than their relative GF condition (Fig. 3A, B)', but there are no significant statistics shown for this comparison in Fig. 3A, B.

      We thank the reviewer #1 for this comment and we agree that the sentence can be misleading. Thus, we reformulated it as : "Size-matched Lp-associated EcRRNAi larvae show longer midguts than their relative GF controls (Fig. 3A, B)."

      10) Fig. S4 is not mentioned at all in the manuscript.

      We thank the reviewer #1 for this comment and we added the reference to the supplementary Figure 4, now Figure S5 on Line 202 : "In the anterior part, the cells and nuclei are bigger in Lp-associated than GF animals (Fig. 4M-N, Fig.S5). For the posterior part, the cell area was significantly increased in Lp- monoassociated animals compared to GF cell while no change was shown for the nucleus area (Fig. 4O-P, Fig.S5)."

      Minor comments:

      • The authors are inconsistent in indicating their experimental groups. One example is Fig. S3: In A and B they write the GF groups non-italic, whereas the L.p. groups are written italic. In C - E they only partially write the L.p. groups italic. Furthermore, in A, C - E they write 'L.p.', whereas its written 'Lp' and missing the 'WJL' in B.

      We thank the reviewer #1 for this comment and we corrected that mistake in Fig. S3.

      Fig. S3 is now Fig. S4

      • Line 52: The last 'i' in 'Lactobacilli' is not italic.

      We thank the reviewer #1 for this comment and we corrected that mistake.

      • Line 122: Spelling error in 'Surpringsinly'

      We thank the reviewer #1 for this comment and we corrected that mistake.

      • Line 151: Spelling error in 'progenies'. Needs to read 'progeny'.

      We thank the reviewer #1 for this comment and we corrected that mistake.

      • Lines 231-235: Last part of the sentence is repetitive

      We thank the reviewer #1 for this comment and we corrected that mistake as "Our work paves the way to deciphering the signals delivered by the bacteria that are sensed at the host cellular level and to understand how this microbe-mediated Ecd-dependent midgut growth contributes to the Drosophila larval growth upon malnutrition."

      Reviewer #2

      Minor Comments

      1. Figure 1 is interesting but challenging to follow. The fonts are very small and challenging to read. Pink on blue background is particularly hard to read and doesn't seem necessary. As the entire manuscript follows from data in Figure 1, I would encourage the authors to revise it with a view to making the results more accessible.

      The authors thank the reviewer #2 for this advice and the Figure 1 has been revised.

      Figure 4 is impressive and important for the overall manuscript. The authors should provide representative images to show how they measured cell area and nucleus area.

      The authors thank the reviewer #2.

      How cell area and nucleus area were measured is described in Figure S4. The reference to this supplementary Figure was missing in the initial manuscript and we deeply apologize for that.

      Reviewer #1 also pointed out that the reference of Figure S4 covering that point was missing in the text and we corrected that point.

      I struggled to follow this sentence (line 215): "Also, it will be interesting to test, beyond their shared growth phenotype, whether they respond differently at the mechanistical level to the presence of bacteria in the anterior compartment." I would encourage the authors to consider alternative formulations.

      The authors thank the reviewer #2 and revised that sentence as follows :

      "Also, it will be interesting to investigate whether the midgut comprises sub-populations of enterocytes that differ in their physiological functions. Indeed, these sub-populations could be differently distributed along the midgut and be localized on anterior and/or posterior parts. Thus, they could present varied responses to the presence of the bacteria."

      Reviewer #3

      Major comments

      Figure 4 title is misleading. No manipulations of ecdysone signaling are performed to demonstrate whether scaling relationships across tissues differ depending on ecdysone. The same experiment should be performed using mex>EcR-RNAi larvae and/or mex>shd-RNAi larvae.

      We thank the reviewer #3 for this comment.

      We agree with the reviewer, and the title has been changed as follows (marked in red in the manuscript): Midgut-specific adaptive growth promoted by Lp in Drosophila larvae.


      Minor comments:

      It is notable that mex>EcR-RNAi in germ-free larvae exacerbates developmental delay. A possible interpretation is that ecdysone signaling in the germ-free context promotes increased growth rate. Could the authors comment?

      We thank reviewer #3 for this comment.

      Since we described a local effect of Ecd at the intestinal level, it is unlikely, but not totally excluded, that intestinal Ecd promotes systemic growth.

      Our comments are here in the text :

      "This phenomenon could be explained by genetic background and/or by additional deleterious effect of germ-freeness, as well as a putative contribution of EcR to intestinal functions that are important for systemic growth independently of the contributions of microbiota to adaptive growth."

      Experimental variation is substantial between the control conditions of the EcR and Shd knockdown experiments; median control + Lp D50 in the EcR experiment is ~6 days whereas in the shade experiment it is ~9 days. Can the authors comment on this between-experiment variation, which seems substantial (similar to the effect size between control + Lp and control GF)?

      We thank reviewer #3 for this comment which was also highlighted by the reviewer #1 and we answered as follows :

      As mentioned in the Materials and Methods, the controls differ because the RNAi constructs differ, which can generate variability in this type of developmental experiment.

      Line 253: "UAS-EcRRNAi (BDSC 9327), UAS-dsmCherryRNAI (BDSC 35785), UAS-shadeRNAi (VDRC 108911), and respective RNAi control lines (KK60101)."

      As mentioned in the figure panel, the EcRRNAi GF and control GF are significantly different and this is discussed in the text as follows in Line 154: "This phenomenon could be explained by genetic background and/or by additional deleterious effect of germ-freeness, as well as a putative contribution of EcR to intestinal functions that are important for systemic growth independently of the contributions of microbiota to adaptive growth."

      The methods detail an ecdysone feeding protocol that I could not find used in the experiments. Please clarify.

      We thank reviewer #3 for this comment.

      We would like to highlight that this protocol is related to an experiment described in Fig. S3 (now Fig.S4) and that supplementary Figure was cited here in the text of the manuscript Line 179 as follows "While the systemic growth of animals is not affected by addition of 20E, a slight trend to faster midgut maturation of GF larvae is observed through the measurements of longer guts (Fig. S4)."

      Also, in supplementary data :

      Fig. S3 : Feeding larvae with 20E does not impact the gut growth.

      (A-B) Addition of 20E has no impact on larval developmental timing (DT) and their D50. From size-matched animals (C), Lp promotes intestinal growth compared to GF (D) but no significant difference is shown in the gut/carcass ratio (E). Animals receiving 20E are represented with color-filled circles, +Lp (blue) and GF (black), and controls without 20E supplementation with empty circles.

      The manuscript would benefit from additional proofreading. The text contains spelling errors throughout. The in-text reference formatting is inconsistent. Figure legends could be improved to better describe the data.

      We thank reviewer #3 for this comment and, following the different reviewers' comments, we improved the manuscript accordingly.

      3. Description of analyses that authors prefer not to carry out

      Reviewer #1

      Major remarks

      2) The authors should consider investigating an EcIRNAi in addition to EcRRNAi. EcR functions as activator, but also as suppressor in the absence of Ecdysone and a EcRRNAi suppresses both functions of EcR. By knocking down EcI the authors would prevent uptake of Ecdysone and thus interfere only with the ligand-induced activating function of EcR.

      We thank reviewer #1 for this comment.

      This experiment has been performed using EcI RNAi but not shown here because in our hands the genetic tool was not efficient (RNA interference does not work effectively) and thus the experiment was not conclusive.

      No phenotype was observed in our study (see Figure attached). Also, the other Oatp family members were tested for expression in the midgut and showed close to no expression.

      5) Why are the authors comparing the carcass length of GF shade RNAi with L.p. control in Fig. 3 D?

      We thank reviewer #1 for this comment. For transparency, these statistics were added. In these conditions, GF larvae were difficult to raise to the same size as their Lp-monoassociated counterparts; hence, carcass length was used to normalize the data.

      7) In Fig. S3C the authors compared L.p. WJL 20E with the GF EtOH control, where is the comparison to the corresponding L.p. WJL EtOH control? The L.p. WJL EtOH control is compared to GF 20E instead.

      We thank reviewer #1 for this comment that will help to clarify our experiment.

      Fig. S3 is now Fig. S4

      Fig. S4C shows larval size, which allows sizes to be compared across all conditions independently; this is why statistics are shown between all conditions. To avoid overloading the figure, non-significant p values are not shown.

      Reviewer #2

      Minor Comments

      3. Figure S3 confuses me. It seems that addition of 20E to GF larvae leads to a significant reduction of larval size, and that mono-association with Lp also significantly shortens larval size. Data in Figure 4G suggest that Lp should not affect larval body length relative to GF larvae. Can the authors explain the apparent discrepancy?

      The authors thank the reviewer #2 for this question. Fig. S3 is now Fig. S4.

      This difference could be explained as follows :

      • The developmental experiment in Fig. S3B shows no difference between the two GF conditions. Thus, at the end of larval development, systemic growth is similar in both conditions.

      Because it was performed earlier during development, the larval size experiment shows higher variability in measurements of larval size. Moreover, fewer larvae are present in the GF 20E condition, which could explain that difference.

      • We have previously shown that Lp mono-associated larvae grow faster than GF. Thus, to collect size-matched larvae on the same day, GF or Lp animals come from a different initial day of experiment. Due to biological variability, some differences in timing could be observed between GF and Lp animals.

      Reviewer #3

      Major comments

      1. The authors conclude that intestinal ecdysone signals are not required for Lp-promoted systemic growth. However, their data shows that circulating 20E titer increases in an Lp-dependent manner, and this circulating 20E presumably affects multiple tissues throughout the organism. Since EcR is broadly expressed, can the authors examine how EcR knockdown in other tissues influences systemic growth in Lp-associated larvae? Fat body-specific EcR knockdown seems particularly of interest here given the established relationship between fat body ecdysone signaling and growth (Delanoue et al., 2010). This additional analysis would help clarify whether ecdysone signaling in non-intestinal tissues mediates the Lp-associated growth phenotype.

      We thank reviewer #3 for this comment that will help to clarify our manuscript.

      We would like to emphasize that we never mention in this manuscript that intestinal ecdysone signals are not required for systemic growth. Nevertheless, we highlighted that it is required for Lp-related midgut growth and not rate limiting for Lp-promoted systemic growth:

      Line 179 : "While the systemic growth of animals is not affected by addition of 20E, a slight trend to faster midgut maturation of GF larvae is observed through the measurements of longer guts (Fig. S3). Thus, the intestinal Ecd signaling is required for the midgut growth effect mediated by Lp in a context of malnutrition."

      Line 227: "Specifically, intestinal Ecd signaling is not rate-limiting for Lp-mediated adaptive growth."

      While it would be very interesting to study the effects of Ecd modulation from the fat body, we feel this is out of the scope of our manuscript, which focuses on Lp-based intestinal growth.

      The experimental design compares larvae associated with live Lp versus germ-free larvae provided sterile PBS. Since Lp cells constitute a potential nutrient source for developing larvae, it's unclear whether gene expression differences arise from larvae digesting Lp cells as a nutrient source or from active, microbe-host signaling interactions. To conclusively address this ambiguity, the authors should perform RNA-seq on larvae inoculated with live versus heat-killed Lp. Alternatively, qPCR could be used to provide evidence for the extent to which changes in ecdysone-related gene expression specifically require live Lp.

      We thank reviewer #3 for this comment.

      We (the lab) previously showed that the systemic growth phenotype is supported by bacteria during development and that bacteria have to be alive to support optimal effects (Storelli et al 2018, PMID: 29290388; Consuegra et al 2020a, PMID: 32196485; Consuegra et al 2020b, PMID: 32563155). This topic of bacterial viability has also been directly addressed independently by colleagues and reported recently (da Silva Soares NF, PMID: 37488173). Hence, we did not design our RNAseq with inactivated bacteria. However, if the editor believes it is essential to provide qPCR results on Ecd-related gene expression in live vs inactivated bacteria associations, we shall provide them, but at this stage we believe this notion is not core to our message.

      Shade is expressed in the larval midgut, however the larval fat body is thought to be a major site of 20E to 20HE conversion. Can the authors test how Shd knockdown in the fat body affects systemic growth in the Lp-associated condition?

      We thank reviewer #3 for this comment. Nevertheless, we think this is out of the scope of our manuscript, which focuses on Lp-based intestinal growth.

      In the knockdown experiments, body size is not measured for larvae/pupae. Given that ecdysone signaling impacts pupal volume (Delanoue et al., 2010) and controls metamorphosis timing, D50 plots by pupal volume would be informative to give a rough estimate of growth rate. For example, do germ-free EcR-RNAi larvae, which develop slower, have an equivalent body size to germ-free control larvae?

      We thank the reviewer #3 for this comment.

      All experiments were done with size-matched larvae because the aim of this manuscript is to detail the impact of Lp on relative midgut vs systemic growth. Hence, we use animals of similar systemic size to study their midgut size and identify allometry changes (midgut/larval size ratios) at a similar developmental point, that is, at the same larval systemic growth (here L3). Thus, we feel that focusing on growth rates and systemic sizes in different genetic conditions, while interesting in general, is out of the scope of the study, which focuses on midgut/larval size allometry.


      Minor comments

      The number of pupae in the EcR-RNAi and shd-RNAi experiments (Fig 2D, F) differ. Were larval densities controlled during development?

      I could not find this mentioned in the methods, and it is an important control parameter as larval density impacts developmental growth. Presenting this data as % viability of a known number of larvae deposited in food would be preferable.

      We thank the reviewer #3 for this comment.

      As mentioned in the Materials and Methods, 40 eggs from axenic animals were deposited in each tube. It is true that the final number of pupae differs, which could come from differential viability of the genetic backgrounds used. It would be difficult to follow larval development within the same tube because of the manipulation of gnotobiotic animals. Nevertheless, in all experiments more than 25% of the eggs initially deposited in tubes emerged as adults.

    1. an unambiguous definition of what thinking is

      Not necessarily of "thinking", but of "playing": Turing abandons the former in his text (“The original question, ‘Can machines think?’ I believe to be too meaningless to deserve discussion.”, p. 442):

      “We now ask the question, ‘What will happen when a machine takes the part of A in this game?’ Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, ‘Can machines think?’” (p. 434).

      What matters in practice (not just for Turing, but for us today as well) is whether the machine can do what we want it to do (play, speak, write without errors; in short, meet our expectations of intelligence).

      May not machines carry out something which ought to be described as thinking but which is very different from what a man does? This objection is a very strong one, but at least we can say that if, nevertheless, a machine can be constructed to play the imitation game satisfactorily, we need not be troubled by this objection. (p. 435)

    1. Reviewer #3 (Public review):

      Summary:

      This is a timely article that focuses on the molecular machinery in charge of the proliferation of pallial neural stem cells in chicks, and aims to compare them to what is known in mammals. miR19b is related to controlling the expression of E2f8 and NeuroD1, and this leads to a proper balance of division/differentiation, required for the generation of the right number of neurons and their subtype proportions. In my opinion, many experiments do reflect an interaction between all these genes and transcription factors, which likely supports the role of miR19b in participating in the proliferation/differentiation balance.

      Strengths:

      Most of the methodologies employed are suitable for the research question, and present data to support their conclusions.

      The authors were creative in their experimental design, in order to assess several aspects of pallial development.

      Weaknesses:

      However, there are several important issues that I think need to be addressed or clarified in order to provide a clearer main message for the article, as well as to clarify the tools employed. I consider it utterly important to review and reinterpret most of the anatomical concepts presented here. The way they are currently used is confusing and may mislead readers towards an understanding of the bird pallium that is no longer accepted by the community.

      Major Concerns:

      (1) Inaccurate use of neuroanatomy throughout the entire article. There are several aspects to it, that I will try to explain in the following paragraphs:

      a) Figure 1 shows a dynamic and variable expression pattern of miR19b and its relation to NeuroD1. Regardless of the terms used in this figure, it shows that miR19b may be acting differently in various parts of the pallium and developmental stages. However, all the rest of the experiments in the article (except a few cases) abolish these anatomical differences. It is not clear, but it is very important, where in the pallium the experiments are performed. I refer here, at least, to Figures 2C, E, F, H, I; 3D, E; 4C, D, G, I. Regarding time, all experiments were done at HH22, and the article does not show the native expression at this stage. The sacrifice timing is variable, and this variability is not always justified. But more importantly, we don't know where those images were taken, or what part of the pallium is represented in the images. Is it always the same? Do results reflect differences between DVR and Wulst gene expression modifications? The authors should include low magnification images of the regions where experiments were performed. And they should consider the variable expression of all genes when interpreting results.

      b) SVZ is not a postmitotic zone (as stated in line 123, and wrongly assigned throughout the text and figures). On the contrary, the SVZ is a secondary proliferative zone, organized in a layer, located in a basal position to the VZ. Both (VZ and SVZ) are germinative zones, containing mostly progenitors. The only postmitotic neurons in VZ and SVZ occupy them transiently when moving to the mantle zone, which is closer to the meninges and is the postmitotic territory. Please refer to the original Boulder committee articles to revise the SVZ definition. The authors, however, misinterpret this concept, and label the whole mantle zone as if it were the SVZ. Indeed, the term "mantle zone" does not appear in the article. Please revise and change the whole text and figures, as SVZ statements and photographs are nearly always misinterpreted. Indeed, SVZ is only labelled well in Figure 4F.

      The two articles mentioning the expression of NeuroD1 in the SVZ (line 118) are research in Xenopus. Is there a proliferative SVZ in Xenopus?

      For the actual existence of the SVZ in the chick pallium, please refer to the recent Rueda-Alaña et al., 2025 article that presents PH3 stainings at different timepoints and pallial areas.

      c) What is the Wulst, according to the authors of the article? In many figures, the Wulst includes the medial pallium and hippocampus, whereas sometimes it is used as a synonym of the hyperpallium (which excludes the medial pallium and hippocampus). Please make it clear, as the addition or not of the hippocampus definitely changes some interpretations.

      d) The authors compare the entirety of the chick pallium - including the hippocampus (see above), hyperpallium, mesopallium, nidopallium - to only the neocortex of mammals. This view - as shown in Suzuki et al., 2012 - forgets the specificity of pallial areas of the pallium and compares it to cortical cells. This is conceptually wrong, and leads to incorrect interpretations (please refer to Luis Puelles' commentaries on Suzuki et al results); there are incorrect conclusions about the existence of upper-layer-like and deep-layer-like neurons in the pallium of birds. The view is not only wrong according to the misinterpreted anatomical comparisons, but also according to novel scRNAseq data (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025). These articles show that many avian glutamatergic neurons of the pallium are highly diversified, and are not comparable to mammalian cortical cells. The authors should therefore avoid this incorrect use of terminology. There are no such upper-layer-like and deep-layer-like neurons in the pallium of birds.

      (2) From introduction to discussion, the article uses misleading terms and outdated concepts of cell type homology and similarity between chick and pallial territories and cells. The authors must avoid this confusing terminology, as non-expert readers will come to evolutionary conclusions which are not supported by the data in this article; indeed, the article does not deal with those concepts.

      a) Recent articles published in Science (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025) directly contradict some views presented in this article. These articles should be presented in the introduction as they are utterly important for the subject of this article and their results should be discussed in the light of the new findings of this article. Accordingly, the authors should avoid claiming any homology that is not currently supported. The expression of a single gene is not enough anymore to claim the homology of neuronal populations.

      b) Auditory cortex is not an appropriate term, as there is no cortex in the pallium of birds. Cortical areas require the existence of neuronal arrangements in laminae that appear parallel to the ventricular surface. It is not the case of either hyperpallium or auditory DVR. The accepted term, according to the Avian Nomenclature forum, is Field L.

      c) Forebrain, a term overused in the article, is very unspecific. It includes vast areas of the brain, from the pretectum and thalamus to the olfactory bulb. However the authors are not researching most of the forebrain here. They should be more specific throughout the text and title.

      (3) In the last part of the results, the authors claim miR19b has a role in patterning the avian pallium. What they see is that modifying its expression induces changes in gene expression in certain neurons. Accordingly, the altered neurons would differentiate into other subtypes, not similar to the wild type example. In this sense, miR19b may have a role in cell specification or neuronal differentiation. However, patterning is a different developmental event, which refers to the determination of broad genetic areas and territories. I don't think miR19b has a role in patterning.

      (4) Please add a scheme of the molecules described in this article and the suggested interaction between them.

      (5) The methods section is way too brief to allow for repeatability of the procedures. This may be due to an editorial policy but if possible, please extend the details of the experimental procedures.

    2. Author response:

      Public Reviews:

      Reviewer #1 (Public review):  

      Summary:  

      This study provides new insights into the role of miR-19b, an oncogenic microRNA, in the developing chicken pallium. Dynamic expression pattern of miR-19b is associated with its role in regulating cell cycle progression in neural progenitor cells. Furthermore, miR-19b is involved in determining neuronal subtypes by regulating Fezf2 expression during pallial development. These findings suggest an important role for miR-19b in the coordinated spatio-temporal regulation of neural progenitor cell dynamics and its evolutionary conservation across vertebrate species.  

      Strengths:  

      The authors identified conserved roles of miR-19 in the regulation of neural progenitor maintenance between mouse and chick, and the latter is mediated by the repression of E2f8 and NeuroD1. Furthermore, the authors found that miR-19b-dependent cell cycle regulation is tightly associated with specification of Fezf1 or Mef2c-positive neurons, in spatio-temporal manners during chicken pallial development. These findings uncovered molecular mechanisms underlying microRNA-mediated neurogenic controls.  

      Weaknesses:  

      Although the authors in this study claimed striking similarities of miR-19a/b in neurogenesis between mouse and chick pallium, a previous study by Bian et al. revealed that miR-19a contributes to the expansion of radial glial cells by suppressing PTEN expression in the developing mouse neocortex, while miR-19b maintains apical progenitors via inhibiting E2f2 and NeuroD1 in chicken pallium. Thus, it is still unclear whether the orthologous microRNAs regulate common or species-specific target genes.  

      In this study, we have proposed that miR-19b regulates similar phenomena in both species using different targets, such as regulation of proliferation through PTEN in mouse and through E2f8 in the chicken.

      The spatiotemporal expression patterns of miR-19b and several genes are not convincing. For example, the authors claim that NeuroD1 is initially expressed uniformly in the subventricular zone (SVZ) but disappears in the DVR region by HH29 and becomes detectable by HH35 (Figure 1). However, the in situ hybridization data revealed that NeuroD1 is highly expressed in the SVZ of the DVR at HH29 (Figure 4F). Thus, perhaps due to the problem of immunohistochemistry, the authors have not been able to detect NeuroD1 expression in Figure 1D, and the interpretation of the data may require significant modification.  

      While Fig. 1B may suggest that NeuroD1 expression has disappeared from the DVR region by HH29, this is not true in general because we have observed NeuroD1 to be expressed in the DVR at HH29 in images of other sections. In the revised version, we will include improved images for panels of Fig. 1B which accurately show the expression pattern of NeuroD1 and miR19b at stages HH29 and HH35.  

      It seems that miR-19b is also expressed in neurons (Figure 1), suggesting the role of miR-19b must be different in progenitors and differentiated neurons. The data on the gain- and loss-of-function analysis of miR-19b on the expression of Mef2c should be carefully considered, as it is possible that these experiments disturb the neuronal functions of miR19b rather than in the progenitors.

      As pointed out by the reviewer, it is quite possible that upon manipulation of miR19b its neuronal functions are also perturbed in addition to its function in progenitor cells. After introducing a gain-of-function construct in progenitor cells, we have observed changes in the morphology of these cells. These data will be included in the revised version.

      The regions of chicken pallium were not consistent among figures: in Figure 1, they showed caudal parts of the pallium (HH29 and 35), while the data in Figure 4 corresponded to the rostral part of the pallium (Figure 4B).  

      We will address this by providing images from a similar region of the pallium showing Fezf2 and Mef2c expression patterns.

      The neurons expressing Fezf2 and Mef2 in the chicken pallium are not homologous neuronal subtypes to mammalian deep and superficial cortical neurons. The authors must understand that chicken pallial development proceeds in an outside-in manner. Thus, Mef2c-positive neurons in a superficial part are early-born neurons, while FezF2-positive neurons residing in deep areas are later-born neurons. It should be noted that the expression of a single marker gene does not support cell type homology, and the authors' description "the possibility of primitive pallial lamina formation in common ancestors of birds and mammals" is misleading.  

      We appreciate this clarification and will modify or remove this statement regarding the “primitive pallial lamina formation” to avoid any confusion and misinterpretation. 

      Overexpression of CDKN1A or Sponge-19b induced ectopic expression of Fezf2 in the ventricular zone (Figure 3C, E). Do these cells maintain a progenitor state or prematurely differentiate into neurons? In addition, the authors must explain that the induction of Fezf2 is also detected in GFP-negative cells.  

      We propose to follow up on the fate of these cells by extending the observation period post-overexpression of CDKN1A or Sponge-19b to assess whether they retain progenitor characteristics or differentiate. The presence of Fezf2 in GFP-negative cells could be due to non-cell-autonomous effects, and we will discuss this possibility in the revised manuscript.

      Reviewer #2 (Public review):  

      Summary:  

      This paper investigates the general concept that avian and mammalian pallium specifications share similar mechanisms. To explore that idea, the authors focus their attention on the role of miR-19b as a key controlling factor in the neuronal proliferation/differentiation balance. To do so, the authors checked the expression and protein level of several genes involved in neuronal differentiation, such as NeuroD1 or E2f8, genes also expressed in mammals after conducting their functional gene manipulation experiments. The work also shows a dysregulation in the number of neurons from lower and upper layers when miR-19b expression is altered.  

      To test it, the authors conducted a series of functional experiments of gain and loss of function (G&LoF) and enhancer-reporter assays. The enhancer-reporter assays demonstrate a direct relationship between miR-19b and NeuroD1 and E2f8, which is also validated by the G&LoF experiments. It's also noteworthy that miR-19b acts by maintaining the progenitor cells of the ventricular zone in an undifferentiated stage, thus promoting them into a stage of cellular division.  

      Overall, the paper argues that the expression of miR-19b in the ventricular zone promotes the cells in a proliferative phase and inhibits the expression of differentiation genes such as E2f8 and NeuroD1. The authors claim that a decrease in the progenitor cell pool leads to an increase and decrease in neurons in the lower and upper layers, respectively.  

      Strengths:  

      (1) Novelty Contribution  

      The paper offers strong arguments to prove that the neurodevelopmental basis between mammals and birds is quite the same. Moreover, this work contributes to a better understanding of brain evolution along the animal evolutionary tree and will give us a clearer idea about the roots of how our brain has been developed. This stands in contrast to the conventional framing of mammal brain development as an independent subject unlinked to the "less evolved species". The authors also nicely show a concept that was previously restricted to mammals - the role of microRNAs in development.  

      (2) Right experimental approach  

      The authors perform a set of functional experiments correctly adjusted to answer the role of miR-19b in the control of neuronal stem cell proliferation and differentiation. Their histological, functional, and genetic approach gives us a clear idea about the relations between several genes involved in the differentiation of the neurons in the avian pallium. In this idea, they maintain the role of miR-19b as a hub controller, keeping the ventricular zone cells in an undifferentiated stage to perpetuate the cellular pool.  

      (3) Future directions  

      The findings open a door to future experiments, particularly to a better comprehension of the role of microRNAs and pallial genetic connections. Furthermore, this work also proves the use of avians as a model to study cortical development due to the similarities with mammals.  

      Weaknesses:  

      While there are questions answered, there are still several that remain unsolved. The experiments analyzed here lead us to speculate that the early differentiation of the progenitor cells from the ventricular zone entails a reduction in the cellular pool, affecting thereafter the number of later-born neurons (upper layers). The authors should explore that option by testing progenitor cell markers in the ventricular zone, such as Pax6. Even so, it remains possible that miR-19b is also changing the expression pattern of neurons that are going to populate the different layers, instead of their numbers, so the authors cannot rule that out or verify it. Since the paper focuses on the role of miR-19b in patterning, I think the authors should check the relationship and expression between progenitors (Pax6) and intermediate (Tbr2) cells when miR-19b is affected. Since neuronal expression markers change so fast within a few days (HH24-HH35), I don't understand why the authors stop the functional experiments at different time points.  

      To address this, we will examine the expression of Pax6 and Tbr2 following both gain-of-function and loss-of-function manipulations of miR-19b. We agree with the reviewer that miR-19b may influence not only the number of neurons but also the expression pattern of neuronal markers.  Due to the limitations of our experimental design, we acknowledge that this possibility cannot be ruled out. 

      Regarding time points chosen for the functional experiments: We selected different stages based on the expression dynamics of specific markers. To detect possible ectopic induction, we analyzed developmental stages where the expression of a given marker is normally absent. Conversely, to detect loss of expression we examined stages in which the marker is typically expressed robustly. This approach allowed us to better interpret the functional consequences of miR-19b manipulation within relevant developmental windows. 

      Reviewer #3 (Public review):  

      Summary:  

      This is a timely article that focuses on the molecular machinery in charge of the proliferation of pallial neural stem cells in chicks, and aims to compare them to what is known in mammals. miR19b is related to controlling the expression of E2f8 and NeuroD1, and this leads to a proper balance of division/differentiation, required for the generation of the right number of neurons and their subtype proportions. In my opinion, many experiments do reflect an interaction between all these genes and transcription factors, which likely supports the role of miR19b in participating in the proliferation/differentiation balance.  

      Strengths:  

      Most of the methodologies employed are suitable for the research question, and present data to support their conclusions.  

      The authors were creative in their experimental design, in order to assess several aspects of pallial development.  

      Weaknesses:  

      However, there are several important issues that I think need to be addressed or clarified in order to provide a clearer main message for the article, as well as to clarify the tools employed. I consider it utterly important to review and reinterpret most of the anatomical concepts presented here. The way they are currently used is confusing and may mislead readers towards an understanding of the bird pallium that is no longer accepted by the community.  

      Major Concerns:  

      (1) Inaccurate use of neuroanatomy throughout the entire article. There are several aspects to it, that I will try to explain in the following paragraphs:  

      Figure 1 shows a dynamic and variable expression pattern of miR19b and its relation to NeuroD1. Regardless of the terms used in this figure, it shows that miR19b may be acting differently in various parts of the pallium and developmental stages. However, all the rest of the experiments in the article (except a few cases) abolish these anatomical differences. It is not clear, but it is very important, where in the pallium the experiments are performed. I refer here, at least, to Figures 2C, E, F, H, I; 3D, E; 4C, D, G, I. Regarding time, all experiments were done at HH22, and the article does not show the native expression at this stage. The sacrifice timing is variable, and this variability is not always justified. But more importantly, we don't know where those images were taken, or what part of the pallium is represented in the images. Is it always the same? Do results reflect differences between DVR and Wulst gene expression modifications? The authors should include low magnification images of the regions where experiments were performed. And they should consider the variable expression of all genes when interpreting results.  

      We agree that precise anatomical context is essential. In the revised version, we propose to: 

      a) Include schematics of the regions of interest where experimental manipulations were performed.

      b) Provide low-magnification panoramic images where appropriate, for anatomical reference.

      c) Show the expression patterns of relevant marker genes to better justify stages and region selection. 

      d) Provide the expression pattern of markers in panoramic view to show differential expression in the DVR and Wulst region and interpret our results accordingly.

      b) SVZ is not a postmitotic zone (as stated in line 123, and wrongly assigned throughout the text and figures). On the contrary, the SVZ is a secondary proliferative zone, organized in a layer, located in a basal position to the VZ. Both (VZ and SVZ) are germinative zones, containing mostly progenitors. The only postmitotic neurons in VZ and SVZ occupy them transiently when moving to the mantle zone, which is closer to the meninges and is the postmitotic territory. Please refer to the original Boulder committee articles to revise the SVZ definition. The authors, however, misinterpret this concept, and label the whole mantle zone as if it were the SVZ. Indeed, the term "mantle zone" does not appear in the article. Please revise and change the whole text and figures, as SVZ statements and photographs are nearly always misinterpreted. Indeed, SVZ is only labelled well in Figure 4F.  

      The two articles mentioning the expression of NeuroD1 in the SVZ (line 118) are research in Xenopus. Is there a proliferative SVZ in Xenopus?  

      For the actual existence of the SVZ in the chick pallium, please refer to the recent Rueda-Alaña et al., 2025 article that presents PH3 stainings at different timepoints and pallial areas.  

      We appreciate the correction suggested by the reviewer. In the revised manuscript: a) the SVZ will be labeled correctly in all figures and descriptions; b) the mantle zone terminology will be incorporated appropriately; c) the two Xenopus-based references in line 118 will be removed, as they are not directly relevant; and d) we will refer to Rueda-Alaña et al. (2025) to guide accurate anatomical labeling and interpretation of proliferative zones.

      We also acknowledge that while some proliferative cells exist in the SVZ of the chicken, they are relatively few and do not express typical basal progenitor markers such as Tbr2 (Nomura et al., 2016, Development). We will ensure that this nuance is clearly reflected in the text. 

      What is the Wulst, according to the authors of the article? In many figures, the Wulst includes the medial pallium and hippocampus, whereas sometimes it is used as a synonym of the hyperpallium (which excludes the medial pallium and hippocampus). Please make it clear, as the addition or not of the hippocampus definitely changes some interpretations.  

      We propose to modify the text and figures to accurately represent the correct location of the Wulst in the chick pallium.

      d) The authors compare the entirety of the chick pallium - including the hippocampus (see above), hyperpallium, mesopallium, nidopallium - to only the neocortex of mammals. This view - as shown in Suzuki et al., 2012 - forgets the specificity of pallial areas of the pallium and compares it to cortical cells. This is conceptually wrong, and leads to incorrect interpretations (please refer to Luis Puelles' commentaries on Suzuki et al results); there are incorrect conclusions about the existence of upper-layer-like and deep-layer-like neurons in the pallium of birds. The view is not only wrong according to the misinterpreted anatomical comparisons, but also according to novel scRNAseq data (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025). These articles show that many avian glutamatergic neurons of the pallium are highly diversified, and are not comparable to mammalian cortical cells. The authors should therefore avoid this incorrect use of terminology. There are no such upper-layer-like and deep-layer-like neurons in the pallium of birds.  

      We acknowledge this conceptual oversight. In the manuscript: a) we will avoid direct comparisons between the entire chick pallium and the mammalian neocortex; b) terms like “upper-layer-like” and “deep-layer-like” neurons will be removed or modified; and c) we will cite and integrate recent findings from Rueda-Alaña et al. (2025), Zaremba et al. (2025), and Hecker et al. (2025), which provide updated insights from scRNAseq analyses into the complexity of avian pallial neurons. Cell types will be described based on marker gene expression only, without unsupported evolutionary or homology claims.

      (2) From introduction to discussion, the article uses misleading terms and outdated concepts of cell type homology and similarity between chick and pallial territories and cells. The authors must avoid this confusing terminology, as non-expert readers will come to evolutionary conclusions which are not supported by the data in this article; indeed, the article does not deal with those concepts.  

      We agree with the reviewer. In the revised version, we will remove the misleading terms and outdated concepts and avoid speculative evolutionary conclusions.  

      a) Recent articles published in Science (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025) directly contradict some views presented in this article. These articles should be presented in the introduction as they are utterly important for the subject of this article and their results should be discussed in the light of the new findings of this article. Accordingly, the authors should avoid claiming any homology that is not currently supported. The expression of a single gene is not enough anymore to claim the homology of neuronal populations.  

      In the revised version, these above-mentioned articles (Rueda-Alaña et al., 2025; Zaremba et al., 2025; Hecker et al., 2025) will be included in the introduction and discussion.  Our interpretations will be updated to reflect these new insights into neuronal diversity and regionalization in the chick pallium. 

      Auditory cortex is not an appropriate term, as there is no cortex in the pallium of birds. Cortical areas require the existence of neuronal arrangements in laminae that appear parallel to the ventricular surface. It is not the case of either hyperpallium or auditory DVR. The accepted term, according to the Avian Nomenclature forum, is Field L.  

      We will replace all instances of “auditory cortex” with “Field L”, as per the accepted terminology in the Avian Nomenclature Forum.

      c) Forebrain, a term overused in the article, is very unspecific. It includes vast areas of the brain, from the pretectum and thalamus to the olfactory bulb. However, the authors are not researching most of the forebrain here. They should be more specific throughout the text and title.  

      In the revised version, we will replace “forebrain” with “pallium” throughout the manuscript to more accurately reflect the regions studied.

      (3) In the last part of the results, the authors claim miR19b has a role in patterning the avian pallium. What they see is that modifying its expression induces changes in gene expression in certain neurons. Accordingly, the altered neurons would differentiate into other subtypes, not similar to the wild type example. In this sense, miR19b may have a role in cell specification or neuronal differentiation. However, patterning is a different developmental event, which refers to the determination of broad genetic areas and territories. I don't think miR19b has a role in patterning.  

      We agree with the reviewer that an alteration in one marker for a particular cell type may not indicate a change in patterning. However, including the effect of miR-19b gain- and loss-of-function on Pax6 and Tbr2 may strengthen the idea that it affects patterning, as suggested by Reviewer #2. 

      (4) Please add a scheme of the molecules described in this article and the suggested interaction between them.  

      In the revised version, we propose to include a diagram to visually summarize the proposed interactions between miR-19b, E2f8, NeuroD1, and other key regulators.  

      (5) The methods section is way too brief to allow for repeatability of the procedures. This may be due to an editorial policy but if possible, please extend the details of the experimental procedures.  

      We will expand the Methods section to provide more detailed protocols and justifications for experimental design, in alignment with journal policy.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors aim to understand the neural basis of implicit causal inference, specifically how people infer causes of illness. They use fMRI to explore whether these inferences rely on content-specific semantic networks or broader, domain-general neurocognitive mechanisms. The study explores two key hypotheses: first, that causal inferences about illness rely on semantic networks specific to living things, such as the 'animacy network,' given that illnesses affect only animate beings; and second, that there might be a common brain network supporting causal inferences across various domains, including illness, mental states, and mechanical failures. By examining these hypotheses, the authors aim to determine whether causal inferences are supported by specialized or generalized neural systems.

      The authors observed that inferring illness causes selectively engaged a portion of the precuneus (PC) associated with the semantic representation of animate entities, such as people and animals. They found no cortical areas that responded to causal inferences across different domains, including illness and mechanical failures. Based on these findings, the authors concluded that implicit causal inferences are supported by content-specific semantic networks, rather than a domain-general neural system, indicating that the neural basis of causal inference is closely tied to the semantic representation of the specific content involved.

      Strengths:

      (1) The inclusion of the four conditions in the design is well thought out, allowing for the examination of the unique contribution of causal inference of illness compared to either a different type of causal inference (mechanical) or non-causal conditions. This design also has the potential to identify regions involved in a shared representation of inference across general domains.

      (2) The presence of the three localizers for language, logic, and mentalizing, along with the selection of specific regions of interest (ROIs), such as the precuneus and anterior ventral occipitotemporal cortex (antVOTC), is a strong feature that supports a hypothesis-driven approach (although see below for a critical point related to the ROI selection).

      (3) The univariate analysis pipeline is solid and well-developed.

      (4) The statistical analyses are a particularly strong aspect of the paper.

      Weaknesses:

      Based on the current analyses, it is not yet possible to rule out the hypothesis that inferring illness causes relies on neurocognitive mechanisms that support causal inferences irrespective of their content, neither in the precuneus nor in other parts of the brain.

      (1) The authors, particularly in the multivariate analyses, do not thoroughly examine the similarity between the two conditions (illness-causal and mechanical-causal), as they are more focused on highlighting the differences between them. For instance, in the searchlight MVPA analysis, an interesting decoding analysis is conducted to identify brain regions that represent illness-causal and mechanical-causal conditions differently, yielding results consistent with the univariate analyses. However, to test for the presence of a shared network, the authors only perform the Causal vs. Non-causal analysis. This analysis is not very informative because it includes all conditions mixed together and does not clarify whether both the illness-causal and mechanical-causal conditions contribute to these results.

      (2) To address this limitation, a useful additional step would be to use as ROIs the different regions that emerged in the Causal vs. Non-causal decoding analysis and to conduct four separate decoding analyses within these specific clusters:

      (a) Illness-Causal vs. Non-causal - Illness First;

      (b) Illness-Causal vs. Non-causal - Mechanical First;

      (c) Mechanical-Causal vs. Non-causal - Illness First;

      (d) Mechanical-Causal vs. Non-causal - Mechanical First.

      This approach would allow the authors to determine whether any of these ROIs can decode both the illness-causal and mechanical-causal conditions against at least one non-causal condition.

      (3) Another possible analysis to investigate the existence of a shared network would be to run the searchlight analysis for the mechanical-causal condition versus the two non-causal conditions, as was done for the illness-causal versus non-causal conditions, and then examine the conjunction between the two. Specifically, the goal would be to identify ROIs that show significant decoding accuracy in both analyses.

      The hypothesis that a neural mechanism supports causal inference across domains predicts higher univariate responses when causal inferences occur than when they do not. This prediction was not generated by us ad hoc but rather has been made by almost all previous cognitive neuroscience papers on this topic (Ferstl & von Cramon, 2001; Satpute et al., 2005; Fugelsang & Dunbar, 2005; Kuperberg et al., 2006; Fenker et al., 2010; Kranjec et al., 2012; Pramod, Chomik-Morales, et al., 2023; Chow et al., 2008; Mason & Just, 2011; Prat et al., 2011). Contrary to this hypothesis, we find that the precuneus (PC) is most activated for illness inferences and most deactivated for mechanical inferences relative to rest, suggesting that the PC does not support domain-general causal inference. To further probe the selectivity of the PC for illness inferences, we created group overlap maps that compare PC responses to illness inferences and mechanical inferences across participants. The PC shows a strong preference for illness inferences and is therefore unlikely to support causal inferences irrespective of their content (Supplementary Figures 6 and 7). We also note that, in whole-cortex analysis, no shared regions responded more to causal inference than noncausal vignettes across domains. Therefore, the prediction made by the ‘domain-general causal engine’ proposal as it has been articulated in the literature is not supported in our data.

      Taking a multivariate approach, the hypothesis that a neural mechanism supports causal inference across domains also predicts that relevant regions can decode between all possible pairs of causal vs. noncausal conditions (e.g., Illness-Causal vs. Noncausal-Illness First, Mechanical-Causal vs. Noncausal-Illness First, etc.). The analysis described by the reviewer in (2), in which the regions that distinguish between causal vs. noncausal conditions in searchlight MVPA are used as ROIs to test various causal vs. noncausal contrasts, is non-independent. Therefore, we cannot perform this analysis. In accordance with the reviewer’s suggestions in (3), we now include searchlight MVPA results for the mechanical inference condition compared to the two noncausal conditions (Supplementary Figure 9). No regions are shared across the searchlight analyses comparing all possible pairs of causal and noncausal conditions, providing further evidence that there are no shared neural responses to causal inference in our dataset.
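
      The conjunction logic described above can be illustrated with a minimal sketch. This is not the pipeline used in the study; the function name `conjunction_mask`, the contrast labels, and the toy boolean maps are hypothetical and serve only to show that a location counts as "shared" when it is significant in every pairwise causal vs. noncausal analysis.

```python
import numpy as np

def conjunction_mask(significance_maps):
    """Return a boolean map that is True only where every input map is True.

    Each input is a boolean array over the same set of locations (for
    example, searchlight centres), marking where decoding was significant.
    """
    maps = [np.asarray(m, dtype=bool) for m in significance_maps]
    if not maps:
        raise ValueError("at least one significance map is required")
    return np.logical_and.reduce(maps)

# Toy maps over six hypothetical locations (values invented for illustration).
toy_maps = {
    "illness_vs_noncausal_illness_first":    [1, 1, 0, 0, 1, 0],
    "illness_vs_noncausal_mech_first":       [1, 0, 0, 1, 1, 0],
    "mechanical_vs_noncausal_illness_first": [0, 1, 0, 1, 0, 0],
    "mechanical_vs_noncausal_mech_first":    [0, 0, 1, 1, 0, 0],
}
shared = conjunction_mask(toy_maps.values())
n_shared = int(shared.sum())
```

      In this toy case every location is missing from at least one map, so the conjunction is empty, which is the pattern a domain-general account does not predict.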

      (4) Along the same lines, for the ROI MVPA analysis, it would be useful not only to include the illness-causal vs. mechanical-causal decoding but also to examine the illness-causal vs. non-causal conditions and the mechanical-causal vs. non-causal conditions. Additionally, it would be beneficial to report these data not just in a table (where only the mean accuracy is shown) but also using dot plots, allowing the readers to see not only the mean values but also the accuracy for each individual subject.

      We have performed these analyses and now include a table of the results as well as figures displaying the dispersion across participants (Supplementary Tables 2 and 3, Supplementary Figures 10 and 11). In the left PC, the illness inference condition was decoded from one of the noncausal conditions, and the mechanical inference condition was decoded from the same noncausal condition. The language network did not decode between any causal/noncausal pairs. In the logic network, the illness inference condition was decoded from one of the noncausal conditions, and the mechanical inference condition was decoded from the other noncausal condition. Thus, no regions showed the predicted ‘domain-general’ pattern, i.e., significant decoding between all causal/noncausal pairs. 

      Importantly, the decoding results must be interpreted in light of significant univariate differences across conditions (e.g., greater responses to illness inferences compared to noncausal vignettes in the PC). Linear classifiers are highly sensitive to univariate differences (Coutanche, 2013; Kragel et al., 2012; Hebart & Baker, 2018; Woolgar et al., 2014; Davis et al., 2014; Pakravan et al., 2022).
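
      The point about univariate sensitivity can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not the classifier used in the study: it simulates two conditions with independent Gaussian noise that differ only by a uniform mean shift across voxels, and decodes them with a nearest-centroid rule, which is one simple kind of linear classifier. The function name `simulate_decoding` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_decoding(mean_shift, n_trials=200, n_voxels=50, seed=0):
    """Split-half accuracy of a nearest-centroid (linear) classifier.

    The two simulated conditions have identical noise and differ only in
    their overall mean response, i.e. a purely univariate difference.
    """
    rng = np.random.default_rng(seed)
    cond_a = rng.normal(0.0, 1.0, size=(n_trials, n_voxels))
    cond_b = rng.normal(mean_shift, 1.0, size=(n_trials, n_voxels))
    X = np.vstack([cond_a, cond_b])
    y = np.repeat([0, 1], n_trials)

    # Two folds: train on even-indexed trials and test on odd ones, then swap.
    idx = np.arange(len(y))
    accuracies = []
    for train in (idx % 2 == 0, idx % 2 == 1):
        test = ~train
        c0 = X[train & (y == 0)].mean(axis=0)
        c1 = X[train & (y == 1)].mean(axis=0)
        w = c1 - c0                    # weight vector of the linear boundary
        bias = -w @ (c0 + c1) / 2.0    # boundary passes midway between centroids
        pred = (X[test] @ w + bias > 0).astype(int)
        accuracies.append((pred == y[test]).mean())
    return float(np.mean(accuracies))

acc_no_shift = simulate_decoding(mean_shift=0.0)
acc_with_shift = simulate_decoding(mean_shift=0.5)
```

      With no mean shift the accuracy should hover around chance, whereas a modest uniform shift is enough for accurate decoding even though no condition-specific spatial pattern exists. This is why significant decoding in a region with a univariate difference does not by itself indicate a distinct multivariate code.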

      (5) The selection of Regions of Interest (ROIs) is not entirely straightforward:

      In the introduction, the authors mention that recent literature identifies the precuneus (PC) as a region that responds preferentially to images and words related to living things across various tasks. While this may be accurate, we can all agree that other regions within the ventral occipital-temporal cortex also exhibit such preferences, particularly areas like the fusiform face area, the occipital face area, and the extrastriate body area. I believe that at least some parts of this network (e.g., the fusiform gyrus) should be included as ROIs in this study. This inclusion would make sense, especially because a complementary portion of the ventral stream known to prefer non-living items (i.e., anterior medial VOTC) has been selected as a control ROI to process information about the mechanical-causal condition. Given the main hypothesis of the study - that causal inferences about illness might depend on content-specific semantic representations in the 'animacy network' - it would be worthwhile to investigate these ROIs alongside the precuneus, as they may also yield interesting results.

      We thank the reviewer for their suggestion to test the FFA region. We think this provides an interesting comparison to the PC and hypothesized that, in contrast to the PC, the FFA does not encode abstract causal information about animacy-specific processes (i.e., illness). As we mention in the Introduction, although the fusiform face area (FFA) also exhibits a preference for animates, it does so primarily for images in sighted people (Kanwisher et al., 1997; Grill-Spector et al., 2004; Noppeney et al., 2006; Konkle & Caramazza, 2013; Connolly et al., 2016; Bi et al., 2016).

      We did not select the FFA as a region of interest when preregistering the current study because we did not predict it would show sensitivity to causal knowledge. In accordance with the reviewer’s suggestions, we now include the FFA as an ROI in individual-subject univariate analysis (Supplementary Figure 8, Appendix 4). Because we did not run a separate FFA localizer task when collecting the data, we used FFA search spaces from a previous study investigating responses to face images (Julian et al., 2012). We followed the same analysis procedure that was used to investigate responses to illness inferences in the PC. Neither left nor right FFA exhibited a preference for illness inferences compared to mechanical inferences or to the noncausal conditions. This result is interesting and is now briefly discussed in the Discussion section.

      (6) Visual representation of results:

      In all the figures related to ROI analyses, only mean group values are reported (e.g., Figure 1A, Figure 3, Figure 4A, Supplementary Figure 6, Figure 7, Figure 8). To better capture the complexity of fMRI data and provide readers with a more comprehensive view of the results, it would be beneficial to include a dot plot for a specific time point in each graph. This could be a fixed time point (e.g., a certain number of seconds after stimulus presentation) or the time point showing the maximum difference between the conditions of interest. Adding this would allow for a clearer understanding of how the effect is distributed across the full sample, such as whether it is consistently present in every subject or if there is greater variability across individuals.

      We thank the reviewer for this suggestion. We now include scattered box plots displaying the dispersion in average percent signal change across participants in Figures 1, 3, and 4, and Supplementary Figures 8, 12, and 14.

      (7) Task selection:

      (a) To improve the clarity of the paper, it would be helpful to explain the rationale behind the choice of the selected task, specifically addressing: (i) why an implicit inference task was chosen instead of an explicit inference task, and (ii) why the "magic detection" task was used, as it might shift participants' attention more towards coherence, surprise, or unexpected elements rather than the inference process itself.

      (b) Additionally, the choice to include a large number of catch trials is unusual, especially since they are modeled as regressors of non-interest in the GLM. It would be beneficial to provide an explanation for this decision.

      We chose an orthogonal foil detection task, rather than an explicit causal judgment task, to investigate automatic causal inferences during reading and to unconfound such processing as much as possible from explicit decision-making processes (see Kuperberg et al., 2006 for discussion). Analogous foil detection paradigms have been used to study sentence processing and word recognition (Pallier et al., 2011; Dehaene-Lambertz et al., 2018). We now clarify this in the Introduction. The “magical” element occurred both within and across sentences so that participants could not use coherence as a cue to complete the task. Approximately 1/5 (19%) of the trials were magical catch trials to ensure that participants remained attentive throughout the experiment.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors hypothesize that "causal inferences about illness depend on content-specific semantic representations in the animacy network". They test this hypothesis in an fMRI task, by comparing brain activity elicited by participants' exposure to written situations suggesting a plausible cause of illness with brain activity in linguistically equivalent situations suggesting a plausible cause of mechanical failure or damage and non-causal situations. These contrasts identify PC as the main "culprit" in a whole-brain univariate analysis. Then the question arises of whether the content-specificity has to do with inferences about animates in general, or if there are some distinctions between reasoning about people's bodies versus mental states. To answer this question, the authors localize the mentalizing network and study the relation between brain activity elicited by Illness-Causal > Mech-Causal and Mentalizing > Physical stories. They conclude that inferring about the causes of illness partially differentiates from reasoning about people's states of mind. The authors finally test the alternative yet non-mutually exclusive hypothesis that both types of causal inferences (illness and mechanical) depend on shared neural machinery. Good candidates are language and logic, which justifies the use of a language/logic localizer. No evidence of commonalities across causal inferences versus non-causal situations is found.

      Strengths:

      (1) This study introduces a useful paradigm and well-designed set of stimuli to test for implicit causal inferences.

      (2) Another important methodological advance is the addition of physical stories to the original mentalizing protocol.

      (3) With these tools, or a variant of these tools, this study has the potential to pave the way for further investigation of naïve biology and causal inference.

      Weaknesses:

      (1) This study is missing a big-picture question. It is not clear whether the authors investigate the neural correlates of causal reasoning or of naïve biology. If the former, the choice of an orthogonal task, making causal reasoning implicit, is questionable. If the latter, the choice of mechanical and physical controls can be seen as reductive and problematic.

      We have modified the Introduction to clarify that the primary goal of the current study is to test the claim that semantic networks encode causal knowledge – in this case, causal intuitive theories of biology. Most conceptions of intuitive biology, intuitive psychology, and intuitive physics describe them as causal frameworks (e.g., Wellman & Gelman, 1992; Simons & Keil, 1995; Keil et al., 1999; Tenenbaum, Griffiths, & Niyogi, 2007; Gopnik & Wellman, 2012; Gerstenberg & Tenenbaum, 2017). As noted above, we chose an implicit task to investigate automatic causal inferences during reading and to unconfound such processing as much as possible from explicit decision-making processes. We are not sure what the reviewer means when they say that mechanical and physical controls are reductive. This is the standard control condition in neural and behavioral paradigms that investigate intuitive psychology and intuitive biology (e.g., Saxe & Kanwisher, 2003; Gelman & Wellman, 1991).

      (2) The rationale for focusing mostly on the precuneus is not clear and this choice could almost be seen as a post-hoc hypothesis.

      This study is preregistered (https://osf.io/6pnqg). The preregistration states that the precuneus is a hypothesized area of interest, so this is not a post-hoc hypothesis. Our hypothesis was informed by multiple prior studies implicating the precuneus in the semantic representation of animates (e.g., people, animals) (Fairhall & Caramazza, 2013a, 2013b; Fairhall et al., 2014; Peer et al., 2015; Wang et al., 2016; Silson et al., 2019; Rabini, Ubaldi, & Fairhall, 2021; Deen & Freiwald, 2022; Aglinskas & Fairhall, 2023; Hauptman, Elli, et al., 2025). We also conducted a pilot experiment with separate participants prior to pre-registering the study. We now clarify our rationale for focusing on the precuneus in the Introduction:

      “Illness affects living things (e.g., people and animals) rather than inanimate objects (e.g., rocks, machines, houses). Thinking about living things (animates) as opposed to non-living things (inanimate objects/places) recruits partially distinct neural systems (e.g., Warrington & Shallice, 1984; Hillis & Caramazza, 1991; Caramazza & Shelton, 1998; Farah & Rabinowitz, 2003). The precuneus (PC) is part of the ‘animacy’ semantic network and responds preferentially to living things (i.e., people and animals), whether presented as images or words (Devlin et al., 2002; Fairhall & Caramazza, 2013a, 2013b; Fairhall et al., 2014; Peer et al., 2015; Wang et al., 2016; Silson et al., 2019; Rabini, Ubaldi, & Fairhall, 2021; Deen & Freiwald, 2022; Aglinskas & Fairhall, 2023; Hauptman, Elli, et al., 2025). By contrast, parts of the visual system (e.g., fusiform face area) that respond preferentially to animates do so primarily for images (Kanwisher et al., 1997; Grill-Spector et al., 2004; Noppeney et al., 2006; Mahon et al., 2009; Konkle & Caramazza, 2013; Connolly et al., 2016; see Bi et al., 2016 for a review). We hypothesized that the PC represents causal knowledge relevant to animates and tested the prediction that it would be activated during implicit causal inferences about illness, which rely on such knowledge (preregistration: https://osf.io/6pnqg).”

      (3) The choice of an orthogonal 'magic detection' task has three problematic consequences in this study:

      (a) It differs in nature from the 'mentalizing' task that consists of evaluating a character's beliefs explicitly from the corresponding story, which complicates the study of the relation between both tasks. While the authors do not compare both tasks directly, it is unclear to what extent this intrinsic difference between implicit versus explicit judgments of people's body versus mental states could influence the results.

      (b) The extent to which the failure to find shared neural machinery between both types of inferences (illness and mechanical) can be attributed to the implicit character of the task is not clear.

      (c) The introduction of a category of non-interest that contains only 36 trials compared to 38 trials for all four categories of interest creates a design imbalance.

      We disagree with the reviewer’s argument that our use of an implicit “magic detection” task is problematic. Indeed, we think it is one of the advances of the current study over prior work.

      a) Prior work has shown that implicit mentalizing tasks (e.g., naturalistic movie watching) engage the theory of mind network, suggesting that the implicit/explicit nature of the task does not drive the activation of this network (Jacoby et al., 2016; Richardson et al., 2018). With these data in mind, it is unlikely that the implicit/explicit nature of the causal inference and theory of mind tasks in the present experiment can explain observed differences between them.

      b) Explicit causal inferences introduce a collection of executive processes that potentially confound the results and make it difficult to know whether neural signatures are related to causal inference per se. The current study focuses on the neural basis of implicit causal inference, a type of inference that is made routinely during language comprehension. We do not claim to find neural signatures of all causal inferences, nor do we think any study could claim to do so, because causal inferences are a highly varied class.

      c) Our findings do not exclude the possibility that content-invariant responses are elicited during explicit causality judgments. We clarify this point in the Results (e.g., “These results leave open the possibility that domain-general systems support the explicit search for causal connections”) and Discussion (e.g., “The discovery of novel causal relationships (e.g., ‘blicket detectors’; Gopnik et al., 2001) and the identification of complex causes, even in the case of illness, may depend in part on domain-general neural mechanisms”).

      d) Because the magic trials are excluded from our analyses, it is unclear how the imbalance in the number of magic trials could influence the results and our interpretation of them. We note that the number of catch trials in standard target detection paradigms is sometimes much lower than the number of target trials in each condition (e.g., Pallier et al., 2011).

      (4) Another imbalance is present in the design of this study: the number of trials per category is not the same in each run of the main task. This imbalance does not seem to be accounted for in the 1st-level GLM and renders a bit problematic the subsequent use of MVPA.

      Each condition is shown either 6 or 7 times per run (maximum difference of 1 trial between conditions), and the number of trials per condition is equal across the whole experiment: each condition is shown 7 times in two of the runs and 6 times in four of the runs. This minor design imbalance is typical of fMRI experiments and should not impact our interpretations of the data, particularly because we average responses from each condition within a run before submitting them to MVPA.
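
      The trial counts above can be checked with a short sketch. It assumes the 36 magic trials and four conditions of interest mentioned in the reviewer's comment; the order of the 7-trial and 6-trial runs, the 50-voxel patterns, and all variable names are hypothetical and serve only to illustrate why within-run averaging yields one pattern per condition per run regardless of the trial count.

```python
import numpy as np

# Counts stated in the text: 7 trials in two runs, 6 trials in four runs.
# Which runs contain 7 trials is an assumption made for illustration.
trials_per_run = [7, 7, 6, 6, 6, 6]
total_per_condition = sum(trials_per_run)

# Share of catch trials, using 36 magic trials and four conditions of interest.
n_magic = 36
catch_share = n_magic / (4 * total_per_condition + n_magic)

# Simulated single-trial patterns for one condition (hypothetical data).
rng = np.random.default_rng(0)
n_voxels = 50
run_patterns = [rng.normal(size=(n, n_voxels)) for n in trials_per_run]

# Averaging within each run gives exactly one pattern per run,
# so the classifier sees the same number of samples from every run.
run_means = np.stack([p.mean(axis=0) for p in run_patterns])
```

      Under these assumptions, every condition contributes the same number of run-level patterns to MVPA, and the per-condition total matches the trial count cited by the reviewer.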

      (5) The main claim of the authors, encapsulated by the title of the present manuscript, is not tested directly. While the authors included in their protocol independent localizers for mentalizing, language, and logic, they did not include an independent localizer for "animacy". As such, they cannot provide a within-subject evaluation of their claim, which is entirely based on the presence of a partial overlap in PC (which is also involved in a wide range of tasks) with previous results on animacy.

      We respectfully disagree with this assertion. Our primary analysis uses a within-subject leave-one-run-out approach. This approach allows us to use part of the data itself to localize animacy-relevant causal responses in the PC without engaging in ‘double-dipping’ or statistical non-independence (Vul & Kanwisher, 2011). We also use the mentalizing network localizer as a partial localizer for animacy. This is because the control condition (physical reasoning) does not include references to people or any animate agents (Supplementary Figures 1 and 15). We now clarify this point in the Methods section of the paper (see below).

      From the Methods: “To test the relationship between neural responses to inferences about the body and the mind, and to localize animacy regions, we used a localizer task to identify the mentalizing network in each participant (Saxe & Kanwisher, 2003; Dodell-Feder et al., 2011; http://saxelab.mit.edu/use-our-efficient-false-belief-localizer)...Our physical stories incorporated more vivid descriptions of physical interactions and did not make any references to human agents, enabling us to use the mentalizing localizer as a localizer for animacy.”
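
      The logic of a leave-one-run-out fROI analysis can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name, the top-5% voxel criterion, and the simulated noise data are all assumptions. The sketch only shows why selecting voxels on some runs and measuring the effect on the held-out run avoids the bias that arises when the same data are used for both steps.

```python
import numpy as np

def loro_froi_effect(betas_a, betas_b, top_fraction=0.05):
    """Leave-one-run-out estimate of the A > B effect.

    betas_a, betas_b: arrays of shape (n_runs, n_voxels) holding per-run
    responses to two conditions within a search space (hypothetical inputs).
    top_fraction: share of voxels kept as the fROI (an assumed criterion).
    """
    n_runs, n_voxels = betas_a.shape
    n_select = max(1, int(round(top_fraction * n_voxels)))
    effects = []
    for test_run in range(n_runs):
        train = np.arange(n_runs) != test_run
        # Define the fROI from the training runs only.
        contrast = betas_a[train].mean(axis=0) - betas_b[train].mean(axis=0)
        froi = np.argsort(contrast)[-n_select:]
        # Measure the effect in the independent, held-out run.
        effects.append(betas_a[test_run, froi].mean() - betas_b[test_run, froi].mean())
    return float(np.mean(effects))

# Pure-noise data: there is no true difference between the two conditions.
rng = np.random.default_rng(0)
noise_a = rng.normal(size=(6, 2000))
noise_b = rng.normal(size=(6, 2000))
loro_effect = loro_froi_effect(noise_a, noise_b)

# Circular ("double-dipping") estimate: select and measure on the same data.
contrast_all = noise_a.mean(axis=0) - noise_b.mean(axis=0)
top = np.argsort(contrast_all)[-100:]
circular_effect = float(contrast_all[top].mean())
```

      With noise-only input, the held-out estimate stays near zero while the circular estimate is inflated, which is the non-independence problem the leave-one-run-out procedure is meant to avoid.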

      Reviewer #3 (Public review):

      Summary:

      This study employed an implicit task, showing vignettes to participants while the BOLD signal was acquired. The aim was to capture automatic causal inferences that emerge during language processing and comprehension. In particular, the authors compared causal inferences about illness with two control conditions, causal inferences about mechanical failures and non-causal phrases related to illnesses. All phrases that were employed described contexts with people, to avoid an animate/inanimate confound in the results. The authors had a specific hypothesis concerning the role of the precuneus (PC) in being sensitive to causal inferences about illnesses.

      These findings indicate that implicit causal inferences are facilitated by semantic networks specialized for encoding causal knowledge.

      Strengths:

      The major strength of the study is the clever design of the stimuli (which are nicely matched for a number of features) which can tease apart the role of the type of causal inference (illness-causal or mechanical-causal) and the use of two localizers (logic/language and mentalizing) to investigate the hypothesis that the language and/or logical reasoning networks preferentially respond to causal inference regardless of the content domain being tested (illnesses or mechanical).

      Weaknesses:

      I have identified the following main weaknesses:

      (1) Precuneus (PC) and Temporo-Parietal junction (TPJ) show very similar patterns of results, and the manuscript is mostly focused on PC (also the abstract). To what extent does the fact that PC and TPJ show similar trends affect the inferences we can derive from the results of the paper? I wonder whether additional analyses (connectivity?) would help provide information about this network.

      We thank the reviewer for this suggestion. While the PC shows the most robust univariate preference for illness inferences compared to both mechanical inferences and noncausal vignettes, the TPJ also shows a preference for illness inferences compared to mechanical inferences in individual-subject fROI analysis. However, as we mention in the Results section, the TPJ does not show a preference for illness inferences compared to noncausal vignettes, suggesting that the TPJ is selective for animacy but may not be as sensitive to causal knowledge about animacy-specific processes. When describing our results, we refer to the ‘animacy network’ (i.e., PC and TPJ) but also highlight that the PC exhibited the most robust responses to illness inferences (from the Results: “Inferring illness causes preferentially recruited the animacy semantic network, particularly the PC”; from the Discussion: “We find that a semantic network previously implicated in thinking about animates, particularly the precuneus (PC), is preferentially engaged when people infer causes of illness…”). We did not collect resting state data that would enable a connectivity analysis, as the reviewer suggests. This is an interesting direction for future work.

      (2) Results are mainly supported by a univariate ROI approach, and the MVPA ROI approach is performed on a subregion of one of the ROI regions (left precuneus). Results could then have a limited impact on our understanding of brain functioning.

      The original and current versions of the paper include results from multiple multivariate analyses, including whole-cortex searchlight MVPA and individual-subject fROI MVPA performed in multiple search spaces (see Supplementary Figures 10 and 11, Supplementary Tables 2 and 3).

      We note that our preregistered predictions focused primarily on univariate differences. This is because the current study investigates neural responses to inferences, and univariate increases in activity are thought to reflect the processing of such inferences. We use multivariate analyses to complement our primary univariate analyses. However, given that we observe significant univariate effects and that multivariate analyses are heavily influenced by significant univariate effects (Coutanche, 2013; Kragel et al., 2012; Hebart & Baker, 2018; Woolgar et al., 2014; Davis et al., 2014; Pakravan et al., 2022), our univariate results constitute the main findings of the paper.

      (3) In all figures: there are no measures of dispersion of the data across participants. The reader can only see aggregated (mean) data. E.g., percentage signal changes (PSC) do not report measures of dispersion of the data, nor do we have BOLD maps showing the overlap of the response across participants. Only in Figure 2, we see the data of 6 selected participants out of 20.

      We thank the reviewer for this suggestion. We now include graphs depicting the dispersion of the data across participants in the following figures: Figures 1, 3, and 4, and Supplementary Figures 8, 12, and 14. We have also created 2 figures that display the overlap of univariate responses across participants (Supplementary Figures 6 and 7). These figures show that there is high overlap across participants in PC responses to illness inferences but not mechanical inferences. In addition, all participants’ results from the analysis depicted in Figure 2 are included in Supplementary Figure 3. 

      (4) Sometimes acronyms are defined in the text after they appear for the first time.

      We thank the reviewer for pointing this out. We now define all acronyms before using them.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I was unable to access the pre-registration on OSF because special permission is required.

      We apologize for this technical error. The preregistration is now publicly available: https://osf.io/6pnqg.

      (2) The length of the MRI session is quite long (around 2 hours). It is generally discouraged to have such extended data acquisition periods, as this can affect the stability and cleanliness of the data. Did you observe any effects of fatigue or attention decline in your data?

      The session was 2 hours long including 1-2 10-minute breaks. Without breaks, the scan would be approximately 1.5 hours. This is a standard length for MRI experiments. The main experiment (causal inference task) was always conducted first and lasted approximately 1 hour. Accuracy did not decrease across the 6 runs of this experiment (repeated measures ANOVA, F<sub>(5,114)</sub> = 1.35, p = .25).

      (3) The last sentence of the results states: "Although MVPA searchlight analysis identified several areas where patterns of activity distinguished between causal and non-causal vignettes, all of these regions showed a preference for non-causal vignettes in univariate analysis (Supplementary Figure 5)." This statement is not entirely accurate. As I previously pointed out, the MVPA searchlight analysis is not very informative and is difficult to interpret. However, as previously suggested, there are additional steps that could be taken to better understand and interpret these results. It is incorrect to conclude that because the brain regions identified in the MVPA analyses show a preference for non-causal vignettes in univariate analyses, the multivariate results lack value. While univariate analyses may show a preference for a specific condition, multivariate analyses can reveal more fine-grained representations of multiple conditions. For a notable example, consider the fusiform face area (FFA) that shows a clear preference for faces at the univariate level but can significantly decode other categories at the multivariate level, even when faces are not included in the analysis.

      The decoding analysis that the reviewer is suggesting for the current study would be analogous to identifying univariate differences between faces and places in the FFA and then decoding between faces and places and claiming that the FFA represents places because the decoding is significant. The decoding analyses enabled by our design are not equivalent to decoding within a condition (e.g., among face identities, among types of illness inferences), as the reviewer suggests above. It is not that such multivariate analyses “lack value” but that they recapitulate established univariate differences. Multivariate analyses are useful for revealing more fine-grained representations when i) significant univariate differences are not observed, or ii) when it is possible to decode among categories within a condition (e.g., among face identities, among types of illness inferences). We are currently collecting data that will enable us to perform within-condition decoding analyses in future work, but the design of the current study does not allow for such a comparison.

      We note that the original quotation from the manuscript has been removed because it is no longer accurate. When including participant response time as a covariate of no interest in the GLM, no regions are shared across the 4 searchlight analyses comparing causal and noncausal conditions, suggesting that there are no shared neural responses to causal inference in our dataset.

      Reviewer #2 (Recommendations for the authors):

      (1) Moderating the strength of some claims made to justify the main hypothesis (e.g., "people but not machines transmit diseases to each other through physical contact").

      We changed this wording so that it now reads: “Illness affects living things (e.g., people and animals) rather than inanimate objects (e.g., rocks, machines, houses).” (Introduction)

      (2) Expanding the paragraph introducing the sub-question about inferring people's "body states" vs "mental states". In addition, given the order in which the hypotheses are introduced, and the results are presented, I would suggest switching the order of presentation of both localizers in the methods section and adding a quick reminder of the hypotheses that justify using these localizers.

      We thank the reviewer for these suggestions. In accordance with their suggestions, we have expanded the paragraph in the Introduction that introduces the “body states” vs. “mental states” question (see below). We have also switched the order of the localizer descriptions in the Methods section and added a sentence at the start of each section describing the relevant hypotheses (see below).

      From the Introduction: “We also compared neural responses to causal inferences about the body (i.e., illness) and inferences about the mind (i.e., mental states). Both types of inferences are about animate entities, and some developmental work suggests that children use the same set of causal principles to think about bodies and minds (Carey, 1985, 1988). Other evidence suggests that by early childhood, young children have distinct causal knowledge about the body and the mind (Springer & Keil, 1991; Callanan & Oakes, 1992; Wellman & Gelman, 1992; Inagaki & Hatano, 1993; 2004; Keil, 1994; Hickling & Wellman, 2001; Medin et al., 2010). For instance, preschoolers are more likely to view illness as a consequence of biological causes, such as contagion, rather than psychological causes, such as malicious intent (Springer & Ruckel, 1992; Raman & Winer, 2004; see also Legare & Gelman, 2008). The neural relationship between inferences about bodies and minds has not been fully described. The ‘mentalizing network’, including the PC, is engaged when people reason about agents’ beliefs (Saxe & Kanwisher, 2003; Saxe et al., 2006; Saxe & Powell, 2006; Dodell-Feder et al., 2011; Dufour et al., 2013). We localized this network in individual participants and measured its neuroanatomical relationship to the network activated by illness inferences.”

      From the Methods, localizer descriptions: “To test the relationship between neural responses to inferences about the body and the mind, and to localize animacy regions, we used a localizer task to identify the mentalizing network in each participant… To test for the presence of domain-general responses to causal inference in the language and logic networks (e.g., Kuperberg et al., 2006; Operskalski & Barbey, 2017), we used an additional localizer task to identify both networks in each participant.”

      (3) Adding a quick analysis of lateralization to support the corresponding claim of left lateralization of responses to causal inferences.

      In accordance with the reviewer’s suggestion, we now include hemisphere as a factor in all ANOVAs comparing univariate responses across conditions.

      From the Results: “In individual-subject fROI analysis (leave-one-run-out), we similarly found that inferring illness causes activated the PC more than inferring causes of mechanical breakdown (repeated measures ANOVA, condition (Illness-Causal, Mechanical-Causal) x hemisphere (left, right): main effect of condition, F<sub>(1,19)</sub> = 19.18, p < .001, main effect of hemisphere, F<sub>(1,19)</sub> = 0.3, p = .59, condition x hemisphere interaction, F<sub>(1,19)</sub> = 27.48, p < .001; Figure 1A). This effect was larger in the left than in the right PC (paired samples t-tests; left PC: t<sub>(19)</sub> = 5.36, p < .001, right PC: t<sub>(19)</sub> = 2.27, p = .04)…In contrast to the animacy-responsive PC, the anterior PPA showed the opposite pattern, responding more to mechanical inferences than illness inferences (leave-one-run-out individual-subject fROI analysis; repeated measures ANOVA, condition (Mechanical-Causal, Illness-Causal) x hemisphere (left, right): main effect of condition, F<sub>(1,19)</sub> = 17.93, p < .001, main effect of hemisphere, F<sub>(1,19)</sub> = 1.33, p = .26, condition x hemisphere interaction, F<sub>(1,19)</sub> = 7.8, p = .01; Figure 4A). This effect was significant only in the left anterior PPA (paired samples t-tests; left anterior PPA: t<sub>(19)</sub> = 4, p < .001, right anterior PPA: t<sub>(19)</sub> = 1.88, p = .08).”

      (4) Making public and accessible the pre-registration OSF link.

      We apologize for this technical error. The preregistration is now publicly available: https://osf.io/6pnqg.

      Reviewer #3 (Recommendations for the authors):

      In all figures: there are no measures of dispersion of the data across participants. The reader can only see aggregated (mean) data. E.g., percentage signal changes (PSC) do not report measures of dispersion of the data, nor do we have BOLD maps showing the overlap of the response across participants. Only in Figure 2, we see the data of 6 selected participants out of 20.

      We thank the reviewer for this suggestion. We now include graphs depicting the dispersion of the data across participants in the following figures: Figures 1, 3, and 4, and Supplementary Figures 8, 12, and 14. We have also created 2 figures that display the overlap of univariate responses across participants (Supplementary Figures 6 and 7). In addition, all participants’ results from the analysis depicted in Figure 2 are included in Supplementary Figure 3.

      Minor

      (1) Figure 2: Spatial dissociation between responses to illness inferences and mental state inferences in the precuneus (PC). If the analysis is the result of the MVPA, the figure should report the fact that only the left precuneus was analyzed.

      Figure 2 depicts the spatial dissociation in univariate responses to illness inferences and mental state inferences. We now clarify this in the figure legend.

      (2) VOTC and PSC acronyms are defined in the text after they appear for the first time. TPJ is never defined.

      We thank the reviewer for pointing this out. We now define all acronyms before using them.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The paper addresses the knowledge gap between the representation of goal direction in the central complex and how motor systems stabilize movement toward that goal. The authors focused on two descending neurons, DNa01 and 02, and showed that they play different roles in steering the fly toward a goal. They also explored the connectome data to propose a model to explain how these DNs could mediate response to lateralized sensory inputs. They finally used lateralized optogenetic activation/inactivation experiments to test the roles of these neurons in mediating turnings in freely walking flies.

      Strengths:

      The experiments are well-designed and controlled. The experiment in Figure 4 is elegant, and the authors put a lot of effort into ensuring that ATP puffs do not accidentally activate the DNs. They also have explained complex experiments well. I only have minor comments for the authors.

      We are grateful for this positive feedback.

      Weaknesses:

      (1) I do not fully understand how the authors extracted the correlation functions from the population data in Figure 1. Since the ipsilateral DNs are anti-correlated with the contralateral ones, I expected that the average will drop to zero when they are pooled together (e.g., 1E-G). Of course, this will not be the case if all the data in Figure 1 are collected from the same brain hemisphere. It would be helpful if the authors could explain this.

      We regret that this information was not easy to find in our initial submission. As noted in the Figure 1D legend, “Here and elsewhere, ipsi and contra are defined relative to the recorded DN(s).” We have now added a sentence to the Results (right after we introduce Figure 1D) that also makes this point.

      (2) What constitutes the goal directions in Figures 1-3 and 8, as the authors could not use EPG activity as a proxy for goal directions? If these experiments were done in the dark, without landmarks, one would expect the fly's heading to drift randomly at times, and they would not engage the DNa01/02 for turning. Do the walking trajectories in these experiments qualify as menotactic bouts?

      Published work (Green et al., 2019) has shown that, even in the dark, flies will often walk for extended periods while holding the bump of EPG activity at a fixed location. During these epochs, the brain is essentially estimating that the fly is walking in a straight line in a fixed direction. (The fact that the fly is actually rotating a bit on the spherical treadmill is not something the fly can know, in the dark.) Thus, epochs where the EPG bump is held fixed are treated as menotactic bouts, even in darkness.

      Our results provide additional support for this interpretation. We find that, when flies are walking in darkness and holding the bump of EPG activity at a fixed location, they will make a corrective behavioral turning maneuver in response to an imposed bump-jump. This result argues that the flies are actually engaging in goal-directed straight-line walking, i.e. menotaxis, and it reproduces the findings of Green et al. (2019).

      To clarify this point, we have adjusted the wording of the Results pertaining to Figure 4.

      (3) In Figure 2B, the authors mentioned that DNa02 overpredicts and 01 underpredicts rapid turning and provided single examples. It would be nice to see more population-level quantification to support this claim.

      In this revision, we have reorganized Figures 1 and 2 (and associated text) to improve clarity. As part of this reorganization, we have removed this passage from the text, as it was a minor point in any event.

      Reviewer #2 (Public review):

      The data is largely electrophysiological recordings coupled with behavioral measurements (technically impressive) and some gain-of-function experiments in freely walking flies. Loss-of-function was tested but had minimal effect, which is not surprising in a system with partially redundant control mechanisms. The data is also consistent with/complementary to subsequent manuscripts (Yang 2023, Feng 2024, and Ros 2024) showing additional descending neurons with contributions to steering in walking and flying.

      The experiments are well executed, the results interesting, and the description clear. Some hypotheses based on connectome anatomy are tested: the insights on the pre-synaptic side - how sensory and central complex heading circuits converge onto these DNs are stronger than the suggestions about biomechanical mechanisms for how turning happens on the motor side.

      Of particular interest is the idea that different sensory cues can converge on a common motor program. The turn-toward or turn-away mechanism is initiated by valence rather than whether the stimulus was odor or temperature or memory of heading. The idea that animals choose a direction based on external sensory information and then maintain that direction as a heading through a more internal, goal-based memory mechanism, is interesting but it is hard to separate conclusively.

      To clarify, we mention the role of memory in connection with two places in the manuscript. First, we note that the EPG/head direction system relies on learning and memory to construct a map of directional cues in the environment. These cues are, in principle, inherently neutral, i.e. without valence. Second, we note that specific mushroom body output neurons rely on learning and memory to store the valence associated with an odor. This information is not necessarily associated with an allocentric direction: it is simply the association of odor with value. Both of these ideas are well-attested by previous work.

      The reviewer may be suggesting a sequential scheme whereby the brain initializes an allocentric goal direction based on valence, and then maintains that goal direction in memory, based on that initialization. In other words, memory is used to associate valence with some allocentric direction. This seems plausible, but it is not a claim we make in our manuscript.

      The "see-saw", where left-right symmetry is broken to allow a turn, presumably by excitation on one side and inhibition of the other leg motor modules, is interesting but not well explained here. How hyperpolarization affects motor outputs is not clear.

      We have added several sentences to the Discussion to clarify this point. According to this see-saw model, steering can emerge from right/left asymmetries in excitation, or inhibition, or both. It may be nonintuitive to think that inhibitory input to a DN can produce an action. However, this becomes more plausible given our finding that DNa02 has a relatively high basal firing rate (Fig. 1D), and DNa02 hyperpolarization is associated with contraversive turning (Fig. 5A). It is also relevant to note that there are many inhibitory cell types that form strong unilateral connections onto DNa02 (e.g., AOTU019).
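
      The see-saw idea can be made concrete with a toy sketch. It is not a model from the manuscript: the rectified-rate rule, the function names, and every number are assumptions chosen only to show why a high basal firing rate lets unilateral inhibition produce a right/left asymmetry.

```python
def dn_rate(baseline, excitation=0.0, inhibition=0.0):
    """Toy firing rate of one DN: rectified so it cannot drop below zero."""
    return max(0.0, baseline + excitation - inhibition)

def steering_signal(left_rate, right_rate):
    """Toy steering drive: positive values mean a rightward turn."""
    return right_rate - left_rate

# High basal rate (hypothetical value): inhibiting only the left DN
# lowers its rate, so the right side dominates and the turn is rightward,
# i.e. contraversive relative to the hyperpolarized cell.
high_baseline = 20.0
turn_with_baseline = steering_signal(
    dn_rate(high_baseline, inhibition=10.0),
    dn_rate(high_baseline),
)

# Zero basal rate: the same inhibition has nothing to subtract from,
# so no right/left asymmetry (and no steering drive) results.
turn_without_baseline = steering_signal(
    dn_rate(0.0, inhibition=10.0),
    dn_rate(0.0),
)

# Excitation of the right DN alone produces the same sign of asymmetry.
turn_from_excitation = steering_signal(
    dn_rate(high_baseline),
    dn_rate(high_baseline, excitation=10.0),
)
```

      In this sketch, inhibition on one side and excitation on the other yield steering drives of the same sign, but the inhibitory route works only when the basal rate is above zero.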

      The statement near Figure 5B that "DNa02 activity was higher on the side ipsilateral to the attractive stimulus, but contralateral to the aversive stimulus" is really important - and only possible to see because of the dual recordings.

      We thank the reviewer for this positive feedback.

      Reviewer #3 (Public review):

      Summary:

      Rayshubskiy et al. performed whole-cell recordings from descending neurons (DNs) of fruit flies to characterize their role in steering. Two DNs implicated in "walking control" and "steering control" by previous studies (Namiki et al., 2018, Cande et al., 2018, Chen et al., 2018) were chosen by the authors for further characterization. In-vivo whole-cell recordings from DNa01 and DNa02 showed that their activity predicts spontaneous ipsilateral turning events. The recordings also showed that while DNa02 predicts transient turns, DNa01 predicts slow sustained turns. However, optogenetic activation or inactivation showed relatively subtle phenotypes for both neurons (consistent with data in other recent preprints, Yang et al 2023 and Feng et al 2024). The authors also further characterized DNa02 with respect to its inputs and showed a functional connection with olfactory and thermosensory inputs as well as with the head-direction system. DNa01 is not characterized to this extent.

      Strengths:

      (1) In-vivo recordings and especially dual recordings are extremely challenging in Drosophila and provide a much higher resolution DN characterization than other recent studies that have relied on behavior or calcium imaging. Especially impressive are the simultaneous recordings from bilateral DNs (Figure 3). These bilateral recordings show clearly that DNa02 cells not only fire more during ipsilateral turning events but that they get inhibited during contralateral turns. In line with this observation, the difference between left and right DNa02 neuronal activity is a much better predictor of turning events compared to individual DNa02 activity.

      (2) Another technical feat in this work is driving local excitation in the head-direction neuronal ensemble (PEN-1 neurons), while simultaneously imaging its activity and performing whole-cell recordings from DNa02 (Figure 4). This impressive approach provided a way to causally relate changes in the head-direction system to DNa02 activity. Indeed, DNa02 activity could predict the rate at which an artificially triggered bump in the PEN-1 ring attractor returns to its previous stable point.

      (3) The authors also support the above observations with connectomics analysis and provide circuit motifs that can explain how the head direction system (as well as external olfactory/thermal stimuli) communicates with DNa02. All these results unequivocally put DNa02 as an essential DN in steering control, both during exploratory navigation as well as stimulus-directed turns.

      We are grateful for this detailed positive feedback.

      Weaknesses:

      (1) I understand that the first version of this preprint was already on bioRxiv in 2020, and some of the "weaknesses" I list are likely a reflection of the fact that I'm tasked to review this manuscript in late 2024 (more than 4 years later). But given this is a 2024 updated version it suffers from not laying out the results in contemporary terms. For instance, the manuscript lacks any reference to the DNp09 circuit implicated in object-directed turning and upstream to DNa02 even though the authors cite one of the papers where this was analyzed (Braun et al, 2024). More importantly, these studies (both Braun et al 2024 and Sapkal et al 2024) along with recent work from the authors' lab (Yang et al 2023) and other labs (Feng et al 2024) provide a view that the entire suite of leg kinematics changes required for turning are orchestrated by populations of heterogeneous interconnected DNs. Moreover, these studies also show that this DN-DN network has some degree of hierarchy with some DNs being upstream to other DNs. In this contemporary view of steering control, DNa02 (like DNg13 from Yang et al 2023) is a downstream DN that is recruited by hierarchically upstream DNs like DNa03, DNp09, etc. In this view, DNa02 is likely to be involved in most turning events, but by itself unable to drive all the motor outputs required for the said events. This reasoning could be used while discussing the lack of major phenotypes with DNa02 activation or inactivation observed in the current study, which is in stark contrast to strong phenotypes observed in the case of hierarchically upstream DNs like DNp09 or DNa03. In the section, "Contributions of single descending neuron types to steering behavior": the authors start off by asking if individual DNs can make measurable contributions to steering behavior. Once more, any citations to DNp09 or DNa03 - two DNs that are clearly shown to drive strong turning on activation (Bidaye et al, 2020, Feng et al 2024) - are lacking. Besides misleading the reader, such statements also digress the results away from contemporary knowledge in the field. I appreciate that the brief discussion in the section titled "Ensemble codes for steering" tries to cover these recent updates. However, I think this would serve a better purpose in the introduction and help guide the results.

      We apologize for these omissions of relevant citations, which we have now fixed. Specifically, in our revised Discussion, we now point out that:

      - Braun et al. (2024) reported that bilateral optogenetic activation of either DNa02 or DNa01 can drive turning (in either direction). 

      - Braun et al. (2024) also identified DNb02 as a steering-related DN.

      - Bidaye et al. (2020), Sapkal et al. (2024), and Braun et al. (2024) all contributed to the identification of DNp09 as a broadcaster DN with the capacity to promote ipsiversive turning.

      We have also revised the beginning of the Results section titled “Contributions of single descending neuron types to steering behavior”, as suggested by the Reviewer.

      Finally, we agree with the Reviewer’s overall point that steering is influenced by multiple DNs. We have not claimed that any DN is solely responsible for steering. As we note in the Discussion: “We found that optogenetically inhibiting DNa01 produced only small defects in steering, and inhibiting DNa02 did not produce statistically significant effects on steering; these results make sense if DNa02 is just one of many steering DNs.”

      (2) The second major weakness is the lack of any immunohistochemistry (IHC) images quantifying the expression of the genetic tools used in these studies. Even though the main split-Gal4 tools for DNa01 and DNa02 were previously reported by Namiki et al, 2018, it is important to document the expression with the effectors used in this work and explicitly mention the expression in any ectopic neurons. Similarly, for any experiments where drivers were combined together (double recordings, functional connectivity) or modified for stochastic expression (Figure 8), IHC images are absolutely necessary. Without this evidence, it is difficult to trust many of the results (especially in the case of behavioral experiments in Figure 8). For example, the DNa01 genetic driver used by the authors is also expressed in some neurons in the nerve cord (as shown on the Flylight webpage of Janelia Research Campus). One wonders if all or part of the results described in Figure 8 are due to DNa01 manipulation or manipulation of the nerve cord neurons. The same applies for optic lobe neurons in the DNa02 driver.

      This is a reasonable request. We used DN split-Gal4 lines to express three types of UAS-linked transgenes:

      (1) GFP

      In these flies, we know that expression in DNs is restricted to the DN types in question, based on published work (Namiki et al., 2018), as well as the fact that we see one labeled DN soma per hemisphere. When we label both cells with GFP, we use the spike waveform to identify DNa02 and DNa01, as described in Figure S1.

      (2) ReaChR

      In these flies, expression patterns were different in different flies because ReaChR expression was stochastically sparsened using hs-FLP. Expression was validated in each fly after the experiment, as described in the Methods (“Stochastic ReaChR expression”). hs-FLP-mediated sparsening will necessarily produce stochastic patterns of expression in both DNa02 and off-target cells, and this is true of all the flies in this experiment. What makes the “unilateral” flies distinct from the “bilateral” flies is that unilateral flies express ReaChR in one copy of DNa02, whereas bilateral flies express ReaChR in both copies of DNa02. On average, off-target expression will be the same in both groups.

      (3) GtACR1

      In these flies, we initially assumed that GtACR1 expression was the same as GFP expression under control of the same driver. However, we agree with the reviewer’s point that these two expression patterns are not necessarily identical. Therefore, to address the reviewer’s question, we performed immunofluorescence microscopy to characterize GtACR1 patterns in the brain and VNC of both genotypes. These expression patterns are now shown in a new supplemental figure (Figure S8). This figure shows that, as it happens, expression of GtACR1 is indeed indistinguishable from the GFP expression patterns for the same lines (archived on the FlyLight website). Both DN split-Gal4 lines are largely selective for the DNs in question, with limited off-target labeling. We have now drawn attention to this off-target labeling in the last paragraph of the Results, where the GtACR1 results are discussed.

      (3) The paper starts off with a comparative analysis of the roles of DNa01 and DNa02 during steering. Unfortunately, after this initial analysis, DNa01 is largely ignored for further characterization (e.g. with respect to inputs, connectomics, etc.), only to return in the final figure for behavioral characterization where DNa01 seems to have a stronger silencing phenotype compared to DNa02. I couldn't find an explanation for this imbalance in the characterization of DNa01 versus DNa02. Is this due to technical reasons? Or was it an informed decision due to some results? In addition to being a biased characterization, this also results in the manuscript lacking a coherent thread, which in turn makes it a bit inaccessible to the non-specialist.

      Yes, the first portion of the manuscript focuses on DNa01 and DNa02. The latter part of the manuscript transitions to focus mainly on DNa02. 

      Our rationale is noted at the point in the manuscript where we make this transition, with the section titled “Steering toward internal goals”: “Having identified steering-related DNs, we proceeded to investigate the brain circuits that provide input to these DNs. Here we decided to focus on DNa02, as this cell’s activity is predictive of larger steering maneuvers.” When we say that DNa02 is predictive of larger steering maneuvers, we are referring to several specific results:

      - We obtain larger filter amplitudes for DNa02 versus DNa01 (Fig. 2A-C). This means that, just after a unit change in DN firing rate, we see on average a larger change in steering velocity for DNa02 versus DNa01.

      - The linear filter for DNa02 has a higher variance explained, as compared to DNa01 (Fig. 2D). This means that DNa02 is more predictive of steering.

      - The relationship between firing rate and rotational velocity (150 ms later) is steeper for DNa02 than for DNa01 (Fig. 2G). This means that, if we ignore dynamics and we just regress firing rate against subsequent rotational velocity, we see a higher-gain relationship for DNa02.
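      To make the third point concrete, the following is a minimal sketch (not the analysis code used in the study) of a fixed-lag regression: the least-squares slope of rotational velocity at time t + lag against firing rate at time t. The function name and the choice to express the lag in samples are assumptions made for illustration; converting a 150 ms lag to samples depends on the sampling rate, which is not stated here.

```python
def lagged_slope(firing_rate, velocity, lag):
    """Least-squares slope of velocity[t + lag] regressed on firing_rate[t].

    `lag` is in samples (hypothetical convention); a steeper slope
    corresponds to a higher-gain relationship between rate and velocity.
    """
    if len(firing_rate) != len(velocity):
        raise ValueError("inputs must have the same length")
    if lag < 0 or lag >= len(firing_rate) - 1:
        raise ValueError("lag must leave at least two paired samples")

    # Pair each firing-rate sample with the velocity `lag` samples later.
    x = firing_rate[: len(firing_rate) - lag]
    y = velocity[lag:]

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("firing rate is constant; slope is undefined")
    return sxy / sxx
```

      Comparing the slope returned for two cell types at the same lag is one simple way to express the "steeper relationship" described above, ignoring dynamics as the text notes.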

      Our focus on DNa02 was also driven by connectivity considerations. In the same paragraph (the first paragraph in the section titled “Steering toward internal goals”), we note that “there are strong anatomical pathways from the central complex to DNa02”; the same is not true of DNa01. This point has also been noted by other investigators (Hulse et al. 2021).

      We don’t think this focus on DNa02 makes our work biased or inaccessible. Any study must balance breadth with depth. A useful general way to balance these constraints is to begin a study with a somewhat broader scope, and then narrow the study’s focus to obtain more in-depth information. Here, we began with a comparative study of two cell types, and we progressed to the cell type that we found more compelling.

      (4) There seems to be a discrepancy with regard to what is emphasized in the main text and what is shown in Figures S3/S4 in relation to the role of these DNs in backward walking. There are only two sentences in the main text where these figures are cited.

      a) "DNa01 and DNa02 firing rate increases were not consistently followed by large changes in forward velocity (Figs. 1G and S3)."

      b) "We found that rotational velocity was consistently related to the difference in right-left firing rates (Fig. 3B). This relationship was essentially linear through its entire dynamic range, and was consistent across paired recordings (Fig. 3C). It was also consistent during backward walking, as well as forward walking (Fig. S4)." These main text sentences imply the role of the difference between left and right DNa02 in turning. However, the actual plots in the Figures S3 and S4 and their respective legends seem to imply a role in "backward walking". For instance, see this sentence from the legend of Figure S3 "When (ΔvoltageDNa02>>ΔvoltageDNa01), the fly is typically moving backward. When (firing rateDNa02>>firing rateDNa01), the fly is also often moving backward, but forward movement is still more common overall, and so the net effect is that forward velocity is small but still positive when (firing rateDNa02>>firing rateDNa01). Note that when we condition our analysis on behavior rather than neural activity, we do see that backward walking is associated with a large firing rate differential (Fig. S4)." This sort of discrepancy in what is emphasized in the text, versus what is emphasized in the figures, ends up confusing the reader. More importantly, I do not agree with any of these conclusions regarding the implication of backward walking. Both Figures S3 and S4 are riddled with caveats, misinterpretations, and small sample sizes. As a result, I actually support the authors' decision to not infer too much from these figures in the "main text". In fact, I would recommend going one step further and removing/modifying these figures to focus on the role of "rotational velocity". Please find my concerns about these two figures below:

      a) In Figures S3 and S4, every heat map has a different scale for the same parameter: forward velocity. S3A is -10 to +10 mm/s, S3B is -6 to +6, S4B (left) is -12 to +12, and S4B (right) is -4 to +4. Since the authors are trying to depict results based on the color-coding this is highly problematic.

      b) Figure S3A legend "When (ΔvoltageDNa02>>ΔvoltageDNa01), the fly is typically moving backward." There are also several instances when ΔvoltageDNa02= ΔvoltageDNa01 and both are low (lower left quadrant) when the fly is typically moving backwards. So in my opinion, this figure in fact suggests DNa02 has no role in backward velocity control.

      c) Based on the example traces in S4A, every time the fly walks backwards it is also turning. Based on this it is important to show absolute rotational velocity in Figure S4C. It could be that the fly is turning around the backward peak which would change the interpretation from Figure S4C. Also, it is important to note that the backward velocities in S4A are unprecedentedly high. No previous reports show flies walking backwards at such high velocities (for example see Chen et al 2018, Nat Comm. for backward walking velocities on a similar setup).

      d) In my opinion, Figure S4D showing that right-left DNa02 correlates with rotational velocity, regardless of whether the fly is in a forward or backward walking state, is the only important and conclusive result in Figures S3/S4. These figures should be rearranged to only emphasize this panel.

      We agree that it is difficult to interpret some of the correlations between DN activity and forward velocity, given that forward velocity and rotational velocity are themselves correlated to some degree. This is why we did not make claims based on these results in the main text. In response to these comments, we have taken the Reviewer’s suggestion to preserve Figure S4D (now Figure S3). The other components of these supplemental figures have been removed.

      (5) Figure 3 shows a really nice analysis of the bilateral DNa02 recordings data. While Figure S5 [now Figure S4] shows that authors have a similar dataset for DNa01, a similar level analysis (Figures 3D, E) is not done for DNa01 data. Is there a reason why this is not done?

      The reason we did not do the same analysis for DNa01 is that we only have two paired DNa01-DNa01 recordings. It turned out to be substantially more difficult to perform DNa01-DNa01 recordings, as compared to DNa02-DNa02 recordings. For this reason, we were not able to get more than two of these recordings.

      (6) In Figure 4 since the authors have trials where bump-jump led to turning in the opposite direction to the DNa02 being recorded, I wonder if the authors could quantify hyperpolarization in DNa02 as is predicted from connectomics data in Figure 7.

      We agree this is an interesting question. However, DNa02 firing rate and membrane potential are variable, and stimulus-evoked hyperpolarizations in these DNs tend to be relatively small (on the order of 1 mV, in the case of a contralateral fictive olfactory stimulus, Figure 5A). In the case of our fictive olfactory stimuli, we could look carefully for these hyperpolarizations because we had a very large number of trials, and we could align these trials precisely to stimulus onset. By contrast, for the bump-jump experiments, we have a more limited number of trials, and turning onset is not so tightly time-locked to the chemogenetic stimuli; for these reasons, we are hesitant to make claims about any bump-jump-related hyperpolarization in these trials.

      (7) Figure 6 suggests that DNa02 contains information about latent steering drives. This is really interesting. However, in order to unequivocally claim this, a higher-resolution postural analysis might be needed. Especially given that DNa02 activation does not reliably evoke ipsilateral turning, these "latent" steering events could actually contain significant postural changes driven by DNa02 (making them "not latent"). Without this information, at least the authors need to explicitly mention this caveat.

      This is a good point. We cannot exclude the possibility that DNa02 is driving postural changes when the fly is stopped, and these postural changes are so small we cannot detect them. In this case, however, there would still be an interesting mismatch between the stimulus-evoked change in DNa02 firing rate (which is large) and the stimulus-evoked postural response (which would be very small). We have added language to the relevant Results section in order to make this explicit.

      (8) Figure 7 would really benefit from connectome data with synapse numbers (or weighted arrows) and a corresponding analysis of DNa01.

      In response to this comment, we have added synapse number information (represented by weighted arrows) to Figures 7C, E, and F. We also added information to the Methods to explain how cells were chosen for inclusion in this diagram. (In brief: we thresholded these connections so as to discard connections with small numbers of synapses.)

      We did perform an analogous connectome circuit analysis for DNa01, but if we use the same thresholds as we do for DNa02, we obtain a much sparser connectivity graph. We now show this in a new supplemental figure (Figure S9). MBON32 makes no monosynaptic connections onto DNa01, and it only forms one disynaptic connection, via LAL018, which is relatively weak. PFL3 and PFL2 make no mono- or disynaptic connections onto DNa01 comparable in strength to what we find for DNa02. 

      The sparser connectivity graph for DNa01 is partly due to the fact that fewer cell types converge onto DNa01 as compared to DNa02 (110 cell types, versus 287 cell types). Also, it seems that DNa01 is simply less closely connected to the central complex and mushroom body, as compared to DNa02.

      (9) In Figure 8E, the most obvious neuronal silencing phenotype is decreased sideways velocity in the case of DNa01 optogenetic silencing. In Figure S2, the inverse filter for sideways velocity for DNa01 had a higher amplitude than the rotational velocity filter. Taken together, does this point at some role for DNa01 in sideways velocity specifically?

      No. The forward filters describe the average velocity impulse response, given a brief step change in firing rate.

      Figure 1 and Figure S2 show that the sideways velocity forward filter is actually smaller for DNa01 than for DNa02. This means that a brief step change in DNa01 firing rate is followed by only a very small sideways velocity response. Conversely, the reverse filters describe the average firing rate impulse response, given a brief step change in sideways velocity. Figure S2 shows that the sideways velocity reverse filter is larger for DNa01 than for DNa02, but this means that the relationship between DNa01 activity and sideways velocity is so weak that we would need to see a very large neural response in order to get a brief step change in sideways velocity. In other words, the reverse filter says that DNa01 likely has very little role in determining sideways velocity.

      (10) In Figure 8G, the effect on inner hind leg stance prolongation is very weak, and given the huge sample size, hard to interpret. Also, it is not clear how this fits with the role of DNa01 in slow sustained turning based on recordings.

      Yes, this effect is small in magnitude, which is not too surprising, given that many DNs seem to be involved in the control of steering in walking. To clarify the interpretation of these phenotypes, we have added a paragraph to the end of the Results:

      “All these effects are weak, and so they should be interpreted with caution. Also, both DN split-Gal4 lines drive expression in a few off-target cell types, which is another reason for caution (Fig. S8). However, they suggest that both DNs can lengthen the stance phase of the ipsilateral back leg, which would cause ipsiversive turning. These results are also compatible with a scenario where both DNs decrease the step length in the ipsilateral legs, which would also cause ipsiversive turning. Step frequency does not normally change asymmetrically during turning, so the observed decrease in step frequency during optogenetic inhibition may just be a by-product of increasing step length when these DNs are inhibited.” We have also added caveats and clarifications in a new Discussion paragraph:

      “Our study does not fully answer the question of how these DNs affect leg kinematics, because we were not able to simultaneously measure DN activity and leg movement. However, our optogenetic experiments suggest that both DNs can lengthen the stance phase of the ipsilateral back leg (Fig. 8G), and/or decrease the step length in the ipsilateral legs (Fig. 8H), either of which would cause ipsiversive turning. If these DNs have similar qualitative effects on leg kinematics, then why does DNa02 precede larger and more rapid steering events? This may be due to the fact that DNa02 receives stronger and more direct input from key steering circuits in the brain (Fig. S9). It may also relate to the fact that DNa02 has more direct connections onto motor neurons (Fig. 1B).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I found the sign conventions for rotational velocity particularly confusing. Figure 3 represents clockwise rotations as +ve values, but Figure 4H represents anticlockwise rotations as positive values. But for EPG bumps, anticlockwise rotations are given negative values. Please make them consistent unless I am missing something obvious.

      Different fields use different conventions for yaw velocity. In aeronautics, a clockwise turn is generally positive. In robotics and engineering of terrestrial vehicles, a counterclockwise turn is generally positive. Historically, most Drosophila studies that quantified rotational (yaw) velocity were focused on the behavior of flying flies, and these studies generally used the convention from aeronautics, where a clockwise turn is defined as a positive turn. When we began working in the field, we adopted this convention, in order to conform to previous literature. It might be argued that walking flies are more like robots than airplanes, but it seemed to us that it was confusing to have different conventions for different behaviors of the same animal. Thus, all of the published studies from our lab define clockwise rotation as having positive rotational velocity.

      Figure 4 focuses on the role of the central complex in steering. As the fly turns clockwise (rightward), the bump of activity in EPG neurons normally moves counterclockwise around the ellipsoid body, as viewed from the posterior side (Turner-Evans et al., 2017). The posterior view is the conventional way to represent these dynamics, because (1) we and others typically image the brain from the posterior side, not the anterior side, and (2) in a posterior view, the animal’s left is on the left side of the image, and vice versa. We have added a sentence to the Figure 4A legend to clarify these points.

      Previous work has shown that, when an experimenter artificially “jumps” the EPG bump, this causes the fly to make a compensatory turn that returns the bump to (approximately) its original location (Green et al., 2019). Our work supports this observation. Specifically, we find that clockwise bump jumps are generally followed by rightward turns (which drive the bump to return to its approximate original location via a counterclockwise path), and vice versa. This is noted in the Figure 4D legend. Note that Figure 4D plots the fly’s rotational velocity during the bump return, plotted against the initial bump jump. 

      Figure 4H shows that clockwise (blue) bump returns were typically preceded by leftward turning, and counterclockwise (green) bump returns were preceded by rightward turning, as expected. This is detailed in the Figure 4H legend, and it is consistent with the coordinate frame described above.

      (2) It would be helpful to have images of the DNa01 and DNa02 split lines used in this paper, considering this paper would most likely be used widely to describe the functions of these neurons. Similarly, images of their reconstructions would be a useful addition.

      High-quality three-dimensional confocal stacks of all the driver lines used in our study are publicly available. We have added this information to the Methods (under “Fly husbandry and genotypes”). Confocal images of the full morphologies of DNa01 and DNa02 have been previously published (Namiki et al., 2018). Figure 1A is a schematic that is intended to provide a quick visual summary of this information.

      EM reconstructions of DNa01 and DNa02 are publicly accessible in a whole-brain dataset (https://codex.flywire.ai/) and a whole-VNC dataset (https://neuprint.janelia.org/). Both datasets are referenced in our study. As these datasets are easy to search and browse via user-friendly web-based tools, we expect that interested readers will have no difficulty accessing the underlying datasets directly.

      Reviewer #2 (Recommendations for the authors):

      (1) The description of the activity of the DNs that they "PREDICT steering during walking". This is an interesting word choice. Not causes, not correlates with, not encodes... does that mean the activity always precedes the action? Does that mean when you see activity, you will get behavior? This is important for assessing whether the DN activity is a cause or an effect. It is good to be cautious but it might be worth expanding on exactly what kind of connection is implied to justify the use of the word 'predict'.

      Conventionally, “predict” means “to indicate in advance”. We write that DNs “predict” certain features of behavior. We use this term because (1) these DNs correlate with certain features of behavior, and (2) changes in DN activity precede changes in behavior.

      The notion that neurons can “predict” behavior is not original to our study. Whenever neuroscientists summarize the relationship between neural activity and behavior by fitting a mathematical model (which may be as simple as a linear regression), the fitted model can be said to represent a “prediction” of behavior. These models are evaluated by comparing their predictions with measured behaviors. A good model is predictive, and a good model also implies that the underlying neural signal is itself predictive (Levenstein et al., 2023 Journal of Neuroscience 43: 1074-1088; DOI: 10.1523/JNEUROSCI.1179-22.2022). In this usage, prediction simply means correlation, without necessarily implying causation. We use “prediction” in the same sense.

      We do not think the term “prediction” implies determinism. Meteorologists are said to predict the weather, but it is understood that their predictions are probabilistic, not deterministic. Certainly, we would not claim that there is a deterministic relationship between DN activity and behavior. Figure 2D shows that neither DN type can explain all the variance in the fly’s rotational or sideways velocity. At the same time, both DNs have significant predictive power.

      We might equally say that these DNs “encode” behavior. We have chosen to use the word “predict” rather than “encode” because we do not think it is necessary to use the framework of symbolic communication in connection with these DNs.

      We agree with the Reviewer that it is helpful to test whether any neuron that “predicts” a behavior might also “cause” this behavior. In Figure 8, we show that directly perturbing these DNs can indeed alter locomotor behavior, which suggests a causal role. Connectome analyses also suggest a causal role for these DNs in locomotor behavior (Figure 1B; see especially Cheong et al., 2024).

      At the same time, it is clear from our results that these DNs are not “command neurons” for turning: they do not deterministically cause turning. Therefore, to avoid misunderstanding, we have generally been careful to summarize the results of our perturbation experiments by avoiding the statement that “this DN causes this behavior”. Rather, we have generally tried to say that “this DN influences this behavior”, or “this DN promotes this behavior”.

      (2) There is some concern about how the linear filter models were developed and then used to predict the relationship between firing rate and steering behavior: how exactly were the build and test data separated to avoid re-extracting the input? It reads like a self-fulfilling prophecy/tautology.

      We used conventional cross-validation for model fitting and evaluation. We apologize that this was not made explicit in our original submission; this was due to an oversight on our part. To be clear: linear filters were computed using the data from the first 20% of a given experiment. We then convolved each cell’s firing rate estimate with the computed Neuron→Behavior filter (the “forward filter”) using the data from the final 80% of the experiment, in order to generate behavioral predictions. Thus, when a model has high variance explained, this is not attributable to overfitting: rather, it quantifies the bona fide predictive power of the model. We have added this information to the Methods (under “Data analysis - Linear filter analysis”).
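      The fit-then-test procedure described above can be sketched as follows. This is an illustrative sketch only, not the code used in the study: it assumes the filter is estimated by ordinary least squares on lagged firing-rate samples with no intercept term, and all function names, the `n_lags` parameter, and the `fit_fraction` default are hypothetical. The actual filter estimation method (for example, any smoothing or frequency-domain steps) may differ.

```python
def _solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def fit_filter(rate, behavior, n_lags):
    """Least-squares filter w such that behavior[t] ~ sum_k w[k] * rate[t - k]."""
    rows = [[rate[t - k] for k in range(n_lags)]
            for t in range(n_lags - 1, len(rate))]
    target = behavior[n_lags - 1:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n_lags)]
           for i in range(n_lags)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(n_lags)]
    return _solve(xtx, xty)


def predict(rate, w):
    """Convolve the firing rate with the filter (valid samples only)."""
    n = len(w)
    return [sum(w[k] * rate[t - k] for k in range(n))
            for t in range(n - 1, len(rate))]


def variance_explained(measured, predicted):
    mean = sum(measured) / len(measured)
    ss_res = sum((a - b) ** 2 for a, b in zip(measured, predicted))
    ss_tot = sum((a - mean) ** 2 for a in measured)
    return 1.0 - ss_res / ss_tot


def held_out_variance_explained(rate, behavior, n_lags, fit_fraction=0.2):
    """Fit on the first `fit_fraction` of the data, evaluate on the rest."""
    split = int(len(rate) * fit_fraction)
    w = fit_filter(rate[:split], behavior[:split], n_lags)
    predicted = predict(rate[split:], w)
    measured = behavior[split:][n_lags - 1:]
    return w, variance_explained(measured, predicted)
```

      Because the filter never sees the evaluation segment, a high value returned by `held_out_variance_explained` cannot be produced by re-extracting the input, which is the concern raised in the comment above.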

      (3) Typo right above Figure 2 [now Figure 1E]: I assume spike rate fluctuations in DNa02 precede DNa01?

      Fixed. Thank you for reading the manuscript carefully.

      (4) The description of the other manuscripts about neural control of the steering as "follow-up" papers is a bit diminishing. They were likely independent works on a similar theme that happened afterwards, rather than deliberate extensions of this paper, so "subsequent" might be a more accurate description.

      We apologize, as we did not intend this to be diminishing. Given this request, we have revised “follow-up” to “subsequent”.

      (5) The idea that DNa02 is high-gain because it is more directly connected to motor neurons is a hypothesis and this should be made clear. We really don't know the functional consequences of the directness of a path or the number of synapses, and which circuits you compare to would change this. DNa02 may be a higher gain than DNa01, but what about relative to the other DNs that enter pre-motor regions? How do you handle a few synapses and several neurons in a common class? All of these connectivity-based deductions await functional tests - like yours! I think it is better to make this clear so readers don't assume a higher level of certainty than we have.

      The Reviewer asks how we handled few-synapse connections, and how we combined neurons in the same class. We apologize for not making this explicit in our original submission. We have now added this information to the Methods. Briefly, to select cell types for inclusion in Figure 7C, we identified all individual cells postsynaptic to PFL3 and presynaptic to DNa02, discarding any unitary connections with <5 synapses. We then grouped unitary connections by cell type, and then summed all synapse numbers within each connection group (e.g., summing all synapses in all PFL3→LAL126 connections). We then discarded connection groups having <200 synapses or <1% of a cell type’s pre- or postsynaptic total. Reported connection weights are per hemisphere, i.e. half of the total within each connection group. For Figure 7F we did the same, but now discarding connection groups having <70 synapses or <0.4% of a cell type’s pre- or postsynaptic total. In Figure S9, we used the same procedures for analyzing connections onto DNa01.
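
      The thresholding steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis code used for the figures: the function name, the input layout, the synapse counts, the per-type totals, and the cell type `TypeX` are hypothetical, and the sketch assumes a connection group is discarded if it falls below the fractional cutoff for either the presynaptic type's output total or the postsynaptic type's input total.

```python
from collections import defaultdict

def filter_connections(unitary, out_totals, in_totals,
                       min_unitary=5, min_group=200, min_frac=0.01):
    """Apply unitary, group-size, and fractional thresholds to connection data.

    unitary    : iterable of (pre_type, post_type, n_synapses), one per cell pair
    out_totals : total output synapses per presynaptic cell type (hypothetical input)
    in_totals  : total input synapses per postsynaptic cell type (hypothetical input)
    Returns per-hemisphere weights (half of each surviving group total).
    """
    groups = defaultdict(int)
    for pre_type, post_type, n_syn in unitary:
        if n_syn >= min_unitary:               # drop weak unitary connections
            groups[(pre_type, post_type)] += n_syn

    kept = {}
    for (pre_type, post_type), total in groups.items():
        if total < min_group:                  # absolute group threshold
            continue
        if total < min_frac * out_totals[pre_type]:   # fraction of pre type's output
            continue
        if total < min_frac * in_totals[post_type]:   # fraction of post type's input
            continue
        kept[(pre_type, post_type)] = total / 2       # report per hemisphere
    return kept

# Made-up counts for illustration only.
unitary = [
    ("PFL3", "LAL126", 120), ("PFL3", "LAL126", 110), ("PFL3", "LAL126", 4),
    ("PFL3", "TypeX", 30),
    ("LAL126", "DNa02", 150), ("LAL126", "DNa02", 90),
]
out_totals = {"PFL3": 5000, "LAL126": 3000}
in_totals = {"LAL126": 4000, "TypeX": 1000, "DNa02": 10000}
kept = filter_connections(unitary, out_totals, in_totals)
```

      Passing `min_group=70, min_frac=0.004` would correspond to the looser cutoffs quoted for Figure 7F.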

      We agree that it is tricky to infer function from connectome data, and this applies to motor neuron connectivity. We bring up DN connectivity onto motor neurons in two places. First, in the Results, we note that “steering filters (i.e., rotational and sideways velocity filters) were larger for DNa02 (Fig. 2A,B). This means that an impulse change in firing rate predicts a larger change in steering for this neuron. In other words, this result suggests that DNa02 operates with higher gain. This may be related to the fact that DNa02 makes more direct output synapses onto motor neurons (Fig. 1B) [emphasis added].” We feel this is a relatively conservative statement.

      Subsequently, in the Discussion, we ask, “why does DNa02 precede larger and more rapid steering events? This may be due to the fact that DNa02 receives stronger and more direct input from key steering circuits in the brain (Fig. S9). It may also relate to the fact that DNa02 has more direct connections onto motor neurons (Fig. 1B) [emphasis added].” Again, we feel this is a relatively conservative statement.

      To be sure, none of the motor neurons postsynaptic to DNa02 actually receive most of their synaptic input from DNa02 (or indeed any DN), and this is typical of motor neurons controlling leg muscles. Instead, leg motor neurons tend to get most of their input from interneurons rather than DNs (Cheong et al. 2024). Available data suggest that the walking rhythm originates with intrinsic VNC central pattern generators, and the DNs that influence walking do so, in large part, by acting on VNC interneurons. These points have been detailed in recent connectome analyses (see especially Cheong et al. 2024).

      We are reluctant to broaden the scope of our connectome analyses to include other DNs for comparison, because we think these analyses are most appropriate to full-central-nervous-system-(CNS)-connectomes (brain and VNC together), which are currently under construction. Without a full-CNS-connectome, many of the DN axons in the VNC cannot be identified. In the future, we expect that full-CNS-connectomes will allow a systematic comparison of the input and output connectivity of all DN types, and probably also the tentative identification of new steering DNs. Those future analyses should generate new hypotheses about the specializations of DNa02, DNa01, and other DNs. Our study aims to help lay a conceptual foundation for that future work.

      (6) Given the emphasis on the DNa02 to Motor Neuron connectivity shown (Figure 1B) and multiple text mentions, could you include more analyses of which motor neurons are downstream and how these might be expected to affect leg movements? I would like to see the synapse numbers (Figure 1B) as well as the fraction of total output synapses. These additions would help understand the evidence for the "see-saw" model.

      We agree this is interesting. In follow-up work from our lab (Yang et al., 2023), we describe the detailed VNC connectivity linking DNa02 to motor neurons. We refer the Reviewer specifically to Figure 7 of that study (https://www.cell.com/cell/fulltext/S0092-8674(24)00962-0).

      We regret that the see-saw model was perhaps not clear in our original submission. Briefly, this model proposes that an increase in excitatory synaptic input to one DN (and/or a disinhibition of that DN) is often accompanied by an increase in inhibitory synaptic input to the contralateral DN. This model is motivated by connectome data on the brain inputs to DNa02 (Figure 7), along with our observation that excitation of one DN is often accompanied by inhibition of the contralateral DN (Figure 5). We have now added text to the Results in several places in order to clarify these points. 

      Because this model specifically pertains to the brain inputs to DNs, comparing the downstream targets of these DNs in the VNC would not be a test of this hypothesis. The Reviewer may be asking to see whether there is any connectivity in the brain from one DN to its contralateral partner. We do not find connections of this sort, aside from multisynaptic connections that rely on very weak links (~10 synapses per connection). Figure 7 depicts a much stronger basis for this hypothesis, involving feedforward see-saw connections from PFL3 and MBON32.

      (7) The conclusions from the data in Figure 8 could be explained more clearly. These seem like small effect sizes on subtle differences in leg movements - maybe like what was seen in granular control by Moonwalker's circuits? Measuring joint angles or step parameters might help clarify, but a summary description would help the reader.

      We agree that these results were not explained very well in our original submission. 

      In our revised manuscript, we have added a new paragraph to the end of this Results section providing some summary and interpretation:

      “All these effects are weak, and so they should be interpreted with caution. However, they suggest that both DNs can lengthen the stance phase of the ipsilateral back leg, which would promote ipsiversive turning. These results are also compatible with a scenario where both DNs decrease the step length in the ipsilateral legs, which would also promote ipsiversive turning. Step frequency does not normally change asymmetrically during turning, so the observed decrease in step frequency during optogenetic inhibition may just be a by-product of increasing step length when these DNs are inhibited.”

      Moreover, in the Discussion, we have also added a new paragraph that synthesizes these results with other results in our study, while also noting the limitations of our study:

      “Our study does not fully answer the question of how these DNs affect leg kinematics, because we were not able to simultaneously measure DN activity and leg movement. However, our optogenetic experiments suggest that both DNs can lengthen the stance phase of the ipsilateral back leg (Fig. 8G), and/or decrease the step length in the ipsilateral legs (Fig. 8H), either of which would promote ipsiversive turning. If these DNs have similar qualitative effects on leg kinematics, then why does DNa02 precede larger and more rapid steering events? This may be due to the fact that DNa02 receives stronger and more direct input from key steering circuits in the brain (Fig. S9). It may also relate to the fact that DNa02 has more direct connections onto motor neurons (Fig. 1B).”

      In Figure 8D-H, we measure step parameters in freely walking flies during acute optogenetic inhibition of DNa01 and DNa02. In experiments measuring neural activity in flies walking on a spherical treadmill, we did not have a way to measure step parameters. Subsequently, this methodology was developed by Yang et al. (2023) and results for DNa02 are described in that study. 

      Reviewer #3 (Recommendations for the authors):

      Minor Points:

      (1) If space allows, actual membrane potential should be mentioned when raw recordings are shown (for example Figure 1D).

      We have now added absolute membrane potential information to Figure 1D.

      (2) Typo in the sentence "To address this issue directly, we looked closely at the timing of each cell's recruitment in our dual recordings, and found that spike rate fluctuations in DNa02 typically preceded the spike rate fluctuations in DNa02 (Fig. 2A)." The final word should be "DNa01".

      Fixed. Thank you for reading the manuscript carefully.

      (3) Figure 2A - although there aren't direct connections between a01 and a02 in the connectome, the authors never rule out functional connectivity between these two. Given a02 precedes a01, shouldn't this be addressed?

      In the full brain FAFB data set, there are two disynaptic connections from DNa02 onto the ipsilateral copy of DNa01. One connection is via CB0556 (which is GABAergic), and the other is via LAL018 (which is cholinergic). The relevant DNa02 output connections are very weak: each DNa02→CB0556 connection consists of 11 synapses, whereas each DNa02→LAL018 connection consists of 10 synapses (on average). Conversely, each CB0556→DNa01 connection consists of 29 synapses, whereas each LAL018→DNa01 connection consists of 64 synapses. In short, LAL018 is a nontrivial source of excitatory input to DNa01, but DNa02 is not positioned to exert much influence over LAL018, and the two disynaptic connections from DNa02 onto DNa01 also have the opposite sign. Thus, it seems unlikely that DNa02 is a major driver of DNa01 activity. At the same time, it is difficult to completely exclude this possibility, because we do not understand the logic of the very complicated premotor inputs to these DNs in the brain. Thus, we are hesitant to make a strong statement on this point.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Cognitive and brain development during the first two years of life is vast and determinant for later development. However, longitudinal infant studies are complicated and restricted to occidental high-income countries. This study uses fNIRS to investigate the developmental trajectories of functional connectivity networks in infants from a rural community in Gambia. In addition to resting-state data collected from 5 to 24 months, the authors collected growth measures from birth until 24 months and administered an executive functioning task at 3 or 5 years old.

      The results show left and right frontal-middle and right frontal-posterior negative connections at 5 months that increase with age (i.e., become less negative). Interestingly, contrary to previous findings in high-income countries, there was a decrease in frontal interhemispheric connectivity. Restricted growth during the first months of life was associated with stronger frontal interhemispheric connectivity and weaker right frontal-posterior connectivity at 24 months. Additionally, the study describes that some connectivity patterns related to better cognitive flexibility at pre-school age.

      Strengths:

      - The authors analyze data from 204 infants from a rural area of Gambia, already a big sample for most infant studies. The study might encourage more research on different underrepresented infant populations (i.e., infants not living in occidental high-income countries).

      - The study shows that fNIRS is a feasible instrument to investigate cognitive development when access to fMRI is not possible or outside a lab setting.

      - The fNIRS data preprocessing and analysis are well-planned, implemented, and carefully described. For example, the authors report how the choices in the parameters for the motion artifacts detection algorithm affect data rejection and show how connectivity stability varies with the length of the data segment to justify the threshold of at least 250 seconds free of artifacts for inclusion.

      - The authors use proper statistical methods for analysis, considering the complexity of the dataset.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      - No co-registration of the optodes is implemented. The authors checked for correct placement by looking at pictures taken during the testing session. However, head shape and size differences might affect the results, especially considering that the study involves infants from 5 months to 24 months and that the same fNIRS array was used at all ages.

      The fNIRS array used in this work was co-registered onto age-appropriate MNI templates at every time point in previously published work: L. H. Collins-Jones, et al., Longitudinal infant fNIRS channel-space analyses are robust to variability parameters at the group-level: An image reconstruction investigation. Neuroimage 237, 118068 (2021). This is reference No. 68 in the manuscript.

      As we mentioned in the section fNIRS preprocessing and data-analysis: ‘The sections were established via the 17 channels of each hemisphere which were grouped into front, middle and back (for a total of six regions) based on a previous co-registration of the BRIGHT fNIRS arrays onto age-appropriate templates’. The procedure mentioned by the reviewer, involving the examination of pictures showing the placement of headbands on participants, aimed to exclude infants with excessive cap displacement from further analysis.

      - The authors regress the global signal to remove systemic physiological noise. While the authors also report the changes in connectivity without global signal regression, there are some critical differences. In particular, the apparent decrease in frontal inter-hemispheric connections is not present when global signal regression is omitted, even though it is present for deoxy-Hb. The authors use connectivity results obtained after applying global signal regression for further analysis. The choice of regressing the global signal is questionable since it has been shown to introduce anti-correlations in fMRI data (Murphy et al., 2009), and fNIRS in young infants does not seem to be highly affected by physiological noise (Emberson et al., 2016). Systemic physiological noise might change at different ages, which makes its removal critical to investigate functional network development. However, global signal regression might also affect the data differently. The study would have benefited from having short separation channels to measure the systemic physiological component in the data.

      The work of Emberson et al. (2016) mentioned by the reviewer indeed highlights the challenges of removing systemic changes from the infants’ haemodynamic signal with short-channel separation (SSC). In fact, even an SSC of 1 cm detected changes in the blood in the brain; therefore, by regressing this signal from the recorded one, the authors removed both systemic changes AND haemodynamic signal. This paper from Emberson et al. (2016) is taken as a reference in the field to suggest that SSC might not be an ideal tool to remove systemic changes when collecting fNIRS data on young infants, as we did in this work.

      We agree with the reviewer's observation that systemic physiological noise may vary with age and among infants. Therefore, for each infant at each age, we regressed the mean value calculated across all channels. This ensures that the regressed signal is not biased by averaged calculations at group levels.
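
      For readers unfamiliar with the procedure, per-session global signal regression can be sketched as below. This is an illustrative sketch rather than the study's pipeline: it assumes ordinary least squares with an intercept, and the function name, the array layout (samples by channels), the 34-channel count (17 per hemisphere, as described above), and the synthetic data are assumptions made for the example.

```python
import numpy as np

def regress_global_signal(data):
    """Remove the across-channel mean time course from every channel.

    data : array of shape (n_samples, n_channels) for one infant at one visit.
    Returns the residuals of an ordinary least-squares fit of each channel
    on [intercept, global signal].
    """
    g = data.mean(axis=1)                        # global signal for this session
    G = np.column_stack([np.ones_like(g), g])    # design matrix with intercept
    beta, *_ = np.linalg.lstsq(G, data, rcond=None)
    return data - G @ beta

# Synthetic session (made-up data): a shared systemic component plus channel noise.
rng = np.random.default_rng(1)
systemic = rng.standard_normal((500, 1))
session = systemic + 0.5 * rng.standard_normal((500, 34))
cleaned = regress_global_signal(session)
fc = np.corrcoef(cleaned.T)                      # channel-by-channel correlation matrix
```

      Because the regressor is computed from each session's own channels, nothing in this sketch depends on group-level averages. Note that the residuals sum to zero across channels at every time point, which is the property behind the concern that global signal regression can shift correlations toward negative values.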

      We are aware of the criticisms directed towards global signal regression in the fMRI literature, although other work has shown anticorrelations in functional connectivity networks both with and without global signal regression (Chai et al., 2012). Furthermore, Murphy himself revised his criticism of the use of global signal regression in functional connectivity analysis in one of his more recent works (Murphy & Fox, 2017). The fact that the decreased FC is significant in results from data pre-processed without global signal regression gives us confidence that this finding is statistically robust and not solely driven by this preprocessing choice in our pipeline.

      An interesting study by Abdalmalak et al. (2022) demonstrated that failing to correct for systemic changes using any method is inappropriate when estimating FC with fNIRS, as it can lead to a high risk of elevated connectivity across the whole brain (see Figure 4 of the mentioned paper). Consequently, we strongly advocate for the implementation of global signal regression in our analysis pipeline as a fundamental step for accurate functional connectivity estimations.

      References:

      Emberson, L. L., Crosswhite, S. L., Goodwin, J. R., Berger, A. J., & Aslin, R. N. (2016). Isolating the effects of surface vasculature in infant neuroimaging using short-distance optical channels: a combination of local and global effects. Neurophotonics, 3(3), 031406-031406.

      Chai, X. J., Castañón, A. N., Öngür, D., & Whitfield-Gabrieli, S. (2012). Anticorrelations in resting state networks without global signal regression. NeuroImage, 59(2), 1420–1428. https://doi.org/10.1515/9783050076010-014

      Murphy, K., & Fox, M. D. (2017). Towards a consensus regarding global signal regression for resting state functional connectivity MRI. NeuroImage, 154(November 2016), 169–173. https://doi.org/10.1016/j.neuroimage.2016.11.052

      Abdalmalak, A., Novi, S. L., Kazazian, K., Norton, L., Benaglia, T., Slessarev, M., ... & Owen, A. M. (2022). Effects of systemic physiology on mapping resting-state networks using functional near-infrared spectroscopy. Frontiers in neuroscience, 16, 803297.

      - I believe the authors bypass a fundamental point in their framing. When discussing the results, the authors compare the developmental trajectories of the infants tested in a rural area of Gambia with the trajectories reported in previous studies on infants growing in occidental high-income countries (likely in urban contexts) and attribute the differences to adverse effects (i.e., nutritional deficits). Differences in developmental trajectories might also derive from other environmental and cultural differences that do not necessarily lead to poor cognitive development.

      We agree with the reviewer that other factors differing between high- and low-resource settings might have an impact on FC trajectories. We therefore specified this in the discussion as follows: “We acknowledge that differences in FC could also be attributed to other environmental and cultural disparities between high-resource and low-resource settings, and future studies are needed to investigate this further” (line 238).

      - While the study provides a solid description of the functional connectivity changes in the first two years of life at the group level, the evidence regarding the links between adverse situations, developmental trajectories, and later cognitive capacities is weaker. The authors find that early restricted growth predicts specific connectivity patterns at 24 months and that certain connectivity patterns at specific ages predict cognitive flexibility. However, the link between development trajectories (individual changes in connectivity) with growth and later cognitive capacities is missing. To address this question adequately, the study should have compared infants with different growing profiles or those who suffered or did not from undernutrition. However, as the authors discussed, they lacked statistical power.

      We agree with the reviewer, and indeed we highlighted this as one of the main limitations of our work: “Even given the large sample in our study, we were underpowered to test for group comparisons between sets of infants with distinct undernutrition growth profiles, e.g., infants with early poor growth that later resolved and infants with standard growth early that had a poor growth later. We were also underpowered to test the associations between early growth and FC on clinically undernourished infants (defined as having DWLZ two standard deviations below the mean)” (line 311, discussion section).

      We believe this is an important point to consider for the field, as it addresses the sample size required for studies investigating brain development in clinically malnourished infants. We hope this will serve as a valuable reference for future studies in the field. For example, a new study led by Prof. Sophie Moore and other members of the BRIGHT team (INDiGO) is currently recruiting six hundred pregnant women with the aim of obtaining a broader distribution of infants’ growth measures (https://www.kcl.ac.uk/research/sophie-moore-research-group).

      Reviewer #2 (Public Review):

      Summary and strengths:

      The article pertains to a topic of importance, specifically early life growth faltering, a marker of undernutrition, and how it influences brain functional connectivity and cognitive development. In addition, the data collection was laborious, and data preprocessing was quite rigorous to ensure data quality, utilizing cutting-edge preprocessing methods.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      However, the subsequent analysis and explanations were not very thorough, which made some results and conclusions less convincing. For example, corrections for multiple tests need to be consistently maintained; if the results do not survive multiple corrections, they should not be discussed as significant results. Additionally, alternative plans for analysis strategies could be worth exploring, e.g., using ΔFC in addition to FC at a certain age. Lastly, some analysis plans lacked a strong theoretical foundation, such as the relationship between functional connectivity (FC) between certain ROIs and the development of cognitive flexibility.

      Thus, as much as I admire the advanced analysis of connectivity that was conducted and the uniqueness of longitudinal fNIRS data from these samples (even the sheer effort to collect fNIRS longitudinally in a low-income country at such a scale!), I have reservations about the importance of this paper's contribution to the field in its present form. Major revisions are needed, in my opinion, to enhance the paper's quality. 

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings as well as hypothesis-generating findings that may not pass stringent significance thresholds. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further strengthen these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      The relationship between FC and cognitive flexibility (as well as the relationship between growth and FC) has been explored focusing on those FC that showed a significant change with age, as specified in the results sections: ‘To investigate the impact of early nutritional status on FC at 24 months, we used multiple regression with the infant growth trajectory [...] and FC at 24 months [...]. To maximise power, we considered only those FC that showed a statistically significant change with age’ (line 183) and ‘To investigate whether FC early in life predicted cognitive flexibility at preschool age, we used multiple regression of FC across the first two years of life against later cognitive flexibility in preschoolers at three and five years. As per the analysis above, we focused on only those FC that showed a statistically significant change with age’ (line 198).

      We explored the possibility of investigating the relationship between changes in FC and changes in growth. However, the degrees of freedom in these analyses dropped dramatically (~25/30), thereby putting the significance and the meaning of the results at risk. We look forward to future longitudinal studies with less attrition across these time points to maintain the statistical power necessary to run such analyses.

      Reviewer #3 (Public Review):

      Summary:

      This study aimed to investigate whether the development of functional connectivity (FC) is modulated by early physical growth and whether these might impact cognitive development in childhood. This question was investigated by studying a large group of infants (N=204) assessed in Gambia with fNIRS at 5 visits between 5 and 24 months of age. Given the complexity of data acquisition at these ages and following data processing, data could be analyzed for 53 to 97 infants per age group. FC was analyzed considering 6 ensembles of brain regions and thus 21 types of connections. Results suggested that: i) compared to previously studied groups, this group of Gambian infants have different FC trajectory, in particular with a change in frontal inter-hemispheric FC with age from positive to null values; ii) early physical growth, measured through weight-for-length z-scores from birth on, is associated with FC at 24 months. Some relationships were further observed between FC during the first two years and cognitive flexibility at 4-5 years of age, but results did not survive corrections for multiple comparisons.

      Strengths:

      The question investigated in this article is important for understanding the role of early growth and undernutrition on brain and behavioral development in infants and children. The longitudinal approach considered is highly relevant to investigate neurodevelopmental trajectories. Furthermore, this study targets a little-studied population from a low-/middle-income country, which was made possible by the use of fNIRS outside the lab environment. The collected dataset is thus impressive and it opens up a wide range of analytical possibilities.

      We thank the reviewer for highlighting the strengths of this work.

      Weaknesses:

      - Analyzing such a huge amount of collected data at several ages is not an easy task to test developmental relationships between growth, FC, and behavioral capacities. In its present form, this study and the performed analyses lack clarity, unity and perhaps modeling, as it suggests that all possible associations were tested in an exploratory way without clear mechanistic hypotheses. Would it be possible to specify some hypotheses to reduce the number of tests performed? In particular, considering metrics at specific ages or changes in the metrics with age might allow us to test different hypotheses: the authors might clarify what they expect specifically for growth-FC-behaviour associations. Since some FC measures and changes might be related to one another, would it be reasonable to consider a dimensionality reduction approach (e.g., ICA) to select a few components for further correlation analyses?

      We confirm that this work was motivated by a compelling theoretical question: whether neural mechanisms, specifically FC, can be influenced by early adversity, such as growth, and subsequently impact cognitive outcomes, such as cognitive flexibility. This aligns with the overarching goal of the BRIGHT project, established in 2015 (Lloyd-Fox, 2023). We believe this was evident throughout the manuscript in several instances, for example:

      - “The goal of the study was to investigate early physical growth in infancy, developmental trajectories of brain FC across the first two years of life, and cognitive outcome at school age in a longitudinal cohort of infants and children from rural Gambia, an environment with high rates of maternal and child undernutrition. Specifically, we aimed to: (i) investigate whether differences in physical growth through the first two years of life are related to FC at 24 months, and (ii) investigate if trajectories of early FC have an impact on cognitive outcome at pre-school age in these children.” (page 4, introduction)

      - “This study investigated how early adversity via undernutrition drives longitudinal changes in brain functional connectivity at five time points throughout the first two years of life and how these developmental trajectories are associated with cognitive flexibility at preschool age.” (page 6, discussion)

      - We had a clear hypothesis regarding short-range connectivity decreasing with age and long-range connectivity increasing with age, as stated at the end of the introduction: “We hypothesized that (i) long-range FC would increase and short-range FC would decrease throughout the first two years of life” (page 4, line 147). However, we were not able to formulate clear hypotheses about the localization of these connections due to the scarcity of previous studies conducted within this age range, particularly in low-resource settings. The ROI approach for analysis was chosen to mitigate this challenge by reducing the number of comparisons while still enabling us to estimate the developmental trajectories of all the connections from which we acquired data.

      Regarding the use of a dimensionality reduction approach, we have not considered the use of ICA in our analysis. These methods require selecting a fixed number of components to remove from all participants. However, due to the high variability of infant fNIRS data across the five timepoints, we considered it untenable to precisely determine the number of components to remove at the group level. Such a procedure carries the risk of over-cleaning the data for some participants while leaving noise in for others (Di Lorenzo, 2019). We also felt that using PCA in this initial study would be beyond the scope of the brain-region-specific hypotheses and would be more appropriate in a follow-up analysis of these important data.

      References:

      Lloyd-Fox, S., McCann, S., Milosavljevic, B., Katus, L., Blasi, A., Bulgarelli, C., Crespo-Llado, M., Ghillia, G., Fadera, T., Mbye, E., Mason, L., Njai, F., Njie, O., Perapoch-Amado, M., Rozhko, M., Sosseh, F., Saidykhan, M., Touray, E., Moore, S. E., … the BRIGHT Study Team (2023). The Brain Imaging for Global Health (BRIGHT) Study: Cohort Study Protocol. Gates Open Research, 7(126).

      Di Lorenzo, R., Pirazzoli, L., Blasi, A., Bulgarelli, C., Hakuno, Y., Minagawa, Y., & Brigadoi, S. (2019). Recommendations for motion correction of infant fNIRS data applicable to multiple data sets and acquisition systems. NeuroImage, 200(April), 511–527.

      - It seems that neurodevelopmental trajectories over the whole period (5-24 months) are little investigated, and considering more robust statistical analyses would be an important aspect to strengthen the results. The discussion mentions the potential use of structural equation modelling analyses, which would be a relevant way to better describe such complex data.

      We appreciate the complexity of the dataset we are working with, which includes multiple measures and time points. Currently, our focus within the outputs from the BRIGHT project is on examining the relationship between selected measures. While this may not involve statistically advanced modelling at the moment, it is worth noting that most of the results presented in this work have survived correction for multiple comparisons, indicating their statistical robustness. We believe that more advanced statistical analyses are beyond the scope of this rich initial study. In the next phase of the project, known as BRIGHT IMPACT, our team is collaborating with statisticians and experts in statistical modelling to apply more sophisticated and advanced statistical techniques to the data.

      - Given the number of analyses performed, only describing results that survive correction for multiple comparisons is required. Unifying the correction approach (FDR / Bonferroni) is also recommended. For the association between cognitive flexibility and FC, results are not significant, and one might wonder why FC at specific ages was considered rather than the change in FC with age. One of the relevant questions of such a study would be whether early growth and later cognitive flexibility are related through FC development, but testing this would require a mediation analysis that was not performed.

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further strengthen these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      We did not perform a mediation analysis because (i) ΔWLZ between birth and the subsequent time points positively predicted frontal interhemispheric FC at 24 months, and (ii) frontal interhemispheric FC at 18 months (and right fronto-posterior connectivity at 24 months) predicted cognitive flexibility at preschool age. Because the frontal interhemispheric FC at 24 months, which was positively predicted by growth, did not significantly predict cognitive outcome at preschool age, we did not fit mediation models.

      The reviewer raised concerns about using different methods to correct for multiple comparisons throughout the work. Results showing changes in FC with age were Bonferroni corrected, while we used FDR correction for the regression analyses investigating the relationship between growth and FC, as well as FC and cognitive flexibility. Both methods have good control over Type I errors (false positives), but Bonferroni is very conservative, increasing the likelihood of Type II errors (false negatives). We considered Bonferroni an appropriate method for correcting results showing changes in FC with age, where we had a large sample with strong statistical power (i.e. linear mixed models with 132 participants who had at least 250 seconds of good data for 2 out of 5 visits). However, Bonferroni was too conservative for the regression analyses, with N between 57 and 78 (Acharya, 2014; Félix & Menezes, 2018; Narkevich et al., 2020; Narum, 2006; Olejnik et al., 1997).
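      The contrast between the two corrections described above can be illustrated with a minimal sketch. The p-values below are hypothetical and are not taken from the study; the function names are our own, and the Benjamini-Hochberg step-up procedure is written out by hand rather than taken from any particular statistics package.

```python
def bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the test survives Bonferroni."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the test survives BH-FDR."""
    m = len(p_values)
    # Sort indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, for illustration only (not from the study).
example_p = [0.001, 0.012, 0.020, 0.030, 0.200]
print(bonferroni(example_p))
print(benjamini_hochberg(example_p))
```

      In this toy family of five tests, Bonferroni retains only the smallest p-value, whereas the FDR procedure retains the first four, which is the sense in which Bonferroni is the more conservative choice for smaller samples.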

      References:

      Acharya, A. (2014). A Complete Review of Controlling the FDR in a Multiple Comparison Problem Framework--The Benjamini-Hochberg Algorithm. ArXiv Preprint ArXiv:1406.7117.

      Félix, V. B., & Menezes, A. F. B. (2018). Comparisons of ten corrections methods for t-test in multiple comparisons via Monte Carlo study. Electronic Journal of Applied Statistical Analysis, 11(1), 74–91.

      Narkevich, A. N., Vinogradov, K. A., & Grjibovski, A. M. (2020). Multiple comparisons in biomedical research: the problem and its solutions. Ekologiya Cheloveka (Human Ecology), 27(10), 55–64.

      Narum, S. R. (2006). Beyond Bonferroni: less conservative analyses for conservation genetics. Conservation Genetics, 7, 783–787.

      Olejnik, S., Li, J., Supattathum, S., & Huberty, C. J. (1997). Multiple testing and statistical power with modified Bonferroni procedures. Journal of Educational and Behavioral Statistics, 22(4), 389–406.

      - Growth is measured at different ages through different metrics. Justifying the use of weight-for-length z-scores would be welcome since weight-for-age z-scores might be a better marker of growth and possible undernutrition (this impacting potentially both weight and length). Showing the distributions of these z-scores at different ages would allow the reader to estimate the growth variability across infants.

      We consistently used WLZ as the metric to measure growth throughout. Our analysis investigating the relationship between growth (WLZ) and FC included HCZ at 7/14 days to correct for head size at birth. When selecting the best growth measure for this paper, we opted for WLZ over WAZ, given extant evidence that infants in our sample are smaller and shorter compared to the reference WHO standard for the same age group (Nabwera et al., 2017). Therefore, using WLZ allows us to adjust each infant's weight for its own length.

      References:

      Nabwera, H. M., Fulford, A. J., Moore, S. E., & Prentice, A. M. (2017). Growth faltering in rural Gambian children after four decades of interventions: a retrospective cohort study. The Lancet Global Health, 5(2), e208–e216.

      - Regarding FC, clarifications about the long-range vs short-range connections would be welcome, as well as drawing a summary of what is expected in terms of FC "typical" trajectory, for the different brain regions and connections, as a marker of typical development. For instance, the authors suggest that an increase in long-range connectivity vs a decrease in short-range is expected based on previous fNIRS studies. However anatomical studies of white matter growth and maturation would suggest the reverse pattern (short-range connections developing mostly after birth, contrarily to long-range connections prenatally).

      We expected an increase in long-range functional connectivity with age, as discussed in the introduction:

      - “Based on data from fMRI, current models hypothesize that FC patterns mature throughout early development (23–27), where in typically developing brains, adult-like networks emerge over the first years of life as long-range functional connections between pre-frontal, parietal, temporal, and occipital regions become stronger and more selective (28–31). This maturation in FC has been shown to be related to the cascading maturation of myelination and synaptogenesis (32, 33) - fundamental processes for healthy brain development (34)” (line 93, page 3, introduction);

      - “Importantly, normative developmental patterns may be disrupted and even reversed in clinical conditions that impact development; e.g., increased short-range and reduced long-range FC have been observed in preterm infants (36) and in children with autism spectrum disorder (37, 38)” (line 103, page 3, introduction);

      - “We hypothesized that (i) long-range FC would increase and short-range FC would decrease throughout the first two years of life” (line 147, page 4, introduction).

      Since inferences about FC patterns recorded with fNIRS are highly limited by the number and locations of the optodes, it is challenging to make strong inferences about specific brain regions. Moreover, infant FC fNIRS studies are still limited, which is why we focused our inferences on long-range versus short-range connectivity, without specifically pinpointing particular brain regions.

      Additionally, we were unable to locate the works mentioned by the reviewer regarding an increase in short-range white matter connectivity immediately after birth. On the contrary, we found several studies documenting an increase in white-matter long-range connectivity after birth, which is consistent with the hypothesised increase in FC long-range connectivity, such as:

      Yap, P. T., Fan, Y., Chen, Y., Gilmore, J. H., Lin, W., & Shen, D. (2011). Development trends of white matter connectivity in the first years of life. PloS one, 6(9), e24678.

      Dubois, J., Dehaene-Lambertz, G., Kulikova, S., Poupon, C., Hüppi, P. S., & Hertz-Pannier, L. (2014). The early development of brain white matter: a review of imaging studies in fetuses, newborns and infants. Neuroscience, 276, 48-71.

      Stephens, R. L., Langworthy, B. W., Short, S. J., Girault, J. B., Styner, M. A., & Gilmore, J. H. (2020). White matter development from birth to 6 years of age: a longitudinal study. Cerebral Cortex, 30(12), 6152-6168.

      Hagmann, P., Sporns, O., Madan, N., Cammoun, L., Pienaar, R., Wedeen, V. J., ... & Grant, P. E. (2010). White matter maturation reshapes structural connectivity in the late developing human brain. Proceedings of the National Academy of Sciences, 107(44), 19067-19072.

      Collin, G., & van den Heuvel, M. P. (2013). The ontogeny of the human connectome: development and dynamic changes of brain connectivity across the life span. Neuroscientist, 19(6), 616-628. doi: 10.1177/1073858413503712.

      - The authors test associations between FC and growth, but making sense of such modulation results is difficult without a clearer view of developmental changes per se (e.g., what does an early negative FC mean? Is it an increase in FC when the value gets close to 0? In particular, at 24m, it seems that most FC values are not significantly different from 0, Figure 2B). Observing positive vs negative association effects depending on age is quite puzzling. It is also questionable, for some correlation analyses with cognitive flexibility, to focus on FC that changes with age but to consider FC at a given age.

      We thank the reviewer for bringing up this important point and understand that it requires some additional consideration. The negative FC values moving toward zero with age indicate that these regions go from being anti-correlated to becoming increasingly correlated. Hence, FC of these ROIs increased with age. The trajectory suggests that FC will keep increasing with age, but further data need to be collected to assess this.

      Unfortunately, when considering ΔFC to predict cognitive flexibility, the number of participants dropped substantially, with N=~15/20 infants per group of preschoolers, making it very challenging to interpret the results with meaningful statistical power.

      - The manuscript uses inappropriate terms "to predict", "prediction" whereas the conducted analyses are not prediction analyses but correlational.

      We thank the reviewer for giving us the opportunity to thoroughly revise the manuscript on this matter. In this work, we had clear hypotheses regarding which variables predicted which measures (such as growth predicting FC and FC predicting cognitive outcomes). Therefore, we performed regression analyses rather than correlational analyses to investigate these associations. Hence, we believe that using the terms ‘predict’ and ‘prediction’ is appropriate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In the introduction and discussion, the authors talk about the link between developmental trajectories and cognitive capacities, and undernutrition. However, they did not compare developmental trajectories but connectivity patterns at different ages with ΔWLZ and cognitive flexibility. I recommend that the authors rephrase the introduction and discussion.

      We thank the reviewer for pointing out places requiring better clarity in the text. We made edits throughout the introduction and discussion to better match our investigations. In particular we changed:

      - ‘our understanding of the relationships between early undernutrition, developmental trajectories of brain connectivity, and later cognitive outcomes is still very limited,’ to, ‘our understanding of the relationships between early undernutrition, brain connectivity, and later cognitive outcomes is still very limited’ (line 89, introduction);

      - ‘(ii) investigate if trajectories of early FC have an impact on cognitive outcome at pre-school age in these children,’ to, ‘(ii) investigate if early FC has an impact on cognitive outcome at pre-school age in these children’ (line 137, introduction);

      - ‘This study investigated how early adversity via undernutrition drives longitudinal changes in brain functional connectivity at five time points throughout the first two years of life and how these developmental trajectories are associated with cognitive flexibility at preschool age,’ to, ‘This study investigated how early adversity via undernutrition drives brain functional connectivity throughout the first two years of life and how these early functional connections are associated with cognitive flexibility at preschool age’ (line 215, discussion).

      (2) Considering most research is done in occidental high-income countries, and this work is one of the few presenting research in another context, I think the authors should discuss in the manuscript that differences with previous studies might also be due to environmental and cultural differences. Since the study lacks the statistical power to perform a statistical analysis that directly establishes a link between developmental trajectories and restricted growth and cognitive flexibility, the authors cannot disentangle which differences are related to undernutrition and which might result from growing up in a different environment. I recommend that the authors avoid phrases like (lines 57-58): "We observed that early physical growth before the fifth month of life drove optimal developmental trajectories of FC..." or (lines 223-224) "...our cohort of Gambian infants exhibit atypical developmental trajectories of functional connectivity...".

      We thank the reviewer for this observation, and we agree that other factors differing between high- and low-resource settings might have an impact on FC trajectories. We therefore specified this in the discussion as follows: “We acknowledge that differences in FC could also be attributed to other environmental and cultural disparities between high-resource and low-resource settings, and future studies are needed to explore this further” (line 238). We revised the whole manuscript to reflect similar statements.

      (3) To better interpret the results, it would be interesting to know if poor early growth predicts late cognitive flexibility in the tested sample and if the ΔWLZ distributions differ compared to a population in a high-income country where undernutrition is less frequent.

      We explored the relationship between changes in growth and cognitive flexibility in the two preschooler groups, but there were no significant associations.

      Mean and SD values of WLZ are reported in Table 3. The values at every age are negative, indicating that the infants' weight-for-length is below the expected norm at all ages. To our knowledge, no other studies have assessed changes in growth in an infant sample with similar closely spaced age time points in high-income countries, making comparisons on growth changes challenging.

      (4) It is unclear why WLZ at birth and HCZ at 7-14 days are included in the models. I imagine this is to ensure that differences are not due to growing restrictions before birth. It would be nice if the authors could explain this.

      As the reviewer pointed out, HCZ at 7-14 days was included to ensure associations between growth and FC are not due to physical differences at birth. This can be considered as a 'baseline' measure for cerebral development, in the same way that WLZ at birth was used as a baseline for physical development. Therefore, we can more confidently assume that the associations between growth and FC were specific to the impact of change in WLZ postnatally and not confounded by the size or maturity of the infant at birth. We specified this in the manuscript as follows: “These analyses were adjusted by WLZ at birth and HCZ at 7/14 days, to more confidently assume that the associations between growth and FC were specific to the impact of change in WLZ postnatally and not confounded by the size or maturity of the infant at birth” (line 520, statistical analysis section in the method section).

      (5) Right frontal-posterior connections at 24 months negatively correlate with ΔWLZ. Thus, restricted growth results in stronger frontal-posterior connections at 24 months. However, the same connections at 24 months positively correlate with cognitive flexibility (stronger connections predict better cognitive flexibility). Do the authors have any interpretation of this? How could this relate to previous findings of the authors (Bulgarelli et al. 2020), showing first an increase and then a decrease in functional connectivity between frontal and parietal regions?

      We acknowledge that interpreting the negative relationship between changes in growth and fronto-posterior FC at 24 months, alongside the positive association between the same connection and later cognitive flexibility, is challenging. We refrain from relating these findings to those published by Bulgarelli in 2020 due to differences in optode locations and because in that work the decrease in fronto-posterior FC was observed after 24 months (up to 36 months), whereas the endpoint in this study is right at 24 months.

      (6) With the growth of the head, the frontal channels move to more temporal areas, right? Could this determine the decrease in frontal inter-hemisphere connections?

      As shown by Nabwera et al. (2017), head size in Gambian infants does not increase as much as expected under the WHO standard measures. We have added HCZ mean and SD values per age in Table 3.

      Minor points

      - HCZ is used in line 184 but not defined.

      We thank the reviewer for spotting this; we have now specified HCZ at line 184 as follows: ‘head-circumference z-score (HCZ)’.

      - Table SI2: NIRS not undertaken = the participant was assessed but did want or could not perform... I imagine there is a missing "not".

      We thank the reviewer for spotting this; we have now modified the legend of Table SI2 as follows: ‘the participant was assessed but did not want or could not perform the NIRS assessments.’

      - The authors should explain what weight-for-length is for those who are not familiar with it.

      We have added an explanation of weight-for-length in the experimental design section, line 339 as follows: ‘We then tested for relationships between brain FC at age 24 months with measures of early growth, as indexed by changes in weight-for-length z-scores (reflecting body weight in proportion to attained growth in length) at one month of age, and at each of the four subsequent visits (details provided below).’

      Reviewer #2 (Recommendations For The Authors):

      (1) I am confused about the authors' interpretation that left and right front-middle and right front-back FC increased with age. It appears in Figure 2 that the negative FC among these ROIs should actually decrease with age. This means that as individuals grow older, the FC values between these regions and zero diminished, albeit starting with negative FC (anticorrelation values) in younger age groups.

      Yes, the reviewer is correct. The negative values of the left and right front-middle and right front-back FC moving toward zero with age indicate that these regions go from being anti-correlated to becoming increasingly correlated. Hence, FC of these ROIs increased with age.

      (2) Are these negative values mentioned above at 24 months still negative? Have t-tests been run to examine the differences from zero?

      As suggested, we performed t-tests against zero for the mentioned FC at 24 months, and only the left and right fronto-middle FC are significantly different from zero (left fronto-middle FC: t(94) = 1.8, p = 0.036; right fronto-middle FC: t(94) = 2.7, p = 0.003).
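      For readers unfamiliar with the test, a one-sample t-test against zero can be sketched as follows. The FC values below are hypothetical and are not the study data; the function name is our own. Note that the degrees of freedom equal the sample size minus one, so a reported t(94) corresponds to 95 observations.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    t_stat = (mean - mu0) / (sd / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical FC values for one connection; not study data.
fc_values = [0.10, -0.05, 0.20, 0.15, 0.00, 0.05]
t_stat, df = one_sample_t(fc_values)
print(f"t({df}) = {t_stat:.2f}")
```

      The resulting statistic would then be compared against a t distribution with the stated degrees of freedom to obtain a p-value, a step omitted here to keep the sketch free of external dependencies.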

      (3) With so many correlation analyses, have multiple comparisons been consistently controlled for? While I assume this was done according to the Methods section, could the authors clarify whether FDR adjustment was applied to all the p-values at once or to a group of p-values each time? I found the following way of reporting FDR-adjusted p-values quite informative, such as PFDR, 24 pairs of ROIs < 0.05.

      We thank the reviewer for this insightful comment. P-values of regression analyses were FDR corrected per connection investigated, i.e. 21 possible ΔWLZ values per connection. We have specified this in the method section as follows: “To ensure statistical reliability, results from the regression analyses on each FC were corrected for multiple comparisons using false discovery rate (FDR)(Benjamini & Hochberg, 1995) per each connection investigated, i.e. 21 possible ΔWLZ values per each connection,” (page 12, Statistical Analyses section).
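      The figure of 21 ΔWLZ values per connection is what one obtains by taking every pair of an earlier and a later visit among seven time points. The sketch below assumes, for illustration only, that the seven time points are birth and the 1, 5, 8, 12, 18 and 24 month visits; the labels are our own and this set is an assumption rather than a statement of the authors' exact procedure.

```python
from itertools import combinations

# Assumed visit labels; the exact set used by the authors is an assumption here.
time_points = ["birth", "1m", "5m", "8m", "12m", "18m", "24m"]

# Every (earlier, later) pair of visits defines one possible ΔWLZ.
delta_pairs = list(combinations(time_points, 2))
print(len(delta_pairs))
```

      Under this reading, each connection has its own family of 21 p-values, and the FDR correction is applied within that family rather than across all connections at once.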

      (4) Can early growth trajectories predict changes in FC? Why not use ΔWLZ to predict ΔFC?

      Unfortunately, when considering ΔWLZ to predict ΔFC, the number of participants dropped substantially, with N=~30 infants, making it very challenging to interpret the results. We believe this emphasizes the importance of recruiting large samples when conducting longitudinal studies involving infants and employing multiple measures.

      (5) I might have missed the rationale, but why weren't the growth changes after 5 months studied?

      ΔWLZ between all time points was assessed as a predictor of FC at 24 months. We have specified this at line 183 as follows: ‘we used multiple regression with the infant growth trajectory (delta weight for length z-score between all time points, DWLZ) and FC at 24 months’. As indicated in Table 2 and 3, the associations between ΔWLZ at all time points and FC at 24 months were tested, but only those with ΔWLZ calculated between birth and 1 month and the subsequent time points were significant. ΔWLZ between 5 months, 8 months, 12 months, or 18 months and the respective subsequent time points did not significantly predict FC at 24 months. These are highlighted in Table 2 and Figure 3 in blue and marked as NS (non-significant).

      (6) Once more, the advantage of longitudinal data is that it allows us to tap into developmental changes. Analyzing and predicting cognitive development based solely on FC values at a single age stage (i.e., 24 months) would overlook the benefits of a longitudinal design, which is regrettable. I suggest that the authors attempt to use ΔFC for predictions and observe the outcomes.

      As mentioned in response to point (4) raised by the reviewer, unfortunately, when considering ΔWLZ to predict ΔFC, the number of participants dropped substantially, with N=~30 infants, making it very challenging to interpret the results. We believe this emphasizes the importance of recruiting large samples when conducting longitudinal studies involving infants and employing various measures.

      (7) In the section "Early FC predicts cognitive flexibility at preschool age", the authors pointed out that "...,none of these survived FDR correction for multiple comparisons." However, the paper discussed the association between FC at 24 months of age and cognitive flexibility, as it was supported by the statistical analysis in the following sections. If FDR correction cannot be satisfied, I would rephrase the implication/conclusion of the results to suggest that early FC does not predict cognitive flexibility at preschool age.

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings, even those not passing multiple comparisons corrections, as they may motivate hypothesis-generation for future studies. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further support these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      Following the reviewer’s suggestion, we specified in the discussion that results from the regression analysis are significant but did not survive correction for multiple comparisons, as follows: ‘While our results are consistent with previous studies, we acknowledge that the significant association between early FC and later cognitive flexibility does not withstand multiple comparisons. Therefore, we encourage future studies that may replicate these findings with a larger sample’ (line 290, discussion section).

      (8) Have the authors assessed the impact of growth trajectories on cognitive flexibility?

      We explored the relationship between changes in growth and cognitive flexibility in the two preschooler groups, but there were no significant associations.

      (9) Are there no other cognitive or behavioural measures available? Cognitive flexibility is just one domain of cognitive development, and would the impact of undernutrition on cognitive development be domain-specific? There is a lack of theoretical support here. Why choose cognitive flexibility, and should the impact of undernutrition be domain-specific or domain-general?

      We agree with the reviewer that in this work, we chose to focus on one specific cognitive outcome. While this does not imply that the impact of undernutrition is domain-specific, cognitive flexibility, being a core executive function, has been extensively studied in terms of its neural underpinnings using other neuroimaging modalities, especially fMRI (for example see Dajani, 2015; Uddin, 2021).

      Moreover, other studies looking at the effect of adversity on cognitive outcomes focus on specific cognitive skills, such as working memory (Roberts, 2017) and reading and arithmetic skills (Soni, 2021).

      We also assessed infants with the Mullen Scales of Early Learning (MSEL), although the cognitive flexibility task within the Early Years Toolbox has been specifically designed for preschoolers (Howard, 2015), and this set of tasks has recently been validated by our team in The Gambia (Milosavljevic, 2023). Future works from the BRIGHT team will investigate performance on the MSEL in relation to other variables of the project.

      References:

      D. R. Dajani, L. Q. Uddin, Demystifying cognitive flexibility: Implications for clinical and developmental neuroscience. Trends Neurosci. 38, 571–578 (2015).

      L. Q. Uddin, Cognitive and behavioural flexibility: neural mechanisms and clinical considerations. Nat. Rev. Neurosci. 22, 167–179 (2021).

      Roberts, S. B., Franceschini, M. A., Krauss, A., Lin, P. Y., de Sa, A. B., Có, R., ... & Muentener, P. (2017). A pilot randomized controlled trial of a new supplementary food designed to enhance cognitive performance during prevention and treatment of malnutrition in childhood. Current developments in nutrition, 1(11), e000885.

      Soni, A., Fahey, N., Bhutta, Z. A., Li, W., Frazier, J. A., Moore Simas, T., ... & Allison, J. J. (2021). Early childhood undernutrition, preadolescent physical growth, and cognitive achievement in India: A population-based cohort study. PLoS Medicine, 18(10), e1003838.

      Howard, S. J., & Melhuish, E. (2015). An Early Years Toolbox (EYT) for assessing early executive function, language, self-regulation, and social development: Validity, reliability, and preliminary norms. Journal of Psychoeducational Assessment, 35(3), 255-275.

      Milosavljevic, B., Cook, C. J., Fadera, T., Ghillia, G., Howard, S. J., Makaula, H., ... & Lloyd‐Fox, S. (2023). Executive functioning skills and their environmental predictors among pre‐school aged children in South Africa and The Gambia. Developmental Science, e13407.

      (10) I would review more previous fNIRS studies on infants if they exist (e.g., the work by S Lloyd-Fox, L Emberson, and others). These studies can help identify brain ROIs likely linked to undernutrition and cognitive flexibility. The current analysis methods lean towards exploratory research. This makes the paper more of a proof-of-concept report rather than a strongly theoretically-driven study.

      We thank the reviewer for this important point. While we have reviewed existing fNIRS infant studies, there are no extant works that showed whether specific brain regions are related to undernutrition. However, several fMRI studies assessed regions that do support cognitive flexibility, and we mentioned these in the manuscript (for example see Dajani, 2015; Uddin, 2021).

      Other than the BRIGHT project, we are aware of two other projects that assessed the effect of undernutrition on brain development, assessing cognitive outcomes in low-resource settings:

      - the BEAN project in Bangladesh, in which fNIRS data were recorded from the bilateral temporal cortex (e.g., Pirazzoli, 2022);

      - a project in India, in which fNIRS data were recorded from frontal, temporal and parietal cortex bilaterally (e.g., Delgado Reyes, 2020).

      The brain regions recorded in these studies largely overlap with the brain regions we recorded from in this study.

      Another aspect to consider is that infants underwent several fNIRS tasks as part of the BRIGHT project, focusing on social processing, deferred imitation, and habituation responses. Therefore, brain regions for data acquisition were chosen to maximize the likelihood of recording meaningful data for all tasks (Lloyd-Fox, 2023). To clarify the text, we specified this information in the methods section (line 383).

      References:

      D. R. Dajani, L. Q. Uddin, Demystifying cognitive flexibility: Implications for clinical and developmental neuroscience. Trends Neurosci. 38, 571–578 (2015).

      Pirazzoli, L., Sullivan, E., Xie, W., Richards, J. E., Bulgarelli, C., Lloyd-Fox, S., ... & Nelson III, C. A. (2022). Association of psychosocial adversity and social information processing in children raised in a low-resource setting: an fNIRS study. Developmental Cognitive Neuroscience, 56, 101125.

      Delgado Reyes, L., Wijeakumar, S., Magnotta, V. A., Forbes, S. H., & Spencer, J. P. (2020). The functional brain networks that underlie visual working memory in the first two years of life. NeuroImage, 219, Article 116971.

      Lloyd-Fox, S., McCann, S., Milosavljevic, B., Katus, L., Blasi, A., Bulgarelli, C., Crespo-Llado, M., Ghillia, G., Fadera, T., Mbye, E., Mason, L., Njai, F., Njie, O., Perapoch-Amado, M., Rozhko, M., Sosseh, F., Saidykhan, M., Touray, E., Moore, S. E., … Team, and the B. S. (2023). The Brain Imaging for Global Health (BRIGHT) Study: Cohort Study Protocol. Gates Open Research, 7(126).

      (11) Last but not least, in the paper, the authors mentioned that fNIRS offers better spatial resolution and anatomical specificity compared to EEG, thereby providing more precise and reliable localization of brain networks. While I partially agree with this perspective, it remains to be explored whether the current fNIRS analysis strategies indeed yield higher spatial resolution. It is hoped that the authors will delve deeper into this discussion in the paper.

      The brain regions of focus were selected based on coregistration work previously conducted at each time point on the array used in this project (Collins-Jones, 2021). We deliberately avoided making claims about small brain regions, considering that head size might increase slightly less with age in The Gambia compared to Western countries (Nabwera, 2017). However, we maintain that the conclusions drawn in this study offer higher brain-region specificity than could have been identified with current common EEG methods alone.

      References:

      L. H. Collins-Jones, et al., Longitudinal infant fNIRS channel-space analyses are robust to variability parameters at the group-level: An image reconstruction investigation. Neuroimage 237, 118068 (2021).

      Nabwera, H. M., Fulford, A. J., Moore, S. E., & Prentice, A. M. (2017). Growth faltering in rural Gambian children after four decades of interventions: a retrospective cohort study. The Lancet Global Health, 5(2), e208–e216.

      Reviewer #3 (Recommendations For The Authors):

      Introduction

      - Among important developmental mechanisms to mention are the development of exuberant connections and the further selection/stabilization of the relevant ones according to environmental stimulation, vs the pruning of others.

      We agree with the reviewer that the development of exuberant connections and subsequent pruning is a universal process of paramount importance during the first years of life. However, after revising our introduction, given the word limit of the journal, we maintained focus on neurodevelopment and early adversity.

      Results

      - Adding a few more information on the 6 sections and 21 connections would be welcome. In particular for within-section FC: how was this computed?

      The 6 sections were created based on the co-registration of the array used in this study at each age in previously published work (L. H. Collins-Jones, et al., Longitudinal infant fNIRS channel-space analyses are robust to variability parameters at the group-level: An image reconstruction investigation. Neuroimage 237, 118068 (2021)). This is reference No. 68 in the manuscript.

      As we mentioned in the section fNIRS preprocessing and data-analysis: ‘The sections were established via the 17 channels of each hemisphere which were grouped into front, middle and back (for a total of six regions) based on a previous co-registration of the BRIGHT fNIRS arrays onto age-appropriate templates’.

      The 21 connections were defined as all the possible links between the 6 regions, specifically: the interhemispheric homotopic connections (in orange in Figure SI1), which connect the same regions between hemispheres (i.e., front left with front right); the intrahemispheric connections (in green in Figure SI1), which correlate channels belonging to the same region; the fronto-posterior connections (in blue in Figure SI1), which link front and middle, middle and back, and front and back regions of the same hemisphere; and the crossing interhemispheric connections (non-homotopic interhemispheric, in yellow in Figure SI1), which link the front, middle, and back areas between left and right hemispheres. We added these specifications also in the legend of Figure SI1 for clarity.
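      The count of 21 follows from taking every unordered pair of the six regions, including each region paired with itself (15 between-region pairs plus 6 within-region pairs). The sketch below enumerates them and sorts them into the four categories described above. The region labels and the `classify` helper are illustrative assumptions, not the authors' analysis code.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Six regions: front/middle/back in each hemisphere (labels are illustrative).
REGIONS = [(hemi, pos)
           for hemi in ("left", "right")
           for pos in ("front", "middle", "back")]


def classify(a, b):
    """Assign a pair of regions to one of the four connection types."""
    (hemi_a, pos_a), (hemi_b, pos_b) = a, b
    if a == b:
        return "intrahemispheric (within-region)"
    if hemi_a == hemi_b:
        return "fronto-posterior"
    if pos_a == pos_b:
        return "interhemispheric homotopic"
    return "crossing interhemispheric"


# Unordered pairs of regions, including each region paired with itself.
CONNECTIONS = [(a, b, classify(a, b))
               for a, b in combinations_with_replacement(REGIONS, 2)]
COUNTS = Counter(kind for _, _, kind in CONNECTIONS)

print(len(CONNECTIONS))
for kind, n in sorted(COUNTS.items()):
    print(f"{kind}: {n}")
```

      Under this reading, the categories contain 6 within-region, 6 fronto-posterior, 3 homotopic, and 6 crossing connections.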

      - The denomination intrahemispheric vs fronto-posterior vs crossed connections is not clear. Maybe prefer intra-hemispheric vs inter-hemispheric homotopic vs inter-hemispheric non-homotopic (also in Figure SI1).

      We appreciate the reviewer's suggestion regarding terminology. However, we believe that the term 'inter-hemispheric non-homotopic' could potentially refer to both connections within the same brain hemisphere from front to back and connections crossing between hemispheres, leading to increased confusion. Therefore, we have chosen not to include the term 'non-homotopic' and instead added 'homotopic' to 'interhemispheric' throughout the manuscript to emphasize that these functional connections occur between corresponding regions of the two hemispheres.

      - with time -> with age.

      We replaced “with time” with “with age” throughout the manuscript, as suggested.

      - The description of both HbO2 and HHb results overloads the main text: would it be relevant to present one of the two in Supplementary Information if the results are coherent?

      We understand the reviewer’s concern regarding overloading the results section with reporting both chromophores. However, reporting results for both HbO and HHb is considered a crucial step for publications in the fNIRS field, as emphasized in recent formal guidance (Yücel et al., 2020). One of the strengths of fNIRS compared to fMRI is its ability to record from both chromophores, enabling a more precise characterization of brain activations and oscillations. Moreover, in FC studies like this one, ensuring that HbO and HHb results overlap is an important check that increases confidence in interpreting the findings.

      References:

      Yücel, M. A., von Lühmann, A., Scholkmann, F., Gervain, J., Dan, I., Ayaz, H., Boas, D., Cooper, R. J., Culver, J., Elwell, C. E., Eggebrecht, A., Franceschini, M. A., Grova, C., Homae, F., Lesage, F., Obrig, H., Tachtsidis, I., Tak, S., Tong, Y., … Wolf, M. (2020). Best Practices for fNIRS publications. Neurophotonics, 1–34. https://doi.org/10.1117/1.NPh.8.1.012101

      - HCZ is not defined when first used.

      We thank the reviewer for spotting this; we have now specified HCZ at line 184 as follows: ‘head-circumference z-score (HCZ)’.

      - Choosing the analyzed measures to "maximize power" could be criticised.

      We appreciate the reviewer’s concern. However, correlating all the FC values with all changes in growth would have raised an important multiple-comparisons issue. We therefore made an a priori decision to focus on the relationship between changes in growth and those FC values that showed a significant change with age, considering these the most interesting from a developmental perspective in our sample.

      Discussion

      - I would recommend using the same order to synthesize results and further discuss them.

      We agree with the reviewer that the suggested structure is optimal for a clear discussion section. We have indeed followed it, with each paragraph covering specific aspects:

      - Recap of the study aims

      - Results summary and discussion of developmental changes

      - Results summary and discussion of the relationship between changes in growth and FC

      - Results summary and discussion of the relationship between FC and cognitive flexibility

      - Limitations

      - Conclusion

      Given the numerous results presented in this paper, we believe that readers will better digest them by first reading a summary of the results followed by their interpretations, rather than condensing all the interpretations together.

      - Highlighting how "atypical" developmental trajectories are in Gambian infants would be welcome in the Results section. Other interpretations can be found than "The observed decrease in frontal inter-hemispheric FC with increasing age may be due to the exposure to early life undernutrition adversity".

      We agree with the reviewer that other factors that differ between low- and high-resource settings might have an impact on FC trajectories. We therefore specified this in the discussion as follows: “We acknowledge that differences in FC could also be attributed to other environmental and cultural disparities between high-resource and low-resource settings, and future studies are needed to further investigate cultural, environmental, and genetic effects on brain FC” (line 238).

      - Focusing on FC at 24m for the relationship with growth is questionable.

      Correlating the FC values at 5 time points with all changes in growth would have raised an important multiple-comparisons issue. We therefore made an a priori decision to focus on the relationship between changes in growth and FC at 24 months, our final time point of data collection. We added this information in the methods section as follows: “To investigate the impact of undernutrition on FC development, we used DWLZ as independent variables in regression analyses on HbO2 (as the chromophore with the highest signal-to-noise ratio) FC at 24 months, our final time point of data collection” (line 517, method section).
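      To make the size of the multiple-comparisons burden concrete, the sketch below compares the number of tests, and the per-test threshold under a Bonferroni correction, when FC at all 5 time points is tested versus FC at 24 months only. Bonferroni is used purely for illustration (the correction actually applied is not stated here), and `N_GROWTH_MEASURES` is a hypothetical placeholder.

```python
def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


N_CONNECTIONS = 21      # FC connections between the 6 regions
N_TIME_POINTS = 5       # FC time points
N_GROWTH_MEASURES = 1   # hypothetical: number of growth-change variables

# Every connection at every time point versus the final time point only.
TESTS_ALL_AGES = N_CONNECTIONS * N_TIME_POINTS * N_GROWTH_MEASURES
TESTS_24M_ONLY = N_CONNECTIONS * N_GROWTH_MEASURES

for label, n in (("all ages", TESTS_ALL_AGES), ("24 months only", TESTS_24M_ONLY)):
    print(f"{label}: {n} tests, per-test alpha = {bonferroni_alpha(n):.5f}")
```

      Restricting the analysis to one time point cuts the number of tests fivefold in this illustration, which is the rationale given above.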

      - There is too much emphasis on the correlation between FC and cognitive flexibility, whereas results are not significant after correction for multiple comparisons.

      Following the reviewer’s suggestion, we specified in the discussion that the results of the regression analysis are significant but did not survive correction for multiple comparisons, as follows: “While our results are consistent with previous studies, we acknowledge that the significant association between early FC and later cognitive flexibility does not withstand multiple comparisons. Therefore, we encourage future studies that may replicate these findings with a larger sample” (line 290, discussion section).

      Methods

      - I would recommend detailing how z-scores were computed in the paragraph "Anthropometric measures".

      We specified how z-scores were computed in the statistical analysis section as follows: “Anthropometric measures were converted to age and sex adjusted z‐scores that are based on World Health Organization Child Growth Standards (93). Weight‐for‐Length (WLZ) and Head Circumference (HCZ) z-scores were computed” (line 509, method section). As transforming data is the first step of statistical analysis and is not directly related to data collection, we believe it is more appropriate to retain this description in the statistical analysis section.
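      For readers unfamiliar with how such z-scores are derived, the WHO Child Growth Standards are published as age- and sex-specific L (Box-Cox power), M (median), and S (coefficient of variation) values. The sketch below applies the basic LMS formula. It is a simplified illustration: it omits the adjustment WHO applies to extreme values of weight-based indicators, and the L, M, S values in the example are hypothetical rather than taken from the WHO tables.

```python
import math


def lms_zscore(value, L, M, S):
    """Basic LMS z-score: ((value / M) ** L - 1) / (L * S), or ln(value / M) / S when L is 0."""
    if value <= 0 or M <= 0 or S <= 0:
        raise ValueError("value, M and S must be positive")
    if abs(L) < 1e-12:
        return math.log(value / M) / S
    return ((value / M) ** L - 1.0) / (L * S)


# Hypothetical reference values for one age and sex (not WHO table entries).
EXAMPLE_Z = lms_zscore(value=45.0, L=1.0, M=47.0, S=0.03)
print(round(EXAMPLE_Z, 2))
```

      A measurement equal to the reference median M gives a z-score of 0, and values below the median give negative z-scores.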

      - FC computation: the mention of "correlating the first and the last 250s" is not clear.

      We specified this more clearly in the text as follows: “We found that correlating the first and the last 250 seconds of valid data after pre-processing provided the highest percentage of infants with strong correlation between the first and the last portion of data” (line 467).
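      One plausible reading of this check is a split-half comparison: compute FC from the first 250 s and from the last 250 s of a recording, then correlate the two sets of FC values. The sketch below implements that reading on synthetic data. The sampling rate, the number of channels, the use of Pearson correlation, and all function names are assumptions for illustration and may differ from the authors' pipeline.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fc_vector(channels):
    """All pairwise correlations between channel time series."""
    return [pearson(channels[i], channels[j])
            for i in range(len(channels))
            for j in range(i + 1, len(channels))]


def split_half_reliability(channels, fs_hz, window_s=250):
    """Correlate FC from the first and the last `window_s` seconds."""
    n = int(window_s * fs_hz)
    if any(len(ch) < 2 * n for ch in channels):
        raise ValueError("recording too short for two non-overlapping windows")
    first = [ch[:n] for ch in channels]
    last = [ch[-n:] for ch in channels]
    return pearson(fc_vector(first), fc_vector(last))


# Synthetic recording: 600 s at a hypothetical 1 Hz, four channels driven
# by two independent sources (channels 0-1 share one, channels 2-3 the other).
random.seed(0)
FS_HZ = 1
N_SAMPLES = 600
src_a = [random.gauss(0, 1) for _ in range(N_SAMPLES)]
src_b = [random.gauss(0, 1) for _ in range(N_SAMPLES)]
CHANNELS = [[s + random.gauss(0, 0.5) for s in src]
            for src in (src_a, src_a, src_b, src_b)]

R = split_half_reliability(CHANNELS, FS_HZ)
print(round(R, 3))
```

      A value of `R` close to 1 indicates that the FC pattern is stable between the start and the end of the recording.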

      - The manuscript mentions "age 3 years" for the younger preschoolers but ~48months rather corresponds to 4 years.

      We reviewed the entire manuscript and the supplementary materials, but we could not find any instance in which preschoolers are referred to by age in months rather than in years.

      - Specify the number of children evaluated at 4 and 5 years. Is the test of cognitive flexibility normalized for age? If not, how were the 2 groups considered in the analyses? (age as a confounding factor).

      We have added the number of children in the two preschooler groups as follows: “younger preschoolers (age mean ± SD=47.96 ± 2.77 months, N=77) and older preschoolers (age mean ± SD=57.58 ± 2.11 months, N=84)” (line 484).

      The cognitive flexibility test was not normalized for age, as this task was specifically developed for preschoolers (Howard, 2015). As mentioned in ‘Cognitive flexibility at preschool age’ of the methods section, “data were collected in two ranges of preschool ages”, which guided our decision to perform regression analysis on the impact of FC on cognitive flexibility separately within these two age groups, rather than treating them as a single group of preschoolers.

      References:

      Howard, S. J., & Melhuish, E. (2015). An Early Years Toolbox (EYT) for assessing early executive function, language, self-regulation, and social development: Validity, reliability, and preliminary norms. Journal of Psychoeducational Assessment, 35(3), 255-275.

      Figures and Tables

      - Table 1 could highlight the significant results. It is not clear what the "baseline" results correspond to.

      We have marked in bold the results that are statistically significant in Table 1. In the linear mixed model we performed, the first time point (i.e. 5 months) is chosen as ‘baseline’, i.e. the reference against which the other time points are compared, and its statistical values refer to its significance against 0 (as was done in Bulgarelli 2020).

      - Figures 2 B and C seem redundant? What is SE vs SD?

      We believe that both Figures 2B and 2C are useful for readers. While the first shows the mean FC values at the group level, the second highlights the individual variability of FC values (typical of infant neuroimaging data), which is also why it is interesting to relate these measures to other variables of our dataset (i.e. growth and cognitive flexibility). Figure 2C also reports mean FC values per age, but these might be less visible given that one dot per infant is also plotted.

      SE stands for standard error, and in the legend of the figure we specified this as follows: ‘Mean and standard error of the mean (SE)’. SD stands for standard deviation, and we have now specified this as follows: ‘mean ± standard deviation (SD)’.

      - Table 2: I would recommend removing results that don't survive corrections for multiple comparisons.

      We acknowledge the reviewer’s concern regarding the reporting of results that do not survive multiple comparisons. However, considering the uniqueness of our dataset and the novelty of our work, we believe it is crucial to report all significant findings. We have taken great care to transparently distinguish between results that survived multiple comparisons and those that did not in both the Results and Discussion sections, ensuring that readers are not misled. It is possible that future studies may replicate and further strengthen these associations. Therefore, by sharing these results with the research community, we provide valuable insights for future investigations.

      - Figure 3: the top is redundant with Table 2: to be merged? B: the statistical results might be shown in a Table.

      We agree with the reviewer that the top part of Figure 3 and Table 2 report the same results. However, given the richness of these findings, we believe that the top part of Figure 3 serves as a useful summary for readers. Additionally, examining both the top and bottom parts of Figure 3 provides a comprehensive overview of the regression analysis conducted in this study.

      - Figure SI6: Is it really a % in x-axis?

      We thank the reviewer for spotting this typo; the percentage is relevant for the y-axis only. We removed the % symbol from the ticks of the x-axis.

      - Table SI1: the presented p-values don't seem to survive Bonferroni correction, contrary to what is written.

      We thank the reviewer for spotting this mistake; we removed the reference to the Bonferroni correction for the p-values.

      - Table SI2: For the proportion of children included in the analysis, maybe be precise that the proportion was computed based on the ones with acquired data. Maybe also add the proportion according to all children, to better show the high drop-out rate at certain ages?

      We thank the reviewer for these useful suggestions. We have specified in the legend of the table how we calculated the proportion of infants included as follows: ‘The proportion of children included in the analysis was computed based on the infants with FC data’. We have also added a column in the table called ‘Inclusion rate (from the 204 infants recruited)’, following the reviewer’s suggestion. This will be a useful reference for future studies.

      - A few typos should be corrected throughout the manuscript.

      We thoroughly revised the main manuscript and the supplementary materials for typos.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Building on previous in vitro synaptic circuit work (Yamawaki et al., eLife 10, 2021), Piña Novo et al. utilize an in vivo optogenetic-electrophysiological approach to characterize sensory-evoked spiking activity in the mouse's forelimb primary somatosensory (S1) and motor (M1) areas. Using a combination of a novel "phototactile" somatosensory stimuli to the mouse's hand and simultaneous high-density linear array recordings in both S1 and M1, the authors report in awake mice that evoked cortical responses follow a triphasic peak-suppression-rebound pattern response. They also find that M1 responses are delayed and attenuated relative to S1. Further analysis revealed a 20-fold difference in subcortical versus corticocortical propagation speeds.

      They also report that PV interneurons in S1 are strongly recruited by hand stimulation. Furthermore, they report that selective activation of PV cells can produce a suppression and rebound response similar to "phototactile" stimuli. Lastly, the authors demonstrate that silencing S1 through local PV cell activation reduces M1 response to hand stimulation, suggesting S1 may directly drive M1 responses.

      Strengths:

      The study was technically well done, with convincing results. The data presented are appropriately analyzed. The author's findings build on a growing body of both in vitro and in vivo work examining the synaptic circuits underlying the interactions between S1 and M1. The paper is well-written and illustrated. Overall, the study will be useful to those interested in forelimb S1-M1 interactions.

      Weaknesses:

      Although the results are clear and convincing, one weakness is that many results are consistent with previous studies in other sensorimotor systems, and thus not all that surprising. For example, the findings that sensory stimulation results in delayed and attenuated responses in M1 relative to S1 and that PV inhibitory cells in S1 are strongly recruited by sensory stimulation are not novel (e.g., Bruno et al., J Neurosci 22, 10966-10975, 2002; Swadlow, Philos Trans R Soc Lond B Biol Sci 357, 1717-1727, 2002; Gabernet et al., Neuron 48, 315-327, 2005; Cruikshank et al., Nat Neurosci 10, 462-468, 2007; Ferezou et al., Neuron 56, 907-923, 2007; Sreenivasan et al., Neuron 92, 1368-1382, 2016; Yu et al., Neuron 104, 412-427 e414, 2019). Furthermore, the observation that sensory processing in M1 depends upon activity in S1 is also not novel (e.g., Ferezou et al., Neuron 56, 907-923, 2007; Sreenivasan et al., Neuron 92, 1368-1382, 2016). The authors do a good job highlighting how their results are consistent with these previous studies.

      We thank the reviewer for the close reading of the manuscript and the many constructive comments and critiques. As the reviewer notes, there have been many prior studies of related circuits in other sensorimotor systems forming an important context for our study and findings, as we have tried to highlight. We appreciate the suggestions for additional relevant articles to cite.

      Perhaps a more significant weakness, in my opinion, was the missing analyses given the rich dataset collected. For example, why lump all responsive units and not break them down based on their depth? Given superficial and deep layers respond at different latencies and have different response magnitudes and durations to sensory stimuli (e.g., L2/3 is much more sparse) (e.g., Constantinople et al., Science 340, 1591-1594, 2013; Manita et al., Neuron 86, 1304-1316, 2015; Petersen, Nat Rev Neurosci 20, 533-546, 2019; Yu et al., Neuron 104, 412-427 e414, 2019), their conclusions could be biased toward more active layers (e.g., L4 and L5). These additional analyses could reveal interesting similarities or important differences, increasing the manuscript's impact. Given the authors use high-density linear arrays, they should have this data.

      We have analyzed the activity patterns as a function of cortical depth, and now include these results in the manuscript as suggested. The key new finding is that the M1 responses are strongest in upper layers, consistent with expectations based on the excitatory corticocortical synaptic connectivity characterized previously. Changes to the manuscript include new figures (Figure 5; Figure 5 - figure supplement 1), which we explain (Methods: page 14, lines 618-621), describe (new Results section: pages 4-5, lines 183-189), comment on (Discussion: page 9, lines 378-391), and summarize the significance of (Abstract: page 1, lines 22-24). In addition, we incorporated the new laminar analysis into a summary schematic (Figure 9). We thank the reviewer for suggesting this analysis.

      Similarly, why not isolate and compare PV versus non-PV units in M1? They did the photostimulation experiments and presumably have the data. Recent in vitro work suggests PV neurons in the upper layers (L2/3) of M1 are strongly recruited by S1 (e.g., Okoro et al., J Neurosci 42, 8095-8112, 2022; Martinetti et al., Cerebral cortex 32, 1932-1949, 2022). Does the author's data support these in vitro observations?

      These experiments were relatively complex and M1 optotagging was not routinely included in the stimulus and acquisition protocol. Therefore, we don’t have sufficient data for this analysis. We plan to address this in future studies.

      It would have also been interesting to suppress M1 while stimulating the hand to determine if any part of the S1 triphasic response depends on M1 feedback.

      We agree that this is of interest but consider this to be outside the scope of the current study.

      I appreciate the control experiment showing that optical hand stimulation did not evoke forelimb movement. However, this appears to be an N=1. How consistent was this result across animals, and how was this monitored in those animals? Can the authors say anything about digit movement?

      We have performed additional experiments to address this point. A constraint with EMG is that it is limited to the muscle(s) one chooses to record from, and it is difficult to implant tiny muscles of the hand. Therefore, for this analysis, we used kilohertz videography as a high-sensitivity method for movement surveillance across the hand. Hand stimulation did not evoke any detectable movements. Changes in the manuscript include: revised Figure 1 - figure supplement 1; supplementary Figure 1 - video 1; and associated text edits in the Methods (page 13, line 557; page 14, lines 626-639) and Results sections (page 2, lines 84-85).

      A light intensity of 5 mW was used to stimulate the hand, but it is unclear how or why the authors chose this intensity. Did S1 and M1 responses (e.g., amplitude and latency) change with lower or higher intensities? Was the triphasic response dependent on the intensity of the "phototactile" stimuli?

      As we now say in the Methods > Optogenetic photostimulation of the hand section (page 13, lines 562-565), “This intensity was chosen based on pilot experiments in which we varied the LED power, which showed that this intensity was reliably above the threshold for evoking robust responses in both S1 and M1 without evoking any visually detectable movements (as subsequently confirmed by videography)”.

      Reviewer #2 (Public review):

      Summary:

      Communication between sensory and motor cortices is likely to be important for many aspects of behavior, and in this study, the authors carefully analyse neuronal spiking activity in S1 and M1 evoked by peripheral paw stimulation, finding clear evidence for sensory responses in both cortical regions.

      Strengths:

      The experiments and data analyses appear to have been carefully carried out and clearly represented.

      Weaknesses:

      (1) Some studies have found evidence for excitatory projection neurons expressing PV and in particular some excitatory pyramidal cells can be labelled in PV-Cre mice. The authors might want to check if this is the case in their study, and if so, whether that might impact any conclusions.

      Thank you for pointing this out. The prior studies suggest it is mainly a subset of layer 5B excitatory neurons that may express PV. We checked this in two ways. Anatomically, we did not find double-labeling. An electrophysiology assay showed that, although some evoked excitatory synaptic input could be detected in some neurons, these inputs were very weak. Results from these assays are shown in new Figure 6 - figure supplement 1, with associated text edits in the Methods (page 11, lines 469-471; page 15, lines 657-668) and Results (page 5, lines 198-199) sections.

      (2) I think the analysis shown in Figure S1 apparently reporting the absence of movements evoked by the forepaw stimulation could be strengthened. It is unclear what is shown in the various panels. I would imagine that an average of many stimulus repetitions would be needed to indicate whether there is an evoked movement or not. This could also be state-dependent and perhaps more likely to happen early in a recording session. Videography could also be helpful.

      As noted above, we have performed additional experiments to address this.

      (3) Some similar aspects of the evoked responses, including triphasic dynamics, have been reported in whisker S1 and M1, and the authors might want to cite Sreenivasan et al., 2016.

      Thank you for pointing this out; we now cite this article (page 1, line 46; page 10, line 415).

      Reviewer #3 (Public review):

      Summary:

      This is a solid study of stimulus-evoked neural activity dynamics in the feedforward pathway from mouse hand/forelimb mechanoreceptor afferents to S1 and M1 cortex. The conclusions are generally well supported, and match expectations from previous studies of hand/forelimb circuits by this same group (Yamawaki et al., 2021), from the well-studied whisker tactile pathway to whisker S1 and M1, and from the corresponding pathway in primates. The study uses the novel approach of optogenetic stimulation of PV afferents in the periphery, which provides an impulse-like volley of peripheral spikes, which is useful for studying feedforward circuit dynamics. These are primarily proprioceptors, so results could differ for specific mechanoreceptor populations, but this is a reasonable tool to probe basic circuit activation. Mice are awake but not engaged in a somatosensory task, which is sufficient for the study goals.

      The main results are:

      (1) brief peripheral activation drives brief sensory-evoked responses at ~ 15 ms latency in S1 and ~25 ms latency in M1, which is consistent with classical fast propagation on the subcortical pathway to S1, followed by slow propagation on the polysynaptic, non-myelinated pathway from S1 to M1;

      (2) each peripheral impulse evokes a triphasic activation-suppression-rebound response in both S1 and M1;

      (3) PV interneurons carry the major component of spike modulation for each of these phases;

      (4) activation of PV neurons in each area (M1 or S1) drives suppression and rebound both in the local area and in the other downstream area;

      (5) peripheral-evoked neural activity in M1 is at least partially dependent on transmission through S1.

      All conclusions are well-supported and reasonably interpreted. There are no major new findings that were not expected from standard models of somatosensory pathways or from prior work in the whisker system.

      Strengths:

      This is a well-conducted and analyzed study in which the findings are clearly presented. This will provide important baseline knowledge from which studies of more complex sensorimotor processing can build.

      Weaknesses:

      A few minor issues should be addressed to improve clarity of presentation and interpretation:

      (1) It is critical for interpretation that the stimulus does not evoke a motor response, which could induce reafference-based activity that could drive, or mask, some of the triphasic response. Figure S1 shows that no motor response is evoked for one example session, but this would be stronger if results were analyzed over several mice.

      As noted above, we have performed additional experiments to address this point.

      (2) The recordings combine single and multi-units, which is fine for measures of response modulation, but not for absolute evoked firing rate, which is only interpretable for single units. For example, evoked firing rate in S1 could be higher than M1, if spike sorting were more difficult in S1, resulting in a higher fraction of multi-units relative to M1. Because of this, if reporting of absolute firing rates is an essential component of the paper, Figs 3D and 4E should be recalculated just for single units.

      Thank you for noting this. Although the absolute firing rates are not essential for the main findings or conclusions (which as noted focus on response modulations and relative differences) we agree that analyzing the single-unit response amplitudes is useful. Therefore, changes in the manuscript now include: revised Figure 3, and associated text edits in the Methods (page 12, lines 543-545), Results (page 3, lines 115-119), and Discussion (page 7, lines 305-311) sections.

      (3) In Figure 5B, the average light-evoked firing rate of PV neurons seems to come up before time 0, unlike the single-trial rasters above it. Presumably, this reflects binning for firing rate calculation. This should be corrected to avoid confusion.

      Yes, this reflects the binning. We agree that this is potentially confusing and have removed these average plots below the raster plots, as the rasters alone suffice to demonstrate the result (i.e., that PV units are strongly activated and thus tagged by optogenetic stimulation). Changes are now reflected in revised Figure 6.
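      A minimal sketch of how binning alone can make an averaged rate appear to rise before time 0, even when every spike occurs after stimulus onset. The spike times, bin width, and bin placement are hypothetical, since the binning used for the original plot is not described here.

```python
def binned_rate(spike_times_ms, centers_ms, width_ms):
    """Firing rate (spikes/s) in bins of `width_ms` centred on `centers_ms`."""
    half = width_ms / 2
    return [sum(c - half <= t < c + half for t in spike_times_ms) * 1000 / width_ms
            for c in centers_ms]


# Hypothetical single-trial spikes, all occurring after onset at t = 0 ms.
SPIKES_MS = [1.0, 2.0, 3.5, 4.0]
WIDTH_MS = 10

# Bins centred on multiples of 10 ms: the bin at 0 spans [-5, 5) ms, so it
# mixes pre-stimulus time with post-stimulus spikes.
STRADDLING = [-20, -10, 0, 10, 20]
RATE_STRADDLING = binned_rate(SPIKES_MS, STRADDLING, WIDTH_MS)

# Bins with an edge exactly at 0 ms keep pre- and post-stimulus time apart.
ALIGNED = [-15, -5, 5, 15, 25]
RATE_ALIGNED = binned_rate(SPIKES_MS, ALIGNED, WIDTH_MS)

print(dict(zip(STRADDLING, RATE_STRADDLING)))
print(dict(zip(ALIGNED, RATE_ALIGNED)))
```

      In the first layout the bin that begins at -5 ms already carries a nonzero rate, whereas with an edge at 0 ms the last pre-stimulus bin stays empty.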

      (4) In Figure 6A bottom, please clarify what legends "W. suppression" and "W. rebound" mean.

      In the figure plot legends, the “W.” has been removed. Changes are now reflected in revised Figure 7 and Figure 7 – figure supplement 1.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Did you filter the neural signals during acquisition? If so, please include these details in the results.

      Signals were bandpass-filtered (2.5 Hz to 7.6 kHz) at the hardware level at acquisition (with no additional software filtering applied), as now clarified in the Methods > Electrophysiological recordings section as requested (page 12, lines 525-526).

      Reviewer #2 (Recommendations for the authors):

      (1) Some studies have found evidence for excitatory projection neurons expressing PV and in particular some excitatory pyramidal cells can be labelled in PV-Cre mice. The authors might want to check if this is the case in their study, and if so, whether that might impact any conclusions.

      Please see above for our response to this issue.

      (2) I think the analysis shown in Figure S1 apparently reporting the absence of movements evoked by the forepaw stimulation could be strengthened. It is unclear what is shown in the various panels. I would imagine that an average of many stimulus repetitions would be needed to indicate whether there is an evoked movement or not. This could also be state-dependent and perhaps more likely to happen early in a recording session. Videography could also be helpful.

      Please see above for our response to this issue.

      (3) Some similar aspects of the evoked responses, including triphasic dynamics, have been reported in whisker S1 and M1, and the authors might want to cite Sreenivasan et al., 2016.

      As noted above, we now cite this study.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable work explores how synaptic activity encodes information during memory tasks. All reviewers agree that the quality of the work is high. Although experimental data do support the possibility that phospholipase diacylglycerol signaling and synaptotagmin 7 (Syt7) dynamically regulate the vesicle pool required for presynaptic release, concerns remain that the central finding of paired pulse depression at very short intervals was more likely caused by Ca²⁺ channel inactivation than pool depletion. Overall, this is a solid study with valuable findings, but the results warrant consideration of alternative interpretations.

      We greatly appreciate the invaluable and constructive comments from the Editors and Reviewers, and we thank them for their time and patience. We are pleased that our manuscript has been assessed as valuable and solid.

      One of the most critical concerns was the possible involvement of Ca<sup>2+</sup> channel inactivation in the strong paired pulse depression (PPD). To address it, we measured the total (free plus buffered) calcium increments induced by each of the first four APs in 40 Hz trains at axonal boutons of prelimbic layer 2/3 pyramidal cells. We found that the first four Ca<sup>2+</sup> increments were not different from one another, arguing against a contribution of Ca<sup>2+</sup> channel inactivation to PPD. Please see our reply to the 2nd issue in the Weakness section of Reviewer #3.

      The second critical issue was on the definition of ‘vesicular probability’. Previously, vesicular probability (p<sub>v</sub>) has been used with reference to the releasable vesicle pool which includes not only tightly docked vesicles but also reluctant vesicles. On the other hand, the meaning of p<sub>v</sub> in the present study is the release probability of tightly docked vesicles. We clarified this point in our replies to the 1st issues in the Weakness sections of Reviewer #2 and Reviewer #3.

      Our point-by-point replies to the Reviewers’ comments are given below.

      Public Reviews:

      Reviewer #1 (Public review):

      Shin et al. conduct extensive electrophysiological and behavioral experiments to study the mechanisms of short-term synaptic plasticity at excitatory synapses in layer 2/3 of the rat medial prefrontal cortex. The authors interestingly find that short-term facilitation is driven by progressive overfilling of the readily releasable pool, and that this process is mediated by phospholipase C/diacylglycerol signaling and synaptotagmin-7 (Syt7). Specifically, knockdown of Syt7 not only abolishes the refilling rate of vesicles with high fusion probability, but it also impairs the acquisition of trace fear memory. Overall, the authors offer novel insight to the field of synaptic plasticity through well-designed experiments that incorporate a range of techniques.

      Reviewer #2 (Public review):

      Summary:

      Shin et al. aim to identify, in a very extensive piece of work, a mechanism that contributes to the dynamic regulation of synaptic output in the rat cortex on the timescale of seconds. This mechanism is related to a new, powerful model that is well suited to test whether the pool of SVs ready for fusion is dynamically scaled to adjust supply and demand. The methods applied are state-of-the-art and address quantitative aspects with a high signal-to-noise ratio. In addition, the authors examine excitatory output onto both glutamatergic and GABAergic neurons, which provides important information on how general the observed signals are in neural networks. The results are compellingly clear and show that pool regulation may be predominantly responsible. Their results suggest that regulation of release probability, the alternative contender for regulation, is unlikely to be involved in the observed short-term plasticity behavior (but see below). Besides providing a clear analysis of the underlying physiology, they test two molecular contenders for the observed mechanism by showing that Synaptotagmin 7 function and Ca-dependent phospholipase activity both seem critical for the short-term plasticity behavior. The authors go on to test the in vivo role of the mechanism by modulating Syt7 function and examining working memory tasks as well as overall changes in network activity using immediate early gene activity. Finally, they model their data, providing strong support for their interpretation of TS pool occupancy regulation.

      Strengths:

      This is a very thorough study, addressing the research question from many different angles and the experimental execution is superb. The impact of the work is high, as it applies recent models of short term plasticity behavior to in vivo circuits further providing insights how synapses provide dynamic control to enable working memory related behavior through nonpermanent changes in synaptic output.

      Weaknesses:

      (1) While this work is carefully examined and the results are presented and discussed in a detailed manner, the reviewer is still not fully convinced that regulation of release probability is not a putative contributor to the observed behavior. No additional work is needed, but at the moment I am not convinced that changes in release probability are not in play. One solution may be to extend the discussion of changes in release probability as an alternative.

      Quantal content (m) depends on n * p<sub>v</sub>, where n = RRP size and p<sub>v</sub> = vesicular release probability. The value for p<sub>v</sub> critically depends on the definition of RRP size. Recent studies revealed that docked vesicles have differential priming states: loosely or tightly docked state (LS or TS, respectively). Because the RRP size estimated by hypertonic solution or long presynaptic depolarization is larger than that by back extrapolation of a cumulative EPSC plot (Moulder & Mennerick, 2005; Sakaba, 2006) in glutamatergic synapses, the former RRP (denoted as RRP<sub>hyper</sub>) may encompass not only AP-evoked fast-releasing vesicles (TS vesicles) but also reluctant vesicles (LS vesicles). Because we measured p<sub>v</sub> based on AP-evoked EPSCs such as strong paired pulse depression (PPD) and associated failure rates, p<sub>v</sub> in the present study denotes vesicular fusion probability of TS vesicles, not that of LS plus TS vesicles.

      Recent studies suggest that release sites are not fully occupied by TS vesicles in the baseline (Miki et al., 2016; Pulido and Marty, 2018; Malagon et al., 2020; Lin et al., 2022). Instead, the occupancy (p<sub>occ</sub>) by TS vesicles is subject to dynamic regulation by reversible rate constants (denoted by k<sub>1</sub> and b<sub>1</sub>, respectively). The number of TS vesicles (n) can be factored into the number of release sites (N) and p<sub>occ</sub>, among which N is a fixed parameter but p<sub>occ</sub> depends on k<sub>1</sub>/(k<sub>1</sub>+b<sub>1</sub>) under the framework of the simple refilling model (see Methods). Because these refilling rate constants are regulated by Ca<sup>2+</sup> (Hosoi, et al., 2008), p<sub>occ</sub> is not a fixed parameter. Therefore, release probability should be re-defined as p<sub>occ</sub> * p<sub>v</sub>. Given that N is fixed, the increase in release probability is a major player in STF. Our study asserts that STF by 2.3 times can be attributed to an increase in p<sub>occ</sub> rather than p<sub>v</sub>, because p<sub>v</sub> is close to unity (Fig. S8). Moreover, strong PPD was observed not only in the baseline but also early in and in the middle of a train (Fig. 2 and 7) and during the recovery phase (Fig. 3), arguing against a gradual increase in p<sub>v</sub> of reluctant vesicles.
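
The factorization above can be illustrated with a short numerical sketch. The rate constants below are hypothetical values chosen only so that k<sub>1</sub> + b<sub>1</sub> = 23/s and the baseline occupancy is 0.3; they are not fitted parameters from the study. Under these assumptions, the sketch compares how much facilitation is available from p<sub>v</sub> alone (if baseline p<sub>v</sub> is at least 0.8) with how much is available from p<sub>occ</sub> alone.

```python
def steady_state_occupancy(k1: float, b1: float) -> float:
    """Baseline occupancy of release sites by TS vesicles, k1 / (k1 + b1)."""
    return k1 / (k1 + b1)


def release_probability(p_occ: float, p_v: float) -> float:
    """Release probability per site, defined as p_occ * p_v."""
    return p_occ * p_v


def max_fold_increase(p_baseline: float) -> float:
    """Largest fold increase available to a probability that cannot exceed 1."""
    return 1.0 / p_baseline


# Hypothetical rate constants (per second); illustrative only.
K1, B1 = 6.9, 16.1
P_V = 1.0  # fusion probability of TS vesicles, assumed to be unity

p_occ_base = steady_state_occupancy(K1, B1)
p_r_base = release_probability(p_occ_base, P_V)

# Occupancy required to reach 2.3-fold facilitation with N and p_v fixed.
p_occ_needed = 2.3 * p_occ_base

# Ceilings on facilitation from each factor taken alone.
ceiling_from_p_v = max_fold_increase(0.8)
ceiling_from_p_occ = max_fold_increase(p_occ_base)

print(f"baseline p_occ = {p_occ_base:.2f}, baseline p_r = {p_r_base:.2f}")
print(f"p_occ needed for 2.3-fold STF = {p_occ_needed:.2f}")
print(f"max fold change from p_v alone = {ceiling_from_p_v:.2f}")
print(f"max fold change from p_occ alone = {ceiling_from_p_occ:.2f}")
```

With these assumed numbers, a rise in p<sub>v</sub> from 0.8 to 1 could account for at most a 1.25-fold increase, whereas a rise in occupancy leaves room for the observed 2.3-fold facilitation.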

      We imagine that the Reviewer meant vesicular release or fusion probability (p<sub>v</sub>) by ‘release probability’. If so, p<sub>v</sub> (of TS vesicles) cannot be a major player in STF, because the baseline p<sub>v</sub> is already higher than 0.8 even if it is most parsimoniously estimated (Fig. 2). Moreover, considering very high refilling rate (23/s), the high double failure rate cannot be explained without assuming that p<sub>v</sub> is close to unity (Fig. S8).

      Conventional models for facilitation assume a post-AP residual Ca<sup>2+</sup>-dependent step increase in p<sub>v</sub> of RRP (Dittman et al., 2000) or reluctant vesicles (Turecek et al., 2016). Given that p<sub>v</sub> of TS vesicles is close to one, an increase in p<sub>v</sub> of TS vesicles cannot account for facilitation. The possibility of an activity-dependent increase in the fusion probability of LS vesicles (denoted as p<sub>v,LS</sub>) should be considered in two ways, depending on whether LS and TS vesicles reside in distinct pools or in the same pool. Notably, strong PPD at short ISI implies that p<sub>v,LS</sub> is near zero at the resting state. Whereas LS vesicles do not contribute to baseline transmission, short-term facilitation (STF) may be mediated by a cumulative increase in p<sub>v,LS</sub> of vesicles that reside in a distinct pool. Because the increase in p<sub>v,LS</sub> during facilitation recruits new release sites (increase in N), the variance of EPSCs should become larger as stimulation frequency increases, resulting in upward deviation from a parabola in the V-M plane, as shown in recent studies (Valera et al., 2012; Kobbersmed et al., 2020). This prediction is not compatible with our results of V-M analysis (Fig. 3), showing that EPSCs during STF fell on the same parabola regardless of stimulation frequencies. Therefore, it is unlikely that an increase in fusion probability of reluctant vesicles residing in a distinct release pool mediates STF in the present study.

      For the latter case, in which LS and TS vesicles occupy the same release sites, it is hard to distinguish a step increase in fusion probability of LS vesicles from a conversion of LS vesicles to TS. Nevertheless, our results do not support the possibility of a gradual increase in p<sub>v,LS</sub> that occurs in parallel with STF. Strong PPD, indicative of high p<sub>v</sub>, was consistently found not only in the baseline (Fig. 2 and Fig. S6) but also during the post-tetanic augmentation phase (Fig. 3D) and even during the early development of facilitation (Fig. 2D-E and Fig. 7), arguing against a gradual increase in p<sub>v,LS</sub>. One may argue that STF may be mediated by a drastic step increase of p<sub>v,LS</sub> from zero to one, but it is not distinguishable from conversion of LS to TS vesicles.

      To address the reviewer’s concern, we incorporated these perspectives into Discussion and further clarified the reasoning behind our conclusions.

      References

      Moulder KL, Mennerick S (2005) Reluctant vesicles contribute to the total readily releasable pool in glutamatergic hippocampal neurons. J Neurosci 25:3842–3850.

      Sakaba, T (2006) Roles of the fast-releasing and the slowly releasing vesicles in synaptic transmission at the calyx of Held. J Neurosci 26(22): 5863-5871.

      Please note that papers cited in the manuscript are not repeated here.

      (2) Fig 3 I am confused about the interpretation of the Mean Variance analysis outcome. Since the data points follow the curve during induction of short term plasticity, aren't these suggesting that release probability and not the pool size increases? Related, to measure the absolute release probability and failure rate using the optogenetic stimulation technique is not trivial as the experimental paradigm bias the experiment to a given output strength, and therefore a change in release probability cannot be excluded.

      Under the recent definition of release probability, it can be factored into p<sub>v</sub> and p<sub>occ</sub>, which are the fusion probability of TS vesicles and the occupancy of release sites by TS vesicles, respectively. In this regard, our interpretation of the Variance-Mean results is consistent with the conventional one: different data points along a parabola represent a change in release probability (= p<sub>occ</sub> x p<sub>v</sub>). Our novel finding is that the increase in release probability should be attributed to an increase in p<sub>occ</sub>, not to that in p<sub>v</sub>.
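
The variance-mean argument can be sketched numerically. The example below assumes a simple binomial release model with uniform release probability across sites and no quantal variability, and uses hypothetical values for the number of sites (N = 5) and quantal size (q = 10 pA); none of these numbers come from the study. It shows that changing release probability with N fixed moves a point along one parabola, whereas reaching the same mean by recruiting additional sites places the point above that parabola.

```python
def binomial_mean_variance(n_sites: int, p_release: float, q: float):
    """Mean and variance of the EPSC amplitude under a simple binomial model."""
    mean = n_sites * p_release * q
    variance = n_sites * p_release * (1.0 - p_release) * q ** 2
    return mean, variance


def parabola_variance(mean: float, n_sites: int, q: float) -> float:
    """Variance predicted by the V-M parabola: q * M - M**2 / N."""
    return q * mean - mean ** 2 / n_sites


Q = 10.0      # hypothetical quantal size in pA
N_FIXED = 5   # hypothetical number of release sites

# Case 1: release probability (p_occ * p_v) rises while N stays fixed.
same_parabola = []
for p in (0.3, 0.5, 0.69):
    mean, variance = binomial_mean_variance(N_FIXED, p, Q)
    same_parabola.append((mean, variance, parabola_variance(mean, N_FIXED, Q)))

# Case 2: the same facilitated mean is reached by recruiting extra sites.
N_RECRUITED = 8
target_mean = same_parabola[-1][0]
p_recruited = target_mean / (N_RECRUITED * Q)
mean_recruited, variance_recruited = binomial_mean_variance(N_RECRUITED, p_recruited, Q)
excess = variance_recruited - parabola_variance(mean_recruited, N_FIXED, Q)

for mean, variance, predicted in same_parabola:
    print(f"N fixed: mean={mean:.1f}, variance={variance:.2f}, parabola={predicted:.2f}")
print(f"N recruited: mean={mean_recruited:.1f}, variance above parabola by {excess:.2f}")
```

In this sketch, an upward deviation from the original parabola is the signature of an increase in N, which is the outcome the V-M data in Fig. 3 are reported not to show.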

      (3) Fig4B interprets the phorbol ester stimulation to be the result of pool overfilling, however, phorbol ester stimulation has also been shown to increase release probability without changing the size of the readily releasable pool. The high frequency of stimulation may occlude an increased paired pulse depression in presence of OAG, which others have interpreted in mammalian synapses as an increase in release probability.

      In our experience with the calyx of Held synapses, OAG, a DAG analogue, increased the fast releasing vesicle pool (FRP) size (Lee JS et al., 2013), consistent with our interpretation (pool overfilling). Once the release sites are overfilled in the presence of OAG, it is expected that the maximal STF (ratio of facilitated to baseline EPSCs) becomes lower as long as the number of release sites (N) is limited. As mentioned above, the baseline p<sub>v</sub> is already close to one, and thus it cannot be further increased by OAG. Instead, the baseline p<sub>occ</sub> seems to be increased by OAG.

      Reference

      Lee JS, et al., Superpriming of synaptic vesicles after their recruitment to the readily releasable pool. Proc Natl Acad Sci U S A, 2013. 110(37): 15079-84.

      (4) The literature on Syt7 function is still quite controversial. One observation in the literature is that loss of Syt7 function in the fly synapse leads to an increase in release probability. Thus the observed changes in short term plasticity characteristics in the Syt7 KD experiments may contain a release probability component. Can the authors really exclude this possibility? Figure 5 shows for the Syt7 KD group a very prominent depression of the EPSC/IPSC with the second stimulus, particularly for the short interpulse intervals, usually a strong sign of increased release probability, as lack of pool refilling is unlikely to explain the strong drop in synaptic output.

      The reviewer raises an interesting point regarding the potential link between Syt7 KD and increased initial p<sub>v</sub>, particularly in light of observations in Drosophila synapses (Guan et al., 2020; Fujii et al., 2021), in which Syt7 mutants exhibited elevated initial p<sub>v</sub>. However, it is important to note that these findings markedly differ from those in mammalian systems, where the role of Syt7 in regulating initial p<sub>v</sub> has been extensively studied. In rodents, consistent evidence indicates that Syt7 does not significantly affect initial p<sub>v</sub>, as demonstrated in several studies (Jackman et al., 2016; Chen et al., 2017; Turecek and Regehr, 2018). Furthermore, in our study of excitatory synapses in the mPFC layer 2/3, we observed an initial p<sub>v</sub> already near its maximal level, approaching a value of 1. Consequently, it is unlikely that the loss of Syt7 could further elevate the initial p<sub>v</sub>. Instead, such effects are more plausibly explained by alternative mechanisms, such as alterations in vesicle replenishment dynamics, rather than a direct influence on p<sub>v</sub>.

      References

      Chen, C., et al., Triple Function of Synaptotagmin 7 Ensures Efficiency of High-Frequency Transmission at Central GABAergic Synapses. Cell Rep, 2017. 21(8): 2082-2089.

      Fujii, T., et al., Synaptotagmin 7 switches short-term synaptic plasticity from depression to facilitation by suppressing synaptic transmission. Scientific reports, 2021. 11(1): 4059.

      Guan, Z., et al., Drosophila Synaptotagmin 7 negatively regulates synaptic vesicle release and replenishment in a dosage-dependent manner. Elife, 2020. 9: e55443.

      Jackman, S.L., et al., The calcium sensor synaptotagmin 7 is required for synaptic facilitation. Nature, 2016. 529(7584): 88-91.

      Turecek, J. and W.G. Regehr, Synaptotagmin 7 mediates both facilitation and asynchronous release at granule cell synapses. Journal of Neuroscience, 2018. 38(13): 3240-3251.

      Reviewer #3 (Public review):

      Summary:

      The report by Shin, Lee, Kim, and Lee entitled "Progressive overfilling of readily releasable pool underlies short-term facilitation at recurrent excitatory synapses in layer 2/3 of the rat prefrontal cortex" describes electrophysiological experiments of short-term synaptic plasticity during repetitive presynaptic stimulation at synapses between layer 2/3 pyramidal neurons and nearby target neurons. Manipulations include pharmacological inhibition of PLC and actin polymerization, activation of DAG receptors, and shRNA knockdown of Syt7. The results are interpreted as support for the hypothesis that synaptic vesicle release sites are vacant most of the time at resting synapses (i.e., p_occ is low) and that facilitation (and augmentation) components of short-term enhancement are caused by an increase in occupancy, presumably because of acceleration of the transition from not-occupied to occupied. The report additionally describes behavioural experiments where trace fear conditioning is degraded by knocking down syt7 in the same synapses.

      Strengths:

      The strength of the study is in the new information about short-term plasticity at local synapses in layer 2/3, and the major disruption of a memory task after eliminating short-term enhancement at only 15% of excitatory synapses in a single layer of a small brain region. The local synapses in layer 2/3 were previously difficult to study, but the authors have overcome a number of challenges by combining channel rhodopsins with in vitro electroporation, which is an impressive technical advance.

      Weaknesses:

      (1) The question of whether or not short-term enhancement causes an increase in p_occ (i.e., "readily releasable pool overfilling") is important because it cuts to the heart of the ongoing debate about how to model short term synaptic plasticity in general. However, my opinion is that, in their current form, the results do not constitute strong support for an increase in p_occ, even though this is presented as the main conclusion. Instead, there are at least two alternative explanations for the results that both seem more likely. Neither alternative is acknowledged in the present version of the report.

      The evidence presented to support overfilling is essentially two-fold. The first is strong paired pulse depression of synaptic strength when the interval between action potentials is 20 or 25 ms, but not when the interval is 50 ms. Subsequent stimuli at frequencies between 5 and 40 Hz then drive enhancement. The second is the observation that a slow component of recovery from depression after trains of action potentials is unveiled after eliminating enhancement by knocking down syt7. Of the two, the second is predicted by essentially all models where enhancement mechanisms operate independently of release site depletion - i.e., transient increases in p_occ, p_v, or even N - so isn't the sort of support that would distinguish the hypothesis from alternatives (Garcia-Perez and Wesseling, 2008, https://doi.org/10.1152/jn.01348.2007).

      The apparent discrepancy in the interpretation of post-tetanic augmentation between the present and previous papers [Stevens and Wesseling (1999), Garcia-Perez and Wesseling (2008)] is an important issue that should be clarified. We noted that different meanings of ‘vesicular release probability’ in these papers are responsible for the discrepancy. We added an explanation to the Discussion on the difference in the meaning of ‘vesicular release probability’ between the present study and previous studies [Stevens and Wesseling (1999), Garcia-Perez and Wesseling (2008)]. In summary, p<sub>v</sub> in the present study refers to the vesicular release probability of TS vesicles, while previous studies used it for the vesicular release probability of vesicles in the RRP, which include LS and TS vesicles. Accordingly, p<sub>occ</sub> in the present study is the occupancy of release sites by TS vesicles.

      Not only double failure rate but also other failure rates upon paired pulse stimulation were best fitted at p<sub>v</sub> close to 1 (Fig. S8 and associated text). Moreover, strong PPD, indicating release of vesicles with high p<sub>v</sub>, was observed not only at the beginning of a train but also in the middle of a 5 Hz train (Fig. 2D), during the augmentation phase after a 40 Hz train (Fig 3D), and in the recovery phase after three pulse bursts (Fig. 7). Given that p<sub>v</sub> is close to 1 throughout the EPSC trains and that N does not increase during a train (Fig. 3), synaptic facilitation can be attained only by the increase in p<sub>occ</sub> (occupancy of release sites by TS vesicles). In addition, it should be noted that Fig. 7 demonstrates strong PPD during the recovery phase after depletion of TS vesicles by three pulse bursts, indicating that recovered vesicles after depletion display high p<sub>v</sub> too. Knock-down of Syt7 slowed the recovery of TS vesicles after depletion of TS vesicles, highlighting that Syt7 accelerates the recovery of TS vesicles following their depletion.

      As addressed in our reply to the first issue raised by Reviewer #2 and the third issue raised by Reviewer #3, our results do not support possibilities for recruitment of new release sites (increase in N) having low p<sub>v</sub> or for a gradual increase in p<sub>v</sub> of reluctant vesicles during short-term facilitation.  

      The following statement was added to the Discussion in the revised manuscript:

      “Previous studies suggested that an increase in p<sub>v</sub> is responsible for post-tetanic augmentation (Stevens and Wesseling, 1999; Garcia-Perez and Wesseling, 2008) by observing invariance of the RRP size after tetanic stimulation. In these studies, the RRP size was estimated by hypertonic sucrose solution or as the sum of EPSCs evoked 20 Hz/60 pulses train (denoted as ‘RRP<sub>hyper</sub>’). Because reluctant vesicles (called LS vesicles) can be quickly converted to TS vesicles (16/s) and are released during a train (Lee et al., 2012), it is likely that the RRP size measured by these methods encompasses both LS and TS vesicles. In contrast, we assert high p<sub>v</sub> based on the observation of strong PPD and failure rates upon paired stimulations at ISI of 20 ms (Fig. 2 and Fig. S8). Given that single AP-induced vesicular release occurs from TS vesicles but not from LS vesicles, p<sub>v</sub> in the present study indicates the fusion probability of TS vesicles. From the same reasons, p<sub>occ</sub> denotes the occupancy of release sites by TS vesicles. Note that our study does not provide direct clue whether release sites are occupied by LS vesicles that are not tapped by a single AP, although an increase in the LS vesicle number may accelerate the recovery of TS vesicles. As suggested in Neher (2024), even if the number of LS plus TS vesicles are kept constant, an increase in p<sub>occ</sub> (occupancy by TS vesicles) would be interpreted as an increase in ‘vesicular release probability’ as in the previous studies (Stevens and Wesseling (1999); Garcia-Perez and Wesseling (2008)) as long as it was measured based on RRP<sub>hyper</sub>.”

      (2) Regarding the paired pulse depression: The authors ascribe this to depletion of a homogeneous population of release sites, all with similar p_v. However, the details fit better with the alternative hypothesis that the depression is instead caused by quickly reversing inactivation of Ca<sup>2+</sup> channels near release sites, as proposed by Dobrunz and Stevens to explain a similar phenomenon at a different type of synapse (1997, PNAS, https://doi.org/10.1073/pnas.94.26.14843). The details that fit better with Ca<sup>2+</sup> channel inactivation include the combination of the sigmoid time course of the recovery from depression (plotted backwards in Fig1G,I) and observations that EGTA (Fig2B) increases the paired-pulse depression seen after 25 ms intervals. That is, the authors ascribe the sigmoid recovery to a delay in the activation of the facilitation mechanism, but the increased paired pulse depression after loading EGTA indicates, instead, that the facilitation mechanism has already caused p_r to double within the first 25 ms (relative to the value if the facilitation mechanism was not active). Meanwhile, Ca<sup>2+</sup> channel inactivation would be expected to cause a sigmoidal recovery of synaptic strength because of the sigmoidal relationship between Ca<sup>2+</sup>-influx and exocytosis (Dodge and Rahamimoff, 1967, https://doi.org/10.1113/jphysiol.1967.sp008367).

      The Ca<sup>2+</sup>-channel inactivation hypothesis could probably be ruled in or out with experiments analogous to the 1997 Dobrunz study, except after lowering extracellular Ca<sup>2+</sup> to the point where synaptic transmission failures are frequent. However, a possible complication might be a large increase in facilitation in low Ca<sup>2+</sup> (Fig2B of Stevens and Wesseling, 1999, https://doi.org/10.1016/s0896-6273(00)80685-6).

      We appreciate the reviewer's thoughtful comment regarding the potential role of Ca<sup>2+</sup> channel inactivation in the observed paired-pulse depression (PPD). As noted by the Reviewer, Dobrunz and Stevens (1997) suggested that the high double failure rate at short ISIs in synapses exhibiting PPD can be attributed to Ca<sup>2+</sup> channel inactivation. This interpretation seems to be based on the premise that the number of RRP vesicles does not vary from trial to trial. The number of TS vesicles, however, can be dynamically regulated depending on the parameters k<sub>1</sub> and b<sub>1</sub>, as shown in Fig. S8, implying that the high double failure rate at short ISIs cannot be solely attributed to Ca<sup>2+</sup> channel inactivation. Nevertheless, we acknowledge the possibility that Ca<sup>2+</sup> channel inactivation may contribute to PPD, and we have therefore investigated it further. Specifically, we measured action potential (AP)-evoked Ca<sup>2+</sup> transients at individual axonal boutons of layer 2/3 pyramidal cells in the mPFC using two-dye ratiometry techniques. Our analysis revealed no evidence for Ca<sup>2+</sup> channel inactivation during a 40 Hz train of APs. This finding indicates that voltage-gated Ca<sup>2+</sup> channel inactivation is unlikely to contribute to the pronounced PPD.

      Figure 2—figure supplement 2 shows how we measured the total Ca<sup>2+</sup> increments at axonal boutons. First, we estimated the endogenous Ca<sup>2+</sup>-binding ratio from analyses of single AP-induced Ca<sup>2+</sup> transients at different concentrations of Ca<sup>2+</sup> indicator dye (panels A to E). Then, using the Ca<sup>2+</sup> buffer properties, we converted free [Ca<sup>2+</sup>] amplitudes to total calcium increments for the first four AP-evoked Ca<sup>2+</sup> transients in a 40 Hz train (panels G-I). We incorporated these results into the revised version of our manuscript to provide evidence against Ca<sup>2+</sup> channel inactivation.

      (3) On the other hand, even if the paired pulse depression is caused by depletion of release sites rather than Ca<sup>2+</sup>-channel inactivation, there does not seem to be any support for the critical assumption that all of the release sites have similar p_v. And indeed, there seems to be substantial emerging evidence from other studies for multiple types of release sites with 5 to 20-fold differences in p_v at a wide variety of synapse types (Maschi and Klyachko, eLife, 2020, https://doi.org/10.7554/elife.55210; Rodriguez Gotor et al, eLife, 2024, https://doi.org/10.7554/elife.88212 and refs. therein). If so, the paired pulse depression could be caused by depletion of release sites with high p_v, whereas the facilitation could occur at sites with much lower p_v that are still occupied. It might be possible to address this by eliminating assumptions about the distribution of p_v across release sites from the variance-mean analysis, but this seems difficult; simply showing how a few selected distributions wouldn't work - such as in standard multiple probability fluctuation analyses - wouldn't add much.

      We appreciate the reviewer’s insightful comments regarding the potential increase in p<sub>fusion</sub> of reluctant vesicles. It should be noted, however, that Maschi and Klyachko (2020) showed a distribution of release probability (p<sub>r</sub>) within a single active zone rather than a heterogeneity in p<sub>fusion</sub> of individual docked vesicles. Therefore both p<sub>occ</sub> and p<sub>v</sub> of TS vesicles would contribute to the p<sub>r</sub> distribution shown in Maschi and Klyachko (2020). 

      The Reviewer’s concern aligns closely with the first issue raised by Reviewer #2, which we addressed in detail. Briefly, new release sites are unlikely to be recruited during facilitation or post-tetanic augmentation, because the variance of EPSCs during and after a train fell on the same parabola (Fig. 3). Secondly, strong PPD was observed not only in the baseline but also during early and late phases of facilitation, indicating that vesicles with very high p<sub>v</sub> contribute to EPSCs throughout train stimulations (Fig. 2, 3, and 7). These findings argue against the possibilities of recruitment of new release sites harboring low p<sub>v</sub> vesicles and of a gradual increase in fusion probability of reluctant vesicles.

      To address the reviewers’ concern, we incorporated the perspectives into Discussion and further clarified the reasoning behind our conclusions.

      (4) In any case, the large increase - often 10-fold or more - in enhancement seen after lowering Ca<sup>2+</sup> below 0.25 mM at a broad range of synapses and neuro-muscular junctions noted above is a potent reason to be cautious about the LS/TS model. There is morphological evidence that the transitions from a loose to tight docking state (LS to TS) occur, and even that the timing is accelerated by activity. However, 10-fold enhancement would imply that at least 90 % of vesicles start off in the LS state, and this has not been reported. In addition, my understanding is that the reverse transition (TS to LS) is thought to occur within 10s of ms of the action potential, which is 10-fold too fast to account for the reversal of facilitation seen at the same synapses (Kusick et al, 2020, https://doi.org/10.1038/s41593-020-00716-1).

      As the Reviewer suggested, low external Ca<sup>2+</sup> concentration can lower release probability (p<sub>r</sub>). Given that both p<sub>v</sub> and p<sub>occ</sub> are regulated by [Ca<sup>2+</sup>]<sub>i</sub>, low external [Ca<sup>2+</sup>] may affect not only p<sub>v</sub> but also p<sub>occ</sub>, both of which would contribute to low p<sub>r</sub>. Under such conditions, it is plausible that the baseline p<sub>r</sub> becomes much lower than 0.1 due to low p<sub>v</sub> and p<sub>occ</sub> (for instance, if p<sub>v</sub> decreases from 1 to 0.5 and p<sub>occ</sub> from 0.3 to 0.1, then p<sub>r</sub> = 0.05), and then p<sub>r</sub> (= p<sub>v</sub> x p<sub>occ</sub>) has room to increase by a factor of ten (to 0.5, for example) through short-term facilitation as cytosolic [Ca<sup>2+</sup>] accumulates during a train.
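
The arithmetic in this example can be checked with a few lines of code. The values are the illustrative ones quoted in the reply (p<sub>v</sub> of 1 or 0.5, p<sub>occ</sub> of 0.3 or 0.1), not measurements, and the only added assumption is that p<sub>r</sub> cannot exceed 1, which sets the headroom available for facilitation.

```python
def release_probability(p_v: float, p_occ: float) -> float:
    """Release probability p_r as the product p_v * p_occ."""
    return p_v * p_occ


# Illustrative values taken from the example in the reply.
p_r_normal = release_probability(p_v=1.0, p_occ=0.3)   # normal external Ca2+
p_r_low = release_probability(p_v=0.5, p_occ=0.1)      # low external Ca2+

# Headroom for facilitation if p_r is capped at 1.
headroom_normal = 1.0 / p_r_normal
headroom_low = 1.0 / p_r_low

# A tenfold enhancement from the low-Ca2+ baseline.
p_r_after_tenfold = 10.0 * p_r_low

print(f"normal Ca2+: p_r = {p_r_normal:.2f}, headroom = {headroom_normal:.1f}-fold")
print(f"low Ca2+:    p_r = {p_r_low:.2f}, headroom = {headroom_low:.1f}-fold")
print(f"p_r after tenfold enhancement in low Ca2+ = {p_r_after_tenfold:.2f}")
```

Under these assumed values, tenfold enhancement is out of reach at the normal baseline but fits within the headroom at the low-Ca<sup>2+</sup> baseline.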

      If p<sub>v</sub> is close to one, p<sub>r</sub> depends on p<sub>occ</sub>, and thus facilitation depends on the number of TS vesicles just before the arrival of each AP of a train. Thus, post-train recovery from facilitation would depend on restoration of the equilibrium between TS and LS vesicles to the baseline. Even if the transition between LS and TS vesicles is very fast (tens of ms), the equilibrium involved in de novo priming (reversible transitions between the recycling vesicle pool and partially docked LS vesicles) seems to be much slower (13 s in Fig. 5A of Wu and Borst 1999). Thus, we can consider a two-step priming model (recycling pool -> LS -> TS), which comprises a slow 1st step (-> LS) and a fast 2nd step (-> TS). Under the framework of the two-step model, the slow 1st step (de novo priming step) is the rate-limiting step regulating the development and recovery kinetics of facilitation. Given that the on and off rates for Ca<sup>2+</sup> binding to Syt7 are slow, it is plausible that Syt7 may contribute to short-term facilitation (STF) by Ca<sup>2+</sup>-dependent acceleration of the 1st step (as shown in Fig. 9). During train stimulation, the number of LS vesicles would slowly accumulate in a Syt7- and Ca<sup>2+</sup>-dependent manner, and this increase in LS vesicles would shift the LS/TS equilibrium towards TS, resulting in STF. After tetanic stimulation, the recovery kinetics from facilitation would be limited by the slow recovery of LS vesicles.

      Reference

      Wu, L.-G. and Borst J.G.G. (1999) The reduced release probability of releasable vesicles during recovery from short-term synaptic depression. Neuron, 23(4): 821-832.

      Please note that papers cited in the manuscript are not repeated here.

      Individual points:

      (1) An additional problem with the overfilling hypothesis is that syt7 knockdown increases the estimate of p_occ extracted from the variance-mean analysis, which would imply a faster transition from unoccupied to occupied, and would consequently predict faster recovery from depression. However, recovery from depression seen in experiments was slower, not faster. Meanwhile, the apparent decrease in the estimate of N extracted from the mean-variance analysis is not anticipated by the authors' model, but fits well with alternatives where p_v varies extensively among release sites because release sites with low p_v would essentially be silent in the absence of facilitation.

      The slower recovery from depression observed in the Syt7 knockdown (KD) synapses (Fig. 7) may result from a deficiency in activity-dependent acceleration of TS vesicle recovery. Although basal occupancy was higher in the Syt7 KD synapses, this does not indicate a faster activity-dependent recovery.

      Moreover, higher baseline occupancy does not necessarily imply faster recovery of PPR. In fact, PPR recovery was slower in Syt7 KD synapses than in WT synapses (18.5 vs. 23/s). Under the framework of the simple refilling model (Fig. S8Aa), the baseline occupancy and the PPR recovery rate are given by k<sub>1</sub> / (k<sub>1</sub> + b<sub>1</sub>) and (k<sub>1</sub> + b<sub>1</sub>), respectively. The baseline occupancy thus depends on the ratio k<sub>1</sub>/b<sub>1</sub>, whereas the PPR recovery rate depends on the absolute values of k<sub>1</sub> and b<sub>1</sub>. Based on the p<sub>occ</sub> and the PPR recovery time constants of WT and KD synapses, we expect a higher k<sub>1</sub>/b<sub>1</sub> but a lower (k<sub>1</sub> + b<sub>1</sub>) in Syt7 KD synapses than in WT synapses.
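
      As a brief illustration of this point, the two relations of the simple refilling model can be inverted to obtain k<sub>1</sub> and b<sub>1</sub> from an occupancy and a recovery rate. In the sketch below, the recovery rates are those quoted above, whereas the p<sub>occ</sub> values (0.3 and 0.4) and the function name are assumptions made only for illustration.

```python
def refilling_rates(p_occ, recovery_rate):
    """Invert p_occ = k1 / (k1 + b1) and recovery_rate = k1 + b1."""
    k1 = p_occ * recovery_rate
    b1 = (1.0 - p_occ) * recovery_rate
    return k1, b1


# Recovery rates (per second) are quoted in the text; p_occ values are hypothetical.
conditions = {"WT": (0.3, 23.0), "Syt7 KD": (0.4, 18.5)}

for name, (p_occ, rate) in conditions.items():
    k1, b1 = refilling_rates(p_occ, rate)
    print(f"{name}: k1 = {k1:.2f}/s, b1 = {b1:.2f}/s, "
          f"k1/b1 = {k1 / b1:.2f}, k1 + b1 = {k1 + b1:.1f}/s")
```

      Under these assumed occupancies, the KD condition has the larger k<sub>1</sub>/b<sub>1</sub> ratio and the smaller sum, which is the combination of higher baseline occupancy and slower PPR recovery described above.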

      The lower number of release sites (N) in Syt7 KD synapses was not anticipated. As you suggested, such a low N might be ascribed to little recruitment of release sites during a train in KD synapses. However, our results do not support this model. If silent release sites were recruited during a train, the variance should deviate upward from the parabola predicted under a fixed N (Valera et al., 2012; Kobbersmed et al., 2020). This was not the case in our data (Fig. 3). In the first version of the manuscript, we argued against this possibility in lines 203-208.
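
      The expectation of an upward deviation can be sketched with the standard binomial variance-mean relation, variance = q x mean - mean^2 / N. The sketch assumes uniform release probability across sites and no quantal variability; the values of N and q are hypothetical and are not our estimates.

```python
def binomial_mean_variance(n_sites, p, q):
    """Mean and variance of the EPSC amplitude for n_sites independent sites."""
    mean = n_sites * p * q
    variance = n_sites * p * (1.0 - p) * q ** 2
    return mean, variance


def parabola_variance(mean, n_sites, q):
    """Variance predicted at a given mean by the fixed-N parabola."""
    return q * mean - mean ** 2 / n_sites


Q_PA = 10.0   # hypothetical quantal size (pA)
N_FIXED = 5   # hypothetical number of release sites at baseline
N_MORE = 8    # hypothetical N if silent sites were recruited during a train

mean, variance = binomial_mean_variance(N_FIXED, 0.5, Q_PA)
print("fixed N:     mean", mean, "variance", variance)
print("recruited N: variance at the same mean", parabola_variance(mean, N_MORE, Q_PA))
```

      At the same mean amplitude, a larger N gives a larger variance, so data points acquired after recruitment of sites would lie above the parabola fitted with the baseline N.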

      As discussed in both the Results and Discussion sections, the baseline EPSC was unchanged by KD (Fig. S3) because of complementary changes in the number of docking sites and their baseline occupancy (Fig. 6). These findings suggest that Syt7 may be involved in maintaining additional vacant docking sites, which could be overfilled during facilitation. It remains to be determined whether the decrease in docking sites in Syt7 KD synapses is related to its specific localization of Syt7 at the plasma membrane of active zones, as proposed in previous studies (Sugita et al., 2001; Vevea et al., 2021).

      (2) Figure S4A: I like the TTX part of this control, but the 4-AP part needs a positive control to be meaningful (e.g., absence of TTX).

      We used 4-AP in the presence of TTX to increase the length constant of axon fibers and thereby facilitate the conduction of local depolarization from the illumination area to axon terminals. The lack of EPSCs in the presence of 4-AP and TTX indicates that the illumination area is far enough from axon terminals that local depolarization induced by optical stimulation does not evoke synaptic transmission. This methodology has been employed in previous studies, including the work of Little and Carter (2013).

      Reference

      Little JP and Carter AG (2013) Synaptic mechanisms underlying strong reciprocal connectivity between the medial prefrontal cortex and basolateral amygdala. J Neurosci, 33(39): 15333-15342.

      (3) Line 251: At least some of the previous studies that concluded these drugs affect vesicle dynamics used logic that was based on some of the same assumptions that are problematic for the present study, so the reasoning is a bit circular.

      (4) Line 329 and Line 461: A similar problem with circularity for interpreting earlier syt7 studies.

      (Reply to #3 and #4) We selected the target molecules as candidates based on their well-characterized roles in vesicle dynamics, and aimed to investigate which aspects of STP are affected by these molecules in our experimental context. For example, we found that the baseline p<sub>occ</sub> and short-term facilitation (STF) are enhanced by the baseline DAG level and by train stimulation-induced PLC activation, respectively. Notably, the effect of dynasore informed us that slow site clearing is responsible for the late depression of 40 Hz train EPSCs. The knock-down experiments also provided us with information on the critical role of Syt7 in the replenishment of TS vesicles. These approaches do not deviate from standard scientific reasoning but rather build upon prior knowledge to formulate and test hypotheses.

      Importantly, our conclusions do not rely solely on the assumption that altering the target molecule impacts synaptic transmission. Instead, our conclusions are derived from a comprehensive analysis of diverse outcomes obtained through both pharmacological and genetic manipulations. These interpretations align closely with prior literature, further validating our conclusions.

      Therefore, the use of established studies to guide candidate selection and the consistency of our findings with existing knowledge do not represent a logical circularity but rather a reinforcement of the proposed mechanism through converging lines of evidence.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Comments:

      (1) While the authors claim that Syt7-mediated facilitation is connected to the behavioral deficits they observed, this link is still somewhat speculative. This manuscript could benefit from further discussions of other alternative mechanisms to consider.

      We added the following statement to the Discussion of the revised manuscript:

      “The acquisition of trace fear memory was impaired by inhibition of persistent activity in mPFC during trace period (Gilmartin et al., 2013). The similar deficit observed in Syt7 KD animals is consistent with the hypothesis that STF provides bi-stable ensemble activity in a recurrent network (Mongillo et al., 2012). Nevertheless, alternative mechanisms may be responsible for the behavioral deficit. Not only recurrent network but also long-range loop between the mPFC and the mediodorsal (MD) thalamus play a critical role in maintaining persistent activity within the mPFC especially for a delay period longer than 10 s (Bolkan et al., 2017). Prefrontal L2/3 is heavily innervated by MD thalamus, and L2/3-PCs subsequently relay signals to L5 cortico-thalamic (CT) neurons (Collins et al., 2018). Given that L2/3 is an essential component of the PFC-thalamic loop, loss of STF at recurrent synapses between L2/3 PCs may lead to insufficient L2/3 inputs to L5 CT neurons and failure in the reverberant PFC-MD thalamic feedback loop. Therefore, not only L2/3 recurrent network but also its output to downstream network should be considered as a possible network mechanism underlying behavioral deficit caused by Syt7 KD L2/3.”

      (2) The authors mention that Syt7 contributes to persistent activity during working memory tasks but focus on using only a trace fear conditioning task. However, it would be interesting to see if their results are generalizable to other working memory tasks (i.e. a delayed alternation task).

      We thank the Reviewer for the insightful suggestion. Trace fear conditioning (tFC) shares behavioral properties with working memory (WM) tasks in that tFC is vulnerable to attentional distraction and to the load of a WM task. In general, WM tasks, including delayed alternation tasks such as a T-maze task, require persistent activity of ensemble neurons representing target-specific information among multiple choices. Unlike such WM tasks, tFC is not appropriate for examining target-specific ensemble activity. Because in vivo recordings in KD animals during delayed alternation tasks are not trivial, it will be appropriate to examine the effect of Syt7 KD on such tasks in a separate study.

      (3) The figure legend in Figure 6A and 6B mentions dotted lines and broken lines in the figure. However, this is confusing, and it is unclear as to what these lines are referring to in the figure.

      To avoid confusion in the figure legend for Figure 6A and 6B, we changed “dotted line” to “vertical broken line” and “broken lines” to “dashed parabolas”.

      (4) The manuscript can benefit from close reading and editing to catch typos and improve general readability (i.e. line 173: the word "are" is repeated twice).

      We corrected typographical errors throughout the manuscript and carefully read the manuscript to improve readability. A revised version reflecting these corrections has been prepared and will be resubmitted for your consideration.

      Reviewer #3 (Recommendations for the authors):

      The points in this section are all minor.

      (1) Line 44: Define release probability (p_r) more clearly. Authors use it to mean p<sub>v</sub>*p<sub>occ</sub>, but others routinely use it to mean p<sub>v</sub>*p<sub>occ</sub>*N.

      We understand that the Reviewer meant “others routinely use it to mean p<sub>v</sub>”. In this statement, we meant the conventional definition of release probability, which is the release probability among vesicles of the RRP. We think that it is not appropriate to re-define release probability as p<sub>v</sub> * p<sub>occ</sub> in the first paragraph of the Introduction. Therefore, we clarified this issue in the Discussion, as mentioned in our reply to the first weakness raised by Reviewer #3.

      (2) Line 82: For clarity, define better what recurrent excitatory synapses are. It seems that synapses between L2/3 PCs and local targets may all be recurrent?

      Each of L2/3 and L5 of the prefrontal cortex harbors intralaminar recurrent excitatory synapses between pyramidal cells, which form a recurrent network. Previous theoretical studies have proposed that a single-layer recurrent network model can have bi-stable E/I balanced states (up- and down-states) if recurrent excitatory synapses display short-term facilitation (STF), and thus is able to temporarily hold information once an external input shifts the network to the up-state. In this theory, synapses to local targets across layers are not considered, and the specific roles of L2/3 and L5 in working memory tasks are still elusive. For clarity, we added a statement at the beginning of the paragraph (line 82): “Each of layer 2/3 (L2/3) and layer 5 (L5) of neocortex displays intralaminar excitatory synapses between pyramidal cells comprising a recurrent network (Holmgren et al., 2003; Thomson and Lamy, 2007)”

      (3) Cite earlier studies of short-term synaptic plasticity at synapses between L2/3 pyramidal neurons and local targets in mPFC. If there are none, take more explicit credit for being first.

      As we mentioned in the Introduction, previous studies on short-term plasticity (STP) at neocortical excitatory recurrent synapses have focused on synapses between L5 pyramidal cells (PCs) (Hemple et al. 2000; Wang et al. 2006; Morishima et al., 2011; Yoon et al., 2020). The local connectivity between L2/3 PCs in the somatosensory cortex has been elucidated by Holmgren et al. (2003) and Ko et al. (2011). Although these studies showed STP of EPSPs, it was examined at a fixed frequency or stimulus pattern at high external [Ca<sup>2+</sup>] (2 mM). There is one study on the frequency dependence of STP of EPSPs between L2/3 PCs (Feldmeyer et al., 2006). In contrast to our study, Feldmeyer et al. (2006) observed monotonic STD at all frequencies below 50 Hz, but that study was done in the somatosensory cortex and at high external [Ca<sup>2+</sup>] (2 mM). To our knowledge, no previous study has investigated STP at recurrent excitatory synapses of L2/3 pyramidal cells of the mPFC, especially at physiological external [Ca<sup>2+</sup>]. The present study, therefore, represents the first extensive investigation of STP at recurrent excitatory synapses in L2/3 of the mPFC under physiologically relevant external [Ca<sup>2+</sup>].

      References

      Feldmeyer D, Lubke J, Silver RA, Sakmann B (2002) Synaptic connections between layer 4 spiny neurone-layer 2/3 pyramidal cell pairs in juvenile rat barrel cortex: physiology and anatomy of interlaminar signalling within a cortical column. J Physiol 538:803-822.

      Holmgren C, Harkany T, Svennenfors B, Zilberter Y (2003) Pyramidal cell communication within local networks in layer 2/3 of rat neocortex. J Physiol 551:139-153.

      Ko H, Hofer SB, Pichler B, Buchanan KA, Sjöström PJ, Mrsic-Flogel TD (2011) Functional specificity of local synaptic connections in neocortical networks. Nature 473:87-91.

      Morishima M, Morita K, Kubota Y, Kawaguchi Y (2011) Highly differentiated projection-specific cortical subnetworks. Journal of Neuroscience 31:10380-10391.

      Wang Y, Markram H, Goodman PH, Berger TK, Ma J, Goldman-Rakic PS (2006) Heterogeneity in the pyramidal network of the medial prefrontal cortex. Nat Neurosci 9:534-542.

      (4) I couldn't figure out the significance of Figure S3. Perhaps this could be explained better.

      Optical minimal stimulation methods have not been previously documented in detail. This figure illustrates which parameters should be carefully examined in order to attain optical minimal stimulation, which is intended to stimulate a single afferent fiber. Single-fiber stimulation by optical minimal stimulation is supported by the similarity of our estimate of the number of release sites (N) to the previous morphological estimate (Holler et al., 2021). For minimal stimulation, we used a collimated DMD-coupled LED to restrict 470 nm illumination to a small and well-defined region within layer 2/3 of the prelimbic mPFC, and carefully adjusted the illumination radius such that illumination one step smaller (by 1 μm) failed to evoke EPSCs. Our typical illumination area ranged between 3–4 μm, as shown in Figure S3A. Under this minimal illumination area, we confirmed unimodal distributions for the EPSC parameters (amplitude, rise time, decay time and time to peak; Figure 3B-E). Otherwise, we excluded the recordings from analysis. We hope this explanation provides a clearer understanding of the figure's significance.

      (5) Note that CTZ seems to alter p_r at some synapses.

      We acknowledge that CTZ can increase release probability by blocking presynaptic K<sup>+</sup> currents. Indeed, Ishikawa and Takahashi (2001) reported that CTZ slowed the repolarizing phase of presynaptic action potentials and increased the frequency of miniature EPSCs at the calyx synapses. Consistently, we observed a slight increase in the baseline EPSC amplitude, from 33.3 pA to 41.9 pA (p=0.045), following the application of 50 µM CTZ. However, given that vesicular release probability (p<sub>v</sub>) is already close to 1 at the synapse of our interest, we believe that the observed effect is more likely attributable to an increase in release site occupancy (p<sub>occ</sub>), which would be reflected as the increase in miniature EPSC frequency reported by Ishikawa and Takahashi (2001). Given that PPR depends on p<sub>v</sub> rather than p<sub>occ</sub>, this increase in p<sub>occ</sub> would not critically change our conclusion that AMPA receptor desensitization is not responsible for the strong PPD.

      Reference

      Ishikawa, T., & Takahashi, T. (2001). Mechanisms underlying presynaptic facilitatory effect of cyclothiazide at the calyx of Held of juvenile rats. The Journal of Physiology, 533(2), 423-431.

      (6) Figure 8B. The result in Figure 8C seems important, but I couldn't figure out why behaviour was not altered during the acquisition phase summarized in Figure 8B. Perhaps this could be explained more clearly for non-experts.

      Little difference in freezing behavior during acquisition has also been observed when prelimbic persistent firing was optogenetically inhibited (Gilmartin et al., 2013). Not only the CS (tone) but also other sensory inputs (visual, olfactory, etc.) and the spatial context could serve as cues predicting the US (shock). Moreover, during the acquisition phase, the electric shock itself induces a freezing response as a natural defensive behavior, which may obscure specific behavioral changes related to the associative learning process. Therefore, freezing behavior during acquisition cannot be regarded as a sign of a specific association between CS and US. Instead, on the next day, we specifically evaluated the CS-US association of the conditioned animals by measuring freezing behavior in response to the CS in a distinct context. We explicitly documented the small difference between WT and KD animals during the acquisition phase in the relevant paragraph (line 397).

    1. Reviewer #1 (Public review):

      This paper presents a set of tools that will pave the way for a comprehensive understanding of the circuits that control wing motion in flies during flight or courtship. These tools are mainly focused on wing motor neurons and interneurons, as well as a few motor neurons of the haltere. This paper and the library of driver lines described within it will serve as a crucial resource in the pursuit of understanding how neural circuits give rise to behavior. Overall, I found the paper well-written, the figures are quite nice, and the data from the functional experiments convincing. I do not have many major concerns, but a few suggestions that I think will make the paper easier to understand.

      I think the introduction could use some reorganization, as right now I found it quite difficult to follow. For example, lines 85-88 seem to fit more naturally at the end of the next paragraph, compared to the current location of those sentences, which feels rather disjointed. I would suggest introducing the organization of the wing motor system (paragraphs 3 and 4) and then discussing the VNC (paragraph 2) before moving on to describe the neurons within the VNC that may control wing motion. Additionally, lines 141-144, which describe the broad subdivisions of the VNC, can be moved up to where the VNC is first introduced.

      One of my major takeaways from the paper is the call to examine the premotor circuits that govern wing motion. For that reason, I was surprised that there was little mention of the role of sensory input to these circuits. As the authors point out in the discussion, the haltere, for example, provides important input to the wing steering system. I recognize that creating driver lines for the sensory neurons that innervate the VNC is well beyond the scope of this project. I would just like some clarification in the text of the role these inputs play in structuring wing motion, especially as some act at rapid timescales that possibly forgo processing by the very circuits detailed here. This brings up a related issue: if the roles of the interneurons that are presynaptic to the wing motor neurons are "largely unexplored," with how much confidence can we say that they are the key for controlling behavior? To be sure, this has been demonstrated quite nicely in the case of courtship, but in flight, I think the evidence supporting this argument is less clear. I suggest the authors rephrase their language here.

  2. Apr 2025
    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      It seems as if the main point of the paper is about the new data related to ratfish, although your title describes it as extant cartilaginous fishes and you bounce around between the little skate and ratfish. So here's an opportunity for you to adjust the title to emphasize ratfish, given that later you describe how this is your significant new data contribution. Either way, the organization of the paper can be adjusted so that the reader can follow along the same order for all sections so that it's very clear for comparative purposes of new data and what they mean. My opinion is that I want to read, for each subheading in the results, about the ratfish first because this is your most interesting novel data. Then I want to know any confirmation about morphology in little skate. And then I want to know about any gaps you fill with the catshark. (It is ok if you keep the order of "skate, ratfish, then shark", but I think it undersells the new data).

      The main points of the paper are 1) to define terms for chondrichthyan skeletal features in order to unify research questions in the field, and 2) add novel data on how these features might be distributed among chondrichthyan clades. However, we agree with the reviewer that many readers might be more interested in the ratfish data, so we have adjusted the order of presentation to emphasize ratfish throughout the manuscript.

      Strengths:

      The imagery and new data availability for ratfish are valuable and may help to determine new phylogenetically informative characters for understanding the evolution of cartilaginous fishes. You also allude to the fossil record.

      Thank you for the nice feedback.

      Opportunities:

      I am concerned about the statement of ratfish paedomorphism because stage 32 and 33 were not statistically significantly different from one another (figure and prior sentences). So, these ratfish TMDs overlap the range of both 32 and 33. I think you need more specimens and stages to state this definitely based on TMD. What else leads you to think these are paedomorphic? Right now they are different, but it's unclear why. You need more outgroups.

      Sorry, but we had reported that the TMD of centra from little skate did significantly increase between stage 32 and 33. Supporting our argument that ratfish had features of little skate embryos, TMD of adult ratfish centra was significantly lower than TMD of adult skate centra (Fig1). Also, it was significantly higher than stage 33 skate centra, but it was statistically indistinguishable from that of stage 33 and juvenile stages of skate centra. While we do agree that more samples from these and additional groups would bolster these data, we feel they are sufficiently powered to support our conclusions for this current paper.

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth.

      We have included more data summarized in results sub-heading in the abstract as suggested (lines 32-37).

      Historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology and development of these fishes.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies and I don't think your list is exhaustive. You need to expand this list and history which will help with your ultimate comparative analysis without you needed to sample too many new data yourself.

      We have added additional recent and older references: Kölliker, 1860; Daniel, 1934; Wurmbach, 1932; Liem, 2001; Arratia et al., 2001.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters, illustrated in figure 7 and the body of the text.

      We address a similar comment from this reviewer in more detail below, hoping that any concerns about continuity have been addressed with inclusion of a summary of proposed characters in a new Table 1, re-writing of the Discussion, and modified Fig7 and re-written Fig7 legend.

      Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      Although it was a little unclear exactly what was requested, we restructured the branches to indicate that holocephalans diverged earlier from the ancestors that led to elasmobranchs. Also in response to this comment, we added catshark (S. canicula) and little skate (L. erinacea) specifically to the character matrix.

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      Reviewer #1 (Recommendations For The Authors):

      Further Strengths and Opportunities:

      Your headings for the results subsection and figures are nice snapshots of your interpretations of the results and I think they would be better repurposed in your abstract, which needs more depth. It's a little unusual to try and state an interpretation of results as the heading title in a results section and the figures so it feels out of place. You could also use the headings as the last statement of each section, after you've presented the results. In order I would change these results subheadings to:

      Tissue Mineral Density (TMD)

      Tissue Properties of Neural Arches

      Trabecular mineralization

      Cap zone and Body zone Mineralization Patterns

      Areolar mineralization

      Developmental Variation

      Sorry, but we feel that summary Results sub-headings are the best way to effectively communicate to readers the story that the data tell, and this style has been consistently used in our previous publications. No changes were made.

      You allude to the fossil record and that is great. That said historical literature is more abundant than what you've listed. Your first sentence describes a long fascination and only goes back to 1990. But there are authors that have had this fascination for centuries and so I think you'll benefit from looking back. Especially because several of them have looked into histology of these fishes. You even have one sentence citing Coates et al. 2018, Frey et al., 2019 and Ørvig 1951 to talk about the potential that fossils displayed trabecular mineralization. That feels like you are burying the lead and may have actually been part of the story for where you came up with your hypothesis in the beginning... or the next step in future research. I feel like this is really worth spending some more time on in the intro and/or the discussion.

      We’ve added older REFs as pointed out above. Regarding fossil evidence for trabecular mineralization, no, those studies did not lead to our research question. But after we discovered how widespread trabecular mineralization was in extant samples, we consulted these papers, which did not focus on the mineralization patterns per se, but certainly led us to emphasize how those patterns fit in the context of chondrichthyan evolution, which is how we discussed them.

      I agree that in the past 15 years or so a lot more work has been done because it can be done using newer technologies. That said there's a lot more work by Mason Dean's lab starting in 2010 that you should take a look at related to tesserae structure... they're looking at additional taxa than what you did as well. It will be valuable for you to be able to make any sort of phylogenetic inference as part of your discussion and enhance the info your present in figure 7. Go further back in time... For example:

      de Beer, G. R. 1932. On the skeleton of the hyoid arch in rays and skates. Quarterly Journal of Microscopical Science. 75: 307-319, pls. 19-21.

      de Beer, G. R. 1937. The Development of the Vertebrate Skull. The University Press, Oxford.

      Indeed, we have read all of Mason’s work, citing 9 of his papers, and where possible, we have incorporated their data on different species into our Discussion and Fig7. Thanks for the de Beer REFs. While they contain histology of developing chondrichthyan elements, they appear to refer principally to gross anatomical features, so were not included in our Intro/Discussion.

      Most sections within the results, read more like a discussion than a presentation of the new data and you jump directly into using an argument of those data too early. Go back in and remove the references or save those paragraphs for the discussion section. Particularly because this journal has you skip the method section until the end, I think it's important to set up this section with a little bit more brevity and conciseness. For instance, in the first section about tissue mineral density, change that subheading to just say tissue mineral density. Then you can go into the presentation of what you see in the ratfish, and then what you see in the little skate, and then that's it. You save the discussion about what other elasmobranch's or mineralizing their neural arches, etc. for another section.

      We dramatically reduced background-style writing and citations in each Results section (other than the first section of minor points about general features of the ratfish, compared to catshark and little skate), keeping only a few to briefly remind the general reader of the context of these skeletal features.

      I like that your first sentence in the paragraph is describing why you are doing a particular method and comparison because it shows me (the reader) where you're sampling from. Something else is that maybe as part of the first figure rather than having just each with the graph have a small sketch for little skate and catshark to show where you sampled from for comparative purposes. That would relate back, then to clarifying other figures as well.

      Done (also adding a phylogenetic tree).

      Second instance is your section on trabecular mineralization. This has so many references in it. It does not read like results at all. It looks like a discussion. However, the trabecular mineralization is one of the most interesting aspects of this paper, and how you are describing it as a unique feature. I really just want a very clear description of what the definition of this trabecular mineralization is going to be.

      In addition to adding Table 1 to define each proposed endoskeletal character state, we have changed the structure of this section and hope it better communicates our novel trabecular mineralization results. We also moved the topic of trabecular mineralization to the first detailed Discussion point (lines 347-363) to better emphasize this specific topic.

      Carry this reformatting through for all subsections of the results.

      As mentioned above, we significantly reduced background-style writing and citations in each Results section.

      I'd like to see modifications to figure 7 so that you can add more continuity between the characters, illustrated in figure 7 and the body of the text. I think you can give the characters a number so that you can actually refer to them in each subsection of the results. They can even be numbered sequentially so that they are presented in a standard character matrix format, that future researchers can add directly to their own character matrices. You could actually turn it into a separate table so it doesn't take up that entire space of the figure, because there need to be additional taxa referred to on the diagram. Namely, you don't have any outgroups in figure 7 so it's hard to describe any state specifically as ancestral or derived. Generally Holocephalans are the outgroup to elasmobranchs - right now they are presented as sister taxa with no ability to indicate derivation. Why isn't the catshark included in this diagram?

      The character matrix is a fantastic idea, and we should have included it in the first place! We created Table 1 summarizing the traits and terminology at the end of the Introduction, also adding the character matrix in Fig7 as suggested, including specific fossil and extant species. For the Fig7 branching and catshark inclusion, please see above.

      You can repurpose the figure captions as narrative body text. Use less narrative in the figure captions. These are your results actually, so move that text to the results section as a way to truncate and get to the point faster.

      By figure captions, we assume the reviewer refers to figure legends. We like to explain figures to some degree of sufficiency in the legends, since some people do not read the main text and simply skim a manuscript’s abstract, figures, and figure legends. That said, we did reduce the wording, as requested.

      More specific comments about semantics are listed here:

      The abstract starts negative and doesn't state a question although one is referenced. Potential revision - "Comprehensive examination of mineralized endoskeletal tissues warranted further exploration to understand the diversity of chondrichthyans... Evidence suggests for instance that trabecular structures are not common, however, this may be due to sampling (bring up fossil record.) We expand our understanding by characterizing the skate, cat shark, and ratfish... (Then add your current headings of the results section to the abstract, because those are the relevant takeaways.)"

      We re-wrote much of the abstract, hoping that the points come across more effectively. For example, we started with “Specific character traits of mineralized endoskeletal tissues need to be clearly defined and comprehensively examined among extant chondrichthyans (elasmobranchs, such as sharks and skates, and holocephalans, such as chimaeras) to understand their evolution”. We also stated an objective for the experiments presented in the paper: “To clarify the distribution of specific endoskeletal features among extant chondrichthyans”.

      In the last paragraph of the introduction, you say that "the data argue" and I admit, I am confused. Whose data? Is this a prediction or results or summary of other people's work? Either way, could be clarified to emphasize the contribution you are about to present.

      Sorry for this lack of clarity, and we have changed the wording in this revision to hopefully avoid this misunderstanding.

      In the second paragraph of the TMD section, you mention the synarcual comparison. I'm not sure I follow. These are results, not methods. Tell me what you are comparing directly. The non-centrum part of the synarcual separate from the centrum? They both have both parts... did you mean the comparison of those both to the catshark? Just be specific about which taxon, which region, and which density. No need to go into reasons why you chose those regions here. Put those into methods and discussion for interpretation.

      We hope that we have now clarified wording of that section.

      Label the spokes somehow either in caption or on figure direction. I think I see it as part of figure 4E, I, and J, but maybe I'm misinterpreting.

      Based upon histological features (e.g., regions of very low cellularity with Trichrome unstained matrix) and hypermineralization, spokes in Fig4 are labelled with * and segmented in blue. We detailed how spokes were identified in main text (lines 241-243; 252-254) and figure legend (lines 597-603).

      Reviewer #2 (Public Review):

      General comment:

      This is a very valuable and unique comparative study. An excellent combination of scanning and histological data from three different species is presented. Obtaining the material for such a comparative study is never trivial. The study presents new data and thus provides the basis for an in-depth discussion about chondrichthyan mineralised skeletal tissues.

      Many thanks for the kind words.

      I have, however, some comments. Some information is lacking and should be added to the manuscript text. I also suggest changes in the result and the discussion section of the manuscript.

      Introduction:

      The reader gets the impression that almost no research on chondrichthyan skeletal tissues was done before 2010 ("last 15 years", L45). I suggest correcting that and also citing previous studies on chondrichthyan skeletal tissues, including studies from before 1900.

      We have added additional older references, as detailed above.

      Material and Methods:

      Please complete L473-492: Three different Micro-CT scanners were used for three different species? SkyScan 117 for the skate samples. Catshark different scanner, please provide full details. Chimera synchrotron scan? Please provide full details for all scanning protocols.

      We clarified exact scanners and settings for each micro-CT experiment in the Methods (lines 476-497).

      TMD is established in the same way in all three scanners? Actually not possible. Or, all specimens were scanned with the same scanner to establish TMD? If so please provide the protocol.

      Indeed, the same scanner was used for TMD comparisons, and we included exact details on how TMD was established and compared with internal controls in the Methods (lines 486-488).

      Please complete L494 ff: Tissue embedding medium and embedding protocol is missing. Specimens have been decalcified, if yes how? Have specimens been sectioned non-decalcified or decalcified?

      Please complete L506 ff: Tissue embedding medium and embedding protocol is missing. Description of controls are missing.

      Methods were updated to include these details (lines 500-503).

      Results:

      L147: It is valuable and interesting to compare the degree of mineralisation in individuals from the three different species. It appears, however, not possible to provide numerical data for Tissue Mineral Density (TMD). First requirement, all specimens must be scanned with the same scanner and the same calibration values. This is not stated in the M&M section. But even if this was the case, all specimens derive from different sample locations and have been preserved differently. Type of fixation, extension of fixation time in formalin, frozen, unfrozen, conditions of sample storage, age of the samples, and many more parameters, all influence TMD values. Likewise the relative age of the animals (adult is not the same as adult) influences TMD. One must assume different sampling and storage conditions and different types of progression into adulthood. Thus, the observation of different degrees of mineralisation is very interesting but I suggest not to link this observation to numerical values.

      These are very good points, but for the following reasons we feel that they were not sufficiently relevant to our study, so the quantitative data for TMD remain scientifically valid and critical for the field moving forward. Critically, 1) all of the samples used for TMD calculations underwent the same fixation protocols, and 2) most importantly, all samples for TMD were scanned on the same micro-CT scanner using the same calibration phantoms for each scanning session. Finally, while the exact age of each adult was not specified, we note for Fig1 that clear statistically significant differences in TMD were observed among various skeletal elements from ratfish, shark, and skate. Indeed, ratfish TMD was considerably lower than TMD reported for a variety of fishes and tetrapods (summarized in our paper about icefish skeletons, who actually have similar TMD to ratfish: https://doi.org/10.1111/joa.13537).

      In response, however, we added a caveat to the paper’s Methods (lines 466-469), stating that adult ratfish were frozen within 1 or 2 hours of collection from the wild, staying frozen for several years prior to thawing and immediate fixation.
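
      As an illustration of why a shared scanner and shared calibration phantoms matter for comparing TMD, the sketch below shows a generic two-point linear calibration from grey values to mineral density. This is a minimal sketch under stated assumptions, not the protocol used in the study: the function name, the phantom densities (0.25 and 0.75 g/cm³), and the grey values are all hypothetical.

```python
def make_tmd_calibration(grey_low, grey_high, density_low, density_high):
    """Build a grey-value -> mineral-density mapping from two phantoms.

    Assumes a linear relationship between reconstructed grey value and
    mineral density (a common working assumption, not taken from the source).
    """
    if grey_high == grey_low:
        raise ValueError("phantom grey values must differ")
    slope = (density_high - density_low) / (grey_high - grey_low)
    intercept = density_low - slope * grey_low

    def to_density(grey_value):
        # Apply the linear calibration to a single grey value.
        return slope * grey_value + intercept

    return to_density


# Hypothetical phantom readings from one scanning session.
to_tmd = make_tmd_calibration(
    grey_low=60.0, grey_high=140.0,      # measured phantom grey values (made up)
    density_low=0.25, density_high=0.75  # nominal phantom densities in g/cm^3 (assumed)
)

# Grey values from different specimens are only comparable if they pass
# through the same calibration, i.e. the same scanner and phantoms.
sample_tmd = to_tmd(100.0)
```

      Because the slope and intercept depend on the scanner and phantoms, values calibrated on different instruments would not be directly comparable, which is the concern the reviewer raised.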

      Parts of the results are mixed with discussion. Sometimes, a result chapter also needs a few references but this result chapter is full of references.

      As mentioned above, we reduced background-style writing and citations in each Results section.

      Based on different protocols, the staining characteristics of the tissue are analysed. This is very good and provides valuable additional data. The authors should inform the reader not only about the staining (positive or negative) but also about the histochemical characters of the staining. L218: "fast green positive" means what? L234: "marked by Trichrome acid fuchsin" means what? And so on, see also L237, L289, L291.

      Throughout the Results, upon the first description of each dye, we included more details on what is generally reflected by the specific dyes of the staining protocols (lines 178, 180, 184, 223, 227, and 243-244).

      Discussion

      Please completely remove figure 7, please adjust and severely downsize the discussion related to figure 7. It is very interesting and valuable to compare three species from three different groups of elasmobranchs. Results of this comparison also validate an interesting discussion about possible phylogenetic aspects. This is, however, not the basis for claims about the skeletal tissue organisation of all extinct and extant members of the groups to which the three species belong. The discussion refers to "selected representatives" (L364), but how representative are the selected species? Can there be an extant species that represents the entire large group, all sharks, rays or chimeras? Are the three selected species basal representatives with a generalist life style?

      These are good points, and yes, we certainly appreciate that the limited sampling in our data might lead to faulty general conclusions about these clades. In fact, we stated this limitation clearly in the Introduction (lines 126-128), and we removed “representative” from this revision. We also replaced general reference to chondrichthyans in the Title by listing the specific species sampled. However, in the Discussion, we also compare our data with previously published additional species evaluated with similar assays, which confirms the trend that we are concluding. We look forward to future papers specifically testing the hypotheses generated by our conclusions in this paper, which serves as a benchmark for identifying shared and derived features of the chondrichthyan endoskeleton.

      Please completely remove the discussion about paedomorphosis in chimeras (already in the result section). This discussion is based on a wrong idea about the definition of paedomorphosis. Paedomorphosis can occur in members of the same group. Humans have paedomorphic characters within the primates, Ambystoma mexicanum is paedomorphic within the urodeles. Paedomorphosis does not extend to members of different vertebrate branches. That elasmobranchs have a developmental stage that resembles chimera vertebra mineralisation does not define chimera vertebra centra as paedomorphic. Teleosts have a heterocercal caudal fin anlage during development; that does not mean the heterocercal fins in sturgeons or elasmobranchs are paedomorphic characters.

      We agree with the reviewer that discussion of paedomorphosis should apply to members of the same group. In our paper, we are examining paedomorphosis in a holocephalan, relative to elasmobranch fishes in the same group (Chondrichthyes), so this is an appropriate application of paedomorphosis. In response to this comment, we clarified that our statement of paedomorphosis in ratfish was made with respect to elasmobranchs (lines 37-39; 418-420).

      L432-435: In the times of Gadow & Abbott (1895), science had completely wrong ideas about the phylogenetic position of chondrichthyans within the gnathostomes. It is curious that Gadow & Abbott (1895) are being cited in support of the paedomorphosis claim.

      If paedomorphosis is being examined within Chondrichthyes, such as in our paper and in the Gadow and Abbott paper, then it is an appropriate reference, even if Gadow and Abbott (and many others) got the relative position of Chondrichthyes among other vertebrates incorrect.

      The SCPP part of the discussion is unrelated to the data obtained by this study. Kawasaki & Weiss (2003) describe a gene family (called SCPP) that controls Ca-binding extracellular phosphoproteins in enamel, in bone and dentine, in saliva and in milk. It evolved by gene duplication and differentiation. They date it back to a first enamel matrix protein in conodonts (Reif 2006). Conodonts, a group of enigmatic invertebrates, have mineralised structures but these structures are neither bone nor mineralised cartilage. Catfish (6 % of all vertebrate species) on the other hand, have bone but do not have SCPP genes (Lui et al. 206). Other calcium binding proteins, such as osteocalcin, were initially believed to be required for mineralisation. It turned out that osteocalcin is rather a mineralisation inhibitor, at best it regulates the arrangement of collagen fiber bundles. The osteocalcin -/- mouse has fully mineralised bone. As the function of the SCPP gene product for bone formation is unknown, there is no need to discuss SCPP genes. It would perhaps be better to finish the manuscript with a summary that focuses on the subject and the methodology of this nice study.

      We completely agree with the reviewer that many papers claim to associate the functions of SCPP genes with bone formation, or even mineralization generally. The Science paper with the elephant shark genome made it very popular to associate SCPP genes with bone formation, but we feel that this was a false comparison (for many reasons)! In response to the reviewer’s comments, however, we removed the SCPP discussion points, moving the previous general sentence about the genetic basis for reduced skeletal mineralization to the end of the previous paragraph (lines 435-439). We also added another brief Discussion paragraph afterwards, ending as suggested with a summary of our proposed shared and derived chondrichthyan endoskeletal traits (lines 440-453).

      Reviewer #2 (Recommendations For The Authors):

      Other comments

      L40: remove paedomorphism

      No change; see above

      L53: down tune language, remove "severely" and "major"

      Done (lines 57-59)

      L86: provide species and endoskeletal elements that are mineralized

      No change; this paragraph was written generally, because the papers cited looked at cap zones of many different skeletal elements and neural arches in many different species

      L130: remove TMD, replace by relative, descriptive, values

      No change; see above

      L135: What are "segmented vertebral neural arches and centra" ?

      Changed to “neural arches and centra of segmented vertebrae” (lines 140-141)

      L166: L168 "compact" vs. "irregular". Partial mineralisation is not necessarily irregular.

      Thanks for pointing out this issue; we changed wording, instead contrasting “non-continuous” and “continuous” mineralization patterns (lines 171-174)

      L192: "several endoskeletal regions". Provide all regions

      All regions provided (lines 198-199)

      L269: "has never been carefully characterized in chimeras". Carefully means what? Here, also only one chimera is analysed, not several species.

      Sentence removed

      302: Can't believe there is no better citation for elasmobranch vertebral centra development than Gadow and Abbott (1895)

      Added Arriata and Kolliker REFs here (lines 293-295)

      L318 ff: remove discussion from result chapter

      References to paedomorphism were removed from this Results section

      L342: refer to the species studied, not to the entire group.

      Sorry, the line numbering for the reviewer and our original manuscript have been a little off for some reason, and we were unclear exactly to which line of text this comment referred. Generally in this revision, however, we have tried to restrict our direct analyses to the species analyzed, but in the Discussion we do extrapolate a bit from our data when considering relevant published papers of other species.

      346: "selected representative". Selection criteria are missing

      “selected representative” removed

      L348: down tune, remove "critical"

      Done

      L351: down tune, remove "critical"

      Done

      L 364: "Since stem chondrichthyans did not typically mineralize their centra". Means there are fossil stem chondrichthyans with fully mineralised centra?

      Re-worded to “Stem chondrichthyans did not appear to mineralize their centra” (line 379)

      L379: down tune and change to: "we propose the term "non-tesseral trabecular mineralization. Possibly a plesiomorphic (ancestral) character of chondrichthyans"

      No change; sorry, but we feel this character state needs to be emphasized as we wrote in this paper, so that its evolutionary relationship to other chondrichthyan endoskeletal features, such as tesserae, can be clarified.

      L407: suggests so far palaeontologist have not been "careful" enough?

      Apologies; sentence re-worded, emphasizing that synchrotron imaging might increase details of these descriptions (lines 406-408)

      414: down tune, remove "we propose". Replace by "possibly" or "it can be discussed if"

      Sentence re-worded and “we propose” removed (lines 412-415)

      L420: remove paragraph

      No action; see above

      L436: remove paragraph

      No action; see above

      L450: perhaps add a summary of the discussion. A summary that focuses on the subject and the methodology of this nice study.

      Yes, in response to the reviewer’s comment, we finished the discussion with a summary of the current study (lines 440-453).

    1. These people are exceeding courteous, gentle of disposition, and well conditioned, excelling all others that we have seen. I think they excel all the people of America; of stature much higher than we. Some of them are black thin bearded. They make beards of the hair of beasts and one of them offered a beard of their making to one of our sailors, for his that grew on his face, which because it was of a red color they judged to be none of his own. They are quick eyed and steadfast in their looks, fearless of others’ harms, as intending none themselves. Some of the meaner sort given to filching, which the very name of Savages (not weighing their ignorance in good or evil) may easily excuse

      The explorers describe the Native Americans they encountered in notably positive terms, contrasting with later harsher colonial attitudes; it hints at initial possibilities for peaceful relations

    1. Author response:

      The following is the authors’ response to the original reviews

      Joint Public Review:

      Idiopathic scoliosis (IS) is a common spinal deformity. Various studies have linked genes to IS, but underlying mechanisms are unclear such that we still lack understanding of the causes of IS. The current manuscript analyzes IS patient populations and identifies EPHA4 as a novel associated gene, finding three rare variants in EPHA4 from three patients (one disrupting splicing and two missense variants) as well as a large deletion (encompassing EPHA4) in a Waardenburg syndrome patient with scoliosis. EPHA4 is a member of the Eph receptor family. Drawing on data from zebrafish experiments, the authors argue that EPHA4 loss of function disrupts the central pattern generator (CPG) function necessary for motor coordination.

      The main strength of this manuscript is the human genetic data, which provides convincing evidence linking EPHA4 variants to IS. The loss of function experiments in zebrafish strongly support the conclusion that EPHA4 variants that reduce function lead to IS.

      The conclusion that disruption of CPG function causes spinal curves in the zebrafish model is not well supported. The authors' final model is that a disrupted CPG leads to asymmetric mechanical loading on the spine and, over time, the development of curves. This is a reasonable idea, but currently not strongly backed up by data in the manuscript. Potentially, the impaired larval movements simply coincide with, but do not cause, juvenile-onset scoliosis. Support for the authors' conclusion would require independent methods of disrupting CPG function and determining if this is accompanied by spine curvature. At a minimum, the language of the manuscript could be toned down, with the CPG defects put forward as a potential explanation for scoliosis in the discussion rather than as something this manuscript has "shown". An additional weakness of the manuscript is that the zebrafish genetic tools are not sufficiently validated to provide full confidence in the data and conclusions.

      We highly appreciate the reviewer’s insightful comments and the acknowledgment of the main values of our study. We agree with the reviewer that further experiments are needed to fully establish the relationship between CPG and scoliosis. In response, we have revised the conclusion in the manuscript to better reflect this. Additionally, we conducted further analyses on the mutants to provide additional evidence supporting this concept.

      Reviewer #1 (Recommendations for the authors):

      Epha4a mutant zebrafish exhibited mild spinal curves, mostly laterally and in the tail. This was 75% of homozygous mutants but also, surprisingly, about 20% of heterozygotes. epha4b mutants also developed some mild scoliosis. If the two zebrafish paralogs can compensate for each other (partial redundancy), we might expect more severe scoliosis in double mutants. Did the authors generate and analyze double mutants? I believe it would be very useful for this study to report the zebrafish phenotype of loss of both paralogs together.

      We appreciate the reviewer’s insightful comment regarding the potential value of reporting the phenotype of epha4a/epha4b double mutants. While we fully agree that this analysis would be valuable, our attempts to generate double mutants have been unsuccessful. These two genes are closely linked on the chromosome, with less than 100 kb separating them, which makes it challenging to generate double mutants through standard genetic crossing. Establishing a double mutant line would require more than a year due to the technical constraints of the process. Although we are unable to address this question directly at this time, we hypothesize that epha4a/epha4b double mutants may exhibit a higher likelihood of body axis abnormalities based on the phenotypes observed in single mutants and the known functions of these genes.

      We hope this perspective will provide some useful context despite the limitations.

      In Figure 1F, a pCDK5 western blot is performed as a readout of EPHA4 signaling after either WT or C849Y mutant EPHA4 is transfected into HEK 293T cells. It would be useful to mention in the text, or at least the figure legend, how this experiment was performed/where the protein samples came from. It is included in the methods, but in the main text, it simply says "we conducted western blotting" without mentioning whether the protein samples were from cell lines, patients, or another source.

      We apologize for the oversight. A detailed description of the western blotting procedure has been added to both the Results (page 8, lines 187-190) and the Figure 1 legend.

      Was the relative turn angle biased to the left or right side of the fish? (i.e. is a positive angle a rightward or leftward turn?)

      We are sorry for our unclear description. In Figure 3D, a positive angle means turning left, while a negative angle means turning right. In wild-type larvae, the average turning angle over a 4-minute period is approximately 0, whereas in mutants, this value deviates from 0, indicating a directional preference (positive for leftward and negative for rightward turns) in swimming behavior during the recording period. We have also added this clarification to the text and figure legend.
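
      The sign convention described above can be illustrated with a short sketch that averages signed turning angles and reports the direction of any bias. This is only a minimal illustration, not the analysis pipeline used in the study: the function names, the example angles, and the 1-degree tolerance around zero are assumptions introduced here.

```python
from statistics import mean


def mean_turn_angle(angles_deg):
    """Average of signed turn angles; positive = left, negative = right."""
    if not angles_deg:
        raise ValueError("at least one turn angle is required")
    return mean(angles_deg)


def turn_bias(angles_deg, tolerance_deg=1.0):
    """Classify directional preference from the mean signed angle.

    tolerance_deg is a hypothetical cutoff for "approximately 0";
    it is not a value taken from the manuscript.
    """
    average = mean_turn_angle(angles_deg)
    if average > tolerance_deg:
        return "left"
    if average < -tolerance_deg:
        return "right"
    return "none"


# Made-up example recordings (degrees per turn event).
balanced_larva = [5.0, -5.0, 3.0, -3.0]   # mean 0: no preference
left_biased_larva = [10.0, 8.0, -2.0]     # positive mean: leftward preference
right_biased_larva = [-10.0, -8.0, 2.0]   # negative mean: rightward preference
```

      Under this convention, `turn_bias(balanced_larva)` gives "none", while the two biased examples give "left" and "right" respectively.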

      In Figure 4, morpholinos rather than mutants are used, but it is not clear why. Has it been established that the MO used disrupts gene function specifically? Can the effect of the MO be rescued by expressing a wild-type mRNA of Epha4a? Does MO knockdown induce spinal curves if fish are raised? Indeed, this could be a way to determine whether the spinal curves are caused by early events in development (when MOs are active).

      Thanks for the comments. The efficacy of the relevant MOs has been well documented in numerous previous studies (Addison et al., 2018; Cavodeassi et al., 2013; Letelier et al., 2018; Royet et al., 2017). Following the reviewer’s suggestion, we raised the epha4a morphants to adulthood, but no scoliosis was observed, suggesting that spinal curvature formation may be induced by long-term defects in the absence of Epha4a. Additionally, we reconfirmed the abnormal motor neuron activation frequency phenotype in the mutant background. The corresponding data have replaced the original Figure 4 in the manuscript.

      References

      (1) Addison, M., Xu, Q., Cayuso, J., and Wilkinson, D.G. (2018). Cell Identity Switching Regulated by Retinoic Acid Signaling Maintains Homogeneous Segments in the Hindbrain. Dev Cell 45, 606-620 e603.

      (2) Cavodeassi, F., Ivanovitch, K., and Wilson, S.W. (2013). Eph/Ephrin signalling maintains eye field segregation from adjacent neural plate territories during forebrain morphogenesis. Development 140, 4193-4202.

      (3) Letelier, J., Terriente, J., Belzunce, I., Voltes, A., Undurraga, C.A., Polvillo, R., Devos, L., Tena, J.J., Maeso, I., Retaux, S., et al. (2018). Evolutionary emergence of the rac3b/rfng/sgca regulatory cluster refined mechanisms for hindbrain boundaries formation. Proc Natl Acad Sci U S A 115, E3731-E3740.

      (4) Royet, A., Broutier, L., Coissieux, M.M., Malleval, C., Gadot, N., Maillet, D., Gratadou-Hupon, L., Bernet, A., Nony, P., Treilleux, I., et al. (2017). Ephrin-B3 supports glioblastoma growth by inhibiting apoptosis induced by the dependence receptor EphA4. Oncotarget 8, 23750-23759.

      Reviewer #2 (Recommendations for the authors):

      Supplementary Table 3 is missing.

      Sorry for any inconvenience caused to the reviewers. Due to the size of the supplementary Table 3, we have separately uploaded an Excel file as supplementary materials. We have also double-checked during the resubmission process of the revised manuscript. Thanks for your thorough review.

      The authors report only a single mutant allele for zebrafish epha4a and epha4b. Additionally, they provide no information about how many generations each allele has been outcrossed. The authors should provide some type of validation that the phenotypes they describe result from loss of function of the targeted gene and not from an off-targeting event.

      Thanks for the comments. For epha4a and epha4b mutants, each homozygous mutant was initially derived from the self-crossing of first filial generation heterozygotes, and subsequent homozygous generations were maintained for fewer than three rounds of in-crossing. Interestingly, we observed a reduction in the incidence of scoliosis across successive generations. This trend may be attributed to potential genetic compensation mechanisms, which could mitigate the phenotypic severity over time. To address concerns about possible off-target effects, we synthesized and injected epha4a mRNA to test for phenotypic rescue. Our data show that epha4a mRNA injection partially restored swimming coordination in the mutants (Fig. S5). Moreover, similar motor coordination defects have been reported in Epha4-deficient mice, as documented in previous studies (Kullander et al., 2003; Borgius et al., 2014). These findings collectively strengthen the hypothesis that Epha4a plays a critical role in regulating motor coordination.

      References

      (1) Borgius, L., Nishimaru, H., Caldeira, V., Kunugise, Y., Low, P., Reig, R., Itohara, S., Iwasato, T., and Kiehn, O. (2014). Spinal glutamatergic neurons defined by EphA4 signaling are essential components of normal locomotor circuits. J Neurosci 34, 3841-3853.

      (2) Kullander, K., Butt, S.J., Lebret, J.M., Lundfald, L., Restrepo, C.E., Rydstrom, A., Klein, R., and Kiehn, O. (2003). Role of EphA4 and EphrinB3 in local neuronal circuits that control walking. Science 299, 1889-1892.

      The authors need to provide allele designations for the mutant alleles following accepted nomenclature guidelines.

      Thank you for your careful review! We have reviewed and made revisions to the genes and mutation symbols throughout the entire text.

      The three antisense morpholino oligonucleotides need to be validated for efficacy and specificity.

      Thanks for the comments. These morpholinos have been extensively used, and their efficacy thoroughly validated, in multiple previous studies (Addison et al., 2018; Cavodeassi et al., 2013; Letelier et al., 2018; Royet et al., 2017). Furthermore, we performed swimming behavior analysis in the mutant background, which showed results similar to those in the morphants. We also performed rescue experiments to confirm the specificity of the mutant phenotypes (Fig. S5). Finally, we reconfirmed the abnormal calcium signaling in the mutants (Fig. 4), which further supports our previous knockdown results.

      References

      (1) Addison, M., Xu, Q., Cayuso, J., and Wilkinson, D.G. (2018). Cell Identity Switching Regulated by Retinoic Acid Signaling Maintains Homogeneous Segments in the Hindbrain. Dev Cell 45, 606-620 e603.

      (2) Cavodeassi, F., Ivanovitch, K., and Wilson, S.W. (2013). Eph/Ephrin signalling maintains eye field segregation from adjacent neural plate territories during forebrain morphogenesis. Development 140, 4193-4202.

      (3) Letelier, J., Terriente, J., Belzunce, I., Voltes, A., Undurraga, C.A., Polvillo, R., Devos, L., Tena, J.J., Maeso, I., Retaux, S., et al. (2018). Evolutionary emergence of the rac3b/rfng/sgca regulatory cluster refined mechanisms for hindbrain boundaries formation. Proc Natl Acad Sci U S A 115, E3731-E3740.

      (4) Royet, A., Broutier, L., Coissieux, M.M., Malleval, C., Gadot, N., Maillet, D., Gratadou-Hupon, L., Bernet, A., Nony, P., Treilleux, I., et al. (2017). Ephrin-B3 supports glioblastoma growth by inhibiting apoptosis induced by the dependence receptor EphA4. Oncotarget 8, 23750-23759.

      Line 229. "While in consistent with previous reports, the hindbrain rhombomeric boundaries were found to be defective....". This sentence is not clear. Please describe how it is "inconsistent".

      Thanks for the comments, and sorry for the unclear description. We have described this more clearly in our revised manuscript (page 9, lines 229-230).

      Animals frequently are described as "heterozygous mutants" or "mutants". Please make clear that the latter are homozygous mutant animals.

      Thanks for the comments. In the manuscript, all references to mutants specifically indicate homozygous mutants. Heterozygous mutants are explicitly identified as such.

      The chromatin interaction portion of the Methods does not include any information on how these experiments were conducted or where the data were obtained. This information needs to be provided.

      Thanks for your advice. Detailed information on the chromatin interaction mapping has been provided in “Methods and Materials” (pages 18-19, lines 450-455). Information about the interacting regions was derived from Hi-C datasets of 21 tissues and cell types provided by GSE87112. The significance of interactions in the Hi-C datasets was computed by Fit-Hi-C, with an FDR ≤ 10⁻⁶ considered significant.
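
      The significance cutoff described above amounts to keeping only interactions whose FDR-adjusted value is at or below 10⁻⁶. The sketch below shows that filtering step on an in-memory list. It is a hedged illustration only: the record layout, the field names (`region_a`, `region_b`, `q_value`), and the example values are hypothetical and do not reflect the actual Fit-Hi-C output format or the authors' scripts.

```python
FDR_THRESHOLD = 1e-6  # cutoff stated in the response (FDR <= 10^-6)


def significant_interactions(records, threshold=FDR_THRESHOLD):
    """Keep interaction records whose FDR-adjusted value passes the cutoff.

    Each record is assumed to be a dict with a 'q_value' key holding the
    FDR-adjusted significance (a hypothetical layout for illustration).
    """
    return [record for record in records if record["q_value"] <= threshold]


# Made-up interaction calls between hypothetical genomic regions.
example_calls = [
    {"region_a": "locus_1", "region_b": "locus_2", "q_value": 3e-9},
    {"region_a": "locus_1", "region_b": "locus_3", "q_value": 1e-6},
    {"region_a": "locus_1", "region_b": "locus_4", "q_value": 2e-4},
]

kept = significant_interactions(example_calls)
```

      In this example the first two calls are retained (the cutoff is inclusive) and the third is discarded.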

      The authors present single-cell RNA-seq data in Supplementary Figure 5 for which they cite Cavone et al, 2021. This seems like an odd database to use. Can the authors provide an explanation for choosing it? In any case, the citation should also be made in the Supplementary Figure 5 legend.

      Thank you for your rigorous comment. We have cited this reference in the appropriate place in the revised manuscript. Cavone et al. used the her4.3:GFP line to label ependymo-radial glia (ERG) progenitor cells and performed single-cell RNA-seq on FACS-isolated fluorescent cells. The isolated cells included not only ERG progenitors but also undifferentiated and differentiated neurons and oligodendrocytes. The authors attributed this to the relative stability of the GFP protein, which remained in the progeny of GFP-expressing her4.3+ ERG progenitor cells, thus effectively acting as a short-term cell lineage tracer. Indeed, clustering analysis of these data successfully identifies neural progenitors and other neural clusters. Therefore, we consider that this scRNA-seq dataset encompasses a comprehensive range of neural cell types and is suitable for analyzing the expression of genes of interest. Furthermore, we downloaded and analyzed the scRNA-seq data of the zebrafish nervous system reported by Scott et al. in 2021 (Fig. S7B) (Scott et al., 2021). Despite differences in the developmental stages of the larvae analyzed (Cavone et al. examined larvae at 4 dpf, whereas Scott et al. analyzed larvae at 24, 36, and 48 hpf), our findings are consistent. Specifically, epha4a and epha4b are expressed in interneurons, whereas efnb3a and efnb3b are enriched in floor plate cells.

      References

      (1) Scott, K., O'Rourke, R., Winkler, C.C., Kearns, C.A., and Appel, B. (2021). Temporal single-cell transcriptomes of zebrafish spinal cord pMN progenitors reveal distinct neuronal and glial progenitor populations. Dev Biol 479, 37-50.

      In Figure Legend 1, "expressed from the EPHA4-mutant plasmid" is not an accurate description of the experiment.

      Sorry for the previous inaccurate description. The description has been revised to accurately reflect the experiment. “Western blot analysis of EPHA4-c.2546G>A variant showing the protein expression levels of EPHA4 and CDK5 and the amount of phosphorylated CDK5 (pCDK5) in HEK293T cells transfected with EPHA4-mutant or EPHA4-WT plasmid”.

      Figure 3 panels J and K need more explanation. I don't understand what the different colors represent nor do I understand what are wild type and what are mutant data.

      Thank you for your valuable feedback. We apologize for the lack of clarity in the original figure legend. To address this, we have revised the legend of Figure 3 to provide a more detailed explanation. In panels J and K, each color-coded curve represents the response of an individual larva from an independent experimental trial to the stimulus. Specifically, panel J depicts the response data for the wild-type larvae, whereas panel K presents the response data for the homozygous epha4a mutants.

      Please provide the genotypes for the images in Figure 5A.

      Thank you for the comment, and we apologize for the unclear description. We have now described this more clearly in Figure 5.

      Figure legend 6B should also note the heterozygote data with the wild type and homozygous mutant data.

      Thank you for the comment. The data are now included in Figure 6B.

      Epha4 and Efnb3 have well-established roles in axon guidance. Although this is noted in the Discussion, I think a more extensive description of prior findings would be helpful.

      Thank you for your valuable feedback. A more detailed description of the roles of Epha4 and Efnb3 in axon guidance is now provided in the “Discussion” (page 16, lines 388-396).

      The main conclusion of this manuscript is that EPHA4 variants cause IS by disrupting central pattern generator function. I think this is misleading. I think that the more valid conclusion is that EPHA4 loss of function causes axon pathfinding defects that impair locomotion by disrupting CPG activity, thereby leading to IS. I urge the authors to consider this more nuanced interpretation.

      Thank you for your insightful comments. We appreciate your suggestion to refine our main conclusion. We agree that the proposed revision more accurately reflects our findings and will revise the manuscript accordingly to state that “EPHA4 loss of function causes axon pathfinding defects, which impair locomotion by disrupting central pattern generator activity, potentially leading to IS.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Seidenthal et al. investigated the role of the C. elegans Flower protein, FLWR-1, in synaptic transmission, vesicle recycling, and neuronal excitability. They confirmed that FLWR-1 localizes to synaptic vesicles and the plasma membrane and facilitates synaptic vesicle recycling at neuromuscular junctions, albeit in an unexpected manner. The authors observed that hyperstimulation results in endosome accumulation in flwr-1 mutant synapses, suggesting that FLWR-1 facilitates the breakdown of endocytic endosomes, which differs from earlier studies in flies that suggested the Flower protein promotes the formation of bulk endosomes. This is a valuable finding. Using tissue-specific rescue experiments, the authors showed that expressing FLWR-1 in GABAergic neurons restored the aldicarb-resistant phenotype seen in flwr-1 mutants to wild-type levels. In contrast, FLWR-1 expression in cholinergic neurons in flwr-1 mutants did not restore aldicarb sensitivity, yet muscle expression of FLWR-1 partially but significantly recovered the aldicarb-resistant defects. The study also revealed that removing FLWR-1 leads to increased Ca<sup>2+</sup> signaling in motor neurons upon photo-stimulation. Further, the authors conclude that FLWR-1 contributes to the maintenance of the excitation/inhibition (E/I) balance by preferentially regulating the excitability of GABAergic neurons. Finally, SNG-1::pHluorin data imply that FLWR-1 removal enhances synaptic transmission, however, the electrophysiological recordings do not corroborate this finding.

      Strengths:

      This study by Seidenthal et al. offers valuable insights into the role of the Flower protein, FLWR-1, in C. elegans. Their findings suggest that FLWR-1 facilitates the breakdown of endocytic endosomes, which marks a departure from its previously suggested role in forming endosomes through bulk endocytosis. This observation could be important for understanding how Flower proteins function across species. In addition, the study proposes that FLWR-1 plays a role in maintaining the excitation/inhibition balance, which has potential impacts on neuronal activity.

      Weaknesses:

      One issue is the lack of follow-up tests regarding the relative contributions of muscle and GABAergic FLWR-1 to aldicarb sensitivity. The findings that muscle expression of FLWR-1 can significantly rescue aldicarb sensitivity are intriguing and may influence both experimental design and data interpretation. Have the authors examined aldicarb sensitivity when FLWR-1 is expressed in both muscles and GABAergic neurons, or possibly in muscles and cholinergic neurons? Given that muscles could influence neuronal activity through retrograde signaling, a thorough examination of FLWR-1's role in muscle is necessary, in my opinion.

      We thank the reviewer for this suggestion. Indeed, the retrograde inhibition of cholinergic transmission by signals from muscle has been demonstrated by the Kaplan lab in a number of publications. We have now done the experiments that were suggested, see the new Fig. S3B: rescuing FLWR-1 in both cholinergic neurons and muscle did not perform any better in the aldicarb assay, while co-rescue in GABAergic neurons and muscle, like rescue in GABAergic neurons alone, led to a complete rescue to wild type levels. Thus, retrograde signaling from muscle to neurons does not contribute to the effects on the E/I imbalance caused by the absence of FLWR-1. The fact that muscle rescue can partially rescue the flwr-1 phenotype is likely due to a cell-autonomous effect of FLWR-1 on muscle excitability, facilitating muscle contraction.

      Would the results from electrophysiological recordings and GCaMP measurements be altered with muscle expression of FLWR-1? Most experiments presented in the manuscript compare wild-type and flwr-1 mutant animals. However, without tissue-specific knockout, knockdown, or rescue experiments, it is difficult to separate cell-autonomous roles from non-cell-autonomous effects, in particular in the context of aldicarb assay results. Also, relying solely on levamisole paralysis experiments is not sufficient to rule out changes in muscle AChRs, particularly due to the presence of levamisole-resistant receptors.

      We repeated the Ca<sup>2+</sup> imaging in cholinergic neurons, in response to optogenetic activation, with expression of FLWR-1 in muscle, see Fig. 4E. This did not significantly alter the increased excitability of the flwr-1 mutant. Thus, we conclude that, in line with the findings in aldicarb assays, the function of FLWR-1 in muscle is cell-autonomous, and does not indirectly affect its roles in the motor neurons. Also, cholinergic expression of FLWR-1 by itself reduced Ca<sup>2+</sup> levels to those in wild type (Fig. 4E). In addition, we now also assessed the contribution of the N-AChR (ACR-16) to aldicarb-induced paralysis (Fig. S3C), showing that flwr-1 and acr-16 mutations independently mediate aldicarb resistance, and that these effects are additive. Thus, FLWR-1 does not affect the expression level or function of the N-AChR, as otherwise the flwr-1; acr-16 double mutation would not exacerbate the phenotype of the single mutants.

      This issue regarding the muscle role of FLWR-1 also complicates the interpretation of results from coelomocyte uptake experiments, where GFP secreted from muscles and coelomocyte fluorescence were used to estimate endocytosis levels. A decrease in coelomocyte GFP could result from either reduced endocytosis in coelomocytes or decreased secretion from muscles. Therefore, coelomocyte-specific rescue experiments seem necessary to distinguish between these possibilities.

      We have performed a rescue of FLWR-1 in coelomocytes to address this, and found that this fully recovered the CC GFP signals to wild type levels. Therefore, the absence of FLWR-1 in muscles does not affect exocytosis of GFP. The data can be found in Fig. 5A, B.

      The manuscript states that GCaMP was used to estimate Ca<sup>2+</sup> levels at presynaptic sites. However, due to the rapid diffusion of both Ca<sup>2+</sup> and GCaMP, it is unclear how this assay distinguishes Ca<sup>2+</sup> levels specifically at presynaptic sites versus those in axons. What are the relative contributions of VGCCs and ER calcium stores here? This raises a question about whether the authors are measuring the local impact of FLWR-1 specifically at presynaptic sites or more general changes in cytoplasmic calcium levels.

      We compared Ca<sup>2+</sup> signals in synaptic puncta versus axon shafts, and did not find any differences. The data previously shown have been replaced by data where the ROIs were restricted to synaptic puncta. The outcome is the same as before. These data are provided in Fig. 4A, B, E, F. We thus conclude that the impact of FLWR-1 is local, in synaptic boutons.

      The experiments showing FLWR-1's presynaptic localization need clarification/improvement. For example, in the data shown in Fig. 3B, GFP::FLWR-1 is expressed under its own promoter, and TagRFP::ELKS-1 is expressed exclusively in GABAergic neurons. Given that pflwr-1 drives expression in both cholinergic and GABAergic neurons, and that cholinergic synapses outnumber GABAergic ones in the nerve cord, it would be expected that many green FLWR-1 puncta do not associate with TagRFP::ELKS-1. However, several images in Figure 3B suggest an almost perfect correlation between FLWR-1 and ELKS-1 puncta. It would be helpful for the readers to understand the exact location in the nerve cord where these images were collected to avoid confusion.

      Thank you for making us aware that the provided images may be misleading. We have now extended this Figure (Fig. 3A-C) and provided more intensity profiles along the nerve cords in Fig. S4A-C. The quantitative analysis of average R<sup>2</sup> for the two fluorescent signals in each neuron type did not show any significant difference between the two, also after choosing slightly smaller ROIs for line scan analysis. We also highlighted the puncta corresponding to FLWR-1 in both neuron types, as well as to ELKS-1 in each specific neuron type, to identify FLWR-1 puncta without co-localized ELKS-1 signal. Also, we indicated the region that was imaged, i.e. the DNC posterior of the vulva, halfway to the posterior end of the nerve cord.
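
      For readers unfamiliar with this kind of line scan comparison, the sketch below shows one common way an R<sup>2</sup> value between two intensity profiles can be computed, namely as the squared Pearson correlation coefficient. This is an assumption for illustration: the exact procedure used for Fig. S4 is not specified here, and the function name and the example profiles are hypothetical.

```python
# Illustrative only: R^2 between two fluorescence line-scan profiles,
# computed as the squared Pearson correlation coefficient.
# The function name and example profiles are hypothetical.
from math import fsum


def r_squared(profile_a, profile_b):
    """Squared Pearson correlation of two equally long intensity profiles."""
    if len(profile_a) != len(profile_b) or len(profile_a) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(profile_a)
    mean_a = fsum(profile_a) / n
    mean_b = fsum(profile_b) / n
    cov = fsum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = fsum((a - mean_a) ** 2 for a in profile_a)
    var_b = fsum((b - mean_b) ** 2 for b in profile_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a flat profile has no defined correlation")
    return (cov * cov) / (var_a * var_b)


# Made-up profiles along a stretch of nerve cord (arbitrary units).
green_channel = [10.0, 42.0, 15.0, 55.0, 12.0, 48.0]
red_channel = [8.0, 30.0, 11.0, 41.0, 9.0, 35.0]
print(round(r_squared(green_channel, red_channel), 3))
```

      Values close to 1 indicate that puncta in the two channels rise and fall together along the scan; values close to 0 indicate no linear relationship.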

      The SNG-1::pHluorin data in Figure 5C is significant, as they suggest increased synaptic transmission at flwr-1 mutant synapses. However, to draw conclusions, it is necessary to verify whether the total amount of SNG-1::pHluorin present on synaptic vesicles remains the same between flwr-1 mutant and wild-type synapses. Without this comparison, a conclusion on levels of synaptic vesicle release based on changes in fluorescence might be premature, in particular given the results of electrophysiological recordings.

      We appreciate the comment. We have now added data and experiments that verify that the basal SNG-1::pHluorin signal in the plasma membrane, measured at synaptic puncta and in adjacent axonal areas, is not different in flwr-1 mutants compared to wild type in the absence of stimulation. These data can be found in Fig. S5A. In addition, we cultured primary neurons from transgenic animals to compare total SNG-1::pHluorin to the vesicular fraction, by adding buffers of defined pH to the external medium, or buffers that penetrate the cell and fix intracellular pH. These experiments (Fig. S5B, C) showed no difference in the vesicle fraction of the pHluorin signal in wild type vs. flwr-1 mutant cells, demonstrating that flwr-1 mutants do not per se have altered SNG-1::pHluorin in their SV or plasma membranes.

      Finally, the interpretation of the E74Q mutation results needs reconsideration. Figure 8B indicates that the E74Q variant of FLWR-1 partially loses its rescuing ability, which suggests that the E74Q mutation adversely affects the function of FLWR-1. Why did the authors expect that the role of FLWR-1 should have been completely abolished by E74Q? Given that FLWR-1 appears to work in multiple tissues, might FLWR-1's function in neurons require its calcium channel activity, whereas its role in muscles might be independent of this feature? While I understand there is ongoing debate about whether FLWR-1 is a calcium channel, the experiments in this study do not definitively resolve local Ca<sup>2+</sup> dynamics at synapses. Thus, in my opinion, it may be premature to draw firm conclusions about calcium influx through FLWR-1.

      Thank you for bringing this up. We did not expect E74Q to necessarily abolish FLWR-1 function, unless it were a Ca<sup>2+</sup> channel. Of course, the reviewer is right: FLWR-1 might have functions as an ion channel as well as channel-independent functions. Yet, we are quite confident that FLWR-1 is not an ion channel. Instead, we think that E74Q alters stability of the protein (however, in the absence of biochemical data, we removed this conclusion), and that this impairs the function of FLWR-1 as a modulator, or possibly even an accessory subunit, of the PMCA MCA-3. This interaction was indicated by a new experiment we added, where we found that FLWR-1 and MCA-3 must be physically very close to each other in the plasma membrane, using bimolecular fluorescence complementation (see new Fig. 9A, B). This provides a reasonable explanation for findings we obtained, i.e. increased Ca<sup>2+</sup> levels in stimulated neurons of the flwr-1 mutant. If FLWR-1 acts as a stimulatory subunit of MCA-3, then its absence may cause reduced MCA-3 function and thus an accumulation of Ca<sup>2+</sup> in the synaptic terminals. In Drosophila, hyperstimulation of neurons led to reduced Ca<sup>2+</sup> levels (Yao et al., 2017, PLoS Biol 15: e2000931), suggesting that Flower is a Ca<sup>2+</sup> channel. Based on our findings, we suggest an alternative explanation. Based on proteomics, the PMCA is a component of SVs (Takamori et al., 2006, Cell 127: 831-846). Increased insertion of PMCA into the plasma membrane during high stimulation, along with impaired endocytosis in flower mutants, would increase the steady-state levels of PMCA in the PM. This could lead to reduced steady-state levels of Ca<sup>2+</sup>. This ‘g.o.f.’ in Flower may also impact Ca<sup>2+</sup> microdomains of the P/Q type VGCC required for SV fusion, which could contribute to the rundown of EPSCs we find during synaptic hyperstimulation (Fig. 5G-J). We acknowledge, though, that Yao et al. (2009, Cell 138: 947-960) showed increased uptake of Ca<sup>2+</sup> into liposomes reconstituted with purified Flower protein. However, it cannot be ruled out that a protein contaminant could be responsible, as the controls were empty liposomes, not liposomes reconstituted with a mutated Flower protein purified the same way.

      We also tested the E74Q mutant in its ability to rescue the reduced PI(4,5)P<sub>2</sub> levels in coelomocytes (CCs), where we observed no positive effect. While we have not measured Ca<sup>2+</sup> in CCs, we would assume that here a function of FLWR-1 affecting increased PI(4,5)P<sub>2</sub> levels is not linked to a channel function. It was, nevertheless, compromised by E74Q (Fig. 8D).

      Also, the aldicarb data presented in Figures 8B and 8D show notable inconsistencies that require clarification. While Figure 8B indicates that the 50% paralysis time for flwr-1 mutant worms occurs at 3.5-4 hours, Figure 8D shows that 50% paralysis takes approximately 2.5 hours for the same flwr-1 mutants. This discrepancy should be addressed. In addition, the manuscript mentions that the E74Q mutation impairs FLWR-1 folding, which could significantly affect its function. Can the authors show empirical data supporting this claim?

      We performed the aldicarb assays in a consistent manner, but nonetheless note that some variability from day to day can affect such outcomes. Importantly, we always measured each control (wild type, flwr-1) along with each test strain (FLWR-1 point mutants), to ensure the relevant estimate of a point-mutant’s effect. These assays have been repeated, now including the FLWR-1 wild type rescue strain as a comparison. The data are now combined in Fig. 8B. Regarding the assumed instability of the E74Q mutant, as we, indeed, do not have any experimental data supporting this, we removed this sentence.
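
      To make the quantity under discussion concrete, the sketch below shows one simple way a 50% paralysis time could be read off a paralysis time course, by linear interpolation between the two time points that bracket the 50% mark. This is an illustration under stated assumptions, not the analysis used in the manuscript: the function name is hypothetical and the example time course is made up.

```python
# Illustrative only: estimate the time at which a paralysis curve first
# reaches a target fraction, by linear interpolation between time points.
# The function name and the example time course are hypothetical.
def time_to_fraction(times, fractions, target=0.5):
    """Return the interpolated time at which `fractions` first reaches
    `target`, or None if the target is never reached."""
    pairs = list(zip(times, fractions))
    if not pairs:
        return None
    if pairs[0][1] >= target:
        return pairs[0][0]
    for (t0, f0), (t1, f1) in zip(pairs, pairs[1:]):
        if f0 < target <= f1:
            return t0 + (target - f0) * (t1 - t0) / (f1 - f0)
    return None


# Made-up time course: hours on aldicarb vs. fraction of animals paralyzed.
hours = [0.0, 1.0, 2.0, 3.0, 4.0]
paralyzed = [0.0, 0.1, 0.3, 0.7, 1.0]
print(time_to_fraction(hours, paralyzed))
```

      Because day-to-day variability shifts the whole curve, such estimates are most informative when compared against controls scored in the same session, which is the point made in the response above.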

      Reviewer #2 (Public review):

      Summary:

      The Flower protein is expressed in various cell types, including neurons. Previous studies in flies have proposed that Flower plays a role in neuronal endocytosis by functioning as a Ca<sup>2+</sup> channel. However, its precise physiological roles and molecular mechanisms in neurons remain largely unclear. This study employs C. elegans as a model to explore the function and mechanism of FLWR-1, the C. elegans homolog of Flower. This study offers intriguing observations that could potentially challenge or expand our current understanding of the Flower protein. Nevertheless, further clarification or additional experiments are required to substantiate the study's conclusions.

      Strengths:

      A range of approaches was employed, including the use of a flwr-1 knockout strain, assessment of cholinergic synaptic activity via analyzing aldicarb (a cholinesterase inhibitor) sensitivity, imaging Ca<sup>2+</sup> dynamics with GCaMP3, analyzing pHluorin fluorescence, examination of presynaptic ultrastructure by EM, and recording postsynaptic currents at the neuromuscular junction. The findings include notable observations on the effects of flwr-1 knockout, such as increased Ca<sup>2+</sup> levels in motor neurons, changes in endosome numbers in motor neurons, altered aldicarb sensitivity, and potential involvement of a Ca<sup>2+</sup>-ATPase and PIP2 binding in FLWR-1's function.

      Weaknesses:

      (1) The observation that flwr-1 knockout increases Ca<sup>2+</sup> levels in motor neurons is notable, especially as it contrasts with prior findings in flies. The authors propose that elevated Ca<sup>2+</sup> levels in flwr-1 knockout motor neurons may stem from "deregulation of MCA-3" (a Ca<sup>2+</sup> ATPase in the plasma membrane) due to FLWR-1 loss. However, this conclusion relies on limited and somewhat inconclusive data (Figure 7). Additional experiments could clarify FLWR-1's role in MCA-3 regulation. For instance, it would be informative to investigate whether mutations in other genes that cause elevated cytosolic Ca<sup>2+</sup> produce similar effects, whether MCA-3 physically interacts with FLWR-1, and whether MCA-3 expression is reduced in the flwr-1 knockout.

      We thank the reviewer for bringing up these critical points. As to other mutations that produce elevated cytosolic Ca<sup>2+</sup>: Possible mutations could be g.o.f. mutations of the ryanodine receptor UNC-68, the sarco-endoplasmic Ca<sup>2+</sup> ATPase, or mutants affecting VGCCs, like the L-type channel EGL-19 or the P/Q-type channel UNC-2. However, any such mutant would affect muscle contractions (as we have shown for r.o.f. mutations in unc-68, egl-19 and unc-2 in Nagel et al. 2005 Curr Biol 15: 2279-84) and thus would affect aldicarb assays (see aldicarb resistance induced by RNAi of these genes in Sieburth et al., 2005, Nature 436: 510). The same should be expected for g.o.f. mutations of any such gene. In neurons, we would expect increased or decreased Ca<sup>2+</sup> levels in response to stimulation.

      Regarding the physical interaction of MCA-3 and FLWR-1, we performed bimolecular fluorescence complementation, with two fragments of mVenus fused to the two proteins. This assay shows mVenus reconstitution (i.e., fluorescence) if the two proteins are found in close vicinity to each other. Testing MCA-3 and FLWR-1 in muscle indeed showed a robust signal, evenly distributed on the plasma membrane. As a control, FLWR-1 did not interact with another plasma membrane protein, the stomatin UNC-1, which interacts with gap junction proteins (Chen et al., 2007, Curr Biol 17: 1334-9). FLWR-1 also interacted with the ER chaperone Nicalin (NRA-2 in C. elegans), which helps assemble the TM domains of integral membrane proteins in association with the SEC translocon. However, this signal only occurred in the ER membrane, demonstrating the specificity of the BiFC assay. These data are presented in Fig. 9A, B. Additionally, we show that FLWR-1 expression has a function in stabilizing MCA-3 localization at synapses, which is also in line with the idea of a direct interaction (Fig. 9C, D).

      (2) In silico analysis identified residues R27 and K31 as potential PIP2 binding sites in FLWR-1. The authors observed that FLWR-1(R27A/K31A) was less effective than wild-type FLWR-1 in rescuing the aldicarb sensitivity phenotype of the flwr-1 knockout, suggesting that FLWR-1 function may depend on PIP2 binding at these two residues. Given that mutations in various residues can impair protein function non-specifically, additional studies may be needed to confirm the significance of these residues for PIP2 binding and FLWR-1 function. In addition, the authors might consider explicitly discussing how this finding aligns or contrasts with the results of a previous study in flies, where alanine substitutions at K29 and R33 impaired a Flower-related function (Li et al., eLife 2020).

      We further investigated the role of these two residues in an in vivo assay for PIP2 binding and membrane association of a reporter. We used the coelomocytes (CCs), in which a previous publication demonstrated that a GFP variant tagged with a PH domain would be recruited to the CC membrane (Bednarek et al., 2007, Traffic 8: 543-53). This assay was performed in wild type, flwr-1 mutants, and flwr-1 mutants rescued with wild type FLWR-1, the FLWR-1(E74Q) mutant, or the FLWR-1(K27A; R31A) double mutant. The data are shown in Fig. 8C, D. While wild type FLWR-1 rescued PH-GFP levels at the CC membrane to those of the wild type control, the FLWR-1(K27A; R31A) double mutant did not rescue the reporter binding, indicating that, at least in CCs, reduced PIP2 levels are associated with non-functional FLWR-1. Mechanistically, this is not clear at present, though we noted a possible mechanism as found for synaptotagmin, which recruits the PIP2 kinase to the plasma membrane via a lysine- and arginine-containing motif (Bolz et al., 2023, Neuron 111: 3765-3774.e3767). We now mention this in the discussion. We also discussed our data with respect to the findings of Li et al. about the analogous residues K27, R31 (K29, R33) in the discussion section, i.e. lines 667-670, and the differences of our findings in electron microscopy compared to the Drosophila work (more rather than fewer bulk endosomes) were discussed in lines 713-720.

      (3) A primary conclusion from the EM data was that FLWR-1 participates in the breakdown, rather than the formation, of bulk endosomes (lines 20-22). However, the reasoning behind this conclusion is somewhat unclear. Adding more explicit explanations in the Results section would help clarify and strengthen this interpretation.

      We added a sentence to better explain our reasoning. The main argument is that accumulation of such unusually large endosomes is seen in mutants affecting the formation of SVs from the endosome (endophilin and synaptojanin mutants), while mutants mainly affecting endocytosis (dynamin) cause formation of many smaller endocytic structures that stay attached to the plasma membrane (Kittelmann et al., 2013, PNAS 110: E3007-3016). We also changed our data analysis in that we collated the data for what we previously termed endosomes and large vesicles. According to the paper by Watanabe, 2013, eLife 2: e00723, endosomes are defined by their location in the synapse and their size. However, that work used a much shorter stimulus and froze the preparations within a few dozen to hundreds of ms after the stimulus, while we used the protocol of Kittelmann 2013, which uses 30 s of stimulation and freezing after 5 s. There, endosomes were defined as structures larger than SVs or DCVs, but no larger than 80 nm, with an electron-dense lumen, and were very rarely observed. In contrast, large vesicles or ‘100 nm vesicles’, which ranged from 50-200 nm in diameter and had a clear lumen, were morphologically similar to the bulk endosomes observed by Li et al., 2021. We thus reordered our data and jointly analyzed these structures as large vesicles / bulk endosomes. The outcome is still the same, i.e. photostimulated flwr-1 mutants showed more LVs than wild type synapses.

      (4) The aldicarb assay results in Figure 3 are intriguing, indicating that reduced GABAergic neuron activity alone accounts for the flwr-1 mutant's hyposensitivity to aldicarb. Given that cholinergic motor neurons also showed increased activity in the flwr-1 mutant, one might expect the flwr-1 mutant to display hypersensitivity to aldicarb in the unc-47 knockout background. However, this was not observed. The authors might consider validating their conclusion with an alternative approach or, at the minimum, providing a plausible explanation for the unexpected result. Since aldicarb-induced paralysis can be influenced by factors beyond acetylcholine release from cholinergic motor neurons, interpreting aldicarb assay results with caution may be advisable. This is especially relevant here, as FLWR-1 function in muscle cells also impacts aldicarb sensitivity (Figure S3B). Previous electrophysiological studies have suggested that aldicarb sensitivity assays may sometimes yield misleading conclusions regarding protein roles in acetylcholine release.

      We tested the unc-47; flwr-1 animals again at a lower concentration of aldicarb, to see if the high concentration may have leveled the differences between unc-47 animals and the double mutant. This experiment is shown in Fig. S3D, demonstrating that the double mutant is significantly less resistant to aldicarb. This verifies that FLWR-1 acts not only in GABAergic neurons, but also in cholinergic neurons (as we saw by electron microscopy and electrophysiology), and that the increased excitability of cholinergic cells leads to more acetylcholine being released. In the double mutant, where GABA release is defective, this conveys hypersensitivity to aldicarb.

      (5) Previous studies have suggested that the Flower protein functions as a Ca<sup>2+</sup> channel, with a conserved glutamate residue at the putative selectivity filter being essential for this role. However, mutating this conserved residue (E74Q) in C. elegans FLWR-1 altered aldicarb sensitivity in a direction opposite to what would be expected for a Ca<sup>2+</sup> channel function. Moreover, the authors observed that E74 of FLWR-1 is not located near a potential conduction pathway in the FLWR-1 tetramer, as predicted by AlphaFold3. These findings raise the possibility that Flower may not function as a Ca<sup>2+</sup> channel. While this is a potentially significant discovery, further experiments are needed to confirm and expand upon these results.

      As above, we do not exclude that FLWR-1 may constitute a channel. However, based on our findings, AF3 structure predictions and data in the literature, we are considering alternative explanations for the observed effect on Ca<sup>2+</sup> levels of Flower mutants in worms and flies. The observation of increased Ca<sup>2+</sup> levels in stimulated flwr-1 mutant neurons could result from a reduced stimulation of the PMCA, and this was also observed with low stimulation in Drosophila (Yao et al., 2017). This idea is supported by the indications of a direct physical interaction, or proximity, of the two proteins. The reduced Ca<sup>2+</sup> levels after hyperstimulation of Drosophila Flower mutants may have to do with increased levels of non-recycling PMCA in the plasma membrane, indicating that PMCA requires Flower for recycling. This could underlie the rundown of evoked PSCs we find in worm flwr-1 mutants, and would also be in line with a function of FLWR-1 and MCA-3 in coelomocytes, cells that constantly endocytose, and in which both proteins are required for proper function (our data, Figs. 5A, B; 8D, E) and Bednarek et al., 2007 (Traffic 8: 543-553). CCs need to recycle / endocytose membranes and membrane proteins, and such proteins, likely including FLWR-1 and MCA-3, need to be returned to the PM effectively.

      We thus refrained from testing a putative FLWR-1 channel function in Xenopus oocytes, in part also because we would not be able to acutely trigger possible FLWR-1 gating. A constitutive Ca<sup>2+</sup> current, if it were present, would induce a large Cl<sup>-</sup> conductance in oocytes, which would likely be problematic and could kill the cells. The demonstration that FLWR-1(E74Q) does not rescue the PI(4,5)P<sub>2</sub> levels in coelomocytes is also more in line with a non-channel function of FLWR-1.

      (6) Phrases like "increased excitability" and "increased Ca<sup>2+</sup> influx" are used throughout the manuscript. However, there is no direct evidence that motor neurons exhibit increased excitability or Ca<sup>2+</sup> influx. The authors appear to interpret the elevated Ca<sup>2+</sup> signal in motor neurons as indicative of both increased excitability and Ca<sup>2+</sup> influx. However, this elevated Ca<sup>2+</sup> signal in the flwr-1 mutant could occur independently of changes in excitability or Ca<sup>2+</sup> influx, such as in cases of reduced MCA-3 activity. The authors may wish to consider alternative terminology that more accurately reflects their findings.

      Thank you, we rephrased the imprecise wording. Ca<sup>2+</sup> influx was meant with respect to the cytosol.

      Reviewer #3 (Public review):

      Summary:

      Seidenthal et al. investigated the role of the Flower protein, FLWR-1, in C. elegans and confirmed its involvement in endocytosis within both synaptic and non-neuronal cells, possibly by contributing to the fission of bulk endosomes. They also uncovered that FLWR-1 has a novel inhibitory effect on neuronal excitability at GABAergic and cholinergic synapses in neuromuscular junctions.

      Strengths:

      This study not only reinforces the conserved role of the Flower protein in endocytosis across species but also provides valuable ultrastructural data to support its function in the bulk endosome fission process. Additionally, the discovery of FLWR-1's role in modulating neuronal excitability broadens our understanding of its functions and opens new avenues for research into synaptic regulation.

      Weaknesses:

      The study does not address the ongoing debate about the Flower protein's proposed Ca<sup>2+</sup> channel activity, leaving an important aspect of its function unexplored. Furthermore, the evidence supporting the mechanism by which FLWR-1 inhibits neuronal excitability is limited. The suggested involvement of MCA-3 as a mediator of this inhibition lacks conclusive evidence, and a more detailed exploration of this pathway would strengthen the findings.

      We added new data showing the likely direct interaction of FLWR-1 with the PMCA, possibly upregulating / stimulating its function. These data are now shown in Fig. 9A, B. We also now show that FLWR-1 is required to stabilize MCA-3 expression / localization in the pre-synaptic plasma membrane (Fig. 9C, D). These findings do not support the putative function of FLWR-1 as an ion channel, but suggest that increased Ca<sup>2+</sup> levels following neuron stimulation in flwr-1 mutants are due to an impairment of MCA-3 and thus reduced Ca<sup>2+</sup> extrusion.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors might consider focusing on one or two key findings from this study and providing robust evidence to substantiate their conclusions.

      We substantiated the interaction of FLWR-1 with the PMCA, and further assessed the function of FLWR-1 in coelomocytes and in regulating PIP2 levels in the plasma membrane.

      Reviewer #3 (Recommendations for the authors):

      (1) Behavioral Analysis of Locomotion

      In Figure 1, the authors are encouraged to examine whether flwr-1 mutants show altered locomotion behaviors, such as velocity, in a solid medium.

      We performed such an analysis for wild type, comparing to flwr-1 mutants and flwr-1 mutants rescued with FLWR-1 expressed from the endogenous promoter. The data are shown in Fig. S1C. There was no difference. We note that in swimming assays, too, we observed differences only when we strongly stimulated the cholinergic neurons by optogenetic depolarization, but not during unstimulated, normal swimming.

      (2) Validation of FLWR-1 Tagging

      In Figure 2A, it is recommended that the authors confirm the functionality of the C-terminal-tagged FLWR-1.

      We performed such rescue assays during swimming. The data is shown in Fig. S2S, E. While the GFP::FLWR-1 animals were slightly affected right after the photostimulation, they quickly caught up with the wild type controls, while flwr-1 mutants remained affected even after several minutes.

      (3) Explanation of Differential Rescue in GABAergic Neurons and Muscle

      The authors should provide a rationale for why restoring FLWR-1 in GABAergic neurons fully rescues the aldicarb resistance phenotype, while its restoration in muscle also partially rescues it.

      We think that these effects are independent of each other, i.e. loss of FLWR-1 in muscles increases muscular excitability, which becomes apparent in the behavioral assay that depends on locomotion and muscle contraction. To assess this further, we performed combined GABAergic neuron and muscle rescue assays, as shown in Fig. S3B. The double rescue was not different from wild type, and performed better than the muscle rescue alone.

      (4) Rescue Experiments for Swimming Defect in GABAergic Neurons

      Consider adding rescue experiments to determine whether expressing FLWR-1 specifically in GABAergic neurons can restore the swimming defect phenotype.

      We did not perform this assay as swimming is driven by cholinergic neurons, meaning that we would only indirectly probe GABAergic neuron function and a GABAergic FLWR-1 rescue would likely not improve swimming much. Also, given the importance of the correct E/I balance in the motor neurons, it would likely require achieving expression levels that are very precisely matching endogenous expression levels, which is not possible in a cell-specific manner.

      (5) Further Data on GCaMP Assay for mca-3; flwr-1 Additive Effect

      The additive effect of the mca-3 and flwr-1 mutations on GCaMP signals requires further data for substantiation. Additional GCaMP recordings or statistical analysis would provide stronger support for the proposed interaction between MCA-3 and FLWR-1 in calcium signaling.

      Thank you. We increased the number of observations, which made the outcome of the assay more conclusive: the double mutation did not exacerbate the effect of either single mutant, demonstrating that FLWR-1 and MCA-3 act in the same pathway. The data are in Fig. 7B, C.

      (6) Inclusion of Wild-Type FLWR-1 Rescue in Figures 8B and 8D

      Figures 8B and 8D would benefit from the inclusion of wild-type FLWR-1 as a rescue control.

      We included the FLWR-1 wild type rescue as suggested and summarized the data in Fig. 8B.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Responses to final minor critiques following initial revision

      Reviewer #1 (Recommendations for the authors): 

      The authors have generally done an excellent job of addressing my and the other reviewers' concerns. I have a few additional concerns that the authors could consider addressing through changes to the text: 

      We thank the Reviewer for this assessment and are glad to have addressed the major points.

      - Regarding the gRNA used for NMR studies, I thank the authors for adding additional rationale for their design of the RNA used. However, I still believe that it is misleading to term this RNA as a "gRNA", given that it is mainly composed of a sequence that is arbitrary (the spacer) and the sections of the gRNA that are constant between all gRNAs are truncated in a way that removes secondary structure that is likely essential for specific contacts with the Rec domains. I do not believe the authors need to make alterations to any of their experiments. However, I do think their description of the "gRNA" should be updated to properly reflect that this RNA lacks any of the secondary structure present in a typical gRNA, much of which is necessary to confer specificity of binding between GeoCas9 and the gRNA. As mentioned in my previous review, this may be best achieved by adding a cartoon of the secondary structure of the full-length gRNA and highlighting the region that was used in the truncated "gRNA". 

      We understand the Reviewer’s point. For any experiment in which the gRNA was truncated (i.e. NMR or some MST studies), we have clarified the text and no longer call it a “gRNA.” We state initially that it is a portion of the gRNA and then call it simply an “RNA.” 

      For experiments using the full-length constructs, we have kept the term “gRNA,” as it remains appropriate.

      We have also added a final Supplementary figure (S12) showing the structures of the truncated and full-length RNAs used, based on the _Geo_Cas9 cryo-EM structure and predicted with RNAfold.

      - Lines 256-257: "The ~3-fold decrease in Kd...". I believe the authors are discussing the Kd's of the mutants relative to WT, in which case the Kd increased. Also, the fold-change appears closer to 2-fold than to 3-fold.

      Yes, the Reviewer makes a good catch. We have corrected this.

      - Lines 407-408: "The mutations also diminished the stability of the full-length GeoCas9 RNP complex." This statement seems at odds with the authors' conclusions in the Results section that the full-length GeoCas9 variants had comparable affinities for the gRNAs (lines 376-382) 

      We agree that this seems contradictory. In the absence of full-length structures for all variants, we can’t definitively state what causes this. It could be that the mutation has an interesting allosteric effect on structure that does not affect RNA binding but induces the Cas9 protein to simply fall apart at lower temperatures, rendering the binding interaction moot. We have added a statement to this section.

      - The authors chose to keep "SpCas9" for consistency with their prior work and the work of many others, including Doudna et al and Zhang et al. However, I will note that in their publications on GeoCas9, the Doudna lab did use SpyCas9 to ensure consistent nomenclature within the publications.

      We have made the change to “_Spy_Cas9”

      Reviewer #3 (Recommendations for the authors): 

      The authors clearly answered most of my concerns. I still have some technical questions about the analysis of CPMG-RD data but the numbers provided now seem to make sense. While I still think that crystal structures of the point mutant would make the conclusions more "bullet proof", I do appreciate the work associated with this and consider that the manuscript can be published as is. 

      We agree that additional magnetic fields could allow for additional models of CPMG data fitting and that additional crystal structures of the mutants could add to the conclusions. We appreciate the Reviewer recognizing the balance of the current results and potential future studies in signing off on publication.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We thank the reviewers for their thoughtful comments and suggestions. Our plans for revisions are first summarized. Below you can find the original reviews and our responses and detailed plans (indicated by "Response").

      Revision plan summary:

      1. Many of the concerns can be addressed by changes in the text and better explanations of how the experiments were done. These changes are detailed in the point-by-point responses.
      2. The reviewers suggested experiments such as ChIP-seq and immunoprecipitation which require collection of a large number of mutants. Since our mutants are sterile, the line needs to be maintained as heterozygotes, from which we can pick out individual mutant worms. Therefore, with the current reagents it is impossible to collect mutants in sufficient quantities for ChIP-seq or IP. We understand that it limits the conclusions that can be drawn.
      3. For some figures, additional quantification of fluorescence signal will be done to show differences between mutant and wild type.
      4. A few experiments will be repeated:
         - We will repeat the ATPase assays shown in Fig 1 with additional independently prepared and purified protein samples.
         - Additional replicates will be performed for the few immunofluorescence experiments that were only performed once.

      Point-by-point responses:

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Dosage compensation (DC) in C. elegans involves halving the gene expression from the two hermaphrodite X chromosomes to match the output of the single X in male worms. The key regulator of this repression is a specialized condensin complex, which is defined by a dedicated SMC-4 paralog, termed DPY-27. SMC-4 in other animals is an ATPase that functions as a motor of loop extrusion in cohesion complexes. In their current manuscript, Chawla et al. assessed whether DPY-27 has ATPase function and whether this activity is required for dosage compensation. It had previously been shown that an ATPase-deficient 'EQ' mutant DPY-27 protein interacts with other DC complex members, yet fails to localize to the X. This observation was made with an extra copy of DPY-GFP expressed in addition to the endogenous wildtype protein [Ref 77]. No dominant negative effect was observed. The authors have now engineered the 'EQ' mutation into the endogenous gene locus and genetically generated hetero- and homozygous ATPase mutant worms. Their data suggest that the ATPase activity is required for X-chromosome localization, complex assembly, chromosome compaction as well as enrichment of H4K20me1 on the dampened X chromosome.

      Major comments: 1. ATPase assays, Figure 1. Preparations of individual recombinant proteins may vary significantly and may occasionally show much reduced enzymatic activity. A conclusion about the failure of an ATPase activity should not be drawn from a single preparation, but several protein preps need to be tested, which then serve as 'biological replicates' for the in vitro reaction. Apparently, the ATPase assays shown only involved technical replicates, which is not sufficient.

      Response: We will express and purify additional protein samples and will repeat the assay.

      CRISPR-mediated engineering may lead to unwanted reactions, exemplified by the 'indel' mutation that was recovered in one clone. As a good practice and important control, the sequences of the mutated alleles in the worms should be determined by sequencing of PCR products. Restriction enzyme cleavage or gel electrophoresis of the PCR products is not sufficient to document the nature of the mutation.

      Response: The sequence of the edit was confirmed by Sanger sequencing. We will make it clear in the text.

      All IF data need to be collected from at least 2 biological replicates, i.e. the experiment must have been carried out independently on two different days. The replicates should deliver consistent results. The number of independent replicates should be mentioned in each figure legend.

      Response: Most of our experiments were performed multiple times. We will indicate the number of replicates in the figure legends. The one or two experiments that were only performed once, will be repeated an additional time.

      The expression levels of wildtype and mutant proteins are concluded from IFM. This is very qualitative; quantitative measurements would strengthen the paper.

      Response: We will quantify fluorescence intensity on our existing images to show differences between mutant and wild type.

      Figure 4B: What are the criteria for classification of the three classes of mutant nuclei? To the uninitiated eye they look very similar. I am a bit worried about human bias, if such diffuse staining patterns are to be categorized. The two categories of localization need to be documented better.

      Response: We will provide more images to show the range of phenotypes and provide a better explanation of how they were classified. We will also try a few ways to quantify “diffuseness” to provide a numerical readout.
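
      As an illustration only of what a numerical readout of "diffuseness" could look like, the sketch below computes the coefficient of variation (standard deviation divided by mean) of pixel intensities inside a nucleus. This is not the authors' method; the metric choice, the function name and the intensity values are all hypothetical. A signal concentrated in a focus gives a high value, while an evenly spread signal gives a value near zero.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(intensities):
    """Population standard deviation divided by the mean intensity."""
    values = list(intensities)
    if not values:
        raise ValueError("need at least one pixel intensity")
    mean = fmean(values)
    if mean == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return pstdev(values) / mean


# Hypothetical pixel intensities inside two nuclei (arbitrary units).
focal_nucleus = [0, 0, 0, 100]      # signal concentrated in one spot
diffuse_nucleus = [25, 25, 25, 25]  # same total signal, spread evenly

cv_focal = coefficient_of_variation(focal_nucleus)
cv_diffuse = coefficient_of_variation(diffuse_nucleus)
```

      Both example nuclei carry the same total signal, so the difference between the two values reflects only how the signal is distributed. Any cutoff separating "focal" from "diffuse" would have to be chosen and justified on real images.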

      Figure 5: volume of the X chromosome. Related to (5): Apparently, the mask that contains the X chromosome was drawn by hand on each individual nucleus? I find it very difficult to see how the X chromosomal territory would be assessed in the examples shown. It would be good to see a panel of nuclei, in which the masks are visible. I think the analysis should be blinded, in which a researcher not involved in the analysis draws masks on coded nuclei and their classes are only revealed later. The same concern holds for the FISH/IP overlaps or DPY-27/SDC-2 overlaps.

      Response: The masks used were not drawn by hand but were based on fluorescence intensity thresholds. We will make a supplementary figure that shows the masks used for quantification to help clarify how the experiment and quantification were performed.
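
      To make the idea of an intensity-threshold mask concrete, here is a minimal sketch of how a thresholded X-chromosome signal could be expressed as a fraction of a thresholded nuclear volume. It is written under stated assumptions: the nested-list image stacks, the threshold values and the function names are hypothetical and do not reproduce the authors' imaging pipeline or software.

```python
def threshold_mask(stack, threshold):
    """Mark every voxel whose intensity exceeds the threshold (z, y, x nesting)."""
    return [[[value > threshold for value in row] for row in plane] for plane in stack]


def masked_fraction(signal_mask, nucleus_mask):
    """Fraction of nuclear voxels that also fall inside the signal mask."""
    nuclear = 0
    overlap = 0
    for signal_plane, nucleus_plane in zip(signal_mask, nucleus_mask):
        for signal_row, nucleus_row in zip(signal_plane, nucleus_plane):
            for in_signal, in_nucleus in zip(signal_row, nucleus_row):
                if in_nucleus:
                    nuclear += 1
                    if in_signal:
                        overlap += 1
    return overlap / nuclear if nuclear else 0.0


# Hypothetical two-plane, 2x3 stacks (arbitrary intensity units).
dapi = [[[10, 50, 60], [55, 70, 5]], [[8, 65, 58], [52, 61, 9]]]
x_paint = [[[2, 5, 40], [3, 45, 1]], [[1, 4, 38], [2, 6, 0]]]

nucleus = threshold_mask(dapi, 30)         # hypothetical DNA-stain threshold
x_territory = threshold_mask(x_paint, 20)  # hypothetical X-paint threshold
fraction = masked_fraction(x_territory, nucleus)
```

      Because the mask follows from a fixed threshold rather than a hand-drawn outline, the same rule is applied to every nucleus, although the result still depends on how the threshold itself is chosen.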

      For figure 5, age-matched hermaphrodites were analyzed. How was the age determined and what would be the consequence of age-variations? What is the effect of the mutations on development?

      Response: For our staining experiments, we routinely use young adults, which we define as 24 hr past the larval L4 stage. At this stage, young adults have started laying eggs. We have unpublished data showing that dosage compensation and chromosome compaction deteriorate with age. To avoid using old worms in our assays, we pick L4 larvae, and then use them for experiments the following day.

      Minor comments: 8. The labeling of p-values as a-f in the figures with the values listed in a supplemental table is not comfortable. The p-values corresponding to the letters should be listed in the corresponding legends.

      Response: p values can be added to the figure or the figure legend (they are currently in supplementary tables).

      How were the concentrations of the ATPase preparations determined? It would help to see a protein gel in the supplement to assess their purity.

      Response: Concentrations were determined using a spectrometer. We can show protein gels of the preparations as a supplementary figure.

      In figure 1, heterodimers are assumed, but not shown. Do they dimerize under these conditions?

      Response: We can cite papers from others that show heterodimerization in these conditions (for example, Hassler et al, 2019).

      Reviewer #1 (Significance (Required)):

      Significance: The involvement of the ATPase function of DPY-27 was somewhat expected, in light of the earlier findings published in reference 77 using a transgene. The current study confirms and extends these earlier findings. In principle, the genetic experiment presented here is stronger, if documented better.

      Strengths: The study investigates endogenous proteins and measures different phenomena known to be correlated from previous work. The data are internally consistent.

      Limitations: The lack of biological replicates, and unclear procedures of how to draw the IF masks that underlie the conclusions about X chromosome (co)localization and nuclear volume determination render the argument less convincing. For this reviewer, who is not in the C. elegans field, the analysis of mutant phenotypes is difficult to follow. The conclusions are based on only one type of experiment. In reference 77, the X chromosome binding was done by ChIP-seq, clearly a superior, complementary method.

      Response: As explained above, since the strain has to be maintained as a heterozygote, we are unable to collect enough mutants for a ChIP-seq experiment. We can perform and better document the experimental replicates and we can better explain the quantification methods used.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: The authors analyzed the ATPase function of an SMC-4 variant required for dosage compensation in C. elegans. They made a single amino acid mutation that significantly reduced ATPase activity of the protein as shown by in vitro ATP hydrolysis. They showed that the mutation results in the phenotypic consequences of those shown for other DC mutants, including viability assay, immunofluorescence and DNA FISH. These results demonstrate the important role of ATPase activity in transcription repression.

      Major comments: - Are the key conclusions convincing? The key conclusion that DPY-27 has ATPase activity and using a classic mutation that reduces it largely eliminates its function is convincing. The interpretation of the IF experiments to build the model in the final figure requires stronger evidence, as commented below in additional experiment section.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? Yes, as explained below.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      The main issue with the current model is that the authors assume that the EQ protein that they are analyzing is in complex with the rest of the condensin IDC subunits. However, there is no evidence in the paper suggesting that this occurs. The results are consistent with the possibility that a large portion of the DPY27-EQ is not in a complex.

      IP-western experiments comparing the proportion of other subunits pulled down by the wild type versus the EQ mutant (perhaps extract from ~50% EQ containing population could be reached) are needed to understand the incorporation of the EQ mutant in the complex. This is particularly important for the interpretation of the data in Figure 4A, where 70% of the nuclei show diffuse CAPG-1 and DPY-27 EQ. Is this signal due to disassembled subunits diffusing freely, or as depicted in the model figure, bound less stably everywhere? The immunofluorescence results are consistent with both EQ mutation 1) forming a full complex and unstably binding or 2) destabilizing the complex but incompletely assembled complexes sustaining a pool of free EQ detected by the immunofluorescence experiments.

      Response: We agree that to conclusively show interactions, an IP would be necessary. However, as explained above for ChIP, it is not possible to collect enough mutants to make enough protein extract for an IP. An IP in heterozygous worms is also not ideal, as it would be nearly impossible to distinguish the wild-type protein from the mutant. The antibody we used recognizes the N terminus, which is identical in the two proteins. The only way to distinguish them would be mass spec. However, during the fragmentation process for mass spec, Q can deamidate to E, which would complicate interpretation of our data. To do this experiment properly, we would need to introduce a different tag into the mutant protein. With the current reagents, an IP is not possible.

      Instead, we have to rely on indirect evidence. The fact that DPY-27 and CAPG-1 colocalize (figure 4) does provide some support for the hypothesis. From previous studies, including our recent publication Trombley et al PLoS Genetics 2025, we know that the condensin IDC complex is not stable unless all subunits are present. It is therefore highly unlikely, although not impossible, that what we detect is diffuse individual subunits.

      We can make changes in the text to soften this claim and better discuss the caveats of the experiment and the conclusions.

      Along the same lines, the authors show that the EQ protein that binds to the X is incapable of bringing H4K20me1, which is consistent with the possibility that a large portion of the EQ protein is not in a complex: "To our surprise, we observed that there was no discernable enrichment of H4K20me1, even though there is discernable enrichment of DPY-27 EQ on the X chromosomes in the dpy-27 EQ mutants (Figure 8A)."

      Response: There is an important difference. CAPG-1 and DPY-27 are both members of condensin IDC. The five subunits of this complex depend on each other for stability. DPY-21, the protein that introduces the H4K20me1 mark, also localizes to the X chromosomes, but is not part of condensin IDC. Condensin IDC is able to localize to the X chromosomes in the absence of DPY-21, and is not dependent on DPY-21 for stability. However, DPY-21 is dependent on condensin IDC for X localization (Yonker et al 2003). It is then possible that the mutant condensin IDC is X-bound, but it is unable to recruit DPY-21. We can clarify this in the text.

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. It is unclear how long it would take to collect enough het/mutant worms for IP-western. Without additional evidence, interpretation of the data would be affected.

      Response: As explained above, collecting enough mutant worms is essentially impossible. Collecting enough heterozygotes is possible, but distinguishing the mutant protein from the wild type in hets is not.

      • Are the data and the methods presented in such a way that they can be reproduced? Yes
      • Are the experiments adequately replicated and statistical analysis adequate? Yes, except the presentation of the test (see minor comment below)

      Minor comments: - Specific experimental issues that are easily addressable. The use of letters for statistical test results is confusing and the figure legend is not clear about what actual p values were produced: "Letters represent multiple comparison p values, with different letters indicating statistically significant differences, and any repeated letter demonstrating no significance." Providing the values in a reasonably concise manner in the legend will help the reader a lot.

      Response: P values can be added to the figures or to the legends.

      • Are prior studies referenced appropriately? The authors state that "Surprisingly, this mutant did not phenocopy the transgenic EQ mutant in [77], .." however in the previous paragraph, the authors state that the transgenic was expressed in the presence of wild type copy. Therefore, the endogenous mutant showing phenotypes rather than the transgenic is rather expected.

      Response: What we referred to were ways in which the protein behaved (for example in ability to bind to the X at all), and not mutant phenotypes of worms. We can clarify this in the text.

      The authors state that "One possible explanation could be that mitotic condensation has multiple drivers of equal consequence including changes in histone modifications [129], whereas condensation of dosage compensated X chromosomes is predominantly dependent on the DCC. " In a dpy-21 mutant, X chromosome decondenses but DPY-27 stays on the chromosome. Therefore, the effect of the EQ mutation may be due to lack of H4K20me1 enrichment in addition to the lack of loop extrusion.

      Response: We can add the role of H4K20me1 to the discussion.

      • Are the text and figures clear and accurate? Yes
      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? The Pearson correlation coefficient for assessing colocalization between SDC-2 and DPY-27 was helpful for quantification, because there is a lot of background signal that makes the support for or lack of colocalization with the X in the other IF/FISH figures difficult to assess. Additionally, please provide information on how chromatic aberration was assessed when analyzing colocalization experiments.

      Response: Chromatic aberration was not considered for these experiments.
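
      For readers unfamiliar with the colocalization measure mentioned above, the following is a minimal sketch of a Pearson correlation coefficient computed over paired pixel intensities from two channels. The function name and the intensity values are hypothetical, and the sketch makes no claim about the software or settings the authors used.

```python
from math import sqrt


def pearson(channel_a, channel_b):
    """Pearson correlation of paired pixel intensities from two channels."""
    n = len(channel_a)
    if n != len(channel_b) or n < 2:
        raise ValueError("channels must have equal length of at least 2")
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return covariance / sqrt(var_a * var_b)


# Hypothetical intensities sampled at the same pixels in two channels.
sdc2 = [1, 2, 3, 4]
dpy27_colocalized = [2, 4, 6, 8]  # rises and falls with the first channel
dpy27_excluded = [8, 6, 4, 2]     # high where the first channel is low

r_colocalized = pearson(sdc2, dpy27_colocalized)
r_excluded = pearson(sdc2, dpy27_excluded)
```

      The coefficient ranges from -1 to 1 and is unaffected by the overall brightness or gain of either channel, which is one reason it is commonly used when background levels differ between stainings.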

      Reviewer #2 (Significance (Required)):

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. Although long assumed to be a functional SMC, the demonstration of DPY-27 function depending on ATPase activity is important. This demonstrates that an X-specific condensin retained its SMC activity.

      • Place the work in the context of the existing literature (provide references, where appropriate). The authors do an adequate job in doing this in their discussion.

      • State what audience might be interested in and influenced by the reported findings. The field of 3D genome organization and function would be influenced by the reported findings.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Genomic analyses of 3D genome organization and gene expression.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Nakagawa and colleagues report the observation that YAP is differentially localized, and thus differentially transcriptionally active, in spheroid cultures versus monolayer cultures. YAP is known to play a critical role in the survival of drug-tolerant cancer cells, and as such, the higher levels of basally activated YAP in monolayer cultures lead to higher fractions of surviving drug-tolerant cells relative to spheroid culture (or in vivo culture). The findings of this study, revealed through convincing experiments, are elegantly simple and straightforward, yet they add significantly to the literature in this field by revealing that monolayer cultures may actually be a preferential system for studying residual cell biology simply because the abundance of residual cells in this format is much greater than in spheroid or xenograft models. The potential linkage between matrix density and stiffness and YAP activation, while only speculated upon in this manuscript, is intriguing and a rich starting point for future studies.

      Although this work, like any important study, inspires many interesting follow-on questions, I am limiting my questions to only a few minor ones, which may potentially be explored either in the context of the current study or in separate, follow-on studies.

      We appreciate Reviewer #1's comments that our work is of importance to the field and particularly that it will "...add significantly to the literature in this field by revealing that monolayer cultures may actually be a preferential system for studying residual cell biology..."  We have sought to highlight the importance of how our findings could be applied to study resistance mechanisms at various points in the manuscript.

      Strengths:

      The major strengths of the work are described above.

      Weaknesses:

      Rather than considering the following points as weaknesses, I instead prefer to think of them as areas for future study:

      (1) Given the field's intense interest in the biology and therapeutic vulnerabilities of residual disease cells, I suspect that one major practical implication of this work could be that it inspires scientists interested in working in the residual disease space to model it in monolayer culture. However, this relies upon the assumption that drug-tolerant cells isolated in monolayer culture are at least reasonably similar in nature to drug-tolerant cells isolated from spheroid or xenograft systems. Is this true? An intriguing experiment that could help answer this question would be to perform gene expression profiling on a cell line model in the following conditions: monolayer growth, drug tolerant cells isolated from monolayer growth conditions, spheroid growth, drug tolerant cells isolated from spheroid growth conditions, xenograft tumors, and drug tolerant cells isolated from xenograft tumors. What are the genes and programs shared between drug-tolerant cells cultured in the three conditions above? Which genes and programs differ between these conditions? Data from this exercise could help provide additional, useful context with which to understand the benefits and pitfalls of modeling residual tumor cell growth in monolayer culture.

      We thank the reviewer for suggesting valuable future studies. We agree that the proposed experiments represent important next steps in understanding the role of YAP and other pathways in primary resistance. We believe, however, these experiments are both beyond the scope of the current manuscript and beyond what can reasonably be addressed in a revision. The distinct challenges associated with comparing in vivo and in vitro conditions would require significant optimization of single-cell approaches, especially given the robust cell death driven by afatinib treatment in vivo. Given the complexity of in vivo experimentation, we are concerned that such studies may not guarantee biologically meaningful insights. Nonetheless, we agree that this is a compelling direction for future research. If common gene expression patterns could be identified despite these challenges, such studies could help validate monolayer culture as a relevant model for investigating residual disease.

      (2) In relation to the point above, there is an interesting and established connection between mesenchymal gene expression and YAP/TAZ signaling. For example, analyses of gene expression data from human tumors and cell lines demonstrate an extremely strong correlation between these two gene expression programs. Further, residual persister cancer cells have often been characterized as having undergone an EMT-like transition. From the analysis above, is there evidence that residual tumor cells with increased YAP signaling also exhibit increased mesenchymal gene expression?

      We agree with the reviewer that a connection between YAP/TAZ activity and EMT is likely, given prior studies exploring correlations between these two gene signatures. We believe, however, that exploring EMT represents a distinct research direction from the primary focus of the current manuscript. We are concerned that exploration of EMT, especially in the absence of corresponding preclinical models or mechanistic data directly linking EMT to therapy resistance in our models, could distract from the main conclusions of the manuscript. While we plan to stain for EMT-associated markers in the residual cancer tissue from the in vivo studies, it remains unclear whether such data would meaningfully contribute to the revised manuscript, regardless of the outcome.

      Reviewer #2 (Public review):

      The manuscript by Nakagawa R, et al describes a mechanism of how NSCLC cells become resistant to EGFR and KRAS G12C inhibition. Here, the authors focus on the initial cellular changes that occur to confer resistance and identify YAP activation as a non-genetic mechanism of acute resistance.

      The authors performed an initial xenograft study to identify YAP nuclear localization as a potential mechanism of resistance to EGFRi. The increase in the stromal component of the tumors upon Afatinib treatment leads the authors to explore the response to these inhibitors in both 2D and 3D culture. The authors extend their findings to both KRAS G12C and BRAF inhibitors, suggesting that the mechanism of resistance may be shared along this pathway.

      The paper would benefit from additional cell lines to determine the generalizability of the findings they presented. While the change in the localization of YAP upon Afatinib treatment was identified in a xenograft model, the authors do not return to animal models to test their potential mechanism, and the effects of the hyperactivated S127A YAP protein on Afatinib sensitivity in culture are modest. Also, combination studies of YAP inhibitors and EGFR/RAS/RAF inhibitors would have strengthened the studies.

      We thank the reviewer for their insightful comments. In this manuscript, we present data from 5 cell lines representing the EGFR/BRAF/KRAS pathway, demonstrating the generalizability of YAP-driven decreased cancer cell sensitivity to targeted inhibitors when cultured in 2D compared to spheroid counterparts. While expanding this analysis to a larger panel of cell lines is beyond the scope of the current study, we believe our findings provide a strong rationale for future investigations, including high-throughput screens conducted by other research groups and pharmaceutical companies, to recognize the value in screening spheroid cell cultures. We hope this work helps shift the field of cancer therapeutics toward integrating screening approaches that better reflect tumor biology into drug discovery pipelines, and we believe this could be one of the most impactful and enduring contributions of our study.

      Reviewer #2 also mentions that "...combination studies of YAP inhibitors and EGFR/RAS/RAF inhibitors would have strengthened the studies..." The concept that YAP/TAZ inhibitors (i.e. TEAD inhibitors) could be additive or synergistic in 2D culture is one that is being actively tested across several groups and in pharma. Several recent examples include a publication by Hagenbeek, et al., Nat. Cancer, 2023 (PMID: 37277530) showing that a TEAD inhibitor overcomes KRASG12C inhibitor resistance. Additionally, recent work by Pfeifer, et al., Comm. Biol., 2024 (PMID: 38658677) suggests a similar effect between EGFR inhibitors and a different TEAD inhibitor. While neither of these studies probes cell death pathways as extensively as our studies do, they nevertheless provide strong evidence that TEAD + targeted EGFR/RAF/RAS inhibition in 2D has additive, if not synergistic, effects. We feel that these recently published studies affirm our findings and that repeating such experiments is unlikely to add much new information. We thus feel they are beyond the scope of our present studies.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      Olfactory sensory neurons (OSNs) in the olfactory epithelium detect myriads of environmental odors that signal essential cues for survival. OSNs are born throughout life and thus represent one of the few neurons that undergo life-long neurogenesis. Until recently, it was assumed that OSN neurogenesis is strictly stochastic with respect to subtype (i.e. the receptor the OSN chooses to express).

      However, a recent study showed that olfactory deprivation via naris occlusion selectively reduced birthrates of only a fraction of OSN subtypes and indicated that these subtypes appear to have a special capacity to undergo changes in birthrates in accordance with the level of olfactory stimulation. These previous findings raised the interesting question of what type of stimulation influences neurogenesis, since naris occlusion does not only reduce the exposure to potentially thousands of odors but also to more generalized mechanical stimuli via preventing airflow.

      In this study, the authors set out to identify the stimuli that are required to promote the neurogenesis of specific OSN subtypes. Specifically, they aim to test the hypothesis that discrete odorants selectively stimulate the same OSN subtypes whose birthrates are affected. This would imply a highly specific mechanism in which exposure to certain odors can "amplify" OSN subtypes responsive to those odors suggesting that OE neurogenesis serves, in part, an adaptive function.

      To address this question, the authors focused on a family of OSN subtypes that had previously been identified to respond to musk-related odors and that exhibit higher transcript levels in the olfactory epithelium of mice exposed to males compared to mice isolated from males. First, the authors confirm via a previously established cell birth dating assay in unilateral naris occluded mice that this increase in transcript levels actually reflects a stimulus-dependent birthrate acceleration of this OSN subtype family. In a series of experiments using the same assay, they show that one specific subtype of this OSN family exhibits increased birthrates in response to juvenile male exposure while a different subtype shows increased birthrates to adult mouse exposure. In the core experiment of the study, they finally exposed naris occluded mice to a discrete odor (muscone) to test if this odor specifically accelerates the birth rates of OSN types that are responsive to this odor. This experiment reveals a complex relationship between birth rate acceleration and odor concentrations showing that some muscone concentrations affect birth rates of some members of this family and do not affect two unrelated OSN subtypes.

      In addition to the results nicely summarized by the reviewer, which focus on experiments to examine the effects of odor stimulation on unilateral naris occluded (UNO) mice, an important part of the present study is a set of experiments on non-occluded (i.e., non-UNO-treated) mice. These experiments show that: 1) the exposure of non-occluded mice to odors from adolescent male mice selectively increases quantities of newborn OSNs of the musk-responsive subtype Olfr235 (Figure 3G, H; previously Figure 6), 2) the exposure of non-occluded female mice to 2 different musk odorants (muscone, ambretone) selectively increases quantities of newborn OSNs of 3 musk-responsive subtypes: Olfr235, Olfr1440 and Olfr1431 (Figure 4D-F; previously Figure 6), and 3) the exposure of non-occluded adult female mice to a musk odorant selectively increases quantities of newborn OSNs of musk-responsive subtypes (Figure 5; previously Fig. S7). We have reorganized the revised manuscript to more prominently and clearly present the experimental design and findings of these experiments. We have also made changes to clarify (via schematics) the experimental conditions used (i.e., UNO, non-UNO, odor exposure) in each experiment.

      Strengths:

      The scientific question is valid and opens an interesting direction. The previously established cell birth dating assay in naris occluded mice is well performed and accompanied by several control experiments addressing potential other interpretations of the data.

      Weaknesses:

      (1) The main research question of this study was to test if discrete odors specifically accelerate the birth rate of OSN subtypes they stimulate, i.e. does muscone only accelerate the birth rate of OSNs that express muscone-responsive ORs, or vice versa is the birthrate of muscone-responsive OSNs only accelerated by odors they respond to?

      This question is only addressed in Figure 5 of the manuscript and the results only partially support the above claim. The authors test one specific odor (muscone) and find that this odor (only at certain concentrations) accelerates the birth rate of some musk-responsive OSN subtypes, but not two other unrelated control OSN subtypes. This does not at all show that musk-responsive OSN subtypes are only affected by odors that stimulate them and that muscone only affects the birthrate of musk-responsive OSNs, since first, only the odor muscone was tested and second, only two other OSN subtypes were tested as controls, that, importantly, are shown to be generally stimulus-independent OSN subtypes (see Figure 2 and S2).

      As a minimum the authors should have a) tested if additional odors that do not activate the three musk-responsive subtypes affect their birthrate and b) chosen 2-3 additional control subtypes that are known to be stimulus-dependent (from their own 2020 study) and tested if muscone affects their birthrates.

      We appreciate these suggestions. Within the revised manuscript, we have described and included the results from several new experiments:

      (1) As noted by the reviewer, we had previously tested the effects of exposure to only one exogenous musk odorant, muscone, on quantities of newborn OSNs of the musk-responsive subtypes Olfr235, Olfr1440, and Olfr1431. To test whether the effects observed with muscone exposure occur with other musk odorants, we assessed the effects of exposure to ambretone (5-cyclohexadecenone), a musk odorant previously found to robustly activate musk-responsive OSNs (Sato-Akuhara et al., 2016; Shirasu et al., 2014), on quantities of newborn OSNs of 3 musk-responsive subtypes Olfr235, Olfr1440, and Olfr1431, as well as the SBT-responsive subtype Olfr912, in the OEs of non-occluded female mice. Exposure to ambretone was found to significantly increase quantities of newborn OSNs of all 3 musk-responsive subtypes (Figure 4D-F) but not the SBT-responsive subtype (Figure 4–figure supplement 4C-left), indicating that a variety of musk odorants can accelerate the birthrates of musk responsive subtypes.

      (2) To verify that exogenous non-musk odors do not increase quantities of newborn OSNs of musk responsive OSN subtypes (point a, above), we quantified newborn OSNs of 3 musk-responsive subtypes, Olfr235, Olfr1440, and Olfr1431, in non-occluded female mice that were exposed to the non-musk odorants SBT or IAA. As expected, neither of these odorants significantly affected the birthrates of the subtypes tested (Figure 4D-F).

      (3) To confirm that exogenous musk odors do not accelerate the birthrates of non-musk responsive OSN subtypes that were previously found to undergo stimulation-dependent neurogenesis (point b, above), we quantified newborn OSNs of 2 such subtypes, Olfr827 and Olfr1325, in non-occluded female mice that were exposed to muscone. As expected, exposure to muscone did not significantly affect the birthrates of either of these subtypes (Figure 4–figure supplement 4C-middle, right).

      (4) To provide additional confirmation that only some OSN subtypes have a capacity to exhibit increases in newborn OSN quantities in the presence of odors that activate them, we compared quantities of newborn OSNs of the SBT-responsive subtype Olfr912 in non-occluded females that were exposed to 0.1% SBT versus unexposed controls. As expected, exposure to SBT caused no significant increase in quantities of newborn Olfr912 OSNs (Figure 4–figure supplement 4C-left).

      (2) The finding that Olfr1440 expressing OSNs do not show any increase in UNO effect size under any muscone concentration (Figure 5D, no significance in line graph for UNO effect sizes, middle) seems to contradict the main claim of this study that certain odors specifically increase birthrates of OSN subtypes they stimulate. It was shown in several studies that olfr1440 is seemingly the most sensitive OR for muscone, yet, in this study, muscone does not further increase birthrates of OSNs expressing olfr1440. The effect size on birthrate under muscone exposure is the same as without muscone exposure (0%).

      In contrast, the supposedly second most sensitive muscone-responsive OR olfr235 shows a significant increase in UNO effect size between no muscone exposure (0%) and 0.1% as well as 1% muscone.

      The finding that quantities of newborn Olfr1440 OSNs do not show a significantly greater UNO effect size in the OEs of mice exposed to muscone compared to control mice was also somewhat surprising to us. We think that there are two potential explanations for this result: 1) Unlike subtype Olfr235, subtype Olfr1440 exhibits a significant open-side bias in newborn OSN quantities in UNO-treated adolescent females even in the absence of exposure to muscone. We speculate that this subtype (as well as subtype Olfr1431) is stimulated by odors that are emitted by female mice at the adolescent stage, and/or by another environmental source. This may limit the influence of muscone exposure on the UNO effect size. 2) There is compelling evidence that odors within the environment can enter the closed side of the OE transnasally [via the nasopharyngeal canal (Kelemen, 1947)] and/or retronasally (via the nasopharynx) in UNO-treated mice [reviewed in (Coppola, 2012)]. Thus, it is conceivable that chronic exposure of UNO-treated mice to muscone results in the eventual entry of muscone on the closed side of the OE at concentrations sufficient to promote neurogenesis. If Olfr1440 is more sensitive to muscone than Olfr235 [e.g., (Sato-Akuhara et al., 2016; Shirasu et al., 2014)], OSNs of this subtype may be especially sensitive to small amounts of odors that enter the closed side of the OE transnasally and/or retronasally. These explanations are supported by the following results:

      - UNO-treated females exposed to 0.1% muscone show higher quantities of newborn Olfr1440 OSNs on both the open and closed sides of the OE compared to their unexposed counterparts (Figure 4–figure supplement 1A-middle). Similar results were also observed for newborn Olfr235 OSNs (Figure 4C-middle), albeit to a lesser extent, perhaps due to the lower sensitivity of this subtype to muscone.

      - In non-occluded female mice, exposure to 0.1% muscone was found to significantly increase quantities of newborn Olfr1440 OSNs, as well as newborn Olfr235 and Olfr1431 OSNs (Figure 4D-F in revised manuscript; Figure 6 in original version). Similar results were also observed upon exposure to ambretone, another musk odor (Figure 4D-F). These experiments strongly support the hypothesis that musk odors selectively increase birthrates of OSN subtypes that they stimulate.

      We have addressed these points within the results section of the revised manuscript.

      (3) The authors introduce their choice to study this particular family of OSN subtypes with first, the previous finding that transcripts for one of these musk-responsive subtypes (olfr235) are downregulated in mice that are deprived of male odors. Second, musk-related odors are found in the urine of different species. This gives the misleading impression that it is known that musk-related odors are indeed excreted into male mouse urine at certain concentrations. This should be stated more clearly in the introduction (or cited, if indeed data exist that show musk-related odors in male mouse urine) because this would be a very important point from an ethological and mechanistic point of view.

      In addition, this would also be important information to assess if the chosen muscone concentrations fall at all into the natural range.

      These are important points, which we have addressed within the revised manuscript:

      (1) Within the introduction, we have now stated that the emission of musk odors by mice has not been documented. We have also added extensive discussions of what is known about the emission of musk odors by mice in a new subsection within Results, as well as within the Discussion section. Most prominently, we have cited one study (Sato-Akuhara et al., 2016) that noted unpublished evidence for the emission of Olfr1440-activating compounds from male preputial glands: “Indeed, our preliminary experiments suggest that there are unidentified compounds that activate MOR215-1 in mouse preputial gland extracts.” Another study, which used histomorphology, metabolomic and transcriptomic analyses to compare the mouse preputial glands to muskrat scent glands, found that the two glands are similar in many ways, including molecular composition (Han et al., 2022). However, the study did not identify known musk compounds within mouse preputial glands.

      (2) Based on the reviewer’s feedback and our own curiosity, we used GC-MS to analyze both mouse urine and preputial gland extracts for the presence of known musk odorants, particularly those known to activate Olfr235 and Olfr1440 (Sato-Akuhara et al., 2016). Although we were unable to find evidence for known musk odorants in mouse urine extracts (possibly due to insufficient sensitivity of the assay employed), we found that preputial gland extracts contain GC-MS signals that are structurally consistent with known musk odorants. A limitation of this approach, however, is that the conclusive identification of specific musk odorants in extracts derived from mouse urine and tissues requires comparisons to pure standards, many of which we could not readily obtain. For example, we were unable to obtain a pure sample of cycloheptadecanol, a musk molecule with a predicted potential match to a signal identified within preputial gland extracts. Another limitation is that although several known musk odorants have been found to activate Olfr235 and Olfr1440 OSNs, it is conceivable that structurally distinct odorants that have not yet been identified might also activate them. The findings from these experiments have been included in a new figure within the revised manuscript (Appendix 2–figure 1).

      Related: If these are male-specific cues, it is interesting that changes in OR transcripts (Figure 1) can already be seen at the age of P28 where other male-specific cues are just starting to get expressed. This should be discussed.

      We agree that the observed changes in quantities of newborn OSNs of musk-responsive subtypes in mice exposed to juvenile male odors deserve additional discussion. We have included a more extensive discussion of this observation in both the Results and Discussion sections of the revised manuscript.

      (4) Figure 5: Under muscone exposure the number of newborn neurons on the closed sides fluctuates considerably. This doesn't seem to be the case in other experiments and raises some concerns about how reliable the naris occlusion works for strong exposure to monomolecular odors or what other potential mechanisms are at play.

      We agree that the variability in quantities of newborn OSNs of musk-responsive subtypes on the closed side of the OE of UNO-treated mice deserves further discussion. As noted above, we suspect that these fluctuations are due, at least in part, to transnasal and/or retronasal odor transfer via the nasopharyngeal canal (Kelemen, 1947) and nasopharynx, respectively [reviewed in (Coppola, 2012)], which would be expected to result in exposure of the closed OE to odor concentrations that rise with increasing environmental concentrations. In support of this, quantities of newborn Olfr235 and Olfr1440 OSNs increase on both the open and closed sides with increasing muscone concentration (except at the highest concentration, 10%, in the case of Olfr1440) (Figure 4C-middle, Figure 4–figure supplement 1A-middle). It is conceivable that reductions in newborn Olfr1440 OSN quantities observed in the presence of 10% muscone reflect overstimulation-dependent reductions in survival. Our findings from UNO-based experiments are consistent with expectations that naris occlusion does not completely block exposure to odorants on the closed side, particularly at high concentrations. However, they also appear consistent with the hypothesis that exposure to musk odors promotes the neurogenesis of musk-responsive OSN subtypes.

      Considering the limitations of the UNO procedure, it is important to note that the present study also includes experimental exposure of non-occluded animals to both male odors (Figure 3G, H) and exogenous musk odorants (Figures 4D-F). Findings from the latter experiments provide strong evidence that exposure to multiple musk odorants (muscone, ambretone) causes selective increases in the birthrates of multiple musk-responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431).

      We have included within the Results section of the revised manuscript a discussion of how observed effects of muscone exposure of UNO-treated mice may be influenced by transnasal/ retronasal odor transfer to the closed side of the OE.

      (5) In contrast to all other musk-responsive OSN types, the number of newborn OSNs expressing olfr1437 increases on the closed side of the OE relative to the open in UNO-treated male mice (Figure 1). This seems to contradict the presented theory and also does not align with the bulk RNAseq data (Figure S1).

      Subtype Olfr1437 is indeed an outlier among musk-responsive subtypes that were previously found to be more highly represented in the OSN population in 6-month-old sex-separated males compared to females (Appendix 1–figure 1)(C. van der Linden et al., 2018; Vihani et al., 2020). Somewhat unexpectedly, our findings from scRNA-seq experiments show slightly greater quantities of immature Olfr1437 OSNs on the closed side of the OE in juvenile males (Figure 1D, E of the revised manuscript, which now includes data from a second OE). Perhaps more informatively, considering the small number of iOSNs of specific subtypes in the scRNA-seq datasets, EdU birthdating experiments show no difference in newborn Olfr1437 OSN quantities on the 2 sides of the OE from UNO-treated juvenile males (Figure 2G). It is unclear to us why subtype Olfr1437 does not show open-side biases in newborn OSN quantities in juvenile male mice, but potential explanations include:

      - Age: The bulk RNA-seq studies that found musk-responsive OSN subtypes to be more highly represented in mice exposed to male odors analyzed mice that were 6 months old (C. van der Linden et al., 2018) or > 9 months old (Vihani et al., 2020). By contrast, the present study primarily analyzed mice that were juveniles (PD 28) at the time of scRNA-seq analysis (Figure 1) or EdU labeling (Figure 2G). It is conceivable that different musk-responsive subtypes are selectively responsive to distinct odors that are emitted at different ages. In this scenario, odors that increase the birthrates of Olfr235, Olfr1440, and Olfr1431 OSNs may be emitted starting at the juvenile stage, while those that increase the birthrate of Olfr1437 OSNs may be emitted in adulthood. In potential support of this, juvenile males exposed to their adult parents at the time of EdU labeling showed a slightly greater (although not statistically significantly different) UNO effect size in quantities of newborn Olfr1437 OSNs compared to controls (Figure 3–figure supplement 3).

      - Capacity for stimulation-dependent neurogenesis: It is also conceivable that, unlike other musk-responsive OSN subtypes, Olfr1437 OSNs lack the capacity for stimulation-dependent neurogenesis (like the SBT-responsive subtype Olfr912, for example). If so, this would imply that increased representations of Olfr1437 OSNs observed in mice exposed to male odors for long periods (C. van der Linden et al., 2018; Vihani et al., 2020) may be due to male odor-dependent increases in the lifespans of Olfr1437 OSNs.

      Within the Discussion section of the revised manuscript, we have discussed the findings concerning Olfr1437.

      (6) The authors hypothesize in relation to the accelerated birthrate of musk-responsive OSN subtypes that "the acceleration of the birthrates of specific OSN subtypes could selectively enhance sensitivity to odors detected by those subtypes by increasing their representation within the OE". However, for two other OSN subtypes that detect male-specific odors, they hypothesize the opposite "By contrast, Olfr912 (Or8b48) and Olfr1295 (Or4k45), which detect the male-specific non-musk odors 2-sec-butyl-4,5-dihydrothiazole (SBT) and (methylthio)methanethiol (MTMT), respectively, exhibited lower representation and/or transcript levels in mice exposed to male odors, possibly reflecting reduced survival due to overstimulation."

      Without any further explanation, it is hard to comprehend why exposure to male-derived odors should, on one hand, accelerate birthrates in some OSN subtypes to potentially increase sensitivity to male odors, but on the other hand, lower transcript levels and does not accelerate birth rates of other OSN subtypes due to overstimulation.

      We agree that this point deserves further explanation. Within the revised manuscript, we have expanded the Introduction and Results to describe evidence from previous studies that exposure to stimulating odors causes two categories of changes to specific OSN subtypes: elevated representations or reduced representations within the OSN population. In one study (C. J. van der Linden et al., 2020), UNO treatment was found to cause a fraction of OSN subtypes to exhibit lower birthrates and representations on the closed side of the OE relative to the open. By contrast, another fraction of OSN subtypes exhibited higher representations on the closed side of the OEs of UNO-treated mice, but no difference in birthrates between the two sides. The latter subtypes were found to be distinguished by their receipt of extremely high levels of odor stimulation, suggesting that reduced odor stimulation via naris occlusion may lengthen their lifespans. In support of the possibility that Olfr912 (and Olfr1295) belong to this latter category, these subtypes detect SBT and MTMT, respectively (Vihani et al., 2020), which are emitted specifically by male mice (Lin et al., 2005; Schwende et al., 1986), and UNO treatment was previously found to increase total Olfr912 OSN quantities on the closed side compared to the open side in sex-separated males (C. van der Linden et al., 2018), a finding confirmed in the present study (Figure 3–figure supplement 1H).

      Taken together, findings from previous studies as well as the current one indicate that olfactory stimulation can accelerate the birthrates and/or reduce the lifespans of OSNs, depending on the specific subtypes and odors within the environment. As we have now indicated in the Discussion, we do not yet know what distinguishes subtypes that undergo stimulation-dependent neurogenesis, but it is conceivable that they detect odors with a particular salience to mice. Thus, observations that some odorants (e.g., musks) cause stimulation-dependent neurogenesis while others do not (e.g., SBT) might reflect an animal’s specific need to adapt its sensitivity to the former. Alternatively, it is conceivable that stimulation-dependent reductions in representations of subtypes such as Olfr912 and Olfr1295 reflect a fundamentally different mode of plasticity that is also adaptive, as has been hypothesized (C. van der Linden et al., 2018; Vihani et al., 2020).

      Reviewer #1 (Recommendations For The Authors):

      To support the main claim, several controls are necessary as mentioned under point 1 of the public review.

      As outlined in our responses to the public review, new experiments within the revised manuscript indicate the following:

      (1) Accelerated birthrates of 3 different musk responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431) are observed in non-occluded mice following exposure to multiple exogenous musk odorants (muscone, ambretone) (Figure 4D-F).

      (2) Exposure of non-occluded mice to non-musk odors (SBT, IAA) does not accelerate the birthrates of musk responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431) (Figure 4D-F).

      (3) Exposure of mice to exogenous musk odors (muscone, ambretone) does not accelerate the birthrates of non-musk responsive OSN subtypes (e.g., Olfr912), including those previously found to undergo stimulation-dependent neurogenesis (Olfr827, Olfr1325) (Figure 4–figure supplement 4C).

      (4) Only a fraction of OSN subtypes have a capacity to undergo accelerated neurogenesis in the presence of odors that activate them (e.g., Olfr912 birthrates are not accelerated by SBT exposure) (Figure 4–figure supplement 4C-left).

      In addition, this study could be considerably improved by showing that the proposed mechanism applies beyond a single OSN subtype (olfr235), especially since the most sensitive OR subtype (expressing olfr1440) does not align with the main claim. The introduction states that this is difficult because the ligands for many ORs are unknown including all subtypes previously found to undergo stimulation-dependent neurogenesis referring to your 2020 study. While this reviewer agrees that the lack of deorphanization is a significant hurdle in the field, the 2020 study states that about 4% of all ORs (which should equal >40 ORs) show a stimulus-dependent down-regulation on the closed side, not only the 7 ORs which are closer examined (Figure 1). It would tremendously improve the impact of the current study to show that the proposed effect applies also to one of these other >40 ORs.

      We appreciate this question, as it alerted us to some shortcomings in how our findings were presented within the original manuscript. We respectfully disagree that only findings regarding subtype Olfr235 align with the main hypothesis of this study, which is that discrete odors can selectively promote the neurogenesis of sensory neuron subtypes that they stimulate. Specifically, we would like to draw attention to experiments on non-occluded female mice exposed to exogenous musk odorants (muscone, ambretone; revised Figures 4D-F; previously, Figure 6). Findings from these experiments provide compelling evidence that exposure to musk odorants causes selective increases in the birthrates of three different musk-responsive OSN subtypes: Olfr235, Olfr1440, and Olfr1431. Thus, we would suggest that results from the present study already show that the proposed mechanism applies to more than just the Olfr235 subtype. However, we agree with what we think is the essence of the reviewer’s point: that it is important to determine the extent to which this mechanism applies to OSN subtypes that are responsive to other (i.e., non-musk) odorants. While, as noted by the reviewer, our previous study identified several OSN subtypes that undergo stimulation-dependent neurogenesis (as well as many others that are predicted to do so)(C. J. van der Linden et al., 2020), we are not aware of ligands that have been identified with high confidence for those subtypes. Although we are in the process of conducting experiments to identify additional odor/subtype pairs to which the mechanism described in this study applies, the early-stage nature of these experiments precludes their inclusion in the present manuscript.

      The ethological and mechanistic relevance of the current study could be significantly improved by showing that musk-related odors that activate olfr235 are actually found in male mouse urine (and additionally are not found in female mouse urine). Otherwise, the implicated link between the acceleration of OSN birthrates by exposure to male odors and acceleration by specific monomolecular odors does not hold, raising the question of any natural relevance (e.g. the proposed adaptive function to increase sensitivity to certain odors).

      As noted in our responses to the public review, we have addressed this important point within the revised manuscript as follows:

      (1) We have included an extensive discussion of what is known about the emission of musk-like odors by mice.

      (2) We have used GC-MS to analyze both mouse urine and preputial gland extracts for the presence of known musk compounds. Although inconclusive, we report that preputial glands contain signals that are structurally consistent with known musk compounds. The findings of these experiments have been included in the revised manuscript (new Appendix 2–figure 1), along with a discussion of their limitations.

      Reviewer #2 (Public Review):

      In their paper entitled "In mice, discrete odors can selectively promote the neurogenesis of sensory neuron subtypes that they stimulate" Hossain et al. address lifelong neurogenesis in the mouse main olfactory epithelium. The authors hypothesize that specific odorants act as neurogenic stimuli that selectively promote biased OR gene choice (and thus olfactory sensory neuron (OSN) identity). Hossain et al. employ RNA-seq and scRNA-seq analyses for subtype-specific OSN birthdating. The authors find that exposure to male and musk odors accelerates the birthrates of the respective responsive OSNs. Therefore, Hossain et al. suggest that odor experience promotes selective neurogenesis and, accordingly, OSN neurogenesis may act as a mechanism for long-term olfactory adaptation.

      We appreciate this summary but would like to underscore that a mechanism involving biased OR gene choice is just one of two possibilities proposed in the Discussion section to explain how odorant stimulation of specific subtypes accelerates the birthrates of those subtypes.

      The authors follow a clear experimental logic, based on sensory deprivation by unilateral naris occlusion, EdU labeling of newborn neurons, and histological analysis via OR-specific RNA-FISH. The results reveal robust effects of deprivation on newborn OSN identity. However, the major weakness of the approach is that the results could, in (possibly large) parts, depend on "downregulation" of OR subtype-specific neurogenesis, rather than (only) "upregulation" based on odor exposure. While, in Figure 6, the authors show that the observed effects are, in part, mediated by odor stimulation, it remains unclear whether deprivation plays an "active" role as well. Moreover, as shown in Figure 1C, unilateral naris occlusion has both positive and negative effects in a random subtype sample.

      In our view, the present study involves two distinct and complementary experimental designs: 1) odor exposure of UNO-treated animals and 2) odor exposure of non-occluded animals. Here we address this comment with respect to each of these designs:

      (1) For experiments performed on UNO-treated animals, we agree that observed differences in birthrates on the open and closed sides of the OE reflect, largely, a deceleration (i.e., downregulation) of the birthrates of these subtypes on the closed side relative to the open (as opposed to an acceleration of birthrates on the open side). Our objective in using this design was to test the extent to which specific OSN subtypes undergo stimulation-dependent neurogenesis under various odor exposure conditions. According to the main hypothesis of this study, a lower birthrate of a specific OSN subtype on the closed side of the OE compared to the open is predicted to reflect a lower level of odor stimulation on the closed side received by OSNs of that subtype. However (and as described in our responses to reviewer #1), a limitation of this design is that environmental odorants, especially at high concentrations, are likely to stimulate responsive OSNs on the closed side of the OE in addition to the open side due to transnasal and/or retronasal air flow.

      (2) Experiments performed on non-occluded animals were designed to provide critical complementary evidence that specific OSN subtypes undergo accelerated neurogenesis in the presence of specific odors. Using this design, we have found compelling evidence that:

      - Exposure of non-occluded mice to male odors causes the selective acceleration of the birthrate of Olfr235 OSNs (Figure 3G, H).

      - Exposure of non-occluded female mice to two different musk odorants (muscone and ambretone) selectively accelerates the birthrates of three different musk-responsive subtypes: Olfr235, Olfr1440, and Olfr1431 (Figure 4D-F and Figure 4–figure supplement 4C).

      We have reorganized the revised manuscript to more clearly present the most important experimental findings using these two experimental designs. We have also highlighted (via schematics) the experimental conditions (e.g., UNO, non-occlusion, odor exposure) used for each experiment.

      Another weakness is that the authors build their model (Figure 8), specifically the concept of selectivity, on a receptor-ligand pair (Olfr912 that has been shown to respond, among other odors, to the male-specific non-musk odors 2-sec-butyl-4,5-dihydrothiazole (SBT)) that would require at least some independent experimental corroboration. At least, a control experiment that uses SBT instead of muscone exposure should be performed.

      We agree that this important concern deserves additional control experiments and discussion. We have addressed this concern within the revised manuscript as follows:

      - Within the Results section, we have added multiple new control experiments (detailed in response to Reviewer #1), including the one recommended above. As suggested, we quantified newborn OSNs of the SBT-responsive subtype Olfr912 in non-occluded females that were either exposed to 0.1% SBT or left unexposed (controls). Exposure to SBT was found to cause no significant increase in quantities of newborn Olfr912 OSNs (newly added Figure 4–figure supplement 4C-left). These findings further support the model in Figure 7 (previously Figure 8) that only a fraction of OSN subtypes have a capacity to undergo accelerated neurogenesis in the presence of odors that activate them.

      - Also within the Results section, we have made efforts to better highlight relevant control experiments that were included in the original version, particularly those showing that quantities of newborn Olfr912 OSNs are not affected by UNO in mice exposed to male odors (Figure 2H and Figure 3–figure supplement 1G; previously Figure 2F and Figure 3H) or by exposure of non-occluded females to male odors (Figure 3H; previously Figure 6E). Since Olfr912 is responsive to component(s) of male odors (C. van der Linden et al., 2018; Vihani et al., 2020), these results indicate that this subtype does not have the capacity for stimulation-dependent neurogenesis, which is consistent with our previous findings that only a fraction of subtypes have this capacity (C. J. van der Linden et al., 2020).

      In this context, it is somewhat concerning that some results, which appear counterintuitive (e.g., lower representation and/or transcript levels of Olfr912 and Olfr1295 in mice exposed to male odors) are brushed off as "reflecting reduced survival due to overstimulation." The notion of "reduced survival" could be tested by, for example, a caspase3 assay.

      This is a point that we agree deserves further discussion. Please see the explanation that we have outlined above in response to Reviewer #1.

      Within the revised manuscript, we have expanded the Introduction to describe evidence from previous studies that exposure to stimulating odors causes two categories of changes to specific OSN subtypes: elevated representations or reduced representations within the OSN population. We outline evidence from previous studies that Olfr912 and Olfr1295 belong to the latter category, and that the representations of these subtypes are likely reduced by male odor overstimulation-dependent shortening of OSN lifespan.

      Important analyses that need to be done to better be able to interpret the findings are to present (i) the OR+/EdU+ population of olfactory sensory neurons not just as a count per hemisection, but rather as the ratio of OR+/EdU+ cells among all EdU+ cells; and (ii) to the ratio of EdU+ cells among all nuclei (UNO versus open naris). This way, data would be normalized to (i) the overall rate of neurogenesis and (ii) any broad deprivation-dependent epithelial degeneration.

      We have addressed this concern in two ways within the revised manuscript:

      (1) We have noted within the Methods section that the approach of using half-sections for normalization has been used in multiple previous studies for quantifying newborn (OR+/EdU+) and total (OR+) OSN abundances (Hossain et al., 2023; Ibarra-Soria et al., 2017; C. van der Linden et al., 2018; C. J. van der Linden et al., 2020). Additionally, within the figure legends and Methods, we have more thoroughly described the approach used, including that it relies on averaging the quantifications from at least 5 high-quality coronal OE tissue sections that are evenly distributed throughout the anterior-posterior length of each OE and thereby mitigates the effects of section size and cell number variation among sections. In the case of UNO-treated mice, the open and closed sides within the same section are paired, which further reduces the effects of section-to-section variation. We have found that this approach yields reproducible quantities of newborn and total OSNs among biological replicate mice and enables accurate assessment of how quantities of OSNs of specific subtypes change as a result of altered olfactory experience, a key objective of this study.

      (2) To assess whether the use of alternative approaches for normalizing newborn OSN quantities suggested by the reviewers would affect the present study’s findings, we compared three methods for normalizing the effects of exposure to male odors or muscone on quantities of newborn Olfr235 OSNs in the OEs of both UNO-treated and non-occluded mice: 1) OR+/EdU+ OSNs per half-section (used in this study), 2) OR+/EdU+ OSNs per total number of EdU+ cells (reviewer suggestion (i)), and 3) OR+/EdU+ OSNs per unit of DAPI+ area (an approximate measure of nuclei number; reviewer suggestion (ii)). The three normalization methods yielded statistically indistinguishable differences in assessing the effects of exposure of either UNO-treated or non-occluded mice to male odors (newly added Figure 2–figure supplement 2 and Figure 3–figure supplement 2), or of exposure of non-occluded mice to muscone (newly added Figure 4–figure supplement 3). Based on these findings, and the considerable time that would be required to renormalize all data in the manuscript, we have chosen to maintain the use of normalization per half-section.
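
      The three normalization schemes compared in point (2) can be sketched in a few lines. This is an illustrative sketch only, not the analysis code used in the study: the `SideCounts` container, its field names, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SideCounts:
    """Hypothetical tallies for one side of one OE (names are assumptions)."""
    or_edu_cells: int      # OR+/EdU+ (newborn, subtype-specific) cells counted
    half_sections: int     # number of half-sections that were quantified
    edu_cells: int         # all EdU+ cells in the same half-sections
    dapi_area_mm2: float   # DAPI+ area, used as a proxy for nuclei number


def normalized_quantities(c: SideCounts) -> dict:
    """Return the same raw count under the three normalizations."""
    if c.half_sections <= 0 or c.edu_cells <= 0 or c.dapi_area_mm2 <= 0:
        raise ValueError("all denominators must be positive")
    return {
        # 1) per half-section (the normalization retained in the study)
        "per_half_section": c.or_edu_cells / c.half_sections,
        # 2) per EdU+ cell (reviewer suggestion i)
        "per_edu_cell": c.or_edu_cells / c.edu_cells,
        # 3) per unit DAPI+ area (reviewer suggestion ii)
        "per_dapi_area": c.or_edu_cells / c.dapi_area_mm2,
    }


# Hypothetical example values, chosen only to show the call.
example = normalized_quantities(SideCounts(30, 5, 600, 2.0))
```

      If the open and closed sides differ similarly under all three denominators, the choice of normalization does not change the conclusion, which is the comparison described above.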

      Finally, the paper will benefit from improved data presentation and adequate statistical testing. Images in Figures 2 - 7, showing both EdU labeling of newborn neurons and OR-specific RNA-FISH, are hard to interpret. Moreover, t-tests should not be employed when data is not normally distributed (as is the case for most of their samples).

      We have made extensive changes within the revised manuscript to increase the clarity and interpretability of the figures, including:

      (1) Addition of a split-channel, high-magnification view of a representative image that shows the overlap of FISH and EdU signals (Figure 2D).

      (2) Addition of experimental schematics and timelines corresponding to each set of experiments.

      In the revised manuscript, several changes to the statistical tests have been made, as follows:

      (1) To assess deviation from normality of the histological quantifications of newborn and total OSNs of specific subtypes in this study, all datasets were tested using the Shapiro-Wilk test for non-normality and the P values obtained are included in Supplementary file 1 (figure source data). Of the 274 datasets tested, 253 were found to have Shapiro-Wilk P values > 0.05, indicating that the vast majority (92%) do not show evidence of significant deviation from a normal distribution.

      (2) A general lack of deviation of the datasets in this study from a normal distribution is further supported by quantile-quantile (QQ) plots, which compare actual data to a theoretically normal distribution (Appendix 4–figure 1). The datasets analyzed were separated into the following categories:

      a. Quantities of newborn OSNs in UNO treated mice (Appendix 4-figure 1A)

      b. Quantities of total OSNs in UNO treated mice (Appendix 4-figure 1B)

      c. Quantities of newborn OSNs in non-occluded mice (Appendix 4-figure 1C)

      d. UNO effect sizes for newborn or total OSNs (Appendix 4-figure 1D)

      (3) Results of both parametric and non-parametric statistical tests of comparisons in this study have been included in Supplementary file 2 (statistical analyses). In general, the results from parametric and non-parametric tests are in good agreement.

      (4) Statistical analyses of differences in OSN quantities in the OEs of non-occluded mice or UNO effect sizes in UNO-treated mice subjected to more than two different experimental conditions have now been performed using one-way ANOVA tests, FDR-adjusted using the 2-stage linear step-up procedure of Benjamini, Krieger and Yekutieli.
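
      For readers unfamiliar with the QQ comparison mentioned in point (2), the sketch below shows how the plotted points can be computed. It is a minimal illustration, not the code used to generate Appendix 4–figure 1; the function name `qq_points` and the plotting position (i - 0.5)/n are assumptions made for this example.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Return (theoretical, observed) quantile pairs for a normal QQ plot.

    Observed values are standardized (z-scored) and sorted; theoretical
    quantiles come from the standard normal distribution evaluated at the
    plotting positions (i - 0.5) / n, which is one common convention.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        raise ValueError("all observations are identical")
    observed = sorted((v - mu) / sd for v in values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


# Hypothetical counts of newborn OSNs per half-section in replicate mice.
points = qq_points([4.0, 6.0, 8.0])
```

      Points that fall close to the identity line indicate little deviation from a normal distribution; systematic curvature would indicate skew or heavy tails.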

      Reviewer #2 (Recommendations for the Authors):

      The manuscript by Hossain et al. would benefit from a thorough revision. Here, we outline several points that should be addressed:

      Figure 3E - I & Figure 4E&F: Red lines that connect mean values are misleading.

      Within the revised manuscript, the UNO effect size graphs have been modified for clarity, including removal of the lines between mean values except for those comparing changes over time post EdU injection (Figure 6 and Figure 6-figure supplement 1). For these latter graphs, we think that lines help to illustrate changes in effect sizes over time.

      Figure 3E - I: UNO effect sizes (right) should be tested via ANOVA.

      In the revised manuscript, statistical analyses of UNO effect sizes in UNO-treated mice subjected to more than two different experimental conditions were done using one-way ANOVA tests, FDR-adjusted using the 2-stage linear step-up procedure of Benjamini, Krieger and Yekutieli (Figure 2-figure supplement 2; Figure 3; Figure 3-figure supplement 1; Figure 4; Figure 4-figure supplements 1, 2). The same tests were used for analysis of differences in OSN quantities in the OEs of non-occluded mice subjected to more than two different experimental conditions (Figure 3; Figure 3-figure supplement 2; Figure 4; Figure 4-figure supplements 3, 4). For comparisons of differences in quantities of newborn OSNs of musk-responsive subtypes at 4 and 7 days post-EdU between non-occluded mice exposed and unexposed to muscone, a two-sample ANOVA (fixed-test, using a right-tailed F distribution) was used (Figure 6; Figure 6-figure supplement 1).
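
      The two-stage linear step-up procedure named above can be sketched as follows. This is a generic textbook-style illustration under stated assumptions, not the software used for the analyses in the manuscript: the function names and the example P values are hypothetical, and the input is assumed to be a list of P values from the pairwise comparisons.

```python
def bh_rejection_count(pvals, q):
    """Number of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    count = 0
    for rank, p in enumerate(sorted(pvals), start=1):
        if p <= q * rank / m:
            count = rank  # step-up: keep the largest rank that passes
    return count


def bky_two_stage(pvals, q=0.05):
    """Two-stage linear step-up procedure (Benjamini, Krieger, Yekutieli).

    Stage 1 runs BH at q' = q / (1 + q) to estimate the number of true
    nulls; stage 2 reruns BH at q' * m / m0_hat. Returns one boolean per
    input P value (True = discovery).
    """
    m = len(pvals)
    if m == 0:
        return []
    q1 = q / (1.0 + q)
    r1 = bh_rejection_count(pvals, q1)
    if r1 == 0:
        return [False] * m
    if r1 == m:
        return [True] * m
    m0_hat = m - r1                      # estimated number of true nulls
    r2 = bh_rejection_count(pvals, q1 * m / m0_hat)
    cutoff = sorted(pvals)[r2 - 1]       # largest rejected P value
    return [p <= cutoff for p in pvals]


# Hypothetical P values from five pairwise comparisons.
discoveries = bky_two_stage([0.001, 0.008, 0.039, 0.041, 0.60], q=0.05)
```

      Because the second stage uses the estimated number of true null hypotheses, the procedure can reject more comparisons than a single Benjamini-Hochberg pass at the same FDR level.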

      Images in Figures 2 - 7, showing both EdU labeling of newborn neurons and OR-specific RNA-FISH: Colabeling is hard / often impossible to discern. Show zoom-ins and better explain the criteria for "colabeling" in the methods.

      In the revised manuscript an enlarged and split-channel view of an image showing multiple newborn Olfr235 OSNs (OR+/EdU+) has been added (Figure 2D). A detailed description of the criteria for OR+/EdU+ OSNs is provided in Methods under the section “Histological quantification of newborn and total OSNs of specific subtypes.”

      Figure 1C: add Olfr912.

      As a control group for iOSN quantities of musk-responsive subtypes in Figure 1, we selected random subtypes that are expressed in the same zones: 2 and 3. Olfr912 OSNs were not included because this subtype was not randomly chosen, nor is it expressed in the same zones (Olfr912 is expressed in zone 4). We also note that the scRNA-seq analysis was done to allow an initial exploration of the hypothesis that some OSN subtypes that are more highly represented in mice exposed to male odors show stimulation-dependent neurogenesis. Considering that the scRNA-seq datasets contain only small numbers of iOSNs of specific subtypes, we think they are more useful for analyzing changes in birthrates within groups of subtypes (e.g., musk responsive, random) rather than individual subtypes.

      The time of OE dissection is different for data shown in Figure 1 (P28) as compared to other figures (P35). Please comment/discuss.

      Within the Results section of the revised manuscript, we have now clarified that the PD 28 timepoint chosen for EdU birthdating in the histological quantification of newborn OSNs of specific subtypes is analogous to the PD 28 timepoint chosen for identification of immature (Gap43-expressing) OSNs in the scRNA-seq samples. In the case of EdU birthdating, it is necessary to provide a chase period of sufficient length to enable robust and stable expression of an OR, which defines the subtype. A chase period of 7 days was chosen based on a previous study (C. J. van der Linden et al., 2020). Hence, a dissection date of PD 35 was chosen.

      Figure 3F&G: please discuss the female → female effects

      In the Results and Discussion sections of the revised manuscript, we discuss our observation that the Olfr1440 and Olfr1431 subtypes show significantly higher quantities of newborn OSNs on the open side compared to closed sides in UNO-treated females. We speculate that these subtypes may receive some odor stimulation in juvenile females, perhaps via musk or related odors emitted by females themselves or from elsewhere within the environment.

      Figure 4E (and other examples): male → male displays two populations (no effect versus effect); please explain/speculate.

      For some UNO effect sizes, there appears to be a high degree of variation among mice, and, in some cases, this diversity appears to cause the data to separate into groups. We assessed whether this diversity might reflect mice that came from different litters, but this is not the case. Rather, we speculate that the observed diversity most likely reflects low representations of newborn OSNs of some subtypes and/or under specific conditions. The data referred to by the reviewer (now Figure 3–figure supplement 3D), for example, shows UNO effect sizes for quantities of newborn Olfr1431 OSNs, which has the lowest representation among the musk-responsive subtypes analyzed in this study.

      Figure 5C-E: It is unclear why strong muscone concentrations (10%) have no effect, whereas no muscone sometimes (D&E) has an effect.

      As discussed in response to comments from Reviewer #1, we speculate that fluctuations in UNO effect sizes in muscone-exposed mice, particularly at high muscone concentrations, may be due, at least in part, to transnasal and/or retronasal air flow [reviewed in (Coppola, 2012)], which would be expected to result in exposure of the closed side of the OE to muscone concentrations that increase with increasing environmental concentrations. In support of this, quantities of newborn Olfr235 (Figure 4C-middle) and Olfr1440 (Figure 4–figure supplement 1A-middle) OSNs increase on both the open and closed sides with increasing muscone concentration (except at the highest concentration, 10%, in the case of Olfr1440). We speculate that reductions in newborn Olfr1440 OSN quantities observed in the presence of 10% muscone may reflect overstimulation-dependent reductions in survival.

      As emphasized above, our study also includes experiments on non-occluded animals (Figures 3, 4, 5). Findings from these experiments provide additional evidence that exposure to multiple musk odorants (muscone, ambretone) causes selective increases in the birthrates of multiple musk-responsive OSN subtypes (Olfr235, Olfr1440, Olfr1431).

      We have included an extensive interpretation of UNO-based experiments, including their limitations, within the Results section of the revised manuscript.

      Figure S1: please explain the large error bars regarding "Transcript level".

      We have clarified that the error bars in this figure, which is now Appendix 1–figure 1, correspond to 95% confidence intervals.

      The figure captions could be improved for ease of reading.

      Figure captions have been revised for increased clarity.

      Figure 4: Include Olfr235 data for consistency.

      All OSN subtypes analyzed for the effects of exposure to adult mice on UNO-induced open-side biases in quantities of newborn OSNs have been included in a single figure, which is now Figure 3–figure supplement 3.

      Figure S6F&G: Do not run statistics on n = 2 (G) or 3 (F) samples.

      We have removed statistical test results from comparisons involving fewer than 4 observations.

      Reviewer #3 (Public Review):

      Summary:

      Neurogenesis in the mammalian olfactory epithelium persists throughout the life of the animal. The process replaces damaged or dying olfactory sensory neurons. It has been tacitly assumed that replacement of the OR subtypes is stochastic, although anecdotal evidence has suggested that this may not be the case. In this study, Santoro and colleagues systematically test this hypothesis by answering three questions: is there enrichment of specific OR subtypes associated with neurogenesis? Is the enrichment dependent on sensory stimulus? Is the enrichment the result of differential generation of the OR type or from differential cell death regulated by neural activity? The authors provide some solid evidence indicating that musk odor stimulus selectively promotes the OR types expressing the musk receptors. The evidence argues against a random selection of ORs in the regenerating neurons.

      Strengths:

      The strength of the study is a thorough and systematic investigation of the expression of multiple musk receptors with unilateral naris occlusion or under different stimulus conditions. The controls are properly performed. This study is the first to formulate the selective promotion hypothesis and the first systematic investigation to test it. The bulk of the study uses in situ hybridization and immunofluorescent staining to estimate the number of OR types. These results convincingly demonstrate the increased expression of musk receptors in response to male odor or muscone stimulation.

      Weaknesses:

      A major weakness of the current study is the single-cell RNASeq result. The authors use this piece of data as a broad survey of receptor expression in response to unilateral nasal occlusion. However, several issues with this data raise serious concerns about the quality of the experiment and the conclusions. First, the proportion of OSNs, including both the immature and mature types, constitutes only a small fraction of the total cells. In previous studies of the OSNs using the scRNASeq approach, OSNs constitute the largest cell population. It is curious why this is the case. Second, the authors did not annotate the cell types, making it difficult to assess the potential cause of this discrepancy. Third, given the small number of OSNs, it is surprising to have multiple musk receptors detected in the open side of the olfactory epithelium whereas almost none in the closed side. Since each OR type only constitutes ~0.1% of OSNs on average, the number of detected musk receptors is too high to be consistent with our current understanding and the rest of the data in the manuscript. Finally, unlike the other experiments, the authors did not describe any method details, nor was there any description of quality controls associated with the experiment. The concerns over the scRNASeq data do not diminish the value of the data presented in the bulk of the study but could be used for further analysis.

      We are grateful to the reviewer for raising these important questions.

      In the revised manuscript, we have clarified that the scRNA-seq dataset presented in the original version of the manuscript (now called dataset OE 1) was published and described in detail in a previous study (C. J. van der Linden et al., 2020). The reviewer is correct that the proportion of OSNs within that dataset was lower than in other datasets that have been published more recently (using updated methods). We think this is likely because of the way that the cells were processed (e.g., from cryopreserved single cells followed by live/dead selection). However, because the open and closed sides were processed identically, we do not expect the ratios of OSNs of specific subtypes to be greatly affected. Hence, the differences observed for specific OSN subtypes on the open versus closed sides are expected to be valid.

      As the reviewer notes, there is a surprisingly large difference between the number of OSNs of musk-responsive subtypes on the open and closed sides within the OE 1 dataset. This difference is a key piece of information that led us to formulate the hypothesis in the study: that musk-responsive subtypes are born at a higher rate in the presence of male/musk odor stimulation. And while it is true that, on average, each subtype represents ~0.1% of the population, it is known that there is wide variance in representations among different subtypes [e.g., (Ibarra-Soria et al., 2017)]. The frequencies of the musk-responsive subtypes among all OSNs on the open side of OE 1 (0.3% for Olfr235, 0.4% for Olfr1440, 0.06% for Olfr1434, 0% for Olfr1431, and 1% for Olfr1437) are in line with previous findings.

      To confirm that the scRNA-seq findings from dataset OE 1 are not an artifact of the cell preparation methods used, we generated a second scRNA-seq dataset, OE 2, which has been added to the revised manuscript (Figure 1). The OE 2 dataset was prepared according to the same experimental timeline as OE 1, but the cells were captured immediately after dissociation and live/dead sorting via FACS. As expected, most cells within the OE 2 dataset are OSNs (77% on the open side, 66% on the closed). Importantly, like the OE 1 dataset, the OE 2 dataset shows higher quantities of iOSNs of musk-responsive subtypes on the open side of the OE compared to the closed (normalized for either total cells or total OSNs) (Figure 1–figure supplement 1D, E).

      A weakness of the experiment assessing musk receptor expression is that the authors do not distinguish immature from mature OSNs. Immature OSNs express multiple receptor types before they commit to the expression of a single type. The experiments do not reveal whether mature OSNs maintain an elevated expression level of musk receptors.

      While it is established that multiple ORs are coexpressed at a low level during OSN differentiation (Bashkirova et al., 2023; Fletcher et al., 2017; Hanchate et al., 2015; Pourmorady et al., 2024; Saraiva et al., 2015; Scholz et al., 2016; Tan et al., 2015), this has been found to occur primarily at the immediate neuronal precursor 3 (INP3) stage (Bashkirova et al., 2023; Fletcher et al., 2017), which is characterized by expression of Tex15 (Fletcher et al., 2017; Pourmorady et al., 2024) and precedes the immature OSN (iOSN) stage, which is characterized by expression of Gap43 (Fletcher et al., 2017; McIntyre et al., 2010; Verhaagen et al., 1989). Within the scRNA-seq datasets in the present study, iOSNs of specific subtypes are identified based on robust expression of Gap43 (Log2 UMI > 1) and a specific OR gene (Log2 UMI > 2), as described in the figures and methods. Thus, the cells defined as iOSNs are expected to express a single OR gene and this expression should be maintained as iOSNs transition to mOSNs. To confirm these predictions, we carried out a detailed analysis of OR expression at three different stages of OSN differentiation: INP3, iOSN, and mOSN (Figure 1–figure supplement 2). The cells chosen for analysis express the musk-responsive ORs Olfr235 or Olfr1440 or a randomly chosen OR Olfr701, in addition to markers that define INP3, iOSN, or mOSN cells. As expected, individual iOSNs and mOSNs of musk-responsive subtypes were found to exhibit robust and singular OR expression on the open and closed sides of OEs from UNO-treated mice. Moreover, and as observed previously, INP3 cells coexpress multiple OR transcripts at low levels. A detailed description of how the analysis was performed is included in the Methods section under Quantification and statistical analysis.
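
      The thresholding rule described above for calling an iOSN of a given subtype can be written down compactly. The sketch below is a hedged illustration rather than the pipeline used in the study: it assumes that the thresholds apply to the base-2 logarithm of a per-cell UMI count, and the function names and the dictionary representation of a cell are hypothetical.

```python
import math

GAP43_MIN_LOG2_UMI = 1.0   # Gap43 threshold stated in the text
OR_MIN_LOG2_UMI = 2.0      # OR gene threshold stated in the text


def log2_umi(count):
    """Base-2 log of a UMI count; zero counts map to negative infinity."""
    return math.log2(count) if count > 0 else float("-inf")


def is_iosn_of_subtype(cell_umis, or_gene):
    """True if the cell passes both the Gap43 and the OR gene thresholds.

    `cell_umis` is an assumed representation: a dict of gene name -> UMI count.
    """
    gap43_level = log2_umi(cell_umis.get("Gap43", 0))
    or_level = log2_umi(cell_umis.get(or_gene, 0))
    return gap43_level > GAP43_MIN_LOG2_UMI and or_level > OR_MIN_LOG2_UMI


# Hypothetical cell with 8 Gap43 UMIs and 16 Olfr235 UMIs.
called = is_iosn_of_subtype({"Gap43": 8, "Olfr235": 16}, "Olfr235")
```

      Both comparisons are strict, so a cell sitting exactly at a threshold is not counted.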

      Within the histology-based quantifications, newborn OSNs are identified based on their robust RNA-FISH signals corresponding to a specific OR transcript and an EdU label. Considering the EdU chase time of 7 days, most EdU-positive cells are expected to have passed the INP3 stage and be iOSNs or mOSNs. Moreover, considering the low level of OR expression within INP3 cells, it is unlikely OR transcripts are expressed at a high enough level to be detectable and/or counted at this stage and thereby affect newborn OSN quantifications.

      There are also two conceptual issues that are of concern. The first is the concept of selective neurogenesis. The data show an increased expression of musk receptors in response to male odor stimulation. The authors argue that this indicates selective neurogenesis of the musk receptor types. However, it is not clear what the distinction is between elevated receptor expression and a commitment to a specific fate at an early stage of development. As immature OSNs express multiple receptors, a likely scenario is that some newly differentiated immature OSNs have elevated expression of not only the musk receptors but also other receptors. The current experiments do not distinguish the two alternatives. Moreover, as pointed out above, it is not clear whether mature OSNs maintain the increased expression. Although a scRNASeq experiment can clarify it, the authors, unfortunately, did not perform an in-depth analysis to determine at which point of neurogenesis the cells commit to a specific musk receptor type. The quality of the scRNASeq data unfortunately also does not lend confidence for this type of analysis.

      The addition of a second scRNA-seq dataset within the revised manuscript (Figure 1), combined with the new scRNA-seq-based analyses of OR expression in INP3, iOSN, and mOSN cells (Figure 1-figure supplement 2), provides strong evidence that iOSNs and mOSNs robustly express a single OR gene and that cellular expression is stable from the iOSN to the mOSN stage. These analyses do not support a scenario in which odor stimulation causes upregulated expression of multiple ORs and thereby causes apparent increases in quantities of newly generated OSNs that express musk-responsive ORs. Rather, the data firmly support a mechanism in which odor stimulation increases quantities of newly generated OSNs that have stably committed to the robust expression of a single musk-responsive OR.

      A second conceptual issue, the idea of homeostasis in regeneration, which the authors presented in the Introduction, needs clarification. In its current form, it is confusing. It could mean maintenance of the distribution of receptor types, or it could mean the proper replacement of a specific OR type upon the loss of this type. The authors seem to refer to the latter and should define it properly.

      We have revised the Introduction section to clarify our use of the term homeostatic in one instance (paragraph 4) and replace it with more specific language in a second instance (paragraph 5).

      Reviewer #3 (Recommendations For The Authors):

      Concerns over scRNASeq data. It appears that the samples may have included non-OE tissues, which reduced the representation of the OSNs. This experiment may need to be repeated to increase the number of OSNs.

      As outlined in the response to the public comments, we think that the low proportion of OSNs in the OE 1 data set reflects how the cells were prepared and processed. We have now included a second scRNA-seq dataset to address this concern.

      Cell types should be identified in the scRNASeq analysis, and the number of cells documented for each cell type, at least for the OSNs. The data should be made available for general access.

      We have now clarified that the OE 1 dataset was published as part of a previous study (C. J. van der Linden et al., 2020) and was made publicly available as part of that study (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE157119). All cell types in the newly generated OE 2 dataset have been annotated (Figure 1) and this dataset has also been made publicly available (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE278693). The numbers and percentages of OSNs within OE 1 and OE 2 datasets have been added to the legend of Figure 1-figure supplement 1.

      The specific OR types should be segregated for mature and immature OSNs. The percentage of a specific OR type should be normalized to the total number of OSNs, rather than the total cells. The current quantification is misleading because it gives the false sense that the muscone receptors represent ~0.1% of cells when the proportion is much higher if only OSNs are considered.

      In the revised manuscript, quantities of iOSNs (Gap43+ cells) of specific subtypes within the OE 1 and OE 2 scRNA-seq datasets are graphed as percentages of both all OSNs (Figure 1E, Figure 1–figure supplement 1D) and all cells (Figure 1–figure supplement 1E). As a percentage of all OSNs, average quantities of iOSNs of musk responsive subtypes on the open side of the OE range from 0.005% (for Olfr1431) to 0.14% (for Olfr1440) (Figure 1E).
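      The difference between the two denominators discussed above can be illustrated with a minimal sketch. It is not the authors' analysis pipeline; the function name and all counts are hypothetical and chosen only to show how the same number of subtype-specific iOSNs yields a larger percentage when normalized to OSNs than to all cells.

```python
def subtype_percentages(n_subtype_iosn, n_osn, n_cells):
    """Return (percent of all OSNs, percent of all cells) for one iOSN subtype.

    All arguments are raw cell counts from a single dataset (hypothetical here).
    """
    if n_osn <= 0 or n_cells <= 0:
        raise ValueError("denominators must be positive")
    if not (0 <= n_subtype_iosn <= n_osn <= n_cells):
        raise ValueError("expected subtype iOSNs <= OSNs <= all cells")
    pct_of_osns = 100.0 * n_subtype_iosn / n_osn
    pct_of_cells = 100.0 * n_subtype_iosn / n_cells
    return pct_of_osns, pct_of_cells


# Hypothetical counts: 7 iOSNs of one subtype, 5,000 OSNs, 20,000 cells in total.
pct_osn, pct_cells = subtype_percentages(7, 5000, 20000)
print(f"{pct_osn:.3f}% of OSNs, {pct_cells:.3f}% of all cells")
```

      Because OSNs are a subset of all cells, the OSN-normalized percentage is always at least as large as the cell-normalized one, which is the reviewer's point about the choice of denominator.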

      Within the feature plots for the two datasets, the differentiation stages of indicated OSNs have been clearly defined within the figures and figure legends. For the OE 1 dataset, iOSNs are differentiated from mOSNs by arrows (Figure 1–figure supplement 1C). For the OE 2 dataset (Figure 1D), only immature OSNs are shown for simplicity.

      Technical details of the scRNASeq should be documented. In the feature plot of musk-response receptors (Figure 1D), it is better to use the actual quantity of expression rather than binarized representation (with or without an OR). If one needs to use on/off to determine the number of cells for a given OR type, then the criteria of selection should be given.

      Technical details of generation of the scRNA-seq datasets have been documented in the “Method details” section (for the OE 2 dataset) and in the method section of our previous publication of the OE 1 dataset (C. J. van der Linden et al., 2020). Details of the scRNA-seq analyses, including the criteria used to define immature OSNs of specific subtypes, are documented within the “Quantification and statistical analysis” section.

      Within the feature plots, we have decided to show OSNs of a given subtype in a binary fashion using specific colors for the sake of simplicity (Figure 1D, Figure 1-figure supplement 1C). To address the reviewer’s concern, we have added a new figure that provides detailed information about OR transcript expression (levels and genes) within iOSNs and mOSNs of two different musk responsive subtypes and a randomly chosen subtype (Figure 1-figure supplement 2).

      An in-depth analysis of the onset of OR expression in the GBC, INP, immature, and mature OSNs should be performed. It is also important to determine how many other receptors are detected in the cells that express the musk receptors. The current scRNASeq data may not be of sufficiently high quality and the experiment needs to be repeated. It is also important for the authors to take measures to eliminate ambient RNA contamination.

      The revised manuscript includes a second scRNA-seq dataset (OE 2; Figure 1). Details of how both the original (OE 1) and new datasets were generated have been documented within the Methods sections of the corresponding publications [(C. J. van der Linden et al., 2020); present study]. For both datasets, live/dead selection of cells was performed, which was expected to reduce ambient RNA.

      The revised manuscript also includes a new figure that provides detailed information about OR transcript expression within INP3, iOSN and mOSN cells that express one of two different musk responsive ORs or a randomly chosen OR (Figure 1-figure supplement 2). These data reveal, as reported previously (Bashkirova et al., 2023; Fletcher et al., 2017; Pourmorady et al., 2024), that low levels of multiple OR transcripts are detected in INP3 (Tex15+) cells. By contrast, iOSN (Gap43+) and mOSN (Omp+) cells robustly express a single OR, with little or no expression of other ORs.

      Quantification of cells for Figures 2-7 should be changed. Instead of using cell number per 1/2 section, the data should be calculated using density (using the area of the epithelium) or normalized to the total number of cells (based on DAPI staining). This is because multiple sections are taken from the same mouse along the A-P axis. These sections have different sizes and numbers of cells.

      As noted in response to a similar concern of Reviewer #2, this has been addressed in two ways within the revised manuscript:

      (1) We have noted within the Methods section that the approach of using half-sections for normalization has been used in multiple previous studies for quantifying newborn (OR+/EdU+) and total (OR+) OSN abundances (Hossain et al., 2023; Ibarra-Soria et al., 2017; C. van der Linden et al., 2018; C. J. van der Linden et al., 2020). Additionally, within the figure legends and Methods, we have more thoroughly described the approach used, including that it relies on averaging the quantifications from at least 5 high-quality coronal OE tissue sections that are evenly distributed throughout the anterior-posterior length of each OE and thereby mitigates the effects of section size and cell number variation among sections. In the case of UNO-treated mice, the open and closed sides within the same section are paired, which further reduces the effects of section-to-section variation. We have found that this approach yields reproducible quantities of newborn and total OSNs among biological replicate mice and enables accurate assessment of how quantities of OSNs of specific subtypes change as a result of altered olfactory experience, a key objective of this study.

      (2) To assess whether the use of alternative approaches for normalizing newborn OSN quantities suggested by the reviewers would affect the present study’s findings, we compared three methods for normalizing the effects of exposure to male odors or muscone on quantities of newborn Olfr235 OSNs in the OEs of both UNO-treated and non-occluded mice: 1) OR+/EdU+ OSNs per half-section (used in this study), 2) OR+/EdU+ OSNs per total number of EdU+ cells (reviewer suggestion (i)), and 3) OR+/EdU+ OSNs per unit of DAPI+ area (an approximate measure of nuclei number; reviewer suggestion (ii)). The three normalization methods yielded statistically indistinguishable differences in assessing the effects of exposure of either UNO-treated or non-occluded mice to male odors (newly added Figure 2–figure supplement 2 and Figure 3–figure supplement 2), or of exposure of non-occluded mice to muscone (newly added Figure 4–figure supplement 3). Based on these findings, and the considerable time that would be required to renormalize all data in the manuscript, we have chosen to maintain the use of normalization per half-section.
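      The three normalization schemes compared above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the dictionary keys (`or_edu`, `edu_total`, `dapi_area`), the helper names, and every number are hypothetical. Each value is averaged over the sections from one side of one OE, and the open side is then compared with the closed side.

```python
from statistics import mean


def normalized_counts(sections):
    """Average newborn-OSN counts over sections under three normalizations.

    Each section is a dict with hypothetical keys:
      or_edu    - number of OR+/EdU+ cells in the half-section
      edu_total - total number of EdU+ cells in the half-section
      dapi_area - DAPI+ area of the half-section (arbitrary units)
    """
    return {
        "per_half_section": mean(s["or_edu"] for s in sections),
        "per_edu_cell": mean(s["or_edu"] / s["edu_total"] for s in sections),
        "per_dapi_area": mean(s["or_edu"] / s["dapi_area"] for s in sections),
    }


def open_to_closed_ratio(open_sections, closed_sections):
    """Open-side / closed-side ratio under each normalization."""
    open_vals = normalized_counts(open_sections)
    closed_vals = normalized_counts(closed_sections)
    return {key: open_vals[key] / closed_vals[key] for key in open_vals}


# Hypothetical data: five sections of varying size along the A-P axis.
sizes = [(200, 1.0), (300, 1.5), (400, 2.0), (300, 1.5), (200, 1.0)]
open_side = [
    {"or_edu": n, "edu_total": e, "dapi_area": a}
    for n, (e, a) in zip([4, 6, 8, 6, 4], sizes)
]
closed_side = [
    {"or_edu": n, "edu_total": e, "dapi_area": a}
    for n, (e, a) in zip([2, 3, 4, 3, 2], sizes)
]

ratios = open_to_closed_ratio(open_side, closed_side)
print(ratios)
```

      In this toy case the open and closed sides share the same denominators within each section, so all three normalizations give the same open-to-closed ratio. Real data need not behave this way, which is why the comparison was performed empirically.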

      References

      Bashkirova, E. V., Klimpert, N., Monahan, K., Campbell, C. E., Osinski, J., Tan, L., Schieren, I., Pourmorady, A., Stecky, B., Barnea, G., Xie, X. S., Abdus-Saboor, I., Shykind, B. M., Marlin, B. J., Gronostajski, R. M., Fleischmann, A., & Lomvardas, S. (2023). Opposing, spatially-determined epigenetic forces impose restrictions on stochastic olfactory receptor choice. eLife, 12, RP87445. https://doi.org/10.7554/eLife.87445

      Coppola, D. M. (2012). Studies of olfactory system neural plasticity: The contribution of the unilateral naris occlusion technique. Neural Plasticity, 2012, 351752. https://doi.org/10.1155/2012/351752

      Fletcher, R. B., Das, D., Gadye, L., Street, K. N., Baudhuin, A., Wagner, A., Cole, M. B., Flores, Q., Choi, Y. G., Yosef, N., Purdom, E., Dudoit, S., Risso, D., & Ngai, J. (2017). Deconstructing Olfactory Stem Cell Trajectories at Single-Cell Resolution. Cell Stem Cell, 20(6), 817-830.e8. https://doi.org/10.1016/j.stem.2017.04.003

      Han, X., Jiang, Y., Feng, N., Yang, P., Zhang, M., Jin, W., Zhang, T., Huang, Z., Zhao, H., Zhang, K., Liu, S., & Hu, D. (2022). Comparison of the Homology Between Muskrat Scented Gland and Mouse Preputial Gland. Journal of Mammalian Evolution, 29(2), 435–446. https://doi.org/10.1007/s10914-022-09604-w

      Hanchate, N. K., Kondoh, K., Lu, Z., Kuang, D., Ye, X., Qiu, X., Pachter, L., Trapnell, C., & Buck, L. B. (2015). Single-cell transcriptomics reveals receptor transformations during olfactory neurogenesis. Science (New York, N.Y.), 350(6265), 1251–1255. https://doi.org/10.1126/science.aad2456

      Hossain, K., Smith, M., & Santoro, S. W. (2023). A histological protocol for quantifying the birthrates of specific subtypes of olfactory sensory neurons in mice. STAR Protocols, 4(3), 102432. https://doi.org/10.1016/j.xpro.2023.102432

      Ibarra-Soria, X., Nakahara, T. S., Lilue, J., Jiang, Y., Trimmer, C., Souza, M. A., Netto, P. H., Ikegami, K., Murphy, N. R., Kusma, M., Kirton, A., Saraiva, L. R., Keane, T. M., Matsunami, H., Mainland, J., Papes, F., & Logan, D. W. (2017). Variation in olfactory neuron repertoires is genetically controlled and environmentally modulated. eLife, 6. https://doi.org/10.7554/eLife.21476

      Kelemen, G. (1947). The junction of the nasal cavity and the pharyngeal tube in the rat. Archives of Otolaryngology, 45(2), 159–168. https://doi.org/10.1001/archotol.1947.00690010168002

      Lin, D. Y., Zhang, S.-Z., Block, E., & Katz, L. C. (2005). Encoding social signals in the mouse main olfactory bulb. Nature, 434(7032), 470–477. https://doi.org/10.1038/nature03414

      McIntyre, J. C., Titlow, W. B., & McClintock, T. S. (2010). Axon growth and guidance genes identify nascent, immature, and mature olfactory sensory neurons. Journal of Neuroscience Research, 88(15), 3243–3256. https://doi.org/10.1002/jnr.22497

      Pourmorady, A. D., Bashkirova, E. V., Chiariello, A. M., Belagzhal, H., Kodra, A., Duffié, R., Kahiapo, J., Monahan, K., Pulupa, J., Schieren, I., Osterhoudt, A., Dekker, J., Nicodemi, M., & Lomvardas, S. (2024). RNA-mediated symmetry breaking enables singular olfactory receptor choice. Nature, 625(7993), 181–188. https://doi.org/10.1038/s41586-023-06845-4

      Saraiva, L. R., Ibarra-Soria, X., Khan, M., Omura, M., Scialdone, A., Mombaerts, P., Marioni, J. C., & Logan, D. W. (2015). Hierarchical deconstruction of mouse olfactory sensory neurons: From whole mucosa to single-cell RNA-seq. Scientific Reports, 5, 18178. https://doi.org/10.1038/srep18178

      Sato-Akuhara, N., Horio, N., Kato-Namba, A., Yoshikawa, K., Niimura, Y., Ihara, S., Shirasu, M., & Touhara, K. (2016). Ligand Specificity and Evolution of Mammalian Musk Odor Receptors: Effect of Single Receptor Deletion on Odor Detection. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 36(16), 4482–4491. https://doi.org/10.1523/JNEUROSCI.3259-15.2016

      Scholz, P., Kalbe, B., Jansen, F., Altmueller, J., Becker, C., Mohrhardt, J., Schreiner, B., Gisselmann, G., Hatt, H., & Osterloh, S. (2016). Transcriptome Analysis of Murine Olfactory Sensory Neurons during Development Using Single Cell RNA-Seq. Chemical Senses, 41(4), 313–323. https://doi.org/10.1093/chemse/bjw003

      Schwende, F. J., Wiesler, D., Jorgenson, J. W., Carmack, M., & Novotny, M. (1986). Urinary volatile constituents of the house mouse, Mus musculus, and their endocrine dependency. Journal of Chemical Ecology, 12(1), 277–296. https://doi.org/10.1007/BF01045611

      Shirasu, M., Yoshikawa, K., Takai, Y., Nakashima, A., Takeuchi, H., Sakano, H., & Touhara, K. (2014). Olfactory receptor and neural pathway responsible for highly selective sensing of musk odors. Neuron, 81(1), 165–178. https://doi.org/10.1016/j.neuron.2013.10.021

      Tan, L., Li, Q., & Xie, X. S. (2015). Olfactory sensory neurons transiently express multiple olfactory receptors during development. Molecular Systems Biology, 11(12), 844. https://doi.org/10.15252/msb.20156639

      van der Linden, C. J., Gupta, P., Bhuiya, A. I., Riddick, K. R., Hossain, K., & Santoro, S. W. (2020). Olfactory Stimulation Regulates the Birth of Neurons That Express Specific Odorant Receptors. Cell Reports, 33(1), 108210. https://doi.org/10.1016/j.celrep.2020.108210

      van der Linden, C., Jakob, S., Gupta, P., Dulac, C., & Santoro, S. W. (2018). Sex separation induces differences in the olfactory sensory receptor repertoires of male and female mice. Nature Communications, 9(1), 5081. https://doi.org/10.1038/s41467-018-07120-1

      Verhaagen, J., Oestreicher, A. B., Gispen, W. H., & Margolis, F. L. (1989). The expression of the growth associated protein B50/GAP43 in the olfactory system of neonatal and adult rats. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 9(2), 683–691.

      Vihani, A., Hu, X. S., Gundala, S., Koyama, S., Block, E., & Matsunami, H. (2020). Semiochemical responsive olfactory sensory neurons are sexually dimorphic and plastic. eLife, 9, e54501. https://doi.org/10.7554/eLife.54501

    1. And his trails do not fade

      Trails will never fade

      4 - IndyWeb - TrailScape - TrailMarks - ClueTrails - HyperMaps

      of Individual, collaborative Trails blazed by Trail

      Eventually everything connects

      Just connect

      https://hypothes.is/users/gyuri?q=connections+key

      People Ideas and things

      Eventually everything connects — people, ideas, objects… the quality of the connections is the key to quality per se

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study offers a valuable investigation into the role of cholecystokinin (CCK) in thalamocortical plasticity during early development and adulthood, employing a range of experimental techniques. The authors demonstrate that tetanic stimulation of the auditory thalamus induces cortical long-term potentiation (LTP), which can be evoked through either electrical or optical stimulation of the thalamus or by noise bursts. They further show that thalamocortical LTP is abolished when thalamic CCK is knocked down or when cortical CCK receptors are blocked. Interestingly, in 18-month-old mice, thalamocortical LTP was largely absent but could be restored through the cortical application of CCK. The authors conclude that CCK contributes to thalamocortical plasticity and may enhance thalamocortical plasticity in aged subjects.

      While the study presents compelling evidence, I would like to offer several suggestions for the authors' consideration:

      (1) Thalamocortical LTP and NMDA-Dependence:

      It is well established that thalamocortical LTP is NMDA receptor-dependent, and blocking cortical NMDA receptors can abolish LTP. This raises the question of why thalamocortical LTP is eliminated when thalamic CCK is knocked down or when cortical CCK receptors are blocked. If I correctly understand the authors' hypothesis, that CCK promotes LTP through CCKR-intracellular Ca2+-AMPAR, this pathway should not directly interfere with the NMDA-dependent mechanism. A clearer explanation of this interaction would be beneficial.

      Thank you for your question regarding the role of CCK and NMDA receptors (NMDARs) in thalamocortical LTP. We propose that CCK receptor (CCKR) activation enhances intracellular calcium levels, which are crucial for thalamocortical LTP induction. Calcium influx through NMDARs is also essential to reach the threshold required for activating downstream signaling pathways that promote LTP (Heynen and Bear, 2001). Thus, CCKRs and NMDARs may function in a complementary manner to facilitate LTP, with both contributing to the elevation of intracellular calcium.

      However, it is important to note that the postsynaptic mechanisms of thalamocortical LTP in the auditory cortex (ACx) differ from those in other sensory cortices. Studies have shown that thalamocortical LTP in the ACx appears to be less dependent on NMDARs (Chun et al., 2013), which is distinct from somatosensory or visual cortices. Our previous studies also found that while NMDAR antagonists can block HFS-induced LTP in the inner ACx, LTP can still be induced in the presence of CCK even after NMDAR blockade (Chen et al., 2019). These findings suggest that CCK may act through an alternative mechanism involving CCKR-mediated calcium signaling and AMPAR modulation, which partially compensates for the loss of NMDAR signaling. This distinction may reflect functional differences between the ACx and other sensory cortices, as highlighted in previous studies (King and Nelken, 2009).

      While our current study focuses on the role of CCKR-mediated plasticity in the auditory system, further investigations are needed to elucidate how CCKRs and NMDARs interact within the broader framework of thalamocortical neuroplasticity across different cortical regions. Understanding whether similar mechanisms operate in other sensory systems, such as the visual cortex, will be an important direction for future research.

      Heynen, A.J., and Bear, M.F. (2001). Long-term potentiation of thalamocortical transmission in the adult visual cortex in vivo. J Neurosci 21, 9801-9813. 10.1523/jneurosci.21-24-09801.2001.

      Chun, S., Bayazitov, I.T., Blundon, J.A., and Zakharenko, S.S. (2013). Thalamocortical Long-Term Potentiation Becomes Gated after the Early Critical Period in the Auditory Cortex. The Journal of Neuroscience 33, 7345-7357. 10.1523/jneurosci.4500-12.2013.

      Chen, X., Li, X., Wong, Y.T., Zheng, X., Wang, H., Peng, Y., Feng, H., Feng, J., Baibado, J.T., Jesky, R., et al. (2019). Cholecystokinin release triggered by NMDA receptors produces LTP and sound-sound associative memory. Proc Natl Acad Sci U S A 116, 6397-6406. 10.1073/pnas.1816833116.

      King, A. J., & Nelken, I. (2009). Unraveling the principles of auditory cortical processing: can we learn from the visual system? Nature Neuroscience, 12(6), 698-701.

      (2) Complexity of the Thalamocortical System:

      The thalamocortical system is intricate, with different cortical and thalamic subdivisions serving distinct functions. In this study, it is not fully clear which subdivisions were targeted for stimulation and recording, which could significantly influence the interpretation of the findings. Clarifying this aspect would enhance the study's robustness.

      Thank you for your valuable feedback. We would like to clarify that stimulation was conducted in the ventral subdivision of the medial geniculate nucleus (MGv), and recording was performed in layer IV of the ACx. Targeting the MGv allows us to investigate the influence of thalamic inputs on auditory cortical responses. Layer IV of the ACx is known to receive direct thalamic projections, making it an ideal site for assessing how thalamic activity influences cortical processing. We will incorporate this clarification into the revised manuscript to enhance the robustness of our study.

      Results section:

      “Stimulation electrodes were placed in the MGB (specifically in the medial geniculate nucleus ventral subdivision, MGv), and recording electrodes were inserted into layer IV of ACx”

      “The recording electrodes were lowered into layer IV of ACx, while the stimulation electrodes were lowered into MGB (MGv subdivision). The final stimulating and recording positions were determined by maximizing the cortical fEPSP amplitude triggered by the ES in the MGB. The accuracy of electrode placement was verified through post-hoc histological examination and electrophysiological responses.”

      (3) Statistical Variability:

      Biological data, including field excitatory postsynaptic potentials (fEPSPs) and LTP, often exhibit significant variability between samples, sometimes resulting in a standard deviation that exceeds 50% of the mean value. The reported standard deviation of LTP in this study, however, appears unusually small, particularly given the relatively limited sample size. Further discussion of this observation might be warranted.

      Thank you for your question. In our experiments, the sample size N represents the number of animals used, while n refers to the number of recordings, with each recording corresponding to a distinct pair of stimulation and recording sites. To adhere to ethical guidelines and minimize animal usage, we often perform multiple recordings within a single animal, such as from different hemispheres of the brain. Although N may appear small, our statistical analyses are based on n, ensuring sufficient data points for reliable conclusions.

      Furthermore, as our experiments are conducted in vivo, we observe lower variability in the increase of fEPSP slopes following LTP induction compared to brain slice preparations, where standard deviations exceeding 50% of the mean are common. This reduced variability likely reflects the robustness of the physiologically intact conditions in the in vivo setup.

      (4) EYFP Expression and Virus Targeting:

      The authors indicate that AAV9-EFIa-ChETA-EYFP was injected into the medial geniculate body (MGB) and subsequently expressed in both the MGB and cortex. If I understand correctly, the authors assume that cortical expression represents thalamocortical terminals rather than cortical neurons. However, co-expression of CCK receptors does not necessarily imply that the virus selectively infected thalamocortical terminals. The physiological data regarding cortical activation of thalamocortical terminals could be questioned if the cortical expression represents cortical neurons or both cortical neurons and thalamocortical terminals.

      Thank you for your question. In Figure 2A, EYFP expression indicates thalamocortical projections, while the co-expression of EYFP with PSD95 confirms the identity of thalamocortical terminals. The CCK-B receptors (CCKBR) are located on postsynaptic cortical neurons. The observed co-labeling of thalamocortical terminals and postsynaptic CCKBR suggests that CCK-expressing neurons in the medial geniculate body (MGB) can release CCK, which subsequently acts on the postsynaptic CCKBR. This evidence supports our interpretation of the functional role of CCK modulating neural plasticity between thalamocortical inputs and cortical neurons. As shown in Figure 2A, we aim to demonstrate that the co-labeling of thalamocortical terminals with CCK receptors accounts for a substantial proportion of the thalamocortical terminals. We will ensure that this clarification is emphasized in the revised manuscript to address your concerns.

      Results section:

      “Cre-dependent AAV9-EFIa-DIO-ChETA-EYFP was injected into the MGB of CCK-Cre mice. EYFP labeling marked CCK-positive neurons in the MGB. The co-expression of EYFP thalamocortical projections with PSD95 confirms the identity of thalamocortical terminals (yellow), which primarily targeted layer IV of the ACx (Figure 2A, upper panel). Immunohistochemistry revealed that a substantial proportion (15 out of 19, Figure 2A lower right panel) of thalamocortical terminals (arrows) colocalize with CCK receptors (CCKBR) on postsynaptic cortical neurons in the ACx (Figure 2A lower panel), supporting the functional role of CCK in modulating thalamocortical plasticity.”

      (5) Consideration of Previous Literature:

      A number of studies have thoroughly characterized auditory thalamocortical LTP during early development and adulthood. It may be beneficial for the authors to integrate insights from this body of work, as reliance on data from the somatosensory thalamocortical system might not fully capture the nuances of the auditory pathway. A more comprehensive discussion of the relevant literature could enhance the study's context and impact.

      Thank you for your valuable feedback. We will enhance our discussion on auditory thalamocortical LTP during early development and adulthood to provide a more comprehensive context for our study.

      (6) Therapeutic Implications:

      While the authors suggest potential therapeutic applications of their findings, it may be somewhat premature to draw such conclusions based on the current evidence. Although speculative discussion is not harmful, it may not significantly add to the study's conclusions at this stage.

      Thank you for your thoughtful feedback. We agree that the therapeutic applications mentioned in our study are speculative at this stage and should be regarded as a forward-looking perspective rather than definitive conclusions. Our intention was to highlight the broader potential of our findings to inspire further research, rather than to propose immediate clinical applications.

      In light of your feedback, we have adjusted the language in the manuscript to reflect a more cautious interpretation. Speculative discussions are now explicitly framed as hypotheses or possibilities for future exploration. We emphasize that our findings provide a foundation for further investigations into CCK-based plasticity and its implications.

      We believe that appropriately framed forward-thinking discussions are valuable in guiding the direction of future research. We sincerely hope that our current and future work will contribute to a deeper understanding of thalamocortical plasticity and, over time, potentially lead to advancements in human health.

      Reviewer #2 (Public review):

      Summary:

      This work used multiple approaches to show that CCK is critical for long-term potentiation (LTP) in the auditory thalamocortical pathway. They also showed that the CCK mediation of LTP is age-dependent and supports frequency discrimination. This work is important because it opens up a new avenue of investigation of the roles of neuropeptides in sensory plasticity.

      Strengths:

      The main strength is the multiple approaches used to comprehensively examine the role of CCK in auditory thalamocortical LTP. Thus, the authors do provide a compelling set of data that CCK mediates thalamocortical LTP in an age-dependent manner.

      Weaknesses:

      The behavioral assessment is relatively limited but may be fleshed out in future work.

      Reviewer #3 (Public review):

      Summary:

      Cholecystokinin (CCK) is highly expressed in auditory thalamocortical (MGB) neurons and CCK has been found to shape cortical plasticity dynamics. In order to understand how CCK shapes synaptic plasticity in the auditory thalamocortical pathway, the authors assessed the role of CCK signaling across multiple mechanisms of LTP induction with the auditory thalamocortical (MGB - layer IV Auditory Cortex) circuit in mice. In these physiology experiments that leverage multiple mechanisms of LTP induction and a rigorous manipulation of CCK and CCK-dependent signaling, they establish that auditory thalamocortical LTP depends on the co-release of CCK from auditory thalamic neurons. By carefully assessing the development of this plasticity over time and CCK expression, they go on to identify a period, spanning early and middle adulthood, during which CCK is produced in auditory thalamocortical neurons, establishing a window for plasticity from 3 weeks to 1.5 years in mice, with limited LTP occurring outside of this window. The authors go on to show that CCK signaling and its effect on LTP in the auditory cortex is also capable of modifying frequency discrimination accuracy in an auditory PPI task. In evaluating the impact of CCK on modulating PPI task performance, it also seems that in mice <1.5 years old CCK-dependent effects on cortical plasticity are almost saturated. While exogenous CCK can modestly improve discrimination of only very similar tones, exogenous focal delivery of CCK in older mice can significantly improve learning in a PPI task to bring their discrimination ability in line with those from young adult mice.

      Strengths:

      (1) The clarity of the results along with the rigorous multi-angled approach provide significant support for the claim that CCK is essential for auditory thalamocortical synaptic LTP. This approach uses a combination of electrical, acoustic, and optogenetic pathway stimulation alongside conditional expression approaches, germline knockout, viral RNA downregulation, and pharmacological blockade. Through the combination of these experimental configurations the authors demonstrate that high-frequency stimulation-induced LTP is reliant on co-release of CCK from glutamatergic MGB terminals projecting to the auditory cortex.

      (2) The careful analysis of the CCK, CCKB receptor, and LTP expression is also a strength that puts the finding into the context of mechanistic causes and potential therapies for age-dependent sensory/auditory processing changes. Similarly, not only do these data identify a fundamental biological mechanism, but they also provide support for the idea that exogenous asynchronous stimulation of the CCKBR is capable of restoring an age-dependent loss in plasticity.

      (3) Although experiments to simultaneously relate LTP and behavioral change or identify a causal relationship between LTP and frequency discrimination were not performed, there is still convincing evidence that CCK signaling in the auditory cortex (known to determine synaptic LTP) is important for auditory processing/frequency discrimination. These experiments are key for establishing the relevance of this mechanism.

      Weaknesses:

      (1) Given the magnitude of the evoked responses, one expects that pyramidal neurons in layer IV are primarily those that undergo CCK-dependent plasticity, but the degree to which PV-interneurons and pyramidal neurons participate in this process differently is unclear.

      Thank you for this insightful comment. We agree that the differential roles of PV-interneurons and pyramidal neurons in CCK-dependent thalamocortical plasticity remain unclear and acknowledge this as an important limitation of our study. Our primary focus was on pyramidal neurons, as our in vivo electrophysiological recordings measured the fEPSP slope in layer IV of the auditory cortex, which primarily reflects excitatory synaptic activity. However, we recognize the critical role of the excitatory-inhibitory balance in cortical function and the potential contribution of PV-interneurons to this process. In future studies, we plan to utilize techniques such as optogenetics, two-photon calcium imaging and cell-type-specific recordings to investigate the distinct contributions of PV-interneurons and pyramidal neurons to CCK-dependent thalamocortical plasticity, thereby providing a more comprehensive understanding of how CCK modulates thalamocortical circuits.

      (2) While these data support an important role for CCK in synaptic LTP in the auditory thalamocortical pathway, perhaps temporal processing of acoustic stimuli is as or more important than frequency discrimination. Given the enhanced responsivity of the system, it is unclear whether this mechanism would improve or reduce the fidelity of temporal processing in this circuit. Understanding this dynamic may also require consideration of cell type as raised in weakness #1.

      Thank you for this thoughtful comment. We acknowledge that our study did not directly address the fidelity of temporal processing, which is indeed a critical aspect of auditory function. Our behavioral experiments primarily focused on linking frequency discrimination to the role of CCK in synaptic strengthening within the auditory thalamocortical pathway. However, we agree that enhanced responsivity of the system could also impact temporal processing dynamics, such as the precise timing of auditory responses. Whether this modulation improves or reduces the fidelity of temporal processing remains an open and important question.

      As you noted, understanding these dynamics will require a deeper investigation into the interactions between different cell types, particularly the balance between excitatory and inhibitory neurons. Exploring how CCK modulation affects both the circuit and cellular levels in temporal processing is an important direction for future research, which we plan to pursue. Thank you again for raising this important point.

      Discussion section:

      “While we focused on homosynaptic plasticity at thalamocortical synapses by recording only fEPSPs in layer IV of ACx, it is essential to further explore heterosynaptic effects of CCK released from thalamocortical synapses on intracortical circuits, particularly its role in modulating the excitatory-inhibitory balance. PV-interneurons, as key regulators of cortical inhibition, may contribute to the temporal fidelity of sensory processing, which is critical for auditory perception (Nocon et al., 2023; Cai et al., 2018). Additionally, CCK may facilitate cross-modal plasticity by modulating heterosynaptic plasticity in interconnected cortical areas. Future studies would provide valuable insights into the broader role of CCK in shaping sensory processing and cortical network dynamics.”

      Nocon, J.C., Gritton, H.J., James, N.M., Mount, R.A., Qu, Z., Han, X., and Sen, K. (2023). Parvalbumin neurons enhance temporal coding and reduce cortical noise in complex auditory scenes. Communications Biology 6, 751. 10.1038/s42003-023-05126-0.

      Cai, D., Han, R., Liu, M., Xie, F., You, L., Zheng, Y., Zhao, L., Yao, J., Wang, Y., Yue, Y., et al. (2018). A Critical Role of Inhibition in Temporal Processing Maturation in the Primary Auditory Cortex. Cereb Cortex 28, 1610-1624. 10.1093/cercor/bhx057.

      (3) In Figure 1, an example of increased spontaneous and evoked firing activity of single neurons after HFS is provided. Yet it is surprising that the group data are analyzed only for the fEPSP. It seems that single-neuron data would also be useful at this point to provide insight into how CCK and HFS affect temporal processing and spontaneous activity/excitability, especially given the example in 1F.

      Thank you for your insightful comment. In our in vivo electrophysiological experiments on LTP induction, we recorded neural activity for over 1.5 hours to assess changes in neuronal responses over time, both prior to and following the induction. While single neuron firing data can provide valuable insights, such measurements are inherently more variable due to factors like cortical state fluctuations and the condition of nearby neurons, which makes them less reliable for long-term analysis. For this reason, we focused on fEPSP, as it offers a more stable and robust readout of synaptic activity over extended periods.

      We appreciate your suggestion and recognize the value of single-neuron data in understanding how CCK and HFS affect temporal processing and excitability. In future studies, we will consider incorporating single-neuron analyses to complement our synaptic-level findings and provide a more comprehensive understanding of these mechanisms.

      (4) The authors mention that CCK mRNA was absent in CCK-KO mice, but the data are not provided.

      Thank you for your comment. Data from the CCK-KO mice are presented in Figure 3A (far right) and in the upper panel of Figure 3B (far right). In the lower panel of Figure 3B, data from the CCK-KO group are not shown because the normalized values for this group were essentially zero, as expected due to the absence of CCK mRNA.

      (5) The circuitry that determines PPI requires multiple brain areas, including the auditory cortex. Given the complicated dynamics of this process, it may be helpful to consider what, if anything, is known specifically about how layer IV synaptic plasticity in the auditory cortex may shape this behavior.

      Thank you for raising this important point. Pre-pulse inhibition (PPI) of the acoustic startle response indeed involves multiple brain regions, with the ascending auditory pathway playing a key role (Gómez-Nieto et al., 2020). Within the auditory cortex, layer IV neurons receive tonotopically organized inputs from the medial geniculate nucleus and are critical for integrating thalamic inputs and shaping auditory processing.

      In our behavioral experiments, mice were required to discriminate pre-pulses of varying frequencies against a continuous background sound. Given the role of auditory cortical neurons in integrating thalamic inputs and shaping auditory processing, it is likely that synaptic plasticity in these neurons contributes to the enhanced discrimination of pre-pulses. Supporting this idea, our previous work demonstrated that local infusion of CCK, paired with weak acoustic stimuli, significantly increased auditory responses in the auditory cortex (Li et al., 2014). In the current study, we further showed that CCK release during high-frequency stimulation of the thalamocortical pathway induced LTP in layer IV of the auditory cortex. Together, these findings suggest that CCK-dependent synaptic plasticity in layer IV may amplify the cortical representation of weak auditory inputs, thereby improving pre-pulse detection and enhancing PPI performance.

      It is also worth noting that aged mice with hearing loss typically exhibit PPI deficits due to impaired auditory processing (Ouagazzal et al., 2006; Young et al., 2010). We propose that enhanced plasticity in the thalamocortical pathway, mediated by CCK, might partially compensate for these deficits by amplifying residual auditory signals in aged mice. However, the precise mechanisms by which layer IV synaptic plasticity modulates PPI behavior remain to be fully understood. Given the complex dynamics of sensory processing, future studies could explore how layer IV neurons interact with other cortical and subcortical circuits involved in PPI, as well as the specific contributions of excitatory and inhibitory cell types. These investigations will help provide a more comprehensive understanding of the role of CCK in modulating sensory gating and auditory processing.

      Gómez-Nieto, R., Hormigo, S., & López, D. E. (2020). Prepulse inhibition of the auditory startle reflex assessment as a hallmark of brainstem sensorimotor gating mechanisms. Brain sciences, 10(9), 639.

      Li, X., Yu, K., Zhang, Z., Sun, W., Yang, Z., Feng, J., Chen, X., Liu, C.-H., Wang, H., Guo, Y.P., and He, J. (2014). Cholecystokinin from the entorhinal cortex enables neural plasticity in the auditory cortex. Cell Research 24, 307-330. 10.1038/cr.2013.164.

      Ouagazzal, A. M., Reiss, D., & Romand, R. (2006). Effects of age-related hearing loss on startle reflex and prepulse inhibition in mice on pure and mixed C57BL and 129 genetic background. Behavioural brain research, 172(2), 307-315.

      Young, J. W., Wallace, C. K., Geyer, M. A., & Risbrough, V. B. (2010). Age-associated improvements in cross-modal prepulse inhibition in mice. Behavioral neuroscience, 124(1), 133.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) In Figure 1, the authors used different metrics for fEPSP strength. In Figure 1D, the authors used the slope, while they used the amplitude in Figure 1G. It is known that the two metrics are different from each other. While the slope is calculated from the linear regression between the voltage change per time of the rising phase of the fEPSP, the amplitude represents the voltage value of the fEPSP's peak. Please clarify here and in the method what metric you used, because the two terms are not interchangeable.

      Thank you for pointing out this oversight in our manuscript. We confirm that we used the slope of the fEPSP as the metric for assessing synaptic strength throughout the study, including both Figure 1D and Figure 1G. We will make the necessary corrections to ensure clarity and consistency. Thank you for bringing this to our attention.
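
      The distinction the reviewer draws between the two metrics can be illustrated with a minimal sketch. It assumes a single fEPSP trace sampled as paired time and voltage lists; the function names, the 20-80% rising-phase window, and the synthetic trace are illustrative assumptions and are not taken from the manuscript's analysis pipeline.

```python
def fepsp_amplitude(voltages_mv):
    """Return the peak deflection: the sample with the largest absolute voltage."""
    return max(voltages_mv, key=abs)


def fepsp_slope(times_ms, voltages_mv, lo=0.2, hi=0.8):
    """Least-squares slope (mV/ms) of the rising phase.

    The rising phase is taken as the samples up to the peak whose absolute
    voltage lies between `lo` and `hi` fractions of the peak (an assumed,
    commonly used window, not the manuscript's stated choice).
    """
    times_ms = list(times_ms)
    voltages_mv = list(voltages_mv)
    peak = fepsp_amplitude(voltages_mv)
    peak_idx = voltages_mv.index(peak)
    window = [
        (t, v)
        for t, v in zip(times_ms[: peak_idx + 1], voltages_mv[: peak_idx + 1])
        if lo * abs(peak) <= abs(v) <= hi * abs(peak)
    ]
    if len(window) < 2:
        raise ValueError("need at least two samples in the rising phase")
    n = len(window)
    mean_t = sum(t for t, _ in window) / n
    mean_v = sum(v for _, v in window) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in window)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in window)
    return sxy / sxx


# Synthetic, negative-going trace (hypothetical values, for illustration only).
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
trace = [0.0, -0.25, -0.5, -0.75, -1.0, -0.6, -0.3]
print("amplitude (mV):", fepsp_amplitude(trace))
print("slope (mV/ms):", fepsp_slope(times, trace))
```

      Two traces can share the same peak amplitude yet differ in slope if one rises faster than the other, which is why the two terms should not be used interchangeably.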

      (2) The methods do not give details about the CCK-KO mice. Please provide them. Although the authors used the CCK-KO mouse model as a control, I think that it is not a good choice to test the hypothesis mentioned in lines 165 and 166. The experiment was supposed to monitor the CCK-BR activity after HFS of the MGB and answer whether the CCK-BR will get activated by thalamic stimulation, but the CCK-KO mouse does not have CCK to be released after the optogenetic activation of the Chrimson probe. Therefore, it is expected to give nothing, as if the experimenter ran an experiment without intervention. I think that the appropriate way to examine the hypothesis is to compare mice that were injected with either AAV9-Syn-FLEX-ChrimsonR-tdTomato or AAV9-Syn-FLEX-tdTomato. However, CCK-KO would be a perfect model to confirm that LTP can be generated only in a CCK-dependent manner, by simply running the HFS of the MGB together with cortical recording of the fEPSP. This would also rule out the assumption that the authors mentioned in lines 191 and 192.

      Thank you for your valuable feedback. The rationale behind our experimental design was to validate the newly developed CCK sensor and confirm its specificity. We aimed to verify CCK release post-HFS by comparing the responses of the CCK sensor in CCK-KO mice and CCK-Cre mice. This comparison allowed us to determine that the observed increase in fluorescence intensity post-HFS was specifically due to CCK release, rather than other neurotransmitters induced by HFS.

      We appreciate your suggestion to compare mice injected with AAV9-Syn-FLEX-ChrimsonR-tdTomato and AAV9-Syn-FLEX-tdTomato, as it is indeed a valuable approach for directly testing the hypothesis regarding CCK-BR activation. However, we prioritized using the CCK-KO model to validate the CCK sensor's efficacy and specificity. The validation can be inferred by comparing the CCK sensor activity before and after HFS.

      Regarding concerns mentioned in lines 191 and 192 about potential CCK release from other projections via indirect polysynaptic activation, CCK-KO mice were not suitable for this aspect due to their global knockout of CCK. To address this limitation, we utilized shRNA to specifically down-regulate Cck expression in MGB neurons. This approach focused on the necessity of CCK released from thalamocortical projections for the observed LTP and effectively ruled out the possibility of indirect polysynaptic activation.

      We also acknowledge that the methods section lacked sufficient details about the CCK-KO mice, which may have caused confusion. In the revised methods section, we will add the following details:

      (1) The genotype of the CCK-KO mice used in this study (CCK-ires-CreERT2, Jax#012710).

      (2) A brief description of the CCK-KO validation, emphasizing the absence of CCK mRNA in these mice (as shown in Figure 3A and 3B).

      (3) The experimental purpose of using CCK-KO mice to validate the specificity of the CCK sensor.

      We believe these additions will clarify the rationale for using CCK-KO mice and their role in this study. Thank you again for highlighting these important points.

      (3) Figure 3C: The authors should examine if there is a difference in the baseline of fEPSPs across different age groups as the dependence on the normalization in the analysis within each group would hide if there were any difference of the baseline slope of fEPSP between groups which could be related to any misleading difference after HFS. Also, I wonder about the absence of LTP in P20, which is a closer age to the critical period. Could the authors discuss that, please?

      Thank you for your insightful feedback. To address your concern regarding baseline differences in fEPSP slopes across age groups, we conducted additional analysis. Baseline fEPSP slopes across the three groups (P20, 8w, 18m), normalized to the 8w group, were 64.8 ± 13.1%, 100.0 ± 20.4%, and 58.8 ± 10.3%, respectively. While there was a trend suggesting smaller fEPSP slopes in the P20 and 18m groups compared to the young adult group, these differences were not statistically significant due to data variability (P20 vs. 8w, P = 0.319; 8w vs. 18m, P = 0.147; P20 vs. 18m, P = 1.0, one-way ANOVA). These results suggest that baseline variability is unlikely to confound the observed differences in LTP after HFS. Furthermore, we ensured that normalization minimized any potential baseline effects.
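
      The normalization described above can be sketched as follows. This is a minimal illustration, assuming that each group's baseline slopes are expressed as a percentage of the reference (8w) group mean and that the reported spread is the standard error of the mean, which the response does not state. The function name and all numeric values are hypothetical and are not the study's data.

```python
from math import sqrt
from statistics import mean, stdev


def normalize_to_reference(groups, reference):
    """Express each group's values as % of the reference group's mean.

    Returns {group: (mean_percent, sem_percent)}. SEM is an assumed choice
    of spread; a group with a single value gets NaN for its SEM.
    """
    ref_mean = mean(groups[reference])
    summary = {}
    for name, values in groups.items():
        percent = [100.0 * v / ref_mean for v in values]
        sem = stdev(percent) / sqrt(len(percent)) if len(percent) > 1 else float("nan")
        summary[name] = (mean(percent), sem)
    return summary


# Hypothetical baseline slopes (arbitrary units), for illustration only.
baseline = {
    "P20": [0.10, 0.14],
    "8w": [0.15, 0.25],
    "18m": [0.12, 0.08],
}
for group, (m, sem) in normalize_to_reference(baseline, "8w").items():
    print(f"{group}: {m:.1f} ± {sem:.1f}%")
```

      By construction the reference group has a mean of 100%, so between-group baseline differences remain visible, whereas within-group normalization to each group's own baseline would hide them.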

      Regarding the absence of LTP in P20, this likely reflects developmental regulation of CCKBR expression in the auditory cortex (ACx). The HFS-induced thalamocortical LTP observed in our study is CCK-dependent and mechanistically distinct from the NMDA-dependent thalamocortical LTP during the critical period. Specifically, correlated pre- and postsynaptic activity can induce NMDA-dependent thalamocortical LTP only during an early critical period corresponding to the first several postnatal days, after which this pairing becomes ineffective starting from the second postnatal week (Crair and Malenka, 1995; Isaac et al., 1997; Chun et al., 2013). In contrast, the CCK-dependent thalamocortical LTP induced by HFS is robust in adult mice but appears absent in P20, likely due to the lack of postsynaptic CCKBR expression in the ACx at this developmental stage.

      We will include these clarifications in the revised manuscript, particularly in the Discussion section, to provide a more comprehensive explanation of our findings. Thank you for your valuable comments and suggestions.

      Crair, M.C., and Malenka, R.C. (1995). A critical period for long-term potentiation at thalamocortical synapses. Nature 375, 325-328. 10.1038/375325a0.

      Isaac, J.T.R., Crair, M.C., Nicoll, R.A., and Malenka, R.C. (1997). Silent Synapses during Development of Thalamocortical Inputs. Neuron 18, 269-280. https://doi.org/10.1016/S0896-6273(00)80267-6.

      Chun, S., Bayazitov, I.T., Blundon, J.A., and Zakharenko, S.S. (2013). Thalamocortical Long-Term Potentiation Becomes Gated after the Early Critical Period in the Auditory Cortex. The Journal of Neuroscience 33, 7345-7357. 10.1523/jneurosci.4500-12.2013.

      (4) Figure 4F: It is noticed that the baseline fEPSP of the CCK group and ACSF groups were different, which raises a concern about the baseline differences between treatment groups.

      Thank you for your valuable feedback and for pointing out this important detail. We apologize for any confusion caused by the presentation of the data. As noted in the figure legend, the scale bars for the fEPSPs were different between the left (0.1 mV) and right panels (20 µV). This difference in scale may have created the perception of baseline differences between the CCK and ACSF groups. To enhance clarity and avoid potential misunderstanding, we will unify the scale bar values in the revised figure. This adjustment will provide a clearer and more accurate comparison of fEPSPs between groups. Thank you again for bringing this issue to our attention.

      (5) From Figure S2D, it seems that different animals were injected with the drug and ACSF. Therefore, how did the authors validate the position of the recording electrode relative to the cortical area of a certain CF and the corresponding EF? Also, there is not enough information about the basis for the selection of the EF. Should it be lower than the CF by a certain value? Was the EF determined after the initial tuning curve in each case? To mitigate this difference, it would be appropriate if the authors examined whether there is a significant difference in the tuning width and CFs between animals exposed to ACSF and CCK-4. This will give some validation of a balanced experiment between ACSF and CCK-4. I also wonder why the authors used rats here and not mice, as it would be easier to interpret results that come from the same species.

      Thank you for your thoughtful comments. The effective frequency (EF) was determined after measuring the initial tuning curve for each case. The EF was selected to elicit a clear sound response while maintaining a sufficient distance from the characteristic frequency (CF) to allow measurable increases in response intensity. Specifically, EF was selected based on the starting point of the tuning peak, which corresponds to the onset of its fastest rising phase. From this point, EF was determined by moving 0.2 or 0.4 octaves toward the CF. While there were individual differences in EF selection among animals, the methodology for determining EF was standardized and applied consistently across both the ACSF and CCK-4 groups.

      Regarding the use of rats in these experiments, these studies were conducted prior to our current work with mice. The findings in rats provide valuable insights that support our current results in mice. Since the rat data are supplementary to the primary findings, we included them as supplementary material to provide additional context and validation. Furthermore, in consideration of animal welfare, we chose not to replicate these experiments in mice, as the findings from rats were sufficient to support our conclusions.

      Methods section:

      “The tuning curve was determined by plotting the lowest intensity at which the neuron responded to different tones. The characteristic frequency (CF) is defined as the frequency corresponding to the lowest point on this curve. The effective frequency (EF) was determined to elicit a clear sound response while maintaining a sufficient distance from the CF to allow measurable increases in response intensity. Specifically, EF was selected based on the starting point of the tuning peak, which corresponds to the onset of its fastest rising phase. From this point, EF was determined by moving 0.2 or 0.4 octaves toward the CF.”
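
      The CF and EF definitions quoted above can be illustrated with a short sketch. It assumes the tuning curve is stored as a mapping from tone frequency to response threshold, and that the starting point of the tuning peak has already been identified by the experimenter; how that onset is detected is not specified here. The function names and all numeric values are hypothetical.

```python
def characteristic_frequency(tuning_curve):
    """CF: the frequency with the lowest response threshold.

    `tuning_curve` maps frequency (Hz) to threshold intensity (dB SPL).
    """
    return min(tuning_curve, key=tuning_curve.get)


def shift_toward_cf(start_hz, cf_hz, octaves):
    """Move `octaves` from `start_hz` toward `cf_hz` on a log2 frequency axis.

    One octave is a factor of two, so a shift of k octaves multiplies the
    frequency by 2**k (upward) or 2**-k (downward).
    """
    if start_hz == cf_hz:
        return float(start_hz)
    direction = 1.0 if cf_hz > start_hz else -1.0
    return start_hz * 2.0 ** (direction * octaves)


# Hypothetical tuning curve and peak onset, for illustration only.
curve = {4000: 60, 8000: 40, 16000: 20, 32000: 50}
cf = characteristic_frequency(curve)
peak_start_hz = 8000  # assumed to be read off the tuning curve by eye
for step in (0.2, 0.4):
    print(f"EF at {step} octaves toward CF: {shift_toward_cf(peak_start_hz, cf, step):.0f} Hz")
```

      Because the shift is defined in octaves rather than in Hz, the same step corresponds to a larger absolute frequency change at higher starting frequencies.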

      (6) Lines 384-386: There are no figures named 5H and I.

      Thank you for pointing this out. The references to Figures 5H and 5I were incorrect and should have referred to Figures 5C and 5D. We sincerely apologize for this oversight and will correct these errors in the revised manuscript to ensure clarity and accuracy. Thank you again for bringing this to our attention.

      (7) The authors should mention the sex of the animals used.

      Thank you for your comment and for highlighting this important detail. The sex of the animals used in this study is specified in the Animals section of the Methods: "In the present study, male mice and rats were used to investigate thalamocortical LTP." We appreciate your careful attention to this point and will ensure that this detail remains clearly stated in the manuscript.

      (8) Lines 534 and 648: These coordinates are difficult to understand. Since the experiment was done on both mice and rats, we need a clear description of the coordinates in both. Also, I think that you should mention the lateral distance from the sagittal suture as the ventral coordinates should be calculated from the surface of the skull above the AC and not from the sagittal suture.

      Thank you for your valuable feedback and for pointing out this important issue. We apologize for any confusion caused by our description of the coordinates. The term “ventral” was deliberately used because the auditory cortex is located on the lateral side of the skull, which may have caused some misunderstanding.

      To provide a clearer and more accurate description of the coordinates, we will revise the text in the manuscript as follows: “A craniotomy was performed at the temporal bone (-2 to -4 mm posterior and -1.5 to -3 mm ventral to bregma for mice; -3.0 to -5.0 mm posterior and -2.5 to -6.5 mm ventral to bregma for rats) to access the auditory cortex.”

      We appreciate your attention to these details and will ensure that the revised manuscript includes this clarification to improve accuracy and eliminate potential confusion. Thank you again for bringing this to our attention.

      (9) Line 536: The author should specify that these coordinates are for the experiment done on mice.

      Thank you for your valuable feedback. We will revise the manuscript to explicitly specify that these coordinates refer to the experiments conducted on mice. This clarification will help improve the clarity and precision of the manuscript. We greatly appreciate your attention to this point and your effort to enhance the quality of our work.

      Methods section:

      “and a hole was drilled in the skull according to the coordinates of the ventral division of the MGB (MGv, AP: -3.2 mm, ML: 2.1 mm, DV: 3.0 mm) for experiments conducted on mice.”

      (10) Line 590: Please add the specifications of the stimulating electrode. Is it unipolar or bipolar? What is the cat.# provided by FHC?

      Thank you for your valuable feedback. The electrodes used in the experiments are unipolar. We will include the catalog number provided by FHC in the revised manuscript for clarity. The revised text will be updated as follows:

      “In HFS-induced thalamocortical LTP experiments, two customized microelectrode arrays with four tungsten unipolar electrodes each, impedance: 0.5-1.0 MΩ (recording: CAT.# UEWSFGSECNND, FHC, U.S.), and 200-500 kΩ (stimulating: CAT.# UEWSDGSEBNND, FHC, U.S.), were used for the auditory cortical neuronal activity recording and MGB ES, respectively.”

      We appreciate your attention to this detail, and we will ensure that the revised manuscript reflects this clarification accurately.

      (11) Lines 612-614: There are no details of how the optic fiber was inserted or post-examined. If there is a word limitation, the authors may reference another study showing these procedures.

      Thank you for your insightful comment and for highlighting this important aspect of the methodology. To address this, we will reference the study by Sun et al. (2024) in the revised manuscript, which provides detailed procedures for optic fiber insertion and post-examination. We believe that this reference will help enhance the clarity and completeness of the methods section.

      Sun, W., Wu, H., Peng, Y., Zheng, X., Li, J., Zeng, D., Tang, P., Zhao, M., Feng, H., Li, H., et al. (2024). Heterosynaptic plasticity of the visuo-auditory projection requires cholecystokinin released from entorhinal cortex afferents. eLife 13, e83356. 10.7554/eLife.83356.

      We appreciate your valuable suggestion, which will contribute to improving the quality of the manuscript.

      Minor concerns:

      (1) The definition of HFS was repeated many times throughout the manuscript. Please mention the defined name for the first time in the manuscript only followed by its abbreviation (HFS).

      Thank you for your suggestion and for pointing out this important detail. We will revise the manuscript to ensure that all abbreviations are defined only upon their first mention in the manuscript, with subsequent mentions using the abbreviations consistently. We appreciate your careful attention to detail and your effort to help improve the manuscript.

      (2) Line 173: There is a difference between here and the methods section (620 nm here and 635 nm there) please correct which wavelength the authors used.

      Thank you for your careful review and for bringing this discrepancy to our attention. We have corrected the inconsistency, and the wavelength has been unified throughout the manuscript to ensure accuracy and clarity. The revised text now reads as follows:

      “The fluorescent signal was monitored for 25s before and 60s after the HFLS (5~10 mW, 620 nm) or HFS application.”

      We appreciate your valuable feedback, which has helped us improve the precision and consistency of the manuscript.

      (3) Line 185: I think the authors should refer to Figure 2G before mentioning the statistical results.

      Thank you for your careful review and for pointing out this oversight. We have now added a reference to Figure 2G at the appropriate location to ensure clarity and logical flow in the manuscript, as recommended.

      (4) Line 202: I think the authors should refer to Figure 2J before mentioning the statistical results.

      Thank you again for your careful review and for highlighting this point. We have revised the manuscript to include a reference to Figure 2J before mentioning the statistical results.

      We appreciate your valuable feedback, which has helped us improve the accuracy and presentation of the results.

      (5) Line 260: Please add appropriate references at the end of the sentence to support the argument.

      Thank you for your valuable suggestion. To address this, we have added appropriate references to support the statement regarding the multiple steps involved between mRNA expression and neuropeptide release. Additionally, we have revised the statement to adopt a more cautious interpretation. The revised text is as follows:

      “It is widely recognized that mRNA levels do not always directly correlate with peptide levels due to multiple steps involved in peptide synthesis and processing, including translation, post-translational modifications, packaging, transportation, and proteolytic cleavage, all of which require various enzymes and regulatory mechanisms (38-41). A disruption at any stage in this process could lead to impaired CCK release, even when Cck mRNA is present.”

      We have included the following references to support this statement:

      38. Mierke, C.T. (2020). Translation and Post-translational Modifications in Protein Biosynthesis. In Cellular Mechanics and Biophysics: Structure and Function of Basic Cellular Components Regulating Cell Mechanics, C.T. Mierke, ed. (Springer International Publishing), pp. 595-665. 10.1007/978-3-030-58532-7_14.

      39. Gualillo, O., Lago, F., Casanueva, F.F., and Dieguez, C. (2006). One ancestor, several peptides post-translational modifications of preproghrelin generate several peptides with antithetical effects. Mol Cell Endocrinol 256, 1-8. 10.1016/j.mce.2006.05.007.

      40. Sossin, W.S., Fisher, J.M., and Scheller, R.H. (1989). Cellular and molecular biology of neuropeptide processing and packaging. Neuron 2, 1407-1417. https://doi.org/10.1016/0896-6273(89)90186-4.

      41. Hook, V., Funkelstein, L., Lu, D., Bark, S., Wegrzyn, J., and Hwang, S.R. (2008). Proteases for processing proneuropeptides into peptide neurotransmitters and hormones. Annu Rev Pharmacol Toxicol 48, 393-423. 10.1146/annurev.pharmtox.48.113006.094812.

      We greatly appreciate your helpful feedback, which has allowed us to improve both the accuracy and the depth of discussion in the manuscript.

      (6) Line 278: The authors mentioned "due to the absence of CCK in aged animals", which was not an appropriate description. It should be a reduction of CCK gene expression or a possible deficient CCK release.

      Thank you for your careful review and for pointing out the inaccuracy in our description. We agree with your suggestion and have revised the statement to more appropriately reflect the findings.

      “Our findings revealed that thalamocortical LTP cannot be induced in aged mice, likely due to insufficient CCK release, despite intact CCKBR expression.”

      This revision ensures a more accurate and precise description of the potential mechanisms underlying the observed phenomenon. We greatly appreciate your valuable feedback, which has helped us improve the clarity and accuracy of the manuscript.

      (7) Line 291: The authors mentioned that "without MGB stimulation", which is confusing. The MGB was stimulated with a single electrical pulse to evoke cortical fEPSPs. Therefore it should be "without HFS of MGB".

      Thank you for pointing this out and for highlighting the potential confusion caused by our original phrasing. Upon review, we recognize that our original phrasing "without MGB stimulation" may have been unclear and could have led to misinterpretation. To clarify, our intention was to describe the period during which CCK was present without any stimulation of the MGB.

      It is important to note that, in the presence of CCK, LTP can be induced even with low-frequency stimulation, including in aged mice. This observation underscores the potent effect of CCK in facilitating thalamocortical LTP, regardless of the specific stimulation protocol used.

      To address this issue, we have revised the sentence for improved clarity as follows:

      "To investigate whether CCK alone is sufficient to induce thalamocortical LTP without activating thalamocortical projections, we infused CCK-4 into the ACx of young adult mice immediately after baseline fEPSP recording. Stimulation was then paused for 15 min to allow for CCK degradation, after which recording resumed."

      We believe this revision resolves the misunderstanding and provides a clearer and more accurate description of the experimental context. We greatly appreciate your insightful feedback, which has helped us refine the manuscript for clarity and precision.

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      (1) Line 99, 134, possibly other locations: "site" to "sites".

      Thank you for your careful review. We appreciate your attention to detail and have made the necessary corrections in the manuscript.

      (2) Throughout the manuscript there are some minor issues with language choice and subtle phrasing errors and I suggest English language editing.

      Thank you for your suggestion. In response, we have thoroughly reviewed the manuscript and addressed issues related to language choice and phrasing. The text has been carefully edited to ensure clarity, precision, and consistency. We believe these revisions have significantly enhanced the overall quality of the manuscript. We greatly appreciate your feedback, which has been invaluable in improving the presentation of our work.

      (3) Based on the experimental configurations, I do not think it is a problematic caveat, but authors should be aware of the high likelihood of AAV9 jumping synapses relative to other AAV serotypes.

      Thank you for bringing up the potential of AAV9 crossing synapses, a recognized characteristic of this serotype. We appreciate your observation regarding its relevance to our experimental design. In our study, we carefully considered the possibility of trans-synaptic transfer during both the experimental design and data interpretation phases. To minimize the likelihood of significant trans-synaptic spread, we implemented several measures, including controlling the injection volume, using a slow injection rate, and limiting the viral expression time. Post-hoc histological analyses confirmed that the expression of AAV9 was largely confined to the intended regions, with limited evidence of synaptic jumping under our experimental conditions.

      While we acknowledge the inherent potential for AAV9 to cross synapses, we believe this effect does not substantially confound the interpretation of our findings in the current study. To address this concern, we have added a brief discussion on this point in the revised manuscript to enhance clarity. We greatly appreciate your insightful comment, which has helped us further refine our work.

      Discussion section:

      “ One potential limitation of our study is the trans-synaptic transfer property of AAV9. To mitigate this, we carefully controlled the injection volume, rate, and viral expression time, and conducted post-hoc histological analyses to minimize off-target effects, thereby reducing the likelihood of trans-synaptic transfer confounding the interpretation of our findings.”

      (4) The trace identifiers (1-4) do not seem correctly placed/colored in Figure S1D. Please check others carefully.

      Thank you for your careful review and for bringing this issue to our attention. We have corrected the trace identifiers in Figure S1D. Additionally, we have carefully reviewed all other figures to ensure their accuracy and consistency. We greatly appreciate your attention to detail, which has helped improve the overall quality of the manuscript.

      (5) Please provide a value of the laser power range based on calibrated values.

      Thank you for your suggestion. We have included the calibrated laser power range in the revised manuscript as follows:

      “The laser stimulation was produced by a laser generator (5-20 mW(30), Wavelength: 473 nm, 620 nm; CNI laser, China) controlled by an RX6 system and delivered to the brain via an optic fiber (Thorlabs, U.S.) connected to the generator.”

      We appreciate your feedback, which has helped improve the clarity and precision of our methodological description.

      (6) It would be useful to annotate figures in a way that identifies in which transgenic mice experiments are being performed.

      Thank you for your valuable suggestion. We will add annotations to the figures to explicitly identify the type of mice used in each experiment. We believe this enhancement will improve the clarity and accessibility of our results. We greatly appreciate your input in making our manuscript more informative.

      (7) Please comment on the rigor you use to address the accuracy of viral injections. How often did they spread outside of the MGB/AC?

      Thank you for raising this important question regarding the accuracy of viral injections and the potential spread outside the MGB or AC. Below, we provide details for each set of experiments:

      shRNA Experiments:

      For the shRNA experiments targeting the MGB, our primary goal was to achieve comprehensive coverage of the entire MGB. To this end, we used larger injection volumes and multiple injection sites, which inevitably resulted in some viral spread beyond the MGB. However, this approach was necessary to ensure robust knockdown effects that were representative of the entire MGB. While strict confinement to specific subregions could not be guaranteed, this strategy allowed us to prioritize the effectiveness of the knockdown within the target region.

      Fiber photometry Experiments:

      For the fiber photometry experiments targeting the auditory cortex (AC), we used larger injection volumes and multiple injection sites to cover its relatively large size. Although this approach might have resulted in some CCK-sensor virus spread outside the AC, the placement of the optic fiber was guided by the location of the auditory cortex. Consequently, any minor viral expression outside the AC would not affect the experimental results, as recordings were confined to the intended area through precise fiber placement.  

      Optogenetic Experiments:

      For the optogenetic experiments targeting the MGB, we specifically injected virus into the MGv subregion. To minimize viral spread, we employed several strategies, including using fine injection needles, waiting for tissue stabilization (7 minutes post-needle insertion), delivering small volumes at a slow rate to prevent backflow, aspirating 5 nL of the solution post-injection, and raising the needle by 100 μm before waiting an additional 5 minutes prior to full retraction. These measures significantly reduced the risk of viral leakage to adjacent regions.

      Histological Validation:

      After the electrophysiological experiments, we systematically verified the accuracy of viral expression by examining histological sections to ensure that the expression was primarily localized within the intended regions.

      Terminology in the Manuscript:

      We deliberately used the term "MGB" in the manuscript rather than specifically "MGv" to transparently acknowledge the potential for viral spread in some experiments.

      We hope this explanation clarifies the strategies we employed to address the accuracy of viral injections, as well as how we managed potential viral spread. We have also added brief information to the revised manuscript to reflect these points and acknowledge the inherent variability in viral delivery.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements

      We thank the reviewers for their thoughtful and detailed feedback, which we found highly constructive and encouraging. The comments have been invaluable in guiding improvements to the clarity, rigor, and impact of our manuscript. Below, we provide our responses and outline the specific revisions we plan to make in response to each point raised. It was extremely encouraging that all the comments were highly relevant to the study, reflecting careful work by experts in the field, and they truly help to improve the clarity and message of the manuscript.

      2. Description of the planned revisions


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Gizaw et al characterizes the cholesterol biosynthetic pathway and the effect of its knockdown or inhibition on rhabdomyosarcoma tumor properties. The Authors find that the PROX1 transcription factor mediated cholesterol biosynthesis regulates rhabdomyosarcoma cell growth and proliferation. Blocking the cholesterol biosynthetic pathway leads to reduced proliferation, cell cycle arrest and ER-stress mediated enhanced apoptosis. Detailed transcriptomic analyses indicate gene expression patterns that support these findings.

      Reviewer #1 (Significance (Required)):

      Based on my expertise on rhabdomyosarcoma tumors, the manuscript is clear, concise and provides a significant advance to the field. Detailed mechanistic characterization is lacking, which takes away some of the significance of the findings, but the work done stands alone as description of the effect of the cholesterol biosynthetic pathway in rhabdomyosarcoma. Another aspect to be considered by the Authors is the potential specificity of targeting a ubiquitous pathway such as cholesterol biosynthesis, which is important to most cells and not only cancer cells. Overall, the manuscript may be revised to address the specific comments below.

      Responses to Reviewer #1 comments

      We thank the reviewer for the thoughtful and encouraging comments on our manuscript. We appreciate the recognition of the significance of our findings and the detailed suggestions provided. We are committed to addressing each of the reviewer's points to strengthen the manuscript and ensure clarity and rigor. Below, we outline how we plan to address each comment.

      Major Comments:

      1. __ Details of the healthy human myoblasts that are used in Figure 1A are not provided and should be updated. Evidence of PROX1 knockdown should be presented. What kind of pathways and gene ontology predictions were associated with the 225 genes that are commonly downregulated between all three cell lines in Figure 1A?__

      Response: In the revised manuscript, we will include complete information regarding the origin and characterization of the healthy human myoblasts used in Figure 1A. We will also provide additional data confirming PROX1 knockdown. Furthermore, we will present more details on the gene ontology (GO) and pathway enrichment analyses, and include the full results as supplemental data to highlight key biological processes affected by PROX1 silencing.

      __ In Figure 2, while the effect of the shRNAs targeting DHCR7 or the DHCR7 inhibitor AY9944 are striking, it is not clear whether these effects are specific to rhabdomyosarcoma cells or cancer cells. A control, human myoblast cell line or another non-cancerous cell line should be used to repeat these experiments quantifying Caspase3/7 activity, cell growth etc. to assess the cancer cell specificity of such treatments. Evidence of DHCR7 knockdown at the protein level would add to the study.__


      Response: We fully agree with the reviewer's suggestion and will conduct additional experiments using non-cancerous human myoblasts to assess the specificity of DHCR7 inhibition. These will include assays for Caspase 3/7 activation, cell viability, and proliferation under similar conditions. We have already performed western blot validation of DHCR7 knockdown at the protein level in RMS cell lines and will include this data in the manuscript. We will also highlight in the discussion that RMS cells in our experiments were highly vulnerable when cultured with full media (incl. FBS), whereas previous studies with breast cancer cells have shown that their growth is affected by cholesterol biosynthesis inhibition only if they are cultured without serum (containing cholesterol). We also show that cholesterol supplementation does not rescue RMS cells demonstrating the essential role of de novo cholesterol synthesis.

      __ Western blots for Caspase3 quantification and a cell proliferation marker such as Cyclin D in shSCR and shDHCR7 tumor lysates would validate the data shown in the Figure 3. Are the shRNA constructs used inducible ones? If not, how do the Authors distinguish the effect of shDHCR7 on tumor engraftment versus tumor proliferation and growth? Many of the graphs need proper labeling of the axes and what the bars represent.__


      Response: We will include western blot analysis for cleaved Caspase 3 and Cyclin D1 in tumor lysates to support the observed effects on apoptosis and proliferation. We will clarify in the revised manuscript that the shRNA constructs used were constitutive. To distinguish between effects on tumor engraftment versus tumor growth, we will provide additional detail on how we controlled for initial cell viability and engraftment potential prior to injection. We will also revise figure panels to ensure all axes and error bars are clearly labeled.

      __ Gene ontology and pathway analysis will add to Figure 4.__


      Response: We will expand Figure 4 to include GO and pathway enrichment analyses of the RNA-seq data following DHCR7 knockdown. This will help illustrate the functional significance of the transcriptional changes and further support our conclusions regarding ER stress, apoptosis, and cell cycle regulation.

      __ In Figure 5A, how do the Authors explain the upregulation of cholesterol biosynthetic pathway genes upon shDHCR7 treatment? Are these effects seen at the protein level and if alternate pathways maintain cholesterol biosynthesis, how do the Authors think this strategy will be viable to treat such tumors? In Figure 5G-H, was a loading control used? If so, blots for that should be included.__


      Response: We will expand the discussion to address the compensatory transcriptional upregulation of cholesterol biosynthesis genes following DHCR7 knockdown, likely driven by SREBP-mediated feedback regulation. To support this, we will include western blot data for key enzymes in the pathway. We will also clarify that despite this transcriptional compensation, functional cholesterol synthesis is impaired due to DHCR7 silencing, which cannot be rescued by increased upstream pathway activity. Regarding Figure 5G-H, we will include the missing loading control images in the revised version. Protein normalization was performed using Stain-Free technology, which enables the quantification of total protein in each lane, and was analyzed using ImageLab 6.0.1 software (Bio-Rad). We will include the Stain-Free gel images to demonstrate equal protein loading and will also indicate the molecular weights of the presented proteins in the updated figure legend.

      __ Lines 286-287 refer to Figure S1G, H; it should be corrected to Figure S1I, J.__

      Response: We thank the reviewer for pointing this out. We will correct the figure citation in the revised manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript entitled "Targeting de novo cholesterol synthesis in rhabdomyosarcoma induces cell cycle arrest and triggers apoptosis through ER stress-mediated pathways" Gizaw et al investigate the crucial effect of targeting cholesterol biosynthesis in RMS. While this manuscript gives novel insights into a putative therapeutic approach, there are some comments that should be addressed by the authors.

      Reviewer #2 (Significance (Required)):

      A nice and coherent study. Please see text above.


      Response to Reviewer #2

      We are grateful to the reviewer for the thoughtful and constructive comments on our manuscript. We appreciate your recognition of the novelty and therapeutic potential of our findings, and we thank you for highlighting specific areas that will help further improve the clarity, rigor, and reproducibility of our work. Below, we respond point-by-point to your comments and outline how we plan to address each issue in the revised version of the manuscript.

      Major Comments:

      1. __ The authors demonstrated a correlation between PROX1 levels and the cholesterol synthesis pathway. Which genes from the pathway are mostly affected? The manuscript could benefit from a graphical representation of the pathway showing up- and downregulated genes from the RNA-seq analysis. This will help in understanding why the authors decided to study HMGCR silencing as shown in Supplementary Figure 1A.__

      Response: We fully agree and will include a new graphical figure showing the cholesterol biosynthesis pathway, with up- and downregulated genes from our RNA-seq data visually mapped. This is, indeed, interesting as the whole pathway is consistently downregulated. We chose to study specifically these two rate-limiting genes in the pathway, as DHCR7 is the last enzyme in the mevalonate pathway and its inhibition does not affect other arms deviating from this pathway. It was also recently found to be highly upregulated in pancreatic cancer, suggesting its role in cancer development/growth. HMGCR was chosen as it is the target for statins, which are widely used in treating high cholesterol and shown to be rather safe in clinical use. We will add this rationale to the manuscript to clarify our focus on HMGCR and DHCR7.

      __ Based on the previous comment, are the genes from the cholesterol synthesis identified in the RNA-seq similar to those detected in the publicly available data set presented in Figure 1E? In addition, validation of changes of these genes should be performed in the RMS cell lines as well as in myoblasts.__


      Response: Yes, there is a significant overlap between the cholesterol biosynthesis genes identified in our RNA-seq dataset and those from the public dataset in Figure 1E. In the revised version, we will include this comparative analysis with the inclusion of the schematic figure (see our response #1). We also plan to perform qPCR validation of several key cholesterol biosynthesis genes in additional RMS cell lines and healthy myoblasts to reinforce the disease-specific regulation of this pathway.

      __ In Figure 3, the authors study the impact of DHCR7-silencing in tumor growth in vivo. Please, provide stainings also for DHCR7 to show that cells indeed have silenced DHCR7.__


      Response: Thank you for this important suggestion. We will include immunofluorescence staining for DHCR7 in xenograft tumor sections to confirm DHCR7 knockdown in vivo and visually validate the efficiency of our silencing strategy. We will also add qPCR results from the cells at the time of implantation, confirming the knockdown.

      __ In Figure 4, the RNA-seq data revealed downregulation in E2F genes as well as genes involved in cell cycle progression. It would be important that the authors provide examples of these genes and validate this data by performing qPCR.__


      Response: We will select representative cell cycle-related genes, including members of the E2F family and other G1/S and G2/M regulators, for qPCR validation in RMS cells following DHCR7 knockdown. A comparison to healthy myoblasts will also be performed. This will further substantiate the transcriptomic findings.

      __ In Figure 4J-M, cell cycle distribution using flow cytometry should be assessed in an additional cell line.__


      Response: We will repeat the flow cytometry-based cell cycle analysis in an additional RMS cell line to ensure reproducibility and confirm the generalizability of the observed G2/M arrest phenotype.

      __ In line 271, the authors described that PROX1 is associated with an increase in DHCR7. However, in the next paragraph they evaluated the effect of silencing HMGCR. Is this enzyme also increased? Please clarify.__


      Response: We appreciate the need for clarity. HMGCR expression is also elevated in RMS cells and regulated by PROX1. We will clarify this in the revised manuscript and update the text to explain the rationale behind examining both enzymes: HMGCR as the rate-limiting enzyme at the top of the cholesterol biosynthesis pathway, and DHCR7 as the final step enzyme. See also our response to question #1.

      __ The authors show that cholesterol biosynthesis is crucial in RMS. Would overexpression of DHCR7 in shDHCR7 cells rescue the anti-tumor effects? A rescue experiment would give information on whether this enzyme has a direct role in driving RMS cell behavior.__


      Response: This is an excellent suggestion. We are currently generating a DHCR7 rescue construct and plan to perform these experiments. While these data may not be available in time for the current revision, we will clearly outline this approach as a key next step in our Discussion section and incorporate results if available.

      Minor Comments:

      1. __ In line 287 "Supplementary Fig.1G and 1H" are mentioned, while it should be "Supplementary Fig.1I and 1J" since it regards the treatment with lovastatin.__

      Response: Thank you for catching this. We will correct the figure references accordingly.

      __ In line 340, authors mentioned the data "Supplementary Figure 4A and 4E", but there is not any corresponding data available in the Supplementary Information.__


      Response: We apologize for this oversight. These references will be corrected, and any missing supplementary data will be properly included and labeled.

      __ In the Legend of Figure 2L, authors mention "PRXO-1 silencing", this should be corrected to "shDHCR7". Also, please change "l" to capital "L".__


      Response: This will be corrected in the revised figure legend.

      __ In Figure 5G-H, please provide the data regarding loading control in the Western blot, as well as the molecular weights of the proteins presented.__


      Response: We thank the reviewer for this important point. For the Western blot analysis in Figure 5G-H, normalization was performed by quantifying the total protein in each lane using Bio-Rad's Stain-Free technology and analyzed with ImageLab 6.0.1 software. This approach allows for accurate lane-to-lane comparison without relying on a single housekeeping protein. We will add the Stain-Free total protein images as a supplemental figure (Supplementary Figure) and include the molecular weights for each of the proteins in the figure legend to improve clarity and reproducibility.

      __ Please, include the information of what black, red etc refer to in each figure. This information is missing in several figures including Figure 2D, 2K, 3C, 3J, 3K, 3L which makes it difficult to follow.__


      Response: We agree and will update all relevant figure legends to clearly explain color coding, symbols, and what each bar or line represents to improve figure clarity.

      __ The authors should indicate the numbers of biological replicates in individual experiments throughout whole figure legends.__


      Response: Thank you for the suggestion. We will include the number of biological replicates for each experiment in the figure legends to enhance transparency and reproducibility.


    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study investigates how hearing impairment affects neural encoding of speech, in particular the encoding of hierarchical linguistic information. The current analysis provides incomplete evidence that hearing impairment affects speech processing at multiple levels, since the novel analysis based on HM-LSTM needs further justification. The advantage of this method should also be further explained. The study can also benefit from building a stronger link between neural and behavioral data.

      We sincerely thank the editors and reviewers for their detailed and constructive feedback.

      We have revised the manuscript to address all of the reviewers’ comments and suggestions. The primary strength of our methods lies in the use of the HM-LSTM model, which simultaneously captures linguistic information at multiple levels, ranging from phonemes to sentences. As such, this model can be applied to other questions regarding hierarchical linguistic processing. We acknowledge that our current behavioral results from the intelligibility test may not fully differentiate between the perception of lower-level acoustic/phonetic information and higher-level meaning comprehension. However, it remains unclear what type of behavioral test would effectively address this distinction. We aim to explore this connection further in future studies.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors are attempting to use the internal workings of a language hierarchy model, comprising phonemes, syllables, words, phrases, and sentences, as regressors to predict EEG recorded during listening to speech. They also use standard acoustic features as regressors, such as the overall envelope and the envelopes in log-spaced frequency bands. This is valuable and timely research, including the attempt to show differences between normal-hearing and hearing-impaired people in these regards. I will start with a couple of broader questions/points, and then focus my comments on three aspects of this study: The HM-LSTM language model and its usage, the time windows of relevant EEG analysis, and the usage of ridge regression.

      Firstly, as far as I can tell, the OSF repository of code, data, and stimuli is not accessible without requesting access. This needs to be changed so that reviewers and anybody who wants or needs to can access these materials. 

      It is my understanding that keeping the repository private during the review process and making it public after acceptance is standard practice. As far as I understand, although the OSF repository was private, anyone with the link should have been able to access it. I have now made the repository public.

      What is the quantification of model fit? Does it mean that you generate predicted EEG time series from deconvolved TRFs, and then give the R2 coefficient of determination between the actual EEG and predicted EEG constructed from the convolution of TRFs and regressors? Whether or not this is exactly right, it should be made more explicit.

      Model fit was measured by spatiotemporal cluster permutation tests (Maris & Oostenveld, 2007) on the contrasts of the timecourses of the z-transformed coefficient of determination (R<sup>2</sup>). For instance, to assess whether words from the attended stimuli better predict EEG signals during the mixed speech compared to words from the unattended stimuli, we used the 150-dimensional vectors corresponding to the word layer from our LSTM model for the attended and unattended stimuli as regressors. We then fit these regressors to the EEG signals at 9 time points (spanning -100 ms to 300 ms around the sentence offsets, with 50 ms intervals). We then conducted one-tailed two-sample t-tests to determine whether the differences in the contrasts of the R<sup>2</sup> timecourses were statistically significant. Note that we did not perform TRF analyses. We have clarified this description in the “Spatiotemporal clustering analysis” section of the “Methods and Materials” on p.10 of the manuscript.
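
      The per-latency model fit described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the code from the OSF repository: the closed-form ridge solver, the hold-out split, the regularization strength, and all variable names are assumptions, and the cluster permutation step is omitted.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge weights: (X^T X + alpha * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Nine latencies from -100 ms to 300 ms in 50 ms steps.
latencies_ms = np.arange(-100, 301, 50)

# Synthetic stand-ins (assumptions): 284 sentences, 150-dimensional regressors,
# and one EEG channel sampled at each latency around the sentence offset.
rng = np.random.default_rng(0)
n_sentences, n_features = 284, 150
X = rng.standard_normal((n_sentences, n_features))
eeg = rng.standard_normal((n_sentences, latencies_ms.size))

# Hypothetical hold-out split: fit on the first 200 sentences, score on the rest.
train, test = slice(0, 200), slice(200, None)
x_mean = X[train].mean(axis=0)
r2_by_latency = np.empty(latencies_ms.size)
for i in range(latencies_ms.size):
    y_mean = eeg[train, i].mean()
    w = ridge_fit(X[train] - x_mean, eeg[train, i] - y_mean, alpha=10.0)
    pred = (X[test] - x_mean) @ w + y_mean
    r2_by_latency[i] = r_squared(eeg[test, i], pred)
```

      Because the EEG here is pure noise, the resulting R<sup>2</sup> values carry no meaning; the sketch only shows how one R<sup>2</sup> value per latency could be obtained before contrasting conditions.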

      About the HM-LSTM:

      • In the Methods paragraph about the HM-LSTM, a lot more detail is necessary to understand how you are using this model. Firstly, what do you mean that you "extended" it, and what was that procedure? 

      The original HM-LSTM model developed by Chung et al. (2017) consists of only two levels: the word level and the phrase level (Figure 1b from their paper). By “extending” the model, we mean that we expanded its architecture to include five levels: phoneme, syllable, word, phrase, and sentence. Since our input consists of phoneme embeddings, we cannot directly apply their model, so we trained our model on the WenetSpeech corpus (Zhang et al., 2021), which provides phoneme-level transcripts. We have added this clarification on p.4 of the manuscript.

      • And generally, this is the model that produces most of the "features", or regressors, whichever word we like, for the TRF deconvolution and EEG prediction, correct? 

      Yes, we extracted the 2048-dimensional hidden layer activity from the model to represent features for each sentence in our speech stimuli at the phoneme, syllable, word, phrase and sentence levels. However, we did not perform any TRF deconvolution; instead, we fit these features (reduced to 150 dimensions using PCA) to the EEG signals at 9 time points around the offset of each sentence using ridge regression. We have now added a multivariate TRF (mTRF) analysis following Reviewer 3’s suggestions, and the results showed similar patterns to the current results (see Figure S2). We have added the clarification in the “Ridge regression at different time latencies” section of the “Methods and Materials” on p.10 of the manuscript.

      Results from the mTRF analyses were added on p.7 of the manuscript.
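
      The dimensionality reduction step mentioned above (2048-dimensional hidden states reduced to 150 dimensions) could look roughly like the following sketch. It assumes PCA is computed across the 284 sentences via a singular value decomposition of the centered feature matrix; the function name `pca_reduce` and the random input are hypothetical and stand in for the actual hidden-layer activity.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project rows of `features` onto their first principal components."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T

# Synthetic stand-in (assumption) for one linguistic level:
# 284 sentences x 2048 hidden units.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((284, 2048))

reduced = pca_reduce(hidden, n_components=150)  # shape (284, 150)
```

      The resulting columns are mutually uncorrelated and ordered by explained variance, which is what makes them convenient regressors for a subsequent ridge regression.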

      • A lot more detail is necessary then, about what form these regressors take, and some example plots of the regressors alongside the sentences.

      The linguistic regressors are simply five 150-dimensional vectors, each corresponding to one linguistic level, as shown in Figure 1B.

      • Generally, it is necessary to know what these regressors look like compared to other similar language-related TRF and EEG/MEG prediction studies. Usually, in the case of e.g. Lalor lab papers or Simon lab papers, these regressors take the form of single-sample event markers, surrounded by zeros elsewhere. For example, a phoneme regressor might have a sample up at the onset of each phoneme, and a word onset regressor might have a sample up at the onset of each word, with zeros elsewhere in the regressor. A phoneme surprisal regressor might have a sample up at each phoneme onset, with the value of that sample corresponding to the rarity of that phoneme in common speech. Etc. Are these regressors like that? Or do they code for these 5 linguistic levels in some other way? Either way, much more description and plotting is necessary in order to compare the results here to others in the literature.

      No, these regressors were not like that. They were 150-dimensional vectors (after PCA dimension reduction) extracted from the hidden layers of the HM-LSTM model. After training the model on the WenetSpeech corpus, we ran it on our speech stimuli and extracted representations from the five hidden layers to correspond to the five linguistic levels. As mentioned earlier, we did not perform TRF analyses; instead, we used ridge regression to predict EEG signals around the offset of each sentence, a method commonly employed in the literature (e.g., Caucheteux & King, 2022; Goldstein et al., 2022; Schmitt et al., 2021; Schrimpf et al., 2021). For instance, Goldstein et al. (2022) used word embeddings from GPT-2 to predict ECoG activity surrounding the onset of each word during naturalistic listening. We have included these references on p.3 of the manuscript, and the method is illustrated in Figure 1B.

      • You say that the 5 regressors that are taken from the trained model's hidden layers do not have much correlation with each other. However, the highest correlations are between syllable and sentence (0.22), and syllable and word (0.17). It is necessary to give some reason and interpretation of these numbers. One would think the highest correlation might be between syllable and phoneme, but this one is almost zero. Why would the syllable and sentence regressors have such a relatively high correlation with each other, and what form do those regressors take such that this is the case?

      All the regressors are represented as 2048-dimensional vectors derived from the hidden layers of the trained HM-LSTM model. We applied the trained model to all 284 sentences in our stimulus text, generating a set of 284 × 2048-dimensional vectors. Next, we performed Principal Component Analysis (PCA) on the 2048 dimensions and extracted the first 100 principal components (PCs), resulting in 284 × 100-dimensional vectors for each regressor. These 284 × 100 matrices were then flattened into 28,400-dimensional vectors. Subsequently, we computed the correlation matrix for the z-transformed 28,400-dimensional vectors of our five linguistic regressors. The code for this analysis, lstm_corr.py, can be found in our OSF repository. We have added a section “Correlation among linguistic features” in “Materials and Methods” on p.10 of the manuscript.

      We consider the observed coefficients of 0.17 and 0.22 to be relatively low compared to prior model-brain alignment studies which report correlation coefficients above 0.5 for linguistic regressors (e.g., Gao et al., 2024; Sugimoto et al., 2024). In Chinese, a single syllable can also function as a word, potentially leading to higher correlations between regressors for syllables and words. However, we refrained from overinterpreting the results to suggest a higher correlation between syllable and sentence compared to syllable and word. A paired t-test of the syllable-word coefficients versus syllable-sentence coefficients across the 284 sentences revealed no significant difference (t(28399)=-3.96, p=1). We have incorporated this information into p.5 of the manuscript.
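
      The correlation procedure described above (flattening each 284 × 100 matrix, z-transforming it, and correlating the five levels) can be sketched as follows. This is not the `lstm_corr.py` script from the OSF repository; the random matrices stand in for the PCA-reduced regressors, and the helper `zscore` is a hypothetical name.

```python
import numpy as np

levels = ["phoneme", "syllable", "word", "phrase", "sentence"]

# Synthetic stand-ins (assumption): one 284 x 100 matrix of PCA-reduced
# features per linguistic level.
rng = np.random.default_rng(0)
reduced = {name: rng.standard_normal((284, 100)) for name in levels}

def zscore(v):
    """Standardize a vector to zero mean and unit variance."""
    return (v - v.mean()) / v.std()

# Flatten each matrix into a 28,400-dimensional vector and z-transform it.
flat = np.stack([zscore(reduced[name].ravel()) for name in levels])

# 5 x 5 Pearson correlation matrix; rows and columns follow `levels`.
# Pearson correlation is unchanged by the z-transform, so that step only
# puts the vectors on a common scale.
corr = np.corrcoef(flat)
```

      With real regressors, the off-diagonal entries of `corr` would correspond to the pairwise coefficients discussed above, such as the syllable-word and syllable-sentence values.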

      • If these regressors are something like the time series of zeros along with single sample event markers as described above, with the event marker samples indicating the onset of the relevant thing, then one would think e.g. the syllable regressor would be a subset of the phoneme regressor because the onset of every syllable is a phoneme. And the onset of every word is a syllable, etc.

      All the regressors are aligned to 9 time points surrounding sentence offsets (-100 ms to 300 ms with a 50 ms interval). This is because all our regressors are taken from the HM-LSTM model, where the input is the phoneme representation of a sentence (e.g., “zh ə_4 y ie_3 j iəu_4 x iaŋ_4 sh uei_3 y ii_2 y aŋ_4”). For each unit in the sentence, the model generates five 2048-dimensional vectors, one for each of the five linguistic levels of the entire sentence. We have added the clarification on p.11 of the manuscript.

      For the time windows of analysis:

      • I am very confused, because sometimes the times are relative to "sentence onset", which would mean the beginning of sentences, and sometimes they are relative to "sentence offset", which would mean the end of sentences. It seems to vary which is mentioned. Did you use sentence onsets, offsets, or both, and what is the motivation?

      • If you used onsets, then the results at negative times would not seem to mean anything, because that would be during silence unless the stimulus sentences were all back to back with no gaps, which would also make that difficult to interpret.

      • If you used offsets, then the results at positive times would not seem to mean anything, because that would be during silence after the sentence is done. Unless you want to interpret those as important brain activity after the stimuli are done, in which case a detailed discussion of this is warranted.

      Thank you very much for pointing this out. All instances of “sentence onset” were typos and should be corrected to “sentence offset.” We chose offset because the regressors are derived from the hidden layer activity of our HM-LSTM model, which processes the entire sentence before generating outputs. We have now corrected all the typos. In continuous speech, there is no distinct silence period following sentence offsets. Additionally, lexical or phrasal processing typically occurs 200 ms after stimulus offsets (Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021). Therefore, we included a 300 ms interval after sentence offsets in our analysis, as our regressors encompass linguistic levels up to the sentence level. We have added this motivation on p.11 of the manuscript.

      • For the plots in the figures where the time windows and their regression outcomes are shown, it needs to be explicitly stated every time whether those time windows are relative to sentence onset, offset, or something else.

      We completely agree, and thank you very much for the suggestion. We have now added this information to Figures 4–6.

      • Whether the running correlations are relative to sentence onset or offset, the fact that you can have numbers outside of the time of the sentence (negative times for onset, or positive times for offset) is highly confusing. Why would the regressors have values outside of the sentence, meaning before or after the sentence/utterance? In order to get the running correlations, you presumably had the regressor convolved with the TRF/impulse response to get the predicted EEG first. In order to get running correlation values outside the sentence to correlate with the EEG, you would have to have regressor values at those time points, correct? How does this work?

      As mentioned earlier, we did not perform TRF analyses or convolve the regressors. Instead, we conducted regression analyses at each of the 9 time points surrounding the sentence offsets, following standard methods commonly used in model-brain alignment studies (e.g., Gao et al., 2024; Goldstein et al., 2022). The time window of -100 to 300 ms was selected based on prior findings that lexical and phrasal processing typically occurs 200–300 ms after word offsets (Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021). Additionally, we included the -100 to 200 ms time period in our analysis to examine phoneme and syllable level processing (cf. Gwilliams et al., 2022). We have added the clarification on p. of the manuscript.

      • In general, it seems arbitrary to choose sentence onset or offset, especially if the comparison is the correlation between predicted and actual EEG over the course of a sentence, with each regressor. What is going on with these correlations during the middle of the sentences, for example? In ridge regression TRF techniques for EEG/MEG, the relevant measure is often the overall correlation between the predicted and actual, calculated over a longer period of time, maybe the entire experiment. Here, you have calculated a running comparison between predicted and actual, and thus the time windows you choose to actually analyze can seem highly cherry-picked, because this means that most of the data is not actually analyzed.

      The rationale for choosing sentence offsets instead of onsets is that we are aligning the HM-LSTM model’s activity with EEG responses, and the input to the model consists of phoneme representations of the entire sentence at one time. In other words, the model needs to process the whole sentence before generating representations at each linguistic level. Therefore, the corresponding EEG responses should also align with the sentence offsets, occurring after participants have heard the complete sentence. The ridge regression followed the common practice in model-brain alignment studies (e.g., Gao et al., 2024; Goldstein et al., 2022; Huth et al., 2016; Schmitt et al., 2021; Schrimpf et al., 2021), and the time window is not cherry-picked but based on prior literature reporting lexical and sublexical processing in these time periods (e.g., Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Gwilliams et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021).

      • In figures 5 and 6, some of the time window portions that are highlighted as significant between the two lines have the lines intersecting. This looks like, even though you have found that the two lines are significantly different during that period of time, the difference between those lines is not of a constant sign, even during that short period. For instance, in figure 5, for the syllable feature, the period of 0 - 200 ms is significantly different between the two populations, correct? But between 0 and 50, normal-hearing are higher, between 50 and 150, hearing-impaired are higher, and between 150 and 200, normal-hearing are higher again, correct? But somehow they still end up significantly different overall between 0 and 200 ms. More explanation of occurrences like these is needed.

      The intersecting lines in Figures 5 and 6 represent the significant time windows for within-group comparisons (i.e., significant model fit compared to 0). They do not depict between-group comparisons, as no significant contrasts were found between the groups. For example, in Figure 1, the significant time windows for the acoustic models are shown separately for the hearing-impaired and normal-hearing groups. No significant differences were observed, as indicated by the sensor topography. We have now clarified this point in the captions for Figures 5 and 6.

      Using ridge regression:

      • What software package(s) and procedure(s) were specifically done to accomplish this? If this is ridge regression and not just ordinary least squares, then there was at least one non-zero regularization parameter in the process. What was it, how did it figure in the modeling and analysis, etc.?

      The ridge regression was performed using custom Python code, making heavy use of the sklearn (v1.12.0) package. We used ridge regression instead of ordinary least squares regression because all our linguistic regressors are 150-dimensional dense vectors, and our acoustic regressors are 130-dimensional vectors (see “Acoustic features of the speech stimuli” in “Materials and Methods”). We kept the default regularization parameter (i.e., 1). This ridge regression method is commonly used in model-brain alignment studies, where the regressors are high-dimensional vectors taken from language models (e.g., Gao et al., 2024; Goldstein et al., 2022; Huth et al., 2016; Schmitt et al., 2021; Schrimpf et al., 2021). The code ridge_lstm.py can be found in our OSF repository, and we have added a more detailed description on p.11 of the manuscript.
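
As an illustration of regression at each latency around sentence offsets, the sketch below fits a ridge model with a regularization parameter of 1 at each of the 9 time points and scores it with R<sup>2</sup>. It is not ridge_lstm.py. It uses a closed-form NumPy solution as a stand-in for the sklearn estimator, and the sensor count, the train/test split, the random data and all function names are assumptions made for the example.

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    # Solve (Xc^T Xc + alpha * I) W = Xc^T Yc for the weights W.
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b

def r2_score(Y_true, Y_pred):
    """Coefficient of determination, computed per output column."""
    ss_res = ((Y_true - Y_pred) ** 2).sum(axis=0)
    ss_tot = ((Y_true - Y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

def r2_by_latency(X, eeg, n_train, alpha=1.0):
    """X: (n_sentences, n_features); eeg: (n_sentences, n_sensors, n_latencies)."""
    n_sensors, n_lat = eeg.shape[1], eeg.shape[2]
    scores = np.empty((n_sensors, n_lat))
    for t in range(n_lat):
        # Fit on the first n_train sentences, score on the held-out rest.
        W, b = ridge_fit(X[:n_train], eeg[:n_train, :, t], alpha)
        pred = X[n_train:] @ W + b
        scores[:, t] = r2_score(eeg[n_train:, :, t], pred)
    return scores

latencies_ms = np.arange(-100, 301, 50)   # 9 latencies around sentence offset
rng = np.random.default_rng(0)
X = rng.standard_normal((284, 150))       # assumption: random stand-in regressors
eeg = rng.standard_normal((284, 64, latencies_ms.size))  # assumption: 64 sensors
scores = r2_by_latency(X, eeg, n_train=227)
print(scores.shape)
```

The resulting sensor-by-latency array has the layout that a spatiotemporal cluster test would operate on.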

      • It sounds like the regressors are the hidden layer activations, which you reduced from 2,048 to 150 non-acoustic, or linguistic, regressors, per linguistic level, correct? So you have 150 regressors, for each of 5 linguistic levels. These regressors collectively contribute to the deconvolution and EEG prediction from the resulting TRFs, correct? This sounds like a lot of overfitting. How much correlation is there from one of these 150 regressors to the next? Elsewhere, it sounds like you end up with only one regressor for each of the 5 linguistic levels. So these aspects need to be clarified.

      • For these regressors, you are comparing the "regression outcomes" for different conditions; "regression outcomes" are the R2 between predicted and actual EEG, which is the coefficient of determination, correct? If this is R2, how is it that you have some negative numbers in some of the plots? R2 should be only positive, between 0 and 1.

      Yes, we reduced the 2048-dimensional vectors for each of the 5 linguistic levels to 150 dimensions using PCA, mainly to save computational resources. We used ridge regression, following the standard practice in the field (e.g., Gao et al., 2024; Goldstein et al., 2022; Huth et al., 2016; Schmitt et al., 2021; Schrimpf et al., 2021).

      Yes, the regression outcomes are the R<sup>2</sup> values representing the fit between the predicted and actual EEG data. However, the plots report normalized (z-transformed) R<sup>2</sup> values. All our spatiotemporal cluster permutation analyses were conducted using the z-transformed R<sup>2</sup> values. We have added this clarification both in the figure captions and on p.11 of the manuscript. As a side note, R<sup>2</sup> values can be negative because they are not the square of a correlation coefficient. Rather, R<sup>2</sup> compares the fit of the chosen model to that of a horizontal straight line (the null hypothesis). If the chosen model fits the data worse than the horizontal line, then the R<sup>2</sup> value becomes negative: https://www.graphpad.com/support/faq/how-can-rsup2sup-be-negative
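
A small worked example makes the point about negative values concrete. The numbers below are invented for illustration. The function implements the usual definition R<sup>2</sup> = 1 − SS<sub>res</sub>/SS<sub>tot</sub>.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
good = np.array([1.1, 1.9, 3.2, 3.8])   # close to the data
mean_only = np.full_like(y, y.mean())   # the horizontal-line baseline
bad = y[::-1]                           # worse than the baseline

print(r_squared(y, good))       # close to 1
print(r_squared(y, mean_only))  # exactly 0: same fit as the horizontal line
print(r_squared(y, bad))        # negative: worse than the horizontal line
```

Predictions that fit worse than the mean of the data give SS<sub>res</sub> > SS<sub>tot</sub>, hence a negative value.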

      Reviewer #2 (Public Review):

      This study compares neural responses to speech in normal-hearing and hearing-impaired listeners, investigating how different levels of the linguistic hierarchy are impacted across the two cohorts, both in a single-talker and multi-talker listening scenario. It finds that, while normal-hearing listeners have a comparable cortical encoding of speech-in-quiet and attended speech from a multi-talker mixture, participants with hearing impairment instead show a reduced cortical encoding of speech when it is presented in a competing listening scenario. When looking across the different levels of the speech processing hierarchy in the multi-talker condition, normal-hearing participants show a greater cortical encoding of the attended compared to the unattended stream in all speech processing layers - from acoustics to sentence-level information. Hearing-impaired listeners, on the other hand, only have increased cortical responses to the attended stream for the word and phrase levels, while all other levels do not differ between attended and unattended streams.

      The methods for modelling the hierarchy of speech features (HM-LSTM) and the relationship between brain responses and specific speech features (ridge-regression) are appropriate for the research question, with some caveats on the experimental procedure. This work offers an interesting insight into the neural encoding of multi-talker speech in listeners with hearing impairment, and it represents a useful contribution towards understanding speech perception in cocktail-party scenarios across different hearing abilities. While the conclusions are overall supported by the data, there are limitations and certain aspects that require further clarification.

      (1) In the multi-talker section of the experiment, participants were instructed to selectively attend to the male or the female talker, and to rate the intelligibility, but they did not have to perform any behavioural task (e.g., comprehension questions, word detection or repetition), which could have demonstrated at least an attempt to comply with the task instructions. As such, it is difficult to determine whether the lack of increased cortical encoding of Attended vs. Unattended speech across many speech features in hearing-impaired listeners is due to a different attentional strategy, which might be more oriented at "getting the gist" of the story (as the increased tracking of only word and phrase levels might suggest), or instead it is due to hearing-impaired listeners completely disengaging from the task and tuning back in for selected key-words or word combinations. Especially the lack of Attended vs. Unattended cortical benefit at the level of acoustics is puzzling and might indicate difficulties in performing the task. I think this caveat is important and should be highlighted in the Discussion section.

      Thank you very much for the suggestion. We admit that the hearing-impaired listeners might adopt different attentional strategies or potentially disengage from the task due to comprehension difficulties. However, we would like to emphasize that our hearing-impaired participants have extended high-frequency (EHF) hearing loss, with impairment only at frequencies above 8 kHz. Their condition is likely not severe enough to cause them to adopt a markedly different attentional strategy for this task. Moreover, it is possible that our normal-hearing listeners may also adopt varying attentional strategies, yet the comparison still revealed notable differences. We have added the caveat in the Discussion section on p.8 of the manuscript.

      (2) In the EEG recording and preprocessing section, you state that the EEG was filtered between 0.1Hz and 45Hz. Why did you choose this very broadband frequency range? In the literature, speech responses are robustly identified between 0.5Hz/1Hz and 8Hz. Would these results emerge using a narrower and lower frequency band? Considering the goal of your study, it might also be interesting to run your analysis pipeline on conventional frequency bands, such as Delta and Theta, since you are looking into the processing of information at different temporal scales.

      Indeed, we decomposed the epoched EEG time series for each section into six classic frequency-band components (delta 1–3 Hz, theta 4–7 Hz, alpha 8–12 Hz, beta 12–20 Hz, gamma 30–45 Hz) by convolving the data with complex Morlet wavelets as implemented in MNE-Python (version 0.24.0). The number of cycles in the Morlet wavelets was set to frequency/4 for each frequency bin. The power values for each time point and frequency bin were obtained by taking the square root of the resulting time-frequency coefficients. These power values were normalized to reflect relative changes (expressed in dB) with respect to the 500 ms pre-stimulus baseline. This yielded a power value for each time point and frequency bin for each section. We specifically examined the delta and theta bands, and computed the correlation between the regression outcomes for the five linguistic predictors from these bands and those obtained using data from all frequency bands (the R<sup>2</sup> arrays, of shape subjects × sensors × time points, were flattened before computing the correlation). The results showed high correlation coefficients (see the correlation matrix in Supplementary Figure S2 for the attended and unattended speech). Therefore, we opted to use the epoched EEG data from all frequency bands for our analyses. We have added this clarification in the Results section on p.5 and the “EEG recording and preprocessing” section in “Materials and Methods” on p.11 of the manuscript.

      (3) A paragraph with more information on the HM-LSTM would be useful to understand the model used without relying on the Chung et al. (2017) paper. In particular, I think the updating mechanism of the model should be clarified. It would also be interesting to modify the updating factor of the model, along the lines of Schmitt et al. (2021), to assess whether a HM-LSTM with faster or slower updates can better describe the neural activity of hearing-impaired listeners. That is, perhaps the difference between hearing-impaired and normal-hearing participants lies in the temporal dynamics, and not necessarily in a completely different attentional strategy (or disengagement from the stimuli, as I mentioned above).

      Thank you for the suggestion. We have added more details on our HM-LSTM model on p.10 “Hierarchical multiscale LSTM model” in “Materials and Methods”: Our HM-LSTM model consists of 4 layers. At each layer, the model implements a COPY or UPDATE operation at each time step t. The COPY operation maintains the current cell state without any changes until it receives a summarized input from the lower layer. The UPDATE operation occurs when a linguistic boundary is detected in the layer below, but no boundary was detected at the previous time step t-1. In this case, the cell updates its summary representation, similar to standard RNNs. We agree that exploring modifications to the model’s updating factor would be an interesting direction. However, since we have already observed contrasts between normal-hearing and hearing-impaired listeners using the current model’s update parameters, we believe discussing additional hypotheses would overextend the scope of this paper.
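
The operation-selection rule described above can be written as a small function. This is a toy sketch, not the authors' model code. The boundary sequences are invented, and the FLUSH branch is not described in this response: it is included on the assumption that the model follows the three-operation scheme of Chung et al. (2017).

```python
def select_operation(boundary_below_now, boundary_here_prev):
    """Pick the cell operation for one layer at time step t.

    boundary_below_now: boundary detected in the layer below at step t.
    boundary_here_prev: boundary detected in this layer at step t-1.
    """
    if boundary_here_prev:
        # Assumption (from Chung et al., 2017): after a boundary in this
        # layer, the cell state is reinitialized.
        return "FLUSH"
    if boundary_below_now:
        # Boundary below and none here at t-1: update the summary.
        return "UPDATE"
    # No boundary in either place: keep the cell state unchanged.
    return "COPY"

# Hypothetical boundary indicators for five time steps.
below = [0, 1, 0, 0, 1]
here_prev = [0, 0, 1, 0, 0]
ops = [select_operation(b, h) for b, h in zip(below, here_prev)]
print(ops)
```

Because higher layers only update when the layer below signals a boundary, they change state less often, which is what lets them track longer linguistic units.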

      (4) When explaining how you extracted phoneme information, you mention that "the inputs to the model were the vector representations of the phonemes". It is not clear to me whether you extracted specific phonetic features (e.g., "p" sound vs. "b" sound), or simply the phoneme onsets. Could you clarify this point in the text, please?

      The model inputs were individual phonemes from two sentences, each transformed into a 1024-dimensional vector using a simple lookup table. This lookup table stores embeddings for a fixed dictionary of all unique phonemes in Chinese. This approach is a foundational technique in many advanced NLP models, enabling the representation of discrete input symbols in a continuous vector space. We have added this clarification on p.10 of the manuscript.
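
A lookup-table embedding of this kind can be sketched in a few lines. The example below is illustrative only: the vocabulary is built from the single example sentence quoted earlier in this response, and the table is filled with random numbers, whereas in a trained model the rows would be learned parameters.

```python
import numpy as np

# Phoneme sequence of one example sentence (tone marked after the rime).
sentence = "zh ə_4 y ie_3 j iəu_4 x iaŋ_4 sh uei_3 y ii_2 y aŋ_4".split()

# Fixed dictionary mapping each unique phoneme to a row index.
vocab = {p: i for i, p in enumerate(sorted(set(sentence)))}

rng = np.random.default_rng(0)
# Assumption: random values stand in for learned embedding weights.
embedding_table = rng.standard_normal((len(vocab), 1024))

ids = np.array([vocab[p] for p in sentence])
inputs = embedding_table[ids]   # one 1024-dimensional vector per phoneme
print(inputs.shape)
```

Every occurrence of the same phoneme retrieves the same row, so the input encodes phoneme identity rather than only phoneme onsets.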

      Reviewer #3 (Public Review):

      Summary:

      The authors aimed to investigate how the brain processes different linguistic units (from phonemes to sentences) in challenging listening conditions, such as multi-talker environments, and how this processing differs between individuals with normal hearing and those with hearing impairments. Using a hierarchical language model and EEG data, they sought to understand the neural underpinnings of speech comprehension at various temporal scales and identify specific challenges that hearing-impaired listeners face in noisy settings.

      Strengths:

      Overall, the combination of computational modeling, detailed EEG analysis, and comprehensive experimental design thoroughly investigates the neural mechanisms underlying speech comprehension in complex auditory environments.

      The use of a hierarchical language model (HM-LSTM) offers a data-driven approach to dissect and analyze linguistic information at multiple temporal scales (phoneme, syllable, word, phrase, and sentence). This model allows for a comprehensive neural encoding examination of how different levels of linguistic processing are represented in the brain.

      The study includes both single-talker and multi-talker conditions, as well as participants with normal hearing and those with hearing impairments. This design provides a robust framework for comparing neural processing across different listening scenarios and groups.

      Weaknesses:

      The analyses heavily rely on one specific computational model, which limits the robustness of the findings. The use of a single DNN-based hierarchical model to represent linguistic information, while innovative, may not capture the full range of neural coding present in different populations. A low-accuracy regression model-fit does not necessarily indicate the absence of neural coding for a specific type of information. The DNN model represents information in a manner constrained by its architecture and training objectives, which might fit one population better than another without proving the non-existence of such information in the other group. To address this limitation, the authors should consider evaluating alternative models and methods. For example, directly using spectrograms, discrete phoneme/syllable/word coding as features, and performing feature-based temporal response function (TRF) analysis could serve as valuable baseline models. This approach would provide a more comprehensive evaluation of the neural encoding of linguistic information.

      Our acoustic features are indeed directly the broadband envelopes and the log-mel spectrograms of the speech streams. The amplitude envelope of the speech signal was extracted using the Hilbert transform. The 129-dimensional spectrogram and 1-dimensional envelope were concatenated to form a 130-dimensional acoustic feature at every 10 ms of the speech stimuli. Given the duration of our EEG recordings, which span over 10 minutes, conducting multivariate TRF (mTRF) analysis with such high-dimensional predictors was not feasible. Instead, we used ridge regression to predict EEG responses at 9 temporal latencies surrounding sentence offsets, ranging from -100 ms to +300 ms in 50 ms steps. To evaluate the model's performance, we extracted the R<sup>2</sup> values at each latency, providing a temporal profile of regression performance over the analyzed time period. This approach is conceptually similar to TRF analysis.
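
The envelope extraction and feature concatenation can be illustrated as follows. This is a sketch under stated assumptions, not the authors' pipeline. The analytic signal is computed with a NumPy FFT rather than a library Hilbert routine, the audio is a synthetic amplitude-modulated tone, the sampling rate is hypothetical, and the 129-dimensional spectrogram is a zero-filled placeholder because its exact parameters are not given here.

```python
import numpy as np

def hilbert_envelope(x):
    """Amplitude envelope as the magnitude of the analytic signal (FFT method)."""
    n = x.size
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0                        # keep the DC component
    if n % 2 == 0:
        h[n // 2] = 1.0               # keep the Nyquist component
        h[1:n // 2] = 2.0             # double the positive frequencies
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * h))

def frame_average(x, fs, step_s=0.010):
    """Average a signal within consecutive non-overlapping 10 ms frames."""
    hop = int(round(fs * step_s))
    n_frames = x.size // hop
    return x[:n_frames * hop].reshape(n_frames, hop).mean(axis=1)

fs = 16000                            # assumption: hypothetical sampling rate
t = np.arange(fs) / fs                # one second of audio
modulation = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
audio = modulation * np.sin(2 * np.pi * 440 * t)

envelope = frame_average(hilbert_envelope(audio), fs)[:, None]   # (100, 1)
spectrogram = np.zeros((envelope.shape[0], 129))   # placeholder, not a real log-mel
acoustic = np.concatenate([spectrogram, envelope], axis=1)
print(acoustic.shape)
```

For this synthetic tone the recovered envelope follows the 4 Hz modulation, and each 10 ms frame yields one 130-dimensional feature row.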

      We agree that including baseline models for the linguistic features is important, and we have now added results from mTRF analysis using phoneme, syllable, word, phrase, and sentence rates as discrete predictors (i.e., marking a value of 1 at each unit boundary offset). Our EEG data spans the entire 10-minute duration for each condition, sampled at 10-ms intervals. The TRF results for our main comparison (attended versus unattended conditions) showed similar patterns to those observed using features from our HM-LSTM model. At the phoneme and syllable levels, normal-hearing listeners showed marginally significantly higher TRF weights for attended speech compared to unattended speech at approximately -80 to 150 ms after phoneme offsets (t=2.75, Cohen’s d=0.87, p=0.057), and 120 to 210 ms after syllable offsets (t=3.96, Cohen’s d=0.73, p=0.083). At the word and phrase levels, normal-hearing listeners exhibited significantly higher TRF weights for attended speech compared to unattended speech at 190 to 290 ms after word offsets (t=4, Cohen’s d=1.13, p=0.049), and around 120 to 290 ms after phrase offsets (t=5.27, Cohen’s d=1.09, p=0.045). For hearing-impaired listeners, marginally significant effects were observed at 190 to 290 ms after word offsets (t=1.54, Cohen’s d=0.6, p=0.059), and 180 to 290 ms after phrase offsets (t=3.63, Cohen’s d=0.89, p=0.09). These results have been added on p.7 of the manuscript, and the corresponding figure is included as Supplementary F2.
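
A discrete rate predictor of the kind described above is a vector of zeros with a 1 at each unit offset, sampled every 10 ms. The sketch below shows one way to build it. The function name and the offset times are hypothetical.

```python
import numpy as np

def rate_predictor(offsets_s, duration_s, step_s=0.010):
    """Impulse train with a 1 at the sample nearest each unit offset."""
    n = int(round(duration_s / step_s))
    x = np.zeros(n)
    idx = np.round(np.asarray(offsets_s) / step_s).astype(int)
    idx = idx[(idx >= 0) & (idx < n)]   # drop offsets outside the recording
    x[idx] = 1.0
    return x

# Hypothetical word offsets (in seconds) within a one-second excerpt.
word_offsets = [0.12, 0.35, 0.50]
word_rate = rate_predictor(word_offsets, duration_s=1.0)
print(word_rate.nonzero()[0])
```

One such vector per linguistic level (phoneme, syllable, word, phrase, sentence) gives the five baseline predictors for the TRF analysis.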

      It is not entirely clear if the DNN model used in this study effectively serves the authors' goal of capturing different linguistic information at various layers. Specifically, the results presented in Figure 3C are somewhat confusing. While the phonemes are labeled, the syllables, words, phrases, and sentences are not, making it difficult to interpret how the model distinguishes between these levels of linguistic information. The claim that "Hidden-layer activity for same-vowel sentences exhibited much more similar distributions at the phoneme and syllable levels compared to those at the word, phrase and sentence levels" is not convincingly supported by the provided visualizations. To strengthen their argument, the authors should use more quantified metrics to demonstrate that the model indeed captures phrase, word, syllable, and phoneme information at different layers. This is a crucial prerequisite for the subsequent analyses and claims about the hierarchical processing of linguistic information in the brain.

      Quantitative measures such as mutual information, clustering metrics, or decoding accuracy for each linguistic level could provide clearer evidence of the model's effectiveness in this regard.

      In Figure 3C, we used color-coding to represent the activity of five hidden layers after dimensionality reduction. Each dot on the plot corresponds to one test sentence. Only phonemes are labeled because each syllable in our test sentences contains the same vowels (see Table S1). The results demonstrate that the phoneme layer effectively distinguishes different phonemes, while the higher linguistic layers do not. We believe these findings provide evidence that different layers capture distinct linguistic information. Additionally, we computed the correlation coefficients between each pair of linguistic predictors, as shown in Figure 3B. We think this analysis serves a similar purpose to computing the mutual information between pairs of hidden-layer activities for our constructed sentences. Furthermore, the mTRF results based on rate models of the linguistic features we presented earlier align closely with the regression results using the hidden-layer activity from our HM-LSTM model. This further supports the conclusion that our model successfully captures relevant information across these linguistic levels. We have added the clarification on p.5 of the manuscript.

      The formulation of the regression analysis is somewhat unclear. The choice of sentence offsets as the anchor point for the temporal analysis, and the focus on the [-100ms, +300ms] interval, needs further justification. Since EEG measures underlying neural activity in near real-time, it is expected that lower-level acoustic information, which is relatively transient, such as phonemes and syllables, would be distributed throughout the time course of the entire sentence. It is not evident if this limited time window effectively captures the neural responses to the entire sentence, especially for lower-level linguistic features. A more comprehensive analysis covering the entire time course of the sentence, or at least a longer temporal window, would provide a clearer understanding of how different linguistic units are processed over time. Additionally, explaining the rationale behind choosing this specific time window and how it aligns with the temporal dynamics of speech processing would enhance the clarity and validity of the regression analysis.

      Thank you for pointing this out. We chose this time window as lexical or phrasal processing typically occurs 200 ms after stimulus offsets (Bemis & Pylkkanen, 2011; Goldstein et al., 2022; Li et al., 2024; Li & Pylkkänen, 2021). Additionally, we included the -100 to 200 ms time period in our analysis to examine phoneme and syllable level processing (e.g., Gwilliams et al., 2022). Using the entire sentence duration was not feasible, as the sentences in the stimuli vary in length, making statistical analysis challenging. Additionally, since the stimuli consist of continuous speech, extending the time window would risk including linguistic units from subsequent sentences. This would introduce ambiguity as to whether the EEG responses correspond to the current or the following sentence. We have added this clarification on p.12 of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      As I mentioned, I think the OSF repo needs to be changed to give anyone access. I would recommend pursuing the lines of thought I mentioned in the public review to make this study complete and to allow it to fit into the already existing literature to facilitate comparisons.

      Yes, the OSF folder is now public. We have made revisions following all reviewers’ suggestions.

      There are some typos in figure labels, e.g. 2B.

      Thank you for pointing it out! We have now corrected the typo in Figure 2B.

      Reviewer #2 (Recommendations For The Authors):

      (1) I was able to access all of the audio files and code for the study, but no EEG data was shared in the OSF repository. Unless there is some ethical and/or legal constraint, my understanding of eLife's policy is that the neural data should be made publicly available as well.

      The preprocessed EEG data are available in .npy format in the OSF repository.

      (2) The line-plots in Figures 4B, 5B, and 6B have very similar colours. They would be easier to interpret if you changed the line appearance as well as the colours. E.g., dotted line for hearing-impaired listeners, thick line for normal-hearing.

      Thank you for the suggestion! We have now used thicker lines for normal-hearing listeners in all our line plots.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors may consider presenting raw event-related potentials (ERPs) or spatiotemporal response profiles before delving into the more complex regression encoding analysis. This would provide a clearer foundational understanding of the neural activity patterns. For example, it is not clear if the main claims, such as the neural activity in the normal-hearing group encoding phonetic information in attended speech better than in unattended speech, are directly observable. Showing ERP differences or spatiotemporal response pattern differences could support these claims more straightforwardly. Additionally, training pattern classifiers to test if different levels of information can be decoded from EEG activity in specific groups could provide further validation of the findings.

      We have now included results from more traditional mTRF analyses using phoneme, syllable, word, phrase, and sentence rates as baseline models (see p.7 of the manuscript and Figure S3). The results show similar patterns to those observed in our current analyses. While we agree that classification analyses would be very interesting, our regression analyses have already demonstrated distinct EEG patterns for each linguistic level. Consequently, classification analyses would likely yield similar results unless a different method for representing linguistic information at these levels is employed. To the best of our knowledge, no other computational model currently exists that can simultaneously represent these linguistic levels.

      (2) Is there any behavioral metric suggesting that these hearing-impaired participants do have deficits in comprehending long sentences? The self-rated intelligibility is useful, but cannot fully distinguish between perceiving lower-level phonetic information vs longer sentence comprehension.

      In the current study, we included only self-rated intelligibility tests. We acknowledge that this approach might not fully distinguish between the perception of lower-level phonetic information and higher-level sentence comprehension. However, it remains unclear what type of behavioral test would effectively address this distinction. Furthermore, our primary aim was to use the behavioral results to demonstrate that our hearing-impaired listeners experienced speech comprehension difficulties in multi-talker environments, while relying on the EEG data to investigate comprehension challenges at various linguistic levels.

      Minor:

      (1) Page 2, second line in Introduction, "Phonemes occur over ..." should be lowercase.

      According to APA format, the first word after the colon is capitalized if it begins a complete sentence (https://blog.apastyle.org/apastyle/2011/06/capitalization-after-colons.html). Here the sentence is a complete sentence, so we used uppercase for “phonemes”.

      (2) Page 8, second paragraph "...-100ms to 100ms relative to sentence onsets", should it be onsets or offsets?

      This is a typo and it should be offsets. We have now corrected it.

      References

      Bemis, D. K., & Pylkkanen, L. (2011). Simple composition: An MEG investigation into the comprehension of minimal linguistic phrases. Journal of Neuroscience, 31(8), 2801–2814.

      Gao, C., Li, J., Chen, J., & Huang, S. (2024). Measuring meaning composition in the human brain with composition scores from large language models. In L.-W. Ku, A. Martins, & V. Srikumar (Eds.), Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 11295–11308). Association for Computational Linguistics.

      Goldstein, A., Zada, Z., Buchnik, E., Schain, M., Price, A., Aubrey, B., Nastase, S. A., Feder, A., Emanuel, D., Cohen, A., Jansen, A., Gazula, H., Choe, G., Rao, A., Kim, C., Casto, C., Fanda, L., Doyle, W., Friedman, D., … Hasson, U. (2022). Shared computational principles for language processing in humans and deep language models. Nature Neuroscience, 25(3), Article 3.

      Gwilliams, L., King, J.-R., Marantz, A., & Poeppel, D. (2022). Neural dynamics of phoneme sequences reveal position-invariant code for content and order. Nature Communications, 13(1), Article 1.

      Huth, A. G., de Heer, W. A., Griffiths, T. L., Theunissen, F. E., & Gallant, J. L. (2016). Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532(7600), 453–458.

      Li, J., Lai, M., & Pylkkänen, L. (2024). Semantic composition in experimental and naturalistic paradigms. Imaging Neuroscience, 2, 1–17.

      Li, J., & Pylkkänen, L. (2021). Disentangling semantic composition and semantic association in the left temporal lobe. Journal of Neuroscience, 41(30), 6526–6538.

      Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177–190.

      Schmitt, L.-M., Erb, J., Tune, S., Rysop, A. U., Hartwigsen, G., & Obleser, J. (2021). Predicting speech from a cortical hierarchy of event-based time scales. Science Advances, 7(49), eabi6070.

      Schrimpf, M., Blank, I. A., Tuckute, G., Kauf, C., Hosseini, E. A., Kanwisher, N., Tenenbaum, J. B., & Fedorenko, E. (2021). The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences, 118(45), e2105646118.

      Sugimoto, Y., Yoshida, R., Jeong, H., Koizumi, M., Brennan, J. R., & Oseki, Y. (2024). Localizing Syntactic Composition with Left-Corner Recurrent Neural Network Grammars. Neurobiology of Language, 5(1), 201–224.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary:

      The study identifies two types of activation: one that is cue-triggered and nonspecific to motion directions, and another that is specific to the exposed motion directions but occurs in a reversed manner. The finding that activity in the medial temporal lobe (MTL) preceded that in the visual cortex suggests that the visual cortex may serve as a platform for the manifestation of replay events, which potentially enhance visual sequence learning.

      Evaluations:

      Identifying the two types of activation after exposure to a sequence of motion directions is very interesting. The experimental design, procedures and analyses are solid. The findings are interesting and novel.

      In the original submission, it was not immediately clear to me why the second type of activation was suggested to occur spontaneously. The procedural differences in the analyses that distinguished between the two types of activation need to be a little better clarified. However, this concern has been satisfactorily addressed in the revision.

      We thank the reviewer for his/her positive evaluation and thoughtful comments. 

      Reviewer #2 (Public review):

      This paper shows and analyzes an interesting phenomenon. It shows that when people are exposed to sequences of moving dots (that is, dots moving in one direction, followed by another direction, etc.), showing either the starting movement direction or the ending movement direction causes a coarse-grained brain response that is similar to that elicited by the complete sequence of 4 directions. However, they show by decoding the sensor responses that this brain activity does not actually carry information about the actual sequence and the motion directions, at least not on the time scale of the initial sequence. They also show a reverse replay on a highly compressed time scale, which is elicited during the period of elevated activity, and activated by the first and last elements of the sequence, but not others. Additionally, these replays seem to occur during periods of cortical ripples, similar to what is found in animal studies.

      These results are intriguing. They are based on MEG recordings in humans, and finding such replays in humans is novel. Also, this is based on what seems to be sophisticated statistical analysis. The statistical methodology seems valid, but due to its complexity it is not easy to understand. The methods, especially those described in Figures 3 and 4, should be explained better.

      We thank the reviewer for the detailed evaluation. As suggested, we have further revised the Methods and Results sections, particularly the descriptions related to Figures 3 and 4, to enhance clarity. Please see the revisions highlighted in red in the revised manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The most important results here are in Figure 4, and they rely on methods explained in Figure 3. Figure 4 and the results in the figure are confusing.

      What is the red bar in 4B,E? What are the units of the Y axis in Figure 4B,E?

      Does sequenceness have units? How do we interpret these magnitudes apart from the line of statistical significance? Shouldn't there be two lines, one for forward replay and the other for backward replay, rather than a single line with positive and negative values? The term sequenceness is defined in Figure 3, and is key. The replayed sequence in Figure 4A,D seems to last about 120 ms.

      What is the meaning of having significance only within a window of 28-36 ms?

      We thank the reviewer for the careful reading and insightful comments. We apologize for the lack of clarity regarding these details in the previous version. As mentioned above, we have revised the Methods and Results sections to enhance clarity throughout the manuscript. For convenience, we provide detailed explanations addressing the specific points raised by the reviewer below.

      First, the red bars in Figures 4B and 4E indicate the lags when the evidence of sequenceness surpassed the statistical significance threshold, as determined by permutation testing. We have now explicitly clarified this in the revised figure captions.

      Second, sequenceness doesn’t have units. It corresponds to the regression coefficient (β) obtained from the second-level GLM in the TDLM framework. Specifically, in the first step of TDLM, we constructed an empirical transition matrix that quantifies the evidence for all possible transitions (e.g., 0° → 90°) at each time lag (Δt). In the second step, we evaluated the extent to which each model transition matrix (e.g., forward or backward transitions) predicts the empirical transition matrix at each Δt, yielding second-level β values. Sequenceness is defined as the difference between the β values for the forward and backward transition models, reflecting the relative strength and directionality of sequential replay. As it is derived from regression coefficients, sequenceness is inherently a unitless measure.
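
      The two-step procedure described above can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name `sequenceness`, the synthetic data, the choice of four states, and the simple self-transition and constant regressors in the second-level design are all assumptions made for illustration. The sketch only shows why the resulting quantity is a difference of regression coefficients and therefore has no units.

```python
import numpy as np

def sequenceness(states, forward, max_lag):
    """Forward-minus-backward sequenceness for lags 1..max_lag (in samples).

    states  : (n_time, n_states) array of decoded reactivation time courses
    forward : (n_states, n_states) 0/1 matrix, forward[i, j] = 1 for i -> j
    """
    n_time, n_states = states.shape
    backward = forward.T
    # Second-level design (assumed for this sketch): forward model,
    # backward model, self-transitions and a constant term.
    design = np.column_stack([
        forward.ravel(),
        backward.ravel(),
        np.eye(n_states).ravel(),
        np.ones(n_states * n_states),
    ])
    result = np.zeros(max_lag)
    for lag in range(1, max_lag + 1):
        past = np.column_stack([states[:-lag], np.ones(n_time - lag)])
        future = states[lag:]
        # First level: empirical transition matrix at this lag.
        # Row i, column j holds the evidence for the transition i -> j.
        coef = np.linalg.lstsq(past, future, rcond=None)[0]
        empirical = coef[:n_states]
        # Second level: how well does each model matrix explain it?
        betas = np.linalg.lstsq(design, empirical.ravel(), rcond=None)[0]
        result[lag - 1] = betas[0] - betas[1]  # forward minus backward
    return result

# Synthetic example: state 0 -> 1 -> 2 -> 3, one step every 3 samples.
rng = np.random.default_rng(0)
n_time, n_states, true_lag = 2000, 4, 3
states = 0.1 * rng.standard_normal((n_time, n_states))
for onset in rng.choice(n_time - 20, size=60, replace=False):
    for s in range(n_states):
        states[onset + true_lag * s, s] += 1.0

forward = np.eye(n_states, k=1)  # ones at (0,1), (1,2), (2,3)
seq = sequenceness(states, forward, max_lag=10)
print(np.round(seq, 2))
```

      In this toy data the curve peaks at the lag separating successive states, not at the total duration of the sequence, and time-reversing the data flips the sign of the peak.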

      Regarding the interpretation of sequenceness magnitudes beyond statistical significance, the β values reflect the extent to which the model transition matrix explains variance in the empirical transition matrix. While larger β values suggest stronger sequenceness, absolute magnitudes are influenced by various factors, such as between-participant noise. Therefore, the key criterion for interpreting these values is whether they surpass permutation-based significance thresholds, which indicate that the observed sequenceness is unlikely to have occurred by chance.

      Third, as the reviewer correctly pointed out, we initially computed two separate regression lines, one for forward replay and the other for backward replay. We then defined sequenceness as the contrast between the forward and backward replay (forward minus backward). This contrast approach is commonly used in previous studies to remove between-participant variance in the sequential replay per se, which may arise due to variability in task engagement or measurement sensitivity (Liu et al., 2021; Nour et al., 2021).

      Finally, regarding the duration of replay events, the example sequences shown in Figures 4A and 4D indeed span about 120 ms in total. However, the time lag (Δt) between successive reactivation peaks within these sequences is about 30 ms. This is in line with the findings shown in Figures 4B and 4E, where statistical significance is observed at a time lag window of 28 – 36 ms on the x-axis. It is important to note that the x-axis in these plots represents the time lag (Δt) between sequential reactivations, rather than absolute time.

      We hope these clarifications address the reviewer’s concerns, and we have revised the manuscript accordingly to make these points clearer to readers.

      The methods here are not simple and not simple to explain. The new version is easier to understand. From the new version it seems that the methodology is sound. It should be still clarified and better explained.

      We have carefully revised the manuscript to better explain the methodology. We appreciate the reviewer’s feedback, which is valuable in improving the clarity of our work.

      Now that I understand what they mean by decoding probability, I think that this term is confusing or even misleading. The decoding accuracy is the probability that the direction of motion classification was correct. It seems the so-called decoding probability is the value of the logistic regression after normalizing the sum to 1. If this is a standard term it can probably be kept; if not, another term would be better.

      We thank the reviewer for this comment. We agree that the term decoding probability may initially seem confusing. However, decoding probability is a commonly used term in the neural decoding literature, particularly in human studies (e.g., Liu et al., 2019; Nour et al., 2021; Turner et al., 2023). To maintain consistency with previous work, we have kept this term in the manuscript. We appreciate the opportunity to clarify this point.
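
      The distinction the reviewer draws between the two terms can be made concrete with a small sketch. It assumes, following the reviewer's description, that each motion direction has its own logistic-regression output and that these outputs are rescaled to sum to 1; the function names and the numbers are hypothetical and are not taken from the manuscript.

```python
import math

def decoding_probabilities(decision_values):
    """Logistic outputs for each class, rescaled so they sum to 1."""
    sigmoid = [1.0 / (1.0 + math.exp(-v)) for v in decision_values]
    total = sum(sigmoid)
    return [p / total for p in sigmoid]

def decoding_accuracy(probability_rows, true_labels):
    """Fraction of samples whose most probable class is the true class."""
    hits = 0
    for probs, label in zip(probability_rows, true_labels):
        predicted = max(range(len(probs)), key=lambda k: probs[k])
        hits += predicted == label
    return hits / len(true_labels)

# Three hypothetical samples, four motion directions each.
rows = [decoding_probabilities(v) for v in ([2.0, -1.0, -1.5, -2.0],
                                            [-1.0, 0.5, 0.2, -2.0],
                                            [-0.5, -0.2, -1.0, 0.1])]
labels = [0, 1, 2]
print([round(p, 3) for p in rows[0]])
print(decoding_accuracy(rows, labels))
```

      Decoding probability is thus a graded value per class and time point, whereas decoding accuracy is a single proportion of correct classifications across samples.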

      References

      Liu, Y., Dolan, R. J., Higgins, C., Penagos, H., Woolrich, M. W., Ólafsdóttir, H. F., Barry, C., Kurth-Nelson, Z., & Behrens, T. E. (2021). Temporally delayed linear modelling (TDLM) measures replay in both animals and humans. eLife, 10, e66917. https://doi.org/10.7554/eLife.66917

      Liu, Y., Dolan, R. J., Kurth-Nelson, Z., & Behrens, T. E. J. (2019). Human Replay Spontaneously Reorganizes Experience. Cell, 178(3), 640-652.e14. https://doi.org/10.1016/j.cell.2019.06.012

      Nour, M. M., Liu, Y., Arumuham, A., Kurth-Nelson, Z., & Dolan, R. J. (2021). Impaired neural replay of inferred relationships in schizophrenia. Cell, 184(16), 4315-4328.e17. https://doi.org/10.1016/j.cell.2021.06.012

      Turner, W., Blom, T., & Hogendoorn, H. (2023). Visual Information Is Predictively Encoded in Occipital Alpha/Low-Beta Oscillations. Journal of Neuroscience, 43(30), 5537–5545. https://doi.org/10.1523/JNEUROSCI.0135-23.2023

    1. Author response:

      Reviewer 1:

      (1) In general, the representation of target and distractor processing is a bit of a reach. Target processing is represented by SSVEP amplitude, which is most likely going to be related to the contrast of the dots, as opposed to representing coherent motion energy, which is the actual target. These may well be linked (e.g., greater attention to the coherent motion task might increase SSVEP amplitude), but I would call it a limitation of the interpretation. Decoding accuracy of emotional content makes sense as a measure of distractor processing, and the supplementary analysis comparing target SSVEP amplitude to distractor decoding accuracy is duly noted.

      We agree with the reviewer. This is certainly a limitation and will be acknowledged as such in the revised manuscript.

      (2) Comparing SSVEP amplitude to emotional category decoding accuracy feels a bit like comparing apples with oranges. They have different units and scales and probably reflect different neural processes. Is the result the authors find not a little surprising in this context? This relationship does predict performance and is thus intriguing, but I think this methodological aspect needs to be discussed further. For example, is the phase relationship with behaviour a result of a complex interaction between different levels of processing (fundamental contrast vs higher order emotional processing)?

      Traditionally, the SSVEP amplitude at the distractor frequency is used to quantify distractor processing. Because the target SSVEP amplitude is stronger than the distractor SSVEP amplitude, the distractor SSVEP amplitude may be contaminated by the target SSVEP amplitude through spectral power leakage; see Figure S4 for a demonstration. For this reason we introduce decoding accuracy as an index of distractor processing, which has not been done before in the SSVEP literature. Comparing the distractor SSVEP amplitude with the distractor decoding accuracy is, as the reviewer points out, somewhat like comparing apples with oranges, but the lack of correlation between them shows that the two measures do not co-vary. Decoding accuracy is therefore free from the influence of the distractor SSVEP amplitude and thereby from the influence of the target SSVEP amplitude. This is an important point, and we will discuss it more thoroughly in the revised manuscript.

      Reviewer 2:

      (1) Incomplete Evidence for Rhythmicity at 1 Hz: The central claim of 1 Hz rhythmic sampling is insufficiently validated. The windowing procedure (0.5s windows with 0.25s step) inherently restricts frequency resolution, potentially biasing toward low-frequency components like 1 Hz. Testing different window durations or providing controls would significantly strengthen this claim.

      This is an important point. We plan to follow the reviewer’s suggestion and repeat our analysis using different window sizes to test the robustness of the observed 1 Hz rhythmicity. In addition, we plan to apply the Hilbert transform to extract time-point-by-time-point amplitude envelopes, which will provide a window-free estimate of the distractor strength and further validate the presence of the low-frequency 1 Hz dynamics.
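
      The reviewer's concern and the proposed Hilbert-based control can be illustrated numerically. The sketch below takes the 0.5 s window, 0.25 s step and 11 s trial length from the text; the sampling rate, the 10 Hz carrier and the simulated 1 Hz modulation are hypothetical values chosen only so that the example runs, and `analytic_envelope` is a plain FFT-based implementation of the analytic signal, not the authors' pipeline.

```python
import numpy as np

def window_centres(trial_s, win_s, step_s):
    """Centre times (s) of all sliding windows that fit into one trial."""
    n_windows = int(np.floor((trial_s - win_s) / step_s)) + 1
    return win_s / 2 + step_s * np.arange(n_windows)

def analytic_envelope(x):
    """Amplitude envelope from the FFT-based analytic signal (Hilbert transform)."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))

# Windowed estimate: one value every 0.25 s, so nothing above 2 Hz is resolvable.
centres = window_centres(trial_s=11.0, win_s=0.5, step_s=0.25)
nyquist_hz = 0.5 / 0.25
print(len(centres), nyquist_hz)

# Window-free estimate on a simulated signal (all values below are hypothetical).
fs = 250.0          # sampling rate in Hz
carrier_hz = 10.0   # stand-in for a flicker frequency
t = np.arange(int(11.0 * fs)) / fs
signal = (1.0 + 0.5 * np.cos(2 * np.pi * 1.0 * t)) * np.cos(2 * np.pi * carrier_hz * t)
envelope = analytic_envelope(signal)
spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
peak_hz = np.fft.rfftfreq(len(envelope), d=1 / fs)[np.argmax(spectrum)]
print(round(float(peak_hz), 3))
```

      Because the envelope is defined at every sample, its spectrum is not limited by the window step, which is what makes it a useful check on whether a 1 Hz component is a by-product of the windowing.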

      (2) No-Distractor Control Condition: The study lacks a baseline or control condition without distractors. This makes it difficult to determine whether the distractor-related decoding signals or the 1 Hz effect reflect genuine distractor processing or more general task dynamics.

      We agree with the reviewer. This is certainly a limitation and will be acknowledged as such in the revised manuscript.

      (3) Decoding Near Chance Levels: The pairwise decoding accuracies for distractor categories hover close to chance (~55%), raising concerns about robustness. While statistically above chance, the small effect sizes need careful interpretation, particularly when linked to behavior.

      This is a good point. In addition to acknowledging this in the revised manuscript, we will carry out two additional analyses to test this issue further. First, we will implement a random permutation procedure, in which the trial labels are randomly shuffled to build the null-hypothesis distribution for decoding accuracy, and compare the decoding accuracy from the actual data to this distribution. Second, we will perform a temporal generalization analysis to examine whether the neural representations of the distractor drift over the course of an entire trial, which is 11 seconds long. Recent studies suggest that even when a stimulus stays the same, its neural representation may drift over time.
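
      A label-permutation test of the kind described above can be sketched as follows. The nearest-class-mean classifier, the five-fold split, the number of permutations and the simulated data are all assumptions made for this illustration; the actual classifier and cross-validation scheme in the study may differ.

```python
import numpy as np

def cv_accuracy(X, y, n_folds=5):
    """Cross-validated accuracy of a nearest-class-mean classifier (labels 0/1)."""
    idx = np.arange(len(y))
    correct = 0
    for k in range(n_folds):
        test = idx[k::n_folds]
        train = np.setdiff1d(idx, test)
        mean0 = X[train][y[train] == 0].mean(axis=0)
        mean1 = X[train][y[train] == 1].mean(axis=0)
        dist0 = np.linalg.norm(X[test] - mean0, axis=1)
        dist1 = np.linalg.norm(X[test] - mean1, axis=1)
        predicted = (dist1 < dist0).astype(int)
        correct += int((predicted == y[test]).sum())
    return correct / len(y)

def permutation_p_value(X, y, n_perm=200, seed=0):
    """Compare observed accuracy with accuracies obtained after shuffling labels."""
    rng = np.random.default_rng(seed)
    observed = cv_accuracy(X, y)
    null = np.array([cv_accuracy(X, rng.permutation(y)) for _ in range(n_perm)])
    # Add-one correction so the p-value can never be exactly zero.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value

# Simulated data: 80 trials, 10 features, class 1 shifted by one unit.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 40)
X = rng.standard_normal((80, 10))
X[y == 1] += 1.0
observed, null, p_value = permutation_p_value(X, y)
print(observed, round(float(null.mean()), 3), round(float(p_value), 4))
```

      The whole decoding pipeline is rerun for every shuffle, so the null distribution reflects the same cross-validation procedure as the observed accuracy.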

      (4) No Clear Correlation Between SSVEP and Behavior: Neither target nor distractor signal strength (SSVEP amplitude) correlates with behavioral accuracy. The study instead relies heavily on relative phase, which - while interesting - may benefit from additional converging evidence.

      We feel that what the reviewer points out is actually the main point of our study: it is not the overall target or distractor strength that matters for behavior, but their temporal relationship. This reveals a novel neuroscience principle that has not been reported in the past. We will stress this point further in the revised manuscript.

      (5) Phase-analysis: phase analysis is performed between different types of signals hindering their interpretability (time-resolved SSVEP amplitude and time-resolved decoding accuracy).

      The time-resolved SSVEP amplitude is used to index the temporal dynamics of target processing whereas the time-resolved decoding accuracy is used to index the temporal dynamics of distractor processing. As such, they can be compared, using relative phase for example, to examine how temporal relations between the two types of processes impact behavior. This said, we do recognize the reviewer’s concern that these two processes are indexed by two different types of signals. We plan to normalize each time course, make them dimensionless, and then compute the temporal relations between them.   
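
      The normalization step mentioned above can be sketched as follows. The function names, the 0.25 s sampling step and the simulated time courses are hypothetical; the sketch only shows one way to make two differently scaled signals dimensionless (z-scoring) before estimating their relative phase at a given frequency, and is not the authors' implementation.

```python
import numpy as np

def zscore(x):
    """Remove the mean and divide by the standard deviation (dimensionless)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def phase_at(x, freq_hz, step_s):
    """Phase (rad) of the component of x at freq_hz, via a single Fourier sum."""
    t = step_s * np.arange(len(x))
    return np.angle(np.sum(x * np.exp(-2j * np.pi * freq_hz * t)))

def relative_phase(a, b, freq_hz, step_s):
    """Phase of a minus phase of b at freq_hz, wrapped to (-pi, pi]."""
    diff = phase_at(zscore(a), freq_hz, step_s) - phase_at(zscore(b), freq_hz, step_s)
    return np.angle(np.exp(1j * diff))

# Two simulated time courses with different units and scales.
step_s = 0.25
t = step_s * np.arange(40)  # 10 s, a whole number of 1 Hz cycles
amplitude = 2.0 + 0.3 * np.cos(2 * np.pi * 1.0 * t)                # arbitrary units
accuracy = 0.55 + 0.02 * np.cos(2 * np.pi * 1.0 * t - np.pi / 2)   # proportion correct
lead = relative_phase(amplitude, accuracy, freq_hz=1.0, step_s=step_s)
print(round(float(lead), 4))
```

      Z-scoring changes only the scale of each time course, not the timing of its peaks, so the phase estimate itself is unaffected while the two signals are placed on a common footing.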

      Appraisal of Aims and Conclusions:

      The authors largely achieved their stated goal of assessing rhythmic sampling of distractors. However, the conclusions drawn - particularly regarding the presence of 1 Hz rhythmicity - rest on analytical choices that should be scrutinized further. While the observed phase-performance relationship is interesting and potentially impactful, the lack of stronger and convergent evidence on the frequency component itself reduces confidence in the broader conclusions.

      Impact and Utility to the Field:

      If validated, the findings will advance our understanding of attentional dynamics and competition in complex visual environments. Demonstrating that ignored distractors can be rhythmically sampled at similar frequencies to targets has implications for models of attention and cognitive control. However, the methodological limitations currently constrain the paper's impact.

      Thanks for these comments and positive assessment of our work’s potential implications and impact. We will try our best in the revision process to address the concerns.

      Additional Context and Considerations:

      (1) The use of EEG-fMRI is mentioned but not leveraged. If BOLD data were collected, even exploratory fMRI analyses (e.g., distractor modulation in visual cortex) could provide valuable converging evidence.

      Indeed, leveraging fMRI data in EEG studies would be very beneficial, as demonstrated in our previous work. However, given that this study concerns the temporal relationship between target and distractor processing, we feel that fMRI, given its well-known limitation in temporal resolution, has limited potential to contribute. We will be exploring this rich dataset in other ways where the two modalities are integrated to gain more insights not possible with either modality used alone.

      (2) In turn, removal of fMRI artifacts might introduce biases or alter the data. For instance, the authors might consider investigating potential fMRI artifact harmonics around 1 Hz to address concerns regarding induced spectral components.

      We have done extensive work in the area of simultaneous EEG-fMRI and have not encountered artifacts with a 1 Hz rhythmicity. Also, the fact that the temporal relations between target processing and distractor processing at 1 Hz predict behavior is another indication that the 1 Hz rhythmicity is a neuroscientific effect, not an artifact. However, we will look into this carefully and address it in the revision process.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This computational modeling study builds on multiple previous lines of experimental and theoretical research to investigate how a single neuron can solve a nonlinear pattern classification task. The authors construct a detailed biophysical and morphological model of a single striatal medium spiny neuron, and endow excitatory and inhibitory synapses with dynamic synaptic plasticity mechanisms that are sensitive to (1) the presence or absence of a dopamine reward signal, and (2) spatiotemporal coincidence of synaptic activity in single dendritic branches. The latter coincidence is detected by voltage-dependent NMDA-type glutamate receptors, which can generate a type of dendritic spike referred to as a "plateau potential." The proposed mechanisms result in moderate performance on a nonlinear classification task when specific input features are segregated and clustered onto individual branches, but reduced performance when input features are randomly distributed across branches. Given the high level of complexity of all components of the model, it is not clear which features of which components are most important for its performance. There is also room for improvement in the narrative structure of the manuscript and the organization of concepts and data.

      Strengths:

      The integrative aspect of this study is its major strength. It is challenging to relate low-level details such as electrical spine compartmentalization, extrasynaptic neurotransmitter concentrations, dendritic nonlinearities, spatial clustering of correlated inputs, and plasticity of excitatory and inhibitory synapses to high-level computations such as nonlinear feature classification. Due to high simulation costs, it is rare to see highly biophysical and morphological models used for learning studies that require repeated stimulus presentations over the course of a training procedure. The study aspires to prove the principle that experimentally-supported biological mechanisms can explain complex learning.

      Weaknesses:

      The high level of complexity of each component of the model makes it difficult to gain an intuition for which aspects of the model are essential for its performance, or responsible for its poor performance under certain conditions. Stripping down some of the biophysical detail and comparing it to a simpler model may help better understand each component in isolation. That said, the fundamental concepts behind nonlinear feature binding in neurons with compartmentalized dendrites have been explored in previous work, so it is not clear how this study represents a significant conceptual advance. Finally, the presentation of the model, the motivation and justification of each design choice, and the interpretation of each result could be restructured for clarity to be better received by a wider audience.

      Thank you for the feedback! We agree that the complexity of our model can make it challenging to intuitively understand the underlying mechanisms. To address this, we have revised the manuscript to include additional simulations and clearer explanations of the mechanisms at play.

      In the revised introduction, we now explicitly state our primary aim: to assess to what extent a biophysically detailed neuron model can support the theory proposed by Tran-Van-Minh et al. and explore whether such computations can be learned by a single neuron, specifically a projection neuron in the striatum. To achieve this, we focus on several key mechanisms:

      (1) A local learning rule: We develop a learning rule driven by local calcium dynamics in the synapse and by reward signals from the neuromodulator dopamine. This plasticity rule is based on the known synaptic machinery for triggering LTP or LTD in the corticostriatal synapse onto dSPNs (Shen et al., 2008). Importantly, the rule does not rely on supervised learning paradigms, nor does it require separate training and testing phases.

      (2) Robust dendritic nonlinearities: According to Tran-Van-Minh et al. (2015), sufficient supralinear integration is needed to ensure that, e.g., two inputs (i.e. one feature combination in the NFBP, Figure 1A) on the same dendrite generate greater somatic depolarization than if those inputs were distributed across different dendrites. To accomplish this, we generate sufficiently robust dendritic plateau potentials using the approach in Trpevski et al. (2023).

      (3) Metaplasticity: Although not discussed much in more theoretical work, our study demonstrates the necessity of metaplasticity for achieving stable and physiologically realistic synaptic weights. This mechanism ensures that synaptic strengths remain within biologically plausible ranges during training, regardless of initial synaptic weights.

      We have also clarified our design choices and the rationale behind them, as well as restructured the interpretation of our results for greater accessibility. We hope these revisions make our approach and findings more transparent and easier to engage with for a broader audience.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This study extends three previous lines of work:  

      (1) Prior computational/phenomenological work has shown that the presence of dendritic nonlinearities can enable single neurons to perform linearly non-separable tasks like XOR and feature binding (e.g. Tran-Van-Minh et al., Front. Cell. Neurosci., 2015).

      Prior computational and phenomenological work, such as Tran-Van-Minh et al. (Front. Cell. Neurosci., 2015), directly inspired our study, as we now explicitly state in the introduction (page 4, lines 19-22). While Tran-Van-Minh theoretically demonstrated that these principles could solve the NFBP, it remains untested to what extent this can be achieved quantitatively in biophysically detailed neuron models using biologically plausible learning rules - which is what we test here.

      (2) This study and a previous biophysical modeling study (Trpevski et al., Front. Cell. Neurosci., 2023) rely heavily on the finding from Chalifoux & Carter, J. Neurosci., 2011 that blocking glutamate transporters with TBOA increases dendritic calcium signals. The proposed model thus depends on a specific biophysical mechanism for dendritic plateau potential generation, where spatiotemporally clustered inputs must be co-activated on a single branch, and the voltage compartmentalization of the branch and the voltage-dependence of NMDARs is not enough, but additionally glutamate spillover from neighboring synapses must activate extrasynaptic NMDARs. If this specific biophysical implementation of dendritic plateau potentials is essential to the findings in this study, the authors have not made that connection clear. If it is a simple threshold nonlinearity in dendrites that is important for the model, and not the specific underlying biophysical mechanisms, then the study does not appear to provide a conceptual advance over previous studies demonstrating nonlinear feature binding with simpler implementations of dendritic nonlinearities.

      We appreciate the feedback on the hypothesized role of glutamate spillover in our model. The current manuscript and Trpevski et al. (2023) emphasize glutamate spillover as a plausible biophysical mechanism for generating sufficiently robust and supralinear plateau potentials, but we acknowledge that supralinear dendritic integration might not depend solely on this specific mechanism in other types of neurons. In Trpevski et al. (2023), however, we found that when dendritic plateaus were too ‘graded’, as they are with the quite shallow Mg-block reported in experiments, it was difficult to solve the NFBP. The conceptual advance of our study lies in demonstrating that sufficiently nonlinear dendritic integration is needed, and that in SPNs this can be accounted for by assuming spillover. Regardless of its biophysical source (e.g. NMDA spillover, steeper NMDA Mg-block activation curves, or other voltage-dependent conductances that cause supralinear dendritic integration), such nonlinearity enables biophysically detailed neurons to solve the nonlinear feature binding problem. To address this point and clarify the generality of our conclusions, we have revised the relevant sections in the manuscript to state this explicitly.

      (3) Prior work has utilized "sliding-threshold," BCM-like plasticity rules to achieve neuronal selectivity and stability in synaptic weights. Other work has shown coordinated excitatory and inhibitory plasticity. The current manuscript combines "metaplasticity" at excitatory synapses with suppression of inhibitory strength onto strongly activated branches. This resembles the lateral inhibition scheme proposed by Olshausen (Christopher J. Rozell, Don H. Johnson, Richard G. Baraniuk, Bruno A. Olshausen; Sparse Coding via Thresholding and Local Competition in Neural Circuits. Neural Comput 2008; 20 (10): 2526-2563. doi: https://doi.org/10.1162/neco.2008.03-07-486). However, the complexity of the biophysical model makes it difficult to evaluate the relative importance of the additional complexity of the learning scheme.

      We initially tried solving the NFBP with only excitatory plasticity, which worked reasonably well, especially if we assume a small population of neurons collaborates under physiological conditions. However, we observed that plateau potentials from distally located inputs were less effective, and we now explain this limitation in the revised manuscript (page 14, lines 23-37).

      To address this, we added inhibitory plasticity inspired by mechanisms discussed in Castillo et al. (2011), Ravasenga et al., and Chapman et al. (2022), as now explicitly stated in the text (page 32, lines 23-26). While our GABA plasticity rule is speculative, it demonstrates that distal GABAergic plasticity can enhance nonlinear computations. These results are particularly encouraging, as they show that implementing these mechanisms at the single-neuron level produces behavior consistent with network-level models like BCM-like plasticity rules and those proposed by Rozell et al. We hope this will inspire further experimental work on inhibitory plasticity mechanisms.

      P2, paragraph 2: Grammar: "multiple dendritic regions, preferentially responsive to different input values or features, are known to form with close dendritic proximity." The meaning is not clear. "Dendritic regions" do not "form with close dendritic proximity."

      Rewritten (current page 2, line 35)

      P5, paragraph 3: Grammar: I think you mean "strengthened synapses" not "synapses strengthened".

      Rewritten (current page 14, line 36)

      P8, paragraph 1: Grammar: "equally often" not "equally much".

      Updated (current page 10, line 2)

      P8, paragraph 2: "This is because of the learning rule that successively slides the LTP NMDA Ca-dependent plasticity kernel over training." It is not clear what is meant by "sliding," either here or in the Methods. Please clarify.

      We have updated the text and removed the word “sliding” throughout the manuscript to clarify that the calcium dependence of the kernels is in fact updated.

      P10, Figure 3C (left): After reading the accompanying text on P8, para 2, I am left not understanding what makes the difference between the two groups of synapses that both encode "yellow," on the same dendritic branch (d1) (so both see the same plateau potentials and dopamine) but one potentiates and one depresses. Please clarify.

      Some "yellow" and "banana" synapses are initialized with weak conductances, limiting their ability to learn due to the relatively slow dynamics of the LTP kernel. These weak synapses fail to reach the calcium thresholds necessary for potentiation during a dopamine peak, yet they remain susceptible to depression under LTD conditions. Initially, the dynamics of the LTP kernel do not allow significant potentiation, even in the presence of appropriate signals such as plateau potentials and dopamine (page 10, lines 22–26). We have added a more detailed explanation of how the learning rule operates in the section “Characterization of the Synaptic Plasticity Rule” on page 9 and have clarified the specific reason why the weaker yellow synapses undergo LTD (page 11, lines 1–7).

      As shown in Supplementary Figure 6, during subthreshold learning, the initial conductance is also low, which similarly hinders the synapses' ability to potentiate. However, with sufficient dopamine, the LTP kernel adapts by shifting closer to the observed calcium levels, allowing these synapses to eventually strengthen. This dynamic highlights how the model enables initially weak synapses to "catch up" under consistent activation and favorable dopaminergic conditions.
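
      The "catch up" behavior described above can be illustrated with a minimal toy sketch. It is not the model's actual rule: the bell-shaped kernel, the fixed shift rate, and all numerical values are assumptions chosen for illustration, and the function names are hypothetical. The only point is that a kernel whose midpoint moves toward the observed calcium level on each dopamine peak eventually gives a weak synapse a non-negligible LTP drive.

```python
import math


def ltp_kernel(ca, midpoint, width=0.5):
    """Assumed bell-shaped LTP drive: largest when calcium sits at the kernel midpoint."""
    return math.exp(-((ca - midpoint) / width) ** 2)


def shift_midpoint(midpoint, ca, rate=0.1):
    """Assumed metaplasticity step: move the midpoint a fixed fraction toward the observed calcium."""
    return midpoint + rate * (ca - midpoint)


def simulate(ca=1.0, midpoint=3.0, n_peaks=30):
    """Apply one midpoint shift per dopamine peak and record the LTP drive before each shift."""
    history = []
    for _ in range(n_peaks):
        history.append(ltp_kernel(ca, midpoint))
        midpoint = shift_midpoint(midpoint, ca)
    return midpoint, history


final_midpoint, drive = simulate()
print(f"LTP drive at first peak: {drive[0]:.2e}, at last peak: {drive[-1]:.2f}")
```

      In this sketch the weak synapse starts with essentially no LTP drive because its calcium level lies far from the kernel midpoint; the drive grows only as repeated dopamine peaks pull the midpoint toward that calcium level.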

      P9, paragraph 1: The phrase "the metaplasticity kernel" is introduced here without prior explanation or motivation for including this level of complexity in the model. Please set it up before you use it.

      A sentence introducing metaplasticity has been added to the introduction (page 3, lines 36-42) as well as on page 9, where the kernel is introduced (page 9, lines 26-35).

      P10, Figure 3D: "kernel midline" is not explained.

      We have replotted Figure 3 to make it easier to understand what is shown. An explanation of the kernel midpoint has also been added to the legend (current page 12, line 19).

      P11, paragraph 1; P13, Fig. 4C: My interpretation of these data is that clustered connectivity with specific branches is essential for the performance of the model. Randomly distributing input features onto branches (allowing all 4 features to innervate single branches) results in poor performance. This is bad, right? The model can't learn unless a specific pre-wiring is assumed. There is not much interpretation provided at this stage of the manuscript, just a flat description of the result. Tell the reader what you think the implications of this are here.

      Thanks for the suggestion - we have updated this section of the manuscript, adding an interpretation of the results that the model often fails to learn both relevant stimuli if all four features are clustered onto the same dendrite (page 13, lines 31-42). 

      In summary, when multiple feature combinations are encoded in the same dendrite with similar conductances, the ability to determine which combination to store depends on the dynamics of the other dendrite. Small variations in conductance, training order, or other stochastic factors can influence the outcome. This challenge, known as the symmetry-breaking problem, has been previously acknowledged in abstract neuron models (Legenstein and Maass, 2011). To address this, additional mechanisms such as branch plasticity—amplifying or attenuating the plateau potential as it propagates from the dendrite to the soma—can be employed (Legenstein and Maass, 2011). 

      P12, paragraph 2; P13, Figure 4E: This result seems suboptimal, that only synapses at a very specific distance from the soma can be used to effectively learn to solve a NFBP. It is not clear to what extent details of the biophysical and morphological model are contributing to this narrow distance-dependence, or whether it matches physiological data.

      We have added Figure 5—figure supplement 1A to clarify why distal synapses may not optimally contribute to learning. This figure illustrates how inhibitory plasticity improves performance by reducing excessive LTD at distal dendrites, thereby enhancing stimulus discrimination. Relevant explanations have been integrated into Page 18, Lines 25-39 in the revised manuscript.

      P14, paragraph 2: Now the authors are assuming that inhibitory synapses are highly tuned to stimulus features. The tuning of inhibitory cells in the hippocampus and cortex is controversial but seems generally weaker than excitatory cells, commensurate with their reduced number relative to excitatory cells. The model has accumulated a lot of assumptions at this point, many without strong experimental support, which again might make more sense when proposing a new theory, but this stitching together of complex mechanisms does not provide a strong intuition for whether the scheme is either biologically plausible or performant for a general class of problem.

      We acknowledge that it is not currently known whether inhibitory synapses in the striatum are tuned to stimulus features. However, given that the striatum is a purely inhibitory structure, it is plausible that lateral inhibition from other projection neurons could be tuned to features, even if feedforward inhibition from interneurons is not. Therefore, we believe this assumption is reasonable in the context of our model. As noted earlier, the GABA plasticity rule in our study is speculative. However, we hope that our work will encourage further experimental investigations, as we demonstrate that if GABAergic inputs are sufficiently specific, they can significantly enhance computations (this is discussed on page 17, lines 8-15).

      P16, Figure 5E legend: The explanation of the meaning of T_max and T_min in the legend and text needs clarification.

      The abbreviations T<sub>min</sub> and T<sub>max</sub> have been updated to CTL and CTH to better reflect their role in calcium threshold tracking. The Figure 5E legend and relevant text have been revised for clarity. Additionally, the Methods section has been reorganized for better readability.

      P16, Figure 5B, C: When the reader reaches this paper, the conundrums presented in Figure 4 are resolved. The "winner-takes-all" inhibitory plasticity both increases the performance when all features are presented to a single branch and increases the range of somatodendritic distances where synapses can effectively be used for stimulus discrimination. The problem, then, is in the narrative. A lot more setup needs to be provided for the question related to whether or not dendritic nonlinearity and synaptic inhibition can be used to perform the NFBP. The authors may consider consolidating the results of Fig. 4 and 5 so that the comparison is made directly, rather than presenting them serially without much foreshadowing.

      In order to facilitate readability, we have updated the following sections of the manuscript to clarify how inhibitory plasticity resolves challenges from Figure 4:

      Figure 5B and Figure 5–figure supplement 1B: Two new panels illustrate the role of inhibitory plasticity in addressing symmetry problems.

      Figure 5–figure supplement 1A: Shows how inhibitory plasticity extends the effective range of somatodendritic distances.

      P18, Figure 6: This should be the most important figure, finally tying in all the previous complexity to show that NFBP can be partially solved with E and I plasticity even when features are distributed randomly across branches without clustering. However, now bringing in the comparison across spillover models is distracting and not necessary. Just show us the same plateau generation model used throughout the paper, with and without inhibition.

      Figure updated. Accumulative spillover and no-spillover conditions have been removed.

      P18, paragraph 2: "In Fig. 6C, we report that a subset of neurons (5 out of 31) successfully solved the NFBP." This study could be significantly strengthened if this phenomenon could (perhaps in parallel) be shown to occur in a simpler model with a simpler plateau generation mechanism. Furthermore, it could be significantly strengthened if the authors could show that, even if features are randomly distributed at initialization, a pruning mechanism could gradually transition the neuron into the state where fewer features are present on each branch, and the performance could approach the results presented in Figure 5 through dynamic connectivity.

      Modeling structural plasticity is a good suggestion that should be investigated in later work; however, we feel that it goes beyond what we can do in the current manuscript. We now acknowledge that structural plasticity might play a role. For example, we show that if we assume ‘branch-specific’ spillover, which leads to sufficient development of local dendritic nonlinearities, the neuron can also learn with distributed inputs. In reality, structural plasticity is likely important here, as we now state (current page 22, lines 35-42).

      P17, paragraph 2: "As shown in Fig. 6B, adding the hypothetical nonlinearities to the model increases the performance towards solving part of the NFBP, i.e. learning to respond to one relevant feature combination only. The performance increases with the amount of nonlinearity." This is not shown in Figure 6B.

      Sentence removed. We have added Figure 6 - figure supplement 1 to better explain the limitations.

      P22, paragraph 1: The "w" parameter here is used to determine whether spatially localized synapses are co-active enough to generate a plateau potential. However, this is the same w learned through synaptic plasticity. Typically LTP and LTD are thought of as changing the number of postsynaptic AMPARs. Does this "w" also change the AMPAR weight in the model? Do the authors envision this as a presynaptic release probability quantity? If so, please state that and provide experimental justification. If not, please justify modifying the activation of postsynaptic NMDARs through plasticity.

      This is an important remark. Our plasticity model differs from classical LTP models as it depends on the link between LTP and increased spillover as described by Henneberger et al. (2020).

      We have updated the Methods section (page 27, lines 6-11). We acknowledge, however, that in a real cell, learning might first strengthen the AMPA component, but after learning the NMDA/AMPA ratio is unchanged (Watt et al., 2004). This rebalancing between NMDA and AMPA might be a slower process.

      Reviewer #2 (Public Review):

      Summary:

      The study explores how single striatal projection neurons (SPNs) utilize dendritic nonlinearities to solve complex integration tasks. It introduces a calcium-based synaptic learning rule that incorporates local calcium dynamics and dopaminergic signals, along with metaplasticity to ensure stability for synaptic weights. Results show SPNs can solve the nonlinear feature binding problem and enhance computational efficiency through inhibitory plasticity in dendrites, emphasizing the significant computational potential of individual neurons. In summary, the study provides a more biologically plausible solution to single-neuron learning and gives further mechanical insights into complex computations at the single-neuron level.

      Strengths:

      The paper introduces a novel learning rule for training a single multicompartmental neuron model to perform nonlinear feature binding tasks (NFBP), highlighting two main strengths: the learning rule is local, calcium-based, and requires only sparse reward signals, making it highly biologically plausible, and it applies to detailed neuron models that effectively preserve dendritic nonlinearities, contrasting with many previous studies that use simplified models.

      Weaknesses:

      I am concerned that the manuscript was submitted too hastily, as evidenced by the quality and logic of the writing and the presentation of the figures. These issues may compromise the integrity of the work. I would recommend a substantial revision of the manuscript to improve the clarity of the writing, incorporate more experiments, and better define the goals of the study.

      Thanks for the valuable feedback. We have now gone through the whole manuscript, updating the text, improving the figures, and adding some supplementary figures to better explain model mechanisms. In particular, we now state our goal more clearly in the introduction.

      Major Points:

      (1) Quality of Scientific Writing: The current draft does not meet the expected standards. Key issues include:

      i. Mathematical and Implementation Details: The manuscript lacks comprehensive mathematical descriptions and implementation details for the plasticity models (LTP/LTD/Meta) and the SPN model. Given the complexity of the biophysically detailed multicompartment model and the associated learning rules, the inclusion of only nine abstract equations (Eq. 1-9) in the Methods section is insufficient. I was surprised to find no supplementary material providing these crucial details. What parameters were used for the SPN model? What are the mathematical specifics for the extra-synaptic NMDA receptors utilized in this study? For instance, Eq. 3 references [Ca2+]-does this refer to calcium ions influenced by extra-synaptic NMDARs, or does it apply to other standard NMDARs? I also suggest the authors provide pseudocodes for the entire learning process to further clarify the learning rules.

      The model is quite detailed but builds on previous work. For this reason, for model components used in earlier published work (and where models are already available via model repositories, such as ModelDB), we refer the reader to these resources in order to improve readability and to highlight what is novel in this paper: the learning rule itself. The learning rule is now explained in detail. For modelers who want to run the model, we have also provided a GitHub link to the simulation code. We hope this is a reasonable compromise for all readers, i.e., those who only want to understand what is new here (the learning rule) and those who also want to test the model code. We explain this to the readers at the beginning of the Methods section.

      ii. Figure quality. The authors seem not to carefully typeset the images, resulting in overcrowding and varying font sizes in the figures. Some of the fonts are too small and hard to read. The text in many of the diagrams is confusing. For example, in Panel A of Figure 3, two flattened images are combined, leading to small, distorted font sizes. In Panels C and D of Figure 7, the inconsistent use of terminology such as "kernels" further complicates the clarity of the presentation. I recommend that the authors thoroughly review all figures and accompanying text to ensure they meet the expected standards of clarity and quality.

      Thanks for directing our attention to these oversights. We have gone through the entire manuscript, updating the figures where needed, and we are making sure that the text and the figure descriptions are clear and adequate and use consistent terminology for all quantities.

      iii. Writing clarity. The manuscript often includes excessive and irrelevant details, particularly in the mathematical discussions. On page 24, within the "Metaplasticity" section, the authors introduce the biological background to support the proposed metaplasticity equation (Eq. 5). However, much of this biological detail is hypothesized rather than experimentally verified. For instance, the claim that "a pause in dopamine triggers a shift towards higher calcium concentrations while a peak in dopamine pushes the LTP kernel in the opposite direction" lacks cited experimental evidence. If evidence exists, it should be clearly referenced; otherwise, these assertions should be presented as theoretical hypotheses. Generally, Eq. 5 and related discussions should be described more concisely, with only a loose connection to dopamine effects until more experimental findings are available.

      The “Metaplasticity” section (pages 30-32) has been updated to be more concise, and the abundant references to dopamine have been removed.

      (2) Goals of the Study: The authors need to clearly define the primary objective of their research. Is it to showcase the computational advantages of the local learning rule, or to elucidate biological functions?

      We have explicitly stated our goal in the introduction (page 4, lines 19-22). Please also see the response to reviewer 1.

      i. Computational Advantage: If the intent is to demonstrate computational advantages, the current experimental results appear inadequate. The learning rule introduced in this work can only solve for four features, whereas previous research (e.g., Bicknell and Hausser, 2021) has shown capability with over 100 features. It is crucial for the authors to extend their demonstrations to prove that their learning rule can handle more than just three features. Furthermore, the requirement to fine-tune the midpoint of the synapse function indicates that the rule modifies the "activation function" of the synapses, as opposed to merely adjusting synaptic weights. In machine learning, modifying weights directly is typically more efficient than altering activation functions during learning tasks. This might account for why the current learning rule is restricted to a limited number of tasks. The authors should critically evaluate whether the proposed local learning rule, including meta-plasticity, actually offers any computational advantage. This evaluation is essential to understand the practical implications and effectiveness of the proposed learning rule.

      Thank you for your feedback. To address the concern regarding feature complexity, we extended our simulations to include learning with 9 and 25 features, achieving accuracies of 80% and 75%, respectively (Figure 6—figure supplement 1A). While our results demonstrate effective performance, the absence of external stabilizers—such as error-modulated functions used in prior studies like Bicknell and Hausser (2021)—means that the model's performance can be more sensitive to occasional incorrect outcomes. For instance, while accuracy might reach 90%, a few errors can significantly affect overall performance due to the lack of mechanisms to stabilize learning.

      In order to clarify the setup of the rule, we have added pseudocode in the revised manuscript (Pages 31-32) detailing how the learning rule and metaplasticity update synaptic weights based on calcium and dopamine signals. Additionally, we have included pseudocode for the inhibitory learning rule on Pages 34-35. In future work, we also aim to incorporate biologically plausible mechanisms, such as dopamine desensitization, to enhance stability.

      ii. Biological Significance: If the goal is to interpret biological functions, the authors should dig deeper into the model behaviors to uncover their biological significance. This exploration should aim to link the observed computational features of the model more directly with biological mechanisms and outcomes.

      As now clearly stated in the introduction, the goal of the study is to see whether and to what quantitative extent the theoretical solution of the NFBP proposed in Tran-Van-Minh et al. (2015) can be achieved with biophysically detailed neuron models and with a biologically inspired learning rule. The problem has so far been solved with abstract and phenomenological neuron models (Schiess et al., 2014; Legenstein and Maass, 2011) and also with a detailed neuron model but with a precalculated voltage-dependent learning rule (Bicknell and Häusser, 2021).

      We have also tried to better explain the model mechanisms by adding supplementary figures.

      Reviewer #2 (Recommendations For The Authors):

      Minor:

      (1) The [Ca]NMDA in Figure 2A and 2C can have large values even when very few synapses are activated. Why is that? Is this setting biologically realistic?

      The elevated [Ca²⁺]NMDA with minimal synaptic activation arises from high spine input resistance, small spine volume, and NMDA receptor conductance, which scales calcium influx with synaptic strength. Physiological studies report spine calcium transients typically up to ~1 μM (Franks and Sejnowski 2002, DOI: 10.1002/bies.10193), while our model shows ~7 μM for 0.625 nS and ~3 μM for 0.5 nS, exceeding this range. The calcium levels of the model might therefore be somewhat high compared to biologically measured levels; however, this does not impact the learning rule, as the functional dynamics of the rule remain robust across calcium variations.

      (2) In the distributed synapses section, the study introduces two new mechanisms "Threshold spillover" and "Accumulative spillover". Both mechanisms are not basic concepts but quantitative descriptions of them are missing.

      Thank you for your feedback. Based on the recommendations from Reviewer 1, we have simplified the paper by removing the "Accumulative spillover" and focusing solely on the "Thresholded spillover" mechanism. In the updated version of the paper, we refer to it only as glutamate spillover. However, we acknowledge (page 22, lines 40-42) that to create sufficient non-linearities, other mechanisms, like structural plasticity, might also be involved (although testing this in the model will have to be postponed to future work).

      (3) The learning rule achieves moderate performance when feature-relevant synapses are organized in pre-designed clusters, but for more general distributed synaptic inputs, the model fails to faithfully solve the simple task (with its performance of ~ 75%). Performance results indicate the learning rule proposed, despite its delicate design, is still inefficient when the spatial distribution of synapses grows complex, which is often the case on biological neurons. Moreover, this inefficiency is not carefully analyzed in this paper (e.g. why the performance drops significantly and the possible computation mechanism underlying it).

      The drop in performance when using distributed inputs (to a mean performance of 80%) is similar to the mean performance in the same situation in Bicknell and Hausser (2021), see their Fig. 3C. The drop in performance occurs because: i) the relevant feature combinations are not often colocalized on the same dendrite so that they can be strengthened together, and ii) even if they are, there may not be enough synapses to trigger the supralinear response from the branch spillover mechanism, i.e. the inputs are not summated in a supralinear way (Fig. 6B, most input configurations only reach 75%).

      Because of this, at most one relevant feature combination can be learned. In the cases where the random distribution of synapses is favorable for both relevant feature combinations to be learned, the NFBP is solved (Fig. 6B, where some performance lines reach 100%, and Fig. 6C, which shows an example of such a case). We have extended the relevant sections of the paper to highlight the above-mentioned mechanisms.

      Further, the theoretical results in Tran-Van-Minh et al. (2015) already show that solving the NFBP with supralinear dendrites requires features to be pre-clustered in order to evoke the supralinear dendritic response, which would activate the soma. The same number of synapses distributed across the dendrites i) would not excite the soma as strongly, and ii) would summate in the soma as in a point neuron, i.e. no supralinear events can be activated, which are necessary to solve the NFBP. Hence, one does not expect distributed synaptic inputs to solve the NFBP with any kind of learning rule.
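
      The argument above can be checked on a deliberately abstract toy example. The sketch below is not the biophysical model: the two-colour, two-shape task definition, the hard plateau threshold per branch, and the coarse weight grid are all assumptions made for illustration, and every name in it is hypothetical. It compares a neuron with clustered inputs, a neuron whose synapses each sit alone on a branch, and a linear point neuron.

```python
from itertools import product

COLOURS = ("red", "yellow")
SHAPES = ("strawberry", "banana")
FEATURES = COLOURS + SHAPES
# Assumed task definition: two relevant pairs; the other two colour-shape pairs are irrelevant.
RELEVANT = {frozenset(("red", "strawberry")), frozenset(("yellow", "banana"))}
STIMULI = [frozenset(pair) for pair in product(COLOURS, SHAPES)]


def dendritic_neuron(stimulus, branches, plateau_threshold=2):
    """Fire if any branch has enough co-active synapses to cross its plateau threshold."""
    return any(len(stimulus & branch) >= plateau_threshold for branch in branches)


def point_neuron(stimulus, weights, threshold):
    """Fire if the linear sum of the active feature weights reaches the somatic threshold."""
    return sum(weights[feature] for feature in stimulus) >= threshold


def solves_nfbp(neuron):
    """True if the neuron fires for exactly the relevant feature combinations."""
    return all(neuron(s) == (s in RELEVANT) for s in STIMULI)


def point_solution_exists(grid):
    """Brute-force search for point-neuron weights and a threshold that solve the task."""
    for ws in product(grid, repeat=len(FEATURES)):
        weights = dict(zip(FEATURES, ws))
        for theta in grid:
            if solves_nfbp(lambda s: point_neuron(s, weights, theta)):
                return True
    return False


# Clustered: each relevant pair shares a branch. Scattered: one feature per branch.
clustered = [frozenset(("red", "strawberry")), frozenset(("yellow", "banana"))]
scattered = [frozenset((feature,)) for feature in FEATURES]

clustered_ok = solves_nfbp(lambda s: dendritic_neuron(s, clustered))
scattered_ok = solves_nfbp(lambda s: dendritic_neuron(s, scattered))
point_ok = point_solution_exists([0.0, 0.5, 1.0, 1.5, 2.0])
print(clustered_ok, scattered_ok, point_ok)
```

      Under these assumptions only the clustered arrangement solves the task. The point neuron fails for any weights, not just those on the grid: the two relevant pairs and the two irrelevant pairs sum to the same total weight, so both relevant pairs cannot reach a threshold that both irrelevant pairs stay below.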

      (4) Figure 5B demonstrates that on average adding inhibitory synapses can enhance the learning capabilities to solve the NFBP for different pattern configurations (2, 3, or 4 features), but since the performance for excitatory-only setup varies greatly between different configurations (Figure 4B, using 2 or 3 features can solve while 4 cannot), can the results be more precise about whether adding inhibitory synapses can help improve the learning with 4 features?

      In response to the question, we added a panel to Figure 5B showing that without inhibitory synapses, 5 out of 13 configurations with four features successfully learn, while with inhibitory synapses, this improves to 7 out of 13. Figure 5—figure supplement 1B provides an explanation for this improvement (page 18, lines 10-24).

      (5) Also, in terms of the possible role of inhibitory plasticity in learning, as only on-site inhibition is studied here, can other types of inhibition be considered, like on-path or off-path? Do they have similar or different effects?

      This is an interesting suggestion for future work. We observed relevant dynamics in Figure 6A, where inhibitory synapses increased their weights on-site when randomly distributed. Previous work by Gidon and Segev (2012) examined the effects of different inhibitory types on NMDA clusters, highlighting the role of on-site and off-path inhibition in shunting. In our context, on-site inhibition in the same branch appears more relevant for maintaining compartmentalized dendritic processing.

      (6) Figure 6A is mentioned in the context of excitatory-only setup, but it depicts the setup when both excitatory and inhibitory synapses are included, which is discussed later in the paper. A correction should be made to ensure consistency.

      We have updated the figure and the text to make it clearer that simulations are run both with and without inhibition in this context (page 21, lines 4-13).

      (7) In the "Ca and kernel dynamics" plots (Fig 3,5), some of the kernel midlines (solid line) are overlapped by dots, e.g. the yellow line in Fig 3D, and some kernel midlines look like dots, which leads to confusion. Suggest to separate plots of Ca and kernel dynamics for clarity. 

      The design of the figures has been updated to improve the visibility of the calcium and kernel dynamics during training.

      (8) The formulations of the learning rule are not well-organized, and the naming of parameters is kind of confusing, e.g. T_min, T_max, which by default represent time, means "Ca concentration threshold" here.

      The abbreviations of the thresholds (T<sub>min</sub>, T<sub>max</sub> in the initial version) have been updated to CTL and CTH, respectively, to better reflect their role in tracking calcium levels. The mathematical formulations have further been reorganized for better readability. The revised Methods section now follows a more structured flow, first explaining the learning mechanisms, followed by the equations and their dependencies.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, the authors use a large dataset of neuroscience publications to elucidate the nature of self-citation within the neuroscience literature. The authors initially present descriptive measures of self-citation across time and author characteristics; they then produce an inclusive model to tease apart the potential role of various article and author features in shaping self-citation behavior. This is a valuable area of study, and the authors approach it with a rich dataset and solid methodology.

      The revisions made by the authors in this version have greatly improved the validity and clarity of the statistical techniques, and as a result the paper's findings are more convincing.

      This paper's primary strengths are: 1) its comprehensive dataset that allows for a snapshot of the dynamics of several related fields; 2) its thorough exploration of how self-citation behavior relates to characteristics of research and researchers.

      Thank you for your positive view of our paper and for your previous comments.

      Its primary weakness is that the study stops short of digging into potential mechanisms in areas where it is potentially feasible to do so - for example, studying international dynamics by identifying and studying researchers who move between countries, or quantifying more or less 'appropriate' self-citations via measures of abstract text similarity.

      We agree that these are limitations of the existing study. We updated the limitations section as follows (page 15, line 539):

      “Similarly, this study falls short in several potential mechanistic insights, such as by investigating citation appropriateness via text similarity or international dynamics in authors who move between countries.”

      Yet while these types of questions were not determined to be in scope for this paper, the study is quite effective at laying the important groundwork for further study of mechanisms and motivations, and will be a highly valuable resource for both scientists within the field and those studying it.

      Reviewer #2 (Public review):

      The study presents valuable findings on self-citation rates in the field of Neuroscience, shedding light on potential strategic manipulation of citation metrics by first authors, regional variations in citation practices across continents, gender differences in early-career self-citation rates, and the influence of research specialization on self-citation rates in different subfields of Neuroscience. While some of the evidence supporting the claims of the authors is solid, some of the analysis seems incomplete and would benefit from more rigorous approaches.

      Thank you for your comments. We have addressed your suggestions presented in the “Recommendations for the authors” section by performing your recommended sensitivity analysis that specifically identifies authors who could be considered neurologists, neuroscientists, and psychiatrists (as opposed to just papers that are published in these fields). Please see the “Recommendations for the authors” section for more details.

      Reviewer #3 (Public review):

      This paper analyses self-citation rates in the field of Neuroscience, comprising in this case, Neurology, Neuroscience and Psychiatry. Based on data from Scopus, the authors identify self-citations, that is, whether references from a paper by some authors cite work that is written by one of the same authors. They separately analyse this in terms of first-author self-citations and last-author self-citations. The analysis is well-executed and the analysis and results are written down clearly. The interpretation of some of the results might prove more challenging. That is, it is not always clear what is being estimated.

      This issue of interpretability was already raised in my review of the previous revision, where I argued that the authors should take a more explicit causal framework. The authors have now revised some of the language in this revision, in order to downplay causal language. Although this is perfectly fine, this misses the broader point, namely that it is not clear what is being estimated. Perhaps it is best to refer to Lundberg et al. (2021) and ask the authors to clarify "What is your Estimand?" In my view, the theoretical estimands the authors are interested in are causal in nature. Perhaps the authors would argue that their estimands are descriptive. In either case, it would be good if the authors could clarify that theoretical estimand.

      Thank you for your comment and for highlighting this insightful paper. After reading this paper, we believe that our theoretical estimand is descriptive in nature. For example, in the abstract of our paper, we state: “This work characterizes self-citation rates in basic, translational, and clinical Neuroscience literature by collating 100,347 articles from 63 journals between the years 2000-2020.” This goal seems consistent with the idea of a descriptive estimand, as we are not interested in any particular intervention or counterfactual at this stage. Instead, we seek to provide a broad characterization of subgroup differences in self-citations such that future work can ask more focused questions with causal estimands.

      Our analysis included subgroup means and generalized additive models, both of which were described as empirical estimands for a theoretical descriptive estimand in Lundberg et al. We added the following text to the paper (page 3, line 112):

      “Throughout this work, we characterized self-citation rates with descriptive, not causal, analyses. Our analyses included several theoretical estimands that are descriptive 17, such as the mean self-citation rates among published articles as a function of field, year, seniority, country, and gender. We adopted two forms of empirical estimands. First, we showed subgroup means in self-citation rates. We then developed smooth curves with generalized additive models (GAMs) to describe trends in self-citation rates across several variables.”
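      The first empirical estimand named above, a subgroup mean, can be sketched as follows. This is only a minimal illustration: the record layout, the function name, and the numbers are hypothetical, and averaging per-article rates (rather than pooling references across articles) is an assumption, not a statement of how the paper computed its rates.

      ```python
      from collections import defaultdict

      # Hypothetical records: (subgroup, self-citing references, total references).
      articles = [
          ("Neurology", 2, 40),
          ("Neurology", 4, 50),
          ("Psychiatry", 3, 30),
      ]


      def subgroup_mean_rates(records):
          """Mean per-article self-citation rate within each subgroup."""
          rates = defaultdict(list)
          for group, self_cited, total in records:
              if total > 0:  # skip articles without any references
                  rates[group].append(self_cited / total)
          # Assumption: average the per-article rates rather than pooling references.
          return {group: sum(r) / len(r) for group, r in rates.items()}


      means = subgroup_mean_rates(articles)
      ```

      The same function applies to any grouping variable (year, seniority, country, gender) by changing the first element of each record.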

      In addition, we added to the limitations section as follows (page 15, line 539):

      “Yet, this study may lay the groundwork for future works to explore causal estimands.”

      Finally, in my previous review, I raised the issue of when self-citations become "problematic". The authors have addressed this issue satisfactorily, I believe, and now formulate their conclusions more carefully.

      Thank you for your previous comments. We agree that they improved the paper.

      Lundberg, I., Johnson, R., & Stewart, B. M. (2021). What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory. American Sociological Review, 86(3), 532-565. https://doi.org/10.1177/00031224211004187

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Thank you for your thorough revisions and responses to the reviews.

      Reviewer #2 (Recommendations for the authors):

      I appreciate the authors' responses and am satisfied with all their replies except for my second comment. I still find the message conveyed slightly misleading, as the results seem to be generalized to neurologists, neuroscientists, and psychiatrists. It is important to refine the analysis to focus specifically on neuroscientists, identified as first or last authors based on their publication history. This approach is common in the science of science literature and would provide a more accurate representation of the findings specific to neuroscientists, avoiding the conflation with other related fields. This refinement could serve as a robustness check in the supplementary. I think adding this sub-analysis is essential to the validity of the results claimed in this paper.

      Thank you for your comment. We added a sensitivity analysis where fields are defined by an author’s publication history, not by the journal of each article.

      In the main text, we added the following:

      (Page 3, line 129) “When determining fields by each author’s publication history instead of the journal of each article, we observed similar rates of self-citation (Table S7). The 95% confidence intervals for each field definition overlapped in most cases, except for Last Author self-citation rates in Neuroscience (7.54% defined by journal vs. 8.32% defined by author) and Psychiatry (8.41% defined by journal vs. 7.92% defined by author).”

      Further details are provided in the methods section (page 21, line 801):

      “4.11 Journal-based vs. author-based field sensitivity analyses

      We refined our field-based analysis to focus only on authors who could be considered neuroscientists, neurologists, and psychiatrists. For each author, we looked at the number of articles they had in each subfield, as defined by Scopus. We considered 12 subfields that fell within Neurology, Neuroscience, and Psychiatry. These subfields are presented in Table S12. For each First Author and Last Author, we excluded them if any of their three most frequently published subfields did not include one of the 12 subfields of interest. If an author’s top three subfields included multiple broader fields (e.g., both Neuroscience and Psychiatry), then that author was categorized according to the field in which they published the most articles. Among First Authors, there were 86,220 remaining papers, split between 33,054 (38.33%) in Neurology, 23,216 (26.93%) in Neuroscience, and 29,950 (34.73%) in Psychiatry. Among Last Authors, there were 85,954 remaining papers, split between 31,793 (36.98%) in Neurology, 25,438 (29.59%) in Neuroscience, and 28,723 (33.42%) in Psychiatry.”
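      The author-based field assignment described above can be sketched roughly as follows. The subfield names and the mapping are hypothetical placeholders (the actual 12 subfields are those in Table S12). The sketch also rests on two assumptions: that an author is retained when at least one of their top three subfields is a subfield of interest, and that the field is chosen by summing article counts over those retained subfields.

      ```python
      from collections import Counter

      # Hypothetical subfield-to-field map; the real subfields are listed in Table S12.
      SUBFIELD_TO_FIELD = {
          "Clinical Neurology": "Neurology",
          "Cellular Neuroscience": "Neuroscience",
          "Cognitive Neuroscience": "Neuroscience",
          "Psychiatry and Mental Health": "Psychiatry",
      }


      def assign_field(subfield_counts):
          """Return the author's broader field, or None if the author is excluded."""
          top_three = [name for name, _ in Counter(subfield_counts).most_common(3)]
          # Assumed reading: keep the author if any top-three subfield is of interest.
          relevant = [name for name in top_three if name in SUBFIELD_TO_FIELD]
          if not relevant:
              return None
          # Sum article counts per broader field and pick the largest.
          per_field = Counter()
          for name in relevant:
              per_field[SUBFIELD_TO_FIELD[name]] += subfield_counts[name]
          return per_field.most_common(1)[0][0]


      example_author = {
          "Cellular Neuroscience": 10,
          "Cognitive Neuroscience": 8,
          "Psychiatry and Mental Health": 15,
          "Oncology": 2,
      }
      example_field = assign_field(example_author)
      ```

      Under these assumptions, the example author is assigned to Neuroscience because the two Neuroscience subfields together outnumber the Psychiatry articles. A stricter reading of the exclusion rule would change only the `relevant` check.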

      Reviewer #3 (Recommendations for the authors):

      I would like to thank the authors for their responses to the points that I raised. I do not have any new comments or further responses.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript reports that expression of the E. coli operon topAI/yjhQ/yjhP is controlled by the translation status of a small open reading frame that the authors have discovered and named toiL, located in the leader region of the operon. The authors propose the following model for topAI activation: Under normal conditions, toiL is translated but topAI is not expressed because of Rho-dependent transcription termination within the topAI ORF and because its ribosome binding site and start codon are trapped in an mRNA hairpin. Ribosome stalling at various codons of the toiL ORF, caused by the presence of some ribosome-targeting antibiotics, triggers an mRNA conformational switch which allows translation of topAI and, in addition, activation of the operon's transcription because the presence of translating ribosomes at the topAI ORF blocks Rho from terminating transcription. Even though the model is appealing and several experimental results support some aspects of it, several inconsistencies remain to be resolved. In addition, even though TopAI was shown to be an inhibitor of topoisomerase I (Yamaguchi & Inouye, 2015, NAR 43:10387), the authors suggest, without offering any experimental support, that, because ribosome-targeting antibiotics act as inducers, expression of the topAI/yjhQ/yjhP operon may confer resistance to these drugs.

      Strengths:

      - There is good experimental support of the transcriptional repression/activation switch aspect of the model, derived from well-designed transcriptional reporters and ChIP-qPCR approaches.

      - There is a clever use of the topAI-lacZ reporter to find the 23S rRNA mutants where expression of topAI was upregulated. This eventually led the authors to identify that translation events occurring at toiL are important to regulate the topAI/yjhQ/yjhP operon. Is there any published evidence that ribosomes with the identified mutations translate slowly (decreased fidelity does not necessarily mean slow translation, does it?)?

      G2253 is in helix 80 of the 23S rRNA, which has been proposed to be involved in correct positioning of the tRNA. Mutations in helix 80 have been reported to cause defects in peptidyl transferase center activity, which could reduce the rate of ribosome movement along the mRNA. If ribosomes are sufficiently slowed when translating toiL, this could induce expression of topAI. G1911 and Ψ1917 are in helix 69 of the 23S rRNA, which is involved in forming the inter-subunit bridge, as well as interactions with release factors. Mutations in helix 69 cause a decrease in the processivity of translation, suggesting that the mutations we identified may increase the occupancy of ribosomes within toiL, thereby inducing expression of topAI. We have added text to the Discussion section to include this speculation.

      - Authors incorporate relevant links to the antibiotic-mediated expression regulation of bacterial resistance genes. Authors can also mention the tryptophan-mediated ribosome stalling at the tnaC leader ORF that activates the expression of tryptophan metabolism genes through blockage of Rho-mediated transcriptional attenuation.

      We have added a citation to a recent structural study of ribosomes translating the tnaC uORF. Specifically, we speculate in the Discussion that toiL may have evolved to sense a ribosome-targeting antibiotic, or another ribosome-targeting small molecule such as an amino acid.

      Weaknesses:

      The main weaknesses of the work are related to several experimental results that are not consistent with the model, or related to a lack of data that needs to be included to support the model.

      The following are a few examples:

      - It is surprising that the authors do not mention that several published Ribo-seq data sets from E. coli cells show active translation of toiL (for example Li et al., 2014, Cell 157: 624). Therefore, it is hard to reconcile with the model that start codon/Shine-Dalgarno mutations in the toiL-lux reporter have no effect on luciferase expression (Figure 2C, bar graphs of the no antibiotic control samples).

      These data are for a topAI-lux reporter construct rather than toiL-lux. In our model, ribosome stalling within toiL is required to induce expression of the downstream genes; preventing translation of toiL by mutating the start codon or Shine-Dalgarno sequence would not cause ribosome stalling, consistent with the lack of an effect on topAI expression.

      - The SHAPE reactivity data shown in Figure 5A are not consistent with the toiL ORF being translated. In addition, it is difficult to visualize the effect of tetracycline on mRNA conformation with the representation used in Figure 5B. It would be better to show SHAPE reactivity without/with Tet (as shown in panel A of the figure).

      We have modified this figure (now Figure 6) so that we no longer show the SHAPE-seq data +/- tetracycline overlaid on the predicted RNA structure, since at best, the predicted structure likely represents only the uninduced state. We have included the predicted structure together with the SHAPE-seq data for untreated cells as a separate panel because it is part of the basis for our model. We have also added a supplementary figure showing a similar RNA structure prediction based on conservation of the topAI upstream region across species (Figure 6 – figure supplement 1), and we describe this in the text.

      - The "increased coverage" of topAI/yjhP/yjhQ in the presence of tetracycline from the Ribo-seq data shown in Figure 6A can be due to activation of translation, transcription, or both. For readers to know which of these possibilities apply, authors need to provide RNA-seq data and show the profiles of the topAI/yjhQ/yjhP genes in control/Tet-treated cells.

      A previous study (Li et al., 2014, PMID 24766808) compared RNA-seq and Ribo-seq data for E. coli to measure normalized ribosome occupancy for each gene. However, sequence coverage for topAI was too low to confidently quantify either the RNA-seq or the Ribo-seq data. Presumably RNA levels were low because of Rho termination. Hence, we were not confident that RNA-seq would provide information on the regulation of topAI-yjhQP. Other data in our study provide strong evidence that regulation is primarily at the level of translation. And the key conclusion from Figure 6 (now Figure 7) is that tetracycline stalls ribosomes on start codons.

      - Similarly, to support the data of increased ribosomal footprints at the toiL start codon in the presence of Tet (Figure 6B), authors should show the profile of the toiL gene from control and Tet-treated cells.

      Figure 6B shows data for both treated and untreated cells. The overall ribosome occupancy is much lower for untreated cells, making it difficult to draw strong conclusions about the relative distribution of ribosomes across toiL.

      - Representation of the mRNA structures in the model shown in Figure 5, does not help with visualizing 1) how ribosomes translate toiL since the ORF is trapped in double-stranded mRNA, and 2) how ribosome stalling on toiL would lead to the release of the initiation region of topAI to achieve expression activation.

      We now show the predicted structure with only SHAPE-seq data for untreated cells. The comparison of SHAPE-seq +/- tetracycline is shown without reference to the predicted structure.

      - The authors speculate that, because ribosome-targeting antibiotics act as expression inducers [by the way, authors should mention and comment that, more than a decade ago, it had been reported that kanamycin (PMID: 12736533) and gentamycin (PMID: 19013277) are inducers of topAI and yjhQ], the genes of the topAI/yjhQ/yjhP operon may confer resistance to these antibiotics. Such a suggestion can be experimentally checked by simply testing whether strains lacking these genes have increased sensitivity to the antibiotic inducers.

      We thank the reviewer for pointing out these references, which we now cite. The fact that another group found that gentamycin induces topAI expression – it is one of the most highly induced genes in that paper – strongly suggests that we missed the key inducing concentrations for one or more antibiotics, meaning that topAI is induced by even more ribosome-targeting antibiotics than we realized.

      We did some preliminary experiments to look for effects of TopAI, YjhQ, and/or YjhP on antibiotic sensitivity, but generated only negative results. Since these experiments were preliminary and far from exhaustive, we have chosen not to include them in the manuscript. Other studies of genes regulated by ribosome stalling in a uORF have looked at genes whose functions in responding to translation stress were already known, so the environmental triggers were more obvious. With so many possible triggers for topAI-yjhQP, it will likely require considerable effort to find the relevant trigger(s). Hence, we consider this an important question, but beyond the scope of this manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this important study, Baniulyte and Wade describe how the translation of an 8-codon uORF denoted toiL upstream of the topAI-yjhQP operon is responsive to different ribosome-targeting antibiotics, consequently controlling translation of the TopAI toxin as well as Rho-dependent termination within the gene.

      Strengths:

      I appreciate that the authors used multiple different approaches such as a genetic screen to identify factors such as 23S rRNA mutations that affect topAI expression and ribosome profiling to examine the consequences of various antibiotics on toiL-mediated regulation. The results are convincing and clearly described.

      Weaknesses:

      I have relatively minor suggestions for improving the manuscript. These mainly relate to the figures.

      Reviewer #3 (Public Review):

      Summary:

      The authors nicely show that the translation and ribosome stalling within the ToiL uORF upstream of the co-transcribed topAI-yjhQ toxin-antitoxin genes unmask the topAI translational initiation site, thereby allowing ribosome loading and preventing premature Rho-dependent transcription termination in the topAI region. Although similar translational/transcriptional attenuation has been reported in other systems, the base pairing between the leader sequence and the repressed region by the long RNA looping is somehow unique in toiL-topAI-yjhQP. The experiments are solidly executed, and the manuscript is clear in most parts with areas that could be improved or better explained. The real impact of such a study is not easy to appreciate due to a lack of investigation on the physiological consequences of topAI-yjhQP activation upon antibiotic exposure (see details below).

      Strengths:

      Conclusion/model is supported by the integrated approaches consisting of genetics, in vivo SHAPE-seq and Ribo-Seq.

      Provide an elegant example of cis-acting regulatory peptides to a growing list of functional small proteins in bacterial proteomes.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Examine the consequences of mutations impeding translation of the topAI/yjhQ/yjhP operon on cell growth in the presence and absence of antibiotics.

      See response to Reviewer 1’s comment.

      (2) Resolve discrepancies between the SHAPE data indicating constitutive sequestration of the toiL Shine Dalgarno sequence with antibiotic-regulated translation of the toiL ORF.

      See response to Reviewer 1’s comment.

      (3) Reconcile published Ribo-Seq data with the model that start codon/Shine-Dalgarno mutations in the toiL-lux reporter have no effect on luciferase expression in the absence of antibiotics.

      See response to Reviewer 1’s comment.

      (4) Clarify whether antibiotic MIC values were employed to select antibiotic concentrations for different experiments.

      The antibiotic concentrations we used are in line with reported MICs for E. coli. We now list the reported ECOFFs/MICs and include relevant citations.

      (5) Provide RNA-seq data to complement the Ribo-Seq data for the topAI/yjhQ/yjhP genes in control vs. Tet-treated cells.

      See response to Reviewer 1’s comment.

      (6) Revise the text to address as many of the reviewers' suggestions as reasonably possible.

      Changes to the text have been made as indicated in the responses to the reviewers’ comments.

      Reviewer #2 (Recommendations for the Authors):

      (1) Page 6: I would have liked to have more information about the 39 suppressor mutations in rho. Do any of the cis-acting mutations give support for the model proposed in Figure 8?

      We only know the specific mutation for some of the strains, and we now list those mutations in the Methods section. For other mutants, we mapped the mutation to either the rho gene or to Rho activity, but we did not sequence the rho gene. Most of the specific mutations we did identify fall within the primary RNA-binding site of Rho and hence should be considered partial-loss-of-function mutations (complete loss of function would be lethal).

      We identified cis-acting mutations by re-transforming the lacZ reporter plasmid into a wild-type strain. We did not sequence any of these plasmids.

      (2) Page 12-13, Section entitled "Mapping ribosome stalling sites induced by different antibiotics": This section should start with a better transition regarding the logic of why the experiments were carried out and should end with an interpretation of the results.

      We have added a few sentences at the start of this section to explain the rationale. We have also added two sentences at the end of this section to summarize the interpretation of the data.

      (3) Page 15: The authors should discuss under what conditions the expression of TopAI (and YjhQ/YjhP) might be induced. Is expression also elevated upon amino acid starvation?

      We have looked through public RNA-seq data but have not identified growth conditions other than antibiotic treatment that induce expression of topAI, yjhQ or yjhP.

      (4) References: The authors should be consistent about capitalization, italics, and abbreviations in the references.

      These formatting errors will be fixed in the proofing stage.

      (5) All graph figures: There should be more uniformity in the sizes of individual data points (some are almost impossible to see) and error bars across the figures.

      We have tried to make the data points and error bars more visible for figures where they were smaller.

      (6) Figure 1B: I do not think the left arrow labeling is very intuitive and suggest renaming these constructs.

      We have removed the arrows to improve clarity.

      (7) Figure 2A: toiL should be introduced at the first mention of Figure 2A.

      We have added a schematic of the topAI-yjhQ-yjhP region as Figure 1A, including the toiL ORF, which we briefly mention in the text. We have opted to split Figure 2C into two panels. In Figure 2C we now only show data for the wild-type construct. Data for the mutant constructs are now shown in a new figure (Figure 5), alongside data for the wild-type constructs. We have simplified Figure 2A, since the mutations are not relevant to this revised figure, and we now show the schematic with the mutations as Figure 5A.

      (8) Figure 3C and 3D: I suggest giving these graphs headings (or changing the color of the bars in Figure 3D) to make it more obvious that different things are measured in the two panels.

      We have added headers to panels B-D to make clear which graphs show ChIP-qPCR data and which graph shows qRT-PCR data.

      (9) Figure 6: It might be nice to show the topAI-yjhPQ operon here.

      We now show the operon in Figure 1A.

      (10) Figure 8: This figure could be optimized by adding 5' and 3' end labels and having more similarity with the model in Figure 7.

      The constructs shown in Figure 7 lack most of the topAI upstream region, so they aren’t readily comparable to the schematic in Figure 8. However, we have changed the color of the ribosome in Figure 7 to match that in Figure 8. We also indicate the 5’ end of the RNA in Figure 8.

      Reviewer #3 (Recommendations for the Authors):

      Areas to improve:

      (1) While it's important to learn about ToiL-dependent regulation of the downstream topAI-yjhQ toxin-antitoxin genes, the physiological consequence of topAI-yjhQ activation seems to be lost in the manuscript. Everything was done with a lacZ/lux reporter. In the absence of toiL translation (i.e. SD mutant) and/or ribosome stalling, does premature transcription termination result in non-stoichiometric synthesis of toxin vs. antitoxin, leading to growth arrest or another measurable phenotype? Knowing the impact of ToiL in the native topAI-yjhQ context will be valuable.

      See response to Reviewer 1’s comment.

      (2) It was indicated in Figure 4-figure supplement 1 that toiL homologs are found in many other proteobacteria. Do the UR sequences in those species also form a similar inhibitory RNA loop? The nt sequence identity of toiL is likely to be constrained by the base pairing of the topAI 5' region.

      We have added a supplementary figure panel showing an RNA structure prediction for the topAI upstream region based on sequence alignment of homologous regions from other species (Figure 6 – figure supplement 1).

      What is the genome-wide frequency of the MLENVII hepta-peptide in E. coli? Is the sequence disfavored to avoid spurious multi-antibiotic sensing?

      LENVII is not found in any annotated E. coli K-12 protein. However, this is a sufficiently long sequence that we would expect few to no instances in the E. coli proteome.

      (3) Figure 1A, it would be helpful to indicate the location of the toiL (red arrow as in Figure 2A) relative to the putative rut site early in the beginning of the results. Does TSS mark the transcription start site? There is no annotation of TSS in the figure legend. Was TSS previously mapped experimentally? Please include relevant citations.

      We now indicate the position of the TSS relative to the topAI start codon. Similarly, we indicate the position of the start of toiL relative to the topAI start codon in Figure 2A. We now explain “TSS” in the figure legend. There is a reference in the text for the TSS (Thomason et al., 2015).

      (4) Please consider rearranging the results section; it would perhaps be more helpful to introduce toiL in Figure 1 or earlier. The current format requires readers to switch back and forth between Figure 4 and Figure 2.

      We have added a schematic of the topAI upstream region as Figure 1A, and we have separated Figure 2C as described in a response to a comment from Reviewer 2.

      (5) Figure 2A and Figure 2-Figure Suppl 1A, for clarity, please mark the rut site upstream of the red arrow.

      Rather than mark the rut site on Figure 2A, which would make for a busy schematic, readers can compare the position of the rut site to that of toiL, which we have now added to Figures 1B (formerly Figure 1A) and 2A.

      (6) The following conclusion seems speculative: "...but does not trigger termination until RNAP ..., >180 nt further downstream…". Shouldn't the authors already know where the termination site is based on their previous Term-seq data (see Ref 1, Adams PP et al 2021)?

      Sites of Rho-dependent transcription termination cannot be mapped precisely from Term-seq data because exoribonucleases rapidly process the unstructured RNA 3’ ends.

      (7) Genetic screen: Please discuss why the 23S rRNA mutations that cause translational infidelity could promote topAI translation. Wouldn't the mutant ribosome be affected in translating toiL?

      See response to Reviewer 1’s comment.

      (8) Although antibiotic concentrations were provided in Figure 2 legend, please provide the MIC values of each antibiotic, e.g., in Table S2, for the tested E. coli strain, to inform readers how specific subinhibitory concentrations were chosen.

      See response to Reviewing Editor.

      (9) Please clarify the calculation of luciferase units on the y-axis of Figure 2A. Why is the scale drastically higher than that of Figure 7C using the same antibiotics?

      These reporter assays use different constructs. The reporter construct used for experiments in Figure 7 includes a portion of the ermCL gene and associated downstream sequence. We have enlarged Figure 7A to highlight the difference in reporter constructs.

      (10) Table S4 needs a few more details. It is unclear how those numbers in columns G-H were generated. Do those numbers correspond to ribosome density per nt/ORF?

      We have added footnotes to Table S4 to indicate that the numbers in columns G and H represent sequence read coverage normalized by region length and by the upper quartile of gene expression.
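      As a rough illustration of this kind of normalization (not the exact procedure used for Table S4), the sketch below divides read coverage by region length and then by the upper quartile of length-normalized coverage across genes. The function names, the gene table, and the choice of the inclusive quantile method are all assumptions made for the example.

      ```python
      import statistics


      def length_normalized(reads, length_nt):
          """Read coverage per nucleotide of the region."""
          return reads / length_nt


      def upper_quartile_normalized(region_reads, region_length, gene_table):
          """Scale a region's per-nt coverage by the upper quartile across genes.

          gene_table maps a gene name to (reads, length_nt); hypothetical layout.
          """
          densities = [
              length_normalized(reads, length)
              for reads, length in gene_table.values()
              if length > 0
          ]
          # Third cut point of the quartiles, i.e. the 75th percentile.
          upper_quartile = statistics.quantiles(densities, n=4, method="inclusive")[2]
          return length_normalized(region_reads, region_length) / upper_quartile


      # Hypothetical gene table with per-nt densities of 1, 2, 3, 4 and 5.
      genes = {
          "a": (100, 100),
          "b": (400, 200),
          "c": (900, 300),
          "d": (1600, 400),
          "e": (2500, 500),
      }
      normalized = upper_quartile_normalized(80, 10, genes)
      ```

      Dividing by the upper quartile rather than by total read count keeps the scale from being dominated by a few very highly expressed genes.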

      (11) Figure 5: if the SHAPE results are correct, the Shine-Dalgarno sequence of toiL is sequestered in the hairpin structure with and without tetracycline treatment. It is hard to see how translational initiation could occur efficiently. Please discuss.

      Our representation of the SHAPE-seq data was confusing since we overlaid the SHAPE-seq changes on a predicted structure that likely corresponds to the uninduced state. We hope that the new version of Figure 5 is clearer.

      We presume the reviewer is referring to the Shine-Dalgarno sequence of topAI rather than toiL, since the Shine-Dalgarno sequence of toiL is predicted to be unstructured even in the absence of tetracycline treatment. The ribosome-binding site of topAI is more accessible in cells treated with tetracycline, although the SHAPE-seq data suggest that this is a transient event. The binding of the initiating ribosome may also reduce reactivity in this region under inducing conditions. We now discuss this briefly in the text.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The authors repeatedly assert that an individual's behavior in the foraging assay depends on its prior history (particularly cultivation conditions). While this seems like a reasonable expectation, it is not fully fleshed out. The work would benefit from studies in which animals are raised on more or less abundant food before the behavioral task.

      Cultivation density: While we agree with the reviewer that testing the effects of varying bacterial density during animal development (cultivation) is an interesting experiment, it is not feasible at this time. We previously attempted this experiment but found it nontrivial to maintain stable bacterial density conditions over long timescales as this requires matching the rate of bacterial growth with the rate of bacterial consumption. Despite our best efforts, we have not been able to identify conditions that satisfy these requirements. Thus, we focused our revised manuscript to include only assertions about the effects of recent experiences and added this inquiry as a future direction (lines 618-624).

      (2) The authors convincingly show that the probability of particular behavioral outcomes occurring upon patch encounter depends on time-associated parameters (time since last patch encounter, time since last patch exploitation). There are two concerns here. First, it is not clear how these values are initialized - i.e., what values are used for the first occurrence of each behavioral state? More importantly, the authors don't seem to consider the simplest time parameter, the time since the start of the assay (or time since worm transfer). Transferring animals to a new environment can be associated with significant mechanical stimulus, and it seems quite possible that transferring animals causes them to enter a state of arousal. This arousal, which certainly could alter sensory function or decision-making, would likely decay with time. It would be interesting to know how well the model performs using time since assay starts as the only time-dependent parameter.

      Parameter Initialization: We thank the reviewer for pointing out an oversight in our methods section regarding the model parameter values used for the first encounter. We clarified the initialization of parameters in the manuscript (lines 1162-1179). In short, for the first patch encounter where k = 1:

      ρ<sub>k</sub> is the relative density of the first patch.

      τ<sub>s</sub> is the duration of time spent off food since the beginning of the recorded experiment. For the first patch, this is equivalent to the total time elapsed.

      ρ<sub>h</sub> is the approximated relative density of the bacterial patch on the acclimation plates (see Assay preparation and recording in Methods). Acclimation plates contained one large 200 µL patch seeded with OD<sub>600</sub> = 1 and grown for a total of ~48 hours. As with all patches, the relative density was estimated from experiments using fluorescent bacteria OP50-GFP as described in Bacterial patch density estimation in Methods.

      ρ<sub>e</sub> is equivalent to ρ<sub>h</sub>.
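      The initialization rules listed above for the first encounter (k = 1) can be summarized in a short sketch. The class, function, and argument names are hypothetical and the example values are placeholders. The sketch encodes only the four assignments stated above, not the model itself.

      ```python
      from dataclasses import dataclass


      @dataclass
      class EncounterCovariates:
          rho_k: float  # relative density of the patch being encountered
          tau_s: float  # time spent off food since the start of the recording
          rho_h: float  # for k = 1: approximate density of the acclimation-plate patch
          rho_e: float  # for k = 1: set equal to rho_h


      def init_first_encounter(first_patch_density, time_elapsed, acclimation_density):
          """Covariates for the first patch encounter (k = 1)."""
          return EncounterCovariates(
              rho_k=first_patch_density,
              tau_s=time_elapsed,  # equals total elapsed time for the first patch
              rho_h=acclimation_density,
              rho_e=acclimation_density,
          )


      # Placeholder values; units and magnitudes are illustrative only.
      first = init_first_encounter(
          first_patch_density=0.5, time_elapsed=120.0, acclimation_density=10.0
      )
      ```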

      Transfer Method: We thank the reviewer for their thoughtful comment on how the stress of transferring animals to a new plate may have resulted in an increased arousal state and thus a greater probability of rejecting patches. We anticipated this possibility and, in order to mitigate the stress of moving, we used an agar plug method where animals were transferred using the flat surface of small cylinders of agar. Importantly, the use of agar as a medium to transfer animals provides minimal disruption to their environment as all physical properties (e.g. temperature, humidity, surface tension) are maintained. Qualitatively, we observed no marked change in behavior from before to after transfer with the agar plug method, especially as compared to the often drastic changes observed when using a metal or eyelash pick. We added these additional methodological details to the methods (lines 791-796).

      Time Parameter: However, the reviewer’s concern that the simplest time parameter (time since start of the assay) might better predict animal behavior is valid. We thank the reviewer for pointing out the need to specifically test whether the time-dependent change in explore-exploit decision-making corresponds better with satiety (time off patch) or arousal (time since transfer/start of assay) state. To test this hypothesis, we ran our model with varying combinations of the satiety term τ<sub>s</sub> and a transfer term τ<sub>t</sub>. We found that when both terms were included in the model, the coefficient of the transfer term was non-significant. This result suggests that the relevant time-dependent term is more likely related to satiety than transfer-induced stress (lines 343-358; Figure 4 - supplement 4D).

      (3) Similarly, Figures 2L and M clearly show that the probability of a search event occurring upon a patch encounter decreases markedly with time. Because search events are interpreted as a failure to detect a patch, this implies that the detection of (dilute) patches becomes more efficient with time. It would be useful for the authors to consider this possibility as well as potential explanations, which might be related to the point above.

      Time-dependent changes in sensing: We agree with the reviewer that we observe increased responsiveness to dilute patches with time. Although this is interesting, our primary focus was on what decision an animal made given that they clearly sensed the presence of the bacterial patch. Nonetheless, we added this observation to the discussion as an area of future work to investigate the sensory mechanisms behind this effect (lines 563-568).

      (4) Based on their results with mec-4 and osm-6 mutants, the authors assert that chemosensation, rather than mechanosensation, likely accounts for animals' ability to measure patch density. This argument is not well-supported: mec-4 is required only for the function of the six non-ciliated light-touch neurons (AVM, PVM, ALML/R, PLML/R). In contrast, osm-6 is expected to disrupt the function of the ciliated dopaminergic mechanosensory neurons CEP, ADE, and PDE, which have previously been shown to detect the presence of bacteria (Sawin et al 2000). Thus, the paper's results are entirely consistent with an important role of mechanosensation in detecting bacterial abundance. Along these lines, it would be useful for the authors to speculate on why osm-6 mutants are more, rather than less, likely to "accept" when encountering a patch.

      Sensory mutant behavior: We thank the reviewer for pointing out the error in our interpretation of the behavior of osm-6 and mec-4 animals. We further elaborated on our findings and edited the text to better reflect that osm-6 mutants lack both chemosensory and mechanosensory ciliated sensory neurons (lines 406-448; lines 567-577). Specifically, we provided some commentary on the finding that osm-6 mutants show an augmented ability to detect the presence of bacterial patches but a reduced ability to assess their bacterial density. While this finding seems contradictory, it suggests that in the absence of the ability to assess bacterial density, animals must prioritize exploiting food resources when available.

      (5) While the evidence for the accept-reject framework is strong, it would be useful for the authors to provide a bit more discussion about the null hypothesis and associated expectations. In other words, what would worm behavior in this assay look like if animals were not able to make accept-reject decisions, relying only on exploit-explore decisions that depend on modulation of food-leaving probability?

      Accept-reject vs. stay-switch: We thank the reviewer for alerting us to this gap in our discussion. We have revised the text to elaborate on our view of this somewhat philosophical distinction and what it predicts about C. elegans behavior (lines 507-533).

      Reviewer #3 (Public review):

      (1) Sensing vs. non-sensing

      The authors claim that when animals encounter dilute food patches, they do not sense them, as evidenced by the shallow deceleration that occurs when animals encounter these patches. This seems ethologically inaccurate. There is a critical difference between not sensing a stimulus, and not reacting to it. Animals sense numerous stimuli from their environment, but often only behaviorally respond to a fraction of them, depending on their attention and arousal state. With regard to C. elegans, it is well-established that their amphid chemosensory neurons are capable of detecting very dilute concentrations of odors. In addition, the authors provide evidence that osm-6 animals have altered exploit behaviors, further supporting the importance of amphid chemosensory neurons in this behavior.

      Interpretation of “non-sensing” encounters: We thank the reviewer for their comment and agree that we do not know for certain whether the animals sensed these patches or were merely non-responsive to them. We are, however, confident that these encounters lack evidence of sensing. Specifically, we note that our analyses used to classify events as sensing or non-sensing examined whether an animal’s slow-down upon patch entry could be distinguished from either that of events where animals exploited or that of encounters with patches lacking bacteria. We found that “non-sensing” encounters are indeed indistinguishable from encounters with bacteria-free patches where there are no bacteria to be sensed (see Figure 2 - Supplement 8A-C and Patch encounter classification as sensing or non-responding in Methods). Regardless, we agree with the reviewer that all that can be asserted about these events is that animals do not appear to respond to the bacterial patch in any way that we measured. Therefore, we have replaced the term “non-sensing” with “non-responding” to better indicate the ethological interpretation of these events and clarified the text to reflect this change (lines 193-200; lines 211-212).

      (2) Search vs. sample & sensing vs. non-sensing

      In Figures 2H and 2I, the authors claim that there are three behavioral states based on quantifying average velocity, encounter duration, and acceleration, but I only see three. Based on density distributions alone, there really only seem to be 2 distributions, not 3. The authors claim there are three, but to come to this conclusion, they used a QDA, which inherently is based on the authors training the model to detect three states based on prior annotations. Did the authors perform a model test, such as the Bayesian Information Criterion, to confirm whether 2 vs. 3 Gaussians is statistically significant? It seems like the authors are trying to impose two states on a phenomenon with a broad distribution. This seems very similar to the results observed for roaming vs. dwelling experiments, which again, are essentially two behavioral states.

      Validation of sensing clusters: We are grateful to the reviewer for pointing out the difficulty in visualizing the clusters and the need for additional clarity in explaining the semi-supervised QDA approach. We added additional visualizations and methods to validate the clusters we have discovered. Specifically, we used Silverman’s test to show that the sensing vs. non-responding data were bi-modal (i.e. a two-cluster classification method fits best) and accompanied this statistical test with heat maps which better illustrate the clusters (lines 171-173; lines 190-191; lines 948-972; lines 1003-1005; Figure 2 - supplement 6A-C; Figure 2 - supplement 7C-F).

      Further, it seems that there may be some confusion as to how we arrived at 3 encounter types (i.e. search, sample, exploit). It is important to note that two methods were used on two different (albeit related) sets of parameters. We first used a two-cluster GMM to classify encounters as explore or exploit. We then used a two-cluster semi-supervised QDA to classify encounters as sensing or non-sensing (now changed to “non-responding”, see above response) using a different set of parameters. We thus separated the explore cluster into two (sensing and non-responding exploratory events), resulting in three total encounter types: exploit, sample (explore/sensing), and search (explore/non-responding).

      (4) History-dependence of the GLM

      The logistic GLM seems like a logical way to model a binary choice, and I think the parameters you chose are certainly important. However, the framing of them seems odd to me. I do not doubt the animals are assessing the current state of the patch with an assessment of past experience; that makes perfect logical sense. However, it seems odd to reduce past experience to the categories of recently exploited patch, recently encountered patch, and time since last exploitation. This implies the animals have some way of discriminating these past patch experiences and committing them to memory. Also, it seems logical that the time on these patches, not just their density, should also matter, just as the time without food matters. Time is inherent to memory. This model also imposes a prior categorization in trying to distinguish between sensed vs. not-sensed patches, which I criticized earlier. Only "sensed" patches are used in the model, but it is questionable whether worms genuinely do not "sense" these patches.

      Model design: We thank the reviewer for their thoughtful comments on the model. We completed a number of model selection analyses, including information criteria (AIC, BIC) and regularized optimization (LASSO and elastic nets), and found that the problem of model selection was compounded by the enormous array of highly-correlated variables we had to choose from. Additionally, we found that both interaction terms and non-linear terms of our task variables could be predictive of accept-reject decisions but that the precise set of terms selected depended sensitively on which model selection technique was used and generally made rather small contributions to prediction. The diverse array of results and the combinatorial number of predictors that could be included failed to add anything of interpretable value. We therefore chose to take a different approach to this problem. Rather than trying to determine what the “best” model was, we instead asked whether a minimal model could be used to answer a set of core questions. Indeed, our goal was not maximal predictive performance but rather to distinguish between the effects of different influences well enough to determine whether encounter history had a significant, independent effect on decision making. We thus chose to only include task variables that spanned the most basic components of behavioral mechanisms to ask very specific questions. For example, we selected a time variable that we thought best encapsulated satiety. While we could have included many additional terms, or made different choices about which terms to include, based on our analyses these choices would not have qualitatively changed our results. Further, we sought to validate the parameters we chose with additional studies (i.e. food-deprived and sensory mutant animals). We regard our study as an initial foray into demonstrating accept-reject decision-making in nematodes. The exact mechanisms and, consequently, the best model design are therefore beyond the scope of this study.

      Lastly, regarding the use of only sensed patches in the model: while we acknowledge that we are not certain as to whether the “non-responding” encounters are truly not sensed, we find qualitatively similar results when including all exploratory patches in our analyses. However, we take the position that sensation is necessary for decision-making and thus believe that while our model’s predictive performance may be better using all encounters, the interpretation of our findings is stronger when we only include sensing events. We have added additional commentary about our model to the discussion section (lines 667-695).

      (5) osm-6

      The osm-6 results are interesting. This seems to indicate that the worms are still sensing the food, but are unable to assess quality, therefore the default response is to exploit. How do you think the worms are sensing the food? Clearly, they sense it, but without the amphid sensory neurons, and not mechanosensation. Perhaps feeding is important? Could you speculate on this?

      We thank the reviewer for their thoughtful remarks. We have added additional commentary about the result of our sensory mutant experiments as described above in response to Reviewer #1 under Sensory mutant behavior.

      (7) Impact:

      I think this work will have a solid impact on the field, as it provides tangible variables to test how animals assess their environment and decide to exploit resources. I think the strength of this research could be strengthened by a reassessment of their model that would both simplify it and provide testable timescales of satiety/starvation memory.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors title the work as an "ethological study" and emphasize the theme of "foraging in naturalistic environments" in contrast to typical laboratory conditions. The only difference in this study relative to typical laboratory conditions is that the food bacteria is distributed in many small patches as compared to one large patch. First, it is not clear to the reviewer that the size of the food patches in these experiments is more relevant to C. elegans in its natural context than the standard sizes of food patches. Furthermore, all the other highly unnatural conditions typical of laboratory cultivation still apply: the use of a 2D agar substrate, a single food bacteria that is not a component of a naturalistic diet, and the use of a laboratory-adapted strain of C. elegans with behavior quite distinct from that of natural isolates. The reviewer is not suggesting that the authors need to make their experiments more naturalistic, only that the experiments as described here should not be described as naturalistic or ethological as there is no support for such claims.

      Ethological interpretation: We thank the reviewer for their comments about the use of the term ethological to describe this study. We chose to develop a patchy bacterial assay to mimic the naturalistic “boom-or-bust” environment. While we agree with the reviewer that we do not know whether the size and distribution of the food patches in these experiments are more relevant to C. elegans, we maintain that these experiments were ecologically-inspired and revealed behavior that is difficult to observe in environments with large, densely-seeded bacterial patches. We have updated our text to better reflect that this study was “ecologically-inspired” rather than truly “ethological” in nature (lines 94, 693).

      The main finding of the paper is that worms explore and then exploit, i.e. they frequently reject several bacterial patches before accepting one. This result requires additional scrutiny to reject other possible interpretations. In particular, when worms are transferred to a new plate we would expect some period of increased arousal due to the stressful handling process. A high arousal state might cause rejection of food patches. Could the measured accept/reject decisions be influenced by this effect? One approach to addressing this concern would be to allow the animals to acclimate to the new plate on a bare region before encountering the new food patches.

      We thank the reviewer for their comment on how the stress of transferring animals to a new plate may have resulted in an increased arousal state and thus a greater probability of rejecting patches. We addressed this above in response to Reviewer #1 under Transfer Method and Time Parameter. In brief, we used a worm picking method that mitigated stress and added additional analyses showing that a transfer-related term was less predictive than a satiety-related term.

      Related to the above, in what circumstances exactly are the authors claiming that worms first explore and then exploit? After being briefly deprived of food? After being handled?

      Explore-then-exploit: All animals were well-fed and handled gently as described above under Transfer Method (lines 787-795). Our results suggest that the appearance of an explore-then-exploit strategy is a byproduct of being transferred from an environment with high bacterial density to an environment with low bacterial density as described in the manuscript (lines 461-466).

      The authors emphasize their analysis of the accept/reject decision as a critical innovation. However, the accept/reject decision does not strike me as substantially different from the previously described stay/switch decision. When a worm encounters a new patch of bacteria, accepting this bacteria is equivalent to staying on it and rejecting (leaving) it is equivalent to switching away from it. The authors should explain how these concepts are significantly distinct.

      Accept-reject vs. stay-switch: We thank the reviewer for alerting us to this gap in our discussion. We have revised the text to elaborate on our view of this somewhat philosophical distinction and what it predicts about C. elegans behavior (lines 507-533).

      During patch encounter classification, the authors computed three of the animals' behavioral metrics (Line 801-804) and claimed that the combination of these three metrics reveals two non-Gaussian clusters representing encounters where animals sensed the patch or did not appear to sense the patch. The authors also refer to a video to demonstrate the two clusters by rotating the 3-dimension scatter plot. However, the supposed clusters, if any, are difficult to see in a 3D (Video 5) or in a 2D scatter plot (Figure 3I). The authors need to clearly demonstrate the distinct clustering as claimed in the paper as this feature is fundamental and necessary for the model implementation and interpretation of results.

      We are grateful to the reviewer for pointing out the difficulty in visualizing the clusters. We added additional visualizations and methods to validate the clusters we have discovered as described in our above response to Reviewer #3 under Validation of sensing clusters.

      When selecting parameters (covariates) for their model, it is critical to avoid overfitting. Therefore, the authors used AIC and BIC (Figure 4- supplement 1) to demonstrate that the full GLM model has a better model performance than the other models which contain only a subset of the full covariates (in a total of 5). However, the authors compare the full set with only 4 other models whereas the total number of models that need to be compared with is 2^5-2. The authors at least need to include the AIC and BIC scores of all possible models in order to draw the conclusion about the performance of the full model.

      Model selection criterion: We thank the reviewer for pointing out this gap in our methodology. We have now run the model with all combinations of subsets of model parameters and have confirmed that the model with all 5 covariates outperforms all other models even when using BIC, the strictest criterion for overfitting (Figure 1 - supplement 1A). The only other model that performs well (though not as often as the 5-term model) is the 4-term model lacking ρ<sub>h</sub>. This result is not surprising as ρ<sub>h</sub> only changes substantially once in an animal’s encounter history for the single-density, multi-patch data that this model was fit to. For example, for an animal foraging on patches of density 10, on the first encounter ρ<sub>h</sub> = ~200 (see Parameter initialization above), but on every subsequent encounter ρ<sub>h</sub> = ~10. As a result, the effect of ρ<sub>h</sub> on the probability of exploiting is somewhat binary on the single-density, multi-patch data set. Nevertheless, we see significantly improved prediction of behavior in the novel multi-density, multi-patch data (Figure 4F) as we observe an effect of the most recently encountered patch. Additionally, we observe a similar impact (i.e., significant coefficient of negative sign) of the ρ<sub>h</sub> term when the model is fit to the multi-density, multi-patch data set (Figure 4 - supplement 4D).
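
      The exhaustive comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the covariate labels `c1` to `c5` are hypothetical placeholders, and the log-likelihood of each fitted model is assumed to come from whatever GLM fitting routine is in use. The sketch only shows how the candidate subsets are enumerated and how AIC and BIC are computed from a log-likelihood.

```python
from itertools import combinations
from math import log


def aic(log_lik, k):
    """Akaike information criterion for a model with k fitted parameters."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion; n is the number of observations."""
    return k * log(n) - 2 * log_lik


def candidate_subsets(covariates):
    """Yield every non-empty subset of the covariates, smallest first."""
    for size in range(1, len(covariates) + 1):
        for subset in combinations(covariates, size):
            yield subset


# Hypothetical placeholder labels for the five covariates.
covariates = ["c1", "c2", "c3", "c4", "c5"]
subsets = list(candidate_subsets(covariates))

# Reduced models are the non-empty subsets that omit at least one covariate.
n_reduced = sum(1 for s in subsets if len(s) < len(covariates))
print(len(subsets), n_reduced)
```

      With five covariates there are 2^5 - 1 non-empty subsets, of which 2^5 - 2 are reduced models, which matches the count the reviewer gives. Because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 once n is larger than about 7, BIC is the more conservative of the two criteria for data sets of realistic size.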

      In any bacterial patch, the edges have a higher density of bacteria than the patch center. Thus, it is possible that a worm scans the patch edge density, on the basis of which it decides to accept or reject the patch whose average density is smaller. This could potentially cause an underestimate of the bacteria density used in the model. Furthermore, the potential inhomogeneity of the patch may further complicate the worm's decision-making, and the discrepancy between the reality and the model assumption will reduce the validity of the model. The authors need to estimate the inhomogeneity of the bacterial patches used in their assays and discuss how the edge effects may affect their results and conclusions.

      Bacterial patch inhomogeneity: We extensively tested the landscape of the bacterial patches by imaging fluorescently-labeled bacteria OP50-GFP (Bacterial Patch Density in Methods; Figure 2 - supplement 1-3). As the reviewer mentions, we observe significantly greater bacterial density at the patch edge. This within-patch spatial inhomogeneity results from areas of active proliferation of bacteria and likely complicates an animal’s ability to accurately assess the quantity of bacteria within a patch and, consequently, our ability to accurately compute a metric related to our assumptions of what the animal is sensing. In our study, we used the relative density of the patch edge where bacterial density is highest as a proxy for an animal’s assessment of bacterial patch density (Figure 2 – supplement 1). This decision was based on a previous finding that the time spent on the edge of a bacterial patch affected the dynamics of subsequent area-restricted search. While within-patch spatial inhomogeneity likely affects an animal’s ability to assess patch density, we do not believe that this qualitatively affects the results of our study. Both the patch densities tested (Figure 2 – supplement 3A) as well as our observations of time-dependent changes in exploitation (Figure 2E,N-O; Figure 3H-I) maintained a monotonic relationship. Therefore, alternative methods of patch density estimation should yield similar results. We have added additional discussion on this topic to our manuscript (lines 578-593).

      The authors claim that their methods (GMM and semi-supervised QDA) are unbiased. This seems unlikely as the QDA involves supervision. The authors need to provide additional explanation on this point.

      Semi-supervised QDA labelling: We have removed the term “unbiased” to avoid any misinterpretation of the methodology and clarified our method of labelling used for “supervising” QDA. Specifically, we made two simple assumptions: 1) animals must have sensed the patch if they exploited it and 2) animals must not have sensed the patch if there were no bacteria to sense. Thus, we labeled encounters as sensing if they were found to be exploitatory, as we assume that sensation is a prerequisite to exploitation; and we labeled encounters as non-sensing for events where animals encountered patches lacking bacteria (OD<sub>600</sub> = 0). All other points were non-labeled prior to learning the model. In this way, our labels were based on the experimental design and the results of the GMM, an unsupervised method, rather than on any expectations we had about what sensing should look like. The semi-supervised QDA method then used these initial labels to iteratively fit a paraboloid that best separated these clusters, by minimizing the posterior variance of classification (lines 1012-1021). See Figure 2 - supplement 8A-B for a visualization showing the labelled data.

      Based on the authors' result, worms behaviorally exhibit their preferences toward food abundance (density), which results in a preference scale for a range of densities. Does this scale vary with the worms' initial cultivation states? The author partially verified that by observing starved worms. This hypothesis could be better tested if the authors could analyze the decision-making of the worms that were initially cultivated with different densities of bacterial food.

      While we agree with the reviewer that testing the effects of varying bacterial density during animal development (cultivation) is a very interesting experiment, it is not feasible at this time. We focused our revised manuscript to include only assertions about the effects of recent experiences and added this inquiry as a future direction as described above in our response to Reviewer #1 under Cultivation density.

      It would be helpful to elaborate more on how the framework developed in this paper can be applied more broadly to other behaviors and/or organisms and how it may influence our understanding of decision-making across species.

      We thank the reviewer for alerting us to this gap in our discussion. We have added additional commentary about our model and its utility to the discussion section (lines 667-695).

      Reviewer #3 (Recommendations for the authors):

      Sensing vs. non-sensing

      Perhaps a more ethologically accurate term to describe this behavior would be "ignoring" rather than "not sensing". If the authors feel strongly about using the term "not sensing", then they should provide experimental evidence supporting this claim. However, I think simply changing the terminology negates these experiments.

      We thank the reviewer for their thoughtful comments. While we agree with the reviewer that the term “non-sensing” may not be ethologically accurate (see response to Public Review above under Interpretation of “non-sensing” encounters), we interpret the term “ignoring” to mean that the animal sensed the patches but decided not to react. We have chosen to replace the term “non-sensing” with “non-responding” to best indicate the ethological interpretation of our observation. Nonetheless, we believe that it remains possible that animals are truly not sensing the bacterial patches as our method of classification compared the behavior against encounters with patches lacking bacteria (as described above in response to Reviewer #2 under Semi-supervised QDA labelling).

      History-dependence of the GLM

      Perhaps a simpler approach would be to say the worm senses everything, and this accumulative memory affects the decision to exploit. For example, the animal essentially experiences two feeding states: feeding on patches, and starvation off of patches.

      The level of satiety could be modeled linearly:

      Satiety(t_enter:t_leave) = k_feed*patch_density*delta_t

      Where k_feed is some model parameter for rate of satiety signal accumulation, t_enter is the time the animal entered the patch, t_leave is the time the animal left the patch, and delta_t is the difference between the two. Perhaps you could add a saturation limit to this, but given your data, I doubt that is the case.

      Starvation could be modeled as simply a decay from the last satiety signal:

      Starvation(t_leave:t_enter) = Satiety(t_leave)*exp(-k_starve*delta_t).

      Where k_starve is the rate constant for the decay of the satiety signal.

      For the logistic model, the logistic parameter is simply the difference between the current patch density and the current satiety signal.

      A nice thing about this approach is that it negates the need to categorize your patches. All patch encounters matter. Brief patch encounters (categorized as non-sensing and not used in the prior GLM) naturally produce a very small satiety signal and contribute very little to the exploit decision. Another nice thing about this approach is that it gives you memory timescales, that are testable. There is a rate of satiety accumulation and a rate of satiety loss. You should be able to predict behavior with lower patch density, assuming the rate constants hold. (I am not advocating you do more experiments here, just pointing out a nice feature of this approach).

      You could possibly apply this to a GLM for velocity on a non-exploited patch as well, though I assume this would be a linear GLM, given the velocity distributions you provided.

      We thank the reviewer for their time and thoughtfulness in thinking about our model. The reviewer’s proposed model seems entirely reasonable and could aid in elucidating the time component of how prior experience affects decision-making. However, we decided to keep our paper focused on using a minimal model to answer a set of core questions (e.g., Does encounter history or satiety influence decision-making?) (see above under Model design for a more detailed response). Future studies investigating the mechanisms of these foraging decisions should open the door for more mechanistically accurate models. We have expanded our discussion of the model to include this assertion (lines 667-695).

    1. Author response:

      The following is the authors’ response to the original reviews

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Sample size: If the sample size of the study is increased, more confidence and new insights can be inferred about myometrial enhancer-mediated gene regulation in term pregnancy. Such a small sample size (N = 3) limits the statistical power of the study. As mentioned in the manuscript, the authors failed to identify chromatin loops in the second subject's biopsy due to limited sample material.

      We agree with the reviewer’s comment about the sample size. We sincerely hope the results of this study will increase the interest of stakeholders in funding future projects on a larger scale.

      (2) Figure quality: There is a lack of good representations of the results (e.g., screenshots of tables as figure panels!) as well as missing interpretations that might add value to the manuscript.

      Figures 1B and 2B have been converted to pie charts.

      (3) Definition of super-enhancer: The definition of super-enhancer is not clear. Also, the computational merging of enhancers to define super-enhancers should be described better.

      We added more details about the tool and parameter settings to the Methods section “Identification of super enhancers”:

      “Identification of super enhancers

      H3K27ac-positive enhancers were defined as regions of H3K27ac ChIP-seq peaks in each sample. The enhancers within 12.5Kb were merged by using bedtools merge function with parameter “-d 12500”. The combined enhancer regions were called super enhancers if they were larger than 15Kb. The common super enhancers from multiple samples were used for downstream analysis.”

      Reference:

      Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013 Apr 11;153(2):307-19. doi: 10.1016/j.cell.2013.03.035. PMID: 23582322; PMCID: PMC3653129.
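
      The merging rule quoted in the Methods text above can be illustrated with a short plain-Python sketch. It is not the pipeline used in the study, which relies on bedtools; it only approximates the effect of merging peaks whose gap is at most 12.5 kb and then keeping merged regions longer than 15 kb. The function names and the example coordinates are hypothetical.

```python
def merge_within(intervals, max_gap=12_500):
    """Merge (chrom, start, end) intervals whose gap is at most max_gap bp."""
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= max_gap:
            # Close enough to the previous region: extend it.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]


def call_super_enhancers(peaks, max_gap=12_500, min_size=15_000):
    """Keep merged regions strictly longer than min_size bp."""
    return [
        (chrom, start, end)
        for chrom, start, end in merge_within(peaks, max_gap)
        if end - start > min_size
    ]


# Hypothetical H3K27ac peak coordinates for one sample.
peaks = [
    ("chr1", 1_000, 3_000),
    ("chr1", 10_000, 12_000),
    ("chr1", 20_000, 22_000),
    ("chr1", 100_000, 101_000),
    ("chr2", 500, 2_500),
]
print(call_super_enhancers(peaks))
```

      In this example the first three chr1 peaks are chained into a single 21 kb region that passes the size filter, while the isolated peaks do not. Intersecting the calls across samples to obtain the common super enhancers is a separate step not shown here.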

      (4) Assay-Specific Limitations: Each assay employed in the study, such as ChIP-Seq and CRISPRa-based Perturb-Seq, has its limitations, including potential biases, sensitivity issues, and technical challenges, which could impact the accuracy and reliability of the results. These limitations should be addressed properly to avoid false-positive results and improve the interpretability of the results.

      The major limitations of the CRISPRa-based Perturb-Seq protocol in this study are the use of hTERT-HM cells and the two-vector system for transduction. While hTERT-HM cells are a much easier platform to work with technically, primary human myometrial cells are generally considered to retain a molecular context that is closer to that of in vivo tissue. Because the efficiency of having two vectors simultaneously present in the same cell is limited, hTERT-HM cells make the experiment much more affordable and operationally feasible. Future advances in viral vector payload capacity may overcome this challenge and open an avenue to performing the assay in primary human myometrial cells.

      (5) Sample collection and comparison: There is mention of matched gravid term and non-gravid samples whereas no description or use of control samples was found in the results. Also, the comparison of non-labor samples with labor samples would provide a better understanding of epigenomic and transcriptomic events of myometrium leading to laboring events.

      The description has been updated:

      “Collection of myometrial specimens

      Permission to collect human tissue specimens was prospectively obtained from individuals undergoing hysterectomy or cesarean section for benign clinical indications (H-33461). Gravid myometrial tissue was obtained from the margin of the hysterotomy in women undergoing term cesarean sections (>38 weeks estimated gestational age) without evidence of labor. Non-gravid myometrial tissue was collected from pre-menopausal women undergoing hysterectomy for benign conditions. Specimens from gravid women receiving treatment for pre-eclampsia, eclampsia, pregnancy-related hypertension, or pre-term labor were excluded.”

      (6) Lack of clarity:

      (6a) It is written as 'Chromatin Conformation Capture (Hi-C)'. I think Hi-C is Histone Capture and 3C is Chromosome Conformation Capture! This needs clear writing.

      As the reviewer suggested, to make it clear, we have changed the text “A high throughput chromatin conformation capture (Hi-C) assay” to “A High-throughput Chromosome Conformation Capture (Hi-C) assay”.

      (6b) In multiple places, 'PLCL2' gene is written as 'PCLC2'.

      Corrected as suggested.

      (6c) What is the biological relevance of considering 'active' genes with FPKM {greater than or equal to} 1? This needs clarification.

      In RNA-seq analysis, gene expression levels are often quantified using FPKM (Fragments Per Kilobase of transcript per Million mapped reads). Setting an FPKM threshold for defining "active" genes is biologically relevant because it helps to distinguish genuinely expressed genes from background noise, so that researchers can focus on genes that are more likely to have a significant biological impact. A common threshold for defining "active" genes is FPKM ≥ 1. Genes with FPKM values below this threshold may be transcribed at very low levels or could reflect background noise.
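
      As an illustration of the threshold described above, the following is a minimal sketch of the standard FPKM formula and an FPKM ≥ 1 filter. The function names and the example counts are hypothetical and are not taken from the study's pipeline, which used Cufflinks.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments * 1e9 / (gene length in bp * total mapped fragments)
    """
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def active_genes(fpkm_by_gene, threshold=1.0):
    """Keep genes whose FPKM meets the 'active' threshold (FPKM >= 1 by default)."""
    return {gene: value for gene, value in fpkm_by_gene.items() if value >= threshold}


# Hypothetical example: fragment counts and gene lengths are made up for illustration.
library_size = 50_000_000
example = {
    "GENE_A": fpkm(500, 2_000, library_size),   # well above the threshold
    "GENE_B": fpkm(20, 4_000, library_size),    # below the threshold
}
kept = active_genes(example)
```

      With these made-up numbers only GENE_A passes the filter, since GENE_B falls below 1 FPKM.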

      (6d) The understanding of differentially methylated genes at promoters is underrated as per the authors. But, why leaving DNA methylation apart, they selected histone modification as the basis of epigenetic reprogramming in terms of myometrium is unclear.

      DNA methylation indeed plays a crucial role in evaluating the impact of cis-acting elements on gene regulation. Large-scale studies, such as the comprehensive analysis of the myometrial methylome landscape in human biopsies (Paul et al., JCI Insight, 2022, PMID: 36066972), have provided valuable insights. When integrated with histone modification and chromatin looping data, contributed by our group and collaborators, future secondary analyses leveraging machine learning are poised to further elucidate the mechanisms underlying myometrial transcriptional regulation.

      (6e) How does the identification of PGR as an upstream regulator of PLCL2 gene expression in human myometrial cells contribute to our understanding of progesterone signaling in myometrial function?

      In a previous study, we demonstrated a positive correlation between PLCL2 and PGR expression in a mouse model and identified PLCL2's role in negatively modulating oxytocin-induced myometrial cell contraction (Peavey et al., PNAS, 2021, PMID: 33707208). The present study builds on this by providing evidence for a direct regulatory mechanism in which PGR influences PLCL2 transcription, likely through a cis-acting element located 35 kb upstream. These findings suggest that PLCL2 acts as a mediator of PGR-dependent myometrial quiescence prior to labor, rather than merely participating in a parallel pathway. Further in vivo studies are necessary to delineate the extent to which PLCL2 mediates PGR activity, particularly the contraction-dampening function of the PGR-B isoform.

      (7) Grammatical error: The manuscript has numerous grammatical errors. Please correct them.

      Corrections have been made as suggested.

      (8) Use of single-cell data: Though from the Methods section, it can be understood that single-cell RNA-seq was done to identify CRISPRa gRNA expressing cells to characterize the effect of gene activation, some results from single-cell data e.g., cell clustering, cell types, gRNA expression across clusters could be added for better elucidation.

      As the reviewer suggested, we have prepared a file, “PerturbSeq_summary.xlsx” (Dataset S9), to provide additional results of the Perturb-Seq data analysis. It includes two spreadsheets: “Cell_per_gRNA” for clustering and “Protospacer_calls_per_cell” for gRNA expression across clusters.

      Reviewer #2 (Recommendations For The Authors):

      (1) The following are a number of grammatical issues in the abstract. I suggest having a careful read of the entire manuscript to identify additional grammatical issues as I may not be able to highlight all of these issues.

      (1a) "The myometrium plays a critical component during pregnancy." change component to role.

      (1b) "It is responsible for the uterus' structural integrity and force generation at term," → replace "," with "."

      (1c) Also, I suggest rephrasing the first 2 sentences to: The myometrium plays a critical role during pregnancy as it is responsible for both the structural integrity of the uterus and force generation at term.

      (1d) "Here we investigated the human term pregnant nonlabor myometrial biopsies for transcriptome, enhancer histone mark cistrome, and chromatin conformation pattern mapping." Remove "the", and modify to "Here we investigated human term pregnant".

      (1e) Missing period and sentence fragment, "PGR overexpression facilitated PLCL2 gene expression in myometrial cells Using CRISPR activation the functionality of a PGR putative enhancer 35-kilobases upstream of the contractile-restrictive gene PLCL2.

      Corrections have been made as suggested.

      (2) Sentence fragment: Studies on the role of steroid hormone receptors in myometrial remodeling have provided evidence that the withdrawal of functional progesterone signaling at term is due to a stoichiometric increase of progesterone receptor (PGR) A to B isoform-related estrogen receptor (ESR) alpha expression activation at term. (Mesiano, Chan et al. 2002) (Merlino, Welsh et al. 2007) (Nadeem, Shynlova et al. 2016).

      The statement has been updated:

      “Studies on the role of steroid hormone receptors in myometrial remodeling suggest that the withdrawal of functional progesterone signaling at term results from a stoichiometric shift favoring the PGR-A isoform over PGR-B. This shift is associated with increased activation of estrogen receptor alpha (ESR1) expression at term (Mesiano, Chan et al. 2002) (Merlino, Welsh et al. 2007) (Nadeem, Shynlova et al. 2016).”

      (3) FOS:JUN heterodimers are implicated to be critical for the initiation of labor through transcriptional regulation of gap junction proteins such as Cx43 (Nadeem, Farine et al. 2018) (Balducci, Risek et al. 1993).

      Use Gja1 (Gap junction alpha 1) as the current correct gene, not Cx43.

      Also, several references predate Nadeem, Farine et al. 2018 and are more appropriate to use as references for the role of Ap-1 proteins in regulating Gja1; PMID: 15618352 and PMID: 12064606 were the first to show this relationship in myometrial cells.

      The statement has been updated as suggested:

      “FOS:JUN heterodimers are implicated to be critical for the initiation of labor through transcriptional regulation of gap junction proteins such as GJA1 (Nadeem, Farine et al. 2018) (Balducci, Risek et al. 1993)”

      (4) Define PLCL2 on first use.

      Updated as suggested.

      (5) There are a number of issues with this section, "Matched sSpecimens of gravid myometrium were collected at the margin of hysterotomy from women undergoing clinically indicated cesarean section at term (>38 weeks estimated gestation age) without evidence of labor. Specimens of healthy, non-gravid myometrium were also pecimens were collected from uteri removed from pre-menopausal women undergoing hysterectomy for benign clinical indications."

      The description has been updated:

      “Collection of myometrial specimens

      Permission to collect human tissue specimens was prospectively obtained from individuals undergoing hysterectomy or cesarean section for benign clinical indications (H-33461). Gravid myometrial tissue was obtained from the margin of the hysterotomy in women undergoing term cesarean sections (>38 weeks estimated gestational age) without evidence of labor. Non-gravid myometrial tissue was collected from pre-menopausal women undergoing hysterectomy for benign conditions. Specimens from gravid women receiving treatment for pre-eclampsia, eclampsia, pregnancy-related hypertension, or pre-term labor were excluded.”

      (6) Enriched motifs were identified by HOMER (Hypergeometric Optimization of Motif EnRichment) v4.11 (Heinz, Benner et al. 2010).

      Please clarify what background is used for motif enrichment.

      We used the default background sequences, which HOMER generates from a set of random genomic sequences matched to the input sequences in basic properties such as GC content and length. We have added more details to the Methods section:

      “DNA-binding factor motif enrichment analysis

      Enriched motifs were identified by HOMER (Hypergeometric Optimization of Motif EnRichment) v4.11 with default background sequences matching the input sequences (Heinz, Benner et al. 2010).”

      (7) "Six of the seven regions are also co-localized with previously published genome occupancy of transcription regulators curated by the ReMap Atlas"

      Please clarify if this Atlas includes myometrial tissues or not and clarify the cell types included in the atlas.

      According to the UCSC Genome Browser and the reference by Hammal et al. (2022), the current ReMap database includes PGR ChIP-seq data from human myometrial biopsies, available under NCBI GEO accession number GSE137550, alongside data from various other cell and tissue types. ReMap provides valuable insights into potential functional cis-acting elements in the genome from a systems biology perspective. However, tissue specificity requires independent validation.

      (8) "Notably, 76% of the putative super-enhancers are co-localized with known PGR-occupied regions in the human myometrial tissue (Figure S2). This is significantly higher than the 20% co-localization in the regular enhancer group (Figure S2)."

      Because there is a huge difference in the size of the putative super enhancer regions and the isolated enhancers this comparison is not appropriate as conducted. The comparison needs to account for the difference in size of the regions. Please provide P values for significance statements.

      We acknowledge the reviewer's concern that our initial statement was overstated and potentially misleading, given the substantial difference in size between putative super-enhancer regions and regular enhancers. Rather than emphasizing the enrichment, it would be more accurate to simply describe our observation that super-enhancers encompass more PGR-occupied regions.

      Here is the updated version:

      “Notably, 76% of the putative super-enhancers co-localize with known PGR-occupied regions in human myometrial tissue, compared to 20% co-localization observed in regular enhancers (Figure S2).”

      Reviewer #3 (Recommendations For The Authors):

      (1) Title is extremely misleading, as here we do not get a view of the epigenomic landscape, but rather sparce data related to H3K27ac and H3K4me (focusing on enhancers) and chromatin conformation associated with the PLCL2 transcription start site (TSS).

      As suggested, the title is modified to “Assessment of the Histone Mark-based Epigenomic Landscape in Human Myometrium at Term Pregnancy”.

      (2) Improve the first result paragraph by providing a clear rationale for the experiments and their objectives, as well as introducing the samples used. Rather than simply listing approaches and end results in Table 1, offer concise explanations for the experiments alongside the supporting data presented in detailed figures. Using appropriate figures/graphs to effectively contextualize these datasets would be greatly appreciated by readers and would add more value to this research. Currently, it is difficult for us to assess and appreciate the quality of the data.

      The following statement is included at the beginning of the Results section:

      "To better understand the regulatory network shaping the myometrial transcriptome before labor, we analyzed the transcriptome and putative enhancers in individual human myometrial specimens. Using RNA-seq, we identified actively expressed RNAs, while ChIP-seq for H3K27ac and H3K4me1 was used to map putative enhancers. Active genes were associated with nearby putative enhancers based on their genomic proximity. Additionally, chromatin looping patterns were mapped using Hi-C to further link active genes and putative enhancers within the same chromatin loops."

      (3) The statistics for every sequencing approach need to be provided for each sample (e.g., RNA-seq: number of total reads, number of mapped reads, % of mapped reads; ChIP-Seq: number of mapped reads, % of mapped reads, % of duplicates).

      We have generated a summary table for each dataset included in this study (Dataset S7) [NGS-summary.xls].

      (4) Figure S1: The rationale behind comparing the Dotts study and yours regarding H3K27ac-positive regions needs to be better defined. Why is this performed if the data will not be used afterwards? What are the conserved regions associated with vs the ones that are variable? Is this biologically relevant? Why not use only the regions conserved between the 6 samples, to have more robust conclusions?

      The purpose of comparing our data with the Dotts dataset is to highlight the degree of variation across studies. In this study, we focused on addressing specific biological questions using our own dataset rather than developing methodologies for meta-analysis. Future advancements in meta-analysis techniques could leverage the combined power of multiple datasets to provide deeper insights.

      (5) Perhaps due to a lack of details, I am unable to ascertain how the putative myometrial enhancers were defined. In Dataset S1, it is stated, "we define the regions that have overlapping H3K27ac and H3K4me1 marks as putative myometrial enhancers at the term pregnant nonlabor stage (Dataset S1)". Within Dataset S1, for subjects 1, 2, and 3, H3K27ac and H3K4me1 double-positive enhancers are shown in term pregnant, non-labor human myometrial specimens, with approximately 100 regions corresponding to 131 (sample 1), 127 (sample 2), and 140 (sample 3) common peaks. However, in Figure 1a, reference is made to the 13114 putative enhancers commonly present across the three specimens. Is Dataset S1 intended to represent only a small fraction of the 13114 putative enhancers? Detailed analyses need to be conducted and better showcased.

      Dataset S1 has been updated to list all 13,114 putative enhancers.

      (6) For the gene expression analyses of RNA-seq data, FPKM values were utilized. However, it is unclear why the gene expression count matrix was normalized based on the ratio of total mapped read pairs in each sample to 56.5 million for the term myometrial specimens. I would recommend exercising caution regarding the use of FPKM expression units, as samples are normalized only within themselves, lacking cross-sample normalization. Consequently, due to external factors unaccounted for by this normalization method, a value of 10 in one sample may not equate to 10 in another.

      We value the reviewer’s input. This question will be addressed in future secondary data analyses with suitable methodologies, as it is beyond the scope of this study.

      (7) In Figure 1b, the authors have categorized their 12157 active genes into 3 bins based on FPKM values: >5 FPKM >1, >15 FPKM >5, and >15 FPKM. However, in the text, they describe these as 'actively high-expressing genes (FPKM >= 15)'. I would advise caution regarding the interpretation of these values, as an FPKM of 15 is not typically associated with highly expressed genes. According to literature and resources such as the Expression Atlas, an FPKM of 15 is generally considered to represent a low to medium expression level.

      We appreciate the reviewer’s feedback. This question will be revisited during secondary data analyses using appropriate methodologies, as it falls outside the scope of the present study.

      To increase readability and clarity, we modified the sentence as follows: More than 40% of the 540 putative super enhancers are located within a 100-kilobase distance to high-expressing genes (FPKM >= 15), while only 7.3% of putative myometrial super enhancers are found near low-expressing genes (5 > FPKM >=1) (Figure 2B).

      (8) Out of the 12157 active genes, approximately two-thirds have an FPKM >15. Was this expected? How does this correspond to what is observed in the literature, particularly in other similar studies (https://pubmed.ncbi.nlm.nih.gov/30988671/ ; https://pubmed.ncbi.nlm.nih.gov/35260533/ ) .

      This is indeed an intriguing question that merits further exploration in future secondary analyses.

      (9) It is also surprising to see that for the motif enrichment analysis (Fig. 1C), the P-values are small. This is probably because the percentage of target sequences with the motif is very similar to the percentage of background sequences with the motif. For instance, for selected genes in Figure 1C: AP-1 (50.68% vs. 46.50%), STAT5 (28.08% vs. 25.04%), PGR (17.90% vs. 16.12%), etc. Can one really say that you have a biologically relevant enrichment for values that are so close between target sequences and background sequences?

      The reviewer’s comment is noted. Biological relevance will need to be examined experimentally through wet-lab assays in future studies.

      (10) For Figure 2, again not convinced that FPKM >= 15 can be used to say: Compared with the regular putative enhancers, the putative myometrial super-enhancers are found more frequently near active genes that are expressed at relatively higher levels (Figure 1B and Figure 2B). A higher threshold should be used if they want to say this.

      To compare the association of putative enhancers with active genes expressed at different levels, we categorized the active genes into three groups based on their FPKM (Fragments Per Kilobase of transcript per Million mapped reads) values. These groups are defined as follows: the top third active genes (FPKM ≥ 15), the middle third active genes (5 ≤ FPKM < 15), and the bottom third active genes (1 ≤ FPKM < 5). By "active genes expressed at relatively higher levels," we refer specifically to the top third active genes with FPKM values of 15 or higher, indicating their relatively higher expression levels compared to the other groups of active genes.
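
      The three expression groups defined above can be expressed as a short sketch. The function name and the tier labels are hypothetical; only the FPKM cut-offs (1, 5, and 15) come from the text.

```python
def expression_tier(fpkm):
    """Assign an active gene to the top, middle, or bottom third by FPKM.

    Cut-offs follow the response text: top (FPKM >= 15),
    middle (5 <= FPKM < 15), bottom (1 <= FPKM < 5).
    Genes below FPKM 1 are not considered active and return None.
    """
    if fpkm >= 15:
        return "top"
    if fpkm >= 5:
        return "middle"
    if fpkm >= 1:
        return "bottom"
    return None


# Hypothetical FPKM values used only to exercise the boundaries.
tiers = [expression_tier(v) for v in (0.5, 1.0, 4.9, 5.0, 14.9, 15.0, 120.0)]
```

      Writing the boundaries as half-open ranges makes it explicit that a gene with FPKM exactly 5 or 15 belongs to the higher group.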

      (11) More detailed explanations and methods are needed regarding how the data for Figure S2 was obtained.

      The following details were added to the methods section:

      “Colocalization of super enhancers and PGR genome occupancy was compared by calling peaks from previously published PGR ChIP-seq data (GSM4081683 and GSM4081684). The percentages of enhancers and super enhancers that manifest PGR occupancy were calculated by overlapping the genomic regions in each category with PGR occupancy regions.”
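
      The overlap calculation described in this methods text could be sketched as follows. This is only an illustration under stated assumptions: regions are (chromosome, start, end) tuples with half-open, BED-style coordinates, a region counts as occupied if it shares at least one base with any PGR peak, and the function names and example coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def percent_with_occupancy(regions, peaks):
    """Percentage of regions that overlap at least one peak."""
    if not regions:
        return 0.0
    hits = sum(1 for region in regions if any(overlaps(region, peak) for peak in peaks))
    return 100.0 * hits / len(regions)


# Hypothetical coordinates, for illustration only.
enhancers = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200), ("chr3", 0, 50)]
pgr_peaks = [("chr1", 150, 160), ("chr2", 200, 300)]
pct = percent_with_occupancy(enhancers, pgr_peaks)
```

      Because this measure counts any overlap, a longer region has more chances to contain a peak, which is the size effect Reviewer #2 raised in comment (8).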

      (12) In Figure 2C, there is no information provided on the genes used to obtain the results. It would be helpful to include examples of these genes, along with their expression values, for instance.

      The expression levels of the 346 active genes that are associated with myometrial super enhancers are included in Dataset S4, along with results of the updated gene ontology enrichment analysis using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) of Knowledgebase v2024q4. Selected pathways of interest are listed in updated Figure 2C.

      (13) The linking of PLCL2-related data to the first part of the story is lacking, and the rationale behind it is missing. This entire section should be more detailed, and the data should be expanded to better reflect the context.

      As suggested, we included the following statement at the beginning of the section “Cis-acting elements for the control of the contractile gene PLCL2”:

      “We previously demonstrated the positive correlation of PLCL2 and PGR expression in a mouse model and PLCL2’s function in negatively modulating oxytocin-induced myometrial cell contraction (Peavey et al., 2021). However, the mechanism underlying the PGR regulation of PLCL2 remains unclear. Taking advantage of the mapped myometrial cis-acting elements, we aimed to identify the cis-acting elements that may contribute to PLCL2 transcriptional regulation, with a special interest in the PGR-related enhancers.”

      The context is that our results provide additional evidence for a direct regulatory mechanism of PGR on PLCL2 transcription, likely through the cis-acting element 35 kb upstream. This finding suggests that PLCL2 likely acts as a mediator of PGR-dependent myometrial quiescence before labor rather than as a mere passenger on a parallel pathway. Further studies using in vivo models are needed to determine the extent to which PLCL2 mediates PGR activity, especially the contraction-dampening function of the PGR-B isoform.

      (14) The entire Hi-C data should be presented to allow for the assessment of its quality and further value.

      The revised manuscript has included the Hi-C quality control summary in Dataset S8 [HiC-QC-Summary.xlsx].

      (15) The authors state: "For the purpose of functional screening, we focus on H3K27ac signals instead of using H3K27ac/H3K4me1 double positive criterium to cast a wider net." However, it is unclear how many of the targeted regions contained H3K27ac/H3K4me1 peaks. Were enhancers or super-enhancers targeted, and if so, how did they compare to H3K27ac sites?

      The numbers of H3K27ac/H3K4me1 double positive peaks are recorded in Figure 1A. Compared to the numbers of H3K27ac intervals (Table 1), the H3K27ac/H3K4me1 double positive peaks are 62.9%, 70.7%, and 61.2% of corresponding H3K27ac intervals in each individual specimen.

      (16) For the first set of data (Table 1), the authors state, "Together, these results reveal an epigenomic landscape in the human term pregnant myometrial tissue before the onset of labor, which we use as a resource to investigate the molecular mechanisms that prepare the myometrium for subsequent parturition." While it is acknowledged that an epigenetic landscape exists in all tissues, there is a lack of clarity regarding this landscape in the current manuscript, as we are only presented with a table containing numbers.

      This sentence has been revised to: “Together, these results delineate a map of H3K27ac and H3K4me1 positive signals in the human term pregnant myometrial tissue before the onset of labor, which we use as a resource to investigate the molecular mechanisms that prepare the myometrium for subsequent parturition.”

      (17) For S1, the authors conclude: These data together highlight the degree of variation in mapping the epigenome among specimens and datasets. This conclusion seems somewhat perplexing, and I find myself in partial disagreement. Firstly, providing a clear rationale for this section would strengthen the conclusions. It's important to consider what factors may contribute to this variability. It could simply be attributed to differences in experimental settings, such as variations in samples, protocols used, antibodies, sequencing departments, or overall data quality. Deeper analyses of the data could have provided more information.

      We agree with the reviewer that deeper analyses are needed in order to extract more information among studies. However, appropriate methods for meta-analyses should be carefully evaluated and employed for this purpose. We humbly believe that such a task should belong to future studies that may combine available datasets for secondary analyses, leveraging the collective contribution of the reproductive biology community.

      (18) In the methods section, please include an explanation of how enhancers and super-enhancers were defined or add appropriate citations for reference.

      We added more details about the tool and parameter settings to the Methods section “Identification of super enhancers”:

      “Identification of super enhancers

      H3K27ac-positive enhancers were defined as regions of H3K27ac ChIP-seq peaks in each sample. Enhancers within 12.5 kb of each other were merged using the bedtools merge function with the parameter “-d 12500”. The combined enhancer regions were called super enhancers if they were larger than 15 kb. The super enhancers common to multiple samples were used for downstream analysis.”

      Reference:

      Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell. 2013 Apr 11;153(2):307-19. doi: 10.1016/j.cell.2013.03.035. PMID: 23582322; PMCID: PMC3653129.
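
      The merge-and-filter step in the “Identification of super enhancers” methods text can be illustrated with a small pure-Python sketch. It is a stand-in for the bedtools command rather than the code that was run, and it assumes half-open BED-style coordinates and that two features are merged when the gap between them is at most 12,500 bp. The function names and example intervals are hypothetical.

```python
def merge_within(intervals, max_gap=12_500):
    """Merge (chrom, start, end) intervals on the same chromosome whose gap is <= max_gap."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            # Extend the current merged region to cover this interval.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]


def super_enhancers(intervals, max_gap=12_500, min_size=15_000):
    """Merged regions larger than min_size bp are called super enhancers."""
    return [r for r in merge_within(intervals, max_gap) if r[2] - r[1] > min_size]


# Hypothetical H3K27ac peaks, for illustration only.
peaks = [("chr1", 1_000, 3_000), ("chr1", 10_000, 20_000), ("chr1", 100_000, 102_000)]
called = super_enhancers(peaks)
```

      In this example the first two peaks are 7 kb apart and merge into a single 19 kb region that passes the size filter, while the isolated 2 kb peak does not.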

      (19) Additional description on the "Inferred myometrial PGR activities and the correlation analysis "method section should be included to enhance clarity and understanding.

      The description has been updated:

      “The inferred PGR activities were represented by the T-score, which was derived by inputting the mouse myometrial Pgr gene signature, based on the differentially expressed genes between control and myometrial Pgr knockout groups at mid-pregnancy (Wu, Wang et al., 2022), into the SEMIPs application (Li, Bushel et al., 2021). The T-scores were computed using this signature alongside the normalized gene expression counts (FPKM) from 43 human myometrial biopsy specimens.”

      (20) How was the qPCR analysis performed? Was the ddCT method utilized, and was a reference gene used for control? Additional information would be beneficial.

      Quantifying relative mRNA levels was performed via the standard curve method.

      The following details were added: “Relative levels of genes of interest were normalized to the 18S rRNA.”
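
      For readers unfamiliar with the standard curve method, a minimal sketch follows. It assumes a linear fit of Ct against log10 of the input quantity for a dilution series, and a relative level computed as the interpolated target quantity divided by the interpolated 18S rRNA quantity. The function names and the dilution values are hypothetical and do not come from the study.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Interpolate the input quantity for an observed Ct on the fitted curve."""
    return 10 ** ((ct - intercept) / slope)


def relative_level(target_ct, reference_ct, target_curve, reference_curve):
    """Target quantity normalized to the reference gene (here, 18S rRNA)."""
    return quantity_from_ct(target_ct, *target_curve) / quantity_from_ct(
        reference_ct, *reference_curve
    )


# Hypothetical ten-fold dilution series: each step shifts Ct by 3.32 cycles.
curve = fit_standard_curve([1, 10, 100, 1000], [30.0, 26.68, 23.36, 20.04])
```

      Unlike the ddCt method, this approach does not assume equal amplification efficiency for the target and reference genes, because each gene is read off its own curve.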

      (21) Regarding the RNA-Seq analysis of Provera-treated human Myometrial Specimens, the continued use of FPKM is not ideal due to potential differences in RNA composition between libraries. Additionally, clarification is needed on why Cufflinks 2.0.2 was used, considering it is no longer supported.

      FPKM (Fragments Per Kilobase of transcript per Million mapped reads) is used in RNA-Seq analysis, because it allows for the normalization of gene expression data, accounting for differences in gene length and sequencing depth, and facilitates comparability across different genes and libraries. This makes it one of the essential tools for accurately measuring and comparing gene expression levels in various biological and clinical research contexts.

      Cufflinks was once a popular tool for RNA-seq data analysis, transcriptome assembly, and identification of differentially expressed genes (DEGs). Its usage has declined in recent years with the emergence of newer and more advanced tools. We used it because the RNA-seq analysis was performed at an early stage of this study, a few years ago, and we continued using it for later RNA-seq analyses for comparison and consistency. If we were to start a new project now, we would choose newer tools, such as HISAT2, Salmon, and DESeq2.

      (22) Overall, sentence structure and typos need to be corrected across the text. Here are some examples:

      Line 17: at term, emerging studies.

      Line 20-22: Here we investigated the human term pregnant nonlabor myometrial biopsies for transcriptome, enhancer histone mark cistrome, and chromatin conformation pattern mapping.

      Line 30-32: PGR overexpression facilitated PLCL2 gene expression in myometrial cells Using CRISPR activation the functionality of a PGR putative enhancer 35-kilobases upstream of the contractile-restrictive gene PLCL2.

      Line 66-70: However, the role of differential myometrial DNA methylation at contractility-driving gene promoter CpG islands in preterm birth is not thought to be major (Mitsuya, Singh et al. 2014), but given that DNA methylation-mediated gene regulation often occurs outside of CpG islands (Irizarry, Ladd-Acosta et al. 2009), there is still work to be done at this interface.

      Line 80-83: Putative enhancers upstream of the PLCL2, a gene encoding for the protein PLCL2 which has been implicated in the modulation of calcium signaling (Uji, Matsuda et al. 2002) and maintenance of myometrial quiescence (Peavey, Wu et al. 2021), transcriptional start site were subject to functional assessment using CRISPR activation based assays.

      Line 290 : sSpecimens

      We appreciate the reviewer’s kind efforts and have made changes accordingly.

    1. Public Reviews: Reviewer #1 (Public Review): Summary: A cortico-centric view is dominant in the study of the neural mechanisms of consciousness. This investigation represents the growing interest in understanding how subcortical regions are involved in conscious perception. To achieve this, the authors engaged in an ambitious and rare procedure in humans of directly recording from neurons in the subthalamic nucleus and thalamus. While participants were in surgery for the placement of deep brain stimulation devices for the treatment of essential tremor and Parkinson's disease, they were awakened and completed a perceptual-threshold tactile detection task. The authors identified individual neurons and analyzed single-unit activity corresponding with the task phases and tactile detection/perception. Among the neurons that were perception-responsive, the authors report changes in firing rate beginning ~150 milliseconds from the onset of the tactile stimulation. Curiously, the majority of the perception-responsive neurons had a higher firing rate for missed/not perceived trials. In summary, this investigation is a valuable addition to the growing literature on the role of subcortical regions in conscious perception. Strengths: The authors achieved the challenging task of recording human single-unit activity while participants performed a tactile perception task. The methods and statistics are clearly explained and rigorous, particularly for managing false positives and non-normal distributions. The results offer new detail at the level of individual neurons in the emerging recognition of the role of subcortical regions in conscious perception. We thank the reviewer for their positive comments. Weaknesses: "Nonetheless, it remains unknown how the firing rate of subcortical neurons changes when a stimulus is consciously perceived." 
(lines 76-77) The authors could be more specific about what exactly single-unit recordings offer for interrogating the role of subcortical regions in conscious perception that is unique from alternative neural activity recordings (e.g., local field potential) or recordings that are used as proxies of neural activity (e.g., fMRI). We agree with the reviewer that the contribution of micro-electrode recordings was not sufficiently put forward in our manuscript. We added the following sentences to the discussion, when discussing the multiple types of neurons we found: Single-unit recordings provide a much higher temporal resolution than functional imaging, which helps assess how the neural correlates of consciousness unfold over time. Contrary to local field potentials, single-unit recordings can expose the variety of functional roles of neurons within subcortical regions, thereby offering a potential for a better mechanistic understanding of perceptual consciousness. Related comment for the following excerpts: "After a random delay ranging from 0.5 to 1 s, a "respond" cue was played, prompting participants to verbally report whether they felt a vibration or not. Therefore, none of the reported analyses are confounded by motor responses." (lines 97-99). "These results show that subthalamic and thalamic neurons are modulated by stimulus onset, irrespective of whether it was reported or not, even though no immediate motor response was required." (lines 188190). "By imposing a delay between the end of the tactile stimulation window and the subjective report, we ensured that neuronal responses reflected stimulus detection and not mere motor responses." (lines 245247). It is a valuable feature of the paradigm that the reporting period was initiated hundreds of milliseconds after the stimulus presentation so that the neural responses should not represent "mere motor responses". 
However, verbal report of having perceived or not perceived a stimulus is a motor response and because the participants anticipate having to make these reports before the onset of the response period, there may be motor preparatory activity from the time of the perceived stimulus that is absent for the not perceived stimulus. The authors show sensitivity to this issue by identifying task-selective neurons and their discussion of the results that refer to the confound of post-perceptual processing. Still, direct treatment of this possible confound would help the rigor of the interpretation of the results. We agree with the reviewer that direct treatment would have provided the best control. One way to avoid motor preparation is to only provide the stimulus-effector mapping after the stimulus presentation (Bennur & Gold, 2011; Twomey et al., 2016; Fang et al., 2024). Other controls to avoid post-perceptual processing used in consciousness research consist of using no-report paradigms (Tsuchiya et al., 2015) as we did in previous studies (Pereira et al., 2021; Stockart et al., 2024). Unfortunately, neither of these procedures was feasible during the 10 minutes allotted for the research task in an intraoperative setting with auditory cues and vocal responses. We would like to highlight nonetheless that the effects we report are short-lived and incompatible with sustained motor preparation activity. We added the following sentence to the discussion: Future studies ruling out the presence of motor preparation triggered by perceived stimuli (Bennur & Gold, 2011; Fang et al., 2024; Twomey et al., 2016) and verifying that similar neuronal activity occurs in the absence of task-demands (no-reports; Tsuchiya et al., 2015) or attention (Wyart & Tallon-Baudry, 2008) will be useful to support that subcortical neurons contribute specifically to perceptual consciousness. 
"When analyzing tactile perception, we ensured that our results were not contaminated with spurious behavior (e.g. fluctuation of attention and arousal due to the surgical procedure)." (lines 118-117). Confidence in the results would be improved if the authors clarified exactly what behaviors were considered as contaminating the results (e.g., eye closure, saccades, and bodily movements) and how they were determined. This sentence was indeed unclear. It introduced the trial selection procedure we used to compensate for drifts in the perceptual threshold, which can result from fluctuations in attention or arousal. We modified the sentence, which now reads: When analyzing tactile perception, we ensured that our results were not contaminated by fluctuating attention and arousal due to the surgical procedure. Based on objective criteria, we excluded specific series of trials from analyses and focused on time windows for which hits and misses occurred in commensurate proportions (see methods). During the recordings, the experimenter stood next to the patients and monitored their bodily movements, ensuring they did not close their eyes or produce any other bodily movements synchronous with stimulus presentation. The authors' discussion of the thalamic neurons could be more precise. The authors show that only certain areas of the thalamus were recorded (in or near the ventral lateral nucleus, according to Figure S3C). The ventral lateral nucleus has a unique relationship to tactile and motor systems, so do the authors hypothesize these same perception-selective neurons would be active in the same way for visual, auditory, olfactory, and taste perception? Moreover, the authors minimally interpret the location of the task, sensory, and perception-responsive neurons. Figure S3 suggests these neurons are overlapping. 
Did the authors expect this overlap and what does it mean for the functional organization of the ventral lateral nucleus and subthalamic nucleus in conscious perception? These are excellent questions, the answers to which we can only speculate. In rodents, the LT is known as a hub for multisensory processing, as over 90% of LT neurons respond to at least two sensory modalities (for a review, see Yang et al., 2024). Yet, no study has compared how LT neurons in rodents encode perceived and nonperceived stimuli across modalities. Evidence in humans is scarce, with only a few studies documenting supramodal neural correlates of consciousness at the cortical level with noninvasive methods (Noel et al., 2018; Sanchez et al., 2020; Filimonov et al., 2022). We now refer to these studies in the revised discussion: Moreover, given the prominent role of the thalamus in multisensory processing, it will be interesting to assess if it is specifically involved in tactile consciousness or if it has a supramodal contribution, akin to what is found in the cortex (Noel et al., 2018; Sanchez et al., 2020; Filimonov et al., 2022). Concerning the anatomical overlap of neurons, we could not reconstruct the exact locations of the DBS tracts for all participants. Because of the limited number of recorded neurons, we preferred to refrain from drawing strong conclusions about the functional organization of the ventral lateral nucleus. "We note that, 6 out of 8 neurons had higher firing rates for missed trials than hit trials, although this proportion was not significant (binomial test: p = 0.145)." (lines 215-216). It appears that in the three example neurons shown in Figure 4, 2 out of 3 (#001 and #068) show a change in firing rate predominantly for the missed stimulations. Meanwhile, #034 shows a clear hit response (although there is an early missed response - decreased firing rate - around 150 ms that is not statistically significant). 
This is a counterintuitive finding when compared to previous results from the thalamus (e.g., local field potentials and fMRI) that show the opposite response profile (i.e., missed/not perceived trials display no change or reduced response relative to hit/perceived trials). The discussion of the results should address this, including if these seemingly competing findings can be rectified. We thank the reviewer for pointing out this limitation of the discussion. We avoided putting too much emphasis on these aspects due to the limited number of perception-selective neurons. Although subcortical connectivity models would predict that neurons in the thalamus should increase their firing rate for perceived stimuli, we were not surprised to see this heterogeneity as we had previously found neurons decreasing their firing rates for missed stimuli in the posterior parietal cortex (Pereira et al., 2021). We answer these points in response to the reviewer’s last comment below on the latencies of the effects. The authors report 8 perception-responsive neurons, but there are only 5 recording sites highlighted (i.e., filled-in squares and circles) in Figures S3C and 4D. Was this an omission or were three neurons removed from the perception-responsive analysis? Unfortunately, we could not obtain anatomical images for all participants. This information was present in the methods section, although not clearly enough: For 34 / 50 neurons, preoperative MRI and postoperative CT scans (co-registered in patient native space using CranialSuite) were available to precisely reconstruct surgical trajectories and recording locations (for the remaining 16 neurons, localizations were based on neurosurgical planning and confirmed by electrophysiological recordings at various depths). Therefore, we added the following sentence in Figures 2, 3, 4 and S3. [...] for patients for which we could obtain anatomical images. Could the authors speak to the timing of the responses reported in Figure 4? 
The statistically significant intervals suggested both early (~160-200ms) to late responses (~300ms). Some have hypothesized that subcortical regions are early - ahead of cortical activation that may be linked with conscious perception. Do these results say anything about this temporal model for when subcortical regions are active in conscious perception? We agree that response timing could have been better described. We performed a new analysis of the latencies at which our main effects were observed. This analysis revealed the existence of the two clusters mentioned by the reviewer very clearly. We now include this analysis in a new Figure 5 in the revised manuscript. We also performed a new analysis to support the existence of bimodal distributions and quantified the latencies. We added this text to the result section: We note that the timings of sensory and perception effects in Figures 3 and 4 showed a bimodal distribution with an early cluster (149 ms for sensory neurons; 121 ms for perception neurons; c.f. methods) and a later cluster (330 ms for sensory neurons; 315 ms for perception neurons; Figure 5). and this section to the methods: To measure bimodal timings of effect latencies, we fitted a two-component Gaussian mixture distribution to the data in Figure 5 by minimizing the mean square error with an interior-point method. We took the best of 20 runs with random initialization points and verified that the resulting mean square error was markedly (> 4 times) better than using a single component. We updated the discussion, including the points made in the comment about higher activity for missed stimuli (above): The early cluster’s average timing around 150 ms post-stimulus corresponds to the onset of a putative cortical correlate of tactile consciousness, the somatosensory awareness negativity (Dembski et al., 2021). Similar electroencephalographic markers are found in the visual and auditory modality. 
It is unclear, however, whether these markers are related to perceptual consciousness or selective attention (Dembski et al., 2021). The later cluster is centered around 300 ms and could correspond to a well-known electroencephalographic marker, the P3b (Polich, 2007) whose association with perceptual consciousness has been questioned (Pitts et al., 2014; Dembski et al., 2021) although brain activity related to consciousness has been observed at similar timing even in the absence of report demands (Sergent et al., 2021; Stockart et al., 2024). It is also important to note that these clusters contain neurons with both increased and decreased firing rates following stimulus onset, similar to what was observed previously in the posterior parietal cortex (Pereira et al., 2021). Reviewer #2 (Public Review): The authors have studied subpopulations of individual neurons recorded in the thalamus and subthalamic nucleus (STN) of awake humans performing a simple cognitive task. They have carefully designed their task structure to eliminate motor components that could confound their analyses in these subcortical structures, given that the data was recorded in patients with Parkinson's Disease (PD) and diagnosed with an Essential Tremor (ET). The recorded data represents a promising addition to the field. The analyses that the authors have applied can serve as a strong starting point for exploring the kinds of complex signals that can emerge within a single neuron's activity. Pereira et al. conclude that their results from single neurons indicate that task-related activity occurs, purportedly separate from previously identified sensory signals. These conclusions are a promising and novel perspective for how the field thinks about the emergence of decisions and sensory perception across the entire brain as a unit. We thank the reviewer for these positive comments. 
Despite the strength of the data that was obtained and the relevant nature of the conclusions that were drawn, there are certain limitations that must be taken into consideration: (1) The authors make several claims that their findings are direct representations of consciousness identifiable in subcortical structures. The current context for consciousness does not sufficiently define how the consciousness is related to the perceptual task. This is indeed a complex issue in all studies concerned with perceptual consciousness and we were careful not to make such “direct” claims. Instead, we used the state-of-the-art tools available to study consciousness (see below) and only interpreted our findings with respect to consciousness in the discussion. For example, in the abstract, our claim is that “Our results provide direct neurophysiological evidence of the involvement of the subthalamic nucleus and the thalamus for the detection of vibrotactile stimuli, thereby calling for a less cortico-centric view of the neural correlates of consciousness.” In brief, first, we used near-threshold stimuli which allowed us to contrast reported vs. unreported trials while keeping the physical properties of the stimulus comparable. Second, we used subjective reports without incentive for participants to be more conservative or liberal in their response (e.g. through reward). Third, we introduced a random delay before the responses to limit confounding effects due to the report. We also acknowledged that “... it will be important in future studies to examine if similar subcortical responses are obtained when stimuli are unattended (Wyart & Tallon-Baudry, 2008), task-irrelevant (Shafto & Pitts, 2015), or when participants passively experience stimuli without the instruction to report them (i.e., no-report paradigms) (Tsuchiya et al., 2015)”. 
This last sentence now reads (to address a point made by Reviewer 1 about motor preparation): Future studies ruling out the presence of motor preparation triggered by perceived stimuli (Bennur & Gold, 2011; Fang et al., 2024; Twomey et al., 2016) and verifying that similar neuronal activity occurs in the absence of task-demands (no-reports; Tsuchiya et al., 2015) or attention (Wyart & Tallon-Baudry, 2008) will be useful to support that subcortical neurons contribute specifically to perceptual consciousness. (2) The current work would benefit greatly from a description and clarification of what all the neurons that have been recorded are doing. The authors' criteria for selecting subpopulations with task-relevant activity are appropriate, but understanding the heterogeneity in a population of single neurons is important for broader considerations that are being studied within the field. We followed the reviewer’s suggestions and added new results regarding the latencies of the reported effects (new Figure 5). We also now show firing rates for hits, misses and overall sensory activity (hits and misses combined) for all perception-selective or sensory-selective (when behavior was good enough; Figure S5). Although a more detailed characterization of the heterogeneity of the neurons identified would have been relevant, it seems beyond the scope of the present study, especially given the relatively small number of neurons we identified, as well as the relative simplicity of the paradigm imposed by the clinical context in which we worked. (3) The authors have omitted a proper set of controls for comparison against the active trials, for example, where a response was not necessary. Please explain why this choice was made and what implications are necessary to consider. 
We had mentioned this limitation in the discussion: Nevertheless, it will be important in future studies to examine if similar subcortical responses are obtained when stimuli are unattended (Wyart & Tallon-Baudry, 2008), task-irrelevant (Shafto & Pitts, 2015), or when participants passively experience stimuli without the instruction to report them (i.e., no-report paradigms) (Tsuchiya et al., 2015). We agree that such a control would have been relevant, but this was not feasible during the 10 minutes allotted for the research task in an intraoperative setting. These constraints are both clinical, to minimize discomfort for patients, and practical, as it is difficult to track neurons in an intraoperative setting for more than 10 minutes. We added a sentence to this effect in the discussion. Reviewer #3 (Public Review): Summary: This important study relies on a rare dataset: intracranial recordings within the thalamus and the subthalamic nucleus in awake humans, while they were performing a tactile detection task. This procedure allowed the authors to identify a small but significant proportion of individual neurons, in both structures, whose activity correlated with the task (e.g. their firing rate changed following the audio cue signalling the start of a trial) and/or with the stimulus presentation (change in firing rate around 200 ms following tactile stimulation) and/or with participant's reported subjective perception of the stimulus (difference between hits and misses around 200 ms following tactile stimulation). Whereas most studies interested in the neural underpinnings of conscious perception focus on cortical areas, these results suggest that subcortical structures might also play a role in conscious perception, notably tactile detection. Strengths: There are two strongly valuable aspects in this study that make the evidence convincing and even compelling. 
First, these types of data are exceptional, the authors could have access to subcortical recordings in awake and behaving humans during surgery. Additionally, the methods are solid. The behavioral study meets the best standards of the domain, with a careful calibration of the stimulation levels (staircase) to maintain them around the detection threshold, and an additional selection of time intervals where the behavior was stable. The authors also checked that stimulus intensity was the same on average for hits and misses within these selected periods, which warrants that the effects of detection that are observed here are not confounded by stimulus intensity. The neural data analysis is also very sound and well-conducted. The statistical approach complies with current best practices, although I found that, in some instances, it was not entirely clear which type of permutations had been performed, and I would advocate for more clarity in these instances. Globally the figures are nice, clear, and well presented. I appreciated the fact that the precise anatomical location of the neurons was directly shown in each figure. We thank the reviewer for this positive evaluation. Weaknesses: Some clarification is needed for interpreting Figure 3, top rows: in my understanding the black curve is already the result of a subtraction between stimulus present trials and catch trials, to remove potential drifts; if so, it does not make sense to compare it with the firing rate recorded for catch trials. The black curve represents the firing rate without any subtraction. We only subtracted the firing rates of catch trials in the statistical procedure, as the reviewer noted, to remove potential drift. We added (before baseline correction) to the legend of Figure 3. I also think that the article could benefit from a more thorough presentation of the data and that this could help refine the interpretation which seems to be a bit incomplete in the current version. 
There are 8 stimulus-responsive neurons and 8 perception-selective neurons, with only one showing both effects, resulting in a total of 15 individual neurons being in either category or 13 neurons if we exclude those in which the behavior is not good enough for the hit versus miss analysis (Figure S4A). In my opinion, it should be feasible to show the data for all of them (either in a main figure, or at least in supplementary), but in the present version, we get to see the data for only 3 neurons for each analysis. This very small selection includes the only neuron that shows both effects (neuron #001; which is also cue selective), but this is not highlighted in the text. It would be interesting to see both the stimulus-response data and the hit versus miss data for all 13 neurons as it could help develop the interpretation of exactly how these neurons might be involved in stimulus processing and conscious perception. This should give rise to distinct interpretations for the three possible categories. Neurons that are stimulus-responsive but not perception-selective should show the same response for both hits and misses and hence carry out indifferently conscious and unconscious responses. The fact that some neurons show the opposite pattern is particularly intriguing and might give rise to a very specific interpretation: if the neuron really doesn't tend to respond to the stimulus when hits and misses are put together, it might be a neuron that does not directly respond to the stimulus, but whose spontaneous fluctuations across trials affect how the stimulus is perceived when they occur in a specific time window after the stimulus. Finally, neuron #001 responds with what looks like a real burst of evoked activity to stimulation and also shows a difference between hits and misses, but intriguingly, the response is strongest for misses. 
In the discussion, the interesting interpretation in terms of a specific gating of information by subcortical structures seems to apply well to this last example, but not necessarily to the other categories. We now provide a supplementary Figure showing firing rates for hits, misses and the combination of both. The reviewer’s analysis about whether a perception-selective neuron also has to respond to the stimulus to be involved in gating is interesting. With more data, a finer characterization of these neurons would have been possible. In our study, it is possible that more neurons have similar characteristics as #001 (e.g. #032, #062, #068) but do not show a significant difference with respect to baseline when both hits and misses are considered. We now avoid interpreting null effects, especially considering the low number of trials with near-threshold detection behavior we could collect in 10 minutes. We also realized that we had not updated Figure S7 after the last revision in which we had corrected for possible drifts to obtain sensory-selective neurons. The corrected panel A is provided below. Recommendations for the authors: Reviewer #1 (Recommendations For The Authors): It appears that the correct rejection was low for most participants. It would improve interpretation of the behavioral results if correct rejection was shown as a rate (i.e., # of correct rejection trials / total number of no stimulus/blank trials) rather than or in addition to reporting the number of correct rejection trials (Figure 1C). We added the following figure to the supplementary information. The axis tick marks in Figure 5A late versus early are incorrect (appears the axis was duplicated). Thank you for spotting this, it has been corrected. Reviewer #2 (Recommendations For The Authors): We would like to congratulate the authors on this strongly supported contribution to the field. The manuscript is well-written, although a little bit too concise in sections. 
See the following comments for the methods that could benefit the present conclusions: Thank you for these suggestions that we believe improved our interpretations. Major Points (1) The subpopulations of neurons that are considered are small, but it is not a confounding issue for the conclusions drawn. However, the behavior of the neurons that were excluded should be considered by calculating the percentage of neurons that are selective for the distinct parameters, as a function of time. This would greatly strengthen the understanding of what can be observed in the two subcortical structures. We thank the reviewer for this suggestion. We performed a new analysis of the latencies at which our main effects were observed. This analysis revealed the existence of two clusters, as shown in the new Figure 5 copied below. We also performed a new analysis to support the existence of bimodal distributions and quantified the latencies. We added this text to the result section: We note that the timings of sensory and perception effects in Figures 3 and 4 showed a bimodal distribution with an early cluster (149 ms for sensory neurons; 121 ms for perception neurons; c.f. methods) and a later cluster (330 ms for sensory neurons; 315 ms for perception neurons; Figure 5). and this section to the methods: To measure bimodal timings of effect latencies, we fitted a two-component Gaussian mixture distribution to the data in Figure 5 by minimizing the mean square error with an interior-point method. We took the best of 20 runs with random initialization points and verified that the resulting mean square error was markedly (> 4 times) better than using a single component. We also updated the discussion: The early cluster’s average timing around 150 ms post-stimulus corresponds to the onset of a putative cortical correlate of tactile consciousness, the somatosensory awareness negativity (Dembski et al., 2021). 
Similar electroencephalographic markers are found in the visual and auditory modality. It is unclear, however, whether these markers are related to perceptual consciousness or selective attention (Dembski et al., 2021). The later cluster is centered around 300 ms and could correspond to a well-known electroencephalographic marker, the P3b (Polich, 2007) whose association with perceptual consciousness has been questioned (Pitts et al., 2014; Dembski et al., 2021) although brain activity related to consciousness has been observed at similar timing even in the absence of report demands (Sergent et al., 2021; Stockart et al., 2024). It is also important to note that these clusters contain neurons with both increased and decreased firing rates following stimulus onset, similar to what was observed previously in the posterior parietal cortex (Pereira et al., 2021). (2) We highly recommend that the authors consider employing some analysis that decodes the representations observable in the activity of individual neurons as a function of time (e.g. Shannon's Mutual Information). This would reinforce and emphasize the most relevant conclusions. We thank the reviewers for this suggestion. Unfortunately, such methods would require many more trials than what we were able to collect in the 10-minute slots available in the operating room. (3) Although there are small populations recorded in each of the two subcortical structures, they are sufficient to attempt a study using population dynamics (primarily, PCA can still work with smaller populations). Given the broad range of dynamics that are observed in a population of single units typically involved in decision-making, it would be interesting to consider whether heterogeneity is a hallmark of decision-making, and trying to summarize the variance in the activity of the entire population should provide a certain understanding of the cue-selective versus the perception-selective qualities, as an example. 
We now present all 13 neurons that were sensory- or perception-selective for which we had good enough behavior to show hit vs. miss differences in Supplementary Figure S5. Although population-level analyses would be relevant, they are not compatible with the number of neurons we identified. (4) A stronger presentation of what the expectations are for the results would also benefit the interpretability of the manuscript when added to the introduction and discussion sections. Due to the scarcity of single-neuron data related to perceptual consciousness, especially in the subcortical structures we explored, our prior expectations did not exceed finding perception-selective neurons. We would prefer to avoid refining these expectations post-hoc. Minor Comments (1) Add the shared overlap between differently selective neurons explicitly in the manuscript. We added this information at the end of the results section. (2) Add a consideration in the methods of why the Wilcoxon test or permutation test was selected for separate uses. How do the results compare? Sorry for this misunderstanding. We clarified this in revised methods: To deal with possibly non-parametric distributions, we used Wilcoxon rank sum test or sign test instead of t-tests to test differences between distributions. We used permutation tests instead of Binomial tests to test whether a reported number of neurons could have been obtained by chance. Reviewer #3 (Recommendations For The Authors): Suggestions for improved or additional experiments, data or analysis: As suggested already in the public review, it might be worth showing all 13 neurons with either stimulus-responsive or perception-selective behaviour and, based on that, deepen the potential interpretation of the results for the different categories. We agree that this information improves the understanding of the underlying data and this addition was also proposed by reviewer 2. We added it in a new supplementary Figure S5. 
Recommendations for improving the writing and presentation As mentioned in the public review, I think Figure 3 needs clarification. I found that, in some instances, it was not entirely clear which type of analyses or permutation tests had been performed, and I would advocate for more clarity in these instances. For example: Page 6 line 146 "permuting trial labels 1000 times": do you mean randomly attributing a trial to a neuron? Or something else? We agree that this was somewhat unclear. We modified the sentence to: permuting the sign of the trial-wise differences We now define a sign permutation test for paired tests and a trial permutation test for two-sample tests in the methods and specify which test was used in the main text. Page 7, neurons which have their firing rate modulated by the stimulus: I think you ought to be more explicit about the analysis so that we grasp it on the first read. To understand what is shown in Figure 3 I had to go back and forth between the main text and the method, and I am still not sure I completely understood. You compare the firing rate in sliding windows following stimulus onset with the mean firing rate during the 300ms baseline. Sliding windows are between 0 and 400 ms post-stim (according to methods ?) and a neuron is deemed responsive if you find at least one temporal cluster that shows a significant difference with baseline activity (using cluster permutation). Is that correct? Either way, I would recommend being a bit more precise about the analysis that was carried out in the main text, so that we only need to refer to methods when we need specialized information. We agree that the methods section was unclear. 
We re-wrote the following two paragraphs: To identify sensory-selective neurons, we assumed that subcortical signatures of stimulus detection ought to be found early following its onset and looked for differences in the firing rates during the first 400 ms post-stimulus onset compared to a 300 ms pre-stimulus baseline. To correct for possible drifts occurring during the trial, we subtracted the average cue-locked activity of catch trials from the cue-locked activity of each stimulus-present trial before realigning to stimulus onset. We defined a cluster as a set of adjacent time points for which the firing rates were significantly different between hits and misses, as assessed by a non-parametric sign rank test. A putative neuron was considered sensory-selective when the length of a cluster was above 80 ms, corresponding to twice the standard deviation of the smoothing kernel used to compute the firing rate. Whether for the shuffled data or the observed data, if more than one cluster was obtained, we discarded all but the longest cluster. This permutation test allowed us to control for multiple comparisons across time and participants. For perception-selective neurons, we looked for differences in the firing rates between hit and miss trials during the first 400 ms post-stimulus onset. We defined a cluster as a set of adjacent time points for which the firing rates were significantly different between hits and misses as assessed by a nonparametric Wilcoxon rank sum test. As for sensory-selective neurons, a putative neuron was considered perception-selective when the length of a cluster was above 80 ms, corresponding to twice the standard deviation of the smoothing kernel used to compute the firing rate, and we discarded all but the longest cluster. 
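For readers who want to see the logic of the perception-selective analysis in concrete form, the following is a minimal sketch, not the authors' code. It assumes firing rates are stored as one list of time bins per trial, uses a normal approximation to the rank sum test without tie correction, and shuffles hit/miss labels to build the null distribution of the longest cluster. The function names, the default `alpha`, and the number of permutations are illustrative assumptions; converting the 80 ms criterion into a number of bins depends on the bin width, which is not stated here.

```python
import math
import random


def ranksum_p(a, b):
    """Two-sided rank sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(a), len(b)
    # Mann-Whitney U statistic computed from pairwise comparisons.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - n1 * n2 / 2.0) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


def longest_cluster(hits, misses, alpha=0.05):
    """Length, in time bins, of the longest run of significant bins."""
    best = run = 0
    for t in range(len(hits[0])):
        p = ranksum_p([trial[t] for trial in hits],
                      [trial[t] for trial in misses])
        run = run + 1 if p < alpha else 0
        best = max(best, run)
    return best


def cluster_permutation_test(hits, misses, n_perm=1000, alpha=0.05, seed=0):
    """Return (observed longest cluster, permutation p-value).

    Only the longest cluster is kept for observed and shuffled data alike.
    """
    rng = random.Random(seed)
    observed = longest_cluster(hits, misses, alpha)
    pooled = list(hits) + list(misses)
    n_hits = len(hits)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign hit/miss labels at random
        shuffled = longest_cluster(pooled[:n_hits], pooled[n_hits:], alpha)
        if shuffled >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)
```

Under these assumptions, a neuron would be flagged when the observed cluster exceeds the duration criterion (80 ms in the text) and is rarely matched by the shuffled data. The paired sensory-selective analysis would replace the label shuffle with random sign flips of the trial-wise differences.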
Minor points:

Figure 3: inset showing action potentials, please also provide the time scale (in the legend for example), so that it's clear that it is not commensurate with the firing rate curve below, but rather corresponds to the dots of the raster plot.

We added the text ”[...], duration: 2.5 ms” in Figures 2, 3, and 4.

Line 210: I recommend: “we found 8 neurons [...] showing a significant difference *between hits and misses* after stimulus onset."

We made the change.

Top of page 9, the following sentence is misleading: “This result suggests that neurons in these two subcortical structures have mostly different functional roles”; this could read as meaning that functional roles are different between the two structures. Probably what you mean is rather something along this line: “these two subcortical structures both contain neurons displaying several different functional roles”

Changed.

Line 329: remove double “when”

We made the change, thank you for spotting this.

    2. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      A cortico-centric view is dominant in the study of the neural mechanisms of consciousness. This investigation represents the growing interest in understanding how subcortical regions are involved in conscious perception. To achieve this, the authors engaged in an ambitious and rare procedure in humans of directly recording from neurons in the subthalamic nucleus and thalamus. While participants were in surgery for the placement of deep brain stimulation devices for the treatment of essential tremor and Parkinson's disease, they were awakened and completed a perceptual-threshold tactile detection task. The authors identified individual neurons and analyzed single-unit activity corresponding with the task phases and tactile detection/perception. Among the neurons that were perception-responsive, the authors report changes in firing rate beginning ~150 milliseconds from the onset of the tactile stimulation. Curiously, the majority of the perception-responsive neurons had a higher firing rate for missed/not perceived trials. In summary, this investigation is a valuable addition to the growing literature on the role of subcortical regions in conscious perception.

      Strengths:

      The authors achieved the challenging task of recording human single-unit activity while participants performed a tactile perception task. The methods and statistics are clearly explained and rigorous, particularly for managing false positives and non-normal distributions. The results offer new detail at the level of individual neurons in the emerging recognition of the role of subcortical regions in conscious perception.

      We thank the reviewer for their positive comments.

      Weaknesses:

      "Nonetheless, it remains unknown how the firing rate of subcortical neurons changes when a stimulus is consciously perceived." (lines 76-77) The authors could be more specific about what exactly single-unit recordings offer for interrogating the role of subcortical regions in conscious perception that is unique from alternative neural activity recordings (e.g., local field potential) or recordings that are used as proxies of neural activity (e.g., fMRI).

      We agree with the reviewer that the contribution of micro-electrode recordings was not sufficiently put forward in our manuscript. We added the following sentences to the discussion, when discussing the multiple types of neurons we found:

      Single-unit recordings provide a much higher temporal resolution than functional imaging, which helps assess how the neural correlates of consciousness unfold over time. Contrary to local field potentials, single-unit recordings can expose the variety of functional roles of neurons within subcortical regions, thereby offering a potential for a better mechanistic understanding of perceptual consciousness.

      Related comment for the following excerpts:

      "After a random delay ranging from 0.5 to 1 s, a "respond" cue was played, prompting participants to verbally report whether they felt a vibration or not. Therefore, none of the reported analyses are confounded by motor responses." (lines 97-99).

      "These results show that subthalamic and thalamic neurons are modulated by stimulus onset, irrespective of whether it was reported or not, even though no immediate motor response was required." (lines 188-190).

      "By imposing a delay between the end of the tactile stimulation window and the subjective report, we ensured that neuronal responses reflected stimulus detection and not mere motor responses." (lines 245-247).

      It is a valuable feature of the paradigm that the reporting period was initiated hundreds of milliseconds after the stimulus presentation so that the neural responses should not represent "mere motor responses". However, verbal report of having perceived or not perceived a stimulus is a motor response and because the participants anticipate having to make these reports before the onset of the response period, there may be motor preparatory activity from the time of the perceived stimulus that is absent for the not perceived stimulus. The authors show sensitivity to this issue by identifying task-selective neurons and their discussion of the results that refer to the confound of post-perceptual processing. Still, direct treatment of this possible confound would help the rigor of the interpretation of the results.

      We agree with the reviewer that direct treatment would have provided the best control. One way to avoid motor preparation is to only provide the stimulus-effector mapping after the stimulus presentation (Bennur & Gold, 2011; Twomey et al., 2016; Fang et al., 2024). Other controls to avoid post-perceptual processing used in consciousness research consist of using no-report paradigms (Tsuchiya et al., 2015) as we did in previous studies (Pereira et al., 2021; Stockart et al., 2024). Unfortunately, neither of these procedures was feasible during the 10 minutes allotted for the research task in an intraoperative setting with auditory cues and vocal responses. We would like to highlight nonetheless that the effects we report are short-lived and incompatible with sustained motor preparation activity.

      We added the following sentence to the discussion:

      Future studies ruling out the presence of motor preparation triggered by perceived stimuli (Bennur & Gold, 2011; Fang et al., 2024; Twomey et al., 2016) and verifying that similar neuronal activity occurs in the absence of task-demands (no-reports; Tsuchiya et al., 2015) or attention (Wyart & Tallon-Baudry, 2008) will be useful to support that subcortical neurons contribute specifically to perceptual consciousness.

      "When analyzing tactile perception, we ensured that our results were not contaminated with spurious behavior (e.g. fluctuation of attention and arousal due to the surgical procedure)." (lines 118-117).

      Confidence in the results would be improved if the authors clarified exactly what behaviors were considered as contaminating the results (e.g., eye closure, saccades, and bodily movements) and how they were determined.

      This sentence was indeed unclear. It introduced the trial selection procedure we used to compensate for drifts in the perceptual threshold, which can result from fluctuations in attention or arousal. We modified the sentence, which now reads:

      When analyzing tactile perception, we ensured that our results were not contaminated by fluctuating attention and arousal due to the surgical procedure. Based on objective criteria, we excluded specific series of trials from analyses and focused on time windows for which hits and misses occurred in commensurate proportions (see methods).

      During the recordings, the experimenter stood next to the patients and monitored their bodily movements, ensuring they did not close their eyes or produce any other bodily movements synchronous with stimulus presentation.

      The authors' discussion of the thalamic neurons could be more precise. The authors show that only certain areas of the thalamus were recorded (in or near the ventral lateral nucleus, according to Figure S3C). The ventral lateral nucleus has a unique relationship to tactile and motor systems, so do the authors hypothesize these same perception-selective neurons would be active in the same way for visual, auditory, olfactory, and taste perception? Moreover, the authors minimally interpret the location of the task, sensory, and perception-responsive neurons. Figure S3 suggests these neurons are overlapping. Did the authors expect this overlap and what does it mean for the functional organization of the ventral lateral nucleus and subthalamic nucleus in conscious perception?

      These are excellent questions, about which we can only speculate. In rodents, the LT is known as a hub for multisensory processing, as over 90% of LT neurons respond to at least two sensory modalities (for a review, see Yang et al., 2024). Yet, no study has compared how LT neurons in rodents encode perceived and nonperceived stimuli across modalities. Evidence in humans is scarce, with only a few studies documenting supramodal neural correlates of consciousness at the cortical level with noninvasive methods (Noel et al., 2018; Sanchez et al., 2020; Filimonov et al., 2022). We now refer to these studies in the revised discussion: Moreover, given the prominent role of the thalamus in multisensory processing, it will be interesting to assess if it is specifically involved in tactile consciousness or if it has a supramodal contribution, akin to what is found in the cortex (Noel et al., 2018; Sanchez et al., 2020; Filimonov et al., 2022).

      Concerning the anatomical overlap of neurons, we could not reconstruct the exact locations of the DBS tracts for all participants. Because of the limited number of recorded neurons, we preferred to refrain from drawing strong conclusions about the functional organization of the ventral lateral nucleus.

      "We note that, 6 out of 8 neurons had higher firing rates for missed trials than hit trials, although this proportion was not significant (binomial test: p = 0.145)." (lines 215-216).

      It appears that in the three example neurons shown in Figure 4, 2 out of 3 (#001 and #068) show a change in firing rate predominantly for the missed stimulations. Meanwhile, #034 shows a clear hit response (although there is an early missed response - decreased firing rate - around 150 ms that is not statistically significant). This is a counterintuitive finding when compared to previous results from the thalamus (e.g., local field potentials and fMRI) that show the opposite response profile (i.e., missed/not perceived trials display no change or reduced response relative to hit/perceived trials). The discussion of the results should address this, including if these seemingly competing findings can be rectified.

      We thank the reviewer for pointing out this limitation of the discussion. We avoided putting too much emphasis on these aspects due to the limited number of perception-selective neurons. Although subcortical connectivity models would predict that neurons in the thalamus should increase their firing rate for perceived stimuli, we were not surprised to see this heterogeneity as we had previously found neurons decreasing their firing rates for missed stimuli in the posterior parietal cortex (Pereira et al., 2021). We answer these points in response to the reviewer’s last comment below on the latencies of the effects.
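
      For reference, the binomial p-value quoted in the excerpt above (6 out of 8 neurons, p = 0.145) can be reproduced with a short calculation. This is a minimal sketch under our own assumptions: the response does not state the null probability or the sidedness of the test, and the value matches a one-sided test against a chance level of 0.5. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the one-sided upper-tail p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 6 of 8 neurons fired more for misses; assumed null: each direction equally likely.
p_one_sided = binomial_upper_tail(6, 8)  # (28 + 8 + 1) / 256
print(round(p_one_sided, 3))
```

      Under the same assumptions, a two-sided test would double this value, so the reported figure appears to correspond to the one-sided case.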

      The authors report 8 perception-responsive neurons, but there are only 5 recording sites highlighted (i.e., filled-in squares and circles) in Figures S3C and 4D. Was this an omission or were three neurons removed from the perception-responsive analysis?

      Unfortunately, we could not obtain anatomical images for all participants. This information was present in the methods section, although not clearly enough:

      For 34 / 50 neurons, preoperative MRI and postoperative CT scans (co-registered in patient native space using CranialSuite) were available to precisely reconstruct surgical trajectories and recording locations (for the remaining 16 neurons, localizations were based on neurosurgical planning and confirmed by electrophysiological recordings at various depths).

      Therefore, we added the following sentence in Figures 2, 3, 4 and S3.

      [...] for patients for which we could obtain anatomical images.

      Could the authors speak to the timing of the responses reported in Figure 4? The statistically significant intervals suggested both early (~160-200ms) to late responses (~300ms). Some have hypothesized that subcortical regions are early - ahead of cortical activation that may be linked with conscious perception. Do these results say anything about this temporal model for when subcortical regions are active in conscious perception?

      We agree that response timing could have been better described. We performed a new analysis of the latencies at which our main effects were observed. This analysis revealed the existence of the two clusters mentioned by the reviewer very clearly. We now include this analysis in a new Figure 5 in the revised manuscript.

      We also performed a new analysis to support the existence of bimodal distributions and quantified the latencies. We added this text to the result section:

      We note that the timings of sensory and perception effects in Figures 3 and 4 showed a bimodal distribution with an early cluster (149 ms for sensory neurons; 121 ms for perception neurons; c.f. methods) and a later cluster (330 ms for sensory neurons; 315 ms for perception neurons; Figure 5).

      We added this section to the methods:

      To measure bimodal timings of effect latencies, we fitted a two-component Gaussian mixture distribution to the data in Figure 5 by minimizing the mean square error with an interior-point method. We took the best of 20 runs with random initialization points and verified that the resulting mean square error was markedly (> 4 times) better than using a single component.

      We updated the discussion, including the points made in the comment about higher activity for missed stimuli (above):

      The early cluster’s average timing around 150 ms post-stimulus corresponds to the onset of a putative cortical correlate of tactile consciousness, the somatosensory awareness negativity (Dembski et al., 2021). Similar electroencephalographic markers are found in the visual and auditory modality. It is unclear, however, whether these markers are related to perceptual consciousness or selective attention (Dembski et al., 2021). The later cluster is centered around 300 ms and could correspond to a well known electroencephalographic marker, the P3b (Polich, 2007) whose association with perceptual consciousness has been questioned (Pitts et al., 2014; Dembski et al., 2021) although brain activity related to consciousness has been observed at similar timing even in the absence of report demands (Sergent et al., 2021; Stockart et al., 2024). It is also important to note that these clusters contain neurons with both increased and decreased firing rates following stimulus onset, similar to what was observed previously in the posterior parietal cortex (Pereira et al., 2021).

      Reviewer #2 (Public Review):

      The authors have studied subpopulations of individual neurons recorded in the thalamus and subthalamic nucleus (STN) of awake humans performing a simple cognitive task. They have carefully designed their task structure to eliminate motor components that could confound their analyses in these subcortical structures, given that the data was recorded in patients with Parkinson's Disease (PD) and diagnosed with an Essential Tremor (ET). The recorded data represents a promising addition to the field. The analyses that the authors have applied can serve as a strong starting point for exploring the kinds of complex signals that can emerge within a single neuron's activity. Pereira et al. conclude that their results from single neurons indicate that task-related activity occurs, purportedly separate from previously identified sensory signals. These conclusions are a promising and novel perspective for how the field thinks about the emergence of decisions and sensory perception across the entire brain as a unit.

      We thank the reviewer for these positive comments.

      Despite the strength of the data that was obtained and the relevant nature of the conclusions that were drawn, there are certain limitations that must be taken into consideration:

      (1) The authors make several claims that their findings are direct representations of consciousness identifiable in subcortical structures. The current context for consciousness does not sufficiently define how the consciousness is related to the perceptual task.

      This is indeed a complex issue in all studies concerned with perceptual consciousness and we were careful not to make such “direct” claims. Instead, we used the state-of-the-art tools available to study consciousness (see below) and only interpreted our findings with respect to consciousness in the discussion. For example, in the abstract, our claim is that “Our results provide direct neurophysiological evidence of the involvement of the subthalamic nucleus and the thalamus for the detection of vibrotactile stimuli, thereby calling for a less cortico-centric view of the neural correlates of consciousness.”

      In brief, first, we used near-threshold stimuli which allowed us to contrast reported vs. unreported trials while keeping the physical properties of the stimulus comparable. Second, we used subjective reports without incentive for participants to be more conservative or liberal in their response (e.g. through reward). Third, we introduced a random delay before the responses to limit confounding effects due to the report. We also acknowledged that “... it will be important in future studies to examine if similar subcortical responses are obtained when stimuli are unattended (Wyart & Tallon-Baudry, 2008), task-irrelevant (Shafto & Pitts, 2015), or when participants passively experience stimuli without the instruction to report them (i.e., no-report paradigms) (Tsuchiya et al., 2015)”. This last sentence now reads (to address a point made by Reviewer 1 about motor preparation):

      Future studies ruling out the presence of motor preparation triggered by perceived stimuli (Bennur & Gold, 2011; Fang et al., 2024; Twomey et al., 2016) and verifying that similar neuronal activity occurs in the absence of task-demands (no-reports; Tsuchiya et al., 2015) or attention (Wyart & Tallon-Baudry, 2008) will be useful to support that subcortical neurons contribute specifically to perceptual consciousness.

      (2) The current work would benefit greatly from a description and clarification of what all the neurons that have been recorded are doing. The authors' criteria for selecting subpopulations with task-relevant activity are appropriate, but understanding the heterogeneity in a population of single neurons is important for broader considerations that are being studied within the field.

      We followed the reviewer’s suggestions and added new results regarding the latencies of the reported effects (new Figure 5). We also now show firing rates for hits, misses and overall sensory activity (hits and misses combined) for all perception-selective or sensory-selective (when behavior was good enough; Figure S5). Although a more detailed characterization of the heterogeneity of the neurons identified would have been relevant, it seems beyond the scope of the present study, especially given the relatively small number of neurons we identified, as well as the relative simplicity of the paradigm imposed by the clinical context in which we worked.

      (3) The authors have omitted a proper set of controls for comparison against the active trials, for example, where a response was not necessary. Please explain why this choice was made and what implications are necessary to consider.

      We had mentioned this limitation in the discussion: Nevertheless, it will be important in future studies to examine if similar subcortical responses are obtained when stimuli are unattended (Wyart & Tallon-Baudry, 2008), task-irrelevant (Shafto & Pitts, 2015), or when participants passively experience stimuli without the instruction to report them (i.e., no-report paradigms) (Tsuchiya et al., 2015). We agree that such a control would have been relevant, but this was not feasible during the 10 minutes allotted for the research task in an intraoperative setting. These constraints are both clinical, to minimize discomfort for patients, and practical, as it is difficult to track neurons in an intraoperative setting for more than 10 minutes.

      We added a sentence to this effect in the discussion.

      Reviewer #3 (Public Review):

      Summary:

      This important study relies on a rare dataset: intracranial recordings within the thalamus and the subthalamic nucleus in awake humans, while they were performing a tactile detection task. This procedure allowed the authors to identify a small but significant proportion of individual neurons, in both structures, whose activity correlated with the task (e.g. their firing rate changed following the audio cue signalling the start of a trial) and/or with the stimulus presentation (change in firing rate around 200 ms following tactile stimulation) and/or with participant's reported subjective perception of the stimulus (difference between hits and misses around 200 ms following tactile stimulation). Whereas most studies interested in the neural underpinnings of conscious perception focus on cortical areas, these results suggest that subcortical structures might also play a role in conscious perception, notably tactile detection.

      Strengths:

      There are two strongly valuable aspects in this study that make the evidence convincing and even compelling. First, these types of data are exceptional, the authors could have access to subcortical recordings in awake and behaving humans during surgery. Additionally, the methods are solid. The behavioral study meets the best standards of the domain, with a careful calibration of the stimulation levels (staircase) to maintain them around the detection threshold, and an additional selection of time intervals where the behavior was stable. The authors also checked that stimulus intensity was the same on average for hits and misses within these selected periods, which warrants that the effects of detection that are observed here are not confounded by stimulus intensity. The neural data analysis is also very sound and well-conducted. The statistical approach complies with current best practices, although I found that, in some instances, it was not entirely clear which type of permutations had been performed, and I would advocate for more clarity in these instances. Globally the figures are nice, clear, and well presented. I appreciated the fact that the precise anatomical location of the neurons was directly shown in each figure.

      We thank the reviewer for this positive evaluation.

      Weaknesses:

      Some clarification is needed for interpreting Figure 3, top rows: in my understanding the black curve is already the result of a subtraction between stimulus present trials and catch trials, to remove potential drifts; if so, it does not make sense to compare it with the firing rate recorded for catch trials.

      The black curve represents the firing rate without any subtraction. We only subtracted the firing rates of catch trials in the statistical procedure, as the reviewer noted, to remove potential drift. We added (before baseline correction) to the legend of Figure 3.

      I also think that the article could benefit from a more thorough presentation of the data and that this could help refine the interpretation which seems to be a bit incomplete in the current version. There are 8 stimulus-responsive neurons and 8 perception-selective neurons, with only one showing both effects, resulting in a total of 15 individual neurons being in either category or 13 neurons if we exclude those in which the behavior is not good enough for the hit versus miss analysis (Figure S4A). In my opinion, it should be feasible to show the data for all of them (either in a main figure, or at least in supplementary), but in the present version, we get to see the data for only 3 neurons for each analysis. This very small selection includes the only neuron that shows both effects (neuron #001; which is also cue selective), but this is not highlighted in the text. It would be interesting to see both the stimulus-response data and the hit versus miss data for all 13 neurons as it could help develop the interpretation of exactly how these neurons might be involved in stimulus processing and conscious perception. This should give rise to distinct interpretations for the three possible categories. Neurons that are stimulus-responsive but not perception-selective should show the same response for both hits and misses and hence carry out indifferently conscious and unconscious responses. The fact that some neurons show the opposite pattern is particularly intriguing and might give rise to a very specific interpretation: if the neuron really doesn't tend to respond to the stimulus when hits and misses are put together, it might be a neuron that does not directly respond to the stimulus, but whose spontaneous fluctuations across trials affect how the stimulus is perceived when they occur in a specific time window after the stimulus. Finally, neuron #001 responds with what looks like a real burst of evoked activity to stimulation and also shows a difference between hits and misses, but intriguingly, the response is strongest for misses. In the discussion, the interesting interpretation in terms of a specific gating of information by subcortical structures seems to apply well to this last example, but not necessarily to the other categories.

      We now provide a supplementary Figure showing firing rates for hits, misses and the combination of both. The reviewer’s analysis about whether a perception-selective neuron also has to respond to the stimulus to be involved in gating is interesting. With more data, a finer characterization of these neurons would have been possible. In our study, it is possible that more neurons have similar characteristics as #001 (e.g. #032, #062, #068) but do not show a significant difference with respect to baseline when both hits and misses are considered. We now avoid interpreting null effects, especially considering the low number of trials with near-threshold detection behavior we could collect in 10 minutes. 

      We also realized that we had not updated Figure S7 after the last revision in which we had corrected for possible drifts to obtain sensory-selective neurons. The corrected panel A is provided below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It appears that the correct rejection was low for most participants. It would improve interpretation of the behavioral results if correct rejection was shown as a rate (i.e., # of correct rejection trials / total number of no stimulus/blank trials) rather than or in addition to reporting the number of correct rejection trials (Figure 1C).

      We added the following figure to the supplementary information.

      The axis tick marks in Figure 5A late versus early are incorrect (appears the axis was duplicated).

      Thank you for spotting this, it has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      We would like to congratulate the authors on this strongly supported contribution to the field. The manuscript is well-written, although a little bit too concise in sections. See the following comments for the methods that could benefit the present conclusions:

      Thank you for these suggestions that we believe improved our interpretations.

      Major Points

      (1) The subpopulations of neurons that are considered are small, but it is not a confounding issue for the conclusions drawn. However, the behavior of the neurons that were excluded should be considered by calculating the percentage of neurons that are selective for the distinct parameters, as a function of time. This would greatly strengthen the understanding of what can be observed in the two subcortical structures.

      We thank the reviewer for this suggestion. We performed a new analysis of the latencies at which our main effects were observed. This analysis revealed the existence of two clusters, as shown in the new Figure 5 copied below.

      We also performed a new analysis to support the existence of bimodal distributions and quantified the latencies. We added this text to the result section:

      We note that the timings of sensory and perception effects in Figures 3 and 4 showed a bimodal distribution with an early cluster (149 ms for sensory neurons; 121 ms for perception neurons; c.f. methods) and a later cluster (330 ms for sensory neurons; 315 ms for perception neurons; Figure 5).

      We added this section to the methods:

      To measure bimodal timings of effect latencies, we fitted a two-component Gaussian mixture distribution to the data in Figure 5 by minimizing the mean square error with an interior-point method. We took the best of 20 runs with random initialization points and verified that the resulting mean square error was markedly (> 4 times) better than using a single component.

      We also updated the discussion:

      The early cluster’s average timing around 150 ms post-stimulus corresponds to the onset of a putative cortical correlate of tactile consciousness, the somatosensory awareness negativity (Dembski et al., 2021). Similar electroencephalographic markers are found in the visual and auditory modality. It is unclear, however, whether these markers are related to perceptual consciousness or selective attention (Dembski et al., 2021). The later cluster is centered around 300 ms and could correspond to a well known electroencephalographic marker, the P3b (Polich, 2007) whose association with perceptual consciousness has been questioned (Pitts et al., 2014; Dembski et al., 2021) although brain activity related to consciousness has been observed at similar timing even in the absence of report demands (Sergent et al., 2021; Stockart et al., 2024). It is also important to note that these clusters contain neurons with both increased and decreased firing rates following stimulus onset, similar to what was observed previously in the posterior parietal cortex (Pereira et al., 2021).

      (2) We highly recommend that the authors consider employing some analysis that decodes the representations observable in the activity of individual neurons as a function of time (e.g. Shannon's Mutual Information). This would reinforce and emphasize the most relevant conclusions.

      We thank the reviewers for this suggestion. Unfortunately, such methods would require many more trials than what we were able to collect in the 10-minute slots available in the operating room.

      (3) Although there are small populations recorded in each of the two subcortical structures, they are sufficient to attempt a study using population dynamics (primarily, PCA can still work with smaller populations). Given the broad range of dynamics that are observed in a population of single units typically involved in decision-making, it would be interesting to consider whether heterogeneity is a hallmark of decision-making, and trying to summarize the variance in the activity of the entire population should provide a certain understanding of the cue-selective versus the perception-selective qualities, as an example.

      We now present all 13 neurons that were sensory- or perception-selective for which we had good enough behavior to show hit vs. miss differences in Supplementary Figure S5. Although population-level analyses would be relevant, they are not compatible with the number of neurons we identified.

      (4) A stronger presentation of what the expectations are for the results would also benefit the interpretability of the manuscript when added to the introduction and discussion sections.

      Due to the scarcity of single-neuron data related to perceptual consciousness, especially in the subcortical structures we explored, our prior expectations did not exceed finding perception-selective neurons. We would prefer to avoid refining these expectations post-hoc. 

      Minor Comments

      (1) Add the shared overlap between differently selective neurons explicitly in the manuscript.

      We added this information at the end of the results section.

      (2) Add a consideration in the methods of why the Wilcoxon test or permutation test was selected for separate uses. How do the results compare?

      Sorry for this misunderstanding. We clarified this in revised methods:

      To deal with possibly non-parametric distributions, we used Wilcoxon rank sum test or sign test instead of t-tests to test differences between distributions. We used permutation tests instead of Binomial tests to test whether a reported number of neurons could have been obtained by chance.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analysis:

      As suggested already in the public review, it might be worth showing all 13 neurons with either stimulus-responsive or perception-selective behaviour and, based on that, deepen the potential interpretation of the results for the different categories.

      We agree that this information improves the understanding of the underlying data and this addition was also proposed by reviewer 2. We added it in a new supplementary Figure S5.

      Recommendations for improving the writing and presentation

      As mentioned in the public review, I think Figure 3 needs clarification. I found that, in some instances, it was not entirely clear which type of analyses or permutation tests had been performed, and I would advocate for more clarity in these instances. For example:

      Page 6 line 146 "permuting trial labels 1000 times": do you mean randomly attributing a trial to a neuron? Or something else?

      We agree that this was somewhat unclear. We modified the sentence to:

      permuting the sign of the trial-wise differences

      We now define a sign permutation test for paired tests and a trial permutation test for two-sample tests in the methods and specify which test was used in the main text.
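      The two tests named above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the choice of the mean as test statistic, the two-sided comparison, and the fixed seed are all assumptions made for illustration; only the idea of flipping the signs of trial-wise differences (paired case) or shuffling trial labels (two-sample case) comes from the response.

```python
import random
from statistics import mean


def sign_permutation_test(diffs, n_perm=1000, seed=0):
    """Paired test: randomly flip the sign of each trial-wise difference.

    Returns a two-sided p-value for the mean difference (assumed statistic).
    """
    rng = random.Random(seed)
    observed = abs(mean(diffs))
    count = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            count += 1
    # Add-one correction so the p-value is never exactly zero.
    return (count + 1) / (n_perm + 1)


def trial_permutation_test(a, b, n_perm=1000, seed=0):
    """Two-sample test: shuffle trial labels between the two conditions."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(mean(perm_a) - mean(perm_b)) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)
```

      In this sketch, the paired version would apply to comparisons against each trial's own baseline, and the two-sample version to comparisons between independent sets of trials such as hits and misses.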

      Page 7, neurons which have their firing rate modulated by the stimulus: I think you ought to be more explicit about the analysis so that we grasp it on the first read. To understand what is shown in Figure 3 I had to go back and forth between the main text and the method, and I am still not sure I completely understood. You compare the firing rate in sliding windows following stimulus onset with the mean firing rate during the 300ms baseline. Sliding windows are between 0 and 400 ms post-stim (according to methods ?) and a neuron is deemed responsive if you find at least one temporal cluster that shows a significant difference with baseline activity (using cluster permutation). Is that correct? Either way, I would recommend being a bit more precise about the analysis that was carried out in the main text, so that we only need to refer to methods when we need specialized information.

      We agree that the methods section was unclear. We re-wrote the following two paragraphs:

      To identify sensory-selective neurons, we assumed that subcortical signatures of stimulus detection ought to be found early following its onset and looked for differences in the firing rates during the first 400 ms post-stimulus onset compared to a 300 ms pre-stimulus baseline. To correct for possible drifts occurring during the trial, we subtracted the average cue-locked activity of catch trials from the cue-locked activity of each stimulus-present trial before realigning to stimulus onset. We defined a cluster as a set of adjacent time points for which the firing rates were significantly different between hits and misses, as assessed by a non-parametric sign rank test. A putative neuron was considered sensory-selective when the length of a cluster was above 80 ms, corresponding to twice the standard deviation of the smoothing kernel used to compute the firing rate. Whether for the shuffled data or the observed data, if more than one cluster was obtained, we discarded all but the longest cluster. This permutation test allowed us to control for multiple comparisons across time and participants.

      For perception-selective neurons, we looked for differences in the firing rates between hit and miss trials during the first 400 ms post-stimulus onset. We defined a cluster as a set of adjacent time points for which the firing rates were significantly different between hits and misses as assessed by a nonparametric Wilcoxon rank sum test. As for sensory-selective neurons, a putative neuron was considered perception-selective when the length of a cluster was above 80 ms, corresponding to twice the standard deviation of the smoothing kernel used to compute the firing rate and we discarded all but the longest cluster.
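      The cluster criterion described in these two paragraphs can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the per-time-point p-values are taken as given, and the 1 ms time step, the 0.05 significance level, and the function names are hypothetical. Only the 80 ms minimum length and the rule of keeping the longest cluster come from the text.

```python
def longest_cluster(significant):
    """Return (start_index, length) of the longest run of True values.

    Returns (None, 0) when no time point is significant.
    """
    best_start, best_len = None, 0
    run_start, run_len = None, 0
    for i, is_sig in enumerate(significant):
        if is_sig:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


def is_selective(p_values, dt_ms=1.0, alpha=0.05, min_ms=80.0):
    """Keep only the longest cluster and require its duration to exceed min_ms.

    dt_ms (time step) and alpha are assumed values for illustration.
    """
    significant = [p < alpha for p in p_values]
    _, length = longest_cluster(significant)
    return length * dt_ms > min_ms
```

      Applying the same function to shuffled data would give the distribution of longest-cluster lengths expected by chance, which is how a cluster permutation test of this kind controls for multiple comparisons across time.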

      Minor points:

      Figure 3: inset showing action potentials, please also provide the time scale (in the legend for example), so that it's clear that it is not commensurate with the firing rate curve below, but rather corresponds to the dots of the raster plot.

      We added the text ”[...], duration: 2.5 ms” in Figures 2, 3, and 4.

      Line 210: I recommend: “we found 8 neurons [...] showing a significant difference *between hits and misses* after stimulus onset."

      We made the change.

      Top of page 9, the following sentence is misleading: “This result suggests that neurons in these two subcortical structures have mostly different functional roles”; this could read as meaning that functional roles are different between the two structures. Probably what you mean is rather something along these lines: “these two subcortical structures both contain neurons displaying several different functional roles”.

      Changed.

      Line 329: remove double “when”

      We made the change, thank you for spotting this.

    1. Author response:

      The following is the authors’ response to the previous reviews

      We would like to thank you for your valuable comments and suggestions, which have greatly contributed to improving our manuscript.

      We have carefully addressed all the reviewers' suggestions, and detailed responses for each Reviewer are provided at the end of this letter. In summary:

      • The Introduction has been revised to provide a more focused discussion on results, toning down the speculative discussion on seasonal host shifts.

      • The methodology section has been clarified, particularly the power analysis, which now includes a clearer explanation. The random effects in the models have been better described to ensure transparency.

      • The Results section was reorganized to highlight the key findings more effectively.

      • The Discussion has been restructured for clarity and conciseness, ensuring the interpretation of the results is clearer and better aligned with the study objectives.

      • Minor edits throughout the manuscript were made to improve readability and accuracy.

      We hope you find this revised version of the manuscript satisfactory.

      Reviewer #1 (Public review):

      Summary:

      This study examines the role of host blood meal source, temperature, and photoperiod on the reproductive traits of Cx. quinquefasciatus, an important vector of numerous pathogens of medical importance. The host use pattern of Cx. quinquefasciatus is interesting in that it feeds on birds during spring and shifts to feeding on mammals towards fall. Various hypotheses have been proposed to explain the seasonal shift in host use in this species but have provided limited evidence. This study examines whether the shifting of host classes from birds to mammals towards autumn offers any reproductive advantages to Cx. quinquefasciatus in terms of enhanced fecundity, fertility, and hatchability of the offspring. The authors found no evidence of this, suggesting that alternate mechanisms may drive the seasonal shift in host use in Cx. quinquefasciatus.

      Strengths:

      Host blood meal source, temperature, and photoperiod were all examined together.

      Weaknesses:

      The study was conducted in laboratory conditions with a local population of Cx. quinquefasciatus from Argentina. I'm not sure if there is any evidence for a seasonal shift in the host use pattern in Cx. quinquefasciatus populations from the southern latitudes.

      Comments on the revision:

      Overall, the manuscript is much improved. However, the introduction and parts of the discussion that talk about addressing the question of seasonal shift in host use pattern of Cx. quin are still way too strong and must be toned down. There is no strong evidence to show this host shift in Argentinian mosquito populations. Therefore, it is just misleading. I suggest removing all this and sticking to discussing only the effects of blood meal source and seasonality on the reproductive outcomes of Cx. quin.

      The introduction and discussion have been modified and toned down, and now stick to discussing the results, as suggested.

      Reviewer #1 (Recommendations for the authors):

      Some more minor comments are mentioned below.

      Line 51: Because 'of' this,

      Changed as suggested.

      Line 56: specialists 'or' generalists

      Changed as suggested.

      Line 56: primarily

      Changed as suggested.

      Line 98: Because 'of' this,

      Changed as suggested.

      Reviewer #2 (Public review):

      Summary:

      Conceptually, this study is interesting and is the first attempt to account for the potentially interactive effects of seasonality and blood source on mosquito fitness, which the authors frame as a possible explanation for previously observed host switching of Culex quinquefasciatus from birds to mammals in the fall. The authors hypothesize that if changes in fitness by blood source change between seasons, higher fitness on birds in the summer and on mammals in the autumn could drive observed host switching. To test this, the authors fed individuals from a colony of Cx. quinquefasciatus on chickens (bird model) and mice (mammal model) and subjected each of these two groups to two different environmental conditions reflecting the high and low temperatures and photoperiod experienced in summer and autumn in Córdoba, Argentina (aka seasonality). They measured fecundity, fertility, and hatchability over two gonotrophic cycles. The authors then used generalized linear mixed models to evaluate the impact of host species, seasonality, and gonotrophic cycle on fecundity, fertility, and hatchability. The authors were trying to test their hypothesis by determining whether there was an interactive effect of season and host species on mosquito fitness. This is an interesting hypothesis; if it had been supported, it would provide support for a new mechanism driving host switching. While the authors did report an interactive impact of seasonality and host species, the directionality of the effect was the opposite from that hypothesized. The authors have done a very good job of addressing many of the reviewer's concerns, especially by adding two additional replicates. Several minor concerns remain, especially regarding unclear statements in the discussion.

      Strengths:

      (1) Using a combination of laboratory feedings and incubators to simulate seasonal environmental conditions is a good, controlled way to assess the potentially interactive impact of host species and seasonality on the fitness of Culex quinquefasciatus in the lab.

      (2) The driving hypothesis is an interesting and creative way to think about a potential driver of host switching observed in the field.

      Weaknesses:

      (1) The methods would be improved by some additional details. For example, clarifying the number of generations for which mosquitoes were maintained in colony (which was changed from 20 to several) and whether replicates were conducted at different time points.

      Changed as suggested.

      (2) The statistical analysis requires some additional explanation. For example, you suggest that the power analysis was conducted a priori, but this was not mentioned in your first two drafts, so I wonder if it was actually conducted after the first replicate. It would be helpful to include further detail, such as how the parameters were estimated. Also, it would be helpful to clarify why replicate was included as a random effect for fecundity and fertility but as a fixed effect for hatchability. This might explain why there were no significant differences for hatchability given that you were estimating for more parameters.

      The power analysis was conducted a posteriori, as you correctly inferred. While I did not indicate that it was performed a priori, you are right in noting that this was not explicitly mentioned. As you suggested, the methodology for the power analysis has been revised to clarify any potential doubts.

      Regarding the model for hatchability, a model without a random effect variable was used, as all attempts to fit models with random effects resulted in poor validation. These points have now been clarified and explained in the corresponding section.

      (3) A number of statements in the discussion are not clear. For example, what do you mean by a mixed perspective in the first paragraph? Also, why is the expectation mentioned in the second paragraph different from the hypothesis you described in your introduction?

      Changed as suggested.

      (4) According to eLife policy, data must be made freely available (not just upon request).

      Data and code will be publicly available. The corresponding section was modified.

      Reviewer #2 (Recommendations for the authors):

      Your manuscript is much improved by the inclusion of two additional replicates! The results are much more robust when we can see that the trends that you report are replicable across 3 iterations of the experiment. Congratulations on a greatly improved study and paper! I have several minor concerns and suggestions, listed below:

      38-39: I think it is clearer to say "no statistically significant effect of season on hatchability of eggs" ... or specify if you are referring to blood or the interaction of blood and season. It isn't clear which treatment you are referring to here.

      Changed as suggested.

      54-57: This could be stated more succinctly. Instead of citing papers that deal with specific examples of patterns, I would suggest citing a review paper that defines these terms.

      Changed as suggested.

      83-84: What if another migratory bird is the preferred host in Argentina? I would state this more cautiously (e.g. "may not be applicable...").

      Changed as suggested.

      95-96: I don't understand what you mean by this. These hypotheses are specifically meant to understand mosquitoes that DO have a distinct seasonal phenology, so I'm not sure why this caveat is relevant. And naturally this hypothesis is host dependent, since it is based on specific host reproductive investments. I think that the strongest caveat to this hypothesis is simply that it hasn't been proven.

      Changed as suggested.

      97-115: This is a great paragraph! Very clear and compelling.

      Thanks for your words!

      118: Do you have an exact or estimated number of rafts collected?

      Sorry, I do not have the exact number of rafts, but it was at least 20-30.

      135: "over twenty" was changed to "several"; several would imply about 3 generations, so this is misleading. If the colony was actually maintained for over twenty generations, then you should keep that wording.

      Changed as suggested.

      163-164: Can you please clarify whether the replicates were conducted at separate time points?

      Changed as suggested.

      Note: the track changes did not capture all of the changes made; e.g. 163-164 should show as new text but does not.

      You are absolutely right; when I uploaded the last version, I unfortunately deleted all tracked changes and cannot recover them. In this new version, I will ensure that all minimal changes are included as tracked changes.

      186 - 189: the terms should be "fixed effect" and "random effect"

      Changed as suggested.

      191: Edit: linear

      Changed as suggested.

      194: why was replicate not included as a random effect here when it was above? Also, can you please clarify "interaction effects"? Which interactions did you include?

      Changed as suggested. This is explained above and in the methodology. Hatchability models with a random effect variable fitted and validated poorly. The interaction included for hatchability was a four-way interaction (season, blood source, cycle and replicate).

      207-208: I'm not sure what you mean by "aimed to achieve"? Weren't you doing this after you conducted the experiments, so wouldn't this be determining the power of your model (post-hoc power analysis)? Also, I think you should provide the parameter estimates that were used (e.g. effect size - did you use the effect size you estimated across the 3 replicates?).

      Changed as suggested.

      214-215: this should be reworded to acknowledge that this is estimated for the given effect size; for example, something like "This sample size was sufficient to detect the observed effect with a statistical power of 0.8" or something along those lines (unless I am misunderstanding how you conducted this test).

      Changed as suggested.
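      To make the reviewer's point concrete, a post-hoc power estimate can be obtained by simulation. The sketch below is a deliberately simplified illustration, not the analysis used in the manuscript: it assumes two groups with normally distributed outcomes and unit variance, a standardized effect size, and a two-sided z-approximation, whereas the study itself used generalized linear mixed models. All names and default values are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def simulated_power(effect_size, n_per_group, alpha=0.05, n_sim=2000, seed=1):
    """Estimate power as the fraction of simulated experiments that reject H0.

    Assumes two normal groups (sd = 1) separated by `effect_size` and a
    two-sided z-approximation; these are illustrative assumptions only.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sim):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        # Standard error of the difference in means, from sample variances.
        se = (stdev(a) ** 2 / n_per_group + stdev(b) ** 2 / n_per_group) ** 0.5
        z = (mean(b) - mean(a)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sim
```

      Because the effect size fed into such a simulation is the one estimated from the data, the resulting power is conditional on that estimate, which is why the reviewer asks for the wording "sufficient to detect the observed effect".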

      246. Abbreviate Culex

      Changed as suggested.

      253-255: This sentence isn't clear. What do you mean by mixed? Also, the season really seemed to mainly impact the fitness of mosquitoes fed on mouse blood and here the way it is phrased seems to indicate that season has an impact on the fitness of those fed with chicken blood.

      Changed as suggested.

      258-260: You stated your hypothesis as the relative fitness shifting between seasons, but this statement about the expectation is different from your hypothesis stated earlier. Please clarify.

      You are right. Thank you for noting this. It was changed as suggested.  

      263-266: I also don't understand this sentence; what does the first half of the sentence have to do with the second?

      Changed as suggested.

      269-270: This doesn't align with your observation exactly; you say first AND second are generally most productive, but you observed a drop in the second. Please clarify this.

      Changed as suggested.

      280: I suggest removing "as same as other studies"; your caveats are distinct because your experimental design was unique

      Changed as suggested.

      287: you shouldn't be looking for a "desired" effect; I suggest removing this word

      Changed as suggested.

      288: It wasn't really a priori though, since you conducted it after your first replicate (unless you didn't use the results from the first replicate you reported in the original drafts?)

      It was a posteriori. Changed as suggested.

      290: Why is 290 written here?

      It was a mistype. Deleted as suggested.

      291-298: The meaning of this section of your paragraph is not clear.

      Improved as suggested.

      304-313: This list of 3 explanations are directed at different underlying questions. Explanations 1 and 2 are alternative explanations for why host switching occurs if not due to differences in fitness. This isn't really an explanation of your results so much as alternative explanations for a previously reported phenomenon. And the third is an explanation for why you may not have observed the expected effect. I suggest restructuring this to include the fact that Argentinian quinqs may not host switch as part of your previous list of caveats. Then you can include your two alternative explanations for host switching as a possible future direction (although I would say that it is really just one explanation because "vector biology" is too broad of a statement to be testable). Also, you haven't discussed possible explanations for your actual result, which showed that mosquito fitness decreased when feeding on mouse blood in autumn conditions and in the second gonotrophic, while those that fed on chicken did not experience these changes. Why might that be?

      The discussion was restructured to include all these suggested changes. Additionally, some possible explanations of our results are now discussed.

      315-317: This statement is vague without a direct explanation of how this will provide insight. I suggest removing or providing an explanation of how this provides insight to transmission and forecasting.

      Changed as suggested.

      319-320: According to eLife policy, all data should be publicly available. From guidelines: "Media Policy FAQs Data Availability Purpose and General Principles To maintain high standards of research reproducibility, and to promote the reuse of new findings, eLife requires all data associated with an article to be made freely and widely available. These must be in the most useful formats and according to the relevant reporting standards, unless there are compelling legal or ethical reasons to restrict access. The provision of data should comply with FAIR principles (Findable, Accessible, Interoperable, Reusable). Specifically, authors must make all original data used to support the claims of the paper, or that is required to reproduce them, available in the manuscript text, tables, figures or supplementary materials, or at a trusted digital repository (the latter is recommended). This must include all variables, treatment conditions, and observations described in the manuscript. The authors must also provide a full account of the materials and procedures used to collect, pre-process, clean, generate and analyze the data that would enable it to be independently reproduced by other researchers."

      - so you need to make your data available online; I also understand the last sentence to indicate that code should be made available.  

      Data and code will be publicly available.

      Table 1: it is notable that in replicate 2, the autumn:mouse:gonotrophic cycle II fecundity and fertility are actually higher than in the summer, which is the opposite of reps 1 and 3 and the overall effect you reported from the model. This might be worth mentioning in the discussion.

      Mentioned in the discussion as suggested.

      Tables 1 and 2: shouldn't this just be 8 treatments? You included replicate as a random effect, so it isn't really a separate set of treatments.

      This table reflects the output of the whole experiment, which is why all 24 experiments are presented.

      Figure 3: Can you please clarify if this is showing raw data?

      Changed as suggested.

      Note: grammatical copy editing would be beneficial throughout

      Grammar was improved as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors Eapen et al. investigated the peptide inhibitors of Cdc20. They applied a rational design approach, substituting residues found in the D-box consensus sequences to better align the peptides with the Cdc20-degron interface. In the process, the authors designed and tested a series of more potent binders, including ones that contain unnatural amino acids, and verified binding modes by elucidating the Cdc20-peptide structures. The authors further showed that these peptides can engage with Cdc20 in the cellular context, and can inhibit APC/C-Cdc20 ubiquitination activity. Finally, the authors demonstrated that these peptides could be used as portable degron motifs that drive the degradation of a fused fluorescent protein.

      Strengths:

      This manuscript is clear and straightforward to follow. The investigation of different peptide variations was comprehensive and well-executed. This work provided the groundwork for the development of peptide drug modalities to inhibit degradation or apply peptides as portable motifs to achieve targeted degradation. Both of which are impactful.

      Weaknesses:

      A few minor comments:

      (1) In my opinion, more attention to the solubility issue needs to be discussed and/or tested. On page 10, what is the solubility of D2 before a modification was made? The authors mentioned that position 2 is likely solvent exposed, it is not immediately clear to me why the mutation made was from one hydrophobic residue to another. What was the level of improvement in solubility? Are there any affinity data associated with the peptide that differ with D2 only at position 2?

      The reviewer is correct that we have not done any detailed solubility characterisation; we refer only to observations rather than quantitative analysis. We wrote that we reverted from Leu to Ala due to solubility - we have clarified this statement (page 11) to say that we reverted to Ala, as it was the residue present in D1, for which we observed a measurable affinity by SPR and saw a concentration-dependent response in the thermal shift analysis. We do not have any peptides or affinity data that explore single-site mutations with the parental peptide of D2. D2 is included in the paper because of its link to the consensus D-box sequence and thus was the logical path to the investigations into positions 3 and 7 that come later in the manuscript.

      (2) I'm not entirely convinced that the D19 density not observed in the crystal structure was due to crystal packing. This peptide is peculiar as it also did not induce any thermal stabilization of Cdc20 in the cellular thermal shift assay. Perhaps the binding of this peptide could be investigated in more detail (i.e., NMR?) Or at least more explanation could be provided.

      This section has been clarified (page 16). The lack of observed density was likely due to the relatively low affinity of D19 and also to the lack of binding of the three C-terminal residues in the crystal, and consequently it has a further reduced affinity. The current wording in the manuscript puts greater emphasis on this second aspect being a D19-specific issue, even though it applies to all four soaked peptides. The extent of peptide-induced thermal stabilisations observed by TSA and CETSA is different, with the latter experiment consistently showing smaller shifts. This observation may be due to the more complex medium (cell lysate vs. purified protein) and/or different concentrations of the proteins in solution. In the CETSA, we over-expressed a HiBiT-tagged Cdc20, which is present in addition to any endogenously expressed Cdc20. Although we did not investigate it, the near identical D-box binding sites on Cdc20 and Cdh1 would suggest that there will be cross-specificity, which could further influence the CETSA experiments.

      The section now reads:

      “We therefore assume that this is the reason for the lack of observed density in this region of the peptides D20 and D21 (Fig. S3E and S3F, respectively). We believe that it causes a reduction in binding affinities of all peptides in crystallo, given the evidence from SPR highlighting a role of position 7 in the interaction (Table 1). Interestingly, the observed electron density of the peptide correlates with Cdc20 binding affinity: D21 and D20, having the highest affinities, display the clearest electron density allowing six amino acids to be modeled, whereas D7 shows relatively poor density permitting modelling of only four residues. For D19, the lack of density observed likely reflects its intrinsically weaker affinity compared to the other peptides, in addition to losing the interactions from position 7 due to crystal packing.”

      Reviewer #2 (Public review):

      Summary:

      The authors took a well-characterised (partly by them), important E3 ligase, in the anaphase-promoting complex, and decided to design peptide inhibitors for it based on one of the known interacting motifs (called D-box) from its substrates. They incorporate unnatural amino acids to better occupy the interaction site, improve the binding affinity, and lay foundations for future therapeutics - maybe combining their findings with additional target sites.

      Strengths:

      The paper is mostly strengths - a logical progression of experiments, very well explained and carried out to a high standard. The authors use a carefully chosen variety of techniques (including X-ray crystallography, multiple binding analyses, and ubiquitination assays) to verify their findings - and they impressively achieve their goals by honing in on tight-binders.

      Weaknesses:

      Some things are not explained fully and it would be useful to have some clarification. Why did the authors decide to model their inhibitors on the D-box motif and not the other two SLiMs that they describe?

      For completeness, in addition to the D-box we did originally construct peptides based on the ABBA and KEN-box motifs, but they did not show any shift in melting temperature of Cdc20 in the thermal shift assay whereas the D-box peptides did; consequently, we focused our efforts on the D-box peptides. Moreover, there is much evidence from the literature that points to the unique importance of the D-box motif in mediating productive interactions of substrates with the APC/C (i.e. those leading to polyubiquitination & degradation). One of the clearest examples is a study by Mark Hall’s lab (described in Qin et al. 2016), which tested the degradation of 15 substrates of yeast APC/C in strains carrying alleles of Cdh1 in which the docking sites for D-box, KEN or ABBA were mutated. They observed that whereas degradation of all 15 substrates depended on D-box binding, only a subset required the KEN binding site on Cdh1 and only one required the ABBA binding site. A more recent study from David Morgan’s lab (Hartooni et al. 2022) looking at binding affinities of different degron peptides concluded that the KEN motif has very low affinity for Cdc20 and is unlikely to mediate degradation of APC/C-Cdc20 substrates. Engagement of substrate with the D-box receptor is therefore the most critical event mediating APC/C activity and the interaction that needs to be blocked for most effective inhibition of substrate degradation.

      We have added the following text to the Results section “Design of D-box peptides” (page 10):

      “We focused on D-box peptides, as there is much evidence from the literature that points to the unique importance of the D-box motif in mediating productive interactions of substrates with the APC/C (i.e. those leading to polyubiquitination & degradation). One of the clearest examples is a study that tested the degradation of 15 substrates of yeast APC/C in strains carrying alleles of Cdh1 in which the docking sites for D-box, KEN or ABBA were mutated (Qin et al. 2017). They observed that, whereas degradation of all 15 substrates depended on D-box binding, only a subset required the KEN binding site on Cdh1 and only one required the ABBA binding site. A more recent study (Hartooni et al. 2022) of binding affinities of different degron peptides concluded that KEN motif has very low affinity for Cdc20 and is unlikely to mediate degradation of APC/C-Cdc20 substrates. Engagement of substrate with the D-box receptor is therefore the most critical event mediating APC/C activity and the interaction that needs to be blocked for most effective inhibition of substrate degradation.”

      What exactly do they mean when they say their 'observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast 'pseudo-substrate' inhibitor Acm1, acts to impede polyubiquitination of the bound protein'? It's an interesting thing to think about, and probably the paper they cite explains it more but I would like to know without having to find that other paper.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100-fold increase in affinity of a peptide by adding cooperativity to the interaction of D-box with co-activator. Regarding Figure 6 and the fact that we did look at peptide binding in cells, these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

      We have modified the text (page 18) from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation.  Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of a yeast ‘pseudo-substrate’ inhibitor Acm1, have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008,  Enquist-Newman et al. 2008,  Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022) we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Reviewer #3 (Public review):

      Summary:

      Eapen and coworkers use a rational design approach to generate new peptide-inspired ligands at the D-box interface of cdc20. These new peptides serve as new starting points for blocking APC/C in the context of cancer, as well as manipulating APC/C for targeted protein degradation therapeutic approaches.

      Strengths:

      The characterization of new peptide-like ligands is generally solid and multifaceted, including binding assays, thermal stability enhancement in vitro and in cells, X-ray crystallography, and degradation assays.

      Weaknesses:

      One important finding of the study is that the strongest binders did not correlate with the fastest degradation in a cellular assay, but explanations for this behavior were not supported experimentally. Some minor issues regarding experimental replicates and details were also noted.

      Interesting results from a number of labs (Choi et al. 2008, Enquist-Newman et al. 2008, Burton et al. 2011, Qin et al. 2019) have shown that mutations of degron SLiMs in Acm1 that weaken interaction with the APC/C have the unexpected consequence of converting Acm1 from APC/C inhibitor to APC/C substrate. A necessary conclusion of these studies is that the outcome of degron binding (i.e. whether the binder functions as substrate or inhibitor) depends on factors other than D-box affinity and that D-box affinity can counteract them. One idea is that if a binder interacts too tightly, this removes some flexibility required for the polyubiquitination process. The most recent study on this question (Qin et al. 2019) specifically pins the explanation for the inhibitory function of the high affinity D-box in Acm1 on its ‘D-box Extension’ (i.e. residues 8-12) preventing interaction with APC10. In our current study, the binding affinity of peptides is measured against Cdc20. In cellular assays, however, the D-box must also engage APC10 for degradation to occur. It may be that the peptide binding most strongly to the D-box pocket on Cdc20 is less able to bind to APC10 and therefore less effective in triggering APC10-dependent steps in the polyubiquitination pathway. The important Hartooni et al. paper from David Morgan’s lab confirms that even though the binding of D-box residues to APC10 is very weak on its own, it can contribute a 100-fold increase in affinity of a peptide by adding cooperativity to the interaction of D-box with co-activator. Regarding Figure 6 and the fact that we did look at peptide binding in cells, these experiments were done in unsynchronised cells, so most Cdc20 would not be bound to APC/C.

      We have modified the text (page 18) from:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with the idea that high-affinity binding at degron binding sites on APC/C, such as in the case of the yeast ‘pseudo-substrate’ inhibitor Acm1, acts to impede polyubiquitination of the bound protein (Qin et al. 2019). Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. As shown in Qin et al., mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Qin et al. 2019). Overall, our results support the conclusions that all the D-box peptides engage productively with the APC/C and that the highest affinity interactors act as inhibitors rather than functional degrons of APC/C.”

      to:

      “However, we found the opposite effect: D2 and D3 showed increased rates of mNeon degradation compared to D1 and D19 (Fig. 8C,D). This observation is consistent with conclusions from other studies that affinity of degron binding does not necessarily correlate with efficiency of degradation.  Indeed, there is no evidence that Hsl1, which is the highest affinity natural D-box (D1) used in our study, is degraded any more rapidly than other substrates of APC/C in yeast mitosis. A number of studies of a yeast ‘pseudo-substrate’ inhibitor Acm1, have shown that mutation of the high affinity D-box in Acm1 converts it from inhibitor to substrate (Choi et al. 2008,  Enquist-Newman et al. 2008,  Burton et al. 2011) through a mechanism that governs recruitment of APC10 (Qin et al. 2019). Our study does not consider the contribution of APC10 to binding of our peptides to APC/C<sup>Cdc20</sup> complex, but since there is strong cooperativity provided by this additional interaction (Hartooni et al. 2022) we propose this as the critical factor in determining the ability of the different peptides to mediate degradation of associated mNeon.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) On page 12 (towards the end), the author stated D10 contained an A3P mutation, they meant P3A right? 'To test this hypothesis, we proceeded to synthesise D10, a derivative of D4 containing an A3P single point mutation.'

      We thank the reviewer for spotting this typo, which we have corrected.

      (2) Have the authors considered other orthogonal approaches to cross-examine/validate binding affinities? That said, I do not think extra experiments are necessary.

      We did not explore further orthogonal approaches because of the challenges of producing sufficient amounts of the Cdc20 protein. Given the low affinities of many peptides for Cdc20, many techniques would have required more protein than we were able to produce. We believe that the qualitative TSA combined with the SPR is sufficient to convince readers; indeed, there is a correlation between SPR-determined binding affinities and the thermal shifts. For the natural amino acid-containing peptides (Table 1), D19 has the highest affinity and causes the largest thermal shift in the Cdc20 melting temperature, D10 has the lowest affinity and causes the smallest thermal shift, and D1, D3, D4, and D5 all rank in the middle by both techniques. For those peptides containing unnatural amino acids (Table 2), higher affinities are again reflected in larger thermal shifts.

      Reviewer #2 (Recommendations for the authors):

      The data seem fine to me. I would appreciate a little more detail on the points mentioned in the public review. Also a thorough reread, maybe by a disinterested party as there are various typos that could be corrected - all in all an excellent clear paper that encompasses a lot of work.

      A colleague has carefully checked the manuscript, and typos have been corrected.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      1. Description of the planned revisions

      The following revisions are in progress:

      - From Reviewer-1: The authors observe defects in CNCCs through genomic experiments. It would be really nice to perform simple wound healing/scratch assays and/or transwell assays to test if the CNCC migration phenotype is reduced in the CHD3 KO as well which would support the transcriptomic data.

      As recommended by the Reviewer, we are performing transwell assays to investigate whether CHD3 loss leads to defects in cell migration. These experiments should be completed in the next two weeks.

      - From Reviewer-2: Since CHD3 shows a progressive upregulation in expression during CNCC differentiation (Fig. 2E), one hypothesis is that it is not necessarily involved in the activation of the CNCC programs but instead is involved in keeping these programs active, by keeping regulatory elements accessible. Thus, the authors should check expression of CNCC markers and EMT genes at the same time points as in Fig. 2E in both WT and KO cells.

      As recommended by the Reviewer, we are differentiating the cells to perform an RT-qPCR time course for CNCC and EMT markers. These experiments will be completed in the next two weeks.

      - From Reviewer-2: It has been shown that CNCC regulatory elements controlling differentiation genes are primed/accessible prior to migration (PMID: 31792380; PMID: 33542111). Since the authors claim "CHD3 may have the role of priming the developing CNCCs to respond to BMP by opening the chromatin at the BMP responsive enhancers", it would be good to perform ATAC-seq at several time points during the differentiation process to assess the dynamics of chromatin reorganization, to see when the switch to mesoderm fate occurs and how accessibility of BMP responsive elements changes in WT and KO cells during CNCC differentiation. This would show whether the KO cells fail to make BMP responsive elements accessible or whether the defect lies in the maintenance of this accessibility.

      As recommended by the Reviewer, we are differentiating the cells to perform an ATAC-seq time course. These experiments will be completed in the next two to three weeks.

      2. Description of the revisions that have already been incorporated in the transferred manuscript

      The following revisions have already been carried out:

      Reviewer1

      1. Figure 1 presents nice confirmation of the CHD3 KO cell lines being used. However, given that these cell lines were previously published, I suggest moving these data to the supplement.

      As suggested by the Reviewer, we moved most of Figure 1 to the supplement, merging the remaining Figure 1 with Figure 2.

      In the results section for Figure 1, the authors discuss the CHD3 heterozygotes, but I only see the KO cell line data presented. It would be especially nice to see the protein levels of Chd3 in the het.

      As suggested, we have now performed western blot and qPCR for CHD3 in the heterozygous line and added it to Supplementary Figure S1.

      The authors discuss which genes are up and downregulated in the Chd3 KO D18 RNA-seq, and show a clear heatmap in Figure 2A for WT cells. The same heatmap for candidate genes discussed in the results would be appreciated for Chd3 KO.

      As recommended by the Reviewer, we have added CHD3-KO RNA-seq to the heatmap in Fig. 2A.

      In general, 2-3 replicates are presented. While the authors are showing heatmaps for selected locations for individual clones, which is appreciated (ex: Figure 4B and Fig 6), the QC for data quality is missing. For example, show Spearman correlation across the genome for datasets as a supplement.

      We performed Spearman correlation of ATAC-seq and RNA-seq data, which confirmed that the replicates are very highly correlated, and created new dedicated supplemental figures (Supplemental Figures S3, S4, S5, S6, S7).
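
      For readers less familiar with this QC step, the replicate comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names `average_ranks` and `spearman` and the per-bin read counts are hypothetical, and a real analysis would use genome-wide binned signal.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-bin read counts for two replicates (not data from the study).
rep1 = [12, 40, 7, 88, 23, 5]
rep2 = [15, 38, 9, 90, 20, 4]
rho = spearman(rep1, rep2)
print(f"Spearman rho between replicates: {rho:.3f}")
```

      Because the statistic depends only on ranks, it is insensitive to differences in sequencing depth between replicates, which is one reason it is commonly chosen for this kind of check.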

      In the section discussing the results presented in Figure 4, the authors discuss the ATAC-seq peak number changes and overlap with gene expression changes. However, the overlap with gene expression changes is not shown. Making a simple Venn diagram would help readers.

      As suggested, we added a Venn diagram with ATAC-seq/RNA-seq overlap in Figure 3D.

      In addition, showing a heatmap for unchanged ATAC-seq peaks can help to demonstrate the increase/decrease.

      As recommended, we have added a heatmap for unchanged ATAC-seq regions as Supplementary Figure S7.

      In Figure 6, the authors present ChIPseq data for CHD3 in D14 and D18 samples, focusing on locations losing or gaining accessibility. What is enrichment at unchanged sites? Is CHD3 specifically enriched at changed locations? Then what about over genes with altered gene expression vs not changed? Is CHD3 only bound to distal elements? Performing an analysis of the peak distribution, perhaps with ChromHMM or other methods to look at promoter vs enhancer vs other locations. These types of analyses could really enrich the interpretation of direct CHD3 function.

      Unfortunately, there are no ChromHMM data for neural crest cells, nor for closely related cell types. Therefore, to address the Reviewer's suggestion, we have taken two approaches: 1) we have further broken down the distribution of the peaks, dividing them between intergenic, intronic, exonic and TSS; 2) we have leveraged publicly available H3K27ac ChIP-seq data generated (by our group) in iPSC-derived CNCCs to identify CHD3 peaks that are decorated by this histone modification, which typically marks active enhancers. This analysis revealed that 91% of the peaks are either intergenic (50%) or intronic (41%) and that approximately a third of the peaks are decorated with H3K27ac in human iPSC-derived CNCCs, suggesting that they are bona fide active enhancers in this cell type.
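
      The second approach amounts to an interval-overlap count. The sketch below is only an illustration under stated assumptions: the helper names `overlaps` and `fraction_marked` and all coordinates are hypothetical, and it is not the tool used in the study.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_marked(peaks, marks):
    """Fraction of peaks that overlap at least one mark interval."""
    marked = sum(1 for p in peaks if any(overlaps(p, m) for m in marks))
    return marked / len(peaks)


# Hypothetical coordinates, for illustration only.
chd3_peaks = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 250)]
h3k27ac_peaks = [("chr1", 250, 600), ("chr2", 400, 700)]

frac = fraction_marked(chd3_peaks, h3k27ac_peaks)
print(f"{frac:.0%} of CHD3 peaks overlap H3K27ac")
```

      The all-pairs comparison is quadratic in the number of intervals; for genome-scale peak sets, a sorted sweep or a dedicated interval library would be the usual choice.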

      Related to the above, I am not sure if there is a phenotypic test for enhanced mesoderm. I suspect only IF/expression and morphology are possible, which the authors did. However, sorting the cells (with some defined markers) to ask how many are mesoderm-like vs CNCC in WT vs CHD3 KO would give some information outside of the bulk expression data.

      The manuscript already included IF experiments for mesodermal markers, which clearly show that nearly all the cells acquired the mesodermal fate. See for example Brachyury IF in Figure 2E.

      Minor points Reviewer-1:

      12. 1A seems to fit better with Figure 2.

      Done.

      13. The authors say that the KO cell lines are not defective in pluripotency, but Figure 1G suggests a slight decrease in SSEA-1. Is this reproducibly observed?

      It is not statistically significant and not reproducibly observed.

      14. It would be nice to show the number of up- and downregulated genes in volcano plots for fast viewing by readers (ex: Fig 2B).

      We have modified the volcano plot as suggested.

      15. Is it fair to use violin plots when data points are only 2-3 replicates (as in Figures 2C, 3D)?

      To address this, we have layered the actual datapoints on top of the violin plots.

      The labels in Fig 4A and 5E are very hard to read.

      We have changed color to improve readability.

      17. For browser tracks, the authors show very zoomed-in examples (Fig 4C, and especially Fig 6C). Showing a bit more of the area around these peaks would give readers a clearer appreciation of the data. Related to browser tracks, including more information, such as the gene expression changes (as in Fig 6C), would enhance the interpretation of the impact of Chd3 binding, accessibility change and then, I presume, reduced Sox9 expression. Similar suggestion for Figure 4C, where I anticipate coordinate transcription changes of the associated genes.

      We have zoomed out the tracks, as suggested, and added expression data next to them.

      19. Do the authors observe any clone variability between the two CHD3 KO clones? There is variability I see in some of the heatmaps, but I don't know if that is because of clones or technical variation.

      We do not observe any significant variability between the clones.

      Reviewer-2

      1. What is the expression level of CHD3 in the heterozygote line? Does the remaining allele compensate for the loss, which would explain the absence of phenotype?

      As also suggested by Reviewer-1, we have performed western blot for CHD3 in the heterozygous line and added it to Supplementary Figure S1. The blot shows that the remaining allele does not compensate. However, it is likely that even a reduced amount of wild-type CHD3 is sufficient for proper CNCC specification.

      The authors should use the term "regulatory elements" instead of "enhancers" as they can act either as activators or repressors.

      As suggested, we have changed nomenclature from enhancers to cis-regulatory elements.

      Along the same lines, while the authors indicate "Motif analysis of the enhancers aberrantly active in CHD3-KO cells", they haven't shown these are active. They should say they performed the analysis on regulatory elements aberrantly accessible in CHD3 KO.

      Done.

      See point 3 above.

      The rationale that led the authors to focus on genes typically expressed in the primitive streak and in the early pre-migratory mesoderm, and BMP responsive transcription factors could be better explained. Are they part of the most deregulated genes in the RNA-seq analysis?

      Not only are mesodermal genes among the most upregulated genes in the RNA-seq, but the motifs for the transcription factors encoded by these genes (e.g. TBR2, Brachyury, GATA, TBX3, TBX6) are among the most frequently represented in the aberrantly accessible cis-regulatory elements. The same applies to BMP responsive factors, but the other way around (they are downregulated and enriched in the aberrantly closed ATAC-seq regions).

      In the absence of CHD3, BMP response is not effective. While the authors nicely showed this is linked with changes in chromatin accessibility, it is necessary to check the expression levels of BMP receptors in CHD3 KO cells.

      We have checked the expression of these genes, and they were not differentially expressed. This is consistent with the downstream response being affected rather than ligand binding to the receptors.

      The aberrant early mesoderm signature of the CHD3-KO cells needs to be better shown. It is not obvious from the GO analysis in Fig. 2, and the authors then showed expression of some markers, but it is unclear how they selected them.

      See point 5: not only are mesodermal genes among the most upregulated genes in the RNA-seq, but the motifs for the transcription factors encoded by these genes (e.g. TBR2, Brachyury, GATA, TBX3, TBX6) are among the most frequently represented in the aberrantly accessible cis-regulatory elements. See, for example, the expression levels of typical mesodermal genes below:

      EOMES - upregulated log2FC: 5.5

      TBXT - upregulated log2FC: 4.6

      MESP1 - upregulated log2FC: 4.7

      MIXL1 - upregulated log2FC: 5.4

      TBX6 - upregulated log2FC: 3.2

      MSGN1 - upregulated log2FC: 4.6

      HAND1 - upregulated log2FC: 5.5
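
      To make the scale of these values concrete, a log2 fold change converts to a linear fold change as 2 raised to that value, so a log2FC of 5.5 is roughly a 45-fold increase and a log2FC of 3.2 is roughly 9-fold. The short sketch below performs this conversion for the genes listed above; the helper name `to_fold_change` is hypothetical and the snippet is illustrative only.

```python
# log2 fold changes quoted in the list above (CHD3-KO relative to WT).
log2_fc = {
    "EOMES": 5.5,
    "TBXT": 4.6,
    "MESP1": 4.7,
    "MIXL1": 5.4,
    "TBX6": 3.2,
    "MSGN1": 4.6,
    "HAND1": 5.5,
}


def to_fold_change(log2_value):
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2_value


fold_change = {gene: to_fold_change(value) for gene, value in log2_fc.items()}

# Print genes from the largest to the smallest linear fold change.
for gene, fc in sorted(fold_change.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{gene}: {fc:.1f}-fold")
```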

      The authors claim CHD3 directly binds at BMP responsive enhancers, but in the figure, they show the data for all the regions gaining or losing activity. It would be nice to add the information for the BMP responsive elements only.

      As recommended, we have added a heatmap for BMP responsive regions only, clearly showing that CHD3 binds them (Supplementary Figure S7).

      The authors need to better support the claim that CHD3-KO cells show more Wnt signaling/activity.

      We have checked expression of many genes that are typically Wnt responsive during mesoderm specification (see also point 7). These include:

      EOMES - upregulated log2FC: 5.5

      TBXT - upregulated log2FC: 4.6

      MESP1 - upregulated log2FC: 4.7

      MIXL1 - upregulated log2FC: 5.4

      TBX6 - upregulated log2FC: 3.2

      MSGN1 - upregulated log2FC: 4.6

      HAND1 - upregulated log2FC: 5.5

      These data clearly support that the Wnt-mediated mesodermal program is markedly upregulated.

      Minor points Reviewer-2:

      13. In the discussion, the authors could indicate whether CHD3 mutants somehow phenocopy some of the craniofacial defects observed in DLX5 mutant patients.

      Done.

      14. It is not indicated where to find the data regarding expression of epithelial and mesenchymal genes in the CHD3-KO cells.

      They are in the heatmap in Fig. 1C.

      15. How CHD3 function changes from opening to closing chromatin is very intriguing; the authors could add to the discussion what is known about this.

      To our knowledge, nothing is known about this. CHD3 is significantly understudied.

      OPTIONAL: While this is not necessary for the current study, it is very intriguing that other CHD family members do not compensate. How this tissue or DNA sequence activity is achieved could be discussed. Are CHD4 or CHD5 expressed during CNCC differentiation? Could they be used to rescue the CHD3 KO phenotype? While this may be difficult to test, it could perhaps be discussed.

      We have added a paragraph on this in the discussion.

      3. Description of analyses that authors prefer not to carry out

      From Reviewer 1: Given the changes in the CHD3-KO accessibility are mostly gene distal, are there existing Hi-C/microC/promoter CaptureC or other that can be used to ask if these are interacting with the predicted genes?

      We are not aware of this type of assay being performed genome-wide in human CNCCs. The only studies performed in human CNCCs are SOX9-centred. Looking at 3D chromatin conformation would also be out of the scope of the paper.

      From Reviewer-2:

      OPTIONAL: Does increasing BMP concentration early during CHD3 KO differentiation have a better effect at rescuing CNCC differentiation?

      Indicated by Reviewer as OPTIONAL. We do not think that adding BMP earlier on would make a significant difference in rescuing CNCC differentiation.

      From Reviewer-1: Are the results observed NuRD-based or CHD3 NuRD independent functions? Looking at other NuRD subunit binding or effects in differentiation would help to dig into this a bit more. I realize this is a bit of a big ask, so I am not asking for everything. Are there existing binding data in CNCCs for a NuRD subunit that could be examined for overlap in where these changes occur, for example? I want to be clear I am not asking the authors to do all the experiments for an alternative NuRD subunit.

      There are no existing data on NuRD binding in CNCCs. However, while the Reviewer is definitely not recommending generating new data in this regard, we still decided to make an attempt at performing ChIP-seq for the core NuRD subunit MBD3 in our CNCCs. We will only make one attempt (multiple replicates), and if it does not work we will not pursue this any further, as the Reviewer clearly stated that this is neither necessary nor required and we do not want to delay the resubmission.

    1. Reviewer #3 (Public review):

      Summary:

      The authors compare how well their automatic dimension prediction approach (DimPred) can support similarity judgements and compare it to more standard RSA approaches. The authors show that the DimPred approach does better when assessing out-of-sample heterogeneous image sets, but worse for out-of-sample homogeneous image sets. DimPred also does better at predicting brain-behaviour correspondences compared to an alternative approach. The work appears to be well done, but I'm left unsure what conclusions the authors are drawing.

      In the abstract, the authors write: "Together, our results demonstrate that current neural networks carry information sufficient for capturing broadly-sampled similarity scores, offering a pathway towards the automated collection of similarity scores for natural images". If that is the main claim, then they have done a reasonable job supporting this conclusion. However the importance of automating this process for broadly-sampled object categories is not made so clear.

      But the authors also highlight the importance that similarity judgements have had for theories of cognition and brain. For example, in the first paragraph of the paper they write: "Similarity judgments allow us to improve our understanding of a variety of cognitive processes, including object recognition, categorization, decision making, and semantic memory6-13. In addition, they offer a convenient means for relating mental representations to representations in the human brain14,15 and other domains16,17". The fact that the authors also assess how well a CLIP model using DimPred can predict brain activation suggests that their work is not just about automating similarity judgements, but also about highlighting how their approach reveals that ANNs are more similar to brains than previously assessed.

      My main concern is with regard to the claim that DimPred is revealing better similarities between ANNs and brains (a claim that the authors may not be making, but this should be clarified). The fact that predictions are poor for homogeneous images is problematic for this claim, and I expect their DimPred scores would be very poor under many conditions, such as when applied to line drawings of objects, or a variety of additional out-of-sample stimuli that are easily identified by humans. The fact that so many different models get such similar prediction scores (Fig 3) also raises questions as to the inferences you can make about ANN-brain similarity based on the results. Do the authors want to claim that CLIP models are more like brains?

      With regard to the brain prediction results, why is the DimPred approach doing so much better in V1? I would not think the 49 interpretable categories are encoded in V1, and the ability to predict would likely reflect a confound rather than V1 encoding these categories (e.g., if a category was "things that are burning" then the DNN might predict V1 activation based on the encoding of colour).

      In addition, more information is needed on the baseline model, as it is hard to appreciate whether we should be impressed by the better performance of DimPred based on what is provided: "As a baseline, we fit a voxel encoding model of all 49 dimensions. Since dimension scores were available only for one image per category36, for the baseline model, we used the same value for each image of the same category and estimated predictive performance using cross-validation". Is it surprising that predictions are not good with one image per category? Is this a reasonable comparison?

      Relatedly, what was the ability of the baseline model to predict? (I don't think that information was provided). Did the authors attempt to predict outside the visual brain areas? What would it mean if predictions were still better there?

      Minor points:

      The authors write: "Please note that, for simplicity, we refer to the similarity matrix derived from this embedding as "ground-truth", even though this is only a predicted similarity". Given this, it does not seem a good idea to use "ground truth" as this clarification will be lost in future work citing this article.

      It would be good to have the 49 interpretable dimensions listed in the supplemental materials rather than having to go to the original paper.

      Strengths:

      The experiments seem well done.

      Weaknesses:

      It is not clear what claims are being made.

    1. Author response:

      We thank the reviewers for their comments and for their constructive suggestions. We intend to submit a revised manuscript where we address the comments made in the Public Reviews as well as in the Recommendations for the Authors.

      One of our most interesting findings, as noted by the reviewers, was the discovery of a small subpopulation of cells likely arrested in G2 that accounts for a disproportionate amount of radiation-induced gene expression. In addition to the responses indicated below, we are planning to include additional “wet lab” experiments in the revised manuscript that address the properties of this seemingly important subpopulation of cells.

      Reviewer 1:

      Strengths:

      (1) The authors have used robust methods for rearing Drosophila larvae, irradiating wing discs, and analyzing the data with Seurat v5 and HHI.

      (2) These data will be informative for the field.

      (3) Most of the data is well-presented.

      (4) The literature is appropriately cited.

      Thank you for these comments.

      Weaknesses:

      (1) The data in Figure 1 are single-image representations. I assume that counting the number of nuclei that are positive for these markers is difficult, but it would be good to get a sense of how representative these images are and how many discs were analyzed for each condition in B-M.

      (2) Some of the figures are unclear.

      In the revised manuscript, we will provide a more detailed quantitative analysis. For each condition, we analyzed 4 - 9 discs.

      We assume that the reviewer is referring to panels in Figure 1. We will review these images and, if necessary, repeat the experiments or choose alternative images that appear clearer.

      Reviewer 2:

      Overall, the data presented in the manuscript are of high quality but are largely descriptive. This study is therefore perceived as a resource that can serve as an inspiration for the field to carry out follow-up experiments.

      We intend to include more “wet lab” experiments in our revised manuscript to address the identity and properties of the high-trbl cells that we have identified using the clustering approach based on cell-cycle gene expression.

      Reviewer 3:

      Strengths:

      Overall, the manuscript makes a compelling case for heterogeneity in gene expression changes that occur in response to uniform induction of damage by X-rays in a single-layer epithelium. This is an important finding that would be of interest to researchers in the field of DNA damage responses, regeneration, and development.

      Thank you.

      Weaknesses:

      This work would be more useful to the field if the authors could provide a more comprehensive discussion of both the impact and the limitations of their findings, as explained below.

      Propidium iodide staining was used as a quality control step to exclude cells with a compromised cell membrane. But this would exclude dead/dying cells that result from irradiation. What fraction of the total do these cells represent? Based on the literature, including works cited by the authors, up to 85% of cells die at 4000R, but this likely happens over a longer period than 4 hours after irradiation. Even if only half of the 85% are PI-positive by 4 hr, this still removes about 40% of the cell population from analysis. The remaining cells that manage to stay alive (excluding PI) at 4 hours and included in the analysis may or may not be representative of the whole disc. More relevant time points that anticipate apoptosis at 4 hr may be 2 hr after irradiation, at which time pro-apoptotic gene expression peaks (Wichmann 2006). Can the authors rule out the possibility that there is heterogeneity in apoptosis gene expression, but cells with higher expression are dead by 4 hours, and what is left behind (and analyzed in this study) may be the ones with more uniform, lower expression? I am not asking the authors to redo the study with a shorter time point, but to incorporate the known schedule of events into their data interpretation.

      We thank the reviewer for these important comments. The generation of single-cell RNAseq data from irradiated cells is tricky. Many cells have already died. Even those that do not incorporate propidium iodide are likely in early stages of apoptosis or are physiologically unhealthy and likely made it through our FACS filters. Indeed, in irradiated samples up to 57% of sequenced cells were not included in our analysis since their RNA content seemed to be of low quality. It is therefore likely that our data are biased towards cells that are less damaged. As advised by the reviewer, we will include a clearer discussion of these issues as well as the time course of events and how our analysis captures RNA levels only at a single time point.

      If cluster 3 is G1/S, cluster 5 is late S/G2, and cluster 4 is G2/M, what are clusters 0, 1, and 2 that collectively account for more than half of the cells in the wing disc? Are the proportions of clusters 3, 4, and 5 in agreement with prior studies that used FACS to quantify wing disc cells according to cell cycle stage?

      Clusters 0, 1, and 2 likely contain cells in other stages of the cell cycle, including early G1. Other studies indicate that more than 70% of cells are expected to have a 4C DNA content 4 h after irradiation at 4000 Rad. The high-trbl cluster only accounts for 18% of cells. Thus clusters 0, 1 and 2 could potentially contain other populations that also have a 4C DNA content. Importantly, similar proportions of cells in these clusters are also observed in unirradiated discs. We are mining the gene expression patterns in these clusters with the goal of estimating their location in the cell cycle and will include those data in the revised manuscript.

      The EdU data in Figure 1 is very interesting, especially the persistence in the hinge. The authors speculate that this may be due to cells staying in S phase or performing a higher level of repair-related DNA synthesis. If so, wouldn't you expect 'High PCNA' cells to overlap with the hinge clusters in Figures 6G-G'? Again, no new experiments are needed. Just a more thorough discussion of the data.

      We have found that the locations of elevated PCNA expression do not always correlate with the location of EdU incorporation either by examining scRNA-seq data or by using HCR to detect PCNA. PCNA expression is far more widespread. We intend to present additional data that address this point and also a more thorough discussion in the revised manuscript.

      Trbl/G2/M cluster shows Ets21C induction, while the pattern of Ets21C induction as detected by HCR in Figures 5H-I appears in localized clusters. I thought G2/M cells are not spatially confined. Are Ets21C+ cells in Figure 5 in G2/M? Can the overlap be confirmed, for example, by co-staining for Trbl or a G2/M marker with Ets21C?

      The data show that the high-trbl cells are higher in Ets21C transcripts relative to other cell-cycle-based clusters after irradiation. This does not imply that high-trbl cells in all regions of the disc upregulate Ets21C equally. Ets21C expression is likely heterogeneous in both ways: by location in the disc and by cell-cycle state. We will attempt to look for co-localization as suggested by the reviewer.

      Induction of dysf in some but not all discs is interesting. What were the proportions? Any possibility of a sex-linked induction that can be addressed by separating male and female larvae?

      We can separate the cells in our dataset into male and female cells by expression of lncRNA:roX1/2. When we do this, we see X-ray induced dysf expressed similarly in both male and female cells. We think that it is therefore unlikely that this difference in expression can be attributed to cell sex. We are investigating other possibilities such as the maturity of discs.

    1. There’s one critical aspect of critiques that we haven’t discussed yet, however. How does someone judge what makes a design “good”? In one sense, “good” is a domain-dependent idea. For example, what makes an email client “good” in our example above is shaped by the culture and use of email, and the organizations and communities in which it is used. Therefore, you can’t define good without understanding context of use.

      I agree with this part because having a "good" design is hard to judge and can vary from person to person. Some people may believe that a good design is one that is able to generate a lot of profits and help make an organization successful financially. Others may think that a good design has to be unique, creative, and stand out from competitors. I think that those are some elements that designers may think about when creating designs, but I think it all comes back to user research and understanding their needs. I view a good design as one that meets the needs of the users and is accessible to everyone. However, this is still an unclear definition because it is difficult to know which user needs should be prioritized, which is why design can be so complex.

    1. Welcome back, and in this lesson, I want to cover the high-level architecture of Amazon Lex. Amazon Lex is a product that allows you to create interactive chatbots. For most areas of study and for solutions architects working in the real world, you only need a basic level of understanding, and that's exactly what this video will provide. If you need to know anything beyond this, the course you're studying will likely include follow-up videos to this one. If not, don't worry—this video will cover everything that you need. Now let's jump in and get started.

      Amazon Lex is a back-end service. It's not something you're likely to use from a user perspective. Instead, you'll use it to add capabilities to your application. Lex provides text or voice conversational interfaces. For the exam, remember “Lex for voice” or “Lex for Alexa.” If you're familiar with Amazon voice products, just know that Lex powers those products—it provides the conversational capability. It's what lets the lady in the tube answer your questions.

      Lex provides two main bits of functionality. First is automatic speech recognition (ASR), which is simply speech-to-text. Now, I say “simple,” but doing this well is exceptionally difficult. If any of you have tried using Siri, Apple’s voice assistant, you may have noticed how often it gets things wrong compared to the Alexa product. That’s because Siri doesn’t do ASR as well as Lex. And for any lawyers listening—this is just my opinion.

      Lex also provides natural language understanding (NLU) services, which allow it to discover your intent and even perform intent chaining. Imagine the act of ordering a pizza. You might start the conversation by saying, “Can I order a pizza please?” or “I want to order a pizza,” or even “A large pepperoni pizza, please.” The intent—the thing you want to do—is ordering pizza, and it's Lex's job to determine that. But what about your next sentence? “Make that an extra large, please.” Lex needs to understand that this second statement relates to the first. As humans, this is easy—we're good at natural language processing. Computers historically haven't been, but Lex enables voice and text understanding in your applications without needing to code that functionality yourself. You simply integrate Lex, and it does the hard work for you.

      As a service, Lex scales well and integrates with other AWS products such as Amazon Connect. It’s quick to deploy and uses a pay-as-you-go pricing model, meaning it only costs when you’re actively using it. This makes it ideal for event-driven or serverless architectures. In terms of use cases, Lex can help you build chatbots—the kind that pop up on websites asking if you need help—or automated support chats for logging tickets. You can also build voice assistants that respond when you ask for something, just like the lady in the tube. Use cases also include Q&A bots or enterprise productivity bots—basically, any interactive bot that accepts text or voice and performs a service.

      Let’s now review some of the key Lex concepts. Lex provides bots that are designed to interactively converse in one or more languages. I previously mentioned the term "intent." This represents an action the user wants to perform—things like ordering a pizza, ordering a milkshake, or getting a side of fries. In addition to intents, we have the concept of utterances. When creating an intent, you can provide sample utterances—these are ways an intent might be expressed. So to order a pizza, milkshake, or fries, a user might say “Can I order,” “I want to order,” or “Give me a.” These are all different ways of expressing or uttering an intent.

      Along with configuring utterances, you also need to tell Lex how to fulfill the intent, and this is often done using Lambda integration. If Lex understands that the user wants to order a pizza, it needs a way to initiate that process—Lambda functions are typically used for this purpose. Lambda works especially well in event-driven architectures, making it a natural complement to Lex. Additionally, Lex includes the concept of a slot, which you can think of as a parameter for an intent. These might include the size of the pizza (small, medium, or large), the type of crust (normal or cheesy), and other similar details. You can configure slots as required parameters that Lex must gather from the user during the interaction.
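
      To make the relationship between intents, utterances, slots, and fulfillment a little more concrete, here is a minimal sketch in plain Python. It does not call Lex or Lambda at all. The class names, the `OrderPizza` intent, and the `fulfill_order` function are all made up for illustration, and `fulfill_order` is only a stand-in for what a Lambda function might do once every required slot has been gathered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """A parameter of an intent, e.g. pizza size or crust type."""
    name: str
    allowed_values: tuple[str, ...]
    required: bool = True


@dataclass(frozen=True)
class Intent:
    """An action the user wants to perform, plus ways of expressing it."""
    name: str
    sample_utterances: tuple[str, ...]
    slots: tuple[Slot, ...] = ()

    def missing_slots(self, filled: dict[str, str]) -> list[str]:
        """Return the required slots the bot still has to ask for."""
        return [s.name for s in self.slots if s.required and s.name not in filled]


def fulfill_order(intent: Intent, filled: dict[str, str]) -> str:
    """Hypothetical stand-in for the fulfillment step (often a Lambda function)."""
    missing = intent.missing_slots(filled)
    if missing:
        # Not ready to fulfil yet: ask the user for the next missing slot.
        return f"elicit:{missing[0]}"
    details = ", ".join(f"{k}={v}" for k, v in sorted(filled.items()))
    return f"fulfilled:{intent.name}({details})"


# Illustrative intent built from the pizza example in this lesson.
ORDER_PIZZA = Intent(
    name="OrderPizza",
    sample_utterances=("Can I order", "I want to order", "Give me a"),
    slots=(
        Slot("size", ("small", "medium", "large")),
        Slot("crust", ("normal", "cheesy")),
    ),
)
```

      With this sketch, `fulfill_order(ORDER_PIZZA, {"size": "large"})` asks for the crust next, and only once both slots are filled does it report the order as fulfilled.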

      Just to reiterate, Lex is a product you won’t usually interact with directly through the console. It’s something you’ll architect into your applications. If you want to provide interactive voice assistance via a chat or voice-capable bot, you’ll use Amazon Lex. So remember this for the exam.

      With that being said, that is everything I wanted to cover in this video. Go ahead and complete the video, and when you're ready, I’ll look forward to you joining me in the next.

    1. I saw students nodding their heads. And I saw for the first time that there can be, and usually is, some degree of pain involved in giving up old ways of thinking and knowing and learning new approaches. I respect that pain. And I include recognition of it now when I teach, that is to say, I teach about shifting paradigms and talk about the discomfort it can cause. White students learning to think more critically about questions of race and racism may go home for the holidays and suddenly see their parents in a different light. They may recognize nonprogressive thinking, racism, and so on, and it may hurt them that new ways of knowing may create estrangement where there was none. Often when students return from breaks I ask them to share with us how ideas that they have learned or worked on in the classroom impacted on their experience outside. This gives them both the opportunity to know that difficult experiences may be common and practice at integrating theory and practice: ways of knowing with habits of being. We practice interrogating habits of being as well as ideas. Through this process we build community

      The final section brought all the concepts together for me. Real learning about race and identity is a transformative process, even though it creates emotional difficulty, and that difficulty pushes students toward development. The teacher encourages students to evaluate how what they learn in the classroom affects their daily lives outside it. Transformative education shows that acquiring knowledge is only one aspect of learning, because it primarily shifts our worldview and self-understanding.

    1. Welcome back, and in this lesson, I want to cover the FSx products, specifically FSx for Windows File Server. FSx is a shared file system product, but it handles the implementation in a very different way than, say, EFS, which we've covered earlier in the course. FSx for Windows File Server is one of the core components of the range of services that AWS provides to support Windows environments in AWS. For a fair amount of AWS history, its support of Windows environments was pretty bad; it just didn't seem to be a priority. Now this changed with FSx for Windows File Server, which provides fully managed native Windows File Servers or, more specifically, file shares. You're provided with file shares as your unit of consumption. The servers themselves are hidden, which is similar to how RDS is architected, but instead of databases, you get file shares.

      Now, it's a product designed for integration with Windows environments. It's a native Windows file system; it's not an emulated file server. It can integrate with either managed Active Directory or self-managed Active Directory, and this can be running inside AWS or on-premises. This is a critical feature for enterprises who already have their own Active Directory provision. It is a resilient and highly available system, and it can be deployed in either single or multi-AZ mode. Picking between the two controls the network interfaces available and used to access the product. It uses elastic network interfaces inside the VPC. The backend, even in single AZ mode, uses replication within that availability zone to ensure that it's resilient to hardware failure. However, if you pick multi-AZ, then you get a fully multi-AZ, highly available solution.

      It can also perform a full range of different types of backups, which include both client-side and AWS-side features. I'll talk about that later in the lesson. From an AWS side, it can perform both automatic and on-demand backups. Now, file systems that are created inside the FSx product are accessible within a VPC. But also, and this is how more complex environments are supported, they can be accessed over peering connections, VPN connections, and even accessed over physical direct connects. So if you're a large enterprise with a dedicated private link into a VPC, you can access FSx file systems over Direct Connect.

      Now, in the exam, when you’re faced with any questions that talk about shared file systems, you need to be looking to identify any Windows-related keywords. Look for things like native Windows file systems, look for things like Active Directory or Directory Service integration, and look for any of the more advanced features, which I’ll talk about over the remainder of this lesson. Essentially, your job in the exam is to pick when to use FSx versus EFS because these are both network shared file systems that you’ll find on the exam. Generally, EFS tends to be used for shared file systems for Linux EC2 instances as well as Linux on-premises servers, whereas FSx is dedicated to Windows environments, so that's the main distinction between these two different services.

      So let's have a look visually at how a typical implementation of FSx for Windows File Server might look for an organization like Animals for Life. We start with a familiar architecture. We have a VPC on the left and a corporate network on the right, and these networks are connected with Direct Connect or VPN, with some on-premises staff members. Inside the VPC, we have two availability zones (A and B), and in each of those availability zones, we have two different private subnets. FSx uses Active Directory for its user store, so logically, we start with a directory, which can either be a managed directory delivered as a service from AWS or something that is on-premises.

      Now, this is important: FSx can integrate with both, and it doesn’t actually need an Active Directory service defined inside the Directory Services product. Instead, it can connect directly to Active Directory running on-premises. This is critical to understand because it means it can integrate with a completely normal implementation of Active Directory that most large enterprises already have. As I already mentioned, FSx can be deployed either in single AZ or multi-AZ mode, and in both of those, it needs to be connected to some form of directory for its user store. Once deployed, you can create a network share using FSx, and this can be accessed in the normal way using the double backslash, DNS name, and share notation that you'll be familiar with if you use Windows environments. For example, a file system ID dot animalsforlife.org, followed by a slash and "cat pics." In this example, "cat pics" is the actual share.
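
      As a small illustration of that share notation, the sketch below builds a Windows UNC path from a DNS name and a share name. Note that the separators are backslashes. The DNS name `fs-example.animalsforlife.org` and the share name `catpics` are placeholders made up for this example; a real file system gets its own DNS name when it is created.

```python
def unc_path(dns_name: str, share: str) -> str:
    """Build a UNC path: two backslashes, the DNS name, a backslash, the share."""
    # Strip stray backslashes so callers can pass either "catpics" or "\\catpics".
    return "\\\\" + dns_name.strip("\\") + "\\" + share.strip("\\")


# Placeholder values, assumed for illustration only.
example = unc_path("fs-example.animalsforlife.org", "catpics")
print(example)
```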

      Using this access path, the file system can be accessed from other AWS services that use Windows-based storage. An example of this is Workspaces, which is a virtual desktop service similar to Citrix available inside AWS. When you deploy Workspaces into a VPC, not only does it require a directory service to function, but for any shared file system needs, it can also use FSx. The most important thing to remember about FSx is that it is a native Windows file system. It supports things like deduplication, the distributed file system (DFS), which is a way Windows can group file shares together and scale out for a more managed file share structure at scale. It supports at-rest encryption using KMS, and it also lets you enforce encryption in transit. Shares are accessed using the SMB protocol, which is standard in Windows environments, and FSx even allows for volume shadow copies. In this context, volume shadow copies allow users to see multiple file versions and initiate restores from the client side.

      So that’s really important to understand: if you’re utilizing an FSx share from a Windows environment, you can right-click on a file or folder, view previous versions, and initiate file-level restores without having to use AWS or engage with a system administrator. That’s something that’s provided along with the FSx product as long as it’s integrated with Windows environments—you get that capability. Now, from a performance perspective, FSx is highly performant. The performance delivered can range from anywhere from 8 megabytes per second to 2 gigabytes per second. It can deliver hundreds of thousands of IOPS and less than one millisecond latency, so it can scale up to whatever performance requirements your organization has.

      Now, for the exam, you don't need to be aware of the implementation details. I’m trying to focus really on the topics and services that you need for the exam in this course. So when things do occur, I want to teach you more information than you may require for the exam, but there are a lot of topics or features of different services that you only require a high-level overview of, and this is one of those topics. So, what I want to do now is go through some keywords or features that you should be on the lookout for when you see any exam questions that you think might be related to FSx.

      The first of these is VSS, the Volume Shadow Copy Service, a Windows feature that allows users to perform file and folder-level restores. This is one of the features that's provided and is unique to FSx, meaning that if you have any users of Workspaces and they use files and folders on an FSx share, they can right-click, view previous versions, and restore from a user-driven perspective without having to engage a system administrator. Another thing to be aware of is that FSx provides native Windows file systems that are accessible over SMB. If you see SMB mentioned in the exam, it’s probably going to be FSx as the default correct answer. Remember, the EFS file system uses the NFS protocol and is only accessible from Linux EC2 instances or Linux on-premises servers. If you see any mention of SMB, then you can be almost certain that it’s a Windows environment question and involves FSx.

      Another key feature provided by FSx is that it uses the Windows permission model, so if you're used to managing permissions for folders or files on Windows file systems, you'll be used to exactly how FSx handles permissions. This is provided natively by the product specifically to support Windows environments in AWS. Next is that the product supports DFS, the distributed file system. If you see that mentioned, either its full name or DFS, then you know that this is going to be related to FSx. DFS is a way that you can natively scale out file systems inside Windows environments. You can either group file shares together in one enterprise-wide structure or use DFS for replication or scaling out performance. It’s a really capable distributed file system.

      Now, if you see any questions that talk about the provision of a native Windows file server, but where the admin overhead of running a self-managed EC2 instance running something like Windows Server is not ideal, then you know that it's going to be FSx. FSx provides you with the ability to provision a native Windows file server with file shares but without the admin overhead of managing that server yourself. Lastly, the product is unique in the sense that it delivers these file shares, which can also be integrated with either directory service or your own active directory directly. These are really important things to remember for the exam, and they’ll help you select between other products and FSx.
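
      The keyword spotting described in this lesson can be sketched as a small lookup. This is only a study aid under the assumptions stated in the lesson, not an official decision procedure; the keyword lists and the function name `suggest_file_system` are made up for illustration, and real exam questions need to be read in full.

```python
# Keywords the lesson associates with each shared file system product.
FSX_KEYWORDS = ("windows", "smb", "active directory", "directory service",
                "dfs", "shadow copies")
EFS_KEYWORDS = ("linux", "nfs")


def suggest_file_system(question: str) -> str:
    """Return the product the lesson's rule of thumb points to."""
    text = question.lower()
    # Windows-related keywords take priority, as the lesson suggests.
    if any(keyword in text for keyword in FSX_KEYWORDS):
        return "FSx for Windows File Server"
    if any(keyword in text for keyword in EFS_KEYWORDS):
        return "EFS"
    return "undetermined"
```

      For example, a question mentioning shares reachable over SMB would map to FSx, while one about Linux instances mounting storage over NFS would map to EFS.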

      Again, I don’t expect you to get many questions on FSx. I do know of at least one or two unique questions in the exam, but even if it only gets you that one extra mark, it can be the difference between a pass and a fail. So try your best to remember all the key features I’ve explained throughout this lesson. But at that point, that is everything I wanted to cover in this theory-only lesson. Go ahead, complete this video, and then when you're ready, I look forward to you joining me in the next.

    1. Welcome back.

      Over the next few lessons, I'm going to be covering Storage Gateway in more depth, focusing on the types of architectures it can support. The key to exam success when it comes to Storage Gateway is understanding when you would use each of the modes, as each has its own specific situation where it should or shouldn't be used. In this lesson, I'll start off with the Storage Gateway running in Volume Stored mode and Volume Cached mode—so let's jump in and get started.

      Storage Gateway normally runs as a virtual machine on-premises, although it can be ordered as a hardware appliance. However, it's much more common to use the virtual machine version of this product. It acts as a bridge between storage that exists on-premises or in a data center and AWS. Locally, it presents storage using iSCSI (a SAN and NAS protocol), NFS (commonly used by Linux environments to share storage over a network), and SMB (used within Windows environments). On the AWS side, it integrates with EBS, S3, and the various types of Glacier.

      As a product, Storage Gateway is used for tasks such as migrations from on-premises to AWS, extending a data center into AWS, and addressing storage shortages by leveraging AWS storage. It can implement storage tiering, assist with disaster recovery, and replace legacy tape media backup solutions. For the exam, you need to identify the correct type of Storage Gateway for a given scenario—and that's what I want to help you with in this set of lessons.

      As a quick visual refresher, a Storage Gateway is typically deployed as a virtual appliance on-premises. Architecturally, you might also have some Network Attached Storage (NAS) or a Storage Area Network (SAN) running on-premises. These storage systems are used by a collection of servers—also running on-premises. The servers probably have their own local disks, but for primary storage, they're likely to connect to the SAN or NAS equipment.

      These storage systems (SANs or NASs) generally use the iSCSI protocol, which presents raw block storage over the network as block devices. The servers see them as just another type of storage device to create a file system on and use normally. This is a traditional architecture in many businesses. What's also common, especially for smaller businesses, is limited funding for backups or effective disaster recovery, prompting them to consider AWS as a solution to rising operational costs or as an alternative to maintaining their own data centers.

      So how does Storage Gateway work? Volume Gateway works in two different modes: Cached mode and Stored mode. They are quite different and offer distinct advantages. First, let's look at Stored mode. In this mode, the virtual appliance presents volumes over iSCSI to servers running on-premises, functioning similarly to NAS or SAN hardware. These volumes appear just like those presented by NAS or SAN devices, allowing servers to create file systems on top of them as they normally would.

      In Gateway Stored mode, these volumes consume local capacity. The Storage Gateway has local storage, which serves as the primary location for all the volumes it presents over iSCSI. This is a critical point for the exam—when you're using Storage Gateway in Volume Stored mode, everything is stored locally. All volumes presented to servers are stored on on-premises local storage.

      In this mode, Storage Gateway also has a separate area called the upload buffer. Any data written to the local volumes is temporarily written to this buffer and then asynchronously copied into AWS via the Storage Gateway endpoint—a public endpoint accessible over a normal internet connection or a public VIF using Direct Connect. The data is copied into S3 in the form of EBS snapshots. Conceptually, these are snapshots of the on-premises volumes, occurring constantly in the background without human intervention. That's the architecture of Storage Gateway running in Volume Stored mode. Think about the architecture and what it enables, because this is what's important for the exam.

      This mode is excellent for doing full disk backups of servers. You're using raw volumes on the on-premises side, and by asynchronously backing them up as EBS snapshots, you get a reliable full disk backup solution with strong RPO and RTO characteristics. Volume Gateway in Stored mode is also great for disaster recovery, since EBS snapshots can be used to create new EBS volumes. In theory, you could provision a full copy of an on-premises server in AWS using just these snapshots.

      However—and this is important for the exam—this mode doesn't support extending your data center capacity. The primary location for data using this mode is on-premises. For every volume presented, there's a full copy of the data stored locally. If you're facing capacity issues, this mode won't help. But if you need low-latency data access, this mode is ideal, as the data resides locally. It also works well for full disk backups or disaster recovery scenarios.

      I emphasize “full disk” here because in the next lessons, I’ll cover other Storage Gateway modes that also help with backups. Volume Gateway deals in volumes—raw disks presented over iSCSI. Some key facts worth knowing (though not required to memorize for the exam): in Volume Stored mode, you can have 32 volumes per gateway, with up to 16 TB per volume, for a total of 512 TB per gateway.

      Now let’s turn to Volume Gateway in Cached mode, which suits different scenarios. Cached mode shares the same basic architecture: the Storage Gateway still runs as a virtual appliance (or physical in some cases), local servers are still presented with volumes via iSCSI, and the Gateway still communicates with AWS via the Storage Gateway endpoint, which remains a public endpoint using either internet or Direct Connect.

      The major difference is the location of the primary data. In Cached mode, the main storage location is AWS—specifically S3—rather than on-premises. The Storage Gateway now only has local cache, while the primary data for all presented volumes resides in S3. This distinction is crucial: in Volume Stored mode, the data is stored locally; in Cached mode, it’s stored in AWS and only cached locally.

      Importantly, when we say the data is in S3, it's actually in an AWS-managed area of S3, visible only through the Storage Gateway console. You can’t browse it in a regular S3 bucket because it stores raw block data, not files or objects. You can still create EBS snapshots from it, just like in Stored mode.

      So the key difference between Stored and Cached modes is the location of the data. Stored mode keeps everything on-premises, using AWS only for backups. Cached mode stores data in S3, caching only the frequently accessed portions locally. This offers substantial architectural benefits: since only cached data is stored locally, you can manage hundreds of terabytes through the gateway while using only a small local cache. This enables an architecture called data center extension.

      For example, imagine an on-premises facility with limited space and rising storage needs. Instead of investing in more hardware, the business can extend into AWS. Storage in AWS appears local, but it's actually hosted in the cloud. While Volume Stored and Cached modes are similar in using raw volumes and supporting EBS snapshots, only Cached mode enables extending data center capacity.

      Stored mode is for backups, DR, and migration. It ensures local LAN-speed access, but requires full data storage locally. Cached mode allows AWS to act as primary storage, storing frequently accessed data locally, enabling cost-effective capacity extension while maintaining low-latency access for hot data. Less frequently accessed data may load more slowly, but it allows huge scalability. In Cached mode, a single gateway can handle up to 32 volumes at 32 TB each—up to 1 PB of data.

      In summary, both modes work with volumes (raw block storage), but Stored mode stores everything locally and uses AWS only for backups, while Cached mode stores data in AWS and caches hot data locally, supporting data center extension. For the exam, if you see the keyword “volume” in a Storage Gateway question, you’re dealing with Volume mode. Deciding between Stored and Cached will depend on whether the scenario focuses on backup/DR/migration or on extending capacity.

      That wraps up the theory for this lesson. In the next lesson, I’ll cover another mode of Storage Gateway: Tape mode, also known as VTL mode. Go ahead and complete this lesson, and when you’re ready, I look forward to having you join me in the next.

    1. Welcome back. In this lesson, I want to talk about AWS Direct Connect. A Direct Connect (DX) is a physical connection into an AWS region. If you order this via AWS, the connection is either 1 gig, 10 gig, or 100 gig at the time of creating this lesson. There are other ways to provision slower speeds, but I'll be covering those in a dedicated lesson later in this section of the course. The connection is between a business premises, a Direct Connect (DX) location, and finally an AWS region. I’ll show this architecture visually on the next screen.

      Conceptually, think of three different physical locations: your business premises, where you have a customer premises router; a DX location, where you also have other equipment such as a DX router and maybe some servers; and finally an AWS region, such as US East 1. When you order a DX connection, what you're actually ordering is a network port at the DX location. AWS provides a port allocation and authorizes you to connect to that port, which I’ll detail soon. However, a Direct Connect ordered directly from AWS doesn’t actually provide a connection of any kind—it’s just a physical port. It’s up to you to connect to this directly or arrange the connection to be extended via a third-party communications provider.

      The port has two costs: an hourly cost based on the DX location and the speed of the port, and a charge for outbound data transfer. Inbound data transfer is free of charge. There are a couple of important things to keep in mind about Direct Connect. First is the provisioning time—AWS will take time to allocate a port, and once allocated, you’ll need to arrange the connection into that port at the DX location. If you haven’t already connected the DX location to your business network, you might be looking at weeks or months of extra time for the physical laying of cables between the DX location and your business premises. Keep that in mind.
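
      The two-part pricing described above can be sketched as a small calculation. The rates below are hypothetical placeholders rather than real AWS prices; the only facts taken from the lesson are that the port is billed hourly, outbound transfer is charged, and inbound transfer is free.

```python
def dx_monthly_cost(port_hourly_rate: float,
                    hours: float,
                    outbound_gb: float,
                    outbound_rate_per_gb: float,
                    inbound_gb: float = 0.0) -> float:
    """Estimate a Direct Connect bill from its two cost components."""
    port_cost = port_hourly_rate * hours
    transfer_cost = outbound_gb * outbound_rate_per_gb
    # inbound_gb is accepted only to make the point explicit: it adds nothing.
    return port_cost + transfer_cost


# Hypothetical rates, chosen only for illustration.
estimate = dx_monthly_cost(port_hourly_rate=0.30, hours=730,
                           outbound_gb=1000, outbound_rate_per_gb=0.02,
                           inbound_gb=5000)
print(round(estimate, 2))
```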

      Since it’s a physical cable, there’s no built-in resilience—if the cable is cut, it’s cut. You can design in resilience by using multiple Direct Connects, but that’s something you have to layer on top. Direct Connect provides low latency because data isn’t transiting across the public internet like with a VPN. It also provides consistent latency, as you’re using a single physical cable at best or a small number of private networking links at worst. If you need low and consistent latency for an application, Direct Connect is the way to go. In addition, it’s also the best way to achieve the highest speeds for hybrid networking within AWS. As mentioned, it can be provisioned with 1, 10, or 100 gigabit speeds, and since it’s a dedicated port, you’re very likely to achieve the maximum possible speed.

      Compare that to an IPsec VPN, which uses encryption and therefore incurs processing overhead while transiting over the public internet. Direct Connect will give you higher, more consistent speeds. Lastly, Direct Connect can be used to access both AWS private services running in a VPC and AWS public services. However, it cannot be used to access the public internet unless you add a proxy or another networking appliance to handle that for you.

      Visually, the architecture of Direct Connect starts on the right with your business premises, where you'll have some kind of customer premises router or firewall. This might be the same router connected to your internet connection or a new, dedicated DX-capable router, which I’ll explain more about in an upcoming lesson. Additionally, you’ll have some staff, in this case, Bob and Julie. In the middle, we have a DX location. This is often confusing, as it’s not a location actually owned by AWS—it’s not an AWS building. It’s usually a large regional data center where AWS rents space, and your business might also rent space alongside other businesses.

      Inside this DX location is an AWS cage—an area owned by AWS containing one or more DX routers known as AWS DX routers, which are the endpoints of the Direct Connect service. You might also rent space in this DX location, known as the customer cage. If you’re a large organization, you might rent this space directly, housing some of your infrastructure and a router known as the customer DX router. If you’re a smaller organization, this cage might belong to a communications partner—this is called the comms partner cage. If you don’t have space in a DX location, the communications partner does and can extend connections from this DX location to your business premises.

      The key thing to understand about Direct Connect is that it's a port allocation. When you order a Direct Connect from AWS to a specific DX location, you’re allocated a DX port. This must be physically connected using a fiber optic cable to another port in the DX location—either your router in your cage or a communications partner’s router in the same DX location. In either case, you’ll have a corresponding port within the DX location, whether on your own equipment or that of a comms provider. Between these two ports, you’ll need to order a cross connect.

      The cross connect is a physical connection between the AWS DX port in the AWS cage and your or your provider’s port within the DX location. This concept is crucial, whether you have equipment in the DX location or purchase access through a communications partner. From the partner, you'll be allocated a port within the DX location, and it is to this port that the cross connect is linked. This is the cable that connects the AWS DX port to your router or a communications partner’s router. If you're using a communications partner, this link can then be extended to your customer premises. But in all cases, you must have a port within either a customer cage or comms partner cage at the DX location to establish a cross connect with AWS’s DX port.

      On the left side, we have an AWS region—such as AP Southeast 2—with a VPC containing a private subnet and services. We also have the AWS public zone and example services such as SQS, Elastic IP addresses, and S3. The AWS region is AWS-owned infrastructure, which may or may not be in the same facility as the DX location but is always connected with multiple high-speed resilient network connections. Conceptually, you can think of the region as always being connected to one or more local DX locations.

      That’s the physical architecture, and I’ll go into more detail in upcoming lessons elsewhere in the course. Logically, we configure virtual interfaces—called VIFs—over this single physical connection. There are three types of VIFs. First are transit VIFs, which have specific use cases that I’ll explain in detail later. Second are public VIFs, used to access AWS public space services. A public VIF runs over the full Direct Connect path—from your customer router to your DX router, then into the AWS DX router, and finally into the public AWS region. Third are private VIFs, which also run over Direct Connect but connect into virtual private gateways attached to a VPC, giving you access to private AWS services.

      That’s everything I wanted to cover in this lesson. Go ahead and complete it, and when you're ready, I look forward to you joining me in the next one.

    1. Welcome back. This is part two of this lesson, and we’re going to continue immediately from the end of part one. So let's get started.

      Now, the previous architecture can be evolved by using queues. A queue is a system that accepts messages. Messages are sent onto a queue and can be received or polled off the queue. In many queues, there's ordering, meaning that in most cases, messages are received off the queue in first-in, first-out (FIFO) order, though it's worth noting that this isn't always the case.

      Using a queue-based decoupled architecture, CatTube would look something like this: Bob would upload his newest video of whiskers lying on the beach to the upload component. Once the upload is complete, instead of passing this directly onto the processing tier, it does something slightly different. It stores the master 4K video inside an S3 bucket and adds a message to the queue detailing where the video is located, as well as any other relevant information, such as what sizes are required. This message, because it’s the first message in the queue, is architecturally at the front of the queue. At this point, the upload tier, having uploaded the master video to S3 and added a message to the queue, finishes this particular transaction. It doesn’t talk directly to the processing tier and doesn't know or care if it’s actually functioning. The key thing is that the upload tier doesn't expect an immediate answer from the processing tier. The queue has decoupled the upload and processing components.

      It's moved from a synchronous style of communication where the upload tier expects and needs an immediate answer and waits for that answer, to asynchronous communications. Here, the upload tier sends the message and can either wait in the background or just continue doing other things while the processing tier does its job. While this process is going on, the upload component is probably getting additional videos being uploaded, and they’re added to the queue along with the whiskers video processing job. Other messages that are added to the queue are behind the whiskers job because there is an order in this queue: it is a FIFO queue.

      At the other side of the queue, we have an auto-scaling group, which has been configured with a minimum size of 0, a desired size of 0, and a maximum size of 1,337. Currently, it has no instances provisioned, but it has auto-scaling policies that provision or terminate instances based on what's called the queue length, which is the number of items in the queue. Because there are messages on the queue added by the upload tier, the auto-scaling group detects this and increases the desired capacity from 0 to 2. As a result, instances are provisioned by the auto-scaling group. These instances start polling the queue and receive messages that are at the front of the queue. These messages contain the data for the job and the location of the S3 bucket and the object in that bucket. Once these jobs are received from the queue by these processing instances, they can retrieve the master video from the S3 bucket.

      The jobs are processed by the instances, and once they are completed, the messages are deleted from the queue, leaving only one job in the queue. At this point, the auto-scaling group may decide to scale back because of the shorter queue length, so it reduces the desired capacity from 2 to 1, which terminates one of the processing instances. The instance that remains polls the queue and receives the last message. It completes the processing of that message, performs the transcoding on the videos, and leaves zero messages in the queue. The auto-scaling group realizes this and scales back the desired capacity from 1 to 0, resulting in the termination of the last processing EC2 instance.
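
      The scale-out and scale-in sequence just described can be simulated in a few lines. This is only a sketch: the in-memory `deque` stands in for the real queue, three jobs are assumed (two processed first, one left over), and the "one instance per two messages" rule is a hypothetical scaling policy picked so that the numbers match the walkthrough (0 to 2, then 1, then 0).

```python
from collections import deque
from math import ceil

MIN_SIZE, MAX_SIZE = 0, 1337   # group limits quoted in the lesson
MESSAGES_PER_INSTANCE = 2      # hypothetical scaling target


def desired_capacity(queue_length: int) -> int:
    """Pick a fleet size from the queue length, clamped to the group limits."""
    wanted = ceil(queue_length / MESSAGES_PER_INSTANCE)
    return max(MIN_SIZE, min(MAX_SIZE, wanted))


def run_worker_fleet(jobs):
    """Process jobs in FIFO order, recording (queue length, capacity) each round."""
    queue = deque(jobs)
    history, processed = [], []
    while True:
        capacity = desired_capacity(len(queue))
        history.append((len(queue), capacity))
        if capacity == 0:          # empty queue: the fleet scales back to zero
            break
        for _ in range(capacity):  # each instance takes one message per round
            if queue:
                processed.append(queue.popleft())
    return history, processed


history, processed = run_worker_fleet(["whiskers.mp4", "video-2.mp4", "video-3.mp4"])
print(history)
print(processed)
```

      Note that the producer side never appears in the worker loop: the fleet only looks at the queue, which is the decoupling the lesson is describing.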

      Using a queue architecture to place a queue between two application tiers decouples those tiers. One tier adds jobs to the queue and doesn’t care about the health or the state of the other tier. The other tier can read jobs from the queue, and it doesn't care how they got there. This is unlike the previous example where application load balancers were used between tiers. While this did allow for high availability and scaling, the upload tier in the previous example still synchronously communicated with one instance of the processing tier. With the queue architecture, no communication happens directly between the components. The components are decoupled and can scale independently and freely. In this case, the processing tier uses a worker fleet architecture that can scale anywhere from zero to a near-infinite number of instances based on the length of the queue.

      This is a really powerful architecture because of the asynchronous communications it uses. It's an architecture commonly used in applications like CatTube, where customers upload things for processing, and you want to ensure that a worker fleet behind the scenes can scale to perform that processing. You might be asking why this matters in the context of event-driven architectures, and I’m getting there, I promise.

      If you continue breaking down a monolithic application into smaller and smaller pieces, you'll eventually end up with a microservice architecture, which is a collection of, as the name suggests, microservices. Microservices do individual things very well. In this example, we have the upload microservice, the processing microservice, and the store and manage microservice. A full application like CatTube might have hundreds or even thousands of these microservices. They might be different services, or there might just be many copies of the same service, like in this example, which is fortunate because it's much easier to diagram. The upload service is a producer, the processing node is a consumer, and the data store and manage microservice performs both roles.

      Logically, producers produce data or messages, and consumers, as the name suggests, consume data or messages. There are also microservices that can do both things. The things that services produce and consume architecturally are events. Queues can be used to communicate events, as we saw with the previous example, but larger microservices architectures can get complex quickly. Services need to exchange data between partner microservices, and if we do this with a queue architecture, we'll logically have many queues. While this works, it can be complicated. Keep in mind that a microservice is just a tiny self-sufficient application. It has its own logic, its own store of data, and its own input/output components.

      Now, if you hear the term "event-driven architecture," I don’t want you to be too apprehensive. Event-driven architectures are simply a collection of event producers, which might be components of your application that directly interact with customers, parts of your infrastructure like EC2, or systems monitoring components. These are bits of software that generate or produce events in reaction to something. If a customer clicks submit, that might be an event. If an error occurs during the upload of the whiskers holiday video, that's an event. Producers are things that produce events, and the inverse of this is consumers—pieces of software that are ready and waiting for events to occur. When they see an event they care about, they take action. This might involve displaying something for a customer, dispatching a human to resolve an order packing issue, or retrying an upload.

      Components or services within an application can be both producers and consumers. Sometimes a component might generate an event, for example, a failed upload, and then consume events to force a retry of that upload. The key thing to understand about event-driven architectures is that neither the producers nor the consumers are sitting around waiting for things to occur. They're not constantly consuming resources or running at 100% CPU load, waiting for things to happen. Producers generate events when something occurs, such as when a button is clicked, an upload works, or when it doesn’t work. These producers produce events, but consumers aren’t waiting around for those events. They have those events delivered, and when they receive an event, they take an action, then stop. They're not constantly consuming resources.

      Applications would be really complex if every software component or service needed to be aware of every other component. If every application component required a queue between it and every other component to put events into and access them from, the architecture would be really complicated. Best practice event-driven architectures have what's called an event router, a highly available central exchange point for events. The event router has an event bus, which you can think of as a constant flow of information. When events are generated by producers, they're added to this event bus, and the router can deliver them to event consumers.
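
      To make the event router idea concrete, here is a minimal in-process sketch. The class and method names (`EventRouter`, `subscribe`, `publish`) are invented for illustration and do not correspond to any AWS API; a real event bus would also be durable and highly available, which this toy version is not.

```python
from collections import defaultdict


class EventRouter:
    """Toy central exchange: producers publish, consumers subscribe by event type."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register a consumer callback for one event type."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Deliver an event to every interested consumer; return the delivery count."""
        delivered = 0
        for handler in self._handlers.get(event_type, []):
            handler(payload)
            delivered += 1
        return delivered


router = EventRouter()
retries = []

# Consumer: reacts only when a failed-upload event is delivered to it.
router.subscribe("upload_failed", lambda event: retries.append(event["video"]))

# Producers: emit events without knowing who, if anyone, is listening.
delivered = router.publish("upload_failed", {"video": "whiskers.mp4"})
ignored = router.publish("order_packed", {"order": 42})
print(retries, delivered, ignored)
```

      Neither side polls here: the consumer's code runs only when the router hands it an event, which mirrors the resource-usage argument made in this lesson.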

      The WordPress system we’ve used so far has been running on an EC2 instance, which is essentially a consistent allocation of resources. Whether the WordPress system is under low load or large load, we’re still billed for that EC2 instance, consuming resources. Now, imagine a system with lots of small services all waiting for events. If events are received, the system springs into action, allocating resources and scaling components as needed. It deals with those events, then returns to a low or no resource usage state, which is the default. Event-driven architectures only consume resources when needed. There’s nothing constantly running or waiting for things to happen. We don’t constantly poll, hoping for something to happen. We have producers that generate events when something happens. For example, on Amazon.com, when you click "order," it generates an event, and actions are taken based on that event. But Amazon.com doesn’t constantly check your browser every second to see if you've clicked "submit."

      So, in summary, a mature event-driven architecture only consumes resources while handling events. When events are not occurring, it doesn’t consume resources. This is one of the key components of a serverless architecture, which I’ll talk about more later in this section.

      I know this has been a lot of theory, but I promise you, as you continue through the course, it will really make sense why I introduced this theory in detail at this point. It will help you with the exam, too. In the rest of this section, we’ll be covering more AWS-specific and practical topics, but they’ll all rely on your knowledge of this evolution of systems architecture.

      Thanks for watching this video. You can go ahead and finish it off, and when you’re ready, I look forward to you joining me in the next lesson.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      The authors present an algorithm and workflow for the inference of developmental trajectories from single-cell data, including a mathematical approach to increase computational efficiency. While such efforts are in principle useful, the absence of benchmarking against synthetic data and a wide range of different single-cell data sets make this study incomplete. Based on what is presented, one can neither ultimately judge if this will be an advance over previous work nor whether the approach will be of general applicability.

      We thank the eLife editor for the valuable feedback. Both benchmarking against other methods and validation on a synthetic dataset (“dyntoy”) are indeed presented in the Supplementary Note, although this was not sufficiently highlighted in the main text, which has now been improved.

      Our manuscript contains benchmarking against a challenging synthetic dataset in Figure 1; furthermore, both the synthetic dataset and the real-world thymus dataset have been analyzed in parallel using currently available TI tools (as detailed in the Supplementary Note). Additional single-cell datasets (single-cell RNA-seq) were added in response to the reviewers' comments.

      One of the reviewers correctly points out that tviblindi goes against the philosophy of automated trajectory inference. This is correct; we believe that a new class of methods, complementary to fully automated approaches, is needed to explore datasets with unknown biology. tviblindi is meant to be a representative of this class of methods—a semi-automated framework that builds on features inferred from the data in an unbiased and mathematically well-founded fashion (pseudotime, homology classes, suitable low-dimensional representation), which can be used in concert with expert knowledge to generate hypotheses about the underlying dynamics at an appropriate level of detail for the particular trajectory or biological process.

      We would also like to mention that the algorithm and the workflow are not the sole results of the paper. We have thoroughly characterized human thymocyte development, where, in addition to expected biological endpoints, we found and characterized an unexpected activated thymic T-reg endpoint.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present tviblindi, a computational workflow for trajectory inference from molecular data at single-cell resolution. The method is based on (i) pseudo-time inference via expected hitting time, (ii) sampling of random walks in a directed acyclic k-NN graph where edges are oriented away from a cell of origin w.r.t. the involved nodes' expected hitting times, and (iii) clustering of the random walks via persistent homology. An extended use case on mass cytometry data shows that tviblindi can be used to elucidate the biology of T cell development.

      Strengths:

      - Overall, the paper is very well written and most (but not all, see below) steps of the tviblindi algorithm are explained well.

      - The T cell biology use case is convincing (at least to me: I'm not an immunologist, only a bioinformatician with a strong interest in immunology).

      We thank the reviewer for the feedback and suggestions, which we will accommodate. We respond point-by-point below.

      Weaknesses:

      - The main weakness of the paper is that a systematic comparison of tviblindi against other tools for trajectory inference (there are many) is entirely missing. Even though I really like the algorithmic approach underlying tviblindi, I would therefore not recommend to our wet-lab collaborators that they should use tviblindi to analyze their data. The only validation in the manuscript is the T cell development use case. Although this use case is convincing, it does not suffice for showing that the algorithm's results are systematically trustworthy and more meaningful (at least in some dimension) than trajectories inferred with one of the many existing methods.

      We have compared tviblindi to several trajectory inference methods (Supplementary Note, section 8.2: Comparison to state-of-the-art methods), namely Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021), StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024), and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note, section 10).

      Also, in the meantime we have successfully used tviblindi to investigate human B-cell development in primary immunodeficiency (Bakardjieva M, et al. Tviblindi algorithm identifies branching developmental trajectories of human B-cell development and describes abnormalities in RAG-1 and WAS patients. Eur J Immunol. 2024 Dec;54(12):e2451004. doi: 10.1002/eji.202451004.).

      - The authors' explanation of the random walk clustering via persistent homology in the Results (subsection "Real-time topological interactive clustering") is not detailed enough, essentially only concept dropping. What does "sparse regions" mean here and what does it mean that "persistent homology" is used? The authors should try to better describe this step such that the reader has a chance to get an intuition how the random walk clustering actually works. This is especially important because the selection of sparse regions is done interactively. Therefore, it's crucial that the users understand how this selection affects the results. For this, the authors must manage to provide a better intuition of the maths behind clustering of random walks via persistent homology.

      In order to satisfy both reader types: the biologist and the mathematician, we explain the mathematics in detail in the Supplementary Note, section 4. We improved the Results text to better point the reader to the mathematical foundations in the Supplementary Note.  

      - To motivate their work, the authors write in the introduction that "TI methods often use multiple steps of dimensionality reduction and/or clustering, inadvertently introducing bias. The choice of hyperparameters also fixes the a priori resolution in a way that is difficult to predict." They claim that tviblindi is better than the original methods because "analysis is performed in the original high-dimensional space, avoiding artifacts of dimensionality reduction." However, in the manuscript, tviblindi is tested only on mass cytometry data which has a much lower dimensionality than scRNA-seq data for which most existing trajectory inference methods are designed. Since tviblindi works on a k-NN graph representation of the input data, it is unclear if it could be run on scRNA-seq data without prior dimensionality reduction. For this, cell-cell distances would have to be computed in the original high-dimensional space, which is problematic due to the very high dimensionality of scRNA-seq data. Of course, the authors could explicitly reduce the scope of tviblindi to data of lower dimensionality, but this would have to be stated explicitly.

      In the manuscript we tested the framework on the scRNA-seq data from Park et al. 2020 (DOI: 10.1126/science.aay3224). To illustrate that tviblindi can work directly in the high-dimensional space, we successfully applied the framework to imputed 2000-dimensional data. Furthermore, we successfully used tviblindi to investigate the bone marrow atlas scRNA-seq dataset of Zhang et al. (2024) and the atlas of mouse gastrulation of Pijuan-Sala et al. (2019). The idea behind tviblindi is to be able to work without the necessity to use non-linear dimensionality reduction techniques, which reduce the dimensionality to a very low number of dimensions and whose effects on the data distribution are difficult to predict. On the other hand, the use of (linear) dimensionality reduction techniques that effectively suppress noise in the data, such as PCA, is good practice (see also the response to reviewer 2). We have emphasized this in the revised version and added the results of the corresponding analysis (see Supplementary Note, section 9).

      - Also tviblindi has at least one hyper-parameter, the number k used to construct the k-NN graphs (there are probably more hidden in the algorithm's subroutines). I did not find a systematic evaluation of the effect of this hyper-parameter.

      A detailed discussion of the topic is presented in the Supplementary Note, section 8.1, where the Spearman correlation coefficient between pseudotime estimated using k=10 and k=50 nearest neighbors was 0.997. The number k does, however, affect the number of candidate endpoints. But even when a larger k causes spurious connections between unrelated cell fates, the topological clustering of random walks allows for the separation of different trajectories. We have expanded the “sensitivity to hyperparameters” section 8.1, also in response to reviewer 2.
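
      For readers unfamiliar with the statistic, the following is a minimal sketch of how a Spearman correlation between two pseudotime estimates can be computed: rank both vectors (averaging ranks over ties) and take the Pearson correlation of the ranks. The pseudotime values are made up for illustration and are not taken from the manuscript; the function names are likewise illustrative and not part of tviblindi.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x ** 0.5 * var_y ** 0.5)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up pseudotime estimates for five cells under two values of k.
pseudotime_k10 = [0.10, 0.35, 0.50, 0.72, 0.90]
pseudotime_k50 = [0.12, 0.30, 0.55, 0.95, 0.80]  # last two cells swap order
rho = spearman(pseudotime_k10, pseudotime_k50)
print(round(rho, 3))
```

      A value close to 1 means the two settings order the cells almost identically, which is the sense in which the pseudotime is reported to be insensitive to k.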

      Reviewer #2 (Public Review):

      Summary:

      In Deconstructing Complexity: A Computational Topology Approach to Trajectory Inference in the Human Thymus with tviblindi, Stuchly et al. propose a new trajectory inference algorithm called tviblindi and a visualization algorithm called vaevictis for single-cell data. The paper utilizes novel and exciting ideas from computational topology coupled with random walk simulations to align single cells onto a continuum. The authors validate the utility of their approach largely using simulated data and establish known protein expression dynamics along CD4/CD8 T cell development in thymus using mass cytometry data. The authors also apply their method to track Treg development in single-cell RNA-sequencing data of human thymus.

      The technical crux of the method is as follows: The authors provide an interactive tool to align single cells along a continuum axis. The method uses expected hitting time (given a user input start cell) to obtain a pseudotime alignment of cells. The pseudotime gives an orientation/direction for each cell, which is then used to simulate random walks. The random walks are then arranged/clustered based on the sparse region in the data they navigate using persistent homology.
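
      The orientation and random-walk steps summarized above can be illustrated with a toy example. This is a sketch of the general idea only, not tviblindi's implementation: the graph, the pseudotime values, and the function names are all invented, and the pseudotime is supplied directly rather than computed from expected hitting times.

```python
import random


def orient_edges(neighbors, pseudotime):
    """Keep only edges that point from lower to strictly higher pseudotime."""
    return {
        node: [n for n in nbrs if pseudotime[n] > pseudotime[node]]
        for node, nbrs in neighbors.items()
    }


def random_walk(dag, start, rng):
    """Follow random outgoing edges until a node with no successors is reached."""
    path = [start]
    while dag[path[-1]]:
        path.append(rng.choice(dag[path[-1]]))
    return path


# Toy undirected k-NN graph with one origin and two candidate endpoints.
neighbors = {
    "origin": ["a", "b"],
    "a": ["origin", "end1"],
    "b": ["origin", "end2"],
    "end1": ["a"],
    "end2": ["b"],
}
pseudotime = {"origin": 0.0, "a": 1.0, "b": 1.2, "end1": 2.0, "end2": 2.5}

dag = orient_edges(neighbors, pseudotime)
rng = random.Random(0)  # fixed seed so the sketch is reproducible
walks = [random_walk(dag, "origin", rng) for _ in range(100)]
endpoints = {walk[-1] for walk in walks}
print(endpoints)
```

      Because every retained edge increases pseudotime, the oriented graph has no cycles and each walk must terminate. Grouping the resulting walks by the sparse regions they pass around is the persistent-homology step, which this sketch does not attempt.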

      We thank the reviewer for the feedback and suggestions, which we have accommodated. We respond point-by-point below.

      Strengths:

      The notion of using persistent homology to group random walks to identify trajectories in the data is novel.

      The strength of the method lies in the implementation details that make computationally demanding ideas such as persistent homology more tractable for large scale single-cell data. This enables the authors to make the method more user friendly and interactive allowing real-time user query with the data.

      Weaknesses:

      The interactive nature of the tool is also a weakness, by allowing for user bias leading to possible overfitting for a specific data.

      tviblindi is not designed as a fully automated TI tool (although it implements a fully automated module), but as a data driven framework for exploratory analysis of unknown data. There is always a risk of possible bias in this type of analysis - starting with experimental design, choice of hyperparameters in the downstream analysis, and an expert interpretation of the results. The successful analysis of new biological data involves a great deal of expert knowledge which is difficult to a priori include in the computational models. 

      tviblindi tries to solve this challenge by intentionally overfitting the data and keeping the level of resolution at a single random walk. In this way we aim to capture all putative local relationships in the data. The on-demand aggregation of the walks using the global topology of the data allows researchers to use their expert knowledge to choose the right level of detail (as demonstrated in Figure 4 of the manuscript) while relying on the topological structure of the high-dimensional point cloud. At all times tviblindi allows the user to inspect the composition of the trajectory to assess the variance in the development, possible hubs on the k-NN graph, etc.

      The main weakness of the method is the lack of benchmarking on real data and of comparison to other methods. Trajectory inference is a very crowded field with many highly successful and widely used algorithms; the two most relevant ones (closest to this manuscript) are not only not benchmarked against, but also not cited. These include methods that specifically use persistent homology to discover trajectories (Rizvi et al., Nat Biotech 2017) and methods that specifically implement the idea of simulating random walks to identify stable states in single-cell data (e.g. CellRank, Lange et al., Nat Meth 2022), as well as many trajectory algorithms that take alternative approaches. The paper has much less benchmarking, demonstration on real data, and comparison to the very many other previous trajectory algorithms published before it. Generally speaking, in a crowded field of previously published trajectory methods, I do not think this one approach will compete well against prior work, especially due to its inability to handle the noise typical in real-world data (as was even demonstrated in the little bit of application to real-world data provided).

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021),  StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024)  and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      Beyond the general lack of benchmarking, there are two issues that give me particular concern. As previously mentioned, the algorithm is highly susceptible to user bias and overfitting. The paper gives the example (Figure 4) of a trajectory which mistakenly shows that cells may pass from an apoptotic phase to a different developmental stage. To circumvent this mistake, the authors propose the interactive version of tviblindi that allows users to zoom in (increase resolution) and identify that there are in fact two trajectories in one. In this case, the authors show how the user can fix a mistake when the answer is known. However, the point of trajectory inference is to discover the unknown. With so many interactive options for the user to guide the result, the method is more user/bias-driven than data-driven. So a rigorous and quantitative discussion of the robustness of the method, as well as how to ensure data-driven inference and avoid over-fitting, would be useful.

      Local directionality in expression data is a challenge which is not, to our knowledge, solved. And we are not sure it can be solved entirely, even theoretically. The random walks passing “through” the apoptotic phase are biologically infeasible, but it is an (unbiased) representation of what the data look like based on the diffusion model. It is a property of the data (or of the panel design), which has to be interpreted properly rather than a mistake. Of note, except for Monocle3 (which does not provide the directionality) other tested methods did not discover this trajectory at all.

      The “zoom in” has in fact nothing to do with “passing through the apoptosis”. We show how the researcher can investigate the suggested trajectory to see if there is an additional structure of interest and/or relevance. This investigation is still data driven (although not fully automated). Anecdotally in this particular case this branching was discovered by a bioinformatician, who knew nothing about the presence of beta-selection in the data.  

      We show that the trajectory of apoptosis of cortical thymocytes consists of 2 trajectories corresponding to 2 different checkpoints (beta-selection and positive/negative selection). This type of a structure, where 2 (or more) trajectories share the same path for most of the time, then diverge only to be connected at a later moment (immediately from the point of view of the beta-selection failure trajectory) is a challenge for TI algorithms and none of tested methods gave a correct result. More importantly there seems to be no clear way to focus on these kinds of structures (common origin and common fate) in TI methods.

      Of note, the “zoom in” is a recommended and convenient method to look for an inner structure, but it does not necessarily mean addition of further homological classes. Indeed, in this case the reason that the structure is not visible directly is the limitation of the dendrogram complexity (only branches containing at least 10% of simulated random walks are shown by default). In summary, tviblindi effectively handled all noise in the data that obscured biologically valid trajectories for other methods. We have improved the discussion of the robustness in the current version.  
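
      The effect of this default display threshold can be illustrated with a minimal sketch. It is not tviblindi code: the function name `visible_branches`, the list-of-labels input and the `min_fraction` parameter are hypothetical, and only the 10% default is taken from the text above.

```python
from collections import Counter


def visible_branches(walk_to_branch, min_fraction=0.10):
    """Return the branches that hold at least `min_fraction` of all walks.

    `walk_to_branch` is a hypothetical list with one branch label per
    simulated random walk. The result maps each retained branch label to
    the fraction of walks it contains.
    """
    total = len(walk_to_branch)
    counts = Counter(walk_to_branch)
    return {
        branch: count / total
        for branch, count in counts.items()
        if count / total >= min_fraction
    }


# Hypothetical example: 100 walks assigned to three branches.
walks = ["A"] * 70 + ["B"] * 25 + ["C"] * 5
shown_by_default = visible_branches(walks)        # branch "C" is hidden
shown_when_zoomed = visible_branches(walks, 0.01)  # branch "C" appears
```

      In a scheme like this, lowering the threshold reveals smaller branches without re-simulating any walks: only the level of detail displayed changes, not the underlying set of walks.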

      Second, the paper discusses the benefit of tviblindi operating in the original high dimensions of the data. This is perhaps adequate for mass cytometry data where there is less of an issue of dropouts and the proteins may be chosen to be largely independent. But in the context of single-cell RNA-sequencing data, the massive undersampling of mRNA, as well as a high degree of noise (e.g. ambient RNA), introduces a very large degree of noise, so that modeling data in the original high dimensions leads to methods being fit to the noise. Therefore ALL other methods for trajectory inference work in a lower dimension, for very good reason; otherwise one is learning noise rather than signal. It would be great to have a discussion on the feasibility of the method as is for such noisy data and provide users with guidance. We note that the example scRNA-seq data included in the paper is denoised using imputation, which will likely result in the trajectory inference being oversmoothed as well.

      We agree with the reviewer. In our manuscript we wanted to showcase that tviblindi can directly operate in high-dimensional space (thousands of dimensions) and we used MAGIC imputation for this purpose. This was not ideal. A more standard approach, which uses 30-50 PCs as input to the algorithm, resulted in equivalent trajectories. We have added this analysis to the study (Supplementary note, section 9).

      In summary, the fact that tviblindi scales well with dimensionality of the data and is able to work in the original space does not mean that it is always the best option. We have added a corresponding comment into the Supplementary note.  

      Reviewer #3 (Public Review):

      Summary:

      Stuchly et al. proposed a single-cell trajectory inference tool, tviblindi, which was built on a sequential implementation of the k-nearest neighbor graph, random walk, persistent homology and clustering, and interactive visualization. The paper was organized around the detailed illustration of the usage and interpretation of results through the human thymus system.

      Strengths:

      Overall, I found the paper and method to be practical and needed in the field. Especially the in-depth, step-by-step demonstration of the application of tviblindi in numerous T cell development trajectories and how to interpret and validate the findings can be a template for many basic science and disease-related studies. The videos are also very helpful in showcasing how the tool works.

      Weaknesses:

      I only have a few minor suggestions that hopefully can make the paper easier to follow and the advantage of the method to be more convincing.

      (1) The "Computational method for the TI and interrogation - tviblindi" subsection under the Results is a little hard to follow without having a thorough understanding of the tviblindi algorithm procedures. I would suggest that the authors discuss the uniqueness and advantages of the tool after the detailed introduction of the method (moving it after the "Connectome - a fully automated pipeline").

      We thank the reviewer for the suggestion and we have accommodated it to improve readability of the text.

      Also, considering it is a computational tool paper, inevitably, readers are curious about how it functions compared to other popular trajectory inference approaches. I did not find any formal discussion until almost the end of the supplementary note (even that is not cited anywhere in the main text). Authors may consider improving the summary of the advantages of tviblindi by incorporating concrete quantitative comparisons with other trajectory tools.

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021),  StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024)  and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      (2) Regarding the discussion in Figure 4 the trajectory goes through the apoptotic stage and reconnects back to the canonical trajectory with counterintuitive directionality, it can be a checkpoint as authors interpret using their expert knowledge, or maybe a false discovery of the tool. Maybe authors can consider running other algorithms on those cells and see which tracks they identify and if the directionality matches with the tviblindi.

      We have indeed used the thymus dataset for comparison of all TI algorithms listed above. Except for Monocle 3 they failed to discover the negative selection branch (Monocle 3 does not offer directionality information). Therefore, a valid topological trajectory with incorrect (expert-corrected) directionality was partly or entirely missed by other algorithms. 

      (3) The paper mainly focused on mass cytometry data and had a brief discussion on scRNA-seq. Can the tool be applied to multimodality data such as CITE-seq data that have both protein markers and gene expression? Any suggestions if users want to adapt to scATAC-seq or other epigenomic data?

      The analysis of multimodal data is the logical next step and is the topic of our current research. At this moment tviblindi cannot be applied directly to multimodal data. It is possible to use the KNN-graph based on multimodal data (such as weighted nearest neighbor graph implemented in Seurat) for pseudotime calculation and random walk simulation. However, we do not have a fully developed triangulation for the multimodal case yet. 
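
      As a rough illustration of random walk simulation on a precomputed KNN graph, the following sketch assumes that a walk may only step to neighbors with larger pseudotime. The graph could come from any source, including a multimodal neighbor graph. All names (`simulate_walk`, `neighbors`, `pseudotime`) are hypothetical and the stepping rule is a simplifying assumption, not the tviblindi implementation.

```python
import random


def simulate_walk(neighbors, pseudotime, start, rng, max_steps=1000):
    """Simulate one random walk that always moves forward in pseudotime.

    neighbors  : dict mapping a cell index to a list of its KNN indices
    pseudotime : sequence of pseudotime values, indexed by cell
    start      : index of the cell of origin
    rng        : a random.Random instance, for reproducibility
    """
    path = [start]
    current = start
    for _ in range(max_steps):
        # Keep only the neighbors that lie later in pseudotime.
        forward = [j for j in neighbors[current]
                   if pseudotime[j] > pseudotime[current]]
        if not forward:
            # No later neighbor exists: the walk has reached an endpoint.
            break
        current = rng.choice(forward)
        path.append(current)
    return path


# Hypothetical four-cell graph with two routes from cell 0 to cell 3.
toy_neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
toy_pseudotime = [0.0, 0.4, 0.5, 1.0]
toy_path = simulate_walk(toy_neighbors, toy_pseudotime, 0, random.Random(0))
```

      Only the neighbor lists and pseudotime values are needed here, which is why a neighbor graph built from several modalities could be substituted at this step, whereas the triangulation needs separate treatment.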

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses:

      -  Benchmark against existing trajectory inference methods.

      -  Benchmark on scRNA-seq data or an explicit statement that, unlike existing methods, tviblindi is not designed for such data.

      We provided comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021),  StaVia (Via 2.0) Stassen et al. (2024), CellRank 2 (v2.06) Weiler et al. (2024)  and PAGA (scanpy==1.9.3) Wolf et al. (2019). We added thorough and systematic comparisons to the other algorithms mentioned by reviewers. We included extended evaluation on publicly available datasets (Supplementary Note section 10).

      -  Systematic evaluation of the effects of hyper-parameters on the performance of tviblindi (as mentioned above, there is at least one hyper-parameter, the number k to construct the k-NN graphs).

      This is described in Supplementary Note section 8.1

      Recommendations for improving the writing and presentation:

      -  The GitHub link to the algorithm which is currently hidden in the Methods should be moved to the abstract and/or a dedicated section on code availability.

      -  The presentation of the persistent homology approach used for random walk clustering should be improved (see public comment above).

      This is described extensively in Supplementary Note  

      -  A very minor point (can be ignored by the authors): consider renaming the algorithm. At least for me, it's extremely difficult to remember.

      We choose to keep the original name

      Minor corrections to the text and figures:

      -  Labels and legend texts are too small in almost all figures.

      Reviewer #2 (Recommendations For The Authors):  

      (1) On page 3: "(2) Analysis is performed in the original high-dimensional space avoiding artifacts of dimensionality reduction." In mass cytometry data where there is no issue of dropouts, one may choose proteins such that they are not correlated with each other, making dimensionality reduction techniques less relevant. But in the context of an unbiased assay such as single-cell RNA-sequencing (scRNA-seq), one measures all the genes in a cell, so dimensionality reduction can help resolve the redundancy in the feature space due to correlated/co-regulated gene expression patterns. This assumption forms the basis of most methods in scRNA-seq. More importantly, in scRNA-seq data the dropouts and ambient molecules in mRNA counts result in so much noise that modeling cells in the full gene expression space is highly problematic. So the authors are requested to discuss in detail how they would propose to deal with noise in scRNA-seq data.

      On this note, the authors mention in Supplementary Note 9 (Analysis of human thymus single-cell RNA-seq data): "Imputed data are used as the input for the trajectory inference, scaled counts (no imputation) are shown in line plots". The line plots indicate the gene expression trends along the obtained pseudotime. The authors use MAGIC to impute the data, and we request the authors to mention this in the Methods section (currently one must look through the code in Supplementary Note 1.3 to find this). Data imputation in single-cell RNA-seq is intended to enable quantification of individual gene expression distributions or pairwise gene associations. But when all the genes in an imputed dataset are used for visualization, clustering or trajectory inference, the averaging effect will compound and result in severely smoothed data that misses important differences between cell states. Especially in the case of MAGIC, which uses a transition matrix raised to a power, using data smoothed by one transition matrix to obtain another transition matrix for calculating the hitting time (or simulating random walks) over-smooths the data. Second, the authors' proposal to use scaled counts to study gene trends cannot be generalized to other settings due to the dropout issue. Given the few genes (and only one branch) that are highlighted in Figure 7D-G and Figure 31 in Supplementary Note, it is hard to say if scaling raw values would pick up meaningful biology robustly here for other branches.

      We recommend that this data be reanalyzed with non-imputed data used for trajectory inference and imputed gene expression used for line plots.

      As stated above in the public review, we reanalyzed the scRNA Seq data using a more standard approach (first 50 principal components). We have also analyzed two additional scRNA Seq datasets (Section 1 and section 10 of Supplementary Note)

      On the same note, the authors use Seurat's CellCycleScoring to obtain the cell cycle phase of each cell and later use ScaleData to regress them out. While we agree that it is valuable to remove the cell cycle effect from the data for trajectory inference (and this has been done previously in other methods), the regression approach employed in Seurat's ScaleData is not appropriate. It is an aggressive approach that severely changes the expression pattern of many genes and can result in new artifacts (false positives) in the data. We recommend that the authors explore this more and consider using a more principled alternative such as fscLVM (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1334-8).

      Cell cycle correction is an open problem (Heumos, Nat Rev Genetics, 2023)

      Here we use an (arguably aggressive) approach to make the presentation more straightforward. The cells we are interested in here (end #6) are not dividing and the regression does not change the conclusion drawn in the paper.

      (2) The figures provided are so low in resolution that it is practically impossible to correctly interpret many of the conclusions and references made in the figures (especially Figure 3 in the main text).

      Resolution of the Figures was improved

      (3) There are many aspects of the method that enable easy user biases and can lead to substantial overfitting of the data.

      a. On page 7: "The topology of the point cloud representing human T-cell development is more complex ... and does not offer a clear cutoff for the choice of significant sparse regions. Interactive selection allows the user to vary the resolution and to investigate specific sparse regions in the data iteratively." This implies that the method enables user biases to be introduced into the data analysis. While perhaps useful for exploration, quantitative trajectory assessment using such approach can be faulty when the user (A) may not know the underlying dynamics (B) forces preconceived notion of trajectory.

      The authors should consider making the trajectory inference approach less dependent on interactive user input and show that the trajectory results are robust to any choices the user may make. It may also help if the authors provide an effective guide and mention clearly what issues could result due to the use of such thresholds.

      As explained in the response in public reviews, tviblindi is not designed as a fully automated TI tool, but as a data driven framework for exploratory analysis of unknown data. 

      There is always a risk of possible bias in this type of analysis - starting with experimental design, choice of hyperparameters in the downstream analysis, and an expert interpretation of the results. The successful analysis of new biological data involves a great deal of expert knowledge which is difficult to a priori include in the computational models.  To specifically address the points raised by the reviewer:

      “(A) may not know the underlying dynamics” - tviblindi is designed to perform exploratory analysis of the unknown underlying dynamics. We showcase in the study how this can be performed and we highlight possible cases which can be resolved expertly (spurious connections (doublets), different scales of resolution (beta selection)). Crucially, compared to other TI methods, tviblindi offers a clear mechanism to discover, focus on and resolve these issues, which would (and do) contaminate the trajectories discovered fully automatically by the tested methods (cf. the beta selection, or the development of plasmacytoid dendritic cells (PDCs); Supplementary note, section 10.1).

      “(B) forces preconceived notion of trajectory” - user interaction in tviblindi does not force a preconceived notion of the trajectory. The random walks are simulated before the interactive step in an unbiased manner. During the interactive step the user adjusts the trajectory-specific resolution; an incorrect choice of the resolution may result in either merging distinct trajectories into one or over-separating the trajectories (which is arguably much less serious). However, the interactive step is designed to deal with exactly this kind of challenge. We showcase (e.g. beta selection, or PDC development) how to address the issue: tviblindi allows us to investigate deeper structure in any considered trajectory.

      Thus, tviblindi represents a new class of methods that is complementary to fully automated trajectory inference tools. It offers a semi-automated tool that leverages features derived from data in an unbiased and mathematically rigorous manner, including pseudotime, homology classes, and appropriate low-dimensional representations. These can be integrated with expert knowledge to formulate hypotheses regarding the underlying dynamics, tailored to the specific trajectory or biological process under investigation.

      b. In Figure 4, the authors discuss the trajectory of cells emanating from the CD3 negative double positive stage and entering the apoptotic phase, mention that tviblindi may give "the false impression that cells may pass through an apoptotic phase into a later developmental stage", and propose that the interactive version of tviblindi can help the user zoom into (increase resolution of) this phenomenon and identify that there are in fact two trajectories in one. Given this, how do the other trajectories in the data change if a user manually adjusts the resolution? A quantification of the robustness is important. Also, it appears that a more careful data clean-up could avoid such pitfalls, where the algorithm infers a trajectory based on a mixed phenotype, and the user would not have to manually adjust the resolution to obtain a clear biological conclusion. We note that the original publication of this data did such "data clean up" using simple diffusion map based dimensionality reduction, which the authors boast they avoid. There is a reason for this dimensionality reduction (distinguishing signal from noise), even in CyTOF data, let alone its importance in single cell data.

      The reviewer is concerned about two different, but intertwined issues we wish to untangle here. First, data clean-up is typically done on the premise that dead cells are irrelevant and they are a source of false signals. In the case of the thymocytes in the human thymus this premise is not true. Apoptotic cells are a legitimate (actually dominant) fate of the development and thus need to be represented in the TI dataset. Their biological behavior is however complex as they stop expressing proteins and thus lose their surface markers gradually, as dictated by the particular protein degradation kinetics. So can we clean up dead and dying cells better? Yes, but we don't want to do it since we would lose cells we want to analyze. Second, do trajectories change when we zoom into the data? No, only the level of detail presented visually changes. Since we calculate 5000 trajectories in the dataset, we need to aggregate them already for the hierarchical clustering visualization. Note that Figure 4, panel A highlights 159 trajectories selected in V. group. Zooming in means that the hierarchy of trajectories within V. group is revealed (panel D, groups V.a and Vb.) and can be interpreted on the vaevictis and lineplot graphs (panel E, F). 

      c. In the discussion, the authors write "[tviblindi] allows the selection and grouping of similar random walks into trajectories based on visual interaction with the data". This counters the idea of automated trajectory inference and can lead to severe overfitting.

      As explained in reply to Q3, our aim was NOT to create a fully automated trajectory inference tool. Even more, in our experience we realized that all current tools are taking this fully  automated approach with a search for an “ideal” set of hyperparameters. This, in our experience,  leads to a “blackbox” tool that is difficult to interpret for the expert in the biological field. To respond to this need we designed a modular approach where the results of the TI are presented and the expert can interact with them to focus the visualization and to derive interpretation. Our interactive concept is based on 15 years of experience with the data analysis in flow cytometry, where neither manual gating nor full automation is the ultimate solution but smart integration of both approaches eventually wins the game.

      Thus, tviblindi represents a new class of methods that is complementary to fully automated trajectory inference tools.  It offers a semi-automated tool that leverages features derived from data in an unbiased and mathematically rigorous manner. These features include pseudotime, homology classes, and appropriate low-dimensional representations. These features can be integrated with expert knowledge to formulate hypotheses regarding the underlying dynamics, tailored to the specific trajectory or biological process under investigation.

      d. The authors provide some comment on the robustness to the relaxation parameter for witness complex construction in Supplementary Note Section 8.1.2 but it is limited given the importance of this parameter and a more thorough investigation is recommended. We request the authors to provide concrete examples with figures of how changing alpha2 parameter leads to simplicial complexes of different sizes and an assessment of contexts in which the parameter is robust and when not (in both simulated and publicly available real data). Of note, giving the users a proper guide for parameter choice based on these examples and offering them ways to quantify robustness of their results may also be valuable.

      Section 8 in Supplementary Note was extended as requested.

      e. The authors are requested for an assessment of possible short-circuits (e.g. cells of two distantly related phenotypes that get connected erroneously in the trajectory) in the data, and how their approach based on persistent homology deals with it.

      If a short circuit results in a (spurious) alternative trajectory, the persistent homology approach allows us to distinguish it from genuine trajectories that do not follow the short circuit. This prevents contamination of the inferred evolution by erroneous connections. The ability to distinguish and separate distinct trajectories with the same fate is a major strength of this approach (e.g., the trajectory through doublets or the trajectories around checkpoints in thymocytes’ evolution).

      (4) The authors propose vaevictis as a new visualization tool and show its performance compared to the standard UMAP algorithm on a simulated data set (Figure 1 in Supplementary Notes). We recommend a more comprehensive comparison between the two algorithms on a wide array of publicly available single-cell datasets. As well as comparison to other popular dimensionality reduction approaches like force directed layouts, which are the most widely used tool specifically to visualize trajectories.

      We added Section 10 to Supplementary Note that presents multiple comparisons of this kind. It is important to note that tviblindi works independently of visualization and any preferred visualization can be used in the interactive phase (multiple visualisation methods are implemented).

      (5) In Supplementary Note 8.2, the authors compare tviblindi against the other methods. We recommend that the authors quantify the comparison or expand on their assessments in real biological data. For example, in comparison against Palantir and VIA the authors mention "... discovers candidate endpoints in the biological dataset but lacks toolbox to interrogate subtle features such as complex branching" and "fails to discover subtle features (such as Beta selection)" respectively. We recommend that the authors make these comparisons more precise or provide quantification. While the added benefit of interactive sessions of tviblindi may make it more user friendly, the way tviblindi appears to enable analysis of subtle features (e.g. Figure 1H) should be possible in Palantir or VIA as well.

      We extended the comparisons and presented them in Section 8 and 10 in Supplementary Note.  

      (6) The notion of using random walk simulations to identify terminal (and initial states) has been previously used in single-cell data (CellRank algorithm: https://www.nature.com/articles/s41592-021-01346-6). We request the authors to compare their approach to CellRank.

      We compared our algorithm to the CellRank successor CellRank 2 (see section 8.2, Supplementary Note)

      (7) The notion of using persistent homology to discover trajectories has been previously used in single cell data (https://pubmed.ncbi.nlm.nih.gov/28459448/). We request a comparison to this approach.

      The proposed algorithm was not able to accommodate the large datasets we used.

      scTDA (Rizvi, Camara et al. Nat. Biotechnol. 2017) has not been updated for 6 years. It is not suited for complex atlas-sized datasets both in terms of performance and utility, with its limited visualization tools. It also lacks capabilities to analyze individual trajectories.

      (8) In Figure 3B, the authors visualize the endpoints and simulated random walks using the connectome. There is no edge from start to the apoptotic cells here. It is not clear why? If they are not relevant based on random walks, can the user remove them from analysis? Same for the small group of pink cells below initial point.

      The connectome is a fully automated approach (similar to PAGA) which gives a basic overview of the data. It is not expected to be able to compete with the interactive pipeline of tviblindi for the same reasons as the fully automated methods (difficult to predict the effect of hyperparameters).

      (9) In Supplementary Figure 3, in relation to "Variants of trajectories including selection processes", the authors mention that there is a spurious connection between CD4 single positive cells and the doublet set of cells. The authors mention that the presence of dividing cells makes it difficult to remove the doublets. We request the authors to discuss why. For example, the authors seem to have cell cycle markers (e.g. Ki67, pH3, Cyclin) and one would think that, coupled with the DNA intercalator 191/193Ir, one could further clean up the data. Can the authors employ alternative toolkits such as doublet detection methods?

      To address this issue, we do remove doublets with illegitimate cell barcodes (e.g. we remove any two cells from two samples with different barcodes which present with a double barcode). Although there are computational doublet removal approaches for mass cytometry (Bagwell, Cytometry A 2020), mostly applied to peripheral blood samples (where cell division is not present under steady-state immune system conditions), these are not well suited for situations where dividing cells occur (Rybakowska P, Comput Struct Biotechnol J. 2021), which is the case for our thymocyte samples. Furthermore, there are other situations where doublet formation is not an accident, but rather a biological response (Burel JG, Cytometry A 2020). Thus, the doublet cell problem is similar to the apoptotic cell problem discussed earlier.

      We could remove cells with the double DNA signal, but this would remove not only accidental doublets but also the legitimate (dividing) cells. So the question is how to remove the illegitimate doublets but not the legitimate?

      Of note, the trajectory going through doublets does not affect the interpretation of other trajectories as it is readily discriminated by persistent homology and thus random walks passing through this (spurious) trajectory do not contaminate the markers’ evolution inferred for legitimate trajectories.

      We therefore prefer to remove only the barcode-illegitimate doublets and keep all others in the analysis, using the expert analysis step to also identify (using the cell cycle markers plus other features) the artificially formed doublets and thus spurious connections.

      (10) The authors should discuss how the gene expression trend plots are made (e.g. how are the expression averaged? Rolling mean?).

      The development of those markers is shown as a line plot connecting the average values of a specific marker within a pseudotime segment. By default, the pseudotime values are divided into uniform segments (each containing the same number of points) whose number can be changed in the GUI. To focus on either early or late stages of the development, the segment division can be adjusted in the GUI. See section 6 of the Supplementary Note.
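
      The default segmentation described above can be sketched as follows. This is an illustrative reading of the description, not the tviblindi implementation: the function name `marker_trend` and the parameter `n_segments` are hypothetical, and segment sizes are allowed to differ by one point when the number of points is not divisible by the number of segments.

```python
from statistics import fmean


def marker_trend(pseudotime, marker, n_segments=10):
    """Average a marker within equal-count pseudotime segments.

    Returns a list of (mean pseudotime, mean marker value) pairs, one per
    segment, ordered from early to late; joining them gives the line plot.
    """
    n = len(pseudotime)
    if n != len(marker):
        raise ValueError("pseudotime and marker must have the same length")
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and the number of points")

    # Order the points by pseudotime, then cut the ordering into chunks
    # that each hold (almost) the same number of points.
    order = sorted(range(n), key=lambda i: pseudotime[i])
    base, extra = divmod(n, n_segments)

    trend = []
    start = 0
    for segment in range(n_segments):
        size = base + (1 if segment < extra else 0)
        members = order[start:start + size]
        start += size
        trend.append((fmean(pseudotime[i] for i in members),
                      fmean(marker[i] for i in members)))
    return trend
```

      Because every segment holds the same number of points, densely populated pseudotime ranges are split into narrower segments than sparse ones. A larger `n_segments` gives a finer but noisier line.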

      Reviewer #3 (Recommendations For The Authors):

      The overall figures quality needs to be improved. For example, I can barely see the text in Figure 3c.

      Resolution of the Figures was improved

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2024-02825

      Corresponding author(s): Padinjat, Raghu

      Key to revision plan document:

      Black: reviewer comments

      Red: response to reviewer comment-authors

      Blue: specific changes that will be done in a revision-authors

      1. General Statements [optional]

      We thank the reviewers for their detailed comments on our manuscript and appreciating the novelty, quality and thoroughness of the work. Detailed responses to individual queries and revision plans are indicated below.

      2. Description of the planned revisions

      Reviewer 1:

      Summary

      The study by Sharma et al uses iPSC and neural differentiation in 2D and 3D to investigate how mutation in the OCRL gene affects neural differentiation and neurons. Mutation in the OCRL gene is the cause of Lowe Syndrome (LS), a neurodevelopmental disorder. Neural cultures derived from LS patient iPSCs exhibited reduced excitability and increased glial marker expression. Additional data showing increased levels of DLK1, cleaved Notch protein, and HES5 indicate upregulated Notch signaling in OCRL mutated neural cells. Treatment of brain organoids with a PIP5K inhibitor restored calcium signalling in neurons. These findings describe new dysregulated phenotypes in neural cultures of OCRL mutated cells, shedding light on the underlying cause of Lowe Syndrome.

      Major comments

      1. In general, I think the use of iNeurons usually means direct reprogramming from a somatic cell to neurons without the iPSC stage. Could be confusing to use this term for iPSC derived neurons.

      Thank you for pointing this out. We agree and will remove this term and replace it with a more suitable one in the revised manuscript.

      Please add at least one more replicate of WP cell line to the single nuclei RNAseq.

      There is no cell line called WP1 in the manuscript. We believe the reviewer was likely referring to WT1 (wild-type 1).

      10x Genomics guidelines highlight that the statistical power of a multiome experiment relies on several factors including sequencing depth, total number of cells per sample, sample size and number of cells per cell type of interest (10x Genomics). In this study, we performed a multiome experiment and obtained high-quality reads from 20,000 nuclei for each sample for both modalities: snRNA seq and snATAC seq. The multiome kit recommends a lower limit of 10,000 nuclei per sample; thus the number of cells sampled per cell line is double the suggested minimum. Therefore, and consistent with other single-cell seq studies already published, our study followed the approach where biological replicates were not included (e.g. see PMID: 39487141, GSE238206; PMID: 31651061; PMID: 32109367, GSE144477; PMID: 40056913, GSE279894; PMID: 38280846, GSE250386; PMID: 36430334, GSE213798; PMID: 33333020, GSE123722; PMID: 32989314, GSE145122; PMID: 38711218, GSE243015; PMID: 38652563, GSE236197). Furthermore, single-cell RNA-seq inherently treats each individual cell as a replicate (Satija lab guidelines, PMID: 29567991; Wellcome Sanger Institute), reducing the necessity for additional biological replicates. Overall, this appears to be the current standard in the field, which we have followed.

      Importantly, we took additional steps to experimentally validate the predictions of our single-nuclei RNA-seq findings. For this we used a 3D brain organoid system, which allowed us to confirm key observations noted initially in 2D neural stem cells. For example, in Lowe Syndrome patient derived organoids and OCRL-KO organoids, we noted increased DLK1 levels (Fig5.C-D, H-I) as well as increased GFAP+ cells and gene expression in brain organoids (Fig.S4E,F). These complementary approaches strengthen our confidence in the biological relevance of our findings from the single nuclei sequencing experiments.

      The WT1 and the patient lines are rarely analysed together with the WT2 and KO lines, thus it is tricky to understand if the KO line is mimicking the patient lines? Please, add more merged analyses. Co-analysing all lines:

      (i) would show if the KO line is more similar to the patient lines or to the WT1 or somewhere in between.

      (ii) Could answer questions about the variation in phenotypes between the genetic backgrounds.

      (iii) Elucidate how much variability there is between the two WT lines in your assays. If the two WT lines vary much then conclusions about phenotypes in the patients and KO lines might need to be rethought?

      The reviewer is right in noting that throughout the manuscript we have analysed the patient lines with WT1 and the KO line with WT2. This was a conscious decision which we believe is the correct one for the following reasons:

      It is well recognized and discussed in the literature that genetic background can be a key factor contributing to phenotypes observed in cells differentiated from iPSC (Anderson et al., 2021, PMID: 33861989; Brunner et al., 2023, PMID: 36385170; Hockemeyer and Jaenisch, 2016, PMID: 27152442; Soldner and Jaenisch, 2012, PMID: 30340033; Volpato and Webber, 2020, PMID: 31953356). Therefore, as a matter of abundant precaution, in this study we have tried to use the closest possible genetically matched control lines for analysis.

      The patient lines used in this study for Lowe syndrome were all derived from a family in India of Indian ethnic origin. Therefore, in order to reduce the potential impact of genetic background contributing to potential phenotypes, we have used a control line derived from an individual of Indian ethnic background; this line has previously been developed and published by our group (PMID: 29778976 DOI: 10.1016/j.scr.2018.05.001). By contrast, the OCRLKO line was generated using the control line NCRM5 (WT2); this line is derived from a Caucasian male (RRID: CVCL_1E75). Therefore, whenever we have analyzed OCRLKO, we have used NCRM5 as the control; throughout the manuscript, NCRM5 is referred to as WT2.

      However, in deference to the reviewer’s concerns we have performed a few analyses to compare the extent of variability between the two control lines.

      Figure Legend: Replotted [Ca2+]i transients data from LS patient lines, OCRLKO and the two control cell lines WT1 and WT2. (A) There is no statistical difference in the frequency of [Ca2+]i transients between WT1 and WT2. Test used: Mann Whitney test. (B) Plot with WT1 and WT2 data combined versus all three LS lines and OCRLKO combined. Test used: Mann Whitney test. (C) WT1 and WT2 combined plotted against the three individual patient lines and OCRLKO. Test used: One-way ANOVA. (Total neurons analysed: WT1: 808; WT2: 267; LSP2: 150; LSP3: 462; LSP4: 463; OCRLKO: 411)

      (i) We compared the frequency of calcium transients between neurons of age 30 DIV between WT1 and WT2 (Panel A above). We found no significant difference between these.

      Additionally, as suggested, we combined the data from both control lines into a single set and the data from all the LS patient lines and OCRLKO into another (Panel B above). In this analysis the difference between control and OCRL depleted cells remains. Please note the large number of cells studied in each genotype.

      We also combined both control lines into a single control data set and compared it to each patient line and OCRLKO. We find that each patient line and OCRLKO is still significantly different from the control set (panel C above).

      We did not find OCRLKO to be significantly different from LSP2 or LSP4, indicating that the OCRLKO line closely aligns with the patient-derived lines and supporting the idea that the observed phenotype is primarily disease-driven rather than background-dependent. However, we did observe a significant difference between LSP3 and OCRLKO, highlighting some degree of inter-patient variability. The key point is that the disease phenotype remains stable across different backgrounds, reinforcing the idea that the observed differences are driven by OCRL loss rather than background variability. This will be discussed in the revision.
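      For readers who want to reproduce this kind of pooled comparison, a minimal sketch of a two-sided Mann Whitney test is given below. It is an assumption-laden illustration, not the analysis pipeline used for the figures: the function name `mann_whitney_u` is hypothetical, the p-value uses the plain normal approximation (no continuity or tie correction), and the example values are made up rather than taken from the data.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann Whitney U test (normal approximation).

    Hypothetical helper for illustration. Ties between the groups count
    as half a win; the variance is NOT corrected for ties and no
    continuity correction is applied, so p-values are approximate.
    Returns (U, p) where U is the smaller of the two U statistics.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")

    # U for x: number of (xi, yj) pairs with xi > yj, ties counted as 0.5.
    u1 = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    u2 = n1 * n2 - u1
    u = min(u1, u2)

    # Mean and standard deviation of U under the null hypothesis.
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma

    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Made-up transient frequencies; pooling is plain list concatenation.
wt1, wt2 = [5.0, 6.0, 7.0], [5.5, 6.5]
mutant_lines = [1.0, 2.0, 2.5, 3.0, 1.5]
u_stat, p_value = mann_whitney_u(wt1 + wt2, mutant_lines)
```

      With the cell numbers reported in the legend, an established statistics package with exact tie handling would be the safer choice; the sketch only makes the pooling and ranking logic explicit.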

      (ii) In our RTPCR assay for HES5, when WT1 and WT2 are plotted together, there is no significant difference observed (panel A below). Similarly, western blotting data for cNotch (panel C) and DLK1 (panel B) of pooled WT1 and WT2 together on one plot shows no significant difference (Unpaired t-test, Welch’s correction). Overall, based on the above data, WT1 and WT2 are not statistically different.

      Figure legend: Comparison of control lines WT1 and WT2. (A) comparison of HES5 transcripts. (B) Western blot for DLK1 levels. (C) Western blot for cleaved notch protein levels. Statistical test: Unpaired t-test, Welch’s correction.

      Please include more discussion and rationale around the link between the expression pattern of OCRL and the various phenotypes shown. From the RNAseq data performed at the NSC state where the expression of OCRL is lower than in neurons there are considerable differences in cell type distribution between lines. How can this skewed cell type distribution affect downstream differentiation and neuronal function?

      We would like to highlight that we did not perform bulk RNAseq in NSC and neurons; rather, we performed snRNA seq in NSCs (Fig3). The data in Fig.1E is mined from a publicly available resource dataset (Sidhaye et.al., 2023, PMID: 36989136), as mentioned in line 155, which is an integrated proteomics and transcriptomics dataset generated from iPSC-derived human brain organoids at different stages of development in vitro.

      Fig 1D and 1E do indeed show lower levels of OCRL expression in NSC compared to neurons. However, it is important to bear in mind that even though OCRL may be expressed at relatively low levels during the NSC stage, its enzymatic activity could still have a substantial impact. Therefore, even at low expression levels, OCRL could be modulating the PI(4,5)P2 pool in ways that significantly influence cellular functions, especially during early stages of neurodevelopment that alter cell-fate decisions thereby affecting neuronal excitability.

      Our working model posits that loss of OCRL leads to increased levels of PI(4,5)P2, which upregulates the Notch pathway, thereby leading to an increase in its downstream effector HES5. HES5 is a known transcription factor influencing gliogenesis, thus leading to a precocious glial shift in OCRL deficient NSCs as seen in our multiome dataset. This temporal perturbation in differentiation affects maturation of LS/OCRL-KO neurons and/or astrocytes, leading to defective neuronal excitability.

      Also, OCRL is expressed also at the iPSC state as shown in Figure 1I, do you see any phenotypes in iPSC? If not, explain how that could be.

      Yes, OCRL is indeed expressed in iPSCs as shown in Figure 1I. In an earlier paper from our lab that described the generation of these iPSC lines from Lowe syndrome patients (Akhtar et.al 2022 PMID: 35023542), we reported that PIP2 levels are elevated at the iPSC stage as well as the NSC stage in OCRL patient lines. We have not performed a detailed analysis of the iPSC stage for these lines as the focus of our investigation was primarily on the later stages of differentiation, particularly in neural progenitors and differentiated neurons. However, in response to the reviewer’s question on why there are no obvious phenotypes at the iPSC stage, we would suggest that this is due to compensation by other genes of the 5-phosphatase family. In support of this, we would cite our previous study (Akhtar et.al 2022 PMID: 35023542), in which we show that in LS patient derived lines, at the iPSC stage, at least six other 5-phosphatases are upregulated.

      There is not enough data in the manuscript to show mechanistic links between OCRL, DLK1 and Notch so be aware not to overstate the conclusions.

      We appreciate the reviewer’s constructive comment regarding the mechanistic links between OCRL, DLK1, and Notch. Treatment of organoids and neurons with the PIP5K1C inhibitor UNC-3230 rescues the observed phenotypes, suggesting a role for a PIP2 dependent process; the process itself remains to be identified. We will adjust the wording in the manuscript during the revision to ensure that this comes through and the conclusions do not appear overstated.

      Line 173, please describe what mutation in the OCRL these patients have, is it a biallelic deletion? Is the protein totally absent? Please show western blot analyses of the protein in the patient lines.

      The patients from whom these LS lines were generated, the nature of the OCRL allele in them and the status of OCRL protein have all been previously described in detail in a paper from our lab. This paper (Akhtar et.al 2022 PMID: 35023542) has been cited in the present manuscript on the very first occasion that the lines are described (Line 174, references 26 and 27). In addition, in the present manuscript, the protein status of OCRL in all three patient lines is shown with a Western blot in Figure 3C.

      Would be good with a bit of clinical explanation of these patients? Do they have the same level of severity? Are there any differences between their clinical symptoms? This could be interesting to link to differences in cellular phenotypes.

      The clinical details of each patient are described in a preprint from our lab (Pallikonda et.al., 2021 bioRxiv 2021.06.22.449382). The potential reasons for the difference in severity, a very interesting scientific question, are also addressed in this preprint. Experimental analysis to support the proposed likely reasons is currently ongoing in our lab. We feel those analyses are beyond the scope of this manuscript; they will be published later this year as a separate study.

      As described in refs 26 and 27, LSP patients have a mutation in exon 8 leading to a stop codon. We mimicked this by CRISPR based genome editing of WT2 to introduce a stop codon and protein truncation in exon 8, generating OCRLKO. This is also described in supplementary Fig 1 of the present manuscript and the technical details of line generation are fully described in the materials and methods.

      Like the patient lines, OCRLKO is a protein null allele; this is shown by Western blot in Fig 2D. Also, in OCRLKO the PIP2 levels are elevated (Fig 2E), recapitulating what we have described previously (Akhtar et.al 2022 PMID: 35023542). We will explicitly state this detail around line 185.

      Figure 1I, could the protein levels at the different stages be quantified?

      Yes, we can and will do it in the revision

      Figure 3A, there seem to be much more cells in LSP2, making it tricky to compare with the other cell lines. Density during differentiation can affect the cell fate. Please, provide images from the different lines that are comparable with similar density.

      We controlled for cell density by seeding an equal number of cells (50,000 cells/cm2) for all the genotypes, as mentioned in the materials and methods. However, heterogeneity between lines during terminal differentiation is well established, leading to crowding in some genotypes but not in others. Additionally, different growth rates during terminal differentiation also lead to crowded neural cultures as a function of genotype. Therefore, to complement our immunostaining data, we have provided western blot analyses showing increased GFAP protein levels in LS patient lines compared to controls. We will provide images from different lines that are comparable in density during the revision.

      Please provide quantification to the statement that there is fewer number of S100B cells in the LSP lines.

      As we haven’t quantified the number of S100B cells, we will remove that statement.

      Figure 3B, the images show cells very different, and it is tricky to compare similarities and differences, please provide images that look more similar to each other. Avoid images with clusters of cells or make sure to select representative images with clusters from each cell line. If the clustering is a phenotype explain and quantify that. Make sure the density is similar in all pictures.

      We will provide images of matched density during the revision. Also see response to comment above.

      Line 2018, the statement "In the same cultures, there was no change in the staining pattern of the neuronal markers MAP2 and CTIP2 (Fig 3B)" is not strengthened by the figure. Please provide new pictures or data to prove the statement.

      As CTIP2 staining is inherently observed in either clumps or sparsely distributed regions across WT1 and LSP genotypes, we will replace the CTIP2 marker with TBR1, which is also a deep layer cortical marker (layer VI-V), as shown below. Using this additional marker for neurons, we continue to see no change in staining pattern of neuronal markers MAP2 and TBR1. Corresponding images for each genotype are optically zoomed-in images of individual neurons positive for MAP2 and TBR1. Scale bar=50µm, 20µm.

      Figure 3E, please describe all markers in the picture, thus also MAP2, S100B, CTIP2 and draw conclusions. Try to show comparable pictures.

      This will be attended to in the revision.

      Fig 3D and G, what are the replicates? please explain.

      Each point represents a single neural induction done on iPSCs to generate NSCs and then terminally differentiated 30DIV cultures. Experiments were done across 3-6 independent neural inductions. This detail will be included in the revised figure legend.

      Figure 4 A, C, there is a large difference in the ratio of different cell types between the different cell lines, also between the LSP2 and LSP3. This would indicate either that the genetic background affects the phenotype to a large extent or that there is large variability between rounds of differentiation. To understand how much variability comes from the differentiation and culturing, another replicate of WP cell from another donor (WT2) should be included (single nuclei RNAseq). Confirm that three independent rounds of differentiation of the WT1, WT2, LSP2, LSP3, LSP4, and OCRL-KO result in a similar outcome when it comes to cell type distribution. Could be done with qPCR marker.

      For scientific reasons explained in response to the reviewer’s comment #2 we feel it is not necessary to perform replicates of the single nucleus multiome seq. However to allay the reviewer’s concern of variability between differentiations leading to a conclusion of altered cell state we present the following three suggestions for a revised manuscript:

      • We will perform multiple differentiations from iPSC to NSC and test the altered cell state using Q-PCR for transcripts of glial lineage markers.
      • Shown below are western blot analyses for WT1, LSP2, LSP3 and LSP4 NSCs (left). Analyses were done from 4 independent rounds of neural induction and exhibit a significant increase in the levels of an astrocytic fate-determinant marker, NF1A, in LSP NSCs with respect to WT1 (Mann Whitney test used to measure statistical significance). Each point represents a sample from an independent neural differentiation.

      • We would also like to highlight that we have already demonstrated increased GFAP levels in LS patient derived differentiated cultures and OCRLKO. These data, quantified in Fig 3D, were obtained using samples derived from multiple differentiations of iPSC to NSC that were then terminally differentiated. Thus the phenotype of enhanced glial cells in LS derived cultures, most likely a consequence of the increased number of glial precursor cells, is seen across multiple differentiations.

      Line 309, "astrocytic transcripts NF1A and GFAP was elevated" It is unclear from this sentence in which cell lines NF1A and GFAP is elevated? Please explain.

      We acknowledge the incompleteness in the statement. We will add the complete statement explaining the graphs. The levels of astrocytic transcripts NF1A and GFAP were elevated in LSP3 and LSP4 compared to WT1.

      Figure 5C, E, G, there is a large variation of Notch and Hes5 expression between the different

      This comment is incomplete.

      Figure 5H, unclear which of the bands that is DLK1 and how the bands relate to the quantification. The band at 50 kDa seems to be stronger in the WT2 than in the OCRL-KO but in the quantification in Figure 5I, it shows 2x more in the KO. Thus, the other way around.

      The datasheet of the DLK1 antibody used (Abcam ab21682; RRID_AB731965) describes bands seen at 50, 48, 45 and 15kDa. We have quantified the bands at 50kDa and 48-45kDa for all the genotypes. This will be explicitly stated in the revised figure legend.

      Figure 6, please show that the inhibitor is inhibiting PIP5KC.

      Have you titered the added concentration of the inhibitor?

      Figure legend: Fields of view from WT1 derived NSC expressing the plasma membrane PIP2 reporter. Plasma membrane distribution of the probe indicating PIP2 levels is shown in (A) untreated cells (B) treatment with 10mM and (C) 50mM UNC-3230 PIP5K1C inhibitor. Scale bar=50µm (D) Quantification of plasma membrane PIP2 levels using this reporter. Y-axis shows probe levels at PM; X-axis shows treatment conditions.

      Yes, we used previously generated WT1 NSCs expressing the plasma membrane PH-PLC::mCherry reporter (Akhtar et.al., 2021) and carried out a dose-response experiment using 10mM and 50mM of the UNC-3230 PIP5K1C inhibitor, as shown above. We quantified the intensity of PI(4,5)P2::mCherry at the plasma membrane and plotted the mean intensity. We observed a significant decrease in plasma membrane PI(4,5)P2 levels at 50mM (statistical test used: Mann Whitney test) but not at 10mM, and therefore we selected the 50mM concentration for our experiments.

      Figure 6B, why do the calcium data for the WT2+1Ci look so different to the others? The dots are much more spread and there seem to be fewer replicates than for the other samples, please explain.

      We had only analysed a few replicates for WT2+1Ci genotype. We analysed the remaining replicates and have updated the data as shown below. The revised data set resolves the reviewer’s concern. The revised data set will be included in the revision.

      Figure 6F, there is no significant differences between the bars but the statement in the text (sentence starts on line 332) indicate it is, please update the figure or remove the statement.

      We added more replicates (the total is now 7-10 biological replicates, each with 15-20 organoids), and the updated figure (panel B) is shown below. The differences between treated and untreated OCRLKO are significant, whereas there is no significant difference between treated and untreated wild type (statistical test: Mann Whitney test).

      Revised figure will be included in the revision

      Figure 6G, the HES5 expression seem to behave very similar in both WT2 and OCRL-KO cells when the inhibitor is used. What does this mean? Seems to not be linked to OCRL. Explain.

      Thank you for your comment. In our initial experiment (shown in the original version of the manuscript), we observed a reduction in HES5 expression upon inhibitor treatment in both WT2 and OCRL-KO cells. However, to ensure robustness of our findings, we repeated the experiment across multiple additional independent organoid differentiation batches. In the repeated experiment, we no longer observe the previous trend. Instead, we see no significant change in WT2 on inhibitor treatment, while OCRLKO cells show a reduction in HES5 expression upon inhibitor treatment (Panel A). Similarly, the protein levels of cNotch and DLK1 are not different between WT2 and WT2+1Ci (panel B and C). This strongly suggests that loss of OCRL, by elevating PIP2 levels, perturbs the Notch pathway, resulting in higher cNotch and thereby increased expression of the effector HES5. The new data set will be included in the revision.

      Minor comments

      The panels in Figure 6 are not completely referred to correctly in the text, please check. Double check that all figure panels are referred to properly in the text

      Yes, we will correct it in the revised manuscript.

      Reviewer #1 (Significance (Required)): The manuscript is an interesting addition to the in vitro iPSC derived cellular modelling of neurodevelopmental disorder.

      Strengths: The use of both patient iPSC lines and CRISPR edited lines. The use of both monolayer and 3D cultures.

      We thank the reviewer for their detailed critique; addressing it has helped improve the manuscript. We also thank the reviewer for appreciating the strengths of the manuscript.

      Weaknesses: The significance decreases a bit due to too few replicates (only 1 WT line in each experiment) and the variability between the patients' cell lines.

      We thank the reviewer for this comment. As explained above, we have added substantially more data and revised the analysis, which should remove this concern.

      Reviewer 2:

      This paper describes the effects of loss of OCRL (the Lowe syndrome protein) upon the function and differentiation of neurones, using an in vitro iPSC model system. Cells derived from three related Lowe syndrome patients and an OCRL knockout, generated using CRISPR, were used for these experiments. The results show that upon loss of OCRL, differentiation of stem cells into neurones is reduced, with an increased number of cells adopting glial and astrocytic fates. The neurones that are generated have reduced calcium transients and electrical activity. Gene expression data combined with biochemical analysis indicate altered Notch activity, which may account for the altered cell fate data seen in the in vitro differentiation model. Finally, rescue of cell fate and neuronal activity is seen upon knockdown of a PIP5K, which indicates that these phenotypes are due to the elevated PIP2 levels seen in the OCRL-deficient cells.

      The results provide new insights into the pathogenesis of Lowe syndrome. I found the paper to be well done, and the data supports the conclusions of the authors. I have a few comments below that may improve the manuscript:

      We thank the reviewer for summarizing the comprehensive nature of our study and appreciating the value of our study in providing new insights into the pathogenesis of Lowe syndrome with respect to the brain. Thank you for appreciating that our study is well done, and that the data supports the conclusions of the authors.

      Major points

      1. The UMAP and ATAC-Seq data indicate different maps for the two different Lowe syndrome patient-derived cells (Fig 4 and Fig S3). This suggests that the cells are quite different, and therefore that changes seen in one Lowe syndrome patient may not be applicable to the others. I think this heterogeneity has important implications for the paper i.e. how general are findings obtained? Several different glioblast types are described (numbered 1-5)- how different or similar are these?

      We are unclear what the reviewer means by “the UMAP and ATAC seq data indicate different maps…”.

      UMAP is a technique for visually representing data generated by single cell analysis methods be it RNAseq or ATAC seq. Perhaps what the reviewer means is that the UMAP generated from RNA seq and ATAC seq data looks different from each other.

      We would like to reiterate that the UMAP generated from single cell RNA seq data is based on the complement of transcripts in each cell of the analysis compared to an existing single cell RNAseq data set, whereas the UMAP generated from ATACseq data is based on regions of open chromatin detected in and around genes, which presumably also reflect ongoing gene expression. In principle, the two analyses for any set of cells should indicate overall clustering into similar groups on UMAPs generated using both data sets, if the ATACseq based read out of transcription largely matches the RNAseq based read out of differences in transcription. However, it may not be reasonable to expect them to be identical. This is indeed what we see for our data set, and this is represented in Fig 4E. The cell clusters detected based on GEX (gene expression, i.e. single cell RNA seq) analysis are plotted against the cell clusters detected from ATACseq data using a confusion matrix. As can be seen from this panel (Fig 4E), a very large fraction of cells falls on the diagonal, indicating a large degree of similarity between the clusters detected by both methods of analysis (GEX and ATACseq). This statement can be made more strongly in the revision.
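      To make the confusion-matrix comparison concrete, a minimal sketch is given below. It is illustrative only and not the code used to produce Fig 4E: the function names, the cluster labels and the assumption that clusters from the two modalities carry matching names are all hypothetical.

```python
from collections import Counter


def confusion_matrix(gex_labels, atac_labels):
    """Cross-tabulate per-cell cluster labels from two modalities.

    Hypothetical helper: gex_labels[i] and atac_labels[i] are the cluster
    assignments of the same cell from GEX and ATAC data. Returns the row
    labels (GEX), column labels (ATAC) and a matrix of cell counts.
    """
    if len(gex_labels) != len(atac_labels):
        raise ValueError("both label lists must describe the same cells")
    counts = Counter(zip(gex_labels, atac_labels))
    rows = sorted(set(gex_labels))
    cols = sorted(set(atac_labels))
    matrix = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, matrix


def diagonal_fraction(gex_labels, atac_labels):
    """Fraction of cells assigned the same cluster name by both modalities.

    Assumes cluster names are directly comparable between modalities.
    """
    if len(gex_labels) != len(atac_labels) or not gex_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    agree = sum(g == a for g, a in zip(gex_labels, atac_labels))
    return agree / len(gex_labels)


# Made-up labels for four cells.
gex = ["NSC", "NSC", "glia", "glia"]
atac = ["NSC", "glia", "glia", "glia"]
rows, cols, matrix = confusion_matrix(gex, atac)
agreement = diagonal_fraction(gex, atac)
```

      A diagonal fraction close to 1 corresponds to the situation described above, where most cells fall on the diagonal of the confusion matrix.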

      The PIP5K inhibitor seems to have a very strong effect on both WT and KO cells in terms of Notch activity (Fig 5G). This strongly suggests the effects of this inhibitor are not through OCRL and that changes in PIP2 induced by the inhibitor override those of OCRL. Thus, the experiments shown in Fig 5 seem not to be due to a rescue of OCRL activity as such.

      We think the reviewer means Fig 6G, and our response is as follows:

      In our initial experiment (shown in the current version of the manuscript), we observed a reduction in HES5 expression upon inhibitor treatment in both WT2 and OCRLKO cells. However, to ensure robustness of our findings, we repeated the experiment across multiple additional independent organoid differentiation batches. In the repeated experiment, we no longer observe the previous trend. Instead, we see no significant change in WT2 on inhibitor treatment, while OCRLKO cells show a reduction in HES5 expression upon inhibitor treatment (Panel A). Similarly, the protein levels of cNotch and DLK1 are not different between WT2 and WT2+1Ci (panel B and C). This strongly suggests that loss of OCRL, by elevating PIP2 levels, perturbs the Notch pathway, resulting in higher cNotch and thereby increased expression of the effector HES5. The figures updated with the new data will be included in the revision.

      Minor points

      1. The main text needs to say what synapsin is and why it was analysed. In Fig 1I, synapsin abundance declines at 90 days. This appears quite strange. The authors should comment on it in the text.

      We will add a line about the use of synapsin in the western blot. Synapsin is only used qualitatively to highlight mature neuronal culture age, as was done in Sidhaye et.al PMID: 36989136.

      In the revised main text, we will add the following explanation: "We also analyzed the expression of synapsin-1, a synaptic vesicle protein that serves as a marker for mature synapses and functional neuronal networks. The presence of synapsin-1 indicates the development of synaptic connections in our cultures, providing evidence of neuronal maturation."

      The decline and thereby variability in synapsin-1 protein levels has been reported before. Regarding the decline in synapsin-1 at 90 days, we can add the following discussion:

      "We observed a decline in synapsin-1 levels at 90 days in vitro (DIV) compared to earlier time points. This pattern has been previously reported in iPSC-derived neuronal models (Togo et.al PMID: 34629097 and Nazir et.al PMID: 30342961). Such variability in synapsin-1 expression over extended culture periods may reflect the dynamic nature of synaptic remodeling and maturation processes in vitro. It's important to note that synapsin-1 levels can fluctuate due to various factors, including culture conditions and the heterogeneity of neuronal populations present at different time points."

      In Fig 2A and 3B there are clumps of green cells (CTIP2 positive). I am concerned that the lack of uniformity in the cell distribution could impact other analysis performed, where certain fields of view have been analysed e.g. by imaging or electrophysiology e.g. calcium measurements.

      To address the reviewer’s concern about uniformity, in the revised manuscript we will provide or replace the representative images of deep layer markers along with MAP2 from all genotypes, showing the areas selected for analysis, to demonstrate that data collection was performed in comparable regions across all experimental conditions. See also our response to reviewer 1, comment 11.

      The clumps of neurons (as seen in Fig2A) pose challenges for obtaining high-quality seals during patch-clamp recordings. We therefore primarily selected areas with sparsely distributed neurons for electrophysiology experiments, which ensured robust recordings. We can add a clarification to the Methods section stating explicitly that neurons used for all patch-clamp recordings were chosen from regions where cells were sparsely distributed.

      In case of calcium imaging experiments, we focused on both crowded and sparse fields of views across genotypes to avoid potential biases introduced by clumped cells. However, it is to be noted that during the stages of terminal differentiation there are NSCs undergoing proliferation, which makes the neuronal culture denser. We can provide video files as a supplementary material to demonstrate the types of areas used for calcium imaging experiments. Additionally, we will include a statement in the Methods section specifying that regions with uniform neuronal distribution were selected for calcium imaging to ensure consistency in our analysis.

      In Fig 2J and 2K are the differences between samples significant? The error bars are huge.

      In lines 204-209, we did not use the phrase “significantly different”. We acknowledge that the error bars in Figures 2J and 2K are indeed large, which is not uncommon in electrophysiological recordings from iPSC-derived neurons due to their inherent variability. We have intentionally refrained from claiming statistical significance for these specific comparisons. Instead, we describe the data as showing a pattern or trend of reduced currents in OCRLKO neurons compared to WT2. To improve clarity, we propose to add a statement in the Results section acknowledging the variability in these measurements and explaining our interpretation of the data as a trend rather than a statistically significant difference.

      In Fig S4- it would be good to show gene expression analysis and GFAP staining

      We are not completely sure what this comment means. However, the present figure shows double staining with GFAP and S100beta. These will be split and shown separately to enhance clarity.

      Fig 5A needs more annotation- fold change comparing what to what?

      We will add the annotation “fold change with respect to WT1”.

      There should be more information provided in the main text relating to DLK1. For example, it is shown to be secreted, but no information is provided on whether this is expected. Secreted? The DLK1 blot in Fig 5F is not convincing.

      We will add more information relating to DLK1 and secretion status.

      DLK1 is a non-canonical Notch ligand that is indeed known to be secreted by neighboring cells to either activate or inhibit the Notch pathway. We acknowledge that the blot could have been better; variability in the blot could arise from differences in secretion efficiency or in protein stability in the cell culture media, which could have led to inconsistencies across LSP genotypes. However, as shown in the blot, OCRLKO shows a clear enrichment of secreted DLK1 compared to WT2.

      We have performed the western blot analyses across two independent differentiations of organoids from WT1, LSP2, LSP3, LSP4, WT2 and OCRL-KO iPSCs in phenol-free neurobasal-A medium, and quantified secreted protein. We then loaded 40mg of protein per genotype. Shown below is the quantification. The mean intensity of the DLK1 band shows a moderate increase in LSP2 and a substantial increase in LSP3 and LSP4 organoids compared to WT1, while OCRL-KO shows a substantial increase compared to its control, WT2. A revised figure will be used in the revision.

      Rationale for choosing PIP5K1C

      PIP5K1C is one of the major regulators maintaining appropriate levels of the synaptic pool of PI(4,5)P2, synaptic transmission and synaptic vesicle trafficking (Hara et al., 2013 PMID: 23802628; Morleo et al., 2023 PMID: 37451268; Wenk et al., 2001 PMID: 11604140). Because we were interested in rescuing the physiological phenotype, we chose PIP5K1C. Additionally, in initial experiments we found that inhibiting PIP5K1B using ISA-2011B killed the organoids or led to detachment of 2D neuronal cultures.

      Fig 6D is confusing. I suspect the figure labelling is not correct- it does not correlate with the graphs.

      We apologise for the error and will correct this.

      Reviewer #2 (Significance (Required)):

      This paper is significant because it provides important new information on the neurological features of Lowe syndrome. The approach is novel in terms of studying this condition. The findings are likely to be of interest to clinicians, cell biologists, neurobiologists and those studying human development. My expertise is in membrane traffic and OCRL/Lowe syndrome. I am not a neurobiologist.

      We thank the reviewer for appreciating the importance of our study, the novelty of our findings and the newness of the approach we have used. We would like to highlight that while extensive work has been done with respect to the renal phenotype of Lowe syndrome, the brain phenotypes have remained largely a black box. This is in part because mouse knockouts of OCRL have failed to recapitulate the brain-related clinical phenotypes displayed by Lowe syndrome patients (e.g. PMID: 30590522; PMCID: PMC6548226; DOI: 10.1093/hmg/ddy449). Our study of brain development defects in Lowe syndrome depleted cells provides the first insight into the cellular and developmental changes in this disorder.

      Reviewer 3:

      This paper by Sharma et al describes findings in an iPSC model of Lowe Syndrome. This is an important line of research because no mouse models phenocopy the neurodevelopmental aspects of the condition. They identified a potential role of Notch signaling in pathogenesis, a potentially druggable target. However, several issues need to be addressed.

      We thank the reviewer for appreciating the importance of our study in covering the basis of the neurodevelopmental phenotype of Lowe syndrome. Due to a lack of a mouse model, there was previously no understanding of how the clinical features related to the brain arise.

      Major issues

      1. The sample size is very small, which is understandable to some extent given the expense and difficulty doing research using iPSCs. However, there are a couple of opportunities to improve the sample size. For example, in the analysis of DLK1 and other proteins shown in Figure 5, the analysis amounts to a single control vs the 3 patient lines, and a single control vs the KO line. The separation is justified because a complete KO of the gene might result in differences compared to hypomorphic mutation that apparently affects the 3 cases. However, there is no reason why WT1 and WT2 shouldn't be combined. They are both random controls. This might not affect the results of the other proteins analyzed, NOTCH and HES5, but the significance of DLK1 could change.

      Nature of the allele in LS patient lines

      There is a misconception in the reviewer's comment that the OCRL allele in the three Lowe syndrome lines is a hypomorph. This is not correct. In the patients from whom these LS lines were generated, the nature of the OCRL allele and the status of OCRL protein in cells have been previously described in detail in a peer-reviewed, published paper from our lab. This paper (Akhtar et al. 2022 PMID: 35023542) has been cited in the present manuscript at the very first occasion that the LS patient lines are described (Line 174, references 26 and 27). As described in refs 26 and 27, LSP patients have a mutation in exon 8 leading to a stop codon. This results in a protein null allele of OCRL in all three patient lines. This has been shown in Fig 1B of Akhtar et al. 2022 by immunofluorescence using an OCRL-specific antibody (PMID: 35023542). It has also been demonstrated by Western blot using an OCRL-specific antibody for all three LS patient lines in Fig 3C and 5C of the present manuscript. The nature of the allele will be highlighted more clearly in the revision.

      Combining WT1 and WT2

      We are not in favour of combining WT1 and WT2. The reason for this is as follows.

      It is well recognized and discussed that genetic background can be a key factor contributing to phenotypes observed in cells differentiated from iPSC (Anderson et al., 2021, PMID: 33861989; Brunner et al., 2023, PMID: 36385170; Hockemeyer and Jaenisch, 2016, PMID: 27152442; Soldner and Jaenisch, 2012, PMID: 30340033; Volpato and Webber, 2020, PMID: 31953356). As a result, it is recommended that a line closely matched for genetic background be used when assessing the validity of observed phenotypes. The patient lines used in this study for Lowe syndrome were all derived from a family in India of Indian ethnic origin. Therefore, in order to reduce the impact of genetic background contributing to potential phenotypes, we have used a control line (referred to in this manuscript as WT1) derived from an individual of Indian ethnic background; this line has previously been developed and published by our group (PMID: 29778976; DOI: 10.1016/j.scr.2018.05.001).

      In the case of OCRLKO we have genome edited NCRM5 (a white Caucasian male control line) to introduce a stop codon in exon 8 to mimic the truncation seen in our LS patient lines. This allele is also protein null as shown by Western blot using an OCRL specific antibody. The data is shown in Fig 2D of the present manuscript. Therefore, we reiterate that all the LS patient lines in this study and OCRLKO are protein null alleles.

      Status of DLK1 levels

      We have performed a combined analysis of DLK1 levels in the two control lines and all the patient lines as well as OCRLKO. As shown below the result remains unchanged, namely that DLK1 levels are elevated in OCRL depleted cells in this model system.

      Figure legend: Quantification of DLK1 protein levels in control, LS patient and OCRLKO iPSC lines. Western blot intensities for each patient line and OCRLKO were normalized to GAPDH and then to the respective internal WT control (WT1 or WT2) resulting in fold-change values. For statistical analysis across genotypes, normalized fold-change values from different gels were pooled post hoc. All statistical testing was performed on fold-change values. Statistical test used: Mann Whitney test. (A) Values for WT1 and WT2 have been combined and plotted against individual values for three patient lines and OCRLKO (B) Values for WT1 and WT2 have been combined and plotted against combined values for all three LSP lines and OCRLKO.
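
      The two-step normalization described in this legend can be sketched as follows. This is only an illustration under stated assumptions: the function names and the band intensities are hypothetical and are not taken from the study, and the second function returns only the Mann-Whitney U statistic (no p-value).

```python
def fold_change(target, gapdh, wt_target, wt_gapdh):
    """Normalize a band to GAPDH, then to the WT control run on the same gel."""
    sample_ratio = target / gapdh          # step 1: loading control
    control_ratio = wt_target / wt_gapdh   # same step for the internal WT lane
    return sample_ratio / control_ratio    # step 2: fold change relative to WT


def mann_whitney_u(xs, ys):
    """Mann-Whitney U statistic for xs versus ys (ties count as 0.5)."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


# Hypothetical intensities (arbitrary units) from two gels; NOT study data.
wt = [fold_change(100.0, 50.0, 100.0, 50.0), fold_change(90.0, 45.0, 90.0, 45.0)]
ko = [fold_change(300.0, 50.0, 100.0, 50.0), fold_change(240.0, 60.0, 90.0, 45.0)]

# Fold-change values from different gels are pooled before testing.
u_statistic = mann_whitney_u(ko, wt)
```

      Note that, by construction, each internal WT control has a fold change of 1 on its own gel, so pooled control values carry no between-gel variance.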

      Reviewer comment: DLK1 expression brings up another point. This, along with MEG3 and MEG8 are imprinted genes, two of the top differentially expressed genes in this study. However, these findings can be questioned by the well-known phenomenon that the expression of some imprinted genes may not be properly maintained during iPSC reprogramming. Thus, the differential expression of these imprinted genes might be due to a reprogramming artifact rather than the effects of OCRL per se. Analyzing both controls together could mitigate this objection. However, even if it does, the potential dysregulation of imprinted genes in the development of iPSCs should be acknowledged and addressed.

      We are aware that the DLK1 locus is imprinted. However, we feel that reprogramming artifacts are very unlikely to explain the observed changes in DLK1 levels.

      It is important to note that the patient lines and WT1 were not directly re-programmed from white blood cells to iPSC and then used for differentiation and analysis. As detailed in our previous peer-reviewed publications, WT1 (PMID: 29778976) and the patient LSP lines (PMID: 35023542) were first converted to lymphoblastoid cell lines and subsequently reprogrammed into iPSC.

      We think that re-programming induced imprinting changes are unlikely to be responsible for the elevated levels of DLK1 seen in LS patient lines. The reason is as follows:

      We compared DLK1 levels in WT2 and OCRLKO, a CRISPR-edited line that carries a stop codon in exon 8. NCRM-5/WT2 was derived from CD34+ cord blood cells. We found that levels of DLK1 are elevated in OCRLKO compared to WT2. Since OCRLKO was generated by genome editing WT2, the level of imprinting of the DLK-DIO3 locus must be comparable, if not identical, between the two lines. Therefore, the difference in DLK1 levels between WT2 and OCRLKO cannot be a consequence of different imprinting status of the DLK1 locus between these two lines. Rather, it strongly suggests a causal link to OCRL deficiency. Following on from this, the DLK1 levels are elevated in patient lines compared to the OCRLKO. We will highlight and discuss this in the revised version.

      Similarly, in the calcium signaling experiment shown in fig.2, the KO and patient lines are justifiably separated. However, again, why not combine both controls in the comparison with the patient samples?

      The data has been reanalyzed and presented as requested by the reviewer. There is no change in the conclusion.

      For the reasons described above, it remains our preference to present each set of lines with the appropriate control, i.e. WT1 with the three LS patient lines and WT2 with OCRLKO. However, as the reviewer has asked for it, we also present below an analysis in which WT1 and WT2 are combined and the LS patient lines and OCRLKO are combined. The replotted data are shown below. The essential conclusion shown in the main manuscript remains, namely that the frequency of [Ca2+]i transients in LS depleted developing neurons is lower than in wild type.

      Figure Legend: Replotted [Ca2+]i transients from LS patient lines, OCRLKO and two control cell lines WT1 and WT2 (A) There is no statistical difference in the frequency of [Ca2+]i transients between WT 1 and WT2. Test used-Mann Whitney test. (B) Plot with WT1 and WT2 data combined v all three LS lines and OCRLKO combined. Test used-Mann Whitney test. (C) WT1 and WT2 combined plotted against three individual patient lines and OCRLKO. Statistical test used One-way ANOVA. (total neurons analyzed: WT1:808; WT2:267; LSP2:150; LSP3:462; LSP4:463; OCRLKO:411)

      Regarding the hypomorphic nature of the patient-specific iPSC, I do not see the OCRL variant that was found in the family. Please correct me if I missed that, and if it was omitted, it should be included. I suspect that the variant generates a hypomorphic OCRL protein because the authors show expression in Figure 1D. Hypomorphic OCRL mutations compared with complete KO could show differences in molecular phenotypes, as found in Barnes et al. (PMID: 30147856) in an analysis of F-actin and WAVE-1 expression.

      Nature of the allele in LS patient lines

      There is a misconception in the reviewer’s comment that the OCRL allele in the three Lowe syndrome lines is a hypomorph. This is incorrect. In the patients from whom these LS lines were generated, the nature of the OCRL allele and the status of OCRL protein have previously been described in detail in a peer-reviewed, published paper from our lab. This paper (Akhtar et al. 2022 PMID: 35023542) has been cited in the present manuscript at the very first occasion that the LS patient lines are described (Line 174, references 26 and 27). As described in refs 26 and 27, LSP patients have a mutation in exon 8 leading to a stop codon. This results in a protein null allele of OCRL in all three patient lines. This has been shown in Fig 1B of Akhtar et al. 2022 by immunofluorescence using an OCRL-specific antibody (PMID: 35023542). It has also been demonstrated by Western blot using an OCRL-specific antibody for all three LS patient lines in Fig 3C and 5C of the present manuscript.

      The data presented in Fig. 1D, E are taken from a publicly available resource (PMID: 36989136), as mentioned in line 155: an integrated proteomics and transcriptomics dataset generated from control iPSC-derived human brain organoids at different stages of development in vitro.

      Minor issue

      The authors use the term mental retardation on line 102 to describe the cognitive phenotype in Lowe Syndrome. Medical, legal, and advocacy groups have abandoned this term because it is viewed as offensive. It is being replaced by intellectual disability, although this term also is problematic. In any event, many conferences on autism and intellectual disabilities are attended by families, and high-functioning cases sometimes address an audience of scientists. They would object to the use of this term if presented in a talk by one of the co-authors.

      Thank you. We will rephrase this line.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Not applicable at this stage. The above is a revision plan.

      4. Description of analyses that authors prefer not to carry out

      We prefer to not carry out replicates of the single cell multiome analysis. As explained above the state of the art in the single cell analysis field is to not do so. The scientific reasons as to why such replicates are not required have been explained in the response to the reviewer comment.

    1. Author response:

      Reviewer 1 (Public Review):

      “Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented.

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.”

      Thank you for your thorough evaluation of our study. We aimed to investigate local homing behaviour on a small scale, which is ecologically relevant given that the entrance of bumblebee nests is often inconspicuously hidden within the vegetation. This requires bees to locate their nest entrance using views within a confined area. While many studies have focused on larger scales using radar tracking (e.g. Capaldi et al. 2000; Osborne et al. 2013; Woodgate et al. 2016), there is limited understanding of the mechanisms behind local homing on a smaller scale, especially in dense environments.

      We appreciate your suggestion to include the study by Murray and Zeil (2017) in our discussion. Their research explored the catchment areas of image difference functions on a larger spatial scale with a cubic volume of 5m x 5m x 5m. Aligned with their results, we found that image difference functions pointed towards the location of the objects surrounding the nest when the images were taken above the objects. However, within the clutter, i.e. the dense set of objects surrounding the nest, the model did not perform well in pinpointing the nest position.

      We agree with your comment about the term "clutter". Therefore, we will refer to our landmark arrangement as a "dense environment" instead. Uniformly distributed objects do indeed occur in nature, as seen in grasslands, flower meadows, or forests populated with similar plants.

      Reviewer 2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions:

      line 51: "Snapshot models perform best with bird's eye views"; line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it."; line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:

      Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      When we started modelling the bees’ homing based on image-matching, we included the arena wall. However, the model simulations pointed only coarsely towards the clutter but not toward the nest position. We hypothesised that the arena wall and object location created ambiguity. Doussot et al. (2020) showed that such a model can yield two different homing locations when distant and local cues are independently moved. Therefore, we reduced the complexity of the environment by concentrating on the visual features, which were moved between training and testing. (Neither the camera nor the wall were moved between training and test). We acknowledge that this information should have been provided to substantiate our reasoning. As such, we will include model results with the arena wall in the revised paper.

      As we wanted to investigate whether bees would use ground views or bird’s-eye views to home in a dense environment, we think that catchment volumes would provide qualitatively similar, though quantitatively more detailed, information compared with catchment slices. Our approach of catchment slices is sufficient to predict whether ground or bird's-eye views perform better in leading to the nest, and we will, therefore, not include further computations of catchment volumes.

      Behavioural analysis:

      The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17.

      Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      A prerequisite for studying the learning flight in a given environment is showing that the bees manage to return to their home. Here, our primary goal was to demonstrate this within a dense environment. While we understand that a detailed analysis of the learning and return flights would be valuable, we feel this is outside the scope of this particular study.

      Multi-snapshot models have been repeatedly shown to be sufficient to explain the homing behaviour in natural as well as artificial environments. A model can not only be used to replicate but also to predict a given outcome and shape the design of experiments. Here, we used the models to shape the experimental design, as it does not require the entire history of the bee's trajectory to be tested and provides interesting insight into homing in diverse environments.

      Our current knowledge of learning flights did not permit these investigations of bee training. Firstly, our setup does not allow us to record each inbound and outbound flight of the bumblebees during training. Doing so would require blocking the entire colony for extended time periods, potentially impairing the motivation of the bees to forage or the survival and development of the colony. Secondly, it is not always clear exactly where bees learn, or whether they learn continuously by weighting their visual experience based on their positions and orientations. This makes it difficult to categorise these flights accurately into learning and return flights. Additionally, homing models remain elusive on the learning mechanisms at play during the learning flights. Therefore, we believe that continuous effort must be made to understand bees' learning and homing ability. We felt it was necessary first to establish that bees could navigate back to the nest in a dense, cluttered environment. With this understanding, we are currently conducting a detailed study of the bees' learning flights in various dense environments and will provide these results in a separate article.

      While we acknowledge that the bees had ample opportunities to learn the location of the nest entrance, we believe that their behaviour of entering the dense environment at a very low altitude cannot be solely explained by extended experience. It is possible that the bees could have also learned to enter at the edge of the objects or above the objects before descending within the clutter.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

      We respectfully disagree with the evaluation that our study does not provide new insights due to the controlled lab conditions. Both field and lab research are absolutely necessary and should feed each other. Dismissing the value of controlled lab experiments would overlook the contributions of previous lab-based research, which has significantly advanced our understanding of animal behaviour. It is only possible to precisely define the visual test environments under laboratory conditions and to identify the role of these components for the behaviour through targeted variation of individual components of the environment. These results should guide field-based experiments for validation.

      Our lab settings are a kind of abstraction of natural situations focusing on those aspects that are at the centre of the research question. Our approach here was that bumblebees have to find their inconspicuous nest hole in nature, which is difficult to find in often highly dense environments, and ultimately on a spatial scale in the metre range. We first wanted to find out if bumblebees can find their nest hole under the particularly challenging condition that all objects surrounding the nest hole are the same. This was not yet clear. Uniformly distributed objects may, however, also occur in nature, as seen with visually inconspicuous nest entrances of bumblebees in grass meadows, flower meadows, or forests with similar plants. We agree that the term "clutter" is not well-defined in the literature and will refer to our environment as a "dense environment."

      Despite the lack of a distant visual panorama, or also UV light, wind, or other confounding factors inherent to field work, the bees successfully located the nest position even when we shifted the dense environment within the flight arena. We used rotational-image difference functions based on snapshots taken around the nest position to predict the bees' behaviour, as this is one of the most widely accepted and computationally most parsimonious mechanisms for homing. This approach also proved effective in our more restricted conditions, where the bees still managed to pinpoint their home.
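
      For readers unfamiliar with rotational image difference functions, the general idea can be sketched as below. This is a minimal illustration under stated assumptions, not the model code used in the study: panoramic views are represented as small grids of brightness values (rows are elevation, columns are azimuth), and all function names are hypothetical.

```python
import math


def rotate(view, shift):
    """Rotate a panoramic view in azimuth by circularly shifting its columns."""
    return [row[shift:] + row[:shift] for row in view]


def rms_difference(view_a, view_b):
    """Root-mean-square pixel difference between two equally sized views."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(view_a, view_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += (pixel_a - pixel_b) ** 2
            count += 1
    return math.sqrt(total / count)


def rotational_idf(snapshot, current):
    """Image difference for every azimuthal rotation of the current view."""
    n_columns = len(current[0])
    return [rms_difference(snapshot, rotate(current, s)) for s in range(n_columns)]


def best_shift(snapshot, current):
    """Column shift that best aligns the current view with the snapshot."""
    ridf = rotational_idf(snapshot, current)
    return min(range(len(ridf)), key=ridf.__getitem__)


# Toy example: the current view is the memorised snapshot rotated by one column.
snapshot = [[0, 1, 2, 3], [4, 5, 6, 7]]
current = rotate(snapshot, 1)
heading_shift = best_shift(snapshot, current)
```

      The rotation with the smallest difference gives the heading estimate, and the depth of that minimum indicates how well the current view matches the memorised one. In a visually repetitive environment several rotations or positions can give similarly low values, which is the ambiguity discussed in the reviews.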

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public Review):

      Summary:

      In this paper, the authors aimed to test the ability of bumblebees to use bird-view and ground-view for homing in cluttered landscapes. Using modelling and behavioural experiments, the authors showed that bumblebees rely most on ground-views for homing.

      Strengths:

      The behavioural experiments are well-designed, and the statistical analyses are appropriate for the data presented.

      Weaknesses:

      Views of animals are from a rather small catchment area.

      Missing a discussion on why image difference functions were sufficient to explain homing in wasps (Murray and Zeil 2017).

      The artificial habitat is not really 'cluttered' since landmarks are quite uniform, making it difficult to infer ecological relevance.

      Thank you for your thorough evaluation of our study. We aimed to investigate local homing behaviour on a small spatial scale, which is ecologically relevant given that the entrance of bumblebee nests is often inconspicuously hidden within the vegetation. This requires bees to locate their nest hole within a confined area. While many studies have focused on larger spatial scales using radar tracking (e.g. Capaldi et al. 2000; Osborne et al. 2013; Woodgate et al. 2016), there is limited understanding of the mechanisms behind local homing, especially in dense environments as we propose here.

      We appreciate your suggestion to include the study by Murray and Zeil (2017) in our discussion. Their research explored the catchment areas of image difference functions on a larger spatial scale with a cubic volume of 5m x 5m x 5m. Aligned with their results, we found that image difference functions pointed towards the location of the objects surrounding the nest when the images were taken above the objects. However, within the clutter, i.e. the dense set of objects surrounding the nest, the model did not perform well in pinpointing the nest position.

      See the new discussion at lines 192-197

      We agree with your comment about the term "clutter". Therefore, we referred to our landmark arrangement as a "dense environment" instead. Uniformly distributed objects do indeed occur in nature, as seen in grasslands, flower meadows, or forests populated with similar plants.

      See line 20 and we changed the wording throughout the manuscript and figures.

      Reviewer 1 (Recommendations): 

      The manuscript is well written, nicely designed experiments and well illustrated. I have a few comments below.

      It would be useful to discuss known data of learning flights in bumblebees, and the height or catchment area of their flights. This will allow the reader to compare your exp design to the natural learning flights.

      In our study, we first focused on demonstrating the ability to solve a homing task in a dense environment. As we observed the bees returning within the dense environment and not from above it (contrary to the model predictions), we investigated whether they flew above it during their first flights. The bees did indeed fly above, demonstrating their ability to ascend and descend within the constellation of objects (see Supplementary Material Fig. 22).

      In nature, the learning flight of bumblebees may cover several decametres, with the loops performed during these flights increasing with flight time (e.g. Osborne et al. 2013; Woodgate et al. 2016). A similar pattern can be observed on a smaller spatial scale (e.g. Philippides et al. 2013). Similar to the loops that extend over time, the bees gradually gain altitude (Lobecke et al., 2018). However, these observations come from studies where few conspicuous objects surround the nest entrance.

      Although our study focussed on goal-finding performance in cluttered environments, we now also address learning flights in the discussion, as learning flights are the scaffolding of visual learning. We have already conducted several learning flight experiments to fill the knowledge gap mentioned above. These will allow us, in a forthcoming paper, to compare learning flights in this environment with the existing literature (Sonntag et al., 2024).

      We added a reference to this in the discussion (lines 218-219 and 269-272)

      Include bumblebee in the title rather than 'bees'.

      We adapted the title accordingly:

      “Switching perspective: Comparing ground-level and bird’s-eye views for bumblebees navigating dense environments”

      I found switching between bird-views and frog-views to explain bee-views slightly tricky to read. Why not use 'ground-views', which you already have in the title?

      We agree and adapted the wording in the manuscript according to this suggestion.

      I am not convinced there is evidence here to suggest the bees do not use view-based navigation, because of the following: In L66: unclear what were the views centred around, I assume it is the nest. Is 45cm above the ground the typical height gained by bumblebees during learning flight? The clutter seems to be used more as an obstacle that they are detouring to reach the goal, isn't it?

      Based on many previous studies, view-based navigation can be assumed to be one of the plausible mechanisms bees use for homing (Cartwright & Collett, 1987; Doussot et al., 2020; Lehrer & Collett, 1994; Philippides et al., 2013; Zeil, 2022). In our tests, when the dense environment was shifted to a different position in the flight arena, almost no bees searched at the real location of the nest entrance but at the fictive new location within the dense environment, indicating that the bees assumed the nest to be located within the dense environment, and therefore that vision played a crucial role in homing. We thus never meant to suggest that the bees were not using view-based navigation. We clarified this point in the revised manuscript.

      See lines 247-248, 250-259, added visual memory to schematic in Fig. 6

      In our model simulations, the memorised snapshots were centred around the nest. However, we found that a multi-snapshot model could not explain the behaviour of the bees. This led us to suggest that bees likely employ a combination of multiple mechanisms for navigation.

      We refined the paragraph about possible alternative homing mechanisms. See lines 218-263

      The height of learning flights has not been extensively investigated in previous studies, and typical heights are not well-documented in the literature. However, from our observations of the first outbound flights of bumblebees within the dense environment, we noted that they quickly increased their altitude and then flew above the objects. Since the objects had a height of 0.3 metres, we chose 0.45 metres as a height above the objects for our study.

      Furthermore, the nest is positioned within the arrangement of objects, making it a target the bees must actively find rather than detour around.

      I think a discussion to contrast your findings with Murray and Zeil 2017 will be useful. It was unclear to me whether the flight arena had UV availability, if it didn't, this could be a reason for the difference.

      We referred to this study in the discussion of the revised paper (see our response to the public review). Lines 192-197

      As in most lab studies on local homing, the bees did not have UV light available in the arena. Even without this, they were successful in finding their nest position during the tests. We clarified that in the revised manuscript. See lines 334-336

      Figure 2A, can you add a scale bar?

      We added a scale bar to the figure showing the dimensions of the arena. See Fig. 2

      The citation of figure orders is slightly off. We have Figure 5 after Figure 2, without citing Figures 3 and 4. Similarly for a few others.

      We carefully checked the order of cited figures and adapted them.

      Reviewer 2 (Public Review):

      Summary:

      In a 1.5m diameter, 0.8m high circular arena bumblebees were accustomed to exiting the entrance to their nest on the floor surrounded by an array of identical cylindrical landmarks and to forage in an adjacent compartment which they could reach through an exit tube in the arena wall at a height of 28cm. The movements of one group of bees were restricted to a height of 30cm, the height of the landmark array, while the other group was able to move up to heights of 80cm, thus being able to see the landmark array from above.

      During one series of tests, the flights of bees returning from the foraging compartment were recorded as they tried to reach the nest entrance on the floor of the arena with the landmark array shifted to various positions away from the true nest entrance location. The results of these tests showed that the bees searched for the nest entrance in the location that was defined by the landmark array.

      In a second series of tests, access to the landmark array was prevented from the side, but not from the top, by a transparent screen surrounding the landmark array. These tests showed that the bees of both groups rarely entered the array from above, but kept trying to enter it from the side.

      The authors express surprise at this result because modelling the navigational information supplied by panoramic snapshots in this arena had indicated that the most robust information about the location of the nest entrance within the landmark array was supplied by views of the array from above, leading to the following strong conclusions: line 51: "Snapshot models perform best with bird's eye views"; line 188: "Overall, our model analysis could show that snapshot models are not able to find home with views within a cluttered environment but only with views from above it."; line 231: "Our study underscores the limitations inherent in snapshot models, revealing their inability to provide precise positional estimates within densely cluttered environments, especially when compared to the navigational abilities of bees using frog's-eye views."

      Strengths:

      The experimental set-up allows for the recording of flight behaviour in bees, in great spatial and temporal detail. In principle, it also allows for the reconstruction of the visual information available to the bees throughout the arena.

      Weaknesses:

      Modelling:

      Modelling left out information potentially available to the bees from the arena wall and in particular from the top edge of the arena and cues such as cameras outside the arena. For instance, modelled IDF gradients within the landmark array degrade so rapidly in this environment, because distant visual features, which are available to bees, are lacking in the modelling. Modelling furthermore did not consider catchment volumes, but only horizontal slices through these volumes.

      When we started modelling the bees’ homing based on image-matching, we included the arena wall. However, the model simulations pointed only coarsely towards the dense environment but not toward the nest position. We hypothesised that the arena wall and object location created ambiguity. Doussot et al. (2020) showed that such a model can yield two different homing locations when distant and local cues are independently moved. Therefore, we reduced the complexity of the environment by concentrating on the visual features, which were moved between training and testing (neither the camera nor the wall were moved between training and test). We acknowledge that this information should have been provided to substantiate our reasoning. As such, we included model results with the arena wall in the supplements of the revised paper. See lines 290-293, Figures S17-21

      We agree that catchment volumes would provide quantitatively more detailed information than catchment slices. Nevertheless, since our goal was to investigate whether bees would use ground views or bird's-eye views to home in a dense environment, catchment slices, which provide qualitatively similar information to catchment volumes, are sufficient to predict whether ground or bird's-eye views perform better in leading to the nest. Therefore, we did not include further computations of catchment volumes. (ll. 296-297)

      Behavioural analysis:

      The full potential of the set-up was not used to understand how the bees' navigation behaviour develops over time in this arena and what opportunities the bees have had to learn the location of the nest entrance during repeated learning flights and return flights.

      Without a detailed analysis of the bees' behaviour during 'training', including learning flights and return flights, it is very hard to follow the authors' conclusions. The behaviour that is observed in the tests may be the result of the bees' extended experience shuttling between the nest and the entry to the foraging arena at 28cm height in the arena wall. For instance, it would have been important to see the return flights of bees following the learning flights shown in Figure 17. Basically, both groups of bees (constrained to fly below the height of landmarks (F) or throughout the height of the arena (B)) had ample opportunities to learn that the nest entrance lies on the floor of the landmark array. The only reason why B-bees may not have entered the array from above when access from the side was prevented, may simply be that bumblebees, because they bumble, find it hard to perform a hovering descent into the array.

      A prerequisite for studying the learning flight in a given environment is showing that the bees manage to return to their home. Here, our primary goal was to demonstrate this within a dense environment. While we understand that a detailed analysis of the learning and return flights would be valuable, we feel this is outside the scope of this particular study.

      Multi-snapshot models have been repeatedly shown to be sufficient to explain homing behaviour in natural as well as artificial environments (Baddeley et al., 2012; Dittmar et al., 2010; Doussot et al., 2020; Möller, 2012; Wystrach et al., 2011, 2013; Zeil, 2012). A model can be used not only to replicate but also to predict a given outcome and to shape the design of experiments. Here, we used the models to shape the experimental design, as this approach does not require the entire history of the bee's trajectory to be tested and provides interesting insight into homing in diverse environments.

      Since we observed behavioural responses that differed from those suggested by the models, it becomes interesting to look at the flight history. Had we found agreement between the model and the behaviour, the history would have been much less interesting. Our results thus motivate an examination of the entire flight history, which will require not only effort on the recording procedure but also conceptual work. At the moment, the mechanisms underlying learning during outbound, inbound, exploration, or orientation flights remain elusive, which makes it difficult to formulate testable hypotheses. A detailed description of the flights across a bee's entire history would enable us to propose alternative models to the one tested in our study, but our ability to test them would remain limited.

      While we acknowledge that the bees had ample opportunities to learn the location of the nest entrance, we believe that their behaviour of entering the dense environment at a very low altitude cannot be solely explained by extended experience. It is possible that the bees could have also learned to enter at the edge of the objects or above the objects before descending within the dense environment.

      General:

      The most serious weakness of the set-up is that it is spatially and visually constrained, in particular lacking a distant visual panorama, which under natural conditions is crucial for the range over which rotational image difference functions provide navigational guidance. In addition, the array of identical landmarks is not representative of natural clutter and, because it is visually repetitive, poses un-natural problems for view-based homing algorithms. This is the reason why the functions degrade so quickly from one position to the next (Figures 9-12), although it is not clear what these positions are (memory0-memory7).

      In conclusion, I do not feel that I have learnt anything useful from this experiment; it does suggest, however, that to fully appreciate and understand the homing abilities of insects, there is no alternative but to investigate these abilities in the natural conditions in which they have evolved.

      We respectfully disagree with the evaluation that our study does not provide new insights due to the controlled laboratory conditions. Both field and laboratory research are necessary and should complement each other. Dismissing the value of controlled lab experiments would overlook the contributions of previous lab-based research, which has significantly advanced our understanding of animal behaviour. It is only possible to precisely define the visual test environments under laboratory conditions and to identify the role of the components of the environment for the behaviour through targeted variation of them. These results yield precious information to then guide future field-based experiments for validation.

      Our laboratory settings are a kind of abstraction of natural situations focusing on those aspects that are at the centre of the research question. Our approach here was based on the knowledge that bumblebees have to find their inconspicuous nest hole in nature, which is difficult to find in what are often highly dense environments, and ultimately on a spatial scale in the metre range. We first wanted to find out if bumblebees can find their nest hole under the particularly challenging condition that all objects surrounding the nest hole are the same. This was not yet clear. Uniformly distributed objects may, however, also occur in nature, as seen with visually inconspicuous nest entrances of bumblebees in grass meadows, flower meadows, or forests with similar plants. We agree that the term "clutter" is not well-defined in the literature and now refer to the environment as a "dense environment."

      We changed the wording throughout the manuscript and figures.

      Despite the lack of a distant visual panorama, or also UV light, wind, or other confounding factors inherent to field work conditions, the bees successfully located the nest position even when we shifted the dense environment within the flight arena. We used rotational-image difference functions based on snapshots taken around the nest position to predict the bees' behaviour, as this is one of the most widely accepted and computationally most parsimonious assessments of catchment areas in the context of local homing. This approach also proved effective in our more restricted conditions, where the bees still managed to pinpoint their home.

      Reviewer 2 (Recommendations):

      (1) Clarify what is meant by modelling panoramic images at 1cm intervals (only?) along the x-axis of the arena.

      The panoramic images were taken along a grid with 0.5cm steps within the dense environment and 1cm steps in the rest of the arena. A previous study (Doussot et al., 2020) showed successful homing of multi-snapshot models in an environment of similar scale with a grid with 2cm steps. Therefore, we think that our scaling is sufficiently fine. We apologise for the missing information in the method section and added it to the revised manuscript. See lines 286-287

      (2) In Figures 9-12 what are the memory0 to memory7 locations and reference image orientations? Explain clearly which image comparisons generated the rotIDFs shown.

      Memory 0 to memory 7 are examples of the eight memorised snapshots, which are aligned in the nest direction and taken around the nest. In the rotIDFs shown, we took memory 0 as a reference image, and compared the 7 others by rotating them against memory 0. We clarified that in the revised manuscript.

      See revised figure caption in Fig. S9 – 16
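
      The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a panoramic image stored as a list of pixel rows whose columns span 360° of azimuth, and it uses the root-mean-square pixel difference as the image difference, which is a common choice but is not specified in this response. All function names are hypothetical.

```python
import math


def rotate_columns(image, shift):
    """Rotate a panorama about the vertical axis by cyclically shifting its columns."""
    return [row[shift:] + row[:shift] for row in image]


def rms_difference(a, b):
    """Root-mean-square pixel difference between two images of equal size."""
    n_pixels = sum(len(row) for row in a)
    total = sum((pa - pb) ** 2
                for row_a, row_b in zip(a, b)
                for pa, pb in zip(row_a, row_b))
    return math.sqrt(total / n_pixels)


def rot_idf(reference, view):
    """Rotational image difference function: one value per column shift of `view`."""
    width = len(view[0])
    return [rms_difference(reference, rotate_columns(view, s)) for s in range(width)]


def best_heading(reference, view):
    """Return the column shift with the smallest difference, and that difference."""
    idf = rot_idf(reference, view)
    shift = min(range(len(idf)), key=idf.__getitem__)
    return shift, idf[shift]


# Toy example (assumed data): a 2 x 4 "panorama" and a rotated copy of it.
memory_0 = [[0, 1, 2, 3],
            [4, 5, 6, 7]]
rotated_view = rotate_columns(memory_0, 1)
shift, minimum = best_heading(memory_0, rotated_view)
```

      In this toy case the minimum of the function is zero at the shift that undoes the rotation; with real snapshots taken at different positions the minimum is generally non-zero, and its depth and sharpness indicate how well the view matches the memorised one.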

      (3) Figure 9 seems to compare 'bird's-eye', not 'frog's-eye' views.

      We apologise for that mistake and carefully double-checked the figure caption.

      See revised figure caption Fig. S9

      (4) Why do you need to invoke a PI vector (Figure 6) to explain your results?

      Since the bees were able to home in the dense environment without entering the object arrangement from above but from the side, image matching alone could not explain the bees’ behaviour. Therefore, we suggest, as a hypothesis for future studies, a combination of mechanisms such as a home vector. Other alternatives, perhaps without requiring a PI vector, may explain the bees’ behaviour, and we welcome any future contributions from the scientific community.

      References

      Baddeley, B., Graham, P., Husbands, P., & Philippides, A. (2012). A Model of Ant Route Navigation Driven by Scene Familiarity. PLoS Computational Biology, 8(1), e1002336. https://doi.org/10.1371/journal.pcbi.1002336

      Capaldi, E. A., Smith, A. D., Osborne, J. L., Farris, S. M., Reynolds, D. R., Edwards, A. S., Martin, A., Robinson, G. E., Poppy, G. M., & Riley, J. R. (2000). Ontogeny of orientation flight in the honeybee revealed by harmonic radar. Nature, 403. https://doi.org/10.1038/35000564

      Cartwright, B. A., & Collett, T. S. (1987). Landmark maps for honeybees. Biological Cybernetics, 57(1), 85–93. https://doi.org/10.1007/BF00318718

      Dittmar, L., Stürzl, W., Baird, E., Boeddeker, N., & Egelhaaf, M. (2010). Goal seeking in honeybees: Matching of optic flow snapshots? Journal of Experimental Biology, 213(17), 2913–2923. https://doi.org/10.1242/jeb.043737

      Doussot, C., Bertrand, O. J. N., & Egelhaaf, M. (2020). Visually guided homing of bumblebees in ambiguous situations: A behavioural and modelling study. PLoS Computational Biology, 16(10). https://doi.org/10.1371/journal.pcbi.1008272

      Lehrer, M., & Collett, T. S. (1994). Approaching and departing bees learn different cues to the distance of a landmark. Journal of Comparative Physiology A, 175(2), 171–177. https://doi.org/10.1007/BF00215113

      Lobecke, A., Kern, R., & Egelhaaf, M. (2018). Taking a goal-centred dynamic snapshot as a possibility for local homing in initially naïve bumblebees. Journal of Experimental Biology, 221(2), jeb168674. https://doi.org/10.1242/jeb.168674

      Möller, R. (2012). A model of ant navigation based on visual prediction. Journal of Theoretical Biology, 305, 118–130. https://doi.org/10.1016/j.jtbi.2012.04.022

      Murray, T., & Zeil, J. (2017). Quantifying navigational information: The catchment volumes of panoramic snapshots in outdoor scenes. PLOS ONE, 12(10), e0187226. https://doi.org/10.1371/journal.pone.0187226

      Osborne, J. L., Smith, A., Clark, S. J., Reynolds, D. R., Barron, M. C., Lim, K. S., & Reynolds, A. M. (2013). The ontogeny of bumblebee flight trajectories: From Naïve explorers to experienced foragers. PLoS ONE, 8(11). https://doi.org/10.1371/journal.pone.0078681

      Philippides, A., de Ibarra, N. H., Riabinina, O., & Collett, T. S. (2013). Bumblebee calligraphy: The design and control of flight motifs in the learning and return flights of Bombus terrestris. Journal of Experimental Biology, 216(6), 1093–1104. https://doi.org/10.1242/jeb.081455

      Sonntag, A., Lihoreau, M., Bertrand, O. J. N., & Egelhaaf, M. (2024). Bumblebees increase their learning flight altitude in dense environments. bioRxiv, 2024.10.14.618154. https://doi.org/10.1101/2024.10.14.618154

      Woodgate, J. L., Makinson, J. C., Lim, K. S., Reynolds, A. M., & Chittka, L. (2016). Life-long radar tracking of bumblebees. PLoS ONE, 11(8). https://doi.org/10.1371/journal.pone.0160333

      Wystrach, A., Mangan, M., Philippides, A., & Graham, P. (2013). Snapshots in ants? New interpretations of paradigmatic experiments. Journal of Experimental Biology, 216(10), 1766–1770. https://doi.org/10.1242/jeb.082941

      Wystrach, A., Schwarz, S., Schultheiss, P., Beugnon, G., & Cheng, K. (2011). Views, landmarks, and routes: How do desert ants negotiate an obstacle course? Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology, 197(2), 167–179. https://doi.org/10.1007/s00359-010-0597-2

      Zeil, J. (2012). Visual homing: An insect perspective. Current Opinion in Neurobiology, 22(2), 285–293. https://doi.org/10.1016/j.conb.2011.12.008

      Zeil, J. (2022). Visual navigation: Properties, acquisition and use of views. Journal of Comparative Physiology A. https://doi.org/10.1007/s00359-022-01599-2

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The manuscript titled "Household clustering and seasonal genetic variation of Plasmodium falciparum at the community-level in The Gambia" presents a valuable genetic spatio-temporal analysis of malaria-infected individuals from four villages in The Gambia, covering the period between December 2014 and May 2017. The majority of samples were analyzed using a SNP barcode with the Spotmalaria panel, with a subset validated through WGS. Identity-by-descent (IBD) was calculated as a measure of genetic relatedness and spatio-temporal patterns of the proportion of highly related infections were investigated. Related clusters were detected at the household level, but only within a short time period.

      Strengths:

      This study offers a valuable dataset, particularly due to its longitudinal design and the inclusion of asymptomatic cases. The laboratory analysis using the Spotmalaria platform combined and supplemented with WGS is solid, and the authors show a linear correlation between the IBD values determined with both methods, although other studies have reported that at least 200 SNPs are required for IBD analysis. Data-analysis pipelines were created for (1) variant filtering for WGS and subsequent IBD analysis, and (2) creating a consensus barcode from the spot malaria panel and WGS data and subsequent SNP filtering and IBD analysis.

      Weaknesses:

      Further refining the data could enhance its impact on both the scientific community and malaria control efforts in The Gambia.

      (1) The manuscript would benefit from improved clarity and better explanation of results to help readers follow more easily. Despite familiarity with genotyping, WGS, and IBD analysis, I found myself needing to reread sections. While the figures are generally clear and well-presented, the text could be more digestible. The aims and objectives need clearer articulation, especially regarding the rationale for using both SNP barcode and WGS (is it to validate the approach with the barcode, or is it to have less missing data?). In several analyses, the purpose is not immediately obvious and could be clarified.

      The text of the manuscript has now been thoroughly revised. Please let us know if a specific section remains unclear.

      (2) Some key results are only mentioned briefly in the text without corresponding figures or tables in the main manuscript, referring only to supplementary figures, which are usually meant for additional detail, but not main results. For example, data on drug resistance markers should be included in a table or figure in the main manuscript.

      We agree with the reviewer's suggestion to move the prevalence of drug resistance markers from the supplementary figures (previously Figure S8) to the main manuscript (now Figure 5). If any other figure or table should be moved to the main manuscript, please let us know.

      (3) The study uses samples from 2 different studies. While these are conducted in the same villages, their study design is not the same, which should be addressed in the interpretation and discussion of the results. Between Dec 2014 and Sept 2016, sampling was conducted only in 2 villages and at less frequent intervals than between Oct 2016 to May 2017. The authors should assess how this might have impacted their temporal analysis and conclusions drawn. In addition, it should be clarified why, and in exactly which analyses, the samples from Dec 2016 - May 2017 were excluded as this is a large proportion of your samples.

      We have clarified which set of samples was used in our Results (Lines 293-295, 316-319). While two villages were recruited halfway through the study, two villages (J and K, Figure 1C) consistently provided data for each transmission season. Importantly, our temporal analysis accounts for these differences by grouping paired barcodes based on their respective locations (Figure 3B). Despite variations in sampling frequency, we still observe a clear overall decline in relatedness between the ‘0-2 months’ and ‘2-5 months’ groups, both of which include barcodes from all four villages.

      (4) Based on which criteria were samples selected for WGS? Did the spatiotemporal spread of the WGS samples match the rest of the genotyped samples? I.e. were random samples selected from all times and places, or was it samples from specific times/places selected for WGS?

      All P. falciparum positive samples were sent for genotyping and whole genome sequencing, ensuring no selection bias. However, only samples with sufficient parasite DNA were successfully sequenced. We have updated the text (Lines 129-130) and added a supplementary figure (Figure S4) to show the sample collection broken down by type of data (barcode or genome). High quality genomes are distributed across all time points.

      (5) The manuscript would benefit from additional detail in the methods section.

      Please see our response in the section “Recommendation for the authors”.

      (6) Since the authors only do the genotype replacement and build consensus barcode for 199 samples, there is a bias between the samples with consensus barcode and those with only the genotyping barcode. How did this impact the analysis?

      While we acknowledge the potential for bias between samples with a consensus barcode (based on WGS) and those with genotyping-only barcodes, its impact is minimal. WGS does indeed produce a more accurate barcode compared to SNP genotyping, but any errors in the genotyping barcodes were mitigated by excluding loci that systematically mismatched with WGS data (see Figure S3). Additionally, the use of WGS improved the accuracy of 51% (216/425) of barcodes, which strengthens the overall quality and validity of our analysis.

      (7) The linear correlation between IBD-values of barcode vs genome is clear. However, since you do not use absolute values of IBD, but a classification of related (>=0.5 IBD) vs. unrelated (<0.5), it would be good to assess the agreement of this classification between the 2 barcodes. In Figure S6 there seem to be quite some samples that would be classified as unrelated by the consensus barcode, while they have IBD>0.5 in the Genome-IBD; in other words, the barcode seems to be underestimating relatedness.

      a. How sensitive is this correlation to the nr of SNPs in the barcode?

      We measured the agreement between the two classifications using specificity (0.997), sensitivity (0.841) and precision (0.843) described in the legend of Figure S8. To further demonstrate the good agreement between the two methods, we calculated a Cohen’s kappa value of 0.839 (Lines 226, 290), indicative of a strong agreement (McHugh 2012). As expected, the correlation between IBD values obtained by both methods improves (higher Cohen’s kappa and R²) as the cutoff for the minimal number of comparable and informative loci per barcode pair is raised (data not shown).
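
      For readers who wish to reproduce this kind of agreement check, the sketch below computes sensitivity, specificity, precision and Cohen's kappa after thresholding two sets of pairwise IBD values at 0.5, treating the genome-based classification as the reference. The function name and the toy values are illustrative assumptions, not the study's code or data.

```python
def agreement_metrics(genome_ibd, barcode_ibd, threshold=0.5):
    """Compare two binary 'related' calls (IBD >= threshold) made on the same pairs.

    The genome-based call is treated as the reference. Assumes both classes
    occur in both classifications, so that no denominator is zero.
    """
    tp = fp = tn = fn = 0
    for g, b in zip(genome_ibd, barcode_ibd):
        ref, pred = g >= threshold, b >= threshold
        if ref and pred:
            tp += 1
        elif ref and not pred:
            fn += 1
        elif not ref and pred:
            fp += 1
        else:
            tn += 1

    n = tp + fp + tn + fn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals of each method.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "kappa": (observed - expected) / (1 - expected),
    }


# Toy IBD values for eight hypothetical barcode pairs (not study data).
genome = [0.9, 0.8, 0.6, 0.1, 0.2, 0.0, 0.3, 0.7]
barcode = [0.95, 0.7, 0.4, 0.0, 0.1, 0.6, 0.2, 0.8]
metrics = agreement_metrics(genome, barcode)
```

      Kappa corrects the raw agreement for the agreement expected by chance, which matters when unrelated pairs vastly outnumber related ones, as is typical for pairwise relatedness data.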

      (8) With the sole focus on IBD, a measure of genetic relatedness, some of the conclusions from the results are speculative.

      a. Why not include other measures such as genetic diversity, which relates to allele frequency analysis at the population level (using, for example, nucleotide diversity)? IBD and the proportion of highly related pairs are not a measure of genetic diversity. Please revise the manuscript and figures accordingly.

      We agree that IBD is not a direct measure of genetic diversity, even though both are related (Camponovo et al., 2023). More precisely, IBD is a measure of the level of inbreeding in the population (Taylor et al., 2019). We have updated our manuscript by replacing “genetic diversity” with “genetic relatedness” or “inbreeding/outcrossing” when appropriate. Nucleotide diversity would be relevant if we wanted to compare different settings, e.g. Africa vs Asia; however, this is not the case here.

      b. Additionally, define what you mean by "recombinatorial genetic diversity" and explain how it relates to IBD and individual-level relatedness.

      We considered the term ‘recombinatorial genetic diversity’ to be equivalent to the level of inbreeding in the population. Because this expression is rather uncommon, we decided to drop it from our manuscript and replace it with “inbreeding/outcrossing”.

      c. Recombination is one potential factor contributing to the loss of  relatedness over time. There are several other factors that could  contribute, such as mobility/gene flow, or study-specific limitations  such as low numbers of samples in the low transmission season and many  months apart from the high transmission samples.

      Indeed, the loss of relatedness could be attributed not only to the recombination of local cases but also to new parasites introduced by imported malaria cases. As we stated in our manuscript, previous studies have shown a limited effect of imported cases on maintaining transmission (Lines 72-74). Nevertheless, we cannot definitely exclude that imported cases have an effect on inbreeding levels, since we do not have access to genetic data of surrounding parasites at the time of the study. We updated the discussion accordingly (Lines 497-501).

      d. By including other measures such as linkage disequilibrium you could  further support the statements related to recombination driving the loss of relatedness.

      This commendable suggestion is actually part of an ongoing project focusing on the sharing of IBD fragments and how it correlates with linkage disequilibrium. However, we believe that this analysis would not fit in the scope of our manuscript which is really about spatio-temporal effects on parasite relatedness at a local scale.

      (9) While the authors conclude there is no seasonal pattern in the drug-resistant markers, one can observe a big fluctuation in the dhps haplotypes, which go down from 75% to 20% and then up and down again later. The authors should investigate this in more detail, as dhps is related to SP resistance, which could be important for seasonal malaria chemoprophylaxis, especially since the mutations in dhfr seem near-fixed in the population, indicating high levels of SP resistance at some of the time points.

      As the reviewer noted, the DHPS A437G haplotype appears to decrease in prevalence twice throughout our study: from the 2015 and 2016 high transmission seasons to the subsequent 2016 and 2017 low transmission seasons. Seasonal Malaria Chemoprophylaxis (SMC) was carried out in the area through the delivery of sulfadoxine–pyrimethamine plus amodiaquine to children 5 years old and younger during high transmission seasons. As DHPS A437G haplotype has been associated with resistance to sulfadoxine, its apparent increase in prevalence during high transmission seasons could be resulting from the selective pressure imposed on parasites. After SMC, the decrease in prevalence observed during low transmission seasons could be caused by a fitness cost of the mutation favouring wild-type parasites over resistant ones. We updated our manuscript to reflect this relevant observation (Lines 400-405).

      (10) I recommend that raw data from genotyping and WGS should be deposited in a public repository.

      Genotyping data is available in the supplementary table 4 (Table S4). Whole genome sequencing is accessible in a European Nucleotide Archive public repository with the identifiers provided in supplementary table 5 (Table S5). We added references to these tables in the manuscript (Lines 249-250).

      Reviewer #2 (Public review):

      Summary:

      Malaria transmission in the Gambia is highly seasonal, whereby periods  of intense transmission at the beginning of the rainy season are  interspersed by long periods of low to no transmission. This raises  several questions about how this transmission pattern impacts the  spatiotemporal distribution of circulating parasite strains. Knowledge  of these dynamics may allow the identification of key units for targeted control strategies, the evaluation of the effect of selection/drift on  parasite phenotypes (e.g., the emergence or loss of drug resistance  genotypes), and analyze, through the parasites' genetic nature, the  duration of chronic infections persisting during the dry season. Using a combination of barcodes and whole genome analysis, the authors try to  answer these questions by making clever use of the different  recombination rates, as measured through the proportion of genomes with  identity-by-descent (IBD), to investigate the spatiotemporal relatedness of parasite strains at different spatial (i.e., individual, household,  village, and region) and temporal (i.e., high, low, and the  corresponding the transitions) levels. The authors show that a large  fraction of infections are polygenomic and stable over time, resulting  in high recombinational diversity (Figure 2). Since the number of  recombination events is expected to increase with time or with the  number of mosquito bites, IBD allows them to investigate the  connectivity between spatial levels and to measure the fraction of  effective recombinational events over time. The authors demonstrate the  epidemiological connectivity between villages by showing the presence of related genotypes, a higher probability of finding similar genotypes  within the same household, and how parasite-relatedness gradually  disappears over time (Figure 3). Moreover, they show that transmission  intensity increases during the transition from dry to wet seasons  (Figure 4). 
      If there is no drug selection during the dry season and if resistance incurs a fitness cost it is possible that alleles associated with drug resistance may change in frequency. The authors looked at the frequencies of six drug-resistance haplotypes (aat1, crt, dhfr, dhps, kelch13, and mdr1), and found no evidence of changes in allele frequencies associated with seasonality. They also find chronic infections lasting from one month to one and a half years with no dependence on age or gender.

      The use of genomic information and IBD analytic tools provides the  Control Program with important metrics for malaria control policies, for example, identifying target populations for malaria control and  evaluation of malaria control programs.

      Strength:

      The authors use a combination of high-quality barcodes (425 barcodes  representing 101 bi-allelic SNPs) and 199 high-quality genome sequences  to infer the fraction of the genome with shared Identity by Descent  (IBD) (i.e. a metric of recombination rate) over several time points  covering two years. The barcode and whole genome sequence combination  allows full use of a large dataset, and to confidently infer the  relatedness of parasite isolates at various spatiotemporal scales.

      Reviewer #3 (Public review):

      Summary

      This study aimed to investigate the impact of seasonality on malaria parasite population genetics. To achieve this, the researchers conducted a longitudinal study in a region characterized by seasonal malaria transmission. Over a 2.5-year period, blood samples were collected from 1,516 participants residing in four villages in the Upper River Region of The Gambia and tested for malaria parasite positivity. The parasites from the positive samples were genotyped using a genetic barcode and/or whole genome sequencing, followed by a genetic relatedness analysis.

      The study identified three key findings:

      (1) The parasite population continuously recombines, with no single genotype dominating, in contrast to viral populations;

      (2) The relatedness of parasites is influenced by both spatial and temporal distances; and

      (3) The lowest genetic relatedness among parasites occurs during the  transition from low to high transmission seasons. The authors suggest  that this latter finding reflects the increased recombination associated with sexual reproduction in mosquitoes.

      The results section is well-structured, and the figures are clear and  self-explanatory. The methods are adequately described, providing a  solid foundation for the findings. While there are no unexpected  results, it is reassuring to see the anticipated outcomes supported by  actual data. The conclusions are generally well-supported; however, the  discussion on the burden of asymptomatic infections falls outside the  scope of the data, as no specific analysis was conducted on this aspect  and was not stated as part of the aims of the study. Nonetheless, the  recommendation to target asymptomatic infections is logical and  relevant.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The manuscript would benefit from additional detail in the methods section.

      a. Refer to Figure 1 when you describe the included studies and sample processing.

      We added the reference to Figure 1 (Line 131).

      b. While you describe each step in the pipeline, you do not specify the  tools, packages, or environment used (the GitHub link is also  non-functional). A graphic representation of the pipeline, with more  bioinformatic details than Supplementary Figure S1, would be helpful.  Add references to used tools and software created by others.

      The GitHub link has been updated and is now functional. We find Figure S1 already heavy in detail; adding more would undermine its purpose as an easily readable summary of our pipeline. Readers seeking an in-depth explanation of the pipeline will find it in the methods section. We are committed to crediting the authors of the tools that were essential to our analysis pipeline. The two most relevant tools that we used are hmmIBD and the Fws calculation, which were both cited in the methods (Lines 148-152, 214-215).

      c. What changed in the genotyping protocol after May 2016? Does it not  lead to bias in the (temporal) analysis by leaving these loci in for  samples collected before May 2016 and making them 'unknown' for the  majority of samples collected after this date?

      These 21 SNPs all clustered in 1 of the 4 multiplexes used for molecular genotyping, which likely failed to produce accurate base calls. We updated the text to include this information (Lines 198-200).

      The rationale behind the discarding of these 21 SNPs for barcodes sampled after May 2016 was that they were consistently mismatching with the WGS SNPs, probably due to genotyping error as mentioned above. However, by replacing these unknown positions in the molecular barcodes with WGS SNPs, 141 samples did recover some of these 21 SNPs with the accurate base calls (Figure S3A). Additionally, we added an extra analysis to assess the agreement between barcodes and WGS data (Figure S3B).

      d. Related to this, how are unknown and mixed genotypes treated in the  binary matrix? How is the binary matrix coded? Is 0 the same as the  reference allele? So all the missing and mixed are treated as  references? How many missing and mixed alleles are there, how often does it occur and how does this impact the IBD analysis?

      We acknowledge that the details we provided regarding the IBD analysis were confusing. hmmIBD requires a matrix that contains a non-negative integer for each different allele at a given locus (all our loci were bi-allelic, thus only 0 and 1 were used) and -1 for missing data. In our case, we set missing and mixed alleles to -1, which were then ignored during the IBD estimation. The corresponding text was updated accordingly (Lines 173-175).
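
      The encoding described above can be sketched as follows. This is a minimal illustration, not our pipeline code: the function names, the choice of the reference allele as 0 and the alternate allele as 1, and the use of `X` and `N` characters for missing and mixed calls are assumptions made for the example.

```python
MISSING = -1  # value used for missing data and, in this sketch, for mixed calls


def encode_call(call, ref, alt):
    """Encode one barcode position as 0, 1 or -1.

    Assumption for this sketch: 0 = reference allele, 1 = alternate allele.
    Any other character (e.g. 'X' for missing, 'N' for mixed) becomes -1
    and is therefore ignored during IBD estimation.
    """
    if call == ref:
        return 0
    if call == alt:
        return 1
    return MISSING


def encode_barcode(barcode, refs, alts):
    """Encode a whole barcode string, one integer per bi-allelic locus."""
    return [encode_call(c, r, a) for c, r, a in zip(barcode, refs, alts)]
```

      For example, `encode_barcode("ACXN", "AGTT", "TCCA")` returns `[0, 1, -1, -1]`: the first position matches the reference allele, the second matches the alternate allele, and the missing and mixed positions are both set to -1.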

      e. By excluding households with less than 5 comparisons, are you not preselecting households with high numbers of cases, and therefore higher likelihood of transmission within the household?

      All participants in each household were sampled at every collection time point. This sampling was unbiased with respect to the likelihood of transmission. Excluding pairs of households with fewer than 5 comparisons was necessary to ensure statistical robustness in our analyses. Moreover, this does not restrict the analysis to households with a high number of cases, because it is the total number of pairs between two households that must be at least 5 (for instance, these pairs would pass the cutoff: household with 1 case vs household with 5 cases; household with 2 cases vs household with 3 cases).
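
      The arithmetic behind this cutoff can be sketched as below, assuming one genotyped isolate per case and comparisons made between two different households. The function names are hypothetical.

```python
def between_household_pairs(cases_a, cases_b):
    """Number of isolate pairs between two different households."""
    return cases_a * cases_b


def passes_cutoff(cases_a, cases_b, min_pairs=5):
    """True if the household pair yields at least `min_pairs` comparisons."""
    return between_household_pairs(cases_a, cases_b) >= min_pairs
```

      Under these assumptions, a household with 1 case compared with a household with 5 cases yields 5 pairs, and 2 cases vs 3 cases yields 6 pairs, so both pass; 1 case vs 4 cases yields only 4 pairs and is excluded.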

      (2) Since the authors only do the genotype replacement and build  consensus barcode for 199 samples, there is a bias between the samples  with consensus barcode and those with only the genotyping barcode. How  did this impact the analysis?

      See (6) in the Public Review.

      a. It would be good to get a better sense of the distribution of the nr  of SNPs in the barcode. The range is 30-89, and 30 SNPs for IBD is  really not that much.

      Adding the range of the number of available SNPs per barcode is indeed particularly relevant. We added a supplementary figure (Figure S5) showing the distribution of homozygous SNPs per barcode, showing that a very small minority of barcodes have only 30 SNPs available for IBD (average of 65, median of 64).

      b. Did you compare the nr of SNPs in the consensus vs. only genotyped  barcodes? Is there more missing data in the genotype-only barcodes?

      We added a supplementary figure (Figure S5) with the distribution of homozygous SNPs in consensus (216 samples) and molecular (209 samples) barcodes. Consensus barcodes have more homozygous SNPs (average 76, median 82) than molecular barcodes (average of 54, median of 53), showing the improvement resulting from using whole genome sequencing data.

      c. How was the cut-off/sample exclusion criteria of 30 SNPs in the barcode determined?

      As described above (Public review section 7.a.), we removed pairs of barcodes with less than 30 comparable loci (and 10 informative loci) because this led to a good agreement between IBD values obtained from barcodes and genomes while still retaining a majority of pairwise IBD values.

      d. Was there more/less IBD between sample pairs with a consensus barcode vs those with genotype-only barcodes?

      We separated pairwise IBD values into two groups: “within consensus” and “within molecular”. The percentage of related barcodes (IBD ≥ 0.5) was virtually identical between the “within consensus” (1.88 %) and “within molecular” (1.71 %) groups (χ<sup>2</sup> = 1.33, p value > 0.24).

      (3) Line 124: add a reference for the PCR method used.

      We have updated this information: varATS qPCR (Line 121).

      (4) Line 126, what is MN2100ff? Is this the catalogue number of the  cellulose columns? Please clarify and add manufacturer details.

      MN2100ff was a replacement for CF11. We added a link to the MalariaGen website describing the product and the procedure (Lines 124-125).

      (5) Line 143: Figure S7 is the first supplementary figure referenced. Change the order and make this Figure S1?

      The numbering of figures is now fixed.

      (6) Line 154: How many SNPs were in the vcf before filtering?

      There were 1,042,186 SNPs before filtering. This information was added to the methods (Line 168).

      (7) Line 156: Why is QUAL filtered at 10000? This seems extremely high.  (I could be mistaken, but often QUAL above 50 or so is already fine, why discard everything below 10000?). What is the range of QUAL scores in  your vcf?

      We used the QUAL > 10000 to make our analyses less computationally intensive while keeping enough relevant genetic information. We agree that keeping variants with extremely high values of QUAL is not relevant above a certain threshold as it translates into infinitesimally low probabilities (10<sup>-(QUAL/10)</sup>) of the variant calling being wrong. We then decided to use a minimal population minor allele frequency (MAF) of 0.01 to keep a variant as this will make the IBD calculation more accurate (Taylor et al., 2019). The variant filtering was carried out with the MAF > 0.01 filter, resulting in 27,577 filtered SNPs with a minimal QUAL of 132. With a cutoff of 3000 available SNPs, we retrieved all 199 genomes previously obtained with the QUAL > 10000 condition. The methods have been updated accordingly (Lines 166-170).
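
      To make the relationship between QUAL and error probability concrete, the short sketch below applies the Phred-scale conversion quoted above. The function name is hypothetical.

```python
def qual_to_error_probability(qual):
    """Convert a Phred-scaled QUAL score to the probability of a wrong call."""
    return 10 ** (-qual / 10)
```

      A QUAL of 30 corresponds to an error probability of 0.001, and the minimal QUAL of 132 retained after MAF filtering corresponds to a probability below 10<sup>-13</sup>, which illustrates why a cutoff as high as 10000 adds no practical stringency.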

      (8) Line 161-165: How did you handle the mixed alleles in the hmmIBD  analysis for the WGS data? Did you set them as 0 as you do later on for  the consensus barcode?

      Mixed alleles and missing data were ignored. This translated into a value of -1 for the hmmIBD matrix and not 0 as we incorrectly stated previously. We updated our manuscript with this correct information (Lines 173-175).

      (9) Line 168-171: How many SNPs do you have in the WGS dataset after all the filtering steps? If the aim of the IBD with WGS was to validate the IBD-analysis with the barcode, wouldn't it make sense to have at least  200 loci (as shown in Taylor et al to be required for hmmIBD) in the WGS data? What proportion of comparisons were there with only 100 pairs of  loci? This seems like really few SNPs from WGS data.

      There were 27,577 SNPs overall in the 199 high quality genomes. In our analysis, we make the distinction between comparable and informative loci. For two loci to be comparable, they both have to be homozygous. To be informative, they must be comparable and at least one of them must correspond to the minor allele in the population. We borrowed this term and definition from hmmIBD software which yields directly the number of informative loci per pair. By keeping pairs with at least 100 informative SNPs, we aimed to reduce the number of samples artificially related because only population major alleles are being compared. Pairs of genomes had between 1073 and 27466 of these, way above the recommended 200 loci in Taylor et al. (2019). We added more details on comparable and informative sites (Lines 152-160).
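
      The distinction between comparable and informative loci can be sketched as follows for a single pair of samples. This is an illustration of the definitions given above rather than hmmIBD's own code; it assumes the 0/1/-1 encoding described earlier, and the function and argument names are hypothetical.

```python
MISSING = -1  # missing or mixed call


def count_comparable_informative(sample_a, sample_b, minor_alleles):
    """Count comparable and informative loci for one pair of samples.

    sample_a, sample_b: lists of 0, 1 or -1, one value per locus
    minor_alleles: list of 0 or 1, the population minor allele at each locus
    """
    comparable = 0
    informative = 0
    for a, b, minor in zip(sample_a, sample_b, minor_alleles):
        # Comparable: both samples have a homozygous (non-missing) call.
        if a == MISSING or b == MISSING:
            continue
        comparable += 1
        # Informative: comparable, and at least one sample carries the
        # population minor allele.
        if a == minor or b == minor:
            informative += 1
    return comparable, informative
```

      A locus at which both samples carry the population major allele is therefore counted as comparable but not as informative, which is why pairs dominated by such loci can appear artificially related.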

      (10) Line 178: why remove the 12 loci that are absent from the WGS? Are  these loci also poorly genotyped in the spotmalaria panel?

      As our goal is to validate the reliability of molecular genotyped SNPs, these 12 loci have to be removed, especially because we did find a consistent discrepancy between genotyped and WGS-derived SNPs, which cannot be tested if these SNPs are absent from the genomes.

      (11) Line 180-182: What do you mean by this sentence: "Genomic barcodes  are built using different cutoffs of within-sample MAF and aligned  against molecular barcodes from the same isolates." Is this the analysis presented in the supplementary figure and resulting in the cut-off of  MAF 0.2? Please clarify.

      A locus where both alleles are called can result from two distinct haploid genomes being present or from an error occurring during sequencing data acquisition or processing. To distinguish between the two, we empirically determined the cutoff of within-sample MAF above which the locus can be considered heterozygous and below which only the major allele is kept. The corresponding figure was indeed Figure S2 (referenced in the next sentence, Lines 192-195). We clarified our approach in the methods (Lines 190-192) and in the legends of Figures S2 and S3.
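
      The calling rule described above can be sketched as follows. The default cutoff of 0.2 is the value mentioned in the reviewer's comment; the use of read counts as input, the treatment of a value exactly at the cutoff as heterozygous, and the function name are assumptions made for this example.

```python
def call_locus(ref_reads, alt_reads, maf_cutoff=0.2):
    """Call one locus from read counts using a within-sample MAF cutoff.

    Returns "missing", "mixed", "ref" or "alt".
    """
    total = ref_reads + alt_reads
    if total == 0:
        return "missing"
    # Within-sample minor allele frequency: share of reads for the rarer allele.
    within_sample_maf = min(ref_reads, alt_reads) / total
    if within_sample_maf >= maf_cutoff:
        # Both alleles are well supported: treat the locus as heterozygous.
        return "mixed"
    # Otherwise the minor allele is treated as noise and only the major
    # allele is kept.
    return "ref" if ref_reads > alt_reads else "alt"
```

      With this rule, a locus with 90 reference and 10 alternate reads is called as the reference allele, whereas a locus with 60 and 40 reads is called mixed.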

      (12) Line 191: How often was there a mismatch between WGS and SNP barcode?

      We added a panel (Figure S3B) showing the average agreement of each SNP between molecular genotyping and WGS. We highlighted the 21 discrepant SNPs showing a lower agreement only for samples collected after May 2016.

      (13) Line 201-204: This part is unclear (as above for the WGS): did you  include sample pairs with more than 10 paired loci? But isn't 10 loci  way too few to do IBD analysis?

      We included pairs of samples with at least 30 comparable loci and 10 informative paired loci (refer to our answer to comment 8 for the difference between the two). We added more details regarding comparable and informative sites (Lines 152-160). Indeed, using fewer than 200 loci leads to an IBD estimation that is on average off by 0.1 or more (Taylor et al., 2019). However we showed that the barcode relatedness classification based on a cutoff of IBD (related when above 0.5, unrelated otherwise) was close enough to our gold standard using genomes (each pair having more than 1000 comparable sites). Because we use this classification approach rather than the exact value of barcode-estimated IBD in our study, our 30 minimum comparable sites cutoff seems sufficient.
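
      The filtering and classification steps described above can be sketched together as below. The cutoffs are those stated in this response; the function name, the return values and the treatment of an IBD of exactly 0.5 as related are assumptions made for the example.

```python
def classify_pair(ibd, comparable, informative,
                  min_comparable=30, min_informative=10, ibd_cutoff=0.5):
    """Classify a barcode pair as related or unrelated.

    Returns None when the pair has too few comparable or informative loci
    and is therefore excluded from the analysis.
    """
    if comparable < min_comparable or informative < min_informative:
        return None
    return "related" if ibd >= ibd_cutoff else "unrelated"
```

      Because only the binary class is used downstream, an IBD estimate that is off by around 0.1 changes the outcome only for pairs whose true IBD lies close to the 0.5 cutoff.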

      (14) Lines 206-207: which program did you use to analyse Fws?

      We did not use any program, we computed Fws according to Manske et al. (2012) methods.
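
      For readers unfamiliar with the metric, Fws compares within-sample heterozygosity (Hw) with population heterozygosity (Hs) as Fws = 1 - Hw/Hs. The sketch below is a deliberately simplified version that averages heterozygosity over SNPs; it omits any binning of SNPs by allele frequency and should not be read as the exact procedure used in our pipeline. All names are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity of a bi-allelic SNP with allele frequency p."""
    return 1 - (p ** 2 + (1 - p) ** 2)


def simple_fws(within_sample_freqs, population_freqs):
    """Simplified Fws = 1 - Hw/Hs, using mean heterozygosity across SNPs.

    within_sample_freqs: per-SNP allele frequency within one sample
    population_freqs: per-SNP allele frequency in the population
    """
    if len(within_sample_freqs) != len(population_freqs):
        raise ValueError("both inputs must cover the same SNPs")
    hw = sum(heterozygosity(p) for p in within_sample_freqs) / len(within_sample_freqs)
    hs = sum(heterozygosity(p) for p in population_freqs) / len(population_freqs)
    if hs == 0:
        raise ValueError("population heterozygosity is zero")
    return 1 - hw / hs
```

      A sample in which every SNP is fixed for one allele has Hw = 0 and thus an Fws of 1, consistent with a single-genotype infection, whereas a sample as diverse as the population as a whole has an Fws close to 0.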

      (15) Line 233: "we attempted parasite genotyping and whole genome  sequencing of 522 isolates over 16 time points" => This is confusing, you did not do WGS of 522 samples, only 199 as mentioned in the next  sentence.

      We attempted whole genome sequencing on 331 isolates and molecular genotyping on 442 isolates with 251 isolates common between the two methods. We updated our text to clarify this point (Lines 247-252).

      (16) Lines 256-259: Add a range of proportions or some other summary  statistic in this section as you are only referring here to  supplementary figures to support these statements.

      The text has been updated (Lines 271-274).

      (17) Line 260: check the formatting of the reference "Collins22" as the rest of the document references are numbered.

      Fixed.

      (18) Figure 2/3:

      a. You could also inspect relatedness at the temporal level, by  adjusting the network figure where the color is village and shape is  time (month/year).

      Although visualising the effect of time on the parasite relatedness network would be a valuable addition, we did not find any intuitive and simple way of doing so. Using shapes to represent time might end up being more confusing than helpful, especially because the sampling was not done at fixed intervals.

      b. To further support the statement of clustering at the household  level, it might be useful to add a (supplementary) figure with the  network with household number/IDs as color or shape. In the network,  there seems to be a lot of relatedness within the villages and between  villages. Perhaps looking only at the distribution of the proportion of  highly related isolates is simplifying the data too much. Besides, there is no statistical difference between clustering at the household vs  within-village levels as indicated in Figure 3.

      Unfortunately, there are too many households (71 in Figure 2) to make a figure with one color or shape per household readable. The statistical test of the difference between the within household and within village relatedness yielded a p value above the cutoff of 0.05 (p value of 0.084). However, it is possible that the lack of significance arises from the relatively low number of data points available in the “within household” group. This is even more plausible considering the statistical difference of both “within household” and “within village” groups with “between village” group. Overall, our results indicate a decreasing parasite relatedness with spatial distance, and that more investigation would be needed to quantify the difference between “within household” and “within village” groups. 

      (19) Figure 4: Please add more description in the caption of this figure to help interpret what is displayed here. Figure 4A is hard to  interpret and does not seem to show more than is already shown in Figure 3A. What do the dots represent in Figure 4B? It is not clear what is  presented here.

      Compared to Figure 3A, Figure 4A enables the visualization of the relatedness between each individual pair of time points, which are later used in the comparison of relatedness between seasonal groups in Figure 4B. For this reason, we believe that Figure 4A should remain in the manuscript. However, we agree that the relationship between Figure 4A and Figure 4B is not intuitive in the way we presented it initially. For this reason, we added more details in the legend and modified Figure 4A to highlight the seasonal groups used in Figure 4B. 

      (20) Line 360-361: what did you do when haplotypes were not identical?

      We explained it in the methods section (Lines 144-146): in this case, only WGS haplotypes were kept.

      (21) Section chronic infections: it is important to mention that the  majority of chronic infections are individuals from the monthly  dry-season cohort.

      We added a statement about the 21 chronically infected individuals that were also part of the December 2016 – May 2017 monthly follow-up (Lines 423-426).

      (22) Lines 381-386: Did you investigate COI in these individuals? Could  it be co-circulating strains that you do not pick up at all times due to the consensus barcodes and discarding of mixed genotypes (and does not  necessarily show intra-host competition. That is speculation and should  perhaps not be in the results)?

      This is exactly what we think is happening. Due to the very nature of genotyping, only one strain may be observed at a time in the case of a co-infection, where distinct but related strains are simultaneously present in the host. The picked-up strain is typically the one with the highest relative abundance at the time of sampling. As the reviewer stated, fluctuation of strain abundance might not only be due to intra-host competition but also asynchronous development stages of the two strains. We added this observation to the manuscript (Lines 432-435).

      (23) Figure 6: highlight the samples where the barcode was not available in a different color to be able to see the difference between a non-matching barcode and missing data.

      We thank the reviewer for this great suggestion. We have now added to Figure 6 the available barcodes, along with their level of relatedness to the dominant genotype of each continuous infection.

      (24) Improve the discussion by adding a clear summary of the main  findings and their implications, as well as study-specific limitations.

      The Discussion has been updated with a paragraph summarizing the primary results (Lines 451-457).

      (25) Line 445: "implying that the whole population had been replaced in just one year "

      a. What do you mean by replaced? Did other populations replace the existing populations? I am not sure the lack of IBD is enough to show that the population changed/was replaced. Perhaps it is more accurate to say that the same population evolved. Nevertheless, other measures such as genetic diversity and genetic differentiation or population structure would be more suitable to strengthen these conclusions.

      We agree that “replaced” was the wrong term in this case. We rather intended to describe how the numerous recombinations between malaria parasites completely reshaped the same initial population which gradually displayed lower levels of relatedness over time. We updated the manuscript accordingly (Lines 507-512).

      Reviewer #2 (Recommendations for the authors):

      (1) Line 260: Remove Collins 22.

      Fixed.

      (2) Lines 270-274: 73 + 213 = 286 not 284; sum of percentages is equal to 101%.

      The numbers are correct: the 73 barcodes identical (IBD >= 0.9) to another barcode are a subset of the 213 related (IBD >= 0.5) to another barcode. However, we agree that this might be confusing and will consider barcodes to be related only if they have an IBD between 0.5 and 0.9, excluding those with an IBD >= 0.9. The text has been updated (Lines 299-301).

      (3) Section: "Independence of seasonality and drug resistance markers prevalence".

      The text has been revised and the supplementary figure is now a main figure.

      (4) For readers unaware of malaria control policy in the Gambia it would be helpful to have more details on the specifics of anti-malarial drug  administration.

      We added the drugs used in SMC (sulfadoxine-pyrimethamine and amodiaquine) and the first line antimalarial treatment in use in The Gambia during our study (Coartem) (Lines 383-388).

      Reviewer #3 (Recommendations for the authors):

      (1) The abstract is not as clear as the authors' summary. For example, I found the sentence starting with "with 425 P. falciparum..." hard to  follow.

      The abstract has been updated.

      (2) It is better to consistently use "barcode genotyping "or "genotyping by barcode". Sometimes "molecular genotyping" is used instead of  "barcode genotyping"

      We have now replaced all occurrences of “barcode genotyping” with “molecular genotyping” or “molecular barcode genotyping”. We prefer to stick with “molecular genotyping” as this lets us distinguish between the molecular and the genomic barcode.

      (3) The introduction is quite disjointed and does not provide a clear build-up to the gap in knowledge that the study is attempting to fill. Please revise.

      The Introduction has now been thoroughly revised.

      (4) Line 31 "with notable increase of parasite differentiation" is an interpretation and not an observation.

      We have modified that sentence (Lines 31-33).

      (5) Overall, the introduction requires substantial revision.

      The Introduction has now been thoroughly revised.

      (6) Line 70 "parasite population adapts..." I thought this required phenotypic analysis and not genetics?

      The idea is that a population of parasites may adapt to environmental conditions (such as seasonality) through selection of the fittest genotypes. For instance, antimalarial exposure selects for parasites with specific mutations in drug resistance related genes, and this effect even appears to be transient (for example with chloroquine). As such, there is good reason to think that seasonality might have a similar effect on parasite genetics.

      (7) Line 129-130: the #442 is not reflected in the schematic Figure 1.

      This is an intentional choice to make the figure more synthetic. For this reason, we included the Figure S1, which provides more details on the data collection and analysis pipeline.

      (8) Line 242-243: "Made with natural earth". What is this?

      This is a statement acknowledging the use of Natural Earth data to produce the map presented in Figure 1A.

      (9) Line 260: "collins22", is this a reference?

      Fixed.

      (10) Line 269-70. Very hard to follow. Please revise.

      We changed the text (Lines 293-297).

      (11) Line 324: similarly... I think there is a typo here.

      We did not find any typo in this specific sentence. However, “Similarly to Figure 3” sounds maybe a bit off, so we changed it to “As in Figure 3” (Line 351).

      (12) Line 332-334: very hard to follow. please revise. Again, the lower  parasite relatedness during the transition from low to high was linked  to recombination occurring in the mosquito but what about infection  burden shifting to naive young children? Is there a role for host  immunity in the observed reduction in parasite-relatedness during the  transition period?

      This text has been rewritten (Lines 356-361).

      About the hypothesis of infection burden shifting to naïve young children, this question is difficult to address in The Gambia because children under 5 years old received Seasonal Malaria Chemoprophylaxis during the high transmission season. In older children (6-15 years old), the prevalence was similar to adults (Fogang et al., 2024).

      About the role of host immunity on parasite relatedness across time and space, our dataset is too small to divide it in different age groups. Further studies should address this very interesting question.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines changes in relaxation time (T1 and T2) and magnetization transfer parameters that occur in a model system and in vivo when cells or tissue are depolarized using an equimolar extracellular solution with different concentrations of the depolarizing ion K<sup>+</sup>. The motivation is to explain T2 changes that have previously been observed by the authors in an in vivo model with neural stimulation (DIANA) and to try to provide a mechanism to explain those changes.

      Strengths:

      The authors argue that the use of various concentrations of KCl in the extracellular fluid depolarizes or hyperpolarizes the cell pellets used and that this change in membrane potential is the driving force for the T2 (and T1; see supplementary material) changes observed. In particular, they report an increase in T2 with increasing KCl concentration in the extracellular fluid (ECF) of pellets of SH-SY5Y cells. To offset the increasing osmolarity of the ECF due to the increase in KCl, the NaCl molarity of the ECF is proportionally reduced. The authors measure the intracellular voltage using patch clamp recordings, which is a gold standard. With 80 mM of KCl in the ECF, a change in T2 of the cell pellets of ~10 ms is observed with the intracellular potential recorded as about -6 mV. A very large T1 increase of ~90 ms is reported under the same conditions. The PSR (ratio of hydrogen protons on macromolecules to free water) decreases by about 10% at this 80 mM KCl concentration. Similar results are seen in a Jurkat cell line and similar, but far smaller changes are observed in vivo, for a variety of reasons discussed. As a final control, T1 and T2 values are measured in the various equimolar KCl solutions. As expected, no significant changes in T1 and T2 of the ECF were observed for these concentrations.

      Weaknesses:

      [Reviewer 1, Comment 1] While the concepts presented are interesting, and the actual experimental methods seem to be nicely executed, the conclusions are not supported by the data for a number of reasons. This is not to say that the data isn't consistent with the conclusions, but there are other controls not included that would be necessary to draw the conclusion that it is membrane potential that is driving these T1 and T2 changes. Unfortunately for these authors, similar experiments conducted in 2008 (Stroman et al. Magn. Reson. in Med. 59:700-706) found similar results (increased T2 with KCL) but with a different mechanism, that they provide definite proof for. This study was not referenced in the current work.

      It is well established that cells swell/shrink upon depolarization/hyperpolarization. Cell swelling is accompanied by increased light transmittance in vivo, and this should be true in the pellet system as well. In a beautiful series of experiments, Stroman et al. (2008) showed in perfused brain slices that the cells swell upon equimolar KCL depolarization and the light transmittance increases. The time course of these changes is quite slow, of the order of many minutes, both for the T2-weighted MRI signal and for the light transmittance. Stroman et al. also show that hypoosmotic changes produce the exact same time course as the KCL depolarization changes (and vice versa for the hyperosmotic changes - which cause cell shrinkage). Their conclusion, therefore, was that cell swelling (not membrane potential) was the cause of the T2-weighted changes observed, and that these were relatively slow (on the scale of many minutes).

      What are the implications for the current study? Well, for one, the authors cannot exclude cell swelling as the mechanism for T2 changes, as they have not measured that. It is however well established that cell swelling occurs during depolarization, so this is not in question. Water in the pelletized cells is in slow/intermediate exchange with the ECF, and the solutions for the two compartment relaxation model for this are well established (see Menon and Allen, Magn. Reson. in Med. 20:214-227 (1991)). The T2 relaxation times should be multiexponential (see point (3) further below). The current work cannot exclude cell swelling as the mechanism for T2 changes (it is mentioned in the paper, but not dealt with). Water entering cells dilutes the protein structures, changes rotational correlation times of the proteins in the cell and is known to increase T2. The PSR confirms that this is indeed happening, so the data in this work is completely consistent with the Stroman work and completely consistent with cell swelling associated with depolarization. The authors should have performed light scattering studies to demonstrate the presence or absence of cell swelling. Measuring intracellular potential is not enough to clarify the mechanism.

      [Reviewer 1, Response 1] We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed changes in T2, PSR, and T1, especially in pelletized cells. For this reason, we already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes, though this study did not present the magnitude of the cell volume changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008). When discussing the contribution of the cell volume changes to the observed MR parameter changes, we additionally discussed the work of Stroman et al. in the revised manuscript.

      In addition, we acknowledge that the title and main conclusion of the original manuscript may be misleading, as we did not separately consider the effect of cell volume changes on MR parameters. To more accurately reflect the scope and results of this study and to take into account Reviewer 2’s suggestion, we adjusted the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and also revised the relevant phrases in the main text.

      Finally, when [K<sup>+</sup>]-induced membrane potential changes are involved, factors other than cell volume changes appear to influence T<sub>2</sub> changes. Our follow-up study shows that the volume change associated with a given T<sub>2</sub> change differs between two situations: pure osmotic volume changes and [K<sup>+</sup>]-induced volume changes. For example, for the same T<sub>2</sub> change, the volume change for depolarization is greater than the volume change for hypoosmotic conditions. We will present these results at the upcoming ISMRM 2025 meeting and are preparing a manuscript to report them shortly.

      [Reviewer 1, Comment 2] So why does it matter whether the mechanism is cell swelling or membrane potential? The reason is response time. Cell swelling due to depolarization is a slow process, slower than hemodynamic responses that characterize BOLD. In fact, cell swelling under normal homeostatic conditions in vivo is virtually non-existent. Only sustained depolarization events typically associated with non-naturalistic stimuli or brain dysfunction produce cell swelling. Membrane potential changes associated with neural activity, on the other hand, are very fast. In this manuscript, the authors have convincingly shown a signal change that is virtually the same as what was seen in the Stroman publication, but they have not shown that there is a response that can be detected with anything approaching the timescale of an action potential. So one cannot definitely say that the changes observed are due to membrane potential. One can only say they are consistent with cell swelling, regardless of what causes the cell swelling.

      For this mechanism to be relevant to explaining DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity. I think one would find that these are minuscule within the context of an action potential, or even bulk action potential.
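
      The kind of calculation described in this comment can be illustrated with a minimal sketch. It assumes the slow-exchange limit, in which the measured signal is a fraction-weighted sum of an intracellular decay and an ECF decay, and it fits a single exponential to that signal to obtain an apparent T2. The function names, compartment fractions, T2 values, and echo times below are hypothetical placeholders, not values from the manuscript.

```python
import numpy as np

def biexponential_signal(te_ms, frac_cells, t2_cells_ms, t2_ecf_ms):
    """Slow-exchange limit: fraction-weighted sum of two exponential decays."""
    te_ms = np.asarray(te_ms, dtype=float)
    return (frac_cells * np.exp(-te_ms / t2_cells_ms)
            + (1.0 - frac_cells) * np.exp(-te_ms / t2_ecf_ms))

def apparent_t2(te_ms, signal):
    """Apparent monoexponential T2 from a log-linear least-squares fit."""
    slope, _ = np.polyfit(np.asarray(te_ms, dtype=float), np.log(signal), 1)
    return -1.0 / slope

# Hypothetical echo times (ms) and compartment parameters, for illustration only.
te = np.linspace(10.0, 200.0, 20)
baseline = apparent_t2(te, biexponential_signal(te, 0.80, 80.0, 300.0))
# Same cell fraction, but a longer intracellular T2 (assumed value).
longer_intracellular = apparent_t2(te, biexponential_signal(te, 0.80, 90.0, 300.0))

print(f"apparent T2, baseline: {baseline:.1f} ms")
print(f"apparent T2, longer intracellular T2: {longer_intracellular:.1f} ms")
```

      With measured compartment fractions and T2 values in place of these placeholders, the same two functions would give the apparent T2 change expected from a given volume change, which is the comparison this comment asks for.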

      [Reviewer 1, Response 2] In the context of cell swelling occurring at rapid response times, if we define cell swelling simply as an “increase in cell volume,” there are several studies reporting transient structural (or volumetric) changes (e.g., ~nm diameter change over ~ms duration) in neuron cells during action potential propagation (Akkin et al., Biophys J 93:1347-1353, 2007; Kim et al., Biophys J 92:3122-3129, 2007; Lee et al., IEEE Trans Biomed Eng 58:3000-3003, 2011; Wnek et al., J Polym Sci Part B: Polym Phys 54:7-14, 2015; Yang et al., ACS Nano 12:4186-4193, 2018). These studies show a good correlation between membrane potential changes and cell volume changes (even if very small) at the cellular level within milliseconds.

      As mentioned in Response 1 above, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly mentioned as one of the limitations in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (e.g., T<sub>2</sub> and PSR) when using ionic solutions that modulate membrane potential. Identifying MR parameter changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be addressed in the follow-up study mentioned in Response 1 above.

      There are a few smaller issues that should be addressed.

      [Reviewer 1, Comment 3] (1) Why were complicated imaging sequences used to measure T1 and T2? On a Bruker system it should be possible to do very simple acquisitions with hard pulses (which will not need dictionaries and such to get quantitative numbers). Of course, this can only be done sample by sample and would take longer, but it avoids a lot of complication to correct the RF pulses used for imaging, which leads me to the 2nd point.

      [Reviewer 1, Response 3] We appreciate the reviewer’s suggestion regarding imaging sequences. In fact, we used dictionaries for fitting in vivo T<sub>2</sub> decay data, not in vitro data. Sample-by-sample nonlocalized acquisition with hard pulses may be applicable for in vitro measurements. However, for in vivo measurements, a slice-selective multi-echo spin-echo sequence was necessary to acquire T<sub>2</sub> maps within a reasonable scan time. Our choice of imaging sequence was guided by the need to spatially resolve MR signals from specific regions of interest while balancing scan time constraints.

      [Reviewer 1, Comment 4] (2) Figure S1 (H) is unlike any exponential T2 decay I have seen in almost 40 years of making T2 measurements. The strange plateau at the beginning and the bump around TE = 25 ms are odd. These could just be noise, but the fitted curve exactly reproduces these features. A monoexponential T2 decay cannot, by definition, produce a fit shaped like this.

      [Reviewer 1, Response 4] The T<sub>2</sub> decay curves in Figure S1(H) indeed display features that deviate from a simple monoexponential decay. In our in vivo experiments, we used a multi-echo spin-echo sequence with slice-selective excitation and refocusing pulses. In such sequences, the echo train is influenced by stimulated echoes and imperfect slice profiles. These features are inherent to the pulse sequence rather than artifacts or fitting errors (Hennig, Concepts Magn Reson 3:125-143, 1991; Lebel and Wilman, Magn Reson Med 64:1005-1014, 2010; McPhee and Wilman, Magn Reson Med 77:2057-2065, 2017). Therefore, we fitted the T<sub>2</sub> decay curve using the technique developed by McPhee and Wilman (2017).

      [Reviewer 1, Comment 5] (3) As noted earlier, layered samples produce biexponential T2 decays and monoexponential T1 decays. I don't quite see how this was accounted for in the fitting of the data from the pellet preparations. I realize that these are spatially resolved measurements, but the imaging slice shown seems to be at the boundary of the pellet and the extracellular media and there definitely should be a biexponential water proton decay curve. Only 5 echo times were used, so this is part of the problem, but it does mean that the T2 reported is a population fraction weighted average of the T2 in the two compartments.

      [Reviewer 1, Response 5] We understand the reviewer’s concern regarding potential biexponential decay due to the presence of different compartments. In our experiments, we carefully positioned the imaging slice sufficiently remote from the pellet-media interface. This approach ensures that the signal predominantly arises from the cells (and interstitial fluid), excluding the influence of extracellular media above the cell pellet. We described the imaging slice more clearly in the revised manuscript. As mentioned in our Methods section, for in vitro experiments, we repeated a single-echo spin-echo sequence with 50 different echo times. While Figure 1C illustrates data from five echo times for visual clarity, the full dataset with all 50 echo times was used for fitting. We clarified this point in the revised manuscript to avoid any misunderstanding.

      [Reviewer 1, Comment 6] (4) Delta T1 and T2 values are presented for the pellets in wells, but no absolute values are presented for either the pellets or the KCL solutions that I could find.

      [Reviewer 1, Response 6] As requested by the reviewer, we included the absolute values in the supplementary information.

      Reviewer #2 (Public review):

      Summary:

      Min et al. attempt to demonstrate that magnetic resonance imaging (MRI) can detect changes in neuronal membrane potentials. They approach this goal by studying how MRI contrast and cellular potentials together respond to treatment of cultured cells with ionic solutions. The authors specifically study two MRI-based measurements: (A) the transverse (T2) relaxation rate, which reflects microscopic magnetic fields caused by solutes and biological structures; and (B) the fraction or "pool size ratio" (PSR) of water molecules estimated to be bound to macromolecules, using an MRI technique called magnetization transfer (MT) imaging. They see that depolarizing K<sup>+</sup> and Ba<sup>2+</sup> concentrations lead to T2 increases and PSR decreases that vary approximately linearly with voltage in a neuroblastoma cell line and that change similarly in a second cell type. They also show that depolarizing potassium concentrations evoke reversible T2 increases in rat brains and that these changes are reversed when potassium is renormalized. Min et al. argue that this implies that membrane potential changes cause the MRI effects, providing a potential basis for detecting cellular voltages by noninvasive imaging. If this were true, it would help validate a recent paper published by some of the authors (Toi et al., Science 378:160-8, 2022), in which they claimed to be able to detect millisecond-scale neuronal responses by MRI.

      Strengths:

      The discovery of a mechanism for relating cellular membrane potential to MRI contrast could yield an important means for studying functions of the nervous system. Achieving this has been a longstanding goal in the MRI community, but previous strategies have proven too weak or insufficiently reproducible for neuroscientific or clinical applications. The current paper suggests remarkably that one of the simplest and most widely used MRI contrast mechanisms, T2-weighted imaging, may indicate membrane potentials if measured in the absence of the hemodynamic signals that most functional MRI (fMRI) experiments rely on. The authors make their case using a diverse set of quantitative tests that include controls for ion and cell type-specificity of their in vitro results and reversibility of MRI changes observed in vivo.

      Weaknesses:

      [Reviewer 2, Comment 1] The major weakness of the paper is that it uses correlational data to conclude that there is a causational relationship between membrane potential and MRI contrast. Alternative explanations that could explain the authors' findings are not adequately considered. Most notably, depolarizing ionic solutions can also induce changes in cellular volume and tissue structure that in turn alter MRI contrast properties similarly to the results shown here. For example, a study by Stroman et al. (Magn Reson Med 59:700-6, 2008) reported reversible potassium-dependent T2 increases in neural tissue that correlate closely with light scattering-based indications of cell swelling. Phi Van et al. (Sci Adv 10:eadl2034, 2024) showed that potassium addition to one of the cell lines used here likewise leads to cell size increases and T2 increases. Such effects could in principle account for Min et al.'s results, and indeed it is difficult to see how they would not contribute, but they occur on a time scale far too slow to yield useful indications of membrane potential. The authors' observation that PSR correlates negatively with T2 in their experiments is also consistent with this explanation, given the inverse relationship usually observed (and mechanistically expected) between these two parameters. If the authors could show a tight correspondence between millisecond-scale membrane potential changes and MRI contrast, their argument for a causal connection or a useful correlational relationship between membrane potential and image contrast would be much stronger. As it is, however, the article does not succeed in demonstrating that membrane potential changes can be detected by MRI.

      [Reviewer 2, Response 1] We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed MR parameter changes. For this reason, we had already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008) and Phi Van et al. (Sci Adv 10:eadl2034, 2024). When discussing the contribution of the cell volume changes to the observed MR parameter changes, we additionally discussed the work of both Stroman et al. and Phi Van et al. in the revised manuscript.

      In addition, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly discussed as one of the limitations of this study in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (although on a slow time scale) when using ionic solutions that modulate membrane potential. Identifying MR parameter changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be addressed in the follow-up study mentioned in the Response 1 to Reviewer 1’s Comment 1 above.

      Together, we acknowledge that the title and main conclusion of the original manuscript may be misleading. To more accurately reflect the scope and results of this study and also consider the reviewer’s suggestion, we adjusted the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and also revised the relevant phrases in the main text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      [Reviewer 1, Comment 7] The manuscript is well written. One thing to emphasize early on is that the KCL depolarization is done in an equimolar (or isotonic) manner. I was not clear on this point until I got to the very end of the methods. This is a strength of the paper and should be presented earlier.

      [Reviewer 1, Response 7] In response to the reviewer’s suggestion, we have revised the manuscript to present the equimolar characteristic of our experiment earlier.

      [Reviewer 1, Comment 8] In terms of experiments, the relaxation time measurements are not well constructed. They should be done with a CPMG sequence with hundreds of echos and properly curve fit. This is entirely possible on a Bruker spectrometer.

      [Reviewer 1, Response 8] As noted in our Response to Reviewer 1’s Comment 3, while a CPMG sequence with numerous echoes and straightforward curve fitting can be effective, it is less feasible for in vivo experiments. Our multi-echo spin-echo sequence was a balanced approach between spatial resolution, reasonable scan duration, and the need to localize signals within specific regions of interest.

      [Reviewer 1, Comment 9] Measurements of cell swelling should be done to determine the time course of the cell swelling. This could be with NMR (CPMG) or with light scattering. For this mechanism to be relevant to explaining DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity.

      [Reviewer 1, Response 9] We acknowledge that additional experiments, such as cell volume recording, would further strengthen the claims of this study. We will perform them in future studies.

      As noted in our Response 2 to Reviewer 1’s Comment 2, this study does not address rapid membrane potential changes on the millisecond scale, and we acknowledge that establishing the precise timing of cell swelling is crucial for fully understanding the mechanisms of DIANA. Our current work demonstrates that MR parameters (e.g., T<sub>2</sub> and PSR) correlate strongly with membrane potential-modulating ionic environments, but it does not extend to millisecond-scale neural activation. We recognize the importance of further experiments, such as direct cell volume measurements, and plan to incorporate them in future studies to build on the insights gained from the present work.

      Reviewer #2 (Recommendations for the authors):

      Here are a few comments, questions, and suggestions for improvement:

      [Reviewer 2, Comment 2] I could not find much information about the various incubation times and delays used for the authors' in vitro experiments. For each of the in vitro experiments in particular, how long were cells exposed to the stated ionic condition prior to imaging, and how long did the imaging take? Could this and any other relevant information about the experimental timing please be provided and added to the methods section?

      [Reviewer 2, Response 2] We have included the information about the preparation/incubation times in the revised manuscript. For the scan time, it was already stated in the original manuscript: 23 minutes for the single-echo spin-echo sequence and 23 minutes for the inversion-recovery multi-echo spin-echo, for a total of 46 minutes.

      [Reviewer 2, Comment 3] In what format were the cells used for patch clamping, and were any controls done to ensure that characteristics of these cells were the same as those pelleted and imaged in the MRI studies? How long were the incubation times with ionic solutions in the patch clamp experiment? This information should likewise be added to the paper.

      [Reviewer 2, Response 3] We have clarified in the revised manuscript that the patch clamp measurements were made on SH-SY5Y cells in their adherent state. For the MRI experiments, on the other hand, the cells were dissociated from the culture plate and pelleted, so the experimental environments were not entirely identical. The patch clamp experiments involved a 20–30 minute incubation period with the ionic solutions. We have included this information in the revised manuscript.

      [Reviewer 2, Comment 4] Can the authors provide information about the mean cell size observed under each condition in their in vitro experiments?

      [Reviewer 2, Response 4] We did not directly quantify the mean cell size for each in vitro condition in this study, so we do not have corresponding data. However, we acknowledge that this information could provide valuable insights into potential mechanisms underlying the observed MR parameter changes. In future experiments, we plan to include direct cell-size measurements to further elucidate how changes in cell volume or hydration contribute to our MR findings.

      [Reviewer 2, Comment 5] The ionic challenges used both in vitro and in vivo could also have affected cell permeability, with corresponding effects that would be detectable in diffusion weighted imaging. Did the authors examine this or obtain any results that could reflect on contributions of permeability properties to the contrast effects they report?

      [Reviewer 2, Response 5] We did not perform diffusion-weighted imaging and therefore do not have direct data regarding changes in cell permeability. We agree that incorporating diffusion-weighted measurements could help distinguish whether the MR parameter changes are driven primarily by membrane potential shifts, cell volume changes, or variations in permeability properties. We will consider these approaches in our future studies.

      [Reviewer 2, Comment 6] Clearly, a faster stimulation method such as optogenetics, in combination with time-locked MRI readouts of the pelleted cells, would be more effective at demonstrating a useful relationship between cellular neurophysiology and MRI contrast in vitro. Can the authors present data from such an experiment? Is there any information they can present that documents the time course of observed responses in their experiments?

      [Reviewer 2, Response 6] In the current study, our methodology did not include time-resolved or dynamic measurements. While it may be possible to obtain indirect information about the temporal dynamics using T<sub>2</sub>-weighted or MT-weighted imaging, such an experiment was beyond the scope of this work. However, we agree that an optogenetic approach with time-locked MRI acquisitions could help directly link cell physiology to MRI contrast, and we will explore this in future studies.

      [Reviewer 2, Comment 7] The authors used a drug cocktail to suppress hemodynamic effects in the experiments of Figs. 5-6. What evidence is there that this cocktail successfully suppresses hemodynamic responses and that it also preserves physiological responses to the ionic challenges used in their experiments? Were analogous in vivo results also obtained in the absence of the cocktail?

      [Reviewer 2, Response 7] We appreciate the reviewer’s concern regarding pharmacological suppression of hemodynamic effects. Although each component is known to inhibit nitric oxide synthesis, we did not directly measure the degree of hemodynamic suppression in this study. In addition, we cannot definitively confirm that these agents preserved the physiological responses to the ionic challenges. We have clarified these points in the revised manuscript and identified them as limitations of the study.

      [Reviewer 2, Comment 8] Why weren't PSR results reported as part of the in vivo experimental results in Fig. 5? Does PSR continue to vary inversely to T2 in these experiments?

      [Reviewer 2, Response 8] In our current experimental setup, acquiring the T<sub>2</sub> map four times required 48 minutes, and extending the scan to include additional quantitative MT measurements for PSR would have significantly prolonged the scanning session. Given that these experiments were conducted on acutely craniotomized rats, maintaining stable physiological conditions for such a long period of time was challenging. Therefore, due to time constraints, we did not perform MT measurements and focused on T<sub>2</sub> mapping.

      [Reviewer 2, Comment 9] The authors have established in vivo optogenetic stimulation paradigms in their laboratory and used them in the Toi et al. DIANA study. Were T2 or PSR changes observed in vivo using standard T2 measurement or T2-weighted imaging methods that do not rely on the DIANA pulse sequence they originally applied?

      [Reviewer 2, Response 9] Our current T<sub>2</sub> mapping experiments utilized a standard multi-echo spin-echo sequence, rather than the DIANA pulse sequence employed in our previous work. In this respect, the T<sub>2</sub> changes we observed in vivo do not rely on the specialized DIANA methodology.

      [Reviewer 2, Comment 10] In the discussion section, the authors state that to their knowledge, theirs "is the first report that changes in membrane potential can be detected through MRI." This cannot be true, as their own Toi et al. Science paper previously claimed this, and a number of the studies cited on p.2 also claimed to detect close correlates of neuroelectric activity. This statement should be amended or revised.

      [Reviewer 2, Response 10] We appreciate the reviewer’s comment. We have revised the discussion section of the manuscript to reflect the points raised by the reviewer.

      [Reviewer 2, Comment 11] Because the current study does not actually demonstrate that changes in membrane potential can be detected by MRI, the authors should alter the title, abstract, and a number of relevant statements throughout the text to avoid implying that this has been shown. The title, for instance, could be changed to "Responses to depolarizing and hyperpolarizing ionic solutions measured by magnetic resonance imaging of excitable cells and rat brains," or something along these lines.

      [Reviewer 2, Response 11] We appreciate the reviewer’s suggestions. We have revised the title, abstract, and relevant statements of the manuscript to clarify that our findings show MR-detectable responses to ionic solutions that are expected to modulate membrane potential, rather than demonstrating direct detection of membrane potential changes by MRI.

      [Reviewer 2, Comment 12] The axes in Fig. 3 seem to be mislabeled. I think the horizontal axes are supposed to be membrane potential measured in mV.

      [Reviewer 2, Response 12] We thank the reviewer for spotting this error. We have corrected the axis labels in Figure 3 to indicate membrane potential (in mV) on the horizontal axis.

      [Reviewer 2, Comment 13] Since neither the experiments in Jurkat cells (Fig. 4) nor the in vivo MRI tests (Fig. 5-6) appear to have been made in conjunction with membrane potential measurements, it seems like a stretch to refer to these experiments as involving manipulation of membrane potentials per se. Instead, the authors should refer to them as involving administration of stimuli expected to be depolarizing or hyperpolarizing. The "hyperpolarization" and "depolarization" labels of Fig. 4 similarly imply a result that has not actually been shown, and should ideally be changed.

      [Reviewer 2, Response 13] To avoid giving the misleading impression that membrane potential changes were directly measured in Jurkat cells or in vivo, we have revised the relevant text and figure labels.

      [Reviewer 2, Comment 14] The changes in T2 and PSR documented with various K<sup>+</sup> challenges to Jurkat cells in Fig. 4 seem to follow a step-function-like profile that differs from the results reported in SH-SY5Y cells. Can the authors explain what might have caused this difference?

      [Reviewer 2, Response 14] We currently do not have a definitive explanation for why Jurkat cells exhibit a step-function-like response to varying K<sup>+</sup> levels, whereas SH-SY5Y cells show a linear response to log [K<sup>+</sup>]. Experiments that include direct membrane potential measurements in Jurkat cells would help clarify whether this difference arises from genuinely different patterns of depolarization/hyperpolarization or from other factors. We have revised the manuscript to address this point.
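
      For readers who want to see why a response that is linear in log [K<sup>+</sup>] is a natural reference point, the sketch below evaluates the textbook Nernst equation for potassium. It is an illustration only: the intracellular concentration, the temperature, and the list of extracellular concentrations are assumed values, and the membrane potential of a real cell also depends on other ions and their permeabilities, so this is not a model of the patch clamp data.

```python
import math

GAS_CONSTANT = 8.314462618      # J / (mol K)
FARADAY_CONSTANT = 96485.33212  # C / mol

def nernst_potential_mv(conc_out_mm, conc_in_mm, temperature_k=310.15, valence=1):
    """Nernst equilibrium potential (mV) for one ion species."""
    volts = (GAS_CONSTANT * temperature_k / (valence * FARADAY_CONSTANT)
             * math.log(conc_out_mm / conc_in_mm))
    return 1000.0 * volts

# Assumed intracellular K+ of 140 mM; extracellular values are hypothetical.
K_IN_MM = 140.0
for k_out_mm in (4.0, 20.0, 40.0, 80.0):
    e_k = nernst_potential_mv(k_out_mm, K_IN_MM)
    print(f"[K+]out = {k_out_mm:5.1f} mM -> E_K = {e_k:7.2f} mV")
```

      Because the potential depends on the logarithm of the concentration ratio, each doubling of extracellular K<sup>+</sup> shifts the Nernst potential by the same amount. A step-like profile such as the one seen in Jurkat cells would therefore point to something beyond this simple relationship.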

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary: 

      This fascinating manuscript studies the effect of education on brain structure through a natural experiment. Leveraging the UK BioBank, these authors study the causal effect of education using causal inference methodology that focuses on legislation for an additional mandatory year of education in a regression discontinuity design. 

      Strengths: 

      The methodological novelty and study design were viewed as strong, as was the import of the question under study. The evidence presented is solid. The work will be of broad interest to neuroscientists.

      Weaknesses: 

      There were several areas which might be strengthened by additional consideration from a methodological perspective.

      We sincerely thank the reviewer for the useful input, in particular, their recommendation to clarify RD and for catching some minor errors in the methods (such as taking the log of the Bayes factors). 

      Reviewer #1 (Recommendations for the authors): 

      (1) The fuzzy local-linear regression discontinuity analysis would benefit from further description. 

      (2) In the description of the model, the terms "smoothness" and "continuity" appear to be used interchangeably. This should be adjusted to conform to mathematical definitions. 

      We have now added to our explanations of continuity-based regression discontinuity. In particular, we now explain “fuzzy”, and add emphasis on the two separate empirical approaches (continuity and local-randomization), along with fixing our use of “smoothness” and “continuity”.

      results:

      “Compliance with ROSLA was very high (near 100%; Sup. Figure 2). However, given the cultural and historical trends leading to an increase in school attendance before ROSLA, most adolescents were continuing with education past 15 years of age before the policy change (Sup Plot. 7b). Prior work has estimated 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      methods:

      “RD designs, like ours, can be ‘fuzzy’ indicating when assignment only increases the probability of receiving it, in turn, treatment assigned and treatment received do not correspond for some units 33,53. For instance, due to cultural and historical trends, there was an increase in school attendance before ROSLA; most adolescents were continuing with education past 15 years of age (Sup Plot. 7b). Prior work has estimated that 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”
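      To make the idea of a ‘fuzzy’ design concrete, here is a minimal Python sketch of the Wald-type ratio that a standard fuzzy RD estimates: the jump in the outcome at the cutoff divided by the jump in treatment uptake. The function name and the outcome jump of 0.02 are hypothetical and not taken from the manuscript; only the 10% and 25% uptake figures echo the proportions quoted above.

```python
def fuzzy_rd_wald(outcome_jump: float, uptake_jump: float) -> float:
    """Wald-type ratio: jump in the outcome divided by the first-stage jump."""
    if uptake_jump == 0:
        raise ValueError("no first-stage jump: the ratio is undefined")
    return outcome_jump / uptake_jump


# Illustrative number only (an assumption, not a result from the study).
hypothetical_outcome_jump = 0.02

# The same outcome jump implies a larger per-complier effect when fewer
# people change their schooling because of the policy.
for share_affected in (0.10, 0.25):
    effect = fuzzy_rd_wald(hypothetical_outcome_jump, share_affected)
    print(f"share affected = {share_affected:.2f}, implied effect = {effect:.3f}")
```

      The sketch also shows why a small first stage matters for power: the smaller the share of people affected, the more any noise in the outcome jump is amplified in the ratio.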

      (3) The optimization of the smoother based on MSE would benefit from more explanation and consideration. How was the flexibility of the model taken into account in testing? Were there any concerns about post-selection inference? A sensitivity analysis across bandwidths is also necessary. Based on the model fit in Figure 1, results from a linear model should also be compared. 

      It is common in the RD literature to illustrate plots with higher-order polynomial fits while inference is based on linear (or at most quadratic) models (Cattaneo, Idrobo & Titiunik, 2019). We agree that this field-specific practice can be confusing to readers. Therefore, we have redone Figure 1 using local-linear fits better aligning with our analysis pipeline. Yet, it is still not a one-to-one alignment as point estimation and confidence are handled robustly while our plotting tools are simple linear fits. In addition, we updated Sup. Fig 3 and moved 3rd-order polynomial RD plots to Sup. Fig 4.

      Empirical RD has many branching analytical decisions (bandwidth, polynomial order, kernel) which can have large effects on the outcome. Fortunately, RD methodology is starting to become more standardized (Cattaneo & Titiunik, 2022, Annual Review of Economics), as there have been indications of publication bias using these methods (Stommes, Aronow & Sävje, 2023, Research and Politics; this paper suggests the cause is not researcher degrees of freedom but rather inappropriate inferential methods). While not necessarily ill-intended, researcher degrees of freedom and analytic flexibility are major contributors to publication bias. We limited our own analytic flexibility by pre-registering the study (https://osf.io/rv38z).

      One of the most consequential analytic decisions in RD is the bandwidth size as there is no established practice, they are context-specific and can be highly influential on the results. The choice of bandwidths can be framed as a ‘bias vs. variance trade-off’. As bandwidths increase, variance decreases since more subjects are added yet bias (misspecification error/smoothing bias) also increases (as these subjects are further away and less similar). In our case, our assignment (running/forcing) variable is ‘date of birth in months’; therefore our smallest comparison would be individuals born in August 1957 (unaffected/no treatment) vs September 1957 (affected/treated). This comparison has the least bias (subjects are the most similar to each other), yet it comes at the expense of very few subjects (high variance in our estimate). 
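      The trade-off described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not our analysis pipeline: it uses a naive difference in means rather than the local-linear estimator with robust inference, and the slope, noise level, and sample size per birth month are invented for illustration. The true jump at the cutoff is set to zero, so any non-zero average estimate is bias caused by the underlying trend.

```python
import random
import statistics


def simulate_gap(h, slope=0.05, noise_sd=1.0, n_per_month=50, rng=None):
    """One simulated difference in means within h months either side of the cutoff.

    The true jump at the cutoff is zero; only a smooth linear trend exists.
    """
    rng = rng or random.Random()
    control = [slope * m + rng.gauss(0, noise_sd)
               for m in range(-h, 0) for _ in range(n_per_month)]
    treated = [slope * m + rng.gauss(0, noise_sd)
               for m in range(0, h) for _ in range(n_per_month)]
    return statistics.fmean(treated) - statistics.fmean(control)


def bias_and_sd(h, reps=200, seed=1):
    """Average estimate (bias, since the true jump is 0) and its spread."""
    rng = random.Random(seed)
    draws = [simulate_gap(h, rng=rng) for _ in range(reps)]
    return statistics.fmean(draws), statistics.stdev(draws)


for h in (1, 3, 12):
    bias, sd = bias_and_sd(h)
    print(f"h = {h:>2} months   bias = {bias:+.3f}   sd = {sd:.3f}")
```

      With these made-up settings, widening the window from 1 to 12 months shrinks the spread of the estimate but lets the background trend leak into it, which is the pattern the bandwidth choice has to balance.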

      MSE-derived bandwidths attempt to solve this issue by offering an automatic method to choose an analysis bandwidth in RD. Specifically, this aims to minimize the MSE of the local polynomial RD point estimator – effectively choosing a bandwidth by balancing the ‘bias vs. variance trade-off’ (explained in detail in Section 4.4.2 of Cattaneo et al., 2019, pp. 45-51, “A practical introduction to regression discontinuity designs: foundations”). Yet, you are very correct in highlighting potential overfitting issues, as MSE-optimal bandwidths are “by construction invalid for inference” (Calonico, Cattaneo & Farrell, 2020, p. 192). Quoting from Cattaneo and Titiunik’s Annual Review of Economics from 2022: 

      “Ignoring the misspecification bias can lead to substantial overrejection of the null hypothesis of no treatment effect. For example, back-of-the-envelop calculations show that a nominal 95% confidence interval would have an empirical coverage of about 80%.”

      Fortunately, modern RD analysis packages (such as rdrobust or RDHonest) calculate robust confidence intervals - for more details see Armstrong and Kolesar (2020). For a summary on MSE-bandwidths see the section “Why is it hard to estimate RD effects?” in Stommes and colleagues 2023 (https://arxiv.org/abs/2109.14526). For more in-depth handling see the Cattaneo, Idrobo, and Titiunik primer (https://arxiv.org/abs/1911.09511).

      Lastly, with MSE-derived bandwidths sensitivity tests only make sense within a narrow window of the MSE-optimized bandwidth (5.5 Cattaneo et al., 2019 p 106 - 107). When a significant effect occurs, placebo cutoffs (artificially moving the cutoff) and donut-hole analysis are great sensitivity tests. Instead of testing our bandwidths, we decided to use an alternate RD framework (local randomization) in which we compare 1-month and 5-month windows. Across all analysis strategies, MRI modalities, and brain regions, we do not find any effects of the education policy change ROSLA on long-term neural outcomes.

      (4) In the Bayesian analysis, the authors deviated from their preregistered analytic plan. This whole section is a bit confusing in its current form - for example, point masses are not wide but rather narrow. Bayes factors are usually estimated; it is unclear how or why a prior was specified. What exactly is being modeled using a prior? Also, throughout - If the log was taken, as the methods seem to indicate for the Bayes factor, this should be mentioned in figures and reported estimates. 

      First, we would like to thank you for spotting that we incorrectly kept the log in the methods. We have fixed this and added the following sentence to the methods: 

      “Bayes factors are reported as BF<sub>10</sub> in support of the alternative hypothesis, we report Bayes factors under 1 as the multiplicative inverse (BF<sub>01</sub> = 1/BF)”

      All Bayesian analyses need to have a prior. In practice, this becomes an issue when you’re uncertain about 1) the location of the effect (directionality & center mass, defined by a location parameter), yet more importantly, the 2) confidence/certainty of the range-spread of possible effects (determined by a scale parameter). In normally distributed priors these two ‘beliefs’ are represented with a mean and a standard deviation (the latter impacts your confidence/certainty on the range of plausible parameter space). 

      Supplementary figure 6 illustrates several distributions (location = 0 for all) with varying scale parameters; when used as Bayesian priors this indicates differing levels of confidence in our certainty of the plausible parameter space. We illustrate our three reported, normally distributed priors centered at zero in blue with their differing scale parameters (sd = .5, 1 & 1.5).

      All of these five prior distributions have the same location parameter (i.e., 0) yet varying differences in the scale parameter – our confidence in the certainty of the plausible parameter space. At first glance it might seem like a flat/uniform prior (not represented) is a good idea – yet, this would put equal weight on the possibility of every estimate thereby giving the same probability mass to implausible values as plausible ones. A uniform prior would, for instance, encode the hypothesis that education causing a 1% increase in brain volume is just as plausible as it causing either a doubling or halving in brain volume. In human research, we roughly know a range of reasonable effect sizes and it is rare to see massive effects.

      A benefit of ‘weakly-informative’ priors is that they limit the range of plausible parameter values. The default prior in STAN (a popular Bayesian estimation program; https://mc-stan.org) is a normally distributed prior with a mean of zero and an SD of 2.5 (seen in orange in the figure; our initial preregistered prior). This large standard deviation easily permits positive and negative estimates putting minimal emphasis on zero. Contrast this to BayesFactor package’s (Morey R, Rouder J, 2023) default “wide” prior which is the Cauchy distribution (0, .7) illustrated in magenta (for more on the Cauchy see: https://distribution-explorer.github.io/continuous/cauchy.html). 

      These different defaults reflect differing Bayesian philosophical schools (the ‘estimate parameters’ vs ‘quantify evidence’ camps). If your goal is to accurately estimate a parameter, it would be odd to have a strong null prior; yet (in our opinion) when estimating point-null BFs, a wide default prior gives far too much evidence in support of the null. In point-null BF testing, the Savage-Dickey density ratio is the ratio between the height of the prior at zero and the height of the posterior at zero (see Figure under section “testing against point null 0”). This means BFs can be very prior sensitive (seen in SI tables 5 & 6). For this reason, we thought it made sense to do prior sensitivity testing: to ensure our conclusions in favor of the null were not caused solely by an overly wide prior (the preregistered orange distribution), we decided to report the 3 narrower priors (blue ones).
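      The prior sensitivity of a point-null Bayes factor can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the model we fitted: it treats the data as a single normally distributed estimate with a known standard error, so that a zero-centered normal prior gives a normal posterior in closed form. The estimate and standard error are invented; the prior SDs (0.5, 1, 1.5 and 2.5) are the ones discussed above. BF<sub>01</sub> is computed as posterior height over prior height at zero, and BF<sub>10</sub> is its inverse.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def savage_dickey_bf01(estimate, se, prior_sd):
    """BF01 for a point null at 0 with a Normal(0, prior_sd) prior.

    Assumes a normal likelihood summarised by `estimate` and `se`
    (a conjugate shortcut used here purely for illustration).
    """
    post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
    post_mean = post_var * estimate / se**2
    post_sd = math.sqrt(post_var)
    return normal_pdf(0.0, post_mean, post_sd) / normal_pdf(0.0, 0.0, prior_sd)


# Hypothetical estimate close to zero with a hypothetical standard error.
estimate, se = 0.05, 0.2
for prior_sd in (0.5, 1.0, 1.5, 2.5):
    bf01 = savage_dickey_bf01(estimate, se, prior_sd)
    print(f"prior sd = {prior_sd:<3}  BF01 = {bf01:6.2f}  BF10 = {1 / bf01:.3f}")
```

      For the same data, the wider the prior, the lower its height at zero and the larger the evidence for the null, which is why we report the narrower priors alongside the preregistered one.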

      Alternative Bayesian null hypotheses testing methods such as using Bayes Factors to test against a null region and ‘region of practical equivalence testing’ are less prior sensitive, yet both methods demand the researcher (e.g. ‘us’) to decide on a minimal effect size of practical interest. Once a minimal effect size of interest is determined any effect within this boundary is taken as evidence in support of the null hypothesis.

      (5) It is unclear why a different method was employed for the August / September data analysis compared to the full-time series. 

      We used a local-randomization RD framework, an entirely different empirical framework than continuity methods (resulting in a different estimate). For an overview see the primer by Cattaneo, Idrobo & Titiunik 2023 (“A Practical Introduction to Regression Discontinuity Designs: Extensions”; https://arxiv.org/abs/2301.08958).

      A local randomization framework is optimal when the running variable is discrete (as in our case with DOB in months) (Cattaneo, Idrobo & Titiunik 2023). It makes stronger assumptions on exchangeability therefore a very narrow window around the cutoff needs to be used. See Figure 2.1 and 2.2 (in the Cattaneo, Idrobo & Titiunik 2023) for graphical illustrations of 1) a randomized experiment, 2) a continuity RD design, and 3) local-randomization RD. Using the full-time series in a local randomization analysis is not recommended as there is no control for differences between individuals as we move further away from the cutoff – making the estimated parameter highly endogenous.

      We understand how it is confusing to have both a new framework and Bayesian methods (we could have chosen a fully frequentist approach) but using a different framework allows us to weigh up the aforementioned ‘bias vs variance tradeoff’ while Bayesian methods allow us to say something about the weight of evidence (for or against) our hypothesis.
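      For readers unfamiliar with local randomization, the logic can be sketched with a simple permutation test on a narrow window around the cutoff. This is a hedged illustration only: it shows the Fisherian (frequentist) variant of the idea, whereas we used Bayesian models within the same window, and the outcome values and function name below are hypothetical.

```python
import random
import statistics


def permutation_pvalue(treated, control, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = statistics.fmean(treated) - statistics.fmean(control)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_perm):
        # Re-assign "treatment" at random, as if birth month were a coin flip.
        rng.shuffle(pooled)
        diff = (statistics.fmean(pooled[:n_treated])
                - statistics.fmean(pooled[n_treated:]))
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly above zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical standardized outcomes for people born just before / after the cutoff.
born_august = [0.1, -0.4, 0.3, 0.0, -0.2, 0.5, -0.1, 0.2]
born_september = [0.2, -0.3, 0.4, -0.1, 0.1, 0.3, -0.2, 0.0]
print(permutation_pvalue(born_september, born_august))
```

      The test treats assignment inside the window as if it were random, which is only defensible when the window is very narrow; that is the stronger exchangeability assumption mentioned above.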

      (6) Figure 1 - why not use model fits from those employed for hypothesis testing? 

      This is a great suggestion (and ties into #3); we have now redone Figure 1.

      (7) The section on "correlational effect" might also benefit from additional analyses and clarifications. Indeed, the data come from the same randomized experiment for which minimum education requirements were adjusted. Was the only difference that the number of years of education was studied as opposed to the cohort? If so, would the results of this analysis be similar in another subsample of the UK Biobank for which there was no change in policy?

      We have clarified the methods section for the correlational/associational effect. This was the same subset of individuals for the local randomization analysis; all we did was change the independent variable from an exogenous dummy-coded ROSLA term (where half of the sample had the natural experiment) to a continuous (endogenous) educational attainment IV. 

      In principle, the results from the associational analysis should be exactly the same if we use other UK Biobank cohorts. To see if the association of education attainment with the global neuroimaging measures was similar across sub-cohorts of new individuals, we conducted a post hoc Bayesian analysis on eight more sub-cohorts of 10-month intervals, spaced 2 years apart from each other (Sup. Figure 7; each indicated by a different color). Four of these sub-cohorts predate ROSLA, while the other four are after ROSLA. Educational attainment is slowly increasing across the cohorts of individuals born from 1949 until 1965; intriguingly the effect of ROSLA is visually evident in the distributions of educational attainment (Sup. Figure 7). Also, as seen in the cohorts predating ROSLA more and more individuals were (already) choosing to stay in education past 15 years of age (see cohort 1949 vs 1955 in Sup. Figure 7).

      Sup. Figure 8 illustrates boxplots of the educational attainment posterior of the eight sub-cohorts in addition to our original analysis (s1957) using a normal distributed prior with a mean of 0 and a sd of 1. Total surface area shows a remarkably replicable association with education attainment. Yet, it is evident the “extremely strong” association we found for CSF was a statistical fluke – as the posterior of other cohorts (bar our initial test) crosses zero. The conclusions for the other global neuroimaging covariates where we concluded ‘no associational effect’ seems to hold across cohorts.

      We have now added methods, deviation from preregistration, and the following excerpt to the results:

      “A post hoc replication of this associational analysis in eight additional 10-month cohorts spaced two years apart (Sup. Figure 7) indicates our preregistered report on the associational effect of educational attainment on CSF to be most likely a false-positive (Sup. Figure 8). Yet, the positive association between surface area and educational attainment is robust across the additional eight replication cohorts.”

      Reviewer #2 (Public review): 

      Summary: 

      The authors conduct a causal analysis of years of secondary education on brain structure in late life. They use a regression discontinuity analysis to measure the impact of a UK law change in 1972 that increased the years of mandatory education by 1 year. Using brain imaging data from the UK Biobank, they find essentially no evidence for 1 additional year of education altering brain structure in adulthood. 

      Strengths: 

      The authors pre-registered the study and the regression discontinuity was very carefully described and conducted. They completed a large number of diagnostic and alternate analyses to allow for different possible features in the data. (Unlike a positive finding, a negative finding is only bolstered by additional alternative analyses). 

      Weaknesses: 

      While the work is of high quality for the precise question asked, ultimately the exposure (1 additional year of education) is a very modest manipulation and the outcome is measured long after the intervention. Thus a null finding here is completely consistent with educational attainment (EA) in fact having an impact on brain structure, where EA may reflect elements of training after a second education (e.g. university, post-graduate qualifications, etc) and not just stopping education at 16 yrs yes/no. 

      The work also does not address the impact of the UK Biobank's well-known healthy volunteer bias (Fry et al., 2017) which is yet further magnified in the imaging extension study (Littlejohns et al., 2020). Under-representation of people with low EA will dilute the effects of EA and impact the interpretation of these results. 

      References: 

      Fry, A., Littlejohns, T. J., Sudlow, C., Doherty, N., Adamska, L., Sprosen, T., Collins, R., & Allen, N. E. (2017). Comparison of Sociodemographic and Health-Related Characteristics of UK Biobank Participants With Those of the General Population. American Journal of Epidemiology, 186(9), 1026-1034. https://doi.org/10.1093/aje/kwx246 

      Littlejohns, T. J., Holliday, J., Gibson, L. M., Garratt, S., Oesingmann, N., Alfaro-Almagro, F., Bell, J. D., Boultwood, C., Collins, R., Conroy, M. C., Crabtree, N., Doherty, N., Frangi, A. F., Harvey, N. C., Leeson, P., Miller, K. L., Neubauer, S., Petersen, S. E., Sellors, J., ... Allen, N. E. (2020). The UK Biobank imaging enhancement of 100,000 participants: rationale, data collection, management and future directions. Nature Communications, 11(1), 2624. https://doi.org/10.1038/s41467-020-15948-9 

      We thank the reviewer for the positive comments and constructive feedback, in particular, their emphasis on volunteer bias in UKB (similar points were mentioned by Reviewer 3). We have now addressed these limitations with the following passage in the discussion:

      “The UK Biobank is known to have ‘healthy volunteer bias’, as respondents tend to be healthier, more educated, and are more likely to own assets [71,72]. Various types of selection bias can occur in non-representative samples, impacting either internal (type 1) or external (type 2) validity. One benefit of a natural experimental design is that it protects against threats to internal validity from selection bias [43], design-based internal validity threats still exist, such as if volunteer bias differentially impacts individuals based on the cutoff for assignment. A more pressing limitation – in particular, for an education policy change – is our power to detect effects using a sample of higher-educated individuals. This is evident in our first stage analysis examining the percentage of 15-year-olds impacted by ROSLA, which we estimate to be 10% in neuro-UKB (Sup. Figure 2 & Sup. Table 2), yet has been reported to be 25% in the UK general population [41]. Our results should be interpreted for this subpopulation  (UK, 1973, from 15 to 16 years of age, compliers) as we estimate a ‘local’ average treatment effect [73]. Natural experimental designs such as ours offer the potential for high internal validity at the expense of external validity.”

      We also highlighted it both in the results and methods.

      We appreciate that one year of education may seem modest compared to the entire educational trajectory, but as an intervention, we disagree that one year of education is ‘a very modest manipulation’. It is arguably one of the largest positive manipulations in childhood development we can administer. If we were to translate a year of education into the language of a (cognitive) intervention, it is clear that the manipulation, at least in terms of hours, days, and weeks, is substantial. Prior work on structural plasticity (e.g., motor, spatial & cognitive training) has involved substantially more limited manipulations in time, intensity, and extent. There is even (limited) evidence of localized persistent long-term structural changes (Woollett & Maguire, 2011, Curr. Biol.).

      We have now also highlighted the limited generalizability of our findings since we estimate a ‘local’ average treatment effect. It is possible higher education (college, university, vocational schools, etc.) could impact brain structure, yet we see no theoretical reason why it would while secondary wouldn’t. Moreover, higher education is even trickier to research empirically due to heightened self and administrative selection pressures. While we cannot discount this possibility, the impacts of endogenous factors such as genetics and socioeconomic status are most likely heightened. That being said, higher education offers exciting possibilities to compare more domain-specific processes (e.g., by comparing a philosophy student to a mathematics student). Causality could be tested in European systems with point entry into field-specific programs – allowing comparison of students who just missed entry criteria into one topic and settled for another.

      Regarding the amount of time following the manipulation, as we highlight in our discussion this is both a weakness and a strength. Viewed from a developmental neuroplasticity lens it would have been nice to have imaging immediately following the manipulation. Yet, from an aging perspective, our design has increased power to detect an effect.  

      Reviewer #2 (Recommendations for the authors): 

      (1) The authors assert there is no strong causal evidence for EA on brain structure. This overlooks work from Mendelian Randomisation, e.g. this careful work: https://pubmed.ncbi.nlm.nih.gov/36310536/ ... evidence from (good quality) MR studies should be considered. 

      We thank the reviewer for highlighting this well-done Mendelian randomization study. We have now added this citation and removed previous claims on the “lack of causal evidence existing”. We refrain from discussing Mendelian randomization, as it would need to be accompanied by a nuanced discussion on the strong limitations regarding EduYears-PGS in Mendelian randomization designs.

      (2) Tukey/Boxplot is a good name for your identification of outliers but your treatment of outliers has a well-recognized name that is missing: Winsorisation. Please add this term to your description to help the reader more quickly understand what was done. 

      Thanks, we have now added the term winsorized.
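      As a brief illustration of the term, the following Python sketch flags outliers with Tukey's boxplot fences and then winsorizes them by pulling flagged values back to the nearest fence. The multiplier of 1.5, the choice of quantile method, and clipping to the fence itself are assumptions made for this example; the exact settings used in the manuscript are described in its methods.

```python
import statistics


def winsorize_tukey(values, k=1.5):
    """Clip values lying outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Values inside the fences are unchanged; values outside move to the fence.
    return [min(max(v, low), high) for v in values]


# Hypothetical measurements with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(winsorize_tukey(data))
```

      Unlike trimming, this keeps every participant in the sample and only limits how far an extreme value can pull the estimate.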

      (3) Nowhere is it plainly stated that "fuzzy" means that you allow for imperfect compliance with the exposure, i.e. some children born before the cut-off stayed in school until 16, and some born after the cut-off left school before 16. For those unfamiliar with RD it would be very helpful to explain this at or near the first reference of the term "fuzzy". 

      We have now clarified the term ‘fuzzy’ to the results and methods:

      methods:

      “RD designs, like ours, can be ‘fuzzy’ indicating when assignment only increases the probability of receiving it, in turn, treatment assigned and treatment received do not correspond for some units 33,53. For instance, due to cultural and historical trends, there was an increase in school attendance before ROSLA; most adolescents were continuing with education past 15 years of age (Sup Plot. 7b). Prior work has estimated that 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      (4) Supplementary Figure 2 never states what the percentage actually measures. What exactly does each dot represent? Is it based on UK Biobank subjects with a given birth month? If so clarify. 

      Fixed!

      Reviewer #3 (Public review): 

      Summary: 

      This study investigates evidence for a hypothesized, causal relationship between education, specifically the number of years spent in school, and brain structure as measured by common brain phenotypes such as surface area, cortical thickness, total volume, and diffusivity. 

      To test their hypothesis, the authors rely on a "natural" intervention, that is, the 1972 ROSLA act that mandated an extra year of education for all 15-year-olds. The study's aim is to determine potential discontinuities in the outcomes of interest at the time of the policy change, which would indicate a causal dependence. Naturalistic experiments of this kind are akin to randomised controlled trials, the gold standard for answering questions of causality. 

      Using two complementary, regression-based approaches, the authors find no discernible effect of spending an extra year in primary education on brain structure. The authors further demonstrate that observational studies showing an effect between education and brain structure may be confounded and thus unreliable when assessing causal relationships. 

      Strengths: 

      (1) A clear strength of this study is the large sample size totalling up to 30k participants from the UK Biobank. Although sample sizes for individual analyses are an order of magnitude smaller, most neuroimaging studies usually have to rely on much smaller samples. 

      (2) This study has been preregistered in advance, detailing the authors' scientific question, planned method of inquiry, and intended analyses, with only minor, justifiable changes in the final analysis. 

      (3) The analyses look at both global and local brain measures used as outcomes, thereby assessing a diverse range of brain phenotypes that could be implicated in a causal relationship with a person's level of education. 

      (4) The authors use multiple methodological approaches, including validation and sensitivity analyses, to investigate the robustness of their findings and, in the case of correlational analysis, highlight differences with related work by others. 

      (5) The extensive discussion of findings and how they relate to the existing, somewhat contradictory literature gives a comprehensive overview of the current state of research in this area. 

      Weaknesses: 

      (1) This study investigates a well-posed but necessarily narrow question in a specific setting: 15-year-old British students born around 1957 who also participated in the UKB imaging study roughly 60 years later. Thus conclusions about the existence or absence of any general effect of the number of years of education on the brain's structure are limited to this specific scenario. 

      (2) The authors address potential concerns about the validity of modelling assumptions and the sensitivity of the regression discontinuity design approach. However, the possibility of selection and cohort bias remains and is not discussed clearly in the paper. Other studies (e.g. Davies et al 2018, https://www.nature.com/articles/s41562-017-0279-y) have used the same policy intervention to study other health-related outcomes and have established ROSLA as a valid naturalistic experiment. Still, quoting Davies et al. (2018), "This assumes that the participants who reported leaving school at 15 years of age are a representative sample of the sub-population who left at 15 years of age. If this assumption does not hold, for example, if the sampled participants who left school at 15 years of age were healthier than those in the population, then the estimates could underestimate the differences between the groups.". Recent studies (Tyrrell 2021, Pirastu 2021) have shown that UK Biobank participants are on average healthier than the general population. Moreover, the imaging sub-group has an even stronger "healthy" bias (Lyall 2022). 

      (3) The modelling approach used in this study requires that all covariates of no interest are equal before and after the cut-off, something that is impossible to test. Mentioned only briefly, the inclusion and exclusion of covariates in the model are not discussed in detail. Standard imaging confounds such as head motion and scanning site have been included but other factors (e.g. physical exercise, smoking, socioeconomic status, genetics, alcohol consumption, etc.) may also play a role. 

      We thank the reviewer for their numerous positive comments and have now attempted to address the first two limitations (generalizability and UKB bias) with the following passage in the discussion:

      “The UK Biobank is known to have ‘healthy volunteer bias’, as respondents tend to be healthier, more educated, and are more likely to own assets [71,72]. Various types of selection bias can occur in non-representative samples, impacting either internal (type 1) or external (type 2) validity. One benefit of a natural experimental design is that it protects against threats to internal validity from selection bias [43], design-based internal validity threats still exist, such as if volunteer bias differentially impacts individuals based on the cutoff for assignment. A more pressing limitation – in particular, for an education policy change – is our power to detect effects using a sample of higher-educated individuals. This is evident in our first stage analysis examining the percentage of 15-year-olds impacted by ROSLA, which we estimate to be 10% in neuro-UKB (Sup. Figure 2 & Sup. Table 2), yet has been reported to be 25% in the UK general population [41]. Our results should be interpreted for this subpopulation  (UK, 1973, from 15 to 16 years of age, compliers) as we estimate a ‘local’ average treatment effect [73]. Natural experimental designs such as ours offer the potential for high internal validity at the expense of external validity.”

      We further highlight this in the results section:

      “Compliance with ROSLA was very high (near 100%; Sup. Figure 2). However, given the cultural and historical trends leading to an increase in school attendance before ROSLA, most adolescents were continuing with education past 15 years of age before the policy change (Sup Plot. 7b). Prior work has estimated 25 percent of children would have left school a year earlier if not for ROSLA 41. Using the UK Biobank, we estimate this proportion to be around 10%, as the sample is healthier and of higher SES than the general population (Sup. Figure 2; Sup. Table 2) 46–48.”

      Healthy volunteer bias can create two types of selection bias; crucially participation itself can serve as a collider threatening internal validity (outlined in van Alten et al., 2024; https://academic.oup.com/ije/article/53/3/dyae054/7666749). Natural experimental designs are partially sheltered from this major limitation, as ‘volunteer bias’ would have to differentially impact individuals on one side of the cutoff and not the other – thereby breaking a primary design assumption of regression discontinuity. Substantial prior work (including this article) has not found any threats to the validity of the 1973 ROSLA (Clark & Royer 2010, 2013; Barcellos et al., 2018, 2023; Davies et al., 2018, 2023). While the Davies 2018 article did IP-weight with the UK Biobank sample, Barcellos and colleagues 2023 (and 2018) do not, highlighting the following “Although the sample is not nationally representative,  our estimates have internal validity because there is no differential selection on the two sides of the September 1, 1957 cutoff – see  Appendix A.”.

      The second (more acknowledged and arguably less problematic) type of selection bias results in threats to external validity (aka generalizability). As highlighted in your first point, this is a large limitation of every natural experimental design, yet in our case it is further amplified by the UK Biobank’s healthy volunteer bias. We have now attempted to highlight this limitation in the discussion passage above.

      Point 3 – the inability to fully confirm design validity – is, again, an inherent limitation of a natural experimental approach. That being said, extensive prior work has tested different predetermined covariates in the 1973 ROSLA (cited within), and to our knowledge, no issues have been found. The 1973 ROSLA seems to be one of the better natural experiments around (there was also a concerted effort to have an ‘effective’ additional year; see Clark & Royer 2010). For these reasons, we stuck with only testing the variables we wanted to use to increase precision (also offering new neuroimaging covariates that didn’t exist in the literature base). One additional benefit of ROSLA was that the cutoff was decided years after the fact, on a variable (date of birth) already fixed in the past, making it particularly hard for adolescents to alter their assignments.

      Reviewer #3 (Recommendations for the authors): 

      (1) FMRIB's preprocessing pipeline is mentioned. Does this include deconfounding of brain measures? Particularly, were measures deconfounded for age before the main analysis? 

      This is such a crucial point that we triple-checked: brain imaging phenotypes were not corrected for age (https://biobank.ctsu.ox.ac.uk/crystal/crystal/docs/brain_mri.pdf). Large effects of age can be seen in the global metrics: older individuals have less surface area, thinner cortices, less brain volume (corrected for head size), more CSF volume (corrected for head size), more white matter hyperintensities, and worse FA values. Figure 1 shows these large age effects, which are controlled for in our continuity-based RD analysis.

      One’s date of birth (DOB) of course does not map perfectly onto their age at scan, which is why we included the covariate ‘visit date’. This interplay can now be seen in our updated SI Figure 1 (recommended in #3), which shows the distributions of visit date, DOB, and age of scan.

      In a valid RD design, covariates should not be necessary (as they should be balanced on either side of the cutoff), yet the inclusion of covariates does increase precision to detect effects. We tested this assumption, finding ‘visit date’ and its quadratic term to be unrelated to ROSLA (Sup. Table 1). This adds further evidence (specific to the UK Biobank sample) to the existing body of work showing that the 1973 ROSLA policy change does not violate any design assumptions. Threats to internal validity would more than likely increase endogeneity and result in false-positive causal effects (which is not what we find).

      (2) Despite the large overall sample size, I am wondering whether the effective number of samples is sufficient to detect a potentially subtle effect that is further attenuated by the long time interval before scanning. As stated, for the optimised bandwidth window (DoB 20 to 35 months around cut-off), N is about 5000. Does this mean that effectively about 250 (10%) out of about 2500 participants born after the cut-off were leaving school at 16 rather than 15 because of ROSLA? For the local randomisation analysis, this becomes about N=10 (10% out of 100). Could a power analysis show that these cohort sizes are large enough to detect a reasonably large effect? 

      This is a very valid point, one which we were grappling with while the paper was out for review. We now draw attention to this in the results and highlight this as a limitation in the discussion. While UKB’s non-representativeness limits our power (10% affected rather than 25% in the general population), it is still a very large sample. Our sample size is more in line with standard neuroimaging studies than with large cohort studies. 
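
      The reviewer's back-of-envelope numbers can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the inputs (about 2,500 and about 100 participants born after the cutoff, 10% versus 25% compliers) are the round figures quoted above, the function names are hypothetical, and the dilution formula is the generic intention-to-treat relationship for a fuzzy design, not a calculation taken from the manuscript.

```python
def expected_compliers(n_after_cutoff, complier_share):
    """Expected number of participants whose schooling the policy actually changed."""
    return n_after_cutoff * complier_share


def diluted_effect(effect_in_compliers, complier_share):
    """Average effect across everyone born after the cutoff (intention-to-treat).

    If only a fraction of the cohort changes behaviour, the cohort-level
    difference is the complier effect scaled by that fraction.
    """
    return effect_in_compliers * complier_share


if __name__ == "__main__":
    # Round figures from the review: ~2500 (optimised bandwidth) and ~100
    # (local randomisation) participants born after the cutoff.
    for label, n_after in [("continuity-based", 2500), ("local randomisation", 100)]:
        for share in (0.10, 0.25):
            n_c = expected_compliers(n_after, share)
            print(f"{label}: {share:.0%} compliers -> about {n_c:.0f} people")

    # A hypothetical complier effect of 1 unit shrinks at cohort level.
    print(diluted_effect(1.0, 0.10), diluted_effect(1.0, 0.25))
```

      Under these assumptions, the same underlying effect is harder to detect in the UK Biobank than in the general population simply because fewer participants near the cutoff were affected by the policy.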

      The novelty of our study is its causal design. While we could very precisely measure an effect of some phenotype (variable X) in 40,000 individuals, this effect is probably not what we think we are measuring; without IP-weighting it could even have a different sign. More importantly, it is not variable X itself but the thousands of things (unmeasured confounders) that lead an individual to have more or less of variable X. The larger the sample, the easier it is for small unmeasured confounders to reach significance (the big data paradox). This in no way invalidates large samples; rather, how we think about and handle large samples will hopefully shift towards a more causal lens.

      (3) Supplementary Figure 1: A similar raincloud plot of date of birth would be instructive to visualise the distribution of subjects born before and after the 1957 cut-off. 

      Great idea! We have done this in Sup Fig. 1 for both visit date and DOB.

      (4) p.9: Not sure about "extreme evidence", very strong would probably be sufficient. 

      As preregistered, we interpreted Bayes factors using Jeffreys’ criteria. ‘Extreme evidence’ is used only once, for an associational effect of educational attainment on CSF (BF10 > 100). Following Reviewer 1’s recommendation 7, we analysed eight replication samples (Sup. Figures 7 & 8) and have now added the following passage to the results:

      “A post hoc replication of this associational analysis in eight additional 10-month cohorts spaced two years apart (Sup. Figure 7) indicates our preregistered report on the associational effect of educational attainment on CSF to be most likely a false-positive (Sup. Figure 8). Yet, the positive association between surface area and educational attainment is robust across the additional eight replication cohorts.”
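
      For readers unfamiliar with verbal Bayes factor labels such as ‘extreme evidence’, the sketch below maps a BF10 value to a category. The cut-offs (3, 10, 30, 100) follow a widely used adaptation of Jeffreys’ scale; the exact thresholds the authors preregistered are not given in this response, so both the thresholds and the function name should be read as assumptions.

```python
def evidence_label(bf10):
    """Map a Bayes factor BF10 to a verbal evidence category.

    Thresholds are an assumed, commonly used adaptation of Jeffreys' scale.
    Values below 1 are inverted and reported as evidence for H0.
    """
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive")
    if bf10 == 1:
        return "no evidence"

    favoured = "H1" if bf10 > 1 else "H0"
    bf = bf10 if bf10 > 1 else 1.0 / bf10

    if bf < 3:
        strength = "anecdotal"
    elif bf < 10:
        strength = "moderate"
    elif bf < 30:
        strength = "strong"
    elif bf <= 100:
        strength = "very strong"
    else:
        strength = "extreme"
    return f"{strength} evidence for {favoured}"


if __name__ == "__main__":
    for value in (0.2, 1, 2.5, 50, 150):
        print(value, "->", evidence_label(value))
```

      Under this scheme, only Bayes factors above 100 receive the ‘extreme’ label, which is consistent with how the term is used in the response above.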

      (5) The code would benefit from a bit of clean-up and additional documentation. In its current state, it is not easy to use, e.g. in a replication study. 

      We have now added further documentation to our code, including a readme describing what each script does. The analysis pipeline is not ideal for replications because the package used for continuity-based RD (RDHonest) initially could not handle covariates; we therefore manually corrected our variables after a discussion with Prof Kolesár (https://github.com/kolesarm/RDHonest/issues/7).

      Prof Kolesár recently added this functionality, and future work should use the latest version of the package, as it can correct for covariates. We have a new preprint examining the effect of 1972 ROSLA on telomere length in the UK Biobank using the latest version of RDHonest (https://www.biorxiv.org/content/10.1101/2025.01.17.633604v1). To maximize the availability of such innovations, we will make the most up-to-date version of this script available at this GitHub link (https://github.com/njudd/EduTelomere).

    1. We also may change how we behave and speak depending on the situation or who we are around, which is called code-switching [f21]. While modified behaviors to present a persona or code switch may at first look inauthentic, they can be a way of authentically expressing ourselves in each particular setting

      I like how this part of the reading brings awareness to the negative reputation that code-switching has but also shows how it can be very useful. I think it's similar to when people say there is a place and a time to do something, usually in the context that you shouldn't be misbehaving in an important setting. I have personally code-switched in different scenarios: my friends and my professor see very different versions of me, since I talk more formally to a professor than I would with my friends.

    1. Reviewer #1 (Public review):

      Summary:

      The manuscript by Egawa and colleagues investigates differences in nodal spacing in an avian auditory brain stem circuit. The results are clearly presented and data are of very high quality. The authors make two main conclusions:

      (1) Node spacing, i.e. internodal length, is intrinsically specified by the oligodendrocytes in the region they are found in, rather than axonal properties (branching or diameter).

      (2) Activity is necessary (we don't know what kind of signaling) for normal numbers of oligodendrocytes and therefore the extent of myelination.

      These are interesting observations, albeit phenomenological. I have only a few criticisms that should be addressed:

      (1) The use of the term 'distribution' when describing the location of nodes is confusing. I think the authors mean rather than the patterns of nodal distribution, the pattern of nodal spacing. They have investigated spacing along the axon. I encourage the authors to substitute node spacing or internodal length for node distribution.

      (2) In Seidl et al. (J Neurosci 2010) it was reported that axon diameter and internodal length (nodal spacing) were different for regions of the circuit. Can the authors help me better understand the difference between the Seidl results and those presented here?

      (3) The authors looked only in very young animals - are the results reported here applicable only to development, or does additional refinement take place with aging?

      (4) The fact that internodal length is specified by the oligodendrocyte suggests that activity may not modify the location of nodes of Ranvier - although again, the authors have only looked during early development. This is quite different from this reviewer's original thoughts - that activity altered internodal length and axon diameter. Thus, the results here argue against node plasticity. The authors may choose to highlight this point or argue for or against it based on results in adult birds?

      Significance:

      This paper may argue against node plasticity as a mechanism for tuning of neural circuits. Myelin plasticity is a very hot topic right now and node plasticity reflects myelin plasticity. This seems to be a circuit where perhaps plasticity is NOT occurring. That would be interesting to test directly. One limitation is that the study is restricted to development.

    1. Reviewer #2 (Public Review):

      Summary:

      This paper describes a new approach to detecting directed causal interactions between two genes without directly perturbing either gene. To check whether gene X influences gene Z, a reporter gene (Y) is engineered into the cell in such a way that (1) Y is under the same transcriptional control as X, and (2) Y does not influence Z. Then, under the null hypothesis that X does not affect Z, the authors derive an equation that describes the relationship between the covariance of X and Z and the covariance of Y and Z. Violation of this relationship can then be used to detect causality.

      The authors benchmark their approach experimentally in several synthetic circuits. In 4 positive control circuits, X is a TetR-YFP fusion protein that represses Z, which is an RFP reporter. The proposed approach detected the repression interaction in 2 of the 4 positive control circuits. The authors constructed 16 negative control circuit designs in which X was again TetR-YFP, but where Z was either a constitutively expressed reporter, or simply the cellular growth rate. The proposed method detected a causal effect in two of the 16 negative controls, which the authors argue is perhaps not a false positive, but due to an unexpected causal effect. Overall, the data support the potential value of the proposed approach.

      Strengths:

      The idea of a "no-causality control" in the context of detecting directed gene interactions is a valuable conceptual advance that could potentially see play in a variety of settings where perturbation-based causality detection experiments are made difficult by practical considerations.

      By proving their mathematical result in the context of a continuous-time Markov chain, the authors use a more realistic model of the cell than, for instance, a set of deterministic ordinary differential equations.

      The authors have improved the clarity and completeness of their proof compared to a previous version of the manuscript.

      Limitations:

      The authors themselves clearly outline the primary limitations of the study: The experimental benchmark is a proof of principle, and limited to synthetic circuits involving a handful of genes expressed on plasmids in E. coli. As acknowledged in the Discussion, negative controls were chosen based on the absence of known interactions, rather than perturbation experiments. Further work is needed to establish that this technique applies to other organisms and to biological networks involving a wider variety of genes and cellular functions. It seems to me that this paper's objective is not to delineate the technique's practical domain of validity, but rather to motivate this future work, and I think it succeeds in that.

      Might your new "Proposed additional tests" subsection be better housed under Discussion rather than Results?

      I may have missed this, but it doesn't look like you ran simulation benchmarks of your bootstrap-based test for checking whether the normalized covariances are equal. It would be useful to see in simulations how the true and false positive rates of that test vary with the usual suspects like sample size and noise strengths.

      It looks like you estimated the uncertainty for eta_xz and eta_yz separately. Can you get the joint distribution? If you can do that, my intuition is you might be able to improve the power of the test (and maybe detect positive control #3?). For instance, if you can get your bootstraps for eta_xz and eta_yz together, could you just use a paired t-test to check for equality of means?
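
      The paired-bootstrap idea in the comment above can be sketched as follows. This is an illustration only, not the authors' test: it assumes the normalized covariance is defined as eta_ab = Cov(a, b) / (mean(a) * mean(b)), which is not stated in this review, and the function names are hypothetical. The key point is that the same resampled cells are used for eta_xz and eta_yz, so the bootstrap distribution of their difference reflects their joint uncertainty.

```python
import random


def mean(values):
    return sum(values) / len(values)


def normalized_cov(a, b):
    """Sample covariance divided by the product of means (assumed definition of eta)."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)
    return cov / (ma * mb)


def paired_bootstrap_eta_diff(x, y, z, n_boot=2000, seed=0):
    """Bootstrap eta_xz - eta_yz, resampling cells jointly for x, y and z.

    Returns the sorted differences and an approximate 95% percentile interval.
    """
    rng = random.Random(seed)
    n = len(x)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # one resample shared by all three
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        zs = [z[i] for i in idx]
        diffs.append(normalized_cov(xs, zs) - normalized_cov(ys, zs))
    diffs.sort()
    lo = diffs[max(0, int(0.025 * n_boot))]
    hi = diffs[min(n_boot - 1, int(0.975 * n_boot))]
    return diffs, (lo, hi)


if __name__ == "__main__":
    # Synthetic cells: x and y share an upstream factor; z is independent of both.
    rng = random.Random(1)
    upstream = [rng.uniform(5.0, 15.0) for _ in range(300)]
    x = [u + rng.uniform(0.0, 2.0) for u in upstream]
    y = [u + rng.uniform(0.0, 2.0) for u in upstream]
    z = [rng.uniform(5.0, 15.0) for _ in range(300)]
    _, interval = paired_bootstrap_eta_diff(x, y, z, n_boot=500)
    print("95% interval for eta_xz - eta_yz:", interval)
```

      An interval that excludes zero would count against the no-causality null under these assumptions. Repeating the procedure over many simulated data sets, with varying sample size and noise strength, is one way to estimate the true and false positive rates the reviewer asks about.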

      The proof is a lot better, and it's great that you nailed down the requirement on the decay of beta, but the proof is still confusing in some places:

      - On pg 29, it says "That is, dividing the right equation in Eq. 5.8 with alpha, we write the ..." but the next equation doesn't obviously have anything to do with Eq. 5.8, and instead (I think) it comes from Eq 5.5. This could be clarified.

      - Later on page 29, you write "We now evoke the requirement that the averages xt and yt are stationary", but then you just repeat Eq. 5.11 and set it to zero. Clearly you needed the limit condition to set Eq. 5.11 to zero, but it's not clear what you're using stationarity for. I mean, if you needed stationarity for 5.11 presumably you would have referenced it at that step.

      It could be helpful for readers if you could spell out the practical implications of the theorem's assumptions (other than the no-causality requirement) by discussing examples of setups where it would or wouldn't hold.

  3. social-media-ethics-automation.github.io
    1. Early in the days of YouTube, one YouTube channel (lonelygirl15 [f1]) started to release vlogs (video web logs) consisting of a girl in her room giving updates on the mundane dramas of her life. But as the channel continued posting videos and gaining popularity, viewers started to question if the events being told in the vlogs were true stories, or if they were fictional. Eventually, users discovered that it was a fictional show, and the girl giving the updates was an actress.

      I thought there was something particularly interesting about lonelygirl15's story in that it illustrates how much responsibility there is to being authentic online. The fact that "humans don't like being fooled" really resonated with me—I have certainly felt that way when I discovered something I had considered to be true later turned out to have been staged or manufactured. And, I have to admit, I also think that something is sort of interesting in that despite the revelation of truth, the channel just kept growing. People may have been upset initially, but they also realized that the narrative being told really was good, and they still wanted to know what occurred. It makes me wonder if, even though we appreciate authenticity, we just sort of love a good story even if it isn't "real."

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper Kawasaki et al describe a regulatory role for the PIWI/piRNA pathway in rRNA regulation in Zebrafish. This regulatory role was uncovered through a screen for gonadogenesis defective mutants, which identified a mutation in the meioc gene, a coiled-coil germ granule protein. Loss of this gene leads to redistribution of Piwil1 from germ granules to the nucleolus, resulting in silencing of rRNA transcription.

      Strengths:

      Most of the experimental data provided in this paper is compelling. It is clear that in the absence of meioc, PiwiL1 translocates into the nucleolus, resulting in downregulation of rRNA transcription. The genetic compensation of meioc mutant phenotypes (both organismal and molecular) through reduction in PiwiL1 levels is evidence for a direct role for PiwiL1 in mediating the phenotypes of the meioc mutant.

      Weaknesses:

      Questions remain on the mechanistic details by which PiwiL1 mediates rRNA downregulation, and whether this is a function of Piwi in an unperturbed/wildtype setting. There is certainly some evidence provided in support of the natural function for piwi in regulating rRNA transcription (figure 5A+5B). However, the de-enrichment of H3K9me3 in the heterozygous (Figure 6F) is very modest and in my opinion not convincingly different relative to the control provided. It is certainly possible that PiwiL1 is regulating levels through cleavage of nascent transcripts. Another aspect I found confounding here is the reduction in rRNA small RNAs in the meioc mutant; I would have assumed that the interaction of PiwiL1 with the rRNA is mediated through small RNAs, but the reduction in numbers does not support this model. But perhaps it is simply a redistribution of small RNAs that is occurring. Finally, the ability to reduce PiwiL1 in the nucleolus through polI inhibition with actD and BMH-21 is surprising. What drives the accumulation of PiwiL1 in the nucleolus then if in the meioc mutant there is less transcription anyway?

      Despite the weaknesses outlined, overall I find this paper to be solid and valuable, providing evidence for a consistent link between PIWI systems and ribosomal biogenesis. Their results are likely to be of interest to people in the community, and provide tools for further elucidating the reasons for this link.

      The amount of cytoplasmic rRNA in piwi+/- was increased by 26% on average (Figure 5A+5B), the H3K9 ChIP-qPCR signal was decreased by about 26% (Figure 6F), and the Piwil1 ChIP-qPCR signal was decreased by 35% (Figure 6G), so we do not think there is a large discrepancy. On the other hand, the H3K9 ChIP-qPCR signal in meioc<sup>mo/mo</sup> was increased by about 130% (Figure 6F), while that of Piwil1 was increased by 50%, so Meioc may regulate H3K9 through a mechanism that is not mediated by Piwil1. As for what drives the accumulation of Piwil1 in the nucleolus, although we have found that Piwil1 has affinity for rRNA (Fig. 6A), we do not know what recruits it. No significant increases in the 18-35 nt small RNAs of 18S and 28S rRNAs and R2 were detected in meioc<sup>mo/mo</sup> testes enriched for 1-8 cell spermatogonia compared with meioc<sup>+/mo</sup> testes. The nucleolar localization of Piwil1 was revealed in this study and will be a new topic for future research.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors report that Meioc is required to upregulate rRNA transcription and promote differentiation of spermatogonial stem cells in zebrafish. The authors show that upregulated protein synthesis is required to support spermatogonial stem cells' differentiation into multi-celled cysts of spermatogonia. Coiled coil protein Meioc is required for this upregulated protein synthesis and for increasing rRNA transcription, such that the Meioc knockout accumulates 1-2 cell spermatogonia and fails to produce cysts with more than 8 spermatogonia. The Meioc knockout exhibits continued transcriptional repression of rDNA. Meioc interacts with and sequesters Piwil1 to the cytoplasm. Loss of Meioc increases Piwil1 localization to the nucleolus, where Piwil1 interacts with transcriptional silencers that repress rRNA transcription.

      Strengths:

      This is a fundamental study that expands our understanding of how ribosome biogenesis contributes to differentiation and demonstrates that zebrafish Meioc plays a role in this process during spermatogenesis. This work also expands our evolutionary understanding of Meioc and Ythdc2's molecular roles in germline differentiation. In mouse, the Meioc knockout phenocopies the Ythdc2 knockout, and studies thus far have indicated that Meioc and Ythdc2 act together to regulate germline differentiation. Here, in zebrafish, Meioc has acquired a Ythdc2-independent function. This study also identifies a new role for Piwil1 in directing transcriptional silencing of rDNA.

      Weaknesses:

      There are limited details on the stem cell-enriched hyperplastic testes used as a tool for mass spec experiments, and additional information is needed to fully evaluate the mass spec results. What mutation do these testes carry? Does this protein interact with Meioc in the wildtype testes? How could this mutation affect the results from the Meioc immunoprecipitation?

      Stem cell-enriched hyperplastic testes came from wild-type adult sox17::GFP transgenic zebrafish. Sperm were found in these hyperplastic testes, and when stem cells were transplanted, they self-renewed and differentiated into sperm. It is not known if the hyperplasias develop due to a genetic variant in the line. We added the following comment in L201-204.

      “The SSC-enriched hyperplastic testes, which are occasionally found in adult wildtype zebrafish, contain cells at all stages of spermatogenesis. Hyperplasia-derived SSCs self-renewed and differentiated in transplants of aggregates mixed with normal testicular cells.”

      Reviewer #3 (Public review):

      Summary:

      The paper describes the molecular pathway to regulate germ cell differentiation in zebrafish through ribosomal RNA biogenesis. Meioc sequesters Piwil1, a Piwi homolog, which suppresses the transcription of the 45S pre-rDNA by the formation of heterochromatin, to the perinuclear bodies. The key results are solid and useful to researchers in the field of germ cell/meiosis as well as RNA biosynthesis and chromatin.

      Strengths:

      The authors nicely provided molecular evidence for the antagonism of Meioc to Piwil1 in rRNA synthesis, which is supported by the genetic evidence that the inability of the meioc mutant to enter meiosis is suppressed by piwil1 heterozygosity.

      Weaknesses:

      (1) Although the paper provides very convincing evidence for the authors' claim, the scientific contents are poorly written and incorrectly described. As a result, it is hard to read the text. Checking by scientific experts would be highly recommended. For example, on line 38, "the global translation activity is generally [inhibited]", is incorrect and, rather, a sentence like "the activity is lowered relative to other cells" is more appropriate here. See minor points for more examples.

      Thank you for pointing that out. We corrected the passages identified.

      (2) In some figures, it is hard for readers outside of zebrafish meiosis to evaluate the results without more explanation and drawing.

      We refined Figure 1A and added explanation about SSC, sox17::egfp positive cells, and the SSC-enriched hyperplastic testis in L155-158.

      (3) Figure 1E, F, cycloheximide experiments: Please mention the toxicity of the concentration of the drug in cell proliferation and viability.

      When testicular tissue culture was performed at 0.1, 1, 10, 100, 250, and 500mM, abnormally strong OP-puro signals, including in nuclei, were found in cells at 10mM or more. We added these results as Supplemental Figure S2G. In addition, at 1mM, growth was perturbed in fast-growing ≥32-cell cysts of spermatogonia, but not in 1-4-cell spermatogonia, as described in L127-130.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I don't have any recommendations for improvement. While I have outlined some of the weaknesses of the paper above, I don't see addressing these questions as pertinent for publication of this paper.

      Reviewer #2 (Recommendations for the authors):

      (1) The manuscript uses the terms 1-2 cell spermatogonia, GSC, and SSC throughout the figures and text. For example, 1-2 cell spermatogonia is used in Figure 1C, GSC is used in Figure 1F, and SSC is used in Figure 1 legend. The use of all three terms without definitions as to how they each relate with one another is confusing, particularly to those outside the zebrafish spermatogenesis field. It would be best to only use one term if the three terms are used interchangeably or to define each term if they represent different populations.

      ‘GSC’ was a typographical error. In this study, sox17-positive cells, which have been confirmed to self-renew and differentiate (Kawasaki et al., 2016), are considered SSCs. On the other hand, a comparison of meioc and ythdc2 mutants revealed differences in the composition of each cyst, so we describe the number of cysts confirmed. We added new data showing that 1-2 cell spermatogonia are sox17-positive in Supplemental Figure S3 (L157-158).

      (2) Figure 1B: What does the "SC" label represent in these figure panels?

      We added the explanation in the Figure legend.

      (3) Fig 7B and S7B show incongruent results, and the text implies that Fig S7B data better reflects in vivo biology. It is not clear how the authors interpret the different results between 7B and S7B.

      Thank you for pointing that out. Fig 7A and 7B were obtained by isolating sox17-positive cells. Because it was difficult to detect nucleoli in the isolated cells, probably due to the isolation procedure, we added S7B, which was analyzed in sectioned tissues. As this reviewer pointed out, S7B reflects the in vivo state better, so we changed S7B to 7B and 7B to S7B.

      Reviewer #3 (Recommendations for the authors):

      Minor points:

      (1) For general readers, it is nice to add a scheme of zebrafish spermatogenesis (lines 77-78) together with Figure 1A.

      As mentioned above, we refined Figure 1A.

      (2) Line 28, silence: the word "silence" is too strong here since rDNA is transcribed in some levels to ensure the cell survival.

      Thank you for your comment. We changed "silence" to "maintain low levels."

      (3) Line 60, YTHDC2: Please explain more about what protein YTHDC2 is.

      We added a description of Ythdc2 in the introduction.

      (4) Line 69, Piwil1: Please explain more about what protein Piwil1 is.

      We added a description of Piwil1 in the introduction.

      (5) Figure 1B, sperm: Please show clearly which sperms are in this figure using arrows etc.

      We represented sperm using arrowheads in Fig 1B.

      (6) Figure 1C, SC: Please show what SC is in the legend.

      We added the explanation in the Figure legend.

      (7) Line 83, meiotic markers: should be "meiotic prophase I markers".

      Thank you for pointing out the inaccurate description. We revised it.

      (8) Line 84, phosphor-histone H3: Should be "histone H3 phospho-S10 "

      We revised it.

      (9) Figure S1A, PH3: Please add that PH3 is "histone H3 phospho-S10".

      We revised it.

      (10) Figure S1A, moto+/-: this heterozygous mutant showed an increased apoptosis. If so, please mention this in the text. If not, please remove the data.

      Thank you for pointing that out. The heterozygous mutant did not increase apoptosis, so we removed the data.

      (11) Line 88, no females developed: This means all males in the mutant. If so, what Figure S1B shows? These cells are spermatocytes? No "oocytes" developed is correct here?

      All meioc<sup>mo/mo</sup> zebrafish were males, and the meioc<sup>mo/mo</sup> cells in Fig. S1B are spermatogonia. No spermatocytes or oocytes were observed. To show this, we added "no oocytes" in L90.

      (12) Line 89, initial stages: What do the initial stages mean here? Please explain.

      The “initial stages” was changed to the pachytene stage.

      (13) Figure S1C: mouse Meioc rectangle lacks a right portion of it. Please explain two mutations encode a truncated protein in the main text.

      I apologize. It seems that the portion was missing during the preparation of the manuscript. We corrected it. In addition, we added a description of the protein truncation in L100-101.

      (14) Line 99: What "GRCz11" is.

      GRCz11 refers to the version of the zebrafish reference genome assembly. We added this.

      (15) Figure S2A: Dotted lines are cysts. If so, please mention it in the legend.

      We corrected the figure legend.

      (16) Figure S2B and C:, B1-4, C1-7: Rather use spermatogonia etc as a caption here.

      We corrected the figure and figure legend.

      (17) Line 113, hereafter, wildtype: Should be "wild type" or "wild-type".

      We corrected them.

      (18) Figure 1C: Please indicate what dotted lines mean here.

      We added “Dotted lines; 1-2 cell spermatogonia.”

      (19) Line 113, de novo: Please italicize it.

      We corrected it.

      (20) Line 113-116: Figure 1D shows two populations in the protein synthesis (low and high) in the 1-2-cell stage. Please mention this in the text.

      We added mention of the two populations.

      (21) Line 121, in vitro: Please italicize it.

      We corrected it.

      (22) Line 138-139, Figure 2A: Please indicate two populations in the rRNA concentrations (low and high) in the 1-2-cell stage. How much % of each cell is?

      We added mention of the two populations and the percentage of cells in each.

      (23) Figure 2B, cytes: Please explain the rRNA expression in spermatocytes (cytes) in the text.

      The decrease in rRNA signal intensity in spermatocytes was added.

      (24) Figure 2A, lines 147, low signals: Figure 2A did not show big differences between wild type and the mutant. What did the authors mean here? Lower levels of rRNAs in the mutant than in wild type. If so, please write the text in that way.

      We think that it is important to note that we were unable to find cells with upregulated rRNA signals, and therefore changed to “could not find cells with high signals of rRNAs and Rpl15 in meioc<sup>mo/mo</sup> spermatogonia”.

      (25) Figure 2E: Please add a schematic figure of a copy of rDNA locus such as Fig. S3A right.

      We added a schema of rDNA locus and primer sites such as Figure S3A right (now Figure 2F) in Figure 2E.

      (26) Figure S3A: This Figure should be in the main Figure. The quantification of Northern blots should be shown as a graph with statistical analysis.

      We added the quantification and transfer to the main Figure (Figure 2F).

      (27) Figure 4A: Please show single-color images (red or green) with merged ones.

      We added single-color images in the Figure 4A.

      (28) Line 198, Piwil1: Please explain what Piwil1 is briefly.

      We are sorry, but we could not quite understand the meaning of this comment. To show that Piwil1 is located in the nucleolus, we indicated it as (Figure 4A, arrowhead) in L209.

      (29) Line 198, Ddx4-positive: What is "Ddx4-positive"? Explain it for readers.

      Ddx4 is a marker for germinal granules, and the description was changed to reflect this.

      (30) Line 209, Fig. S4D-G: Please mention the method of the detection of piRNA briefly.

      We have described that we have sequenced small RNAs of 18-35 nt. Accordingly, we changed the term piRNA to small RNA.

      (31) Line 217: Please mention piwil1 homozygous mutant are inviable.

      We added in L231 that piwil1-/- mutants are viable.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      __Evidence, reproducibility and clarity__

      The manuscript explores mild physiological and metabolic disturbances in patient-derived fibroblasts lacking G6Pase expression, suggesting that these cells retain a "distinctive disease phenotype" of GSD1a. The manuscript is well written with well-designed experiments. However, it remains unclear whether these phenotypes genuinely reflect the pathology of GSD1a-relevant tissues. The authors did not validate these findings in a liver-specific G6pc knockout mouse model, raising concerns about the study's relevance to GSD1a. Additionally, the lack of sufficient in vivo evidence undermines the therapeutic potential of GHF201 for this disease. Overall, the study lacks a few key pieces of evidence to completely justify its conclusions at both fundamental and experimental levels.

      __Reply:__ We thank the reviewer for this general comment, which gives us the opportunity to better explain the scope of our work. The purpose and focus of this work are not to test the pathological relevance of skin fibroblasts to GSD1a pathology. We do not claim that skin fibroblasts are involved in GSD1a pathogenesis. It is also not a developmental work claiming to uncover a GSD1a pathogenic axis throughout embryonic development. As a matter of fact, since skin fibroblasts originate from the mesoderm embryonic germ layer and hepatocytes develop from the endoderm embryonic germ layer, it would even be unlikely that the pathological phenotype found in skin fibroblasts directly contributes to GSD1a pathology in model mice or in patients. Indeed, we are not aware of any dermatological contribution to GSD1a pathology in patients. However, our results suggest that in addition to the established and mutated organ (liver in the liver-specific G6pc knockout mouse model), other, relatively less studied, patho-mechanisms in distal tissues may also contribute to GSD1a pathology. Notably, this work is also not testing a therapeutic modality for GSD1a. Our work uses GSD1a disease models as a tool for demonstrating, or reviving, the concept of epigenomic landscape (Waddington, 1957): Different cell phenotypes, such as healthy and diseased, are established by innate metabolic differences between their respective cell environments, which impose epigenetic changes generating these different phenotypes. In this respect, our manuscript has a similar message to the one in the recently published paper Korenfeld et al (2024) Nucleic Acids Res 53:gkae1161. doi: 10.1093/nar/gkae1161: The Korenfeld et al paper shows that intermittent fasting generates an epigenetic footprint in PPARα-binding enhancers that is "remembered" by hepatocytes, leading to a stronger transcriptional response to imposed fasting by up-regulation of ketogenic pathways. In the same way, the diseased GSD1a status imposes metabolic changes, as detailed here, leading to permanent epigenetic changes, also described here, which are "remembered" by GSD1a fibroblasts and play a major role in the transcription of pathogenic genes in these patients' cells. This in turn is how the diseased state is preserved, even in cells not expressing the G6Pase mutant, which is the direct cause of the disease. We added this perspective to the Discussion to better highlight the key takeaway from our manuscript.

      Naturally, research such as ours with a claim on biological memory would involve ex vivo experiments where tissues are isolated from their in situ environments and tested for preservation of the original in situ phenotype. The few in vivo experiments we performed (Fig. 5) are mainly aimed at demonstrating that not only the phenotype, but also therapy response is "remembered" ex vivo: In the same way that the G6PC-loss-of-function liver responded positively to GHF201 therapy in situ, ex vivo cells not expressing G6PC also responded positively to the same therapy. This observation provides further support for "memorization" of the disease phenotype by cell types not expressing the mutant: Both the diseased phenotype and response to therapy were preserved ex vivo.

      Lastly, while interesting, validation of our findings in vivo (as suggested by the reviewer) is outside the scope of this manuscript. Such experiments, using the liver-targeted G6pc knockout mouse model, are the follow-up story, which is related to the origin of inductive signals that cause the curious and novel phenotype mechanism in GSD1a fibroblasts described in this manuscript. The scope and volume of such research would constitute a separate manuscript.

      Since dietary restriction is the only management strategy for GSD1a, the authors should clarify whether the patient fibroblast donors were on a dietary regimen and for how long. Given that fibroblasts do not express G6Pase, it is possible that the observed phenotype could be influenced by the patient's diet history.

      __Reply:__ We thank the reviewer for this important comment. We agree that it is important to note the dietary regimen assigned to the cohort of patients described in this study. We added an explanation of the patients' diets to the manuscript, as summarized below.

      Briefly, all patients besides patient 6894 were treated with the recommended dietary regimen for GSD1a as explained in GeneReviews (Bali et al (2021)). This dietary treatment (now added to the Methods section in the manuscript) makes it possible to maintain normal blood glucose levels, prevent secondary metabolic derangements, and prevent long-term complications. Specifically, this dietary treatment includes nocturnal nasogastric infusion of a high-glucose formula in addition to the usual frequent meals during the day. By constantly maintaining a nearly normal level of blood glucose, this treatment causes a remarkable decrease, although not normalization, of blood lactate, urate and triglyceride levels, as well as bleeding time values. A second layer in the treatment includes the use of uncooked starch in the dietary regimen to allow maintenance of normal blood glucose levels for long periods of time. Patient 6894 did not tolerate the uncooked cornstarch well and was therefore treated with a tailored dietary treatment planned by metabolic disease specialists and dedicated certified dieticians highly experienced in the management of pediatric and adult patients with GSDs and other inborn errors of metabolism. The biopsies were taken between 3 months and several years after the patients started the aforementioned dietary regimen.

      Importantly, the strict metabolic diet imposed on GSD1a patients might influence the observed phenotype described throughout the manuscript. This concept aligns with our claim that the GSD1a skin cells are affected by the dysregulated metabolism in patients in comparison to healthy individuals. Interestingly, while patient 0762 harbors a mutation in the SI gene in addition to the G6PC mutation and patient 6894 did not receive the same dietary regimen as other patients (as explained above), all patients do show similar disease-related phenotypes, perhaps highlighting the role of an early programming process that affected these cells due to the severe metabolic aberrations present in this disease from birth.

      One of the main pathological features of GSD1a is glycogen buildup. The authors should compare glycogen levels between healthy controls and GSD1a fibroblasts and provide a dot plot analysis.

      __Reply:__ We thank the reviewer for this important comment. We added glycogen levels of HC to Figure S2A and accordingly also edited the relevant text in the Results section.

      Figure S2A - As mentioned above, the authors should present healthy control vs. patient fibroblast glycogen data. Without this, the rationale for using GHF201 is questionable.

      __Reply:__ We thank the reviewer for this important comment. We added glycogen levels of HC to Figure S2A as mentioned above.

      Figure S2B-C - If the authors propose that GHF201 reduces glycogen and increases intracellular glucose in GSD1a fibroblasts, they need direct evidence. Either directly quantifying glycogen levels or even better would be a labeling experiment to confirm that the free intracellular glucose originates from glycogen. Additionally, the reduction in sample size from N=24 in glycogen analysis to N=3 in the glucose assay needs justification.

      __Reply:__ We thank the reviewer for this comment. To clarify, the results shown in Figure S2A left are based on a PAS assay, directly quantifying glycogen in cells with and without GHF201 treatment. We have now added HC glycogen levels as requested above. Regarding N, this is explained in Methods: In imaging experiments, N was determined based on wells from the experiments done in three independent plates, following the rationale that each well is independent from the others and reflects a population of hundreds of cells, as previously described (Lazic SE, Clarke-Williams CJ, Munafò MR (2018) What exactly is 'N' in cell culture and animal experiments? PLOS Biology 16(4):e2005282. https://doi.org/10.1371/journal.pbio.2005282; Gharaba S, Sprecher U, Baransi A, Muchtar N, Weil M. Characterization of fission and fusion mitochondrial dynamics in HD fibroblasts according to patient's severity status. Neurobiol Dis. 2024 Oct 15;201:106667. doi: 10.1016/j.nbd.2024.106667. Epub 2024 Sep 14. PMID: 39284371). Figure S2A right shows the glucose quantification experiment that we think the reviewer is referring to. Glucose increase is normally concomitant with glycogen reduction, and we therefore show these results in support of the glycogen reduction results. These glucose results are part of our metabolomics results obtained from the same cells (Figure 6), where glucose is one of the metabolites analyzed. This metabolomics analysis was repeated three times; therefore, N is 3. In summary, these results show that GHF201 directly contributes to glycogen reduction in GSD1a fibroblasts and concomitantly increases glucose levels.

      Figure S2B-C- It is not shown how GHF201 increases intracellular glucose? If glycophagy is a possibility, the authors should do an experiment to confirm this.

      __Reply:__ Assuming the reviewer's comment is related to Figure S2A right, glucose levels are only shown to validate the glycogen reduction results (also see point 4): When glycogen levels are reduced, especially by inhibition of glycogen synthesis, glucose levels are supposed to rise concomitantly, being spared as an indirect substrate of glycogen synthesis. There is no proof, and as a matter of fact we also do not assume, that the GHF201-mediated reduction in glycogen levels is a result of increased glycophagy: Glycophagy has been described in cell types with high glycogen turnover, e.g., muscle and liver cells, not fibroblasts. Additionally, glycophagy is a glycogen-selective process implicating STBD1, whose expression in skin fibroblasts is negligible (https://www.proteinatlas.org/ENSG00000118804-STBD1/tissue). On the other hand, glycogen in GSD1a does not accumulate in lysosomes. It is built up in the cytoplasm (Hicks et al (2011) Ultrastr Pathol 35: 183-196; Hannah et al (2023) Nat Rev Dis Primers DOI: 10.1038/s41572-023-00456-z). Therefore, we do not believe that GHF201 reduced glycogen by enhancing glycophagy. As we show, GHF201 activated several key catabolic pathways. It is more likely that activation of one of these pathways, the AMPK pathway, inhibited glycogen synthesis via phosphorylation and ensuing inhibition of glycogen synthase. Alternatively, excessive cytoplasmic glycogen might enter lysosomes by bulk autophagy or microautophagy (not by glycophagy), and GHF201, as an established lysosomal activator (Kakhlon et al (2021)), might induce lysosomal glycogenolysis by alpha glucosidase. However, as explained, the mechanism of action of GHF201 is not the topic of this manuscript, and we therefore did not delve further into it.

      Figure 2- How can GSD1a fibroblasts have significantly reduced OCR (Fig. 2B) but increased mitochondrial ATP production (Fig. 2H)?

      __Reply:__ We thank the reviewer for highlighting this important topic. OCR, measured in Fig. 2B, is an indirect measure of ATP production. Therefore, changes in OCR only measure the capacity of the mitochondria to produce ATP, and not the direct quantity of ATP. Other factors might influence ATP production, e.g., substrate availability and the activity of other metabolic pathways. On the other hand, the ATP Rate Assay (Figure 2h) provides a real-time direct measurement of ATP levels, incorporating coupling efficiency and P/O ratio assumptions. Therefore, these two measurements do not necessarily match. We will add this information to the relevant segment in the text to clarify why OCR is reduced and mitochondrial ATP production increased in GSD1a cells.

      Why do GSD1a fibroblasts show reduced glycolytic ATP (Figure 2h) despite increased glycolysis and glycolytic capacity (Fig 2J-K)?

      __Reply:__ We thank the reviewer for highlighting this important topic. ECAR measures medium acidification and thus reflects the production of lactic acid, which is a byproduct of glycolysis. However, medium acidification is also influenced by other factors that can acidify the extracellular environment, especially CO2 production, which can originate from the intramitochondrial Krebs cycle that produces reductive substrates for mitochondrial respiration, or OCR. Moreover, the buffering capacity of the Seahorse mito stress assay medium might mask changes in lactic acid production, leading to an underestimation of glycolytic activity. On the other hand, glycolytic ATP production measured by the ATP rate assay directly quantifies the rate of ATP production from glycolysis. Notably, there is a major difference between ECAR and the ATP rate assay: The ATP rate assay is less sensitive to variations in buffering capacity than ECAR measurements. This is because the ATP rate assay relies on inhibitor-driven changes in OCR and ECAR, rather than absolute pH values.

      Teleologically, as indicated, the increased ECAR in GSD1a cells represents a known compensatory response to deficient ATP production, namely stimulation of glycolysis (Figure 2i). To test the success of this known compensatory attempt, we applied the real-time ATP rate assay but, as explained, the two assays do not report the same quantities. We will add this information to the relevant segment in the text to clarify how reduced glycolytic ATP can be co-observed with increased glycolytic capacity.

      The authors should clarify how many healthy control and patient fibroblast lines were compared per experiment. Given the wide age range, the unexpectedly small error bars raise concerns about variability and statistical robustness.

      __Reply:__ We thank the reviewer for raising this topic. The number of samples per experiment is reported in the Methods section. As for the age range, patients' ages were matched to those of healthy controls to account for age differences, and experiments were performed within a similar passage range. This procedure allowed us to control for technical differences between samples that might arise due to different passages and ages. Importantly, the cohort of samples used in this manuscript included GSD1a patients of different ages, further implying the strength of the observed disease phenotype found in patients' cells, which exists regardless of the age and gender of patients. The HC samples were chosen to match age and gender, and passages were used in the recommended range (Hayflick L. The limited in vitro lifetime of human diploid cell strains. Experimental Cell Research, Volume 37, Issue 3, 1965, Pages 614-636; Hänzelmann S, Beier F, Gusmao EG, Koch CM, Hummel S, Charapitsa I, Joussen S, Benes V, Brümmendorf TH, Reid G, Costa IG, Wagner W. Replicative senescence is associated with nuclear reorganization and with DNA methylation at specific transcription factor binding sites. Clin Epigenetics. 2015 Mar 4;7(1):19. doi: 10.1186/s13148-015-0057-5. PMID: 25763115; PMCID: PMC4356053; Magalhães S, Almeida I, Pereira CD, Rebelo S, Goodfellow BJ, Nunes A. The Long-Term Culture of Human Fibroblasts Reveals a Spectroscopic Signature of Senescence. Int. J. Mol. Sci. 2022, 23, 5830. https://doi.org/10.3390/ijms23105830). Finally, regarding the error bars, assuming the reviewer is referring to all experiments, their small size indicates that results are consistent within each compared group and reflects the robustness of the results. Further, to ensure statistical robustness we used bootstrapping, 95% confidence intervals and other statistical methodologies that were designed to increase the validity of the conclusions drawn from different experiments.

      Figure 5- The study should include Tamoxifen-untreated mice as a control to properly assess the efficacy of GHF201 in regulating glucose-6-P and glycogen levels.

      __Reply:__GHF201 reduced liver glucose-6-phosphate (G6P) with p-/-* mice livers and their normalization by GHF201.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      General comments: the authors propose a very intriguing concept, that metabolic abnormalities trigger epigenetic changes in tissues distal from the disease site, even in cells in which the affected gene is not expressed. This is demonstrated in primary fibroblasts from patients with Glycogen Storage Disease type 1a (GSD1a). The authors provide a large amount of data to support the compelling concept of "Disease-Associated Programming", a term that they have coined to describe this effect. The level of novelty is very high and so is the impact of the study, since the above may apply to many different pathological conditions. Although the study is well performed and employs multiple approaches and analyses to address the raised hypothesis, there are some limitations and concerns that need to be addressed by the authors.

      __Reply:__ We thank the reviewer for this comment and will address each comment raised.

      The different phenotypic characteristics are only demonstrated in skin fibroblasts which is not sufficient to support the conclusions made in the Discussion about the general applicability of the proposed disease-induced, metabolite-driven epigenetic programming to all cells and tissues. The authors should discuss this as a limitation of the study and general conclusions should be formulated with more caution.

      __Reply:__ We concur with this comment and accept that this is a general limitation of the study. We added a reservation clause at the beginning of the Discussion section.

      The authors describe a range of alterations in patients' fibroblasts as compared to healthy control fibroblasts. However, they draw parallels to the liver which is the organ primarily affected by GSD1a, stating that tissues other than the liver such as skin fibroblasts phenocopy the liver pathology (Discussion). Extrapolation of the findings to the liver is also made in the section "ATAC-seq, RNA-seq and EPIC methylation data integration". Here, the authors comment on the finding that identified genes are associated with tumour formation and draw parallels to hepatocellular carcinoma which is an important co-morbidity of GSD1a. These correlations, although interesting, should be presented as indications and not as "strong links". A major difference between fibroblasts and liver cells in the case of GSD1a is the massive accumulation of glycogen in the liver. This is a major metabolic feature which largely defines the disease's pathology. In addition to the similarities in the pathological features between the liver and other tissues such as fibroblasts, the authors should highlight this major difference and discuss their findings within this context.

      __Reply:__ We thank the reviewer for this important comment. We have toned down the language correlating the regulation of gene expression between fibroblasts and liver in GSD1a. We have also alluded to the key metabolic difference between fibroblasts and liver - glycogen levels and turnover - in the second paragraph of the Discussion. We are aware that if our deep analyses were conducted on a different tissue with different basal metabolism the results might have been different. However, the GSD1a-pathogenic findings in fibroblasts suggest that they might also contribute to pathology in situ, perhaps by modulating the expression of functionally redundant genes.

      For basically all experiments performed in the study the authors follow the approach of culturing cells for 48 hours under serum and glucose starvation, followed by 24-hour cultivation in complete medium. This was practiced in a previous study by the authors (PMID: 34486811) to enhance the levels of glycogen in skin fibroblasts of patients with Adult Polyglucosan Body Disease. For the current study the selection of this treatment protocol is not sufficiently justified. Although differences are described between patients' fibroblasts and controls under these conditions, it would have been interesting to address the reported parameters also at standard culturing conditions. This might be too much to ask for the purposes of this revision, but the authors may provide a better justification for the selection of the above treatment protocol and discuss whether the described phenotypic features are constitutive abnormalities present at all times or are induced by the metabolic stress imposed on the cells through this treatment.

      __Reply:__ We thank the reviewer for pointing out this important topic. Previously, we used the 72 h condition (48 h starvation followed by 24 h glucose supplementation) to attain two goals: generation of glycogen burden by excessive glucose re-uptake after glucose starvation, and induction of basal autophagy by serum starvation so as to sensitize detection of the action of the autophagic activator GHF201 on a background of already induced autophagy. As stated, this 72 h condition was used previously in other GSD cell models (Kakhlon et al (2021) - GSDIV, Mishra et al (2024) - GSDIII, GSDII - in preparation), so we decided to use it in this work as well to enable cross-GSD comparison of GHF201 efficacy in GSD cell models. Moreover, as shown in Figure 1, the largest differences between HC and GSD1a fibroblasts, especially in lysosomal and mitochondrial features, were observed at the 72 h time condition. We therefore used this condition in all other fibroblast experiments presented in this manuscript. Our ultimate aim was to test whether the metabolic reprogramming induced in situ by the patients' diseased state before culturing generates stable epigenetic modifications that withstand seclusion from the original in situ environment. Thus, using the non-physiological 72 h condition, after the fibroblasts were cultured in full media remote from the in situ environment, can only confirm the stability and environment-independence of these metabolically driven epigenetic modulations. We now provide this justification at the beginning of the Results section.

      In the Figures, the authors provide comparisons between controls and patient fibroblasts (+/- GHF201). Although the authors provide the respective p values in all figures, it is not clear which differences are considered significant and which are not. Since some of the indicated p values are > 0.0. The authors should indicate which of these changes are significant or non-significant and these should be presented and discussed accordingly in the text.

      __Reply:__ We thank the reviewer for highlighting this important topic. We will add this information to the Methods section. Throughout the manuscript, p https://doi.org/10.1080/00031305.2018.1529624, Cumming, G. (2013). The New Statistics: Why and How. Psychological Science, 25(1), 7–29. https://doi.org/10.1177/0956797613504966 (Original work published 2014)). Along with the p values, we presented all data points in each comparison and added bootstrap-derived 95% confidence intervals as well. Since our sample size was small, we chose to focus on effect sizes, to use a higher p value threshold and to implement various advanced methodologies that allowed us to find important biological patterns.
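
      To illustrate the kind of bootstrap-derived 95% confidence interval referred to in this reply, a minimal sketch of a percentile bootstrap for the difference in group means is given here. It is an assumption-laden illustration, not the procedure used in the manuscript: the function name, the number of resamples, the choice of the mean as the statistic and the example values are all hypothetical.

```python
import random
import statistics


def bootstrap_ci_mean_diff(group_a, group_b, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(group_b) - mean(group_a).

    Hypothetical helper for illustration only; the manuscript's exact
    bootstrap procedure is not specified in this reply.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each group independently, with replacement, at its own size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.mean(resample_b) - statistics.mean(resample_a))
    diffs.sort()
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the resampled differences.
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up example values (arbitrary units), not data from the study.
healthy_control = [1.0, 1.1, 0.9, 1.2]
patient = [2.0, 2.2, 1.9, 2.1]
low, high = bootstrap_ci_mean_diff(healthy_control, patient)
print(f"95% CI for the difference in means: [{low:.2f}, {high:.2f}]")
```

      An interval that excludes zero would point to a consistent difference between the groups, and its width conveys the size and precision of the effect, which is the information a p value alone does not give.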

      In Figure S2A, the authors show a reduction of glycogen levels in GSD1a fibroblasts following treatment with GHF201. Glycogen accumulation is central to this study, since a) is considered by the authors "a disease marker which is reversed by GHF201" - this is demonstrated in the liver of L.G6pc-/- mice and, according to the authors, replicated in the fibroblasts, b) as suggested by the authors it is the biochemical aberration that drives epigenetic modifications generating "disease memory". It is therefore important to appreciate whether GSD1a cells display pathologically increased levels of glycogen. This is also pertinent to the lack of G6PC expression in fibroblasts. The authors should include in Fig. S2A glycogen measurements of HC control fibroblasts cultured under the same conditions to compare with the levels present in GSD1a cells.

      __Reply:__ We thank the reviewer for highlighting this issue. We added glycogen levels of HC to Figure S2A as requested. As expected, glycogen levels are similar between HC and GSD1a fibroblasts because neither wild-type G6PC1 in HC nor mutated G6PC1 in GSD1a fibroblasts is expressed. We have now corrected the manuscript text suggesting that glycogen is accumulated in GSD1a fibroblasts and rephrased the text to express the more versatile state where epigenetic modulation could be mediated by different metabolic perturbations according to the expression profile: G6PC1 mutant expressers (notably liver and kidney cells) could inhibit p-AMPK by glycogen accumulation, while non-expressers could inhibit p-AMPK by lowering NAD+. Text changes related to this new concept are found in the Results section "Exploring epigenetics as a phenotypic driver in GSD1a fibroblasts by ATAC-seq analysis" and in the Discussion section "Metabolic-driven, disease-associated programming of cell memory."

      Comparisons between protein levels (AMPK/pAMPK, Sirt1, TFEB, p62 and PGC1a) are made on the basis of fluorescence intensity in immunostained cells. These results need to be supported by relevant western blot images to exclude that binding of the antibodies to unspecific sites contributes to the measured fluorescence.

      __Reply:__ We thank the reviewer for this comment, which allows us to clarify the reasoning behind the methods selected for identifying the main markers. Throughout the manuscript we employed both Western blot and immunofluorescence experiments. We believe that immunofluorescence is a more robust and efficient method for the following reasons: i. It allows us to focus on proteins in their native state; ii. Immunofluorescence allows us to observe proteins in relation to their location in the cells (for example, TFs in the nuclear area); iii. Immunofluorescence allows us to focus on each cell and exclude cells which are dead, stressed or of low viability; iv. Immunofluorescence allows us to generate much more data. For these reasons, we used immunofluorescence for the main proteins explored in this work. In each immunofluorescence experiment we added a control for the secondary antibody alone, to verify that the signal is related to the antibodies only. This information can be added if requested. Importantly, some of the antibodies used were recommended for immunofluorescence and not for Western blot. As the reviewer requested, we now provide Western blot results for proteins that produced a signal with the antibodies in Western blots; all markers mentioned except TFEB were added to Figure S3d.

      The authors demonstrate that treatment of GSD1a fibroblasts with histone deacetylase inhibitors reverses some of the phenotypic alterations. Given that GHF201 also improves these phenotypic differences it would be interesting to address whether GHF201 has any effect on histone acetylation.

      __Reply:__ We strongly agree with this comment and have therefore tested the effect of GHF201 on H3K27 acetylation levels, as shown in Figure 3f, and on the deacetylase SIRT-1, as shown in Figure 3e, Figure S3d and the representative images in Figure S2b.

      The authors report reduced levels of the transcription factors PGC1α and TFEB in GSD1a fibroblasts. Does this correlate with lower levels of expression of PGC1α and TFEB target genes in the RNA-seq experiments?

      __Reply:__ We thank the reviewer for raising this topic. Since there were thousands of differentially expressed genes and we cannot mention them all, we focused on the most important ones, which comprise the key pathways we wanted to highlight, as described in the Results section. We have now linked in the Results section examples of PGC1α and TFEB target genes that were reduced due to lower levels of these transcription factors in GSD1a, as compared to HC cells. Importantly, a full list of the genes from the RNA-seq experiment can be found in Table S3. Genes regulated by TFEB contain the CLEAR (Coordinated Lysosomal Expression and Regulation) motif. Two notable and biologically important genes regulated by CLEAR-binding TFs such as TFEB are cathepsin L and S (Figure 6A right), both of which were reduced in GSD1a and are now elaborated in the Results section referring to Figure 6a right. Additionally, Table S3 shows differentially expressed genes in GSD1a cells, including many other lysosome-related genes that are downmodulated in GSD1a; we have now added another important example, ATP6V0D2, to the Discussion as the reviewer suggested. As for PGC1alpha, a notable gene whose expression is up-modulated by PGC1alpha, and which is down-modulated in GSD1a, is ALDH1A1 (Figure 6a right). In addition, we have now added PPARG and its coactivators alpha and beta to the Discussion as requested by the reviewer; these genes are shown in Table S3 and are downmodulated in GSD1a. Finally, the transcriptional effect of PGC1alpha and TFEB is also mentioned in the Discussion within the cell phenotyping section, where we describe the deep impact of dysregulation of the NAD+/NADH-Sirt-1-TFEB regulatory axis on the cell phenotype at all the levels described in the manuscript.

      Please revise the following sentences as the statements made are not adequately supported by the provided data a. "This NAD+/NADH increase correlated with reduced cytotoxicity and increased cell confluence (Figure 3d) suggesting that NAD+ availability prevails over ATP availability as an effector of cell thriving in GSD1a cells."

      __Reply:__ If one ranks treatments according to NAD+/NADH (Figure 3c) and according to cytotoxicity (Figure 3d left) and cell confluence (Figure 3d right), then the mentioned correlation can be supported. ATP availability is compromised by gramicidin, yet gramicidin, which also increased NAD+/NADH, reduced cytotoxicity and enhanced cell confluence.

      b. "....in further support that respiration-dependent NAD+ availability mediate GHF201's corrective effect in GSD1a cells."

      __Reply:__ Our data (Figure 3c) show that GHF201 increased NAD+/NADH both alone and with gramicidin.

      Please indicate on the densitometry graph of Fig. 10b the treatment (HDACi), for better visibility.

      __Reply:__ We agree and have corrected the Figure as requested.

      The reference list (n=160) is probably too long for a research article.

      __Reply:__ The number of references reflects the length and depth of the manuscript, and we believe that each reference merits its place. We agree that the number of references is large, but we are not sure which criteria to use to exclude references and reduce them to a more acceptable number, which we assume would be determined by the publishing journal.

      The study is of high novelty and impact, as it proposes a so far undescribed biological mechanism contributing to disease pathology that could apply for general pathological conditions. Although this is a compelling concept, it is only demonstrated in skin fibroblasts which limits its applicability at an organismal level.

      __Reply:__ We thank the reviewer for this comment and for raising the important points that allowed us to improve our manuscript. Please see our reply to point 1.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements [optional]

      We thank all three Reviewers for appreciating our work and for sharing constructive feedback to further enhance its quality. It is really gratifying to read that the Reviewers believe that this work will be of interest to a broad audience and will be suitable for a high-profile journal. Further, the experiments suggested by the Reviewers will add value to the work and will substantiate our findings. It is important to highlight that we have already performed most of the suggested experiments, except for a couple that we plan to carry out during the full revision. Please find below the details of the experiments performed and planned to address the Reviewers' comments.

      2. Description of the planned revisions

      Reviewer #1

      Comment 6. In Figure 6A, B, does the Orai3 western blot show any of the heavier bands seen in the ubiquitination IP if you show the whole blot? It should.

      Reviewer #2

      Comment 5. Fig. 6A and 6B. Show the full Orai3 and Ubiquitin WBs. As presented the figure current just shows that there are ubiquitin proteins in Orai3 pull down, not that Orai3 is ubiquitinated.

      Reviewer #3

      Comment 3. In the scheme in Fig. 10, the authors highlight that Orai3 is ubiquitinated. Do they have any idea where the site of action of ubiquitination in Orai3 is located?

      Response: We thank Reviewers 1, 2 and 3 for their queries on the co-immunoprecipitation assays performed to study Orai3 ubiquitination. The Reviewers are asking about the ubiquitination status of Orai3 and the potential sites of Orai3 ubiquitination. To address these comments, we are planning to perform co-immunoprecipitation assays with Orai3 mutated at potential ubiquitination sites. We have already performed bioinformatic analysis, which revealed the presence of three potential ubiquitination sites on Orai3: K2 (in the N-terminal region), K274 and K279 (in the C-terminal region). We will mutate these lysine residues in the Orai3 protein via site-directed mutagenesis and check the Orai3 ubiquitination status. These experiments will answer the question raised by the Reviewers and strengthen the Orai3 ubiquitination data.

      Please refer to the diagrammatic illustration below showing potential ubiquitination sites on Orai3:

      Reviewer #2

      Comment 7. Also, all the imaging and pull down do not prove conclusively direct interaction between MARCH8 and Orai3, they rather show that the proteins are in the same complex. Although it is unlikely best for the text to be moderated accordingly.

      Response: We understand the concern raised by Reviewer 2 regarding direct or indirect interaction of MARCH8 and Orai3. Hence, we are planning to perform co-immunoprecipitation assays in which we delete the MARCH8 interacting domain in the Orai3 protein and check for direct interaction of these proteins. Bioinformatic analysis and a literature survey have highlighted two possible MARCH8 interacting domains in Orai3. The first domain is in the 2nd loop region, between the 2nd and 3rd transmembrane domains, at the LMVXXXL (AA113-120) motif, and the second domain is at the GXXXG (AA235-239) motif in the 3rd loop region of Orai3. We will remove these domains from the Orai3 protein individually and check the effect on MARCH8 interaction. These experiments will provide conclusive evidence of direct interaction between Orai3 and MARCH8.

      Please refer to the diagrammatic illustration below displaying potential MARCH8 binding sites on Orai3:

      3. Description of the revisions that have already been incorporated in the transferred manuscript


      Reviewer #1

      Comment 1. The observation that both transcriptional regulation and protein degradation of Orai3 is regulated downstream of one transcription factor is not, in and of itself, entirely surprising. All proteolytic components are transcriptionally regulated and this phenomenon is likely relatively common. However, what I do think is both impressive and important is that the authors have characterized both components of the pathway within a disease context. While I am not going to search the literature for how often transcription and proteolysis are co-regulated for other proteins, it is the case for many short-lived proteins and perhaps many others. As such, discussion throughout the abstract and introduction that co-regulation of these processes is unprecedented should be removed.

      Response: We thank the Reviewer for finding our work both impressive and important. Further, we understand the Reviewer’s point that transcription and proteolysis may be co-regulated for other proteins. However, our extensive literature search did not reveal such scenarios. Therefore, to the best of our knowledge, we are showing for the first time, in a single study, that the same transcription factor regulates both transcription and protein degradation of the same target in a context-dependent manner. If the Reviewer would still recommend modifying the text in the abstract and introduction, we will do so.

      Comment 2. In discussing figure 1, the authors switch from claiming to be studying NFATc binding to studying NFAT expression. This use of 2 different naming conventions is certain to confuse readers; the authors should use the approved current naming system in referring to NFAT isoforms. In which case NFAT2 is NFATc1.

      Response: We would like to thank the Reviewer for highlighting this point. We have effectively addressed this comment by changing the nomenclature of NFAT2 to NFATc1 throughout the manuscript text and figures.

      Comment 3. The ChIP analyses in figures 1H and 7D are important findings, however, there is missing information. Typically, ChIP is used to validate putative binding sites; as such, one would expect 3 separate qPCR reactions for Orai3, not one. It is also important to note that qPCR products should be uniform in size and under 100 bp; here, the product size is not stated. Finally, demonstrating that an antibody targeting ANY other NFAT isoform fails to pull down whatever product this is would increase confidence considerably.

      Also, the gold standard for validating ChIP is to mutate the sites and eliminate binding. The "silver" standard would be to mutate them in your luciferase vector and demonstrate that NFATc1 no longer stimulates luciferase expression. Since neither of these was done, the ChIP data provided should not be considered formally validated.

      Response: We thank the Reviewer for raising this highly relevant concern. In this revised manuscript, we have addressed this comment by performing several additional experiments. The new data provided in the revised manuscript corroborate our earlier results. Indeed, these data have notably strengthened our work.

      In the revised manuscript, we performed a ChIP assay in which we increased the number of sonication cycles to 35 so as to shear the chromatin to around 100 bp. Next, we set out to design primers to amplify the individual NFATc1 binding sites on the Orai3 promoter, but due to the close proximity of these sites, we could design only two primer sets. The first primer set amplifies the -1017 bp binding site and the second set amplifies the -990 and -920 bp binding sites. Further, as suggested by the Reviewer, we performed immunoprecipitation with the four isoforms of NFAT. Our results show that only the NFATc1 pulldown shows significant enrichment of the Orai3 promoter with both primer sets as compared to the IP mock samples and the other NFAT isoforms (Figure 1J). Hence, our data reveal that only NFATc1 binds to these predicted sites on the Orai3 promoter and that it does not show a preference among these binding sites.

      Further, as suggested by the Reviewer, we mutated the Orai3 promoter in the luciferase vector by deleting the individual NFATc1 binding sites and also cloned a truncated Orai3 promoter with no NFATc1 binding sites into the luciferase vector. The luciferase assays with these mutant and truncated promoters show that, upon co-expression of NFATc1, the luciferase activity of the mutant Orai3 promoters lacking an individual NFATc1 binding site is significantly reduced in comparison to the wild-type Orai3 promoter. Furthermore, the maximum decrease in luciferase activity was seen with the truncated Orai3 promoter with no NFATc1 binding sites (Figure 1I). These results show that NFATc1 binds to the predicted binding sites on the Orai3 promoter. Taken together, the additional ChIP assays with the four isoforms of NFAT and the luciferase assays with mutated and truncated Orai3 promoters validate the transcriptional regulation of Orai3 by NFATc1.

      Comment 4. In figures 2 and 3, only one cell line is used to represent each of 3 conditions of pancreatic cancer. That is insufficient to make generalized conclusions; some aspects of this figure (expression and stability, not function) should be extended to 2 to 3 cell lines/condition. TCGA data validating this point would also be helpful.

      Response: We really appreciate the feedback given by Reviewer 1. To strengthen our manuscript, we have addressed this comment by performing experiments in 2 cell lines/condition of pancreatic cancer. This new data in the revised manuscript provides substantial evidence for the dichotomous regulation of Orai3 by NFATc1.

      In the revised manuscript, we carried out NFATc1 overexpression and VIVIT-mediated NFAT inhibition studies in three additional cell lines: BXPC-3 (non-metastatic), ASPC-1 (invasive) and SW1990 (metastatic). The results in these cell lines support our earlier findings, as both overexpression of NFATc1 and VIVIT-mediated NFAT inhibition lead to transcriptional upregulation of Orai3 in BXPC-3 (non-metastatic) (Figure S3A, D), ASPC-1 (invasive) (Figure S3G, J) and SW1990 (metastatic) (Figure S3M, P). These results are similar to our earlier data from MiaPaCa-2 (non-metastatic), PANC-1 (invasive) and CFPAC-1 (metastatic) cells. Further, NFATc1 overexpression leads to an increase in Orai3 protein levels in BXPC-3 (non-metastatic) (Figure S3B, C) and a decrease in Orai3 protein levels in ASPC-1 (invasive) (Figure S3H, I) and SW1990 (metastatic) (Figure S3N, O). Moreover, VIVIT transfection leads to a decrease in Orai3 protein levels in BXPC-3 (non-metastatic) (Figure S3E, F) and an increase in Orai3 protein levels in ASPC-1 (invasive) (Figure S3K, L) and SW1990 (metastatic) (Figure S3Q, R). The findings in these cell lines recapitulate the data obtained earlier from the MiaPaCa-2 (non-metastatic), PANC-1 (invasive) and CFPAC-1 (metastatic) cell lines. Therefore, these new data support our conclusion regarding the dichotomous regulation of Orai3 by NFATc1 across the three conditions of pancreatic cancer.

      Comment 5. Upon finding that NFAT inhibition stimulates Orai3 transcription (same as O/E), the authors essentially conclude that this confirms regulation of Orai3 by NFAT and that there must be compensation. This is not supported by any data; the use of siRNA validates that Orai3 has some dependence on NFATc1 for transcription, but the nature of this relationship is not adequately explained.

      Response: We thank the Reviewer for asking this question. In our manuscript, we performed NFATc1 inhibition studies using VIVIT and siRNA-mediated NFATc1 knockdown. Both of these assays show an increase in Orai3 mRNA levels in all non-metastatic, invasive and metastatic pancreatic cancer cell lines. To understand whether the increase in Orai3 mRNA levels occurs via transcriptional regulation, we performed a luciferase assay, which showed that VIVIT-mediated NFAT inhibition leads to an increase in luciferase activity, suggesting the binding of other transcription factors to the Orai3 promoter. To corroborate this hypothesis, in our revised manuscript, we performed luciferase assays with the wild-type Orai3 promoter and the truncated Orai3 promoter with no NFATc1 binding sites. NFAT inhibition via VIVIT transfection led to an increase in luciferase activity with both the wild-type and the truncated Orai3 promoter (Figure S2A). Hence, removal of the NFATc1 binding sites had no significant effect on luciferase activity, suggesting that, apart from NFATc1, other endogenous transcription factors are involved in regulating Orai3 transcription. We have not identified all the transcription factors that can modulate Orai3 upon NFAT inhibition, as this is beyond the scope of this study. We sincerely hope that Reviewer 1 will be satisfied with this additional data.

      Reviewer #2

      Comment 1. Figure 1 all overexpression no evidence of endogenous NFAT2 regulating Orai3. I realize there may be limitations on available NFAT isoform specific antibodies so it is not essential to directly show this but a comment to that effect in the paper would be useful.

      Response: We apologize to the Reviewer for not highlighting the NFAT2 (NFATc1) loss-of-function data effectively. In __Figure 3__ and __Supplementary Figure 2__ of the original manuscript, we showed VIVIT-mediated NFAT inhibition and siRNA-induced NFATc1 silencing data to provide evidence that endogenous NFATc1 regulates Orai3.

      Comment 2. Figure 1F. Show RNA levels of Orai3 following overexpression of the other NFAT isoforms.

      Response: As suggested by the Reviewer, in the revised manuscript, we overexpressed the four NFAT isoforms: NFATc2, NFATc1, NFATc4 & NFATc3 and checked Orai3 mRNA levels. qRT-PCR analysis shows that overexpression of NFATc1 results in the highest and significant increase in Orai3 mRNA levels compared to the empty vector and other NFAT isoforms (Figure 1F). This data corroborates the western blot data of NFAT isoforms overexpression highlighting the transcriptional regulation of Orai3 by NFATc1.

      Comment 3. Fig. S3D, E. For both MARCH3 and 8 higher expression levels correlate with better survival whereas in the text it is stated that this is the case only for MARCH8. Please correct.

      Response: The survival analysis of pancreatic cancer patients with low MARCH3 and MARCH8 levels shows that around 30% of patients with low MARCH3 levels survived for 5.5 years, whereas in case of MARCH8 30% of patients with high MARCH8 levels survived for >7.5 years. Hence high MARCH8 expression in pancreatic cancer patients provided significant survival advantage compared to high MARCH3 levels. Therefore, in the text, we meant that compared to MARCH3, higher MARCH8 levels correlate with better survival. As suggested by the Reviewer, we have modified the text to make this point clearer.

      Comment 4. For the 2APB stimulation experiments there is a large variation in the level of the response between experiments even for the same cell type. For example, compare the level of the 2APB-stimulated Orai3 influx between Fig. 4H and 5C on the MiaPaCa-2 cells. Also there doesn't seem to be a correlation between the levels of Orai3 protein from WB and the 2APB stimulated entry among the different cell lines. This needs to be addressed and differences explained.

      Response: We understand the concern raised by Reviewer 2 regarding calcium imaging experiments in MiaPaCa-2 cell line. Therefore, in the revised manuscript, we repeated calcium imaging experiments in MiaPaCa-2 and updated the representative traces as well as quantitative analysis (Figure 2D, E, 3D, E, 4H, I, S2L, M). Further, we have discussed this point in the text of the manuscript.

      Comment 6. Fig. 6C and 6D. Show the line in 6C from which the intensity profile in 6D was generated. Also give the details of the imaging setup in methods: size of the pinhole, imaging mode, etc. The colocalization is not very convincing.

      Response: As recommended by the Reviewer, in the revised manuscript, we have indicated the region used for intensity profile generation by drawing a line in the representative image (Figure 6D). Further, we have updated the methodology of colocalization microscopy with details of the size of the pinhole and imaging mode.

      Comment 8. May be worth showing that overexpression of MARCH8 in the metastatic cell lines decreases their migration and metastasis as the argument is that these cells need high Orai3 but not too high. So, it would be predicted that overexpression of MARCH8 should lower Orai3 levels enough to prevent their metastasis.

      Response: We would like to thank the Reviewer for this highly relevant suggestion. In our revised manuscript, we carried out transwell migration assays with MARCH8 overexpression as well as MARCH8 knockdown in CFPAC-1 (metastatic) cells. Our data shows that stable lentiviral knockdown of MARCH8 increased the number of migrated CFPAC-1 cells compared to shNT CFPAC-1 cells while MARCH8 overexpression decreased the number of migrated CFPAC-1 cells compared to empty vector control cells (Figure 9F, G). Therefore, as pointed out by the Reviewer, MARCH8 overexpression lowers Orai3 levels in metastatic pancreatic cancer cells and hinders their metastatic potential.

      Comment 9. Fig. 10. Show higher levels of Orai3 protein in the metastatic side.

      Response: As suggested, we have updated the summary figure (Figure 10) showing higher Orai3 protein levels in the metastatic side.

      Comment 10. Please show all full WBs in the supplementary data.

      Response: As recommended by the Reviewer, we have provided all full western blots in a supplementary file (Supplementary File 1).

      Reviewer #3


      Comment 1. The authors show that MARCH8 physically associates with Orai3 using Co-IP and Co-localization studies. For the co-localization studies the authors should still provide a quantitative analysis. Furthermore, can the authors detect FRET between March and Orai3? Can you please state the labels used in the co-localization experiments also in the figure legend.

      Response: As suggested by Reviewer 3, in the revised manuscript, we have provided a quantitative analysis of Orai3 and MARCH8 co-localization. Further, we have stated the labels used in the co-localization experiment in the figure legend of the revised manuscript. Unfortunately, we could not perform a FRET assay between Orai3 and MARCH8 due to limited resources. Instead, as discussed in the planned revisions section, we are planning to perform a co-immunoprecipitation assay with mutated Orai3 protein in which the MARCH8 interacting domains are deleted, to investigate direct interaction of Orai3 and MARCH8. We hope that Reviewer 3 will be satisfied with this experiment.

      Comment 2. In the abstract it is only getting clear at the end that pancreatic cancer cells are used. It would be great if the authors could introduce this fact already more at the beginning of the abstract.

      Response: As recommended by the Reviewer, in the revised manuscript, we have introduced the use of pancreatic cancer cells at the beginning of the abstract.

      Comment 4. In other cancer types recent reports suggest a co-expression of Orai1 and Orai3 and even the formation of heteromers. Does only Orai3 or also Orai1 play a role in pancreatic cancer cells? Could there be a difference in degradation when Orai3 forms homomers or heteromers with Orai1?

      Response: We thank the reviewer for asking this interesting question. There is only one report on Orai1’s role in pancreatic cancer. It suggested that Orai1 can contribute to the apoptotic resistance of pancreatic cancer cells (Kondratska et al. BBA-Molecular Cell Research, 2014). However, only one cell line, PANC-1, was used in that study. In contrast, our earlier work and another study have demonstrated that Orai3 drives pancreatic cancer metastasis (Arora et al. Cancers, 2021) and proliferation (Dubois et al. BBA-Molecular Cell Research, 2021), respectively. Therefore, emerging literature suggests that both Orai1 and Orai3 can contribute to different aspects of pancreatic cancer progression. Whether Orai1 and Orai3 form heteromers in pancreatic cancer cells, however, remains unexplored. Further, we believe that the degradation machinery and the underlying molecular mechanisms would be analogous for Orai3 homomers and heteromers. Nonetheless, the rate of degradation may differ between Orai3 homomers and heteromers, as the literature suggests that proteins are usually more stable in large heteromeric protein complexes.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript describes a series of experiments documenting trophic egg production in a species of harvester ant, Pogonomyrmex rugosus. In brief, queens are the primary trophic egg producers, there is seasonality and periodicity to trophic egg production, trophic eggs differ in many basic dimensions and contents relative to reproductive eggs, and diets supplemented with trophic eggs had an effect on the queen/worker ratio produced (increasing worker production).

      The manuscript is very well prepared and the methods are sufficient. The outcomes are interesting and help fill gaps in knowledge, both on ants as well as insects, more generally. More context could enrich the study and flow could be improved.

      We thank the reviewer for these comments. We agree that the paper would benefit from more context. We have therefore greatly extended the introduction.

      Reviewer #2 (Public Review):

      The manuscript by Genzoni et al. provides evidence that trophic eggs laid by the queen in the ant Pogonomyrmex rugosus have an inhibitory effect on queen development. The authors also compare a number of features of trophic eggs, including protein, DNA, RNA, and miRNA content, to reproductive eggs. To support their argument that trophic eggs have an inhibitory effect on queen development, the authors show that trophic eggs have a lower content of protein, triglycerides, glycogen, and glucose than reproductive eggs, and that their miRNA distributions are different relative to reproductive eggs. Although the finding of an inhibitory influence of trophic eggs on queen development is indeed arresting, the egg cross-fostering experiment that supports this finding can be effectively boiled down to a single figure (Figure 6). The rest of the data are supplementary and correlative in nature (and can be combined), especially the miRNA differences shown between trophic and reproductive eggs. This means that the authors have not yet identified the mechanism through which the inhibitory effect on queen development is occurring. To this reviewer, this finding is more appropriate as a short report and not a research article. A full research article would be warranted if the authors had identified the mechanism underlying the inhibitory effect on queen development. Furthermore, the article is written poorly and lacks much background information necessary for the general reader to properly evaluate the robustness of the conclusions and to appreciate the significance of the findings.

      We thank the reviewer for these comments. We agree that the paper would benefit by having more background information and more discussion. We have followed this advice in the revision.

      Reviewer #3 (Public Review):

      In "Trophic eggs affect caste determination in the ant Pogonomyrmex rugosus" Genzoni et al. probe a fundamental question in sociobiology, what are the molecular and developmental processes governing caste determination? In many social insect lineages, caste determination is a major ontogenetic milestone that establishes the discrete queen and worker life histories that make up the fundamental units of their colonies. Over the last century, mechanisms of caste determination, particularly regulators of caste during development, have remained relatively elusive. Here, Genzoni et al. discovered an unexpected role for trophic eggs in suppressing queen development - where bi-potential larvae fed trophic eggs become significantly more likely to develop into workers instead of gynes (new queens). These results are unexpected, and potentially paradigm-shifting, given that previously trophic eggs have been hypothesized to evolve to act as an additional intracolony resource for colonies in potentially competitive environments or during specific times in colony ontogeny (colony foundation), where additional food sources independent of foraging would be beneficial. While the evidence and methods used are compelling (e.g., the sequence of reproductive vs. trophic egg deposition by single queens, which highlights that the production of trophic eggs is tightly regulated), the connective tissue linking many experiments is missing and the downstream mechanism is speculative (e.g., whether miRNA, proteins, triglycerides, glycogen levels in trophic eggs is what suppresses queen development). Overall, this research elevates the importance of trophic eggs in regulating queen and worker development but how this is achieved remains unknown.

      We thank the reviewer for these comments and agree that future work should focus on identifying the substances in trophic eggs that are responsible for caste determination.  

      Reviewer #1 (Recommendations For The Authors):

      Introduction:

      The context for this study is insufficiently developed in the introduction - it would be nice to have a more detailed survey of what is known about trophic eggs in insects, especially social insects. The end of the introduction nicely sets up the hypothesis through the prior work described by Helms Cahan et al. (2011) where they found JH supplementation increased trophic egg production and also increased worker size. I think that the introduction could give more context about egg production in Pogonomyrmex and other ants, including what is known about worker reproduction. For example, Suni et al. 2007 and Smith et al. 2007 both describe the absence of male production by workers in two different harvester ants. Workers tend to have underdeveloped ovaries when in the presence of the queen. Other species of ants are known to have worker reproduction seemingly for the purpose of nutrition (see Heinze and Hölldobler 1995 and subsequent studies on Crematogaster smithi). Because some ants, including Pogonomyrmex, lack trophallaxis, it has been hypothesized that they distribute nutrients throughout the nest via trophic eggs as is seen in at least one other ant (Gobin and Ito 2000). Interestingly, Smith and Suarez (2009) speculated that the difference in nutrition of developing sexual versus worker larvae (as seen in their pupal stable isotope values) was due to trophic egg provisioning - they predicted the opposite of what was found in this study, but their prediction was in line with that of Helms Cahan et al. (2011). This is all to say that there is a lot of context that could go into developing the ideas tested in this paper that is completely overlooked. The inclusion of more of what is known already would greatly enrich the introduction.

      We agree that it would be useful to provide a larger context for the study. We now provide more information on the life history of ants and explain under what situations queens and workers may produce trophic eggs. We also mention that some ants, such as Crematogaster smithi, have a special caste of “large workers” which are morphologically intermediate between winged queens and small workers and appear to be specialized in the production of unfertilized eggs. We now also mention the study of Gobin and Ito (2000), in which the authors show that trophic eggs may play an important role in food distribution within the colony, in particular in species where trophallaxis is rare or absent.

      Methods:

      L49: What lineage is represented in the colonies used? The collection location is near where both dependent-lineage (genetic caste determining) P. rugosus and "H" lineage exist. This is important to know. Further, depending on what these are, the authors should note whether this has relevance to the study. Not mentioning genetic caste determination in a paper that examines caste determination is problematic.

      This is a good point. We now state at the very beginning of the Material and Method section that the queens were collected in populations known not to have dependent-lineage (genetic caste determining) mechanisms of caste determination.

      L63 and throughout: It would be more efficient to have a paragraph that cites R (must be done) and RStudio once as the tool for all analyses. It also seems that most model construction and testing was done using lme4 - so just lay this out once instead of over and over.

      We agree and have updated the manuscript accordingly.

      L95: 'lenght' needs to be 'length' in the formula.

      Thanks, corrected.

      L151: A PCA was used but not described in the methods. This should be covered here. And while a Mantel test is used, I might consider a permANOVA as this more intuitively (for me, at least) goes along with the PCA.

      We added the PCA description in the Material and Method section.

      Results:

      I love Fig. 3! Super cool.

      Thanks for this positive comment.

      Discussion:

      It would be good to have more on egg cannibalism. This is reasonably well-studied and could be good extra context.

      We have added a paragraph in the discussion to mention that egg cannibalism is ubiquitous in ants.

      Supp Table 1: P. badius is missing and citations are incorrectly attributed to P. barbatus.

      P. badius was present in the Table but not with the other Pogonomyrmex species. For some genera the species were also not listed in alphabetic order. This has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      COMMENTS ON INTRODUCTION:

      The introduction is missing information about caste determination in ants generally and Pogonomyrmex rugosus specifically. This is important because some colonies of Pogonomyrmex rugosus have been shown to undergo genetic caste determination, in which case the main result would be rendered insignificant. What is the evidence that caste determination in the lineages/colonies used is largely environmentally influenced and in what contexts/environmental factors? All of this should be made clear.

      This is a good point. We have expanded the introduction to discuss previous work on caste determination in Pogonomyrmex species with environmental caste determination and now also provide evidence at the beginning of the Material and Method section that the two populations studied do not have a system of genetic caste determination.

      Line 32 and throughout the paper: What is meant exactly by 'reproductive eggs'? Are these eggs that develop specifically into reproductives (i.e., queens/males) or all eggs that are non-trophic? If the latter, then it is best to refer to these eggs as 'viable' in order to prevent confusion.

      We agree and have updated the manuscript accordingly.

      Figure 1/Supp Table 1: It is surprising how few species are known to lay trophic eggs. Do the authors think this is an informative representation of the distribution of trophic egg production across subfamilies, or due to lack of study? Furthermore, the branches show ant subfamilies, not families. What does the question mark indicate? Also, the information in the table next to the phylogeny is not easy to understand. Having in the branches that information, in categories, shown in color for example, could be better and more informative. Finally, having the 'none' column with only one entry is confusing - discuss that only one species has been shown to definitely not lay trophic eggs in the text, but it does not add much to the figure.

      Trophic eggs are probably very common in ants, but this has not been very well studied. We added a sentence in the manuscript to make this clear.

      Thanks for noticing the family/subfamily error. This has been corrected in Figure 1 and Supplementary Table 1.

      The question mark indicates uncertainty about whether queens also contribute to the production of trophic eggs in one species (Lasius niger). We have now added information on that in the Figure legend.

      We agree with the reviewer that it would be easier to have the information on whether queens and workers produce trophic eggs on the branches of the tree. However, having the information on the branches would suggest that the “trait” evolved on this part of the tree. As we do not know exactly when worker or queen production of trophic eggs evolved, we prefer to keep the figure as it is.

      Finally, we have also removed the none in the figure as suggested by the reviewer and discussed in the manuscript the fact that the absence of trophic eggs has been reported in only one ant species (Amblyopone silvestrii: Masuko 2003).

      COMMENTS ON MATERIALS AND METHODS:

      Why did they settle on three trophic eggs per larva for their experimental setup?

      We used three trophic eggs because under natural conditions 50-65% of the eggs are trophic. The ratio of trophic eggs to viable eggs (larvae) was thus similar to that under natural conditions.

      Line 50: In what kind of setup were the ants kept? Plaster nests? Plastic boxes? Tubes? Was the setup dry or moist? I think this information is important to know in the context of trophic eggs.

      We now explain that colonies were maintained in plastic boxes with water tubes.

      Line 60: Were all the 43 queens isolated only once, or multiple times?

      Each of the 43 queens was isolated for 8 hours every day for 2 weeks, once before and once after hibernation (so they were isolated multiple times). We have changed the text to make clear that this was done for each of the 43 queens.

      Could isolating the queen away from workers/brood have had an effect on the type of eggs laid?

      This cannot be completely ruled out. However, it is possible to reliably determine the proportion of viable and trophic eggs only by isolating queens. Importantly, the main aim of these experiments was not to precisely determine the proportion of viable and trophic eggs, but to show that this proportion changes before and after hibernation and that queens do not lay viable and trophic eggs in a random sequence.

      Since it was established that only queens lay trophic eggs why was the isolation necessary?

      Yes, this was necessary because eggs are fragile and very difficult to collect in colonies with workers (as soon as eggs are laid they are piled up, and as soon as we disturb the nest, a worker takes them all and runs away with them). Moreover, it is possible that workers preferentially eat one type of egg, which would require removing eggs as soon as queens had laid them. This would have been a huge disturbance for the colonies.

      Line 61: Is this hibernation natural or lab induced? What is the purpose of it? How long was the hibernation and at what temperature? Where are the references for the requirement of a diapause and its length?

      The hibernation was lab induced. We hibernated the queens because we previously showed that hibernation is important to trigger the production of gynes in P. rugosus colonies in the laboratory (Schwander et al 2008; Libbrecht et al 2013). Hibernation conditions were as described in Libbrecht et al (2013).  

      Line 73: If the queen is disturbed several times for three weeks, which effect does it have on its egg-laying rate and on the eggs laid? Were the eggs equally distributed in time in the recipient colonies with and without trophic eggs to avoid possible effects?

      It is difficult to say what effect the disturbance had on the number and type of eggs laid. But again, our aim was not to precisely determine these values but to determine whether there was an effect of hibernation on the proportion of trophic eggs. The recipient colonies with and without trophic eggs were formed in exactly the same way. No viable eggs were introduced in these colonies, but all first instar larvae were introduced in the same way, at the same time, and with random assignment. We have clarified this in the Material and Method section.

      Line 77: Before placing the freshly hatched larvae in recipient colonies, how long were the recipient colonies kept without eggs and how long were they fed before giving the eggs? Were they kept long enough without the queen to avoid possible effects of trophic eggs, or too long so that their behavior changed?

      The recipient colonies were created 7 to 10 days before receiving the first larvae and were fed ad libitum with grass seeds, flies and honey water from the beginning. Trophic eggs that would have been left over from the source colony should have been eaten within the first few days after creating the recipient colonies. However, even if some trophic eggs would have remained, this would not influence our conclusion that trophic eggs influence caste fate, given the fully randomized nature of our treatments and the considerable number of independent replicates. The same applies to potential changes in worker behavior following their isolation from the queen.

      Line 77: Is it known at what stage caste determination occurs in this species? Here first instar larvae were given trophic eggs or not. Does caste-determination occur at the first instar stage? If not, what effect could providing trophic eggs at other stages have on caste-determination?

      A previous study showed that there is a maternal effect on caste determination in the focal species (Schwander et al 2008). The mechanism underlying this maternal effect was hypothesized to be differential maternal provisioning of viable eggs. However, as we detail in the discussion, the new data presented in our study suggest that the mechanism is in fact a difference in the abundance of trophic eggs laid by queens. There is currently no information on when exactly caste determination occurs during development.

      COMMENTS ON RESULTS:

      Line 65: How does investigating the order of eggs laid help to "inform on the mechanisms of oogenesis"?

      We agree that the aim was not to study the mechanism of oogenesis. We have changed this sentence accordingly: “To assess whether viable and trophic eggs were laid in a random order, or whether eggs of a given type were laid in clusters, we isolated 11 queens for 10 hours, eight times over three weeks, and collected every hour the eggs laid”
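      One common way to ask whether a two-category sequence is laid in random order or in clusters is a Wald-Wolfowitz runs test. The sketch below is an illustration only and is not necessarily the analysis used in the manuscript; the function name `runs_test` and the example egg sequence are hypothetical. A strongly negative z-score means fewer runs than expected under random order, which is what clustering of egg types would produce.

```python
import math


def runs_test(sequence):
    """Wald-Wolfowitz runs test for a sequence with exactly two categories.

    Returns (observed_runs, expected_runs, z). Under random ordering the
    z-score is approximately standard normal for long sequences.
    """
    labels = sorted(set(sequence))
    if len(labels) != 2:
        raise ValueError("sequence must contain exactly two categories")
    n1 = sum(1 for item in sequence if item == labels[0])
    n2 = len(sequence) - n1
    n = n1 + n2
    # A new run starts at every position where the category changes.
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if variance == 0:
        raise ValueError("sequence too short for a runs test")
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z


# Hypothetical laying order: V = viable egg, T = trophic egg.
eggs = list("VVVVTTTTVVVVTTTT")
observed, expected, z = runs_test(eggs)
print(observed, expected, round(z, 2))
```

      With real data, one sequence per queen and isolation session would be tested, and the normal approximation would only be trusted for sequences that are long enough.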

      Figure 2: There is no description/discussion of data shown in panels B, C, E, and F in the main text.

      We have added information in the main text that while viable eggs showed embryonic development at 25 and 65 hours (Fig. 2 B, C) there was no such development for trophic eggs (Fig. 2 E, F).

      Line 172: Please explain hibernation details and its significance on colony development/life cycle.

      We have added this information in the Material and Method section.

      Figure 6: How is B plotted? How could 0% of gynes have 100% survival?

      The survival is given for the larvae without considering caste. We have changed the X axis of panel B and reworded the Figure legend to clarify this.

      Is reduced DNA content just an outcome of reduced cell number within trophic eggs, i.e., was this a difference in cell type or cell number? Or is it some other adaptive reason?

      It is likely to be due to a reduction in cell number (trophic eggs have maternal DNA in the chorion, while viable eggs have in addition the cells from the developing zygote) but we do not have data to make this point.

      Is there a logical sequence to the sequence of egg production? The authors showed that the sequence is non-random, but can they identify in what way? What would the biological significance be?

      We could not identify a logical sequence. Plausibly, the production of the two types of eggs implies some changes in the metabolic processes during egg production resulting in queens producing batches of either viable or trophic eggs. This would be an interesting question to study, but this is beyond the scope of this paper.

      Figure 6b is difficult to follow, and more generally, legends for all figures can be made clearer and more easy to follow.

      We agree. We have now improved the legends of Fig 6B and the other figures.

      Lines 172-174: "The percentage of eggs that were trophic was higher before hibernation...than after. This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable" - are these data shown? It would be nice to see how the total egglaying rate changes after hibernation. Also, is the proportion of trophic eggs laid similar between individual queens?

      No, the data were not shown, and our data are not strong enough to make this point. We have therefore removed the sentence “This higher percentage was due to a reduced number of reproductive eggs, the number of trophic eggs laid remained stable” from the manuscript.

      Figure 6B: Do several colonies produce 100% gynes despite receiving trophic eggs? It would be interesting if the authors discussed why this might occur (e.g., the larvae are already fully determined to be queens and not responsive to whatever signal is in the trophic eggs).

      The reviewer is correct that 4 colonies produced 100% gynes despite receiving trophic eggs. However, the number of individuals produced in these four colonies was small (2,1,2,1, see supplementary Table 2). So, it is likely that it is just by chance that these colonies produced only gynes.

      Figure 5: Why a separation by "size distribution variation of miRNA"? What is the relevance of looking at size distributions as opposed to levels?

      We did that because there are many different miRNA species, reflected by the fact that there is not just one size peak but multiple ones. This is why we looked at the size distribution.

      Figure 2: The image of the viable embryo is not clear. If possible, redo the viable to show better quality images.

      Unfortunately, we no longer have colonies in the laboratory, so this is not possible.

      COMMENTS ON DISCUSSION:

      Lines 236-247: Can an explanation be provided as to why the effect of trophic eggs in P. rugosus is the opposite of those observed by studies referenced in this section? Could P. rugosus have any life history traits that might explain this observation?

      In the two mentioned studies there were other factors that co-varied with variation in the quantity of trophic eggs. We mentioned that and suggested that it would be useful to conduct experimental manipulation of the quantity of trophic eggs in the Argentine ant and P. barbatus (the two species where an effect of trophic eggs had been suggested).

      The discussion should include implications and future research of the discovery.

      We made some suggestions for experiments that should be performed in the future.

      The conclusion paragraph is too short and does not represent what was discussed.

      We added two sentences at the end of the paragraph to make suggestions of future studies that could be performed.

      Lines 231 to 247: Drastically reduce and move this whole part to the introduction to substantiate the assumption that trophic eggs play a nutritional role.

      We moved most of this paragraph to the introduction, as suggested by the reviewer.

      Reviewer #3 (Recommendations For The Authors):

      I would like to commend the authors on their study. The main findings of the paper are individually solid and provide novel insight into caste determination and the nature of trophic eggs. However, the inferences made from much of the data and connections between independent lines of evidence often extend too far and are unsubstantiated.

      We thank the reviewer for the positive comment. We made many changes in the manuscript to improve the discussion of our results.

    1. This is the premise of design justice [Costanza-Chock, S. (2020). Design justice: Community-led practices to build the worlds we need. MIT Press], which observes that design is fundamentally about power, in that designs may not only serve some people less well, but systematically exclude them in surprising, often unintentional ways.

      This point is eye-opening because I hadn’t thought about design as something that could exclude people. It makes me realize that designers have a lot of responsibility to think about who might be left out. I want to learn more about how to avoid these mistakes and make my designs more fair and accessible for everyone.

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This work presents a valuable self-supervised method for the segmentation of 3D cells in microscopy images, alongside an implementation as a Napari plugin and an annotated dataset. While the Napari plugin is readily applicable and promises to eliminate time consuming data labeling to speed up quantitative analysis, there is incomplete evidence to support the claim that the segmentation method generalizes to other light-sheet microscopy image datasets beyond the two specific ones used here.

      Technical Note: We showed the utility of CellSeg3D in the first submission and in our revision on 5 distinct datasets; 4 of which we showed F1-Score performance on. We do not know which “two datasets” are referenced. We also already showed this is not limited to LSM, but was used on confocal images; we already limited our scope and changed the title in the last rebuttal, but just so it’s clear, we also benchmark on two non-LSM datasets.

      In this revision, we have additionally extended our benchmarking of Cellpose and StarDist to all 4 benchmark datasets, where WNet3D (our novel contribution of a self-supervised model) outperforms or matches these supervised baselines. Moreover, we perform rigorous testing of our model’s generalization by training on one dataset and testing generalization to the other 3; we believe this is on par with (or beyond) what most cell segmentation papers do, thus we hope that “incomplete” can now be updated.

      Public Reviews:

      Reviewer #1 (Public review):

      This work presents a self-supervised method for the segmentation of 3D cells in microscopy images, an annotated dataset, as well as a napari plugin. While the napari plugin is potentially useful, there is insufficient evidence in the manuscript to support the claim that the proposed method is able to segment cells in other light-sheet microscopy image datasets than the two specific ones used here.

      Thank you again for your time. We already benchmarked the performance of WNet3D (our 3D SSL contribution) on four datasets; thus, we do not know which two you refer to. Moreover, we have now additionally benchmarked Cellpose and StarDist on all four so readers can see that on all datasets, WNet3D outperforms or matches these supervised methods.

      I acknowledge that the revision is now more upfront about the scope of this work. However, my main point still stands: even with the slight modifications to the title, this paper suggests to present a general method for self-supervised 3D cell segmentation in light-sheet microscopy data. This claim is simply not backed up.

      We respectfully disagree; we benchmark on four 3D datasets: three curated by others and used in learning ML conference proceedings, and one that we provide that is a new ground truth 3D dataset - the first of its kind - on mesoSPIM-acquired brain data. We believe benchmarking on four datasets is on par (or beyond) with current best practices in the field. For example, Cellpose curated one dataset and tested on held-out test data on this one dataset (https://www.nature.com/articles/s41592-020-01018-x) and benchmarked against StarDist and Mask R-CNN (two models). StarDist (Star-convex Polyhedra for 3D Object Detection and Segmentation in Microscopy) benchmarked on two datasets and against two models, IFT-Watershed and 3D U-Net. Thus, we feel our benchmarking on more models and more datasets is sufficient to claim our model and associated code is of interest to readers and supports our claims (for comparison, Cellpose’s title is “Cellpose: a generalist algorithm for cellular segmentation”, which is much broader than our claim).

      I still think the authors should spell out the assumptions that underlie their method early on (cells need to be well separated and clearly distinguishable from background). A subordinate clause like "often in cleared neural tissue" does not serve this purpose. First, it implies that the method is also suitable for non-cleared tissue (which would have to be shown). Second, this statement does not convey the crucial assumptions of well separated cells and clear foreground/background differences that the method is presumably relying on.

      We have now expanded the manuscript quite significantly. To be clear, we did show that our method works on non-cleared tissue; the Mouse Skull, 3D platynereis-Nuclei, and 3D platynereis-ISH-Nuclei datasets are not cleared tissue, and not all were acquired with LSM, but rather with confocal microscopy. We attempted to make that clearer in the main text.

      Additionally, we do not believe cells need to be well separated and have a perfectly clean background. We removed statements like "often in cleared neural tissue", expanded the benchmarking, and added a new demo figure for the readers to judge. As in the last rebuttal, we provide video evidence (https://www.youtube.com/watch?v=U2a9IbiO7nE) of WNet3D working on the Mouse Skull dataset, which is densely packed and hard for a human to segment, and linked this directly in the figure caption.

      We have re-written the main manuscript in an attempt to clarify the limitations, including a dedicated “limitations” section. Thank you for the suggestion.

      It does appear that the proposed method works very well on the two investigated datasets, compared to other pre-trained or fine-tuned models. However, it still remains unclear whether this is because of the proposed method or the properties of those specific datasets (namely: well isolated cells that are easily distinguished from the background). I disagree with the authors that a comparison to non-learning methods "is unnecessary and beyond the scope of this work". In my opinion, this is exactly what is needed to prove that CellSeg3D's performance cannot be matched with simple image processing.

      We want to stress again that we benchmarked WNet3D on four datasets, not two. We have now additionally added benchmarking with Cellpose, StarDist and a non-deep learning method as requested (see new Figures 1 and 3).

      As I mentioned in the original review, it appears that thresholding followed by connected component analysis already produces competitive segmentations. I am confused about the authors' reply stating that "[this] is not the case, as all the other leading methods we fairly benchmark cannot solve the task without deep learning". The methods against which CellSeg3D is compared are CellPose and StarDist, both are deep-learning based methods.

      That those methods do not perform well on this dataset does not imply that a simpler method (like thresholding) would not lead to competitive results. Again, I strongly suggest the authors include a simple, non-learning based baseline method in their analysis, e.g.:

      -  comparison to thresholding (with the same post-processing as the proposed method)

      -  comparison to a normalized cut segmentation (with the same post-processing as the proposed method)

      We added a non-deep learning based approach, namely, comparing directly to thresholding with the same post hoc approach we use to go from semantic to instance segmentation. WNet3D (and other deep learning approaches) perform favorably (see Figures 2 and 3).
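      For readers unfamiliar with the kind of non-learning baseline discussed here, the following is a minimal sketch of thresholding followed by connected component labeling on a 3D volume. It is written in plain Python for illustration and is not the post-processing used in CellSeg3D; the function name `label_components`, the 6-connectivity choice, and the toy volume are all assumptions.

```python
from collections import deque


def label_components(volume, threshold):
    """Binarize a 3D volume (nested lists indexed [z][y][x]) and label
    6-connected foreground components with integers 1..count.

    Returns (labels, count); background voxels keep label 0.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    labels = [[[0] * width for _ in range(height)] for _ in range(depth)]
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    count = 0
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                # Skip background voxels and voxels that are already labeled.
                if volume[z][y][x] <= threshold or labels[z][y][x]:
                    continue
                count += 1
                labels[z][y][x] = count
                queue = deque([(z, y, x)])
                # Breadth-first flood fill of the current component.
                while queue:
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in neighbours:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        if (0 <= nz < depth and 0 <= ny < height
                                and 0 <= nx < width
                                and volume[nz][ny][nx] > threshold
                                and not labels[nz][ny][nx]):
                            labels[nz][ny][nx] = count
                            queue.append((nz, ny, nx))
    return labels, count


# Toy volume with two bright blobs separated by dark voxels.
volume = [
    [[9, 9, 0, 0, 8], [9, 0, 0, 0, 8], [0, 0, 0, 0, 0]],
    [[9, 0, 0, 0, 0], [0, 0, 0, 0, 8], [0, 0, 0, 0, 0]],
]
labels, count = label_components(volume, threshold=5)
print(count)
```

      A baseline like this separates objects only when they do not touch after thresholding, which is the assumption the reviewer asks to have tested explicitly.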

      Regarding my feedback about the napari plugin, I apologize if I was not clear. The plugin "works" as far as I tested it (i.e., it can be installed and used without errors). However, I was not able to recreate a segmentation on the provided dataset using the plugin alone (see my comments in the original review). I used the current master as available at the time of the original review and default settings in the plugin.

      We updated the plugin and code for the revision at your request to make this possible directly in the napari GUI in addition to our scripts and Jupyter Notebooks (please see main and/or `pip install --upgrade napari-cellseg3d`; the current version is 0.2.1). Of course this means the original submission code (May 2024) will not have this in the GUI, so you would need to update to test this. Alternatively, you can see the demo video we now provide for ease: https://www.youtube.com/watch?v=U2a9IbiO7nE (we understand testing code takes a lot of time and commitment).

      We greatly thank the reviewer for their time, and we hope our clarifications, new benchmarking, and re-write of the paper now allow them to change their assessment from incomplete to a more favorable and reflective eLife adjective.

      Reviewer #2 (Public review):

      Summary:

      The authors propose a new method for self-supervised learning of 3d semantic segmentation for fluorescence microscopy. It is based on a WNet architecture (Encoder / Decoder using a UNet for each of these components) that reconstructs the image data after binarization in the bottleneck with a soft n-cuts clustering. They annotate a new dataset for nucleus segmentation in mesoSPIM imaging and train their model on this dataset. They create a napari plugin that provides access to this model and provides additional functionality for training of own models (both supervised and self-supervised), data labeling and instance segmentation via post-processing of the semantic model predictions. This plugin also provides access to models trained on the contributed dataset in a supervised fashion.

      Strengths:

      -  The idea behind the self-supervised learning loss is interesting.

      -  It provides a new annotated dataset for an important segmentation problem.

      -  The paper addresses an important challenge. Data annotation is very time-consuming for 3d microscopy data, so a self-supervised method that yields similar results to supervised segmentation would provide massive benefits.

      -  The comparison to other methods on the provided dataset is extensive and experiments are reproducible via public notebooks.

      Weaknesses:

      The experiments presented by the authors support the core claims made in the paper. However, they do not convincingly prove that the method is applicable to segmentation problems with more complex morphologies or more crowded cells/nuclei.

      Major weaknesses:

      (1) The method only provides functionality for semantic segmentation outputs and instance segmentation is obtained by morphological post-processing. This approach is well known to be of limited use for segmentation of crowded objects with complex morphology. This is the main reason for prediction of additional channels such as in StarDist or CellPose. The experiments do not convincingly show that this limitation can be overcome as model comparisons are only done on a single dataset with well separated nuclei with simple morphology. Note that the method and dataset are still a valuable contribution with this limitation, which is somewhat addressed in the conclusion. However, I find that the presentation is still too favorable in terms of the presentation of practical applications of the method, see next points for details.

      Thank you for noting the method's strengths and core features. Regarding weaknesses, we have revised the manuscript again and added direct benchmarking now on four datasets and a fifth “worked example” (https://www.youtube.com/watch?v=3UOvvpKxEAo&t=4s) in a new Figure 4.

      We also re-wrote the paper to more thoroughly present the work (previously we adhered to the “Brief Communication” eLife format), and added an explicit note in the results about model assumptions.

      (2) The experimental set-up for the additional datasets seems to be unrealistic as hyperparameters for instance segmentation are derived from a grid search and it is unclear how a new user could find good parameters in the plugin without having access to already annotated ground-truth data or an extensive knowledge of the underlying implementations.

      We agree that of course with any self-supervised method the user will need a sense of what a good outcome looks like; that is why we provide Google Colab Notebooks (https://github.com/AdaptiveMotorControlLab/CellSeg3D/tree/main/notebooks) and the napari-plugin GUI for extensive visualization and even the ability to manually correct small subsets of the data and refine the WNet3D model.

      We attempted to make this clearer with a new Figure 2 and additional functionality added directly to the plugin (such as the grid search). But we believe this “trade-off” for SSL approaches over very labor-intensive 3D labeling is often worth it; annotators are also biased, so extensive checking of any GT data is equally required.

      We also added the “grid search” functionality in the GUI (please `pip install --upgrade napari-cellseg3d`; the latest v0.2.1) to supplement the previously shared Notebook (https://github.com/C-Achard/cellseg3d-figures/blob/main/thresholds_opti/find_best_thresholds.ipynb) and added a new YouTube video: https://www.youtube.com/watch?v=xYbYqL1KDYE.
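      As a rough illustration of what a threshold grid search involves, the sketch below scores each candidate threshold against a labeled example and keeps the best one. It is not the code from the linked notebook or the plugin: the function names are hypothetical, the data are made up, and it uses a voxel-wise F1 for simplicity, whereas instance-level metrics are typically reported for cell segmentation.

```python
def f1_score(predicted, truth):
    """Voxel-wise F1 between two equally long binary sequences."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def grid_search_threshold(probabilities, truth, candidates):
    """Return (best_threshold, best_f1) over the candidate thresholds.

    Ties are resolved in favor of the first candidate that reaches the
    best score.
    """
    best_threshold, best_f1 = None, -1.0
    for threshold in candidates:
        predicted = [p >= threshold for p in probabilities]
        score = f1_score(predicted, truth)
        if score > best_f1:
            best_threshold, best_f1 = threshold, score
    return best_threshold, best_f1


# Hypothetical flattened model outputs and matching ground-truth voxels.
probabilities = [0.05, 0.20, 0.35, 0.55, 0.70, 0.90]
truth = [0, 0, 0, 1, 1, 1]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
best_threshold, best_f1 = grid_search_threshold(probabilities, truth, candidates)
print(best_threshold, best_f1)
```

      Because the threshold is chosen on the same labels it is scored against, the resulting F1 can be optimistic; selecting the threshold on a small annotated subset and reporting on held-out data is one way to limit this.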

      (3) Obtaining segmentation results of similar quality as reported in the experiments within the napari plugin was not possible for me. I tried this on the "MouseSkull" dataset that was also used for the additional results in the paper.

      Again, we are sorry this did not work for you, but we added new functionality in the GUI and made a demo video (https://www.youtube.com/watch?v=U2a9IbiO7nE); you can either update your CellSeg3D code or watch the video to see how we obtained these results.

      Here, I could not find settings in the "Utilities->Convert to instance labels" widget that yielded good segmentation quality and it is unclear to me how a new user could find good parameter settings. In more detail, I cannot use the "Voronoi-Otsu" method due to installation issues that are prohibitive for a non expert user and the "Watershed" segmentation method yields a strong oversegmentation.

      Sorry to hear of the installation issue with Voronoi-Otsu; we updated the documentation and the GUI to hopefully make this easier to install. While we do not claim this code is for beginners, we do aim to be a welcoming community, thus we provide support on GitHub, extensive docs, videos, the GUI, and Google Colab Notebooks to help users get started.

      Comments on revised version

      Many of my comments were addressed well:

      -  It is now clear that the results are reproducible as they are well documented in the provided notebooks, which are now much more prominently referenced in the text.

      Thanks!

      -  My concerns about an unfair evaluation compared to CellPose and StarDist were addressed. It is now clear that the experiments on the mesoSPIM dataset are extensive and give an adequate comparison of the methods.

      Thank you; to note we additionally added benchmarking of Cellpose and StarDist on the three additional datasets (for R1), but hopefully this serves to also increase your confidence in our approach.

      -  Several other minor points like reporting of the evaluation metric are addressed.

      I have changed my assessment of the experimental evidence to incomplete/solid and updated the review accordingly. Note that some of my main concerns with the usability of the method for segmentation tasks with more complex morphology / more crowded cells and with the napari plugin still persist. The main points are (also mentioned in Weaknesses, but here with reference to the rebuttal letter):

      - Method comparison on datasets with more complex morphology etc. are missing. I disagree that it is enough to do this on one dataset for a good method comparison.

      We benchmarked WNet3D (our contribution) on four datasets, and to aid the readers we additionally now added Cellpose and StarDist benchmarking on all four. WNet3D performs favorably, even on the crowded and complex Mouse Skull data. See the new Figure 3 as well as the associated video: https://www.youtube.com/watch?v=U2a9IbiO7nE&t=1s.

      -  The current presentation still implies that CellSeg3d **and the napari plugin** work well for a dataset with complex nucleus morphology like the Mouse Skull dataset. But I could not get this to work with the napari plugin, see next points.

      - First, deriving hyperparameters via grid search may lead to over-optimistic evaluation results. How would a user find these parameters without having access to ground-truth? Did you do any experiments on the robustness of the parameters?

      -  In my own experiments I could not do this with the plugin. I tried this again, but ran into the same problems as last time: pyClesperanto does not work for me. The solution you link requires updating openCL drivers and the accepted solution in the forum post is "switch to a different workstation".

      We apologize for the confusion here; the accepted solution (not accepted by us) was user specific as they switched work stations and it worked, so that was their solution. Other comments actually solved the issue as well. For ease this package can be installed on Google Colab (here is the link from our repo for ease: https://colab.research.google.com/github/AdaptiveMotorControlLab/CellSeg3d/blob/main/notebooks/Colab_inference_demo.ipynb), where pyClesperanto can be installed without issue via `!pip install pyclesperanto-prototype`.

      This a) goes beyond the time I can invest for a review and b) is unrealistic to expect computationally inexperienced users to manage. Then I tried with the "watershed" segmentation, but this yields a strong oversegmentation no matter what I try, which is consistent with the predictions that look like a slightly denoised version of the input images and not like a proper foreground-background segmentation. With respect to the video you provide: I would like to see how a user can do this in the plugin without having prior knowledge of good parameters or just pasting code, which is again not what you would expect a computationally inexperienced user to do.

      We agree with the reviewer that the user needs domain knowledge, but we never claim our method was for inexperienced users. Our main goal was to show a new computer vision method with self-supervised learning (WNet3D) that works on LSM and confocal data for cell nuclei. To this end, we made a demo video showing how a user can visually perform a thresholding check (https://www.youtube.com/watch?v=xYbYqL1KDYE&t=5s), and we added all of these new utilities to the GUI; thanks for the suggestion. Otherwise, the thresholding can also be done in a Notebook (as previously noted).

      I acknowledge that some of these points are addressed in the limitations, but the text still implies that it is possible to get good segmentation results for such segmentation problems: "we believe that our self-supervised semantic segmentation model could be applied to more challenging data as long as the above limitations are taken into account." From my point of view the evidence for this is still lacking and would need to be provided by addressing the points raised above for me to further raise the Incomplete/solid rating, especially showing how this can be done with the napari plugin. As an alternative, I would also consider raising it if the claims are further reduced and acknowledge that the current version of the method is only a good method for well separated nuclei.

      We hope our new benchmarking and clear demo on four datasets help improve your confidence in the evidence for our approach. We also refined our text and hope our contributions, the limitations and the advantages are now clearer.

      I understand that this may be frustrating, but please put yourself in the role of a new reader of this work: the impression that is made is that this is a method that can solve 3D segmentation tasks in light-sheet microscopy with unsupervised learning. This would be a really big achievement! The wording in the limitation section sounds like strategic disclaimers that imply that it is still possible to do this, just that it wasn't tested enough.

      But, to the best of my assessment, the current version of the method only enables the narrower case of well separated nuclei with a simple morphology. This is still a quite meaningful achievement, but more limited than the initial impression. So either the experimental evidence needs to be improved, including a demonstration of how to achieve this in practice, including without deriving parameters via grid-search and in the plugin, or the claim needs to be meaningfully toned down.

      Thanks for raising this point; we do think that WNet3D and the associated CellSeg3D package, which aims to continue integrating state-of-the-art models, are a non-trivial step forward. Have we completely solved the problem? Certainly not, but given the limited 3D cell segmentation tools that exist, we hope this, coupled with our novel 3D dataset, pushes the field forward. We don’t show that it works only on the narrow well-separated use case, but rather show that it works even better than supervised models on the very challenging Mouse Skull benchmark. Given that we now show evidence that we outperform or match supervised algorithms with an unsupervised approach, we respectfully do think this is a noteworthy achievement. Thank you for your time in assessing our work.

    1. Reviewer #2 (Public review):

      Summary:

      The goal of the paper was to trace the transitions hippocampal microglia undergo during aging. ScRNA-seq analysis allowed the authors to predict a trajectory and hypothesize about possible molecular checkpoints, which keep the pace of microglial aging. For example, TGFb1 was predicted as a molecule slowing down the microglial aging path and indeed, loss of TGFb1 in microglia led to premature microglia aging, which was associated with premature loss of cognitive ability. The authors also used the parabiosis model to show how peripheral, blood-derived signals from the old organism can "push" microglia forward on the aging path.

      Strengths:

      A major strength and uniqueness of this work is the in-depth single-cell dataset, which may be a useful resource for the community, as well as the data showing what happens to young microglia in heterochronic parabiosis setting and upon loss of TGFb in their environment.

      Weaknesses:

      All weaknesses were addressed during revision.

      Overall:

      In general, I think the authors did a good job following the initial observations and devised clever ways to test the emerging hypotheses. The resulting data are an important addition to what we know about microglial aging and can be fruitfully used by other researchers, e.g. those working on microglia in a disease context.

      Comments on revisions:

      All my comments were addressed.

    1. Welcome back, and in this video, I want to talk about how RDS can be backed up and restored, as well as covering the different methods of backup that we have available. Now we do have a lot to cover, so let's jump in and get started. Within RDS, there are two types of backup-like functionality: automated backups and snapshots. Both of these are stored in S3, but they use AWS-managed buckets, so they won't be visible to you within your AWS console. You can see backups in the RDS console, but you can't move to S3 and see any form of RDS bucket, which exists for backups. Keep this in mind because I've seen questions on it in the exam.

      Now, the benefit of using S3 is that any data contained in backups is now regionally resilient, because it's stored in S3, which replicates data across multiple AWS availability zones within that region. RDS backups, when they do occur, are taken in most cases from the standby instance if you have multi-AZ enabled. So, while they do cause an I/O pause, this occurs from the standby instance, and so there won't be any application performance issues. If you don't use multi-AZ, for example, with test and development instances, then the backups are taken from the only available instance, so you may have pauses in performance.

      Now, I want to step through how backups work in a little bit more detail, and I'm going to start with snapshots. Snapshots aren't automatic; they're things that you run explicitly or via a script or custom application. You have to run them against an RDS database instance. They're stored in S3, which is managed by AWS, and they function like the EBS snapshots that you've covered elsewhere in the course. Snapshots and automated backups are taken of the instance, which means all the databases within it, rather than just a single database. The first snapshot is a full copy of the data stored within the instance, and from then on, snapshots only store data which has changed since the last snapshot.
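      The "first snapshot is full, later ones only store changes" behaviour can be sketched in a few lines of Python. This is an illustrative model only, not how RDS or EBS is implemented: the function name `take_snapshot` and the idea of a volume as a dictionary of block ids are assumptions made for the example, and deleted blocks are ignored for simplicity.

```python
def take_snapshot(current_blocks, last_snapshot_state=None):
    """Return the data a snapshot would need to store (toy model).

    current_blocks / last_snapshot_state map a block id to its contents.
    Both names are hypothetical and exist only for this illustration.
    """
    if last_snapshot_state is None:
        # First snapshot: a full copy of everything on the volume.
        return dict(current_blocks)
    # Later snapshots: only blocks that are new or whose contents changed.
    return {
        block_id: data
        for block_id, data in current_blocks.items()
        if block_id not in last_snapshot_state
        or last_snapshot_state[block_id] != data
    }


volume = {0: "users", 1: "orders"}
first = take_snapshot(volume)

volume_later = {0: "users", 1: "orders-v2", 2: "invoices"}
second = take_snapshot(volume_later, last_snapshot_state=volume)
```

      Here `first` holds every block, while `second` holds only the changed block and the new one, which is why later snapshots are usually quicker unless a lot of data has changed.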

      When any snapshot occurs, there is a brief interruption to the flow of data between the compute resource and the storage. If you're using single AZ, this can impact your application. If you're using multi-AZ, this occurs on the standby, and so won't have any noticeable effect. Time-wise, the initial snapshot might take a while; after all, it's a full copy of the data. From then on, snapshots will be much quicker because only changed data is being stored. Now, the exception to this is instances where there's a lot of data change. In this type of scenario, snapshots after the initial one can also take significant amounts of time. Snapshots don't expire; you have to clear them up yourself. It means that snapshots live on past when you delete the RDS instance. Again, they're only deleted when you delete them manually or via some external process. Remember that one because it matters for the exam.

      Now you can run one snapshot per month, one per week, one per day, or one per hour. The choice is yours because they're manual. And one way that lower recovery point objectives can be met is by taking more frequent snapshots. The lower the time frame between snapshots, the lower the maximum data loss that can occur when you have a failure. Now, this is assuming we only have snapshots available, but there is another part to RDS backups, and that's automated backups. These occur once per day, but the architecture is the same. The first one is a full, and any ones which follow only store changed data. So far, you can think of them as though they're automated snapshots, because that's what they are. They occur during a backup window which is defined on the instance. You can allow AWS to pick one at random or use a window which fits your business. If you're using single AZ, you should make sure that this happens during periods of little to no use, as again there will be an I/O pause. If you're using multi-AZ, this isn't a concern, as the backup occurs from the standby.
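      To make the link between snapshot frequency and recovery point objective concrete, here is a minimal Python sketch that assumes only snapshots are available (no transaction logs). The function name `data_loss_at_failure` and the timestamps are made up for the illustration; nothing here calls AWS.

```python
from datetime import datetime, timedelta


def data_loss_at_failure(snapshot_times, failure_time):
    """Time span of changes lost if we can only restore the latest snapshot.

    Hypothetical helper: with snapshots alone, everything written after the
    most recent snapshot that precedes the failure is gone.
    """
    earlier = [t for t in snapshot_times if t <= failure_time]
    if not earlier:
        raise ValueError("no snapshot exists before the failure")
    return failure_time - max(earlier)


# Snapshots every six hours on an arbitrary example day.
day = datetime(2025, 5, 1)
six_hourly = [day + timedelta(hours=h) for h in (0, 6, 12, 18)]

# A failure at 11:30 falls just before the 12:00 snapshot.
loss = data_loss_at_failure(six_hourly, day + timedelta(hours=11, minutes=30))
```

      With a six-hour interval the loss in this example is five and a half hours, and the worst case approaches the full interval. Shortening the interval shrinks that worst case, which is the point made above.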

      In addition to this automated snapshot, every five minutes, database transaction logs are also written to S3. Transaction logs store the actual operations which change the data, so operations which are executed on the database. And together with the snapshots created from the automated backups, this means a database can be restored to a point in time with a five-minute granularity. In theory, this means a five-minute recovery point objective can be reached. Now automated backups aren't retained indefinitely; they're automatically cleared up by AWS, and for a given RDS instance, you can set a retention period from zero to 35 days. Zero means automated backups are disabled, and the maximum is 35 days. If you use a value of 35 days, it means that you can restore to any point in time over that 35-day period using the snapshots and transaction logs, but it means that any data older than 35 days is automatically removed.
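      The retention rules above can be expressed as a small calculation. The sketch below is a simplified model under stated assumptions: `restorable_window` is a hypothetical helper, and treating the latest restorable time as exactly five minutes before "now" is an approximation based on the five-minute log interval described here, not a guarantee from AWS.

```python
from datetime import datetime, timedelta

MAX_RETENTION_DAYS = 35  # maximum retention period described above


def restorable_window(now, retention_days, instance_created):
    """Return (earliest, latest) restorable times, or None if backups are off.

    Hypothetical helper for illustration; it does not query RDS.
    """
    if not 0 <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention must be between 0 and 35 days")
    if retention_days == 0:
        # A retention period of zero disables automated backups.
        return None
    # Anything older than the retention period has been removed.
    earliest = max(instance_created, now - timedelta(days=retention_days))
    # Assumption: logs are shipped every five minutes, so the newest
    # restorable point lags the present by roughly that much.
    latest = now - timedelta(minutes=5)
    return earliest, latest


now = datetime(2025, 5, 1, 12, 0)
created = datetime(2025, 1, 1)
window = restorable_window(now, retention_days=7, instance_created=created)
```

      With a seven-day retention period, `window` runs from seven days before `now` up to about five minutes ago; setting the retention to zero returns `None`.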

      When you delete the database, you can choose to retain any automated backups, but, and this is critical, they still expire based on the retention period. The way to keep the contents of an RDS instance past this 35-day maximum retention period is to create a final snapshot when you delete the instance; this snapshot is fully under your control and has to be manually deleted as required. Now, RDS also allows you to replicate backups to another AWS region, and by backups, I mean both snapshots and transaction logs. Now, charges apply for both the cross-region data copy and any storage used in the destination region, and I want to stress this really strongly. This is not the default. This has to be configured within automated backups. You have to explicitly enable it.

      Now let's talk a little bit about restores. The way RDS handles restores is really important, and it's not immediately intuitive. It creates a new RDS instance when you restore an automated backup or a manual snapshot. Why this matters is that you will need to update applications to use the new database endpoint address because it will be different than the existing one. When you restore a manual snapshot, you're restoring the database to a single point in time. It's fixed to the time that the snapshot was created, which means it influences the RPO. Unless you created a snapshot right before a failure, then chances are the RPO is going to be suboptimal. Automated backups are different. With these, you can choose a specific point to restore the database to, and this offers substantial improvements to RPO. You can choose to restore to a time which was minutes before a failure.
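      Because a restore always produces a new instance with a new endpoint, a scripted restore has to name a separate target and then look up its address. The following Python sketch shows one way this might look with boto3's RDS client. It assumes boto3 is installed and AWS credentials are configured; the instance identifiers and the helper names `build_pitr_request` and `restore_and_get_endpoint` are hypothetical.

```python
from datetime import datetime, timezone


def build_pitr_request(source_id, target_id, restore_time=None):
    """Build keyword arguments for a point-in-time restore (hypothetical helper)."""
    if target_id == source_id:
        # A restore never overwrites the source; it creates a new instance.
        raise ValueError("the restored instance needs its own identifier")
    request = {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
    }
    if restore_time is None:
        request["UseLatestRestorableTime"] = True
    else:
        request["RestoreTime"] = restore_time
    return request


def restore_and_get_endpoint(request):
    """Run the restore and return the NEW endpoint address (needs AWS access)."""
    import boto3  # third-party; imported here so the builder works without it

    rds = boto3.client("rds")
    rds.restore_db_instance_to_point_in_time(**request)
    target = request["TargetDBInstanceIdentifier"]
    # Restores are not fast; wait until the new instance is available.
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=target)
    described = rds.describe_db_instances(DBInstanceIdentifier=target)
    return described["DBInstances"][0]["Endpoint"]["Address"]


# Example: restore to a chosen moment shortly before a failure.
example = build_pitr_request(
    "prod-db",
    "prod-db-restored",
    restore_time=datetime(2025, 5, 1, 11, 55, tzinfo=timezone.utc),
)
```

      The address returned by `restore_and_get_endpoint` is what applications would need to be pointed at after the restore, since it differs from the original instance's endpoint.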

      The way that it works is that backups are restored from the closest snapshot, and then transaction logs are replayed from that point onwards, all the way through to your chosen time. What's important to understand though is that restoring snapshots isn't a fast process. If appropriate for the exam that you're studying, I'm going to include a demo where you'll get the chance to experience this yourself practically. It can take a significant amount of time to restore a large database, so keep this in mind when you think about disaster recovery and business continuity. The RDS restore time has to be taken into consideration.
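      The "restore the snapshot, then replay the logs" idea can be modelled with a toy key-value database. This Python sketch is only a conceptual illustration: times are plain integers (minutes), the log format is invented, and "closest snapshot" is interpreted here as the most recent snapshot taken at or before the chosen time.

```python
def restore_to_point_in_time(snapshots, transaction_log, target_time):
    """Rebuild database state as of target_time (toy model).

    snapshots:        list of (time, state_dict) pairs
    transaction_log:  list of (time, key, value) write operations
    All structures are hypothetical and exist only for this example.
    """
    candidates = [s for s in snapshots if s[0] <= target_time]
    if not candidates:
        raise ValueError("no snapshot at or before the requested time")
    # Step 1: start from the most recent snapshot before the target time.
    snapshot_time, snapshot_state = max(candidates, key=lambda s: s[0])
    state = dict(snapshot_state)
    # Step 2: replay logged operations after the snapshot, up to the target.
    for time, key, value in sorted(transaction_log, key=lambda e: e[0]):
        if snapshot_time < time <= target_time:
            state[key] = value
    return state


snapshots = [(0, {"a": 1}), (60, {"a": 1, "b": 2})]
log = [(30, "b", 2), (70, "a", 5), (90, "c", 9)]

at_75 = restore_to_point_in_time(snapshots, log, target_time=75)
at_45 = restore_to_point_in_time(snapshots, log, target_time=45)
```

      Restoring to minute 75 starts from the minute-60 snapshot and replays one write; restoring to minute 45 starts from the minute-0 snapshot instead. The write at minute 90 is never applied in either case because it happened after the chosen time.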

      Now in another video elsewhere in this course, I'm going to be covering read replicas, and these offer a way to significantly improve RPO if you want to recover from failure. So, RDS automated backups are great as a way to recover from failure, or as a restoration method for any data corruption, but they take time to perform a restore, so account for this within your RTO planning. Now once again, if appropriate for the exam that you're studying, you're going to get the chance to experience a restore in a demo lesson elsewhere in the course, which should reinforce the knowledge that you've gained within this theory video. If you don't see this then don't worry, it's not required for the exam that you're studying.

      At this point though, that is everything I wanted to cover in this video, so go ahead and complete the video, and when you're ready, I'll look forward to you joining me in the next.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      SUMMARY

      This study utilizes the developing chicken neural tube to assess the regulation of the balance between proliferative and neurogenic divisions in the vertebrate CNS. Using single-cell RNAseq and endogenous protein tagging, the authors identify Cdkn1c as a potential regulator of the transition towards neurogenic divisions. Cdkn1c knockdown and overexpression experiments suggest that low Cdkn1c expression enhances neurogenic divisions. Using a combination of clonal analysis and sequential knockdown, the authors find that Cdkn1c lengthens the G1 phase of the cell cycle via inhibition of cyclinD1. This study represents a significant advance in understanding how cells can transition between proliferative and asymmetric modes of division, the complex and varying roles of cycle regulators, and provides technical advance through innovative combination of existing tools.

      MAJOR AND MINOR COMMENTS

      Overall

      Sample numbers are missing or unclear throughout for all imaging experiments. The authors should add numbers of cells analysed and/or numbers of embryos for their results to be appropriately convincing.

      This information is now provided in the figure legends (numbers of cells analyzed and/or numbers of embryos) except for data in Figure 5, which are presented in a new Supplementary Table.

      Values and error bars on graphs must be defined throughout. Are the values means and error bars SD or SEM?

      We have used SD throughout the study. This information has now been added in figure legends.

      Results 2

      A reference should be provided for cell type distribution in spinal neural tube, where the authors state that cell bodies of progenitors reside within the ventricular zone.

      We now cite a recent review on spinal cord development (Saade and Marti, Nature Reviews Neuroscience, 2025) to illustrate this point.

      The authors state that Cdkn1c "was expressed at low levels in a salt and pepper fashion in the ventricular zone, where the cell bodies of neural progenitors reside, and markedly increased in a domain immediately adjacent to this zone which is enriched in nascent neurons on their way to the mantle zone. In contrast, the transcript was completely excluded from the mantle zone, where HuC/D positive mature neurons accumulate." It is not clear if this is referring only to E4 or also to E3 embryos. Indeed, Cdkn1c expression appears to be much more salt and pepper at E3 and only resolves into a clear domain of high expression adjacent to the mantle zone at E4. It may be helpful if this expression pattern could be described in a bit more detail highlighting the changes that occur between E3 and E4.

      We have now reformulated this paragraph as follows: "At E3, the transcript was expressed at low levels in a salt and pepper fashion in the ventricular zone, where the cell bodies of neural progenitors reside (Saade and Marti, 2025). One day later, at E4, this salt and pepper expression was still detected in the ventricular zone, while it markedly increased in the region of the mantle zone that is immediately adjacent to the ventricular zone. This region is enriched in nascent neurons on their way to differentiation that are still HuC/D negative. In contrast, the transcript was completely excluded from the more basal region of the mantle zone, where mature HuC/D positive neurons accumulate."

      It would be useful to annotate the ISH images in Fig 2A to show the ventricular and mantle zones as defined by immunofluorescence.

      Thank you for the suggestion. We have now added a dotted line that separates the ventricular zone from the mantle zone at E3 and E4 in Figure 2A

      Reference should be included for pRb expression dynamics.

      This section has been rewritten in response to comments from Reviewer #3, and now contains several references regarding pRb expression dynamics. See detailed response to Reviewer #3 for the new version

      Could the Myc tag insertion approach disrupt protein function or turnover? Why was the insertion target site at the C terminus chosen?

      The first reason was practical: at the time when we decided to generate a KI in Cdkn1c, we had already generated several successful KIs at C-termini of other genes, in particular using the P2A-Gal4 approach (see Petit-Vargas et al, 2024), and had not yet experimented with N-terminal Gal4-P2A. We therefore decided to use the same approach for Cdkn1c.

      We also chose to target the C-terminus to avoid affecting the active CKI domain which is located at the N-terminus.

      Nevertheless, the C-terminal targeting may have an impact on the turnover: it has been described that CDK2 phosphorylation of a threonine close to the C-terminus of Cdkn1c leads to its targeting for degradation by the proteasome from late G1 (Kamura et al, PNAS, 2003; doi: 10.1073/pnas.1831009100). We can therefore not rule out that the addition of the Myc tags close to this phosphorylation site modulates the dynamics of Cdkn1c degradation. We note, however, that we observed little overlap between the Cdkn1c-Myc and pRb signals in cycling progenitors, suggesting that Cdkn1c is effectively degraded from late G1.

      OPTIONAL Could a similar approach be used to tag Cdkn1c with a fluorescent protein to enable live imaging of dynamics?

      Although it could be done, we have not attempted to do this for CDKN1c because our current experience of endogenous tagging of several genes with a similar expression level (based on our scRNAseq data) and nuclear localization (Hes5, Pax7) with a fluorescent reporter shows that the fluorescent signal is extremely low or undetectable in live conditions. We therefore favored the multi-Myc tagging approach, and indeed we find that the Myc signal in progenitors is also very low even though it is amplified by the immunohistology method. This suggests that, most likely, the only signal that would be detected (if any) with a fluorescent approach would be the peak of expression in newborn neurons.

      In suppl Fig 1C nlsGFP-positive cells are shown in the control shRNA condition. How can this be explained and does it impact the interpretation of the findings?

      The reviewer refers to the control gRNA condition in panel C, that shows that two small patches of GFP-positive cells are visible in the whole spinal cord of this particular embryo.

      Technically, these "background" cells could have multiple origins. A spontaneous legitimate insertion at the CDKN1c locus by homologous recombination is possible, although we think it is unlikely, given the extremely short length of the homology arms; illegitimate insertions of the Myc-P2A-Gal4 cassette at off-target sites of the control gRNA are another possibility. Alternatively, a low-level leakage of Gal4 expression from the donor vector could lead to a detectable nls-GFP expression in a few cells via Gal4-UAS amplification.

      In any case, these cells are observed at a very low frequency (1 or 2 patches of cells/embryo) relative to the signal obtained in presence of the CDKN1c gRNA#1 (probably several thousand positive cells per embryo). This suggests that if similar "background" cells are also present in presence of the CDKN1c gRNA, they would not significantly contribute to the signal, and would not impact the interpretation.

      In Fig 2B, there are a number of Myc labelled cells in the mantle zone, whereas the in situ images show no appreciable transcript expression. Is this because the protein but not the transcript is present in these cells? Could the authors comment on this?

      It is indeed possible that the CDKN1c protein is more stable than the transcript in newborn neurons and remains detectable in the mantle zone after the mRNA disappears. In Gui et al, 2006, where they use an anti-CDKN1c antibody to label the protein in mouse spinal cord transverse sections at E11.5 (Figure 1B), a few positive cells are also visible basally. They could correspond to neurons that have not yet degraded CDKN1c, although it is unclear in the picture whether these cells are really in the mantle zone or in the adjacent dorsal root ganglion; we note that a similar differential expression dynamics between mRNA and protein has been described for Tis21/Btg2 in the developing mouse cortex, where the protein, but not the mRNA, is detected in some differentiated bIII-tubulin-positive neurons (Iacopetti et al, 1999).

      However, related to our response above to a previous comment from the same reviewer, we cannot rule out the possibility that the Myc tags modulate the turnover of CDKN1c protein and slow down the dynamics of its degradation in differentiating neurons.

      We have added a sentence to indicate the presence of these cells: "In addition, a few Myc-positive cells were located deeper in the mantle zone, where the transcript is no longer present, suggesting that the protein is more stable than the transcript."

      Results

      It should be mentioned how mRNA expression levels were quantified in the shRNA validation experiment (supp Fig 2A).

      We did not quantify the level of mRNA reduction; it was just evaluated by eye. We chose shRNA1 for the whole study because 1) we more consistently saw (by eye) a reduction in the signal on the electroporated side with this construct than with the other shRNAs, and 2) the effect on neurogenesis was also more consistent.

      We will perform additional experiments to provide some quantitation of the shRNA effect, as this is also requested by Reviewer #3.

      As our Cdkn1c KI approach offers a direct read-out of the protein levels in the ventricular and mantle zones, and since our shRNA strategy of "partial knock-down" is based on the idea that the shRNA effect should be more complete in progenitors expressing Cdkn1c at low levels than in newborn progenitors that express the protein at a higher level, we propose to validate the shRNA in the Cdkn1c-Myc knock-in background, by comparing the Myc signal intensity between control and Cdkn1c shRNA conditions

      Figure panels are not currently cited in order. Citation or figure order could be changed.

      We have now added a common citation of the panels referring to analyses at 24 and 48 hours after electroporation (now Figure 3A-F), allowing us to display the experimental data on the figure according to the timing post electroporation, while the text details the phenotype at the later time point first.

      The authors should provide representative images for the graphs shown in Fig 3A and 3B. These could go into supplementary if the authors prefer.

      We have added images in a revised version of Figure 3, as requested.

      A supplementary figure showing the Caspase3 experiment should be added.

      We have added data showing Caspase3 experiments in Supplementary Figure 3D

      OPTIONAL. Identification of sister cells in the clonal analysis experiments is based on static images and cannot be guaranteed. Could live imaging be used to watch divisions followed by fixation and immunostaining to confirm identity?

      We agree with the reviewer that direct tracking is the most direct method for the identification of pairs of sister cells. However, it remains technically challenging, and the added value compared to the retrospective identification would be limited, while requiring a great workload, especially considering the many different experimental conditions that we have explored in this study.

      Results 4

      How did the authors quantify the intensity of endogenous Myc-tagged Cdkn1c to confirm the validity of the Pax7 locus knock in? Can they show that the expression level was consistently lower than the endogenous expression in neurons? Quantification and sample numbers should be shown.

      We have not done these quantifications in the original version of the study. We will add a quantification of the signal intensity in the ventricular and mantle zones for the revised version of the manuscript, as also requested by reviewer #3.

      In Fig 4B, the brightness of row 2 column 1 is lower than the same image in row 2 column 2, which is slightly misleading, since it makes the misexpressed expression level look lower than it is compared with endogenous in column 3. Is this because only a single z-section is being displayed in the zoomed in image? If so, this should be stated in the figure legend.

      All images in the figure are single Z confocal images. Images in Column 2 (showing both electroporated sides of the same tube) were acquired with a 20x objective, whereas the insets shown in Columns 1 and 3 are 100x confocal images. 100x images on both sides were acquired with the same acquisition parameters, and the display parameters are the same for both images in the figure. The signal intensity can therefore be compared directly between columns 1 and 3.

      We have modified the legend of the Figure to indicate these points: "The insets shown in Columns 1 and 3 are 100x confocal images acquired in the same section and are presented with the same display parameters".

      In Fig 4D, the increase in neurogenic divisions is mainly because of the rise in terminal NN divisions according to the graph, but no clear increase in PN divisions. Could the authors comment on the significance of this?

      Our interpretation is that Pax7-CDKN1c misexpression experiments cause both PP to PN and PN to NN conversions. This is coherent with the classical idea of a progressive transition between these three modes of division in the spinal cord. Coincidentally, in our experimental conditions (timing of analysis and level of overexpression), the increase in PN resulting from PP to PN conversions is perfectly balanced by a decrease resulting from PN to NN conversions, giving the artificial impression that the PN compartment is unaffected. A less likely hypothesis would be that misexpression directly transforms symmetric PP into symmetric NN divisions, and that asymmetric PN divisions are insensitive to CDKN1c levels. We do not favor this hypothesis, because one would expect, in that case, that the shRNA approach would also not affect the PN compartment, and it is not what we have observed (see Figure 3H - previously 3F).

      We have modified the manuscript to elaborate on our interpretation of this result: "We observed an increase in the proportion of terminal neurogenic (NN) divisions and a decrease in proliferative (PP) divisions (Figure 4D). This suggests that CDKN1c premature expression in PP progenitors converts them to the PN mode of division, while the combined endogenous and Pax7-driven expression of CDKN1c converts PN progenitors to the NN mode of division. Coincidentally, at the stage analyzed, PP to PN conversions are balanced by PN to NN conversions, leaving the PN proportion artificially unchanged. The alternative interpretation of a direct conversion of symmetric PP into symmetric NN divisions is less likely, because the PN compartment was affected in the reciprocal CDKN1c shRNA approach (see Figure 3H)."

      Results 5

      The proportion of pRb-positive progenitors having entered S phase was stated to be higher at all time points; however, it is not significantly higher until 6h30 and is actually trending lower at 2h30.

      Thank you for pointing this out. We have modified the sentence in the main text.

      "We found that the proportion of pRb positive progenitors having entered S phase (EdU positive cells) was significantly higher at all time points examined more than 4h30 after FT injection in the Cdkn1c knock-down condition compared to the control population (Figure 5D)"

      OPTIONAL Could CyclinD1 activity be directly assessed?

      This is an interesting suggestion. For example, using the fluorescent CDK4/6 sensor developed by Yang et al (eLife, 2020; https://doi.org/10.7554/eLife.44571) in a CDKN1c shRNA condition would represent an elegant experimental alternative to complement our rescue experiments with the double CDKN1c/CyclinD1 shRNA. However, we fear that setting up and calibrating such a tool for in vivo usage in the chick embryo represents too much of a challenge for incorporation in this study.

      General

      Scale bars are missing in Fig S1C and Fig S4D.

      Thanks for pointing this out. Scale bars have been added in the figures and corresponding legends.

      OPTIONAL Could some of the main findings be replicated in another species, for example, mouse or human, to examine whether the mechanism is conserved?

      OPTIONAL Could approaches other than image analysis be used to reinforce findings, for example biochemical methods, RNAseq or FACS?

      We agree that it will be interesting and important that our findings are replicated in other species, experimental systems, and even tissues, or by alternative experimental approaches. Nevertheless, it is probably beyond the scope of this study.

      A model cartoon to summarise outcomes would be useful.

      We thank the reviewer for the suggestion. We will propose a summary cartoon for the revised version of the manuscript.

      Unclear how cells were determined to be positive or negative for a label. Was this decided by eye? If so, how did the authors ensure that this was unbiased?

      Positivity or negativity was decided by eye. However, for each experiment, we ensured that all images of perturbed conditions and the relevant controls were analyzed with the same display parameters and by the same experimenter to guarantee that the criteria to determine positivity or negativity were constant.

      Reviewer #1 (Significance (Required)):

      SIGNIFICANCE

      Strengths: This manuscript investigates the mechanisms regulating the switch from symmetric proliferative divisions to neurogenic division during vertebrate neuronal differentiation. This is a question of fundamental importance, the answer to which has eluded us so far. As such, the findings presented here are of significant value to the neurogenesis community and will be of broad interest to those interested in cell divisions and asymmetric cell fate acquisition. Specific strengths include:

      • Variety of approaches used to manipulate and observe individual cell behaviour within a physiological context.
      • A limitation of using the chicken embryo is the lack of available antibodies for immunostaining. The authors take advantage of recent advances in chicken embryo CRISPR strategy to endogenously tag the target protein with Myc, to facilitate immunostaining.
      • Innovative combination of genetic and labelling tools to target cells, for example, use of FlashTag and EdU in combination to more accurately assess G1 length than the more commonly used method.
      • Premature misexpression demonstrates that the previously observed dynamics indeed regulate cell fate.
      • Mechanistic insight by examining downstream target CyclinD1.
      • Clearly presented with useful illustrations throughout.
      • Logic is clear and examination thorough.
      • Conclusions are warranted on the basis of their findings.

      Limitations: This study primarily used visual analysis of fixed tissue images to assess the main outcomes. To reinforce the conclusions, these could be supplemented with live imaging to appreciate dynamics, or biochemical techniques to look at protein expression levels.

      Some aspects of quantification require explanation in order for the experiments to be replicated.

      It is imperative that precise sample sizes are included for all experiments presented.

      Advance: First functional demonstration of a role for Cdkn1c in regulating the neurogenic transition in progenitors.

      Conceptual advance suggesting Cdkn1c has dual roles in driving neurogenesis: promoting neurogenic divisions of progenitors and the established role of mediating cell cycle exit previously reported.

      Technical advances in the form of G1 signposting and endogenous Myc tagging using CRISPR in chicken embryonic tissue.

      Audience:

      Of broad interest to developmental biologists. Could be relevant to cancer, since Cdkn1c is implicated.

      Please define your field of expertise with a few keywords to help the authors contextualize your point

      Developmental biology, vertebrate embryonic development, neuronal differentiation, imaging. Please note that we have not commented on RNAseq experiments as these are outside of our area of expertise.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The work by Mida and colleagues addresses important questions about neurogenesis in the embryo, using the chicken neural tube as their model system. The authors investigate the mechanisms involved in the transition from stem cell self-renewal to neurogenic progenitor divisions, using a combination of single cell, gene functional and tracing studies.

      The authors generated a new single cell data set from the embryonic chicken spinal cord and identify a transitory cell population undergoing neuronal differentiation, which expresses Tis21, Neurog2 and Cdkn1c amongst other genes. They then study the role of Cdkn1c and investigate the hypothesis that it plays a dual role in spinal cord neurogenesis: low levels favour transition from proliferative to neurogenic divisions and high levels drive cell cycle exit and neuronal differentiation.

      Major comments

      I have only a general comment related to the main point of the paper. The authors claim that Cdkn1c onset in cycling progenitor drives transition towards neurogenic modes of division, which is different from its role in cell cycle exit and differentiation. Figures 3F and 4D are key figures where the authors analysed PP, PN and NN mode of divisions via flash tag followed by analysis of sister cell fate. If their assumption is correct, shouldn't they also see, for example in Fig. 4D, an increase in PN or is this too transient to be observed or is it bypassed?

      As already stated in our response to a similar question from reviewer #1, our interpretation is that Pax7-CDKN1c misexpression experiments cause both PP to PN and PN to NN conversions. This is coherent with the classical idea of a progressive transition between these three modes of division in the spinal cord. Coincidentally, in our experimental conditions (timing of analysis and level of overexpression), the increase in PN resulting from PP to PN conversions is perfectly balanced by a decrease resulting from PN to NN conversions, giving the artificial impression that the PN compartment is unaffected. A less likely hypothesis would be that misexpression directly transforms symmetric PP into symmetric NN divisions, and that asymmetric PN divisions are insensitive to CDKN1c levels. We do not favor this hypothesis, because one would expect, in that case, that the shRNA approach would also not affect the PN compartment, and it is not what we have observed (see Figure 3H - previously 3F).
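The balance argument above can be made explicit with a minimal bookkeeping sketch. The symbols are introduced here for illustration only and do not appear in the manuscript: $p_{PP}$, $p_{PN}$ and $p_{NN}$ are the control proportions of each division mode, $a$ is the assumed fraction of PP divisions converted to PN upon misexpression, and $b$ is the assumed fraction of PN divisions converted to NN. The sketch further assumes that each division shifts by at most one step.

```latex
\begin{align}
p'_{PP} &= (1-a)\,p_{PP}, \\
p'_{PN} &= (1-b)\,p_{PN} + a\,p_{PP}, \\
p'_{NN} &= p_{NN} + b\,p_{PN}.
\end{align}
% The three proportions still sum to 1.
% The PN proportion appears unchanged exactly when gains and losses cancel:
\begin{equation}
p'_{PN} = p_{PN} \iff a\,p_{PP} = b\,p_{PN}.
\end{equation}
```

Under this condition PP decreases and NN increases while PN looks unaffected, which is the pattern described for the misexpression experiment. A direct PP to NN conversion would instead predict that PN is insensitive to Cdkn1c levels in both directions, which is the alternative the response argues against.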

      At the moment, the calculations of PN and NN frequencies are merged in the text, so perhaps describing PN and NN numbers separately will help better understand the dynamics of this gradual process (especially since there is little to no difference in PN).

      Regarding the results of Pax7 overexpression presented in figure 4D (now Figure 4E in the revised version), we had made the choice to merge PN and NN values in the main text to focus on the neurogenic transition from PP to PN/NN collectively. We agree with this reviewer, as well as with reviewer #1, that it should be more detailed and better discussed. We therefore propose to modify the paragraph as follows (and as already indicated above in the response to reviewer #1):

      "We observed an increase in the proportion of terminal neurogenic (NN) divisions and a decrease in proliferative (PP) divisions (Figure 4D). This suggests that Cdkn1c premature expression in PP progenitors converts them to the PN mode of division, while the combined endogenous and Pax7-driven expression of Cdkn1c converts PN progenitors to the NN mode of division. Coincidentally, at the stage analyzed, PP to PN conversions are balanced by PN to NN conversions, leaving the PN proportion artificially unchanged. The alternative interpretation of a direct conversion of symmetric PP into symmetric NN divisions is less likely, because the PN compartment was affected in the reciprocal Cdkn1c shRNA approach (see Figure 3F, now 3H)."

      Could the increase in NN be compatible also with a role in cell cycle exit and differentiation, for example from cells that have been targeted and are still undergoing the last division (hence marked by flash tag) or there won't be any GFP cells marked by flash tag a day after expression of high levels of Cdkn1c?

      It is likely that a proportion of cells that would normally have done a NN division are pushed to a direct differentiation that bypasses their last division in the Pax7-CDKN1c condition, and that they contribute to the general increase in neuron production observed in our quantification 48hae (Figure 3F - previously 3C). However, these cases would not contribute to the increase in the NN quantification in pairs of sister cells 6 hours after division at 24hae (Figure 4E - previously 4D), because by design they would not incorporate FlashTag. The rise in NN is therefore the result of a PN to NN conversion.

      Basically, what would the effect of expressing higher levels of Cdkn1c be? I guess this will really help them distinguish between transition to neurogenic division rather than neuronal differentiation. If not experimentally, any further comments on this would be appreciated.

      These experiments have been performed and presented in the study by Gui et al., 2007, which we cite in the paper. Using a strong overexpression of CDKN1c from the CAGGS promoter, they showed a massive decrease in proliferation, assessed by BrdU incorporation, 24 hours after electroporation. We will cite this result more explicitly in the main text, and better explain how our approach differs. We propose the following modification:

      "We next explored whether low Cdkn1c activity is sufficient to induce the transition to neurogenic modes of division. A previous study has shown that overexpression of Cdkn1c driven by the strong CAGGS promoter triggers cell cycle exit of chick spinal cord progenitors, revealed by a drastic loss of BrdU incorporation 1 day after electroporation (Gui et al., 2007). As this precludes the exploration of our hypothesis, we developed an alternative approach designed to prematurely induce a pulse of Cdkn1c in progenitors, with the aim to emulate in proliferative progenitors the modest level of expression observed in neurogenic progenitors. We took advantage of the Pax7 locus, which is expressed in progenitors in the dorsal domain at a level similar to that observed for Cdkn1c in neurogenic precursors (Supplementary Figure 6A)."

      Minor comments

      Fig 3C my understanding is that HuC/D should be nuclear, but in fig 3C it seems more cytoplasmic (any comment?)

      Some studies suggest that HuC/D can, under certain conditions, be observed in the nucleus of neurons. However, HuC/D is an RNA-binding protein whose localization is mainly expected to be cytoplasmic. In our experience (Tozer et al, 2017), and in other publications using the antibody in the chick spinal cord (see, for example, le Dreau et al, 2014), it is observed in the cell body of differentiated neurons, as in the current manuscript.

      Fig Suppl 3E (and related 4B), immuno for Cdkn1c-Myc: to help the reader understand the difference between the immuno signals when looking at the figure, I would suggest writing on the panel i) Pax7-Cdkn1c-Myc and ii) endogenous Cdkn1c-Myc, rather than 'misexpressed' and 'endogenous', which is slightly confusing (especially because what it is called endogenous expression is higher).

      This has now been modified in the figures.

      Literature citing: Introduction and discussion are very nicely written, although they could benefit from some more recent literature on the topic. For example, Cdkn1c role as a gatekeeper of stem cell reserve in the stomach, gut, (Lee et al, CellStemCell 2022 PMID: 35523142) or some other work on symmetric/asymmetric divisions and clonal analysis in zebrafish (Hevia et al, CellRep 2022 PMID: 35675784, Alexandre et al, NatNeur PMID: 20453852), mammals (Royal et al, Elife 2023 37882444, Appiah et al, EMBO rep 2023 PMID: 37382163). Also, similar work has been performed in the developing pancreatic epithelium, where mild expression of Cdkn1a under Sox9rtTa control was used to lengthen G1 without overt cell cycle exit and this resulted in Neurog3 stabilization and priming for endocrine differentiation (Krentz et al, DevCell 2017 PMID: 28441528), so similar mechanisms might be in place to gradually shift progenitor towards stable decision to differentiate. Moreover, in the discussion, alongside Neurog2 control of Cdkn1c, it could be mentioned that the feedback loop between Cdk inhibitors and neurogenic factor is usually established via Cdk inhibitor-mediated inhibition of proneural bHLHs phosphorylation by CDKs (Krentz et al, DevCell 2017 PMID: 28441528, Ali et al, 24821983, Azzarelli et al 2017 - PMID: 28457793; 2024 - PMID:39575884). Further, in the discussion, could they mention anything about the following open questions: is there evidence for Cdkn1c low/high expression in mammalian spinal cord? Or maybe of other Cdk inhibitors? Is Cdkn1c also involved in cell cycle exit during gliogenesis? Or is there another Cdk inhibitor expressed at later developmental stages, hence linking this with specific cell fate decisions?

      We will modify the introduction and discussion in several instances, in order to address the above suggestions and we will:

      • add references to its role in other contexts and/or species.

      • expand the discussion on the cross talk between neurogenic factors and CDK inhibitors in other cellular contexts.

      • add a dedicated paragraph in the discussion to answer reviewer#2's questions: is there evidence for Cdkn1c low/high expression in mammalian spinal cord? Or maybe of other Cdk inhibitors? Is Cdkn1c also involved in cell cycle exit during gliogenesis or is there another Cdk inhibitor expressed at later developmental stages?

      Reviewer #2 (Significance (Required)):

      The work here presented has important implications on neural development and its disorders. The authors used the most advanced technologies to perform gene functional studies, such as CRISPR-HDR insertion of Myc-tag to follow endogenous expression, or expression under endogenous Pax7 promoter, often followed by flash tag experiments to trace sister cell fate, and all of this in an in vivo system. They then tested cell cycle parameters, clonal behaviour and modes of cell division in a very accurate way. Overall data are convincing and beautifully presented. The limitation is potentially in the resolution between the events of switching to neurogenic division versus neuronal differentiation, which might just warrant further discussion. This work advances our knowledge on vertebrate neurogenesis, by investigating a key player in proliferation and differentiation.

      I believe this work will be of general interest to developmental and cellular biologists in different fields. Because it addresses fundamental questions about the coordination between cell cycle and differentiation and fate decision making, some basic concepts can be translated to other tissues and other species, thus increasing the potential interested audience.

      My work focuses on stem cell fate decisions in mammalian systems, and I am familiar with the molecular underpinnings of the work here presented. However, I am not an expert in the chicken spinal cord as a model and yet the manuscript was interesting. I am also not sufficiently expert in the bioinformatic analysis, so cannot comment on the technical aspects of Figure 1 and the way they decided to annotate their data.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: In this study, Mida et al. analyze large-scale single-cell RNA-seq data from the chick embryonic neural tube and identify Cdkn1c as a key molecular regulator of the transition from proliferative to neurogenic cell divisions, marking the onset of neurogenesis in the developing CNS. To confirm this hypothesis, they employed classical techniques, including the quantification of neural cell-specific markers combined with the flashTAG label, to track and isolate isochronic cohorts of newborn cells in different division modes. Their findings reveal that Cdkn1c expression begins at low levels in neurogenic progenitors and becomes highly expressed in nascent neurons. Using a classical knockdown strategy based on short hairpin RNA (shRNA) interference, they demonstrate that Cdkn1c suppression promotes proliferative divisions, reducing neuron formation. Conversely, novel genetic manipulation techniques inducing low-level CDKN1c misexpression drive progenitors into neurogenic divisions prematurely.

      By employing cumulative EdU incorporation assays and shRNA-based loss-of-function approaches, Mida et al. further show that Cdkn1c extends the G1 phase by inhibiting cyclin D, ultimately concluding that Cdkn1c plays a dual role: first facilitating the transition of progenitors into neurogenic divisions at low expression levels, and later promoting cell cycle exit to ensure proper neural development.

      This study presents several ambiguities and lacks precision in its analytical methodologies and quantification approaches, which contribute to confusion and potential bias. To enhance the reliability of the conclusions, a more rigorous validation of the methods employed is essential.

      This study introduces a novel approach to tracking the fate of sister cells from neural progenitor divisions to infer the division modes. While previous methods for analyzing the division mode of neural progenitor cells have been implemented, rigorous validation of the approach introduced by Mida et al. is necessary. Furthermore, the concept of cell cycle regulators interacting to control the duration of specific cell cycle stages and influencing progenitor cell division modes has been explored before, potentially limiting the novelty of these findings.

      Major comments:

      1.-The study presents ambiguity and lacks precision in quantifying neural precursor division modes. The authors use phosphorylated retinoblastoma protein (pRb) as a marker for neurogenic progenitors, claiming its reliability in identifying neurogenic divisions.

      However, they do not provide a thorough characterization of pRb expression in the developing chick neural tube, leaving its suitability as a neurogenic division marker unverified.

      Throughout their comments on the manuscript, this reviewer raises several points regarding the characterization of pRb expression in our model and of our use of this marker in our study. We take these comments into account and propose to expand on pRb characteristics in the first occurrence of pRb as a marker of cycling cells in the manuscript. The modifications rely on:

      • the citation of several studies showing that phosphorylation of Rb is regulated during the cell cycle, and that "it is not detectable during a period of variable length in early G1 in several cell types (Moser et al, 2018; Spencer et al, 2013; Gookin et al, 2017), including neural progenitors in the developing chick spinal cord (Molina et al, 2022). Apart from this absence in early G1, pRb is detected throughout the rest of the cell cycle until mitosis".

      • a more detailed description of our own characterization of pRb dynamics in a synchronous cohort of cycling cells, which reveals a similar heterogeneity in the timing of the onset of Rb phosphorylation after mitosis. This description was initially shown in supplementary figure 3 and will be transferred to a new supplementary figure 2 to account for the fact that it will now be cited earlier in the manuscript.

      Regarding the specific question of the "suitability (of pRb) as a neurogenic division marker": we do not directly "use phosphorylated retinoblastoma protein (pRb) as a marker for neurogenic progenitors", but we use Rb phosphorylation to discriminate between progenitor (pRb+) and neuron (pRb-) identity in pairs of sister cells, in order to retrospectively identify the mode of division of their mother.

      Given that Rb is unphosphorylated during a period of variable length after mitosis (see references above), pRb is not a reliable marker of ALL cycling progenitors. We developed an assay to identify the timepoint (the maximal length of this "pRb-negative" phase) after which Rb is phosphorylated in all cycling progenitors (new Supplementary Figure 2). This assay relies on a time course of pRb detection in cohorts of FlashTag-positive pairs of sister cells born at E3. This time course experiment allowed us to identify a plateau after which the proportion of pRb-positive cells in the cohort remains constant. From this timepoint, this proportion corresponds to the proportion of cycling cells in the cohort. Rb phosphorylation therefore becomes a discriminating factor between cycling progenitors (pRb+) and non-cycling neurons (pRb-).

      We are confident that this provides a solid foundation for the determination of the identity of pairs of sister cells in all our Flash-Tag based assays, which retrospectively identify the mode of division of a progenitor on the basis of the phosphorylation status of its daughter cells 6 hours after division.
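The two steps described above (finding the plateau of pRb positivity in a FlashTag cohort, then reading the division mode from the pRb status of two sister cells) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis pipeline: the function names, the tolerance `tol`, and the plateau criterion (all later timepoints within `tol` of the final value) are assumptions introduced here, and the response states that positivity itself was scored by eye.

```python
from typing import Optional, Sequence


def plateau_start(times_h: Sequence[float],
                  frac_prb_pos: Sequence[float],
                  tol: float = 0.02) -> Optional[float]:
    """Return the earliest timepoint from which every later measurement
    stays within `tol` of the final measurement (hypothetical criterion)."""
    if len(times_h) != len(frac_prb_pos) or len(times_h) == 0:
        raise ValueError("times and fractions must be non-empty and equal length")
    final = frac_prb_pos[-1]
    start = None
    for t, f in zip(times_h, frac_prb_pos):
        if abs(f - final) <= tol:
            if start is None:
                start = t      # candidate start of the plateau
        else:
            start = None       # fraction still changing: reset the candidate
    return start


def division_mode(prb_a: bool, prb_b: bool) -> str:
    """Classify a pair of sister cells scored after the plateau:
    pRb+ is read as progenitor (P), pRb- as neuron (N)."""
    n_progenitors = int(prb_a) + int(prb_b)
    return {2: "PP", 1: "PN", 0: "NN"}[n_progenitors]
```

The classification is only meaningful for pairs scored at or after the plateau timepoint, since before it a pRb-negative cell could still be a progenitor in early G1.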

      We propose to modify the main text to describe the strategy and protocol more explicitly, by introducing the sentence highlighted in yellow in the following paragraph where the paired-cell analysis is first introduced (in the section on CDKN1c knock-down):

      "This approach allows to retrospectively deduce the mode of division used by the mother progenitor cell. We injected the cell permeant dye "FlashTag" (FT) at E3 to specifically label a cohort of progenitors that undergoes mitosis synchronously (Baek et al., 2018; Telley et al., 2016 and see Methods), and let them develop for 6 hours before analyzing the fate of their progeny using pRb immunoreactivity (Figure 3D). Our characterization of pRb immunoreactivity in the tissue had established beforehand that 6 hours after mitosis, all progenitors can reliably be detected with this marker (Supplementary Figure 2, Methods). Therefore, at this timepoint after FT injection, two-cell clones selected on the basis of FT incorporation can be categorized as PP, PN, or NN based on pRb positivity (P) or not (N) (see Methods, new Figure 3G and new Supplementary Figures 2 and 4)."

      We also modified the legend to Supplementary Figure 2 (previously Supplementary Figure 3), which describes the identification of the plateau of pRb, accordingly.

      Furthermore, retinoblastoma protein (Rb) and cyclin D interact crucially to regulate the G1/S phase transition of the cell cycle, with cyclin D/CDK complexes phosphorylating Rb. Since the authors conclude that CDKN1c primarily acts by inhibiting the cyclin D/CDK6 complex, it is likely that CDKN1c influences pRb expression or phosphorylation state. This raises the possibility that pRb could be a direct target of CDKN1c, whose expression and phosphorylation would be altered in gain-of-function (GOF) and loss-of-function (LOF) analyses of CDKN1c.

      In light of this, it would be more appropriate to consider pRb as a CDKN1c target and discuss the molecular mechanisms regulating cell cycle components.

      We agree with the reviewer that Rb phosphorylation may be a direct or indirect target of Cdkn1c activity, and exploring the molecular aspects of the cellular and developmental phenomena that we describe in our manuscript would represent an interesting follow up study.

      A more precise approach would involve using other markers or targets to quantify neural precursor division modes at earlier stages of neurogenesis.

      To complement our analyses of the modes of division, we propose to use a positive marker to assess neural identity in parallel to the absence of pRb within pairs of cells. This approach may be the most meaningful in the gain-of-function context (Pax7-driven expression of Cdkn1c) because in this context, the timepoint at which the plateau of Rb phosphorylation is reached in our FT-based assay may indeed be delayed. Conversely, in the loss-of-function context, the plateau may be reached earlier, which would have no effect on this assay.

      2.-Furthermore, the study employs FlashTag labeling to track daughter cells post-division, but the 16-hour post-injection window may result in misidentification of sister cells due to the potential presence of FlashTagged cells that did not originate from the same division.

      This introduces a risk of bias in quantification, data misinterpretation, and potential errors in defining division modes. A more rigorous validation of the FlashTag strategy and its specificity in tracking division pairs is necessary to ensure the reliability of their conclusions.

      The reviewer probably mistyped and meant 6 hours post-injection, which is the duration that we use for paired-cell tracking. We would like to emphasize that in addition to the FlashTag label, we benefit from the electroporation reporter to assess clonality. Altogether, we combine 5 criteria to define a clonal relationship:

      • 2 cells are positive for Flash Tag
      • The Flash Tag intensity is similar between the 2 cells
      • The 2 cells are positive for the electroporation reporter
      • The electroporation reporter intensity is similar between the two cells
      • The position of the two cells is consistent with the radial organization of clones in this tissue (Leber and Sanes, 1995; Loulier et al, 2014): they are found on a shared line along the apico-basal axis, and share the same dorso-ventral and antero-posterior position.

      This combination is already described in the Methods section. We propose to modify the paragraph to include the sentence highlighted in yellow in the text below:

      "Cell identity of transfected GFP positive cells was determined as follows: cells positive for pRb and FT were classified as progenitors and cells positive for FT and negative for pRb as neurons. In addition, a similar intensity of both the GFP and FT signals within pairs of cells, and a relative position of the two cells consistent with the radial organization of clones in this tissue (Leber and Sanes, 1995; Loulier et al, 2014) were used as criteria to further ascertain sisterhood. This combination restricts the density of events fulfilling all these independent criteria, and can confidently be used to ensure a robust identification of pairs of sister cells."

      3.- The knock-in strategy used to tag the endogenous CDKN1c protein in Figure 2 is an elegant tool to infer protein dynamics in vivo. However, since strong conclusions regarding CDKN1c dynamics during the cell cycle are drawn from this section, it would be advisable to strengthen the results by including quantification with adequate replication and proper statistical analysis, as the current findings are preliminary and somewhat speculative.

      - "Although pRb is specific for cycling cells, it is only detected once cells have passed the point of restriction during the G1 phase." Please provide literary reference confirming this observation.

      We have entirely remodeled this section, which describes the expression of Myc-tagged Cdkn1c relative to pRb, and now provide several references that describe the generally accepted view that pRb is specific to cycling cells, regulated during the cell cycle, and in particular absent in early G1. We also removed the mention of the "Restriction point" from the main text to avoid any confusion about the timing of phosphorylation, as the notion of restriction point is not useful in our study. The section now reads as follows:

      "To ascertain that Cdkn1c is translated in neural progenitors, we used an anti-pRb antibody, recognizing a phosphorylated form of the Retinoblastoma (Rb) protein that is specifically detected in cycling cells (Gookin et al., 2017; Moser et al., 2018; Spencer et al., 2013), including neural progenitors of the developing chick spinal cord (Molina et al., 2022). In the ventricular zone of transverse sections at E4 (48hae), we detected triple Cdkn1c-Myc/GFP/pRb positive cells (arrowheads in Figure 2B), providing direct evidence for the Cdkn1c protein in cycling progenitors. We also observed many double GFP/pRb positive cells that were Myc negative (arrowheads in Figure 2B). The observation of UAS-driven GFP in these pRb-positive cells is evidence for the translation of Gal4 and therefore provides a complementary demonstration that the Cdkn1c transcript is translated in progenitors. The absence of Myc detection in these double GFP/pRb positive cells also suggests that Cdkn1c/Cdkn1c-Myc stability is regulated during the cell cycle.

      Finally, we observed double Myc/GFP-positive cells that were pRb-negative (Figure 2B; asterisks). One characteristic of Rb phosphorylation as a marker of cycling cells is a period in early G1 during which it is not detectable, as described in several cell types (Gookin et al., 2017; Moser et al., 2018; Spencer et al., 2013) including chick spinal cord neural progenitors (Molina et al., 2022). Using a method that specifically labels a synchronous cohort of dividing cells in the neural tube, we similarly observed a period in early G1 during which pRb is not detectable in some progenitors at E3 (See Supplementary Figure 2 and Methods). Hence, the double Myc/GFP positive and pRb negative cells may correspond to progenitors in early G1. Alternatively, they may be nascent neurons whose cell body has not yet translocated basally (see Figure 2C). Finally, we observed a pool of GFP positive/pRb negative nuclei with a strong Myc signal in the region of the mantle zone that is in direct contact with the ventricular zone (VZ), corresponding to the region where the transcript is most strongly detected (see Figure 2A). This pool of cells with a high Cdkn1c expression likely corresponds to immature neurons exiting the cell cycle and on their way to differentiation (Figure 2B; double asterisks). In addition, a few Myc positive cells were located deeper in the mantle zone, where the transcript is no more present, suggesting that the protein is more stable than the transcript.

      In summary, our dual Myc and Gal4 knock-in strategy which reveals the history of Cdkn1c transcription and translation confirms that Cdkn1c is expressed at low level in a subset of progenitors in the chick spinal neural tube, as previously suggested (Gui et al., 2007; Mairet-Coello et al., 2012). In addition, the restricted overlap of Cdkn1c-Myc detection with Rb phosphorylation suggests that in progenitors, Cdkn1c is degraded during or after G1 completion. "

      This section will again be remodeled in a future revised version of the manuscript, in which we will add quantifications of Myc levels, as requested by Reviewer 1 above, and also by Reviewer #3 below.

      Given that pRb immunoreactivity is used as a marker for cycling progenitors to base many of the results of this study, it would be very valuable to characterize the dynamics of pRb in cycling cells in the studied tissue, for instance combined with the cell cycle reporter used by Molina et al. (Development 2022).

      In the original version of the manuscript, the section describing the dynamics of CDKN1c-Myc in the KI experiments presented in Figure 2 relied on the idea that the dynamics of pRb in chick spinal progenitors is similar to what is described in other tissues and cell types, without providing any references to substantiate this fact. Actually, Molina et al provide a characterization of pRb in combination with their cell cycle reporter and conclude that pRb negative progenitors are in G1 ("We also verified that phospho-Rb- and HuC/D-negative cells were in G1 by using our FUCCI G1 and PCNA reporters"). We will now cite this reference to support our claim. In addition, our characterization of Rb progressive phosphorylation in the synchronous Flash-Tag cohort of newborn sister cells provides a complementary demonstration that a fraction of the progenitors are pRb-negative when they exit mitosis (i.e. in early G1). This analysis was initially only introduced in the supplementary Figure 3, as support for the section that presents the Paired-cell assay used in Figure 3. We propose to introduce the data from Supplementary Figure 3 earlier in the manuscript (now Supplementary Figure 2), in order to better introduce the reader to the dynamics of pRb in cycling cells in our model. This will better support our description of the Cdkn1c-Myc dynamics in relation to pRb. We therefore propose to reformulate this whole section as follows.

      - It would be valuable to analyse the dynamics of Myc immunoreactivity in combination of pRb in all three gRNAs (highlighted in Supplementary Figure 1), as it would be a strong point in favour that the dynamics reflect the endogenous CDKN1c dynamics.

      - It would be very valuable to provide a quantification of said dynamics (e.g. plotting myc intensity / pRb immunoreactivity along the apicobasal axis of the tissue).

      These are two interesting suggestions. To complement our data with guide #1, we have performed Myc-immunostaining experiments on transverse sections in the context of guide #3, showing exactly the same pattern of Myc signal, with low expression in the VZ, and a peak of signal in the part of the mantle zone that is immediately touching the VZ. This confirms the specificity of the spatial distribution of the Cdkn1c-Myc signal. These data have been added in a revised version of Supplementary Figure 1.

      We will perform the suggested quantifications using guides #1 and #3, which both show a good KI efficiency. We do not think it is useful to do these experiments with guide #2, whose efficiency is much lower, and which would lead to a very sparse signal.

      - The characterization of dynamics is performed only with one of the gRNAs (#1) on the basis that it produces the strongest NLS-GFP signal, as a proxy for guide efficiency. It would be nice if the authors could validate guide cutting efficiency via sequencing (e.g. using a Cas9-T2A-GFP plasmid and sorting for positive cells).

      We will perform these experiments to validate guide cutting efficiency using the TIDE method (Brinkman et al, 2014).

      - In order to make sure that the dynamics inferred from Myc-tag immunoreactivity do reflect the cell cycle dynamics of CDKN1c-myc, it would be advisable to confirm in-frame insertion of the myc-tag sequence.

      We will perform genomic PCR experiments to confirm in-frame insertion of the Myc tags at the Cdkn1c locus.

      4.- In Figure 3, the authors use a short-hairpin-mediated knock-down strategy to decrease the levels of Cdkn1c, and show that this manipulation leads to an increased percentage of cycling progenitors and a decrease in the number of neurons in electroporated cells.

      The authors claim that their shRNA-based knockdown strategy aims to reduce low-level Cdkn1c expression in neurogenic progenitors while minimally affecting the higher expression in newborn neurons required for cell cycle exit. However, several factors need consideration. Electroporation introduces variability in shRNA delivery, making it difficult to achieve consistent gene inhibition across all cells, especially for dose-dependent genes like Cdkn1c.

      Additionally, Cdkn1c generates multiple isoforms, which may not be fully annotated in the chick genome, raising the possibility that the shRNA targets specific isoforms, potentially explaining the observed low expression.

      All the predicted isoforms in the chick genome contain the sequence targeted by shRNA1, which is located in the CKI domain, the region of the protein that is most conserved between species. Besides, all the isoforms annotated in the mouse and human genomes also contain the region targeted by shRNA1. We are therefore confident that shRNA1 should target all chick isoforms.

      A more rigorous approach, such as qPCR analysis of sorted electroporated cells, would better validate the expression levels, rather than relying on in situ hybridization, presenting electroporated and non-electroporated cells in the same section (Supp. Figure 2).

      This approach (qRT-PCR on sorted cells) would enable us to focus solely on electroporated cells, but it would result in an averaged quantification of Cdkn1c depletion. In order to obtain additional information on the shRNA-dependent decrease in Cdkn1c in the different neural cell populations (progenitor versus differentiating neuron), we propose an alternative approach consisting of monitoring the level of Cdkn1c protein, assessed through Cdkn1c-Myc signal in knock-in cells, in the presence versus absence of Cdkn1c shRNA.

      - As the authors note, "Unambiguous identification of cycling progenitors and postmitotic neurons is notoriously difficult in the chick spinal cord". "markers of progenitors usually either do not label all the phases of the cell cycle (eg. Phospho-Rb, thereafter pRb), or persist transiently in newborn neurons (eg. Sox2)." Given that pRb immunoreactivity is used as the basis for a lot of the conclusions in this study, it would be valuable to add a characterization of its dynamics as mentioned in Figure 2, as well as provide literary references/proof that Sox2 expression persists in newborn neurons.

      We have addressed the case of pRb dynamics in progenitors above and added a reference documenting pRb expression during the cell cycle of chick neural progenitors (Molina et al, 2022).

      Regarding Sox2 persistence: we consistently detect a small fraction of double positive Sox2+/HuC/D+ cells in chick spinal cord transverse sections. We have shown that this marker of differentiating neurons (HuC/D) only becomes detectable more than 8 hours after mitosis in newborn neurons at E3 (Baek et al, 2018), indicating that Sox2 protein can persist for up to at least 8 hours in newborn neurons.

      We now cite a paper showing that a similar persistence of Sox2 protein is reported in differentiating neurons of the human neocortex, where double Sox2/NeuN positive cells are frequently observed in cerebral organoids (Coquand et al, Nature Cell Biology 2024).

      - The undefined population (pRb-/HuCD-) introduces an unknown that assumes that the percentage of progenitors in G1 phase before the restriction point and the number of newborn neurons are equal for both conditions in an experiment. Can the authors provide explanation for this assumption?

      We do not think that these numbers are equal for both conditions, and we did not formulate this assumption. We only indicate (in the methods section) that this undefined/undetermined population (based on negativity for both markers) is a mix of two possible cell types. However, we do not offer any interpretation of the CDKN1c phenotypes based on the changes in this population. Indeed, our interpretation of the knock-down phenotype is solely based on the increase in pRb-positive and decrease in HuC/D-positive cells, which both suggest a delay in neurogenesis. We understand from the reviewer's comment that depicting an "undefined" population on the graph may cause some confusion. We therefore propose to present the data on pRb and HuC/D in different graphs, rather than on a combined plot, and to remove the reference to undefined cells in Figure 3, as well as in Figures 4 and 5 depicting the gain of function and double knock-down experiments. We have implemented these changes in updated versions of the figures.

      - In Gui et al. (Dev Biol 2006), authors showed that a knockdown of Cdkn1c leads to a failure of nascent neurons to exit the cell cycle and causes them to re-enter the cell cycle, shown by ectopic mitoses. In that study, cells born from those ectopic mitoses eventually leave the cell cycle leading to an increase in the number of neurons. Can the authors check for ectopic mitoses at 24hpe and 48hpe?

      We have now performed experiments with an anti phospho Histone 3 antibody, which labels mitotic cells, at 24 and 48 hours post electroporation. We do not see any ectopic mitoses upon Cdkn1c knock-down with this marker, and we have produced a Supplementary Figure with these data. This is consistent with the fact that we also do not see ectopic pRb or Sox2 positive cells in the mantle zone in the knock-down experiments. These data (pH3 and Sox2) have been added in the new Supplementary Figure 3E and F.

      We have now modified the main text to include these data:

      "In the context of a full knock-out of Cdkn1c in the mouse spinal cord, a reduction in neurogenesis was also observed, which was attributed to a failure of prospective neurons to exit the cell cycle, resulting in the observation of ectopic mitoses in the mantle zone (Gui et al, 2007). In contrast with this phenotype, using an anti phospho-Histone3 antibody, we did not observe any ectopic mitoses 24 or 48 hours after electroporation in our knock-down condition (Supplementary Figure 3E-F). This is consistent with the fact that we also do not observe ectopic cycling cells with pRb (Figure 3A and D) and Sox2 (Supplementary Figure 3E-F) antibodies. We therefore postulated that the reduced neurogenesis that we observe upon a partial Cdkn1c knock-down may result from a delayed transition of progenitors from the proliferative to neurogenic modes of division."

      - The authors then address the question of whether the decrease in neuron number is due to the failure of newborn neurons to exit the cell cycle or to a delay in the transition from proliferative to neurogenic divisions. For that, they implement a strategy to label a synchronized cohort of progenitors based of incorporation of a FlashTag dye.

      - Given that this strategy is the basis of many of the experiments in this article, it would be very valuable to expand on the validation of this technique as cited in major comment #2. In figure 3E, the close proximity of cell pairs in PP and PN clones shown in the pictures makes their sibling status apparent. However, this is not the case for the NN clone. Can the authors further explain with what criteria they determined the clonal status of two FlashTag labelled cells?

      The key criterion for cells that are not directly touching each other is that their relative position corresponds to the classical "radial" organization of clones in this tissue (Leber and Sanes, 1995; Loulier et al, Neuron, 2014). In other words, we make sure that they are located on the same apico-basal axis, as is the case for the NN clone presented on the figure. As stated above in our response to major comment #2, we have modified the Methods section accordingly.

      Can they provide further image examples of different types of clones?

      We now provide additional examples in a new Supplementary Figure 4.

      - Can the authors show that the plateau reached in Sup Figure 3 for pRb immunoreactivity corresponds to a similar dynamic for HuC/D immunoreactivity?

      The plateau for Rb phosphorylation in progenitors is reached before 6 hours post mitosis at E3. At the same age, we have previously shown (Baek et al, PLoS Biology 2018) in a similar time course experiment in pairs of FT+ cells that the HuC/D signal is not detected in newborn neurons 8 hours after mitosis. HuC/D only starts to appear between 8 and 12 hours, and still increases between 8 and 16 hours. The plateau would therefore be very delayed for HuC/D compared to pRb. This long delay in the appearance of this "positive" marker of neural differentiation is the main reason why we chose to use Rb phosphorylation status for the analysis of synchronous cohorts of pairs of sister cells: pRb becomes a discriminating factor much earlier than HuC/D after mitosis.

      - In order to further validate the strategy, could the authors use it at different stages to validate if they can replicate the different percentages of PP/PN/NN reported in the literature (e.g. Saade Cell Rep 2013)?

      We have carried out similar experiments at E2, showing a plateau of 95% of pRb-positive cells in the FT-positive population (see graph on the right). This provides a retrospective estimate of the mode of division of the mother cells at this stage (roughly 90% of PP and 10% of PN) which is consistent with the vast majority of PP divisions described by Saade et al (2013, see Figure S1) at this stage.
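
      The step from a 95% pRb-positive plateau to "roughly 90% of PP and 10% of PN" can be made explicit with a short calculation. The sketch below is only illustrative and rests on assumptions that are not stated in the response: NN divisions are taken as negligible at E2, every pRb-positive FT+ cell is counted as a progenitor (P) daughter, and both daughters of each division are equally likely to be recovered. The function name and its arguments are hypothetical.

```python
def division_modes_from_plateau(p_fraction, nn_fraction=0.0):
    """Estimate PP and PN fractions of mother-cell divisions from the plateau
    fraction of progenitor (pRb-positive) daughters.

    A PP division contributes 2 P daughters out of 2, a PN division 1 out of 2,
    and an NN division 0 out of 2, so:
        p_fraction = PP + PN / 2,   with   PP + PN + NN = 1
    nn_fraction is an assumed value (default 0), not a measured one.
    """
    if not 0.0 <= nn_fraction <= 1.0:
        raise ValueError("nn_fraction must lie between 0 and 1")
    pn = 2.0 * (1.0 - nn_fraction - p_fraction)
    pp = 1.0 - nn_fraction - pn
    if pn < 0.0 or pp < 0.0:
        raise ValueError("plateau value is incompatible with the assumed NN fraction")
    return pp, pn


# Plateau reported at E2: 95% of FT+ cells are pRb-positive.
pp, pn = division_modes_from_plateau(0.95)
print(f"PP = {pp:.2f}, PN = {pn:.2f}")
```

      Under these assumptions the plateau value alone fixes the PP/PN split. If a non-zero NN fraction were assumed instead, the same plateau would correspond to more PP and fewer PN divisions.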

      5.- In Figure 4, the strategy used to induce a low-dose overexpression of CDKN1c is an elegant method to introduce CDKN1c-Myc expression under the control of the endogenous Pax7 promoter, active in proliferative progenitors. The main point to address is:

      - Please provide proof that Pax7 expression is not altered in guides with a successful knock-in event (e.g. sorting and WB against the Pax7 protein) or the immunohistochemistry as performed in the Pax7-P2A-Gal4 tagging in Petit-Vargas et al., 2024.

      We have now performed Pax7 immunostainings on transverse sections at 24 and 48 hours post electroporation, both with the Pax7-CDKN1c-Gal4 and with the Pax7-Gal4 control constructs. We present these data in the new Supplementary Figure 7. In both conditions, we find that the Pax7 protein is still present in KI-positive cells. We observe a modest increase in Pax7 signal intensity in these cells, suggesting either that the insertion of exogenous sequences stabilizes the Pax7 transcript, or that the C-terminal modification of Pax7 protein with the P2A tag increases its stability. This does not affect the interpretation of the CDKN1c overexpression phenotype, because the control for this experiment is the Pax7-Gal4 construct, which shows the same modification of Pax7 stability. We have introduced this comment in the legend of Supplementary Figure 7.

      - Given the cell cycle regulated expression and activity of CDKN1c, can the authors elaborate on whether this is regulated at the promoter level?

      Cdkn1c transcription is regulated by multiple transcription factors and non-coding RNAs (see for example Creff and Besson, 2020, or Rossi et al, 2018 for a review). To our knowledge, these studies focus more on the regulation of Cdkn1c global expression than on the regulation of its levels during cell cycle progression. Although it is very likely that transcriptional regulation contributes, post-translational regulation, and in particular degradation by the proteasome, is also a key factor in the cell cycle regulation of Cdkn1c activity.

      If so, how does this differ from the promoter activity of Pax7?

      The transcriptional regulation of Pax7 and Cdkn1c is probably controlled by different regulators, since their expression profiles are very different. Regardless of the mechanisms that control their expression, the rationale for choosing Pax7 as a driver for Cdkn1c expression was that Pax7 expression precedes that of Cdkn1c in the progenitor population, and that it disappears in newborn neurons, when that of Cdkn1c peaks. This provided us with a way to advance the timing of Cdkn1c expression onset in proliferative progenitors.

      - It would be advisable to characterize the dynamics along the cell cycle for the overexpressed form of CDKN1c-Myc relative to pRb, similarly to what was done in Figure 2B.

      We will carry out experiments similar to those shown in Figure 2B in order to characterise the dynamics of Cdkn1c in a context of overexpression, in relation to pRb.

      In addition, we will include a more precise quantification of the "misexpressed" compared to "endogenous" Cdkn1c-Myc levels, as already mentioned in the answer to a request by Reviewer #1.

      6.- In Figure 5, the authors use a double knock-down strategy to test the hypothesis that the effect of Cdkn1c in G1 length is partially at least through its inhibition of CyclinD1. Results show that double shRNA-mediated knock-down of CyclinD1 and Cdkn1c counteracts the effects of Cdkn1c-sh alone on EdU incorporation, PP/PN/NN cell divisions and overall ratios of progenitors and neurons.

      - In the measurement of progenitor cell cycle length in Figure 5A, it would be more appropriate to present the nonlinear regression method described by Nowakowski et al. (1989), as has been commonly used in the field (Saade et al., 2013, PMID: 23891002, Le Dreau et al., 2014, PMID: 24515346, Arai et al., 2011, PMID: 21224845).

      The Nowakowski nonlinear regression method has often been used in the literature in the same tissue, generally to calculate fixed values for Tc, Ts, etc. This method is based on several selective criteria, and in particular the assumption that "all of the cells have the same cycle times". Yet, many studies have documented that cell cycle parameters change during the transition from proliferative to neurogenic modes of division, which is the period during which our analysis is performed; live imaging data in the chick spinal cord have illustrated very different cell cycle durations at a given time point (see Molina et al). We therefore think that the proposed formulas do not reflect the heterogeneous reality of neural progenitors of the embryonic spinal cord. However, the cumulative approach described by Nowakowski is useful to show qualitative differences between populations (e.g. a global decrease of the cycle length, as in our comparison between control and shRNA conditions). For these reasons, we prefer to display only the raw measurements rather than the regression curves.

      - Cumulative EdU incorporation in spinal progenitors (pRb-positive) at E3 (24 hours after injection) showed that the proportion of EdU-positive progenitors reached a plateau at 14 hours in control conditions, which is later than what has been reported in Le Dreau et al., 2014 (PMID: 24515346). Can you explain why?

      Le Dreau et al count the EdU+ proportion of cells in the total population of electroporated cells located in the VZ (which includes progenitors, but also future neurons that have been labelled during the previous cycles, at least for the time points after 2 hours, and have not yet translocated to the mantle zone), whereas we only consider pRb+ progenitors in the analysis. In addition, the experiments are not performed at the same developmental stage. Altogether, this may account for the different curves obtained in our study.

      - It would be interesting to measure G1 length as in Figure 5D for the double cdkn1c-sh - ccnd1-sh knock down condition, to see if it rescues G1 length. As well as in the Ccnd1 knock down condition alone to see if it increases G1 length in this context as well.

      We will perform cumulative EdU incorporation experiments similar to that shown in Figure 5D to measure G1 length in the double cdkn1c-sh - ccnd1-sh knock-down condition, as well as in the Ccnd1 knock-down condition alone.

      Minor comments

      Introduction:

      • The introduction should include references of studies of the role of Cdkn1c in cortical development (Imaizumi et al. Sci Rep 2020, Colasante et al. Cereb Cortex 2015, Laukoter et al. Nature Communications 2020).

      We will modify the introduction in several instances, in order to address suggestions by Reviewers #2 (see above) and #3, in particular to expand the description of the role of Cdkn1c during cortical development.

      1) Transcriptional signature of the neurogenic transition (Figure 1).

      - In the result section, it would be informative to include the genes used to determine the progenitor and neuron score (instead of in Methods).

      We have now listed the genes used to determine the progenitor and neuron score in the main text of the result section.

      - Figure 1A. It would be informative to add in the diagram what "filtering" means (eg. Neural crest cells).

      We have now added the detail of what 'filtering' means in the diagram.

      - In the result section, "However, while Tis21 expression is switched off in neurons, Cdkn1c transiently peaks at high levels in nascent neurons before fading off in more mature cells." Missing literary reference or data to clearly demonstrate this point.

      We have reworded this sentence, adding a reference to the expression profile of Tis21. The paragraph now reads as follows:

      "However, Cdkn1c expression is maintained longer and transiently peaks at high levels after Tis21 expression is switched off. Given that Tis21 is no longer expressed in neurons (Iacopetti et al, 1999), this suggests that Cdkn1c expression is transiently upregulated in nascent neurons before fading off in more mature cells."

      - "Interestingly, the gene cluster that contained Tis21 also contained genes encoding proteins with known expression and/or functions at the transition from proliferation to differentiation, such as the Notch ligand Dll1, the bHLH transcription factors Hes6, NeuroG1 and NeuroG2, and the coactivator Gadd45g." Missing references.

      We have now added references linking the function and/or expression profile of these genes to the neurogenic transition: Dll1 (Henrique et al., 1995), the bHLH transcription factors Hes6 (Fior and Henrique, 2005), NeuroG1 and NeuroG2 (Lacomme et al., 2012; Sommer et al., 1996) and the coactivator Gadd45g (Kawaue et al., 2014).

      - There is an error in the color code in Cell Clusters in Figure 1C (cluster 4 yellow in the legend but ocre in the figure)

      - Figure Sup3B colour code is switched (green for PP and red for NN) compared to the rest of the paper.

      We have corrected the colour code errors in Figure 1C and Supp Figure 3B (now changed to Supplementary Figure 5 in the modified revision).

      It would be valuable to assign cell cycle stage to neural progenitor cells (based on cell cycle score) and determine whether cdkn1c at the transcript level also shows enrichment in G1 cells considered to be progenitors.

      We have so far refrained from performing the suggested combined analysis based on cell cycle and cell type scores, as the "neurogenic progenitor population" (based on neurogenic progenitor score values) in which Cdkn1c expression is initiated represents a small number of cells in our scRNAseq, and we felt that the significance of such an analysis would be uncertain. We will nevertheless perform this analysis in the revised version.

      2) Progressive increase in Cdkn1c/p57kip2 expression underlie different cellular states in the embryonic spinal neural tube (Figure 2).

      - Figure 2A. Scale bar is missing in E3 and E4. It is important to consider the growth of the developing spinal cord and present it accordingly (E3 transverse section, Figure 2).

      The scale bar is actually valid for the whole panel A. The E2 section in the original figure appeared as "large" as the E3 section along the DV axis probably because the cutting angle was not perfectly transverse at E2, artificially lengthening the section. In a new version of the figure, we have replaced the E2 images with another section from the same experiment. The scale bar remains valid for the whole panel.

      - Figure 2 could use a diagram of the knock-in strategy used, similar as the one in Figure 4A.

      We have now added a diagram for the knock-in strategy in Figure 2B, and modified the legend of the figure accordingly.

      - Indicate hours post-electroporation. Indicate which guide is used in the main text.

      We have now added the post-electroporation timing and guide used in the main text.

      3) Downregulation of Cdkn1c in neural progenitors delays the transition from proliferative to neurogenic modes of division (Figure 3).

      - In methods: "Thus, to reason on a more homogeneous progenitor population, we restricted all our analysis to the dorsal one half or two thirds of the neural tube." Indicate when and depending on what one half or two thirds of the neural tube were analysed.

      - Are the clonal analysis experiments (Fig 3D, E and F) also restricted to the dorsal region?

      We have modified this sentence as follows: "Thus, to reason on a more homogeneous progenitor population, we restricted all our analysis to the dorsal two thirds of the neural tube, except for the Pax7-Cdkn1c misexpression analysis, which was performed in the more dorsal Pax7 domain."

      This is valid for both the whole-population and clonal analyses.

      - Figure 3. Would have a better flow if 3C preceded 3A and 3B.

      We have modified the Figure accordingly.

      - Figure 3C. it would be informative to show pictures of the electroporated NT at both 24hpe and 48hpe, as well as highlighting the dorsal part of the neural tube that was used for quantification.

      We have modified the Figure accordingly.

      - In methods "At each measured timepoint (1h, 4h, 7h, 10h, 12h, 14 and 17h after the first EdU injection), we quantified the number of EdU positive electroporated progenitors (triple positive for EdU, pRb and GFP) over the total population of electroporated progenitor cells (pRb and GFP positive) (Figure 3B)." Explanation does not correspond to Figure 3B.

      This explanation corresponds indeed to Figure 5A. We have corrected this mistake in the new version of the manuscript.

      4) Inducing a premature expression of Cdkn1c in progenitors triggers the transition to neurogenic modes of division (Figure 4.).

      - "We took advantage of the Pax7 locus, which is expressed in progenitors in the dorsal domain at a level similar to that observed for Cdkn1c in neurogenic precursors (Supplementary Figure 4A)". Missing reference or data showing that Pax7 is restricted to the dorsal domain.

      We have added references to the expression profile of Pax7 in the dorsal neural tube (Jostes et al, 1990). In addition, the new Supplementary Figure 7 shows anti-Pax7 staining that confirms this expression pattern at E3 and E4.

      - "its intensity was similar to the one observed for endogenous Myc-tagged Cdkn1c in progenitors (Figure 4B and Supplementary Figure 4E), and remained below the endogenous level of Myc-tagged Cdkn1c observed in nascent neurons, confirming the validity of our strategy". It would be valuable to add a quantification to demonstrate this point, either by fluorescence levels or WB of nls-GFP cells.

      As stated in the response to Major Point 5 above, we will perform a quantification based on Myc immunofluorescence to compare endogenous Cdkn1c expression versus Cdkn1c expression upon overexpression.

      - "At the population level, at E4, Cdkn1c expression from the Pax7 locus resulted in a strong reduction in the number of progenitors (pRb positive cells)". Indicate in the main text that this is 48hpe.

      We have added in the main text that the quantification was performed 48 hae.

      - Legend of figure 4D should indicate that the quantification has been done 24hpe.

      We have added the timing of quantification in the legend of Figure 4D.

      - "To circumvent the cell cycle arrest that is triggered in progenitors by strong overexpression of Cdkn1c (Gui et al., 2007)". It would be advisable to expand on this reference on the text, or ideally to include a simple Cdkn1c overexpression experiment.

      These experiments have been performed and presented in the study by Gui et al., 2007, which we cite in the paper. Using a strong overexpression of CDKN1c from the CAGGS promoter, they showed a massive decrease in proliferation, assessed by BrdU incorporation, 24 hours after electroporation. We will cite this result more explicitly in the main text, and better explain how our approach differs. We propose the following modification:

      "We next explored whether low Cdkn1c activity is sufficient to induce the transition to neurogenic modes of division. A previous study has shown that overexpression of Cdkn1c driven by the strong CAGGS promoter triggers cell cycle exit of chick spinal cord progenitors, revealed by a drastic loss of BrdU incorporation 1 day after electroporation (Gui et al., 2007). As this precludes the exploration of our hypothesis, we developed an alternative approach designed to prematurely induce a pulse of Cdkn1c in progenitors, with the aim to emulate in proliferative progenitors the modest level of expression observed in neurogenic progenitors. We took advantage of the Pax7 locus, which is expressed in progenitors in the dorsal domain at a level similar to that observed for Cdkn1c in neurogenic precursors (Supplementary Figure 4A)."

      - "We observed a massive increase in the proportion of neurogenic (PN and NN) divisions rising from 57% to 84% at the expense of proliferative pairs (43% PP pairs in controls versus 16% in misexpressing cells, Figure 4D)." adding the percentages in the main text is a bit inconsistent with how the rest of the data is presented in the rest of the sections.

      This whole section has been modified in response to a question from reviewer 1. The new version does not contain percentages in the main text, and reads as follows:

      "Using the FlashTag cohort labeling approach described above, we traced the fate of daughter cells born 24 hae. We observed an increase in the proportion of terminal neurogenic (NN) divisions and a decrease in proliferative (PP) divisions (Figure 4D). This suggests that CDKN1c premature expression in PP progenitors converts them to the PN mode of division, while the combined endogenous and Pax7-driven expression of CDKN1c converts PN progenitors to the NN mode of division. Coincidentally, at the stage analyzed, PP to PN conversions are balanced by PN to NN conversions, leaving the PN proportion artificially unchanged. The alternative interpretation of a direct conversion of symmetric PP into symmetric NN divisions is less likely, because the PN compartment was affected in the reciprocal CDKN1c shRNA approach (see Figure 3F). Overall, these data show that inducing a premature low-level expression of Cdkn1c in cycling progenitors is sufficient to accelerate the transition towards neurogenic modes of division."

      - Figure sup 4C includes references to 3 gRNAs even when only one is used in the study.

      The three guides listed in the original Supplementary Figure 4C correspond to the guides that we tested in Petit-Vargas et al. 2024. In this study, we only used the most efficient of these three guides. We have modified Figure 4C by quoting only this guide.

      5) The proneurogenic activity of Cdkn1c in progenitors is mediated by modulation of cell cycle dynamics (Figure 5)

      - "we targeted the CyclinD1/CDK4-6 complex, which promotes cell cycle progression and proliferation, and is inhibited by Cdkn1c." reference missing

      We have included references related to the activity of the CyclinD1/CDK4-6 complex in the developing CNS, and the antagonistic activities of CyclinD1 and Cdkn1c in this model:

      - "we targeted the CyclinD1/CDK4-6 complex, which promotes cell cycle progression and proliferation in the developing CNS (Lobjois et al, 2004, 2008, Lange 2009, Gui et al 2007), and is inhibited by Cdkn1c (Gui et al, 2007)."

      - It would be informative to include experimental set-up information (e.g. hae) in Figures 5A, 5B, 5F and 5G.

      We have added the experimental set-up information in Figure 5.

      - Clarify if analysis is restricted to the dorsal progenitors or the whole dorsoventral length of the tube.

      The analyses were carried out on two thirds of the neural tube (dorsal 2/3), excluding the ventral zone, as specified above (and in the Methods section).

      - It would be valuable to add an image to illustrate what is quantified in Figure 5D, Figure F and Figure G.

      - For Figure 4C and D, it would be valuable to add images to illustrate the quantification.

      We have added images:

      • In Supplementary Figure 7C, to illustrate what is quantified in Figure 4C (now Figures 4C and 4D);
      • In Figure 5E, to illustrate what is quantified in Figure 5D;
      • In Supplementary Figure 8B, to illustrate what is quantified in Figure 5G (now Figures 5H and 5I).

      Regarding the requested images for Figures 4D and 5F, they correspond to the same types of images already shown in Figure 3E. Since we have now added several additional examples of representative pairs of each type of mode of division in the new Supplementary Figure 4, we do not think that adding more of these images in Figures 4 and 5 would strengthen the result of the quantifications.

      Discussion:

      - "Nonetheless, studies in a wide range of species have demonstrated that beyond this binary choice, cell cycle regulators also influence the neurogenic potential of progenitors, i.e the commitment of their progeny to differentiate or not (Calegari and Huttner, 2003; FUJITA, 1962; Kicheva et al., 2014; Lange et al., 2009; Lukaszewicz and Anderson, 2011a; Pilaz et al., 2009; Smith and Schoenwolf, 1987; Takahashi et al., 1995)." Should include maybe references to Peco et al. Development 2012, Roussat et al. J Neurosci. 2023).

      We have now included the references suggested by the reviewer.

      - "This occurs through a change in the mode of division of progenitors, acting primarily via the inhibition of the CyclinD1/CDK6 complex." The data shown in the paper does not demonstrate that Cdkn1c is inhibiting CyclinD1, only that knocking down both mRNAs counteracts the effect of knocking down Cdkn1c alone at the general tissue level and in the percentage of PP/PN/NN clones. This statement should be qualified.

      We propose to reformulate this paragraph in the discussion as follows to take this remark into account:

      "This allows us to re-interpret the role of Cdkn1c during spinal neurogenesis: while previously mostly considered as a binary regulator of cell cycle exit in newborn neurons, we demonstrate that Cdkn1c is also an intrinsic regulator of the transition from the proliferative to neurogenic status in cycling progenitors. This occurs through a change in their mode of division, and our double knock-down experiments suggest that the onset of Cdkn1c expression may promote this change by counteracting a CyclinD1/CDK6 complex dependent mechanism."

      Other comments:

      - To improve clarity for the reader, it would help if electroporation was shown consistently on the same side of the neural tube. If electroporation has been performed at different sides and this is reflected in the figures, it would be advisable to explain on the figure legend.

      We have modified the figures to systematically show the electroporated side of the neural tube on the same side of the image for single electroporations.

      - Figure legends should include the number of embryos/tissue sections analysed for each experiment, as well as information on whether the sections were cryostat or vibratome.

      This information is now provided in the figure legends (numbers of cells analysed and/or numbers of embryos), except for data in Figure 5, which are presented in a new Supplementary Table 1.

      All experiments were performed on vibratome sections, except for in situ hybridization experiments, which were performed on cryostat sections. This information was already indicated in the relevant figure legends.

      - Overall, there is a lack of consistency in the figures regarding how much information is available to the reader (e.g. Sup Figure 2A, in the panel mRNA in situ hybridisation of Cdkn1c is referred to only as Cdkn1c whereas in Sup figure 5 the in situ reads as CCND1 mRNA). Readability would improve a lot if figures included information on what is an electroporated fluorescent tag or an immunostaining (similar to the label in sup 4D) as well as the exact stage and hours after electroporation where relevant.

      - There is a general lack of consistency in indicating the timing of the experiments, both in terms of embryonic stage/day and in terms of hours-post-electroporation.

      We have now homogenized the nomenclature in the figures.

      - "Primary antibodies used are: chick anti-GFP (GFP-1020 - 1:2000) from Aves Labs; goat anti-Sox2 (clone Y-17 - 1:1000) from Santa Cruz". There is no Sox2 immunostaining in the article.

      In the original version of the manuscript, the anti-Sox2 antibody was not used; we have now added experiments using this antibody in the modified version of the manuscript; this sentence in the Methods thus remains unchanged.

      Reviewer #3 (Significance (Required)):

      Significance:

      In neural development, there is a progressive switch in competence in neural progenitor cells, which transition from a proliferative state (able to expand the neural progenitor pool) to a neurogenic state (able to produce neurons). Several factors are known to influence the transition of neural progenitor cells from a proliferative to a neurogenic state, including the activity of extracellular signalling pathways (e.g. SHH) (Saade et al. 2013, Tozer et al. 2017). In this study, the authors perform scRNA-seq of the cervical neural tube of chick at a stage when both proliferative and neurogenic progenitors are present, and identify transcriptional differences between the two populations. Among the differentially expressed transcripts, they identify Cdkn1c (p57-Kip2) as enriched in neurogenic progenitors. Cdkn1c was initially characterized as a driver of cell cycle exit in newborn neurons; here, the authors investigate its role in cycling progenitors.

      The authors find that knock-down of Cdkn1c leads to an increase in proliferative divisions at the expense of neurogenic divisions. Conversely, misexpression of Cdkn1c in proliferative progenitors leads to a switch to neurogenic divisions. Furthermore, they find that knock-down of Cdkn1c shortens G1 phase of the cell cycle, suggesting a link between G1 length and neurogenic competence in neural progenitor cells. Cell cycle length has previously been linked to competence of neural progenitors, and it has been described that longer G1 duration is linked to neurogenic competence (e.g. Calegari F, Huttner WB. 2003).

      The strengths of the study include:

      - The identification of a subset of genes enriched in neurogenic vs. proliferative progenitors. Since the transition from proliferative to neurogenic competence is a gradual process at the tissue level, the classification of proliferative vs. neurogenic progenitors based on a score of transcripts and the identification of a subset of transcripts that are enriched in neurogenic progenitors is a valuable contribution to the neurodevelopmental field.

      - The somatic knock-in strategy used to induce low-level overexpression of Cdkn1c in proliferative progenitors is an elegant strategy to induce overexpression in a subset of cells in a controlled manner and is a valuable technical advance.

      - The characterization of a specific role of Cdkn1c in regulating cell cycle length in cycling progenitors is novel and valuable knowledge contributing to our understanding of how regulation of cell cycle length impacts competence of neural progenitors.

      The aspects to improve:

      - The sc-RNAseq isolated genes enriched in neurogenic versus proliferative progenitors, providing valuable insight into the gradual transition from proliferative to neurogenic competence at the tissue level. However, this gene subset requires clearer representation and detailed characterization. Additionally, the full scRNA-seq dataset should be made publicly available to support further research in neurodevelopment.

      The sequencing dataset has been deposited in NCBI's Gene Expression Omnibus database. It is currently under embargo, but will be made available upon acceptance and publication of the peer reviewed manuscript. Access is nonetheless available to the reviewers via a token that can be retrieved from the Review Commons website.

      The following information will be added in the final manuscript.

      Data availability

      Single cell RNA sequencing data have been deposited in NCBI's Gene Expression Omnibus (GEO) repository under the accession number GSE273710, and are available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE273710.

      - The characterization of Cdkn1c dynamics in cycling progenitors using endogenous tagging of the Cdkn1c transcript with a Myc tag is an elegant way to investigate the dynamics of Cdkn1c-myc along the cell cycle. However, it would be much more powerful if combined with a careful characterization of pRb immunostaining along the cell cycle in this tissue, as well as the quantifications and controls proposed.

      - Retinoblastoma protein (Rb) and cyclin D play a key role in regulating the G1/S transition, with cyclin D/CDK complexes phosphorylating Rb. Given that CDKN1c primarily inhibits the cyclin D/CDK6 complex, it likely affects pRb expression or phosphorylation. This suggests pRb may be a direct target of CDKN1c, making it an unreliable marker for tracking and quantifying neurogenic progenitors through CDKN1c modulation. In light of this, it would be more appropriate to consider pRb as a CDKN1c target and discuss the molecular mechanisms regulating cell cycle components. A more precise approach would involve using other markers or targets to quantify neural precursor division modes at earlier stages of neurogenesis.

      - Many of the conclusions of the study are based on experiments performed using the FlashTag dye in order to perform clonal analysis of proliferative vs. neurogenic divisions. It would be very valuable to further characterize the reliability of this tool as well as to provide more information on the criteria used to determine the fate of the pairs of sister cells.

      - The somatic knock-in strategy used to induce low-level overexpression of Cdkn1c in proliferative progenitors is an elegant strategy to induce overexpression in a subset of cells in a controlled manner. It would be valuable to further characterize the dynamics of Cdkn1c expression using this tool and to provide proof that Pax7 expression is not altered in guides with the knock-in event.

      - The presentation of the existing literature could be more up to date.

      - The presentation of the data in the figures could be improved for readability. The sc-RNA seq data and the technical advances could be of interest for an audience of researchers using chick as a model organism, and working on neurodevelopment in general. Furthermore, the characterization of Cdkn1c as a regulator of G1 length in cycling progenitors and its implications for neurogenic competence could be of general interest for people working on basic research in the neurodevelopmental field.

      Field of expertise of the reviewer: neural development, cell biology, embryology.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study addresses the question of how task-relevant sensory information affects activity in the motor cortex. The authors use various approaches to address this question, looking at single units and population activity. They find that there are three subtypes of modulation by sensory information at the single unit level. Population analyses reveal that sensory information affects the neural activity orthogonally to motor output. The authors then compare both single unit and population activity to computational models to investigate how encoding of sensory information at the single unit level is coordinated in a network. They find that an RNN that displays similar orbital dynamics and sensory modulation to the motor cortex also contains nodes that are modulated similarly to the three subtypes identified by the single unit analysis.

      Strengths:

      The strengths of this study lie in the population analyses and the approach of comparing single-unit encoding to population dynamics. In particular, the analysis in Figure 3 is very elegant and informative about the effect of sensory information on motor cortical activity.

      The task is also well designed to suit the questions being asked and well controlled.

      We appreciate these kind comments.

      It is commendable that the authors compare single units to population modulation. The addition of the RNN model and perturbations strengthen the conclusion that the subtypes of individual units all contribute to the population dynamics. However, the subtypes (PD shift, gain, and addition) are not sufficiently justified. The authors also do not address that single units exhibit mixed modulation, but RNN units are not treated as such.

      We’re sorry that we didn’t provide sufficient grounds to introduce the subtypes. We have updated this in the revised manuscript, in Lines 102-104 as:

      “We determined these modulations on the basis of the classical cosine tuning model (Georgopoulos et al., 1982) and several previous studies (Bremner and Andersen, 2012; Pesaran et al., 2010; Sergio et al., 2005).”

      In our study, we applied the subtype analysis as a criterion to identify the modulation in neuron populations, rather than sorting neurons into exclusively different cell types.
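
      The three modulation types can be written as variants of the classical cosine tuning model. The sketch below is illustrative only and is not taken from the manuscript: the linear dependence on a scalar target velocity `v` and the coefficient names `k_shift`, `k_gain`, and `k_add` are assumptions chosen to make the distinction between the subtypes concrete.

```python
import math


def cosine_tuning(theta, baseline, depth, pd):
    """Classical cosine tuning: rate = baseline + depth * cos(theta - pd)."""
    return baseline + depth * math.cos(theta - pd)


def pd_shift(theta, v, baseline, depth, pd, k_shift):
    # Assumed form: the preferred direction shifts linearly with target velocity v.
    return cosine_tuning(theta, baseline, depth, pd + k_shift * v)


def gain(theta, v, baseline, depth, pd, k_gain):
    # Assumed form: the tuning depth is scaled by (1 + k_gain * v).
    return cosine_tuning(theta, baseline, depth * (1.0 + k_gain * v), pd)


def additive(theta, v, baseline, depth, pd, k_add):
    # Assumed form: the baseline firing rate is offset by k_add * v.
    return cosine_tuning(theta, baseline + k_add * v, depth, pd)
```

      With `v = 0` all three variants reduce to the unmodulated cosine model, which is one way to see that they describe modulation patterns on a shared tuning curve rather than separate cell types.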

      Weaknesses:

      The main weaknesses of the study lie in the categorization of the single units into PD shift, gain, and addition types. The single units exhibit clear mixed selectivity, as the authors highlight. Therefore, the subsequent analyses looking only at the individual classes in the RNN are a little limited. Another weakness of the paper is that the choice of windows for analyses is not properly justified and the dependence of the results on the time windows chosen for single-unit analyses is not assessed. This is particularly pertinent because tuning curves are known to rotate during movements (Sergio et al. 2005 Journal of Neurophysiology).

      In our study, mixed selectivity, specifically the target-motion modulation of reach-direction tuning, is a prominent feature of single neurons. We categorized the neurons into three subclasses not to claim distinct cell types, but to distinguish target-motion modulation patterns. To further characterize these three patterns, we also investigated their interaction by perturbing connection weights in the RNN.

      Yes, it is important to consider the role of rotating tuning curves in neural dynamics during interception. In our case, we examined the population neural state with sliding windows and focused on the period around movement onset (MO) because of the unexpected ring-like structure and the highest decoding accuracy of transferred decoders (Figure S7C). The single-unit analyses were then performed.

      This paper shows sensory information can affect motor cortical activity whilst not affecting motor output. However, it is not the first to do so and fails to cite other papers that have investigated sensory modulation of the motor cortex (Stavisky et al. 2017 Neuron, Pruszynski et al. 2011 Nature, Omrani et al. 2016 eLife). These studies should be mentioned in the Introduction to better capture the context around the present study. It would also be beneficial to add a discussion of how the results compare to the findings from these other works.

      Thanks for the reminder. We have cited these relevant studies in the updated manuscript in Lines 422-426 as:

      “To further clarify, the discussing target-motion effect is different from the sensory modulation in action selection (Cisek and Kalaska, 2005), motor planning (Pesaran et al., 2006), visual replay and somatosensory feedback (Pruszynski et al., 2011; Stavisky et al., 2017; Suway and Schwartz, 2019; Tkach et al., 2007), because it occurred around movement onset and in predictive control trial-by-trial.”

      This study also uses insights from single-unit analysis to inform mechanistic models of these population dynamics, which is a powerful approach, but is dependent on the validity of the single-cell analysis, which I have expanded on below.

      I have clarified some of the areas that would benefit from further analysis below:

      (1) Task:

      The task is well designed, although it would have benefited from perhaps one more target speed (for each direction). One monkey appears to have experienced one more target speed than the others (seen in Figure 3C). It would have been nice to have this data for all monkeys.

      A great suggestion; however, it is hardly feasible as the Utah arrays have already been removed.

      (2) Single unit analyses:

      In some analyses, the effects of target speed look more driven by target movement direction (e.g. Figures 1D and E). To confirm target speed is the main modulator, it would be good to compare how much more variance is explained by models including speed rather than just direction. More target speeds may have been helpful here too.

      A nice suggestion. The goodness of fit of the simple model (movement direction only) is much worse than that of the more complex models (which include target speed). We’ve updated the results in the revised manuscript in Lines 119-122, as “We found that the adjusted R² of a full model (0.55 ± 0.24, mean ± sd.) can be higher than that of the PD shift (0.47 ± 0.24), gain (0.46 ± 0.22), additive (0.41 ± 0.26), and simple models (only reach direction, 0.34 ± 0.25) for three monkeys (1162 neurons, ranksum test, one-tailed, p<0.01, Figure S5).”

      The choice of the three categories (PD shift, gain, addition) is not completely justified in a satisfactory way. It would be nice to see whether these three main categories are confirmed by unsupervised methods.
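
      For readers unfamiliar with the metric, adjusted R² penalizes R² for the number of predictors, so a model with more terms (for example, one that includes target speed) is not favoured simply for having more parameters. The helper below is a generic sketch of the standard formula; the function names are hypothetical and it is not the authors' fitting code.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(y, y_hat, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n is the number of observations and p the number of predictors
    (excluding the intercept); requires n > p + 1.
    """
    n = len(y)
    r2 = r_squared(y, y_hat)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
```

      For the same predictions, the adjusted value falls as the number of predictors grows, which is why it suits comparisons between the simple and the full tuning models.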

      A good point. It is a pity that we haven’t found an appropriate unsupervised method.

      The decoder analyses in Figure 2 provide evidence that target speed modulation may change over the trial. Therefore, it is important to see how the window considered for the firing rate in Figure 1 (currently 100ms pre - 100ms post movement onset) affects the results.

      Thanks for the suggestion and close reading. Because the movement onset (MO) is the key time point of this study, we colored this time period in Figure 1 to highlight the perimovement neuronal activity.

      (3) Decoder:

      One feature of the task is that the reach endpoints tile the entire perimeter of the target circle (Figure 1B). However, this feature is not exploited for much of the single-unit analyses. This is most notable in Figure 2, where the use of an SVM limits the decoding to discrete values (the endpoints are divided into 8 categories). Using continuous decoding of hand kinematics would be more appropriate for this task.

      This is a very reasonable suggestion. In the revised manuscript, we’ve updated the continuous decoding results with support vector regression (SVR) in Figure S7A and in Lines 170-173 as:

      “These results were stable on the data of the other two monkeys and the pseudopopulation of all three monkeys (Figure S6) and reconfirmed by the continuous decoding results with support vector regressions (Figure S7A), suggesting that target motion information existed in M1 throughout almost the entire trial.”
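
      To illustrate what continuous (rather than 8-class) decoding of reach direction involves, the sketch below regresses the cosine and sine of the reach angle on firing rates and recovers the angle with `arctan2`. Note the substitution: a linear least-squares readout stands in here for the support vector regression used in the revision, purely to keep the example dependency-light. All function names are hypothetical.

```python
import numpy as np


def fit_direction_decoder(rates, angles):
    """Least-squares readout from firing rates to (cos, sin) of reach angle.

    rates: (n_trials, n_neurons); angles: (n_trials,) in radians.
    Returns weights of shape (n_neurons + 1, 2); the last row is the intercept.
    """
    X = np.column_stack([rates, np.ones(len(rates))])
    Y = np.column_stack([np.cos(angles), np.sin(angles)])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W


def decode_direction(W, rates):
    """Predict reach angle (radians) from firing rates."""
    X = np.column_stack([rates, np.ones(len(rates))])
    pred = X @ W
    return np.arctan2(pred[:, 1], pred[:, 0])


def circular_error(a, b):
    """Absolute angular difference, wrapped to [0, pi]."""
    return np.abs(np.angle(np.exp(1j * (a - b))))
```

      Decoding the (cos, sin) pair instead of the raw angle avoids the discontinuity at 0/2π, and the circular error gives a continuous accuracy measure that uses the full tiling of reach endpoints.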

      (4) RNN:

      Mixed selectivity is not analysed in the RNN, which would help to compare the model to the real data where mixed selectivity is common. Furthermore, it would be informative to compare the neural data to the RNN activity using canonical correlation or Procrustes analyses. These would help validate the claim of similarity between RNN and neural dynamics, rather than allowing comparisons to be dominated by geometric similarities that may be features of the task. There is also an absence of alternate models to compare the perturbation model results to.

      Thank you for these helpful suggestions. We have performed decoding analysis on RNN units and updated in Figure S12A and Lines 333-334 as: “First, from the decoding result, target motion information existed in nodes’ population dynamics shortly after TO (Figure S12A).”

      We have also included the results of canonical correlation analysis and Procrustes analysis in Table S2 and Lines 340-342 as: “We then performed canonical correlation analysis (CCA) and Procrustes analysis (Table S2; see Methods); the results also indicated the similarity between network dynamics and neural dynamics.”
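
      As a rough illustration of the two similarity measures named above, the sketch below computes a Procrustes disparity and the canonical correlations between two sets of trajectories (for example, neural and RNN states projected onto a few components, with matched rows). This is a generic NumPy formulation under those assumptions, not the authors' analysis pipeline.

```python
import numpy as np


def procrustes_disparity(A, B):
    """Residual after optimally translating, scaling, and rotating B onto A.

    A, B: (n_samples, n_dims) with matching rows. Returns a value in [0, 1];
    0 means the two configurations have identical shape.
    """
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    A0 = A0 / np.linalg.norm(A0)
    B0 = B0 / np.linalg.norm(B0)
    # The singular values of A0^T B0 determine the best orthogonal alignment.
    s = np.linalg.svd(A0.T @ B0, compute_uv=False)
    return 1.0 - s.sum() ** 2


def canonical_correlations(X, Y):
    """Canonical correlations between X and Y (both assumed full column rank)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)
```

      Procrustes only allows rigid alignment plus uniform scaling, whereas CCA allows any linear map, so the two measures bound the comparison from a stricter and a looser side.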

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Zhang et al. examine neural activity in the motor cortex as monkeys make reaches in a novel target interception task. Zhang et al. begin by examining the single neuron tuning properties across different moving target conditions, finding several classes of neurons: those that shift their preferred direction, those that change their modulation gain, and those that shift their baseline firing rates. The authors go on to find an interesting, tilted ring structure of the neural population activity, depending on the target speed, and find that (1) the reach direction has consistent positioning around the ring, and (2) the tilt of the ring is highly predictive of the target movement speed. The authors then model the neural activity with a single neuron representational model and a recurrent neural network model, concluding that this population structure requires a mixture of the three types of single neurons described at the beginning of the manuscript.

      Strengths:

      I find the task the authors present here to be novel and exciting. It slots nicely into an overall trend to break away from a simple reach-to-static-target task to better characterize the breadth of how the motor cortex generates movements. I also appreciate the movement from single neuron characterization to population activity exploration, which generally serves to anchor the results and make them concrete. Further, the orbital ring structure of population activity is fascinating, and the modeling work at the end serves as a useful baseline control to see how it might arise.

      Thank you for your recognition of our work.

      Weaknesses:

      While I find the behavioral task presented here to be excitingly novel, I find the presented analyses and results to be far less interesting than they could be. Key to this, I think, is that the authors are examining this task and related neural activity primarily with a single-neuron representational lens. This would be fine as an initial analysis since the population activity is of course composed of individual neurons, but the field seems to have largely moved towards a more abstract "computation through dynamics" framework that has, in the last several years, provided much more understanding of motor control than the representational framework has. As the manuscript stands now, I'm not entirely sure what interpretation to take away from the representational conclusions the authors made (i.e. the fact that the orbital population geometry arises from a mixture of different tuning types). As such, by the end of the manuscript, I'm not sure I understand any better how the motor cortex or its neural geometry might be contributing to the execution of this novel task.

      This paper shows the sensory modulation of motor tuning in single units and in the neural population during the motor execution period. We acknowledge that the findings are limited to certain time windows. We are still working on this task and hope to report further results in future work.

      Main Comments:

      My main suggestions to the authors revolve around bringing in the "computation through dynamics" framework to strengthen their population results. The authors cite the Vyas et al. review paper on the subject, so I believe they are aware of this framework. I have three suggestions for improving or adding to the population results:

      (1) Examination of delay period activity: one of the most interesting aspects of the task was the fact that the monkey had a random-length delay period before he could move to intercept the target. Presumably, the monkey had to prepare to intercept at any time between 400 and 800 ms, which means that there may be some interesting preparatory activity dynamics during this period. For example, after 400ms, does the preparatory activity rotate with the target such that once the go cue happens, the correct interception can be executed? There is some analysis of the delay period population activity in the supplement, but it doesn't quite get at the question of how the interception movement is prepared. This is perhaps the most interesting question that can be asked with this experiment, and it's one that I think may be quite novel for the field--it is a shame that it isn't discussed.

      It’s a great idea! We are on the way, and it seems promising.

      (2) Supervised examination of population structure via potent and null spaces: simply examining the first three principal components revealed an orbital structure, with a seemingly conserved motor output space and a dimension orthogonal to it that relates to the visual input. However, the authors don't push this insight any further. One way to do that would be to find the "potent space" of motor cortical activity by regression to the arm movement and examine how the tilted rings look in that space (this is actually fairly easy to see in the reach direction components of the dPCA plot in the supplement--the rings will be highly aligned in this space). Presumably, then, the null space should contain information about the target movement. dPCA shows that there's not a single dimension that clearly delineates target speed, but the ring tilt is likely evident if the authors look at the highest variance neural dimension orthogonal to the potent space (the "null space")-this is akin to PC3 in the current figures, but it would be nice to see what comes out when you look in the data for it.

      Thank you for this nice suggestion. While it was feasible to identify potent subspaces encoding reach direction and null spaces for target-velocity modulation, as suggested by the reviewer, the challenge remained that unsupervised methods were insufficient to isolate a pure target-velocity subspace from numerous possible candidates due to the small variance of target-velocity information. Although dPCA components can be used to construct orthogonal subspaces for individual task variables, we found that the target-velocity information remained highly entangled with reach-direction representation. More details can be found in Figure S8C and its caption as below:

      “We used dPCA components with different features to construct three subspaces (same data in A, reach-direction space #3, #4, #5; target-velocity space #10, #15, #17; interaction space #6, #11, #12), and we projected trial-averaged data into these orthogonal subspaces using different colormaps. This approach allowed us to obtain a “potent subspace” coding reach direction and a “null space” for target velocity. The results showed that the reach-direction subspace effectively represented the reach direction. However, while the target-velocity subspace encoded the target velocity information, it still contained reach-direction clusters within each target-velocity condition, corroborating the results of the addition model in the main text (Figure 4). The interaction subspace revealed that multiple reach-direction rings were nested within each other, similar to the findings from the gain model (Figure 3 & 4). The interaction subspace also captured more variance than target-velocity subspace, consistent with our PCA results, suggesting the target-velocity modulation primarily coexists with reach-direction coding. Furthermore, we explored alternative methods to verify whether orthogonal subspaces could effectively separate the reach direction and target velocity. We could easily identify the reach-direction subspace, but its orthogonal subspace was relatively large, and the target-velocity information exhibited only small variance, making it difficult to isolate a subspace that purely encodes target velocity.”
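
      The reviewer's regression-based alternative to dPCA can be sketched as follows: fit a linear readout from neural activity to hand velocity, take the span of the readout weights as the "potent" space, and treat its orthogonal complement as the "null" space. This is a minimal sketch assuming a purely linear readout and hypothetical variable names; it is not the analysis reported in Figure S8C.

```python
import numpy as np


def potent_null_projectors(X, V):
    """Split neural state space into output-potent and output-null parts.

    X: (n_samples, n_neurons) neural activity; V: (n_samples, n_outputs)
    behaviour such as 2-D hand velocity. Returns the readout weights W and
    the orthogonal projectors onto the potent and null spaces.
    """
    Xc = X - X.mean(axis=0)
    Vc = V - V.mean(axis=0)
    W, *_ = np.linalg.lstsq(Xc, Vc, rcond=None)   # (n_neurons, n_outputs)
    Q, _ = np.linalg.qr(W)                        # orthonormal basis of span(W)
    P_potent = Q @ Q.T
    P_null = np.eye(X.shape[1]) - P_potent
    return W, P_potent, P_null
```

      Activity projected with `P_null` produces no change in the fitted readout by construction, so any target-velocity structure that survives this projection is, under the linear assumption, invisible to the motor output. The null space is usually high-dimensional, which is consistent with the difficulty the authors describe in isolating a pure target-velocity subspace within it.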

      (3) RNN perturbations: as it's currently written, the RNN modeling has promise, but the perturbations performed don't provide me with much insight. I think this is because the authors are trying to use the RNN to interpret the single neuron tuning, but it's unclear to me what was learned from perturbing the connectivity between what seems to me almost arbitrary groups of neurons (especially considering that 43% of nodes were unclassifiable). It seems to me that a better perturbation might be to move the neural state before the movement onset to see how it changes the output. For example, the authors could move the neural state from one tilted ring to another to see if the virtual hand then reaches a completely different (yet predictable) target. Moreover, if the authors can more clearly characterize the preparatory movement, perhaps perturbations in the delay period would provide even more insight into how the interception might be prepared.

      We are sorry that we did not clearly define the “none” type, which can be misleading. The 43% of unclassifiable nodes includes inactive nodes; when only active (task-related) nodes are included, the ratio of unclassifiable nodes is much lower. We recomputed the ratios with only active units and have updated Table 1. By perturbing the connectivity, we intended to explore the interaction between different modulations.

      Thank you for the great advice. We considered moving neural states from one ring to another without changing the directional cluster. However, we found that this perturbation design might not be fully developed: since the top two PCs are highly correlated with movement direction, such a move—similar to exchanging two states within the same cluster but under different target-motion conditions—would presumably not affect the behavior.

      Reviewer #3 (Public Review):

      Summary:

      This experimental study investigates the influence of sensory information on neural population activity in M1 during a delayed reaching task. In the experiment, monkeys are trained to perform a delayed interception reach task, in which the goal is to intercept a potentially moving target.

      This paradigm allows the authors to investigate how, given a fixed reach endpoint (which is assumed to correspond to a fixed motor output), the sensory information regarding the target motion is encoded in neural activity.

      At the level of single neurons, the authors found that target motion modulates the activity in three main ways: gain modulation (scaling of the neural activity depending on the target direction), shift (shift of the preferred direction of neurons tuned to reach direction), or addition (offset to the neural activity).

      At the level of the neural population, target motion information was largely encoded along the 3rd PC of the neural activity, leading to a tilt of the manifold along which reach direction was encoded that was proportional to the target speed. The tilt of the neural manifold was found to be largely driven by the variation of activity of the population of gain-modulated neurons.

      Finally, the authors studied the behaviour of an RNN trained to generate the correct hand velocity given the sensory input and reach direction. The RNN units were found to similarly exhibit mixed selectivity to the sensory information, and the geometry of the “neural population” resembled that observed in the monkeys.

      Strengths:

      - The experiment is well set up to address the question of how sensory information that is directly relevant to the behaviour but does not lead to a direct change in behavioural output modulates motor cortical activity.

      - The finding that sensory information modulates the neural activity in M1 during motor preparation and execution is non trivial, given that this modulation of the activity must occur in the nullspace of the movement.

      - The paper gives a complete picture of the effect of the target motion on neural activity, by including analyses at the single neuron level as well as at the population level. Additionally, the authors link those two levels of representation by highlighting how gain modulation contributes to shaping the population representation.

      Thank you for your recognition.

      Weaknesses:

      - One of the main premises of the paper is the fact that the motor output for a given reach point is preserved across different target motions. However, as the authors briefly mention in the conclusion, they did not record muscle activity during the task, but only hand velocity, making it impossible to directly verify how preserved muscle patterns were across movements. While the authors highlight that they did not see any difference in their results when resampling the data to control for similar hand velocities across conditions, this seems like an important potential caveat of the paper whose implications should be discussed further or highlighted earlier in the paper.

      Thanks for the suggestion. We’ve highlighted the resampling results as an important control in the revised manuscript in Figure S11 and Lines 257-260 as:

      “To eliminate hand-speed effect, we resampled trials to construct a new dataset with similar distributions of hand speed in each target-motion condition and found similar orbital neural geometry. Moreover, the target-motion gain model provided a better explanation compared to the hand-speed gain model (Figure S11).”
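
      One possible way to build such a speed-matched dataset is histogram matching: bin trials by hand speed and keep the same number of trials per bin in every target-motion condition. The sketch below is an assumption about how this could be done, with hypothetical names and a hypothetical `bin_width` parameter; the manuscript's Methods should be consulted for the procedure actually used.

```python
import random
from collections import defaultdict


def match_speed_distributions(speeds_by_condition, bin_width, seed=0):
    """Subsample trials so all conditions share one hand-speed histogram.

    speeds_by_condition: dict mapping condition name -> list of hand speeds,
    one per trial. Returns a dict mapping condition -> sorted trial indices.
    """
    rng = random.Random(seed)
    # bin index -> condition -> trial indices falling in that speed bin
    binned = defaultdict(lambda: defaultdict(list))
    for cond, speeds in speeds_by_condition.items():
        for idx, speed in enumerate(speeds):
            binned[int(speed // bin_width)][cond].append(idx)

    selected = {cond: [] for cond in speeds_by_condition}
    for per_cond in binned.values():
        # Keep as many trials as the scarcest condition has in this bin.
        n_keep = min(len(per_cond.get(c, [])) for c in speeds_by_condition)
        if n_keep == 0:
            continue
        for cond in speeds_by_condition:
            selected[cond].extend(rng.sample(per_cond[cond], n_keep))
    return {cond: sorted(idx) for cond, idx in selected.items()}
```

      The cost of this control is a smaller dataset, since each bin is limited by the condition with the fewest trials in it.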

      - The main takeaway of the RNN analysis is not fully clear. The authors find that an RNN trained given a sensory input representing a moving target displays modulation to target motion that resembles what is seen in real data. This is interesting, but the authors do not dissect why this representation arises, and how robust it is to various task design choices. For instance, it appears that the network should be able to solve the task using only the motion intention input, which contains the reach endpoint information. If the target motion input is not used for the task, it is not obvious why the RNN units would be modulated by this input (especially as this modulation must lie in the nullspace of the movement hand velocity if the velocity depends only on the reach endpoint). It would thus be important to see alternative models compared to true neural activity, in addition to the model currently included in the paper. Besides, for the model in the paper, it would therefore be interesting to study further how the details of the network setup (eg initial spectral radius of the connectivity, weight regularization, or using only the target position input) affect the modulation by the motion input, as well as the trained population geometry and the relative ratios of modulated cells after training.

      Great suggestions. In the revised manuscript, we’ve added the results of three alternative models in Table S4 and Lines 355-365, as below:

      “We also tested three alternative network models: (1) only receives motor intention and a GO-signal; (2) only receives target location and a GO-signal; (3) initialized with sparse connection (sparsity=0.1); the unmentioned settings and training strategies were as the same as those for original models (Table S4; see Methods). The results showed that the three modulations could emerge in these models as well, but with obviously distinctive distributions. In (1), the ring-like structure became overlapped rings parallel to the PC1PC2 plane or barrel-like structure instead; in (2), the target-motion related tilting tendency of the neural states remained, but the projection of the neural states on the PC1-PC2 plane was distorted and the reach-direction clusters dispersed. These implies that both motor intention and target location seem to be needed for the proposed ring-like structure. The initialization of connection weights of the hidden layer can influence the network’s performance and neural state structure, even so, the ring-like structure”

      - Additionally, it is unclear what insights are gained from the perturbations to the network connectivity the authors perform, as it is generally expected that modulating the connectivity will degrade task performance and the geometry of the responses. If the authors wish the make claims about the role of the subpopulations, it could be interesting to test whether similar connectivity patterns develop in networks that are not initialized with an all-to-all random connectivity or to use ablation experiments to investigate whether the presence of multiple types of modulations confers any sort of robustness to the network.

      Thank you for these great suggestions. By perturbations, we intended to explore the contribution of interaction between certain subpopulations. We’ve included the ablation experiments in the updated manuscript in Table S3 and Lines 344-346 as below: “The ablation experiments showed that losing any kind of modulation nodes would largely deteriorate the performance, and those nodes merely with PD-shift modulation could mostly impact the neural state structure (Table S3).”

      - The results suggest that the observed changes in motor cortical activity with target velocity result from M1 activity receiving an input that encodes the velocity information. This also appears to be the assumption in the RNN model. However, even though the input shown to the animal during preparation is indeed a continuously moving target, it appears that the only relevant quantity to the actual movement is the final endpoint of the reach. While this would have to be a function of the target velocity, one could imagine that the computation of where the monkeys should reach might be performed upstream of the motor cortex, in which case the actual target velocity would become irrelevant to the final motor output. This makes the results of the paper very interesting, but it would be nice if the authors could discuss further when one might expect to see modulation by sensory information that does not directly affect motor output in M1, and where those inputs may come from. It may also be interesting to discuss how the findings relate to previous work that has found behaviourally irrelevant information is being filtered out from M1 (for instance, Russo et al, Neuron 2020 found that in monkeys performing a cycling task, context can be decoded from SMA but not from M1, and Wang et al, Nature Communications 2019 found that perceptual information could not be decoded from PMd)?

      How and where sensory information modulates M1 are very interesting open questions. In the revised manuscript, we discuss them in Lines 435-446, as below: “It would be interesting to explore whether other motor areas also allow sensory modulation during flexible interception. The functional differences between M1 and other areas lead to uncertain speculations. Although M1 has pre-movement activity, it is more related to task variables and motor outputs. Recently, a cycling task sets a good example that the supplementary motor area (SMA) encodes context information and the entire movement (Russo et al., 2020), while M1 preferably relates to cycling velocity (Saxena et al., 2022). The dorsal premotor area (PMd) has been reported to capture potential action selection and task probability, while M1 not (Cisek and Kalaska, 2005; Glaser et al., 2018; Wang et al., 2019). If the neural dynamics of other frontal motor areas are revealed, we might be able to tell whether the orbital neural geometry of mixed selectivity is unique in M1, or it is just inherited from upstream areas like PMd. Either outcome would provide us some insights into understanding the interaction between M1 and other frontal motor areas in motor planning.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      At times the writing was a little hard to parse. It could benefit from being fleshed out a bit to link sentences together better.

      There are a few grammatical errors, such as:

      "These results support strong and similar roles of gain and additive nodes, but what is even more important is that the three modulations interact each other, so the PD-shift nodes should not be neglected."

      should be

      "These results support strong and similar roles of gain and additive nodes, but what is even more important is that the three modulations interact WITH each other, so the PDshift nodes should not be neglected."

      The discussion could also be more extensive to benefit non-experts in the field.

      Thank you. We have proofread and polished the updated manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Other comments:

      - The authors mention mixed selectivity a few times, but Table 1 doesn't have a column for mixed selective neurons--this seems like an important oversight. Likewise, it would be good to see an example of a "mixed" neuron.

      - The structure of the writing in the results section often talked about the supplementary results before the main results - this seems backwards. If the supplementary results are important enough to come before the main figures, then they should not be supplementary. Otherwise, if the results are truly supplementary, they should come after the main results are discussed.

      - Line 305: Authors say "most" RNN units could be classified, and this is technically true, but only barely, according to Table 1. It might be good to put the actual percentage here in the text.

      - Figure 5a: typo ("Motion intention" rather than "Motor")

      - I couldn't find any mention of code or data availability in the manuscript.

      - There were a number of lines that didn't make much sense to me and should probably be rewritten or expanded on:

      - Lines 167-168: "These results qualitatively imply the interaction as that target speeds..."

      - Lines 178-179: "However, these neural trajectories were not yet the ideal description, because they were shaped mostly by time."

      - Lines 187-188: "...suggesting that target motion affects M1 neural dynamics via a topologically invariant transformation."

      - Lines 224-226: "Note that here we performed an linear transformation on all resulting neural state points to make the ellipse of the static condition orthogonal to the z-axis for better visualization." Does this mean that the z-axis is not PC 3 anymore?

      - Lines 272-274: "These simulations suggest that the existence of PD-shift and additive modulation would not disrupt the neural geometry that is primarily driven by gain modulation; rather it is possible that these three modulations support each other in a mixed population."

      Thank you for these detailed suggestions. By “mixed selectivity”, we mean joint tuning to both target motion and movement. In this sense, the target-motion modulated neurons (regardless of the modulation type) have mixed selectivity. The term “motor intention” follows Mazzoni et al., 1996, Journal of Neurophysiology. We also revised the manuscript for better readability.

      We have updated the data and code availability in Data availability as below:

      “The example experimental datasets and relevant analysis code have been deposited in Mendeley Data at https://data.mendeley.com/datasets/8gngr6tphf. The RNN relevant code and example model datasets are available at https://github.com/yunchenyc/RNN_ringlike_structure.”

      Reviewer #3 (Recommendations For The Authors):

      Minor typos:

      Line 153: “there were”

      Line 301: “network was trained to generate”

      Line 318: “interact with each other”

      Suggested reformulations :

      Line 310: “tilting angles followed a pattern similar to that seen in the data”

      Line 187: the claim of a “topologically invariant transformation” seems strong as the analysis is quite qualitative.

      Suggested changes to the paper (aside from those mentioned in the main review): It could be nice to show behaviour in a main figure panel early on in the paper. This could help with the task description (as it would directly show how the trials are separated based on endpoint) and could allow for discussing the potential caveats of the assumption that behaviour is preserved.

      Thank you. We have corrected these typos and writing problems. Because a similar task design has been reported previously, we decided not to provide extra figures or videos. Still, we thank the reviewer for this nice suggestion.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript by Thronlow Lamson et al., the authors develop a "beads-on-a-string" or BOAS strategy to link diverse hemagglutinin head domains, to elicit broadly protective antibody responses. The authors are able to generate varying formulations and lengths of the BOAS and immunization of mice shows induction of antibodies against a broad range of influenza subtypes. However, several major concerns are raised, including the stability of the BOAS, that only 3 mice were used for most immunization experiments, and that important controls and analyses related to how the BOAS alone, and not the inclusion of diverse heads, impacts humoral immunity are missing.

      Strengths:

      Vaccine strategy is new and exciting.

      Analyses were performed to support conclusions and improve paper quality.

      Weaknesses:

      Controls for how different hemagglutinin heads impact immunity versus the multivalency of the BOAS.

      Only 3 mice were used for most experiments.

      There were limited details on size exclusion data.

      We appreciate the reviewer’s comments and have made the following changes to the manuscript.

      (1) We recognize that deconvoluting the effect of including a diverse set of HA heads and multivalency in the BOAS immunogens is necessary to understand the impact on antigenicity. Therefore, we now include a cocktail of the identical eight HA heads used in the 8-mer and BOAS nanoparticle (NP) as an additional control group. While we observed similar HA binding titers relative to the 8-mer and BOAS NP groups, the sera elicited in the cocktail group were unable to neutralize any of the viruses tested; multivalency thus appears to be important for eliciting neutralizing responses.

      (2) We increased the sample size by repeating the immunizations with n=5 additional mice, for a total of n=8 mice across two independent experiments.

      (3) We expanded the details on size exclusion data to include:

      a) extended chromatograms from Figure 2C as Supplemental Figure 3.

      b) additional details in the materials and methods section (lines 370-372):

      “Recovered proteins were then purified on a Superdex 200 (S200) Increase 10/300 GL (for trimeric HAs) or Superose 6 Increase 10/300 GL (for BOAS) size-exclusion column in Dulbecco’s Phosphate Buffered Saline (DPBS) within 48 hours of cobalt resin elution.”

      Reviewer #2 (Public Review):

      Summary:

      The authors describe a "beads-on-a-string" (BOAS) immunogen, where they link, using a non-flexible glycine linker, up to eight distinct hemagglutinin (HA) head domains from circulating and non-circulating influenzas and assess their immunogenicity. They also display some of their immunogens on ferritin NP and compare the immunogenicity. They conclude that this new platform can be useful to elicit robust immune responses to multiple influenza subtypes using one immunogen and that it can also be used for other viral proteins.

      Strengths:

      The paper is clearly written. While the use of flexible linkers has been used many times, this particular approach (linking different HA subtypes in the same construct resembling adding beads on a string, as the authors describe their display platform) is novel and could be of interest.

      Weaknesses:

      The authors did not compare to individual HAs immunized as cocktails and did not compare to other mosaic NPs published earlier. It is thus difficult to assess how their BOAS compare.

      Other weaknesses include the lack of a rationale as to why these subtypes were chosen and of an explanation of why there are different sizes of the HA1 construct (apart from expression). Have the authors tried other lengths? Have they expressed all of them as FL HA1?

      We appreciate the reviewer’s comments. We responded to the concerns below and modified the manuscript accordingly.

      (1) We recognize that including a “cocktail” control is important to understand how the multivalency present in a single immunogen affects the immune response. We now include an additional control group comprised of a mixture of the same eight HA heads used in the 8-mer and the BOAS nanoparticle (NP). While this cocktail elicited similar HA binding titers relative to the 8-mer and BOAS NP immunogens (Fig. 6G), there was no detectable neutralization of any of the viruses tested (Fig. 7).

      (2) In the introduction we reference other multivalent display platforms but acknowledge that distinct differences in their immunogen design platforms make direct comparisons to ours difficult, which is ultimately why we did not use them as comparators for our in vivo studies. Perhaps most directly relevant to our BOAS platform is the mosaic HA NP from Kanekiyo et al. (PMID 30742080). Here, HA heads, with similar boundaries to ours, were selected from historical H1N1 strains. These NPs, however, were significantly less antigenically diverse relative to our BOAS NPs as they did not include any group 2 (e.g., H7, H9) or B influenza HAs; restricting their multivalent display to group 1 H1N1s likely was an important factor in how they were able to achieve broad, neutralizing H1N1 responses. Additionally, Cohen et al. (PMID 33661993) used similarly antigenically distinct HAs in their mosaic NP, though these included full-length HAs with the conserved stem region, which likely has a significant impact on the elicited cross-reactive responses observed. Lastly, we reference Hills et al. (PMID 38710880), where the authors designed similar NPs with four tandemly-linked betacoronavirus receptor binding domains (RBDs) to make “quartets”. In contrast to our observations, the authors observed increased binding and neutralization titers following conjugation to protein-based NPs. We acknowledge potential differences between the studies, such as the antigen and larger VLP NP, that could lead to the different observed outcomes.

      (3) We intended to highlight the “plug-and-play” nature of the BOAS platform; theoretically any HA subtype could be interchanged into the BOAS. To that end, our rationale for selecting the HA subtypes in our proof-of-principle immunogen was to include an antigenically diverse set of circulating and non-circulating HAs that we could ultimately characterize with previously published subtype-specific antibodies that were also conformation-specific. In doing so, these diagnostic antibodies could confirm the presence and conformational integrity of each component. We intentionally did not include HA subtypes for which we did not have a conformation-specific antibody.

      The different sizes of the HA head domains were determined exclusively by expression of the recombinant protein. We have not attempted expression of full-length HA1 domains. Furthermore, we have not attempted to express the full-length HA (inclusive of HA1 and HA2) in our BOAS platform. The primary reason was to avoid including the conserved stem region of HA2, which may distract from the HA1 epitopes (e.g., receptor binding site, lateral patch) that can be engaged by broadly neutralizing antibodies. Additionally, the full-length HA is inherently trimeric and may not be as amenable to our BOAS platform as the monomeric HA1 head domain.

      Reviewer #3 (Public Review):

      This work describes the tandem linkage of influenza hemagglutinin (HA) receptor binding domains of diverse subtypes to create 'beads on a string' (BOAS) immunogens. They show that these immunogens elicit ELISA binding titers against full-length HA trimers in mice, as well as varying degrees of vaccine mismatched responses and neutralization titers. They also compare these to BOAS conjugated on ferritin nanoparticles and find that this did not largely improve immune responses. This work offers a new type of vaccine platform for influenza vaccines, and this could be useful for further studies on the effects of conformation and immunodominance on the resulting immune response.

      Overall, the central claims of immunogenicity in a murine model of the BOAS immunogens described here are supported by the data.

      Strengths included the adaptability of the approach to include several, diverse subtypes of HAs. The determination of the optimal composition of strains in the 5-BOAS that overall yielded the best immune responses was an interesting finding and one that could also be adapted to other vaccine platforms. Lastly, as the authors discuss, the ease of translation to an mRNA vaccine is indeed a strength of this platform.

      One interesting and counter-intuitive result is the high levels of neutralization titers seen in vaccine-mismatched, group 2 H7 in the 5-BOAS group that differs from the 4-BOAS with the addition of a group 1 H5 RBD. At the same time, no H5 neutralization titers were observed for any of the BOAS immunogens, yet they were seen for the BOAS-NP. Uncovering where these immune responses are being directed and why these discrepancies are being observed would constitute informative future work.

      There are a few caveats in the data that should be noted:

      (1) 20 ug is a pretty high dose for a mouse and the majority of the serology presented is after 3 doses at 20 ug. By comparison, 0.5-5 ug is a more typical range (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6380945/, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9980174/). Also, the authors state that 20 ug per immunogen was used, including for the BOAS-NP group, which would mean that the BOAS-NP group was given a lower gram dose of HA RBD relative to the BOAS groups.

      We agree that this is on the “upper end” of recombinant protein dose. While we did not perform a dose-response study, we now include serum analyses after a single prime. The overall trends and reactivity to matched and mis-matched BOAS components remained similar across d28 and d42. However, the differences between the BOAS and BOAS NP groups and the mixture group were more pronounced at d28, which reinforces our observation that the multivalency of the HA heads is necessary for eliciting robust serum responses to each component. These data are included in Supplemental Figure 5, and we’ve modified the text (lines 185-187) to include:

      “Similar binding trends were also observed with d28 serum, though the difference between the 8mer and mix groups was more pronounced at d28 (Supplemental Figure 5).”

      Additionally, we acknowledge that there is a size discrepancy between the BOAS NP and the largest BOAS, leading to an approximately 15-fold difference on a per-mole basis of the BOAS immunogen. The smallest and largest BOAS also differ by ~2.5-fold on a per-mole basis; this could favor the overall amount of the smaller immunogens. However, because vaccine doses are typically calculated on a mg per kg basis, we did not calculate doses on a molar basis for this study. Any promising immunogens will be evaluated in a dose-response study to optimize elicited responses.
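The mass-versus-molar point can be made concrete with a short worked sketch. The molecular weights below are placeholders chosen only for illustration; they are not the measured masses of any BOAS construct. At a fixed mass dose, the number of moles delivered scales inversely with molecular weight, so the molar fold difference between two immunogens equals the ratio of their molecular weights.

```python
def nanomoles_per_dose(mass_ug, mw_kda):
    """Convert a mass dose to nanomoles.

    1 ug of a 1 kDa molecule is 1 nmol, so nmol = ug / kDa.
    """
    return mass_ug / mw_kda


DOSE_UG = 20.0                # mass dose stated in the response
MW_SMALL_KDA = 80.0           # hypothetical smaller immunogen
MW_LARGE_KDA = 200.0          # hypothetical larger immunogen

nmol_small = nanomoles_per_dose(DOSE_UG, MW_SMALL_KDA)
nmol_large = nanomoles_per_dose(DOSE_UG, MW_LARGE_KDA)

# At equal mass, the smaller immunogen is delivered in molar excess by a
# factor equal to the molecular-weight ratio (here 200 / 80 = 2.5).
molar_fold = nmol_small / nmol_large
print(f"{nmol_small:.3f} nmol vs {nmol_large:.3f} nmol ({molar_fold:.1f}-fold)")
```

Dosing on an equimolar basis would instead scale the mass of each immunogen in proportion to its molecular weight.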

      (2) Serum was pooled from all animals per group for neutralization assays, instead of testing individual animals. This could mean that a single animal with higher immune responses than the rest in the group could dominate the signal and potentially skew the interpretation of this data.

      We repeated the neutralization assays with data points for individual mice. There does appear to be variability in the immune response between mice. This is most noticeable for responses to the H5 component. We are currently assessing what properties of our BOAS immunogen might contribute to the variability across individual mice.

      (3) In Figure S2, it looks like an apparent increase in MW by changing the order of strains here, which may be due to differences in glycosylation. Further analysis would be needed to determine if there are discrepancies in glycosylation amongst the BOAS immunogens and how those differ from native HAs.

      There does appear to be a relatively small difference in MW between the two BOAS configurations shown in Figure S2. This could be due to differences in glycosylation, as the reviewer points out, and in future studies, we intend to assess the influence of native glycosylation on antibody responses elicited by our BOAS immunogens.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major Concerns

      (1) From Figure 2D-E, it looks like BOAS are forming clusters, rather than a straight line. Do these form aggregates over time? Both at 4 degrees over a few days or after freeze-thaw cycle(s)? It is unclear from the SEC methods how long after purification this was performed and stability should be considered.

      Due to the inherent flexibility of the Gly-Ser linker between each component, we do not anticipate that any rigidity would be imposed resulting in a “straight line”. Nevertheless, we appreciate the reviewer’s concern about the long-term stability of the BOAS immunogens. To address this, we include 1) the extended chromatograms from Figure 2C as Supplemental Figure 3 to show any aggregates present, 2) traces from up to 48 hours post-IMAC, and 3) chromatograms following a freeze-thaw cycle. Post-IMAC purification there is a minor peak (<10% of total peak height) at ~9 mL corresponding to aggregation. Note that we excluded this aggregate fraction from immunizations. Following a freeze-thaw cycle, the BOAS maintain a homogeneous peak immediately (<24hrs) after thawing, with no significant (<10%) aggregation or degradation peak. However, after ~1 week at 4C post-freeze-thaw, additional peaks within the chromatogram correspond to degradation of the BOAS.

      We modified the materials and methods section to state (lines 370-372):

      “Recovered proteins were then purified on a Superdex 200 (S200) Increase 10/300 GL (for trimeric HAs) or Superose 6 Increase 10/300 GL (for BOAS) size-exclusion column in Dulbecco’s Phosphate Buffered Saline (DPBS) within 48 hours of cobalt resin elution.”

      We commented on BOAS stability in the results section (lines 142-148):

      “Following SEC, affinity tags were removed with HRV-3C protease; cleaved tags, uncleaved BOAS, and His-tagged enzyme were removed using cobalt affinity resin and snap frozen in liquid nitrogen before immunizations. BOAS maintained monodispersity upon thawing, though over time, degradation was observed following longer term (>1 week) storage at 4C (Supplemental Figure 3). This degradation became more significant as BOAS increased in length (Supplemental Figure 3).”

      We also included in the discussion (lines 277-279):

      “Notably, for longer BOAS we observed degradation following longer term storage at 4C, which may reflect their overall stability.”

      (2) Figures 3-4 and 6-7, to make conclusions off of 3 mice per group is inappropriate. A sample size calculation should have been conducted and the appropriate number of mice tested. In addition, two independent mouse experiments should always be performed. Moreover, the reliability of the statistical tests performed seems unlikely, given the very small sample size.

      We agree that additional mice are necessary to make assessments regarding immunogenicity and cross-reactivity differences between the immunogens. To address this, we repeated the immunization with 5 additional mice, for a total of n=8 mice over two independent experiments. We incorporated these data into Figure 3B-D, as well as an additional Figure 3E (see below). We also now report the log-transformed endpoint titer (EPT) values rather than reciprocal EC50 values and have clarified the statistical analyses used. We have added the following lines to the methods section:

      lines 427-431:

      “Serum endpoint titer (EPT) were determined using a non-linear regression (sigmoidal, four-parameter logistic (4PL) equation, where x is concentration) to determine the dilution at which dilution the blank-subtracted 450nm absorbance value intersect a 0.1 threshold. Serum titers for individual mice against respective antigens are reported as log transformed values of the EPT dilution.”

      lines 406-408:

      “C57BL/6 mice (Jackson Laboratory) (n=8 per group for 3-, 4-, 5-, 6-, 7-, and 8mer cohorts; n=5 for BOAS NP, NP, and mix cohorts) were immunized with 20µg of BOAS immunogens of varying length and adjuvanted with 50% Sigmas Adjuvant for a total of 100µL of inoculum.”

      lines 482-490:

      “Statistical Analysis

      Significance for ELISAs and microneutralization assays were determined using Prism (GraphPad Prism v10.2.3). ELISAs comparing serum reactivity and microneutralization and comparing >2 samples were analyzed using a Kruskal-Wallis test with Dunn’s post-hoc test to correct for multiple comparisons. Multiple comparisons were made between each possible combination or relative to a control group, where indicated. ELISAs comparing two samples were analyzed using a Mann-Whitney test. Significance was assigned with the following: * = p<0.05, ** = p<0.01, *** = p<0.001, and **** = p<0.0001. Where conditions are compared and no significance is reported, the difference was non-significant.”
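The significance notation in the quoted statistics paragraph maps p-values to asterisks using four thresholds. A minimal sketch of that mapping follows; the function name and the "ns" label for non-significant comparisons are assumptions added for illustration, since the quoted text simply leaves such comparisons unmarked.

```python
def significance_label(p_value):
    """Map a p-value to the asterisk notation used in the quoted methods text."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p_value < 0.0001:
        return "****"
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"  # hypothetical label; the source leaves these unmarked


# Hypothetical adjusted p-values, e.g. from a post-hoc multiple-comparison test.
labels = [significance_label(p) for p in (0.2, 0.03, 0.004, 0.0002, 0.00001)]
```

The thresholds are strict inequalities, so a p-value of exactly 0.05 is treated as non-significant.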

      (3) One critical control that is missing is a homogenous BOAS, for example, just linking one H1 on a BOAS. Does oligomerization and increasing avidity alone improve humoral immunity?

      We agree that this is an interesting point. To address the impact of oligomerization and avidity on humoral immunity, we now include an additional control with a cocktail of the HA heads used in the 8mer. We have incorporated this into Figure 3A, 3D and 3E, Figure 6G, and Figure 7.

      Additionally, we have added the following lines in the manuscript:

      lines 38-40:

      “Finally, vaccination with a mixture of the same HA head domains is not sufficient to elicit the same neutralization profile as the BOAS immunogens or nanoparticles.”

      lines 105-106:

      “Additionally, we showed that a mixture of the same HA head components was not sufficient to recapitulate the neutralizing responses elicited by the BOAS or BOAS NP.”

      lines 169-172:

      “To determine immunogenicity of each BOAS immunogen, we performed a prime-boost-boost vaccination regimen in C57BL/6 mice at two-week intervals with 20µg of immunogen and adjuvanted with Sigma Adjuvant (Figure 3A). We compared these BOAS to a control group immunized with a mixture of the eight HA heads present in the 8mer.”

      lines 265-267:

      “There were qualitatively immunodominant HAs, notably H4 and H9, and these were relatively consistent across BOAS in which they were a component. This effect was reduced in the mix cohort.”

      (4) While some cross-reactivity is likely (Figure 6G), there is considerable loss of binding when there is a mismatch. Of the antibodies induced, how much of this is strain-specific? For example, how well do serum antibodies bind to a pre-2009 H1?

      We agree with the reviewer that there is a considerable loss of binding when there is a mismatched HA component. To better understand this and incorporate a mismatched strain into our analysis of the 8mer and BOAS NP, we looked at serum binding titers to a pre-2009 H1, H1/Solomon Islands/2006, and an antigenically distinct H3, H3/Hong Kong/1968. We have incorporated these data into Figures 3D, 3E, 6F and 6G. We observed relatively high titers against both a mismatched H1 and H3, indicating that the BOAS maintain high titers against subtype-specific strains that are conserved over considerable antigenic distance. However, this was similar in the mixture group, indicating that this may not be specific to oligomerization of BOAS immunogens.

      We added the following to the methods section:

      lines 357-361:

      “Head subdomains from these HAs were used in the BOAS immunogens, and full-length soluble ectodomain (FLsE) trimers were used in ELISAs. Additional H1 (H1/A/Solomon Islands/3/2006) and H3 (H3/A/Hong Kong/1/1968) FLsEs were used in ELISAs as mismatched, antigenically distinct HAs for all BOAS.”

      Minor Concerns

      (1) Line 44-46, the deaths per year are almost exclusively due to seasonal influenza outbreaks caused by antigenically drifted viruses in humans, not those spilling over from avian sp. and swine. For accuracy, please adjust this sentence.

      We have adjusted lines 45-48 to say “This is largely a consequence of viral evolution and antigenic drift as it circulates seasonally within humans and ultimately impacts vaccine effectiveness. Additionally, the chance for spillover events from animal reservoirs (e.g., avian, swine) is increasing as population and connectivity also increase.”

      (2) Figure 4D-E, provide a legend for what the symbols indicate, or simply just put the symbol next to either the homology score and % serum competition labels on the y-axis.

      We have included a legend in Figures 4D and 4E to distinguish between homology score and % serum competition.

      (3) I am a bit confused by the data presented in Figure 7. The figure legend says the two symbols represent technical replicates. How? Is one technical replicate of all the mice in a group averaged and that's what's graphed? If so, this is not standard practice. I would encourage the authors to show the average technical replicates of each animal, which is standard.

      We thank the reviewer for their suggestion, and we have revised Figure 7 such that each symbol represents a single animal for n=5 animals. We have also adjusted the figure caption to the following:

      “Figure 7: Microneutralization titers to matched and mis-matched virus- Microneutralization of matched and mis-matched pseudoviruses: H1N1 (green, top left), H3N2 (orange, top right), H5N1 (yellow, bottom left), and H7N9 viruses (pink, bottom right) with d42 serum. Solid bars below each plot indicate a matched sub-type, and striped bars indicate a mis-matched subtype (i.e. not present in the BOAS). NP negative controls were used to determine threshold for neutralization. Upper and lower dashed lines represent the first dilution (1:32) (for H1N1, H3N2, and H5N1) or neutralization average with negative control NP serum (H7N9), and the last serum dilution (1:32,768), respectively, and points at the dashed lines indicate IC50s at or outside the limit of detection. Individual points indicate IC50 values from individual mice from each cohort (n=5). The mean is denoted by a bar and error bars are +/- 1 s.d., * = p<0.05 as determined by a Kruskal-Wallis test with Dunn’s multiple comparison post hoc test relative to the mix group.”

      (4) Paragraphs 298-313, multiple studies are referred to but not referenced.

      We have added the following references to this section:

      (38) Kanekiyo, M. et al. Self-assembling influenza nanoparticle vaccines elicit broadly neutralizing H1N1 antibodies. Nature 498, 102–106 (2013).

      (48) Hills, R. A. et al. Proactive vaccination using multiviral Quartet Nanocages to elicit broad anti-coronavirus responses. Nat. Nanotechnol. 1–8 (2024) doi:10.1038/s41565-024-01655-9.

      (65) Jardine, J. et al. Rational HIV immunogen design to target specific germline B cell receptors. Science 340, 711–716 (2013).

      (66) Tokatlian, T. et al. Innate immune recognition of glycans targets HIV nanoparticle immunogens to germinal centers. Science 363, 649–654 (2019).

      (67) Kato, Y. et al. Multifaceted Effects of Antigen Valency on B Cell Response Composition and Differentiation In Vivo. Immunity 53, 548-563.e8 (2020).

      (68) Marcandalli, J. et al. Induction of Potent Neutralizing Antibody Responses by a Designed Protein Nanoparticle Vaccine for Respiratory Syncytial Virus. Cell 176, 1420-1431.e17 (2019).

      (69) Bruun, T. U. J., Andersson, A.-M. C., Draper, S. J. & Howarth, M. Engineering a Rugged Nanoscaffold To Enhance Plug-and-Display Vaccination. ACS Nano 12, 8855–8866 (2018).

      (70) Kraft, J. C. et al. Antigen- and scaffold-specific antibody responses to protein nanoparticle immunogens. Cell Reports Medicine 100780 (2022) doi:10.1016/j.xcrm.2022.100780.

      Reviewer #2 (Recommendations For The Authors):

      Can the authors define "detectable titers"?

      Maybe add a threshold value of reciprocal EC on the figure for each plot.

      We recognize the reviewer's concern with reporting serum titers in this way, and we now report titers as endpoint titers (EPT) with a dotted line for the first detectable dilution (1:50). We have also adjusted the methods section to reflect this change:

      (lines 427-431)

      “Serum endpoint titers (EPT) were determined using a non-linear regression (sigmoidal, four-parameter logistic (4PL) equation, where x is concentration) to determine the dilution at which the blank-subtracted 450 nm absorbance value intersects a 0.1 threshold. Serum titers for individual mice against respective antigens are reported as log-transformed values of the EPT dilution.”
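
      As a minimal sketch of the endpoint-titer calculation described in the quoted methods text: once a 4PL curve has been fitted (the fitting step is not shown here), the concentration at which the curve crosses the 0.1 threshold has a closed form, and the endpoint dilution is its reciprocal. The parameter names (bottom, top, ec50, hill), the choice of parameterization, and the example values are assumptions for illustration only; they are not the authors' fitted values or analysis code.

```python
import math


def four_pl(x, bottom, top, ec50, hill):
    """4PL response at concentration x (rises from bottom to top as x grows)."""
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** hill)


def endpoint_dilution(threshold, bottom, top, ec50, hill):
    """Reciprocal dilution at which the fitted curve crosses `threshold`."""
    if not bottom < threshold < top:
        raise ValueError("threshold must lie strictly between bottom and top")
    # Invert the 4PL equation for x at y = threshold.
    x = ec50 * ((threshold - bottom) / (top - threshold)) ** (1.0 / hill)
    return 1.0 / x  # concentration is taken here as 1 / dilution


# Hypothetical fitted parameters (assumed values, not taken from the study).
params = dict(bottom=0.0, top=2.0, ec50=1e-3, hill=1.0)
ept = endpoint_dilution(0.1, **params)   # roughly 19000, i.e. about 1:19,000
log_ept = math.log10(ept)                # log-transformed value for plotting
```

      Inverting the fitted curve, rather than reading off the last dilution above background, gives a continuous titer value that does not depend on the spacing of the dilution series.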

      It also appears that not all X-mer elicits an immune response against matched HA, e.g. for the 7 and 8 -mer. Not sure why the authors do not mention this. It could be due to too many HAs, not sure.

      We apologize for the confusion, and agree that our original method of reporting EC50 values does not reflect weak but present binding titers. Upon further analysis with additional mice as well as adjusting our method of reporting titers, it is easier to see in Figure 3D that all X-mer BOAS do indeed elicit detectable binding titers to matched HA components.

      It will be nice to add a conclusion to the cross-reactivity - again it appears that past 6-mer there has been a loss in cross-reactivity even though there are more subtypes on the BOAS.

      Also, the TI seemed to be the more conserved epitope targeted here.

      (Of note these two are mentioned in the discussion)

      We have updated the results section to include the following:

      (lines 281-294)

      “Based on the immunogenicity of the various BOAS and their ability to elicit neutralizing responses, it may not be necessary to maximize the number of HA heads into a single immunogen. Indeed, it qualitatively appears that the intermediate 4-, 5-, and 6mer BOAS were the most immunogenic and this length may be sufficient to effectively engage and crosslink BCR for potent stimulation. These BOAS also had similar or improved binding cross-reactivity to mis-matched HAs as compared to longer 7- or 8mer BOAS. Notably, the 3mer BOAS elicited detectable cross-reactive binding titers to H4 and H5 mismatched HAs in all mice. This observed cross-reactivity could be due to sequence conservation between the HAs, as H3 and H4 share ~51% sequence identity, and H1 and H2 share ~46% and ~62% overall sequence identity with H5, respectively (Supplemental Figure 6). Additionally, the degree of surface conservation decreased considerably beyond the 5mer as more antigenically distinct HAs were added to the BOAS. These data suggest that both antigenic distance between HA components and BOAS length play a key role in eliciting cross-reactive antibody responses, and further studies are necessary to optimize BOAS valency and antigenic distance for a desired response.”

      Figure 5E, the authors could indicate which subtype each mab is specific to for those who are not HA experts. (They have them color-coded but it is hard to see because very small).

      The authors also do not explain why 3E5 does not bind well to H1, H2, H3, H4 4-mer BOA, etc...

      We apologize for the lack of clarity in this figure. We updated Figure 5E to include the subtype each antibody is specific for, and we now list the antibodies with their subtype and targeted epitope in the figure caption.

      Minor

      Figure 1B zoom looks like the line is hidden to the structure - should come in front

      We adjusted the figure accordingly.

      Line 127 - whether the order

      Corrected

      What is the rationale for thinking that a different order will lead to a different expression and antigenic results?

      We thank the reviewer for this question. We did not necessarily anticipate a difference in protein expression based on BOAS order. We, however, wanted to verify that our platform was indeed a “plug-and-play” platform and that we could readily exchange components and order. We do, however, hypothesize that a different order may in fact lead to different antigenic results. We think that the conformation of the BOAS as well as physical and antigenic distance of HA components may influence cross-linking efficiency of BCRs and lead to different antigenic results with different levels of cross-reactivity. For example, a BOAS design with a cluster of group 1 HAs followed by a cluster of group 2 HAs, rather than our roughly alternating pattern, could impact which HAs are in proximity to each other or could be potentially shielded in certain conformations, and thus could affect antigenic results. We expand on this rationale in the discussion in lines 310-314:

      “Further studies with different combinations of HAs could aid in understanding how length and composition influence epitope focusing. For example, a BOAS design with a cluster of group 1 HAs followed by a cluster of group 2 HAs, rather than our roughly alternating pattern, could impact which HAs are in close proximity to one another or could be potentially shielded in certain conformations, and thus could affect antigenic results.”

      Maybe list HA#1 HA#2 HA#3 instead of HA1, HA2, HA3 to make sure it is not confounded with HA2 and HA2

      We agree that this may be confusing for readers, and have adjusted Figure 1C to show HA#1, HA#2, etc.

      For nsEM, do the authors have 2D classes and even 3D reconstructions? Line 148-149: maybe or just because there are more HAs.

      We did not obtain 2D classes or 3D reconstructions of these BOAS. However, we do agree with the reviewer that the collapsed/rosette structure of the 8mer BOAS may be a consequence of the additional HA heads as well as the flexible Gly-Ser linkers between the components. We have added clarity to our statement in the discussion to read:

      lines 154-156:

      “This is likely a consequence of the flexible GSS linker separating the individual HA head components as well as the addition of significantly more HA head components to the construct.”

      Line 153 " interface-directed" - what does this mean?

      We apologize for any confusion; we intend for “interface-directed” to refer to antibodies that engage the trimer interface (TI) epitope between HA protomers. We have adjusted the manuscript to use the same terminology throughout, i.e. trimer interface or its abbreviation, TI.

      For Figure 2 F - do you have a negative control? Usually one does not determine an ELISA KD, it is not very accurate but shows binding in terms of OD value.

      We did include a negative control, MEDI8852, a stem-directed antibody, though it was not shown in the figure because we observed no binding, as expected. This negative control antibody was also used in Figure 5E for characterizing the BOAS NPs, and also shows no binding. We recognize that in an ELISA the KD is an equilibrium measurement and we do not report kinetic measurements as determined by a method such as bio-layer interferometry (BLI), and have thus adjusted the figure caption to denote the values as “apparent KD values”.

      Line 169 - reads strangely, "BOAS-elicited serum, regardless of its length, reacted". The length is the one of the immunogen, not the serum

      We agree that this statement is unclear, and we have modified the sentence to read:

      lines 177-178:

      “Each of the BOAS, regardless of its length, elicited binding titers to all matched full-length HAs representing individual components (Figure 3D).”

      What is the adjuvant used (add in results)?

      We used Sigma adjuvant for all immunizations, and have included this information in the results section:

      lines 169-171:

      “To determine immunogenicity of each BOAS, we performed a prime-boost-boost vaccination regimen in C57BL/6 mice at two-week intervals with 20µg of immunogen and adjuvanted with Sigma Adjuvant (Figure 3A).”

      This information is also included in the methods section in lines 406-412.

      Line 178 - remove " across"

      We have removed the word “across” in this sentence and replaced it with “on” (line 194).

      Trimer- interface, and interface epitopes are used exchangeably - maybe keep it as trimer interface to be more precise

      As stated above, we have adjusted the manuscript to use the same term throughout, i.e., trimer interface or its abbreviation, TI.

      Line 221 - no figure 6H (6G?)

      We apologize for this typo and have corrected it to Figure 6G (line 231).

      Reviewer #3 (Recommendations For The Authors):

      (1) Since 20 ug x3 doses is quite a high amount of vaccine, differences between immunogens may become blurred. Thus, it may be informative to compare post-prime serology for all immunogens or select immunogens to compare to the post-3rd dose data.

      We agree with the reviewer that this is on the upper end of vaccine dose and thus we explored the serum responses after a single boost. The overall trends and reactivity to matched and mis-matched BOAS components remained similar across days d28 and d42. However, the differences between the BOAS and BOAS NP groups and the mixture group were more pronounced at d28, which bolsters our claim that the presentation of the HA heads is important for eliciting strong serum responses to all components. We have included this data in Supplemental Figure 5, and have acknowledged this in the text:

      lines 185-187:

      “Similar binding trends were also observed with d28 serum, though the difference between the 8mer and mix groups was more pronounced at d28 (Supplemental Figure 5).”

      (2) Significance statistics for all immunogenicity data should be added and discussed; it is particularly absent in Figures 3D and 7.

      We have added statistical analyses to Figure 3 and Figure 7 to reflect changes in immunogenicity. We have also added the following to the methods section:

      lines 482-490:

      “Statistical Analysis

      Significance for ELISAs and microneutralization assays was determined using either a Mann-Whitney test or a Kruskal-Wallis test with Dunn’s post-hoc test in Prism (GraphPad Prism v10.2.3) to correct for multiple comparisons. Multiple comparisons were made between each possible combination or relative to a control group, where indicated. Significance was assigned with the following: * = p<0.05, ** = p<0.01, *** = p<0.001, and **** = p<0.0001. Where conditions are compared and no significance is reported, the difference was non-significant.”

      (3) Figure 2F: the figure has K03.12 listed for the H3-specific mAb and in the main text, but the caption says 3E5 - is the 3E5 in the caption a typo? 3E5 is listed for the competition ELISAs as an RBS mAb, but its binding site is distal to the RBS at residues 165-170 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9787348/), H7.167 binds in the RBS periphery and not directly within the RBS, and the epitope for P2-D9 is undetermined/not presented. This could mean that there is actually a higher proportion of RBS-directed antibodies than what is determined from this serum competition data. Also, reference to these as 'RBS-directed' in the serum competition methods section should be revised for accuracy.

      We sincerely apologize for this error and the resulting confusion. 3E5 in the caption is incorrect and should be K03.12 (https://www.rcsb.org/structure/5W08) and does engage the receptor binding site. We also apologize for the oversight that H7.167 is in the RBS periphery and not directly in the RBS. The additional P2-D9 in the panel of RBS-directed antibodies was also in error, as we do not believe it is RBS-directed, but is indeed H4 specific. We also included a reference to the paper and immunogen that elicited this antibody. We agree that this indicates that there could be a higher proportion of RBS-directed antibodies in the serum and have modified the text in the results and methods sections to read:

      lines 300-306:

      “Notably, this proportion is approximate, as at the time of reporting, antibodies that bind the receptor binding site of all components were not available. RBS-directed antibodies to the H4 and H9 component were not available, and the RBS-directed antibodies used targeting the other HA components have different footprints around the periphery of the RBS. Additionally, there are currently no reported influenza B TI-directed antibodies in the literature. Therefore, this may be an underestimate of the serum proportion focused to the conserved RBS and TI epitopes.”

      lines 435-439:

      “Following blocking with BSA in PBS-T, blocking solution was discarded and 40µL of either DPBS (no competition control), a cocktail of humanized antibodies targeting the RBS and periphery (5J8, 2G1, K03.12, H5.3, H7.167, H1209), a cocktail of humanized TI-directed antibodies (S5V2-29, D1 H1-17/H3-14, D2 H1-1/H3-1), or a negative control antibody (MEDI8852) were added at a concentration of 100µg/mL per antibody.”

      (4) Only nsEM data is shown for the 3-BOAS and 8-BOAS, where differences in morphology were seen between these longer and shorter proteins. Including nsEM images for all BOAS immunogens may show trends in morphology or organization that could correlate with immune responses, e.g. if the 5-BOAS also forms a higher proportion of rosette-like structures, while the 4-BOAS is still a mix between extended and rosette-like, this could be a factor in the better immune responses seen for 5-BOAS.

      We appreciate the reviewer’s suggestion for further analysis of morphology between the intermediate BOAS sizes. We agree that the relationship between BOAS length and morphology should be explored more in depth, and we intend to do so in future studies and to also vary linker length and rigidity.

    1. Welcome back and in this fundamentals video I want to briefly talk about Kubernetes which is an open source container orchestration system, and you use it to automate the deployment, scaling and management of containerized applications. At a super high level, Kubernetes lets you run containers in a reliable and scalable way, making efficient use of resources and lets you expose your containerized applications to the outside world or your business. It's like Docker, only with robots to automate it and super intelligence for all of the thinking. Now Kubernetes is a cloud agnostic product so you can use it on-premises and within many public cloud platforms. Now I want to keep this video to a super high level architectural overview but that's still a lot to cover, so let's jump in and get started.

      Let's quickly step through the architecture of a Kubernetes cluster. A cluster in Kubernetes is a highly available cluster of compute resources and these are organized to work as one unit. The cluster starts with the cluster control plane which is the part which manages the cluster; it performs scheduling, application management, scaling and deployment and much more. Compute within a Kubernetes cluster is provided via nodes and these are virtual or physical servers which function as a worker within the cluster; these are the things which actually run your containerized applications. Running on each of the nodes is software and at minimum this is containerd or another container runtime which is the software used to handle your container operations, and next we have the kubelet which is an agent to interact with the cluster control plane. The kubelet running on each of the nodes communicates with the cluster control plane using the Kubernetes API. Now this is the top level functionality of a Kubernetes cluster: the control plane orchestrates containerized applications which run on nodes.

      But now let's explore the architecture of control planes and nodes in a little bit more detail. On this diagram I've zoomed in a little: we have the control plane at the top and a single cluster node at the bottom, complete with the minimum Docker and kubelet software running for control plane communications. Now I want to step through the main components which might run within the control plane and on the cluster nodes. Keep in mind this is a fundamental level video, it's not meant to be exhaustive; Kubernetes is a complex topic so I'm just covering the parts that you need to understand to get started. The cluster will also likely have many more nodes; it's rare that you only have one node unless this is a testing environment.

      First I want to talk about pods, and pods are the smallest unit of computing within Kubernetes; you can have pods which have multiple containers and provide shared storage and networking for those containers, but it's very common to see a one container one pod architecture which as the name suggests means each pod contains only one container. Now when you think about Kubernetes don't think about containers, think about pods; you're going to be working with pods and you're going to be managing pods, and the pods handle the containers within them. Architecturally you would generally only run multiple containers in a pod when those containers are tightly coupled, require close proximity and rely on each other. Additionally, although you'll be exposed to pods you'll rarely manage them directly. Pods are non-permanent things; in order to get the maximum value from Kubernetes you need to view pods as temporary things which are created, do a job and are then disposed of. Pods can be deleted when finished, evicted for lack of resources or lost if the node itself fails; they aren't permanent and aren't designed to be viewed as highly available entities. There are other things linked to pods which provide more permanence but more on that elsewhere.

      So now let's talk about what runs on the control plane. Firstly, and I've already mentioned this one, there's the API, known formally as kube-apiserver. This is the front end for the control plane, it's what everything generally interacts with to communicate with the control plane and it can be scaled horizontally for performance and to ensure high availability. Next we have etcd and this provides a highly available key value store, so a simple database running within the cluster which acts as the main backing store for data for the cluster. Another important control plane component is kube-scheduler and this is responsible for constantly checking for any pods within the cluster which don't have a node assigned, and then it assigns a node to that pod based on resource requirements, deadlines, affinity or anti-affinity, data locality needs and any other constraints. Remember nodes are the things which provide the raw compute and other resources to the cluster and it's this component which makes sure the nodes get utilized effectively.
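
      The scheduling behaviour described above can be illustrated with a toy model. This is a deliberately simplified sketch and not the real kube-scheduler algorithm: the Node and Pod classes, their field names and the "pick the fitting node with the most free CPU" rule are all assumptions made for illustration, and only CPU and memory requests are considered (no deadlines, affinity or data locality).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    cpu_free: float   # free CPU cores (hypothetical unit)
    mem_free: int     # free memory in MiB (hypothetical unit)


@dataclass
class Pod:
    name: str
    cpu_request: float
    mem_request: int
    node_name: Optional[str] = None  # None means "no node assigned yet"


def schedule(pods: List[Pod], nodes: List[Node]) -> None:
    """Assign every unscheduled pod to a node that can fit its requests."""
    for pod in pods:
        if pod.node_name is not None:
            continue  # already placed, nothing to do
        candidates = [
            n for n in nodes
            if n.cpu_free >= pod.cpu_request and n.mem_free >= pod.mem_request
        ]
        if not candidates:
            continue  # no node fits, so the pod stays pending
        best = max(candidates, key=lambda n: n.cpu_free)
        best.cpu_free -= pod.cpu_request
        best.mem_free -= pod.mem_request
        pod.node_name = best.name


nodes = [Node("node-a", 2.0, 4096), Node("node-b", 4.0, 8192)]
pods = [Pod("web", 1.0, 1024), Pod("db", 2.5, 2048), Pod("big", 8.0, 1024)]
schedule(pods, nodes)
```

      In this sketch the "big" pod stays unassigned because no node has 8 free cores, which mirrors the idea of a pod waiting until the cluster has capacity for it.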

      Next we have an optional component, the cloud controller manager, and this is what allows Kubernetes to integrate with cloud providers. It's common that Kubernetes runs on top of cloud platforms such as AWS, Azure or GCP and it's this component which allows the control plane to closely interact with those platforms. Now it is entirely optional and if you run a small Kubernetes deployment at home you probably won't be using this component.

      Now lastly in the control plane is the kube-controller-manager and this is actually a collection of processes. We've got the node controller which is responsible for monitoring and responding to any node outages, the job controller which is responsible for running pods in order to execute jobs, the endpoint controller which populates endpoints in the cluster (more on this in a second but this is something that links services to pods; again I'll be covering this very shortly), and then the service account and token controller which is responsible for account and API token creation. Now again I haven't spoken about services or endpoints yet; just stick with me, I will in a second.

      Now lastly on every node is something called kproxy, known as kube-proxy, and this coordinates networking with the cluster control plane. It helps implement services and configures rules allowing communications with pods from inside or outside of the cluster. You might have a Kubernetes cluster but you're going to want some level of communication with the outside world and that's what kube-proxy provides.

      Now that's the architecture of the cluster and nodes in a little bit more detail but I want to finish this introduction video with a few summary points of the terms that you're going to come across. So let's talk about the key components. We start with the cluster and conceptually this is a deployment of Kubernetes; it provides management, orchestration, healing and service access. Within a cluster we've got the nodes which provide the actual compute resources and pods run on these nodes. A pod is one or more containers and is the smallest admin unit within Kubernetes and often, as I mentioned previously, you're going to see the one container one pod architecture; simply put it's cleaner. Now a pod is not a permanent thing, it's not long lived; the cluster can and does replace them as required.

      Services provide an abstraction from pods so the service is typically what you will understand as an application — an application can be containerized across many pods but the service is the consistent thing, the abstraction — service is what you interact with if you access a containerized application. Now we've also got a job and a job is an ad hoc thing inside the cluster — think of it as the name suggests as a job — a job creates one or more pods, runs until it completes, retries if required and then finishes — now jobs might be used as back end isolated pieces of work within a cluster.

      Now something new that I haven't covered yet and that's ingress. Ingress is how something external to the cluster can access a service: you have external users, they come into an ingress, that's routed through the cluster to a service, and the service points at one or more pods which provide the actual application. So an ingress is something that you will have exposure to when you start working with Kubernetes. And next is an ingress controller and that's a piece of software which actually arranges for the underlying hardware to allow ingress. For example there is an AWS load balancer ingress controller which uses application and network load balancers to allow the ingress, but there are also other controllers such as NGINX and others for various cloud platforms.
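
      The path from an external user through an ingress to a service and on to pods can be sketched as a small model. This is an illustrative toy only, not how Kubernetes is implemented: the class names, the path-prefix rules and the example labels are assumptions for the sketch. In real Kubernetes a service finds its pods through label selectors, which is the one detail borrowed here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pod:
    name: str
    labels: Dict[str, str]


@dataclass
class Service:
    name: str
    selector: Dict[str, str]

    def endpoints(self, pods: List[Pod]) -> List[Pod]:
        """Pods whose labels match every key/value pair in the selector."""
        return [
            p for p in pods
            if all(p.labels.get(k) == v for k, v in self.selector.items())
        ]


@dataclass
class Ingress:
    rules: Dict[str, str]  # path prefix -> service name (hypothetical rules)

    def route(self, path: str, services: Dict[str, Service]) -> Optional[Service]:
        """Pick the service for the longest matching path prefix, if any."""
        matches = [prefix for prefix in self.rules if path.startswith(prefix)]
        if not matches:
            return None
        return services[self.rules[max(matches, key=len)]]


pods = [
    Pod("web-1", {"app": "web"}),
    Pod("web-2", {"app": "web"}),
    Pod("api-1", {"app": "api"}),
]
services = {
    "web": Service("web", {"app": "web"}),
    "api": Service("api", {"app": "api"}),
}
ingress = Ingress({"/": "web", "/api": "api"})

target = ingress.route("/api/users", services)
backends = [p.name for p in target.endpoints(pods)]
```

      Because the service resolves its pods at request time from labels, a pod can be replaced by another with the same labels and callers of the service see no change, which is the abstraction the summary describes.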

      Now finally and this one is really important — generally it's best to architect things within Kubernetes to be stateless from a pod perspective — remember pods are temporary — if your application has any form of long running state then you need a way to store that state somewhere. Now state can be session data but also data in the more traditional sense — any storage in Kubernetes by default is ephemeral provided locally by a node and thus if a pod moves between nodes then that storage is lost. Conceptually think of this like instance store volumes running on AWS EC2. Now you can configure persistent storage known as persistent volumes or PVs and these are volumes whose life cycle lives beyond any one single pod which is using them and this is how you would provision normal long running storage to your containerised applications — now the details of this are a little bit beyond this introduction level video but I wanted you to be aware of this functionality.

      Ok so that's a high level introduction to Kubernetes. It's a pretty broad and complex product but it's super powerful when you know how to use it. This video only scratches the surface. If you're watching this as part of my AWS courses then I'm going to have follow up videos which step through how AWS implements Kubernetes with their EKS service. If you're taking any of the more technically deep AWS courses then there may be other deep dive videos into specific areas that you need to be aware of. So there may be additional videos covering individual topics at a much deeper level. If there are no additional videos then don't worry because that's everything that you need to be aware of. Thanks for watching this video, go ahead and complete the video and when you're ready I look forward to you joining me in the next.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2025-02860

      Corresponding author(s): Duncan, Sproul

      [The "revision plan" should delineate the revisions that authors intend to carry out in response to the points raised by the referees. It also provides the authors with the opportunity to explain their view of the paper and of the referee reports.

      The document is important for the editors of affiliate journals when they make a first decision on the transferred manuscript. It will also be useful to readers of the reprint and help them to obtain a balanced view of the paper.

      If you wish to submit a full revision, please use our "Full Revision" template. It is important to use the appropriate template to clearly inform the editors of your intentions.]

      1. General Statements [optional]

      This section is optional. Insert here any general statements you wish to make about the goal of the study or about the reviews.

      We thank the reviewers for recognizing that our work contributes 'both conceptually and mechanistically to our understanding of how DNA methylation patterns are regulated during cancer development' and their insightful suggestions to improve the manuscript. We note that the reviewers suggest that the data are 'comprehensive', 'well-controlled', 'rigorously done' and 'diligently analysed'.

      Our planned revisions focus on further elucidating the broader implications of our findings for partially methylated domain formation in cancer, the effects of the methylation changes we observe on gene expression and the potential mechanisms underpinning the formation of the hypermethylated domains we observe.

      2. Description of the planned revisions

      Insert here a point-by-point reply that explains what revisions, additional experimentations and analyses are planned to address the points raised by the referees.

      We have reproduced the reviewer's comments in their entirety and highlighted them in blue italics.

      February 21, 2025

      RE: Review Commons Refereed Preprint #RC-2025-02860

      Kafetzopoulos

      DNMT1 loss leads to hypermethylation of a subset of late replicating domains by DNMT3A

      ------------------------------------------------------------------------------

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The DNA methylation landscape is frequently altered in cancers, which may contribute to genome misregulation and cancer cell behavior. One phenomenon is the emergence of "partially methylated domains (PMDs)": intermediately methylated regions of the genome that are generally heterochromatic and late replicating. The prevailing explanation is that the DNA methyltransferase, DNMT1, is not able to maintain DNAme levels at late replicating sites in proliferating cancer cells. This could result in genome instability. In this study, Kafetzopoulos and colleagues interrogated this possibility using a common laboratory colorectal cancer cell line (HCT116). Additionally, they utilized a DNMT1 mutant line that they refer to as a knockout, even though, more accurately, it is a hypomorphic truncation. They performed several genomic assays, such as whole genome bisulfite sequencing, ChIP and repli-seq, in order to assess the effect of reduced DNMT1 activity. While expectedly, global DNAme levels are decreased, they discovered a subset of PMDs gain DNA methylation, which they term hyperPMDs. There seems to be no impact on DNA replication timing, but the authors did go on to show that the de novo DNA methyltransferase, DNMT3Α, is likely responsible for this counterintuitive increase in DNAme levels.

      Reviewer #1 (Significance (Required)):

      Overall, I found the data well-presented and diligently analyzed, as we have come to expect from the Sproul group. However, I am somewhat at a loss to understand both the rationale for the experimental set-up and the meaning of the results. The HCT116 cell line is already transformed but was treated as though it was a wild-type control. I was more curious to see how the PMD chromatin state and replication compare to a healthy cell.

      We focused on the comparison between WT and DNMT1 KO cells as we wanted to understand the role of DNMT1 in maintaining the organisation of the cancer methylome. We agree that, strictly, this could differ from its role in normal cells. However, we are unaware of a suitable cell line to test the consequences of DNMT1 KO in normal colon cells and testing this in vivo would be beyond the time-scale of a manuscript revision.

      To further understand the relevance of our findings in the context of carcinogenesis, we propose to analyse data derived from normal and cancerous colon tissue in the revised manuscript. Preliminary analysis shows that HCT116 PMDs are hypomethylated in a colorectal tumour but not in the normal colon (revision plan figure 1). This suggests that HCT116 cells are a model that can be used to understand PMD formation in tumours and we will extend this analysis in the revised manuscript. We will also add discussion of the caveat that DNMT1 may function differently in normal tissues and cancer cells.

      Note, revision plan figure 1 was included with the full submission but cannot be uploaded in this format.

      Revision plan figure 1. HCT116 PMDs are hypomethylated in colorectal tumours. Heatmaps and pileup plots of HCT116, normal colon and colorectal tumour DNA methylation levels for HCT116 PMDs (n=546 domains) and HMDs (n=558 domains). DNA methylation levels are mean % mCpG. PMDs and HMDs are aligned and scaled to the start and end points of each domain and ranked based on their mean methylation levels in HCT116 cells. Colon and tumour data re-analysed from a previous publication (Berman et al 2011, PMID: 22120008).

      Moreover, the link between late replication and PMDs would indicate that a DNMT1 gain-of-function line would potentially be more interesting: could increased DNMT activity rescue the PMDs, and how would this impact the chromatin and replication states? Perhaps this is not trivial to create; I do not know if simply overexpressing DNMT1 and/or UHRF1 could act as a gain-of-function.

      We agree with the reviewer that a DNMT1 overexpression or a gain-of-function mutation cell line would be interesting to analyse and potentially informative as to the mechanism of PMD formation. However, as the reviewer notes, this is a complex experiment that could require the overexpression of partners such as UHRF1 or generation of an unknown gain-of-function mutation. In addition, the full dissection of the implications of this separate experimental strategy would entail the repetition of the majority of our experiments in DNMT1 KO cells. Instead, in the revised manuscript, we will focus on a related experiment suggested by reviewer 2 and ask whether re-expression of DNMT1 rescues DNA methylation patterns in DNMT1 KO cells.

      Nevertheless, the appearance of hyperPMDs was a curious finding worth publishing. However, it is unclear what the biological relevance is. There is no effect on replication timing, and no assessment on cell behavior (eg, proliferation assays). In other words, is DNMT3A performing some kind of compensatory action, or is it just a curiosity? Below in the significance section, I have highlighted some additional specific points.

      PMDs are important to study because cancer-associated hypomethylation is believed to drive carcinogenesis through genomic instability (Eden et al 2003, PMID: 12702868). However, the mechanisms underpinning their formation remain unclear. At present the predominant hypothesis is that PMDs emerge in heterochromatin because their late replication timing leaves insufficient time for re-methylation following DNA replication (Zhou et al 2018, PMID: 29610480 and Petryk et al 2021, PMID: 33300031). We believe that our observations of hypermethylated PMDs in DNMT1 KO cells provide important evidence contrary to this hypothesis because they disconnect domain-level methylation patterns from the replication timing program. Our work instead suggests that the localization of de novo DNMTs plays a key role in the formation of PMDs by protecting euchromatin from hypomethylation.

      To further explore this hypothesis, we propose to analyze data derived from tumours in our revised manuscript to understand the degree to which our findings are reflected in vivo. As shown above, our preliminary analysis suggests that HCT116 cell PMDs are also hypomethylated in a colorectal tumour but not the normal colon (revision plan figure 1). We will also analyze how the changes in methylome affect gene expression using our RNA-seq data.

      - Why were DNMT3A and 3B transgenes used for ChIP instead of endogenous proteins? I know the authors cited work justifying this strategy, but this still merits explanation. Also, the expression level of transgenes compared to the endogenes was not shown (neither protein nor RNA level).

      DNMT3A and DNMT3B transgenes were used because antibodies against the endogenous proteins are not suitable for ChIP. Furthermore, performing these experiments using endogenously tagged proteins would have required generating 3 knock-in tagged lines (we have already generated HCT116 cells with tagged DNMT3B, Masalmeh et al 2021, PMID: 33514701).

      We have previously shown that our constructs do indeed result in overexpression of DNMT3B compared to endogenous protein in this system (Masalmeh et al 2021, PMID: 33514701). However, our previous results also demonstrate that overexpressed DNMT3B recapitulates the localization of the endogenously tagged protein to the genome (Taglini et al 2024, PMID: 38291337). Others have similarly demonstrated that ectopically expressed DNMT3A and DNMT3B can be used to understand their localization on the genome (Baubec et al 2015, PMID: 25607372 and Weinberg et al 2019, PMID: 31485078).

      To address this point, we propose to add further justification of our approach and discussion of this potential limitation to a revised version of the manuscript.

      - The DNMT3A binding profile appears as though it is on the edges of the PMDs and fairly depleted within (Fig 4A,D). Could the authors comment on this?

      This is an interesting point. We note that although mean DNMT3A signal is indeed higher at the edges of hypermethylated PMDs than inside these domains, its levels are both above background and the levels observed in HCT116 cells. As suggested by reviewer 3, this could be consistent with H3K36me2 and DNMT3A spreading in from the boundaries of hypermethylated PMDs in DNMT1 KO cells. We propose to add discussion of this possibility to the revised version of the manuscript.

      - A more compelling experiment would be to assess the loss of DNMT3A genetically. How would this affect PMD DNA methylation? Maybe in this case there would be an effect on replication timing. Could a KO or KD (eg, siRNA) strategy be employed to assess this on top of either the HCT116 or DNMT1 KO.

      As the reviewer suggests, functional experiments aimed at understanding the role of DNMT3A in our system are likely to be informative. We therefore propose to include such experiments in a revised version of the manuscript.

      - What is the major H3K36me2 methyltransferase in these cells? Could an Nsd1 KO or KD strategy be used, for example, to show that indeed H3K36 methylation is required for HyperPMDs? This would complement the DNMT3A experiment above.

      H3K36 methylation is thought to be deposited in the mammalian genome by at least 8 different methyltransferase enzymes, NSD1, NSD2, NSD3, ASH1L, SETD2, SETMAR, SMYD2 and SETD3 (Wagner and Carpenter 2023, PMID: 22266761). To understand which of these might be responsible for the deposition of H3K36me2 in hypermethylated PMDs, we have examined their expression in HCT116 and DNMT1 KO cells using our RNA-seq data. This suggests that 5 of these enzymes are highly expressed in HCT116 cells and their expression levels are similar in DNMT1 KO cells (revision plan figure 2). The other 3 putative methyltransferases have lower expression levels and, although SMYD2 is significantly upregulated in DNMT1 KO cells, its expression remains low (revision plan figure 2). It is currently unclear whether SMYD2 is a bona fide H3K36 methyltransferase (Wagner and Carpenter 2023, PMID: 22266761). We also note that in a recent study, cells lacking NSD1, NSD2, NSD3, ASH1L and SETD2 had no detectable H3K36 methylation, although expression levels of SMYD2 were not reported (Shipman et al 2024, PMID: 39390582). Based on this analysis, it is therefore unclear which enzyme(s) might be responsible for H3K36me2 deposition in hypermethylated PMDs and delineation of this enzyme would require multiple perturbation and sequencing experiments. We therefore suggest that assessing the consequences of knocking out H3K36me2 methyltransferase activity on hypermethylated PMDs is beyond the scope of a manuscript revision. We propose to include discussion of the expression of the different H3K36me2 depositing enzymes in the revised manuscript.

      Note, revision plan figure 2 was included with the full submission but cannot be uploaded in this format.

      Revision plan figure 2. HCT116 cells express multiple H3K36 methyltransferases. Barplot of mean expression levels for putative mammalian H3K36 methyltransferases in HCT116 and DNMT1 KO cells. Expression levels are counts per million (CPM) derived from RNA-seq. Mean expression levels are derived from 9 and 4 independent cultures of HCT116 and DNMT1 KO cells respectively.
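
      For readers unfamiliar with the unit used in this legend, counts per million (CPM) rescales each gene's read count by the total number of counted reads in its library, and the plotted value is the mean across replicate cultures. The sketch below illustrates the arithmetic only; the gene counts are invented, the helper names are hypothetical, and the authors may have used a dedicated package with additional library-size normalisation.

```python
def counts_per_million(counts):
    """Convert raw per-gene read counts for one library to CPM."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no counted reads")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def mean_cpm(libraries, gene):
    """Mean CPM of one gene across replicate libraries."""
    values = [counts_per_million(lib)[gene] for lib in libraries]
    return sum(values) / len(values)


# Hypothetical toy counts for two replicate cultures; "OTHER" stands in
# for all remaining genes so that library totals are round numbers.
replicates = [
    {"NSD1": 300, "SMYD2": 10, "OTHER": 999_690},
    {"NSD1": 500, "SMYD2": 30, "OTHER": 1_999_470},
]
nsd1_mean = mean_cpm(replicates, "NSD1")
smyd2_mean = mean_cpm(replicates, "SMYD2")
```

      Note that the second library has more raw NSD1 reads but a lower NSD1 CPM, which is why raw counts are not compared directly between cultures.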

      - Based on Figure 2C, it seems that a general predictive pattern of hyperPMDs is H3K9me3-enriched and H3K27me3-depleted. Is this an accurate interpretation? Given the authors' expertise in the relationship between DNMT3A and polycomb, could they perhaps give an explanation for this phenomenon?

      The reviewer is correct. In HCT116 cells, those PMDs that become hypermethylated in DNMT1 KO cells are marked by H3K9me3 and are H3K27me3-depleted (except at their boundaries). DNMT3A is recruited to polycomb-marked regions associated with H3K27me3 through interaction of its N-terminal region with H2AK119ub. However, this mark is depleted from hypermethylated-PMDs in DNMT1 KO cells (current manuscript Figure S5D) meaning that this pathway of recruitment is unlikely to explain DNMT3A's localisation to these regions in DNMT1 KO cells. This is discussed in the current manuscript:

      We and others have reported that DNMT3A is also recruited to the polycomb-associated H2AK119ub mark through its N-terminal region (Chen et al, 2024; Gretarsson et al, 2024; Gu et al, 2022; Wapenaar et al, 2024; Weinberg et al, 2021). However, we do not observe the polycomb-associated H3K27me3 mark, which is generally tightly correlated with H2AK119ub (Ku et al, 2008), at hypermethylated PMDs suggesting that H2AK119ub does not play a role in the recruitment of DNMT3A to these regions.

      Furthermore, DNMT3A's localisation is predominantly driven by its PWWP-dependent H3K36me2 recruitment pathway unless its PWWP domain is mutated (Heyn et al 2019, PMID: 30478443, Sendžikaitė et al 2019, PMID: 31015495, Kibe et al 2021, PMID: 34048432 and Weinberg et al, 2021, PMID: 33986537). Our observations of DNMT3A at hypermethylated PMDs marked by H3K36me2 is therefore consistent with previous findings. We propose to discuss this point in the revised manuscript.

      - This is a minor point, but calling the DNMT1 mutant a "KO" seemed a bit misleading, as it is a truncation mutant. Perhaps there is a more accurate way to describe this line.

      We propose to amend the manuscript to reflect this point as suggested by the reviewer. To ensure our responses are consistent with the reviewer comments we continue to refer to this line as DNMT1 KO cells in our revision plan.

      *Reviewer #2 (Evidence, reproducibility and clarity (Required)): *

      *In this study, Kafetzopoulos et al. investigated the role of DNMT1-mediated methylation maintenance in cancer partially methylated domains (PMDs) using DNMT1 knockout HCT116 colorectal cancer cells. They used a range of sequencing-based approaches, including whole-genome bisulfite sequencing (WGBS), chromatin immunoprecipitation sequencing (ChIP-seq), and replication timing sequencing (Repli-seq), to define the dynamics of DNA methylation loss and gain in PMDs during DNA synthesis. Interestingly, they demonstrate that specific PMDs marked by H3K9me3 undergo a gain of DNA methylation in DNMT1-deficient HCT116 cells. This increase in methylation is associated with the loss of H3K9me3, an enrichment of H3K36me2, and the recruitment of DNMT3A. These findings suggest that de novo methyltransferase activity plays a critical role in determining which genomic regions become PMDs in cancer. *

      *The authors use a comprehensive and well-controlled set of sequencing-based techniques. While the sequencing depth for DNA methylation is somewhat limited, the inclusion of multiple biological replicates strengthens the reliability of the data. The study effectively integrates multiple layers of epigenomic information, providing a nuanced view of PMD regulation in the context of DNMT1 loss. *

      *Overall, this paper provides valuable insights into the epigenetic regulation of PMDs in cancer, and its conclusions are well supported by the data. It significantly advances our understanding of how DNMT1 loss reshapes the epigenome and highlights the interplay between de novo and maintenance methylation mechanisms in cancers. *

      ------------------------------------------------------------------------------

      *Reviewer #2 (Significance (Required)): *

      General assessment

      -The main strength of the study lies in the clear presentation of the data, which follows a cohesive and well-defined storyline.

      *-The authors demonstrate that both hypomethylated and hypermethylated domains occur at the late replication stage. They further investigate the dynamics of histone modifications and DNA methylation, focusing on the acquisition and loss of these marks, particularly in relation to DNMT3A and DNMT3B. *

      Limitation

      -Although the study is compelling, its primary limitation is the correlative nature of most of the data. While the high-level representations (e.g., tracks, heat maps) are convincing, the study would have been more informative if it had explored the impact of these changes on a specific set of genes or regions critical to cancer initiation and progression. For example, in the DNMT1 knockout model, how does the loss of H3K9me3, the gain of H3K36me2, and the recruitment of DNMT3A in hypermethylated PMDs affect the expression of key genes involved in colorectal cancer?

      To understand how the remodeling of DNA methylation and chromatin structure in DNMT1 KO cells affects gene expression, we propose to include an analysis of our RNA-seq data in the revised manuscript. We will also cross reference these results and our ChIP-seq with lists of colorectal cancer genes.

      Additional experiments that could provide deeper insights

      -Cross-validation in other cancer cell lines would have helped to define whether these signatures are observed beyond HCT116.

      As the reviewer suggests, we propose to undertake analyses of additional samples in the revised manuscript to understand how our findings relate to domain-level methylation patterns beyond HCT116 cells. As noted above in response to reviewer 1, our preliminary analysis suggests our findings are relevant for primary colorectal tumours (revision plan figure 1).

      -Are the observed signatures permanent, or could they be reversed by reinstating the full activity of DNMT1? Since DNMT1 might be dysregulated but never completely deleted.

      To address this suggestion, we propose to include the results of a DNMT1 rescue experiment in the revised manuscript.

      -Use knockdown and overexpression experiments to track the dynamics and occurrence of these molecular events over time, providing insight into the progression and reversibility of epigenetic changes.

      This is an interesting suggestion. As the reviewer suggests, we propose to analyse data derived from time-course experiments to understand the dynamics of changes in different genomic compartments following perturbation of DNMT1.

      Advances

      -The study provides new insights into the establishment of PMD types in colorectal cancer cell lines.

      -These findings contribute both conceptually and mechanistically to our understanding of how DNA methylation patterns are regulated during cancer development.

      Audience:

      -This study will appeal to a broad audience, from researchers primarily focused on epigenetics and cancer biology to those interested in the mechanistic underpinnings of DNA methylation and its role in cancer progression. It will also be relevant to those exploring therapeutic strategies targeting epigenetic regulators in cancer.

      We thank the reviewer for their kind comments on our manuscript.

      ------------------------------------------------------------------------------

      *Reviewer #3 (Evidence, reproducibility and clarity (Required)): *

      Summary: Cancer is linked to the acquisition of an atypical DNA methylation landscape, with broad domains of partial DNA methylation (termed PMDs). This study investigates PMDs in a colorectal cancer cell line and evaluates the contribution of DNMT1 in maintaining PMDs, using a DNMT1 KO line. The authors find that PMDs preferentially lose DNA methylation upon loss of DNMT1, but they find a number of domains that paradoxically gain DNA methylation (hyperPMDs). They attribute this gain of methylation to the action of DNMT3A through the accumulation of H3K36me2 and loss of heterochromatin mark H3K9me3. Together this work sheds light on the dynamic mechanisms regulating the atypical DNA methylation landscape in colorectal cancer cells.

      General comments: The introduction is informative and well written. Additionally, the work is rigorously done and analyses are clear. However, the conclusions and summary figure largely focus on the relationship between PMDs with H3K9me3 and H3K36me2, but I think the role for H3K27me3 should be revisited based on the results presented. H3K9me3 is present at PMDs and hyperPMDs, but H3K27me3 level appears to be a much more defining feature of whether they lose or gain methylation upon loss of DNMT1 (Figure 2, Figure S2C-D). There is a reported interplay between PRC2 and DNMT3A activity at DNA methylation valleys in other cell contexts (e.g., mouse embryogenesis, hematopoietic cells), so couldn't H3K27me3 be performing a 'boundary' function at PMDs and when sufficiently low, permits spread of H3K36me2 in the absence of DNMT1? I think it is worth further exploring the H3K27me3 data.

      The reviewer makes an interesting point about the potential for H3K27me3 to act as a boundary preventing H3K36me2 spread into PMDs. Multiple studies have shown that H3K36me2 restricts H3K27me3 deposition in the genome (Streubel et al 2018, PMID: 29606589, Shirane et al 2020, PMID: 32929285 and Farhangdoost et al 2021, PMID: 3362635). The structural nature of this inhibitory effect has also been resolved, demonstrating that the PRC2 catalytic subunit, EZH2, directly binds H3K36 and that this is inhibited when the residue is methylated (Jani et al 2019, PMID: 30967505, Finogenova et al 2020, PMID: 33211010 and Cookis et al 2025, PMID: 39774834). The effect of H3K27me3 on H3K36me2 is less well characterised. However, previous work has suggested that inhibiting EZH2 leads to elevated H3K36me2 being established on newly replicated chromatin (Alabert et al 2020, PMID: 31995760). Expression of the EZH2-inhibiting oncohistone H3.3K27M has also been reported to lead to increased H3K36me2 dependent on NSD1/2 in diffuse intrinsic pontine gliomas (DIPG) (Stafford et al 2018, PMID: 30402543 and Yu et al 2021, PMID: 34261657). However, this increase was not reported by an independent study of H3.3K27M DIPG cells (Harutyunyan et al 2020, PMID: 33207202) and the molecular basis of the effect of H3K27me3 on H3K36me2 remains unclear.

      As the reviewer suggests, we propose to explore the relationship between H3K27me3 and H3K36me2 further in a revised manuscript, along with including further discussion of previous findings in this area.

      Additionally, a key point that is illustrated in the summary figure, is the localization of H3K36me2 at HMDs and its mutual exclusivity with H3K9me3 (a mark typically associated with high DNA methylation). However, because the H3K36me2 is introduced quite late in the analysis, I feel that a rigorous evaluation of its enrichment and anti-correlation with H3K9me3 at highly methylated domains (HMDs) is missing.

      The relationship between H3K36me2 and H3K9me3 is far less explored than that of H3K27me3 and H3K36me2. Interestingly, we note that a recent study reported that depletion of H3K36me2 results in H3K9me3 re-distribution suggesting that H3K9me3 is restricted by H3K36me2 (Padilla et al 2024, DOI: 10.1101/2024.08.10.607446, also cited in the original manuscript).

      To understand this relationship further, we therefore propose to explore the relationship between H3K9me3 and H3K36me2 in our datasets as part of a revised manuscript, along with including additional discussion of relevant experimental findings.

      In general, I also found that I was jumping between figures a lot and needed to look at the supplements to gain the full picture. It may be beneficial to re-organize the figures.

      In accordance with the reviewer's suggestion, we propose to re-organise the revised manuscript to make it easier to follow.

      Specific Comments/Questions:

      • An expanded explanation of the truncated DNMT1 in the DNMT1 KO cells would be helpful for context

      As suggested by the reviewer, we will amend the manuscript to include an expanded discussion of the DNMT1 truncation present in the cell line.

      • Does the DNMT expression in HCT116 cells reflect the levels seen in primary colorectal cancers? Hence, do you think these cultured cells reflect aspects of DNA methylation dynamics that would be seen in tumors?

      While differences between cancer cell line and tumour methylation patterns have previously been noted (for example Anne Rogers et al 2018, PMID: 30559935), we have previously demonstrated that HCT116 cells recapitulate CpG island methylation patterns observed in colorectal tumours (Masalmeh et al 2021, PMID: 33514701). As stated above in response to reviewer 1, we have now examined the methylation status of HCT116 PMDs in a colorectal tumour. This analysis shows that HCT116 PMDs have reduced methylation levels in a colorectal tumour but not in the normal colon (revision plan figure 1). We propose to extend this analysis of colorectal tumour samples and add them to the revised manuscript to address this point.

      Regarding the expression of DNMTs in colorectal tumours, DNMT1 is ubiquitously expressed to our knowledge. DNMT3B is reported to be overexpressed in 15-20% of cases of colorectal cancer, often as a result of amplification (Nosho et al 2009, PMID: 19470733, Ibrahim et al 2011, PMID: 21068132, Zhang et al 2018, PMID: 30468428 and Mackenzie et al 2020, PMID: 32058953). DNMT3A expression in colorectal tumours is less studied but one report suggests upregulation in at least some tumours (Robertson et al 1999, PMID: 10325416 and Zhang et al 2018, PMID: 30468428). We propose to add additional discussion of DNMT expression in colorectal cancer to the revised manuscript to clarify the degree to which our results reflect methylation regulation in primary colorectal tumours.

      • Although DNMT3A/B mRNA levels are similar between DNMT1 KO and HCT116 cells, is the protein abundance altered? I think there would be value in showing a Western blot analysis, as the loss of DNMT1 protein may alter the stability of the de novo DNMTs. Is a similar level of expression of the ectopic T7-DNMT3A and T7-DNMT3B achieved in HCT116 and DNMT1 KO cells? A western blot showing this would also be valuable.

      As part of our work towards revising the manuscript, we have undertaken Western blots of DNMT3A in our cell lines. These show that DNMT3A levels in DNMT1 KO cells are similar to those in HCT116 cells (revision plan figure 3). We propose to include this in the revised manuscript alongside a similar analysis of DNMT3B. We will also include an analysis of T7-DNMT3A and T7-DNMT3B levels to understand whether they are expressed to similar levels in HCT116 and DNMT1 KO cells.

      Note, revision plan figure 3 was included with the full submission but cannot be uploaded in this format.

      Revision plan figure 3. DNMT3A protein levels are similar in HCT116 and DNMT1 KO cells. Left, representative DNMT3A Western blot. Right, bar plot quantifying relative DNMT3A levels. The bar height indicates the mean levels observed in protein extracts from 3 independent cell cultures. Individual points indicate the level of each replicate.

      • Do you think that the increase in DNMT3A over HyperPMD compared to H3K9me3-marked PMDs is related to an increase in protein bound at these domains or an altered residence time?

      The reviewer makes an interesting point with regard to a potential alteration of DNMT3A residence at hypermethylated PMDs. Given that ChIP-seq signal is affected by residence time (Schmiedeberg et al 2009, PMID: 19247482), it is possible that our findings could reflect this rather than increased DNMT3A localisation. We propose to add discussion of this point as a limitation of the current study to the manuscript.

      It would also be valuable to move the plot showing levels of DNMT3A/3B at HMDs, from the S4C/D to the main Figure 4, for reference. It would also be interesting to see the enrichment of DNMT3A/B at all PMDs (not just H3K9me3-marked PMDs).

      As the reviewer suggests, we will add the data on HMDs to the main Figure 4 and include enrichments at all PMDs in the supplementary figures.

      • It appears that the same genomic locus is used multiple times across figures Fig 1A, Fig 2B, Fig 3A, Fig 4A, Fig 5B to illustrate the trends reported from the global analyses. While this has value in showing the dynamics across datasets at this region, I think it is important to illustrate that these trends can be observed elsewhere. Please add or replace some plots with additional loci. Furthermore, please add the genomic region coordinates to the figure or figure legend.

      We had shown a single locus for consistency and to not overcomplicate figures which already contain multiple panels. As the reviewer suggests, we will add additional loci in the supplementary figures of our revised manuscript. We had also included the chromosome co-ordinates in the figures. In the revised version we will ensure that the precise co-ordinates are included in the legends.

      • The ChIP-seq data is quantified as IP/input. This quantitation can be prone to introducing artefacts into analyses if the input coverage is substantially uneven over AT-rich regions or CpG islands, or if the sequencing depth is insufficient. I would encourage the authors to check that the trends observed are still present if quantified without correcting against the inputs. If using IP/input, in the supplementary figures, I think it would be valuable to show the uncorrected quantitation of inputs across PMDs, to demonstrate that there is even coverage and this isn't contributing to any of the changes observed.

      We thank the reviewer for this point and we propose to examine the quantification of the ChIP-seq without normalizing to input to ensure that uneven input signal does not substantially contribute to our results.
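
      The concern raised here can be illustrated with a small numerical sketch. It assumes, purely for illustration, that signal is summarised per genomic window as depth-normalised counts and that enrichment is a log2 ratio with a pseudocount; the window counts, function names and pseudocount value are hypothetical and are not taken from the manuscript.

```python
import math


def cpm(counts):
    """Depth-normalise per-window read counts to counts per million."""
    total = sum(counts)
    return [1e6 * c / total for c in counts]


def log2_ip_over_input(ip_counts, input_counts, pseudocount=1.0):
    """Per-window log2(IP / input) after depth normalisation."""
    ip = cpm(ip_counts)
    inp = cpm(input_counts)
    return [math.log2((a + pseudocount) / (b + pseudocount))
            for a, b in zip(ip, inp)]


def coefficient_of_variation(values):
    """Standard deviation divided by mean; 0 means perfectly even coverage."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(var) / mean


# Hypothetical four-window example: the IP is flat, the input is uneven.
flat_ip = [250, 250, 250, 250]
uneven_input = [400, 100, 100, 400]

ratio = log2_ip_over_input(flat_ip, uneven_input)
input_cv = coefficient_of_variation(cpm(uneven_input))
ip_cv = coefficient_of_variation(cpm(flat_ip))
```

      In this toy case the uncorrected IP signal is identical in every window, yet the IP/input ratio shows apparent depletion in windows 1 and 4 and apparent enrichment in windows 2 and 3. Reporting the input evenness (here `input_cv`) alongside the uncorrected IP signal is one simple way to rule out this kind of artefact.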

      • Generally, the n numbers for different groups of probes can be confusing and increased clarity would be helpful.

      We will clarify the explanation of n numbers in the revised manuscript.

      *Reviewer #3 (Significance (Required)): *

      This study adds to the accumulating body of evidence that DNMT3A recruitment is mediated primarily through H3K36me2 across cell contexts, shedding light on the interplay between histone modifications and de novo DNA methylation. Understanding these mechanisms is important to appreciate the role for DNMT3A in establishing DNA methylation in development and disease contexts. It does remain unclear why, upon loss of DNMT1 in colorectal cancer cells, some PMDs accumulate H3K36me2 and consequently DNA methylation, while others do not. Further study into the chromatin dynamics will be valuable in understanding determinants of the DNA methylation landscape in cancer.

      We thank the reviewer for their insightful comments and believe that our proposed revisions will further clarify the points they raise.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.

      We have not yet incorporated revisions into the manuscript.

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

      As stated in our responses to the reviewer comments above, we plan to address all comments. However, we suggest that two experiments proposed by the reviewers are beyond the scope of a manuscript revision and we will instead address these comments in the following manner:

      Analysis of a DNMT1 gain-of-function line (Reviewer 1). As suggested by the reviewer such a line is non-trivial to generate. It would also require extensive profiling of this new line to fully understand its implications for our findings. We therefore believe it is outwith the scope of a manuscript revision. Instead, we propose to address this comment by undertaking the related experiment suggested by Reviewer 2 and perform a DNMT1 rescue experiment in the DNMT1 KO line.

      Analysis of H3K36me2 methyltransferase knockout cells (Reviewer 1). Our preliminary analysis suggests that HCT116 cells express multiple H3K36 methyltransferases and that their expression does not vary greatly in DNMT1 KO cells (revision plan figure 2). This means that it is unclear which enzyme(s) might be responsible for depositing H3K36me2 in hypermethylated PMDs. Delineation of this would require the generation and analysis of multiple knockouts and we suggest it is therefore outwith the scope of a manuscript revision. To address this point we will instead include discussion of the spectrum of H3K36 methyltransferases expressed in our cells in the revised manuscript as detailed in the specific response above.

    1. Abstract

      Reef-building corals are integral ecosystem engineers in tropical coral reefs worldwide but are increasingly threatened by climate change and rising ocean temperatures. Consequently, there is an urgency to identify genetic, epigenetic, and environmental factors, and how they interact, for species acclimatization and adaptation. The availability of genomic resources is essential for understanding the biology of these organisms and informing future research needs for management and and conservation. The highly diverse coral genus Acropora boasts the largest number of high-quality coral genomes, but these remain limited to a few geographic regions and highly studied species. Here we present the assembly and annotation of the genome and DNA methylome of Acropora pulchra from Mo’orea, French Polynesia. The genome assembly was created from a combination of long-read PacBio HiFi data, from which DNA methylation data were also called and quantified, and additional Illumina RNASeq data for ab initio gene predictions. The work presented here resulted in the most complete Acropora genome to date, with a BUSCO completeness of 96.7% metazoan genes. The assembly size is 518 Mbp, with 174 scaffolds, and a scaffold N50 of 17 Mbp. Structural and functional annotation resulted in the prediction of a total of 40,518 protein-coding genes, and 16.74% of the genome in repeats. DNA methylation in the CpG context was 14.6% and predominantly found in flanking and gene body regions (61.7%). This reference assembly of the A. pulchra genome and DNA methylome will provide the capacity for further mechanistic studies of a common coastal coral in French Polynesia of great relevance for restoration and improve our capacity for comparative genomics in Acropora and cnidarians more broadly.

      This work has been published in GigaByte Journal under a CC-BY 4.0 license (https://doi.org/10.46471/gigabyte.153). These reviews (including a protocol review) are as follows.

      Reviewer 1. Yanshuo Liang

      The manuscript by Conn et al. details the high-quality genome assembly of Acropora pulchra, an Acropora species of ecological and evolutionary significance, and also analyzes its genome-wide DNA methylation characteristics. These data complement the genetic resources of the Acropora genome. This manuscript is well written and represents a valuable contribution to the field. I have some comments below for the authors to address but look forward to seeing this research published.

      Q1: In the first sentence of the second paragraph of the Context: “This is the first study to utilize PacBio long-read HiFi sequencing to generate a high quality genome with high BUSCO completeness, in tandem with its DNA methylome for scleractinian corals.” Language such as "new", "first", "unprecedented", etc, should be avoided because it often leads to unproductive controversy. As far as I know, the genome you assembled is not the first stony coral to be sequenced using PacBio long-read HiFi sequencing. Back in 2024, He et al. assembled Pocillopora verrucosa (Scleractinia) to the chromosome level using PacBio HiFi long-read sequencing and Hi-C technology. I would suggest rephrasing. Reference: He CP, Han TY, Huang WL, et al. Deciphering omics atlases to aid stony corals in response to global change, 11 March 2024, PREPRINT (Version 1) available at Research Square [https://doi.org/10.21203/rs.3.rs-4037544/v1].

      Q2: In this sentence: “On 23 October 2022, sperm samples were collected from the spawning of A.pulchra and preserved in Zymo DNA/RNA shield.” Please change “A.pulchra” to “A. pulchra”.

      Q3: Please change all “k-mer” into “k-mer” in the manuscript.

      Q4: Please change “Long-Tandem Repeats” to “Long Terminal Repeats”.

      Q5: In this sentence: “Funannotate train uses Trinity [18] and PASA [19] for ab initio predictions. Funannotate predict was then run to assign gene models using AUGUSTUS [20], GeneMark [21], and Evidence Modeler [19] to estimate final gene models.” Please give the versions of these software tools.

      Q6: The references from [20] onwards do not correspond well in the manuscript; please check.

      Reviewer 2. Jason Selwyn

      Is the language of sufficient quality? Yes. There are some minor grammatical issues throughout that warrant a closer reading to correct. E.g. Abstract: "...urgency to identify how genetic, epigenetic, and environmental...", "...management and and conservation...". Context: "...we aim to provide..." etc. Are all data available and do they match the descriptions in the paper? Yes. The link to the OSF repository in the PDF did not work. However, the link to the OSF repository from the GitHub did work. Is the data acquisition clear, complete and methodologically sound? No. It isn't mentioned in the manuscript where the RNAseq data used to annotate the genome is from, nor any quality filtering steps that may have been applied to the RNA data prior to its use for annotation. Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes. Excluding the above comment about the RNA data. Additional Comments: This is a well-assembled and annotated genome that will contribute to the growing database of Acropora genomes. The manuscript could do with a simple pass to identify and correct some relatively minor grammatical issues and inconsistencies (Table 1 includes a thousands comma separator in some instances and not others) and needs to include details about the source of the RNA data used to train the ab initio gene predictors. There also appears to be a problem with the citation numbering after 20.

      Reviewer 3. Benjamin Young

      Are all data available and do they match the descriptions in the paper? Yes. Raw reads, metadata, and genome assembly are publicly available and have an NCBI project number in which they are all linked. Is the data acquisition clear, complete and methodologically sound? Yes. Collection of sperm samples, HMW DNA extraction, and SMRT Bell Library prep are written clearly. I have asked for a few clarifications on wording in this section in the attached edited pdf document. Is there sufficient detail in the methods and data-processing steps to allow reproduction? Yes. I think the pipeline used for de-novo genome generation (including raw read cleaning and assembly), repeat masking, and gene prediction and annotation is of high quality and best practices. With the inclusion of the GitHub and all analyses scripts, it is possible to reproduce the assembly generated. Is there sufficient data validation and statistical analyses of data quality? Yes. This is not super relevant for a genome assembly paper so I have no additional comments here. Is the validation suitable for this type of data? Yes. The authors use tools such as GenomeScope2 and BUSCO for validation of their data. It would be nice to see the tool they used to identify N50 and L50 (maybe Quast) included in the methods. Additionally, I would like to see a Merqury analysis of the HifiAsm primary and alternate assemblies to show that duplicate purging was successful. Additional Comments: I would first like to commend the authors for a well assembled genome resource for a coral species that will be greatly beneficial to the wider coral and scientific community. I have provided a PDF with comments throughout for the authors to address. The majority of these are easy fixes, including things such as sentence structure, inconsistent capitalisation of subheadings, additional references for methods, clarification of statements, and other suggestions. 
I do have a few larger requests for this to be published, and these are the reasons for selecting the major revision option as there may need to be figure updates, and quick additional analyses to be run. 1. Can you please correct the verbiage around BUSCO analysis throughout the manuscript. It is often stated "BUSCO completeness of xx%". BUSCO doesn't directly measure completeness, rather completeness of single copy orthologs against a specific database. I have left comments throughout on potential rewording for these instances. Please also specify the exact database you used (i.e. odb10_metazoa). Finally, can you please be more specific when stating BUSCO results, specifically when you use 96.9% this is single copy and duplicated complete BUSCOs. I have left comments in the pdf again for this. 2. In the results for Genome Assembly section can you please include results (i.e. length, N50, L50, number contigs/scaffolds) for the primary assembly and the scaffolded assembly. 3. I think it would be not much work and provide additional information to show successful duplicate purging to run a Merqury analysis on the primary and alternative assemblies from HiFiAsm. 4. Can you include some additional information in the "Structural and Functional Annotation section". Specifically, can you provide information on the results from the funannotate predict step, and then how funannotate update improved this (if at all). 5. Please double check the methods section for funannotate. From reading the funannotate documentation I think there may be some confusion on what each step (train, predict, update, annotate) is doing. I have provided comments in the pdf to help clarify, and have also linked the funannotate documentation. 6. On NCBI I see that an additional Acropora pulchra genome has just been made available (29th Jan 2025), with this to the chromosome level (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_965118205.1/). 
I think it would be prudent to include this assembly's statistics in your Table 1, and also run a BUSCO analysis on this other assembly to compare with your one. While they got to chromosome level, you do have markedly fewer contigs. I do not think this is necessary for this manuscript, but in future work you could look to use their chromosome assembly to get your scaffolded assembly to chromosome level. Again, I want to say this is a wonderful resource for the coral and wider scientific community, and the pipeline for de-novo assembly and annotation is best practices in my opinion. Annotated additional file: https://gigabyte-review.rivervalleytechnologies.comdownload-api-file?ZmlsZV9wYXRoPXVwbG9hZHMvZ3gvRFIvNTk0L2Nvbm5ldGFsMjAyNV9yZXZpZXdjb21tZW50cy5wZGY=

      Re-review:

      The authors have addressed all my comments and queries, and included nearly all recommendations. Thank you! A few quick notes to fix before publication:

      - "The input created Funannotate train uses Trinity v.2.15.2 [22] and PASA v.2.5.3 [23] for transcript assembly prior to ab initio predictions". This sentence reads awkwardly; please reword before publishing. I think maybe just remove "created Funannotate train" and then it reads correctly. Or "Funannotate train uses .....".
      - "PFAM v.37.0 [28], CAZyme [29], UniProtKB v[30] and GO [31]." Missing a few version numbers, and UniProt just has a v.
      - "The mitochondrial genome was successfully assembled and circularized using MitoHifi v3.2.2 The final assembled A. pulchra mitogenome is". Just missing a period, I think, before "The final assembly".

      Great job and a very useful resource for the coral community!

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We thank the reviewers for their feedback on our paper. We have taken all their comments into account in revising the manuscript. We provide a point-by-point response to their comments, below.

      Reviewer #1

      Major comments:

      The manuscript is clearly written with a level of detail that allows others to reproduce the imaging and cell-tracking pipeline. Of the 22 movies recorded one was used for cell tracking. One movie seems sufficient for the second part of the manuscript, as this manuscript presents a proof-of-principle pipeline for an imaging experiment followed by cell tracking and molecular characterisation of the cells by HCR. In addition, cell tracking in a 5-10 day time-lapse movie is an enormous time commitment.

      My only major comment is regarding "Suppl_data_5_spineless_tracking". The image file does not load. It looks like the wrong file is linked to the mastodon dataset. The "Current BDV dataset path" is set to "Beryl_data_files/BLB mosaic cut movie-02.xml", but this file does not exist in the folder. Please link it to the correct file.

      We have corrected the file path in the updated version of Suppl. Data 5.

      Minor comments:

      The authors state that their imaging settings aim to reduce photo damage. Do they see cell death in the regenerating legs? Is the cell death induced by the light exposure or can they tell if the same cells die between the movies? That is, do they observe cell death in the same phases of regeneration and/or in the same regions of the regenerating legs?

      Yes, we observe cell death during Parhyale leg regeneration. We have added the following sentence to explain this in the revised manuscript: "During the course of regeneration some cells undergo apoptosis (reported in Alwes et al., 2016). Using the H2B-mRFPruby marker, apoptotic cells appear as bright pyknotic nuclei that break up and become engulfed by circulating phagocytes (see bright specks in Figure 2F)."

      We now also document apoptosis in regenerated legs that have not been subjected to live imaging in a new supplementary figure (Suppl. Figure 3), and we refer to these observations as follows: "While some cell death might be caused by photodamage, apoptosis can also be observed in similar numbers in regenerating legs that have not been subjected to live imaging (Suppl. Figure 3)."

      Based on 22 movies, the authors divide the regeneration process into three phases and they describe that the timing of leg regeneration varies between individuals. Are the phases proportionally the same length between regenerating legs or do the authors find differences between fast/slow regenerating legs? If there is a difference in the proportions, why might this be?

      Both early and late phases contribute to variation in the speed of regeneration, but there is no clear relationship between the relative duration of each phase and the speed of regeneration. We now present graphs supporting these points in a new supplementary figure (Suppl. Figure 2).

      To clarify this point, we have added the following sentence in the manuscript: "We find that the overall speed of leg regeneration is determined largely by variation in the speed of the early (wound closure) phase of regeneration, and to a lesser extent by variation in later phases when leg morphogenesis takes place (Suppl. Figure 2 A,B). There is no clear relationship between the relative duration of each phase and the speed of regeneration (Suppl. Figure 2 A',B')."

      Based on their initial cell tracing experiment, could the authors elaborate more on what kind of biological information can be extracted from the cell lineages, apart from determining which is the progenitor of a cell? What does it tell us about the cell population in the tissue? Is there indication of multi- or pluripotent stem cells? What does it say about the type of regeneration that is taking place in terms of epimorphosis and morphallaxis, the old concepts of regeneration?

      In the first paragraph of Future Directions we describe briefly the kind of biological information that could be gained by applying our live imaging approach with appropriate cell-type markers (see below). We do not comment further, as we do not currently have this information at hand. Regarding the concepts of epimorphosis and morphallaxis, as we explain in Alwes et al. 2016, these terms describe two extreme conditions that do not capture what we observe during Parhyale leg regeneration. Our current work does not bring new insights on this topic.

      Page 5. The authors mention the possibility of identifying the cell ID based on transcriptomic profiling data. Can they suggest how many and which cell types they expect to find in the last stage based on their transcriptomic data?

      We have added this sentence: "Using single-nucleus transcriptional profiling, we have identified approximately 15 transcriptionally-distinct cell types in adult Parhyale legs (Almazán et al., 2022), including epidermis, muscle, neurons, hemocytes, and a number of still unidentified cell types."

      Page 6. Correction: "..molecular and other makers.." should be "..molecular and other markers.."

      Corrected

      Page 8. The HCR in situ protocol probably has another important advantage over the conventional in situ protocol, which is not mentioned in this study. The hybridisation step in HCR is performed at a lower temperature (37˚C) than in conventional in situ hybridisation (65˚C, Rehm et al., 2009). In other organisms, a high hybridisation temperature affects the overall tissue morphology and cell location (tissue shrinkage). A lower hybridisation temperature has less impact on the tissue and makes manual cell alignment between the live imaging movie and the fixed HCR in situ stained specimen easier and more reliable. If this is also the case in Parhyale, the authors must mention it.

      This may be correct, but all our specimens were treated at 37˚C, so we cannot assess whether hybridisation temperature affects morphological preservation in our specimens.

      Page 9. The authors should include more information on the spineless study. What is spineless? What do the cell lineages tell about the spineless progenitors, apart from them being spread in the tissue at the time of amputation? Do spineless progenitors proliferate during regeneration? Do any spineless expressing cells share a common progenitor cell?

      We now point out that spineless encodes a transcription factor. We provide a summary of the lineages generating spineless-expressing cells in Suppl. Figure 6, and we explain that "These epidermal progenitors undergo 0, 1 or 2 cell divisions, and generate mostly spineless-expressing cells (Suppl. Figure 5)."

      Page 10. Regarding the imaging temperature, the Materials and Methods state "... a temperature control chamber set to 26 or 27˚C..."; however, in Suppl. Data 1, 26˚C and 29˚C are indicated as imaging temperatures. Which is correct?

      We corrected the Methods by adding "with the exception of dataset li51, imaged at 29˚C"

      Page 10. Regarding the imaging step size, the Materials and Methods state "...step size of 1-2.46 µm..."; however, Suppl. Data 1 indicate a step size between 1.24 - 2.48 µm. Which is correct?

      We corrected the Methods.

      Page 11. Correct "...as the highest resolution data..." to "...at the highest resolution data..."

      The original text is correct ("standardised to the same dimensions as the highest resolution data").

      Page 11. Indicate which supplementary data set is referred to: "Using Mastodon, we generated ground truth annotations on the original image dataset, consisting of 278 cell tracks, including 13,888 spots and 13,610 links across 55 time points (see Supplementary Data)."

      Corrected

      p. 15. Indicate which supplementary data set is referred to: "In this study we used HCR probes for the Parhyale orthologues of futsch (MSTRG.441), nompA (MSTRG.6903) and spineless (MSTRG.197), ordered from Molecular Instruments (20 oligonucleotides per probe set). The transcript sequences targeted by each probe set are given in the Supplementary Data."

      Corrected

      Figure 3. Suggestion to the overview schematics: The authors might consider adding "molting" as the end point of the red bar (representing differentiation).

      The time of molting is not known in the majority of these datasets, because the specimens were fixed and stained prior to molting. We added the relevant information in the figure legend: "Datasets li-13 and li-16 were recorded until the molt; the other recordings were stopped before molting."

      Figure 4B': Please indicate that the nuclei signal is DAPI.

      Corrected

      Supplementary figure 1A. Word is missing in the figure legend: ...the image also shows weak...

      Corrected

      Supplementary Figure 2: Please indicate the autofluorescence in the granular cells. Does it correspond to the yellow cells?

      Corrected

      Video legend for video 1 and 2. Please correct "H2B-mREFruby" to "H2B-mRFPruby".

      Corrected

      Reviewer #2

      Major comments:

      MC 1. Given that most of the technical advances necessary to achieve the work described in this manuscript have been published previously, it would be helpful for the authors to more clearly identify the primary novelty of this manuscript. The abstract and introduction to the manuscript focus heavily on the technical details of imaging and analysis optimization and some additional summary of the implications of these advances should be included here to aid the reader.

      This paper describes a technical advance. While previous work (Alwes et al. 2016) established some key elements of our live imaging approach, we were not at that time able to record the entire time course of leg regeneration (the longest recordings were 3.5 days long). Here we present a method for imaging the entire course of leg regeneration (up to 10 days of imaging), optimised to reduce photodamage and to improve cell tracking. We also develop a method of in situ staining in cuticularised adult legs (an important technical breakthrough in this experimental system), which we combine with live imaging to determine the fate of tracked cells. We have revised the abstract and introduction of the paper to point out these novelties, in relation to our previous publications.

      In the abstract we explain: "Building on previous work that allowed us to image different parts of the process of leg regeneration in the crustacean Parhyale hawaiensis, we present here a method for live imaging that captures the entire process of leg regeneration, spanning up to 10 days, at cellular resolution. Our method includes (1) mounting and long-term live imaging of regenerating legs under conditions that yield high spatial and temporal resolution but minimise photodamage, (2) fixing and in situ staining of the regenerated legs that were imaged, to identify cell fates, and (3) computer-assisted cell tracking to determine the cell lineages and progenitors of identified cells. The method is optimised to limit light exposure while maximising tracking efficiency."

      The introduction includes the following text: "Our first systematic study using this approach presented continuous live imaging over periods of 2-3 days, capturing key events of leg regeneration such as wound closure, cell proliferation and morphogenesis of regenerating legs with single-cell resolution (Alwes et al., 2016). Here, we extend this work by developing a method for imaging the entire course of leg regeneration, optimised to reduce photodamage and to improve cell tracking. We also develop a method of in situ staining of gene expression in cuticularised adult legs, which we combine with live imaging to determine the fate of tracked cells."

      MC 2. The description of the regeneration time course is nicely detailed but also very qualitative. A major advantage of continuous recording and automated cell tracking in the manner presented in this manuscript would be to enable deeper quantitative characterization of cellular and tissue dynamics during regeneration. Rather than providing movies and manually annotated timelines, some characterization of the dynamics of the regeneration process (the heterogeneity in this is very very interesting, but not analyzed at all) and correlating them against cellular behaviors would dramatically increase the impact of the work and leverage the advances presented here. For example, do migration rates differ between replicates? Division rates? Division synchrony? Migration orientation? This seems to be an incredibly rich dataset that would be fascinating to explore in greater detail, which seems to me to be the primary advance presented in this manuscript. I can appreciate that the authors may want to segregate some biological findings from the method, but I believe some nominal effort highlighting the quantitative nature of what this method enables would strengthen the impact of the paper and be useful for the reader. Selecting a small number of simple metrics (eg. Division frequency, average cell migration speed) and plotting them alongside the qualitative phases of the regeneration timeline that have already been generated would be a fairly modest investment of effort using tools that already exist in the Mastodon interface, I would roughly estimate on the order of an hour or two per dataset. I believe that this effort would be well worth it and better highlight a major strength of the approach.

      The primary goal of this work was to establish a robust method for continuous long-term live imaging of regeneration, but we do appreciate that a more quantitative analysis would add value to the data we are presenting. We tried to address this request in three steps:

      First, we examined whether clear temporal patterns in cell division, cell movements or other cellular features can be observed in an accurately tracked dataset (li13-t4, tracked in Sugawara et al. 2022). To test this we used the feature extraction functions now available on the Mastodon platform (see link). We could discern a meaningful temporal pattern for cell divisions (see below); the other features showed no interpretable pattern of variation.

      Second, we asked whether we could use automated cell tracking to analyse the patterns of cell division in all our datasets. Using an Elephant deep learning model trained on the tracks of the li13-t4 dataset, we performed automated cell tracking in the same dataset, and compared the pattern of cell divisions from the automated cell track predictions with those coming from manually validated cell tracks. We observed that the automated tracks gave very imprecise results, with a high background of false positives obscuring the real temporal pattern (see images below, with validated data on the left, automated tracking on the right). These results show that the automated cell tracking is not accurate enough to provide a meaningful picture of the pattern of cell divisions.

      Third, we tried to improve the accuracy of detection of dividing cells by additional training of Elephant models on each dataset (to lower the rate of false positives), followed by manual proofreading. Given how labour intensive this is, we could only apply this approach to 4 additional datasets. The results of this analysis are presented in Figure 4.
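
      A comparison between automated and manually validated division calls, of the kind described in the second step, can be illustrated with a minimal Python sketch. This is not the analysis used in the study: it assumes each division is reduced to a single time point, ignores spatial position, and uses a hypothetical function name (`match_divisions`) and a hypothetical matching tolerance. It only shows how a high background of false positives lowers precision even when recall is high.

```python
def match_divisions(validated, predicted, tolerance=1):
    """Greedily match predicted division time points to validated ones.

    A prediction counts as a true positive if an unmatched validated
    division lies within `tolerance` time points of it. Each validated
    division can be matched at most once. Returns (precision, recall).
    """
    unmatched = sorted(validated)
    true_pos = 0
    for t in sorted(predicted):
        # First still-unmatched validated division close enough to t, if any.
        hit = next((v for v in unmatched if abs(v - t) <= tolerance), None)
        if hit is not None:
            unmatched.remove(hit)
            true_pos += 1
    false_pos = len(predicted) - true_pos
    false_neg = len(unmatched)
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if validated else 0.0
    return precision, recall


# Hypothetical time points, chosen only for illustration.
validated_divisions = [10, 12, 30]
predicted_divisions = [10, 11, 13, 20, 31, 40]
precision, recall = match_divisions(validated_divisions, predicted_divisions)
print(precision, recall)
```

      In this made-up example every validated division is recovered, but half of the predicted divisions are false positives.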

      MC 3. The authors describe the challenges faced by their described approach: Using this mode of semi-automated and manual cell tracking, we find that most cells in the upper slices of our image stacks (top 30 microns) can be tracked with a high degree of confidence. A smaller proportion of cell lineages are trackable in the deeper layers.

      Given that the authors quantify this in Table 1, it would aid the reader to provide metrics in the manuscript text at this point. Furthermore, the metrics provided in Table 1 appear to be for overall performance, but the text describes that performance appears to be heavily depth dependent. Segregating the performance metrics further, for example providing DET, TRA, precision and recall for superficial layers only and for the overall dataset, would help support these arguments and better highlight the performance a potential adopter of the method might expect.

      In the revised manuscript we have added data on the tracking performance of Elephant in relation to imaging depth in Suppl. Figure 3. These data confirm our original statement (which was based on manual tracking) that nuclei are more challenging to track in deeper layers.

      We point to these new results in two parts of the paper, as follows: "A smaller proportion of cells are trackable in the deeper layers (see Suppl. Figure 3)", and "Our results, summarised in Table 1A, show that the detection of nuclei can be enhanced by doubling the z resolution at the expense of xy resolution and image quality. This improvement is particularly evident in the deeper layers of the imaging stacks, which are usually the most challenging to track (Suppl. Figure 3)."

      MC 4. Performance characterization in Table 1 appears to derive from a single dataset that is then subsampled and processed in different ways to assess the impact of these changes on cell tracking and detection performance. While this is a suitable strategy for this type of optimization it leaves open the question of performance consistency across datasets. I fully recognize that this type of quantification can be onerous and time consuming, but some attempt to assess performance variability across datasets would be valuable. Manual curation over a short time window over a random sampling of the acquired data would be sufficient to assess this.

      We think that similar trade-offs will apply to all our datasets because tracking performance is constrained by the same features, which are intrinsic to our system; e.g. by the crowding of nuclei in relation to axial resolution, or the speed of mitosis in relation to the temporal resolution of imaging. We therefore do not see a clear rationale for repeating this analysis. On a practical level, our existing image datasets could not be subsampled to generate the various conditions tested in Table 1, so proving this point experimentally would require generating new recordings, and tracking these to generate ground truth data. This would require months of additional work.

      A second, related question is whether Elephant would perform equally well in detecting and tracking nuclei across different datasets. This point has been addressed in the Sugawara et al. 2022 paper, where the performance of Elephant was tested on diverse fluorescence datasets.

      Reviewer #3

      Major comments:

      The authors should clearly specify what are the key technical improvements compared to their previous studies (Alwes et al. 2016, Elife; Konstantinides & Averof 2014, Science). There, the approaches for mounting, imaging, and cell tracking are already introduced, and the imaging is reported to run for up to 7 days in some cases.

      In Konstantinides and Averof (2014) we did not present any live imaging at cellular resolution. In Alwes et al. (2016) we described key elements of our live imaging approach, but we were never able to record the entire time course of leg regeneration. The longest recordings in that work were 3.5 days long.

      We have revised the abstract and introduction to clarify the novelty of this work, in relation to our previous publications. Please see our response to comment MC1 of reviewer 2.

      While the authors mention testing the effect of imaging parameters (such as scanning speed and line averaging) on the imaging/tracking outcome, very little or no information is provided on how this was done beyond the parameters that they finally arrived at.

      Scan speed and averaging parameters were determined by measuring contrast and signal-to-noise ratios in images captured over a range of settings. We have now added these data in Supplementary Figure 1.
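
      As a rough illustration of such measurements, the Python sketch below computes one common definition of signal-to-noise ratio and of contrast from pixel intensities. The definitions, the function names and the pixel values are assumptions made for the example; the response does not state which formulas were used for Supplementary Figure 1.

```python
from statistics import mean, pstdev


def signal_to_noise(signal_pixels, background_pixels):
    """Assumed definition: (mean signal - mean background) / background std."""
    noise = pstdev(background_pixels)
    if noise == 0:
        raise ValueError("background has zero variance; SNR is undefined")
    return (mean(signal_pixels) - mean(background_pixels)) / noise


def michelson_contrast(signal_pixels, background_pixels):
    """Assumed definition: (S - B) / (S + B) on mean intensities."""
    s, b = mean(signal_pixels), mean(background_pixels)
    return (s - b) / (s + b)


# Hypothetical intensities sampled inside nuclei and in the background.
nuclei = [100, 110, 90, 100]
background = [10, 12, 8, 10]
print(signal_to_noise(nuclei, background))
print(michelson_contrast(nuclei, background))
```

      Repeating the same calculation on images acquired at each scan speed and averaging setting would give values that can be compared across settings.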

      The authors claim that, using the acquired live imaging data across entire regeneration time course, they are now able to confirm and extend their description of leg regeneration. However, many claims about the order and timing of various cellular events during regeneration are supported only by references to individual snapshots in figures or supplementary movies. Presenting a more quantitative description of cellular processes during regeneration from the acquired data would significantly enhance the manuscript and showcase the usefulness of the improved workflow.

      The events we describe can be easily observed in the maximum projections, available in Suppl. Data 2. Regarding the quantitative analysis, please see our response to comment MC2 of reviewer 2.

      Table 1 summarizes the performance of cell tracking using simulated datasets of different quality. However only averages and/or maxima are given for the different metrics, which makes it difficult to evaluate the associated conclusions. In some cases, only 1 or 2 test runs were performed.

      The metrics extracted from each of the three replicates, per dataset, are now included in Suppl. Data 4.

      We consistently used 3 replicates to measure tracking performance with each of the datasets. The "replicates" column label in Table 1 referred to the number of scans that were averaged to generate the image, not to the replicates used for estimating the tracking performance. To avoid confusion, we changed that label to "averaging".

      OPTIONAL: An imaging approach that allows using the current mounting strategy but could help with some of the tradeoffs is using a spinning-disk confocal microscope instead of a laser scanning one. If the authors have such a system available, it could be interesting to compare it with their current scanning confocal setup.

      Preliminary experiments that we carried out several years ago on a spinning disk confocal (with a 20x objective and the CSU-W1 spinning disk) were not very encouraging, and we therefore did not pursue this approach further. The main problem was bad image quality in deeper tissue layers.

      Minor comments:

      The presented imaging protocol was optimized for one laser wavelength only (561 nm) - this should be mentioned when discussing the technical limitations since animals tend to react differently to different wavelengths. Same settings might thus not be applicable for imaging a different fluorescent protein.

      In the second paragraph of the Results section, we explain that we perform the imaging at long wavelengths in order to minimise photodamage. It should be clear to the readers that changing the excitation wavelength will have an impact for long-term live imaging.

      For transferability, it would be useful if the intensity of laser illumination was measured and given in the Methods, instead of just a relative intensity setting from the imaging software. Similarly, more details of the imaging system should be provided where appropriate (e.g., detector specifications).

      We have now measured the intensity of the laser illumination and added this information in the Methods: "Laser power was typically set to 0.3% to 0.8%, which yields 0.51 to 1.37 µW at 561 nm (measured with a ThorLabs Microscope Slide Power Sensor, #S170C)."

      Regarding the imaging system and the detector, we provide all the information that is available to us on the microscope's technical sheets.

      The versions of analysis scripts associated with the manuscript should be uploaded to an online repository that permanently preserves the respective version.

      The scripts are now available on GitHub and online repositories. The relevant links are included in the revised manuscript.

    1. Welcome back, and in this lesson, I want to cover EC2 purchase options. EC2 purchase options are often referred to as launch types, but the official way to refer to them from AWS is purchase options, and so to be consistent, I think it's worth focusing on that name. So, EC2 purchase options. Let's step through all of the main types with a focus on the situations where you would and wouldn't use each of them. So, let's jump in and get started.

      The first purchase option that I want to talk about is the default, which is on demand, and on demand is simple to explain because it's entirely unremarkable in every way. It's the default because it's the average of anything with no specific pros or cons. Now, the way that it works, let's start with two EC2 hosts. Obviously, AWS has more, but it's easy to diagram with just the two. Now, instances of different sizes when launched using on demand will run on these EC2 hosts, and different AWS customers, they're all mixed up on the shared pool of EC2 hosts. So, even though instances are isolated and protected, different AWS customers launch instances which share the same pool of underlying hardware. This means that AWS can efficiently allocate resources, which is why the starting price for on demand in EC2 is so reasonable.

      In terms of the price, on demand uses per second billing, and this happens while instances are running, so you're paying for the resources that you consume. If you shut an instance down logically, you don't pay for those resources. Other associated services such as storage, which do consume resources regardless of if the instance is running or in a shutdown state, do charge constantly while those resources are being consumed. So, remember this: while instances only charge while in the running state, other associated resources may charge regardless. This is how on demand works, but what types of situations should it be used for? Well, it's the default purchase option, and so you should always start your evaluation process by considering on demand as your default. For all projects, assume on demand and move to something else if you can justify that alternative purchase option.
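      As a rough illustration of the billing behaviour described above, the following sketch charges compute only for running seconds while storage accrues for the whole elapsed time. The function names and every rate are hypothetical placeholders, not real AWS prices, and the sketch ignores any other billing rules.

```python
def compute_charge(running_seconds: float, hourly_rate: float) -> float:
    """Per-second compute charge: only time in the running state counts."""
    return running_seconds * hourly_rate / 3600.0


def storage_charge(elapsed_seconds: float, gb: float, rate_per_gb_hour: float) -> float:
    """Storage is charged for all elapsed time, running or stopped."""
    return elapsed_seconds * gb * rate_per_gb_hour / 3600.0


def total_charge(running_seconds: float, stopped_seconds: float,
                 hourly_rate: float, gb: float, rate_per_gb_hour: float) -> float:
    """Combine compute (running time only) and storage (all elapsed time)."""
    elapsed = running_seconds + stopped_seconds
    return (compute_charge(running_seconds, hourly_rate)
            + storage_charge(elapsed, gb, rate_per_gb_hour))


if __name__ == "__main__":
    # Hypothetical: one hour running, one hour stopped, 10 GB of storage.
    print(total_charge(3600, 3600, hourly_rate=0.10, gb=10, rate_per_gb_hour=0.001))
```

      The point of separating the two functions is that stopping the instance changes only the first term; the storage term keeps growing until the storage itself is removed.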

      With on demand, there are no interruptions. You launch an instance, you pay a per second charge, and barring any failures, the instance runs until you decide otherwise. You don't receive any capacity reservations with on demand. If AWS has a major failure and capacity is limited, the reserved purchase option receives highest provisioning priority on whatever capacity remains, and so if something is critical to your business, then you should consider an alternative rather than using on demand. So, on demand does not give you any priority access to remaining capacity if there are any major failures.

      On demand offers predictable pricing, it's defined upfront, you pay a constant price, but there are no specific discounts. This consistent pricing applies to the duration that you use instances. So, on demand is suitable for short term workloads. Anything which you just need to provision, perform a workload and then terminate is ideal for on demand. If you're unsure about the duration or the type of workload, then again, on demand is ideal. And then lastly, if you have short term or unknown workloads, which definitely can't tolerate any interruption, then on demand is the perfect purchase option.

      Next, let's talk about spot pricing, and spot is the cheapest way to get access to EC2 capacity. Let's look at how this works visually. Let's start with the same two EC2 hosts. On the left, we have A and on the right B. Then, on these EC2 hosts, we're currently running four EC2 instances, two per host. And let's assume for this example that all of these four instances are using the on demand purchase option. So, right now, with what you see on screen, the hosts are wasting capacity. Enough capacity for four additional instances on each host is being wasted. Spot pricing is AWS selling that spare capacity at a discounted rate.

      The way that it works is that within each region for each type of instance, there is a given amount of free capacity on EC2 hosts at any time. AWS tracks this and it publishes a price for how much it costs to use that capacity, and this price is the spot price. Now, you can offer to pay more than the spot price, but this is a maximum. You'll only ever pay the current spot price for the type of instance in the specific region where you provision services. So, let's say that there are two different customers who want to provision four instances each. The first customer sets a maximum price of four gold coins, and the other customer sets a maximum price of two gold coins. Now, obviously, AWS doesn't charge in gold coins, and there are more than two EC2 hosts, but it's just easier to represent it in this way.

      Now, because the current spot price set by AWS is only two gold coins, then both customers are only paying two gold coins a second for their instances. Even though customer one has offered to pay more, this is their maximum and they only ever pay the current spot price. So, let's say now that the free capacity is getting a little bit on the low side. AWS are getting nervous, they know that they need to free up capacity for the normal on demand instances, which they know are about to launch, and so they up the spot price to four gold coins. Now, customer one is fine because they've set a maximum price of four coins, and so now they start paying four coins because that's what the current spot price is. Customer two, they've set their maximum price at two coins, and so their instances are terminated.

      If the spot price goes above your maximum price, then any spot instances which you have are terminated. That's the critical part to understand because spot instances should not be viewed as reliable. At this point in our example, maybe another customer decides to launch four on demand instances. AWS sell that capacity at the normal on demand rates, which are higher, and no capacity is wasted. Spot pricing offers up to a 90% reduction versus the price of on demand, and there are some significant trade offs that you need to be aware of.
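      The gold coin example above can be sketched as a tiny simulation. This is a simplified model under the assumptions of the example only (one price, whole coins, immediate termination); the class and function names are invented for illustration and are not part of any AWS API.

```python
from dataclasses import dataclass


@dataclass
class SpotRequest:
    customer: str
    max_price: int       # maximum the customer is willing to pay, in gold coins
    running: bool = True


def apply_spot_price(requests, spot_price):
    """Return what each still-running customer pays at the current spot price.

    Customers always pay the current spot price, never their maximum.
    Any request whose maximum is below the spot price is terminated.
    """
    charges = {}
    for req in requests:
        if not req.running:
            continue
        if spot_price > req.max_price:
            req.running = False          # spot price exceeded the maximum
        else:
            charges[req.customer] = spot_price
    return charges


if __name__ == "__main__":
    requests = [SpotRequest("customer one", 4), SpotRequest("customer two", 2)]
    print(apply_spot_price(requests, 2))   # spare capacity is plentiful
    print(apply_spot_price(requests, 4))   # AWS raises the spot price
```

      With a spot price of two coins both customers pay two coins; once the price rises to four, only customer one remains and pays four, while customer two's instances are terminated.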

      You should never use the spot purchase option for workloads which can't tolerate interruptions. No matter how well you manage your maximum spot price, there are going to be periods when instances are terminated. If you run workloads where that's a problem, don't use spot. This means that workloads such as domain controllers, mail servers, traditional websites, or even flight control systems are all bad fits for spot instances. The types of scenarios which are good fits for using spot instances are things which are not time critical. Since the spot price changes throughout each day and throughout days of the week, if you're able to process workloads around this, then you can take advantage of the maximum cost benefits for using spot. Anything which can tolerate interruption and just rerun is ideal for spot instances.

      So, if you have highly parallel workloads which can be broken into hundreds or thousands of pieces, maybe scientific analysis, and if any parts which fail can be rerun, then spot is ideal. Anything which has a bursty capacity need, maybe media processing, image processing, any cost sensitive workloads which wouldn't be economical to do using normal on-demand instances, assuming they can tolerate interruption, these are ideal for spot. Anything which is stateless where the state of the user session is not stored on the instances themselves, meaning they can handle disruption, again, ideal for using spot. Don't use spot for anything that's long-term, anything that requires consistent, reliable compute, any business critical things, or things which cannot tolerate disruption. For those types of workloads, you should not use spot. It's an anti-pattern.

      OK, so this is the end of part one of this lesson. It was getting a little bit on the long side, and I wanted to give you the opportunity to take a small break, maybe stretch your legs or make a coffee. Now, part two will continue immediately from this point, so go ahead, complete this video, and when you're ready, I look forward to you joining me in part two.

    1. Welcome back. In this lesson, now that we've covered virtualization at a high level, I want to focus on the architecture of the EC2 product in more detail. EC2 is one of the services you'll use most often in AWS and one which features on a lot of exam questions, so let's get started.

      First thing, let's cover some key, high level architectural points about EC2. EC2 instances are virtual machines, so this means an operating system plus an allocation of resources such as virtual CPU, memory, potentially some local storage, maybe some network storage, and access to other hardware such as networking and graphics processing units. EC2 instances run on EC2 hosts, and these are physical server hardware which AWS manages. These hosts are either shared hosts or dedicated hosts.

      Shared hosts are hosts which are shared across different AWS customers, so you don't get any ownership of the hardware and you pay for the individual instances based on how long you run them for and what resources they have allocated. It's important to understand, though, that every customer using shared hosts is isolated from the others, so there's no visibility of it being shared, there's no interaction between different customers, even if you're using the same shared host, and shared hosts are the default.

      With dedicated hosts, you're paying for the entire host, not the instances which run on it. It's yours, it's dedicated to your account, and you don't have to share it with any other customers. So if you pay for a dedicated host, you pay for that entire host, you don't pay for any instances running on it, and you don't share it with other AWS customers.

      EC2 is an availability zone resilient service. The reason for this is that hosts themselves run inside a single availability zone, so if that availability zone fails, the hosts inside that availability zone could fail, and any instances running on any hosts that fail will themselves fail. So as a solutions architect, you have to assume if an AZ fails, then at least some and probably all of the instances that are running inside that availability zone will also fail or be heavily impacted.

      Now let's look at how this looks visually. So this is a simplification of the us-east-1 region. I've only got two AZs represented, AZ A and AZ B, and in AZ A, I've represented that I've got two subnets, subnet A and subnet B. Now inside each of these availability zones is an EC2 host. Now these EC2 hosts, they run within a single AZ. I'm going to keep repeating that because it's critical for the exam and for how you think about EC2 in the exam.

      Keep thinking about it being an AZ resilient service, if you see EC2 mentioned in an exam, see if you can locate the availability zone details because that might factor into the correct answer. Now EC2 hosts have some local hardware, logically CPU and memory, which you should be aware of, but also they have some local storage called the instance store. The instance store is temporary, if an instance is running on a particular host, depending on the type of the instance, it might be able to utilize this instance store, but if the instance moves off this host to another one, then that storage is lost.

      And they also have two types of networking, storage networking and data networking. When instances are provisioned into a specific subnet within a VPC, what's actually happening is that a primary elastic network interface is provisioned in a subnet, which maps to the physical hardware on the EC2 host. Remember, subnets are also in one specific availability zone. Instances can have multiple network interfaces, even in different subnets, as long as they're in the same availability zone. Everything about EC2 is focused around this architecture, the fact that it runs in one specific availability zone.

      Now EC2 can make use of remote storage, so an EC2 host can connect to the elastic block store, which is known as EBS. The elastic block store service also runs inside a specific availability zone, so the service running inside availability zone A is different than the one running inside availability zone B, and you can't access them cross zone. EBS lets you allocate volumes, and volumes are portions of persistent storage, and these can be allocated to instances in the same availability zone, so again, it's another area where the availability zone matters.

      What I'm trying to do by keeping repeating availability zone over and over again is to paint a picture of a service which is very reliant on the availability zone that it's running in. The host is in an availability zone, the network is per availability zone, the persistent storage is per availability zone, so if an availability zone in AWS experiences major issues, it impacts all of those things.

      Now an instance runs on a specific host, and if you restart the instance, it will stay on a host. Instances stay on a host until one of two things happen: firstly, the host fails or is taken down for maintenance for some reason by AWS; or secondly, if an instance is stopped and then started, and that's different than just restarting, so I'm focusing on an instance being stopped and then being started, so not just a restart. If either of those things happen, then an instance will be relocated to another host, but that host will also be in the same availability zone.

      Instances cannot natively move between availability zones. Everything about them, their hardware, networking and storage is locked inside one specific availability zone. Now there are ways you can do a migration, but it essentially means taking a copy of an instance and creating a brand new one in a different availability zone, and I'll be covering that later in this section where I talk about snapshots and AMIs.
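      The availability zone rule described above, that network interfaces and EBS volumes can only be used by instances in the same availability zone, can be expressed as a simple check. This is a toy model only; the `Resource` type and `can_attach` function are hypothetical and do not correspond to any AWS SDK call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """An instance, network interface or EBS volume pinned to one AZ."""
    name: str
    az: str


def can_attach(instance: Resource, resource: Resource) -> bool:
    """A volume or network interface is usable only within the instance's AZ."""
    return instance.az == resource.az


if __name__ == "__main__":
    instance = Resource("instance-1", "az-a")
    for candidate in (Resource("volume-1", "az-a"), Resource("volume-2", "az-b")):
        print(candidate.name, can_attach(instance, candidate))
```

      Moving a workload to another availability zone therefore means creating a new instance there from a copy, rather than re-pointing existing storage or interfaces.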

      What you can never do is connect network interfaces or EBS storage located in one availability zone to an EC2 instance located in another. EC2 and EBS are both availability zone services, they're isolated, you cannot cross AZs with instances or with EBS volumes. Now instances running on an EC2 host share the resources of that host. And instances of different sizes can share a host, but generally instances of the same type and generation will occupy the same host.

      And I'll be talking in much more detail about instance types and sizes and generations in a lesson that's coming up very soon. But when you think about an EC2 host, think that it's from a certain year and includes a certain class of processor and a certain type of memory and a certain type and configuration of storage. And instances are also created with different generations, different versions that have specific types of CPU, memory and storage, so it's logical that if you provision two different types of instances, they may well end up on two different types of hosts.

      So a host generally has lots of different instances from different customers of the same type, but different sizes. So before we finish up this lesson, I want to answer a question. That question is what's EC2 good for? So what types of situations might you use EC2 for? And this is equally valuable when you're evaluating a technical architecture while you're answering questions in the exam.

      So first, EC2 is great when you've got a traditional OS and application compute need, so if you've got an application that requires to be running on a certain operating system at a certain runtime with certain configuration, maybe your internal technical staff are used to that configuration, or maybe your vendor has a certain set of support requirements, EC2 is a perfect use case for this type of scenario.

      And it's also great for any long running compute needs. There are lots of other services inside AWS that provide compute services, but many of these have got runtime limits, so you can't leave these things running consistently for one year or two years. With EC2, it's designed for persistent, long running compute requirements. So if you have an application that runs constantly 24/7, 365, and needs to be running on a normal operating system, Linux or Windows, then EC2 is the default and obvious choice for this.

      If you have any applications which are server style applications, so traditional applications that expect to be running in an operating system, waiting for incoming connections, then again, EC2 is a perfect service for this. And it's perfect for any applications or services that need burst requirements or steady state requirements. There are different types of EC2 instances, which are suitable for low levels of normal loads with occasional bursts, as well as steady state load.

      So again, if your application needs an operating system, and it's got bursty needs or consistent steady state load, then EC2 should be the first thing that you review. EC2 is also great for monolithic application stacks, so if your monolithic application requires certain components, a stack, maybe a database, maybe some middleware, maybe other runtime based components, and especially if it needs to be running on a traditional operating system, EC2 should be the first thing that you look at.

      And EC2 is also ideally suited for migrating application workloads, so application workloads, which expect a traditional virtual machine or server style environment, or if you're performing disaster recovery. So if you have existing traditional systems which run on virtual servers, and you want to provision a disaster recovery environment, then EC2 is perfect for that.

      In general, EC2 tends to be the default compute service within AWS. There are lots of niche requirements that you might have, and if you do have those, there are other compute services such as the elastic container service or Lambda. But generally, if you've got traditional style workloads, or you're looking for something that's consistent, or if it requires an operating system, or if it's monolithic, or if you're migrating into AWS, then EC2 is a great default first option.

      Now in this section of the course, I'm covering the basic architectural components of EC2, so I'm gonna be introducing the basics and let you get some exposure to it, and I'm gonna be teaching you all the things that you'll need for the exam.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      Examination of (a)periodic brain activity has gained particular interest in the last few years in the neuroscience fields relating to cognition, disorders, and brain states. Using large EEG/MEG datasets from younger and older adults, the current study provides compelling evidence that age-related differences in aperiodic EEG/MEG signals can be driven by cardiac rather than brain activity. Their findings have important implications for all future research that aims to assess aperiodic neural activity, suggesting control for the influence of cardiac signals is essential.

      We want to thank the editors for their assessment of our work and highlighting its importance for the understanding of aperiodic neural activity. Additionally, we want to thank the three present and four former reviewers (at a different journal) whose comments and ideas were critical in shaping this manuscript to its current form. We hope that this paper opens up many more questions that will guide us - as a field - to an improved understanding of how “cortical” and “cardiac” changes in aperiodic activity are linked and want to invite readers to engage with our work through eLife’s comment function.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The present study addresses whether physiological signals influence aperiodic brain activity with a focus on age-related changes. The authors report age effects on aperiodic cardiac activity derived from ECG in low and high-frequency ranges in roughly 2300 participants from four different sites. Slopes of the ECGs were associated with common heart variability measures, which, according to the authors, shows that ECG, even at higher frequencies, conveys meaningful information. Using temporal response functions on concurrent ECG and M/EEG time series, the authors demonstrate that cardiac activity is instantaneously reflected in neural recordings, even after applying ICA analysis to remove cardiac activity. This was more strongly the case for EEG than MEG data. Finally, spectral parameterization was done in large-scale resting-state MEG and ECG data in individuals between 18 and 88 years, and age effects were tested. A steepening of spectral slopes with age was observed particularly for ECG and, to a lesser extent, in cleaned MEG data in most frequency ranges and sensors investigated. The authors conclude that commonly observed age effects on neural aperiodic activity can mainly be explained by cardiac activity.

      Strengths:

      Compared to previous investigations, the authors demonstrate the effects of aging on the spectral slope in the currently largest MEG dataset with equal age distribution available. Their efforts of replicating observed effects in another large MEG dataset and considering potential confounding by ocular activity, head movements, or preprocessing methods are commendable and valuable to the community. This study also employs a wide range of fitting ranges and two commonly used algorithms for spectral parameterization of neural and cardiac activity, hence providing a comprehensive overview of the impact of methodological choices. Based on their findings, the authors give recommendations for the separation of physiological and neural sources of aperiodic activity.

      Weaknesses:

      While the aim of the study is well-motivated and analyses rigorously conducted, the overall structure of the manuscript, as it stands now, is partially misleading. Some of the described results are not well-embedded and lack discussion.

      We want to thank the reviewer for their comments focussed on improving the overall structure of the manuscript. We agree with their suggestions that some results could be more clearly contextualized and restructured the manuscript accordingly.

      Reviewer #2 (Public review):

      I previously reviewed this important and timely manuscript at a previous journal where, after two rounds of review, I recommended publication. Because eLife practices an open reviewing format, I will recapitulate some of my previous comments here, for the scientific record.

      In that previous review, I revealed my identity to help reassure the authors that I was doing my best to remain unbiased because I work in this area and some of the authors' results directly impact my prior research. I was genuinely excited to see the earlier preprint version of this paper when it first appeared. I get a lot of joy out of trying to - collectively, as a field - really understand the nature of our data, and I continue to commend the authors here for pushing at the sources of aperiodic activity!

      In their manuscript, Schmidt and colleagues provide a very compelling, convincing, thorough, and measured set of analyses. Previously I recommended that they push even further, and they added the current Figure 5 analysis of event-related changes in the ECG during working memory. In my opinion this result practically warrants a separate paper on its own!

      The literature analysis is very clever, and expanded upon from any other prior version I've seen.

      In my previous review, the broadest, most high-level comment I wanted to make was that the authors are correct. We (in my lab) have tried to be measured in our approach to talking about aperiodic analyses - including adopting measuring ECG when possible now - because there are so many sources of aperiodic activity: neural, ECG, respiration, skin conductance, muscle activity, electrode impedances, room noise, electronics noise, etc. The authors discuss this all very clearly, and I commend them on that. We, as a field, should move more toward a model where we can account for all of those sources of noise together. (This was less of an action item, and more of an inclusion of a comment for the record.)

      I also very much appreciate the authors' excellent commentary regarding the physiological effects that pharmacological challenges such as propofol and ketamine also have on non-neural (autonomic) functions such as ECG. Previously I also asked them to discuss the possibility that, while their manuscript focuses on aperiodic activity, it is possible that the wealth of literature regarding age-related changes in "oscillatory" activity might be driven partly by age-related changes in neural (or non-neural, ECG-related) changes in aperiodic activity. They have included a nice discussion on this, and I'm excited about the possibilities for cognitive neuroscience as we move more in this direction.

      Finally, I previously asked for recommendations on how to proceed. The authors convinced me that we should care about how the ECG might impact our field potential measures, but how do I, as a relative novice, proceed. They now include three strong recommendations at the end of their manuscript that I find to be very helpful.

      As was obvious from my previous review, I consider this to be an important and impactful cautionary report that is incredibly well supported by multiple thorough analyses. The authors have done an excellent job responding to all my previous comments and concerns and, in my estimation, those of the previous reviewers as well.

      We want to thank the reviewer for agreeing to review our manuscript again and for recapitulating on their previous comments and the progress the manuscript has made over the course of the last ~2 years. The reviewer's comments have been essential in shaping the manuscript into its current form. Their feedback has made the review process truly feel like a collaborative effort, focused on strengthening the manuscript and refining its conclusions and resulting recommendations.

      Reviewer #3 (Public review):

      Summary:

      Schmidt et al., aimed to provide an extremely comprehensive demonstration of the influence cardiac electromagnetic fields have on the relationship between age and the aperiodic slope measured from electroencephalographic (EEG) and magnetoencephalographic (MEG) data.

      Strengths:

      Schmidt et al., used a multiverse approach to show that the cardiac influence on this relationship is considerable, by testing a wide range of different analysis parameters (including extensive testing of different frequency ranges assessed to determine the aperiodic fit), algorithms (including different artifact reduction approaches and different aperiodic fitting algorithms), and multiple large datasets to provide conclusions that are robust to the vast majority of potential experimental variations.

      The study showed that across these different analytical variations, the cardiac contribution to aperiodic activity measured using EEG and MEG is considerable, and likely influences the relationship between aperiodic activity and age to a greater extent than the influence of neural activity.

      Their findings have significant implications for all future research that aims to assess aperiodic neural activity, suggesting control for the influence of cardiac fields is essential.

      We want to thank the reviewer for their thorough engagement with our work and the resultant substantive amount of great ideas both mentioned in the section of Weaknesses and Authors Recommendations below. Their suggestions have sparked many ideas in us on how to move forward in better separating peripheral- from neuro-physiological signals that are likely to greatly influence our future attempts to better extract both cardiac and muscle activity from M/EEG recordings. So we want to thank them for their input, time and effort!

      Weaknesses:

      Figure 4I: The regressions explained here seem to contain a very large number of potential predictors. Based on the way it is currently written, I'm assuming it includes all sensors for both the ECG component and ECG rejected conditions?

      I'm not sure about the logic of taking a complete signal, decomposing it with ICA to separate out the ECG and non-ECG signals, then including these latent contributions to the full signal back into the same regression model. It seems that there could be some circularity or redundancy in doing so. Can the authors provide a justification for why this is a valid approach?

      After observing significant effects both in the MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> conditions in similar frequency bands we wanted to understand whether or not these age-related changes are statistically independent. To test this we added both variables as predictors in a regression model (thereby accounting for the influence of the other in relation to age). The regression models we performed were therefore actually not very complex. They were built using only two predictors, namely the data (in a specific frequency range) averaged over channels on which we noticed significant effects in the ECG rejected and ECG components data respectively (Wilkinson notation: age ~ 1 + ECG rejected + ECG components). This was also described in the results section stating that: “To see if MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub> explain unique variance in aging at frequency ranges where we noticed shared effects, we averaged the spectral slope across significant channels and calculated a multiple regression model with MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> as predictors for age (to statistically control for the effect of MEG<sub>ECG component</sub>s and MEG<sub>ECG rejected</sub> on age). This analysis was performed to understand whether the observed shared age-related effects (MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub>) are in(dependent).”  

      We hope this explanation solves the previous misunderstanding.
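      For readers less familiar with Wilkinson notation, a model of the form age ~ 1 + ECG rejected + ECG components is an ordinary multiple regression with an intercept and two predictors. The sketch below fits such a model by least squares on synthetic data. It is an illustration only: the simulated values, the coefficients and the function names are assumptions, it does not reproduce the analysis pipeline or software used in the manuscript, and it uses none of the study's data.

```python
import random


def fit_ols(rows, y):
    """Least-squares fit via the normal equations (X^T X) b = X^T y.

    rows: list of (slope_ecg_rejected, slope_ecg_component) tuples.
    Returns [intercept, b_rejected, b_component].
    """
    X = [[1.0, *r] for r in rows]            # prepend the intercept column
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def simulate(n, seed=0, noise_sd=0.0):
    """Synthetic spectral slopes and ages; all coefficients are hypothetical."""
    rng = random.Random(seed)
    rows, ages = [], []
    for _ in range(n):
        rejected = rng.uniform(-2.0, 0.0)
        component = rng.uniform(-2.0, 0.0)
        age = 50.0 + 10.0 * rejected - 5.0 * component + rng.gauss(0.0, noise_sd)
        rows.append((rejected, component))
        ages.append(age)
    return rows, ages


if __name__ == "__main__":
    rows, ages = simulate(200, noise_sd=1.0)
    print(fit_ols(rows, ages))
```

      Because both predictors enter the same model, each coefficient reflects the association with age after accounting for the other predictor, which is the sense in which the two conditions are tested for independent contributions.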

      I'm not sure whether there is good evidence or rationale to support the statement in the discussion that the presence of the ECG signal in reference electrodes makes it more difficult to isolate independent ECG components. The ICA algorithm will still function to detect common voltage shifts from the ECG as statistically independent from other voltage shifts, even if they're spread across all electrodes due to the referencing montage. I would suggest there are other reasons why the ICA might lead to imperfect separation of the ECG component (assumption of the same number of source components as sensors, non-Gaussian assumption, assumption of independence of source activities).

      The inclusion of only 32 channels in the EEG data might also have reduced the performance of ICA, increasing the chances of imperfect component separation and the mixing of cardiac artifacts into the neural components, whereas the higher number of sensors in the MEG data would enable better component separation. This could explain the difference between EEG and MEG in the ability to clean the ECG artifact (and perhaps higher-density EEG recordings would not show the same issue).

      The reviewer is making a good argument suggesting that our initial assumption that the presence of cardiac activity on the reference electrode influences the performance of the ICA may be wrong. After rereading and reconsidering the matter we think that the reviewer is correct and that their assumptions for why the ECG signal was not so easily separable from our EEG recordings are more plausible and better grounded in the literature than our initial suggestion. We therefore now highlight their view as a main reason for why the ECG rejection was more challenging in EEG data. However, we also note that understanding the exact reason probably ends up being an empirical question that demands further research stating that:

      “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources. ”

      In addition to the inability to effectively clean the ECG artifact from EEG data, ICA and other component subtraction methods have also all been shown to distort neural activity in periods that aren't affected by the artifact due to the ubiquitous issue of imperfect component separation (https://doi.org/10.1101/2024.06.06.597688). As such, component subtraction-based (as well as regression-based) removal of the cardiac artifact might also distort the neural contributions to the aperiodic signal, so even methods to adequately address the cardiac artifact might not solve the problem explained in the study. This poses an additional potential confound to the "M/EEG without ECG" conditions.

      The reviewer is correct in stating that, if an “artifactual” signal is not always present but appears and disappears (like e.g. eye-blinks), neural activity may be distorted in periods where the “artifactual” signal is absent. However, while this plausibly presents a problem for ocular activity, there is no obvious reason to believe that this applies to cardiac activity. While the ECG signal is non-stationary in nature, it is remarkably more stable than eye-movements in the healthy populations we analyzed (especially at rest). Accordingly, the cardiac “artifact” was consistently present across the entirety of the MEG recordings we visually inspected.

      Literature Analysis, Page 23: was there a method applied to address studies that report reducing artifacts in general, but are not specific to a single type of artifact? For example, there are automated methods for cleaning EEG data that use ICLabel (a machine learning algorithm) to delete "artifact" components. Within these studies, the cardiac artifact will not be mentioned specifically, but is included under "artifacts".

      The literature analysis was largely performed automatically and solely focussed on ECG related activity, as described in the methods section under Literature Analysis: if no ECG related terms were used in the context of artifact rejection, a study was flagged as not having removed cardiac activity. We could indeed have highlighted this better and apologize for the oversight. We now additionally link to these details, stating that:

      “However, an analysis of openly accessible M/EEG articles (N<sub>Articles</sub>=279; see Methods - Literature Analysis for further details) that investigate aperiodic activity revealed that only 17.1% of EEG studies explicitly mention that cardiac activity was removed and only 16.5% measure ECG (45.9% of MEG studies removed cardiac activity and 31.1% of MEG studies mention that ECG was measured; see Figure 1EF).”

      The reviewer makes a fair point that there is some uncertainty here, and our results probably present a lower bound of ECG handling in M/EEG research: when I manually rechecked the studies that were not initially flagged, it was often solely mentioned that “artifacts” were rejected. This information seemed too ambiguous to assume that cardiac activity was in fact accounted for. Again, this could have been stated more clearly in writing and we apologize for this oversight. This is now included as part of the methods section Literature Analysis, stating that:

      “All valid word contexts were then manually inspected by scanning the respective word context to ensure that the removal of “artifacts” was related specifically to cardiac and not e.g. ocular activity or the rejection of artifacts in general (without specifying which “artifactual” source was rejected in which case the manuscript was marked as invalid). This means that the results of our literature analysis likely present a lower bound for the rejection of cardiac activity in the M/EEG literature investigating aperiodic activity.”
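
      The automatic flagging step followed by manual inspection can be illustrated with a minimal sketch. The term lists, the window size and the function name below are hypothetical placeholders (the actual pre-specified terms are given in the Methods and are not reproduced here); the sketch only shows the general idea of extracting word contexts in which a cardiac term co-occurs with an artifact-rejection term.

```python
import re

# Hypothetical term lists for illustration only; not the authors' actual terms.
CARDIAC_TERMS = ("ecg", "ekg", "cardiac", "heartbeat")
REJECTION_STEMS = ("remov", "reject", "clean", "correct")


def cardiac_contexts(text, window=10):
    """Return word contexts in which a cardiac term appears within
    `window` words of an artifact-rejection term."""
    words = re.findall(r"[a-z]+", text.lower())
    hits = []
    for i, word in enumerate(words):
        if any(word.startswith(term) for term in CARDIAC_TERMS):
            context = words[max(0, i - window): i + window + 1]
            if any(c.startswith(stem) for c in context for stem in REJECTION_STEMS):
                hits.append(" ".join(context))
    return hits


explicit = "Components reflecting cardiac activity were removed using ICA."
generic = "Artifacts were rejected using ICA."
print(len(cardiac_contexts(explicit)) > 0)  # cardiac removal explicitly mentioned
print(len(cardiac_contexts(generic)) > 0)   # only generic "artifacts" mentioned
```

      A study that only mentions the rejection of generic “artifacts” yields no context under such a scheme, which is why the resulting percentages are best read as a lower bound.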

      Statistical inferences, page 23: as far as I can tell, no methods to control for multiple comparisons were implemented. Many of the statistical comparisons were not independent (or even overlapped with similar analyses in the full analysis space to a large extent), so I wouldn't expect strong multiple comparison controls. But addressing this point to some extent would be useful (or clarifying how it has already been addressed if I've missed something).

      In the present study we tried to minimize the risk of type 1 errors by several means, such as A) weakly informative priors, B) robust regression models and C) by specifying a region of practical equivalence (ROPE, see Methods - Statistical Inference for further information) to define meaningful effects.

      Weakly informative priors can lower the risk of type 1 errors arising from multiple testing by shrinking parameter estimates towards zero (see e.g. Lemoine, 2019). Robust regression models use a Student's t-distribution to describe the distribution of the data. This distribution features heavier tails than a normal distribution, meaning it allocates more probability to extreme values, which in turn reduces the influence of outliers. The ROPE criterion ensures that only effects exceeding a negligible size are considered meaningful, representing a strict and conservative approach to interpreting our findings (see Kruschke, 2018; Cohen, 1988).
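
      To make the ROPE criterion concrete, the following is a minimal sketch of how a decision could be derived from posterior samples of a standardized effect. The ROPE limits (±0.1), the interval mass (0.94) and both function names are assumptions chosen for illustration; they are not taken from the manuscript.

```python
import math


def hdi(samples, mass=0.94):
    """Narrowest interval that contains `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    width = max(1, math.ceil(mass * n))  # number of samples inside the interval
    start = min(range(n - width + 1),
                key=lambda i: ordered[i + width - 1] - ordered[i])
    return ordered[start], ordered[start + width - 1]


def rope_decision(samples, rope=(-0.1, 0.1), mass=0.94):
    """Compare the highest density interval (HDI) with the ROPE.

    The ROPE limits and interval mass are illustrative assumptions.
    """
    low, high = hdi(samples, mass)
    if low > rope[1] or high < rope[0]:
        return "meaningful"   # HDI lies entirely outside the ROPE
    if low >= rope[0] and high <= rope[1]:
        return "negligible"   # HDI lies entirely inside the ROPE
    return "undecided"        # HDI and ROPE overlap partially


# Toy "posterior" for demonstration; real samples would come from the fitted model.
posterior = [0.3 + 0.001 * i for i in range(100)]
print(rope_decision(posterior))
```

      Under such a rule an effect is only called meaningful when the whole interval clears the ROPE, which is what makes the criterion conservative.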

      Furthermore, and more generally, we do not selectively report “significant” effects in the situations in which multiple analyses were conducted on the same family of data (e.g. Figure 2 & 4). Instead, we provide joint inference across several plausible analysis options (akin to a specification curve analysis, Simonsohn, Simmons & Nelson 2020) to provide other researchers with an overview of how different analysis choices impact the association between cardiac and neural aperiodic activity.

      Lemoine, N. P. (2019). Moving beyond noninformative priors: why and how to choose weakly informative priors in Bayesian analyses. Oikos, 128(7), 912-928.

      Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2020). Specification curve analysis. Nature Human Behaviour, 4(11), 1208-1214.

      Methods:

      Applying ICA components from 1Hz high pass filtered data back to the 0.1Hz filtered data leads to worse artifact cleaning performance, as the contribution of the artifact in the 0.1Hz to 1Hz frequency band is not addressed (see Bailey, N. W., Hill, A. T., Biabani, M., Murphy, O. W., Rogasch, N. C., McQueen, B., ... & Fitzgerald, P. B. (2023). RELAX part 2: A fully automated EEG data cleaning algorithm that is applicable to Event-Related-Potentials. Clinical Neurophysiology, result reported in the supplementary materials). This might explain some of the lower frequency slope results (which include a lower frequency limit <1Hz) in the EEG data - the EEG cleaning method is just not addressing the cardiac artifact in that frequency range (although it certainly wouldn't explain all of the results).

      We want to thank the reviewer for suggesting this interesting paper, showing that lower high-pass filters may be preferable to the more commonly used >1Hz high-pass filters for detection of ICA components that largely contain peripheral physiological activity. However, the results presented by Bailey et al. contradict the more commonly reported finding by other researchers that a >1Hz high-pass filter is actually preferable (e.g. Winkler et al. 2015; Dimigen, 2020 or Klug & Gramann, 2021) and the recommendations in widely used packages for M/EEG analysis (e.g. https://mne.tools/1.8/generated/mne.preprocessing.ICA.html). Yet, the fact that there seems to be a discrepancy suggests that further research is needed to better understand which type of high-pass filtering is preferable in which situation. Furthermore, it is notable that all the findings for high-pass filtering in ICA component detection and removal that we are aware of relate to ocular activity. Given that ocular and cardiac activity have very different temporal and spectral patterns, it is probably worth further investigating whether the classic 1Hz high-pass filter is really also the best option for the detection and removal of cardiac activity. However, in our opinion this requires a dedicated investigation of its own.

      We therefore highlight this now in our manuscript stating that:

      “Additionally, it is worth noting that the effectiveness of an ICA crucially depends on the quality of the extracted components(63,64) and even widely suggested settings e.g. high-pass filtering at 1Hz before fitting an ICA may not be universally applicable (see supplementary material of (64)).”

      I. Winkler, S. Debener, K.-R. Müller and M. Tangermann, "On the influence of high-pass filtering on ICA-based artifact reduction in EEG-ERP," 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 2015, pp. 4101-4105, doi: 10.1109/EMBC.2015.7319296.

      Dimigen, O. (2020). Optimizing the ICA-based removal of ocular EEG artifacts from free viewing experiments. NeuroImage, 207, 116117.

      Klug, M., & Gramann, K. (2021). Identifying key factors for improving ICA‐based decomposition of EEG data in mobile and stationary experiments. European Journal of Neuroscience, 54(12), 8406-8420.

      It looks like no methods were implemented to address muscle artifacts. These can affect the slope of EEG activity at higher frequencies. Perhaps the Riemannian Potato addressed these artifacts, but I suspect it wouldn't eliminate all muscle activity. As such, I would be concerned that remaining muscle artifacts affected some of the results, particularly those that included high frequency ranges in the aperiodic estimate. Perhaps if muscle activity were left in the EEG data, it could have disrupted the ability to detect a relationship between age and 1/f slope in a way that didn't disrupt the same relationship in the cardiac data (although I suspect it wouldn't reverse the overall conclusions given the number of converging results including in lower frequency bands). Is there a quick validity analysis the authors can implement to confirm muscle artifacts haven't negatively affected their results?

      I note that an analysis of head movement in the MEG is provided on page 32, but it would be more robust to show that removing ICA components reflecting muscle doesn't change the results. The results/conclusions of the following study might be useful for objectively detecting probable muscle artifact components: Fitzgibbon, S. P., DeLosAngeles, D., Lewis, T. W., Powers, D. M. W., Grummett, T. S., Whitham, E. M., ... & Pope, K. J. (2016). Automatic determination of EMG-contaminated components and validation of independent component analysis using EEG during pharmacologic paralysis. Clinical neurophysiology, 127(3), 1781-1793.

      We thank the reviewer for their suggestion. Muscle activity can indeed be a potential concern for the estimation of the spectral slope. This is precisely why we used head movements (as also noted by the reviewer) as a proxy for muscle activity. We also agree with the reviewer that this is not a perfect estimate. Additionally, the Riemannian Potato would probably only capture epochs that contain transient, but not persistent, patterns of muscle activity.

      The paper recommended by the reviewer contains a clever approach of using the steepness of the spectral slope (or lack thereof) as an indicator of whether or not an independent component (IC) is driven by muscle activity. In order to determine an optimal threshold, Fitzgibbon et al. compared paralyzed to temporarily non-paralyzed subjects. They determined an expected “EMG-free” threshold for their spectral slope on paralyzed subjects and used this as a benchmark to detect ICs that were contaminated by muscle activity in non-paralyzed subjects.
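
      The general idea of a slope-based criterion can be sketched as follows. This is not the procedure of Fitzgibbon et al.: the threshold value, the frequency range and the function names are hypothetical, and the slope is estimated with a plain least-squares fit in log-log space rather than with a dedicated aperiodic fitting algorithm.

```python
import math


def spectral_slope(freqs, power):
    """Least-squares slope of log10(power) against log10(frequency)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in power]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def flag_flat_components(freqs, component_spectra, threshold):
    """Indices of components whose slope is flatter (less negative) than
    `threshold`. The threshold is a hypothetical value that would need to
    be validated for each recording device."""
    return [i for i, spectrum in enumerate(component_spectra)
            if spectral_slope(freqs, spectrum) > threshold]


freqs = list(range(1, 51))
steep = [f ** -2.0 for f in freqs]  # steep, 1/f-like spectrum
flat = [f ** -0.2 for f in freqs]   # nearly flat spectrum
print(flag_flat_components(freqs, [steep, flat], threshold=-0.5))
```

      The sketch makes the dependence on a single threshold explicit, which is exactly the quantity that would have to be re-validated for each MEG system.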

      This is a great idea, but unfortunately it would go well beyond what we are able to sensibly estimate with our data, for the following reasons. The authors estimated their optimal threshold on paralyzed subjects for EEG data and show that this is a feasible threshold to be applied across different recordings. So for EEG data it might be feasible, at least as a first attempt, to use their threshold on our data. However, we are measuring MEG and, as alluded to in our discussion section under “Differences in aperiodic activity between magnetic and electric field recordings”, the spectral slope differs greatly between MEG and EEG recordings for non-trivial reasons. Furthermore, the spectral slope even seems to differ across different MEG devices. We noticed this when we initially tried to pool the data recorded in Salzburg with the Cambridge dataset. This means we would need to do a complete validation of this procedure for the MEG data recorded in Cambridge and in Salzburg, which is not feasible considering that we A) don’t have direct access to one of the recording sites and B) would, even if we had access, face substantial hurdles to get ethical approval for the experiment performed by Fitzgibbon et al.

      However, we think the approach brought forward by Fitzgibbon and colleagues is a clever way to remove muscle activity from EEG recordings whenever EMG was not directly recorded. We therefore suggest in the Discussion section that ideally EMG should also be recorded, stating that:

      “It is worth noting that, apart from cardiac activity, muscle activity can also be captured in (non-)invasive recordings and may drastically influence measures of the spectral slope(72). To ensure that persistent muscle activity does not bias our results we used changes in head movement velocity as a control analysis (see Supplementary Figure S9). However, it should be noted that this is only a proxy for the presence of persistent muscle activity. Ideally, studies investigating aperiodic activity should also be complemented by measurements of EMG. Whenever such measurements are not available creative approaches that use the steepness of the spectral slope (or the lack thereof) as an indicator to detect whether or not e.g. an independent component is driven by muscle activity are promising(72,73). However, these approaches may require further validation to determine how well myographic aperiodic thresholds are transferable across the wide variety of different M/EEG devices.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) As outlined above, I recommend rephrasing the last section of the introduction to briefly summarize/introduce all main analysis steps undertaken in the study and why these were done (for example, it is only mentioned that the Cam-CAN dataset was used to study the impact of cardiac on MEG activity although the author used a variety of different datasets). Similarly, I am missing an overview of all main findings in the context of the study goals in the discussion. I believe clarifying the structure of the paper would not only provide a red thread to the reader but also highlight the efforts/strength of the study as described above.

      This is a good call! As suggested by the reviewer, we now try to give a clearer overview of what was investigated and why. We do that both at the end of the introduction, stating that: “Using the publicly available Cam-CAN dataset(28,29), we find that the aperiodic signal measured using M/EEG originates from multiple physiological sources. In particular, significant portions of age-related changes in aperiodic activity –normally attributed to neural processes– can be better explained by cardiac activity. This observation holds across a wide range of processing options and control analyses (see Supplementary S1), and was replicable on a separate MEG dataset. However, the extent to which cardiac activity accounts for age-related changes in aperiodic activity varies with the investigated frequency range and recording site. Importantly, in some frequency ranges and sensor locations, age-related changes in neural aperiodic activity still prevail. But does the influence of cardiac activity on the aperiodic spectrum extend beyond age? In a preliminary analysis, we demonstrate that working memory load modulates the aperiodic spectrum of “pure” ECG recordings. The direction of this working memory effect mirrors previous findings on EEG data(5) suggesting that the impact of cardiac activity goes well beyond aging. In sum, our results highlight the complexity of aperiodic activity while cautioning against interpreting it as solely “neural” without considering physiological influences.”

      and at the beginning of the discussion section:

      “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources (see Figure 1EF). Additionally, it is worth noting that the effectiveness of an ICA crucially depends on the quality of the extracted components(63,64) and even widely suggested settings e.g. high-pass filtering at 1Hz before fitting an ICA may not be universally applicable (see supplementary material of (64)).”

      (2) I found it interesting that the spectral slopes of ECG activity at higher frequency ranges (> 10 Hz) seem mostly related to HRV measures such as fractal and time domain indices and less so with frequency-domain indices. Do the authors have an explanation for why this is the case? Also, the analysis of the HRV measures and their association with aperiodic ECG activity is not explained in any of the method sections.

      We apologize for the oversight in not mentioning the HRV analysis in more detail in our methods section. We added a subsection to the Methods section entitled ECG Processing - Heart rate variability analysis to further describe the HRV analyses.

      “ECG Processing - Heart rate variability analysis

      Heart rate variability (HRV) was computed using the NeuroKit2 toolbox, a high level tool for the analysis of physiological signals. First, the raw electrocardiogram (ECG) data were preprocessed by high-pass filtering the signal at 0.5Hz using an infinite impulse response (IIR) Butterworth filter (order=5) and by smoothing the signal with a moving average kernel with the width of one period of 50Hz to remove the powerline noise (default settings of neurokit.ecg.ecg_clean). Afterwards, QRS complexes were detected based on the steepness of the absolute gradient of the ECG signal. Subsequently, R-Peaks were detected as local maxima in the QRS complexes (default settings of neurokit.ecg.ecg_peaks; see (98) for a validation of the algorithm). From the cleaned R-R intervals, 90 HRV indices were derived, encompassing time-domain, frequency-domain, and non-linear measures. Time-domain indices included standard metrics such as the mean and standard deviation of the normalized R-R intervals, the root mean square of successive differences, and other statistical descriptors of interbeat interval variability. Frequency-domain analyses were performed using power spectral density estimation, yielding for instance low frequency (0.04-0.15Hz) and high frequency (0.15-0.4Hz) power components. Additionally, non-linear dynamics were characterized through measures such as sample entropy, detrended fluctuation analysis and various Poincaré plot descriptors. All these measures were then related to the slopes of the low frequency (0.25 – 20 Hz) and high frequency (10 – 145 Hz) aperiodic spectrum of the raw ECG.”
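
      Two of the time-domain indices named above can be written out explicitly. The following is a minimal standard-library sketch, not the NeuroKit2 implementation: it assumes that R-peak times (in seconds) have already been detected, the function names are illustrative, and the use of the sample standard deviation (n-1 denominator) is an assumed convention.

```python
import math


def rr_intervals_ms(r_peak_times_s):
    """R-R intervals in milliseconds from R-peak times in seconds."""
    return [(later - earlier) * 1000.0
            for earlier, later in zip(r_peak_times_s, r_peak_times_s[1:])]


def sdnn(rr):
    """Standard deviation of the R-R intervals (n-1 denominator, assumed)."""
    mean = sum(rr) / len(rr)
    return math.sqrt(sum((x - mean) ** 2 for x in rr) / (len(rr) - 1))


def rmssd(rr):
    """Root mean square of successive differences between R-R intervals."""
    diffs = [later - earlier for earlier, later in zip(rr, rr[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Toy R-peak times in seconds, for demonstration only.
peaks = [0.0, 0.8, 1.7, 2.5, 3.4]
rr = rr_intervals_ms(peaks)
print(round(sdnn(rr), 2), round(rmssd(rr), 2))
```

      Both indices summarize beat-to-beat variability over seconds to minutes, far below the frequency ranges used for the spectral slope fits.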

      With regard to the association between the ECG’s spectral slopes at high frequencies and frequency-domain indices of heart rate variability: common frequency-domain indices of heart rate variability fall in the range of 0.01-0.4Hz, which probably explains why we didn’t notice any association at higher frequency ranges (>10Hz).

      This is also stated in the related part of the results section:

      “In the higher frequency ranges (10 - 145 Hz) spectral slopes were most consistently related to fractal and time domain indices of heart rate variability, but not so much to frequency-domain indices assessing spectral power in frequency ranges < 0.4 Hz.”

      (3) Related to the previous point - what is being reflected in the ECG at higher frequency ranges, with regard to biological mechanisms? Results are being mentioned, but not further discussed. However, this point seems crucial because the age effects across the four datasets differ between low and high-frequency slope limits (Figure 2C).

      This is a great question that definitely also requires further attention and investigation in general (see also Tereshchenko & Josephson, 2015). We investigated the change of the slope across frequency ranges that are typically captured in common ECG setups for adults (0.05 - 150Hz, Tereshchenko & Josephson, 2015; Kusayama, Wong, Liu et al. 2020). While most of the physiologically significant spectral information of an ECG recording rests between 1-50Hz (Clifford & Azuaje, 2006), meaningful information can be extracted at much higher frequencies. For instance, ventricular late potentials have a broader frequency band (~40-250Hz) that overlaps with our spectral analysis window. Further meaningful information can be extracted at even higher frequencies (>100Hz), although the exact physiological mechanisms underlying the so-called high-frequency QRS remain unclear (HF-QRS; see Tereshchenko & Josephson, 2015; Qiu et al. 2024 for a review discussing possible mechanisms). At the same time, the HF-QRS seems to be highly informative for the early detection of myocardial ischemia and other cardiac abnormalities that may not yet be evident in the standard frequency range (Schlegel et al. 2004; Qiu et al. 2024). All optimism aside, it is also worth noting that ECG recordings at higher frequencies can capture skeletal muscle activity with an overlapping frequency range up to 400Hz (Kusayama, Wong, Liu et al. 2020). We now highlight all of this as an outstanding research question when introducing this analysis in the results section, stating that:

      “However, substantially less is known about aperiodic activity above 0.4Hz in the ECG. Yet, common ECG setups for adults capture activity at a broad bandwidth of 0.05 - 150Hz(33,34).

      Importantly, a lot of the physiologically meaningful spectral information rests between 1-50Hz(35), similarly to M/EEG recordings. Furthermore, meaningful information can be extracted at much higher frequencies. For instance, ventricular late potentials have a broader frequency band (~40-250Hz(35)). However, that’s not all, as further meaningful information can be extracted at even higher frequencies (>100Hz). For instance, the so-called high-frequency QRS seems to be highly informative for the early detection of myocardial ischemia and other cardiac abnormalities that may not yet be evident in the standard frequency range(36,37). Yet, the exact physiological mechanisms underlying the high-frequency QRS remain unclear (see (37) for a review discussing possible mechanisms).”

      Tereshchenko, L. G., & Josephson, M. E. (2015). Frequency content and characteristics of ventricular conduction. Journal of electrocardiology, 48(6), 933-937.

      Kusayama, T., Wong, J., Liu, X. et al. Simultaneous noninvasive recording of electrocardiogram and skin sympathetic nerve activity (neuECG). Nat Protoc 15, 1853–1877 (2020). https://doi.org/10.1038/s41596-020-0316-6

      Clifford, G. D., & Azuaje, F. (2006). Advanced methods and tools for ECG data analysis (Vol. 10). P. McSharry (Ed.). Boston: Artech house.

      Qiu, S., Liu, T., Zhan, Z., Li, X., Liu, X., Xin, X., ... & Xiu, J. (2024). Revisiting the diagnostic and prognostic significance of high-frequency QRS analysis in cardiovascular diseases: a comprehensive review. Postgraduate Medical Journal, qgae064.

      Schlegel, T. T., Kulecz, W. B., DePalma, J. L., Feiveson, A. H., Wilson, J. S., Rahman, M. A., & Bungo, M. W. (2004, March). Real-time 12-lead high-frequency QRS electrocardiography for enhanced detection of myocardial ischemia and coronary artery disease. In Mayo Clinic Proceedings (Vol. 79, No. 3, pp. 339-350). Elsevier.

      (4) Page 10: At first glance, it is not quite clear what is meant by "processing option" in the text. Please clarify.

      Thank you for catching this! Upon re-reading, this is indeed a bit unclear. We have now replaced “processing options” with “slope fits” to make it clearer that we are talking about the percentage of effects based on the different slope fits.

      (5) The authors mention previous findings on age effects on neural 1/f activity (References Nr 5,8,27,39) that seem contrary to their own findings such as e.g., the mostly steepening of the slopes with age. Also, the authors discuss thoroughly why spectral slopes derived from MEG signals may differ from EEG signals. I encourage the authors to have a closer look at these studies and elaborate a bit more on why these studies differ in their conclusions on the age effects. For example, Tröndle et al. (2022, Ref. 39) investigated neural activity in children and young adults, hence, focused on brain maturation, whereas the CamCAN set only considers the adult lifespan. In a similar vein, others report age effects on 1/f activity in much smaller samples as reported here (e.g., Voytek et al., 2015).

      I believe taking these points into account by briefly discussing them, would strengthen the authors' claims and provide a more fine-grained perspective on aging effects on 1/f.

      The reviewer is making a very important point, as age-related differences in (neuro-)physiological activity are not necessarily strictly comparable and entirely linear across different age-cohorts (e.g. age-related changes in alpha center frequency). We therefore added the suggested discussion points to the discussion section.

      “Differences in electric and magnetic field recordings aside, aperiodic activity may not change strictly linearly as we are ageing and studies looking at younger age groups (e.g. <22 years; (44)) may capture different aspects of aging (e.g. brain maturation) than those looking at older subjects (>18 years; our sample). A recent report even shows some first evidence of an interesting putatively non-linear relationship with age in the sensorimotor cortex for resting recordings(59).”

      (6) The analysis of the working memory paradigm as described in the outlook-section of the discussion comes as a bit of a surprise as it has not been introduced before. If the authors want to convey with this study that, in general, aperiodic neural activity could be influenced by aperiodic cardiac activity, I recommend introducing this analysis and the results earlier in the manuscript than only in the discussion to strengthen their message.

      The reviewer is correct. This analysis really comes a bit out of the blue. However, this was also exactly the intention for placing this analysis in the discussion. As the reviewer correctly noted, the aim was to suggest “that, in general, aperiodic neural activity could be influenced by aperiodic cardiac activity”. We placed this outlook directly after the discussion of “(neuro-)physiological origins of aperiodic activity”, where we highlight the potential challenges of interpreting drug induced changes to M/EEG recordings. So the aim was to get the reader to think about whether age is the only feature affected by cardiac activity and then directly present some evidence that this might go beyond age.

      However, we have rethought this approach based on the reviewer’s comments, moved that paragraph to the end of the results section accordingly, and now introduce it at the end of the introduction, stating that:

      “But does the influence of cardiac activity on the aperiodic spectrum extend beyond age? In a preliminary analysis, we demonstrate that working memory load modulates the aperiodic spectrum of “pure” ECG recordings. The direction of this working memory effect mirrors previous findings on EEG data(5) suggesting that the impact of cardiac activity goes well beyond aging.”

      (7) The font in Figure 2 is a bit hard to read (especially in D). I recommend increasing the font sizes where necessary for better readability.

      We agree with the Reviewer and increased the font sizes accordingly.

      (8) Text in the discussion: Figure 3B on page 10 => shouldn't it be Figure 4?

      Thank you for catching this oversight. We have now corrected this mistake.

      (9) In the third section on page 10, the Figure labels seem to be confused. For example, Figure 4 E is supposed to show "steepening effects", which should be Figure 4B I believe.

      Please check the figure labels in this section to avoid confusion.

      Thank you for catching this oversight. We have now corrected this mistake.

      (10) Figure Legend 4 I), please check the figure labels in the text

      Thank you for catching this oversight. We have now corrected this mistake.

      Reviewer #3 (Recommendations for the authors):

      I have a number of suggestions for improving the manuscript, which I have divided by section in the following:

      ABSTRACT:

      I would suggest re-writing the first sentences to make it easier to read for non-expert readers: "The power of electrophysiologically measured cortical activity decays with an approximately 1/f<sup>X</sup> function. The slope of this decay (i.e. the spectral exponent, X) is modulated..."

      Thank you for the suggestion. We adjusted the sentence as suggested to make it easier for less technical readers to understand that “X” refers to the exponent.

      Including the age range that was studied in the abstract could be informative.

      Done as suggested.

      As an optional recommendation, I think it would increase the impact of the article if the authors note in the abstract that the current most commonly applied cardiac artifact reduction approaches don't resolve the issue for EEG data, likely due to an imperfect ability to separate the cardiac artifact from the neural activity with independent component analysis. This would highlight to the reader that they can't just expect to address these concerns by cleaning their data with typical cleaning methods.

      I think it would also be useful to convey in the abstract just how comprehensive the included analyses were (in terms of artifact reduction methods tested, different aperiodic algorithms and frequency ranges, and both MEG and EEG). Doing so would let the reader know just how robust the conclusions are likely to be.

      This is a brilliant idea! As suggested we added a sentence highlighting that simply performing an ICA may not be sufficient to separate cardiac contributions to M/EEG recordings and refer to the comprehensiveness of the performed analyses.

      INTRODUCTION:

      I would suggest re-writing the following sentence for readability: "In the past, aperiodic neural activity, other than periodic neural activity (local peaks that rise above the "power-law" distribution), was often treated as noise and simply removed from the signal"

      To something like: "In the past, aperiodic neural activity was often treated as noise and simply removed from the signal e.g. via pre-whitening, so that analyses could focus on periodic neural activity (local peaks that rise above the "power-law" distribution, which are typically thought to reflect neural oscillations)."

      We are happy to follow that suggestion.

      Page 3: please provide the number of articles that were included in the examination of the percentage that remove cardiac activity, and note whether the included articles could be considered a comprehensive or nearly comprehensive list, or just a representative sample.

      We stated the exact number of articles in the methods section under Literature Analysis. However, we added it to the Introduction on page 3 as suggested by the reviewer. The selection of articles was done automatically, dependent on a list of pre-specified terms and exclusively focussed on articles that had terms related to aperiodic activity in their title (see Literature Analysis). Therefore, I would personally be hesitant in calling it a comprehensive or nearly comprehensive list of the general M/EEG literature as the analysis of aperiodic activity is still relatively niche compared to the more commonly investigated evoked potentials or oscillations. I think whether or not a reader perceives our analysis as comprehensive should be up to them to decide and does not reflect something I want to impose on them. This is exacerbated by the fact that the analysis of neural aperiodic activity has rapidly gained traction over the last years (see Figure 1D orange) and the literature analysis was performed almost 2 years ago and therefore, in my eyes, only represents a glimpse in the rapidly evolving field related to the analysis of aperiodic activity.

      Figure 1E-F: It's not completely clear that the "Cleaning Methods" part of the figure indicates just methods to clean the cardiac artifact (rather than any artifact). It also seems that ~40% of EEG studies do not apply any cleaning methods even from within the studies that do clean the cardiac artifact (if I've read the details correctly). This seems unlikely. Perhaps there should be a bar for "other methods", or "unspecified"? Having said that, I'm quite familiar with the EEG artifact reduction literature, and I would be very surprised if ~40% of studies cleaned the cardiac artifact using a different method to the methods listed in the bar graph, so I'm wondering if I've misunderstood the figure, or whether the data capture is incomplete / inaccurate (even though the conclusion that ICA is the most common method is almost certainly accurate).

      The cleaning is indeed focussed only on cardiac activity. This was, however, also mentioned in the caption of Figure 1: “We were further interested in determining which artifact rejection approaches were most commonly used to remove cardiac activity, such as independent component analysis (ICA(22)), singular value decomposition (SVD(23)), signal space separation (SSS(24)), signal space projections (SSP(25)) and denoising source separation (DSS(26)).” and in the methods section under Literature Analysis. Nevertheless, we adjusted Figure 1EF to make it more obvious that the described cleaning methods relate only to the ECG. Aside from using blind source separation techniques such as ICA, a good number of studies mentioned that they cleaned their data based on visual inspection (which was not further considered). Furthermore, it should be noted that studies were only marked as having separated cardiac from neural activity when this was mentioned explicitly.

      RESULTS:

      Page 6: I would delete the "from a neurophysiological perspective" clause, which makes the sentence more difficult to read and isn't so accurate (frequencies 13-25Hz would probably more commonly be considered mid-range rather than low or high). Additionally, both frequency ranges include 15Hz, but the next sentence states that the ranges were selected to avoid the knee at 15Hz, which seems to be a contradiction. Could the authors explain in more detail how the split addresses the 15Hz knee?

      We removed the “from a neurophysiological perspective” clause as suggested. With regard to the “knee” at ~15Hz, I would like to refer the reviewer to Supplementary Figure S1. The Knee Frequency varies substantially across subjects, so splitting the data at a single exact frequency did not seem appropriate. Additionally, we found only spurious significant age-related variations in Knee Frequency (i.e. in only one of the 4 datasets; not shown).

      Furthermore, we wanted to better connect our findings to our MEG results in Figure 4 and also give the readers a holistic overview of how different frequency ranges in the aperiodic ECG would be affected by age. So to fulfill all of these objectives we decided to fit slopes with respective upper/lower bounds around a range of 5Hz above and below the average 15Hz Knee Frequency across datasets.

      The later parts of this same paragraph refer to a vast amount of different frequency ranges, but only the "low" and "high" frequency ranges were previously mentioned. Perhaps the explanation could be expanded to note that multiple lower and upper bounds were tested within each of these low and high frequency windows?

      This is a good catch; we adjusted the sentence as suggested. We now write: “... slopes were fitted individually to each subject's power spectrum in several lower (0.25 – 20 Hz) and higher (10-145 Hz) frequency ranges.”
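      As a minimal illustration of fitting slopes separately in a lower and a higher frequency range, the following Python sketch fits a straight line to a power spectrum in log-log space within a chosen range. It is only a sketch under stated assumptions: it uses an ordinary least-squares fit rather than IRASA or FOOOF, and the function name `fit_spectral_slope` and the synthetic 1/f² spectrum are hypothetical, not taken from the manuscript.

```python
import math


def fit_spectral_slope(freqs, power, f_min, f_max):
    """Least-squares line through log10(power) vs. log10(freq) in [f_min, f_max].

    Returns (slope, intercept). Plain log-log regression, not IRASA or FOOOF.
    """
    pts = [(math.log10(f), math.log10(p))
           for f, p in zip(freqs, power)
           if f_min <= f <= f_max and f > 0 and p > 0]
    if len(pts) < 2:
        raise ValueError("need at least two spectral bins inside the fitting range")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical synthetic spectrum: power ~ 1/f^2, sampled every 0.25 Hz up to 145 Hz
freqs = [0.25 * k for k in range(1, 581)]
power = [10.0 / f ** 2 for f in freqs]

# One lower and one higher fitting range, using the bounds quoted in the text
low_slope, _ = fit_spectral_slope(freqs, power, 0.25, 20.0)
high_slope, _ = fit_spectral_slope(freqs, power, 10.0, 145.0)
```

      For a spectrum with a knee, the two ranges would generally yield different slopes, which is the motivation for fitting them separately.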

      The following two sentences seem to contradict each other: "Overall, spectral slopes in lower frequency ranges were more consistently related to heart rate variability indices(> 39.4% percent of all investigated indices)" and: "In the lower frequency range (0.25 - 20Hz), spectral slopes were consistently related to most measures of heart rate variability; i.e. significant effects were detected in all 4 datasets (see Figure 2D)." (39.4% is not "most").

      The reviewer is correct in stating that 39.4% is not most. However, 39.4% is the lower bound and refers to only 1 dataset. In the other 3 datasets the percentage of effects was above 64%, which can be categorized as “most”, i.e. above 50%. We agree that this was a bit ambiguous in the sentence, so we added the other percentages as well as a reference to Figure 2D to make this point clearer.

      Figure 2D: it isn't clear what the percentages in the semi-circles reflect, nor why some semi-circles are more full circles while others are only quarter circles.

      The percentages in the semi-circles reflect the proportion of effects (marked in red) and null effects (marked in green) per dataset, averaged across the different measures of HRV. For some frequency ranges fewer effects were found, resulting in quarter circles instead of semi-circles.

      Page 8: I think the authors could make it more clear that one of the conditions they were testing was the ECG component of the EEG data (extracted by ICA then projected back into the scalp space for the temporal response function analysis).

      As suggested by the reviewer we adjusted our wording and replaced the arguably a bit ambiguous “... projected back separately” with “... projected back into the sensor space”. We thank the reviewer for this recommendation, as it does indeed make it easier to understand the procedure.

      “After pre-processing (see Methods) the data was split in three conditions using an ICA(22). Independent components that were correlated (at r > 0.4; see Methods: MEG/EEG Processing - pre-processing) with the ECG electrode were either not removed from the data (Figure 3ABCD - blue), removed from the data (Figure 2ABCD - orange) or projected back into the sensor space (Figure 3ABCD - green).”
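      The three conditions described above can be sketched as follows. This is an illustrative toy example, not the authors' pipeline: the helper names (`pearson_r`, `ecg_component_indices`, `back_project`), the toy mixing matrix and the use of the absolute correlation are assumptions. Only the r > 0.4 threshold comes from the text.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def ecg_component_indices(components, ecg, threshold=0.4):
    """Indices of components whose correlation with the ECG exceeds threshold.

    Assumption: the absolute correlation is used, because the sign of an
    independent component is arbitrary.
    """
    return [k for k, comp in enumerate(components)
            if abs(pearson_r(comp, ecg)) > threshold]


def back_project(mixing, components, keep):
    """Rebuild sensor data from the component indices listed in keep.

    mixing[s][k] is the weight of component k at sensor s.
    """
    n_times = len(components[0])
    return [[sum(row[k] * components[k][t] for k in keep)
             for t in range(n_times)]
            for row in mixing]


# Toy data: one cardiac-like component and one component uncorrelated with the ECG
ecg = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
components = [[2.0 * v for v in ecg],
              [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]]
mixing = [[1.0, 0.5], [0.2, 1.0]]  # hypothetical 2 sensors x 2 components

ecg_idx = ecg_component_indices(components, ecg)
other_idx = [k for k in range(len(components)) if k not in ecg_idx]

not_rejected = back_project(mixing, components, ecg_idx + other_idx)  # all components
ecg_rejected = back_project(mixing, components, other_idx)            # ECG removed
ecg_component = back_project(mixing, components, ecg_idx)             # ECG only
```

      Because the back-projection is linear, the "ECG rejected" and "ECG component" data sum to the "ECG not rejected" data in sensor space.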

      Figure 4A: standardized beta coefficients for the relationship between age and spectral slope could be noted to provide improved clarity (if I'm correct in assuming that is what they reflect).

      This was indeed shown in Figure 4A and noted in the color bar as “average beta (standardized)”. We do not specifically highlight this in the text, because the exact coefficients would depend both on the analyzed frequency range and on the selected electrodes.

      Figure 4I: The regressions explained at this point seem to contain a very large number of potential predictors, as I'm assuming it includes all sensors for both the ECG component and ECG rejected conditions? (if that is not the case, it could be explained in greater detail). I'm also not sure about the logic of taking a complete signal, decomposing it with ICA to separate out the ECG and non-ECG signals, then including them back into the same regression model. It seems that there could be some circularity or redundancy in doing so. However, I'm not confident that this is an issue, so would appreciate the authors explaining why this is a valid approach (if that is the case).

      After observing significant effects both in the MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> conditions in similar frequency bands we wanted to understand whether or not these age-related changes are statistically independent. To test this we added both variables as predictors in a regression model (thereby accounting for the influence of the other in relation to age). The regression models we performed were therefore actually not very complex. They were built using only two predictors, namely the data (in a specific frequency range) averaged over channels on which we noticed significant effects in the ECG rejected and ECG components data respectively (Wilkinson notation: age ~ 1 + ECG rejected + ECG components). This was also described in the results section stating that: “To see if MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub> explain unique variance in aging at frequency ranges where we noticed shared effects, we averaged the spectral slope across significant channels and calculated a multiple regression model with MEG<sub>ECG component</sub> and MEG<sub>ECG rejected</sub> as predictors for age (to statistically control for the effect of MEG<sub>ECG component</sub>s and MEG<sub>ECG rejected</sub> on age). This analysis was performed to understand whether the observed shared age-related effects (MEG<sub>ECG rejected</sub> and MEG<sub>ECG component</sub>) are in(dependent).”  

      We hope this explanation solves the previous misunderstanding.
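      To make the structure of the two-predictor model (age ~ 1 + ECG rejected + ECG components) concrete, here is a small self-contained Python sketch that solves the corresponding ordinary least-squares problem via the normal equations. The function name `ols_two_predictors` and all numbers are made up for illustration and are not results from the study.

```python
def ols_two_predictors(y, x1, x2):
    """Ordinary least squares for y ~ 1 + x1 + x2; returns [b0, b1, b2]."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Normal equations (X'X) beta = X'y, stored as an augmented 3 x 4 matrix
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           + [sum(r[i] * t for r, t in zip(rows, y))]
           for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


# Hypothetical per-subject spectral slopes, averaged over significant channels
slope_ecg_rejected = [-1.2, -0.8, -1.5, -1.0, -0.6, -1.3]
slope_ecg_component = [-2.0, -2.5, -1.8, -2.2, -2.9, -2.1]
# Hypothetical ages, constructed so that both predictors carry unique information
age = [60.0 + 10.0 * a - 5.0 * b
       for a, b in zip(slope_ecg_rejected, slope_ecg_component)]

intercept, beta_rejected, beta_component = ols_two_predictors(
    age, slope_ecg_rejected, slope_ecg_component)
```

      Each coefficient reflects the association of one predictor with age while the other is held constant, which is what allows the model to address whether the two effects are statistically independent.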

      The explanation of results for relationships between spectral slopes and aging reported in Figure 4 refers to clusters of effects, but the statistical inference methods section doesn't explain how these clusters were determined.

      The word “cluster” was used to describe a “category” of effects, e.g. null effects. We changed the wording from “cluster” to “category” to make this clearer, stating now that: “This analysis, which is depicted in Figure 4, shows that over a broad amount of individual fitting ranges and sensors, aging resulted in a steepening of spectral slopes across conditions (see Figure 4E) with “steepening effects” observed in 25% of the processing options in MEG<sub>ECG not rejected</sub>, 0.5% in MEG<sub>ECG rejected</sub>, and 60% for MEG<sub>ECG components</sub>. The second largest category of effects were “null effects” in 13% of the options for MEG<sub>ECG not rejected</sub>, 30% in MEG<sub>ECG rejected</sub>, and 7% for MEG<sub>ECG components</sub>.”

      Page 12: can the authors clarify whether these age related steepenings of the spectral slope in the MEG are when the data include the ECG contribution, or when the data exclude the ECG? (clarifying this seems critical to the message the authors are presenting).

      We apologize for not making this clearer. We now write: “This analysis also indicates that a vast majority of observed effects irrespective of condition (ECG components, ECG not rejected, ECG rejected) show a steepening of the spectral slope with age across sensors and frequency ranges.”

      Page 13: I think it would be useful to describe how much variance was explained by the MEG-ECG rejected vs MEG-ECG component conditions for a range of these analyses, so the reader also has an understanding of how much aperiodic neural activity might be influenced by age (vs if the effects are really driven mostly by changes in the ECG).

      With regard to the explained variance, I think that the very important question of how strongly age influences changes in aperiodic activity is a topic better suited for a meta-analysis, as the effect sizes seem to vary largely depending on the sample; e.g. for EEG, results in the literature were reported at r=-0.08 (Cesnaite et al. 2023), r=-0.26 (Cellier et al. 2021), r=-0.24/r=-0.28/r=-0.35 (Hill et al. 2022) and r=0.5/r=0.7 (Voytek et al. 2015). I would refer the reader/reviewer to the standardized beta coefficients depicted in Figure 4A as a measure of effect size in the current study.

      Cellier, D., Riddle, J., Petersen, I., & Hwang, K. (2021). The development of theta and alpha neural oscillations from ages 3 to 24 years. Developmental cognitive neuroscience, 50, 100969.

      Cesnaite, E., Steinfath, P., Idaji, M. J., Stephani, T., Kumral, D., Haufe, S., ... & Nikulin, V. V. (2023). Alterations in rhythmic and non‐rhythmic resting‐state EEG activity and their link to cognition in older age. NeuroImage, 268, 119810.

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076.

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38), 13257-13265.

      Also, if there are specific M/EEG sensors where the 1/f activity does relate strongly to age, it would be worth noting these, so future research could explore those sensors in more detail.

      I think it is difficult to make a clear claim about this for MEG data, as the exact location or type of the sensor may differ across manufacturers. Such a statement could be made more easily for source-projected data, or if EEG electrodes were available, where the locations would be standardized, e.g. according to the 10-20 system.

      DISCUSSION:

      Page 15: Please change the wording of the following sentence, as the way it is currently worded seems to suggest that the authors of the current manuscript have demonstrated this point (which I think is not the case): "The authors demonstrate that EEG typically integrates activity over larger volumes than MEG, resulting in differently shaped spectra across both recording methods."

      Apologies for the oversight! The reviewer is correct: this was in fact shown not by us but by the authors of the cited manuscript. We corrected the sentence as suggested, stating now that:

      “Bénar et al. demonstrate that EEG typically integrates activity over larger volumes than MEG, resulting in differently shaped spectra across both recording methods.”

      Page 16: The authors mention the results can be sensitive to the application of SSS to clean the MEG data, but not ICA. I think it would be sensitive to the application of either SSS or ICA?

      This is correct and is also supported by Figure S7, as differences in ICA thresholds also affect the detection of age-related effects. We therefore adjusted the related sentences, stating now that:

      “In case of the MEG signal this may include the application of Signal-Space-Separation algorithms (SSS(24,55)), different thresholds for ICA component detection (see Figure S7), high and low pass filtering, choices during spectral density estimation (window length/type etc.), different parametrization algorithms (e.g. IRASA vs FOOOF) and selection of frequency ranges for the aperiodic slope estimation.”

      It would be worth clarifying that the linked mastoid re-reference alone has been proposed to cancel out the ECG signal, rather than that a linked-mastoid re-reference improves the performance of the ICA separation (which could be inferred by the explanation as it's currently written).

      This is correct and we adjusted the sentence accordingly! Stating now that:

      “Previous work(12,56) has shown that a linked mastoid reference alone was particularly effective in reducing the impact of ECG related activity on aperiodic activity measured using EEG.”

      The issue of the number of EEG channels could probably just be noted as a potential limitation, as could the issue of neural activity being mixed into the ECG component (although this does pose a potential confound to the M/EEG without ECG condition, I suspect it wouldn't be critical).

      This is indeed a very fair point, as a higher number of electrodes would probably make it easier to isolate ECG components in the EEG, which may be the reason why the separation did not work so well in our case. However, this is ultimately an empirical question, so we highlighted it in the discussion section stating that: “Difficulties in removing ECG related components from EEG signals via ICA might be attributable to various reasons such as the number of available sensors or assumptions related to the non-gaussianity of the underlying sources. Further understanding of this matter is highly important given that ICA is the most widely used procedure to separate neural from peripheral physiological sources.”

      OUTLOOK:

      Page 19: Although there has been a recent trend to control for 1/f activity when examining oscillatory power, recent research suggests that this should only be implemented in specific circumstances, otherwise the correction causes more of a confound than the issue does. It might be worth considering this point with regards to the final recommendation in the Outlook section: Brake, N., Duc, F., Rokos, A., Arseneau, F., Shahiri, S., Khadra, A., & Plourde, G. (2024). A neurophysiological basis for aperiodic EEG and the background spectral trend. Nature Communications, 15(1), 1514.

      We want to thank the reviewer for recommending this very interesting paper! The authors of said paper present compelling evidence showing that, while peak detection above an aperiodic trend using methods like FOOOF or IRASA is a prerequisite to determine the presence of oscillatory activity, it’s not necessarily straightforward to determine which detrending approach should be applied to determine the actual power of an oscillation. Furthermore, the authors suggest that wrongfully detrending may cause larger errors than not detrending at all. We therefore added a sentence stating that: “However, whether or not periodic activity (after detection) should be detrended using approaches like FOOOF or IRASA still remains disputed, as incorrectly detrending the data may cause larger errors than not detrending at all(75).”

      RECOMMENDATIONS:

      Page 20: "measure and account for" seems like it's missing a word, can this be re-written so the meaning is more clear?

      Done as suggested. The sentence now states: “To better disentangle physiological and neural sources of aperiodic activity, we propose the following steps to (1) measure and (2) account for physiological influences.”

      I would re-phrase "doing an ICA" to "reducing cardiac artifacts using ICA" (this wording could be changed in other places also).

      I do not like to describe cardiac or ocular activity as artifactual per se. This is also why I used quotation marks whenever I mention the word “artifact” in association with the ECG or EOG. However, I do understand that the wording of “doing an ICA” is a bit sloppy. We therefore reworded it accordingly throughout the manuscript to e.g. “separating cardiac from neural sources using an ICA” and “separating physiological from neural sources using an ICA”.

      I would additionally note that even if components are identified as unambiguously cardiac, it is still likely that neural activity is mixed in, and so either subtracting or leaving the component will both be an issue (https://doi.org/10.1101/2024.06.06.597688). As such, even perfect identification of whether components are cardiac or not would still mean the issue remains (and this issue is also consistent across a considerable range of component based methods). Furthermore, current methods including wavelet transforms on the ICA component still do not provide good separation of the artifact and neural activity.

      This is definitely a fair point and we also highlight this in our recommendations under 3 stating that:

      “However, separating physiological from neural sources using an ICA is no guarantee that peripheral physiological activity is fully removed from the cortical signal. Even more sophisticated ICA based methods that e.g. apply wavelet transforms on the ICA components may still not provide a good separation of peripheral physiological and neural activity(76,77). This turns the process of deciding whether or not an ICA component is e.g. either reflective of cardiac or neural activity into a challenging problem. For instance, when we only extract cardiac components using relatively high detection thresholds (e.g. r > 0.8), we might end up misclassifying residual cardiac activity as neural. In turn, we can’t always be sure that using lower thresholds won’t result in misinterpreting parts of the neural effects as cardiac. Both ways of analyzing the data can potentially result in misconceptions.”

      Castellanos, N. P., & Makarov, V. A. (2006). Recovering EEG brain signals: Artifact suppression with wavelet enhanced independent component analysis. Journal of neuroscience methods, 158(2), 300-312.

      Bailey, N. W., Hill, A. T., Godfrey, K., Perera, M. P. N., Rogasch, N. C., Fitzgibbon, B. M., & Fitzgerald, P. B. (2024). EEG is better when cleaning effectively targets artifacts. bioRxiv, 2024-06.

      METHODS:

      Pre-processing, page 24: I assume the symmetric setting of fastica was used (rather than the deflation setting), but this should be specified.

      Indeed, the reviewer is correct: we used the standard setting of fastICA implemented in MNE-Python, which calls the FastICA implementation in sklearn, which by default uses the “parallel” or symmetric algorithm to compute an ICA. We added this information to the text accordingly, stating that:

      “For extracting physiological “artifacts” from the data, 50 independent components were calculated using the fastica algorithm(22) (implemented in MNE-Python version 1.2; with the parallel/symmetric setting; note: 50 components were selected for MEG for computational reasons; for the analysis of EEG data no threshold was applied).”

      Temporal response functions, page 26: can the authors please clarify whether the TRF is computed against the ECG signal for each electrode or sensor independently, or if all electrodes/sensors are included in the analysis concurrently? I'm assuming it was computed for each electrode and sensor separately, since the TRF was computed in both the forward and backwards direction (perhaps the meaning of forwards and backwards could be explained in more detail also - i.e. using the ECG to predict the EEG signal, or using the EEG signal to predict the ECG signal?).

      A TRF can also be conceptualized as a multiple regression model over time lags. This means that we used all channels to compute the forward and backward models. In the case of the forward model we predicted the signal of the M/EEG channels in a multivariate regression model using the ECG electrode as predictor. In case of the backward model we predicted the ECG electrode based on the signal of all M/EEG channels. The forward model was used to depict the time window at which the ECG signal was encoded in the M/EEG recording, which appears at 0 time lags indicating volume conduction. The backward model was used to see how much information of the ECG was decodable by taking the information of all channels.

      We tried to further clarify this approach in the methods section stating that:

      “We calculated the same model in the forward direction (encoding model; i.e. predicting M/EEG data in a multivariate model from the ECG signal) and backward direction (decoding model; i.e. predicting the ECG signal using all M/EEG channels as predictors).”
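      The idea of a TRF as a multiple regression over time lags can be illustrated with the following Python sketch of a forward (encoding) model for a single channel. It is a simplified sketch under stated assumptions rather than the authors' implementation: the helper names (`lagged_design`, `solve`, `fit_trf`), the optional ridge term and the simulated data are hypothetical. A backward (decoding) model would instead use lagged copies of all M/EEG channels as predictors of the ECG.

```python
import random


def lagged_design(stimulus, lags):
    """Row t holds stimulus[t - lag] for each lag (zero where out of range)."""
    n = len(stimulus)
    return [[stimulus[t - lag] if 0 <= t - lag < n else 0.0 for lag in lags]
            for t in range(n)]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("singular system; consider a ridge term")
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_trf(stimulus, response, lags, ridge=0.0):
    """Forward model: regress one response channel on lagged stimulus copies."""
    X = lagged_design(stimulus, lags)
    k = len(lags)
    xtx = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, response)) for i in range(k)]
    return solve(xtx, xty)


# Simulated ECG and one channel that picks it up mostly instantaneously
rng = random.Random(0)
ecg = [rng.gauss(0.0, 1.0) for _ in range(200)]
channel = [0.8 * ecg[t] + (0.1 * ecg[t - 1] if t > 0 else 0.0)
           for t in range(len(ecg))]

lags = [-1, 0, 1, 2]  # in samples
weights = fit_trf(ecg, channel, lags)
```

      In this toy case the largest weight sits at lag 0, mirroring the instantaneous encoding that the response attributes to volume conduction.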

      Page 27: the ECG data was fit using a knee, but it seems the EEG and MEG data was not.

      Does this difference pose any potential confound to the conclusions drawn? (having said this, Figure S4 suggests perhaps a knee was tested in the M/EEG data, which should perhaps be explained in the text also).

      This was indeed tested in a previous review round to ensure that our results are not dependent on the presence/absence of a knee in the data. We therefore added figure S4, but forgot to actually add a description in the text. We are sorry for this oversight and added a paragraph to S1 accordingly:

      “Using FOOOF(5), we also investigated the impact of different slope fitting options (fixed vs. knee model fits) on the aperiodic age relationship (see Supplementary Figure S4). The results that we obtained from these analyses using FOOOF offer converging evidence with our main analysis using IRASA.”

      Page 32: my understanding of the result reported here is that cleaning with ICA provided better sensitivity to the effects of age on 1/f activity than cleaning with SSS. Is this accurate? I think this could also be reported in the main manuscript, as it will be useful to researchers considering how to clean their M/EEG data prior to analyzing 1/f activity.

      The reviewer is correct in stating that we overall detected slightly more “significant” effects when not additionally cleaning the data using SSS. However, I am a bit wary of recommending omitting the use of SSS maxfilter solely based on this information. It may very well be that the higher quantity of effects (when not employing SSS maxfilter) stems from other physiological sources (e.g. muscle activity) that are correlated with age and removed when applying SSS maxfiltering. I think that basing the decision of whether or not to apply maxfilter solely on the number or size of effects may not be the best idea. Instead, I think that the applicability of maxfilter for research questions related to aperiodic activity should be the topic of additional methodological research. We therefore now write in Text S1:

      “Considering that we detected fewer and weaker aperiodic effects when using SSS maxfilter, is it advisable to omit maxfilter when analyzing aperiodic signals? We don’t think that we can make such a judgment based on our current results. This is because it's unclear whether the reduction of effects stems from an additional removal of peripheral information (e.g. muscle activity, which may be correlated with aging) or is induced by the SSS maxfiltering procedure itself. As the use of maxfilter in detecting changes of aperiodic activity has not, to our knowledge, been the subject of analysis, we suggest that this should be the topic of additional methodological research.”

      Page 39, Figure S6 and Figure S8: Perhaps the caption could also briefly explain the difference between maxfilter set to false vs true? I might have missed it, but I didn't gain an understanding of what varying maxfilter would mean.

      Figure S6 shows the effect of ageing on the spectral slope averaged across all channels. The maxfilter set to false in AB) means that no maxfiltering using SSS was performed vs. in CD) where the data was additionally processed using the SSS maxfilter algorithm. We now describe this more clearly by writing in the caption:

      “Supplementary Figure S6: Age-related changes in aperiodic brain activity are most prominently explained by cardiac components irrespective of maxfiltering the data using signal space separation (SSS) or not AC) Age was used to predict the spectral slope (fitted at 0.1-145Hz) averaged across sensors at rest in three different conditions (ECG components not rejected [blue], ECG components rejected [orange], ECG components only [green]).”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Weber et al. investigate the role of 4 dopaminergic neurons of the Drosophila larva in mediating the association between an aversive high-salt stimulus and a neutral odor. The 4 DANs belong to the DL1 cluster and innervate non-overlapping compartments of the mushroom body, distinct from those involved in appetitive associative learning. Using specific driver lines, they show that activation of the DAN-g1 is sufficient to mimic an aversive memory and it is also necessary to form a high-salt memory of full strength, although optogenetic silencing of this neuron only partially affects the performance index. The authors use calcium imaging to show that the DAN-g1 is not the only one that responds to salt. DAN-c1 and d1 also respond to salt, but they seem to play no role in the assays tested. DAN-f1, which does not respond to salt, is able to lead to the formation of memory (if optogenetically activated), but it is not necessary for the salt-odor memory formation in normal conditions. However, silencing of DAN-f1 together with DAN-g1, enhances the memory deficit of DAN-g1.

      Strengths:

      The paper therefore reveals that also in the Drosophila larva as in the adult, rewards and punishments are processed by exclusive sets of DANs and that a complex interaction between a subset of DANs mediates salt-odor association.

      Overall, the manuscript contributes valuable results that are useful for understanding the organization and function of the dopaminergic system. The behavioral role of the specific DANs is assessed using specific driver lines which allow for testing of their function individually and in pairs. Moreover, the authors perform calcium imaging to test whether DANs are activated by salt, a prerequisite for inducing a negative association with it. Proper genetic controls are carried out throughout the manuscript.

      Weaknesses:

      The authors use two different approaches to silence dopaminergic neurons: optogenetics and induction of apoptosis. The results are not always consistent, and the authors could improve the presentation and interpretation of the data. Specifically, optogenetics seems a better approach than apoptosis, which can affect the overall development of the system, but apoptosis experiments are used to set the grounds of the paper.

      The physiological data would suggest the role of a certain subset of DANs in salt-odor association, but a different partially overlapping set seems to be necessary. This should be better discussed and integrated into the authors' conclusions. The EM data analysis reveals a non-trivial organization of sensory inputs into DANs and it is hard to extrapolate a link to the functional data presented in the paper.

      We would like to thank reviewer 1 for the positive evaluation of our work and for the critical suggestions for improvement. In the new version of the manuscript, we have centralized the optogenetic results and moved some of the ablation experiments to the Supplement. We also discuss in detail the experimental differences in the results. In addition, we have softened our interpretation of the specificity of memory for salt. As a result, we now emphasize more the general role of DANs for aversive learning in the larva. These changes are now also summarized and explained more simply and clearly in the Discussion, along with a revised discussion of the EM data.

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors show that dopaminergic neurons (DANs) from the DL1 cluster in Drosophila larvae are required for the formation of aversive memories. DL1 DANs complement pPAM cluster neurons which are required for the formation of attractive memories. This shows the compartmentalized network organization of how an insect learning center (the mushroom body) encodes memory by integrating olfactory stimuli with aversive or attractive teaching signals. Interestingly, the authors found that the 4 main dopaminergic DL1 neurons act redundantly, and that single-cell ablation did not result in aversive memory defects. However, ablation or silencing of a specific DL1 subset (DAN-f1,g1) resulted in reduced salt aversion learning, which was specific to salt but no other aversive teaching stimuli were tested. Importantly, activation of these DANs using an optogenetic approach was also sufficient to induce aversive learning in the presence of high salt. Together with the functional imaging of salt and fructose responses of the individual DANs and the implemented connectome analysis of sensory (and other) inputs to DL1/pPAM DANs, this represents a very comprehensive study linking the structural, functional, and behavioral role of DL1 DANs. This provides fundamental insight into the function of a simple yet efficiently organized learning center which displays highly conserved features of integrating teaching signals with other sensory cues via dopaminergic signaling.

      Strengths:

      This is a very careful, precise, and meticulous study identifying the main larval DANs involved in aversive learning using high salt as a teaching signal. This is highly interesting because it allows us to define the cellular substrates and pathways of aversive learning down to the single-cell level in a system without much redundancy. It therefore sets the basis to conduct even more sophisticated experiments and together with the neat connectome analysis opens the possibility of unraveling different sensory processing pathways within the DL1 cluster and integration with the higher-order circuit elements (Kenyon cells and MBONs). The authors' claims are well substantiated by the data and clearly discussed in the appropriate context. The authors also implement neat pathway analyses using the larval connectome data to its full advantage, thus providing network pathways that contribute towards explaining the obtained results.

      Weaknesses:

      While there is certainly room for further analysis in the future, the study is very complete as it stands. Suggestions for clarification are minor in nature.

      We would like to thank reviewer 2 for the positive evaluation of our work. In fact, follow-up work is already underway to further analyze the role of the individual DL1 DANs. We have addressed the constructive and detailed suggestions for improvement in our point-by-point responses in the “Recommendations for the authors” section.

      Reviewer #3 (Public Review):

      The study of Weber et al. provides a thorough investigation of the roles of four individual dopamine neurons for aversive associative learning in the Drosophila larva. They focus on the neurons of the DL-1 cluster which already have been shown to signal aversive teaching signals. However, the authors go far beyond the previous publications and test whether each of these dopamine neurons responds to salt or sugar, is necessary for learning about salt, bitter, or sugar, and is sufficient to induce a memory when optogenetically activated. In addition, previously published connectomic data is used to analyze the synaptic input to each of these dopamine neurons. The authors conclude that the aversive teaching signal induced by salt is distributed across the four DL-1 dopamine neurons, with two of them, DAN-f1 and DAN-g1, being particularly important. Overall, the experiments are well designed and performed, support the authors' conclusions, and deepen our understanding of the dopaminergic punishment system.

      Strengths:

      (1) This study provides, at least to my knowledge, the first in vivo imaging of larval dopamine neurons in response to tastants. Although the selection of tastants is limited, the results close an important gap in our understanding of the function of these neurons.

      (2) The authors performed a large number of experiments to probe for the necessity of each individual dopamine neuron, as well as combinations of neurons, for associative learning. This includes two different training regimens (1 or 3 trials), three different tastants (salt, quinine, and fructose) and two different effectors, one ablating the neuron, the other one acutely silencing it. This thorough work is highly commendable, and the results prove that it was worth it. The authors find that only one neuron, DAN-g1, is partially necessary for salt learning when acutely silenced, whereas a combination of two neurons, DAN-f1 and DAN-g1, are necessary for salt learning when either being ablated or silenced.

      (3) In addition, the authors probe whether any of the DL-1 neurons is sufficient for inducing an aversive memory. They found this to be the case for three of the neurons, largely confirming previous results obtained by a different learning paradigm, parameters, and effector.

      (4) This study also takes into account connectomic data to analyze the sensory input that each of the dopamine neurons receives. This analysis provides a welcome addition to previous studies and helps to gain a more complete understanding. The authors find large differences in inputs that each neuron receives, and little overlap in input that the dopamine neurons of the "aversive" DL-1 cluster and the "appetitive" pPAM cluster seem to receive.

      (5) Finally, the authors try to link all the gathered information in order to describe an updated working model of how aversive teaching signals are carried by dopamine neurons to the larva's memory center. This includes important comparisons both between two different aversive stimuli (salt and nociception) and between the larval and adult stages.

      Weaknesses:

      (1) The authors repeatedly claim that they found/proved salt-specific memories. I think this is problematic to some extent.

      (1a) With respect to the necessity of the DL-1 neurons for aversive memories, the authors' notion of salt-specificity relies on a significant reduction in salt memory after ablating DAN-f1 and g1, and the lack of such a reduction in quinine memory. However, Fig. 5K shows a quite suspicious trend of an impaired quinine memory which might have been significant with a higher sample size. I therefore think it is not fully clear yet whether DAN-f1 and DAN-g1 are really specifically necessary for salt learning, and the conclusions should be phrased carefully.

      (1b) With respect to the results of the optogenetic activation of DL-1 neurons, the authors conclude that specific salt memories were established because the aversive memories were observed in the presence of salt. However, this does not prove that the established memory is specific to salt - it could be an unspecific aversive memory that potentially could be observed in the presence of any other aversive stimuli. In the case of DAN-f1, the authors show that the neuron does not even get activated by salt, but is inhibited by sugar. Why should activation of such a neuron establish a specific salt memory? At the current state, the authors clearly showed that optogenetic activation of the neurons does induce aversive memories - the "content" of those memories, however, remains unknown.

      (2) In many figures (e.g. figures 4, 5, 6, supplementary figures S2, S3, S5), the same behavioural data of the effector control is plotted in several sub-figures. Were these experiments done in parallel? If not, the data should not be presented together with results not gathered in parallel. If yes, this should be clearly stated in the figure legends.

      We would also like to thank reviewer 3 for his positive assessment of our work. As already mentioned by reviewer 1, we understand the criticism that the salt specificity attributed to the individual DANs is not always fully supported by the results of the work. We have therefore rewritten the relevant passages, which are also cited by the reviewer. We have also addressed the second point of criticism in our manuscript. As the control groups were always measured in parallel with the experimental animals, we can present the data together in a sub-figure. We now state this clearly in the revised figure legends.

      Summary of recommendations to authors:

      Overall, the study is commendable for its systematic approach and solid methodology. Several weaknesses were identified, prompting the need for careful revisions of the manuscript:

      We thank the reviewers for the careful revision of our manuscript. In the subsequent sections, we aim to address their concerns as thoroughly as possible. A comprehensive one-to-one listing can be found below.

      (1) The authors should reconsider their assertion of uncovering a salt-specific memory, as the evidence does not conclusively demonstrate the exclusive necessity of DAN-f1 and DAN-g1 for salt learning. In particular, the optogenetic activation of DAN-f1 leads to plasticity but this might not be salt-specific. The precise nature of the memory content remains elusive, warranting a nuanced rephrasing of the conclusions.

      We only partially agree. It is true that optogenetic activation of DANs does not really allow us to comment on salt specificity. However, we used high salt concentrations during the test. Over the years, the Gerber lab has demonstrated in several papers that larvae recall an aversive odor-salt memory only if salt is present during the test (Gerber and Hendel, 2006; Niewalda et al 2008; Schleyer et al. 2011; Schleyer et al. 2015). The US used in training has to be present during the test. Even at the same concentration, other aversive stimuli (e.g. bitter quinine) do not allow the larvae to recall this particular type of memory. So, if the optogenetic activation of DAN-f1 establishes a memory that can be recalled on salt, we argue that it has to encode aspects of the salt information. On the other hand, we see the necessity for salt learning only for DAN-g1. Moreover, although it is very unlikely based on the current literature, we cannot fully exclude that the activation of DAN-f1 establishes a yet unknown type of memory that can also be recalled on a salt plate. Therefore, we partially agree and have accordingly rephrased the entire manuscript to avoid an over-interpretation of our data. Throughout the manuscript we now avoid the term salt-specific memory and instead describe the type of memory as aversive memory.

      (2) A thorough examination or discussion about the potential influence of blue light aversion on behavioral observations is necessary to ensure a balanced interpretation of the findings.

      To address this point, every behavioral experiment that uses optogenetic blue light activation is run with the appropriate, mandatory controls. For blue light activation experiments, two genetic controls are used that either get the same blue light treatment (effector control, w1118>UAS-ChR2XXL) or no blue light treatment (dark control, XY-split-Gal4>UAS-ChR2XXL). For blue light inactivation experiments, one group is added that has exactly the same genotype but did not receive food containing retinal. These experiments show that blue light exposure itself induces neither an aversive nor a positive memory and that it does not impair the establishment of odor-high salt memory. In addition, we used the latest established transgenes available. ChR2XXL is very sensitive to blue light. Only 220 lux (60 µW/cm²) were necessary to obtain stable results. In our hands, short-term exposure for up to 5 minutes at such low intensities does not induce blue light aversion. Following the advice of the reviewer, we also address this concern by adding several sentences to the related results and methods sections.

      (3) The authors should address the limitations associated with the use of rpr/hid for neuronal ablations, such as the effects of potential developmental compensation.

      We agree with this concern. It is quite possible that the ablation experiments induce compensatory effects during larval development. Such an effect may be the reason for differences in phenotypes when comparing hid,rpr ablation with optogenetic inhibition. This is now part of the discussion. In addition, we evaluated whether the ablation worked in our experiments. So far, controls were missing that show that the expression of hid,rpr really leads to the ablation of DANs. We have now added these experiments and clearly show anatomically that the DANs are ablated (related to figure 4-figure supplement 6).

      (4) While the connectome analysis offers valuable insights into the observed functions of specific DANs in relation to their extrinsic (sensory) and intrinsic (state) inputs, integrating this data more cohesively within the manuscript through careful rewriting would enhance the coherence of the study.

      We understand this concern. The new version of our manuscript therefore integrates the EM data more closely into our interpretation of the results. We have rewritten the related parts throughout the entire manuscript and have completely revised the corresponding section in the results chapter.

      (5) More generally, the authors are encouraged to discuss internal discrepancies in the results of their functional manipulation experiments.

      Thank you for this suggestion. We understand that we did not give the differing results enough space in the discussion. We have now changed this and address the concern comprehensively.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Here are some suggestions for clarification and improvement of the manuscript:

      (1) The authors should discuss why the silencing experiment with TH-GAL4 (Fig. 1) does not abolish memory formation (I assume that the PI should go to zero). Does it mean that other non-TH neurons are involved in salt-odor memory formation? Are there other lines that completely abolish this type of learning?

      Thank you very much for highlighting this crucial point. Indeed, the functional intervention does not completely eliminate the memory. There could be several reasons, or a combination thereof, for this outcome. For instance, it's plausible that the UAS-GtACR2 effector doesn't entirely suppress the activity of dopaminergic neurons. Additionally, the memory may comprise different types, not all of which are linked to dopamine function. It's also noteworthy that TH-Gal4 doesn't encompass all dopaminergic neurons – even a neuron from the DL1 cluster is absent (as previously reported in Selcho et al., 2009). Considering we're utilizing high salt concentrations in this experiment, it's conceivable that non gustatory-driven memories are formed based solely on the systemic effects of salt (e.g., increased osmotic pressure). These possibilities are now acknowledged in the text.

      (2) The Rpr experiments in Fig. 4 do not lead to any phenotype and there is a general assumption that the system compensates during development. However, there is no demonstration that Rpr worked or that development compensated for that. What do we learn from these data? Would it make sense to move it to supplement to make the story more compact? In addition: the conclusion at L 236 "DL1.... Are not individually necessary" is later disproved by optogenetic silencing. Similarly, optogenetic silencing of f1+g1 is affecting 1X and 3X learning, but not when using Rpr. Moreover, Rpr did not give any phenotype in other data in the supplementary material. I'm not sure how valid these results are.

      We acknowledge this concern and have actively deliberated various options for restructuring the presented ablation data. Ultimately, we reached a consensus that relocating Figure 4 to the supplement is warranted. Furthermore, corresponding adjustments have been made in the text. This decision amplifies the significance of the optogenetic results. In addition, we also addressed the other part of the concern. We examined the efficacy of hid and rpr in our experiments. Indeed, we successfully ablated specific DANs, as illustrated in the new anatomical data presented in Figure 4- figure supplement 6, which strengthens the interpretation of the hid,rpr experiments.

      (3) In most figures that show data for 1X and 3X training, there is no difference between these two conditions (I would suggest moving one set as a supplement). When a difference appears (Fig.5A-D) the implications are not discussed properly. Is it known that some circuits are necessary for the 1X but not for the 3X protocol? Is that a reasonable finding? I would expect the opposite, but I might lack knowledge here. However, the optogenetic silencing of the same neurons in Figure 7 shows the same phenotype for 1X and 3X. Again, the validity of the Rpr experiments seems debatable.

      Different training protocols lead to different memory phases (STM and STM+ARM). We have shown that in the past in Widmann et al. 2016. Therefore, we are convinced that it makes sense to keep both data sets in the main manuscript. However, we agree that this was not properly introduced and discussed and therefore made the respective changes in the manuscript.

      (4) In Figure 3, it is unclear what the responses were tested against. Since they are so small and noisy there would be a need for a control. Moreover, in some cases, it looks like the DF/F is normalized to the wrong value: e.g. in DAN-c1 100mM, the activity in 0-10s is always above zero, and in pPAM with fructose is always below zero. This might not have any consequence on the results but should be adjusted.

      Thank you very much for your criticism, which we greatly appreciate. We have carefully re-examined the data and found that there was a mistake in the normalization of the values. We made the necessary adjustments to the evaluation, as per your suggestions. The updated figures, figure legends, and results have been incorporated into the new version of the manuscript. As noted by the reviewer, these corrections have not altered the interpretation of the data or the primary responses of the various DANs.
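
      For readers unfamiliar with the normalization at issue, the following is a minimal sketch of a baseline-normalized ΔF/F calculation. It is not the authors' analysis code: the function name `delta_f_over_f`, the choice of the pre-stimulus mean as the baseline F0, and the example trace are all assumptions made for illustration. The point it shows is that, when F0 is taken from the pre-stimulus window, the normalized trace averages zero over that window, which is the property the reviewer found missing.

```python
from statistics import mean


def delta_f_over_f(trace, baseline_frames):
    """Return dF/F = (F - F0) / F0 for a fluorescence trace.

    F0 is taken as the mean of the first `baseline_frames` samples.
    This baseline convention is an assumption for illustration; the
    manuscript's actual baseline window is not stated in this response.
    """
    if baseline_frames <= 0 or baseline_frames > len(trace):
        raise ValueError("baseline_frames must be between 1 and len(trace)")
    f0 = mean(trace[:baseline_frames])
    if f0 == 0:
        raise ValueError("baseline fluorescence is zero; dF/F is undefined")
    return [(f - f0) / f0 for f in trace]


# Hypothetical trace: four pre-stimulus frames followed by a response.
example_trace = [100.0, 102.0, 98.0, 100.0, 120.0, 130.0, 110.0]
example_dff = delta_f_over_f(example_trace, baseline_frames=4)

# With a correctly chosen F0, the pre-stimulus window averages to zero.
baseline_mean = mean(example_dff[:4])
```

      A trace that sits consistently above or below zero during the pre-stimulus window, as the reviewer described, would indicate that F0 was computed from a different window or from a different trace.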

      (5) In the abstract: "Optogenetic activation of DAN-f1 and DAN-g1 alone suffices to substitute for salt punishment... Each DAN encodes a different aspect of salt punishment". These sentences might be misleading and an overstatement: only DAN-g1 shows a clear role, while the function of the other DANs in the context of salt-odor learning remains obscure.

      We have refined the respective part of the abstract accordingly. Consequently, we have reworded the related section, aiming to avoid any exaggeration.

      (6) The physiology is done in L1 larvae but behavior is tested in L3 larvae. There could be a change in this time that could explain the salt responses in c1 and d1 but no role in salt-odor learning?

      While we cannot dismiss the possibility of a developmental change from L1 to L3, a comparison of the anatomical data of the DL1 DANs from electron microscopy (EM) and light microscopy (LM) data indicates that their overall morphology remains consistent. However, it's important to note that this observation does not analyse the physiological aspects of these cells. Consequently, we have incorporated this concern into the discussion of the revised version of the manuscript.

      (7) The introduction needs some editing starting at L 129, as it ends with a discussion of a previously published EM data analysis. I would rather suggest stating which questions are addressed in this paper and which methods will be used and perhaps a hint on the results obtained.

      We understand the concern. We have added a concise paragraph to the conclusion of the introduction, highlighting the biological question, technical details, and a short hint on the acquired findings.

      (8) It is clear to me that the presentation of salt during the test is necessary for recall, however in L 166 I don't understand the explanation: how is the memory used in a beneficial way in the test? The salt is present everywhere and the odor cue is actually useless to escape it.

      Extensive research, exemplified by studies such as Schleyer et al. (2015) published in Elife, clearly demonstrates that the recall of odor-high salt memory occurs exclusively when tested on a high salt plate. Even when tested on a bitter quinine plate, the aversive memory is not recalled. This phenomenon is attributed to the triggering of motivation to recall the memory by the omnipresent abundance of the unconditioned stimulus (US) during the test, which in our case is high salt. Furthermore, the concentration of the stimulus plays a crucial role (Schleyer et al. 2011). The odor cue indicates where the situation could potentially be improved; however, if high salt is absent, this motivational drive diminishes as there is no memory present to enhance the already favorable situation. Additionally, the motivation to evade the omnipresent and unpleasant high salt stimulus persists throughout the entire 5-minute test period.

      (9) L288: the fact that f1 shows a phenotype in this experiment does not mean that it encodes a salt signal, indeed it does not respond to salt. It perhaps induces a plasticity that can be recalled by salt, but not necessarily linked to salt. The synergy between f1 and g1 in the salt assay was postulated based on exp with Rpr, but the validity of these experiments is dubious. I'm not sure there is sufficient evidence from Figures 6 and 7 to support a synergistic action between f1 and g1.

      It is true that DAN-f1 alone is not necessary for mediating a high salt teaching signal based on ablation, optogenetic inhibition and even physiology. However, optogenetic activation alone establishes a memory that is expressed when tested on a salt plate. Given the logic explained above, which is accepted in several publications, we would like to keep the statement, especially as the joint manipulation with DAN-g1 gives rise to significantly higher or lower values after joint optogenetic activation or inactivation (Figure 5E and F, Figure 6E and F in the new version). Nevertheless, we have modified the sentence. In the text we now describe these effects as “these results may suggest that DAN-f1 and DAN-g1 encode aspects of the natural aversive high salt teaching signal under the conditions that we tested”. We think that this is an appropriate and three-fold restricted statement. Therefore, we would like to keep it in this restricted version. However, we are happy to reconsider this if the reviewer thinks it is critical.

      (10) I find the EM analysis hard to read. First of all, because of the two different graphical representations used in Fig. 8, wouldn't one be sufficient to make the point? Secondly, I could not grasp a take-home-message: what do we learn from the EM data? Do they explain any of the results? It seems to me that they don't provide an explanation of why some DL1 neurons respond to salt and others don't.

      We understand that the EM analysis is hard to read and have now carefully rewritten this part of the manuscript. See also general concern 4 above. The main take-home message is not to explain why some DL1 neurons respond to salt and others do not. This cannot be resolved due to the missing information on the salt-perceiving receptor cells. Unfortunately, the peripheral nervous system, the first layer of salt information processing, is missing from the EM volume. However, our analysis shows clearly that the 4 DANs have their own identity based on their connectivity. None of them is the same, but to a certain extent similarities exist. This nicely reflects the physiological and behavioral results. We have now clarified this in the results to ease understanding for the readership. In addition, we also clearly state that we do not address why some DL1 neurons respond to salt and others do not.

      (11) Do the manipulations (activation and silencing) affect odor preference in the presence of salt? Did the authors test that the two odors do not drive different behaviors on the salty plate? Or did they only test the odor preference on plain agarose? Can we exclude a role for the DAN in driving multisensory-driven innate behavior?

      Innate odor preferences are not changed by the presence of salt or even other tastants (this work, but see also Schleyer et al 2015, Figure 3, eLife). Even the naïve choice between two odors is the same if tested in the presence of different tastants (Schleyer et al 2015, Figure 3, eLife). This shows, at least for the tested stimuli and conditions, which are similar to the ones that we use, that there is no multisensory-driven innate odor-taste behavior. To our knowledge, this is why experiments such as the ones suggested by the reviewer have never been done in larval odor-taste learning studies. We therefore suggest that DAN activation has no effect on innate larval behavior. However, we are happy to reconsider this if the reviewer thinks it is critical.

      (12) L 280: the authors generalize the conclusion to all DL1-DANs, but it does not apply to c1 and d1.

      Thanks for this comment. We deleted that sentence as suggested and thus no longer generalize the conclusion to all DL1 DANs.

      (13) L345: I do not see the described differences in Fig. 8F, presynaptic sites of both types seem to appear in rather broad regions: could the author try to clarify this?

      We understand that the anatomical description of the data is often hard to read, especially for readers who are not used to these kinds of figures. We have therefore modified the text to ease understanding and to clarify the difference in the labeled brain regions for a broad readership.

      (14) L373: the conclusion on c1 is unsupported by data: this neuron responds to both salt and fructose (Figure 3 ) while the conclusion is purely based on EM data analysis.

      The sentence is not a conclusion but a speculation and we also list the cell's response to positive and negative gustatory stimuli. Therefore, we do not understand exactly what the reviewer means here. However, we have tried to address the criticism and have revised the sentences.

      (15) L385: the data on d1 seem to be inconsistent with Eschbach 2020, but the authors do not discuss if this is due to the differential vs absolute training, or perhaps the presence of the US during the test (which does not seem to be there in Eschbach, 2020) - is the training protocol really responsible for this inconsistency? For f1 the data seem to be consistent across these studies. The authors should clarify how the exp in Fig 6 differs from Eschbach, 2020 and how one could interpret the differences.

      This concern is correct. We now discuss the difference in more detail. Eschbach et al. used CsChrimson as a genetic tool, a one-odor paradigm with 3 training cycles, and no gustatory cues in their approach. These differences are now discussed in the new version of the manuscript.

      (16) L460-475 A long part of this paragraph discusses the similarities between c1 and d1 and corresponding PPL1 neurons in the adult fly. However, c1 and d1 do not really show any phenotype in this paper, I'm not sure what we learn from this discussion and how much this paper can contribute to it. I would have wished for a discussion of how one could possibly reconcile the observed inconsistencies.

      Based on the comments of the different reviewers several paragraphs in the discussion were modified. We agree that the part on the larval-adult comparison is quite long. Thus we have shortened it as suggested by the reviewer.

      Minor corrections:

      L28 "resultant association" maybe resulting instead.

      L55 "animals derive benefit": remove derive.

      L78 "composing 12,000 neurons": composed of.

      L79 what is stable in a "stable behavioral assay"?

      L104: 2 times cluster.

      L122: "DL1 DANs are involved" in what?

      Fig. 1 please check subpanels labels, D repeats.

      L 362: "But how do individual neurons contribute to the teaching signal of the complete cluster?" I don't understand the question.

      L364 I did not hear before about the "labeled line hypothesis" in this context - could the author clarify?

      L368: edit "combinatorically".

      L390: "current suppression" maybe acute suppression.

      L 400 I'm not sure what is meant by "judicious functional configuration" and "redundancy". The functions of these cells are not redundant, and no straightforward prediction of their function can be done from their physiological response to salt.

      Thanks a lot for your detailed review of our manuscript. We welcome your well-taken concerns and have made the requested changes for all points that you have raised.

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figure 1 the reconstruction of pPAM and DL1 DANs shows the compartmentalized innervation of the larval MB. However, the images are a bit low in color contrast to appreciate the innervation well. In particular in panel B, it is hard to identify the innervated MB body structure. A schematic model of the larval MB and DAN innervation domains like in Fig. 2A would help to clarify the innervation pattern to the non-specialist.

      We understand this concern and have changed figure 1 as suggested by the reviewer. A schematic model of the MB and DANs is now presented already in figure 1 as well as the according supplemental figure.

      (2) Blue light itself can be aversive for larvae and thus interfere with the aversive learning paradigm. Does the given Illuminance (220 lux) used in these experiments affect the behavior and learning outcome?

      Yes, in the past, high intensities of blue light were necessary to trigger the first-generation optogenetic tools. The high-intensity blue light itself was able to establish an aversive memory (e.g. Rohwedder et al. 2016). The use of second-generation optogenetic tools allowed us to strongly reduce the applied light intensity. Now we use 220 lux (equal to 60 µW/cm²). Please note that all Gal4 and UAS controls in the manuscript are not significantly different from zero. The mild blue light stimulation therefore does not serve as a teaching signal and has neither an aversive nor an appetitive effect. Furthermore, we use this mild light intensity for several other behavioral paradigms (locomotion, feeding, naïve preferences) and have never seen an effect on the behavior.

      (3) Fig.2: Except for MB054B-Gal4 only the MB expression pattern is shown for other lines. Is there any additional expression in other cells of the brain? In the legend in line 761, the reporter does not show endogenous expression, rather it is a fluorescent reporter signal labeling the mushroom body.

      The lines were initially identified in a screen on larval MB neurons done together with Jim Truman, Marta Zlatic and Bertram Gerber. In that screen, full brain scans were always analyzed. These images can be seen in Eschbach et al. 2020, extended figure 1. Neither in their evaluation nor in our anatomical evaluation (using a different protocol) was additional expression in brain cells detectable. We also modified the figure legend as suggested.

      (4) Fig.3: Precise n numbers per experiment should be stated in the figure legend.

      True, we now present n numbers per experiment whenever necessary.

      (5) Fig.4: Have the authors confirmed complete ablation of the targeted neuron using rpr/hid? Ablations can be highly incomplete depending on the onset and strength of Gal4 expression, leaving some functionality intact. While the ablation experiments are largely in line with the acute silencing of single DANs during high salt learning performed later on (Fig.7), there is potentially an interesting aspect of developmental compensation hidden in this data. Not a major point, but potentially interesting to check.

      We agree with this criticism. We had not tested whether the expression of hid,rpr in DL1 DANs really ablates them. Therefore, we did an additional experiment to show that. The new data are now presented as a supplemental figure (Figure 4- figure supplement 6). The result shows that expression of hid,rpr also ablates DL1 DANs, similar to earlier experiments where we used the same effectors to ablate serotonergic neurons (Huser et al., 2012, figure 5).

      (6) The performance index in Fig. 4 and 5 sometimes seems lower and the variability is higher than in some of the other experiments shown. Is this due to the high intrinsic variability of these particular experiments, or the background effects of the rpr/hid or splitGal4 lines?

      The general variability of these experiments is within the expected and known range. In these kinds of experiments there is always some variation due to several external factors (e.g. experimental time over the year). Therefore, it is always important to measure controls and experimental animals at the same time. This is what we did, and we only directly compare results within individual datasets, not between different datasets. Comparison between datasets is further hampered because the experiments of Figure 4 (now Figure 4- figure supplement 1) and Figure 5 (now Figure 4) differ in several parameters from other learning experiments presented later in the text. Optogenetic activation uses blue light stimulation instead of “real world” high salt. Most often, direct activation of specific DANs in the brain is more stable than the external high salt stimulation. Optogenetic inactivation also uses blue light stimulation, as well as retinal-supplemented food. Both factors can affect the measurement. We thus argue that, for each experiment, it is most often the particular parameters that affect the variability of the results rather than background effects of the rpr/hid and split-Gal4 lines.

      (7) Fig.7: This is a neat experiment showing the effects of acute silencing of individual DL1 DANs. As silencing DAN-f1/g1 does not result in complete suppression of aversive learning, it would be highly interesting to test (or speculate about) additive or modulatory effects by the other DANs. Dan-c-1/d-1 also responds to high salt but does not show function on its own in these assays. I am aware that this is currently genetically not feasible. It would however be a nice future experiment.

      True. We screened intensively for DL1-cluster-specific driver lines that cover all four DL1 neurons, or combinations other than the ones we tested. Unfortunately, we did not succeed in identifying any. Nevertheless, we will screen new genetic resources (e.g. Meissner et al., 2024, bioRxiv) to expand our approach in future experiments. Please also see our comment on concern 1 of reviewer 1 for further technical limitations and biological questions that could also explain the absence of complete suppression of high salt learning and memory. Some of these limitations are now also mentioned and discussed in the new version of the manuscript.

      (8) The discussion is excellent. I would just amend that it is likely that larval DAN-c1, which has high interconnectivity within the larval CNS, is likely integrating state-dependent network changes, similar to the role of some DANs in innate and state-dependent preference behavior. This might contribute to modulating learned behavior depending on the present (acute) and previous environmental conditions.

      Thanks a lot for bringing this up. We rewrote this part and added a discussion on recent work on DAN-c1 function in larvae as well as results on DAN function in innate and state-dependent preference behavior.

      (9) Citation in line 1115 missing access information: "Schnitzer M, Huang C, Luo J, Je Woo S, Roitman L, et al. 2023. Dopamine signals integrate innate and learned valences to regulate memory dynamics. Research Square".

      Unfortunately this escaped our notice. The paper is now published in Nature: Huang, C., Luo, J., Woo, S.J. et al. Dopamine-mediated interactions between short- and long-term memory dynamics. Nature 634, 1141–1149 (2024). https://doi.org/10.1038/s41586-024-07819-w. We have now changed the citation. The new citation includes the missing access information.

      Reviewer #3 (Recommendations For The Authors):

      Regarding my issue about salt specificity in the public review, I want to make clear that I do not suggest additional experiments, but to be very careful in phrasing the conclusions, in particular whenever referring to the experiments with optogenetic activation. This includes presenting these experiments as "(salt) substitution" experiments - inferring that the optogenetic activation would substitute for a natural salt punishment. As important and interesting as the experiments are, they simply do not allow such an interpretation at this point.

      Results, line 140ff: When presenting the results regarding TH-Gal4 crossed to ChR2-XXL, please cite Schroll et al. 2006 who demonstrated the same results for the first time.

      Thanks for mentioning this. We now cite Schroll et al. 2006 here in the text of the manuscript.

      Figure 3: The subfigure labels (ABC) are missing.

      Unfortunately this escaped our notice. Thanks a lot – we have now corrected this mistake.

      Figure 5: For I and L, it reads "salt replaced with fru", but the sketch on the left shows salt in the test. I assume that fructose was not actually present in the test, and therefore the figure can be misleading. I suggest separate sketches. Also, I and L are not mentioned in the figure legend.

      True, this is rather confusing. Based on this well-taken concern, we have changed the figure by adding a new, correct scheme for sugar reward learning that does not show fructose during the test.

      Figure S1: The experimental sketches for E,F and G,H seem to be mixed up.

      We thank the reviewer for bringing this up. In the new version we corrected this mistake.

      Figure S5: There are three sub-figures labelled with B. Please correct.

      Again, thanks a lot. We made the suggested correction in Figure S5.

      Discussion, line 353ff: this and the following sentences can be read as if the authors have discovered the DL-1 neurons as aversive teaching mediators in this study. However, Eschbach et al. 2020 already demonstrated very similar results regarding the optogenetic activation of single DL-1 DANs. I suggest to rephrase and cite Eschbach et al. 2020 at this point.

      That is correct. Our focus was on the gustatory pathway. The original discovery was made by Eschbach et al. We have now corrected this in the discussion and clarified our contribution. It was never our intention to hide this work, as the laboratory was also involved. Nevertheless, this was a regrettable omission on our part.

      Line 385-387: this sentence is only correct with respect to Eschbach et al. 2020. Weiglein et al. 2021 used ChR2-XXL as an effector, but another training regimen.

      We understand this criticism and have changed the sentence as suggested by the reviewer. See also our response to concern 15 of reviewer 1.

      Line 389ff: I do not understand this sentence. What is meant by persistent and current suppression of activity? If this refers to the behavioural experiments, it is misleading as in the hid, reaper experiments neurons are ablated and not suppressed in activity.

      We made the requested changes in the text. It is true that the ablation of a neuron throughout larval life is different from constantly blocking the output of a persisting neuron.

      Methods, line 615 ff: the performance index is said to be calculated as the difference between the two preferences, but the equation shows the average of the preferences.

      Thanks a lot. We are sorry for the confusion. We have carefully rewritten this part of the methods section to avoid any misunderstanding.

      When discussing the organization of the DL1 cluster, on several occasions I have the impression the authors use the terms "redundant" and "combinatorial" synonymously. I suggest to be more careful here. Redundancy implies that each DAN in principle can "do the job", whereas combinatorial coding implies that only a combination of DANs together can "do the job". If "the job" is establishing an aversive salt memory, the authors' results point to redundancy: no experimental manipulation totally abolished salt learning, implying that the non-manipulated neurons in each experiment sufficed to establish a memory; and several DANs, when individually activated, can establish an aversive memory, implying that each of them indeed can "do the job".

      Based on this concern, we have rewritten the discussion as suggested to be more precise when talking about redundancy or combinatorial coding of the aversive teaching signal. Essentially, we have removed all the combinatorial terms and replaced them with the term “redundancy”.

      The authors mix parametric and non-parametric statistical tests across the experiments dependent on whether the distribution of the data is normal or not. It would help readers if the authors would clearly state for which data which tests were used.

      We understand the criticism and have now added a supplemental file that includes all the information on the statistical tests applied and the distribution of the data.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary

      In this study, the authors build upon previous research that utilized non-invasive EEG and MEG by analyzing intracranial human ECoG data with high spatial resolution. They employed a receptive field mapping task to infer the retinotopic organization of the human visual system. The results present compelling evidence that the spatial distribution of human alpha oscillations is highly specific and functionally relevant, as it provides information about the position of a stimulus within the visual field.

      Using state-of-the-art modeling approaches, the authors not only strengthen the existing evidence for the spatial specificity of the human dominant rhythm but also provide new quantification of its functional utility, specifically in terms of the size of the receptive field relative to the one estimated based on broad band activity.

      We thank the reviewer for their positive summary.

      Weakness 1.1

      The present manuscript currently omits the complementary view that the retinotopic map of the visual system might be related to eye movement control. Previous research in non-human primates using microelectrode stimulation has clearly shown that neuronal circuits in the visual system possess motor properties (e.g. Schiller and Stryker 1972, Schiller and Tehovnik 2001). More recent work utilizing Utah arrays, receptive field mapping, and electrical stimulation further supports this perspective, demonstrating that the retinotopic map functions as a motor map. In other words, neurons within a specific area responding to a particular stimulus location also trigger eye movements towards that location when electrically stimulated (e.g. Chen et al. 2020).

      Similarly, recent studies in humans have established a link between the retinotopic variation of human alpha oscillations and eye movements (e.g., Quax et al. 2019, Popov et al. 2021, Celli et al. 2022, Liu et al. 2023, Popov et al. 2023). Therefore, it would be valuable to discuss and acknowledge this complementary perspective on the functional relevance of the presented evidence in the discussion section.

      The reviewer notes that we do not discuss the oculomotor system and alpha oscillations. We agree that the literature relating eye movements and alpha oscillations is relevant.

      At the Reviewer’s suggestion, we added a paragraph on this topic to the first section of the Discussion (section 3.1, “Other studies have proposed … “).

      Reviewer #2 (Public Review):

      Summary:

      In this work, Yuasa et al. aimed to study the spatial resolution of modulations in alpha frequency oscillations (~10Hz) within the human occipital lobe. Specifically, the authors examined the receptive field (RF) tuning properties of alpha oscillations, using retinotopic mapping and invasive electroencephalogram (iEEG) recordings. The authors employ established approaches for population RF mapping, together with a careful approach to isolating and dissociating overlapping, but distinct, activities in the frequency domain. In doing so, the authors dissociate genuine changes in alpha oscillation amplitude from other superimposed changes occurring over a broadband range of the power spectrum. Together, the authors used this approach to test how spatially tuned estimated RFs were when based on alpha range activity, vs. broadband activities (focused on 70-180Hz). Consistent with a large body of work, the authors report clear evidence of spatially precise RFs based on changes in alpha range activity. However, the size of these RFs was far larger than those reliably estimated using broadband range activity at the same recording site. Overall, the work reflects a rigorous approach to a previously examined question, for which improved characterization leads to improved consistency in findings and some advance of prior work.

      We thank the reviewer for the summary.

      Strengths:

      Overall, the authors take a careful and well-motivated approach to data analyses. The authors successfully test a clear question with a rigorous approach and provide strong supportive findings. Firstly, well-established methods are used for modeling population RFs. Secondly, the authors employ contemporary methods for dissociating unique changes in alpha power from superimposed and concomitant broadband frequency range changes. This is an important confound in estimating changes in alpha power not employed in prior studies. The authors show this approach produces more consistent and robust findings than standard band-filtering approaches. As noted below, this approach may also account for more subtle differences when compared to prior work studying similar effects.

      We thank the reviewer for the positive comments.

      Weaknesses:

      Weakness 2.1 Theoretical framing:

      The authors frame their study as testing between two alternative views on the organization, and putative functions, of occipital alpha oscillations: i) alpha oscillation amplitude reflects broad shifts in arousal state, with large spatial coherence and uniformity across cortex; ii) alpha oscillation amplitude reflects more specific perceptual processes and can be modulated at local spatial scales. However, in the introduction this framing seems mostly focused on comparing some of the first observations of alpha with more contemporary observations. Therefore, I read their introduction to more reflect the progress in studying alpha oscillations from Berger's initial observations to the present. I am not aware of a modern alternative in the literature that posits alpha to lack spatially specific modulations. I also note this framing isn't particularly returned to in the discussion.

      This was helpful feedback. We have rewritten nearly the entire Introduction to frame the study differently. The emphasis is now on the fact that several intracranial studies of spatial tuning of alpha (in both human and macaque) tend to show increases in alpha due to visual stimulation, in contrast to a century of MEG/EEG studies, from Berger to the present, showing decreases. We believe that the discrepancy is due to an interaction between measurement type and brain signals. Specifically, intracranial measurements sum decreases in alpha oscillations and increases in broadband power on the same trials, and both signals can be large. In contrast, extracranial measures are less sensitive to the broadband signals and mostly just measure the alpha oscillation. Our study reconciles this discrepancy by removing the baseline broadband power increases, thereby isolating the alpha oscillation, and showing that with iEEG spatial analyses, the alpha oscillation decreases with visual stimulation, consistent with EEG and MEG results.

      Weakness 2.2 A second important variable here is the spatial scale of measurement.

      It follows that EEG based studies will capture changes in alpha activity up to the limits of spatial resolution of the method (i.e. limited in ability to map RFs). This methodological distinction isn't as clearly mentioned in the introduction, but is part of the author's motivation. Finally, as noted below, there are several studies in the literature specifically addressing the authors question, but they are not discussed in the introduction.

      The new Introduction now explicitly contrasts EEG/MEG with intracranial studies and refers to the studies below.

      Weakness 2.3 Prior studies:

      There are important findings in the literature preceding the author's work that are not sufficiently highlighted or cited. In general terms, the spatio-temporal properties of the EEG/iEEG spectrum are well known (i.e. that changes in high frequency activity are more focal than changes in lower frequencies). Therefore, the observations of spatially larger RFs for alpha activities is highly predicted. Specifically, prior work has examined the impact of using different frequency ranges to estimate RF properties, for example ECoG studies in the macaque by Takura et al. NeuroImage (2016) [PubMed: 26363347], as well as prior ECoG work by the author's team of collaborators (Harvey et al., NeuroImage (2013) [PubMed: 23085107]), as well as more recent findings from other groups (Luo et al., (2022) BioRxiv: https://doi.org/10.1101/2022.08.28.505627). Also, a related literature exists for invasively examining RF mapping in the time-voltage domain, which provides some insight into the author's findings (as this signal will be dominated by low-frequency effects). The authors should provide a more modern framing of our current understanding of the spatial organization of the EEG/iEEG spectrum, including prior studies examining these properties within the context of visual cortex and RF mapping. Finally, I do note that the author's approach to these questions do reflect an important test of prior findings, via an improved approach to RF characterization and iEEG frequency isolation, which suggests some important differences with prior work.

      Thank you for these references and suggestions. Some of the references were already included, and the others have been added.

      There is one issue where we disagree with the Reviewer, namely that “the observations of spatially larger RFs for alpha activities is highly predicted”. We agree that alpha oscillations and other low frequency rhythms tend to be less focal than high frequency responses, but there are also low frequency non-rhythmic signals, and these can be spatially focal. We show this by demonstrating that pRFs solved using low frequency responses outside the alpha band (both below and above the alpha frequency) are small, similar to high frequency broadband pRFs, but differing from the large pRFs associated with alpha oscillations. Hence we believe the degree to which signals are focal is more related to the degree of rhythmicity than to the temporal frequency per se. While some of these results were already in the supplement, we now address the issue more directly in the main text in a new section called, “2.5 The difference in pRF size is not due to a difference in temporal frequency.”

      We incorporated additional references into the Introduction, added a new section on low frequency broadband responses to the Results (section 2.5), and expanded the Discussion (section 3.2) to address these new references.

      Weakness 2.4 Statistical testing:

      The authors employ many important controls in their processing of data. However, for many results there is only a qualitative description or summary metric. It appears very little statistical testing was performed to establish reported differences. Related to this point, the iEEG data is highly nested, with multiple electrodes (observations) coming from each subject, how was this nesting addressed to avoid bias?

      We reviewed the primary claims made in the manuscript and, for each claim, we specify the supporting analyses and, where appropriate, how we address the issue of nesting. Although some of these analyses were already in the manuscript, many of them are new, including all of the analyses concerning nesting. We believe that putting this information in one place will be useful to the reader, and we now include this text as a new section in the supplement, “Graphical and statistical support for primary claims”.

      Reviewer #2 (Recommendations For The Authors):

      Recommendation 2.1:

      Data presentation: In several places, the authors discuss important features of cortical responses as measured with iEEG that need to be carefully considered. This is totally appropriate and a strength of the author's work, however, I feel the reader would benefit from more depiction of the time-domain responses, to help better understand the authors frequency domain approach. For example, Figure 1 would benefit from showing some form of voltage trace (ERP) and spectrogram, not just the power spectra. In addition, part (a) of Figure 1 could convey some basic information about the timing of the experimental paradigm.

      We changed panel A of Figure 1 to include the timing of the experimental paradigm, and we added panels C and D to show the electrode time series before and after regressing out the ERP.

      Recommendation 2.2

      Update introduction to include references to prior EEG/iEEG work on spatial distribution across frequency spectrum, and importantly, prior work mapping RFs with different frequencies.

      We have addressed this issue and re-written our introduction. Please refer to our response in Public Review for further details.

      Recommendation 2.3

      Figure 3 has several panels and should be labeled to make it easier to follow. The dashed line in lower power spectra isn't defined in a legend and is missing from the upper panel - please clarify.

      We updated Figure 3 and reordered the panels to clarify how we computed the summary metrics in broadband and alpha for each stimulus location (i.e., the “ratio” values plotted in panel B). We also simplified the plot of the alpha power spectrum. It now shows a dashed line representing a baseline-corrected response to the mapping stimulus, which is defined in the legend and explained in the caption.

      Recommendation 2.4

      Power spectra are always shown without error shading, but they are mean estimates.

      We added error shading to Figures 1, 2 and 3.

      Recommendation 2.5

      The authors deal with voltage transients in response to visual stimulation, by subtracting out the trial-averaged mean (commonly performed). However, the efficacy of this approach depends on signal quality and so some form of depiction for this processing step is needed.

      We added a depiction of the processing steps for regressing out the averaged responses in Figure 1 in an example electrode (panels C and D). We also show in the supplement the effect of regressing out the ERP on all the electrode pRFs. We have added Supplementary Figure 1-2.
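      As a rough illustration of the processing step discussed here, the sketch below removes the trial-averaged response (ERP) from each trial by least-squares regression. It is a minimal sketch under stated assumptions, not the authors' code: the function name `regress_out_erp`, the toy data, and the choice to compute a single ERP across all trials passed in (rather than, say, per stimulus condition) are all hypothetical.

```python
def regress_out_erp(trials):
    """Remove the trial-averaged response (ERP) from each trial.

    trials: list of equal-length lists, one voltage time series per trial.
    Returns a new list holding the residual time series for each trial.
    Assumption: one ERP is computed across all trials that are passed in.
    """
    n_trials = len(trials)
    n_samples = len(trials[0])
    # ERP = mean across trials at each time point.
    erp = [sum(trial[t] for trial in trials) / n_trials for t in range(n_samples)]
    erp_energy = sum(v * v for v in erp)
    if erp_energy == 0.0:
        # Nothing to regress out; return copies of the input trials.
        return [list(trial) for trial in trials]
    residuals = []
    for trial in trials:
        # Least-squares gain of the ERP that best matches this trial.
        beta = sum(x * e for x, e in zip(trial, erp)) / erp_energy
        residuals.append([x - beta * e for x, e in zip(trial, erp)])
    return residuals


# Hypothetical toy data: a shared evoked waveform at three gains, plus small offsets.
toy_evoked = [0.0, 1.0, 2.0, 1.0, 0.0]
toy_gains = [0.8, 1.0, 1.2]
toy_noise = [
    [0.1, -0.1, 0.0, 0.1, -0.1],
    [-0.1, 0.0, 0.1, -0.1, 0.1],
    [0.0, 0.1, -0.1, 0.0, 0.0],
]
toy_trials = [
    [g * e + n for e, n in zip(toy_evoked, noise)]
    for g, noise in zip(toy_gains, toy_noise)
]
toy_residuals = regress_out_erp(toy_trials)
```

      Compared with plain subtraction of the ERP, the regression form lets the evoked response scale from trial to trial. In both cases the residuals average to zero across trials, so what remains is the non-phase-locked part of the signal that the spectral analysis operates on.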

      Recommendation 2.6

      I have a similar request for the authors latency correction of their data, where they identified a timing error and re-aligned the data without ground truth. Again, this is appropriate, but some depiction of the success of this correction is very critical for confirming the integrity of the data.

      We now report more detail on the latency correction, and also point out that any small error in the estimate would not affect our conclusions (4.6 ECoG data analysis | Data epoching). The correction was important for a prior paper on temporal dynamics (Groen et al, 2022), which used data from the same participants and estimated the latency of responses. In this paper, our analyses are in the spectral domain (and discard phase), so small temporal shifts are not critical. We now also link to the public code associated with that paper, which implemented the adjustment and quantified the uncertainty in the latency adjustment.

      More details on the latency adjustment are provided in section 4.6.

      Recommendation 2.7

      In many places the authors report their data shows a 'summary' value, please clarify if this means averaging or summation over a range.

      For both broadband and alpha, we derive one summary value (a scalar) per trial for each stimulus. For broadband, the summary metric is the ratio of power during a given trial to power during blanks, where power in a trial is the geometric mean of the power at each frequency within the defined band. This is equation 3 in the methods, which is now referred to the first time that summary metrics are mentioned in the results. For alpha, the summary metric is the height of the Gaussian from our model-based approach. This is defined in equations 1 and 2, which are also now referred to the first time summary metrics are mentioned in the results.

      We added an explanation of the summary metrics in the figure captions and results where they are first used, and also referred to the equations in the methods where they are defined.
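      The broadband summary metric described above can be sketched as follows. This is an illustration under assumptions rather than a reproduction of equation 3 from the methods: the function names, the toy spectra, and the use of a single pre-averaged blank spectrum are hypothetical, and the 70-180 Hz default band is taken from the reviewer's summary. The alpha metric (the height of a fitted Gaussian) depends on the authors' model and is not sketched here.

```python
import math


def band_geometric_mean(power, freqs, band):
    """Geometric mean of power over frequency bins inside `band` (inclusive)."""
    low, high = band
    in_band = [p for p, f in zip(power, freqs) if low <= f <= high]
    if not in_band:
        raise ValueError("no frequency bins fall inside the band")
    # Geometric mean computed as exp(mean(log(power))).
    return math.exp(sum(math.log(p) for p in in_band) / len(in_band))


def broadband_ratio(trial_power, blank_power, freqs, band=(70.0, 180.0)):
    """Ratio of band power in one trial to band power during blanks.

    Assumption: `blank_power` is a single spectrum, e.g. already averaged
    across blank trials.
    """
    return (band_geometric_mean(trial_power, freqs, band)
            / band_geometric_mean(blank_power, freqs, band))


# Hypothetical toy spectra: in-band power doubles during the stimulus trial.
toy_freqs = [50.0, 70.0, 100.0, 140.0, 180.0, 200.0]
toy_blank_power = [9.0, 4.0, 2.0, 1.0, 0.5, 0.4]
toy_trial_power = [9.0, 8.0, 4.0, 2.0, 1.0, 0.4]
toy_ratio = broadband_ratio(toy_trial_power, toy_blank_power, toy_freqs)
```

      Using a geometric rather than arithmetic mean keeps the low frequencies, where power is highest, from dominating the band estimate. A real analysis would likely also exclude bins contaminated by line noise, which this sketch does not do.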

      Recommendation 2.8

      The authors conclude: "we have discovered that spectral power changes in the alpha range reflect both suppression of alpha oscillations and elevation of broadband power." It might not have been the intention, but 'discovered' seems overstated.

      We agree and changed this sentence.

      Recommendation 2.9

      Supp Fig 9 is a great effort by the authors to convey their findings to the reader, it should be a main figure.

      We are glad you found Supplementary Figure 9 valuable. We moved this figure to the main text.

      Reviewer #3 (Public Review):

      Summary:

      This study tackles the important subject of sensory driven suppression of alpha oscillations using a unique intracranial dataset in human patients. Using a model-based approach to separate changes in alpha oscillations from broadband power changes, the authors try to demonstrate that alpha suppression is spatially tuned, with similar center location as high broadband power changes, but much larger receptive field. They also point to interesting differences between low-order (V1-V3) and higher-order (dorsolateral) visual cortex. While I find some of the methodology convincing, I also find significant parts of the data analysis, statistics and their presentation incomplete. Thus, I find that some of the main claims are not sufficiently supported. If these aspects could be improved upon, this study could potentially serve as an important contribution to the literature with implications for invasive and non-invasive electrophysiological studies in humans.

      We thank the reviewer for the summary.

      Strengths:

      The study utilizes a unique dataset (ECOG & high-density ECOG) to elucidate an important phenomenon of visually driven alpha suppression. The central question is important and the general approach is sound. The manuscript is clearly written and the methods are generally described transparently (and with reference to the corresponding code used to generate them). The model-based approach for separating alpha from broadband power changes is especially convincing and well-motivated. The link to exogenous attention behavioral findings (figure 8) is also very interesting. Overall, the main claims are potentially important, but they need to be further substantiated (see weaknesses).

      We thank the reviewer for the positive comments.

      Weaknesses:

      I have three major concerns:

      Weakness 3.1. Low N / no single subject results/statistics:

      The crucial results of Figure 4,5 hang on 53 electrodes from four patients (Table 2). Almost half of these electrodes (25/53) are from a single subject. Data and statistical analysis seem to just pool all electrodes, as if these were statistically independent, and without taking into account subject-specific variability. The mean effect per each patient was not described in text or presented in figures. Therefore, it is impossible to know if the results could be skewed by a single unrepresentative patient. This is crucial for readers to be able to assess the robustness of the results. N of subjects should also be explicitly specified next to each result.

      We have made substantial changes to address subject-specific effects, including new results and new figures.

      • Figure 4 now shows variance explained by the alpha pRF broken down by each participant for electrodes in V1 to V3. We also now show a similar figure for dorsolateral electrodes in Supplementary Figure 4-2.

      • Figure 5, which shows results from individual electrodes in V1 to V3, now includes color coding of electrodes by participant to make it clear how the electrodes group by participant. Similarly, for dorsolateral electrodes, we show electrodes grouped by participant in Supplementary Figure 5-1. The same applies to Supplementary Figure 6-2.

      • Supplementary Figure 7-2 now shows the benefits of our model-based approach for estimating alpha broken down by individual participants.

      • We also now include a new section in the supplement that summarizes, for every major claim, what the supporting data are and how we addressed the issue of nesting electrodes by participant (section “Graphical and statistical support for primary claims”).

      Weakness 3.2. Separation between V1-V3 and dorsolateral electrodes:

      Out of 53 electrodes, 27 were doubly assigned as both V1-V3 and dorsolateral (Table 2, Figures 4,5). That means that out of 35 V1-V3 electrodes, 27 might actually be dorsolateral. This problem is exacerbated by the low N. For example, all the 20 electrodes in patient 8 assigned as V1-V3 might as well be dorsolateral. This double assignment didn't make sense to me and I wasn't convinced by the authors' reasoning. I think it needlessly inflates the N for comparing the two groups and casts doubts on the robustness of these analyses.

      Electrode assignment was probabilistic to reflect uncertainty in the mapping between location and retinotopic map. The probabilistic assignment is handled in two ways.

      (1) For visualizing results of single electrodes, we simply go with the maximum probability, so no electrode is visualized for both groups of data. For example, Figure 5a (V1-V3) and supplementary Figure 5-1a (dorsolateral electrodes) have no electrodes in common: no electrode is in both plots.

      (2) For quantitative summaries, we sample the electrodes probabilistically (for example Figures 4, 5c). So, if for example, an electrode has a 20% chance of being in V1 to V3, and 30% chance of being in dorsolateral maps, and a 50% chance of being in neither, the data from that electrode is used in only 20% of V1-V3 calculations and 30% of dorsolateral calculations. In 50% of calculations, it is not used at all. This process ensures that an electrode with uncertain assignment makes no more contribution to the results than an electrode with certain assignment. An electrode with a low probability of being in, say, V1-V3, makes little contribution to any reported results about V1-V3. This procedure is essentially a weighted mean, which the reviewer suggests in the recommendations. Thus, we believe there is not a problem of “double counting”.
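      The probabilistic sampling described in (2) can be illustrated with a short sketch. It is a simplified stand-in for the authors' procedure, assuming each electrode carries one probability for V1-V3 and one for the dorsolateral maps; the function names, probabilities, and pRF-size values are hypothetical.

```python
import random

V1V3, DORSOLATERAL, NEITHER = "V1-V3", "dorsolateral", "neither"


def draw_assignment(p_v1v3, p_dorsolateral, rng):
    """Draw one label per electrode; an electrode joins at most one group per draw."""
    labels = []
    for p1, p2 in zip(p_v1v3, p_dorsolateral):
        u = rng.random()  # uniform in [0, 1)
        if u < p1:
            labels.append(V1V3)
        elif u < p1 + p2:
            labels.append(DORSOLATERAL)
        else:
            labels.append(NEITHER)
    return labels


def resampled_group_means(values, p_v1v3, p_dorsolateral, group, n_draws, seed=0):
    """Mean of `values` over the electrodes assigned to `group`, one mean per draw.

    Draws in which no electrode lands in the group are skipped.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_draws):
        labels = draw_assignment(p_v1v3, p_dorsolateral, rng)
        members = [v for v, label in zip(values, labels) if label == group]
        if members:
            means.append(sum(members) / len(members))
    return means


# Hypothetical electrodes: the first matches the 20% / 30% / 50% example in the text.
toy_p_v1v3 = [0.2, 0.9, 0.0]
toy_p_dorsolateral = [0.3, 0.1, 1.0]
toy_values = [4.0, 2.0, 6.0]  # e.g. pRF sizes, arbitrary units
toy_means = resampled_group_means(
    toy_values, toy_p_v1v3, toy_p_dorsolateral, V1V3, n_draws=2000
)
```

      In this sketch the first electrode enters roughly 20% of the V1-V3 draws and the third never does, so each electrode's influence on a group summary scales with its assignment probability, and no electrode is counted in both groups within a single draw.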

      The alternative would have been to use maximum probability for all calculations. However, we think that doing so would be misleading, since it would not take into account uncertainty of assignment, and would thus overstate differences in results between the maps.

      We now clarify in the Results that for probabilistic calculations, the contribution of an electrode is limited by the likelihood of assignment (Section 2.3). We also now explain in the methods why we think probabilistic sampling is important.

      Weakness 3.3. Alpha pRFs are larger than broadband pRFs:

      First, as broadband pRF models were on average better fit to the data than alpha pRF models (dark bars in Supp Fig 3. Top row), I wonder if this could entirely explain the larger Alpha pRF (i.e. worse fits lead to larger pRFs). There was no analysis to rule out this possibility.

      We addressed this question in a new paragraph in Discussion section 3.1 (“What is the function of the large alpha pRFs?”, paragraph beginning… “Another possible interpretation is that the poorer model fit in the alpha pRF is due to lower signal-to-noise”). This paragraph both refers to prior work on the relationship between noise and pRF size and to our own control analyses (Supplementary Figure 5-2).

      Weakness 3.4 Statistics

      Second, examining closely the entire 2.4 section there wasn't any formal statistical test to back up any of the claims (not a single p-value is mentioned). It is crucial in my opinion to support each of the main claims of the paper with formal statistical testing.

      We agree that it is important for the reader to be able to link specific results and analyses to specific claims. We are not convinced that null hypothesis statistical testing is always the best approach. This is a topic of active debate in the scientific community.

      We added a new section that concisely states each major claim and explicitly annotates the supporting evidence (Section 4.7). Please also refer to our response to Reviewer #2 regarding statistical testing (Weakness 2.4, “Statistical testing”).

      Weakness 3.5 Summary

      While I judge these issues as crucial, I can also appreciate the considerable effort and thoughtfulness that went into this study. I think that addressing these concerns will substantially raise the confidence of the readership in the study's findings, which are potentially important and interesting.

      We again thank the reviewer for the positive comments.

      Reviewer #3 (Recommendations For The Authors):

      Suggestions for how to address the three major concerns:

      Suggestion 3.1.

      I am very well aware that it's very hard to have n=30 in a visual cortex ECoG study. That's fine. Best practice would be to have a linear mixed effects model with patients as a random effect. However, for some figures with just 3-4 patients (Figures 4 and 5) the sample size might be too small even for that. At the very minimum, I would expect to show in figures/describe in text all results per patient (perhaps one can do statistics within each patient, and show for each patient that the effect is significant). Even in primate studies with just two subjects it is expected to show that the results replicate for subject A and B. It is necessary to show that your results don't depend on a single unrepresentative subject. And if they do, at least be transparent about it.

      We have addressed this thoroughly. Please see response to Weakness 3.1 (“Low N / no single subject results/statistics”).

      Suggestion 3.2.

      I just don't get it. I would simply assign an electrode to V1-V3 or dorsolateral cortex based on which area has the highest probability. It doesn't make sense to me that an electrode that has a 60% probability of being in dorsolateral cortex and only 10% of being in V1-V3 would be assigned as both V1-V3 and dorsolateral. Also, what's the rationale for including such an electrode in the analysis for, let's say, V1-V3 (we have weak evidence to believe it's there)? I would either assign electrodes based on the highest probability, or alternatively do a weighted mean based on the probability of each electrode belonging to each region group (e.g. an electrode with a 40% probability of being in V1-V3 will get twice the weight of an electrode that has 20%), but this is more complicated.

      We have addressed this issue. Please refer to our response in Public Review (“Weakness 3.2 Separation between V1-V3 and dorsolateral”) for details.

      Suggestion 3.3.

      First, to exclude the possibility that alpha pRFs are larger simply because they have a worse fit to the neural data, I would show if there is a correlation between the goodness-of-fit and pRF size (for alpha and broadband signals, separately). No [negative] correlation between goodness-of-fit and pRF size would be a good sign. I would also compare alpha & broadband receptive field size when controlling for the goodness-of-fit (selecting electrodes with similar goodness-of-fit for both signals). If the results replicate this way it would be convincing.

      Second, there are no statistical tests in section 2.4, possibly also in others. Even if you employ bootstrap / Monte-Carlo resampling methods you can extract a p-value.

      We have addressed this issue. Please refer to our response in Public Review Point 3.3 (“Alpha pRFs are larger than broadband pRFs”) for further details.
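
      For readers unfamiliar with how a resampling procedure can yield a p-value, the following minimal Python sketch shows one generic option, an exact sign-flip permutation test on paired differences. It is an illustration only: the difference values are invented, the function name is hypothetical, and this is not the analysis reported in the manuscript. The test also assumes that electrodes are exchangeable, which may not hold when several electrodes come from the same patient.

```python
from itertools import product


def sign_flip_p_value(differences):
    """Exact two-sided sign-flip permutation p-value for a mean paired difference."""
    n = len(differences)
    observed = abs(sum(differences) / n)
    extreme = 0
    total = 0
    # Under the null hypothesis the sign of each paired difference is arbitrary,
    # so enumerate every possible pattern of sign flips.
    for signs in product((1.0, -1.0), repeat=n):
        flipped = sum(s * d for s, d in zip(signs, differences)) / n
        if abs(flipped) >= observed:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical per-electrode differences (alpha pRF size minus broadband pRF size).
size_difference = [1.2, 0.8, 1.5, 0.3, 0.9, 1.1, 0.6, 1.4]
p_value = sign_flip_p_value(size_difference)
```

      Full enumeration is feasible only for small samples (2^n sign patterns); for larger samples the same logic is usually applied to a random subset of sign patterns.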

      Suggestion 3.4.

      Also, I don't understand the resampling procedure described in lines 652-660: "17.7 electrodes were assigned to V1-V3, 23.2 to dorsolateral, and 53 to either " - but 17.7 + 23.2 doesn't add up to 53. It also seems as if you assign visual areas differently in this resampling procedure than in the real data - "and randomly assigned each electrode to a visual area according to the Wang full probability distributions". If you assign in your actual data 27 electrodes to both visual areas, the same should be done in the resampling procedure (I would expect exactly 35 V1-V3 and 45 dorsolateral electrodes in every resampling, just the pRFs will be shuffled across electrodes).

      We apologize for the confusion.

      We fixed the sentence above, clarified the caption to Table 2, and also explained the overall strategy of probabilistic resampling better. See response to Public Review point 3.2 for details.

      Suggestion 3.5.

      These are rather technical comments but I believe they are crucial points to address in order to support your claims. I genuinely think your results are potentially interesting and important but these issues need to be first addressed in a revision. I also think your study may carry implications beyond just the visual domain, as alpha suppression is observed for different sensory modalities and cortical regions. Might be useful to discuss this in the discussion section.

      Agree. We added a paragraph on this point to the Discussion (very end of 3.2).

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the reviewers for their thoughtful feedback. We have made substantial revisions to the manuscript to address each of their comments, as we detail below. We want to highlight one major change in particular that addresses a concern raised by both reviewers: the role of the drift rate in our models. Motivated by their astute comments, we went back through our models and realized that we had made a particular assumption that deserved more scrutiny. We previously assumed that the process of encoding the observations made correct use of the objective, generative correlation, but then the process of calculating the weight of evidence used a mis-scaled, subjective version of the correlation. These assumptions led us to scale the drift rate in the model by a term that quantified how the standard deviation of the observation distribution was affected by the objective correlation (encoding), but to scale the bound height by the subjective estimate of the correlation (evidence weighing). However, we realized that encoding may also depend on the subjective correlation experienced by the participant. We have now tested several alternative models and found that the best-fitting model assumes that a single, subjective estimate of the correlation governs both encoding and evidence weighing. An important consequence of updating our models in this way is that we can now account for the behavioral data without needing the additional correlation-dependent drift terms (which, as reviewer #2 pointed out, were difficult to explain).

      We also note that we changed the title slightly, replacing “weighting” with “weighing” for consistency with our usage throughout the manuscript.

      Please see below for more details about this important point and our responses to the reviewers’ specific concerns. 

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The behavioral strategies underlying decisions based on perceptual evidence are often studied in the lab with stimuli whose elements provide independent pieces of decision-related evidence that can thus be equally weighted to form a decision. In more natural scenarios, in contrast, the information provided by these pieces is often correlated, which impacts how they should be weighted. Tardiff, Kang & Gold set out to study decisions based on correlated evidence and compare the observed behavior of human decision-makers to normative decision strategies. To do so, they presented participants with visual sequences of pairs of localized cues whose location was either uncorrelated, or positively or negatively correlated, and whose mean location across a sequence determined the correct choice. Importantly, they adjusted this mean location such that, when correctly weighted, each pair of cues was equally informative, irrespective of how correlated it was. Thus, if participants follow the normative decision strategy, their choices and reaction times should not be impacted by these correlations. While Tardiff and colleagues found no impact of correlations on choices, they did find them to impact reaction times, suggesting that participants deviated from the normative decision strategy. To assess the degree of this deviation, Tardiff et al. adjusted drift-diffusion models (DDMs) for decision-making to process correlated decision evidence. Fitting these models to the behavior of individual participants revealed that participants considered correlations when weighing evidence, but did so with a slight underestimation of the magnitude of this correlation. This finding made Tardiff et al. conclude that participants followed a close-to-normative decision strategy that adequately took into account correlated evidence.

      Strengths:

      The authors adjust a previously used experimental design to include correlated evidence in a simple, yet powerful way. The way it does so is easy to understand and intuitive, such that participants don't need extensive training to perform the task. Limited training makes it more likely that the observed behavior is natural and reflective of everyday decision-making. Furthermore, the design allowed the authors to make the amount of decision-related evidence equal across different correlation magnitudes, which makes it easy to assess whether participants correctly take account of these correlations when weighing evidence: if they do, their behavior should not be impacted by the correlation magnitude.

      The relative simplicity with which correlated evidence is introduced also allowed the authors to fall back to the well-established DDM for perceptual decisions, which has few parameters, is known to implement the normative decision strategy in certain circumstances, and enjoys a great deal of empirical support. The authors show how correlations ought to impact these parameters, and which changes in parameters one would expect to see if participants misestimate these correlations or ignore them altogether (i.e., estimate correlations to be zero). This allowed them to assess the degree to which participants took into account correlations on the full continuum from perfect evidence weighting to complete ignorance. With this, they could show that participants in fact performed rational evidence weighting if one assumed that they slightly underestimated the correlation magnitude.

      Weaknesses:

      The experiment varies the correlation magnitude across trials such that participants need to estimate this magnitude within individual trials. This has several consequences:

      (1) Given that correlation magnitudes are estimated from limited data, the (subjective) estimates might be biased towards their average. This implies that, while the amount of evidence provided by each 'sample' is objectively independent of the correlation magnitude, it might subjectively depend on the correlation magnitude. As a result, the normative strategy might differ across correlation magnitudes, unlike what is suggested in the paper. In fact, it might be the case that the observed underestimation of the correlation magnitude corresponds to the normative strategy.

      We thank the reviewer for raising this interesting point, which we now address directly with new analyses including model fits (pp. 15–24). These analyses show that the participants were computing correlation-dependent weights of evidence from observation distributions that reflected suboptimal misestimates of correlation magnitudes. This strategy is normative in the sense that it is the best that they can do, given the encoding suboptimality. However, as we note in the manuscript, we do not know the source of the encoding suboptimality (pp. 23–24). We thus do not know if there might be a strategy they could have used to make the encoding more optimal.

      (2) The authors link the normative decision strategy to putting a bound on the log-likelihood ratio (logLR), as implemented by the two decision boundaries in DDMs. However, as the authors also highlight in their discussion, the 'particle location' in DDMs ceases to correspond to the logLR as soon as the strength of evidence varies across trials and isn't known by the decision maker before the start of each trial. In fact, in the used experiment, the strength of evidence is modulated in two ways:

      (i) by the (uncorrected) distance of the cue location mean from the decision boundary (what the authors call the evidence strength) and

      (ii) by the correlation magnitude. Both vary pseudo-randomly across trials, and are unknown to the decision-maker at the start of each trial. As previous work has shown (e.g. Kiani & Shadlen (2009), Drugowitsch et al. (2012)), the normative strategy then requires averaging over different evidence strength magnitudes while forming one's belief. This averaging causes the 'particle location' to deviate from the logLR. This deviation makes it unclear if the DDM used in the paper indeed implements the normative strategy, or is even a good approximation to it.

      We appreciate this subtle, but important, point. We now clarify that the DDM we use includes degrees of freedom that are consistent with normative decision processes that rely on the imperfect knowledge that participants have about the generative process on each trial, specifically: 1) a single drift-rate parameter that is fit to data across different values of the mean of the generative distribution, which is based on the standard assumption for these kinds of task conditions in which stimulus strength is varied randomly from trial-to-trial and thus prevents the use of exact logLR (which would require stimulus strength-specific scale factors; Gold and Shadlen, 2001); 2) the use of a collapsing bound, which in certain cases (including our task) is thought to support a stimulus strength-dependent calibration of the decision variable to optimize decisions (Drugowitsch et al, 2012); and 3) free parameters (one per correlation) to account for subjective estimates of the correlation, which affected the encoding of the observations that are otherwise weighed in a normative manner in the best-fitting model.

      Also, to clarify our terminology, we define the objective evidence strength as the expected logLR in a given condition, which for our task is dependent on both the distance of the mean from the decision boundary and the correlation (p. 7). 

      Given that participants observe 5 evidence samples per second and on average require multiple seconds to form their decisions, it might be that they are able to form a fairly precise estimate of the correlation magnitude within individual trials. However, whether this is indeed the case is not clear from the paper.

      These points are now addressed directly in Results (pp. 23–24) and Figure 7 supplemental figures 1–3. Specifically, we show that, as the reviewer correctly surmised above, empirical correlations computed on each trial tended to be biased towards zero (Fig 7–figure supplement 1). However, two other analyses were not consistent with the idea that participants’ decisions were based on trial-by-trial estimates of the empirical correlations: 1) those with the shortest RTs did not have the most-biased estimates (Fig 7–figure supplement 2), and 2) there was no systematic relationship between objective and subjective fit correlations across participants (Fig 7–figure supplement 3).
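
      The general phenomenon that sample correlations computed from few observations are, on average, closer to zero than the generative correlation can be reproduced with a short simulation. The sketch below is a hedged illustration, not the authors' analysis: the number of pairs per trial, the number of trials, the generative correlation of 0.6, and all function names are assumptions.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_trial_correlation(rho, n_pairs, n_trials, rng):
    """Average per-trial sample correlation for pairs generated with true correlation rho."""
    total = 0.0
    for _ in range(n_trials):
        xs, ys = [], []
        for _ in range(n_pairs):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            xs.append(z1)
            # Mix z1 and z2 so that corr(x, y) equals rho in the generative model.
            ys.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        total += pearson_r(xs, ys)
    return total / n_trials


rng = random.Random(0)
# 10 pairs per trial is an assumption (e.g. 5 pairs per second for 2 seconds).
average_r = mean_trial_correlation(rho=0.6, n_pairs=10, n_trials=20000, rng=rng)
```

      With these settings the average per-trial estimate falls somewhat below the generative value of 0.6, and the shortfall shrinks as the number of pairs per trial grows.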

      Furthermore, the authors capture any underestimation of the correlation magnitude by an adjustment to the DDM bound parameter. They justify this adjustment by asking how this bound parameter needs to be set to achieve correlation-independent psychometric curves (as observed in their experiments) even if participants use a 'wrong' correlation magnitude to process the provided evidence. Curiously, however, the drift rate, which is the second critical DDM parameter, is not adjusted in the same way. If participants use the 'wrong' correlation magnitude, then wouldn't this lead to a mis-weighting of the evidence that would also impact the drift rate? The current model does not account for this, such that the provided estimates of the mis-estimated correlation magnitudes might be biased.

      We appreciate this valuable comment, and we agree that we previously neglected the potential impact of correlation misestimates on evidence strength. As we now clarify, the correlation enters these models in two ways: 1) via its effect on how the observations are encoded, which involves scaling both the drift and the bound; and 2) via its effect on evidence weighing, which involves scaling only the bound (pp. 15–18). We previously assumed that only the second form of scaling might involve a subjective (mis-)estimate of the correlation. We now examine several models that also include the possibility of either or both forms using subjective correlation estimates. We show that a model that assumes that the same subjective estimate drives both encoding and weighing (the “full-rho-hat” model) best accounts for the data. This model provides better fits (after accounting for differences in numbers of parameters) than models with: 1) no correlation-dependent adjustments (“base” model), 2) separate drift parameters for each correlation condition (“drift” model), 3) optimal (correlation-dependent) encoding but suboptimal weighing (“bound-rho-hat” model, which was our previous formulation), 4) suboptimal encoding and weighing (“scaled-rho-hat” model), and 5) optimal encoding but suboptimal weighing and separate correlation-dependent adjustments to the drift rate (“bound-rho-hat plus drift” model). We have substantially revised Figures 5–7 and the associated text to address these points.

      Lastly, the paper makes it hard to assess how much better the participants' choices would be if they used the correct correlation magnitudes rather than underestimates thereof. This is important to know, as it only makes sense to strictly follow the normative strategy if it comes with a significant performance gain.

      We now include new analyses in Fig. 7 that demonstrate how much participants' choices and RT deviate from: 1) an ideal observer using the objective correlations, and 2) an observer who failed to adjust for the fit subjective correlation when weighing the evidence (i.e., using the subjective correlation for encoding but a correlation of zero for weighing). We now indicate that participants’ performance was quite close to that predicted by the ideal observer (using the true, objective correlation) for many conditions. Thus, we agree that they might not have had the impetus to optimize the decision process further, assuming it were possible under these task conditions.

      Reviewer #2 (Public review):

      Summary:

      This study by Tardiff, Kang & Gold seeks to: i) develop a normative account of how observers should adapt their decision-making across environments with different levels of correlation between successive pairs of observations, and ii) assess whether human decisions in such environments are consistent with this normative model.

      The authors first demonstrate that, in the range of environments under consideration here, an observer with full knowledge of the generative statistics should take both the magnitude and sign of the underlying correlation into account when assigning weight in their decisions to new observations: stronger negative correlations should translate into stronger weighting (due to the greater information furnished by an anticorrelated generative source), while stronger positive correlations should translate into weaker weighting (due to the greater redundancy of information provided by a positively correlated generative source). The authors then report an empirical study in which human participants performed a perceptual decision-making task requiring accumulation of information provided by pairs of perceptual samples, under different levels of pairwise correlation. They describe a nuanced pattern of results with effects of correlation being largely restricted to response times and not choice accuracy, which could partly be captured through fits of their normative model (in this implementation, an extension of the well-known drift-diffusion model) to the participants' behaviour while allowing for misestimation of the underlying correlations.

      Strengths:

      As the authors point out in their very well-written paper, appropriate weighting of information gathered in correlated environments has important consequences for real-world decision-making. Yet, while this function has been well studied for 'high-level' (e.g. economic) decisions, how we account for correlations when making simple perceptual decisions on well-controlled behavioural tasks has not been investigated. As such, this study addresses an important and timely question that will be of broad interest to psychologists and neuroscientists. The computational approach to arrive at normative principles for evidence weighting across environments with different levels of correlation is very elegant, makes strong connections with prior work in different decision-making contexts, and should serve as a valuable reference point for future studies in this domain. The empirical study is well designed and executed, and the modelling approach applied to these data showcases a deep understanding of relationships between different parameters of the drift-diffusion model and its application to this setting. Another strength of the study is that it is preregistered.

      Weaknesses:

      In my view, the major weaknesses of the study center on the narrow focus and subsequent interpretation of the modelling applied to the empirical data. I elaborate on each below:

      Modelling interpretation: the authors' preference for fitting and interpreting the observed behavioural effects primarily in terms of raising or lowering the decision bound is not well motivated and will potentially be confusing for readers, for several reasons. First, the entire study is conceived, in the Introduction and first part of the Results at least, as an investigation of appropriate adjustments of evidence weighting in the face of varying correlations. The authors do describe how changes in the scaling of the evidence in the drift-diffusion model are mathematically equivalent to changes in the decision bound - but this comes amidst a lengthy treatment of the interaction between different parameters of the model and aspects of the current task which I must admit to finding challenging to follow, and the motivation behind shifting the focus to bound adjustments remained quite opaque. 

      We appreciate this valuable feedback. We have revised the text in several places to make these important points more clearly. For example, in the Introduction we now clarify that “The weight of evidence is computed as a scaled version of each observation (the scaling can be applied to the observations or to the bound, which are mathematically equivalent; Green and Swets, 1966) to form the logLR” (p. 3). We also provide more details and intuition in the Results section for how and why we implemented the DDM the way we did. In particular, we now emphasize that the correlation enters these models in two ways: 1) via its effect on encoding the observations, which scales both the drift and the bound; and 2) via its effect on evidence weighing, which scales only the bound (pp. 15–18).

      Second, and more seriously, bound adjustments of the form modelled here do not seem to be a viable candidate for producing behavioural effects of varying correlations on this task. As the authors state toward the end of the Introduction, the decision bound is typically conceived of as being "predefined" - that is, set before a trial begins, at a level that should strike an appropriate balance between producing fast and accurate decisions. There is an abundance of evidence now that bounds can change over the course of a trial - but typically these changes are considered to be consistently applied in response to learned, predictable constraints imposed by a particular task (e.g. response deadlines, varying evidence strengths). In the present case, however, the critical consideration is that the correlation conditions were randomly interleaved across trials and were not signaled to participants in advance of each trial - and as such, what correlation the participant would encounter on an upcoming trial could not be predicted. It is unclear, then, how participants are meant to have implemented the bound adjustments prescribed by the model fits. At best, participants needed to form estimates of the correlation strength/direction (only possible by observing several pairs of samples in sequence) as each trial unfolded, and they might have dynamically adjusted their bounds (e.g. collapsing at a different rate across correlation conditions) in the process. But this is very different from the modelling approach that was taken. In general, then, I view the emphasis on bound adjustment as the candidate mechanism for producing the observed behavioural effects to be unjustified (see also next point).

      We again appreciate this valuable feedback and have made a number of revisions to try to clarify these points. In addition to addressing the equivalence of scaling the evidence and the bound in the Introduction, we have added the following section to Results (p. 18):

      “Note that scaling the bound in these formulations follows conventions of the DDM, as detailed above, to facilitate interpretation of the parameters. These formulations also raise an apparent contradiction: the “predefined” bound is scaled by subjective estimates of the correlation, but the correlation was randomized from trial to trial and thus could not be known in advance. However, scaling the bound in these ways is mathematically equivalent to using a fixed bound on each trial and scaling the observations to approximate logLR (see Methods). This equivalence implies that in the brain, effectively scaling a “predefined” bound could occur when assigning a weight of evidence to the observations as they are presented.”

      We also note in Methods (pp. 40–41):

      “In the DDM, this scaling of the evidence is equivalent to assuming that the decision variable accumulates momentary evidence of the form (x1 + x2) and then dividing the bound height by the appropriate scale factor. An alternative approach would be to scale both the signal and noise components of the DDM by the scale factor. However, scaling the bound is both simpler and maintains the conventional interpretation of the DDM parameters in which the bound reflects the decision-related components of the evidence accumulation process, and the drift rate represents sensory-related components.”
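
      The stated equivalence between scaling the evidence and scaling the bound can be checked numerically with a discrete-time sketch. The code below is an illustration under stated assumptions, not the authors' model: the generative parameters, the fixed (non-collapsing) bound, the scale factor 1 / (1 + rho) with all other constants omitted, and the function names are hypothetical. It accumulates the sums x1 + x2 either after multiplying them by the scale factor (fixed bound) or unscaled against a bound divided by the same factor, and compares the resulting choices and decision times.

```python
import random


def first_crossing(evidence, bound):
    """Return (choice, n_samples) when the running sum first reaches +/- bound."""
    total = 0.0
    for step, sample in enumerate(evidence, start=1):
        total += sample
        if total >= bound:
            return 1, step
        if total <= -bound:
            return -1, step
    return 0, len(evidence)  # no decision within the available samples


def simulate_trial(rng, mu, sigma, rho, n_samples):
    """Sums x1 + x2 of correlated Gaussian pairs with common mean mu (hypothetical generative model)."""
    sums = []
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = mu + sigma * z1
        x2 = mu + sigma * (rho * z1 + (1.0 - rho ** 2) ** 0.5 * z2)
        sums.append(x1 + x2)
    return sums


rng = random.Random(1)
rho, bound = -0.6, 4.0
scale = 1.0 / (1.0 + rho)  # assumed correlation-dependent scale factor (constants omitted)
mismatches = 0
for _ in range(500):
    sums = simulate_trial(rng, mu=0.05, sigma=0.5, rho=rho, n_samples=200)
    scaled_evidence = first_crossing([scale * s for s in sums], bound)  # scale the observations
    scaled_bound = first_crossing(sums, bound / scale)                  # scale the bound instead
    if scaled_evidence != scaled_bound:
        mismatches += 1
```

      Because multiplying every sample by a positive constant multiplies the running sum by the same constant, the two formulations cross their respective bounds on the same sample and in the same direction, so no mismatches are expected.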

      We believe we provide strong evidence that participants adjust their evidence weighing to account for the correlations (see response below), but we remain agnostic as to how exactly this weighing is implemented in the brain.

      Modelling focus: Related to the previous point, it is stated that participants' choice and RT patterns across correlation conditions were qualitatively consistent with bound adjustments (p. 20), but evidence for this claim is limited. Bound adjustments imply effects on both accuracy and RTs, but the data here show either only effects on RTs, or RT effects mixed with accuracy trends that are in the opposite direction to what would be expected from bound adjustment (i.e. slower RT with a trend toward diminished accuracy in the strong negative correlation condition; Figure 3b). Allowing both drift rate and bound to vary with correlation conditions allowed the model to provide a better account of the data in the strong correlation conditions - but from what I can tell this is not consistent with the authors' preregistered hypotheses, and they rely on a post hoc explanation that is necessarily speculative and cannot presently be tested (that the diminished drift rates for higher negative correlations are due to imperfect mapping between subjective evidence strength and the experimenter-controlled adjustment to objective evidence strengths to account for effects of correlations). In my opinion, there are other candidate explanations for the observed effects that could be tested but lie outside of the relatively narrow focus of the current modelling efforts. Both explanations arise from aspects of the task, which are not mutually exclusive. The first is that an interesting aspect of this task, which contrasts with most common 'univariate' perceptual decision-making tasks, is that participants need to integrate two pieces of information at a time, which may or may not require an additional computational step (e.g. averaging of two spatial locations before adding a single quantum of evidence to the building decision variable). There is abundant evidence that such intermediate computations on the evidence can give rise to certain forms of bias in the way that evidence is accumulated (e.g. 'selective integration' as outlined in Usher et al., 2019, Current Directions in Psychological Science; Luyckx et al., 2020, Cerebral Cortex) which may affect RTs and/or accuracy on the current task. The second candidate explanation is that participants in the current study were only given 200 ms to process and accumulate each pair of evidence samples, which may create a processing bottleneck causing certain pairs or individual samples to be missed (and which, assuming fixed decision bounds, would presumably selectively affect RT and not accuracy). If I were to speculate, I would say that both factors could be exacerbated in the negative correlation conditions, where pairs of samples will on average be more 'conflicting' (i.e. further apart) and, speculatively, more challenging to process in the limited time available here to participants. Such possibilities could be tested through, for example, an interrogation paradigm version of the current task which would allow the impact of individual pairs of evidence samples to be more straightforwardly assessed; and by assessing the impact of varying inter-sample intervals on the behavioural effects reported presently.

      We thank the reviewer for this thoughtful and valuable feedback. We have thoroughly updated the modeling section to include new analysis and clearer descriptions and interpretations of our findings (including Figs. 5–7 and additional references to the Usher, Luyckx, and other studies that identified decision suboptimalities). The comment about “an additional computational step” in converting the observations to evidence was particularly useful, in that it made us realize that we were making what we now consider to be a faulty assumption in our version of the DDM. Specifically, we assumed that subjective misestimates of the correlation affected how observations were converted to evidence (logLR) to form the decision (implemented as a scaling of the bound height), but we neglected to consider how suboptimalities in encoding the observations could also lead to misestimates of the correlation. We have retained the previous best-fitting models in the text, for comparison (the “bound-rho-hat” and “bound-rho-hat + drift” models). In addition, we now include a “full-rho-hat” model that assumes that misestimates of rho affect both the encoding of the observations, which affects the drift rate and bound height, and the weighing of the evidence, which affects only the bound height. This was the best-fitting model for most participants (after accounting for different numbers of parameters associated with the different models we tested). Note that the full-rho-hat model predicts the lack of correlation-dependent choice effects and the substantial correlation-dependent RT effects that we observed, without requiring any additional adjustments to the drift rate (as we resorted to previously).

      In summary, we believe that we now have a much more parsimonious account of our data, in terms of a model in which subjective estimates of the correlation are alone able to account for our patterns of choice and RT data. We fully agree that more work is needed to better understand the source of these misestimates but also think those questions are outside the scope of the present study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      A few minor comments:

      (1) Evidence can be correlated in multiple ways. It could be correlated within individual pieces of evidence in a sequence, or across elements in that sequence (e.g., across time). This distinction is important, as it determines how evidence ought to be accumulated across time. In particular, if evidence is correlated across time, simply summing it up might be the wrong thing to do. Thus, it would be beneficial to make this distinction in the Introduction, and to mention that this paper is only concerned with the first type of correlation.

      We now clarify this point in the Introduction (p. 5–6).

      (2) It is unclear without reading the Methods how the blue dashed line in Figure 4c is generated. To my understanding, it is a prediction of the naive DDM model. Is this correct?

      We now specify the models used to make the predictions shown in Fig. 4c (which now includes an additional model that uses unscaled observations as evidence).

      (3) In Methods, given the importance of the distribution of x1 + x2, it would be useful to write it out explicitly, e.g., x1 + x2 ~ N(2 mu_g, ..), specifying its mean and its variance.

      Excellent suggestion, added to p. 38.
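
      A minimal sketch of the distribution requested in comment (3), assuming that x1 and x2 are jointly Gaussian with a shared mean mu_g, a shared standard deviation sigma, and correlation rho. The symbols sigma and rho and both function names are assumptions introduced here for illustration; they are not taken from the manuscript. Under those assumptions, x1 + x2 ~ N(2 mu_g, 2 sigma^2 (1 + rho)), which the simulation checks numerically:

```python
import math
import random


def sum_distribution(mu_g, sigma, rho):
    """Analytical mean and variance of x1 + x2 for two equal-variance,
    correlated Gaussian observations (assumed generative form)."""
    mean = 2.0 * mu_g
    variance = 2.0 * sigma ** 2 * (1.0 + rho)
    return mean, variance


def simulate_sum(mu_g, sigma, rho, n=200_000, seed=0):
    """Monte Carlo estimate of the mean and variance of x1 + x2."""
    rng = random.Random(seed)
    sums = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # Build a correlated pair from two independent standard normals.
        x1 = mu_g + sigma * z1
        x2 = mu_g + sigma * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        sums.append(x1 + x2)
    mean = sum(sums) / n
    variance = sum((s - mean) ** 2 for s in sums) / (n - 1)
    return mean, variance
```

      Under these assumptions, a correlation of -0.6 gives a variance of 0.8 sigma^2 for the sum, whereas +0.6 gives 3.2 sigma^2, so the sign of the correlation changes how reliable the summed observation is.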

      (4) From Methods and the caption of Figure 6 - Supplement 1 it becomes clear that the fitted DDM features a bound that collapses over time. I think that this should also be mentioned in the main text, as it is a not-too-unimportant feature of the model.

      Excellent suggestion, added to p. 15, with reference to Fig. 6-supplement 1 on p. 20.

      (5) The functional form of the bound is 2 (B - tb t). To my understanding, the effective B changes as a function of the correlation magnitude. Does tb as well? If not, wouldn't it be better if it does, to ensure that 2 (B - tb t) = 0 independent of the correlation magnitude?

      In our initial modeling, we also considered whether the correlation-dependent adjustment, which is a function of both correlation sign and magnitude, should be applied to the initial bound or to the instantaneous bound (i.e., after collapse, affecting tb as well). In a pilot analysis of data from 22 participants in the 0.6 correlation-magnitude group, we found that this choice had a negligible effect on the goodness-of-fit (deltaAIC = -0.9, protected exceedance probability = 0.63, in favor of the instantaneous bound scaling). We therefore used the instantaneous bound version in the analyses reported in the manuscript but doubt this choice was critical based on these results. We have clarified our implementation of the bound in Methods (p. 43–44).

      Reviewer #2 (Recommendations for the authors):

      In addition to the points raised above, I have some minor suggestions/open questions that arose from my reading of the manuscript:

      (1) Are the predictions outlined in the paper specific to cases where the two sources are symmetric around zero? If distributions are allowed to be asymmetric then one can imagine cases (i.e. when distribution means are sufficiently offset from one another) where positive correlations can increase evidence strength and negative correlations decrease evidence strength. There's absolutely still value and much elegance in what the authors are showing with this work, but if my intuition is correct, it should ideally be acknowledged that the predictions are restricted to a specific set of generative circumstances.

      We agree that there are a lot of ways to manipulate correlations and their effect on the weight of evidence. At the end of the Discussion, we emphasize that our results apply to this particular form of correlation (p. 32).

      (2) Isn't Figure 4C misleading in the sense that it collapses across the asymmetry in the effect of negative vs positive correlations on RT, which is clearly there in the data and which simply adjusting the correlation-dependent scale factor will not reproduce?

      We agree that this analysis does not address any asymmetries in suboptimal estimates of positive versus negative correlations. We believe that those effects are much better addressed using the model fitting, which we present later in the Results section. We have now simplified the analyses in Fig. 4c, reporting the difference in RT between positive and negative correlation conditions instead of a linear regression.

      (3) I found the transition on p.17 of the Results section from the scaling of drift rate by correlation to scaling of bound height to be quite abrupt and unclear. I suspect that many readers coming from a typical DDM modelling background will be operating under the assumption that drift rate and bound height are independent, and I think more could be done here to explain why scaling one parameter by correlation in the present case is in fact directly equivalent to scaling the other.

      Thank you for the very useful feedback, we have substantially revised this text to make these points more clearly.

      (4) P.3, typo: Alan *Turing*

      That’s embarrassing. Fixed.

      (5) P.27, typo: "participants adopt a *fixed* bound"

      Fixed.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study presents valuable findings related to seasonal brain size plasticity in the Eurasian common shrew (Sorex araneus), which is an excellent model system for these studies. The evidence supporting the authors' claims is convincing. However, the authors should be careful when applying the term adaptive to the gene expression changes they observe; it would be challenging to demonstrate the differential fitness effects of these gene expression changes. The work will be of interest to biologists working on neuroscience, plasticity, and evolution.

      We appreciate the reviewers’ suggestions and comments. The phylogenetic ANOVA we used (EVE) tests for a separate RNA expression optimum specific to the shrew lineage, which is consistent with expectations for adaptive evolution of gene expression. However, as you noted, while this analysis highlights many candidate genes evolving in a manner consistent with positive selection, further functional validation is required to confirm whether and how these genes contribute to Dehnel’s phenomenon. In the discussion, we now emphasize that the inferred adaptive expression of these genes is putative and note that future studies are needed to test the function of the proposed adaptations. For example, our cell-line validation of the effect of BCL2L1 on apoptosis is a case study that tests the function of a putatively adaptive change in gene expression, and it illustrates this limitation. We have also refined our discussion to focus more on pathway-level analyses than on individual genes, and have addressed the other issues raised, including the clarity of the methods and the use of sex as a covariate in our analyses.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this paper, Thomas et al. set out to study seasonal brain gene expression changes in the Eurasian common shrew. This mammalian species is unusual in that it does not hibernate or migrate but instead stays active all winter while shrinking and then regrowing its brain and other organs. The authors previously examined gene expression changes in two brain regions and the liver. Here, they added data from the hypothalamus, a brain region involved in the regulation of metabolism and homeostasis. The specific goals were to identify genes and gene groups that change expression with the seasons and to identify genes with unusual expression compared to other mammalian species. The reason for this second goal is that genes that change with the season could be due to plastic gene regulation, where the organism simply reacts to environmental change using processes available to all mammals. Such changes are not necessarily indicative of adaptation in the shrew. However, if the same genes are also expression outliers compared to other species that do not show this overwintering strategy, it is more likely that they reflect adaptive changes that contribute to the shrew's unique traits.

      The authors succeeded in implementing their experimental design and identified significant genes in each of their specific goals. There was an overlap between these gene lists. The authors provide extensive discussion of the genes they found.

      The scope of this paper is quite narrow, as it adds gene expression data for only one additional tissue compared to the authors' previous work in a 2023 preprint. The two papers even use the same animals, which had been collected for that earlier work. As a consequence, the current paper is limited in the results it can present. This is somewhat compensated by an expansive interpretation of the results in the discussion section, but I felt that much of this was too speculative. More importantly, there are several limitations to the design, making it hard to draw stronger conclusions from the data. The main contribution of this work lies in the generated data and the formulation of hypotheses to be tested by future work.

      Thank you for your interest in our manuscript and for your insights. We addressed your comments below: we now highlight the limitations of our study design in the discussion and emphasize that, while a second optimum of gene expression in shrews is consistent with adaptive evolution, we recognize that not all sources of variation in gene expression can be fully accounted for. We highlight the putative nature of these results in our revisions, especially in our new limitations section (lines 541-555).

      Strengths:

      The unique biological model system under study is fascinating. The data were collected in a technically sound manner, and the analyses were done well. The paper is overall very clear, well-written, and easy to follow. It does a thorough job of exploring patterns and enrichments in the various gene sets that are identified.

      I specifically applaud the authors for doing a functional follow-up experiment on one of the differentially expressed genes (BCL2L1), even if the results did not support the hypothesis. It is important to report experiments like this and it is terrific to see it done here.

      We are glad to hear that you found our manuscript fascinating and clearly written. While we hoped to see an effect of BCL2L1 on apoptosis as proposed, we agree that reporting null results is valuable when validating evolutionary inferences.

      Weaknesses:

      While the paper successfully identifies differentially expressed seasonal genes, the real question is (as explained by the authors) whether these are evolved adaptations in the shrews or whether they reflect plastic changes that also exist in other species. This question was the motivation for the inter-species analyses in the paper, but in my view, these cannot rigorously address this question. Presumably, the data from the other species were not collected in comparable environments as those experienced by the shrews studied here. Instead, they likely (it is not specified, and might not be knowable for the public data) reflect baseline gene expression. To see why this is problematic, consider this analogy: if we were to compare gene expression in the immune system of an individual undergoing an acute infection to other, uninfected individuals, we would see many, strong expression differences. However, it would not be appropriate to claim that the infected individual has unique features - the relevant physiological changes are simply not triggered in the other individuals. The same applies here: it is hard to draw conclusions from seasonal expression data in the shrews to non-seasonal data in the other species, as shrew outlier genes might still reflect physiological changes that weren't active in the other species.

      There is no solution for this design flaw given the public data available to the authors except for creating matched data in the other species, which is of course not feasible. The authors should acknowledge and discuss this shortcoming in the paper.

      Thank you for taking the time to provide such insightful feedback. As you noted, while shrews experience seasonal size changes, their environments may differ from those of the other species used in this experiment, leading to increased or decreased expression of certain genes and reducing our ability to accurately detect selection across the phylogeny. Although we sought to control for as many sources of variation as possible, such as using only post-pubescent, wild, or non-domesticated individuals when feasible, we recognize that not all sources of variation can be fully accounted for within a practical experiment. We agree that these sources of variation can introduce both false positives and false negatives into our results, and we have now highlighted this limitation within our discussion (lines 538-552).

      Related to the point above: in the section "Evolutionary Divergence in Expression" it is not clear which of the shrew samples were used. Was it all of them, or only those from winter, fall, etc? One might expect different results depending on this. E.g., there could be fewer genes with inferred adaptive change when using only summer samples. The authors should specify which samples were included in these analyses, and, if all samples were used, conduct a robustness analysis to see which of their detected genes survive the exclusion of certain time points.

      Thank you for this attention to detail. We used spring adults for this analysis. We made this decision because we used only post-pubescent individuals for all species in the analysis, and spring was the only season in which adult shrews were undergoing Dehnel’s phenomenon. We have now clarified this in both the methods and results (line 247 and line 667).

      In the same section, were there also genes with lower shrew expression? None are mentioned in the text, so did the authors not test for this direction, or did they test and there were no significant hits?

      We did test for decreased shrew expression compared to the other species, but no genes showed significant decreases. We hypothesize two potential reasons for this result: 1) if a gene were selected for decreased expression, selection for constitutive expression of that gene across all species may be weak, and decreased expression may thus be found in other lineages as well; or 2) decreased or absent expression may relax selection on the coding regions, so that these genes are not recovered when we identify 1:1 orthologs. This is consistent with results reported in the original methods manuscript. Thank you for pointing out that we did not discuss this information in the text; we now include it in our results (lines 250-251).

      The Discussion is too long and detailed, given that it can ultimately only speculate about what the various expression changes might mean. Many of the specific points made (e.g. about the blood-brain-barrier being more permissive to sensing metabolic state, about cross-organ communication, the paragraphs on single, specific genes) are a stretch based on the available data. Illustrating this point, the one follow-up experiment the authors did (on BCL2L1) did not give the expected result. I really applaud the authors for having done this experiment, which goes beyond typical studies in this space. At the same time, its result highlights the dangers of reading too much into differential expression analyses.

      We agree with your point: while our extensive discussion is useful for generating hypotheses for future work, some of it may be too speculative for our readers. To address this, we have shortened portions of our discussion and focused more on pathways than on individual genes, including removing proposed mechanisms related to HRH2, FAM57B, GPR3, and GABAergic neurons. We hope that this makes the speculative nature of many of our results clear to the reader.

      There is no test of whether the five genes observed in both analyses (seasonal change and inter-species) exceed the number expected by chance. When two gene sets are drawn at random, some overlap is expected randomly. The expected overlap can be computed by repeated draws of pairs of random sets of the same size as seen in real data and by noting the overlap between the random pairs. If this random distribution often includes sets of five genes, this weakens the conclusions that can be drawn from the genes observed in the real data.

      Thank you for highlighting this much-needed approach. After running this test, we found that the observed number of overlapping genes exceeded the expected overlap, but not significantly. We now show this in our methods (lines 277-278) and results (lines 719-720).
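
      The random-draw procedure described in the reviewer's comment can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the universe of candidate genes, and the set sizes are all hypothetical placeholders, since the actual set sizes are not given here.

```python
import random


def overlap_null(universe_size, size_a, size_b, n_draws=10_000, seed=0):
    """Null distribution of the overlap between two gene sets drawn at
    random, without replacement, from the same universe of genes."""
    rng = random.Random(seed)
    universe = range(universe_size)
    overlaps = []
    for _ in range(n_draws):
        set_a = set(rng.sample(universe, size_a))
        set_b = set(rng.sample(universe, size_b))
        overlaps.append(len(set_a & set_b))
    return overlaps


def empirical_p_value(null_overlaps, observed):
    """One-sided empirical p-value: fraction of random draws whose overlap
    is at least as large as the observed one (with a +1 correction)."""
    hits = sum(1 for overlap in null_overlaps if overlap >= observed)
    return (hits + 1) / (len(null_overlaps) + 1)
```

      With the real numbers of seasonal and lineage-specific genes substituted for the placeholders, empirical_p_value(null, 5) would estimate how often an overlap of five or more genes arises by chance; the mean of the null distribution should approach size_a * size_b / universe_size.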

      Reviewer #2 (Public review):

      Summary:

      Shrews go through winter by shrinking their brain and most organs, then regrow them in the spring. The gene expression changes underlying this unusual brain size plasticity were unknown. Here, the authors looked for potential adaptations underlying this trait by looking at differential expression in the hypothalamus. They found enrichments for DE in genes related to the blood-brain barrier and calcium signaling, as well as used comparative data to look at gene expression differences that are unique in shrews. This study leverages a fascinating organismal trait to understand plasticity and what might be driving it at the level of gene expression. This manuscript also lays the groundwork for further developing this interesting system.

      We are glad you found our manuscript interesting and thank you for your feedback. We hope that we have addressed all of your concerns as described below.

      Strengths:

      One strength is that the authors used OU models to look for adaptation in gene expression. The authors also added cell culture work to bolster their findings.

      Weaknesses:

      I think that there should be a bit more of an introduction to Dehnel's phenomenon, given how much it is used throughout.

      Thank you for this insight. With a lengthy introduction and discussion, we agree that the importance of Dehnel’s phenomenon may have been overshadowed. We have shortened both sections and emphasized the background on Dehnel’s phenomenon in the first two paragraphs of the introduction, allowing this extraordinary seasonal size plasticity to stand out.

      Reviewer #3 (Public review):

      Summary:

      In their study, the authors combine developmental and comparative transcriptomics to identify candidate genes with plastic, canalized, or lineage-specific (i.e., divergent) expression patterns associated with an unusual overwintering phenomenon (Dehnel's phenomenon - seasonal size plasticity) in the Eurasian shrew. Their focus is on the shrinkage and regrowth of the hypothalamus, a brain region that undergoes significant seasonal size changes in shrews and plays a key role in regulating metabolic homeostasis. Through combined transcriptomic analysis, they identify genes showing derived (lineage-specific), plastic (seasonally regulated), and canalized (both lineage-specific and plastic) expression patterns. The authors hypothesize that genes involved in pathways such as the blood-brain barrier, metabolic state sensing, and ion-dependent signaling will be enriched among those with notable transcriptomic patterns. They complement their transcriptomic findings with a cell culture-based functional assessment of a candidate gene believed to reduce apoptosis.

      Strengths:

      The study's rationale and its integration of developmental and comparative transcriptomics are well-articulated and represent an advancement in the field. The transcriptome, known for its dynamic and plastic nature, is also influenced by evolutionary history. The authors effectively demonstrate how multiple signals-evolutionary, constitutive, and plastic-can be extracted, quantified, and interpreted. The chosen phenotype and study system are particularly compelling, as it not only exemplifies an extreme case of Dehnel's phenotype, but the metabolic requirements of the shrew suggest that genes regulating metabolic homeostasis are under strong selection.

      Weaknesses:

      (1) In a number of places (described in detail below), the motivation for the experimental, analytical, or visualization approach is unclear and may obscure or prevent discoveries.

      Thank you for finding our research and manuscript compelling, and for the valuable feedback, which will greatly improve our manuscript. We hope that we have alleviated your concerns by following your suggestions, as described below.

      (2) Temporal Expression - Figure 1 and Supplemental Figure 2 and associated text:

      - It is unclear whether quantitative criteria were used to distinguish "developmental shift" clusters from "season shift" clusters. A visual inspection of Supplemental Figure 2 suggests that some clusters (e.g., clusters 2, 8, and to a lesser extent 12) show seasonal variation, not just developmental differences between stages 1 and 2. While clustering helps to visualize expression patterns, it may not be the most appropriate filter in this case, particularly since all "season shift" clusters are later combined in KEGG pathway and GO analyses (Figure 1B).

      - The authors do not indicate whether they perform cluster-specific GO or KEGG pathway enrichment analyses. The current analysis picks up relevant pathways for hypothalamic control of homeostasis, which is a useful validation, but this approach might not fully address the study's key hypotheses.

      Thank you for this valuable feedback. We did not want to include clusters that we deemed to be related to development, as these should not be attributed to changes associated with Dehnel’s phenomenon. We identified them through qualitative, visual inspection, which we realize can differ between observers (e.g., clusters 2, 8, and 12 appeared to be seasonal). Qualitatively, we were looking for extreme divergence between Stage 1 and Stage 5 individuals: if expression were related to season and not development, then the averages of these two stages within a cluster should be relatively similar. We have now quantified this as a large difference in z-score (abs(summer juvenile-summer adult)>1.25) without meaningful interseason variation determined by a second local maximum (abs(autumn-winter)<0.5 and abs(winter-summer)<0.5), and added it to both our methods (lines 699-702) and results (line 192).
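
      The quantitative criterion above can be sketched as a simple filter. The thresholds (1.25 and 0.5) come from the response; the function name, the dictionary keys, and the reading of "winter-summer" as winter versus summer adults are assumptions made here for illustration.

```python
def is_developmental_cluster(mean_z, dev_threshold=1.25, season_threshold=0.5):
    """Flag a cluster as a developmental shift: a large juvenile-versus-adult
    difference in mean z-score with no meaningful variation between seasons.

    mean_z maps stage names to the cluster's mean z-score at that stage.
    """
    dev_shift = abs(mean_z["summer_juvenile"] - mean_z["summer_adult"])
    autumn_winter = abs(mean_z["autumn"] - mean_z["winter"])
    # Assumption: "summer" in the seasonal comparison means summer adults.
    winter_summer = abs(mean_z["winter"] - mean_z["summer_adult"])
    return (
        dev_shift > dev_threshold
        and autumn_winter < season_threshold
        and winter_summer < season_threshold
    )
```

      A cluster whose mean z-score drops sharply after the juvenile stage and then stays flat would be flagged, whereas a cluster with a winter trough and similar juvenile and adult summer values would be retained as seasonal.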

      Regarding the combination of clusters for pathway enrichment compared to individual clusters, we agree that combining clusters may be more informative for overall homeostasis, whereas individual clusters may inform us about processes directly related to Dehnel’s phenomenon. Initially, we were hesitant to conduct this analysis, as individual clusters contain small gene sets, reducing the ability to detect pathway enrichments. We have now included this analysis, which is reported in our methods (lines 703-704), results (lines 203-204), and a new supplemental table.

      (3) Differential expression between shrinkage (stage 2) and regrowth (stage 4) and cell culture targets

      - The rationale for selecting BCL2L1 for cell culture experiments should be clarified. While it is part of the apoptosis pathway, several other apoptosis-related genes were identified in the differential gene expression (DGE) analysis, some showing stronger differential expression or shrew-specific branch shifts. Why was BCL2L1 prioritized over these other candidates?

      We agree that our rationale for validating BCL2L1 function in neural cell lines was not clearly explained in the manuscript. We selected BCL2L1 because it is the furthest downstream gene in the apoptotic pathway, thus making it the most directly involved gene in programmed cell death, whereas upstream genes could influence additional genes or alternative processes. We have clarified this choice in the revised methods section (lines 748-750).

      - The authors mention maintaining (or at least attempting to maintain) a 1:1 sex ratio for the comparative analysis, but it is unclear if this was also done for the S. araneus analysis. If not, why? If so, was sex included as a covariate (e.g., a random effect) in the differential expression analysis? Sex-specific expression elevates within-group variation and could impact the discovery of differentially expressed genes.

      Regarding the use of sex as a covariate, we acknowledge the concerns raised. In our evolutionary analyses, we maintained a balanced sex ratio within species when possible. EVE models handle the effect of sex on gene expression as intraspecific variation. In shrews, however, we used males exclusively, as females were only found among juvenile individuals. Including those juvenile females would have introduced age effects, with perhaps a larger effect on our results. For the seasonal data, we have now included sex as a covariate in differential expression analyses. However, our design is imbalanced in relation to sex, which we have now discussed in our methods (lines 713-714) and discussion limitations (lines 544-548).

      (4) Discussion: The term "adaptive" is used frequently and liberally throughout the discussion. The interpretation of seasonal changes in gene expression as indicators of adaptive evolution should be done cautiously as such changes do not necessarily imply causal or adaptive associations.

      Thank you for this insight. We have reviewed our discussion and clarified that adaptations are putative (e.g., lines 146, 285, and 332), and highlighted this in our limitations section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I would recommend always spelling out "Dehnel's phenomenon" or even replacing this term (after crediting the DP term) with the more informative "seasonal size plasticity". Every time I saw "DP", I had to remind myself what this referred to. If the authors choose not to do so, please use the acronym consistently (e.g. line 186 has it spelled out).

      We have replaced the acronym DP with either the full term or the more informative “seasonal size plasticity” throughout the text.

      (2) Line 202: "DEG" has not been defined. Simply add to the line before.

      Thank you for this attention to detail. We have added this to the line above (210).

      (3) Please add a reference for the "AnAge" tool that was used to determine if samples were pubescent.

      Thank you for identifying this oversight. We have now cited the proper paper in line 634.

      (4) In the BCL2L1 section in the results, add a callout to Figure 2D.

      We have now added a callout to Figure 2D within the results (line 234).

      Reviewer #2 (Recommendations for the authors):

      (1) Line 122: is associated? These adaptations?

      Thank you for identifying that we were missing the words “associated with” here. We have fixed this in the revision.

      (2) The first paragraph of the Results should be moved to the methods, except maybe the number of orthologs.

      Thank you for this insight. We have removed this portion from the results section.

      (3) Why a Bonferroni correction on line 188? That seems too strict.

      We agree that the Bonferroni correction is strict. Results obtained with other, less strict methods for controlling the false discovery rate are also not significant after correction. These corrections can be found within the data; however, we report only the Bonferroni correction.
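
      To illustrate the difference in strictness discussed here, the sketch below compares Bonferroni-adjusted p-values with Benjamini-Hochberg adjusted p-values. The response does not name the less strict methods that were tried, so the choice of Benjamini-Hochberg is an assumption, as are the function names.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      The Benjamini-Hochberg adjusted value is never larger than the Bonferroni one for the same test, so a result that remains non-significant under Benjamini-Hochberg is also non-significant under Bonferroni.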

      (4) Line 427: "is a novel candidate gene for several neurological disorders" needs some references. I see them a couple of sentences later, but that's quite a sentence with no references at the end.

      We have added the proper citations for this sentence (line 524).

      Reviewer #3 (Recommendations for the authors):

      (1) Temporal Expression - Figure 1 and Supplemental Figure 2 and associated text Line176-193:

      - The authors report the total number of genes meeting inclusion criteria (>0.5-fold change between any two stages and 2 samples >10 normalized reads), but it would be more informative to also provide the number of genes within each temporal cluster. This would offer a clearer understanding of how gene expression patterns are distributed over time.

      Unfortunately, this information is difficult to depict on our figure and would use too much space in the text. We have thus added a description of the range of genes in a new supplemental table depicting this information.

      - It is unclear whether quantitative criteria were used to distinguish "developmental shift" clusters from "season shift" clusters. A visual inspection of Supplemental Figure 2 suggests that some clusters (e.g., clusters 2, 8, and to a lesser extent 12) show seasonal variation, not just developmental differences between stages 1 and 2. While clustering helps to visualize expression patterns, it may not be the most appropriate filter in this case, particularly since all "season shift" clusters are later combined in KEGG pathway and GO analyses (Fig. 1B). Using a differential gene expression criterion might be more suitable. For example, do excluded genes show significant log-fold differences between late-stage comparisons?

      As previously mentioned, we have now quantified seasonal shifts as large differences in z-score (abs(summer juveniles-summer adults)>1.25) without meaningful interseason variations determined by a second local maximum (abs(autumn-winter)<0.5 and abs(winter-summer)<0.5), and added this to our methods (lines 699-702). We then follow this up with differential expression analyses as described in Figure 2.

      - Did the authors perform cluster-specific GO or KEGG pathway enrichment analyses instead of focusing on the combined set of genes across the season shift clusters? While I understand that the small number of genes in each cluster may be limiting, if pathways emerge from cluster-specific analysis, they could provide more detailed insights into the functional significance of these temporal expression patterns. The current analysis picks up relevant pathways for hypothalamic control of homeostasis, which is a useful validation, but this approach might not fully address the study's key hypotheses. Additionally, no corrections for multiple hypothesis testing were applied, as noted in the results. A more refined gene set (e.g., using differential expression criteria, described above) could be more appropriate for these analyses.

      We have now included cluster-specific KEGG enrichments as previously described.

      (2) Differential expression between shrinkage (stage 2) and regrowth (stage 4) and cell culture targets - Figure 2 and lines195-227:

      - The rationale for selecting BCL2L1 for cell culture experiments should be clarified. While it is part of the apoptosis pathway, several other apoptosis-related genes were identified in the differential gene expression (DGE) analysis, some showing stronger differential expression or shrew-specific branch shifts. Why was BCL2L1 prioritized over these other candidates?

      We have now included the reasoning for further validation of BCL2L1 as described above.

      - The relevance of the "higher degree" differentially expressed genes needs more explanation. Although this group of genes is highlighted in the results, they are not featured in any subsequent analyses, leaving their importance unclear.

      Thank you for this insight. We have removed this from the methods as it is not relevant to subsequent analyses or conclusions.

      - The authors mention maintaining (or at least attempting to maintain) a 1:1 sex ratio for the comparative analysis (Line 525), but it is unclear if this was also done for the S. araneus analysis. If so, was sex included as a covariate (e.g., a random effect) in the differential expression analysis?

      We have now incorporated information on sex as described above.

      (3) Discussion:

      The term "adaptive" is used frequently and liberally throughout the discussion, but the authors should be cautious in interpreting seasonal changes in gene expression as indicators of adaptive evolution. Such changes do not necessarily imply causal or adaptive associations, and this distinction should be clearly stated when discussing the results.

      Thank you for this feedback; we agree with your conclusion. While a second expression optimum in the shrew lineage is indicative of adaptive expression, we cannot fully determine whether these shifts are caused by genetic or environmental factors, despite careful attention to experimental design. We have highlighted this as a limitation in the discussion.

      (4) Minor Editorial Comment:

      Line 105: "... maintenance of an energy budgets..." delete "an"

      We have removed this grammatical error.

    1. Author response:

      (This author response relates to the first round of peer review by Biophysics Colab. Reviews and responses to both rounds of review are available here: https://sciety.org/articles/activity/10.1101/2023.10.23.563601.)

      General Assessment:

      Pannexin (Panx) hemichannels are a family of heptameric membrane proteins that form pores in the plasma membrane through which ions and relatively large organic molecules can permeate. ATP release through Panx channels during the process of apoptosis is one established biological role of these proteins in the immune system, but they are widely expressed in many cells throughout the body, including the nervous system, and likely play many interesting and important roles that are yet to be defined. Although several structures have now been solved of different Panx subtypes from different species, their biophysical mechanisms remain poorly understood, including what physiological signals control their activation. Electrophysiological measurements of ionic currents flowing in response to Panx channel activation have shown that some subtypes can be activated by strong membrane depolarization or caspase cleavage of the C-terminus. Here, Henze and colleagues set out to identify endogenous activators of Panx channels, focusing on the Panx1 and Panx2 subtypes, by fractionating mouse liver extracts and screening for activation of Panx channels expressed in mammalian cells using whole-cell patch clamp recordings. The authors present a comprehensive examination with robust methodologies and supporting data that demonstrate that lysophospholipids (LPCs) directly activate Panx1 and Panx2 channels. These methodologies include channel mutagenesis, electrophysiology, ATP release and fluorescence assays, molecular modelling, and cryogenic electron microscopy (cryo-EM). Mouse liver extracts were initially used to identify LPC activators, but the authors go on to individually evaluate many different types of LPCs to determine those that are more specific for Panx channel activation. Importantly, the enzymes that endogenously regulate the production of these LPCs were also assessed along with other by-products that were shown not to promote pannexin channel activation. In addition, the authors used synovial fluid from canine patients, which is enriched in LPCs, to highlight the importance of the findings in pathology. Overall, we think this is likely to be a landmark study because it provides strong evidence that LPCs can function as activators of Panx1 and Panx2 channels, linking two established mediators of inflammatory responses and opening an entirely new area for exploring the biological roles of Panx channels. Although the mechanism of LPC activation of Panx channels remains unresolved, this study provides an excellent foundation for future studies and importantly provides clinical relevance.

      We thank the reviewers for their time and effort in reviewing our manuscript. Based on their valuable comments and suggestions, we have made substantial revisions. The updated manuscript now includes two new experiments supporting that lysophospholipid-triggered channel activation promotes the release of signaling molecules critical for immune response and demonstrates that this novel class of agonist activates the inflammasome in human macrophages through endogenously expressed Panx1. To better highlight the significance of our findings, we have excluded the cryo-EM panel from this manuscript. We believe these changes address the main concerns raised by the reviewers and enhance the overall clarity and impact of our findings. Below, we provide a point-by-point response to each of the reviewers’ comments.

      Recommendations:

      (1) The authors present a tremendous amount of data using different approaches, cells and assays along with a written presentation that is quite abbreviated, which may make comprehension challenging for some readers. We would encourage the authors to expand the written presentation to more fully describe the experiments that were done and how the data were analysed so that the 2 key conclusions can be more fully appreciated by readers. A lot of data is also presented in supplemental figures that could be brought into the main figures and more thoroughly presented and discussed.

      We appreciate and agree with the reviewers’ observation. Our initial manuscript may have been challenging to follow due to our use of both wild-type and GS-tagged versions of Panx1 from human and frog origins, combined with different fluorescence techniques across cell types. In this revision, we used only human wild-type Panx1 expressed in HEK293S GnTI- cells, except for activity-guided fractionation experiments, where we used GS-tagged Panx1 expressed in HEK293 cells (Fig. 1). For functional reconstitution studies, we employed YO-PRO-1 uptake assays, as optimizing the Venus-based assay was challenging. We have clarified these exceptions in the main text. We think these adjustments simplify the narrative and ensure an appropriate balance between main and supplemental figures.

      (2) It would also be useful to present data on the ion selectivity of Panx channels activated by LPC. How does this compare to data obtained when the channel is activated by depolarization? If the two stimuli activate related open states then the ion selectivity may be quite similar, but perhaps not if the two stimuli activate different open states. The authors earlier work in eLife shows interesting shifts in reversal potentials (Vrev) when substituting external chloride with gluconate but not when substituting external sodium with N-methyl-D-glucamine, and these changed with mutations within the external pore of Panx channels. Related measurements comparing channels activated by LPC with membrane depolarization would be valuable for assessing whether similar or distinct open states are activated by LPC and voltage. It would be ideal to make Vrev measurements using a fixed step depolarization to open the channel and then various steps to more negative voltages to measure tail currents in pinpointing Vrev (a so called instantaneous IV).

      We fully agree with the reviewer on the importance of ion selectivity experiments. However, comparing the properties of LPC-activated channels with those activated by membrane depolarization presented technical challenges, as LPC appears to stimulate Panx1 in synergy with voltage. Prolonged LPC exposure destabilizes patches, complicating G-V curve acquisition and kinetic analyses. While such experiments could provide mechanistic insights, we think they are beyond the scope of the current study.

      (3) Data is presented for expression of Panx channels in different cell types (HEK vs HEKS GnTI-) and different constructs (Panx1 vs Panx1-GS vs other engineered constructs). The authors have tried to be clear about what was done in each experiment, but it can be challenging for the reader to keep everything straight. The labelling in Fig 1E helps a lot, and we encourage the authors to use that approach systematically throughout. It would also help to clearly identify the cell type and channel construct whenever showing traces, like those in Fig 1D. Doing this systematically throughout all the figures would also make it clear where a control is missing. For example, if labelling for the type of cell was included in Fig 1D it would be immediately clear that a GnTI- vector alone control for WT Panx1 is missing as the vector control shown is for HEK cells and formally that is only a control for Panx2 and 3. Can the authors explain why LPC activates Panx1 overexpressed in HEK293 GnTI- cells but not in HEK293 cells? Is this purely a function of expression levels? If so, it would be good to provide that supporting information.

      As mentioned above, we believe our revised version is more straightforward to digest. We have improved labeling and provided explanations where necessary to clarify the manuscript. While Panx1 expression levels are indeed higher in GnTI- than in HEK293 cells, we are uncertain whether the absence of detectable currents in HEK293 cells is solely due to expression levels. Some post-translational modifications that inhibit Panx1, such as lysine acetylation, may also impact activity. Future studies are needed to explore these mechanisms further.

      (4) The mVenus quenching experiments are somewhat confusing in the way data are presented. In Fig 2B the y axis is labelled fluorescence (%) but when the channel is closed at time = 0 the value of fluorescence is 0 rather than 100 %, and as the channel opens when LPC is added the values grow towards 100 instead of towards 0 as iodide permeates and quenches. It would be helpful if these types of data could be presented more intuitively. Also, how was the initial rate calculated that is plotted in Fig 2C? It would be helpful to show how this is done in a figure panel somewhere. Why was the initial rate expressed as a percent maximum, what is the maximum and why are the values so low? Why is the effect of CBX so weak in these quenching experiments with Panx1 compared to other assays? This assay is used in a lot of experiments so anything that could be done to bolster confidence in what it reports on would be valuable to readers. Bringing in as many control experiments that have been done, including any that are already published, would be helpful.

      We modified the Y-axis in Figure 2 to “Quench (%)” for clarity. The data reflects fluorescence reduction over time, starting from LPC addition, normalized to the maximal decrease observed after Triton-X100 addition (3 minutes), enabling consistent quenching value comparisons. Although the quenching value appears small, normalization against complete cell solubilization provides reproducible comparisons. We do not fully understand why CBX effects vary in Venus quenching experiments, but we speculate that its steroid-like pentacyclic structure may influence the lysophospholipid agonistic effects. As noted in prior studies (DOI: 10.1085/jgp.201511505; DOI: 10.7554/eLife.54670), CBX likely acts as an allosteric modulator rather than a simple pore blocker, potentially contributing to these variations.
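      The normalization described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' analysis code: the function name `quench_percent` and its arguments are hypothetical, the baseline is taken as the fluorescence at LPC addition, and the reference for complete quenching is taken as the fluorescence after Triton-X100 addition.

```python
from typing import List, Sequence


def quench_percent(trace: Sequence[float],
                   baseline: float,
                   max_quenched: float) -> List[float]:
    """Express each fluorescence reading as a percentage of the maximal decrease.

    trace        -- fluorescence readings recorded after LPC addition
    baseline     -- fluorescence at the moment of LPC addition (assumed 0 % quench)
    max_quenched -- fluorescence after detergent addition (assumed 100 % quench)
    """
    span = baseline - max_quenched
    if span <= 0:
        raise ValueError("baseline must exceed the fully quenched reference")
    # Fluorescence lost so far, divided by the largest loss observed.
    return [100.0 * (baseline - f) / span for f in trace]


if __name__ == "__main__":
    # Hypothetical readings in arbitrary fluorescence units.
    print(quench_percent([1000.0, 900.0, 800.0], baseline=1000.0, max_quenched=500.0))
```

      With this convention a closed channel starts at 0 % and the curve rises as iodide enters, which matches the "Quench (%)" axis label; small values simply mean that only a small share of the total quenchable signal was lost during the recording window.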

      (5) Could the authors provide more information to help rationalize how Yo-Pro-1, which has a charge of +2, can permeate what are thought to be anion favouring Panx channels? We appreciate that the biophysical properties of Panx channel remain mysterious, but it would help to hear a bit more about the authors’ thinking. It might also help to cite other papers that have measured Yo-Pro-1 uptake through Panx channels. Was the Strep-tagged construct of Panx1 expressed in GnTI- cells and shown to be functional using electrophysiology?

      Our recent study suggests that the electrostatic landscape along the permeation pathway may influence its ion selectivity (DOI: 10.1101/2024.06.13.598903). However, we have not yet fully elucidated how Panx1 permeates both anions and cations. Based on our findings, ion selectivity may vary with activation stimulus intensity and duration. Cation permeation through Panx1 is often demonstrated with YO-PRO-1, which measures uptake over minutes, unlike electrophysiological measurements conducted over milliseconds to seconds. We referenced two representative studies employing YO-PRO-1 to assess Panx1 activity. Whole-cell current measurements from a similar construct with an intracellular loop insertion indicate that our STREP-tagged construct likely retains functional capacity.

      (6) In Fig 5 panel C, data is presented as the ratio of LPC induced current at -60 mV to that measured at +110 mV in the absence of LPC. What is the rationale for analysing the data this way? It would be helpful to also plot the two values separately for all of the constructs presented so the reader can see whether any of the mutants disproportionately alter LPC induced current relative to depolarization activated current. Also, for all currents shown in the figures, the authors should include a dashed coloured line at zero current, both for the LPC activated currents and the voltage steps.

      We used the ratio of LPC-induced current to the current measured at +110 mV to determine whether any of the mutants disproportionately affect LPC-induced current relative to depolarization-activated current. Since the mutants that did not respond to LPC also exhibited smaller voltage-stimulated currents than those that did respond, we reasoned that using this ratio would better capture the information the reviewer is suggesting to gauge. Showing the zero current level may be helpful if the goal was to compare basal currents, which in our experience vary significantly from patch to patch. However, since we are comparing LPC- and voltage-induced currents within the same patch, we believe that including basal current measurements would not add useful information to our study.

      Given the new experiments included to further highlight the significance of the discovery of Panx1 agonists, we opted to separate the structure-based mechanistic studies from this manuscript and removed this experiment along with the docking and cryo-EM studies.

      (7) The fragmented NTD density shown in Fig S8 panel A may resemble either lipid density or the average density of both NTD and lipid. For example, Class7 and Class8 in Fig.S8 panel D displayed split densities, which may resemble a phosphate head group and two tails of lipid. A protomer mask may not be the ideal approach to separate different classes of NTD because as shown in Fig S8 panel D, most high-resolution features are located on TM1-4, suggesting that the classification was focused on TM1-4. A more suitable approach would involve using a smaller mask including NTD, TM1, and the neighbouring TM2 region to separate different NTD classes.

      We agree with the reviewer and attempted 3D classification using multiple smaller masks including the suggested region. However, the maps remained poorly defined, and we were unable to confidently assign the NTD.

      (8) The authors don’t discuss whether the LPC-bound structures display changes in the external part of the pore, which is the anion-selective filter and the narrower part of the pore. If there are no conformational changes there, then the present structures cannot explain permeability to large molecules like ATP. In this context, a plot for the pore dimension will be helpful to see differences along the pore between their different structures. It would also be clearer if the authors overlaid maps of protomers to illustrate differences at the NTD and the "selectivity filter."

      Both maps show that the narrowest constriction, formed by W74, has a diameter of approximately 9 Å. Previous steered molecular dynamics simulations suggest that ATP can permeate through such a constriction, implying an ion selection mechanism distinct from a simple steric barrier.

      (9) The time between the addition of LPC to the nanodisc-reconstituted protein and grid preparation is not mentioned. Dynamic diffusion of LPC could result in equal probabilities for the bound and unbound forms. This raises the possibility of finding the Primed state in the LPC-bound state as well. Additionally, can the authors rationalize how LPC might reach the pore region when the channel is in the closed state before the application of LPC?

      We appreciate the reviewer’s insight. We incubated LPC and nanodisc-reconstituted protein for 30 minutes, speculating that LPC approaches the pore similarly to other lipids in prior structures. In separate studies, we are optimizing conditions to capture more defined conformations.

      (10) In the cryo-EM map of the “resting” state (EMDB-21150), a part of the density was interpreted as NTD flipped to the intracellular side. This density, however, is poorly defined, and not connected to the S1 helix, raising concerns about whether this density corresponds to the NTD as seen in the “resting” state structure (PDB-ID: 6VD7). In addition, some residues in the C-terminus (after K333 in frog PANX1) are missing from the atomic model. Some of these residues are predicted by AlphaFold2 to form a short alpha helix and are shown to form a short alpha helix in some published PANX1 structures. Interestingly, in both the AF2 model and 6WBF, this short alpha helix is located approximately in the weak density that the authors suggest represents the “flipped” NTD. We encourage the authors to be cautious in interpreting this part as the “flipped” NTD without further validation or justification.

      We agree that the density corresponding to the NTD extended into the cytoplasm is relatively weak. In our recent study, we compared two Panx1 structures with or without the mentioned C-terminal helix and found evidence suggesting the likelihood of NTD extension (DOI: 10.1101/2024.06.13.598903). Nevertheless, to prevent potential confusion, we have removed the cryo-EM panel from this manuscript.

      (11) Since the authors did not observe densities of bound LPC in the cryo-EM map, it is important to acknowledge in the text the inherent limitations of using docking and mutagenesis methods to locate where LPC binds.

      Thank you for the suggestion. We have removed this section to avoid potential confusion.

      Optional suggestions:

      (1) The authors used MeOH to extract mouse liver for reversed-phase chromatography. Was the study designed to focus on hydrophobic compounds that likely bind to the TMD? Panx1 has both ECD and ICD with substantial sizes that could interact with water soluble compounds? Also, the use of whole-cell recordings to screen fractions would not likely identify polar compounds that interact with the cytoplasmic part of the TMD? It would be useful for the authors to comment on these aspects of their screen and provide their rationale for fractionating liver rather than other tissues.

      We have added a rationale in line 90, stating: “The soluble fractions were excluded from this study, as the most polar fraction induced strong channel activities in the absence of exogenously expressed pannexins.” Additionally, we have included a figure to support this rationale (Fig. S1A).

      (2) The authors show that LPCs reversibly increase inward currents at a holding voltage of -60 mV (not always specified in legends) in cells expressing Panx1 and 2, and then show families of currents activated by depolarizing voltage steps in the absence of LPC without asking what happens when you depolarize the membrane after LPC activation. If LPCs can be applied for long enough without disrupting recordings, it would be valuable to obtain both I-V relations and G-V relations before and after LPC activation of Panx channels. Does LPC disproportionately increase current at some voltages compared to others? Is the outward rectification reduced by LPC? Does Vrev remain unchanged (see point above)? It’s hard to predict what would be observed, but almost any outcome from these experiments would suggest additional experiments to explore the extent to which the open states activated by LPC and depolarization are similar or distinct.

      Unfortunately, in our hands, the prolonged application of lysolipids at concentrations necessary to achieve significant currents tends to destabilize the patch. This makes it challenging to obtain G-V curves or perform the previously mentioned kinetic analyses. We believe this destabilization may be due to lysolipids’ surfactant-like qualities, which can disrupt the giga seal. Additionally, prolonged exposure seems to cause channel desensitization, which could be another confounding factor.

      (3) From the results presented, the authors cannot rule out that mutagenesis-induced insensitivity of Panx channels to LPCs results from allosteric perturbations in the channels rather than direct binding/gating by LPCs. In Fig 5 panel A-C, the authors introduced double mutants on TM1 and TM2 to interfere with LPC binding, however, the double mutants may also disrupt the interaction network formed within NTD, TM1, and TM2. This disruption could potentially rearrange the conformation of NTD, favouring the resting closed state. Three double Asn mutants, which abolished LPC induced current, also exhibited lower currents through voltage activation in Fig 5S, raising the possibility the mutant channels fail to activate in response to LPC due to an increased energy barrier. One way to gain further insight would be to mutate residues in NTD that interact with those substituted by the three double Asn mutants and to measure currents from both voltage activation and LPC activation. Such results might help to elucidate whether the three double Asn mutants interfere with LPC binding. It would also be important to show that the voltage-activated currents in Fig. S5 are sensitive to CBX.

      Thank you for the comment, with which we agree. Our initial intention was to use the mutagenesis studies to experimentally support the docking study. Due to uncertainties associated with the presented cryo-EM maps, we have decided to remove this study from the current manuscript. We will consider the proposed experiments in a future study.

      (4) Could the authors elaborate on how LPC opens Panx1 by altering the conformation of the NTDs in an uncoordinated manner, going from “primed” state to the “active” state. In the “primed” state, the NTDs seem to be ordered by forming interactions with the TMD, thus resulting in the largest (possible?) pore size around the NTDs. In contrast, in the “active” state, the authors suggest that the NTDs are fragmented as a result of uncoordinated rearrangement, which conceivably will lead to a reduction in pore size around NTDs (isn’t it?). It is therefore not intuitive to understand why a conformation with a smaller pore size represents an “active” state.

      We believe the uncoordinated arrangement of NTDs is dynamic, allowing for potential variations in pore size during the activated conformation. Alternatively, NTD movement may be coupled with conformational changes in TM1 and the extracellular domain, which in turn could alter the electrostatic properties of the permeation pathway. We believe a functional study exploring this mechanism would be more appropriately presented as a separate study.

      (5) Can the authors provide a positive control for these negative results presented in Fig S1B and C?

      The positive results are presented in Fig. 1D and E.

      (6) Raw images in Fig S6 and Fig S7 should contain units of measurement.

      Thank you for pointing this out.

      (7) It may be beneficial to show the superposition between primed state and activated state in both protomer and overall structure. In addition, superposition between primed state and PDB 7F8J.

      We attempted to superimpose the cryo-EM maps; however, visually highlighting the differences in figure format proved challenging. Higher-resolution maps would allow for model building, which would more effectively convey these distinctions.

      (8) Including particles number in each class in Fig S8 panel C and D would help in evaluating the quality of classification.

      Noted.

      (9) A table for cryo-EM statistics should be included.

      Thanks, noted.

      (10) n values are often provided as a range within legends but it would be better to provide individual values for each dataset. In many figures you can see most of the data points, which is great, but it would be easy to add n values to the plots themselves, perhaps in parentheses above the data points.

      While we agree that transparency is essential, adding n-values to each graph would make some figures less clear and potentially harder to interpret in this case. We believe that the dot plots, n-value range, and statistical analysis provide adequate support for our claims.

      (11) The way caspase activation of Panx channels is presented in the introduction could be viewed as dismissive or inflammatory for those who have studied that mechanism. We think the caspase activation literature is quite convincing and there is no need to be dismissive when pointing out that there are good reasons to believe that other mechanisms of activation likely exist. We encourage you to revise the introduction accordingly.

      Thank you for this comment. Although we intended to support the caspase activation mechanism in our introduction, we understand that the reviewer’s interpretation indicates a need for clarification. We hope the revised introduction removes any perception of dismissiveness.

      (12) Why is the patient data in Fig 4F normalized differently than everything else? Once the above issues with mVenus quenching data are clarified, it would be good to be systematic and use the same approach here.

      For Fig. 4F, we used a distinct normalization method to account for substantial day-to-day variation in experiments involving body fluids. Notably, we did not apply this normalization to other experimental panels due to their considerably lower day-to-day variation.

      (13) What was the rationale for using the structure from ref 35 in the docking task?

      The docking task utilized the human orthologue with a flipped-up NTD. We believe that this flipped-up conformation is likely the active form that responds to lysolipids. As our functional experiments primarily use the human orthologue for biological relevance, this structure choice is consistent. Our docking data shows that LPC does not dock at this site when using a construct with the downward-flipped NTD.

      (14) Perhaps better to refer to double Asn ‘substitutions’ rather than as ‘mutations’ because that makes one think they are Asn in the wt protein.

      Done.

      (15) From Fig S1, we gather that Panx2 is much larger than Panx1 and 3. If that is the case, it’s worth noting that to readers somewhere.

      We have added the molecular weight of each subtype in the figure legend.

      (16) Please provide holding voltages and zero current levels in all figures presenting currents.

      We provided holding voltages. However, the zero current levels vary among the examples presented, making direct comparisons difficult. Since we are comparing currents with and without LPC, we believe that indicating zero current levels is unnecessary for this study.

      (17) While the authors successfully establish lysophospholipid-gating of Panx1 and Panx2, Panx3 appears unaffected. It may be advisable to be more specific in the title of the article.

      We are uncertain whether Panx3 is unaffected by lysophospholipids, as we have not observed activation of this subtype under any tested conditions.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate ligand and protein-binding processes in GPCRs (including dimerization) by the multiple walker supervised molecular dynamics method. The paper is interesting and it is very well written.

      Strengths:

      The authors' method is a powerful tool to gain insight on the structural basis for the pharmacology of G protein-coupled receptors.

      We thank the Reviewer for the positive comment on the manuscript and the proposed methods.

      Reviewer #2 (Public review):

      The study by Deganutti and co-workers is a methodological report on an adaptive sampling approach, multiple walker supervised molecular dynamics (mwSuMD), which represents an improved version of the previous SuMD.

      Case-studies concern complex conformational transitions in a number of G protein Coupled Receptors (GPCRs) involving long time-scale motions such as binding-unbinding and collective motions of domains or portions. GPCRs are specialized GEFs (guanine nucleotide exchange factors) of heterotrimeric Gα proteins of the Ras GTPase superfamily. They constitute the largest superfamily of membrane proteins and are of central biomedical relevance as privileged targets of currently marketed drugs.

      MwSuMD was exploited to address:

      a) binding and unbinding of the arginine-vasopressin (AVP) cyclic peptide agonist to the V2 vasopressin receptor (V2R);

      b) molecular recognition of the β2-adrenergic receptor (β2-AR) and heterotrimeric GDP-bound Gs protein;

      c) molecular recognition of the A1-adenosine receptor (A1R) and palmitoylated and geranylgeranylated membrane-anchored heterotrimeric GDP-bound Gi protein;

      d) the whole process of GDP release from membrane-anchored heterotrimeric Gs following interaction with the glucagon-like peptide 1 receptor (GLP1R), converted to the active state following interaction with the orthosteric non-peptide agonist danuglipron.

      The revised version has improved clarity and rigor compared to the original also thanks to the reduction in the number of complex case studies treated superficially.

      The mwSuMD method is solid and valuable, has wide applicability and is compatible with the most world-widely used MD engines. It may be of interest to the computational structural biology community.

      The huge amount of high-resolution data on GPCRs makes those systems suitable, although challenging, for method validation and development.

      While the approach is less energy-biased than other enhanced sampling methods, knowledge, at the atomic detail, of binding sites/interfaces and conformational states is needed to define the supervised metrics, the higher the resolution of such metrics is the more accurate the outcome is expected to be. Definition of the metrics is a user- and system-dependent process.

      We thank the Reviewer for the positive comment on the revised manuscript and mwSuMD. We agree that the choice of supervised metrics is user- and system-dependent. We aim to improve this aspect in the future with the aid of interpretable machine learning.

      Reviewer #3 (Public review):

      Summary:

      In the present work Deganutti et al. report a structural study on GPCR functional dynamics using a computational approach called supervised molecular dynamics.

      Strengths:

      The study has potential to provide novel insight into GPCR functionality. Example is the interaction between D344 and R385 identified during the Gs coupling by GLP-1R. However, validation of the findings, even computationally through for instance in silico mutagenesis study, is advisable.

      Weaknesses:

      No significant advance of the existing structural data on GPCR and GPCR/G protein coupling is provided. Most of the results are reproductions of the previously reported structures.

      The method at the focus of our study (mwSuMD) is an enhancement of supervised molecular dynamics that allows supervising two metrics at the same time and uses a score, rather than a tabu-like algorithm, for handling the simulation. Further changes are the seeding of parallel short replicas (walkers) rather than a series of short simulations, and the software implementation on different MD engines (e.g. Acemd, OpenMM, NAMD, Gromacs).
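      The walker-seeding idea described in this response can be illustrated with a toy loop. This is a minimal sketch under loud assumptions and not the published mwSuMD implementation: `run_walker` is a hypothetical stand-in that randomly perturbs two numbers instead of running a short MD replica, and the `score` function (summed relative decrease of both metrics) is an invented example rather than the score used in the paper.

```python
import random
from typing import List, Tuple

State = Tuple[float, float]  # two supervised metrics, e.g. a distance and an RMSD


def run_walker(state: State, rng: random.Random, step: float = 1.0) -> State:
    """Stand-in for one short replica: perturb both metrics, clamped at zero."""
    return tuple(max(0.0, x + rng.uniform(-step, step)) for x in state)


def score(old: State, new: State) -> float:
    """Hypothetical score: summed relative decrease of both metrics (higher is better)."""
    return sum((o - n) / o for o, n in zip(old, new) if o != 0)


def supervise(start: State, n_walkers: int = 5, n_batches: int = 20,
              seed: int = 0) -> List[State]:
    """Seed a batch of walkers from the current state and continue from the best one."""
    rng = random.Random(seed)
    state = start
    history = [state]
    for _ in range(n_batches):
        walkers = [run_walker(state, rng) for _ in range(n_walkers)]
        # The best-scoring walker of the batch seeds the next batch.
        state = max(walkers, key=lambda w: score(history[-1], w))
        history.append(state)
    return history


if __name__ == "__main__":
    trajectory = supervise((20.0, 10.0))
    print(trajectory[0], "->", trajectory[-1])
```

      The point of the sketch is the control flow: several short replicas start from the same state, a single score ranks them on both metrics at once, and the best replica seeds the next batch. How the metrics are chosen remains user- and system-dependent, as noted above.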

      We agree with the Reviewer that experimental validation of the findings would be advisable, in line with any computational prediction. We are positive that future studies from our group employing mwSuMD will inform mutagenesis and BRET-based experiments.

      Reviewer #2 (Recommendations for the authors):

      As for GLP1R, I remain convinced that the 7LCI would have been better as a reference for all simulations than 7LCJ, also because 7LCI holds a slightly more complete ECD.

      We agree that 7LCI would have been a better starting point than 7LCJ for simulations because it presents the stalk region, contrary to 7LCJ. However, we do not think it would have influenced the output, because the stalk is the most flexible segment of GLP1R, and any initial conformation is usually not retained during MD simulations.

      Please correct everywhere the definition of the 6LN2 structure of GLP1R as ligand-free or apo, because that structure is in fact bound to a negative allosteric modulator docked on the cytosolic end of helix 6.

      We thank the reviewer for this clarification. The text has been modified accordingly.

      As for the beta2-AR, the "full-length" AlphaFold model downloaded from the GPCRdb is not an intermediate active state because it is very similar to the receptor in the 3SN6 complex with Gs. Please, eliminate the inappropriate and speculative adjective "intermediate".

      We have changed “intermediate” to “not fully active”, which is less speculative since full activation can be achieved only in the presence of the G protein.

      Incidentally, in that model, the C-tail, eliminated by the authors, is completely wrong and occupies the G protein binding site. It is not clear to me why the authors preferred to use an AlphaFold model as an input of simulations rather than a high-resolution structural model, e.g. 4LDO. Perhaps the reason is that all ICL regions, including ICL3, were modeled by AlphaFold, even if with low confidence. I disagree with that choice.

      We understand the reviewer’s point of view. Should we have simulated an “equilibrium” receptor-ligand complex, we would have made the same choice. However, the conformational changes occurring during G protein binding are so substantial that the starting conformation of the receptor becomes almost irrelevant as long as a sensible structure is used.

      Reviewer #3 (Recommendations for the authors):

      The revised version of the manuscript is more concise, focusing only on two systems. However, the authors have responded superficially to the reviewers' comments, merely deleting sections of text, making minor corrections, or adding small additions to the text. In particular, the authors have not addressed the main critical points raised by both Reviewer 2 and Reviewer 3. 

      For example, the RMSD values for the binding of PF06882961 to GLP-1R remain high, raising doubts about the predictive capabilities of the method, at least for this type of system.

      What is the RMSD of the ligand relative to the experimental pose obtained in the simulations? This value must be included in the text.

      We have added this piece of information about PF06882961 RMSD in the text, which on page 6 now reads “We simulated the binding of PF06882961, reaching an RMSD to its bound conformation in 7LCJ of 3.79 ± 0.83 Å (computed on the second half of the merged trajectory, superimposing on GLP-1R Cα atoms of TMD residues 150 to 390), using multistep supervision on different system metrics (Figure 2) to model the structural hallmark of GLP-1R activation (Video S5, Video S6).”
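
      For readers who wish to reproduce this kind of summary, a minimal sketch of the calculation is given below. It assumes that every frame has already been superimposed on the reference using the receptor Cα atoms, so that the ligand RMSD is computed without fitting the ligand itself. The function names, the toy coordinates and the use of the population standard deviation are assumptions made for illustration, not the analysis script used for the manuscript.

```python
import math
import statistics

Coords = list[tuple[float, float, float]]


def rmsd(coords: Coords, reference: Coords) -> float:
    """Root-mean-square deviation between two equally ordered atom lists."""
    if not coords or len(coords) != len(reference):
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(coords, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(coords))


def second_half_summary(frames: list[Coords], reference: Coords) -> tuple[float, float]:
    """Mean and population SD of the RMSD over the second half of a trajectory."""
    values = [rmsd(frame, reference) for frame in frames[len(frames) // 2:]]
    return statistics.mean(values), statistics.pstdev(values)


# Toy two-atom "ligand"; each frame is the reference shifted along z (in Å).
reference_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifts = [5.0, 4.0, 1.0, 3.0]
trajectory = [[(x, y, z + d) for (x, y, z) in reference_pose] for d in shifts]

mean_rmsd, sd_rmsd = second_half_summary(trajectory, reference_pose)
print(f"{mean_rmsd:.2f} ± {sd_rmsd:.2f} Å")
```

      With real data, the frames and the reference pose would come from the merged trajectory and the experimental structure after superposition in an MD analysis package.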

      Similarly, the activation mechanism of GLP-1R is only partially simulated.

      Furthermore, it is not particularly meaningful to justify the high RMSD values of the SuMD simulations for the binding of Gs to GLP-1R by comparing them with those reported under unbiased MD conditions. "Replica 2, in particular, well reproduced the cryo-EM GLP-1R complex as suggested by RMSDs to 7LCI of 7.59 ± 1.58 Å, 12.15 ± 2.13 Å, and 13.73 ± 2.24 Å for Gα, Gβ, and Gγ respectively. Such values are not far from the RMSDs measured in our previous simulations of GLP-1R in complex with Gs and GLP-1<sup>49</sup> (Gα = 6.18 ± 2.40 Å; Gβ = 7.22 ± 3.12 Å; Gγ = 9.30 ± 3.65 Å), which indicates overall higher flexibility of Gβ and Gγ compared to Gα, which acts as a sort of fulcrum bound to GLP-1R."

      Without delving into the accuracy of the various calculations, the authors should acknowledge that comparing protein structures with such high RMSD values has no meaningful significance in terms of convergence toward the same three-dimensional structure.

      The text has been edited to accommodate the reviewer’s suggestion and still give the readers the measure of the high flexibility of Gs bound to GLP-1R. It now reads “Such values do not support convergence with the static experimental structure but are not far from the RMSDs measured in our previous simulations of GLP-1R in complex with G<sub>s</sub> and GLP-1 (G<sub>α</sub> = 6.18 ± 2.40 Å; G<sub>β</sub> = 7.22 ± 3.12 Å; G<sub>γ</sub> = 9.30 ± 3.65 Å), which indicates overall higher flexibility of G<sub>β</sub> and G<sub>γ</sub> compared to G<sub>α</sub>, which acts as a sort of fulcrum bound to GLP-1R.”

      Have the authors simulated the binding of the Gs protein using the experimentally active structure of GLP-1R in complex with the ligand PF06882961 (PDB ID 7LCJ)? Such a simulation would be useful to assess the quality of the binding simulation of Gs to the GLP1R/PF06882961 complex obtained from the previous SuMD.

      We considered performing the Gs binding simulation to the active structure of GLP-1R.

      However, the GLP-1R (and other class B receptors) fully active state, as reported in 7LCJ, depends on the presence of the Gs and can be reached only upon effector coupling. Since it is unlikely that the unbound receptor is already in the fully active state, we reasoned that considering it as a starting point for Gs binding simulations would have been an artifact.

      An example of the insufficient depth of the authors' replies can be seen in their response: "We note that among the suggested references, only Mafi et al report about a simulated G protein (in a pre-formed complex) and none of the work sampled TM6 rotation without input of energy."

      This statement is inaccurate. For instance, D'Amore et al. (Chem 2024, doi: 10.1016/j.chempr.2024.08.004) simulated Gs coupling to A2A as well as TM6 rotation, as did Maria-Solano and Choi (eLife 2023, doi: 10.7554/eLife.90773.1). The former employed path collective variables metadynamics, which is not cited in the introduction or the discussion, despite its relevance to the methodologies mentioned.

      Respectfully, our previous reply is correct, as all of the mentioned articles used enhanced (energy-biased) approaches, so the claim “none of the work sampled TM6 rotation without input of energy” stands. The reference to D’Amore et al. (published after the previous round of reviews of this manuscript) has been added to the introduction; we thank the reviewer for pointing it out. 

      Additionally, SuMD employs a tabu algorithm that applies geometric supervision to the simulation, serving as an alternative approach to enhancing sampling compared to the "input of energy" techniques as called by the authors. A fair discussion should clearly acknowledge this aspect of the SuMD methodology.

      We have now specified in the Methods that a tabù-like algorithm is part of SuMD, which, despite being the parent technique of mwSuMD, is not the focus of the present work. We provide extended references for readers interested in SuMD. mwSuMD, on the other hand, does not use a tabù-like algorithm but rather a continuative approach based on a score to select the best walker for each batch, as described in the Methods.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper contains what could be described as a "classic" approach towards evaluating a novel taste stimulus in an animal model, including standard behavioral tests (some with nerve transections), taste nerve physiology, and immunocytochemistry of taste cells of the tongue. The stimulus being tested is ornithine, from a class of stimuli called "kokumi" (in terms of human taste); these kokumi stimuli appear to enhance other canonical tastes, increasing what are essentially hedonic attributes of other stimuli. The mechanism for ornithine detection is thought to be GPRC6A receptors expressed in taste cells. The authors showed evidence for this in an earlier paper with mice; this paper evaluates ornithine taste in a rat model, and comes to a similar conclusion, albeit with some small differences between the two rodent species.

      Strengths:

      The data show effects of ornithine on taste/intake in laboratory rats: In two-bottle and briefer intake tests, adding ornithine results in higher intake of most, but not all, stimuli tested. Bilateral chorda tympani (CT) nerve cuts or the addition of GPRC6A antagonists decreased or eliminated these effects. Ornithine also evoked responses by itself in the CT nerve, but mainly at higher concentrations; at lower concentrations it potentiated the response to monosodium glutamate. Finally, immunocytochemistry of taste cell expression indicated that GPRC6A was expressed predominantly in the anterior tongue, and co-localized (to a small extent) with only IP3R3, indicative of expression in a subset of type II taste receptor cells.

      Weaknesses:

      As the authors are aware, it is difficult to assess a complex human taste with complex attributes, such as kokumi, in an animal model. In these experiments they attempt to uncover mechanistic insights about how ornithine potentiates other stimuli by using a variety of established experimental approaches in rats. They partially succeed by finding evidence that GPRC6A may mediate effects of ornithine when it is used at lower concentrations. In the revision they have scaled back their interpretations accordingly. A supplementary experiment measuring certain aspects of the effects of ornithine added to Miso soup in human subjects is included for the express purpose of establishing that the kokumi sensation of a complex solution is enhanced by ornithine; however, they do not use any such complex solutions in the rat studies. Moreover, the sample size of the human experiment is (still) small - it really doesn't belong in the same manuscript with the rat studies.

      Despite the reviewer’s suggestion, we would like to include the human sensory experiment. Our rationale is that we must first demonstrate that the kokumi of miso soup is enhanced by the addition of ornithine, which is then followed by basic animal experiments to investigate the underlying mechanisms of kokumi in humans.

      We did not present the additive effects of ornithine on miso soup in the present rat study because our previous companion paper (Fig. 1B in Mizuta et al., 2021, Ref. #26) already confirmed that miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) was statistically significantly (P < 0.001) preferred to plain miso soup by mice.

      Furthermore, we believe that our sample size (n = 22) is comparable to those employed in other studies. For example, the representative kokumi studies by Ohsu et al. (Ref. #9), Ueda et al. (Ref. #10), Shibata et al. (Ref. #20), Dunkel et al. (Ref. #37), and Yang et al. (Ref. #44) used sample sizes of 20, 19, 17, 9, and 15, respectively.

      Reviewer #2 (Public review):

      Summary:

      The authors used rats to determine the receptor for a food-related perception (kokumi) that has been characterized in humans. They employ a combination of behavioral, electrophysiological, and immunohistochemical results to support their conclusion that ornithine-mediated kokumi effects are mediated by the GPRC6A receptor. They complemented the rat data with some human psychophysical data. I find the results intriguing, but believe that the authors overinterpret their data.

      Strengths:

      The authors provide compelling evidence that ornithine enhances the palatability of several chemical stimuli (i.e., IMP, MSG, MPG, Intralipos, sucrose, NaCl, quinine). Ornithine also increases CT nerve responses to MSG. Additionally, the authors provide evidence that the effects of ornithine are mediated by GPRC6A, a G-protein-coupled receptor family C group 6 subtype A, and that this receptor is expressed primarily in fungiform taste buds. Taken together, these results indicate that ornithine enhances the palatability of multiple taste stimuli in rats and that the enhancement is mediated, at least in part, within fungiform taste buds. This is an important finding that could stand on its own. The question of whether ornithine produces these effects by eliciting kokumi-like perceptions (see below) should be presented as speculation in the Discussion section.

      Weaknesses:

      I am still unconvinced that the measurements in rats reflect the "kokumi" taste percept described in humans. The authors conducted long-term preference tests, 10-min avidity tests and whole chorda tympani (CT) nerve recordings. None of these procedures specifically model features of "kokumi" perception in humans, which (according to the authors) include increasing "intensity of whole complex tastes (rich flavor with complex tastes), mouthfulness (spread of taste and flavor throughout the oral cavity), and persistence of taste (lingering flavor)." While it may be possible to develop behavioral assays in rats (or mice) that effectively model kokumi taste perception in humans, the authors have not made any effort to do so. As a result, I do not think that the rat data provide support for the main conclusion of the study--that "ornithine is a kokumi substance and GPRC6A is a novel kokumi receptor."

      Kokumi can be assessed in humans, as demonstrated by the enhanced kokumi perception observed when miso soup is supplemented with ornithine (Fig. S1). Currently, we do not have a method to measure the same kokumi perception in animals. However, in the two-bottle preference test, our previous companion paper (Fig. 1B in Mizuta et al. 2021, Ref. #26) confirmed that miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) was statistically significantly (P < 0.001) preferred over plain miso soup by mice.

      Of the three attributes of kokumi perception in humans, the “intensity of whole complex tastes (rich flavor with complex tastes)” was partly demonstrated in the present rat study. In contrast, “mouthfulness (the spread of taste and flavor throughout the oral cavity)” could not be directly detected in animals and had to be inferred in the Discussion. “Persistence of taste (lingering flavor)” was evident at least in the chorda tympani responses; however, because the tongue was rinsed 30 seconds after the onset of stimulation, the duration of the response was not fully recorded.

      It is well accepted in sensory physiology that the stronger the stimulus, the larger the tonic response—and consequently, the longer it takes for the response to return to baseline. For example, Kawasaki et al. (2016, Ref. #45) clearly showed that the duration of sensation increased proportionally with the concentration of MSG, lactic acid, and NaCl in human sensory tests. The essence of this explanation has been incorporated into the Discussion (p. 12).

      Why are the authors hypothesizing that the primary impacts of ornithine are on the peripheral taste system? While the CT recordings provide support for peripheral taste enhancement, they do not rule out the possibility of additional central enhancement. Indeed, based on the definition of human kokumi described above, it is likely that the effects of kokumi stimuli in humans are mediated at least in part by the central flavor system.

      We agree with the reviewer’s comment. Our CT recordings indicate that the effects of kokumi stimuli on taste enhancement occur primarily at the peripheral taste organs. The resulting sensory signals are then transmitted to the brain, where they are processed by the central gustatory and flavor systems, ultimately giving rise to kokumi attributes. This central involvement in kokumi perception is discussed on page 12. Although kokumi substances exert their effects at low concentrations—levels at which the substance itself (e.g., ornithine) does not become more favorable or (in the case of γ-Glu-Val-Gly) exhibits no distinct taste—we cannot rule out the possibility that even faint taste signals from these substances are transmitted to the brain and interact with other taste modalities.

      The authors include (in the supplemental data section) a pilot study that examined the impact of ornithine on a variety of subjective measures of flavor perception in humans. The presence of this pilot study within the larger rat study does not really make sense. While I agree with the authors that there is value in conducting parallel tests in both humans and rodents, I think that this can only be done effectively when the measurements in both species are the same. For this reason, I recommend that the human data be published in a separate article.

      Despite the reviewer’s suggestion, we intend to include the human sensory experiment. Our rationale is that we must first demonstrate that the kokumi of miso soup is enhanced by the addition of ornithine, and then follow up with basic animal experiments to investigate the potential underlying mechanisms of kokumi in humans.

      In our previous companion paper (Fig. 1B in Mizuta et al., 2021, Ref. #26), we confirmed with statistical significance (P < 0.001) that mice preferred miso soup supplemented with 3 mM L-ornithine (but not D-ornithine) over plain miso soup. However, as explained in our response to Reviewer #2’s first concern (in the Public review), it is difficult to measure two of the three kokumi attributes—aside from the “intensity of whole complex tastes (rich flavor with complex tastes)”—in animal models.

      The authors indicated on several occasions (e.g., see Abstract) that ornithine produced "synergistic" effects on the CT nerve response to chemical stimuli. "Synergy" is used to describe a situation where two stimuli produce an effect that is greater than the sum of the response to each stimulus alone (i.e., 2 + 2 = 5). As far as I can tell, the CT recordings in Fig. 3 do not reflect a synergism.

      We appreciate your comments regarding the definition of synergy. In Fig. 5 (not Fig. 3), please note the difference in the scaling of the ordinate between Fig. 5D (ornithine responses) and Fig. 5E (MSG responses). When both responses are presented on the same scale, it becomes evident that the response to 1 mM ornithine is negligibly small compared to the MSG response, which clearly indicates that the response to the mixture of MSG and 1 mM ornithine exceeds the sum of the individual responses to MSG and 1 mM ornithine. Therefore, we have described the effect as “synergistic” rather than “additive.” The same observation applies to the mouse experiments in our previous companion paper (Fig. 8 in Mizuta et al. 2021, Ref. #26), where synergistic effects are similarly demonstrated by graphical representation. We have also added the following sentence to the legend of Fig. 5:

      “Note the different scaling of the ordinate in (D) and (E).”
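
      The criterion discussed here (a mixture response greater than the sum of the responses to each component alone) can be written out as a small check. The sketch below is illustrative only: the function name, the tolerance parameter and the response values (arbitrary units) are hypothetical and are not taken from Fig. 5.

```python
def classify_mixture(resp_a: float, resp_b: float, resp_mix: float,
                     tol: float = 0.0) -> str:
    """Compare a mixture response with the sum of the component responses.

    Returns "synergistic" if the mixture exceeds the sum by more than `tol`,
    "sub-additive" if it falls short by more than `tol`, else "additive".
    """
    expected = resp_a + resp_b
    if resp_mix > expected + tol:
        return "synergistic"
    if resp_mix < expected - tol:
        return "sub-additive"
    return "additive"


# The reviewer's shorthand: 2 + 2 = 5 is synergy, 2 + 2 = 4 is additivity.
print(classify_mixture(2, 2, 5))
print(classify_mixture(2, 2, 4))

# Hypothetical responses in arbitrary units: a large response to one
# stimulus, a negligible response to the other, and a larger mixture response.
print(classify_mixture(1.0, 0.05, 1.6))
```

      In practice the comparison would be made on responses normalized to a common scale, with a tolerance or a statistical test reflecting the variability between recordings.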

      Reviewer #3 (Public review):

      Summary:

      In this study the authors set out to investigate whether GPRC6A mediates kokumi taste initiated by the amino acid L-ornithine. They used Wistar rats, a standard laboratory strain, as the primary model and also performed an informative taste test in humans, in which miso soup was supplemented with various concentrations of L-ornithine. The findings are valuable and overall the evidence is solid. L-Ornithine should be considered to be a useful test substance in future studies of kokumi taste and the class C G protein coupled receptor known as GPRC6A (C6A) along with its homolog, the calcium-sensing receptor (CaSR) should be considered candidate mediators of kokumi taste. The researchers confirmed in rats their previous work on Ornithine and C6A in mice (Mizuta et al Nutrients 2021).

      Strengths:

      The overall experimental design is solid based on two bottle preference tests in rats. After determining the optimal concentration for L-Ornithine (1 mM) in the presence of MSG, it was added to various tastants including: inosine 5'-monophosphate (IMP); monosodium glutamate (MSG); mono-potassium glutamate (MPG); intralipos (a soybean oil emulsion); sucrose; sodium chloride (NaCl; salt); citric acid (sour) and quinine hydrochloride (bitter). Robust effects of ornithine were observed in the cases of IMP, MSG, MPG and sucrose; and little or no effects were observed in the cases of sodium chloride, citric acid, and quinine HCl. The researchers then focused on the preference for Ornithine-containing MSG solutions. Inclusion of the C6A inhibitors Calindol (0.3 mM but not 0.06 mM) or the gallate derivative EGCG (0.1 mM but not 0.03 mM) eliminated the preference for solutions that contained Ornithine in addition to MSG. The researchers next performed transections of the chorda tympani nerves (with sham operation controls) in anesthetized rats to identify a role of the chorda tympani branches of the facial nerves (cranial nerve VII) in the preference for Ornithine-containing MSG solutions. This finding implicates the anterior half to two-thirds of the tongue in ornithine-induced kokumi taste. They then used electrical recordings from intact chorda tympani nerves in anesthetized rats to demonstrate that ornithine enhanced MSG-induced responses following the application of tastants to the anterior surface of the tongue. They went on to show that this enhanced response was insensitive to amiloride, selected to inhibit 'salt tastant' responses mediated by the epithelial Na+ channel, but eliminated by Calindol. Finally, they performed immunohistochemistry on sections of rat tongue demonstrating C6A-positive spindle-shaped cells in fungiform papillae that partially overlapped in their distribution with the IP3 type-3 receptor, used as a marker of Type-II cells, but not with (i) gustducin, the G protein partner of Tas1 receptors (T1Rs), used as a marker of a subset of type-II cells; or (ii) 5-HT (serotonin) and Synaptosome-associated protein 25 kDa (SNAP-25) used as markers of Type-III cells.

      At least two other receptors in addition to C6A might mediate taste responses to ornithine: (i) the CaSR, which binds and responds to multiple L-amino acids (Conigrave et al, PNAS 2000), and which has been previously reported to mediate kokumi taste (Ohsu et al., JBC 2010) as well as responses to Ornithine (Shin et al., Cell Signaling 2020); and (ii) T1R1/T1R3 heterodimers which also respond to L-amino acids and exhibit enhanced responses to IMP (Nelson et al., Nature 2001). These alternatives are appropriately discussed and, taken together, the experimental results favor the authors' interpretation that C6A mediates the Ornithine responses. The authors provide preliminary data in Suppl. 3 for the possibility of co-expression of C6A with the CaSR.

      Weaknesses:

      The authors point out that animal models pose some difficulties of interpretation in studies of taste and raise the possibility in the Discussion that umami substances may enhance the taste response to ornithine (Line 271, Page 9).

      Ornithine and umami substances interact to produce synergistic effects in both directions—ornithine enhances responses to umami substances, and vice versa. These effects may depend on the concentrations used, as described in the Discussion (pp. 9–10). Further studies are required to clarify the precise nature of this interaction.

      One issue that is not addressed, and could be usefully addressed in the Discussion, relates to the potential effects of kokumi substances on the threshold concentrations of key tastants such as glutamate. Thus, an extension of taste distribution to additional areas of the mouth (previously referred to as 'mouthfulness') and persistence of taste/flavor responses (previously referred to as 'continuity') could arise from a reduction in the threshold concentrations of umami and other substances that evoke taste responses.

      Thank you for this important suggestion. If ornithine reduces the threshold concentrations of tastants—including glutamate—and enhances their suprathreshold responses, then adding ornithine may activate additional taste cells. This effect could explain kokumi attributes such as an “extension of taste distribution” and possibly the “persistence of responses.” As shown in Fig. 2, the lowest concentrations used for each taste stimulus are near or below the thresholds, which indicates that threshold concentrations are reduced—especially for MSG and MPG. We have incorporated this possibility into the Discussion as follows (p.12):

      “Kokumi substances may reduce the threshold concentrations of tastants as well as increase their suprathreshold responses. Once the threshold concentrations are lowered, additional taste cells in the oral cavity become activated, and this information is transmitted to the brain. As a result, the brain perceives this input as coming from a wider area of the mouth.”

      The status of one of the compounds used as an inhibitor of C6A, the gallate derivative EGCG, as a potential inhibitor of the CaSR or T1R1/T1R3 is unknown. It would have been helpful to show that a specific inhibitor of the CaSR failed to block the ornithine response.

      Thank you for this important comment. We attempted to identify a specific inhibitor of CaSR. Although we considered using NPS-2143—a commonly used CaSR inhibitor—it is known to also inhibit GPRC6A. We agree that using a specific CaSR inhibitor would be beneficial and plan to pursue this in future studies.

      It would have been helpful to include a positive control kokumi substance in the two bottle preference experiment (e.g., one of the known gamma glutamyl peptides such as gamma-glu-Val-Gly or glutathione), to compare the relative potencies of the control kokumi compound and Ornithine, and to compare the sensitivities of the two responses to C6A and CaSR inhibitors.

      We agree with this comment. In retrospect, it may have been advantageous to directly compare the potencies of CaSR and GPRC6A agonists in enhancing taste preferences—and to evaluate the sensitivity of these preferences to CaSR and GPRC6A antagonists. However, we did not include γ-Glu-Val-Gly in the present study because we have already reported its supplementation effects on the ingestion of basic taste solutions in rats using the same methodology in a separate paper (Yamamoto and Mizuta, 2022, Ref. #25). The results from both studies are compared in the Discussion (p. 11).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major:

      I am not convinced by the authors' arguments for including the human data. I appreciate their efforts in adding a few (5) subjects and improving the description, but it still feels like it is shoehorned into this paper, and would be better published as a different manuscript.

      This human study is short, but it is complete rather than preliminary. Our rationale for including the human data as supplementary information is given in our responses to the reviewer’s Public review.

      Minor concerns:

      Page 3 paragraph 1: Suggest "contributing to palatability".

      Thank you for this suggestion. We have rewritten the text as follows:

      “…, the brain further processes these sensations to evoke emotional responses, contributing to palatability or unpleasantness”.

      Page 4 paragraph 2: The text still assumes that "kokumi" is a meaningful descriptor for what rodents experience. Re-wording the following sentence like this could help:

      "Neuroscientific studies in mice and rats provide evidence that glutathione and γ-Glu-Val-Gly activate CaSRs, and modify behavioral responses to other tastants in a way that may correspond to kokumi taste as experienced by humans. However, to our..."

      Or something similar.

      Thank you for this suggestion. We have rewritten the sentence according to your suggestion as follows:

      “Neuroscientific studies (23,25,30) in mice and rats provide evidence that glutathione and γ-Glu-Val-Gly activate CaSRs, and modify behavioral responses to other tastants in a way that may correspond to kokumi as experienced by humans”.

      Page 7 paragraph 1 - put the concentrations of Calindol and EGCG used (in the physiology exps) in the text.

      We have added the concentrations: “300 µM calindol and 100 µM EGCG”.

      Reviewer #2 (Recommendations for the authors):

      I have included all of my recommendations in the public review section.

      Reviewer #3 (Recommendations for the authors):

      Although the definitions of 'thickness', 'mouthfulness' and 'continuity' have been revised very helpfully in the Introduction, 'mouthfulness' reappears at other points in the MS e.g., Page 4, Results, Line 3; Page 9, Line 3. It is best replaced by the new definition in these other locations too.

      We wish to clarify that our revised text stated, “…to clarify that kokumi attributes are inherently gustatory, in the present study we use the terms ‘intensity of whole complex tastes (rich flavor with complex tastes)’ instead of ‘thickness,’ ‘mouthfulness (spread of taste and flavor throughout the oral cavity)’ instead of ‘mouthfulness,’ and ‘persistence of taste (lingering flavor)’ instead of ‘continuity.’” The term “mouthfulness” was retained in our text, though we provided a more specific explanation. In the re-revised version, we have added “(spread of taste in the oral cavity)” immediately after “mouthfulness.”

      I doubt that many scientific readers will be familiar with the term 'intragemmal nerve fibres' (Page 8, Line 4). It is used appropriately but it would be helpful to briefly define/explain it.

      We have added an explanation as follows:

      “… intragemmal nerve fibers, which are nerve processes that extend directly into the structure of the taste bud to transmit taste signals from taste cells to the brain.”

      I previously pointed out the overlap between the CaSR's amino acid (AA) and gamma-glutamyl-peptide binding site. I was surprised by the authors' response which appeared to miss the point being made. It was based on the impacts of selected mutations in the receptor's Venus FlyTrap domain (Broadhead JBC 2011) on the responses to AAs and glutathione analogs. The significantly more active analog, S-methylglutathione is of additional interest because, like glutathione itself, it is present in mammalian body fluids. My apologies to the authors for not more carefully explaining this point.

      Thank you for this comment. Both CaSR and GPRC6A are recognized as broad-spectrum amino acid sensors; however, their agonist profiles differ. Aromatic amino acids preferentially activate CaSR, whereas basic amino acids tend to activate GPRC6A. For instance, among basic amino acids, ornithine is a potent and specific activator of GPRC6A, while γ-Glu-Val-Gly in addition to amino acids is a high-potency activator of CaSR. It remains unclear how effectively ornithine activates CaSR and whether γ-glutamyl peptides also activate GPRC6A. These questions should be addressed in future studies.

    1. Author response:

      We thank the reviewers for their evaluation, for helpful suggestions to improve clarity and accuracy, and for their positive reception of the manuscript. We will incorporate their suggestions in a revised manuscript. Here, we respond to their major comments. 

      The reviewers suggest that a molecular study of Hofstenia’s reproductive systems would be beneficial, as would mechanistic explanations for its unusual reproductive behavior. We agree with the reviewers that both of these would be interesting avenues, although we think this is outside the scope of this current manuscript. This manuscript studies growth and reproductive dynamics in acoels, and establishes a foundation to study its underlying molecular, developmental, and physiological machinery. 

      Our previous molecular work, using scRNAseq and FISH, identified several germline markers. Here, we show that two of them are specific markers of testes and ovaries, respectively. This, together with our new anatomical data, allows us to identify the expression domains of most of these other markers more clearly. Some markers may be expressed in a presumptive common germline that eventually splits into an anterior male germline and posterior female germline. We agree with the reviewers that understanding the dynamics of germline differentiation and its molecular genetic underpinnings would be very interesting, and we hope to address this in future work. 

      As the reviewers note, we do not understand how sperm is stored, how the worm’s own sperm can travel to its ovaries to enable selfing, or how eggs in the ovaries travel within the body. We agree with the reviewers that understanding these processes would be very interesting. Our histological and molecular work so far has been unable to find tube-like structures or other cavities for storage and transport. Potentially, cells could move within the parenchyma. Explaining these events will require substantial effort (including mechanistic studies of cell behavior and ultrastructural studies that the reviewers suggest), and we hope to do this in future work. 

      We agree with Reviewer 1 that it is interesting that Piwi-1 expression is only observed in the ovaries and not in the testes, which is unusual given its broad germline expression in many taxa. Although there are several possible explanations for this finding (e.g., Piwi-1 could be expressed at low levels in the male germline, other Piwi proteins could be expressed in the male germline, or Piwi may play roles in male germline progenitors that are not co-located with maturing sperm), we do not currently know why this is so, and we will discuss these possibilities in our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors demonstrate impairments induced by a high cholesterol diet on GLP-1R dependent glucoregulation in vivo as well as an improvement after reduction in cholesterol synthesis with simvastatin in pancreatic islets. They also map sites of cholesterol high occupancy and residence time on active versus inactive GLP-1Rs using coarse-grained molecular dynamics (cgMD) simulations and screened for key residues selected from these sites and performed detailed analyses of the effects of mutating one of these residues, Val229, to alanine on GLP-1R interactions with cholesterol, plasma membrane behaviour, clustering, trafficking and signalling in pancreatic beta cells and primary islets, and describe an improved insulin secretion profile for the V229A mutant receptor.

      These are extensive and very impressive studies indeed. I am impressed with the tireless effort exerted to understand the details of molecular mechanisms involved in the effects of cholesterol for GLP-1 activation of its receptor. In general, the study is convincing, the manuscript well written and the data well presented.

      Some of the changes are small and insignificant which makes one wonder how important the observations are. For instance, in figure 2 E (which is difficult to interpret anyway because the data are presented in percent, conveniently hiding the absolute results) does not show a significant result of the cyclodextrin except for insignificant increases in basal secretion. That is not identical to impairment of GLP-1 receptor signaling!

      We assume that the reviewer refers to Figure 1E, where we show the percentage of insulin secretion in response to 11 mM glucose +/- exendin-4 stimulation in mouse islets pretreated with vehicle or MβCD loaded with 20 mM cholesterol. While we concur with the reviewer that the effect in this case is triggered by increased basal insulin secretion at 11 mM glucose, exendin-4 appears to no longer compensate for this increase by proportionally amplifying insulin responses in cholesterol-loaded islets, leading to a significantly decreased fold increase in exendin-4-induced insulin secretion under these circumstances, as shown in Figure 1F. We interpret these results as a defect in the GLP-1R capacity to amplify insulin secretion beyond the basal level to the same extent as in vehicle conditions. An alternative explanation is that there is a maximum level of insulin secretion in our cells, and 11 mM glucose + exendin-4 stimulation gets close to that value. With the increasing effect of cholesterol-loaded MβCD on basal secretion at 11 mM glucose, exendin-4 stimulation would then appear to work less well.
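
      The distinction drawn above between absolute secretion and fold increase can be illustrated with a minimal sketch. The function name and all numbers below are hypothetical and chosen only to show the arithmetic; they are not values from the study. The sketch assumes that fold increase is simply stimulated secretion divided by basal secretion.

```python
def fold_increase(stimulated: float, basal: float) -> float:
    """Return stimulated secretion expressed as a multiple of basal secretion."""
    if basal <= 0:
        raise ValueError("basal secretion must be positive")
    return stimulated / basal


# Hypothetical percentages of insulin secreted (illustrative only, not study data).
vehicle = {"basal": 1.0, "exendin4": 3.0}
cholesterol_loaded = {"basal": 2.0, "exendin4": 3.0}

vehicle_fold = fold_increase(vehicle["exendin4"], vehicle["basal"])
loaded_fold = fold_increase(cholesterol_loaded["exendin4"], cholesterol_loaded["basal"])

# Same stimulated level, higher basal level: the fold increase is halved.
print(vehicle_fold, loaded_fold)
```

      In this toy case the stimulated value is unchanged, yet the fold increase falls because the basal value doubled. The same arithmetic would also be consistent with the ceiling explanation considered above, which is why an independent estimate of maximal secretion is needed to separate the two readings.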

      We have performed a simple experiment to investigate this possibility: insulin secretion following stimulation with a secretagogue cocktail (20 mM glucose, 30 mM KCl, 10 µM FSK and 100 µM IBMX) in islets +/- MβCD/cholesterol loading to determine if maximal stimulation had been reached or not in our original experiment. This experiment, now included in Supplementary Figure 1C, demonstrates that insulin secretion can increase up to ~4% (from ~2%) in our islets, supporting our initial conclusion. We have also included absolute insulin concentrations as well as percentages of secretion for all the experiments included in the study in the new Supplementary File 1 to improve the completeness of the report.

      To me the most important experiment of them all is the simvastatin experiment, but the results rest on very few numbers and there is a large variation. Apparently, in a previous study using more extensive reduction in cholesterol the opposite response was detected casting doubt on the significance of the current observation. I agree with the authors that the use of cyclodextrin may have been associated with other changes in plasma membrane structure than cholesterol depletion at the GLP-1 receptor.

      We agree with the reviewer that the insulin secretion results in vehicle versus LPDS/simvastatin treated mouse islets (Figure 1H, I) are relatively variable. We have therefore performed 2 extra biological repeats of this experiment (for a total n of 7). Results now show a significant increase in exendin-4-stimulated secretion with no change in basal secretion in islets pre-incubated with LPDS/simvastatin.  

      The entire discussion regarding the importance of cholesterol would benefit tremendously from studies of GLP-1 induced insulin secretion in people with different cholesterol levels before and after treatment with cholesterol-lowering agents. I suspect that such a study would not reveal major differences.

      We agree with the reviewer that such a study would be highly relevant. While this falls outside the scope of the present paper, we encourage other researchers with access to clinical data on GLP-1R agonist responses in individuals taking cholesterol-lowering agents to share their results with the scientific community. We have highlighted this point in the paper discussion to emphasise the importance of more research in this area.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript the authors provided a proof of concept that they can identify and mutate a cholesterol-binding site of a high-interest class B receptor, the GLP-1R, and functionally characterize the impact of this mutation on receptor behavior in the membrane and downstream signaling with the intent that similar methods can be useful to optimize small molecules that as ligands or allosteric modulators of GLP-1R can improve the therapeutic tools targeting this signaling system.

      Strengths:

      The majority of results on receptor behavior are elucidated in INS-1 cells expressing the wt or mutant GLP-1R, with one experiment translating the findings to primary mouse beta-cells. I think this paper lays a very strong foundation to characterize this mutation and does a good job discussing how complex cholesterol-receptor interactions can be (ie lower cholesterol binding to V229A GLP-1R, yet increased segregation to lipid rafts). Table 1 and Figure 9 are very beneficial to summarize the findings. The lower interaction with cholesterol and lower membrane diffusion in V229A GLP-1R resembles the reduced diffusion of wt GLP-1R with simv-induced cholesterol reductions, although by presumably decreasing the cholesterol available to interact with wt GLP-1R. This could be interesting to see if lowering cholesterol alters other behaviors of wt GLP-1R that look similar to V229A GLP-1R. I further wonder if the authors expect that increased cholesterol content of islets (with loading of MβCD saturated with cholesterol or high-cholesterol diets) would elevate baseline GLP-1R membrane diffusion, and if a more broad relationship can be drawn between GLP-1R membrane movement and downstream signaling.

      Membrane diffusion experiments are difficult to perform in intact islets as our method requires cell monolayers for RICS analysis. We however agree that it is of interest to investigate if cholesterol loading affects GLP-1R diffusion. To this end, we have performed further RICS analysis in INS-1 832/3 SNAP/FLAG-hGLP-1R cells pretreated with vehicle or MβCD loaded with 20 mM cholesterol (new Supplementary Figures 1D and 1E). Interestingly, results show significantly increased plasma membrane diffusion of exendin-4-stimulated receptors, with no change in basal diffusion, following MβCD/cholesterol loading. This behaviour differs from that of the V229A mutant receptor which shows reduced diffusion under basal conditions, a pattern that mimics that of the WT receptor under low cholesterol conditions (by pre-treatment with LPDS/simvastatin).

      Weaknesses:

      I think there are no obvious weaknesses in this manuscript and overall, I believe the authors achieved their aims and have demonstrated the importance of cholesterol interactions on GLP-1R functioning in beta-cells. I think this paper will be of interest to many physiologists who may not be familiar with many of the techniques used in this paper and the authors largely do a good job explaining the goals of using each method in the results section.

      The intent of some methods, for example the Laurdan probe studies, are better expanded in the discussion.

      We have expanded on the rationale behind the use of Laurdan to assess behaviours of lipid packed membrane nanodomains in the methods, results and discussion of the revised manuscript.

      I found it unclear what exactly was being measured to assess 'receptor activity' in Fig 7E and F.

      Figures 7E and F refer to bystander complementation assays measuring the recruitment of nanobody 37 (Nb37)-SmBiT, which binds to active Gas, to either the plasma membrane (labelled with KRAS CAAX motif-LgBiT), or to endosomes (labelled with Endofin FYVE domain-LgBiT) in response to GLP-1R stimulation with exendin-4. This assay therefore measures GLP-1R activation specifically at each of these two subcellular locations. We have included a schematic of this assay in the new Supplementary Figure 3 to clarify the aim of these experiments.

      Certainly many follow-up experiments are possible from these initial findings and of primary interest is how this mutation affects insulin homeostasis in vivo under different physiological conditions. One of the biggest pathologies in insulin homeostasis in obesity/t2d is an elevation of baseline insulin release (as modeled in Fig 1E) that renders the fold-change in glucose stimulated insulin levels lower and physiologically less effective. No difference in primary mouse islet baseline insulin secretion was seen here but I wonder if this mutation would ameliorate diet-induced baseline hyperinsulinemia.

      We concur with the reviewer that it would be interesting to determine the effects of the GLP1R V229A mutation on insulin secretion responses under diet-induced metabolic stress conditions. While performing in vivo experiments on glucoregulation in mice harbouring the V229A mutation falls outside the scope of the present study, we have included ex vivo insulin secretion experiments in islets from GLP-1R KO mice transduced with adenoviruses expressing SNAP/FLAG-hGLP-1R WT or V229A and subsequently treated with vehicle versus MβCD loaded with 20 mM cholesterol to replicate the conditions of Figure 1E in the new Supplementary Figure 4.

      I would have liked to see the actual islet cholesterol content after 5wks high-cholesterol diet measured to correlate increased cholesterol load with diminished glucose-stimulated insulin. While not necessary for this paper, a comparison of islet cholesterol content after this cholesterol diet vs the more typical 60% HFD used in obesity research would be beneficial for GLP-1 physiology research broadly to take these findings into consideration with model choice.

      We have included these data in Supplementary Figure 1A.

      Another area to further investigate is does this mutation alter ex4 interaction/affinity/time of binding to GLP-1 or are all of the described findings due to changes in behavior and function of the receptor?

      To answer this question, we have performed binding affinity experiments in INS-1 832/3 SNAP/FLAG-hGLP-1R WT versus V229A cells, which show no differences (new Supplementary Figure 2D).

      Lastly, I wonder if V229A would have the same impact in a different cell type, especially in neurons? How similar are the cholesterol profiles of beta-cells and neurons? How this mutation (and future developed small molecules) may affect satiation, gut motility, and especially nausea, are of high translational interest. The comparison is drawn in the discussion between this mutation and ex4-phe1 to have biased agonism towards Gs over beta-arrestin signaling. Ex4-phe1 lowered pica behavior (a proxy for nausea) in the authors previously co-authored paper on ex4-phe1 (PMID 29686402) and I think drawing a parallel for this mutation or modification of cholesterol binding to potentially mitigate nausea is worth highlighting.

      While experiments in neurons are outside the scope of the present study, we have added this worthy point to the discussion and hypothesise on possible effects of GLP-1R mutants with modified cholesterol interactions on central GLP-1R actions in the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      There are no line numbers

      These have now been added.

      Abstract: "Cholesterol is a plasma membrane enriched lipid" - sorry for being finicky, but shouldn't this read; "a lipid often enriched in plasma membranes"

      We have modified the abstract to state that: “Cholesterol is a lipid enriched at the plasma membrane”.

      p. 4 "Moreover, islets extracted from high cholesterol-fed mice". How do you "extract islets"?

      We have replaced the term “extracted” with “isolated”. Islet isolation is described in the paper methods section.

      p. 4 The sentence "These effects were accompanied by decreased GLP-1R plasma membrane diffusion under vehicle conditions, measured by Raster Image Correlation Spectroscopy (RICS) in rat insulinoma INS-1 832/3 cells with endogenous GLP-1R deleted [INS-1 832/3 GLP-1R KO cells (27)] stably expressing SNAP/FLAG-tagged human GLP-1R (SNAP/FLAG-hGLP-1R), an effect that is normally triggered by agonist binding (28), as also observed here (Supplementary Figure 1C, D)" is a masterpiece of complexity. Perhaps breaking up would facilitate reading?

      This paragraph has now been modified in the revised manuscript.

      p. 5. I cannot evaluate the "coarse grain molecular dynamics" studies.

      Reviewer #2 (Recommendations for the authors):

      I view this as an excellent manuscript with very comprehensive work and clear translational relevance. I don't think any further experiments are needed for the scope outlined in this manuscript. The discussion is already long but a short postulation on how this may translate to GLP-1R-cholesterol interactions in other cell types, specifically neurons with the intent on manipulating satiation and nausea, could be worthwhile.

      This has now been added.

      The only thing for readability I would suggest is a sentence in the results mentioning why you're doing the Laurdan analysis, and what is the output for assessing 'receptor activity' in the membrane and endosomes.

      Both points have now been added.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      The authors examine CD8 T cell selective pressure in early HCV infection using. They propose that after initial CD8-T mediated loss of virus fitness, in some participants around 3 months after infection, HCV acquires compensatory mutations and improved fitness leading to virus progression.

      Strengths:

      Throughout the paper, the authors apply well-established approaches in studies of acute to chronic HIV infection for studies of HCV infection. This lends rigor to the authors' work.

      Weaknesses:

      (1) The Discussion could be strengthened by a direct discussion of the parallels/differences in results between HIV and HCV infections in terms of T cell selection, entropy, and fitness.

      We have added a direct discussion of the parallels/differences between HIV and HCV throughout the discussion including at lines 308 – 310 and 315 -327.

      Lines 308-310: “In fact, many parallels can be drawn between HIV infections and HCV infections in the context of emerging viral species that escape T cell immune responses.”

      Lines: 315-327: “One major difference between HCV and HIV infection is the event where patients infected with HCV have an approximately 25% chance to naturally clear the infection as opposed to just achieving viral control in HIV infections. Here, we probed the underlying mechanism, and questioned how the host immune response and HCV mutational landscape can allow the virus to escape the immune system. To understand this process, taking inspiration from HIV studies (24), a quantitative analysis of viral fitness relative to viral haplotypes was conducted using longitudinal samples to investigate whether a similar phenomenon was identified in HCV infections for our cohort for patients who progress to chronic infection. We observed a decrease in population average relative fitness in the period of <90DPI with respect to the T/F virus in chronic subjects infected with HCV. The decrease in fitness correlated positively with IFN-γ ELISPOT responses and negatively with SE indicating that CD8+ T-cell responses drove the rapid emergence of immune escape variants, which initially reduced viral fitness. This is similarly reflected in HIV infected patients where strong CD8+ T-cell responses drove quicker emergence of immune escape variants, often accompanied by compensatory mutations (24).”

      (2) In the Results, please describe the Barton model functionality and why the fitness landscape model was most applicable for studies of HCV viral diversity.

      This has been added to the introduction section rather than Results as we feel that it is more appropriate to show why it is most applicable to HCV viral diversity in the background section of the manuscript. We write at lines 77-90:

      “Barton et al.’s [23] approach to understand HIV mutational landscape resulting in immune escape had two fundamental points: 1) replicative fitness depends on the virus sequence and the requirement to consider the effect of co-occurring mutations, and 2) evolutionary dynamics (e.g. host immune pressure). Together they pave the way to predict the mutational space in which viral strains can change given the unique immune pressure exerted by individuals infected with HIV. This model fits well with the pathology of HCV infection. For instance, HIV and HCV are both RNA viruses with rapid rate of mutation. Additionally, like HIV, chronic infection is an outcome for HCV infected individuals, however, unlike HIV, there is a 25% probability that individuals infected with HCV will naturally clear the virus. Previously published studies [9] have shown that HIV also goes through a genetic bottleneck which results in the T/F virus losing dominance and replaced by a chronic subtype, identified by the immune escape mutations. The concepts in Barton’s model and its functionality to assess the fitness based on the complex interaction between viral sequence composition and host immune response is also applicable to early HCV infection.”

      (3) Recognize the caveats of the HCV mapping data presented.

      We have now recognized the caveats of the HCV mapping data at lines 354-356: “While our findings here are promising, it should be recognized that although the bioinformatics tool (iedb_tool.py) proved useful for identifying potential epitopes, there could be epitopes that are not predicted or false-positive from the output which could lead to missing real epitopes”.

      (4) The authors should provide more data or cite publications to support the authors' statement that HCV-specific CD8 T cell responses decline following infection.

      We have now clarified at lines 352-353 that the decline was toward “selected epitopes that showed evidence of escape”.

      Furthermore, we have cited two publications at line 352 that support our statement.

      (5) Similarly, as the authors' measurements of HCV T and humoral responses were not exhaustive, the text describing the decline of T cells with the onset of humoral immunity needs caveats or more rigorous discussion with citations (Discussion lines 319-321).

      We have now added a caveat in the discussion at lines 357-360 which reads

      “In conclusion, this study provides initial insights into the evolutionary dynamics of HCV, showing that an early, robust CD8+ T-cell response without nAbs strongly selects against the T/F virus, enabling it to escape and establish chronic infection. However, these findings are preliminary and not exhaustive, warranting further investigation to fully understand these dynamics. “

      (6) What role does antigen drive play in these data, for both T cell and antibody induction?

      It is possible that HLA-adapted mutations could limit CD8 T cell induction if the HLAs were matched between transmission pairs, as has been shown previously for HIV (https://doi.org/10.1371/journal.ppat.1008177) with some data for HCV (https://journals.asm.org/doi/10.1128/jvi.00912-06). However, we apologise as we are not entirely sure that this is what the reviewer is asking for in this instance.

      (7) Figure 3 - are the X and Y axes wrongly labelled? The Divergent ranges of population fitness do not make sense.

      Our apologies, there was an error with the plot in Figure 3 and the X and Y axis were wrongly labelled. This has now been resolved.

      (8) Figure S3 - is the green line, average virus fitness?

      This has now been clarified in Figure S3.

      (9) Use the term antibody epitopes, not B cell epitopes.

      We now use the term antibody epitopes throughout the manuscript.

      Reviewer #1 (Recommendations for the authors):

      Recommendations for improving the writing and presentation:

      (1) Introduction:

      Line 52: 'carry mutations B/T cell epitopes'. Two points

      i) These are antibody epitopes (and antibody selection) not B cell epitopes

      We have corrected this sentence at line 55 which now reads: “carry mutations within epitopes targeted by B cells and CD8+ T cells”.

      ii) To avoid confusion, add text that mutations were generated following selection in the donor.

      For HCV, it is unclear if mutations are generated following selection or have been occurring in low frequencies outside detection range. Only when selection by host immune pressure arises do the potentially low-frequency variants become dominant. However, we do acknowledge it is potentially misleading to only mention new variants replacing the transmitted/founder population. We have modified the sentence at line 52 to read:

      “At this stage either an existing variant that was occurring in low-frequency outside detection range or an existing variant with novel mutations generated following immune selection is observed in those who progress to chronic infection”

      - Lines 51-56: Human studies of escape and progression are associative, not causative as implied.

      Correct, the evidence linking escape and progression is currently associative. We have now corrected these lines so that they no longer suggest causation.

      - Line 65: Suggest you clarify your meaning of 'easier'?

      This sentence, now at line 72, has been modified to: “subtype 1b viruses have a higher probability to evade immune responses”

      (2) Results:

      - Line 147: Barton model (ref'd in Intro) is directly referred to here but not referenced.

      The reference has been added.

      - The authors should cite previous HIV literature describing associations between the rate of escape and Shannon Entropy e.g. the interaction between immunodominance, entropy, and rate of escape in acute HIV infection was described in Liu et al JCI 2013 but is not cited.

      We have now cited previous HIV research at line 147-151, adding Liu et al:

      “Additionally, the interaction between immunodominance, entropy, and escape rate in acute HIV infection has been described, where immunodominance during acute infection was the most significant factor influencing CD8+ T cell pressure, with higher immunodominance linked to faster escape (27). In contrast, lower epitope entropy slowed escape, and together, immunodominance and entropy explained half of the variability in escape timing (27).”
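
      For readers less familiar with the entropy measure referred to here, the following is a minimal sketch of Shannon entropy computed for a single aligned sequence position. The function name and the example residues are hypothetical, and the use of base-2 logarithms (entropy in bits) is an assumption; the manuscript's own implementation and logarithm base may differ.

```python
import math
from collections import Counter


def shannon_entropy(residues: str) -> float:
    """Shannon entropy (in bits) of the residues observed at one aligned position."""
    total = len(residues)
    if total == 0:
        raise ValueError("at least one residue is required")
    counts = Counter(residues)
    # H = sum over residues of p * log2(1 / p), with p the observed frequency.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical columns from an alignment of viral haplotypes (illustrative only).
conserved_site = "AAAA"  # one residue only: no variability
variable_site = "AAVV"   # two residues at equal frequency

conserved_entropy = shannon_entropy(conserved_site)
variable_entropy = shannon_entropy(variable_site)
print(conserved_entropy, variable_entropy)
```

      A fully conserved position has an entropy of zero, and entropy rises as more residues are observed at comparable frequencies, which is the sense in which low-entropy epitopes are described as more constrained.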

      - Line 319: The authors suggest that HCV-specific CD8 T cell response declines following early infection. On what are they basing this statement? The authors show their measured T cell responses decline but their approach uses selected epitopes and they are therefore unable to assess total HCV T cell response in participants (Where there is no escape, are T cell magnitudes maintained or do they still decline?). Can the authors cite other studies to support their statement?

      We have now clarified that the decline was toward “selected epitopes that showed evidence of escape”. Furthermore, we also cite two studies to support our findings.

      - Throughout the authors talk in terms of CD8 T cells but the ELISpot detects both CD4 and CD8 T cell responses. I suggest the authors be more explicit that their peptide design (9-10mers) is strongly biased to only the detection of CD8 T cells.

      To make this clearer and more explicit we have now added to the methods section at line 433-435:

      “While the ELISpot assay detects responses from both CD4 and CD8 T cells, our peptide design (9-10mers) is strongly biased toward CD8 T-cell detection. We have therefore interpreted ELISpot responses primarily in terms of CD8 T-cell activity.”

      - The points made in lines 307-321 could be more succinct

      We have now edited the discussion (lines 307 – 321) to make the points more succinct (now lines 307-323).

      Minor corrections to text, figures:

      - Figure 2: suggest making the Key bigger and more obvious.

      We have now made the key bigger and more obvious.

      - Figure 3 A & D....is there an error on the X-axis...are you really reporting ELISpot data of < 1 spot/10^6? Perhaps the X and Y axes are wrongly labelled?

      Our apologies, there was an error with the plot in Figure 3 and the X and Y axis were wrongly labelled. This has now been resolved.

      - Figure 5: As this is PBMC, remove CD8 from the description of ELISpot. 

      We have now removed CD8 from the description of ELISpot in both Figure 5 and Figure S3.

      Reviewer #2 (Public review):

      Summary:

      In this work, Walker and collaborators study the evolution of hepatitis C virus (HCV) in a cohort of 14 subjects with recent HCV infections. They focus in particular on the interplay between HCV and the immune system, including the accumulation of mutations in CD8+ T cell epitopes to evade immunity. Using a computational method to estimate the fitness effects of HCV mutations, they find that viral fitness declines as the virus mutates to escape T-cell responses. In long-term infections, they found that viral fitness can rebound later in infection as HCV accumulates additional mutations.

      Strengths:

      This work is especially interesting for several reasons. Individuals who developed chronic infections were followed over fairly long times and, in most cases, samples of the viral population were obtained frequently. At the same time, the authors also measured CD8+ T cell and antibody responses to infection. The analysis of HCV evolution focused not only on variation within particular CD8+ T cell epitopes but also on the surrounding proteins. Overall, this work is notable for integrating information about HCV sequence evolution, host immune responses, and computational metrics of fitness and sequence variation. The evidence presented by the authors supports the main conclusions of the paper described above.

      Weaknesses:

      One notable weakness of the present version of the manuscript is a lack of clarity in the description of the method of fitness estimation. In the previous studies of HIV and HCV cited by the authors, fitness models were derived by fitting the model (equation between lines 435 and 436) to viral sequence data collected from many different individuals. In the section "Estimating survival fitness of viral variants," it is not entirely clear if Walker and collaborators have used the same approach (i.e., fitting the model to viral sequences from many individuals), or whether they have used the sequence data from each individual to produce models that are specific to each subject. If it is the former, then the authors should describe where these sequences were obtained and the statistics of the data.

      If the fitness models were inferred based on the data from each subject, then more explanation is needed. In prior work, the use of these models to estimate fitness was justified by arguing that sequence variants common to many individuals are likely to be well-tolerated by the virus, while ones that are rare are likely to have high fitness costs. This justification is less clear for sequence variation within a single individual, where the viral population has had much less time to "explore" the sequence landscape. Nonetheless, there is precedent for this kind of analysis (see, e.g., Asti et al., PLoS Comput Biol 2016). If the authors took this approach, then this point should be discussed clearly and contrasted with the prior HIV and HCV studies.

      We thank the reviewer for pointing out the weakness in our explanation and description of the fitness model. The model was generated using publicly released viral sequences, as described in a previous publication by Hart et al. 2015. The T/F virus from each subject chronically infected with HCV in our cohort was passed to the model of Hart et al. to estimate the initial viral fitness of the T/F variant. The fitness of the subvariants of the viral population at subsequent time points in each subject was also estimated using the same model (for each subtype). For each subject, these subvariant fitness values were divided by the fitness value of the initial T/F virus (hence the relative fitness at the earliest time points, with no mutations in the epitope regions, had a value of 1.000). All other fitness values are therefore expressed relative to the T/F variant.

      We have further clarified this point in the methods section “Estimating survival fitness of viral variant” to better describe how the data of the model was sourced (Lines 465-499).

      To add to the reviewer’s point, we agree that sequence variants common to many individuals are likely to be well-tolerated by the virus. We observed this in our findings: our data suggested that immune escape variants tended to revert to variants that were closer to the global consensus strain. Our previous publications have indicated that T/F viruses were variants that were “fit” for transmission between hosts; especially in cases where the donor was a chronic progressor, a single T/F is often observed. Progression to immune escape and adaptation to chronic infection in the new host involves an intermediate process of genetic expansion via replication, followed by a bottleneck event under immune pressure, during which overall fitness (overall survivability, including replication and exploring immune escape pathways) can change. Under this assumption, we asked whether the observation reported in HIV studies (i.e. mutation landscapes that allow HIV adaptation to the host) also applies to HCV infections. Furthermore, the cohort used in this study is a rare one in which patients were tracked from uninfected, to HCV RNA+, to seroconversion, and finally to either clearing the virus or progressing to chronic infection. It is therefore important to understand the difference between clearance and chronic progression.

      Another important point for clarification is the definition of fitness. In the abstract, the authors note that multiple studies have shown that viral escape variants can have reduced fitness, "diminishing the survival of the viral strain within the host, and the capacity of the variant to survive future transmission events." It would be helpful to distinguish between this notion of fitness, which has sometimes been referred to as "intrinsic fitness," and a definition of fitness that describes the success of different viral strains within a particular individual, including the potential benefits of immune escape. In many cases, escape variants displace variants without escape mutations, showing that their ability to survive and replicate within a specific host is actually improved relative to variants without escape mutations. However, escape mutations may harm the virus's ability to replicate in other contexts. Given the major role that fitness plays in this paper, it would be helpful for readers to clearly discuss how fitness is defined and to distinguish between fitness within and between hosts (potentially also mentioning relevant concepts such as "transmission fitness," i.e., the relative ability of a particular variant to establish new infections).

      Thank you for pointing out the weakness of our definition of fitness. We have now clarified this at multiple sections of the paper: In the abstract at lines 18-21 and in the introduction at lines 64-69.

      These read:

      Lines 18-21: “However, this generic definition can be further divided into two categories where intrinsic fitness describes the viral fitness without the influence of any immune pressure and effective fitness considers both intrinsic fitness with the influence of host immune pressure.”

      Lines 64-69: “This generic definition of fitness can be further divided into intrinsic fitness (also referred to as replicative fitness), where the fitness of sequence composition of the variant is estimated without the influence of host immune pressure. On the other hand, effective fitness (from here on referred to as viral fitness) considers fundamental intrinsic fitness with host immune pressure acting as a selective force to direct mutational landscape (19)[REF], which subsequently influences future transmission events as it dictates which subvariants remain in the quasispecies.”

      One concern about the analysis is in the test of Shannon entropy as a way to quantify the rate of escape. The authors describe computing the entropy at multiple time points preceding the time when escape mutations were observed to fix in a particular epitope. Which entropy values were used to compare with the escape rate? If just the time point directly preceding the fixation of escape mutations, could escape mutations have already been present in the population at that time, increasing the entropy and thus drawing an association with the rate of escape? It would also be helpful for readers to include a definition of entropy in the methods, in addition to a reference to prior work. For example, it is not clear what is being averaged when "average SE" is described.

      We thank the reviewer for pointing out the ambiguity in our description of average SE. This has been rectified by adding more information in the methods section (Lines 397 to 400):

      “Briefly, SE was calculated using the frequency of occurrence of SNPs based on per codon position, this was further normalized by the length of the number of codons in the sequence which made up respective protein. An average SE value was calculated for each time point in each protein region for all subjects until the fixation event.”

      To answer the reviewer’s question, we computed entropy at multiple time points preceding the observation of the escape mutation. The escape rate was calculated for the epitopes targeted by the immune response. We compared the average SE (based on the change at each codon position, then normalised by protein length) of the region containing the epitope with the time it took to reach fixation. We observed that if the protein region had a higher rate of variation (i.e. higher average SE), then an immune escape epitope also emerged more quickly. Since we took SE from the very first time point and all subsequent time points until fixation, we do not think that escape mutations already present in the population would alter the association with the rate of escape, especially as these escape mutations were rarely observed at early time points. It is likely that the escape variant could be observed because of host immune pressure; the SE therefore reflects the freedom to explore the mutation landscape. If the region were highly restrictive, such that any mutation would result in a failed variant, then we should observe relatively lower values of average SE. In other words, the more variability that is allowed in the region, the greater the probability that the virus will find a solution to achieve immune escape.
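A per-codon Shannon entropy averaged over a protein region, as described above, could be computed roughly as follows. This is a minimal sketch, not the authors' implementation: it assumes aligned, in-frame nucleotide sequences of equal length, uses codon frequencies at each position, and takes logarithms in base 2. The function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def codon_entropy(codons):
    """Shannon entropy (in bits) of the codons seen at one aligned position."""
    total = len(codons)
    counts = Counter(codons)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def average_entropy(sequences):
    """Mean per-codon entropy over a region, i.e. the summed entropy
    divided by the number of codons in the region."""
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if length % 3 != 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned and in frame")
    n_codons = length // 3
    total = 0.0
    for i in range(n_codons):
        column = [s[3 * i:3 * i + 3] for s in sequences]
        total += codon_entropy(column)
    return total / n_codons


# Toy example: the first codon is invariant, the second is split evenly.
toy = ["ATGAAA", "ATGAAG"]
toy_average = average_entropy(toy)
```

In the toy example the invariant codon contributes 0 bits and the evenly split codon contributes 1 bit, so the region average is 0.5. Computing this value at each time point up to fixation gives the kind of series the response compares against time to escape.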

      Reviewer #2 (Recommendations for the authors):

      In addition to the main points above, there are a few minor comments and suggestions about the presentation of the data.

      (1) It's not clear how, precisely, the model-based fitness has been calculated and normalized. It would be helpful for the authors to describe this explicitly. Especially in Figure 3, the plotted fitness values lie in dramatically different ranges, which should be explained (maybe this is just an error with the plot?).

      We have now clarified how the model-based fitness has been calculated and normalized in the methods section “Estimating survival fitness of viral variants” at lines 465-472.

      “The model used for estimating viral fitness has been previously described by Hart et al. (19). Briefly, the original approach used HCV subtype 1a sequences to generate the model for the NS5B protein region. To update the model for other regions (NS3 and NS2) as well as other HCV subtypes in this study, subtype 1b and subtype 3a sequences were extracted from the Los Alamos National Laboratory HCV database. An intrinsic fitness model was first generated for each subtype for the NS5B, NS3 and NS2 regions of the HCV polyprotein. Then, using longitudinally sequenced data from patients chronically infected with HCV as well as clinically documented immune escape to describe high viral fitness variants, we generated estimates of the viral fitness for subjects chronically infected with HCV in our cohort.”

      Our apologies, there was an error with the plot in Figure 3. This has now been resolved.

      (2) In different plots, the authors show every pairwise comparison of ELISPOT values, population fitness, average SE, and rate of escape. It may be helpful to make one large matrix of plots that shows all of these pairwise comparisons at the same time. This could make it clear how all the variables are associated with one another. To be clear, this is a suggestion that the authors can consider at their discretion.

      Thank you for the suggestion to create a matrix of plots for pairwise comparisons. While this approach could indeed clarify variable associations, implementing it is outside the scope of this project. We appreciate the idea and may consider it in future studies as we continue to expand on this work.

    1. Think for a minute about consequentialism. On this view, we should do whatever results in the best outcomes for the most people. One of the classic forms of this approach is utilitarianism, which says we should do whatever maximizes ‘utility’ for most people. Confusingly, ‘utility’ in this case does not refer to usefulness, but to a sort of combo of happiness and wellbeing. When a utilitarian tries to decide how to act, they take stock of all the probable outcomes, and what sort of ‘utility’ or happiness will be brought about for all parties involved. This process is sometimes referred to by philosophers as ‘utility calculus’. When I am trying to calculate the expected net utility gain from a projected set of actions, I am engaging in ‘utility calculus’ (or, in normal words, utility calculations). Now, there are many reasons one might be suspicious about utilitarianism as a cheat code for acting morally, but let’s assume for a moment that utilitarianism is the best way to go. When you undertake your utility calculus, you are, in essence, gathering and responding to data about the projected outcomes of a situation. This means that how you gather your data will affect what data you come up with. If you have really comprehensive data about potential outcomes, then your utility calculus will be more complicated, but will also be more realistic. On the other hand, if you have only partial data, the results of your utility calculus may become skewed. If you think about the potential impact of a set of actions on all the people you know and like, but fail to consider the impact on people you do not happen to know, then you might think those actions would lead to a huge gain in utility, or happiness. When we think about how data is used online, the idea of a utility calculus can help remind us to check whether we’ve really got enough data about how all parties might be impacted by some actions. 
Even if you are not a utilitarian, it is good to remind ourselves to check that we’ve got all the data before doing our calculus. This can be especially important when there is a strong social trend to overlook certain data. Such trends, which philosophers call ‘pernicious ignorance’, enable us to overlook inconvenient bits of data to make our utility calculus easier or more likely to turn out in favor of a preferred course of action.

      These paragraphs tell us that it is important to collect comprehensive data, think about the impact on relevant parties, and consider the groups that are easily overlooked before making decisions. This reminds me of cyberbullying in society today. Lots of people only listen to one side of the story. They get emotionally stirred up by comments on a popular influencer’s social post and end up participating in online bullying against the other group. This kind of behavior stems from a lack of critical thinking and an unwillingness to investigate the truth from multiple perspectives, which can have serious consequences.

    2. Can you think of an example of pernicious ignorance in social media interaction? What’s something that we might often prefer to overlook when deciding what is important?

      An example in social media is internet violence. Most people justify public shaming as upholding justice but overlook the psychological and emotional harm inflicted on the target. By focusing on the enjoyment of calling someone out at any cost while ignoring the long-term impact on the target's well-being, users forget the harm that their actions may eventually cause.

  4. social-media-ethics-automation.github.io
    1. Shannon Bond. Twitter takes Elon Musk to court, accusing him of bad faith and hypocrisy. NPR, July 2022. URL: https://www.npr.org/transcripts/1111032233 (visited on 2023-11-24).

      In this NPR transcript we learn that Elon Musk has basically broken a contract with Twitter, as he "secretly stopped taking action to buy twitter." The two people in the transcript describe him as changing his mind whenever he feels like it and trashing the company. It is interesting that even billionaires seem to think their wishy-washy thinking will not harm others or their reputations, when it actually does.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Revision Plan (Response to Reviewers)

      1. General Statements [optional]

      Response: We are pleased the reviewers appreciate the power of this novel proteomics methodology that allowed us to uncover new depths on the complexity of the ribosome ubiquitination code in response to stress. We also appreciate that the reviewers think that this is a "very timely" study and "interesting to a broad audience" that can change the models of translation control currently adopted in the field. Characterizing complex cellular processes is critical to advance scientific knowledge and our work is the first of its kind using targeted proteomics methods to unveil the integrated complexity of ribosome ubiquitin signals in eukaryotic systems. We also appreciate the fairness of the comments received and below we offer a comprehensive revision plan substantially addressing the main points raised by the reviewers. According to the reviewers' suggestions, we will also expand our studies to two additional E3 ligases (Mag2 and Not4) known to ubiquitinate ribosomes, which will create an even more complete perspective of ubiquitin roles in translation regulation.

      2. Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors present a potentially powerful proteomics platform using parallel reaction monitoring (PRM) to quantitatively profile ribosomal protein (RP) ubiquitylation, with a focus on yeast under hydrogen peroxide (H₂O₂) stress. This approach robustly identifies both known and novel RP modifications, including basal ubiquitylation events previously undetected, and identifies Hel2-dependent mechanisms. The data support the conclusion that RPs are regulated by a multifaceted ubiquitin code, establishing a good foundation for the study.

      However, the study's focus shifts in a manner that introduces several limitations. Following the rigorous PRM-based analyses, the reliance on Western blotting without replication or quantification (e.g., single-experiment data in Figs. 3-5) significantly weakens the evidence. Experimental design becomes inconsistent, with variable combinations of stressors (H₂O₂, MMS, 4-NQO) and genetic backgrounds (WT, hel2Δ, rad6Δ) that preclude systematic comparisons. For instance, Fig. 3C/E and Fig. 4 omit critical controls (e.g., MMS in Fig. 4, rad6Δ in Fig. 3E), while Fig. 5 conflates distinct variables by comparing H₂O₂-treated rad6Δ with MMS-treated hel2Δ, a design that obscures causal relationships. Furthermore, Fig. 3F highlights that 4-NQO and MMS elicit divergent responses in hel2Δ, undermining the rationale for using these stressors interchangeably. These inconsistencies culminate in a fragmented narrative; attempts to link ISR activation or ribosome stalling to RP ubiquitylation become impossible, leaving the primary takeaway as "stress responses are complex" rather than advancing mechanistic insight.

              __Response:__ We appreciate the evaluation of our work and the recognition that the power of our proteomics method established a good foundation for the study. We also understand the reviewer's concerns and detail below a plan to enhance quantification and increase systematic comparisons. The experiments presented here were conducted with biological replicates, but in several instances we focused on the presence or absence of bands, or on their pattern (mono vs poly-ub), because of the semi-quantitative nature of immunoblots. We will revise the figures and present their quantification and statistical analyses. In addition, we did not intend to use these stressors interchangeably, but instead to use select conditions to highlight the complexity of the stress response. In particular, we followed up with H2O2 *versus* 4-NQO because both chemicals are considered sources of oxidative stress. Even though it is unfeasible to compare every single stress condition in every strain background, in the revised version we will include additional controls to increase the cohesion of the narrative, and expand the comparison between MMS, H2O2, and 4-NQO, as suggested. Details below.
      

      To strengthen the work, the following revisions are essential:

      R1.1. Repeat and quantify immunoblots: All Western blotting data require biological replicates and statistical analysis to support claims.

              __Response:__ As requested, we will display quantification and statistical analysis of the suggested and new immunoblots that will be conducted during the revision period.
      

      R1.3. Remove non-parallel comparisons: The mRNA expression analysis in Fig. 5, which compares dissimilar conditions (e.g., rad6Δ + H₂O₂ vs. hel2Δ + MMS), should be omitted or redesigned to enable direct, strain- and stressor-matched contrasts.

              __Response:__ We will follow the reviewers' suggestion and redesign the analysis to increase consistency and prioritize data under identical conditions. To increase confidence in the mRNA data analysis, we intend to perform follow-up experiments and analyze protein abundance of *ARG proteins* and *CTT1* under different conditions. The remaining data using non-parallel comparisons will be moved to supplemental material and de-emphasized in the final version of the manuscript.
      

      R1.4. Standardize experimental variables: Restructure the study to maintain identical genetic backgrounds and stressors across all figures, enabling systematic interrogation of enzyme- or stress-specific effects on the ubiquitin code.

              __Response:__ To ensure a better comparison across strains and conditions, we will re-run several experiments and focus on our main stress conditions. Specifically:
      
      • 3D: We plan to re-run this experiment and include MMS.

      • 3E: We plan to perform the same panel of experiments in rad6Δ, and display WT data as the main figure.

      • 4A-B: We plan to perform translation output (HPG incorporation) experiments with MMS, as suggested.

      • 4C: We plan to re-run blots for p-eIF2α under MMS for improved comparison.

      Reviewer #1 (Significance (Required)):

      The authors present a potentially powerful proteomics platform using parallel reaction monitoring (PRM) to quantitatively profile ribosomal protein (RP) ubiquitylation, with a focus on yeast under hydrogen peroxide (H₂O₂) stress. This approach robustly identifies both known and novel RP modifications, including basal ubiquitylation events previously undetected, and identifies Hel2-dependent mechanisms. The data support the conclusion that RPs are regulated by a multifaceted ubiquitin code, establishing a good foundation for the study.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this manuscript the authors use a new targeted proteomics approach to quantify site-specific ubiquitin modification across the ribosome before and after oxidative stress. They then validate their findings, following in particular the ubiquitination of Rps20 and Rps3, and extend their analysis to different forms of oxidative stress. Finally, they question the relevance of two known actors of ribosome ubiquitination, Hel2 and Rad6. It is not easy to summarize the observations because the major finding is that the patterns of ribosome ubiquitination occur in a stressor- and enzyme-specific manner (even when considering only oxidative stress). However, the complexity revealed by this study is very relevant for the field, because it underlines that the ubiquitination code of ribosomes is not easy to interpret with regard to translation dynamics, responses to stress, or the players involved. It suggests that some of the models that have generally been adopted probably need to be amended or completed. I am not a proteomics expert, so I cannot comment on the validity of the new proteomics approach, or on whether the methods are appropriately described to reproduce the experiments. However, the follow-up experiments on Rps20 and Rps3 ubiquitination are well performed, nicely controlled and appropriately interpreted.

      Maybe what one can regret is that the authors have limited their analysis to the study of Hel2 and Rad6, and have not included other enzymes that have already been associated with regulation of ribosome ubiquitination, to get a more complete picture. It may not take that much time to test more mutants, but of course there is the risk that, rather than enabling a working model, it might make things even more complex.

              __Response:__ We value the positive evaluation of our work. We also appreciate the notion that it meaningfully expands the knowledge on the complexity of the ribosome ubiquitination code and challenges the current models of translation control, and that the experiments are well performed and nicely controlled. To address the main concern of the reviewer, we will expand our work by studying two additional enzymes involved in ribosome ubiquitination (Mag2 and Not4) and provide a more comprehensive picture of this integrated system. Specifically, we will generate yeast strains deleted for *MAG2* and *NOT4*, and evaluate their impact on ribosome ubiquitination under our main conditions of stress. We will investigate the role of these additional E3s in translation output (HPG incorporation), and in inducing the integrated stress response via phosphorylated eIF2α and Gcn4 expression. Additional follow-up experiments will be performed according to our initial results.
      

      Reviewer #2 (Significance (Required)):

      In recent years, regulation of translation elongation dynamics has emerged as a much more relevant site of control of gene expression than previously envisioned. The ribosome has emerged as a hub for control of stress responses. Therefore this study is certainly very timely and interesting for a broad audience. However, it does fall short of giving any simple picture, and maybe the only point one can question is whether it is interesting to publish a manuscript that concludes that regulation is complicated, without really being able to provide any kind of suggestive model.

      My feeling is nevertheless that it will impact how scientists in the field design their experiments and what they will conclude. It will certainly also drive new experiments and approaches, and lead to investigations on how all the different players in regulation of ribosome modification talk to each other and signal to signaling pathways.

              __Response:__ We appreciate the comments and the balanced view that studies like ours will still be impactful and contribute to a number of fields in multiple and meaningful ways. With the new experiments proposed here, and the use of additional mutants and strains, we intend to propose a more unified model that explains this complex and dynamic relationship.
      

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Recent studies have shown that the ubiquitination of uS3 (Rps3) is crucial for the quality control of nonfunctional rRNA, specifically in the process known as 18S nonfunctional rRNA decay (NRD). Additionally, the ubiquitination of uS10 (Rps20) plays a significant role in ribosome-associated quality control (RQC). However, the dynamics of ribosome ubiquitination in response to oxidative stress are not yet fully understood.

      In this study, the authors developed a targeted proteomics method to quantify the dynamics of ribosome ubiquitination in response to oxidative stress, both relatively and stoichiometrically. They identified 11 ribosomal sites that exhibited increased ubiquitin modification after exposure to hydrogen peroxide (H2O2). These included uS10 and uS3, two known targets of Hel2, which recognizes collided ribosomes and initiates the processes of 18S NRD and translation quality control (RQC). Using isotope-labeled peptides, the researchers demonstrated that these modifications are non-stoichiometric and display significant variability among different peptides.

      Furthermore, the authors explored how specific enzymes in the ubiquitin system affect these modifications and their impact on global translation regulation. They found that uS3 (Rps3) and uS10 (Rps20) were modified differently by various stressors, which in turn influenced the Integrated Stress Response (ISR). The authors suggest that different types of stressors alter the pattern of ubiquitinated ribosomes, with Rad6 and Hel2 potentially competing for specific subpopulations of ribosomes.

      Overall, this study emphasizes the complexity of the ubiquitin ribosomal code. However, further experiments are necessary to validate these findings before publication.

      Major Comments:

      I consider the additional experiments essential to support the claims of the paper.

      R3.1. To understand the roles of ribosome ubiquitination at the specific sites, the authors must perform stressor-specific suppression of global translation, as demonstrated in Figures 4 and 5. This should include the uS10-K6R/K8R and uS3-K212R mutants.

              __Response:__ We understand the importance of the suggested experiment. We have already requested and kindly received strains expressing these mutations, which will reduce the time required to successfully address this point. We will perform our translation and ISR assays, such as those referred to by the reviewer in Figs. 4A-C and 5E, and the results will determine the role of individual ribosome ubiquitination sites in translation control.
      

      R3.2. It is crucial to ensure that experiments are adequately replicated and that statistical analysis is thorough, with precise quantification. For a more accurate comparison between wild-type (WT) and Hel2 deletion mutants regarding ribosome ubiquitination, the authors should quantify the ubiquitinated ribosomes in both WT and Hel2 mutants under stress conditions. This quantification should be conducted on the same blot, using diluted control samples. Similarly, in Figures 3F and 4C, for an accurate comparison between WT and Hel2 or Rad6 deletion mutants, the authors should quantify the ubiquitinated ribosomes across these conditions. Again, this quantification should be performed on the same blot with the dilution of control samples.

              __Response:__ As was also requested by reviewer 1 and discussed above (point R1.1), we will conduct quantification and display statistical analyses for our immunoblots. In addition, we will re-run the aforementioned experiments to improve quantification following the reviewers' request (same gel & diluted control samples).
      

      Reviewer #3 (Significance (Required)):

      • General assessment:

      Recent studies reveal that the ubiquitination of uS3 (Rps3) is essential for the quality control of nonfunctional rRNA (18S NRD), while the ubiquitination of uS10 (Rps20) plays a crucial role in ribosome-associated quality control (RQC). However, the dynamics of ribosome ubiquitination in response to oxidative stress remain unclear.

      • Advance:

      In this study, the authors developed a targeted proteomics method to quantify ribosome ubiquitination dynamics in response to oxidative stress, both relatively and stoichiometrically. By utilizing isotope-labeled peptides, they demonstrated that these modifications are non-stoichiometric and exhibit significant variability across different peptides. They identified 11 ribosomal sites that showed increased ubiquitin modification following H2O2 exposure, including two known targets of Hel2, which recognize collided ribosomes and induce translation quality control (RQC).

      • Audience: This information will be of interest to a specialized audience in the fields of translation, ribosome function, quality control, ubiquitination, and proteostasis.

      • The field: Translation, ribosome function, quality control, ubiquitination, and proteostasis.

      __Response:__ We appreciate that our work will be valuable to a number of fields in protein dynamics and that our method advances the field by measuring ribosome ubiquitination relatively and stoichiometrically in response to stress.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Response: All requested changes require experiments and data analyses, and a complete revision plan is delineated above in section #2.


      4. Description of analyses that authors prefer not to carry out


      R1.2. Leverage the PRM platform: Apply the established quantitative proteomics approach to validate or extend findings in Fig. 3 (e.g., RAD6-dependent ubiquitylation), ensuring methodological consistency.

              __Response:__ Although we understand the interest in the proposed experiment for consistency, this is the only requested experiment that we do not intend to conduct. Because of the lack of overall ubiquitination of ribosomal proteins in *rad6Δ* in response to H2O2 (e.g., Silva et al., 2015, Simoes et al., 2022), we believe that this PRM experiment is unlikely to produce meaningful insight into the ubiquitination code. In this context, we expected that sites regulated by Hel2 would be the ones largely modified in *rad6Δ*, and we followed up on them via immunoblot. Moreover, this experiment would not be time- or cost-effective, and resources and efforts could be used to strengthen other important areas of the manuscript, such as including the E3s Mag2 and Not4 in our work.
      
    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      This is a very interesting paper investigating the fitness and cellular effects of mutations that drive dihedral protein complexes into forming filaments. The Levy group have previously shown that this can happen relatively easily in such complexes and this paper now investigates the cellular consequences of this phenomenon. The study is very rigorous biophysically and very surprisingly comes up empty in terms of an effect: apparently this kind of self-assembly can easily be tolerated in yeast, which was certainly not my expectation. This is a very interesting result, because it implies that such assemblies may evolve neutrally because they fulfill the two key requirements for such a trajectory: They are genetically easily accessible (in as little as a single mutation), and they have perhaps no detrimental effect on fitness. This immediately poses two very interesting questions: Are some natural proteins that are known to form filaments in the cell perhaps examples of such neutral trajectories? And if this trait is truly neutral (as long as it doesn't affect the base biochemical function of the protein in question), why don't we observe more proteins forming these kinds of ordered assemblies?

      I have no major comments about the experiments, as I find them in general very carefully carried out. I have three more general comments:

      1. The fitness effect of these assemblies, if one exists, seems very small. I think it's worth remembering that even very small fitness effects beyond even what competition experiments can reveal could in principle be enough to keep assembly-inducing alleles at very low frequencies in natural populations. Perhaps this could be acknowledged in the paper somewhere.
      2. The proteins used in this study were, I think, chosen such that they do not have an important function in yeast that could be disrupted by assembly. This allows the effect of the large-scale assemblies to be measured in isolation. If I deduced this correctly, this should probably be pointed out again in this paper (I apologise if I missed this).
      3. The model system in which these effects were tested for is yeast. This organism has a rigid cell wall and I was wondering if this makes it more tolerant to large scale assemblages than wall-less eukaryotes. Could the authors comment on this?

      Minor points:

      In Figure 2D, what are the fits? And is there any analysis that rules out expression effects on the mutant caused by higher levels of the wild-type? The error bars in Figure 2E are not defined.

      Significance

      This is a remarkably rigorous paper that investigates whether self-assembly into large structures has any fitness effect on a single-celled organism. This is very relevant, because a landmark paper from the Levy group showed that many proteins are very close in genetic terms to forming such assemblies. The general expectation I think would have been that this phenomenon is pretty harmful. This would have explained why such filaments are relatively rare as far as we know. This paper now does a large number of highly rigorous experiments to first prove beyond doubt that a range of model proteins really can be coaxed into forming such filaments in yeast cells through a very small number of mutations. Its perhaps most surprising result is that this does not negatively affect yeast cells.

      From an evolutionary perspective, this is a very interesting and highly surprising result. It forces us to rethink why such filaments are not more common in Nature. Two possible answers come to mind: First, it's possible that filamentation is not directly harmful to the cell, but that assembling proteins into filaments can interfere with their basic biochemical function (which was not tested for here).

      Second, perhaps assembly does cause a fitness defect, but one so small that it is hard to measure experimentally. Natural selection is very powerful, and even fitness coefficients we struggle to measure in the laboratory can have significant effects in the wild. If this is true, we might expect such filaments to be more common in organisms with small effective population sizes, in which selection is less effective.
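The population-size argument above can be made concrete with a small worked example. This is a minimal sketch, not part of the reviewed study: it assumes Kimura's diffusion approximation for the fixation probability of a new mutation in a diploid population whose census size equals its effective size, and the selection coefficient `s = -1e-4` and the population sizes are hypothetical values chosen only for illustration.

```python
import math

def fixation_probability(s: float, n_e: float) -> float:
    """Kimura's approximation for a new mutation (initial frequency 1/(2*n_e)).

    Assumes a diploid population with census size equal to n_e.
    """
    if s == 0:
        return 1.0 / (2.0 * n_e)  # neutral expectation
    exponent = -4.0 * n_e * s
    if exponent > 700:
        # exp() would overflow; the fixation probability is effectively zero
        return 0.0
    # (1 - exp(-2s)) / (1 - exp(-4*n_e*s)), written with expm1 for accuracy
    return math.expm1(-2.0 * s) / math.expm1(exponent)

s = -1e-4  # hypothetical, very weakly deleterious allele
for n_e in (1e3, 1e5):
    relative = fixation_probability(s, n_e) / (1.0 / (2.0 * n_e))
    print(f"Ne={n_e:.0e}: fixation probability relative to neutral = {relative:.3g}")
```

Under these assumptions the same small fitness cost is nearly invisible to selection in the smaller population and strongly selected against in the larger one, which is the pattern the comment above anticipates.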

      A third possibility is of course that the prevalence of such self-assembly is under-appreciated. Perhaps more proteins than we currently know assemble into these structures under some conditions without any benefit or detriment to the organism.

      These are all fascinating implications of this work that straddle the fields of evolutionary genetics and biochemistry and are therefore relevant to a very wide audience. My own expertise is in these two fields. I also think that this work will be exciting for synthetic biologists, because it proves that these kinds of assemblies are well tolerated inside cells.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      The article emphasizes vocal social behavior but none of the experiments involve a social element. Marmosets are recorded in isolation, which could be sufficient for examining the development of vocal behavior in that particular context. However, the early-life maturation of vocal behavior is strongly influenced by social interactions with conspecifics. For example, the transition from cries and subharmonic phees, which are high-entropy calls, to more low-entropy mature phees is affected by social reinforcement from the parents. This effect also extends across contexts: differences in these interaction patterns carry over to vocal behavior when the marmosets are alone. From the chord diagrams, cries still make up a significant proportion of call types in lesioned animals. Additionally, though it is an intriguing finding that the infants' phee calls have acoustic differences being 'blunted of variation, less diverse and more regular,' the suggestion that the social message conveyed by these infants was 'deficient, limited, and/or indiscriminate' is not tested, but could be with, for example, playback experiments.

      We recognize that our definition of vocal social behavior is not within the normal realm of direct social interactions. We were particularly interested in marmoset vocalizations as a social signal, such as phees, cries and twitter, even when their family members or conspecifics are not visibly present. Generally speaking, in the laboratory, infant marmosets make few calls when in the presence of another conspecific, but when isolated they naturally make phee calls to reach out to their distantly located relatives. In this context, while we did not assess the animals interacting directly, we assessed what are normally referred to as ‘social contact calls,’ hence the term ‘social vocalizations.’ Playback recordings might provide potential evidence of antiphonal calling as a means of social interaction and might reveal the poor quality of the social message conveyed by the infant, but even here, the vocalizing marmoset would be calling to a non-visible conspecific. Thus, although our experiment lacked a direct social element, our data suggest that in the absence of a functioning ACC in early life, infant calls that convey social information, and which would elicit feedback from parents and other family members, may be compromised, and this could potentially influence how that infant develops its social interactive skills. We have now commented on the significance of social vocalizations in the introductory text (page 3) and discussion (page 15).

      The manuscript would benefit from the addition of more details to be able to better determine if the conclusions are well supported by the data. Understanding that this is very difficult data to get, the number of marmosets and some variability in the collection of the data would allow for the plotting of each individual across figures. For example, in the behavioral figures, which is the marmoset that is in the behavioral data that has a sparing of the ACC lesion in one hemisphere? Certain figures, described below in the recommendations for the authors, could also do with additional description.

      Thanks for these suggestions. We have plotted the individual animals in the relevant figures and addressed the comments and recommendations listed below.

      Reviewer #1 (Recommendations For The Authors):

      Given the number of marmosets, variability in the collected data, lesion extent, and different controls, I would like to see more plots with individuals indicated (perhaps with different symbols). More details could also be added for several plots.

      Figure 2D (new) and 2E now have plots that represent the individual animals, each represented by a different symbol.

      Figure 2A) Since lesions are bilateral, could you also show the extent of the lesions on the other side for completeness?

      Our intention was to process one hemisphere of each brain for Golgi staining to examine changes in cell morphology in the ACC and associated brain regions following the lesion. Unfortunately, the Golgi stain was unsuccessful. Consequently, we were unable to use the tissue to reconstruct the bilateral extent of the lesion. We did, however, first establish the bilateral nature of the lesion through coronal slices of the animals' MRI scans before processing the intact hemisphere to confirm the bilateral extent of the lesion. The MRI scans (every 5th section) for each control and lesioned animal are compiled in a figure in the supplementary materials (Fig. S1). These scans show that the ACC-lesioned animals have bilateral lesions with one animal (ACC1) showing some sparing in one hemisphere, as we noted in the text. We have now made reference to this supplemental figure in the text (page 5).

      Figure 2B/C) In Figure 2B, control and ACC lesions are in the columns while right next to it in 2C, ACC lesion and control are in the rows. Could these figures be adjusted so that they are consistent?

      We have now adjusted these figures and updated the figure legends accordingly.

      Figure 2C) Is there quantification of the 'loss of neurons and respective increase in glial cells at the lesioned site especially at the interface between gray and white matter'? There are multiple slices for each animal.

      Thanks for suggesting this. We have now quantified these data which are presented as a new graph as Fig. 2D. These data revealed a significant loss of neurons (NeuN) in the ACC group as well as an increase in glial cells (GFAP and Iba1) relative to the controls. The figure legend and results have also been updated.

      Figure 2C) It is difficult for me to distinguish between white and purple - could you show color channels independently since images were split into separate channels for each fluorophore?

      Fig. 2C has been revised to better visualize the neurons and glia at the gray and white matter interface. We found that grayscale images for each channel offered a better contrast than separating the channels for each fluorophore.

      Figure 2C/D) I like how there are individual dots here for the individual marmosets. Since there are four in each group, could they be represented throughout with symbols (with a key indicating the pair and also the control condition)? For example, were there changes in the histology for control animals that got saline injections as opposed to those that didn't get any surgery?

      We have highlighted the individual animals with different symbols in the figures. Although some animals were twin pairs, it was not possible to have twins in all cases. Only two sets were twins. We have indicated the symbols that represent the twin pair in Fig. 2 as well as the MRI scans of the twin pairs in Fig. S1. There were no observed changes in histology for the sham animals relative to the other non-sham controls. The MRI scan for one sham CON2 shows herniated tissue in the right hemisphere which is a normal consequence of brain exposure caused by a craniotomy.

      Figure 3D-E) Here, individual data points could be informative especially given that some animals are missing data past the third week.

      To prevent cluttering the figure with too many data points, we have added the sample size for each group in the figure legend (page 33).

      Figure 3D/F) What exactly is the period that goes into this analysis? In the text, 'Further analysis showed that the ACC lesion had minimal effects on the rate of most call types during this period'. Is this period from weeks 3 to 6 relative to the proportions in week 2? I think I also don't quite understand the chord diagram. The legend says 'the numbers around each chord diagram represents relative probability value for each call type transition' so how does that relate to the proportion of these call types? It looks like there is a wider slice for cries for ACC-lesioned animals each week. I also don't see, in the week 4 chord diagram, the text description of an 'elevation in the rate of "other" calls, which comprised tsik, egg, eck, chatter and seep calls. These calls were significantly elevated in animals after the ACC lesion.'

      We apologize for the confusion. Fig 3D and Fig 3F are not directly related. Fig. 3D shows the different types of emitted calls. The figure shows the averaged data per group pooled from post-surgery weeks (week 3 – week 6). It represents the proportion of individual call types relative to the total number of calls during each recording period. The only major finding here was the increased rate of ‘other’ calls comprising tsik, egg, ock, chatter and seep calls. These calls were significantly elevated in animals after the ACC lesion.

      While Fig. 3D represents the differences in the proportion of calls, the chord diagrams in Fig. 3F represent the probability of call-to-call transition obtained from a probability matrix. At postnatal week 6, marmosets with ACC lesions showed a higher likelihood of transitions between all call types, but less frequent transitions between social contact calls relative to sham controls. The chord diagrams visualize the weighted probabilities and directionality of these transitions between the different call types. Weighted probabilities were used to account for variations in call counts. The thickness of the arrows or links indicates the probability of a call transition, while the numbers surrounding each chord diagram represent the relative probability value for each specific transition. We have now reworded the text and clarified these details in the figure legend (pages 32-33).
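To illustrate what a call-to-call transition probability matrix contains, here is a minimal sketch. It is not the authors' analysis code: the function name, the toy call sequence, and the choice of simple row normalization (each row of counts divided by its total) are assumptions, and the weighting scheme used for the published chord diagrams may differ.

```python
from collections import Counter, defaultdict

def transition_probabilities(calls):
    """Row-normalized first-order transition probabilities between call types."""
    counts = defaultdict(Counter)
    # Count each (current call -> next call) pair in the recorded sequence
    for current, following in zip(calls, calls[1:]):
        counts[current][following] += 1
    # Divide each row by its total so outgoing probabilities sum to 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }

# Hypothetical sequence of labelled calls from one recording
sequence = ["phee", "phee", "trill", "phee", "cry", "cry", "phee"]
for src, row in transition_probabilities(sequence).items():
    print(src, row)
```

In a chord diagram built from such a matrix, each link would correspond to one source-to-destination entry, with link thickness scaled by the probability.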

      Figure 3E) How is the ratio on the y-axis calculated here?

      The y-axis represents the averaged value of the ratios of the number of social contact calls relative to non-social contact calls in each recording per subject per group (i.e., x̄(# social calls / # non-social calls)). This is now included in the figure legend and the axis is updated (page 32).
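The y-axis quantity described above can be sketched as follows. This is an illustrative reading of the description rather than the authors' code: the function name and the example counts are hypothetical, and skipping recordings with no non-social calls is an assumption made here only to avoid division by zero.

```python
from statistics import mean

def mean_social_ratio(recordings):
    """Mean over recordings of (# social contact calls / # non-social calls).

    `recordings` is a list of (social_count, non_social_count) pairs.
    Recordings with zero non-social calls are skipped (an assumption).
    """
    ratios = [social / non_social
              for social, non_social in recordings
              if non_social > 0]
    return mean(ratios)

# Hypothetical counts for three recordings from one group
recordings = [(12, 6), (9, 3), (4, 8)]
print(mean_social_ratio(recordings))
```

Note that this is a mean of per-recording ratios, which generally differs from the ratio of pooled counts (here 25 / 17), so the two should not be used interchangeably.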

      Also, cries could be considered a 'social contact call' since they are produced by infants to elicit responses from the parents. There is also the hypothesis in the literature that cries transition into phees.

      The reviewer is correct. Cries are often considered a social contact call because they elicit parental feedback. We decided to separate cry-calls from other social contact calls for two reasons. First, in our sample, we found cry behavior to be highly variable across the animals. For example, one control infant cried incessantly whereas another control infant cried less than normal. This extreme variability in animals of the same group masked the features between animals that reliably differentiated between them. Second, cry-calls elicit feedback from parents who are normally within the vicinity of the infant whereas phee calls elicit antiphonal phee calls from any distantly located conspecific. In other words, the contexts in which these calls are often elicited are very different.

      The use of 'syntactical' is a bit jarring to me because outside of linguistics, its use in animal communication generally refers to meaning-bearing units that can be combined into well-formed complexes such as pod-specific whale songs or predator alarm calls with concatenated syllable types in some species of monkeys. To my knowledge, individual phee syllables have not been currently shown to carry information on their own and may be better described as 'sequential' rather than 'syntactical'.

      We agree. We have made this change accordingly.

      Figure 4B) How many phee calls with differing numbers of syllables are present each week? How equal is the distribution given that later analyses go up to 5 syllables?

      The total number of phee calls with differing number of syllables ranged between 20-40 phees. This number varied between subjects, per week. The most common were 3- and 4-syllable phee calls which ranged from 7-15. Due to this variability, Fig. 4B presents the average syllable count. The axis is now updated.

      Figure 4C-E) How is the data combined here? Is there a 2nd syllable, the combined data from the 2nd syllable from phee calls of all lengths (1 - 5?). If so, are there differences based on how long the total sequence is?

      The combined data represent the specific syllable (e.g., the 1st syllable in a 2-syllable phee, in a 3-syllable phee and in a 4-syllable phee) irrespective of the length of the sequence. No differences were observed between the 2nd syllable in a 2-syllable phee and the 2nd syllable in a 3- or 4-syllable phee. We have included this detail in the figure legend (pages 33-34).

      So duration is a vocal parameter that is highly dependent on physical factors such as body size and lung volume. Were there differences in physical growth between the pairs of ACC-lesioned marmosets and their twins? Entropy is less closely tied to these physical factors but has previously been shown to decrease as phee calls mature, which we can also see in the negative relationship of the control animals. Do you know of experiments that show that lower entropy calls are more 'blunted'?

      Thank you for raising the important issue of physical growth factors. For twin pairs, it is not uncommon for one infant to be slightly bigger, heavier or stronger than the other, presumably because one gets more access to food. With increasing age, we did not observe significant differences in body weight between the groups. We examined grip strength in all infants as a means of assessing how well the infant was able to access food during nursing. Poor grip strength would indicate a lower propensity to ‘hang on’ to the mother for nursing, which could lead to lower weight gain and reduced physical growth. We found that both grip strength and body weight increased as the infants got older and both parameters were equivalent between the groups. We have added a figure showing the normal increase in both weight and grip strength to the supplemental materials (Fig. S3) and have made reference to this in the text (page 8).

      As for entropy, its impact on the emotional quality of vocalizations has not been systematically explored. Generally speaking, high entropy relates to high randomness and distortion in the signal. Accordingly, one view posits that low-entropy phee calls represent mature-sounding calls relative to noisy and immature high-entropy calls (e.g., Takahashi et al., 2017). In the current study, the reduction in syllable entropy observed for both groups of animals with increasing age is consistent with this view. At the same time, entropy can relate to vocal complexity; high entropy refers to complex and variable sound patterns whereas low-entropy sounds are predictable, less diverse and simple vocal sequences (Kershenbaum, A. 2013. Entropy rate as a measure of animal vocal complexity. Bioacoustics, 23(3), 195–208). One possibility is that call maturity does not equate directly to emotional quality. In other words, a low-entropy mature call can also be lacking in emotion, as observed in humans with ACC damage; these patients show mature speech, but they lack the variations in rhythms, patterns and intonation (i.e., prosody) that would normally convey emotional salience and meaning. Our observation of a reduction in phee syllable entropy in the ACC group, in the context of syllables being short and loud with reduced peak frequency, is consistent with this view. Our use of the word ‘blunt’ was to convey how the calls exhibited by the ACC group were potentially lacking emotional meaning. Beyond this speculation, we are not aware of any papers that have examined the relationship between entropy and blunted calls directly. We have now included this speculation in the discussion (pages 12-13).

      Reviewer #2 (Public Review):

      The authors state that the integrity of white matter tracts at the injection site was impacted but do not show data.

      We have added representative micrographs of a control and ACC-lesioned animal in a new supplementary figure which shows the neurotoxin impacted the integrity of white matter tracts local to the site of the lesion (Fig. S2).

      The study only provides data up to the 6th week after birth. Given the plasticity of the cortex, it would be interesting to see if these impairments in vocal behavior persist throughout adulthood or if the lesioned marmosets will recover their social-vocal behavior compared to the control animals.

      We agree. Our original intention was to examine behavior into adulthood. Unfortunately, the COVID-19 pandemic compromised the continuation of the study. We were limited by the data that we were allowed to acquire due to imposed restrictions. Some non-vocalization data collected when the animals were young adults is currently being prepared for another paper.

      Even though this study focuses entirely on the development of social vocalizations, providing data about altered social non-vocal behaviors that accompany ACC lesions is missing. This data can provide further insights and generate new hypotheses about the exact role of ACC in social vocal development. For example, do these marmosets behave differently towards their conspecifics or family members and vice versa, and is this an alternate cause for the observed changes in social-vocal development?

      We agree. At the time, however, apparatus for assessing behavior between the infant’s family and non-family members was not available. Assessing such behaviors in the animals’ holding room posed some difficulty since marmosets are easily distracted by other animals as well as the presence of an experimenter, amongst other things. This is an area of investigation we are currently pursuing.

      Reviewer #3 (Public Review):

      It is striking to find that the vocal repertoire of infant marmosets was not significantly affected by ACC lesions. During development, the neural circuits are still maturing and the role of different brain regions may evolve over time. While the ACC likely contributes to vocalizations across the lifespan, its relative importance may vary depending on the developmental stage. In neonates, vocalizations may be more reflexive or driven by physiological needs. At this stage, the ACC may play a role in basic socioemotional regulation but may not be as critical for vocal production. Since the animals lived for two years, further analysis might be helpful to elucidate the precise role of ACC in the vocal behavior of marmosets.

      Figure 3D. According to the Introduction "...infant ACC lesions abolish the characteristic cries that infants normally issue when separated from its mother". Are the present results in marmosets showing the opposite effect? Please discuss.

      To date, the work of MacLean (1985) is the only publication that describes the effect of early cingulate ablation on the spontaneous production of ‘separation calls’ largely construed as cries, coos and whimpers in response to maternal separation. All of this work was largely performed in rhesus macaques or squirrel monkeys. In addition to ablating the cingulate cortex, MacLean found that it was necessary to ablate the subcallosal (area 25) and preseptal cingulate cortex (presumably referring to prelimbic area 32) to permanently eliminate the spontaneous production of separation cry calls. Our ablation of the ACC was more circumscribed to area 24 and is therefore consistent with MacLean’s earlier work that removal of ACC alone does not eliminate cry behavior. In adults, ACC ablation is insufficient to eliminate vocalization as well. We make reference to this on pages 13-14 of the discussion.

      Figure 3E and Discussion. Phees are mature contact calls and cries immature contact calls (Zhang et al, 2019, Nat Commun). Therefore, I would rather say that the proportion of immature (cries) contact calls increases vs the mature (phee, trill, twitters) contact calls in the ACC group. Cries are also "isolated-induced contact calls" to attract the attention of the caregivers.

      The reviewer is correct in that cries are directed towards caregivers, but in our sample, cry behavior was highly variable between the infants. Consequently, in Fig. 3E social contact calls include phee, twitter and trill calls but do not include cries, which were separated (see also response to reviewer #1). Many of the calls made during babbling were immature in their spectral pattern (compare phee calls between Fig. 3A and 3B). Cries typically transitioned into phees, twitters or trills before they fully matured. Fig. 3E shows that the controls made more isolation-induced social contact calls at postnatal week 6, which were presumably maturing at this time point. Thus, if anything, there was an increase in the proportion of mature contact calls vs immature contact calls with increasing age.

      Figure 4D. Animal location and head direction within the recording incubator can have significant effects on the perceived amplitude of a call. Were these factors taken into account?

      The reviewer makes an excellent observation. Unfortunately, we did not account for location and head direction because the infants were quite mobile in the incubator. The directional microphone was hidden from view because the infants were distracted by it, and positioned ~12 cm from the marmoset, and placed in the exact same location for every recording. In addition, calls with phantom frequencies were eliminated during visual inspection of spectrograms. Beyond these details, location and head direction were not taken into account.

      Figure 4E. When a phee call has a higher amplitude, as is the case for the ACC group (Figure 4D), the energy of the signal will be concentrated more strongly at the phee call frequency (~8 kHz). This concentration of the energy reduces the variability in the frequency distribution, leading to lower entropy. The interpretation of the results should be reconsidered. A faint call (control group) can exhibit more variability in the frequency content since the energy is distributed across a wider range of frequencies, contributing to higher entropy. It can still be "fixed, regular, and stereotyped" if the behavior is consistent or predictable with little variation. Also, to define ACC calls as "monotonic" I would rather search for the lack of frequency modulation, amplitude variation, or narrower bandwidth.

      We very much appreciate this explanation. We were able to identify the maximum frequency that closely matched the pitch of a sound for each syllable in a multisyllabic phee. New Fig. 4E shows that the peak frequency for each phee syllable was lower in the ACC-lesioned monkeys, which may directly translate to the low entropy observed in this group. The term “monotonic” was used to relate our data to the classical and long-standing evidence of human ACC lesions causing monotonous intonation of speech. When all factors are taken into account, it is evident that the vocal phee signature of the ACC-lesioned animal was structurally different to the controls, implicating a less complex and more stereotyped ACC signal. Further studies are needed to systematically explore the relationship between entropy and emotional quality of vocalizations.
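The reviewer's amplitude argument can be illustrated with a small numerical sketch. It is not the study's analysis: it assumes a normalized Shannon entropy over frequency bins (the entropy measure actually used in the manuscript may differ), and the toy spectra, the fixed noise floor, and the helper names are all hypothetical.

```python
import math

def spectral_entropy(power):
    """Shannon entropy of a power spectrum, normalized to the range 0..1."""
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    entropy_bits = -sum(p * math.log2(p) for p in probs)
    return entropy_bits / math.log2(len(power))

def toy_spectrum(peak_power, noise_floor=1.0, n_bins=8, peak_bin=3):
    """Flat noise floor plus a single spectral peak (hypothetical call)."""
    return [noise_floor + (peak_power if i == peak_bin else 0.0)
            for i in range(n_bins)]

faint = toy_spectrum(peak_power=5.0)   # quiet call over the noise floor
loud = toy_spectrum(peak_power=50.0)   # louder call, same noise floor

print("faint call entropy:", round(spectral_entropy(faint), 3))
print("loud call entropy: ", round(spectral_entropy(loud), 3))
```

With a fixed noise floor, the louder call concentrates a larger share of the energy in one bin and so yields the lower entropy. Scaling every bin by the same factor leaves this measure unchanged, so under these assumptions the effect depends on the call's level relative to the background rather than on absolute amplitude.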

      Apart from the changes in the vocal behavior, did the ACC lesions manifest in any other observable cognitive, emotional, or social behavior? ACC plays a role in processing pain and modulating pain perception. Could that be the reason for the observed increase in the proportion of cries in the ACC group and the increase in the phee call amplitude? Did the cries in the ACC group also display a higher amplitude than the cries in the control group?

      It was our intention to acquire as much data as possible from these infants as they matured from a cognitive, social and emotional perspective. Unfortunately, our study was hampered by a variety of factors, including the COVID-19 pandemic, which imposed major restrictions on our ability to continue with the experiment in a time-sensitive manner. In addition, the development and construction of the custom apparatus to measure these behaviors was stalled during this period, further preventing us from collecting behavioral data at regular time intervals. As for the cry behavior, the number of cries in the ACC group was very low, especially at postnatal weeks 5 and 6. Consequently, there were very few data points to work with.

      Discussion. Louder calls have the potential to travel longer distances compared to fainter calls, possess higher energy levels, and can propagate through the environment more effectively. If the ACC group produced louder phee syllables, how could be the message conveyed over long distances "deficient, limited, and/or indiscriminate"?

      Thanks for raising this interesting concept. Not all calls emitted by the animals were loud. We specifically examined the long-distance phee call in this regard. The phee syllables emitted by the ACC group were high amplitude with low frequencies, short duration and low entropy. Taking these factors into account, it is conceivable that the phee calls produced by the ACC group could not effectively convey their message over long distances despite their propagation through the environment. We have made reference to this in the discussion, where we focus specifically on the phee calls only (page 12).

      Abstract: Do marmosets have syntax? Consider replacing "syntactical" with a more appropriate term (maybe "syntax-like").

      Thanks for this suggestion. We have replaced the term syntactical with ‘sequential’ as per the recommendation of reviewer #1.

      Introduction: "...cries that infants normally issue when separated from its mother". Please replace "its" with "their".

      This has been corrected.

      Results: Is the reference to Fig 1B related to the text?

      We have included and referred to Fig. 1B in the text (results and methods) to show other researchers how they can use this technique as a reliable and safe means of monitoring tidal volume under anesthesia in small infant marmosets without intubation.

      I understand that both "spectrograph" and "spectrogram" are used to analyze the frequency content of a signal. Nevertheless, "spectrogram" refers to the visual representation of the frequency content of a signal over time, and this term is commonly used in audio signal processing and specifically in the vocal communication field. I would recommend replacing "spectrograph" with "spectrogram".

      Thanks for this suggestion. We have corrected this throughout the manuscript.

      (Concerning the previous comment in the public review). Cries are uttered to attract the attention of the caregivers. The increase in the proportion of cries in the ACC group does not match the sentence: "...these infants appeared to make little effort in using vocalizations to solicit social contact when socially isolated".

      We apologize for the confusion. It is not the case that the ACC animals make more cries. Cry calls were highly variable amongst the animals. Consequently, although Fig. 3D gives the impression that the proportion of cries is higher in ACC animals, they did not differ significantly from the controls. Due to their high variability, cries were removed from the measurement of social contact. Accordingly, Fig. 3E does not include cry behavior; it shows that the ACC animals engage less in social contact calls.

      Related to Figure 3. What is the difference between "egg" and "eck" calls? Do you mean "ock"?

      We apologize. This was a typo. It should be ock calls.

      Figure 4B. Is the sample size five animals per group and per week? Overlapping data points seem to be placed next to each other. Why in some groups (e.g. ACC 6 weeks) less than five dots are visible?

      The sample size differed per week because of the lack of recording during the COVID restrictions. In Fig. 4B, we have now separated the overlapping dots. We have also added the sample size of the groups in the figure legends.

      Would the authors expect to see stronger differences between the lesioned and the control groups when comparing a later developmental stage? The animals were euthanized at the age of

      This speculation is certainly feasible, and yes, we were hoping to establish this level of detail with testing at later developmental stages. This is an aspect of development we are currently pursuing.

      Could these experiments be conducted?

      I’m afraid these animals are no longer available, but we are currently conducting experiments in other animals with early life neurochemical manipulations who show behavioral changes into early adulthood.

      ACC lesion: It is reported that the lesions extended past 24b into motor area 6M. Did the animal display any motor control disability?

      Surprisingly, despite the lesion encroaching into 6M, these animals showed no observable motor impairment. We assessed the animals' grip strength and body weight and found normal strength and weight gain in both the control and lesioned groups. We have added these data as supplemental information (Fig. S3).

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      This study investigates what happens to the stimulus-driven responses of V4 neurons when an item is held in working memory. Monkeys are trained to perform memory-guided saccades: they must remember the location of a visual cue and then, after a delay, make an eye movement to the remembered location. In addition, a background stimulus (a grating) is presented that varies in contrast and orientation across trials. This stimulus serves to probe the V4 responses, is present throughout the trial, and is task-irrelevant. Using this design, the authors report memory-driven changes in the LFP power spectrum, changes in synchronization between the V4 spikes and the ongoing LFP, and no significant changes in firing rate.

      Strengths:

      (1) The logic of the experiment is nicely laid out.

      (2) The presentation is clear and concise.

      (3) The analyses are thorough, careful, and yield unambiguous results.

      (4) Together, the recording and inactivation data demonstrate quite convincingly that the signal stored in FEF is communicated to V4 and that, under the current experimental conditions, the impact from FEF manifests as variations in the timing of the stimulus-evoked V4 spikes and not in the intensity of the evoked activity (i.e., firing rate).

      Weaknesses:

      I think there are two limitations of the study that are important for evaluating the potential functional implications of the data. If these were acknowledged and discussed, it would be easier to situate these results in the broader context of the topic, and their importance would be conveyed more fairly and transparently.

      (1) While it may be true that no firing rate modulations were observed in this case, this may have been because the probe stimuli in the task were behaviorally irrelevant; if anything, they might have served as distracters to the monkey's actual task (the MGS). From this perspective, the lack of rate modulation could simply mean that the monkeys were successful in attending the relevant cue and shielding their performance from the potentially distracting effect of the background gratings. Had the visual probes been in some way behaviorally relevant and/or spatially localized (instead of full field), the data might have looked very different.

      Any task design involves tradeoffs; if the visual stimulus was behaviorally relevant, then any observed neurophysiological changes would be more confounded by possible attentional effects. We cannot exclude the possibility that a different task or different stimuli would produce different results; we ourselves have reported firing rate enhancements for other types of visual probes during an MGS task (Merrikhi et al. 2017). We have added an acknowledgement of these limitations in the discussion section (lines 323-330 in untracked version). At minimum, our results show a dissociation between the top-down modulation of phase coding, which is enhanced during WM even for these task-irrelevant stimuli, and rate coding. Establishing whether and how this phase coding is related to perception and behavior will be an important direction for future work.

      With this in mind, it would be prudent to dial down the tone of the conclusions, which stretch well beyond the current experimental conditions (see recommendations).

      We have edited the title (removing the word ‘primarily’) and key sentences throughout to tone down the conclusions, generally to state that the importance of a phase code in WM modulations is *possible* given the observed results, rather than certain (see abstract lines 26-27, introduction lines 59-62, conclusion lines 310-311).

      (2) Another point worth discussing is that although the FEF delay-period activity corresponds to a remembered location, it can also be interpreted as an attended location, or as a motor plan for the upcoming eye movement. These are overlapping constructs that are difficult to disentangle, but it would be important to mention them given prior studies of attentional or saccade-related modulation in V4. The firing rate modulations reported in some of those cases provide a stark contrast with the findings here, and I again suspect that the differences may be due at least in part to the differing experimental conditions, rather than a drastically different encoding mode or functional linkage between FEF and V4.

      We have added a paragraph to the discussion section addressing links to attention and motor planning (lines 315-333), and specifically acknowledging the inherent difficulties of fully dissociating these effects when interpreting our results (lines 323-330).

      Reviewer #2 (Public review):

      Summary:

      It is generally believed that higher-order areas in the prefrontal cortex guide selection during working memory and attention through signals that selectively recruit neuronal populations in sensory areas that encode the relevant feature. In this work, Parto-Dezfouli and colleagues tested how these prefrontal signals influence activity in visual area V4 using a spatial working memory task. They recorded neuronal activity from visual area V4 and found that information about visual features at the behaviorally relevant part of space during the memory period is carried in a spatially selective manner in the timing of spikes relative to a beta oscillation (phase coding) rather than in the average firing rate (rate code). The authors further tested whether there is a causal link between prefrontal input and the phase encoding of visual information during the memory period. They found that indeed inactivation of the frontal eye fields, a prefrontal area known to send spatial signals to V4, decreased beta oscillatory activity in V4 and information about the visual features. The authors went one step further to develop a neural model that replicated the experimental findings and suggested that changes in the average firing rate of individual neurons might be a result of small changes in the exact beta oscillation frequency within V4. These data provide important new insights into the possible mechanisms through which top-down signals can influence activity in hierarchically lower sensory areas and can therefore have a significant impact on the Systems, Cognitive, and Computational Neuroscience fields.

      Strengths:

      This is a well-written paper with a well-thought-out experimental design. The authors used a smart variation of the memory-guided saccade task to assess how information about the visual features of stimuli is encoded during the memory period. By using a grating of various contrasts and orientations as the background the authors ensured that bottom-up visual input would drive responses in visual area V4 in the delay period, something that is not commonly done in experimental settings in the same task. Moreover, one of the major strengths of the study is the use of different approaches including analysis of electrophysiological data using advanced computational methods of analysis, manipulation of activity through inactivation of the prefrontal cortex to establish causality of top-down signals on local activity signatures (beta oscillations, spike locking and information carried) as well as computational neuronal modeling. This has helped extend an observation into a possible mechanism well supported by the results.

      Weaknesses:

      Although the authors provide support for their conclusions from different approaches, I found that the selection of some of the analyses and statistical assessments made it harder for the reader to follow the comparison between a rate code and a phase code. Specifically, the authors wish to assess whether stimulus information is carried selectively for the relevant position through a firing rate or a phase code. Results for the rate code are shown in Figures 1B-G and for the phase code are shown in Figure 2. Whereas an F-statistic is shown over time in Figure 1F (and Figure S1) no such analysis is shown for LFP power. Similarly, following FEF inactivation there is no data on how that influences V4 firing rates and information carried by firing rates in the two conditions (for positions inside and outside the V4 RF). In the same vein, no data are shown on how the inactivation affects beta phase coding in the OUT condition.

      Per the reviewer’s suggestion, we have added several new supplementary figures. We now show the F-statistic for discriminability over time for the LFP timecourse (Fig. S2), and as a function of power in various frequencies (Fig. S4). We have added before/after inactivation comparisons of the LFP and spiking activity, and their respective F-statistics for discrimination between contrasts and orientations in Fig. S9. Lastly, we added a supplementary figure evaluating the impact of FEF inactivation on beta phase coding in the OUT condition, showing no significant change (Fig. S11).

      Moreover, some of the statistical assessments could be carried out differently including all conditions to provide more insight into mechanisms. For example, a two-way ANOVA followed by post hoc tests could be employed to include comparisons across both spatial (IN, OUT) and visual feature conditions (see results in Figures 2D, S4, etc.). Figure 2D suggests that the absence of selectivity in the OUT condition (no significant difference between high and low contrast stimuli) is mainly due to an increase in slope in the OUT condition for the low contrast stimulus compared to that for the same stimulus in the IN condition. If this turns out to be true it would provide important information that the authors should address.

      We have updated the STA slope measurement, excluding the low contrast condition which lacks a clear peak in the STA. Additionally, we equalized the bin widths and aligned the x-axes for better visual comparability. Then, we performed a two-way ANOVA, analyzing the effects of spatial features (IN vs. OUT) and visual conditions (contrast and orientation). The results showed a significant effect of the visual feature on both orientation (F = 3.96, p=0.046) and contrast (F = 14.26, p<10<sup>-3</sup>). However, neither the spatial feature nor the spatial-visual interaction exhibited significant effects for orientation (F = 0.52, p=0.473, F=1.56, p=0.212) or contrast (F = 2.19, p=0.139, F=1.15, p=0.283).

      There are also a few conceptual gaps that leave the reader wondering whether the results and conclusion are general enough. Specifically,

      (1) The authors used microstimulation in the FEF to determine RFs. It is thus possible that the FEF sites that were inactivated were largely more motor-related. Given that beta oscillations and motor preparatory activity have been found to be correlated and motor sites show increased beta oscillatory activity in the delay period, it is possible that the effect of FEF inactivation on V4 beta oscillations is due to inactivation of the main source of beta activity. Had the authors inactivated sites with a preponderance of visual neurons in the FEF would the results be different?

      We do not believe this to be likely based on what is known anatomically and functionally about this circuitry. Anatomically, the projections from FEF to V4 arise primarily from the supragranular layers, not layers which contain the highest proportion of motor activity (Barone et al. 2000, Pouget et al. 2009, Markov et al. 2013). Functionally, based on electrical identification of V4-projecting FEF neurons, we know that FEF to V4 projections are predominantly characterized by delay rather than motor activity (Merrikhi et al. 2017). We have now tried to emphasize these points when we introduce the inactivation experiments (lines 185-186).

      Experimentally, the spread of the pharmacological effect with our infusion system is quite large relative to any clustering of visual vs. motor neurons within the FEF, with behavioral consequences of inactivation spreading to cover a substantial portion of the visual hemifield (e.g., Noudoost et al. 2014, Clark et al. 2014), and so our manipulation lacks the spatial resolution to selectively target motor vs. other FEF neurons.

      (2) Somewhat related to this point and given the prominence of low-frequency activity in deeper layers of the visual cortex according to some previous studies, it is not clear where the authors' V4 recordings were located. The authors report that they do have data from linear arrays, so it should be possible to address this.

      Unfortunately, our chamber placement for V4 has produced linear array penetration angles which do not reliably allow identification of cortical layers. We are aware of previous results showing layer-specific effects of attention in V4 (e.g., Pettine et al. 2019, Buffalo et al. 2011), and it would indeed be interesting to determine whether our observed WM-driven changes follow similar patterns. We may be able to analyze a subset of the data with current source density analysis to look for layer-specific effects in the future, but are not able to provide any information at this time.

      (3) The authors suggest that a change in the exact frequency of oscillation underlies the increase in firing rate for different stimulus features. However, the shift in frequency is prominent for contrast but not for orientation, something that raises questions about the general applicability of this observation for different visual features.

      While the shift in peak frequency across contrasts is more prominent than that across orientations (Fig. S3A-B), the relationship between orientation and peak frequency is also significant (one-way ANOVA for peak frequency across contrasts, F<sub>Contrast</sub>=10.72, p<10<sup>-4</sup>; or across orientations, F<sub>Orientation</sub>=3, p=0.030; stats have been added to Fig. S3 caption). This finding also aligns with previous studies, which reported slight peak frequency shifts (~1–2 Hz) in the context of attention (Fries, 2015). To address the question of whether the frequency-firing rate correlation generalizes to orientation-driven changes, we now examine the relationship between peak frequency and firing rate separately for each contrast level (Fig. S14). The average normalized response as a function of peak frequency, pooled across subsamples of trials from each of 145 V4 neurons (100 subsamples/neuron), IN vs. OUT conditions, shows a significant correlation during the delay period for each contrast: low contrast (F<sub>Condition</sub>=0.03, p=0.867; F<sub>Frequency</sub>=141.86, p<10<sup>-18</sup>; F<sub>Interaction</sub>=10.70, p=0.002, ANCOVA), middle contrast (F<sub>Condition</sub>=7.18, p=0.009; F<sub>Frequency</sub>=96.76, p<10<sup>-14</sup>; F<sub>Interaction</sub>=0.13, p=0.716, ANCOVA), and high contrast (F<sub>Condition</sub>=12.51, p=0.001; F<sub>Frequency</sub>=333.74, p<10<sup>-29</sup>; F<sub>Interaction</sub>=7.91, p=0.006, ANCOVA).

      (4) One of the major points of the study is the primacy of the phase code over the rate code during the delay period. Specifically, here it is shown that information about the visual features of a stimulus carried by the rate code is similar for relevant and irrelevant locations during the delay period. This contrasts with what several studies have shown for attention in which case information carried in firing rates about stimuli in the attended location is enhanced relative to that for stimuli in the unattended location. If we are to understand how top-down signals work in cognitive functions it is inevitable to compare working memory with attention. The possible source of this difference is not clear and is not discussed. The reader is left wondering whether perhaps a different measure or analysis (e.g. a percent explained variance analysis) might reveal differences during the delay period for different visual features across the two spatial conditions.

      We have added discussion regarding the relationship of these results to previous findings during attention in the discussion section (lines 315-333).

      The use of the memory-guided saccade task has certain disadvantages in the context of this study. Although delay activity is interpreted as memory activity by the authors, it is in principle possible that it reflects preparation for the upcoming saccade, spatial attention (particularly since there is a stimulus in the RF), etc. This could potentially change the conclusion and perspective.

      We have added a new discussion paragraph addressing the relationship to attention and motor planning (lines 315-333). We have also moderated the language used to describe our conclusions throughout the manuscript in light of this ambiguity.

      For the position outside the V4 RF, there is a decrease in both beta oscillations and the clustering of spikes at a specific phase. It is therefore possible that the decrease in information about the stimuli features is a byproduct of the decrease in beta power and phase locking. Decreased oscillatory activity and phase locking can result in less reliable estimates of phase, which could decrease the mutual information estimates.

      Looking at the SNR as a ratio of power in the beta band to all other bands, there is no significant drop in SNR between conditions (SNR<sub>IN</sub> = 4.074 ± 984, SNR<sub>OUT</sub> = 4.333 ± 0.834, p=0.341, Wilcoxon signed-rank). Therefore, we do not think that the change in phase coding is merely a result of less reliable phase estimates.
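
      As a rough illustration of the kind of band-ratio SNR described in this response, the following minimal sketch computes the mean power inside a band divided by the mean power outside it. The function name `band_snr`, the 14-22 Hz band limits, and the use of mean (rather than summed) power are assumptions made for illustration only; the response does not state the exact band edges or averaging used.

```python
def band_snr(freqs, power, band=(14.0, 22.0)):
    """Ratio of mean power inside `band` to mean power in all other bins.

    `freqs` and `power` are equal-length sequences describing a power
    spectrum. The default band limits are hypothetical placeholders.
    """
    low, high = band
    inside = [p for f, p in zip(freqs, power) if low <= f <= high]
    outside = [p for f, p in zip(freqs, power) if not (low <= f <= high)]
    if not inside or not outside:
        raise ValueError("band must cover some, but not all, frequency bins")
    mean_outside = sum(outside) / len(outside)
    if mean_outside == 0:
        raise ValueError("power outside the band is zero; ratio undefined")
    return (sum(inside) / len(inside)) / mean_outside


# Toy spectrum: flat power of 1.0 with a plateau of 4.0 inside the band.
freqs = [float(f) for f in range(1, 41)]
power = [4.0 if 14.0 <= f <= 22.0 else 1.0 for f in freqs]
snr = band_snr(freqs, power)
```

      Under this definition, a drop in band power relative to the rest of the spectrum would lower the ratio, which is the scenario the comparison between IN and OUT conditions is meant to rule out.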

      The authors propose that coherent oscillations could be the mechanism through which the prefrontal cortex influences beta activity in V4. I assume they mean coherent oscillations between the prefrontal cortex and V4. Given that they do have simultaneous recordings from the two areas they could test this hypothesis on their own data, however, they do not provide any results on that.

      This paper only includes inactivation data. We are working on analyzing the simultaneous recording data for a future publication.

      The authors make a strong point about the relevance of changes in the oscillation frequency and how this may result in an increase in firing rate although it could also be the reverse - an increase in firing rate leading to an increase in the frequency peak. It is not clear at all how these changes in frequency could come about. A more nuanced discussion based on both experimental and modeling data is necessary to appreciate the source and role (if any) of this observation.

      As the reviewer notes, it is difficult to determine whether the frequency changes drive the rate changes, vice versa, or whether both are generated in parallel by a common source. We have adjusted our language to reflect this (lines 291-293). Future modeling work may be able to shed more light on the causal relationships between various neural signatures.

      Reviewer #3 (Public review):

      Summary:

      In this report, the authors test the necessity of prefrontal cortex (specifically, FEF) activity in driving changes in oscillatory power, spike rate, and spike timing of extrastriate visual cortex neurons during a visual-spatial working memory (WM) task. The authors recorded LFP and spikes in V4 while macaques remembered a single spatial location over a delay period during which task-irrelevant background gratings were displayed on the screen with varying orientation and contrast. V4 oscillations (in the beta range) scaled with WM maintenance, and the information encoded by spike timing relative to beta band LFP about the task-irrelevant background orientation depended on remembered location. They also compared recorded signals in V4 with and without muscimol inactivation of FEF, demonstrating the importance of FEF input for WM-induced changes in oscillatory amplitude, phase coding, and information encoded about background orientations. Finally, they built a network model that can account for some of these results. Together, these results show that FEF provides meaningful input to the visual cortex that is used to alter neural activity and that these signals can impact information coding of task-irrelevant information during a WM delay.

      Strengths:

      (1) Elegant and robust experiment that allows for clear tests for the necessity of FEF activity in WM-induced changes in V4 activity.

      (2) Comprehensive and broad analyses of interactions between LFP and spike timing provide compelling evidence for FEF-modulated phase coding of task-irrelevant stimuli at remembered location.

      (3) Convincing modeling efforts.

      Weaknesses:

      (1) 0% contrast background data (standard memory-guided saccade task) are not reported in the manuscript. While these data cannot be used to consider information content of spike rate/time about task-irrelevant background stimuli, this condition is still informative as a 'baseline' (and a more typical example of a WM task).

      We have added a new supplementary figure to show the effect of WM on V4 LFP power and SPL in 0% contrast trials (Fig. S6). These results (increases in beta LFP power and SPL when remembering the V4 RF location) match our previous report for the effect of spatial WM on LFP power and SPL within extrastriate area MT (Bahmani et al. 2018).

      (2) Throughout the manuscript, the primary measurements of neural coding pertain to task-irrelevant stimuli (the orientation/contrast of the background, which is unrelated to the animal's task to remember a spatial location). The remembered location impacts the coding of these stimulus variables, but it's unclear how this relates to WM representations themselves.

      Indeed, here we have focused on how maintaining spatial WM impacts visual processing of incoming sensory information, rather than on how the spatial WM signal itself is represented and maintained. Behaviorally, this impact on visual signals could be related to the effects of the content of WM on perception and reaction times (e.g., Soto et al. 2008, Awh et al. 1998, Teng et al. 2019), but no such link to behavior is shown in our data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, the two points I raised in the public review merit a bit of development in the Discussion. In addition, the authors should revise some of their conclusions.

      For instance (L217):

      "The finding that WM mainly modulates phase coded information within extrastriate areas fundamentally shifts our understanding of how the top-down influence of prefrontal cortex shapes the neural representation, suggesting that inducing oscillations is the main way WM recruits sensory areas."

      In my opinion, this one is over-the-top on various counts.

      Here is another exaggerated instance (L298):

      "...leading us to conclude that representations based on the average firing rate of neurons are not the primary way that top-down signals enhance sensory processing."

      Again, as noted above, the problem is that one could make the case that the top-down signals are, in fact, highly effective, since they are completely quashing any distracter-related modulation in firing rate across RFs. There is only so much that one can conclude from responses to stimuli that are task-irrelevant, uniform across space, and constant over the course of a trial.

      I think even the title goes too far. What the work shows, by all accounts, is that the sustained activity in FEF has a definitive impact on V4 *even* with respect to a sustained, irrelevant background stimulus. The result is very robust in this sense. However, this is quite different from saying that the *primary* means of functional control for FEF is via phase coding. Establishing that would require ruling out other forms of control (i.e., rate coding) in all or a wide range of experimental conditions. That is far from the restricted set of conditions tested here and is also at variance with many other experiments demonstrating effects of attention or even FEF microstimulation on V4 firing activity.

      To reiterate, in my opinion, the work is carefully executed and the data are interesting and largely unambiguous. I simply take issue with what can be reliably concluded, and how the results fit with the rest of the literature. Revisions along these lines would improve the readability of the paper considerably.

      We have edited the title (removing the word ‘primarily’) and key sentences throughout to tone down the conclusions, generally to state that the importance of a phase code in WM modulations is *possible* given the observed results, rather than certain (see abstract lines 26-27, introduction lines 59-62, conclusion lines 310-311).

      Reviewer #3 (Recommendations for the authors):

      (1) My primary comment that came up multiple times as I read the manuscript (and which is summarized above) is that I wasn't ever sure why the authors are focused on analyzing neural coding of task-irrelevant sensory information during a WM task as a function of WM contents (remembered location). Most studies of neural codes supporting WM often focus on coding the remembered information - not other information. Conceptually, it seems that the brain would want to suppress - or at least not enhance - representations of task-irrelevant information when performing a demanding task, especially when there is no search requirement, and when there is no feature correspondence between the remembered and viewed stimuli. (i.e., the interaction between WM and visual input is more obvious for visual search for a remembered target). Why, in theory, would a visual region need to improve its coding of non-remembered information as a function of WM? This isn't meant to detract from the results, which are indeed very interesting and I think quite informative. The authors are correct that this is certainly relevant for sensory recruitment models of WM - there's clear evidence for a role of feedback from PFC to extrastriate cortex - but what role, specifically, each region plays in this task is critical to describe clearly, especially given the task-irrelevance of the input. Put another way: what if the animal was remembering an oriented grating? In that case, MI between spike-based measures and orientation would be directly relevant to questions of neural WM representations, as the remembered feature is itself being modeled. But here, the focus seems to be on incidental coding.

      Indeed, here we have focused on how maintaining spatial WM impacts visual processing of incoming sensory information, rather than on how the spatial WM signal itself is represented and maintained. Behaviorally, this impact on visual signals could be related to the effects of the content of WM on perception and reaction times (e.g., Soto et al. 2008, Awh et al. 1998, Teng et al. 2019), but no such link to behavior is shown in our data.

      Whether similar phase coding is also used to represent the content of object WM (for example, if the animal was remembering an oriented grating), or whether phase coding is only observed for WM’s modulation of the representation of incoming sensory signals, is an important question to be addressed in future work.

      (2) Related to the above, the phrasing of the second sentence of the Discussion (lines 291-292) is ambiguous - do the authors mean that the FEF sends signals that carry WM content to V4, or that FEF sends projections to V4, and V4 has the WM content? As presently phrased, either of these are reasonable interpretations, yet they're directly opposing one another (the next sentence clarifies, but I imagine the authors want to minimize any confusion).

      We have edited this sentence to read, “Within prefrontal areas, FEF sends direct projections to extrastriate visual areas, and activity in these projections reflects the content of WM.”

      (3) I'm curious about how the authors consider the spatial WM task here different from a cued spatial attention task. Indeed, both require sustained use of a location for further task performance. The section of the Discussion addressing similar results with attention (lines 307-311) presently just summarizes the similarities of results but doesn't offer a theoretical perspective for how/why these different types of tasks would be expected to show similar neural mechanisms.

      We have added discussion regarding the relationship of these results to previous findings during attention in the discussion section (lines 315-333).

      (4) As far as I can tell, there is no consideration of behavioral performance on the memory-guided saccade task (RT, precision) across the different stimulus background conditions. This should be reported for completeness, and to determine whether there is an impact of the (likely) task-irrelevant background on task performance. This analysis should also be reported for Figure 3's results characterizing how FEF inactivation disrupts behavior (if background conditions were varied, see point 7 below).

      We have added the effect of inactivation on behavioral RT and % correct across the different stimulus background conditions (Fig. S8). Background contrast and orientation did not impact either RT or % correct.

      (5) Results from Figure 2 (especially Figures 2A-B) concerning phase-locked spiking in V4 should be shown for 0%-contrast trials as well, as these trials better align with 'typical' WM tasks.

      We have added a new supplementary figure to show the effect of WM on V4 LFP power and SPL in 0% contrast trials (Fig. S6). These results (increases in beta LFP power and SPL) match our previous report for the effect of spatial WM on LFP power and SPL within extrastriate area MT (Bahmani et al. 2018).

      (6) The magnitude of SPL difference in aggregate (Figure 2B) is much, much smaller than that of the example site shown (Figure 2A), such that Figure 2A's neuron doesn't appear to be visible on Figure 2B's scatterplot. Perhaps a more representative sample could be shown? Or, the full range of x/y axes in Figure 2B could be plotted to illustrate the full distribution.

      We have updated Fig. 2A with a more representative sample neuron.

      (7) I'm a bit confused about the FEF inactivation experiments. In the Methods (lines 512-513), the authors mention there was no background stimulus presented during the inactivation experiment, and instead, a typical 8-location MGS task was employed. However, in the results on pg 8 (Lines 201-214), and Figure 3G, the authors quantify a phase code MI. The previous phase code MI analysis was looking at MI between each spike's phase and the background stimulus - but if there's no background, what's used to compute phase code MI? Perhaps what they meant to write was that, in addition to the primary task with a manipulation of background properties, an 8-location MGS task was additionally employed.

      The reviewer is correct that both tasks were used after inactivation (the 8-location task to assess the spread of the behavioral effect of inactivation, and the MGS-background task for measuring MI). We have edited the methods text to clarify.

      (8) How is % Correct defined for the MGS task? (what is the error threshold? Especially for the results described in lines 192-193).

      The % correct is defined as correct completed trials divided by the total number of trials; the target window was a circle with radius of 2 or 4 dva (depending on cue eccentricity). These details have been added to the Methods.
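
      As an illustration of the definition above, a minimal sketch of the % correct computation might look like the following. The function names, the tuple layout of a trial, and the choice to count an endpoint exactly on the window boundary as correct are assumptions made for this example; the response does not specify them, nor which eccentricities map to the 2 or 4 dva radius.

```python
import math

def is_correct(endpoint, target, radius_dva):
    """Return True if the saccade endpoint lands inside the circular target window.

    endpoint, target: (x, y) positions in degrees of visual angle (dva).
    radius_dva: window radius (2 or 4 dva in the response, depending on cue eccentricity).
    Assumption: a point exactly on the boundary counts as inside.
    """
    dx = endpoint[0] - target[0]
    dy = endpoint[1] - target[1]
    return math.hypot(dx, dy) <= radius_dva

def percent_correct(trials):
    """Percentage of trials whose endpoint falls inside the target window.

    trials: list of (endpoint_xy, target_xy, radius_dva) tuples, one per trial.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    n_correct = sum(is_correct(e, t, r) for e, t, r in trials)
    return 100.0 * n_correct / len(trials)

# Hypothetical trials: two near targets (2 dva window), two far targets (4 dva window).
example_trials = [
    ((10.5, 0.0), (10.0, 0.0), 2.0),  # 0.5 dva error, inside
    ((14.0, 0.0), (10.0, 0.0), 2.0),  # 4 dva error, outside
    ((0.0, 23.0), (0.0, 20.0), 4.0),  # 3 dva error, inside
    ((0.0, 25.0), (0.0, 20.0), 4.0),  # 5 dva error, outside
]
print(percent_correct(example_trials))
```

      Trials that were aborted before a saccade was made would need to be included in the denominator as incorrect if the total is meant to cover all trials; how such trials were handled is not stated here.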

      (9) The paragraph from lines 183-200 describes a number of behavioral results concerning "scatter" and "RT" - the RT shown seems extremely high, and perhaps is normalized. Details of this normalization should be included in the Methods. The "scatter" is listed as dva, but it's not clear how scatter is quantified (std dev of endpoint distribution? Mean absolute error), nor how target eccentricity is incorporated (as scatter is likely higher for greater target eccentricity).

      We have renamed ‘scatter’ to ‘saccade error’ in the text to match the figure. Both RT and saccade error are normalized for each session; details of this normalization are now provided in the Methods. Since error was normalized for each session before performing population statistics, no other adjustment for eccentricity was made.

    1. It’s Friday at 7:30 pm and Amy is really tired after work. Her wife isn’t home yet—she had to stay late—and so while she’d normally eat out, she’s not eager to go out alone, nor is she eager to make a big meal just for herself. She throws a frozen dinner in the microwave and heads to the living room to sit down on her couch to rest her legs. Once it’s done, she takes it out, eats it far too fast, and spends the rest of the night regretting her poor diet and busy day.

      Although not disagreeing with the use of personas to understand user problems, is it not concerning that through their creation we may be including our own personal biases and backgrounds? I found it hard to truly understand and empathize with the scenario, most likely because my own scenario and lifestyle is very different to Ko's. My positionality as a young, healthy college aged person would impact my ability to define issues with this user persona. I think that the designer's background and situation should be heavily considered before the creation of a user persona or scenario in order to minimize the implications of bias.

    1. Now, that doesn’t mean that a situation is undesirable to everyone. For one person a situation might be undesirable, but to another, it might be greatly desirable.

      This is an important statement that stands out to me. It connects to what I learned in INFO380 where we discussed how when companies are making designs and adding features, they have to consider which ones are the most beneficial and desirable. I think that this shows the importance of designers doing research and learning more about the stakeholders involved to help them make these decisions. If research isn't done properly, the company may waste a lot of resources and time. I also think it is important to consider underrepresented demographics that may be users of the product and see how they can possibly be considered in these decision-making processes. This makes me appreciate designers even more as this process is not easy and can be difficult having to make decisions that don't please some users but this might be something they learn as they develop their skills since you can't please everyone.

    1. Bots present a similar disconnect between intentions and actions. Bot programs are written by one or more people, potentially all with different intentions, and they are run by other people, or sometimes scheduled by people to be run by computers. This means we can analyze the ethics of the action of the bot, as well as the intentions of the various people involved, though those all might be disconnected.

      This part brings up a very interesting point about who is responsible for a bot's actions. I think it is whoever used it; the creator might have different intentions for its use, and the bot may get used differently.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2024-02465

      Corresponding author(s): Saravanan, Palani

      1. General Statements

      We would like to thank the Review Commons Team for handling our manuscript and the Reviewers for their constructive feedback and suggestions. In our revised manuscript, we have addressed and incorporated all the major suggestions of the reviewers, and we have also added new significant data on the role of Tropomyosin in regulation of endocytosis through its control over actin monomer pool maintenance and actin network homeostasis. We believe that with all these additions, our study has significantly gained in quality, strength of conclusions made, and scope for future work.

      2. Point-by-point description of the revisions

      Reviewer #1

      Evidence, reproducibility and clarity

      There are 2 Major issues -

      Having an -ala-ser- linker between the GFP and tropomyosin mimics acetylation. This is not the case, and more likely this linker acts as a spacer that allows tropomyosin polymers to form on the actin, and without it there is steric hindrance. A similar result would be seen with a simple flexible uncharged linker. It has been shown in a number of labs that the GFP itself masks the effect of the charge on the amino terminal methionine. This is consistent with NMR, crystallographic and cryo structural studies. Biochemical studies should be presented to demonstrate the impact of the linker, so that the conclusions stated, which provide the basis of a major part of this study, can be made.

      Response: We would like to clarify that all mNG-Tpm constructs used in our study contain a 40 amino-acid (aa) flexible linker between the N-terminal mNG fluorescent protein and the Tpm protein, as per our earlier published study (Hatano et al., 2022). During initial optimization, we also experimented with linker length, and the 40aa linker works optimally for clear visualization of Tpm on actin cable structures in budding yeast, fission yeast (both S. pombe and S. japonicus), and mammalian cells (Hatano et al., 2022). These constructs have since been used in other studies (Wirshing et al., 2023; Wirshing and Goode, 2024) and currently represent the best possible strategy to visualize Tpm isoforms in live cells. In our study, we characterized these proteins for functionality and found that both mNG-Tpm1 and mNG-Tpm2 were functional and can rescue the synthetic lethality observed in ∆tpm1∆tpm2 cells. During our study, we observed that mNG-Tpm1 expression from a single-copy integration vector did not restore full length actin cables in ∆tpm1 cells (Fig. 1B, 1C). We hypothesized that this could be a result of reduced binding affinity of the tagged tropomyosin due to lack of the normal N-terminal acetylation, which stabilizes the N-terminus. The 40aa linker is unstructured and may not be able to neutralize the charge on the N-terminal Methionine; thus, we inserted the -Ala-Ser- dipeptide, which has been routinely used in in vitro biochemical studies to stabilize the N-terminal helix and impart a similar effect as the N-terminal acetylation (Alioto et al., 2016; Palani et al., 2019; Christensen et al., 2017) by restoring normal binding affinity of Tpm to F-actin (Monteiro et al., 1994; Greenfield et al., 1994). We observed that addition of the -Ala-Ser- dipeptide to the mNG-Tpm fusion indeed restored full length actin cables when expressed in ∆tpm1 cells, performing significantly better in our in vivo experiments (Fig. 1B, 1C).
We agree with the reviewer that the -AS- dipeptide addition may not mimic N-terminal acetylation structurally but, as per previous studies, it may stabilize the N-terminus of Tpm and allow normal head-to-tail dimer formation (Greenfield et al., 1994; Monteiro et al., 1994; Frye et al., 2010). We have discussed this in our new Discussion section (Lines 350-372). Since the addition of the -AS- dipeptide was referred to as "acetyl-mimic (am)" in a previous study (Alioto et al., 2016), we continued to use the same nomenclature in our study. Now, as per your suggestions and to be more accurate, we have renamed the "mNG-amTpm" constructs as "mNG-ASTpm" throughout the study so as not to confuse readers or claim that the -AS- addition mimics acetylation. In any case, we have not seen any ill effect of the -AS- dipeptide introduction in addition to our 40 amino acid linker, suggesting that it can also be considered part of the linker. Although we agree with the reviewer that biochemical characterization of the effect of the linker would be important, we strongly believe that it is currently outside the scope of this study and should be taken up in future work with these proteins. Our study has mainly aimed to understand the functionality and utility of these mNG-Tpm fusion proteins for cell biological experiments in vivo, which was not done earlier in any other model system.

      My major issue however is making the conclusions stated here, using an amino-terminal fluorescent protein tag that is likely to impact any type of isoform selection at the end of the actin polymer. Carboxyl terminal tagging may have a reduced effect, but modifying the ends of the tropomyosin, which are integral in stabilising end to end interactions with itself on the actin filament, never mind any section systems that may or may not be present in the cell, is not appropriate.

      Response: We agree with the reviewer that N-terminal tagging of tropomyosin may have effects on its function, but these constructs represent the only fluorescently tagged functional tropomyosin constructs available currently while C-terminal fusions are either non-functional (we were unable to construct strains with endogenous Tpm1 gene fused C-terminally to GFP) or do not localize clearly to actin structures (See Figure R1 showing endogenous C-terminally tagged Tpm2-yeGFP that shows almost no localization to actin cables). To our knowledge, our study represents a first effort to understand the question of spatial sorting of Tpm isoforms, Tpm1 and Tpm2, in S. cerevisiae and any future developments with better visualization strategies for Tpm isoforms without compromising native N-terminal modifications and function will help improve our understanding of these proteins in vivo. We have also discussed these possibilities in our new Discussion section (Lines 391-396).

      Significance

      This paper explores the role of formin in determining the localisation of different tropomyosins to different actin polymers and cellular locations within budding yeast. Previous studies have indicated a role for the actin nucleating proteins in recruiting different forms of tropomyosin within fission yeast. In mammalian cells there is variation in the role of formins in affecting tropomyosin localisation - variation between cell type. There is also evidence that other actin binding proteins, and tropomyosin abundance play roles in regulating the tropomyosin-actin association according to cell type. Biochemical studies previously undertaken using budding yeast and fission yeast have shown that the core actin polymerisation domain of formins does not interact with tropomyosin directly. The significance of this study, given the above, and the concerns raised is not clear to this reviewer.

      Response: Our study explores multiple facets of Tropomyosin (Tpm) biology. The lack of functional tagged Tpm has been a major bottleneck in understanding Tpm isoform diversity and function across eukaryotes. In our study, we characterize the first functional tagged Tpm proteins (Fig. 1, Fig. S1) and use them to answer long-standing questions about localization and spatial sorting of Tpm isoforms in the model organism S. cerevisiae (Fig. 2, Fig. 3, Fig. S2, Fig. S3). We also discover that the dual Tpm isoforms, Tpm1 and Tpm2, are functionally redundant for actin cable organization and function, while having gained divergent functions in Retrograde Actin Cable Flow (RACF) (Fig. 4, Fig. 5A-D, Fig. S4, Fig. S5, Fig. S6). We have now added new data on the role of global Tpm levels in controlling endocytosis via maintenance of normal linear-to-branched actin network homeostasis in S. cerevisiae (Fig. 5E-G). We respectfully differ with the reviewer on their assessment of our study and request the reviewer to read our revised manuscript which discusses the significance, limitations, and future perspectives of our study in detail.

      Reviewer #2

      Evidence, reproducibility and clarity

      This manuscript by Dhar, Bagyashree, Palani and colleagues examines the function of the two tropomyosins, Tpm1 and Tpm2, in the budding yeast S. cerevisiae. Previous work had shown that deletion of tpm1 and tpm2 causes synthetic lethality, indicating overlapping function, but also proposed that the two tropomyosins have distinct functions, based on the observation that strong overexpression of Tpm2 causes defects in bud placement and fails to rescue tpm1∆ phenotypes (Drees et al, JCB 1995). The manuscript first describes very functional mNeonGreen tagged versions of Tpm1 and Tpm2, where an alanine-serine dipeptide is inserted before the first methionine to mimic acetylation. It then proposes that Tpm1 and Tpm2 exhibit indistinguishable localization and that low level overexpression (?) of Tpm2 can replace Tpm1 for stabilization of actin cables and cell polarization, suggesting almost completely redundant functions. They also propose a specific function of Tpm2 in regulating retrograde actin cable flow.

      Overall, the data are very clean, well presented and quantified, but in several places are not fully convincing of the claims. Because the claims that Tpm1 and Tpm2 have largely overlapping function and localization are in contradiction to previous publications in S. cerevisiae and also different from data published in other organisms, it is important to consolidate them. There are fairly simple experiments that should be done to consolidate the claims of indistinguishable localization, and levels of expression, for which the authors have excellent reagents at their disposal.

      1. Functionality of the acetyl-mimic tagged tropomyosin constructs: The overall very good functionality of the tagged Tpm constructs is convincing, but the authors should be more accurate in their description, as their data show that they are not perfectly functional. For instance, the use of "completely functional" in the discussion is excessive. In the results, the statement that mNG-Tpm1 expression restores normal growth (page 3, line 69) is inaccurate. Fig S1C shows that tpm1∆ cells expressing mNG-Tpm1 grow more slowly than WT cells. (The next part of the same sentence, stating it only partially restores length of actin cables should cite only Fig S1E, not S1F.) Similarly, the growth curve in Fig S1C suggests that mNG-amTpm1, while better than mNG-Tpm1 does not fully restore the growth defect observed in tpm1∆ (in contrast to what is stated on p. 4 line 81). A more stringent test of functionality would be to probe whether mNG-amTpm1 can rescue the synthetic lethality of the tpm1∆ tpm2∆ double mutant, which would also allow to test the functionality of mNG-amTpm2.

      Response: We would like to thank the reviewer for their feedback and suggestions. Based on the suggestions, we have now more accurately described the growth rescue observed by expression of mNG-ASTpm1 in ∆tpm1 cells in the revised text. We have also removed the use of "completely functional" to describe mNG-Tpm functionality and corrected any errors in Figure citations in the revised manuscript.

      As per the reviewer's suggestion, we have now tested rescue of the synthetic lethality of ∆tpm1∆tpm2 cells by expression of all mNG-Tpm variants. We find that all of them are capable of restoring the viability of ∆tpm1∆tpm2 cells when expressed under their native promoters via a high-copy plasmid (pRS425) (Fig. S1E), but only mNG-Tpm1 and mNG-ASTpm1 restored viability of ∆tpm1∆tpm2 cells when expressed under their native promoters via an integration plasmid (pRS305) (Fig. S1F). These results clearly suggest that while both mNG-Tpm1 and mNG-Tpm2 constructs are functional, Tpm1 tolerates the presence of the N-terminal fluorescent tag better than Tpm2. These observations enhance our understanding of the functionality of these mNG-Tpm fusion proteins and will be a useful resource for their usage and experimental design in future studies in vivo.

      It would also be nice to comment on whether the mNG-amTpm constructs are really mimicking acetylation. Given the Ala-Ser peptide ahead of the starting Met is linked N-terminally to mNG, it is not immediately clear it will have the same effect as a free acetyl group decorating the N-terminal Met.

      Response: We agree with the reviewer's observation and for the sake of clarity and accuracy, we have now renamed "mNG-amTpm" with "mNG-ASTpm". The use of -AS- dipeptide is very routine in studies with Tpm (Alioto et al., 2016; Palani et al., 2019; Christensen et al., 2017) and its addition restores normal binding affinities to Tpm proteins purified from E. coli (Monteiro et al., 1994). We agree with the reviewer that the -AS- dipeptide addition may not mimic N-terminal acetylation structurally but as per previous studies, it may help neutralize the impact of a freely protonated Met on the alpha-helical structure and stabilize the N-terminus helix of Tpm and allow normal head-to-tail dimer formation (Monteiro et al., 1994; Frye et al., 2010; Greenfield et al., 1994). Consistent with this, we also observe a highly significant improvement in actin cable length when expressing mNG-ASTpm as compared to mNG-Tpm in ∆tpm1 cells, suggesting an improvement in function probably due to increased binding affinity (Fig. 1B, 1C). We have also discussed this in our answer to Question 1 of Reviewer 1 and the revised manuscript (Lines 350-372).

      Localization of Tpm1 and Tpm2: Given the claimed full functionality of mNG-amTpm constructs and the conclusion from this section of the paper that relative local concentrations may be the major factor in determining tropomyosin localization to actin filament networks, I am concerned that the analysis of localization was done in strains expressing the mNG-amTpm construct in addition to the endogenous untagged genes. (This is not expressly stated in the manuscript, but it is my understanding from reading the strain list.) This means that there is a roughly two-fold overexpression of either tropomyosin, which may affect localization. A comparison of localization in strains where the tagged copy is the sole Tpm1 (respectively Tpm2) source would be much more conclusive. This is important as the results are making a claim in opposition to previous work and observation in other organisms.

      Response: We thank the reviewer for this observation and their suggestions. We agree that relative concentrations of functional Tpm1 and Tpm2 in cells may influence the extent of their localizations. As per the reviewer's suggestion, we have now conducted our quantitative analysis in cells lacking endogenous Tpm1 and only expressing mNG-ASTpm1 from an integrated plasmid copy at the leu2 locus, and the data is presented in new Figure S3. We compared Tpm-bound cable length (Fig. S3A, S3B) and Tpm-bound cable number (Fig. S3A, S3C) along with actin cable length (Fig. S3D, S3E) and actin cable number (Fig. S3D, S3F) in wildtype, ∆bnr1, and ∆bni1 cells. Our analysis revealed that mNG-ASTpm1 localized to actin cable structures in wildtype, ∆bnr1, and ∆bni1 cells, and that the decrease observed in Tpm-bound cable length and number upon loss of either Bnr1 or Bni1 was accompanied by a corresponding decrease in actin cable length and number. Thus, this analysis reached the same conclusion as our earlier analysis (Fig. 2) that mNG-ASTpm1 does not show preference between Bnr1 and Bni1-made actin cables. mNG-ASTpm2 did not restore functionality when expressed as a single integrated copy in ∆tpm1∆tpm2 cells (new results in Fig. S1E, S1F, S5A); thus, we could not conduct a similar analysis for mNG-ASTpm2. This suggests that use of mNG-ASTpm2 would be more meaningful in the presence of endogenous Tpm2, as previously done in Fig. 2D-F.

      We have now also performed additional yeast mating experiments with cells lacking the bnr1 gene and expressing either mNG-ASTpm1 or mNG-ASTpm2, and the data is shown in new Figure 3. We observe that both mNG-ASTpm1 and mNG-ASTpm2 localize to the mating fusion focus in a Bnr1-independent manner (Fig. 3B, 3D), suggesting that they bind to Bni1-made actin cables that are involved in polarized growth of the mating projection. These results also add strength to our conclusion that Tpm1 and Tpm2 localize to actin cables irrespective of which formin nucleates them. Overall, these new results highlight and reiterate our model of formin-isoform independent binding of Tpm1 and Tpm2 in S. cerevisiae.

      In fact, although the authors conclude that the tropomyosins do not exhibit preference for certain actin structures, in the images shown in Fig 2A and 2D, there seems to be a clear bias for Tpm1 to decorate cables preferentially in the bud, while Tpm2 appears to decorate them more in the mother cell. Is that a bias of these chosen images, or does this reflect a more general trend? A quantification of relative fluorescence levels in bud/mother may be indicative.

      Response: We thank the reviewer for pointing this out. Our data and analysis do not suggest that Tpm1 and Tpm2 show any preference for decoration of cables in either the mother or the bud compartment. As per the reviewer's suggestion, we have now quantified the ratio of mean mNG fluorescence in the bud to the mother (Bud/Mother) and the data is shown in Figure S2G. The bud-to-mother ratio was similar for mNG-ASTpm1 and mNG-ASTpm2 in wildtype cells, and the ratio increased in ∆bnr1 cells and decreased in ∆bni1 cells for both mNG-ASTpm1 and mNG-ASTpm2 (Fig. S2G). This is consistent with the decreased actin cable signal in the mother compartment in ∆bnr1 cells and decreased actin cable signal in the bud compartment in ∆bni1 cells (Fig. S2A-D). Thus, our new analysis shows that both mNG-ASTpm1 and mNG-ASTpm2 have similar changes in their concentration (mean fluorescence) upon loss of either formin, Bnr1 or Bni1, and show similar ratios in wildtype cells as well, suggesting no preference for binding to actin cables in either the bud or the mother compartment. The preference inferred by the reviewer seems to be a bias of the current representative images and thus, we have replaced the images in Fig. 2A, 2D to more accurately represent the population.
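
      For readers unfamiliar with this kind of measurement, the Bud/Mother ratio described above can be sketched as the mean fluorescence of the bud region divided by the mean fluorescence of the mother region. The function name and the representation of each compartment as a flat list of pixel intensities are assumptions for illustration; the response does not state how regions were segmented or whether background was subtracted, so neither step is included.

```python
from statistics import mean

def bud_to_mother_ratio(bud_pixels, mother_pixels):
    """Ratio of mean fluorescence in the bud to mean fluorescence in the mother.

    bud_pixels, mother_pixels: non-empty sequences of pixel intensities taken
    from the two compartments of a single cell (hypothetical input format).
    """
    mother_mean = mean(mother_pixels)
    if mother_mean == 0:
        raise ValueError("mother compartment has zero mean intensity")
    return mean(bud_pixels) / mother_mean

# Hypothetical intensities: the bud is on average twice as bright as the mother.
print(bud_to_mother_ratio([200, 220, 180], [100, 110, 90]))
```

      A ratio above 1 indicates relatively more signal in the bud, and a ratio below 1 indicates relatively more signal in the mother; comparing the distribution of this ratio across strains is what allows the two isoforms to be compared.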

      The difficulty in preserving mNG-amTpm after fixation means that authors could not quantify relative Tpm/actin cable directly in single fixed cells. Did they try to label actin cables with Lifeact instead of using phalloidin, and thus perform the analysis in live cells?

      Response: We did not use LifeAct for our analysis as LifeAct is known to cause expression-dependent artefacts in cells (Courtemanche et al., 2016; Flores et al., 2019; Xu and Du, 2021) and it also competes with proteins that regulate normal cable organization like cofilin. Use of LifeAct would necessitate standardization of expression to avoid such artefacts in vivo. Also, phalloidin staining provides the best staining of actin cables and allows for better quantitative results in our experiments. The use of LifeAct along with mNG-Tpm would also require optimization with red fluorescent proteins, which usually tend to have lower brightness and photostability. However, during the revision of our study, a new study from Prof. Goode's lab developed and optimized expression of new LifeAct-3xmNeonGreen constructs for use in S. cerevisiae (Wirshing and Goode, 2024). Thus, a similar strategy of using tandem copies of bright and photostable red fluorescent proteins can be explored for use in combination with mNG-Tpm in future studies.

      Complementation of tpm1∆ by Tpm2:

      I am confused about the quantification of Tpm2 expression by RT-PCR shown in Fig S3F. This figure shows that tpm2 mRNA expression levels are identical in cells with an empty plasmid or with a tpm2-encoding plasmid. In both strains (which lack tpm1), as well as in the WT control, one tpm2 copy is in the genome, but only one strain has a second tpm2 copy expressed from a centromeric plasmid, yet the results of the RT-PCR are not significantly different. (If anything, the levels are lower in the tpm2 plasmid-containing strain.) The methods state that the primers were chosen in the gene, so likely do not distinguish the genomic from the plasmid allele. However, the text claims a 1-fold increase in expression, and functional experiments show a near-complete rescue of the tpm1∆ phenotype. This is surprising and confusing and should be resolved to understand whether higher levels of Tpm2 are really the cause of the observed phenotypic rescue.

      The authors could for instance probe for protein levels. I believe they have specific nanobodies against tropomyosin. If not, they could use expression of functional mNG-amTpm2 to rescue tpm1∆. Here, the expression of the protein can be directly visualized.

      Response: We thank the reviewer for pointing this out. We would like to clarify that in our RT-qPCR experiments, the primers were chosen within the Tpm1 and Tpm2 genes and do not distinguish between transcripts from the endogenous or plasmid copy. We have now mentioned this in the Materials and Methods section of the revised manuscript. So, they represent a relative estimate of the total mRNA of these genes present in cells. We were consistently able to detect a ~19-fold increase in Tpm2 total mRNA levels as compared to wildtype and ∆tpm1 cells (Fig. S4D) when tpm2 was expressed from a high-copy plasmid (pRS425). This increase in Tpm2 mRNA levels was accompanied by a rescue in growth (Fig. S4A) and actin cable organization (Fig. S4B) of ∆tpm1 cells containing pRS425-ptpm2TPM2. When tpm2 was expressed from a low-copy number centromeric plasmid (pRS316), we detected a ~2-fold increase in Tpm2 transcript levels when using the tpm1 promoter and no significant change was detected when using the tpm2 promoter (Fig. S4E). We have made sure that these results are accurately described in the revised manuscript.

      As per the reviewer's suggestion, we have now conducted a more extensive analysis to ascertain the expression levels of Tpm2 in our experiments and the data is now presented in new Figure S5. We used mNG-ASTpm1 and mNG-ASTpm2 to rescue growth of ∆tpm1 (Fig. S5A) and correlated growth rescue with protein levels using quantified fluorescence intensity (Fig. S5B, S5C) and western blotting (anti-mNG) (Fig. S5D, S5E). We find that ∆tpm1 cells containing pRS425-ptpm1mNG-ASTpm1 had the highest protein level followed by pRS425-ptpm2 mNG-ASTpm2, pRS305-ptpm1mNG-ASTpm1, and the least protein levels were found in pRS305-ptpm2 mNG-ASTpm2 containing ∆tpm1 cells in both fluorescence intensity and western blotting quantifications (Fig. S5C, S5E). Surprisingly, we were not able to detect any protein in ∆tpm1 cells containing pRS305-ptpm2 mNG-ASTpm2 with western blotting (Fig. S5D), which was also accompanied by a lack of growth rescue (Fig. S5A). This is most likely due to weak expression from the native Tpm2 promoter, which is consistent with previous literature (Drees et al., 1995). Taken together, this data clearly shows that the rescue observed in ∆tpm1 cells is due to increased expression of mNG-ASTpm2 in cells and supports our conclusion that an increase in Tpm2 expression leads to restoration of normal growth and actin cables in ∆tpm1 cells.

      Specific function of Tpm2:

      The data about the retrograde actin flow is interpreted as a specific function of Tpm2, but there is no evidence that Tpm1 does not also share this function. To reach this conclusion one would have to investigate retrograde actin flow in tpm1∆ (difficult as cables are weak) or for instance test whether Tpm1 expression restores normal retrograde flow to tpm2∆ cells.

      Response: We agree with the reviewer and as per the reviewer's suggestion, we have performed another experiment which includes wildtype and ∆tpm2 cells containing empty pRS316 vector or pRS316-ptpm2TPM1 or pRS316-ptpm1TPM1. We find that RACF rate increased in ∆tpm2 cells as compared to wildtype and was restored to wildtype levels by exogenous expression of Tpm2 but not Tpm1 (Fig. S6E, S6F). Since actin cables were not detectable in ∆tpm1 cells, we measured RACF rates in ∆tpm1 cells expressing Tpm1 or Tpm2 from a plasmid copy, which restored actin cables as shown previously in Fig. 5A-C. We observed that RACF rates were similar to wildtype in ∆tpm1 cells expressing either Tpm1 or Tpm2 (Fig. S6E, S6F), suggesting that Tpm1 is not involved in RACF regulation. Taken together, these results suggest a specific role for Tpm2, but not Tpm1, in RACF regulation in S. cerevisiae, consistent with previous literature (Huckaba et al., 2006).

      Minor comments: 1. The growth of tpm1∆ with empty plasmid in Fig S3A is strangely strong (different from other figures).

      Response: We thank the reviewer for pointing this out. We have now repeated the drop test multiple times (Fig. R2), but we see similar growth rates as the drop test already presented in Fig. S4A. At this point, it would be difficult to ascertain the basis of this difference observed at 23°C and 30°C, but a recent study that links leucine levels to actin cable stability (Sing et al., 2022) might explain the faster growth of these ∆tpm1 cells containing a leu2 gene carrying high-copy plasmid. However, there is no effect on growth rate at 37°C, which is consistent with other spot assays shown in Fig. S1D, S4F, S5A.

      Significance

      I am a cell biologist with expertise in both yeast and actin cytoskeleton.

      The question of how tropomyosin localizes to specific actin networks is still open and a current avenue of study. Studies in other organisms have shown that different tropomyosin isoforms, or their acetylated vs non-acetylated versions, localize to distinct actin structures. Proposed mechanisms include competition with other ABPs and preference imposed by the formin nucleator. The current study re-examines the function and localization of the two tropomyosin proteins from the budding yeast and reaches the conclusion that they co-decorate all formin-assembled structures and also share most functions, leading to the simple conclusion that the more important contribution of Tpm1 is simply linked to its higher expression. Once consolidated, the study will appeal to researchers working on the actin cytoskeleton.

      We thank the reviewer for their positive assessment of our work and the constructive feedback that has greatly improved the quality of our study. After addressing the points raised by the reviewer, we believe that our study has significantly gained in consolidating the major conclusions of our work.

      **Referees cross-commenting**

      Having read the other reviewers' comments, I do agree with reviewer 1 that it is not clear whether the Ala-Ser linker really mimics acetylation. I am less convinced than reviewer 3 that the key conclusions of the study are well supported, notably the issue of Tpm2 expression levels is not convincing to me.

      Response: We acknowledge the reviewer's point about the effect of the Ala-Ser dipeptide and would request the reviewer to refer to our response to Reviewer 1 (Question 1) for a more detailed discussion on this. We have also extensively addressed the question of Tpm2 expression levels as suggested by the reviewer (new data in Figure S5), which has further strengthened the conclusions of our study.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: The study presents the first fully functional fluorescently tagged Tpm proteins, enabling detailed probing of Tpm isoform localization and functions in live cells. The authors created a modified fusion protein, mNG-amTpm, which mimicked native N-terminal acetylation and restored both normal growth and full-length actin cables in yeast cells lacking native Tpm proteins, demonstrating the constructs' full functionality. They also show that Tpm1 and Tpm2 do not have a preference for actin cables nucleated by different formins (Bnr1 and Bni1). Contrary to previous reports, the study found that overexpressing Tpm2 in Δtpm1 cells could restore growth rates and actin cable formation. Furthermore, it is shown that despite its evolutionary divergence, Tpm2 retains actin-protective functions and can compensate for the loss of Tpm1, contributing to cellular robustness.

      Major and Minor Comments: 1. The key conclusions of this paper are convincing. However, I suggest that more detail be provided regarding the image analysis used in this study. Specifically, since threshold settings can impact the quality of the generated data and, therefore, its interpretation, it would be useful to see a representative example of the quantification methods used for actin cable length/number (as in refs. 80 and 81) and mitochondria morphology. These could be presented as Supplemental Figures. Additionally, it would help to interpret the results if the authors could be more specific about the statistical tests that were used.

      Response: We agree with the reviewer's suggestions and have now updated our Materials and Methods section to describe the image analysis pipelines used in more detail. We have also added examples of the quantification procedure for actin cable length/number and mitochondrial morphology as an additional Supplementary Figure S7. Briefly, the following pipelines were used:

      • Actin cable length and number analysis: This was done exactly as described in McInally et al., 2021 and McInally et al., 2022. Actin cables were manually traced in Fiji as shown in Fig. S7A, and the trace files for each cell were then run through a Python script (adapted from McInally et al., 2022) that outputs mean actin cable length and number per cell.
      • Mitochondria morphology: The Mitochondria Analyzer plug-in in Fiji was used to segment out the mitochondrial fragments. The parameters used for 2D segmentation of mitochondria were first optimized using "2D Threshold Optimize" to find the most accurate segmentation, and the same parameters were then run on all images. After segmentation of the mitochondrial network, fragment number was measured using the "Analyze Particles" function in Fiji. An example of the overall process is shown in Fig. S7B.

      As per the reviewer's suggestion, we have now included a description of the statistical test used in the Figure Legend of each Figure in the revised manuscript. We have used one-way ANOVA with Tukey's multiple comparisons test, the Kruskal-Wallis test with Dunn's multiple comparisons, and the unpaired two-tailed t-test using the built-in functions in GraphPad Prism (v.6.04).
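
      The per-cell summary step of the actin cable analysis described above can be sketched as follows. This is a minimal illustration and not the script adapted from McInally et al., 2022: the input layout (a mapping from a cell identifier to a list of traced cables, each cable a list of (x, y) points in micrometres) and the function names `cable_length` and `summarize_cells` are assumptions made for the example.

```python
import math

def cable_length(points):
    """Length of one traced cable: sum of distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def summarize_cells(traces):
    """Return {cell_id: (cable_count, mean_cable_length)} for hypothetical trace data."""
    summary = {}
    for cell_id, cables in traces.items():
        lengths = [cable_length(c) for c in cables]
        mean_len = sum(lengths) / len(lengths) if lengths else 0.0
        summary[cell_id] = (len(lengths), mean_len)
    return summary

# Hypothetical traces (coordinates in micrometres), not data from the study.
example_traces = {
    "cell_1": [[(0, 0), (3, 4)], [(0, 0), (0, 2), (0, 5)]],
    "cell_2": [[(1, 1), (1, 4)]],
}
print(summarize_cells(example_traces))
```

      Summing segment lengths along each manual trace keeps the measurement independent of how many points were clicked per cable, which is why the sketch works on point lists rather than on endpoints only.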

      **Referees cross-commenting**

      I agree with both reviewers 1 and 2 regarding the issues with the Ala-Ser acetylation mimic and Tpm2 expression levels, respectively. I think the authors should be more careful in how they frame the results, but I consider that these issues do not invalidate the main conclusions of this study.

      Response: We acknowledge the reviewer's concern about the Ala-Ser dipeptide and would request them to refer to our earlier discussion on this in response to Reviewer 1 (Question 1) and Reviewer 2 (Question 2). We would also request the reviewer to refer to our answer to Reviewer 2 (Question 6), where we have extensively addressed the question of Tpm2 expression levels and their effect on rescue of ∆tpm1 cells. These data are now presented as new Figure S5 in our revised manuscript.

      Reviewer #3 (Significance (Required)):

      The finding that Tpm2 can compensate for the loss of Tpm1, restoring actin cable organization and normal growth rates, challenges previous assumptions about the non-redundant functions of these isoforms in Saccharomyces cerevisiae (ref. 16). It also supports a concentration-dependent and formin-independent localization of Tpm isoforms to actin cables in this species. The development of fully functional fluorescently tagged Tpm proteins is a significant methodological advancement. This advancement overcomes previous visualization challenges and allows for accurate in vivo studies of Tpm function and regulation in S. cerevisiae.

      The findings will be of particular interest to researchers in the field of cellular and molecular biology who study actin cytoskeleton dynamics. Additionally, it will be relevant for those utilizing advanced microscopy and live-cell imaging techniques.

      As a researcher, my experience lies in cytoskeleton dynamics and protein interactions, though I do not have specific experience related to tropomyosin. I use different yeast species as models and routinely employ live-cell imaging as a tool.

      We thank the reviewer for their positive outlook and assessment of our study. We have incorporated all their suggestions, and we are confident that the revised manuscript has significantly improved in quality due to these additions.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Summary

      In this extensive comparative study, Moreno-Borrallo and colleagues examine the relationships between plasma glucose levels, albumin glycation levels, diet and life-history traits across birds. Their results confirmed the expected positive relationship between plasma blood glucose level and albumin glycation rate but also provided findings that are somewhat surprising or contrast with findings of some previous studies (positive relationships between blood glucose and lifespan, or absent relationships between blood glucose and clutch mass or diet). This is the first extensive comparative analysis of glycation rates and their relationships to plasma glucose levels and life history traits in birds that is based on data collected in a single study, with blood glucose and glycation measured using unified analytical methods (except for blood glucose data for 13 species collected from a database).

      Strengths

      This is an emerging topic gaining momentum in evolutionary physiology, which makes this study a timely, novel and important contribution. The study is based on a novel data set collected by the authors from 88 bird species (67 in captivity, 21 in the wild) of 22 orders, except for 13 species, for which data were collected from a database of veterinary and animal care records of zoo animals (ZIMS). This novel data set itself greatly contributes to the pool of available data on avian glycemia, as previous comparative studies either extracted data from various studies or a ZIMS database (therefore potentially containing much more noise due to different methodologies or other unstandardised factors), or only collected data from a single order, namely Passeriformes. The data further represent the first comparative avian data set on albumin glycation obtained using a unified methodology. The authors used LC-MS to determine glycation levels, which does not have the problems with specificity and sensitivity that may occur with assays used in previous studies. The data analysis is thorough, and the conclusions are substantiated. Overall, this is an important study representing a substantial contribution to the emerging field of evolutionary physiology focused on ecology and evolution of blood/plasma glucose levels and resistance to glycation.

      Weaknesses

      Unfortunately, the authors did not record handling time (i.e., time elapsed between capture and blood sampling), which may be an important source of noise because handling-stress-induced increase in blood glucose has previously been reported. Moreover, the authors themselves demonstrate that handling stress increases variance in blood glucose levels. Both effects (elevated mean and variance) are evident in Figure ESM1.2. However, this likely makes their significant findings regarding glucose levels and their associations with lifespan or glycation rate more conservative, as highlighted by the authors.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I understand that your main objective regarding glycation rate and lifespan was to analyse the species resistance to glycation with respect to lifespan, while factoring out the species-specific variation in blood glucose level. However, I still believe that the absolute glycation level (i.e., not controlled for blood glucose level) may also be important for the evolution of lifespan. Given that blood glucose is positively related to both glycation and lifespan (although with a plateau in the latter case), lifespan could possibly be positively correlated with absolute glycation levels. If significant, that would be an interesting and counterintuitive finding, which would call for an explanation, thereby potentially stimulating further research. If not significant, it would show that long-lived species do not have higher glycation levels, despite having higher blood glucose levels, thereby strengthening your argument about higher resistance of long-lived species to glycation. So, in my opinion, the inclusion of an additional model of glycation level on life-history traits, without controlling for blood glucose, is worth considering.

      We now include this model as supplementary material and refer to it in several parts of the text, where we also address some of the issues discussed here.

      Lines 230-231: Please, provide a citation for these GVIF thresholds

      We include it now.

      Figure 3: I think that showing both glucose and glycation rate on the linear scale, rather than log scale, would better illustrate your conclusion - the slowing rise of glycation rate with increasing glucose levels.

      That is a good point, although it may also confuse readers to see a graph that represents the data differently from the models. Perhaps showing both graphs (as 3.A and 3.B) would solve this?
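
      The point about scale can be illustrated numerically. If a model is linear on the log-log scale with a slope below 1, the back-transformed relationship is a power law whose increments shrink as glucose rises, which is the slowing rise that a linear-scale plot would make visible. The intercept and slope below are hypothetical values chosen for illustration, not estimates from the study, and `glycation_from_glucose` is a name invented for this sketch.

```python
import math

def glycation_from_glucose(glucose, intercept=0.5, slope=0.6):
    """Back-transform log(glycation) = intercept + slope * log(glucose)."""
    return math.exp(intercept + slope * math.log(glucose))

# Equally spaced hypothetical glucose levels (arbitrary units).
glucose_levels = [5, 10, 15, 20, 25]
glycation = [glycation_from_glucose(g) for g in glucose_levels]

# Increase in glycation for each equal step in glucose.
increments = [b - a for a, b in zip(glycation, glycation[1:])]

for g, y in zip(glucose_levels, glycation):
    print(f"glucose={g:>2}  glycation={y:.2f}")
print("increments per step:", [round(d, 2) for d in increments])
```

      On the log-log scale the same relationship is a straight line, so the diminishing increments are only apparent after back-transformation.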

      Figure 4. I recommend stating in the caption that the whiskers do not represent interquartile ranges (a standard option in box plots) but credible intervals as mentioned in the current version of the public author response.

      Apologies, this was missed; it is now included. Note that the boxes still show the interquartile ranges of the posterior distributions, while the whiskers are the credible intervals.
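
      The distinction between the two summaries can be sketched with simulated posterior draws. The example assumes an equal-tailed 95% credible interval, which is not stated in the response (the interval width and type used in the study may differ), and the draws are simulated rather than taken from the fitted models.

```python
import random
import statistics

random.seed(1)
# Simulated posterior draws for one hypothetical parameter (not study output).
posterior = [random.gauss(0.3, 0.1) for _ in range(4000)]

# Box: quartiles of the posterior, i.e. the interquartile range.
q1, median, q3 = statistics.quantiles(posterior, n=4)

# Whiskers: equal-tailed 95% interval, taken from cut points spaced every 2.5%.
cuts = statistics.quantiles(posterior, n=40)
ci_low, ci_high = cuts[0], cuts[-1]

print(f"box (IQR):      {q1:.3f} to {q3:.3f}")
print(f"whiskers (95%): {ci_low:.3f} to {ci_high:.3f}")
```

      The box covers the central 50% of the posterior, whereas the whiskers in this sketch cover the central 95%, so the whiskers always extend beyond the box.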

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Guo and colleagues used a cell rounding assay to screen a library of compounds for inhibition of TcdB, an important toxin produced by Clostridioides difficile. Caffeic acid and derivatives were identified as promising leads, and caffeic acid phenethyl ester (CAPE) was further investigated.

      Strengths:

      Considering the high morbidity rate associated with C. difficile infections (CDI), this manuscript presents valuable research in the investigation of novel therapeutics to combat this pressing issue. Given the rising antibiotic resistance in CDI, the significance of this work is particularly noteworthy. The authors employed a robust set of methods and confirmatory tests, which strengthened the validity of the findings. The explanations provided are clear, and the scientific rationale behind the results is well-articulated. The manuscript is extremely well-written and organized. There is a clear flow in the description of the experiments performed. Also, the authors have investigated the effects of CAPE on TcdB in careful detail and reported compelling evidence that this is a meaningful and potentially useful metabolite for further studies.

      Weaknesses:

      This is really a manuscript about CAPE, not caffeic acid, and the title should reflect that. Also, a few details are missing from the description of the experiments. The authors should carefully revise the manuscript to ascertain that all details that could affect the interpretation of their results are presented clearly. Just as an example, the authors state in the results section that TcdB was incubated with compounds and then added to cells. Was there a wash step in between? Could compound carryover affect how the cells reacted independently from TcdB? This is just an example of how the authors should be careful with descriptions of their experimental procedures. Lastly, authors should be careful when drawing conclusions from the analysis of microbiota composition data. Ascribing causality to correlational relationships is a recurring issue in the microbiome field. Therefore, I suggest authors carefully revise the manuscript and tone down some statements about the impact of CAPE treatment on the gut microbiota.

      Thanks for your constructive suggestion. We have carefully revised the manuscript, including the description of title, results and methods sections.

      Reviewer #2 (Public review):

      Summary:

      This work is towards the development of nonantibiotic treatment for C. difficile. The authors screened a chemical library for activity against the C. difficile toxin TcdB, and found a group of compounds with antitoxin activity. Caffeic acid derivatives were highly represented within this group of antitoxin compounds, and the remaining portion of this work involves defining the mechanism of action of caffeic acid phenethyl ester (CAPE) and testing CAPE in mouse C. difficile infection model. The authors conclude CAPE attenuates C. difficile disease by limiting toxin activity and increasing microbial diversity during C. difficile infection.

      Strengths/ Weaknesses:

      The strategy employed by the authors is sound although not necessarily novel. A compound that can target multiple steps in the pathogenesis of C. difficile would be an exciting finding. However, the data presented do not convincingly demonstrate that CAPE attenuates C. difficile disease, and the mechanism of action of CAPE is not convincingly defined. The following points highlight the rationale for my evaluation.

      (1) The toxin exposure in tissue culture seems brief (Figure 1). Do longer incubation times between the toxin and cells still show CAPE prevents toxin activity?

      Thanks for your comments. The cytotoxicity assay was employed to directly assess the protective capacity of CAPE against cell death induced by TcdB. Our observations at 1 and 12 h post-TcdB exposure revealed that CAPE effectively mitigated the toxic effects of the TcdB at both time points, demonstrating its potent protective role. Please see Figure S1.

      (2) The conclusion that CAPE has antitoxin activity during infection would be strengthened if the mouse was pretreated with CAPE before toxin injections (Figure 1D).

      Thanks for your constructive comments. According to your suggestion, we administered TcdB 2 h after pretreatment with CAPE. The outcomes demonstrated that CAPE pretreatment significantly enhanced the survival rate of the intoxicated mice, confirming that CAPE retains its antitoxin efficacy during the infection process. Please see Figure S2.

      (3) CAPE does not bind to TcdB with high affinity as shown by SPR (Figure 4). A higher affinity may be necessary to inhibit TcdB during infection. The GTD binds with millimolar affinity and does not show saturable binding. Is the GTD the binding site for CAPE? Auto processing is also affected by CAPE indicating CAPE is binding non-GTD sites on TcdB.

      Thanks for your comments. Our findings indicate that the GTD domain is a critical binding site for CAPE. CAPE exerts its protective effects at multiple stages of TcdB-mediated cell death, including inhibiting TcdB's self-cleavage and blocking the activity of GTD, thereby preventing the glycosylation modification of Rac1 by TcdB.

      (4) In the infection model, CAPE does not statistically significantly attenuate weight loss during C. difficile infection (Figure 6). I recognize that weight loss is an indirect measure of C. difficile disease but histopathology also does not show substantial disease alleviation (see below).

      Thanks for your comments. Our comparative analysis revealed a notable difference in the body weight of mice on the third day post-infection (Figure 6B). Similarly, the dry/wet stool ratio exhibited a comparable pattern, suggesting that treatment with caffeic acid phenethyl ester (CAPE) ameliorated C. difficile-induced diarrhea to a significant degree (Figure 6C).

      (5) In the infection model (Figure 6), the histopathology analysis shows substantial improvement in edema but limited improvement in cellular infiltration and epithelial damage. Histopathology is probably the most critical parameter in this model and a compound with disease-modifying effects should provide substantial improvements.

      Thanks for your comments. Edema, inflammatory factor infiltration, and epithelial damage served as key evaluation metrics. Statistical analysis revealed that the pathological scores of mice treated with CAPE were markedly reduced compared to those in the model group (Figure 6F).

      (6) The reduction in C. difficile colonization is interesting. It is unclear if this is due to antitoxin activity and/or due to CAPE modifying the gut microbiota and metabolites (Figure 6). To interpret these data, a control is needed that has CAPE treatment without C. difficile infection or infection with an atoxicogenic strain.

      The observed reduction in C. difficile fecal colonization following drug treatment may be attributed to CAPE's antitoxin properties or its capacity to modify the intestinal microbiota and metabolites. These two mechanisms likely work in tandem to combat CDI. CDI is primarily triggered by the toxins A (TcdA) and B (TcdB) secreted by the bacterium. Certain therapies, including monoclonal antibodies like bezlotoxumab, target CDI by neutralizing these toxins, thereby mitigating gut damage and subsequent C. difficile colonization (1,2). The establishment of C. difficile in the gut is intricately linked to the equilibrium of the intestinal microbiota. Although antibiotic treatments can inhibit C. difficile growth, they may also disrupt the microbial balance, potentially facilitating the overgrowth of other pathogens. Consequently, interventions such as fecal microbiota transplantation (FMT) are designed to reestablish gut flora balance and thereby decrease C. difficile colonization (3,4). Moreover, the administration of probiotics and prebiotics is considered to reduce C. difficile colonization by modifying the gut environment (5,6).

      (7) Similar to the CAPE data, the melatonin data does not display potent antitoxin activity and the mouse model experiment shows marginal improvement in the histopathological analysis (Figure 9). Using 100 µg/ml of melatonin (~ 400 micromolar) to inactivate TcdB in cell culture seems high. Can that level be achieved in the gut?

      The uptake and distribution of melatonin within the body vary with the dose administered. For instance, in rats the bioavailability of melatonin following administration was found to be 53.5%, whereas in dogs bioavailability was nearly complete (100%) at a dose of 10 mg/kg, yet it decreased to 16.9% at a lower dose of 1 mg/kg (7). These data suggest that the absorption of melatonin differs across animal species and is influenced by the dose administered. Moreover, they point to the potentially high bioavailability of melatonin, implying that a dose of 200 mg/kg should be adequate to achieve the desired concentration in the body post-administration.

      (8) The following parameters should be considered and would aid in the interpretation of this work. Does CAPE directly affect the growth of C. difficile? Does CAPE affect the secretion of TcdB from C. difficile? Does CAPE alter the sporulation and germination of C. difficile?

      We incorporated CAPE into the MIC assay for detecting C. difficile, as well as for assessing the sporulation capacity of C. difficile and evaluating the secretion level of TcdB. The findings revealed that CAPE markedly repressed tcdB transcription at a concentration of 16 μg/mL and effectively suppressed the growth and sporulation of C. difficile BAA-1870 at a concentration of 32 μg/mL. Please see Figure S3.
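
      The concentration raised in point (7), 100 µg/ml of melatonin being roughly 400 micromolar, can be checked with a short unit conversion. The molar mass of melatonin (C13H16N2O2, about 232.28 g/mol) is a standard value that is not stated in the review, and the function name is invented for this sketch.

```python
MELATONIN_MOLAR_MASS = 232.28  # g/mol for C13H16N2O2 (not given in the source)

def ug_per_ml_to_micromolar(conc_ug_per_ml, molar_mass_g_per_mol):
    """Convert a mass concentration in ug/mL to a molar concentration in uM."""
    # ug/mL equals mg/L; mg/L divided by g/mol gives mmol/L,
    # so multiplying by 1000 yields umol/L.
    return conc_ug_per_ml / molar_mass_g_per_mol * 1000

melatonin_um = ug_per_ml_to_micromolar(100, MELATONIN_MOLAR_MASS)
print(f"100 ug/mL melatonin is about {melatonin_um:.0f} uM")
```

      The result is in the low hundreds of micromolar, in line with the reviewer's approximate figure.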

      References:

      (1) Skinner AM, et al. Efficacy of bezlotoxumab to prevent recurrent Clostridioides difficile infection (CDI) in patients with multiple prior recurrent CDI. Anaerobe. 2023 Dec; 84: 102788.

      (2) Wilcox MH, et al. Bezlotoxumab for Prevention of Recurrent Clostridium difficile Infection. N Engl J Med. 2017 Jan 26;376(4):305-317.

      (3) Khoruts A, Sadowsky MJ. Understanding the mechanisms of faecal microbiota transplantation. Nat Rev Gastroenterol Hepatol. 2016 Sep;13(9):508-16.

      (4) Khoruts A, Staley C, Sadowsky MJ. Faecal microbiota transplantation for Clostridioides difficile: mechanisms and pharmacology. Nat Rev Gastroenterol Hepatol. 2021 Jan;18(1):67-80.

      (5) Mills JP, Rao K, Young VB. Probiotics for prevention of Clostridium difficile infection. Curr Opin Gastroenterol. 2018 Jan;34(1):3-10.

      (6) Lau CS, Chamberlain RS. Probiotics are effective at preventing Clostridium difficile-associated diarrhea: a systematic review and meta-analysis. Int J Gen Med. 2016 Feb 22; 9:27-37.

      (7) Yeleswaram K, et al. Pharmacokinetics and oral bioavailability of exogenous melatonin in preclinical animal models and clinical implications. J Pineal Res. 1997 Jan;22(1):45-51.

      Reviewer #3 (Public review):

      Summary:

      The study is well written, and the results are solid and well demonstrated. It shows a field that can be explored for the treatment of CDI.

      Strengths:

      The results are really good, and the CAPE shows a good and promising alternative for treating CDI. The methodology and results are well presented, with tables and figures that corroborate them. It is solid work and very promising.

      Weaknesses:

      Some references are too old or missing.

      Thanks for your constructive suggestion. We have included and refreshed several references to enhance the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      While the manuscript convincingly demonstrates that CAPE affects the TcdB toxin and reduces its toxicity in vitro, it would be beneficial to include data on the effect of CAPE on the growth of C. difficile. This would help ensure that the observed in vivo effects are not merely due to reduced bacterial growth but rather due to the specific action of CAPE on the toxin.

      Thanks for your constructive suggestion. We have augmented our findings with the impact of CAPE on the bacteria themselves, revealing that CAPE not only hampers the growth of the bacterial cells but also suppresses their capacity to produce spores. Please see Figure S3.

      (1) Line 41, line 115 - authors should clarify what they mean when mentioning Bacteroides within parentheses.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (2) Line 71 - Is C. difficile really found "in the environment"?

      Thanks for your comments. C. difficile is prevalent across various natural settings, including soil and water ecosystems. A study has identified highly diverse strains of this bacterium within environmental samples (1). Moreover, the significant presence of C. difficile in soil and lawn specimens collected near Australian hospitals indicates that the organism is indeed a common inhabitant of the environment (2).

      (3) Lines 128-130 - Was there a wash step here? What could be the impact of compound carryover in this experiment?

      Thanks for your comments. Following pre-incubation of TcdB with CAPE, compounds that had not bound to TcdB were removed by centrifugation. The persistence of the compound in the culture post-washing could result in an inflated assessment of its efficacy, particularly if it continues to engage with TcdB or the cells beyond the initial 1-hour pre-incubation window. The carryover of the compound might also give rise to misleading positive results, where the compound seems to confer protection or inhibition against TcdB-mediated cell rounding, whereas such effects are actually due to the lingering activity of the compound. This carryover could skew the determination of the compound's minimum effective concentration, as the effective concentration interacting with the cells might be inadvertently elevated. Furthermore, if the compounds possess cytotoxic properties or impact cell viability, carryover could generate artifacts in cell morphology that are unrelated to the direct interaction between TcdB and the compounds.

      (4) Lines 133-134 - I suggest authors mention how many caffeic acid derivatives there were in the entire library so that the suggested "enrichment" of them in the group of bioactive compounds can be better judged.

      Thanks for your comments. The natural compound library contained eight caffeic acid derivatives, of which methyl caffeic acid and ferulic acid displayed no efficacy. This information has been incorporated into the manuscript.

      (5) Line 135 - I recommend the authors add the molarity of the compound solutions used.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (6) Line 247 - I think the term "CAPE mice" is confusing. Please use a full description.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (7) Line 248 - I also think the terms "model mice" and "model group" are confusing. Maybe call them "control mice"?

      Thanks for your comments. The terms "model mice" and "model group" are indeed synonymous, and we have subsequently clarified that control mice refer to those that have not been infected with C. difficile.

      (8) Line 273 - "most abundant species at the genus level" is incorrect. I think what you mean is "most abundant TAXA".

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (9) Line 278 - Please include your p-value cut-off together with the LDA score.

      Thanks for your comments. We have revised the above description to “LDA score > 3.5, p < 0.05”.

      (10) Line 292 - Details on how metabolomics was performed should be included here.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (11) Line 299 - 1.5 is a fairly low cut-off. The authors should at a minimum also include the p-value cut-off used.

      Response: Thanks for your comments. We have revised the above description to “fold change > 1.5, p < 0.05”.

      (12) Line 307 - Purine "degradation" would be better here.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (13) Line 328 onward - The melatonin experiment is a weird one. Although I fully understand the rationale behind testing the effect of melatonin in the mouse model, the idea that just because melatonin levels changed in the gut it would act as a direct inhibitor of TcdB was very far-fetched, even though it ended up working. Authors should explain this in the manuscript.

      Thanks for your comments. Beyond our murine studies, we have confirmed that melatonin significantly diminishes TcdB-induced cytotoxicity at the cellular level (Figure 9A). Additionally, it has been documented that melatonin, acting as an antimicrobial adjuvant and anti-inflammatory agent, can decrease the recurrence of CDI (3). Consequently, we contend that the aforementioned statement is substantiated.

      (14) Lines 429-435 - There are seemingly contradictory pieces of information here. The authors state that adenosine is released from cells upon inflammation and that CAPE treatment caused an increase in adenosine levels. Later in this section, the authors state that adenosine prevents TcdA-mediated damage and inflammation. This should be clarified and better discussed.

      Thanks for your comments. Adenosine modulates immune responses and inflammatory cascades by interacting with its receptors, including its capacity to suppress the secretion of specific pro-inflammatory mediators. We have updated this depiction in the manuscript.

      (15) Lines 513-514 - How was this phenotype quantified?

      Thanks for your comments. Initially, we introduced TcdB at a final concentration of 0.2 ng/mL along with various concentrations of compounds into 1 mL of medium for a 1-h pre-incubation period. Subsequently, unbound compounds were removed through centrifugation, and the resulting mixture was then applied to the cells.

      (16) Figure 3 - panels are labeled incorrectly.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (17) Figure 5C - it is unclear what the different colors and labels represent.

      Thanks for your comments. In the depicted graph, blue denotes the total binding energy, red signifies the electrostatic interactions, green corresponds to the van der Waals forces, and orange indicates solvation or hydration effects. The horizontal axis represents the mutation of the amino acid residue at the respective position to alanine. As illustrated in Figure 5C, the mutations W520A and GTD exhibit the highest binding energies.
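
      The combined cutoffs adopted in points (9) and (11), an effect-size threshold together with p < 0.05, amount to a two-condition filter. The sketch below applies the "fold change > 1.5, p < 0.05" rule to hypothetical metabolite records; the record layout, the metabolite names and the function `is_differential` are assumptions for illustration, and the example only covers increases (decreases would need the reciprocal cutoff, which the response does not specify).

```python
def is_differential(fold_change, p_value, fc_cutoff=1.5, p_cutoff=0.05):
    """True when a feature passes both cutoffs (fold change > 1.5 and p < 0.05)."""
    return fold_change > fc_cutoff and p_value < p_cutoff

# Hypothetical metabolite records: (name, fold change, p-value).
records = [
    ("metabolite_A", 2.4, 0.003),
    ("metabolite_B", 1.6, 0.200),  # large enough change, but not significant
    ("metabolite_C", 1.2, 0.001),  # significant, but change too small
]

selected = [name for name, fc, p in records if is_differential(fc, p)]
print(selected)
```

      Requiring both conditions addresses the reviewer's concern that a fold change of 1.5 alone is a fairly permissive threshold. The same pattern applies to the "LDA score > 3.5, p < 0.05" rule by passing `fc_cutoff=3.5` with the LDA score in place of the fold change.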

      References:

      (1) Janezic S, et al. Highly Divergent Clostridium difficile Strains Isolated from the Environment. PLoS One. 2016 Nov 23;11(11): e0167101.

      (2) Perumalsamy S, Putsathit P, Riley TV. High prevalence of Clostridium difficile in soil, mulch and lawn samples from the grounds of Western Australian hospitals. Anaerobe. 2019 Dec; 60:102065.

      (3) Sutton SS, et al. Melatonin as an Antimicrobial Adjuvant and Anti-Inflammatory for the Management of Recurrent Clostridioides difficile Infection. Antibiotics (Basel). 2022 Oct 25;11(11):1472.

      Reviewer #2 (Recommendations for the authors):

      Minor comments and questions.

      (1) Which form of TcdB is being used in these experiments?

      Thanks for your comments. The TcdB proteins used in this study are TcdB1 subtypes.

      (2) Why are THP-1 cells being used in these assays?

      Thanks for your comments. For the purposes of this study, we employed a diverse array of cell lines, including Vero, HeLa, THP-1, Caco-2, and HEK293T. Each cell line was selected to serve a specific experimental objective. The THP-1 cell line was included because a macrophage cell line was needed to make the experiments comprehensive, allowing both epithelial cells and macrophages to be tested. C. difficile is an intestinal pathogen, and immune clearance plays a vital role during infection, so THP-1 cells were used as a representative immune cell line.

      (3) Please improve the quality of the microscopy images in Figure 1.

      Thanks for your comments. We have improved the quality of the microscopy images in Figure 1.

      (4) Does the flow cytometry experiment in Figure 2B show internalization? Surface-bound toxins would provide the same histogram.

      Thanks for your comments. Figure 2B was employed to assess the internalization of TcdB, and the findings indicate that CAPE does not influence the internalization process of TcdB.

      (5) The sensogram in Figure 4A does not look typical and should be clarified.

      Thanks for your comments. Typically, small molecules and proteins engage in a rapid binding and dissociation dynamic. However, as depicted in Figure 4A, the interaction between CAPE and TcdB demonstrates a gradual progression towards equilibrium. This behavior can be primarily explained by the swift occupation of the protein's primary binding sites by the small molecule in the initial stages. Subsequently, CAPE binds to secondary or lower affinity sites, extending the time needed to reach equilibrium. Additionally, if CAPE binds to multiple sites on TcdB, time is required for the exploration and occupation of these diverse locations before equilibrium is attained. We have incorporated an analysis of this potential scenario into the manuscript.

      Reviewer #3 (Recommendations for the authors):

      These are my suggestions for the text:

      (1) Line 29: high recurrent rates.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (2) Line 32: Where is the caffeic acid identified? I think a line should be included.

      Thanks for your comments. Caffeic acid was identified from a natural compound library, and we have completed the corresponding modifications according to the suggestions.

      (3) Line 39: C. difficile is not italic.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (4) Line 41: Bacteroides spp.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      (5) Line 56: This number of casualties 56.000 is still happening or it was in the past?

      Thanks for your comments. The mortality rates reported in the manuscript reflect a downturn in the incidence and fatality of CDI around 2017(1), as the infection gained broader recognition. Nonetheless, a recent study reveals that the mortality rate for CDI cases in Germany can soar to 45.7% within a year, with the overall economic burden amounting to approximately 1.6 billion euros. This underscores the ongoing significance of CDI as a global public health challenge(2).

      (6) Line 104: Where did the idea of testing caffeic acid come from? Any previous study of the authors? Any studies with the inhibition of other pathogens?

      Thanks for your comments. Initially, we conducted a screen of a compound library comprising 2,076 compounds and identified several potent inhibitors, which, upon structural analysis, were revealed to be caffeic acid derivatives. Prior to our investigation, no studies had explored the potential of CAPE in this context.

      (7) Line 115: Bacteroides spp.

      Thanks for your comments. We have completed the corresponding modifications according to the suggestions.

      Results section

      (8) Did the authors try the caffeic acid with the TcdA or binary toxin? I know this is not the purpose of the study, but TcdA toxin has a high identity structure with TcdB and generates inflammation in the gut via neutrophils. Negative strains for the major toxins and positive for the binary toxin also cause severe cases of CDI.

      Thanks for your comments. Although we acknowledge the significance of TcdA and binary toxins in CDI, we did not investigate the impact of CAPE on these toxins. Our focus was exclusively on the effect of CAPE against TcdB, as it is the primary virulence factor in C. difficile pathogenesis. Since TcdA and TcdB are highly similar in structure, we will analyze the neutralization effect of CAPE on TcdA in later studies.

      (9) Does caffeic acid have any effect on C. difficile? Or does it only gain the toxins? That would be ideal.

      Thanks for your comments. We have included additional related assays in our study. Beyond directly neutralizing TcdB, CAPE also demonstrates the capacity to inhibit the growth and spore formation of C. difficile.

      (10) Line 230: C. difficile BAA-1870 is a clinical strain? There are no details about it in the paper.

      Thanks for your comments. C. difficile BAA-1870 (RT027/ST1), a highly virulent isolate frequently employed in research(3-6), was kindly donated by Professor Aiwu Wu. We have meticulously noted the PCR ribotype in our manuscript.

      (11) Line 236: Did the mice fully recover from CDI after the administration of the CAPE? Was one dose enough?

      Thanks for your comments. CAPE was administered orally at 24 h intervals, commencing with the initial dose on Day 0. By the time a significant difference was observed on Day 3, the treatment had been administered a total of three times.

      Methodology

      (12) Most of the methods do not have a reference.

      Thanks for your comments. We have added several references to the methods.

      Discussion section

      (13) The first two paragraphs of the discussion should be summarized. Those details were already explained in the introduction.

      Thanks for your comments. The discussion section and the introduction address slightly different focal points; therefore, we aim to retain the first two paragraphs to maintain continuity and context.

      (14) Line 382: Bezlotoxumab was approved by the FDA in 2016. It is not recent.

      Thanks for your comments. We have revised the above description.

      (15) Line 410: "Despite the high cure rate and increasing popularity of FMT, its safety remains controversial." Although this is true, recently (2022) the FDA approved Rebyota, which was later cited by the authors.

      Thanks for your comments. We have revised the above description.

      (16) Lines 415-416: "the abundance of Bacteroides, a critical gut microbiota component that is required for C. difficile resistance". There is only one reference cited by the authors. I suppose that if it is true, more studies should be mentioned. Why are probiotics with Bacteroides spp. not available in the market?

      Thanks for your comments. We have supplemented additional references. The scarcity of probiotic products containing Bacteroides spp. on the market is primarily attributable to the stringent requirements of their survival conditions. As most Bacteroides spp. are anaerobic, they thrive in oxygen-deprived environments. This unique survival trait poses challenges in maintaining their viability during product preservation and distribution, which in turn escalates production costs and complexity. Furthermore, despite the significant role of Bacteroides in gut health, research into its potential probiotic benefits and safety is comparatively underexplored.

      References:

      (1) Guh AY, et al. Emerging Infections Program Clostridioides difficile Infection Working Group. Trends in U.S. Burden of Clostridioides difficile Infection and Outcomes. N Engl J Med. 2020 Apr 2;382(14):1320-1330.

      (2) Schley K, et al. Costs and Outcomes of Clostridioides difficile Infections in Germany: A Retrospective Health Claims Data Analysis. Infect Dis Ther. 2024 Nov 20.

      (3) Saito R, et al. Hypervirulent clade 2, ribotype 019/sequence type 67 Clostridioides difficile strain from Japan. Gut Pathog. 2019 Nov 4; 11:54.

      (4) Pellissery AJ, Vinayamohan PG, Venkitanarayanan K. In vitro antivirulence activity of baicalin against Clostridioides difficile. J Med Microbiol. 2020 Apr;69(4):631-639.

      (5) Shao X, et al. Chemical Space Exploration around Thieno[3,2-d]pyrimidin-4(3H)-one Scaffold Led to a Novel Class of Highly Active Clostridium difficile Inhibitors. J Med Chem. 2019 Nov 14;62(21):9772-9791.

      (6) Mooyottu S, Flock G, Venkitanarayanan K. Carvacrol reduces Clostridium difficile sporulation and spore outgrowth in vitro. J Med Microbiol. 2017 Aug;66(8):1229-1234.

  5. inst-fs-iad-prod.inscloudgate.net inst-fs-iad-prod.inscloudgate.net
    1. They may not reach out to their professors when they are performing poorly in the class, fearing that they will be judged as lacking in the ability to succeed in school.

      This makes a lot of sense because I think students who come from low-income backgrounds have always had to work extra hard to end up at the same place as their wealthier counterparts who have more resources and thus more opportunities. It may make them feel "weak" to ask for help even though it is totally normal for us to reach out to professors when we are struggling. I think there is a big psychological effect that is going on here. No one wants to feel like they cannot handle a class or exam, especially if they have put a lot of pressure on themselves to overcome their situation. It is totally understandable when lower-income students have trouble reaching out, but we should work on creating a safe space where students feel comfortable reaching out regardless of their situations.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      The authors have constructively responded to previous referee comments and I believe that the manuscript is a useful addition to the literature. I particularly appreciate the quantitative approach to social behavior, but have two cautionary comments.

      (1) Conceptually it is important to further justify why this particular maximum entropy model is appropriate. Maximum entropy models have been applied across a dizzying array of biological systems, including genes, neurons, the immune system, as well as animal behavior, so it would seem quite beneficial to explain the particular benefits here, for mouse social behavior as coarse-grained through the Eco-HAB chamber occupancy. This would be an excellent chance to amplify what the models can offer for biological understanding, particularly in the realm of social behavior.

      We thank the reviewer for this comment. Maximum entropy models, along with other statistical inference methods that learn interaction patterns from simultaneously-measured degrees of freedom, help distinguish various types of interactions, e.g. direct vs. indirect interactions among animals, individual preference to food vs. social interaction with pairs. As research on social behavior expands from focusing on pairs of animals to studying groups in (semi-)naturalistic environments, maximum entropy models serve as a crucial link between high-throughput data and the need to identify and distinguish interaction rules. Specifically, among all possible maximum entropy models, the pairwise maximum entropy model is one of the simplest that can describe interactions among individuals, which serves as an excellent starting point to understand collective and social behavior in animals.
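
      For readers unfamiliar with the term, a pairwise maximum entropy model for N animals, each occupying one of several chambers, is often written in the generic form below. This is only an illustrative sketch: the symbols (σ_i for the chamber occupied by mouse i, h_{ir} for the preference of mouse i for chamber r, J_{ij} for the pairwise coupling, Z for the normalization) are assumptions chosen here for illustration and may not match the exact parameterization used in the manuscript.

```latex
P(\sigma_1, \dots, \sigma_N)
  = \frac{1}{Z} \exp\!\left(
      \sum_{i=1}^{N} \sum_{r} h_{ir}\, \delta_{\sigma_i, r}
      + \sum_{i<j} J_{ij}\, \delta_{\sigma_i, \sigma_j}
    \right)
```

      In a form like this, the first sum captures individual chamber preferences and the second sum captures the tendency of pairs to co-localize, which is how such a model can separate individual preference from pairwise interaction.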

      Although the Eco-HAB setup currently records spatially coarse-grained data, it still provides more spatial information compared to the traditional three-chamber tests used to assess sociability for rodents. By showing that the maximum entropy model can effectively analyze Eco-HAB data, we hope to highlight its potential in research of social behavior in animals.

      To amplify what the models can offer for biological understanding, particularly in the realm of social behavior, we have updated the Introduction to give a more logical structure to the argument for using maximum entropy models to identify interactions among mice. Additionally, we updated the first paragraph of the Discussion to make explicit that it is the use of maximum entropy models that identifies interaction patterns from the high-throughput data. Finally, we have also added to the Discussion (lines 422-425) arguments supporting the specific use of pairwise maximum entropy models to study social behaviors.

      (2) Maximum entropy models of even intermediate size systems involve a large number of parameters. The authors are transparent about that limitation here, but I still worry that the conclusion of the sufficiency of pairwise interactions is simply not general, and this may also relate to the differences from previous work. If, as the authors suggest in the discussion, this difference is one of a choice of variables, then that point could be emphasized. The suggestion of a follow up study with a smaller number of mice is excellent.

      We thank the reviewer for raising the issue and agree that the caveat of how general pairwise interactions can describe social behavior of animals needs to be discussed. We have added a sentence in the Discussion to point out this important caveat. “More generally, this discrepancy when looking at different choices of variables raises the issue that when studying social behavior of animals in a group, it is important to test and compare interaction models with different complexity (e.g. pairwise or with higher-order interactions).” We have also toned down our conclusion to limit our results of pairwise interactions describing mice co-localization patterns to the data collected in Eco-HAB (also see Reviewer 3 Major Point 2).

      Reviewer #3 (Public review):

      Summary:

      Chen et al. present a thorough statistical analysis of social interactions, more precisely, co-occupying the same chamber in the Eco-HAB measurement system. They also test the effect of manipulating the prelimbic cortex by using TIMP-1 that inhibits the MMP-9 matrix metalloproteinase. They conclude that altering neural plasticity in the prelimbic cortex does not eliminate social interactions, but it strongly impacts social information transmission.

      Strengths:

      The quantitative approach to analyzing social interactions is laudable and the study is interesting. It demonstrates that the Eco-HAB can be used for high throughput, standardized and automated tests of the effects of brain manipulations on social structure in large groups of mice.

      Weaknesses:

      A demonstration of TIMP-1 impairing neural plasticity specifically in the prelimbic cortex of the treated animals would greatly strengthen the biological conclusions. The Eco-HAB provides coarser spatial information compared to some other approaches, which may influence the conclusions.

      Recommendations for the authors:  

      Reviewer #3 (Recommendations for the authors):

      Major points

      (1) Do the Authors have evidence that TIMP-1 was effective, as well as specific to the prelimbic cortex?

      We refer to the literature for the effectiveness and specificity of TIMP-1 to the prelimbic cortex.

      Specifically, the study by Okulski et al. (Biol. Psychiatry 2007) provides clear evidence that TIMP-1 plays a role in synaptic plasticity in the prefrontal cortex. They showed that TIMP-1 is induced in the medial prefrontal cortex (mPFC) following stimulation that triggers late long-term potentiation (LTP), a key model of synaptic plasticity. Overexpression of TIMP-1 in the mPFC blocked the activity of matrix metalloproteinases (MMPs) and prevented the induction of late LTP in vivo. Similar effects were observed with pharmacological inhibition of MMP-9 in vitro, reinforcing the idea that TIMP-1 regulates extracellular proteolysis as part of the plasticity mechanism in the prefrontal cortex. These findings confirm that TIMP-1 is both effective and active in this specific brain region.

      Further evidence comes from Puścian et al. (Mol. Psychiatry 2022), who used TIMP-1-loaded nanoparticles to influence neuronal plasticity in the amygdala. They found that TIMP-1 affected MMP expression, LTP, and dendritic morphology, showing its impact on synaptic modifications. More directly relevant, Winiarski et al. (Sci. Adv. 2025) demonstrated that injecting TIMP-1-loaded nanoparticles into the prelimbic cortex altered responses to social stimuli, further supporting the idea that TIMP-1 has region-specific effects on behavioral processes.

      We have also updated the main text (page 8, 1st paragraph of “Effect of impairing neuronal plasticity in the PL on subterritory preferences and sociability”) of the manuscript to include the above references.

      (2) The Authors seem to suggest that one main reason for the different results compared to Shemesh et al. 2013 was the coarseness of the Eco-HAB data. In this case, I think this conclusion should be toned down because of this significant caveat.

      We thank the reviewer for pointing this out, and agree that this caveat and difference should be emphasized. To tone down the conclusion, we have

      (1) added details about the Eco-HAB (it being coarse-grained, etc.) in the abstract to tone down the conclusion.

      (2) added to the results summary in the Discussion (top of page 12) that the results are “within in the setup of the semi-naturalistic Eco-HAB experiments”

      (3) added to the Discussion (page 13) that the different results compared to Shemesh et al 2013 means that general studies of social behavior need to compare models with different levels of complexity (e.g. pairwise vs. higher-order interactions). (Also see Reviewer 2 Comment 2.)

      Minor points

      (1) Please explain what is measured in Fig. 1C (what is on the y axis?).

      Figure 1C shows the activity of the mice as measured by the rate of transitions, i.e. the number of times the mice switch boxes during each hour of the day, averaged over all N = 15 mice and T = 10 days (cohort M1). The error bars represent the variability of activity across individuals or across days. For mouse-to-mouse variability (blue), we first compute for each mouse its number of transitions averaged over the same hour for all 10 days, then compute the standard deviation of this quantity across all 15 mice and plot it as error bars. For day-to-day variability (orange), we first compute for each day the number of transitions for each hour averaged over all mice, then compute its standard deviation across all 10 days and plot it as error bars. We have added the detailed explanation in the caption of Figure 1C.
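
      The two kinds of error bars described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' analysis code: the function name, the array layout (mice × days × hours), the synthetic demo data, and the choice of the sample standard deviation (`ddof=1`) are all assumptions made here.

```python
import numpy as np


def hourly_activity_with_variability(transitions):
    """Summarize hourly activity and its two sources of variability.

    transitions: array-like of shape (n_mice, n_days, n_hours) holding the
    number of box transitions per mouse, per day, per hour (assumed layout).
    Returns (mean_activity, mouse_sd, day_sd), each of shape (n_hours,).
    """
    transitions = np.asarray(transitions, dtype=float)

    # Mean activity per hour, averaged over all mice and all days.
    mean_activity = transitions.mean(axis=(0, 1))

    # Mouse-to-mouse variability: average each mouse over days first,
    # then take the standard deviation across mice.
    per_mouse = transitions.mean(axis=1)        # (n_mice, n_hours)
    mouse_sd = per_mouse.std(axis=0, ddof=1)    # ddof=1 is an assumption

    # Day-to-day variability: average each day over mice first,
    # then take the standard deviation across days.
    per_day = transitions.mean(axis=0)          # (n_days, n_hours)
    day_sd = per_day.std(axis=0, ddof=1)        # ddof=1 is an assumption

    return mean_activity, mouse_sd, day_sd


# Synthetic demo data with the cohort dimensions quoted in the text
# (15 mice, 10 days, 24 hours); the counts themselves are made up.
rng = np.random.default_rng(0)
demo = rng.poisson(lam=20, size=(15, 10, 24))
mean_activity, mouse_sd, day_sd = hourly_activity_with_variability(demo)
```

      Averaging before taking the standard deviation is what separates the two error bars: averaging over days removes day-to-day fluctuations so that only differences between individuals remain, and vice versa.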

      (2) In Fig. 3, it would be better to present the control group also in the main figure instead of the supplementary.

      We have merged Figure 3 and Figure 3 Supplementary 1 to present the control group also in the main figure.

      (3) In Fig. 3 and corresponding supplements, there seems to be a large difference between males and females. I think this would deserve some more discussion.

      While not being the main focus of this paper, we agree with the reviewer that the difference between male and female is important and deserves attention in the discussion and also future study. Thus we have added a paragraph in the Discussion (line 394-399, bottom of page 12).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors set out to analyse the roles of the teichoic acids of Streptococcus pneumoniae in supporting the maintenance of the periplasmic region. Previous work has proposed the periplasm to be present in Gram-positive bacteria, and here an advanced electron microscopy approach was used. This also showed a likely role for both wall and lipo-teichoic acids in maintaining the periplasm. Next, the authors use a metabolic labelling approach to analyse the teichoic acids. This is a clear strength as this method cannot be used for most other well studied organisms. The labelling was coupled with super-resolution microscopy to be able to map the teichoic acids at the subcellular level and a series of gel separation experiments to unravel the nature of the teichoic acids and the contribution of genes previously proposed to be required for their display. The manuscript could be an important addition to the field but there are a number of technical issues which somewhat undermine the conclusions drawn at the moment. These are shown below and should be addressed. More minor points are covered in the private Recommendations for Authors.

      Weaknesses to be addressed:

      (1) l. 144 Was there really only one sample that gave this resolution? Biological repeats of all experiments are required.

      CEMOVIS is a very challenging method that is not amenable to numerous repeats. However, multiple images were recorded from at least two independent samples for each strain. Additional sample images are shown in a new Fig. S3.

      CETOVIS is even more challenging (only two publications in PubMed since 2015) and was performed on a single ultrathin section that, exceptionally, lay perfectly flat on the EM grid, allowing tomography data acquisition on ∆tacL cells. The reconstructed tomogram confirmed the absence of a granular layer in the depth of the section. Additionally, the numbering of Fig. S4A-B (previously misidentified as Fig. S2A-B) has been corrected in the text of V2.

      (2) Fig. 4A. Is the pellet recovered at "low" speeds not just some of the membrane that would sediment at this speed with or without LTA? Can a control be done using an integral membrane protein and Western Blot? Using the tacL mutant would show the behaviour of membranes alone.

      We think that the pellet is not just some of the membrane but most of it. In support of this view, the “low” speed pellets after enzymatic cell lysis contain not just some membrane lipids, but most of them (Fig. S10A). We therefore expect membrane proteins to be also present in this fraction. We performed a Western blot using antibodies against the membrane protein PBP2x (new Fig. S7C). Unfortunately, no signal was detected, most likely due to protein degradation by contaminant proteases that we could trace to the purchased mutanolysin. The same sedimentation properties were observed with the ∆tacL strain as shown in Fig. 6A. However, in the ∆tacL strain the membrane pellet still contains membrane-bound TA precursors. It is therefore impossible to test definitively whether pneumococcal membranes totally devoid of TA would sediment in the same way.

      (3) Fig. 4A. Using enzymatic digestion of the cell wall and then sedimentation will allow cell wall associated proteins (and other material) to become bound to the membranes and potentially affect sedimentation properties. This is what is in fact suggested by the authors (l. 1000, Fig. S6). In order to determine if the sedimentation properties observed are due to an artefact of the lysis conditions, a physical breakage of the cells, using a French Press, should be carried out and then membranes purified by differential centrifugation. This is a standard, and well-established method (low-speed to remove debris and high-speed to sediment membranes) that has been used for S. pneumoniae over many years but would seem counter to the results in the current manuscript (for instance Hakenbeck, R. and Kohiyama, M. (1982), Purification of Penicillin-Binding Protein 3 from Streptococcus pneumoniae. European Journal of Biochemistry, 127: 231-236).

      Thank you for this suggestion. We have tested this hypothesis by breaking cells with a Microfluidizer followed by differential centrifugation. This experiment, which requires a substantial minimum volume, was performed with unlabeled cells (due to the cost of reagents) and assessed by Western blot using antibodies against the membrane protein PBP2x (new Fig. S7C). In this case, the majority of the membrane material was found in the high-speed pellet, as expected.

      We also applied the spheroplast lysis procedure of Flores-Kim et al. to the labeled cells, and found that most of the labeled material sedimented at low speed (new Fig. S7B), as observed with our own procedure.

      With these new results, the section on membrane density has been removed from the Supplementary Information. Instead, the fractionation is further discussed in terms of size of membrane fragments and presence of intact spheroplasts in the notes in Supplementary Information preceding Fig. S7.

      (4) l. 303-305. The authors suggest that the observed LTA-like bands disappear in a pulse chase experiment (Fig. 6B). What is the difference between this and Fig. 5B, where the bands do not disappear? Fig. 5C is the WT and was only pulse labelled for 5 min and so would one not expect the LTA-like bands to disappear as in 6B?

      Fig. 6B shows a pulse-chase experiment with strain ∆tacL, whereas Fig. 5C shows a similar experiment with the parental WT strain. The disappearance of the LTA-like band pattern with the ∆tacL strain (Fig. 6B), and their persistence in the WT strain (Fig. 5C), indicate that these bands are the undecaprenyl-linked TA in ∆tacL and proper LTA in the WT. A sentence has been added to better explain this point in V2.

      Note that we have exchanged the previous Fig. 5C and Fig. S13B, so that the experiments of Fig. 5A and 5C are in the same medium, as suggested by Reviewer #2.

      (5) Fig. 6B, l. 243-269 and l. 398-410. If, as stated, most of the LTA-like bands are actually precursor then how can the quantification of LTA stand as stated in the text? The "Titration of Cellular TA" section should be re-evaluated or removed? If you compare Fig. 6C WT extract incubated at RT and 110 °C it seems like a large decrease in amount of material at the higher temperature. Thus, the WT has a lot of precursors in the membrane? This needs to be quantified.

      Indeed, the quantification of the ratio of LTA and WTA in the WT strain rests on the assumption that the amount of membrane-linked polymerized TA precursors is negligible in this strain. This assumption is now stated in the Titration section. We believe this to be the case. The true LTA and TA precursors do not have exactly the same electrophoretic mobility, being shifted relative to each other by about half a ladder “step”. This difference is visible when samples are run in adjacent lanes on the same gel, as in the new Fig. 6C. The difference of migration was well documented in the original paper about the deletion of tacL, although tacL was known as rafX at that time, and the ladders were misidentified as WTA (Wu et al. 2014. A novel protein, RafX, is important for common cell wall polysaccharide biosynthesis in Streptococcus pneumoniae: implications for bacterial virulence. J Bacteriol. 196, 3324-34. doi: 10.1128/JB.01696-14). This reference was added in V2. The experiment in the new Fig. 6C was repeated to have all samples on the same gel and treated at a lower temperature. The minor effect on the amount of LTA when WT cells are heated at pH 4.2 may be due to the removal of some labeled phosphocholine. We have NMR evidence that the phosphocholine in position D, which may be lacking in some cases as reported by Hess et al. (Nat Commun. 2017 Dec 12;8(1):2093. doi: 10.1038/s41467-017-01720-z), is labile to acidic treatment of LTA.

      (6) L. 339-351, Fig. 6A. A single lane on a gel is not very convincing as to the role of LytR. Here, and throughout the manuscript, wherever statements concerning levels of material are made, quantification needs to be done over appropriate numbers of repeats and with densitometry data shown in SI.

      Yes indeed. Apart from the titration of TA in the WT strain, we haven’t yet carried out a thorough quantification of TA or LTA/WTA ratio in different strains and conditions, although we intend to do so in a follow-up study, using the novel opportunities offered by the method presented here.

      However, to better substantiate our statement regarding the ∆lytR strain, we have quantified two experiments performed in C-medium with azido-choline, and two experiments of pulse labeling in BHI medium. The results are presented in the additional supplementary Fig. S14. The value of 51% was a calculation error, and was corrected to 41%. Likewise, the decrease in the WTA/LTA ratio was corrected to 5 to 7-fold.

      (7) 14. l. 385-391. Contrary to the statement in the text, the zwitterionic TA will have associated counterions that result in net neutrality. It will just have both -ve and +ve counterions in equal amounts (dependent on their valency), which doesn't matter if it is doing the job of balancing osmolarity (rather than charge).

      Thank you for pointing this out. The paragraph has been corrected in V2.

      Reviewer #2 (Public review):

      The Gram-positive cell wall contains for a large part of TAs, and is essential for most bacteria. However, TA biosynthesis and regulation is highly understudied because of the difficulties in working with these molecules. This study closes some of our important knowledge gaps related to this and provides new and improved methods to study TAs. It also shows an interesting role for TAs in maintaining a 'periplasmic space' in Gram positives. Overall, this is an important piece of work. It would have been more satisfying if the possible causal link between TAs and periplasmic space would have been more deeply investigated with complemented mutants and CEMOVIS. For the moment, there is clearly something happening but it is not clear if this only happens in TA mutants or also in strains with capsules/without capsules and in PG mutants, or in lafB (essential for production of another glycolipid) mutants. Finally, some very strong statements are made suggesting several papers in the literature are incorrect, without actually providing any substantiation/evidence supporting these claims. Nevertheless, I support the publication of this work as it pioneers some new methods that will definitively move the field forward.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) l. 55 It is stated that TA are generally not essential. This needs to be introduced in a little more detail as in several species they are collectively. Need some more references here to give context.

      We have expanded the paragraph and added a selection of references in V2.

      (2) l. 63 and Fig. 1A. Is the model based on the images from this paper? Is the periplasm as thick as the peptidoglycan layer? Would you not expect the density of WTA to be the same throughout the wall, rather than less inside? Do the authors think that the TA are present as rods in the cell envelope and because of this the periplasm looks a little like a bilayer, is this so? Is the relative thickness of the layers based on the data in the paper (Table 1)?

      The model proposed in Fig. 1A is not based on our data. It is a representation of the model proposed by Harold Erickson, and the appropriate reference has been added to the figure legend in V2. We do not speculate on the relative density of WTA inside the peptidoglycan layer, at the surface or in the periplasm. The only constraint from the model is that the density of WTA in the periplasm should be sufficient for self-exclusion and allow the brush polymer theory to apply. The legend has been amended in V2.

      We indeed think that the bilayer appearance of the periplasmic space in the wild type strain, and the single layer periplasmic space in the ∆tacL and ∆lytR strains, support Erickson’s model. Although the model was drawn arbitrarily, it turns out that the relative thickness of the peptidoglycan and periplasmic space is in rough agreement with the measurements reported in Table 1.

      (3) Fig. 2. It is hard to orient oneself to see the layers. The use of the term periplasmic space (l. 132) and throughout is probably not wise as it is not a space.

      We prefer to retain this nomenclature since the term periplasmic space has been used in all the cell envelope CEMOVIS publications and is at the core of Erickson’s hypothesis about these observations and teichoic acids.

      (4) L. 147. This is not referring to Fig. S2A-B as suggested but Fig. S3A-B.

      This has been corrected.

      (5) l. 148. How do you know the densities observed are due to PG or certainly PG alone? Perhaps it is better to call this the cell wall.

      Yes. Cell wall is a better nomenclature and the text and Table 1 have been corrected in V2, in accordance with Fig. 2.

      (6) l. 165. It is also worth noting that peripheral cell wall synthesis also happens at the same site so this may well not be just division.

      Yes. We have replaced “division site” by “mid-cell” in V2.

      (7) l. 214 What is the debris? If PG digestion has been successful then there will be marginal debris. Is this pellet translucent (like membranes)? If you use fluorescently labelled PG in the preparation has it all disappeared, as would be expected by fully digested and solubilised material?

      In traditional protocols of bacterial membrane preparation, a low-speed centrifugation is first performed to discard “debris” that to our knowledge have not been well characterized but are thought to consist of unbroken cells and large fragments of cell wall. After enzymatic degradation of the pneumococcal cell wall, the low-speed pellet is not translucent as in typical membrane pellets after ultracentrifugation, but is rather loose, unlike a dense pellet of unbroken cells. A description of the pellet appearance was added in V2.

      It is a good idea to check if some labeled PG is also pelleted at low-speed after digestion. In a double labeling experiment using azido-choline and a novel unpublished metabolic probe of the PG, we found that the PG was fully digested and labeled fragments migrated as a couple of fuzzy bands likely corresponding to different labeled peptides. These species were not pelleted at low speed.

      (8) l. 219. Can you give a reference to certify that the low mobility material is WTA? Why does it migrate differently than LTA? Or is the PG digestion not efficient?

      WTA released from sacculi by alkaline lysis were found to migrate as a smear at the top of native gels revealed by alcian-blue silver staining, which is incompatible with SDS (Flores-Kim, 2019, 2022). The references have been added in V2. It could be argued in this case that the smearing was due to partial degradation of the WTA by the alkaline treatment.

      Bui et al. (2012) reported the preparation of WTA by enzymatic digestion of sacculi, but the resulting WTA were without muropeptide, presumably due to a step of boiling at pH 5 used to deactivate the enzymes.

      To our knowledge, this is the first report of pneumococcal WTA prepared by digestion of sacculi and analyzed by SDS-PAGE. Since the migration of WTA in native and SDS-PAGE is similar, we hypothesize that they do not interact significantly with the dodecyl sulphate, in contrast to the LTA, which bear a lipidic moiety. The fuzziness of the WTA migration pattern may also result from the greater heterogeneity due to the attached muropeptide, such as different lengths (di-, tetra-saccharide…), different peptides despite the action of LytA (tri-, tetra-peptide…), different O-acetylation status, etc.

      (9) L. 226-227, Fig S8. Presumably several of the major bands on the Coomassie stained gel are the lysozyme, mutanolysin, recombinant LytA, DNase and RNase used to digest the cell wall etc.? Can the sizes of these proteins be marked on the gel. Do any of them come down with the material at low-speed centrifugation?

      We have provided a gel showing the different enzymes individually and mixed (new Fig. S9G). While performing several experiments of this type, we found that the mutanolysin might be contaminated with proteases. The enzymes do not appear to sediment at low speed.

      (10) Fig. S9B. It is difficult to interpret what is in the image as there appear to be 2 populations of material (grey and sometimes more raised). Does the 20,000 g material look the same?

      Fig. S10B is a 20,000 × g pellet. We agree that there appear to be two types of membrane vesicles, but we do not know their nature.

      (11) l. 277 and Fig. 5A. Why is it "remarkable" that there are apparently more longer LTA molecules as the cell reach stationary phase?

      This is the first time that a change of TA length is documented. Such a change could conceivably have consequences in the binding and activity of CBPs and the physiology of the cell envelope in general. These questions should be addressed in future studies.

      (12) l. 280. How do you know which is the 6-repeat unit?

      It is an assumption based on previous analyses by Gisch et al. (J Biol Chem 2013, 288(22):15654-67. doi: 10.1074/jbc.M112.446963). The reference was added.

      (13) Fig. 5A and C. Panel C, the cells were grown in a different medium and so are not comparable to Panel A. Why is Fig. S12B not substituted for 5B? Presumably these are exponential phase cells.

      We have swapped Fig. S13B and 5C in V2, as suggested, and changed the text and legends accordingly.

      Reviewer #2 (Recommendations for the authors):

      L30: vitreous sections?

      Corrected in V2.

      L32: as their main universal function --> as a universal function. To show it's the main universal function, you will need to look at this across various bacterial species.

      Changed to “possible universal function” in V2.

      L35: enabled the titration the actual --> titration of the actual?

      Corrected in V2.

      L34: consider breaking up this very long sentence.

      Done in V2.

      L37: may compensate the absence--> may compensate for the absence.

      Corrected in V2.

      L45: Using metabolic labeling and electrophoresis showed --> Metabolic labeling and...

      Corrected in V2.

      L46: This finding casts doubts on previous results, since most LTA were likely unknowingly discarded in these studies. This needs to be rephrased and is unnecessarily callous. While the current work casts doubts on any quantitative assessments of actual LTA levels measured in previous studies, it does not mean any qualitative assessments or conclusions drawn from these experiments are wrong. Better would be to say: These findings suggest that previously reported quantitative assessments of LTA levels are likely underestimating actual LTA levels, since much of the LTA would have been unknowingly discarded.

      If the authors do think that actual conclusions are wrong in previous work, then they need to be more explicit and explain why they were wrong.

      Yes indeed. The statement was toned down in V2.

      L55: Although generally non-essential. I would remove or rephrase this statement. I don't think any TA mutant will survive out in the wild and will be essential under a certain condition. So perhaps not essential for growth under ideal conditions, but for the rest pretty essential.

      The paragraph was amended by qualifying the essentiality to laboratory conditions and including selected references.

      L95: Note that the prevailing model until reference 20 (Gibson and Veening) was that the TA is polymerized intracellularly (see e.g. Figure 2 of PMID: 22432701, DOI: 10.1089/mdr.2012.0026). This intracellular polymerisation model seemed unlikely according to Gibson and Veening ('As TarP is classified by PFAM as a Wzy-type polymerase with predicted active site outside the cell, we speculate that TarP and TarQ polymerize the TA extracellularly in contrast to previous reports.'), but there is no experimental evidence as far as this referee knows of either model being correct.

      Despite the lack of experimental evidence, we think that Gibson and Veening are very likely correct, based on their argument, and also by analogy with the synthesis of other surface polysaccharides from undecaprenyl- or dolichol-linked precursors. It is unfortunate that Figure 2 of PMID: 22432701, DOI: 10.1089/mdr.2012.0026 was published in this way, since there was no evidence for a cytoplasmic polymerization, to our knowledge.

      L97: It is commonly believed, although I'm not sure it has ever been shown, that the capsule is covalently attached at the same position on the PG as WTA. Therefore, there must be some sort of regulation/competition between capsule biosynthesis and WTA biosynthesis (see also ref. 21). The presence of the capsule might thus also influence the characteristics of the periplasmic space. Considering that by far most pneumococcal strains are encapsulated, the authors should discuss this and why a capsule mutant was used in this study and how translatable their study using a capsule mutant is to S. pneumoniae in general.

      A paragraph was added in the Introduction of V2 to present the complication and a sentence was added at the end of the discussion to mention that this should be studied in the future.

      L102: Ref 29 should probably be cited here as well?

      Since in Ref 29 (Flores-Kim et al. 2019) there is a detectable amount of LTA (presumably TA precursors) in the ∆tacL strain, we prefer to cite only Hess et al. 2017 regarding the absence of LTA in the absence of TacL. However, we added in V2 a reference to Flores-Kim et al. 2019 in the following paragraph regarding the role of the LTA/WTA ratio.

      L106: dependent on the presence of the phosphotransferase LytR (21). --> dependent on the presence of the phosphotransferase LytR, whose expression is upregulated during competence (21).

      Corrected in V2.

      L119: I fail to see how the conclusions drawn by other groups (I assume the authors mean work from the Vollmer, Rudner, Bernhardt, Hammerschmidt, Havarstein, Veening groups?) are invalid if they compared WTA:LTA ratios between strains and conditions if they underestimated the LTA levels? Supposedly, the LTA levels were underestimated in all samples equally so the relative WTA/LTA ratio changes will qualitatively give the same outcome? I agree that these findings will allow for a reassessment of previous studies in which presumably too low LTA levels were reported, but I would not expect a difference in outcome when people compared WTA:LTA ratios between strains?

      The sentence was rephrased in V2 to be neutral regarding previous work and rather emphasize future possibilities.

      L131: Perhaps it would be good to highlight that such a conspicuous space has been noticed before by other EM methods (see e.g. Figs.4 and 5 or ref 19, or one of the most clear TEM S. pneumoniae images I have seen in Fig. 1F of Gallay et al, Nat. Micro 2021). However, always some sort of staining had previously been performed so it was never clear this was a real periplasmic space. CEMOVIS has this big advantage of being label free and imaging cells in their presumed native state.

      Thanks for pointing out these beautiful data that we had overlooked. We have added a few sentences and references in the Discussion of V2.

      L201: References are not numbered.

      Corrected in V2.

      L271/L892: Change section title. 'Evolution' can have multiple meanings. It would be more clear to write something like 'Increased TA chain length in stationary phase cells' or something like that.

      Changed in V2.

      L275: harvested

      Corrected in V2.

      L329: add, as suggested shown previously (I guess refs 24 and 29)

      Reference to Hess et al. 2017 has been added in V2. A sentence and further references to Flores-Kim, 2019, 2022 and Wu et al. 2014 were added at the end of the discussion with respect to the LTA-like signal observed in these studies of ∆tacL strains.

      L337: I think a concluding sentence is warranted here. These experiments demonstrate that membrane-bound TA precursors accumulate on the outside of the membrane, and are likely polymerized on the outside as well, in line with the model proposed in ref. 20.

      From the point of view of formal logic, the accumulation of membrane-bound TA precursors on the outer face of the membrane does not prove that they were assembled there. They could still be polymerized inside and translocated immediately. However, since this is extremely unlikely for the reasons discussed by Gibson and Veening, we have added a mild conclusion sentence and the reference in V2.

      L343: How accurate are these quantifications? Just by looking at the gel, it seems there is much less WTA in the lytR mutant than 50% of the wild type?

      Yes, the 51% value was a calculation error. This was changed to 41%. Likewise, the decrease of the WTA amount relative to LTA was corrected to 5- to 7-fold.

      Apart from the titration of TA in the WT strain, we have not yet carried out a careful quantification of either TA or the LTA/WTA ratio in different strains and conditions, although we intend to do so in the near future using the method presented here.

      However, to better substantiate our statement regarding the ∆lytR strain, we have quantified two experiments of growth in C-medium with azido-choline, and two experiments of pulse labeling in BHI medium. The results are presented in the additional supplementary Fig. S14.

      L342: although WTA are less abundant and LTA appear to be longer (Fig. 6A). although WTA are less abundant and LTA appear to be longer (Fig. 6A), in line with a previous report showing that LytR the major enzyme mediating the final step in WTA formation (ref. 21). (or something like that). Perhaps better is to start this paragraph differently. For instance: Previous work showed that LytR is the major enzyme mediating the final step in WTA formation (ref. 21). As shown in Fig. 6A, the proportion of WTA significantly decreased in the lytR mutant. However, there was still significant WTA present indicating that perhaps another LCP protein can also produce WTA.

      Changed in V2.

      Of note, WTA levels would be a lot lower in encapsulated strains as used in Ref. 21 (assuming WTA and capsule compete for the same linkage on PG). So perhaps it would be hard to detect any residual WTA in a encapsulated lytR mutant?

      Investigation of the relationship between TA and capsule incorporation or O-acetylation is definitely a future area of study using this method of TA monitoring.

      L371: see my comments related to L131. Some TEM images clearly show the presence of a periplasmic space.

      Comments and references have been added in V2.

      L402: It would be really interesting to perform these experiments on a wild type encapsulated strain. Would these have much more LTA? (I understand you cannot do these experiments perhaps due to biosafety, but it might be interesting to discuss).

      Yes. It would be interesting to compare the TA in D39 and D39 ∆cps strains. We have added this perspective at the end of the discussion in V2.

      L418: ref lacks number

      Corrected in V2.

      L423: refs missing.

      References added in V2.

      L487: See my comments regarding L46. I do not see one valid point in the current paper why underestimating LTA levels would change any of the conclusions drawn in Ref. 21. I do not know the other papers cited well enough, but it seems highly unlikely that their conclusions would be wrong by systematically underestimating LTA levels. As far as I understand it, this current work basically confirms the major conclusions drawn by these 'doubtful' papers (that TacL makes LTA and LytR is the main WTA producer). As such, I find this sentence highly unfair without precisely specifying what the exact doubts are. Sure, this current paper now shows that probably people have discarded unknowingly LTA and therefore underestimated LTA levels, so any quantitative assessment of LTA levels are probably wrong. That is one thing. But to say this casts doubts on these studies is very serious and unfair (unless the authors provide good arguments to support these serious claims).

      Yes indeed. The sentence was rephrased to be strictly factual in V2.

      Table 2: I assume these strains are delta cps? Would be relevant to list this genotype.

      Table 2 was completed in V2.

      The authors should comment on why the mutants have not been complemented, especially for lytR as it's the last gene in a complex operon. It would be great to see WTA levels being restored by ectopic expression of LytR.

      Yes. We think this could be part of an in-depth study of the attachment of WTA, together with the investigation of the other LCP phosphotransferases.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      The aim of this study is to test the overarching hypothesis that plasticity in BNST CRF neurons drives distinct behavioral responses to unpredictable threat in males and females. The manuscript provides evidence for a possible sex-specific role for CRF-expressing neurons in the BNST in unpredictable aversive conditioning and subsequent hypervigilance across sexes. As the authors note, this is an important question given the high prevalence of sex differences in stress-related disorders, like PTSD, and the role of hypervigilance and avoidance behaviors in these conditions. The study includes in vivo manipulation, bulk calcium imaging, and cellular resolution calcium imaging, which yield important insights into cell-type specific activity patterns. However, it is difficult to generate an overall conclusion from this manuscript, given that many of the results are inconsistent across sexes and across tests and there is an overall lack of converging evidence. For example, partial conditioning yields increased startle in males but not females, yet, CRF KO only increases startle response in males after full conditioning, not partial, and CRF neurons show similar activity patterns between partial and full conditioning across sexes. Further, while the study includes a KO of CRF, it does not directly address the stated aim of assessing whether plasticity in CRF neurons drives the subsequent behavioral effects unpredictable threat.

      We appreciate the reviewer’s summary and agree that there is a large amount of complexity to the results, and that it was difficult to generate a simple model/conclusion to summarize our work. This is the unfortunate side effect of looking across both sexes at different conditioning paradigms; however, we believe that it is important to convey this information to the field even without a simple answer. Our data reinforce the very important findings from the Maren and Holmes groups that partial fear is a different process than full fear, and that the BNST plays a differential role here. We have reworded the manuscript to better convey this complexity.

      A major strength of this manuscript is the inclusion of both males and females and attention to possible behavioral and neurobiological differences between them throughout. However, to properly assess sex-differences, sex should be included as a factor in ANOVA (e.g. for freezing, startle, and feeding data in Figure 1) to assess whether there is a significant main effect or interaction with sex. If sex is not a statistically significant factor, both sexes should be combined for subsequent analyses. See, Garcia-Sifuentes and Maney, eLife 2021 https://elifesciences.org/articles/70817. There are additional cases where t-tests are used to compare groups when repeated measures ANOVAs would be more appropriate and rigorous.

      We agree with the reviewer that this is the more appropriate analysis and have changed the analysis and figures throughout the revised manuscript to better assess sex differences as well as differences between fear conditions.

      Additionally, it's unclear whether the two sexes are equally responsive to the shock during conditioning and if this is underlying some of the differences in behavioral and neuronal effects observed. There are some reports that suggest shock sensitivity differs across sexes in rodents, and thus, using a standard shock intensity for both males and females may be confounding effects in this study.

      This is a great point. We have conducted the appropriate analysis (Sex by Tone repeated-measures two-way ANOVAs for each of the groups: Ctrl, Full, Part) and there are no sex differences in freezing. The extent of conditioning is not different between the groups, suggesting that if there was a difference in shock sensitivity, it is not driving any discernible differences in behavioral performance. However, it is possible that the experience of the shock differs for the animals even in the absence of any measurable behavior.

      The data does not rule out that BNST CRF activity is not purely tracking the mobility state of the animal, given that the differences in activity also track with differences in freezing behavior. The data shows an inverse relationship between activity and freezing. This may explain a paradox in the data which is why males show a greater suppression of BNST activity after partial conditioning than full conditioning, if that activity is suspected to drive the increased anxiety-like response. Perhaps it reflects that activity is significantly suppressed at the end of the conditioning session because animals are likely to be continuously freezing after repeated shock presentations in that context. It would also explain why there is less of a suppression in activity over the course of the recall session, because there is less freezing as well during recall compared with conditioning.

      While it is possible that the BNST may be tracking activity, we believe it is not purely tracking mobility state. For instance, while freezing increases across tone exposures in Part fear regardless of sex, males show an increase while females show a reduction in BNST response during tone 5 (Fig 2K). The data the reviewer refers to showing the inverse relationship with BNST activity and freezing would have suggested the opposite response if it were purely tracking the mobility state of the animal. This is also the case with BNST<sup>CRF</sup> activity to first and last tone during recall. Despite the suppression of activity over the course of recall (Fig 5K), we see an increase in BNST<sup>CRF</sup> tone response when comparing tone 1 and 6 in males and a decrease in females (Fig 6M), again suggesting the BNST is responding to more than just activity.

      A mechanistic hypothesis linking BNST CRF neurons, the behavioral effects observed after fear conditioning, and manipulation of CRF itself are not clearly addressed here.

      We disagree with this assertion. The data suggests a model in which males respond with increased arousal and Part fear males show persistent activation of the BNST and BNST<sup>CRF</sup> neurons during fear conditioning and recall while female Part fear mice show the opposite response. This female response differs from what the field believes to be the role of the BNST in sustained fear. Additionally, we show that CRF knockdown is not involved in fear differentiation or fear expression in males, while it enhances fear learning and recall in females. We have reworded the manuscript to highlight these novel findings.

      Reviewer #2 (Public Review):

      This study examined the role of CRF neurons in the BNST in both phasic and sustained fear in males and females. The authors first established a differential fear paradigm whereby shocks were consistently paired with tones (Full) or only paired with tones 50% of the time (Part), or controls who were exposed to only tones with no shocks. Recall tests established that both Full and Part conditioned male and female mice froze to the tones, with no difference between the paradigms. Additional studies using the NSF and startle test, established that neither fear paradigm produced behavioral changes in the NSF test, suggesting that these fear paradigms do not result in an increase in anxiety-like behavior. Part fear conditioning, but not Full, did enhance startle responses in males but not females, suggesting that this fear paradigm did produce sustained increases in hypervigilance in males exclusively.

      Thank you for this clear summary of the behavioral work.

      Photometry studies found that while undifferentiated BNST neurons all responded to shock itself, only Full conditioning in males led to a progressive enhancement of the magnitude of this response. BNST neurons in males, but not females, were also responsive to tone onset in both fear paradigms, but only in Full fear did the magnitude of this response increase across training. Knockdown of CRF from the BNST had no effect on fear learning in males or females, nor any effect in males on fear recall in either paradigm, but in females enhanced both baseline and tone-induced freezing only in the Part fear group. When looking at anxiety following fear training, it was found in males that CRF knockdown modulated anxiety in Part fear trained animals and amplified startle in Fully trained males but had no effect in either test in females. Using 1P imaging, it was found that CRF neurons in the BNST generally decline in activity across both conditioning and recall trials, with some subtle sex differences emerging in the Part fear trained animals in that in females BNST CRF neurons were inhibited after both shock and omission trials but in males this only occurred after shock and not omission trials. In recall trials, CRF BNST neuron activity remained higher in Part conditioned mice relative to Full conditioned mice.

      Overall, this is a very detailed and complex study that incorporates both differing fear training paradigms and males and females, as well as a suite of both state of the art imaging techniques and gene knockdown approaches to isolate the role and contributions of CRF neurons in the BNST to these behavioral phenomena. The strengths of this study come from the thorough approach that the authors have taken, which in turn helped to elucidate nuanced and sex specific roles of these neurons in the BNST to differing aspects of phasic and sustained fear. More so, the methods employed provide a strong degree of cellular resolution for CRF neurons in the BNST. In general, the conclusions appropriately follow the data, although the authors do tend to minimize some of the inconsistencies across studies (discussed in more depth below), which could be better addressed through discussion of these in greater depth. As such, the primary weakness of this manuscript comes largely from the discussion and interpretation of mixed findings without a level of detail and nuance that reflects the complexity, and somewhat inconsistency, across the studies. These points are detailed below:

      - Given the focus on CRF neurons in the BNST, it is unclear why the photometry studies were performed in undifferentiated BNST neurons as opposed to CRF neurons specifically (although this is addressed, to some degree, subsequently with the 1P studies in CRF neurons directly). This does limit the continuity of the data from the photometry studies to the subsequent knockdown and 1P imaging studies. The authors should address the rationale for this approach so it is clear why they have moved from broader to more refined approaches.

      The reviewer raises a good point. We did some preliminary photometry studies with BNST CRF neurons and found that there was poor time-locked signal. We reasoned that this was due to the heterogeneity of the cell activity, as we saw in our previous publication (Yu et al.). Because of this, we moved to the 1P imaging work in place of continued BNST CRF photometry. We have also reworded the manuscript to better discuss the complexities and inconsistencies in findings across the studies.

      - The CRF KD studies are interesting, but it remains speculative as to whether these effects are mediated locally in the BNST or due to CRF signaling at downstream targets. As the literature on local pharmacological manipulation of CRF signaling within the BNST seems to be largely performed in males, the addition of pharmacological studies here would benefit this to help to resolve if these changes are indeed mediated by local impairments in CRF release within the BNST or not. While it is not essential to add these experiments, the manuscript would benefit from a more clear description of what pharmacological studies could be performed to resolve this issue.

      We agree with the reviewer that the addition of this experiment would be highly informative for differentiating the role of CRF in the BNST. This is something that will need to be considered moving forward and we have added this as a point of discussion.

      - While I can appreciate the authors perspective, I think it is more appropriate to state that startle correlates with anxiety as opposed to outright stating that startle IS anxiety. Anxiety by definition is a behavioral cluster involving many outputs, of which avoidance behavior is key. Startle, like autonomic activation, correlates with anxiety but is not the same thing as a behavioral state of anxiety (particularly when the startle response dissociates from behavior in the NSF test, which more directly tests avoidance and apprehension). Throughout the manuscript the use of anxiety or vigilance to describe startle becomes interchangeable, but then the authors also dissociate these two, such as in the first paragraph of the discussion when stating that the Part fear paradigm produces hypervigilance in males without influencing fear or anxiety-like behaviors. The manuscript would benefit from harmonization of the language used to operationally define these behaviors and my recommendation would be to remain consistent with the description that startle represents hypervigilance and not anxiety, per se.

      The reviewer raises an excellent point, and we have clarified this in the revised manuscript.

      - The interpretation of the anxiety data following CRF KD is somewhat confusing. First, while the authors found no effect of fear training on behavior in the NSF test in the initial studies, now they do, however somewhat contradictory to what one would expect they found that Full fear trained males had reduced latency to feed (indicative of an anxiolytic response), which was unaltered by CRF KD, but in Part fear (which appeared to have no effect on its own in the NSF test), KD of CRF in these animals produced an anxiolytic effect. Given that the Part fear group was no different from control here it is difficult to interpret these data as now CRF KD does reduce latency to feed in this group, suggesting that removal of CRF now somehow conveys an anxiolytic response for Part fear animals. In the discussion the authors refer to this outcome as CRF KD "normalizing" the behavior in the NSF test of Part fear conditioned animals as now it parallels what is seen after Full fear, but given that the Part fear animals with GFP were no different than controls (and neither of these fear training paradigms produced any effect in the NSF test in the first arm of studies), it seems inappropriate to refer to this as "normalization" as it is unclear how this is now normalized. Given the complexity of these behavioral data, some greater depth in the discussion is required to put these data in context and describe the nuance of these outcomes, in particular a discussion of possible experimental factors between the initial behavioral studies and those in the CRF KD arm that could explain the discrepancy in the NSF test would be good (such as the inclusion of surgery, or other factors that may have differed between these experiments). These behavioral outcomes are even more complex given that the opposite effect was found in startle whereby CRF KD amplified startle in Full trained animals. As such, this portion of the discussion requires some reworking to more adequately address the complexity of these behavioral findings.

      The reviewer raises a good point, and we agree that there are many inconsistencies in the behaviors. We believe it is still good to show these results but have expanded the manuscript on potential reasons for these behavioral inconsistencies.

      Reviewer #3 (Public Review):

      Hon et al. investigated the role of BNST CRF signaling in modulating phasic and sustained fear in male and female mice. They found that partial and full fear conditioning had similar effects in both sexes during conditioning and during recall. However, males in the partially reinforced fear conditioning group showed enhanced acoustic startle, compared to the fully reinforced fear conditioning group, an effect not seen in females. Using fiber photometry to record calcium activity in all BNST neurons, the authors show that the BNST was responsive to foot shock in both sexes and both conditioning groups. Shock response increased over the session in males in the fully conditioned fear group, an effect not observed in the partially conditioned fear group. This effect was not observed in females. Additionally, tone onset resulted in increased BNST activity in both male groups, with the tone response increasing over time in the fully conditioned fear group. This effect was less pronounced in females, with partially conditioned females exhibiting a larger BNST response. During recall in males, BNST activity was suppressed below baseline during tone presentations and was significantly greater in the partially conditioned fear group. Both female groups showed an enhanced BNST response to the tone that slowly decayed over time. Next, they knocked down CRF in the BNST to examine its effect on fear conditioning, recall and anxiety-like behavior after fear. They found no effect of the knockdown in either sex or group during fear conditioning. During fear recall, BNST CRF knockdown led to an increase in freezing in only the partially conditioned females. In the anxiety-like behavior tasks, BNST CRF knockdown led to increased anxiolysis in the partially reinforced fear males, but not in females. Surprisingly, BNST CRF knockdown increased startle response in fully conditioned, but not partially conditioned males, an effect not observed in either female group.
In a final set of experiments, the authors used single-photon calcium imaging to record BNST CRF cell activity during fear conditioning and recall. Approximately 1/3 of BNST CRF cells were excited by shock in both sexes, with the rest inhibited, and no differences were observed between sexes or groups during fear conditioning. During recall, BNST CRF activity decreased in both sexes, an effect pronounced in male and female fully conditioned fear groups.

      Overall, these data provide novel, intriguing evidence in how BNST CRF neurons may encode phasic and sustained fear differentially in males and females. The experiments were rigorous.

      We thank you for this positive review of our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There are several graphs representing different analyses of (presumably) the same group of subjects, but which have different N/group. For example, in Figure 2:

      (1) Fig 2P seems to have n=10 in Part Male group (Peak), but 2Q only has n=9 in Part Male group (AUC)

      (2) Fig 2S seems to have n=10 in Part Female group (Peak), but 2T only has n=7 in Part Female group (AUC)

      (3) Fig 2G (Tone Resp) has n=6 Full Males but 2F (Tone Resp), 2H (Shock Resp), and 2I (Shock Resp) have n=7 Full Males

      (4) Fig 2K (Tone Resp) has n=7 Full Females but 2L (Tone Resp), 2M (Shock Resp), and 2N (Shock Resp) have n=8 Full Females

      (5) Fig 2L (Tone Resp) has n=9 Part Females but 2K (Tone Resp), 2M (Shock Resp), and 2N (Shock Resp) have n=10 Part Females

      It's possible that this is just due to overlapping individual data points which are made harder to see due to the low resolution of the figures. If so, this can be easily rectified. However, there may also be subjects missing from some analyses which must be clarified or corrected.

      We thank you for catching these. We have gone through and fixed any issues with data points and have added statistics and exclusions in datasets to figure legends to further explain inconsistencies.

      Regarding statistical tests:

      (2) Data in Figs 2G and 2I should be analyzed using a two-way RM ANOVA.

      We have now included sex as a factor in most of our analysis and are now using appropriate statistical tests.

      (3) Data in Fig 3K should be analyzed using a two-way RM ANOVA.

      We are now using appropriate statistical tests.

      Calcium activity in response to the shock during conditioning and in response to the tone during recall should be included in Figure 5. Given partial and full animals also receive unequal presentations of the cue, it would be useful to see the effects trial by trial or normalized to the first 3 presentations only.

      The reviewer raises a great point. We have changed this figure and have now added the response to shock and tones. Since we are most interested in the difference between sustained and phasic fear, we decided to compare tone 3 in Full fear and tone 4 in Part fear, which differ in the ambiguity of their cue and only have one tone difference.

      Histology maps should be included for all experiments depicting viral spread and implant location for all animals, in addition to the included representative histology images. These can be placed in the supplement.

      We agree this is helpful. While we have confirmed all of the experiments are hits, the tissue is no longer in condition for this analysis.

      Referring to the quantification of peaks in fiber photometry and cellular resolution calcium imaging data as "spikes" is a bit misleading given the inexact relationship between GCAMP sensor dynamics/calcium binding and neuronal action potentials, perhaps calling it "event" frequency would be more clear.

      We have changed the references of spikes to events as suggested.

      The legend for Figure 2S is mislabeled as A.

      Thank you for catching this mistake, it has been fixed.

      The methods refer to CRFR1 fl/fl animals but it seems no experiments used these animals, only CRF fl/fl.

      We have fixed this, thank you.

      Reviewer #2 (Recommendations For The Authors):

      As stated in the public review, while I think the addition of local pharmacological studies blocking CRF1 and 2 receptors in the BNST in both males and females, done under the same conditions as all of the other testing herein, would help to resolve some of the speculation of interpreting the CRF KD data, I don't think these studies are essential to do, but it would be good for the authors to more explicitly state what studies could be done and how they could facilitate interpretation of these data.

      Thank you for this suggestion. We have added this discussion into the manuscript.

      Aside from this, my other recommendations for the authors are to more clearly address the discrepancies in behavioral outcomes across studies, to explicitly describe their rationale for the sequence of experiments performed, and to harmonize their operationalization of how they define anxiety.

      Again, we appreciate these great suggestions. We have added more discussion on the behavioral discrepancies as well as rationale for the experiments. We have also changed the wording to remain consistent that the NSF test relates to anxiety and the Startle test relates to vigilance.

      - In Figure 2, Panel S is listed as Panel A in the caption and should be corrected.

      Thank you for catching this mistake, we have fixed it.

      Reviewer #3 (Recommendations For The Authors):

      My biggest concerns I have regard the interpretations and some conclusions from this data set, which I have stated below.

      (1) It was surprising to see minimal and somewhat conflicting behavioral effects due to BNST CRF knockdown. The authors provide a representative image and address this in the conclusion. They mention the role of local vs projection CRF circuits as well as the role of GABA. I don't think those experiments are necessary for this manuscript. However, it may be worthwhile to see through in situ hybridization or IHC, to see BNST CRF levels after both full and partial conditioned fear paradigms. Additionally, it would help to see a quantification of the knockdown of the animals.

      Thank you for these great suggestions. We will consider these for future experiments. We piloted some CRF sensor experiments to probe this, but it was unclear whether the sensor's signal-to-noise ratio was sufficient. We hope to do more of this in the future if we ever manage to get funding for this work.

      The authors can add a figure showing deltaF/F changes from control.

      We did not have control mice in these in-vivo experiments. Our main interests lie in understanding the differences between the Full and Part fear conditioning paradigms specifically.

      (2) Related to the previous point, it was surprising to see an effect of the CRF deletion in the full fear group compared to the partial fear in the acoustic startle task. To strengthen the conclusion about differential recruitment of CRF during phasic and sustained fear, the experiment in my previous point could help elucidate that. Conversely, intra-BNST administration of a CRF antagonist into the BNST before the acoustic startle after both conditioning tasks could also help. Or patch from BNST CRF neurons after the conditioning tasks to measure intrinsic excitability. Not all these experiments are needed to support the conclusion, it's some examples.

      We thank the reviewer for these suggestions and agree that these are important experiments. We will consider this in future experiments exploring the role of BNST CRF in fear conditioning.

      (3) In Figure 5 F and K, the authors report data combined for both part and full fear conditioning. Were there any differences between the number of excited or inhibited neurons b/t the conditioning groups?

      We are only looking at the first shock exposure in these figures. These were combined because the first tone and shock exposure is identical in Full and Part fear conditioning. Differences in these behavioral paradigms emerge after Tone 3 exposure, where Part fear does not receive a shock while Full fear does.

      Also, can the authors separate male and female traces in Fig 5 E and P?

      Traces in Fig 5E are from females only. We did not include male traces because males and females had identical responses to the first shock, and we felt only one trace was needed as an example. Traces in Fig 5P are from males. We did not show female traces because females did not show differential effects from baseline to end.

      (4) Also, regarding the calcium imaging data, what was the average length of a transient induced by shock? Were there any differences between the sexes?

      We recorded many cells in each condition, and the durations of the shock-evoked transients varied widely and were hard to quantify. For example, some cells were already active before the shock, which makes the length of the transient ambiguous. To reduce this ambiguity, we analyzed a fixed 10-second window after the shock, applied consistently across mice and conditions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      In this study, Osiurak and colleagues investigate the neurocognitive basis of technical reasoning. They use multiple tasks from two neuroimaging studies and overlap analysis to show that the area PF is central for reasoning, and plays an essential role in tool-use and non-tool-use physical problem-solving, as well as both conditions of mentalizing task. They also demonstrate the specificity of the technical reasoning and find that the area PF is not involved in the fluid-cognition task or the mentalizing network (INT+PHYS vs. PHYS-only). This work suggests an understanding of the neurocognitive basis of technical reasoning that supports advanced technologies.

      Strengths:

      -The topic this study focuses on is intriguing and can help us understand the neurocognitive processes involved in technical reasoning and advanced technologies.

      -The researchers obtained fMRI data from multiple tasks. The data is rich and encompasses the mechanical problem-solving task, psychotechnical task, fluid-cognition task, and mentalizing task.

      -The article is well written.

      We sincerely thank Reviewer 1 for their positive and very helpful comments, which helped us improve the MS. Thank you.

      Weaknesses:

      - Limitations of the overlap analysis method: there are multiple reasons why two tasks might activate the same brain regions. For instance, the two tasks might share cognitive mechanisms, the activated regions of the two tasks might be adjacent but not overlapping at finer resolutions, or the tasks might recruit the same regions for different cognition functions.

      Thus, although overlap analysis can provide valuable information, it also has limitations.

      Further analyses that capture the common cognitive components of activation across different tasks are warranted, such as correlating the activation across different tasks within subjects for a region of interest (i.e. the PF).

      We thank Reviewer 1 for this comment. We added new analyses to address the two alternative interpretations stressed here by Reviewer 1, namely, the same-region-but-different-function interpretation and the adjacency interpretation. The new analyses ruled out both alternative interpretations, thereby reinforcing our interpretation.

      “The conjunction analysis reported was subject to at least two key limitations that needed to be overcome to assure a correct interpretation of our findings. The first was that the tasks could recruit the same regions for different cognition functions (same-region-but-different-function interpretation). The second was that the activated regions of the different tasks could be adjacent but did not overlap at finer resolutions (adjacency interpretation). We tested the same-region-but-different-function interpretation by conducting additional ROI analyses, which consisted of correlating the specific activation of the left area PF (i.e., difference in terms of mean Blood-Oxygen Level Dependent [BOLD] parameter estimates between the experimental condition minus the control condition) in the psychotechnical task, the fluid-cognition task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. This analysis did not include the mechanical problem-solving task because the sample of participants was not the same for this task. As shown in Fig. 5, we found significant correlations between all the tasks that were hypothesized as recruiting technical reasoning, i.e., the psychotechnical task and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .05). By contrast, no significant correlation was obtained between these three tasks and the fluid-cognition task (all p > .15). This finding invalidates the same-region-but-different-function interpretation by revealing a coherent pattern in the activation of the left area PF in situations in which participants were supposed to reason technically. We examined the adjacency interpretation by analysing the specific locations of individual peak activations within the left area PF ROI for the mechanical problem-solving task, the psychotechnical task, the fluid-cognition task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task.
These peaks, which corresponded to the maximum value of activation obtained for each participant within the left area PF ROI, are reported in Fig. 6. As can be seen, the peaks of the fluid-cognition task were located more anteriorly, in the left area PFt (Parietal Ft) and the postcentral cortex, compared to the peaks of the other four tasks, which were more posterior, in the left area PF. Statistical analyses based on the y coordinates of the individual activation peaks confirmed this description (Fig. 6). Indeed, the y coordinates of the peaks of the mechanical problem-solving task, the psychotechnical task and the PHYS-Only and INT+PHYS conditions of the mentalizing task were posterior to the y coordinates of the peaks of the fluid-cognition task (all p < .05), whereas no significant differences were reported between the four tasks (all p > .05). These findings speak against the adjacency interpretation by revealing that participants recruited the same part of the left area PF to perform tasks involving technical reasoning.” (p. 11-13)
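
      The within-participant correlation described in the quoted passage can be sketched as follows. This is only an illustration under stated assumptions: the passage does not name the correlation coefficient, so Pearson's r is assumed here, and the function name and the per-participant values are hypothetical placeholders rather than data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical placeholder values: one ROI contrast estimate per participant
# (experimental minus control condition), listed in the same participant order.
psychotechnical = [0.42, 0.10, 0.55, 0.31, 0.77, 0.25]
phys_only = [0.38, 0.05, 0.61, 0.20, 0.70, 0.33]

r = pearson_r(psychotechnical, phys_only)
print(f"r = {r:.2f}")
```

      The key design point is that both lists index the same participants, which is why the mechanical problem-solving task (collected in a different sample) cannot enter this analysis.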

      Control tasks may be inadequate: the tasks may involve other factors, such as motor/action-related information. For the psychotechnical task, fluid-cognition task, and mentalizing task, the experimental tasks require participants to process not only technical-cognition information but also motor-related information, whereas the control tasks do not need to consider motor-related information (mainly visual shape information). Additionally, there may be no difference in motor-related information between the conditions of the fluid-cognition task. Therefore, the regions of interest may be sensitive to motor-related information, affecting the research conclusion.

      We thank Reviewer 1 for this comment. We added a specific section in the discussion that addresses this limitation.

      “The second limitation concerns the alternative interpretation that the left area PF is not central to technical reasoning but to the storage of sensorimotor programs about the prototypical manipulation of common tools. Here we show that the left area PF is recruited even in situations in which participants do not have to process common manipulable tools. For instance, some items of the psychotechnical task consisted of pictures of tractor, boat, pulley, or cannon. The fact that we found a common activation of the left area PF in such tasks as well as in the mechanical problem-solving task, in which participants could nevertheless simulate the motor actions of manipulating novel tools, indicates that this brain area is not central to tool manipulation but to physical understanding. That being said, some may suggest that viewing a boat or a cannon is enough to incite the simulation of motor actions, so our tasks were not equipped to distinguish between the manipulation-based approach and the reasoning-based approach. We have already shown that the left area PF is more involved in tasks that focus on the mechanical dimension of the tool-use action (e.g., the mechanical interaction between a tool and an object) than its motor dimension (i.e., the interaction between the tool and the effector [e.g., 24, 40]). Nevertheless, we recognize that future research is still needed to test the predictions derived from these two approaches.” (p. 18-19)

      -Negative results require further validation: the cognitive results for the fluid-cognition task in the study may need more refinement. For instance, when performing ROI analysis, are there any differences between the conditions? Bayesian statistics might also be helpful to account for the negative results.

      We agree that our negative results required further validation. We conducted the ROI analyses suggested by Reviewer 1, which confirmed the initial whole-brain analyses.

      “Region of interest (ROI) results. We conducted additional analyses to test the robustness of our findings. One of our results was that we did not report any specific activation of the left area PF in the fluid-cognition task contrary to the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. However, this negative result needed exploration at the ROI level. Therefore, we created a spherical ROI of the left area PF with a radius of 12 mm in the MNI standard space (–59; –31; 40). This ROI was literature-defined to ensure the independence of its selection (40). ROI results are shown in Fig. 4. The analyses confirmed the results obtained with the whole-brain analyses by indicating a greater activation of the left area PF in the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .001), but not in the fluid-cognition task (p = .35).” (p. 10-11)
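
      As a rough illustration of what a spherical ROI of this kind amounts to, the sketch below enumerates the voxel centres that fall within 12 mm of the reported MNI coordinate and averages a contrast map over them. The 2 mm isotropic grid, the grid alignment, the function names, and the dictionary-based contrast map are assumptions made for the example; the quoted passage specifies only the centre and the radius.

```python
import math


def sphere_voxel_centres(centre_mm, radius_mm, voxel_mm):
    """Centres (in mm) of grid voxels lying within radius_mm of centre_mm.

    Assumes an isotropic grid aligned so that centre_mm is itself a voxel centre.
    """
    steps = int(math.floor(radius_mm / voxel_mm))
    cx, cy, cz = centre_mm
    centres = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            for k in range(-steps, steps + 1):
                dx, dy, dz = i * voxel_mm, j * voxel_mm, k * voxel_mm
                if dx * dx + dy * dy + dz * dz <= radius_mm ** 2:
                    centres.append((cx + dx, cy + dy, cz + dz))
    return centres


def roi_mean(contrast_map, roi_voxels):
    """Mean of a contrast map (dict: voxel centre -> value) over the ROI voxels."""
    values = [contrast_map[v] for v in roi_voxels if v in contrast_map]
    if not values:
        raise ValueError("no ROI voxel is present in the contrast map")
    return sum(values) / len(values)


# Centre and radius as reported in the passage; the voxel size is an assumption.
PF_CENTRE = (-59.0, -31.0, 40.0)
roi = sphere_voxel_centres(PF_CENTRE, radius_mm=12.0, voxel_mm=2.0)
print(len(roi), "voxels in the sphere")
```

      Because the sphere is fixed from the literature before looking at the data, the same voxel set is applied to every task, which is what keeps the ROI selection independent of the contrasts being tested.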

      Reviewer #1 (Recommendations For The Authors):

      (1) I may not fully grasp some of the arguments. In the abstract, what does the term "intermediate-level" mean, and why is it an intermediate-level state? In the sentence "the existence of a specific cognitive module in the human brain dedicated to materiality", I cannot see a clear link between technical cognition and the word "materiality".

      We used the term materiality to refer to a potential human trait that allows us to shape the physical world according to our ends, by using and making tools and transmitting them to others. This is a reference to Allen et al. (2020; PNAS): “We hope this empirical domain and modeling framework can provide the foundations for future research on this quintessentially human trait: using, making, and reasoning about tools and more generally shaping the physical world to our ends” (p. 29309). Scientists (including archaeologists, economists, psychologists, neuroscientists) interested in human materiality have tended to focus on how we manipulate things according to our thought (motor cognition) or how we conceptualize our behaviour to transmit it to others (language, social cognition). However, little has been said on the intermediate level, that is, technical cognition. We added the term “technical cognition” here, which should help to make the connection more quickly.

      “Yet, little has been said about the intermediate-level cognitive processes that are directly involved in mastering this materiality, that is, technical cognition.” (p. 2)

      (2) The introduction could provide more details on why the issue of "generalizability and specificity" is important to address, to clarify the significance of the research question.

      We followed this comment and added a sentence to explain why it is important to address this research question. Again, we thank Reviewer 1 for their helpful comments.

      “Here we focus on two key aspects of the technical-reasoning hypothesis that remain to be addressed: Generalizability and specificity. If technical reasoning is a specific form of reasoning oriented towards the physical world, then it should be implicated in all (the generalizability question) and only (the specificity question) the situations in which we need to think about the physical properties of our world.” (p. 5)

      Reviewer #2 (Public Review):

      Summary:

      The goal of this project was to test the hypothesis that a common neuroanatomic substrate in the left inferior parietal lobule (area PF) underlies reasoning about the physical properties of actions and objects. Four functional MRI (fMRI) experiments were created to test this hypothesis. Group contrast maps were then obtained for each task, and overlap among the tasks was computed at the voxel level. The principal finding is that the left PF exhibited differentially greater BOLD response in tasks requiring participants to reason about the physical properties of actions and objects (referred to as technical reasoning). In contrast, there was no differential BOLD response in the left PF when participants engaged in an fMRI variant of Raven's progressive matrices to assess fluid cognition.

      Strengths:

      This is a well-written manuscript that builds from extensive prior work from this group mapping the brain areas and cognitive mechanisms underlying object manipulation, technical reasoning, and problem-solving. Major strengths of this manuscript include the use of control conditions to demonstrate there are differentially greater BOLD responses in area PF over and above the baseline condition of each task. Another strength is the demonstration that area PF is not responsive in tasks assessing fluid cognition - e.g., it may just be that PF responds to a greater extent in a harder condition relative to an easy condition of a task. The analysis of data from Task 3 rules out this alternative interpretation. The methods and analysis are sufficiently written for others to replicate the study, and the materials and code for data analysis are publicly available.

      We sincerely thank Reviewer 2 for their precious comments, which helped us improve the MS. 

      Weaknesses:

      The first weakness is that the conclusions of the manuscript rely on there being overlap among group-level contrast maps presented in Figure 2. The problem with this conclusion is that different participants engaged in different tasks. Never is an analysis performed to demonstrate that the PF region identified in e.g., participant 1 in Task 2 is the same PF region identified in Participant 1 in Task 4.

      We added new analyses that demonstrated that “the PF region identified in e.g., participant 1 in Task 2 is the same PF region identified in Participant 1 in Task 4”. We thank Reviewer 2 for this comment, because these new analyses reinforced our interpretation.

      “The conjunction analysis reported was subject to at least two key limitations that needed to be overcome to assure a correct interpretation of our findings. The first was that the tasks could recruit the same regions for different cognition functions (same-region-but-different-function interpretation). The second was that the activated regions of the different tasks could be adjacent but did not overlap at finer resolutions (adjacency interpretation). We tested the same-region-but-different-function interpretation by conducting additional ROI analyses, which consisted of correlating the specific activation of the left area PF (i.e., difference in terms of mean Blood-Oxygen Level Dependent [BOLD] parameter estimates between the experimental condition minus the control condition) in the psychotechnical task, the fluid-cognition task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. This analysis did not include the mechanical problem-solving task because the sample of participants was not the same for this task. As shown in Fig. 5, we found significant correlations between all the tasks that were hypothesized as recruiting technical reasoning, i.e., the psychotechnical task and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .05). By contrast, no significant correlation was obtained between these three tasks and the fluid-cognition task (all p > .15). This finding invalidates the same-region-but-different-function interpretation by revealing a coherent pattern in the activation of the left area PF in situations in which participants were supposed to reason technically. We examined the adjacency interpretation by analysing the specific locations of individual peak activations within the left area PF ROI for the mechanical problem-solving task, the psychotechnical task, the fluid-cognition task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task.
These peaks, which corresponded to the maximum value of activation obtained for each participant within the left area PF ROI, are reported in Fig. 6. As can be seen, the peaks of the fluid-cognition task were located more anteriorly, in the left area PFt (Parietal Ft) and the postcentral cortex, compared to the peaks of the other four tasks, which were more posterior, in the left area PF. Statistical analyses based on the y coordinates of the individual activation peaks confirmed this description (Fig. 6). Indeed, the y coordinates of the peaks of the mechanical problem-solving task, the psychotechnical task and the PHYS-Only and INT+PHYS conditions of the mentalizing task were posterior to the y coordinates of the peaks of the fluid-cognition task (all p < .05), whereas no significant differences were reported between the four tasks (all p > .05). These findings speak against the adjacency interpretation by revealing that participants recruited the same part of the left area PF to perform tasks involving technical reasoning.” (p. 11-13)

      A second weakness is that there is a variance in accuracy between tasks that are not addressed. It is clear from the plots in the supplemental materials that some participants score below chance (~ 50%). This means that half (or more) of the fMRI trials of some participants are incorrect. The methods section does not mention how inaccurate trials were handled. Moreover, if 50% is chance, it suggests that some participants did not understand task instructions and were systematically selecting the incorrect item.

      It is true that the experimental conditions were more difficult than the control conditions, with some participants who performed at or below 50% in the experimental conditions. We added a section in the MS to stress this aspect. To examine whether this potential difficulty effect biased our interpretation, we conducted new ROI analyses by removing all the participants who performed at or below the chance level. These analyses revealed the same results as when no participant was excluded, suggesting that this did not bias our interpretation.

      “As mentioned above, the experimental conditions of all the tasks were more difficult than their control conditions. As a result, the specific activation of the left area PF documented above could simply reflect that this area responds to a greater extent in a harder condition relative to an easy condition of a task. This interpretation is nevertheless ruled out by the results obtained with the fluid-cognition task. We did not report a specific activation of the left area PF in this task while its experimental condition was more difficult than its control condition. To test more directly this effect of difficulty, we conducted new ROI analyses by removing all the participants who performed at or below 50% (Fig. S2). These new analyses replicated the initial analyses by showing a greater activation of the left area PF in the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .001), but not in the fluid-cognition task (p = .48). In sum, the ROI analyses corroborated the whole-brain analyses and ruled out the potential effect of difficulty.” (p. 11)

      A third weakness is related to the fluid cognition task. In the fMRI task developed here, the participant must press a left or right button to select between 2 rows of 3 stimuli while only one of the 3 stimuli is the correct target. This means that within a 10-second window, the participant must identify the pattern in the 3x3 grid and then separately discriminate among 6 possible shapes to find the matching stimulus. This is a hard task that is qualitatively different from the other tasks in terms of the content being manipulated and the time constraints.

      We acknowledge that the fluid-cognition task involved a design that differed from the other tasks. However, this was also true for the other tasks, as the design also differed between the mechanical problem-solving task, the psychotechnical task, and the mentalizing task. Nevertheless, despite these distinctions, we found a consistent activation of the left area PF in these tasks with different designs including in the psychotechnical task, which seemed as difficult as the fluid-cognition task.

      “Region of interest (ROI) results. We conducted additional analyses to test the robustness of our findings. One of our results was that we did not report any specific activation of the left area PF in the fluid-cognition task contrary to the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task. However, this negative result needed exploration at the ROI level. Therefore, we created a spherical ROI of the left area PF with a radius of 12 mm in the MNI standard space (–59; –31; 40). This ROI was literature-defined to ensure the independence of its selection (40). ROI results are shown in Fig. 4. The analyses confirmed the results obtained with the whole-brain analyses by indicating a greater activation of the left area PF in the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .001), but not in the fluid-cognition task (p = .35).” (p. 10-11)

      In sum, this is an interesting study that tests a neuro-cognitive model whereby the left PF forms a key node in a network of brain regions supporting technical reasoning for tool and non-tool-based tasks. Localizing area PF at the level of single participants and managing variance in accuracy is critically important before testing the proposed hypotheses.

      We thank Reviewer 2 for this positive evaluation and their suggestions. As detailed in our response, our revision took into consideration both the localization of the left area PF at the level of single participants and the variance in accuracy. 

      Reviewer #2 (Recommendations For The Authors):

      Did the fMRI data undergo high-pass temporal filtering prior to modeling the effects of interest? Participants engaged in a long (17-24 minutes) run of fMRI data collection. High-pass filtering of the data is critically important when managing temporal autocorrelation in the fMRI response (e.g., see Shinn et al., 2023, Functional brain networks reflect spatial and temporal autocorrelation. Nature Neuroscience).

      Yes. We added this information.

      “Regressors of non-interest resulting from 3D head motion estimation (x, y, z translation and three axes of rotation) and a set of cosine regressors for high-pass filtering were added to the design matrix.” (p. 25-26)
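
      As an illustration of what "a set of cosine regressors for high-pass filtering" typically means, here is a minimal sketch of a discrete cosine drift basis. The repetition time, the number of scans, the 128 s cutoff, and the function name `cosine_drift_regressors` are assumptions chosen for the example; none of them is reported in the quoted passage.

```python
import numpy as np

def cosine_drift_regressors(n_scans, tr, cutoff_s=128.0):
    """Discrete cosine (DCT-II) basis modelling slow drifts.

    Hypothetical helper for illustration; cutoff_s and tr are assumed values.
    Returns an (n_scans, n_terms) array with orthonormal, zero-mean columns.
    """
    # Keep only the cosine terms whose period (2 * n_scans * tr / k)
    # is at least as long as the cutoff
    n_terms = int(np.floor(2.0 * n_scans * tr / cutoff_s))
    t = np.arange(n_scans)
    k = np.arange(1, n_terms + 1)
    # DCT-II basis functions: cos(pi * k * (2t + 1) / (2N))
    basis = np.cos(np.pi * np.outer(2 * t + 1, k) / (2.0 * n_scans))
    # Scale so that each column has unit norm
    return basis * np.sqrt(2.0 / n_scans)

# Example: a 20-minute run at an assumed TR of 2 s (600 scans)
drift = cosine_drift_regressors(n_scans=600, tr=2.0, cutoff_s=128.0)
```

      Appending the columns of `drift` to the design matrix, next to the six motion regressors, lets the model absorb fluctuations slower than the cutoff, which is equivalent to high-pass filtering the data within the GLM.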

      Including scales in Figure 2 would help the reader interpret the magnitude of the BOLD effects.

      We added this information in Figure 3 (Figure 2 in the initial version of the MS).

      It was difficult to inspect the small thumbnail images of the task stimuli in Figure 1. Higher resolution versions of those stimuli would help facilitate understanding of the task design and trial structure.

      We changed both Figure 1 and Figure S1.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript reports two neuroimaging experiments assessing commonalities and differences in activation loci across mechanical problem-solving, technical reasoning, fluid cognition, and "mentalizing" tasks. Each task includes a control task. Conjunction analyses are performed to identify regions in common across tasks. As Area PF (a part of the supramarginal gyrus of the inferior parietal lobe) is involved across 3 of the 4 tasks, the investigators claim that it is the hub of technical cognition.

      Strengths:

      The aim of finding commonalities and differences across related problem-solving tasks is a useful and interesting one.

      The experimental tasks themselves appear relatively well-thought-out, aside from the concern that they are differentially difficult.

      The imaging pipeline appears appropriate.

      We thank Reviewer 3 for their constructive comments, which helped us improve the MS.

      Weaknesses:

      (1) Methodological

      As indicated in the supplementary tables and figures, the experimental tasks employed differ markedly in 1) difficulty and 2) experimental trial time. Response latencies are not reported (but are of additional concern given the variance in difficulty). There is concern that at least some of the differences in activation patterns across tasks are the result of these fundamental differences in how hard various brain regions have to work to solve the tasks and/or how much of the trial epoch is actually consumed by "on-task" behavior. These difficulty issues should be controlled for by 1) separating correct and incorrect trials, and 2) for correct trials, entering response latency as a regressor in the Generalized Linear Models, 3) entering trial duration in the GLMs.

      We thank Reviewer 3 for this comment. It is true that the experimental conditions were more difficult than the control conditions, with some participants who performed at or below 50% in the experimental conditions. We added a section in the MS to stress this aspect. We could not conduct new analyses by separating correct and incorrect trials because, for each task, participants had to respond only on the last item of the block. Therefore, we did not record a response for each event. Nevertheless, we could examine whether this potential difficulty effect biased our interpretation, by conducting new ROI analyses in which we removed all the participants who performed at or below the chance level. These analyses revealed the same results as when no participant was excluded, suggesting that this did not bias our interpretation. 

      “As mentioned above, the experimental conditions of all the tasks were more difficult than their control conditions. As a result, the specific activation of the left area PF documented above could simply reflect that this area responds to a greater extent in a harder condition relative to an easy condition of a task. This interpretation is nevertheless ruled out by the results obtained with the fluid-cognition task. We did not report a specific activation of the left area PF in this task while its experimental condition was more difficult than its control condition. To test more directly this effect of difficulty, we conducted new ROI analyses by removing all the participants who performed at or below 50% (Fig. S2). These new analyses replicated the initial analyses by showing a greater activation of the left area PF in the mechanical problem-solving task, the psychotechnical task, and the PHYS-Only and INT+PHYS conditions of the mentalizing task (all p < .001), but not in the fluid-cognition task (p = .48). In sum, the ROI analyses corroborated the whole-brain analyses and ruled out the potential effect of difficulty.” (p. 11)

      A related concern is that the control tasks also differ markedly in the degree to which they were easier and faster than their corresponding experimental task. Thus, some of the control tasks seem to control much better for difficulty and time on task than others. For example, the control task for the psychotechnical task simply requires the indication of which array contains a simple square shape (i.e., it is much easier than the psychotechnical task), whereas the control task for mechanical problem-solving requires mentally fitting a shape into a design, much like solving a jigsaw puzzle (i.e., it is only slightly easier than the experimental task).

      It is true that some control conditions could be easier than others. These differences reinforced the common activation found in the left area PF in the tasks hypothesized as involving technical reasoning, because this activation survived irrespective of the differences in terms of experimental design. For us, the rationale is the same as for a meta-analysis, in which we try to find what is common to a great variety of tasks. The only detrimental consequence we identified here is that this difference explained why we did not report a specific activation of the left area PF in the fluid-cognition task, as if the left area PF was more responsive when the task was difficult. This possibility assumes that the experimental condition of the fluid-cognition task is much more difficult than its control condition compared to what can be seen in the other tasks. As Reviewer 2 stressed in Point 1, this interpretation is unlikely, because the differences between the experimental and control conditions in the mechanical problem-solving and psychotechnical tasks were similar to those in the fluid-cognition task. In addition, again, the new ROI analyses in which we removed all the participants who performed at or below the chance level in experimental conditions reproduced our initial results.

      (2) Theoretical 

      The investigators seem to overlook prior research that does not support their perspective and their writing seems to lack scientific objectivity in places. At times they over-reach in the claims that can be made based on the present data. Some claims need to be revised/softened.

      As this comment is also mentioned below, please find our response to it below.

      Reviewer #3 (Recommendations For The Authors):

      (1) Because of the high level of detail, Figures 1 and S2 (particularly the mentalizing task and mechanical problem-solving task, and their controls) are very hard to parse, even when examined relatively closely. It is suggested that these figures be broken down into separate panels for Experiment 1 and Experiment 2 to facilitate understanding.

      We changed both Figure 1 and Figure S1.

      (2) The behavioral data (including response latencies) should be reported in the main results section of the paper and not in a supplement.

      The behavioural data are now reported in the main results. We did not report response latencies because participants were not prompted to respond as quickly as possible.

      “Behavioural results. All the behavioural results are given in Fig. 2. As shown, scores were lower in the experimental conditions than in the control conditions for all the tasks (all p < .05). In other words, the experimental conditions were more difficult than the control conditions. This difference in terms of difficulty can also be illustrated by the fact that some participants performed at or below the chance level in the experimental conditions whereas none did so in the control conditions.” (p. 8)

      (3) The investigators seem to overlook prior research that does not support their perspective and their writing seems to lack scientific objectivity in places. At times they over-reach in the claims that can be made based on the present data. For example, claims that need to be revised/softened include:

      Abstract: "Area PF... can work along with social-cognitive skills to resolve day-to-day interactions that combine social and physical constraints". This statement is overly speculative.

      This statement is based on the fact that we reported a combined activation of the technical-reasoning network and the mentalizing network in the INT+PHYS condition of the mentalizing task. This suggests that both networks need to work together for solving a day-to-day problem in which both the physical constraints of the situation and the intention of the individual must be integrated. Our findings replicated previous ones with a similar task (e.g., Brunet et al. 2000; Völlm et al., 2006), in which the authors gave an interpretation similar to ours in considering that this task requires understanding physical and social causes. Perhaps the reference to the results of the mentalizing task was not explicit enough. We added “day-to-day” before “problem” in the part of the discussion in which we discuss this possibility to make this aspect clearer.

      “In broad terms, the results of the mentalizing task indicate that causal reasoning has distinct forms and that it recruits distinct networks of the human brain (Social domain: Mentalizing; Physical domain: Technical reasoning), which can nevertheless interact together to solve day-to-day problems in which several domains are involved, such as in the INT+PHYS condition of the mentalizing task.” (p. 16)

      Introduction: "The manipulation-based approach... remains silent on the more general cognitive mechanisms...that must also encompass the use of unfamiliar or novel tools". This statement seems to be based on an overly selective literature review. There are a number of studies in which the relationship between a novel and familiar tool selection/use has been explored (e.g., Buchman & Randerath, 2017; Mizelle & Wheaton, 2010; Silveri & Ciccarelli, 2009; Stoll, Finkel et al., 2022; Foerster, 2023; Foerster, Borghi, & Goslin, 2020; Seidel, Rijntjes et al., 2023).

      We thank Reviewer 3 for this comment. Even if we accept the idea that we possess specific sensorimotor programs about tool manipulation, it remains that these programs cannot explain how an individual decides to bend a wire to make a hook or to pour water into a container to retrieve a target. As a matter of fact, such behaviour has been reported in nonhuman animals, such as crows (Weir et al., 2002, Nature) or orangutans (Mendes et al., 2007, Biology Letters). In these studies, the question is whether these nonhuman animals understand the physical causes or not, but the question of sensorimotor programs is never addressed (to our knowledge). This is also true in developmental studies on tool use (e.g., Beck et al., 2011, Cognition; Cutting et al., 2011, Journal of Experimental Child Psychology). This is what we meant here, that is, the manipulation-based approach is not equipped to explain how people solve physical problems by using or making tools – or any object – or by building constructions or producing technical innovations. However, we agree that some papers have been interested in exploring the link between common and novel tool use and have suggested that both could recruit common sensorimotor programs. It is noteworthy that these studies do not test the predictions from the manipulation-based approach versus the reasoning-based approach, so both interpretations are generally viable as stressed by Seidel et al. (2023), one of the papers recommended by Reviewer 3.

      “Apparently, the presentation of a graspable object that is recognizable as a tool is sufficient to provoke SMG activation, whether one tends to see the function of SMG to be either “technical reasoning” (Osiurak and Badets 2016; Reynaud et al. 2016; Lesourd et al. 2018; Reynaud et al. 2019) or “manipulation knowledge” (Sakreida et al. 2016; Buxbaum 2017; Garcea et al. 2019b).” (Seidel et al., 2023; p. 9)

      Regardless, as suggested by Reviewer 3, these papers deserve to be cited and this part needed to be rewritten to insist on the “making, construction, and innovation” dimension more than on the “unfamiliar and novel tool use” dimension to avoid any ambiguity.

      “This manipulation-based approach has provided interesting insights (12–16) and even elegant attempts to explain how these sensorimotor programs could support the use of both unfamiliar or novel tools (17–20), but remains silent on the more general cognitive mechanisms behind human technology that include the use of common and unfamiliar or novel tools but must also encompass tool making, construction behaviour, technical innovations, and transmission of technical content.” (p. 3)

      Introduction: "Here we focus on two important questions... to promote the technical-reasoning hypothesis as a comprehensive cognitive framework..." (italics added). This and other similar statements should be rewritten as testable scientific hypotheses rather than implying that the point of the research is to promote the investigators' preferred view.

      We agree that our phrasing could seem inappropriate here. What we meant here is that the technical-reasoning hypothesis could become an interesting framework for the study of the cognitive bases of human technology only if we are able to verify some of its key facets. As suggested, we rewrote this part. We also rewrote the abstract and the first paragraph of the discussion.

      “Here we focus on two key aspects of the technical-reasoning hypothesis that remain to be addressed: Generalizability and specificity. If technical reasoning is a specific form of reasoning oriented towards the physical world, then it should be implicated in all (the generalizability question) and only (the specificity question) the situations in which we need to think about the physical properties of our world.” (p. 5)

      Introduction: The Goldenberg and Hagmann paper cited actually shows that familiar tool use may be based either on retrieval from semantic memory or by inferring function from structure (mechanical problem solving); in other words, the investigators saw a role for both kinds of information, and the relationship between mechanical problem solving and familiar tool use was actually relatively weak. This requires correction.

      We disagree with Reviewer 3 on this point. The whole sentence is as follows:

      “This silence has been initially broken by a series of studies initiated by Goldenberg and Hagmann (9), which has documented a behavioural link in left brain-damaged patients between common tool use and the ability to solve mechanical problems by using and even sometimes making novel tools (e.g., extracting a target out from a box by bending a wire to create a hook) (9, 17).” (p. 3-4)

      We did not mention the interpretations given by Goldenberg and Hagmann about the link with the pantomime task, but only focused on the link they reported between common tool use and novel tool use. This is factual. In addition, we also disagree that the link between common tool use and novel tool use was weak.

      “The hypothesis put forward in the introduction predicts that knowledge about prototypical tool use assessed by pantomime of tool use and the ability to infer function from structure assessed by novel tool selection can both contribute to the use of familiar tools. Indeed results of both tests correlated significantly with the use of familiar tools (pantomime of tool use: r = 0.77, novel tool selection: r = 0.62; both P < 0.001), but there was also a significant correlation between the two tests (r = 0.64, P < 0.001).” (Goldenberg & Hagmann, 1998; p. 585)

      As can be seen in this quote, they reported a significant correlation between novel tool selection and the use of familiar tools. It is also noteworthy that the novel tool selection test and the pantomime test correlated with each other. Georg Goldenberg told one of the authors (F. Osiurak; personal communication) that this result incited him to revise his idea that pantomime could assess “semantic knowledge”, which explains why he did not use it again as a measure of semantic knowledge. Instead, he preferred to use a classical semantic matching task in his 2009 Brain paper with Josef Spatt, in which they found a clearer dissociation between semantic knowledge and common/novel tool use not only at the behavioral level but also at the cerebral level.

      Introduction: Please expand and clarify this sentence: "However, this involvement seems to be task-dependent, contrary to the systematic involvement of left area PF." The IFG and LOTC activations observed in prior studies are of interest as well. Were they indeed all task-dependent in these studies?

      We agree that this sentence is confusing. We meant that, in the studies reported just above in the paragraph, these regions were not systematically reported contrary to the left area PF. As we think that this information was not crucial for the logic of the paper, we preferred to remove it. 

      Introduction: If implicit mechanical knowledge is acquired through interactions with objects, how is that implicit knowledge conveyed to pass on the material culture to others?

      We thank Reviewer 3 for this comment. Although mechanical knowledge is implicit, it can be indirectly transmitted to other individuals, as shown in two papers we published in Nature Human Behaviour (Osiurak et al., 2021) and Science Advances (Osiurak et al., 2022). Actually, verbal teaching is not the only way to transmit information. There are many other ways of transmitting information such as gestural teaching (e.g., pointing to the important aspects of a task to make them salient to the learner), observation without teaching (i.e., when we observe someone unbeknown to them) or reverse engineering (i.e., scrutinizing an artifact made by someone else). We have shown that even in reverse-engineering conditions, participants can benefit from what previous participants have done to increase their understanding of a physical system. In other words, all these forms of transmission allow the learners to understand new physical relationships without waiting for these relationships to occur randomly in the environment. There is a wide literature on social learning, which describes very well how knowledge can be transmitted without using explicit communication. In fact, it is very likely that such forms of transmission were already present in our ancestors, allowing them to start accumulating knowledge without using symbolic language. We did not add this information in the MS because we think that this was a little bit beyond the scope of the MS. Nevertheless, we cited relevant literature on the topic to help the reader find it if interested in the topic.

      “Yet, recent accounts have proposed that non-social cognitive skills such as causal understanding or technical reasoning might have played a crucial role in cumulative technological culture (6, 29, 66). Support for these accounts comes from micro-society experiments, which have demonstrated that the improvement of technology over generations is accompanied by an increase in its understanding (67, 68), or that learners’ technical-reasoning skills are a good predictor of cumulative performance in such micro-societies (33, 69).” (p. 19)

      What distinguishes this implicit mechanical knowledge from stored knowledge about object manipulation? Are these two conceptualizations really demonstrably (testably) different?

      We agree that it is complex to distinguish between these two hypotheses as suggested by Seidel et al. (2023) cited above (see Reviewer 3 Point 8). We have conducted several studies to test the opposite predictions derived from each hypothesis. The main distinction concerns the understanding of physical materials and forces, which is central to the technical-reasoning hypothesis but not to the manipulation-based approach. Indeed, sensorimotor programs about tool manipulation are not assumed to contain information about physical materials and forces. In the present study, the understanding of physical materials and forces was needed in the four tasks hypothesized as requiring technical reasoning, i.e., the mechanical problem-solving task, the psychotechnical task and the PHYS-Only and INT+PHYS conditions of the mentalizing task. We can illustrate this aspect with items of each of these tasks. Figure 1A is of the mechanical problem-solving task. 

      As explained in the MS, participants had memorized the five possible tools before the scanner session. Thus, for 4 seconds, they had to imagine which of these tools could be used to extract the target out from the box. We did so to incite them to reason about mechanical solutions based on the physical properties of the problem. Then, they had 3 seconds to select the tool with the appropriate shape, here the right one. In this case, the motor action remains the same (i.e., pulling). Another illustration can be given, with the psychotechnical task (Figure 1B).

      In this task, the participant had to reason as to whether the boat-tractor connection was better in the left picture or in the right picture. This requires reasoning about physical forces, but there is no need to recruit sensorimotor programs about tool manipulation. Finally, a last example can be given with the PHYS-Only condition of the mentalizing task (but the logic is the same for the INT+PHYS condition except that the character’s intentions must also be taken into consideration; Figure 1D).

      Here the participant must reason about which picture shows what is physically possible. In this task, there is no need to recruit sensorimotor programs about tool manipulation. In sum, what is common between these three tasks is the requirement to reason about physical materials and forces. We do not ignore that motor actions could be simulated in the mechanical problem-solving task, but no motor action needed to be simulated in the other three tasks. Therefore, what was common between all these tasks was the potential involvement of technical reasoning but not of sensorimotor programs about tool manipulation. Of course, an alternative is to consider that motor actions are always needed in all the situations, including situations where no “manipulable tool” is presented, such as a tractor and a boat, a pulley, or a cannon. We cannot rule out this alternative, which is nevertheless, for us, prejudicial because it implies that it becomes difficult to test the manipulation-based approach as motor actions would be everywhere. We voluntarily decided not to introduce a debate between the reasoning-based approach and the manipulation-based approach and preferred a more positive writing by stressing the insights from the present study. Note that we stressed the merits of the manipulation-based approach in the introduction because we sincerely think that this approach has provided interesting insights. However, we voluntarily did not discuss the debate between the two approaches. Given Reviewer 3’s comment (see also Reviewer 1 Point 2), we understand and agree that some words must nevertheless be said to discuss how the manipulation-based approach could interpret our results, thus stressing the potential limitations of our interpretations. Therefore, we added a specific section in the discussion in which we discussed this aspect in more detail.

      “The second limitation concerns the alternative interpretation that the left area PF is not central to technical reasoning but to the storage of sensorimotor programs about the prototypical manipulation of common tools. Here we show that the left area PF is recruited even in situations in which participants do not have to process common manipulable tools. For instance, some items of the psychotechnical task consisted of pictures of tractor, boat, pulley, or cannon. The fact that we found a common activation of the left area PF in such tasks as well as in the mechanical problem-solving task, in which participants could nevertheless simulate the motor actions of manipulating novel tools, indicates that this brain area is not central to tool manipulation but to physical understanding. That being said, some may suggest that viewing a boat or a cannon is enough to incite the simulation of motor actions, so our tasks were not equipped to distinguish between the manipulation-based approach and the reasoning-based approach. We have already shown that the left area PF is more involved in tasks that focus on the mechanical dimension of the tool-use action (e.g., the mechanical interaction between a tool and an object) than its motor dimension (i.e., the interaction between the tool and the effector [e.g., 24, 40]). Nevertheless, we recognize that future research is still needed to test the predictions derived from these two approaches.” (p. 18-19)

      Introduction and throughout: The framing of left Area PF as a special area for technical reasoning is overly reductionistic from a functional neuroanatomic perspective in that it ignores a large relevant literature showing that the region is involved with many other tasks that seem not to require anything like technical cognition. Indeed, entering the coordinates -56, -29, 36 (reported as the peak coordinates in common across the studied tasks) in Neurosynth reveals that 59 imaging studies report activations within 3 mm of those coordinates; few are action-related (a brief review indicated studies of verbal creativity, texture processing, reading, somatosensory processing, stress reactions, attentional selection etc). Please acknowledge the difficulty of claiming that a large brain region should be labeled the brain's technical reasoning area when it seems to also participate in so much else. The left IPL (including area PF) is densely connected to the ventral premotor cortex, and this network is activated in language and calculation tasks as well as tool use tasks (e.g., Matsumoto, Nair, et al., 2012). What other constructs might be able to unite this disparate literature, and are any of these alternative constructs ruled out by the present data? Lacking this objective discussion, the manuscript does read as a promotion of the investigators' preferred viewpoint.

      We thank Reviewer 3 for this comment. As stressed in the initial version of the MS, we did not write that the left area PF is sufficient but central to the network that allows us to reason about the physical world. Regardless, we agree that an objective discussion was needed on this aspect to help the reader not misunderstand our purpose. We added a section on this aspect as suggested.

      “Before concluding, we would like to point out two potential limitations of the present study. The first limitation concerns the fact that the literature has documented the recruitment of the left area PF in many neuroimaging experiments in which there was no need to reason about physical events (e.g., language tasks). This can be easily illustrated by entering the left area PF coordinates in the Neurosynth database.

      This finding could be enough to refute the idea that this brain area is specific to technical reasoning. Although this limitation deserves to be recognized, it is also true for many other findings. For instance, sensory or motor brain regions such as the precentral or the postcentral cortex have been found activated in many non-motor tasks, the visual word form area in non-language tasks, or the Heschl’s gyrus in nonmusical tasks. This remains a major challenge for scientists, the question being how to solve these inconsistencies that can result from statistical errors or stress that considerable effort is needed to understand the very functional nature of these brain areas. Thus, understanding that the left area PF is central to physical understanding can be viewed as a first essential step before discovering its fundamental function, as suggested by the functional polyhedral approach (56).” (p. 18)

      Discussion: The discussion of a small cluster in the IFG (pars opercularis) that nearly survived statistical correction is noteworthy in light of the above point. This further underscores the importance of discussing networks and not just single brain regions (such as area PF) when examining complex processes. The investigators note, "a plausible hypothesis is that the left IFG integrates the multiple constraints posed by the physical situation to set the ground for a correct reasoning process, such as it could be involved in syntactic language processing". In fact, the hypothesis that the IFG and SMG are together related to resolving competition has been previously proposed, as has the more specific hypothesis that the SMG buffers actions and that the context-appropriate action is then selected by the IFG (e.g., Buxbaum & Randerath, 2018). The parallels with the way the SMG is engaged with competing lexical or phonological alternatives (e.g., Peramunage, Blumstein et al., 2011) have also been previously noted.

      We added the Buxbaum and Randerath (2018) reference in this section.

      “The functional role of the left IFG in the context of tool use has been previously discussed (24) and a plausible hypothesis is that the left IFG integrates the multiple constraints posed by the physical situation to set the ground for a correct reasoning process, such as it could be involved in syntactic language processing (for a somewhat similar view, see [51]).” (p. 16-17)

      Introduction and Discussion: Please clarify how the technical reasoning network overlaps with or is distinct from the tool-use network reported by many previous investigators.

      We added a couple of sentences in the discussion to clarify this point.

      “It should be clear here that we do not advocate the localizationist position simply stating that activation in the left area PF is the necessary and sufficient condition for technical reasoning. We rather defend the view according to which it requires a network of interacting brain areas, one of them – and of major importance – being the left area PF. This allows the engagement of different configurations of cerebral areas in different technical-reasoning tasks, but with a central process acting as a stable component: The left area PF. Thus, when people intend to use physical tools, it can work in concert with brain regions specific to object manipulation and motor control, thereby forming another network, the tool-use network. It can also interact with brain regions specific to intentional gestures to form a “social-learning” network that allows people to enhance their understanding about the physical aspects of a technical task (e.g., the making of a tool) through communicative gestures such as pointing gestures (42). The major challenge for future research is to specify the nature of the cognitive process supported by the left area PF and that might be involved in the broad understanding of the physical world.” (p. 14)

      Discussion: All of the experimental tasks require a response from a difficult choice in an array, and all of the tasks except for the fluid cognition task are likely to require prediction or simulation of a motion trajectory; whether that trajectory is embodied or disembodied is unclear. The Discussion does mention the related (but distinct) idea of an "intuitive physics engine", a "kind of simulator". Please clarify how this study can rule out these alternative interpretations of the data. If the study cannot rule out these alternatives, the claims of the study (and the paper title which labels PF as a technical cognition area) should be scaled back considerably.

      We thank Reviewer 3 for this comment. The authors of the papers on intuitive physics engine or associative learning do not suggest that these processes are embodied. As discussed above, we clarified our perspective on the role of the left area PF and hope that these modifications help the reader better understand it. We warmly thank Reviewer 3 for their comments, which considerably helped us improve the MS.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Hüppe and colleagues had already developed an apparatus and an analytical approach to capture swimming activity rhythms in krill. In a previous manuscript they explained the system, and here they employ it to show a circadian clock, supplemented by exogenous light, produces an activity pattern consistent with "twilight" diel vertical migration (DVM; a peak at sunset, a midnight sink, and a peak in the latter half of the night).

      They used light:dark (LD) followed by dark:dark (DD) photoperiods at two times of the year to confirm the circadian clock, coupled with DD experiments at four times of year to show rhythmicity occurs throughout the year along with DVM in the wild population. The individual activity data show variability in the rhythmic response, which is expected. However, their results showed rhythmicity was sustained in DD throughout the year, although the amplitude decayed quickly. The interpretation of a weak clock is reasonable, and they provide a convincing justification for the adaptive nature of such a clock in a species that has a wide distributional range and experiences various photic environments. These data also show that exogenous light increases the activity response and can explain the morning activity bouts, with the circadian clock explaining the evening and late-night bouts. This acknowledgement that vertical migration can be driven by multiple proximate mechanisms is important.

      The work is rigorously done, and the interpretations are sound. I see no major weaknesses in the manuscript. Because a considerable amount of processing is required to extract and interpret the rhythmic signals (see Methods and previous AMAZE paper), it is informative to have the individual activity plots of krill as a gut check on the group data.

      The manuscript will be useful to the field as it provides an elegant example of looking for biological rhythms in a marine planktonic organism and disentangling the exogenous response from the endogenous one. Furthermore, as high latitude environments change, understanding how important organisms like krill have the potential to respond will become increasingly important. This work provides a solid behavioral dataset to complement the earlier molecular data suggestive of a circadian clock in this species.

      We appreciate the positive evaluation of our work by Reviewer 1, acknowledging our approach to record locomotor activity in krill and the importance of the findings in assessing krill’s potential to respond to environmental change in their habitat.

      Reviewer #2 (Public review):

      Summary:

      This manuscript provides experimental evidence on circadian behavioural cycles in Antarctic krill. The krill were obtained directly from krill fishing vessels and the experiments were carried out on board using an advanced incubation device capable of recording activity levels over a number of days. A number of different experiments were carried out where krill were first exposed to simulated light:dark (L:D) regimes for some days followed by continuous darkness (DD). These were carried out on krill collected during late autumn and late summer. A further set of experiments was performed on krill across three different seasons (summer, autumn, winter), where incubations were all DD conditions. Activity was measured as the frequency by which an infrared beam close to the top of the incubation tube was broken over unit time. Results showed that patterns of increased and decreased activity that appeared synchronised to the LD cycle persisted during the DD period. This was interpreted as evidence of the operation of an internal (endogenous) clock. The amplitude of the behavioural cycles decreased with time in DD, which further suggests that this clock is relatively weak. The authors argued that the existence of a weak endogenous clock is an adaptation to life at high latitudes since allowing the clock to be modulated by external (exogenous) factors is an advantage when there is a high degree of seasonality. This hypothesis is further supported by seasonal DD experiments which showed that the periodicity of high and low activity levels differed between seasons.

      Strengths

      Although there has been a lot of field observations of various circadian type behaviour in Antarctic krill, relatively few experimental studies have been published considering this behaviour in terms of circadian patterns of activity. Krill are not a model organism and obtaining them and incubating them in suitable conditions are both difficult undertakings. Furthermore, there is a need to consider what their natural circadian rhythms are without the overinfluence of laboratory-induced artefacts. For this reason alone, the setup of the present study is ideal to consider this aspect of krill biology. Furthermore, the equipment developed for measuring levels of activity is well-designed and likely to minimise artefacts.

      We would like to thank Reviewer 2 for their positive assessment of our approach to study the influence of the circadian clock on krill behavior. We are delighted that Reviewer 2 found our mechanistic approach to understanding daily behavioral patterns of Antarctic krill using the AMAZE set-up convincing, and that the challenging circumstances of working with a polar, non-model species are acknowledged.

      Weaknesses

      I have little criticism of the rationale for carrying out this work, nor of the experimental design. Nevertheless, the manuscript would benefit from a clearer explanation of the experimental design, particularly aimed at readers not familiar with research into circadian rhythms. Furthermore, I have a more fundamental question about the relationship between levels of activity and DVM on which I will expand below. Finally, it was unclear how the observational results made here related to the molecular aspects considered in the Discussion.

      (1) Explanation of experimental design - I acknowledge that the format of this particular journal insists that the Results are the first section that follows the Introduction. This nevertheless presents a problem for the reader since many of the concepts and terms that would generally be in the Methods are yet to be explained to the reader. Hence, right from the start of the Results section, the reader is thrown into the detail of what happened during the LD-DD experiments without being fully aware of why this type of experiment was carried out in the first place. Even after reading the Methods, further explanation would have been helpful. Circadian cycle type research of this sort often entrains organisms to certain light cycles and then takes the light away to see if the cycle continues in complete darkness, but this critical piece of knowledge does not come until much later (e.g. lines 369-372), leaving the reader guessing until this point why the authors took the approach they did. I would suggest the following: (1) that more effort is made in the Introduction to explain the exact LD/DD protocols adopted; (2) that a schematic figure is placed early on in the manuscript where the protocol is explained, including some logical flow charts of e.g. if behavioural cycle continues in DD then internal clock exists versus if cycle does not continue in DD, the exogenous cues dominate - followed by - major decrease in cyclic amplitude = weak clock versus minor decrease = strong clock, and so on.

      We want to thank Reviewer 2 for pointing out that the experimental design and its rationale do not become clear early in the manuscript, especially for people outside the field of chronobiology. We added a new figure (now Fig. 1), illustrating the basic principle of chronobiological study design and how we adopted it. We also extended the description at the beginning of the Results section to clarify the rationale behind the experimental design.

      (2) Activity vs kinesis - in this study, we are shown data that (i) krill have a circadian cycle - incubation experiments; (ii) that krill swarms display DVM in this region - echosounder data (although see my later point). My question here is regarding the relationship between what is being measured by the incubation experiments and the in situ swarm behaviour observations. The incubation experiments are essentially measuring the propensity of krill to swim upwards since they log the number of times an individual (or group) breaks a beam towards the top of the incubation tube. I argue that krill may be still highly active in the rest of the tube but just do not swim close to the surface, so this approach may not be a good measure of "activity". Otherwise, I suggest a more correct term for what is being measured is the level of "upward kinesis". As the authors themselves note, krill are negatively buoyant and must always be active to remain pelagic. What changes over the day-night cycle is whether they decide to expend that activity on swimming upwards, downwards or remaining at the same depth. Explaining the pattern as upward kinesis then also explains why swarms move upwards during the night. Just being more active at night may not necessarily result in them swimming upwards.

      We believe there is a slight misunderstanding about how what we call “activity” is measured. The experimental columns are equipped with five detector modules, evenly distributed over the height of the column. In our analysis we count all beam breaks caused by upward movement, i.e. every time a detector module is triggered after a detector module at a lower position has been triggered, and not only when the top detector module is triggered. In this way, we record upward swimming movements throughout the column, and not only when the krill swims all the way to the top of the column. This still means that what we are measuring is swimming activity, caused by upward swimming. We use this measure to deliberately separate increased swimming activity from baseline activity (i.e. swimming which solely compensates for negative buoyancy) and inactivity (i.e. passive sinking).

      Higher activity is thus at first interpreted as an increase in swimming activity, which in the field may result in upwards-directed swimming but also could mean a horizontal increase in activity, for example, representing increased foraging and feeding activity. This would explain the daily activity pattern observed under LD cycles (now Fig. 3), which shows a general increase in activity during the dark phase. This nighttime increase could be used for both upward directed migration during sunset and horizontal directed swimming for feeding and foraging throughout the night.

      We added the following sentence to the description of the activity metric in the Methods section to clarify this point (lines 465-469):

      “To accomplish this, we organized the raw beam break data from all five detector modules in each experimental column in chronological order. We selected only those beam break detections that occurred after a detection in the detector module positioned lower on the column. Like this, we consider upward swimming movements throughout the full height of the column.”
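      The selection rule quoted above can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: the `(timestamp, module)` record layout, the module numbering (0 = lowest, 4 = highest), the function names, and the reading of "after a detection in a lower module" as "the immediately preceding detection was in a lower module" are all assumptions made here.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (timestamp in seconds, detector module index).
# Module 0 is assumed to be the lowest detector, module 4 the highest.
Detection = Tuple[float, int]


def upward_beam_breaks(detections: List[Detection]) -> List[Detection]:
    """Keep detections whose immediately preceding detection was lower on the column."""
    ordered = sorted(detections, key=lambda d: d[0])  # chronological order
    upward: List[Detection] = []
    previous_module = None
    for timestamp, module in ordered:
        # An upward movement: this module sits above the previously triggered one.
        if previous_module is not None and module > previous_module:
            upward.append((timestamp, module))
        previous_module = module
    return upward


def counts_per_bin(upward: List[Detection], bin_seconds: float) -> Dict[int, int]:
    """Count upward beam breaks per time bin (bin index -> count)."""
    counts: Dict[int, int] = {}
    for timestamp, _ in upward:
        bin_index = int(timestamp // bin_seconds)
        counts[bin_index] = counts.get(bin_index, 0) + 1
    return counts


# Toy data: one krill swims up two modules, sinks, then swims up one module.
example = [(0.0, 0), (1.0, 1), (2.0, 2), (3.0, 1), (4.0, 0), (5.0, 1), (6.0, 1)]
upward = upward_beam_breaks(example)
activity = counts_per_bin(upward, bin_seconds=3.0)
```

      In this toy series only the three detections that follow a lower module are kept, so downward passes and repeated triggers of the same module do not count as activity.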

      (3) Molecular relevance - Although I am interested in molecular clock aspects behind these circadian rhythms, it was not made clear how the results of the present study allow any further insight into this. In lines 282 to 284, the findings of the study by Biscontin et al (2017) are discussed with regard to how TIM protein is degraded by light via the clock photoreceptor CRYPTOCHROME 1. This element of the Discussion would be a lot more relevant if the results of the present study were considered in terms of whether they supported or refuted this or any other molecular clock model. As it stands, this paragraph is purely background knowledge and a candidate for deletion in the interest of shortening the Discussion.

      We agree that this part is not directly related to the data presented in the manuscript. We, therefore, omitted this part in the revised version of the manuscript to keep the discussion concise and focused on the results.

      Other aspects

      (i) 'Bimodal swimming' was used in the Abstract and later in the text without the term being fully explained. I could interpret it to mean a number of things so some explanation is required before the term is introduced.

      We thank the Reviewer for pointing this out. We provided an explanation for the term “bimodal” in the Results section, where the two clock driven activity bouts are described first, by extending the sentence in lines 161-164, which now reads:

      “This suggests that the circadian clock drives a distinct bimodal activity pattern with two activity peaks in one day, i.e. the evening and late-night activity bouts. In contrast, the morning activity bout is triggered by the onset of illumination in the experimental set-up.”

      (ii) Midnight sinking - I was struck by Figure 2b with regards to the dip in activity after the initial ascent, as well as the rise in activity predawn. Cushing (1951) Biol Rev 26: 158-192 describes the different phases of a DVM common to a number of marine organisms observed in situ where there is a period of midnight sinking following the initial dusk ascent and a dawn rise prior to dawn descent. Tarling et al (2002) observe midnight sinking pattern in Calanus finmarchicus and consider whether it is a response to feeding satiation or predation avoidance (i.e. exogenous factors). Evidence from the present study indicates that midnight sinking (and potential dawn rise) behaviour could alternatively be under endogenous control to a greater or lesser degree. This is something that should certainly be mentioned in the Discussion, possibly in place of the molecular discussion element mentioned above - possibly adding to the paragraph Lines 303-319.

      We would like to thank the Reviewer for pointing this out and agree that adding the idea of an endogenous control of midnight sinking would be interesting to the discussion. We added the following section to the Discussion (lines 335-343):

      “Interestingly, the decrease in clock-controlled swimming activity during the early night, right after the evening activity bout, may further facilitate a phenomenon called “midnight sinking”, which describes the sinking of animals to intermediate depths after the evening ascent, followed by a second rise to the surface before the morning descent. This behavior has been observed in a number of zooplankton species, including calanoid copepods (see 69, 70 and references therein) and krill (71). While previous studies suggested several exogenous factors, such as satiation or predator presence, as drivers of the midnight sink (69, 70), our study suggests that this pattern may be partly under endogenous control.”

      (iii) Lines 200-207 - I struggled to follow this argument regarding Piccolin et al identifying a 12 h rhythm whereas the present study indicates a ~24 h rhythm. Is one contradicting the other - please make this clear.

      In our study, we found that the circadian clock drives a bimodal pattern of swimming activity in krill, meaning it controls two bouts of activity in a 24-hour cycle. Piccolin et al. (2020) identified a swimming activity pattern of ~12 h (i.e. two peaks in 24 h) at the group level, which aligns with our findings at the individual level. We revised the Section in the discussion for more clarity, which now reads:

      “Data from Piccolin et al. (20) showed a strong damping of the amplitude and indication of a remarkably short (~12 h) free running period (FRP) of vertical swimming behavior of a group of krill under constant darkness (20). The short period found in Piccolin et al. (20) is in line with our findings of a bimodal pattern of swimming activity under DD conditions at the individual level, suggesting that the ~12 h rhythm in group swimming behavior in Piccolin et al. (20) could have resulted from a bimodal activity pattern at the individual level, as found in our study.” (lines 212-219).

      (iv) Although I agree that the hydroacoustic data should be included and is generally supportive of the results, I think that two further aspects should be made clear for context (a) whether there was any groundtruthing that the acoustic marks were indeed krill and not potentially some other group known to perform DVM such as myctophids (b) how representative were these patterns - I have a sense that they were heavily selected to show only ones with prominent DVM as opposed to other parts of the dataset where such a pattern was less clear - I am aware of a lot of krill research where DVM is not such a clear pattern and it is disingenuous to provide these patterns as the definitive way in which krill behaves. I ask this be made clear to the reader (note also that there is a suggestion of midnight sinking in Fig 5b on 28/2).

      To clarify the mentioned points concerning the hydroacoustic data:

      a) As mentioned in the Methods section, only hydroacoustic data during active fishing was included in the analysis. E. superba occurs in large monospecific aggregations, and the fishery actively targets E. superba and monitors their catch and the proportion of non-target species continuously with cameras. Krill fishery bycatch rates are very low (0.1–0.3%, Krafft et al. 2022), and fishing operations would stop if non-target species were caught in significant proportions at any time. Therefore, and supported by our own observations when we conducted the experiments, we argue that it is a valid assumption that E. superba predominantly causes the backscattering signal shown in Figure 5 (now Fig. 6).

      b) We are aware of the fact that DVM patterns of Antarctic krill are highly variable and that normal DVM patterns do not need to be the rule (e.g. see our cited study on the plasticity of krill DVM by Bahlburg et al. 2023). The visualized data were not selected for their DVM pattern but represent the period directly preceding the sampling for behavioral experiments in four seasons (experiment 2), including the day of sampling. These periods were chosen to assess the DVM behavior of krill swarms in the field in the days before and during the sampling for behavioral experiments.

      To improve understanding, we modified the description in the Results, Discussion, and Methods sections, as well as the caption of Figure 5 (now Fig. 6), which now read:

      “To investigate whether krill swarms exhibited daily behavioral patterns in swimming behavior in the field before they were sampled for seasonal experiments, hydroacoustic data were recorded from the fishing vessel, continuously over a three-day period prior to sampling for the seasonal experiments described above…” (lines 191-194).

      “Furthermore, hydroacoustic recordings demonstrate that most krill swarms sampled exhibited synchronized DVM in the field in the days directly before sampling for behavioral experiments, indicating that in this region, krill remain behaviorally synchronized across a wide range of photoperiods.” (lines 397-400).

      “Hydroacoustic data were collected using a hull-mounted SIMRAD ES80 echosounder (Kongsberg Maritime AS) aboard the Antarctic Endurance, covering three days before the sampling for each of the seasonal behavioral experiments of experiment 2” (lines 512-515).

      “We only included data during active fishing periods and the vessel is specifically targeting E. superba, which occurs in large monospecific aggregations. Further, krill fishery bycatch rates are very low (0.1-0.3%, 84), which makes it highly probable that the recorded signal represents krill swarms.” (lines 523-526).

      “Hydroacoustic recordings showing the vertical distribution of krill swarms in the upper water column (<220 m) below the vessel, visualized by the mean volume backscattering signal (200 kHz), on the three days prior to krill sampling for experiments…” (lines 802-804).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As noted in the public review, this is a logical and well-written manuscript. I have very few comments to consider addressing.

      The Results lead with a paragraph outlining the experimental approach. This is good, but you use the term "experiments" to refer to both the two sets, and the two or four subsets of experiments. Perhaps consider the subset experiments as "treatments"? I understood what you meant, but it took a few read-throughs to be sure I got it.

      We thank the reviewer for pointing this out and changed the nomenclature of the experiments throughout the manuscript. We now refer to the two sets of experiments as experiment 1 and 2, to the subsets of experiment 1 as “short day treatment” and “long day treatment”, and to the subsets of experiment 2 as summer treatment, late summer treatment, autumn treatment, and winter treatment. We also believe that the new Figure 1 now helps readers follow the experimental design more easily.

      Ln 140: "...off and decrease at lights-on."

      We adjusted the sentence accordingly.

      Ln 244: Can you define "extreme photic conditions"? I get what you mean, but to be clear to the reader this would help.

      We adjusted the sentence, which now reads:

      “This could confer a significant adaptive advantage to species inhabiting environments characterized by extreme photic conditions (53, 54, 60), such as phases of polar night or midnight sun as well as rapid changes in daylength, or species that rely on precise photoperiodic time measurement for accurate seasonal adaptation.” (lines 258-261).

      Figures: Consider adding an LSP for groups in Fig 1. Also, it would be useful to have LSP period estimates for each individual tested. This could be a separate table, or it could be added to the individual activity plots. Should S3 and S4 be reversed?

      We thank the reviewer for their suggestion and added an LSP as figure 1d (now Fig. 2d) to statistically support the group activity shown in Figure 1c (now Fig. 2c) as suggested. We added the individual animals' LSP period estimates to supplementary figures S2, S7, S8, S9, and S10. We also reversed Figures S3 and S4 to match the appearance in the main text. 
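      For readers unfamiliar with period estimation, the sketch below shows how a period estimate can be read off a periodogram. It assumes that "LSP" stands for a Lomb-Scargle periodogram, which is not spelled out in this response, and it is not the authors' pipeline: the function names, the candidate period grid, and the synthetic activity series are all hypothetical.

```python
import math
from typing import List, Sequence


def lomb_scargle_power(times: Sequence[float], values: Sequence[float], period: float) -> float:
    """Classical (unnormalized) Lomb-Scargle power at a single trial period."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    omega = 2.0 * math.pi / period
    # Time offset tau makes the sine and cosine terms orthogonal.
    tau = math.atan2(
        sum(math.sin(2.0 * omega * t) for t in times),
        sum(math.cos(2.0 * omega * t) for t in times),
    ) / (2.0 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    y_cos = sum(y * c for y, c in zip(centered, cos_terms))
    y_sin = sum(y * s for y, s in zip(centered, sin_terms))
    cos_sq = sum(c * c for c in cos_terms)
    sin_sq = sum(s * s for s in sin_terms)
    power = 0.0
    if cos_sq > 0.0:
        power += y_cos * y_cos / cos_sq
    if sin_sq > 0.0:
        power += y_sin * y_sin / sin_sq
    return 0.5 * power


def dominant_period(times: Sequence[float], values: Sequence[float], candidates: Sequence[float]) -> float:
    """Return the candidate period (same unit as `times`) with the highest power."""
    return max(candidates, key=lambda p: lomb_scargle_power(times, values, p))


# Synthetic data: four days sampled every 30 minutes, times in hours.
hours: List[float] = [0.5 * i for i in range(192)]
candidate_periods = [float(p) for p in range(8, 37)]  # 8 h to 36 h in 1 h steps

# One activity peak per day gives a ~24 h estimate.
unimodal = [10.0 + 5.0 * math.cos(2.0 * math.pi * t / 24.0) for t in hours]
# Two evenly spaced peaks per day give a ~12 h estimate.
bimodal = [10.0 + 5.0 * math.cos(2.0 * math.pi * t / 12.0) for t in hours]

period_unimodal = dominant_period(hours, unimodal, candidate_periods)
period_bimodal = dominant_period(hours, bimodal, candidate_periods)
```

      The second synthetic series illustrates why a pattern with two bouts per day can appear as a roughly 12 h rhythm. Real activity records are noisy and unevenly spaced, so a significance threshold would be needed before reporting any period.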

      Fig 5: are the light regime bars for b and c correct? They look similar, but they are only 15 days apart, so perhaps they are correct as is.

      We double checked the light regime bars in Fig. 5b and c (now 6b and c) and they are correct as is.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public reviews:

      Reviewer #1 (Public review):

      Summary:

      This very interesting manuscript proposes a general mechanism for how activating signaling proteins respond to species-specific signals arising from a variety of stresses. In brief, the authors propose that the activating signal alters the structure by a universal allosteric mechanism.

      Strengths:

      The unitary mechanism proposed is appealing and testable. They propose that the allosteric module consists of crossed alpha-helical linkers with similar architecture and that their attached regulatory domains connect to phosphatases or other molecules through coiled-coil domains, such that the signal is transduced via rigidifying the alpha helices, permitting downstream enzymatic activity. The authors present genetic and structural prediction data in favor of the model for the system they are studying, and stronger structural data in other systems.

      Weaknesses:

      The evidence is indirect - targeted mutations, structural predictions, and biochemical data. Therefore, these important generalizable conclusions are not buttressed by impeccable data, which would require doing actual structures in B. subtilis, confirming experiments in other organisms, and possibly co-evolutionary coupling. In the absence of such data, it is not possible to rule out variant models.

      We thank the reviewer for their feedback. A challenge of studying flexible proteins is that it is often not possible to directly obtain high resolution structural data. For the case of B. subtilis RsbU, the independent experimental approaches we applied (including two unbiased genetic screens, targeted mutagenesis, SAXS, enzymology, and structure prediction, which includes evolutionary coupling) converged upon a model for activation, which we feel is well supported. Frustratingly, our attempts at determining high resolution experimental structures have been unsuccessful, which we think is due to the flexibility of the proteins revealed by our SAXS experiments. For example, we collected X-ray diffraction data from crystals of a fragment of B. subtilis RsbU containing the N-terminal domain and linker in which the linker was almost entirely disordered in the maps. We agree that doing experiments in other organisms would be valuable next steps to test the hypothesis that this coiled-coil based transduction mechanism is conserved across species, and will modify the text to differentiate this more speculative section of the manuscript.

      We have modified the abstract to read:

      “This coiled-coil linker transduction mechanism additionally suggests a resolution to the mystery of how shared sensory domains control serine/threonine phosphatases, diguanylate cyclases and histidine kinases.”

      We have modified the results to read:

      “These predictions suggest a testable hypothesis that RsbP is controlled through an activation mechanism similar to that of RsbU (Fig. 5A)”

      “From this analysis, we speculate that linker-mediated phosphatase domain dimerization is an evolutionarily conserved, adaptable mechanism to control PPM phosphatase activity.”

      Based on this critique (and the critiques of the other reviewers), we plan to do energetic analysis of the predicted coiled coils from the enzymes we analyzed from other species and to incorporate this into the manuscript.

      We have modified the results to read:

      “Consistent with a model in which the stability of the linker plays a conserved regulatory role, the AlphaFold2 models for many of the predicted structures have unfavorable polar residues buried in the coiled-coil interface (positions a and d, for which non-polar residues are most favorable) (Figure 5 – figure supplement 2).”
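      The heptad logic in the passage above can be illustrated with a short Python sketch that flags polar residues at the a and d positions of a coiled coil. It is a toy example, not the analysis used in the manuscript: the heptad register is assumed to be known in advance, the set of residues treated as polar is a simplification chosen here, and the example sequence and function name are hypothetical.

```python
from typing import List, Tuple

HEPTAD = "abcdefg"
# Assumption: a simple polar/charged residue set; other classifications exist.
POLAR_RESIDUES = set("STNQDEKRH")


def polar_core_positions(sequence: str, register_start: str = "a") -> List[Tuple[int, str, str]]:
    """List (1-based index, residue, heptad position) for polar residues at a or d.

    `register_start` is the heptad position assigned to the first residue.
    """
    offset = HEPTAD.index(register_start)
    hits: List[Tuple[int, str, str]] = []
    for i, residue in enumerate(sequence.upper()):
        position = HEPTAD[(offset + i) % 7]
        # Positions a and d form the buried interface of a coiled coil.
        if position in "ad" and residue in POLAR_RESIDUES:
            hits.append((i + 1, residue, position))
    return hits


# Hypothetical two-heptad sequence with an asparagine at the second a position.
toy_linker = "LKELEDKNEELQSK"
hits = polar_core_positions(toy_linker, register_start="a")
```

      Here the only flagged residue is the asparagine at position 8, which falls on an a position, while the leucines at the other a and d positions are not reported.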

      Finally, in the manuscript, we have highlighted that this mechanism is not the only mechanism for activation of other proteins with effector domains connected to linkers, but rather one of many mechanisms (Fig 5G). The reviewer additionally made helpful suggestions about the text in detailed comments that we will incorporate as appropriate.

      Reviewer #2 (Public review):

      Summary:

      While bacteria have the ability to induce genes in response to specific stresses, they also use the General Stress Response (GSR) to deal with growth conditions that presumably include a larger range of stresses (for instance, stationary phase growth). The activation of GSR-specific sigma factors is frequently at the heart of the induction of a GSR. Given the range of stresses that can lead to GSR induction, the regulatory inputs are frequently complex. In B. subtilis, the stressosome, a multi-protein complex, contains a set of proteins that, upon appropriate stresses, initiate partner switching cascades that free the sigma B sigma factor from an anti-sigma. The focus here is on the mode of activation of RsbU, a serine/threonine phosphatase of the PPM family, leading to sigB activation. RsbT, a component of the degradosome, interacts with RsbU upon stress, activating the phosphatase activity. Once active, RsbU dephosphorylates its target (RsbV, an anti-antisigma), which in turn binds the anti-sigma. The conclusion is that flexible linker domains upstream of the phosphatase domain are the target for activation, via binding of proteins to the N-terminal domain, resulting in a crossed-linker dimeric structure. The authors then use the information on RsbU to suggest that parallel approaches are used to activate PPM phosphatases for the GSR response in other bacteria. (Biology vs. Mechanism, evolution?)

      Strengths and Weaknesses:

      Many of these have to do with clarifying what was done and why. This includes the presentation and content of the figures.

      One issue relates to the background and context. A bit more information on the stresses that release RsbT would be useful here. The authors might also consider a figure showing the major conclusions and parallels for SpoIIE activation and possibly other partner switches that are discussed, introducing the switch change more clearly to set the stage for the work here (and the generalization). There are a lot of players to keep track of.

      We plan to carefully review the manuscript to improve the clarity of presentation and background. In particular, we thank the reviewer for pointing out the missing information about the release of RsbT from the stressosome. We will incorporate this information into the introduction and provide an additional figure.

      We have added the following text to the introduction:

      “RsbT is sequestered in a megadalton stress sensing complex called the stressosome, and is released to bind RsbU in response to specific stress signals including ethanol, heat, acid, salt, and blue light”

      We have added a new figure panel (2C) that shows the model for how Q94L, M166V, and RsbT binding induce conformational change of the PPM domain to recruit metal cofactor and activate RsbU (analogous, but slightly different from the mechanism for SpoIIE).

      The reviewer additionally provided detailed helpful comments that we will incorporate in the text and figures.

      Reviewer #3 (Public review):

      Summary:

      The authors present a study building on their previous work on activation of the general stress response phosphatase, RsbU, from Bacillus subtilis. Using computed structural models of the RsbU dimer the authors map previously identified activating mutations onto the structure and suggest further protein variants to test the role of the predicted linker helix and the interaction with RsbT on the activation of the phosphatase activity.

      Using in vivo and in vitro activity assays, the authors demonstrate that linker variants can constitutively activate RsbU and increase the affinity of the protein for RsbT, thus showing a link between the structure of the linker region and RsbT binding.

      Small angle X-ray scattering experiments on RsbU variants alone, and in complex with RsbT show structural changes consistent with a decreased flexibility of the RsbU protein, which is hypothesised to indicate a disorder-order transition in the linker when RsbT binds. This interpretation of the data is consistent with the biochemical data presented by the authors.

      Further computed structure models are presented for other protein phosphatases from different bacterial species and the authors propose a model for phosphatase activation by partner binding. They compare this to the activation mechanisms proposed for histidine kinase two-component systems and GGDEF proteins and suggest the individual domains could be swapped to give a toolkit of modular parts for bacterial signalling.

      Strengths:

      The key mutagenesis data is presented with two lines of evidence to demonstrate RsbU activation - in vivo sigma-b activation assays utilising a beta-galactosidase reporter and in vitro activity assays against the RsbV protein, which is the downstream target of RsbU. These data support the hypothesis for RsbT binding to the RsbU linker region as well as the dimerisation domain to activate the RsbU activity.

      Weaknesses:

      Small angle scattering curves are difficult to unambiguously interpret, but the authors present reasonable interpretations that fit with the biochemical data presented. These interpretations should be considered as good models for future testing with other methods - hydrogen/deuterium exchange mass spectrometry, would be a good additional method to use, as exchange rates in the linker region would be affected significantly by the disorder/order transition on RsbT binding.

      We agree with the reviewer that the SAXS data have inherent ambiguity due to the nature of the measurement. However, SAXS is one of the best techniques to directly assess conformational flexibility. Our scattering data for RsbU have multiple signatures of flexibility, supporting a high-confidence conclusion. While the scattering data support a reduction in flexibility for the RsbT/RsbU complex, we agree that a high-resolution structure would be valuable. However, the combination of the scattering data with our biochemical and genetic data supports the validity of the AlphaFold predicted model. We thank the reviewer for the suggestion of future hydrogen/deuterium exchange experiments that would be complementary, but which we feel are beyond the scope of this work.

      The interpretation of the computed structure models should be toned down with the addition of a few caveats related to the bias in the models returned by AlphaFold2. For the full-length models of RsbU and other phosphatase proteins, the relationship of the domains to each other is likely to be the least reliable part of the models - this is apparent from the PAE plots shown in Supplementary Figure 8. Furthermore, the authors should show models coloured by pLDDT scores in an additional supplementary figure to help the reader interpret the confidence level of the predicted structures.

      We thank the reviewer for suggestions on how to clarify the discussion of AlphaFold models. We will decrease the emphasis on the computed models in the text and will add figures with the models colored by the pLDDT scores to aid in the interpretation.

      We have modified the text of the Abstract: “This coiled-coil linker transduction mechanism additionally suggests a resolution to the mystery of how shared sensory domains control serine/threonine phosphatases, diguanylate cyclases and histidine kinases.”

      We have modified the text of the Results: “These predictions suggest a testable hypothesis that RsbP is controlled through an activation mechanism similar to that of RsbU (Fig. 5A).”

      “From this analysis, we speculate that linker-mediated phosphatase domain dimerization is an evolutionarily conserved, adaptable mechanism to control PPM phosphatase activity”

      We have also added Figure 1 – figure supplement 2 with the AlphaFold2 models colored by the pLDDT scores.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Baral and colleagues investigate the regulatory mechanisms of the General Stress Response (GSR) in Bacillus subtilis, focusing on the phosphatase RsbU and its regulation by the protein RsbT. The GSR is a critical adaptive mechanism that allows bacteria to survive under various stress conditions by reshaping their physiology through a broad transcriptional response. RsbU, a key player in the GSR, facilitates the activation of the transcription factor SigB by dephosphorylating RsbV. This activation is mediated through a partner-switching mechanism involving RsbT. Baral and colleagues use a combination of genetic screening, structural predictions via AlphaFold2, and biophysical techniques such as SAXS and MALS to present a model for how RsbT regulates RsbU. Key findings include the identification of specific amino acid substitutions that enhance RsbU activity, the role of the α-helical linker in RsbU dimerization and activation, and the potential broader conservation of these mechanisms across bacterial species. However, as described below, additional work is required to solidify the results.

      Major Points

      (1) The manuscript is misnamed--it dissects a single step of the signal-transduction pathway regulating the general stress response. Instead, it is rather seeking a generalizable mechanism for kinase-phosphatase interactions across stresses.

      We have edited the title to “A General Mechanism for Initiating the General Stress Response in Bacteria” to reflect that this study addresses the initiating event of the general stress response.

      (2) The genetic screen likely has limitations in detecting all possible variants that could affect RsbU activity. The readout is specific to σ^B activation, and the focus on specific amino acid substitutions may overlook other significant regions or mechanisms involved in the regulation of RsbU, particularly those involving RsbV and RsbT.

      Our screens were specifically designed to identify features of RsbU that contribute to regulation. Importantly, RsbU does not have any known targets other than RsbV and the downstream σ<sup>B</sup> response, but we agree that substitutions in either RsbV or RsbT could influence RsbU activation. In principle, our suppressor screen with RsbU<sup>Y28I</sup> could have identified RsbT variants (rsbT was mutagenized in this screen), but we did not identify any such variants in the screen. We conducted a separate screen (published elsewhere) that specifically addressed how RsbU recognizes RsbV.

      (3) The authors largely focus on the biochemical and structural aspects of RsbU regulation. There is limited discussion on the broader functional implications of these findings in the context of bacterial physiology and stress response. Incorporating more in vivo studies to show how these mechanisms impact bacterial survival and adaptation would provide a more comprehensive understanding.

      We appreciate this comment, but did not conduct additional studies of survival and adaptation because the phenotypes of σ<sup>B</sup> deletion in B. subtilis under laboratory conditions are relatively mild and therefore difficult to assay. Future studies to address this in other systems could be highly informative.

      (4) The results primarily support the model of linker-mediated dimerization and rigidity. However, other potential regulatory mechanisms or interacting partners might also play significant roles in RsbU activation. A more thorough exploration of these possibilities would strengthen the study's conclusions.

      One of the major advantages of RsbU as a model for initiation of the general stress response is that the system is discrete, with all evidence pointing to there being a single primary input (RsbT) and output (dephosphorylation of RsbV). While there are other possible variations on the system (for example, RsbU may be directly activated by manganese stress), we focused on this system precisely because of its simplicity.

      (5) While the study presents evidence for the conservation of the described mechanism across different species, this assumption is based on structural predictions and limited experimental data. Broader experimental validation across diverse bacterial species would be necessary to substantiate this claim. Coevolution coupling along with conservation/evolutionary studies could be considered.

      We have altered the language in the paper to emphasize where we are making inferences from predictions that are therefore more speculative. We agree that a more detailed analysis of the evolutionary coupling would likely be fruitful. We note that these couplings are the major driving force of AlphaFold predictions, suggesting that these couplings contributed to the models that we analyzed.

      (6) The reliance on AlphaFold2 for structural predictions introduces potential biases and uncertainties inherent in computational models. Experimental validation of these models through additional techniques such as cryo-EM or X-ray crystallography would strengthen the conclusions.

      We agree with this point, which is why we performed extensive analysis and validation of the models for RsbU using SAXS, genetics, and biochemistry. The proposed techniques are made more challenging by flexibility and heterogeneity, which we detected in our experiments. Our attempts thus far at experimental structure determination are consistent with this being a major technical hurdle.

      (7) SAXS data provide low-resolution structural information, and the interpretation of flexibility versus rigidification might be overemphasized in its interpretation. This part of the study was difficult to interpret. Improving readability by breaking down the text into sections with clear headings for each figure panel and clarifying descriptions of the panels and methods would help. Complementary high-resolution techniques could provide a more definitive view of the linker's conformational changes.

      We have modified the presentation of the figures to clarify the SAXS analysis. The fact that the SAXS analysis suggests flexibility rather than a discrete inactive conformation means that high-resolution techniques may not be appropriate for this system.

      (8) The study primarily focuses on the model where RsbT binding rigidifies the RsbU linker. Alternative hypotheses, such as subtle conformational adjustments without complete rigidification, are not extensively explored or ruled out.

      Our analysis of the SAXS data strongly suggests that a subtle conformational change could not account for the scattering data that we obtained. We have modified the text to clarify this point.

      “Indicative of significant deviation between the RsbU structure in solution and the AlphaFold2 model, the scattering intensity profile (I(q) vs. q) was a poor fit (χ<sup>2</sup> 12.53) to a profile calculated from the AlphaFold2 model of an RsbU dimer using FoXS (Schneidman-Duhovny et al. 2016; Schneidman-Duhovny et al. 2013) (Fig. 4A). We therefore assessed the SAXS data for the RsbU dimer for features that report on flexibility (Kikhney & Svergun 2015). First, the scattering intensity data lacked distinct features caused by the multi-domain structure of RsbU from the AlphaFold2 model (Fig. 4A).”
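
      The χ<sup>2</sup> value quoted above summarizes how far a model-derived scattering profile lies from the measured one. As a rough illustration only, the sketch below computes a reduced χ<sup>2</sup> after fitting a single scale factor by weighted least squares. It is a simplification and not the FoXS implementation, which also adjusts further fitting parameters that are omitted here. The function name `scaled_chi2` and all intensity values are hypothetical and were invented for this example.

```python
from typing import Sequence, Tuple


def scaled_chi2(
    i_exp: Sequence[float],
    i_model: Sequence[float],
    sigma: Sequence[float],
) -> Tuple[float, float]:
    """Return (scale, chi2) for a model profile fitted to experimental data.

    The scale factor is the weighted least-squares optimum, and chi2 is the
    mean squared residual in units of the experimental error.
    Assumes every sigma is strictly positive.
    """
    if not i_exp or not (len(i_exp) == len(i_model) == len(sigma)):
        raise ValueError("profiles must be non-empty and of equal length")

    weights = [1.0 / (s * s) for s in sigma]
    numerator = sum(w * e * m for w, e, m in zip(weights, i_exp, i_model))
    denominator = sum(w * m * m for w, m in zip(weights, i_model))
    scale = numerator / denominator

    chi2 = sum(
        w * (e - scale * m) ** 2 for w, e, m in zip(weights, i_exp, i_model)
    ) / len(i_exp)
    return scale, chi2


# Hypothetical intensities at five q values (not data from the study).
model = [100.0, 60.0, 30.0, 12.0, 5.0]
errors = [4.0, 3.0, 2.0, 1.0, 0.5]

# Same shape as the model, differing only by a constant factor: perfect fit.
good_exp = [2.0 * m for m in model]
# Decays more slowly than the model: no single scale factor can fit it.
poor_exp = [200.0, 150.0, 95.0, 50.0, 20.0]

good_scale, good_chi2 = scaled_chi2(good_exp, model, errors)
poor_scale, poor_chi2 = scaled_chi2(poor_exp, model, errors)
```

      In this toy case the first profile fits essentially perfectly once scaled, whereas the second leaves residuals far larger than the stated errors, which is the kind of mismatch a large χ<sup>2</sup> reports.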

      (9) Future studies should aim to validate the AlphaFold2 predictions with high-resolution structural techniques. This would provide definitive evidence for the proposed conformational states of RsbU with and without RsbT.

      The fact that the SAXS analysis suggests flexibility rather than a discrete inactive conformation means that high-resolution techniques may not be appropriate for this system.

      (10) Investigating the RsbU-RsbT interaction in vivo using techniques like FRET, co-immunoprecipitation, or live-cell imaging would provide a more comprehensive understanding of their functional dynamics in a cellular context.

      We appreciate the reviewer’s suggestions for future experiments.

      (11) Exploring and testing alternative models of RsbU activation, such as partial rigidification or different modes of conformational change, would strengthen the conclusions.

      While our data strongly support that a flexible-to-rigid transition controls RsbU activation, we agree that it is possible that other mechanisms of linker modification could control other phosphatases and we discuss this at some length in the discussion.

      (12) The figure legends are quite dense and could benefit from some streamlining.

      We have edited the figure legends for clarity and length.

      Reviewer #2 (Recommendations for the authors):

      (1) Activation assays (Figures 1, 3, S2) are presented here as blue or white spots (reflecting a reporter activity). While off and on these are fairly clear, it is more difficult to compare the degree of activity (for instance that rsbU<sup>Q94L</sup> is more active than M166V). It would also be good to clearly present in the text the logic of asking if the mutant is RsbT independent or not (and the interpretation of that). Quantitative assays of these would be very useful.

      We chose not to perform quantitative LacZ assays here because of several complications to interpreting these results that we encountered in our previously published study (Ho and Bradshaw, 2021). However, the level of blue pigmentation shown in Figure 1B for RsbU Q94L and RsbU M166V is qualitatively different, making the comparison possible. Most importantly, we observed cell density dependent changes in LacZ activity in the absence of rsbT for rsbU<sup>M166V</sup> expressing cells, meaning that comparisons between strains would be difficult. Additionally, we found that it was important to make a chromosomal replacement of rsbU to see the full effect of the M166V substitution. However, we were not able to construct a similar rsbU<sup>Q94L</sup> strain, likely because the high-level σ<sup>B</sup> activity is lethal (we were able to construct this strain when σ<sup>B</sup> was deleted but only obtained strains with additional loss-of-function mutations in RsbU when σ<sup>B</sup> was present).

      We have modified the text to explain the logic of identifying RsbT independent variants: “We previously conducted a genetic screen (Ho & Bradshaw 2021) to identify features of RsbU that are important for phosphatase regulation by isolating gain-of-function variants that are active in the absence of RsbT.”

      (2) Explain Figure S8 graphs: as much as AlphaFold is now in use, the authors should provide some further explanation of what is shown here. Blue (low error) is good, presumably. What are the A, B, C, and D sections showing? Different parts of a given letter region (and between them)? What is the x-axis? Is the top-ranked model used in every case in the text? How different are these models? The Methods section could be used for some of this (but isn't in its current form). This also becomes important for the models generated later in the paper (Figure S7), which look rather different here.

      We have modified figure S8 to include additional labels and have added structures with the pLDDT scores shown. We have additionally modified the figure legends and methods to provide the requested information.

      (3) Figure 1C, D, Figure S2: amino acid ends of linker domains could be shown (the text discusses residues 83-97 of the linker as a two-turn coiled coil; Q94 is pretty close to the end of this coiled-coil? Figure S2 is even less clear - positions of other amino acids would help, and/or an added sequence showing the full linker and coiled-coil region). Some explanation for positions for readers to focus on for full coiled-coil would be useful in the legend of Figure S2. How strong a coiled-coil prediction is there for this region?

      We have added the sequence of the coiled-coil regions to the figures with numbering. For these analyses we used the Socket2 program, which analyzes a PDB file to identify coiled-coil regions and thus does not provide a confidence score. However, inspection of the sequence and the confidence scores of the AlphaFold2 models indicates that the coiled-coil regions are not ideal, consistent with this being a regulatory feature.

      Is it clear that the fully inactive proteins are still properly folded and soluble?

      In the case of RsbU, our biophysical analysis indicates that the inactive form of the protein is soluble. While phosphatase activity is substantially reduced, our unpublished comparison of single- and multiple-turnover reactions in the absence of RsbT indicates that nearly all of the enzyme is active.

      Finally, are there other positions that would also be expected, from this model, to stabilize the coiled-coil and thus bypass the requirement for RsbT? If so, it would be good to test these. Is it the burial of amino acid at position 94 that is important, or the ability to form crossed helices?

      Because of how short the predicted coiled-coil region is, we did not identify any obvious positions that would likely have the same effect as Q94 substitution. We considered making helix-breaking mutations, which would be predicted to block RsbU activation, but favored analysis of the wildtype protein because of limitations in interpreting the effects of loss-of-function mutations.

      (4) Figure 2A, RsbT binding to RsbU: It was not entirely clear to this reviewer why one would expect the RsbT binding, not needed for activation, to be increased by the mutation that stabilizes the crossed alpha helices. The change is impressive but doesn't the lack of a need for RsbT suggest that this mutation bypasses the normal mechanism? (Is dimerization enough? Or other protein cross helices?).

      We have modified the text to clarify this point: “One prediction of our hypothesis that RsbT stabilizes the crossed alpha helices of the RsbU dimer, is that RsbT should bind more tightly to rsbU<sup>Q94L</sup> than to RsbU because the coiled-coil conformation that RsbT binds would be more energetically favorable.” Another way of putting this is that if the Q94L substitution activates RsbU through an on-pathway mechanism, RsbT must bind more tightly.

      (5) Figure 3A, Figure S3: Please label the yellow (interface) residues in RsbU and RsbT in Fig. S3 and the green (suppressor) spheres in Figure 3A.

      We have added labels to the figures as suggested.

      If RsbT interacts with the N-terminal dimerization domain and linker, why were residues 174 and 178 (from PPM domain) shown to be implicated in binding?

      The fact that residues in the switch region suppress a mutation that decreases RsbT binding suggests that this region is part of an allosteric network that links RsbT binding, the linker, and dimerization of the phosphatase domains. For example, any substitution that promotes a conformation of the phosphatase domain that is more favorable for dimerization would also promote RsbT binding. However, the precise details of how each mutation fits into this network are not clear, and we have therefore chosen not to specify a particular model to avoid overinterpreting our data.

      Are these marked in Figure S3?

      We have added labels to make this clear.

      Are these part of a dimerization interface in the C-terminal domain? Are any/all of these RsbU mutants suppressed by Q94L, as one might predict (apparently Y28I is since Q94L was again identified)?

      We chose to focus on Y28I because it was the best studied previously, but we would predict that Q94L would suppress other RsbT binding mutations.

      (6) Line 191-192: Is it surprising that no suppressors were isolated in RsbT?

      We didn’t have a preconception of whether or not it would be possible to identify similar suppressors in RsbT. Explanations for why we did not identify such suppressors could include that RsbT may be destabilized more easily by substitution, that RsbT is more constrained because it has other interaction partners, or that the particular substitutions that would suppress Y28I are less common by the PCR mutagenesis strategy we used.

      (7) Figure 3: Would the same mutants arise if the screen had been done in the absence of RsbT? Was RsbT dependence tested for the rsbU alleles?

      Our prediction is that we would not have identified any of these mutations except for Q94L in the absence of rsbT. We tested a few of the alleles and found them all to be rsbT dependent, but did not systematically test all of the alleles and therefore did not include this analysis in the manuscript.

      Given the findings earlier in the paper for Q94L, suggesting that this stabilizes the coiled-coil and shows some activity in the absence of RsbT, it seems that the interpretation of other mutants in this region (and Q94L itself) as evidence that RsbT contacts the linker directly and that contact is necessary for activation may be an overinterpretation. If these are in fact RsbT independent, they support the importance of the linker (do they further stabilize coiled-coil formation?), rather than the role of RsbT here. Are G92 and T89 on the outside of the coiled-coil? If Q94 is buried, is it qualitatively different from these others?

      G92 and T89 are predicted to be exposed. The fact that these mutations are near Q94 is part of the reason that we focused on R91 and the predicted contact with D92 of RsbT as another approach to validate the predicted interface.

      (8) Figure 3C addresses the issue of direct interaction of RsbT with the RsbU linker to some extent, given that RsbU R91E doesn't appear to have a lot of activity without RsbT. It would be helped by telling the reader what the R91 contact is initially.

      We have modified the text to clarify this point: “To test the model that RsbT activates RsbU by directly interacting with the linker to dimerize the RsbU phosphatase domains, we introduced a charge swap at position R91 that would abolish a predicted salt-bridge with RsbT D92 (Fig. 3C).”

      (9) Figure 4 and the discussion of it in the text is not likely to be easily understandable for many readers. Aside from providing a bit more explanation of what these analyses are showing, it would be useful to start the whole section (or maybe even much earlier in the paper) with the information found on lines 261-264, that other studies show that the N-terminus dimerizes stably on its own (and is it known that the C-terminus does not?). Then the discussion of the alternative models early in this section would be clearer.

      We have updated the introduction to emphasize this point: “RsbU has an N-terminal four-helix bundle domain that dimerizes RsbU and is also the binding site for RsbT, which activates RsbU as a phosphatase (Fig. 1C,D) (Delumeau et al. 2004).”

      We have also added clarification to the model presented at the beginning of this section: “A second possibility is that inactive RsbU is dimerized by the N-terminal domains but that the linkers of inactive RsbU are flexible and that the phosphatase domains only interact with each other when RsbT orders the linkers into a crossing conformation.”

      Is the dimerization of the N-terminal domains previously determined similar/the same as what is seen in the AlphaFold models used here (or is the AlphaFold dimerization derived primarily from those data)?

      Yes, the dimerization in the AlphaFold models matches closely to the published structure.

      (10) Discussion and Figure 5: The final part of this work predicts AlphaFold models for a set of other phosphatases involved in initiating GSR across bacterial species, and suggests that linker-mediated phosphatase dimerization is the critical factor to activate the phosphatase. Clearly, this is the most speculative but interesting aspect of the paper. A number of possible questions are suggested by some of this:

      a. Do any of the activating mutants in RsbU and RsbP in the PPM domain (that apparently improve dimerization and thus activation) do a similar job in the other modeled proteins?

      This is an interesting question, but unfortunately most of these proteins have not been biochemically characterized. We highlight examples of RsbP and E. coli RssB for which similar activating mutations have been characterized.

      b. The legend (Figure 5G) suggests that all of the linker combinations will be coiled-coils, but that they will undergo different types of activating (and dimerizing?) transitions. Is that in fact what is being proposed here?

      Yes, this is our working hypothesis.

      c. If there is no dimerization (as noted, only weak dimerization has been reported for E. coli RssB), does that generalize the model to one in which linkers are present and their structures are what matter? At the least, would the folding up of the E. coli RssB linker with antiadaptor binding be considered another mode of signal transduction or rather some sort of storage form?

      Interestingly, the P. aeruginosa RssB constitutively dimerizes, suggesting that the E. coli protein is the outlier.

      d. Would the "toolkit" model, in which different changes occur in the linker regions, suggest that the interacting proteins are going to be critical for the type of linker changes that will be important? Or something about the nature of the linkers themselves?

      This is an interesting question that we cannot yet answer. We have chosen to focus on the possible flexibility of this mechanism and anticipate that a variety of mechanisms will be used.

      e. Given the extensive comparison to E. coli RssB, the authors might consider a figure to clarify the relative domain architecture, sequences that are akin to switch regions, and others important to the discussion here.

      We tried to highlight this in Figure 5C including coloring the regions similar to the switch regions.

      Reviewer #3 (Recommendations for the authors):

      Given the caveats noted above related to the reliability of computed structure models, I would recommend the authors make the following additions/modifications to their manuscript:

      (1) The authors should show AlphaFold models coloured by pLDDT scores in an additional supplementary figure to help the reader interpret the confidence level of the predicted structures.

      We have added these models to figure 1 – figure supplement 2.

      (2) Because of the points mentioned above the authors should tone down the generalisation relating to the activation mechanism of this family of phosphatases presented in the discussion.

      We have modified the paper throughout to emphasize where we are speculating.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1:

      Summary:

      Kimura et al performed a saturation mutagenesis study of CDKN2A to assess functionality of all possible missense variants and compare them to previously identified pathogenic variants. They also compared their assay result with those from in silico predictors.

      Strengths:

      CDKN2A is an important gene that modulates cell cycle and apoptosis; therefore it is critical to accurately assess functionality of missense variants. Overall, the paper reads well and touches upon major discoveries in a logical manner.

      Weaknesses:

      The paper lacks proper details for experiments and basic data, leaving the results less convincing. Analyses are superficial and do not provide variant-level resolution. Many of these issues were addressed during the revision process.

      Comments on revisions:

      The manuscript was improved during the revision process.

      We thank the reviewer for their comments. We are grateful for the opportunity to provide additional information and data to clarify our approach and study results.

      Reviewer #2:

      Summary:

      This study describes a deep mutational scan across CDKN2A using suppression of cell proliferation in pancreatic adenocarcinoma cells as a readout for CDKN2A function. The results are also compared to in silico variant predictors utilized by current diagnostic frameworks to gauge these predictors' performance. The authors also functionally classify CDKN2A somatic mutations in cancers across different tissues.

      Review:

      The goal of this paper was to perform functional classification of missense mutations in CDKN2A in order to generate a resource to aid in clinical interpretation of CDKN2A genetic variants identified in clinical sequencing. In our initial review, we concluded that this paper was difficult to review because there was a lack of primary data and experimental detail. The authors have significantly improved the clarity, methodological detail and data exposition in this revision, facilitating a fuller scientific review. Based on the data provided we do not think the functional characterization of CDKN2A variants is robust or complete enough to meet the stated goal of aiding clinical variant interpretation. We think the underlying assay could be used for this purpose but different experimental design choices and more replication would be required for these data to be useful. Alternatively, the authors could also focus on novel CDKN2A variants as there seems to be potential gain of function mutations that are simply lumped into "neutral" that may have important biological implications.

      Major concerns:

      Low experimental concordance. The p-value scatter plot (Figure 2 Figure Supplement 3A) across 560 variants shows low collinearity indicating poor replicability. These data should be shown in log2 fold changes, but even after model fitting with the gamma GLM they still show low concordance, which casts strong doubt on the function scores.

      Concordance among non-significant p-values is generally low because most of the signal comes from random variability across repeats. If the observed log2 fold change between the repeats is entirely due to noise, one would expect two repeated p-values to behave like independent random uniforms. True concordance is typically more evident in significant p-values because they reflect consistent effects above random noise. Functionally deleterious variants are called when their associated p-value is significant. To confirm this statement, a scatter plot with the log2 normalized fold change was added in Figure 2-figure supplement 3C. We see low concordance between repeats in the log2 normalized fold changes centered around 0, corresponding to log2 normalized fold changes mainly due to noise. The concordance increases as the variants become significant. One can notice that the correlation coefficient between duplicate assay results was almost identical between the model-based p-values and the log2 normalized fold change (Figure 2-figure supplement 3A and 3C, Appendix 1-table 4, and Appendix 1-table 6). Also, importantly, no variant was functionally deleterious in one replicate and functionally neutral in another, implying a perfect concordance in calls if we exclude variants that were called indeterminate in one of the two repeats. Finally, of variants with discordant classifications, only 6/560 (1.1%) were functionally deleterious (significant p-value) in one replicate and of indeterminate function in another. We have updated the text as follows:

      “Of variants with discordant classifications, 6 (1.1%) were functionally deleterious in one replicate and of indeterminate function in another, while 102 variants (18.2%) were functionally neutral in one replicate and of indeterminate function in another. Importantly, no variant was functionally deleterious in one replicate and functionally neutral in another (Appendix 1 -table 4). Furthermore, the correlation coefficient between duplicate assay results was similar using the gamma GLM and log2 normalized fold change (Figure 2-figure supplement 3A and 3C).”
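
      The statistical argument in this response (replicate p-values of variants with no true effect behave like independent uniforms, whereas variants with real effects agree across replicates) can be illustrated with a small simulation. The sketch below is a toy model under stated assumptions: each replicate yields a z-score with unit-variance noise and a one-sided p-value, which stands in for, and is not, the gamma GLM used in the study. The effect sizes, the 0.05 threshold, the seed, and all names are hypothetical.

```python
import random
from statistics import NormalDist


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def replicate_p_values(effects, rng):
    """Simulate two replicates per variant; return two lists of p-values.

    Each replicate observes z = true effect + standard normal noise and
    reports the one-sided p-value P(Z >= z) under the no-effect null.
    """
    std_normal = NormalDist()
    rep1, rep2 = [], []
    for effect in effects:
        for rep in (rep1, rep2):
            z = rng.gauss(effect, 1.0)
            rep.append(1.0 - std_normal.cdf(z))
    return rep1, rep2


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_variants = 2000

null_effects = [0.0] * n_variants  # pure noise
mixed_effects = [rng.uniform(0.0, 4.0) for _ in range(n_variants)]
strong_effects = [5.0] * n_variants  # clearly non-null variants

null_corr = pearson(*replicate_p_values(null_effects, rng))
mixed_corr = pearson(*replicate_p_values(mixed_effects, rng))

strong1, strong2 = replicate_p_values(strong_effects, rng)
both_significant = sum(
    a < 0.05 and b < 0.05 for a, b in zip(strong1, strong2)
) / n_variants
```

      Under these assumptions the replicate p-values of null variants are essentially uncorrelated, the set with a range of true effects shows a clearly positive correlation, and strongly affected variants are called significant in both replicates almost every time.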

      The more detailed methods provided indicate that the growth suppression experiment is done in 156 pools with each pool consisting of the 20 variants corresponding to one of the 156 aa positions in CKDN2A. There are several serious problems with this design.

      Batch effects in each of the pools prevent comparison across different residues. We think this is a serious design flaw and not standard for how these deep mutational scans are done. The standard would be to combine all 156 pools in a single experiment. Given the sequencing strategy of dividing up CDKN2A into 3 segments, the 156 pools could easily have been collapsed into 3 (1 to 53, 54 to 110, 111 to 156). This would significantly minimize variation in handling between variants at each residue and would be more manageable for performance of further replicates of the screen for reproducibility purposes. The huge variation in time to confluency (16-40 days) across pools suggests that this batch effect is a strong source of variation in the experiment.

      While there is variation in time to confluency between different amino acid residues, we do not anticipate this batch effect to significantly affect variant classifications in our study. For example, our results were generally consistent with previous classifications. All synonymous variants (one per residue) and benchmark benign variants assayed were classified as functionally neutral. Furthermore, of benchmark pathogenic variants assayed, none were classified as functionally neutral: 84% were classified as functionally deleterious and 16% were classified as of indeterminate function.

      Lack of experimental/biological replication: The functional assay was only performed once on all 156 CDKN2A residues and was repeated for only 28 out of 156 residues, with only ~80% concordance in functional classification between the first and second screens. This is not sufficiently robust for variant interpretation. Why was the experiment not performed more than once for most aa sites?

      In our study we determined functional classifications for all CDKN2A missense variants while assessing variability with replicates across 28 residues. Of these variants, only 6 (1.1%) were functionally deleterious in one replicate and of indeterminate function in another. Furthermore, no variant was functionally deleterious in one replicate and functionally neutral in another (Appendix 1-table 4). As noted above, we provided additional context in the manuscript.

      For the screen, the methods section states that PANC-1 cells were infected at MOI=1 while the standard is an MOI of 0.3-0.5 to minimize multiple variants integrating into a single cell. At an MOI of 1, under a Poisson model of viral integration, ~25% of cells would have more than 1 lentiviral integrant. So in ~25% of the cells, the effect of a variant would be confounded by one or more other variants, adding noise to the assay.
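
      The Poisson reasoning in this comment can be checked with a short calculation. The sketch below is a minimal illustration, assuming that the number of integrants per cell follows a Poisson distribution with mean equal to the MOI; the function names are hypothetical and are not part of the study's analysis code. Under that assumption, the fraction of all cells carrying two or more integrants at MOI=1 is 1 - 2/e, roughly 26%, which is in line with the reviewer's approximate figure.

```python
import math


def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell carries exactly k integrants (Poisson assumption)."""
    return math.exp(-moi) * moi**k / math.factorial(k)


def fraction_multiple_integrants(moi: float) -> float:
    """Fraction of all cells with two or more integrants."""
    return 1.0 - poisson_pmf(0, moi) - poisson_pmf(1, moi)


def fraction_multiple_among_infected(moi: float) -> float:
    """Fraction of infected cells (one or more integrants) that carry two or more."""
    infected = 1.0 - poisson_pmf(0, moi)
    return fraction_multiple_integrants(moi) / infected


# Compare the MOI range the reviewer calls standard with the MOI used in the screen.
for moi in (0.3, 0.5, 1.0):
    print(
        f"MOI={moi}: "
        f"{fraction_multiple_integrants(moi):.1%} of all cells, "
        f"{fraction_multiple_among_infected(moi):.1%} of infected cells "
        "carry more than one integrant"
    )
```

      If uninfected cells are removed before the assay (a step this sketch does not model), the fraction among infected cells is the more relevant quantity, and it is higher than the fraction among all cells.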

      As noted previously, we are not able to differentiate effects due to multiple viral integrations per cell. However, we do not anticipate multiple viral integrations to significantly affect variant classifications in our study as our results are consistent with previous classifications, as described above.

      While the authors provide more explanation of the gamma GLM, we strongly advise that the heatmap and replicate correlations be shown with the log2 fold changes rather than the fit output of the p-values.

      Thank you for the suggestion. As noted, we provide additional explanation in the manuscript about why we classified variants using a gamma GLM. Using a gamma GLM, classification thresholds were determined using the change in representation of 20 non-functional barcodes in a pool of PANC-1 cells stably expressing CDKN2A after a period of in vitro proliferation. Our variant classifications were therefore not based on assay outputs for previously reported – benchmark – pathogenic or benign variants to determine thresholds. We strongly prefer using p-values and classifications using the gamma GLM in the manuscript. However, comparison of assay outputs using a gamma GLM and log2 fold change are included in the manuscript. Read counts, log2 fold change, and classifications based on log2 fold change are presented in the manuscript, for all variants. Readers who wish to use these data may do so and we refer them to the manuscript text, Appendix 1-table 4, Appendix 1-table 6, and Figure 2-figure supplement 2.

      In this study, the authors only classify variants into the categories "neutral", "indeterminate", or "deleterious" but they do not address CDKN2A gain-of-function variants that may lead to decreased proliferation. For example, there is no discussion on variants at residue 104, whose proliferation values mostly consist of higher magnitude negative log2 fold change values. These variants are defined as neutral but from the one replicate of the experiment performed, they appear to be potential gain-of-function variants.

      We have added a comment to the discussion to highlight that we did not identify potential gain-of-function variants. Specifically:

      “We classified CDKN2A missense variants using a gamma GLM, as either functionally deleterious, indeterminate functional or functionally neutral. However, we did not classify variants that may have gain-of-function effects, resulting in decreased representation in the cell pool. Future studies are necessary to determine the prevalence and significance of CDKN2A gain-of-function variants.”

      Minor concerns:

      The differentiation between variants of "neutral" and "indeterminate" function seems unnecessary and it seems like there are too many variants that fall into the "indeterminate" category. The authors seem to have set numerical thresholds for CDKN2A function using benchmark variants of known function. While the benchmark variants are important as a frame of reference for the "dynamic range" of the assay, their function scores should not necessarily be used to define hard cutoffs of whether a variant's function score can be interpreted.

      We did not utilize benchmark variants to define thresholds for functional classifications using a gamma GLM. This is one of the strengths of using a gamma GLM for classification. As explained in our manuscript, classification thresholds were determined using the change in representation of 20 non-functional barcodes in a pool of PANC-1 cells stably expressing CDKN2A after a period of in vitro proliferation. Our variant classifications were therefore not based on assay outputs for previously reported – benchmark – pathogenic or benign variants. While not required when using a gamma GLM, we included indeterminate classifications, which are not uncommon.

      Figure 2 supplement 2 - on the x-axis, should "intermediate" be "indeterminate"?

      This, and a similar typographical error in Figure 2 -figure supplement 3, has been corrected.

    1. Reviewer #3 (Public review):

      This study investigates the characteristics of the autofluorescence signal excited by 740 nm 2-photon excitation, in the range of 420-500 nm, across the Drosophila brain. The fluorescence lifetime (FL) appears bi-exponential, with a short 0.4 ns time constant followed by a longer decay. The lifetime decay and the resulting parameter fits vary across the brain. The resulting maps reveal anatomical landmarks, which simultaneous imaging of genetically encoded fluorescent proteins helps to identify. Past work has shown that the autofluorescence decay time course reflects the balance of the free redox cofactor NAD(P)H vs. its protein-bound form. The ratio of free-to-bound NADPH is thought to indicate relative glycolysis vs. oxidative phosphorylation, and thus shifts in the free-to-bound ratio may indicate shifts in metabolic pathways. The basics of this measure have been demonstrated in other organisms, and this study is the first to use the FLIM module of the STELLARIS 8 FALCON microscope from Leica to measure autofluorescence lifetime in the brain of the fly. Methods include registering the brains of different flies to a common template and masking out anatomical regions of interest using fluorescent proteins.

      The analysis relies on fitting an FL decay model with two free parameters, f_free and t_bound. f_free is the fraction of the normalized curve contributed by a decaying exponential with a time constant of 0.4 ns, thought to represent the FL of free NADPH or NADH, which apparently cannot be distinguished. t_bound is the time constant of the second exponential, with scalar amplitude = (1-f_free). The t_bound fit is thought to represent the decay time constant of protein-bound NADPH but can differ depending on the protein. The study shows that across the brain, t_bound can range from 0 to >5 ns, whereas f_free can range from 0.5 to 0.9 (Figure 1a). These methods appear to be solid; the full range of fits is reported, including maximum likelihood quality parameters, and can serve as a benchmark for future studies.
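
      The two-parameter decay model described above can be written out explicitly. The following is a minimal sketch, assuming the normalized form stated in this review (amplitude f_free on a fixed 0.4 ns exponential, amplitude 1-f_free on an exponential with time constant t_bound) and ignoring the instrument response and noise that a real fit would have to handle. The function name and the example parameter values are hypothetical and chosen only for illustration.

```python
import math

TAU_FREE_NS = 0.4  # fixed short time constant attributed to free NAD(P)H


def normalized_decay(t_ns: float, f_free: float, t_bound_ns: float,
                     tau_free_ns: float = TAU_FREE_NS) -> float:
    """Bi-exponential decay, normalized so that the value at t = 0 is 1."""
    if not 0.0 <= f_free <= 1.0:
        raise ValueError("f_free must lie between 0 and 1")
    if t_bound_ns <= 0.0:
        raise ValueError("t_bound_ns must be positive")
    free_term = f_free * math.exp(-t_ns / tau_free_ns)
    bound_term = (1.0 - f_free) * math.exp(-t_ns / t_bound_ns)
    return free_term + bound_term


# Illustrative values only: f_free = 0.7 and t_bound = 3 ns are not fits from the study.
for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"t = {t:.1f} ns -> {normalized_decay(t, f_free=0.7, t_bound_ns=3.0):.4f}")
```

      Because the short component decays within about a nanosecond, the late part of the curve is dominated by the bound term, which is why f_free and t_bound can be estimated separately from a single decay.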

      The authors measure the properties of NADPH-related autofluorescence of Kenyon Cells (KCs) of the fly mushroom body. The results from the three main figures are:

      (1) Somata and calyx of mushroom bodies have a longer average tau_bound than other regions (Figure 1e);

      (2) The f_free fit is higher for the calyx (input synapses) region than for KC somata (Figure 2b);

      (3) The average across flies of average f_free fits in alpha/beta KC somata decreases from 0.734 to 0.718. Based on the first two findings, an accurate title would be "Autofluorescence lifetime imaging reveals regional differences in NADPH state in Drosophila mushroom bodies."

      The third finding is the basis for the title of the paper and the support for this claim is unconvincing. First, the difference in alpha/beta f_free (p-value of 4.98E-2) is small compared to the measured difference in f_free between somas and calyces. It's smaller even than the difference in average soma f_free across datasets (Figure 2b vs c). The metric is also quite derived; first, the model is fit to each (binned) voxel, then the distribution across voxels is averaged and then averaged across flies. If the voxel distributions of f_free are similar to those shown in Supplementary Figure 2, then the actual f_free fits could range between 0.6-0.8. A more convincing statistical test might be to compare the distributions across voxels between alpha/beta vs alpha'/beta' vs. gamma KCs, perhaps with bootstrapping and including appropriate controls for multiple comparisons.
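
      The kind of test suggested here could look like the sketch below: a percentile bootstrap confidence interval for the difference in mean f_free between two groups of voxels, with a Bonferroni-adjusted level for the three pairwise comparisons among KC types. This is only an illustration under stated assumptions. The f_free values are synthetic, the function name is hypothetical, and voxels are resampled as if independent, which real data would likely violate because voxels from the same fly are correlated (resampling whole flies would be one way to account for that).

```python
import random


def bootstrap_mean_diff_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(a) - mean(b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        resampled_a = rng.choices(a, k=len(a))  # sample with replacement
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(sum(resampled_a) / len(a) - sum(resampled_b) / len(b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic f_free values per voxel, for illustration only (not data from the study).
data_rng = random.Random(1)
groups = {
    "alpha/beta": [data_rng.gauss(0.72, 0.05) for _ in range(200)],
    "alpha'/beta'": [data_rng.gauss(0.73, 0.05) for _ in range(200)],
    "gamma": [data_rng.gauss(0.73, 0.05) for _ in range(200)],
}

names = list(groups)
pairs = [(names[i], names[j]) for i in range(3) for j in range(i + 1, 3)]
alpha_corrected = 0.05 / len(pairs)  # Bonferroni correction for three comparisons

for x, y in pairs:
    lower, upper = bootstrap_mean_diff_ci(groups[x], groups[y], alpha=alpha_corrected)
    print(f"{x} vs {y}: CI for difference in mean f_free = ({lower:.4f}, {upper:.4f})")
```

      An interval that excludes zero at the corrected level would indicate a difference between two KC types that is unlikely to arise from voxel-level sampling variability alone.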

      I recommend the authors address two concerns. First, what degree of fluctuation in autofluorescence decay can we expect over time, e.g. over circadian cycles? That would be helpful in evaluating the magnitude of changes following conditioning. And second, if the authors think that metabolism shifts to OXPHOS over glycolysis, are there further genetic manipulations they could make? They test LDH knockdown in gamma KCs, why not knock it down in alpha/beta neurons? The prediction might be that if it prevents the shift to OXPHOS, the shift in f_free distribution in alpha/beta KCs would be attenuated. The extensive library of genetic reagents is an advantage of working with flies, but it comes with a higher standard for corroborating claims.

      FLIM as a method is not yet widely prevalent in fly neuroscience, but recent demonstrations of its potential are likely to increase its use. Future efforts will benefit from the description of the properties of the autofluorescence signal to evaluate how autofluorescence may impact measures of FL of genetically engineered indicators.

    1. After some time, I also realized that if design was problem solving, then we all design to some degree. When you rearrange your room to better access your clothes, you’re doing interior design. When you create a sign to remind your roommates about their chores, you’re doing information design. When you make a poster or a sign for a club, you’re doing graphic design. We may not do any of these things particularly well or with great expertise, but each of these is a design enterprise that has the capacity for expertise and skill.

      In my opinion, design is framed as something everyone does, not just professionals, which makes it feel more universal and accessible. I think simple actions like rearranging a room or making a sign are forms of design, even if they lack the formal methods and expertise of professional work. However, while this perspective is valuable, it overlooks how structured processes and iteration differentiate professional design from everyday problem-solving.

    1. Abstract

      Microbiome-based disease prediction has significant potential as an early, non-invasive marker of multiple health conditions linked to dysbiosis of the human gut microbiota, thanks in part to decreasing sequencing and analysis costs. Microbiome health indices and other computational tools currently proposed in the field often are based on a microbiome’s species richness and are completely reliant on taxonomic classification. A resurgent interest in a metabolism-centric, ecological approach has led to an increased understanding of microbiome metabolic and phenotypic complexity revealing substantial restrictions of taxonomy-reliant approaches. In this study, we introduce a new metagenomic health index developed as an answer to recent developments in microbiome definitions, in an effort to distinguish between healthy and unhealthy microbiomes, here in focus, inflammatory bowel disease (IBD). The novelty of our approach is a shift from a traditional Linnean phylogenetic classification towards a more holistic consideration of the metabolic functional potential underlining ecological interactions between species. Based on well-explored data cohorts, we compare our method and its performance with the most comprehensive indices to date, the taxonomy-based Gut Microbiome Health Index (GMHI), and the high dimensional principal component analysis (hiPCA) methods, as well as to the standard taxon- and function-based Shannon entropy scoring. After demonstrating better performance on the initially targeted IBD cohorts, in comparison with other methods, we retrain our index on an additional 27 datasets obtained from different clinical conditions and validate our index’s ability to distinguish between healthy and disease states using a variety of complementary benchmarking approaches. Finally, we demonstrate its superiority over the GMHI and the hiPCA on a longitudinal COVID-19 cohort and highlight the distinct robustness of our method to sequencing depth. 
Overall, we emphasize the potential of this metagenomic approach and advocate a shift towards functional approaches in order to better understand and assess microbiome health as well as provide directions for future index enhancements. Our method, q2-predict-dysbiosis (Q2PD), is freely available (https://github.com/Kizielins/q2-predict-dysbiosis).

      This work has been peer reviewed in GigaScience (https://doi.org/10.1093/gigascience/giaf015), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 1: Vanessa Marcelino

      The manuscript proposes a new method to distinguish between healthy and diseased human gut microbiomes. The topic is timely, as to date, there is no consensus on what constitutes a healthy microbiome. The key conceptual advance of this study is the integration of functional microbiome features to define health. Their new computational approach, q2-predict-dysbiosis (Q2PD), is open source and available on GitHub.

      While the manuscript is conceptually innovative and interesting for the scientific community, there are several major limitations in the current version of this study.

      1. To develop the Q2PD, they define features associated with health by comparing it with microbiome samples from IBD patients. There are many more non-healthy/dysbiotic phenotypes beyond IBD, therefore it is not accurate to use IBD as synonymous with dysbiosis, as done throughout this version of the paper.

      2. The study initially tests the performance of Q2PD against other gut microbiome health indexes (GMHI and hiPCA) using the same data that was used to select the health-associated features of Q2PD. Model performance should be assessed on independent data. On a separate analysis, they do use different datasets (from GMHI and hiPCA), but these datasets seem to be incomplete - GMHI and hiPCA publications have included 10 or more disease categories, and it is unclear why only 4 categories are shown in this study.

      3. While Q2PD does provide visible improvements in differentiating some diseases from healthy phenotypes, the accuracy and sensitivity of Q2PD aren't clear. To adopt Q2PD, I would like to know what the chances are that the classification results will be correct.

      4. There is very little documentation on how to use Q2PD. What are the expected outputs? For example, do we need to choose a threshold to define health? Is the method completely dependent on Humann and Metaphlan outputs, or are other formats accepted? The test data contain some samples with zero counts. I got an error when trying it with the test data (ValueError: node array from the pickle has an incompatible dtype…).

      Therefore, I recommend including a range of disease categories to develop Q2PD and use independent datasets to validate the model in terms of accuracy and sensitivity. Alternatively, consider focusing this contribution on IBD. Making the code more user friendly will drastically increase the adoption of Q2PD by the community.

      Please also use page and line numbers when submitting the next version. Other suggestions:

      Abstract: I recommend replacing 'attributed' with 'linked', as 'attributed' suggests that dysbiosis may be causing (rather than reflecting) disease.

      Results: Please indicate what is meant by 'function' here - it will be good to clarify that this method uses Metaphlan's read-based approach to identify metabolic pathways. What is used, pathway completeness or abundance?

      Results regarding Figure 3a are difficult to interpret. Is 'non-negatively correlated' the same as 'positively correlated'? What does the colour gradient represent - their abundance in those groups, or the strength of their correlation?

      "We observed that the prevalence of the pairs positively correlated in health was higher than in a number of disease-associated groups (Figure 3b)" . This is a very generalised statement considering that only half of the comparisons were significant. How co-occurring species were selected?

      "To test this, we compared the contributions of MDFS-identified species to "core functions" in different groups (Supplementary Figure 4)." How was this comparison made, based on species correlations? The caption of these figures could include more detail - it just says 'Top species contributions to functions.' but how do you define 'top' ? What do the colours represent?

      'This finding was congruent with our earlier suspicions of functional plasticity; modulation of function and thus altered connectivity in the interaction network, shifting towards less abundant, non-core functions upon perturbation of homeostasis.' This is reasonable, but I don't understand how you can draw this conclusion from these figures where there seems to be no significant difference between health and disease.

      Section 'Testing q2-predict-dysbiosis, GMHI and hiPCA accuracy of prediction for healthy and IBD individuals'

      What is the difference between the fraction of "core functions" found and the fraction of "core functions" among all functions?

      "Most importantly, Q2PD produced visually the highest scores for all healthy in comparison to unhealthy cohorts" . This was not statistically significant. In fact, GMHI finds more significant differences between health and disease than Q2PD.

      Sup. Figure 7 - it would be informative to add the name/description of these metabolites (not just their ID).

      'Although the threshold of 0.6 as determinant of health by the Q2PD was not applicable to the new datasets'. Does the threshold to define health with Q2PD change depending on the dataset? What are the implications of this for the applicability of this index?

      Effects of sequencing depth - this is a very good addition to the paper, the effects of sequencing depth can be profound but are ignored in most studies, so I commend the authors for doing this here. It would be even better, in my opinion, if this was done with the same datasets used to test/compare Q2PD with other methods, as using a different dataset here adds a new layer of confounding factors.

      'the GMHI and the hiPCA produced the opposite trend, wrongly indicating patient recovery.' The difference here is striking, what is driving this trend?

      The Gut Microbiome Wellness Index 2 (GMWI2) is now published. I don't think it needs to be part of the benchmarking, but it could be acknowledged/cited here.

      Methods: More information on how the data was processed is needed - how were the abundance tables normalized? Which output from Humann was used for downstream analyses?

      To ensure reproducibility, please provide the scripts/code used for analyses and figures.

    1. Abstract

      Background: Spiders generally exhibit robust starvation resistance, with hunting spiders, represented by Heteropoda venatoria, being particularly outstanding in this regard. Given the challenges posed by climate change and habitat fragmentation, understanding how spiders adjust their physiology and behavior to adapt to the uncertainty of food resources is crucial for predicting ecosystem responses and adaptability.

      Results: We sequenced the genome of H. venatoria and, through comparative genomic analysis, discovered significant expansions in gene families related to lipid metabolism, such as cytochrome P450 and steroid hormone biosynthesis genes. We also systematically analyzed the gene expression characteristics of H. venatoria at different starvation resistance stages and found that the fat body plays a crucial role during starvation in spiders. This study indicates that during the early stages of starvation, H. venatoria relies on glucose metabolism to meet its energy demands. In the middle stage, gene expression stabilizes, whereas in the late stage of starvation, pathways for fatty acid metabolism and protein degradation are significantly activated, and autophagy is increased, serving as a survival strategy under extreme starvation. Additionally, analysis of expanded P450 gene families revealed that H. venatoria has many duplicated CYP3 clan genes that are highly expressed in the fat body, which may help maintain a low-energy metabolic state, allowing H. venatoria to endure longer periods of starvation. We also observed that the motifs of P450 families in H. venatoria are less conserved than those in insects, which may be related to the greater polymorphism of spider genomes.

      Conclusions: This research not only provides important genetic and transcriptomic evidence for understanding the starvation mechanisms of spiders but also offers new insights into the adaptive evolution of arthropods.

      This work has been peer reviewed in GigaScience (https://doi.org/10.1093/gigascience/giaf019), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2: Sandra Correa-Garhwal

      The manuscript "Genomic and transcriptomic analyses of Heteropoda venatoria reveal the expansion of P450 family for starvation resistance in spider" uses comparative genomics to study the underlying mechanisms of starvation resistance. I appreciate that the authors have produced a high-quality genome for an RTA species. The methods are sound and some interesting gene families are highlighted as key factors in starvation resistance.

      One primary concern I have relates to the study's setup and hypothesis. As currently written, the study comes across as a fishing expedition rather than a focused research project. Although the introduction is informative, it lacks a clear rationale for including this particular species. The reasoning only becomes apparent at the end of the gene family expansion and contraction section. Additionally, I am unsure if being an active hunter makes feeding more unpredictable compared to web-based prey capture. I recommend incorporating this information into the introductory paragraph to better establish the context for the analysis. While terms like "autophagy" and "energy homeostasis" are appropriate for a scientific audience, consider briefly defining them for clarity, especially if the intended audience might not be familiar with all the terminology. Although the authors mention that there is no high-quality genome sequence for H. venatoria, it could be helpful to elaborate on why this is significant for understanding starvation resistance. A brief explanation of how genomic data could enhance understanding of the molecular mechanisms involved would strengthen this point. The conclusion provides a clear goal for your study, but it could be more impactful. You might want to emphasize the broader implications of your research findings for ecological conservation and biodiversity. End with a statement about the importance of understanding these mechanisms in the context of preserving ecosystems and addressing challenges posed by climate change.

      For the discussion, while the content is detailed, some parts feel slightly repetitive or could be more concise. For instance, the description of P450 gene expression could be streamlined by removing redundant mentions of their role in metabolic rate regulation. Example: In the discussion section "Interestingly, we found that some P450 families are expanded in H. venatoria, and most P450 genes are more highly expressed in the fat body than in other tissues…" This point is later reiterated in the sentence about other spider species. These ideas could be combined for efficiency. The paragraph about the phylogenetic analysis of the CYP3 clan could be shortened. While it is an interesting finding, some of the details (like the number of genes or proteins) might be better suited for the main text rather than a summary. Focusing more on the functional implications of these duplications would keep the reader engaged. Though the findings are well-explained, the broader significance could be emphasized more explicitly. For example, why is understanding these mechanisms important for the field of arachnid biology, evolutionary biology, or even practical applications (e.g., pest control, conservation)? You could add a closing sentence that ties everything together and highlights the broader relevance of the findings, such as the evolutionary or ecological importance of these adaptations in spiders.

      Other comments: Last paragraph of the introduction: When introducing Heteropoda venatoria, please spell out the species name the first time it is used. The sentence "However, these findings indicate that H. venatoria does not feed in a stable manner and often experiences periods of starvation." does not fit the rest of the text. Findings from what study? Transcription design for starvation resistance in H. venatoria section: First sentence: What samples? It is confusing to start like this. Please add information about the samples. You could delete "the samples of H. venatoria were subjected to"; it will read better. Are all 23 CYP3 clan genes on chromosome 4 tandemly arrayed? Figure 4 - add more information about the figure. For panel C, what do the red lines show? Grey? Numbers in the circles? While I know what they represent, other readers might not. The finding that H. venatoria chromosomes have undergone lots of chromosomal fragmentation is very interesting, and it is clearly shown on the figure, which is why I think that more detail is needed. In this sentence "In Uloborus diversus, members of this subfamily are located on Chr5 and an unanchored scaffold." you need to specify which members. Figure 5 - Include a description of the tissues. What is Epi? Ducts? Tail?

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review): Summary:

      The authors demonstrate that two human preproprotein mutations in the BMP4 gene cause a defect in proprotein cleavage and BMP4 mature ligand formation, leading to hypomorphic phenotypes in mouse knock-in alleles and in Xenopus embryo assays.

      Strengths:

      They provide compelling biochemical and in vivo analyses supporting their conclusions, showing the reduced processing of the proprotein and concomitant reduced mature BMP4 ligand protein, impressively, from mouse embryonic lysates. They perform excellent analysis of the embryo and post-natal phenotypes demonstrating the hypomorphic nature of these alleles. Interesting phenotypic differences between the S91C and E93G mutants are shown with excellent hypotheses for the differences. Their results support that BMP4 heterodimers act predominantly throughout embryogenesis whereas BMP4 homodimers play essential roles at later developmental stages.

      Weaknesses:

      (1) A control of BMP7 alone in the Xenopus assays seems important to exclude BMP7 homodimer activity in these assays.

      We and others have shown that BMP7 homodimers have weak or no activity while BMP4/7 heterodimers signal at a much higher level than either BMP4 or BMP7 homodimers in Xenopus ectodermal and mesodermal cells. We have expanded the description of these published findings in the results section (lines 182-187). We have also added representative examples of experiments in which BMP4 and BMP7 alone controls are included (new Fig. S2). Since the level of activity of BMP7 + BMP4 variants is equivalent to that of BMP7 + WT BMP4, this cannot be accounted for by BMP7 homodimers.

      (2) The Discussion could be strengthened by more in-depth explanations of how BMP4 homodimer versus heterodimer signaling is supported by the results, so that readers do not have to think it all through themselves. Similarly, a discussion of why the S91C mutant has a stronger phenotype than E93G early in the Discussion would be helpful, or at least a mention that it will be addressed later.

      We have revised the discussion as suggested by the reviewer. Please see responses to recommendations 2-4 below.

      Reviewer #1 (Recommendations for the authors):

      (1) A control of BMP7 injection alone seems missing when comparing the BMP4/7 variants with BMP4 in the embryo assays presented in Fig 1. Is it not possible that the activity observed is BMP7 homodimers, perhaps due to inhibited heterodimer formation by the BMP4 variant?

      Multiple published studies have shown that BMP7 homodimers have weak or no activity in Xenopus ectodermal and mesodermal cells, and that ½ dose of RNA encoding BMP4 and BMP7 together signals at a higher level than does a full dose of RNA encoding either BMP4 or BMP7 alone. We have expanded our description of these published findings (lines 182-187), have included additional details about RNA doses that were injected (line 156, 175, 182) and have added representative examples of experiments in which BMP4 and BMP7 controls were included in a new Figure (Fig. S2).

      (2) In reading the Discussion, I was continually thinking of the stronger phenotype of the S91C mutant compared to the E93G one, although both are discussed together throughout most of the Discussion. Only at the end of the Discussion is the stronger phenotype of S91C discussed with a compelling explanation for the stronger phenotype, not related to the phosphorylation site function. I wonder if it would be better placed earlier in the Discussion, or if the difference in phenotypes could at least be mentioned there as a topic to be discussed later.

      We have moved the possible explanation of differences between Bmp4<sup>S91C</sup> and Bmp4<sup>E93G</sup> mutants to immediately follow the introductory paragraph of the results section.

      (3) Along these same lines, why is it that the E93G exhibits rather normal cleavage at E10.5? Might the mechanisms of cleavage vary in different contexts with phosphorylation-dependent cleavage not functioning at early stages of development? I believe the hypothesis is that it is cleaved due to heterodimerization with BMP7. More discussion of this excellent hypothesis should be provided with clear statements, rather than inferences, if I'm understanding this correctly. For example, I had to read 3 times the first sentence of the last paragraph on p.14 before I understood it. Better to break that sentence down and the one that follows it, so it is easier to understand.

      We have rewritten and expanded the paragraphs describing phenotypic and biochemical evidence for defective homodimer but not heterodimer signaling as suggested (lines 343-375). We have also more explicitly stated the possibility that normal cleavage of BMP4<sup>E93G</sup> in embryonic lysates may be due to a predominance of BMP4/7 heterodimers in early embryonic stages or spatiotemporal differences in phosphorylation-dependent cleavage of BMP4 homodimers (lines 369-372).

      (4) Similarly the last paragraph of the Discussion mentions that the authors provide evidence of BMP4 homodimer signaling. I agree with the authors, but I had to think through the evidence myself. Better if the authors clearly explain the evidence that points to this, as this is a very good point of

      See response to point 3, above. Thank you for these useful suggestions.

      (5) Last sentence, first paragraph on p.11 should be qualified for the E93G mutant to E13.5, since it was normal at E10.5 regarding Figure 4 results.

      Thank you for pointing this out. It has been corrected.

      (6) Skip the PC acronym, since it is only repeated once in the text and hard to remember almost 10 pages later when it is used again.

      We have corrected this.

      (7) In the Discussion, a typo in "a single intramolecular disulfide bond that stabilizes the dimer", should be 'intermolecular'.

      Thank you for catching our switch in the use of inter- and intramolecular. We have corrected this (lines 334-335).

      (8) At times the E93G mutant is referred to having early lethality, often in conjunction with S91C, while other times it is referred to as late lethality. Considering that the homozygotes die postnatally after weaning, most would consider it late lethality. In contrast S91C is indeed an early lethal.

      We have changed the wording in the introduction to state that “mice carrying Bmp4<sup>S91C</sup> or Bmp4<sup>E93G</sup> knock in mutations show embryonic or enhanced postnatal lethality, respectively,… (lines 141-143)” and have removed the word “early” from the title.

      Reviewer #2 (Public review): Summary:

      Kim et al. report that two disease mutations in proBMP4, Ser91Cys and Glu93Gly, which disrupt the Ser91 FAM20C phosphorylation site, block the activation of proBMP4 homodimers. Consequently, analysis of DMZ explants from Xenopus embryos expressing the proBMP4 S91C or E93G mutants showed reduced pSmad1 and tbxt1 expression. The block in BMP4 activity caused by the mutations could be overcome by co-expression of BMP7, suggesting that the missense mutations selectively affect the activity of BMP4 homodimers but not BMP4/7 heterodimers. The expert amphibian tissue transplant studies were extended to in vivo studies in Bmp4S91C/+ and Bmp4E93G/+ mice, demonstrating the impact of these mutations on embryonic development, particularly in female mice, in line with patient studies. Finally, studies in MEFs revealed that the mutations did not affect proBMP4 glycosylation or ER-to-Golgi transport but appeared to inhibit the furin-dependent cleavage of proBMP4 to BMP4. Based on these findings and AI (AlphaFold) modeling of proBMP4, the authors speculate that pSer91 influences access of furin to its cleavage site at Arg289AlaLysArg292.

      Strengths:

      The Xenopus and mouse studies are valuable and elegantly describe the impact of the S91C and E93G disease mutations on BMP signaling and embryonic development.

      Weaknesses:

      The interpretation of how the mutations may disturb the furin-mediated cleavage of proBMP4 is underdeveloped and does not consider all of their data. Understanding how pS91 influences the furin-dependent cleavage at Arg292 seems to be the crux of this work and thus warrants more consideration. Specifically:

      (1) Figure S1 may be significantly more informative than implied. The authors report that BMP4S91D activates pSmad1 only incrementally better than S91C and much less than WT BMP4. However, Fig. S1B does not support the conclusion on page 7 (numbering beginning with title page); "these findings suggest that phosphorylation of S91 is required to generate fully active BMP4 homodimers". The authors rightly note that the S91C change likely has manifold effects beyond inhibiting furin cleavage. The E93G change may also affect proBMP4 beyond disturbing FAM20C phosphorylation. Additional mutation analyses would strengthen the work.

      The major goal of generating and comparing the activity of the S91D mutant with S91C was to control for phosphorylation-independent defects caused by the deleterious introduction of a cysteine residue, which might cause aberrant disulfide bonding. We opted to introduce S91D since “phosphomimics” can sometimes approximate the phosphorylated state. S91D has significantly higher activity than S91C (p<0.01) and has a less significant loss of activity (p<0.05) than does S91C (p<0.0001) relative to wild type BMP4 (Fig. S1), consistent with deleterious effects of the cysteine residue and supporting a possible explanation for the more severe phenotype of S91C vs E93G mice. We have rewritten this section to clarify our interpretation (lines 165-174) and have changed our statement that our activity data “suggest the importance of phosphorylation” to a statement that they are consistent with this possibility (lines 179-180). We do not believe that further mutational analysis using activity assays in Xenopus would shed light on how or whether phosphorylation affects proteolytic activation of BMP4.

      (2) These findings in Figure S1 are potentially significant because they may inform how proBMP4 is protected from cleavage during transit through the TGN and entry into peripheral cellular compartments. Intriguing modeling studies in Figure 6 suggest that pSer91 is proximal to the furin cleavage site. Based on their presentation, pSer91 may contact Arg289, the critical P4 residue at the furin site. If so, might that suggest how pS91 may prevent furin cleavage, thus explaining why the S91D mutation inhibits processing as presented, and possibly how proBMP4 processing is delayed until transit to distal compartments (perhaps activated by a change in the endosomal microenvironment or a Ser91 phosphatase)? Have the authors considered or ruled out these possibilities? In addition to additional mutation analyses of the FAM20C site, moving the discussion of this model to an "Ideas and Speculation" subsection may be warranted.

      The model shown in Fig. 6B proposes the possibility that phosphorylation unmasks (rather than preventing) the furin cleavage motif due to the proximity of Ser91 to the cleavage site (lines 399-402). If S91D truly mimicked phosphorylation, we would predict it would facilitate processing rather than inhibiting it. We do not have data comparing cleavage of S91D relative to wild type BMP4 and have not generated knock in S91D mice to test this idea. While the reviewer's questions are intriguing, they cannot be answered by mutational analysis of the FAM20C site and are beyond the scope of the current studies that sought to understand the impact of human pS91C and pE93G mutations and cell biological implications. We have moved the models to an “Ideas and Speculation” subsection as suggested (lines 377-414) since these models are meant to provoke further thought rather than provide definitive answers based on our data.

      (3) The lack of an in vitro protease assay to test the effect of the S91 mutations on furin cleavage is problematic.

      Although we routinely perform in vitro cleavage assays with recombinant furin, we don’t believe they would be informative on how S91 phosphorylation or mutation of this residue impacts cleavage, since the in vitro synthesized substrate used in these assays is neither dimerized nor post-translationally modified, and cleavage would be tested in isolation from the endogenous trafficking environment that we propose influences cleavage.

      Reviewer #2 (Recommendations for the authors):

      (1) The impact of BMPS91A should be determined and paired with the S91D phosphomimic data to reveal if it causes proBMP4 to be cleaved prematurely and disturbs pSmad1 expression. Data for S93G should also be included.

      Our major goal in comparing the activity of S91D with S91C was to control for phosphorylation-independent defects caused by the deleterious introduction of a cysteine residue in S91C, which might cause aberrant disulfide bonding. We opted to introduce S91D since “phosphomimics” can sometimes approximate the phosphorylated state. We note that S91D has significantly higher activity than S91C, consistent with deleterious effects of the cysteine residue and supporting a possible explanation for the more severe phenotype of S91C vs E93G mice. We have revised the wording of this section to clarify this. Our models predict that S91D would be cleaved more efficiently than S91C or S91A, if it really mimics the endogenous phosphorylated state, rather than being cleaved prematurely. Our biochemical analysis compares cleavage of endogenous BMP4 in wild type and mutant MEFs. Generation of S91D, S91A or S93G mutant mice to compare cleavage is beyond the scope of the current work.

      (2) Is the distance between pS91 and Arg289 close enough to form a hydrogen bond? If so, might this interaction influence furin access?

      AI modeling does not provide high probability prediction of structures surrounding the furin motif (see Fig. S7) and thus we cannot comment on whether or not these residues are close enough to form a hydrogen bond. We have revised the wording of the discussion to state “This simple model building indicates the possibility of direct contact between pSer91 and Arg289, and that phosphorylation is required for furin to access the cleavage site, although we note that predictions surrounding the furin motif represent low probability conformations (Fig. S7) (lines 399-402).”

      (3) The genotypes in Figure 2 are labeled awkwardly. Consider labeling the headers for the three subsections of panels (A-F, G-L, and M-O) differently.

      We have revised Fig. 2 to clarify that the three subsections of panels are distinct, and to emphasize that the middle subsection represents views of the right and left side of the same embryo.

      (4) The tables should be reformatted. As is, the labeling is frequently cut off, and the numbers of expected and observed progeny should both be stated to aid the reader.

      We thank the reviewer for noting the formatting errors in the tables, which we have corrected. We have also changed the tables so that normal or abnormal mendelian distributions are reported as numbers of observed/expected progeny rather than numbers/percent observed progeny.

      Reviewer #3 (Public review):

      Summary:

      The authors describe important new biochemical elements in the synthesis of a class of critical developmental signaling molecules, BMP4. They also present a highly detailed description of developmental anomalies in mice bearing known human mutations at these specific elements.

      Strengths:

      Exceptionally detailed descriptions of pathologies occurring in mutant mice. Novel findings regarding the interaction of propeptide phosphorylation and convertase cleavage, both of which will move the field forward. Provocative hypothesis regarding furin access to cleavage sites, supported by Alphafold predictions.

      Weaknesses:

      Figure 6A presents two testable models for pre-release access of furin to cleavage sites since physical separation of enzyme from substrate only occurs in one model; could immunocytochemistry resolve?

      Available reagents are not sensitive enough to detect endogenous furin and BMP4 with high resolution. Because PC/substrate interactions are transient, whereas the bulk of furin and BMP4 is distributed throughout the secretory pathway, it is not possible to co-immunolocalize furin and BMP4 in vivo at present. Studies using more advanced cell biological techniques along with tagged proteins may enable us to test these hypotheses in the future.

      Reviewer #3 (Recommendations for the authors):

      This interesting paper presents new data on an important family of developmental signaling molecules, BMPs. Mutations at FAM20C consensus sites within BMP prodomains are known to cause birth defects. The authors have here explored differential effects of human mutations on hetero- and homodimer activity and maturation, issues that may well arise during human development. In addition to demonstrating the profound effect of these mutations on development in Xenopus and mice, the authors also show differential processing of BMP4 precursors bearing these mutations in MEF cells prepared from mutant embryos. Finally, they show that FAM20C plays a role in BMP4 prodomain processing with quite differing outcomes in homo- vs heterodimers, which they suggest is due to structural differences impacting furin access. While this latter idea remains speculative due to the lack of crystal structures (models are based on Alphafold) it is a highly promising line of work.

      The data are beautifully presented and will be of clear interest to all developmental biologists. Certain cell biology results may also extrapolate to other phosphorylated precursor molecules undergoing the interesting (and as yet unexplained) phenomenon of convertase cleavage immediately before secretion, for example, FGF23. I have only a few minor comments regarding the presentation, which is remarkably clear.

      (1) The introduction of BMP7 in the Abstract is abrupt. It should be described as a preferred dimerization partner for BMP4.

      Thank you for noting this. We have revised the first sentence of the abstract to better introduce BMP7 (lines 49-50).

      (2) In Figure 1A, what is the small light green box?

      This is a small fragment released from the prodomain by the second cleavage. We have clarified this in the introduction (lines 112-114) and in the legend to Figure 1 (lines 758-759).

      (3) In the Discussion it might be relevant to mention that FAM20C propeptide is not cleaved by convertases but by S1P (Chen 2021).

      We have added this information to clarify (lines 394-396).

      (4) Figure 3, define VSD; Figure 5, Endo H removes sugars only from immature (nonsialylated) sugars, not from all chains as implied. More importantly, EndoH and PNGase remove N-linked sugars, yet Results refer only to O-linked glycosylation.

      Thank you for noting these oversights. We have defined VSD in Figure 3. We have also revised the headers for Fig. 5 and for the relevant subsection of the results to include N-linked glycosylation and note in the results that EndoH removes only immature N-linked carbohydrates (lines 301-304).

      (5) Figure 5- for clarity, I suggest it be broken up into two larger panels labeled "Embryos" and "MEFs"

      Thank you for this suggestion, we have subdivided the Figure into two panels.

      (6) Figure 6A presents two testable models for pre-release access of furin to cleavage sites since the physical separation of the enzyme from substrate only occurs in one model; could confocal immunocytochemistry resolve?

      Available reagents are not sensitive enough to detect endogenous furin and BMP4 with high resolution, and PC/substrate interactions are transient, whereas the bulk of both furin and BMP4 is in transit through the secretory pathway. For these reasons, it is not possible to co-immunolocalize furin and BMP4 in vivo. Future studies using advanced cell biological techniques may enable us to test these hypotheses.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Liu et al., present glmSMA, a network-regularized linear model that integrates single-cell RNA-seq data with spatial transcriptomics, enabling high-resolution mapping of cellular locations across diverse datasets. Its dual regularization framework (L1 for sparsity and generalized L2 via a graph Laplacian for spatial smoothness) demonstrates robust performance of their model and offers novel tools for spatial biology, despite some gaps in fully addressing spatial communication.

      Overall, the manuscript is commendable for its comprehensive benchmarking across different spatial omics platforms and its novel application of regularized linear models for cell mapping. I think this manuscript can be improved by addressing method assumptions, expanding the discussion on feature dependence and cell type-specific biases, and clarifying the mechanism of spatial communication.

      The conclusions of this paper are mostly well supported by data, but some aspects of model development and performance evaluation need to be clarified and extended.

      We thank the reviewer for their thoughtful comments. We will clarify the model assumptions and the feature selection process to make it more understandable. To clarify, the performance of glmSMA does not depend on cell type. For some rare cell types, the small number of cells can lead to a drop in performance. To better illustrate our results and reduce cell type-specific biases, we will shuffle and randomly sample the cell types.

      (1) What were the assumptions made behind the model? One of them could be the linear relationship between cellular gene expression and spatial location. In complex biological tissues, non-linear relationships could be present, and this would also vary across organ systems and species. Similarly, with regularization parameters, they can be tuned to balance sparsity and smoothness adequately but may not hold uniformly across different tissue types or data quality levels. The model also seems to assume independent errors with normal distribution and linear additive effects - a simplification that may overlook overdispersion or heteroscedasticity commonly observed in RNA-seq data.

      Thank you for this comment. We acknowledge that non-linear relationships can be present in complex tissues and may not be fully captured by a linear model.

      Our choice of a linear model was guided by an investigation of the relationship in the current datasets, which include intestinal villus, mouse brain, and fly embryo.

      There is a linear correlation between expression distance and physical distance [Nitzan et al]. Within a given anatomical structure, cells in closer proximity exhibit more similar expression patterns. In tissues where non-linear relationships are more prevalent—such as the human PDAC sample—our mapping results remain robust. We acknowledge that we have not yet tested our algorithm in highly heterogeneous regions like the liver, and we plan to include such analyses in future work if necessary. Regarding the regularization parameters, we agree that the balance between sparsity and smoothness is sensitive to tissue-specific variation and data quality. In our current implementation, we explored a range of values to find robust defaults.
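
      The trade-off between sparsity and smoothness described above can be illustrated with a minimal sketch of a penalized least-squares objective. The exact glmSMA loss is not given in this response, so the form below is an assumption: a squared-error term, an L1 penalty weighted by `lam1`, and a graph-Laplacian quadratic penalty weighted by `lam2`. The names `objective`, `lam1`, `lam2`, and the layout of `X` (genes by spatial locations) are hypothetical.

```python
def objective(y, X, beta, L, lam1, lam2):
    """Assumed loss: 0.5*||y - X beta||^2 + lam1*||beta||_1 + lam2 * beta^T L beta.

    y    : expression of one cell over the selected genes (length g)
    X    : reference expression, g rows (genes) by s columns (spatial locations)
    beta : weight of the cell on each spatial location (length s)
    L    : s-by-s graph Laplacian built from distances between locations
    """
    g, s = len(y), len(beta)
    # Residual between observed expression and the weighted reference profile.
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(s)) for i in range(g)]
    rss = sum(r * r for r in resid)
    # L1 term: encourages a cell to load on few locations.
    l1 = sum(abs(b) for b in beta)
    # Laplacian term: penalizes weights that differ between neighboring locations.
    smooth = sum(beta[i] * L[i][j] * beta[j] for i in range(s) for j in range(s))
    return 0.5 * rss + lam1 * l1 + lam2 * smooth
```

      Evaluating this function over a grid of `lam1` and `lam2` values is one simple way to see how a given dataset responds to each penalty before settling on defaults.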

      (2) The performance of glmSMA is likely sensitive to the number and quality of features used. With too few features, the model may struggle to anchor cells correctly due to insufficient discriminatory power, whereas too many features could lead to overfitting unless appropriately regularized. The manuscript briefly acknowledges this issue, but further systematic evaluation of how varying feature numbers affect mapping accuracy would strengthen the claims, particularly in settings where marker gene availability is limited. A simple way to show some of this would be testing on multiple spatial omics (imaging-based) platforms with varying panel sizes and organ systems. Related to this, based on the figures, it also seems like the performance varies by cell type. What are the factors that contribute to this? Variability in expression levels, RNA quantity/quality? Biases in the panel? Personally, I am also curious how this model can be used similarly/differently if we have a FISH-based, high-plex reference atlas. Additional explanation around these points would be helpful for the readers.

      Thank you for this thoughtful comment. The performance of our method is indeed sensitive to the number and quality of selected features. To optimize feature selection, we employed multiple strategies, including Moran’s I statistic, identification of highly variable genes, and the Seurat pipeline to detect anchor genes linking the spatial transcriptomics data with the reference atlas. The number of selected markers depends on the quality of the data. For high-quality datasets, fewer than 100 markers are typically sufficient for accurate prediction. To address this more clearly, we will revise the manuscript to include detailed descriptions of our feature selection process and demonstrate how varying the number of selected features impacts performance.
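
      As an illustration of one of the feature selection criteria mentioned above, the following is a minimal sketch of Moran's I for a single gene, using the standard definition of the statistic. It is not the authors' implementation; the function name `morans_i` and the use of a plain nested-list weight matrix are assumptions made for the example.

```python
def morans_i(values, weights):
    """Moran's I for one gene.

    values  : expression of the gene at each of n spatial locations
    weights : n-by-n spatial weight matrix (e.g. 1 for neighbors, 0 otherwise)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    den = sum(d * d for d in dev)
    if w_total == 0 or den == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    # Cross-products of deviations, weighted by spatial proximity.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_total) * (num / den)
```

      Genes with values near 1 are spatially clustered and are natural candidates for anchoring cells; values near 0 suggest no spatial structure, and negative values indicate a checkerboard-like pattern.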

      We evaluated our method across diverse tissue types and platforms—including Slide-seq, 10x Visium, and Virtual-FISH—which represent both sequencing-based and imaging-based spatial transcriptomics technologies. Our model consistently achieved strong performance across these settings. It's worth noting that the performance of other methods, such as CellTrek [Wei et al] and novoSpaRc [Nitzan et al], also depends heavily on feature selection. In particular, performance degrades substantially when fewer features are used.

      We do not believe that the observed performance is directly influenced by cell type composition. Major cell types are typically well-defined, and rare cell types comprise only a small fraction of the dataset. For these rare populations, a single misclassification can disproportionately impact metrics like KL divergence due to small sample size. However, this does not necessarily indicate a systematic cell type–specific bias in the mapping. To mitigate this issue, we will implement shuffling and sampling procedures to reduce potential bias introduced by rare cell types.

      (3) Application 3 (spatial communication) in the graphical abstract appears relatively underdeveloped. While it is clear that the model infers spatial proximities, further explanation of how these mappings translate into insights into cell-cell communication networks would enhance the biological relevance of the findings.

      Thank you for this valuable feedback. We agree that further elaboration on the connection between spatial proximity and cell–cell communication would enhance the biological interpretation of our results. Our current model focuses on inferring spatial relationships; we may add cell–cell communication analyses in future work.

      (4) What is the final resolution of the model outputs? I am assuming this is dictated by the granularity of the reference atlas and the imposed sparsity via the L1 norm, but if there are clear examples that would be good. In figures (or maybe in practice too), cells seem to be assigned to small, contiguous patches rather than pinpoint single-cell locations, which is a pragmatic compromise given the inherent limitations of current spatial transcriptomics technologies. Clarification on the precise spatial scale (e.g., pixel or micrometer resolution) and any post-mapping refinement steps would be beneficial for the users to make informed decisions on the right bioinformatic tools to use.

      Thank you for the comment. For each cell, our algorithm generates a probability vector that indicates its likely spatial assignment along with coordinate information. We will include the resolution and the number of cells assigned to each spot in future versions. In our framework, each cell is mapped to one or more spatial locations with associated probabilities. Depending on the amount of regularization through L1 and L2 norms, a cell may be localized to a small patch or distributed over a broader domain. For the 10x Visium data, we applied a repelling algorithm to enhance visualization [Wei et al]. If a cell’s original location is already occupied, it is reassigned to a nearby neighborhood to avoid overlap. Users can also inspect the entire regularization path by varying the penalty terms.
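
      To make the idea of a per-cell probability vector concrete, here is a minimal sketch of how fitted location weights could be turned into probabilities and ranked. How glmSMA actually derives its probabilities is not specified in this response, so clipping negative weights and rescaling is an assumption, and the helper names `to_probabilities` and `top_locations` are hypothetical.

```python
def to_probabilities(weights):
    """Clip negative weights to zero and rescale so the entries sum to 1."""
    clipped = [max(w, 0.0) for w in weights]
    total = sum(clipped)
    if total == 0:
        raise ValueError("no positive weight; the cell cannot be assigned")
    return [w / total for w in clipped]


def top_locations(probs, coords, k=3):
    """Return the k most probable (coordinate, probability) pairs, highest first."""
    ranked = sorted(zip(coords, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]
```

      With strong L1 regularization most weights are zero, so the ranked list is short and the cell maps to a small patch; with weaker regularization the probability mass spreads over more locations.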

      Nitzan M, Karaiskos N, Friedman N, Rajewsky N. Gene expression cartography. Nature. 2019;576(7785):132-137. doi:10.1038/s41586-019-1773-3

      Wei R, et al. Spatial charting of single-cell transcriptomes in tissues. Nature Biotechnology. 2022;40(8):1190-1199. doi:10.1038/s41587-022-01233-1

      Reviewer #2 (Public review):

      Summary:

      The author proposes a novel method for mapping single-cell data to specific locations with higher resolution than several existing tools.

      Thank you for recognizing our contribution. Our goal was to develop a method that achieves higher spatial resolution in mapping single-cell data compared to existing tools. We are encouraged by the results and will continue to refine the approach to improve accuracy and generalizability across platforms and tissue types.

      Strengths:

      The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus.

      Thank you for this comment. We believe that evaluating our method across diverse tissue types—such as the mouse cortex, human PDAC, and intestinal villus—demonstrates its robustness and broad applicability. We plan to continue expanding these evaluations to additional tissue contexts and species to further validate the method’s generalizability.

      Weakness:

      (1) Although the researchers claim that glmSMA seamlessly accommodates both sequencing-based and image-based spatial transcriptomics (ST) data, their testing primarily focused on sequencing-based ST data, such as Visium and Slide-seq. To demonstrate its versatility for spatial analysis, the authors should extend their evaluation to imaging-based spatial data.

      Thank you for the comment. We have tested our algorithm on the virtual FISH dataset from the fly embryo, which serves as an example of image-based spatial omics data. However, such datasets often contain a limited number of available genes. To address this, we will conduct additional testing on image-based data if needed. The Allen Brain Atlas provides high-quality ISH data, and we can select specific brain regions from this resource to further evaluate our algorithm if necessary [Lein et al]. Currently, we plan to focus more on the 10x Visium platform, as it supports whole-transcriptome profiling and offers a wide range of tissue samples for analysis.

      (2) The definition of "ground truth" for spatial distribution is unclear. A more detailed explanation is needed on how the "ground truth" was established for each spatial dataset and how it was utilized for comparison with the predicted distribution generated by various spatial mapping tools.

      Thank you for the comment. To clarify how ground truth is defined across different tissues, we provide the following details. Direct ground truth for cell locations is often unavailable in scRNA-seq data due to experimental constraints. To address this, we adopted alternative strategies for estimating ground truth in each dataset:

      - 10x Visium Data: We used the cell type distribution derived from spatial transcriptomics (ST) data as a proxy for ground truth. We then computed the KL divergence between this distribution and our model's predictions for performance assessment.

      - Slide-seq Data: We validated predictions by comparing the expression of marker genes between the reconstructed and original spatial data.

      - Fly Embryo Data: We used predicted cell locations from novoSpaRc as a reference for evaluating our algorithm.

      These strategies allowed us to evaluate model performance even in the absence of direct cell location data. In addition, we can apply multiple evaluation strategies within a single dataset.
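
      The KL divergence comparison used for the 10x Visium data can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the direction of the divergence (reference relative to prediction), the small smoothing constant `eps`, and the function name `kl_divergence` are all assumptions.

```python
import math


def kl_divergence(reference_counts, predicted_counts, eps=1e-12):
    """KL(reference || predicted) between two cell-type count dictionaries.

    Counts are normalized to proportions; eps avoids log(0) when a cell type
    is missing from one of the two distributions.
    """
    cell_types = sorted(set(reference_counts) | set(predicted_counts))
    p = [reference_counts.get(t, 0) + eps for t in cell_types]
    q = [predicted_counts.get(t, 0) + eps for t in cell_types]
    p_sum, q_sum = sum(p), sum(q)
    p = [x / p_sum for x in p]
    q = [x / q_sum for x in q]
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

      A value of zero means the predicted cell-type proportions match the reference exactly. Because the proportions of rare cell types rest on few cells, moving a single cell can shift the score noticeably, which is consistent with the small-sample effect described in the response to Reviewer #1.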

      (3) In the analysis of spatial mapping results using intestinal villus tissue, only Figure 3d supports their findings. The researchers should consider adding supplemental figures illustrating the spatial distribution of single cells in comparison to the ground truth distribution to enhance the clarity and robustness of their investigation.

      Thank you for the comment. We will include additional details for this dataset in the supplementary figures. As the intestinal villus is a relatively simple tissue, most existing algorithms performed well on it. For this reason, we did not initially provide extensive details in the main text.

      (4) The spatial mapping tests were conducted on various tissues, including the mouse cortex, human PDAC, and intestinal villus. However, the original anatomical regions are not displayed, making it difficult to directly compare them with the predicted mapping results. Providing ground truth distributions for each tested tissue would enhance clarity and facilitate interpretation. For instance, in Figure 2a and Supplementary Figures 1 and 2, only the predicted mapping results are shown without the corresponding original spatial distribution of regions in the mouse cortex. Additionally, in Figure 3c, four anatomical regions are displayed, but it is unclear whether the figure represents the original spatial regions or those predicted by glmSMA. The authors are encouraged to clarify this by incorporating ground truth distributions for each tissue.

      Thank you for the comment. To improve visualization, we will include anatomical structures alongside the mapping results in the next version, wherever such structures are available (e.g., mouse brain cortex, human PDAC sample, etc.). Regions will be color-coded to enhance clarity and make the spatial organization easier to interpret.

      (5) The cell assignment results from the mouse hippocampus (Supplementary Figure 6) lack a corresponding ground truth distribution for comparison. DG and CA cells were evaluated solely based on the gene expression of specific marker genes. Additional analyses are needed to further validate the robustness of glmSMA's mapping performance on Slide-seq data from the mouse hippocampus.

      Thank you for the comment. The ground truth for DG and CA cells was not available. To better evaluate the model's performance, we will compute the KL divergence between the original and predicted cell type distributions, following the same approach used for the 10x Visium dataset.

      (6) The tested spatial datasets primarily consist of highly structured tissues with well-defined anatomical regions, such as the brain and intestinal villus. Tissues in which anatomical regions are not distinctly separated, such as liver, were not tested. Further evaluation of such tissues would help determine the method's broader applicability.

      Thank you for the comment. We have already tested our algorithm on the fly embryo, where anatomical structures are not well defined or clearly separated. If needed, we can further apply glmSMA to more complex tissues such as the liver. To clarify the role of anatomical structures in our model: glmSMA does not require anatomical information as input. Instead, it leverages a distance matrix between cells to apply L2 norm regularization. Despite the absence of anatomical information, the model still demonstrates strong performance. We will include results to illustrate its effectiveness without anatomical input. Additionally, we plan to evaluate the model on tissues where anatomical regions are not clearly delineated.
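
      The step from a distance matrix to a smoothness penalty can be sketched as follows. The response does not say how glmSMA converts distances into a graph, so the k-nearest-neighbor construction, the unweighted edges, and the function name `knn_graph_laplacian` are assumptions chosen for illustration.

```python
def knn_graph_laplacian(dist, k=2):
    """Build an unnormalized graph Laplacian L = D - W from pairwise distances.

    dist : n-by-n symmetric distance matrix
    k    : number of nearest neighbors linked to each node
    """
    n = len(dist)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # The k closest nodes other than i itself.
        neighbors = sorted((j for j in range(n) if j != i),
                           key=lambda j: dist[i][j])[:k]
        for j in neighbors:
            W[i][j] = W[j][i] = 1.0  # symmetrize the neighbor graph
    L = [[-W[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] = sum(W[i])  # node degree on the diagonal
    return L
```

      Because only pairwise distances enter the construction, no anatomical annotation is needed, which matches the point made above that the model does not take anatomical information as input.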

      Lein, E., Hawrylycz, M., Ao, N. et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature 445, 168–176 (2007). https://doi.org/10.1038/nature05453

      Reviewer #3 (Public review):

      Summary:

      The authors aim to develop glmSMA, a network-regularized linear model that accurately infers spatial gene expression patterns by integrating single-cell RNA sequencing data with spatial transcriptomics reference atlases. Their goal is to reconstruct the spatial organization of individual cells within tissues, overcoming the limitations of existing methods that either lack spatial resolution or sensitivity.

      Strengths:

      (1) Comprehensive Benchmarking:

      Compared against CellTrek and Novosparc, glmSMA consistently achieved lower Kullback-Leibler divergence (KL divergence) scores, indicating better cell assignment accuracy.

      Outperformed CellTrek in mouse cortex mapping (90% accuracy vs. CellTrek's 60%) and provided more spatially coherent distributions.

      (2) Experimental Validation with Multiple Real-World Datasets:

      The study used multiple biological systems (mouse brain, Drosophila embryo, human PDAC, intestinal villus) to demonstrate generalizability.

      Validation through correlation analyses, Pearson's coefficient, and KL divergence support the accuracy of glmSMA's predictions.

      We thank reviewer #3 for their positive feedback and thoughtful recommendations.

      Weaknesses:

      (1) The accuracy of glmSMA depends on the selection of marker genes, which might be limited by current FISH-based reference atlases.

      We agree that the accuracy of glmSMA is influenced by the selection of marker genes, and that current FISH-based reference atlases may offer a limited gene set. To address this, we incorporate multiple feature selection strategies, including highly variable genes and spatially informative genes (e.g., via Moran’s I), to optimize performance within the available gene space. As more comprehensive reference atlases become available, we expect the model’s accuracy to improve further.
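For readers unfamiliar with Moran's I, the statistic used above to rank spatially informative genes can be sketched as follows. This is a generic textbook formulation under assumed inputs (a per-spot expression vector and a symmetric spatial weight matrix); it is not taken from the glmSMA code, and the names `morans_i`, `expression`, and `adjacency` are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for one gene.

    values: expression of the gene at each spatial location.
    weights: square matrix, weights[i][j] > 0 if locations i and j are neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between neighboring locations.
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Four locations on a line; each location neighbors the adjacent ones.
expression = [1.0, 2.0, 3.0, 4.0]
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(morans_i(expression, adjacency))
```

Values near 1 indicate that neighboring locations have similar expression (a spatially structured gene), while values near -1 indicate an alternating pattern.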

      (2) glmSMA operates under the assumption that cells with similar gene expression profiles are likely to be physically close to each other in space, which may not be true in various heterogeneous environments.

      While this assumption effectively captures spatial continuity in many cases, we acknowledge that it may not hold across all biological contexts. To address this, we plan to refine our regularization strategy and evaluate the model's performance in heterogeneous tissue regions.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers


      I would like to thank the reviewers for their comments and interest in the manuscript and the study.

      Reviewer #1

      1. I would assume that there are RNA-seq and/or ChIP-seq data out there produced after knockdown of one or more of these DBPs that show directional positioning.

      The directional positioning of CTCF-binding sites at chromatin interaction sites was analyzed in a CRISPR experiment (Guo Y et al. Cell 2015). Our machine learning and statistical analyses showed the same directional bias of CTCF-binding and RAD21-binding motif sequences at chromatin interaction sites as the experimental analysis of Guo Y et al. (lines 229-253, Figure 3b, c, d and Table 1). Since CTCF is involved in different biological functions (Braccioli L et al. Essays Biochem. 2019 ResearchGate webpage), the directional bias of binding sites may be reduced when all binding sites, including those at chromatin interaction sites, are considered (lines 68-73). In our study, we investigated the DNA-binding sites of proteins using the ChIP-seq data of DNA-binding proteins and DNase-seq data. Our computational analysis also confirmed that the DNA-binding sites of SMC3 and RAD21, which tend to be found in chromatin loops with CTCF, showed the same directional bias as CTCF.

      2. Figure 6 should be expanded to incorporate analysis of DBPs not overlapping CTCF/cohesin in chromatin interaction data that is important and potentially more interesting than the simple DBPs enrichment reported in the present form of the figure.

      Following the reviewer's advice, I performed the same analysis with the DNA-binding sites that do not overlap with the DNA-binding sites of CTCF and cohesin (RAD21 and SMC3) (Fig. 6 and Supplementary Fig. 4). The result showed the same tendency in the distribution of DNA-binding sites. For some DNA-binding proteins, the peak height on the graph decreased after removing the DNA-binding sites that overlapped with those of CTCF and cohesin. I have added the following sentence on lines 435 and 829: For the insulator-associated DBPs other than CTCF, RAD21, and SMC3, the DNA-binding sites that do not overlap with those of CTCF, RAD21, and SMC3 were used to examine their distribution around interaction sites.

      3. Critically, I would like to see use of Micro-C/Hi-C data and ChIP-seq from these factors, where insulation scores around their directionally-bound sites show some sort of an effect like that presumed by the authors - and many such datasets are publicly-available and can be put to good use here.

      As suggested by the reviewer, I have added the insulator scores and boundary sites from the 4D Nucleome Data Portal as tracks in the UCSC Genome Browser. The insulator scores seem to correspond to some extent to the H3K27me3 histone marks from ChIP-seq (Fig. 4a and Supplementary Fig. 3). We found that the DNA-binding sites of the insulator-associated DBPs were statistically overrepresented in the 5-kb boundary sites compared with those of other DBPs (Fig. 4d). The direction of DNA-binding sites on the genome can be shown with different colors (e.g., red and green), but the directionality of insulator-associated DNA-binding sites is an overall tendency, and it may be difficult to notice at individual binding sites because it may be weaker than that of CTCF, RAD21, and SMC3, as shown in Table 1 and Supplementary Table 2. As expected, we also observed the directional biases of CTCF, RAD21, and SMC3 using Micro-C chromatin interaction data, but the differences between the four directions (FR, RF, FF, and RR) were more apparent with CTCF-mediated ChIA-PET chromatin interaction data (lines 287 and 288).

      I found that the CTCF-binding sites examined experimentally in the previous study (Guo Y et al. *Cell* 2015) may not always overlap with the boundary sites of chromatin interactions from the Micro-C assay. The chromatin interaction data do not include all interactions because of the high sequencing cost of the assay, and they include fewer long-range interactions because of distance bias. The number of boundary sites may be smaller than that of CTCF-binding sites acting as insulators, and/or some of the CTCF-binding sites may not be located in the boundary sites. It may be difficult for the boundary location algorithm to identify a short boundary location. Because of these limitations of the chromatin interaction data, I planned to search for insulator-associated DNA-binding proteins without using chromatin interaction data in this study.
      
       I discussed other causes in lines 614-622: Another reason for the difference may be that boundary sites are more closely associated with topologically associated domains (TADs) of chromosome than are insulator sites. Boundary sites are regions identified based on the separation of numerous chromatin interactions. On the other hand, we found that the multiple DNA-binding sites of insulator-associated DNA-binding proteins were located close to each other at insulator sites and were associated with distinct nested and focal chromatin interactions, as reported by Micro-C assay. These interactions may be transient and relatively weak, such as tissue/cell type, conditional or lineage-specific interactions.
      
      Furthermore, I have added the statistical summary of the analysis in lines 372-395 as follows: Overall, among 20,837 DNA-binding sites of the 97 insulator-associated proteins found at insulator sites identified by H3K27me3 histone modification marks (type 1 insulator sites), 1,315 (6%) overlapped with 264 of 17,126 5-kb long boundary sites, and 6,137 (29%) overlapped with 784 of 17,126 25-kb long boundary sites in HFF cells. Among 5,205 DNA-binding sites of the 97 insulator-associated DNA-binding proteins found at insulator sites identified by H3K27me3 histone modification marks and transcribed regions (type 2 insulator sites), 383 (7%) overlapped with 74 of 17,126 5-kb long boundary sites, and 1,901 (37%) overlapped with 306 of 17,126 25-kb long boundary sites. Although CTCF-binding sites separate active and repressive domains, only a limited number of the DNA-binding sites of insulator-associated proteins found at type 1 and 2 insulator sites overlapped boundary sites identified by chromatin interaction data. Furthermore, by analyzing the regulatory regions of genes, the DNA-binding sites of the 97 insulator-associated DNA-binding proteins were found (1) at the type 1 insulator sites (based on H3K27me3 marks) in the regulatory regions of 3,170 genes, (2) at the type 2 insulator sites (based on H3K27me3 marks and gene expression levels) in the regulatory regions of 1,044 genes, and (3) at insulator sites as boundary sites identified by chromatin interaction data in the regulatory regions of 6,275 genes. The boundary sites showed the highest number of overlaps with the DNA-binding sites. Comparing the insulator sites identified by (1) and (3), 1,212 (38%) genes have both types of insulator sites. Comparing the insulator sites between (2) and (3), 389 (37%) genes have both types of insulator sites. From the comparison of insulator and boundary sites, we found that (1) or (2) types of insulator sites overlapped or were close to boundary sites identified by chromatin interaction data.
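The percentages in this summary can be re-derived from the reported counts with a short check. The sketch below is only an arithmetic illustration: the dictionary labels are made-up names, and the denominators for the two gene-level percentages (3,170 and 1,044 genes) are an assumption inferred from the surrounding text rather than stated explicitly.

```python
# (overlapping count, total count) pairs taken from the summary above.
overlaps = {
    "type1_sites_in_5kb_boundaries": (1315, 20837),
    "type1_sites_in_25kb_boundaries": (6137, 20837),
    "type2_sites_in_5kb_boundaries": (383, 5205),
    "type2_sites_in_25kb_boundaries": (1901, 5205),
    # Denominators below are assumed to be the type 1 / type 2 gene counts.
    "genes_with_type1_and_boundary": (1212, 3170),
    "genes_with_type2_and_boundary": (389, 1044),
}


def percent(part, whole):
    """Percentage rounded to the nearest integer, as reported in the text."""
    return round(100 * part / whole)


for label, (part, whole) in overlaps.items():
    print(f"{label}: {part}/{whole} = {percent(part, whole)}%")
```

Under these assumptions the rounded values agree with the 6%, 29%, 7%, 37%, 38%, and 37% quoted in the summary.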
      

      4. The suggested alternative transcripts function, also highlighted in the manuscripts abstract, is only supported by visual inspection of a few cases for several putative DBPs. I believe this is insufficient to support what looks like one of the major claims of the paper when reading the abstract, and a more quantitative and genome-wide analysis must be adopted, although the authors mention it as just an 'observation'.

      According to the reviewer's comment, I performed the genome-wide analysis of alternative transcripts where the DNA-binding sites of insulator-associated proteins are located near splicing sites. The DNA-binding sites of insulator-associated DNA-binding proteins were found within 200 bp centered on splice sites more significantly than those of the other DNA-binding proteins (Fig. 4e and Table 2). I have added the following sentences on lines 405 - 412: We performed the statistical test to estimate the enrichment of insulator-associated DNA-binding sites compared to the other DNA-binding proteins, and found that the insulator-associated DNA-binding sites were significantly more abundant at splice sites than the DNA-binding sites of the other proteins (Fig 4e and Table 2; Mann‒Whitney U test, p value ...).

      5. Figure 1 serves no purpose in my opinion and can be removed, while figures can generally be improved (e.g., the browser screenshots in Figs 4 and 5) for interpretability from readers outside the immediate research field.

      I believe that Figure 1 helps researchers in other fields who are not familiar with the biological phenomena and functions to understand the study. More explanation has been included in Figs. 4 and 5 and their legends to help readers outside the immediate research field understand the figures.

      6. Similarly, the text is rather convoluted at places and should be re-approached with more clarity for less specialized readers in mind.

      Reviewer #2's comments are related to this comment. I have introduced a more detailed explanation of the method in the Results section, as shown in the responses to Reviewer #2's comments.

      Reviewer #2

      1. Introduction, line 95: CTCF appears two times, it seems redundant.

      On lines 91-93, I deleted the latter CTCF from the sentence "We examine the directional bias of DNA-binding sites of CTCF and insulator-associated DBPs, including those of known DBPs such as RAD21 and SMC3".

      2. Introduction, lines 99-103: Please stress better the novelty of the work. What is the main focus? The new identified DBPs or their binding sites? What are the "novel structural and functional roles of DBPs" mentioned?

      Although CTCF is known to be the main insulator protein in vertebrates, we found that 97 DNA-binding proteins including CTCF and cohesin are associated with insulator sites by modifying and developing a machine learning method to search for insulator-associated DNA-binding proteins. Most of the insulator-associated DNA-binding proteins showed the directional bias of DNA-binding motifs, suggesting that the directional bias is associated with the insulator.

      I have added the sentence in lines 96-99 as follows: Furthermore, statistical testing of the contribution scores between the directional and non-directional DNA-binding sites of insulator-associated DBPs revealed that the directional sites contributed more significantly to the prediction of gene expression levels than the non-directional sites. I have revised the statement in lines 101-110 as follows: To validate these findings, we demonstrate that the DNA-binding sites of the identified insulator-associated DBPs are located within potential insulator sites, and some of the DNA-binding sites in the insulator site are found without the nearby DNA-binding sites of CTCF and cohesin. Homologous and heterologous insulator-insulator pairing interactions are orientation-dependent, as suggested by the insulator-pairing model based on experimental analysis in flies. Our method and analyses contribute to the identification of insulator- and chromatin-associated DNA-binding sites that influence EPIs and reveal novel functional roles and molecular mechanisms of DBPs associated with transcriptional condensation, phase separation and transcriptional regulation.
      

      3. Results, line 111: How do the SNPs come into the procedure? From the figures it seems the input is ChIP-seq peaks of DNBPs around the TSS.

      On lines 121-124, to explain the procedure for the SNP of an eQTL, I have added the sentence in the Methods: "If a DNA-binding site was located within a 100-bp region around a single-nucleotide polymorphism (SNP) of an eQTL, we assumed that the DNA-binding proteins regulated the expression of the transcript corresponding to the eQTL".
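The SNP-based association rule quoted above could be sketched as follows. This is an illustration only, not the code used in the study: it assumes that "a 100-bp region around a SNP" means 50 bp on either side of the SNP, that binding sites are half-open `(chrom, start, end)` intervals, and that each eQTL SNP carries the identifier of its associated transcript. The function name `sites_near_snps` and the example identifiers are hypothetical.

```python
def sites_near_snps(sites, snps, half_width=50):
    """Link DNA-binding sites to transcripts via nearby eQTL SNPs.

    sites: list of (chrom, start, end) half-open intervals of binding sites.
    snps: list of (chrom, position, transcript_id) for eQTL SNPs.
    half_width: assumed half-size of the window around each SNP (50 -> ~100 bp).
    Returns a list of (site, transcript_id) pairs.
    """
    links = []
    for site in sites:
        chrom, start, end = site
        for snp_chrom, pos, transcript_id in snps:
            if chrom != snp_chrom:
                continue
            # The site overlaps the window [pos - half_width, pos + half_width].
            if start <= pos + half_width and end > pos - half_width:
                links.append((site, transcript_id))
    return links


binding_sites = [("chr1", 1000, 1020), ("chr1", 5000, 5020)]
eqtl_snps = [("chr1", 1060, "transcript_A")]  # hypothetical transcript label
print(sites_near_snps(binding_sites, eqtl_snps))
```

The nested loop is quadratic and is kept for readability; a genome-wide analysis would normally sort the intervals or use an interval index instead.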

      4. Again, are those SNPs coming from the different cell lines? Or are they from individuals w.r.t some reference genome? I suggest a general restructuring of this part to let the reader understand more easily. One option could be simplifying the details here or alternatively including all the necessary details.

      On line 119, I have included the explanation of the eQTL dataset of GTEx v8 as follows: "The eQTL data were derived from the GTEx v8 dataset, after quality control, consisting of 838 donors and 17,382 samples from 52 tissues and two cell lines". On lines 681 and 865, I have added the filename of the eQTL data "(GTEx_Analysis_v8_eQTL.tar)".

      5. Figure 1: panel a and b are misleading. Is the matrix in panel a equivalent to the matrix in panel b? If not please clarify why. Maybe in b it is included the info about the SNPs? And if yes, again, what is then difference with a.

      The reviewer presumably refers to Figure 2, not Figure 1. If so, the matrices in panels a and b in Figure 2 are equivalent. I have indicated this in the figure: the same matrix as in panel a is shown rotated 90 degrees to the right. The green boxes in the matrix show the regions with the ChIP-seq peak of a DNA-binding protein overlapping with a SNP of an eQTL. I used eQTL data to associate a gene with a ChIP-seq peak that was more than 2 kb upstream or 1 kb downstream of the transcriptional start site of the gene. For each gene, the matrix was produced and the gene expression levels in cells were learned and predicted using the deep learning method. I have added the following sentences to explain the method in lines 133 - 139: Through the training, the tool learned to select the binding sites of DNA-binding proteins from ChIP-seq assays that were suitable for predicting gene expression levels in the cell types. The binding sites of a DNA-binding protein tend to be observed in common across multiple cell and tissue types. Therefore, ChIP-seq data and eQTL data in different cell and tissue types were used as input data for learning, and then the tool selected the data suitable for predicting gene expression levels in the cell types, even if the data were not obtained from the same cell types.

      6. Line 386-388: could the author investigate this observation in more detail? Does it mean that loops are driven by other DBPs independent of the known CTCF/Cohesin? Could the author provide examples of chromatin structural data e.g. MicroC?

      As suggested by the reviewer, to help readers understand the observation, I have added Supplementary Fig. S4c to show the distribution of DNA-binding sites of "CTCF, RAD21, and SMC3" and "BACH2, FOS, ATF3, NFE2, and MAFK" around chromatin interaction sites. I have modified the following sentence to indicate the figure on line 501: Although a DNA-binding-site distribution pattern around chromatin interaction sites similar to those of CTCF, RAD21, and SMC3 was observed for DBPs such as BACH2, FOS, ATF3, NFE2, and MAFK, less than 1% of the DNA-binding sites of the latter set of DBPs colocalized with CTCF, RAD21, or SMC3 in a single bin (Fig. S4c).

      Aljahani A et al. *Nature Communications* 2022 found that depletion of cohesin causes a subtle reduction in longer-range enhancer-promoter interactions and that CTCF depletion can cause rewiring of regulatory contacts. Together, their data show that loop extrusion is not essential for enhancer-promoter interactions, but contributes to their robustness and specificity and to precise regulation of gene expression. Goel VY et al. *Nature Genetics* 2023 mentioned in the abstract: Microcompartments frequently connect enhancers and promoters and though loss of loop extrusion and inhibition of transcription disrupts some microcompartments, most are largely unaffected. These results suggest that chromatin loops can be driven by other DBPs independent of the known CTCF/Cohesin.
      
      I added the following sentences on lines 569-577: The depletion of cohesin causes a subtle reduction in longer-range enhancer-promoter interactions, and CTCF depletion can cause rewiring of regulatory contacts. Another group reported that enhancer-promoter interactions and transcription are largely maintained upon depletion of CTCF, cohesin, WAPL or YY1. Instead, cohesin depletion decreased transcription factor binding to chromatin. Thus, cohesin may allow transcription factors to find and bind their targets more efficiently. Furthermore, loop extrusion is not essential for enhancer-promoter interactions, but contributes to their robustness and specificity and to precise regulation of gene expression.
      
      FOXA1 pioneer factor functions as an initial chromatin-binding and chromatin-remodeling factor and has been reported to form biomolecular condensates (Ji D et al. *Molecular Cell* 2024). CTCF has also been found to form transcriptional condensates and undergo phase separation (Lee R et al. *Nucleic Acids Research* 2022). FOS was found to be an insulator-associated DNA-binding protein in this study and is potentially involved in chromatin remodeling, transcriptional condensation, and phase separation with the other factors such as BACH2, ATF3, NFE2 and MAFK. I have added the following sentence on line 556: FOXA1 pioneer factor functions as an initial chromatin-binding and chromatin-remodeling factor and has been reported to form biomolecular condensates.
      

      7. In general, how the presented results are related to some models of chromatin architecture, e.g. loop extrusion, in which it is integrated convergent CTCF binding sites?

      Goel VY et al. Nature Genetics 2023 identified highly nested and focal interactions through region capture Micro-C, which resemble fine-scale compartmental interactions and are termed microcompartments. In the section titled "Most microcompartments are robust to loss of loop extrusion," the researchers noted that a small proportion of interactions between CTCF and cohesin-bound sites exhibited significant reductions in strength when cohesin was depleted. In contrast, the majority of microcompartmental interactions remained largely unchanged under cohesin depletion. Their findings indicate that most P-P and E-P interactions, aside from a few CTCF and cohesin-bound enhancers and promoters, are likely facilitated by a compartmentalization mechanism that differs from loop extrusion. They suggest that nested, multiway, and focal microcompartments correspond to small, discrete A-compartments that arise through a compartmentalization process, potentially influenced by factors upstream of RNA Pol II initiation, such as transcription factors, co-factors, or active chromatin states. It follows that if active chromatin regions at microcompartment anchors exhibit selective "stickiness" with one another, they will tend to co-segregate, leading to the development of nested, focal interactions. This microphase separation, driven by preferential interactions among active loci within a block copolymer, may account for the striking interaction patterns they observe.

      The authors of the paper proposed several mechanisms potentially involved in microcompartments. These mechanisms may be involved in looping with insulator function. Another group reported that enhancer-promoter interactions and transcription are largely maintained upon depletion of CTCF, cohesin, WAPL or YY1. Instead, cohesin depletion decreased transcription factor binding to chromatin. Thus, cohesin may allow transcription factors to find and bind their targets more efficiently (Hsieh TS et al. *Nature Genetics* 2022). Among the identified insulator-associated DNA-binding proteins, Maz and MyoD1 form loops without CTCF (Xiao T et al. *Proc Natl Acad Sci USA* 2021; Ortabozkoyun H et al. *Nature Genetics* 2022; Wang R et al. *Nature Communications* 2022). I have added the following sentences on lines 571-575: Another group reported that enhancer-promoter interactions and transcription are largely maintained upon depletion of CTCF, cohesin, WAPL or YY1. Instead, cohesin depletion decreased transcription factor binding to chromatin. Thus, cohesin may allow transcription factors to find and bind their targets more efficiently. I have included the following explanation on lines 582-584: Maz and MyoD1 among the identified insulator-associated DNA-binding proteins form loops without CTCF.
      
      As for the directionality of CTCF, if chromatin loop anchors have some structural conformation, as shown in the paper entitled "The structural basis for cohesin-CTCF-anchored loops" (Li Y et al. *Nature* 2020), directional DNA binding would occur similarly to CTCF binding sites. Moreover, cohesin complexes that interact with convergent CTCF sites, that is, the N-terminus of CTCF, might be protected from WAPL, but those that interact with divergent CTCF sites, that is, the C-terminus of CTCF, might not be protected from WAPL, which could release cohesin from chromatin and thus disrupt cohesin-mediated chromatin loops (Davidson IF et al. *Nature Reviews Molecular Cell Biology* 2021). Regarding loop extrusion, the 'loop extrusion' hypothesis is motivated by in vitro observations. An experiment in yeast using cohesin variants that are unable to extrude DNA loops but retain the ability to topologically entrap DNA suggested that in vivo chromatin loops are formed independently of loop extrusion. Instead, transcription promotes loop formation and acts as an extrinsic motor that extends these loops and defines their final positions (Guerin TM et al. *EMBO Journal* 2024). I have added the following sentences on lines 543-547: Cohesin complexes that interact with convergent CTCF sites, that is, the N-terminus of CTCF, might be protected from WAPL, but those that interact with divergent CTCF sites, that is, the C-terminus of CTCF, might not be protected from WAPL, which could release cohesin from chromatin and thus disrupt cohesin-mediated chromatin loops. I have included the following sentences on lines 577-582: The 'loop extrusion' hypothesis is motivated by in vitro observations. An experiment in yeast using cohesin variants that are unable to extrude DNA loops but retain the ability to topologically entrap DNA suggested that in vivo chromatin loops are formed independently of loop extrusion. Instead, transcription promotes loop formation and acts as an extrinsic motor that extends these loops and defines their final positions.
      
      Another model for the regulation of gene expression by insulators is the boundary-pairing (insulator-pairing) model (Bing X et al. *Elife* 2024) (Ke W et al. *Elife* 2024) (Fujioka M et al. *PLoS Genetics* 2016). Molecules bound to insulators physically pair with their partners, either head-to-head or head-to-tail, with different degrees of specificity at the termini of TADs in flies. Although the experiments do not reveal how partners find each other, the mechanism is unlikely to require loop extrusion. Homologous and heterologous insulator-insulator pairing interactions are central to the architectural functions of insulators. The manner of insulator-insulator interactions is orientation-dependent. I have summarized the model on lines 559-567: Other types of chromatin regulation are also expected to be related to the structural interactions of molecules. In the boundary-pairing (insulator-pairing) model, molecules bound to insulators physically pair with their partners, either head-to-head or head-to-tail, with different degrees of specificity at the termini of TADs in flies (Fig. 7). Although the experiments do not reveal how partners find each other, the mechanism is unlikely to require loop extrusion. Homologous and heterologous insulator-insulator pairing interactions are central to the architectural functions of insulators. The manner of insulator-insulator interactions is orientation-dependent.
      

      8. Do the authors think that the identified DBPs could work in that way as well?

      The boundary-pairing (insulator-pairing) model may apply to the insulator-associated DNA-binding proteins other than CTCF and cohesin, which are involved in the loop extrusion mechanism (Bing X et al. Elife 2024) (Ke W et al. Elife 2024) (Fujioka M et al. PLoS Genetics 2016).

       Liquid-liquid phase separation was shown to occur through CTCF-mediated chromatin loops and to act as an insulator (Lee, R et al. *Nucleic Acids Research* 2022). Among the identified insulator-associated DNA-binding proteins, CEBPA has been found to form hubs that colocalize with transcriptional co-activators in a native cell context, which is associated with transcriptional condensate and phase separation (Christou-Kent M et al. *Cell Reports* 2023). The proposed microcompartment mechanisms are also associated with phase separation. Thus, the same or similar mechanisms are potentially associated with the insulator function of the identified DNA-binding proteins. I have included the following information on line 554: CEBPA in the identified insulator-associated DNA-binding proteins was also reported to be involved in transcriptional condensates and phase separation.
      

      9. Also, can the authors comment on whether those newly identified DBPs mediate contacts by active processes or equilibrium processes?

      Snead WT et al. Molecular Cell 2019 mentioned that protein post-translational modifications (PTMs) facilitate the control of molecular valency and strength of protein-protein interactions. O-GlcNAcylation as a PTM inhibits CTCF binding to chromatin (Tang X et al. Nature Communications 2024). I found that the identified insulator-associated DNA-binding proteins tend to form a cluster at potential insulator sites (Supplementary Fig. 2d). These proteins may interact and actively regulate chromatin interactions, transcriptional condensation, and phase separation by PTMs. I have added the following explanation on lines 584-590: Furthermore, protein post-translational modifications (PTMs) facilitate control over the molecular valency and strength of protein-protein interactions. O-GlcNAcylation as a PTM inhibits CTCF binding to chromatin. We found that the identified insulator-associated DNA-binding proteins tend to form a cluster at potential insulator sites (Fig. 4f and Supplementary Fig. 3c). These proteins may interact and actively regulate chromatin interactions, transcriptional condensation, and phase separation through PTMs.

      10. Can the author provide some real examples along with published structural data (e.g. the mentioned micro-C data) to show the link between protein co-presence, directional bias and contact formation?

      A structural molecular model of cohesin-CTCF-anchored loops has been published by Li Y et al. Nature 2020. The structural conformation of CTCF and cohesin in the loops would be the cause of the directional bias of CTCF binding sites, which I mentioned in lines 539 - 543 as follows: These results suggest that the directional bias of DNA-binding sites of insulator-associated DBPs may be involved in insulator function and chromatin regulation through structural interactions among DBPs, other proteins, DNAs, and RNAs. For example, the N-terminal amino acids of CTCF have been shown to interact with RAD21 in chromatin loops.

      To investigate the principles underlying the architectural functions of insulator-insulator pairing interactions, two insulators, Homie and Nhomie, flanking the *Drosophila even skipped* locus were analyzed. Pairing interactions between the transgene Homie and the eve locus are directional. The head-to-head pairing between the transgene and endogenous Homie matches the pattern of activation (Fujioka M et al. *PLoS Genetics* 2016).
      

      Reviewer #3

      Major Comments:

      1. Some of these TFs do not have specific direct binding to DNA (P300, Cohesin). Since the authors are using binding motifs in their analysis workflow, I would remove those from the analysis.

      When a protein complex binds to DNA, one protein of the complex binds to the DNA directly, and the other proteins may not bind to DNA. However, the DNA motif sequence bound by the protein may be registered as the DNA-binding motif of all the proteins in the complex. The molecular structure of the complex of CTCF and Cohesin showed that both CTCF and Cohesin bind to DNA (Li Y et al. Nature 2020). I think there is a possibility that if the molecular structure of a protein complex becomes available, the previous recognition of the DNA-binding ability of a protein may be changed. Therefore, I searched the Pfam database for 99 insulator-associated DNA-binding proteins identified in this study. I found that 97 are registered as DNA-binding proteins and/or have a known DNA-binding domain, and EP300 and SIN3A do not directly bind to DNA, which was also checked by Google search. I have added the following explanation in line 257 to indicate direct and indirect DNA-binding proteins: Among 99 insulator-associated DBPs, EP300 and SIN3A do not directly interact with DNA, and thus 97 insulator-associated DBPs directly bind to DNA. I have updated the sentence in line 20 of the Abstract as follows: We discovered 97 directional and minor nondirectional motifs in human fibroblast cells that corresponded to 23 DBPs related to insulator function, CTCF, and/or other types of chromosomal transcriptional regulation reported in previous studies.

      2. I am not sure if I understood correctly, but why do the authors consider enhancers spanning 2Mb (200 bins of 10Kb around eSNPs)? This seems wrong. Enhancers are relatively small regions (100bp to 1Kb) and only a very small subset form super enhancers.

      As the reviewer mentioned, I recognize enhancers are relatively small regions. In the paper, I intended to examine further upstream and downstream of promoter regions where enhancers are found. Therefore, I have modified the sentence in lines 929 - 931 of the Fig. 2 legend as follows: Enhancer-gene regulatory interaction regions consist of 200 bins of 10 kbp between -1 Mbp and 1 Mbp region from TSS, not including promoter.
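The binning described in the revised legend can be sketched as follows. This is only an illustration of the arithmetic (200 bins of 10 kbp covering -1 Mbp to +1 Mbp around a TSS), not the authors' pipeline; the function name `bin_index`, the strand handling, and the half-open window convention are assumptions, and the exclusion of the promoter region is left out because its boundaries are not given in the response.

```python
BIN_SIZE = 10_000          # 10 kbp per bin
HALF_WINDOW = 1_000_000    # 1 Mbp on each side of the TSS
N_BINS = 2 * HALF_WINDOW // BIN_SIZE  # 200 bins in total


def bin_index(site_pos, tss_pos, strand="+"):
    """Map a genomic position to a bin index 0..199 relative to a TSS.

    Hypothetical helper: returns None when the site lies outside the
    window [-1 Mbp, +1 Mbp) around the TSS.
    """
    offset = site_pos - tss_pos
    if strand == "-":
        # Flip the sign so that negative offsets always mean "upstream".
        offset = -offset
    if not -HALF_WINDOW <= offset < HALF_WINDOW:
        return None
    return (offset + HALF_WINDOW) // BIN_SIZE
```

With this convention, bins 0 to 99 lie upstream of the TSS and bins 100 to 199 lie downstream.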

      3. I think the H3K27me3 analysis was very good, but I would have liked to see also constitutive heterochromatin as well, so maybe repeat the analysis for H3K9me3.

      Following the reviewer's advice, I have added the ChIP-seq data of H3K9me3 as a track of the UCSC Genome Browser. The distribution of H3K9me3 signal was different from that of H3K27me3 in some regions. I also found the insulator-associated DNA-binding sites close to the edges of H3K9me3 regions and took some screenshots of the UCSC Genome Browser of the regions around the sites in Supplementary Fig. 3b. I have modified the following sentence on lines 974 - 976 in the legend of Fig. 4: a Distribution of histone modification marks H3K27me3 (green color) and H3K9me3 (turquoise color) and transcript levels (pink color) in upstream and downstream regions of a potential insulator site (light orange color). I have also added the following result on lines 356 - 360: The same analysis was performed using H3K9me3 marks, instead of H3K27me3 (Fig. S3b). We found that the distribution of H3K9me3 signal was different from that of H3K27me3 in some regions, and discovered the insulator-associated DNA-binding sites close to the edges of H3K9me3 regions (Fig. S3b).

      4. I was not sure I understood the analysis in Figure 6. The binding site is within 500bp of the interaction site, but micro-C interactions are at best at 1Kb resolution. They say they chose the centre of the interaction site, but we don't know exactly where there is the actual interaction. Also, it is not clear what they measure. Is it the number of binding sites of a specific or multiple DBP insulator proteins at a specific distance from this midpoint that they recover in all chromatin loops? Maybe I am missing something. This analysis was not very clear.

      The resolution of the Micro-C assay is considered to be 100 bp and above, as the human nucleosome core particle contains 145 bp (and 193 bp with linker) of DNA. However, internucleosomal DNA is cleaved by endonuclease into fragments of multiples of 10 nucleotides (Pospelov VA et al. Nucleic Acids Research 1979). Highly nested focal interactions were observed (Goel VY et al. Nature Genetics 2023). Base pair resolution was reported using Micro Capture-C (Hua P et al. Nature 2021). Sub-kilobase (20 bp resolution) chromatin topology was reported using an MNase-based chromosome conformation capture (3C) approach (Aljahani A et al. Nature Communications 2022). On the other hand, Hi-C data was analyzed at 1 kb resolution (Gu H et al. bioRxiv 2021). If the resolution of Micro-C interactions is at best at 1 kb, the binding sites of a DNA-binding protein will not show a peak around the center of the genomic locations of interaction edges. Each panel shows the number of binding sites of a specific DNA-binding protein at a specific distance from the midpoint of all chromatin interaction edges. I have modified and added the following sentences in lines 593-597: High-resolution chromatin interaction data from a Micro-C assay indicated that most of the predicted insulator-associated DBPs showed DNA-binding-site distribution peaks around chromatin interaction sites, suggesting that these DBPs are involved in chromatin interactions and that the chromatin interaction data has a high degree of resolution. Base pair resolution was reported using Micro Capture-C.
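The quantity plotted in each panel, as described above, can be illustrated with a small sketch that counts binding sites by their signed distance from the midpoints of interaction edges. This is a hedged illustration rather than the authors' code: the function name `distance_histogram`, the single-chromosome simplification, and the 50 bp bin width are assumptions, and the 500 bp search radius is taken from the reviewer's description.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def distance_histogram(site_positions, edge_midpoints, max_dist=500, bin_width=50):
    """Count binding sites by signed distance from interaction-edge midpoints.

    Hypothetical helper for one chromosome: for every midpoint, each binding
    site within max_dist bp is assigned to a distance bin whose key is the
    lower bound of that bin (e.g. -50 covers offsets -50..-1).
    """
    sites = sorted(site_positions)
    counts = Counter()
    for mid in edge_midpoints:
        # Binary search keeps the scan limited to sites near this midpoint.
        lo = bisect_left(sites, mid - max_dist)
        hi = bisect_right(sites, mid + max_dist)
        for pos in sites[lo:hi]:
            offset = pos - mid
            counts[(offset // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))
```

If the interaction data resolved positions no better than 1 kb, a histogram of this kind would be expected to look flat rather than peaked near zero, which is the argument the response makes.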

      Minor Comments:

      1. PIQ does not consider TF concentration. Other methods do that and show that TF concentration improves predictions (e.g., https://www.biorxiv.org/content/10.1101/2023.07.15.549134v2 or https://pubmed.ncbi.nlm.nih.gov/37486787/). The authors should discuss how that would impact their results.

      The directional bias of CTCF binding sites was identified by ChIA-PET interactions of CTCF binding sites. The analysis of the contribution scores of DNA-binding sites of proteins considering the binding sites of CTCF as an insulator showed the same tendency of directional bias of CTCF binding sites. In the analysis, to remove the false-positive prediction of DNA-binding sites, I used the binding sites that overlapped with a ChIP-seq peak of the DNA-binding protein. This result suggests that the DNA-binding sites of CTCF obtained by the current analysis have sufficient quality. Therefore, if the accuracy of prediction of DNA-binding sites is improved, although the number of DNA-binding sites may be different, the overall tendency of the directionality of DNA-binding sites will not change and the results of this study will not change significantly.

       As for the first reference in the reviewer's comment, chromatin interaction data from Micro-C assay does not include all chromatin interactions in a cell or tissue, because it is expensive to cover all interactions. Therefore, it would be difficult to predict all chromatin interactions based on machine learning. As for the second reference in the reviewer's comment, pioneer factors such as FOXA are known to bind to closed chromatin regions, but transcription factors and DNA-binding proteins involved in chromatin interactions and insulators generally bind to open chromatin regions. The search for the DNA-binding motifs is not required in closed chromatin regions.
      

      2. DeepLIFT is a good approach to interpret complex structures of CNN, but is not truly explainable AI. I think the authors should acknowledge this.

      In the DeepLIFT paper, the authors explain that DeepLIFT is a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input (Shrikumar A et al. ICML 2017). DeepLIFT compares the activation of each neuron to its 'reference activation' and assigns contribution scores according to the difference. DeepLIFT calculates a metric to measure the difference between an input and the reference of the input.

       Truly explainable AI would be able to find cause and reason, and to make choices and decisions like humans. DeepLIFT does not perform causal inferences. I did not use the term "Explainable AI" in our manuscript, but I briefly explained it in Discussion. I have added the following explanation in lines 623-628: AI (Artificial Intelligence) is considered as a black box, since the reason and cause of prediction are difficult to know. To solve this issue, tools and methods have been developed to know the reason and cause. These technologies are called Explainable AI. DeepLIFT is considered to be a tool for Explainable AI. However, DeepLIFT does not answer the reason and cause for a prediction. It calculates scores representing the contribution of the input data to the prediction.
      
       Furthermore, to improve the readability of the manuscript, I have included the following explanation in lines 159-165: we computed DeepLIFT scores of the input data (i.e., each binding site of the ChIP-seq data of DNA-binding proteins) in the deep learning analysis on gene expression levels. DeepLIFT compares the importance of each input for predicting gene expression levels to its 'reference or background level' and assigns contribution scores according to the difference. DeepLIFT calculates a metric to measure the difference between an input and the reference of the input.
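The difference-from-reference idea can be made concrete with a toy linear model, for which the contribution of each input is simply its weight times its difference from the reference, and the contributions sum to the change in the output. The sketch below is only that toy case: it does not use the DeepLIFT software or the authors' network, and the function names and example numbers are hypothetical.

```python
def linear_output(weights, bias, x):
    """Output of a toy linear model: bias + sum(w_i * x_i)."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def reference_contributions(weights, x, reference):
    """Difference-from-reference contribution of each input.

    For a linear model, input i contributes w_i * (x_i - reference_i),
    so the contributions add up to output(x) - output(reference).
    """
    return [w * (xi - ri) for w, xi, ri in zip(weights, x, reference)]


# Hypothetical numbers, chosen only for illustration.
weights = [2.0, -1.0, 0.5]
bias = 0.1
x = [1.0, 3.0, 4.0]
reference = [0.0, 0.0, 0.0]

scores = reference_contributions(weights, x, reference)
delta = linear_output(weights, bias, x) - linear_output(weights, bias, reference)
```

Here the scores describe how much each input moved the prediction away from the reference output; they do not by themselves say why the model behaves that way, which is the distinction drawn in the response.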
      
    1. “Tsze-kung asked, saying, ‘Is there one word which may serve as a rule of practice for all one’s life?’ The Master said, ‘Is not reciprocity such a word? What you do not want done to yourself, do not do to others.’”

      While this "golden rule" is easy to understand, I find it also runs the risk of being oversimplified or misapplied. Too often, we interpret it simply as "don't do things that make others unhappy," ignoring the diversity of personal preferences, cultural backgrounds, and practical needs. I have seen this when dividing up work on a project: because I hate being rushed, I assumed others hated it too and did not press anyone, which delayed the task and ultimately affected everyone. "What you do not want" is therefore not necessarily equivalent to the real needs of the other party. Making the Golden Rule work requires more active communication and empathy, rather than simply applying our own standards to others.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      We thank the reviewers for their thoughtful comments.


      Reviewer #1 (Evidence, reproducibility and clarity):

      SUMMARY: The manuscript is well written, with excellent explanation and documentation of experimental approaches. All conclusions are well supported by the data. The discussion is balanced and appropriate. The data, including images and movies, are of high quality and beautifully presented. The experimental design and analysis, including quantification of parameters in the images, is rigorous. Additional rigor is provided by comparing different cell types. The rapalog and iLID dimerization strategies have been described previously, as has their use to recruit kinesin motors to membranous organelles. However, this is the first application of these strategies to recruit motors to intermediate filaments. The evidence that vimentin filaments can be redistributed locally is clear and convincing and offers appealing potential for future experimentation. The redistribution was not fully reversible in all cells, but this is not surprising given the entanglement that must result from the action of motors along the length of these long flexible polymers.

      In terms of the biology of intermediate filaments, the authors show that vimentin redistribution had negligible effect on microtubule or F-actin organization, cell area, or the number of focal adhesions. Depletion of vimentin filaments locally reduced cell stiffness. Both ER and mitochondria segregated with vimentin filaments, but not lysosomes. These findings are consistent with published reports (e.g. comparing vimentin null and wildtype cell lines), but the acute and reversible nature of the motor recruitment strategy is a more elegant experimental approach, and the selectivity of the observed effects is evidence of its specificity. It is interesting that the ER network segregated with vimentin even in the absence of RNF26. While this is not explored further, it points to the potential power of this motor recruitment strategy for future studies on intermediate filament interactions.


      The following are some major and minor issues, which should all be easy for the authors to address.

      MAJOR COMMENTS:

        • Fig. S1 shows that the Vim-mCherry-FKBP construct coassembles with endogenous vimentin, but similar data for the iLID constructs appears to be lacking. I would like to see data demonstrating the incorporation of the Vim-mCherry-SspB constructs into the vimentin filaments. This should include high magnification images of single filaments in the cytoplasm of the cells.

      Response:

      We have included a new Figure 2D, which illustrates the incorporation of the vimentin-mCherry-SspB construct into the vimentin network stained for endogenous vimentin.

        • The authors do not discuss the density of motor recruitment along the filaments. To address this, I'd like to see images showing the extent of recruitment of motors to the filaments using the rapalog and iLID strategies. This should include high magnification images of single filaments in the cytoplasm of the cells.

      Response:

      We have included new Figure S1B,C and Figure S2A, which illustrate the recruitment of kinesin motors to vimentin filaments upon induction with rapalog or light, respectively, by using super-resolution imaging with an Airyscan microscope. The motors were stained with antibodies against GFP. These data are discussed in the text, lines 126-132 and 165-168.

        • For the experiments on vimentin and keratin organization, the authors do not explain that these proteins form distinct networks and do not coassemble. The authors should show this in the cell types examined. This should also be explained explicitly in the body of the manuscript, though the data could be placed in the supplementary data. This is important because many intermediate filaments can coassemble freely, and coassembled proteins would be expected to segregate together.

      Response:

      To address this important comment, we have now included images of vimentin and keratin in the three studied cell types using super-resolution imaging, both for cells expressing vimentin constructs (updated Figure 5) and endogenous filament staining in untransfected cells (updated Figure S4). These images illustrate that vimentin and keratin mostly form distinct filaments in HeLa cells. However, we do observe some degree of co-assembly of vimentin and keratin in COS-7 and U2OS cells. We were really surprised by this observation as, to our knowledge, it has not been clearly documented in the literature. These data help to explain why vimentin pulling causes keratin co-clustering in COS-7 and U2OS cells. We note that in a study where kinesin-1 mediated transport of vimentin and keratin was previously investigated by the Gelfand lab in RPE1 cells, the two networks also appear to overlap quite strongly (Robert et al, 2019, FASEB J). Since no super-resolution microscopy was performed in that study, potential co-assembly of keratin and vimentin filaments was not discussed. Colocalization and coprecipitation of vimentin and keratin have also been described by Velez-delValle et al. in epithelial cells (Sci Rep 2016). Cell type-specific co-assembly of keratin and vimentin would require more investigation, and we make no strong conclusions about it, but we think that our data illustrate the usefulness of our methodology to address the co-dependence of different types of intermediate filaments.

      MINOR COMMENTS:

        • The authors refer to selecting cells within an "optimized expression range" for their transiently expressed recombinant proteins. They should state the proportion of the cells that met this criterion in their transient transfection experiments as this is important information for other researchers that might wish to use this approach in their own studies.

      Response:

      These numbers are now included in lines 137-142 and 173-176 of the revised paper. For the FRB-FKBP system, ~50% of transfected cells could be used for analysis; for the light-induced system, ~40% were in the optimal range.

        • In Fig. 1F there should be a statistical comparison between cells transfected with the Kin14 construct and control (untransfected) cells in the absence of rapalog.

      Response:

      This comparison has been added.

        • In Fig. 1G there should be a statistical comparison between cells expressing Kin14 and KIF5A in the absence of rapalog.

      Response:

      This comparison has been added.

        • The depletion of the ER network in the cell periphery is not evident in Fig. 7B, though the perinuclear accumulation is evident. Perhaps the authors could select another example or explain to the reader what exactly to look for in these images.

      Response:

      We note that Figure 7B is a line scan of the image shown in Figure 7A. We assume that the reviewer meant Figure 7C, which is discussed in detail below.

        • In Fig. 7C, the intensity of the mCherry declines markedly over time. This is presumably due to photobleaching but should be explained in the legend.

      Response:

      We have now improved Figure 7 by adding additional quantifications of ER and vimentin intensity and distribution in Figures 7D and E. We also extended the corresponding text (lines 288-297), which now reads: “Using the optogenetic tool, we observed that ER sheets and matrices, but not tubules, were pulled along with vimentin, confirming their previously described direct connections (Cremer et al., 2023) (black arrows, Figure 7C; Video S5). Most of the vimentin and ER repositioning occurred within approximately 10 minutes (Figure 7C, D, Video S5). While initially this resulted in a sparser tubular ER network at the cell periphery, over time, the network became denser, with smaller polygonal structures. This effect could also be observed in the ratio of perinuclear to peripheral intensity, where a subset of ER initially follows vimentin to the perinuclear region but then redistributes again towards the cell periphery (Figure 7D). It should be noted that while photobleaching of the ER channel was negligible, there was a 40% reduction in total Vim-mCh-SspB intensity over the course of the experiment due to photobleaching (Figure 7E).”


      Reviewer #1 (Significance):

      SUMMARY: The authors show that chemical-induced and light-induced dimerization strategies can be used to recruit microtubule motors to vimentin filaments, allowing rapid and reversible experimental manipulation of vimentin filament organization either locally or globally in cells. These strategies provide an experimental approach for investigating the physical interaction of intermediate filaments with organelles and other cytoskeletal components, as well as a method for probing the role of intermediate filaments in cell mechanics, cytoskeletal dynamics, etc. This is a technical improvement over previous experimental strategies, which have relied largely on chronic manipulation such as global disassembly or genetic deletion of intermediate filaments, e.g. comparison of vimentin null and wild type cells.

      The principal weakness of this study is that it offers limited insight into intermediate filament biology. As such, it might be most appropriate for a tools or techniques section of a journal. The dimerization strategies have been reported previously, so that is not new, but the application to intermediate filaments is novel.


      Response:

      We agree that our paper is primarily of a technical nature and thus would be most appropriate for the tools and techniques section of a journal. We also agree that we used motor recruitment strategies that we and others have employed previously. However, we would like to emphasize that the demonstration that the tools work very well for intermediate filaments is entirely novel, as are the observations that these tools can be used to very rapidly alter cell stiffness or probe the links between intermediate filaments and organelles. Most importantly, the intermediate filament field currently lacks rapid specific manipulation strategies, and our tools will allow revisiting many important pending questions in the field. For example, they will make it possible to distinguish short-term and direct effects of intermediate filaments on cell polarity, adhesion and migration from their function in signaling and gene expression. We also report some new biology, such as evidence of some degree of co-assembly of vimentin and keratin.

      AUDIENCE: This paper will be of interest to cell biologists who study cytoskeletal interactions, particularly the interaction of intermediate filaments with other cellular organelles or cytoskeletal polymers, or the role of intermediate filaments in cellular mechanics.

      REVIEWER EXPERTISE: This reviewer has expertise on the cytoskeleton, cytoskeletal dynamics, and intracellular transport including intermediate filament biology.



      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary: The manuscript presents a novel methodology for acute manipulation of vimentin intermediate filaments (IFs) using chemical genetic and optogenetic tools. By recruiting microtubule-based motors to vimentin via inducible dimerization systems, the authors achieve precise temporal and spatial control over vimentin distribution. Apart from the significant advancement in terms of methods development, key findings include:

      * Vimentin's role in organelle positioning: Mitochondria and ER are repositioned with vimentin, while lysosomes are less dependent on its organization.

      * Cytoskeletal interactions: Vimentin clustering minimally impacts actin and microtubule networks in the short term.

      * Cell stiffness: Vimentin repositioning reduces cell stiffness, indicating its significant role in cellular mechanics.

      * Cell-type-specific keratin interactions: The study highlights diverse interactions between vimentin and keratin-8 across cell lines.

      The study demonstrates methodological advancements enabling rapid vimentin manipulation and provides insights into vimentin's interactions with cellular structures.

      A major shortcoming is the unclear narrative: what do the authors want to present? This aspect requires significant attention.

      Response:

      By “unclear narrative” the reviewer meant that we should have provided a more balanced discussion of the insights that could be obtained using our new method compared to previously published literature, and we have modified our narrative accordingly.

      General Comments and Overall Assessment

      The manuscript represents an interesting contribution to the cytoskeletal field, addressing limitations of long-term perturbation methods. The tools developed are innovative, allowing controlled and reversible vimentin reorganization with minimal off-target effects. The findings are robust and provide important insights into the role of vimentin in cellular mechanics and organelle positioning.

      Strengths:

      Methodological novelty with broad applicability - this is the most exciting aspect.

      Comprehensive validation of the tools in multiple cell lines.

      Clear differentiation between vimentin's short- and long-term roles.

      Addressing gaps in understanding vimentin-organelle interactions.

      Limitations:

      * The manuscript is a little bit all over the place. While the method development is clear, the manuscript makes claims way beyond the method development. The message and narrative need to be improved, and in this respect the whole structure needs an overhaul.

      Response:

      We have carefully modified the manuscript to avoid the impression that we make any claims that go beyond the immediate and quantifiable effects of vimentin repositioning on different cellular structures.

      * Unclear how much the differences in expression levels impact results and reproducibility.

      Response:

      Quantifications of expression levels and their discussion are included in Figures 1G-I, 2G-H, S2B and lines 137-142 and 173-176.

      * Would be good to discuss some findings that are specific to a given experimental cell line. How generalizable are these results?

      Response:

      Cell line-specific findings concerned mostly the co-displacement of keratin together with vimentin, which occurred in COS-7 and U2OS cells but not in HeLa cells. This interesting finding is discussed in the text, lines 246-269 and 375-383 (see also our answers on page 3 above and page 7 below).

      Major Comments

      Evidence and Claims:

      * While the methodological aspect is very strong the balance between presenting a novel method and presenting specific cell biological findings needs to be improved. Now it is quite unclear what the manuscript wants to present.

      * The abstract needs a complete overhaul. From reading the abstract, it is not clear what the manuscript wants to present.

      Response:

      We have modified the abstract to make it more clear that we do not make any general claims on the impact of vimentin on the interactions and functions of different organelles, but rather describe what can be directly observed after the acute displacement of vimentin and which conclusions can be made from these observations.

      Regarding the research findings there are a number of things for the authors to consider. Since the methods aspect is, in the eyes of this reviewer, in focus, I have not stringently assessed the experimental findings. Hence, the comments below are things to be considered in order to make the findings related to IF research stronger:


      * Cell-specific keratin interactions: The manuscript could benefit from some further validation of the physical interactions between vimentin and keratin-8 across different cell types.

      Response:

      We have improved the images of keratin and vimentin by using super-resolution (Airyscan) microscopy to show that they indeed form distinct filaments in HeLa cells, whereas in COS-7 and U2OS cells, where their co-displacement occurs, they can also incorporate into the same filaments. This observation was very surprising but agrees with the data published by the Gelfand lab on similarity in the distribution pattern and co-transport of vimentin and keratin in RPE1 cells (Robert et al, 2019, FASEB J). Colocalization and coprecipitation of vimentin and keratin have also been described by Velez-delValle et al. in epithelial cells (Sci Rep 2016).

      * Impact on microtubules: The disorganization of stable microtubules in cells expressing KIF5A was attributed to overexpression effects. It would be helpful to include additional controls, such as expressing KIF5A without vimentin constructs, to confirm this claim.

      Response:

      This control has been included in the new Figure S3. We note that this observation fully aligns with data published by another lab (Andreu-Carbó et al, 2024, Nat Comm).

      * ER-vimentin linkages: The observation that ER-vimentin interactions persist in RNF26 knockout cells is intriguing. The manuscript would benefit from a discussion on possible candidates for alternative linkers.

      Response:

      We have added a short discussion (lines 394-398) about the potential involvement of nesprins, such as nesprin-3, because they can connect the nuclear envelope to intermediate filaments, and might also partly participate in ER sheet-IF connections because ER and nuclear membranes are continuous and show some overlap in proteome.

      * Construct variability: Do the authors have some data on how much expression level differences affect the outcomes (e.g., incomplete recovery)?

      Response:

      We have added a figure (Figure S2B), which shows that incomplete recovery of vimentin clustering does not correlate with protein expression levels and likely depends on other factors, which could possibly be the cell cycle phase or degree of vimentin entanglement after repositioning. This point is discussed in revised text, lines 194-197.


      Reviewer #2 (Significance):

      Significance

      General Assessment: The study represents a significant technical advance in the study of cytoskeletal dynamics. The tools developed address critical limitations of traditional vimentin perturbation methods, allowing for spatiotemporally precise manipulation without long-term effects on gene expression or signaling pathways.

      Novelty:

      This is, to my knowledge, the first demonstration of reversible and acute vimentin repositioning using optogenetics. The study extends understanding of vimentin's short-term mechanical and organizational roles, distinguishing them from compensatory effects observed in knockdown models.

      Audience and Impact: The manuscript will appeal to researchers in cytoskeletal dynamics, cell mechanics, and organelle biology. The tools have broader applicability in studying other cytoskeletal systems and could inspire translational applications, such as investigating the role of vimentin in cancer or fibrosis.

      The reference list provides a relatively representative selection of articles relevant for the article. However, the authors may consider whether there could be relevant information in the relatively recent special edition of Current Opinion in Cell Biology, which focused on IFs, especially featuring vimentin https://www.sciencedirect.com/special-issue/10TFHK2QCKW

      Response:

      We thank the reviewer for this excellent suggestion, and we have included some additional references from this issue.

      Field of Expertise

      I specialize in cell biology, intermediate filaments, post-translational modifications, cytoskeletal dynamics, and advanced microscopy techniques.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      This is an excellent paper describing the use of chemical and light-induced heterodimerization of microtubule-based motors to rapidly disrupt the distribution of the vimentin cytoskeletal network. Rapid clustering of vimentin did not significantly affect the microtubule or actin networks, cell spreading or focal adhesions. Other organelles were repositioned together with vimentin. Interestingly, in some cell lines, keratin networks were displaced along with vimentin while in other cells they were not.

      Major comments:

      The conclusions are well supported by the data presented and appropriate controls are included.

      Optional comments:

        • The authors should expand on why they think the plus end directed KIF5A gives such a strong localization of vimentin to the perinuclear area.

      Response:

      We think that two factors can contribute to this counterintuitive effect. First, vimentin is strongly concentrated and entangled in the perinuclear region, and displacement of some vimentin filaments to the cell periphery can cause the collapse of the rest to the cell center, with kinesins being unable to pull the perinuclear network apart. Second, kinesin-1 KIF5A is a motor that strongly prefers stable, post-translationally modified microtubules, and our previous study has shown that a significant proportion of such microtubules are located with their minus ends facing towards the cell periphery (Chen et al., Elife 2016). This could contribute to the accumulation of vimentin in the cell center upon KIF5A recruitment. These considerations were added to the revised text, lines 344-347.

      • Consideration should be given to the idea that the pulling of ER and mitochondria along with the vimentin could be due to trapping of these organelles within the vimentin matrix and not necessarily due to direct interactions. Such reasoning could explain the transient localization of lysosomes with the center aggregate since lysosomes are generally not thought to significantly bind to vimentin networks.

      Response:

      This is an excellent point, and we have included it in the revised article, lines 333-335 and 405.

      Reviewer #3 (Significance):

      This study describes some valuable tools that should be useful to cell biologists interested in determining the role of the cytoskeleton and possibly other organelles in a variety of cellular contexts. It overcomes some of the existing shortcomings of the pharmacological reagents currently available for studying intermediate filament biology and will provide a useful adjunct to other more long-term manipulations of the cytoskeleton. While much of the data presented confirm results obtained by other methods, this is a significant technical advance as it provides a short time scale, and in one instance, reversible manipulation of the cytoskeleton.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The paper by Fournier et al. investigates the sensitivity of neural circuits to changes in intrinsic and synaptic conductances. The authors use models of the stomatogastric ganglion (STG) to compare how perturbations to intrinsic and synaptic parameters impact network robustness. Their main finding is that changes to intrinsic conductances tend to have a larger impact on network function than changes to synaptic conductances, suggesting that intrinsic parameters are more critical for maintaining circuit function.

      The paper is well-written and the results are compelling, but I have several concerns that need to be addressed to strengthen the manuscript. Specifically, I have two main concerns:

      (1) It is not clear from the paper what the mechanism is that leads to the importance of intrinsic parameters over synaptic parameters.

      (2) It is not clear how general the result is, both within the framework of the STG network and its function, and across other functions and networks. This is crucial, as the title of the paper appears very general.

      I believe these two elements are missing in the current manuscript, and addressing them would significantly strengthen the conclusions. Without a clear understanding of the mechanism, it is difficult to determine whether the results are merely anecdotal or if they depend on specific details such as how the network is trained, the particular function being studied, or the circuit itself. Additionally, understanding how general the findings are is vital, especially since the authors claim in the title that "Circuit function is more robust to changes in synaptic than intrinsic conductances," which suggests a broad applicability.

      I do not wish to discourage the authors from their interesting result, but the more we understand the mechanism and the generality of the findings, the more insightful the result will be for the neuroscience community.

      Major comments

      (1) Mechanism

      While the authors did a nice job of describing their results, they did not provide any mechanism for why synaptic parameters are more resilient to changes than intrinsic parameters. For example, from Figure 5, it seems that there is mainly a shift in the sensitivity curves. What is the source of this shift? Can something be changed in the network, the training, or the function to control it? This is just one possible way to investigate the mechanism, which is lacking in the paper.

      (2) Generality of the results within the framework of the STG circuit

      (a) The authors did show that their results extend to multiple networks with different parameters (the 100 networks). However, I am still concerned about the generality of the results with respect to the way the models were trained. Could it be that something in the training procedure makes the synaptic parameters more robust than intrinsic parameters? For example, the fact that duty cycle error is weighted as it is in the cost function (large beta) could potentially affect the parameters that are more important for yielding low error on the duty cycle.

      (b) Related to (a), I can think of a training scheme that could potentially improve the resilience of the network to perturbations in the intrinsic parameters rather than the synaptic parameters. For example, in machine learning, methods like dropout can be used to make the network find solutions that are robust to changes in parameters. Thus, in principle, the results could change if the training procedure for fitting the models were different, or by using a different optimization algorithm. It would be helpful to at least mention this limitation in the discussion.

      (3) Generality of the function

      The authors test their hypothesis based on the specific function of the STG. It would be valuable to see if their results generalize to other functions as well. For example, the authors could generate non-oscillatory activity in the STG circuit, or choose a different, artificial function, maybe with different duty cycles or network cycles. It could be that this is beyond the scope of this paper, but it would be very interesting to characterize which functions are more resilient to changes in synapses, rather than intrinsic parameters. In other words, the authors might consider testing their hypothesis on at least another 'function' and also discussing the generality of their results to other functions in the discussion.

      (4) Generality of the circuit

      The authors have studied the STG for many years and are pioneers in their approach, demonstrating that there is redundancy even in this simple circuit. This approach is insightful, but it is important to show that similar conclusions also hold for more general network architectures, and if not, why. In other words, it is not clear if their claim generalizes to other network architectures, particularly larger networks. For example, one might expect that the number of parameters (synaptic vs intrinsic) might play a role in how resilient the function is with respect to changes in the two sets of parameters. In larger models, the number of synaptic parameters grows as the square of the number of neurons, while the number of intrinsic parameters increases only linearly with the number of neurons. Could that affect the authors' conclusions when we examine larger models?

      In addition, how do the authors' conclusions depend on the "complexity" of the non-linear equations governing the intrinsic parameters? Would the same conclusions hold if the intrinsic parameters only consisted of fewer intrinsic parameters or simplified ion channels? All of these are interesting questions that the authors should at least address in the discussion.
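
      The scaling argument in comment (4) can be illustrated with a short sketch. It assumes all-to-all directed connectivity without self-connections and the same number of intrinsic conductances in every neuron; neither assumption is taken from the manuscript, and the function name `parameter_counts` and its arguments are hypothetical.

```python
def parameter_counts(n_neurons, n_channel_types):
    """Count free conductance parameters in a hypothetical network.

    Assumes every neuron synapses onto every other neuron (no self-connections)
    and every neuron carries the same set of intrinsic conductances.
    """
    synaptic = n_neurons * (n_neurons - 1)   # grows quadratically with network size
    intrinsic = n_neurons * n_channel_types  # grows linearly with network size
    return synaptic, intrinsic


# Compare a three-cell circuit with a larger hypothetical network.
small = parameter_counts(3, 9)
large = parameter_counts(100, 9)
```

      Under these assumptions the intrinsic parameters outnumber the synaptic ones in a three-cell circuit, whereas the balance reverses in a larger network, which is the asymmetry the comment asks about.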

      We thank Reviewer #1 for their valuable input. We agree with the reviewer that generality of the results may have been overstated. To address this we changed the title of the manuscript to make it more specific to rhythmic circuits and we included a sentence to this effect in the discussion. 

      (1) We were more interested in knowing which set of conductances is more robust in a population of models, rather than a mechanism. If such a mechanism exists it will be the subject of a different study.

      (2) (a) It is impossible to explore the whole parameter space of these models. Our method to find circuits will leave subsets of circuits out of the study. Our sole goal in constructing the model database was that the activities were similar but the conductances were different.  (b) Of course one could devise a cost function targeting circuits that are more or less robust to changes in one parameter. Whether those exist is a different matter. This is not what we intended to do.

      (3) For this we would need a different circuit that produces non-oscillatory activity. A normal pyloric rhythm circuit always produces oscillatory activity unless it is “crashed” either by temperature or perturbations, but even in this case, because we don’t have a proper “control” activity (circuits crash in different ways), we would not be able to utilize the same approach.

      We think it is a valuable idea to perform a similar study in another small circuit with non-oscillatory (or rhythmic) activities.

      (4) We did not explore the issue of how our results generalize to larger networks, as it would be pure speculation. It could potentially be interesting to do a similar sensitivity analysis with a large network trained to perform a simple task. Our understanding is that many large trained networks are extremely sensitive to perturbations in synaptic weights, while the intrinsic properties of neurons in artificial neural networks (ANNs) are typically oversimplified and identical across units.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents an important exploration of how intrinsic and synaptic conductances affect the robustness of neural circuits. This is a well-deserved question, and overall, the manuscript is written well and has a logical progression.

      The focus on intrinsic plasticity as a potentially overlooked factor in network dynamics is valuable. However, while the stomatogastric ganglion (STG) serves as a well-characterized and valuable model for studying network dynamics, its simplified structure and specific dynamics limit the generalizability of these findings to more complex systems, such as mammalian cortical microcircuits.

      Strengths:

      Clean and simple model. Simulations are carefully carried out and parameter space is searched exhaustively.

      Weaknesses:

      (1) Scope and Generalizability:

      The study's emphasis on intrinsic conductance is timely, but with its minimalistic and unique dynamics, the STG model poses challenges when attempting to generalize findings to other neural systems. This raises questions regarding the applicability of the results to more complex circuits, especially those found in mammalian brains and those where the dynamics are not necessarily oscillating. This is even more so (as the authors mention) because synaptic conductances in this study are inhibitory, and changes to their synaptic conductances are limited (as the driving force for the current is relatively low).

      (2) Challenges in Comparison:

      A significant challenge in the study is the comparison method used to evaluate the robustness of intrinsic versus synaptic perturbations. Perturbations to intrinsic conductances often drastically affect individual neurons' dynamics, as seen in Figure 1, where such changes result in single spikes or even the absence of spikes instead of the expected bursting behavior. This affects the input to downstream neurons, leading to circuit breakdowns. For a fair comparison, it would be essential to constrain the intrinsic perturbations so that each neuron remains within a particular functional range (e.g., maintaining a set number of spikes). This could be done by setting minimal behavioral criteria for neurons and testing how different perturbation limits impact circuit function.

      (3) Comparative Metrics for Perturbation:

      Another notable issue lies in the evaluation metrics for intrinsic and synaptic perturbations. Synaptic perturbations are straightforward to quantify in terms of conductance, but intrinsic perturbations involve more complexity, as changes in maximal conductance result in variable, nonlinear effects depending on the gating states of ion channels. Furthermore, synaptic perturbations focus on individual conductances, while intrinsic perturbations involve multiple conductance changes simultaneously. To improve fairness in comparison, the authors could, for example, adjust the x-axis to reflect actual changes in conductance or scale the data post hoc based on the real impact of each perturbation on conductance. For example, in Figure 6, the scale of the intrinsic conductance panels (e.g., g_na-bar) is 500 times larger than that of the synaptic conductance panels (a row below), but the maximal sodium conductance is reached only for a brief moment during every spike, and then most of the time it is close to null. Moreover, changing the sodium conductance over the range of 0-250 for such a nonlinear current is, in many ways, unthinkable. Did you ever measure two neurons with such a difference in the sodium conductance? So, how can we tell that the ranges of the perturbations make a meaningful comparison?

      We thank Reviewer #2 for their comments. We agree with both reviewers about scope and generalizability. We changed the title of the manuscript and included a sentence in the discussion to address this. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 63: Tau_b is tau in Fig 1B? What is the 'network period' tau_n? Both are defined in the methods, but it would be good to clarify here and also in the figure.

      This was fixed. Tau_b is the  bursting period and we indicated it in the figure. Network period means the period of the network activity. This was rewritten.  

      (2) Line 74: "maximal conductances g_i." What is i? I can imagine what you meant, but it would be good to clarify the notation.

      There are multiple different currents. Letter ‘i' is an index over the different types. It now reads as follows,

      "The activity of the network depends on the values of the maximal conductances ḡ_i, where i is an index corresponding to the different current types (Na, CaS, CaT, Kd, KCa, A, H, Leak, IMI)"

      (3) Line 78: "conductances are changed by a random amount." How much is the "random amount"? In percentages? 

      We fixed this sentence. This is how it reads now, 

      "The blue trace in Figure 1C corresponds to the activity of the same model when each of the intrinsic conductances is changed by a random amount within a range between 0 (completely removing the conductance) and twice its starting value, 2×g_i, or equivalently, an increment of 100%."
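
      A minimal sketch of this kind of perturbation follows. It assumes that each new value is drawn uniformly between 0 and twice the starting value; the quoted sentence gives only the range, so the uniform distribution, the function name `perturb_conductances`, and the baseline values are illustrative assumptions rather than the authors' implementation.

```python
import random


def perturb_conductances(baseline, rng):
    """Redraw each maximal conductance within [0, 2 * starting value].

    A uniform draw is an assumption; the source states only the range.
    """
    return {name: rng.uniform(0.0, 2.0 * value) for name, value in baseline.items()}


# Hypothetical starting values, chosen only for illustration.
baseline = {"Na": 1000.0, "CaT": 2.5, "Kd": 100.0}
rng = random.Random(0)  # fixed seed so the draw is reproducible
perturbed = perturb_conductances(baseline, rng)
```

      A draw near 0 corresponds to removing the conductance, and a draw near the upper bound corresponds to doubling it.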

      Similarly, in Line 87: "by a similar percent." Can you provide Figures 1E-F in percentages? Are the percentages the same?

      The phrase "by a similar percent" is misleading and unimportant. Thank you, we removed it.

      (4) Line 113: Why did you add I_MI? Is it important for the results or for the conclusions?

      I_MI was added because the current is known to be there and it is not more or less important for the results or conclusions than any other current. 

      (5) Line 117: "We used a genetic algorithm to generate a database." Confusing. I guess you meant that you used genetic algorithms to optimize the cost function.

      Thank you for this comment. We fixed this sentence, see below. 

      “We used a genetic algorithm to optimize the cost function, and in this way generated a database of N = 100 models with different values of maximal conductances (Holland 88)."

      (6) Line 136: "The models in the database were constrained to produce solutions whose features were similar to the experimental measurements." Why are there differences in the features? Is this an optimization issue? I thought you wanted to claim that there are degenerate solutions, that is, solutions where the parameters are different, but the output is identical. Please clarify.

      The concept of degenerate solutions does not imply that the solutions are mathematically identical. In biology this means that they provide very similar functions, but do so with different underlying parameters (in this case, maximal conductances). The activity of the pyloric network is slightly different across animals, and it also changes over time within the same individual. Variation across models reflects individual variation in the biological circuit, and it is a strength of our modeling approach. The function of the circuits is equally good because they produce biologically realistic patterns, although the details of the activity patterns show differences.

      (7) Line 139: "distributed (p > 0.05)." What test did you use? N? Similarly, at Lines 218, 241, 239, etc. Please be more rigorous when reporting statistical tests.

      Thank you. We now specify the test we utilized every time we report a p value. 

      (8) Line 143: "In this case, it is not possible to identify clusters, suggesting that there are no underlying relationships between the features in the model database." The 2D plot is misleading, as the features are in 11 dimensions. Claims should be about the 11D space, not projections onto 2D. In fact, I don't think you can rule out correlations between the features based on the 2D plots. For example, shouldn't there be correlations between the on and off phases and the burst durations?

      Thank you. These sentences were confusing and were removed. We added the following sentence to the end of that paragraph.

      "Because the feature vectors are similar, their t-SNE projections do not form groups or clusters."

      (9) Related to this, I don't understand this sentence: "Even though the conductances are broadly distributed over many-fold ranges, the output of the circuits results in tight yet uncorrelated distributions.”

      This sentence is confusing and was removed. 

      (10) Line 158: Repetition of Line 152: Figure 3 shows the currentscapes of each cell in two model networks.

      We removed the second instance of the repeated sentences. 

      (11) Line 160: "yet the activity of the networks is similar." Well, they are similar, but not identical. I can also say that the current scapes are 'similar'. This should be better quantified and not left as a qualitative description.

      While this is an interesting point it will not change the results and conclusions of the present study. The network models are different since the values of their maximal conductances are distributed over wide ranges.  

      (12) Line 218: midpoint parameter? Is that b - the sharpness? Please be consistent. Regarding the mechanism (see above) - any ideas what leads to this shift in the sensitivity curves between the two types of parameters?

      Yes, we made a mistake. ‘b’ is the midpoint parameter. This was fixed in the text, thank you.

      (13) Figure 6 illustrates why synaptic parameters are more robust, but it is not quantified. Why not provide a quantitative measure for this claim? For example, calculate the colored area within the white square for each pair, for each cell, and for each model. Show that these measures can predict improved robustness for one model over another and for synaptic vs. intrinsic parameters.

      The ratio of areas of the colored and non-colored regions in the whole hyperboxes (for intrinsic and synaptic conductances) is the number reported in the y-axis of the sensitivity curves when we include all conductances (and not just a pair). 

      We computed the ratios of the colored/noncolored areas in all panels in figure 6 and now report these quantities as follows, 

      "We computed the proportions of areas of the white boxes that correspond to pyloric activity. These values for the intrinsic conductances panels are PD = 0.58, LP = 0.50, PY = 0.49, and the proportions for the synaptic conductances panels are PDPY = 0.62, PDLP = 0.87, and LPPD = 0.94. The occupied areas for synaptic conductances are larger than in the intrinsic conductances panels, consistent with our finding that the circuits’ activities are more robust to changes in synaptic conductances versus changes in intrinsic conductances."

      "As before, we computed the proportion of areas of pyloric activity within the white boxes: PD = 0.61, LP = 0.55, PY = 0.52, and the proportions for the synaptic conductances panels are PDPY = 0.88, PDLP = 0.87, and LPPD = 0.83. These results provide an intuition of the complexities of GP . Not only are these regions hard-to-impossible to characterize in one circuit, but they are also different across circuits."
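
      The area proportions quoted above can be pictured as the fraction of sampled points inside a box that are classified as pyloric. The sketch below assumes the box is sampled on a regular grid and that each point already carries a boolean classification; the grid, its values, and the function name `pyloric_fraction` are hypothetical and do not reproduce the authors' data or code.

```python
def pyloric_fraction(grid):
    """Fraction of sampled grid points classified as pyloric (True)."""
    cells = [cell for row in grid for cell in row]
    return sum(cells) / len(cells)


# Hypothetical 4 x 4 sampling of one pair of conductances.
grid = [
    [True, True, False, False],
    [True, True, True, False],
    [False, True, True, False],
    [False, False, True, True],
]
fraction = pyloric_fraction(grid)
```

      A larger fraction means that more of the sampled region preserves pyloric activity, which is the sense in which the synaptic panels are described as more robust.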

      (14) Does the sign of the synaptic weights affect the conclusions?

      We did not explore this issue because all chemical synapses in this network are inhibitory.

      (15) Line 492: typo: deltai.

      We fixed this.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 301 - you can also add Williams and Fletcher 2019 Neuron.

      We added the reference. Thank you. 

      (2) Line 316 - this is a strange comment, as these are the exact regions that were shown to exhibit intrinsic plasticity (e.g., Losonczy, Attila, Judit K. Makara, and Jeffrey C. Magee. "Compartmentalized dendritic plasticity and input feature storage in neurons." Nature 452.7186 (2008): 436-441).

      We did not understand this comment. 

      (3) I found only one citation for the work of Turrigiano, the most relevant of which is only mentioned in the Method section. This is odd, as her work directly relates how synaptic conductance perturbation results in changes in intrinsic conductance.

      We included more references to the work of Turrigiano to provide more context. 

      Desai, Niraj S., Lana C. Rutherford, and Gina G. Turrigiano. "Plasticity in the intrinsic excitability of cortical pyramidal neurons." Nature neuroscience 2, no. 6 (1999): 515-520.

      Desai, Niraj S., Sacha B. Nelson, and Gina G. Turrigiano. "Activity-dependent regulation of excitability in rat visual cortical neurons." Neurocomputing 26 (1999): 101-106.

      (4) Line 329 - The list of citations is very limited regarding studies of ext/int balance which started really way before 2009. Please give some of the credit to the classics.

      We included the following additional references.

      Van Vreeswijk, Carl, and Haim Sompolinsky. "Chaos in neuronal networks with balanced excitatory and inhibitory activity." Science 274, no. 5293 (1996): 1724-1726.

      Rubin, Ran, L. F. Abbott, and Haim Sompolinsky. "Balanced excitation and inhibition are required for high-capacity, noise-robust neuronal selectivity." Proceedings of the National Academy of Sciences 114, no. 44 (2017): E9366-E9375.

      Wang, Xiao-Jing. "Macroscopic gradients of synaptic excitation and inhibition in the neocortex." Nature reviews neuroscience 21, no. 3 (2020): 169-178.

      Lo, Chung-Chuan, Cheng-Te Wang, and Xiao-Jing Wang. "Speed-accuracy tradeoff by a control signal with balanced excitation and inhibition." Journal of Neurophysiology 114, no. 1 (2015): 650-661.

      (5) In Figure 1B, why does it say 'OFF' when the neuron is spiking?

      The label indicates the interval of time elapsed between the first spike in the PD neuron (taken as a reference), and the last spike in the burst (PD off). 

      Summary of changes to figures:

      Figure 1:

      Fixed labels indicating bursting period and burst duration.

      Figure 5:

      Added labels in panels C and D specifying the symbol corresponding to the sigmoidal parameter.

      Additional changes

      We changed the title of the manuscript as follows:

      "Rhythmic circuit function is more robust to changes in synaptic than intrinsic conductances."

      We included the following sentence at the end of the Discussion section.

      "We believe our results will hold for other rhythmic circuits and will be relevant for similar studies in other circuits with more complex functions."

      We realized we made a mistake with the units for maximal conductances. They were incorrectly expressed in nS (nano Siemens) in the figure labels, and correctly expressed in micro Siemens in the methods section. This was fixed and now conductances are expressed in micro Siemens consistently in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Summary:

      The authors examine the role of the medial prefrontal cortex (mPFC) in cognitive control, i.e. the ability to use task-relevant information and ignore irrelevant information, in the rat. According to the central-computation hypothesis, cognitive control in the brain is centralized in the mPFC and according to the local hypothesis, cognitive control is performed in task-related local neural circuits. Using the place avoidance task which involves cognitive control, it is predicted that if mPFC lesions affect learning, this would support the central computation hypothesis whereas no effect of lesions would rather support the local hypothesis. The authors thus examine the effect of mPFC lesions in learning and retention of the place avoidance task. They also look at functional interconnectivity within a large network of areas that could be activated during the task by using cytochrome oxidase, a metabolic marker. In addition, electrophysiological unit recordings of CA1 hippocampal cells are made in a subset of (lesioned or intact) animals to evaluate overdispersion, a firing property that reflects cognitive control in the hippocampus. The results indicate that mPFC lesions do not impair place avoidance learning and retention (though flexibility is altered during conflict training) and do not affect cognitive control seen in hippocampal place cell activity (alternation of frame-specific firing), but reduce overdispersion, a measure of location-specific firing variability, in pretraining. The lesions nevertheless have some effect on functional interconnections. The results overall support the local hypothesis.

      Strengths:

      Straightforward hypothesis: clarification of the involvement of the mPFC in the brain is expected and achieved. Appropriate use of fully mastered methods (behavioral task, electrophysiological recordings, measure of metabolic marker cytochrome oxidase) and rigorous analysis of the data. The conclusion is strongly supported by the data. 

      Weaknesses:

      No notable weaknesses in the conception, making of the study, and data analysis. The introduction does not mention important aspects of the work, i.e. cytochrome oxidase measure and electrophysiological recordings. The study is actually richer than expected from the introduction. 

      The revised Introduction now includes:

      “We used cytochrome oxidase, a metabolic marker of baseline neuronal activity, to confirm the mPFC lesions were effective and that there are non-local network consequences despite the local lesion. We first evaluated cytochrome oxidase activity in regions known to be associated with performance in the active place avoidance task, or regions with known connectivity to the mPFC. We then evaluated covariance of activity amongst the regions in an effort to detect network consequences of the lesion.”

      Reviewer #2 (Public review): 

      Park et al. set out to test two competing hypotheses about the role of the medial prefrontal cortex (PFC) in cognitive control, the ability to use task-relevant cues and ignore task-irrelevant cues to guide behavior. The "central computation" hypothesis assumes that cognitive control relies on computations performed by the PFC, which then interacts with other brain regions to accomplish the task. Alternatively, the "local computation" hypothesis suggests that computations necessary for cognitive control are carried out by other brain regions that have been shown to be essential for cognitive control tasks, such as the dorsal hippocampus and the thalamus. If the central computation hypothesis is correct, PFC lesions should disrupt cognitive control. Alternatively, if the local computation hypothesis is correct, cognitive control would be spared after PFC lesions. The task used to assess cognitive control is the active place avoidance task in which rats must avoid a section of a rotating arena using the stationary room cues and ignoring the local olfactory cues on the rotating platform. Performance on this task has previously been shown to be disrupted by hippocampal lesions and hippocampal ensembles dynamically represent the room and arena depending on the animal's proximity to the shock zone. They found no group (lesion vs. sham) differences in the three behavioral parameters tested: distance traveled, latency to enter the shock zone, and number of shock zone entries for both the standard task and the "conflict" task in which the shock zone was rotated by 180 degrees. The only significant difference was the savings index; the lesion group entered the new shock zone more often than the sham group during the first 5 minutes of the second conflict session. This deficit was interpreted as a cognitive flexibility deficit rather than a cognitive control failure.

      Next, the authors compared cytochrome oxidase activity between sham and lesion groups in 14 brain regions and found that only the amygdala showed significant elevation in the lesion vs. sham group. Pairwise correlation analysis revealed a striking difference between groups, with many correlations between regions lost in the lesion group (between reuniens and hippocampus, and between reuniens and amygdala) and a correlation between dorsal CA1 and central amygdala that appeared in the lesion group and was absent in the sham group. Finally, the authors assessed dorsal hippocampal representations of the spatial frame (arena vs. room) and found no differences between lesion and sham groups. The only difference in hippocampal activity was reduced overdispersion in the lesion group compared to the sham group on the pretraining session only and this difference disappeared after the task began. Collectively, the authors interpret their findings as supporting the local computation hypothesis; computations necessary for cognitive control occur in brain regions other than the PFC.

      Strengths:

      (1) The data were collected in a rigorous way with experimental blinding and appropriate statistical analyses. 

      (2) Multiple approaches were used to assess differences between lesion and sham groups, including behavior, metabolic activity in multiple brain regions, and hippocampal single-unit recording.

      Weaknesses:

      (1) Only male rats were used with no justification provided for excluding females from the sample.

      This is a weakness we acknowledge. The experiments were performed at a time when we did not have female rats in the lab.

      (2) The conceptual framework used to interpret the findings was to present two competing hypotheses with mutually exclusive predictions about the impact of PFC lesions on cognitive control. The authors then use mainly null findings as evidence in support of the local computation hypothesis. They acknowledge that some people may question the notion that the active place avoidance task indeed requires cognitive control, but then call the argument "circular" because PFC has to be involved in cognitive control. This assertion does not address the possibility that the active place avoidance task simply does not require cognitive control. 

      We beg to differ that the possibility was not addressed. Prior to making the assertion, the manuscript describes the evidence that the active place avoidance task requires cognitive control. The evidence is multifold, and includes task design, behavior, and electrophysiology; we argue that this is more evidence than has been provided for other tasks that are asserted to require cognitive control. Specifically line 417 states:

      “We have previously demonstrated cognitive control in the active place avoidance task variant we used (Fig. 1) because the rats must ignore local rotating place cues to avoid the stationary shock zone. Even when the arena does not rotate, rats distinctly learn to avoid the location of shock according to distal visual room cues and local olfactory arena cues, such that the distinct place memories can be independently manipulated using probe trials [49, 50]. When the arena rotates as in the present studies, neural manipulations that impair the place avoidance are no longer impairing when the irrelevant arena cues are hidden by shallow water [14, 15, 51, 52]. Furthermore, persistent hippocampal neural circuit changes caused by active place avoidance training are not detected when shallow water hides the irrelevant arena cues to reduce the cognitive control demand [10, 31, 33]. While these findings unequivocally demonstrate the salience of relevant stationary room cues to use for avoiding shock and irrelevant arena cues to ignore during active place avoidance, the most compelling evidence of cognitive control comes from recording hippocampal ensemble discharge. Hippocampal ensemble discharge purposefully represents current position using stationary room information when the subject is close to the stationary shock zone and alternatively represents rotating arena information when the mouse is far from the stationary shock zone [Fig. 4; 10].”

      Line 436, however, acknowledges a fact that will always be true: no matter what anyone opines - until there are universally agreed upon objective criteria, it is logically possible that active place avoidance does not require cognitive control. The revision states: Despite this evidence from task design, behavioral observations, and direct electrophysiological representational switching as required to directly demonstrate cognitive control, one might still argue that it is logically possible that the active place avoidance task does not require cognitive control and this is why the mPFC lesion did not impair place avoidance of the initial shock zone. We consider such reasoning to be unproductive because it presumes that only tasks that require an intact mPFC can be cognitive control tasks. We nonetheless acknowledge that for some, we have not provided sufficient evidence that the active place avoidance requires cognitive control.

      “We assert the evidence is compelling, and together these findings require rejecting the central-computation hypothesis that the mPFC is essential for the neural computations that are necessary for all cognitive control tasks.”

      (3) The authors did not link the CO activity with the behavioral parameters even though the CO imaging was done on a subset of the animals that ran the behavioral task nor did they make any attempt to interpret these findings in light of the two competing hypotheses posed in the introduction. Moreover, the discussion lacks any mechanistic interpretations of the findings. For example, there are no attempts to explain why amygdala activity and its correlation with dCA1 activity might be higher in the PFC lesioned group. 

      The CO study was performed to assess the effects of the lesion, as stated on line 262 “Cytochrome oxidase (CO), a sensitive metabolic marker for neuronal function [27], was used to evaluate whether lesion effects were restricted to the mPFC.” Furthermore, as a matter of fact, line 411 states “Thus, CO imaging and electrophysiological evidence identify changes in the brain beyond the directly damaged mPFC area. In particular, the dorsal hippocampus loses the inhibitory input from mPFC [45, 46] and loses the metabolic correlation with the nucleus reuniens, which is thought to be a relay between the mPFC and the dorsal hippocampus [47, 48].”

      These CO measures assess baseline metabolic function and so it would be inappropriate to correlate them with the measures of behavior. Because the lesion and control groups do not differ on most measures of behavior, a relationship to CO measures is not expected. Importantly, even if there were differences in correlations between CO activity and behavioral measures, what could they mean? The study was designed to distinguish between two hypotheses, not to determine what CO differences could mean for behavior. As such, it is not at all clear how metabolic consequences of the lesion relate to the two hypotheses being evaluated, and so we consider it inappropriate to speculate. We did examine, and now include, the correlation between lesion size and conflict behavior. The Fig. 1 legend states “Savings was not related to lesion size r = 0.009, p = 0.98. *p < 0.05.”

      (4) Publishing null results is important to avoid wasting animals, time, and money. This study's results will have a significant impact on how the field views the role of the PFC in cognitive control. Whether or not some people reject the notion that the active place avoidance task measures cognitive control, the findings are solid and can serve as a starting point for generating hypotheses about how brain networks change when deprived of PFC input. 

      We thank the reviewer for the acknowledgement.

      Reviewer #3 (Public review): 

      Summary:

      This study by Park and colleagues investigated how the medial prefrontal cortex (mPFC) influences behavior and hippocampal place cell activity during a two-frame active place avoidance task in rats. Rats learned to avoid the location of mild shock within a rotating arena, with the shock zone being defined relative to distal cues in the room. Permanent chemical lesions of the mPFC did not impair the ability to avoid the shock zone by using distal cues and ignoring proximal cues in the arena. In parallel, hippocampal place cells alternated between two spatial tuning patterns, one anchored to the distal cues and the other to the proximal cues, and this alteration was not affected by the mPFC lesion. Based on these findings, the authors argue that the mPFC is not essential for differentiating between task-relevant and irrelevant information. 

      Strengths:

      This study was built on substantial work by the Fenton lab that validated their two-frame active place avoidance task and provided sound theoretical and analytical foundations. Additionally, the effectiveness of mPFC lesions was validated by several measures, enabling the authors to base their argument on the lack of lesion effects on behavior and place cell dynamics. 

      Weaknesses:

      The authors define cognitive control as "the ability to judiciously use task-relevant information while ignoring salient concurrent information that is currently irrelevant for the task." (Lines 77-78). This definition is much simpler than the one by Miller and Cohen: "the ability to orchestrate thought and action in accordance with internal goals (Ref. 1)" and by Robbins: "processes necessary for optimal scheduling of complex sequence of behaviour." (Dalley et al., 2004, PMID: 15555683). Differentiating between task-relevant and irrelevant information is required in various behavioral tasks, such as differential learning, reversal learning, and set-shifting tasks. Previous rodent behavioral studies have shown that the integrity of the mPFC is necessary for set-shifting but not for differential or reversal learning (e.g., Enomoto et al., 2011, PMID: 21146155; Cho et al., 2015, PMID: 25754826). In the present task design, the initial training is a form of differential learning between proximal and distal cues, and the conflict training is akin to reversal learning. Therefore, the lack of lesion effects is somewhat expected. It would be interesting to test whether mPFC lesions impair set-shifting in their paradigm (e.g., the shock zone initially defined by distal cues and later by proximal cues). If the mPFC lesions do not impair this ability and associated hippocampal place dynamics, it will provide strong support for the authors' local computation hypothesis.

      Thank you for these comments. In addressing them we have provided a significant revision to the manuscript’s Introduction. While authors like those cited by the reviewer have defined cognitive control, those definitions are difficult to test rigorously, as it is almost a matter of opinion whether a subject is displaying “the ability to orchestrate thought and action in accordance with internal goals" or whether they are using "processes necessary for optimal scheduling of complex sequence of behaviour." What would such definitions of cognitive control predict about neuronal activity? We have deliberately used a simple, operational definition of cognitive control because it is physiologically testable. In the revision, starting at line 93, we have provided an excerpt from Miller and Cohen (2001) with discussion. The importance of that work is that it provides explicit neuronal criteria and a means to operationally define cognitive control. As stated on Line 118 “Accordingly, cognitive control would be at work when there is sustained neuronal network representations of task-relevant information that suppresses or gates representations of salient task-irrelevant information in accord with purposeful judicious behavior.”

      We used an R+A- task variant in which there is a stationary room-frame shock zone and task-irrelevant arena-frame information. A strict correspondence to set-shifting task design cannot be accomplished with active place avoidance because an A+R- task that requires avoiding an arena-frame shock zone in the absence of a room-frame shock zone can be accomplished trivially if the subject chooses to not move when it is in a place with no shock. However, the R+A+ task variant is readily learned, in which there is both a room-frame and an arena-frame shock zone (see cited work below). This task variant requires the subject to judiciously shift between avoiding the room-frame shock zone using stationary room information and avoiding the arena-frame shock zone using rotating arena information. This R+A+ task variant might meet the reviewer’s criteria for cognitive control. We have recorded hippocampal and entorhinal ensemble activity during the R+A+ task variant and it is very similar to the activity during the R+A- task we used. Nonetheless, future work will investigate the effect of mPFC lesion on the R+A+ task variant.

      Cited work:

      Fenton AA, Wesierska M, Kaminsky Y, Bures J (1998), Both here and there: simultaneous expression of autonomous spatial memories in rats. Proc Natl Acad Sci U S A 95:11493-11498.

      Kelemen E, Fenton AA (2010), Dynamic grouping of hippocampal neural activity during cognitive control of two spatial frames. PLoS Biol 8:e1000403.

      Burghardt NS, Park EH, Hen R, Fenton AA (2012), Adult-born hippocampal neurons promote cognitive flexibility in mice. Hippocampus 22:1795-1808.

      Park EH, Keeley S, Savin C, Ranck JB, Jr., Fenton AA (2019), How the Internally Organized Direction Sense Is Used to Navigate. Neuron 101:1-9.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors): 

      (1) Incorporate the cytochrome oxidase and hippocampal recordings (rationale and hypothesis) in the introduction, explaining how these aspects are relevant to the general question. 

      We have done this as requested. See lines 159-173 of the revised introduction.

      (2) Figure 1C. On Day 4-5 (conflict training) in which the shock zone was relocated 180 deg from the initial location, the behavioral tracks did not show any presence of the rat in this sector (in particular for the lesion example). Figure 4 nevertheless indicates that entrances have been made (which was expected since rats have to know that the shock zone was relocated).

      Thanks for pointing this out. The tracks are from the end of the sessions. The labels have been changed to specify which trials the tracks are from.

      (3) Figure 1C. The caption is huge as it contains the statistical analyses details. I would prefer to have these details in the text and keep the caption at a "reasonable" length. At the end of the caption (l. 190-191), it would be less confusing to keep the numbering of the training days (replace D1T1 with D2T1 and D2T9 with D3T9).

      The statistical details have been relocated to the main text and the numbering updated, as suggested, thank you.

      (4) It was not inconsiderable to show that mPFC lesion had some effects in the present task if it were only to validate the effectiveness of the lesion. This brain area has been shown to be important for planning, cognitive flexibility, etc. Indeed the authors found that the saving index was greater in sham than in mPFC rats (overdispersion in hippocampal firing was also reduced in pretraining) and interpreted this result as impaired flexibility. Would an alternative explanation be a memory deficit? I nevertheless expected that impaired flexibility in mPFC rats would be expressed in conflict trials in the form of more entrances in the zone that was initially not associated with shock (at least in the first trials of Day 4). But it appears to not be the case.

      A memory deficit is unlikely to explain the difference between the groups on the first trial of Day 5. Memory in the lesion rats was tested multiple times, specifically at the start of each trial (time to first entrance), including on the 24-h retention test, and no deficits were observed. Performance on Day 9 trial 1 is worse in the lesion group than in the controls, but it is not parsimonious to attribute this to a simple memory deficit since 24-h memory was good and similar between lesion and control rats on days 3 and 4, and memory on Day 5 was equally poor in both the lesion and control rats, as measured by time to first entrance.  

      (5) Material and methods. The injected volume of ibotenic acid should be mentioned. 

      The volume 0.2 µl was added. See line 531.

      (6) The rationale for doing the conflict training session should be indicated somewhere. 

      The rationale was provided. See lines 204-208.

      Reviewer #2 (Recommendations for the authors): 

      (1) Line 132: The text states that all sham rats improved and only 6/10 lesion rats improved is followed by a t-test, which tests the difference between means; it does not compare proportions. Also, what criterion was used to determine if an improvement was seen or not? 

      The statistical comparison is provided (now line 230: test of proportions z = 2.3, p = 0.03). Improvement was simply numerically fewer entrances.
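
      As a minimal sketch of what a test of proportions computes, the pooled two-sample z-test below compares the fraction of rats that improved in each group. The sham group size of 10 is an assumption made only for illustration (the response states that all sham rats and 6/10 lesion rats improved, but not the sham n), and the function name is hypothetical. The sketch is not claimed to reproduce the reported z and p values.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-sided z-test for a difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis of equal proportions.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed counts: 10 of 10 sham rats vs 6 of 10 lesion rats improved.
z, p = two_proportion_z_test(10, 10, 6, 10)
print(f"z = {z:.2f}, p = {p:.3f}")
```

      With groups this small, the normal approximation is rough, and an exact test (for example Fisher's exact test) is a common alternative.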

      (2) Line 138: This is a very long and confusing sentence. Consider revising for clarity. 

      The sentence (now line 234) was revised.

      (3) Figure 1B only includes data from 3 animals. Most published studies show the whole dataset by presenting the largest and smallest lesions. 

      Supplemental Figure S2 was added with all the lesions depicted and quantified.

      (4) Figure 1C suggestion to make the schematic shock zone line up with the shock zone shown for the tracking data. 

      Graphically, it looks better as drawn because it uses perspective to depict a three-dimensional structure.

      (5) Methods: Clarify if the shock zone location was the same across all rats. 

      Line 570 states that the shock zone was the same for all rats.

      (6) Line 158: "Behavioral tracks" is not clear. Suggest more precise wording.

      Reworded to “Tracked room-frame positions” (now line 249)

      (7) Line 166: "effect of trial" - should this be the main effect of trial?; "interaction" - should this be "group x trial" interaction? 

      Reworded (now line 181).

      (8) Line 167: "or their interaction" is awkward in the context of the sentence. 

      Reworded (now line 182).

      (9) Line 182: Avoid talking about "trends" as if they are almost significant unless the authors suspect that they did not have sufficient statistical power to detect differences. In that case, a power analysis should be provided. 

      Removed.

      (10) Line 190: "left:...right..." is hard to follow, especially with acronyms like D1T1. Consider revising for clarity. 

      Revised (now lines 246-248).

      (11) Line 195: "effectiveness of the PFC to impair" is unnecessarily verbose. 

      Reworded (now lines 255-257).

      (12) Savings results: There is a lot of variability in the lesion group. It would be interesting to know if the extent of the lesion correlates with savings.

      Savings was not related to lesion. See line 259.

      (13) Line 300: The thalamic recording results are not reported in the results section (other than appearing in the table). Moreover, there is no detail about which thalamic nucleus these recordings are from.

      Lines 411 and 614 provide these details.

      (14) Line 312: "no longer impair" contains a grammatical error. 

      Corrected (now line 422)

      (15) Line 325: "was not impairing" contains a grammatical error. 

      Corrected (now line 437).

      (16) Line 327: The sentence ending with "...opinion of others" seems unnecessarily confrontational. 

      Previous reviewers at other journals have maintained this position; we therefore included such a strong statement in our initial submission. However, we have now revised this statement to avoid appearing confrontational.

      (17) Line 329: Sentence is awkward. Consider revising. 

      Revised (now line 443).

      (18) Line 384: The authors should disclose if there was an objective metric for determining the adequacy of the lesion. 

      The lesion assessment and quantification are better explained in the Methods under “Cytochrome oxidase activity and Nissl staining” (lines 708-714).

      (19) Line 385: The authors should clarify how they got from 15 rats (Line 376) to 10. 

      This information is provided in the methods.

      (20) Line 390: It is not clear why skin irritation in the cage mate would prevent the rat from being tested. 

      This has been explained in the Methods under “Behavioral analysis followed by cytochrome oxidase activity” (lines 515-518).

      (21) Methods section: The authors should describe how the tracking data were acquired. Overhead camera? Tracker based on luminance or body position? What software program was used? What was the sampling rate? 

      This is now better explained in the Methods under “Active place avoidance task” (lines 538-551).

      (22) Methods section: Include how fast the arena was rotating and other details about the task such as where rats were placed during the ITI. 

      Better explained in the Methods under “Active place avoidance task”.

      (23) Line 439: The recording system used (hardware & software) should be stated. 

      This is now included in the Methods (line 538).

      (24) Line 435: Though overdispersion calculation is described thoroughly, there is nothing in the paper that tells me what overdispersion means. 

      What the measure means is now described in the Methods under “Electrophysiology data analysis” (lines 646-650).

      (25) Line 561: The test used to assess effect sizes should be stated. 

      Effect sizes corresponding to the statistical tests are provided.

      Reviewer #3 (Recommendations for the authors): 

      (1) At the end of the conflict training, rats with mPFC lesions learned to avoid the new shock zone (Figure 1F, Block 16), but their place cells did not show room-preferring activity near the shock zone (Figure 4B). This observation questions whether spatial frame-specific representation is relevant for active avoidance. Can the authors clarify this point?

      This is a dynamic behavior and the hippocampal dynamics match, changing on a timescale of a few seconds, as we have shown in several published papers. The lack of a preference averaged over 20 minutes when the rats are avoiding both the current and former shock zones during the conflict session is pretty much what would be expected from such a coarse measurement. The important measure is the spatially-resolved measure of room versus arena preference. Figure 4B shows that in the lesion rats there is generally less of a frame preference during conflict (consistent with poorer flexibility). However, Figure 4D quantifies the frame preference near and far from the shock zone and, by that measure, there is no difference between the groups.

      (2) Related to the point above, the author might consider including panels in Figures 4C and D to show the neural activity during the pretraining and conflict training retention period. I assume p(room) will be comparable between the Near and Far segment in both sessions, but the p(room) may be higher in the Conflict training session than the Pretraining session. This would show that the mPFC lesion impairs suppressing the place cell activity encoding the old shock location. 

      Thanks for the suggestion. While we don’t think we can draw any strong conclusions from this analysis, we are fine to show it. The issue is that during conflict, the rats have two perfectly reasonable representations of where there was shock, the initial location that was turned off to make the conflict, and the most recent conflict location of shock. Importantly, these recordings are during conflict retention after we turned off the shock for the retention recording (for the second time in the rat’s experience). Turning off the shock allows us to exactly match the physical conditions of pretraining, initial retention and conflict retention, which was the experimental design’s goal. However, the experiential history of the rats prior to initial retention and conflict retention cannot match, because during initial retention the rats had never experienced a changed shock zone whereas, by conflict retention, they had experienced multiple changes. Importantly, we have previously shown that mouse hippocampal ensembles represent both initial and conflict shock locations, as the animals consider their options during conflict trials (see Dvorak et al 2018, PLoS Biol 16:e2003354). Consequently, we cannot make any strong predictions about whether or not hippocampal activity during conflict retention should be room-frame preferring selectively in the vicinity of the current shock zone. As I am sure the reviewer appreciates from their own introspection, mental representations are mercifully not obliged to dictate behavior. In fact, that is what is interesting and controversial about cognitive control – it is a dynamic internal process and the innovation of our work lies in demonstrating that one cannot rely only on behavior to assess this process. Nonetheless, we did this analysis and now present it in the revised Fig. 4. During pretraining both lesion and sham groups express no particular spatially-modulated preference for either the room or the arena frame, as expected. During initial training both groups express a room-frame preference in the vicinity of the shock zone, as we initially reported. By inspection, during conflict, the sham rats express a preference for room-frame activity in the vicinity of the most recent shock zone location; this preference is weaker than what is expressed during initial retention. The lesion rats do not show this preference. These impressions are quantified in revised Fig. 4D; the comparisons within the conflict retention sessions did not reach statistical significance. We leave it to the reader to interpret what that means. Thanks for the nudge.

      (3) The significant group difference in place cell overdispersion during the pretraining phase (Figure 3C) is interesting, but some readers would appreciate additional sentences on its functional implication. Does it mean the spatial tuning of place cells was disrupted by the mPFC lesion?

      Only the reliability of spatial firing was altered, not the spatial tuning.

      (4) Although the method section described how to calculate overdispersion and SFEP, some concise, intuitive descriptions of these measures in the result section would help readers understand these results.

      Overdispersion is better explained. See lines 646-650.

      (5) I recommend adding a figure of the task performance of the rats used in the electrophysiological recording experiment and a table summarizing the number of cells recorded per animal. 

      We have included Table S2 with the cell counts and a summary of the performance for each of the rats in the electrophysiological recording experiment.

      (6) Readers would appreciate additional information on task apparatus, such as the size, appearance, and rotating speed of the arena, as well as stationary cues available in the room. 

      This is now provided in the Methods under “Active place avoidance task”.

      (7) Lines 425-416: "On the fourth day of the behavioral training, the rats had a single trial with the shock on to test retention of the training." Shouldn't it be "shock off"? 

      No, the shock was on to prevent extinction learning and to increase the challenge for conflict learning.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Overall, the conclusions of the paper are mostly supported by the data but may be overstated in some cases, and some details are also missing or not easily recognizable within the figures. The provision of additional information and analyses would be valuable to the reader and may even benefit the authors' interpretation of the data.

      We thank the reviewer for the thoughtful and constructive feedback. We are pleased that the reviewer found the overall conclusions of our paper to be well supported by the data, and we appreciate the suggestions for improving figure clarity and interpretive accuracy. Below we address each point raised:

      The conclusion that DREADD expression gradually decreases after 1.5-2 years is only based on a select few of the subjects assessed; in Figure 2, it appears that only 3 hM4Di cases and 2 hM3Dq cases are assessed after the 2-year timepoint. The observed decline appears consistent within the hM4Di cases, but not for the hM3Dq cases (see Figure 2C: the AAV2.1-hSyn-hM3Dq-IRES-AcGFP line is increasing after 2 years.)

      We agree that our interpretation should be stated more cautiously, given the limited number of cases assessed beyond the two-year timepoint. In the revised manuscript, we will clarify in both the Results and Discussion that the observed decline is based on a subset of animals. We will also state that while a consistent decline was observed in hM4Di-expressing monkeys, the trajectory for hM3Dq expression was more variable, with at least one case showing an increase in signal beyond two years.

      Given that individual differences may affect expression levels, it would be helpful to see additional labels on the graphs (or in the legends) indicating which subject and which region are being represented for each line and/or data point in Figure 1C, 2B, 2C, 5A, and 5B. Alternatively, for Figures 5A and B, an accompanying table listing this information would be sufficient.

      We thank the reviewer for these helpful suggestions. In response, we will revise the relevant figures as noted in the “Recommendations for the authors”, including simplifying visual encodings and improving labeling. We will also provide a supplementary table listing the animal ID and brain regions for each data point shown in the graphs.

      While the authors comment on several factors that may influence peak expression levels, including serotype, promoter, titer, tag, and DREADD type, they do not comment on the volume of injection. The range in volume used per region in this study is between 2 and 54 microliters, with larger volumes typically (but not always) being used for cortical regions like the OFC and dlPFC, and smaller volumes for subcortical regions like the amygdala and putamen. This may weaken the claim that there is no significant relationship between peak expression level and brain region, as volume may be considered a confounding variable. Additionally, because of the possibility that larger volumes of viral vectors may be more likely to induce an immune response, which the authors suggest as a potential influence on transgene expression, not including volume as a factor of interest seems to be an oversight.

      We thank the reviewer for raising this important issue. We agree that injection volume is a potentially confounding variable. In response, we will conduct an exploratory analysis including volume as an additional factor. We will also expand the Discussion to highlight the need for future systematic evaluation of injection volume, especially in relation to immune responses or transduction efficiency in different brain regions.

      The authors conclude that vectors encoding co-expressed protein tags (such as HA) led to reduced peak expression levels, relative to vectors with an IRES-GFP sequence or with no such element at all. While interesting, this finding does not necessarily seem relevant for the efficacy of long-term expression and function, given that the authors show in Figures 1 and 2 that peak expression (as indicated by a change in binding potential relative to non-displaced radioligand, or ΔBPND) appears to taper off in all or most of the constructs assessed. The authors should take care to point out that the decline in peak expression should not be confused with the decline in longitudinal expression, as this is not clear in the discussion; i.e. the subheading, "Factors influencing DREADD expression," might be better written as, "Factors influencing peak DREADD expression," and subsequent wording in this section should specify that these particular data concern peak expression only.

      We appreciate this important clarification. In response, we will revise the subheading to “Factors influencing peak DREADD expression levels”, and we will specify that our analysis focused on peak ΔBPND values around 60 days post-injection. We will also explicitly distinguish these findings from the later-stage changes in expression seen in the longitudinal PET data in both the Results and Discussion sections.

      Reviewer #2 (Public review):

      Weaknesses

      This study is a meta-analysis of several experiments performed in one lab. The good side is that it combined a large amount of data that might not have been published individually; the downside is that all things were not planned and equated, creating a lot of unexplained variances in the data. This was yet judiciously used by the authors, but one might think that planned and organized multicentric experiments would provide more information and help test more parameters, including some related to inter-individual variability, and particular genetic constructs.

      We thank the reviewer for bringing this important point to our attention. We fully agree that the retrospective nature of our dataset, compiled from multiple studies conducted within a single laboratory, introduces variability due to differences in constructs, injection sites, and timelines. While this reflects the real-world constraints of long-term NHP research, we acknowledge the need for more standardized approaches. We will add a statement in the revised Discussion emphasizing that future multicenter and harmonized studies would be valuable for systematically examining specific parameters and inter-individual variability.

      Reviewer #3 (Public review):

      Minor weaknesses are related to a few instances of suboptimal phrasing, and some room for improvement in time course visualization and quantification. These would be easily addressed in a revision.

      These findings will undoubtedly have a very significant impact on the rapidly growing but still highly challenging field of primate chemogenetic manipulations. As such, the work represents an invaluable resource for the community.

      We thank the reviewer for the positive assessment of our manuscript and for the constructive suggestions noted in the “Recommendations for the authors”. In response, we will carefully review and revise the manuscript to improve visualization and quantification.

  6. Mar 2025
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      This is a well-written manuscript that describes a thorough study of the functionality of individual residues of a central component of the ESX-3 type VII secretion system of Mycobacterium smegmatis, EccD3, in the essential role of this protein transport system in iron acquisition. Using the powerful and unbiased approach of deep mutational scanning (DMS), the authors assessed the impact of different mutations on a large number of residues of this component. This carefully executed research highlights the importance of hydrophobic residues at the center of the ubiquitin-like domain, specific residues of the linker domain that connects this domain with the transmembrane domains and specific residues that connect EccD3 with the MycP3 component.

      Major comments

      Since the LOF effects in the iron-sufficient and iron-deficient condition differ less than expected, the differences of the DMS results between these two conditions should be better presented, explained and discussed: 1. The authors discuss: "Of the 270 LOF mutations seen in the iron-deficient condition, 37 (13.7%) were tolerant in the iron sufficient condition, and 39 (14.44%) had strong LOF effects but weak LOF effects in the iron sufficient condition." Do the authors mean that 39 (14.44%) had strong LOF effects in the iron-deficient condition, but weak LOF effects in the iron-sufficient condition? In turn, does this mean that the remaining mutants (71.9%) had similar LOF effects in the two conditions?

      We thank this reviewer for their comment and for highlighting a lack of clarity. We have updated the main text to more effectively communicate our point - that 270 mutants had LOF effects in the iron-deficient media. 37 of these 270 mutants were tolerant in the iron-sufficient media. 39 of these 270 mutants had strong LOF effects in iron-deficient media, but were weak LOF in iron-sufficient media. The remaining 124/270 mutants had weak LOF effects in both conditions. The larger point is that removing iron leads to stronger selection - tolerant mutants become LOF, weak LOF become strong LOF. Removing iron pushes mutants at the bounds over the limit.
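
      The counts quoted in this reply can be tallied with a short sketch. The numbers are taken from the text above; the category labels and variable names are illustrative only.

```python
# LOF mutations in the iron-deficient condition, as stated in the reply.
total_lof = 270

# Category counts as stated in the reply; the labels are illustrative.
categories = {
    "tolerant in iron-sufficient media": 37,
    "strong LOF (deficient), weak LOF (sufficient)": 39,
    "weak LOF in both conditions": 124,
}

# Share of the 270 LOF mutations in each category.
percentages = {
    name: 100 * count / total_lof for name, count in categories.items()
}
# Mutations not assigned to any of the three stated categories.
not_listed = total_lof - sum(categories.values())

for name, count in categories.items():
    print(f"{name}: {count} ({percentages[name]:.2f}%)")
print(f"not covered by the three categories: {not_listed}")
```

      The first two shares match the 13.7% and 14.44% quoted by the reviewer. The three stated categories sum to 200, so the classification of the other 70 of the 270 mutations is not given in this reply.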

      The diagonal shape of the scatter plot in Fig. 2C, which shows the correlation of the Enrich2 scores of all mutants in the two conditions, indicates that the growth of most mutants is affected similarly in these conditions, but in Fig. 2D lower graph, which shows only the Enrich2 scores of missense mutants, there are clear differences between the two conditions. How can this be explained?

      We apologize for any confusion created by this presentation of our data. We hoped to highlight that while effects are largely similar across conditions, there are some differences. As communicated in our first response, 270 out of our ~2700 missense mutations had LOF effects in the iron-deficient condition. 37 of these 270 mutants were tolerant in the iron-sufficient media. 39 of these 270 mutants had strong LOF effects in iron-deficient media, but were weak LOF in iron-sufficient media. The remaining 124 mutations had weak LOF effects in both conditions.

      While Figure 2C shows this difference, it is hard to see in a scatter plot. We have added contours to highlight how our data are distributed. Our density plots in Figure 2D are meant to highlight these differences, where the top plot shows the effects of all missense mutations. Negatively scored mutations represent LOF effects, mutations with scores around 0 are considered tolerant, and the extremely rare mutations with positive scores have GOF effects. Our bottom plot specifically zooms into the negatively scored mutations, to show the 270 LOF mutants we discussed. Specifically, we were hoping to highlight the 39 mutations that have strong LOF effects in iron-deficient media (so the purple line scores are more negative), but weak LOF effects in iron-sufficient media (the green line scores are less negative).
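      The comparison described above can be sketched as a cross-tabulation of effect classes between the two conditions. This is only an illustration: the score cutoffs, the mutation labels, and the example scores below are hypothetical and are not taken from the manuscript, which does not state its thresholds in this response.

```python
from collections import Counter

# Hypothetical cutoffs on the Enrich2 score; the actual thresholds used
# by the authors are not given in this response.
WEAK_LOF_CUTOFF = -1.0
STRONG_LOF_CUTOFF = -2.0


def classify(score):
    """Assign an effect class to a single score.

    Scores above the weak cutoff (including the rare positive, GOF-like
    scores) are lumped together as "tolerant" in this sketch.
    """
    if score <= STRONG_LOF_CUTOFF:
        return "strong LOF"
    if score <= WEAK_LOF_CUTOFF:
        return "weak LOF"
    return "tolerant"


def cross_tabulate(scores):
    """Count mutations per (iron-deficient class, iron-sufficient class).

    `scores` maps a mutation label to a pair
    (score in iron-deficient media, score in iron-sufficient media).
    """
    return Counter(
        (classify(deficient), classify(sufficient))
        for deficient, sufficient in scores.values()
    )


# Made-up scores, for illustration only.
example_scores = {
    "mut_A": (-2.6, -1.4),  # strong LOF without iron, weak LOF with iron
    "mut_B": (-1.3, -0.2),  # weak LOF without iron, tolerant with iron
    "mut_C": (-0.1, 0.0),   # tolerant in both conditions
    "mut_D": (-1.5, -1.2),  # weak LOF in both conditions
}

table = cross_tabulate(example_scores)
for (deficient_class, sufficient_class), count in sorted(table.items()):
    print(f"{deficient_class:>10} -> {sufficient_class:<10} {count}")
```

      Mutations that fall off the diagonal of such a table (for example strong LOF without iron but weak LOF with iron) are the ones the lower density plot in Figure 2D is meant to emphasize.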

      Regarding the authors' explanation for the observed LOF effects in the permissive condition, "This speaks to the sensitivity of next-generation sequencing compared to the strong differences observed between conditions in phenotypic growth curves." But this sensitivity does not explain the observed large LOF effects but no growth difference in the permissive condition, unless the analysis is less quantitative than expected? Could it be that there is local iron depletion in this mixed culture, causing selection pressure even in the iron-sufficient condition? Moreover, the severity of the growth defect at the time of sampling, i.e., after 24 hours of growth, is unclear. Indeed, the growth curve in Fig. 1 shows that the growth of the double mutant in iron-deficient conditions is significantly impaired at that timepoint. In the growth curve in Fig. 2B (and also slightly in Fig. 2F), however, the growth defect is less pronounced: the double mutant has a similar OD600 as the WT strain, although the error bar is larger. Is this variability between replicates also seen in the DMS analysis? In general, no statistics are shown for the DMS analysis and there is no information on the significance of the observed LOF effects. In addition, the legend should explain how many replicates the DMS data are based on.

      We thank this reviewer for their comment and for highlighting a point of confusion. In addition to the increased sensitivity of next-generation sequencing compared to our growth curve experiments, our data analysis and variant scoring were performed by comparing the growth rates of our mutant strains to that of our wild type strain. So, any effect on viability or growth rate caused by expressing mutant variants will be more notable in our DMS scoring, as the scores are relative to wild type. In contrast, our growth curves are plotted as the raw OD600 values of each strain. We believe this difference underlies the difference seen between our heatmaps and growth curves.
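      To illustrate why a score expressed relative to wild type can be negative even when a strain still grows, here is a minimal sketch of a wild-type-normalized log-ratio between two time points, in the spirit of the log-ratio scoring used by tools such as Enrich2. It is not the authors' pipeline: the function name, the pseudocount of 0.5, and the read counts are assumptions made for the example, and replicate handling and regression over multiple time points are omitted.

```python
import math


def wt_normalized_log_ratio(variant_before, variant_after,
                            wt_before, wt_after, pseudocount=0.5):
    """Log change in variant read count minus log change in wild-type count.

    A score of 0 means the variant tracked wild type; a negative score
    means it was depleted relative to wild type, even if its absolute
    count went up. The pseudocount (an assumption here) avoids log(0).
    """
    variant_change = math.log(
        (variant_after + pseudocount) / (variant_before + pseudocount)
    )
    wt_change = math.log(
        (wt_after + pseudocount) / (wt_before + pseudocount)
    )
    return variant_change - wt_change


# Hypothetical read counts: the variant doubles, but wild type quadruples.
score = wt_normalized_log_ratio(
    variant_before=100, variant_after=200,
    wt_before=100, wt_after=400,
)
print(f"relative score: {score:.3f}")
```

      In this made-up case the variant's raw abundance increases, as a raw OD600 curve would also show growth, yet the relative score is negative because wild type increased more.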

      It is also a relevant and important point that our libraries are grown as mixed cultures, where there is competition over the limited iron in their growth media, as we highlight in our discussion.

      While the double mutant does show a stark growth defect at 24 hours in Figure 1 compared to the WT and complement, it grows just as well as those strains in Figure 2B, where the growth defect becomes notable only after 24 hours. Within this experiment, we observed variability in growth at the 24hr timepoint for the negative control strain, but also selection when compared to the positive control and library growth at later time points. We analyzed our DMS data in accordance with typical methods used in the field (see: https://doi.org/10.1186/s13059-017-1272-5). We include statistics for the DMS analysis as supplemental Figure 1. We apologize for any confusion regarding the figure caption. The caption does state that our library growth in Figure 2B was repeated in triplicate, and the samples collected during that experiment were the ones used to generate the DMS data.

      Minor comments

      1. Line and page numbering should be added to the manuscript to facilitate the reviewing process.

      We have updated our manuscript to include line and page numbering.

      "Knockout of the entire ESX-3 operon leads to inhibited M. smegmatis growth in a low-iron environment. When individual components of the ESX-3 system are deleted, growth is only available under impaired if the additional siderophore exochelin formyltransferase fxbA is also knocked out20." First, a reference should be added to the first sentence. Second, Siegrist et al. did not exactly show this. They showed that the fxbA/eccC3 double mutant grows slower than the fxbA single mutant. To my knowledge there is no publication showing that single esx-3 component mutants grow as WT in iron-deficient conditions. Do the authors have data demonstrating this? If true, it is surprising that mutating EccD3 has a milder phenotype compared to the complete region deletion, as it is a crucial ESX-3 component.

      We apologize for any confusion. We had the relevant reference two lines prior, and have since added it to that sentence as well.

      The reviewer is correct that Siegrist et al. did not show the effects of just ESX-3 component single deletions. However, Siegrist et al. 2009 demonstrated that deleting the entire ESX-3 operon results in growth similar to the wild type strain in low-iron media. In contrast, the fxbA single knockout exhibits a notable growth defect, and the fxbA/ESX-3 double knockout has an even more severe growth defect. Following the logic that a double knockout is needed to observe a growth defect in low-iron media, Siegrist et al. 2014 demonstrated this also extends to single ESX-3 component knockouts, such as the fxbA/eccD3 double knockout strain. To ensure clarity and accuracy, I will edit the sentence to say "When individual components of the ESX-3 system are deleted, growth is significantly impaired when the additional siderophore exochelin formyltransferase fxbA is also knocked out."

      Reference to Table 1 should be a reference to Table S1.

      We have updated our manuscript to correct this reference.

      "Our heatmaps surprisingly reveal residues where substitutions are deleterious specifically in the iron-sufficient condition" Refer here to Fig. S2.

      We have updated our manuscript to include this reference.

      "In the iron-deficient condition, 6/551 (1.08%) missense mutations have a weak LOF effect, and 0 have strong effects." More clearly explain this refers to the residues of the transmembrane region.

      We have updated our manuscript to provide more clarity.

      "The MycP transmembrane helix has been hypothesized to be required for ESX complex specificity, targeting MycP to associate with the correct ESX homologue." I miss a reference here. And I thought that the transmembrane domain of MycP was required for complex stability not for specificity?

      We thank the reviewer for pointing out our missing citation, and asking us to clarify our point. I believe the literature suggests that both the protease and transmembrane domains of MycP are required for both complex stability and specificity. van Winden et al. 2016 https://doi.org/10.1128/mbio.01471-16 show that MycP5 needs to be present for secretion. The protease activity can be abolished and the ESX-5 complex can still secrete and be pulled down, as seen by BN-PAGE. van Winden et al. 2019 https://doi.org/10.1074/jbc.RA118.007090 show that truncated mutants missing either the protease domain or the transmembrane domain cannot rescue ESX-5 secretion or complex stability in a MycP knockout strain. More relevant, they attempted to rescue MycP1 and MycP5 mutants by creating chimeric proteins that either had the MycP1 protease domain and MycP5 transmembrane domain, or the MycP5 protease domain and MycP1 transmembrane domain. If the protease and transmembrane domains were required for complex stability and NOT specificity, we would see MycP5 rescue ESX-1 secretion in the MycP1 mutant strains and vice versa. We would also see the chimera proteins rescue both ESX-1 and ESX-5 secretion and complex stability. Instead, we see that neither chimera rescued ESX-1 nor ESX-5 secretion or complex stability, implying that both MycP domains are necessary.

      We will amend our paper text to reference MycP's role in complex stability instead of specificity, and soften the language: "The MycP transmembrane helix has been shown to be required for ESX complex stability, as MycP knockouts and truncated mutants abolish ESX secretion and pulldowns of the entire complex."

      "....role in ESX function relating to EccB3 and EccC3. In the transmembrane, ..... we" Insert "region" after "transmembrane"

      We have updated our manuscript to include this change.

      Significance

      The study provides insight into individual residues of a central component of the ESX-3 type VII secretion system for functionality, which is useful for those studying the functioning of mycobacterial type VII secretion systems. Moreover, because this system is essential for the growth of the important pathogen M. tuberculosis, this knowledge can be used to design new anti-tuberculosis compounds that block the ESX-3 system. Although the results mainly confirm previous observations (highlighting specific residues important for the stability of ubiquitin and residues of other parts of EccD important for protein-protein interactions within the ESX-3/ESX-5 membrane complex), to my knowledge this is the first time DMS has been applied to mycobacteria. This study is therefore of interest to mycobacteriologists.


      Reviewer #2

      Evidence, reproducibility and clarity

      This work provides valuable insights into EccD3 function, a core component of the ESX-3 secretion system. The strength of this study lies in the development of a robust functional assay for the systematic mapping of functionally relevant amino acids in EccD3. The approach could potentially be expanded to analyze other ESX-3 components but remains limited to the ESX-3 secretion system.

      1. The authors engineered an M. smegmatis knockout strain with deletions of fxbA and eccD3. Deletion of fxbA renders the exochelin iron uptake system non-functional, forcing the bacteria to rely on siderophore-mediated iron uptake under iron-limiting conditions. This process, in turn, depends on ESX-3 secretion activity, as PPE4, a known ESX-3 substrate, has been previously implicated in iron utilization in M. tuberculosis (Tufariello et al., 2016). This experimental setup links EccD3 function to a growth phenotype under iron-limiting conditions, as mutations impairing ESX-3 secretion disrupt iron utilization and mycobacterial growth.

      2. By complementing the knockout strain with a library of EccD3 mutant variants, the authors systematically identify residues essential for protein-protein interactions within the ESX-3 core complex. Structural analysis corroborates the functional relevance of these residues, specifically those mediating interactions between EccD3 and other ESX-3 components, or those disrupting the hydrophobic core of the EccD3 ubiquitin-like (Ubl) domain.

      3. Structural comparisons with the MycP5-bound ESX-5 complex allow the authors to predict residues within EccD3 that may interact with MycP3 during ESX-3 core complex assembly. Furthermore, comparisons with the ESX-5 hexamer suggest residues that may stabilize or drive oligomerization of the ESX-3 dimer into its putative hexameric state. These insights are significant and provide testable hypotheses for future studies.

      4. The methodology is limited to ESX-3. The authors exploit the essentiality of ESX-3 for siderophore-dependent growth under iron-limiting conditions. However, this functional readout cannot be directly transferred to other ESX systems (ESX-1, ESX-2, ESX-4, ESX-5), which have distinct substrates, biological roles, and regulatory mechanisms.

      Significance

      This work provides valuable insights into EccD3 function, a core component of the ESX-3 secretion system. The strength of this study lies in the development of a robust functional assay for the systematic mapping of functionally relevant amino acids in EccD3. The approach could potentially be expanded to analyze other ESX-3 components but remains limited to the ESX-3 secretion system.

      Thank you for your thoughtful and supportive feedback. We appreciate your time and effort in reviewing our study.


      Reviewer #3

      Evidence, reproducibility and clarity

      The manuscript by Trinidad et al. provides a deep mutational scanning (DMS) analysis to investigate the functional roles of residues from the EccD3 subunit of the Type VII ESX-3 secretion apparatus from M. smegmatis. A previously published structure of ESX-3 from M. smegmatis by the Rosenberg group (Oren Rosenberg is also an author of this paper) is used as basis for structural interpretation of the DMS data presented in this contribution. A shortcoming of the previous structure, despite being very rich in terms of structural details, was in the lack of hexameric pore formation, which has been established more recently by structures of the related ESX-5 system.

      Technically, DMS is state-of-the-art and a powerful approach to systematically scan residues of potential functional interest. Therefore, the data presented here provide a remarkable repository for further interpretation in this contribution and by other future investigations. The experimental data have been deposited in GitHub, enabling access by others in the future.

      Overall, the paper would benefit from an improved overall organisation. I found it in part hard to extract some of the main points from the way the data are presented. In essence, two separate screens were performed, the first one focusing on the EccD3 Ubl domain and adjacent linker regions and a second one on the EccD3 TM region. I think the paper could be better structured accordingly. Tables of residues with strong effects in iron-deficient and iron-sufficient media, together with their structural annotation, would facilitate extracting main messages from this manuscript. Without going too much into detail, there is also scope for improvement of most of the structural figures. More consistency in terms of color coding with the previous paper by Poweleit et al. (2019) would also help navigation.

      A potential weakness of the paper is in the limited scope of interpretation of the data in the context of the dimeric ESX-3 assembly, which is actually acknowledged by the authors. Computational AI-based methods should allow generating a complete pore model of ESX-3, which would allow interpretation of some of the data in a more functionally relevant context. This would enhance the validity of the current interpretations presented.

      We acknowledge the lack of a hexameric ESX-3 structure, and would love to base our analysis on such a structure. Unfortunately, experimentally purifying and determining such a structure is beyond the scope of this manuscript. While AI-based methods are certainly exciting and helpful to make sense of mutational data, they are not able to computationally predict such large structures. The AlphaFold3 server website is commonly used for these purposes and allows predictions of up to 5000 tokens (or amino acids). An ESX-3 hexamer would be composed of 6x EccB proteins (519 AA each), 6x EccC proteins (1326 AA each), 12x EccD proteins (476 AA each), and 6x EccE proteins (310 AA each). Together, this complex would be made up of 18,642 amino acids.
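      The size estimate above can be checked with a few lines of arithmetic. The copy numbers, chain lengths, and the 5000-token limit are the values quoted in this response; the variable names are only illustrative.

```python
# Token limit of the AlphaFold3 server as quoted in the response above.
ALPHAFOLD3_SERVER_TOKEN_LIMIT = 5000

# Composition of a putative ESX-3 hexamer as stated in the response:
# protein name -> (number of copies, amino acids per copy).
hexamer_composition = {
    "EccB": (6, 519),
    "EccC": (6, 1326),
    "EccD": (12, 476),
    "EccE": (6, 310),
}

total_residues = sum(
    copies * length for copies, length in hexamer_composition.values()
)

print(f"total residues in hexamer: {total_residues}")
print(f"fold over the server limit: "
      f"{total_residues / ALPHAFOLD3_SERVER_TOKEN_LIMIT:.1f}x")
```

      The sum reproduces the 18,642 amino acids stated above, several times larger than the quoted limit.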

      We tried using AlphaFold to predict an ESX-5 dimer complex, as well as to reproduce the ESX-3 dimer complex, and were unable to produce these structures. Each ESX protomer is assembled correctly, as each protein within the complex makes appropriate contacts with the others. We see the EccD dimers still form the membrane vestibule within each ESX complex. The issue is that the ESX dimer complex has not assembled correctly: the EccC transmembrane helix 1 of one protomer should interact with the EccB transmembrane helix of the neighboring protomer, and the N-terminus of EccB in one protomer should interact with the loop between the EccD transmembrane helices 10 and 11 in the neighboring protomer. Instead, AlphaFold creates contacts along the EccD proteins from both complexes. We have included a "top-down" view of the ESX-5 dimer, where the periplasmic domains of EccB have been cleaved off for clarity.

      A side view:

      Here we have the ESX-3 dimer structure published by Poweleit et al. side-by-side with the ESX-3 dimer predicted by AlphaFold, visualized in PyMOL. The AlphaFold structure largely has each protein's domains and folds properly predicted, including even the EccD3 dimer found in each ESX protomer. However, the protomers are not assembled into a dimer properly as compared to the purified ESX-3 dimer from PDB: 6umm. We included a "front" and "side" view, as well as a "top-down" view where the cytoplasmic domains have been hidden for visual clarity.

      The use of full names and acronyms needs to be more consistent. As an example, the terms "ubiquitin-like" and ubiquitin-like (Ubl) and UBl are used in parallel throughout the manuscript. The percentages given in various places of the paper could be reduced to integers, as they generally relate to relatively small data sets. Please express numbers with a precision that reasonably matches the expected statistical significance.

      We apologize for the lack of consistency in how we referred to the ubiquitin-like domain. I originally wrote "ubiquitin-like (Ubl)" once per section (intro, results, discussion). I have edited these all to just "Ubl" after the introduction, except for figure and section titles. We have also reduced our percentages to integers.

      Some of the DMS experiments have been repeated three-fold, which should be a minimal number to allow extracting statistical significance, other experiments have only been repeated two-fold. Could this be clarified, please?

      We apologize for this oversight, and thank the reviewer for pointing this out. All experiments were done in triplicate, the exception being the site-directed mutant growth curves, which were performed in duplicate. We have repeated this experiment in triplicate in response to this point. As we repeated this experiment, mutant R134A dropped out due to technical reasons, and so we did not include it in the updated growth curves.

      Specific comments on text and figures:

      Figure 1: The EM densities shown considerably deviate from those that were shown in the original publication by Poweleit et al. (2019). If the aim is to reinterpret the data, this needs to be described in sufficient technical detail. There may be a case for this, in light of recent advances in computational AI-based structural biology.

      We acknowledge this may be confusing and we apologize for that, as the EM density I have shown in this manuscript uses the same map we used to create the one seen in the original publication Poweleit et al 2019. There are existing crystal structures of EccB1 and the ATPase domains of EccC1 that we used to create homology models of EccB3 and EccC3 using the structure-prediction software RaptorX for the 2019 publication. These homology models were then combined with a low resolution EM density to create the model seen in the 2019 eLife paper. I did not include those homology models in this manuscript, as I did not believe those predictions were relevant to this study. I wanted to include the highest resolution and thus most accurate depiction of our ESX-3 structure.

      Introduction, statement "We made comparisons to a prior DMS on ubiquitin to increase signal-to-noise in our interpretation of the Ubl domain mutagenesis data." Could this be further explained please? I could not find anything in addition in the Methods section and elsewhere.

      We apologize for the confusion!

      EccD3 Ubl domain and ubiquitin DMS dataset comparisons

      To compare the DMS data of EccD3 Ubl with that of ubiquitin, we first identified homologous residues in each structure. This was achieved by aligning the EccD3 Ubl domain with ubiquitin (PDB: 1ubq) using PyMOL and assessing the positional correspondence of side chains (e.g., ubiquitin residue I3 aligned with EccD3 residue V12). Next, we referenced missense mutation datasets to calculate the average DMS score for each residue position in both proteins. We then generated a scatter plot to compare the average missense scores for ubiquitin and EccD3 Ubl using ggplot2. Data points were color-coded according to the functional roles assigned to ubiquitin, with residues forming the hydrophobic patch and core highlighted, while all other residues were represented in grey.
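      The comparison described above amounts to averaging missense scores per position in each dataset and pairing positions through the structural alignment. A minimal sketch in Python follows (the authors used PyMOL for the alignment and ggplot2 for plotting; neither is reproduced here). Only the I3/V12 pairing comes from the text above; the function names and all scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_score_per_position(missense_scores):
    """Average the scores of all missense substitutions at each position.

    `missense_scores` is an iterable of (position, substituted_residue, score).
    """
    scores_by_position = defaultdict(list)
    for position, _residue, score in missense_scores:
        scores_by_position[position].append(score)
    return {pos: mean(vals) for pos, vals in scores_by_position.items()}


def pair_positions(ubiquitin_scores, eccd3_scores, alignment):
    """Pair per-position means through a structural alignment.

    `alignment` maps a ubiquitin position to its EccD3 Ubl counterpart.
    Returns (ubiquitin_pos, eccd3_pos, ubiquitin_mean, eccd3_mean) tuples,
    i.e. the x/y values one would place on the scatter plot.
    """
    ubiquitin_means = mean_score_per_position(ubiquitin_scores)
    eccd3_means = mean_score_per_position(eccd3_scores)
    return [
        (u_pos, e_pos, ubiquitin_means[u_pos], eccd3_means[e_pos])
        for u_pos, e_pos in sorted(alignment.items())
        if u_pos in ubiquitin_means and e_pos in eccd3_means
    ]


# Ubiquitin I3 aligned with EccD3 V12, as in the example given above.
alignment = {3: 12}

# Made-up scores, for illustration only.
ubiquitin_scores = [(3, "D", -1.0), (3, "K", -2.0)]
eccd3_scores = [(12, "D", -2.5), (12, "K", -1.5)]

paired = pair_positions(ubiquitin_scores, eccd3_scores, alignment)
print(paired)
```

      Each tuple corresponds to one point of the scatter plot, which can then be colored by the functional role assigned to the ubiquitin residue.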

      Description of "vestibule" as a core feature of the ESX-3 structure. As mentioned above, this is very much a result of the presented dimeric arrangement. In the context of a complete pore model, these features may change or even disappear.

      While we would certainly welcome an ESX-3 hexamer model to definitively determine whether this feature persists, such a model is not currently available. However, the highly homologous ESX-5 complex retains these EccD vestibules, and there is no reason to believe these features would change or disappear. Therefore, based on our interpretation of the ESX-3 dimer and ESX-5 hexamer we believe that the EccD membrane vestibule is not just an artifact of the ESX-3 dimer complex.

      It is possible that the reviewer misunderstood what we were referring to as the vestibule. We updated the language in the text to improve clarity. However, the vestibule is not a consequence of ESX-3 complex dimer formation. It is an inherent feature of the ESX monomer complexes, where two EccD proteins dimerize to form said vestibule. Furthermore, there is no evidence to suggest that this feature would be lost in a hexameric state.

      Structurally, the ESX-3 dimer consists of two ESX-3 monomer complexes, each containing one EccB, one EccC, one EccE, and two EccD proteins. Therefore, each ESX-3 monomer inherently includes an EccD dimer. The presence of the EccD dimer is not exclusive to the ESX-3 dimer but is a fundamental component of each ESX-3 complex. Similarly, the ESX-5 hexamer retains the EccD dimer within each ESX-5 complex, further supporting the idea that this structural feature is conserved.

      Figure 2, panel B: Isn't it right that "positive" and "negative" need to be exchanged? Perhaps there is something I misunderstood.

      We apologize for the confusion, and appreciate the reviewer pointing out this inconsistency. We have updated the manuscript to correct this.

      Figure 2, panel F: it is hard to extract the assignments from the overlaid curves.

      We apologize for a lack of clarity in how this growth curve was presented. We have included labels at the end point to show where each sample is.

      Figure 3, caption "from low (red) to white (tolerant)": for the sake of consistency, please either put the color in parentheses, or functional description. Does this statement relate to panel A or B? "All other residues are colored white". I can't see this.

      We apologize for the inconsistency, and have updated this label. We hope we have clarified the fact that the entire structure is white except for the residues we colored red.

      Results text "In contrast to ubiquitin, all hydrophobic core residues in the EccD3 Ubl domain are equally intolerant to charged residue swaps. Unsurprisingly, residues important for ubiquitin's specific degradation interactions are not sensitive to substitutions in the EccD3 Ubl domain." Does this mean that proper folding of Ubl is less critical for ESX-3 function? Please elaborate on this further.

      We apologize for any confusion. Our data show that residues whose side chains extend into the hydrophobic core of the Ubl domain are intolerant to swaps to charged residues. We hypothesize that these missense mutations disrupt this hydrophobic core and lead to destabilization of this domain. These intolerant missense mutations each have negative Enrich2 scores, implying a loss of ESX-3 function, and that proper folding of the Ubl is critical for ESX-3 function. We have updated our text to clarify this point:

      Unsurprisingly, residues important for ubiquitin function's specific interactions are not sensitive to substitutions in the EccD3 Ubl domain. There is no simple discernable preference within the Ubl domain to any side that maintains protein-protein interactions, implying that the scores are dominated by stability effects and that the Ubl domain must maintain a stable β-grasp fold for ESX-3 function.

      Figure 4, panel C: the surface does not provide residue-specific information, hence this panel is not very informative.

      We agree with the reviewer that Figure 4 panel C was not very informative, and so we have removed it from Figure 4 for the sake of brevity.

      Results text "T148 extends out from transmembrane helix 1 into a hydrophobic pocket between transmembrane helices 1, 2, and 3." Could this please be illustrated in one of the structural presentations?

      We have updated figure 5 to include a snapshot of this residue and the hydrophobic pocket it extends into, as panel E.

      Results text, last paragraph, Figure 5C-D: interpretation of the experimental ESX-3 data based on ESX-5 models is problematic, without showing proof of conservation of relevant sequence/structural features. As mentioned above, I would encourage the authors to establish a hexameric ESX-3 model and interpret the data starting from there. Extrapolation of the interpretation of data to other ESX systems, including ESX-5, would expand the scope by generalization, which however would open another chapter. The ESX-5 structure does not explain e.g. why W227 when mutated is less sensitive to iron depletion as opposed to iron being present.

      We do not believe we can use AI to predict a hexameric ESX-3 model. We will update our supplement to include a figure showing proof of conservation between the EccD3 and EccD5 sequences. We can superpose the ESX-3 dimer structure onto the ESX-5 hexamer structure, and see that this dimeric complex overlays quite well on top of an ESX-5 subcomplex. We can imagine this hexamer as a trimer of dimers, where three copies of this dimeric complex interact to form the hexamer. The superposition is not perfect and there are slight rearrangements to different helices to allow for hexamer formation, but these do not imply we cannot compare these two homologous structures.

      We have included a new structure snapshot in Figure 5, where panel D is the ESX-3 dimer (PDB: 6umm) shown as a side and top-down view. This allows for a comparison with panel C, the snapshot of the ESX-5 complex (PDB: 7np7) where in two protomers the EccB, EccC, and EccD proteins are colored the same way as ESX-3, and the other ESX-5 protomers are colored white. Note that in this hexamer, EccE is missing. We see the EccD membrane vestibule is conserved in both structures.

      Significance

      Strength and Limitations: already assessed under "Evidence, reproducibility and clarity".

      There is scope for further interpretation using experimental structural and modeling data. There is also scope for applying complementary assays for selected mutants, most likely within a lower throughput format.

      Advance: The contribution demonstrates well the power of DMS for systematic screening, in the context of Type VII secretion. The main advance is in the raw data generated and deposited.

      Audience: microbiology with a specific interest in secretion, structural biology

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      The manuscript by Trinidad et al. provides a deep mutational scanning (DMS) analysis to investigate the functional roles of residues from the EccD3 subunit of the Type VII ESX-3 secretion apparatus from M. smegmatis. A previously published structure of ESX-3 from M. smegmatis by the Rosenberg group (Oren Rosenberg is also an author of this paper) is used as basis for structural interpretation of the DMS data presented in this contribution. A shortcoming of the previous structure, despite being very rich in terms of structural details, was in the lack of hexameric pore formation, which has been established more recently by structures of the related ESX-5 system.

      Technically, DMS is state-of-the-art and a powerful approach to systematically scan residues of potential functional interest. Therefore, the data presented here provide a remarkable repository for further interpretation in this contribution and by other future investigations. The experimental data have been deposited in GitHub, enabling access by others in the future.

      Overall, the paper would benefit from an improved overall organisation. I found it in part hard to extract some of the main points from the way the data are presented. In essence, two separate screens were performed, the first one focusing on the EccD3 Ubl domain and adjacent linker regions and a second one on the EccD3 TM region. I think the paper could be better structured accordingly. Tables of residues with strong effects in iron-deficient and iron-sufficient media, together with their structural annotation, would facilitate extracting main messages from this manuscript. Without going too much into detail, there is also scope for improvement of most of the structural figures. More consistency in terms of color coding with the previous paper by Poweleit et al. (2019) would also help navigation.

      A potential weakness of the paper is in the limited scope of interpretation of the data in the context of the dimeric ESX-3 assembly, which is actually acknowledged by the authors. Computational AI-based methods should allow generating a complete pore model of ESX-3, which would allow interpretation of some of the data in a more functionally relevant context. This would enhance the validity of the current interpretations presented.

      The use of full names and acronyms needs to be more consistent. As an example, the terms "ubiquitin-like" and ubiquitin-like (Ubl) and UBl are used in parallel throughout the manuscript. The percentages given in various places of the paper could be reduced to integers, as they generally relate to relatively small data sets. Please express numbers with a precision that reasonably matches the expected statistical significance.

      Some of the DMS experiments have been repeated three-fold, which should be a minimal number to allow extracting statistical significance, other experiments have only been repeated two-fold. Could this be clarified, please?

      Specific comments on text and figures:

      Figure 1: The EM densities shown deviate considerably from those that were shown in the original publication by Poweleit et al. (2019). If the aim is to reinterpret the data, this needs to be described in sufficient technical detail. There may be a case for this, in light of recent advances in computational AI-based structural biology.

      Introduction, statement "We made comparisons to a prior DMS on ubiquitin to increase signal-to-noise in our interpretation of the Ubl domain mutagenesis data." Could this be further explained please? I could not find anything in addition in the Methods section and elsewhere.

      Description of "vestibule" as a core feature of the ESX-3 structure. As mentioned above, this is very much a result of the presented dimeric arrangement. In the context of a complete pore model, these features may change or even disappear.

      Figure 2, panel B: Isn't it right that "positive" and "negative" need to be exchanged? Perhaps there is something I misunderstood.

      Figure 2, panel F: it is hard to extract the assignments from the overlaid curves.

      Figure 3, caption "from low (red) to white (tolerant)": for the sake of consistency, please either put the color in parentheses, or functional description. Does this statement relate to panel A or B? "All other residues are colored white". I can't see this.

      Results text "In contrast to ubiquitin, all hydrophobic core residues in the EccD3 Ubl domain are equally intolerant to charged residue swaps. Unsurprisingly, residues important for ubiquitin's specific degradation interactions are not sensitive to substitutions in the EccD3 Ubl domain." Does this mean that proper folding of Ubl is less critical for ESX-3 function? Please elaborate on this further.

      Figure 4, panel C: the surface does not provide residue-specific information, hence this panel is not very informative.

      Results text "T148 extends out from transmembrane helix 1 into a hydrophobic pocket between transmembrane helices 1, 2, and 3." Could this please be illustrated in one of the structural presentations?

      Results text, last paragraph, Figure 5C-D: interpretation of the experimental ESX-3 data based on ESX-5 models is problematic, without showing proof of conservation of relevant sequence/structural features. As mentioned above, I would encourage the authors to establish a hexameric ESX-3 model and interpret the data starting from there. Extrapolation of the interpretation of data to other ESX systems, including ESX-5, would expand the scope by generalization, which however would open another chapter. The ESX-5 structure does not explain e.g. why W227 when mutated is less sensitive to iron depletion as opposed to iron being present.

      Referee cross-commenting

      I especially second the comments of referee #1, major comments, point 3 (statistical significance of the data). Addressing this point is crucial for the paper. Referee #2, significance section "The approach could potentially be expanded to analyze other ESX-3 components but remains limited to the ESX-3 secretion system." I was considering making the same point but did not in the end. Of course, ultimately, it would be great if all components of ESX-3 could be analyzed the way it was done for the EccD3 component. However, I am afraid such an exercise could become quite open-ended. Already, there is some compromise on the depth of mechanistic interpretation in light of the large data set generated.

      Significance

      Strength and Limitations: already assessed under "Evidence, reproducibility and clarity".

      There is scope for further interpretation using experimental structural and modeling data. There is also scope for applying complementary assays for selected mutants, most likely within a lower throughput format.

      Advance: The contribution demonstrates well the power of DMS for systematic screening, in the context of Type VII secretion. The main advance is in the raw data generated and deposited.

      Audience: microbiology with a specific interest in secretion, structural biology

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      Evading predation is of utmost importance for most animals and camouflage is one of the predominant mechanisms. Wu et al. set out to test the hypothesis of a unique camouflage system in leafhoppers. These animals coat themselves with brochosomes, which are spherical nanostructures that are produced in the Malpighian tubules and are distributed on the cuticle after eclosion. Based on previous findings on the reflectivity properties of brochosomes, the authors provide very good evidence that these nanostructures indeed reduce the reflectivity of the animals thereby reducing predation by jumping spiders. Further, they identify four proteins, which are essential for the proper development and function of brochosomes. In RNAi experiments, the regular brochosome structure is lost, the reflectivity reduced and the respective animals are prone to increased predation. Finally, the authors provide some phylogenetic sequence analyses and speculate about the evolution of these essential genes.

      Strengths:

      The study is very comprehensive including careful optical measurements, EM and TM analysis of the nanoparticles and their production line in the Malpighian tubules, in vivo predation tests, and knock-down experiments to identify essential proteins. Indeed, the results are very convincingly in line with the starting hypothesis such that the study robustly assigns a new biological function to the brochosome coating system.

      A key strength of the study is that the biological relevance of the brochosome coating is convincingly shown by an in vivo predation test using a known predator from the same habitat.

      Another major step forward is an RNAi screen, which identified four proteins, which are essential for the brochosome structure (BSMs). After respective RNAi knock-downs, the brochosomes show curious malformations that are interesting in terms of the self-assembly of these nanostructures. The optical and in vivo predation tests provide excellent support for the model that the RNAi knock-down leads to a change of brochosomes structure, which reduces reflectivity, which in turn leads to a decrease of the antipredatory effect.

      Thank you very much for your positive feedback and insightful comments on our manuscript. We are delighted that you acknowledge the efforts we have made in studying the components and functions of Brochosomal proteins. We have carefully considered your suggestions and have thoroughly revised the manuscript to address the shortcomings identified in our original submission. We hope that the revised version meets with your approval. Below, please find our detailed point-by-point responses.

      Weaknesses:

      The reduction of reflectivity by aberrant brochosomes or after ageing is only around 10%. This may seem little to have an effect in real life. On the other hand, the in vivo predation tests confirm an influence. Hence, this is not a real weakness of the study - just a note to reconsider the wording for describing the degree of reflectivity.

      Thank you for your valuable suggestions. Based on your recommendations, we have revised the manuscript accordingly. Although the absolute reduction in light reflection due to Brochosomal coverage is approximately 10%, the relative decrease in light reflection on the leafhopper's surface is nearly 30%. Specifically, in the ultraviolet region, the reflection is reduced from about 30% to 20%, and in the visible light region, it is reduced from 20% to 10%. For detailed revisions, please refer to lines 151-156 of the revised manuscript.

      The single gene knockdowns seemed to lead to a very low penetrance of malformed brochosomes (Figure Supplement 3). Judging from the overview slides, less than 1% of brochosomes may have been affected. A quantification of regular versus abnormal particles in both, wildtype and RNAi treatments would have helped to exclude that the shown aberrant brochosomes did not just reflect a putative level of "normal" background defects. Of note, the quadruple knock-down of all BSMs seemed to lead to a high penetrance (Figure 4), which was already reflected in the microtubule production line. While the data shown are convincing, a quantification might strengthen the argument.

      While the RNAi effects seemed to be very specific to brochosomes and therefore very likely specific, an off-target control for RNAi was still missing. Finding the same/similar phenotype with a non-overlapping dsRNA fragment in one off-target experiment is usually considered required and sufficient. Further, the details of the targeted sequence will help future workers on the topic.

      Thank you for your valuable suggestions. Based on your recommendations, we have synthesized dsRNA targeting two non-overlapping regions of the coding sequences for four Brochosomal structural protein genes. These dsRNAs were injected individually and in combination for each gene. Our RNAi experiments for each BSM gene demonstrated that both individual and combined injections significantly suppressed the expression of the target genes, with the combined injection yielding slightly better silencing efficiency. Statistical analysis of the SEM observations revealed that the combined injection of dsRNAs targeting two non-overlapping regions led to a 60-70% reduction in the surface area coverage of Brochosomes. Additionally, approximately 20% of the remaining Brochosomes exhibited significant morphological changes. For detailed revisions, please refer to lines 199-211 of the revised manuscript, as well as Figures 3A and 3C, and Supplementary Figures 4 and 5.

      The main weakness in the current manuscript may be the phylogenetic analysis and the model of how the genes evolved. Several aspects were not clearly or consistently stated such that I felt unsure about what the authors actually think. For instance: Are all the 4 BSMs related to each other or only BSM2 and 3? If so, not only BSM2 and 3 would be called "paralogs" but also the other BSMs. If they were all related, then a phylogenetic tree including all BSMs should be shown to visualize the relatedness (including the putative ancestral gene if that is the model of the authors). Actually, I was not sure about how the authors think about the emergence of the BSMs. Are they real orphan genes (i.e. not present outside the respective clade) or was there an ancestral gene that was duplicated and diverged to form the BSMs? Where in the phylogeny does the first of the BSMs or ancestral proteins emerge (is the gene found in Clastoptera arizonana the most ancestral one?)? Maybe, the evolution of the BSMs would have to be discussed individually for each gene as they show somewhat different patterns of emergence and loss (BSM4 present in all species, the others with different degrees of phylogenetic restriction).

      Thank you very much for your constructive feedback on our phylogenetic analysis and the modeling of gene evolution. We fully agree with your insights and acknowledge that the evolutionary analysis of BSM genes remains somewhat ambiguous. This ambiguity is primarily due to the limited research on the precise structural protein composition of Brochosomes. While proteomics studies have analyzed and discussed the structural proteins of Brochosomes, the accurate composition of these proteins is still poorly understood. In this study, we identified four BSM proteins, but given the intricate structure of Brochosomes as proteinaceous spheres, we believe there may be additional BSM genes that have not yet been identified. Moreover, despite the presence of over ten thousand species within the Cicadomorpha, only three species have genome sequences available, and fewer than a hundred species have transcriptome sequencing data. The scarcity of research on Brochosomes, as well as the limited availability of genomic and transcriptomic data, poses significant challenges for our phylogenetic analysis and understanding of BSM gene evolution.

      Based on your suggestions, we have revised the manuscript accordingly. Specifically, we have updated Figure 5C by including ten additional species from Cercopoidea, Cicadoidea, and Fulgoroidea to better illustrate that BSM genes are true orphan genes. We have also added a phylogenetic tree of BSM genes within Cicadidae in Supplementary Figure 3. Additionally, we have expanded the discussion of BSM gene evolution in the manuscript (lines 503-556). For detailed revisions, please refer to Figure 5C, Supplementary Figure 3, and lines 507-585 of the revised manuscript.

      Related to these questions, I remained unsure about some details in Figure 5. On what kind of analysis is the phylogeny based? Why are some species not colored, although they are located on the same branch as colored ones? What is the measure for homology values - % identity/similarity? The homology labels for Nephotettix cincticeps and N. virescens seem to be flipped: the latter is displayed with 100% identity for all genes with all proteins while the former should actually show this. As a consequence of these uncertainties, I could not fully follow the respective discussion and model for gene evolution.

      Thank you very much for your insightful comments and suggestions. We have carefully considered your feedback and have thoroughly revised our manuscript accordingly. Specifically, we have enhanced the description of the phylogenetic analysis process to provide greater clarity and transparency, with the detailed methods now included in lines 789-798. Regarding Figure 5C, we appreciate your attention to the coloring scheme. We would like to clarify that the family Cicadellidae comprises 25 subfamilies, many of which are represented by only one species in our figure. To ensure clarity and meaningful representation, we have chosen to color only those subfamilies with more than three species, thereby avoiding visual clutter and emphasizing the most relevant taxonomic groups. Additionally, we have corrected the inverted homology labels for Nephotettix cincticeps and Nephotettix virescens to ensure the accuracy and consistency of our data presentation.

      Conclusion:

      The authors successfully tested their hypothesis in a multidisciplinary approach and convincingly assigned a new biological function to the brochosome system. The results fully support their claims - only the quantification of the penetrance in the RNAi experiments would be helpful to strengthen the point. The authors' analysis of the evolution of BSM genes remained a bit vague and I remained unsure about their respective conclusions.

      The work is a very interesting study case of the evolutionary emergence of a new system to evade predators. Based on this study, the function of the BSM genes could now be studied in other species to provide insights into putative ancestral functions. Further, studying the self-assembly of such highly regular complex nano-structures will be strongly fostered by the identification of the four key structural genes.

      Reviewer #1 (Recommendations for the authors):

      Main manuscript:

      Please consider the annotated pdf with suggestions for wording and comments at the authors' discretion:

      Thank you very much for your detailed suggestions and comments provided in the annotated PDF. We have carefully reviewed each of your points and have revised the manuscript accordingly. All changes have been highlighted in red text for your convenience. The revised manuscript with tracked changes is available for your review. We believe these revisions have improved the clarity and quality of our manuscript. Thank you again for your valuable feedback.

      Supplementary Figure 2 C:

      Y-axes:

      - label: "surface coverage in %"

      - there are different scale values for the different days (e.g. 80-105 for day 5 and 0-80 at day 25). As a comparison between days is interesting, it would help to have the same scale values for all. That would show the decrease more intuitively.

      Thank you very much for your suggestion regarding the Y-axis in Supplementary Figure 2C. We agree that using a consistent scale across all time points is essential for clear and intuitive comparison. In the revised manuscript, we have standardized the Y-axis scale for Supplementary Figure 2C to a uniform range of 0-100% for all days. This change allows for a more straightforward visualization of the decreasing trend in surface coverage over time.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors investigate the optical properties of brochosomes produced by leafhoppers. They hypothesize that brochosomes reduce light reflection on the leafhopper's body surface, aiding in predator avoidance. Their hypothesis is supported by experiments involving jumping spiders. Additionally, the authors employ a variety of techniques including micro-UV-Vis spectroscopy, electron microscopy, transcriptome and proteome analysis, and bioassays. This study is highly interesting, and the experimental data is well-organized and logically presented.

      Strengths:

      The use of brochosomes as a camouflage coating has been hypothesized since 1936 (R.B. Swain, Entomol. News 47, 264-266, 1936) with evidence demonstrated by similar synthetic brochosome systems in a number of recent studies (S. Yang, et al. Nat. Commun. 8:1285, 2017; L. Wang, et al., PNAS. 121: e2312700121, 2024). However, direct biological evidence or relevant field studies have been lacking to directly support the hypothesis that brochosomes are used for camouflage. This work provides the first biological evidence demonstrating that natural brochosomes can be used as a camouflage coating to reduce the leafhoppers' observability of their predators. The design of the experiments is novel.

      We are extremely grateful for your positive feedback and insightful comments on our manuscript. We are delighted that you have recognized the efforts we have put into our research on how brochosomes serve as a camouflage coating to reduce the detectability of leafhoppers to their predators. We have carefully considered your suggestions and have thoroughly revised the manuscript to address the shortcomings of the original version. We hope that the revised version meets with your approval. Below, please find our detailed point-by-point responses.

      Weaknesses:

      (1) The observation that brochosome coatings become sparse after 25 days in both male and female leafhoppers, resulting in increased predation by jumping spiders, is intriguing. However, since leafhoppers consistently secrete and groom brochosomes, it would be beneficial to explore why brochosomes become significantly less dense after 25 days.

      Thank you very much for your valuable suggestions. We appreciate your interest in the reduction of brochosomal density on the surface of leafhoppers after 25 days. We believe that the primary reason for the decreased density of brochosomes on the leafhopper surface after 25 days is the reduced synthesis and secretion of brochosomes. The Malpighian tubules are the main sites for brochosome synthesis. As shown in Figure 2D and Supplementary Figure 1, the thick glandular segments of the Malpighian tubules in both male and female leafhoppers begin to atrophy 15 days after reaching adulthood. This indicates a gradual decline in brochosome synthesis and secretion after day 15 of adulthood. Following your suggestion, we have revised the discussion section of the manuscript to elaborate on this observation. The detailed changes can be found in lines 474-491 of the revised manuscript.

      (2) The authors demonstrate that brochosome coatings reduce UV (specular) reflection compared to surfaces without brochosomes, which can be attributed to the rough geometry of brochosomes as discussed in the literature. However, it would be valuable to investigate whether the proteins forming the brochosomes are also UV absorbing.

      Thank you very much for your valuable suggestions. Following your advice, we have successfully expressed four BSM genes in a prokaryotic system, purified the corresponding proteins, and applied them to quartz glass surfaces. We then measured the light reflectance of the quartz glass surfaces coated with these purified proteins. The results showed that the purified BSM proteins did not exhibit better antireflective properties compared to the control GST protein. For more details, please refer to Supplementary Figure 8 in the revised manuscript. We believe that the excellent antireflective properties of brochosomes are fundamentally due to their unique geometric shapes. The hollow pores within the brochosomes, with diameters of approximately 100 nm, are significantly smaller than most wavelengths in the visible spectrum. When light passes through these tiny pores, diffraction occurs, while light passing through the ridges of the brochosomes causes scattering. The interference between the diffracted and scattered light from these pores and ridges results in the observed extinction characteristics of brochosomes. We have incorporated these insights into the discussion section of the revised manuscript (lines 416-425 and lines 432-442 of the revised manuscript).

      (3) The experiments with jumping spiders show that brochosomes help leafhoppers avoid predators to some extent. It would be beneficial for the authors to elaborate on the exact mechanism behind this camouflage effect. Specifically, why does reduced UV reflection aid in predator avoidance? If predators are sensitive to UV light, how does the reduced UV reflectance specifically contribute to evasion?

      Thank you very much for your valuable suggestions. Based on your advice, we have included a detailed discussion on how reducing ultraviolet (UV) reflection can help insects avoid predation. The revised content can be found in lines 445-460 of the revised manuscript.

      “UV light serves as a crucial visual cue for various insect predators, enhancing foraging, navigation, mating behavior, and prey identification (Cronin & Bok, 2016; Morehouse et al., 2017; Silberglied, 1979). Predators such as birds, reptiles, and predatory arthropods often rely on UV vision to detect prey (Church et al., 1998; Li & Lim, 2005; Zou et al., 2011). However, UV reflectance from insect cuticles can disrupt camouflage, increasing the risk of detection and predation, as natural backgrounds like leaves, bark, and soil typically reflect minimal UV light (Endler, 1997; Li & Lim, 2005; Tovee, 1995). To mitigate this risk, insects often possess anti-reflective cuticular structures that reduce UV and broad-spectrum light reflectance. This strategy is widespread among insects, including cicadas, dragonflies, and butterflies, and has been shown to decrease predator detection rates (Hooper et al., 2006; Siddique et al., 2015; Zhang et al., 2006). For example, the compound eyes of moths feature hexagonal protuberances that reduce UV reflectance, aiding nocturnal concealment (Blagodatski et al., 2015; Stavenga et al., 2005). In butterflies, UV reflectance from eyespots on wings can attract predators, but reducing UV reflectance or eyespot size can lower predation risk and enhance camouflage (Chan et al., 2019; Lyytinen et al., 2004). Hence, the reflection of ultraviolet light from the insect cuticle surface increases the risk of predation by disrupting camouflage (Tovee, 1995)”

      (4) An important reference regarding the moth-eye effect is missing. Please consider including the following paper: Clapham, P. B., and M. C. Hutley. "Reduction of lens reflection by the 'Moth Eye' principle." Nature 244: 281-282 (1973).

      Thank you very much for pointing out the omission of the important reference on the “moth eye” effect. We sincerely apologize for the oversight. Based on your suggestion, we have now included the seminal paper by Clapham and Hutley (1973) in the revised manuscript. The reference has been added to both the Introduction and Discussion sections to provide a more comprehensive context for our discussion on anti-reflective structures in insects.

      (5) The introduction should be revised to accurately reflect the related contributions in literature. Specifically, the novelty of this work lies in the demonstration of the camouflage effect of brochosomes using jumping spiders, which is verified for the first time in leafhoppers. However, the proposed use of brochosome powder for camouflage was first described by R.B. Swain (R.B. Swain, Notes on the oviposition and life history of the leafhopper Oncometopia undata Fabr. (Homoptera: Cicadellidae), Entomol. News. 47: 264-266 (1936)). Recently, the antireflective and potential camouflage functions of brochosomes were further studied by Yang et al. based on synthetic brochosomes and simulated vision techniques (S. Yang, et al. "Ultra-antireflective synthetic brochosomes." Nature Communications 8: 1285 (2017)). Later, Lei et al. demonstrated the antireflective properties of natural brochosomes in 2020 (C.-W. Lei, et al., "Leafhopper wing-inspired broadband omnidirectional antireflective embroidered ball-like structure arrays using a nonlithography-based methodology." Langmuir 36: 5296-5302 (2020)). Very recently, Wang et al. successfully fabricated synthetic brochosomes with precise geometry akin to those natural ones, and further elucidated the antireflective mechanisms based on the brochosome geometry and their role in reducing the observability of leafhoppers to their predators (L. Wang et al. "Geometric design of antireflective leafhopper brochosomes." Proceedings of the National Academy of Sciences 121: e2312700121 (2024)).

      Thank you very much for your valuable suggestions regarding the revision of the introduction to accurately reflect the relevant contributions in the literature. Based on your feedback, we have thoroughly revised the introduction and added the suggested references to provide a comprehensive context for our study. The details of these revisions can be found in lines 84-94 of the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) In Figure 2E, the data for Male-5d appears to be missing. Please verify and ensure all relevant data is included.

      Thank you for pointing out the issue regarding the data presentation in Figure 2E. We apologize for any confusion caused by the overlapping data points and the less conspicuous color choice for Male-5d. We have carefully reviewed the data and confirmed that all relevant data points, including Male-5d, are indeed present in the dataset. In the revised manuscript, we have adjusted the color scheme for Male-5d and Female-5d in Figure 2E to ensure that both curves are clearly distinguishable, even in areas where they overlap. This adjustment should facilitate a more accurate and convenient observation of the data trends. We appreciate your attention to detail, and we believe these revisions have improved the clarity and readability of the figure.

      (2) In Figure 6, please clarify the reflectance data in the inset. Clearly explain what the blue and light blue curves represent.

      Thank you for your suggestion regarding Figure 6. We have revised the figure to improve clarity. The light blue curve now represents the reflectance measurements of leafhoppers with higher brochosome coverage, while the dark blue curve corresponds to those with lower coverage. These changes, along with updated labels in the figure legend, ensure that the data are clearly distinguishable and easy to interpret. We appreciate your feedback and believe these revisions have enhanced the overall clarity of the figure.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Weaknesses (clarifications needed):

      (1) Experimental Design:

      The study does not mention whether the authors examined sex differences or any measures of attractiveness or hierarchy among participants (e.g., students vs. teachers). Including these variables could provide a more nuanced understanding of group dynamics.

      We are grateful to the reviewer for pointing out this valuable question. We have clarified that future studies should include sex differences or any measures of attractiveness or hierarchy among participants (e.g., students vs. teachers) (p. 27).

      “Finally, future research should investigate additional variables, including sex differences and measures of attractiveness or hierarchy among participants, such as students versus teachers.”  p. 27

      (2) fNIRS Data Acquisition:

      The authors' approach to addressing individual differences in anatomy is lacking in detail. Understanding how they identified the optimal channels for synchrony between participants would be beneficial. Was this done by averaging to find the location with the highest coherence?

      We apologize for missing some details here. We have included the following information in the fNIRS data acquisition and fNIRS data analyses to clarify the details (pp. 8 and 12).

      We employed the one-sample t-test method to assess the GNS disparity between the baseline and task sessions, identifying particular channels of interest. This analysis did not ascertain the maximum coherence level, but rather pinpointed the channel exhibiting significant divergence between the two sessions, which we designated as pertinent to the group decision-making task. Furthermore, we selected the PFC and left TPJ as our reference brain regions, guided by existing literature.

      “Two optode probe sets were used to cover each participant's prefrontal and left TPJ regions (Figure S1). The DLPFC plays a crucial role in group decision-making processes, with findings suggesting that individuals exhibiting reduced prefrontal activity were more prone to out-group exclusion and demonstrated stronger in-group preferences (Goupil et al., 2021; Jankovic, 2014; Yang et al., 2020). Similarly, the left TPJ has been previously reported to be associated with decision-making and information exchange (Freitas et al., 2019; Tindale et al., 2019).”  p. 8

      “Time-averaged GNS (also averaged across channels in each group) was compared between the baseline session (i.e., the resting phase) and the task session (from reading information to making decisions) using a series of one-sample t-tests. Here, p-values were thresholded by controlling for FDR (p < 0.05; Benjamini & Hochberg, 1995). When determining the frequency band of interest, the time-averaged GNS was also averaged across channels. After that, we analyzed the time-averaged GNS of each channel. Then, channels showing significant GNS were regarded as regions of interest and included in subsequent analyses.” p. 12
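
      The channel-selection step quoted above (per-channel tests followed by FDR control) can be illustrated with a minimal sketch of the Benjamini-Hochberg step-up procedure. The channel names and p-values below are hypothetical and are not taken from the study; the per-channel p-values are assumed to have been obtained already from the one-sample t-tests.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha.

    Step-up rule: find the largest rank k such that p_(k) <= (k / m) * alpha,
    then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical per-channel p-values (task GNS vs. baseline GNS).
channel_p = {"CH1": 0.001, "CH2": 0.008, "CH3": 0.039, "CH4": 0.041, "CH5": 0.60}

flags = benjamini_hochberg(list(channel_p.values()), alpha=0.05)
channels_of_interest = [ch for ch, keep in zip(channel_p, flags) if keep]
print(channels_of_interest)
```

      In this toy example CH3 and CH4 have uncorrected p < 0.05 but do not survive the correction, which is the reason for thresholding on the FDR-adjusted criterion rather than on raw p-values.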

      (3) Behavioral Analysis:

      For group identification, the analysis currently uses a dichotomous approach. Introducing a regression model to capture the degree of identification could offer more granular insights into how varying levels of group identification affect collective behavior and performance.

      Thank you for your suggestion. As suggested, we have fitted a regression model to examine how varying levels of group identification affect collective performance, with the group identification score as the independent variable and collective performance as the dependent variable (pp. 9 and 15).

      “Moreover, we employed a regression model to examine how varying levels of group identification affect collective performance, using group identification scores as the independent variable and collective performance as the dependent variable.”  p.9

      “The results from the regression model highlighted a significant association between the degree of group identification and collective performance (β = 0.45, t = 4.56, p = 0.019).”  p.15
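
      As a rough illustration of the kind of model described here, the sketch below fits a simple linear regression with one predictor and reports the slope, its t statistic, and the standardized β. The function name and the data are hypothetical; the reported values above come from the authors' own analysis, which this sketch does not reproduce.

```python
import math


def simple_regression(x, y):
    """Ordinary least squares with one predictor.
    Returns slope, intercept, t statistic of the slope, and standardized beta."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom.
    residual_ss = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    slope_se = math.sqrt(residual_ss / (n - 2) / sxx)
    return {
        "slope": slope,
        "intercept": intercept,
        "t": slope / slope_se,
        # With a single predictor, the standardized beta equals Pearson's r.
        "beta_std": slope * math.sqrt(sxx / syy),
    }


# Hypothetical identification scores (x) and collective performance (y).
identification = [1, 2, 3, 4, 5]
performance = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = simple_regression(identification, performance)
```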

      (4) Single Brain Activation Analysis:

      The application of the General Linear Model (GLM) is unclear, particularly given the long block durations and absence of multiple trials. Further explanation is needed on how the GLM was implemented under these conditions.

      Thank you for your suggestion; we have added more details in this section (p.11).

      “In the GLM analysis, HbO was the dependent variable, and regressors were defined for the different task stages (a. Reading information, b. Sharing private information, c. Discussing information, d. Decision). We then convolved each regressor with the Hemodynamic Response Function (HRF) and obtained, through regression analysis, the brain activation β value of each participant in each channel at each task stage.”  p.11
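
      The block design described above can be sketched as follows: one boxcar regressor per task stage is convolved with an HRF, and β values are estimated by least squares. Everything specific here is an assumption for illustration: the double-gamma HRF shape (shape parameters 6 and 16, undershoot ratio 1/6), the 1 Hz sampling, the 60-second stage durations, and all function names. It is not the authors' pipeline.

```python
import math


def canonical_hrf(t, peak_shape=6.0, undershoot_shape=16.0, ratio=1 / 6):
    """Double-gamma HRF evaluated at time t in seconds (assumed shape)."""
    if t <= 0:
        return 0.0
    peak = t ** (peak_shape - 1) * math.exp(-t) / math.gamma(peak_shape)
    under = t ** (undershoot_shape - 1) * math.exp(-t) / math.gamma(undershoot_shape)
    return peak - ratio * under


def convolve(signal, kernel):
    """Causal discrete convolution, truncated to the length of the signal."""
    out = []
    for i in range(len(signal)):
        upper = min(i + 1, len(kernel))
        out.append(sum(signal[i - k] * kernel[k] for k in range(upper)))
    return out


def build_design(n_samples, stages, dt=1.0, hrf_length=32.0):
    """One HRF-convolved boxcar column per (onset, duration) stage, plus an intercept."""
    kernel = [canonical_hrf(k * dt) * dt for k in range(int(hrf_length / dt))]
    columns = []
    for onset, duration in stages:
        boxcar = [1.0 if onset <= i * dt < onset + duration else 0.0
                  for i in range(n_samples)]
        columns.append(convolve(boxcar, kernel))
    columns.append([1.0] * n_samples)  # intercept column
    return columns


def least_squares(columns, y):
    """Solve the normal equations by Gauss-Jordan elimination with partial pivoting."""
    p = len(columns)
    a = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(p)]
         + [sum(u * v for u, v in zip(columns[i], y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Hypothetical timing: four consecutive 60 s stages sampled at 1 Hz.
stage_timing = [(0, 60), (60, 60), (120, 60), (180, 60)]
design = build_design(260, stage_timing)
```

      Given an HbO time series of the same length, `least_squares(design, hbo)` would return one β per stage followed by the intercept.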

      (5) Within-group neural Synchrony (GNS) Calculation:

      The method for calculating GNS could be improved by using mutual information instead of pairwise summation, as suggested by Xie et al. (2020) in their study on fMRI triadic hyperscanning. Additionally, the explanation of GNS calculation is inconsistent. At one point, it is mentioned that GNS was averaged across time and channels, while elsewhere, it is stated that channels with the highest GNS were selected. Clarification on this point is essential.

      We appreciate the reviewer for highlighting this inquiry. We utilized a conventional GNS calculation approach, as detailed in Line 296 of the manuscript, where the GNS was determined in pairs after the WTC computation, and then averaged. Further details regarding the second question have been provided in the article (p.12).

      (6) Placement of fNIRS Probes:

      The probes were only placed in the frontal regions, despite literature suggesting that the superior temporal sulcus (STS) and temporoparietal junction (TPJ) regions are crucial for triadic team performance. A justification for this choice or inclusion of these regions in future studies would be beneficial.

      The original manuscript clearly stated the use of two optode probe sets to encompass the prefrontal and left TPJ regions of each participant (see Figure S1, p. 8).

      (7) Interpretation of fNIRS Data:

      Given that fNIRS signals are slow, similar to BOLD signals in fMRI, the interpretation of Figure 6 raises concerns. It suggests that it takes several minutes (on the order of 4-5 minutes) for people to collaborate, which seems implausible. More context or re-evaluation of this interpretation is needed.

      The question you have pointed out is very pertinent, and we have added more explanation for this result (pp. 25-26).

      As previous studies have shown, the hemodynamic signal collected by fNIRS rises slowly relative to neuronal activity, which means that it lags behind that activity (Turner et al., 1998). In social interactions such as group decision-making, neural synchronization is delayed because people need time to exchange enough dialogue to improve collaboration efficiency and form the same preference (Zhang et al., 2019). For example, a study of group consensus found that participants showed significant neural alignment after completing a period of dialogue (Sievers et al., 2024). In cooperation tasks, the degree of neural synchronization increases as the tacit understanding between two participants improves (Cui et al., 2012). The generation of neural synchronization therefore depends on interaction over a period of time. We thus believe that the 4-5 minutes of collaboration time shown in Figure 6 may be related to team members establishing consensus and the same preference, which is reflected in the dynamic change of neural synchronization over time.

      Moreover, previous studies on neural synchronization during social interaction and group decision-making revealed that substantial neural synchronization occurred around 50-55 seconds into a teaching task involving prior knowledge (Liu et al., 2019) and persisted approximately 6 minutes into the discussion period (Xie et al., 2023). These results collectively validate the suitability of utilizing fNIRS signal response time in our study (pp. 25-26).

      “Our study has also demonstrated significant increases in single-brain activation, DLPFC-OFC functional connectivity, and GNS at 7, 12, and 17 minutes, respectively, following task initiation. Together, the significant increases in these neural activities construct the two-in-one neural model we proposed, which explains how group identification influences collective performance. As previous studies have shown, the hemodynamic signal collected by fNIRS rises slowly relative to neuronal activity, which means that it lags behind that activity (Turner et al., 1998). In social interactions such as group decision-making, neural synchronization is delayed because people need time to exchange enough dialogue to improve collaboration efficiency and form the same preference (Zhang et al., 2019). For example, participants would exhibit significant neural alignment, but only after they had completed a period of dialogue (Sievers et al., 2024). In cooperation tasks, the degree of neural synchronization increases as the cooperation efficiency between two participants improves (Cui et al., 2012). Therefore, the generation of neural synchronization depends on interaction over a period of time, which can affect the estimation of collaboration time. Prior research has shown that significant neural synchronization between teacher and students could be generated 50-55 seconds after a teaching task with prior knowledge began, which meant that the students and teacher achieved the same goal of learning knowledge (Liu et al., 2019). Moreover, a noteworthy increase in GNS was observed approximately 6 minutes into the group discussion period for better discussing and solving the problem (Xie et al., 2023). These findings are similar to ours. Therefore, the time points we found could reflect the dynamic change over time of the neural process of team collaboration.” pp.25-26

      Reviewer #2 (Public review):

      Weaknesses:

      The authors need to clearly articulate their hypothesis regarding why neural synchronization occurs during social interaction. For example, in line 284, it is stated that "It is plausible that neural synchronization is closely associated with group identification and collective performance...", but this is far from self-evident. Neural synchronization can occur even when people are merely watching a movie (Hasson et al., 2004), and movie-watchers are not engaged in collective behavior. There is no direct link between the IBS and collective behavior. The authors should explain why they believe inter-brain synchronization occurs in interactive settings and why they think it is related to collective behavior/performance.

      Thank you for bringing these points to our attention. We have clarified the relationship between neural synchronization and collective behavior in the Introduction section (p.4). Moreover, in order to investigate whether neural synchronization stems from a common task or environment, we pseudo-randomized all pairs of subjects and created a null distribution consisting of 1,000 pseudo-groups, as described in Lines 311-315. This approach enabled us to eliminate neural synchronization resulting from factors other than social interaction, allowing us to identify neural patterns associated with collective performance (p.12).

      “Moreover, Ni et al. (2024) indicated that neural synchronization was linked to the strength of social-emotional communication and connections between individuals. An increase in neural synchronization has also been shown to predict the coordination and cooperation abilities of group members (Lu et al., 2023). Therefore, we hypothesize that neural synchronization may be related to group performance.” p.4

      “After that, the nonparametric permutation test was conducted on the observed interaction effects on GNS of the real group against the 1,000 permutation samples. By pseudo-randomizing the data of all participants, a null distribution of 1,000 pseudo-groups was generated (e.g., time series from member 1 in group 1 were grouped with member 2 in group 2 and member 3 in group 3). The GNS of 1,000 reshuffled pseudo-groups was computed, and the GNS of the real groups was assessed by comparing it with the values generated by 1,000 reshuffled pseudo-groups.” p.12
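
      The pseudo-group logic described here can be sketched as a permutation test. Two simplifications are assumptions of this sketch and not the authors' method: the synchrony measure is the mean pairwise Pearson's correlation, used as a stand-in for wavelet-based GNS, and pseudo-groups are built by shifting the second and third members across groups so that each pseudo-triad draws its members from three different real groups. All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def pearson(a, b):
    """Pearson's correlation between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def group_sync(members):
    """Stand-in for GNS: mean correlation over the three member pairs."""
    pairs = list(combinations(members, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def permutation_p_value(groups, n_perm=1000, seed=0):
    """Compare mean real-group synchrony with a pseudo-group null distribution.
    `groups` is a list of triads, each a list of three time series."""
    rng = random.Random(seed)
    n = len(groups)  # needs at least 3 groups
    observed = sum(group_sync(g) for g in groups) / n
    null = []
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        # Two distinct non-zero shifts keep all three members from different groups.
        shift2, shift3 = rng.sample(range(1, n), 2)
        pseudo = [[groups[order[g]][0],
                   groups[order[(g + shift2) % n]][1],
                   groups[order[(g + shift3) % n]][2]] for g in range(n)]
        null.append(sum(group_sync(m) for m in pseudo) / n)
    # One-sided p-value with the usual +1 correction.
    p_value = (1 + sum(v >= observed for v in null)) / (n_perm + 1)
    return observed, p_value
```

      Synchrony that arises only from the shared task or environment should appear in the pseudo-groups as well, so a real-group value in the upper tail of the null distribution points to an effect specific to the interacting triads.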

      The authors state that "GNS in the OFC was a reliable neuromarker, indicating the influence of group identification on collective performance," but this claim is too strong. Please refer to Figure 4B. Do the authors really believe that collective performance can be predicted given the correlation with the large variance shown? There is a significant discrepancy between observing a correlation between two variables and asserting that one variable is a predictive biomarker for the other.

      Thank you for your suggestion; we have revised the relevant statement (p.18).

      “Through correlation and regression model analysis, we found that in group decision-making, the increase in group identity would affect group performance by improving GNS in the OFC brain region.”  p.18

      Why are the individual answers being analyzed as collective performance (See, L-184)? Although these are performances that emerge after the group discussion, they seem to be individual performances rather than collective ones. Typically, wouldn't the result of a consensus be considered a collective performance? The authors should clarify why the individual's answer is being treated as the measure of collective performance.

      We appreciate the insightful comment provided by the reviewer. The decision to utilize individual responses as a metric of overall performance is based on several key considerations. First, previous studies on various hidden profile tasks have utilized averaged individual scores to represent collective performance (e.g., Stasser et al., 1995; Wittenbaum et al., 1996; Brockner et al., 2022). Second, while consensus outcomes are typically regarded as collective expressions, we argue that in the context of this study, individual responses are not independent entities but rather extensions of the group decision-making process. The collective deliberation process significantly influenced individual thinking and decision-making in this study. Through group discussions, members shared perspectives, adjusted their stances, and formulated their responses based on collective insights. The responses provided by participants in this study were molded by the dynamics of group conversations, serving as an indirect measure of group performance and potentially indicating the efficacy of collective deliberations.

      Performing SPM-based mapping followed by conducting a t-test on the channels within statistically significant regions constitutes double dipping, which is not an acceptable method (Kriegeskorte et al., 2011). This issue is evident in, for example, Figures 3A and 4A.

      Please refer to the following source: https://www.nature.com/articles/nn.2303

      We have carefully reviewed the articles provided by the reviewer, and we acknowledge the concerns regarding selective analysis and double dipping in our statistical approach. To address this, we believe it is important to clarify this issue further in the Discussion section (pp.26-27).

      Our study introduces a novel perspective while utilizing conventional fNIRS-based hyperscanning analyses (Liu et al., 2019; Pärnamets et al., 2020; Reinero et al., 2021; Számadó et al., 2021; Solansky, 2011), methods that are widely endorsed within the field. In our analysis, significant channels were first identified using a one-sample t-test, followed by additional analyses including ANOVA, independent samples t-tests, and other procedures. We would like to emphasize that the statistical assumptions underlying the one-sample t-test and paired-sample t-test in our study maintain a level of independence. Moreover, to further mitigate concerns about the potential for double dipping, we employed permutation testing to validate the robustness of our results and ensure that our findings are not influenced by biases inherent in the selection of significant regions.

      We recognize the importance of rigorous statistical practices and are committed to upholding the highest standards of analysis. As such, we have revisited our methodology and included a more detailed explanation of the steps taken to avoid double dipping and ensure the integrity of our analyses in the revised manuscript.

      “Although our study has found a new perspective, the analysis method still refers to and uses the traditional fNIRS-based hyperscanning analyses (Liu et al., 2019; Pärnamets et al., 2020; Reinero et al., 2021; Számadó et al., 2021; Solansky, 2011), which is generally accepted by the majority of fNIRS-based hyperscanning researchers. For example, we would first identify significant channels through a one-sample t-test and then conduct further analyses, such as ANOVA or independent samples t-tests. Selective analysis is a powerful tool and is perfectly justified whenever the results are statistically independent of the selection criterion under the null hypothesis (Kriegeskorte et al., 2019). However, it may lead to double dipping and missing information. In this study, the absence of statistically significant TPJ activation in the analyzed data led to the TPJ being ignored. In the future, it should be made explicit in the analysis, and the reliability of the results should be ensured by appropriate statistical methods (e.g., cross-validation, independent data sets, or techniques to control for selective bias).” pp.26-27

      In several key analyses within this study (e.g., single-brain activation in the paragraph starting from L398, neural synchronization in the paragraph starting from L393), the TPJ is mentioned alongside the DLPFC. However, in subsequent detailed analyses, the TPJ is entirely ignored.

      We thank the reviewer for your careful review and valuable comment. TPJ is referenced in certain analyses within this paper (as detailed in paragraphs L414 and L440); however, its role remains inadequately investigated and expounded upon in subsequent more intricate analyses. This is due to the absence of statistically significant TPJ activation in the analyzed data. As pointed out by the reviewer, limitations may exist in pursuing further analyses through ROIs, a point we also have addressed in the Discussion section (p.27).

      The method for analyzing single-brain activation is unclear. Although it is mentioned that GLM (generalized linear model) was used, it is not specified what regressors were prepared, nor which regressor's β-values are reported as brain activity. Without this information, it is difficult to assess the validity of the reported results.

      We have revised the relevant description to clarify the analyses of single-brain activation (p.11).

      While the model illustrated in Figure 7 seems to be interesting, for me, it seems not to be based on the results of this study. This is because the study did not investigate the causal relationships among the three metrics. I guess, Figure 5D might be intended to explain this, but the details of the analysis are not provided, making it unclear what is being presented.

      We regret the confusion that has arisen. Firstly, as highlighted by the reviewer, the model depicted in Figure 7 is not directly derived from the causal analysis conducted in this study. Our investigation did not directly explore the causal relationships among the three indicators; instead, we constructed a model based on correlations and potential mechanisms. In the revised manuscript, we have explicitly stated that Figure 7 represents a descriptive model (p.22).

      Regarding Figure 5D, the reviewer noted that while it may offer some explanatory value, it lacks the necessary analytical detail to elucidate the chart's significance clearly. We have clarified the details of the analysis in Figure 5 (pp.13-14). The model in Figure 5D addresses the connection between the similarity in individual-collective performance and the correlation of brain activation, as well as whether the impact of each individual’s single-brain activation on the corresponding group’s GNS was regulated by their brain activation connectivity.

      “Finally, we employed correlation and mediation analyses to assess if brain activation connectivity could explain the connection between individuals’ single-brain activation and the related group’s GNS. We examined the connection between the similarity in individual-collective performance and the correlation of brain activation, as well as whether the impact of each individual’s single-brain activation on the corresponding group’s GNS was regulated by their brain activation connectivity. We utilized the PROCESS tool in SPSS to investigate the proposed moderation effect. Specifically, we applied Model 1 with 5000 bootstrap resamples to examine the interaction between the independent variable (i.e., single-brain activation) and the moderator (i.e., brain activation connectivity) in predicting the dependent variable (i.e., GNS). It is noteworthy that prior to analysis, all variables in the moderation model were mean-centered to reduce multicollinearity and improve the interpretability of interaction terms.”  p.13-14
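
      The moderation analysis described in this passage was run with the PROCESS tool in SPSS. As a rough stand-in, the sketch below mean-centers the predictor and the moderator, fits Y = b0 + b1·X + b2·W + b3·X·W by least squares, and forms a percentile bootstrap interval for the interaction coefficient b3. The function names are hypothetical, and the percentile interval is an assumption of this sketch; it is not a reimplementation of PROCESS.

```python
import random


def solve_ols(rows, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan)."""
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def moderation_fit(x, w, y):
    """Return [b0, b1, b2, b3] after mean-centering x (predictor) and w (moderator)."""
    mean_x = sum(x) / len(x)
    mean_w = sum(w) / len(w)
    rows = [[1.0, xi - mean_x, wi - mean_w, (xi - mean_x) * (wi - mean_w)]
            for xi, wi in zip(x, w)]
    return solve_ols(rows, y)


def bootstrap_interaction_ci(x, w, y, n_boot=5000, seed=0, alpha=0.05):
    """Percentile bootstrap interval for the interaction coefficient b3."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        # Resample cases with replacement and refit the model.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(moderation_fit([x[i] for i in idx],
                                        [w[i] for i in idx],
                                        [y[i] for i in idx])[3])
    estimates.sort()
    lower = estimates[int(n_boot * alpha / 2)]
    upper = estimates[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper
```

      Here `x` would hold single-brain activation, `w` brain activation connectivity, and `y` GNS; an interval for b3 that excludes zero would be read as evidence of moderation.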

      “Building on the above results, we have developed a two-in-one neural model that explains how group identification influences collective performance. This descriptive model aims to illustrate the potential interrelationships among these indicators and establish a conceptual framework to inspire forthcoming research endeavors.”  p.21

      The details of the experiment are not described at all. While I can somewhat grasp what was done abstractly, the lack of specific information makes it impossible to replicate the study.

      As suggested, we have clarified the details of the experiment in the manuscript.

      (1) As stated in the public review, the details of the experiment are not described at all and while I can somewhat grasp what was done abstractly, the lack of specific information makes it impossible to replicate the study. In points a-e below, I list the aspects that I could not fully understand, but I am not asking for direct answers to these points. Instead, please provide a detailed description of the experiment so that it can be replicated.

      Thank you for your suggestion; we have responded to each question sequentially and elaborated on the experiment specifics to ensure replicability.

      (a) Please provide more detailed information about the Group Identification Task. How much did each participant speak (was there any asymmetry in the amount of speaking, and was there any possibility that the asymmetry influenced the identification rating)? Did the three participants interact in person, or online? Are they isolated from experimenters? How was the rating conducted, what I mean is that it's a PC-based rating?

      We apologize for the lack of detail in our description of the procedures for the experiment.

      For the first question, we draw upon previous studies concerning the manipulation of group identity while controlling the content of pre-task conversations. Specifically, the high-identity group engaged in self-introductions and identified similarities among the three members, whereas the low-identity group discussed topics related to the current semester's classes (Xie et al., 2023; Yang et al., 2020). Both discussions were conducted for the same duration of three minutes, ensuring that the number of exchanges between the two groups remained comparable. There was almost no asymmetry in the amount of speaking. We also conducted a manipulation check, which confirmed the effectiveness of our identity manipulation (pp.5-6).

      Xie, E., Li, K., Gu, R., Zhang, D., & Li, X. (2023). Verbal information exchange enhances collective performance through increasing group identification. NeuroImage, 279, 120339.

      Yang, J., Zhang, H., Ni, J., De Dreu, C. K., & Ma, Y. (2020). Within-group synchronization in the prefrontal cortex associates with intergroup conflict. Nature neuroscience, 23(6), 754-760.

      “Both discussions were conducted for the same duration of three minutes, ensuring that the number of exchanges between the two groups remained comparable.”  p.5-6

      For the second question, the three participants interacted offline in a face-to-face setting, while the experimenter remained outside the laboratory (p.6).

      “The three participants conducted face-to-face offline interaction throughout the manipulation process.” p.6

      For the third question, at the beginning of the experimental task, participants were isolated from the experimenters (p.6).

      “In addition to explaining the next phase of the task and controlling the timer, experimenters would be isolated from participants.” p.6

      For the last question, the rating of group identification was conducted through a questionnaire presented on participants’ phones (p.6).

      “The questionnaire was presented on participants’ phones.” p.6

      (b) The procedures of the Main Task are also unclear. For the Reading Information (5 min): How was the information presented? PC-based or paper-based? How were the participants seated? Did they read it independently?

      We apologize for the missing details. We have included the following information in the article.

      For the first and last question, each participant would get a piece of paper, which presents the common information and private information. They read independently. (p.6)

      “Each participant would get a piece of paper, which presented the information. Participants could read independently.” p.6

      As for seating, the three participants sat around a table without partitions between them. Only in the discussion stage could they communicate face-to-face (p.6).

      “They sat around a table without partitions between each other.” p.6

      “In this process of discussion, the participants were able to communicate face-to-face and verbally.” p.6

      (c) For Sharing Private Information: The authors stated they share text messages using Tencent Meeting. If so, how and with what devices? How was the information displayed on the screen? Were the participants even in the same room?

      Thank you for your reminder. We have added more details now (p.6). Firstly, the experimenter sent the Tencent Meeting link to the participants. After the participants entered the meeting through their mobile phones, they could text the information they wanted to share in the chat box of the meeting. They were in the same room, and because Tencent Meeting recorded the shared information, the participants could view it at any time.

      “During the group sharing, participants entered Tencent Meeting via their mobile phones and were able to text their private information in the chat box to their group members for 5 minutes.” p.6

      (d) For Discussing Information: It's a verbal interaction. How did they interact with others? What is the distance between them? I found a very small picture in Figure 8, but that is all information about experiment settings, that is provided by the authors.

      We are sorry about the missing details. As we have explained in the article it’s a verbal communication, so participants could talk face to face in one room. We have included the following information in the article (p.6).

      “Participants were sitting and communicating around a table. The distance between adjacent participants was about 15 cm, and the distance between face-to-face participants was about 40 cm. In this process of discussion, the participants were able to communicate face-to-face and verbally.” p.6

      (e) For the Decision Process (5 min): How did they answer (What I mean is verbally, writing, or computer-based input), and how did the experimenters record these answers?

      The questions were presented on paper, so the participants could write down their answers and experimenters could count the answers on paper. We have included the following information in the article (p.7).

      “After discussion, all triads were given 5 minutes to answer the following questions (i) the probability of three suspects, 0%-100% for each suspect; (ii) the motivation and tool of crime; and (iii) deduced the entire process of crime. The three questions were presented on paper, allowing participants to write their answers directly on the same sheet. Subsequently, three independent raters used these paper questionnaires to record and calculate the scores for each group.” p.7

      (2) I find the model presented in Figure 7 to be intriguing. Understanding why inter-brain synchronization occurs and how it is supported by specific single-brain activations or intra-brain functional connectivity is indeed a critical area for researchers conducting hyperscanning studies to explore. However, the content depicted in this model is not based on the results of this study. This is because the study did not investigate the causal relationships among the three metrics. I guess, Figure 5D might be intended to explain this, but the details of the analysis are not provided, making it unclear what is being presented. Please include a detailed explanation.

      The specific answers are available on page 5 of our response letter.

      (3) The analysis of single-brain activation analysis (and probably other analyses) focuses on the period from reading to making decisions (L237). Why was this entire interval chosen for analysis? Reading does not involve social interaction. As mentioned in a previous comment, the details of the tasks are unclear, so it's difficult to understand what was actually done in the reading period. Anyway, why were these different phases combined as the focus of analysis? Please clarify the reasoning behind this choice.

      Thank you for your feedback. The decision to analyze the entire interval, spanning from reading to decision-making, was primarily made to grasp the continuum of information processing comprehensively. While reading itself lacks social interaction, it serves as the foundation for subsequent decision-making, during which participants' cognitive states and affective responses gradually evolve. Therefore, examining these two phases collectively enables a more thorough investigation into how information influences decision-making. Furthermore, considering the task details remain ambiguous, we aim to uncover the underlying cognitive and affective mechanisms through a holistic analysis.

      (4) The method for analyzing single-brain activation is unclear. Please provide a detailed description of the analysis methods.

      Thank you for your suggestion; we have added more details in the Method section (p.11).

      “In the GLM analysis, HbO was the dependent variable, and regressors were defined for the different task stages (a. Reading information, b. Sharing private information, c. Discussing information, d. Decision). We then convolved each regressor with the Hemodynamic Response Function (HRF) and obtained, through regression analysis, the brain activation β value of each participant in each channel at each task stage.”  p.11

      (5) In the periods of Reading Information and Sharing Private Information, there appears to be no social interaction between participants (Figure1D). However, Figure 6 shows an increase in brain activity correlation even during the first 10 minutes (it corresponds to the Reading and Sharing period). Why does inter-brain correlation (GNS, in this study) increase even though there is no interaction between participants? Please provide an explanation.

      Sharing private information fosters interactive engagement, necessitating its exchange during Tencent Meetings to facilitate sharing. Previous research suggests that heightened correlations in brain activity can be attributed to (1) intrinsic cognitive processes, wherein participants display similar cognitive and emotional responses, fostering shared cognitive processing and brain activity synchronization despite limited external interaction; (2) emotional connections, as divulging private information elicits emotional responses that can be neurally correlated among individuals; and (3) environmental influences, where shared environments and contexts prompt neural interaction among participants even in the absence of direct social engagement. These factors collectively contribute to increased brain activity correlations without active interaction. Our primary focus, however, lies in the phase characterized by significant synchronized brain activity.

      Minor Comments:

      (6) Equation 1 Explanation: There is no explanation of Equation 1. It mentions Yi as the collective score, but what constitutes the collective score Yi is not defined in the manuscript. Additionally, while "i" is referred to as an item (in Line 196), the meaning of "item" is not clear. Therefore, the meaning of this equation is not understood.

      We apologize for this confusion. We have added a description in the manuscript (p.9).

      “In Eq.1, x is the individual score, y is the collective score (y is calculated from the three per capita scores), and i stands for the group number for the item. So, x_i means the individual score of participants in group i, and y_i means the collective score of group i. d(x, y) represents the distance from the individual to the collective score.”  p.9

      (7) Equation 2 Explanation: There is no explanation for Equation 2. Please provide descriptions for all variables such as S, t, and w.

      We clearly stated the meaning of s, t, and W in the first version of the manuscript (p.12).

      As shown in L291-293: Here, t denotes the time, s denotes the wavelet scale, 〈⋅〉 represents a smoothing operation in time, and W is the continuous wavelet transform (Grinsted, Moore, & Jevrejeva, 2004).
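
      Equation 2 itself is not reproduced in this response. For orientation, the wavelet transform coherence of Grinsted et al. (2004) can be written with the symbols defined above as follows. The notation is an assumption of this note: W^{xy} denotes the cross-wavelet transform of signals x and y, and the angle brackets stand for the smoothing operation. The second line assumes that the three pairwise values of a triad are averaged, as stated in the response to point (5).

```latex
\mathrm{WTC}(t,s) =
  \frac{\left|\left\langle s^{-1}\, W^{xy}(t,s) \right\rangle\right|^{2}}
       {\left\langle s^{-1} \left|W^{x}(t,s)\right|^{2} \right\rangle \,
        \left\langle s^{-1} \left|W^{y}(t,s)\right|^{2} \right\rangle}

\mathrm{GNS}(t,s) =
  \frac{1}{3}\left[ \mathrm{WTC}_{12}(t,s) + \mathrm{WTC}_{13}(t,s) + \mathrm{WTC}_{23}(t,s) \right]
```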

      (8) Acronyms: Please define all acronyms upon their first appearance (e.g., CFI, TLI, RMSEA in L380).

      We apologize for these mistakes, and we have added full explanations for abbreviations upon their first use (p.16).

      “The mediation model demonstrated a satisfactory fit (CFI = 0.93, TLI = 0.93, RMSEA = 0.04) (CFI-Comparative Fit Index; TLI-Tucker-Lewis index; RMSEA-Root-Mean-Square Error of Approximation), suggesting that the perceived group identification of each individual affected the alterations in single-brain activations in the DLPFC, consequently leading to variations in their performance (β<sub>a</sub> = 0.16, t = 2.20, p = 0.030; β<sub>b</sub> = 0.26, t = 3.56, p < 0.001; β<sub>c</sub> = 0.18, t = 2.34, p = 0.020) (Figure 3C).”  p.16

      (9) Hyperscanning fMRI Studies: Since there are hyperscanning fMRI studies analyzing communication among three people (e.g., Xie et al., 2020, PNAS), it would be beneficial to cite this research. pnas.org/doi/pdf/10.1073/pnas.1917407117.

      As suggested, we have cited this paper. (p.4)

      (10) Line 272; Line 275: Should these references be to Benjamini & Hochberg (1995)?

      As suggested, we have revised our citation.

      (11) Research Objectives: The authors' aim seems to be understanding the relationship between Group Identification Level (High or Low), collective performance, and inter-brain synchronization (GNS). If so, shouldn't the results shown in Figure 6 illustrate how these differ between High and Low groups?

      We are grateful to the reviewer for your insightful comment. This study aimed to investigate the impact of group identity levels on collective performance and interbrain synchronization. Our analysis primarily focused on inter-group disparities to elucidate the potential influence of varying levels of group identification on collective behavior and neural synchrony, as highlighted by the reviewer. It is important to note that the relationship between group identification levels and collective performance, as well as neural synchronization, may represent a continuous or correlational process, rather than a binary comparison between two distinct groups. Notably, we treated group identification as a continuous variable and, consequently, Figure 6 was designed to illustrate trends in the association between group identification levels and both collective performance and neural synchronization, without conducting significance tests between groups. We are confident that the depiction in Figure 6 effectively captures the evolving dynamics between group identification levels and both collective performance and neural synchronization.

      (12) Figure 6 Star-Marker: What is the star marker shown in Figure 6? Please provide an explanation.

      We apologize for this confusion. We have added this explanation to the article. (p.21)

      “The red star sign indicates that at this time point, the neural signal began to increase significantly.” p.21

      (13) Pearson's Correlation: Use "Pearson's correlation" instead of "Pearson correlation."

      Thank you for your comment. We have changed "Pearson correlation" to "Pearson's correlation" in a total of 10 places in the original text (pp. 9, 11, 13, 15, 16, 19, 23).

      “Moreover, the Pearson’s correlation was used to examine the relationship between group identification_2 and collective performance.” p.9

      “Subsequently, we used Pearson’s correlation analyses to investigate the relationship between single-brain activation and individual performance.” p.11

      “Second, the Pearson’s correlation between GNS and collective performance was performed.” p.13

      “Following that, we analyzed Pearson’s correlations between the original HbO data in the region related to individual and collective performance, denoted as brain activation connectivity (Lu et al., 2010).” p.13

      “Subsequently, the Pearson’s correlation between the quality of information exchange and collective performance was assessed.” p.15

      “Furthermore, the results of the Pearson’s correlation indicated that groups with higher group identification were more likely to exhibit better collective performance (r = 0.38, p = 0.003) (Figure 2B).” p.15

      “The Pearson’s correlation and its associated analyses were based on the data from group identification_2. *p < 0.05.” p.16

      “We first extracted the HbO brain activities related to individual performance (e.g., DLPFC, CH4) and collective performance (e.g., OFC, CH21) of each group member and conducted a Pearson’s correlation between the two.” p.19

      “Subsequently, Pearson’s correlation was used to test whether individual differences in the similarity in individual-collective performance were reflected by DLPFC-OFC connectivity.” p.19

      “Pearson’s correlation showed that the higher quality of information exchange, the better collective performance (r = 0.36, p = 0.007) (Figure 8C).” p.23
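
      As a reminder of what the statistic in these passages measures, the following is a minimal sketch of Pearson's correlation coefficient in plain Python. The function name `pearson_r` and the example values are hypothetical and are not taken from the study; the authors' actual analysis software is not stated here.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical, made-up values for illustration only (not data from the study).
group_identification = [3.0, 4.5, 5.0, 6.5, 7.0]
collective_score = [10.0, 12.0, 11.0, 15.0, 16.0]
r = pearson_r(group_identification, collective_score)
```

      A positive r, as reported for group identification and collective performance, indicates that the two measures tend to increase together.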

      (14) MNI Coordinates: The MNI coordinates for each channel are listed in the supporting information. How were these coordinates measured? Were they consistent for all participants? Was MRI conducted for each participant to obtain these coordinates?

      Thank you for the reminder; we have included the necessary details in the revised version. To clarify, we referred to previous literature to determine the placement of the optical probe plates. After data collection was complete, we used the Vpen positioning system to locate the detection light poles and obtain the MNI positioning coordinates. These coordinates were largely consistent across participants. (p.8)

      “For each participant, one 3 × 5 optode probe set (8 emitters and 7 detectors forming 22 measurement points with 3 cm optode separation, see Table S1 for detailed MNI coordinates) was placed over the prefrontal cortex (reference optode is placed at Fpz, following the international 10-20 system for positioning). The other 2 × 4 probe set (4 emitters and 4 detectors forming 10 measurement points with 3 cm optode separation, see Table S2 for detailed MNI coordinates) was placed over the left TPJ (reference optode is placed at T3, following the international 10-20 system for positioning). The probe sets were examined and adjusted to ensure consistency of the positions across the participants. After the completion of data collection, we utilized the Vpen positioning system to accurately locate the detection light poles, ultimately obtaining the MNI positioning coordinates.”  p.8

    1. Author response:

      Reviewer #1:

      A) The presentation of the paper must be strengthened. Inconsistencies, mislabelling, duplicated text, typos, and inappropriate colour code should be changed.

      We will revise the manuscript to correct the abovementioned issues.

      B) Some claims are not supported by the data. For example, the sentence that says that "adolescent mice showed lower discrimination performance than adults" (l.22) should be rewritten, as the data does not show that for the easy task (Figure 1F and Figure 1H).

      We will carefully review, verify claims, and correct conclusions where needed.

      C) In Figure 7 for example, are the quantified properties not distinct across primary and secondary areas?

      We will analyse the data in Figure 7 separately for AUDp and secondary auditory cortices to test regional differences. Additionally, we will provide a table summarizing key neuronal firing properties for each area during passive recordings to clarify how activity varies across cortical subregions and developmental stages.

      D) Some analysis interpretations should be more cautious. (..) A lower lick rate in general could reflect a weaker ability to withhold licking, as indicated on l.164, but also many other things, such as a lower frustration threshold, lower satiation, more energy, etc.

      We will address issues around lick bias including alternative explanations, such as differences in motivation or impulsivity.

      Reviewer #2:

      A) For some of the analyses that the authors conducted it is unclear what the rationale behind them is and, consequently, what conclusion we can draw from them.

      We will edit the discussion and clarify these points. In addition, we will adjust and extend the methodology section to clarify the rationale of our analysis.

      B) The results of the optogenetic manipulation, while very interesting, warrant a more in-depth discussion.

      We agree that the effects observed in our optogenetic manipulation warrant further discussion. We will extend the analysis and discussion of ACx silencing.

      Reviewer #3:

      A) One fact that could help shed light on this would be to know how often the animals licked the spout in between trials. Finally, for the head-fixed version of the task, only d' values are reported. Without the corresponding hit and false alarm rates (and frequency of licking in the intertrial interval), it's hard to know what exactly the animals were doing.

      We recognize the need for a more nuanced analysis for the head-fixed version of the task. We will extend the behavioral analysis and provide more details to clarify these points.
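
      To illustrate why d' alone leaves the behavior underdetermined, here is a minimal sketch of how d' and the response criterion are derived from hit and false alarm counts. The function name, the log-linear correction (adding 0.5 to each count), and the trial counts are assumptions made for illustration; they are not taken from the manuscript.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return (d', criterion c, hit rate, false alarm rate).

    Uses a log-linear correction (assumed here) so that rates of exactly
    0 or 1 do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    sensitivity = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return sensitivity, criterion, hit_rate, fa_rate


# Hypothetical trial counts for one session (illustration only).
dp, c, hit_rate, fa_rate = d_prime(hits=90, misses=10,
                                   false_alarms=10, correct_rejections=90)
```

      Two animals can reach a similar d' with very different criteria, for example one that licks on almost every trial and one that rarely licks, which is why the hit and false alarm rates are informative alongside d'.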

      B) There are some instances where the citations provided do not support the preceding claim. For example, in lines 64-66, the authors highlight the fact that the critical period for pure tone processing in the auditory cortex closes relatively early (by ~P15). However, one of the references cited (ref 14) used FM sweeps, not pure tones, and even provided evidence that the critical period for this more complex stimulus occurred later in development (P31-38). Similarly, on lines 72-74, the authors state that "ACx neurons in adolescents exhibit high neuronal variability and lower tone sensitivity as compared to adults." The reference cited here (ref 4) used AM noise with a broadband carrier, not tones.

      We appreciate the reviewer pointing out instances where our citations may not fully support our claims. We will carefully review the relevant citations and revise them to ensure they accurately reflect the findings of the cited studies. We will update references in lines 64–66 and 72–74 to better align with the specific stimulus types and developmental timelines discussed.

      C) Given that the authors report that neuronal firing properties differ across auditory cortical subregions (as many others have previously reported), why did the authors choose to pool neurons indiscriminately across so many different brain regions?

      We agree that pooling neurons from multiple auditory cortical regions could potentially obscure region-specific differences. However, we addressed this concern by analyzing regional differences in neuronal firing properties, as shown in Supplementary Figures S4-1 and S4-2, and Supplementary Tables 2 and 3. Additionally, we examined stimulus-related and choice-related activity across regions and found no significant differences, as presented in Supplementary Figure S4-3. Please see our response to Reviewer 1, where we further elaborate on this point.

      D) And why did they focus on layers 5/6? (Is there some reason to think that age-related differences would be more pronounced in the output layers of the auditory cortex than in other layers?)

      We acknowledge that other cortical layers are also of interest and may contribute differently to auditory processing across development. Our focus on layers 5/6 was motivated by both methodological considerations and biological relevance. These layers contain many of the principal output neurons of the auditory cortex, and are therefore well positioned to influence downstream decision-making circuits. We will clarify this rationale in the revised manuscript and note the limitations of our approach.

    1. Let’s face it, very few people read the “terms and conditions,” or the “terms of use” agreements prior to installing an application (app). These agreements are legally binding, and clicking “I agree” may permit apps (the companies that own them) to access your: calendar, camera, contacts, location, microphone, phone, or storage, as well as details and information about your friends.  While some applications require certain device permissions to support functionality—for example, your camera app will most likely need to access your phone’s storage to save the photos and videos you capture—other permissions are questionable. Does a camera app really need access to your microphone? Think about the privacy implications of this decision. When downloading an app, stop and consider: Have you read the app’s terms of use? Do you know what you’re giving the app permission to access? (e.g., your camera, microphone, location information, contacts, etc.) Can you change the permissions you’ve given the app without affecting its functionality? Who gets access to the data collected through your use of the app, and how will it be used? What kind of privacy options does the app offer?

      I think there is something that needs changing beyond how we interact with EULA (End User License Agreements) when we get access to an app. Here in the US, EULAs are complex and long, which is what makes us click agree without reading. If our nation could implement functions like nations in Europe have for EULAs, we could keep them simpler and readable, which is better for the consumer. I think this is most of the real solution, fixing the EULAs themselves, not fixing how we read the EULAs.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2024-02655

      Corresponding author(s): Thierry SOLDATI

      1. General Statements [optional]

      The emergence of powerful model organisms for infection studies accelerates discoveries in innate immunity and conserved cell-autonomous defence mechanisms. Using the genetically tractable Dictyostelium discoideum/Mycobacterium marinum infection platform, we explored the critical interplay between pathogen-induced membrane damage and host repair pathways.

      Recent findings highlight evolutionarily conserved membrane repair pathways as crucial for cellular integrity against both sterile and pathogenic insults. We previously demonstrated the involvement of ESCRT and autophagy machineries in repairing membrane damage and containing pathogenic mycobacteria within vacuoles. Crucially, we uncovered that TrafE, an evolutionarily conserved TRAF-like E3 ubiquitin ligase, coordinates these machineries to repair membrane damage, preventing cell death.

      Here, we reveal that pathogenic mycobacteria manipulate host membrane microdomain scaffolding proteins and sterols to enhance toxin activity and facilitate bacterial escape. Genetic knockout of these microdomain organizers and sterol depletion significantly reduce membrane damage and bacterial escape, effectively containing mycobacteria and increasing host resistance. The conserved roles of flotillin and sterols are confirmed in murine microglial cells, underscoring evolutionary conservation.

      These discoveries significantly advance understanding of intracellular host-pathogen interactions, offering broad implications for cellular microbiology and immunology and have already attracted wide interest at major international scientific meetings.

      Thanks to the constructive criticisms and suggestions of the referees, we were able to significantly enhance the manuscript by integrating novel experimental strategies and improving presentation and discussion of previous results that together further strengthen our evidence.

      2. Point-by-point description of the revisions

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The proposed study aims to elucidate the role of membrane microdomains and associated proteins (Vacuolin A, B, and C) during the infection of Dictyostelium discoideum (Dd) amoebae by Mycobacterium marinum (Mm). The results demonstrate that Vacuolins are required for Mm virulence, and that the presence of membrane microdomains is essential for phagosome membrane damage and bacillary escape into the cytosol, key steps in establishing a successful infection and subsequent bacterial proliferation. The study is well-designed, employing methodologies with which the authors have demonstrated expertise. Overall, it is methodologically sound, and most conclusions are well-supported by the presented data. However, some points require clarification.

      We thank the referee for their positive evaluation of the scope and strengths of our manuscript. The constructive criticisms of the referees were important to guide our revisions. We are convinced that the new data now integrated further strengthen our evidence.

      Major Points:

      The study aims to link the function of Dd Vacuolins to their potential facilitating role in phagosome escape and overall infection by Mm. To phenocopy the effect of Vac-KO, the authors used MβCD. Strikingly, this compound had a more significant impact on phagosome escape compared to Vac-KO, which either did not affect or only mildly affected this process. This likely reflects a difference in the underlying mechanisms being studied. Vac-KO cells may lack well-organized membrane domains but could retain a similar overall membrane composition. In contrast, MβCD disrupts these domains by chelating cholesterol, thus altering both the membrane composition and the domains themselves. This may explain why EsxA partitioning is more affected by MβCD than by triple KO. Consequently, this suggests that cholesterol, rather than the mere presence of membrane domains, plays a critical role in EsxA partitioning and activity in the phagosome. And even if LLOMe activity was lower in Vac-KO cells, this might be explained by the compartment targeted, the lysosomes, whose membrane composition may differ from that of the MCV. These points should be further discussed in the discussion section.

      The referee is right on target: these are all excellent points, and we fully agree with the argumentation. If we compare EsxA to a cholesterol-dependent PFT such as SLO, the presence of sterol is an absolute requirement for pore formation, but the local concentration of sterols achieved via clustering and the organisation of lipids/sterols in microdomains "only" increases efficiency (see for example PMID: 39835825). Therefore, the respective impacts of vac-KO and CD treatment differ in "intensity", and are additive in most assays, but do not result from "different underlying mechanisms". The simplest and most plausible interpretation of the combined results is that EsxA requires a threshold of local concentration/clustering of sterols to act, and vacuolins/flotillins are one of the means to achieve it. In other words, membrane composition inhomogeneities exist in physiological membranes, particularly sterol and sphingolipid clustering in rafts, and microdomain organisers probably regulate their size and dynamics. Without vacuolin/flotillin, these inhomogeneities persist. Only when sterol is depleted and/or redistributed do they disappear. In brief, the local sterol concentration is the trigger for EsxA preferential partitioning and activity, and many factors besides microdomain organisers influence it.

      The second interesting point is that LLOMe is a lysosomotropic membrane damaging agent, whereas EsxA targets the MCV membrane. We have already documented that the MCV has some endo-lysosomal properties and potentially most closely resembles the "post-lysosomal" compartment, characterized by a mildly acidic pH (pH ~6) and the presence of Rab7 and of zinc, ammonium and copper transporters, for example. Our experiments also show that LLOMe is active in the whole endo-lysosomal pathway, including these post-lysosomes (PMID: 30596802, PMID: 37070811). The exact lipid composition of the MCV and post-lysosomes is not known, but both accumulate sterols in a similar manner. Both compartments are also akin to multivesicular bodies. These data are no direct proof but are compatible with our conclusions that both LLOMe and EsxA require a similar threshold of local sterol concentration and that vacuolins are a means to achieve this.

      The presentation of these conclusions has been revised and enhanced in the discussion (for example lines 396-400 and 437-439).

      Despite these similarities between LLOMe and EsxA activities, note that the mature MCV can be distinguished from all other endo-lysosomal compartments by the use of a Flipper probe that is sensitive to lipid composition and packing (Fig. 7C, and see below). In addition, RNAseq analyses of the impact of vac-KO and sterol depletion on infected and non-infected cells also highlight the interdependence between sterol concentration and vacuolin expression (Fig. 3G, 4G and H, Fig. EV5 and 6, and see below).

      Based on this observation, in figure 2, does the D4H/filipin signal or association increase over time as the Vac signal "solidifies"? In Vac-KO cells, does the mScarlet-D4H signal change in intensity or pattern (building on fig. S1)? These insights could provide valuable information on cholesterol levels at the MCV in KO versus wild-type cells. If possible, the authors should quantify fluorescence or the frequency of signal association.

      Qualitatively, sterols, as visualised by filipin and D4H, are present at all stages of the endo-lysosomal pathway and of MCV biogenesis. However, there are many technical difficulties linked to a quantitative assessment, so please allow us to present the framework. First, despite their wide use, the exact mechanism of binding of both reporters and which pool of sterol they visualise is still a mystery. This is often expressed as "they detect the accessible pool" of sterol, whatever it is. In addition, filipin detects sterols in both leaflets (and in intra-lumenal vesicles and other lipidic structures), while D4H detects sterols only in the cytosolic leaflet, and it is not known whether both leaflets have the same concentration of sterols. It is also known that filipin signal is only indirectly proportional to the sterol quantity in a cell, as measured by other quantitative methods. One of the best examples comes from studying the cellular phenotype of Niemann-Pick Type C disease, because many publications report a strong increase of filipin staining, whereas lipidomic analyses show at best a two-fold increase in cholesterol in NPC deficient cells. Moreover, technically speaking, D4H is a live probe, and fixation leads to some loss of localisation, probably because sterols are not fixable. On the other hand, filipin is mainly used after chemical fixation, but again sterols are not fixable, and the signal is very likely restricted to the membrane of origin, but not necessarily to the microdomains.

      All this to admit that, despite numerous and rigorous attempts, we have not been able to reliably obtain quantitative measurements of either filipin or D4H signals. These features likely also explain why we were not able to document changes in "patterns" of signals during MCV maturation. We ask for the referee's indulgence about this. Vacuolins remain the best microdomain morphology reporters.

      We nevertheless present additional qualitative D4H and VacC colocalization images in Fig. EV1C.

      Additionally, since Vacuolins do not have a significant impact on phagosome damage or escape, the difference in intracellular growth may be indirect, as suggested in the team's previous study on Vacuolins (DOI: 10.1242/jcs.242974). The authors measured MCV pH in figure S6. Could they repeat this experiment to test whether Vacuolins affect MCV maturation? This was investigated in a previous version of the pre-print (DOI: 10.1101/2021.11.16.468763), and if the results still hold, it would strengthen the hypothesis that Vacuolins promote escape by modulating membrane organization, rather than influencing phagosome maturation.

      First, we respectfully disagree that vacuolins have no impact on membrane damage. We explained above why this impact is limited but nevertheless additive with sterol depletion in most assays, during infection and sterile damage.

      We thank the referee for their excellent knowledge of the literature. Indeed, we previously went to extreme experimental sophistication to interrogate the impact of vac-KO on endo-phagosomal maturation. We were able to demonstrate that the major impact is on the recycling of phagocytic receptors and therefore on the cytoskeleton- and motor-induced deformation of the membrane in a cup that is essential for efficient phagocytosis (but not macropinocytosis). We also demonstrated a minimal effect on maturation, on the kinetics of pH change and delivery/recycling of hydrolases, but these cell biological differences did not translate into an impact on bacteria killing and digestion. As mentioned above, the MCV shares characteristics with post-lysosomes, but minimal alterations of endo-lysosomal maturation in vac-KO cells should not be responsible for the strong effect on Mm infection. In other words, we are convinced that these minimal (mainly loss-of-function) perturbations that do not impact killing of food bacteria do not lead to an increased phagosomal "ferocity" and restriction of tough mycobacteria.

      Consequently, we decided not to repeat experiments to measure the pH around wt Mm in vac-KO cells, as it is in any case only slightly and transiently acidified in wt host cells, and previous work did not reveal major differences in endolysosomal compartment pH control (PMID: 32482795). But we agree with the referee that some of the MCV maturation data presented in the previous bioRxiv version are interesting for specialists, despite the indications of extremely small alterations between wt and vac-KO host cells. These data document that in the absence of vacuolins, MCV characteristics are slightly altered, but we found no indication that they are more bactericidal in vac-KO cells (Fig. EV8F-H).

      Finally, as a substantial part of this manuscript relies on microscopy and image analysis, the methods section should detail how these analyses were performed. Specifically, for figure 1f, it is unclear how the cells were segmented and fluorescence quantified: was total fluorescence per cell measured, or was an average value used? Figures 5c and 5h could be moved to the supplementary material, and the segmentation method should be explained in the methods section. Additionally, statistical analysis should be more clearly described, justifying the use of one-way or two-way ANOVA, and specifying the post-hoc tests used for group comparisons.

      We fully agree with the referee and have therefore improved the detailed description of image analyses. For example, details for cell segmentation in images originating from infection and LLOMe experiments are succinctly described in the Materials & Methods section (lines 585-588, 594-597, and 639-640), but we now also refer to a methods chapter in press that describes in detail the whole segmentation pipeline (Perret et al. 2025).

      Concerning specifically Fig. 1F, we distinguished infected or bystander cells by the presence of bacteria and quantitated the maximal fluorescence intensity for each cell. Then, we decided on an arbitrary threshold of intensity of 5,000, that corresponds to the maximal signal observed for cells in mock conditions. Then, we quantified the percentage of bystander and infected cells with a higher-than-threshold (>5,000) vacuolin signal intensity. This clarification is now added to the legend of Fig. 1F.
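
      The thresholding step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name, the input format (one maximal intensity value per segmented cell), and the example values are hypothetical. Only the threshold of 5,000 and the "higher-than-threshold" rule come from the description above.

```python
def percent_above_threshold(max_intensities, threshold=5000):
    """Percentage of cells whose maximal signal intensity exceeds the threshold.

    max_intensities: one maximal fluorescence value per segmented cell.
    threshold: 5,000 is the value stated in the response (the maximal signal
    observed in mock conditions); cells exactly at the threshold do not count.
    """
    if not max_intensities:
        raise ValueError("no cells to quantify")
    above = sum(1 for value in max_intensities if value > threshold)
    return 100.0 * above / len(max_intensities)


# Hypothetical per-cell maxima, already split by the presence of bacteria.
infected_cells = [7200, 6400, 4800, 9100]
bystander_cells = [3100, 5200, 4400, 2900]

pct_infected = percent_above_threshold(infected_cells)
pct_bystander = percent_above_threshold(bystander_cells)
```

      Using the per-cell maximum rather than the mean makes the measure sensitive to a bright local signal even when most of the cell is dim.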

      The statistical analyses applied are described in more detail in each figure legend.

      Reviewer #1 (Significance (Required)):

      This study provides the first direct evidence of the importance of membrane composition and organization in the virulence of Mycobacterium marinum, particularly in facilitating phagosome damage and bacillary escape. Using the well-established model of Dictyostelium discoideum infected with M. marinum, which has frequently been predictive of Mycobacterium tuberculosis behavior within phagosomes, the authors contribute critical insights into the mechanisms of mycobacterial phagosome escape, a key step in cellular invasion and dissemination. These findings have the potential to inform strategies aimed at blocking this escape mechanism, which, as demonstrated in this study, could prevent intracellular bacterial growth.

      This work is significant for advancing our understanding of mycobacterial pathogenesis, particularly by linking membrane microdomain composition to bacterial virulence. It will be highly relevant to researchers studying mycobacteria, intracellular pathogens, and host-pathogen interactions. While the study's use of M. marinum provides valuable insights, a limitation is that these results may not fully translate to M. tuberculosis, and further testing with the latter pathogen will be essential.

      We sincerely thank the referee for their very strong appraisal of our contributions, past and present, which is much appreciated. We agree that the translation of our findings to Mtb and macrophages is not guaranteed ... but has turned out to be surprisingly and satisfyingly consistent in the past. To our delight, a recent article in Nature Communications reports about "Paired analysis of host and pathogen genomes identifies determinants of human tuberculosis" and clearly identified flotillin-1 as a susceptibility factor for tuberculosis (PMID: 39613754). We have introduced a sentence in the discussion that reads "Importantly and consistently with our findings, recent work has revealed flotillins as a major determinant of the fate of Mtb infection in patients, because overexpression of flotillin-1, resulting from particular allele variants, is a host susceptibility factor for Mtb infection (PMID: 39613754)." (Lines 477-480)

      I am an expert in the infection of macrophages by Mycobacterium tuberculosis, the phagosome escape mechanism, and its associated effectors. I also have expertise in microscopy and image analysis. However, I do not specialize in Dictyostelium discoideum biology.


      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The authors of this manuscript reported that EsxA, a secreted virulence factor of Mtb or Mm, causes membrane lysis in sterol-rich microdomains. They used the Mm-infected amoeba as an infection model, and characterized the effects of microdomains in the Mycobacterium-containing Vacuole (MCV) on EsxA-mediated membrane disruption. They found that disruption of the microdomains through knockout of vacuolins or sterol depletion diminished Mm-induced membrane damage and cytosolic escape. They also found that vacuolins and sterol are essential for EsxA inserting into the membranes in vitro, and flotillin knockdown and sterol depletion conferred the resistance of murine microglial cells to Mm infection. The experiments were well designed and controlled, and the data were convincing.

      We thank the referee for this snappy summary of our main findings and for the positive comment on study design.

      My major comment is that the authors need to justify the use of BV-2 cells that are murine microglial cells, instead of macrophage cell lines, which are more relevant to Mtb/Mm infection.

      We understand the referee's concerns about the host used for Mm infection. First, we would like to argue that it is very true that the detailed biological processes accompanying the infection by Mtb, Mm or in fact any other pathogen depend on the origin and status of the host cell. In the TB field, a plethora of host macrophages, from murine and human origins, primary or immortalised, alveolar or interstitial, M1 or M2 have been used through the decades. Beside a robust agreement on many processes (phagosome maturation arrest, MCV membrane damage, role of xenophagy etc...), some of the crucial outcomes, for example the susceptibility or resistance to Mtb infection and the type of host cell death, have been hotly debated and depend on the host phagocyte identity and status.

      Now, it is true that microglial cells have only rarely been used for Mtb (or Mm) research, but it does not mean that this is not relevant. First, we would like to remind the referee that TB is not only a pulmonary disease, and that among the most disastrous extra-pulmonary sites of infection is the brain, resulting in the devastating tuberculous meningitis. In fact, tuberculous meningitis is the most severe form of tuberculosis with a fatality rate of 20-50% in treated individuals (doi: https://doi.org/10.1101/2025.03.04.641394). A quick literature survey on the topic reveals over 9,000 publications, including very significant contributions, using both Mtb and Mm in animal and human models (PMID: 38745656, PMID: 38264653, PMID: 36862557, PMID: 32057291, PMID: 30645042, PMID: 29352446, PMID: 27935825, PMID: 26041993).

      We have introduced a brief mention of these arguments in the discussion (Lines 456-459).

      In addition, we have already shown that this BV-2 cell line is reliable: the cells are adherent, motile and constitutively phagocytic and thus do not need to be differentiated with mega-doses of PMA, or any other stimulus. They beautifully recapitulate our findings in the Dd-Mm model (PMID: 38270456, PMID: 25772333), including when used as a host phagocyte to validate anti-infective compounds that were primarily identified using the Dd-Mm platform (PMID: 29500372).

      We have introduced a brief mention of these arguments in the results section (Lines 329-334).

      We also introduced two new pieces of experimental evidence to strengthen the link between the Dd and BV-2 model systems. First, we show using qRT-PCR that, like vacuolins, flotillin-1 is upregulated in BV-2 at 32 hpi (Fig. EV9B). Excitingly, as mentioned in response to referee #1, a recent article in Nature Communications, "Paired analysis of host and pathogen genomes identifies determinants of human tuberculosis", clearly identified flotillin-1 as a susceptibility factor for tuberculosis (PMID: 39613754). We have introduced a sentence in the discussion that reads "Importantly and consistently with our findings, recent work has revealed flotillins as a major determinant of the fate of Mtb infection in patients, because overexpression of flotillin-1, resulting from particular allele variants, is a host susceptibility factor for Mtb infection (PMID: 39613754)." (Lines 477-480)

      Second, we used for the first time the LysoFlipper probe to monitor MCV lipid composition and packing during infection (Fig. 7C). These results indicate that in BV-2 cells, as in Dd, the membrane characteristics of the MCV are profoundly different from the standard endo-lysosomal compartments.

      Reviewer #2 (Significance (Required)):

      It is well known that EsxA is a membrane-lytic protein playing a role in Mtb/Mm-mediated phagosomal escape. There are other studies that have indicated lipid rafts or microdomains in the membrane may play a role in EsxA-mediated membrane damage. This study further confirmed that the sterol-rich microdomain on the membrane has significant influence on the EsxA-mediated membrane disruption both in vitro and in vivo. While this finding is expected, confirmation with solid experimental evidence is welcome. This study also identified the genes or proteins required for microdomain organization, vacuolins and flotillin, which could be a target of host-directed therapy. Overall, this study is performed well and the results are convincing.

      We thank the referee for their expert views and comments on the function of EsxA and the lipidic environment in which it is supposed to act. We agree that EsxA has been the centre of attention for decades, but we respectfully disagree that its precise mode of action is known, either in vitro or in vivo. First, historically, it took the best part of a decade for the field to accept that Mtb was not a strictly vacuolar pathogen. And even when the escape to the cytosol became a fact, the implication of EsxA remained extremely debated. For example, a "petition" was signed and published, arguing against its direct membrane-damaging activity (PMID: 28119503). We agree that accumulated evidence now converges against a canonical "pore-forming" activity, but in favour of a "membrane-disrupting" activity. On the other hand, it is true that researchers have reached a form of consensus on the role of low pH to dissociate the EsxA-B dimer, and on the importance of the "physiological" composition of the acceptor membrane (PMID: 31430698, PMID: 35271388, PMID: 17557817). We are convinced that our evidence is not merely expected and confirmatory, but represents a novel, complete and solid demonstration, biochemical in vitro and molecular and genetic in vivo, of the role of sterol clustering and microdomain organisers as susceptibility factors for Mm infection in evolutionarily distant phagocytes.


      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Bosmani, Perret et al examines the role of Dictyostelium discoideum vacuolin proteins in the integrity of the Mycobacterium marinum vacuole membrane. The data demonstrates that loss of vacuolins, similar to sterol depletion, reduced vacuole membrane damage meaning less cytosolic escape of the pathogen and subsequently reduced bacterial replication. The authors demonstrate functional analogy in a mammalian model of infection - where flotillin plays a similar role to the vacuolins - and this is an important demonstration of the utility of the D. discoideum model. The data is well presented and clear.

      We thank the referee for this positive summary of our main findings and of the clarity of results, interpretations and working model.

      Major Comments:

      There is no evidence presented in the manuscript of "microdomains" - while I believe this is likely a true description of what is happening on the vacuole membrane there is no visualisation of this. Both the GFP-Vac vacuole staining and the filipin staining show complete coverage of the vacuole. Perhaps at the 1 hour time points this is more convincing but I think it is worth looking at more of these earlier time points and quantifying these "microdomains" - i.e. proportion of vacuole membrane that is positive for the Vacs. Is it possible to look at the GFP-Vac signal and filipin staining at the same time? And other vacuole markers too?

      We agree with the referee that microdomains are the central characters of our study. Now, we would like to argue that one has to distinguish between structural, morphological evidence for the existence of microdomains and the biochemical and genetic evidence of their functional implication.

      On the one hand, microdomains are in fact nanometer-scale and are thus below the resolution limit of most optical microscopies. We and others already documented that during phagosome maturation, vacuolin distribution is patchy, reflecting the clustering of nanometer-scale inhomogeneities, and that the coating becomes more continuous with progressing maturation. The transition we observed here for vacuolins, as microdomain organisers, from a patchy to a continuous coating indirectly reflects their macroscopic coalescence. As discussed above in response to the first referee, visualisation of the underlying lipidic clusters and microdomains is, for technical reasons, nearly impossible. One cannot fix sterols. As replied to the first referee, we have not been able to improve much on the spatial resolution of lipidic microdomains, and, despite numerous and rigorous attempts, we have not been able to reliably obtain quantitative measurements of either filipin or D4H signals, nor to document changes in "patterns" of signals during MCV maturation. We nevertheless present additional qualitative D4H and VacC colocalization images (Fig. EV1C).

      On the other hand, we respectfully disagree that our manuscript lacks strong and direct evidence for the functionality of sterol-rich microdomains as susceptibility factors required for a full mycobacteria infection in evolutionarily distant phagocytes.

      In addition to the evidence presented previously, we have now added a large set of RNAseq analyses of the impact of vac-KO and sterol depletion on infected and non-infected cells, which also highlight the interdependence between sterol concentration and vacuolin expression (Fig. 3G, 4G and H, Fig. EV5 and 6). Moreover, we have now used a Flipper probe sensitive to lipid composition and packing to distinguish the mature MCV from all other endo-lysosomal compartments in microglial cells (Fig. 7C). Altogether, the simplest and most plausible interpretation of our accumulated evidence is that sterol-rich microdomains are necessary for EsxA-mediated MCV damage and escape to the cytosol.

      I really like the data presented in Figure 1 that demonstrates the specific upregulation of Vacuolin C during M. marinum infection. This is an intriguing result that brings up a lot of new questions e.g. how is this regulated? In response to membrane damage? Sensed by what? Does this upregulation also hold true for flotillin in the mammalian model? (and more!) however none of these ideas are pursued in the manuscript and by the end I was wondering why this data was included in the manuscript because all of the phenotypic data uses either a VacBC or ABC mutant. The link between figure 1 and the rest of the manuscript would be aided by characterisation of a specific VacC mutant.

      We share the referee's fascination with these data showing that VacC is a specific reporter of virulent mycobacteria infection. First, VacC expression at the transcriptional level and at the protein accumulation level both point toward a correlation with an infection with damage-causing mycobacteria. Specifically, one can distinguish two stages: a transient upregulation of all three isoforms, which becomes sustained only for VacC and only when wt Mm causes damage (as opposed to the ΔRD1 mutant or M. smegmatis). This is clearly presented in multiple places in the manuscript (for example lines 377-380).

      Now, how MCV damage is sensed is extremely interesting and is the focus of numerous past and on-going studies in our laboratory, but is out of the scope of this article. Just to mention a few lines of research as food for thought, membrane damage (by EsxA and by LLOMe) triggers the recruitment of the E3 ubiquitin ligase TrafE (PMID: 37070811), and subsequently of the ESCRT and autophagy machineries (PMID: 37070811, PMID: 30596802). Upstream of TrafE, we know that a decrease in membrane tension is one parameter, because transient hyperosmolar shock also recruits TrafE to endo-lysosomal compartments (PMID: 37070811). On-going experiments demonstrate that calcium leakage from endo-lysosomes and MCV is another major triggering factor.

      As mentioned above, and in more direct response to the referee's questioning, we have now included RNAseq experiments that unequivocally indicate the link between vac-KO and sterol depletion and the direct effect on reducing membrane damage, because the two conditions lead to a down-regulation of the damage-dependent transcriptomic signatures of the ESCRT and autophagy related genes (Fig. 4G-H and Fig. EV5). Moreover, they clearly establish that sterol depletion, which decreases sterile and EsxA-mediated damage, decreases vacuolin expression in infected cells (Fig. 3G). Finally, qRT-PCR on infected BV-2 microglial cells indeed documents an up-regulation of flotillin-1, reminiscent of vacC regulation in Dd (Fig. EV9B).

      All in all, we would like to respectfully ask the editor and referee to acknowledge that the signalling pathway between damage sensing and the vacuolin responses will be the focus of future studies.

      We understand that investigating the phenotypic consequences of only a single vacC-KO might be interesting, but we would like to argue that it is superfluous. First, for intricate biological reasons, KO of single and combinations of vacuolin genes result in qualitatively and quantitatively very similar phenotypes associated with motility, phagocytosis, endosome maturation etc. (PMID: 32482795). The present study extends this remarkable phenomenon by interrogating multiple parameters, reporters and phenotypes linked to infection, some shown and some unpublished (for example Fig. EV3B and Fig. 4D-E).

      Are the MMVs positive for all three vacuolins? It would be great if you could quantify which are present together or whether there are more distinct populations that are positive for just one or all three for example.

      The referee points to an interesting mechanistic aspect. We have therefore directly assessed the colocalization of pairs of vacuolin isoforms (Fig. EV1B), which clearly indicates that every MCV is coated with two vacuolins, which therefore arithmetically implies that all three isoforms are present together and that there is no isoform-specific MCV (Fig. 2B). This is potentially also corroborated by earlier studies that showed vacuolin hetero-oligomerization (PMID: 16750281), a characteristic shared by flotillins (PMID: 38985763).

      Minor Comments:

      Fig 1F - this graph is quite striking but I think the individual data points should be presented as it is unclear whether this intensity threshold is an arbitrary value or genuinely represents two different populations. Perhaps better represented as a scatter plot?

      We fully agree with the referee and have accordingly replotted all the graphs where this improved the visualisation and contributed to the interpretation of the data. We did not change the representation in Fig. 7E and G, Fig. EV3C, because the error bar already represents the deviation of the Area Under the Curve (AUC) that was calculated for the average curves resulting from a biological triplicate of experiments.

      The bar graphs early in the manuscript should show the individual data points from replicates. While the presentation is clear and differences are striking I think this article explains why showing the replicate data is important: https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002128

      We fully agree with the referee and have accordingly replotted all the graphs where this improved the visualisation and contributed to the interpretation of the data.

      In Figure 2: F and G should include quantification; in G the arrow on the 24 hpi filipin panel is not in the right location

      As mentioned in response to referees #1 and #2, qualitatively, sterols, as visualised by filipin and D4H, are present at all stages of the endo-lysosomal pathway and of MCV biogenesis. Now, there are many technical difficulties linked to a quantitative assessment, and therefore, please, let us present the framework. First, despite their wide use, the exact mechanism of binding of both reporters and which pool of sterol they visualise is still a mystery. This is often expressed as "they detect the accessible pool" of sterol, whatever it is. In addition, filipin detects sterols in both leaflets (and in intra-lumenal vesicles and other lipidic structures), while D4H detects sterols only in the cytosolic leaflet, and it is not known whether both leaflets have the same concentration of sterols. It is also known that filipin signal is only indirectly proportional to the sterol quantity in a cell, as measured by other quantitative methods. One of the best examples comes from studying the cellular phenotype of Niemann-Pick Type C disease, because many publications report a strong increase of filipin staining, whereas lipidomic analyses show at best a two-fold increase in cholesterol in NPC deficient cells. Moreover, technically speaking, D4H is a live probe, and fixation leads to some loss of localisation, probably because sterols are not fixable. On the other hand, filipin is mainly used after chemical fixation, but again sterols are not fixable, and the signal is very likely restricted to the membrane of origin, but not necessarily to the microdomains.

      We corrected the arrow localisation.

      Reviewer #3 (Significance (Required)):

      The key strength of this manuscript is the use of the Dictyostelium model to dissect host-pathogen interactions. This provides an interesting evolutionary lens to the research findings presented here and is further strengthened by the data demonstrating that these findings are relevant in a mammalian model as well. The weaknesses are articulated in my "major comments" section. The phenotypic data presented here is strong - it is clear that these vacuolin proteins are important for the intracellular success of M. marinum however the data demonstrating the mechanism for this is less clear.

      We thank the referee for this overall positive summary of our main findings and of the clarity of results, interpretations and working model. As detailed above, we respectfully disagree with the final conclusion and are pleased to note that the other two referees are more satisfied with the level of mechanistic evidence.

      I am an academic researcher who is interested in the molecular host-pathogen interactions mediated by intracellular microbial pathogens. Scientists in my research field will be a key audience for this research. Predominantly this is basic researchers but the interest will be broader than host-pathogen interactions as researchers in the membrane integrity and membrane dynamics field will be interested here.

    1. “In faith,” said Simontault, “I do not believe that you have ever been in love. If you had felt the flame like other men, you would not now be picturing to us Plato’s Republic, which may be described in writing but not be put into practice.” “Nay, I have been in love,” said Dagoucin, “and am so still, and shall continue so as long as I live. But I am in such fear lest the manifestation of this love should impair its perfection, that I shrink from declaring it even to her from whom I would fain have the like affection. I dare not even think of it lest my eyes should reveal it, for the more I keep my flame secret and hidden, the more does my pleasure increase at knowing that my love is perfect.”

      Here, Plato's "Republic" is referenced in regard to Dagoucin's statement, "...love be based on the beauty, grace, love, and favour of a woman...such love cannot long endure". Dagoucin is indicating his belief that love cannot exist if it is only based on individual parts rather than the whole, and this mimics the "Republic"'s stance on love: "I dare say that you remember, and therefore I need not remind you, that a lover, if he is worthy of the name, ought to show his love, not to some one part of that which he loves, but to the whole" ("Plato's Republic" Book V). However, Plato did not believe that love was romantic, but was the result of desire, and that this desire should be directed away from sex and towards more spiritual things that can free the soul (Kraut 30). Dagoucin's claim that "[his] love is perfect" suggests that humans subconsciously subscribe to Plato's argument that the Forms, a representation of a superior and perfect reality of which beauty is a part, are what we truly worship, not their representations in human form.

      However, Parlamente and Saffredent suggest that they find this stance of loving only concepts and ideals rather than people to be cowardly: "I have known others besides you who preferred to die rather than speak", says Parlamente, and Saffredent continues, describing worldly rather than philosophical love: "I have heard much of such timid lovers, but I have never yet seen one die...I do not think that any one can die of love". This contrast in opinions suggests that Navarre, the author, understands that humans often choose to love perfect ideals, but because of her humanist leanings, which focus on the humanity of society, she is more inclined to believe that humans are capable of loving each other, even in their imperfect forms. This therefore suggests that she thinks that focusing only on perfect forms is lonely and ultimately results in death, as represented by her response that "others...preferred to die" than live free of requited love, as Dagoucin describes: "my love would not be increased any more than it could be lessened, were it not returned with equal warmth".

      Sources:

      Plato. Plato Book V-VI (excerpt). University of Notre Dame.

      Kraut, Richard. "Plato on Love." The Oxford Handbook of Plato, edited by Gail Fine, Oxford Academic, 2008, pp. 286-310.

    1. “Lord, if you be so virtuous of intelligence as you be naturally relieved to the body, you should have pity of me.

      In this quote we see that advice is being asked for. “One expert, doctor Rondibilis, replies that he should not, as any wife will be unfaithful because she is ultimately an irrational being.” I think that Pantagruel took so long to give advice to his friend because he may have been trying to figure out what to say and how to say it. His friend asks whether he should marry or not.

      “Early Modern Period: Fiction, Gargantua and Pantagruel” Primary Source. https://chnm.gmu.edu/wwh/p/83.html

    2. Now as he [the man] was just amongst them, Pantagruel said unto him, “Let me entreat you, friend, that you may be pleased to stop here a little and answer me to that which I shall ask you, and I am confident you will not think your time ill bestowed; for I have an extreme desire, according to my ability, to give you some supply in this distress wherein I see you are; because I do very much commiserate your case, which truly moves me to great pity.

      In this quote we read about when Pantagruel first meets his friend. “The astonishing intellectual scope, the formal and linguistic inventiveness, and the general ebullience of Rabelais's writings, known collectively as Gargantua and Pantagruel, embody Renaissance humanism in all its excitement and thirst for knowledge.” This quote fits this because Pantagruel wants to get to know his new friend.

      Nelson, Brian. “Rabelais: The Uses of Laughter”. Cambridge University Press. 2015. https://www.cambridge.org/core/books/abs/cambridge-introduction-to-french-literature/rabelais-the-uses-of-laughter/8C24FC8EC905FE9A3E39B31AF24339C6

    1. Author response:

      Reviewer #1 (Public Review):

      The work of Umetani et al. monitors the death of about 100,000 cells caused by lethal antibiotic treatments in a microfluidic device. They observe that the surviving bacteria are either in a dormant or in a non-dormant state prior to the antibiotic treatment. They then study the relative abundances of these different persister cells when varying the physiological state of the culture. In agreement with previous observations, they observe that late stationary phase cultures harbor a high number of dormant persister cells and that this number goes down as the culture is more exponential but remains non-zero, suggesting that cultures at the exponential phase contain different types of persister bacteria. These results were qualitatively similar in a rich and poor medium. Further characterization of the growing persister bacteria shows that they often form L-forms, have low RpoS-mCherry expression levels and grow only slightly more slowly than the non-persister bacteria. Taken together, these results draw a detailed view of persister bacteria and the way they may survive extensive antibiotic treatments. However, in order to represent a substantial advance on previous knowledge, a deeper analysis of the persister bacteria should be done.

      We thank the reviewer for suggesting the addition of more detailed analyses of persister cells. As we wrote in our response to Essential Revision 1, we now include a new section titled “Response of growing persisters to Amp exposure is heterogeneous” (Page 11-12) and present the results of the detailed analyses of single-cell dynamics of growth and cell morphology over the course of the pre-exposure, exposure, and post-exposure periods (Fig. 2D and H, Fig. 4B and D, Fig. 4 – figure supplement 1 and 2, Fig. 5B and D, Fig. 5 – figure supplement 1, Fig. 8B and D, and Figure 8 – figure supplement 1). The new results characterize differential responses to Amp treatment among growing persister cells (Fig. 4A-D, Fig. 4 – figure supplement 1, Fig. 4 – figure supplement 2A, Fig. 5A-D, and Fig. 5 – figure supplement 1), comparable division rates of MG1655 between non-surviving cells and persister cells growing prior to antibiotic treatments (Fig. 4E and Fig. 8E), except for the post-exponential phase cell populations of MF1 to Amp treatment in the LB medium and the post-exponential phase cell populations of MG1655 to Amp treatment in the M9 medium (Fig. 4 – figure supplement 2B and Fig. 5E) and the presence of persister cells to CPFX that avoid filamentation after the treatment (Fig. 8C and D, and Fig. 8 – figure supplement 1). We believe that these new analyses would provide new insights into the diverse dynamics and survival modes of antibiotic persistence at the single-cell level and represent important contributions to the field.

      Reviewer #2 (Public Review):

      The main question asked by Umetani et al. is whether persister cells to ampicillin arise preferentially from dormant, non-dividing cells or from cells that are actively growing before antibiotic exposure. The authors tracked persister cells generated from populations at different growth phases and culture media using a microfluidic device coupled to fluorescence microscopy, which is a challenge due to the low frequency of these persister cells. One of the main conclusions is that the majority of persisters arising in exponentially-growing populations originated from actively-dividing cells before the antibiotic treatment, reinforcing the idea that dormancy is not a prerequisite for persister formation. The authors made use of a fluorescent reporter monitoring RpoS activity (RpoS-mCherry fusion) and observed that RpoS levels in these persister cells were low. In the few lineages that exhibited no growth before the ampicillin treatment, RpoS levels were low as well, indicating that RpoS is not a predictive marker for persistence. By performing the same experiment with early and late stationary phase cultures, the authors observed that the proportion of persister cells that originated from dormant cells before the ampicillin treatment is significantly increased under these conditions. In the late stationary phase condition, dormant cells were expressing high levels of RpoS. The authors suggested that RpoS-mCherry proteins form aggregates, which they propose to be a characteristic of 'deep dormancy'. These cells were mostly unable to restart growth after the antibiotic removal while others with the lowest levels of RpoS tended to be persisters. Confirming that these cells indeed contain protein aggregates as well as determining the physiological state of these cells appears to be crucial.

      We thank reviewer #2 for pointing out the critical issue with the RpoS-mCherry fusion that we used to quantify RpoS expression levels in single cells in the original manuscript. As explained in our reply to the comments below, we performed a suggested experiment and confirmed that the RpoS function was impaired by tagging it with mCherry. To resolve this issue, we repeated almost all the experiments using the wild-type strain MG1655 and confirmed the reproducibility of the main results (Fig. 3, Fig. 3 – figure supplement 1, and Fig. 7). Due to this change of the main strain used in this study, we removed the results on the correlation between RpoS expression and the persistence trait in the revised manuscript because it may not reflect the relationship of intact RpoS. However, we decided to still keep and show some of the results with the MF1 strain, such as the population killing curves and the survival mode analyses, because they also provide insight into the role of RpoS in antibiotic persistence. In particular, we found both beneficial and detrimental effects of RpoS on antibiotic persistence, depending on culture conditions and duration of antibiotic treatment (Fig. 1 – figure supplement 3 and Fig. 6 – figure supplement 1). Therefore, we have included these results and related discussions in the revised manuscript.

      Reviewer #3 (Public Review):

      In their manuscript, Umetani, et al. address the question of the origin of persister bacteria using single-cell approaches. Persistence refers to a physiological state where bacteria are less sensitive to antibiotherapy, although they have not acquired a resistance mutation; importantly, the concept of persistence has been refined in the past decade to distinguish it from tolerance where bacteria are only transiently insensitive. Since persister cells are very rare in growing populations (typically 1e-5 or 1e-6), it is very challenging to observe them directly. It had been proposed that individual cells surviving antibiotics are not growing at the start of the treatment, but recent studies (nicely reviewed in the introduction) where persister bacteria were observed directly do not support this link. Following a similar line, the authors nonetheless still aim at "investigating whether non-growing cells are predominantly responsible for bacterial persistence". Based on new experimental data, they claim the contrary that most surviving cells were "actively growing before drug exposure" and that their work "reveals diverse survival pathways underlying antibiotic persistence".

      We thank the reviewer for this helpful comment, which suggested to us that some revisions in our Introduction would better place our study in the context of previous understanding of antibiotic persistence. As mentioned in our response to Essential Revision 4 and the second comment of Reviewer 1's Recommendations for the authors, we have modified the Introduction to more appropriately place our study in the context of the field.

      The main strengths of the manuscript are in my opinion:

      - To report on direct observation of E. coli persisters to ampicillin (200µg/mL) in 5 different growth media (typically 20 persisters or more per condition, one condition with 12 only), which constitutes without a doubt an experimental tour de force.

      - To aim at bridging the population level and the single-cell level by measuring relevant variables for each and analyzing them jointly.

      - To demonstrate that in most conditions a large fraction of surviving cells was actively growing before drug exposure.

      In addition, although it is well-known that E. coli doesn't need to maintain its rod shape for surviving and dividing, I found very remarkable in their data the extent to which morphology can be affected in persister cells and their progeny, since this really challenges our understanding of E. coli's "lifestyle" (these swimming amoeba-like cells in Supp Video 11 are mind-blowing!).

      We are grateful to the reviewer for the articulation of the strength of this study. 

      Unfortunately, these positive aspects are counter-balanced by several shortcomings in the way experiments are analyzed and interpreted, which I explain below. Moreover, the manuscript is written in a way that makes it very hard to find important information on how experiments are done and is likely to leave the reader with an impression of confusion about what the main findings actually are.

      We thank the reviewer for pointing out these important issues regarding the original manuscript. Please see our replies below regarding how we responded to each specific comment to resolve the issue. To make the experimental methods and procedures more accessible and interpretable, we have added more explanations of the experimental details to the Results and Methods sections. Furthermore, since we understood that some of the confusion came from the insufficient explanation of the preculture procedures for the microfluidic experiments, we have modified the schematic illustration of the method shown in Fig. S1 in the original manuscript and moved it as the first main figure in the revised manuscript (Fig. 1C and D). We have also added an illustration that explains the cultivation procedures for the batch culture experiments as Fig. 6A.

      My major concerns are the following:

      (1) The main interpretation framework proposed by the authors is to assess whether cells not growing before drug exposure (so-called "dormant") are more or less likely to survive the treatment than growing ones ("non-dormant"). Fig 2A and Fig 3G show the main conclusions of the article from this perspective, that growing cells can survive the treatment and that the fraction of persisters in a given condition is not explained by the fraction of "dormant" cells, respectively. With this analysis, the authors essentially assume that "dormant" cells are of the same type in their different conditions, which ignores the progress in this field over the last decade (Balaban et al. 2019). I argue on the contrary that the observation of "diverse modes of survival in antibiotic persistence" is expected from their experimental design. In particular, the sensitivity of E. coli to beta-lactams such as ampicillin is expected to be much lower during the lag out of the stationary phase, a phenomenon which has been coined "tolerance"; hence in the Late Stationary condition, two subpopulations coexist for which different response to ampicillin is expected. I propose steps toward a more compelling interpretation of the experimental data. Should this point be taken seriously by the authors, it, unfortunately, implies a major rewriting of the article, including its title.

      We thank the reviewer for bringing to our attention the point that may have caused confusion in the original manuscript. 

      The primary purpose of this manuscript was not to assess whether non-growing cells prior to drug exposure are more or less likely to survive treatment than growing cells. Rather, we wanted to examine how different persister cell dynamics emerge at the single-cell level depending on previous cultivation history, growth media, and antibiotic types. We believe that this point is clearer in the revised manuscript with the newly added single-cell dynamics data (Fig. 2D, 2H, 4B, 4D, Fig. 4 – figure supplement 1 and 2A, Fig. 5B, 5D, Fig. 5 – figure supplement 1, Fig. 8B, 8D, and Fig. 8 – figure supplement 1). 

      We also did not mean to imply that "dormant cells" were of the same type under different conditions, as we were aware of the diversity of cellular states of non-growing cells, as well as the reduced sensitivity of cells to antibiotics during the lag out of stationary phase. We believe that one of the reasons this point may have been unclear is that in the previous version we had referred to all cells that were not growing prior to antibiotic treatment as "dormant cells", a term that is often used in a more restricted way to refer to cells under prolonged growth arrest. Therefore, in the revised manuscript, we have avoided the term "dormant cells" and instead simply referred to these as "non-growing cells". Accordingly, we have changed the title of the paper from "Observation of non-dormant persister cells reveals diverse modes of survival in antibiotic persistence" to "Observation of persister cell histories reveals diverse modes of survival in antibiotic persistence".

      To further address these points, we have improved the description of the experimental procedures for the single-cell measurements (see the reviewer's next comment as well). The non-growing persisters of the MF1 strain found in the post-exponential phase cell populations must be of a different type than those found in the post-early and post-late stationary phase cell populations due to the experimental design. All early and late stationary phase cells were maintained in a non-growing state by flowing conditioned media prepared from the early and late stationary phase cultures until the start of the time-lapse measurements. Thus, aside from potential physiological heterogeneity, the non-growing cells prior to drug treatment are all long lagging cells. On the other hand, for the post-exponential phase condition, we maintained exponential growth conditions during the period from the start of the second pre-culture to the start of antibiotic treatment, including the period during sample preparation for time-lapse measurements. Given the exponential dilution by growth of cell populations, the non-growing persisters are unlikely to be long lagging cells (see our response to Reviewer 2's third comment in "Recommendations for the authors"). We now describe these experimental procedures in more detail in the Results section (L161-178, L287-297). In addition, we discuss the diversity of cellular states of both non-growing and growing cells in Discussion, citing literature (L545-557).
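
      The dilution argument above can be illustrated with a minimal sketch. It assumes a population in which a small fraction of cells never resumes growth while the rest doubles at a constant rate; the function name and every numerical value are hypothetical and are not taken from the manuscript.

```python
def nongrowing_fraction(initial_fraction, elapsed_h, doubling_time_h):
    """Fraction of the population made up of cells that never resumed growth.

    Assumes (hypothetically) that non-growing cells neither divide nor die,
    and that all other cells double every `doubling_time_h` hours.
    """
    growth_factor = 2.0 ** (elapsed_h / doubling_time_h)
    growing = (1.0 - initial_fraction) * growth_factor
    return initial_fraction / (initial_fraction + growing)


# Hypothetical numbers: 1% lagging cells at the start, 6 h of exponential
# growth with a 0.5 h doubling time (12 doublings).
print(nongrowing_fraction(0.01, elapsed_h=6.0, doubling_time_h=0.5))
```

      Under these assumed values the carried-over lagging cells fall to a few per million of the population, which is the sense in which sustained exponential growth dilutes them out.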

      (2) The way the authors describe their experiments with bacteria in the stationary phase is very problematic. For instance, they write that they "sampled cells from early and late stationary phases (...) and exposed them to 200 μg/mL of Amp in both batch and single-cell cultures." For any reader in a hurry (hence skipping methods and/or supplementary figure), this leads to believe that bacteria sampled in the stationary phase were exposed to the drug right away (either by adding the drug to the stationary phase sample, or more classically by transferring cells to fresh media with antibiotics). However, it turns out that, after sampling and loading in the microfluidic device, bacteria are grown 2 h in LB (or 4 h in M9) - I don't know what to think of such a blatant omission. The names chosen for each condition should reflect their most important aspects, here "stationary" is simply not appropriate - maybe something like "post early stationary" instead. In any case, I believe that this point highlights further the misconception pointed out in 1 and implies that the average reader will be at best confused, and probably misled.

      We again thank the reviewer for pointing out the insufficient explanation of the method for the single-cell measurements and the helpful recommendation regarding our nomenclature for different conditions. As mentioned above, we now present the previous supplementary figure that schematically explains the experimental procedure as the first main figure to clarify how we prepared the cells loaded into the microfluidic device for single-cell measurements (Fig. 1C and D). Also, following the reviewer's suggestion, we now refer to the conditions as "post-exponential phase," "post-early stationary phase," and "post-late stationary phase" in the revised manuscript. 

      We included a 2-hour (or 4-hour in M9) cultivation period in fresh medium in batch cultures for measuring killing curves to make the cultivation conditions prior to antibiotic treatment as similar as possible between batch and microfluidic experiments. We have clarified the presence of preexposure cultivation of post-early stationary and post-late stationary phase cell populations in the fresh medium before treating them with antibiotics (L264-269, Fig. 6A), so that readers can more easily recognize the experimental conditions.

      (3) Figures 4 and 5 are of very minor significance, and the methodology used in Fig 4 is questionable. The authors measure the abundance of an RpoS-mCherry translational fusion because its "high expression has been suggested to predict persistence". The rationale for this (that an RpoS-mCherry fusion would be a proxy for intracellular ppGpp levels, and in turn predict persistence) has never been firmly established, and the standards used in the article where this reporter was introduced (Maisonneuve, Castro-Camargo, and Gerdes 2013) are notoriously low (which eventually led to its retraction) - I don't know what to think of the fact that the authors cite a review by this group rather than their retracted article. While transcriptional fusions of promoters regulated by RpoS have been proposed to measure its regulatory activity (Patange et al. 2018), the combination of self-regulation and complex post-translational regulation of rpoS makes the physical meaning of the reporter used here completely unclear. Moreover, this translational fusion is introduced without doing any of the necessary controls to demonstrate that the activity of RpoS is not impaired by the addition of the fluorescent protein. Fig 5 simply reports the existence of persisters to ciprofloxacin growing before the treatment. This might be a new observation but it is not unexpected given that a similar observation has been made with a similar drug, ofloxacin (Goormaghtigh and van Melderen 2019), as pointed out in the introduction. There is no further quantitative claim on this.

      We thank the reviewer for pointing out the issue of the RpoS-mCherry fusion. As we mentioned in our response to Essential Revision 2 and also to the comment from reviewer #2, we have tested the sensitivity of this fluorescent reporter strain to oxidative stress and confirmed that it is as sensitive as the rpoS strain (Fig. 1 – figure supplement 1C). Therefore, the RpoS function seems to be defective in this strain, as now explained in Results (L69-79). After confirming the problem with the RpoS-mCherry fusion, we removed all analyses and related arguments that relied on the RpoS expression level (previous Figure 4). In addition, we repeated almost all the experiments with the original MG1655 strain to confirm that the observed results are not specific to the problematic reporter strain. 

      Regarding the experiments with CPFX, we have added a more detailed analysis of single cell dynamics and found that, contrary to the reported results for ofloxacin, not all persistent cells show filamentation after drug withdrawal (Fig. 8C and D, Fig. 8 – figure supplement 1). In addition, we performed new microfluidic experiments in which we treated post-late stationary phase cells with CPFX (Fig. 3). In contrast to the Amp treatment result and the previous study that reported the persistence of post-stationary phase cell populations to ofloxacin (ref. 20), all the persisters for which we identified the pre-exposure growth traits in this condition grew normally prior to CPFX treatment. These newly added analyses and experiments clarify the significance of the CPFX experiments. 

      (4) The authors don't mention the dead volume nor the speed of media exchange in their device. Hopefully, it is short compared to the duration of the treatment; however, it is challenging to remove all antibiotics after the treatment and only 1e-3 or 1e-4 of the treatment concentration is already susceptible to affecting regrowth in fresh media. If this is described in another article, it would be worth adding a comment in the main text.

      We thank the reviewer for bringing up this important point. We have added the perfusion chamber volume and medium flow rate information in the Methods section (L809-817).   

      In the study in which two of the authors participated, the medium exchange rate across the semipermeable membrane was evaluated in a similar device with similar microchamber dimensions (ref. 26). There, we confirmed that the medium exchange was completed within 5 min, which is much shorter than the period of antibiotic treatment and post-antibiotic treatment periods for observing regrowth. We have also included this information in the main text with the reference (L58-63).

      Despite the relatively high medium exchange rate, we cannot formally exclude the possibility that a small amount of antibiotic may remain in the device, e.g. due to non-specific adsorption on the internal surface of the microchambers. In such cases, the residual antibiotics may influence the physiological states of the cells and the regrowth kinetics in the post-exposure periods, as suggested by the reviewer. However, the frequencies of persister cells in the cell populations in our single-cell measurements are comparable to those in the batch culture measurements. Therefore, the removal of antibiotic drugs in our device is at least as efficient as in the batch culture assay. To clarify this point, we have added a paragraph to the Discussion with a reference that reviews the influence of antibiotics at concentrations significantly lower than the MICs (L482-489).

      (5) Fig 2A supports the main finding that a significant fraction of bacteria surviving the treatment are growing before drug exposure, but it uses a poorly chosen representation.

      - In order to compare between conditions, one would like to see the fraction of each type in the population.

      - The current representation (of a fraction of each type among surviving cells) requires a side-by-side comparison with a random sample (which will practically be equivalent to the fraction of each type among killed cells) in order to be informative.

      We have changed the style of the previous Fig. 2A to show the fraction of each type in the population instead of the fraction of each type among surviving cells (Fig. 3 and Fig. 3-figure supplement 1).
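
      The difference between the two representations discussed in this comment can be sketched as follows. The counts are hypothetical, chosen only to show how the same data give different pictures, and the function and key names are illustrative assumptions rather than anything used in the manuscript.

```python
def survival_summary(counts):
    """Summarize survival per pre-exposure state.

    `counts` maps a state label to a (n_survived, n_killed) tuple.
    """
    total = sum(s + k for s, k in counts.values())
    survivors = sum(s for s, _ in counts.values())
    summary = {}
    for state, (s, k) in counts.items():
        summary[state] = {
            # share of this state in the whole observed population
            "fraction_of_population": (s + k) / total,
            # share of this state among surviving cells only
            "fraction_among_survivors": s / survivors if survivors else float("nan"),
            # per-state probability of surviving the treatment
            "survival_probability": s / (s + k) if (s + k) else float("nan"),
        }
    return summary


# Hypothetical counts: (survived, killed) for each pre-exposure state.
example = {"growing": (6, 9994), "non-growing": (4, 96)}
for state, values in survival_summary(example).items():
    print(state, values)
```

      In this made-up example most survivors were growing before exposure, yet the per-cell survival probability is much higher for non-growing cells; only a population-level representation makes both facts visible at once.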

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      By way of background, the Jiang lab has previously shown that loss of the type II BMP receptor Punt (Put) from intestinal progenitors (ISCs and EBs) caused them to differentiate into EBs, with a concomitant loss of ISCs (Tian and Jiang, eLife 2014). The mechanism by which this occurs was activation of Notch in Put-deficient progenitors. How Notch was upregulated in Put-deficient ISCs was not established in this prior work. In the current study, the authors test whether a very low level of Dl was responsible. But co-depletion of Dl and Put led to a similar phenotype as depletion of Put alone. This result suggested that Dl was not the mechanism. They next investigate genetic interactions between BMP signaling and Numb, an inhibitor of Notch signaling. Prior work from Bardin, Schweisguth and other labs has shown that Numb is not required for ISC self-renewal. However, the authors wanted to know whether loss of both the BMP signal transducer Mad and Numb would cause ISC loss. This result was observed for RNAi depletion from progenitors and for mad, numb double mutant clones. Of note, ISC loss was observed in 40% of mad, numb double mutant clones, whereas 60% of these clones had an ISC. They then employed a two-color tracing system called RGT to look at the outcome of ISC divisions (asymmetric (ISC/EB) or symmetric (ISC/ISC or EB/EB)). Control clones had 69%, 15% and 16%, respectively, whereas mad, numb double mutant clones had much lower ISC/ISC (11%) and much higher EB/EB (37%). They conclude that loss of Numb in moderate BMP loss of function mutants increased symmetric differentiation, which caused ISC loss. They also reported that numb<sup>15</sup> and numb<sup>4</sup> clones had a moderate but significant increase in ISC-lacking clones compared to control clones, supporting the model that Numb plays a role in ISC maintenance. Finally, they investigated the relevance of these observations during regeneration. After bleomycin treatment, there was a significant increase in ISC-lacking clones and a significant decrease in clone size in numb<sup>4</sup> and numb<sup>15</sup> clones compared to control clones. Because bleomycin treatment has been shown to cause variation in BMP ligand production, the authors interpret the numb clone under bleomycin results as demonstrating an essential role of Numb in ISC maintenance during regeneration.
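
      The division-outcome percentages quoted in this summary can be turned into a rough bookkeeping of ISC number. The sketch below assumes, as a simplification that is not part of the manuscript, that each ISC/ISC division adds one ISC, each EB/EB division removes one, and each asymmetric division leaves the count unchanged; cell death and differences in division rate are ignored, and the function name is illustrative.

```python
def expected_isc_change(frac_isc_isc, frac_eb_eb):
    """Expected net change in ISC number per division.

    Assumes +1 ISC for a symmetric ISC/ISC division, -1 for a symmetric
    EB/EB division, and 0 for an asymmetric ISC/EB division.
    """
    return frac_isc_isc - frac_eb_eb


# Symmetric-division fractions as quoted in the summary above.
control = expected_isc_change(0.15, 0.16)
double_mutant = expected_isc_change(0.11, 0.37)
print(f"control: {control:+.2f} ISC per division")
print(f"mad, numb double mutant: {double_mutant:+.2f} ISC per division")
```

      Under these assumptions the control outcome is close to balanced, whereas the double mutant shows a clear net loss per division, which is consistent with the direction of the ISC-loss phenotype described.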

      Strengths:

      (i) Most data is quantified with statistical analysis

      (ii) Experiments have appropriate controls and large numbers of samples

      (iii) Results demonstrate an important role of Numb in maintaining ISC number during regeneration and a genetic interaction between Mad and Numb during homeostasis.

      Weaknesses:

      (i) No quantification for Fig. 1

      Quantification of Fig.1 has been added. 

      (ii) The premise is a bit unclear. Under homeostasis, strong loss of BMP (Put) leads to loss of ISCs, presumably regardless of Numb level (which was not tested). But moderate loss of BMP (Mad) does not show ISC loss unless Numb is also reduced. I am confused as to why numb does not play a role in Put mutants. Did the authors test whether concomitant loss of Put and Numb leads to even more ISC loss than Put-mutation alone.

      We have tested the genetic interaction between put and numb using Put RNAi and Numb RNAi driven by esg<sup>ts</sup>. According to the results in this study and our previously published data, put mutant clones or esg<sup>ts</sup> > Put-RNAi induced a rapid loss of ISCs (within 8 days). We did not observe further enhancement of the stem cell loss phenotype in Put and Numb double RNAi guts.

      (iii) I think that the use of the word "essential" is a bit strong here. Numb plays an important role but in either during homeostasis or regeneration, most numb clones or mad, numb double mutant clones still have ISCs. Therefore, I think that the authors should temper their language about the role of Numb in ISC maintenance.

      We have revised the language and changed “essential” to “important”.

      Reviewer #2 (Public review):

      Summary:

      This work assesses the genetic interaction between the Bmp signaling pathway and the factor Numb, which can inhibit Notch signalling. It follows up on the previous studies of the group (Tian, Elife, 2014; Tian, PNAS, 2014) regarding BMP signaling in controlling stem cell fate decision as well as on the work of another group (Sallé, EMBO, 2017) that investigated the function of Numb on enteroendocrine fate in the midgut. This is an important study providing evidence of a Numb-mediated back up mechanism for stem cell maintenance.

      Strengths:

      (1) Experiments are consistent with these previous publications while also extending our understanding of how Numb functions in the ISC.

      (2) Provides an interesting model of a "back up" protection mechanism for ISC maintenance.

      Weaknesses:

      (1) Aspects of the experiments could be better controlled or annotated:

      (a) As they "randomly chose" the regions analyzed, it would be better to have all from a defined region (R4 or R2, for example) or to at least note the region as there are important regional differences for some aspects of midgut biology.

      Thank you for the suggestion. In fact, we conducted all the analyses in region 4; we have added a statement to clarify this in the revised manuscript.

      (b) It is not clear to me why MARCM clones were induced and then flies grown at 18°C? It would help to explain why they used this unconventional protocol.

      We kept the flies at 18°C to avoid spontaneous clones.

      (2) There are technical limitations with trying to conclude from double-knockdown experiments in the ISC lineage, such as those in Figure 1 where Dl and put are both being knocked down: depending on how fast both proteins are depleted, it may be that only one of them (put, for example) is inactivated and affects the fate decision prior to the other one (Dl) being depleted. Therefore, it is difficult to definitively conclude that the decision is independent of Dl ligand.

      In our hands, Dl-RNAi is very effective and exhibited loss of N pathway activity (as determined by the N pathway reporter Su(H)-lacZ) after RNAi for 8 days (Fig. 1D). Therefore, the ectopic Su(H)-lacZ expression in Punt Dl double RNAi (Fig. 1E) is unlikely to be due to residual Dl expression. Nevertheless, we have changed the statement “BMP signaling blocks ligand-independent N activity” to “Loss of BMP signaling results in ectopic N pathway activity even when Dl is depleted”.

      (3) Additional quantification of many phenotypes would be desired.

      (a) It would be useful to see esg-GFP cells/total cells and not just field as the density might change (2E for example).

      We focused on R4 region for quantification where the cell density did not exhibit apparent change in different experimental groups. In addition, we have examined many guts for quantification. It is very unlikely that the difference in the esg-GFP+ cell number is caused by change in cell density.

      (b) Similarly, for 2F and 2G, it would be nice to see the % of ISC/ total cell and EB/total cell and not only per esgGFP+ cell.

      Unfortunately, we didn’t have the suggested quantification. However, we believe that quantification of the percentage of ISC or EB among all progenitor cells, as we did here, provides a meaningful measurement of the self-renewal status of each experimental group.

      (c) Fig1: There is no quantification - specifically it would be interesting to know how many esg+ are su(H)lacZ positive in Put- Dl- condition compared to WT or Put- alone. What is the n?

      Quantification of Fig.1 has been added. 

      (d) Fig2: Pros + cells are not seen in the image? Are they all DllacZ+?

      Anti-Pros and anti-E(spl)mβ-CD2 were stained in the same channel (magenta).  Pros+ exhibited “dot-like” nuclear staining while CD2 staining outlined the cell membrane of EBs. We have clarified this in the revised figure legend.

      (e) Fig3: it would be nice to have the size clone quantification instead of the distribution between groups of 2 cell 3 cells 4 cell clones.

      Because of the heterogeneity of clone size for each genotype, we chose to group clones based on their sizes (2, 3-6, 6-8, >8 cells) and quantified the distribution of individual groups for each genotype, which clearly showed an overall reduction in clone size for mad numb double mutant clones. We and others have used the same clone size analysis in previous studies (e.g., Tian and Jiang, eLife 2014).

      (f) How many times were experiments performed?

      All experiments were performed at least 3 times.

      (4) The authors do not comment on the reduction of clone size in DSS treatment in Figure 6K. How do they interpret this? Does it conflict with their model of Bleo vs DSS?

      Guts containing numb<sup>4</sup> clones treated with DSS exhibited a slight reduction of clone size, evident by a higher percentage of 2-cell clones and a lower percentage of > 8 cell clones. This reduction is less significant in guts containing numb<sup>15</sup> clones. However, the percentage of Dl<sup>+</sup>-containing clones is similar between DSS and mock-treated guts. It is possible that ISC proliferation is slightly reduced due to the numb<sup>4</sup> mutation or the genetic background of this stock.

      (5) There is probably a mistake on sentence line 314 -316 "Indeed, previous studies indicate that endogenous Numb was not undetectable by Numb antibodies that could detect Numb expression in the nervous system".

      We have modified the sentence.

      Reviewer #3 (Public review):

      Summary:

      The authors provide an in-depth analysis of the function of Numb in adult Drosophila midgut. Based on RNAi combinations and double mutant clonal analyses, they propose that Numb has a function in inhibiting Notch pathway to maintain intestinal stem cells, and is a backup mechanism with BMP pathway in maintaining midgut stem cell mediated homeostasis.

      Strengths:

      Overall, this is a carefully constructed series of experiments, and the results and statistical analyses provide believable evidence that Numb has a role, albeit weak compared to other pathways, in sustaining ISC and in promoting regeneration especially after damage by bleomycin, which may damage enterocytes and therefore disrupt BMP pathway more. The results overall support their claim.

      The data are highly coherent, and support a genetic function of Numb, in collaborating with BMP signaling, to maintain the number and proliferative function of ISCs in adult midguts. The authors used appropriate and sophisticated genetic tools of double RNAi, mutant clonal analysis and dual marker stem cell tracing approaches to ensure the results are reproducible and consistent. The statistical analyses provide confidence that the phenotypic changes are reliable albeit weaker than many other mutants previously studied.

      Weaknesses:

      In the absence of Numb itself, the midgut has a weak reduction of ISC number (Fig. 3 and 5), as well as a weak, albeit not statistically significant, reduction of ISC clone size/proliferation. I think the authors published similar experiments with BMP pathway mutants. The mad<sup>1-2</sup> allele used here as stated below may not be very representative of other BMP pathway mutants. Therefore, it could be beneficial to compare the ISC number and clone sizes between other BMP experiments to provide the readers with a clearer picture of how these two pathways individually contribute (stronger/weaker effects) to the ISC number and gut homeostasis.

      Thanks for the comment. We have tested other components of the BMP pathway in our previous study (Tian et al., 2014). More complete loss of BMP signaling (for example, Put clones, Put RNAi, Tkv/Sax double mutant clones or double RNAi) resulted in ISC loss regardless of the status of numb, suggesting a more predominant role of BMP signaling in ISC self-renewal compared with Numb. We speculate that the weak stem cell loss phenotype associated with numb mutant clones in an otherwise wild type background could be due to fluctuation of BMP signaling in homeostatic guts.

      The main weakness of this manuscript is the analysis of the BMP pathway components, especially the mad<sup>1-2</sup> allele. The mad RNAi and mad<sup>1-2</sup> alleles (P insertion) are supposed to be weak alleles and that might be suitable for genetic enhancement assays here together with numb RNAi. However, the mad<sup>1-2</sup> allele, and sometimes the mad RNAi, showed weakly increased ISC clone size. This is kind of counter-intuitive that they should have a similar ISC loss and ISC clone size reduction.

      We used mad<sup>1-2</sup> and mad RNAi here to test the genetic interaction with numb because our previous studies showed that partial loss of BMP signaling under these conditions did not cause stem cell loss, therefore, may provide a sensitized background to determine the role of Numb in ISC self-renewal. The increased proliferation of ISC/ clone size associated with mad<sup>1-2</sup> and mad RNAi is due to the fact that reduction of BMP signaling in either EC or EB non-autonomously induces stem cell proliferation. However, in mad numb double mutant clones, there was a reduction in clone size due to loss of ISC in many clones.

      A much stronger phenotype was observed when numb mutants were subject to treatment with the tissue-damaging agent Bleomycin, which causes damage in different ways than DSS. Bleomycin was previously shown to cause mainly enterocyte damage, and is therefore more likely to disrupt BMP signaling from ECs. Therefore, this treatment together with loss of numb led to a highly significant reduction of ISC in clones and reduction of clone size/proliferation. One improvement is that it is not clear whether the authors discussed the nature of the two numb mutant alleles used in this study and the comparison to the strength of the RNAi allele. Because the phenotypes are weak and more variable, the use of specific reagents is important.

      We have included information about the two numb alleles in the “Materials and Methods”. numb<sup>15</sup> is a null allele, and the nature of numb<sup>4</sup> has not been elucidated. According to Domingos, P.M. et al., numb<sup>15</sup> induced a more severe phenotype than numb<sup>4</sup> did. Consistently, we also found that more numb<sup>15</sup> mutant clones were void of stem cell than numb<sup>4</sup> mutant clones.

      Furthermore, the use of possible activating alleles of either or both pathways to test genetic enhancement or synergistic activation will provide strong support for the claims.

      Activation of BMP (esgts>Tkv<sup>CA</sup>) alone induced stem cell tumors (Tian et al., 2014), whereas overexpression of Numb did not increase stem cell number, although overexpression of Numb in wing discs produced phenotypes indicative of inhibition of N (our unpublished observation), making it difficult to test the synergistic effect of activating both BMP and Numb.

      Reviewer #1 (Recommendations for the authors):

      - Cartoon of RGT in Fig 4 needs to be improved. We need to know what chromosome harbors the esgts. It is not sufficient to simply put the location of the ubi-GFP and ubi-RFP (on 19A) and not show the location of other components of the RGT system.

      Thank you for the suggestion. We have revised the cartoon in Fig. 4 to include all three pairs of chromosomes and indicate where the esgts driver and UAS-RNAi are located. In addition, we have included the genotypes for all the genetic experiments in the Method section.

      - Quantification of the results in Fig. 1

      Quantification of Fig.1 has been added. 

      - The authors need to explain the premise more carefully (see above) and explain whether or not they tested put, numb double knockdowns.

      We have explained why not testing put numb double RNAi (see above).

      Reviewer #2 (Recommendations for the authors):

      The number of times the experiments have been performed would be useful to include.

      This information has been added in the figure legends.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility, and clarity)

      The manuscript by Song et al presents evidence to show that the predicted cysteine protease type 6 secretion system (T6SS) effector Cpe1 inhibits target cell growth by cleaving type II DNA Topoisomerases GyrB and ParE. The authors determined the structure of the protein complex formed by Cpe1 and its immunity protein Cpi1, which allowed them to reveal the mechanism of inhibition. Moreover, the authors identified type II DNA topoisomerases GyrB and ParE as the targets of Cpe1. Overall, the major conclusions were well supported by experimental data of high quality. The findings have expanded our appreciation of the mechanism utilized by T6SS effectors to inhibit target cell growth.

      We thank the reviewer for their positive remarks and valuable suggestions to improve this manuscript.


      Major comments

      To better establish that GyrB and ParE are the sole targets of Cpe1, the authors should express the GG mutant in target cells and determine whether these cells become resistant to Cpe1-mediated killing (inhibition). They can also determine whether co-expression of the cleavage resistant mutants suppresses the toxicity of Cpe1.

      We appreciate the reviewer’s suggestion to investigate additional substrates of Cpe1 beyond GyrB and ParE, which may not have been fully captured in our crosslinking-mass spectrometry experiments due to technical limitations or low protein abundance. To address this topic, we generated target cells heterologously expressing cleavage-resistant GyrB and ParE variants (GyrBΔG102 and ParEΔG98) that are not susceptible to Cpe1, as described in our original manuscript (Figures 3h, i). We performed both Cpe1 expression assay and competition assay to assess if expression of the cleavage-resistant variants suppresses Cpe1 toxicity (Author Response Figures 1a, b). However, we did not observe a substantial protective effect. While this outcome could suggest that GyrB and ParE are not the sole targets of Cpe1, alternative explanations are also plausible. In the Cpe1 expression assay, high levels of Cpe1 could still act on endogenous wild-type GyrB and ParE, and although we attempted to increase variant expression, precise quantification remains challenging. In the competition assay, highly active Cpe1 may have continued to target wild-type substrates throughout the experiment, potentially masking any protective effect. Additionally, reduced activity of the mutant proteins could contribute to the observed results. Finally, deletion of the global repressor H-NS in the Cpe1-producing E. coli strain may have induced other interbacterial competition mechanisms<sup>1</sup>, leading to growth inhibition independently of Cpe1. Addressing these questions comprehensively would require a more systematic investigation under a wider range of conditions. We consider this an important avenue for future studies.

      Results in Figure 7 clearly show that Cpi1 is capable of displacing ParE from Cpe1 due to higher affinity. Yet, the "competitive inhibition model" described in the last result section does not completely match what is really happening in Cpe1-mediated interbacterial competition. If Cpi1 is in the target cell, it would more likely engage the incoming Cpe1 before it can interact with ParE or GyrB, so competition does not occur in this scenario. Similarly, in the predatory cells expressing Cpe1 and Cpi1, these two proteins will form a stable protein complex, and no competition with the target will occur. The authors should reconsider their model.

      We thank the reviewer for their comments and appreciate the opportunity to clarify this point. First, we believe the reviewer is referring to Figure 5 rather than Figure 7. In our model, the primary role of immunity proteins in interbacterial competition is to neutralize cognate toxins and prevent self- or kin-intoxication. These immunity proteins exhibit high specificity and strong binding affinity toward their associated toxins, ensuring effective protection<sup>2</sup>. In predatory cells, immunity proteins are typically co-expressed with their corresponding toxins, likely enabling immediate suppression upon translation. During kin competition, immunity proteins can protect cells even after foreign toxins engage their substrates.

      Our results demonstrate that Cpi1 binds Cpe1 with higher affinity than its substrates and can displace them from pre-formed Cpe1-substrate complexes (Figures 5b-f). This aligns with the established function of immunity proteins in interbacterial competition and provides a mechanistic basis for how they confer protection, even when toxins have initially engaged their targets2. We acknowledge the reviewer’s point that in both scenarios—whether in the recipient cell or the toxin-producing cell—Cpe1 may first encounter Cpi1. However, our model underscores that Cpi1 not only binds at the substrate site but also exhibits superior affinity for Cpe1, ensuring robust protection against Cpe1-mediated toxicity.

      Minor comments

      "Intoxication" was used throughout the text numerous times to describe the activity of Cpe1. Looking in the Merriam-Webster dictionary, "Intoxication" means "a condition of being drunk". This word should be replaced with "toxicity" or some other terms in this line.

      We thank the reviewer for this comment. We acknowledge that the term "intoxication" is commonly associated with alcohol consumption, yet the Merriam-Webster dictionary also defines it as "an abnormal state that is essentially a poisoning" (https://www.merriam-webster.com/dictionary/intoxication). This definition aligns with its well-established usage in the field of interbacterial competition to describe the effects of interbacterial toxins during antagonism3-5, which we have adopted in our manuscript. However, we appreciate the reviewer’s concern and remain open to revising the terminology if deemed necessary for clarity.

      Lines 46-48, references on contact-dependent killings by these systems mentioned should be cited. Ref. 9 cited does NOT cover the information at all.

      We thank the reviewer for this comment. We have revised the citation and now reference studies that specifically describe contact-dependent killing systems in the relevant sentences (Lines 45–50).

      "characterizations" should be "characterization".

      We have now modified the sentence as requested (Line 69)

      Line 229 "Cpe1-Bpa monomers" should be " apo Cpe1-Bpa". The results cannot distinguish whether these bands are monomers or multimers.

      We appreciate the reviewer’s careful assessment of our manuscript. The results in Line 233 (Figure 3c) show the enrichment of His-tagged proteins, including crosslinked complexes and overproduced Cpe1-Bpa. Based on the molecular weight marker, the Cpe1-Bpa bands appear between 10–15 kDa, consistent with the molecular weight of Cpe1 monomers (Figure 3a). Therefore, we have labeled this band as “Cpe1-Bpa monomers” and maintained this terminology throughout the text. This designation aligns with previous studies utilizing site-specific crosslinking via Bpa incorporation6,7

      Line 283, was the mutation deletion? Substitution was used I think.

      We thank the reviewer for highlighting this point. The GyrB and ParE mutants used to confirm the cleavage sites were deletion mutants, with a single glycine removed from the predicted double-glycine motifs. We have now revised the text for clarity (Lines 285–290)

      Lines 439-444 the discussion should be extended to include other bacterial toxins that target type II DNA topoisomerases (e.g. PMID: 26299961 and PMID: 26814232).

      We appreciate the reviewer’s suggestion. The studies referenced (PMID: 26299961 and PMID: 26814232) describe FicT toxins with adenylyl transferase activity that target and post-translationally modify GyrB and ParE at their ATPase domains, highlighting a potential hotspot for topoisomerase inhibition. We have now incorporated an additional paragraph in the Discussion section to describe these findings (Lines 424–439).

      Reviewer #1 (Significance)

      The authors determined the structure of the protein complex formed by Cpe1 and its immunity protein Cpi1, which allowed them to reveal the mechanism of inhibition. Moreover, the authors identified type II DNA topoisomerases GyrB and ParE as the targets of Cpe1. Overall, the major conclusions were well supported by experimental data of high quality. The findings have expanded our appreciation of the mechanism utilized by T6SS effectors to inhibit target cell growth.

      We sincerely thank the reviewer for their positive comments and for the suggestions to improve our manuscript.

      Reviewer #2 (Evidence, reproducibility, and clarity)

      The manuscript, titled "An Interbacterial Cysteine Protease Toxin Inhibits Cell Growth by Targeting Type II DNA Topoisomerases GyrB and ParE", describes how an effector family was identified and characterized as a papain-like cysteine protease (PLCP) that negatively impacts bacterial growth in the absence of its co-encoded immunity protein. This thorough report includes (1) bioinformatic analysis of prevalence, finding this PLCP effector encoded in many gram-negative bacteria, (2) confirming conservation of catalytic active site via structural (crystallographic) analysis, as well as visualizing contacts with the immunity protein, (3) validation of results using growth studies combined with mutagenesis, (4) using a cell-based cross-linking method to pull out potential targets, which were subsequently identified via mass spectrometry, (5) validation of these results using in vitro protease assays with purified (potential) substrates, including verification of the motif recognized on the substrate(s), and cell-based phenotype analyses, and finally, (6) demonstrating competition between immunity protein and ParE substrate using an in vitro pull-down approach. Overall, this is a strong body of work with compelling conclusions that are well supported by multiple experimental approaches.

      We appreciate the reviewer for their positive comments regarding our original submission.

      Major comments

      The claims made based on the presented results are well supported, including that this PLCP effector toxin is widespread, is neutralized in a competitive mechanism by its immunity partner, and that it effectively cleaves both GyrB and ParE (subunits of bacterial type II topoisomerases) at a conserved motif, resulting in suppression of bacterial cell growth via mis-regulating chromosome segregation. No additional experiments are needed to further validate these results, and the authors are commended on the cell-based and in vitro studies to deduce very specific mechanisms and structural details.

      We appreciate the reviewer’s positive feedback.

      Minor comments

      While the writing and data presentation are extremely clear, in general I recommend the authors indicate the level(s) of replication for experiments. Figure legends generally note that mean values with standard deviations are shown, but I did not find where the number of replicates (and independent versus technical) were listed.

      We appreciate the reviewer’s suggestion. We have now revised the manuscript to specify the levels of replication (independent vs. technical) for each experiment in the figure legends, particularly in Figures 2 and 3.

      The figures are very clear, but in many instances the addition of PLCP toxin is indicated as "before" and "after"; while a modest change, I recommend altering this to some type of "-" and "+" type nomenclature rather than a time-based notation (especially as presumably both samples were treated identically, just with or without protease).

      We thank the reviewer for this helpful comment. In Figure 3 and Supplementary Figures 5 and 9, we used "before" and "after" to indicate the time points for in vitro cleavage assays verifying Cpe1 cleavage. To minimize variations between reactions, the catalytic mutant Cpe1tox (Cpe1toxC362A) was used as a comparison rather than a reaction without Cpe1tox. In these assays, duplicate reaction mixtures were prepared: one was denatured immediately after preparation ("before" reaction) to serve as a baseline, while the other was incubated to allow enzymatic activity ("after" reaction). This labeling clarifies the comparison between initial and processed samples. We believe this approach clearly distinguishes the effects of Cpe1 activity and provides a reliable basis for assessing proteolysis in our assays.

      I also suggest quantifying the intensities of the gel images presented in Figure 5c, d (for example, Cpe1 intensity as a ratio to that of the ParE ATPase domain), to make the interpretation even more evident.

      We thank the reviewer for the valuable suggestion to quantify the signal intensities of the gel images presented in Figures 5c, d. We have now included the quantification results in Supplementary Figures 9e, f and have updated the respective text in the manuscript (Lines 826-828 and 1066-1087).

      Crystallographic structure: the PDB report notes some higher-than-expected RZR (RSRZ) scores; I interpret this to mean that there was strain around the catalytic site of one of the two toxins in the asymmetric unit, or that this copy was less well ordered. The RZR outliers likely arise from non-optimal weighting for geometric restraints. While no figures of electron density are presented, these modest outliers are not expected to alter the conclusions reached in the current work. One point of interest that is not addressed, however, is if any variance between the two complexes in the asymmetric unit are noted? A passage compares the current toxins to others in the larger subfamily and notes a rotation of a side chain is needed to superpose (Line 159). Can the authors please clarify around which bond this rotation is needed, and if both copies in the asymmetric unit are in the same orientation at this site?

      We appreciate the reviewer’s insightful comments.

      1. We have provided the electron density map for the RSR-Z outlier residues along with the model (Author response Figure 2a). These outlier residues are located at the loop regions of a molecule within the asymmetric unit in the crystal (Chain B). As a result, the electron density for their side chains appears to be noisier compared to residues in the well-folded regions, leading to higher RSR-Z scores. Notably, when we superimposed the models of two complexes within the asymmetric unit, the calculated RMSD value was 0.402 Å (Author response Figure 2b), indicating that the two models are structurally very similar and that these residues are properly assigned. Therefore, the RSR-Z outliers do not significantly impact the overall structure.
      2. Here, we provide a zoomed-in view of Figure 2d, highlighting the superimposed crystal structures of Cpe1 and the closely related PLCPs, ComA and LahT (Author response Figure 2c). As shown, the side chain of the catalytic cysteine residue in ComA adopts a different orientation, positioning it slightly farther from the homologous residues in Cpe1 and LahT. However, since the backbone and catalytic pockets remain structurally intact, we believe that this deviation arises from crystal packing effects rather than an inherent functional distinction. We have now modified the main text (Lines 159-166) to clarify this and prevent any potential misinterpretation.
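
      The RMSD quoted in point 1 is the root-mean-square deviation between paired atoms of the two complexes after superposition. The calculation can be sketched as follows, assuming the two coordinate sets are already superimposed and paired one to one. The function name and the toy coordinates are illustrative assumptions, not the deposited model, and the sketch does not perform the superposition step itself.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned, paired coordinate lists.

    Each list holds (x, y, z) tuples in angstroms; atom i in coords_a is assumed
    to correspond to atom i in coords_b. No superposition is performed here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy coordinates (hypothetical): every atom is displaced by 1 angstrom along z.
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(f"RMSD = {rmsd(chain_a, chain_b):.3f} A")
```

      In practice the value depends on which atoms are paired (for example, Cα atoms only versus all atoms) and on the superposition method used by the structure-analysis software.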

      Reviewer #2 (Significance)

      Bacteria encode numerous effectors to successfully compete in natural environments or to mediate virulence; these effectors are typically associated with type VI secretion system machinery or referred to as contact dependent inhibition systems. The current work has identified a sub-family of papain-like cysteine protease effectors that are unique by targeting type II topoisomerases. Among the actionable findings is the identification of both the specific site of interaction with the topo substrates, as well as the specific motif recognized for cleavage. This should enable the field to move forward probing for this activity with other toxins and substrates. The insights provided by the competitive neutralization mechanism also stand out as an important contribution that can be more broadly applied. Within the literature, few effector targets are identified, making the current study stand out as impactful by the well-executed experiments that directly support the conclusions.

      While the current study has strong elements of novelty and is complete, it also nicely sets up future studies for remaining open questions. For example, does the nucleotide-bound status of the ATPase domain, or other catalytic intermediate, impact the susceptibility of topoisomerases to cleavage? Is this identified motif found in other ATPase domains? Is the negative supercoiling activity unique to gyrase also impacted, or is the phenotypic mechanism of cell toxicity reliant only on chromosome segregation? What types of kinetic parameters do this class of toxins demonstrate, and does sequence variability alter this? These ideas are a testament to the intriguing study as presented, capturing the readers' curiosity for additional details that are clearly beyond the scope of the current work.

      I anticipate this work will be of interest to the broad field of microbiologists that study interbacterial communication as well as pathogenic mechanisms. While the research is largely fundamental in nature, it is wide in scope with applications to many gram-negative bacteria that inhabit a myriad of niches. The work will also be of interest to specialists in topoisomerases, as the list of toxins that target these essential enzymes is growing and the therapeutic utility of topoisomerase inhibition remains vital. My interest lies in the latter, in toxin-mediated inhibition of topoisomerase enzymes as a means to alter bacterial cell growth. While I have strong expertise in structural biology, I am lacking in expertise for mass spectrometry. I note this because this method was used for the identification of the target substrate.

      We appreciate the reviewer’s insightful discussion and interest in our study. We agree that further investigations are crucial to address the open questions posed, and we have initiated work on some of these avenues.

      For example, considering Cpe1's specificity for the ATPase domain of GyrB and ParE, we have begun examining whether Cpe1 targets other ATPase domains by searching for the consensus sequence or double glycine motifs in the sequences of ATPase domains beyond GyrB and ParE. Among the 42 E. coli ATPase domains identified by the PEC database8, we found several with double glycine residues. However, none contained the exact LHAGGKF consensus sequence identified in GyrB and ParE, which are targeted by Cpe1 (Author Response Figure 3). These findings suggest that Cpe1 is less likely to target other ATPase domains. Nonetheless, due to Cpe1’s potential tolerance of certain variations within the consensus sequence, we cannot draw a definitive conclusion without further investigation into the cleavage sites.
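
      The kind of sequence search described above can be sketched as follows. This is a minimal illustration, assuming plain one-letter protein sequences as input; the function name and the toy sequences are hypothetical and are not taken from the PEC database or from our analysis. It reports every double-glycine position and every exact match to the LHAGGKF consensus.

```python
import re

CONSENSUS = "LHAGGKF"  # consensus sequence reported for GyrB and ParE


def find_motifs(sequence, consensus=CONSENSUS):
    """Return 0-based start positions of GG motifs and exact consensus matches."""
    seq = sequence.upper()
    # Zero-width lookaheads report overlapping hits, e.g. both GG pairs in "GGG".
    gg_sites = [m.start() for m in re.finditer(r"(?=GG)", seq)]
    consensus_sites = [
        m.start() for m in re.finditer(f"(?={re.escape(consensus)})", seq)
    ]
    return {"GG": gg_sites, "consensus": consensus_sites}


# Toy sequences for illustration only (NOT real ATPase domain sequences).
toy_domains = {
    "toy_with_consensus": "MKTLHAGGKFDE",
    "toy_gg_only": "MSGGTAAGGGL",
    "toy_no_gg": "MKTAYLDEQ",
}

for name, seq in toy_domains.items():
    hits = find_motifs(seq)
    print(name, "GG at", hits["GG"], "consensus at", hits["consensus"])
```

      An exact-match search of this kind cannot capture any tolerance Cpe1 may have for variation within the consensus, which is why a negative result does not by itself exclude other substrates.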

      Another critical open question is the impact of Cpe1-mediated cleavage on the function of GyrB and ParE. To address this topic, we have begun investigating if Cpe1 cleavage affects the ATPase activity of these proteins. As expected, our biochemical analysis has demonstrated a significant decrease in ATP hydrolysis in the presence of active Cpe1tox, but not in the presence of the catalytic mutant Cpe1toxC362A (Author response Figures 4a, b). These results confirm that the ATP-dependent activities of both GyrB and ParE are disrupted following Cpe1 cleavage9. Previous work on FicT toxin that inhibits GyrB and ParE ATPase activity through post-translational modification found that ATP-dependent activities such as DNA supercoiling, relaxation, and decatenation were inhibited10,11. Interestingly, GyrB’s relaxation of negative supercoiled DNA, which does not require ATP, was also affected to some extent. This outcome raises the question as to whether Cpe1-cleaved GyrB results in similar downstream defects. Investigating this possibility would provide valuable insights into Cpe1’s mode of action, although we feel doing so is beyond the scope of the current study. Consequently, we view this as an important area for future research.

      Finally, regarding the potential applications of Cpe1, we are interested in further investigating its enzymatic specificity and properties. In this study, we analyzed the binding kinetics between Cpe1 and its substrate (Figure 5f) and currently we are endeavoring to characterize the kinetics of Cpe1-mediated proteolysis. To better probe hydrolytic dynamics, we plan to utilize a substrate with a reporting group (such as a chromogenic or fluorogenic leaving group) to monitor cleavage over time. We could achieve this by designing a recombinant substrate based on our knowledge of Cpe1’s native substrates (GyrB and ParE) and the target sequence (“LHAGGKF”). Alternatively, a secondary reaction leading to colorimetric changes could be employed for detection. We consider this an exciting research direction and an important next step for this study.

      Overall, we are grateful for the reviewer’s recognition of the novelty and importance of our work in advancing the understanding of interbacterial toxins and their inhibitory effects on topoisomerases. We plan to further investigate the consequences of Cpe1 cleavage on GyrB and ParE and to explore Cpe1 kinetics and its mechanistic actions in more detail. This will not only deepen our understanding of bacterial toxin-mediated inhibition but may also provide critical insights into strategies for targeting type II DNA topoisomerases. The reviewer’s insightful feedback has proven invaluable in shaping our ongoing and future research directions.

      Reviewer #3 (Evidence, reproducibility, and clarity)

      Bacterial warfare in microbial communities has become illuminated by recent discoveries on molecular weapons that allow contact-dependent injection of bacterial toxins between competitors. Among the best characterized systems are the type VI secretion system (T6SS) or the contact-dependent inhibition (CDI) system (i.e. some of the T5SSs). These systems are delivering a plethora of toxins with various biochemical activities and a broad range of targets. In recent years many such toxins have been characterized and their relevance in pointing at appropriate drug targets is increasing.

      In this study the authors built on a previously published association of a family of proteins, papain-like cysteine proteases (PLCPs), with their delivery by T6SS or CDI into target bacterial cells. Whereas this observation is not particularly novel, the findings that this set of proteins, that the authors called now Cpe1, can specifically target bacterial proteins such as ParE and GyrB, so that it affects chromosome partitioning and cell division, is groundbreaking. The authors are clearly demonstrating that Cpe1 cleaves its target proteins at a double glycine recognition site, which is in line with previous characterization of such proteases when fused to a particular category of ABC transporters. Even more remarkably they can show using biochemical approaches that Cpi1 is a cognate immunity for Cpe1, preventing its activity, not by interfering with the catalytic site, but instead with the substrate binding site. The mechanism of competitive inhibition between immunity and substrate is also substantiated by biochemical data.

      We sincerely appreciate the reviewer’s interest in and support of our study.

      Major comments

      • This is a very well conducted study which combines bacterial genetics and phenotypes with excellent biochemical evidence.

      We thank the reviewer for their positive comments.

      • There are 8 targets identified for Cpe1 and yet only two are cleaved by the enzyme. It is intriguing that FtsZ is one identified target by the pull down but not confirmed for cleavage. The authors rule this out as a false positive but the cell division defect associated with Cpe1 activity would be consistent here. Are there any double glycine motifs in FtsZ that could be identified as a cleavage site? Is it possible that slightly different incubation conditions may promote degradation of FtsZ?

      We appreciate the reviewer’s thoughtful comment regarding FtsZ as a potential substrate of Cpe1. This was indeed an intriguing possibility, especially given the cell division defects observed following Cpe1 intoxication. Early on in the project, we also identified FtsZ as a Cpe1 interactor in our proteomic crosslinking assays, which further fueled the hypothesis that FtsZ might be a target.

      To explore this possibility, first we examined the FtsZ protein sequence for potential Cpe1 cleavage sites and identified several double glycine motifs (Author response Figure 5a). However, none of these motifs matched the consensus sequence identified in GyrB and ParE, which is LHAGGKF, a sequence that we have shown to be critical for Cpe1 cleavage activity. In an effort to better understand if FtsZ could still be cleaved by Cpe1, we conducted additional cleavage assays under various conditions (Author response Figure 5b). We tested different incubation temperatures, including increasing the temperature to 37 °C, and extended the reaction time to overnight. However, we did not observe any cleavage of FtsZ under these conditions. Given that FtsZ undergoes significant conformational changes upon binding to GTP12, we also considered the possibility that the GTP-bound form of FtsZ might be cleaved by Cpe1. However, even under those conditions, no significant cleavage of FtsZ was detected (Author response Figure 5b). Based on these results, we do not have any evidence to support that FtsZ is a target of Cpe1. The observed cell division defects are more likely a secondary effect resulting from the cleavage of GyrB and ParE, direct targets of Cpe1 that are crucial for chromosome segregation.

      • Could it be structurally predicted whether the GG of ParE or GyrB is fitted into the catalytic site of Cpe1?

      We appreciate the reviewer’s insightful question regarding the structural prediction of the GG motif of ParE and GyrB fitting into the catalytic site of Cpe1. To address this possibility, we used AlphaFold 3 to predict the interaction structure between Cpe1 and its substrates13. The resulting model of Cpe1 interacting with the ATPase domain of GyrB (GyrBATPase) is shown in Supplementary Figure 9c. As illustrated, the loop of the GyrB ATPase domain containing the consensus targeting sequence (“LHAGGKF”) fits into the catalytic site of Cpe1, with the GG motif positioned closest to the catalytic cysteine residue, which likely facilitates hydrolysis. We also attempted to model the interaction between Cpe1 and the ATPase domain of ParE. However, confidence for this model was lower (ipTM = 0.74, pTM = 0.71), possibly due to AlphaFold’s preference for certain protein configurations. To gain a more accurate understanding of how Cpe1 binds and recognizes its substrates, we are currently working on co-crystallizing Cpe1tox with GyrB and ParE. This long-term project aims to provide precise structural insights into the Cpe1-substrate interaction and further elucidate the mechanism of cleavage.

      Minor comments

      • The authors described a family of proteases, PLCPs, and characterized one here called Cpe1. Not clear whether this is a generic name or one specific protein from one particular bacterial species. Indeed, it is unclear from which bacterial strain the Cpe1 protein studied here originates.

      We thank the reviewer for this comment and apologize for the lack of clarity. To provide better context, we have now revised the manuscript (Lines 136-137 and 141-145) to clearly state that the Cpe1 protein characterized in this study originates from E. coli strain ATCC 11775.

      • It may be worth to emphasize that the Cpe1 domain is found in all possible configurations as T6SS cargo and that is to be linked to VgrG, PAAR or Rhs.

      Thank you for this suggestion. We have revised the manuscript accordingly to emphasize this point (Lines 106-109).

      • Line 49 the authors could indicate that the Esx system is also known as type VII secretion system (T7SS).

      Thank you for this suggestion. We have revised the manuscript accordingly (Lines 48–50).

      • Line 113 it may be better to use Proteobacteria instead of Pseudomonadota

      We have revised the manuscript (Lines 114-115) as suggested by the reviewer. It is important to note that following the recent decision by the International Committee on Systematics of Prokaryotes (ICSP) to amend the International Code of Nomenclature of Prokaryotes (ICNP) and formally recognize "phylum" under official nomenclature rules14,15, the taxonomy database used in our analysis has adopted the updated nomenclature. To ensure consistency, we followed this updated nomenclature throughout the original manuscript.

      Reviewer #3 (Significance)

      This is an excellent piece of work. The characterization of Cpe1 might look poorly novel at the start when compared to previous studies. Yet the findings go crescendo by characterizing original mechanisms of action of the cognate immunity, and by identifying the molecular target of Cpe1. This is providing real conceptual advance in the T6SS field and not just reporting yet another T6SS toxin.

      As a T6SS expert I genuinely feel that these findings are groundbreaking and could be targeted to broad audience since the possible implications of these observations for future antimicrobial drugs discovery or therapeutic approaches is highly relevant.

      We sincerely appreciate the reviewer’s positive remarks and support of our study.

      References

      1. Ishihama, A., and Shimada, T. (2021). Hierarchy of transcription factor network in Escherichia coli K-12: H-NS-mediated silencing and Anti-silencing by global regulators. FEMS Microbiol Rev 45. 10.1093/femsre/fuab032.
      2. Hersch, S.J., Manera, K., and Dong, T.G. (2020). Defending against the Type Six Secretion System: beyond Immunity Genes. Cell Rep 33, 108259. 10.1016/j.celrep.2020.108259.
      3. Russell, A.B., Singh, P., Brittnacher, M., Bui, N.K., Hood, R.D., Carl, M.A., Agnello, D.M., Schwarz, S., Goodlett, D.R., Vollmer, W., and Mougous, J.D. (2012). A widespread bacterial type VI secretion effector superfamily identified using a heuristic approach. Cell Host Microbe 11, 538-549. 10.1016/j.chom.2012.04.007.
      4. Jana, B., Fridman, C.M., Bosis, E., and Salomon, D. (2019). A modular effector with a DNase domain and a marker for T6SS substrates. Nat Commun 10, 3595. 10.1038/s41467-019-11546-6.
      5. Halvorsen, T.M., Schroeder, K.A., Jones, A.M., Hammarlof, D., Low, D.A., Koskiniemi, S., and Hayes, C.S. (2024). Contact-dependent growth inhibition (CDI) systems deploy a large family of polymorphic ionophoric toxins for inter-bacterial competition. PLoS Genet 20, e1011494. 10.1371/journal.pgen.1011494.
      6. Nguyen, T.T., Sabat, G., and Sussman, M.R. (2018). In vivo cross-linking supports a head-to-tail mechanism for regulation of the plant plasma membrane P-type H(+)-ATPase. J Biol Chem 293, 17095-17106. 10.1074/jbc.RA118.003528.
      7. Liu, Y., Yu, J., Wang, M., Zeng, Q., Fu, X., and Chang, Z. (2021). A high-throughput genetically directed protein crosslinking analysis reveals the physiological relevance of the ATP synthase 'inserted' state. FEBS J 288, 2989-3009. 10.1111/febs.15616.
      8. Yamazaki, Y., Niki, H., and Kato, J. (2008). Profiling of Escherichia coli Chromosome database. Methods Mol Biol 416, 385-389. 10.1007/978-1-59745-321-9_26.
      9. Reece, R.J., and Maxwell, A. (1991). DNA gyrase: structure and function. Crit Rev Biochem Mol Biol 26, 335-375. 10.3109/10409239109114072.
      10. Harms, A., Stanger, F.V., Scheu, P.D., de Jong, I.G., Goepfert, A., Glatter, T., Gerdes, K., Schirmer, T., and Dehio, C. (2015). Adenylylation of Gyrase and Topo IV by FicT Toxins Disrupts Bacterial DNA Topology. Cell Rep 12, 1497-1507. 10.1016/j.celrep.2015.07.056.
      11. Lu, C., Nakayasu, E.S., Zhang, L.Q., and Luo, Z.Q. (2016). Identification of Fic-1 as an enzyme that inhibits bacterial DNA replication by AMPylating GyrB, promoting filament formation. Sci Signal 9, ra11. 10.1126/scisignal.aad0446.
      12. Matsui, T., Han, X., Yu, J., Yao, M., and Tanaka, I. (2014). Structural change in FtsZ Induced by intermolecular interactions between bound GTP and the T7 loop. J Biol Chem 289, 3501-3509. 10.1074/jbc.M113.514901.
      13. Abramson, J., Adler, J., Dunger, J., Evans, R., Green, T., Pritzel, A., Ronneberger, O., Willmore, L., Ballard, A.J., Bambrick, J., et al. (2024). Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493-500. 10.1038/s41586-024-07487-w.
      14. Oren, A., Arahal, D.R., Rossello-Mora, R., Sutcliffe, I.C., and Moore, E.R.B. (2021). Emendation of Rules 5b, 8, 15 and 22 of the International Code of Nomenclature of Prokaryotes to include the rank of phylum. Int J Syst Evol Microbiol 71. 10.1099/ijsem.0.004851.
      15. Oren, A., and Garrity, G.M. (2021). Valid publication of the names of forty-two phyla of prokaryotes. Int J Syst Evol Microbiol 71. 10.1099/ijsem.0.005056.
    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We thank the reviewers for their evaluation of our previous submission and have responded to each point in detail below. Overall, we have revised the manuscript with the addition of several new data and corresponding figure panels that strengthen our previous conclusions and add new insights allowing us to extend the conclusions of the study. Important additions include new data showing the impact of loss of CLU on adapting to additional stressors during metabolic transitions that supports a mechanistic understanding of our omics results; by poly(dT) FISH we show that fly Clu granules indeed contain mRNAs; FRAP microscopy analysis supports that Clu1 granules have dynamic content similar to other LLPS membraneless organelles; and we have re-analysed our data to demonstrate more clearly the impact of Clu1 on translation efficiency and also the relative binding of mRNAs during translation. In addition, we provide some extra control analyses for completeness.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this manuscript the authors study the Clustered mitochondrial proteins Clu of Drosophila melanogaster and Clu1 of Saccharomyces cerevisiae, two homologues of the mammalian protein CLUH. They show in compelling microscopy analysis that both proteins form granules. This was the case for flies fed on yeast paste after starvation and in yeast in post-diauxic phase, in respiratory media or during mitochondrial stress. They show that these granules are found in proximity to mitochondria and that they behave like liquid-liquid-phase separated condensates. They show by co-staining for P-bodies and stress granules that Clu1-granules are distinct from these RNA granules. Furthermore, they found that the formation required active translation. In the second part, they show that Clu1 interacts with ribosomal and mitochondrial proteins by BioID. The deletion of Clu1 leads to slightly impaired growth on media containing ethanol as a carbon source. They find that nascent polypeptides of some mitochondrial precursor proteins are decreased in the deletion of Clu1 and conclude that Clu1 regulates translation of these proteins. Using RNA immunoprecipitation of Clu1-GFP in the presence of cycloheximide, EDTA and puromycin, they find that the mRNAs of nuclear-encoded mitochondrial proteins interacting with Clu1 were purified under conditions in which the ribosomes are intact, whereas the RNAs showed no interaction when ribosomes were disassembled. They show in sucrose gradients that Clu1 co-migrates with polysomes independent of its distribution state or carbon source. However, when cells are grown in conditions of granule formation, then polysomes and Clu1 run less deeply into the gradient. From these data, the authors conclude that Clu/Clu1 regulates the translation of nuclear-encoded mitochondrial proteins.

      Major comments:

      -The authors state that Clu1 is regulating translation during metabolic shifts. However, it is not clear what the real impact on mitochondrial function is. They show that there is a minor growth defect on ethanol media when CLU1 is deleted. However, if Clu1 is necessary mainly for adaptation, the phenotype will be strongest observed in conditions where cells switch carbon sources. Growth curves would be suitable in which the lag-phase of yeast cells precultured either in glucose or glycerol switched to media of different carbon sources (glucose to glycerol or glycerol to glucose) are measured. One would expect that the deletion mutant shows a longer lag-phase compared to the wild type when shifted from glucose to glycerol media.

      We agree that this is an important question, and, duly, we previously attempted to address this exactly as the reviewer described. Surprisingly, we were not able to observe any substantial differences in the duration of the lag phase between the wild-type and CLU1 knockout strains under these conditions. However, we did note that CLU1 knockout cells consistently reached stationary phase with a lower optical density when switched to ethanol media, consistent with these cells having a different metabolic efficiency during growth on ethanol media.

      To further explore the role of Clu1, we noted that several of the Clu1 mRNA interactors were mitochondrial heat shock proteins (HSPs), which are crucial for mitochondrial protein folding and import during the transition from fermentation to respiration. Hence, we hypothesised that the absence of Clu1 might lead to increased sensitivity to heat shock during the metabolic shift.

      To test this, we subjected both wild-type and CLU1 knockout cells to heat shock under three different conditions: (1) during growth on glucose-containing media (fermentation), (2) after shifting cells to media containing ethanol during the lag phase, when cells are adapting to respiration, and (3) after cells had fully adapted to ethanol and resumed growth. Interestingly, CLU1 knockout cells were more sensitive to heat shock selectively during the adaptation to respiration, which involves the translation of an extensive number of mitochondrial proteins. We think that the small difference in translation of mitochondrial HSPs becomes evident only upon additional heat shock, likely due to a deficient mitochondrial protein folding and import. These findings support our hypothesis that Clu1 is essential for optimal mitochondrial function during metabolic shifts.

      These results have been added to the manuscript and shown in Fig. S6 and described on page 9.

      -In line with this, how different is the mitochondrial proteome of the WT and the mutant? Do hits of the BioID, RIP and Punch-P experiments change at steady state or during metabolic shifts? Either proteomics of isolated mitochondria or western blots of whole cells or isolated mitochondria of WT and the deletion mutant grown in conditions of Clu1-granule formation or no granules for the hits could answer this question.

      We also considered this question during the course of the work. However, in exploratory analyses we saw no obvious differences in overall mitochondrial proteomics at steady-state which is what prompted us to look at more subtle effects on translation. Considering this further, changes in steady-state levels can be complex to interpret as they represent the combined effects of protein production and degradation. Small changes arising from altered production could be masked by compensatory changes in turnover rate. In light of this, we believe that the translational regulation differences identified in our study remain central to understanding the role of Clu1, and any downstream proteomic changes would not alter our primary conclusions.

      -The authors analyze RNAs bound in polysomes to assess translation efficiency. Translation efficiency is usually calculated by the fraction of RNA bound by ribosomes to the total RNA amount of an RNA species. Thus, doing RT-qPCR from whole cells would be necessary to assess if the occupancy of ribosomes on the transcripts is due to changes in RNA abundance or other regulatory pathways and would help to further assess what causes the observed changes.

      Thanks for this recommendation. To address this and expand our analysis to other proteins differentially translated in clu1Δ cells, we measured the mRNA steady-state levels by performing RNAseq on WT and clu1Δ strains grown under the same conditions as used for Punch-P. We then calculated the translation efficiency by dividing the nascent protein levels (Punch-P) by steady-state mRNA levels (RNAseq), as previously described for Punch-P data (PMID: 26824027). The translation efficiency for the majority of proteins with reduced translation in the clu1Δ cells by Punch-P analysis was lower. Similarly, the majority of proteins with increased translation had higher translation efficiency.

      The mRNA quantification in polysomes that we originally presented in the manuscript further showed that the decrease in translation efficiency is not caused by a simple decrease in mRNA engaged in translation, indicating that Clu1 regulates protein translation at the ribosome level. In contrast, for proteins with increased translation, we detected an increase in mRNAs engaged in polysomes, likely underlying the increased translation. These results further support our conclusions regarding the regulatory effects of Clu1 on translation.

      These results have been added to the manuscript and shown in Fig. 7E and described on page 9.
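
      As an illustration of the calculation described above (nascent protein level divided by steady-state mRNA level), the following is a minimal sketch and not the analysis code used in the study. The function name and the example values are hypothetical; only the ratio itself comes from the text.

```python
import math


def log2_te_change(nascent_wt, nascent_ko, mrna_wt, mrna_ko):
    """Return log2(TE_ko / TE_wt), where TE = nascent protein / mRNA.

    Working in log2 space, the ratio of two translation efficiencies becomes
    the nascent-protein fold change minus the mRNA fold change.
    All inputs are assumed to be positive abundance estimates.
    """
    log2_fc_nascent = math.log2(nascent_ko / nascent_wt)
    log2_fc_mrna = math.log2(mrna_ko / mrna_wt)
    return log2_fc_nascent - log2_fc_mrna


# Hypothetical example: nascent protein halves while mRNA is unchanged,
# so translation efficiency also halves (log2 change of -1).
print(log2_te_change(nascent_wt=100, nascent_ko=50, mrna_wt=10, mrna_ko=10))  # -1.0
```

      A negative value indicates lower translation efficiency in the knockout; a value near zero indicates that a change in nascent protein is explained by a matching change in mRNA abundance.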

      OPTIONAL:

      -The authors show a co-localization of Clu/Clu1 with mitochondrial fission factors and conclude that the granules appear likely near fission sites. Indeed, CLUH has been implied in the past to play a role in mitochondrial fission (Yang, H., Sibilla, C., Liu, R. et al. Clueless/CLUH regulates mitochondrial fission by promoting recruitment of Drp1 to mitochondria. Nat Commun 13, 1582 (2022). https://doi.org/10.1038/s41467-022-29071-4). Thus, are fission sites required for Clu-granule localizations? What is the role of the mitochondrial network integrity for the granule distribution? Expressing Clu-GFP/Clu1-GFP in cells depleted for the fission factors would provide information on that.

      Thanks for this suggestion. We agree that it would be interesting to know whether Clu1 granules still appear when mitochondrial fission is blocked. We tried to address this question but encountered some technical limitations. First, overexpression of Clu1-GFP via a plasmid did not replicate the endogenous Clu1 behaviour, making it necessary to delete the fission factors in the Clu1-GFP background. While crossing the Clu1-GFP strain with already available knockout strains would be straightforward, we would need access to a tetrad dissecting microscope, which unfortunately was not available to us. We also attempted PCR-based gene deletion but the sequence homology between the GFP-tagging cassette and the deletion cassettes made this very challenging. Given these limitations, and as the lab's yeast expert had already left, we were not able to pursue this experiment further and have removed these observations from our manuscript. We hope that future studies will explore this question in more detail.

      -The authors convincingly show that Clu1 interacts with ribosomes and runs with polysomal fractions. However, how it actually regulates translation is not clear. To answer this question, selective ribosomal profiling would be necessary. The authors have established conditions which would be suitable for the experiment. They could use crosslinking and sucrose cushions to IP ribosomes with Clu1-GFP bound to be used for ribosomal profiling. However, this experiment is quite time-intensive (3-4 months) and expensive, thus, an optional suggestion.

      We thank the reviewer for this suggestion. We agree that ribosome profiling could provide novel insights into the function of Clu1/Clu. While we recognise the potential of this approach, as the reviewer points out, this experiment would indeed be time- and resource-intensive. Based on our initial tests, where we included cross-linked samples (UV and formaldehyde), we anticipate that it could take even longer than the estimated 3-4 months, as the IP using cross-linked lysates was not as successful as the IP using non-cross-linked samples: we were not able to immunoprecipitate Clu1 as efficiently, likely due to the epitope being poorly exposed to the antibody. Although we have optimised working conditions for co-immunoprecipitating Clu1 with ribosomes, performing ribosome profiling using our setup within the timeframe and resources of this study is unfortunately not currently feasible.

      Minor comments:

      Fig1: B, C, please add scale bars into the zoom ins.

      These have been added.

      Fig 2 would profit from inlets of zoom ins to visualize the distribution better.

      These have been added.

      Fig.3: Panel C does not really add much information. I would rather remove it or put it into supplements and therefore show a zoom of Panel E with a line plot showing the rings. It is not clear from the represented images where the rings are formed.

      We think some confusion has arisen from the text description. It seems that the reviewer was under the impression that Fig. 3C and 3E were intended to be showing the Clu1 rings around the mitochondria, but this was shown only in Fig. S3A. We have re-written these sentences for better clarity. To be clear, Fig. 3C is a 3D rendering of the left-hand cell in 3B (3D is a line plot of part of the right-hand cell) and 3E is a different experiment showing the formation of Clu1 granules under a different respiratory stress (galactose plus CCCP). We have also added a line plot showing Clu1-GFP and mito-mCherry fluorescence intensity to highlight the Clu1 rings around the mitochondria in Fig. S3A.

      Fig.3 panel F: Max projections are not appropriate to show colocalization as they can lead to false-positive overlaps. Just remove the max projections.

      We tried a number of different approaches to improve this analysis but, ultimately, we were not able to generate sufficiently robust data to be convincing so we decided to remove this from the manuscript. The coincidence of Clu1 granules with mitochondrial fission factors was an adjunct observation and not a major part of the story and has been discussed by others relating to fly Clu (PMID: 35332133), so removal from the current manuscript does not impact the key conclusions of the study.

      References 21 and 22 are the same.

      Thanks. This has been fixed.

      Reviewer #1 (Significance (Required)):

      This manuscript shows in a convincing way that Clu and Clu1 form RNA granules and that Clu1 interacts with ribosomes. It is written in a clear way and the figures support the conclusions drawn in the text. The finding that Clu/Clu1 is important for metabolic adaptation has not been shown in fly or yeast to my knowledge. It is in line with findings for the mammalian homologue CLUH. Thus, the findings are supported by earlier work. This study is of value for a broader audience of the basic research field, especially of the mitochondrial and RNA granule field, as it supports the idea of post-transcriptional regulation of nuclear-encoded mitochondrial protein gene expression for dynamic adaptation of mitochondrial function. The conditions when Clu granules form is studied in detail, followed up by identification of target RNAs and interaction partners. Though the interaction of Clu1 with ribosomes is shown in a compelling way, a detailed mechanism of the function of Clu/Clu1 is missing and would require more experiments. Thus, even though a detailed mechanism is missing, the study does expand on our understanding of Clu/Clu1 in regulating mitochondrial biogenesis and is therefore of high interest of the mitochondrial field.

      Expertise: mitochondria, yeast, RNA granules, mitochondrial biogenesis, next-generation sequencing, fluorescence microscopy

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      In this manuscript the authors use D. melanogaster and S. cerevisiae to study the role of CLUH in the translation of nuclear-encoded mitochondrial proteins. During conditions requiring aerobic respiration, CLUH forms RNA-dependent granules that localise in proximity to mitochondria. Furthermore, the authors demonstrate that CLUH interacts with translating ribosomes to facilitate the translation of specific target mRNAs. For this, the authors use a combination of GFP-tagged CLUH models, BioID, polysome translating proteomics and RNA-IP. The authors' main conclusions are that (i) CLUH forms dynamic, membrane-less, RNA-dependent granules under conditions that demand aerobic respiration, (ii) CLUH interacts with specific mRNAs encoding metabolic factors, and (iii) CLUH interacts with the translating ribosome. The manuscript is well written and the conclusions stand in proportion to the experimental output and the results. The main concern is with regards to lack of advancement in relationship to published data.

      We appreciate the reviewer's feedback and specific comments which we respond to individually below. However, we would like to first address the point regarding "lack of advancement" and the use of the "CLUH" terminology which the reviewer uses throughout their critique. We would like to reiterate, as the reviewer states, our work focussed exclusively on yeast Clu1 and Drosophila Clu. None of our data relates to mammalian CLUH. While these proteins share substantial sequence homology, it is imprudent and scientifically unsound to assume cross-species equivalence without directly testing. Indeed, one of the central aims of our study was to characterise the molecular function of yeast Clu1, which remains almost entirely unstudied.

      We acknowledge that some of the observations contained within our study have been described by others and we have appropriately noted and cited these in context. Nevertheless, (a) independent replication is always valuable but easily criticised as lacking novelty, and (b) the majority of the work was analysing the molecular dynamics and function of yeast Clu1 which is almost completely unstudied and may help provide hypotheses for others to test for conservation in mammalian CLUH. Hence, we consider that summarising the work as 'lacking advancement' is misplaced.

      Comments:

      To this reviewer it is not clear how CLUH can regulate the translation of specific mRNAs while being bound to ribosomes, regardless of being in a diffuse or granular state. The authors suggest that under metabolically active conditions, CLUH might aggregate translating ribosomes, forming the granular structures. How CLUH though can both be bound to translating ribosomes and recruit specific mRNAs at the same time is not explained.

      It was indeed surprising to us that the data indicate that Clu1 can bind both mRNAs and ribosomes to affect translation, and we share the reviewer's curiosity about the precise mechanism of how this occurs. While we have provided novel insights into this situation, dissecting the precise molecular mechanisms is beyond the scope of the current study.

      The authors might want to discuss how changes in metabolic demands signal the aggregation of CLUH, and how CLUH can recognise its target mRNAs.

      We appreciate the reviewer's point here but as this would be pure speculation we have made only brief comments on this at the end of the Discussion.

      What was the rationale to perform the RIP or the PUNCH-P experiments only under non-challenged conditions, but not under conditions demanding aerobic respiration?

      We appreciate the reviewer's question. In fact, the Punch-P analysis was carried out on cells that had been transferred to ethanol to induce respiration. This was stated in the Methods, but we appreciate that this may have been missed so we have now clarified this in the main text (p9).

      Regarding the RIP, our initial tests showed that mRNAs encoding proteins found to interact with Clu1 by BioID were interacting with Clu1 in both fermenting and respiring conditions. Due to this consistency, it did not seem necessary to perform the RIP experiments under both metabolic conditions, so we chose to conduct the experiment under the simpler growth condition.

      If CLUH is ubiquitously bound to ribosomes, has CLUH been seen in any structural representation of the cytosolic ribosome?

      This is a good question, and we wondered the same. To our knowledge, Clu1/Clu/CLUH has not been observed in any structural studies of the ribosome, and no formal structure of any Clu family proteins has been resolved.

      Nevertheless, we would like to clarify that we do not think, or suggest in the manuscript, that Clu/Clu1 is ubiquitously bound to ribosomes. First, current evidence supports that Clu/Clu1 only regulates a specific subset of mRNAs. Second, our work, particularly the sucrose gradient experiments, shows that Clu1 interacts transiently with ribosomes, as cross-linking was required to capture the full extent of this interaction. This transient and selective interaction of Clu/Clu1 with the ribosome, together with the fact that transient interactors are often lost during ribosome purification, makes Clu/Clu1 detection in structural studies unlikely. Due to the transient interaction and dynamic localisation of Clu/Clu1, capturing Clu/Clu1 in ribosomal structures will require significant work in the future.

      Reviewer #2 (Significance (Required)):

      CLUH has been studied in various publications, showing data very similar to that presented in this manuscript. However, the authors provide a comprehensive analysis on both yeast and fly CLUH. The strength of the manuscript is the combination of several elegant methods and genetically modified model systems in two species to elucidate the role of CLUH during the translation of specific mRNA. In my view though, the advancement of understanding the function of CLUH is limited.

      Although the authors work in yeast and DM, the results seem applicable to other species, including humans, and thus, the presented results will be of interest in a range of researchers working in the field of metabolic regulation and gene expression.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary: This study from Miller-Fleming et al. employs yeast and Drosophila as model systems to explore the function of the RNA-binding protein Clu1, which is involved in mitochondrial biogenesis. The first part of the manuscript characterizes so-called "Clu1 granules", and their dependence on metabolic transitions. In particular, using yeast, they find a relocalisation of Clu1 upon starvation and several mitochondrial stress conditions. These granules are not stress granules, and are dissolved by RNAse and puromycin treatment. The second part of the study aims to understand the molecular function of the protein and its link to translation. The results confirm an evolutionarily conserved role of Clu1 in binding mRNAs for mitochondrial proteins and in interacting with mitochondrial proteins, ribosomal components and polysomes. In addition, the authors claim that binding of Clu1 to RNA is enhanced when mRNAs are trapped in polysomes by treatment with cycloheximide (CHX), leading to the proposal that Clu1 binds mRNAs during active translation.

      Major comments:

      -The claim of Clu1 granule localization next to mitochondria (Figure 3) would be more convincing if any of the experiments were quantified. Especially in the case of panel 3G in Drosophila egg chambers where there are a lot of mitochondria, one wonders whether the closeness to mitochondria is just random. Furthermore, mdv1-signal does not look very convincing, being blurry and not dotty as expected. Thus, the conclusion that Clu1 granules partially colocalize with sites of fission appears premature.

      The claim that Clu/Clu1 granules are often found in close proximity to mitochondria was inferred from observations from multiple analyses from yeast (we looked at hundreds of cells in several different conditions) and flies, where it had already been demonstrated (Cox and Spradling, 2009). We agree that observations of the fly egg chambers are challenging due to the very high density of mitochondria (and other cellular components - see the new analysis of poly(A) mRNAs) in these highly active cells. These considerations motivated us to take the CLEM approach (in addition to investigating the membraneless nature), to gain a much higher resolution view of the localisation of the granules. This analysis unequivocally showed that the Clu granules were exactly juxtaposed to several mitochondria. It is noteworthy that even in the TEM images shown, there is ample cytoplasm in which the Clu granule could be located if the association with mitochondria was coincidental and all granules had mitochondria in close proximity.

      Regarding the possible coincidence of Clu1 with mitochondrial fission factors, as mentioned above for Reviewer 1, we tried a number of different approaches to improve this analysis but, ultimately, we were not able to generate sufficiently robust data to be convincing so have decided to remove this from the manuscript. Since this was an adjunct observation and not a major part of the story and has been discussed by others relating to fly Clu (PMID: 35332133), removal from the current manuscript does not impact the key conclusions of the study.

      Based on the ability of 1,6-hexanediol to dissolve the granules (Figure 4), the authors conclude that: "Clu1 foci have membraneless nature". As they correctly state in the discussion, treatment with 1,6-hexanediol can have other effects. I suggest being more cautious with the conclusions or adding additional experiments. Are the granules dynamic when analysed by FRAP? Do they fuse?

      The inference that the Clu1 granules are membraneless organelles was not solely based on the observation that they disassemble upon 1,6-hexanediol treatment but was made in conjunction with the CLEM analysis that showed unambiguously that Clu granules are not associated with any detectable membrane, which is strong evidence that these granules are membraneless in nature. Indeed, as the reviewer mentioned, we are cautious in concluding they have been formed by liquid-liquid phase separation (LLPS) and we do acknowledge that 1,6-hexanediol can have other effects in cells. Nevertheless, following the reviewer's suggestion we have analysed Clu1 granule dynamics using FRAP, even though we are aware that FRAP is also not a definitive proof that a structure is formed by LLPS. The FRAP analysis, shown in new Figure 4C, D, revealed approximately 50% recovery over 10 min imaging timeframe. As discussed on page 13, this indicates a dynamic nature of these granules, but this dynamism can vary widely between different types of granules and even different proteins within the same granule. Further work is warranted to fully investigate the dynamic nature of Clu/Clu1 granule components.
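
      For readers less familiar with FRAP, the sketch below shows one common way a recovery percentage of this kind can be expressed, as a mobile fraction computed from normalised intensities. It is an assumption about a generic calculation, not a description of the quantification used for Figure 4C, D; the function name and example intensities are hypothetical.

```python
def mobile_fraction(pre_bleach, post_bleach, plateau):
    """Fraction of bleached fluorescence that recovers.

    pre_bleach:  mean granule intensity before bleaching
    post_bleach: intensity immediately after bleaching
    plateau:     intensity at the end of the imaging window
    """
    if pre_bleach <= post_bleach:
        raise ValueError("pre-bleach intensity must exceed post-bleach intensity")
    return (plateau - post_bleach) / (pre_bleach - post_bleach)


# Hypothetical normalised intensities giving 50% recovery.
print(mobile_fraction(pre_bleach=1.0, post_bleach=0.25, plateau=0.625))  # 0.5
```

      In practice the intensities would typically be background-subtracted and corrected for acquisition photobleaching before this ratio is taken.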

      The experiment in which the granules are dissolved by treatment with RNAse is very interesting. However, per se this does not directly demonstrate that the granules contain mRNA. To state this the author should perform FISH experiments for example using a probe to detect poly-A.

      We thank the reviewer for this suggestion. We have performed poly(dT) FISH in egg chambers. Initial analysis showed that the fluorescence was diffuse and widely distributed, as expected for these highly active cells, but with no specific accumulation in Clu granules. Interestingly, we observed that treatment with RNase A, which we initially used to demonstrate probe specificity, revealed an enrichment of poly(A) RNAs in Clu granules. So, while treating the live egg chambers with RNase revealed that granules depend on RNA for their stability, treating fixed egg chambers revealed more directly the presence of RNAs in granules.

      These results have been added to the manuscript and shown in Fig. 5 and described on page 7.

      The authors show that puromycin prevents the granule formation before insulin addition in the fly. Are these results (upon RNAse treatment and puromycin treatment) recapitulated in the yeast system? The authors conclude that Clu1 formation depends on mRNAs being engaged in translation, but never show that the granules are site of active translation. More experiments in this direction (for example using puro-PLA of specific mRNAs) are missing and would clearly improve the manuscript.

      Thanks for this very interesting consideration. We agree that we have not formally shown that the Clu1 granules are sites of active translation. A major limitation to addressing this is that puromycin is not able to penetrate the yeast cell wall, so cannot be used for analysis of intact cells as would be needed in this case. We agree that this would be a welcome addition but is beyond the scope of the current study.

      The interactome of Clu1-neighbouring proteins (Figure 6) is interesting and a valuable addition to data in other organisms. I am wondering why the authors have not used as a control a cytosolic BirA-GFP, which would have been the right control for this experiment, especially since GFP tends to form aggregates.

      We thank the reviewer for this comment. With hindsight, we agree that a cytosolic BirA-GFP would have been a better control. However, we are confident in our results for the following reasons:

      1. The levels of GFP obtained from Clu1-GFP expression are low, and under these conditions, we observed no evidence of GFP aggregation. Even in experiments where GFP is overexpressed from a high-copy 2µ plasmid under a strong promoter, we do not detect aggregation. Aggregation is not a concern in our experimental setup.
      2. Our conclusions are not solely based on the interactome analysis (BioID) but are supported by complementary findings. Specifically, several proteins identified in the proximity to Clu1 in the BioID analysis showed reduced translation in Clu1 knockout cells, and their corresponding mRNAs were found to interact with Clu1 during translation. These complementary results from independent techniques provide strong evidence for Clu1's role and validate the findings of the interactome analysis. Given this robust and complementary dataset, having BirA as a control strain was sufficient to validate our conclusions.

      Figure 7B: The log 2 FC for the changed proteins are in many cases small, implying that the difference in translation for these proteins is not so large. For this reason, it is relevant to know how was the statistical significance calculated for these MS measurements. In the supplementary Tables and in Fig 7B, a p value is indicated and it is not clear if this is a simple p value or an adjusted p value (FDR or q value). If not shown, I recommend showing the adjusted p value, so that one can have an idea of the solidity of the data and the claim. Again, this is an important piece of evidence, since the authors base on this experiment the conclusion that Clu1 controls translation of these mRNAs.

      Thanks for this comment. We have now included the q-value in the supplementary table.
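
      To make the distinction between a raw p value and an adjusted value concrete, the following minimal sketch applies the Benjamini-Hochberg procedure to a list of p values. This is an illustrative assumption: the procedure used to generate the q-values in the supplementary table may differ, and the function name and example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p values for four proteins.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

      An adjusted value is never smaller than the corresponding raw p value, which is why small fold changes that pass a raw threshold may not pass after correction.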

      Minor comments:

      -Figure 1: The change in Clu1 localisation in post-diauxic phase or upon changing of the medium is evident from the images shown. However, it seems that the experiment has been performed only once (the same for Figure 2). Is this the case? An important information would be to show the expression levels of Clu1-GFP in the different conditions. Does recruitment of CLU1 to granules associate to increased expression levels?

      The experiments shown in Figures 1 and 2 were performed independently at least three times. The numbers shown are indicative values from one of the replicate experiments. This has now been added to the figure legends.

      We agree that providing the information regarding the expression levels of Clu1-GFP is important to address whether the recruitment of Clu1 to granules is associated with changes in its abundance. To this end, we have performed an additional experiment to quantify Clu1-GFP levels under the conditions where Clu1 is diffuse (log growth phase in glucose-containing media) and when Clu1 is in granules (sodium azide treatment).

      These results have been added to the manuscript and shown in Fig. S2 and described on page 4.

      Figure 2 A-B. The authors claim that the only stressor capable of inducing Clu1 granules formation alone is inhibition of complex IV activity via sodium azide treatment. Other mitochondrial stresses like CCCP treatment or OA treatment are efficient only when combined to starvation. It should be mentioned that sodium azide treatment is not only capable of inhibiting complex IV but has also uncoupling function.

      Thanks for this comment. We have now mentioned this (p4).

      Figure 2 D-E: investigation of colocalization with Bre5 would help to understand how similar the yeast Clu1 granules are compared to the mammalian CLUH granules (Pla-Martin et al., 2020).

      This is an interesting suggestion and one that we also considered, but with limited time and resources we were not able to pursue this line of inquiry as well.

      Figure 8. This figure summarizes one of the most novel pieces of data about Clu1, the interaction with mRNAs via the ribosome. The way how panel A-C are represented is however a bit misleading. The Y axis in Figure B and C has the same amplitude as the one in A. Therefore, potential differences in Clu1-RNA pull-down in presence of EDTA or puromycin cannot be assessed. It is true that in presence of CHX there is much more pulled down RNA, but one cannot judge from these panels if there is any difference between Clu1 targets and controls also in the other conditions. The graphs should be modified and statistics added.

      We appreciate the reviewer's feedback regarding the presentation of the RIP-qPCR data in Fig. 8. Based on the comments, we have revised how the results are represented, improved the normalisation of the data and added statistical analysis.

      First, it is worth clarifying that the presentation of the original charts was done specifically to highlight the huge differences between RNA-pulldown in CHX versus disrupted ribosomes. It is also important to note that these RIP experiments were performed simultaneously under identical experimental conditions, so any differences lie in the treatments applied. To improve cross-comparison between treatments we have now incorporated an additional normalisation step. We normalised the enrichment levels of each mRNA tested against the non-specific binding observed with the negative control housekeeping genes (UBC6 and TAF10). This ensures that differences in bead loss or other technical variations are accounted for.

      We now show the comparison of the six positive hits and two negative controls normalised as described above, on the same scale (Fig. 8A). We now also present the relative effects of the three conditions (CHX, EDTA, and puromycin) within the same graph for each mRNA tested (Fig. 8B). This format enables direct comparison of Clu1 target mRNA enrichment and two negative controls across treatments, which is the relevant comparison for testing the hypothesis of ribosome-dependent interactions. We have adjusted the Y-axis scaling for each mRNA, as requested by the reviewer, and added statistical comparisons. For clarity, the data shown in Fig. 8A are also represented in the panels of Fig. 8B (CHX). We have amended the text appropriately and hope that these changes improve the comparisons between treatments and more readily demonstrate that Clu1 target enrichment is lost upon ribosome disassembly, either by EDTA or by puromycin.
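
      The additional normalisation step can be sketched as follows. This is a simplified illustration, assuming that each mRNA's enrichment is divided by the arithmetic mean enrichment of the two negative controls; the exact averaging used in the manuscript is not specified here, and the function name, the "target_mRNA" label and the example values are hypothetical.

```python
def normalise_to_controls(enrichment, controls=("UBC6", "TAF10")):
    """Divide each mRNA's enrichment by the mean enrichment of the controls.

    enrichment: dict mapping mRNA name to its RIP enrichment (IP over input)
    controls:   names of the negative control mRNAs used as background
    """
    background = sum(enrichment[gene] for gene in controls) / len(controls)
    if background <= 0:
        raise ValueError("control enrichment must be positive")
    return {gene: value / background for gene, value in enrichment.items()}


# Hypothetical enrichment values from one treatment (for example CHX).
example = {"target_mRNA": 8.0, "UBC6": 1.0, "TAF10": 3.0}
print(normalise_to_controls(example))
# {'target_mRNA': 4.0, 'UBC6': 0.5, 'TAF10': 1.5}
```

      Applying the same function separately to the CHX, EDTA and puromycin samples places all three treatments on a common scale relative to non-specific binding.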

      In addition, RNAse treatment in panel L does not seem to have really worked.

      These samples were cross-linked prior to treatment to preserve the transient interaction of Clu1 with the ribosome, hence, the normal dramatic effect of RNase to collapse the polysomes is much less pronounced. Nevertheless, the purpose of this experiment was to monitor whether Clu1 co-migrated with ribosomes, which it does.

      The authors should cite Vornlocher et al. (PMID: 10358023), who were the first to implicate Clu1 (Tif31) with translation.

      Thank you for this prompt. We have now added a comment on this in the Discussion (page 13).

      References 21 and 22 are the same.

      Thanks. This has been fixed.

      Reviewer #3 (Significance (Required)):

      The data reported in this manuscript are valuable, because they confirm an evolutionarily conserved role of Clu1 in binding mRNAs for mitochondrial proteins and regulating their translation. It is also interesting that in yeast, similar to Drosophila and mammalian cells, Clu1 can form granular structures upon metabolic rewiring. A limitation of the study is that direct experiments to support the claim that Clu1 concentrates ribosomes engaged in translation are not provided. Furthermore, it is not clear what the functional role of the Clu1 granules is, since the proximity interactome and the binding of Clu1 to the polysomes are not affected by treatments that dissolve or stimulate granule formation.

      The study is of interest to a general cell biology audience.