5,827 Matching Annotations
  1. Aug 2022
    1. Author Response

      We thank the reviewers for their thoughtful and constructive comments which have helped us improve our manuscript. In our revised manuscript, we will respond to three main weaknesses:

      1. We will address the inconsistency in the experimental design across the behavior and the transcription experiments by repeating the behavior with an experimental timeline that more exactly matches that of the animals used in transcriptional studies;

      2. We will further validate and justify our use of TRAP and our focus on the NAc as the sole brain region of investigation;

      3. We will revise the language throughout the manuscript, especially in the discussion, to reduce anthropomorphizing of our results and interpretations. Below we have provided responses to specific concerns articulated by each reviewer.

      Reviewer #1 (Public Review):

      The monogamous vole provides unique opportunities to study the neural basis of pair bonding and this study exploits that opportunity in a novel way. Focusing on the nucleus accumbens, the authors conduct RNA-Seq to characterize the transcriptome in same-sex and opposite-sex pairs when bonded, when separated for a short time and when separated for a long time at which point the literature has in the past demonstrated the willingness to form a new bond. They determine that the transcriptome of pair bonding includes a preponderance of glial-associated gene changes and that it degrades with long-term separation. To the latter point, they then conduct a neuron enriching trap schema to find those genes subject to change specifically in neurons.

      The strength of the report is the clever experimental design, the unusual animal model, and the comparisons of same-sex and opposite-sex pairs and long-term and short-term separations.

      The weakness is that the behavioral changes observed are not what was expected based on prior work and are relatively modest, providing a disconnect between the outcome and the more dramatic transcriptional changes. A second weakness is the focus on the nucleus accumbens which is a brain region most closely associated with reward. While pair bonding may be rewarding, that component may be independent of the memory of a partner or the willingness to partner anew. Lastly, there is no clear connection between the identified transcriptome and either the formation or degradation of the pair bond.

      We thank the reviewer for noting the unique strengths of using prairie voles to investigate this specific question and for praising our experimental design, which compares opposite-sex and same-sex paired males at each time point to disentangle the effects of pair bonding from general social affiliation and isolation.

      Reviewers #1 and #3 noted the mismatch between the behavioral and transcriptional responses. Specifically, we found little evidence of bond dissolution following long term separation despite substantial erosion of the pair bond transcriptional signature. They further note that the experimental design employed to assess behavior and transcription differed, which may have contributed to the apparent mismatch. Importantly, our initial behavioral assessment as presented in Figure 1 of the manuscript had two strengths. It measured intra-animal changes in behavior over time and minimized the number of animals required. However, we agree with the reviewers, and we are currently repeating the behavior experiments to match the transcription experiments. Specifically, separated partners will be kept in separate colony rooms to ensure no possible access to partner-associated sensory cues (visual, auditory, olfactory), and we will use separate cohorts of animals for short- and long-term separation. This design avoids partner re-introduction during the short-term partner preference test. The results of this work will be informative regardless of outcome. If we observe a dissolution of pair bond behaviors, it indicates that re-exposure to a partner after a short, 48-hour separation has a powerful effect on bond duration following separation. If we do not observe any change in pair bond behaviors following separation, it would confirm that pair bond behaviors are more resistant to erosion than are transcriptional signatures of pair bonding.

      We have focused on the NAc because it is a critical hub that is engaged upon attachment formation and is implicated in loss processing. Specifically, studies have shown that blockade of neuromodulatory signaling (i.e. oxytocin and dopamine) in this region impairs bond formation and can lead to bond dissolution. Our group and others have demonstrated that plasticity within this region - in patterns of neuronal activity and in synaptic response to oxytocin - are associated with bond formation and maturation (1, 2). And literature on drugs of abuse has demonstrated an important role for the NAc in encoding of reward associations (3), which ultimately underlies partner preference. Additionally, in human neuroimaging studies, Prolonged Grief Disorder is associated with an enhanced signal in the NAc when viewing images of the lost loved one, suggesting that normal resolution of grief corresponds with a decrease in NAc activity elicited by reminders of the lost loved one (4). Thus, our focus on this region is well supported. Nonetheless, we recognize that the NAc does not act in a vacuum, and the efferent and afferent connectivity of different NAc cell types is well delineated, paving the way for future work (5, 6).

      Additionally, we agree with the reviewer that pair bonding behavior is multifaceted and comprised of several discrete behaviors that are not dissociable in the partner preference test. Partner-associated reward and partner memory may be independently encoded, and disruption of either process would manifest as a decrease or lack of partner preference. In our complete response to reviewers and revision of the manuscript, we will address this point more thoroughly. Finally, we interpret the reviewer’s last comment to be a request for functional manipulations to validate that the predicted transcriptional changes have a behavioral effect. This is beyond the scope of this manuscript but an active area of future research.

      Reviewer #2 (Public Review):

      The goal of this study is to understand the molecular mechanisms by which pair bonded animals recover following the loss of a partner.

      Strengths of this work include: (1) The organism - a novel model for studying pair bonding and loss; (2) The integrative nature of the study; it integrates behavior and brain gene expression RNASeq data and vTRAP; (3) The important and understudied question about how pair bonded animals recover from loss; (4) The thorough and careful analysis of highly multidimensional and complex datasets

      Weaknesses include: (1) the major comparison is between same vs opposite sex housed pairs. This design controls for social effects somewhat, but the two treatment groups differ not just with respect to whether or not they are pair bonded, but also in whether or not they had associated with a male or female. Differences between the treatments could reflect pair bonding, or perhaps something about the sex of the partner. It would be useful to have an additional control group, or data on the behavior of individuals within both types of pairs while they are co-housed. Were transcriptomic effects more detectable in pairs that were more bonded together behaviorally? That would suggest that the gene expression signatures really reflect something about the bond rather than other confounds, for example; (2) The vTRAP method is fancy but what is it really adding? (3) The authors interpret the transcriptomic differences as promoting the ability to form a new bond but there are probably other processes that are contributing to the differences in gene expression. Some of the differentially expressed genes could be involved in promoting a new pair bond, but there could also be a signature of the memory of the identity of the partner, the signature of the bond itself, etc. (4) Some of the interpretations go a little too far, especially in terms of anthropomorphism. The impact of the work includes further development of voles as an important model for studying social behavior and insights into the molecular processes important for recovering from the loss of a partner.

      We thank the reviewer for recognizing the strength of our study organism and experimental techniques as well as rigorous analyses to answer an important question about adapting to partner loss.

      Regarding the noted weaknesses:

      (1) We chose to compare opposite sex pair bonds to same sex affiliative relationships as this is the standard within the field, and we note that reviewers 1 and 3 found this to be a strength of our study design (7–11). Peer relationships in prairie voles are difficult to distinguish behaviorally from those of opposite-sex pairs (Fig 1) because both same and opposite-sex paired voles show selective preference for their pairmate and selective agression towards other voles (7). As such, the critical feature that makes pair bonding different is mating, which requires an opposite sex partner in voles, and our experiments are optimally designed to identify the longitudinal transcriptional changes that result from mating and cohabitating with an opposite-sex partner. In order to best match our two groups, only animals with a preference score >50% were included in the transcriptional experiment, ensuring that we were comparing animals with an affiliative preference for their partner - whether that individual was the same or opposite sex.

      We interpret the reviewers comment to be that they want us to compare opposite-sex-paired animals with and without bonds. This can be achieved two ways. First, we can compare to a promiscuous species, such as meadow voles, which will mate and cohabitate without forming bonds, but this is confounded by species differences in transcription that may exist independent of bonding. Second, we can compare bonded voles to the small subset that do not form bonds. While intriguing, this is experimentally challenging as only ~10-20% of males fail to form a bond when paired with a sexually receptive female (in the current study, 16% had a preference < 50% after two weeks of pairing, which is consistent with prior reports - (9–11)). Put simply, we would need to pair hundreds of voles to opportunistically collect a sufficient number of non-bonders for transcriptional assessment across our experimental conditions. While we hope to eventually be able to do such an experiment, litter sizes, consideration of animal welfare, and other constraints make this largely untenable at present.

      Data on the behavior of individuals within both types of pairs while they are co-housed is already provided via results of a partner preference test performed after 2 weeks of co-housing and prior to re-housing or separation (Fig 2B and 3B). We find the reviewer’s suggestion of finding a relationship between the transcriptional signature and the pair bonding strength an interesting question, and we undertook a preliminary analysis examining whether animals with different pair bond strength aggregate on a PCA analysis of gene expression. There was no apparent relationship, although we are performing additional analyses such as exploratory factor analysis. The fact that we have not found a relationship between the baseline partner preference and the transcription in these initial analyses is perhaps unsurprising. First, bonding may require some threshold change in gene expression, with bond strength reflected in non-genomic information, such as synapse formation or strengthening, or axonal ensheathment. Second, we only performed transcriptional analyses on animals with a baseline partner preference >50%; we would not necessarily expect a dissociation given the uniformly strong bonds across these animals.

      (2) We feel that inclusion of TRAP adds substantially to this manuscript and to our understanding of the neuromolecular underpinnings of bonding and loss in the NAc. The value of this experiment is twofold. As noted by Reviewer 3, “the TRAP approach in prairie voles is novel and will provide a great resource to the research community.” The prairie vole community has just developed its first transgenic Cre lines, which could be paired with vTRAP to query bond-associated gene expression changes exclusively in Cre-expressing neurons (15). Second, we noticed a puzzle in our tissue-level data. The majority of cells in the NAc are neurons (16, 17), and the vast majority of pair bonding studies of this region have focused on neuronal phenotypes, but our transcriptional signatures were linked to changes in glial populations. Ultimately, changes in glia are likely to act via their interactions with neurons, and vTRAP enables us to query the neuronal transcriptional changes within our data. Supporting that this provides novel insights into our datasets, when we cluster transcripts based on their expression profiles following short and long-term separation, we predict different GO terms from the tissue level and neuronally-enriched gene sets. For instance, the GO terms resulting from cluster 2 for neuronal genes (Fig 4) includes “response to amphetamine” within the top 10 results, but the same cluster of genes from tissue level sequencing predicts this GO term as the 34th result.

      (3) We agree with the reviewer that adapting to partner loss is a multifaceted process that likely engages numerous biological and emotional systems in voles. The explanation we offer for the transcriptional changes during loss is based on previous work in the field and is one possible interpretation. We will expand on this point during revision of the manuscript.

      (4) We thank the reviewer for encouraging us to be objective with our interpretations. We will address this comment during revision of the manuscript.

      Finally, we thank the reviewer for recognizing the value of our study for not only the field of voles but the bereavement field more broadly.

      Reviewer #3 (Public Review):

      In this manuscript, the authors investigate the behavioral and brain transcriptional alterations associated with short- and long-term partner separation in the monogamous male prairie vole. Male prairie voles continue to show affiliative behavior after short- (2 days) and long-term (4-weeks) partner separation, with similar effects for same and opposite-sex pairs. However, the transcriptional signature in the nucleus accumbens exhibits marked alterations after long-term separation.

      Strengths:

      1) A key strength of this manuscript is its use of the monogamous prairie vole to investigate transcriptional alterations associated with pair bonding and subsequent pair separation. This sort of behavior cannot be investigated in typical rodent model systems (e.g., mice, rats), and the choice of using prairie voles allows for dissection of potential mechanisms of social bonding with relevance to partner loss in humans.

      2) Investigation of behavioral measures and transcriptional alterations at both short- and long-term time points after pairing and separation is a strength of the manuscript. These time points were selected based on previous studies in laboratory and wild prairie voles related to the time it takes to form a pair bond and for the male prairie vole to leave the nest after the loss of the female pair. The datasets generated will be of great use to the scientific community.

      3) The authors investigate the behavior and transcriptional profiles after same-sex as well as opposite-sex pairing. This is considered a thoughtful decision on the authors' part which allows them to tease apart the effects of same vs. opposite sex.

      4) The use of numerous behavioral measures to assess both affiliative and aggressive behaviors is a strength of the approach.

      5) The authors use many biostatistical approaches (e.g., RRHO, WGCNA, Enrichr) to probe the transcriptomics data. These approaches allow the authors to move beyond simply assessing transcriptional profiles separately, but to look for patterns that are similar or different across datasets.

      6) The authors use rigorous statistical methods to assess behavioral measures.

      7) The TRAP approach in prairie voles is novel and will provide a great resource to the research community.

      Weaknesses:

      1) The methods state that prairie voles were treated differently in the behavioral and transcriptomics studies. Specifically, for the separation in the behavioral studies, prairie voles were separated by sight, but not necessarily by the smell from partners (i.e., partners were kept ~1 foot apart). However, prairie voles in the transcriptomics studies were separated by both sight and smell (i.e., partners were sacrificed after separation). Thus, it is possible that the lack of degradation of pair bond-related behavior after long-term separation might be due to these prairie voles being able to smell their partners after separation. This is considered a moderate flaw in the design of the studies which limits the integration of results between behavior and transcriptomics. This might be why the authors do not see a strong behavioral degradation of pair bond-related behavior after long-term separation but do see a strong transcriptional signature.

      2) While RRHO is helpful to assess overall patterns of transcriptional signatures across datasets, its utility for determining the exact transcripts is limited. This is because of how RRHO determines the overlapping transcripts for its Venn diagram feature (by taking the point where the p-value is most significant and taking the list to the outside corner of that quadrant).

      3) TRAP expression was verified in only one animal. Thus, the approach has not been appropriately confirmed.

      We thank the reviewer for their thoughtful comments on the innovative strengths and advantages of our manuscript.

      Regarding the noted weaknesses:

      (1) Please see our response to Reviewer #1, who shares your concerns.

      (2) We agree that RRHO is particularly useful for assessment of overall patterns. We interpret the Reviewer’s comment to mean that when extracting the overlapping gene lists from an RRHO quadrant for downstream analyses, we should filter that list for genes whose differential expression passes a nominal p-value cutoff to reduce the amount of biologically insignificant conclusions we are drawing from the RRHO data. Our initial analyses used just such a threshold-based approach by identifying GO terms via differentially expressed genes of the combined pair bond (Figure 2) using both p-value and log2Fold cutoffs. This analysis revealed a number of terms associated with glial cell proliferation, differentiation, and function (Fig 2H). Such processes occur over a time frame of days to weeks, with different phases of differentiation characterized by different gene expression profiles. To explore this further, we used the genes in the UU and DD RRHO quadrants without implementing a p-value cutoff to see if additional genes associated with these GO-identified pathways may be showing subtle but consistent directional changes (Fig 3). Importantly, we only use the overlapping RRHO gene lists to determine how previously defined biological processes via DEG-predicted GO terms change across conditions; we are not using the RRHO gene lists to generate new GO terms. This allowed us to look for patterns within the identified pathways that may give insight into how transcription might be affecting gliogenesis. This analysis was similarly suggested to us from other experienced users of RRHO plots (see Acknowledgements). There are also several published studies that use RRHO UU and DD quadrant overlap (18–22).

      (3) Most labs rarely confirm Cre-dependence of vectors in more than one or two animals as the results, including those shown in Fig S9A, are typically definitive (i.e. no expression in the absence of Cre, expression in the presence of Cre). In addition to the images shown in figure S9A, we used fluorescent guided dissection to harvest tissue/mRNA, serving as an additional visual confirmation of RPL10-GFP expression in the animals used to generate Figure 4. Since submission, we have also confirmed that this vector also expresses in rats when Cre-recombinase is present. However, prior to resubmission, we will perform additional surgeries to confirm that TRAP is only expressed in the presence of Cre-recombinase.

      References

      1. J. L. Scribner, E. A. Vance, D. S. W. Protter, W. M. Sheeran, E. Saslow, R. T. Cameron, E. M. Klein, J. C. Jimenez, M. A. Kheirbek, Z. R. Donaldson, A neuronal signature for monogamous reunion. Proceedings of the National Academy of Sciences. 117, 11076–11084 (2020).
      2. A. M. Borie, S. Agezo, P. Lunsford, A. J. Boender, J.-D. Guo, H. Zhu, G. J. Berman, L. J. Young, R. C. Liu, Social experience alters oxytocinergic modulation in the nucleus accumbens of female prairie voles. Current Biology. 32, 1026-1037.e4 (2022).
      3. E. S. Calipari, R. C. Bagot, I. Purushothaman, T. J. Davidson, J. T. Yorgason, C. J. Peña, D. M. Walker, S. T. Pirpinias, K. G. Guise, C. Ramakrishnan, K. Deisseroth, E. J. Nestler, In vivo imaging identifies temporal signature of D1 and D2 medium spiny neurons in cocaine reward. Proc. Natl. Acad. Sci. U.S.A. 113, 2726–2731 (2016).
      4. M.-F. O’Connor, D. K. Wellisch, A. L. Stanton, N. I. Eisenberger, M. R. Irwin, M. D. Lieberman, Craving love? Enduring grief activates brain’s reward center. NeuroImage. 42, 969–972 (2008).
      5. T. Hikida, S. Yao, T. Macpherson, A. Fukakusa, M. Morita, H. Kimura, K. Hirai, T. Ando, H. Toyoshiba, A. Sawa, Nucleus accumbens pathways control cell-specific gene expression in the medial prefrontal cortex. Sci Rep. 10, 1838 (2020).
      6. C. Baimel, L. M. McGarry, A. G. Carter, The Projection Targets of Medium Spiny Neurons Govern Cocaine-Evoked Synaptic Plasticity in the Nucleus Accumbens. Cell Reports. 28, 2256-2263.e3 (2019).
      7. N. S. Lee, N. L. Goodwin, K. E. Freitas, A. K. Beery, Affiliation, aggression, and selectivity of peer relationships in meadow and prairie voles. Frontiers in Behavioral Neuroscience. 13 (2019), doi:10.3389/fnbeh.2019.00052.
      8. O. J. Bosch, H. P. Nair, T. H. Ahern, I. D. Neumann, L. J. Young, The CRF System Mediates Increased Passive Stress-Coping Behavior Following the Loss of a Bonded Partner in a Monogamous Rodent. Neuropsychopharmacology. 34, 1406–1415 (2009).
      9. O. J. Bosch, J. Dabrowska, M. E. Modi, Z. V. Johnson, A. C. Keebaugh, C. E. Barrett, T. H. Ahern, J. Guo, V. Grinevich, D. G. Rainnie, I. D. Neumann, L. J. Young, Oxytocin in the nucleus accumbens shell reverses CRFR2-evoked passive stress-coping after partner loss in monogamous male prairie voles. Psychoneuroendocrinology. 64, 66–78 (2016).
      10. A. J. Grippo, B. S. Cushing, C. S. Carter, Depression-like behavior and stressor-induced neuroendocrine activation in female prairie voles exposed to chronic social isolation. Psychosomatic Medicine. 69, 149–157 (2007).
      11. A. J. Grippo, D. Gerena, J. Huang, N. Kumar, M. Shah, R. Ughreja, C. Sue Carter, Social isolation induces behavioral and neuroendocrine disturbances relevant to depression in female and male prairie voles. Psychoneuroendocrinology (2007), doi:10.1016/j.psyneuen.2007.07.004.
      12. J. R. WILLIAMS, C. S. CARTER, T. INSEL, Partner Preference Development in Female Prairie Voles Is Facilitated by Mating or the Central Infusion of Oxytocin. Annals of the New York Academy of Sciences. 652, 487–489 (1992).
      13. C. Sue Carter, A. Courtney Devries, L. L. Getz, Physiological substrates of mammalian monogamy: The prairie vole model. Neuroscience and Biobehavioral Reviews. 19, 303–314 (1995).
      14. L. L. Getz, C. S. Carter, L. Gavish, The mating system of the prairie vole, Microtus ochrogaster: Field and laboratory evidence for pair-bonding. Behavioral Ecology and Sociobiology. 8, 189–194 (1981).
      15. K. Horie, K. Inoue, S. Suzuki, S. Adachi, S. Yada, T. Hirayama, S. Hidema, L. J. Young, K. Nishimori, Oxytocin receptor knockout prairie voles generated by CRISPR/Cas9 editing show reduced preference for social novelty and exaggerated repetitive behaviors. Horm Behav. 111, 60–69 (2019).
      16. K. E. Savell, J. J. Tuscher, M. E. Zipperly, C. G. Duke, R. A. Phillips, A. J. Bauman, S. Thukral, F. A. Sultan, N. A. Goska, L. Ianov, J. J. Day, A dopamine-induced gene expression signature regulates neuronal function and cocaine response. Sci Adv. 6, eaba4221 (2020).
      17. D. Avey, S. Sankararaman, A. K. Y. Yim, R. Barve, J. Milbrandt, R. D. Mitra, Single-Cell RNA-Seq Uncovers a Robust Transcriptional Response to Morphine by Glia. Cell Reports. 24, 3619-3629.e4 (2018).
      18. S. L. Fulton, S. Mitra, A. E. Lepack, J. A. Martin, A. F. Stewart, J. Converse, M. Hochstetler, D. M. Dietz, I. Maze, Histone H3 dopaminylation in ventral tegmental area underlies heroin-induced transcriptional and behavioral plasticity in male rats. Neuropsychopharmacology. 47, 1776 (2022).
      19. S. G. Caradonna, T.-Y. Zhang, N. O’Toole, M.-J. Shen, H. Khalil, N. R. Einhorn, X. Wen, C. Parent, F. S. Lee, H. Akil, M. J. Meaney, B. S. McEwen, J. Marrocco, Genomic modules and intramodular network concordance in susceptible and resilient male mice across models of stress. Neuropsychopharmacol. 47, 987–999 (2022).
      20. J. S. Wang, T. Kamath, C. M. Mazur, F. Mirzamohammadi, D. Rotter, H. Hojo, C. D. Castro, N. Tokavanich, R. Patel, N. Govea, T. Enishi, Y. Wu, J. da Silva Martins, M. Bruce, D. J. Brooks, M. L. Bouxsein, D. Tokarz, C. P. Lin, A. Abdul, E. Z. Macosko, M. Fiscaletti, C. F. Munns, P. Ryder, M. Kost-Alimova, P. Byrne, B. Cimini, M. Fujiwara, H. M. Kronenberg, M. N. Wein, Control of osteocyte dendrite formation by Sp7 and its target gene osteocrin. Nat Commun. 12, 6271 (2021).
      21. D. A. Gallegos, M. Minto, F. Liu, M. F. Hazlett, S. Aryana Yousefzadeh, L. C. Bartelt, A. E. West, Cell-type specific transcriptional adaptations of nucleus accumbens interneurons to amphetamine. Mol Psychiatry, 1–15 (2022).
      22. B. J. Hilton, A. Husch, B. Schaffran, T. Lin, E. R. Burnside, S. Dupraz, M. Schelski, J. Kim, J. A. Müller, S. Schoch, C. Imig, N. Brose, F. Bradke, An active vesicle priming machinery suppresses axon regeneration upon adult CNS injury. Neuron. 110, 51-69.e7 (2022).
    1. Author Response

      Reviewer #1 (Public Review):

      In this paper the authors present variations in carbon oxidation state and hydration state in proteomes available in RefSeq. Then they use this information to predict community level proteomes, and their corresponding carbon oxidation states and hydration states, based on available 16S rRNA gene sequences from selected previously published datasets. When combining this with information about the environmental setting of the individual samples analyzed, the authors are able to demonstrate connections between redox conditions and proteomic carbon oxidation state and hydration state. Furthermore, they explore how individual taxonomic groups at different taxonomic levels contribute to forming these connections.

      A weakness with the study is that the described environmental proteomes are inferred from 16S rRNA gene sequence data and not observed directly. However, there is good reason to believe that the conclusions drawn in the paper are valid.

      The study sheds light on microbial adaptations on the genome level that so far have received relatively little attention. The paper is also interesting from an ecological perspective regarding the general question of how microbial communities are shaped by environmental settings.

      To attempt to bring more attention to environmental constraints, a plot (Figure 4E in the published paper) was redrawn to more clearly show how carbon oxidation state of estimated community proteomes not only is lower in more reducing conditions for a variety of environments but also shows the largest differences for hydrothermal systems and shale-gas wells. This finding is discussed in terms of geological sources of reductants and provides new evidence that the chemical makeup of microbial communities reflects their geological context.

      Reviewer #2 (Public Review):

      This manuscript mainly investigated the carbon oxidation and stoichiometric hydration states of the inferred community proteomes according to 16S rRNA gene compositions from the published datasets and explored their potential associations with environmental parameters such as redox gradients, oxygen concentrations and salinity.

      Predictions of the carbon oxidation and stoichiometric hydration states on the basis of microbial proteomes can provide some meaningful information for disentangling microbial response to environmental changes. As we know, some genes in microbial genomes are not expressed and transformed to proteins. Therefore, such gene redundancy in genomes may lead to bias in predicting the carbon oxidation and stoichiometric hydration states.

      Our study uses available data sources to identify informative differences of elemental compositions of proteomes predicted from genomes. There are numerous examples in the literature of using protein sequences predicted from genomes to make comparisons of amino acid composition (for example, in eLife: https://doi.org/10.7554/eLife.57347), so it would appear to be acceptable with some level of uncertainty to use genomic data to make comparisons between (amino acid or elemental) compositions of predicted proteomes.

      Furthermore, this study compiled many 16S rRNA gene datasets from previous studies. Different primer sets were applied in those studies, and such difference will result in distinct 16S rRNA gene compositions. Accordingly, it is essential to deal with the influence of different primer sets on the 16S rRNA gene compositions among samples. Unfortunately, such information is missing in the method section.

      Primer sets used in the source studies have been added to Table 1 in the published paper. The Discussion was modified to acknowledge limitations in making comparisons *between* datasets obtained using different primers. However, the main results of this study are based on differences of carbon oxidation state (Zc) *within* individual datasets (for instance, along the vertical redox gradients shown in Figure 3).

      The intra-dataset differences of Zc themselves are compared across datasets in Figure 4E. However, it can be expected that the effects of technical variability – including not only primer pairs but also DNA extraction methods, etc. – would tend to be reduced in these inter-dataset comparisons of intra-dataset differences, in contrast to direct inter-dataset comparisons. The index plot at the center of Figure 2 does make a direct inter-dataset comparison, but the outcome is consistent with trends identified in previous analyses of shotgun metagenomic datasets, 16S primers and other technical differences between studies notwithstanding.

      Additionally, the community proteomes in this study were inferred from 16S rRNA genes. The marker gene of 16S rRNA cannot well predict their corresponding genomes, possibly leading to prediction of biased proteomes. Therefore, it should avoid to use 16S rRNA genes for predicting microbial genomes and proteomes.

      Despite the various sources of uncertainty in making estimates of elemental composition of communities from 16S rRNA genes and reference proteomes, comparisons with shotgun metagenomic data support the reliable identification of trends within datasets (Figure 5 in the published paper).

      It seems that the relationships between carbon oxidation states/stoichiometric hydration state and redox/salinity gradients have been reported in previous studies (e.g., Dick et al 2019, 2020, 2021). The finding of this study is not new in comparison with the previously reported.

      The explorations in previous studies of chemical links between communities and environments were based on analysis of shotgun metagenomic data. The ability to reproduce those findings by analyzing 16S rRNA gene sequence data is a new advance in this study.

      Other new results in the published paper are the different magnitudes of Zc differences in various environments (which were not previously documented from shotgun metagenomes; Figure 4E) and the comparison of shotgun metagenome and 16S-based estimates of Zc for the time series of injected fluids in the Marcellus Shale (Figure 5B). The latter results are particularly interesting; the close correspondence for Days 0, 7, and 13 supports the basic reliability of the 16S-based estimates, while the increasing divergence at Days 82 and 328 suggests the onset of some interfering mechanisms (the speculation is made that this could be related to viral lysis and heterotrophic degradation of the released DNA). Also, the published paper presents the first analysis of carbon oxidation state of proteins – from either shotgun metagenome sequences or 16S rRNA-based estimates – for microbial communities in various body sites using data from the Human Microbiome Project (Figure 5D).

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript by de la Vega and colleagues describes Neuroscout, a powerful and easy-to-use online software platform for analyzing data from naturalistic fMRI studies using forward models of stimulus features. Overall, the paper is interesting, clearly written, and describes a tool that will no doubt be of great use to the neuroimaging community. I have just a few suggestions that, if addressed, I believe would strengthen the paper.

      Major comments

      1) How does Neuroscout handle collinearity among predictors for a given stimulus? Does it check for this and/or throw any warnings? In media stimuli that have been adopted for neuroimaging experiments, low-level audiovisual features are not infrequently correlated with mid-level features such as the presence of faces on screen (see Grall & Finn, 2022 for an example involving the Human Connectome Project video clips). How to disentangle correlated features is a frequent concern among researchers working with naturalistic data.

      We agree with the reviewer that collinearity between predictors is one of the biggest challenges for naturalistic data analysis. However, absent consensus on how to best model these data, we find that it is out of scope of the present report to make strong recommendations. Instead, our goal was to design an agnostic platform that would enable users to thoughtfully design statistical models for their particular goal. Papers such as Grall & Finn (2022) will be critical in advancing the debate on how to best analyze and interpret such data.

      We explicitly address this challenge in a new paragraph in the discussion under “Challenges and future directions:

      “A major challenge in the analysis of naturalistic stimuli is the high degree of collinearity between features, as the interpretation of individual features is dependent on co-occurring features. In many cases, controlling for confounding variables is critical for the interpretation of the primary feature— as is evident in our investigation of the relationship between FFA and face perception. However, it can also be argued that in dynamic narrative driven media (i.e. films and movies), the so-called confounds themselves encode information of interest that cannot or should not be cleanly regressed out (Grall & Finn, 2022).[…] Absent a consensus on how to model naturalistic data, we designed Neuroscout to be agnostic to the goals of the user and empower them to construct sensibly designed models through comprehensive model reports. An ongoing goal of the platform—especially as the number of features continues to increase—will be to expand the visualizations and quality control reports to enable users to better understand the predictors and their relationship. For instance, we are developing an interactive visualization of the covariance between all features in Neuroscout that may help users discover relationships between a predictor of interest and potential confounds.” (pg. 11)

      Note we shortened the second paragraph of the discussion by two sentences as it had touched on this subject, and was better addressed separately.

      In addition, we ensured to highlight the covariance structure visualization in the Results section:

      “At this point, users can inspect the model through quality-control reports and interactive visualizations of the design matrix and predictor covariance matrix, iteratively refining models if necessary.” (pg. 3)

      2) On a related note, do the authors and/or software have opinions about whether it is moreappropriate to run several regressions each with a single predictor of interest or to combine all predictors of interest into a single regression? (Or potentially a third, more sophisticated solution involving variance partitioning or another technique to [attempt to] isolate variance attributable to each unique predictor?) Does the answer to this depend on the degree of collinearity among the predictors? Some discussion of this would be helpful, as it is a frequent issue encountered when analyzing naturalistic data.

      This is a very sensitive methodological point, but one for which it is hard to find a univocal answer in the literature. While on the one hand it can be deceptive to model a single feature in isolation (as illustrated by our face perception analyses), more complex models pose different challenges in terms of robust parameter estimation and variance attribution. Resolving these challenges goes beyond the scope of our work, and it is ultimately our goal to provide a flexible tool which will enable these types of investigations, and enable users to take responsibility and provide motivations for methodological choices made using the platform. We touch on Neuroscout’s agnostic philosophy on this issue under “Challenges and future directions” (pg. 11; quoted above).

      However, we also agree that in part the solution to this problem will be methodological. This is particularly true for modeling deep learning based embeddings, which can have hundreds of features in a single model. We are currently working on expanding beyond traditional GLM models in Neuroscout, opening the door to more sophisticated variance partitioning techniques, and more robust parameter estimation in complex models. We highlight current and future efforts to expand Neuroscout’s statistical models in the following paragraph:

      “However, as the number of features continues to grow, a critical future direction for Neuroscout will be to implement statistical models which are optimized to estimate a large number of covarying targets. Of note are regularized encoding models, such as the banded-ridge regression as implemented by the Himalaya package. These models have the additional advantage of implementing feature-space selection and variance partitioning methods, which can deal with the difficult problem of model selection in highly complex feature spaces such as naturalistic stimuli. Such models are particularly useful for modeling high-dimensional embeddings, such as those produced by deep learning models. Many such extractors are already implemented in pliers and we have begun to extract and analyze these data in a prototype workflow that will soon be made widely available. “ (pg. 11)

      3) What the authors refer to as "high-level features" - i.e., visual categories such as buildings,faces, and tools - I would argue are better described as "mid-level features", reserving the term "high-level" for features that are present only in continuous, engaging, narrative or narrative-like stimuli. Examples: emotional tone or valence, suspense, schema for real-world situations, other operationalizations of a narrative arc, etc. After all, as the authors point out, one doesn't need naturalistic paradigms to study brain responses to visual categories or single-word properties. Much of the work that has been done so far with forward models of naturalistic stimuli has been largely confirmatory (e.g., places/scenes still activate PPA even during a rich film as opposed to a serial visual presentation paradigm). This is a good first step, but the promise of naturalistic paradigms is ultimately to go beyond these isolated features toward more holistic models of cognitive and affective processes in context. One challenge is that extracting true high-level features is not easily automated, although the ability to crowdsource human ratings using online data collection has made it feasible to create manual annotations. However, there are still technical challenges associated with collecting continuous-response measurement (CRM) data during a relatively long stimulus from a large number of individuals online. Does Neuroscout have any plans to develop support for collecting CRM data, perhaps through integration with Amazon MTurk and/or Prolific? Just a thought and I am sure there are a number of features under consideration for future development, but it would be fabulous if users could quickly and easily collect CRM data for high-level features on a stimulus that has been uploaded to Neuroscout (and share these data with other end users).

      The reviewer makes a very good point regarding the fact that many so-called “high-level” features are best called “mid-level”. As such, we have changed our use of “high-level” to “mid-level perceptual features” throughout the manuscript.

      “Currently available features include hundreds of predictors coding for both low-level (e.g., brightness, loudness) and mid-level (e.g., object recognition indicators) properties of audiovisual stimuli…” (pg. 3)

      That said, we do believe that as machine learning (and in particular deep learning) models evolve, it will become more feasible to extract higher level features automatically. This has already been shown with transformer language models, which are able to extract higher-level semantic information from natural text. To this end, we have ensured to design our underlying feature extraction platform, pliers, to be easily extensible, to ensure the continued growth of the platform as algorithms evolve. We ensure to highlight this in the Results section ‘Automated annotation of stimuli’:

      “The set of available predictors can be easily expanded through community-driven implementation of new pliers extractors, as well as public repositories of deep learning models, such as HuggingFace and TensorFlowHub. We expect that as machine learning models continue to evolve, it will be possible to automatically extract higher-level features from naturalistic stimuli.” (pg. 3)

      We also ensured to highlight the extensibility of pliers to increasingly power deep learning models in the Discussion by revising this sentence

      “As a result, we have designed Neuroscout and its underlying feature extraction framework pliers to facilitate community-led expansion to novel extractors— made possible by the rapid increase in public repositories of pre-trained deep learning models such as HuggingFace and TensorFlow Hub” (pg. 10)

      As to the point of a potential extension to Neuroscout for easily collecting crowd source stimuli annotations, we are in full agreement that this would be very useful. In fact, this feature was part of the original plan for Neuroscout, but fell out of scope as other features took priority. Although we are unsure if this extension is a short term priority for the Neuroscout team (as it likely would take substantial effort to develop a general purpose extension), the ability to submit user-generated features to the Neuroscout API should make it possible to design a modular extension to Neuroscout to collect such features.

      We mention this possibility briefly in the future directions section:

      “Other important expansions include facilitating analysis execution by directly integrating with cloud-based neuroscience analysis platforms (such as Brainlife.io) and facilitating the collection of higher-level stimulus features by integrating with crowdsourcing platforms such as MechanicalTurk or Prolific.” (pg. 11)

      4) Can the authors talk a bit more about the choice to demean and rescale certain predictors, namely the word-level features for speech analysis? This makes sense as a default step, but I wonder if there are situations in which the authors would not recommend normalizing features prior to computing the GLM (e.g., if sign is meaningful, if the distribution of values is highly skewed if the units reflect absolute real-world measurements, etc). Does Neuroscout do any normalization automatically under the hood for features computed using the software itself and/or features that have been calculated offline and uploaded by the user?

      In keeping with Neuroscout’s philosophy to be a general purpose platform, we have not performed any standardization of features. Instead, users can choose to modify raw predictor values by applying transformations on a model-by-model basis. Currently available transformations through the web interface include: scale, orthogonalize and threshold. Note that there is a wider range of transformations available in the BIDS Stats Model, but we are hesitant to advertise these yet, as they are more difficult to use.

      We revised our description of transformations in the Result section to clarify these transformations are model specific:

      “Raw predictor values can be modified by applying model-specific transformations such as scaling, thresholding, orthogonalization, and hemodynamic convolution.” (pg. 3)

      We also clarify that variables are ingested without any in-place modifications in the Methods section. The only exception is that we down-sample highly dense variables (such as those from auditory files, which can result in thousands of value per second), to save disk space:

      “Feature values are ingested directly with no in place modifications, with the exception of down sampling of temporally dense variables to 3hz to reduce storage on the server.” (pg. 13)

      With respect to the word frequency analysis, the primary reason we scaled variables was to facilitate imputing missing values for words not found in the look-up dictionary. By scaling the variable, we were able to replace missing values with zero, effectively assigning them the average word frequency value. We clarified this strategy in the Methods section:

      “In all analyses, this variable was demeaned and rescaled prior to HRF convolution. For a small percentage of words not found in the dictionary, a value of zero was applied after rescaling, effectively imputing the value as the mean word frequency.” (pg. 17)

      On a more general note, when interpreting a single variable with a dummy coded contrast (i.e. 1 for the predictor of interest, and 0 for all other variables), it’s not necessary to normalize features prior to modeling, as fMRI t-stat maps are scale-invariant (although the parameter estimates will be affected).

      We added a note with our recommendations in the Neuroscout Documentation: https://neuroscout.github.io/neuroscout-docs//web/builder/transformations.html#scale

      Reviewer #2 (Public Review):

      The authors present a new platform for constructing and sharing fMRI analyses, specifically geared toward analyzing publicly-available naturalistic datasets using automatically-extracted features. Using a web interface, users can design their analysis and produce an executable package, which they can then execute on their local hardware. After execution, the results are automatically uploaded to NeuroVault. The paper also describes several examples of analyses that can be run using this system, showing how some classical feature-sensitive ROIs can be derived from a meta-analysis of naturalistic datasets.

      The Neuroscout system is impressive in a number of ways. It provides easy access to a number of publicly-available datasets (though I would like to see the current set of 13 datasets increase in the future), has a wide variety of machine-learning features precomputed on the video and audio features of these stimuli, and builds on top of established software for creating and sandboxing analysis workflows. Performing meta-analyses across multiple datasets are challenging both practically and statistically, but this kind of multi-dataset analysis is easy to specify using Neuroscout. It also allows researchers to easily share a reproducible version of their pipeline simply by pointing to the publicly-available analysis package hosted on Neuroscout. The platform also provides a way for researchers to upload their own custom models/predictors to extend those available by default.

      The case studies described in the paper are also quite interesting, showing that traditional functional ROIs such as PPA and VWFA can be defined without using controlled stimuli. They also show that, running a contrast for faces does not produce FFA until speech (and optionally adaptation) is properly controlled for, and that VWFA shows relationships to lexical processing even for speech stimuli.

      I have some questions about the intended workflow for this tool: is Neuroscout meant to be used for analysis development in addition to sharing a final pipeline? The fact that the whole analysis is packaged into a single command is excellent for reproducibility but seems challenging to use when iterating on a project. For example, if we wanted to add another contrast to a model, it appears that this would require cloning the analysis and re-starting the process from scratch.

      An important principle of Neuroscout from the onset of the project was to minimize undocumented researcher degrees of freedom, and maximize transparency in order to reduce the file drawer effect which can contribute to biased results in the published literature. As such, we require analyses to be registered and locked as the modal usage of our application. In the case of adding a contrast, it is true that this would require a user to clone the analysis. Although all of the information from the previous model would be encoded in the new model, this would require re-estimating the design matrix which could be time consuming. However, in our experience, users almost always add new variables to the design-matrix when a study is cloned, which would in any case require re-estimating the design matrix for all runs and subjects. We believe this trade-off is worthwhile to ensure maximal reproducibility, but also point out that since Neuroscout’s data is freely available via our API, power users could directly access the data if they need to use it in a less constrained manner.

      We believe that these important distinctions are best addressed in the newly developed Neuroscout documentation which we now reference throughout the text (https://neuroscout.org/docs/web/browse/clone.html).

      I'm also unsure about how versioning of the input datasets and the predictors is planned to be handled by the platform; if datasets have been processed with multiple versions of fmriprep, will all of those options be available to choose from? If the software used to compute features is updated, will there be multiple versions of the features to choose from?

      The reviewer makes an astute observation regarding the versions of input data (predictors & datasets). Currently we have only pre-processed the imaging data once per data, and as such this has not been an issue. However, in the long run we certainly agree this would be important to give users the ability to choose which pre-processed version of the raw dataset they want to use, as certainly there could be differing but equally valid versions. We have opened an issue in Neuroscout’s repository to track this issue, and plan to incorporate this ability in a future version (https://github.com/neuroscout/neuroscout/issues/1076).

      With respect to feature versions, every time a feature is re-extracted, a new predictor_id is generated, and the accompanying meta-data such as time of extraction is tracked for that specific version. As such, if a feature is updated and re-extracted, this will not change existing analyses. By default, we have chosen to obscure this from the user to make the user experience simpler. However, there is an open issue to expand the frontend’s ability to explicitly display different versions, and allow users to update older analyses with newer versions of features. Advanced users already have access to this functionality by using the Python API (PyNS) to directly access all features, and create analyses with more precision.

      We have made a note regarding this behavior in the Neuroscout Documentation: https://neuroscout.github.io/neuroscout-docs/web/builder/predictors.html

      I also had some difficulty attempting to test out the platform, so additional user testing may be necessary to ensure that novice users are able to successfully run analyses.

      We thank the reviewer for this bug report, which allowed us to fix a previously unnoticed issue with a subset of Neurosout datasets. We have been incontact with the reviewer to ensure that this issue was successfully addressed.

    1. Author Response

      Reviewer #1 (Public Review):

      1) While the authors identify the suppressors in known genetic interactors (GIs) of the yeast SEC53, it is worth testing if the compensatory mutations are rewiring the GIs, thereby explaining the lack of comparable compensations observed in reconstituted strains. If altered GIs explain the suppression, then while yeast serves as an excellent tool to perform these assays, the human context of the disease may require a different set of genetic suppressors and, therefore, a different target than the yeast PGM1 ortholog.

      Our data show that pgm1 mutations alone greatly improve growth of sec53-V238M strains. Our data also indicate other pathways of compensation. Whether each of these compensatory mechanisms translate to humans is unknown. However, the observed enrichment of compensatory mutations in genes whose human homologs are associated with Type 1 CDG, suggests that many of these genetic interactions are likely to be conserved.

      Also, are Sec53 and Pgm1 proteins directly interacting in yeast and whether these mutations are on the interaction interface?

      As we mention above, there is no support for a direct physical interaction between Sec53 and Pgm1.

      2) Based on the data obtained between pACT1 and pSEC53-driven expression of the SEC53 mutant alleles, the pattern of suppressors appears to be different. Authors report that the variants expressed from strong pACT1 promoters show more suppressors than those driven by native promoters. Is this a general trend in experimental evolution that slower-growing strains tend to show lesser suppressors? For example, on Page 6, line 154, "compensating for Sec53-F126L dimerization defects are rare or not easily accessible". The statement suggests that the authors did obtain suppressors that compensate for the dimerization defect. At the same time, while rare (also, are authors suggesting suppression of dimerization defect as in better dimerization?), the rate of obtaining suppressors seems to be linked to the severity of the fitness defects of the strains. The lack of suppressors may be a limitation of the evolution experiments. Indeed later in the manuscript, the authors noticed that while PGM1 suppressors obtained in V238M can also suppress F126L alleles, the suppression was not as efficient. Could it be that evolution experiments in slower-growing strains predominantly enrich suppressors in other pathways (i.e., not in the CDG orthologs) that restore the growth better and compete out the relatively weaker suppressors in PGM1? In fact, the authors report similar effects on Page 7, lines 204-210. These two paragraphs are contradictory and should be explained further.

      All of our sequencing was performed on strains with sec53 under the control of the pACT1 promoter. While we did not identify unique sec53-F126L suppressors, we cannot exclude that sec53-F126L suppressors exist, so we describe them as “rare or not easily accessible”. While it is possible that the slower growth rate of the sec53-F126L allele could impact the likelihood of observing suppressors, we think it is more likely due to the nature of the variant (dimerization defect versus stability defect) rather than growth rate. In other laboratory evolution experiments the same beneficial mutation typically has a greater effect in slower-growing backgrounds (for example: doi.org/10.1126/science.1250939).

      3) Authors report that the LOF of PGM1 compensates for the SEC53 mutations. However, the evolution experiments did not capture any LOFs in PGM1. The fitness comparisons in evolution experiments are different as many different genotypes compete in a mix. Therefore, the fitness assays in a clonal population may not represent these differences well. To test this argument, authors can try to mimic the evolution experiments by mixing two genotypes to check competitive fitness, like the co-culture of pgm1 suppressor obtained via evolution experiments with pgm1Δ.

      Though we did not perform a direct head-to-head competition between a pgm1 suppressor and a pgm1Δ, our data suggest that the pgm1 delete would outcompete some of the lower-fitness suppressors. In the Discussion we speculate as to why we do not see deletion mutations: “Given that most of the evolved clones containing pgm1 mutations are more fit than the reconstructed strains, it is possible that other evolved mutations interact epistatically only with non-loss-of-function pgm1 mutations.”. Though it is beyond the scope of the present manuscript, it would be possible to rerun the evolution experiment in sec53-V238M strains carrying either a pgm1 missense suppressor or a pgm1Δ. Under the hypothesis of additional interacting loci, only the pgm1 missense suppressors would be more likely to acquire additional compensatory mutations.

      Reviewer #3 (Public Review):

      Vignogna et al. used yeast genetics, experimental evolution and biochemistry to tackle human congenital disorders of glycosylation (CDG), a disease mostly caused by mutations in PMM2. They took advantage of the observation that the budding yeast gene SEC53 is almost identical to human PMM2, and used experimental evolution to find interactors of SEC53/PMM2. They found an overrepresentation of mutations in genes corresponding to other human CDG genes, including PGM1. Genetic and biochemical characterizations of the pgm1 mutations were carried out. This work is solid, although authors did not reveal why reduction of pgm1 activity could compensate for defects of a particular mutant allele of sec53.

      Out of curiosity, if the authors were to simply focus on the preexisting mutations, would they have gotten the materials for most of the experiments in this article? In other words, how important is the experimental evolution?

      The evolution experiment was crucial as the specific pgm1 mutations we identified here have not been reported elsewhere, nor have the orthologous mutations been identified in human PGM1.

      A strain table with full genotypes is needed.

      We added a strain genotype table (Supplemental Dataset 2).

    1. Author Response

      Reviewer #1 (Public Review):

      In this work, Maxime R. and co-authors intended to investigate the consequence of dystrophin absence/alteration in myoblasts, the effector cells of muscle growth and regeneration, and the early role of such cells in the pathogenesis of the disease. They carried out a transcriptomic analysis, comparing transcripts expressed by dystrophic myoblasts isolated from two murine models of DMD (Dmdmdx and Dmdmdx-βgeo) and control healthy mice. The expression of a large number of genes, comprising key regulator of myogenic differentiation (Myod1, Myog, Pax3 etc.) resulted affected in comparison to control in both mouse lines.

      We believe that the novelty and importance of these result lie in demonstrating for the first time that the loss of full-length dystrophin expression is both necessary and sufficient to trigger molecular and functional abnormalities in myoblasts. The fundamental point is that, contrary to the prevailing belief, the dystrophin function may not be just to provide sarcolemma stability in myofibers but rather that there is a disease continuum: DMD defects in satellite cells (Dumont et al., 2015, Ref 45), cause myoblast dysfunctions diminishing muscle regeneration (this work), and also impairing myofiber differentiation (Shoji et al., Ref 4), with the resulting fibre being unstable and therefore degenerating. These data can better explain all the symptoms of dystrophic muscle pathology, where abnormalities in satellite cells, myoblasts and myofibers form the pathological vicious cycle. Moreover, we identify the key trigger behind these abnormalities in dystrophic myoblasts, which is MyoD downregulation. Furthermore, we demonstrate that the additional loss of short dystrophin isoforms, although these are expressed in myoblasts, do not exacerbate the phenotype. This latter finding is very important given the near complete lack of understanding of the pathology in dystrophin-null patients.

      Authors highlighted similar gene expression modifications also in a myoblast cell line previously established from the mdx mouse.

      Analogous alterations found in both primary myoblasts and in the established myoblast cell line demonstrate that this change is cell-autonomous and not evoked by the external factors in the dystrophic niche, e.g. inflammatory mediators. This also shows that the dystrophic phenotype resists the transcriptomic drift as it is maintained through numerous passages. This approach was praised later on in the review.

      To assess the outcomes from the gene ontology analysis, which pointed on the alteration of muscle system and regulation of muscle system processes, authors evaluated the proliferative, chemotactic and differentiative capacities of dystrophic myoblasts. Myoblasts presented increased proliferation, reduced chemotaxis and quite surprisingly, improved differentiating capacity, if considering the transcriptomic data.

      The key pathways (proliferation, migration and differentiation), that are essential for myoblast to evoke muscle regeneration, were confirmed to be altered in functional analyses, thus proving these transcriptomic alterations to be functional and biologically relevant. Our data showing accelerated differentiation in mdx myoblasts fully agree with findings by others, both in primary cultures and in isolated myofibers (Yablonka-Reuveni &Anderson, 2005, Ref 22).

      Finally, Maxime R. and co-authors carried out a transcriptomic analysis in myoblasts from DMD human subjects. Even though the profile of altered gene expression resulted similar and the GO studies seemed to focus on the same pathway categories, a significative divergence was observed particularly at the level of gene expression.

      Given that myoblasts from individual DMD patients present heterogeneous phenotypes (Choi et al., 2016), such divergence at the level of individual gene expression between mouse and human is to be expected. Nevertheless, these changes become convergent in altered GO categories and pathways. In the revised manuscript we have included additional genome-scale metabolic analysis in human DMD myoblasts. This revealed significant alteration in specific metabolic pathways. These are consistent with the metabolic alterations found previously in dystrophic muscle and brain, thus confirming the commonality of dystrophic defects found here in myoblasts and described before in dystrophic tissues. Moreover, this analysis is an additional proof that DMD myoblasts are significantly altered when compared to healthy cells.

      Authors link transcriptomic abnormalities and functional changes in proliferation, chemotaxis and differentiation of the dystrophic myoblasts with the alterations (probably epigenetic changes) occurring in satellite cells of dystrophic mice, consequent to the absence of the dystrophin protein. Such modifications in gene expression are supposed to be inherited by pathological myoblasts due to the division of the SC that is no longer asymmetric as occurring in healthy tissue.

      Strengths

      Transcriptomic data from samples of different sources are solid and rigorous statistical analyses have been carried out.

      Transcriptomic and functional data from primary proliferating myoblasts of the two mouse models and from the myoblast cell line are similar. This is a convincing evidence that the transcriptomic alterations observed in primary myoblasts are not influenced by the exposure to the niche environment present in the dystrophic muscle, but that are cell autonomous.

      Authors adopted a 3D culture for the functional analysis concerning myoblasts differentiations, in this way better mimicking the process occurring in vivo.

      Weaknesses

      The mdx mouse phenotype is mild in comparison to the severe symptoms and the rapid disease progression experimented by most of the human DMD subjects. Mdx mice is characterized by cycle of degeneration/regeneration initiating around the age of 6 weeks and continuing for several weeks. It was expected that authors discussed this point in detail, also considering that the animals used in this study were 8 weeks old.

      The mdx mouse has a mutation resulting in the loss of full-length dystrophin expression, which reflects the molecular defect affecting the majority of DMD patients. Therefore, mdx is the most commonly used pre-clinical model in DMD studies. The intensity of myonecrosis during this active degeneration and regeneration period (starting at 12 days and not at 6 weeks) is as aggressive as in patients. In fact, it has been suggested that the intensity of myonecrosis seen in mdx mice would be lethal to DMD patients (Duddy et al., 2015). The difference between human and mdx mouse pathology is that, starting at 10 weeks of age, the fibre replacement in mdx leg muscles reduces gradually, due to an unknown mechanism. Therefore, we isolated myoblasts at 8 weeks, when mdx replicates the human pathology. To emphasise the relevance of our findings for the human pathology, we discuss this point in detail in the revised manuscript.

      Furthermore, transcriptomic analysis of the human DMD myoblasts highlighted many differences as well as similarities when compared to mouse samples. Why do not focus more on this aspect? According to the authors, dystrophic abnormalities in myoblasts manifest irrespective of differences in genetic backgrounds and across species. The last one is a strong statement that should have been supported at least by functional data regarding chemotaxis proliferation and differentiation of human DMD myoblasts.

      What we meant by: “dystrophic abnormalities in myoblasts manifest irrespective of differences in genetic backgrounds and across species” is that the lack of full-length dystrophin expressions results in identical molecular defects in mouse and human primary myoblasts and also in the dystrophic cell line, despite numerous gene expression alterations triggered by the long-term culture in the latter We agree that linking the functional alterations in human dystrophic myoblasts to the transcriptomic alteration that we identified is important. And indeed, altered proliferation, migration and differentiation of human DMD myoblasts have been described before (Witkowski and Dubovitz., 1985; Nesmith et al., 2016; Sun et al., 2020). In fact, these previous findings that were never fully investigated, prompted us to undertake this study. Thus, our data provide a molecular underpinning for these abnormalities. In the revised manuscript we have elaborated on the existing functional data supporting alterations in human myoblasts.

      Further functional analyses will be needed to understand their consequences. It would require investigation of numerous parameters, including significant alterations in metabolic pathways, which we identified and described in the revised version of this manuscript. Given the aforementioned individual variability in patients’ population demonstrated by heterogeneous phenotypes in myoblasts, such functional analyses would need to involve a significant number of probands.

      Therefore, a detailed study in a sufficiently large cohort of DMD myoblasts is a logical next step from the identification of specific pathway alterations described here. But it is an extensive new project beyond our immediate capability.

      In the discussion, the authors suggest two possible mechanisms as responsible for alterations in the behavior of the SC that ultimately affect the functionality of myoblasts, an RNA-mediated pathological process or an alteration in epigenetic regulation. They consider the latter mechanism more likely. This is based in particular on transcriptomic data showing the downregulation of important genes involved in histone modifications, normally linked to transcriptional activation. They also reported from the literature that HDAC inhibitors upregulate MyoD, a gene that is effectively downregulated in this study. Since the authors postulate that the epigenetic dysregulation of Myod1 expression is responsible for the pathological cascade of gene downregulation, ultimately leading to the pathological phenotype, it would have been interesting to evaluate the impact of HDACi on this pathways or the overexpression of enzymes responsible for H3K4 methylation as Smid1 (downregulated in this study).

      We have presented several hypotheses regarding the mechanism in which loss of full-length dystrophin expression could affect myoblasts, including restricted spatio-temporal requirement for small amounts of full-length dystrophin and an RNA-based mechanism. The notion that epigenetic dysregulation of Myod1 expression causes a pathological cascade of transcription downregulation of genes controlled by MyoD was based on our finding that transcripts downregulated in dystrophic myoblasts exhibit overrepresentation of MyoD binding sites. We discussed this as a likely mechanism, supported by a body of literature on the known alterations of epigenetic regulation found in DMD (fifteen papers in total). We also offered a hypothesis that since treatment of mdx mice with histone deacetylase inhibitors (HDACi) promoted myogenesis (Saccone et al., 2014) and HDACi upregulate Myod1 (Mal et al., 2001), HDACi could increase myogenesis by counteracting the changes we found in dystrophic myoblast. However, while evaluation of the impact of HDACi or of the overexpression of enzymes responsible for H3K4 methylation would prove or disprove this one of the working hypotheses we made in the Discussion, it would, in no way, alter the key discovery of this study, which is that loss of full-length dystrophin expression results in major cell-autonomous abnormalities in proliferating myoblasts. Thus, if preferred, this Discussion paragraph could be shortened not to detract the reader from the main findings of this manuscript.

      Reviewer #2 (Public Review):

      This study is one of many that explore various abnormalities in the mononuclear myogenic cell compartments in DMD. Although the aim has been extensively investigated in the last several decades, it is still relevant.

      It is correct that abnormalities of proliferation, migration and differentiation in dystrophic myogenic cells have been reported over decades, but these were not followed up and often disregarded. Certainly, their causative link to DMD mutations and their consequences for the pathology were never investigated. Our study is the first to provide the comprehensive molecular underpinning for these abnormalities, demonstrating that the loss of full-length dystrophin expression directly and significantly affects myoblasts.

      The biggest limitation of this study is that it relies on the RNAseq analyses of extensively cultured myoblasts. While the computation analyses are profound, the study lacks any mechanistic explanation for the relevance of the transcriptional differences seen in the DMD myoblasts.

      We are not sure where this opinion had originated from. In fact, we used freshy isolated primary myoblasts in RNAseq experiments and then confirmed the key alterations functionally in primary myoblasts freshy isolated from two strains of DMD mice. Furthermore, we performed the mechanistic analyses, where we linked process alterations to functional defects, in which we focussed on proliferation, migration and differentiation, as processes known to impact the DMD pathology.

      In an approach considered as one of the strengths of our work by the other Reviewer, these findings in primary myoblasts were then reproduced in myoblast cell line, to demonstrate that alterations observed are not evoked by the exposure to the niche environment present in the dystrophic muscle, but that are cell-autonomous. Importantly, DMD mutant cells show these alterations despite being extensively cultured in vitro, demonstrating expressivity of this mutation. Finally, alterations were confirmed in human primary myoblasts.

      Cell purity, the myogenic status of the cells, passage number, and the period that cells were in culture are not well described. This study's cell isolation method allows contamination with non-myogenic cells that can significantly influence the RNAseq analyses. Immunostaining for myogenic markers, for example, MyoD, would indicate the purity of the cell culture. Extensive culturing of the primary myoblasts promotes clonal selection and introduces numerous molecular alterations; thus, the passage number and duration of the culture are significant factors. It looks that some assays were conducted with cells in the high passage. For example, in myogenic differentiation assay where they needed one million cells for each pellet. Maybe that is the reason for the low differentiation rate presented in Sup. Fig 2.

      Cell homogeneity across genotypes was fully confirmed by sample-based hierarchical clustering, clearly segregating transcripts into groups corresponding to genotypes. Furthermore, the same alterations were found in corresponding myoblast cell lines, which purity and myogenic potential was demonstrated previously (Onopiuk et al., 2015). Therefore, varying contamination with non-myogenic cells could not significantly influence these results. However, for completeness, in the revised manuscript (Supplementary Figure 8) we described cell characterisation using MyoD as a marker, proving that the well-established myoblast isolation procedure used by us produces pure myoblast cultures.

      As for the differentiation assay, isolated myoblasts were never passaged extensively (one passage only) but sufficient numbers were obtained through the efficient isolation. Moreover, cells from every genotype were maintained and treated identically. Therefore, under these given conditions, any differences were the result of the DMD gene mutation and not culturing.

      It is hard to explain how DMD myoblasts differentiate better than the WT controls if they have a suppressed myogenic program in the proliferation stage. Even at day 0 of differentiation, DMD myoblasts differentiated better according to the RT-qPCR presented in Figure 5c. Additionally, it is unusual that the marker of differentiation Myog and Myh1 reached the peak at day 2 of differentiation for the WT myoblasts.

      In fact, our data fully agree with findings by others, that mdx cells display accelerated differentiation both in primary cultures and in isolated myofibers (Yablonka-Reuveni &Anderson, 2005). Our team recently demonstrated that DMD mutations evoke marked transcriptome and miRNome dysregulations early in human muscle cell development (Mournetas et al, 2021). Expression of key coordinators of muscle differentiation was dysregulated in proliferating dystrophic myoblasts, the differentiation of which was subsequently found to be altered, in line with the mouse cells studied here. Clearly, further studies into the mechanisms of this and numerous other alterations described by us here are urgently needed, as these may uncover new therapeutic targets.

      As to whether it is unusual for these differentiation markers to peak at that time, we cannot comment, as no reference for this statement was given and the expressions can vary depending on the experimental conditions used – in our case the 3D culture could make the difference. Yet, again, cells from every genotype were maintained and treated identically and so any differences reflect the impact of the DMD mutation.

    1. Author Response

      Reviewer #1 (Public Review):

      The current manuscript examined patients with inborn errors of immunity (IEI) using whole exome sequencing (WES) and identified de novo variants (DNVs) associated with the disease. They found 14 genes associated with DNVs, including four novel genes - PSMB10, DDX1, KMT2C, and FBXW11, and conducted a systematic assessment of affected genes.

      Given the level of heterogeneity underlying IEI, the sample size is limited. Although the authors clearly stated this, the analysis of the current manuscript does not add much value to describing genes affected by DNVs. The sample size is small to perform exome-wide evaluation (authors described they did "exome-wide evaluation" in Abstract - line 10 but there is no statistical evaluation to prioritize effect genes). They could go with systems biology approaches, explaining the biological pathway of affected genes or underlying cell types from immune single-cell datasets. As the authors stated that IEI constitutes a large group of heterogeneous disorders, there should be some analysis to explain the functional convergence of affected genes in disease development.

      We believe the term ‘exome-wide evaluation’ might have led to misinterpretation. We used it in the context of reviewing each DNV found in a single patient’s exome outside the diagnostic IEI gene panel (i.e. ‘exome-wide’), instead of reviewing DNVs across all exomes. We have rephrased the sentences containing this term. The main purpose of this manuscript was to identify ‘all’ coding DNVs in each case, and explore whether they include any pathogenic or novel candidate DNVs. Our main purpose was to urge the IEI field to apply trio-based WES more systematically, and share candidate DNVs with the field for further validation.

      As the reviewer points out, our sample size would be too limited to perform systems biology approaches for variant prioritization. The signal-to-noise ratio would be very high, because many genes causing inborn errors of immunity remain to be discovered and the studied group of patients with inborn errors of immunity is very heterogeneous. This means that we would not have the power to investigate potential enrichment or burden of DNVs in specific genes nor the functional convergence of affected genes or pathways in specific phenotypes. In this study, we aimed to show the additional value of the systematic DNV analysis as a method to identify and prioritize candidate variants in individual cases, but ideally we would like to answer other important research questions using computational/statistical approaches in a larger cohort in the future, as has been performed in other rare disease fields. The suggestion of the reviewer is helpful, and this approach has been shown to implicate novel pathways enriched in disease for various forms of neurodevelopmental diseases for which ten-thousands of trio-based WES have been performed [9, 10].

      For DNV identification, the authors filtered out variants with ExAC & gnomAD AF > 0.1% or GoNL AF > 0.5%. I think this is too lenient a cutoff for filtering for DNV. For example, gnomAD AF 0.1% is approximately ~200 individuals in population. Given the filtering parameters (<5 variation reads, <20% variant allele frequency, or low coverage DNVs), they did not use specific filtering metrics to find DNV and there might be false-positive variants in the final DNV set. As far as I can find in the manuscript, they used the GATK pipeline from the previous study (REF 29). The GATK unified genotype generates a range of filtering metrics to increase specificity in variant filtering. It is very surprising that the authors seem to use three parameters (variation reads → FORMAT:AD[1]; variant allele frequency → FORMAT:AB? and low coverage → FORMAT:DP? but the authors did not state the cutoff) to filter de novo variants, which are fragile to false-positive variant calling.

      The chosen population database fraction cut-offs align with DNV filtering strategies in literature. We have not chosen a stricter cut-off to avoid missing true positives, since patients with IEI can exhibit late-onset disease, variable penetrance and have postzygotic mutations, while limiting the chance of false-positive findings. For instance, we have reduced local false-positives by filtering on allele frequencies in our in-house database and Dutch population database. Moreover, automated DNV calling required >2% alternate reads in either parent and variants were prioritized based on prediction scores and annotated immune function. Additionally, and in accordance with this expert reviewer, we have now put a stricter cut-off in place for variation reads (from 5 to 10) to further minimize false-positive findings. Lastly, we visually inspected the final 14 candidate DNVs in IGV and/or Alamut, which supports the validity of the findings. The DNVs reported in our final DNV list (Table 2B) are therefore unlikely to contain falsepositive findings.

      Reviewer #2 (Public Review):

      The manuscript by Hebert et al., reports on the utility of TRIO-based whole-exome sequencing (WES) in patients who presented as sporadic cases and are suspected of having inborn errors of immunity (IEIs). The authors developed an in-house pipeline for data analysis and used a set of known algorithms to prioritize the impact of genetic variants located mostly in the coding region of proteins. The data analysis was done in two steps; the first step involved the routine WES diagnostic analysis that led to the identification of pathogenic (P) and likely pathogenic variants (LP) in genes already associated with IEIs. The authors claim that this analysis resulted in a likely molecular diagnosis in 19 (~15%) of patients, while an additional 14% of cases were carriers for VUSs or other risk factors in the disease causal genes. As many of these variants are either inherited from one parent or are present as heterozygous (monoallelic) variants in genes associated with recessive diseases, their clinical significance is unclear.

      In the second step, the authors focused on the identification of de novo variants (DNVs), including SNVs, CNVs, and small indel, since these variants are more likely to be deleterious on protein function. The authors identified 136 non-synonymous DNVs, which were then filtered down to 14 best candidate variants using various in silico tools and database searches. These 14 variants included DNVs in genes previously associated with autoinflammatory diseases, such as CAPS and RELA haploinsufficiency. Three patients are found to carry de novo copy number variants (CNVs) of unknown clinical significance. Finally, several de novo loss-of-function (LoF) variants have been identified in genes that are not yet associated with any IEIs but are good functional candidates. Their potential pathogenicity is further supported by the observation that they are found in genes intolerant to loss of function. Functional validation has been performed only for the patient carrier of the novel FBXW11 splice variant. The authors state that the maximum solve rate (i.e., probable molecular diagnosis) in this cohort might be as high as 23%, which is comparable to similar reports of patients with IEIs, however, the reported results do not yet support this conclusion.

      The main conclusion of this study is that TRIO-based WES analysis for DNVs could improve the diagnostic rate and can result in the identification of novel disease-causing genes. TRIO-based sequencing is also preferable when analyzing patients from populations underrepresented in gnomAD and ExAC. As the cost of WES has come down, WES has been increasingly used in the clinical diagnosis of many human disorders. Despite the major progress in the development of novel sequencing technologies and new in silico tools, the diagnostic rate is still below 50%. In summary, this study suggests that despite the identification of over 400 genes associated with IEIs, there are many more genes to be identified and that the heritability of these diseases is very complex.

      We thank the reviewer for the elaborate summary of our study and the suggestions that have helped to further improve the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This is a study that is aimed at understanding the binding mechanism of D-serine to the two different binding lobes of the NMDA receptor. D-serine is a known agonist and binder of the GluN1 ligand-binding domain, but its interaction with the GluN2A is unknown. Using long time-scale conventional molecular dynamics simulations, the researchers observe that D-serine interacts and associates readily with both binding domains, often via protein surface pathways referred to as a guided-diffusion mechanism. As observed previously, free-energy calculations show that D-serine stabilizes the closure of both binding domains. Finally, analysis of the effect of glycans shows that these modifications play a role in further stabilizing the closed state of the ligand-binding domains.

      Amongst this broad and careful analysis, the major finding from this work is that D-serine surprisingly associates with GluN2A, which has been known to bind glutamate to enable activation of the channel. Since the binding of D-serine to GluN2A had not been observed previously, they proposed that D-serine acts as an inhibitor for glutamate at high concentrations. This hypothesis was investigated and supported by electrophysiological experiments, yielding a novel result that presents new interpretations for the field. However, the guided-diffusion mechanism still remains hypothetical and is unclear as to whether this is in fact a driving force, or requirement, for the binding. Specifically, the following questions warrant further investigation:

      1) Specific or non-specific association? It is possible that non-specific association events of ligands to the protein could be an intrinsic artifact of the MD simulations. To investigate this, it would be informative to compare the current results with a negative control simulation where the ligand was replaced with a similar amino acid or molecule that has been verified as a non-binder for NMDAR.

      To address this, we quantified the non-specific association signal by comparing the number of successful binding events to random association (see response to Essential Revisions #4). In theory, any appropriately small amino acid could associate with the conserved arginine of each LBD through its C-terminus (as evidenced by our PMF of glycine bound to GluN2A). However, an amino acid’s ability to remain bound long enough to induce LBD closure is largely dependent on the presence of interactions with the LBD bottom lobe.

      2) Dissociation events? Further clarification is required to understand whether any dissociation events are observed in these simulations to the non-specific sites or the final binding site. If dissociation is not observed, how does this impact the interpretation of the binding mechanisms that characterize only the association events?

      Association and dissociation are both observed and documented in Datasets S2-S4. We added clarification to the text on page 5 about the nature of both processes and how pathways are defined by residues that allow the agonist to enter and leave the binding site. As illustrated in the clustering dendrograms, association (even-numbered events) and dissociation (odd-numbered events) pathways are present in all clusters.

      3) Testing the hypothesis of guided diffusion. It is proposed that guided diffusion drives serine binding to its site. This would imply that the residues on this path are important, and if mutated, would decrease the association rate and the ability to compete with glutamate. Additional electrophysiological experiments or direct binding experiments would be useful in understanding the relevance of guided diffusion in the ligand-binding mechanism of NMDARs.

      To address this point, we performed additional TEVC experiments generating D-serine dose-response curves for GluN1a Arg694Ala and Arg695Ala, and GluN2A Arg692Ala and Arg695Ala. The curves for both GluN2A mutants support our guided diffusion mechanism, as they lowered the D-serine inhibition potency (These mutants also likely also alter glutamate binding, but since D-serine and glutamate bind through the same residues, it is not possible to separate out individual contributions.) The GluN1a mutants did not show altered behavior, supporting the increased diffusiveness of D-serine binding to GluN1 compared to GluN2A. These additional findings are included in the main text on page 12 and in Fig. 4D.

      Reviewer #2 (Public Review):

      In this manuscript, Yovanno et. al. did a comprehensive mechanistic study of D-serine binding to NMDAR ligand-binding domains (LBDs). The framework of the current investigation is built upon this research group's previous studies of NMDAR agonists glutamate and glycine binding. Using an aggregated 51 microseconds of all-atom MD simulations of spontaneous binding, the authors applied rigorous pathway similarity analysis to cluster the paths through which D-serine enters the LBDs from the bulk solution. The most interesting and unexpected result from this study is the spontaneous binding of D-serine to the GluN2A LBD, which was previously known to be the glutamate binding site.

      By computing the overlap coefficient for all binding pathways, the authors concluded that D-serine binding to GluN2A LBD through "guided" diffusion, while to GluN1 through random diffusion (the clustered pathways comprise random contacts rather than specific, conserved residue contacts). A "guided" binding pathway further suggests that the agonist binding could be sensitive to the conformational change within and around the binding pocket, and vice versa.

      To investigate whether D-serine binding events are able to modulate the GluN2A LBD conformation, the authors then computed a series of LBD conformational free energy landscapes (2D-PMF) using 2D-umbrella sampling simulations. The 2D-PMF profiles confirmed that D-serine stabilizes the closed LBD conformation, just like glutamate. Because the D-serine 2D-PMF shows a metastable state that was absent in glutamate 2D-PMF, the authors argue that D-serine may not stabilize the closed conformation to the same extent as glutamate. Likewise, based on the 2D-PMF of GluN1 LBD, the authors suggest that D-serine has a higher potency than glycine, in part due to its ability to more strongly stabilize a closed LBD conformation.

      The simulations above generated the hypothesis that D-serine could function as a competitive antagonist of glutamate at high concentrations. This computationally derived hypothesis is beautifully tested by the authors' dose-response curves and the Schild plot.

      One question that would merit further clarification is whether the binding affinity of D-serine to the two LBDs is stronger or weaker in comparison with glutamate and glycine. The difference in agonist potency could be due to the difference in binding affinity and/or efficacy. Stabilizing the closed LBD conformation may indicate the efficacy of the agonist, but affinity (Kd) will still play a role in the final potency.

      Indeed, as Reviewer 2 pointed out, affinity should play a role since the D-serine inhibition here is attributed to the competitive binding of D-serine against glutamate as we showed with our Schild plot. The bona fide binding site for D-serine is GluN1 LBD where D-serine binds more strongly than glycine (Furukawa/Gouaux 2003). In the GluN1 LBD, D-serine is a full agonist. The D-serine binding to the GluN2A LBD (the finding here) is substantially weaker (mM) than glutamate (~1 uM).

      While a glycosylated GluN1/GluN2A dimer was used for the majority of MD simulations, the authors also checked the "reality" by mapping the pathway residues onto the NMDAR heterotetramer structure. The role of glycans in D-serine binding pathways was further investigated by conducting an additional 30 microseconds simulations of the non-glycosylated dimer. It was found that glycans introduced small kinetic "traps" that slow down the binding process. Glycan was also found to stabilize LBD closure from 1D-PMF profiles.

      The detailed mechanistic insight and D-serine's inhibitory effect on NMDAR, unraveled by this study, may play an important role in therapeutic strategies, and thus is likely to have a broad impact in the field.

    1. Author Response

      Reviewer #2 (Public Review):

      Dr Muktupavela et al. present a novel likelihood-based method for inferring the strength of natural selection and basic demographic parameters, such as mobility rates, from time-stamped ancient DNA data in a spatially explicit framework. This is an elegant method that is, in many ways, a natural extension of previous work in the field that has focussed mainly on inferring natural selection from temporal data to a spatial setting. In addition to the simplest scenarios of isotropic dispersal the authors also consider models with different dispersal rate in longitudinal and latitudinal directions, as well as biased dispersal. Selection strength, dispersal rates and bias are assumed to be constant across space and piecewise constant in time (but it would be very straightforward to relax these assumptions). The bias component of the model is an interesting addition that, in principle, allows to broadly account for the effect of long-range dispersals such as the spread of agriculture across Europe from the fertile crescent and Bronze age migrations from the Asian steppes on the spatiotemporal pattern of allele frequencies.

      Although the main idea is clearly communicated, there is room for improvement of the manuscript regarding investigating the properties of the model and presenting the results. Notably, the authors assume that the age of mutation is known and correct in their assessment of the performance of the model on simulated data (which may inflate the reported accuracy of the reconstructions) and use estimates from the literature when the method is applied to empirical data. Although it is necessary to specify the age of the allele, and this could easily have been treated as a free parameter in the framework. I would like to see a discussion of why the method may not be suitable for this, and a more systematic test for the sensitivity of the method to misspecification of the age (which could be very substantial, especially if the population history has been complex). In the cases where the model is run for different allele age estimates in the manuscript, such as for the lactase persistence scenario, the authors should present the (approximate maximum) likelihoods for the different scenarios in the text.

      An explanation as to why we do not infer the age of the allele (see text below) has been added to the main text under section “Parameter search” (lines 531-533). Briefly, we chose to construct our method in a way that uses the age of the allele as an input parameter rather than estimating it since there are multiple equally possible solutions with various combinations of allele age and selection coefficient values. This is demonstrated Appendix A3.

      We also added a description of log-likelihood values when we vary the allele ages under section “Robustness of parameters to the assumed age of the allele” in lines 324-329, the results of which are presented in supplementary Figure 6–Figure Supplement 9 and Figure 8–Figure Supplement 6.

      Briefly, we assessed the likelihood of the best fitted models by varying the ages of the rs4988235(T) and rs1042602(A) alleles. We can see that in the case of rs4988235(T) allele the allele age used in this study (7,441 years) gives the most likely solution among the explored ages. In the case of the rs1042602(A) allele, we found that there are multiple nearly equally likely ages when looking at ages at least as old as 15,000 years.

      A further weakness of the method is that it uses the Fisher information matrix to estimate uncertainty. While this works well if the posterior distribution is narrow, it can severely underestimate the uncertainty if this is not case, in particular if the distribution is non-gaussian in the tails. It would be better, but perhaps computationally prohibitively expensive, to report Bayesian posterior distributions for the parameters as well as Bayes factors that could be used to formally compare the fit of different models to the data.

      We agree with the reviewer that implementing Bayesian parameter fitting would likely provide a more robust understanding of the uncertainty of the estimates as well as an opportunity to formally compare different models using Bayes factors (although at the cost of an increase of computational intensity). Changing the inference engine of our method in this manner (while keeping it computationally feasible) is something we are currently investigating and hope to release as part of a future Bayesian version of our method. In the meantime, we have added a discussion of this caveat in our manuscript (sixth paragraph).

      Finally, although the rationale behind the model is clearly described, the detailed descriptions of the model and the numerical implementation have some shortcomings. First, there are typos in the appendix where the continuous model is derived from a discrete approximation (the right-hand side of Eq. (8) should not contain the term p(x,y,t) for it to be consistent with Eqs. (9) and (10)). Second, any differential equation model is incomplete without specifying the boundary conditions. This is especially important here as the assumption of uniform diffusion and advection on the grid is violated by the constraints imposed by the land mask, where the population is assumed to vanish on water areas (suggesting an absorbing boundary condition). Further down in the methods, details are also missing on how Eq. (10) was solved numerically, merely that it was discretized at a certain resolution.

      Looking more closely at the Eq (8), we do believe that the term p(x,y,t) should be there since it is moved to the left-hand side of the Eq (9) by simple algebraic rearrangements of the terms of the equation.

    1. Author Response

      Evaluation Summary:

      1) The paper is well written, and its style/formatting are optimal. The baseline signature moderately predicted outcome, and the data after one cycle further improved the algorithm, though this decreases its utility as a pure predictive tool

      We thank the editor and the reviewers for their positive feedback regarding the style and formatting of the manuscript. We concur that longitudinal sampling of blood, before and after one cycle of treatment, renders the predictive signature marginally more laborious to generate. In an ideal setting, we would be able to solely generate a predictive signature based on baseline characteristics - unfortunately such a test does not yet exist.

      In this study, we propose adding an easily obtainable blood sample after the first cycle of treatment to significantly improve our ability to predict response. Due to the ease of sampling them, we believe that blood biopsies will be key as the search for predictive biomarkers expands. Since the inception of our study, there have been numerous impactful pieces of published literature assessing PBMCs, mainly in response to immune checkpoint blockade 1-6. Given that our risk signature is now validated in an immunotherapy trial (EACH trial NCT03494322), we are even more confident with our unique approach to longitudinal sampling to developing a predictive model to systemic therapy. The trial design of the validation study is now included as supplementary (Figure 2A) in the manuscript.

      2) Signatures were not prospectively validated on an independent cohort; the algorithm was developed around a first-line therapy that is no longer considered to be the standard of care for HNSCC; and, while most of the conclusions are supported by the data, some of the caveats (such as the lack of a validation cohort, key in predictive biomarker development), are not addressed.

      Thank you. We will address this comment in two parts – (a) with regards to the validation cohort part and (b) for the status of the EXTREME treatment regimen in the original cohort. In this revised version, we have validated our risk signature in an independent cohort of patients who received cetuximab and avelumab (anti-PD-L1) in a single-arm, phase 2 clinical trial setting. Beyond serving purely as a validation cohort, it also demonstrates the applicability of our model in predicting response to immune checkpoint blockade-based therapy in keeping with contemporary advances in systemic treatment for HNSCC. The risk signature strongly predicted response in the new independent cohort giving us more confidence in our model’s ability to predict outcome for systemic therapy regimens beyond cytotoxic chemotherapy and cetuximab. Figure 5B shows the strong correlation between the risk signature and disease outcome in the validation cohort (Kendall rank correlation, t=0.725 p=0.0181).

      Secondly, the EXTREME regimen (platinum/5-FU/cetuximab) remains a first-line standard of care treatment in the UK and European countries for HNSCC patients with negative PD-L1 status (CPS score <1) which account for around 15% of all HNSCC patients 7. While the US Food and Drug Administration (FDA) approved pembrolizumab in combination with chemotherapy as first-line treatment regardless of PD-L1 expression and pembrolizumab alone for patients with PD-L1-expressing tumours (CPS ≥1), the European Medicines Agency (EMA) approved pembrolizumab with or without chemotherapy only for patients with a CPS ≥1, and this has been highlighted in the European Society for Medical Oncology (ESMO) and the UK National Institute for Health and Care Excellence (NICE) guidelines 8 and (https://www.nice.org.uk/guidance/ta661/chapter/1-Recommendations).

      Furthermore, chemotherapy with EXTREME regimen is standard of care for patients with contraindications to immune checkpoint inhibitors such as autoimmune disease 8. It can also be considered as second-line treatment in patients who only received pembrolizumab monotherapy in the first line setting.

      3) However the overall impact in the field of this work seems limited by a number of factors, including that the authors focused on immune cell subpopulations and exosomes, which narrows the scope (no cytokines or other biomarkers were included).

      Thank you. We selected a finite number of covariates based on a few factors – (a) published literature, (b) previous data generated by the group and (c) the applicability of the findings to the clinic. Instead of an exploratory article in which we could generate an infinite number of covariates by a technique similar to RNA sequencing, we opted for a select set of covariates. This hypothesis-driven approach generated a strong signature that is now validated across two trials. The focus on immune population is driven by our hypothesis that systemic changes in the PBMCs are indicative and reflective of the status of the intra-tumoral immune response. In the revised manuscript we used a custom immune focused imaging mass cytometry antibody panel to probe tissue sections from 9 patients. We now show that the key populations driving the predictive model in the periphery are not only reflected at the tumoral level, but these disparate immune cell subpopulations also interact. See Figure 6 in which we use a machine learning approach to segment cells and assign them to distinct immunological subpopulations. We found that the peripheral monocyte population strongly correlated with a tumoral macrophage population having a similar marker expression pattern. We found that the peripheral central memory CD8 T cells inversely correlated with tissue resident memory T cells. The tissue presence of both these cells correlated positively with outcome. Most strikingly, these two populations were most likely to co-localize with each other at the tissue level at a frequency of almost double the second highest co-localization. Data on the nature of the interplay between peripheral systemic immunity and intra-tumoral immunity is novel and rarely exists in the literature outside the scope of in-vivo animal models. Here we describe these interactions using human patient samples treated with a clinically relevant therapy.

      Given the limited amount of patient sera collected in the trial we opted to perform exosome analysis on markers known to impact the response to the anti-EGFR/HER3 treatment/immune responses. This was in line with our labs work to use exosome FRET-FLIM as a surrogate for tissue FRET-FLIM which we originally used to discover a potential dimer dependent mechanism for anti-EGFR treatment resistance in neoadjuvant breast cancer patients9; and more recently published on a colorectal patient sample cohort from the COIN study 10. While exosome EGFR-HER3 heterodimer failed to reach significance in our risk signature, it was close as depicted in the Kaplan-Meier curve from Figure 3C. We of course acknowledge the potential added benefit of having serum cytokine array analysis. While that was not feasible for this study our group now aims at ensuring that extra patient serum samples are bio-banked for such analysis from ongoing and future trials.

      Reviewer 1 (Public Review):

      1) For this study to be significant, one would want to see a marked improvement over current biomarkers, in a robust and generalizable population. Unfortunately, this study falls short in these respects. First, the authors do not adequately discuss the prior literature. Even a fairly crude and old-fashioned blood-based biomarker such as neutrophil:lymphocyte ratio has quite good predictive and prognostic capability in R/M HNSCC

      Thank you for your suggestion. We have expanded the discussion to include an overview of current biomarkers. We also compared the predictive power of neutrophil:lymphocyte ratio (NLR) from two published meta-analysis to our risk signature 11,12. We used the median risk score to divide our original patient cohort into a high and low risk group. We then calculated the HRs and CI for both signatures at pre-treatment alone (HR = 4.1397 [95% CI: 1.975 - 8.676]) and for the combined signature (HR = 2.574 [95% CI: 1.336 - 4.96]). Both were higher than the published literature whilst only using the median as the cutoff. Mascarella, Mannard et al. published “NLR greater than the cutoff value was associated with poorer OS and DSS (HR 1.69; 95% CI 1.47-1.93; P < .001 and HR 1.88; 95% CI 1.20-2.95”, and Takenaka, Oya et al published : “The combined hazard ratio for OS in patients with an elevated NLR (range 2.04-5) was 1.78 (confidence interval [CI] 1.53-2.07”. We realize that we are stratifying patients based on PFS and not overall survival, which is an inherent limitation of the study, but the added preditive value of the signature relative to existing literature we humbly believe is too large to not be impacful.

      2) It is not clear to me that there is a compelling need to do better -- given that existing predictive biomarkers based on clinical nomograms or NLR are actually used in practice.

      We agree that clinical nomograms (based on clinicopathological factors) have been shown to be predictors of outcomes in HNSCC 13. However, whilst these models have been validated as prognostic biomarkers for overall survival and/or disease specific survival, they are not currently recommended in the cancer treatment guidelines nor universally used in the clinic. With the further validation performed on a cohort treated with an immune-checkpoint inhibitor, our multimodal signature describes new data to help understand the range of treatment responses and predict outcomes and could be used to guide treatment intensification, continuation and/or early termination in clinical practice or incorporated into future clinical trials. Moreover, in the resubmission we extend our work from predictive biomarker research to developing a better understanding of the interplay between the peripheral immune response to intra-tumoral immunity which we discuss in this letter as part of our response to the public evaluation summary part 3. Given the recent surge in literature focused on tumor immunity with the increased use of immune checkpoint blockers, we believe our work offers a strong contribution to the few papers in circulation that have attempted to link tumor immunity from the systemic level to the tumor tissue level.

      3) A large number (31 of 87) patients were not included due to lack of biomaterials. No analyses have been performed to examine the characteristics of these patients. It is unlikely that the collection of biomaterials has no correlation with disease characteristics, prognostic features, outcomes, or the analytes in this study. This exclusion -- akin to unequal censoring in clinical trials -- is likely to significant impact results. Given that the population enrolled in a phase II trial, and that sub-population of patients who survive long enough and are feeling well enough to submit to large volume blood draws on trial, would not necessarily represent the real world population of R/M HNSCC patients, a broader population is needed to justify conclusions about this assay having robust predictive value.

      We appreciate the reviewer’s concern on potential skewness of the data based on patient selection criteria. The median PFS of our 56-patient cohort used in the generation of the risk signature was 5.48 months as shown in supplementary table 1 in the original submission. This is in line with real-world treatment outcomes to the EXTREME Regimen (cetuximab with platinum-based therapy) as first line therapy for Recurrent/Metastatic Squamous Cell Carcinoma of the Head and Neck which was reported as 5 month by Sano et al in 2019 14. It is also very similar to the median PFS observed in the DIRECT study 15

      4) It is unclear why OS as a hard endpoint was not analyzed here. No explanation is provided, other than OS was not available, a statement that is difficult to understand, given that PFS was available, and overall survival is a component of PFS.

      Thank you. We admit that the absence of overall survival is an inherent limitation of the study. In the process of submitting this revision, we have once again requested this dataset from the sponsoring pharmaceutical company but were informed that they are unable to provide it. This is because reorganization of funding priorities within the company precludes them opening datasets from an already-published clinical trial. We are equally disappointed to not be able to obtain this data, but firmly believe that the ability of the signature to predict PFS (the primary endpoint of the trial, untainted by subsequent lines of treatment), as well as cross-validation against the contemporary EACH trial, is a testament to the signature’s strength.

      There is no validation set for the biomarker. The biomarker was trained and cross-validated using Bayesian techniques to reduce overfitting. This is a valid approach for training and cross-validation, but for the biomarker to be testable and interpretable, it requires assessment in an independent dataset. There is no statistical technique that I am aware of that generates informative biomarkers without an independent validation dataset

      We completely agree with the reviewer regarding the need to obtain a validation set. Obtaining patient samples from a similar cohort was difficult but we managed to validate the signature on a set of patients treated with an anti-PD-L1 monoclonal antibody in combination with cetuximab. Furthermore, the validation was performed using a limited numbers of covariates that were identified in the risk signature by the Bayesian model. These immune populations can be obtained by running a limited set of markers on flow cytometry. We were very happy to see that these limited immune based covariates strongly correlated with a worst disease response in an independent cohort using a different treatment modality. This furthers our hypothesis that changes in the immune populations are key to understanding response to systemic therapy. Fueled with the data from the validation cohort we furthered our analysis of the tissue from a total of 9 patients from the test cohort. Using imaging mass cytometry, we were able to identify how immune populations are mirrored at the tumoral level opening the horizon for new research. The data for the validation set are copied into this letter in response to point 2 of the public evaluation summary.

    1. Author Response

      Reviewer #1 (Public Review):

      Tarasov and colleagues provide data that extensively phenotypes TGAC8 mice, which exhibit a cAMP-mediated increase in cardiac workload prior to developing heart failure. The authors confirm data from prior studies, showing increased cardiac output mediated by changes in heart rate with similar ejection fraction. 

      The above is slightly incorrect as stated. Our results section stated that HR and EF were increased in TGAC8, but that stroke volume did not differ by genotype. Thus 30% increase in cardiac output in TGAC8 was attributable to the increased HR.

      The study is overall well-planned and the amount of data presented by the authors is impressive. The work nicely incorporates animal-level physiology (echocardiography data), tests for known canonical markers of hypertrophy, and then delves into an unbiased analysis of the transcriptome and proteome of LV tissue in bulk. The techniques and analyses in the study are adequately executed and within the realm of expertise of the Lakatta laboratory. This study is a necessary and crucial first step to extensively phenotype this mouse line and generate hypotheses for further work. 

      Reviewer #2 (Public Review): 

      Tarasov et al. present an impressive amount of work in their in-depth assessment of a murine model of chronic stress in a transgenic line with constitutively active AC/cAMP/PKA/Ca2+ signaling that spans cardiac structure, function, cellular architecture, gene and protein expression, mitochondrial function, energetics and more. Exploration of multiple cellular pathways throughout the manuscript and as summarized in Figure 16 help characterize this murine model and serves as a first step in using this model to understanding the effect of chronic stress on the heart. The conclusions of the manuscript are well-supported by the data, and I have the following comments: 

      Strengths: 

      1. The authors present echocardiographic, histologic, electrocardiographic, neurohormonal quantification, protein synthesis/degradation, mitochondrial, gene and protein expression profiling, and metabolism data in their assessment of this model. 

      2. The verification of increased transcripts of AC and PKA activation in this transgenic line provided validation for the model. 

      3. The pathway analyses for both gene and protein expression profiling help supports the authors' claim of the importance of differences noted in the various pathways between the transgenic line and controls. 

      4. The investigators posit that there is decreased wall stress and adequate energy production due to a shift in metabolism. 

      As written, this statement does not exactly reflect what we had intended to communicate in the paper. We did not posit, that LV wall stress was reduced in TGAC8, but that it must be reduced compared to WT on the basis of Laplace’s Law because of a substantial reduction of LV cavity volume. We also did not posit that energy production is due to a shift in metabolism, but rather, that adaptations in energy metabolism resulted in adequate energy production to meet, what appeared to us to be a marked increase in energy demand in TGAC8 vs WT, based on our observation that transcriptome and proteome gene ontology (GO) terms that differed in TGAC8 vs WT, covered nearly all biological processes and molecular functions within nearly all compartments of the LV myocardium.

      These findings would suggest that this model would be suitable for that of an athlete's heart, which is characterized by thickened left ventricular walls without a compromise in function. 

      Although the chronic increase in cardiac output in TGAC8 heart simulates that of an athlete’s heart during exercise, LV cavity volume at rest is larger in the endurance trained heart and this is associated with bradycardia. In these aspects, the TGAC8 heart differs from the endurance trained heart (perhaps because it does not have sufficient rest periods between bouts of exercise, as does the endurance trained heart). In the discussion section of the manuscript, we noted several features that differed between the TGAC8 vs the endurance trained heart. 

      However, the mice do develop heart failure after 1 year without a sense of mechanism despite the wealth of data provided. Are the authors able to comment on what changes described in this study of this transgenic line may be deleterious in the long run? 

      Heart failure in the long run, had first been described in the TGAC8 mouse by Mougenot et. al. (ref 10 in our manuscript) who performed numerous biochemical and biophysical measurements in TGAC8 and WT attributed the heart failure to be a manifestation of accelerated heart aging. We are in the midst of conducting a longitudinal study of cardiac structure and function in the TGAC8 vs WT as these mice age, along with additional non-biased multi-omics analyses in order to get an overview about which of adaptive pathways that are activated in TGAC8 heart at 3 months of age become faltered with advancing age and how changes in these pathways relate to the altered cardiac structure and function of the TGAC8 as age advances. Following that, we will focus on each of these pathways employing detailed mechanistic analyses. Our provisional hypothesis is that while AC8 activity will continue to be increased as age advances, its downstream signaling will begin to fail due to age-associated changes in proteostasis and in the expression of proteins, including those involved in energy metabolism.

      Weaknesses: 

      1.  As acknowledged by the investigators, this is a hypothesis-generating rather than hypothesistesting study. 

      Yes, we used a systems approach at first, in order to “open our eyes” so that we could get an overview of numerous changes that might have occurred in the TGAC8 heart in order to generate hypotheses that could later be tested by others and by us.”

      2.  The investigators posit that there is decreased wall stress and adequate energy production due to a shift in metabolism. These findings would suggest that this model would be suitable for that of an athlete's heart, which is characterized by thickened left ventricular walls without a compromise in function. However, the mice do develop heart failure after 1 year without a sense of mechanism despite the wealth of data provided. Are the authors able to comment on what changes described in this study of this transgenic line may be deleterious in the long run? 

      We have addressed these comments above in our response to your comment #4 under strengths.

      3.  Figure 5B is referenced to support the claim regarding beta adrenergic receptor desensitization, but the data show catecholamine levels in tissue. I would have expected receptor expression analysis to suggest up/downregulation of receptors at the membrane to support this claim. 

      Beta adrenergic receptor desensitization can occur due to changes in molecules that inhibit signaling that are at the receptor or at the signaling downstream of the receptor in the absence of changes in receptor number. Here is how we summed this up in our manuscript:  “Numerous molecules that inhibit βAR signaling, (e.g. Grk5 by 2.6 fold in RNASEQ and 30% in proteome; Dab2 by 1.14 fold in RNASEQ and 18% in proteome; and β-arrestin by 1.2 fold in RNASEQ and 14% in proteome) were upregulated in the TGAC8 vs WT LV (Table S.3, S.5 and S.9), suggesting that βAR signaling is downregulated in TGAC8 vs WT, and prior studies indicate that βAR stimulation-induced contractile and HR responses are blunted in TGAC8 vs WT.8,11… A blunted response to βAR stimulation in a prior report was linked to a smaller increase in L-type Ca2+ channel current in response to βAR stimulation in the context of increased PDE activity.13, 14 WB analyses showed that PDE3A and PDE4A expression increased by 94% and 36%, respectively in TGAC8 vs WT, whereas PDE4B and PDE4D did not differ statistically by genotype (Figure 16-supplement 1 A). In addition to mechanisms that limit cAMP signaling, the expression of endogenous PKI-inhibitor protein (PKIA), which limits signaling of downstream of PKA was increased by 93% (p<0.001) in TGAC8 vs WT (Table S.3). Protein phosphatase 1 (PP1) was increased by 50% (Figure 16-supplement 1 A). The DopamineDARPP-32 feedback on cAMP signaling pathway was enriched and also activated in TGAC8 vs WT (Figure 15), the LV and plasma levels of dopamine were increased, and DARPP-32 protein was increased in WB by 269% (Figure 16-supplement 1 A).

      Thus, mechanisms that limit signaling downstream of AC-PKA signaling (βAR desensitization, increased PDEs, PKI inhibitor protein, and phosphoprotein phosphatases, and increased DARPP32, cAMP (dopamine- and cAMP-regulated phosphoprotein)) are crucial components of the cardio-protection circuit that emerge in response to chronic and marked increases in AC and PKA activities (Figure 4 C, F).” 

      4. Changes in ion channel (e.g. KCNQ1 and KCNJ2) gene and protein expression were described but not validated by assessment of change in function. 

      Reviewer #3 (Public Review): 

      Tarasov et al have undertaken a very extensive series of studies in a transgenic mouse model (cardiomyocyte-specific overexpression of adenylyl cyclase type 8) that apparently resists the chronic stress of excessive cAMP signaling for around a year or so without overt heart failure. Based on the extensive analyses, including RNAseq and proteomic screening, the authors have hunted for potential "adaptive" or "protective" pathways. There is a wealth of information in this study and the experiments appear to have been carefully performed from a technical viewpoint. Many interesting pathways are identified and there is plenty of information where additional experiments could be designed. 

      General comments 

      1. Ultimately, this is a descriptive and hypothesis-generating study rather than providing directly proven mechanistic insights.

      As noted in response to Reviewer #2: “Yes, we used a systems approach at first, in order to “open our eyes” so that we could get an overview of numerous changes that might have occurred in the TGAC8 heart in order to generate hypotheses that could later be tested by others and by us.”

      -Given several prior studies reporting a detrimental effect of chronically increased cAMP signaling, what is it that is different in this model? Is it something specific about AC8? Is it to do with when (in life) the stress commences? 

      We believe it is, at least in part, due to something specific about the effects of the marked increased activity of AC8 perse, because adenylyl cyclase singling impacts nearly all aspects of our current knowledge of cell biology. Thus, due to the marked increase of AC and PKA activation in the TGAC8 heart, the transcriptome and proteome gene ontology (GO) terms that differ in TGAC8 vs. WT covered nearly all biological processes and molecular functions within nearly all compartments of the TGAC8 LV myocardium.

      - Is the information herein relevant to stress adaptation in general or is it just something interesting in this specific mouse model?

      In our opinion, AC8 mouse model is very relevant to stress adaptation in general, but this broad view has hardly ever been realized previously in the literature, because of the reductionist nature (by necessity) of mainstream biomedical research. For example, reports on cardiac specific overexpression of AC5 and AC6 never provided broader view on these mice and were focused only on a limited number of traits i.e., arrhythmogenesis, chronic pressure overload, contraction (Am J Physiol Heart Circ Physiol. 2015 Feb 1;308(3):H240-9; Am J Physiol Heart Circ Physiol. 2010 Sep;299(3):H707-12; Clin Transl Sci. 2008 Dec;1(3):221-7; Proc Natl Acad Sci U S A. 2003 Aug 19;100(17):9986-90; Am J Physiol Heart Circ Physiol. 2013 Jul 1;305(1):H1-8). 

      None of the pathways that are apparently activated were directly perturbed so their mechanistic role requires further study.

      We agree and have entitled a section of our discussion “Opportunities for Future Scientific Inquiry Afforded by the Present Results” to address this plainly.

      Specific 

      1. The strain of the mice and their sex needs to be stated as well as the exact age at which the various assays were performed.

      All assays were performed on 3-month-old males. This information was inadvertently not directly stated in the original submission.  

      2. The hearts of the Tg mice have more cardiomyocytes but which are smaller. Since there is no observed increase in proliferation of cardiomyocytes, how (or when) did this increase in cell number occur?   

      It is likely that an increase in number of cardiomyocytes may have occurred during the embryonic stage of development (8.5 dpc), when AC8 expression begins. Since submitting our manuscript we have found that the expression level of human AC8 (the type of AC8 employed in this transgenic model) increases markedly during the embryonic period when compared to endogenous AC8 and remains elevated in both the fetal and perinatal periods. 

      3. While the mice do not show an increased mortality up to 12 months of age, HR/CO/EF are poor indices of contractile function. Data on end-systolic elastance or perhaps echo-based LV strain indices which will be relatively load-independent would be useful.

      Numerous comprehensive hemodynamic measurements have been performed previously on this mouse. For example, Mougenot et. al (Ref 10 in our manuscript), based on invasive hemodynamics analysis concluded that contractile function in the TGAC8 heart was increased at both 2 and 12 months of age. But Doppler imaging of the heart in conscience mice, unmasked, myocardial dysfunction, informed by a reduction in systolic strain rate in both old TGAC8 and WT littermates. This is why they attributed the heart failure in TGAC8 at 12 months of age to be a manifestation of accelerated aging.

      We agree with your comment that end-systolic elastance ought to be measured in the TGAC8 but also end-diastolic elastance, and effective arterial elastance should be measured in order to quantify diastolic function and heart energetic coupling in the TGAC8.  

      4.  Quite a lot of conclusions are made relating to metabolism. However, this is entirely based on gene expression or protein levels. Given the substantial role of allosteric regulation in metabolic control, as well as the interconnectedness of metabolic pathways, ultimately any robust conclusions need to be based on an assessment of activity of key pathways. 

      We concur and have described some of the types of metabolic assessments in the last section of our discussion “Opportunities for Future Scientific Inquiry Afforded by the Present Results”: “… precisely defining shifts in metabolism within the cell types that comprise the TGAC8 LV myocardium via metabolomic analyses, including fluxomics.97 It will be also important that future metabolomics studies elucidate post-translational modifications (e.g. phosphorylation, acetylation, ubiquitination and 14-3-3 binding) of specific metabolic enzymes of the TGAC8 LV, and how these modifications affect their enzymatic activity”.

    1. Author Response

      Reviewer #1 (Public Review):

      In their manuscript, these authors present a novel geostatistical framework for modelling the complex animal-environment-human interaction underlying Leptospira infections in a marginalised urban setting in Salvador, Brazil.

      In their work, the authors combine human infection data and the rattiness framework of Eyre et al. (Journal of the Royal Society Interface, 2020) . They use seroconversion defined as an MAT titer increase from negative to over 1:50 or a four-fold increase in titer for either serovar between paired samples from cohort subjects. Whereas this is a commonly used measure of infection; the work would benefit from answering the question about how robust results are related to this definition of seroconversion.

      Thank you for your comment. We have acknowledged this on line 534 in the discussion by adding the following text: “A possible limitation of this study is the titre rise cut-off values used for classifying seroconversion and reinfection in the cohort that determine the sensitivity and specificity of the infection criteria. However, these criteria were used because they are the standard definitions for serological determination of infection that are commonly applied for leptospirosis and a wide range of other infections, and they enable the comparison of results with other previous leptospirosis studies.”

      The model framework relies on the concept of 'rattiness' previously defined by Eyre et al. (JRSI, 2020) and assumes conditional independence within its built up (equation (1)). Whereas this is a reasonable assumption, it would be good to discuss situations in which this assumption is questionable and what the implications are for applying the modelling framework to other settings.

      We have added the following text immediately after “is shown schematically in Figure 2” following equation (1) on line 225: “The conditional independence assumption in (1) is reasonable for a vector-borne disease or one that is transmitted indirectly, in which context the observed rat indices are to be considered as noisy indicators of the unobservable spatial variation in the extent to which the environment is contaminated with rat-derived pathogen. It would be more questionable for applications in which the disease of interest is spread by direct transmission from rat to human.”

      The authors provide an extensive model building exercise and investigate, in different ways, whether the model captures the necessary complexity (GAM smoothers - testing linearity, spatial correlation, etc). I believe the work would benefit from (1) a formal diagnostic investigation, if feasible; (2) providing guidelines on how model building should be performed.

      We have added a new Appendix 7 with diagnostic plots of randomized quantile residuals to check the rattiness-infection model fit with the human infection data and included the following text in Section 2.4 of the main text: “A formal diagnostic investigation of randomized quantile residuals is included in Appendix 7. We found no evidence in the diagnostic plots to suggest that there were issues with our modelling approach.”

      To supplement the R code that is publicly available for repeating all of the steps in this analysis, we have now also included a detailed step-by-step explanation of the model building process in Appendix 8 that outlines the key steps for building the rat and infection components of the model (variable selection and evaluation of residual spatial autocorrelation) and fitting and examining the joint rattiness-infection model. We have added the following text in Section 2.6 of the main text: “We also include a step-by-step explanation of the model building process to guide future users of the rattiness-infection framework in Appendix 8.”

      The authors are to be acknowledged for providing an extensive and thorough discussion of the different aspects of their work. Whereas the discussion is complete, I wonder whether the authors can give a brief example about how this model can be applied in a different setting.

      Thank you. We have added the following text on line 551 in the discussion: “The framework may have important applications beyond the study of zoonotic spillover, with the rattiness component replaced by other exposure measures e.g. mosquito density or ecological indices (such as pollution, where there are multiple, related measures of air or groundwater quality) to model associations with human or animal health outcomes.”

      Reviewer #2 (Public Review):

      Eyre et al. developed and applied a novel geostatistical framework for joint spatial modeling of multiple indices of pathogen (Leptospira) reservoir (rats) abundance and human infection risk. This framework enabled evaluation of infection risk at a fine spatial scale and accounted for uncertainty in the pathogen reservoir abundance estimates. The authors used data collected in two different field projects: (1) a rat ecology study in which three different approaches were used to detect rat presence "rattiness", and (2) a prospective community cohort study in which individuals were sampled during two different time periods to detect recent infections via seroconversion or a four-fold increase in anti-Leptospira antibody MAT titer. Univariable and then multivariable analyses were performed on these data to identify (1) the environmental variables that best predicted "rattiness", and (2) the demographic/social, environmental (household), occupational, and behavioral variables that best predicted human risk of infection. Once identified, the best predictors from (1) and (2) were included in a final, joint model to identify the significant predictors of both 'rattiness' and human infection risk. As a result of this study, the authors were able to detect spatial heterogeneity in leptospiral transmission to humans. They found that infection risk associated with increases in reservoir abundance differed by elevation, and that increases in reservoir abundance at high elevation were associated with a much higher odds ratio for infection than at low elevation. The authors suggest that this has to do with differences in how the infectious leptospires (shed by the rat reservoir) are dispersed in the environment. At high elevations, flooding is less frequent and thus rat shed leptospires are likely to stay where the rat deposited them. Whereas at lower elevations, flooding may play a large role in spreading leptospires more evenly across the landscape, reducing the importance of rat presence at smaller spatial scales. The final best model was then used to generate prediction maps of 'rattiness' as well as human infection risk at all locations within the study area (i.e. including those that lacked rat detection data and human infection data. This work represents an important advance in infection risk modeling as it explicitly incorporates estimates of reservoir abundance and the uncertainty surrounding these estimates into the infection risk assessment, and allows for modeling of infection risk at fine spatial scales. Findings from this study have important management implications at the authors' study site as it suggests that interventions directed at high elevations should be different from those designed to address infection risk at lower elevations. However these are broader implications, as this novel approach may be applied to other systems to enable identification of differences in infection risk for other pathogens at a fine spatial scale, predict infection risk more broadly, and facilitate intervention strategies targeted for the specific epidemiological and ecological conditions experienced by a population.

      This was a well-designed study. The field sampling approach was well balanced, well described and appropriate. Broadly the modeling framework is appropriate for the questions being asked and for the data being used. The variable and model selection approaches were clearly described and appropriate. Evaluation of the more detailed mathematical approach is outside of my area of expertise, so I am unable to comment on the validity of the approach.

      For the most part, the explanatory variables assessed in the different models were well described and justified, however there were some cases for which further explanation would have been helpful. For example, how did the authors determine which occupations to evaluate? Specifically, why traveling salesperson? What is the difference between open sewer within 10 m and unprotected from sewer?

      We have added the following additional text to Section 2.3.2 on line 297 to clarify the definition and reason for inclusion for these variables: “In the household environment domain, two variables were used to capture risk due to sewer flooding close to the household: i) the presence of an open sewer within 10 metres of the household location and ii) a binary `unprotected from open sewer' variable which identified those households within 10 metres of an open sewer that did not have any physical barriers erected to prevent water overflow. Three high-risk occupations were included in the occupational exposures domain as binary variables. Construction workers and refuse collectors have direct contact with potentially contaminated soil, building materials and refuse in areas that provide harbourage and food for rats. Travelling salespeople have regular and high levels of exposure to the environment (particularly during flooding events) as they move from house to house by foot. Two other binary occupational exposure variables were included that measured whether a participant worked in an occupation that involves contact with floodwater or sewer water.”

      I also had some concerns regarding the time-period of the rat ecology study used to determine abundance, potential fluctuations in rat abundance through time, and how this might align with sampling to detect infection in humans. Depending on the time scale of population fluctuation in rats as well as fluctuations in infection prevalence in rats, the abundances calculated from data from the ecology study may not be accurately reflecting true abundance (and therefore shedding and transmission risk) during the time period that a human may have been exposed. However, the authors do a nice job of addressing some of these issues in the discussion. They mention that infection prevalence in rats is consistently around 80% and that there don't appear to be seasonal fluctuations in human exposure risk in the study area.

      Thank you.

      Reviewer #3 (Public Review):

      The goal of the authors was to test how important local rat abundance is as a driver of Leptospira infection in humans.

      The authors approached this using a strong combination of datasets on human infection risk and rat abundance, across a spatial scale that is large enough to allow simultaneous assessment of multiple potentially important drivers of infection risk. This further enables the authors to develop infection prediction maps based on the fitted models.

      This study design is a major advance towards understanding link between rat abundance and human infection risk.

      Based on the top models tested in the study, the authors conclude that local rat abundance is indeed correlated with infection risk, and that this correlation is strongest at higher elevation.

      This is an impactful finding, but in my opinion it is not yet clear how robust and important this is, because of two reasons:

      (1) The infection risk data: while the actual infection risk data are not shown, the map shown in Figure 5B suggests that there is an infection hotspot that happens to be at high elevation. This raises the question of how strongly this single hotspot is driving the observed correlation between rat abundance and infection risk (which the authors find to be much stronger at high elevation than at lower elevations).

      We have added a new figure (Figure 4) earlier on in the article (we decided to add this here rather than to Figure 6 - formerly Figure 5 - to ensure that the map is large enough that points in Figure 4A are easily visible – please note that it is included as a larger and easier to view image in the main eLife template version) with the raw infection data overlaid on contour lines for the three elevation levels to provide the reader with a better overview of the raw data. This new Figure 4 shows that out of a total of 403 participants in the high elevation region there were 16 infections, of which only 5 (31%) were located in the large hotspot in Valley 3 (valleys are numbered 1 to 3 from west to east, see Figure 1A). In addition to the largest hotspot in the north of Valley 3, there are several other areas in the high elevation region with raised predicted infection risk values relative to their surroundings where there were also rattiness hotspots and infected participants in the raw data: fives cases (red and yellow infection risk areas in Figure 5B) on the western side of Valley 2; the two cases on the eastern edge of Valley 2; the two cases on the western edge of Valley 3; and the single case in the southwest of Valley 3. Other variables are also important drivers of infection risk and at several of these locations the contribution of rattiness increases infection risk significantly relative to the low-risk surrounding area (e.g. to 10% in areas where risk is closer to 1% or 2%) without reaching the more obviously visible high infection risk values closer to 20%. We believe that our statistical model provides a better test of whether there is a statistical association between rattiness and infection at high elevations than a visual examination, but that this is supported by the large number of observations in the high elevation area (403) and the distribution of infected and uninfected households, which demonstrates that the observed association is not only driven by the hotspot in Valley 2.

      (2) The statistical models: if I understand correctly, all tested models of infection risk include the variable rat abundance, and while the individual effect estimates for rat abundance are statistically significant (Table 3), the more important question of how the fit of a model without the rat abundance variables compares with those of the other tested models (shown in Supplementary Table S2) has not been addressed.

      These models were considered but were ranked outside of the top five models and for this reason were not reported in Table S2. We agree that showing the AIC of a model without rattiness in this table can more clearly demonstrate the improved fit of the model with rattiness. To do this we have added the highest ranked model without rattiness (M) to Table S2 and added a note to the table explaining the reason for its inclusion (“Model M was ranked outside of the top 5 models but is included here for reference to demonstrate the improvement in model fit when rattiness is included”). The AIC of M* was 532.13. This is substantially higher than the top five models (M1 = 523.14 and M5 = 525.04), justifying its inclusion in this model and in the joint rattiness-infection framework.

      Regardless of whether rat abundance is an important driver of human infection risk, this study is a major step in our understanding of the role of rats in the spread of leptospirosis, due to the strong combination of a unique combination of datasets and a spatial statistical modeling approach.

      Thank you.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript discusses evolutionary patterns of manipulation of others' allocation of investment in individual reproduction relative to group productivity. Three traits are considered: this investment, manipulation of others' investment, and resistance to this investment. The main result of the manuscript is that the joint evolution of these traits can lead to the maintenance of diversity through, as documented here, cyclic (or noisier) dynamics. Although there are some analytical results, this main conclusion is instead supported by individual-based simulations, which seem correctly performed (but for clonal populations, as emphasized below).

      There could be material for a good paper here but the organization of the manuscript makes it difficult to fully evaluate. The narrative is highly condensed, with the drawbacks that this often entails in terms of accurately conveying the results of a study, as illustrated here by the following issue.

      The population is apparently assumed to be clonal (more than just "haploid"), meaning that there is no recombination between the loci controlling the three traits. In the one case where this assumption is relaxed (quite artificially), the cyclic dynamics disappear (section 4.4 of the appendix). This is crucial information that cannot be appreciated in the main text.

      The paragraph at line 368 offers a simple explanation for the joint dynamics of traits. However, this explanation would hold identically for a sexual population and a clonal population, whereas these two cases seem to have completely different dynamics. Thus, there is something essential to explain these differences, that is missing from the given explanation.

      Yes, our model was asexual with no recombination. To address this comment, we carried additional simulations where recombination was allowed (Appendix 1— 4.8). We found that recombination does not change our results (predictions), and describe this on line 469-475. By assuming additive effects of traits and each traits having the same dispersal property, our haploid asexual model is also equivalent to a diploid sexual model (Taylor 1996; Day & Taylor 1998).

      This is especially important because the finding that the joint evolution of several traits can lead to some form of diversity maintenance is not surprising. As the discussion acknowledges (but the introduction seems to downplay), it is also well understood that manipulation and counter-adaptations to it can occur in many contexts and lead to the maintenance of diversity. For this reason, similar results in the present case are not surprising, and the main outcome of the study should be to provide a deeper understanding of the forces leading to the different outcomes in the current models.

      I do not see clearly what distinguishes "manipulative cheating" from other forms of manipulations that have been previously discussed in the literature (e.g, as cited lines 461). Couldn't this be clarified by some kind of mathematical criterion?

      Thanks for pointing out that there is room to improve the distinction between our model and previous models! We have added more description to explain the conceptual difference on line 187-193, and a new subsection in appendix to show these differences through mathematically examine the fitness formulations in previous models (Appendix 1—1.3).

    1. Author Response

      Reviewer #1 (Public Review):

      This paper addresses an important question: whether the conduction velocity in white matter tracts is related to individual differences in memory performance. The authors use novel MRI techniques to estimate the "g-ratio" in vivo in humans - the ratio of the inner axon relative to the inner axon plus its outer myelin sheath. They find that autobiographical recall is positively related to the g-ratio in a specific white matter tract (the parahippocampal cingulum bundle) in a population of 217 healthy adults. This main finding is extended by showing that better memory is associated with larger inner axon diameters and lower neurite dispersion, which suggests more coherently organised neurites. The authors also argue that their results show that the magnetic resonance (MR) g-ratio can reveal novel insights into individual differences in cognition and how the human brain processes information.

      The study is exploratory in nature and the analyses were not pre-registered. The technique has not been used before to associate cognitive performance with MR estimates of conduction velocity in candidate white matter tracts. It is therefore unknown how strong any associations are likely to be and what sort of sample size might be needed to observe them. Nevertheless, if the technique proves to be reliable, then it certainly offers a valuable new tool to understand individual differences in cognitive abilities. However, brain structure to behavior associations are notoriously variable across studies and have been argued to require very large sample sizes to obtain reproducible results.

      We respectfully disagree that the study was exploratory. We had distinct aims and hypotheses from the outset. Our prime interest is in autobiographical memory, the hippocampus and its connectivity. This motivated our focus on three specific white matter tracts. We also planned from the time of study design to examine the MR g-ratio, and even contributed to refining the pre-processing pipeline for this approach, as reported in a previous paper (Clark et al., 2021, Frontiers in Neuroscience). Moreover, in the current manuscript we outlined well thought through possible outcomes and declared specific predictions.

      Regarding pre-registration, due to the scope of this work, the experiment was planned eight years ago, and data collection commenced seven years ago. At that time, formal pre-registration was not common practice. However, it has been a long-standing feature of our Centre that proposed studies and their analysis plans undergo rigorous internal peer review, including presentation to the whole Centre, before data acquisition can commence. The proposal for the research under consideration here was presented on 26th September, 2014.

      As noted in our response to the Editors’ Public Evaluation Summary above, someone has to be the first to report a novel result, and we believe that the depth and transparency of our approach permits confidence in the findings. Not least, and to reprise, because we employed the most widely-used and best-validated method of testing autobiographical memory recall that is currently available – Levine’s Autobiographical Interview. Our primary analyses were performed using the behavioural outcome measure from this test, the results of which were directly compared to those from a closely-matched control measure to test whether significantly larger effects were observed for our variable of interest. The potential for false positives was further reduced by extracting microstructure data from hypothesised tracts of interest (instead of performing whole brain voxel-wise analyses), with statistical correction performed on all structure-behaviour analyses. Moreover, we performed partial correlations with age, gender, scanner and number of voxels in a region of interest (ROI) as covariates. Complementary investigations were also conducted using other commonly-reported measures, providing supporting evidence. We report all analyses (and provide all the source data), including those finding no relationships. The consistent results throughout were associations between autobiographical memory recall ability and the microstructure of the parahippocampal cingulum bundle only. Moreover, thanks to the excellent suggestions of the Reviewers, the revised version reports additional analyses that allow us to further corroborate and interpret our findings.

      Our sample of 217 participants allowed for sufficient power to identify medium effect sizes when conducting correlation analyses at alpha levels of 0.01 and when comparing correlations at alpha levels of 0.05 (Cohen, 1992, Psychological Bulletin). While it has recently been suggested that thousands of participants are required in order to investigate brain structure-behaviour associations (Marek et al., 2022, Nature), other, more sophisticated, analyses suggest that samples of ~200 participants can be sufficient, in line with our estimates (Cecchetti and Handjaras, https://psyarxiv.com/c8xwe; DeYoung et al., https://psyarxiv.com/sfnmk). Given that our study was principled, well-controlled, analysed appropriately and produced very specific and consistent findings, we are confident that the findings are robust.

      The authors decided to analyse performance on a single task - the Autobiographical Memory Interview - and identified three candidate white matter tracts that connect the hippocampal region with other brain regions. While it is clear why these three tracts were chosen, it is less obvious why the authors chose to investigate associations with the Autobiographical Memory Interview and not other memory tests that were part of the battery of tests administered to the participants. It is reasonable to assume that something as general as the conduction velocity of a white matter tract would have an effect on memory ability across a range of tasks, so to single out one seems an unnecessarily narrow focus.

      Our main interest over many years, and hence the focus of this study, is autobiographical memory recall because it directly relates to how people function in real life. As noted above, autobiographical experiences occur in dynamic, multisensory, multidimensional, non-linear, ever-changing contexts; they involve actively engaging with the environment and other people; they are embodied; they span milliseconds to decades. Many of these features cannot be captured by laboratory-based episodic memory tests. This issue is increasingly being discussed (for example, see recent reviews by Nastase et al., 2020, NeuroImage; Mobbs et al., 2021, Neuron; Miller et al., 2022, Current Biology). It is further laid bare in McDermott et al.’s (2009, Neuropsychologia) meta-analysis of functional MRI studies which showed that laboratory-based and autobiographical memory retrieval tasks differ substantially in terms of their neural substrates. Consequently, we were not surprised to find that when we analysed laboratory-based memory test performance, there were no correlations with the MR g-ratio. Recall of vivid, detailed, multimodal, autobiographical memories may rely on inter-regional connectivity to a greater degree than simpler, more constrained laboratory-based memory tests. Therefore, as well as speaking to conduction velocity, these findings also contribute to wider discussions about real-world compared to laboratory-based memory tests. We thank the Reviewer for making the excellent suggestion to include these additional data, analyses and discussion points.

      The results of the study are interesting and highlight a key role of the parahippocampal cingulum bundle in autobiographical memory recall. The results are corrected for multiple comparisons across the three fiber tracts of interest and the recall of "external details" provides a nice control compared to the "internal details" which are the measure of interest. The main findings are extended to show that it is likely to be an increase in axon diameter and an increase in neurite coherency that characterize those individuals with better autobiographical recall. Despite these positives, it remains unclear whether memory recall, in general, is better in people with higher g-ratios in this tract (as implied in the Abstract), or if this effect is specific to scores on the Autobiographical Memory Interview.

      Our interest is in autobiographical memory, and so we employed the most widely-used and best-validated method of testing autobiographical memory recall that is currently available – Levine’s Autobiographical Interview. Not only does this test include a control measure, external details (as noted by the Reviewer), but we had independent raters score the autobiographical memory descriptions, and found that the inter-class correlation coefficients were very high (see Materials and Methods). Despite using this current, gold standard approach, at the request of the Reviewer we have now analysed data from eight additional laboratory-based memory tests. These are standard memory tests that are often used in neuropsychological studies: testing recall - the immediate and delayed recall of the Logical Memory subtest of the Wechsler Memory Scale IV, the immediate and delayed recall of the Rey Auditory Verbal Learning Test, the delayed recall of the Rey–Osterrieth Complex Figure; testing recognition memory - the Warrington Recognition Memory Tests for Words and Faces; testing semantic memory - the “Dead or Alive Test”. While these tests can assess some aspects of memory recall, they cannot be regarded simply as proxies for autobiographical memory recall, for the reasons we outlined in our response to the previous point. They do not capture key aspects of autobiographical memories. It is therefore all the more interesting that we found no associations between these laboratory-based memory tasks and the MR g-ratio of the parahippocampal cingulum bundle, in contrast to the relationship identified with autobiographical memory recall ability. Recall of vivid, detailed, multimodal, autobiographical memories may rely on inter-regional connectivity to a greater degree than simpler, more constrained laboratory-based memory tests. Therefore, as well as speaking to conduction velocity, these findings also contribute to wider discussions about real-world compared to laboratory-based memory tests. We thank the Reviewer once again for making the excellent suggestion to include these additional data, analyses and discussion points.

      Reviewer #2 (Public Review):

      In this study, Clark and colleagues tackle a very intriguing question: how differences in autobiographical recall abilities reflect in the human brain structure and function? To answer this question, they interviewed a large cohort of subjects and proceeded to acquire MRI data, specifically diffusion-weighted imaging and magnetization transfer data, to estimate the g-ratio, a measure of myelination deeply linked to conduction velocity. Looking at three specific white matter pathways of interest, all interconnecting the hippocampus with other brain structures, they studied the relationship between the g-ratio and the autobiographical recall abilities, together with many more measures from MRI. They found a significant positive association between the g-ratio of the parahippocampal cingulum bundle and the number of inner details from the interviews. These results can provide new potential directions to further study the underlying neural features beyond memory.

      I think that this is a very interesting article, it is well written, the methods are extensively explained, and the appendix provides further details for more expert readers. The authors put an effort into providing a comprehensive context in the introduction and in the discussion, and as a result, the paper seems overall quite suitable for both general and specialistic readerships.

      Thank you.

      The main issue I can currently see in the paper is that the mentioned relationship between g-ratio and recall abilities is then used to infer that better recall abilities are associated with higher conduction velocity and larger axons. The authors' line of reasoning is that given the hypothesized association, the increase in the g-ratio implies increases in myelin and axonal diameter. Despite this scenario being indeed possible given the current result, an increased g-ratio may also not indicate higher conduction velocity. In fact, the first potential inference would be that, without having any information on the axon size, the quantify of myelin can indeed be lower and as result, the conduction velocity would decrease. I understand that the authors expected higher conduction velocity associated with better autobiographical memory recall, but it is hard to see any experimental outcome that could have disproved this hypothesis: from the possible scenarios depicted in the introduction, any change in the g-ratio (and even not any change at all) could indicate higher conduction velocity. What would be then needed to corroborate one of these scenarios is some independent or complementary measure, which unfortunately is missing.

      The mentioned issue does not mean that the paper loses relevance - I think that it should focus on the very practical result, a change in myelination and microstructure, and discuss what are the potential implications, including the one that currently dominates the discussion section.

      Thank you for these comments and the opportunity to provide further clarification.

      First, we have now provided additional background information regarding the relationship between the MR g-ratio and conduction velocity. We explicitly note that while finding a significant relationship between the MR g-ratio and autobiographical memory recall suggests the existence of an association between autobiographical memory recall and parahippocampal cingulum bundle conduction velocity, it cannot speak to the direction of this association.

      Second, we have further noted that interpretation of the parahippocampal cingulum bundle MR g-ratio in relation to the underlying microstructure requires knowledge, or an assumption, about whether the associated change in conduction velocity is faster or slower. Given that faster conduction velocity is thought to promote better cognition (e.g. Brancucci, 2012; Dicke and Roth, 2016; Miller, 1994; Reed and Jensen, 1992), we interpreted our MR g-ratio findings under the assumption of faster conduction velocity, and now explicitly note in several places in the revised manuscript that this is an assumption.

      Third, we thank the Reviewer for the excellent suggestion that a complementary measure could help to further inform the findings. Consequently, we now also include additional analyses examining the relationship between the extent of myelination and autobiographical memory recall ability. This is possible using the magnetisation transfer saturation maps, which are optimised to assess myelination. Given our assumption of faster conduction velocity when interpreting our positive MR g-ratio correlations, then no relationship between parahippocampal cingulum bundle magnetisation transfer saturation and autobiographical memory recall would be expected. On the other hand, if conduction velocity is actually decreasing, then a negative correlation between magnetisation transfer saturation values and autobiographical memory recall ability would be observed. In fact, we found no relationship between parahippocampal cingulum bundle magnetisation transfer saturation and autobiographical memory recall. This suggests that myelin was not associated with autobiographical memory recall ability, supporting our assumption that relationships with the MR g-ratio were indicative of faster rather than slower, conduction velocity.

      We have now added these new data, analyses and discussion points to the revised manuscript.

      It would also be helpful to include some paragraphs on both interpretation and methodological issues when it comes to MRI-based microstructural imaging, which at the moment is lacking. This would provide a better picture of the results for a more general readership.

      We agree, and additional consideration of interpretational and methodological limitations have now been included in the manuscript.

      As one of the first works using an MRI-based microstructural measure of myelin, the g-ratio, to study cognition in a large cohort of subjects, I think this work will be a needed and significant step towards merging the neuroscience and MRI physics community - the methodology presented here is robust and could be used in many other applications.

      Thank you.

      Reviewer #3 (Public Review):

      The manuscript adds useful information about how structural properties of the brain are related to individual differences in autobiographical memory. A novel metric is used to assess features of white matter in tracts that are important for information exchange between the hippocampus and other brain regions. In one of these, the parahippocampal bundle, a relationship between the MR g-ratio and autobiographical memory recall is identified. This represents new and interesting information. The authors interpret the results in line with the theory that speed of signal transmission is important for cognitive function.

      Thank you for this positive summary.

    1. Author Response

      Reviewer #1 (Public Review):

      Rasicci et al. have developed a FRET biosensor that is designed to light up when cardiac myosin folds. This structure is extremely important to understand, and its link to the super-relaxed (SRX) state has not been fully shown. Their study provides a comprehensive review of the literature and provides compelling data that the 15 heptad+leucine zipper+GFP construct does function well and that the DCM mutant E525K has a similar IVM velocity despite a reduced ATPase compared with HMM. They rely on the ionic strength-dependent changes in the rate of MantATP release to argue that the E525K mutation stabilizes the 'interacting heads motif' (IHM) state, which makes logical sense.

      Strengths:

      Well written and comprehensive.

      Utilizes the appropriate fluorescence-based sensor for measuring the folding of the myosin structure. Provides a detailed range of techniques to support the premise of the study

      Weaknesses:

      Over-interpretation of the outcomes from this study means that the IHM and SRX are the same. Similar studies, e.g. Anderson 2018 and Chu 2021 support the opposite view that IHM and SRX are not necessarily the same, Anderson (and Rohde 2018) point out that S1 has some element of a reduced ATPase, this clearly cannot be due to folding of the molecule. Also, mavacamten was used in these studies to show that even S1 is inhibited suggesting that SRX and IHM are not connected. This is not to say that with enough supporting evidence that these observations cannot be over-ridden, it is just not clear that there is enough in this study to support this conclusion.

      We have revised our discussion to emphasize that our results support a model in which the SRX state is enhanced by formation of the IHM, but given the S1 and 2HP data the IHM may not be required for populating the SRX biochemical state (see page 8).

      I felt that the authors passed over the recent Chu 2021 paper too quickly, the Thomas group used a FRET sensor as well and provides a direct comparison as a technique, but with opposite conclusions. They also have supporting data in Rohde 2018 that their constructs were less ionic strength sensitive. It would be useful to understand what the authors think about this.

      We have discussed the Rohde and Chu papers in more detail in the discussion (see page 8). In the Rhode paper they used proteolytically prepared HMM and S1. Rohde found 20% SRX at all KCl concentrations in S1, while HMM shifted from 50% to 20% SRX in low and high salt conditions, respectively. Our results are different in terms of the absolute fraction of the SRX state but the trend is similar in terms of S1 being salt-insensitive and HMM being salt-sensitive. The difference could be proteolytic HMM, which is a longer construct, and proteolytic S1, which is prone to internal cleavage that can impact ATPase activity. Another difference could be the mixed isoform of mantATP used in previous studies and the single isoform of mantATP used on our study (see page 5)

      Reviewer #2 (Public Review):

      The paper by Rasicci et al. examines the impact of the DCM mutation E525K in beta-cardiac myosin on its function and regulation by autoinhibition. The role of the auto-inhibited state of beta-cardiac myosin in fine-tuning cardiac contractility is an active and exciting area of current research related to muscle biology and cardiomyopathies. Several studies in the past have linked the destabilization of the autoinhibited, super-relaxed (SRX) state of myosin to the pathogenesis of hypertrophic cardiomyopathy. This timely study provides one of the first few examples where the hypocontractile phenotype of a DCM mutation has been linked to the stabilization of the SRX state.

      One of the strengths here is the utilization of a wide variety of both pre-existing and novel biochemical and biophysical assays for the study. The authors have characterized a new two-headed long-tailed myosin construct containing 15-heptad repeats of the proximal S2 (15HPZ), which they show allows myosin to form the SRX state in vitro using single ATP turnover assays. The authors go on to compare the E525K and WT proteins using the 15HPZ myosin construct in terms of their steady-state actin-activated ATPase activity, in-vitro actin-sliding velocity and single ATP turnover measurements. These assays reveal that the predominant effect of this mutation is the stabilization of the SRX state which is maintained even at 150 mM salt concentration where the WT SRX is largely disrupted. This is an important observation because DCM mutations so far have been believed to only affect the force-generating capacity of myosin.

      One of the biggest strengths of this study is the attempt to develop a FRET-based approach to directly ask if the biochemical SRX state here correlates well with the structural IHM state, which is an important unresolved question in the field. The authors have designed a FRET pair (C-terminal GFP and Cy3ATP bound to the active site) that is sensitive to the relative position of the heads and the tail, allowing them to distinguish between the low-FRET closed IHM conformation and the no-FRET open conformation. Remarkably, the authors show that the salt dependence of the FRET efficiency values closely follows their results from the salt dependence of the percent SRX for both WT and E525K proteins. The authors then attempt to substantiate their FRET results by a direct visual analysis of the conformational states populated by both WT and E525K proteins at low salt using negative staining EM analysis. The authors have optimized conditions to allow the deposition of the IHM state on grids without adding the small molecule mavacamten, which was found to be necessary in an earlier study to visualize the closed state using EM. The authors conclude that the SRX state correlates well with the IHM state and that the E525K mutation indeed stabilizes the folded-back conformation of myosin.

      This study significantly strengthens the previously illustrated correlation between the SRX and IHM states and provides methodological advances (especially visualization of the IHM state by negative EM in the absence of cross-linking agents) that will be very useful to the field going forward. The observation that a DCM mutation can lead to stabilization of the folded back state is a novel insight that should spark interest in the field to test how broadly this applies to other DCM mutations. The conclusions of the paper are mostly supported by the data; however, some clarifications and qualifications are needed.

      Weaknesses:

      The extremely low enzymatic activity of the M2β 15HPZ myosins as compared to the WT S1 control (which is a historical control not assayed in parallel with the 15HPZ proteins), is concerning for the low protein quality of the 15HPZ myosins. The authors attribute the low kcat to the high proportion of SRX population in their ensembles. However, the DRX rates reported for the WT and E525K 15HPZ proteins in the single ATP turnover assay are ~3-4 fold lower than those of their S1 counterparts. These rates reflect basal turnover of ATP in the open state and thus should not be affected by the presence of the S2 tail, which leads to concerns about the 15HPZ protein activity. In addition, the very high percentage of stuck filaments in the in vitro motility assay for the 15HPZ constructs (despite the use of dark actin) is also concerning for significant amounts of enzymatically inactive protein.

      We thank the reviewer for pointing out the differences in the S1 and HMM DRX rates. We performed additional single turnover measurements with S1, adding two sets of measurements from one additional preparation (N=3), and we demonstrate that there is a significant increase in the DRX rates of WT S1 compared to WT HMM (see pages 4-5, Table 3, Figure 3- figure supplement 3). A faster rate in S1 was also reported in Rohde et al. 2018. Indeed, the DRX rates of E525K S1 are significantly higher than WT in S1, which we also now report in the results (see page 5, Figure 3 – figure supplement 3). We addressed the concerns about 15HPZ activity by performing NH4+ ATPase assays to demonstrate that the number of active heads was similar in S1 and 15HPZ HMM (see page 4). It is possible that the higher percentage of stuck filaments in the HMM motility is due to myosin heads in the IHM state on the motility surface, which generate a drag force by non-specifically interacting with actin, but further study is necessary to examine this question.

      The authors assert that the E525K mutation represents a new mechanism by which DCM-causing mutations lead to decreased contractility - by stabilizing the sequestered state rather than affecting motor function. However, there is no evaluation of the motor function (actin-activated ATPase activity or in vitro motility) of the E525K S1, which would reveal the effects of the mutation without confounding effects due to the sequestering of heads. Interestingly, in the single ATP turnover assay, the DRX rate of the E525K S1 is >2-fold higher than the WT control, suggesting that the mutation may have effects beyond stabilization of the SRX state. The conclusion that the E525K mutation's effect on myosin function is mediated via stabilization of the SRX state would be strengthened if the effects of the mutation on the motor domain alone were also known.

      We thank the reviewer for this suggestion. We performed actin-activated ATPase assays with WT and E525K S1 and found that E525K increases kcat and lowers KATPase, demonstrating enhanced intrinsic motor activity in the mutant S1 construct (see page 4, Figure 2B). This adds an interesting dimension to the manuscript because we report a mutant that enhances the intrinsic motor activity but stabilizes the SRX/IHM (see Discussion page 10). We did not perform in vitro motility, because this assay depends on the surface attachment strategy, and we would like to compare all constructs with the same attachment strategy using a C-terminal GFP tag (mutant and WT S1 and 15HPZ HMM). Therefore, we are making the S1 construct with a C-terminal GFP tag for this purpose, to be examined in a future study.

      While the authors show strong qualitative correlations between the SRX and IHM states using single ATP turnover, FRET, and EM experiments, attempts to quantitatively compare the fraction of heads in the IHM state using the various experimental approaches is problematic. For example, the R0 value of the FRET pair used here doesn't allow precise measurement of the distances being probed here to be made, but the distances are reported and compared to predicted distances. The authors report that the R0 for their FRET pair is 63 Å. Surprisingly the authors go on to use the steady-state FRET efficiency values to determine the average D-A distance (Fig 5B) which is 100 Å when all heads are in the IHM configuration and becomes larger than that when heads open. R0 of 63 Å allows a precise distance measurement to be made in the 31.5-94.5 Å range which corresponds to 0.5-1.5 R0. It is therefore technically incorrect to use the steady-state FRET efficiency values to determine the D-A distance here. Besides, there are several unknown factors here like orientation factor (κ2) which further complicate these calculations. Similarly, the quantification of IHM state molecules from the negative stain EM experiments is significantly hampered by the disruptive effect of the grid surface on the structure of the IHM state. The authors find that limiting the contact time with the grid to ~ 5s is necessary to preserve the IHM state.

      Despite that, only ~15% WT molecules were seen in the IHM state at low salt (Fig. 6B). In contrast, ~56% E525K molecules were in the IHM state. Both these proteins have similar SRX proportions (Fig. 3C) and similar FRET efficiency values (Fig. 5A) at this salt concentration. This mismatch highlights the problem arising due to not having a measure of the populations from the FRET data. It is not clear if the hugely different proportions of the IHM state in EM experiments are indicative of the relative stability of this state in the two proteins or a random difference in the electrostatic interactions of WT vs mutant with the grid. These experiments do not provide a correct idea of the %IHM in the two proteins. In the absence of any IHM population measurement, it is important to proceed with caution when quantitatively correlating the SRX and IHM.

      We thank the reviewer for pointing out that measuring precise distances by FRET can be difficult. We agree that the low FRET efficiency makes precise distance determination even more challenging. However, FRET is quite good at measuring a change in distance given a specific donor-acceptor pair. We feel our FRET biosensor clearly demonstrates FRET efficiencies that are salt-insensitive in E525K but a clear decrease in FRET at higher salt concentrations in WT. In order to compare the trend in the predicted FRET, based on the single turnover measurements, and the actual FRET we thought it was important to plot the two together on the same graph. We understand that this could have been misleading that we were reporting actual distances. We have now plotted the FRET efficiency instead of distance as a function of KCl concentration (Figure 5B), to prevent any confusion with reporting distances. In addition, we have emphasized that the data are plotted to allow for a comparison of the trend in the single turnover and FRET data (see page 6, 10, Figure 5B).

      We agree that it is important to proceed with caution when comparing the EM to the FRET and single turnover data. The EM does not give a quantitative estimate of the fraction of IHM molecules, due to the disruptive effect of the grid surface on protein conformation. However, it does provide direct (though qualitative) evidence that the conformation underlying SRX and enhanced FRET is the IHM, and it is consistent with our interpretation that the E525K mutation enhances FRET and SRX by stabilizing the IHM. To consolidate this result, we have performed EM experiments now with a total of 3 preparations of WT and mutant (see page 6-7 and Figure 6D). We find that while there is variability from experiment to experiment, likely because the grid surface is slightly different each time the experiment is performed, in all cases there was a ~4-fold higher fraction of folded molecules in the mutant. Since each WT/mutant experimental pair was studied in parallel, using identically prepared grids, the results provide further evidence that the mutant stabilizes the IHM. However, we agree that a quantitative, direct visual correlation of the SRX and IHM is not possible based on the current EM data.

      Finally, the utility of the methods described in the paper to the field would be greatly enhanced if they were described in more detail. As currently written, it would be difficult for others to replicate these experiments.

      Thank you for the comment. We have made significant changes in the methods to clarify the details of the experiments (see pages 11-14). In addition, we have added details to the results and figure legends.

    1. Author Response

      Reviewer #1 (Public Review):

      “This study investigates the dynamics of brain network connectivity during sustained experimental pain in healthy human participants. To this end, capsaicin was applied to the tongues of two cohorts of participants (discovery cohort, N=48; replication cohort, N=74). This procedure resulted in pain for several minutes. During sustained pain, pain avoidance/intensity ratings and fMRI scans were obtained. The analyses (i) compare the pain state with a resting state, (ii) assess the dynamics of brain networks during sustained pain, and (iii) aim to predict pain based on the dynamics of brain networks. To this end, the analyses focus on community structures of time-evolving networks. The results show that sustained pain is associated with the emergence of a brain network including somatomotor, frontoparietal, basal ganglia and thalamic brain areas. The somatomotor area of the tongue is particularly involved in that network while this area is decoupled from other parts of the somatomotor cortex. Moreover, the network configuration changes over time with the frontoparietal network decoupling from the somatomotor network. Frontoparietal-cerebellar connections were predictive of decreases of pain. Together, the findings provide novel and convincing insights into the dynamics of brain network during sustained pain.

      Strengths

      • The brain mechanisms of sustained pain is a timely and relevant topic with potential clinical implications.

      • Assessing the dynamics of sustained pain and relating it to the dynamics of brain networks is a timely and promising approach to further the understanding of the brain mechanisms of pain.

      • The study includes discovery and replication cohorts and pursues a cutting-edge analysis strategy.

      • The manuscript is very well-written and the results are visualized in an exemplary manner including a graphical outline and summary of the findings.”

      We thank the reviewer for the thoughtful summarization and evaluation of our study.

      “Weaknesses

      • It remains unclear whether the changes of brain networks over time simply reflect the duration of sustained pain or whether they essentially reflect different levels of pain intensity/avoidance.”

      We appreciate the editor and reviewer’s comment on this issue. With the current experimental paradigm, it is difficult to dissociate the pain duration from the level of pain because the delivery of oral capsaicin commonly induces initial bursting and then a gradual decrease of pain over time. That is, the pain duration is correlated with the pain intensity in our task.

      However, when we examined the time-course of the ratings at each individual level (as shown in Figure S2), the time duration explained 53.7% of the rating variance, R2 = 0.537 ± 0.315 (mean ± standard deviation). In addition, if we constrain the beta coefficient of the time duration to be negative (i.e., ratings should decrease over time), the explained variance decreases to 48.2%, R2 = 0.482 ± 0.457, leaving us enough variance (i.e., greater than 50%) for examining the distinct effects of time duration and ratings on the patterns of functional brain reorganization.

      Indeed, the two main analyses included in the manuscript—consensus community detection and predictive modeling—were designed to examine those two aspects of the task, i.e., time duration and pain avoidance ratings, respectively. First, through the consensus community detection analysis, we examined the community structure that changes over time, i.e., across the early, middle, and late periods (as shown in Figure 3). We then developed predictive models of pain avoidance ratings in the second main analysis (as shown in Figure 5).

      Though it is still a caveat that we cannot fully dissociate the effects of time duration versus pain ratings, we could interpret the first set of results to be more about time duration, while the second set of results is more about pain ratings.

      We now added a description of the implication of predictive modeling for isolating the effects of pain ratings. In addition, a discussion on the caveat of the current experimental design and relevant future direction.

      Revisions to the main manuscript:

      p. 25: Moreover, developing models to directly predict the pain ratings is helpful to complement the group-level analysis, because the changes in consensus community structure over the early, middle, and late periods only indirectly reflect the different levels of pain.

      p. 27: This study also had some limitations. First, with the current experimental paradigm, it is difficult to dissociate the pain duration from the level of pain because the delivery of oral capsaicin commonly induces initial bursting and then a gradual decrease of pain over time. Though we aimed to model the effects of pain duration and pain avoidance ratings with our two primary analyses, i.e., consensus community detection and predictive modeling, we cannot fully dissociate the impact of time duration versus pain ratings.

      “• Although the manuscript is very well-written it might benefit from an even clearer and simpler explanation of what the consensus community structure and the underlying module allegiance measure assesses.”

      We thank you for the suggestion. Now we added additional (but simple) descriptions of module allegiance and consensus community detection methods.

      Revisions to the main manuscript:

      pp. 8-9: Here, the consensus community means the group-level representative structures of the distinct community partitions of individuals. To determine the consensus community across different individuals and times, we first obtained the module allegiance (Bassett et al., 2011) from the community assignment of each individual. Module allegiance assesses how much a pair of nodes is likely to be affiliated with the same community label, and is defined as a matrix T whose element Tij is 1 when nodes i and j are assigned to the same community and 0 when assigned to different communities. This conversion of the categorical community assignments to the continuous module allegiance values allows group-level summarization of different community structures of individuals.

      p. 14: Here, high module allegiance indicates the voxels of two regions are likely to be in the same community affiliation, and vice versa.

      “• The added value of the assessment of the dynamics of brain networks remains unclear. Specifically, it is unclear whether the current analysis of brain networks dynamics allows for a clearer distinction between and prediction of pain and no-pain states than other measures of static or dynamic brain activity or static measures of brain connectivity.”

      The main goal (and thus, the added value) of the current study was to provide a “mechanistic” understanding of the brain processes of sustained pain, rather than the “prediction.” Even though we included the results from the predictive modeling, as in Figures 4-6, our focus was more on the interpretation of the model to quantitatively examine the functional changes in the brain, not on the maximization of the prediction performance.

      Indeed, maximizing the prediction performance was the main goal of our previous study (Lee et al., 2021), in which we developed a predictive model of sustained pain based on the patterns of dynamic functional connectivity. The model showed better prediction performances compared to the current study, but it was challenging to interpret the model because of the high dimensionality of the model and its features. In addition, functional connectivity itself provides only limited insight into how functional brain networks are structured and reconfigured over time.

      In this sense, the multi-layer community detection method has several advantages to achieving our goal. First, the community detection analysis allows us to summarize the complex, high-dimensional whole-brain connectivity patterns into neurobiologically interpretable subsystems. Second, the multi-layer community detection method allows us to study the temporal changes in community structure by connecting the same nodes across different time points.

      Now we added a description of the rationale behind the choice of the multi-layer community detection analysis over the conventional functional connectivity methods, and the added value of our study.

      Revisions to the main manuscript:

      p. 3: In this study, we examined the reconfiguration of whole-brain functional networks underlying the natural fluctuation in sustained pain to provide a mechanistic understanding of the brain responses to sustained pain.

      p. 7: In this study, we used this approach to examine the temporal changes of brain network structures during sustained pain, which cannot be done with conventional functional connectivity-based analyses (Lee et al., 2021).

      p. 27: However, the previous model provides a limited level of mechanistic understanding because of the high dimensionality of the model and its features. In addition, functional connectivity itself provides only limited insight into how functional brain networks are structured and reconfigured over time.

      Reviewer #2 (public Review):

      “The Authors J-J Lee et al., investigated cortical and subcortical brain networks and their organization in communities over time during evoked tonic pain. The paper is well-written, and the findings are interesting and relevant for the field. Interestingly, other than confirming well known phenomena (e.g., segregation within the primary somatomotor cortex) the Authors identified an emerging "pain supersystem" during the initial increase of pain, in which subcortical and frontoparietal regions, usually more segregated, showed more interactions with the primary somatomotor cortex. Decrease of pain was instead associated to a reconfiguration of the networks that sees subcortical and frontoparietal regions connected with areas of the cerebellum. The main novelty of the proposed analysis, lies in the resulting high performances of the classifier, that shows how this interesting link between frontoparietal network and subcortical regions with the cerebellum, is predictive of pain decrease. In summary, the main strengths of the present manuscript are: • Inclusion of subcortical regions: most of the recent papers using the Shaefer parcellation in ~200 brain areas1, do not consider subcortical areas, ignoring possible relevant responses and behaviors of those regions. Not only the Authors smartly addressed this issue, but most of their results showed how subcortical regions played a key role in the networks reconfiguration over time during evoked sustained pain.

      • Robust classification results: high accuracy obtained on training dataset (internal validation), using a leave-one-out approach, and on the available independent test dataset (external validation) of relatively large sample size (N=74).

      • Clarity in the description of aim and sub-aims and exhaustive presentation of the obtained results helped by appropriate illustrations and figures (I suggest less wording in some of them).

      • Availability of continuous behavioral outcome (track ball).”

      We appreciate the reviewer’s summary and positive evaluations.

      “Even though the results are mostly cohesive with previous literature, some of the results need to be discussed in relationship to recently published papers on the same topic as well as justifying some of the non-standard methodological procedures adding appropriate citations (or more detailed comments). The Authors do not touch upon the concept of temporal summation of pain, historically associated with tonic pain, especially when the study is finalized to better understanding brain mechanisms in chronic pain populations (chronic pain patients often exhibit increased temporal summation of pain2). I would suggest starting from the paper recently published by Cheng et al. that also shares most of the methodological pipeline3 to highlight similarities and novelties and deepen the comparison with the associated literature.”

      We thank the reviewer and editor for the comment on this important topic. Temporal summation of pain indicates progressively increased sensation of pain during prolonged noxious stimulation (Price, Hu, Dubner, & Gracely, 1977), and has been suggested as a hallmark of chronic pain disorders including fibromyalgia (Cheng et al., 2022; Price et al., 2002). In a recent study by Cheng et al. (2022), the authors induced tonic pain using constantly high cuff pressure and examined whether the participants experienced increased pain in the late period compared to the early period of pain. On the contrary, in our experimental paradigm, the capsaicin liquid initially delivered into the oral cavity is being cleaned out by saliva, and thus overall pain intensity was decreasing over time, not increasing (Figure 1B). Therefore, the temporal summation of pain may occur in a limited period (e.g., the early period of the run), but it is difficult to examine its effect systematically in our study.

      However, it is notable that Cheng et al.’s results overlap with our findings. For example, Cheng et al. reported the intra-network segregation within the somatomotor network and the inter-network integration between the somatomotor and other networks during the temporal summation of pressure pain in patients with fibromyalgia, which were similar to the findings we reported in Figure S9 and Figure 4. Although it is unclear whether these results reflect the temporal summation of pain, these network-level features shared across the two studies are likely to be an essential component of the sustained pain processes in the brain.

      Now we added a comment on the temporal summation of pain in the main manuscript.

      Revisions to the main manuscript (p. 26):

      Interestingly, a recent fMRI study on the temporal summation of pain in fibromyalgia patients reported results similar to ours (Cheng et al., 2022), including the intra-network dissociation within the somatomotor network and the inter-network integration between the somatomotor and other networks during pain. Although we cannot directly examine whether the temporal summation of pain gave rise to these network-level changes due to the limitation of our experimental paradigm, these consistent findings between the two studies may suggest that our findings could be generalized to clinical conditions.

      We thank the reviewer and editor for the information about this recent publication. Cheng et al. (2022) was not published at the time we wrote the manuscript, and we were surprised that Cheng et al. shares many aspects with our study, e.g., both used multilayer community detection and also reported similar findings, as described above.

      However, there were some differences between the two studies as well.

      First, the focus of our study was on the brain dynamics during the natural time-course of sustained pain from its initiation to remission in healthy participants, whereas the focus of Cheng et al. was on the temporal summation phenomenon of pain (TSP) and the enhanced TSP in patients with fibromyalgia patients. Because of this difference in the research focuses, our study and Cheng et al. are providing many nonoverlapping results and insights. For example, our study paid particular attention to the coping mechanisms of the brain (e.g., the network-level changes in the subcortical and frontoparietal network regions) and the brain systems that are correlated with the natural decrease of pain (e.g., the cerebellum in Figure 5). In contrast, Cheng et al. (2022) identified the brain connectivity and network features important for the increased TSP in fibromyalgia patients.

      Second, our great interest was in identifying and visualizing the fine-grained spatiotemporal patterns of functional brain network changes over the period of sustained pain. To utilize fine-grained brain activity information, we conducted our main analyses at a voxel-level resolution and on the native brain space, such as in Figures 2-3 and Figures S5, S7, and S8. With this fine-grained spatiotemporal mapping, we were able to identify small, but important voxel-level dynamics.

      We now cited Cheng et al. (2022) in multiple places and revised the manuscript accordingly.

      Revisions to the main manuscript (p. 26):

      Interestingly, a recent fMRI study on the temporal summation of pain in fibromyalgia patients reported results similar to ours (Cheng et al., 2022), including the intra-network dissociation within the somatomotor network and the inter-network integration between the somatomotor and other networks during pain. Although we cannot directly examine whether the temporal summation of pain gave rise to these network-level changes due to the limitation of our experimental paradigm, these consistent findings between the two studies may suggest that our findings could be generalized to clinical conditions.

      “Here the main significant weaknesses of the study:

      • The data analysis is entirely conducted on young healthy subjects. This is not a limitation per se, but the conclusion about offering new insights into understanding mechanisms at the basis of chronic pain is too far from the results. Centralization of pain is very different from summation and habituation, especially if all the subjects in the study consistently rated increased and decreased pain in the same way (it never happens in chronic pain patients). A similar pipeline has been actually applied to chronic pain patients (fibromyalgia and chronic back pain)3,4. Discussing the results of the present paper in relationship to those, could offer a more robust way to connect the Authors' results to networks behavior in pathological brains.”

      We are grateful for the opportunity to discuss the clinical implication of our study. First of all, we agree with the reviewer and editor that we cannot make a definitive claim about chronic pain with the current study, and thus, we revised the last sentence of the abstract to tone down our claim.

      Revisions to the main manuscript (p. 2, in the abstract):

      This study provides new insights into how multiple brain systems dynamically interact to construct and modulate pain experience, advancing our mechanistic understanding of sustained pain.

      However, as we noted above in E-4, some of our findings were consistent with the findings from a previous clinical study (Cheng et al., 2022), suggesting the potential to generalize our study to clinical pain conditions. In addition, we previously reported that a predictive model of sustained pain derived from healthy participants performed better at predicting the pain severity of chronic pain patients than the model derived directly from chronic pain patients (Lee et al., 2021), highlighting the advantage of the “component process approach.”

      The component process approach aims to develop brain-based biomarkers for basic component processes first, which can then serve as intermediate features for the modeling of multiple clinical conditions (Woo, Chang, Lindquist, & Wager, 2017). This has been one of the core ideas of the Research Domain Criteria (RDoC) (Insel et al., 2010) and the Hierarchical Taxonomy of Psychopathology (HiTOP) (Kotov et al., 2017). If the clinical pain of a patient group is modeled as a whole, it becomes unclear what is being modeled because of the multidimensional and heterogeneous nature of clinical pain (Melzack, 1999) as well as other co-occurring health conditions (e.g., mental health issues, medication use, etc.). The component process approach, in contrast, can specify which components are being modeled and are relatively free from heterogeneity and comorbidity issues by experimentally manipulating the specific component of interest in healthy participants.

      The current study was conducted on healthy young adults based on the component process approach. We used oral capsaicin to experimentally induce sustained pain, which unfolds over protracted time periods and has been suggested to reflect some of the essential features of clinical pain (Rainville, Feine, Bushnell, & Duncan, 1992; Stohler & Kowalski, 1999). Therefore, the detailed characterization of the brain processes of sustained pain will be able to serve as an intermediate feature of multiple clinical conditions in future studies.

      Now we added the discussion on the clinical generalizability issue in the discussion section.

      Revisions to the main manuscript:

      pp. 25-26: An interesting future direction would be to examine whether the current results can be generalized to clinical pain. Experimental tonic pain has been known to share similar characteristics with clinical pain (Rainville et al., 1992; Stohler & Kowalski, 1999). In addition, in a recent study, we showed that an fMRI connectivity-based signature for capsaicin-induced orofacial tonic pain can be generalized to chronic back pain (Lee et al., 2021). Therefore, a detailed characterization of the brain responses to sustained pain has the potential to provide useful information about clinical pain.

      p. 26: Interestingly, a recent fMRI study on the temporal summation of pain in fibromyalgia patients reported results similar to ours (Cheng et al., 2022), including the intra-network dissociation within the somatomotor network and the inter-network integration between the somatomotor and other networks during pain. Although we cannot directly examine whether the temporal summation of pain gave rise to these network-level changes due to the limitation of our experimental paradigm, these consistent findings between the two studies may suggest that our findings could be generalized to clinical conditions.

      “Vice versa, the behavioral measure used to assess evoked pain perception (avoidance ratings), has been developed for chronic pain patients and never validated on healthy controls5. It might not be an appropriate measure considering the total absence of pain variability in the reported responses over forty-eight subjects6,7.”

      We acknowledge that pain avoidance measures are not fully validated in the healthy population. Nevertheless, we used this measure in this study for the following two main reasons that outweigh the limitations.

      First, a pain avoidance rating provides an integrative measure that can reflect the multi-dimensional aspects of sustained pain. One of the essential functions of pain is to avoid harmful situations and promote survival, and the avoidance motivation induced by pain is composed of not only sensory-discriminative, but also cognitive components including learning, valuation, and contexts (Melzack, 1999). According to the fear-avoidance model (Vlaeyen & Linton, 2012), if the pain-induced avoidance motivation is not resolved for a long time and is maladaptively associated with innocuous environments, chronic pain is likely to develop, suggesting the importance and clinical relevance of pain avoidance measures. In addition, our experimental design is particularly suitable for the use of avoidance rating because the oral capsaicin stimulation is accompanied by the urge to avoid the painful sensation, but it cannot immediately be resolved similar to chronic pain. Moreover, capsaicin is sometimes experienced as intense but less aversive (or even appetitive) in some cases, e.g., spicy food craver (Stevenson & Yeomans, 1993). In this case, avoidance ratings can provide a more reasonable measure of pain compared to the intensity rating.

      Second, the avoidance measure provides a common scale on which we can compare different types of aversive experiences, allowing us to conduct specificity tests for a predictive model of pain. For example, a recent study successfully compared the brain representations of two types of pain and two types of aversive, but non-painful experiences (e.g., aversive auditory and visual experiences) using the same avoidance measure (Ceko, Kragel, Woo, Lopez-Sola, & Wager, 2022). These comparisons were possible because the avoidance measure provided one common scale for all the aversive experiences regardless of their types of stimuli.

      To provide a better justification for the use of the avoidance measure, we now included the specificity test results of our pain predictive models. More specifically, we tested our module allegiance-based SVM and PCR models of pain on the aversive taste and aversive odor conditions (Figure S13).

      Despite these advantages, the use of avoidance rating without thorough validation is a limitation of the current study, and thus future studies need to examine the psychometric properties of the avoidance rating, e.g., examining the relationship among pain intensity, unpleasantness, and avoidance measures. However, the current study showed that the predictive models derived with pain avoidance rating (Study 1) could be used to predict the pain intensity rating (Study 2). In addition, the overall time-course of pain avoidance ratings in Study 1 was similar to the time-course of pain intensity ratings in Study 2, providing some supporting evidence for the convergent validity of the pain avoidance measure.

      As to the following comment, “It might not be an appropriate measure considering the total absence of pain variability in the reported responses over forty-eight subjects,” there are pieces of evidence supporting that the low between-individual variability of ratings is due to the characteristics of our experimental design, not to the fact that we used the avoidance measure. As we discussed in more detail in our response to E-1, our experimental procedure based on capsaicin liquid commonly induces the initial burst of painful sensation and the subsequent gradual relief for most of the participants (Figure 1B, left). A similar time-course pattern of ratings was observed in Study 2 (Figure 1B, right), which used the pain “intensity” rating, not the pain avoidance rating. In addition, previous studies with a similar experimental design (i.e., intra-oral capsaicin application) (Berry & Simons, 2020; Lu, Baad-Hansen, List, Zhang, & Svensson, 2013; Ngom, Dubray, Woda, & Dallel, 2001) also showed a similar time-course of pain ratings with low between-individual variability regardless of the rating types (e.g., VAS or irritation intensity), confirming that this observation is not unique to the pain avoidance rating.

      Now we added descriptions on the small between-individual variability of pain ratings and the use of avoidance ratings.

      Revisions to the main manuscript:

      pp. 5-7: Note that the overall trend of pain ratings over time was similar across participants because of the characteristics of our experimental design, which has also been observed in the previous studies that used oral capsaicin (Berry & Simons, 2020; Lu et al., 2013; Ngom et al., 2001). However, also note that each individual’s time-course of pain ratings were not entirely the same (Figures S2 and S3).

      p. 26: However, there are also differences between the characteristics of capsaicin-induced tonic pain versus clinical pain. For example, clinical pain continuously fluctuates over time in an idiosyncratic pattern (Apkarian, Krauss, Fredrickson, & Szeverenyi, 2001), whereas capsaicin-induced tonic pain showed a similar time-course pattern across the participants—i.e., increasing rapidly and then decreasing gradually (Figure 1B). This typical time-course of pain ratings has been reported in previous studies that used oral capsaicin (Berry & Simons, 2020; Lu et al., 2013; Ngom et al., 2001).

      pp. 26-27: Note that Study 1 used a pain avoidance measure that is not yet fully validated in healthy participants. However, we chose to use the pain avoidance measure, which can provide integrative information on the multi-dimensional aspects of pain (Melzack, 1999; Waddell, Newton, Henderson, Somerville, & Main, 1993). It also has a clinical implication considering that the maladaptive associations of pain avoidance to innocuous environments have been suggested as a putative mechanism of transition to chronic pain (Vlaeyen & Linton, 2012). Lastly, the avoidance measure can provide a common scale across different modalities of aversive experience, allowing us to compare their distinct brain representations (Ceko et al., 2022) or test the specificity of their predictive models (Lee et al., 2021) (Figure S13). Although the psychometric properties of the pain avoidance measure should be a topic of future investigation, we expect that the pain avoidance measure would have a high level of convergent validity with pain intensity given the observed similarity between pain avoidance (Study 1) and pain intensity (Study 2) in their temporal profiles. The generalizability of our PCR model across Studies 1 and 2 also supports this speculation. However, there would also be situations in which pain avoidance is dissociated from pain intensity. For example, capsaicin can be experienced to be intense but less aversive or even appetitive in some contexts, such as cravings for spicy food (Stevenson & Yeomans, 1993). In addition, the gradual rise of avoidance ratings during the late period of the control condition in Study 1 would not be observed if the intensity measure was used. Future studies need to examine the relationship between pain avoidance and the other pain assessments and the advantage of using the pain avoidance measure.

      “• The dynamic measure employed by the Authors is better described from the term "windowed functional connectivity". It is often considered a measure of dynamic functional connectivity and it gives information about fluctuations of the connectivity patterns over time. Nevertheless, the entire focus of the paper, including the title, is on dynamic networks, which inaccurately leads one to think of time-varying measures with higher temporal resolution (either updating for every acquired time point, as the Authors did in their previous publication on the same dataset4, or sliding windows involving weighting or tapering8,9). This allows one to follow network reorganization over time without averaging 2-min intervals in which several different brain mechanisms might play an important role3,10,11. In summary, the assumption of constant response throughout 2-min periods of tonic pain and the use of Pearson correlations do not mirror the idea of dynamic analysis expressed by the Authors in title and introduction. I would suggest removing "dynamic" from the title, reduce the emphasis on this concept, address possible confounds introduced by the choice of long windows and rephrase the aim of the study in terms of brain network reconfiguration over the main phases of tonic pain experience.”

      Now we removed the word ‘dynamic’ from many places in the manuscript, including the title. In addition, we added a brief discussion on the reason we chose to use the long and non-overlapping windows for connectivity calculation.

      Revisions to the main manuscript (p. 8):

      Although the long duration of the time window without overlaps may obscure the fine-grained temporal dynamics in functional connectivity patterns, we chose to use this long time window based on previous literature (Bassett et al., 2011; Robinson, Atlas, & Wager, 2015), which also used long time windows to obtain more reliable estimates of network structures and their transitions.

      “• Procedure chosen for evoking sustained pain. To the best of my knowledge, capsaicin sauce on the tongue is not a validated tonic pain procedure. In favor of this argument is the absence of inter-subject variability in the behavioral results showed in the paper, very unusual for response to painful stimulations. The procedure is well described by the Authors, and some precautions like letting the liquid drying before the start of the scan, have helped reducing confounds. Despite this, the measures in figure 1B suggest that the intensity of the painful stimulation is not constant as expected for sustained pain (probably the effect washes out with the saliva). In this case, the first six-minute interval requires particular attention because it encapsulates the real tonic pain phase, and the following ones require more appropriate labels. Ideally the Author should cite previous studies showing that tongue evoked pain elicits a very specific behavioral response (summation, habituation/decrease of pain, absence of pain perception). If those works are missing, this response need to be treated as a funding rather than an obvious point.”

      We addressed this comment. Moreover, we could find previous studies that experimentally induced tonic pain through the application of capsaicin on the tongue (Berry & Simons, 2020; Boudreau, Wang, Svensson, Sessle, & Arendt-Nielsen, 2009; Green, 1991; Ngom et al., 2001), suggesting that our experimental procedure is in line with previous literature.

      Reviewer #3 (Public Review ):

      “In their manuscript, Lee and colleagues explore the dynamics of the functional community structure of the brain (as measured with fMRI) during sustained experimental pain and provide several potentially highly valuable insights into, and evaluate the predictive capacity of, the underlying dynamic processes. The applied methodology is novel but, at the same time, straightforward and has solid foundations. The findings are very interesting and, potentially, of high scientific impact as they may significantly push the boundaries of our understanding of the dynamic neural processes during sustained pain, with a (somewhat limited) potential for clinical translation.

      However (Major Issue 1), after reading the current manuscript version, not all of my doubts have been dissolved regrading the specificity of the results to pain. Moreover (Major Issue 2), some of the results (specifically, those related to the group level analysis of community differences) do not seem to be underpinned with a proper statistical inference in the current version of the manuscript and, therefore, their presentation and discussion may not be proportional to the degree of evidence. Next to these Major Issues (detailed below), some other, minor clarifications might also be needed before publications. These are detailed below or in the private part of the review ("Recommendations for the authors").

      Despite these issues, this is, in general, a high quality work with a high level of novelty and - after addressing the issues - it has a very high potential for becoming an important contribution (and a very interesting read) to the pain-research community and beyond.”

      We appreciate the reviewer’s thoughtful comments. We have revised the manuscript to address the Reviewer’s major concerns, as described below.

      “Major Issue 1:

      The main issue with the manuscript is that it remains somewhat unclear, how specific the results are to pain.

      Differences between the control resting state and the capsaicin trials might be - at least partially - driven by other factors, like:

      • motion artifacts

      • saliency, attention, axiety, etc.

      Differences between stages over the time-course might, additionally, be driven by scanner drifts (to which the applied approach might be less sensitive, but the possibility is still there ) or other gradual processes, e.g. shifts in arousal, attention shifts, alertness, etc.

      All the above factors might emerge as confounding bias in both of the predictive models.

      This problem should be thoroughly discussed, and at least the following extra analyses are recommended, in order to attenuate concerns related to the overall specificity and neurobiological validity of the results:

      • reporting of, and testing for motion estimates (mean, max, median framewise displacement or anything similar)

      • examining whether these factors might, at least partially, drive the predictive models.

      • e.g. applying the PCR model on the resting state data and verifying of the predicted timecourse is flat (no inverse U-shape, that is characteristic to all capsaicin trials).

      Not using the additional sessions (bitter taste, aversive odor, phasic heat) feels like a missed opportunity, as they could also be very helpful in addressing this issue.”

      We thank the reviewer for this comment on the important issue regarding the specificity of our results and the potential influences of noise. The effects of head motion and physiological confounds are particularly relevant to pain studies because pain involves substantial physiological changes and often causes head motion. To address the related concerns of specificity, we conducted additional analyses assessing the independence of our predictive models (i.e., SVM and PCR models) from head movement and physiology variables and the specificity of our models to pain versus non-painful aversive conditions (i.e., bitter taste and aversive odor) in Study 1.

      First, we examined the overall changes of framewise displacement (FD) (Power, Barnes, Snyder, Schlaggar, & Petersen, 2012), heart rate (HR), and respiratory rate (RR) in the capsaicin condition (Figure S11). For the univariate comparison between the capsaicin vs. control conditions (Figure S11A), the results showed that, as expected, the capsaicin condition caused significant changes in head motion and autonomic responses. The mean FD and HR were significantly higher, and the RR was lower in the capsaicin condition compared to the control condition (FD: t47 = 5.30, P = 2.98 × 10-6; HR: t43 = 4.98, P = 1.10 × 10-5; RR: t43 = -1.91, P = 0.063, paired t-test). In addition, the increased motion and autonomic responses were more prominent in the early period of pain (Figure S11B). The 10-binned (2 mins per time-bin) FD and HR showed a decreasing trend while the RR showed an increasing trend over time in the capsaicin condition. The comparisons between the early (1-3 bins, 0-6 min) vs. late (8-10 bins, 14-20 min) periods of the capsaicin condition showed significant differences both for FD and HR (FD: t47 = 6.45, P = 8.12 × 10-8; HR: t43 = 6.52, P = 6.41 × 10-8; RR: t43 = -1.61, P = 0.11, paired t-test). These results suggest that while participants were experiencing capsaicin tonic pain, particularly during the early period, head motion and heart rate were increased, while breathing was slowed down. Note that we needed to exclude 4 participants’ data in this analysis due to technical issues with the physiological data acquisition.

      Next, we examined whether the changes in head motion and physiological responses influenced our predictive model performance (Figure S12). We first regressed out the mean FD, HR, and RR (concatenated across conditions and participants as we trained the SVM model) from the predicted values of the SVM model with leave-one-subject-out cross-validation (2 conditions × 44 participants = 88) and then calculated the classification accuracy again (Figure S12A). The results showed that the SVM model showed a reduced, but still significant classification accuracy for the capsaicin versus control conditions in a forced-choice test (n = 44, accuracy = 89%, P = 1.41 × 10-7, binomial test, two-tailed). We also did the same analysis for the PCR model (10 time-bins × 44 participants = 440) and the PCR model also showed a significant prediction performance (n = 44, mean prediction-outcome correlation r = 0.20, P = 0.003, bootstrap test, two-tailed, mean squared error = 0.159 ± 0.022 [mean ± s.e.m.]) (Figure S12B). These results suggest that our SVM and PCR models capture unique variance in tonic pain above and beyond the head movement and physiological changes.

      Lastly, we examined the specificity of our predictive models to pain, by testing the models on the non-painful but aversive conditions including the bitter taste (induced by quinine) and aversive odor (induced by fermented skate) conditions (Figure S13). All the model responses were obtained using leave-one-participant-out cross-validation. The results showed that the overall model responses of the SVM model for the bitter taste and aversive odor conditions were higher than those for the control condition but lower than the capsaicin condition (Figure S13A). Classification accuracies for comparing capsaicin vs. bitter taste and capsaicin vs. aversive odor were all significant (for capsaicin vs. bitter taste, accuracy = 79%, P = 6.17 × 10-5, binomial test, two-tailed, Figure S13C; for capsaicin vs. aversive odor, accuracy = 83%, P = 3.31 × 10-6, binomial test, two-tailed, Figure S13E), supporting the specificity of our SVM model of pain. Similarly, the model responses of the PCR model for the bitter taste and aversive odor conditions were lower than the capsaicin condition, and their temporal trajectories were less steep and fluctuating compared to the capsaicin condition (Figure S13B). The time-course of the model responses for the control condition was flatter than all other conditions and did not show the inverted U-shape. Furthermore, the model responses of the bitter taste and aversive odor conditions did not show the significant correlations with the actual avoidance ratings (bitter taste: mean prediction-outcome correlation r = 0.05, P = 0.41, bootstrap test, two-tailed, mean squared error = 0.036 ± 0.006 [mean ± s.e.m.], Figure S13D; aversive odor: mean prediction-outcome correlation r = 0.12, P = 0.06, bootstrap test, two-tailed, mean squared error = 0.044 ± 0.004 [mean ± s.e.m.], Figure S13F), suggesting the specificity of PCR model to pain.

      Overall, we have provided evidence that our models can predict pain ratings above and beyond the head motion and physiological changes and that the models are more responsive to pain compared to non-painful aversive conditions.

      Now we added descriptions on the specificity tests to the main manuscript and also to the Supplementary Information.

      Revisions to the main manuscript (p. 20):

      Specificity of the module allegiance-based predictive models To examine whether the predictive models were specific to pain and the prediction performances were not influenced by confounding variables such as head motion and physiological changes, we conducted additional analyses as shown in Figures S11-13. The SVM and PCR models showed significant prediction performances even after controlling for head motion (i.e., framewise displacement) and physiological responses (i.e., heart rate and respiratory rate) (Figures S11 and S12) and did not respond to the non-painful but aversive conditions including the bitter taste and aversive odor conditions (Figure S13), supporting the specificity of our predictive to pain. For details, please see Supplementary Results.

      Revisions to the Supplementary Information (pp. 2-4):

      Specificity analysis (Figures S11-13) To examine whether the predictive models (i.e., SVM and PCR models) were specific to pain and not influenced by confounding noises, we conducted additional specificity analysis assessing the independence of the models from head movement and physiology variables and specificity of our models to pain versus non-painful aversive conditions (i.e., bitter taste and aversive odor) in Study 1. First, we examined the overall changes of framewise displacement (FD) (Power et al., 2012), heart rate (HR), and respiratory rate (RR) in sustained pain (Figure S11). For the univariate comparison between capsaicin vs. control conditions (Figure S11A), the results showed that, as expected, capsaicin condition caused significant changes in motion and autonomic responses. The mean FD and HR were significantly higher, and the RR was lower in the capsaicin condition compared to the control condition (FD: t47 = 5.30, P = 2.98 × 10-6; HR: t43 = 4.98, P = 1.10 × 10-5; RR: t43 = -1.91, P = 0.063, paired t-test). For the temporal changes of movement and physiology variables (Figure S11B), the results showed that the increased motion and autonomic responses are more prominent in the early period of pain. The 10-binned (2 mins per time-chunk) FD and HR showed decreasing trend while the RR showed increasing trend over time in capsaicin condition. Additional univariate comparisons between early (1-3 bins, 0-6 min) vs. late (8-10 bins, 14-20 min) period of capsaicin condition showed that differences were significant for FD and HR (FD: t47 = 6.45, P = 8.12 × 10-8; HR: t43 = 6.52, P = 6.41 × 10-8; RR: t43 = -1.61, P = 0.11, paired t-test). This suggests that while participants were experiencing tonic pain, particularly in the early period, motion and heart rate was increased but breathing was slowed. Note that we needed to exclude 4 participants’ data due to technical issues with physiological data acquisition. Next, we examined whether the head movement and physiological responses are the main driver of our predictive models (Figure S12). For all the original signature responses from SVM model (2 conditions × 44 participants = 88), we regressed out the mean FD, HR, and RR (concatenated across conditions and participants as the SVM model was trained) and calculated the classification accuracy (Figure S12A). Although the signature responses were controlled for movement and physiology variables, the SVM model still showed a high classification accuracy for the capsaicin versus control conditions in a forced-choice test (n = 44, accuracy = 89%, P = 1.41 × 10-7, binomial test, two-tailed). Similarly, for all the original signature responses from PCR model (10 time-bins × 44 participants = 440), we regressed out the 10-binned FD, HR, and RR (concatenated across time-bins and participants as the PCR model was trained) and calculated the within-individual prediction-outcome correlation (Figure S12B). Again, the PCR model showed a significantly high predictive performance (n = 44, mean prediction-outcome correlation r = 0.20, P = 0.003, bootstrap test, two-tailed, mean squared error = 0.159 ± 0.022 [mean ± s.e.m.]) while controlling for movement and physiology variables. These results suggest that our SVM and PCR models captures unique variance in tonic pain above and beyond the head movement and physiological changes. Lastly, we examined the specificity of our predictive models to pain, by testing the models onto the non-painful but tonic aversive conditions including bitter taste (induced by quinine) and aversive odor (induced by fermented skate) (Figure S13). All the signature responses were obtained using leave-one-participant-out cross-validation. The results showed that the overall signature responses of SVM model for bitter taste and aversive odor conditions were higher than those for control conditions, but lower than capsaicin condition (Figure S13A). Classification accuracy between capsaicin vs. bitter taste and vs. aversive odor were all significantly high (capsaicin vs. bitter taste: accuracy = 79%, P = 6.17 × 10-5, binomial test, two-tailed, Figure S13C; capsaicin vs. aversive odor: accuracy = 83%, P = 3.31 × 10-6, binomial test, two-tailed, Figure S13E), suggesting the specificity of SVM model to pain. Similarly, the temporal trajectories of the signature responses of PCR model for bitter taste and aversive odor conditions were not overlapping with that of the capsaicin condition (Figure S13B). Furthermore, the signature responses of bitter taste and aversive odor conditions do not have significant relationship with the actual avoidance ratings (bitter taste: mean prediction-outcome correlation r = 0.05, P = 0.41, bootstrap test, two-tailed, mean squared error = 0.036 ± 0.006 [mean ± s.e.m.], Figure S13D; aversive odor: mean prediction-outcome correlation r = 0.12, P = 0.06, bootstrap test, two-tailed, mean squared error = 0.044 ± 0.004 [mean ± s.e.m.], Figure S13F), suggesting the specificity of PCR model to pain. Overall, we have provided evidence that the module allegiance-based models can predict pain ratings above and beyond the movement and physiological changes, and are more responsive to pain compared to non-painful aversive conditions, which suggest the specificity of our results to pain.

      “Major Issue 2:

      Another important issue with the manuscript is the (apparent) lack of statistical inference when analyzing the differences in the group-level consensus community structures (both when comparing capsaicin to control and when analysing changes over the time-course of the capsaicin-challenge).

      Although I agree that the observed changes seem biologically plausible and fit very well to previous results, without proper statistical inference we can't determine, how likely such differences are to emerge just by chance.

      This makes all results on Figs. 2 and 3, and points 1, 4 and 5 in the discussion partially or fully speculative or weakly underpinned, comprising a large proportion of the current version of the manuscript.

      Let me note, that this issue only affects part of the results and the remaining - more solid - results may already provide a substantial scientific contribution (which might already be sufficient to be eligible for publication in eLife, in my opinion).

      Therefore I see two main ways of handling Major Issue 2:

      • enhancing (or clarifying potential misunderstandings regarding) the methodology (see my concrete, and hopefully feasible, suggestions in the "private part" of the review),

      • de-weighting the presentation and the discussion of the related results.

      I believe there are many ways to test the significance of these differences. I highlight two possible, permutation testing-based ideas.

      Idea 1: permuting the labels ctr-capsaicin, or early-mid-late, repeating the analysis, constructing the proper null distribution of e.g. the community size changes and obtain the p-values. Idea 2: "trace back" communities to the individual level and do (nonparametric) statistical inference there.”

      We appreciate this important comment. We did not conduct statistical inference when comparing the group-level consensus community affiliations of the different conditions (Figure 2) or different phases (Figure 3) because of the difficulty in matching the community affiliation values of the networks to be compared.

      For example, let us assume that the 800 out of 1,000 voxels of community #1 and 1,000 out of 4,000 voxels of community #2 in the control condition are commonly affiliated with the same community #3 in the capsaicin condition. To compare the community affiliation between two conditions, we should first match the community label of the capsaicin condition (i.e., #3) to that of the control condition (i.e., #1 or #2), and here a dilemma occurs; if we prioritize the proportion of the overlapping voxels for the matching, the common community should be labeled as #1, whereas if we prioritize the number of the overlapping voxels for the matching, the label of the common community should be #2. Although both choices look reasonable, none of them can be a perfect solution.

      As the example above, it is impossible to exactly match the community affiliation of the different networks. We must choose an imperfect criterion for the matching procedure, which essentially affects the comparison of network structure. This was the main reason that we limited our results of Figures 2-3 to a qualitative description based on visual inspection. Moreover, the group-level consensus community structures in Figures 2-3 are not a simple group statistic like sample mean; they were obtained from multiple steps of analyses including permutation-based thresholding and unsupervised clustering, which could further complicate the interpretation of statistical tests.

      Alternatively, there is a slightly different but more rigorous approach to the comparisons of the community structures, which is the Phi-test (Alexander-Bloch et al., 2012; Lerman-Sinkoff & Barch, 2016). Instead of direct use of the community labels, this method converts the community label of each voxel into a list of module allegiance values between the seed voxel and all the voxels of the brain (i.e., 1 if the seed and target voxels have the same community label and 0 otherwise). This allows quantitative comparisons of voxel-level community profiles between different conditions without an arbitrarily matching of the community labels. We adopted this Phi-test for our analyses to examine whether the regional community affiliation pattern is significantly different between (i) the capsaicin vs. control conditions and (ii) the early vs. late periods of pain (Figure S6), which correspond to the main findings of the Figures 2 and 3 in our manuscript, respectively.

      More specifically, to compare the group-level consensus community structures between the capsaicin vs. control conditions and the early vs. late periods, we first obtained a seed-based module allegiance map for each voxel (i.e., using each voxel as a seed). Then, we calculated a correlation coefficient of the module allegiance values between two different conditions for each voxel. This correlation coefficient can serve as an estimate of the voxel-level similarity of the consensus community profile. Because module allegiance is a binary variable, these correlation values are Phi coefficients. A small Phi coefficient means that the spatial pattern of brain regions that have the same community affiliation with the given voxel are different between the two conditions. For example, if a voxel is connected to the somatomotor-dominant community during the capsaicin condition and the default-mode-dominant community during the control condition, the brain regions that have the same community label with the voxel will be very different, and thus the Phi coefficient will become small. Moreover, the Phi coefficient can be small even if a voxel is affiliated as the same (matched) community label for both conditions, when the spatial patterns of the same community is different between conditions.

      To calculate the statistical significance of the Phi coefficient, we conducted permutation tests, in which we randomly shuffled the condition labels in each participant and obtained the group-level consensus community structure for each shuffled condition. Then, we calculated the voxel-level correlations of the module allegiance values between the two shuffled conditions. We repeated this procedure 1,000 times to generate the null distribution of the Phi coefficients, and calculated the proportion of null samples that have a smaller Phi coefficient (i.e., a more dis-similar regional community structure) than the non-shuffled original data.

      Results showed that there are multiple voxels with statistical significance (permutation tests with 1,000 iterations, one-tailed) in the area where the community affiliations of the two contrasting conditions were different (Figure S6). For example, the frontoparietal and subcortical regions for the capsaicin vs. control (c.f., Figure 2), and the frontoparietal, subcortical, brainstem, and cerebellar regions for the early vs. late period of pain (c.f., Figure 3) contain voxels that survived after thresholding with FDR-corrected q < 0.05, suggesting the robustness of our main results.

      Particularly, the somatomotor and insular cortices showed statistical significance in the permutation test, and this may reflect the large changes in other areas that are connecting to the somatomotor and insular cortices across different conditions. The statistical significance was also observed in the visual cortex, which was unexpected. We interpret that the spatial distribution of the visual network community is too stable across conditions, and thus the null distribution from permutation formed a very narrow distribution of Phi coefficients. Therefore, a small change in the community structure could achieve statistical significance.

      Now we added descriptions on the permutation tests.

      Revisions to the main manuscript:

      p. 9: Permutation tests confirmed that the community assignment in the frontoparietal and subcortical regions showed significant changes between the capsaicin versus control conditions (Figure S6A).

      p. 13: Permutation tests further confirmed that the community assignment in the frontoparietal, subcortical, and brainstem regions showed significant changes between the early versus late period of pain (Figure S6B).

      pp. 36-37: Permutation tests for regional differences in community structures. To test the statistical significance of the voxel-level difference of consensus community structures (Figures 2 and 3), we performed the following Phi-test (Alexander-Bloch et al., 2012; Lerman-Sinkoff & Barch, 2016). First, for each given voxel, we compared the community label of the voxel to the community label of all the voxels, generating a list of voxel-seed module allegiance values that allow quantitative comparison of voxel-level community profile (e.g., [1, 0, 1, 1, 0, 0, ...], whose element is equal to 1 if the seed and target voxels were assigned to the same community and 0 otherwise). Next, a correlation coefficient was calculated between the module allegiance values of the two different brain community structures (i.e., capsaicin versus control, and early versus late). This correlation coefficient is an estimate of the regional similarity of community profiles (here, the correlation coefficient is Phi coefficient because module allegiance is a binary variable). To estimate the statistical significance of the Phi coefficient, we performed permutation tests, in which we randomly shuffled the labels and then obtained the group-level consensus community structures from the shuffled data. Then, the Phi coefficient between the module allegiance values of the two shuffled consensus community structures was calculated. We repeated this procedure 1,000 times to generate the null distribution of the Phi coefficient for each voxel. Lastly, we examined the probability to observe a smaller Phi coefficient (i.e., a more dissimilar community profile) than the one from the non-shuffled original data, which corresponds to the P-value of the permutation test. All the P-values were one-tailed as the hypothesis of this permutation test is unidirectional.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript by Chen et al., the authors use live-cell single-molecule imaging to dissect the role of DNA binding domains (DBD) and activation domains (AD) in transcription factor mobility in the nucleus. They focus on the family of HypoxiaInducible factors isoforms, which dimerize and bind chromatin to induce a transcriptional response. The main finding is that activation domains can be involved in DNA binding as indicated by careful observations of the diffusion/reaction kinetics of transcription factors in the nucleus. For example, different bound fractions of HIF-1beta and HIF2alpha are observed in the presence of different binding partners and chimeras. The paradigm of interchangeable parts of transcription factors has been eroded over the years (the recent work of Naama Barkai comes to mind, cited herein), so the present observations are not unexpected per se. Yet, the measurements are rigorous and wellperformed and have the important benefit of being in living cells. Enthusiasm is also dampened by the exclusive use of one technique and one analysis to reach conclusions.

      In the revised manuscript we complement the single molecule imaging experiments with genomic approaches, including Cut&Run and RNA-seq, that largely confirm our main conclusions derived from the SPT results. 

      Reviewer #2 (Public Review):

      The authors raise the very important question how different transcription factors with similar in vitro DNA sequence specificity are able to achieve distinct binding profiles associated with distinct functions. They use hypoxia inducible factors (HIF) as model system and combine live cell single-particle tracking with comprehensive genetic and chemical perturbations to study the mechanisms underlying isoform-specific gene regulation. Their main experimental readout is the distribution of diffusion coefficients of a molecular species, extracted from a population of single-particle trajectories. From this distribution, the authors extract the fractions of immobile and mobile molecules as well as the peak diffusion coefficient of the mobile fraction. They find that in addition to the structured DNA binding domain and the dimerization interface of HIF-1a and HIF-2a, the C-terminus of those factors, which includes intrinsically disordered regions and an activation domain, contributes to modulating the bound fraction of HIF-1b and the HIF-a isoforms. In particular, the C-terminus of HIF-2a mediates a higher bound fraction than the one of HIF-1a. This finding is important as it demonstrates that separating HIF into distinct domains that each have clearly defined functions is an oversimplification. Rather, a more holistic view seems suitable, in which all parts of HIF contribute to nuclear diffusion and binding.

      The conclusions drawn on the bound fractions and the nuclear dynamics of HIF isoforms are mostly backed up by data and proper controls. However, some controls are missing and some aspects of data analysis need to be clarified and extended. Moreover, the authors fail to answer their initial question, as the experimental readout does not contain information on the DNA sequences involved in the binding events.

      Experimental controls:

      For some imaging experiments, the authors use cell lines where endogenous HIF-1b or HIF-2a was fused to a N-terminal HaloTag by CRISPR/Cas editing. These cell lines are comprehensively controlled for proper functionality of the edited transcription factors, including expression levels, cellular localization and DNA binding. However, differential expression compared to unedited levels is not quantified and only Halo-HIF-2a is tested for functional gene transcription.

      To confirm that the tagged proteins still maintain normal function in driving target gene expression, we performed RNA-seq on WT cells, HaloTag-HIF-2α KIN and Halo-HIF-1β KIN cells, and show that gene expression on these edited cells do not differ significantly from unedited WT cells (Figure 1—figure supplement 3B, C).

      Other experiments include overexpression of exogenously expressed factors. For those, the authors give statements such as "expressed from a relatively strong ... promoter" and "weakly expressed", but do not provide any control of the amount of overexpression. Quantifying the expression levels will be important, as some of the author's experiments demonstrate a strong dependency of results on expression level. 

      We have now included Western Blot results showing L30-driven expression of all HIF variants in comparison with KIN levels (Fig 4—Figure Supplement 1). However, we note that cells stably expressing the HIF variants are polyclonal and Western Blotting is a bulk assay only able to assess the population average. As such, Western blot analysis may not reflect the actual expression level in the individual cells used in the imaging experiments. To properly control HIF expression at the individual cell level, we instead monitored the protein concentration in each cell and only chose to image cells with similar fluorescence level, as measured by localization density (Fig 4—Figure Supplement 1 and see detailed discussion in Appendix 2).

      Moreover, the authors do not provide any control for proper functionality of domainswap mutants.

      We now include RNA-seq results demonstrating that WT cells over-expressing HIF-α

      WT and domain swap variants (Halo-HIF-1α, Halo-HIF-1α/2α, Halo-HIF-2α, Halo-HIF2α/1α) can activate their specific target genes, confirming that all these variants are also transcriptionally active. (See Figure 6A, B, Figure 6—figure supplement 2 - increased binding of wild type or domain-swapped HIF to several gene loci or neighboring regions coincide with increased transcription levels of these genes, and Figure 7 - HIF expressing cells with same HIF-IDR co-cluster in their mRNA transcription profile).

      The authors further state that they use a high illumination power of 1100 mW. Such high laser power might be detrimental to cells and the authors should control whether this laser power induces any artifacts.

      We agree that a high illumination power (indispensable to achieve high signal-to-noise ratio and detect single molecules) may be detrimental to cells in the long run. However, we only took 1 movie with < 2000 frames for each cell. With a 5-ms frame rate, the total imaging duration per cell was under 10 seconds. Cells are unlikely to respond to any stimulus/damage in such a short time. Moreover, we used stroboscopic illumination instead of continuous illumination, with only 1-ms laser exposure for each 5-ms frame. The total integrated laser exposure is thus only 2 seconds. In addition, all imaging was done with a red laser (633 nm), which has a relatively low phototoxicity. Finally, the 1100 mW is the output from the laser box, but the actual laser power density used for imaging were measured to approximately 2.3 kW/cm2 at 633 nm (Graham et al., 2021). Such an imaging scheme is very unlikely to generate phototoxicity artifacts within the short time window of our measurements. Lastly, we are comparing results across all conditions with the exact same imaging set-up, so any artifact should be accounted and controlled for. We do consider fast SPT a terminal, end-point experiment, where each cell is only imaged once and never re-used.

      Data analysis:

      Distributions of diffusion coefficients greatly vary between individual cells (e.g. Fig. 2A and B, Fig. S3A and C, Fig. S4E). Unfortunately, the authors do not explain whether this variation is a real cell-to-cell variation, or rather reflects variation of their analysis method, potentially due to a low number of single particle tracks per cell. 

      We agree with the reviewer that the cell-to-cell variation we observed could be due to a low number of trajectories collected for each cell. In fact, sampling small numbers of trajectories allows us to identify protein species with unique diffusion coefficients, which might be lost if we just looked at a large population. Also, the fact that the diffusion coefficient distribution varies between cells does not mean that a particular cell only contains the more prevalent species that was detected. Here we are not trying to determine whether proteins in each cell indeed behave differently or whether the observed variation in the diffusion coefficient distribution is simply an effect of the limited trajectories collected in each cell. We instead analyzed data collected from many cells combined to get a better estimation of the population behavior. We have modified our text to make this important point clear to the readers. 

      Moreover, the bound fraction of HIF-1b differs between two independent measurements including three biological replicates each (Fig. 5 C and F). This raises the concern that not enough data enter each biological replicate, or not enough replicates are considered.

      Unfortunately, the number of cells that could be measured in our current setup is limited. It takes approximately 1 hour to collect 20 cells per sample, including staining, washing, looking for cells with desired expression level, and acquiring movies. For experiments with multiple conditions (>12), 20 cells per sample is the upper limit that can fit into a single day. 

      To address the question of what is the minimum number of cells/replicates needed we included in Figure 2—figure supplement 3 - the result of a bootstrapping analysis. We used data collected from a total of 243 cells of the same cell line, from over 11 replicates as the “population” and performed a bootstrapping analysis to identify the source of variation. We have also included appendix 1 with a detailed discussion. Our results showed that cell-to-cell variation contributes most to the total variation of the data, followed by day-to-day (replicate-to-replicate) variation. However, sampling over 800 trajectories, and from over 60 cells, imaged in 3 replicates well approximates the “population value” (bound fraction calculated from 243 cells from over 11 replicates). As a result, in each figure we always used over 60 cells from 3 replicates to generate the reported parameters. Although this approach still gives variable numbers from figure to figure, the variations seen for the same cell line are much smaller compared to the differences observed between different cell lines/conditions. 

      The authors compare the bound fractions among various mutants and experimental conditions. However, the peak diffusion is not, or only descriptively, evaluated. Thus, it is not clear whether the main effect of a mutation or chemical treatment is to change the bound fraction, or rather the diffusion coefficient of the mobile fraction. 

      Since there might be multiple mobile populations (defined as the fraction with a diffusion coefficient > 0.5 μm2/sec), the mean diffusion coefficient can change while the mode (peak) diffusion coefficient stays the same and vice versa. Because of such complexity in the mobile population, we prefer to use descriptive words to report the trend for the change instead of reporting exact values. However, as requested, we have added peak diffusion coefficient information to relevant figures as bar plots. We have also included in Table 1 a summary of mean and mode diffusion coefficient estimated for moving molecules in all relevant figures for reader’s reference. Note that the diffusion coefficient estimation is on a log scale, and the larger the diffusion coefficient, the lower the resolution (e.g, there is 1-grid of difference both between 2.63 and 2.75, and between 9.55 and 10).

      Conclusions:

      The authors provide data that highlight a potential role of the intrinsically disordered domain of HIF in modulating the bound fraction of these transcription factors. They further claim that the intrinsically disordered domains have a main contribution to this bound fraction. However, the autors do not quantify how this contribution relates to those of the DNA binding domain or the dimerisation interface. Changes in bound fraction estimated from the data in e.g. Fig. 3C, Fig. 4C, Fig. 5C and F rather hint to a dominant effect of dimerisation, followed by DNA binding and a smaller contribution of the intrinsically disordered domain. The authors should quantify the relative changes of the bound fraction for all mutants and experimental conditions, to clarify the importance of the contribution of the intrinsically disordered domain.

      It would be ideal if we could quantify what percent of the bound fraction is contributed by dimerization interface, DBD and IDR, respectively. However, it is very likely that these different domains do not act independently of each other in terms of binding to chromatin fibers. In practice, it is very difficult to dissect and quantify these effects independently. For example, we did try to express HIF-1α and 2α with their IDR completely deleted; however, because the protein-degradation signals are within the IDRs, these deletions caused massive stabilization of these proteins, making it impossible to find cells that express these forms at similar levels as the full-length counterpart. As a result, although these IDR-deleted HIF-α show greatly reduced binding, we did not include the results in the paper because the loss of binding could also be due to the overall higher protein expression levels, leading to large unbound fractions. Regarding the DBD mutants, they only have 1 mutation, so it is hard to tell whether the remaining binding in Figure 5B is due to some residual binding affinity of HIF-α (HIF-α only partially lost its binding affinity), or is due to binding through its partner HIF-1β (HIF-α completely lost binding affinity, but can still bind through dimerization with HIF-1β). All we can safely conclude from Figure 5B is that HIF-α DBD is required for optimal binding, but we cannot determine how much exactly it contributes to binding. We thus argue that, given the interdependence of the different protein domains, the reviewer’s request is not experimentally feasible.

      The authors state that the intrinsically disordered domains of HIF determine their differential binding specificity to chromatin. However, the experiments provided do not allow for such a conclusion. In particular, measuring changes in the bound fractions is not sufficient. Such a conclusion requires a method that is able to inform about the DNA sequences involved in HIF binding, for example chromatin immunoprecipitation.

      As requested, we have included new Cut&Run and RNA-seq results in the revised manuscript showing HIF-α-IDR-specific binding and gene activation.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors report a public browser in which users can easily investigate associations between PGSs for a wide range of traits, and a large set of metabolites measured by the Nightingale platform in UKBB. This browser can potentially be used for identifying novel biomarkers for disease traits or, alternatively, for identifying novel causal pathways for traits of interest.

      Overall I have no major technical concerns about the study, but I would encourage the authors to revisit whether they can find a more compelling example that can better showcase the work that they have done. I understand that this is partly a resource paper but I think the resource itself can have more impact if the paper provides a clearer use-case for how it can drive novel biological insight.

      Many thanks for your comments. We have undertaken a new application of bi-directional Mendelian randomization to demonstrate how users may use this approach to disentangle whether associations in our atlas likely reflect either causes or consequences of PGS traits/diseases. This example is described on page 9:

      ‘For example, we applied Mendelian randomization (MR) to further evaluate associations highlighted in our atlas with triglyceride-rich very low density lipoprotein (VLDL) particles. For instance, both VLDL particle average diameter size and concentration were associated with the PGS for body mass index (BMI) (Beta=0.04, 95% CI=0.033 to 0.046, P<1x10-300 & Beta=0.012, 95% CI=0.006 to 0.019, P=2.7x104 respectively) and coronary heart disease (CHD) (Beta=0.026, 95% CI=0.019 to 0.032, P<1x10-300 & Beta=0.035, 95% CI=0.028 to 0.042, P<1x10-300 respectively). Conducting bi-directional MR suggested that the associations with average diameter of VLDL particles are likely attributed to a consequence of BMI and CHD liability as opposed to the size of VLDL particles having a causal influence on these outcomes (Supplementary Table 6). In contrast, MR analyses suggested that the concentration of VLDL particles increases risk of CHD (Beta=1.28 per 1-SD change in VLDL particle concentration, 95% CI=1.25 to 1.65, P=2.8x10-7) which may explain associations between the CHD PGS and this metabolic trait within our atlas.’

      and discussed in the discussion on page 21:

      ‘We likewise conducted bi-directional MR to demonstrate that associations between the CHD PGS and VLDL particle size likely reflect an effect of CHD liability on this metabolic trait. In contrast, the association between the CHD PGS and VLDL concentrations are likely attributed to the causal influence of this metabolic trait on CHD risk, suggesting that it is the concentration of these triglyceride-rich particles that are important in terms of the aetiology of CHD risk as opposed to their actual size. We envisage that findings from our atlas, as well as other ongoing efforts which leverage the large-scale NMR data within UKB, should facilitate further granular insight into lipoprotein lipid biology.’

      PGS construction: It's unclear how well the PGS work. Should the reader prefer the stringent or lenient PGS? Perhaps there could be some validation with traits that have decent sample sizes in UKBB. Was there any filtering to remove traits with few GWS hits, low sample sizes, or low SNP heritability as these are unlikely to produce useful PGSs?

      An example of validation was previously included for the chronic kidney disease PGS and its association with circulating creatinine, although this has now been removed due to the feedback you provided in your comments below. However, we have now provided the weights for all of the PGS included in our web atlas should users want to use these scores for prediction purposes (page 7):

      ‘The specific weights for clumped variants used in all PGS can be found at https://tinyurl.com/PGSweights.’

      On page 8 we have mentioned that in this work we have used a more lenient threshold to facilitate endeavours in a ‘reverse gear Mendelian randomization’ framework. However, the option to use the more stringent threshold remains an option for users interested in this as an alternative:

      ‘In this paper, we have discussed findings using PGS that were derived using the more lenient criteria (i.e., P<0.05 & r2<0.1), although all findings based on both thresholds can be found in the web atlas.’

      ‘Specifically, we believe our findings can facilitate a ‘reverse gear Mendelian randomization’ approach to disentangle whether associations likely reflect metabolic traits acting as a cause or consequence of disease risk (Holmes and Davey Smith, 2019) as illustrated using triglyceride-rich very low density lipoprotein (VLDL) particles in the next section.’

      We have not filtering based on other criteria such as the number as SNPs given that certain scores, despite only been constructed using few SNPs, may still provide useful to users. For example, our score for ‘Drinks per day’ based on the more stringent threshold (i.e. P<5x10-8) consists of only 6 SNPs. However, one of these is rs1229984, a missense variant located at the alcohol dehydrogenase ADH1B gene region and known to be a strong predictor of alcohol use (e.g. https://pubmed.ncbi.nlm.nih.gov/31745073/).

      Reviewer #2 (Public Review):

      The authors set out to create an atlas of associations between phenome-wide polygenic scores and circulating lipids, fatty acids, and metabolites. To do so, they utilize GWAS from 129 traits available in the OpenGWAS database to derive polygenic (risk) scores (PGS) along with the recently released NMR metabolomics data containing 249 biomarkers (and ratios) in ~120,000 UK Biobank participants. The authors create a publicly available web portal containing PGS to NMR biomarker associations:

      http://mrcieu.mrsoftware.org/metabolites_PGS_atlas/.

      The strength of this study is in the comprehensive nature of the atlas, containing associations for 129 traits phenome-wide, the large sample size of the UK Biobank NMR data, and the use of PGS for prioritising molecular traits for follow-up experiments, which is an emerging area of interest (International Common Disease Alliance, 2020; Ritchie et al., 2021a). To our knowledge this study is the first to explore this for circulating metabolites.

      In its current form the atlas has several limitations, which should be straightforward to address. Notably, results in the current atlas may be confounded by (1) technical variation in the NMR data (Ritchie et al., 2021b), and (2) major biological determinants of biomarker concentrations, including body mass index, fasting time, and statin usage.

      Firstly, thank you for the suggestion to use your ‘ukbnmr’ R package to help remove technical variations from the UK Biobank NMR metabolites data. We have applied it to remove outliers and variation in the individual data due to (1) the duration between sample preparation and sample measurement, (2) position of samples on shipment plates, (3) different equipment (spectrometers) used. This meant that we needed to re-run our entire analysis pipeline for this project from scratch to the updated dataset. Results do not appear to have drastically changed, although nonetheless we have updated results from all downstream analyses in our online web atlas using this updated dataset provided by ‘ukbnmr’.

      Secondly, the reviewer is correct that biological factors, such as body mass index (BMI) and statin usage, are indeed strongly correlated with metabolites levels. However, we are not able to adjust for such biological factors directly in our analyses, given that they are potential colliders in the causal relationship between diseases/traits and metabolites. Statin usage may be caused by both the high genetic liability to coronary artery disease as well as abnormal lipoprotein lipid levels. Likewise, obesity (and changes in BMI) may result from a high genetic predisposition to cardiometabolic disorders and disrupted metabolism. Thus, adjusting for statin usage and BMI will induce collider bias (https://jamanetwork.com/journals/jama/fullarticle/2790247), which creates spurious associations between the disease/trait PGS and metabolites.

      To better illustrate this issue, we have added additional text on page 14 to justify this study design decision as well as added a new figure (Figure 3) to help demonstrate this clearly to the readers. Fasting time on the other hand we believe is unlikely to act as a collider and was adjusted as a covariate in all linear regression models in this work. This is mentioned on page 25.

      …Further, association results for two (of the 129) PGSs, systolic blood pressure (SBP) and diastolic blood pressure (DBP), are invalid (vastly inflated) as the GWASs used to construct these PGSs included UK Biobank samples.

      Many thanks for your suggestion. We have now removed the SBP and DBP PGS from our atlas due to overlapping samples in UKB. Furthermore, our colleagues at the University of Bristol have notified us that the Glioma GWAS data obtained from the OpenGWAS platform was uploaded with incorrect effect alleles. This PGS has also been subsequently removed from the atlas. Additionally, we removed the Alzheimer’s disease (without APOE) PGS because the pleiotropic effect of lipid associated genes is now systematically examined using lipid gene excluded PGS.

      To demonstrate how one might use these PGS to NMR biomarker associations to prioritise (or deprioritise) findings for follow-up, the authors select a biomarker of interest, glycoprotein acetyls (GlycA), to perform bi-directional Mendelian randomization to orient the direction of causal effects between GlycA and traits of associated PGS. However, the conclusions of this analysis are hampered by the heterogeneous nature of the GlycA biomarker, which captures the levels of five proteins in circulation (Otvos et al., 2015; Ritchie et al., 2019), making it a difficult target to appropriately instrument for Mendelian randomization analysis. This, however, does not detract from the broader point the authors make: that PGS can help prioritize molecular traits for experimental follow-up.

      We have now conducted further sensitivity analyses to evaluate the genetically predicted effects of each of the five proteins in the reference you have provided. This is discussed on page 11:

      ‘We also conducted further sensitivity analyses given that the NMR signal of GlycA is a composite signal contributed by the glycan N-acetylglucosamine residues on five acute-phase proteins, including alpha1-acid glycoprotein, haptoglobin, alpha1-antitrypsin, alpha1-antichymotrypsin, and transferrin (Otvos et al., 2015). Using cis-acting plasma protein (where possible) and expression quantitative trait loci (pQTLs and eQTLs) as instrumental variables for these proteins (Supplementary Table 12) did not provide convincing evidence that they play a role in disease risk for associations between PGS and GlycA (Supplementary Table 13). The only effect estimate robust to multiple testing was found for higher genetically predicted alpha1-antitrypsin levels on gamma glutamyl transferase (GGT) levels (Beta=0.05 SD change in GGT per 1 SD increase in protein levels, 95% CI=0.03 to 0.07, FDR=3.6x10-3), although this was not replicated when using estimates of genetic associations with GGT levels from a larger GWAS conducted in the UK Biobank data (Beta=1.6x10-3, 95% CI=-6.9 x10-3 to 0.01, P=0.71). For details of pleiotropy robust analysis and replication results see Supplementary Table 14.’

      There are also several important limitations to the study which cannot be addressed, which the authors discuss appropriately in the paper. First, the NMR data does not provide a comprehensive view of the metabolome - it is heavily focused on lipids and fatty acids. Many small metabolites in circulation cannot be measured by NMR spectroscopy, and further insights must wait for data from molecular profiling efforts planned or underway in UK Biobank (e.g. mass spectrometry). Second, the authors restricted analysis to participants of European ancestries. This a pragmatic analysis choice given (1) the PGSs were derived from GWAS performed in European ancestries, (2) PGS associations are particularly susceptible to confounding from genetic stratification and differences in environment, and (3) the very small sample sizes for which NMR data is currently available in UK Biobank participants. Finally, although a large sample size, UK Biobank is not a random sample of the population: healthy adults are over-represented, meaning PGS to metabolite associations may be different in disease cases or less healthy individuals.

      Overall this study has strong potential, with straightforward to address limitations, and the resulting atlas will provide a useful characterisation of the relationships between NMR biomarkers and polygenic predisposition to various traits and diseases, which can be used by domain experts to prioritise biomarkers or traits for experimental follow-up.

      Reviewer #3 (Public Review):

      Fang et al. created an atlas for associations between the genetic liability of common risk factors or complex disorders and the abundance of small molecules as well as the characteristics of major apolipoproteins in blood. The whole study is well executed, and the statistical framework is sound. A clear strength of the study is the large array of common risk factors and disease analyzed by means of polygenic risk scores (PGS). Further, the development of an open access platform with appealing graphical display of study results is another strength of the work. Such a reference catalog can help to identify novel biomarkers for diseases and possible causative mechanisms. The authors further show, how such a systematic investigation can also help to distinguish cause from causation. For example, an inflammatory molecule readily measured by the NMR platform and strongly associated in observational studies, is likely to be a consequence rather than a cause for common complex diseases.

      However, in its current form, the study suffers from some weakness that would need to be addressed to improve the applicability of the 'atlas'. This includes a distinction of locus-specific versus real polygenic effects, that is, to what extent are findings for a PGS driven by strong single genetic variants that have been shown to have dramatic impact on small molecule concentrations in blood.

      Thank you for your suggestions to help refine our work. In line with this comment, we have repeated all analyses 1) after applying the ‘ukbnmr’ R package as recommending by reviewer #2 to remove technical variations and outliers and 2) conducted sensitivity analyses to remove an established list of lipid gene loci from PGS construction. Full results can be interrogated in the web atlas to evaluate whether PGS association may be driven by locus-specific effects at these regions, which may be particularly informative given the representation of lipoprotein lipid metabolites on the NMR panel. Findings are reported on page 19:

      ‘The polygenic nature of complex traits means that the inclusion of highly weighted pleiotropic genetic variants in PGS may introduce bias into genetic associations within our atlas. To provide insight into this issue, we constructed PGS excluding variants within the regions of the genome which encode the genes for 14 major regulators of NMR lipoprotein lipids signals which captured 75% of the gene-metabolite associations in the Finnish Metabolic Syndrome In Men (METSIM) cohort (Gallois et al., 2019). For details of these genes see Supplementary Table 5).

      For PGS with these lipid loci excluded, anthropometric traits such as waist-to-hip ratio (N=209), waist circumference (N=206) and body mass index (N=205) still provided strong evidence of association with the majority of metabolic measurements on the NMR panel based on multiple testing corrections. Elsewhere however, the Alzheimer’s disease PGS, which was associated with 60 metabolic traits robust to P<0.05/19 in the initial analysis including these lipid loci (Supplementary Table 17), provided no convincing evidence of association with the 249 circulating metabolites after excluding the lipid loci based on the same multiple testing threshold (Supplementary Table 18). Further inspection suggested that the likely explanation for this attenuation of evidence were due to variants located within the APOE locus which are recognised to exert their influence on phenotypic traits via horizontally pleiotropic pathways (Ferguson et al., 2020).’

      …Further, it is unclear how much NMR spectroscopy adds over and above established clinical biomarkers, such as LDL-cholesterol or total triglycerides. This is in particular important, since the authors do not adequately distinguish between small molecules, such as amino acids, and characteristics of lipoprotein particles, e.g., the cholesterol content of VLDL, LDL or HDL particles, the latter presenting the vast majority of measures provided by the NMR platform. Finally, the study would benefit from more intriguing or novel examples, how such an atlas could help to identify novel biomarkers or potential causal metabolites, or lipoprotein measures other than the long-established markers named in the manuscript, such as creatinine or lipoproteins.

      To address these comments, we have added a new example focusing on the granular measures of VLDL particles provided by the NMR data (on top of the examples listed at the start of the response to reviewer document), which as the review points out is one of its strengths of the measures generated by this platform over long-established biomarkers (page 21):

      ‘We likewise conducted bi-directional MR to demonstrate that associations between the CHD PGS and VLDL particle size likely reflect an effect of CHD liability on this metabolic trait. In contrast, the association between the CHD PGS and VLDL concentrations are likely attributed to the causal influence of this metabolic trait on CHD risk, suggesting that it is the concentration of these triglyceride-rich particles that are important in terms of the aetiology of CHD risk as opposed to their actual size. We envisage that findings from our atlas, as well as other ongoing efforts which leverage the large-scale NMR data within UKB, should facilitate further granular insight into lipoprotein lipid biology.’

    1. Author Response

      We appreciate the thoughtful and thorough critique provided by the two reviewers, and generally agree with their assessment. The revised submission will address the issues they raise. In particular, we agree that the framework of the paper should be broadened to include bacteria and the deep literature associated with coincidental selection.

    1. Author Response

      Evaluation Summary:

      The work by Volante et al. studied a new plasmid partition system, in which the authors discovered that four or more contiguous ParS sequence repeats are required to assemble a stable partitioning ParAB complex and to activate the ParA ATPase. The work reveals a new plasmid partitioning mechanism in which the mechanic property of DNA and its interaction with the partition complex may drive the directional movement of the plasmid.

      Thank you for the kind evaluation. But we wonder about the description of the pSM19035 partition system we studied here as “a new plasmid partition system”. This system itself is quite old. The editor might have meant “new” as a subject of a research, but plasmid partition systems involving RHH-ParB proteins have been studied by number of groups for some time, including the Alonso Lab, which has worked on the pSM19035 partition system number of years prior to our current collaboration for this paper. Therefore, we wonder if the term “new” is the most appropriate.

      Reviewer #1 (Public Review):

      This is a very thorough biochemical work that investigated the ParABS system in pSM19035 by Volante et al. Volante et al showed convincingly that a specific architecture of the centromere (parS) of pSM19035 is required to assemble a stable/functional partition complex. Minimally, four consecutive parS are required for the formation of partition complex, and to efficiently activate the ATPase activity of ParA. The work is very interesting, and the discovery will allow the community to compare and contrast to the more widespread/more investigated canonical chromosomal ParABS system (where ParB is a sliding CTPase protein clamp, and a single parS site is often sufficient to assemble a working partition complex). All the main conclusions in the abstract are justified and supported by biochemical data with appropriate controls. A proposed multistep mechanism of partition complex assembly and disassembly (summarized in Fig 6) is reasonable. Perhaps the only shortcoming of this work is that the team does not yet get to the bottom of why four consecutive parS are needed.

      Thank you for the kind evaluation. The last point is an important one. We would like to continue to test our current model to either obtain stronger supporting evidence or come up with better alternative model.

      *Reviewer #2 (Public Review):

      ParBs come in two variations, RHH and HTH. In this study, the authors examine the in vitro behavior of the RHH system, which is less studied. Two activities were carefully monitored; ATPase activation and ParA removal from DNA. The system is quite complex, but the authors have done a good job of examining parameter space. One question concerns the physiological relevance. Can this be assessed by uncoupling ParA/ParB expression (making it inducible with IPTG from the chromosome, for example) and testing plasmids with the various constructs?

      This is an excellent point; we agree this a shortcoming of the current study. As described in response to “Essential Revisions”, we very much wanted to include an experiment testing in vivo plasmid stability for different size parSpSM sites in this paper, and we put a significant effort. However, we encountered certain technical issues with the approach we tried, and we failed to obtain conclusive data in timely fashion before we run out of time. Although, we had preliminary data, which appeared to be consistent with the notion that shorter parS sites are non-functional and full-size parS sites are functional, the experiment had certain flaw, which we could not rectify immediately to our satisfaction. Therefore, we decided to postpone this part of the project and plan for broader physiological evaluation of the parSpSM sequence arrangements in near future. In the revision, we mentioned at the beginning of discussion that in vivo functional test of parSpSM site requirements still remains to be examined.

      The authors appear to suggest that the requirement for at least 4 ParB binding sites is due to the inability of ParBs of this type to spread inferring that for the ParB-HTH multiple ParBs bound to ParS are required. Has this been tested in this system?

      ParB spreading has been shown to be essential for the HTH-ParB to perform its role in partition function. We clarified the importance of HTH-ParB spreading for partition function on lines 44-45.

      In any event, another major difference between the two systems is that a peptide corresponding to the N-ter of ParB is sufficient to bind DNA indicating this type of ParB does not have to be bound to DNA to stimulate ParA. It would have been useful if the authors had commented on this.

      There seems to be a mistype here. “N-ter of ParB is sufficient to bind DNA indicating ……” is incorrect. Perhaps this was meant to be “N-ter of ParB is unable to bind DNA, indicating ……” This is not a qualitative difference between the HTH- and RHH-ParBs: the N-terminal ParA interacting peptides of HTH-ParBs also can activate their cognate ParA ATPase without parS DNA binding, and parS-dependency of ATPase activation for HTH-ParBs appears to be significantly less stringent compared to the case for RHH-ParB we report here. ParBpSM1-27 , which cannot bind parSpSM, could only stimulate ParApSM ATPase to at most 1/10 of the full size ParBpSM in the presence of active parSpSM. We clarified this on lines 156-157, and also added discussion about this contrast between the HTH- and RHH-ParBs and possible implications on lines 458-467.

      Reviewer #3 (Public Review):

      Drs. Volante, Alonso, and Mizuuchi presented a milestone experimental finding on how the distinct architecture of centromere (ParS) on bacterial plasmid drives the ParABS-mediated genome partition process. Rather than driven by cytoskeletal filament pushing or pulling as its eukaryotic counterpart, the genome partition in prokaryotes is demonstrated to operate as a burnt-bridge Brownian Ratchet, first put forward by the Mizuuchi group. To drive directed and persistent movement without linear motor proteins, this Brownian Ratchet requires two factors: 1) enough bonds (10s' to 100s') bridging the PC-bound ParB to the nucleoid-bound ParA to largely quench the diffusive motion of the PC, and 2) the PC-bound ParB 'kicks" off the nucleoid-bound ParA that can replenish the nucleoid only after a sufficient time delay, which rectifies the initial symmetry-breaking into a directed and persistent movement. Although the time delay in ParA replenishment is established as a common feature across different bacteria, the binding properties of PC-bound ParB vary greatly, which begs the question of how Brownian Ratcheting adapts to different cellular milieu to fulfill the functional fidelity.

      The finding in this work presented a new but important twist in the Brownian Ratchet paradigm. The authors showed that in the pSM19035 plasmid partition system, only four contiguous ParB-binding repeats in ParS are required for the ParA-ParB interactions that drive the PC partition. In other words, only four chemical bonds are needed for the PC partition. Crucially, the authors further demonstrated that distinct orientation (configuration?) of the ParB-binding repeats is required for this fidelity by their state-of-art biochemistry and reconstitution experiments. The authors then elaborated on a possible mechanism of how the smaller number of PC-bound ParB can drive directed and persistent PC movement by interacting with nucleoid ParA. If I understand correctly, in their proposed scheme, due to their specific orientation (configuration?), when two of the ParS-bound ParB molecules bind to the two nucleoid-bound ParA molecules there arises a torsional/distortional stress. Consequently, the thermal fluctuations preload the forming bonds, triggering the dissociation of the two ParB molecules from the PC. And the remaining PC-bound ParBs may kick off the ParAs that have a time delay in DNA-rebinding, while ParB molecules will replenish the ParS to initiate the next round. In this proposal, the key conceptual leap is that not only the substrate but the cargo remodels to underlie the Brownian Ratcheting.

      We thank the reviewer for kind evaluation of our work. The model proposed is highly speculative at this point. Despite it may appear rather detailed in order to account for the unexpected findings, we consider it only a working hypothesis to be revised or replaced by a better model in future. We thank for many useful suggestions, which we will follow in our revision.

    1. Author Response

      REVIEWER #1 (PUBLIC REVIEW):

      The study by Monterisi et al. reports that loss-of-function mutations in metabolic pathways do not necessarily have a negative impact on cancer growth. The authors suggest that small solutes transferred via gap junction channels formed between wild-type cells and cells express mutants defective in metabolic pathways rescue the metabolic-deficient cancer cells. Through the examination of multiple human cell lines with several advanced means to determine gap junction coupling, Cx26 was identified as a major connexin molecule involved in medicating gap junction coupling between colorectal cancer (CRC) cells. The gene mutations of three metabolic gene mutations were investigated for major metabolic function of the cell, pH regulation, glycolysis and mitochondrial function.

      Strengths: The paper tests a new hypothesis that the mutations that inactivate key metabolic pathways do not incur functional deficits in cancer cells expressing the mutants due to their gap junction coupling to wild type cells.

      From microarray data they identified multiple connexins expressed in various CRC cells. Several advanced analyses were used to assess gap junction coupling in CRC cells including fluorescence recovery after photobleaching (FRAP). The extent of permeability at steady state was evaluated using CellTracker dyes and coupling coefficients were determined. They also used flow-cytometry to study dye transfer, which will provide a quantitative, dynamic means for study cell coupling. The data showed that knocking down Cx26 could greatly reduce diffusive exchange in most of the CRC cells tested.

      The study focused on three metabolic genes, Na+/H+ exchanger NHE1, a regulator of intracellular pH, a glycolytic gene, ALDOA and mitochondrial respiration gene, NDUFS1. These genes were knocked out in the selected CRC cells highly expressing these genes. The co-culture studies were well executed with fluorescence-markers distinguishing the WT and knockout cells and well-defined readouts such as intracellular pH, media pH, glucose/lactate levels and mitochondrial O2 consumption and glycolytic acid.

      The experiments in general were well designed and conducted, and the data supported the conclusions. The paper is also logically written and figures were well presented providing clear graphic illustrations.

      Thank you for recognising the strengths and novelty of our findings.

      Weaknesses: Although the hypothesis is innovative, no clear justification is provided that illustrates the scenario representing the clinical situation. The remaining questions include: What kind of somatic mutations in cancer cells has little impact on their growth and progression?

      We have now added in vivo data (Fig 8) and revised the Introduction and Discussion to address this point. Briefly, the broader clinical relevance our findings relates to the notion of essential genes and their negative selection. We show that connexin-dependent coupling can rescue a genetic deficiency, provided the mutation-carrying cell can access wild-type neighbours for the missing function. This rescue effect is limited to processes that handle solutes that can pass via connexin channels, i.e. metabolic processes. As such, sporadic loss-of-function mutations in “essential genes” may not produce a functional deficit in human cancers. We demonstrate rescue extensively in vitro, and now in a xenograft model.

      We argue that our work can explain why certain metabolic genes are essential in vitro, but not in vivo. In monolayers of mutated cells, diffusion across gap junctions cannot rescue the mutant phenotype, because there is no wild-type cell available to supply the missing function. In contrast, mutations in vivo will arise sporadically and wild-type cells are typically available to couple onto the mutation-bearing cell, providing it with functional rescue. Thus, only in the former case would the lethality of essential genes emerge.

      Indeed, many notable studies have found genes of various metabolic pathways to be essential for growth in vitro. Such genes would be expected to undergo negative selection in vivo, but this is exceedingly rare according to multiple observations. By demonstrating metabolic rescue in co-cultures (i.e. a setting closer to the tumour) and (now) in xenografts, our work provides an explanation for this apparent paradox. Indeed, cells such as NDUFS1-negative SW1222 grow very, very slowly in culture compared to wild-type cells and require regular media changes to keep pH alkaline. However, coupling onto wild-type cells can rescue knock-out cells in vitro and in vivo. We argue that this finding explains why loss-of-function mutations in NDUFS1 (and similar genes) do not undergo negative selection in human tumours (despite in vitro predictions).

      The three proteins selected for this study were chosen to represent very distinct types of solute-handling processes. We illustrate our point in a (new) summary figure in Fig 8.

      What types of WT cells, within the tumour cells or with neighbouring normal cells? Whether the current experimental design closely recapitulates the scenario in vivo?

      Indeed, we find that stromal fibroblasts may also support cancer cells via gap junctions, as this is essentially the same concept (i.e. coupling onto a cell with wild-type genes). However, we feel that expanding our present submission to fibroblasts would make the volume of data exceeding large. Also, the methods we use for fibroblasts are different, and require a full manuscript on its own. For example, there is the issue of how to control for the radically different growth rates of fibroblasts and cancer cells. We chose co-cultures of WT and genetically altered CRC cells so that the co-cultures are of the same background, with just one element changing (i.e. the metabolite-handling gene). This makes our data easy to interpret, and thus strengthens our case. Our in vitro experiments were performed on monolayers, where cells can make contacts in 2D. In vivo, these contacts will spread in all dimensions, thus connectivity is likely to be even more significant. If anything, monolayers probably under-estimate the importance of connectivity, but this preparation is more accessible for studying cell-to-cell communication.

      We recognise the importance of adding in vivo data to firm our conclusions. To that end, we have analysed xenografts established from co-cultures of wild-type DLD1 and NDUFS1-KO SW1112 cells on one flank of a mouse, and Cx26-KO DLD1 and NDUFS1-KO SW1112 cells on the other flank. This experiment tested whether Cx26-dependent connections between mitochondrially-defective NDUFS1 KO SW1222 cells and respiring DLD1 cells (on left flank only) are able to stimulate growth of the former (GFP-tagged). Indeed, NDUFS1-deficient cells grew faster when rescued by Cx26-expressing DLD1 cells. In contrast, their growth decelerated when DLD1 cells were Cx26-negative. We include these experiments and their controls in Fig 8.

      The readouts for co-culturing for glycolytic ALDOA and NDUFS1 knockout are only cell mass, without determining the more relevant markers, glucose/lactate and mitochondrial O2 consumption and glycolytic acid production.

      Our readouts are two-fold: total biomass and the size of the genetically altered compartment of co-cultures (GFP). We can therefore follow the relative growth of KO cells, which is essential for describing their growth (dis)advantage. We appreciate other markers are informative. Indeed, we characterised KO and WT cells in terms of O2 consumption and acid production in Fig 7. However, it would not be possible to measure glucose consumption selectively in GFP-positive KO cells of a co-culture, as the assays available for this measure ensemble rates for the entire population of cells (e.g. in a single well). Nonetheless, we believe that biomass as a readout is highly relevant to cancer, and we hope the reviewer concurs with us.

      The study needs to include cells without functional gap junctions like the characterized negative control RKO cells.

      This is an excellent suggestion, and we have added data for RKO cells to several figures. As expected, these do not form a syncytium and cannot rescue genetic defects in co-cultured cells. New data are shown in Fig 3G-H, Fig 6-supp2 and Fig 7H, adding to existing RKO controls in Fig 2A/B. Briefly, RKO cells do not exchange CellTracker dyes in monolayers (Fig 3F/G), cannot rescue cells that are ALDOA-deficient (Fig 6-supp2), and cannot rescue NDUFS1-deficient cells (Fig 7H). We also added Cx26-KO DLD1 cells to the CellTracker experiments in Fig 3.

      REVIEWER #2 (PUBLIC REVIEW):

      This paper is a logical extension of the 50 year-old concept of the "bystander effect" in tumours, wherein the effects of anti-tumour chemotherapeutics extend beyond the cells that take them up due to spread through gap junctions to adjacent cells. In this case, however, the authors have creatively realized that the reverse might also occur, and that tumour cells with otherwise fatal mutations in essential metabolic pathways can be rescued by their neighbours through passage of the missing metabolites through gap junctions. This can explain why mutations in other critical pathways, such as protein synthesis and transporters, are selected against in rapidly growing tumours, but others in equally critical pathways of glycolysis, electron transport, etc. are not, despite these genes having been demonstrated to be essential in in vitro KO studies (where all cells in the plate have the critical gene knocked out). A series of elegant experiments are used to test this proposal in several colorectal cancer (CRC) cell lines using three examples - pH regulation (defective Na+/H+ exchanger NHE1), glycolysis (defective Aldolase A (ALDOA)) and oxidative phosphorylation (defective Complex 1 - NDUFS1).

      Thank you for these positive comments. We have added key references to the bystander effect in the Introduction, and explain how our findings build on these milestones.

      The authors first determine the levels of different Cx proteins expressed in each cell line, and determine that for most Cx26 and 31 are dominant, although come lines have a subset of cells with high Cx43 expression. They then use Cell Tracker Green to pre-label cells and use FRAP as a means to measure how well the cell population is coupled. This is a useful measurement but is significantly over-interpreted by the authors as a "permeability" in uM/min. This is not really a permeability, which requires knowledge of the concentration gradient of the permeant species, relative cell volumes, etc. Rather it is a rate of fluorescent recovery that is presumably correlated with, but not quantitatively related to, levels of coupling.

      Thank you for this comment. We would like to explain why we believe our FRAP experiments are able to estimate permeability in units of um/s. The rate of recovery of a solute in a cell following its “destruction” (here, photobleaching) is given as follows:

      dCcell/dt = p⋅P(Ccell-Csurround) … [1]

      Where subscripts ‘cell’ and ‘surround’ refer to the cell and its neighbours. P is the permeability of the barrier between these two compartments, and p is the ratio of the surface area of the barrier (i.e. membrane) to volume of the bleached cell. Within a “bleached” cell, we measure fluorescence.

      Since fluorescence (F) is proportional to concentration (C), we can substitute:

      C = α⋅F

      where α is a constant of proportionality. Thus, the rate of recovery (L.H.S. of equation [1]) becomes:

      dC/dt = d(α⋅F)/dt = α⋅dF/dt … [2]

      And the R.H.S. of equation [1] is re-written as: P⋅(Ccell-Csurround) = P⋅(α⋅Fcell-α⋅Fsurround) = α⋅P⋅ (Fcell-Fsurround) … [3]

      Putting [2] and [3] together,

      dFcell/dt = p⋅P⋅(Fcell-Fsurround)

      Prior to photobleaching, there are no (net) gradients, thus initial Fcell and Fsurround are equal.

      Thus, we can re-write the equation in terms of normalized fluorescence (f=F/F0):

      dfcell/dt = p⋅P⋅(fcell-fsurround)

      P can therefore be expressed as:

      P = dfcell/dt / (p⋅ (fcell-fsurround))

      Here, dfcell/dt is measured from the fluorescence recovery time course and fcell-fsurround is measured experimentally (in fact, bleaching in the cell is set to 50%, thus this takes the value of 0.5 by default). We can approximate the monolayer as a network of cuboidal cells. The cell’s volume is thus ‘area’ times ‘height’, and the cell’s surface (at which it contacts its neighbors) is the ‘cell’s perimeter’ times ‘height’. Thus, for the bleached cell,

      p = perimeter × height / area × height = perimeter / area.

      The perimeter and area can be measured from the acquired fluorescence images. Thus, we can describe permeability using data obtained from image stacks. We appreciate that this method makes certain geometrical approximations, but we believe these are not unreasonable. We explain the assumptions and calculations in Appendix 1. More information about the method is published by us in https://pubmed.ncbi.nlm.nih.gov/28368405/. Of course, we accept that these calculations are less accurate than, say, electrical conductance measurements, and to that end, we added a note of caution to the main text.

      This fluorescent recovery is shown to be sensitive to siRNA KO of Cx expression, but strangely its reduction is only correlated with KD of Cx26 in the 5 cell lines examined. KD of Cx43 (in LOVO cells) and Cx31 in all 5 cell lines had no effect or in some cases seemed to increase the rate of recovery (DLD1 and SNU1235). This is a notable finding, yet the authors choose to completely ignore it and continue with Cx26 KDs in studies of specific metabolite transfers. Some discussion should be included as to why KD pf these Cxs has no effect or causes an apparent increase in coupling of the cells.

      The effectiveness of GJB2 knockdown in ablating ensemble connectivity is most likely a reflection that Cx26 is likely the dominant conductance inherited from the parent epithelium. Other isoforms are expressed, but in most CRCs cells, these do not produce major coupling, as GJB2 knockdown was sufficient to uncouple many CRCs. These observations justify our choice of connexin for studying metabolic rescue functionally. These findings are also consistent with the good correlation between ensemble connectivity and GJB2 levels.

      Our data show a trend that GJB3 (Cx31) KD in DLD1 and SNU1235 cells and of GJA1 (Cx43) KD in LOVO cells produce an increase in coupling. However, when analysed by hierarchical (nested) analysis, these effects are not statistically significant, and for that reason we did not elaborate on these trends in the original submission. The apparent increase in conductivity in cells treated with GJA1 or GJB3 siRNA could reflect a compensatory response to the ablation of a specific message, closer contacts between cells allowing Cx26 to strengthen its connections, or a shift away from heterotypic channels involving Cx26 and Cx31/Cx43, towards homotypic Cx26. We did not see any consistent change in the intimacy of cell-cell contacts. We now performed western blots for connexins to probe for compensatory changes (see Fig 2-supp1). In comparison to wild-type cells, expression of Cx31 was not changed by GJB2 (Cx26) or GJA1 (Cx43) knockdown in DLD1 cells. GJB2 KO DLD1 cells did not induce expression of the other major isoform, Cx43. Also, in DLD1 cells, KD of GJB3 or GJA1 did not substantially change Cx26 levels. Similarly, KD of GJB3 did not affect Cx43 levels. In GJA1-high C10 cells, KD of GJB3 did not alter Cx43 levels, although a small decrease was observed with GJB2 KD on Cx43. Also in C10 cells, KD of GJB2 and GJA1 did not induce an increase in Cx31 levels.

      We agree that complex interactions between connexin genes are possible, but we feel that a molecular study of Cx gene regulation would fall outside the scope of the present manuscript. Our findings point to a prominent role of Cx26 in metabolic rescue, and to strengthen this point, we show that Cx26-negative cells that express other connexins (e.g. C10 cells or NCIH747 cells) cannot rescue ALDOA-deficient counterparts or NDUFS1-KO SW1222 cells (new data in Fig 6 and 7). We share the Reviewer’s enthusiasm about the interplay between connexins and will endeavour to study this further in the near future.

      Rather than just focus on acute transfer of dye between cells, the authors develop a system using 50/50 mixes of cells labelled with two junctionally permeant dyes and measured the degree of mixing at equilibrium (48 hours). This is presented as a "coupling coefficient", but how it is calculated, and its significance is not well described, and does not correlate with the historical use of this term in the literature. Nonetheless, the studies do seem to demonstrate a good degree of equilibration, although it would have been informative to determine of the cells that do not exchange dyes express Connexins. To document that this equilibration requires gap junctions, the authors employ low density cultures, which significantly decrease dye exchange. However, in at least one cell line (SW1222) dye exchange is only reduced by <50%, indicating a very high background to this assay. This is not addressed.

      Thank you for these comments. We agree that our description of the method was inadequate, and we have added the necessary information in Appendix 1. We have also added information about actual confluency and restructured the figure. We also added new data for RKO cells and DLD1-Cx26 KO cells, i.e. two negative controls (Fig 3H). We pondered about the best name for describing the numerical output of the method, and concluded that “coupling coefficient” is reasonable (provided we improve our description of it) because it is dimensionless, and like many coefficients has a finite range (here, zero to one). With further explanation, we hope this terminology is acceptable. The issue with SW1222 cells is that both low- and high-seeding densities produce clusters of cells. Even though overall cell numbers were different in high and low seeded cultures, actual connectivity within “islands” of cells remained high, hence their similar coupling coefficients (see Fig 3E). Indeed, this CRC line is unusual in this behaviour, so we only present data from the higher density.

      The most compelling part of the study is the use of reporters to directly demonstrate a role of Cx26 coupling of cells to rescue cells with mutations of the three genes mentioned above when mixed with normal neighbours. This case was most convincing in the cases of ALDOA and NDUFS1, with the data for the pH regulation requiring more explanation for full understanding of the data shown (e.g. Figs 7 G and H).

      Thank you for this comment. Studies of pHi regulation provide a unique opportunity to obtain single-cell resolution (unlike e.g. glycolytic assays). We took advantage of this, and therefore the figure on pHi presents a greater depth of analysis. Nonetheless, we agree the pH data need further explanation. We have expanded the text, and also added a bar plot of data on day 7, which now provides a clearer illustration of the rescue effect. This form of presentation was also adopted for ALDOA and NDUFS1 experiments in the subsequent figures.

      Overall, the study does a credible job of demonstrating that Cx26 coupling of CRC cells serves to rescue cells with mutations in critically necessary metabolic pathways, presumably due to transfer of metabolites from surrounding wt cells. However, some of the results indicate this is not a simple process where all connexins behave similarly, and some effort should be made to investigate if Cx31 and 43, which do not seem to play the same roles in maintaining cell coupling as Cx26, also play any role in such metabolic rescue.

      Thank you for this comment. We have addressed this by selecting three additional cell lines for study: RKO – a cell line with no major Cx expression; C10 – a cell line that expresses Cx43, but very low levels of Cx26; NCIH747 – a cell line that expresses Cx31, but low levels of Cx26. These additional experiments cover lines that are GJB2 (Cx26)-low/negative to test whether metabolic rescue is best achieved with Cx26. Our new data show that these cells are unable to rescue metabolic defects (new data provided in Fig 6H/I, Fig 7H, and Fig 6-supp2). These findings strengthen our case for a major role of Cx26, at least in CRC networks. Indeed, recent analyses by Robert Gatenby and colleagues have shown that mutations in GJB2 (Cx26) are exceedingly rare in cancer (a property not shown for other connexins genes). This is interpreted to mean that Cx26 plays a particularly prominent role, ostensibly for metabolic rescue.

      REVIEWER #3 (PUBLIC REVIEW):

      Strengths of the study include that it appears to be a careful and well thought out set of experiments. The analysis and treatment of multiplexed data is also sophisticated. For the most part, the work is clearly and logically described, as well as well illustrated. In general, the authors achieved their experimental goals, and the methods while not entirely new, do provide new twists and augmentations that should be useful to the field. A general weakness is that this is not entirely a new story. Instead, it is a variant of one of the oldest concepts in the field of gap junction biology i.e. "Metabolic cooperation". The term "Metabolic cooperation" (i.e., as mediated by gap junctions) was not mentioned by the authors, but it is a long-established and foundational concept in the field. Indeed, in a classic paper by Gilula and colleagues published in 1972, the experimental approach used was similar to that of the study in hand. These earlier authors showed how transformed cell lines with deficiencies in hypoxanthine metabolism can be "rescued" by "metabolic cooperation" in co-culture with metabolically competent cells via passing a gap junctional permeant molecule. This and other relevant papers were not cited. More importantly, the extant literature places the onus on the authors to explain and convince reviewers why this study is more than an incremental step.

      We apologise for not quoting these important and classical references. We have now added these works to our reference list (quoted in Introduction). At the time of these seminal discoveries, Loewenstein and colleagues made a case that connexins are absent in cancer, and this belief persisted for many decades. More recently, the role of gap junctions in cancers has garnered attention. With new gene manipulations (e.g. CRISPR/Cas9) and imaging techniques and improved xenografting, it is now possible to precisely study the impact of GJ on cancer metabolism. Moreover, we have a wide panel of cancer cell lines to study, and identify the prominent role of Cx26. We highlight that our study is the first to offer a mechanistic explanation for the absence of negative selection in cancer, a phenomenon which was not known in the 1970s. To strengthen our novelty, we now add in vivo data to Fig 8 that confirm in vitro findings.

    1. Author Response

      Reviewer #1 (Public Review):

      1. “The major weakness of the study is that with the interpretation of the results. The changes in tractography, behavior and TBM are what would be expected following lesions of the neostriatum”

      We appreciate this comment and would like to offer clarification. We respectfully disagree that the pattern of results presented in this manuscript are akin to what would be expected following striatal lesions. In NHPs, striatal lesions typically cause more extreme phenotypes than what we observed in our 85Q-treated animals. In macaques, bilateral putamen lesions can result in phenotypes that include seizures, inappetence, hyper-aggression, and other severe features.  This strongly impacts clinical scores and can make it unfeasible to care for the animals for multiple years. For these reasons, recent NHP HD lesion models have used only unilateral putamen lesions coupled with bilateral caudate lesions to model HD (as in the recent paper by Lavisse et al, 2019). Of additional relevance is that even the cognitive effects of these striatal lesions are more severe than what we observed in our 85Q-treated animals: for example, Lavisse reported reduced performance on similar “prefrontal” cognitive tasks by ~50%, whereas our AAV-HTT model exhibited only ~10% reductions in working memory. This mild, but significant, change in cognitive performance and motor function seen in our 85Q animals is much more akin to that which is observed in the early stages of HD.

      2. “The results have been interpreted as showing a progressive model, although evidence that there is progression is limited”...“begs the question as to whether or not the 85Q-lesioned monkeys would recover to a level similar to the 10Q animals if left for another 12 months”

      At the request of Reviewer 1, we added an additional 30-month timepoint and re-ran all of the analyses to include these new data.  All of the behavioral and neuroimaging data were re-analyzed with this final timepoint included (see Lines 125-141, 146-163, 173-194, 228-255, 270-294, 314-345). Additionally, due to the unidirectional nature of our hypothesis and on the advice of our bio-statistician, we applied one-tailed tests to the planned comparisons in this revision. To address the Reviewer’s point directly: 85Q-treated animals showed minimal evidence of functional recovery between the 20- and 30-months timepoints on the behavior tasks. In particular, working memory deficits measured with SDR and fine motor skills measured with Lifesaver Retrieval did not improve between 20- and 30-months (Figure 1C and 1F). Additionally, neurological rating scores in group 85Q remained consistently elevated (in the 5-7 range) between the 20- and 30-month timepoint. Taken together, we feel confident that these results do not show evidence of any significant functional recovery, out to 2.5 years (30-months). In terms of the longitudinal trajectories of the behavioral measures, we appreciate the Reviewer’s feedback regarding the use of the term ‘progressive’ and have tempered our language appropriately. We removed all instances of the word progressive/progressed except in the context of the motor rating scores, which show a significant Group x Timepoint interaction and demonstrate a clear progression.

      3. “The whole manuscript is written as though this is a genetically-relevant progressive model of HD. But the animals are normal, and so there is no genetic context relevant to HD”

      We thank Reviewer 1 for this comment. We recognize that viral-based animal models of HD, including the model characterized here, are not as genetically similar to the human condition compared to some of the other modeling approaches currently under investigation (ex. knock-in and gene editing). Limitations of the AAV-based HTT85Q model include: 1) vector packaging restrictions that prohibit expression of full-length HTT, 2) the use of a CAG promoter vs. an endogenous promoter that leads to overexpression of the transgene, 3) the use of cDNA versus genomic DNA excludes introns and therefore lacks the ability to produce alternatively spliced variants (ex, Exon 1), 4) the use of a mixed CAG-CAA repeat may preclude the possibility of somatic instability and 5) expression of HTT that is restricted to specific brain regions and cell types. All of these important limitations have been added to the discussion section in this re-submission (Lines 503-517).

      Despite these limitations, we feel that this AAV2:AAV2.retro-HTT85Q based model has some features that make it genetically-relevant to human HD including: 1) the expression of an N-terminal fragment of human HTT (N171), 2) the N-terminal fragment bears a pathological PolyQ expansion (85Q), 3) the expressed mHTT fragment forms neuronal aggregates that can be detected in the nucleus, 4) mHTT fragments are expressed in many of the same brain regions where aggregates are detected in human HD cases, with both regional and sub-regional specificity (ex. higher expression in anterior vs posterior cortical regions and expression primarily limited to deep cortical layers V/VI) and 5) expression of mHTT fragments in these regions leads to many of the same pathological and behavioral changes observed in HD patients.  Importantly, expression of the N-terminal portion of HTT allows for the evaluation of HTT lowering therapeutics that target first 3 exons (ASOs, miRNAs, zinc finger repressors, CRISPR-based therapies, etc), which cannot be evaluated in lesion-based models.

      4. “The authors state in the Abstract that the injection resulted in "robust expression of mutant huntingtin in the caudate and putamen". These data are not in the manuscript.”

      Evidence of mHTT expression in the caudate and putamen, as well as several other brain regions, via immunohistochemical and immunofluorescent staining is now included in the manuscript. Please see additions to the methods, results and discussion sections regarding these findings, as well as a new Figure 5, (see Lines 347-376, 756-788). Additionally, further details regarding an associated PET imaging study in this same cohort of animals using a mHTT aggregate-binding radioligand has been added to the discussion, (see Lines 437-443). Please also see response #13 (below).

      5. “The authors chose to use a fragment of the HD gene, with a very long repeat that is seen only in juvenile patients”

      Comments regarding the need to use a fragment of the HTT gene, versus the full-length gene, due to packaging constraints of the viral vector, were added to the discussion in the context of limitations (Lines 503-517), and also discussed above in response #3.  The choice to use a CAG repeat length of 85 (83 pure CAGs followed by a CAA/CAG cassette -see response #17 below for further details), was based off previous studies wherein similar CAG repeat lengths were used to create animal models of HD over the past several decades. Interestingly, while CAG repeat lengths in patients with adult-onset HD typically range from ~40-60, longer repeat lengths (>60) are typically required in animal models of HD to elicit pathological and behavioral manifestations of disease: transgenic, knock-in and viral vector-based rodent models (ranging from 72-150 CAGs), OVT73 transgenic sheep model (73 CAGs), transgenic and knock-in minipig models (ranging from 85-150 CAGs), transgenic and viral vector-based macaque models (ranging from 82-103 CAGs). See Ramaswamy et al, 2007 and Howland et al, 2021 for thorough reviews of these models.

      6. “For their cognitive testing, the authors used a task (delayed non-match to sample) that measures object recognition and familiarity. Before surgery, only 11/17 of the animals were successfully trained to complete this task. It is not clear how useful the data are when only 64% of the animals can be included.”

      We appreciate the Reviewer’s concerns and have decided to conservatively remove this data from the revised manuscript.

      7. “It is not clear how this monkey model will be useful for developing either disease biomarkers or therapeutic strategies for HD (as stated in the abstract)”. “The authors state that they hope the model will become a widely used resource. This seems an unlikely scenario, given the limitations of the current study and the challenges associated with using monkeys. They say that a major advantage of their technique is being able to generate large numbers of monkeys. But this is not a relevant argument if the usefulness of the model to investigate HD is not proven.”

      We thank the reviewer for requesting clarification on these important points. We believe that this model will be useful for developing therapeutic strategies because the HTT85Q-treated macaques express mutant HTT, along with HTT aggregates, in several key brain regions that are affected in human HD, along with undergoing regional gray matter atrophy and white matter microstructural alterations that correlate well with behavioral dysfunction. Studies currently under review elsewhere also show reduced dopamine neurotransmission and regional hypometabolism via PET imaging in this model. Together, or individually, these imaging and behavioral changes can serve as outcome measures when screening potential therapies. Possible therapeutic interventions that are amenable to screening in this model are included in the discussion.

      Regarding biomarker development, we have already engaged in PET imaging biomarker development in this model in collaboration with the CHDI foundation and the Molecular Imaging Center at the University of Antwerp, evaluating a candidate radioligand that binds to aggregated mHTT. See #13 below for a more detailed description of this PET study, including recent data showing its ability to bind to aggregated species of mHTT in several brain regions in this same cohort of HTT85Q macaques that correspond to 2B4 and em48 IHC staining (a manuscript describing these results has been prepared for submission and the PDF is included for the reviewers to peruse).

      The authors do envision this AAV-based macaque model becoming a resource for the HD research community. While this model does have certain limitations (now detailed in the Discussion), we respectfully assert that all of the HD animal models, both small and large, each have their own important limitations to consider when deciding on which to use to screen therapeutics. Selecting a specific animal model based on the individual scientific questions being asked will be required, and employing a combination of models may be an even more prudent strategy.

      While NHP research presents unique challenges (cost, housing requirements and recent challenges in availability, among them), we believe that viral vector-based NHP models could be more accessible to the HD research community compared to some of the other established large animal models; in that they may able to be readily created at contract research organizations (CROs), in addition to various academic research institutions. There are now many CROs that exist in the US, and elsewhere around the world, that have developed specific expertise in MRI-guided, intracranial delivery of AAVs into the NHP brain (including the caudate and putamen), in the context of assessing therapeutic interventions for a variety of neurological disorders (HD, PD, and MSA, among others). Most of these same CROs also have expertise in NHP imaging (MRI/DTI) and behavioral assessments across multiple domains. It seems feasible that AAV-mediated HD macaques could be produced in sufficient numbers to appropriately power therapeutic studies, using the outcome measures established in the current study.

      Reviewer #2 (Public Review):

      1. “The major weaknesses are the manner in which the data is presented”

      We replotted all of the figures with improved color palettes and larger font sizes to make them easier to read. We also added additional details throughout the results section to aid in clarity and improve readability.

      2. “The authors would benefit from talking more about their model in the introduction and including references to some key points. For example, there has been critical new data in the field showing the importance of poly (CAG) in disease, not necessarily poly(Q), and the community will want to know (and not be required to look up), the nature of the transgene. Is it a pure CAG repeat? A mixed repeat? If it is pure, do they see or could they measure somatic expansion in the various brain regions impacted? How does that data match the phenotypes seen? Since this is a transgene, there is no possibility for the exon1/intron1 splicing variant to appear - how does this impact their interpretation”

      Further details regarding the transgene have been added to the Viral Vector Section of the Methods (Lines 531-550). The repeat is not pure and contains a single CAA interruption. The glutamine encoded repeat for HTT85Q contained 83 pure CAG repeats, followed by a single CAA/CAG cassette, while the glutamine encoded sequence for HTT10Q contained 8 pure CAG repeats followed by a single CAA/CAG cassette. Both constructs contained a proline stretch distal to the glutamine repeat in the following allelic conformation where QT represents the total glutamine length:

      HTT85Q: QT\=85, (CAG)83(CAACAG)1(CCGCCA)1(CCG)7(CCT)2

      HTT10Q: QT\=10, (CAG)8(CAACAG)1(CCGCCA)1(CCG)7(CCT)2

      There are plans to probe for somatic expansion in various brain regions, including the caudate and putamen, as well as several distal cortical regions. That analysis is ongoing and not in the scope of the present manuscript; however, these analyses are now mentioned in the discussion section (lines 540-560), as well as a discussion on the ability to either remove or duplicate the CAA/CAG cassette to potentially increase or decrease the rate of disease progression, respectively, based on the work of Ciosi et al. 2019. Additionally, Reviewer 2 is correct in that the lack of intronic sequences in the transgene precludes the formation of splicing variants, such as the exon1/intron1 variant, which we know is pathological based on the work of Bates et al. This drawback has been added to the discussion, along with other limitations of this viral vector-based model (Lines 503-517).

      3. “What about RAN translations? Is RAN translation noted at all in this over-expression model? How does that contribute (or not) to the progressive phenotype they see in their NHPs?”

      We are also curious regarding the assessment of toxic protein products from RAN translation of the expanded repeat sequence in this model. These studies are planned, and the results of these assays will be included in a future manuscript describing other ongoing post-mortem evaluations in this model.

    1. Author Respose

      Reviewer #1 (Public Review):

      This manuscript reports a new genetically encoded neuronal silencer BoNT-C. They show that it fully blocks neurotransmission in two classes of Drosophila motor neurons (Is and 1b; tonic and phasic, respectively). They also update a GCaMP postsynaptic reporter SynapGCaMP to express GCaMP8f instead of 6f. They selectively silence 1b or 1s neurons to disambiguate the neurotransmission properties of each neuron. Finally, they show that silencing either 1b or 1s neurons does not induce heterosynaptic structural or functional plasticity (only neuron ablation triggers plasticity). The data are convincing. The new silencing tool will be widely used.

      We thank this reviewer for his positive assessment of our study and for highlighting the utility of the new silencing tool presented in this study.

      Reviewer #2 (Public Review):

      The conclusions of this paper are properly supported by the provided data.

      Overall this work opens a new window to examine novel aspects of heterosynaptic structural and functional plasticity.

      We also thank this reviewer for his positive assessment of our study and for putting the importance of our findings in context.

      Reviewer #3 (Public Review):

      The strength of the manuscript by Han et al. is the comprehensive characterization of BoNT-C, showing that it truly abolishes all evoked and mini responses without structural alteration of the NMJ. Based on this, the authors then show that ablation of all neurotransmission in either Ib or Is does not cause any compensatory changes (neither functional nor structural) in the 'other' (i.e. looking at Is when silencing Ib or looking at Ib when silencing Is).

      The weakness of the manuscript lies in the modest gain over the previous work. Specifically, Aponte-Santiago had already shown that many parameters are not changed (in Ib when Is is perturbed, or in Is when Ib is perturbed), including that 'the Is terminal failed to show functional or structural changes following loss of the coinnervating Ib input' (quote from 2020 paper). Hence, the only major difference is that Han et al now show that Ib also does not really change when Is is silenced. Aponte-Santiago also clearly showed a ~50% EJP reduction when Is or Ib are perturbed alone, and adding these two equals wild type. The highly emphasized finding of Han et al. that (quote) ' composite values of Is and Ib neurotransmission can be fully recapitulated by isolated physiology from each input' quite obviously follows from the one key finding that one does not affect the other, as mentioned above in the strengths. The wording is a bit odd, but really adding Is (with Ib perturbed) and Ib (with Is perturbed) inputs is really not adding much over either the main finding nor the previous work.

      We thank this reviewer for his/her/their assessment of our study and for highlighting the strengths in characterizing the impact of BoNT-C expression at the NMJ. We also understand and appreciate the criticisms raised. It is important to note from the outset that the motivation and central goal of this study was not primarily to mechanistically dissect heterosynaptic plasticity between tonic and phasic motor inputs at the Drosophila NMJ. Rather, it was to develop an approach that would, for the first time, enable accurate isolation of complete neurotransmission from entire MN-Is or MN-Ib NMJs (both miniature and evoked transmission). By the reviewer’s own admission, we were entirely successful at achieving this central goal in our comprehensive characterization of BoNT-C.

      Next, the reviewer raises the valid question about whether this achievement is a significant advance over previous work, and discusses recent experimental findings regarding heterosynaptic plasticity at the fly NMJ. We want to emphasize here that having a tool that is capable, for the first time, of accurately discriminating complete transmission from Is vs Ib alone is a major advance, one that it is not clear the reviewer sufficiently appreciates. As summarized in Fig. 1, no previous attempts have been successful in accurately isolating synaptic transmission between Is vs Ib synapses. In particular, no previous approach was capable of isolating miniature activity from Is vs Ib, and as we show in our manuscript, miniature events exhibit major differences between the two inputs. Thus, without isolating miniature transmission, one cannot know baseline synaptic function in Is vs Ib nor whether heterosynaptic functional plasticity has been induced. Further, we detail major confounds with some of the previous approaches the reviewer alludes to in prior studies, including selective optogenetic stimulation.

      Finally, the reviewer discusses at length recent findings regarding heterosynaptic plasticity and questions whether the new insights revealed by BoNT-C provides a sufficient advance. In particular, the reviewer refers to previous work published in 2020 and 2021, where important initial insights into Is vs Ib structure and transmission after differential manipulations to either input was reported. The reviewer appears to believe that it was settled in these studies that no heterosynaptic functional plasticity was induced.

      However, a critical point that the reviewer appears not to appreciate is that while the two previous studies on heterosynaptic plasticity at the Drosophila NMJ were able to assess structural plasticity (AponteSantiago et al., 2020; Wang et al., 2021), no accurate or quantitative conclusions can be made about heterosynaptic functional plasticity from these studies. This is due to the authors not knowing what baseline synaptic function is at Is vs Ib (miniature frequency, miniature amplitude, and evoked transmission), so that in their manipulations they cannot accurately determine whether any functional changes are observed after their manipulations. Further complicating the interpretation of the previous studies is that at the muscle 1 NMJ (2020 study), like the muscle 4 NMJ (2021 study), ~30% of these NMJs fail to be innervated by a Is input in wild-type larvae. This major confound makes it difficult to know how or whether adaptive plasticity is induced in wild-type NMJs with or without Is innervation (since, interestingly, evoked transmission does not appear to change in wild-type m1 or m5 NMJs with or without a Is input), and then to determine whether any heterosynaptic plasticity is induced. Indeed, we have also struggled with how to accurately determine whether synaptic function changes compared to baseline throughout our studies at earlier stages, despite the fact that the muscle 6/7 NMJ we use in our study does not suffer from the variable Is innervation confounds observed at muscle 4 (Wang et al., 2021) and muscle 1 (Aponte-Santiago et al., 2020).

      Respectfully, we contend that the only way one can accurately and quantitatively determine baseline synaptic transmission (miniature amplitude, frequency, evoked, quantal content), and whether any changes are observed following manipulations to Is or Ib, is to fully and accurately recapitulate wild type (blended Is+Ib) neurotransmission from isolated Is vs Ib transmission. This is why we believe the data shown in Fig. 7 (and also Fig. S7 in the revised manuscript) is so important. It is true that numerous previous studies established relative and qualitative differences between Is vs Ib (miniature events are larger at Is relative to Ib, Is drives larger depolarization in response to single synaptic stimulation over Ib, etc). However, in no case did previous studies accurately assess baseline Is vs Ib synaptic function from entire inputs, and therefore could not conclude with certainty whether heterosynaptic functional plasticity was induced.

      On a different but somewhat similar topic, UAS-BoNT-C is not a new tool. I am a bit put off by the wording ' We have developed a botulinum neurotoxin, BoNT-C...'. More on this and the way the previous BoNT-C paper (Backhaus et al., 2016) is cited in the detail comments below in the recommendations for the authors.

      We understand these points raised by the reviewer. Our BoNT-C transgenic line is indeed a new tool, the only one in which synaptic transmission has ever been electrophysiologically characterized and shown to completely silence synaptic transmission in Drosophila. That being said, in retrospect, we can appreciate that the term “developed” might imply a level of innovation that reasonable people can disagree about. We have therefore elected to change the apparently offensive wording to “We have employed a botulinum neurotoxin, BoNT-C…” in the abstract of the revised manuscript.

      Additionally, the manuscript does not really dive into an analysis of phasic versus tonic functions (that's just a correlation with the Is and Ib dominant modes of function).

      We absolutely agree that selective silencing by BoNT-C now enables a rigorous study of tonic vs phasic neurotransmission at MN-Is vs MN-Ib NMJs, but that in the current manuscript we have not focused on this interesting question. We have adopted the convention the field has used to classify MN-Is and MN-Ib subtypes based on their apparent firing modes as “phasic” vs “tonic”, but like previous studies, we have not analyzed these functional distinctions on a deeper level. Although the focus of the current manuscript was to establish the properties of BoNT-C and highlight its utility as a tool for the field, we are now in the process of preparing an entirely new manuscript focused on just this reviewer’s question about the differences in tonic vs phasic synaptic physiology. This eight-figure manuscript will be entitled “Electrophysiological properties and nanoscale distinctions that define tonic vs phasic glutamatergic synapses” and is focused on the central question raised by the reviewer - how and why synaptic transmission differs between tonic vs phasic inputs. While this interesting question is outside the scope of the current manuscript, we will submit this new manuscript within the next few months, which is based on new experimental insights now enabled by selective BoNT-C silencing established in the current manuscript.

      Finally, since the authors show that loss of Is or Ib function does not cause any change in the other, we are left to wonder what actually DOES cause heterosynaptic plasticity. TNT or rpr DO cause some heterosynaptic plasticity and they also DO cause some structural changes - but whether the structural changes themselves are important here remains unclear. Substantial progress would have been to take the starting point that BoNT-C does not cause heterosynaptic plasticity, and then identify the signal that does (is it morphology? or signaling between Is and Ib? Or with the muscle?).

      We certainly agree with the reviewer that understanding how heterosynaptic plasticity is induced is an important question and worthy of additional investigation. As stated above, the focus of our current study was to establish the tool, BoNT-C, that will now enable a variety of fascinating and important future studies, both at understanding how and why synaptic strength differs between tonic vs phasic synapses and also how heterosynaptic plasticity signaling occurs at the NMJ. It required substantial time and experimental effort to establish that BoNT-C works to cleanly silence transmission without inducing structural and functional plasticity in the current manuscript (Figures 1-7 and several supplemental figures). Respectfully, we believe it is unreasonable to expect all of this data to be relegated to a “starting point” to then go on and probe heterosynaptic plasticity in more detail, all compressed into a single paper.

      It appears this reviewer is particularly interested in heterosynaptic plasticity, which we agree is a fascinating topic. First, we should clarify that in our experiments, TNT expression does NOT induce any heterosynaptic structural or functional plasticity (see Figures 6 and Table S2), at least in our studies at m6/7, m12/13, and m4 NMJs. Rather, TNT expression alters synaptic structure in the neuron in which it is expressed (“intrinsic structural plasticity”, Fig. 6), but does not induce any changes to the convergent input. Hence, the only evidence for actual heterosynaptic plasticity is the rather minor adaptations in synaptic structure and function observed following ablation of Is motor inputs (Fig. 6 and 8).

      In addition to the important insights revealed by BoNT-C in accurately distinguishing tonic vs phasic transmission outlined above, it appears that the reviewer does not fully appreciate the mechanistic constraints that the new BoNT-C tool reveals about heterosynaptic signaling. We would therefore like to highlight the key insights our study has revealed specifically about heterosynaptic plasticity. First, we show that at the muscle 6/7 NMJ, loss of MN-Ib completely eliminates Is innervation – this was not the finding reported in the 2020 study (Ib ablation was not reported in the 2021 study). Rather, AponteSantiago et al. 2020 reported that elimination of Ib did not trigger compensatory changes in active zone or bouton numbers of the Is input, no were compensatory increases in the Is EPSP reported. This may be due to the confounding variable Is innervation at the muscle 1 and muscle 4 NMJs used in the previous studies. Second, to what extent miniature transmission changes after manipulating activity from Is vs Ib could not be accurately assessed in previous studies because spontaneous activity persists following TNT expression as does innervation following rpr.hid expression. Third, and perhaps most important, our study is the only one that can demonstrate no heterosynaptic functional plasticity is induced by the physical presence but functional silencing of neurotransmitter release between tonic vs phasic inputs at NMJs with consistent innervation by both Is and Ib inputs.

      It is clear to us now that we did not do a sufficient job of emphasizing these advances our study has now revealed about the baseline and heterosynaptic relationships between Is vs Ib. We have added additional details throughout the revised manuscript to ensure these insights are highlighted in an effort for the reader to better appreciate the importance of this study.

      Overall, while an initial reading of the manuscript sounded rather exciting, a deeper analysis of the work in context of the literature of the last few years diminishes my enthusiasm for the novelty and progress provided.

      We have responded to the major criticisms raised by this reviewer above and hope that he/she/they can more fully appreciate the importance of the new tool we developed, the impact it will have on the field in opening new studies on tonic vs. phasic transmission, and establishing the rules of heterosynaptic plasticity between convergent tonic and phasic inputs on common targets.

    1. Author Response

      Reviewer #1 (Public Review):

      It should also be noted that their immunohistochemical studies of human fetal tissue for TBX5 and PTK7 are not convincing. There appears to be widespread staining of multiple cell types, suggesting either very broad expression of both genes or poor specificity of the primary antibodies.

      We appreciate the reviewer’s comment that the immunohistochemistry staining does not provide definitive evidence for the functional importance of TBX5 and PTK7 in PUV, however these images do confirm that the proteins are ‘in the right place at the right time’ during normal human urinary tract development. We have updated the discussion on page 19, line 441-445 to emphasise this. To further support a putative role for these proteins in urinary tract development we have added additional images from a second human embryo at the same gestation which confirms these distinct patterns of staining (Figure 8 – figure supplement 1 on page 14, lines 313-317). Even if these proteins can also be detected in other tissues or cell types, this does not detract from this idea, as in other locations the proteins may have redundant or different roles. 

      PUVs have not been described as a clinical manifestation of disease associated with mutations of either gene in humans.

      The reviewer is correct that rare variants affecting TBX5 and PTK7 have not previously been associated with PUV. They have however been associated with other developmental anomalies (as stated in the discussion on page 18, line 408-411 and page 19, line 434-437) confirming a clear role in embryonic development for both these genes.

      The fact that rare variant association testing did not identify an increased burden of rare, likely deleterious variants in these two genes (although with limited power in this cohort) suggests that PUV is not driven by ultra-rare, highly penetrant alleles in these genes. However, the identification of common and low-frequency variants using GWAS suggests a complex mode of inheritance for PUV likely in combination with maternal_/in utero_ factors. As with other complex traits, these signals provide potential insights into the underlying biology of this disease as opposed to the diagnostic implications of conventional monogenic gene discovery associated with purely Mendelian conditions. A paragraph on the Mendelian/complex trait implications of the findings of the study has been incorporated into the discussion (page 21-22, line 594-502).       

      Discuss how variants in either gene or in the patterns of structural variants that they found associated with PUV intersect with sex to result in this exclusively male condition.

      The fact that PUV is a uniquely male disease is most likely the result of differences in urethra and bladder development and length differences in urethra between males and females. Sex hormones may also potentially result in tissue-specific differences in gene expression (Ober, Loisel, and Gilad 2008). We have added a paragraph into the discussion to clarify this (page 20, line 454-463) as well as clarified the results of the chromosome X and sex-specific analyses (page 7, lines 149-155; see also Reviewer 2, point 5 below) as suggested. 

      Reviewer #2 (Public Review):

      Major:

      1. The replication study is problematic given that different genotyping methods are used for cases (targeted KASP) versus controls (WGS). This may introduce differential bias. Moreover, the ancestry of the control cohort (UK-based) does not seem to be well matched to the cases (predominantly German and Polish), and the lack of genome-wide data for the cases precludes proper adjustment for population stratification. The case-control design is also imbalanced in the replication study. The authors should reconsider their replication strategy to include a more balanced cohort with ancestry-matched controls and uniform genotyping. As an alternative, genome-wide genotyping of the replication case cohort would significantly enhance the study and should be considered.

      Many thanks to the reviewer for their valuable comments regarding the replication study case-control cohort. While different sequencing technologies were used to compare allele counts at the lead variants in the replication study (KASP genotyping for cases vs WGS for controls), both techniques exhibit > 99.5% accuracy and are subjected to variant level quality control metrics. Only individuals with reliably called genotypes were included in the replication analysis. This has been clarified in the methods section (page 30, line 693).

      We were able to obtain genome-wide genotyping data for 204 of the 395 European cases in the replication cohort. While (despite sustained effort on our part) we were unable to analyze this data jointly with the control cohort in the 100KGP due to enforced limitations on data sharing, we were able to demonstrate similar ancestry of the replication study cases and controls:  we performed PCA on a set of ~80,000 overlapping autosomal, high-quality, LD-pruned variants with MAF > 10% and projected the cases and controls separately onto (the same) data from the 1000 Genomes Project (Phase 3) labelled by ‘population’ (Figure 5). This clearly demonstrates that both cohorts have homogeneous European ancestry, as stated now in the results (page 8, lines 166-168).

      We note with thanks the reviewer’s comments regarding the case-control imbalance in the replication study which can sometimes result in a type 1 error. To address this, the case control ratio was reduced from 1:27 to 1:10.5 by including only the 4,151 male controls from the cancer cohort of the 100KGP. The results remained significant for both lead variants and have been updated in the manuscript (page 8, line 162-176; Table 2).

      When the number of controls was reduced to 500 males (a case:control ratio of 1:1.3), rs10774740 (TBX5 locus) remained significant demonstrating that case-control imbalance was not driving the observed signal (P\=9.9x10-3; OR 0.77; 95% CI 0.63-0.94). rs144171242 (PTK7 locus) however did not reach significance due to insufficient power (P\=0.06; OR 2.24; 95% CI 0.93-5.36). For a rare variant such as rs144171242 (MAF ~ 1%), a replication study with 500 controls is only powered to detect association with large effect size (OR > 3.5). A case:control ratio of ~1:10 is therefore optimal to maximize power to detect association, while minimizing unnecessary noise from excess controls. This has been added to the results section of the manuscript (page 8-9, lines 178-184).

      2. I am reassured that the TBX5 signal remains genome-wide significant in European-only analysis. However, the signal at PTK7 appears much less robust - it has borderline statistical significance (especially given that the authors test for all rare and common variants across the genome) and is represented by a single variant with a relatively rare risk allele that is differentially distributed by ancestry. Therefore, I would like to see more information for this specific signal:

      Information on the depth of coverage and the quality of the top variant

      This has been incorporated into the manuscript for both lead variants (Page 7, lines 142-145). For rs144171242 at the PTK7 locus, the meanDP was 29.34 and the meanGQ was 75.59.

      Information if the top PTK7 variant remain genome-wide significant after application of genomic control. Of note, the calculation of genomic inflation is dependent on sample size - lambda of 1.05 may represent an underestimate given low power of the cohort, and this point deserves at least a comment. Some methods correcting lambda for sample size have been proposed, and the authors should consider applying these methods.

      We appreciate the reviewer’s comments that the value of lambda may be affected by sample size and have added a comment to this in the manuscript (Page 7, line 136-137). Despite extensive searching, we were unable to find a recent published example of how to correct lambda for sample size and would be grateful if the reviewer could suggest a reference for this.

      To answer the reviewer’s specific question, application of genomic control to the lead variant at PTK7 results in P\=4.37x10-8 which remains below the threshold for conventional genome-wide significance. However, while the genomic inflation factor provides a reasonable indication of possible confounding by population structure, there are recognized limitations to applying it as a corrective factor as it assumes that all variants are confounded i.e., the same correction is applied irrespective of differences in population allele frequency which can be insufficient for some variants and lead to a loss of power in others. Furthermore, in addition to sample size, lambda can vary with heritability and disease prevalence (Yang et al. 2011) and its use for correction can therefore be too conservative and reduce power to detect significant associations. In this manuscript we therefore chose to use the mixed model approach (as part of SAIGE – detailed in the methods on page 28, lines 647-648), which has largely superseded older methods such as genomic control, to robustly correct for both population structure and cryptic relatedness and minimize false positive associations (Shin and Lee 2015).

      This locus requires more robust replication as discussed above. If more robust replication study is not possible, additional functional studies could provide more evidence in support of this locus.

      Please refer to point 1 regarding the revised and more robust evidence of replication. 

      3. There is no validation of sensitivity and specificity of SV detection by variant size or type (e.g. inversions, deletions, duplications). Also, since burden differences are not replicated independently, the authors should stress the exploratory nature of these analyses.

      We appreciate the reviewer’s comment that there is no independent validation of SV detection (e.g., by microarray or long-read sequencing) and this was reported as a limitation of our study in the discussion (page 22-23, line 520-524). However, one of the main strengths of this study is the use of clinical-grade WGS data where all samples have been sequenced on the same platform and undergone variant calling using the same bioinformatics pipeline. This essentially eliminates confounding due to differences in data generation and processing and the sensitivity and specificity of SV detection will therefore be the same for both cases and controls.

      We agree with the reviewer that the SV analyses have not yet been replicated independently and, as they suggest, have stressed the exploratory nature of the findings in the discussion (page 21, line 491-493).

      In the discussion (especially second paragraph, but also throughout), the authors overemphasize multi-ancestry nature of their study. The reality is that the included non-Europeans are very small in numbers (18 SAS cases, 11 AFR cases, and 14 admixed cases). I would suggest for the authors to specifically state these case counts and make it clear that expanded efforts to recruit non-Europeans are still needed given these very low numbers.

      We appreciate the reviewer’s comment about the overemphasis on the multi-ancestry nature of the study and the small absolute numbers of individuals included, however as a proportion of the cohort, a third of the cases are non-European: 14% are of South Asian ancestry, 8% are of African ancestry and 11% are admixed. This breakdown comprises a greater proportion of non-white European ancestry individuals than the UK as a whole (DOI: 10.5257/census/aggregate-2001-2), where the discovery cohort was based. This provides evidence that our study eliminates at least some of the Euro-centric bias present in existing genetic and genomic literature, at least as far as the UK population is concerned. Clearly, global studies fairly representing all populations would be needed to address this issue perfectly. The case counts were reported in Table 1 but we have now referenced the low absolute numbers and included the reviewer’s suggestion about expanding efforts to recruit non-European populations in the main text (page 22, line 518-520). We have also edited paragraph two of the discussion in response to the reviewer’s comments (page 17, line 387-398).   

      Supplemental figure 2 -provide case-control counts in each ancestral group (Y axis).

      These have been added to the figure legend of Figure 6 – supplemental figure 4 (previously Figure 5 - supplemental figure 2).

      Supplemental figure 3 is misleading since allelic frequencies in the cases are pooled and are not available individually for all depicted populations.

      Figure 5 - supplemental figure 3 has been removed and replaced by Figure 6 – supplemental figure 3 to show only the individual case, control and gnomAD AF by ancestry for AFR, SAS and EUR population groups instead of using the pooled allele frequencies.

      5. I did not see details of chr. X analysis. This is important given that the case group involves only Males and control group involves both Males and Females. Also, please explain how sex was used as a fixed effect (as stated in the methods) given that the case cohort is 100% male.

      We thank the reviewer for their insightful comments. Sex was used as a covariate (or fixed effect) to control for the anatomical differences in development of the urethra (and in utero hormonal changes) between the sexes in the control cohort (clarified in the methods, page 28, lines 651-653). Given the PheWAS findings (page 13, line 292-297) reveal an association between the lead variant near TBX5 and female genital prolapse and urinary incontinence, this suggests that while women do not develop PUV (due to differences in urethral development) they may manifest other lower urinary tract phenotypes. In theory, removing the female individuals from the control cohort should therefore strengthen the association as the signal would not be diluted by ‘affected’ women (i.e., those with potentially unknown lower urinary tract phenotypes). We tested this by performing a sex-specific male-only GWAS and found that the strength of association at both lead variants increased. The results of this have been added to the manuscript (page 7, line 149-155).

      The results of the chromosome X rare variant analysis are shown on the Manhattan plot (Figure 9), with no significant genes identified. We have added chromosome X to the mixed-ancestry and European GWAS as suggested (with no significant results) and the Manhattan and Q-Q plots have been updated in Figure 2 and Figure 6. The number of analyzed variants in each analysis has also been updated accordingly.

    1. Author Response

      Reviewer #2 (Public Review):

      Feeding behaviour in C. elegans has been extensively studied over decades. Several methods  of measuring feeding exist, but none can directly measure both pumping and locomotion  behaviour in freely-moving worm populations. The authors have developed a new  imaging-based method for automated detection of pharyngeal pumping events in freely moving

      C. elegans populations, and can thus simultaneously measure pumping and locomotion  behaviour in tens of worms, at a single-worm, single-pump resolution that is not possible with  previous methods. This user-friendly method can be applied to several research directions, such  as large-scale foraging, behavioural coordination, and high throughput screening.

      The authors designed their new method to be broadly applicable and user-friendly, for easy  adaptation in other research labs. However, adding direct evidence to show that "the method is  relatively insensitive to the optical instrument used" will better support this claim of wider  application.

      We appreciate the reviewer’s suggestion to show evidence that our method will also work on  data acquired on different microscopes. We now present data obtained on a second  epi-fluorescent microscope, which was downscaled and analyzed in Fig. 1H-J.

      The authors carefully benchmarked their new method against expert annotations and existing  results from previous methods, to both validate their method and reveal additional advantages.  They also assessed potential pitfalls of the method such as by examining the effect of  fluorescence imaging on the behavioural outcome, albeit only at the timescale of minutes. The  effect of longer-term fluorescence imaging should be further explored, which is relevant for  large-scale foraging experiments that the authors discussed. It could be helpful to determine the  maximum total exposure for the method to still be valid, both in terms of pump detection (which  could be sensitive to photobleaching) and behavioural modulation (which could be sensitive to  higher phototoxicity).

      We thank the reviewer for this comment. In response to their comment and related comments  by the other reviewers, we have provided bleaching curves and evidence of long-term imaging  to show the potential of the methods for longer scale assays. We found that with our illumination  intensity (see methods), bleaching was significant at a time scale of ~1h. We then added  triggered illumination and could extend the recording time to ~5 h (Methods). Additionally, we  perform a supplementary control for viability of worms exposed to continuous light (not  triggered) for 5 hrs. We do not observe any apparent phototoxic effect.

      Overall, the manuscript is well-written and the results are clearly presented both in terms of  statistics and interpretation. Methodological details are well-documented and openly accessible.

      We thank the reviewer for their positive view of our work and their appreciation for our efforts to  document both data and software.

      Reviewer #3 (Public Review):

      In this manuscript, the authors present a method for simultaneous assessment of pharyngeal  pumping (feeding) and locomotion in many C. elegans simultaneously. In this technique,  imaging of the fluorescent labeled pharynx provides a measure of velocity and pumping rate,  through analysis of the spatial variations in fluorescence.

      The technique is clearly described, well-validated, and yields some novel results. It has the  advantage that it can be performed using microscopes found in many C. elegans laboratories.

      We appreciate that the reviewer recognizes the wide applicability of the method across many C.  elegans  laboratories.

      Some limitations of the method include its reliance on fluorescence imaging, which is a  hindrance to genetic analysis, computational intensiveness, and phototoxic effects of  fluorescence excitation that are not fully explored in the manuscript.

      The authors show the utility of their method by assessing pharyngeal pumping and motor  behavior (1) during development, (2) in the presence or absence of food, and (3) in the  presence of two mutations affecting feeding.

      Although I understand these are proof-of-principle demonstrations, I still came away feeling  underwhelmed by these examples. I did not see any results here that could not have been  obtained fairly easily with conventional techniques.

      We appreciate the constructive criticism of the reviewer and highlight in the revised version the  fact that using conventional techniques such studies would require tens of hours of experiment  time. We would like to emphasize the comparisons in Table 1 where we show other methods  and their current limitations. Obtaining a dataset such as in Figure 3 which comprises a total of  34 worm-hours of pumping observation from unrestrained animals is to our knowledge currently  impractical with competing methods. We would like to remind the reviewer that, using our  method we were able to reveal bimodal distributions within a population as illustrated, for  instance, in Fig. 3F, 4B, and 4F. These observations are not possible when the single worm  resolution is not accessible or when large statistics are not feasible as it happens with previous  methods.

      Given these limitations, I feel the method's eventual impact in the field will be relatively small.

      In this study, we present a method allowing performing behavioral studies on worm populations  at high throughput and reduced costs. Such a technique opens the door to many laboratories  that can not do EPG recordings or microfluidics due to the technical difficulties, or that want to  study animals in their normal plate context. We also would like to emphasize that there are already more than 1500 strains containing myo-2  promoter transgene available on CGC, which  would be amenable to our imaging approach. These transgenic strains form broad classes of  interest, such as thermotolerance, ER stress resistance, aging and neural-circuit specific genes.

      Pharyngeal pumping has also been used as a read-out for pharmacological screens, for  example, bacteria pre-loaded with pharmacological agents are tested for their effect on  pharyngeal pumping rate. Pharaglow offers a high-throughput and sensitive method to measure  the pumping rate. This will benefit the field who use C. elegans  pumping for pharmacological  screens, and pave the way for the researchers who plan to use but are hindered by existing  techniques.

    1. Author Response

      Reviewer #1 (Public Review):

      7) Can the primary cells in Figure 2E and AML#1 and AML#2 be studied for mTORC1 activity by Western, as in 2D?

      For reasons that we do not understand, we have been unable to effectively culture primary FLT3-ITD AMLs, despite being able to culture most other AMLs for weeks. This issue has prevented us from being able to perform biochemical analyses of FLT3-ITD AMLs in response to FLT3 inhibition.

      8) Additional genetic information should be provided if possible for the primary AML cells - what other mutations in addition to FLT3 were present? Were there any mTOR pathway alterations?

      We provided the other mutations of AML#1 sample (NPM1 mutation) in the section METHODS-Therapeutic modeling in mice, as well as Figure legends 2E and 3D. There were no evident alterations in the mTOR pathway (beyond the FLT3-ITD mutation).

    1. Author Response:

      We thank the reviewers for their thoughtful critiques and helpful suggestions for how to improve our manuscript. Described below is our response clarifying a number of issues raised by the reviewers.

      We agree with the reviewer that we cannot definitively conclude that the first division chromosome segregation defects and the later mid-blastula transition CI-induced defects are the result of distinct mechanisms. In fact, we raise this possibility in the discussion. However, our finding that the CI phenotype induces a temporally and developmentally deferred chromosome segregation defects in the late blastoderm divisions (in addition to the well-studied first division defect) alters the established view of the CI phenotype and must be taken into account when considering mechanisms of CI. Our current view is that the distinct early and late defects could be caused by either 1) a common mechanism (possibly a chromosome mark/defect inherited through the early blastoderm divisions causing segregation defects in the late blastoderm divisions) or 2) distinct early and late mechanisms that do not strictly “depend” upon one another. We have clarified this point in the revised manuscript.

      We disagree with the reviewer that this result is to be expected given previous studies. In D. simulans, a small percentage of embryos derived from the CI cross hatch. These embryos are thought to have bypassed the first division defect. It is not obvious why there must be late defects in these embryos that “escape” early CI-induced defects and subsequently hatch. Previous studies interpreted embryos that exhibit late division errors as those that have lost their entire paternal complement of chromosomes as a result of strong CI-induced defects during the first mitotic division and develop as maternal haploids. These studies, including transgene- induced CI, have focused primarily on embryos that have undergone the first mitotic division embryonic defects. To the best of our knowledge, no group has thoroughly examined embryos that progress normally through the pre-cortical cycles 2-9 as performed in this manuscript. Thus, it was entirely unexpected that these embryos would exhibit the mitotic defects during the late blastoderm divisions and the MBT. We discuss how this finding requires modified current models for the mechanisms of CI.

      Regarding the comment that “the primary claim of the paper that later-stage embryos die for different reasons than early-stage embryos,” we make no such claim. In fact, we provide evidence that the failure to hatch (late embryonic lethality) is, at least in part, due to haploid development—a direct result of the first division CI defect. The focus of our studies are those CI-derived embryos that progress normally, maintain the normal complement of chromosomes through the first division, and exhibit chromosome segregation errors during the late blastoderm divisions. We do not know the fate of these embryos, and previous studies have demonstrated that embryos suffering extensive late blastoderm segregation errors are able to hatch (Sullivan, 1990, Development 110:311-323). We have clarified these points in the discussion.

      While we agree that transgenic tools have proven invaluable in the study of CI, they are not appropriate for these studies. The purpose of our study was to undertake an unbiased re-examination of the CI phenotype. Of necessity, the transgenic studies rely on exogenous host promoters rather than the natural endogenous Wolbachia/Prophage promoters. Thus, while informative, it is unlikely the that the transgenic alleles would capture all of the complexities and nuance of the CI phenotype. In addition, the transgenic studies, of which we are aware, have only interrogated a single pair of the CI-inducing genes, while the Wolbachia genome contains both Cid and Cin CI-associated gene pairs and possibly other yet-to-be-identified CI/Rescue genes.

      Our unbiased re-examination of the CI phenotype induced by W. riverside in D. simulans identified a previously unsuspected temporally and developmentally distinct set of CI-induced defects that occur during and after the mid-blastula transition. This finding must be taken into account when considering the mechanisms that cause CI. In our revisions, we clarify the above points and qualify our statements to appropriately interpret our results in context of the nuances and uncertainties of CI and early Drosophila embryogenesis.

    1. Author Response:

      Reviewer #3 (Public Review):

      The authors revealed the novel role of the DLL-4-Notch1-NICD signaling axis played in platelet activation, aggregation, and thrombus formation. They firstly confirmed the expression of Notch1 and DLL-4 in human platelets and demonstrated both Notch1 and DLL-4 were upregulated in response to thrombin stimulation. Further, they confirmed the exposure of human platelets with DLL-4 would lead to γ-secretase mediated NICD (a calpain substrate) release. Stimulating platelets with DLL-4 alone triggered platelet activation measured by integrin αIIbβ3 activation, P-selectin translocation, granule release, enhanced platelet-neutrophil and platelet-monocyte interactions, intracellular calcium mobilization, PEVs release, phosphorylation of cytosolic proteins, and PI3K and PKC activation. In addition, Susheel N. Chaurasia et al. showed that when platelets were stimulated with DLL-4 and low-dose thrombin, the Notch1 signaling can operate in a juxtacrine manner to potentiate low dose thrombin mediate platelet activation. When the DLL-4-Notch1-NICD signaling axis was blocked by γ-secretase inhibitors, the platelets responding to stimulation were attenuated, and the arterial thrombosis in mice was impaired.

      This study by Susheel N. Chaurasia et al. was carefully designed and used multiple approaches to test their hypothesis. Their research raised the potential of targeting the DLL-4-Notch1-NICD signaling axis for anti-platelet and anti-thrombotic therapies. Interestingly, compared to thrombin, a potent physiological platelet agonist, the signaling cascade triggered by DLL-4 was relatively weak. Given that the upregulation of DLL-4 and Notch1 happened in response to thrombin stimulation, how much DLL-4 mediated signaling could contribute to in vivo platelet activation in the presence of thrombin is questionable. This could potentially limit the application of targeting Notch1 as an anti-thrombotic therapy. Further, the authors showed that Notch1 signaling could operate in a juxtacrine manner to potentiate low dose thrombin mediate platelet activation, which means the DLL-4 mediated platelet signaling can act as an accelerator of early-stage hemostasis. Again, inhibition of Notch1 may slow down the hemostasis process. But given the fact that there are other platelet agonists (ADP, collagen...) existing simultaneously, blocking Notch1 signaling may not have a strong anti-platelet effect.

      We concur with the Public Reviewer that, further study is needed to delineate extent of contribution of DLL-4 signaling in thrombin-activated platelets. However, it is now amply clear that Notch signaling plays a central role in development of thrombinactivated phenotype of platelets. Further, DLL-4-Notch1 interaction on surfaces of adjacent platelets within the thrombus reinforces platelet-platelet aggregate formation. This is further reflected from significant inhibition of thrombus formation in vivo in presence of DAPT in a mouse model of intravital thrombosis. Given that there is a lot of redundancy in stimulation of platelets employing different physiological agonists (ADP, collagen, thrombin etc.), none of the present-day drugs is fully capable of effective platelet inhibition due to parallel signaling pathways. Thus, discovery of Notch signaling and its seminal role in platelet activation could explain redundancy associated with anti-platelet drugs, and Notch inhibition could complement with existing anti-platelet regimen in evoking effective and complete platelet inhibition.