    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      General Statements:

      We were very pleased with, and appreciative of, the reviewers’ comments and constructive suggestions for improving the manuscript. In response to their suggestions, we have added new text to better emphasize the importance of the question, the novelty of our approach, the significance of the results, and the potential for future discovery.

      To summarize our key findings, we have identified 3,500 instances where, despite their shared ancestry, only one of two paralogous proteins undergoes a specific post-translational modification. By comparing adjoining sequences across 1012 isolates of the same yeast species, we determined that sequence conservation near sites of modification is greater than at sites that are not modified. We postulate that these differences in sequence are partly responsible for the differences in post-translational modifications, and that differences in modification allow duplicated proteins to be differentially regulated. These differences may account for their retention after 100M years of evolution.

      Our analysis is clearly distinct from earlier investigations. In particular, we use new and substantially larger proteomics datasets reporting multiple types of post-translational modifications, new tools to analyze protein structure (AlphaFold), as well as new and expanded protein interactome datasets. Perhaps most importantly, we rely entirely on in-species sequence conservation data, with particular emphasis on duplicated proteins. Finally, we developed a custom algorithm (CoSMoS.c.) and web site that quantifies sequence conservation, in an automated fashion, across all 1012 unique strain isolates.
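
To make the idea of quantifying in-species conservation near a modification site concrete, here is a minimal Python sketch. It scores a window around a site by the fraction of isolates that match the reference residue at each position, then averages over the window. This is not the CoSMoS.c. scoring scheme, which is not specified in this reply. The function name `window_conservation`, the `flank` parameter, and the toy sequences are assumptions made purely for illustration.

```python
def window_conservation(reference, isolates, site, flank=3):
    """Mean per-position identity to the reference in a window around `site`.

    reference: reference protein sequence (str)
    isolates:  list of pre-aligned isolate sequences, same length as reference
    site:      0-based index of the (potentially) modified residue
    flank:     residues considered on each side (hypothetical default)
    """
    if not isolates:
        raise ValueError("need at least one isolate sequence")
    start = max(0, site - flank)
    end = min(len(reference), site + flank + 1)
    per_position = []
    for pos in range(start, end):
        # Fraction of isolates carrying the reference residue at this position.
        matches = sum(1 for seq in isolates if seq[pos] == reference[pos])
        per_position.append(matches / len(isolates))
    return sum(per_position) / len(per_position)


# Toy data (invented sequences, not taken from any isolate collection).
reference = "MKTSPLRSAE"
isolates = [
    "MKTSPLRSAE",
    "MKTSPLRSAE",
    "MKTSPLKSAE",
    "MKTAPLRSAE",
]
score_near_site = window_conservation(reference, isolates, site=4, flank=1)
```

Computing the same score for the equivalent position in the paralog would give the two values being compared for each differentially modified pair.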

      We propose that in-species comparisons of paralogs will prove to be more reliable than cross-species comparisons of orthologous proteins and/or in-species comparisons of non-homologous proteins. Comparison of paralogs is powerful because they are likely to have similar structures and functions, due to their shared evolutionary origin. Comparison within a single species is powerful because it avoids non-biological sources of uncertainty, such as potential alignment errors and any accompanying structural differences. Thus, by comparing unique modifications in closely-related gene products and across closely-related strain isolates, investigators using CoSMoS.c. will be better able to predict new enzyme-substrate relationships, identify new motifs for post-translational modifications, and prioritize mechanistic investigations of those modifications.

      All of the reviewers asked that we explain the motivation for the design choice, compare our design with those used in earlier studies, add new controls for the effects of protein abundance, and provide examples of how our novel approach may be useful to investigators who study post-translational modifications. We are pleased to report that we were able to address all of these issues with revised text, additional references, two new control experiments, and real-world examples of individual paralog-paralog comparisons that have been useful in the past.

      Finally, we have changed the title to: Differential modification of protein paralogs reveals conserved sequence determinants of post-translational modification

      And we have changed the running title to: In-species evolution of protein modification sites

      Reply to the Reviewers:

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): *

      Summary: This paper reports bioinformatics analysis of population variation in PTM sites in paralogs from the yeast whole-genome-duplication. If I understand it correctly, the main finding is that modified sites show less population variation than paralogous unmodified sites. The results are largely in line with what is expected based on previous studies, though the authors do not present their results in that context.

      Major comments:

      1. The study benefits from two clever design choices:

      First, comparison of sites between paralogs is a very powerful test for an evolutionary hypothesis because paralogous sites are expected to have relatively similar structural context. Second, use of within species polymorphism data is much less susceptible to alignment errors that can be an issue for longer evolutionary comparisons.

      However, these design choices are not discussed or motivated by the authors. Nor are they compared to the designs of previous studies. Examples of previous studies (PMID: 22588506, PMID: 21273632, PMID: 20594336, PMID: 24465218, PMID: 22889910, PMID: 20368267, PMID: 28054638)** *

      We were very pleased with, and appreciative of, the reviewer’s comments and constructive suggestions for improving the manuscript. We have added nearly all references suggested by the reviewer, as well as new text describing the central findings of these papers, as follows:

      “Most importantly, and in contrast with previous studies, we restricted our analysis to modified and unmodified pairs of paralogous proteins. This represents a very powerful test for the hypothesis because paralogs have a shared evolutionary history and are expected to have similar secondary structures. Moreover, the use of within-species polymorphism data is much less susceptible to the alignment errors that often occur with longer evolutionary comparisons.”

      and

      “Our analysis is clearly distinct from - and complementary to - earlier investigations of post-translational modifications in yeasts. … Our analysis builds on these foundational studies, by considering new and substantially larger proteomics datasets, multiple additional types of post-translational modifications, new and sophisticated models of protein structure, large-scale kinase interactome data, and in-species sequence conservation data – with particular emphasis on duplicated proteins.

      We propose that in-species comparisons of paralogs will prove to be more reliable than cross-species comparisons of orthologous proteins, or in-species comparisons of non-homologous proteins. Comparison of paralogs is powerful because they are likely to have similar structures and functions, due to their shared evolutionary origin (56, 58, 68). Comparison within a single species is powerful because it allows us to avoid important non-biological sources of uncertainty, such as potential alignment errors and unknown structural or functional differences.”

      2. One essential control that needs to be added is how much of the effect the authors observe can be explained by protein abundance. In yeast, protein abundance is strongly negatively correlated with evolutionary rate, and is strongly positively correlated with identification of PTMs in MS and other assays (extensively discussed in some of the previous studies I listed above). The authors need to assess whether their findings are due to the slow evolution of highly expressed proteins, and the detection bias for these proteins in PTM identification experiments. As far as I could tell this was not discussed by the authors.*

      This point was also raised by Reviewer #2. We have added additional text stating that detection of PTMs by mass spectrometry is correlated with protein abundance. In addition, and as suggested by the reviewers, we have now done a control experiment using cross-study conservation of PTMs and limiting our comparison to proteins of similar abundance. By both methods, and as detailed below, we were able to confirm our original findings:

      “We then reanalyzed our data to account for possible effects of protein abundance, which in cross species comparisons was observed to negatively correlate with evolutionary rate and positively correlate with modification detection by mass spectrometry (39). Accordingly, we restricted our in-species analysis to a subset of 270 paralog pairs that have similar ( 100 instances each of phosphorylation, ubiquitylation and succinylation, where the target and paralog have the same amino acid, but only the target is modified. Even with this restricted dataset, we obtained similar results for all three types of analysis (Dataset S9). We also considered the potential effect of false positives and false negatives among the reported modification sites. False positives can result from ambiguous assignments, as might arise through misidentification of modified sites within peptides that contain multiple potential sites of modification. False negatives can result from difficulties in detecting modifications in poorly expressed proteins (39), or an overly strict reliance on high confidence sites. We then further restricted the data to only include modifications identified in multiple studies. After applying this additional filter, we were left with > 100 instances of phosphorylation. Once again, we obtained similar results for Symmetric Average Score and One-sided Average Score analysis, but not for Chemical Similarity Average Score, which is further restricted by splitting the data into five chemical categories (Dataset S10).”
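
The two filters described in the passage above (similar abundance within a paralog pair, and modifications reported by more than one study) can be sketched in Python as follows. This is only an illustration under stated assumptions. The fold-difference cutoff is not recoverable from the text above, so `max_fold` is a hypothetical parameter with no default. The function names, the protein identifiers, and the abundance values are likewise invented.

```python
def similar_abundance_pairs(pairs, abundance, max_fold):
    """Keep paralog pairs whose abundances differ by at most `max_fold`-fold.

    pairs:     iterable of (protein_a, protein_b) identifiers
    abundance: dict mapping identifier -> measured abundance (> 0)
    max_fold:  hypothetical cutoff; the value used by the authors is not
               given in the text this sketch follows
    """
    kept = []
    for a, b in pairs:
        if a not in abundance or b not in abundance:
            continue  # skip pairs lacking a measurement for either member
        high = max(abundance[a], abundance[b])
        low = min(abundance[a], abundance[b])
        if low > 0 and high / low <= max_fold:
            kept.append((a, b))
    return kept


def sites_seen_in_multiple_studies(site_to_studies, min_studies=2):
    """Return the sites reported by at least `min_studies` distinct studies."""
    return {
        site
        for site, studies in site_to_studies.items()
        if len(set(studies)) >= min_studies
    }


# Toy data (all identifiers and numbers are invented).
pairs = [("A1", "A2"), ("B1", "B2"), ("C1", "C2")]
abundance = {"A1": 1000, "A2": 1500, "B1": 200, "B2": 5000, "C1": 50}
kept_pairs = similar_abundance_pairs(pairs, abundance, max_fold=2)

site_to_studies = {
    ("A1", 42): ["study_x", "study_y"],
    ("A1", 77): ["study_x"],
    ("A2", 13): ["study_y", "study_y"],
}
confident_sites = sites_seen_in_multiple_studies(site_to_studies)
```

Applying the abundance filter first and the multi-study filter second mirrors the order in which the restrictions are described in the quoted text.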

      3. A major weakness of the paper is its lack of focus. It includes a rambling historical introduction and discussion that omits discussion of the relevant recent research directly related to the questions at hand. For example, the paper describes historical work on phosphorylase, but gives not a single example of a paralog pair with a polymorphic PTM site identified in their study. The authors introduce gene duplication in a very general way, even though several papers have focused specifically on evolution of protein regulation in paralogs (e.g., PMID: 20080574, PMID: 27003913, PMID: 25474245). The paper of Nguyen Ba et al. 2014 (PMID: 25474245) seems especially relevant, as in addition to performing a genome-wide analysis, their abstract reads "We examine changes in constraints on known regulatory sequences and show that for the Rck1/Rck2, Fkh1/Fkh2, Ace2/Swi5 paralogs, they are associated with previously characterized differences in posttranslational regulation." It seems that the results of that study could be directly compared to the analysis performed here.*

      This point was also raised by Reviewer 2. At the suggestion of the reviewers, we have moved or removed discussion of these foundational studies of PTM mapping and added discussion of well-characterized examples of paralog pairs with polymorphic PTM sites, based on the references provided, as follows:

      “We propose that in-species comparisons of paralogs will prove to be more reliable than cross-species comparisons of orthologous proteins, or in-species comparisons of non-homologous proteins. Comparison of paralogs is powerful because they are likely to have similar structures and functions, due to their shared evolutionary origin (56, 58, 68). Comparison within a single species is powerful because it allows us to avoid important non-biological sources of uncertainty, such as potential alignment errors and unknown structural or functional differences. This is supported by a small number of prior studies, which compared four sets of paralogous proteins in yeast - Rck1 v. Rck2, Fkh1 v. Fkh2, Ace2 v. Swi5 (68), and Boi1 v. Boi2 (58), and concluded that divergence in short linear motifs is likely responsible for differences in phosphorylation. While paralogs are far less common in other organisms, a similar conclusion emerged from a comparison of predicted sites of phosphorylation in mammalian p53, p63 and p73 (69).

      Our analysis of differentially-modified pairs of paralogous proteins revealed that the most common modifications – phosphorylation, ubiquitylation and acylation but not N-glycosylation – occur within regions of high sequence conservation. Further studies will benefit from the availability of our search algorithm CoSMoS.c.. For example, when studying a particular protein kinase, CoSMoS.c. can be used to identify specific motifs near potentially modified serines, threonines and tyrosines (Table 2). When studying a particular substrate of ubiquitylation, CoSMoS.c. can be used to prioritize conserved versus non-conserved sequences flanking potentially modified lysines. For rare modifications, CoSMoS.c. can also be used to locate highly conserved regions as the starting points for finding new sequence motifs. Thus, by comparing unique modifications in closely-related gene products and across closely-related strain isolates, we can prioritize mechanistic investigations of modifications that are likely to have functional importance, to identify recognition motifs for specific modifying enzymes, and to better predict new enzyme-substrate relationships.”

      *Reviewer #1 (Significance (Required)):

      The significance is hard to assess because the research is not given proper context and motivation.

      I believe the study could be of interest to research studying cell signalling and its evolution, as well as those interested in gene family diversification. However, as written, no specific examples are given or clear hypotheses tested, making the paper seem largely descriptive.

      My keywords: molecular evolution, signalling, intrinsically disordered regions, computational biology

      *

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): *

      Summary

      The authors of this work study how S. cerevisiae paralogue pairs are differentially modified with respect to five major PTM classes: phosphorylation, ubiquitination, mono-acetylation, N-glycosylation, and succinylation. Emphasis is placed on paralogue pairs where a modification is found in only one of the two paralogues at homologous positions. A conservation analysis is then performed across 1011 S. cerevisiae isolates to check for differences in conservation between the modified target and its unmodified paralogue. The authors claim that, for most of the PTM classes, modified targets tend to be more conserved than their unmodified paralogues. Phosphorylation sites between paralogue pairs were also compared using AlphaFold2 and a database of kinase interactions (YeastKID), revealing differential interactions between paralogues but no significant structural differences. *

      We were very pleased with, and appreciative of, the reviewer’s comments and constructive suggestions for improving the manuscript.

      *Major:

      1) A major issue with this work is that the problem of 'false negatives' for PTM detection is never adequately addressed or controlled for. As the authors allude to in the manuscript, the number of PTM sites detected is likely far below the number that exists and this is especially a problem for the less well characterised PTM classes. How then can the authors be confident that an 'unmodified' site is truly unmodified and not just undetected? The authors can refer to Freschi et al., 2011 (MSB) for a method that controls for the false negative (FN) PTM detection rate by comparing cross-study conservation with cross-study reproducibility. *

      * 2) The second point follows closely from the first. The issue is that MS-based PTM detection is generally biased towards abundant proteins, and protein abundance also correlates strongly with evolutionary rate, with more abundant proteins tending to have higher conservation. Taken together, these two relationships could explain the observation that the modified paralogue tends to be more conserved than the 'unmodified' paralogue. The authors should try and control for the effect of protein abundance on the results observed; for example, by checking if the results/conclusions change when restricting the analysis to paralogue pairs with similar abundances. *

      * 3) Alongside false negatives, there is the cognate issue of false positives and mislocalised PTM sites (see Lanz et al., 2021, EMBO Reports). If possible, the authors should check to see if their conclusions change when restricting the analysis to high-confidence PTM sites identified from multiple sources and/or validated by low throughput experimental assays.*

      This point was also raised by Reviewer #1. To address the concern, we have now done a new control analysis, one that uses only those modifications identified in multiple studies and comparing only proteins of similar (“We then reanalyzed our data to account for possible effects of protein abundance, which in cross species comparisons was observed to negatively correlate with evolutionary rate and positively correlate with modification detection by mass spectrometry (39). Accordingly, we restricted our in-species analysis to a subset of 270 paralog pairs that have similar ( 100 instances each of phosphorylation, ubiquitylation and succinylation, where the target and paralog have the same amino acid, but only the target is modified. Even with this restricted dataset, we obtained similar results for all three types of analysis (Dataset S9). We also considered the potential effect of false positives and false negatives among the reported modification sites. False positives can result from ambiguous assignments, as might arise through misidentification of modified sites within peptides that contain multiple potential sites of modification. False negatives can result from difficulties in detecting modifications in poorly expressed proteins (39), or an overly strict reliance on high confidence sites. We then further restricted the data to only include modifications identified in multiple studies. After applying this additional filter, we were left with > 100 instances of phosphorylation. Once again, we obtained similar results for Symmetric Average Score and One-sided Average Score analysis, but not for Chemical Similarity Average Score, which is further restricted by splitting the data into five chemical categories (Dataset S10).”

      4) The authors define conservation here using 1011 wild and domesticated yeast isolates within one species (S. cerevisiae). While this is clearly valuable information, this reviewer wonders why orthologues from closely related species were not also leveraged to assess the evolutionary rate, as is traditionally done for studies on PTM evolution? Is there a strong rationale for this? Using more distantly-related genomes could give more statistical power for the detection of weak differences in selective constraint between paralogues.

      We believe that a major strength of our study is the reliance on in-species sequence conservation data, with particular emphasis on duplicated proteins. To better emphasize this point, we have added new text as follows:

      “Most importantly, and in contrast with previous studies, we restricted our analysis to modified and unmodified pairs of paralogous proteins. This represents a very powerful test for the hypothesis because paralogs have a shared evolutionary history and are expected to have similar secondary structures. Moreover, the use of in-species polymorphism data is much less susceptible to the alignment errors that often occur with longer evolutionary comparisons.”

      and

      “We propose that in-species comparisons of paralogs will prove to be more reliable than cross-species comparisons of orthologous proteins, or in-species comparisons of non-homologous proteins. Comparison of paralogs is powerful because they are likely to have similar structures and functions, due to their shared evolutionary origin (56, 58, 68). Comparison within a single species is powerful because it allows us to avoid important non-biological sources of uncertainty, such as potential alignment errors and unknown structural or functional differences.”

      *Minor:

      1) Both the Introduction and Discussion describe PTMs and the evolution of gene duplication in very general terms. However, literature concerning the evolution of PTMs and specifically the evolution of PTMs following gene duplication has been largely ignored. These studies give the most relevant context to this work and should be described and cited. Freschi et al., 2011 (Molecular Systems Biology) and Ba et al., 2014 (PloS Computational Biology) are particularly relevant. *

      We have added references suggested by the reviewer, as well as new text describing the central findings of these papers, as follows:

      “Our analysis is clearly distinct from - and complementary to - earlier investigations of post-translational modifications in yeasts. Previous analysis showed that duplicated proteins in Saccharomyces cerevisiae are more likely to be phosphorylated, and to have a greater number of phosphorylation sites, than non-duplicated proteins (58). The difference persisted when controlling for differences in protein abundance, coverage, essentiality, positioning within protein interaction networks and assembly into multi-protein complexes (58). When compared with a yeast species that diverged before the whole genome duplication event, it appears that the majority of phosphorylation sites in paralogs have either been lost or gained, with a strong bias toward losses (56). Subsequent cross-species comparisons noted a high degree of sequence conservation near sites of phosphorylation and other types of modification in yeasts (49, 59-65). The relationship was strongest for phosphosites with known function (49, 50, 61). A focused study of 249 unique high-confidence phosphorylation sites, targeted by 7 protein kinases in S. cerevisiae, confirmed that regions flanking sites of phosphorylation are significantly constrained, in comparison with other closely related yeast species (61). A similar relationship exists for sites phosphorylated by the cyclin-dependent protein kinase Cdk1 (66), and was the basis for predicting novel sites of phosphorylation by the cAMP-dependent protein kinase (67). Our analysis builds on these foundational studies, by considering new and substantially larger proteomics datasets, multiple additional types of post-translational modifications, new and sophisticated models of protein structure, large-scale kinase interactome data, and in-species sequence conservation data – with particular emphasis on duplicated proteins.

      We propose that in-species comparisons of paralogs will prove to be more reliable than cross-species comparisons of orthologous proteins, or in-species comparisons of non-homologous proteins. Comparison of paralogs is powerful because they are likely to have similar structures and functions, due to their shared evolutionary origin (56, 58, 68). Comparison within a single species is powerful because it allows us to avoid important non-biological sources of uncertainty, such as potential alignment errors and unknown structural or functional differences. This is supported by a small number of prior studies, which compared four sets of paralogous proteins in yeast - Rck1 v. Rck2, Fkh1 v. Fkh2, Ace2 v. Swi5 (68), and Boi1 v. Boi2 (58), and concluded that divergence in short linear motifs is likely responsible for differences in phosphorylation. While paralogs are far less common in other organisms, a similar conclusion emerged from a comparison of predicted sites of phosphorylation in mammalian p53, p63 and p73 (69).”

      *2) While I enjoyed to a limited extent the historical perspective on PTM discovery, there is far too much text given to this overall and the writing should be made more concise by removing excessive detail. This is especially the case for the Results section, where the emphasis should be on the analysis performed by the authors. *

      This point was also raised by Reviewer 1. At the suggestion of the reviewers, we have moved or removed discussion of these foundational studies of PTM mapping and added discussion of well-characterized examples of paralog pairs with polymorphic PTM sites, based on the references provided, as detailed above.

      *3) Description of the methodology should be reviewed for language and clarity. In particular, the authors should explain explicitly the meaning of new terms such as 'pairing structure' and how this may confer an 'advantage / disadvantage' to target proteins -- wording that this reviewer found especially confusing and unnecessary. The authors should also be explicit about how the distributions for each test are constructed; the current wording sometimes gives the impression that a distribution is derived from a single target or paralogue instead of being derived from a set of modified targets and the corresponding set of unmodified paralogues. Another confusion is that the Distribution Mean Test is contrasted with the Paralog Pairing Test in Fig S8, and yet on page 15 the Distribution Mean Test is described as a 'paired' test even though from the description the test seems unpaired? *

      We are now more explicit about how the distributions for each test are constructed, and we have clarified the meaning of the terms 'pairing structure', 'advantage / disadvantage' and ‘Distribution Mean Test’, as follows:

      “We then performed two statistical tests: the Distribution Mean Test, which determines whether the mean of the distribution of target protein conservation scores (that is, the mean conservation score for all modified target proteins) is significantly larger than that of the unmodified paralogs, and the Paralog Pairing Test, which tests whether the pairing structure confers an advantage for the target proteins. Figure 2 presents two possible pairing structures (panels A and C) and how these can advantage (panels A and B) or disadvantage (panels C and D) target proteins...”

      “In this instance we applied a one-sided, paired Mann-Whitney-Wilcoxon Test (100), which determines whether the target protein conservation score distribution is significantly larger than the unmodified paralog conservation score distribution, without assuming that they follow a normal distribution. We used the paired test because the comparison is between the means of paired observations that have a relationship between the two groups (modified target and unmodified paralogs). Hereafter we refer to this as the Distribution Mean Test.”
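
A one-sided, paired, distribution-free comparison of this kind can be sketched in Python as a Wilcoxon signed-rank test on per-pair differences (target score minus paralog score). Whether this matches the exact procedure used in the manuscript is an assumption. The implementation below uses only the standard library and a normal approximation without continuity or tie-variance correction, so it is a rough sketch rather than a reference implementation. For real data, SciPy's `scipy.stats.wilcoxon(x, y, alternative="greater")` is likely the more appropriate tool. The function name and toy scores are invented.

```python
import math


def signed_rank_one_sided(target_scores, paralog_scores):
    """One-sided Wilcoxon signed-rank test (alternative: target > paralog).

    Returns (w_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks. The p-value comes from a
    normal approximation, which is crude for small samples.
    """
    if len(target_scores) != len(paralog_scores):
        raise ValueError("scores must be paired")
    diffs = [t - p for t, p in zip(target_scores, paralog_scores) if t != p]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank absolute differences, averaging ranks across ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    # Sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return w_plus, p_value


# Toy conservation scores for six modified targets and their paralogs.
target = [9, 8, 10, 7, 12, 15]
paralog = [6, 7, 5, 8, 6, 4]
w_plus, p_value = signed_rank_one_sided(target, paralog)
```

Each entry in `target` is matched to the entry at the same index in `paralog`, which is what makes the test paired: the statistic is built from within-pair differences, not from two pooled distributions.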

      4) Following on from point 2) in the 'major' section above, the authors could consider normalising the conservation scores within a protein to control for the effect of protein abundance and other potential confounders acting at the protein level.

      We have added additional text stating that detection of PTMs by mass spectrometry is correlated with protein abundance. In addition, and as suggested by the reviewers, we have now done a control experiment using cross-study conservation of PTMs and limiting our comparison to proteins of similar abundance. By both methods, and as detailed below, we were able to confirm our original findings:

      “We then reanalyzed our data to account for possible effects of protein abundance, which in cross species comparisons was observed to negatively correlate with evolutionary rate and positively correlate with modification detection by mass spectrometry (39). Accordingly, we restricted our in-species analysis to a subset of 270 paralog pairs that have similar ( 100 instances each of phosphorylation, ubiquitylation and succinylation, where the target and paralog have the same amino acid, but only the target is modified. Even with this restricted dataset, we obtained similar results for all three types of analysis (Dataset S9). We also considered the potential effect of false positives and false negatives among the reported modification sites. False positives can result from ambiguous assignments, as might arise through misidentification of modified sites within peptides that contain multiple potential sites of modification. False negatives can result from difficulties in detecting modifications in poorly expressed proteins (39), or an overly strict reliance on high confidence sites. We then further restricted the data to only include modifications identified in multiple studies. After applying this additional filter, we were left with > 100 instances of phosphorylation. Once again, we obtained similar results for Symmetric Average Score and One-sided Average Score analysis, but not for Chemical Similarity Average Score, which is further restricted by splitting the data into five chemical categories (Dataset S10).”

      *5) For the analysis of motifs, departure from the BLOSUM62 expectation may just reflect the fact that many of these PTMs fall in disordered regions - which have distinct amino acid propensities -- whereas matrices like BLOSUM62 were constructed mostly from ordered protein regions. *

      We have modified the Materials and Methods section to reflect this alternative, as follows:

      “If the observed changes differ substantially from expectation (BLOSUM62), this suggests the presence of selection pressure and functional importance. This might also arise from distinct amino acid propensities when comparing ordered protein regions, from which the BLOSUM62 matrices were constructed, and disordered regions, where most modifications are likely to occur. This is unlikely to impact our results, as we are comparing structurally similar paralogous proteins. In addition, we are using multiple score algorithms to support our conclusions.”

      6) The analysis of sequence motifs could be extended by scoring phosphosites with yeast position weight matrices (PWMs) for protein kinases and comparing the results between modified targets and their unmodified paralogues. This can help distinguish true positive and false negative modification differences. See Freschi et al., 2011 (Molecular Systems Biology).

      We have performed this analysis according to the reviewer’s suggestion and added new text to the Results, as follows:

      “Finally, in an initial effort to match sites of phosphorylation with protein kinases, we used the position-weight matrices (PWMs) developed by Mok et al. (56, 57). That analysis determined phosphorylation site selectivity for 61 of the 122 kinases in Saccharomyces cerevisiae and proposed empirically-derived PWMs that enable the assignment of candidate protein kinases to known sites of phosphorylation (56, 57). We applied the PWMs to our dataset, which contains sites where one of the two proteins is known to be phosphorylated and the amino acid residue is the same in both. From this dataset, we kept 190 paralogous pairs where each protein contains at least one such phosphorylation site, so that both proteins would have kinase interactions to be compared. Using the PWMs from (57), we assigned the kinase that most likely corresponds to each phosphorylation site, as implemented in (56). Out of the 190 paralogous pairs, 130 interacted with different kinases. Together, these results indicate that most kinases regulate one or the other of the protein paralogs. They suggest further that differential modifications reported here may be the result of differential interactions with modifying enzymes.”
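
The kinase-assignment step described above can be sketched in Python as follows: score a site-centred sequence window against each position weight matrix, take the best-scoring kinase, and count paralog pairs whose best-scoring kinases differ. This is an illustrative sketch, not the implementation cited in the text. The kinase names, the three-residue matrices, the uniform `background` frequency, and the `pseudo` probability for unlisted residues are all assumptions, and the toy matrices are not the published PWMs.

```python
import math


def pwm_log_odds(window, pwm, background=0.05, pseudo=1e-3):
    """Sum of log2(p / background) over the positions of a sequence window.

    window: residues around the site, same length as the PWM
    pwm:    list of dicts, one per position, mapping residue -> probability
    """
    if len(window) != len(pwm):
        raise ValueError("window and PWM must have the same length")
    score = 0.0
    for residue, column in zip(window, pwm):
        # Residues absent from a column get a small pseudo-probability.
        p = column.get(residue, pseudo)
        score += math.log2(p / background)
    return score


def best_kinase(window, pwms):
    """Name of the kinase whose PWM gives the highest log-odds score."""
    return max(pwms, key=lambda name: pwm_log_odds(window, pwms[name]))


def pairs_with_different_kinases(pair_windows, pwms):
    """Count pairs whose two windows are assigned different kinases."""
    return sum(
        1
        for target_window, paralog_window in pair_windows
        if best_kinase(target_window, pwms) != best_kinase(paralog_window, pwms)
    )


# Toy matrices (invented): positions are -1, 0 (the site), +1.
pwms = {
    "kinase_A": [{"R": 0.8, "K": 0.1}, {"S": 0.9, "T": 0.1}, {"P": 0.1, "L": 0.3}],
    "kinase_B": [{"R": 0.1, "A": 0.3}, {"S": 0.5, "T": 0.5}, {"P": 0.9}],
}
pair_windows = [("RSL", "ASP"), ("RSL", "RSL")]
n_different = pairs_with_different_kinases(pair_windows, pwms)
```

Taking only the single best-scoring kinase per site is a simplification; a score threshold or a ranked list of candidates would be a reasonable alternative.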

      *Reviewer #2 (Significance (Required)):

      This work is potentially of specialist interest to researchers studying the evolution of PTMs. While the evolution of phosphorylation following gene duplication has been studied previously (Freschi et al 2011, MSB), this work considers other PTM classes and takes advantage of a much larger data set. Potentially, clear examples of paralogue PTM divergence could be used as a basis for follow-up experiments. However, the web-server as it is now is designed to facilitate the easy analysis of a single protein at a time and not comparisons across paralogue pairs.

      *

      We have added new text to better emphasize the importance of the question, the novelty of our approach, the significance of the results, and the potential for future discovery, as follows:

      “Post-translational modifications are critical functional elements within proteins, and are therefore expected to be conserved in evolution. Here, we have identified several thousand instances where, despite a shared ancestry, only one of two paralogous proteins undergoes a specific post-translational modification. We also developed a custom algorithm that quantifies sequence conservation, in an automated fashion, across 1012 unique strain isolates. By comparing adjoining sequences in multiple isolates of the same species, we determined that sequence conservation near sites of modification is greater than at sites that are not modified. In addition, many of the modifications were associated with characteristic sequence elements nearby. We postulate that these differences in sequence conservation are partly responsible for differences in post-translational modifications, that differences in post-translational modifications allow duplicated proteins to be differentially regulated, and that these differences may account for their retention after 100M years of evolution.

      Our analysis is clearly distinct from - and complementary to - earlier investigations of post-translational modifications in yeasts. Previous analysis showed that duplicated proteins in Saccharomyces cerevisiae are more likely to be phosphorylated, and to have a greater number of phosphorylation sites, than non-duplicated proteins (58). The difference persisted when controlling for differences in protein abundance, coverage, essentiality, positioning within protein interaction networks and assembly into multi-protein complexes (58). When compared with a yeast species that diverged before the whole genome duplication event, it appears that the majority of phosphorylation sites in paralogs have either been lost or gained, with a strong bias toward losses (56). Subsequent cross-species comparisons noted a high degree of sequence conservation near sites of phosphorylation and other types of modification in yeasts (49, 59-65). The relationship was strongest for phosphosites with known function (49, 50, 61). A focused study of 249 unique high-confidence phosphorylation sites, targeted by 7 protein kinases in S. cerevisiae, confirmed that regions flanking sites of phosphorylation are significantly constrained, in comparison with other closely related yeast species (61). A similar relationship exists for sites phosphorylated by the cyclin-dependent protein kinase Cdk1 (66), and was the basis for predicting novel sites of phosphorylation by the cAMP-dependent protein kinase (67). Our analysis builds on these foundational studies, by considering new and substantially larger proteomics datasets, multiple additional types of post-translational modifications, new and sophisticated models of protein structure, large-scale kinase interactome data, and in-species sequence conservation data – with particular emphasis on duplicated proteins.

      We propose that in-species comparisons of paralogs will prove to be more reliable than cross-species comparisons of orthologous proteins, or in-species comparisons of non-homologous proteins. Comparison of paralogs is powerful because they are likely to have similar structures and functions, due to their shared evolutionary origin (56, 58, 68). Comparison within a single species is powerful because it allows us to avoid important non-biological sources of uncertainty, such as potential alignment errors and unknown structural or functional differences. This is supported by a small number of prior studies, which compared four sets of paralogous proteins in yeast - Rck1 v. Rck2, Fkh1 v. Fkh2, Ace2 v. Swi5 (68), and Boi1 v. Boi2 (58), and concluded that divergence in short linear motifs is likely responsible for differences in phosphorylation. While paralogs are far less common in other organisms, a similar conclusion emerged from a comparison of predicted sites of phosphorylation in mammalian p53, p63 and p73 (69).

      Our analysis of differentially-modified pairs of paralogous proteins revealed that the most common modifications – phosphorylation, ubiquitylation and acylation but not N-glycosylation – occur within regions of high sequence conservation. Further studies will benefit from the availability of our search algorithm, CoSMoS.c. For example, when studying a particular protein kinase, CoSMoS.c. can be used to identify specific motifs near potentially modified serines, threonines and tyrosines (Table 2). When studying a particular substrate of ubiquitylation, CoSMoS.c. can be used to prioritize conserved versus non-conserved sequences flanking potentially modified lysines. For rare modifications, CoSMoS.c. can also be used to locate highly conserved regions as the starting points for finding new sequence motifs. Thus, by comparing unique modifications in closely-related gene products and across closely-related strain isolates, we can prioritize mechanistic investigations of modifications that are likely to have functional importance, identify recognition motifs for specific modifying enzymes, and better predict new enzyme-substrate relationships.”

      In addition, and in response to the reviewer’s suggestion, we are currently expanding the web site to facilitate comparisons across paralogue pairs.

      Currently, the major problems stated above 1) correction for the problem of false negatives, and 2) correction for the confounding effects of protein abundance need to be addressed before the results can be fully interpreted and evaluated.

      As detailed above under Points 1-3, we have now done a control experiment using cross-study conservation of PTMs and limiting our comparison to proteins of similar abundance. By both methods, we were able to confirm our original findings, as detailed above.

      Reviewer field of expertise: phosphosite evolution, PTM evolution, protein evolution.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This manuscript describes the evolutionary conservation of yeast post-translationally modified residues and sequence motifs surrounding them.

      Reviewer #3 (Significance (Required)):

      Although this is not new, (Beltrao, Cell 2012, Minguez, MSB 2012, Hendriksen 2012) all show that sites of acetylation, phosphorylation and other modifications are more conserved in yeast than would be expected. Beltrao and Minguez also provide webservers http://ptmfunc.com/ http://ptmcode.embl.de where the link of conserved modified sites is made to protein structures and protein-protein interactions.

      The novelty of this study is in studying the duplicated proteins after whole genome duplication as well as providing an online interactive server where the conservation can be retrieved in detail, different scoring functions are provided. In addition, the conservation is calculated in closely related species rather than long evolutionary distances as previous studies have done.

      I am missing a concrete example of how a researcher would use the resource that the authors introduce here, and how it is an advance to previously proposed methods. For example, are there sites found conserved in this set of more closely related organisms, that are not conserved in yeast versus metazoa? Is the more fine-grained methodology useful to detect motif sequences that can otherwise not be detected? Can the authors provide proof that indeed the conserved sites are more functional than non-conserved?

      At the moment the manuscript describes very little results, and only a possible advance compared to previous methods, no proof is given that an actual advance is made.

      The authors should compare their work to previous work in this field.

      We were very pleased and appreciative of the reviewer’s comments, and constructive suggestions for improving the manuscript. We have added new text to better emphasize the importance of the question, the novelty of our approach, the significance of the results, and the potential for future discovery, as follows:

      “Post-translational modifications are critical functional elements within proteins, and are therefore expected to be conserved in evolution. Here, we have identified several thousand instances where, despite a shared ancestry, only one of two paralogous proteins undergoes a specific post-translational modification. We also developed a custom algorithm that quantifies sequence conservation, in an automated fashion, across 1012 unique strain isolates. By comparing adjoining sequences in multiple isolates of the same species, we determined that sequence conservation near sites of modification is greater than at sites that are not modified. In addition, many of the modifications were associated with characteristic sequence elements nearby. We postulate that these differences in sequence conservation are partly responsible for differences in post-translational modifications, that differences in post-translational modifications allow duplicated proteins to be differentially regulated, and that these differences may account for their retention after 100M years of evolution.

      Our analysis is clearly distinct from - and complementary to - earlier investigations of post-translational modifications in yeasts. Previous analysis showed that duplicated proteins in Saccharomyces cerevisiae are more likely to be phosphorylated, and to have a greater number of phosphorylation sites, than non-duplicated proteins (58). The difference persisted when controlling for differences in protein abundance, coverage, essentiality, positioning within protein interaction networks and assembly into multi-protein complexes (58). When compared with a yeast species that diverged before the whole genome duplication event, it appears that the majority of phosphorylation sites in paralogs have either been lost or gained, with a strong bias toward losses (56). Subsequent cross-species comparisons noted a high degree of sequence conservation near sites of phosphorylation and other types of modification in yeasts (49, 59-65). The relationship was strongest for phosphosites with known function (49, 50, 61). A focused study of 249 unique high-confidence phosphorylation sites, targeted by 7 protein kinases in S. cerevisiae, confirmed that regions flanking sites of phosphorylation are significantly constrained, in comparison with other closely related yeast species (61). A similar relationship exists for sites phosphorylated by the cyclin-dependent protein kinase Cdk1 (66), and was the basis for predicting novel sites of phosphorylation by the cAMP-dependent protein kinase (67). Our analysis builds on these foundational studies, by considering new and substantially larger proteomics datasets, multiple additional types of post-translational modifications, new and sophisticated models of protein structure, large-scale kinase interactome data, and in-species sequence conservation data – with particular emphasis on duplicated proteins.

      We propose that in-species comparisons of paralogs will prove to be more reliable than cross-species comparisons of orthologous proteins, or in-species comparisons of non-homologous proteins. Comparison of paralogs is powerful because they are likely to have similar structures and functions, due to their shared evolutionary origin (56, 58, 68). Comparison within a single species is powerful because it allows us to avoid important non-biological sources of uncertainty, such as potential alignment errors and unknown structural or functional differences. This is supported by a small number of prior studies, which compared four sets of paralogous proteins in yeast - Rck1 v. Rck2, Fkh1 v. Fkh2, Ace2 v. Swi5 (68), and Boi1 v. Boi2 (58), and concluded that divergence in short linear motifs is likely responsible for differences in phosphorylation. While paralogs are far less common in other organisms, a similar conclusion emerged from a comparison of predicted sites of phosphorylation in mammalian p53, p63 and p73 (69).

      Our analysis of differentially-modified pairs of paralogous proteins revealed that the most common modifications – phosphorylation, ubiquitylation and acylation but not N-glycosylation – occur within regions of high sequence conservation. Further studies will benefit from the availability of our search algorithm, CoSMoS.c. For example, when studying a particular protein kinase, CoSMoS.c. can be used to identify specific motifs near potentially modified serines, threonines and tyrosines (Table 2). When studying a particular substrate of ubiquitylation, CoSMoS.c. can be used to prioritize conserved versus non-conserved sequences flanking potentially modified lysines. For rare modifications, CoSMoS.c. can also be used to locate highly conserved regions as the starting points for finding new sequence motifs. Thus, by comparing unique modifications in closely-related gene products and across closely-related strain isolates, we can prioritize mechanistic investigations of modifications that are likely to have functional importance, identify recognition motifs for specific modifying enzymes, and better predict new enzyme-substrate relationships.”

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This manuscript describes the evolutionary conservation of yeast post-translationally modified residues and sequence motifs surrounding them.

      Significance

      Although this is not new, (Beltrao, Cell 2012, Minguez, MSB 2012, Hendriksen 2012) all show that sites of acetylation, phosphorylation and other modifications are more conserved in yeast than would be expected. Beltrao and Minguez also provide webservers http://ptmfunc.com/ http://ptmcode.embl.de where the link of conserved modified sites is made to protein structures and protein-protein interactions.

      The novelty of this study is in studying the duplicated proteins after whole genome duplication as well as providing an online interactive server where the conservation can be retrieved in detail, different scoring functions are provided. In addition, the conservation is calculated in closely related species rather than long evolutionary distances as previous studies have done.

      I am missing a concrete example of how a researcher would use the resource that the authors introduce here, and how it is an advance to previously proposed methods. For example, are there sites found conserved in this set of more closely related organisms, that are not conserved in yeast versus metazoa? Is the more fine-grained methodology useful to detect motif sequences that can otherwise not be detected? Can the authors provide proof that indeed the conserved sites are more functional than non-conserved?

      At the moment the manuscript describes very little results, and only a possible advance compared to previous methods, no proof is given that an actual advance is made.

      The authors should compare their work to previous work in this field.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      The authors of this work study how S. cerevisiae paralogue pairs are differentially modified with respect to five major PTM classes: phosphorylation, ubiquitination, mono-acetylation, N-glycosylation, and succinylation. Emphasis is placed on paralogue pairs where a modification is found in only one of the two paralogues at homologous positions. A conservation analysis is then performed across 1011 S. cerevisiae isolates to check for differences in conservation between the modified target and its unmodified paralogue. The authors claim that, for most of the PTM classes, modified targets tend to be more conserved than their unmodified paralogues. Phosphorylation sites between paralogue pairs were also compared using AlphaFold2 and a database of kinase interactions (YeastKID), revealing differential interactions between paralogues but no significant structural differences.

      Major:

      1. A major issue with this work is that the problem of 'false negatives' for PTM detection is never adequately addressed or controlled for. As the authors allude to in the manuscript, the number of PTM sites detected is likely far below the number that exists and this is especially a problem for the less well characterised PTM classes. How then can the authors be confident that an 'unmodified' site is truly unmodified and not just undetected? The authors can refer to Freschi et al., 2011 (MSB) for a method that controls for the false negative (FN) PTM detection rate by comparing cross-study conservation with cross-study reproducibility.
      2. The second point follows closely from the first. The issue is that MS-based PTM detection is generally biased towards abundant proteins, and protein abundance also correlates strongly with evolutionary rate, with more abundant proteins tending to have higher conservation. Taken together, these two relationships could explain the observation that the modified paralogue tends to be more conserved than the 'unmodified' paralogue. The authors should try and control for the effect of protein abundance on the results observed; for example, by checking if the results/conclusions change when restricting the analysis to paralogue pairs with similar abundances.
      3. Alongside false negatives, there is the cognate issue of false positives and mislocalised PTM sites (see Lanz et al., 2021, EMBO Reports). If possible, the authors should check to see if their conclusions change when restricting the analysis to high-confidence PTM sites identified from multiple sources and/or validated by low throughput experimental assays.
      4. The authors define conservation here using 1011 wild and domesticated yeast isolates within one species (S. cerevisiae). While this is clearly valuable information, this reviewer wonders why orthologues from closely related species were not also leveraged to assess the evolutionary rate, as is traditionally done for studies on PTM evolution? Is there a strong rationale for this? Using more distantly-related genomes could give more statistical power for the detection of weak differences in selective constraint between paralogues.

      Minor:

      1. Both the Introduction and Discussion describe PTMs and the evolution of gene duplication in very general terms. However, literature concerning the evolution of PTMs and specifically the evolution of PTMs following gene duplication has been largely ignored. These studies give the most relevant context to this work and should be described and cited. Freschi et al., 2011 (Molecular Systems Biology) and Ba et al., 2014 (PLoS Computational Biology) are particularly relevant.
      2. While I enjoyed to a limited extent the historical perspective on PTM discovery, there is far too much text given to this overall and the writing should be made more concise by removing excessive detail. This is especially the case for the Results section, where the emphasis should be on the analysis performed by the authors.
      3. Description of the methodology should be reviewed for language and clarity. In particular, the authors should explain explicitly the meaning of new terms such as 'pairing structure' and how this may confer an 'advantage / disadvantage' to target proteins -- wording that this reviewer found especially confusing and unnecessary. The authors should also be explicit about how the distributions for each test are constructed; the current wording sometimes gives the impression that a distribution is derived from a single target or paralogue instead of being derived from a set of modified targets and the corresponding set of unmodified paralogues. Another confusion is that the Distribution Mean Test is contrasted with the Paralog Pairing Test in Fig S8, and yet on page 15 the Distribution Mean Test is described as a 'paired' test, even though from the description the test seems unpaired.
      4. Following on from point 2) in the 'major' section above, the authors could consider normalising the conservation scores within a protein to control for the effect of protein abundance and other potential confounders acting at the protein level.
      5. For the analysis of motifs, departure from the BLOSUM62 expectation may just reflect the fact that many of these PTMs fall in disordered regions - which have distinct amino acid propensities -- whereas matrices like BLOSUM62 were constructed mostly from ordered protein regions.
      6. The analysis of sequence motifs could be extended by scoring phosphosites with yeast position weight matrices (PWMs) for protein kinases and comparing the results between modified targets and their unmodified paralogues. This can help distinguish true positive and false negative modification differences. See Freschi et al., 2011 (Molecular Systems Biology).
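      The within-protein normalisation suggested in minor point 4 could be sketched as a per-protein z-score. This is a minimal illustration under stated assumptions: the function name and the input (one conservation score per residue of a single protein) are hypothetical, and the z-score is only one of several possible normalisations.

```python
from statistics import mean, pstdev


def normalise_within_protein(scores):
    """Convert per-residue conservation scores of one protein to z-scores.

    Each score is expressed relative to the mean and spread of its own protein,
    so protein-level effects (such as abundance) are removed from site comparisons.
    """
    if not scores:
        return []
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation over the protein
    if sigma == 0:
        # A protein with identical scores everywhere carries no site-level signal.
        return [0.0] * len(scores)
    return [(s - mu) / sigma for s in scores]
```

      After this step, a modified site and its unmodified paralogous site would each be compared against the background of their own protein rather than on a shared raw scale.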

      Significance

      This work is potentially of specialist interest to researchers studying the evolution of PTMs. While the evolution of phosphorylation following gene duplication has been studied previously (Freschi et al 2011, MSB), this work considers other PTM classes and takes advantage of a much larger data set. Potentially, clear examples of paralogue PTM divergence could be used as a basis for follow-up experiments. However, the web-server as it is now is designed to facilitate the easy analysis of a single protein at a time and not comparisons across paralogue pairs.

      Currently, the major problems stated above 1) correction for the problem of false negatives, and 2) correction for the confounding effects of protein abundance need to be addressed before the results can be fully interpreted and evaluated.

      Reviewer field of expertise: phosphosite evolution, PTM evolution, protein evolution.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      This paper reports bioinformatics analysis of population variation in PTM sites in paralogs from the yeast whole-genome-duplication. If I understand it correctly, the main finding is that modified sites show less population variation than paralogous unmodified sites. The results are largely in line with what is expected based on previous studies, though the authors do not present their results in that context.

      Major comments:

      1. The study benefits from two clever design choices:

      First, comparison of sites between paralogs is a very powerful test for an evolutionary hypothesis because paralogous sites are expected to have relatively similar structural context. Second, use of within species polymorphism data is much less susceptible to alignment errors that can be an issue for longer evolutionary comparisons.

      However, these design choices are not discussed or motivated by the authors. Nor are they compared to the designs of previous studies. Examples of previous studies (PMID: 22588506, PMID: 21273632, PMID: 20594336, PMID: 20594336, PMID: 24465218, PMID: 22889910, PMID: 20368267, PMID: 28054638)

      2. One essential control that needs to be added is how much of the effect the authors observe can be explained by protein abundance. In yeast, protein abundance is strongly negatively correlated with evolutionary rate, and is strongly positively correlated with identification of PTMs in MS and other assays (extensively discussed in some of the previous studies I listed above). The authors need to assess whether their findings are due to the slow evolution of highly expressed proteins, and the detection bias for these proteins in PTM identification experiments. As far as I could tell this was not discussed by the authors.

      3. A major weakness of the paper is its lack of focus. It includes a rambling historical introduction and discussion that omits discussion of the relevant recent research directly related to the questions at hand. For example, the paper describes historical work on phosphorylase, but gives not a single example of a paralog pair with a polymorphic PTM site identified in their study. The authors introduce gene duplication in a very general way, even though several papers have focused specifically on evolution of protein regulation in paralogs (e.g., PMID: 20080574, PMID: 27003913, PMID: 25474245). The paper of Nguyen Ba et al. 2014 (PMID: 25474245) seems especially relevant, as in addition to performing a genome-wide analysis, their abstract reads "We examine changes in constraints on known regulatory sequences and show that for the Rck1/Rck2, Fkh1/Fkh2, Ace2/Swi5 paralogs, they are associated with previously characterized differences in posttranslational regulation." It seems that the results of that study could be directly compared to the analysis performed here.

      Significance

      The significance is hard to assess because the research is not given proper context and motivation.

      I believe the study could be of interest to researchers studying cell signalling and its evolution, as well as those interested in gene family diversification. However, as written, no specific examples are given or clear hypotheses tested, making the paper seem largely descriptive.

      My keywords: molecular evolution, signalling, intrinsically disordered regions, computational biology

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2022-01707

      Corresponding author(s): Sarah Butcher, Richard Lundmark

      1. General Statements [optional]

      We thank the reviewers for their insightful comments. The inclusion of the points raised by the referees has strengthened the manuscript. However, some of the reviewer suggestions are beyond the scope of the work (see below), but will doubtlessly be touched upon in future studies by the authors. In addition to incorporating changes relevant to answering the reviewers’ comments, we have edited the manuscript for increased clarity and precision.

      2. Description of the planned revisions

      1. Liposome flotation assay

      Reviewer #1 suggested that we should perform a liposome flotation assay to separate possible C protein aggregation from membrane binding: “I would strongly recommend supplementing the current liposome sedimentation assay by liposome flotation assay. In contrast to liposome co-sedimentation, the flotation assay can discriminate protein aggregates from proteins bound to liposomes. Although the SDS PAGE shown in Fig. 1A looks pretty convincing, a faint protein band in the „P" lane of the middle panel for the (-) sample is evident. Therefore, C protein aggregation cannot be ruled out and it would be indistinguishable from liposome binding examined by mere co-sedimentation assay.”

      Response: We agree that this is a necessary control experiment to add, and we will perform it with liposomes containing 40 % POPS. As we detected complete C protein co-sedimentation with this lipid composition, performing the flotation experiment with the same composition will prove that the earlier result indicates lipid binding and not protein aggregation.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      1. Reviewer #1
      2. In addition, it needs to be clarified which TBEV C protein construct, whether full-length or truncated, was used for co-sedimentation fragmentation.

      Response: We have clarified in this section of the manuscript that the full-length C protein construct was used for the liposome co-sedimentation assays by adding “full-length” prior to instances of “C protein” e.g. in the paragraph starting line 118.

      1. How to understand the finding that „the C protein forms a very rigid layer when adsorbed to the membrane". Can the aggregation of C-protein be ruled-out? Following the 1M NaCl wash of C-protein-bound to SLB, the authors stated: „This shows that initial membrane recruitment of C protein is strongly dependent on its interactions with the negatively-charged lipid headgroups. However, once bound, the C protein-membrane interaction is complemented with non-electrostatic interactions such as membrane insertion or protein oligomerization": does it mean that there are several layers of C protein, the first held by electrostatic interactions, overlayed by non-electrostatically bound C protein? If yes, the illustration of single-layered C-protein adsorbed onto SLB in Fig. 2A, B is not correct.

      Response: We understand the confusion regarding the term “rigid”, which was used as a way to describe how we interpret the relatively minor change in the dissipation upon membrane binding. What we intended to describe was that this indicates that the protein is attached in a stable way that does not add viscoelastic properties to the system. These data indicate that the protein does not form large aggregates that non-specifically attach to the membrane in different protrusive orientations. We have clarified this in the manuscript and specified that, as there is no dissipation change, there is no aggregation. We added the following to line 168: “This, in turn, indicates that the C protein does not bind as non-specific aggregates as these would have changed the viscoelastic properties of the system.”

      We do not mean that there are several layers of C protein. We consider, due to the highly charged nature of C, that the most likely explanation is that there are multiple modes of C binding but the result is only one layer, with multiple C-proteins interacting with each other within that layer. We have modified the text at line 184 to: “However, once bound, the C protein-membrane interaction is complemented with non-electrostatic interactions such as membrane insertion or protein oligomerization within the bound layer.”

      1. “The sentence: „To confirm that the C protein is biologically active, we investigated its ability to bind RNA" seems to be a little odd because it suggests the model membrane binding assays do not require biological active proteins. However, considering that the interactions leading to binding either negatively-charged lipid or negatively-charged RNA are electrostatic - this sentence must be rewritten.”

      Response: We thank the reviewer and have now rephrased this sentence to the following at line 249: “Since RNA binding is crucial for the NC assembly, we investigated the C protein’s ability to perform this function.”

      2. “The authors´ statement in the Abstract: „....we investigate nucleocapsid assembly..." is too speculative because the assembly was not studied in their work. It needs to be reformulated.”

      Response: We agree, and the statement has been removed from the abstract.

      3. “Despite this clear and valuable methodological contribution, the authors' contribution to our knowledge of the coordination of the nucleocapsid components to the sites of assembly and budding is not so obvious. Contrary to the earlier idea that the flavivirus is asymmetrically charged (that is, hydrophobic on one side (α2) and positively charged on the other side (α4)), recent studies show that the entire surface of the protein is highly electropositive (Mebus-Antunes et al., 2022). Therefore, a well-ordered neutralization of the flaviviral C proteins' highly positive surface seems critical for the proper organization and assembly of nucleocapsid. I am afraid that the authors do not shed much light on this issue.”

      Response: The recent structure of the TBEV C protein, published after we submitted the manuscript, shows that indeed the C protein is highly positively charged on all surfaces (updated Supplementary Figure 1 and Selinger et al., 2022). The recruitment of C protein to the membrane, that we demonstrate is dependent on negatively-charged head groups, provides a biologically relevant mechanism for charge neutralization on the C protein surface that interacts with the lipids. The remaining surface charge can be then neutralized by RNA recruitment. Mebus-Antunes et al. made their observations with just RNA and C protein from Dengue virus in the context of artificial surfaces e.g. mica. However, our experiments utilize the TBEV C protein and specifically include a membrane, the third critical component of NC assembly. Thus, we build upon the work of Mebus-Antunes et al. by adding a second biologically relevant charge-neutralising component and comparing with a distantly-related virus. We have changed the discussion section of the manuscript to reflect this new structure and to emphasize the advance here. Starting from line 371 we changed the text to: “Recently, it has been shown that the neutralization of the C protein surface positive charge is important for RNA binding in the distantly-related Dengue virus (DENV) (Mebus-Antunes et al, 2022). The recruitment of C protein to the membrane, that we demonstrate is dependent on negatively-charged head groups, provides a biologically relevant mechanism for charge neutralization on the C protein surface that interacts with the lipids. The remaining surface charge can be then neutralized by RNA recruitment.”

      Reviewer #2 1. “What results demonstrate C protein inserts into membrane? The current results support the C protein interacts with membranes with positive charge, but do not seem to demonstrate membrane insertion. If the C protein inserts into the membrane, which regions (helices) play this role?”

      Response: The Langmuir-Blodgett trough tensiometry experiments with monolayers directly measure the insertion of a protein into the monolayer. By determining the maximum insertion pressure of the C protein constructs, we also show that the membrane insertion can occur in bilayers. We show that the N-terminus does not insert into the membrane; further work, beyond the scope of this manuscript, is needed to pinpoint the residues responsible for insertion, for instance by hydrogen-deuterium exchange or FRET measurements that would not affect folding. To clarify the use of the LB trough, we added the following at line 216: “To investigate if the C protein membrane binding includes insertion into the membrane after the initial electrostatic binding, we used Langmuir-Blodgett trough monolayer experiments. In this approach, the insertion of a protein into a lipid monolayer can be detected by following the pressure (π) of the monolayer after protein injection into the aqueous subphase, with increases in π corresponding to protein insertion (Brockman, 1999; Liu et al, 2022).”

      1. “The authors should discuss several previous papers reporting the effect of partial deletions of the C gene on the replication of TBEV, West Nile virus, and other flaviviruses.” Response: We agree that this is a necessary addition, and have now added a paragraph in the discussion section starting at line 333: “N-terminally truncated flaviviral C proteins have been shown to be assembly competent and, in vitro, able to bind RNA, which is consistent with our results with N-terminally truncated TBEV C protein (Khromykh & Westaway, 1996; Kofler et al, 2002; Patkar et al, 2007; Schlick et al, 2009). One role of C is in the modulation of host responses to infection and the N-terminus may be involved in that (Yang et al, 2002; Limjindaporn et al, 2007; Colpitts et al, 2011; Bhuvanakantham & Ng, 2013; Katoh et al, 2013; Urbanowski & Hobman, 2013; Samuel et al, 2016; Slomnicki et al, 2017; Fontaine et al, 2018). The membrane insertion directly detected in our experiments is central to C protein function. Other studies have found that deletions in the hydrophobic region of the α2 helix significantly impair particle assembly (Kofler et al, 2002; Patkar et al, 2007; Schlick et al, 2009). In the light of this evidence, we consider that the α2 helix could be responsible for membrane insertion (Markoff et al, 1997; Kofler et al, 2002; Nemésio et al, 2011, 2013).”

      Reviewer #3 1. “In Figure 4, the band (256:1) that is supposedly in the wells (red arrow) is not clear - it is only slightly darker than the other wells.”

      Response: This confusion was the result of unclear wording. We have now revised the figure legend at line 278 to: “The black arrow indicates the bands of freely-migrating RNA, and the red arrow the wells. On lanes 624:1 and 256:1, RNA has been immobilized in the wells.”

      1. “Figure S1A, the N-terminal end (which is truncated in the mutant) should be colored on the cyan molecule.” Response: We have coloured the truncated part of the cyan molecule in the figure (now S1B) according to the reviewer’s comment.

      Other 1. As the nuclear magnetic resonance structure of the truncated TBEV C protein has recently been released (Selinger et al, 2022), we have updated the manuscript and Figure S1 to include the information from this structure. We have also generated a new homology model of the full-length TBEV C protein using this structure as a template and included that in Figure S1.

      4. Description of analyses that authors prefer not to carry out

      1. Reviewer #1
      2. “However, we do not know whether in the infected cells, the C protein is pre-bound to ER membrane or to viral RNA. Having such a unique assay in their hands, I wonder whether the authors could use the pre-bound C protein with genomic RNA (i.e. the experiment shown in Fig. 4A) ribonucleoprotein complex in the SLB binding assay. If doable, this experiment would be exciting and could bring some important information about NC assembly.”

      Response: We agree that it would be very interesting to decipher whether the C protein first binds to RNA or to membranes using the QCM-D methodology. Yet, our data on pre-incubated C protein and RNA suggest that large aggregates are formed, which would hamper the interpretation of the QCM-D data. Furthermore, based on the suggested experiment, we would not be able to firmly conclude whether the C protein first binds to RNA or to membranes, since the duration of the experiment would allow rearrangement of preformed complexes between C protein and RNA. Additionally, the QCM-D measurement cannot differentiate whether the preformed complexes bind on their own, or whether excess unbound C protein binds the membrane and then recruits the complex. Therefore, addressing this question would require major adjustments to the RNA model system and methodology that we feel are beyond the scope of this study.

      Reviewer 2 1. “The authors should use the lipids detected in the virions to confirm C protein binding experiments.”

      Response: In the mass spectrometry characterization of the TBEV virions, we detected lipids from 9 classes (Car, PE, PS, PI, PG, PC, Cer, HexCer & TG). We have tested four of them (PE, PS, PI, PC) in the liposome sedimentation assay. Additionally, we tested GalCer, which, like HexCer, is a cerebroside. Our liposome binding experiments clearly demonstrate that the C protein does not bind to a specific lipid class, but instead to lipids with negatively-charged headgroups. Therefore, we would argue that doing additional sedimentation experiments with Car, PG, Cer, and TG would not add extra insight to the manuscript.

      Additionally, while the population of lipid species in the TBEV envelope is diverse, the diversity mostly comes from differences in the lipid tails, which do not generally affect the head group-mediated binding of proteins. Therefore, performing additional lipid binding experiments with varying tail lengths would not likely lead to new observations.

      Finally, to perform the authentic experiment of testing C protein binding to liposomes formed from lipids extracted from purified virions would require orders of magnitude more virus sample than our research laboratory is capable of producing. Therefore, we argue that this experiment is beyond the scope of this study.

      1. “The study may be strengthened by performing virus mutagenesis experiments.” Response: While we agree that, ultimately, experiments on virus and cells would help to understand the role of the C protein in the biological context, we think these experiments are beyond the scope of this study. For virus mutagenesis, candidate residues should first be identified with biochemical and biophysical studies, which is already beyond the scope of this work. Additionally, the C protein has multiple functions in the host cell in addition to NC assembly, and interpreting the effect of the mutations on, e.g., virus titer is difficult.

      Reviewer #3 1. “In all figure legends, authors should write a conclusion line after the description of the experiments - what conclusion is drawn from each experiment.”

      Response: While we agree that adding such a conclusion line would make it easier for the reader to understand each figure, the format of the figure legends is highly subject to journal policy. Therefore, we think that the addition of such lines will be an editorial decision and will depend on the journal. We have, however, strived to make the figure titles as informative as possible in lieu of such concluding lines.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Pulkkinen and co-authors, title: Simultaneous membrane and RNA binding by TBE virus capsid protein.

      This paper characterizes the ability of purified TBE virus capsid protein to bind to membranes of different lipid compositions using biophysical methods and finds that it preferentially binds negatively charged lipids. The capsid protein then partially inserts into the membrane. Using mass spectrometry, the authors analyze the lipid composition of purified TBE virus and show that it contains negatively charged lipids, further supporting the idea that the virus is likely first assembled where the negatively charged lipids are located in the endoplasmic reticulum. They also characterize the ability of the membrane-bound capsid protein to bind RNA and show that it is able to do so. For all these experiments, they also included a capsid mutant with its N-terminal end deleted and show that the activity of the mutant capsid protein does not differ much from that of the full-length capsid protein, indicating that the N-terminal end is likely not important for these processes. The experiments are well conducted and the manuscript is very clearly written.

      Comments:

      1. In Figure 4, the band (256:1) that is supposedly in the wells (red arrow) is not clear - it is only slightly darker than the other wells.

      Minor comments:

      1. In all figure legends, authors should write a conclusion line after the description of the experiments - what conclusion is drawn from each experiment.
      2. Figure S1A, the N-terminal end (which is truncated in the mutant) should be colored on the cyan molecule.

      Referees cross-commenting

      I agree with comments by Reviewers #1 and #2

      Significance

      The study here, although done in an in vitro system, illuminates the virus assembly process: the positively charged capsid protein binds to the negatively charged area of the endoplasmic reticulum membrane, the capsid protein then partially inserts into the membrane, and it then interacts with the viral RNA genome to facilitate virus assembly. This is a very detailed study of the initial steps of the virus assembly process.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Pulkkinen et al. performed biochemical and biophysical experiments to suggest that (1) negatively charged lipids are required for the TBEV C protein to interact with the membrane, (2) the membrane-associated C protein can simultaneously bind viral RNA, (3) the first 17 amino acids are not required for (1) and (2), and (4) TBEV virions contain negatively charged lipids. The study is important and provides molecular insights into flavivirus assembly. The following points can substantiate the manuscript.

      Major points

      1. The authors should use the lipids detected in the virions to confirm C protein binding experiments.
      2. What results demonstrate C protein inserts into membrane? The current results support the C protein interacts with membranes with positive charge, but do not seem to demonstrate membrane insertion. If the C protein inserts into the membrane, which regions (helices) play this role?
      3. The study may be strengthened by performing virus mutagenesis experiments.
      4. The authors should discuss several previous papers reporting the effect of partial deletions of the C gene on the replication of TBEV, West Nile virus, and other flaviviruses.

      Significance

      This is an important study as indicated in the comments to authors.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The authors characterized the interactions of recombinant, bacterially expressed full-length and N-terminally truncated C proteins of tick-borne encephalitis virus with model membrane systems. They used a unique combination of biophysical methods, including protein liposome co-sedimentation, QCM-D measurement and Langmuir-Blodgett trough monolayer experiments. Their experiments showed that the binding of TBEV C to both liposomes and supported lipid bilayer (SLB) is strongly dependent on the presence of negatively charged lipids. They also showed that following the initial electrostatic binding to the model lipid membrane, both C protein variants adsorb to the SLB and form rigid layers which are stabilized by non-electrostatic interactions. By Langmuir-Blodgett trough monolayer experiments they demonstrated that negatively charged lipids are needed for C protein membrane insertion. The SLB bound C proteins, either full-length or N-terminally truncated, were shown to bind in vitro transcribed TBEV genomic RNA. Finally, to prove their major finding that negatively charged lipid head groups are crucial for C protein interaction with the lipid membrane, the authors analyzed the lipid content of the purified virions.

      This work deals with the central role of the C protein, namely with its binding to the lipid membrane and genomic RNA. In the infected cells, this process leads to nucleocapsid assembly, a step which is poorly understood. The authors demonstrate that the membrane affinity of the C protein is conditioned by the presence of negatively charged polar heads. The text and figures are clear and accurate. The results obtained from three independent methodological approaches are solid and confirm the importance of electrostatic interactions for the contact of the C protein with the membrane. I found highly interesting the observation that the C protein, while bound to the model membrane (SLB), still retains its ability to bind RNA. Although their data did not show anything about the orientation of the C protein in SLB, this methodology opens the way to determining it using suitable mutants of TBEV C. I am sure that the authors are aware of the possibilities of studying a series of the TBEV C mutants with impaired membrane or RNA binding. Therefore, I assume that the authors' primary focus here is to show new methodological approaches to the simultaneous measurement of C protein interactions with model membranes and RNA, and some data obtained on the abovementioned mutants will be published afterwards.

      Major comments:

      1. One of the fundamental challenges of the work with flaviviral capsid proteins is that they tend to form amorphous aggregates to neutralize their highly positive surface charge. As the authors state themselves, „ We cannot rule out that the C protein preparation is heterogeneous..." I would strongly recommend supplementing the current liposome sedimentation assay by liposome flotation assay. In contrast to liposome co-sedimentation, the flotation assay can discriminate protein aggregates from proteins bound to liposomes. Although the SDS PAGE shown in Fig. 1A looks pretty convincing, a faint protein band in the „P" lane of the middle panel for the (-) sample is evident. Therefore, C protein aggregation cannot be ruled out and it would be indistinguishable from liposome binding examined by mere co-sedimentation assay. In addition, it needs to be clarified which TBEV C protein construct, whether full-length or truncated, was used for co-sedimentation fragmentation.
      2. In section “Initial C protein recruitment to the membrane is of an electrostatic nature”: How should one understand the finding that „the C protein forms a very rigid layer when adsorbed to the membrane"? Can the aggregation of C protein be ruled out?

      Following the 1M NaCl wash of C protein bound to SLB, the authors stated: „This shows that initial membrane recruitment of C protein is strongly dependent on its interactions with the negatively-charged lipid headgroups. However, once bound, the C protein-membrane interaction is complemented with non-electrostatic interactions such as membrane insertion or protein oligomerization": does it mean that there are several layers of C protein, the first held by electrostatic interactions, overlaid by non-electrostatically bound C protein? If yes, the illustration of single-layered C protein adsorbed onto SLB in Fig. 2A, B is not correct.
      3. C protein inserts into membranes. It is beyond the frame of this work; however, it would be nice to show whether mutations of amino acid residues within the hydrophobic segment of TBEV C, which are in other flaviviral C proteins considered responsible for hydrophobic interaction, can abolish the membrane interaction.
      4. Membrane-bound C protein can recruit TBEV genomic RNA. The sentence „ To confirm that the C protein is biologically active, we investigated its ability to bind RNA" seems to be a little odd because it suggests the model membrane binding assays do not require biologically active proteins. However, considering that the interactions leading to binding either negatively-charged lipid or negatively-charged RNA are electrostatic, this sentence must be rewritten.
      5. The authors state, "These data show that membrane-bound C protein is capable of recruiting TBEV genomic RNA at the membrane, suggesting that this also happens in the context of NC assembly". However, we do not know whether in the infected cells, the C protein is pre-bound to ER membrane or to viral RNA. Having such a unique assay in their hands, I wonder whether the authors could use the pre-bound C protein with genomic RNA (i.e. the experiment shown in Fig. 4A) ribonucleoprotein complex in the SLB binding assay. If doable, this experiment would be exciting and could bring some important information about NC assembly.

      Minor comments:

      The authors´ statement in the Abstract: „....we investigate nucleocapsid assembly..." is too speculative because the assembly was not studied in their work. It needs to be reformulated.

      Referees cross-commenting

      I agree with the Reviews by reviewers #2 and #3

      Significance

      This manuscript's major novelty and originality are in using a unique combination of biophysical methods, including quartz crystal microbalance with dissipation monitoring and Langmuir-Blodgett trough. Using quartz crystal microbalance with dissipation, the authors confirmed the necessity of negatively charged lipid components of the model lipid membrane for C-protein binding. Furthermore, this method also allows them to measure the formation of a rigid layer of C protein stabilized by non-electrostatic interactions. By Langmuir-Blodgett trough monolayer experiments, they demonstrated the insertion of TBEV C protein into the model membrane. However, I do not have sufficient expertise to evaluate the correctness of the experiments done by these two methodologies.

      Despite this clear and valuable methodological contribution, the authors' contribution to our knowledge of the coordination of the nucleocapsid components to the sites of assembly and budding is not so obvious. Contrary to the earlier idea that the flavivirus C protein is asymmetrically charged (that is, hydrophobic on one side (α2) and positively charged on the other side (α4)), recent studies show that the entire surface of the protein is highly electropositive (Mebus-Antunes et al., 2022). Therefore, a well-ordered neutralization of the flaviviral C proteins' highly positive surface seems critical for the proper organization and assembly of nucleocapsid. I am afraid that the authors do not shed much light on this issue.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      The authors do not wish to provide a response at this time since they are submitting a Revision plan and not a Full revision.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      The objective of this manuscript was to determine the role of TZPs in mouse oocyte quality. The experimental plan was to compare the phenotypes of global Myo10-/-, oocyte Myo10-/-, and Myo10+/+ follicles. The results indicate that global loss of Myo10 did not prevent oocyte growth, but resulted in lower density of TZPs. Whole ovary image analysis revealed that Myo10-/- follicles actually contained more TZPs than wt, despite the fact that TZP density was decreased in Myo10-/- follicles. In mature knockout females, oocyte growth proceeded, but with impaired oocyte-zona integrity and alterations in gene expression including upregulation of numerous protein encoding genes. Oocytes from Myo10-/- females produced a normal-appearing spindle but exhibited reduced capacity to mature beyond MI. Analysis of ovulated oocytes from mated females revealed an increase in the number of unfertilized and dead oocytes, many of which exhibited gaps between the zona pellucida and the oocyte plasma membrane. Those oocytes that were successfully fertilized exhibited a higher than normal rate of developmental arrest by the blastocyst stage. Lastly, mating trials revealed that Myo10-/- females were sub-fertile.

      The results are clearly described with high quality imaging to demonstrate phenotypes. The data appear reproducible based on sample size and the number of repetitions. In most cases, statistical analysis demonstrates significance of observed differences.

      Minor comments:

      1. Fig. 2B does not provide statistical evidence that the two data sets differ.
      2. Fig. 6A Was the zona pellucida functional in unfertilized oocytes from Myo10-/- females? That is, were sperm bound to the zona or within the perivitelline space?
      3. The observation that oocytes from Myo10-/- females have more TZPs but lower TZP density raises questions as to how more TZPs (even if less densely spaced) could fail to support oocyte development. Dye diffusion assays comparing the rate of transfer of injected dye from Myo10+/+ and Myo10-/- GV-stage or maturing oocytes into their attached granulosa cells might reveal an explanation.

      Significance

      The manuscript addresses an important aspect of follicle development using state-of-the-art methodology to test the requirement of Myo10 for successful TZP-oocyte interaction during follicle development. The authors demonstrate significant findings as to the mechanism by which TZPs enable granulosa cell-oocyte contact, which is required for transfer of critical components from granulosa cell to oocyte. The requirement of Myo10 in this process in oocyte competence is demonstrated clearly; however, the mechanism by which Myo10 ablation causes defective fertilization and development remains unclear. In any case, the results demonstrate new and interesting findings that will be of great interest to basic scientists including oocyte biologists working on diverse animal species. The results could lead to further understanding of TZP-oocyte interaction and reveal ways to improve or restore communication between these two cells.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      The authors have investigated the effect of knocking out the Myosin-X gene (Myo10) on oocytes in mice. The major finding was that transzonal processes (TZPs), which are filopodia-like structures that cross the oocyte's extracellular matrix shell (zona pellucida, ZP), were greatly reduced when the gene was globally knocked out. In comparison, an oocyte-specific knockout had no effect on TZPs. Using a machine learning algorithm developed by one of the authors, it was found that characteristics of the ZP were changed, and the oocyte shape was altered in the knockouts. RNAseq showed that many genes were upregulated in oocytes from knockout females. Oocytes from knockouts also failed to complete meiotic maturation at a higher rate and were fertilized less frequently, and the resulting embryos were impaired in reaching the blastocyst stage. Finally, litters per female and pups per litter were lower in knockouts, indicating lower female fertility.

      Major comments:

      Overall, this is a very well done and comprehensive study that indicates a major role for MYO10 in oogenesis and oocyte developmental competence. There are some relatively major issues that should be resolved, however:

      1. An experiment was done to assess the number of follicles per ovary, which is shown in Fig. S3. No significant difference in follicle number (per unit area) was detected. However, there are two problems here. One is that only four repeats were done, and the lack of significance would appear to be driven by only one of the knockout repeats which had a high number of oocytes compared to the others. It is possible that there is really not a biologically significant difference between the controls and somatic knockouts, but there are an insufficient number of repeats to determine this (technically, P>0.95 would mean they are the same). Second, it is unclear that the number of follicles per unit area is the relevant parameter for fertility rather than the absolute number of follicles. Both measures should be reported and tested statistically.
      2. A main function of TZPs is to transfer metabolites and other small molecules into the oocyte via Cx37-containing gap junctions. As the authors note, the phenotype here is different from the Cx37 knockout, where oocytes failed to develop. This implies some connectivity remains in Myo10 knockouts, but how much has not been determined. The amount of connectivity should be measured. The techniques are fairly straightforward and involve only microinjection of a fluorophore into the oocyte and measuring the spread into the surrounding somatic cells. This also has implications for the lack of effect on GVBD and resumption of meiosis, since Laurinda Jaffe's group has shown that diffusion of cGMP out through the gap junctions is important in this process.
      3. The TZP-like structures that remain are intriguing, but this was not followed up. They apparently are visible optically but contain neither actin nor membrane. Is it possible that these are tracks left from degenerated TZPs? Electron microscopy might resolve this question and should be considered. In any case, a more extensive discussion is warranted since the data are contradictory, with fluorescence-based methods indicating a decrease in TZPs but optical methods indicating an apparent increase.
      4. The apparent delay in formation of a perivitelline space is interesting. The perivitelline space forms gradually as the ZP detaches from the oocyte independent of meiotic maturation (see, e.g., Richard et al., 2017, J Cell Physiol 232:2436-46). Could this not be a delay in detachment and therefore transient (and dependent on when the assay was performed relative to oocyte isolation)?
      5. While GO analysis was done and shown in Table 1, this is not treated in any depth in the paper. There should be more description of the GO pathways that were upregulated and the implications.

      Minor comments:

      1. The comparisons that were done for whole-body knockout vs. oocyte-specific knockouts were only done by comparing each to its control. There is no direct comparison showing whether the two knockouts differ significantly from each other. The comparisons should be done using ANOVA with appropriate post-hoc tests to test all four groups against each other.
      2. The experiment in which 5-ethynyl uridine incorporation was used to show that global transcription was not increased may not actually be conclusive, since a large amount of RNA synthesized is not mRNA. A global increase in mRNA synthesis could still be occurring but the signal swamped by RNAs such as rRNA and other non-coding RNAs.

      Referees cross-commenting

      It looks like the reviewers basically agree that this is interesting but there are questions remaining about whether cumulus-oocyte coupling is affected (and could explain the phenotype) and why there is an apparent discrepancy between the results for detecting the numbers and densities of TZPs. These should be addressed.

      Significance

      This work has fundamental implications for understanding oocyte development and the role of the surrounding somatic cells in oogenesis and oocyte developmental competence. It also has direct implications for human and animal fertility and assisted reproduction.

      This is a fundamental new set of results that establishes a role for Myo10 and adds to the knowledge about the role of transzonal processes. It is a substantial advance over previously published research.

      The audience will primarily be basic biomedical researchers in the general field of reproductive biology as well as those investigating filopodia and should extend to those interested in translational research in infertility.

      I have direct and extensive expertise in the field of oogenesis in mice.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      In the manuscript by Crozet et al. the authors investigated the contribution of transzonal projections (TZPs) to oocyte development and acquisition of competence. The results were obtained using two Myo10 knockout mouse models: a full knockout for Myo10 (Myo10-/- full) and an oocyte-conditional knockout (Myo10-/- oo). The major findings due to the global depletion of Myo10 include the decrease in TZP density, discrete morphological alterations in the oocytes, alterations in oocyte gene expression, the inability of the oocytes to complete the first meiotic division (lack of 1PB extrusion), and subfertility in Myo10-/- full females.

      The research topic is interesting and, overall, I consider the manuscript relevant. However, to increase the scientific soundness, the authors are encouraged to explore the effects of the (partial) interruption of the germ-soma communication on the regulation of meiotic arrest and resumption. This is worth investigating (optional, but highly recommended) since the lower density of TZPs is associated with an apparently normal meiotic arrest but an abnormal meiotic resumption. As a first step, measuring cGMP and cAMP in oocytes during meiotic arrest and resumption would be worthwhile. This will help to shed light on the reasons for the abnormal meiotic progression, indicating whether it is the consequence of a direct blockage in the transfer of molecules from follicular cells to the oocyte or an indirect consequence.

      Minor points

      Lines 53-55: The oocyte does not complete two successive meiotic divisions to generate a mature oocyte ready to be fertilized. Instead, meiosis completion only occurs if fertilization of MII-arrested oocytes takes place. Consider rephrasing to communicate the accurate concept.

      Lines 145-153 and Figure S4-F: Authors claim that TZP-deprived oocytes grow up to normal sizes. However, the perimeter of fully grown oocytes is lower in Myo10-/- full oocytes. This is conflicting.

      Referees cross-commenting

      In addition to my own comments, both of my colleagues suggested including dye diffusion assays to determine the functionality of the remaining TZPs. I concur with them.

      Significance

      The manuscript clearly adds to the existing knowledge. I'm convinced that the findings described here will be of interest for readers from the field of reproductive biology, follicle development, and oocyte biology.

      Authors are encouraged to better frame their findings in relation to existing knowledge. There is at least one other knockout model in mice that leads to TZP density reduction (Zhang et al., 2021; Nature Comm., 12:2523). In this paper, the authors show that the TZPs connecting the GCs and the oocyte support proper oocyte development. Also, its removal results in subfertility. These previous findings should be acknowledged in the current manuscript.

      My expertise: researcher in reproductive biology; emphasis on folliculogenesis and oocyte development.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Overall comments

      We are pleased by the reviewers’ comments and appreciate their suggestions for improvements. In addition to correcting small typos throughout the manuscript, the major changes we made in response to their comments are as follows:

      • Changed the title of our paper to more accurately reflect the strong evolutionary correlation between sex chromosomal meiotic drive and gains/losses of SNBP genes in Drosophila.
      • New experiments to test the role of the well-conserved, universally retained SNBP, CG30056, in male fertility in D. melanogaster. Although reviewers had suggested we could eliminate this section, we felt that this would add a lot of weight to the unexpectedly inverse relationship between age/retention and fertility functions of SNBP genes. Thus, over the past few months, we have carried out new experiments with increased sample sizes, better controls, and sperm exhaustion. These new results strengthen our earlier analyses.
      • Better clarification of the X-Y chromosome fusion, which is a new observation, in the montium group via careful rewriting as partly suggested by Reviewer #2.
      • Highlighting that the genetic conflicts hypothesis does not rule out a role for sperm competition or other conflicts in shaping SNBP evolution in a revised Discussion.

      All changes in response to the reviewers’ comments have been detailed in our point-by-point response (below). You will see that we have addressed almost all the suggestions made, including with new experiments. The only reviewer suggestions (all optional from Reviewer 3), which we did not directly address in our revision are:

      • Branch specific protamine evolution analyses for sex chromosome amplified SNBP genes: given the state of SNBP gene annotation and the difficulties of assembling these genes in large tandem arrays, this will require considerable work and is beyond the scope of the paper.

      • Covariation between SNBP evolution and sperm morphology: We cannot perform these experiments as there is a paucity of sperm morphology data currently. Obtaining this data reliably is a significant undertaking.
      • Are SNBP genes more prone to be lost than average in the montium group: We have not comprehensively examined all loss events in the montium group or any other Drosophila group. This is also a non-trivial analysis, although it would be very interesting. However, we believe the more relevant comparison is whether these lost SNBP genes are more likely to be retained in non-montium species, which they are, as we now highlight.

      We hope you will favorably judge our good faith efforts to address all other reviewers’ comments, and their laudatory comments during the previous round of reviews.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Chang and Malik present a comprehensive evolutionary analysis of sperm nuclear basic proteins (SNBPs) in Drosophila. In addition, they provide a preliminary functional characterization of one such protein (CG30056) and describe a newly discovered X-Y chromosomal fusion in the Drosophila montium species group. All of these findings are interesting and important, but the headline from this study is the well-supported possibility that SNBPs, or at least a large fraction of them, function in suppressing X vs. Y chromosome meiotic drive. While this hypothesis is challenging to test experimentally, the authors provide strong correlational evidence that SNBPs are associated with drive by documenting these proteins' rapid evolution. This rapid evolution takes the form of sequence changes (as predicted by coevolution between drivers and suppressors of drive), gene amplification in cases when SNBPs move to sex chromosomes (consistent with the SNBP becoming a potential agent of drive for its new "home chromosome"), and gene loss in species with X-Y chromosome fusions (in which drive is not predicted to occur).

      Overall, this is an excellent, comprehensive study. The phylogenetic and genomic analyses are first-rate (and one of the first to make use of the new 101 Drosophila genomes); the logic is very well explained; conclusions are supported by multiple lines of evidence; the writing and figures are clear and accessible; and, the findings are fascinating. It's a good sign that it is easy to imagine several experiments one could do to follow up on this study, but I do not feel any are required in revision, as the manuscript is comprehensive as is. Thus, I have just a few minor points the authors may wish to consider in making revisions and a few suggestions for clarity/typos.

      We thank the reviewer for their positive comments on our work.

      1. I would be interested in whether the authors think that all SNBPs in a given Drosophila species function(ed) in meiotic drive, or whether some fraction may play other roles, such as sexual selection or chromatin compaction, which have been the traditional hypotheses for SNBP function. Relatedly, given the high turnover of SNBPs the authors observe and the fact that some melanogaster-essential SNBPs are younger genes, would they like to comment on whether the subsets of SNBPs involved in drive/suppression vs. chromatin packaging/sperm traits/Wolbachia defense are likely to differ across fly species?

      The reviewer raises an excellent point. In our revised discussion, we now speculate that different SNBPs might have distinct functions. For example, the same subset of SNBPs is subject to gene amplification and loss whereas other SNBPs are subject to less turnover. Moreover, even this stable set of SNBPs evolves rapidly, including in the montium group of species that have undergone dramatic SNBP loss. As the reviewer suggests, sperm competition or pressures from Wolbachia toxins might be a driving force for sperm evolution. We discuss these possibilities and conclude in our discussion: “Our findings do not rule out the possibility that forces other than meiotic drive are also important for driving the rapid evolution and turnover of SNBP genes in Drosophila species.”

      What do the authors make of the lower isoelectric points for a few of the SNBPs (e.g., CG31010 with pI = 4.77 in Table 1)? These proteins have identifiable HMG box domains, so is the pI driven lower by other parts of the protein sequence?

      We thank the reviewer for raising this point. We found that the pI of HMG domains can range from 6 to 12. Thus, the pI is driven by both HMG domains and other parts of proteins. We now include the pI of the whole SNBP protein and the HMG domain alone in Table 1. We do not have enough biochemical information to speculate on how these differences could alter SNBP function.

      3. For readers less familiar with the field, it may help to spell out (e.g., on p. 6) why the authors consider ProtA/B to be important for fertility. Some of the previous papers on these genes describe them as dispensable - though the present authors are correct that these previous studies do detect fertility defects of various magnitudes under some conditions.

      We agree with the reviewer. Previous studies disagree about the importance of ProtA/ProtB for male fertility: while no significant effects were seen under standard fertility assays, sperm exhaustion conditions (mating with excess females) did reveal fertility effects. We have now added these references and discussed ProtA/ProtB more fully in our revision.

      On p. 9, paragraph 2, the data showing that "six different SNBP genes underwent 11 independent degeneration events in the montium group" are shown in Fig. 6A, not 5A.

      Thank you. This has been fixed in our revision.

      5. The summary Table 2 is useful, but I wonder whether including relative levels of expression and dN/dS in addition to ordinal rankings might help clarify. For instance, if there were a drop off in mean expression level between the 5th and 6th most highly expressed SNBP, this wouldn't be evident from the table.

      We agree with this suggestion and have now added this information.

      In Fig. 3, I like the use of the clean CG31010 figure in panel A to illustrate the circular representations. In addition, though, it might be useful to show Prot's graph at this same, larger size, since it's the most complicated and will likely be most closely examined.

      We agree with this suggestion and have now amended this figure in line with the reviewer’s suggestion.

      In Fig. 4, the end of the legend says that the species tree is shown "on the right," but it's on the left in the figure.

      Thank you. This has been fixed in our revision.

      CROSS-CONSULTATION COMMENTS

      • I agree with both Reviewers 2 and 3 that the title could be changed to be a bit more tentative. I'd had this thought as well.

      We agree with this suggestion. We have now amended this title to “Expansion and loss of sperm nuclear basic protein genes in Drosophila correspond with genetic conflicts between sex chromosomes.”

      • I agree with Reviewer 2 that the fertility assay could be conducted with a larger sample size and a better control in order to be better compared with how the authors described other published fertility phenotypes for SNBPs. For the control, crossing the deletion line to y w (or w1118) and using the resulting heterozygotes (KO/+) would be better than using the mutation over the balancer chromosome (KO/CyO).

      We agree with both suggestions. We now compare fertility between KO/KO and KO/+ males in sperm exhaustion assays. Our more stringent fertility assays find no evidence of a role for CG30056 in male fertility, strengthening our previous findings. We have now added the motivation for the new assays and the new results to our revision.

      • I agree with Reviewer 3's third bullet point about spending a bit more time on the different possible roles that SNBPs could play in spermatids. (This is a more eloquent version of my review point #1.)

      We have now expanded our discussion of other possibilities in our revision.

      • I agree in principle with Reviewer 3's first bullet point about examining whether SNBP evolution correlates with changes in sperm morphology, but this feels like it could be a whole, fascinating study on its own, while this manuscript is already packed with data. I'd welcome the authors' thoughts about this in discussion, but wouldn't personally require a formal analysis of this to be added prior to publication.

      We also agree that this would be an interesting test. However, we are not able to do the test due to the scarcity of sperm phenotype data in Drosophila. We also think that our original version unintentionally downplayed this possibility. Our revised discussion makes clear that the rapid evolution of some Drosophila SNBP genes may be driven by sperm competition, just as in mammals, and influence the evolution of sperm morphologies.

      Reviewer #1 (Significance (Required)):

      This study describes an important conceptual advance in our understanding of the evolution and potential functions of sperm nuclear basic proteins (SNBPs) in Drosophila, which stands in interesting contrast to the functional roles of equivalent proteins in primates. It should be of broad interest to biologists studying spermatogenesis, meiotic drive, and genome evolution, both in and out of Drosophila.

      We thank the reviewer for their positive appraisal.

      To contextualize the work, paternal DNA is typically compacted during spermatogenesis. This process involves the replacement of histones with other small, positively charged proteins in a sequential order, ending with protamines that bind DNA in mature sperm. In Drosophila, work over the last two decades (largely from the labs of R. Renkawitz-Pohl, B. Loppin and B. Wakimoto) has identified more than a dozen sperm nuclear basic proteins that localize to condensing/condensed spermatid nuclei. Two interesting observations have been that many of these proteins are dispensable for male fertility, and the proteins vary in their degree of evolutionary conservation. Recent work from Eric Lai's lab (J Vedanayagam et al. 2021, Nat Ecol Evol) showed that in D. simulans and sister species, at least one of these SNBP genes (Prot) underwent gene amplification and now acts in those species as a meiotic driver. This finding suggested the hypothesis, tested thoroughly in the present study, that the rapidly evolving SNBP gene family could be involved in causing or suppressing meiotic drive. Consistent with this idea, the authors here find that SNBP genes expand in copy number more frequently when they move from autosomes to sex chromosomes (consistent with the idea that they may cause or contribute to drive), and that otherwise well-conserved SNBP genes are lost in a group of species in which sex chromosome meiotic drive is not expected to occur. These findings are based on a thorough and well conducted phylogenomic and molecular evolutionary analysis of SNBPs across dozens of Drosophila species. Overall, this work generates exciting new hypotheses about the function of SNBPs and should be widely read both within and outside of the field.

      We are grateful for the reviewer’s accurate summary of our work and its significance. We share the reviewer’s excitement and expect that more studies will explore the new function of SNBPs in multiple taxa soon.

      Keyword describing my field of expertise: Drosophila, molecular evolution, reproduction, genetics, genome evolution.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The paper describes interesting patterns on the evolution of Drosophila SNBP genes, and proposes a very interesting explanation, namely, that meiotic drive is the main evolutionary force behind these patterns. Some of these observations have recently been made by other authors in a single case (the Dox genes in D. simulans), but not on the scale and breadth of the present ms. The ms combines an extensive investigation of available genomes with expert analysis, and new experimental data. In particular, the finding that the ancestral Y became incorporated into the X in montium species is very exciting, and may provide a smoking gun for the explanation proposed by the authors. Overall, I think it is a very good paper. I do have several criticisms and suggestions that may help to improve it.

      We are grateful for the positive comments of the reviewer and for their constructive criticism and suggestions, which we have incorporated into our revision.

      The paper has a speculative side that is almost unavoidable given its novelty and breadth. I do not see this as a problem per se, but I think the uncertain/unsupported/problematic points should be more openly presented to the readers. The main cases I noted are:

      1. The title of the ms states that "Genetic conflicts between sex chromosomes drive expansion and loss of sperm nuclear basic protein genes in Drosophila", but the evidence is somewhat circumstantial, and the patterns may be explained also by other known phenomena (e.g., demasculinization of the sex chromosomes; below). I think the tone of the end of the Introduction reflects more faithfully the strength of the evidence ("Thus, we conclude that rapid diversification of SNBP genes might be largely driven by genetic conflicts between sex chromosomes in Drosophila."). I understand the temptation of writing a bold title, but I think it is a bit misleading in the present case. I.e., it would be desirable that the title conveys the uncertainties of the data and their interpretation.

      We agree with this suggestion. We have now amended this title to “Expansion and loss of sperm nuclear basic protein genes in Drosophila correspond with genetic conflicts between sex chromosomes.”

      However, we also want to highlight that de-masculinization of the X chromosome cannot explain the observed amplification and loss patterns of SNBP genes, except in cases of sex chromosome fusions. We now highlight the de-masculinization hypothesis for the latter case, but still strongly favor the genetic conflicts hypothesis.

      "In contrast, we found no instances of pseudogenization or subsequent translocation to the X chromosome of SNBP genes that are still preserved on their original autosomal locations or involved in chromosome fusions between autosomes (0/16). This difference is highly significant (Fig 5 and Table S11; 3:5 versus 0:16, Fisher's exact test, P=0.03). " Readers should be warned that this pattern can also be explained by the well-known demasculinization of X chromosomes (e.g., Sturgill et al. Nature 2007, 450, 238-241)

      We agree with this point and thank the reviewer for pointing this out. We now expressly raise the ‘de-masculinization of X chromosomes’ as one potential explanation of the pattern we observe here.
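      For readers who wish to check the test quoted above, the following is a minimal sketch of a two-sided Fisher's exact test. It assumes that "3:5 versus 0:16" denotes events versus non-events in each class, i.e. the 2x2 table [[3, 5], [0, 16]]; that reading is an interpretation of the quoted text, not something the manuscript states explicitly. The function name is illustrative only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1 with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )


# Assumed reading of "3:5 versus 0:16": 3 events and 5 non-events
# versus 0 events and 16 non-events.
p_value = fisher_exact_two_sided(3, 5, 0, 16)
print(round(p_value, 3))  # 0.028
```

      Under this assumed table the p-value is 56/2024, which rounds to the P=0.03 quoted above. Other readings of the ratios (for example 3 out of 5) would give a different value.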

      "Indeed, no meiotic drive has been documented in the montium species even though it is rampant in many other Drosophila lineages [38]." Two remarks here: a) the authors should make clear that they are referring to sex-chromosome meiotic drive. b) I think the evidence is much weaker than the sentence implies. Sex-chromosome meiotic drive is known in less than 20 Drosophila species, scattered throughout the phylogeny. As far as I know all cases were discovered by accident, so the sampling is biased towards model species (e.g., the obscura group, which was very popular around 1930-1960). So we do not know the true frequency of sex-ratio meiotic drive among Drosophila species, nor, say, if it is more common in the Drosophila or Sophophora species, if it is suspiciously absent in the montium group (as suggested by the authors), etc. I think these uncertainties should be acknowledged or, perhaps, given the weakness of the argument, the sentence should be deleted or attenuated.

      We agree with this comment and have now removed this argument in our revision.

      "X-Y chromosome fusions eliminate the extent of meiotic drive and may lead to the degeneration of otherwise conserved SNBP genes, whose functions as drive suppressors are no longer required. Thus, unlike in mammals, sex chromosome-associated meiotic drive appears to be the primary cause of SNBP evolutionary turnover in Drosophila species." The authors found that in the montium species the ancestral Y became incorporated into the X chromosome, and that montium species seem to have an inordinate amount of SNBP gene losses. They combine these two observations by suggesting that these SNBP became dispensable or deleterious because they originally were involved in XY meiotic drive. I think many readers will think that males in montium species are X/0, whereas in fact all of them carry a Y chromosome (just, in most cases, more gene poor than "normal" Y-chromosomes). I do not think this is a fatal flaw for the explanation proposed by the authors, but certainly is a difficulty that should be acknowledged.

      We agree with this point. It was not our intention to suggest that montium group males are X/O, but this could be misinterpreted as we originally stated. We now add a clarification that montium group males still harbor a Y chromosome, which is missing most ancestrally Y-linked genes.

      Problems/suggestions with experiments and data analysis

      1. There is a section titled "CG30056 is universally retained in Drosophila but dispensable for male fertility in D. melanogaster". In this section and in the figures, it is stated, "Although CG30056 is the most conserved SNBP we surveyed, we found no clear difference in offspring number between heterozygous controls and homozygous knockout males (Fig 2B). (...) We found either no or weak evidence of fertility impairments in two different crosses with homozygous CG30056 knockout males.". I think the fertility data are weak for the purpose of the authors, and I strongly suspect that this conclusion is wrong. Let me explain why. In other passages of the ms, the authors classify the SNBP genes in three groups. (i) essential/important for male fertility: "Three genes (Mst77F, Prtl99C and ddbt) are essential for male fertility while knockdown or knockout of two other SNBP genes (ProtA, and ProtB) leads to significant reduction in male fertility [27-30, 32]." (ii) genes that do not appear to impair male fertility at all. (iii) untested. CG30056 was in the last group, and hence the authors produced knockouts, tested their effect in male fertility, and concluded that it belongs to the second group. Now, look at Fig. 3B. The numbers of tested males are too small (it seems to range from 3 to 10), and male fertility is known to be a very noisy phenotype (as shown by the huge scatter in the authors' data). Furthermore, two different knockouts were tested, and both were nominally less fertile than the controls, and in one of them the difference is statistically significant. Taken at face value, the knockouts seem to be perhaps ~25% less fertile than the controls. Another potentially big problem is that the "control males" actually carry visible dominant mutants (the balancers CyO or SM6) which certainly reduce their fitness, whereas the experimental males are wild-type for these mutants. Without the detrimental effect of these visible mutants in the controls, the difference to the CG30056 knockouts will probably be even larger. Note that the fertility effects of the genes ProtA, and ProtB (a.k.a. "Mst35B"), which the authors put in group "essential/important for male fertility", would not have been detected if assayed as the CG30056 gene: Tirmarche et al (2014; the reference cited by the authors) stated that: "In fact, the impact of Mst35B on male fertility was only revealed when mutant males were allowed to mate with a large excess of virgin females (1 for 10; Figure 3F) but not with a 1:1 sex ratio (not shown). " The authors' fertility test did not use this type of challenge. My general impression is that the fertility effects of CG30056 may actually be similar to ProtA and ProtB. I think the authors should do a proper fertility test of CG30056, or remove this section. Another possibly useful approach would be to classify the SNBP genes in those essential for male fertility and those that are not essential, because "experimentally speaking" this is a safer distinction (e.g., the fertility tests reported by other authors may also have been quick tests). Since these genes only function in sperm and are under purifying selection (otherwise they would have been lost; also, all have dN/dS

      We are very appreciative of the many important points raised by the reviewer. Rather than removing this conclusion, which is not central to our paper, we have now performed additional, well-controlled experiments to address the reviewer’s concerns, which we summarize below:

      2. We agree with the reviewer that it is easier to classify SNBP genes into those that are essential for male fertility and those that are not.

      3. We also agree with the reviewer and now include more details about earlier studies to highlight that ProtA/ProtB fertility effects were only revealed in a sperm exhaustion setting.
      4. We agree with the reviewer’s suggestion and have now included sample sizes for all our experiments in a new supplementary Table (Supplementary file 8).
      5. We agree with the reviewer that a comparison between KO/KO and KO/Bal males is non-ideal given that Balancer chromosomes carry many deleterious mutations. We now include new experiments in our revision that compare KO/KO and KO/wt chromosomes.
      6. We agree with the reviewer that standard fertility assays may be too noisy to detect subtle fertility effects. We therefore now carry out much more stringent fertility assays under sperm exhaustion conditions with a male:female ratio of 1:10 and at least 10 males tested per genotype. Despite this higher stringency, we detect no difference in fertility between KO/KO males and KO/wt controls for CG30056 (>10 males were tested for each). Thus, our original conclusion that CG30056 has no detectable effect on male fertility is even stronger. We have not tested the possibility of sperm storage or precedence being affected in our assays. However, we do believe that the finding that one of the best conserved and retained SNBP genes has no detectable effect on male fertility is an important conclusion which greatly increases the impact of our study, especially since most fertility-essential genes are either young or not universally conserved. We hope these changes will satisfy the reviewer's concerns about this section of our paper.

      "Our phylogenomic analyses also highlighted one Drosophila clade- the montium group of species (including D. kikkawai)- which suffered a precipitous loss of at least five SNBP genes that are otherwise conserved in sister and outgroup species (Fig 3). (...) Given our hypothesis that autosomal SNBP genes might be linked to the suppression of meiotic drive (above), we speculated that the loss of these genes in the montium group of Drosophila species may have coincided with reduced genetic conflicts between sex chromosomes in this clade." The montium data is an important part of the paper. I think the authors should test the statistical significance of this pattern.

      We appreciate the reviewer’s suggestion. However, we are unable to perform the statistical tests suggested for technical reasons. We note that three loss events occurred in the ancestor of D. montium species, while two happened in the ancestors of most D. montium species. Since it’s hard to estimate the evolutionary rates using these internal branches, we can’t directly compare them to other branches using statistics. However, in response to the reviewer’s comments, we now more clearly contrast the fate of SNBPs between D. montium species and other melanogaster group species, noting that three of five genes lost in the montium group are retained in all other melanogaster group species.

      Other points:

      1. "The five remaining SNBP genes (Mst33A, CG30056, CG31010, CG34269, and CG42355) remain cytologically uncharacterized [30]." I think it will be interesting if the authors look at other potentially useful resources: Vibranovski et al papers which looked at gene expression in mitotic, meiotic and post-meiotic cells (https://mnlab.uchicago.edu/sppress/index.php), and the papers by several labs on testis single-cell transcriptomic data (Witt et al 2021 PLOS Genetics. 17(8):e1009728 ; Nat Commun. 2021;12: 892). These may provide additional clues on the function of SNBP genes. There is also a recent report on sperm proteome (doi: https://doi.org/10.1101/2022.02.14.480191)

      We are grateful to the reviewer for this suggestion. We now add the data from single-cell expression analyses from Witt et al. in Table 1-figure supplement 1. We found most SNBPs are expressed in late spermatocytes and early spermatids, although CG30056 is primarily expressed in late spermatids, whereas CG34269 is expressed earlier in late spermatogonia. The data from Vibranovski et al. also show similar patterns but don’t have four of these genes, including CG34269. The data from Mahadevaraju et al. are from larval testes, and lack some critical stages during spermatogenesis. Thus, we only report the data from Witt et al.

      We also surveyed the proteome data as the reviewer suggested, but we only found 3 SNBPs (ProtA, ProtB, and Prtl99C) in the data. This did not include Mst77F, which is the most highly expressed (see Table 2) and well-studied SNBP, so we suspect the proteomic study might be biased toward proteins from sperm tails. Therefore, we decided not to include this analysis.

      "Our inability to detect homologs beyond the reported species does not appear to result from their rapid sequence evolution. Indeed, abSENSE analyses [45] support the finding that Prtl99C, Mst77F, Mst33A, Tpl94 and CG42355 were recently acquired in Sophophora within 40 MYA. For example, the probability of a true homolog being undetected for Prtl99C and Mst77F is 0.07 and 0.18 (using E-value=1), respectively (Table S1, Methods)." This should be complemented by synteny analysis.

      It may not have been clear from our original version that we did perform synteny analyses for all SNBP genes. We have now restated this more clearly in our revision.

      I found the following sentence unclear: "However, we could only ascribe a sex chromosomal linked location for species if no data was available from either BUSCO genes or females (only males and mixed-sex flies)."

      We modified the sentence to make it clearer: “However, we could not ascribe a sex-chromosomal linked location of a contig to either the X or Y chromosome in cases where there was no linkage information from BUSCO genes and no read data available from females, only from males and mixed-sex flies.”

      "Using the available assemblies with Illumina-based chromosome assignment, we surprisingly found that most ancestrally Y-linked genes are not linked to autosomes as was previously suggested [by Dupim et al 2018] (Fig 6A)."

      The new result of X-linkage is exciting, but the sentence is not exact: Dupim et al 2018 made clear that they could only separate X/A from Y-linkage. E.g., the legend of their Fig 3: "Phylogeny and gene content of the Y chromosome in the montium subgroup. "M" means amplification only in males (i.e., Y-linkage), whereas "MF" means amplification in both sexes (autosomal or X-linkage)."

      We are grateful to the reviewer for this correction. We now modified the sentence to make clear that Dupim et al had “showed that many ancestrally Y-linked genes are present in females because of possible relocation to other chromosomes in the montium group.”

      "The most parsimonious explanation for these findings is a single translocation of most of the Y chromosome to the X chromosome via a chromosome fusion in the ancestor of the montium group of species. Afterward, some of these genes relocated back to the Y chromosome in some species (Fig S6; Supplementary text)." Explanations for this pattern of "return to the Y" have been extensively discussed and tested in Dupim et al 2018 (see their section "Why genes seem to return to the Y chromosome after Y incorporations?"). The available evidence strongly suggests that it is not a case of relocation to the Y.

      We thank the reviewer for raising this point. However, our conclusions disagree slightly with those from Dupim et al. 2018, in part because of additional sequencing in this clade. Dupim et al. suggested the possibility that most Y chromosomal loci duplicated to other chromosomes in the ancestor of the D. montium clade, following which each species degenerated either Y-linked or autosomal copies of genes. If this were the case, Y-linked copies should have diverged from X-linked copies since the ancestor of the D. montium clade. In contrast to this expectation, our phylogenetic analyses found that D. kikkawai Y-linked PRY is more closely related to X-linked PRY in all other related species (Figure 6-figure supplement 1). This result is much less parsimoniously explained by the ancient duplication event proposed by Dupim et al. and is more consistent with the ‘return-to-Y’ scenario that we propose. We also make clear that, unlike PRY, we can’t differentiate the two hypotheses in the case of kl-2.

      Fig 6B suggests that the authors assembled the "translocated Y" in D. triauraria. However, no direct data or account for this assembly is provided. Please clarify.

      This was not our assembly. We searched all publicly available assemblies in the montium group and found one assembly (NCBI accession GCA_014170315.2) that assembled all ancestral Y-linked regions. We now clarify this in our revision.

      "Why would meiotic drive only influence Drosophila, but not mammalian, SNBP evolution? One important distinction may arise from the timing of SNBP transcription. In D. melanogaster, SNBP genes are transcribed before meiosis but translated after meiosis [29, 43, 57]. Thus, SNBP transcripts from a single allele, e.g., X-linked allele, are inherited and translated by all sperm, regardless of which chromosomes they carry. Consequently, they can act as meiotic drivers by causing chromatin dysfunction in sperm without the allele, e.g., Y-bearing sperm." During spermatogenesis Drosophila haploid cells actually are syncytial, which has interesting consequences for the evolution of male genes (Raices et al, Genome Res. 1115-1122, 2019). This may be relevant for the present paper.

      We thank the reviewer for this suggestion. We now gratefully include this citation in our revision.

      Reviewer #2 (Significance (Required)):

      see above

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This manuscript by Chang & Malik considers the evolution of HMG-box-containing sperm nuclear basic proteins (SNBPs) across Drosophila species in phylogenetic context.

      Previous work in mammals had highlighted fast evolution of proteins involved in chromatin remodeling during spermatogenesis. Here, the authors provide evidence for widespread positive selection and likely involvement in genetic conflict in a set of proteins with analogous functions in Drosophila. Amongst other findings, the authors highlight biased amplification of SNBP paralogs on sex chromosomes along several Drosophila lineages, a tendency towards loss/pseudogenization following translocation onto a sex chromosome, and an intriguing concerted SNBP loss event in the montium group where parts of the Y chromosome have become fused to the X, thus nullifying the chance that genetic conflicts can play out via distorted segregation of sex chromosomes. The authors suggest that, taken together, their findings support widespread involvement of SNBPs (as instigators and repressors) in meiotic drive. Overall, I found the manuscript to be well written and thorough in its exploration of the evolutionary dynamics of SNBPs in this clade.

      We thank the reviewer for the accurate summary and the kind comments.

      Below, I have highlighted some aspects that I think would benefit from further attention, none of them major.

      • Following their exploration of patterns of SNBP evolution in Drosophila, the authors highlight support of their data for genetic conflict between sex chromosomes. They also rightly acknowledge that other evolutionary drivers such as sperm competition might also play a role in, for example, fast evolution of certain SNBPs. Yet those (not mutually exclusive) alternatives are never pitted directly against each other. The focus is firmly on exploring the support for the sex chromosome genetic conflict model. Given that the authors highlight Drosophila as a great model in part because of its well characterized sperm biology (including comparative morphology), I wondered why the authors had not made an explicit attempt to see if SNBP evolution covaries with aspects of sperm morphology across Drosophila.

      We do agree with the reviewer that it will be very interesting to test whether SNBP evolution covaries with sperm morphology in Drosophila. However, data on sperm morphology is scant in most Drosophila species. Indeed, this trait has only been well studied in clades with heteromorphic (different-sized) sperm but we agree this will be an exciting topic to consider in the future.

      We also clarify better in our revised discussion that our analyses do not rule out a role for sperm competition or sperm morphology in driving the evolution of at least some SNBP genes. We note that a subset of SNBP genes undergo gene amplifications and loss, but most SNBP genes evolve rapidly including in species with gene loss. Thus, the meiotic drive hypothesis is not to the exclusion of other hypotheses.

      • The most intriguing part of the manuscript for me was the exploration of SNBP fate in the montium group, where the authors find evidence for an ancestral fusion event between the X and parts of the Y chromosome. The loss of SNBPs is certainly consistent with the conflict model but I was wondering to what extent this lineage is characterized more broadly by unusual evolution at the chromosomal level. Is there simply a lot of upheaval in montium, with more frequent gain/loss across the board? How specific is SNBP loss in the context of other orthologous groups? This could be investigated by looking at retention of other genes in other orthologous groups (in montium and some other control group) or perhaps by looking at synteny conservation.

      This is a good suggestion. Using the same methodology as used in this paper, we found that very few D. melanogaster essential genes (2000) are lost in any single species we surveyed here (unpublished data). However, we have not carried out similar analyses for all genes; given vastly different rates of evolution, this would be a significant undertaking. Thus, we are not able to make a direct comparison between SNBP genes and a control group that would include other testes-specific or fertility-essential genes. Instead, we highlight the fact that, since we identify SNBPs using syntenic analyses, we know that the neighboring genes of SNBPs are much better conserved than the SNBP genes themselves in the montium group species.

      • In introducing SNBPs, the authors focus on their role as packaging agents. Clearly, SNBPs do package the genome in the sense that they bind to DNA and lead to reduced chromosome volume. But is this all packaging for packaging's sake (as portrayed by the sperm shape hypothesis)? Or is the situation a bit more nuanced, where condensation leads to a reduction of volume but also to a shutdown of transcription, protection from DNA damage, etc.? I think the focus on packaging alone is somewhat limiting when it comes to imagining how these proteins might act in the context of genomic conflicts. The authors may want to broaden their description of SNBPs in the Introduction accordingly.

      We completely agree with the reviewer and are currently exploring these possibilities in follow-up studies on SNBP function. However, it is fair to add that this hypothesis has not been well-recognized, and we, therefore, prefer to include it in our revised Discussion rather than Introduction. However, we also think that SNBP packaging function might be targeted by Wolbachia-encoded toxins, speeding up their evolution (revised Discussion). We think there are many molecular possibilities for SNBPs.

      • The authors highlight that some SNBPs are expressed in mature sperm whereas others are transition proteins. The evidence for positive selection chiefly comes from the latter group (and "undefined" proteins that could also be transition proteins). Can the authors comment on whether this is expected/unexpected? Along the same lines, the authors highlight differences between Drosophila and mammals when it comes to the timing of transcription/translation during meiosis, suggesting that meiotic drive can happen in Drosophila because alleles are expressed early and can exert an effect after meiosis regardless of whether the associated locus is present in the gamete. I wonder how this relates, if at all, to the authors' finding that transition SNBPs are more likely to be part of conflicts (as indicated by positive selection signals) compared to SNBPs in mature sperm.

      We thank the reviewer for this comment. We expect that many genes expressed explicitly in spermatogenesis, including SNBP genes, would be under positive selection, regardless of whether they are associated with X-Y conflicts. The positive selection signals could come from either X-Y conflicts, sperm competition, or conflicts with Wolbachia; we now discuss all of these in the Discussion.

      In contrast, the amplification and loss of a subset of Drosophila SNBPs are more likely associated with X-Y conflicts. We note that known SNBPs retained in mature sperm are more likely to be subject to amplification than known transition proteins.

      Regarding the timing of expression, it is true that transition SNBPs act earlier in spermatogenesis than SNBPs retained in mature sperm. However, for the meiotic drive hypothesis to apply, all it requires is for SNBP expression to precede sperm individualization, which it does for most SNBPs, including transition proteins.

      • It is not entirely clear from the text (and also e.g. Table S4) how dN and dS (and subsequently dN/dS) were calculated. I presume as a single estimate across the whole phylogeny? If so, how heterogeneous is dN/dS across the phylogeny and can the authors identify specific branches on which selective regimes are different? A branch-level analysis should be better powered than the site-level analysis the authors present, which requires repeated selection on the same set of sites to get a strong enough signal. A branch-specific assessment of evolution would be particularly valuable when combined with the assessment of amplifications/losses.

      We thank the reviewer for this question. The reviewer is correct. We estimated dN and dS in Supplementary file 4 across the whole phylogeny. We conducted branch tests for the amplification of tHMG only in the Dsim clade (Supplementary file 11).

      We are interested in how SNBP amplification happened across species, but we need better gene annotation for their structure in many of these 19 independent cases. Moreover, we hope to combine transcriptomic analyses with detailed sequence analyses to reveal how each event happened and how gene conversion, gene duplication, and mutations affect their evolution. Each of these analyses requires extensive additional resources, which we feel are beyond the scope of the current paper.

      • The authors suggest that young SNBPs are more likely to encode essential, non-redundant male fertility functions (p7, third paragraph). I'm not sure whether this generalization is appropriate given the small sample. Tpl94D is as young as Mst77F/Prtl99C, tHMG and CG14835 homologs have been lost along different lineages and most of the events are in a single lineage leading up to D. kikkawai. Do the authors really feel that this generalization is warranted?

      We agree with the reviewer. However, it is striking that the known fertility essential genes are either young or not universally conserved. We have therefore reworded our conclusion to make this contrast more accurate.

      • How do the sex-chromosomal amplifications differ in sequence from the ancestral autosomal copies? The authors suggest that the sex chromosomal copies might be involved in meiotic drive? Does the sequence offer a function as to how? (e.g. loss of charged residues/DNA-binding capacity?)

      These are good questions. We do not know mechanistically how the sex-chromosome amplifications may cause meiotic drive. We did not observe the loss of positive charge or HMG domain in most sex-chromosomal amplified copies (Supplementary file 3). Our current working hypothesis is that they compete for DNA binding with autosomal SNBPs, and might interact with other proteins, e.g., heterochromatin proteins, to disturb sperm function. How they might function to cause meiotic drive is an active area of investigation in our and other labs.

      • I think it would be nice to have a final table/figure summarizing the different lines of evidence for all the genes in Table 1 (i.e. positive selection yes/no, amplification in some lineages yes/no, sex chromosome translocations yes/no), for different lineages, including whether any of the HMG-box genes are unlikely to act as SNBPs.

      We agree with this suggestion. We have now significantly revised and added to Table 2 to include this added information.

      • The evidence the authors present is often consistent with genetic conflicts between sex chromosomes. Is it cogent? Arguably not (since no direct tests of the mechanism are provided). I would therefore suggest a more cautious title than one stating that conflicts drive expansion and loss of SNBPs.

      We agree with all three reviewers and have amended our title to highlight the correlation. We also discuss other possibilities that can drive SNBP evolution in our revised Discussion.

      Typographical errors etc.:

      • P3. First paragraph: "One of the driving forces ... " I found this sentence a bit odd in terms of causality (changes in composition being portrayed as a force that leads to selection)

      We thank the reviewer for pointing out the confusing construction. We modified the sentence to “The positive selection of SNBPs results in changes to their amino acid composition.”

      - P3. Second paragraph: should be "HMG-box" rather than "HMB-box"

      Fixed.

      - P3. Fourth paragraph "..., consistent with the observation in mammals". I think "consistent" should be reserved for two observations that speak to the same phenomenon. SNBPs could evolve with no evidence for positive selection in Drosophila and that wouldn't exactly be "inconsistent" with mammals. It would just be different.

      Fixed. We changed “consistent with” to “similar to”.

      - P5. Fifth paragraph: should be "in the PAML package" rather than "in PAML package"

      Fixed.

      - P9. Second paragraph: "... montium group (Fig 5A)...)" should be Fig 6A.

      Fixed.

      CROSS-CONSULTATION COMMENTS

      I have not much to add. The other reviews seem fair and well-informed from my somewhat-outside perspective. I don't know how tricky/time-consuming the suggested additional fly mating experiments are but want to note that, in general, I'm loath to "punish" authors of principally bioinformatic work for including some experiments. If experimental shortcomings can be addressed with appropriate caveats, that should be an option, as should removal of experimental data that - by the experts - would be considered too preliminary.

      We thank the reviewer for their support. However, we felt that improved experiments on the role of CG30056 in fertility could broaden the scope of this paper, despite the additional time and labor commitment. We have now finished these experiments, and they reinforce our original conclusions with much greater support.

      It is my policy to sign my reviews.

      Tobias Warnecke

      Reviewer #3 (Significance (Required)):

      I'm not enough of an expert in the field of SNBPs to assess the level of advance provided by this study.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      This manuscript by Chang & Malik considers the evolution of HMG-box-containing sperm nuclear basic proteins (SNBPs) across Drosophila species in phylogenetic context.

      Previous work in mammals had highlighted fast evolution of proteins involved in chromatin remodeling during spermatogenesis. Here, the authors provide evidence for widespread positive selection and likely involvement in genetic conflict in a set of proteins with analogous functions in Drosophila. Amongst other findings, the authors highlight biased amplification of SNBP paralogs on sex chromosomes along several Drosophila lineages, a tendency towards loss/pseudogenization following translocation onto a sex chromosome, and an intriguing concerted SNBP loss event in the montium group where parts of the Y chromosome have become fused to the X, thus nullifying the chance that genetic conflicts can play out via distorted segregation of sex chromosomes.

      The authors suggest that, taken together, their findings support widespread involvement of SNBPs (as instigators and repressors) in meiotic drive.

      Overall, I found the manuscript to be well written and thorough in its exploration of the evolutionary dynamics of SNBPs in this clade.

      Below, I have highlighted some aspects that I think would benefit from further attention, none of them major.

      • Following their exploration of patterns of SNBP evolution in Drosophila, the authors highlight support of their data for genetic conflict between sex chromosomes. They also rightly acknowledge that other evolutionary drivers such as sperm competition might also play a role in, for example, fast evolution of certain SNBPs. Yet those (not mutually exclusive) alternatives are never pitted directly against each other. The focus is firmly on exploring the support for the sex chromosome genetic conflict model. Given that the authors highlight Drosophila as a great model in part because of its well characterized sperm biology (including comparative morphology), I wondered why the authors had not made an explicit attempt to see if SNBP evolution covaries with aspects of sperm morphology across Drosophila.
      • The most intriguing part of the manuscript for me was the exploration of SNBP fate in the montium group, where the authors find evidence for an ancestral fusion event between the X and parts of the Y chromosome. The loss of SNBPs is certainly consistent with the conflict model but I was wondering to what extent this lineage is characterized more broadly by unusual evolution at the chromosomal level. Is there simply a lot of upheaval in montium, with more frequent gain/loss across the board? How specific is SNBP loss in the context of other orthologous groups? This could be investigated by looking at retention of other genes in other orthologous groups (in montium and some other control group) or perhaps by looking at synteny conservation.
      • In introducing SNBPs, the authors focus on their role as packaging agents. Clearly, SNBPs do package the genome in the sense that they bind to DNA and lead to reduced chromosome volume. But is this all packaging for packaging's sake (as portrayed by the sperm shape hypothesis)? Or is the situation a bit more nuanced, where condensation leads to a reduction of volume but also to a shutdown of transcription, protection from DNA damage, etc.? I think the focus on packaging alone is somewhat limiting when it comes to imagining how these proteins might act in the context of genomic conflicts. The authors may want to broaden their description of SNBPs in the Introduction accordingly.
      • The authors highlight that some SNBPs are expressed in mature sperm whereas others are transition proteins. The evidence for positive selection chiefly comes from the latter group (and "undefined" proteins that could also be transition proteins). Can the authors comment on whether this is expected/unexpected? Along the same lines, the authors highlight differences between Drosophila and mammals when it comes to the timing of transcription/translation during meiosis, suggesting that meiotic drive can happen in Drosophila because alleles are expressed early and can exert an effect after meiosis regardless of whether the associated locus is present in the gamete. I wonder how this relates, if at all, to the authors' finding that transition SNBPs are more likely to be part of conflicts (as indicated by positive selection signals) compared to SNBPs in mature sperm.
      • It is not entirely clear from the text (and also e.g. Table S4) how dN and dS (and subsequently dN/dS) were calculated. I presume as a single estimate across the whole phylogeny? If so, how heterogeneous is dN/dS across the phylogeny and can the authors identify specific branches on which selective regimes are different? A branch-level analysis should be better powered than the site-level analysis the authors present, which requires repeated selection on the same set of sites to get a strong enough signal. A branch-specific assessment of evolution would be particularly valuable when combined with the assessment of amplifications/losses.
      • The authors suggest that young SNBPs are more likely to encode essential, non-redundant male fertility functions (p7, third paragraph). I'm not sure whether this generalization is appropriate given the small sample. Tpl94D is as young as Mst77F/Prtl99C, tHMG and CG14835 homologs have been lost along different lineages and most of the events are in a single lineage leading up to D. kikkawai. Do the authors really feel that this generalization is warranted?
      • How do the sex-chromosomal amplifications differ in sequence from the ancestral autosomal copies? The authors suggest that the sex chromosomal copies might be involved in meiotic drive? Does the sequence offer a function as to how? (e.g. loss of charged residues/DNA-binding capacity?)
      • I think it would be nice to have a final table/figure summarizing the different lines of evidence for all the genes in Table 1 (i.e. positive selection yes/no, amplification in some lineages yes/no, sex chromosome translocations yes/no), for different lineages, including whether any of the HMG-box genes are unlikely to act as SNBPs.
      • The evidence the authors present is often consistent with genetic conflicts between sex chromosomes. Is it cogent? Arguably not (since no direct tests of the mechanism are provided). I would therefore suggest a more cautious title than one stating that conflicts drive expansion and loss of SNBPs.

      Typographical errors etc.:

      • P3. First paragraph: "One of the driving forces ... " I found this sentence a bit odd in terms of causality (changes in composition being portrayed as a force that leads to selection)
      • P3. Second paragraph: should be "HMG-box" rather than "HMB-box"
      • P3. Fourth paragraph "..., consistent with the observation in mammals". I think "consistent" should be reserved for two observations that speak to the same phenomenon. SNBPs could evolve with no evidence for positive selection in Drosophila and that wouldn't exactly be "inconsistent" with mammals. It would just be different.
      • P5. Fifth paragraph: should be "in the PAML package" rather than "in PAML package"
      • P9. Second paragraph: "... montium group (Fig 5A)...)" should be Fig 6A.

      Referees cross-commenting

      I have not much to add. The other reviews seem fair and well-informed from my somewhat-outside perspective. I don't know how tricky/time-consuming the suggested additional fly mating experiments are but want to note that, in general, I'm loath to "punish" authors of principally bioinformatic work for including some experiments. If experimental shortcomings can be addressed with appropriate caveats, that should be an option, as should removal of experimental data that - by the experts - would be considered too preliminary.

      It is my policy to sign my reviews.

      Tobias Warnecke

      Significance

      I'm not enough of an expert in the field of SNBPs to assess the level of advance provided by this study.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      The paper describes interesting patterns on the evolution of Drosophila SNBP genes, and proposes a very interesting explanation, namely, that meiotic drive is the main evolutionary force behind these patterns. Some of these observations have recently been made by other authors in a single case (the Dox genes in D. simulans), but not in the scale and breadth of the present ms. The ms combines an extensive investigation of available genomes with expert analysis, and new experimental data. In particular, the finding that the ancestral Y became incorporated into the X in montium species is very exciting, and may provide a smoking gun for the explanation proposed by the authors. Overall, I think it is a very good paper. I do have several criticisms and suggestions that may help to improve it.

      The paper has a speculative side that is almost unavoidable given its novelty and breadth. I do not see this as a problem per se, but I think the uncertain/unsupported/problematic points should be more openly presented to the readers. The main cases I noted are:

      1. The title of the ms states that "Genetic conflicts between sex chromosomes drive expansion and loss of sperm nuclear basic protein genes in Drosophila", but the evidence is somewhat circumstantial, and the patterns may be explained also by other known phenomena (e.g., demasculinization of the sex chromosomes; below). I think the tone of the end of the Introduction reflects more faithfully the strength of the evidence ("Thus, we conclude that rapid diversification of SNBP genes might be largely driven by genetic conflicts between sex chromosomes in Drosophila."). I understand the temptation of writing a bold title, but I think it is a bit misleading in the present case. I.e., it would be desirable that the title conveys the uncertainties of the data and their interpretation.
      2. "In contrast, we found no instances of pseudogenization or subsequent translocation to the X chromosome of SNBP genes that are still preserved on their original autosomal locations or involved in chromosome fusions between autosomes (0/16). This difference is highly significant (Fig 5 and Table S11; 3:5 versus 0:16, Fisher's exact test, P=0.03). " Readers should be warned that this pattern can also be explained by the well-known demasculinization of X chromosomes (e.g., Sturgill et al. Nature 2007, 450, 238-241)
      3. "Indeed, no meiotic drive has been documented in the montium species even though it is rampant in many other Drosophila lineages [38]." Two remarks here: a) the authors should make clear that they are referring to sex-chromosome meiotic drive. b) I think the evidence is much weaker than the sentence implies. Sex-chromosome meiotic drive is known in less than 20 Drosophila species, scattered throughout the phylogeny. As far as I know all cases were discovered by accident, so the sampling is biased towards model species (e.g., the obscura group, which was very popular around 1930-1960). So we do not know the true frequency of sex-ratio meiotic drive among Drosophila species, nor, say, if it is more common in the Drosophila or Sophophora species, if it is suspiciously absent in the montium group (as suggested by the authors), etc. I think these uncertainties should be acknowledged or, perhaps, given the weakness of the argument, the sentence should be deleted or attenuated.
      4. "X-Y chromosome fusions eliminate the extent of meiotic drive and may lead to the degeneration of otherwise conserved SNBP genes, whose functions as drive suppressors are no longer required. Thus, unlike in mammals, sex chromosome-associated meiotic drive appears to be the primary cause of SNBP evolutionary turnover in Drosophila species." The authors found that in the montium species the ancestral Y became incorporated into the X chromosome, and that montium species seem to have an inordinate amount of SNBP gene losses. They combine these two observations by suggesting that these SNBPs became dispensable or deleterious because they were originally involved in XY meiotic drive. I think many readers will think that males in montium species are X/0, whereas in fact all of them carry a Y chromosome (just, in most cases, more gene poor than "normal" Y-chromosomes). I do not think this is a fatal flaw for the explanation proposed by the authors, but certainly is a difficulty that should be acknowledged.
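
      For readers who want to check the Fisher's exact test quoted in point 2 (3:5 versus 0:16, P=0.03), the calculation can be sketched as below. This is a minimal illustration, not the authors' analysis code. It assumes that "3:5" means 3 genes with the event and 5 without among genes that moved to sex chromosomes, and that "0:16" means 0 with and 16 without among genes that stayed autosomal. The function name `fisher_exact_two_sided` is hypothetical, and the two-sided rule used (summing all tables no more probable than the observed one) is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell,
        # holding all row and column totals fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables that are no more likely than the
    # observed one (small tolerance guards against float rounding).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Assumed reading of "3:5 versus 0:16" (see the paragraph above).
p_value = fisher_exact_two_sided(3, 5, 0, 16)
print(f"P = {p_value:.3f}")  # P = 0.028
```

      Under these assumptions the result rounds to the reported P=0.03. The test only shows that the two groups differ; it does not by itself distinguish meiotic drive from demasculinization of the X chromosome as the cause.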

      Problems/suggestions with experiments and data analysis

      1. There is a section titled "CG30056 is universally retained in Drosophila but dispensable for male fertility in D. melanogaster". In this section and in the figures, it is stated, "Although CG30056 is the most conserved SNBP we surveyed, we found no clear difference in offspring number between heterozygous controls and homozygous knockout males (Fig 2B). (...) We found either no or weak evidence of fertility impairments in two different crosses with homozygous CG30056 knockout males.". I think the fertility data are weak for the purpose of the authors, and I strongly suspect that this conclusion is wrong. Let me explain why. At other passages of the ms, the authors classify the SNBP genes in three groups.
        • (i) essential/important for male fertility: "Three genes (Mst77F, Prtl99C and ddbt) are essential for male fertility while knockdown or knockout of two other SNBP genes (ProtA, and ProtB) leads to significant reduction in male fertility [27-30, 32]."
        • (ii) genes that do not appear to impair male fertility at all.
        • (iii) untested. CG30056 was in the last group, and hence the authors produced knockouts, tested their effect in male fertility, and concluded that it belongs to the second group. Now, look at Fig. 3B. The numbers of tested males are too small (it seems to range from 3 to 10), and male fertility is known to be a very noisy phenotype (as shown by the huge scatter in the authors' data). Furthermore, two different knockouts were tested, and both were nominally less fertile than the controls, and in one of them the difference is statistically significant. Taken at the face value, the knockouts seem to be perhaps ~25% less fertile than the controls. Another potentially big problem is that the "control males" actually carry visible dominant mutants (the balancers CyO or SM6) which certainly reduce their fitness, whereas the experimental males are wild-type for these mutants. Without the detrimental effect of these visible mutants in the controls, the difference to the CG30056 knockouts will probably be even larger. Note that the fertility effects of the genes ProtA, and ProtB (a.k.a. "Mst35B") , which the authors put in group "essential/important for male fertility" would not had been detected if assayed as the CG30056 gene: Tirmarche et al (2014; the reference cited by the authors) stated that: "In fact, the impact of Mst35B on male fertility was only revealed when mutant males were allowed to mate with a large excess of virgin females (1 for 10; Figure 3F) but not with a 1:1 sex ratio (not shown). " The authors' fertility test did not used this type of challenge. My general impression is that the fertility effects of CG30056 may actually be similar to ProtA and ProtB. I think the authors should do a proper fertility test of CG30056, or remove this section. 
Another possibly useful approach would be to classify the SNBP genes into those essential for male fertility and those that are not essential, because "experimentally speaking" this is a safer distinction (e.g., the fertility tests reported by other authors may also have been quick tests). Since these genes only function in sperm and are under purifying selection (otherwise they would have been lost; also, all have dN/dS < 1), they all most likely affect male fertility to some extent. In case the section on male fertility stays, it will be necessary to provide more details. How many males were crossed for each genotype? In some cases in Fig 2B, it seems as low as 3, but it may be data superposition in the graph. Please provide the raw data in the supplementary material.
      2. "Our phylogenomic analyses also highlighted one Drosophila clade- the montium group of species (including D. kikkawai)- which suffered a precipitous loss of at least five SNBP genes that are otherwise conserved in sister and outgroup species (Fig 3). (...) Given our hypothesis that autosomal SNBP genes might be linked to the suppression of meiotic drive (above), we speculated that the loss of these genes in the montium group of Drosophila species may have coincided with reduced genetic conflicts between sex chromosomes in this clade." The montium data is an important part of the paper. I think the authors should test the statistical significance of this pattern.

      Other points:

      1. "The five remaining SNBP genes (Mst33A, CG30056, CG31010, CG34269, and CG42355) remain cytologically uncharacterized [30]." I think it will be interesting if the authors look at other potentially useful resources: Vibranovski et al papers which looked at gene expression in mitotic, meiotic and post-meiotic cells (https://mnlab.uchicago.edu/sppress/index.php), and the papers by several labs on testis single-cell transcriptomic data (Witt et al 2021 PLOS Genetics. 17(8):e1009728 ; Nat Commun. 2021;12: 892). These may provide additional clues on the function of SNBP genes. There is also a recent report on sperm proteome (doi: https://doi.org/10.1101/2022.02.14.480191)
      2. "Our inability to detect homologs beyond the reported species does not appear to result from their rapid sequence evolution. Indeed, abSENSE analyses [45] support the finding that Prtl99C, Mst77F, Mst33A, Tpl94 and CG42355 were recently acquired in Sophophora within 40 MYA. For example, the probability of a true homolog being undetected for Prtl99C and Mst77F is 0.07 and 0.18 (using E-value=1), respectively (Table S1, Methods)." This should be complemented by synteny analysis.
      3. I found the following sentence unclear: "However, we could only ascribe a sex chromosomal linked location for species if no data was available from either BUSCO genes or females (only males and mixed-sex flies)."
      4. "Using the available assemblies with Illumina-based chromosome assignment, we surprisingly found that most ancestrally Y-linked genes are not linked to autosomes as was previously suggested [by Dupim et al 2018] (Fig 6A)." The new result of X-linkage is exciting, but the sentence is not exact: Dupim et al 2018 made clear that they could only separate X/A from Y-linkage. E.g., the legend of their Fig 3: "Phylogeny and gene content of the Y chromosome in the montium subgroup. "M" means amplification only in males (i.e., Y-linkage), whereas "MF" means amplification in both sexes (autosomal or X-linkage)."
      5. "The most parsimonious explanation for these findings is a single translocation of most of the Y chromosome to the X chromosome via a chromosome fusion in the ancestor of the montium group of species. Afterward, some of these genes relocated back to the Y chromosome in some species (Fig S6; Supplementary text)." Explanations for this pattern of "return to the Y" have been extensively discussed and tested in Dupim et al 2018 (see their section "Why genes seem to return to the Y chromosome after Y incorporations?"). The available evidence strongly suggests that it is not a case of relocation to the Y.
      6. Fig 6B suggests that the authors assembled the "translocated Y" in D. triauraria. However, no direct data or account for this assembly is provided. Please clarify.
      7. "Why would meiotic drive only influence Drosophila, but not mammalian, SNBP evolution? One important distinction may arise from the timing of SNBP transcription. In D. melanogaster, SNBP genes are transcribed before meiosis but translated after meiosis [29, 43, 57]. Thus, SNBP transcripts from a single allele, e.g., X-linked allele, are inherited and translated by all sperm, regardless of which chromosomes they carry. Consequently, they can act as meiotic drivers by causing chromatin dysfunction in sperm without the allele, e.g., Y-bearing sperm." During spermatogenesis Drosophila haploid cells actually are syncytial, which has interesting consequences for the evolution of male genes (Raices et al, Genome Res. 1115-1122, 2019). This may be relevant for the present paper.

      Significance

      see above

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Chang and Malik present a comprehensive evolutionary analysis of sperm nuclear basic proteins (SNBPs) in Drosophila. In addition, they provide a preliminary functional characterization of one such protein (CG30056) and describe a newly discovered X-Y chromosomal fusion in the Drosophila montium species group. All of these findings are interesting and important, but the headline from this study is the well-supported possibility that SNBPs, or at least a large fraction of them, function in suppressing X vs. Y chromosome meiotic drive. While this hypothesis is challenging to test experimentally, the authors provide strong correlational evidence that SNBPs are associated with drive by documenting these proteins' rapid evolution. This rapid evolution takes the form of sequence changes (as predicted by coevolution between drivers and suppressors of drive), gene amplification in cases when SNBPs move to sex chromosomes (consistent with the SNBP becoming a potential agent of drive for its new "home chromosome"), and gene loss in species with X-Y chromosome fusions (in which drive is not predicted to occur).

      Overall, this is an excellent, comprehensive study. The phylogenetic and genomic analyses are first-rate (and one of the first to make use of the new 101 Drosophila genomes); the logic is very well explained; conclusions are supported by multiple lines of evidence; the writing and figures are clear and accessible; and, the findings are fascinating. It's a good sign that it is easy to imagine several experiments one could do to follow up on this study, but I do not feel any are required in revision, as the manuscript is comprehensive as is. Thus, I have just a few minor points the authors may wish to consider in making revisions and a few suggestions for clarity/typos.

      1. I would be interested in whether the authors think that all SNBPs in a given Drosophila species function(ed) in meiotic drive, or whether some fraction may play other roles, such as sexual selection or chromatin compaction, which have been the traditional hypotheses for SNBP function. Relatedly, given the high turnover of SNBPs the authors observe and the fact that some melanogaster-essential SNBPs are younger genes, would they like to comment on whether the subsets of SNBPs involved in drive/suppression vs. chromatin packaging/sperm traits/Wolbachia defense are likely to differ across fly species?
      2. What do the authors make of the lower isoelectric points for a few of the SNBPs (e.g., CG31010 with pI = 4.77 in Table 1)? These proteins have identifiable HMG box domains, so is the pI driven lower by other parts of the protein sequence?
      3. For readers less familiar with the field, it may help to spell out (e.g., on p. 6) why the authors consider ProtA/B to be important for fertility. Some of the previous papers on these genes describe them as dispensable - though the present authors are correct that these previous studies do detect fertility defects of various magnitudes under some conditions.
      4. On p. 9, paragraph 2, the data showing that "six different SNBP genes underwent 11 independent degeneration events in the montium group" are shown in Fig. 6A, not 5A.
      5. The summary Table 2 is useful, but I wonder whether including relative levels of expression and dN/dS in addition to ordinal rankings might help clarify. For instance, if there were a drop off in mean expression level between the 5th and 6th most highly expressed SNBP, this wouldn't be evident from the table.
      6. In Fig. 3, I like the use of the clean CG31010 figure in panel A to illustrate the circular representations. In addition, though, it might be useful to show Prot's graph at this same, larger size, since it's the most complicated and will likely be most closely examined.
      7. In Fig. 4, the end of the legend says that the species tree is shown "on the right," but it's on the left in the figure.

      Referees cross-commenting

      • I agree with both Reviewers 2 and 3 that the title could be changed to be a bit more tentative. I'd had this thought as well.
      • I agree with Reviewer 2 that the fertility assay could be conducted with a larger sample size and a better control in order to be better compared with how the authors described other published fertility phenotypes for SNBPs. For the control, crossing the deletion line to y w (or w1118) and using the resulting heterozygotes (KO/+) would be better than using the mutation over the balancer chromosome (KO/CyO).
      • I agree with Reviewer 3's third bullet point about spending a bit more time on the different possible roles that SNBPs could play in spermatids. (This is a more eloquent version of my review point #1.)
      • I agree in principle with Reviewer 3's first bullet point about examining whether SNBP evolution correlates with changes in sperm morphology, but this feels like it could be a whole, fascinating study on its own, while this manuscript is already packed with data. I'd welcome the authors' thoughts about this in discussion, but wouldn't personally require a formal analysis of this to be added prior to publication.

      Significance

      This study describes an important conceptual advance in our understanding of the evolution and potential functions of sperm nuclear basic proteins (SNBPs) in Drosophila, which stands in interesting contrast to the functional roles of equivalent proteins in primates. It should be of broad interest to biologists studying spermatogenesis, meiotic drive, and genome evolution, both in and out of Drosophila.

      To contextualize the work, paternal DNA is typically compacted during spermatogenesis. This process involves the replacement of histones with other small, positively charged proteins in a sequential order, ending with protamines that bind DNA in mature sperm. In Drosophila, work over the last two decades (largely from the labs of R. Renkawitz-Pohl, B. Loppin and B. Wakimoto) has identified more than a dozen sperm nuclear basic proteins that localize to condensing/condensed spermatid nuclei. Two interesting observations have been that many of these proteins are dispensable for male fertility, and the proteins vary in their degree of evolutionary conservation. Recent work from Eric Lai's lab (J Vedanayagam et al. 2021, Nat Ecol Evol) showed that in D. simulans and sister species, at least one of these SNBP genes (Prot) underwent gene amplification and now acts in those species as a meiotic driver. This finding suggested the hypothesis, tested thoroughly in the present study, that the rapidly evolving SNBP gene family could be involved in causing or suppressing meiotic drive. Consistent with this idea, the authors here find that SNBP genes expand in copy number more frequently when they move from autosomes to sex chromosomes (consistent with the idea that they may cause or contribute to drive), and that otherwise well-conserved SNBP genes are lost in a group of species in which sex chromosome meiotic drive is not expected to occur. These findings are based on a thorough and well conducted phylogenomic and molecular evolutionary analysis of SNBPs across dozens of Drosophila species. Overall, this work generates exciting new hypotheses about the function of SNBPs and should be widely read both within and outside of the field.

      Keyword describing my field of expertise: Drosophila, molecular evolution, reproduction, genetics, genome evolution.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements

      We thank the reviewers from Review Commons for their thorough reviews of our manuscript entitled, “The role of Limch1 alternative splicing in skeletal muscle function.” We were delighted by the many supportive comments of all three reviewers calling our study a “definite advance in our understanding of developmentally-regulated splice isoform transitions that are disease relevant”, “good comprehensive study with convincing results, the design of the experiments is good, and the conclusions are solid”, and “The article is well written, and I favor the publication of this [article] with minor revisions.”

      The reviews include comments on the interest in identifying the mechanism of action of mLIMCH1 in skeletal muscle function such as “[The study] presents multiple new tools to study mLimch1 and identifies a possible role for mLIMCH1 in calcium regulation, but stops short of identifying the mechanism by which this regulation occurs.” While we agree that how the skeletal muscle-specific isoform of LIMCH1 affects calcium handling is of interest, we respectfully suggest that this manuscript describes previously unknown biology that will be of interest to investigators in different fields including muscle physiology, alternative splicing regulation, and skeletal muscle pathology in myotonic dystrophy. All experiments in this manuscript are performed in vivo using skeletal muscle tissues from animals lacking the isoform of Limch1 that is expressed only in skeletal muscle and is normally induced after birth. Comparisons were made to age-matched wild-type control animals, often litter mates. The results establish the functional significance of the LIMCH1 protein and particularly the muscle-specific isoform in skeletal muscle through extensive analysis of LIMCH1 localization and the impact of mLIMCH1 knockout on muscle strength, force generation, calcium handling and the disease relevance of this splicing transition in myotonic dystrophy type 1. Please review the comments of all three reviewers who were quite favorable to the significance of the work and overall favorable to its publication. Below, we clarify and describe additional data that have been, and will be, added to the manuscript to address all comments of the reviewers.

      2. Description of the planned revisions

      Reviewer 1

      “Page 6 - data not shown. The point of conservation is not essential to this story, but the authors should either include a table or panel with that data, or remove the data not shown statement. Given the putative relevance to DM1, it might be preferable to include data to support the developmental transition in human data.”

      We have removed the “data not shown” statement as suggested and we highlighted the importance of conservation of the induction of a skeletal muscle isoform of LIMCH1 after birth as a strong indication of functional importance for the isoform. We agree that data showing the conserved LIMCH1 splicing transition in human skeletal muscle development will support this point. We will include RT-PCR analysis of LIMCH1 splicing in fetal and adult human skeletal muscle RNA in Figure 6 to support the reversion of splicing to the fetal pattern observed in DM1. The results will complement the normal Limch1 splicing transition in mice (Figure 1) and the normal and aberrant fetal splicing patterns shown for unaffected and DM1 adult skeletal muscle, respectively (Figure 6).

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer 1

      “Figure 4 - The authors do a nice experiment to show the localization of Limch1 and raise an antibody to detect the muscle specific isoform. The data seem to show that the muscle-specific isoform localizes to the sarcolemma, and this staining is largely lost in the mutant mice. By contrast, one could infer that the cytoplasmic signal in the WT comes from the ubiquitous isoform (which accounts for 30-40% of the Limch1 expression). This is consistent with the validation in Fig. 2. However, the authors in the text claim this experiment reveals an increased distribution throughout the myofiber, or a more even signal distribution in the cytoplasm, and that the uLimch1 cannot recapitulate mLimch1 localization. Fig. 2 suggests that total levels of Limch1 are increased (as noted by the authors in the discussion). Given that the muscle-specific isoform localizes to the sarcolemma, and the ubiquitous isoform is presumably sarcoplasmic, it isn't clear to me that there is any change in localization per se. What the authors show is just that the signal at the sarcolemma is lost, and if one compares the intensity in the right-hand plots in Fig. 4B, they are comparable in the sarcoplasmic region. It seems likely there is more of the ubiquitous isoform, and what is seen here is just how that isoform localizes. The quantification the authors perform in D would likely show this strong difference in the localization of the muscle isoform. If the authors redo this quantification, exclude the signal at the sarcolemma and normalize to the average pixel intensity in the fiber, do they still see a difference? I am not convinced that the "clustering" of the signal of the ubiquitous, cytoplasmic isoform is in any way changed. Given the difference in the two proteins, I also would not expect that the ubiquitous isoform could compensate for loss of the muscle isoform, and would not expect it to "recapitulate" the muscle-isoform localization.”

      We agree that Figure 4 and the explanation in the text were not clear and we thank the reviewer for pointing this out. We have addressed this concern by modifying the figure as suggested by the reviewer and clarifying the description in the results section. The main point, which was recognized by the reviewer but needed clarification, is that the mLIMCH1 isoform preferentially localizes to the sarcolemma and the uLIMCH1 isoform is preferentially cytoplasmic. In the HOM Limch1 6exKO myofibers, the increased cytoplasmic signal is due to the increased level of uLIMCH1 as shown by the western blot in Figure 2. The reviewer is correct that there is not a “change in localization of isoforms per se”. We clarified this point to highlight the differential localization of the uLIMCH1 and mLIMCH1 isoforms within the sarcolemma vs. the sarcoplasm. The revision of the plot profile in Figure 4B and the analysis of the standard deviation of signal in Figure 4D demonstrate the stark difference in staining observed between the HOM Limch1 6exKO and WT myofibers when stained with a pan-LIMCH1 antibody. The signal intensity plot profile from sarcolemma to sarcolemma (Figure 4B) indicates that the uLIMCH1 isoform is not “mis-localized” upon mLIMCH1 knockout as we originally (mis)-stated. Upon mLIMCH1 knockout, there is increased uLIMCH1 expression compared to WT myofibers. Considering this in combination with the sarcolemma preference of mLIMCH1 (Figure 4E) and the significant loss of signal in the sarcolemma region in Limch1 6exKO myofibers, we conclude that in HOM Limch1 6exKO myofibers, uLIMCH1 is primarily localized throughout the sarcoplasm.

      Reviewer 1 (optional)

      “Experiments looking more closely at LIMCH1 co-localization with other proteins at the sarcolemma or the sufficiency of the muscle-specific region to localize would also be useful (for example, can the muscle-specific region localize GFP to the membrane in cells?).”

      We performed immunofluorescence microscopy of LIMCH1 with several skeletal muscle-relevant proteins but did not observe: (1) disruptions of normal structures in HOM Limch1 6exKO compared to WT myofibers or (2) colocalization that helped clarify any mechanistic role of mLIMCH1 or uLIMCH1. Therefore this data was not included in the original manuscript. In regard to the suggestion on the sufficiency of the muscle-specific region to localize to the sarcolemma region, we had previously generated a plasmid to express a fluorescent protein fused to the protein encoded by the six skeletal muscle-specific exons of LIMCH1 but it failed to localize to the sarcolemma. In collaboration with protein structural experts at Baylor College of Medicine, we analyzed the skeletal muscle-specific region of LIMCH1 and found it to be entirely disordered without known homologs. It appears that this region has no secondary structure but when expressed within the entire LIMCH1 protein which has conserved domains (calponin homology, LIM, coiled-coil regions) and upon protein binding, it is possible for the region to adopt a structure facilitating its binding in the sarcolemma region. Therefore we believe that regions common to both isoforms are required in combination with the muscle-specific region for preferential localization to the sarcolemma.

      Reviewer 1 (minor comments)

      “In the Figure 3 legend, the order of the descriptions for B-C and D-E is switched. The order of the panels matches the text, but the legend switches the description of the force-frequency curves (shown in B & C but labeled as D & E), with the description of the rate of relaxation and contraction plots (shown in D and E but labeled as B and C in the legend).”

      We fixed this error and thank the reviewer for pointing it out.

      “The scale in Figure 4, panel B between the top and bottom plots is not the same, so it is difficult to compare, particularly for the panels on the right. See comment above.”

      In addition to clarifying uLIMCH1’s localization upon mLIMCH1 knockout within the text, we added figure titles above the plot profile which will clarify the different plot profiles for the reader. In regard to the comment about the scale of the plot profile, we have addressed this by re-scaling the two plot profiles on the right in Figure 4B. These plot profiles now share the same scale, which is advantageous because this plot profile better emphasizes the stark difference in signal observed between the sarcolemma and sarcoplasm in WT myofibers that is lacking in HOM Limch1 6exKO myofibers.

      Reviewer 2

      “Figure 6A: There is a discrepancy between gene structures and splicing isoforms shown in Fig. 1 vs Fig. 6. There are differences in spacing between exons, and there appear to be six exons in the differentially regulated region in Fig 1, but seven exons in Fig 6. Perhaps this is a difference between human and mouse genes? Does the human gene actually regulate seven exons in this region, rather than six exons in the mouse? In both figures the gene is labeled as Limchi1, and both figures indicate that the ubiquitous isoform lacks exons 9-14. Please clarify.”

      The reviewer is correct that the human mLIMCH1 isoform contains seven exons that are skeletal muscle-specific compared with the six exons that are skeletal muscle-specific in the mouse. The seven human exons encode 544 amino acids with 65% homology with the mouse segment. We have clarified this in the figure legend and text. Exons 9-14 are shown in Figure 6B since this diagrams the mouse gene.

      “The methods section on RT-qPCR and RNA splicing presumably refers to analysis of mouse tissues. What is the origin of the human DM1 RNA-seq data?”

      We obtained adult human DM1-affected and non-affected skeletal muscle autopsy samples from colleagues and the NDRI and performed RNA-sequencing at Baylor College of Medicine. The RNA-seq has not yet been published, but we include the data for LIMCH1 to demonstrate the dramatic change in the alternative splicing pattern in DM1 skeletal muscle tissue. This has been clarified in the methods section.

      “Perhaps the word "activity" should be deleted in the following sentence: "The sole study investigating the function of LIMCH1 characterized it as an actin stress fiber associated protein that binds non-muscle myosin 2A (NM2A) activity to regulate focal adhesion formation."”

      We thank the reviewer for pointing this out and we have removed this word.

      Reviewer 3

      “The diminution of the muscle force production in Limch16exKO is not correlated with a change in morphology of the myofibers in H&E and picrosirius stainings (Fig S2). Did the authors look at other skeletal muscles, fiber type, size, or different time points? (The age of the mouse and the name of the skeletal muscle used for the histology could be included in the results sections or figure legend).”

      As suggested by Reviewer 3, we have included additional histological data in Supplementary Figure 2. In addition to the histology at 10-12 weeks of age, the new data includes histology of multiple skeletal muscle tissues (quadriceps, EDL, soleus) at one year of age. The histology of Limch1 6exKO tissue at different time points showed no morphological differences (centralized nuclei or fibrosis) consistent with no change in muscle weight which led us to emphasize the significant effect of mLIMCH1 knockout on skeletal muscle function in the absence of muscle loss or overt structural changes. In regard to fiber-type, we have included histology of both the EDL (fast-twitch) and soleus (slow-twitch) and even after one year, we observe no gross morphological differences. Additionally, we analyzed the force production of both the EDL and soleus (Figure 3) with the fiber-type predominance of these tissues in mind and found decreased force generation in both tissues. We included the types of skeletal muscle tissue analyzed and the age of the mice in Supplementary Figure 2 as per the reviewer’s suggestion.

      “The authors performed RNAseq analysis in the skeletal muscle of the KO mouse (Fig 2B). What is the result of this experiment? Is the KO muscle transcriptome different or similar to control muscles?”

      We conducted RNA-sequencing on tissue from HOM Limch1 6exKO and WT controls, and the results were disappointing, showing only minor differences that did not contribute to understanding the phenotype. We used this data only to show the loss of the six exons in Fig. 2B. However, we decided that RT-PCR analysis was the better assay since it shows not only that the exons are not included but also that exons 8 and 15 are spliced correctly, which is not apparent using the RNA-seq displayed on the genome browser.

      4. Description of analyses that authors prefer not to carry out

      Reviewer 1 (Both points listed as optional)

      “If the authors perform TEM, can they see defects in t-tubules or organization of the sarcoplasmic reticulum, that are not visible by light microscopy?”

      We considered conducting TEM to investigate sarcomeric, T-tubule, or sarcolemma changes in myofibers derived from HOM Limch1 6exKO mice, but we concluded that it would most likely be of limited use. We do not think that T-tubule structural changes will be observed via TEM primarily due to the challenges of finding significant changes compared to WT controls in which one can always find abnormal structures. In our experience and the experience of our collaborator (Dr. Rodney) the disruptions must be dramatic to distinguish from the noncanonical structures often observed. Thus, we do not plan on conducting TEM to identify defects in the T-tubules.

      “If the muscle-specific isoform is transfected or transduced into differentiated myotubes, how does this affect calcium dynamics in the culture system?”

      While an interesting idea, we do not plan on conducting this experiment for multiple reasons. One issue is that all of our data is derived from in vivo analysis or from isolated myofibers and our concern is that the relatively immature state of myotubes in culture will provide a poor comparison to isolated myofibers. Therefore, we believe that it will be difficult to add meaningful data to the calcium data presented in Figure 5 through this experiment. Additionally, we have observed mis-localization of the overexpressed uLIMCH1 and mLIMCH1 in C2C12 cells that we believe would add too many caveats for meaningful interpretation of the results, regardless of the effects on calcium dynamics.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      In this manuscript, the authors have generated a knockout mouse model of a skeletal muscle-specific splice variant isoform of Limch1. These KO mice present skeletal muscle force production and calcium handling defects. These results could explain why the deficiency in splicing in myotonic dystrophy type 1 can lead to skeletal muscle defects.

      Overall, this is a good comprehensive study with convincing results, the design of the experiments is good, and the conclusions are solid. The article is well written, and I favor the publication of this article with minor revisions.

      Issues that I think the authors should clarify:

      • The diminution of the muscle force production in Limch16exKO is not correlated with a change in morphology of the myofibers in H&E and picrosirius stainings (Fig S2). Did the authors look at other skeletal muscles, fiber type, size, or different time points? (The age of the mouse and the name of the skeletal muscle used for the histology could be included in the results sections or figure legend)
      • The authors performed RNAseq analysis in the skeletal muscle of the KO mouse (Fig 2B). What is the result of this experiment? Is the KO muscle transcriptome different or similar to control muscles?

      Significance

      In this manuscript, the authors have generated a knockout mouse model of a skeletal muscle-specific splice variant isoform of Limch1. These KO mice present skeletal muscle force production and calcium handling defects. These results could explain why the deficiency in splicing in myotonic dystrophy type 1 can lead to skeletal muscle defects.

      Overall, this is a good comprehensive study with convincing results, the design of the experiments is good, and the conclusions are solid. The article is well written, and I favor the publication of this article in EMBO journal with minor revisions.

      Issues that I think the authors should clarify:

      • The diminution of the muscle force production in Limch16exKO is not correlated with a change in morphology of the myofibers in H&E and picrosirius stainings (Fig S2). Did the authors look at other skeletal muscles, fiber type, size, or different time points? (The age of the mouse and the name of the skeletal muscle used for the histology could be included in the results sections or figure legend)
      • The authors performed RNAseq analysis in the skeletal muscle of the KO mouse (Fig 2B). What is the result of this experiment? Is the KO muscle transcriptome different or similar to control muscles?
    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      This manuscript continues the Cooper lab's analysis of the role of alternative splicing in muscle development and function. Here they report an intriguing alternative splicing difference between fetal and adult tissues involving 6 consecutive exons in the LIMCH1 gene that are included predominantly in adult muscle to encode a longer isoform of the protein. Moreover, by CRISPR/Cas9-mediated deletion of these exons in mouse models, they show that muscle deficient in the longer LIMCH1 protein isoform exhibits grip strength weakness in vivo and decreased force generation ex vivo. The mechanistic details remain to be investigated, but evidence so far suggests an intrinsic defect in muscle contraction, perhaps related to aberrant calcium handling, without obvious histopathology or muscle loss. Finally, these new findings may have important implications for human patients with myotonic dystrophy type 1, who typically exhibit defects in MBNL-regulated splicing events, because the authors show (1) that patient muscle poorly expresses the muscle isoform of LIMCH1, due to inappropriate skipping of the exons, and (2) that mice with knockout of MBNL proteins also predominantly skip these exons.

      Major comments

      1. The major conclusions of the manuscript are clear and convincing: a muscle-specific cluster of 6 exons in the LIMCH1 gene has its splicing regulated directly or indirectly by MBNL splicing factor(s); loss of these exons compromises muscle strength; and these exons are poorly spliced in muscle of myotonic dystrophy patients. The data for these conclusions are strong.
      2. The authors do consider alternative explanations where appropriate. For example, they speculate in the discussion that muscle defects could be due not only to loss of the muscle-specific isoform, but possibly also due to the corresponding increase in expression of the non-muscle-specific isoform.
      3. Figure 6A: There is a discrepancy between the gene structures and splicing isoforms shown in Fig. 1 vs Fig. 6. There are differences in spacing between exons, and there appear to be six exons in the differentially regulated region in Fig 1, but seven exons in Fig 6. Perhaps this is a difference between human and mouse genes? Does the human gene actually regulate seven exons in this region, rather than the six exons in the mouse? In both figures the gene is labeled as Limch1, and both figures indicate that the ubiquitous isoform lacks exons 9-14. Please clarify.

      Minor comments

      1. The methods section on RT-qPCR and RNA splicing presumably refers to analysis of mouse tissues. What is the origin of the human DM1 RNA-seq data?
      2. p. 4: Perhaps the word "activity" should be deleted in the following sentence: "The sole study investigating the function of LIMCH1 characterized it as an actin stress fiber associated protein that binds non-muscle myosin 2A (NM2A) activity to regulate focal adhesion formation."
      3. Other than the issue raised above regarding LIMCH1 gene structure, the figures are clearly presented.

      Significance

      The results in this study could have important implications both regarding muscle function and regulation of alternative splicing. The demonstration of a muscle-specific isoform of LIMCH1 is a novel finding that suggests previously unknown functions of the protein in muscle contraction. This raises intriguing questions as to how this alternative domain impacts muscle function through cooperation with other domains previously predicted (or shown) to interact with actin and non-muscle myosin. Regarding splicing, co-regulation of exon clusters is a poorly understood phenomenon that could be the subject of future interesting studies. Both issues could be relevant to understanding defects in human patients with myotonic dystrophy type I.

      The work would be of interest to scientists studying muscle function as well as those studying alternative splicing. Both groups would probably be intrigued by these results but might consider them relatively preliminary and in need of more mechanistic detail in the future.

      Expertise: I have extensive experience in analysis of alternative splicing regulation. My knowledge of specific techniques to evaluate muscle function is more limited. Although the experiments on muscle function seem clear and convincing to me, I admit that I am not an expert on those methods and could have missed an important point.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary

      In their paper "The role of Limch1 alternative splicing in skeletal muscle function," Penna and colleagues report a muscle-specific isoform of Limch1 and investigate its function in skeletal muscle. They show that a muscle-specific isoform of Limch1 is expressed preferentially in mature muscle, and demonstrate that animals mutant for this isoform have reduced grip strength and force generation. Notably, although muscle structure and T-tubules are structurally not affected, mutant muscle shows evidence of disrupted calcium handling. Limch1 is also misspliced in DM1 and Mbnl1/2 double mutant mice, suggesting the muscle isoform is disease relevant and regulated by MBNL.

      Major comments

      Page 6 - data not shown. The point of conservation is not essential to this story, but the authors should either include a table or panel with that data, or remove the data not shown statement. Given the putative relevance to DM1, it might be preferable to include data to support the developmental transition in human data.

      Figure 4 - The authors do a nice experiment to show the localization of Limch1, and raise an antibody to detect the muscle specific isoform. The data seem to show that the muscle-specific isoform localizes to the sarcolemma, and this staining is largely lost in the mutant mice. By contrast, one could infer that the cytoplasmic signal in the WT comes from the ubiquitous isoform (which accounts for 30-40% of the Limch1 expression). This is consistent with the validation in Fig. 2. However, the authors in the text claim this experiment reveals an increased distribution throughout the myofiber, or a more even signal distribution in the cytoplasm, and that the uLimch1 cannot recapitulate mLimch1 localization. Fig. 2 suggests that total levels of Limch1 are increased (as noted by the authors in the discussion). Given that the muscle specific isoform localizes to the sarcolemma, and the ubiquitous isoform is presumably sarcoplasmic, it isn't clear to me that there is any change in localization per se. What the authors show is just that the signal at the sarcolemma is lost, and if one compares the intensity in the right-hand plots in Fig. 4B, they are comparable in the sarcoplasmic region. It seems likely there is more of the ubiquitous isoform, and what is seen here is just how that isoform localizes. The quantification the authors perform in D would likely show this strong difference in the localization of the muscle isoform. If the authors redo this quantification, exclude the signal at the sarcolemma and normalize to the average pixel intensity in the fiber, do they still see a difference? I am not convinced that the "clustering" of the signal of the ubiquitous, cytoplasmic isoform is in any way changed. Given the difference in the two proteins, I also would not expect that the ubiquitous isoform could compensate for loss of the muscle isoform, and would not expect it to "recapitulate" the muscle-isoform localization.

      OPTIONAL: It would be interesting to examine how loss of the muscle-specific Limch1 isoform results in disrupted calcium handling. This is the mechanism that is not addressed in the paper, as the authors note in the discussion. If the authors perform TEM, can they see defects in t-tubules or organization of the sarcoplasmic reticulum, that are not visible by light microscopy? Experiments looking more closely at LIMCH1 co-localization with other proteins at the sarcolemma or the sufficiency of the muscle-specific region to localize would also be useful (for example, can the muscle-specific region localize GFP to the membrane in cells?). If the muscle-specific isoform is transfected or transduced into differentiated myotubes, how does this affect calcium dynamics in the culture system? As the authors note in the discussion, identification of mLimch1 versus uLimch1 interactors would be particularly interesting, and provide insight into how this protein can affect calcium handling without impacting structure.

      Minor comments

      • a. In the Figure 3 legend, the order of the descriptions for B-C and D-E is switched. The order of the panels matches the text, but the legend switches the description of the force-frequency curves (shown in B & C but labeled as D & E) with the description of the rate of relaxation and contraction plots (shown in D and E but labeled as B and C in the legend).
      • b. The scale in Figure 4, panel B between the top and bottom plots is not the same, so it is difficult to compare, particularly for the panels on the right. See comment above.

      Significance

      This is a well-written study identifying the function of a muscle-specific isoform of LIMCH1, as well as implicating a switch in Limch1 isoform expression in DM1 models as a target of MBNL regulation. It presents multiple new tools to study mLimch1, and identifies a possible role for mLIMCH1 in calcium regulation, but stops short of identifying the mechanism by which this regulation occurs. The study is a definite advance in our understanding of developmentally-regulated splice isoform transitions that are disease relevant. The work would be of interest to scientists with specialized interests in muscle development and isoform-specific function in myogenesis, as well as more broadly of interest to clinical scientists for the possible connection to DM1.

      I am an expert in RNA regulation and muscle development.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.


      Reply to the reviewers

      Manuscript number: #RC-2022-01697

      Corresponding author(s): William Roman; Edgar R. Gomes

      1. General Statements [optional]

      We would like to thank the reviewers for their careful evaluation of our study. The goal of this work is to demonstrate that fiber type composition can be altered with exercise of in vitro muscle cultures. These findings provide an additional strategy to better mimic muscle in vitro for biological investigation and disease modelling. The reviewers’ comments will strengthen the conclusions of our study.

      In this point-by-point answer, we also include a statement on the feasibility of each comment based on preliminary work we have performed since receiving the reviews. We expect experiments can be achieved within 2 – 3 months.

      2. Description of the planned revisions

      *Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The manuscript by Henning et al describes a method to induce myofiber subtype specification in vitro based on optogenetics and particle image velocimetry. The work is well performed and the manuscript is clear. The findings might be useful to the muscle community, but there are some issues which should be addressed in order to improve the quality and impact of the manuscript.

      My main concern is that the whole work is performed in murine cells. Although I appreciate that the authors have used primary myoblasts rather than cell lines, I also think that the key advantage of such in vitro platforms is the possibility to "humanise" the experiments as much as possible. In this context, the key findings of this work should be reproduced using human myoblasts. This will significantly enhance the relevance of the work. *

      Point 1.1) We thank the reviewer for this suggestion and have already performed pilot experiments to “humanize” the system. We infected hiPSC-derived myotubes (van der Wal et al., 2018) and human immortalized myotubes (Mamchaoui et al., 2011) with AAV9-pACAGW-ChR2-Venus-AAV. After infection, human immortalized myotubes did not express ChR2, which precluded optogenetic training of these cultures. For hiPSC-derived myotubes, the infection rate was very low and insufficient to perform a bulk analysis of the effect of long-term intermittent light stimulation. Moreover, the contractile behavior of hiPSC-derived myotubes expressing ChR2 differed significantly from that of primary mouse myotubes: they underwent a single, slow contraction rather than the cyclic contractions observed in mouse myotubes. This suggests that the maturation of the contractile apparatus of 2D hiPSC-derived myotubes is insufficient to perform consistent in vitro training studies.

      As such, we agree with the reviewer that reproducing our key findings with human cells would improve the relevance of this work. However, due to the experimental limitations described above, significant improvements in human myotube maturation in vitro are required to perform such experiments. We will attempt to increase infection efficiency by using another AAV serotype in hiPSC-derived myotubes, but this has a low probability of solving all the technical limitations. Our work is a proof of principle that fiber type composition can be influenced in vitro through contraction stimulation. We expect these findings to be translated to human cultures once the field has established the protocols needed to push human myotube maturation.

      Feasibility: run additional tests but probability of success is low due to technical limitations.

      *Other issues: *

      1) From a methodological perspective, I think some clarifications are needed on the western blots shown in Fig 4K-L, as the pattern of Myh3 and Myh8 in both panels appear very similar. This could easily be ruled out by providing raw data/images. Please accept my apologies if this is simply caused by similar migration patterns in the gels (worth checking).

      Point 1.2) The very similar appearance of both patterns is due to the identical molecular weight (220 kDa) of the distinct Myh isoforms. After an initial staining of western blot membranes, primary and secondary antibodies were stripped off and the membrane was subsequently re-probed using a different primary antibody and a secondary antibody. We incubated stripped membranes with secondary antibodies only and observed no signal, confirming that the stripping was efficient. We have updated the representative images of the western blot membranes in Figure 4 and included the α-actinin loading controls to which the bands are normalized to account for sarcomerogenesis (Figure 4 K-M).

      Feasibility: Accomplished

      *2) Figure 3K-L (BTX): better imaging should be performed to assess morphology of NMJ (eg. pretzel-shaped as in mature/adult NMJ?) *

      Point 1.3) We agree with the point raised by the reviewer. However, a morphological assessment of the NMJ is difficult in this in vitro system due to our inability to generate mature muscle end plates as seen in in vivo adult NMJs. We will nevertheless perform a more quantitative evaluation of BTX stainings imaged with high spatial resolution by measuring the size and shape of the AChR clusters. The technical pipeline to do this quantitative approach is already established.

      Feasibility: will be accomplished

      *3) Figure 3 N-P: Why did the authors use a relatively complex technique such as smFISH to answer a question more simply addressable with more conventional (and perhaps less operator-dependent) techniques such as quantitative PCR?*

      Point 1.4) We agree with the reviewer that the more conventional qPCR technique would highlight similar results to the smFISH quantifications. Due to the heterogeneity of our primary myotube cultures (presence of non-muscle cell types and varying degrees of muscle cell maturation), we opted to monitor AChR expression by conserving a spatial dimension. This allows us to observe ChrnE and ChrnG expression in mature muscle cells selected to perform the contraction analysis. Nevertheless, performing a bulk RNA expression analysis would be informative to show a significant increase in AChR expression across the culture. This point will be fully addressed by qPCR assays of ChrnE and ChrnG.

      Feasibility: will be accomplished

      *Reviewer #1 (Significance (Required)):

      Nature and significance: as mentioned in the previous section, the work can be very significant if expanded to human myoblasts/myotubes, which can have different slow/fast myosin expression pattern. The work is clearly methodological/descriptive, so showing an application of this technique using diseased/mutant cells may increase its relevance even more (but I do not believe it is a key barrier to publication). *

      We thank the reviewer for his comments as the “other issues” raised will significantly improve the manuscript and will all be tackled. With regards to using human myotubes, we will attempt a few more strategies to translate our findings to human cultures, but our preliminary data suggests that many technical barriers need to be overcome to perform such experiments. Nevertheless, it is our opinion that the main contribution of this manuscript is to show that fiber switching can be achieved in vitro and that this will be routinely used in the next generation of human in vitro muscle systems.

      *Comparison with other methods: Similar methods have been published but not with this level of resolution.

      Expertise: muscle disease and regeneration, in vitro and in vivo models.*

      *Reviewer #2 (Evidence, reproducibility and clarity (Required)): *

      * The work presented shows that muscle stem cells isolated from 5-day-old mice can be transduced with a DNA coding for a Channelrhodopsin2-Venus which will allow the muscle cell to be excited by a light beam (475nm) and to induce the contraction of myotubes. The authors measure the speed of contraction, relaxation and fatigability of such cells as a function of a more or less long excitation time. In particular, they show that myotubes in culture, excited at a frequency of 5 Hz, 8 hours per day for 7 days are larger than unstimulated myotubes and are more resistant to fatigue. Surprisingly, they show that myotubes stimulated at the low frequency of 5Hz express the neonatal Myosin heavy chain more than the slow Myh whose expression is known in adult muscle to be specifically strong in muscle fibers stimulated at low frequency. As the authors do not apply a high stimulation frequency (100Hz) to their culture, it is difficult to conclude whether the stimulation frequency applied in the study induces a specific phenotypic specialization of the myofiber, or a more general role. In this respect, the size of the myotubes obtained after training seems to be increased, showing a hypertrophic effect on the cultured myotubes. This study does not allow us to conclude, beyond the expression of the Myh8 gene, on the “gain” of the fast-twitch specialization of the myofiber by repeated stimulation over several days. A complementary study would certainly provide elements to better understand the role of muscle fiber stimulation, apart from the trophic contribution provided in vivo by the motoneuron. If the study is well conducted, some points are nevertheless important to address before publication.*

      *Reviewer #2 (Significance (Required)): *

      * - Figures 4F/G are difficult to understand: the Myh7 signal seems much higher in trained myonuclei (F), but the histogram shows the opposite (G).*

      Point 2.1) We apologize for the confusion. The apparently higher Myh7 signal in trained cells in Figure 4F is due to background noise in the image. When mRNA is expressed, the smFISH probes are visible as small round dots. For clarity, we updated the representative images for the smFISH probes and highlighted the smFISH dots with arrows. We also adapted the y-axis of each graph to better represent the analysis of mRNA counts per myonucleus.

      Feasibility: Accomplished

      *- Figures 4L, the western blot shows the same increase in Myh3 and Myh8 at day 4, while the graph shows an increase at d4 only in Myh8, why? *

      Point 2.2) We have chosen another western blot to better reflect the quantification. It is important to note that we have normalized the band intensity to α-actinin instead of a housekeeping gene to account for changes in sarcomerogenesis over the lifetime of the cultures. As such, although we observe an increase in Myh3 intensity, it is counterbalanced by an increase in α-actinin expression. We have now added the α-actinin bands.
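
      The normalization argument in Point 2.2 can be illustrated with a minimal sketch. The band intensities below are invented for illustration only and are not data from the study; the function name `normalize_to_loading_control` is likewise hypothetical. The sketch shows how a raw Myh3 signal that doubles can yield an unchanged normalized value when the α-actinin signal doubles as well.

```python
# Illustrative sketch only: all intensities are made-up numbers,
# not measurements from the manuscript.

def normalize_to_loading_control(target, control):
    """Divide each target band intensity by the loading-control band of the same lane."""
    if len(target) != len(control):
        raise ValueError("target and control need one value per lane")
    return [t / c for t, c in zip(target, control)]

# Hypothetical densitometry readings (arbitrary units) for two lanes.
myh3_raw = [100.0, 200.0]      # raw Myh3 signal doubles between lanes
alpha_actinin = [50.0, 100.0]  # alpha-actinin also doubles as sarcomeres assemble

myh3_normalized = normalize_to_loading_control(myh3_raw, alpha_actinin)
print(myh3_normalized)  # [2.0, 2.0]: no change after normalization
```

      Under these assumed values, the visible increase in the raw Myh3 band disappears once it is expressed relative to α-actinin, which is the effect described above.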

      - For immunocytochemistry against fMyh (Fig4 H, I) as well as for Western blots (Fig 4M, N), the authors have to provide arguments regarding the specificity of the antibodies used: some fMyh-specific antibodies recognize, Myh 3, 8, 1, 2, and 4, some only Myh 8, 1, 2, and 4, so it is quite difficult to conclude on the experiments using sc-32732 antibodies, (clone F59) which Myh are actually recognized in Western blot or immunocytochemistry.

      Point 2.3) According to the manufacturer, the sc-32732 antibody is specific for fast Myh (Myh1, 2, 4 and 6). Nevertheless, we will ensure the specificity of the sc-32732 antibody against fast Myosins by staining neonatal and adult TA/EDL muscle sections with anti-Myh3 (embryonic), anti-Myh8 (neonatal) and anti-fMyh antibodies.

      Feasibility: will be accomplished

      While 10 Hz stimulation is known in vivo to increase the slow program and Myh7 expression in adult muscles, the authors show that this is not the case ex vivo with primary myotubes: Myh7 protein level is not upregulated in the 7-day stimulation paradigm, whereas Myh8 expression is upregulated. I think it would be important to quantify the mRNA of each of the Myh genes to be sure that there is no problem with the antibodies, which could recognize several Myh proteins, in the absence of a resolving acrylamide gel that would allow visualization of each isoform and its relative level according to its migration. Nevertheless, this is an interesting observation that could be related to the early phases of muscle contraction in vivo. Indeed, it has been shown in rats that animals in early postnatal development are essentially sedentary and that their muscles (Sol and EDL) are stimulated by short intermittent bursts similar to 10 Hz (doi: 10.1111/j.0953-816X.2004.03418.x) during the first 2-3 weeks of life. This should be compatible with Myh8 expression. In this context, it would be relevant to verify that the paradigm presented leads to myotubes with a "neonatal" phenotype. Quantification of the expression level of genes specifically expressed during the neonatal period, compared with those expressed in adult slow or fast myofibers, would enhance the conclusions drawn by the authors.

      Point 2.4) The reviewer raises an important technical limitation of observing Myh proteins to identify fiber types due to the cross-reactivity of antibodies. Despite our best efforts to select the appropriate antibodies, we agree that investigating mRNA expression of individual Myh isoforms would strengthen the conclusion of our study. We will design specific primers and perform qPCR for distinct Myh isoforms on untrained and trained cultures.

      With regards to the “neonatal” phenotype of these in vitro cultures, this does indeed seem to be the case as the cultures transition from embryonic and neonatal myosins to adult myosins during the lifetime of the cultures.

      Feasibility: will be accomplished

      *Should we also be cautious about bulk analysis since, as shown in Figure S1, not all myotubes express ChR2? *

      Point 2.5) Although 10% of myotubes do not express ChR2, we believe that an infection rate of 90% is sufficient for bulk analysis. We nevertheless combine bulk analysis in our study with single-cell assays such as smFISH and immunofluorescence, which are in line with the bulk analyses.

      Feasibility: Accomplished
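
      As a rough illustration of why a 10% non-expressing fraction has a limited effect on bulk readouts, the sketch below computes the culture-wide fold change expected when only a fraction of cells responds. It assumes, purely for illustration, that every myotube contributes equally to the bulk signal and that non-expressing myotubes stay at baseline; the function name and the twofold example value are hypothetical and not taken from the study.

```python
# Illustrative sketch only: assumes equal contribution of every myotube to the
# bulk signal and no change at all in myotubes that do not express ChR2.

def bulk_fold_change(responder_fraction, fold_in_responders):
    """Expected culture-wide fold change when only some cells respond."""
    if not 0.0 <= responder_fraction <= 1.0:
        raise ValueError("responder_fraction must lie between 0 and 1")
    unchanged_fraction = 1.0 - responder_fraction
    return responder_fraction * fold_in_responders + unchanged_fraction * 1.0

# Hypothetical example: 90% of myotubes respond with a twofold increase.
observed = bulk_fold_change(0.9, 2.0)
print(round(observed, 2))
```

      Under these assumptions, a twofold change in responding cells is diluted only slightly at the level of the whole culture.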

      May the authors correlate the ex vivo neonatal phenotype observed with the neonatal muscles they used to prepare myogenic stem cells?

      Point 2.6) We understand from this that the reviewer would like us to check the expression of distinct Myh isoforms in our in vitro system and compare it to neonatal muscle. We will perform Myh staining of muscle sections from 6-day-old mouse pups (the time of myogenic stem cell isolation) and compare the expression of myosin heavy chains with what we observe in our in vitro cultures.

      Feasibility: will be accomplished

      Overall, we will address all the points of the reviewer. Those ensuring the specificity of antibodies used are particularly relevant. With regards to the comparison between our in vitro cultures with neonatal muscle, we believe this will help contextualize our findings with the literature.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      *Summary: *

      *In this work, the authors propose an in vitro model describing a strategy to alter fiber type composition of myotubes with a long-term, intermittent mechanical training. The authors present a model of myotubes transfected with an adenovirus, which makes them photosensitive; in this way, fibers contraction can be induced upon stimulation with blue LEDs. *

      *Even though ChR2-expressing myotubes have previously been used by other groups (Asano T, Ishizuka T, Yawo H. Optically controlled contraction of photosensitive skeletal muscle cells. Biotechnol Bioeng. 2012 Jan;109(1):199-204), no one has ever used them in the way proposed by the authors. For this reason, this work opens new perspectives on the possible use of this in vitro muscle system for clinical and therapeutic purposes. *

      *Major comments: *

      *I believe that the authors have presented their results, conclusion and methods in a fair and clear way, so that the experiment could also be reproduced. *

      *However, I think there are some adjustments that could be done in order to improve and strengthen the quality of this work: *

      *- The authors have analysed the expression of different myosin heavy chain isoforms, both regarding the slow and fast twitch fibers. Though, I think it would be interesting to investigate also the expression of Myh4, which is mainly expressed in type IIB fast twitch fibers; *

      Point 3.1) We agree with the reviewer’s comment. We will add the analysis of Myh4 (western blots and qPCR) to our manuscript.

      Feasibility: will be accomplished

      The authors have observed a switch in the fiber type upon prolonged intermittent stimulation with blue LEDs, which translates into a higher number of type II fibers. It is known that exercise helps rescuing the loss of type II fibers, which is typical of age-related physiological processes, such as sarcopenia (Brunner F, Schmid A, Sheikhzadeh A, Nordin M, Yoon J, Frankel V. Effects of aging on Type II muscle fibers: a systematic review of the literature. J Aging Phys Act. 2007 Jul;15(3):336-48). However, I believe that providing a deeper analysis of the metabolism of the type II fibers (i.e. oxidative or glycolytic) could be helpful in order to have a clearer view on the specific subset of fibers that are generated with the given experimental conditions;

      Point 3.2) We agree with the reviewer's suggestion that an additional metabolic analysis would strengthen our observation. We propose to perform lactate measurements in cell lysate and supernatant to monitor a switch from oxidative to glycolytic metabolism. Specific inhibitors of the glycolytic pathway (2-DG, UK5099, rotenone and antimycin A) will be used as a control to prevent trained cells from shifting towards a fast fiber type.

      Alternatively, we will assess the protein expression levels of key metabolic proteins involved in oxidative phosphorylation and in pyruvate and lactate production (e.g. OxPhos, …). All these techniques are routinely performed in an adjacent laboratory and we foresee no technical limitations.

      Feasibility: will be accomplished

      *Minor comments: *

      *The text and the figures are clear and well written, and help to explain better the experimental setup and procedures. Still, I would suggest some minor adjustments: *

      - I would suggest providing more information on the pH used for the experiments, since it plays a pivotal role in regulating myosin ATPase activity and, thus, muscular contractility. This would improve the replicability of your experiment.

      We thank the reviewer for this comment. We will add information on the pH to the Materials and Methods section.

      Feasibility: will be accomplished

      The caption of Figure 1 is missing a description of panel E, even if it has been addressed in the text.

      Point 3.3.) We apologize for this mistake. We added the missing description of Fig. 1E.

      Feasibility: Accomplished

      *Reviewer #3 (Significance (Required)): *

      *This model opens new perspectives on in vitro muscle systems for the study of pathologies. The authors have been able to show that myofiber contraction can induce a shift towards type II fibers, reproducing in vitro what is also known in vivo. For this reason, I believe that this model could be useful for further clinical approaches. It is important, though, to keep in mind that muscular disorders are not all characterized by a loss of type II fibers; for instance, myotonic dystrophies type 1 and type 2 exhibit similar phenotypes, even if different types of muscle fibers are affected. *

      *For this reason, it would be interesting to investigate the versatility of this model in terms of giving rise to different fiber types. *

      Point 3.4.) We added a sentence to the introduction that highlights an example of a muscle disorder in which slow muscle fibers are predominantly affected. Concerning the versatility of the model, we will add a paragraph to the discussion elaborating on how different stimulus frequencies and durations could influence the specialization of fiber types.

      Feasibility: Accomplished

      Overall, we will address all major and minor comments from the reviewer. We have identified the experiments required for the metabolic analysis and agree that it will bolster our findings.

      Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.

      We have already carried out the following changes in the manuscript, which were proposed by the reviewers:

      Point 1.2: pattern of Myh3 and Myh8 in both panels appear very similar – We updated the representative images of Myh3 and Myh8 in Figure 4K-N and added the loading controls for the Myh8 and fMyh images to Figure 4K-N and to Supplementary Figure 4A, B.

      Point 2.1: Figures 4F/G: representative images of Myh7 smFISH probe and the graph showing opposite trends – We have updated the representative images of Figure 4F and we have changed the x-axis of the graph in Figure 4E and G.

      Point 2.5: caution around bulk analysis – Given the high percentage of cells contracting in response to blue light (~90%), we consider that this concern is not warranted.

      Point 3.3: caption of Figure 1 is missing a description of panel E – We have added the missing description to the manuscript (Figure 1E).

      Point 3.4: muscular disorders are not all characterized by a loss of type II fibers – We have added an example of a muscle disorder in which slow fibers are predominantly affected to the introduction (lines 42-44) of the manuscript.

      investigate the versatility of this model in terms of giving rise to different fiber types – We added a paragraph to the discussion elaborating on how different stimulus frequencies can lead to different fiber types (lines 264-275).

      3. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

      Point 1.1: Reproducing our key findings with human cells – We ran pilot experiments on immortalized human cell lines and human iPSC-derived myotubes but were not able to mature these cells sufficiently or to infect them to allow long-term in vitro training. Increased maturation of myotubes derived from hiPSCs is an endeavor currently undertaken by many laboratories. Although we will attempt a few more trials, we believe the technical limitations are too substantial to address this point.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      In this work, the authors propose an in vitro model describing a strategy to alter the fiber type composition of myotubes with long-term, intermittent mechanical training. The authors present a model of myotubes transduced with an adenovirus, which makes them photosensitive; in this way, fiber contraction can be induced upon stimulation with blue LEDs. Even though ChR2-expressing myotubes have previously been used by other groups (Asano T, Ishizuka T, Yawo H. Optically controlled contraction of photosensitive skeletal muscle cells. Biotechnol Bioeng. 2012 Jan;109(1):199-204), no one has ever used them in the way proposed by the authors. For this reason, this work opens new perspectives on the possible use of this in vitro muscle system for clinical and therapeutic purposes.

      Major comments:

      I believe that the authors have presented their results, conclusion and methods in a fair and clear way, so that the experiment could also be reproduced.

      However, I think there are some adjustments that could be made in order to improve and strengthen the quality of this work:

      - The authors have analysed the expression of different myosin heavy chain isoforms, both regarding the slow and fast twitch fibers. However, I think it would be interesting to also investigate the expression of Myh4, which is mainly expressed in type IIB fast twitch fibers;
      - The authors have observed a switch in the fiber type upon prolonged intermittent stimulation with blue LEDs, which translates into a higher number of type II fibers. It is known that exercise helps rescue the loss of type II fibers, which is typical of age-related physiological processes, such as sarcopenia (Brunner F, Schmid A, Sheikhzadeh A, Nordin M, Yoon J, Frankel V. Effects of aging on Type II muscle fibers: a systematic review of the literature. J Aging Phys Act. 2007 Jul;15(3):336-48). However, I believe that providing a deeper analysis of the metabolism of the type II fibers (i.e. oxidative or glycolytic) could be helpful in order to have a clearer view of the specific subset of fibers that are generated with the given experimental conditions.

      Minor comments:

      The text and the figures are clear and well written, and help to better explain the experimental setup and procedures. Still, I would suggest some minor adjustments:

      - I would suggest providing more information on the pH used for the experiments, since it plays a pivotal role in regulating myosin ATPase activity and, thus, muscular contractility. This would improve the replicability of your experiment;
      - The caption of Figure 1 is missing a description of panel E, even if it has been addressed in the text.

      Significance

      This model opens new perspectives on in vitro muscle systems for the study of pathologies. The authors have been able to assess that myofiber contraction is able to induce a shift towards type II fibers, reproducing in vitro what is also known in vivo. For this reason, I believe that this model could be useful for further clinical approaches. It is important, though, to keep in mind that muscular disorders are not all characterised by a loss of type II fibers; for instance, myotonic dystrophies type 1 and type 2 exhibit similar phenotypes, even if different types of muscle fibers are affected.

      For this reason, it would be interesting to investigate the versatility of this model in terms of giving rise to different fiber types.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      The work presented shows that muscle stem cells isolated from 5-day-old mice can be transduced with a DNA coding for a Channelrhodopsin2-Venus which will allow the muscle cell to be excited by a light beam (475nm) and to induce the contraction of myotubes. The authors measure the speed of contraction, relaxation and fatigability of such cells as a function of a more or less long excitation time. In particular, they show that myotubes in culture, excited at a frequency of 5 Hz, 8 hours per day for 7 days are larger than unstimulated myotubes and are more resistant to fatigue. Surprisingly, they show that myotubes stimulated at the low frequency of 5Hz express the neonatal Myosin heavy chain more than the slow Myh whose expression is known in adult muscle to be specifically strong in muscle fibers stimulated at low frequency. As the authors do not apply a high stimulation frequency (100Hz) to their culture, it is difficult to conclude whether the stimulation frequency applied in the study induces a specific phenotypic specialization of the myofiber, or a more general role. In this respect, the size of the myotubes obtained after training seems to be increased, showing a hypertrophic effect on the cultured myotubes. This study does not allow us to conclude, beyond the expression of the Myh8 gene, on the "gain" of the fast-twitch specialization of the myofiber by repeated stimulation over several days. A complementary study would certainly provide elements to better understand the role of muscle fiber stimulation, apart from the trophic contribution provided in vivo by the motoneuron.

      If the study is well conducted, some points are nevertheless important to address before publication.

      Significance

      • Figures 4F/G are difficult to understand: the Myh7 signal seems much higher in trained myonuclei (F), but the histogram shows the opposite (G).
      • Figure 4L: the western blot shows the same increase in Myh3 and Myh8 at day 4, while the graph shows an increase at d4 only in Myh8. Why?
      • For immunocytochemistry against fMyh (Fig 4H, I) as well as for Western blots (Fig 4M, N), the authors have to provide arguments regarding the specificity of the antibodies used: some fMyh-specific antibodies recognize Myh 3, 8, 1, 2, and 4, some only Myh 8, 1, 2, and 4, so for the experiments using sc-32732 antibodies (clone F59) it is quite difficult to conclude which Myh are actually recognized in Western blot or immunocytochemistry.
      • While 10Hz stimulation is known in vivo to increase the slow program and Myh7 expression in adult muscles, the authors show that ex vivo this is not the case with primary myotubes, with Myh7 protein level not being upregulated in the 7 day stimulation paradigm, while on the contrary Myh8 expression is upregulated. I think it would be important to quantify the mRNA of each of the Myh genes to be sure that there is no problem with the antibodies, which could recognize several Myh proteins, in the absence of a resolving acrylamide gel allowing visualization and relative quantification of each isoform according to its migration. Nevertheless, this is an interesting observation that could be related to the early phases of muscle contraction in vivo. Indeed, it has been shown in rats that animals in early postnatal development are essentially sedentary and that their muscles (Sol and EDL) are stimulated by short intermittent bursts similar to 10Hz (doi: 10.1111/j.0953-816X.2004.03418.x) during the first 2-3 weeks of life. This should be compatible with Myh8 expression. It would be relevant in this regard to verify that the paradigm presented leads to myotubes with a "neonatal" phenotype. Quantification of the expression level of genes specifically expressed during the neonatal period, compared with those expressed in adult slow or fast myofibers, would enhance the conclusions drawn by the authors.
      • Should we also be cautious about bulk analysis since, as shown in Figure S1, not all myotubes express ChR2?
      • May the authors correlate the ex vivo neonatal phenotype observed with the neonatal muscles they used to prepare myogenic stem cells?
    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      The manuscript by Henning et al describes a method to induce myofiber subtype specification in vitro based on optogenetics and particle image velocimetry. The work is well performed and the manuscript is clear. The findings might be useful to the muscle community, but there are some issues which should be addressed in order to improve the quality and impact of the manuscript.

      My main concern is that the whole work is performed in murine cells. Although I appreciate that the authors have used primary myoblasts rather than cell lines, I also think that the key advantage of such in vitro platforms is the possibility to "humanise" the experiments as much as possible. In this context, the key findings of this work should be reproduced using human myoblasts. This will significantly enhance the relevance of the work.

      Other issues:

      1. From a methodological perspective, I think some clarifications are needed on the western blots shown in Fig 4K-L, as the pattern of Myh3 and Myh8 in both panels appear very similar. This could easily be ruled out by providing raw data/images. Please accept my apologies if this is simply caused by similar migration patterns in the gels (worth checking).
      2. Figure 3K-L (BTX): better imaging should be performed to assess morphology of NMJ (eg. pretzel-shaped as in mature/adult NMJ?)
      3. Figure 3 N-P: Why did the authors use a relatively complex technique such as smFISH to answer a question more simply addressable with more conventional (and perhaps less operator dependent) techniques such as quantitative PCR?

      Significance

      Nature and significance: as mentioned in the previous section, the work can be very significant if expanded to human myoblasts/myotubes, which can have different slow/fast myosin expression pattern. The work is clearly methodological/descriptive, so showing an application of this technique using diseased/mutant cells may increase its relevance even more (but I do not believe it is a key barrier to publication).

      Comparison with other methods: Similar methods have been published but not with this level of resolution.

      Expertise: muscle disease and regeneration, in vitro and in vivo models.

  2. Nov 2022
    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      2. Point-by-point description of the revision

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      *The paper titled "Deregulations of miR-1 and its target Multiplexin promote dilated cardiomyopathy associated with myotonic dystrophy type 1" by the Jagla group studied the effect of down-regulation of miR-1 in myotonic dystrophy type 1 (DM1) using fly as the disease model. The study is based on previous findings that in DM1 MBNL1 is sequestered, CELF1 is stabilized, and miR-1 is down regulated. The authors further identified Multiplexin to be the target effector of miR-1 in the fly heart and studied its function with a series of gain- and loss-of-function and rescue experiments. The authors' findings represent a significant advance in understanding the genetic mechanisms that can explain the pathogenic causes of dilated cardiomyopathy associated with DM1. Overall, this paper is well written and organized, with well-designed experiments and a clear model. A few additional experiments are suggested to further strengthen the conclusion. *

      Answer: We are grateful to the Reviewer for appreciating quality and significance of our work.

      *1. Wild-type and Hand-Gal4 controls are missing for all experiments (only UAS lines outcrossed to w1118 are displayed). Also, the Hand-Gal4 driver line can cause mild dilation by itself, which would influence interpretation and statistics of data.*

      Answer: We agree and have included the Hand-Gal4/+ control condition in all main and supplemental figures showing heart parameters. The differences in data statistics when using Hand-Gal4/+ rather than UAS/+ control lines reinforce our data interpretation.

      They are listed below:

      • increase in diastolic diameter and reduction of fractional shortening become statistically significant in Hand>miR-1sponge hearts at 5 weeks (Fig. 1D, F);
      • reductions of fractional shortening become significant in Hand>Bru3 (Fig. 2C) and Hand>mblRNAi (Fig. 2F) contexts at 1 week;
      • increase in diastolic diameter of Hand>mblRNAi hearts at 5 weeks becomes statistically significant (Fig. 2D);
      • increase in diastolic diameter of Hand>Mp hearts at 5 weeks (Fig. 4E) becomes statistically significant;
      • reduction of fractional shortening becomes statistically significant in Hand>Mp context at 1 week (Fig. 4G);
      • increase in diastolic diameter in Hand>960CTG context at 5 weeks becomes statistically significant (Fig. S2A).

      *Also, the authors should consider confirming the miR-1 phenotype obtained with the sponge with a miR mutant, and also combining miR-1 het with miR sponge (worsening of phenotype?). Alternatively, knockdown efficiency of miR should be tested by qPCR or HCR/smFISH.*

      Answer: We are grateful for these comments. Below we refer to published data and to performed additional experiments that are in support of miR-1 sponge phenotypes:

      • The UAS-miR-1sponge line we used was generated and tested by Fulga et al. (Nat Comm, 2015). Fulga and colleagues applied the UAS-miR-1sponge line to attenuate miR-1 function in muscles and obtained miR-1 KO-like muscle phenotypes.
      • Here, we identify Mp as a new direct miR-1 target. To test whether the miR-1 sponge attenuates miR-1 function, we analyzed Mp protein levels in the hearts of wt and Hand>miR-1sponge flies. Mp expression is highly increased in the Hand>miR-1sponge context, indicating attenuation of miR-1 by the sponge transgene. These data are presented in new Fig. S5J.
      • We tested whether heterozygous dmiR-1 KO -/+ flies (homozygous dmiR-1 mutants are lethal) develop Hand>miR-1sponge-like heart phenotype. Indeed, at 5 weeks of age dmiR-1 KO -/+ flies show significantly increased diastolic and systolic heart diameters. Thus, in old flies loss of one copy of miR-1 mimics heart dilation observed in Hand>dmiR-1sponge context. Heart contractility remains unaffected in dmiR-1 KO -/+ flies, suggesting that loss of one copy of miR-1 has a weaker impact on heart function than heart-targeted miR-1sponge. These data are shown in a new supplemental figure (Fig. S4A-C).

      *It is surprising that one of the DM1 fly models, overexpression of 960 CTG repeats, did not show DCM, considering it is the primary cause of DM1 in humans due to excessive CTG repeats. It should be discussed why Hand>960 CTG does not lead to DCM, since the authors claim that this model with high number of CTG repeats shows a strong phenotype. Are Hand>bru3 and Hand>mbl stronger?*

      Answer: We thank the Reviewer for pointing this out.

      Heart- and muscle-specific DM1 models we established and tested (Hand> or Mef>960CTG, Hand> or Mef>mblRNAi and Hand> or Mef>Bru3) all develop the majority of DM1 phenotypes (Picchio et al., 2013; Picchio et al., 2018; Auxerre-Plantié et al., 2019). However, some cardiac DM1 phenotypes, such as conduction defects (Auxerre-Plantié et al., 2019) and the DCM described here, are only observed in Hand>Bru3 and Hand>mblRNAi contexts. We previously observed that the down-regulation of sarcomeric genes is more pronounced in the Mef>Bru3 than in the Mef>960CTG context (Picchio et al., 2018). This could result from a milder effect of 960CTG repeats on Bru3 and Mbl levels when compared with Gal4-driven overexpression of Bru3 and RNAi-knockdown of mbl. We add a comment to the Results section (page 5) to discuss this point: “The Hand>960CTG line shows cardiac dilation at 5 weeks of age characterized by significant increase in diastolic and systolic diameters but with normal cardiac contractility (Fig. S2A,B,C). We hypothesise that non-affected contractility in this DM1 line is due to a milder effect of 960CTG repeats on Bru3 and Mbl levels compared to GAL4-driven overexpression of Bru3 and RNAi-knockdown of mbl.”

      *Is miR-1 (and Mp) unaltered in these flies with 960 CTG repeats? *

      Answer: In the Hand>960CTG context, a reduced level of miR-1 and an increase in Mp are also observed (not shown). Hand>960CTG flies do not develop DCM but at 5 weeks of age show a significant increase in diastolic and systolic heart diameters. One possibility we favor is that the deregulation of miR-1 and Mp in Hand>960CTG remains below the level that induces DCM. By analogy, only DM1 patients with a high increase in Col15A1 develop DCM (Fig. 5).

      It would be interesting to overexpress 960 CTG in a miR-1 or mbl heterozygous mutant background, which may produce DCM.

      Answer: To test an additional genetic context in which miR-1 is reduced, we used miR-1 heterozygous KO flies. We found that in 5-week-old flies loss of one copy of miR-1 leads, as in Hand>miR-1sponge flies, to heart dilation (Fig. S4A-C); however, dmiR-1 KO -/+ flies do not show affected contractility.

      We agree that combining Hand>960CTG with miR-1 heterozygous mutants could potentially result in an additional DCM-producing context. As we are focusing on our DCM-developing DM1 models (Hand>Bru3 and Hand>mblRNAi) and on the conserved deregulation of miR-1 and its target Mp/Col15A1, we did not follow this suggestion.

      *In Figure 2H, the mean intensity is displayed as the readout of the smFISH quantification of miR-1 levels. If understood correctly, this is the wrong readout since smFISH detects single molecule fluorescence of transcripts, so the number of transcripts should be quantified.*

      Answer: The miR-1 quantification was done using FISH with a miR-1-specific LNA probe (Qiagen miRCURY system). This is a highly sensitive ISH method, but its resolution is not at the single-molecule level. Each Imaris-generated spot in Fig. 2G and 2G’ represents several miR-1 molecules. The mean intensity of the fluorescent signal for a given spot is proportional to the number of miR-1 molecules, and the average of the mean intensities of all spots illustrates the level of miR-1 expression (Fig. 2H). We remove the “sm” abbreviations from the text and figure legend and provide a more detailed description of miR-1 quantification in the Methods section.
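      The readout described in this answer (the average, over all detected spots, of each spot's mean fluorescence intensity) can be sketched as follows. This is a minimal illustration only: the function name, the input format (one list of pixel intensities per spot) and the numbers are assumptions made for the example and do not reproduce the Imaris workflow or any data from the study.

```python
from statistics import mean


def expression_level(spot_pixel_intensities):
    """Return the average of the per-spot mean intensities.

    spot_pixel_intensities: list of lists; each inner list holds the pixel
    intensities of one detected spot (hypothetical input format).
    """
    if not spot_pixel_intensities:
        raise ValueError("no spots detected")
    # Step 1: mean intensity of each individual spot.
    per_spot_means = [mean(pixels) for pixels in spot_pixel_intensities]
    # Step 2: average of those per-spot means across all spots.
    return mean(per_spot_means)


# Hypothetical intensities for three spots (illustration only).
example_spots = [[10, 12, 14], [20, 22], [30]]
print(expression_level(example_spots))
```

      Averaging per spot first gives every spot equal weight regardless of its size, which is one reasonable reading of "average of the mean intensities of all spots".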

      Furthermore, Hand-Gal4 is not expressed in the ventral longitudinal muscles (VLM). As a proof of principle, miR-1 levels should be quantified in VLM (no change in transcripts levels expected).

      Answer: When crossed with UAS-GFP, Hand-Gal4 expression can be detected in VLMs, even though the VLM-associated GFP signal is much lower than in cardioblasts (CB) and pericardial cells (PC). Representative views of Hand>GFP hearts labeled with anti-GFP are shown below.

      *Fig. 5: Since DCM- DM1 patients still show elevated COL15A1 levels but no DCM, it would be interesting to know if DCM phenotypes are COL15A1-dosage dependent. This could be easily tested in the fly model by testing UAS-Mp overexpression at different temperatures.*

      Answer: Heart parameters are to some extent temperature-sensitive (our observations); thus, in our view, increasing targeted Mp expression by elevating temperature is not appropriate for heart physiology experiments.

      The data on Mp overexpression at 25°C presented in the manuscript already provide some indication of an Mp dose-dependent effect in the fly model. We observe that DCM is induced at both 1 and 5 weeks of age, but the cardiac tube dilation is less pronounced in 1-week-old than in 5-week-old flies (Fig. 4). Also, when analysing the Hand>960CTG DM1 model we observed that young flies with low Mp levels do not show cardiac dilation, while aged Hand>960CTG flies display an increase in diastolic and systolic heart diameters concomitant with higher Mp levels.

      *The authors elegantly show rescue of Hand>Bru3 flies by Mp RNAi. Their model would be further strengthened if a similar rescue can be shown with Hand>mblRNAi.*

      Answer: So far we have been unsuccessful in generating a recombined mblRNAi;MpRNAi line, most probably because of incompatibility in the chromosomal locations of the transgenes. Thus, we were unable to perform this experiment.

      Because the gene deregulations and DCM phenotypes we describe are highly similar in the Hand>Bru3 and Hand>mblRNAi contexts, we believe that the rescue experiment we provide is representative of both DM1-associated DCM contexts.

      Minor points:

      *Fig. 1A,B: Ventral longitudinal muscles are covering the hearts on these images, so it's difficult to see the heart dimensions. This holds true for images throughout the manuscript. Where were the diameters measured (by the valves)? A better description and illustration would help the reader understand the situation. *

      Answer: In the lateral heart views as in Fig. 1A and B it is indeed difficult to appreciate heart dimensions. For this reason we always show transversal sections derived from 3D reconstruction (as in Fig. 1A’ and B’). In this context differences in the internal heart diameters could be appreciated (white lines). All diastolic and systolic heart diameter measures presented in the graphs are extracted from the SOHA registrations (see Methods).
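      For readers less familiar with the heart parameters discussed in these replies, the contractility readout (fractional shortening) is conventionally derived from the diastolic and systolic diameters. A minimal sketch, assuming the standard definition FS = (diastolic diameter - systolic diameter) / diastolic diameter; the function name and the example values are illustrative and are not taken from the SOHA software or from the data of the study.

```python
def fractional_shortening(diastolic_diameter, systolic_diameter):
    """Fractional shortening from diastolic and systolic heart diameters.

    Assumes the conventional definition (DD - SD) / DD; both diameters
    must be given in the same unit.
    """
    if diastolic_diameter <= 0:
        raise ValueError("diastolic diameter must be positive")
    return (diastolic_diameter - systolic_diameter) / diastolic_diameter


# Hypothetical diameters in micrometres (illustration only).
print(fractional_shortening(80.0, 50.0))
```

      Under this definition, a heart whose diastolic and systolic diameters both increase can keep a normal fractional shortening, which is the situation described for dilation without affected contractility.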

      *Fig. 1 A',B': White line does not reflect the location where SOHA data are measured and should be horizontal for consistency. Where is ventral vs. dorsal? *

      Answer: We agree. We now indicate the ventral and dorsal sides. For consistency, we remove the white lines from panels 1A and 1B and maintain the orientations of the white lines in panels 1A’ and 1B’.

      Fig. 1D-F: Annotate 1 and 5 weeks in Figure, please. Also, why were 1 and 5 weeks tested? Is there an age-component in DM1 phenotype severity?

      Answer: We add 1 and 5 weeks indications to the figures and discuss in the text (Results section page 4) that 1 and 5 weeks analyses were applied because the severity of cardiac phenotypes increases with age.

      *Fig. 3A: Transcriptional analysis was done at which stage of development? *

      Answer: It was done at 5 weeks of age. We add information to the figure legend.

      *Fig. 3: It is not clear, in which set the authors looked for miR-1 bindings sites (144 genes or the whole set)? Not well annotated. What is meant by 'heart-targeted'? *

      Answer: The in silico search was performed on the whole set of genes. We provide more details on the in silico screen in the Methods section.

      *Fig. 4C,D: It looks like they are not shown in the same dorsal-ventral orientation. Also, it looks Mp is overexpressed in the VLM, but Hand-Gal4 only drives in the cardiomyocytes and pericardial cells? How was quantification done? *

      Answer: We thank the Reviewer for pointing this out. We revised the heart orientations in panels 4C and 4D. As previously mentioned, Hand-Gal4 is also expressed in the VLMs. We present a more representative view in 4D with a lower Mp signal in VLMs. Quantification of Mp expression is not presented here but was performed as in Fig. 3G.

      *Fig. 4I: Why are some myofibers indicated in red in the model? *

      Answer: The fibers indicated in red are additional actin filaments that form in the case of heart dilation. As we do not discuss this aspect, we modify the drawing in the model.

      Fig. 5 D-E: Genotypes need to be better indicated in the graphs.

      Answer: We now provide more complete genotypes.

      *Did the authors control for multiple UAS sites? Is UPRT a UAS control? *

      Answer: Yes, UPRT is the UAS line.

      *In the first paragraph of Result 3, the last sentence seems unfinished. "We identified a set of candidate genes, of which Multiplexin (Mp)" *

      Answer: We revise this sentence.

      *In Method, the in silico screening for miR-1 target should be explained in more detail. *

      Answer: We provide a more detailed in silico screening protocol in the Methods section.

      *Reviewer #1 (Significance (Required)): *

      *The presented data significantly advance our understanding of the molecular mechanisms involved in DM1. I expect that scientists in the muscular disease field and beyond will find this work of high interest.*

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      *Summary: This well-written manuscript utilizes the Drosophila model system to demonstrate that reduction of micro-RNA miR-1 and the resulting increase in one of its down-regulated proteins (Multiplexin) contribute to a dilated cardiomyopathy (DCM) phenotype. This is of interest in that this particular micro-RNA is downregulated in myotonic dystrophy type 1 (DM1), and this correlates with the DCM phenotype observed in patients. Further, the authors show that the human ortholog of Multiplexin is enriched in human DM1-DCM hearts and that downregulation of this protein in Drosophila DM1 models improves the DCM phenotype. Hence, the work demonstrates a potential mechanism for disease development and its amelioration. *

      Answer: We are grateful to the Reviewer for appreciating our work and pointing out potential impact of our findings.

      Major comments:

      *1. As pericardial cells are probed and mentioned substantially in this paper, the authors should explain what these cells do in flies. While affiliated with the heart, they are not myocytes and are probably not particularly relevant to the human heart. In this regard, it is possible that the phenotypes observed in the heart are partially or completely the result of Hand-driven expression of transgenes in the pericardial cells. Although unlikely, this issue should be mentioned as well.*

      Answer: The Hand-Gal4 driver is the most commonly used Drosophila cardiac driver. Regarding the influence of Hand-Gal4-driven expression in pericardial cells on the heart phenotypes, we previously tested all our DM1 models using the cardioblast-specific Tin-Gal4 driver. All cardiac phenotypes, including DCM, are also observed with the Tin-Gal4 driver (as shown for conduction defects in Fig. 5B of Auxerre-Plantié et al., eLife 2019), indicating that the phenotypes are mainly due to gene deregulations within the cardioblasts.

      *Does human miR-1 target Col15A1 transcripts based upon in silico analysis? This issue should be mentioned and discussed. *

      Answer: In silico analysis (new supplemental Fig.S4G) reveals that Col15A1 transcripts carry a perfect miR-1 seed site in 3’UTR region.

      Minor comments:

      *1. The abstract should explicitly state that Multiplexin is a form of collagen.*

      Answer: We mention this in the abstract.

      *More information on the identity between the Drosophila and human forms of miR-1 would be helpful to establish that they are conserved. What is the percent identity and are the sequences that target mRNAs homologous? *

      Answer: Mature Drosophila and human miR-1 are highly homologous. We provide their sequences in new supplemental Fig. S4F.

      *In Figure 1C, it appears that there is an increased heartbeat frequency and arrhythmicity. Are these mutant phenotypes as well?*

      Answer: We checked this again and did not observe any significant change in heart period or in arrhythmia index in the Hand>miR-1sponge context in either young or old flies. We show a new, more representative view of M-modes in panel 1C.

      *Incomplete sentence (page 5): We identified a set of candidate genes, of which Multiplexin (Mp)*

      Answer: This sentence was revised.

      *What is the basis for studying Multiplexin function as opposed to other candidates that were identified? It would be useful to mention this in the Results, although it is mentioned in the Discussion ("We top-ranked Mp because of its known role in setting the size of the cardiac lumen"). *

      Answer: We add the following sentences to the Results section to clarify this point earlier in the manuscript.

      “Mp overexpression in the developing embryonic heart leads to an enlargement of heart lumen and is sufficient to promote an increase of the embryonic aorta diameter to that of the heart proper (Harpaz et al., 2013). We thus reasoned that Mp could be involved in DM1-associated DCM.”

      *"Mp was detected on the luminal and external surfaces of the cardiomyocytes ensuring cardiac contractions" Why does this ensure cardiac contractions?*

      Answer: We are grateful to the Reviewer for pointing this out. Mp does not ensure but could influence cardiac contractions. We revise this sentence by deleting its second part, “ensuring cardiac contractions”.

      *Need to state in the text that the increased level of Col15A1 transcript expression in DM1 patients was not statistically significant.*

      Answer: We state this in the text.

      *Need a magnification bar for Figures 5F-H.*

      Answer: Scale bar is added.

      *Please speculate as to why the third DM1 model does not recapitulate the cardiac phenotypes. *

      Answer: Heart- and muscle-specific DM1 models we established and tested (Hand> or Mef>960CTG, Hand> or Mef>mblRNAi and Hand> or Mef>Bru3) all develop the majority of DM1 phenotypes (Picchio et al., 2013; Picchio et al., 2018; Auxerre-Plantié et al., 2019). However, some cardiac DM1 phenotypes, such as conduction defects (Auxerre-Plantié et al., 2019) and the DCM described here, are only observed in Hand>Bru3 and Hand>mblRNAi contexts. We previously observed that the downregulation of sarcomeric genes is higher in the Mef>Bru3 than in the Mef>960CTG context (Picchio et al., 2018). This could result from a milder effect of 960CTG repeats on Bru3 and Mbl levels when compared with Gal4-driven overexpression of Bru3 and RNAi-knockdown of mbl. We add a comment in the Results section (page 5) to discuss this point: “The Hand>960CTG line shows cardiac dilation at 5 weeks of age characterized by significant increase in diastolic and systolic diameters but with normal cardiac contractility (Fig. S2A,B,C). We hypothesise that non-affected contractility in this DM1 line is due to a milder effect of 960CTG repeats on Bru3 and Mbl levels compared to GAL4-driven overexpression of Bru3 and RNAi-knockdown of mbl.”

      *Did the confocal studies indicate whether there was myofibrillar disarray in the heart tubes?*

      Answer: Thank you for this comment. Yes, we observe myofibrillar disarray. We show disarray phenotypes in all DCM-developing contexts in a new supplementary figure (Fig. S1).

      • For the statistical comparisons in the figures, please indicate in the legends that statistically significant differences (p<0.05) are shown. *

      Answer: We now specify this in the figure legends.

      *Please more thoroughly explain the UPRT control line. *

      Answer: We provide information about the UPRT line in the Results section (p.8).

      *Figure S1 legend: "(red) and 5 (darck)"; the latter should read "(black)" *

      Answer: Revised.

      *Figure S2 panels J and K: it would be helpful to indicate what is being measured on the Y axis, e.g., Mean intensity of dmiR-1 levels. This is true for the various panels in other figures labeled CTCF on their Y axes. *

      Answer: Revised as suggested by the Reviewer.
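      For readers unfamiliar with the CTCF label on these Y axes, the sketch below illustrates the commonly used definition of corrected total cell fluorescence: integrated density minus the product of the selected area and the mean background intensity. The rebuttal does not state how CTCF was computed, so the function name, parameter names and the example values are illustrative assumptions only.

```python
def corrected_total_cell_fluorescence(integrated_density: float,
                                      area: float,
                                      mean_background: float) -> float:
    """Commonly used CTCF definition (an assumption here; the source does
    not describe its exact quantification procedure).

    integrated_density: sum of pixel intensities in the region of interest
    area:               area of the region of interest (same units as used
                        when measuring the background, e.g. pixels)
    mean_background:    mean intensity of nearby background regions
    """
    if area < 0:
        raise ValueError("area must be non-negative")
    # Subtract the signal expected from background alone over the same area.
    return integrated_density - area * mean_background


# Hypothetical measurements, chosen only to show the arithmetic.
example = corrected_total_cell_fluorescence(
    integrated_density=1000.0, area=50.0, mean_background=4.0
)
```

      Under this definition, an axis label such as "miR-1 signal (CTCF, a.u.)" would tell the reader both what is measured and how it was background-corrected.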

      *CROSS-CONSULTATION COMMENTS All reviewers agree that this is a well-designed study and that the manuscript is well written. The missing Hand-Gal4 control mentioned by Reviewers 1 and 3 seems an important element that is missing. These reviewers also call into question the FISH quantification methodology. These two issues seem the most critical to resolve. The other additional experiments suggested deserve input from the authors as to whether they already have relevant data that can be cited, whether they are important to pursue or if they go beyond the scope of the current study. Reviewers 1 and 2 agree that further discussion of the fly model that does not show DCM should be provided. The question on fibrosis in the fly models is germane (Reviewer 3), although it might be indirectly addressed by the fact that a collagen molecule is upregulated here (a major player in fibrosis). All of the minor comments are reasonable and should be addressed by the authors. *

      Answer: We provide answers to all of these comments.

      *Reviewer #2 (Significance (Required)): *

      * This paper is significant in that it draws a more direct connection between the reduction in a microRNA that occurs in myotonic dystrophy and dilated cardiomyopathy that is affiliated with this disease. It shows that a form of collagen that is overexpressed in both Drosophila models and humans with DM1-caused DCM is causative/correlated with the increased heart diameters. Thus, the fly model provides important insights into the link between the mutant gene and the cardiac phenotype. This work will be of interest to those studying skeletal and cardiac muscle disease and scientists interested in developing potential therapeutics for treating DM1-caused DCM. Note that my expertise is in producing and studying skeletal muscle and cardiac disease models in the Drosophila system, which is relevant to evaluating this paper and defining its significance in the field. *

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      *The current study demonstrates that miR-1 targets the newly identified heart-specific target Multiplexin, which, when upregulated, exhibits phenotypes similar to those observed for the Drosophila DM1 model. Furthermore, the authors confirm some of their results using samples derived from DM1 patients, which support the data obtained in flies. Overall, it is a good study with well-performed experiments. The data presented in the paper are convincing and most of the claims the authors make are supported by their findings. The manuscript is clearly written and easy to read and understand. Statistical analysis and the description of the methods are appropriate. It is an interesting paper and I would highly support its acceptance for publication; however, I have a few comments I would like the authors to address. *

      Answer: We are grateful to the Reviewer for his enthusiastic and supportive comments on our work.

      Major comments:

      1. The cardiac dilation/fractional shortening phenotype in Hand>dmiR-1-KD flies is only observed in young flies but not in old flies. However, heart-targeted Mp overexpression leads to DCM in aged flies. Could authors comment on this? *

      Answer: We speculate that attenuation of miR-1 in the heart could lead to more drastic pro-DCM alterations, thus leading to earlier phenotypes than in the case of Mp overexpression.

      To better assess DCM phenotypes, we included an additional Hand-Gal4/+ control context and more heart samples from 5-week-old flies. These additional analyses, presented in revised Fig. 1D-F, reveal that old Hand>dmiR-1KD flies, like young ones, also develop a DCM phenotype.

      • Since the HAND-Gal4 line was used to drive multiple transgenes, it would be important to have the cardiac dilation/fractional shortening phenotype measured in this line as a control. *

      Answer: We performed these control experiments as suggested by the reviewer; they are now included in the graphs.

      • The use of a sponge line is highly appreciated as it allows for tissue-specific downregulation of miRNA. However, to corroborate the data, I would recommend including the knockdown mutant that is available in Bloomington as additional confirmation since no qPCR is provided for the efficacy of the sponge line. This line could also be used in combination with reporter lines to perform a targeting experiment. *

      Answer: We are grateful for these comments. Below we describe the additional experiments we performed:

      • We tested whether heterozygous dmiR-1 KO -/+ flies (homozygous dmiR-1 mutants are lethal) develop Hand>miR-1sponge-like heart phenotype. Indeed, at 5 weeks of age dmiR-1 KO -/+ flies show significantly increased diastolic and systolic heart diameters. Thus, in old flies loss of one copy of miR-1 mimics heart dilation observed in Hand>dmiR-1sponge context. Heart contractility remains unaffected in dmiR-1 KO -/+ flies, suggesting that loss of one copy of miR-1 has a weaker impact on heart function than heart-targeted miR-1sponge. These data are shown in a new supplemental figure (Fig. S4A-C).
      • Here, we identify Mp as a new direct miR-1 target. To test whether the miR-1 sponge attenuates miR-1 function, we analyzed Mp protein levels in the hearts from wt and Hand>miR-1sponge flies. Mp expression is highly increased in the Hand>miR-1sponge context, indicating attenuation of miR-1 by the sponge transgene. These data are presented in new Fig. S5J.

      • The authors state that the reduced miR-1 levels have already been shown in DM1 patients. It would be a stronger argument if similar downregulation was shown in patient samples used in this manuscript (qPCR would be sufficient). *

      Answer: We performed the analyses of miR-1 in patient samples suggested by the reviewer. We show that miR-1 is indeed downregulated. These new data supporting conserved pro-DCM deregulation of miR-1 and its target Mp/Col15A1 are shown in new Fig 5C.

      • Because fibrosis is a hallmark of myotonic dystrophy, do the authors have some markers or other methods to test whether observed phenotypes are due to fibrosis? *

      Answer: Fibrosis (replacement of muscle by fibrotic tissue) has not been reported in Drosophila and is not associated with degeneration of body wall or cardiac Drosophila muscles in the fly models of human muscular dystrophies described so far. However, one could speculate that the increase in Mp/Col15A1 levels that we observe within the ECM of diseased DM1 cardiac cells could have a fibrotic-like, negative effect on cardiac function.

      • The explanation of the observation that pre-miR-1 levels are down-regulated only in young flies, whereas old flies show an opposite tendency, is missing. *

      Answer: Accumulation of pre-miR-1 in old flies is most probably due to impaired mbl-mediated processing. This correlates with the reduction of mature miR-1.

      The authors suggested that this is due to "impaired processing". To corroborate this interesting hypothesis, the authors performed only the smFISH intensity analyses, which are somewhat difficult to decipher. I would recommend, in addition to the pre-miRNA levels, to test and compare the mature miRNA expression using TaqMan qPCR.

      Answer: Impaired miR-1 processing is supported by previous studies in human cells (Rau et al., 2011) and in Drosophila models (Fernandez-Costa et al., 2013). We believe that quantification of miR-1 expression via highly sensitive miRCURY LNA FISH is a well-adapted method. It was performed with all necessary controls. In the Methods section we now provide more details on the LNA FISH-based miR-1 quantification approach.

      In parallel, TaqMan PCR-based miR-1 quantification was performed for human cardiac samples from DM1 patients.

      • The relationship between Bru3 and miR-1 shown in the schematic is not well-defined and would rather require a question mark or dotted line, as the authors provide no evidence that Bru3 can be directly involved in miR-1 processing. The authors suggest that CELF1 may bind UG-rich miRNAs and mediate their degradation by recruiting poly(A)-specific ribonuclease (PARN), but this is only a hypothesis and does not justify the placement of a direct line of repression on the schematic in the last figure. *

      Answer: We agree and have modified the scheme accordingly.

      • I also feel that the authors did not clearly explain the cardiac phenotypes in terms of systolic and diastolic diameter measurements. Which parameters clearly represent the DM1 model, specifically higher or lower diameters of systole and diastole? Results should be clearly indicated in figure legends. *

      Answer: We provide the appropriate details in the figure legends.

      Minor comments:

      1. The full name of CELF1 on page 2: CUGBP Elav-like family member 1 should be added. *

      Answer: Revised.

      2. For better readability of the text and corresponding figures, consistent use of UAS-Mp or UAS-3HNC1 is recommended, but not a mixture of both. *

      Answer: We consistently use UAS-Mp in the revised version.

      • Why is the Multiplexin overexpression line called UAS-3HNC1? *

      Answer: This name summarizes the Mp protein domains: the collagen triple helix and trimerization region (3H) and the NC1 domain (C-terminal non-triple-helical domain) comprising the Endostatin domain. We provide this information in the Methods section.

      • For all figures, it would be better if the genotypes were indicated in the panels and the graphs had the age of the flies instead of color coding. *

      Answer: We revised these points as suggested.

      • Figure 5. Were technical replicates performed for the western blot shown in 5B? *

      Answer: We didn’t perform technical replicates because of limited human sample amounts.

      • Figure S1-S2. Why are the qPCR data for miR-1 in Figure S1 and not S2? *

      Answer: In the revised version, all supplemental analyses on miR-1 are included in the new Fig. S4.

      • Figure S4. Misspelling in figure legend: Scale "barre" instead of scale "bar".*

      Answer: Revised.

      Reviewer #3 (Significance (Required)):

      * The manuscript "Deregulations of miR-1 and its target Multiplexin promote dilated cardiomyopathy associated with myotonic dystrophy type 1" by Souidi et al. reports a novel role for Multiplexin (Mp), identified as a new cardiac miR-1 target involved in myotonic dystrophy type 1 (DM1), using Drosophila as a model system. DM1 is a severe disease that results in a multisystem disorder affecting the skeletal and smooth muscles as well as the eye, heart, endocrine system, and central nervous system. At the moment, no appropriate treatment has been identified to prevent it. Previous studies have also shown that heart-specific miR-1 levels are reduced in patients with DM1, but the role and targets of this miRNA in the heart have not been analyzed. The research presented in this paper is of broad interest and provides new evidence that will help to better understand DM1 at the molecular level. It will be interesting not only to scientists from the Drosophila field but will also contribute to the medical research field.*

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #3

      Evidence, reproducibility and clarity

      The current study demonstrates that miR-1 targets the newly identified heart-specific target Multiplexin, which, when upregulated, exhibits phenotypes similar to those observed for the Drosophila DM1 model. Furthermore, the authors confirm some of their results using samples derived from DM1 patients, which support the data obtained in flies. Overall, it is a good study with well-performed experiments. The data presented in the paper are convincing and most of the claims the authors make are supported by their findings. The manuscript is clearly written and easy to read and understand. Statistical analysis and the description of the methods are appropriate. It is an interesting paper and I would highly support its acceptance for publication; however, I have a few comments I would like the authors to address.

      Major comments:

      1. The cardiac dilation/fractional shortening phenotype in Hand>dmiR-1-KD flies is only observed in young flies but not in old flies. However, heart-targeted Mp overexpression leads to DCM in aged flies. Could authors comment on this?

      2. Since the HAND-Gal4 line was used to drive multiple transgenes, it would be important to have the cardiac dilation/fractional shortening phenotype measured in this line as a control.

      3. The use of a sponge line is highly appreciated as it allows for tissue-specific downregulation of miRNA. However, to corroborate the data, I would recommend including the knockdown mutant that is available in Bloomington as additional confirmation since no qPCR is provided for the efficacy of the sponge line. This line could also be used in combination with reporter lines to perform a targeting experiment.

      4. The authors state that the reduced miR-1 levels have already been shown in DM1 patients. It would be a stronger argument if similar downregulation was shown in patient samples used in this manuscript (qPCR would be sufficient).

      5. Because fibrosis is a hallmark of myotonic dystrophy, do the authors have some markers or other methods to test whether observed phenotypes are due to fibrosis?

      6. The explanation of the observation that pre-miR-1 levels are down-regulated only in young flies, whereas old flies show an opposite tendency, is missing. The authors suggested that this is due to "impaired processing". To corroborate this interesting hypothesis, the authors performed only the smFISH intensity analyses, which are somewhat difficult to decipher. I would recommend, in addition to the pre-miRNA levels, to test and compare the mature miRNA expression using TaqMan qPCR.

      7. The relationship between Bru3 and miR-1 shown in the schematic is not well-defined and would rather require a question mark or dotted line, as the authors provide no evidence that Bru3 can be directly involved in miR-1 processing. The authors suggest that CELF1 may bind UG-rich miRNAs and mediate their degradation by recruiting poly(A)-specific ribonuclease (PARN), but this is only a hypothesis and does not justify the placement of a direct line of repression on the schematic in the last figure.

      8. I also feel that the authors did not clearly explain the cardiac phenotypes in terms of systolic and diastolic diameter measurements. Which parameters clearly represent the DM1 model, specifically higher or lower diameters of systole and diastole? Results should be clearly indicated in figure legends.

      Minor comments:

      1. The full name of CELF1 on page 2: CUGBP Elav-like family member 1 should be added.

      2. For better readability of the text and corresponding figures, consistent use of UAS-Mp or UAS-3HNC1 is recommended, but not a mixture of both.

      3. Why is the Multiplexin overexpression line called UAS-3HNC1?

      4. For all figures, it would be better if the genotypes were indicated in the panels and the graphs had the age of the flies instead of color coding.

      5. Figure 5. Were technical replicates performed for the western blot shown in 5B?

      6. Figure S1-S2. Why are the qPCR data for miR-1 in Figure S1 and not S2?

      7. Figure S4. Misspelling in figure legend: Scale "barre" instead of scale "bar".

      Significance

      The manuscript "Deregulations of miR-1 and its target Multiplexin promote dilated cardiomyopathy associated with myotonic dystrophy type 1" by Souidi et al. reports a novel role for Multiplexin (Mp), identified as a new cardiac miR-1 target involved in myotonic dystrophy type 1 (DM1), using Drosophila as a model system. DM1 is a severe disease that results in a multisystem disorder affecting the skeletal and smooth muscles as well as the eye, heart, endocrine system, and central nervous system. At the moment, no appropriate treatment has been identified to prevent it. Previous studies have also shown that heart-specific miR-1 levels are reduced in patients with DM1, but the role and targets of this miRNA in the heart have not been analyzed. The research presented in this paper is of broad interest and provides new evidence that will help to better understand DM1 at the molecular level. It will be interesting not only to scientists from the Drosophila field but will also contribute to the medical research field.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      This well-written manuscript utilizes the Drosophila model system to demonstrate that reduction of micro-RNA miR-1 and the resulting increase in one of its down-regulated proteins (Multiplexin) contribute to a dilated cardiomyopathy (DCM) phenotype. This is of interest in that this particular micro-RNA is downregulated in myotonic dystrophy type 1 (DM1), and this correlates with the DCM phenotype observed in patients. Further, the authors show that the human ortholog of Multiplexin is enriched in human DM1-DCM hearts and that downregulation of this protein in Drosophila DM1 models improves the DCM phenotype. Hence, the work demonstrates a potential mechanism for disease development and its amelioration.

      Major comments:

      1. As pericardial cells are probed and mentioned substantially in this paper, the authors should explain what these cells do in flies. While affiliated with the heart, they are not myocytes and are probably not particularly relevant to the human heart. In this regard, it is possible that the phenotypes observed in the heart are partially or completely the result of Hand-driven expression of transgenes in the pericardial cells. Although unlikely, this issue should be mentioned as well.

      2. Does human miR-1 target Col15A1 transcripts based upon in silico analysis? This issue should be mentioned and discussed.

      Minor comments:

      1. The abstract should explicitly state that Multiplexin is a form of collagen.

      2. More information on the identity between the Drosophila and human forms of miR-1 would be helpful to establish that they are conserved. What is the percent identity and are the sequences that target mRNAs homologous?

      3. In Figure 1C, it appears that there is an increased heartbeat frequency and arrhythmicity. Are these mutant phenotypes as well?

      4. Incomplete sentence (page 5): We identified a set of candidate genes, of which Multiplexin (Mp)

      5. What is the basis for studying Multiplexin function as opposed to other candidates that were identified? It would be useful to mention this in the Results, although it is mentioned in the Discussion ("We top-ranked Mp because of its known role in setting the size of the cardiac lumen").

      6. "Mp was detected on the luminal and external surfaces of the cardiomyocytes ensuring cardiac contractions" Why does this ensure cardiac contractions?

      7. Need to state in the text that the increased levels of Col15A1 transcript expression in DM1 patients were not statistically significant.

      8. Need a magnification bar for Figures 5F-H.

      9. Please speculate as to why the third DM1 model does not recapitulate the cardiac phenotypes.

      10. Did the confocal studies indicate whether there was myofibrillar disarray in the heart tubes?

      11. For the statistical comparisons in the figures, please indicate in the legends that statistically significant differences (p<0.05) are shown.

      12. Please more thoroughly explain the UPRT control line.

      13. Figure S1 legend: "(red) and 5 (darck)"; the latter should read "(black)"

      14. Figure S2 panels J and K: it would be helpful to indicate what is being measured on the Y axis, e.g., Mean intensity of dmiR-1 levels. This is true for the various panels in other figures labeled CTCF on their Y axes.

      CROSS-CONSULTATION COMMENTS

      All reviewers agree that this is a well-designed study and that the manuscript is well written. The missing Hand-Gal4 control mentioned by Reviewers 1 and 3 seems an important element that is missing. These reviewers also call into question the FISH quantification methodology. These two issues seem the most critical to resolve. The other additional experiments suggested deserve input from the authors as to whether they already have relevant data that can be cited, whether they are important to pursue or if they go beyond the scope of the current study. Reviewers 1 and 2 agree that further discussion of the fly model that does not show DCM should be provided. The question on fibrosis in the fly models is germane (Reviewer 3), although it might be indirectly addressed by the fact that a collagen molecule is upregulated here (a major player in fibrosis). All of the minor comments are reasonable and should be addressed by the authors.

      Significance

      This paper is significant in that it draws a more direct connection between the reduction in a microRNA that occurs in myotonic dystrophy and dilated cardiomyopathy that is affiliated with this disease. It shows that a form of collagen that is overexpressed in both Drosophila models and humans with DM1-caused DCM is causative/correlated with the increased heart diameters. Thus, the fly model provides important insights into the link between the mutant gene and the cardiac phenotype. This work will be of interest to those studying skeletal and cardiac muscle disease and scientists interested in developing potential therapeutics for treating DM1-caused DCM. Note that my expertise is in producing and studying skeletal muscle and cardiac disease models in the Drosophila system, which is relevant to evaluating this paper and defining its significance in the field.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      The paper titled "Deregulations of miR-1 and its target Multiplexin promote dilated cardiomyopathy associated with myotonic dystrophy type 1" by the Jagla group studied the effect of down-regulation of miR-1 in myotonic dystrophy type 1 (DM1) using fly as the disease model. The study is based on previous findings that in DM1 MBNL1 is sequestered, CELF1 is stabilized, and miR-1 is down regulated. The authors further identified Multiplexin to be the target effector of miR-1 in the fly heart and studied its function with a series of gain- and loss-of-function and rescue experiments. The authors' findings represent a significant advance in understanding the genetic mechanisms that can explain the pathogenic causes of dilated cardiomyopathy associated with DM1. Overall, this paper is well written and organized, with well-designed experiments and a clear model. A few additional experiments are suggested to further strengthen the conclusion.

      Major points:

      1. Wild-type and Hand-Gal4 controls are missing for all experiments (only UAS lines outcrossed to w1118 are displayed). Also, Hand-Gal4 driver lines can cause mild dilation by themselves, which would influence the interpretation and statistics of the data. Also, the authors should consider confirming the miR-1 phenotype obtained with the sponge with a miR mutant, and also combining the miR-1 het with the miR sponge (worsening of phenotype?). Alternatively, knockdown efficiency of miR should be tested by qPCR or HCR/smFISH.

      2. It is surprising that one of the DM1 fly models, overexpression of 960 CTG repeats, did not show DCM, considering it is the primary cause of DM1 in humans due to excessive CTG repeats. It should be discussed why Hand>960 CTG does not lead to DCM, since the authors claim that this model with high number of CTG repeats shows a strong phenotype. Are Hand>bru3 and Hand>mbl stronger? Is miR-1 (and Mp) unaltered in these flies with 960 CTG repeats? It would be interesting to overexpress 960 CTG in a miR-1 or mbl heterozygous mutant background, which may produce DCM.

      3. In Figure 2H, the mean intensity is displayed as the readout of the smFISH quantification of miR-1 levels. If understood correctly, this is the wrong readout since smFISH detects single molecule fluorescence of transcripts, so the number of transcripts should be quantified. Furthermore, Hand-Gal4 is not expressed in the ventral longitudinal muscles (VLM). As a proof of principle, miR-1 levels should be quantified in VLM (no change in transcripts levels expected).

      4. Fig. 5: Since DCM- DM1 patients still show elevated COL15A1 levels but no DCM, it would be interesting to know if DCM phenotypes are COL15A1-dosage dependent. This could be easily tested in the fly model by testing UAS-Mp overexpression at different temperatures.

      5. The authors elegantly show rescue of Hand>Bru3 flies by Mp RNAi. Their model would be further strengthened if a similar rescue can be shown with Hand>mblRNAi.

      Minor points:

      1. Fig. 1A,B: Ventral longitudinal muscles are covering the hearts on these images, so it's difficult to see the heart dimensions. This holds true for images throughout the manuscript. Where were the diameters measured (by the valves)? A better description and illustration would help the reader understand the situation.

      2. Fig. 1 A',B': White line does not reflect the location where SOHA data are measured and should be horizontal for consistency. Where is ventral vs. dorsal?

      3. Fig. 1D-F: Annotate 1 and 5 weeks in Figure, please. Also, why were 1 and 5 weeks tested? Is there an age-component in DM1 phenotype severity?

      4. Fig. 3A: Transcriptional analysis was done at which stage of development?

      5. Fig. 3: It is not clear in which set the authors looked for miR-1 binding sites (144 genes or the whole set)? Not well annotated. What is meant by 'heart-targeted'?

      6. Fig. 4C,D: It looks like they are not shown in the same dorsal-ventral orientation. Also, it looks like Mp is overexpressed in the VLM, but Hand-Gal4 only drives in the cardiomyocytes and pericardial cells? How was quantification done?

      7. Fig. 4I: Why are some myofibers indicated in red in the model?

      8. Fig. 5 D-E: Genotypes need to be better indicated in the graphs. Did the authors control for multiple UAS sites? Is UPRT a UAS control?

      In the first paragraph of Result 3, the last sentence seems unfinished. "We identified a set of candidate genes, of which Multiplexin (Mp)"

      In Method, the in silico screening for miR-1 target should be explained in more detail.

      Significance

      The presented data significantly advance our understanding of the molecular mechanisms involved in DM1. I expect that scientists in the muscular disease field and beyond will find this work of high interest.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Reply to the reviewers

      Note: the three reviewers who provided comments were identified as Reviewers 2-4

      Reviewer #2

      1) I could not open any of the movies (while those associated with the BioRXiv preprint were fine). Some of the movies could be combined to minimize download/open clicking sequences.

      > The movies were uploaded as .avi files, as per Review Commons instructions, and we tested our ability to view them on several computers at our institution before submission. We are relieved the reviewer was able to access the .mp4 formatted movies via BioRXiv. We will ask the Review Commons Managing Editor to make sure there are no problems with the videos uploaded with the revised manuscript.

      2) I really dislike reviewing papers without line numbers

      > Line numbers have been added to the revised version.

      3) The manuscript could be made more relevant to malaria researchers by briefly discussing red cell invasion by merozoites (a single constriction and force against the cell cortex), migration of ookinetes (multiple constrictions during mosquito gut penetration) and sporozoites (long distance migration), but this is not a must.

      > Constrictions during ookinete migration are now mentioned on lines 265-269, and the discussion of the constriction at the moving junction has been broadened to include other apicomplexan parasites (lines 270-278).

      4) I would limit reporting of numbers to two digits, e.g. instead of 46.3% make it 46%; 2.56 +/- 0.38 to 2.6 +/- 0.4, etc.

      > We have adjusted all numbers in the text and figures to the appropriate number of significant figures based on measurement precision.
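      As an illustration of the rounding the reviewer asks for, the sketch below reproduces the two examples from the comment (46.3% to 46%, and 2.56 +/- 0.38 to 2.6 +/- 0.4). It assumes one common convention: round the uncertainty to one significant figure and the value to the same decimal place. The rebuttal does not say which rule the authors actually applied, and the function names are hypothetical.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def _quantize(x: float, exponent: int) -> str:
    """Round x half-up to the decimal place 10**exponent, as a plain string."""
    quantum = Decimal(1).scaleb(exponent)  # e.g. exponent=-1 -> Decimal('0.1')
    rounded = Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP)
    return format(rounded, "f")


def round_sig_figs(value: float, sig_figs: int = 2) -> str:
    """Round a non-zero value to a given number of significant figures."""
    if value == 0:
        raise ValueError("value must be non-zero")
    exponent = math.floor(math.log10(abs(value)))
    return _quantize(value, exponent - sig_figs + 1)


def round_to_uncertainty(value: float, uncertainty: float,
                         sig_figs: int = 1) -> tuple[str, str]:
    """Round the uncertainty to sig_figs significant figures and the value
    to the same decimal place (an assumed convention, see lead-in)."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    exponent = math.floor(math.log10(uncertainty)) - sig_figs + 1
    return _quantize(value, exponent), _quantize(uncertainty, exponent)


percent = round_sig_figs(46.3, 2)               # the reviewer's 46.3% example
mean, spread = round_to_uncertainty(2.56, 0.38)  # the reviewer's 2.56 +/- 0.38 example
```

      Strings are returned rather than floats so that trailing zeros that carry precision are not lost when the numbers are written into text or figure labels.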

      5) Millions of deaths, please rewrite, more like around 1 million from malaria and cryptosporidium; use citation (WHO)

      > Done (line 40)

      6) Motility: please don't mention flagella, which are used for swimming, in the same sentence / phrase / logic connection as lamellipodia, which are used for substrate based migration

      > The sentence has been rewritten to make clear that cilia and flagella are not organelles involved in the substrate-dependent motility of other eukaryotic cells (lines 47-49).

      7) In Figure 1B, I can see one microsphere and it's not clear if it moves completely back to the original position. In the movie it looks like it goes completely back, maybe exchange the last panel of the figure with a last frame from the movie? Or maybe better: replace with frames from movie 2, which is more striking and shows many beads being displaced?

      > As suggested, Figure 1B now shows frames from the other movie (former Video 2), where bead movement is more obvious.

      8) Please add the entire figure S1 to Figure 1. This is important for readers to understand and 'deserves' full figure status. Same for Figure S2.

      > We have moved most of former Figure S1 into a new main Figure 2, as suggested. We left the two graphs as Supplemental data (new Figure S1), since these graphs simply show that parasite motility in fibrin is similar to the previously described motility of parasites in Matrigel.

      > Figure S2 has been moved to the main text, as suggested (in new Figures 3 and 6).

      9) I would encourage the authors to elaborate more on the data on Figure S2. It appears that motile parasites mostly did not exert forces above the level for non-motile parasites; for how much motility did they observe forces? The meaning of the x-axis does not become clear. Are those individual parasites per time point or time points of one parasite or of the analyzed matrix volumes over several parasites? How many parasites were observed? This is stated more clearly later but needs to be done already here.

      > We have moved the data in former Suppl. Figure S2 into the main figures, broken it into two parts (Figures 3 and 6B-E) and included a new 3D volume view and additional explanatory detail in the figure legends and text to clarify these points of confusion (lines 100-116, 500-507, 564-570).

      10) Please change 0.042 um into 42 nm etc

      > Done, lines 113-116.

      11) Please move some of the data in Figure S8 to the main figures e.g. Figure 4, where it would make a nice contrast / comparison to the mic2 mutant. Please also put a WT for comparison.

      > Done; see revised Figure 6.

      12) I wonder if the defect in directional migration of the mic2 mutant is also partly due to the parasite not being able to squeeze through narrow matrix pores and hence is deflected more often. While I understand (and agree) with the authors observation (interpretation) of the wt parasites not squeezing but pulling, it's hard to think that such squeezing would not still play a part.

      > The idea that the parasite needs to squeeze its way through pores in the matrix is intuitively appealing (and, in fact, what we had expected to see) but there is currently no data to support it. If squeezing were occurring, we should see an outward deformation of the matrix as the parasite pushes on the matrix fibers, but this is something we have never observed. We therefore think it is unlikely that the loss of directional migration is due to an inability to squeeze through pores in order to “stay on track”.

      13) Hueschen et al is now on BioRXiv

      > The BioRXiv citation has been added (lines 293, 320).

      14) The shaving off of antibodies could be brought into context to the work on sporozoites by Aliprandini Nat Micro 2018 and on trypanosomes by Engstler Cell 2007 (but not a must)

      > The two studies mentioned are intriguing and may be related to the well-documented anterior to posterior flux and shedding of GPI-anchored proteins from the surface of gliding Toxoplasma tachyzoites. What we are showing here is slightly different: the fluorescent antibodies on the cell surface seem to be “shaved” backwards at the constriction, much like surface bound antibodies are shaved backwards at the moving junction during invasion (Dubremetz 1985). In other words, there is a discontinuity in the density of surface staining at the constriction/junction. All of these processes may be related, but this is only speculation at this point and since the shaving of antibody at the constriction is a minor point of the paper (meant only to illustrate another similarity between 3D motility and invasion), we would prefer not to try to tie it to these other observations which may or may not be related.

      15) Anterior-posterior flux: best experimental evidence for this is Quadt et al. ACS Nano 2016 for Plasmodium and Stadler MBoC 2017 for Toxoplasma. The common observations and differences could be discussed as they pertain to the current study

      > These two papers are now cited in our discussion of the linear motor model along with our speculation that the constriction reflects the motility-relevant zone of engagement of this rearward flux with ligands in the matrix (lines 319-322).

      16) The loss of mic2 could lead to the loss of the capability to form discrete adhesion sites that reveal themselves as the observed rings in 3D. I suggest being careful about hypothesizing that the absence of this and MyoA reveals a completely different motility mechanism. To me it seems more likely that the absence of the proteins means that the existing mechanism doesn't work perfectly any more, i.e., the highly tuned migration machinery misses a key part and malfunctions.

      > The paragraph in question offered possible explanations for how parasites lacking the constriction could in fact move at normal speeds, not that motility was negatively affected. We have tried to make this more clear in the revision (lines 352-354), before describing the 3 possible explanations.

      17) Maybe reflect on whether 'search strategy' might be a better word than 'guidance system'

      > We have replaced the term “guidance system” in the title (lines 1-2), abstract (lines 33-36) and introduction (line 75) with more conservative references to the ability of the parasite to move directionally. The only place the term “guidance system” remains is in the final paragraph of the discussion, which is more speculative in nature, and where we now suggest it to be “part of” a guidance system.

      Reviewer #3

      1) Extracellular matrix choice. The authors track the parasite movement first on Matrigel and next on fibrin. The authors exemplify the fibrin matrix on an image on Suppl. Fig 1 that shows a relatively quite large pore size, similar or greater than parasite size. Was the analysis done on parasites touching the fibers?

      > Previous Suppl Figure 1A showed a confocal image at only one z-plane which did indeed give the impression that the pores are relatively large. We have changed this image to a more informative maximum intensity projection (New Figure 2A) and included a video showing the entire imaging volume (new Video 4), which makes clear that the matrix contains many small fibers and that the pores are smaller than the previous single z-plane suggested, so the parasite is likely to be near to or in contact with fibers of the matrix at all times. In Suppl Figure 1D we purposely used a less dense matrix in order to make the matrix deformation more obvious to the eye. The density of the matrix in Fig. 1D has been added to the legend.

      2) Lack of movement of parasites. In many figures of the article it is revealed that the majority of parasites in fibrin remain immobile (Suppl Fig 1, Fig 2, Video 5, Suppl Fig 2, Suppl Fig 8). The number of immobile parasites in Matrigel seems to be lower than in fibrin (Suppl Fig 1B) although no quantification is shown. How does the movement in fibrin and Matrigel compare? How does this compare with movement on stiff substrates in 2D? Could the lack of movement be caused by the large pore size in fibrin?

      > We have added a panel to Suppl. Figure S1 showing that the proportions of parasites moving in fibrin vs Matrigel are not significantly different. In fact, none of our measured motility parameters are different between fibrin and Matrigel. Not all parasites move during the 80s of capture used for these matrix comparisons; some of the parasites are likely dead, but others may have simply not initiated motility during this time window. We typically see between 30-50% movement in 3D motility assays of this duration and similar numbers in 2D trail assays although we have not explored the effect of 2D substrate stiffness.

      3) Considering parasite movement: The authors consider that 3SD is a cutoff for considering parasite displacement. However, several timepoints fall behind this cutoff in the control without parasites and the knockouts with restricted movement.

      > We chose three standard deviations from the mean as our cutoff, in order to eliminate 99.7% of the noise. Since we calculate 16807 vectors per comparison, this leaves us with ~50 vectors above the cutoff even in samples with no moving parasites. Not surprisingly, these vectors are found at random locations in the volume. New Figures 3 and 6B-E and the associated text (lines 100-116, 500-507, 564-570) hopefully clarify this point adequately; it is quite obvious in Figure 3C which vectors correspond to parasite-induced displacements and which correspond to random noise.
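
      The arithmetic behind this count can be sketched as follows. This is a rough illustration only: it assumes the displacement noise is Gaussian and that the cutoff is applied two-sidedly, neither of which is stated above, and the function name `expected_noise_vectors` is hypothetical. With the rounded 99.7% figure, 0.3% of 16807 vectors is about 50; the exact Gaussian tail beyond 3 SD gives a slightly smaller expectation.

```python
import math


def expected_noise_vectors(n_vectors: int, k_sd: float = 3.0) -> float:
    """Expected number of pure-noise values lying beyond +/- k_sd standard
    deviations of the mean, ASSUMING Gaussian noise (an illustrative assumption)."""
    # Two-sided tail probability of a standard normal: P(|Z| > k_sd)
    tail_fraction = math.erfc(k_sd / math.sqrt(2.0))
    return n_vectors * tail_fraction


n_vectors = 16807  # vectors per comparison, as quoted in the response above

# Estimate using the rounded "99.7%" rule of thumb (0.3% of vectors remain)
rounded_estimate = n_vectors * 0.003

# Estimate using the exact Gaussian tail beyond 3 SD
exact_estimate = expected_noise_vectors(n_vectors, 3.0)

print(f"rounded rule: {rounded_estimate:.1f} vectors")
print(f"exact Gaussian tail: {exact_estimate:.1f} vectors")
```

      Either way, a few dozen above-cutoff vectors are expected from noise alone in every comparison, so their spatial distribution (random versus clustered around a parasite) is what distinguishes noise from parasite-induced displacement.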

      4) Imaging: Although the authors show a very detailed and illustrative table of the imaging acquisition conditions in Table 1, it is unclear which microscope the authors used, as two microscopes are described in the methods section, a Nikon Eclipse TE300 widefield microscope and a Nikon AIR-ER confocal microscope. Which images were taken on each system? From the location of Table 1 in the manuscript it seems that most images were taken with the Nikon Eclipse. Although this microscope has control over z, the images are quite noisy. How might the lack of confocality interfere with the analysis?

      > The high temporal resolution needed for 3D force mapping of cells that move several microns per second meant that all these experiments were done using a widefield microscope equipped with a piezo-driven z-stage. The fastest confocal we tested was not as fast as the widefield. However, spatial resolution suffered as a result of having to use widefield, particularly in z, and this did indeed make our data more noisy as suggested by the reviewer. This may be why we were unable to detect fibrin deformation in the knockout parasites. The only data collected on the confocal microscope were those shown in new Figure 2A; we have clarified this on lines 421-427. Future studies will explore other imaging modalities such as light sheet microscopy in an attempt to achieve better spatial resolution while maintaining the high frame rates required for force mapping.

      5) Nuclear constriction. The authors did not show any image or video exemplifying this.

      > The images in Suppl. Figure 6 have been replaced with data that show the nuclear shape more clearly.

      6) Knockouts: The authors did not explain how they generated the knockouts in the methods or show the efficacy of the knockout in any figure. If these knockout strains were a gift (I did not find this in the manuscript), the authors should indicate this more explicitly and reference the manuscript where they were described for the first time.

      > Both of the stable knockout lines used were generous gifts from Dr. Markus Meissner. We cited the original papers describing these lines in the text and thanked Dr. Meissner for providing them in the Acknowledgements section. We have now included an additional citation at the first mention of each of the knockouts (lines 174, 188) to make it even clearer where they came from.

      7) Discussion: Although the experimental methodology is sound the authors seem to make many assumptions and speculations on the discussion as how the appearance of this ring/constriction on the parasite translates into the helical movement of the parasite or the coupling of the ring with the cytoskeleton. Live imaging of actin dynamics or mathematical modelling could be used to support their claims.

      > We imaged parasites expressing the actin chromobody but were unable to visualize a ring of actin at the constriction. However, due to the speed of the parasites and the need for a fast frame rate (~15 ms per image) to reconstruct the 3D image volumes, the actin chromobody signal could be under our threshold of detection. We need to develop new, more sensitive ways to visualize proteins at the constriction, and this will be a major focus of our work going forward.

      > We fully concur that mathematical modeling such as the work recently done by Hueschen et al on actin flow during motility and by Pavlou et al on the role of parasite twist during invasion has much to offer our understanding of these processes. Similar approaches may provide support to the speculations (not claims!) we offer in the discussion and, although beyond the scope of the current study, are a direction we intend to take this work in the future, particularly if we are able to improve the signal-to-noise in our force mapping.

      8) Quantification of experiments missing: Overall, the main figures lack quantification that sometimes can be found in the supplemental information and sometimes is missing. I would suggest including quantifications next to the events described in the main figures. Likewise, some of the supplemental figures lack quantification (Suppl Fig 7, how many parasites showed this protein trail?)… Overall, the authors should indicate how many parasites were quantified in each figure, as they usually refer to the number of constrictions. This is overall a problem in main figures 3 and 5. Or for example in Suppl Fig 5: How many parasites were quantified in this figure? The authors only show number of constrictions, and as the authors described, a parasite might have more than one constriction.

      > We have added further detail on the number of events/parasites quantified to both the figure legends and text throughout the manuscript, including the specific examples noted by the reviewer.

      9) Videos: The videos lack a time scale. Although this can be found in the main figures, it would be helpful to have the annotation in the videos. Likewise, some references for positions in videos, such as the cross found on Fig1 would be helpful for parasites that present little movement.

      > Time stamps have been added to all videos as suggested, and crosshairs have been applied to new Figure 1B and Suppl. Figures 7 and 8 to make the movement of the parasites more obvious.

      Reviewer #4

      1) I am not sure about the premise that the "linear model" of gliding motility predicts uniformly forward direction. Previous videos of 2D gliding show that sporadic motility, changes in direction, or even reversal of direction are not infrequent. However, the current model could explain these behaviors if one or more of the following conditions occur: 1) myosin motors might be coordinately activated to initiate motility, followed by relaxation, 2) actin fibers might be transiently arrayed in clusters that change density and polarity over time, or 3) adhesins, necessary to generate traction, might vary in density and spatial orientation across the surface of the parasite. Changes in these properties would be expected to result in zones that promote or disfavor local forces needed for motility - and reversal of direction could occur when forward forces relax and external elastic forces predominate.

      > The potential explanations offered by the reviewer for the frequent changes in direction of zoite motility are intriguing and worth exploring experimentally. The ability of actin fibers to periodically reverse polarity, or the presence of counteracting elastic forces are not components of the “standard” linear motor model of motility but, if they occur, could explain the patch gliding phenomenon and help refine our understanding of motility. Since the data in this manuscript do not in the end either strongly support or disprove the linear motor model – this may ultimately require higher resolution force mapping methods that can detect the forces responsible for forward motion – we have de-emphasized potential problems with the model in the introduction and deleted specific discussion of patch gliding as one of these problems (lines 61-64).

      2) The model favored here: "we propose that force is generated, at least in part, by the rearward translocation of the subset of actin filaments that are coupled to adhesins at the circular ring of attachment" does not seem fundamentally different from the current model - other than it focuses the forces at a critical junction that the parasite migrates through. It seems to me that this is a refinement of the current model and not a replacement. As such, the authors might focus on how their data improve the model rather than pointing out prior deficiencies (although I get that editors like this style).

      > We agree with the reviewer and have modified the text to be more circumspect on this issue (lines 319-331).

      3) The finding that the absence of MIC2 affects the constriction formed by inward pull on the matrix is quite convincing and interesting. However, mutants that cannot form the constriction still move at similar speeds. This suggests that the inward force is different from the motor itself and affects its ability to impart direction, rather than the ability to move per se. The interpretation of the MyoA defect is complicated: since motility is certain to be disrupted, the potential role of an independent inward force may no longer be detectable.

      > We agree with the reviewer on this point as well: the forces we have observed to date cannot explain forward motion. We stated this previously and have now emphasized the point further (lines 322-324, 352-357). Because the parasite is moving forward, the forces responsible must be there but are likely below our threshold of detection. In order to visualize these forces, we are going to need new imaging modalities that can achieve better signal-to-noise than our current setup at the high frame rates required for force mapping. That said, the new data we have added to the manuscript are at least consistent with the narrow diameter ring of the constriction making a contribution to the parasite’s forward motion (new Suppl. Figure 10 and lines 347-351).

      4) Although I agree with the authors that there are striking parallels between motility in 3D and cell invasion, I am not certain about their conclusion that the constriction seen during cell entry is due to the parasite pulling inwardly. When entering the host cell, the parasite must also navigate the dense subcortical actin network, which likely also aids in forming the constriction that is observed. It would be interesting to record this pattern under conditions where host cell actin is destabilized while parasite motility is intact, for example using cytochalasin D to treat wild type host cells during invasion by resistant parasites.

      > We do not conclude that the constriction during invasion is due to the parasites pulling inwardly, but we do propose that this possibility needs to be considered based on the noted similarities between invasion and motility and our clear (and somewhat surprising) demonstration that the moving parasite pulls on the matrix at the constriction during motility. During invasion, the parasite may indeed have to squeeze through the dense subcortical network, or it may use secreted proteins to loosen up the network so that no squeezing is required. We just don’t know, and our purpose here was simply to put this alternative possibility on the table because we believe it is a viable possibility that follows from the data presented.

      > We thank the reviewer for the suggestion of testing what happens when cytoD resistant parasites invade in the presence of cytoD; this is a clever idea that we will likely pursue in future work.

      5) Not all of the color patterns shown in Figure 1A are consistent with the model. For example, GAP40 (yellow) does not appear in the model, there are two MLC boxes, but they are different shades, and ELC1/2 does not appear in the model.

      > We thank the reviewer for catching this error; it has now been fixed.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      The study provides new insight into the gliding motility of Toxoplasma gondii through the use of time lapse video microscopy combined with 3D traction force mapping. Substantial new insight is provided through the discovery that the parasite pulls inward on the matrix, creating a moving junction that it glides through during forward motility. This process of migration in 3D thus closely resembles the invasion of host cells. These data are carefully documented and improve our understanding of how gliding motility operates. There are a few issues surrounding how previous data are presented and the relationship between this new inward force and the motor complex that require better explanation.

      Major points

      1. I am not sure about the premise that the "linear model" of gliding motility predicts uniformly forward direction. Previous videos of 2D gliding show that sporadic motility, changes in direction, or even reversal of direction are not infrequent. However, the current model could explain these behaviors if one or more of the following conditions occur: 1) myosin motors might be coordinately activated to initiate motility, followed by relaxation, 2) actin fibers might be transiently arrayed in clusters that change density and polarity over time, or 3) adhesins, necessary to generate traction, might vary in density and spatial orientation across the surface of the parasite. Changes in these properties would be expected to result in zones that promote or disfavor local forces needed for motility - and reversal of direction could occur when forward forces relax and external elastic forces predominate.
      2. The model favored here: "we propose that force is generated, at least in part, by the rearward translocation of the subset of actin filaments that are coupled to adhesins at the circular ring of attachment" does not seem fundamentally different from the current model - other than it focuses the forces at a critical junction that the parasite migrates through. It seems to me that this is a refinement of the current model and not a replacement. As such, the authors might focus on how their data improve the model rather than pointing out prior deficiencies (although I get that editors like this style).
      3. The finding that the absence of MIC2 affects the constriction formed by inward pull on the matrix is quite convincing and interesting. However, mutants that cannot form the constriction still move at similar speeds. This suggests that the inward force is different from the motor itself and affects its ability to impart direction, rather than the ability to move per se. The interpretation of the MyoA defect is complicated: since motility is certain to be disrupted, the potential role of an independent inward force may no longer be detectable.
      4. Although I agree with the authors that there are striking parallels between motility in 3D and cell invasion, I am not certain about their conclusion that the constriction seen during cell entry is due to the parasite pulling inwardly. When entering the host cell, the parasite must also navigate the dense subcortical actin network, which likely also aids in forming the constriction that is observed. It would be interesting to record this pattern under conditions where host cell actin is destabilized while parasite motility is intact, for example using cytochalasin D to treat wild type host cells during invasion by resistant parasites.

      Minor points

      Not all of the color patterns shown in Figure 1A are consistent with the model. For example, GAP40 (yellow) does not appear in the model, there are two MLC boxes, but they are different shades, and ELC1/2 does not appear in the model.

      Significance

      The study provides a conceptual advance that improves our understanding of gliding motility in apicomplexan parasites. It will spur future research in the area to better define the process, although it does not yet offer a new mechanistic foundation.

      The work will be of interest to those working on motility in general and parasite systems in specific.

      I have worked on cell motility and invasion in this group of organisms for many years, although we currently focus on other questions.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, Stadler et al. characterize the biophysics behind Toxoplasma gondii locomotion on extracellular matrix. The authors describe the generation of a circular/ring structure that pulls extracellular matrix in a highly localized way, creating a constriction on the parasite accompanied by a forward movement. In addition, they characterize the movement in two knockouts of parasite myosin and adhesins. The characterization of the biophysical forces necessary for parasites to complete their life cycles is timely, necessary, and relevant to improving our understanding and to designing new antiparasitic strategies. I would like to praise the authors for this effort. However, a few major and minor corrections and clarifications are necessary before the publication of the article.

      Major Comments:

      Extracellular matrix choice. The authors track the parasite movement first on Matrigel and next on fibrin. The authors exemplify the fibrin matrix on an image on Suppl. Fig 1 that shows a relatively quite large pore size, similar or greater than parasite size. Was the analysis done on parasites touching the fibers?

      Lack of movement of parasites. In many figures of the article it is revealed that the majority of parasites in fibrin remain immobile (Suppl Fig 1, Fig 2, Video 5, Suppl Fig 2, Suppl Fig 8). The number of immobile parasites in Matrigel seems to be lower than in fibrin (Suppl Fig 1B) although no quantification is shown. How does the movement in fibrin and Matrigel compare? How does this compare with movement on stiff substrates in 2D? Could the lack of movement be caused by the large pore size in fibrin?

      Considering parasite movement: The authors consider that 3SD is a cutoff for considering parasite displacement. However, several timepoints fall behind this cutoff in the control without parasites and the knockouts with restricted movement.

      Imaging: Although the authors show a very detailed and illustrative table of the imaging acquisition conditions in Table 1, it is unclear which microscope the authors used, as two microscopes are described in the methods section, a Nikon Eclipse TE300 widefield microscope and a Nikon AIR-ER confocal microscope. Which images were taken on each system? From the location of Table 1 in the manuscript it seems that most images were taken with the Nikon Eclipse. Although this microscope has control over z, the images are quite noisy. How might the lack of confocality interfere with the analysis?

      Nuclear constriction. The authors did not show any image or video exemplifying this.

      Knockouts: The authors did not explain how they generated the knockouts in the methods or show the efficacy of the knockout in any figure. If these knockout strains were a gift (I did not find this in the manuscript), the authors should indicate this more explicitly and reference the manuscript where they were described for the first time.

      Discussion: Although the experimental methodology is sound the authors seem to make many assumptions and speculations on the discussion as how the appearance of this ring/constriction on the parasite translates into the helical movement of the parasite or the coupling of the ring with the cytoskeleton. Live imaging of actin dynamics or mathematical modelling could be used to support their claims.

      Minor comments:

      Quantification of experiments missing: Overall, the main figures lack quantification that sometimes can be found in the supplemental information and sometimes is missing. I would suggest including quantifications next to the events described in the main figures. Likewise, some of the supplemental figures lack quantification (Suppl Fig 7, how many parasites showed this protein trail?).

      Overall, the authors should indicate how many parasites were quantified in each figure, as they usually refer to the number of constrictions. This is overall a problem in main figures 3 and 5. Or for example in Suppl Fig 5: How many parasites were quantified in this figure? The authors only show number of constrictions, and as the authors described, a parasite might have more than one constriction.

      Videos: The videos lack a time scale. Although this can be found in the main figures, it would be helpful to have the annotation in the videos. Likewise, some references for positions in videos, such as the cross found on Fig1 would be helpful for parasites that present little movement.

      Significance

      Host-parasite interactions are driven by a combination of biochemical and mechanical factors, but most research focuses on the molecular side. This article aims to better define the mechanical properties behind Toxoplasma movement. This is important, because understanding the biophysical determinants behind parasite movement is essential and has been historically ignored. To my knowledge, this manuscript is among the few that aim to define the physical cues driving Toxoplasma movement.

      Although the article is focused on the mechanobiology of Toxoplasma interactions with the extracellular matrix, the article is easy to read and accessible to molecular and cellular parasitologists/biologists.

      My background covers host-parasite interactions in 3D bioengineered models. This review has been done together with an expert in mechanobiology. Most of the article falls behind our expertise except for computational modelling of single cell displacements.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Unicellular apicomplexan parasites including those causing toxoplasmosis and malaria can migrate at extremely high speed and invade host cells using a specialized actin-myosin motility machinery termed the glideosome and a motility mode termed gliding during which they do not change their shape. In their paper Ward and colleagues examine the migration of T. gondii tachyzoites in 3D matrixes that are fluorescently labelled and hence allow the detection of their displacements and calculation of force vectors. The authors discover that tachyzoites move by a high degree of continuous constrictions reminiscent of those seen during cell invasion and probe not only wild type parasites but also two key mutants, which reveal a striking absence of the constrictions and changed trajectories.

      The manuscript is well written and should be understandable to the wide audience it is written for but I would encourage the authors to move some of their striking data from the supplement to the main figures.

      General critique:

      I could not open any of the movies (while those associated with the BioRXiv preprint were fine)

      I really dislike reviewing papers without line numbers

      The manuscript could be made more relevant to malaria researchers by briefly discussing red cell invasion by merozoites (a single constriction and force against the cell cortex), migration of ookinetes (multiple constrictions during mosquito gut penetration) and sporozoites (long distance migration), but this is not a must.

      I would limit reporting of numbers to two digits, e.g. instead of 46.3% make it 46%; 2.56 +/- 0.38 to 2,6 +/- 0,4 etc

      Further suggestions:

      Introduction: Millions of deaths, please rewrite, more like around 1 million from malaria and cryptosporidium; use citation (WHO)

      Motility: please don't mention flagella, which are used for swimming, in the same sentence / phrase / logic connection as lamellipodia, which are used for substrate based migration

      In Figure 1B, I can see one microsphere and it's not clear if it moves completely back to the original position. In the movie it looks like it goes completely back, maybe exchange the last panel of the figure with a last frame from the movie? Or maybe better: replace with frames from movie 2, which is more striking and shows many beads being displaced?

      Please add the entire figure S1 to Figure 1. This is important for readers to understand and 'deserves' full figure status. Same for Figure S2.

      I would encourage the authors to elaborate more on the data in Figure S2. It appears that motile parasites mostly did not exert forces above the level for non-motile parasites; for how much motility did they observe forces? The meaning of the x-axis is not clear. Are those individual parasites per time point or time points of one parasite or of the analyzed matrix volumes over several parasites? How many parasites were observed? This is stated more clearly later but needs to be done already here.

      Please change 0.042 um into 42 nm etc

      Please move some of the data in Figure S8 to the main figures e.g. Figure 4, where it would make a nice contrast / comparison to the mic2 mutant. Please also put a WT for comparison.

      I wonder if the defect in directional migration of the mic2 mutant is also partly due to the parasite not being able to squeeze through narrow matrix pores and hence being deflected more often. While I understand (and agree with) the authors' observation (interpretation) of the wt parasites not squeezing but pulling, it's hard to think that such squeezing would not still play a part.

      Discussion: Hueschen et al is now on BioRXiv

      The shaving off of antibodies could be brought into context with the work on sporozoites by Aliprandini Nat Micro 2018 and on trypanosomes by Engstler Cell 2007 (but not a must)

      Anterior-posterior flux: best experimental evidence for this is Quadt et al. ACS Nano 2016 for Plasmodium and Stadler MBoC 2017 for Toxoplasma. The common observations and differences could be discussed as they pertain to the current study

      The loss of mic2 could lead to the loss of the capability to form discrete adhesion sites that reveal themselves as the observed rings in 3D. I suggest to be careful to hypothesize that the absence of this and MyoA reveals a completely different motility mechanism. To me it seems more likely that the absence of the proteins means that the existing mechanism doesn't work perfectly any more, ie the highly tuned migration machinery misses a key part and malfunctions.

      Maybe reflect on whether 'search strategy' might be a better word than 'guidance system'

      Some of the movies could be combined to minimize download/open clicking sequences.

      Significance

      This manuscript provides both a truly remarkable technical advance and interesting insights into the way these parasites move, which will likely also be of relevance to the way other parasites of the same group of organisms move. Due to the uniqueness of eukaryotic gliding motility, its high speed and the importance of infection, this manuscript will be of general interest to cell biologists studying cell migration and to infectious disease researchers studying processes of pathogenesis. It will also appeal to biophysicists looking at cellular force generation. The paper is comparable in insight/relevance to recent work by Del Rosario et al, 2019; Pavlou et al., 2020, two studies that also use high end imaging and biophysical methods to understand parasite migration and invasion. My expertise: cell biology of parasites

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Comment 1.1: Figure 4. The figure legend and sub-figures are inconsistent. They do not match.

      Response 1.1: Apologies for the error. In the revised manuscript we have changed the order of panels in Figure 4 to make it consistent with the figure legends.

      Comment 1.2: Figure 4. For the Sanger sequencing trace of the edited HEK293 cells, why are there noise peaks?

      Response 1.2: It is a heterozygous knock-in, with only one allele carrying the mutation. Moreover, we sequenced a PCR product, hence the trace looks noisy.

      Comment 1.3: How many single cell clones were chosen for further analyses after CRISPR genome editing? The authors should do single cell filtering by Flow Cytometer or others.

      Response 1.3: We had one clone with heterozygous knock-in.

      Comment 1.4: The authors conducted RT-qPCR to quantify mRNA expression, RNA-Sequencing should be more accurate.

      Response 1.4: We had one clone with heterozygous knock-in, hence we used this clone for RT-qPCR. As Reviewer #3 suggested, RNA sequencing is not needed to show the effect of this mutation on genes in cis.

      Comment 1.5: The discussion is too long, please shorten.

      Response 1.5: In the revised manuscript we have shortened the discussion.

      Reviewer #1 (Significance):

      This study investigates the genetic and molecular mechanisms of intellectual disability (ID) by integrating whole genome sequencing and follow up functional explorations. The results provide novel insights into genetic aetiology of ID.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The manuscript by De Vas et al describes an investigation of the contribution of non-coding de novo variants to intellectual disability (ID). The authors perform whole genome sequencing (WGS) of 21 ID probands and both parents, and combine these data with WGS from 30 trios previously sequenced. The authors use publicly available data from the Roadmap Epigenomics project to identify sets of enhancers hypothesised to have a role in ID, such as fetal brain-specific enhancers and enhancers associated with known ID-associated genes. These enhancer sets are then tested for enrichment of non-coding de novo variants in ID, using publicly available de novo variant data from the Genome of the Netherlands (GoNL) project as a control comparison. The authors report that de novo variants in ID are significantly enriched within fetal brain-specific and human-gained enhancers. This is perhaps the main finding of the study. The authors also identify recurrent de novo variants in ID within clusters of enhancers that regulate the genes CSMD1, OLFM1 and POU3F3. A number of functional experiments are performed to provide further insights into the mechanisms by which de novo variants impact the expression of putative target genes; for example, data are provided that show that de novo variants observed in ID within a SOX8 enhancer lead to reduced expression of the SOX8 gene. In conclusion, the authors claim that their data support de novo variants in fetal brain enhancers as contributing to the aetiology of ID.

      Major comments.

      The study uses leading edge genomic technologies to generate WGS in a new ID sample, which is used to investigate the role of non-coding variants to ID aetiology. The manuscript is in general very well written. However, a weakness of the study is a very small sample size, which should result in low statistical power. Despite this power consideration, the authors report very strong P values for their main findings. My main concern with the study is that the methodology used to evaluate enrichment of de novo variants within specific sets of enhancers is unclear, and therefore as it currently stands, I am unable to be confident in the findings. I am also concerned about whether data from the Genome of the Netherlands project is a suitable control comparison, given technical differences that are likely to exist between this and the ID data set. I further explain these methodological concerns below:

      Comment 2.1: When testing for the enrichment of de novo variants, the most commonly used approach in the field involves testing whether the observed number of de novo variants in a given genomic region is greater than the number expected by chance, using a Poisson test. Here, the expected number of de novo variants is derived from trinucleotide mutation rates. This method was first proposed by Samocha et al 2014. The current authors use trinucleotide mutation rates to estimate the expected number of de novos among enhancer sets, and cite the Samocha paper, but my understanding is that they do not use a Poisson test to evaluate enrichment. Instead, they use the expected number of mutations among the enhancer sets to normalise the observed number of de novo variants, but it is not clear to me why this is performed, and also what data and the statistical test is actually being used to evaluate de novo variant enrichment? I can guess at what they have done, but the methods section outlining this test should be more clearly explained.

      Response 2.1: The Samocha et al 2014 paper provides a statistical framework to estimate the expected number of DNMs under neutral evolution. However, our aim was not to estimate the enrichment of DNMs in fetal brain enhancers relative to a background mutation rate (see the answer to the next comments for a detailed explanation). Our aim was to investigate whether in our ID cohort DNMs were enriched in the enhancers that are specifically active in the fetal brain or the enhancers that are active in specific subsections of the adult brain. Hence, we compared the number of DNMs in the fetal brain enhancers (fetal brain enhancers and human gain enhancers) with the number of DNMs in enhancers of various subsections of the adult brain. In Table S6 of the revised manuscript, we have highlighted the values that were used for the statistical test. We used a t-test to estimate whether fetal brain enhancers were enriched for ID DNMs as compared to adult brain enhancers.

      However, as pointed out by the reviewer in comment 2.3, the sequence composition and overall size in base pair vary significantly between fetal brain enhancers, human gain enhancers and enhancers from adult brain subsections thus they may have different background mutation rates. Hence, before doing any comparison between DNMs in various enhancer sets (fetal vs adult), it is important to normalise them to the same background mutation rate for valid comparison. Hence, we used the framework provided in Samocha et al 2014 paper to estimate the background mutation rate of various enhancer sets and normalised them to the background mutation rate of fetal brain-specific enhancer set.

      For example, the background mutation rate for fetal brain-specific enhancers is 0.970718 and we observed 53 DNMs. Similarly, the background mutation rate for the adult brain sub-section angular gyrus is 0.680226 and we observed 22 DNMs. Because of the difference in background mutation rate, we cannot directly compare the number of DNMs between fetal brain enhancers and angular gyrus. Hence, we normalised the observed number of DNMs in angular gyrus enhancers to a background mutation rate of 0.970718 using the following formula.

      (Observed number of DNMs in angular gyrus enhancers x mutation rate of fetal brain enhancers) / mutation rate of angular gyrus enhancers

      (22 x 0.970718) / 0.680226 = 31.395

      Similarly, we normalised the observed number of mutations from all adult brain subsections and human gain enhancers to a background mutation rate of 0.970718 (Table S6) so that we could perform a valid comparison between the observed number of mutations from various enhancer sets.
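      The normalisation described above can be written as a one-line rescaling. The following is a minimal sketch, not the authors' code: the function and variable names are illustrative assumptions, and only the two mutation rates and the count of 22 DNMs come from the response.

```python
def normalise_dnm_count(observed, set_rate, reference_rate):
    """Rescale an observed DNM count to a reference background mutation rate.

    observed       -- DNMs observed in the enhancer set of interest
    set_rate       -- background mutation rate of that enhancer set
    reference_rate -- background mutation rate to normalise to
    """
    if set_rate <= 0:
        raise ValueError("set_rate must be positive")
    return observed * reference_rate / set_rate


# Values quoted in the response; the constant names are illustrative.
FETAL_RATE = 0.970718          # fetal brain-specific enhancers
ANGULAR_GYRUS_RATE = 0.680226  # angular gyrus enhancers

normalised = normalise_dnm_count(22, ANGULAR_GYRUS_RATE, FETAL_RATE)
print(round(normalised, 3))  # 31.395, matching the worked example
```

      Normalising the reference set to its own rate leaves its count unchanged, so the 53 fetal brain enhancer DNMs enter the comparison as observed.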

      In the revised manuscript, we have revised the method section to make it clearer (Page 23, line 23 to page 24, line 19).

      Comment 2.2: Can the authors please explain why they did not use the standard de novo variant enrichment approach outlined in Samocha et al 2014, which is used in similar non-coding de novo studies of ID (e.g. Short et al 2018 Nature)? My concern is that using the Samocha approach, no enrichment would be observed in fetal brain enhancers, given the data presented in supplemental table S6.

      Response 2.2: The Samocha et al 2014 paper provides the statistical framework to evaluate the rates of de novo mutation (DNM) assuming neutral selection. The variants that lead to disease (functional variants) tend to be under negative selection. Thus, the region or a gene that is devoid of functional variants is likely to reflect a region or a gene that is under selective constraint. The functional variants in such regions or genes are likely to cause disease (Samocha et al, 2014, Nature Genetics). This approach was used to identify genes that are intolerant to loss of function mutations (Lek et al, 2016, Nature).

      As we discussed in the manuscript, due to the triplet codon structure it is relatively easy to predict functional consequences of DNMs in protein-coding regions of the genome, thus it becomes easy to distinguish likely functional variants from non-functional variants. Please note that only protein-truncating and damaging missense (potentially functional) coding DNMs show enrichment in NDD and not non-functional DNMs.

      In non-coding regions of the genome, in the absence of a codon-like structure, it is extremely challenging to distinguish potentially functional variants from non-functional variants. A very small proportion of the DNMs that overlap enhancer regions might be truly functional (under selective pressure) and the majority might be non-functional (neutral). Hence, it is not possible to achieve statistical significance using the Samocha et al 2014 framework for enhancer DNMs with a small cohort when the enhancer set contains a mixture of functional (a small fraction) and non-functional (a large fraction) DNMs. An analogy for the protein-coding region would be applying the Samocha et al 2014 framework to all protein-coding variants including synonymous mutations, which may not show enrichment of DNMs in the disease cohort.

      Given the small sample size and non-availability of tools and techniques to separate functional non-coding variants from non-functional variants, we did not use Samocha et al 2014 framework to show the enrichment of DNMs in fetal brain enhancers. Instead, we asked a simple question, out of fetal and adult brain enhancer sets which one is enriched for DNMs in the ID cohort?

      In the revised manuscript, in the abstract (Page 3, line 8) we changed the sentence to clarify that the enrichment of ID DNMs in fetal brain enhancers was against the adult brain enhancers.

      Comment 2.3: In Supplemental table S6, the normalised expected number of de novo variants across all different enhancer sets within the ID and GoNL samples is the same. Can the authors clarify why this is the case, as presumably these sets contain very different genomic sequences, and therefore one would not expect the same number of DNMs?

      Response 2.3: See the detailed explanation in answer to comment 2.1. As we normalised the observed number of DNMs from various enhancer sets to the background mutation rate of fetal brain enhancers (0.970718), the expected number of DNMs (number of samples x mutation rate, 47 x 0.970718 = 45.623746) is the same for all enhancer sets.

      Comment 2.4: Instead of using the standard enrichment approach proposed by Samocha et al 2014, the authors compare the rates of de novo variants in ID to those reported in the GoNL study. However, very little information is provided about the de novo variant data from the GoNL. Presumably, the GoNL and the current study used different approaches to sequence samples, call variants, and QC the data. Also, is the coverage across these studies comparable? All these factors will contribute to batch effects, and therefore I am not convinced that the GoNL study is an appropriate control comparison. The authors should provide data to reassure the reader that these samples can be compared. For example, are similar rates of de novo variants found between these samples for variants in null enhancers sets? To clarify, an equivalent analysis in exome sequencing studies would be to show that the rates of synonymous variants are the same across data sets.

      Response 2.4: We would like to point out that we haven’t performed a direct comparison between our ID cohort and the GoNL cohort. We are aware that there are technical differences between DNM identification in our cohort and the GoNL cohort. The GoNL genomes were sequenced on Illumina HiSeq 2000 with 13X coverage, while the ID cohort reported in this study was sequenced on the Illumina HiSeq X10 platform with an average coverage of 37X. Hence, we did not perform a direct comparison between our ID cohort and the GoNL cohort.

      We evaluated the enrichment of DNMs in fetal brain-specific enhancers compared to adult brain-specific enhancers independently within the ID and GoNL cohorts. We compared the number of DNMs in fetal brain enhancers vs adult brain enhancers within the ID cohort. We observed a significant enrichment of DNMs in fetal brain-specific enhancers as compared to adult brain enhancers in the ID cohort. Next, we asked whether the DNMs from healthy individuals also show enrichment in fetal brain-specific enhancers or whether this enrichment was specific to the ID cohort. To answer this question, we used the GoNL cohort and performed a comparison between fetal brain enhancers and adult brain enhancers within the GoNL cohort. We did not find any enrichment in fetal brain enhancers. Because the analysis is performed independently within each cohort between fetal and adult brain enhancers, the technical differences between the two datasets would not have any effect on the results.

      To make it clear, we have changed the text in the revised manuscript (Page 8, lines 1-4). We have also changed a sentence in the abstract from “We found that regulatory DNMs were selectively enriched in fetal brain-specific and human-gained enhancers.” to “We found that regulatory DNMs were selectively enriched in fetal brain-specific and human-gained enhancers as compared to adult brain enhancers.”

      Comment 2.5: The replication analysis of enhancer clusters that are recurrently hit by de novo variants in ID is weak. For enhancer clusters with recurrent de novo variants in their ID cohort, the authors simply report the number of de novo variants observed in these enhancers in the Genomics England cohort, but they do not test whether the observed number in Genomics England is greater than that expected. For their findings to be replicated, they need to show the de novo rate is statistically above expectation.

      Response 2.5: To improve the replication analysis, we estimated the expected number of DNMs in the Genomics England cohort (n=3,169) in CSMD1, OLFM1 and POU3F3 enhancer clusters using the framework defined in the Samocha et al 2014 paper and estimated statistical significance using a Poisson test. We found that the POU3F3 enhancer cluster was significantly enriched for DNMs even after multiple test corrections. We included these findings in the revised manuscript (Page 12, lines 24-27). In addition, we applied the Samocha et al framework to CSMD1, OLFM1 and POU3F3 enhancer clusters in our ID cohort as well. We found that all three enhancer clusters were enriched for DNMs after multiple test correction.
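      For readers unfamiliar with the test named in Response 2.5, the sketch below shows a one-sided Poisson enrichment test: the probability of seeing at least the observed number of DNMs given the expected number. It is an illustration under stated assumptions, not the authors' implementation. The cohort size, per-individual rate and observed count are hypothetical placeholders, and the expected count is taken as cohort size times per-individual rate, as in Response 2.3.

```python
import math


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected).

    Computed as 1 - P(X <= observed - 1) by direct summation, which is
    adequate for the small expected counts typical of a single enhancer
    cluster. Very small p-values lose precision with this approach.
    """
    if expected < 0:
        raise ValueError("expected must be non-negative")
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical numbers for illustration only (not from the manuscript).
n_samples = 1000
rate_per_sample = 0.002  # assumed expected DNMs per individual in one cluster
observed = 8

expected = n_samples * rate_per_sample
p_value = poisson_upper_tail(observed, expected)
print(f"expected={expected:.2f}, observed={observed}, p={p_value:.4g}")
```

      When several enhancer clusters are tested, each p-value would then be passed through a multiple-testing correction.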

      Minor comments:

      Comment 2.6.1: The authors state that all coding de novos were validated by Sanger sequencing, but what about the non-coding de novos? Validation of the specific mutations that contribute to the main findings would strengthen the paper.

      Response 2.6.1: The potentially pathogenic coding variants were validated using Sanger sequencing by clinicians to report our findings to respective families. However, as non-coding DNMs could not be reported back to families as a diagnosis until the pathogenicity of these DNMs is fully established, clinicians (who have patients' DNA) are reluctant to perform Sanger sequencing to confirm the DNM. However, we have investigated each non-coding variant reported in the manuscript in IGV and their pattern looks similar to the validated coding DNMs, hence we are confident that they are true DNM calls.

      Comment 2.6.2: In the introduction, the line "A family with two affected siblings was analysed for the presence of recessive variants" seems out of place and incomplete, as there is no mention of the results from this analysis.

      Response 2.6.2: Sorry for the error, we have removed this sentence from the manuscript.

      Comment 2.6.3: In the discussion, they write "It is noteworthy that in protein-coding regions of the genome, only protein-truncating variants (PTV), but not other protein-coding mutations, show significant enrichment in neurodevelopmental disorders (11,41)". This is not true. In Kaplanis et al 2020, damaging missense variants are robustly shown to contribute to NDDs (see SM figure 3 for example).

      Response 2.6.3: Thank you very much for pointing out the fact that the damaging missense mutations contribute to the NDD. We have changed the sentence in the revised manuscript and included damaging missense in the sentence (Page 16, lines 21-23).

      Comment 2.6.4: The data availability statement is weak. Many similar studies have deposited sequencing data from NDD cohorts to appropriate repositories.

      Response 2.6.4: We agree with the reviewer's suggestion, however, due to the restrictions of ethical approval, we may not be able to deposit sequence data to public databases even with controlled access.

      Comment 2.6.5: The authors should consider making the code used for their analysis open source, as this would help clarify some of the methodological questions I, and others, may have.

      Response 2.6.5: We have made the code used to calculate the expected number of DNMs for a given set of enhancers and cohort size available on GitHub (https://github.com/santoshatanur/expDNM).

      Reviewer #2 (Significance):

      This is an important area of research, as the fraction of ID explained by non-coding variants is unknown. However, the very small sample size, especially when compared with other sequencing studies of NDDs in the literature, unfortunately limits the significance of the advance. Nevertheless, if the authors can show that the results reported in the paper are robust, then the findings will be of interest to both researchers and clinicians studying NDDs.

      My area of expertise is in the generation and analysis of sequencing data to study psychiatric and neurodevelopmental disorders. I have a lot of experience analysing exome sequencing data from proband-parent trios. I do not have experience with CRISPR, so I have not commented on that part of the study.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary

      In this manuscript, Vas and Boulet et al. present the potential regulatory role of de novo mutations (DNMs) in intellectual disability (ID). They performed whole-genome sequencing in an ID cohort including 21 ID probands and their healthy parents. To study the regulatory DNMs in ID, they combined 17 ID probands without pathogenic coding DNMs with a previous cohort including 30 exome-negative ID cases. Leveraging their DNM dataset with a variety of epigenomic datasets, they observed ID DNMs were enriched more within fetal brain enhancers than adult brain enhancers. They also detected that the enhancers harboring ID DNMs showed promoter-enhancer interactions for the ID-relevant genes. Moreover, they identified recurrent mutations within enhancer clusters associated with CSMD1, OLFM1, and POU3F3 genes, when combining with larger pre-existing databases of genetic variants. Finally, they found that many ID DNMs were predicted to disrupt binding motifs of TFs, and experimentally validated the regulatory function of some of these loci. They showed the allele-specific activity for an enhancer region including an ID DNM for the SOX8 gene via luciferase assay as an episomal assay. They further showed that the same enhancer region regulates SOX8 expression by performing CRISPRi, and also proved the allele-specific impact of the same DNM via genome editing with CRISPR/Cas9.

      Major

      Comment 3.1: The sample size of the Whole Genome Sequencing conducted in this study is extremely limited, and therefore the conclusions that can be drawn from the study are also extremely limited. The authors combined their data with existing cohorts for a subset of analyses, however, the novelty and utility of the findings from this cohort alone is limited.

      Response 3.1: The fact that our sample size is small has been sufficiently addressed in the manuscript. However, we have applied robust statistical methods and used state-of-the-art experimental techniques to support our findings. Even with the smaller sample size, we show that the DNMs in ID patients are enriched in fetal brain enhancers as compared to adult brain enhancers. We identified three enhancer clusters with recurrent mutations and one of them was replicated in a large cohort. Because our sample size was small, we performed extensive experimental validations. We show that nine DNMs, from nine different ID patients, that are located in fetal brain enhancers show allele-specific expression. Furthermore, we show that the SOX8 enhancer DNM indeed affects SOX8 expression using CRISPR knock-in. Though our sample size is small, with strong experimental support, we believe our findings are widely applicable.

      Comment 3.2: Multiple testing burden must be considered when conducting enrichment studies in genomic regions using WGS data. Unfortunately, it is not considered here and without this the observed enrichment is not convincing. See for example https://www.nature.com/articles/s41588-018-0107-y.

      Response 3.2: In the manuscript, we have presented the outputs of multiple independent analyses where we applied different statistical tests. In any analysis, if more than one hypothesis was tested, we applied multiple test correction. In the manuscript, we clearly mentioned whether the test is significant at a nominal p-value or after multiple test corrections. For example, enrichment analysis for developing cortex and prefrontal cortex cell types. Here we mention that “On the contrary, all four developed human brain cell types showed significant enrichment for ID DNMs compared to GoNL DNMs in promoter regions after correcting for multiple tests.” (Page 11, lines 12-14).

      However, we agree that in the original manuscript we did not apply the multiple test correction for fetal vs adult brain enrichment analysis. In the revised manuscript, we have now applied multiple test corrections for fetal vs adult brain enrichment analysis. To achieve uniformity throughout the manuscript, we used the R function p.adjust to estimate the false discovery rate (FDR) after multiple test corrections for all the analyses where more than one hypothesis test was performed.

      • DNM enrichment in fetal vs adult brain enhancers
      • Enrichment of known ID genes in the genes associated with the DNM-containing fetal brain enhancers
      • DNM enrichment analysis for developing brain and developed brain cell types
      • Recurrent DNMs in enhancer clusters
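      To make the FDR step concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment. It assumes the "BH" method was selected, since the response does not state which p.adjust method was used (the R default is "holm"). The function name and the p-values are illustrative, not values from the manuscript.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values are
    # monotone: each is the minimum of p * m / rank over equal or higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical nominal p-values, one per hypothesis tested in an analysis.
nominal = [0.001, 0.008, 0.039, 0.041, 0.20]
fdr = benjamini_hochberg(nominal)
for p, q in zip(nominal, fdr):
    print(f"p={p:.3f}  FDR={q:.5f}")
```

      With these inputs the third and fourth hypotheses share an adjusted value of 0.05125, which illustrates how a nominally significant result (p = 0.039) can fall just short after correction.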

      The gene ontology enrichment and tissue enrichment analysis for genes associated with the DNM-containing enhancers were performed using the web-based tool Enrichr (https://maayanlab.cloud/Enrichr/), which applies Bonferroni correction for all the tests. Similarly, tissue enrichment analysis for transcription factors whose binding sites were disrupted by the DNM was also performed using Enrichr. Hence for both of these analyses, p-values provided by Enrichr were reported in the manuscript.

      The enrichment analysis of genes that are intolerant to loss of function mutations in genes associated with DNM-containing enhancers was a single test so multiple test correction was not applied.

      In the revised manuscript, we have now applied multiple test correction to all the analyses where it was appropriate to apply. In the revised manuscript, we have now mentioned the statistical test used, the p-value obtained and the FDR for all the statistical tests.

      Comment 3.3: The total number of promoter enhancer interactions as shown in Figure 2 is unbelievably high. The number of gene loops previously detected using Hi-C is much lower. This analysis seems to assign every enhancer in the region to the promoter within a TAD, which is much too liberal an analysis and not consistent with number of gene loops detected via Hi-C or eQTL work.

      Response 3.3: As explained in detail in the manuscript, to identify enhancer-promoter interactions we used promoter capture Hi-C data and the correlation of H3K27ac signal across 127 tissues/cell types available through the Roadmap Epigenomics project. On average each enhancer was associated with 1.64 genes and each gene was interacting with 4.83 enhancers. These findings were consistent with previous reports of enhancer-promoter interactions (25). We added this to the revised manuscript (Page 8, lines 25-27).

      The specific genes presented in Figure 2 might have a higher number of enhancers associated with them because of the specific genomic architecture in those regions. For example, the TAD containing the CSMD1 gene is a single gene TAD.
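      Summary figures like the per-enhancer and per-gene averages quoted in Response 3.3 can be derived from a table of enhancer-gene links. The sketch below assumes the links are available as (enhancer, gene) pairs; the identifiers are hypothetical and the function is not part of the authors' pipeline.

```python
from collections import defaultdict


def mean_degrees(pairs):
    """Return (mean genes per enhancer, mean enhancers per gene).

    `pairs` is an iterable of (enhancer_id, gene_id) links; duplicate
    links are counted once.
    """
    genes_by_enhancer = defaultdict(set)
    enhancers_by_gene = defaultdict(set)
    for enhancer, gene in pairs:
        genes_by_enhancer[enhancer].add(gene)
        enhancers_by_gene[gene].add(enhancer)
    if not genes_by_enhancer:
        raise ValueError("no enhancer-gene links supplied")
    n_links = sum(len(genes) for genes in genes_by_enhancer.values())
    return n_links / len(genes_by_enhancer), n_links / len(enhancers_by_gene)


# Hypothetical links for illustration only.
links = [
    ("enh1", "GENE_A"),
    ("enh1", "GENE_B"),
    ("enh2", "GENE_A"),
    ("enh3", "GENE_A"),
    ("enh3", "GENE_A"),  # duplicate link, counted once
]
per_enhancer, per_gene = mean_degrees(links)
print(f"{per_enhancer:.2f} genes per enhancer, {per_gene:.2f} enhancers per gene")
```

      A gene in a single-gene TAD, as described for CSMD1, would have many more linked enhancers than the mean.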

      Comment 3.4: Because the total number of DNMs are few, I would recommend moving genomic annotations to hg38 rather than losing 123 DNMs via liftover to hg19.

      Response 3.4: As we mentioned in the manuscript, we used a large amount of publicly available epigenomic datasets which are mostly available in hg19. To move the analysis to hg38 we would need to lift over all the epigenomic datasets to hg38, which is much more complicated than lifting over the DNMs to hg19.

      Comment 3.5: The source of the neural progenitors used in the experiments is not described.

      Response 3.5: We have differentiated hESC (H9) to NPCs, methods are now detailed in the manuscript under the heading “NPC culture protocol” (Page 29, lines 22-25).

      Comment 3.6: The non-targeting or control gRNA is not described.

      Response 3.6: Control gRNA is now described in the method (Page 30, line 7).

      Comment 3.7: It's difficult to transfect both neural progenitors and neurons, it would be useful to see images of GFP expression if this is on the plasmid to know the degree of transfection efficiency and give greater confidence in the results presented in Figure 4.

      Response 3.7: We agree it is difficult to transfect these cells. Hence, we transfected NPCs and then selected the transfected cells using antibiotics (detailed in the manuscript methods section, Page 31, lines 6-7).

      Comment 3.8: The specific instances where a one-tailed statistical test was used need to be highlighted.

      Response 3.8: Apologies for the error, we used a two-tailed t-test throughout the manuscript. The method section is corrected accordingly.

      Comment 3.9: At page 11, the authors stated "As enhancer regions of none of the human brain cell types showed significant enrichment for ID DNMs, we concentrated on DNMs overlapping enhancers from the bulk fetal brain for downstream analysis." However, cell-type-specific enhancer enrichment analysis vs fetal brain enhancer enrichment are two different analyses. The authors did not test if the ID DNMs were enriched more in fetal brain enhancers than control DNMs were. They only compared enrichment of ID DNMs and control DNMs in fetal vs adult brain enhancers. Hence, this statement was not clearly justified. It would be improved by performing a Fisher's exact test to assess if ID DNMs showed more enrichment within fetal bulk brain enhancers than control DNMs did, similar to the cell-type-specific enrichment analysis.

      Response 3.9: Thank you very much for pointing this out. In the revised manuscript, we have removed the above-mentioned sentence from the manuscript.
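      The comparison suggested in Comment 3.9 was not carried out in the response. Purely as an illustration of what it would involve, the sketch below computes a one-sided Fisher's exact test on a 2x2 table of DNM counts (ID vs control, inside vs outside fetal brain enhancers). The one-sided alternative, the function name and all counts are assumptions made for this example; none of the numbers come from the manuscript.

```python
import math


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, under the hypergeometric null with fixed
    margins, that the top-left cell is at least as large as `a`.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = math.comb(row1 + row2, col1)
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        # math.comb returns 0 when col1 - x exceeds row2, so impossible
        # tables contribute nothing to the sum.
        p += math.comb(row1, x) * math.comb(row2, col1 - x) / denom
    return min(1.0, p)


# Hypothetical counts for illustration only.
id_in, id_out = 12, 88        # ID DNMs inside / outside fetal brain enhancers
ctrl_in, ctrl_out = 5, 95     # control DNMs inside / outside

p_value = fisher_exact_greater(id_in, id_out, ctrl_in, ctrl_out)
print(f"one-sided p = {p_value:.4f}")
```

      A comparison of this kind between cohorts would remain sensitive to the differences in sequencing platform and coverage noted in Response 2.4.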

      Comment 3.10: At page 13, the authors indicated that "The fetal brain enhancer DNMs from ID probands frequently disturbed putative binding sites of TFs that were predominantly expressed in neuronal cells (P = 0.022; Table S12b). Our results suggest that the enhancer DNMs from ID probands were more likely to affect the binding sites of neuronal transcription factors and could influence the regulation of genes involved in nervous system development through this mechanism." How this conclusion is drawn is unclear. Table S12b includes three cell-types with identical p-values and odds ratios based on a statistical test. How could the authors get identical parameters for all cell-types? Which dataset was used to compare the expression of these transcription factors? Were transcription factors also expressed in non-neuronal cell-types? I would request the authors to clarify the analysis performed here in the methods section, and to compare the expression of TFs in other cell-types in order to conclude as "TFs that were predominantly expressed in neuronal cells". Also, this analysis would be improved by assessing the overlap of DNM-disturbed putative binding sites with cell-type-specific ATAC-seq peaks, i.e. whether they were enriched more within neuronal ATAC-seq peaks than non-neuronal ATAC-seq peaks.

      Response 3.10: The results presented in the manuscript are the output of the tissue/cell type expression analysis performed using the web-based tool Enrichr (http://amp.pharm.mssm.edu/Enrichr/). In the method section of the original manuscript under the heading “Gene enrichment analysis”, we described that the “Gene ontology enrichment and tissue enrichment analysis were performed using the web-based tool Enricher (http://amp.pharm.mssm.edu/Enrichr/))”.

      To estimate the tissue specificity of gene expression, Enrichr uses gene expression data from the ARCHS4 project, which contains processed RNA-seq data from over 100,000 publicly available samples profiled on the two major deep sequencing platforms, HiSeq 2000 and HiSeq 2500.

      In supplementary table 12b of the original manuscript, we presented only cell types that showed significant enrichment. However, in the revised manuscript, we have provided a list of all the tissues and cell types tested by Enrichr along with corresponding p-values. Except for the neuronal cell types, none of the tissues and cell types showed statistically significant enrichment.

      Furthermore, for clarity, we have separated the various gene and tissue enrichment analyses under different headings and provided a detailed explanation in the Methods section of the revised manuscript. The analysis of tissue specificity of transcription factor expression is now described under the heading “Enrichment of analysis for tissue/cell type expression of transcription factors whose binding site were affected by enhancer DNM” (Page 27, lines 10-17) and in the main text as well (Page 9, lines 14-17).

      Comment 3.11: The authors randomly selected DNMs from 11 ID patients that were predicted to alter TFBS affinity for experimental validation in the luciferase assay. Were the allele-specific impacts of DNMs shown in Figure 3 consistent with the predicted impact via motifbreakR? Given that the authors prioritized the regulatory ID DNMs based on motifbreakR results for the experimental validation, I would request the authors to evaluate if the alleles disrupting a TF motif that mainly has activator/repressor function also showed lower/higher luciferase activity. That would help to support the evidence for the regulatory function of other ID DNMs predicted to be TF disruption but which could not be experimentally validated.

      Response 3.11: Thank you very much for the excellent suggestion. We evaluated whether alleles disrupting the motif of a TF that mainly acts as an activator or repressor also showed lower or higher luciferase activity, respectively. The picture is more complex because, of the nine DNMs that showed allele-specific activity, only five disrupt a TF motif, while the other four result in the gain of a TF binding site.

      Of the five that disrupt a TF binding site, two disrupt the binding site of an activator (SP1 and CREB1) and both show reduced luciferase activity, while two disrupt the binding site of a repressor or negative regulator (TCF7L1 and FOXN1) and both show increased luciferase activity. One DNM disrupts the binding site of the histone acetyltransferase EP300 and shows reduced luciferase activity.

      Of the four DNMs that result in a gain of a transcription factor binding site, two create a binding site for an activator (HBP1 and BPTF) and show increased luciferase activity. Of the two gain-of-TFBS DNMs that show reduced activity, one creates a TFBS for MAFB, which can act as both a repressor and an activator, while the second creates a TFBS for HOXD13, for which we have not found any support for repressive activity. Taken together, eight out of nine DNMs show increased or decreased luciferase activity that matches the known role of the TF whose binding site was disrupted or created by the DNM.

      In the revised manuscript, we added two additional columns in Table S13 indicating the role of the transcription factor (activator/repressor) and luciferase activity (gain or loss). Furthermore, we included the following text in the manuscript “Furthermore, for the majority of the DNMs (8 out 9) the allele-specific activity was consistent with the predicted effect of the MotifBreakR (Table S13). For example, CSMD1 enhancer DNMs disrupt the binding site of TCF7L1, a transcriptional repressor and luciferase assay shows that the mutant allele results in a gain of enhancer activity.” (Page 14, lines 16-19).
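
      The 8-out-of-9 tally in Response 3.11 can be reproduced with a short sketch. The TF names, roles, and luciferase directions are transcribed from the response text; the consistency rule itself is an assumed reading of that text (losing an activator site or gaining a repressor site lowers activity, the reverse raises it, and a dual-function TF is compatible with either direction). Treating EP300 and HOXD13 as activating is likewise an assumption made for illustration.

```python
# (TF, effect of the DNM on the binding site, TF role, observed luciferase change)
observations = [
    ("SP1",    "loss", "activator", "decrease"),
    ("CREB1",  "loss", "activator", "decrease"),
    ("TCF7L1", "loss", "repressor", "increase"),
    ("FOXN1",  "loss", "repressor", "increase"),
    ("EP300",  "loss", "activator", "decrease"),  # acetyltransferase, assumed activating
    ("HBP1",   "gain", "activator", "increase"),
    ("BPTF",   "gain", "activator", "increase"),
    ("MAFB",   "gain", "dual",      "decrease"),  # described as repressor and activator
    ("HOXD13", "gain", "activator", "decrease"),  # no reported repressive role
]

def expected_changes(effect, role):
    """Return the set of luciferase changes compatible with the TF's role."""
    if role == "dual":
        return {"increase", "decrease"}
    is_activator = role == "activator"
    # Loss of an activator site or gain of a repressor site should lower activity.
    if (effect == "loss") == is_activator:
        return {"decrease"}
    return {"increase"}

inconsistent = [
    tf for tf, effect, role, observed in observations
    if observed not in expected_changes(effect, role)
]
consistent_count = len(observations) - len(inconsistent)
print(f"{consistent_count} of {len(observations)} consistent; exceptions: {inconsistent}")
```

      Under these assumptions the only exception is the HOXD13 gain-of-site DNM, in line with the response.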

      Comment 3.12: At page 24 in the methods section, the authors defined the control DNMs set as "We downloaded de novo mutations identified in the healthy individuals in genomes of the Netherland (GoNL) study (21) from the GoNL website". Does DNM set from GoNL also include protein-truncating mutations? If it does, are there any de novo mutations that were previously also found in any other neurodevelopmental condition as being pathogenic or likely pathogenic? If it includes both protein-truncating de novo mutations and noncoding DNMs, the two datasets used for the analysis described in Figure 1 would not be appropriately comparable to conclude that regulatory DNMs in ID were enriched in fetal brain enhancers whereas control DNMs enriched in adult brain enhancers. In which enhancer category (fetal or adult) ID DNMs would be enriched if the same analysis is performed by using both protein-truncating and regulatory DNMs? I would request the authors to evaluate the possibility that regulatory DNMs were enriched more in fetal brain enhancers compared to adult brain regardless of disease status, if the GoNL control group includes both protein-truncating and regulatory DNMs. Also, as described in the previous statement, if control DNMs include only regulatory DNMs or both protein-truncating+regulatory DNMs is not clear. This analysis would also be improved by restricting control DNMs into regulatory DNMs.

      Response 3.12: Of 11,020 GoNL DNMs, only six were protein-truncating. None of these six protein-truncating DNMs is reported as pathogenic or likely pathogenic in ClinVar for any neurodevelopmental disorder or any other disease. All 47 ID samples are coding-negative, meaning that they carry no pathogenic or likely pathogenic coding DNM (protein-truncating or damaging missense). Similarly, none of the GoNL samples has any pathogenic or likely pathogenic DNM. Hence, the comparison between the ID cohort and the GoNL cohort is valid.

      However, as suggested by the reviewer, we performed additional analyses. i) We repeated the enrichment analysis after removing the six protein-truncating DNMs from the GoNL cohort, and the results did not change. ii) We excluded all protein-coding DNMs, both synonymous and non-synonymous, from both cohorts (retaining only non-coding DNMs), and again the results did not change.

      The number of DNMs that overlapped fetal brain enhancers and adult brain enhancers did not change in any comparison. This is because the protein-coding regions of the genome and the fetal and adult brain enhancers are mutually exclusive; they do not overlap. Therefore, the inclusion or exclusion of protein-truncating DNMs in the enhancer enrichment analysis did not affect the results.

      Comment 3.13: At page 14, the authors indicated that "In the heterozygous mutant clone, the SOX8 gene showed a significant (P = 0.0301) reduction in expression levels, however, no difference was observed in expression levels of the LFM1 gene (P = 0.8641; Fig. 4d), suggesting that the enhancer specifically regulates the SOX8 gene but not the LFM1 gene." based on the knock-in experiment for DNM. However, they did not show how CRISPRi of the enhancer which is the promoter for LFM1 impacted on LFM1 gene expression as they provided for the SOX8 gene in Figure 4b. I would request the authors to rephrase the statement as "the regulatory impact of DNM within the enhancer is specific for SOX8 but not for LFM1", or provide evidence that LFM1 expression levels did not change after the CRISPRi experiment. Also, if the CRISPRi experiment would not show any change in LFM1 expression, I would also request the authors to interpret what could be potential factors for that a regulatory sequence in a gene promoter would not impact its expression.

      Response 3.13: As suggested by the reviewer, we have rephrased the sentence to “the regulatory impact of DNM within the enhancer is specific for SOX8 but not for LMF1”. (Page 15, lines 22-23)

      We would like to point out that the DNM-containing enhancer is not located in the promoter region of the LMF1 gene, but it is located downstream of the gene as LMF1 is on the reverse strand. The genes SOX8 (forward strand) and LMF1 (reverse strand) share a promoter region as they are transcribed in the opposite direction. The DNM-containing enhancer that interacts with the promoter region of both SOX8 and LMF1 is located downstream of the LMF1 gene. The region where gRNA was targeted for the CRISPRi experiment was approximately 10.5kb away from the 3’ end of the LMF1 gene.

      Comment 3.14: The authors utilized neuroblastoma cells for luciferase assay, neuronal progenitor cells for CRISPRi, and HEK293T cells for genome editing CRISPR/Cas9 experiments. Given the cell-type-specificity of active regulatory elements, I would request the authors to provide more justification for the utilization of different cell types for each assay. More specifically, LMF1 gene expression did not alter, albeit DNM's position in the gene promoter in Figure 4d. Could it be due to the low expression level of cell-type-specific transcription factors in HEK cells? Showing that expression levels of TFs whose binding motifs were disrupted via DNM at the region are comparable between HEK cells vs neuronal cells would be helpful here.

      Response 3.14: We set out to perform the studies in neuroblastoma cells and validate the findings in NPCs. However, due to the difficulty in performing precise editing of a single nucleotide in neuroblastoma cells/NPCs, we have used HEK293T cells (Page 15, lines 12-14).

      As described in the manuscript and in the answer to the previous question, the DNM-containing enhancer is not located in the promoter region of the LMF1 gene (the promoter is near the SOX8 gene); it is located downstream of the gene, as LMF1 is on the reverse strand of the genome. The region targeted by the gRNA in the CRISPRi experiment was approximately 10.5kb away from the 3’ end of the LMF1 gene, not in the promoter region of the LMF1 gene.

      Comment 3.15: Citation of many datasets are missing throughout the text including the (1) expression data in prefrontal cortex in the sentence at page 9 ".. but also predominantly expressed in the prefrontal cortex", (2) again expression data from neuronal datasets in the sentence at page 13 "The fetal brain enhancer DNMs from ID probands frequently disturbed putative binding sites of TFs that were predominantly expressed in neuronal cells", (3) NPCs in the sentence at page 14 "To investigate whether the putative enhancer of the SOX8/LMF1 gene indeed regulates the expression of the target genes, we performed CRISPR interference (CRISPRi), by guideRNA mediated recruitment of dCas9 fused with the four copies of sin3 interacting domain (SID4x) in the neuronal progenitor cells (NPCs).", and (4) H3K27ac and H3K4me1 datasets used in Figure 4e and described at page 14 in the sentence "Hence, we investigated H3K4me1 and H3K27ac levels at DNM containing SOX8 enhancer.". Adding citations of all external datasets utilized in the paper would be helpful for the reproducibility of the analyses and experiments.

      Response 3.15: In the revised manuscript, we have included citations for datasets used in the analysis.

      Analysis (1) and (2) were performed using the web-based tool Enrichr (https://maayanlab.cloud/Enrichr/). To perform tissue-specific expression analysis Enrichr uses the gene expression data from the ARCHS4 project (https://maayanlab.cloud/archs4/). We have mentioned this both in the text and the methods section of the revised manuscript.

      (3) source of NPC is now mentioned in the methods section (Page 29, lines 22-25).

      (4) The H3K4me1 and H3K27ac levels at DNM containing enhancers were measured using ChIP-qPCR in this study hence citation was not provided (Page 15, line 27 to page 16, line 1).

      Comment 3.16: At page 10, the authors indicated that "We did not find enrichment for ID DNMs in open chromatin regions (ATAC-seq peaks) for any of the developing brain cell types" and on page 11, they stated, "On the contrary, all four developed human brain cell types showed significant enrichment for ID DNMs compared to GoNL DNMs in promoter regions after correcting for multiple tests". Given that ID DNMs were more enriched in fetal brain enhancers than adult brain enhancers in Figure 1, it is important to discuss why ID DNMs were enriched within developed brain cell-type regulatory elements but not in developing brain cell-type specific regulatory elements. I would request the authors to clarify this discrepancy. Could the distance to the gene be a factor in this discrepancy? How do cell-type-specific enrichment results change if ATAC-seq peaks from developing human cortex would be also restricted by chromatin accessibility regions within gene promoters (e.g. within +/- 2kb from TSS)? If ID DNMs within promoter regions were enriched within at least one of the cell-type-specific regulatory elements in both developing and adult brains, re-evaluating the analysis performed in Figure 1 by considering the distance of DNMs to genes would be critical to conclude temporal-specific enrichment of ID DNMs.

      Response 3.16: ATAC-seq data from the developing brain were obtained from Song et al (2020, Nature). ATAC-seq peaks mark open chromatin regions, which span the entire regulatory spectrum, including both active and inactive regulatory regions; therefore, open chromatin regions as a whole may not show enrichment for DNMs.

      To identify open chromatin regions that interact with promoters that are active in specific cell types, Song et al (2020, Nature) performed histone 3 lysine 4 trimethylation (H3K4me3) proximity ligation-assisted chromatin immunoprecipitation sequencing (PLAC-seq). Using these cell type-specific chromatin interaction data, we investigated whether interacting open chromatin regions are enriched for ID DNMs as compared to GoNL DNMs. We found that interacting chromatin regions from IPCs were enriched for ID DNMs, suggesting that DNMs affecting highly interacting regulatory regions might be functional.

      Furthermore, as suggested by the reviewer we performed an enrichment analysis by restricting ATAC-seq peaks to +/-2kb region around the TSS of protein-coding genes. We found that ID DNMs were enriched in promoter regions of all four developing brain cell types. We have included this result in the revised manuscript (page 11, lines 5-8).

      We then investigated whether any of the 83 DNMs that overlapped fetal brain-specific enhancers or human gain enhancers were located within +/-2kb of the TSS of a protein-coding gene. We found that only 4 DNMs were located within the 2kb region around a TSS, suggesting that the enrichment observed in fetal brain enhancers was not due to DNMs located in promoter regions.
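
      The +/-2kb promoter check described in Response 3.16 amounts to a distance test between each DNM and the nearest TSS. A minimal sketch follows; the function name, the dictionary layout, and all coordinates are hypothetical and chosen only to illustrate the window logic, not to reproduce the annotation actually used in the study.

```python
from bisect import bisect_left

def within_tss_window(chrom, position, tss_by_chrom, window=2000):
    """Return True if `position` lies within +/- `window` bp of any TSS.

    `tss_by_chrom` maps a chromosome name to a sorted list of TSS coordinates.
    """
    sorted_tss = tss_by_chrom.get(chrom, [])
    # First TSS at or beyond the left edge of the window around the DNM.
    i = bisect_left(sorted_tss, position - window)
    return i < len(sorted_tss) and sorted_tss[i] <= position + window

# Hypothetical TSS coordinates and DNM positions, for illustration only.
tss_by_chrom = {"chr16": [1_000_000, 1_050_000]}
dnms = [("chr16", 1_001_500), ("chr16", 1_003_000), ("chr16", 1_048_000)]

promoter_dnms = [d for d in dnms if within_tss_window(*d, tss_by_chrom)]
print(f"{len(promoter_dnms)} of {len(dnms)} DNMs fall within 2 kb of a TSS")
```

      Keeping the TSS list sorted per chromosome lets each lookup run as a binary search rather than a scan over every gene.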

      Minor

      Comment 3.17.1: In general, the study could benefit from more figures rather than providing results with tables to follow and understand them, especially for Table S6 and Table S11.

      Response 3.17.1: The data from Table S6 is already represented in Figure 1 of the manuscript.

      Comment 3.17.2: At figure 2, the colors of the arcs do not match the colors indicated in the label.

      Response 3.17.2: We have changed the arc colours named in the Figure 2 legend to match the actual colours of the arcs: “pink” to “magenta” and “green” to “dark green”.

      Comment 3.17.3: At tables 11a and 11c, the column names indicated in the E and F columns are the same, it would be good to distinguish them.

      Response 3.17.3: Thank you very much for pointing out the error. In table 11a and 11c of the revised manuscript, we have changed the column names of the E and F columns.

      Comment 3.17.4: At page 10, the authors indicated that "The IPCs give rise to most neurons (32) hence DMNs in highly connected active promoters and enhancers from IPCs might have a profound impact on neurogenesis." This sentence is not clear.

      Response 3.17.4: We have rephrased the sentence to make it clearer “suggesting that DNMs affecting highly interacting regulatory regions of IPCs might be functional” (Page 11, lines 3-4).

      Comment 3.17.5: Radical glia -> radial glia

      Response 3.17.5: We have corrected this throughout the manuscript.

      Comment 3.17.6: Describe background gene lists used for all hypergeometric/fisher's exact tests.

      Response 3.17.6: We have already mentioned the background gene list used for all hypergeometric/fisher's exact tests performed in the respective supplemental tables. For the analysis performed using the web-based tool Enrichr (https://maayanlab.cloud/Enrichr/), in the method section of the revised manuscript, we have mentioned the background gene set used by Enrichr to perform tissue enrichment analysis.

      Comment 3.17.7: In Figure 4a, it would be useful to label the de novo mutation, otherwise it's not clear why a specific region was highlighted. Also, to highlight where the gRNA was targeted for the CRISPRi experiment.

      Response 3.17.7: In Figure 4a, we have labelled the de novo mutation in the revised manuscript. We have added panel 4b to highlight the region where gRNA was targeted for the CRISPRi experiment.

      Reviewer #3 (Significance):

      Overall this study attempted to identify and validate novel non-coding variants associated with ID. However, given limitations in sample size, statistical testing, and experimental design, as described above, many of these conclusions are limited.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary

      In this manuscript, Vas and Boulet et al. present the potential regulatory role of de novo mutations (DNMs) in intellectual disability (ID). They performed whole-genome sequencing in an ID cohort including 21 ID probands and their healthy parents. To study the regulatory DNMs in ID, they combined 17 ID probands without pathogenic coding DNMs with a previous cohort including 30 exome-negative ID cases. Leveraging their DNM dataset with a variety of epigenomic datasets, they observed that ID DNMs were enriched more within fetal brain enhancers than adult brain enhancers. They also detected that the enhancers harboring ID DNMs showed promoter-enhancer interactions for the ID-relevant genes. Moreover, they identified recurrent mutations within enhancer clusters associated with the CSMD1, OLFM1, and POU3F3 genes when combining with larger pre-existing databases of genetic variants. Finally, they found that many ID DNMs were predicted to disrupt binding motifs of TFs, and experimentally validated the regulatory function of some of these loci. They showed the allele-specific activity for an enhancer region including an ID DNM for the SOX8 gene via luciferase assay as an episomal assay. They further showed that the same enhancer region regulates SOX8 expression by performing CRISPRi, and demonstrated the allele-specific impact of the same DNM via genome editing with CRISPR/Cas9.

      Major

      • The sample size of the Whole Genome Sequencing conducted in this study is extremely limited, and therefore the conclusions that can be drawn from the study are also extremely limited. The authors combined their data with existing cohorts for a subset of analyses; however, the novelty and utility of the findings from this cohort alone are limited.
      • Multiple testing burden must be considered when conducting enrichment studies in genomic regions using WGS data. Unfortunately, it is not considered here and without this the observed enrichment is not convincing. See for example https://www.nature.com/articles/s41588-018-0107-y.
      • The total number of promoter enhancer interactions as shown in Figure 2 is unbelievably high. The number of gene loops previously detected using Hi-C is much lower. This analysis seems to assign every enhancer in the region to the promoter within a TAD, which is much too liberal an analysis and not consistent with number of gene loops detected via Hi-C or eQTL work.
      • Because the total number of DNMs are few, I would recommend moving genomic annotations to hg38 rather than losing 123 DNMs via liftover to hg19.
      • The source of the neural progenitors used in the experiments is not described.
      • The non-targeting or control gRNA is not described.
      • It's difficult to transfect both neural progenitors and neurons, it would be useful to see images of GFP expression if this is on the plasmid to know the degree of transfection efficiency and give greater confidence in the results presented in Figure 4.
      • The specific instances where a one-tailed statistical test was used need to be highlighted.
      • At page 11, the authors stated "As enhancer regions of none of the human brain cell types showed significant enrichment for ID DNMs, we concentrated on DNMs overlapping enhancers from the bulk fetal brain for downstream analysis." However, cell-type-specific enhancer enrichment analysis vs fetal brain enhancer enrichment are two different analyses. The authors did not test if the ID DNMs were enriched more in fetal brain enhancers than control DNMs were. They only compared enrichment of ID DNMs and control DNMs fetal vs adult brain enhancers. Hence, this statement was not clearly justified. It would be improved by performing a fisher's exact test to assess if ID DNMs showed more enrichment within fetal bulk brain enhancers than control DNMs did similar to cell-type-specific enrichment analysis.
      • At page 13, the authors indicated that "The fetal brain enhancer DNMs from ID probands frequently disturbed putative binding sites of TFs that were predominantly expressed in neuronal cells (P = 0.022; Table S12b). Our results suggest that the enhancer DNMs from ID probands were more likely to affect the binding sites of neuronal transcription factors and could influence the regulation of genes involved in nervous system development through this mechanism." How this conclusion is drawn is unclear. Table S12b includes three cell-types with identical p-values and odd ratios based on a statistical test. How could the authors get identical parameters for all cell-types? Which dataset was used to compare the expression of these transcription factors? Were transcription factors also expressed in non-neuronal cell-types? I would request the authors to clarify the analysis performed here in the methods section, and to compare the expression of TFs in other cell-types in order to conclude as "TFs that were predominantly expressed in neuronal cells". Also, this analysis would be improved by assessing the overlap of DNMs disturbed putative binding sites within cell-type-specific ATAC-seq peaks i.e. if they were enriched more within neuronal ATAC-seq peaks than non-neuronal ATAC-seq peaks.
      • The authors randomly selected DNMs from 11 ID patients that were predicted to alter TFBS affinity for experimental validation in the luciferase assay. Were the allele-specific impacts of DNMs shown in Figure 3 consistent with the predicted impact via motifbreakR? Given that the authors prioritized the regulatory ID DNMs based on motifbreakR results for the experimental validation, I would request the authors to evaluate if the alleles disrupting a TF motif that mainly has activator/repressor function also showed lower/higher luciferase activity. That would help to support the evidence for the regulatory function of other ID DNMs predicted to be TF disruption but which could not be experimentally validated.
      • At page 24 in the methods section, the authors defined the control DNMs set as "We downloaded de novo mutations identified in the healthy individuals in genomes of the Netherland (GoNL) study (21) from the GoNL website". Does DNM set from GoNL also include protein-truncating mutations? If it does, are there any de novo mutations that were previously also found in any other neurodevelopmental condition as being pathogenic or likely pathogenic? If it includes both protein-truncating de novo mutations and noncoding DNMs, the two datasets used for the analysis described in Figure 1 would not be appropriately comparable to conclude that regulatory DNMs in ID were enriched in fetal brain enhancers whereas control DNMs enriched in adult brain enhancers. In which enhancer category (fetal or adult) ID DNMs would be enriched if the same analysis is performed by using both protein-truncating and regulatory DNMs? I would request the authors to evaluate the possibility that regulatory DNMs were enriched more in fetal brain enhancers compared to adult brain regardless of disease status, if the GoNL control group includes both protein-truncating and regulatory DNMs. Also, as described in the previous statement, if control DNMs include only regulatory DNMs or both protein-truncating+regulatory DNMs is not clear. This analysis would also be improved by restricting control DNMs into regulatory DNMs.
      • At page 14, the authors indicated that "In the heterozygous mutant clone, the SOX8 gene showed a significant (P = 0.0301) reduction in expression levels, however, no difference was observed in expression levels of the LFM1 gene (P = 0.8641; Fig. 4d), suggesting that the enhancer specifically regulates the SOX8 gene but not the LFM1 gene." based on the knock-in experiment for DNM. However, they did not show how CRISPRi of the enhancer which is the promoter for LFM1 impacted on LFM1 gene expression as they provided for the SOX8 gene in Figure 4b. I would request the authors to rephrase the statement as "the regulatory impact of DNM within the enhancer is specific for SOX8 but not for LFM1", or provide evidence that LFM1 expression levels did not change after the CRISPRi experiment. Also, if the CRISPRi experiment would not show any change in LFM1 expression, I would also request the authors to interpret what could be potential factors for that a regulatory sequence in a gene promoter would not impact its expression.
      • The authors utilized neuroblastoma cells for luciferase assay, neuronal progenitor cells for CRISPRi, and HEK293T cells for genome editing CRISPR/Cas9 experiments. Given the cell-type-specificity of active regulatory elements, I would request the authors to provide more justification for the utilization of different cell types for each assay. More specifically, LMF1 gene expression did not alter, albeit DNM's position in the gene promoter in Figure 4d. Could it be due to the low expression level of cell-type-specific transcription factors in HEK cells? Showing that expression levels of TFs whose binding motifs were disrupted via DNM at the region are comparable between HEK cells vs neuronal cells would be helpful here.
      • Citation of many datasets are missing throughout the text including the (1) expression data in prefrontal cortex in the sentence at page 9 ".. but also predominantly expressed in the prefrontal cortex", (2) again expression data from neuronal datasets in the sentence at page 13 "The fetal brain enhancer DNMs from ID probands frequently disturbed putative binding sites of TFs that were predominantly expressed in neuronal cells", (3) NPCs in the sentence at page 14 "To investigate whether the putative enhancer of the SOX8/LMF1 gene indeed regulates the expression of the target genes, we performed CRISPR interference (CRISPRi), by guideRNA mediated recruitment of dCas9 fused with the four copies of sin3 interacting domain (SID4x) in the neuronal progenitor cells (NPCs).", and (4) H3K27ac and H3K4me1 datasets used in Figure 4e and described at page 14 in the sentence "Hence, we investigated H3K4me1 and H3K27ac levels at DNM containing SOX8 enhancer.". Adding citations of all external datasets utilized in the paper would be helpful for the reproducibility of the analyses and experiments.
      • At page 10, the authors indicated that "We did not find enrichment for ID DNMs in open chromatin regions (ATAC-seq peaks) for any of the developing brain cell types" and on page 11, they stated, "On the contrary, all four developed human brain cell types showed significant enrichment for ID DNMs compared to GoNL DNMs in promoter regions after correcting for multiple tests". Given that ID DNMs were more enriched in fetal brain enhancers than adult brain enhancers in Figure 1, it is important to discuss why ID DNMs were enriched within developed brain cell-type regulatory elements but not in developing brain cell-type specific regulatory elements. I would request the authors to clarify this discrepancy. Could the distance to the gene be a factor in this discrepancy? How do cell-type-specific enrichment results change if ATAC-seq peaks from developing human cortex would be also restricted by chromatin accessibility regions within gene promoters (e.g. within +/- 2kb from TSS)? If ID DNMs within promoter regions were enriched within at least one of the cell-type-specific regulatory elements in both developing and adult brains, re-evaluating the analysis performed in Figure 1 by considering the distance of DNMs to genes would be critical to conclude temporal-specific enrichment of ID DNMs.

      Minor

      • In general, the study could benefit from more figures rather than providing results with tables to follow and understand them, especially for Table S6 and Table S11.
      • At figure 2, the colors of the arcs do not match the colors indicated in the label.
      • At tables 11a and 11c, the column names indicated in the E and F columns are the same, it would be good to distinguish them.
      • At page 10, the authors indicated that "The IPCs give rise to most neurons (32) hence DMNs in highly connected active promoters and enhancers from IPCs might have a profound impact on neurogenesis." This sentence is not clear.
      • Radical glia -> radial glia
      • Describe background gene lists used for all hypergeometric/fisher's exact tests.
      • In Figure 4a, it would be useful to label the de novo mutation, otherwise it's not clear why a specific region was highlighted. Also to highlight where the gRNA was targeted for the CRISPRi experiment.

      Referees cross-commenting

      I agree with the other reviewers' comments. I just have one specific comment: Reviewer 1 suggested that RNA-seq would be more accurate than gene expression; however, I feel that this assay is not necessary and may be quite expensive for the targeted gene expression differences measured here.

      Significance

      Overall this study attempted to identify and validate novel non-coding variants associated with ID. However, given limitations in sample size, statistical testing, and experimental design, as described above, many of these conclusions are limited.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The manuscript by De Vas et al describes an investigation of the contribution of non-coding de novo variants to intellectual disability (ID). The authors perform whole genome sequencing (WGS) of 21 ID probands and both parents, and combine these data with WGS from 30 trios previously sequenced. The authors use publicly available data from the Roadmap Epigenomics project to identify sets of enhancers hypothesised to have a role in ID, such as fetal brain-specific enhancers and enhancers associated with known ID-associated genes. These enhancer sets are then tested for enrichment of non-coding de novo variants in ID, using publicly available de novo variant data from the Genome of the Netherlands (GoNL) project as a control comparison. The authors report that de novo variants in ID are significantly enriched within fetal brain-specific and human-gained enhancers. This is perhaps the main finding of the study. The authors also identify recurrent de novo variants in ID within clusters of enhancers that regulate the genes CSMD1, OLFM1 and POU3F3. A number of functional experiments are performed to provide further insights into the mechanisms by which de novo variants impact the expression of putative target genes; for example, data are provided that show that de novo variants observed in ID within a SOX8 enhancer lead to reduced expression of the SOX8 gene. In conclusion, the authors claim that their data support de novo variants in fetal brain enhancers as contributing to the aetiology of ID.

      Major comments.

      The study uses leading edge genomic technologies to generate WGS in a new ID sample, which is used to investigate the role of non-coding variants to ID aetiology. The manuscript is in general very well written. However, a weakness of the study is a very small sample size, which should result in low statistical power. Despite this power consideration, the authors report very strong P values for their main findings. My main concern with the study is that the methodology used to evaluate enrichment of de novo variants within specific sets of enhancers is unclear, and therefore as it currently stands, I am unable to be confident in the findings. I am also concerned about whether data from the Genome of the Netherlands project is a suitable control comparison, given technical differences that are likely to exist between this and the ID data set. I further explain these methodological concerns below:

      1. When testing for enrichment of de novo variants, the most commonly used approach in the field involves testing whether the observed number of de novo variants in a given genomic region is greater than the number expected by chance, using a Poisson test. Here, the expected number of de novo variants is derived from trinucleotide mutation rates. This method was first proposed by Samocha et al 2014. The current authors use trinucleotide mutation rates to estimate the expected number of de novos among enhancer sets, and cite the Samocha paper, but my understanding is that they do not use a Poisson test to evaluate enrichment. Instead, they use the expected number of mutations among the enhancer sets to normalise the observed number of de novo variants, but it is not clear to me why this is performed, and also what data and statistical test is actually being used to evaluate de novo variant enrichment? I can guess at what they have done, but the methods section outlining this test should be more clearly explained.
      2. Can the authors please explain why they did not use the standard de novo variant enrichment approach outlined in Samocha et al 2014, which is used in similar non-coding de novo studies of ID (e.g. Short et al 2018 Nature)? My concern is that using the Samocha approach, no enrichment would be observed in fetal brain enhancers, given the data presented in supplemental table S6.
      3. In Supplemental table S6, the normalised expected number of de novo variants across all different enhancer sets within the ID and GoNL samples is the same. Can the authors clarify why this is the case, as presumably these sets contain very different genomic sequences, and therefore one would not expect the same number of DNMs?
      4. Instead of using the standard enrichment approach proposed by Samocha et al 2014, the authors compare the rates of de novo variants in ID to those reported in the GoNL study. However, very little information is provided about the de novo variant data from the GoNL. Presumably, the GoNL and the current study used different approaches to sequence samples, call variants, and QC the data. Also, is the coverage across these studies comparable? All these factors will contribute to batch effects, and therefore I am not convinced that the GoNL study is an appropriate control comparison. The authors should provide data to reassure the reader that these samples can be compared. For example, are similar rates of de novo variants found between these samples for variants in null enhancers sets? To clarify, an equivalent analysis in exome sequencing studies would be to show that the rates of synonymous variants are the same across data sets.
      5. The replication analysis of enhancer clusters that are recurrently hit by de novo variants in ID is weak. For enhancer clusters with recurrent de novo variants in their ID cohort, the authors simply report the number of de novo variants observed in these enhancers in the Genomics England cohort, but they do not test whether the observed number in Genomics England is greater than that expected. For their findings to be replicated, they need to show the de novo rate is statistically above expectation.
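
The observed-versus-expected comparison requested in comments 1 and 5 can be sketched as a one-sided Poisson test. This is only an illustration of the general approach attributed above to Samocha et al 2014, not the authors' pipeline: the function names, the per-site rates, and the cohort numbers are hypothetical, and the expected count is assumed to be the summed per-site mutation rate multiplied by two haploid genomes per proband.

```python
import math


def expected_dnms(per_site_rates, n_trios):
    """Expected de novo count in a region (assumption: summed per-site,
    per-haploid-genome rates times two haploid genomes per proband)."""
    return 2 * n_trios * sum(per_site_rates)


def poisson_upper_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    cdf = 0.0
    term = math.exp(-expected)  # P(X = 0)
    for k in range(observed):
        cdf += term                 # accumulate P(X = k)
        term *= expected / (k + 1)  # advance to P(X = k + 1)
    return max(0.0, 1.0 - cdf)


# Hypothetical example: a 1 Mb enhancer set with a flat per-site rate,
# 50 trios, and 4 observed de novo variants.
rates = [1.2e-8] * 1_000_000
expected = expected_dnms(rates, n_trios=50)
p_value = poisson_upper_tail(4, expected)
print(f"expected = {expected:.2f}, one-sided P = {p_value:.4f}")
```

Applying the same function to a neutral ("null") enhancer set in both cohorts would give a simple check on batch effects of the kind raised in comment 4.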

      Minor comments:

      1. The authors state that all coding de novos were validated by Sanger sequencing, but what about the non-coding de novos? Validation of the specific mutations that contribute to the main findings would strengthen the paper.
      2. In the introduction, the line "A family with two affected siblings was analysed for the presence of recessive variants" seems out of place and incomplete, as there is no mention of the results from this analysis.
      3. In the discussion, they write "It is noteworthy that in protein-coding regions of the genome, only protein-truncating variants (PTV), but not other protein-coding mutations, show significant enrichment in neurodevelopmental disorders (11,41)". This is not true. In Kaplanis et al 2020, damaging missense variants are robustly shown to contribute to NDDs (see SM figure 3 for example).
      4. The data availability statement is weak. Many similar studies have deposited sequencing data from NDD cohorts to appropriate repositories.
      5. The authors should consider making the code used for their analysis open source, as this would help clarify some of the methodological questions that I, and others, may have.

      Referees cross-commenting

      I agree with the other reviews.

      Significance

      This is an important area of research, as the fraction of ID explained by non-coding variants is unknown. However, the very small sample size, especially when compared with other sequencing studies of NDDs in the literature, unfortunately limits the significance of the advance. Nevertheless, if the authors can show that the results reported in the paper are robust, then the findings will be of interest to both researchers and clinicians studying NDDs.

      My area of expertise is in the generation and analysis of sequencing data to study psychiatric and neurodevelopmental disorders. I have a lot of experience analysing exome sequencing data from proband-parent trios. I do not have experience with CRISPR, so I have not commented on that part of the study.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Figure 4. The figure legend and sub-figures are inconsistent. They do not match.

      Figure 4. In the Sanger sequencing trace of the edited HEK293 cells, why are there noise peaks?

      How many single-cell clones were chosen for further analyses after CRISPR genome editing? The authors should isolate single cells by flow cytometry or another method.

      The authors conducted RT-qPCR to quantify mRNA expression; RNA sequencing would be more accurate.

      The discussion is too long, please shorten.

      Referees cross-commenting

      I agree with the other reviewers' comments.

      Significance

      This study investigates the genetic and molecular mechanisms of intellectual disability (ID) by integrating whole genome sequencing and follow-up functional explorations. The results provide novel insights into the genetic aetiology of ID.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      Summary of changes

      We thank all three reviewers for their constructive feedback on our manuscript. We have now performed extensive experiments, analyses, and rewriting of our manuscript to address all their concerns. We believe that these changes significantly improve the rigor of our conclusions and the clarity of our discussion. We highlight below the key experiments, analyses, and rewriting in the revised manuscript, followed by a detailed point-by-point response.

      1) We have now performed experiments using alternative uORF donor sequences to demonstrate the robustness of uORF repression to changes in uORF length.

      2) By mutating out near-cognate start codons within uORF2, we have now demonstrated that near-cognate start codon initiation within uORF2 does not impact repression.

      3) To quantify the dynamic range of our dual luciferase assay, we have now mutated out the NLuc start codon. We find that repressive uORF2 constructs have expression levels that are still > 20-fold above the no-start-codon control values.

      4) We have now analyzed ribosome profiling coverage on uORFs (supplementary figure 5), and we show that several uORFs with known elongation stalls lack evidence of 40S and 80S subunit queueing 5′ to stalls, consistent with our collision-induced ribosome dissociation model.

      5) We have now provided detailed discussion of footprint length choice in our modeling and the role of codon choice in our experiments.

      6) We have now added a new main figure that provides a graphical representation of reactions considered in our kinetic modeling. This figure will make our modeling assumptions more transparent and accessible to readers with less computational expertise.

      Reviewer #1:

      Summary

      Bottorff et al test several models of uORF-mediated regulation of main ORF translation using the uORF2 of CMV UL4 gene, a system that has been previously experimentally characterized by the authors. They first train a computational model to recapitulate the observed experimental effects of mutations in uORF2, and then use the model to infer which uORF parameters may confer buffering against reduced ribosome loading that typically occurs upon biological perturbation. The authors then find that: i) the uORF2 confers buffering, ii) the uORF2 mechanism adjusts to computational predictions for the collision-mediated 40S dissociation model of uORF-mediated regulation.

      Significance

      This manuscript represents an interesting effort to distinguish mechanisms of uORF-mediated regulation based on mathematical modeling, and might be useful for the translation community. My expertise: Regulation of translation.

      We thank Reviewer 1 for a succinct summary of our main conclusions and highlighting the significance of our work to the translation community.

      Major comments

      1) Figure 4 (Figure 5 in revised version): What is the dynamic range of the WT vs the no-stall construct? In the WT construct, main ORF translation is already quite repressed, and detecting further repression may be more difficult than in the no-stall construct. In other words, the differences that the authors are detecting between the WT and no-stall constructs might be due to a potentially lower dynamic range of the WT construct.

      To measure the dynamic range of our reporter assay, we have now mutated the start codon of the NLuc reporter ORF. We reasoned that this construct provides a lower bound on measurable NLuc signal. Expression of the resulting no-NLuc-start-codon reporter was at least 20-fold lower than that of the WT construct (Fig. S1A). Importantly, we also see that the raw NLuc signal of the WT construct is at least 20-fold over the background (Fig. S1B). Thus, the differential response of WT and no-stall constructs is not simply due to a lower dynamic range of the WT construct.

      2) The authors conclude that uORF2 follows the collision-mediated 40S dissociation model, based on fitness of their experimental results with predictions from their mathematical modeling regarding distance between uORF2 initiation codon and the stalling site. But can the authors actually directly prove that there are no 40S subunits accumulating behind the stalled 40S using Ribo-Seq or TCP-Seq?

      We have now examined existing 80S Ribo-seq and 40S TCP-seq datasets to determine whether queued 40S or 80S ribosomes can be detected at known stall sites. Stern-Ginossar et al. (2012) performed 80S Ribo-seq during hCMV infection. In this dataset, while the stall at the UL4 termination codon has a very high ribosome density, few elongating ribosomes are seen queued behind the stalled 80S, consistent with an absence of 80S ribosome queuing (Fig. RR1). By contrast, another well-studied elongation stall in the Xbp1 mRNA shows ~30 nt periodic peaks in ribosome density indicative of ribosome queues (Fig. RR2). An important caveat is that queued ribosomes could be systematically underrepresented in standard Ribo-seq datasets due to incomplete nuclease digestion (Darnell et al., 2018; Subramaniam et al., 2014; Wolin and Walter, 1988).

      Since there is no 40S TCP-Seq dataset during hCMV infection, we examined other known stalls on human mRNAs (Fig. RR3 below; Fig. S5 in our manuscript). We examined small ribosomal subunit profiling data from human uORFs with conserved amino acid-dependent elongating ribosome stalls (Figure S5A). Ribosome density read counts are low across all of these uORFs, showing no evidence of ribosome queuing. Subtle queues might not be observed given these low read counts from insufficient capture of small ribosomal subunits. Nevertheless, we do not observe any evidence of queueing upstream of elongating ribosome stalls in these data. We note these observations in our Discussion section as follows (lines 688-712): “Although our data from UL4 uORF2 does not support the queuing-mediated enhanced repression model (Fig. 1C) (Ivanov et al., 2018), this model might describe translational dynamics on other mRNAs. Translation from near-cognate start codons is resistant to cycloheximide, perhaps due to queuing-mediated enhanced initiation, but sensitive to reductions in ribosome loading (Kearse et al., 2019). Loss of eIF5A, a factor that helps paused elongating ribosomes continue elongation, increases 5′ UTR translation in 10% of studied genes in human cells, augmented by downstream in-frame pause sites within 67 codons, perhaps also through queuing-mediated enhanced initiation (Manjunath et al., 2019). There is also evidence of queuing-enhanced uORF initiation in the 23 nt long Neurospora crassa arginine attenuator peptide (Gaba et al., 2020) as well as in transcripts with secondary structure near and 3′ to start codons (Kozak, 1989). Additional sequence elements in the mRNA might determine whether scanning ribosome collisions result in queuing or dissociation. Small subunit profiling data (Wagner et al., 2020) from human uORFs that have conserved amino acid-dependent elongating ribosome stalls do not show evidence of scanning ribosome queues (Fig. S5A), consistent with the collision-mediated 40S-dissociation model. Subtle queues might not be observed given these low read counts from insufficient capture of small ribosomal subunits.”

      3) Experimental data in Figures 2, 4 and 5 include 3 technical replicates. Sound conclusions typically require biological replicates. Further, the number of replicates in Figure 6 has not been indicated.

      As suggested by the reviewer, we have now included biological replicates for all luciferase assays [Figures 2, 5, 6, and 7 that were previously 2, 4, 5, and 6] that were technical replicates in the previous version. This replication does not alter any of our conclusions. We have now included the number of biological replicates for Figure 7 (former Figure 6).

      Minor comments

      1) Figure 4 (Figure 5 in revised version): It is strange that a PEST sequence had to be introduced in the construct of part B in order to observe reliable differences, but not in constructs of parts A and C. Can the authors explain?

      We introduced the PEST sequence for part B because we wanted to measure the reporter response to treatment with a drug that reduces translation initiation. The PEST sequence increases the turnover rate of the reporter protein. Without the PEST sequence, the luminescence signal will be dominated by the reporter expression before the drug was added. However, in parts A and C, initiation rate was altered through genetic mutations and measuring their expression under basal conditions does not require a PEST sequence. Except in situations where a quick dynamic response needs to be measured such as in the drug treatment in part B, reporters without PEST sequences are simpler to interpret due to the absence of proteasome-mediated degradation and higher overall signal.

      2) Figure 6 (Figure 7 in revised version): Unfortunately, the authors find no other human uORFs with terminal diproline motifs that are as essential for main ORF repression as uORF2. In this light, can the authors comment further on the usefulness of their findings for human genes? Have the authors searched for viral RNAs with similar features? Please note that the gene PPP1R37 has not been mentioned in the main text.

      The UL4 and human uORFs differ in their sequence determinants of translational repression. UL4 uORF2 represses translation entirely through nascent peptide-mediated stalling. While the terminal diproline motif in UL4 uORF2 is necessary for main ORF repression, it is not sufficient. A number of other residues in the UL4 uORF2 peptide play a critical role in repression (Cao and Geballe, 1996; Matheisl et al., 2015). Thus, it is not surprising that human uORFs that we identified based solely on the presence of terminal diproline motifs confer only a modest decrease in repression upon mutating the terminal proline. The human uORFs containing these terminal diprolines may partially repress translation via nascent peptide effects, but the majority of the repression likely arises from siphoning of scanning ribosomes from the main ORF (Fig. 1A in our manuscript) and inefficient termination following translation of consecutive prolines (Cao and Geballe, 1996; Cao and Geballe, 1998; Janzen et al., 2002; Matheisl et al., 2015). Our current understanding of the features in nascent peptides that mediate translational repression (Wilson et al., 2016) is insufficient to bioinformatically identify elongation-stall-containing uORFs in human or viral genomes, so we simply looked for terminal diprolines. Despite this limitation, we note that the modeling approaches and experimental perturbations developed in our work can be applied to study ribosome kinetics on any repressive uORF, independent of the mRNA or peptide sequence underlying the repression. As suggested by Reviewer 1, we have now included all the studied uORFs in the main text.

      Reviewer #2:

      Summary

      In this paper, the authors explore the uORF regulatory mechanism. They first discuss five general models of how uORFs might work to repress and buffer main ORF translation, and then focus mainly on UL4 uORF2 to identify the potential mechanism. They use both computer modeling and experimental validation with a reporter assay in the 293T cell line. Based on their model, and a few experimental results obtained when they change the translation initiation rate and/or the length of the dORF, they propose that it may work through the 40S dissociation model, since the buffering effect is not sensitive to uORF length.

      Significance

      This is an interesting area. Using modeling with experimental validation to understand the uORF regulation mechanism, and the kinetics and interplay between different translation steps, will help us to understand uORF buffering in stress conditions. Bringing modeling methods with reporter validation to the translation field will also provide clues for the study of molecular mechanisms, especially in complex situations.

      We thank Reviewer 2 for a comprehensive summary of our work and noting the uniqueness and usefulness of our experiment-integrated modeling approach to the translation field.

      Major comments

      • Are the key conclusions convincing? The modeling of the different mechanisms is insightful, but some modeling parameters and experimental validations are not conclusive, and validating a few of them could reinforce the conclusions.

      We have now performed key validation experiments suggested by Reviewer 2, notably: (1) mutating out near-cognate start codons in the UL4 uORF2 coding sequence and (2) increasing UL4 uORF2 length using two unrelated protein coding sequences. Please see responses to specific comments below for further details.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? Yes, the part about queuing and length sensitivity is not convincing to me; it should be modified and the strength of the statements reduced.

      We agree about reducing the statement strength and have altered our statements as suggested by the reviewer. Specifically, we have now expanded the rationale for the choice of footprint lengths of 40S subunits. Please see responses to specific comments below for further details.

      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. Yes, please see the specific concerns.

      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. They will need to rethink the modeling and the validation in Figure 5; there are validation experiments that can be done in weeks and in a cost-efficient manner that can reinforce the conclusions.

      We have performed the experiments suggested by the reviewer. See responses below.

      • Are the data and the methods presented in such a way that they can be reproduced? Most of them are good.

      • Are the experiments adequately replicated and statistical analysis adequate? Yes.

      Specific concerns

      1) It is a bit confusing to me that in Figure 2C, the reporter assays, the non-start codon reporter and the non-stall reporter have the same expression level. In theory, the non-stall reporter still has the uORF, so it should repress main ORF expression and have a lower expression level than the non-start reporter, where there is no uORF and no repression. In the other uORFs they tested in Figure 6 (Figure 7 in revised version), the non-stall reporters are lower than the non-start reporters. Since the data they use to build the model are from Figure 2B, and are used to calculate the parameters for the whole paper, I just want to make sure this makes sense. I noticed there is another in-frame CTG at the 4th codon; this may act as an alternative start codon in the non-start reporter and trigger some repression.

      To address Reviewer 2’s concern about alternative start codon usage in the non-start reporter, we have now mutated out all near-cognate start codons known to initiate translation with high frequency (Kearse and Wilusz, 2017). These near-cognate start codons consisted of Leu4 CTG, Leu11 CTG, Leu14 TTG, and Leu15 CTG and were mutated to CTA, CTA, TTA, and CTA, respectively. We find that removing the uORF2 near-cognate start codons does not significantly alter NLuc expression (Fig. S1A). This experiment merely rules out one possible source of these similar expression levels. We expect that the very similar NLuc expression levels of the uORF2 no-start and no-stall reporters can be rationalized for the following reasons: 1. uORF2 initiation frequency is quite low. We estimate it to be 5% or less in our modeling based on previous measurements (Cao and Geballe, 1995). Thus, the maximum theoretically possible difference in NLuc expression between no-start and no-stall reporters is 5% or less. 2. Further, re-initiation after uORF2 translation is frequent. We estimate it to be around 50% within our manuscript, which will further decrease repression in the no-stall mutant. Thus, we expect the no-stall mutant to decrease the flux of scanning ribosomes at the main ORF by 2-3% compared to the no-start mutant. 3. Finally, a subtle but important point to note is that our reporter assays are measuring NLuc expression and not the flux of scanning ribosomes at the main ORF NLuc start codon. Since the NLuc ORF has a strong start codon context (GCCACC) and the flux of scanning ribosomes is already high for the no-start and no-stall mutants, slight changes in the flux of scanning ribosomes are unlikely to impact NLuc expression. This is because start codon selection is not rate-limiting for protein expression under these conditions. 
This last point is clearly seen in high throughput reporter assays where the mutations which impact reporter expression in a non-optimal context have little or no effect in an optimal context (see Fig. 5B, 5C in Noderer et al., 2014).

      Thus, in summary, even if the flux of scanning ribosomes is decreased by 3-5% by the no-stall uORF2 mutant compared to the no-start uORF2 mutant, we expect the effect on NLuc expression to be negligible and below the limit of our experimental resolution (which is ~10% based on the standard error across technical replicates).
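
The 2-3% figure above follows from simple arithmetic, sketched here for illustration. The function name is hypothetical, and the calculation assumes only that a scanning ribosome is lost to the main ORF when it initiates at the uORF and then fails to re-initiate; the 5% and 50% values are the estimates quoted in the response.

```python
def scanning_flux_reduction(p_initiate, p_reinitiate):
    """Fraction of scanning ribosomes that never reach the main ORF because
    they initiated at a non-stalling uORF and did not re-initiate.

    Simplifying assumption for this sketch: no other loss pathways."""
    return p_initiate * (1.0 - p_reinitiate)


# Estimates quoted in the response: uORF2 initiation <= 5%, re-initiation ~50%.
reduction = scanning_flux_reduction(p_initiate=0.05, p_reinitiate=0.5)
print(f"Expected drop in scanning flux at the main ORF: {reduction:.1%}")
```

Under these assumptions the predicted difference between the no-start and no-stall reporters is well below the ~10% experimental resolution stated above.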

      Regarding the different behavior of the human uORFs in our manuscript and UL4 uORF2, note the response to Reviewer 1 regarding the usefulness of our human uORF findings.

      2) All the modeling and prediction the authors do are based on average, but we know translation is very heterogeneous. For each ribosome or each 40S, the kinetics varies a lot, the authors should discuss about this part.

      We now discuss translation heterogeneity in the Discussion section in lines 781-794 as follows: “Translation heterogeneity among isogenic mRNAs has been observed in several single molecule translation studies (Boersma et al., 2019; Morisaki et al., 2016; Wang et al., 2020; Wu et al., 2016; Yan et al., 2016). This heterogeneity may arise from variability in intrasite RNA modifications (Yu et al., 2018), RNA binding protein occupancy, or RNA localization. We do not capture these sources of heterogeneity in our modeling since the observables in our simulations are averaged over long simulated time scales and used to predict only bulk experimental measurements. However, our models studied here can readily be extended through compartmentalized and state-dependent reaction rates (Harris et al., 2016) to account for the different sources of heterogeneity observed in single molecule studies.”

      3) For modeling related with the queuing-mediated model in Figure 1C. they use 30nt as the ribosome length to count the potential queuing to start codon. But 30nt is the 80S protected fragment with specific conformation. The protected fragment for 80S will change based on different status of ribosome conformation or elongation step. More importantly, for queuing, it is 40S, so they may have a different size. Based on previous 40S ribosome profiling (Archer, Stuart K., et al. Nature 535.7613 (2016): 570-574. And other papers), the length can vary from 19nt to very long, so I don’t think the 30nt length can be used to model queuing in 40S and length sensitivity in the uORF working mechanism.

      We thank Reviewer 2 for highlighting this issue of footprint length heterogeneity that we had not previously addressed. In our modeling, we assume homogenous ribosome footprints. While heterogeneous ribosome footprints have been observed for small ribosomal subunits (Bohlen et al., 2020; Wagner et al., 2020; Young et al., 2021) and elongating ribosomes (Lareau et al., 2014; Wu et al., 2019), we believe that our modeling of homogenous footprint length is appropriate for the following three reasons: First, with respect to the small ribosomal subunit footprint heterogeneity, we note that TCP-seq studies include crosslinking of eukaryotic initiation factors (eIFs). The presence of these eIFs is thought to be the main source of heterogeneity in scanning ribosome footprints (Bohlen et al., 2020; Wagner et al., 2020). Although crosslinking is often performed, it is not necessary to obtain scanning ribosome footprints, and homogenous 30 nt footprints are observed in the absence of crosslinking (Bohlen et al., 2020). Notably, figure S2 of Bohlen et al. (2020), reproduced as Fig. RR4 below, shows that scanning SSU footprint lengths are tightly distributed around 30 nt when crosslinking is not used.

      Second, in the context of the strong, minutes-long UL4 uORF2 elongating ribosome stall (Cao and Geballe, 1998), collided ribosomes will wait for long periods of time relative to normal elongating or scanning ribosomes. Thus, we expect that associated eIFs dissociate from these dwelling ribosomes as they typically do during start codon selection or during translation of short uORFs (Bohlen et al., 2020). Third, a significant fraction of mRNAs exhibit cap-tethered translation in which eIFs must dissociate from ribosomes before new cap-binding events, and therefore collisions, can occur (Bohlen et al., 2020). Based on the above three points, we believe that modeling the footprint of only the scanning ribosomes, and not the associated eIFs, using a single 30 nt length is biologically reasonable. Footprint length heterogeneity of elongating ribosomes is much less drastic than that observed for scanning ribosomes and likely arises from different conformational states such as an empty or occupied A site (Lareau et al., 2014; Wu et al., 2019). While the different elongating ribosome footprints arise from differences in mRNA accessibility to nucleases, it is unclear whether the distance between two collided ribosomes changes across different ribosome conformations. For instance, the queues of elongating ribosomes observed at the Xbp1 mRNA stall occur at regular ~30 nt periodicity (Fig. RR2). Additionally, the stalled elongating ribosome is stuck in a pretranslocation state and has a defined, ~30 nt footprint (Wu et al., 2019), which only leaves room for one 5′ queued ribosome within UL4 uORF2 whose footprint is conformation sensitive. Finally, a small degree of scanning footprint heterogeneity is also accounted for by our modeling of backward scanning, which effectively introduces heterogeneity to collided scanning ribosome location on mRNAs (Figures 6A, S2D in our manuscript). 
We have now summarized the above points in the Discussion section of the revised manuscript (lines 713-740).

      4) For Figure 5B (Figure 6B in revised version), besides the modeling length issue I have mentioned above, when the authors increase the length of the uORF, the sequence is also changed, which may introduce other side effects. So, if the authors want to draw conclusions about queuing, they should rethink the length for both modeling and validation, and control for the sequence they added to increase the length of the uORF, for example by using a different sequence when manipulating the length.

      As suggested by the Reviewer, we have now varied the length of uORF2 using a different, unrelated donor sequence encoding the FLAG peptide and observe similar results (Fig. S4 in our manuscript) to our original experiment with the YFP-encoding sequence (Fig. 6B in our manuscript). A slight trend towards derepression with longer uORFs is observed in both cases. This effect might arise due to decreased stall strength caused by higher nascent peptide protrusion out of the exit tunnel leading to cotranslational folding (Bhushan et al., 2010; Nilsson et al., 2015; Wilson et al., 2016) or nascent chain factors (Gamerdinger et al., 2019; Weber et al., 2020) exerting a pulling force on the peptide. Importantly, we do not see the periodic change in repression predicted by the queueing model (Figure 6A, yellow-green lines).
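
To illustrate why a queueing model would be expected to produce a length-periodic signal, the toy calculation below tracks how many fixed-size footprints fit between a start codon and a stall site, and how many nucleotides are left over. It is not the kinetic model from the manuscript: the function names and the example distances are hypothetical, and a single 30 nt footprint is assumed, as in the response above.

```python
FOOTPRINT_NT = 30  # assumed homogeneous ribosome footprint


def queue_capacity(distance_nt, footprint_nt=FOOTPRINT_NT):
    """Number of whole footprints that fit in the given distance."""
    return distance_nt // footprint_nt


def queue_phase(distance_nt, footprint_nt=FOOTPRINT_NT):
    """Leftover nucleotides after packing whole footprints; this offset
    repeats every footprint_nt as the distance grows."""
    return distance_nt % footprint_nt


# Hypothetical start-codon-to-stall distances, in nucleotides.
for distance in (66, 81, 96, 111, 126):
    print(distance, queue_capacity(distance), queue_phase(distance))
```

Because the leftover offset cycles with the footprint length, any effect that depends on where the last queued ribosome sits relative to the start codon would repeat with the same period, which is the pattern the response reports not observing.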

      Minor comments

      • Specific experimental issues that are easily addressable.

      5) It is unclear how the luciferase assays were analyzed with respect to background noise. If NLuc expression is low and close to the background, then how the background is subtracted or normalized will influence the expression level, and thus the fold change for different reporters and conditions.

      To account for the luciferase background, we subtracted background from measured data values. To show that expression is rarely close to background (from mock transfections), we included a supplementary figure showing raw NLuc and FLuc values (Fig. S1B). Also note the response to Reviewer 1 regarding a no-start-codon control having a 20-fold lower signal than the WT UL4 uORF2 construct.
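
A minimal sketch of this kind of background handling is shown below, assuming the background is the mean signal from mock transfections and that NLuc is normalized to FLuc after subtraction. The function names and all numbers are hypothetical and are not taken from the manuscript.

```python
def background_subtracted_ratio(nluc_raw, fluc_raw, nluc_bg, fluc_bg):
    """NLuc/FLuc ratio after subtracting mock-transfection background."""
    nluc = nluc_raw - nluc_bg
    fluc = fluc_raw - fluc_bg
    if fluc <= 0:
        raise ValueError("FLuc signal is at or below background")
    return nluc / fluc


def fold_over_background(raw, background):
    """How far a raw signal sits above background."""
    return raw / background


# Hypothetical raw luminescence values (arbitrary units).
ratio = background_subtracted_ratio(nluc_raw=2100, fluc_raw=10100,
                                    nluc_bg=100, fluc_bg=100)
print(ratio, fold_over_background(2100, 100))
```

Reporting the fold over background alongside the normalized ratio makes it easy to see when a low-expressing reporter is approaching the floor of the assay.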

      • Are prior studies referenced appropriately? yes • Are the text and figures clear and accurate? Mostly good • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? Have a main figure about the modeling part.

      As suggested by the Reviewer, we have now added visual representations of the reactions as a new main figure (Fig. 3). We also moved the modeling workflow figure from the supplementary set of figures to this main figure (Fig. 3). We thank the Reviewer for this suggestion, which greatly improves the presentation of our modeling methodology.

      • Place the work in the context of the existing literature (provide references, where appropriate). Recent years, there has been a lot of study about small open reading frames, while for uORFs are known to repress translation, the regulatory mechanism is not known yet, there are just different models not validated yet (Young & Wek, 2016). Also, under normal conditions and stress conditions, uORF can play both repressive and stimulative role in main ORF translation (Orr, Mona Wu, et al. NAR 48.3 (2020): 1029-1042.). This paper is the first study to put all the uORF working hypothesis with buffering effect together, they use modeling to explain how under each hypothesis, buffering may happen or not. • State what audience might be interested in and influenced by the reported findings. It will be interesting to people, who study molecular biology, biochemistry for translation regulation, especially uORFs. The modeling people may also find it interesting, how they could adapt modeling to complex biology process and contribute to the understanding. • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. I have extensive experience working in the translation regulation field and I feel extremely comfortable to discuss all the experimental part including individual reporters as well as genome wide. But I do not consider myself an expert in the modelling section of this work.

      Reviewer #3 :

      Summary Small ORFs are prevalent in eukaryotic genomes with variety of functions. Recent technological advances enable their detection, yet our understanding on the mode of action remains quite rudimentary. The manuscript by Bottorff, Geballe and Subramaniam aims at elucidating the function of UL4 uORF in the CMV, and thus, it is on timely and topical research. The authors measure the uORF-controlled expression of the well-studied UL4 uORF and kinetically model the initiation behavior. Within a second uORF, a diproline pair controls initiation of the downstream main ORF sensing ribosomal collisions between a scanning small subunit and an 80S positioned at the canonical start of the main ORF. The stalling at both proline codons is envisioned as a kinetic window to sense any elongation-competent 80S at initiation and thus, control the ribosomal load and expression. Such diproline tandems are present in some uORFs in human transcriptome, hence representing more pervasive control mechanism. Significance I am unable to comment in depth on the modeling algorithms and simulations as this is outside of my expertise. The experiments are reasonably designed to test various models of uORF regulation and set the frame for the modelling. The idea that various stress factors would decrease canonical initiation and consequently would reflect the number of initiating ribosomes are adequately tested by varying the number of initiating ribosomes. The discovery of the two terminal prolines, that are also found in other human uORFs, is appealing mode of controlling stalling-driven downstream initiation. However, the lack of experimental support with the human uORFs may indicate additional contributions. This raises the question as to whether the proline codon identity plays a role? Since codons are read with different velocity which is mirrored by the tRNA concentration, it would be good to address whether special proline codons have been evolutionarily selected in CMV and whether the kinetics of stalling strongly depends on the codon identity. Are both prolines in the tandem using the same codon? Along that line, are the same proline codons used in the human diproline-containing counterparts? Consequently, the P to A mutation may have altered the codon usage and could be the reason for the nonlinear effect in the human sequences. In this case, it would make sense to use Ala-codons with similar codon usage as the natural prolines?

      We thank the Reviewer for raising this point about the role of codon usage. The tandem proline residues do not use the same codon (CCG then CCT). The two C-terminal proline residues in uORF2 are necessary for the elongating ribosome stall (Bhushan et al., 2010; Degnin et al., 1993; Wilson et al., 2016), but it has been previously shown that the identity of the codon does not significantly impact repression (Degnin et al., 1993). The human uORFs generally have 1 of the 2 Pro codons in common with the uORF2 Pro codons. Given that most of the human uORF P to A mutations behave similarly (Figure 7) irrespective of the original proline codon, we believe that codon usage does not impact repression by these uORFs. Moreover, as explained in response to Reviewer 1 and 2’s questions, we believe that the human uORFs containing terminal diprolines may partially repress translation via nascent peptide effects, but the majority of the repression likely arises from efficient siphoning of scanning ribosomes from the main ORF by the uORF (Fig. 1A in our manuscript).
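
      The codon check described above can be sketched as follows. This is an illustration only: the helper names and the short example sequence are hypothetical (the sequence is not the UL4 uORF2 sequence and merely ends in CCG CCT before the stop codon, as stated above for uORF2), and the only biological input assumed is the set of four proline codons in the standard genetic code.

```python
PROLINE_CODONS = {"CCT", "CCC", "CCA", "CCG"}  # standard genetic code
STOP_CODONS = {"TAA", "TAG", "TGA"}


def terminal_codon_pair(orf_dna: str) -> tuple[str, str]:
    """Return the last two sense codons of an ORF given as DNA (start to stop)."""
    seq = orf_dna.upper()
    if len(seq) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]  # drop the stop codon if present
    if len(codons) < 2:
        raise ValueError("need at least two sense codons")
    return codons[-2], codons[-1]


def describe_diproline(orf_dna: str) -> dict:
    """Report whether the ORF ends in Pro-Pro and whether both codons match."""
    first, second = terminal_codon_pair(orf_dna)
    is_diproline = first in PROLINE_CODONS and second in PROLINE_CODONS
    return {
        "codons": (first, second),
        "terminal_diproline": is_diproline,
        "same_codon": is_diproline and first == second,
    }


# Hypothetical toy ORF: ATG AAA CCG CCT TAA
print(describe_diproline("ATGAAACCGCCTTAA"))
```

      Applying such a check to each human uORF alongside its measured repression would be one way to tabulate whether the original proline codon tracks with the effect of the P to A mutation.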

      References

      Bhushan, S., Meyer, H., Starosta, A.L., Becker, T., Mielke, T., Berninghausen, O., Sattler, M., Wilson, D.N., and Beckmann, R. (2010). Structural Basis for Translational Stalling by Human Cytomegalovirus and Fungal Arginine Attenuator Peptide. Molecular Cell 40, 138–146.

      Boersma, S., Khuperkar, D., Verhagen, B.M.P., Sonneveld, S., Grimm, J.B., Lavis, L.D., and Tanenbaum, M.E. (2019). Multi-Color Single-Molecule Imaging Uncovers Extensive Heterogeneity in mRNA Decoding. Cell 178, 458–472.e19.

      Bohlen, J., Fenzl, K., Kramer, G., Bukau, B., and Teleman, A.A. (2020). Selective 40S Footprinting Reveals Cap-Tethered Ribosome Scanning in Human Cells. Molecular Cell 79, 561–574.e5.

      Cao, J., and Geballe, A.P. (1995). Translational inhibition by a human cytomegalovirus upstream open reading frame despite inefficient utilization of its AUG codon. J Virol 69, 1030–1036.

      Cao, J., and Geballe, A.P. (1996). Coding sequence-dependent ribosomal arrest at termination of translation. Molecular and Cellular Biology 16, 603–608.

      Cao, J., and Geballe, A.P. (1998). Ribosomal release without peptidyl tRNA hydrolysis at translation termination in a eukaryotic system. RNA 4, 181–188.

      Darnell, A.M., Subramaniam, A.R., and O’Shea, E.K. (2018). Translational Control through Differential Ribosome Pausing during Amino Acid Limitation in Mammalian Cells. Molecular Cell 71, 229–243.e11.

      Degnin, C., Schleiss, M., Cao, J., and Geballe, A. (1993). Translational inhibition mediated by a short upstream open reading frame in the human cytomegalovirus gpUL4 (gp48) transcript. Journal of Virology.

      Gaba, A., Wang, H., Fortune, T., and Qu, X. (2020). Smart-ORF: a single-molecule method for accessing ribosome dynamics in both upstream and main open reading frames. Nucleic Acids Research.

      Gamerdinger, M., Kobayashi, K., Wallisch, A., Kreft, S.G., Sailer, C., Schlömer, R., Sachs, N., Jomaa, A., Stengel, F., Ban, N., et al. (2019). Early Scanning of Nascent Polypeptides inside the Ribosomal Tunnel by NAC. Mol Cell 75, 996–1006.e8.

      Han, P., Shichino, Y., Schneider-Poetsch, T., Mito, M., Hashimoto, S., Udagawa, T., Kohno, K., Yoshida, M., Mishima, Y., Inada, T., et al. (2020). Genome-wide Survey of Ribosome Collision. Cell Reports 31, 107610.

      Harris, L.A., Hogg, J.S., Tapia, J.-J., Sekar, J.A.P., Gupta, S., Korsunsky, I., Arora, A., Barua, D., Sheehan, R.P., and Faeder, J.R. (2016). BioNetGen 2.2: advances in rule-based modeling. Bioinformatics 32, 3366–3368.

      Ivanov, I.P., Shin, B.-S., Loughran, G., Tzani, I., Young-Baird, S.K., Cao, C., Atkins, J.F., and Dever, T.E. (2018). Polyamine Control of Translation Elongation Regulates Start Site Selection on the Antizyme Inhibitor mRNA via Ribosome Queuing. Mol Cell 70, 254–264.e6.

      Janzen, D.M., Frolova, L., and Geballe, A.P. (2002). Inhibition of translation termination mediated by an interaction of eukaryotic release factor 1 with a nascent peptidyl-tRNA. Mol Cell Biol 22, 8562–8570.

      Kearse, M.G., and Wilusz, J.E. (2017). Non-AUG translation: a new start for protein synthesis in eukaryotes. Genes Dev 31, 1717–1731.

      Kearse, M.G., Goldman, D.H., Choi, J., Nwaezeapu, C., Liang, D., Green, K.M., Goldstrohm, A.C., Todd, P.K., Green, R., and Wilusz, J.E. (2019). Ribosome queuing enables non-AUG translation to be resistant to multiple protein synthesis inhibitors. Genes Dev 33, 871–885.

      Kozak, M. (1989). Circumstances and mechanisms of inhibition of translation by secondary structure in eucaryotic mRNAs. Mol Cell Biol 9, 5134–5142.

      Lareau, L.F., Hite, D.H., Hogan, G.J., and Brown, P.O. (2014). Distinct stages of the translation elongation cycle revealed by sequencing ribosome-protected mRNA fragments. eLife 3, e01257.

      Manjunath, H., Zhang, H., Rehfeld, F., Han, J., Chang, T.-C., and Mendell, J.T. (2019). Suppression of Ribosomal Pausing by eIF5A Is Necessary to Maintain the Fidelity of Start Codon Selection. Cell Reports 29, 3134–3146.e6.

      Matheisl, S., Berninghausen, O., Becker, T., and Beckmann, R. (2015). Structure of a human translation termination complex. Nucleic Acids Res 43, 8615–8626.

      Morisaki, T., Lyon, K., DeLuca, K.F., DeLuca, J.G., English, B.P., Zhang, Z., Lavis, L.D., Grimm, J.B., Viswanathan, S., Looger, L.L., et al. (2016). Real-time quantification of single RNA translation dynamics in living cells. Science 352, 1425–1429.

      Nilsson, O.B., Hedman, R., Marino, J., Wickles, S., Bischoff, L., Johansson, M., Müller-Lucks, A., Trovato, F., Puglisi, J.D., O’Brien, E.P., et al. (2015). Cotranslational Protein Folding inside the Ribosome Exit Tunnel. Cell Reports 12, 1533–1540.

      Noderer, W.L., Flockhart, R.J., Bhaduri, A., Diaz de Arce, A.J., Zhang, J., Khavari, P.A., and Wang, C.L. (2014). Quantitative analysis of mammalian translation initiation sites by FACS-seq. Mol Syst Biol 10, 748.

      Stern-Ginossar, N., Weisburd, B., Michalski, A., Le, V.T.K., Hein, M.Y., Huang, S.-X., Ma, M., Shen, B., Qian, S.-B., Hengel, H., et al. (2012). Decoding Human Cytomegalovirus. Science 338, 1088–1093.

      Subramaniam, A.R., Zid, B.M., and O’Shea, E.K. (2014). An Integrated Approach Reveals Regulatory Controls on Bacterial Translation Elongation. Cell 159, 1200–1211.

      Wagner, S., Herrmannová, A., Hronová, V., Gunišová, S., Sen, N.D., Hannan, R.D., Hinnebusch, A.G., Shirokikh, N.E., Preiss, T., and Valášek, L.S. (2020). Selective Translation Complex Profiling Reveals Staged Initiation and Co-translational Assembly of Initiation Factor Complexes. Mol Cell 79, 546–560.e7.

      Wang, H., Sun, L., Gaba, A., and Qu, X. (2020). An in vitro single-molecule assay for eukaryotic cap-dependent translation initiation kinetics. Nucleic Acids Res 48, e6.

      Weber, R., Chung, M.-Y., Keskeny, C., Zinnall, U., Landthaler, M., Valkov, E., Izaurralde, E., and Igreja, C. (2020). 4EHP and GIGYF1/2 Mediate Translation-Coupled Messenger RNA Decay. Cell Reports 33, 108262.

      Wilson, D.N., Arenz, S., and Beckmann, R. (2016). Translation regulation via nascent polypeptide-mediated ribosome stalling. Current Opinion in Structural Biology 37, 123–133.

      Wolin, S.L., and Walter, P. (1988). Ribosome pausing and stacking during translation of a eukaryotic mRNA. EMBO J 7, 3559–3569.

      Wu, B., Eliscovich, C., Yoon, Y.J., and Singer, R.H. (2016). Translation dynamics of single mRNAs in live cells and neurons. Science 352, 1430–1435.

      Wu, C.C.-C., Zinshteyn, B., Wehner, K.A., and Green, R. (2019). High-Resolution Ribosome Profiling Defines Discrete Ribosome Elongation States and Translational Regulation during Cellular Stress. Molecular Cell 73, 959–970.e5.

      Yan, X., Hoek, T.A., Vale, R.D., and Tanenbaum, M.E. (2016). Dynamics of Translation of Single mRNA Molecules In Vivo. Cell 165, 976–989.

      Young, D.J., Meydan, S., and Guydosh, N.R. (2021). 40S ribosome profiling reveals distinct roles for Tma20/Tma22 (MCT-1/DENR) and Tma64 (eIF2D) in 40S subunit recycling. Nat Commun 12, 2976.

      Yu, J., Chen, M., Huang, H., Zhu, J., Song, H., Zhu, J., Park, J., and Ji, S.-J. (2018). Dynamic m6A modification regulates local translation of mRNA in axons. Nucleic Acids Research 46, 1412–1423.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #3

      Evidence, reproducibility and clarity

      Small ORFs are prevalent in eukaryotic genomes with variety of functions. Recent technological advances enable their detection, yet our understanding on the mode of action remains quite rudimentary. The manuscript by Bottorff, Geballe and Subramaniam aims at elucidating the function of UL4 uORF in the CMV, and thus, it is on timely and topical research. The authors measure the uORF-controlled expression of the well-studied UL4 uORF and kinetically model the initiation behavior. Within a second uORF, a diproline pair controls initiation of the downstream main ORF sensing ribosomal collisions between a scanning small subunit and an 80S positioned at the canonical start of the main ORF. The stalling at both proline codons is envisioned as a kinetic window to sense any elongation-competent 80S at initiation and thus, control the ribosomal load and expression. Such diproline tandems are present in some uORFs in human transcriptome, hence representing more pervasive control mechanism.

      Significance

      I am unable to comment in depth on the modeling algorithms and simulations as this is outside of my expertise. The experiments are reasonably designed to test various models of uORF regulation and set the frame for the modelling. The idea that various stress factors would decrease canonical initiation and consequently would reflect the number of initiating ribosomes are adequately tested by varying the number of initiating ribosomes. The discovery of the two terminal prolines, that are also found in other human uORFs, is appealing mode of controlling stalling-driven downstream initiation. However, the lack of experimental support with the human uORFs may indicate additional contributions. This raises the question as to whether the proline codon identity plays a role? Since codons are read with different velocity which is mirrored by the tRNA concentration, it would be good to address whether special proline codons have been evolutionarily selected in CMV and whether the kinetics of stalling strongly depends on the codon identity. Are both prolines in the tandem using the same codon? Along that line, are the same proline codons used in the human diproline-containing counterparts? Consequently, the P to A mutation may have altered the codon usage and could be the reason for the nonlinear effect in the human sequences. In this case, it would make sense to use Ala-codons with similar codon usage as the natural prolines?

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      In this paper, the authors are exploring the uORF regulatory mechanism. They first discussed five general models how uORFs might work to repress and buffering main ORF translation, then they mainly focus on the UL4 uORF2 for the potential mechanism. They use both computer modeling and experimental validation with reporter assay in 293t cell line. Based on their model, and few experimental results when they change the translation initiation rate and/or length of dORF, they propose it may work through 40S dissociation model, since the buffering effect is not uORF length sensitive.

      Major comments:

      • Are the key conclusions convincing? The modeling for different mechanisms is insightful, but some modeling parameters and experimental validation are not conclusive and validation of few of them can enforce the conclusions.
      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? Yes, the part about queuing and length sensitive is not convincing to me, it should be modified and reduce the statement strength.
      • Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. Yes, please see the major concerns
      • Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. They will need to re-think about the modeling, and validation in Figure 5, there are validation experiments that can be done in weeks and in a cost-efficient manner that can enforce the conclusions.
      • Are the data and the methods presented in such a way that they can be reproduced? Most of them are good
      • Are the experiments adequately replicated and statistical analysis adequate? Yes

      I have some major concerns about the paper:

      1. It is a bit confusing to me in Figure 2C, the reporter assays, that non-start codon reporter and non-stall reporter has same expression level. In theory, the non-stall reporter still has uORF there, so it should repress main ORF expression, and have lower expression level than the non-start reporter, where there is no uORF, no repression. In other uORFs they tested in Figure 6, the non-stall reporters are lower than non-start reporter. Since data they use to build the model is Figure 2B, and calculate the parameters for the whole paper, I just want to make sure it is making sense. I noticed there is another CTG in frame on the 4th codon, this may be alternative start codon in the non-start reporter to trigger some repression.
      2. All the modeling and prediction the authors do are based on average, but we know translation is very heterogeneous. For each ribosome or each 40S, the kinetics varies a lot, the authors should discuss about this part.
      3. For modeling related with the queuing-mediated model in Figure 1C. they use 30nt as the ribosome length to count the potential queuing to start codon. But 30nt is the 80S protected fragment with specific conformation. The protected fragment for 80S will change based on different status of ribosome conformation or elongation step. More importantly, for queuing, it is 40S, so they may have a different size. Based on previous 40S ribosome profiling (Archer, Stuart K., et al. Nature 535.7613 (2016): 570-574. And other papers), the length can vary from 19nt to very long, so I don't think the 30nt length can be used to model queuing in 40S and length sensitivity in the uORF working mechanism.
      4. For Figure 5B, besides the modeling length part I have mentioned above, when the authors increase the length of uORF, the sequence is also changed, which may introduce other side effect. So, if the authors want to conclude about the queuing part, they should rethink about the length for both modeling and validation, plus control for the sequence they added to increase the length of uORF, for example use different sequence when manipulate the length.

      Minor comments:

      • Specific experimental issues that are easily addressable. It is unclear how the luciferase assays were analyzed considering the background noise. If the NLuc expression is low, close to the background, then how to extract or normalize the background will influence the expression level, thus fold change for different reporter/condition.
      • Are prior studies referenced appropriately? yes
      • Are the text and figures clear and accurate? Mostly good
      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? Have a main figure about the modeling part.

      Significance

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field. It is an interesting area, using modeling with experiment validation to understand uORF regulation mechanism, the kinetics and interplay between different translation steps, it will help us to understand uORF buffering in stress conditions. Also bring modeling method with reporter validation to the translation field, will provide clues to the molecular mechanism study, especially in complex situation.
      • Place the work in the context of the existing literature (provide references, where appropriate). Recent years, there has been a lot of study about small open reading frames, while for uORFs are known to repress translation, the regulatory mechanism is not known yet, there are just different models not validated yet (Young & Wek, 2016). Also, under normal conditions and stress conditions, uORF can play both repressive and stimulative role in main ORF translation (Orr, Mona Wu, et al. NAR 48.3 (2020): 1029-1042.). This paper is the first study to put all the uORF working hypothesis with buffering effect together, they use modeling to explain how under each hypothesis, buffering may happen or not.
      • State what audience might be interested in and influenced by the reported findings. It will be interesting to people, who study molecular biology, biochemistry for translation regulation, especially uORFs. The modeling people may also find it interesting, how they could adapt modeling to complex biology process and contribute to the understanding.
      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. I have extensive experience working in the translation regulation field and I feel extremely comfortable to discuss all the experimental part including individual reporters as well as genome wide. But I do not consider myself an expert in the modelling section of this work.
    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Bottorff et al test several models of uORF-mediated regulation of main ORF translation using the uORF2 of CMV UL4 gene, a system that has been previously experimentally characterized by the authors. They first train a computational model to recapitulate the observed experimental effects of mutations in uORF2, and then use the model to infer which uORF parameters may confer buffering against reduced ribosome loading that typically occurs upon biological perturbation. The authors then find that: i) the uORF2 confers buffering, ii) the uORF2 mechanism adjusts to computational predictions for the collision-mediated 40S dissociation model of uORF-mediated regulation.

      Major comments:

      1. Figure 4: Which is the dynamic range of the WT vs the no-stall construct? In the WT construct, main ORF translation is already quite repressed, and detecting further repression may be more difficult than in the no-stall construct. In other words, the differences that authors are detecting between the WT and no-stall constructs might be due to a potential lower dynamic range of the WT construct.
      2. The authors conclude that uORF2 follows the collision-mediated 40S dissociation model, based on fitness of their experimental results with predictions from their mathematical modeling regarding distance between uORF2 initiation codon and the stalling site. But can the authors actually directly prove that there are no 40S subunits accumulating behind the stalled 40S using Ribo-Seq or TCP-Seq?
      3. Experimental data in Figures 2, 4 and 5 include 3 technical replicates. Sound conclusions typically require biological replicates. Further, the number of replicates in Figure 6 has not been indicated.

      Minor comments:

      1. Figure 4: It is strange that a PEST sequence had to be introduced in the construct of part B in order to observe reliable differences, but not in constructs of parts A and C. Can the authors explain?
      2. Figure 6: Unfortunately, the authors find no other human uORFs with terminal diproline motifs that are so essential for main ORF repression as uORF2. In this light, can the authors comment further on the usefulness of their findings for human genes? Have the authors searched for viral RNAs with similar features? Please, notice that the gene PPP1R37 has not been mentioned in the main text.

      Significance

      This manuscript represents an interesting effort to distinguish mechanisms of uORF-mediated regulation based on mathematical modeling, and might be useful for the translation community. My expertise: Regulation of translation.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.



      Reply to the reviewers

      1. General Statements [optional]


      The goal of our study was to evaluate the role of the RNA binding protein SAM68 in the regulation of cell adhesion and adaptation of endothelial cells to their extracellular environment. We showed that SAM68 depletion affected endothelial cell behavior by impairing adhesion site maturation and compromising basement membrane assembly.

      We are pleased that the reviewers found our study to be interesting, well written and clear, with findings that are supported by carefully designed experiments. Importantly, we would like to thank the reviewers for their careful analysis of our work and for their clear and constructive comments.

      One common query was whether the regulation of β-actin mRNA localization at adhesion sites, and FN1 gene transcription by SAM68 in endothelial cells involves direct interactions with the mRNA and promoter, respectively. This important point will be addressed with additional experiments in order to strengthen our hypothesis.

      A second point that emerged from the reviews relates to the interdependence of SAM68 multi-layered effects on cell adhesions and FN1 gene transcription. Our response to this issue is discussed below and has been clarified in the revised manuscript.

      Lastly, since in vivo studies are not feasible locally or in a reasonable timeframe, our claim that SAM68 tunes an endothelial morphogenetic program has been toned down in the revised manuscript. Nonetheless, our data clearly show that SAM68 is a major regulator of endothelial adhesion and conditioning of the subendothelial basement membrane.

      Altogether, the proposed experiments and revisions will solidify our data and improve our study thus providing “a significant advance towards understanding the multiple roles of RNA-binding proteins and their coordination in a study system with physiologically relevant connections”, as stated by Reviewer#3.

      2. Description of the planned revisions


      To answer critical points raised by the three referees, we plan to supplement our work with three main sets of experiments:

      Set 1 of experiments: Analysis of direct interaction between SAM68 and β-actin mRNA by RIP in endothelial cells according to an improved version of published protocols and results from (Li and Richard, 2016).

      Set 2 of experiments: Analysis of a direct interaction between SAM68 and the FN1 promoter by ChIP in endothelial cells according to published protocols and results from (Li and Richard, 2016).

      Set 3 of experiments: Assessment of the dual functions of SAM68 and their interconnections by i) FN rescue (expression of exogenous FN in SAM68-depleted cells) or ii) by expression of SAM68 mutants.

      In addition, we are generating tools to address the dynamic localization of β-actin mRNA in endothelial cells following SAM68 perturbations (MS2 lentiviral constructs and antisense oligonucleotides designed to abrogate SAM68 recruitment onto β-actin mRNA).

      Below, we describe how these sets of experiments will address the Reviewers’ comments and queries in a point-by-point reply.

      Reviewer #1

      • The authors claim that SAM68 interacts with β-actin mRNA to deliver it to sites of adhesion only based on siRNA-mediated knockdown experiments. Is the binding of SAM68 to β-actin a dynamic process that changes with time? The authors could perform RIP experiments at different stages of cell adhesion - from early points when SAM68 is peripheral to later stages when it is homogeneously distributed - to show a potential dynamic interaction with β-actin mRNA.

      β-actin mRNA has been previously identified as a direct target of SAM68 in several published works performed by different groups (Itoh et al., 2002; Klein et al., 2013; Mukherjee et al., 2019). The SAM68 binding site has been mapped to a 50-nt sequence located in the 3’UTR of β-actin mRNA, but direct binding of SAM68 onto β-actin mRNA has never been shown in endothelial cells. To this end, we will perform RIP experiments (Set 1 of experiments) to first identify direct recruitment of SAM68 to β-actin mRNA in endothelial cells (as suggested by Reviewer 2 as well). Secondly, to address the dynamics of SAM68 interactions with β-actin mRNA, we will assess direct interactions of SAM68 with β-actin mRNA at different stages of cell adhesion. These experiments will be conducted using an adapted version of the published SAM68 RIP protocol (Li and Richard, 2016).

      • The article would substantially benefit from live visualisation of β-actin localisation with MS2 tagged transcripts in SAM68 knockdown contexts. This would solidify the proposed mRNA delivery SAM68-mediated mechanism. Although this should not be hard to carry out given the availability of MS2-labelled animals, I understand access to the tools may constitute a major hurdle.

      As mentioned by Reviewer 1, access to MS2-labelled animals and carrying out in vivo experiments in mouse endothelial cells would be a roadblock for our team in the context of this work. Nonetheless, we fully agree that live visualization of β-actin mRNA recruitment at adhesions would solidify our hypothesis. Therefore, we are currently setting up in cellulo experiments in endothelial cells to visualize MS2-β-actin reporters (Yoon et al., 2016) in the presence of control or SAM68 binding site-directed antisense blocking oligonucleotides, as previously described (Klein et al., 2013).

      Minor comment:

      • Could the authors run the eGFP-SAM68 movies for longer periods to show the dynamic localisation of the protein during spreading? These experiments would support the data based on fixed material.

      We thank Reviewer 1 for this suggestion and will adjust our imaging pipeline for longer acquisitions, taking care that extended laser exposure does not affect cell dynamics.

      Reviewer #2

      • The authors reference previous work defining SAM68 as a beta-actin mRNA interacting protein, however, experiments confirming this in endothelial cells and that this occurs during normal focal adhesion assembly are important.

      This concern will be addressed by Set 1 of experiments (RIP assays) as described in our response to the comments of Reviewer 1.

      • Likewise, experiments addressing how important this action is for focal adhesion function are critical. For example, the beta-actin RNA-binding site of SAM68 could be identified and perturbed to assess the direct impact of this mRNA delivery on FAK-Y397 phosphorylation, focal adhesion assembly, adhesion, cell spreading and migration/sprouting. Without these or similar experiments, the importance of SAM68-mediated beta-actin mRNA delivery is unknown.

      The beta-actin RNA-binding site of SAM68 has previously been identified (Itoh et al., 2002) and antisense blocking oligonucleotides designed to target this sequence have been shown to abrogate SAM68 recruitment onto β-actin mRNA in neurons (Klein et al., 2013). In order to determine whether SAM68 delivery of β-actin mRNA is directly involved in focal adhesion assembly and signalling, we will use the published antisense oligonucleotides to block SAM68 recruitment and assess FAK-Y397 phosphorylation in our bead model.

      • Indeed, if this is not important for FAK-Y397 phosphorylation and focal adhesion assembly, then experiments need to be designed to assess how SAM68 achieves FAK phosphorylation/maturation to provide any significant insight into SAM68 function.

      To address this point, we have generated an RNA binding mutant of SAM68 (KH domain) for analysis of FAK phosphorylation using the bead assay. Importantly, SAM68 is a multi-domain protein that harbors protein/protein interaction domains (SH2 and SH3 binding domains) and it is known to act as a scaffolding protein, in TNFRα signaling for instance (Ramakrishnan and Baltimore, 2011). Therefore, we have also generated lentiviral constructs containing mutations in the SH2 or SH3 binding domains of SAM68 to interrogate its potential signaling adaptor function.

      • The data as presented suggest that a key function of SAM68 is to drive fibronectin (and perhaps other ECM gene) transcription. However, more experiments are needed to validate this conclusion. For example, increased FN1 promoter activity in luciferase assays may be an indirect consequence of feedback to the promoter upon SAM68-mediated action on, amongst other possible actions, focal adhesion signaling, FN transcript splicing or ECM remodeling. Experiments confirming that SAM68 interacts with the endogenous ECM gene promoter would be critical (e.g. via ChIP), as would disruption of the trans-activating action of SAM68 to directly assess the impact of this function (versus modulation of focal adhesion dynamics) on focal adhesion assembly, adhesion, cell spreading and migration/sprouting.

      We fully agree that ChIP experiments to identify recruitment of SAM68 onto the endogenous FN1 promoter in endothelial cells would be required to confirm direct transcriptional activation of the FN1 gene in these cells. Therefore, we will perform these experiments (Set 2) according to a published SAM68 ChIP protocol (to be adapted for endothelial cells) which allowed for the demonstration of specific recruitment of SAM68 onto P21 or PUMA promoters, as well as its transcriptional co-activating activity (Li and Richard, 2016).

      Regarding possible indirect effects on FN1 promoter activity in the luciferase assay shown in Figure 1F on HEK293 cells, we would like to point out that, in addition to their high transfection efficiency, HEK293 cells were chosen for this assay because they display nearly undetectable expression of FN and are unable to assemble the molecule (even upon overexpression of exogenous FN; see Efthymiou et al., 2021). Thus, our results using this system support a direct effect of SAM68 on FN promoter activity. This information has been added to the revised text.

      • In parallel, rescue experiments to determine how recovery of endothelial FN expression impacts adhesion, cell spreading, and migration/sprouting (upon SAM68 knockdown) would determine how important this action is to control of endothelial cell behavior.

      Our previously published data showed that autocrine FN expression regulates adhesion, spreading and migration of endothelial cells and that differences in FN expression levels affect assembly of the protein (Cseh et al., 2010; Radwanska et al., 2017). In SAM68-depleted cells, with compromised FN expression, the rescue of FN expression should allow us to uncouple SAM68 functions at adhesion sites from its role as a transcriptional regulator of FN expression (Set 3 of experiments). Expression of exogenous FN in SAM68-depleted endothelial cells will be performed using lentiviral FN expression constructs described by our team (Efthymiou et al., 2021).

      • Likewise, experiments designed to determine if broader disruption of COL8A1, POSTN, FBLM1 and BGN expression are direct (or indirect, e.g., due to FN disruption) would be important to understand SAM68 function.

      The same set of experiments (Set 3) will be used to analyze by qRT-PCR the expression of COL8A1, POSTN, FBLN1 and BGN mRNAs upon the rescue of FN expression in SAM68-depleted cells.

      • Loss of SAM68 expression in other cell types is known to perturb migration, whereas migration is enhanced in endothelial cells upon SAM68 knockdown. Why would this be the case? Is it that the proposed negative impact of FN production on motility is greater than the positive impact of SAM68 focal adhesion dynamics in endothelial cells versus other cell types? Exploration of the relative impact of these proposed dual functions (using additional experiments as mentioned above) is critical to make sense of these somewhat conflicting observations.

      This point, relating to the balance between the negative impact of SAM68-stimulated FN production on motility and the positive impact of SAM68 on focal adhesion dynamics in endothelial cells, is very interesting. Set 3 of experiments, which includes expression of exogenous FN and assessment of cell motility in SAM68-depleted endothelial cells, should allow us to clarify this issue.

      Previous work has implicated phosphorylation of SAM68 as a key trigger of its activity (Locatelli and Lange, 2011, Naro et al., 2022). Additional work exploring the impact of SAM68 phosphorylation on focal adhesion dynamics and ECM gene expression/remodeling (e.g. using phospho-mutants) in this manuscript would have strengthened the message.

      The regulation of SAM68 activity by phosphorylation is a complex question as SAM68 has multiple sites of phosphorylation by serine/threonine and tyrosine kinases. One of these sites (Y440) is a known substrate of Src, a major kinase activated at the cell membrane during adhesion. We are currently generating a Src phosphorylation mutant of SAM68 (Y440F) which could be used to address the impact of SAM68 phosphorylation on integrin signaling and ECM gene expression/remodeling.

      Reviewer #3

      • The authors describe the observed phenotypes as resulting from 'coalescent activities' of SAM68 that play a role in the adaptation of ECs to the extracellular environment. However, it is unclear whether and which of the observed effects result from direct local functions of different SAM68 pools, versus reflecting indirect downstream consequences of one major function. For example, the effects on transcription could be a result of altered adhesion signaling and might occur independently of nuclear SAM68. Or the effects on adhesions could be an indirect consequence of altered transcription of ECM genes, independent of the transient accumulation of SAM68 at the periphery. To support that these are distinct and direct SAM68 functions, the authors would have to provide more evidence for the involvement of SAM68 in the studied processes (e.g. is SAM68 observed by ChIP at promoter regions of ECM genes whose transcription is affected?)

      As recommended by Reviewer 2 as well, we will perform ChIP experiments to document the direct recruitment of SAM68 onto the FN1 promoter (Set 2 of experiments).

      • and try to uncouple them to assess their relative contributions and potential connections in the observed phenotypes (e.g. it would be informative to attempt to rescue the knockdown phenotypes with mutants of SAM68 that cannot be imported into the nucleus, or that cannot bind RNA, or that cannot be phosphorylated by Src).

      Set 3 of experiments should allow us to uncouple the dual functions of SAM68 in endothelial cells. In these experiments, integrin signaling defects will be evaluated in SAM68-depleted cells following the rescue of FN expression. Persistence of the adhesion site defect would indicate that transcriptional activity and adhesion site regulation by SAM68 are distinct events. Moreover, as indicated above, we are generating lentiviral constructs of SAM68 mutants with impaired ability to bind RNA or be phosphorylated by Src (Y440F), in order to assess their effect on integrin signaling.

      Minor comment:

      Also, is the effect of SAM68 depletion on pY397-FAK levels local and/or transient? It would be useful to present data on the total amount of pY397-FAK (by IF or western) in control and si-SAM68 cells at early and late stages of spreading.

      This point is very interesting and will be tested at early versus late stages of spreading.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.

      Reviewer #1

      • The authors state that "...both submembranous functions [...] and nuclear functions [...] of SAM68 contribute to the morphogenetic phenotype of angiogenic endothelial cells." Some caution must be taken, as all previous data were obtained from 2D experiments. At this stage, other mechanisms involved in 3D migration cannot be excluded.

      We fully agree with this reviewer’s comment and we have modified the manuscript to take into account the fact that we cannot exclude other mechanisms of action for SAM68 in 3D endothelial cell sprouting experiments. However, it is noteworthy that the migration per se of individual cells is not measured in our 3D experiments.

      Minor comments:

      • In Figure 1F, there is a drop in luciferase activity in cells transfected with higher amounts of vector, rather than an increase with SAM68. Why?

      The luciferase reporter assay is a convenient and well-accepted means of evaluating promoter activities; however, it requires the transfection of increasing amounts of expression plasmids, which often contain strong promoters such as CMV (in our case). Depending on the experimental conditions, a drop in luciferase activity is often observed, due to titration of general transcription factors. In our experiments shown in Figure 1F, despite the observed drop in luciferase activity in pcDNA 3.1-transfected cells, transfection of increasing amounts of the SAM68 expression vector induced a significant increase in luciferase activity.

      • The authors claim (and rightly so) that plasmids are hard to transfect into HUVECs when describing luciferase reporter assays. However, they express eGFP-SAM68 (presumably from a plasmid).

      eGFP-SAM68 was delivered and expressed in endothelial cells using a lentiviral vector (this has been specified in the revised manuscript: legend to Movie supplement 1). Although eGFP-SAM68 is successfully expressed, the efficiency of infection is rather low. Thus, this method is adequate when experiments require observations at the single-cell level, such as imaging of endothelial cells expressing eGFP-SAM68. However, the low infection efficiency makes it unsuited for the observation of global effects on the cell population, as is the case for a luciferase assay in which all cells from a given experimental condition are lysed.

      • Some experimental details in the figure legends could be restricted / moved to the methods section.

      • Some typos and British/American spelling inconsistencies (e.g. localisation and localization) need to be corrected throughout the manuscript.

      • Statistical analysis details could be mentioned in figure legends.

      • In page 11, "... proposed to be involved in regulation of early cell adhesion processes and spreading" needs referencing.

      • Y axes in some graphs do not start at 0, which may mislead visual interpretations.

      • "Figure 2-figure supplement 1" in page should read "movie supplement 1"

      We thank Reviewer #1 for these comments which have all been taken into account. Appropriate changes/corrections have been made in the revised version of the manuscript.

      Reviewer #2

      • The title of the manuscript states that SAM68 modulates the morphogenetic program of endothelial cells, yet there are no studies of blood vessel morphogenesis described by this work. Ultimately, in vivo studies of vessel development in SAM68 mutant mice would be required to be able to make this claim.

      We agree that only endothelial cell morphogenesis, and not blood vessel morphogenesis, has been addressed in this study. In light of the reviewer’s recommendation to tone down claims that SAM68 tunes an endothelial morphogenetic program, we have modified the revised manuscript text and title.

      • Place the work in the context of the existing literature (provide references, where appropriate).

      SAM68 has previously been identified as an RNA-binding protein associated with the 'adhesome' that regulates cell motility (Huot et al., 2009a, Locatelli and Lange, 2011, Naro et al., 2022). Here, Rekad and colleagues also probe the action of SAM68 in endothelial cell migration, but find this to be enhanced upon SAM68 knockdown - unlike previous studies demonstrating a reduction in motility in similar experiments in other cell types. Indeed, a detailed discussion of this discrepancy would have been appreciated.

      As recommended by Reviewer #2, we have included a more detailed discussion of this point in the revised manuscript.

      Reviewer #3

      Figure 3C: Is the n=3 indicative that only 3 beads were analyzed? Given the relatively small difference, a larger sample size would be useful.

      We thank the Reviewer for pointing out this mistake. Three independent experiments have been performed with quantification of at least 12 beads for each condition. The manuscript has been corrected accordingly (N=3).

      Page 5-6: The statement 'nearly all adhesion sites in SAM68-depleted cells remained smaller than 0.75 um' doesn't seem to accurately reflect the data presented in the right panel of Figure 1C.

      We have corrected the units of average adhesion size to µm². Nearly all adhesion sites in SAM68-depleted cells remained smaller than 0.75 µm².

      Page 6: there is a reference to a G418 phosphotyrosine antibody. Do the authors mean 4G10 antibody? Also, there is a mention that materials are listed in Supplemental tables 1 and 2, but these were not attached.

      We thank the Reviewer for having noted these typos, and the fact that we omitted to attach Supplemental Tables 1 and 2. This has been corrected in the revised manuscript, to be submitted with the Supplemental Tables.

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

      Reviewer #1

      • deHoog et al. 2004 (doi.org/10.1016/S0092-8674(04)00456-8) had shown the presence of SAM68 in SICs. Why do the authors believe that the presence of SAM68 in the periphery in endothelial cells does not mark the formation of SICs in these cells?

      Spreading Initiation Centers (SICs) are described as structures involved in the early steps of adhesion that contain SAM68, along with other RNA binding proteins, in MRC5 cells (de Hoog et al., 2004). In the same paper, to test whether SICs are a general feature of cell adhesion, the authors evaluated the presence of SICs during adhesion of several other cell types, including endothelial cells (HUVEC). Among the 6 types of cells tested, SICs were not observed in nonfibroblastic cell types. In accordance with this study, we did not observe SICs, as defined by de Hoog et al., in endothelial cells plated onto FN.

      • In Shestakova et al 2001 (doi.org/10.1073/pnas.121146098), decreased localisation of B-actin mRNA leads to reduced persistence of direction of movement. Was this measured? Is this not seen here because SAM68 is only responsible for B-actin mRNA localisation at early stages of adhesion?

      We thank Reviewer 1 for this comment. After re-analysis of our migration data, we did not detect a significant effect on the persistence of migration in our experimental conditions. This could indeed reflect the temporal regulation by SAM68 of β-actin mRNA localization at the leading edge of cells, although we cannot exclude additional defects caused by SAM68 depletion on adhesion stability and lamellipodial protrusion, and consequently on cell polarity and directional motility.

      • Although the authors claim that altered ECM deposition in SAM68 deficient cells results from altered transcription, they do not address potential misregulation of translation and secretion.

      We did not address misregulated translation here as FN mRNA levels were significantly decreased in SAM68-depleted cells. The decreased transcript levels were accompanied by decreased protein levels. Upon depletion of SAM68, we detected less FN in both “soluble” (conditioned medium) and “insoluble” (ECM-associated) forms, as shown in the western blots of Figure 4-figure supplement 1. We do not believe that SAM68 silencing impacts FN secretion, as we did not observe differential retention of FN in the cytoplasm of SAM68-depleted cells compared to control cells by immunostaining (Figure 4C). Rather, FN staining was strictly fibrillar (ECM-associated) in both control and SAM68-depleted cells, and the intensity profile baseline values were similarly low. This point has been added to the revised manuscript.

      • In fact, they highlight that whilst the levels of some mRNAs encoding basement membrane proteins do not decrease in the absence of SAM68, their incorporation was severely affected. This is worth exploring to strengthen the manuscript.

      This issue was not addressed for other basement membrane components. However, the dichotomy in expression and matrix incorporation of certain basement membrane components is most likely due to the sequential and hierarchical nature of ECM assembly. FN is one of the earliest ECM proteins to be assembled and observations from multiple laboratories have shown that FN orchestrates the assembly of multiple matrix components (reviewed in Dallas et al., 2006; Marchand et al., 2019), including COLIV (Filla et al., 2017; Miller et al., 2014).

      • Whilst the data presented in figure 7 is convincing, some more detailed mechanistic analyses could help further comprehend 2D and 3D behaviours. Could it be that the nuclear and cellular roles of SAM68 are somewhat decoupled depending on the environment? Could the RNA localisation functions have a critical role in endothelial sprouting and not so much in 2D migration? Some insights are needed to address these questions and wrap up some loose ends. In its current form, this section of the manuscript is too vague.

      It is known that 2D and 3D culture conditions induce differences in cell behavior, notably through differences in the physical (rigid vs pliable) and biochemical (plastic vs fibrin gel) nature of the environments that differentially regulate mechanotransduction, integrin signaling, cell polarity, etc. Here, we show that migration of endothelial cells on rigid 2D substrates is increased upon SAM68 depletion. On the other hand, the ability of cells to align in capillary-like cords and invade a 3D environment is reduced. Mechanistically, effects of SAM68 on FN production and ECM assembly are likely involved in both contexts by providing an adhesive substrate that restricts cell motility in 2D, or bridges neighboring cells and promotes cell survival in 3D. The purpose of performing the sprouting assay presented here in addition to cell migration assays was not to compare the same functions of SAM68 in these 2 different contexts but rather to illustrate that SAM68 controls endothelial cell behavior in both 2D and 3D environments and thus could significantly impact angiogenesis.

      Reviewer #2

      • Many of the findings are rather superficial or observational, and a detailed mechanistic understanding of SAM68 function is lacking. For example, loss of SAM68 expression reduces beta-actin mRNA recruitment to sites of fibronectin-coated bead adhesion, but how is this regulated and what is its impact on focal adhesion dynamics?

      Both the role of beta-actin mRNA localization in cell adhesion dynamics and the impact of reducing this localization have been extensively documented (Katz et al., 2012; Kislauskis et al., 1994; Shestakova et al., 2001; see also Herbert and Costa, 2019, and references therein). In particular, a specific RNA binding protein called ZBP1 has been shown to localize actin mRNA near focal adhesions (Katz et al., 2012) by a Src kinase-dependent mechanism (Hüttelmaier et al., 2005).

      References

      Cseh B, Fernandez-Sauze S, Grall D, Schaub S, Doma E, Van Obberghen-Schilling E. 2010. Autocrine fibronectin directs matrix assembly and crosstalk between cell-matrix and cell-cell adhesion in vascular endothelial cells. J Cell Sci 123:3989–3999. doi:10.1242/jcs.073346

      Dallas SL, Chen Q, Sivakumar P. 2006. Dynamics of Assembly and Reorganization of Extracellular Matrix Proteins. Current Topics in Developmental Biology. Academic Press. pp. 1–24. doi:10.1016/S0070-2153(06)75001-3

      de Hoog CL, Foster LJ, Mann M. 2004. RNA and RNA binding proteins participate in early stages of cell spreading through spreading initiation centers. Cell 117:649–662. doi:10.1016/s0092-8674(04)00456-8

      Efthymiou G, Radwanska A, Grapa A-I, Beghelli-de la Forest Divonne S, Grall D, Schaub S, Hattab M, Pisano S, Poet M, Pisani DF, Counillon L, Descombes X, Blanc-Féraud L, Van Obberghen-Schilling E. 2021. Fibronectin Extra Domains tune cellular responses and confer topographically distinct features to fibril networks. J Cell Sci 134:jcs252957. doi:10.1242/jcs.252957

      Filla MS, Dimeo KD, Tong T, Peters DM. 2017. Disruption of fibronectin matrix affects type IV collagen, fibrillin and laminin deposition into extracellular matrix of human trabecular meshwork (HTM) cells. Exp Eye Res 165:7–19. doi:10.1016/j.exer.2017.08.017

      Herbert SP, Costa G. 2019. Sending messages in moving cells: mRNA localization and the regulation of cell migration. Essays Biochem 63:595–606. doi:10.1042/EBC20190009

      Hüttelmaier S, Zenklusen D, Lederer M, Dictenberg J, Lorenz M, Meng X, Bassell GJ, Condeelis J, Singer RH. 2005. Spatial regulation of beta-actin translation by Src-dependent phosphorylation of ZBP1. Nature 438:512–515. doi:10.1038/nature04115

      Itoh M, Haga I, Li Q-H, Fujisawa J. 2002. Identification of cellular mRNA targets for RNA-binding protein Sam68. Nucleic Acids Res 30:5452–5464. doi:10.1093/nar/gkf673

      Katz ZB, Wells AL, Park HY, Wu B, Shenoy SM, Singer RH. 2012. β-Actin mRNA compartmentalization enhances focal adhesion stability and directs cell migration. Genes Dev 26:1885–1890. doi:10.1101/gad.190413.112

      Kislauskis EH, Zhu X, Singer RH. 1994. Sequences responsible for intracellular localization of beta-actin messenger RNA also affect cell phenotype. J Cell Biol 127:441–451. doi:10.1083/jcb.127.2.441

      Klein ME, Younts TJ, Castillo PE, Jordan BA. 2013. RNA-binding protein Sam68 controls synapse number and local β-actin mRNA metabolism in dendrites. Proc Natl Acad Sci U S A 110:3125–3130. doi:10.1073/pnas.1209811110

      Li N, Richard S. 2016. Sam68 functions as a transcriptional coactivator of the p53 tumor suppressor. Nucleic Acids Res 44:8726–8741. doi:10.1093/nar/gkw582

      Marchand M, Monnot C, Muller L, Germain S. 2019. Extracellular matrix scaffolding in angiogenesis and capillary homeostasis. Semin Cell Dev Biol 89:147–156. doi:10.1016/j.semcdb.2018.08.007

      Miller CG, Pozzi A, Zent R, Schwarzbauer JE. 2014. Effects of high glucose on integrin activity and fibronectin matrix assembly by mesangial cells. Mol Biol Cell 25:2342–2350. doi:10.1091/mbc.e14-03-0800

      Mukherjee J, Hermesh O, Eliscovich C, Nalpas N, Franz-Wachtel M, Maček B, Jansen R-P. 2019. β-Actin mRNA interactome mapping by proximity biotinylation. Proc Natl Acad Sci U S A 116:12863–12872. doi:10.1073/pnas.1820737116

      Radwanska A, Grall D, Schaub S, Divonne SB la F, Ciais D, Rekima S, Rupp T, Sudaka A, Orend G, Van Obberghen-Schilling E. 2017. Counterbalancing anti-adhesive effects of Tenascin-C through fibronectin expression in endothelial cells. Sci Rep 7:12762. doi:10.1038/s41598-017-13008-9

      Ramakrishnan P, Baltimore D. 2011. Sam68 Is Required for Both NF-κB Activation and Apoptosis Signaling by the TNF Receptor. Mol Cell 43:167–179. doi:10.1016/j.molcel.2011.05.007

      Shestakova EA, Singer RH, Condeelis J. 2001. The physiological significance of beta-actin mRNA localization in determining cell polarity and directional motility. Proc Natl Acad Sci U S A 98:7045–7050. doi:10.1073/pnas.121146098

      Yoon YJ, Wu B, Buxbaum AR, Das S, Tsai A, English BP, Grimm JB, Lavis LD, Singer RH. 2016. Glutamate-induced RNA localization and translation in neurons. Proc Natl Acad Sci U S A 113:E6877–E6886. doi:10.1073/pnas.1614267113

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      In this manuscript Rekad et al. investigate the role of the RNA-binding protein SAM68 in the interactions of endothelial cells with the extracellular matrix. They knock down SAM68 expression using siRNAs and demonstrate a role in cell migration and angiogenic sprouting. They further extensively characterize the molecular effects of SAM68 loss, including effects associated with adhesions (number and size of focal adhesions, actin cytoskeleton organization, integrin signaling and delivery of beta-actin mRNA to adhesions), as well as effects on transcription and organization of ECM components. They also observe that SAM68 transiently localizes near adhesion sites during early stages of cell spreading, apart from its predominant presence in the nucleus. Based on this, they suggest that SAM68 affects migration and sprouting of endothelial cells through coordinated functions carried out locally both at adhesion sites (where SAM68 controls integrin signaling and mRNA delivery) as well as in the nucleus (where it controls transcription and mRNA splicing). The manuscript is clearly written, and the presented experiments are well-performed. However, the conclusions drawn from these experiments are not fully supported, and in various instances, statements regarding the underlying mechanisms are inferred based on reports of SAM68 functions in other systems.

      For example, the authors describe the observed phenotypes as resulting from 'coalescent activities' of SAM68 that play a role in the adaptation of ECs to the extracellular environment. However, it is unclear whether and which of the observed effects result from direct local functions of different SAM68 pools, versus reflecting indirect downstream consequences of one major function. For example, the effects on transcription could be a result of altered adhesion signaling and might occur independently of nuclear SAM68. Or the effects on adhesions could be an indirect consequence of altered transcription of ECM genes, independent of the transient accumulation of SAM68 at the periphery.

      To support that these are distinct and direct SAM68 functions, the authors would have to provide more evidence for the involvement of SAM68 in the studied processes (e.g. is SAM68 observed by ChIP at promoter regions of ECM genes whose transcription is affected?) and try to uncouple them to assess their relative contributions and potential connections in the observed phenotypes (e.g. it would be informative to attempt to rescue the knockdown phenotypes with mutants of SAM68 that cannot be imported into the nucleus, or that cannot bind RNA, or that cannot be phosphorylated by Src).

      In the absence of additional data, the work is quite descriptive and relies on extrapolations from other studies for supporting the proposed mechanistic model. If further evidence is provided to support it, it would amount to a significant advance towards understanding the multiple roles of RNA-binding proteins and their coordination in a study system with physiologically relevant connections.

      Minor comments for clarifying some existing data:

      Figure 3C: Is the n=3 indicative that only 3 beads were analyzed? Given the relatively small difference, a larger sample size would be useful. Also, is the effect of SAM68 depletion on pY397-FAK levels local and/or transient? It would be useful to present data on the total amount of pY397-FAK (by IF or western) in control and si-SAM68 cells at early and late stages of spreading.

      Page 5-6: The statement 'nearly all adhesion sites in SAM68-depleted cells remained smaller than 0.75 um' doesn't seem to accurately reflect the data presented in the right panel of Figure 1C.

      Page 6: there is a reference to a G418 phosphotyrosine antibody. Do the authors mean 4G10 antibody? Also, there is a mention that materials are listed in Supplemental tables 1 and 2, but these were not attached.

      Significance

      See above

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      In this manuscript, Rekad and colleagues investigate the function of the RNA-binding protein SAM68 in endothelial cell-ECM interactions and remodeling. SAM68 was previously identified as a component of the 'adhesome', and Rekad et al. further confirm transient co-localization with nascent focal adhesions and reveal that loss of SAM68 triggers disruption to cell spreading and adhesion maturation in endothelial cells. Using fibronectin-coated beads to trigger cell-ECM interactions, Rekad and colleagues further show that pFAK-Y397 is reduced upon SAM68 loss, as is recruitment of beta-actin mRNA to sites of focal adhesion assembly. In parallel, Rekad et al. uncover additional impacts of SAM68 knockdown on fibronectin deposition, assembly, expression, promoter activity and splicing - as well as broader impacts on expression of other ECM proteins. Overall, this work suggests a dual role for SAM68 in regulation of endothelial focal adhesion dynamics and ECM assembly that negatively regulates cell migration and positively regulates cell sprouting in in vitro assays.

      Major comments:

      • Are the key conclusions convincing?

      The data are well presented and the impact of SAM68 depletion (or over-expression) on focal adhesion state, ECM composition and endothelial cell behavior appears clear. However, the direct function of SAM68 in the observed phenomena remains untested, and interpretation of results either relies heavily on observations based in other systems, or is inadequately followed up with detailed studies of the mechanisms involved. Thus, several conclusions on SAM68 function are not entirely convincing and need to be bolstered with additional experiments.

      • Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      Yes, several claims need to be supported with additional experiments (as detailed below). Moreover, the claim that SAM68 tunes an endothelial morphogenetic program is far too speculative based on observations using in vitro assays of vessel branching that do not fully recapitulate vessel morphogenesis. Without additional in vivo work in mouse, or other model systems in which blood vessel morphogenesis can be adequately observed, these claims need to be toned down significantly.

      • Would additional experiments be essential to support the claims of the paper?

      Yes, please see the details below:

      1. Many of the findings are rather superficial or observational, and a detailed mechanistic understanding of SAM68 function is lacking. For example, loss of SAM68 expression reduces beta-actin mRNA recruitment to sites of fibronectin-coated bead adhesion, but how is this regulated and what is its impact on focal adhesion dynamics? The authors reference previous work defining SAM68 as a beta-actin mRNA interacting protein, however, experiments confirming this in endothelial cells and that this occurs during normal focal adhesion assembly are important. Likewise, experiments addressing how important this action is for focal adhesion function are critical. For example, the beta-actin RNA-binding site of SAM68 could be identified and perturbed to assess the direct impact of this mRNA delivery on FAK-Y397 phosphorylation, focal adhesion assembly, adhesion, cell spreading and migration/sprouting. Without these or similar experiments, the importance of SAM68-mediated beta-actin mRNA delivery is unknown. Indeed, if this is not important for FAK-Y397 phosphorylation and focal adhesion assembly, then experiments need to be designed to assess how SAM68 achieves FAK phosphorylation/maturation to provide any significant insight into SAM68 function.
      2. The data as presented suggest that a key function of SAM68 is to drive fibronectin (and perhaps other ECM gene) transcription. However, more experiments are needed to validate this conclusion. For example, increased FN1 promoter activity in luciferase assays may be an indirect consequence of feedback to the promoter upon SAM68-mediated action on, amongst other possible actions, focal adhesion signaling, FN transcript splicing or ECM remodeling. Experiments confirming that SAM68 interacts with the endogenous ECM gene promoter would be critical (e.g. via ChIP), as would disruption of the trans-activating action of SAM68 to directly assess the impact of this function (versus modulation of focal adhesion dynamics) on focal adhesion assembly, adhesion, cell spreading and migration/sprouting. In parallel, rescue experiments to determine how recovery of endothelial FN expression impacts adhesion, cell spreading and migration/sprouting (upon SAM68 knockdown) would determine how important this action is to control of endothelial cell behavior. Likewise, experiments designed to determine if broader disruption of COL8A1, POSTN, FBLM1 and BGN expression are direct (or indirect, e.g. due to FN disruption) would be important to understand SAM68 function.
      3. The title of the manuscript states that SAM68 modulates the morphogenetic program of endothelial cells, yet there are no studies of blood vessel morphogenesis described by this work. Ultimately, in vivo studies of vessel development in SAM68 mutant mice would be required to be able to make this claim.
      4. Loss of SAM68 expression in other cell types is known to perturb migration, whereas migration is enhanced in endothelial cells upon SAM68 knockdown. Why would this be the case? Is it that the proposed negative impact of FN production on motility is greater than the positive impact of SAM68 focal adhesion dynamics in endothelial cells versus other cell types? Exploration of the relative impact of these proposed dual functions (using additional experiments as mentioned above) is critical to make sense of these somewhat conflicting observations.

      • Are the suggested experiments realistic in terms of time and resources?

      Yes, although this will depend on local availability of personnel and resources.

      • Are the data and the methods presented in such a way that they can be reproduced?

      Yes

      • Are the experiments adequately replicated and statistical analysis adequate?

      Yes

      Minor comments:

      • Specific experimental issues that are easily addressable.

      n/a

      • Are prior studies referenced appropriately?

      Yes

      • Are the text and figures clear and accurate?

      Yes

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      n/a

      Significance

      • Describe the nature and significance of the advance (e.g. conceptual, technical, clinical) for the field.

      Here, Rekad and colleagues identify SAM68 as a potential key modulator of focal adhesion dynamics, as its knockdown impacts focal adhesion maturation, signaling and delivery of beta-actin mRNA. Moreover, the authors identify SAM68 expression as being critical for correct ECM gene expression and assembly. Finally, Rekad et al. show that loss of SAM68 expression impacts endothelial cell migration and sprouting in in vitro assays. Overall, these observations identify SAM68 as a key regulator of endothelial cell-ECM interactions that could play an important role in regulating endothelial cell morphogenesis in vivo. However, the mechanistic basis of SAM68 function in focal adhesion dynamics and ECM remodeling remains unclear. Additionally, whether these activities reflect direct actions of SAM68 on focal adhesions and/or ECM expression/remodeling or are indirect consequences of one effect on the other remains unclear. Finally, relevance to blood vessel morphogenesis in vivo also remains unclear.

      • Place the work in the context of the existing literature (provide references, where appropriate).

      SAM68 has previously been identified as an RNA-binding protein associated with the 'adhesome' that regulates cell motility (Huot et al., 2009a, Locatelli and Lange, 2011, Naro et al., 2022). Here, Rekad and colleagues also probe the action of SAM68 in endothelial cell migration, but find this to be enhanced upon SAM68 knockdown - unlike previous studies demonstrating a reduction in motility in similar experiments in other cell types. Indeed, a detailed discussion of this discrepancy would have been appreciated. However, the work by Rekad et al. goes further than previous studies to convincingly demonstrate that SAM68 expression impacts cell-ECM interactions - although the mechanisms of this action are less clear. Previous work has implicated phosphorylation of SAM68 as a key trigger of its activity (Locatelli and Lange, 2011, Naro et al., 2022). Additional work exploring the impact of SAM68 phosphorylation on focal adhesion dynamics and ECM gene expression/remodeling (e.g. using phospho-mutants) in this manuscript would have strengthened the message.

      • State what audience might be interested in and influenced by the reported findings.

      The work would be of interest to audiences studying the molecular basis of cell-ECM interactions and/or the broader vascular biology field.

      • Define your field of expertise with a few keywords to help the authors contextualize your point of view. Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      Keywords: Vascular biology. Endothelial. Vascular development.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      This interesting manuscript by Rekad et al. explores unappreciated roles of SAM68 in the context of endothelial biology and angiogenesis. The authors apply a series of in vitro assays to demonstrate cytoplasmic and nuclear functions of SAM68 during endothelial cell adhesion, remodelling and ECM deposition. In accordance with other studies, SAM68 localises to the cell periphery during early stages of adhesion, and it is involved in integrin activity. In its absence, FAK signalling is impaired and focal adhesions fail to mature. The involvement of SAM68 in adhesion goes hand-in-hand with its RNA localisation roles during the distribution of B-actin transcripts towards sites of adhesion. On the other hand, SAM68 induces FN deposition, presumably via the positive regulation of FN1 transcription. The abundance of cellular isoforms of FN1 mRNAs is particularly reduced in its absence, suggesting a role in alternative splicing. The authors claim that this is achieved through transcription regulation rather than direct modulation of splicing factors. Other transcripts downstream of SAM68 include basement membrane components, suggesting that its nuclear activity serves as another level of control of adhesion. Finally, the authors claim that whilst the loss of SAM68 enhances motility in 2D, presumably due to the reduced ECM deposition, it reduces endothelial cell invasion in 3D models of angiogenesis.

      Major comments:

      Overall, the article is well written and clear, and the findings are supported by carefully designed experiments. The statistical methods seem adequate, and the information provided is likely to allow reproducibility. However, some of the results are rather preliminary and could be supported by further experimental work:

      • deHoog et al. 2004 (doi.org/10.1016/S0092-8674(04)00456-8) had shown the presence of SAM68 in SICs. Why do the authors believe that the presence of SAM68 in the periphery in endothelial cells does not mark the formation of SICs in these cells?
      • The authors claim that SAM68 interacts with B-actin mRNA to deliver it to sites of adhesion based only on siRNA-mediated knockdown experiments. Is the binding of SAM68 to B-actin a dynamic process that changes with time? The authors could perform RIP experiments at different stages of cell adhesion - from early points when SAM68 is peripheral to later stages when it is homogeneously distributed - to show a potential dynamic interaction with B-actin mRNA.
      • The article would substantially benefit from live visualisation of B-actin localisation with MS2 tagged transcripts in SAM68 knockdown contexts. This would solidify the proposed mRNA delivery SAM68-mediated mechanism. Although this should not be hard to carry out given the availability of MS2-labelled animals, I understand access to the tools may constitute a major hurdle.
      • In Shestakova et al 2001 (doi.org/10.1073/pnas.121146098), decreased localisation of B-actin mRNA leads to reduced persistence of direction of movement. Was this measured? Is this not seen here because SAM68 is only responsible for B-actin mRNA localisation at early stages of adhesion?
      • Although the authors claim that altered ECM deposition in SAM68 deficient cells results from altered transcription, they do not address potential misregulation of translation and secretion. In fact, they highlight that whilst the levels of some mRNAs encoding basement membrane proteins do not decrease in the absence of SAM68, their incorporation was severely affected. This is worth exploring to strengthen the manuscript.
      • Whilst the data presented in figure 7 is convincing, some more detailed mechanistic analyses could help further comprehend 2D and 3D behaviours. Could it be that the nuclear and cellular roles of SAM68 are somewhat decoupled depending on the environment? Could the RNA localisation functions have a critical role in endothelial sprouting and not so much in 2D migration? Some insights are needed to address these questions and wrap up some loose ends. In its current form, this section of the manuscript is too vague.
      • The authors state that "...both submembranous functions [...] and nuclear functions [...] of SAM68 contribute to the morphogenetic phenotype of angiogenic endothelial cells." Some caution must be taken, as all previous data were obtained from 2D experiments. At this stage, other mechanisms involved in 3D migration cannot be excluded.

      Minor comments:

      • "Figure 2-figure supplement 1" in page should read "movie supplement 1"
      • Could the authors run the eGFP-SAM68 movies for longer periods to show the dynamic localisation of the protein during spreading? These experiments would support the data based on fixed material.
      • In Figure 1F, there is a drop in luciferase activity in cells transfected with higher amounts of vector, rather than an increase with SAM68. Why?
      • The authors claim (and rightly so) that plasmids are hard to transfect into HUVECs when describing luciferase reporter assays. However, they express eGFP-SAM68 (presumably from a plasmid).
      • Some experimental details in the figure legends could be restricted / moved to the methods section.
      • Some typos and British/American spelling inconsistencies (e.g. localisation and localization) need to be corrected throughout the manuscript.
      • Statistical analysis details could be mentioned in figure legends.
      • Y axes in some graphs do not start at 0, which may mislead visual interpretations.
      • In page 11, "... proposed to be involved in regulation of early cell adhesion processes and spreading" needs referencing.

      Significance

      This is a very interesting study that details the multi-layered activity of the RBP SAM68. Although many of the individual roles had been unveiled in other cell types, here the authors suggest that this protein simultaneously orchestrates several aspects of endothelial cell adhesion via distinct routes. Thus, the study may be most relevant for researchers working in the fields of developmental and pathological angiogenesis.

      However, the work falls short of being a conceptual advance in its current form, as some conclusions are not fully backed by experimental evidence.

      My expertise: RNA localisation

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements [optional]

      We appreciate the effort the two reviewers have invested in reviewing our manuscript. Their constructive comments will help improve the paper overall.

      2. Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      The main point of the current report is that 6mA is present in DNA of Hydractinia, and is introduced randomly into the genome by DNA polymerases, originating from degradation of maternally provided RNA via nucleotide salvage pathway. The authors observed that 6mA levels are changing over development and peak at 16-cell stage, with a sudden decrease to 'background levels' at 64 cell stage, a stage when zygotic genome gets activated. The 6mA drop is Alkbh1 dependent, since upon K/D of Alkbh1, 6mA levels were significantly higher than in control embryos. Authors also observed that Alkbh1 K/D delays zygotic genome activation (ZGA) to later stages, but without any noticeable consequences for the proper development. To demonstrate that 6mA is not controlled via direct DNA methylation, they show that K/D of two potential DNA methyl transferases N6amt1 and Mettl4 does not have any effect on 6mA levels. Supporting their hypothesis, authors demonstrate high activity and imperfect selectivity towards non-modified nucleotides of salvage pathway during embryo development using EU labeling experiments.

      In general, the provided data support their model; however, the paper needs some improvements to include missing information and controls before publication.

      Major comments:

      1. Fig 1A shows a schematic where D3-6mA is added to only the QTRAP but not the QQQ experiment. Usually, QQQ methods also require isotopic standards for each component quantified to normalize for ionization differences and provide true quantitative information. Why did authors not use dA isotope? The ionization suppression is more pronounced at high concentrations of the components, which is true for dA in the current set up. How do authors control or at least test this?

      We have limited resources of isotopic-labelled standards. Therefore, we initially used QQQ without these standards to obtain data that covered many time points in development to identify the general pattern and key time points of high and low 6mA. Once the QQQ indicated that the 16-cell stage has the highest 6mA and that this drops to background at the 64-cell stage (and remains so later on), we performed QTRAP with the isotope-labeled standard control only for these two stages. Looking at the data resulting from both techniques, it appears that they essentially revealed the same pattern. Since the main focus of the study is on 16- and 64-cell embryos, we feel that the contribution of performing all stages by QTRAP would be marginal. We have performed control experiments to assess ionization suppression for dA and found that it was insignificant. We will add the corresponding data to the Materials and Methods section.

      Fig S1 show that quantification works well, but were the total DNA amounts comparable to the gDNA amount used in actual samples? If yes, please indicate so.

      Yes, the amounts were the same (2mg). We will change the methods sections accordingly.

      1. In line 68 and in fig 1B, 1C there is a mysterious 'Neg. Ctrl' sample. It is unclear what the sample was and, more interestingly, in fig 1B the levels in this sample are 0.015% but in fig 1C they are much below 0.001%. Why is there such a striking difference for the identical sample?

      Negative controls were the same amounts (2 µg) of oligonucleotides without 6mA, DNAse-treated exactly like the samples. Figure 1B shows that QQQ is not sensitive enough to reliably detect 6mA concentrations below 0.02% and cannot distinguish the background 6mA in the negative control from the level of 6mA in the 64-cell stage and later. Therefore, we utilized D3-6mA labelled QTRAP (Figure 1C) and determined that the level in the 64-cell stage embryos was actually ~0.01%. In the negative control, the amount was considerably lower, around 6 ppm (0.0006%).
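
      As a quick check of the units quoted in this reply, the conversion between parts per million and percent can be sketched as follows. This is plain arithmetic only; the function names are illustrative assumptions and are not part of the authors' analysis pipeline.

```python
def ppm_to_percent(ppm: float) -> float:
    """Convert parts per million to percent (1 % equals 10,000 ppm)."""
    return ppm / 10_000


def percent_to_ppm(percent: float) -> float:
    """Convert percent to parts per million."""
    return percent * 10_000


# Values quoted in the reply: ~6 ppm in the negative control,
# ~0.01 % in 64-cell stage embryos.
neg_ctrl_percent = ppm_to_percent(6)
stage64_ppm = percent_to_ppm(0.01)
fold_difference = stage64_ppm / 6  # 64-cell stage relative to negative control

print(f"negative control: {neg_ctrl_percent:.4f} %")
print(f"64-cell stage: {stage64_ppm:.0f} ppm")
print(f"fold difference: {fold_difference:.1f}x")
```

      On these figures, the 64-cell stage level sits roughly an order of magnitude above the negative control.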

      1. As I can see authors measured natural isotopologue of 6mA, however traces of contaminant bacterial DNA originating even from recombinant DNA degradation enzymes also have 6mA, giving background signal. In their LC/MS experiments, did authors check if the 6mA comes truly from the gDNA and not from contaminant during DNA purification and processing before MS?

      Yes, we did. As control for the level of 6mA contamination from the enzymatic digestion (sourced from bacteria), we also performed digestion of the negative control (see also answer to previous comment).

      1. Fig 1D in the legend: authors should indicate that samples were already RNAse treated, and Line 80 in the text mentions a second RNase treatment (fig S1C) to confirm the specificity of the DNA staining.

      The samples were indeed RNase-treated. We will modify the legend and the reference to figure 1D on line 80 accordingly.

      1. In lines 86-87, authors compare the LC/MS and sequencing based quantifications, and say they are consistent. Can authors make a figure analogous to fig 1B but using sequencing data?

      The data are already provided in Figure S1E. However, we used a Venn diagram to denote that these figures were generated by a different type of analysis (SMRT-sequencing as opposed to QTRAP). They are consistent but not identical.

      1. Fig 3B and 3C: controls showing the validity of EU staining are required, such as an RNAse-treated sample with the signal disappearing, or control embryos without EU, thus having only background signal.

      Indeed, Fig 3C shows an RNase treated sample in which the EU signal is abolished as expected.

      1. Fig 3D specificity control is missing, control embryos without EdU having only background signal.

      The control is provided in Figure 3B. It shows a sample without EdU (treated with EU) and shows the background signal.

      1. Fig 4A legend: 'rescue solution (see text)'. Please describe in the legend what the solution was. Moreover, I did not find a clear explanation in the text either; my only guess was from the materials in the methods section, where authors used both shAlkbh1 and Alkbh1 mRNA with silent mutations.

      The reviewer is right, this was indeed the control that was used. We will modify the text to clarify this point.

      1. Fig 4B shows many data points per condition and the legend says EU signals (in triplicate). Were these triplicate animals with multiple cells, where the EU signal from each cell was plotted as a point? Please specify in the legend.

      Yes, triplicate embryos were used and each cell was plotted as a point. The legend will be adapted.

      1. Lines 169-170 state 'the lack of premature ZGA following N6amt1/Mettl4 knockdown (Figure S7B) indicate a lack of methyl transferase that maintains 6mA through embryogenesis'. While the experiment indeed demonstrates that these are not the major players in this process, it does not prove these are not DNA methyl transferases. The absence of evidence is not the evidence of absence. I think authors should at least soften this conclusion.

      We agree and will tone down the relevant statement.

      1. Discussion section describes many experimental data that belong to Results section.

      This is a point also raised by Reviewer #2. We will move these points to the results and expand the discussion.

      1. Fig S8 I think should be a part of the main figure since it is one of the important experiments to prove the high activity and somewhat low selectivity of salvage pathway in the embryos during the critical early stages.

      We had originally left it out to save space. We prefer to leave this decision with the editor.

      1. Fig 5C the model is confusing, authors should improve it.

      It is difficult to describe a complex story using a single static model. Therefore, we will add an animation to the supplemental material to clarify the model.

      1. Fig S8 negative controls showing the specificity of CuAAC staining are missing: control animals/ embryos without EU.

      We will redo these experiments and include appropriate controls.

      1. Authors may find this reference useful: PMID: 32355286.

      We will add this ref.

      1. It is known that in mammals the ADAL protein is the one which demethylates the m6A nucleotide to clear it from the nucleotide pools and prevent it entering into the salvage pathway (PMID: 29884623). Does Hydractinia symbiolongicarpus have an ADAL analog? If yes, then it would be important to see if knockdown/overexpression of this enzyme has any effect on the timing of ZGA. In principle, passively introduced 6mA may be regulatory to properly time the ZGA, and is controlled via the activity of Adal and Alkbh1.

      The gene is present in the Hydractinia genome. We could perform the experiments recommended. We will knock the gene down and look at the effect of this manipulation on ZGA.

      1. Material and methods are missing information:
      a. Line 370-371: provide references to the protocols listed or describe the steps.
      b. Line 373: 'standard column based purification protocol'; either explain what it is or provide a reference.

      References will be provided.

      Minor points:
      Line 79: 'Fig 1D and S1B'. Did the authors mean 'Fig 1D and S1C'?
      Fig 5A: Y axis title is missing.
      Line 379: 3D1-6mA should be D3-6mA; please correct the other appearances as well.
      Line 405: the terms 'dsDNA solutions' and 'standard solutions' are confusing; please rephrase.
      Line 410: 'Cleaned embryos': what does 'cleaned' mean? Be specific.
      Line 413: PTx is mentioned; please explain what it is.
      Line 415 and line 440: HCl was washed and embryos were neutralized; I guess it should state: 'HCl was neutralized and embryos were washed with...'
      Line 431: 'before fixed by incubation in PAGA-T...'. Did the authors mean 'before fixation with PAGA-T...'?
      Line 435: 'Permeabilization was done by further washes the fixed embryos with...'. Did the authors mean 'Permeabilization was done by an additional wash of the fixed embryos with...'?
      Line 440: The HCl was washed with what solution?
      Line 446: For how long were the PTx washes?
      Lines 458-460: the sentence is confusing.
      Line 500: 'then used detect' should be 'then used to detect'.

      We will adopt all minor points above.

      Reviewer #1 (Significance):

      There are many high profile papers describing the existence of 6mA in gDNA of different organisms including insects and mammals. However, there is no proof that it has any biological function. Indeed, recent reports (PMID: 32355286 and 32203414) indicate that in mammalian cells, 6mA is indeed primarily incorporated by DNA polymerases and originates from a salvage pathway. The present report is the first in vivo evidence that confirms this to be the case more generally and, importantly, demonstrates a 6mA effect on ZGA. Hence, this is an important and timely report, which will be interesting to the field, as well as a broad audience to clarify the role of 6mA and the mechanism whereby it is introduced into gDNA.

      My expertise: Biochemistry and biology of DNA and RNA modifications, including 6mA. Fair expertise: bioinformatics analysis.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The manuscript reports developmental dynamics of DNA 6mA in the cnidarian Hydractinia symbiolongicarpus. The authors describe an event of a seemingly random accumulation of this DNA modification in 16-cell stage embryos of Hydractinia symbiolongicarpus followed by an apparent clearance of 6mA by the 64-cell stage. Interestingly, the depletion of the cnidarian orthologue of the putative 6mA 'demethylase', Alkbh1, results in a delay in zygotic transcription accompanied by high levels of DNA 6mA in 64-cell stage cnidarian embryos. The authors suggest that the 6mA they observe originates from random misincorporation of recycled degraded m6A-marked ribo-nucleotides during early cnidarian embryogenesis.

      Overall, most of the experiments are performed at a high technical level and the paper is generally nicely written. Despite this, in my opinion, the manuscript would benefit from incorporation of several additional controls and answering a number of points on the description/presentation of the data.

      Major comments:

      1. In the present version of the manuscript, the authors demonstrate the negative correlation between the presence of 6mA in the genome of cnidarian embryos and transcription. Although, the depletion of Alkbh1 leads to the delay in ZGA, strictly speaking, this effect may be independent of the catalytic function of Alkbh1. Therefore, to make a statement that m6A "random incorporation into the early embryonic genome inhibits transcription" the authors should use a catalytically inactive form of this enzyme as a control in the corresponding experiments and/or (ideally) perform in vitro transcription assays using 6mA-containing substrates.

      We could perform shRNA-mediated Alkbh1 KD and try to rescue ZGA by co-injecting a catalytically-inactive Alkbh1 mRNA.

      The suggested in vitro experiment would be inconclusive for two reasons: first, Hydractinia polymerase may respond differently to 6mA; second, 6mA-mediated transcription inhibition could be indirect, requiring the in vivo context. We would like to add that transcription inhibition by 6mA has been demonstrated in vitro using yeast DNA polymerase as cited in the paper.

      1. Despite several experiments suggesting that random incorporation of recycled ribonucleotides occurs in cnidarian embryos, the source of 6mA in their DNA seems currently unclear. Would it be possible to directly test the authors' hypothesis by comparing the levels of 6mA upon maternal (and possibly zygotic) depletion of the cnidarian orthologue of RNA m6A methyltransferase Mettl3 in cnidarian embryos? Alternatively, the authors could incubate the embryos in medium supplemented with labeled ribo-m6A followed by checking the levels of DNA 6mA in the embryonic DNA.

      We show that maternal mRNAs are already methylated in the early embryo (Figure 5). Therefore, it would indeed make sense to ablate Mettl3 from the maternal tissue while maternal mRNAs are methylated. However, in the absence of a conditional knockout technique in Hydractinia, this would require generation of CRISPR-Cas9 mutants that would likely die early in their development, long before reaching sexual maturity.

      Instead, we are happy to perform the other experiment suggested by the reviewer to directly demonstrate m6A to 6mA transition.

      Minor comments:

      1. It would be nice to complement Fig. 4, 5, and S7 with immunostaining of the corresponding embryos for 6mA.

      6mA immunostaining is not compatible with EU labeling because, first, they require different types of fixation (PAGA-T vs formaldehyde); second, immunostaining requires RNase treatment to remove m6A, which would also remove the EU signal.

      1. The current Discussion contains references for several figures with experimental results. I suggest separating these experimental data from the Discussion. The authors should, in my opinion, make an additional Results chapter and, if possible, expand the Discussion section (that is currently minimal) speculating on significance of their results for different biological systems.

      This has also been requested by Reviewer #1. We will follow the reviewer's recommendation.

      1. The present Title reads like a clear overstatement (at least currently, please see major comments above). The Title should also reference the organism where the observations have been made.

      Following the revision, we believe that both random incorporation of 6mA and a delay in zygotic transcription will be well supported by our data. We will add the organism's name to the title as suggested.

      Reviewer #2 (Significance):

      The presence and significance of DNA 6mA in animal genomes is a very interesting and highly controversial topic. Although a number of studies suggest that relatively high levels of this DNA modification occur in multicellular eukaryotes in different biological/functional contexts, other reports challenged these observations attributing them to different experimental artifacts. In this context, the current paper that provides high quality novel experimental data on the developmental dynamics of DNA 6mA in cnidarian is extremely interesting and timely. Moreover, the authors' results and the hypotheses on the function/origin of 6mA in cnidarian embryogenesis may provide a conceptual framework for the interpretation of other 6mA/m6A-related studies performed on different experimental models. Thus, this manuscript will definitely be of interest for a wide range of researchers working in the fields of epigenetics, cancer biology and developmental biology.

      I strongly believe that this is an interesting and important study that definitely deserves to be published in a high impact journal.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      Please insert a point-by-point reply describing the revisions that were already carried out and included in the transferred manuscript. If no revisions have been carried out yet, please leave this section empty.

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

      Reviewer #2 suggested four experiments, three of which are either impossible in our system or expected to reveal insignificant information. First, the reviewer suggests ablating Mettl3 from the maternal tissue. While being a good idea in principle, there is no conditional ablation technique available for Hydractinia. Generating CRISPR-Cas9 mutants would likely result in embryonic lethality, long before sexual maturation has been reached.

      Second, the reviewer proposed to perform in vitro experiments with m6A-containing substrates. These experiments are unlikely to reveal useful data since the Hydractinia polymerase may respond differently to methylated adenine than commercially available polymerases. Also, transcription inhibition may be indirect, depending on the in vivo context that cannot be mimicked in vitro.

      Finally, the reviewer suggested expressing a catalytically-dead Alkbh1 in the background of endogenous Alkbh1 knockdown to demonstrate that its function depends on the enzymatic activity to remove 6mA from the genome. While we could perform the experiment (see our reply above), the information emanating from it would arguably be outside the scope of this study.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The manuscript reports developmental dynamics of DNA 6mA in the cnidarian Hydractinia symbiolongicarpus. The authors describe an event of a seemingly random accumulation of this DNA modification in 16-cell stage embryos of Hydractinia symbiolongicarpus followed by an apparent clearance of 6mA by the 64-cell stage. Interestingly, the depletion of cnidarian orthologue of the putative 6mA 'demethylase', Alkbh1, results in delay in zygotic transcription accompanied by high levels of DNA 6mA in 64-cell stage cnidarian embryos. The authors suggest that the 6mA they observe originates from random misincorporation of recycled degraded m6A-marked ribo-nucleotides during early cnidarian embryogenesis.

      Overall, most of the experiments are performed at a high technical level and the paper is generally nicely written. Despite this, in my opinion, the manuscript would benefit from the incorporation of several additional controls and from answering a number of points on the description/presentation of the data.

      Major comments:

      1. In the present version of the manuscript, the authors demonstrate the negative correlation between the presence of 6mA in the genome of cnidarian embryos and transcription. Although the depletion of Alkbh1 leads to the delay in ZGA, strictly speaking, this effect may be independent of the catalytic function of Alkbh1. Therefore, to make a statement that m6A "random incorporation into the early embryonic genome inhibits transcription" the authors should use a catalytically inactive form of this enzyme as a control in the corresponding experiments and/or (ideally) perform in vitro transcription assays using 6mA-containing substrates.
      2. Despite several experiments suggesting that random incorporation of recycled ribonucleotides occurs in cnidarian embryos, the source of 6mA in their DNA seems currently unclear. Would it be possible to directly test the authors' hypothesis by comparing the levels of 6mA upon maternal (and possibly zygotic) depletion of the cnidarian orthologue of RNA m6A methyltransferase Mettl3 in cnidarian embryos? Alternatively, the authors could incubate the embryos in medium supplemented with labeled ribo-m6A followed by checking the levels of DNA 6mA in the embryonic DNA.

      Minor comments:

      1. It would be nice to complement Fig. 4, 5, and S7 with immunostaining of the corresponding embryos for 6mA.
      2. The current Discussion contains references for several figures with experimental results. I suggest separating these experimental data from the Discussion. The authors should, in my opinion, make an additional Results chapter and, if possible, expand the Discussion section (that is currently minimal) speculating on significance of their results for different biological systems.
      3. The present Title reads like a clear overstatement (at least currently, please see major comments above). The Title should also reference the organism where the observations have been made.

      Significance

      The presence and significance of DNA 6mA in animal genomes is a very interesting and highly controversial topic. Although a number of studies suggest that relatively high levels of this DNA modification occur in multicellular eukaryotes in different biological/functional contexts, other reports challenged these observations, attributing them to different experimental artifacts. In this context, the current paper, which provides high quality novel experimental data on the developmental dynamics of DNA 6mA in a cnidarian, is extremely interesting and timely. Moreover, the authors' results and hypotheses on the function/origin of 6mA in cnidarian embryogenesis may provide a conceptual framework for the interpretation of other 6mA/m6A-related studies performed on different experimental models. Thus, this manuscript will definitely be of interest for a wide range of researchers working in the fields of epigenetics, cancer biology and developmental biology.

      I strongly believe that this is an interesting and important study that definitely deserves to be published in a high impact journal.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      The main point of the current report is that 6mA is present in the DNA of Hydractinia, and is introduced randomly into the genome by DNA polymerases, originating from degradation of maternally provided RNA via the nucleotide salvage pathway. The authors observed that 6mA levels change over development and peak at the 16-cell stage, with a sudden decrease to 'background levels' at the 64-cell stage, a stage when the zygotic genome gets activated. The 6mA drop is Alkbh1 dependent, since upon K/D of Alkbh1, 6mA levels were significantly higher than in control embryos. The authors also observed that Alkbh1 K/D delays zygotic genome activation (ZGA) to later stages, but without any noticeable consequences for proper development. To demonstrate that 6mA is not controlled via direct DNA methylation, they show that K/D of two potential DNA methyl transferases, N6amt1 and Mettl4, does not have any effect on 6mA levels. Supporting their hypothesis, the authors demonstrate high activity and imperfect selectivity of the salvage pathway towards non-modified nucleotides during embryo development using EU labeling experiments.

      In general, the provided data support their model, however, the paper needs some improvements to include missing information and controls before publication.

      Major comments:

      1. Fig 1A shows a schematic where D3-6mA is added only to the QTRAP but not the QQQ experiment. Usually, QQQ methods also require isotopic standards for each component quantified, to normalize for ionization differences and provide true quantitative information. Why did the authors not use a dA isotope? Ionization suppression is more pronounced at high concentrations of the components, which is true for dA in the current set up. How do the authors control or at least test this? Fig S1 shows that quantification works well, but were the total DNA amounts comparable to the gDNA amount used in actual samples? If yes, please indicate so.
      2. In line 68 and in fig 1B, 1C there is a mysterious 'Neg. Ctrl' sample. It is unclear what the sample was and, more interestingly, in fig 1B the levels in this sample are 0.015% but in fig 1C they are much below 0.001%. Why is there such a striking difference for the identical sample?
      3. As I can see authors measured natural isotopologue of 6mA, however traces of contaminant bacterial DNA originating even from recombinant DNA degradation enzymes also have 6mA, giving background signal. In their LC/MS experiments, did authors check if the 6mA comes truly from the gDNA and not from contaminant during DNA purification and processing before MS?
      4. Fig 1D in the legend: authors should indicate that samples were already RNAse treated, and Line 80 in the text mentions a second RNase treatment (fig S1C) to confirm the specificity of the DNA staining.
      5. In lines 86-87, authors compare the LC/MS and sequencing based quantifications, and say they are consistent. Can authors make a figure analogous to fig 1B but using sequencing data?
      6. Fig 3B and 3C: controls showing the validity of EU staining are required, such as an RNase treated sample in which the signal disappears, or control embryos without EU, thus having only background signal.
      7. Fig 3D specificity control is missing, control embryos without EdU having only background signal.
      8. Fig 4A legend: 'rescue solution (see text)'. Please describe in the legend what the solution was. Moreover, I did not find clear explanation in the text either, my only guess was from the materials in methods section, where authors used both shAlkbh1 and Alkbh1 mRNA with silent mutations.
      9. Fig 4B shows many data points per condition and the legend says EU signals (in triplicate). Were these triplicate animals with multiple cells, where the EU signal from each cell was plotted as a point? Please specify in the legend.
      10. Lines 169-170 state 'the lack of premature ZGA following N6amt1/Mettl4 knockdown (Figure S7B) indicate a lack of methyl transferase that maintains 6mA through embryogenesis'. While the experiment indeed demonstrates that these are not the major players in this process, it does not prove these are not DNA methyl transferases. The absence of evidence is not the evidence of absence. I think the authors should at least soften this conclusion.
      11. Discussion section describes many experimental data that belong to Results section.
      12. Fig S8 I think should be a part of the main figure since it is one of the important experiments to prove the high activity and somewhat low selectivity of salvage pathway in the embryos during the critical early stages.
      13. Fig 5C the model is confusing, authors should improve it.
      14. Fig S8 negative controls showing the specificity of CuAAC staining are missing: control animals/ embryos without EU.
      15. Authors may find this reference useful: PMID: 32355286.
      16. It is known that in mammals ADAL protein is the one which demethylates m6A nucleotide to clear it from the nucleotide pools and prevent it entering into the salvage pathway (PMID: 29884623). Does Hydractinia symbiolongicarpus have an ADAL analog? If yes, then it would be important to see if knock down/overexpression of this enzyme has any effect on the timing of ZGA. In principle, passively introduced 6mA may be regulatory to properly time the ZGA, and is controlled via an activity of Adal and Alkbh1.
      17. Material and methods are missing information:
          a. Line 370-371: provide references to the protocols listed or describe the steps.
          b. Line 373: standard column based purification protocol, what is it? Either explain or provide a reference.

      Minor points:

      Line 79: 'Fig 1D and S1B'. Did the authors mean 'Fig 1D and S1C'?

      Fig 5A Y axis title is missing.

      Line 379: 3D1-6mA should be D3-6mA; please correct the other appearances as well.

      Line 405: the terms 'dsDNA solutions' and 'standard solutions' are confusing; please rephrase.

      Line 410: 'Cleaned embryos': what does cleaned mean? Be specific.

      Line 413: PTx is mentioned; please explain what it is.

      Line 415 and line 440: 'HCl was washed and embryos were neutralized'. I guess it should state: 'HCl was neutralized and embryos were washed with...'

      Line 431: 'before fixed by incubation in PAGA-T...'. Did the authors mean 'before fixation with PAGA-T...'?

      Line 435: 'Permeabilization was done by further washes the fixed embryos with...'. Did the authors mean 'Permeabilization was done by an additional wash of the fixed embryos with...'?

      Line 440: The HCl was washed with what solution?

      Line 446: For how long were the PTx washes?

      Lines 458-460: the sentence is confusing.

      Line 500: 'then used detect' should be 'then used to detect'

      Significance

      There are many high profile papers describing the existence of 6mA in gDNA of different organisms including insects and mammals. However, there is no proof that it has any biological function. Indeed, recent reports (PMID: 32355286 and 32203414) indicate that in mammalian cells, 6mA is primarily incorporated by DNA polymerases and originates from a salvage pathway. The present report is the first in vivo evidence that confirms this to be the case more generally and, importantly, demonstrates a 6mA effect on ZGA. Hence, this is an important and timely report, which will be interesting to the field, as well as to a broad audience, to clarify the role of 6mA and the mechanism whereby it is introduced into gDNA.

      My expertise: Biochemistry and biology of DNA and RNA modifications, including 6mA. Fair expertise: bioinformatics analysis.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      PIWI-interacting RNAs (piRNAs) are required for transposon repression and are transcribed from discrete genomic loci termed piRNA clusters. Torimochi was identified as a piRNA cluster in silkworm in 2012, but the incomplete genome assembly hindered its further characterisation. Here, Shoji and colleagues characterised torimochi using the current, recently improved, genome assembly, combined with long-read (MinION) and Sanger sequencing. This reveals that torimochi is a regular Gypsy LTR transposon. Comparison of copy number across strains reveals that torimochi has been particularly active in the BmN4 cell line, showing different insertions between strains. Moreover, piRNAs are produced from multiple torimochi copies across the genome. Lastly, the authors show that torimochi has an open chromatin conformation. The authors propose that torimochi may be a young and still growing piRNA cluster, capable of both trapping other transposable elements and transgenes and of producing piRNAs.

      Major comments

      How are the torimochi-derived piRNAs produced? Which parts of the piRNA pathway are required for their production? Determining this would significantly strengthen the study and potentially support the idea that torimochi is a "young and still growing piRNA cluster". Currently, it is unclear what evidence there is for torimochi acting as a piRNA cluster rather than a regular LTR transposon.

      We thank the Reviewer for raising this important point. We have now re-analyzed our piRNA sequencing data and confirmed that 1) production of torimochi-derived piRNAs requires Siwi, the core PIWI protein component in silkworms and 2) torimochi-derived piRNAs show the ping-pong signature, as observed for other typical piRNAs. These data strengthen the idea that torimochi is a cluster that produces canonical piRNAs.

      As originally shown in Fig. 4, torimochi is the most actively translocated transposon in BmN4 cells with extremely high transcription and piRNA production levels and open chromatin structure, thereby representing those transposons that have gained the piRNA production activity in BmN4 cells. To further investigate if torimochi has any special features even among those piRNA-producing transposons in BmN4 cells, we have now performed a new analysis. It is known that well-established piRNA clusters in Drosophila (e.g., the 42AB cluster) have a specialized system for transcriptional activation. However, those specialized transcriptional activators such as Rhino (HP1 variant) and Moonshiner (TFIIA variant) are conserved only within the Drosophila genus, and thus the transcriptional activation systems of piRNA clusters are likely to be different in different organisms. Keeping this in mind, we asked if the transcription mechanism of torimochi is any different from other piRNA-producing transposons in BmN4 cells. Since specific transcriptional activators of piRNA clusters remain unknown in silkworms (as in many other animals except for Drosophila), we decided to differentiate BmN4 cells into adipocytes so that they lose their “germline-ness” (Akiduki et al., 2007). As expected, the expression of the adipocyte marker BmFABP1 (Fatty Acid-Binding Protein 1) was markedly increased (Fig. 5a), while the expression levels of piRNA-related factors such as Vasa were decreased (Fig. 5b). Importantly, transcription of torimochi was drastically reduced by adipocyte differentiation (Fig. 5c), whereas most other transposons, including those piRNA-producing transposons in BmN4 cells, remained unrepressed or rather increased by differentiation (Fig. 5c and 5d). These findings suggest that, even among those piRNA-producing transposons in BmN4 cells, torimochi has started to gain a specialized, germline-specific transcriptional activation system and thus can serve as a good model of a “young and still growing piRNA cluster.” We will include these data and discussion in the revised manuscript.

      In figure 1F, the positive control (P50T) is missing. Based on the description, this one should show a band, but doesn't or at least doesn't do very clearly. The authors need to repeat this assay.

      We agree that the P50T band was quite faint, although it was clearly present at the expected molecular size. We will repeat this assay with more PCR cycles so that the band will appear more clearly.

      The authors should perform a qPCR (or similar assay) on the different torimochi loci (and across different strains) to assess their individual transcriptional activity. Generally, showing that torimochi is an active transposable element is crucial to support the claim that it is still expanding.

      We have now re-analyzed our RNA-seq data to assess the individual transcriptional activity of different torimochi loci. We found that, as expected, torimochi mRNAs are a mixture of transcripts from various loci, just like torimochi-derived piRNAs. We will include these data in the revised manuscript.

      I would also recommend that the authors perform ping-pong analysis on all piRNAs mapping to torimochi. The hypothesis that torimochi acts as a piRNA cluster would be supported by showing phased biogenesis, and a lack of a ping-pong signature (i.e., 10A). Please provide evidence that the piRNAs mapping to the different torimochi insertions are not produced via Post Transcriptional Gene Silencing.

      We would like to note that silkworms have no homolog of Drosophila Piwi, the PIWI protein that is specialized for the phased piRNA biogenesis pathway. Instead, silkworm Siwi participates in both the ping-pong pathway and the phased piRNA pathway (Izumi and Shoji et al., 2020). As expected, torimochi-derived piRNAs show both the ping-pong signature and the head-to-tail phasing signature in Trimmer knockout BmN4 cells. We would also like to note that, even in Drosophila, dual-strand piRNA clusters (e.g., 42AB) are known to show the ping-pong signature, while uni-strand piRNA clusters (e.g., flamenco) lack it (refs).
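
      For readers unfamiliar with the term, a ping-pong signature is usually scored by counting how often the 5' ends of sense and antisense small RNAs overlap by exactly 10 nt. The following is a minimal illustrative sketch only, not the authors' pipeline: the function name `ping_pong_overlaps`, the `(strand, start, end)` input format, and the toy coordinates are all assumptions made for this example.

```python
from collections import Counter


def ping_pong_overlaps(reads, max_overlap=20):
    """Count sense/antisense read pairs by the overlap of their 5' ends.

    `reads` is an iterable of (strand, start, end) tuples with 1-based,
    inclusive coordinates on one reference (hypothetical input format).
    Returns {overlap_length: number_of_pairs}.
    Assumes reads are longer than `max_overlap`, as piRNAs typically are.
    """
    sense_5p = Counter()      # 5' end of a "+" read is its start coordinate
    antisense_5p = Counter()  # 5' end of a "-" read is its end coordinate
    for strand, start, end in reads:
        if strand == "+":
            sense_5p[start] += 1
        else:
            antisense_5p[end] += 1

    histogram = {k: 0 for k in range(1, max_overlap + 1)}
    for pos, n_sense in sense_5p.items():
        for k in histogram:
            # An antisense read whose 5' end lies at pos + k - 1 overlaps
            # the sense read's 5' end by exactly k nucleotides.
            histogram[k] += n_sense * antisense_5p.get(pos + k - 1, 0)
    return histogram


# Toy example: two sense reads starting at 100, one antisense read ending at 109.
toy_reads = [("+", 100, 126), ("+", 100, 127), ("-", 83, 109)]
toy_hist = ping_pong_overlaps(toy_reads)
print(toy_hist[10])
```

      In such a histogram, an excess of pairs at an overlap of 10 relative to neighboring overlap lengths is what is conventionally read as a ping-pong signature.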

      Line 265; "Torimochi has the open chromatin structure and can trap foreign transgenes as well as endogenous transposons" - The evidence for "trapping" transposable elements is circumstantial. Transposons are known to insert into each other. One occasion of a transgene inserting in torimochi is not strong enough evidence to support the made claim.

      We appreciate the Reviewer’s concern. We would like to note that, in the previous paper (Kawaoka et al., 2009), the GFP transgene was inserted into torimochi (not once but) at least three times independently; there were three out of eight independent lines that contained the GFP transgene inserted into torimochi for piRNA-mediated silencing. This observation highlights the especially efficient “trapping” ability of torimochi. We will revise the text to clarify this point.

      Please provide a size distribution of all the piRNAs that are mapping on torimochi. In the methods section it is stated that small-RNAs of length 20-42 nt are mapped. This range is too generous as it also includes siRNA on the low end, and other ncRNAs on the long end. Please use the appropriate piRNA size range, i.e., 23-30 nt.

      We will be happy to include the size distribution data of all small RNAs mapped to torimochi, which shows that only 6% of them are siRNAs (~21 nt) and the majority (82%) of them can be considered as piRNAs (23–32 nt).
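
      As an illustration of how such a size distribution can be summarized, here is a minimal sketch under stated assumptions. It is not the authors' code: the function name `size_class_fractions` and the toy read lengths are hypothetical, and the 20-22 nt window used for siRNA-like reads is an assumption (the reply states only "~21 nt"). The 23-32 nt piRNA window is taken from the reply above.

```python
from collections import Counter


def size_class_fractions(read_lengths, classes):
    """Return the fraction of reads whose length falls in each size class.

    `read_lengths` is an iterable of integer read lengths (nt).
    `classes` maps a class name to an inclusive (min_len, max_len) window.
    """
    counts = Counter(read_lengths)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    return {
        name: sum(n for length, n in counts.items() if lo <= length <= hi) / total
        for name, (lo, hi) in classes.items()
    }


# Size windows: the piRNA window follows the reply; the siRNA window is assumed.
size_classes = {"siRNA-like": (20, 22), "piRNA": (23, 32)}

# Toy data, not real measurements: 1 read of 21 nt, 8 of 27 nt, 1 of 40 nt.
toy_lengths = [21] + [27] * 8 + [40]
toy_fractions = size_class_fractions(toy_lengths, size_classes)
print(toy_fractions)
```

      Reads outside every window (the 40 nt read in the toy data) count toward the total but toward no class, so the fractions need not sum to one.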

      Please include the sequences of the newly identified transposon families.

      We will be happy to determine the exact sequences of the newly identified transposons in BmN4 cells by PCR and Sanger sequencing and deposit them in a public database.

      Minor comments

      Line 71-72; "However, it was recently shown that these large piRNA clusters are evolutionarily labile and mostly dispensable for transposon suppression", this is misleading in the context of flamenco since flamenco is essential for transposon suppression. Please rephrase.

      We agree that flamenco is essential for transposon suppression in the somatic follicle cells in Drosophila, and we will rephrase the sentence accordingly.

      Line 100; "Therefore, torimochi may serve as a model for young piRNA clusters, which are still "alive" and active in transposition, can trap other transposons, and produce de novo piRNAs.". It is unclear how this is evidenced? Would not any transposon be able to "trap" external sequences (e.g., PMID: 33347429). It is unclear to me how torimochi is different from any active transposon that is silenced by the piRNA pathway.

      As discussed above, our new data show that torimochi is not only a representative of transposons that have gained piRNA-producing activity in BmN4 cells but also a unique transposon that has started to gain a specialized transcriptional activation system as seen in well-established piRNA clusters in Drosophila. Therefore, we believe that torimochi will serve as a good model for young piRNA clusters.

      Line 117; "Therefore, torimochi is not a unique sequence in the genome but should be now interpreted as a gypsy-type transposon" - Even if there is one copy in the genome, torimochi could still be a transposable element.

      We agree that, even if there is one copy in the genome, it could still be a transposable element. We will change this part into "Therefore, torimochi is not a unique sequence in the genome as was thought in the past but should be now interpreted as a gypsy-type transposon with multiple copies in the genome".

      Line 133; "a presumed ancestor of Bombyx mori" - both species are extant, so none of them can be an ancestor of the other.

      Yes, Bombyx mandarina is also an extant species. We will change the wording to “a wild progenitor of Bombyx mori.”

      Line 135 Change "species" into "strains"?

      Yes, “strains” is appropriate and we will change it accordingly.

      Please provide the coverage for every SNP in figures 2D and 2E. Having an idea of the coverage (i.e., how many reads support this SNP) would strengthen the conclusions made.

      We will add a Figure that shows the coverage of each SNP at the top of the current Figure or as a Supplemental Figure.

      Supplementary figure 2I/J; The insert depiction of the GFP cassette is incorrect, it currently is displayed as a small vertical strip, whereas it should be a large block.

      We originally intended to show the situation around the GFP cassette for the sake of consistency with Supplemental Figure S2A–H. We will redraw this figure to include the GFP cassette.

      Methods: More details are needed on the computational analysis. Please include parameters used for different tools as well as custom scripts. Were multi-mappers used to quantify piRNAs across the torimochi insertions?

      We will include precise parameters used for different tools and upload our custom scripts on GitHub.

      The display of Supplementary Table 2 and Supplementary Table 3 is partially obscured.

      We are sorry for the problems caused by the conversion. We will amend them.

      Introduction/discussion: I would suggest that the authors also discuss how torimochi could be mis-identified as a piRNA cluster previously.

      We will include the following statement to explain why torimochi was originally thought to be a unique piRNA cluster in the genome. “The silkworm genome published in 2008 had many unassembled regions, which had masked two out of the three torimochi copies that we now found to exist in the p50T genome. In other words, the 2008 silkworm genome appeared as if there was only one region to which torimochi-derived piRNAs were mappable. Back then, the apparent difference in the chromosomal position of torimochi between BmN4 cells and silkworm ovaries was thought to be due to a large rearrangement of the corresponding genomic region.”

      CROSS-CONSULTATION COMMENTS

      Reviewer #2 raised three interesting points and the manuscript would be strengthened by addressing these.

      We will also fully address the three points raised by Reviewer #2.

      Reviewer #1 (Significance):

      Significance

      I find the topic both important and timely with the ongoing re-examination of whether piRNA clusters or dispersed euchromatic transposon insertions fuel the piRNA pathway. However, I feel that the current study on torimochi is relatively shallow and descriptive and does not take us much closer to resolving the issue. Re-examining the torimochi cluster is on its own of minor significance, since there are only five publications on torimochi since 2012. However, the current study has potential and torimochi could act as a model to study how piRNAs are produced.

      We are grateful to the Reviewer for recognizing the potential importance of our current study. All the comments by the Reviewer were of great help in significantly improving our manuscript. In particular, new Fig. 5 (related to Major Point #1) is an important addition to support the idea that torimochi is a young and still growing piRNA cluster, and we thank the Reviewer again for his/her constructive comments.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The authors performed a straightforward analysis of long-read DNA sequencing data, which indicates that torimochi is not a single locus, but a gypsy-like LTR transposon that has massively expanded in BmN4 cells. The data are clear and convincing, and raise a number of interesting questions:

      1. The authors present data on single nucleotide polymorphisms in torimochi insertions (Figure 2), but the element can capture transgenes and produce silencing piRNAs. Does the long read data reveal capture of transposon insertions by any of the torimochi elements? Do any appear to be expanding due to recurrent insertion?

      In original Fig. S3A, we demonstrated that an endogenous transposon named mejiro is indeed inserted into the torimochi element. We plan to perform additional long read sequencing and further analyze the data to see if there are other examples of transposon capture events by any of the torimochi elements.

      2. The data indicate that torimochi is active and transcribed, but also the source of piRNAs that can silence transgenes. Why isn't torimochi silenced by piRNAs derived from the dispersed insertions?

      We believe that torimochi is indeed being silenced by piRNAs, but just not 100%. The GFP transgene trapped by torimochi was also not 100% silenced and some GFP signals were clearly detectable even in the silenced cell lines (Kawaoka et al., 2011). This must also be the case for any other transposons, although the silencing efficiency (the current result of the tug-of-war between transposons and the host’s piRNA system) should vary.

      3. Comparisons with the silkworm genome indicate that torimochi has been very active since BmN4 cells were isolated, and the element appears to be active now, based on transcription. However, activation could have occurred when the cell line was established. If transposition is ongoing, BmN4 cells maintained as independent stocks should have different insertions. This could be tested by sequence analysis of stocks from different labs. This experiment isn't essential to publication, but could be informative.

      We thank the Reviewer for raising this important point. Indeed, there exist BmN4 cells that have been independently maintained, and we have now obtained another stock of BmN4 cells from a different lab. We plan to perform long-read sequencing of genomic DNA using these cells to compare the insertion sites of torimochi. The results will allow us to determine whether activation of torimochi occurred when the cell line was established or its transposition is ongoing. Either result would be informative and helpful to further improve our manuscript.

      Reviewer #2 (Significance):

      piRNAs have a conserved role in transposon silencing. In many systems the most abundant piRNAs are derived from distinct chromosomal loci, termed clusters, that are composed of complex arrays of transposon fragments. Available data indicate that these loci can produce trans-silencing piRNAs, and the flam locus is required for fertility and silencing of Gypsy transposons in flies. However, several major clusters, in flies and mice, are not required for fertility or transposon silencing, and dispersed mobile elements can produce piRNAs. The nature and function of piRNA source loci thus remain to be established. Shoji et al. address the nature of piRNA source loci through a reevaluation of the torimochi cluster in silkworm BmN4 cells. The authors show that torimochi is actually a gypsy-like LTR transposon that has massively expanded in BmN4 cells, and may represent an emerging piRNA cluster, falling between established clusters that look like “transposon graveyards”, and single euchromatic insertions that appear to have epigenetically converted to “mini-clusters”. The data raise a number of interesting questions, and should stimulate studies in other systems for similar elements.

      We are grateful to the Reviewer for precisely understanding the significance of our current study.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      The authors performed a straightforward analysis of long-read DNA sequencing data, which indicates that torimochi is not a single locus, but a gypsy-like LTR transposon that has massively expanded in BmN4 cells. The data are clear and convincing, and raise a number of interesting questions:

      1. The authors present data on single nucleotide polymorphisms in torimochi insertions (Figure 2), but the element can capture transgenes and produce silencing piRNAs. Does the long read data reveal capture of transposon insertions by any of the torimochi elements? Do any appear to be expanding due to recurrent insertion?
      2. The data indicate that torimochi is active and transcribed, but also the source of piRNAs that can silence transgenes. Why isn't torimochi silenced by piRNAs derived from the dispersed insertions?
      3. Comparisons with the silkworm genome indicate that torimochi has been very active since BmN4 cells were isolated, and the element appears to be active now, based on transcription. However, activation could have occurred when the cell line was established. If transposition is ongoing, BmN4 cells maintained as independent stocks should have different insertions. This could be tested by sequence analysis of stocks from different labs. This experiment isn't essential to publication, but could be informative.

      Significance

      piRNAs have a conserved role in transposon silencing. In many systems the most abundant piRNAs are derived from distinct chromosomal loci, termed clusters, that are composed of complex arrays of transposon fragments. Available data indicate that these loci can produce trans-silencing piRNAs, and the flam locus is required for fertility and silencing of Gypsy transposons in flies. However, several major clusters, in flies and mice, are not required for fertility or transposon silencing, and dispersed mobile elements can produce piRNAs. The nature and function of piRNA source loci thus remain to be established. Shoji et al. address the nature of piRNA source loci through a reevaluation of the torimochi cluster in silkworm BmN4 cells. The authors show that torimochi is actually a gypsy-like LTR transposon that has massively expanded in BmN4 cells, and may represent an emerging piRNA cluster, falling between established clusters that look like "transposon graveyards", and single euchromatic insertions that appear to have epigenetically converted to "mini-clusters". The data raise a number of interesting questions, and should stimulate studies in other systems for similar elements.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary

      PIWI-interacting RNAs (piRNAs) are required for transposon repression and are transcribed from discrete genomic loci termed piRNA clusters. Torimochi was identified as a piRNA cluster in silkworm in 2012, but the incomplete genome assembly hindered its further characterisation. Here, Shoji and colleagues characterised torimochi using the current, recently improved, genome assembly, combined with long-read (MinION) and Sanger sequencing. This reveals that torimochi is a regular Gypsy LTR transposon. Comparison of copy number across strains reveals that torimochi has been particularly active in the BmN4 cell line, showing different insertions between strains. Moreover, piRNAs are produced from multiple torimochi copies across the genome. Lastly, the authors show that torimochi has an open chromatin conformation. The authors propose that torimochi may be a young and still growing piRNA cluster, capable of both trapping other transposable elements and transgenes and of producing piRNAs.

      Major comments

      How are the torimochi-derived piRNAs produced? Which parts of the piRNA pathway are required for their production? Determining this would significantly strengthen the study and potentially support the idea that torimochi is a "young and still growing piRNA cluster". Currently, it is unclear what evidence there is for torimochi acting as a piRNA cluster rather than a regular LTR transposon.

      In figure 1F, the positive control (P50T) is missing. Based on the description, this one should show a band, but it does not, or at least not very clearly. The authors need to repeat this assay.

      The authors should perform a qPCR (or similar assay) on the different torimochi loci (and across different strains) to assess their individual transcriptional activity. Generally, showing that torimochi is an active transposable element is crucial to support the claim that it is still expanding.

      I would also recommend that the authors perform ping-pong analysis on all piRNAs mapping to torimochi. The hypothesis that torimochi acts as a piRNA cluster would be supported by showing phased biogenesis and a lack of a ping-pong signature (i.e., 10A). Please provide evidence that the piRNAs mapping to the different torimochi insertions are not produced via Post Transcriptional Gene Silencing.

      Line 265; "Torimochi has the open chromatin structure and can trap foreign transgenes as well as endogenous transposons" - The evidence for "trapping" transposable elements is circumstantial. Transposons are known to insert into each other. One occasion of a transgene inserting in torimochi is not strong enough evidence to support this claim.

      Please provide a size distribution of all the piRNAs that are mapping on torimochi. In the methods section it is stated that small-RNAs of length 20-42 nt are mapped. This range is too generous as it also includes siRNA on the low end, and other ncRNAs on the long end. Please use the appropriate piRNA size range, i.e., 23-30 nt.

      Please include the sequences of the newly identified transposon families.

      Minor comments

      Line 71-72; "However, it was recently shown that these large piRNA clusters are evolutionarily labile and mostly dispensable for transposon suppression", this is misleading in the context of flamenco since flamenco is essential for transposon suppression. Please rephrase.

      Line 100; "Therefore, torimochi may serve as a model for young piRNA clusters, which are still "alive" and active in transposition, can trap other transposons, and produce de novo piRNAs.". It is unclear how this is evidenced. Would not any transposon be able to "trap" external sequences (e.g., PMID: 33347429)? It is unclear to me how torimochi is different from any active transposon that is silenced by the piRNA pathway.

      Line 117; "Therefore, torimochi is not a unique sequence in the genome but should be now interpreted as a gypsy-type transposon" - Even if there is one copy in the genome, torimochi could still be a transposable element.

      Line 133; "a presumed ancestor of Bombyx mori" - both species are extant, so none of them can be an ancestor of the other.

      Line 135 Change "species" into "strains"?

      Please provide the coverage for every SNP in figures 2D and 2E. Having an idea of the coverage (i.e., how many reads support this SNP) would strengthen the conclusions made.

      Supplementary figure 2I/J; The insert depiction of the GFP cassette is incorrect, it currently is displayed as a small vertical strip, whereas it should be a large block.

      Methods: More details are needed on the computational analysis. Please include parameters used for different tools as well as custom scripts. Were multi-mappers used to quantify piRNAs across the torimochi insertions?

      The display of Supplementary Table 2 and Supplementary Table 3 is partially obscured.

      Introduction/discussion: I would suggest that the authors also discuss how torimochi could be mis-identified as a piRNA cluster previously.

      Referees cross-commenting

      Reviewer #2 raised three interesting points and the manuscript would be strengthened by addressing these.

      Significance

      I find the topic both important and timely with the ongoing re-examination of whether piRNA clusters or dispersed euchromatic transposon insertions fuel the piRNA pathway. However, I feel that the current study on torimochi is relatively shallow and descriptive and does not take us much closer to resolving the issue. Re-examining the torimochi cluster is on its own of minor significance, since there are only five publications on torimochi since 2012. However, the current study has potential and torimochi could act as a model to study how piRNAs are produced.

    1. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      1. General Statements

      The newly identified azyx-1 ORF was named peu-1 in the initial submission of this manuscript, a name that was under consideration with WormBase, who supervise nomenclature of C. elegans genes. In consultation with WormBase, the locus was named azyx-1 instead (the final decision being “azyx-1 will be attributed to F42G4.11. It will be released in WS287 at the beginning of 2023”). We updated this nomenclature in our submission files, including in reviewer comments pasted below. Please note that other than this, no changes whatsoever were made to the reviewer comments.

      2. Description of the planned revisions

      REV #3: Specific thoughts for consideration:

      Figure 5, Moderate is really minor/moderate with other metrics, and severe is definitely moderate with other metrics. Thus, I'm not sure if normal vs. moderate is needed. This really is a minor point as it doesn't impact results/overall story/importance.

      This was also pointed out by reviewer #1. We will rename the classes in milder terms.

      REV #1 Fig. 5 Even the 'severe' muscle disruption is quite mild (say, in comparison to loss of talin). Perhaps rephrase these categories? The moderate and severe categories also do not look different to me. Show what the muscle cells look like in zyx-1 deletion and overexpression animals. Is there a way to use quantitative imaging to score these? Can azyx-1 phenotypes be rescued or enhanced by expression (or RNAi) of zyxin in the muscle? Also, clarify what age animals are being tested in the muscle and burrowing assay.

      We agree and will rename the classes in milder terms. Qualitative scoring (which was done blinded) is the standard in the field and was performed according to Dhondt et al. (2021 Dis Model Mech). When tested for muscle integrity and burrowing capacity, animals were day 1 adults. This is mentioned in the Methods section of the current manuscript and will also be included in the captions of the revised figures.

      REV #2: I am not convinced by the data presented in Figure 5. There does not seem to be much to distinguish the five genotypes, but I concede that I am not used to looking at this type of data. But why was the muscle phenotype not also examined in the azyx-1 rescue lines?

      Because other reviewers who are familiar with these data point out that the observed differences in panels A-B are indeed milder than what is usually seen, we will rename classifications in the manuscript (see responses above). Because the azyx-1 deletion mutant does not differ from controls in the muscle phenotype, there is no phenotype to rescue for this readout, and no rescue strains were generated.

      We are not sure what the reviewer may struggle with in (assumedly) panel C (~‘to distinguish the five genotypes’). The positive control (zyx-1) behaves as expected in the burrowing assay, with our own mutants within that range, also as expected. All data were scored blinded to avoid any bias and statistical analysis supports the interpretations, all granting confidence to the observed differences. However, because reviewer#3 also would prefer another representation of the data shown in this panel (see below), we will provide an updated panel representation in the revised manuscript.

      REV #3: Figure 5C- Hard to read. Would displaying lines/trajectories make it easier to understand? Would displaying as violin plots for each timepoint/condition make it easier to visualize? Basically in black and white and in color this is hard to visually process.

      We will work on another representation for the revised manuscript, since reviewer #2 also seemed to struggle with this panel representation.

      REV #1: Fig. S2 Match font sizes on Y-axes. Also, indicate any statistical differences and statistics used.

      Figure adjustments will be implemented in the revised manuscript as requested.

      REV#1: Fig. S3 C, indicate any statistical differences and statistics used.

      Figure adjustments will be implemented in the revised manuscript as requested.

      REV #2: I am not convinced by the "overexpression" experiments. These are not well controlled, since no evidence is presented that AZYX-1 is being overexpressed in these lines. Also, since we know that extrachromosomal transgenic lines are highly variable, one would need to test the effect of several independent lines to ensure that the effects that the authors observe are indeed associated with AZYX-1 overexpression and not simply an idiosyncratic effect of the genetic background of a given strain. Finally, there does not seem to be an obvious mechanism by which overexpression of AZYX-1 can impact ZYX-1 function. That doesn't rule out an effect, but based on the data as it is, it is premature to propose such a mechanism. The authors need to show that multiple overexpression lines do reproducibly overexpress AZYX-1 and that this results in reproducible effects of zyx-1 phenotypes.

      The extrachromosomal strains are indeed variable, but because the background is wild type (in contrast to a deletion mutant background for rescue strains), an overdose of the target provided is expected. As requested in the cross-consultation reviewer communication, we will include quantitative data in our revised manuscript that shows that the used strains (LSC1950, LSC1960, LSC2000) indeed are overexpressors.

      REV #2: The data presented in Figure 4F needs to be quantified using the same format as was presented in Figure 4B.

      Due to the different genetic background of the strains, this is not possible in the exact same way (the red signal of LSC1998 & LSC1999 is not unique to zyxin). We understand that in essence, the reviewer would like us to include a more quantitative representation of these data, and will update the figure accordingly.

      REV #2: What is the difference between the overexpression transgenic lines and the "rescuing" transgenic lines? In the Materials and Methods, the same concentration of plasmid was used in injections - so these likely give the same approximate level of transgenic expression.

      The genetic background: a rescue line adds wt DNA back to a mutant background, while in an OE strain it is added into a wt background. While this can already be derived from the genotype details in Supplemental Table S1, we apologize for not specifying this in the methods section, as it is common practice in the field. These specifications will be added to the revised manuscript.

      REV #2: I am not clear what features are being used to characterise the myofibril structures into the three categories. Can the authors annotate the images to indicate the diagnostic features?

      The reviewer is correct that manual classification is rather poorly defined in general, which is why it is scored blinded (here as per Cothren et al., 2018 Bio Protoc). We adhered to the reference images by Dhondt et al. (2021, Dis Mod Mech), with visual assessment based on how tightly (~parallel) myofilaments are organized, assessing overall increases of bends or breaks in individual myofibers as leading to a less aligned pattern (cf. Fig. 1 of Dhondt et al.). We will add this information more explicitly to the Methods section of the revised manuscript.

      3. Description of the revisions that have already been incorporated in the transferred manuscript

      REV #1: Fig. 4 would be better if the control (A) and azyx-1OE (B) worms were more similar in age and size

      The panels of this figure were not to the exact same scale; we apologize if the reviewer found this confusing. We have rescaled the panels so that this is less confusing. The animals are all day 1 adults.

      REV #1: Abstract: Clarify what is meant by 'putative syntenic conservation' or rephrase, simply stating that the existence of an ORF overlapping with the 5' region of zyxin is conserved

      This has been rephrased according to request.

      REV #1: Line 24: Clarify these are synthetic phenotypes (not caused by loss of zyx-1/azyx-1 alone). Loss of zyx-1 alone results in very mild phenotypes.

      While the original sentence already pointed this out, we rephrased the text to make clear that these observations require the dystrophic mutant background.

      REV #1: Line 28: Start new paragraph

      The new paragraph was started a sentence earlier, according to rev#2 request.

      REV #1: Line 31: Not clear what is meant by 'post-transcriptional regulation can be further propagated'- maybe reword to 'alternative and overlapping open reading frames (ORFs) arising from polycistronic mRNA can regulate translation' or something simpler like that.

      This has been rephrased according to request.

      REV #1: Line 56-57: Is this because most C. elegans transcripts start with the splice leader SL1 or SL2 rather than the adjacent 5' sequence? Is that relevant for zyx-1? Recommend commenting briefly on this.

      We did not look into this for all possible u(o)ORFs in C. elegans, which also is not the focus of the manuscript, so we cannot make general statements. As part of the annotation procedure of azyx‑1, WormBase verified that indeed several pieces of evidence, including available phyloCSF data for exon 1, SL1s, RNASeq and Nanopore data, all support its annotation, as well as its translation from the zyx-1 long transcripts (albeit with different start and in different reading frame).

      REV #1: Line 78: Delete the word 'other'

      Done

      REV #1: Line 122: zyx-1

      Done

      REV #1: Line 137: 'lead' should be 'led'

      Done

      REV #1: Line 158: rephrase 'only the long ones' to indicate which isoforms more precisely

      Done (these are a/e, cf. Luo et al. 2014, Development)

      REV #1: Line 195: Rephrase. Unclear what is meant by 'highlights the evasiveness of non-canonical ORFs from functional annotation'

      Done; this was rephrased to “This exemplifies how non-canonical ORFs can escape functional annotation, …”.

      REV #1: Various locations: I think it will be more clear to the reader to consistently refer to the burrowing assay as 'burrowing assay' rather than chemotaxis. I recommend adding a brief description of the burrowing assay to the results section.

      Wording has been updated; we can add a short context sentence to the results section of the revised manuscript.

      REV #2: I'm not sure how to interpret the significance of the u/uoORFs across short and large phylogenetic distances. One would presume that there might not be primary amino acid conservation if the regulation simply takes place by interference with ribosome scanning and translocation. Here some statistical analysis would help with assessing the significance of these observations. How unusual is it to find u/uoORFs in the 5' UTRs of genes encoding zyxin family members versus in general for the species analysed?

      This is indeed the very question we are asking in the manuscript, and there is a clear reason why we refrain from making significance statements. At the moment, all relevant available metadata are used for the analysis in the manuscript, leading to the communication of the synteny-related findings as they are currently presented. This is due to the dependency on translatomics data to find credible u(o)ORFs, and there aren’t very many translatomics datasets available, only for a limited set of species so far. Our manuscript contains all relevant OpenProt data, which are derived from only 9 animal species so far. As shown in Table S4, 14 zyxin orthologs belonging to 7 species have associated u(o)ORFs; for two species only overlapping ORFs are present in the database. While more and more datasets will undoubtedly become available in the next years, the findings in the manuscript are as complete as currently possible: we do find evidence of u(o)ORFs associated with zyxin orthologs in these species, some of which are evolutionarily distantly related to C. elegans.

      REV #2: The authors state that there is evidence for synteny and coding region conservation. The data supporting this assertion is not well presented. Presentation and analysis of multiple sequence alignments of the putative homologues involved would strengthen the assertion of synteny considerably.

      We apologize if the reviewer misunderstood: we discuss likely syntenic conservation, not coding region conservation. The latter is not mentioned in our manuscript, and in fact not convincing indeed. This is not surprising given the bigger sequence diversity observed at the N terminus of zyxins and the partial overlap of these coding sequences, and in line with observations of several others in the RiboSeq community that many identified uORFs are conserved between orthologous genes, but poorly conserved at the amino acid level (e.g. community-driven communication by Mudge et al., BioRxiv 2021 and references therein).

      REV #2: The authors are oddly coy about the molecular details of the 27 bp deletion used to study the loss of azyx-1 function. In the absence of these details, it is not possible to assess the validity of these experiments. We need to be given the full molecular details of the allele - precisely which nucleotides are deleted? And how do they affect the coding regions of zyx-1 and azyx-1?

      I am also confused about why the authors made a deletion allele rather than mutating the AUG of AZYX-1? This would be a cleaner experiment to interpret. Based on the data presented, there are two possible interpretations in addition to the one suggested by the authors: 1) the 27 bp deletion impacts zyx-1 expression due to its impact on the zyx-1 coding region (the coding regions of azyx-1 and zyx-1 overlap); 2) the deletion mutation deletes critical transcriptional control elements. A simpler mutation of the azyx-1 AUG via CRISPR might allow them to rule out the possibility that they have simply compromised a transcriptional control element or damaged the coding region of ZYX-1.

      As mentioned above, and as will be stated more clearly in the revised manuscript: the deletion is 182-155bp (27bp) upstream of the zyx-1a start site. This was a mutant that could easily be generated via CRISPR, so we proceeded with this one. This edit rules out option 1 (there is no change of the zyxin coding region), but (as also considered but addressed differently in the manuscript; see below) retains alternative interpretation 2. There are no regulatory regions or transcription factor binding sites known for the (a)zyx-1 locus (verified in current WormBase version WS285), but that certainly does not fully rule out the possibility either. Rather than creating a series of azyx-1 mutants, be they SNP or small deletion mutants, that would suffer from the exact same duality in possible interpretation, we chose to combine the deletion mutant with rescue and overexpression strains. Because these latter strains do not affect the endogenous zyxin regulatory region, they add far more credibility to the interpretation than alternative mutants in the azyx-1/zyx-1 locus would.

      REV#2. The narrative flow of the introduction could be improved by the judicious use of paragraphs. Line 12, for instance is a clear paragraph break, as is line 24.

      Done

      REV #3: Specific thoughts for consideration:

      3) Could more be said about overlapping genes/regulation in humans? Again, not critical but this is such a great piece of work that it would be useful to guide human subjects researchers as to how to best further your work.

      It is unclear whether the reviewer would like to see an extended introduction and/or discussion. We tried to meet this request without drifting too much from the focus of our current communication by adding the following to the introduction (lines 41-47 of the current draft): “From a more human-centred future perspective, uORFs are a rather unexplored niche for translational research: with a predicted prevalence in over 50% of human genes and first examples regulating translation of disease-associated genes already emerging (Lee et al. 2021; Schulz et al. 2018), the field is bound to not only lead to more fundamental, but also application-oriented insights. Keeping this broader context in mind, we here focus on more fundamental principles of uORFs in a model organism context.”.

      4. Description of analyses that authors prefer not to carry out

      REV#1: Does azyx-1 have zyx-1-independent functions or other regulatory targets?

      This is an interesting question that is not yet addressed. While this is possible, it is beyond the scope of our current communication. Since the reviewer does not request anything concrete, we would prefer to leave this for follow-up research. While this notion is included in the manuscript, we are happy to more explicitly address this question in the discussion as well.

      REV#1: Do the burrowing assay results reflect a neuronal or a muscle function for AZYX-1? Or both?

      Our manuscript indeed does not yet delve into tissue-specific actions of this newly discovered ORF. While interesting, and in line with reviewer #3’s remark, this would be valuable for follow-up research, but is beyond the scope of our current communication. We will make sure the concept is clearly mentioned in the discussion of our findings.

      REV #3: Specific thoughts for consideration:

      Could more be done/said about neuro vs. muscular effects of azyx-1 and zyx-1? I appreciate this is beyond the scope of the present manuscript and therefore does not require response if you don't have data or it makes telling the story you want to tell more difficult.

      We agree with the reviewer that spatially resolving some of these observations would be a next interesting step, which indeed is beyond the scope of our current communication.

      REV#1: Fig. 2A very faint, increase brightness/contrast?

      We did not adjust brightness or contrast for any of the figures, and no such requests were made by other reviewers. We greatly prefer presenting the data as unedited as possible, and would like to request the journal’s preference for action here.

      5. Remaining reviewer comments & responses not highlighted above

      CROSS-CONSULTATION COMMENTS

      The following is a conversation among the three referees:

      REFEREE #2: I appear to be the dissenting voice in terms of concern about the details of the 27 bp deletion and the "overexpression" constructs. I would be interested to know your opinions regarding my comments on these issues.

      REFEREE #1: I think adding the details of the 27 bp deletion is a reasonable request. It is probably not possible to disambiguate entirely the two effects of the deletion, and changing the start codon may result in an alternate start with other downstream effects. I think just explaining it more fully in the methods would satisfy my concerns.

      REFEREE #2: What about the issues with the overexpression? In my experience, presence of multicopy transgenes on extrachromosomal arrays might not lead to overexpression of the gene involved. This needs to be verified in some way.

      REFEREE #1: You are right about that. If the construct is tagged in some way they could try a western. I would recommend they integrate the transgenes, or just show results from several lines as you suggest.

      REFEREE #3: I agree the 27 bp deletion and overexpression are reasonable technical issues. However, I view these as technical details vs. critical details for the novel regulatory mechanism. The point about ability to judge conservation is also reasonable but until the theory is firmly out there it is hard to test the conservation and broader applicability to other genes/proteins. Thus, while asking for additional information on these issues is reasonable I do not see the inability to address beyond highlighting as a limitation in the text as critical to the overall validity of the work.

      REFEREE #2: I disagree with Reviewer #3, without knowing the details of 27 bp deletion the most reasonable interpretation of the data is simply that it is a loss-of-function allele of zyx-1. This goes beyond "technical" - at present there is no unequivocal evidence that azyx-1 has any functional significance beyond that it is expressed as a peptide.

      REFEREE #3: I've been back through the manuscript. They have sequenced the deletion and therefore should be able to provide that information to satisfy the issue(s). For the overexpression, short of silencing my experience is that they do overexpress and when you have multiple lines some express more than others (and some silence more than others). If you want evidence that the peptide is overexpressed ask them to quantify via mass spec if it isn't tagged and they can't do a Western. Clearly they have world leading expertise in quantitative mass spec proteomics in C. elegans and should be able to do that. Generally speaking, rescue of a deletion is a pretty good sign that the expression is working though (and is an accepted standard).

      We apologize if this was not clear from the manuscript, and will clearly include the details in the Methods section: the deletion is 182-155bp (27bp) upstream of the zyx-1a start site, at AT|G+26|TTC. This was confirmed by sequencing; the oligos used for this are listed in table S3 of the manuscript. We address the confusion of rescue and overexpression above, in response to reviewer #2 (who echoes this confusion here).

      Reviewer #1 (Evidence, reproducibility and clarity):

      This is a very interesting paper about a gene regulatory mechanism in a type of poly-cistronic mRNA in which alternate starts/open reading frames lead to production of two different proteins from the same locus. AZYX-1 is a predicted 166 aa protein, translated from the 5'UTR of zyx-1. Two isoforms are expressed from the 5' UTR and coding region of zyx-1. The presence of overlapping transcripts with zyxin orthologs appears to be conserved in other animals. The authors provide spectroscopic evidence AZYX-1 is indeed translated, and show AZYX-1 can regulate zyx-1 expression. Intriguingly, it seems azyx-1 inhibits zyx-1 expression in cis (deletion of azyx-1 increases ZYX-1 peptides), but AZYX-1 promotes zyx-1 expression in trans (overexpression of AZYX-1 increases ZYX-1 expression).

      Reviewer #1 (Significance):

      Nature and significance of the advance: This is a very interesting paper about a gene regulatory mechanism in a type of poly-cistronic mRNA encoding azyx-1 and zyx-1. Intriguingly, it seems azyx-1 inhibits zyx-1 expression in cis (deletion of azyx-1 increases ZYX-1 peptides), but AZYX-1 promotes zyx-1 expression in trans (overexpression of AZYX-1 increases ZYX-1 expression).

      Compare to existing published knowledge: This is the first study of its type on zyx-1.

      Audience: Those interested in gene regulatory mechanisms and in zyxin.

      My expertise: C. elegans cytoskeleton, cell migration, acto-myosin contractility.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary:

      The authors build on previous work defining upstream and upstream-overlapping open reading frames (uORF and uoORFs, respectively) by focussing on a specific locus azyx-1, which the authors propose influences the expression of the gene encoding the sole zyxin family member in C. elegans, zyx-1. They present evidence suggestive of u/uoORFs being a common feature of zyxin family genes in other animals, hinting that perhaps this is a conserved mechanism of gene expression regulation for these genes. In which case, studies in C. elegans would be valuable to elucidate the mechanism involved.

      Using a fluorescent reporter strategy, they show that azyx-1 is expressed in the same tissues as zyx-1, which is to be expected since they share the same transcriptional control elements.

      They also characterise the peptide steady state levels of both ZYX-1 and AZYX-1 isoforms, suggesting that while overall ZYX-1 levels decline with age, those for AZYX-1 are generally maintained. The significance of these observations was not immediately obvious to me - a priori it is difficult to assess what relative wild type steady-state levels one might expect if AZYX-1 translation impacted ZYX-1 expression.

      The authors propose that expression of AZYX-1 leads to inhibition of ZYX-1 translation through the standard model by which u/uoORFs impact translation of downstream ORFs. To test this, they generated a 27 bp deletion "at the beginning of the azyx-1 ORF". This deletion clearly correlated with a reduction in ZYX-1 expression.

      Finally, the authors generated lines designed to overexpress AZYX-1, testing the hypothesis that AZYX-1 might influence ZYX-1 in trans. Though here, it is not obvious by what mechanism this might operate, and the effect-sizes involved are modest.

      Reviewer #2 (Significance):

      The authors propose an interesting interaction between an important regulator of cellular behaviour (zyxin) and the u/uoORF that potentially regulates its expression - if validated by further experimentation, this would add to the growing evidence for the importance of the 5' UTR as a source of gene regulatory activity. Such regulation is well described in yeast, but there are fewer examples in animals, particularly in genetically tractable systems such as C. elegans. The work would primarily be of interest to researchers interested in understanding the spectrum of such activity in C. elegans. My own area of expertise, RNA-splicing and the post-transcriptional regulation of C. elegans gene expression, is not directly related to the research presented in the manuscript, but I am familiar with the general concepts and developments involved.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      The authors find that azyx-1 is a non-canonical gene with overlapping genomic localization to the gene zyx-1 in C. elegans. The authors also find preliminary evidence that similar genes with overlapping localization to zyxin genes exist in other species. The authors provide evidence for the tissue-specific distribution of azyx-1 expression. The authors further provide evidence for azyx-1 and zyx-1 expression with age. Importantly, these data demonstrate differences in the biological importance/relevance of the azyx-1 and zyx-1 protein products, as they display differences with age. The authors provide evidence that azyx-1 expression influences zyx-1 expression in multiple ways. Lastly, the authors demonstrate that azyx-1 expression influences muscle structure and neuromuscular function. The authors use a combination of bioinformatic, protein biochemistry, genetic/transgenic, histologic, and physiologic methods to make these points. With regards to methods, the range/breadth is impressive and appropriate. In many ways the manuscript is a tour de force in modern molecular biology with a focus on translational medicine. With regards to species, the in vivo experiments are solely C. elegans but the computational data include Fly, Bull, and Mouse.

      The key conclusions are convincing. There are no major claims that require qualification as preliminary or speculative. No additional experiments are essential to support the claims of the paper. The data and methods are presented in such a way that they can be reproduced. The experiments are adequately replicated and the statistical analysis is adequate.

      Prior studies are referenced appropriately. The text and figures are mostly clear and accurate.

      We would like to thank the reviewer for their appreciation of our efforts and research approach.

      Reviewer #3 (Significance):

      Conceptually this is a massive/groundbreaking piece of work. Essentially, the authors are demonstrating a novel mechanism of regulation of gene/protein expression that, really, hasn't been reported before. What is particularly notable is that it appears, unsurprisingly, as correctly stated by the authors, to be evolutionarily conserved and not well reported in the literature. As with many classical molecular biology papers, and the more recent (e.g. RNAi, lncRNA) genetic papers, this manuscript holds the promise of transforming biology/medicine. The range of methods employed and the linking of molecular biology to pathophysiology was impressive. The audience that will be interested in this work includes: geneticists, proteomics researchers, evolutionary researchers, molecular biologists, physiologists, ageing researchers, muscle researchers, and muscle disease researchers. Thus, the interested audience is broad. My field of expertise with regards to this manuscript is: C. elegans, Mass Spec, Proteomics, genomic regulation, genetics, transgenics, histology, muscle, and physiology. There are no parts of this manuscript that I feel I have insufficient expertise to evaluate. I congratulate the authors on a highly significant, cross-disciplinary manuscript that should impact multiple sub-areas of biology.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The authors find that peu-1 is a non-canonical gene with overlapping genomic localization to the gene zyx-1 in C. elegans. The authors also find preliminary evidence that similar genes with overlapping localization to zyxin genes exist in other species. The authors provide evidence for the tissue-specific distribution of peu-1 expression. The authors further provide evidence for peu-1 and zyx-1 expression with age. Importantly, these data demonstrate differences in the biological importance/relevance of the peu-1 and zyx-1 protein products, as they display differences with age. The authors provide evidence that peu-1 expression influences zyx-1 expression in multiple ways. Lastly, the authors demonstrate that peu-1 expression influences muscle structure and neuromuscular function. The authors use a combination of bioinformatic, protein biochemistry, genetic/transgenic, histologic, and physiologic methods to make these points. With regards to methods, the range/breadth is impressive and appropriate. In many ways the manuscript is a tour de force in modern molecular biology with a focus on translational medicine. With regards to species, the in vivo experiments are solely C. elegans but the computational data include Fly, Bull, and Mouse.

      Major comments:

      The key conclusions are convincing. There are no major claims that require qualification as preliminary or speculative. No additional experiments are essential to support the claims of the paper. The data and methods are presented in such a way that they can be reproduced. The experiments are adequately replicated and the statistical analysis is adequate.

      Minor comments:

      Prior studies are referenced appropriately. The text and figures are mostly clear and accurate.

      Specific thoughts for improvement:

      Figure 5C - Hard to read. Would displaying lines/trajectories make it easier to understand? Would displaying as violin plots for each timepoint/condition make it easier to visualize? Basically, in black and white and in color this is hard to visually process.

      Specific thoughts for consideration:

      1. Could more be done/said about neuro vs. muscular effects of peu-1 and zyx-1? I appreciate this is beyond the scope of the present manuscript and therefore does not require a response if you don't have data or it makes telling the story you want to tell more difficult.
      2. Figure 5, Moderate is really minor/moderate with other metrics, and severe is definitely moderate with other metrics. Thus, I'm not sure if normal vs. moderate is needed. This really is a minor point as it doesn't impact results/overall story/importance.
      3. Could more be said about overlapping genes/regulation in humans? Again, not critical but this is such a great piece of work that it would be useful to guide human subjects researchers as to how to best further your work.

      Referees cross-commenting

      The following is a conversation among the three referees:

      REFEREE #2: I appear to be the dissenting voice in terms of concern about the details of the 27 bp deletion and the "overexpression" constructs. I would be interested to know your opinions regarding my comments on these issues.

      REFEREE #1: I think adding the details of the 27 bp deletion is a reasonable request. It is probably not possible to disambiguate entirely the two effects of the deletion, and changing the start codon may result in an alternate start with other downstream effects. I think just explaining it more fully in the methods would satisfy my concerns.

      REFEREE #2: What about the issues with the overexpression? In my experience, the presence of multicopy transgenes on extrachromosomal arrays might not lead to overexpression of the gene involved. This needs to be verified in some way.

      REFEREE #1: You are right about that. If the construct is tagged in some way they could try a western. I would recommend they integrate the transgenes, or just show results from several lines as you suggest.

      REFEREE #3: I agree the 27 bp deletion and overexpression are reasonable technical issues. However, I view these as technical details vs. critical details for the novel regulatory mechanism. The point about ability to judge conservation is also reasonable, but until the theory is firmly out there it is hard to test the conservation and broader applicability to other genes/proteins. Thus, while asking for additional information on these issues is reasonable, I do not see the inability to address them, beyond highlighting them as limitations in the text, as critical to the overall validity of the work.

      REFEREE #2: I disagree with Reviewer #3, without knowing the details of 27 bp deletion the most reasonable interpretation of the data is simply that it is a loss-of-function allele of zyx-1. This goes beyond "technical" - at present there is no unequivocal evidence that peu-1 has any functional significance beyond that it is expressed as a peptide.

      REFEREE #3: I've been back through the manuscript. They have sequenced the deletion and therefore should be able to provide that information to satisfy the issue(s). For the overexpression, short of silencing, my experience is that they do overexpress, and when you have multiple lines some express more than others (and some silence more than others). If you want evidence that the peptide is overexpressed, ask them to quantify via mass spec if it isn't tagged and they can't do a Western. Clearly they have world-leading expertise in quantitative mass spec proteomics in C. elegans and should be able to do that. Generally speaking, rescue of a deletion is a pretty good sign that the expression is working though (and is an accepted standard).

      Significance

      Conceptually this is a massive/groundbreaking piece of work. Essentially, the authors are demonstrating a novel mechanism of regulation of gene/protein expression that, really, hasn't been reported before. What is particularly notable is that it appears, unsurprisingly, as correctly stated by the authors, to be evolutionarily conserved and not well reported in the literature. As with many classical molecular biology papers, and the more recent (e.g. RNAi, lncRNA) genetic papers, this manuscript holds the promise of transforming biology/medicine. The range of methods employed and the linking of molecular biology to pathophysiology was impressive. The audience that will be interested in this work includes: geneticists, proteomics researchers, evolutionary researchers, molecular biologists, physiologists, ageing researchers, muscle researchers, and muscle disease researchers. Thus, the interested audience is broad. My field of expertise with regards to this manuscript is: C. elegans, Mass Spec, Proteomics, genomic regulation, genetics, transgenics, histology, muscle, and physiology. There are no parts of this manuscript that I feel I have insufficient expertise to evaluate. I congratulate the authors on a highly significant, cross-disciplinary manuscript that should impact multiple sub-areas of biology.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      The authors build on previous work defining upstream and upstream-overlapping open reading frames (uORF and uoORFs, respectively) by focussing on a specific locus peu-1, which the authors propose influences the expression of the gene encoding the sole zyxin family member in C. elegans, zyx-1. They present evidence suggestive of u/uoORFs being a common feature of zyxin family genes in other animals, hinting that perhaps this is a conserved mechanism of gene expression regulation for these genes. In which case, studies in C. elegans would be valuable to elucidate the mechanism involved.

      Using a fluorescent reporter strategy, they show that peu-1 is expressed in the same tissues as zyx-1, which is to be expected since they share the same transcriptional control elements.

      They also characterise the peptide steady state levels of both ZYX-1 and PEU-1 isoforms, suggesting that while overall ZYX-1 levels decline with age, those for PEU-1 are generally maintained. The significance of these observations was not immediately obvious to me - a priori it is difficult to assess what relative wild type steady-state levels one might expect if PEU-1 translation impacted ZYX-1 expression.

      The authors propose that expression of PEU-1 leads to inhibition of ZYX-1 translation through the standard model by which u/uoORFs impact translation of downstream ORFs. To test this, they generated a 27 bp deletion "at the beginning of the peu-1 ORF". This deletion clearly correlated with a reduction in ZYX-1 expression.

      Finally, the authors generated lines designed to overexpress PEU-1, testing the hypothesis that PEU-1 might influence ZYX-1 in trans. Though here, it is not obvious by what mechanism this might operate, and the effect-sizes involved are modest.

      Major comments:

      1. I'm not sure how to interpret the significance of the u/uoORFs across short and large phylogenetic distances. One would presume that there might not be primary amino acid conservation if the regulation simply takes place by interference with ribosome scanning and translocation. Here some statistical analysis would help with assessing the significance of these observations. How unusual is it to find u/uoORFs in the 5' UTRs of genes encoding zyxin family members versus in general for the species analysed?
      2. The authors state that there is evidence for synteny and coding region conservation. The data supporting this assertion is not well presented. Presentation and analysis of multiple sequence alignments of the putative homologues involved would strengthen the assertion of synteny considerably.
      3. The authors are oddly coy about the molecular details of the 27 bp deletion used to study the loss of peu-1 function. In the absence of these details, it is not possible to assess the validity of these experiments. We need to be given the full molecular details of the allele - precisely which nucleotides are deleted? And how do they affect the coding regions of zyx-1 and peu-1? I am also confused about why the authors made a deletion allele rather than mutating the AUG of PEU-1? This would be a cleaner experiment to interpret. Based on the data presented, there are two possible interpretations in addition to the one suggested by the authors: 1) the 27 bp deletion impacts zyx-1 expression due to its impact on the zyx-1 coding region (the coding regions of peu-1 and zyx-1 overlap); 2) the deletion mutation deletes critical transcriptional control elements. A simpler mutation of the peu-1 AUG via CRISPR might allow them to rule out the possibility that they have simply compromised a transcriptional control element or damaged the coding region of ZYX-1.
      4. I am not convinced by the "overexpression" experiments. These are not well controlled, since no evidence is presented that PEU-1 is being overexpressed in these lines. Also, since we know that extrachromosomal transgenic lines are highly variable, one would need to test the effect of several independent lines to ensure that the effects that the authors observe are indeed associated with PEU-1 overexpression and not simply an idiosyncratic effect of the genetic background of a given strain. Finally, there does not seem to be an obvious mechanism by which overexpression of PEU-1 can impact ZYX-1 function. That doesn't rule out an effect, but based on the data as it is, it is premature to propose such a mechanism. The authors need to show that multiple overexpression lines do reproducibly overexpress PEU-1 and that this results in reproducible effects on zyx-1 phenotypes.
      5. I am not convinced by the data presented in Figure 5. There does not seem to be much to distinguish the five genotypes, but I concede that I am not used to looking at this type of data. But why was the muscle phenotype not also examined in the peu-1 rescue lines?

      Minor comments:

      1. The narrative flow of the introduction could be improved by the judicious use of paragraphs. Line 12, for instance is a clear paragraph break, as is line 24.
      2. The data presented in Figure 4F needs to be quantified using the same format as was presented in Figure 4B.
      3. I am not clear what features are being used to characterise the myofibril structures into the three categories. Can the authors annotate the images to indicate the diagnostic features?
      4. What is the difference between the overexpression transgenic lines and the "rescuing" transgenic lines? In the Materials and Methods, the same concentration of plasmid was used in injections - so these likely give the same approximate level of transgenic expression.

      Significance

      The authors propose an interesting interaction between an important regulator of cellular behaviour (zyxin) and the u/uoORF that potentially regulates its expression - if validated by further experimentation, this would add to the growing evidence for the importance of the 5' UTR as a source of gene regulatory activity. Such regulation is well described in yeast, but there are fewer examples in animals, particularly in genetically tractable systems such as C. elegans. The work would primarily be of interest to researchers interested in understanding the spectrum of such activity in C. elegans. My own area of expertise, RNA-splicing and the post-transcriptional regulation of C. elegans gene expression, is not directly related to the research presented in the manuscript, but I am familiar with the general concepts and developments involved.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This is a very interesting paper about a gene regulatory mechanism in a type of poly-cistronic mRNA in which alternate starts/open reading frames lead to production of two different proteins from the same locus. PEU-1 is a predicted 166 aa protein, translated from the 5'UTR of zyx-1. Two isoforms are expressed from the 5' UTR and coding region of zyx-1. The presence of overlapping transcripts with zyxin orthologs appears to be conserved in other animals. The authors provide spectroscopic evidence PEU-1 is indeed translated, and show PEU-1 can regulate zyx-1 expression. Intriguingly, it seems peu-1 inhibits zyx-1 expression in cis (deletion of peu-1 increases ZYX-1 peptides), but PEU-1 promotes zyx-1 expression in trans (overexpression of PEU-1 increases ZYX-1 expression).

      Does peu-1 have zyx-1-independent functions or other regulatory targets?

      Fig. 4 would be better if the control (A) and peu-1OE (B) worms were more similar in age and size

      Fig. 5 Even the 'severe' muscle disruption is quite mild (say, in comparison to loss of talin). Perhaps rephrase these categories? The moderate and severe categories also do not look different to me. Show what the muscle cells look like in zyx-1 deletion and overexpression animals. Is there a way to use quantitative imaging to score these? Can peu-1 phenotypes be rescued or enhanced by expression (or RNAi) of zyxin in the muscle? Also, clarify what age animals are being tested in the muscle and burrowing assay.

      Do the burrowing assay results reflect a neuronal or a muscle function for PEU-1? Or both?

      Minor

      Abstract: Clarify what is meant by 'putative syntenic conservation' or rephrase, simply stating that the existence of an ORF overlapping with the 5' region of zyxin is conserved

      Line 24: Clarify these are synthetic phenotypes (not caused by loss of zyx-1/peu-1 alone). Loss of zyx-1 alone results in very mild phenotypes.

      Line 28: Start new paragraph

      Line 31: Not clear what is meant by 'post-transcriptional regulation can be further propagated'- maybe reword to 'alternative and overlapping open reading frames (ORFs) arising from polycistronic mRNA can regulate translation' or something simpler like that.

      Line 56-57: Is this because most C. elegans transcripts start with the splice leader SL1 or SL2 rather than the adjacent 5' sequence? Is that relevant for zyx-1? Recommend commenting briefly on this.

      Line 78: Delete the word 'other'

      Fig. 2A very faint, increase brightness/contrast?

      Line 122: zyx-1

      Line 137: 'lead' should be 'led'

      Line 158: rephrase 'only the long ones' to indicate which isoforms more precisely

      Line 195: Rephrase. Unclear what is meant by 'highlights the evasiveness of non-canonical ORFs from functional annotation'

      Various locations: I think it will be more clear to the reader to consistently refer to the burrowing assay as 'burrowing assay' rather than chemotaxis. I recommend adding a brief description of the burrowing assay to the results section.

      Fig. S2 Match font sizes on Y-axes. Also, indicate any statistical differences and statistics used.

      Fig. S3 C, indicate any statistical differences and statistics used.

      Significance

      Nature and significance of the advance: This is a very interesting paper about a gene regulatory mechanism in a type of poly-cistronic mRNA encoding peu-1 and zyx-1. Intriguingly, it seems peu-1 inhibits zyx-1 expression in cis (deletion of peu-1 increases ZYX-1 peptides), but PEU-1 promotes zyx-1 expression in trans (overexpression of PEU-1 increases ZYX-1 expression).

      Compare to existing published knowledge: This is the first study of its type on zyx-1.

      Audience: Those interested in gene regulatory mechanisms and in zyxin.

      My expertise: C. elegans cytoskeleton, cell migration, acto-myosin contractility.

    1. Transparent Peer Review


      Download the complete Review Process [PDF] including:

      • reviews
      • authors' reply
      • editorial decisions

    2. Note: This rebuttal was posted by the corresponding author to Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      The authors have set out to study the Drosophila immune response against the fungus Aspergillus fumigatus. They found that Aspergillus fumigatus kills Drosophila Toll pathway mutants. The fungus does this without invasion because its dissemination is blocked by melanization. They suggest that there is a role for Toll in host defense distinct from resistance. The findings are interesting, and it looks like the mycotoxins play a role. It also seems that there is some role of the Bomanins here, but I find that, in particular, the Figure 4 experiments are not convincing enough to provide a mechanistic insight as to what is going on. I think the authors need to think through what their results mean, and also explain better (especially regarding Fig 4) their ideas and how the data fit them.

      We thank the reviewer for scrutinizing our manuscript as well as for suggestions to improve it.

      The role of mycotoxins is demonstrated:

      i) the fungus does not proliferate nor disseminate, also in Toll pathway mutant flies: thus, it must kill through diffusible substances, in as much as these immuno-deficient flies exhibit tremors toward the end of the infection;

      ii) a fungal strain devoid of the capacity to produce secondary metabolites is no longer virulent, even in Toll pathway mutant flies.

      The role of Bomanins is also demonstrated: the finding of a susceptibility of BomΔ55C deletion flies to A. fumigatus and to mycotoxin challenges clearly shows that at least one or several Bomanin genes are required in the host defense against these challenges. The observation that this susceptibility can be rescued by the genetic overexpression of specific Bomanins indicates which ones are likely to mediate protection. The novel data we have included with the protection from mycotoxin action in neurons point clearly to BomS6 being the major mediator of protection against verruculogen action since it is the only one of two Bom genes to be induced in the head and with a proven potential for rescue of the BomΔ55C phenotype.

      As regards the concept of the article, it is simple: we show that the Toll pathway does not control A. fumigatus infection by directly attacking the fungus but does so by neutralizing the effects of secreted virulence factors such as restrictocin and verruculogen. We further identify some of the relevant effectors such as Bomanins by using a genetic complementation strategy. To make our point clearer, we have now included additional data in which we show that BomS6 and BomS4 are the only Bomanins induced in the head of flies upon the injection of these two toxins. We next determine that BomS6 and not BomS4 expression in the nervous system dominantly protects the flies from the deleterious effects of verruculogen injection, both in terms of recovery from tremors and survival. Mechanistically, the Toll pathway protects the host from the action of verruculogen by expressing and likely secreting BomS6 from neurons.

      Major comments:

      Page 5: "...the fungal burden did not increase much in MyD88 flies challenged with 50 conidia (Fig. 1B)" - What do you mean did not increase much? There is a clear increase in Myd88 mutants compared to controls; would you expect a bigger increase (e.g. log scale induction)? Explain.

      When the injected dose is higher than 50 injected colonies, the fungal burden remains very close to that of the injected inoculum (Fig. EV1F, J). As for other pathogens regulated by the Toll pathway, it has been published that the microbial burden increases by log factors for filamentous fungi (Huang et al., in revision), pathogenic yeasts (e.g., work from our laboratory Quintin et al. Journal of Immunology, 2013), bacteria (e.g., Duneau et al., eLife 2017; Huang et al., in revision). The pathogens usually proliferate exponentially in immuno-deficient hosts, which is clearly not the case of A. fumigatus, the first example we know of.

      Page 6: "the SPZ/Toll/MyD88 cassette is required for host defense against A. fumigatus infections, even though this pathogen only mildly stimulates the Toll pathway." - Should you rather say that A. fumigatus only mildly induces the Toll pathway target gene Drosomycin?

      The answer is negative. Fig. EV1C clearly shows that BomS1 is also modestly induced as compared to an infection with E. faecalis. The promoter of BomS1 contains a canonical Dif-response element (Busse et al., EMBO J., 2007). For a more thorough discussion of this point, please see the reply to Reviewer 2, Major Comment 2.

      Page 6: "...we tested Hayan mutant flies defective for this arm of innate immunity (Nam et al., 2012)." - elaborate this, which arm/which pathway?

      The title of the paragraph is “Drosophila melanization curbs A. fumigatus invasion”. The full first sentence of the paragraph actually read: “As melanization is a host defense of insects effective against fungal infections, we tested Hayan mutant flies defective for this arm of innate immunity”.

      This has not been introduced in the introduction. Explain.

      We have now added a couple of lines (82-83) to introduce melanization for the nonspecialist reader.

      Can you really draw this conclusion: "We conclude that melanization limits the proliferation and the dissemination of A. fumigatus injected into wild-type flies yet does not eradicate it at the injection site, where a melanization plug forms." Maybe you can based on the function/importance of the pathway to melanization, but you need to explain.

      Melanization is mediated by the Hayan protease and three phenol oxidases (two in adults) that catalyze the enzymatic reactions leading to melanin production (for Drosophila, please see Nam et al., EMBO J. (2012); Binggeli et al., PLoS Pathogens (2014); Dudzic et al., BMC Biology (2015) and Cell Reports (2019)). Thus, finding that there is an increased proliferation and dissemination in null Hayan mutants is a strong indication for a role of melanization. The identification of a similar phenotype for PPO2 and PO1-PPO2 mutants demonstrates that melanization is curbing A. fumigatus. Our sentence is therefore fully justified.

      Page 10: "The cleavage of the 18S RNA was however much less pronounced in wild-type flies as compared to MyD88" - I am not sure what this means. Do you mean 28S?

      We thank the reviewer for pointing out this mistake that has now been corrected.

      And that the 28S peak is lower? Is this a quantitative method?

      The technique is liquid electrophoresis on a microchip. It is both a qualitative and quantitative technique that replaces traditional agarose or polyacrylamide gels.

      Fig. legend: "Arrows show the position of the 28S RNA sarcin fragment" - there are three arrows in both Fig 4E and F; specify which arrows point what.

      The figure legend now indicates that the thick arrow corresponds to the much smaller sarcin fragment, whereas the thin arrows on the graph mark the position of the 28S RNA peaks.

      Based on the results, I am not convinced about the conclusion, that "restrictocin is able to inhibit translation to a detectable degree in vivo, likely through the cleavage of the ribosomal 28S a-sarcin/ricin loop as described in vitro." <- Do you draw this conclusion before doing the actual in vitro experiment, which is described next in the text (The rabbit reticulocyte assay, S2 cells)?

      The existing literature (line 259 for a few selected references) has largely proven that restrictocin cleaves 28S RNA in vitro. We are demonstrating that this also happens in vivo in flies, based on the generation of the alpha-sarcin fragment as well as the decreased 28S peaks. Our transgenic approach also indicates that restrictocin blocks translation in vivo. The in vitro approach has been implemented so that we could test the effect of synthetic BomS1 and BomS3 in cell culture. To our knowledge, no one had demonstrated that restrictocin blocks translation in Drosophila cultured cells. It was therefore important to demonstrate it in cell culture using well-characterized in vitro techniques mastered by AT and FM.

      4H: Not sure what should be seen here, is it the darkest band at 0 uM that disappears?

      We have improved the figure and added an arrow that points to the relevant band on the gel.

      HI & J need more explanation than what is now included in the text or Figure legend, is the conclusion that there is no difference? Write the stats above the Figs 4I & J (n.s.?).

      We have added NS on the figures and made our conclusion clearer (lines 295-298).

      Minor comments:

      It would have helped commenting if the manuscript contained line numbers

      We apologize for having initially provided a version in which lines were not numbered. At the prompting of Review Commons we immediately provided such a version, which was actually used by Reviewer 2.

      Why do you have the title "Hayan" on top of Fig 1F; you don't have this marking system in the other survival curves

      This point has now been addressed and the survival experiments checked for consistency.

      Fig 2A: Can you speculate why MyD88 flies die rapidly at day 10 if you inject PBST (your control)? What would happen to uninjected controls in otherwise the same conditions? (you could include an uninjected control here?)

      We suspect that this is linked to the trauma induced by the injection. Trauma has been shown to impact the homeostasis of the midgut epithelium (Lee & Miura, Current Topics in Developmental Biology, 2014; Chakrabarti et al., PLoS Genetics, 2016), and we suspect that it may lead to a leakiness of the gut allowing the passage of some bacteria from the gut microbiota that can proliferate in the hemocoel. Hence, we checked axenic and antibiotic-treated MyD88 flies to verify that this limited sensitivity to trauma was not significantly contributing to the phenotypes we describe. The effect is also linked to the thickness of the needle, and the problem is alleviated by using thinner needles.

      The uninjected control is now shown in Fig. EV8E.

      Please, see also the answer to Reviewer 2 Major comment 1.

      Fig 2E: Not sure what would be the best way of presenting the curves - different colors, dotted lines or something? Now if there are too many lines, they are hard to tell apart because the symbols are not that visible. Like in 2E if you want to compare the light red/orange colored lines.

      We agree with the reviewer that the lines are hard to tell apart. This is however not a significant issue since the glip mutants display curves similar to that of the wt A. fumigatus control strain.

      For consistency add the caption also to Fig 3D (I assume it is the same as 3C)

      The caption was present in our version and is present in the revised version.

      For consistency, should you add Verruculogen on top of Fig 3F?

      Same reply as for the previous comment.

      Chronologically, how it is explained in the text, Figs 4A and B are in the wrong order.

      We fully agree with the reviewer. This problem has been addressed in the revised version.

      The quality of Fig 4 is not great, the text is hard to read (too small) and becomes blurry upon magnification.

      We fully agree with the reviewer. This problem has been addressed in the revised version.

      Page 12; "These data then suggest that a process akin to the immune surveillance of core cellular processes first described in C. elegans may also exist in Drosophila" - I think this sentence belongs to the discussion, this is not directly drawn from the results.

      We have followed the reviewer's suggestion and have developed our Discussion paragraph, now entitled “Induction of the expression of specific Bomanin genes upon mycotoxin challenge”.

      Referees cross-commenting

      I think we share many thoughts among all the reviewers.

      The main problem is that the manuscript language is quite strong; from the results many times it is not ok to make such strong statements. Some experiments need further analysis and clarification.

      I think in most cases, this could be achieved by softening the statements and adding more discussion, and not by making new experiments (some may be needed).

      We respectfully disagree with the reviewer on this point. There were obviously some misunderstandings that might be traced to the short format of the initial version. We have now developed the Discussion to clarify our conclusions as suggested by the reviewer.

      Minor things are that experiments are not advancing in a logical order between the text and the figures and there are problems with resolution in some figures.

      Statistics in some figures needs to be added.

      Please, see above.

      Reviewer #1 (Significance):

      The nature of the work is conceptual for the field, to understand the role of the Toll pathway and Bomanins in particular, in this fungal infection model. The work is interesting to a somewhat limited audience, mainly immunologists and in particular, people interested in the Drosophila model for immunity. The work may be interesting conceptually in understanding fungal infections.

      We are not certain that immunologists represent a limited audience. We agree that work on fungal infections is insufficiently funded with respect to the medical importance of these infections, as highlighted in our introduction and Perspective section of the Discussion.

      My expertise: I am a Drosophila immunity researcher with nearly 20 years of experience in working with fly immunity, in particular the Toll and the Imd pathways.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary:

      Xu et al. describe how A. fumigatus kills Toll-deficient fruit flies not by hyperproliferation, but more likely by virulence factors. Melanization is important for suppressing fungal spread. The Bomanin genes have an unknown function, and here the data suggest a reasonably convincing role for Toll in resilience. Overall the manuscript is thorough and presents a diversity of approaches that show Toll and the Bomanins in particular contribute to this resilience effect. The idea that Toll effectors are essential for resilience is interesting as other fly stress response pathways like JAK-STAT are better known for helping the fly cope with damages, while Toll is better known as an antifungal response.

      I believe the study, with some careful considerations added, would add a valuable series of observations to understanding how the host immune system promotes survival after infection. Overall I am quite positive about the results, and the authors have made a significant effort.

      We thank the reviewer for the positive evaluation of our work that actually spans many years of research on the Aspergillus fumigatus Drosophila infection model that is a major topic of our work at the Sino-French Hoffmann Institute of Guangzhou Medical University.

      Any experiment suggestions I make are strictly to improve the confidence in the interpretations of the results, but the language could alternately be softened to address those concerns. My major critique is that the authors repeatedly extend beyond what is shown, and occasionally in defiance of what is shown (if I understand the results correctly).

      We have chosen to perform additional experiments when needed. We have also clarified points where there were obvious misunderstandings by expanding our text that had been written under a very concise format.

      It is not entirely clear what the reviewer has in mind when using the word defiance. We suppose it refers to the work of Scott Lindsay, with whom we are in contact. He actually attempted to monitor the C. glabrata burden but did not pursue this line of investigation, as he already saw a difference after one hour and thought that the Toll pathway cannot be induced so rapidly. Actually, David Duneau mentions a time of two to three hours for the Toll pathway to control E. faecalis infections (eLife, 2017) and Sandrine Uttenweiler-Joseph already saw by MALDI-TOF MS an induction of Bomanins and other DIMs at the earliest point tested, six hours (PhD thesis). There is absolutely no critique of the work of the Wasserman laboratory, which has greatly contributed to our understanding of Bomanin functions. Some of our unpublished data clearly point to an AMP role for at least one Bomanin gene against E. faecalis and we certainly do not exclude an AMP role for BomS against C. glabrata. This however does not dismiss the possibility that Bomanins may also have other roles in dealing with microbial toxins. We have been studying Candida infections in Drosophila for many years and have documented the host defense against C. glabrata (Quintin et al., JI, 2013). We do suspect that C. glabrata likely secretes virulence factors that have not been identified so far. We mention this as a possibility and certainly not as a truth. One should remember that investigators were unaware for a long period of the role of Candidalysin, a pore-forming toxin, in C. albicans infections.

      Finally, a dual role as an AMP and in protecting from secreted toxins has been clearly shown in the case of mammalian alpha-defensins, which we now describe in our revised Discussion (Kudryashova et al., Immunity, 2014).

      Comments below.

      Major comments:

      1) The language is too strong. Specifically the use of the phrase "anti-toxin" is too generalist, especially as the authors show that their candidate Bomanin does not bind to the toxin directly.

      We have checked all of the submitted documents: the term anti-toxin was never used in this manuscript or in the companion article (we only found “anti” in antimicrobial, antifungal, antibiotics, etc.). We have also never excluded an indirect effect; quite to the contrary, because of the in vitro experiment with restrictocin mentioned by the reviewer and other observations now included (see further below). We use the terms “protection” or “counteract”, which do not carry such a meaning. It is burdensome for the reader to read each time “counteract or protect from the actions of the toxins or the effects of the toxin”.

      Instead, Toll mutants seem susceptible to damage/stress caused by injury/toxins. MyD88 even show general susceptibility to vehicle controls in Fig3C-D.

      The effects of stress related to the infection conditions and injury are clearly distinct from the much stronger ones exerted by the toxins themselves. As requested by the reviewer further below, we have subjected wild-type and immuno-deficient flies to several stresses such as heat or the injection of hydrogen peroxide or salt solution (Fig. EV8B-E). While the latter did not reveal any difference, MyD88 flies succumbed slightly faster to a strong 37°C stress; in contrast, they survived a 29°C exposure better, the temperature at which we perform most experiments. However, the difference started to be visible only after some 15 days, whereas the time frame in which flies succumb to A. fumigatus or toxin challenges is definitely much shorter, by some 10 days. We also note that BomΔ55C mutant flies behave like the isogenized wild-type controls in these assays, further excluding a potential role for general stress sensitivity as a contributor to the effect of toxins.

      As regards DMSO, there is indeed a general mild sensitivity of flies to DMSO, but it does not specifically affect MyD88 mutants (Rebuttal Fig. 1J). We find that this effect is lessened when using thinner needles. Thus, the problem has become minor as we became more experienced. We had checked axenic and antibiotic-treated flies to exclude a contribution from the microbiota. Finally, to uncouple the effects of verruculogen from those of DMSO, we have also challenged flies directly by introducing the powder, using a technique similar to that of the septic injury. While it is quantitatively less accurate, it clearly proves that verruculogen produces the reported effects (Fig. 3C) and was useful to measure Bom and Drosomycin expression by digital PCR in the heads of challenged flies, e.g., Fig. EV6J-K and Figs EV11 & 12.

      Toll is important for development, so it may be expected that Toll flies could have development defects impacting resilience even if/when Toll flies can survive to adulthood. I don't say this to be too negative on the findings, which are quite convincing. But I am not sure that the phrase "anti-toxin" is right for what is shown.

      We fully agree with the reviewer on this point. We have failed to find RNAi lines that are efficient enough to mimic the Toll pathway phenotype when expressed ubiquitously at the adult stage. However, BomΔ55C mutants do not seem to display a developmental phenotype and display a phenotype similar to that of MyD88 flies. Furthermore, the rescue of the BomΔ55C sensitivity phenotype to mycotoxin challenge is achieved by the overexpression of specific Boms that are induced only at the adult stage, making it unlikely that this sensitivity phenotype reflects a developmental problem, as had been shown to be the case for 18-wheeler, which had initially been proposed to encode the IMD pathway receptor.

      A very interesting recent study shows Dif has a role in the synapse of neurons to protect from alcohol sensitivity. Could secreted Bomanins participate? This emphasizes a mechanism through which Toll mutants likely have defective neural development, which could make them stress response defective, especially to things like neurotoxins. See: https://pubmed.ncbi.nlm.nih.gov/35273084/

      We are aware of this study, first presented at the 2019 Fly Meeting in Dallas, and this author discussed it with the authors of the study. However, we have found that Dif (and Dorsal) mutants are not sensitive to A. fumigatus infections nor to injected mycotoxins, as was the case already for C. glabrata (Quintin et al., JI, 2013).

      Lin et al. (2019) also showed lack of Bomanin secretion from the fat body in Bombardier mutants causes loss of tolerance (resilience?). So does Bomanin disruption increase susceptibility to stresses more generally, rather than specifically fungal toxins? And is this a development role, rather than an immune response role?

      The authors could try to use other stresses (NaCl, oxygen, heat, alcohol) to test the contribution of Bomanins to this resilience, which may reflect defective neural development rather than a role for secreted systemic immune-response peptides.

      Please, see replies above.

      2) The authors present a paradox. On the one hand, A. fumigatus hardly induces Drs/Bomanins (Fig. S1). Yet on the other, they propose that inducible Bomanins protect the fly from mycotoxins. Why do the authors say Toll is hardly induced by A. fumigatus at the start of the study (Fig S1), but later use the same data to argue that Bomanin induction underlies the resilience phenotype (Fig5).

      The reviewer raises an interesting point. Of note, we have added new data in Fig. EV2B that document that all 55C Bomanin genes, BomS4 excepted, are induced by a systemic infection. There is indeed somewhat of a paradox. The BomΔ55C deletion phenotype clearly establishes that Bomanins play a major role in the protection against mycotoxins and A. fumigatus. The rescue experiments rely on ectopic expression and therefore establish that specific Bomanins can mediate the protective effect. Our data on verruculogen suggest that there might be local inductions, e.g., of BomS6 and BomS4 in the head. The brain represents a compartment that is separated from the hemocoel by the blood-brain barrier. We have not been able to generate BomS6 null mutants so far. In this case, the relevant response may not be systemic. We only detect a weak signal for BomS peptides in the hemolymph of unchallenged flies, making it unlikely that a basal expression is important, at least as regards a systemic infection. We cannot however exclude local inductions at the level of tissues. This would not rely on hemocytes as “hemoless” flies are not susceptible to A. fumigatus or toxin challenges. This topic definitely warrants further investigations.

      In Fig 5, it looks like DMSO is nearly identical to A. fumigatus, so can the authors really suggest that equal induction to DMSO is relevant?

      We had stated that an induction of the Bomanins by the injection of DMSO alone precluded us from analyzing the effects of verruculogen on Bom gene expression. We have now bypassed this difficulty through direct challenges by the undissolved powder (Fig. 6J-K, Fig. EV11).

      The authors' discussion of these points would benefit from considering Vaz et al. (2019; Cell Rep) to frame how much PAMP is injected given equal numbers of fungal cells vs. bacterial cells. To me the lower induction by injecting a few fungal cells with much lower surface area to volume ratio means equal microbe mass has exponentially less PAMP in fungal conidia cell walls (2-3um diameter) vs. equal mass of bacteria (0.5-1um diameter).

      We fully agree with the reviewer and now mention that C. glabrata also led to a milder induction of the Toll-mediated humoral response (Quintin et al., JI, 2013). In addition, it has been shown previously that β-(1-3)-glucans, which are sensed by GNBP3 in Drosophila (Gottar et al., Cell, 2006), are concealed by the cell wall (germinating conidia) or hydrophobins (Wheeler et al., PLoS Pathogens, 2006; Aimanianda et al., Nature, 2009). In the case of yeasts, these glucans are accessible only at the budding scar (Gantner et al., EMBO J., 2003).
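      The surface-area argument raised by the reviewer can be illustrated with a short calculation. The sketch below is only an illustration and is not taken from the manuscript: it assumes idealized spherical cells of equal density and uses the diameter ranges quoted by the reviewer (2-3 µm for conidia, 0.5-1 µm for bacteria). Under these assumptions, the surface area per unit volume of a sphere is 6/d, so the exposed surface per unit mass falls in inverse proportion to the cell diameter. It says nothing about the concealment of glucans mentioned above.

```python
import math


def surface_to_volume(diameter_um: float) -> float:
    """Surface area per unit volume (1/µm) of an idealized sphere."""
    radius = diameter_um / 2
    area = 4 * math.pi * radius ** 2
    volume = (4 / 3) * math.pi * radius ** 3
    return area / volume  # algebraically equal to 6 / diameter


# Diameter ranges quoted in the reviewer's comment (assumed spherical cells).
conidia_diameters_um = (2.0, 3.0)
bacteria_diameters_um = (0.5, 1.0)

for bacterium in bacteria_diameters_um:
    for conidium in conidia_diameters_um:
        fold = surface_to_volume(bacterium) / surface_to_volume(conidium)
        print(
            f"bacterium {bacterium} µm vs conidium {conidium} µm: "
            f"{fold:.1f}-fold more surface per unit volume"
        )
```

      With these idealized numbers, equal masses of bacteria would expose between two and six times more cell surface than conidia.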

      Fig S1O is not convincing that Boms alone are present. There is significant noise near Drs in FigS1 infected, which likely saturates the detector before Drs can fly to it. I say this because DIM4 (Daisho) indicates that Toll is strongly induced. The authors should show a larger mass range on the x-axis including peaks of other Toll-induced peptides like the BaramicinA DIM10, DIM12 and DIM13 peptides of their companion paper and DIM14 (Daisho), which are closer in mass to the Bomanins and less likely disrupted by the noise at 4300 m/z. The MALDI-TOF calibration to correct ranges is critical for arguments of quantification.

      We provide the primary data in the Rebuttal figures at the end of this document. These are the results obtained from three single flies (Files A29683PBUG22, A29684PBUG23 and A29684PBUG24). The first three spectra correspond to the full scale based on the major peaks observed (DIM4/BomS5) in two out of three spectra. At this scale, no signal is visible for Drosomycin at 4891 and the “noise” at 4278 is modest. Next, the multi-spectra report allows all three samples to be displayed on the same sheet, this time zooming in on the peaks of interest in the region 4300 (“noise”) and 4891 (Drosomycin). Finally, the next two pages zoom in on the BomS peptide signals and the next page keeps the same scale to document the 4300-5000 region. On the last page, it is obvious that the signal around 4300 is very modest and too distant to influence the Drosomycin ion, thereby excluding any suppression effect. Of note, in the systemic immune response, Drosomycin is the most induced AMP with a concentration estimated to be around 0.3 µM, an order of magnitude higher than other AMPs. Finally, these experiments have been performed by PB, who initially developed the technique (Uttenweiler-Joseph, PNAS, 1998) and has been using and developing it ever since.

      Combined with comments in Major Concern 1, I am not convinced that the -inducible- Bomanin response mediates the resilience phenotype.

      Besides our replies above, we do hope that the new data we have included in Fig. 6 that document an induction of only two BomS genes in the heads of Drosophila upon verruculogen and the finding that BomS6 expression in the nervous system protects the fly from the effects of verruculogen will convince this reviewer.

      3) The authors' language is very strong to disregard a possible antimicrobial activity.

      As noted above, this is a misunderstanding that we hope is dispelled in the revised discussion (see also above and replies to Reviewer 1).

      Previous studies showed increased Candida growth and decreased hemolymph killing activity in Bom55C flies (Lindsay et al. 2018 and Hanson et al. 2019).

      Please see the reply above. Factually, Lindsay et al. did not study the C. glabrata titer in vivo but in collected hemolymph. The killing activity likely requires a cofactor regulated by the Toll pathway. Hanson et al. investigated the burden of the dimorphic pathogen C. albicans, which is filamentous in flies, and not that of C. glabrata.

      Also see minor concern (i).

      I grant that the data are consistent with a resilience role. However the authors found no binding of Bomanin to restrictocin, countering their idea of a -direct- anti-toxin effect.

      We are surprised by this comment. We certainly did not favor this idea nor developed it in the original manuscript, even though we cannot formally exclude it at this stage. Future experiments will focus on BomS6 potential interactions with these two mycotoxins.

      At present the authors cannot rule out a direct antimicrobial role, or even the possibility of two different roles for the same peptides (ex: one in resilience, one antimicrobial). For instance, it is difficult to explain the loss of killing activity of Bom-deficient hemolymph ex vivo from Lindsay et al. if Bomanins are strictly anti-toxins. Surely they must also do something generalist?

      Please, see our replies above and the paragraph dedicated to this topic in the Discussion.

      4) In most figures, the authors do not compare flies with shared genetic backgrounds.

      The MyD88 allele we are using is a transposon insertion from the Exelixis collection and we are using the wA5001 strain that was used to generate the collection of insertions (Thibault et al., Nat. Genetics 2004). We thank the reviewer for this comment as we realized we had forgotten to mention the BomΔ55C strain. Lines 603-604 state that the deficiency line has been isogenized in the wA5001 background.

      The phenotypes are usually strong so I am not concerned.

      However the rescue effect of Bom transgenes in Fig 5C-D is based on smaller differences. Were these genetic backgrounds controlled?

      Yes, as much as we reasonably could. The fact that most BomS transgenes did not rescue gives further confidence in the data.

      Were transgenes inserted at the same site?

      We used the strategy for overexpression developed by the Basler laboratory (Bischof et al., Development 2013, Nat. Protocols 2014) that relies on insertions at the same site.

      The authors seemingly used a heat shock to express transgenes.

      Heat-shocks are usually a short exposure to higher temperatures, usually 37°C. Here, we have used the inducible Gal4-Gal80ts system developed by McGuire and Davis (Trends in Genetics, 2004). The Gal80 repressor inhibits Gal4 function at the permissive temperature (18°C) and becomes inactive at the restrictive temperature (29°C). Thus, we use a temperature shift and not a bona fide heat shock.

      Given a resilience effect is being studied, this heat stress approach is sub-optimal. Earlier experiments showing effect/no effect of Bomanin on heat shock resilience would improve confidence here. I would recommend assaying temperatures that can kill wild-type in order to confirm that Bom do not succumb earlier (ex. up to 37°C).

      The results have been discussed above and show that 29°C is not a concern for BomΔ55C and not a significant problem as regards MyD88.

      In Fig5C the time resolution is poor, and the effect inconsistent across Bomanins. What are the differences in the Bomanins that the authors suspect could cause this? And how consistent are the experiments?

      We provide all the primary survival data in Rebuttal Fig.1 A-H. The partial protection effects of BomBc1, BomS3 and BomS6 against restrictocin are consistent in the three independent experiments (Fig. 5D and Rebuttal Fig. 1 A-B). As regards the seven independent experiments performed with verruculogen, we observed a strong protection conferred by BomS6 expression in six experiments whereas we detected a milder protection conferred by BomS1 in four out of seven experiments and no protection in the three other ones. The effects were always there after 24 hours, in keeping with our novel data showing that BomS6 expression allows a faster recovery, around 10 hours, from verruculogen-induced tremors (Fig. 6E-F).

      Since the effect is finished by 24h, perhaps a boxplot of percent survival at this time would better show the consistency across experiments.

      Given the argument presented just above and considering that this rebuttal letter will be published alongside the article, this may not be needed.

      Minor concerns:

      i) The authors say the fungal burden of Bom55C flies remains low in Fig 5B, but they never measure flies that are near death when fungal load is greatest, or FLUD like in other figures. Given low mortality at the following time points, it seems likely that A. fumigatus would grow beyond initial loads in those individuals and kill them. I grant that these loads are less than what is seen in Hayan mutants. I just might suggest a more careful consideration of the time points used and what can be said about the trends shown here.

      This is certainly a relevant point. The FLUD data are now presented in Fig. EV8A and do not reveal any additional growth.

      ii) Could the authors comment somewhere about the levels of toxin they were required to inject to get a phenotype vs. the level of toxins the authors expect are found in the fly during infection? I appreciate that toxin injection likely requires much higher doses, but it would be good to know just how far the authors have pushed their experimental system beyond its natural range.

      This is a question that is difficult to answer accurately, as we are not sure the techniques exist to measure toxin levels in these small flies. We have tested a range of concentrations. It is clear that we push the system and likely use concentrations that are higher than those actually secreted by A. fumigatus during infection. Indeed, the mutant strains defective for the production of verruculogen or restrictocin display only a mildly reduced virulence in MyD88 flies. This makes it even more remarkable that wild-type flies are able to withstand these high, unphysiological concentrations, an argument for an indirect effect independent of the dose, as now pointed out in the Discussion. How fungal pathogens balance the expression of the hundreds of secreted virulence factors, proteins and secondary metabolites, is a major frontier for future investigations, be they plant or animal pathogenic fungi.

      Again regarding toxins vs. general stresses, one could manage to inject salt into the hemolymph and show a stress-sensitized fly would succumb at lower doses than wild-type, emphasizing the relevance of defining concentrations.

      We feel that just monitoring the survival of flies after a challenge that produces an effect is sufficient (Fig. EV8C).

      The authors could also write toxin concentrations clearly in the figure/legend per experiment.

      Corrected.

      iii) Throughout the manuscript, the order that figures/panels are cited is inconsistent. Perhaps the text could be re-written so the reader can follow the figures more intuitively while going through the text?

      Corrected.

      iv) There are a few points where run-on sentences, involving many commas, make it hard to follow the logic. I might suggest a careful reading to break up long sentences into two sentences to ensure clarity.

      We hope to have addressed this concern.

      v) Line 279-281: this is the first and only mention of the immune surveillance hypothesis in nematodes. This is strange, given the authors are effectively describing an analogous idea exists in flies? Perhaps this could be added somewhere in the introduction or discussion.

      We have followed the advice of the reviewers and now discuss this point more fully in the Discussion under its own subheading.

      Small points

      • What timepoints are the gene expression data from? Could the authors indicate this in figures/legends?

      Done

      • Line 133-135: "We conclude that MyD88 flies succumb to a low A. fumigatus burden..." - could the authors cite a figure panel here to emphasize what evidence they're referring to.

      Done

      • Line 151-152: Dudzic et al. (2019- Cell Reports Figure 3) showed that PPO2 was regulated by Hayan, while PPO1 by Sp7. This relevant study should be cited here or in the introduction/discussion.

      Excellent suggestion, this was indeed an important study. Done

      • Line 179-180: could the authors define the gliotoxin mutant strain here in the text for clarity?

      Done

      • Line 196: Fig. 4A-B should be Fig. S4 A-B?

      Corrected.

      • Fig4A: perhaps the authors could reduce the x-axis to focus on the early time points? If I understand correctly, aspf1 has slightly delayed killing compared to akuB (~50% difference at 2 days), but both kill 100% by 3 days.

      Done

      • Fig4G: can the authors define the GFP transgene on pg10? Not clear what this is, or what this means. Brain? Fat body? The legend of Fig4G and the key in the top left... it's not easy to quickly understand what is shown in Fig4G.

      Done

      • Line 247: I would drop the "at the intracellular level" part. I'm not sure this is robustly shown given the use of an in vitro model where there is no closed extracellular environment. The data are convincing of the effect, this is just a semantic point.

      We agree that there is no closed extracellular environment and that therefore any signal emitted by the cells might get too diluted. However, the addition of EGF will activate intracellular Toll signaling through the chimeric EGFR-Toll receptor. As restrictocin is known to act intracellularly, one might have thought that there might be some intracellular effectors mediating the Toll-dependent protection against restrictocin. Our sentence excludes this possibility.

      • Line 257-258: Cohen et al. (2020- Front Imm) never used Bomanin mutants. Did the authors mean to cite Hanson et al. (2019) here, which seems to fit their described citation re: Bom55C vs. Toll mutant flies (Fig. 2)? Given Hanson et al. infected Toll mutant and Bom55C flies with many bacteria/fungi including A. fumigatus, it's strange this study is not discussed currently.

      The reviewer is correct. Cohen et al. did use A. fumigatus, but on Daisho mutants and MyD88 and not BomΔ55C as a control. We are now citing Hanson et al., 2019 in lines 443-449 (Discussion).

      • Fig5C-D: the labeling is difficult to follow.

      This is difficult to address without multiplying EV figures. We feel this is not needed: the important curves are in color and each such curve is seen on the graphs.

      • Line 318: a -possible- AMP role of Bomanins was proposed because of the aforementioned killing activity of wt but not Bom mutant hemolymph, alongside rescue by single Bom genes. To say this was based only on survival experiments is incorrect.

      The paragraph has been rewritten and expanded to dispel any misunderstanding.

      • Line 324-328: could the authors cite appropriate references after "inhibition of calcium-activated K+ channels" ?

      Done

      • comment re-Line 334: Toll10b flies have melanotic tumors and are in general in a stressed state. Might their rescue be due to increased stress tolerance by pre-activated stress responses?

      This is a developmental effect occurring during larval stages, also observed for Cactus mutants. Here, we use a UAS-Tl10B transgene that is induced only at the adult stage using the Gal4-Gal80ts system. Thus, any stress is minimized as much as possible. Furthermore, we can phenocopy this phenotype to a large extent using a UAS-BomS6 driver, even though the phenotypes are subtly different as regards the protection against verruculogen-induced tremors.

      Referees cross-commenting

      Yes, I agree that the data themselves are not the issue, nor even the direction of the results. But there are many overly strong statements that go so far as to refute ideas which are supported by other studies, and for which the authors here do not provide any contradictory evidence.

      We hope that this revised, extended version has clarified any misunderstanding in the initial version.

      As per my review, I would be happy with a re-write that softened the language overall. I genuinely wonder if these Bomanin mutants simply have poor development, and so they are susceptible to neurotoxins/stress because their nervous system/development leaves them less resilient in general. Experiments testing their resilience to different stresses would greatly elevate the ability to make confident insights in the present manuscript. Currently the authors have only investigated one type of phenotype and interpreted it as if that is evidence of the evolved purpose of the peptides. This approach does not account for many other possible (and reasonable) explanations.

      We have performed the experiments suggested by the reviewer. While we see a modest effect of heat on MyD88, it is not found in BomΔ55C flies, which display essentially the same phenotype as MyD88 with regard to the sensitivity to A. fumigatus or some of its secreted mycotoxins.

      Reviewer #2 (Significance):

      This paper should be of broad interest to the study of immunology, where roles for effectors are typically thought of as cytokines. In fruit flies and other invertebrates that lack adaptive immunity, immune effectors are more thought of as direct actors likely with antimicrobial properties. The finding that Toll might mediate resilience is interesting, and implicating well known Toll effectors provides an important step forward towards a mechanistic basis behind this resilience effect.

      We thank the reviewer for his appraisal of the significance of our work.

      My expertise is in insect and innate immunity.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Xu et al. describe how A. fumigatus kills Toll-deficient fruit flies not by hyperproliferation, but more likely by virulence factors. Melanization is important for suppressing fungal spread. The Bomanin genes have an unknown function, and here the data suggest a reasonably convincing role for Toll in resilience. Overall the manuscript is thorough and presents a diversity of approaches that show Toll and the Bomanins in particular contribute to this resilience effect. The idea that Toll effectors are essential for resilience is interesting as other fly stress response pathways like JAK-STAT are better known for helping the fly cope with damages, while Toll is better known as an antifungal response.

      I believe the study, with some careful considerations added, would add a valuable series of observations to understanding how the host immune system promotes survival after infection. Overall I am quite positive about the results, and the authors have made a significant effort. Any experiment suggestions I make are strictly to improve the confidence in the interpretations of the results, but the language could alternately be softened to address those concerns. My major critique is that the authors repeatedly extend beyond what is shown, and occasionally in defiance of what is shown (if I understand the results correctly). Comments below.

      Major comments:

      1. The language is too strong. Specifically the use of the phrase "anti-toxin" is too generalist, especially as the authors show that their candidate Bomanin does not bind to the toxin directly. Instead, Toll mutants seem susceptible to damage/stress caused by injury/toxins. MyD88 even show general susceptibility to vehicle controls in Fig3C-D. Toll is important for development, so it may be expected that Toll flies could have development defects impacting resilience even if/when Toll flies can survive to adulthood. I don't say this to be too negative on the findings, which are quite convincing. But I am not sure that the phrase "anti-toxin" is right for what is shown.
         A very interesting recent study shows Dif has a role in the synapse of neurons to protect from alcohol sensitivity. Could secreted Bomanins participate? This emphasizes a mechanism through which Toll mutants likely have defective neural development, which could make them stress response defective, especially to things like neurotoxins. See: https://pubmed.ncbi.nlm.nih.gov/35273084/
         Lin et al. (2019) also showed lack of Bomanin secretion from the fat body in Bombardier mutants causes loss of tolerance (resilience?). So does Bomanin disruption increase susceptibility to stresses more generally, rather than specifically fungal toxins? And is this a development role, rather than an immune response role?
         The authors could try to use other stresses (NaCl, oxygen, heat, alcohol) to test the contribution of Bomanins to this resilience, which may reflect defective neural development rather than a role for secreted systemic immune-response peptides.
      2. The authors present a paradox. On the one hand, A. fumigatus hardly induces Drs/Bomanins (Fig. S1). Yet on the other, they propose that inducible Bomanins protect the fly from mycotoxins. Why do the authors say Toll is hardly induced by A. fumigatus at the start of the study (Fig S1), but later use the same data to argue that Bomanin induction underlies the resilience phenotype (Fig5)? In Fig 5, it looks like DMSO is nearly identical to A. fumigatus, so can the authors really suggest that equal induction to DMSO is relevant?
         The authors' discussion of these points would benefit from considering Vaz et al. (2019; Cell Rep) to frame how much PAMP is injected given equal numbers of fungal cells vs. bacterial cells. To me the lower induction by injecting a few fungal cells with much lower surface area to volume ratio means equal microbe mass has exponentially less PAMP in fungal conidia cell walls (2-3 µm diameter) vs. equal mass of bacteria (0.5-1 µm diameter).
         Fig S1O is not convincing that Boms alone are present. There is significant noise near Drs in FigS1 infected, which likely saturates the detector before Drs can fly to it. I say this because DIM4 (Daisho) indicates that Toll is strongly induced. The authors should show a larger mass range on the x-axis including peaks of other Toll-induced peptides like the BaramicinA DIM10, DIM12 and DIM13 peptides of their companion paper and DIM14 (Daisho), which are closer in mass to the Bomanins and less likely disrupted by the noise at 4300 m/z. The MALDI-TOF calibration to correct ranges is critical for arguments of quantification.
         Combined with comments in Major Concern 1, I am not convinced that the -inducible- Bomanin response mediates the resilience phenotype.
      3. The authors' language is very strong to disregard a possible antimicrobial activity. Previous studies showed increased Candida growth and decreased hemolymph killing activity in Bom55C flies (Lindsay et al. 2018 and Hanson et al. 2019). Also see minor concern (i).
         I grant that the data are consistent with a resilience role. However the authors found no binding of Bomanin to restrictocin, countering their idea of a -direct- anti-toxin effect. At present the authors cannot rule out a direct antimicrobial role, or even the possibility of two different roles for the same peptides (ex: one in resilience, one antimicrobial). For instance, it is difficult to explain the loss of killing activity of Bom-deficient hemolymph ex vivo from Lindsay et al. if Bomanins are strictly anti-toxins. Surely they must also do something generalist?
      4. In most figures, the authors do not compare flies with shared genetic backgrounds. The phenotypes are usually strong so I am not concerned.
         However the rescue effect of Bom transgenes in Fig 5C-D is based on smaller differences. Were these genetic backgrounds controlled? Were transgenes inserted at the same site? The authors seemingly used a heat shock to express transgenes. Given a resilience effect is being studied, this heat stress approach is sub-optimal. Earlier experiments showing effect/no effect of Bomanin on heat shock resilience would improve confidence here. I would recommend assaying temperatures that can kill wild-type in order to confirm that Bom do not succumb earlier (ex. up to 37°C).
         In Fig5C the time resolution is poor, and the effect inconsistent across Bomanins. What are the differences in the Bomanins that the authors suspect could cause this? And how consistent are the experiments? Since the effect is finished by 24h, perhaps a boxplot of percent survival at this time would better show the consistency across experiments.

      Minor concerns:

      • i) The authors say the fungal burden of Bom55C flies remains low in Fig 5B, but they never measure flies that are near death when fungal load is greatest, or FLUD like in other figures. Given low mortality at the following time points, it seems likely that A. fumigatus would grow beyond initial loads in those individuals and kill them. I grant that these loads are less than what is seen in Hayan mutants. I just might suggest a more careful consideration of the time points used and what can be said about the trends shown here.
      • ii) Could the authors comment somewhere about the levels of toxin they were required to inject to get a phenotype vs. the level of toxins the authors expect are found in the fly during infection? I appreciate that toxin injection likely requires much higher doses, but it would be good to know just how far the authors have pushed their experimental system beyond its natural range. Again regarding toxins vs. general stresses, one could manage to inject salt into the hemolymph and show a stress-sensitized fly would succumb at lower doses than wild-type, emphasizing the relevance of defining concentrations. The authors could also write toxin concentrations clearly in the figure/legend per experiment.
      • iii) Throughout the manuscript, the order that figures/panels are cited is inconsistent. Perhaps the text could be re-written so the reader can follow the figures more intuitively while going through the text?
      • iv) There are a few points where run-on sentences, involving many commas, make it hard to follow the logic. I might suggest a careful reading to break up long sentences into two sentences to ensure clarity.
      • v) Line 279-281: this is the first and only mention of the immune surveillance hypothesis in nematodes. This is strange, given the authors are effectively describing an analogous idea exists in flies? Perhaps this could be added somewhere in the introduction or discussion.

      Small points

      • What timepoints are the gene expression data from? Could the authors indicate this in figures/legends?
      • Line 133-135: "We conclude that MyD88 flies succumb to a low A. fumigatus burden..." - could the authors cite a figure panel here to emphasize what evidence they're referring to.
      • Line 151-152: Dudzic et al. (2019- Cell Reports Figure 3) showed that PPO2 was regulated by Hayan, while PPO1 by Sp7. This relevant study should be cited here or in the introduction/discussion.
      • Line 179-180: could the authors define the gliotoxin mutant strain here in the text for clarity?
      • Line 196: Fig. 4A-B should be Fig. S4 A-B?
      • Fig4A: perhaps the authors could reduce the x-axis to focus on the early time points? If I understand correctly, aspf1 has slightly delayed killing