- Jul 2018
-
europepmc.org
-
On 2016 Mar 05, Tamir Tuller commented:
In this paper Shah et al. propose a computational model for whole-cell translation. This model is used to test a number of hypotheses regarding the contribution of various factors to global protein production in the cell. As we describe below, the study (at least in its current form) is clearly not suitable for publication in Cell. Specifically, the reported results are either wrong or not new. In addition, the described model has many disadvantages that are not discussed, and it is clearly not evaluated accurately. Furthermore, while the authors claim the model is more comprehensive than previous models, many fundamental aspects mentioned in previous studies are not included in it. Below is a partial review; a more comprehensive review can also be downloaded from http://www.cs.tau.ac.il/~tamirtul/Shah_et_al_review.pdf
Diament & Tuller 5.3.2016
1) The reported correlation between estimated initiation probabilities and predicted 5'-end mRNA folding energies is significant, but very low (R=0.125). Furthermore, the data presented in Figure 5A are in fact binned. This is particularly interesting, as the authors discourage the use of bins, but at the same time note that it is “sometimes appropriate to bin the data for visualization purposes”. ~60 bins are used for 3,795 values, and while error bars denote the variance in each bin, the “visual representation” gives the impression of a strong relation between the two variables using very few points, which is not the case. In addition, when computing this correlation the authors should include a control, e.g. partial correlation, for other important variables, such as the Kozak score (Kozak,Cell,1986), CAI, amino acid charge, and GC content at the beginning of the coding region, etc.; with such controls the correlation may turn out to be significantly lower. These variables are known (and were known at the time of writing this paper) to have typical signals in the 5'-end of the ORF that are correlated with expression level (and initiation rate); see for example (Plotkin&Kudla,Nat Rev Genet,2011). We also recommend that statistical significance for this correlation be tested using an empirical null model. For example, amino-acid-preserving sequences in the relevant window can be generated at random to estimate the probability of observing a correlation of 0.125 between inferred initiation rates and mRNA folding energy. This is important since the distribution of amino acids near the N-terminus of the protein may itself influence the folding strength via the codons that encode them.
2) This is not the first study that includes/suggests the analysis of many intracellular components (mRNA and ribosomes). However, previous studies on the topic, such as (Brackley et al.,PLoS-CB,2011; Chu et al.,2011; Cook&Zia,2009; Cook et al., Phys.Rev.E,2009; Greulich et al., Phys.Rev.E,2012; Heldt&Thiel,2009; Jonathan et al.,2012; Karr et al., Cell,2012; Mather et al.,2013), are not cited.
3) The major fundamental disadvantage of a model with many parameters is the risk of overfitting (Babyak,Psychosom Med,2004), or of accumulated error due to individual errors in the different parameters. Surprisingly, the authors neither discuss this issue nor demonstrate that it does not affect their model.
4) All the conclusions reported in this paper based on the model are either wrong or not new. Thus, the ability of this model to provide novel conclusions is not convincing at all. For example (points mentioned in the abstract): the correlation between initiation and folding energy has been suggested many times before (Jia&Li, FEBS Lett.,2005; Kudla et al., Science,2009; Sagliocco et al., J.Biol.Chem.,1993; Schauder&McCarthy, Gene,1989; Wang&Wessler, Plant Physiol.,2001) and is very low (r=0.125); the idea that the ramp is caused by rapid initiation is wrong, among other reasons because the authors did not normalize each profile by dividing it by its mean, as was done in previous studies they cite. The conclusion that “protein production in healthy yeast cells is typically limited by the availability of free ribosomes, whereas protein production under periods of stress can sometimes be rescued by reducing initiation or elongation rates” is not new (Bergmann&Lodish, J.Biol.Chem.,1979) and can't be accurately evaluated via the model, due to the overfitting issues mentioned above and the missing fundamental aspects mentioned above and below.
5) The authors ignore previous studies suggesting that the elongation rate is not determined only by adaptation to the tRNA pool. For example, the model does not consider important aspects related to the elongation rate that were previously reported and suggested to have a stronger effect on elongation than tRNA levels. Specifically, these include the effect of mRNA folding (Nackley et al.,Science,2006; Plotkin&Kudla,Nat Rev Genet,2011; Pop et al., MSB,2014; Tuller et al.,GB,2011; Yang et al.,PLoS Biol,2014) and amino acid content (Charneski&Hurst,PLoS Biol,2013; Lu&Deutsch,JMB,2008; Lu et al.,JMB,2007; Muto&Ito,2008; Pavlov et al.,PNAS,2009), as well as wobble base pairing (Stadler&Fire,RNA,2011). In addition, the analysis is partially based on Ribo-Seq data, but the measurements are not used to infer reliable codon decoding rates. Previous studies (e.g. Charneski&Hurst,2013) suggested that these aspects are significantly more important than the tRNA pool adaptation. Thus, without considering these fundamental aspects it is not clear what we can learn from this model (!).
6) The analysis of the ramp is clearly wrong and was not performed as in previous studies (see explanations in Figure 5A in (Tuller&Zur, NAR,2015)). Ribosome occupancies must be normalized per gene by the gene's average ribosome occupancy before averaging each codon position across the transcriptome (this is how it was done in previous studies, e.g. (Ingolia et al.,2009)(!)). We must say we are surprised that the authors are not familiar with the details and methodologies used in previous studies.
7) The reported results contradict a previous paper by the authors themselves (Kudla et al., Science,2009), but this is not discussed. In (Kudla et al.,2009), the authors claimed that slower codons are related to higher ribosomal densities and vice versa. The 5' end of the ORF has been shown to be enriched with slower codons (Plotkin&Kudla,Nat Rev Genet,2011). Thus, by concluding in this Cell paper that the 5' ramp is not related at all to codon usage bias, the authors contradict themselves (as this conclusion is related to the claim that codon bias and ribosome densities are unrelated). The reported results are also contradicted by papers the authors published after this study (Weinberg et al.,2016).
8) The performance of the model in all of these aspects should be comprehensively compared with that of other models, accompanied by a balanced discussion of the relative advantages of the different models. Otherwise, the advantages of the proposed model are not clear, and the presentation is misleading. The following are some points to be considered: running time, number of parameters needed and the availability of the parameters, the possibility of analytical analysis of the model, predictions using real and simulated data, etc.
9) “Interestingly, we also found a negative correlation between initiation probability and open reading frame (ORF) length...” This result can be at least partially explained by the fact that in (Ingolia et al., 2009) there is an increased ribosome density at the 5' end (partially due to biases as suggested in (Ingolia et al.,Cell,2011)). This should have a stronger effect on shorter genes. At the time this study was carried out and published, it was already very clear that the higher density of ribosomes at the 5' end of the mRNA was at least partially due to biases in the experiment (see Ingolia et al.,2011); nevertheless, the authors do not mention or consider this fundamental issue in their analysis/conclusions at all (!).
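For illustration, two of the computations requested above can be sketched in a few lines of Python: the per-gene normalization of ribosome occupancy before averaging across genes (point 6) and a partial correlation that controls for a confounding variable (point 1). This is only a minimal sketch under stated assumptions, not the code used by Shah et al. or in any of the cited studies. The function names, the input layout (one non-empty list of per-codon occupancies per gene), and the use of a single control variable are assumptions made for this example; point 1 lists several controls, which would require a multi-variable version.

```python
from math import sqrt
from statistics import mean


def normalized_metagene(profiles, n_codons):
    """Average occupancy per codon position across genes, after dividing
    each gene's profile by that gene's own mean occupancy.

    profiles: hypothetical input, one non-empty list of per-codon
    ribosome occupancies per gene.
    """
    sums = [0.0] * n_codons
    counts = [0] * n_codons
    for profile in profiles:
        gene_mean = mean(profile)
        if gene_mean == 0:
            continue  # a gene with no footprints cannot be normalized
        for pos, value in enumerate(profile[:n_codons]):
            sums[pos] += value / gene_mean
            counts[pos] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

With normalization, a highly occupied gene no longer dominates the averaged profile: two genes with the same relative shape but a tenfold difference in coverage contribute equally. In the partial correlation, x could stand for the inferred initiation probabilities, y for the 5'-end folding energies, and z for one control such as the Kozak score; these assignments are illustrative only.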
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
-