    1. On 2016 Mar 13, Tamir Tuller commented:

      We (and everyone) know that correlations computed on binned data may be misleading if reported in a non-transparent way; specifically, binning tends to increase the correlation (but usually has a much weaker effect on the p-value). However, we frankly do not understand why this ‘lecturing’ about the topic appears here (next to our study). Our study includes various sophisticated statistical tests, the binning procedure is described at the beginning in a coherent manner, the correlation for various bin sizes is also reported (one can learn about the relation between binning and correlation simply by looking at figure S5, without the need for this unnecessary correspondence :-)), etc. Thus, we believe that the nature of the signal and the challenging data should be very clear to readers who thoroughly read the paper (but we guess that it may be misleading, as any other paper would be, if you do not bother to read all the details :-)).

      The statistical analyses in papers in our field (if performed accurately) should consider various aspects, including non-trivial biases in the data, discretization, various confounding variables/explanations, various aspects of molecular evolution, huge datasets, etc. Thus, the reader, and not only the author, should consider them when evaluating the results; specifically, the strength of a correlation should be evaluated in light of all these aspects. The aim of mentioning other papers and ‘top statisticians’ was to demonstrate that there are many people (as opposed to Plotkin/Shah/Cherry) who do understand this point.

      If the number of points in a typical systems biology study is ~300, the number of points analyzed in our study is 1,230,000-fold higher (!); a priori, a researcher with some minimal experience in the field should not expect to see similar levels of correlation in the two cases. Everyone also knows that increasing the number of points, specifically when dealing with non-trivial NGS data, tends to decrease the correlation very significantly. The aim of the binning was to align our signal with previously reported signals in the field (in terms of the number of points), and, as mentioned, the paper includes many other analyses that give the reader a greater context for the signal (including an explicit graph reporting the relation between bin size and the correlation); in addition, the non-binned correlation (0.02-0.07) is comparable to the level of correlation between two Hi-C measurements (~0.05) from different labs (!). It is clear that a typical signal in our field (e.g. higher than the correlation of 0.12, or even the “high” 0.38, mentioned in your paper Weinberg DE, 2016), if transferred via such a noisy/biased ‘channel’ with an increased number of points, will be orders of magnitude lower than our non-binned data.

      We, of course, do not expect that further back-and-forth will convince Plotkin and Shah of our points. But hopefully this exchange will at least have some value for scientists who work to draw inferences from genomic datasets; specifically, we hope that other scientists will learn to thoroughly consider all the aspects mentioned above and below when reading/writing a scientific paper.

      BTW, regarding the correlation of 0.12 that was improved to 0.38 in the new study (Weinberg DE, 2016): you still did not perform many of the required statistical controls (among others, controls for the Kozak sequence and amino acid bias) according to our review [http://www.cs.tau.ac.il/~tamirtul/Shah_et_al_review.pdf].

      Tamir Tuller & Alon Diament, Tel-Aviv University, March 13, 2016


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    2. On 2016 Mar 09, Joshua Plotkin commented:

      Hopefully this exchange now illustrates that correlations on binned data are terribly misleading. In the paper by Diament et al 2014, the authors never reported the actual correlation (r = 0.022) between two genomic measurements; instead they reported correlations on binned data (r = 0.86). As a result, the amount of variation in 3D position explained by codon usage, which is the central claim of the study, was inflated by 1,500-fold.

      There is no need to consult “top professional statisticians” to understand the obvious fact that binning data tends to inflate correlations, and that it has no scientific justification regardless of the size of a dataset.

      Diament and Tuller offer one non-scientific justification in their reply: that previous publications in “top journals” have done the same thing, citing Ghaemmaghami S, 2003 and Shah P, 2013 as examples. Even if this were true, appealing to journal name is not a strong justification for repeating statistical errors. In fact, both cited studies used binning only to graph the data, also showing the variation within each bin, whereas the correlations were calculated on the unbinned data.

      We agree with Diament and Tuller that r = 0.12 should not have been described as a “strong” correlation by Shah P, 2013. (The same analysis on an improved experimental dataset yields r = 0.38, Weinberg DE, 2016). However, unlike Diament and Tuller, we maintain that scientists must nonetheless report the actual correlations between measured quantities, instead of inflating correlations by binning, which would have produced a misleading r = 0.62 in the case of Shah et al. 2013.

      We do not expect that further back-and-forth will convince Diament and Tuller of these points. But hopefully this exchange will have some value to the field of scientists who work to draw inferences from genomic datasets.

      --Joshua Plotkin & Premal Shah, University of Pennsylvania, March 9, 2016


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    3. On 2016 Mar 05, Tamir Tuller commented:

      Plotkin couldn’t (and shouldn’t) have been a reviewer of our paper, simply because he was on our list of reviewers to exclude when the paper was submitted (unfortunately, Plotkin’s & Shah’s comment here indeed demonstrates that many of the details in our paper were completely overlooked by them in the biased ‘review’ they provide here, as all the raised issues were thoroughly answered in the original manuscript). Specifically, in our data there were 369,000,000 initial points that were binned to up to 64,000 (!) bins (not 2 points! :-) ). The comment and answer to Cherry actually include all the important details regarding Plotkin’s & Shah’s comment. As noted, the other reviewers (and the bio-statisticians to whom the study was presented) were, of course, aware of the binning process and found the paper interesting, correct and worthy of publication. We think that a discussion about binning (which it seems Plotkin & Shah would like to promote) should actually include their own recent study (Shah P, 2013): there one can learn, among other things, that in Shah P, 2013 Plotkin & Shah report in the abstract a correlation which is in fact very weak (according to their definitions here), r = 0.12, without controlling for relevant additional fundamental variables, and include a figure of binned values related to this correlation. This correlation (0.12) is reported in their study as “a strong positive correlation”. It is also important to mention that the number of points used for computing this correlation is more than one order of magnitude lower than the number of points/bins related to some of the correlations we report.

      Given their comment here, it is probable that Plotkin & Shah’s paper would not have been published (at least in its current form) had they reviewed it themselves. :-)

      Our full critical comment on (Shah et al., 2013) can be found here: Shah P, 2013 (or here http://www.cs.tau.ac.il/~tamirtul/Shah_et_al_review.pdf ).

      A point-by-point answer to Plotkin’s & Shah’s ‘review’ can be found here: http://www.cs.tau.ac.il/~tamirtul/Plotkin__reply.pdf

      Our full answer to Cherry’s, Plotkin’s & Shah’s claims regarding binning can be found here: http://www.cs.tau.ac.il/~tamirtul/Cherry_reply.pdf

      Tamir Tuller & Alon Diament, Tel-Aviv University, March 5, 2016


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    4. On 2016 Feb 04, Joshua Plotkin commented:

      Weak effects made to appear strong by inflated correlation coefficients (reply)

      The titular claim of this paper is that codon usage and gene function are “strongly correlated” with physical proximity within the cell. To support this claim, the authors report correlations of various features of genes (codon usage, expression level, etc.) with 3D genomic organization data. Unfortunately, as Joshua Cherry has pointed out on PubMed Commons (Nov 24, 2015), these correlations are all based on binned datasets. By systematically binning the data, the authors remove noise and artificially inflate the strength of the correlation, so that the resulting correlation coefficient does not reflect the actual strength of correlation in the data. In the extreme case of n=2 bins, for example, all the correlations would be r=1.
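
      This degenerate case is easy to verify with a few lines of simulation. The following minimal sketch (using hypothetical simulated data, not the datasets under discussion) shows a near-zero raw correlation turning into r = 1 once the data are collapsed into two bins:

          import numpy as np
          from scipy.stats import spearmanr

          # Hypothetical weakly related data: the raw correlation is ~0.03.
          rng = np.random.default_rng(0)
          x = rng.normal(size=100_000)
          y = 0.03 * x + rng.normal(size=100_000)
          print(spearmanr(x, y)[0])        # ~0.03

          # Collapse to n=2 bins (split at the median of x) and correlate the
          # bin means: two points always lie on a straight line, so r = 1.
          lo = x < np.median(x)
          bx = [x[lo].mean(), x[~lo].mean()]
          by = [y[lo].mean(), y[~lo].mean()]
          print(spearmanr(bx, by)[0])      # 1.0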

      We raised these concerns — the exact same ones Joshua Cherry expressed post-publication — in our original review of the submitted manuscript. We submitted our review to Nature Communications on May 21, 2014, and we never heard back from the journal or saw a revised version of the manuscript. It is now clear that our concerns were ignored both by the authors and by editors at Nature Communications.

      Artificially inflating correlations by binning data is a serious issue in ongoing biological studies. By posting our original review of this manuscript, alongside Cherry's post-publication critique, we hope to raise awareness and open discussion of this scientific issue.

      Joshua Plotkin & Premal Shah, University of Pennsylvania, February 4, 2016

      ----ORIGINAL REFEREE REPORT SUBMITTED May 21 2014----

      Remarks to the Author:

      The manuscript by Diament et al. aims to understand how the three-dimensional arrangement of a eukaryotic genome is organized. The natural hypothesis is that functionally related genes are positioned closer to each other in space -- a hypothesis that has been proposed earlier but with limited empirical support. Here, the authors claim that earlier studies failed to identify a strong relationship between position and function due to the lack of appropriate metrics to assess functional similarity between genes. The authors propose a "novel" metric based on patterns of codon usage and they demonstrate a putatively strong relationship between functionally related genes and their proximity in 3-D.

      The manuscript is severely flawed from both biological and statistical standpoints, and is not fit for publication. Following are my detailed comments:

      1) The "novel" CUBS method proposed by the authors is not novel at all. There is a rich literature of using Kullback-Liebler based distance metrics for studying patterns of codon usage, which the authors have completely ignored. Although, their metric is a symmetric version of KL-distance, the entire basis of this metric is not novel.

      2) The entire analysis is predicated on the assumption that functionally similar genes have similar patterns of codon usage. The only work cited in support of this notion is De Bivort et al 2009, where those authors found that amino acid metabolism might play a role in affecting protein composition for certain functionally related proteins in yeast. However, even there the extent of this effect was found to be limited to 20,000-60,000 residues, which constitutes less than 2% of the entire genome. To say that this pattern holds not only across the entire genome but also across four other species demands quite a stretch of the imagination.

      3) More importantly, every single correlation reported in the paper is based on binned data. Although it is sometimes appropriate to bin the data for visualization purposes, it is entirely without merit to report correlation coefficients (and associated p-values) on binned data. This fact is demonstrated by comparing figures 3D and S2A, where changing the bin size affects the apparent "correlation". All the correlations should be calculated on the raw data, and "improving statistical accuracy" is not a valid justification for arbitrarily binning data. This problem of correlation inflation due to binning the data is quite serious (e.g. Kenny & Montanari J Comput Aided Mol Des. 2013). Based on their own figures 3D and S2A, it seems clear that their results either reflect a very small effect or do not hold at all when analyzing the actual raw data.

      4) This statistical problem (#3) is further compounded by the fact that each data point in the correlations is based on a pair of genes, and hence the points are not independent of each other -- whereas the t-tests used to assess significance assume independence. The standard way to deal with such data is to use Mantel's test or one of its several derivatives. The authors also need to take into account issues related to multiple regression. It is likely that several of the gene features used as proxies for function are, again, highly correlated with each other. Once again, the nominal statistical significance of the results is inflated by failing to account for these dependencies.
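
      As a rough illustration of the kind of procedure meant here (a generic sketch, not the exact test used in any of the cited studies), a basic Mantel permutation test correlates the upper triangles of two gene-by-gene distance matrices and assesses significance by jointly permuting the rows and columns of one matrix, which respects the dependence among gene pairs:

          import numpy as np
          from scipy.stats import spearmanr

          def mantel_test(d1, d2, n_perm=999, rng=None):
              # Observed statistic: Spearman correlation of the upper triangles.
              # Null distribution: jointly permute rows/columns of one matrix.
              rng = rng or np.random.default_rng()
              n = d1.shape[0]
              iu = np.triu_indices(n, k=1)
              obs = spearmanr(d1[iu], d2[iu])[0]
              hits = 0
              for _ in range(n_perm):
                  perm = rng.permutation(n)
                  if spearmanr(d1[iu], d2[np.ix_(perm, perm)][iu])[0] >= obs:
                      hits += 1
              return obs, (hits + 1) / (n_perm + 1)   # one-sided p-value

          # Hypothetical toy example: two random symmetric "distance" matrices.
          rng = np.random.default_rng(2)
          a = rng.random((30, 30)); a = (a + a.T) / 2; np.fill_diagonal(a, 0)
          b = rng.random((30, 30)); b = (b + b.T) / 2; np.fill_diagonal(b, 0)
          print(mantel_test(a, b, n_perm=199, rng=rng))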

      5) Finally, the authors make no attempt to explain why the data in their plots are so non-linear and even non-monotonic. It is clear that in several of the plots the relationship between CUBS and the predictor is highly non-linear, and that the linear fit is extremely poor. However, the authors make no effort to explain or even understand these patterns. Could they be an artifact of the data? If so, how does this affect the results?

      6) Moreover, the correlation coefficients reported in most of their plots make no sense whatsoever. For instance, in Fig. 1B, the best-fit regression line of CUBS vs PPI barely passes through the bulk of the data, and yet the authors report a perfect correlation of R=1.

      The issues above are so severe (starting from the unjustified assumption that similar codon usage implies similar gene function, which underlies the entire study) that revision cannot possibly rectify these problems.

      Remarks to the Editor: none


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    5. On 2016 Mar 08, Joshua L Cherry commented:

      In their response to my comment, Diament and Tuller not only attempt to defend their use of a procedure that dramatically inflates correlation coefficients, but advocate for its wider use, which would be most unfortunate. Nothing in their response justifies this procedure or refutes the warnings against it that I cited. I address their points briefly below, and in more detail here.

      Statistical Significance vs. Effect Size

      The response makes much of the high statistical significance of the correlations. This does nothing to address my comment, which was about effect size. A weak correlation, however statistically significant, is not a strong correlation.

      Large Sample Size

      The response emphasizes the large number of data points in the authors’ data set, but this in no way justifies their correlation-inflating procedure. The large size of the dataset only allowed the authors to bin and average large numbers of data points, which only exacerbated the problem by leading to a more dramatic inflation of the correlation coefficients. The response also seems to imply that a large sample size will cause a strong correlation to yield a low correlation coefficient, which is incorrect.
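
      That last point can be checked with a toy simulation (hypothetical bivariate-normal data, purely illustrative): increasing the sample size does not pull a strong correlation toward zero; it only tightens the estimate around its true value.

          import numpy as np

          rng = np.random.default_rng(3)
          cov = [[1.0, 0.4], [0.4, 1.0]]          # true correlation = 0.4
          for n in (300, 300_000):
              x, y = rng.multivariate_normal([0, 0], cov, size=n).T
              print(n, round(np.corrcoef(x, y)[0, 1], 3))
          # Both estimates come out close to 0.4; only the sampling error shrinks.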

      Other Evidence

      The response claims that "The conclusions of our paper have also been tested in a recent study (Diament A, 2015), where we showed that 3D distances predicted by CUFS can be employed to reconstruct an improved 3D model of the yeast genome." If CUFS data did improve estimates of 3D distance, this would not justify or vindicate the inflation of correlation coefficients by binning and averaging. Furthermore, as explained in my more detailed response, there is little evidence that the 3D distance estimates were actually improved.

      Measurement Errors

      The response repeats the article’s argument about measurement noise, but this argument is flawed. It is true that a strong underlying correlation coupled with sufficient measurement error could produce a weak apparent correlation with the observed properties. It does not follow that a weak correlation with these properties implies a strong underlying correlation. Counterexamples are common--probably the rule rather than the exception--and include Francis Galton’s classic studies of height among relatives. Galton and Pearson could easily have arrived at larger “correlation coefficients” through binning, but doing so would have defeated their purpose and been a great setback for statistics and quantitative genetics.

      Diament and Tuller claim, based on the low correlation coefficient between two sets of yeast 3D distances (r=0.05), that the maximum possible CUFS/3DGD correlation is 0.05. Were this correct, it would only argue for an underlying correlation of 0.022/0.05=0.44 for yeast, far short of the reported r=0.86. The claimed maximum is in fact incorrect, for several reasons:

      • If the correlation coefficient between replicate datasets were 0.05, the correct maximum would be sqrt(0.05)=0.22 (the inferred correlation of either replicate with reality), from which one can argue only for an underlying r=0.1 (see the brief derivation after this list).

      • Diament and Tuller did not analyze data from replicates, but compared the data that they used in the article to values derived from an older experiment based on a different technique. The low correlation between the datasets may be due mainly to noise in the other dataset.

      • The original response submitted by Diament and Tuller included two other comparisons that did not suffer from the above problem, and these yielded much higher estimates of reliability: r=0.39 and r=0.54. This information was omitted from the version of the response posted by Tuller.
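
      The square-root step in the first bullet above is the classical attenuation argument; under the assumption (made for illustration) that the two Hi-C datasets behave as exchangeable replicates with independent errors, it can be written compactly as

          \[
            r_{12} \;=\; \rho_{1T}\,\rho_{2T} \;=\; \rho^{2}
            \;\Longrightarrow\;
            \rho \;=\; \sqrt{r_{12}} \;=\; \sqrt{0.05} \;\approx\; 0.22,
            \qquad
            r_{\mathrm{underlying}} \;\approx\; \frac{0.022}{0.22} \;\approx\; 0.1,
          \]

      where r_{12} is the correlation between the two replicate measurements and \rho_{iT} (assumed equal to a common \rho) is the correlation of each replicate with the true 3D distance.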

      Diament and Tuller argue that r=0.05 for the two datasets, calculated as Spearman intended, absurdly implies that “any attempt to study 3D genomic organization using Hi-C...is futile”. In reality it implies only that these particular datasets are not very similar, which is not absurd, and undoubtedly true. Their apparently preferred alternative--binning the data to obtain a high correlation coefficient and concluding that the measurements are very similar--is clearly incorrect.

      Concluding Remarks

      The correlations between 3DGD and CUFS, though statistically significant, are quite weak, and should have been reported as such. The available information does not support the contention that they reflect strong correlations made weak by measurement noise. That others have committed or overlooked a "common error" is not an argument in its favor. That correlation coefficients in a field are often weak is no reason to inflate them or describe weak correlations as strong. It would be one thing to argue that a correlation coefficient of 0.022 tells us something important, but it is quite another to report an inflated correlation coefficient of 0.86 and describe the correlation as strong, as done by the authors.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    6. On 2016 Feb 27, Tamir Tuller commented:

      We are happy to report that the paper, including the binning procedure, was presented to top professional statisticians who found the paper very interesting. The comment by Cherry was submitted to Nature Communications as a comment, and was reviewed by at least two professional reviewers. Based on the review, the decision was that the comment is not needed, as it does not teach us anything new and is irrelevant/incorrect in this context.

      Specifically, it is known that binning tends to increase the correlation; however, the opposite is also true: when there are more points, the correlations tend to be lower. In the analyzed data, the binning procedure was described very clearly and mentioned at the beginning of the paper; there were 369,000,000 initial points that were binned to up to 64,000 (!) bins (not 2 points!), so that the correlations would be comparable to correlations reported in all previous systems biology studies in recent years (actually, the number of points is still orders of magnitude (!) higher than in previous systems biology studies that we know of). As described in the link to the detailed reply (see below), the correlations obtained with the raw data are similar to the ones obtained between two Hi-C measurements, demonstrating that, given the nature of the data, the correlations are indeed very high.

      A detailed reply to Cherry’s comment can be found here: http://www.cs.tau.ac.il/~tamirtul/Cherry_reply.pdf. In this link, we further explain why Cherry’s claims were thoroughly addressed in the original manuscript, and that the methods and results were presented in a transparent manner. Most importantly, we reiterate that the relation between variables has been subject to stringent statistical tests, and that the observed signals are indeed strong with respect to expected and previously reported signals in large-scale genomic studies. In addition, we illustrate again that the reported correlations are comparable to the maximal correlation expected when comparing large-scale noisy data after quantization. Finally, we show that the correlations reported in our study are similar to the correlations obtained between two Hi-C experiments; thus, if we follow Cherry’s line of thought, we would actually have to conclude, absurdly, that the Hi-C protocol in general is problematic. We also discuss the generality of our conclusions for systems biology analyses of Next Generation Sequencing (NGS) data.


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.

    7. On 2015 Nov 24, Joshua L Cherry commented:

      Weak effects made to appear strong by inflated correlation coefficients

      Diament et al. claim that the three-dimensional distances between eukaryotic genes are strongly correlated with differences in codon usage and gene function. These claims are based on correlation coefficients calculated by an illegitimate procedure that grossly inflates their magnitude. The correlations are in reality very weak, and the article’s conclusions are therefore unjustified.

      Diament et al. report high values (0.74-0.96) for Spearman’s rank-order correlation between measures of codon usage dissimilarity (CUFS) and 3D distance between genes (3DGD). These impressive values, however, are not the true correlation coefficients between the variables. Rather, they are the correlation coefficients of average values for bins of thousands of data points with similar values of CUFS. This binning and averaging procedure can be expected to yield high values in the presence of very weak correlations. The authors describe the process as "reducing biological noise through averaging". In reality it suppresses much of the variation in one variable that is not explained (in the statistical sense) by the other, making it appear as though the explained variation is a larger fraction of the total, and inflating the correlation coefficient accordingly. A minuscule correlation can be made to look like a strong correlation with this procedure, so long as the expected value of one variable mainly increases with the value of the other. Computing correlations based on averages is described by one textbook [1] as a “common error” that “can easily lead to an inflated correlation coefficient”, and correlations inflated in this way have been criticized elsewhere [2,3].

      Supplementary Fig. 5 of the article suggests that the true correlations between CUFS and 3DGD, corresponding to one data point per bin, are quite weak. Using data provided by the authors, I have found that they are very weak indeed: the Spearman’s correlation coefficients are 0.019, 0.022, 0.025, 0.071, and 0.034 for S. pombe, S. cerevisiae, A. thaliana, M. musculus, and H. sapiens respectively. The central claims of the article are therefore unfounded, as the reported strong correlations are an artifact of the binning procedure.

      The weakness of the correlations is evident in the meager sensitivity of averaged 3DGD to CUFS, which is apparent in Fig. 2 of the article. For example, for S. cerevisiae, for which a correlation coefficient of 0.85 was reported, averaged values of 3DGD vary only from ~3.0 for the lowest CUFS values to ~3.1 for the highest (a few outliers approach 3.3, but, as the authors argue, these are not meaningful). This is only a small fraction of the variation in 3DGD values, which range from 1 to 13 with a standard deviation of 0.72.

      Another perspective is provided by the distributions of CUFS values for different 3D distances. Distributions in S. cerevisiae are shown here for distances between 1 and 5, which encompass 99.8% of the gene pairs (for larger distances the distributions vary more widely but are dominated by pairs involving one or a few genes). These distributions are quite similar to one another. Indeed, for 3DGD < 5 (encompassing 98% of the gene pairs), the distributions are difficult to distinguish, and the only distinguishing feature for 3DGD = 5 is due to effects of just a few genes (see figure legend). Differences between the means of the distributions (shown graphically in the plot) are quite small compared to the variation of CUFS within each 3D distance category. Clearly CUFS is not strongly associated with 3D distance.

      The weakness of the correlations is in no way negated by the fact that the authors’ binning procedure yields high values. Most correlation coefficients will be inflated by this procedure, including the prototypical coefficients calculated by Galton and Pearson. For samples drawn from a bivariate normal distribution with a correlation coefficient of just 0.03, this procedure, with the relevant bin and sample sizes, yields Spearman’s correlation coefficients greater than 0.9. It would be absurd to describe such variables as strongly correlated, and reporting a Spearman’s correlation of >0.9 would be grossly misleading.
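
      The simulation described above is straightforward to reproduce in outline. The sketch below uses an illustrative sample size and bin size (not the exact ones relevant to the article): it draws weakly correlated bivariate-normal data, sorts by one variable, averages within consecutive bins of a few thousand points, and correlates the bin means.

          import numpy as np
          from scipy.stats import spearmanr

          rng = np.random.default_rng(4)
          n, bin_size, r = 5_000_000, 5_000, 0.03
          x, y = rng.multivariate_normal([0, 0], [[1, r], [r, 1]], size=n).T
          print(round(spearmanr(x, y)[0], 3))      # raw correlation: ~0.03

          # Sort by x, average x and y within consecutive bins, and correlate
          # the bin means.  Averaging removes most of the variation in y that
          # is unexplained by x, so the coefficient is dramatically inflated.
          order = np.argsort(x)
          xb = x[order].reshape(-1, bin_size).mean(axis=1)
          yb = y[order].reshape(-1, bin_size).mean(axis=1)
          print(round(spearmanr(xb, yb)[0], 3))    # binned: ~0.9, higher for larger bins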

      Large data sets can give us the statistical power to detect very weak correlations, revealing what has been called the “crud factor”: that “everything correlates to some extent with everything else.”[4] Weak correlations are easily produced in the absence of a direct or otherwise interesting connection between the variables. Weak correlations are sometimes enlightening, but exaggerating their strength only obscures.

      References

      1. Triola, M. F. Elementary Statistics. Addison-Wesley, Reading, MA (1992).

      2. Kenny, P. W. & Montanari, C. A. Inflation of correlation in the pursuit of drug-likeness. J. Comput. Aided Mol. Des. 27:1-13 (2013).

      3. Brand, A. & Bradley, M. T. More voodoo correlations: when average-based measures inflate correlations. The Journal of General Psychology 139(4):260-272 (2012).

      4. Meehl, P. E. Why summaries of research on psychological theories are often uninterpretable. Psychological Reports 66:195-244 (1990).


      This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
