- Jul 2018
-
europepmc.org europepmc.org
-
On 2017 Jan 22, Eric Fauman commented:
I know nothing about cow genetics, but I have done some work on the genetics of metabolites in humans, so I was interested to see how the authors derived biological insights from this genetic study. In particular, I was intrigued by the suggestion in the abstract that they found evidence that genes involved in the synthesis of “milk components” are important for lactation persistence.
Unfortunately, the more I studied the paper the more problems I found that call this claim into question.
First off, the Q-Q plot is currently unavailable, but the text mentions there’s only a “slight deviation in the upper right tail”, which could mean there are no true significant signals.
To account for multiple testing, the authors decided to use a genome-wide association p-value cutoff of 0.95/44100 = 2.15e-5 instead of a more defensible 0.05/44100 = 1.1e-6.
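As a quick check of the arithmetic, here is a minimal sketch of the Bonferroni calculation, using only the two numerators and the 44,100 tests quoted above. The function name is made up for illustration; nothing here is taken from the paper's own code.

```python
def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


n_snps = 44100  # number of tests, as quoted in the comment above

paper_cutoff = bonferroni_cutoff(0.95, n_snps)         # numerator the paper used
conventional_cutoff = bonferroni_cutoff(0.05, n_snps)  # usual 5% family-wise level
ratio = paper_cutoff / conventional_cutoff

print(f"paper cutoff:        {paper_cutoff:.3g}")
print(f"conventional cutoff: {conventional_cutoff:.3g}")
print(f"the paper's cutoff is about {ratio:.0f} times more lenient")
```

The ratio depends only on the two numerators (0.95 versus 0.05), not on the number of SNPs.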
Since their initial p-value cutoff yielded a relatively small number of significant SNPs, the authors then used a much more lenient p-value cutoff of 5e-4, which presumably is well within the linear portion of the Q-Q plot.
The biggest problem with the enrichment analysis, however, is that they’ve neglected to account for genes drawn from a common locus. Often, paralogs of similar function are proximal in the genome. But typically we assume that a single SNP is affecting the function of only a single gene at a locus. So, for example, a SNP near the APOA4/APOA1/APOC3/APOA5 locus can tag all 4 genes, but it’s unfair to consider that 4 independent indications that “phospholipid efflux”, “reverse cholesterol transport”, “triglyceride homeostasis” and other pathways are “enriched” in this GWAS.
This overcounting of pathways due to gene duplication affects all of their top findings, presumably rendering them non-significant. Besides the lipid pathways, it also applies to the “lactation” GO term, which was selected based on the genes GC, HK2, CSN2 and CSN3. GC, CSN2 and CSN3 are all co-located on chromosome 6.
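To make the locus argument concrete, here is a minimal sketch that counts independent loci rather than genes for the “lactation” term. The locus labels are hypothetical names invented for this sketch: GC, CSN2 and CSN3 share one label because they sit together on chromosome 6, and HK2 gets its own label on the assumption that it lies elsewhere. A real analysis would assign loci from genomic coordinates or LD blocks.

```python
from collections import defaultdict

# Genes listed for the "lactation" GO term, mapped to hypothetical locus labels.
# Assumption: the three chromosome 6 genes are treated as one locus.
gene_to_locus = {
    "GC": "chr6_locus",
    "CSN2": "chr6_locus",
    "CSN3": "chr6_locus",
    "HK2": "HK2_locus",
}


def genes_by_locus(mapping):
    """Group gene names by locus label so each locus is counted once."""
    grouped = defaultdict(list)
    for gene, locus in mapping.items():
        grouped[locus].append(gene)
    return dict(grouped)


loci = genes_by_locus(gene_to_locus)
print(f"genes supporting the term: {len(gene_to_locus)}")
print(f"independent loci:          {len(loci)}")
```

Under these assumptions the four genes collapse to two independent observations, and it is that smaller count that should go into any enrichment test.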
A perplexing claim in the paper is the enrichment of the term “lipid metabolic process” (GO:0006629). According to Ensembl BioMart, 912 Bos taurus genes fall into this category, or about 4% of the bovine protein-coding genes (24,616 according to Ensembl). So out of their set of 536 genes (flanking SNPs with P < 5e-4) we’d expect about 20 “lipid metabolic process” genes. And yet, this paper reports only 7. This might be significant, but for depletion, not enrichment.
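The expected count and the depletion question can be checked with a hypergeometric lower tail. This is a rough sketch, not the authors' enrichment method. It assumes the 536 genes behave like a random draw from the 24,616 protein-coding genes, that all 912 annotated genes are in that background, and that 7 is the paper's full count for the category.

```python
from math import comb

population = 24616  # bovine protein-coding genes (Ensembl figure quoted above)
annotated = 912     # genes annotated to GO:0006629
draws = 536         # genes flanking SNPs with P < 5e-4
observed = 7        # "lipid metabolic process" genes reported in the paper


def hypergeom_pmf(k, population, annotated, draws):
    """Probability of exactly k annotated genes in a random draw without replacement."""
    return (comb(annotated, k) * comb(population - annotated, draws - k)
            / comb(population, draws))


expected = draws * annotated / population
# Lower tail: chance of seeing `observed` or fewer annotated genes by chance alone.
p_depletion = sum(hypergeom_pmf(k, population, annotated, draws)
                  for k in range(observed + 1))

print(f"expected count: {expected:.1f}")
print(f"P(X <= {observed}):     {p_depletion:.2e}")
```

A small lower-tail probability here would point to depletion rather than enrichment, subject to the background assumptions stated above.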
Sample size is of course a huge issue in GWAS. While 3,800 cows is a large number, it appears this trait may require a substantially larger number of animals before it can yield biologically meaningful results.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
-