- Jul 2018
-
europepmc.org europepmc.org
-
On 2014 Jul 25, Claudiu Bandea commented:
Everlasting confusion on ‘functional DNA’ and ‘junk DNA’
As discussed by Doolittle (1), addressing and defining the functionality of genomic DNA can be challenging. However, our common sense, combined with our philosophical instinct (we know a function when we see one), should allow us to address the biology of our genome sensibly. After all, the ENCODE ‘function fiasco’ was not the result of a misunderstanding of the concept of biological function, nor was it due to scientific incompetence, as suggested by others (2). On the contrary, because this concept conflicted with some of the project’s objectives and with its significance, there was a concerted effort not to bring it forward (3); indeed, as clearly shown in a recent ENCODE publication (4), at least some ENCODE members seem well aware of the scientific rationale and criteria for addressing putative biological functions of genomic DNA.
Nevertheless, as Doolittle concluded (1), poor use of words in communicating scientific observations and lack of attention to detail have led to significant misrepresentation and confusion. Here are a few examples spanning more than four decades.
In a recent study entitled “Multiple knockout mouse models reveal lincRNAs are required for life and brain development,” which addressed putative biological functions of long noncoding RNAs (lncRNAs), the authors concluded that “This study demonstrates that lncRNAs play critical roles in vivo…” (5). Unfortunately, both the title and the conclusion misrepresent the results. An accurate interpretation is that “Multiple knockout mouse models reveal that some (or a few) lincRNAs are required for life and brain development,” and that “This study demonstrates that some (or a few) lncRNAs play critical roles in vivo….” In fact, the study showed that only 5 of the 18 lncRNAs tested appear to play critical roles in vivo, even though these 18 had been selected from hundreds of sequences as the best candidates for being functional. A more appropriate scientific interpretation would therefore be: “This study demonstrates that most lncRNAs (i.e. 13 out of 18) do not appear to play critical roles in vivo….” Moreover, the study could be evaluated by the highest scientific standards, as presumably required by the journal eLife, where it was published. By those standards, it lacks appropriate control experiments, such as duplicated knockout experiments. Without them, we do not know whether the observed dysfunctions were associated with specific lncRNAs or were caused by untargeted genome modifications introduced accidentally while generating the knockout mice, which in fact questions the validity of the entire study. As previously noted (3), all these problems reflect the deficiencies of the current limited and closed peer review system. Indeed, a document reviewed by a few peers (usually 2 or 3) can hardly be considered peer-reviewed.
The next three examples illustrate some of the confusion surrounding the concepts of biological function and junk DNA (jDNA), albeit in a more subtle way.
In one of the first papers addressing the notion of jDNA (6), David Comings writes: “These considerations suggest that up to 20% of the genome is actively used and the remaining 80+% is junk. But being junk doesn’t mean it is entirely useless. Common sense suggests that anything that is completely useless would be discarded.”
In their iconic paper “Selfish DNA: the ultimate parasite” (7), Leslie Orgel and Francis Crick summarize their thoughts as follows: “The DNA of higher organisms usually falls into two classes, one specific and the other comparatively nonspecific. It seems plausible that most of the latter originated by the spreading of sequences which had little or no effect on the phenotype.”
More recently, in a publication addressing the ENCODE project (8), Sean Eddy remarked: “These data support a view that eukaryotic genomes contain a substantial fraction of DNA that serves little useful purpose for the organism, much of which has originated from the replication of transposable (selfish) elements.”
Although seemingly innocent, these quotations are relevant examples of poor use of words. Some of this confusion might dissipate by recognizing that if genomic sequences (i) are not entirely useless, (ii) have little effect on the phenotype, or (iii) serve little useful purpose for the organism, then these DNA sequences are functional, period.
In the next example, the protagonists are Michael Eisen, the host of a popular blog suggestively named “it is NOT junk”, and Ryan Gregory, the host of another popular blog, “Genomicron”. Interestingly, Eisen was the PNAS editor in charge of Doolittle’s article on jDNA (1), and, according to Doolittle, Gregory is “now the principal C-value theorist” (1). Undoubtedly, Eisen and Gregory are among the most knowledgeable and well-versed communicators on genome biology and jDNA. Their ‘words’ are therefore representative of the thinking in the field.
Immediately after the publication of ENCODE’s flurry of articles and the associated publicity stunt orchestrated by a few ENCODE scientists, both Eisen and Gregory reacted forcefully, but apparently from opposite perspectives. In a post in which he blasts ENCODE as “a carefully orchestrated spectacle” (9), Eisen writes: “nobody actually thinks that non-coding DNA is ‘junk’ any more. It’s an idea that pretty much only appears in the popular press… It is dishonest – nobody can credibly claim this to be a finding of ENCODE….”
In a prompt response (10), entitled “Michael Eisen’s take on ENCODE — there’s no junk?”, Gregory subjects Eisen’s perspective to detailed and cynical questioning. Fortunately, thanks to the open communication platform offered by these blogs, this scientific ‘drama’ ended within hours, when Eisen apparently clarified: “I was not saying that everybody knows that 100% of the genome is functional! I was saying that nobody thinks that 100% of non-coding DNA is non-functional” (10).
Is there an end to this distressing confusion in the field of genome biology? Unlikely, if the funding system continues to focus primarily on generating data and observations, while neglecting their interpretation and integration into productive conceptual frameworks; in other words, you only get what you pay for.
References
(1) Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci USA. 110:5294-300.
(2) Graur D et al. 2013. On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biol Evol. 5:578-90.
(3) Bandea CI. 2014. Closing the gap between ‘words’ and ‘facts’ in evaluating genome biology and the ENCODE project. PubMed Commons (National Library of Medicine; Bethesda, MD). Comment on: Doolittle WF, 2013.
(4) Kellis M et al. 2014. Defining functional DNA elements in the human genome. Proc Natl Acad Sci USA. 111:6131-8.
(5) Sauvageau M et al. 2014. Multiple knockout mouse models reveal lincRNAs are required for life and brain development. eLife. DOI: 10.7554/eLife.01749.
(6) Comings DE. 1972. The structure and function of chromatin. Adv Hum Genet. 3:237-431.
(7) Orgel LE, Crick FH. 1980. Selfish DNA: the ultimate parasite. Nature. 284:604-7.
(8) Eddy SR. 2012. The C-value paradox, junk DNA and ENCODE. Curr Biol. 22:R898-9.
(9) Eisen M. 2012. This 100,000 word post on the ENCODE media bonanza will cure cancer. Blog: it is NOT junk. http://www.michaeleisen.org/blog/?p=1167
(10) Gregory TR. 2012. Michael Eisen’s take on ENCODE — there’s no junk? Blog: Genomicron. http://www.genomicron.evolverzone.com/2012/09/michael-eisens-take-on-encode-theres-no-junk/
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
On 2014 Jul 16, Claudiu Bandea commented:
Closing the gap between ‘words’ and ‘facts’ in evaluating genome biology and the ENCODE project
When I referred to Ford Doolittle’s article (1) as “7 pages of small print text” (2), I meant it both literally and figuratively. Indeed, Doolittle’s paper is a remarkable example of fine print, which is likely to provoke heated arguments for a long time, just as the author concluded: “many of the most heated arguments in biology are not about facts at all but rather about the words that we use to describe what we think the facts might be.”
Remarkably, more than 40 years ago, Susumu Ohno started his famous paper on junk DNA (3) with a related statement: “Over the years, I have learned that there is no such thing as a fact. What passes for a fact is in truth a set of observations and its interpretation. Therefore, the interpretation is just as important to a fact as the observation itself.”
In my previous mini-essay (2) on Doolittle’s article (1), I made three main points:
(i) Ever since the notion of ‘junk DNA’ (jDNA) was introduced as a metaphor for presumably non-functional genomic DNA in species with relatively high C-values, scientists in the field of genome biology and evolution have clearly used it to denote genomic DNA that has no biological function at all, whether informational (iDNA) or non-informational (niDNA). Implying otherwise is nonsensical.
(ii) The two main theories on putative non-informational functions for the so-called jDNA, the nucleo-skeletal and the nucleotypic functions, “describe genome size variation as the outcome of selection via the intermediate of cell size” (4). Doolittle uses them as pillars for his theoretical framework on genome evolution and biology, yet they do not explain the C-value paradox.
(iii) Apparently, Doolittle is not aware of the theory that most genomic niDNA, redefined as symbiotic DNA (sDNA), functions as a protective mechanism (adaptive genomic immunity) against deleterious insertional mutagenesis by endogenous and exogenous inserting elements, such as retroviruses. This theory is fully supported by current data and observations, and it explains the C-value paradox (5, 6).
Here, I attempt to further close the gap between ‘words’ and ‘facts’ in addressing genome biology and evaluating the ENCODE project.
In the introductory section, Doolittle outlines the premise of his paper: “a flurry of articles and letters”, published in Nature and other journals under the umbrella of the ENCODE project, “collectively claim function for the majority of the 3.2 Gb human genome”, which, if true, would debunk the notion of jDNA (1). The problem with Doolittle’s premise, however, is that it is not based on facts. The “flurry of ENCODE publications” did not claim that the majority of the human genome is functional (7). On the contrary, in what seems to have been a concerted but tacit ‘silence policy’, the ENCODE authors went out of their way not to address the ‘functionality’ of the human genome in their publications. In light of this fact, Doolittle had no choice but to build the premise of his paper on secondary sources offered by various science writers (8-10). These writers were apparently caught up in a publicity stunt orchestrated on the side by a few ENCODE scientists. Whether Doolittle’s reliance on secondary sources, a strong departure from conventional academic standards, sets a hasty precedent for the scientific literature remains to be seen.
So, why wasn’t the ENCODE project designed in the context of the fundamental issues and knowledge about genome biology and evolution? These include the C-value paradox, limited sequence conservation among closely related species, mutational load, and the evolutionary origin of most genomic sequences from transposable elements. And why did the ENCODE researchers choose not to address these fundamental issues in their official publications? Obviously, this makes no sense, considering that their massive and expensive project was funded specifically to annotate the ‘functional sequences’ of the human genome.
Fully addressing these questions might take us deep into the science of human behavior. It might also expose serious deficiencies in our current system of funding science, which relies on a weak and closed peer review system. (Parenthetically, a sensible solution would be to replace this system, which is vulnerable to abuse, with a stronger and true peer review system that is open to all peers.)
Nevertheless, it is inconceivable that the ENCODE leaders, who represent some of the finest scientific institutions in the world, were scientifically incompetent and unaware of these fundamental issues in genome biology and evolution, as suggested by some critics of ENCODE (11). On the contrary, it was their appreciation of this fundamental knowledge that prompted them to remain silent. This knowledge conflicts with some of their study objectives and raises inconvenient questions about the relevance of their study and results.
However, as illustrated by Doolittle’s article, the full significance of this fundamental knowledge of genome biology and evolution is not clearly recognized in the field. This has led to tremendous confusion and has allowed projects such as ENCODE to flourish. Unfortunately, our knowledge of genome biology and evolution has yet to crystallize into clear facts. Yet here is one (in large print): based on the C-value paradox, limited sequence conservation, mutational load, and the evolutionary origin of most genomic sequences from transposable elements, it is clear that MOST OF THE HUMAN GENOME CANNOT HAVE INFORMATIONAL FUNCTIONS, period.
Now that we have cracked ENCODE’s ‘code of silence’, reset some of Doolittle’s small print, and crystallized the fact that only a small fraction of the human genome has informational functions, it is time to focus on the major question remaining in the field of genome evolution and biology:
Does most of the genome in organisms with relatively high C-values have non-informational functions, or is most of it non-functional, metaphorically speaking, junk?
References
(1) Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci USA. 110:5294-300.
(2) Bandea CI. 2013. Junk DNA is bunk, but not as suggested by ENCODE or Doolittle. PubMed Commons (National Library of Medicine; Bethesda, MD). Comment on: Doolittle WF, 2013.
(3) Ohno S. 1973. Evolutional reason for having so much junk DNA. In Modern Aspects of Cytogenetics: Constitutive Heterochromatin in Man (ed. R.A. Pfeiffer), pp. 169-173. F.K. Schattauer Verlag, Stuttgart, Germany.
(4) Gregory TR. 2004. Insertion-deletion biases and the evolution of genome size. Gene. 324:15-34.
(5) Bandea CI. 1990. A protective function for noncoding, or secondary DNA. Med Hypotheses. 31:33-4.
(6) Bandea CI. 2013. On the concept of biological function, junk DNA and the gospels of ENCODE and Graur et al. bioRxiv doi: 10.1101/000588; http://biorxiv.org/content/early/2013/11/18/000588
(7) ENCODE Project Consortium. 2012. An integrated encyclopedia of DNA elements in the human genome. Nature. 489:57-74.
(8) Kolata G. 2012 (September 5). Bits of mystery DNA, far from ‘junk’, play crucial role. The New York Times, Section A, p. 1.
(9) Anonymous. 2012. Cracking ENCODE. Lancet. 380:950.
(10) Pennisi E. 2012. Genomics. ENCODE project writes eulogy for junk DNA. Science. 337:1159-1161.
(11) Graur D et al. 2013. On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biol Evol. 5(3):578-90.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
On 2013 Dec 08, Claudiu Bandea commented:
Junk DNA is bunk, but not as suggested by ENCODE or Doolittle
“Do data from the Encyclopedia Of DNA Elements (ENCODE) project render the notion of junk DNA obsolete?” (1). Echoing much of the response by evolutionary biologists to ENCODE’s suggestion that more than 80% of the human genome is functional (2), Ford Doolittle’s answer is clearly ‘No’ (1).
However, Doolittle did not write 7 pages of small print text in PNAS (1) to discredit ENCODE’s questionable suggestion; many people needed only a paragraph or two (3; but see Ref 4). Instead, the author uses the ENCODE momentum to cover half a century of research and thinking on the evolution of genome size, junk DNA (jDNA) and the C-value enigma, in search of solutions to these top remaining puzzles in genome biology. Doolittle navigates through deep conceptual gaps left open by decades of neglect in defining even the most basic notions, such as the meaning of biological function. He concludes his epic journey with a sensible prescription: “A larger theoretical framework, embracing informational and structural roles for DNA, neutral as well as adaptive causes of complexity, and selection as a multilevel phenomenon, is needed”. He adds that, by building this theoretical framework, “Much that we now call junk could then become functional” (1).
Like other scholars in the field of genome evolution (see, for example, Refs 3 and 4), Doolittle starts building his theoretical framework by outlining convincing data and arguments that exclude informational roles for most jDNA. Addressing non-informational roles for jDNA, however, is a much more complex and confusing issue, to the extent that the author’s narrative gets entangled as it navigates very close to a school of red herrings: “I submit that, up until now, junk has been used to denote DNA whose presence cannot reasonably be explained by natural selection at the level of the organism for encoded informational roles” (bold added) (1).
However, ever since the term ‘junk DNA’ was introduced a few decades ago (apparently in the 1960s, not in the 1970s as previously assumed; see Ref. 5) as jargon for presumably non-functional genomic DNA in species with high C-values, it has denoted DNA whose presence could not reasonably be explained by natural selection at the level of the organism for either informational or non-informational roles. Implying otherwise is, well, fishy. Moreover, saying that “junk advocates have to date generally considered that even DNA fulfilling bulk structural roles remains, in terms of encoded information, just junk” (1) is also misleading. The nucleo-skeletal and nucleotypic theories, which “describe genome size variation as the outcome of selection via intermediate of cell size” (6), have dominated the thinking in the field of genome size evolution for decades. Obviously, it would be equally nonsensical to state that, from the perspective of non-informational functions of jDNA, the informational DNA is just junk.
Nevertheless, it is admirable that Doolittle embraces the nucleo-skeletal and nucleotypic theories as pillars of his theoretical framework on genome size evolution and jDNA, because, as stated in the following excerpt, the nucleotypic theory was presented as a substitute for his previous ideas (7) about jDNA as ‘selfish DNA’: “Although some researchers continue to characterize much variation in genome size as a mere by-product of an intragenomic selfish DNA "free-for-all" there is increasing evidence for the primacy of selection in molding genome sizes via impacts on cell size and division rates” (8).
So, is Doolittle’s suggestion valid? I don’t think so, not until the nucleo-skeletal and nucleotypic hypotheses or other hierarchical selection theories clearly explain the C-value enigma, and not before they pass the formidable ‘onion test’ (see Ref 4). As previously suggested (9,10), the data and observations interpreted as evidence for these theories can be explained simply as accommodating or adaptive responses by the hosts to the presence of large quantities of genomic jDNA sequences, which are there for other reasons.
Is jDNA bunk? As proposed almost a quarter century ago (9) and re-emphasized more recently (10), there is strong evidence that jDNA serves as an adaptive defensive mechanism against insertional mutagenesis by endogenous and exogenous inserting elements, such as retroviruses, in both germline and somatic cells. In humans and other multicellular species, such mutagenesis can lead to a high incidence of uncontrolled cellular proliferation, or cancer. As expected of an adaptive genomic defense mechanism, the amount of protective jDNA varies from one species to another depending on insertional mutagenesis activity and evolutionary constraints on genome size. This might explain the evolution of genome size and the C-value enigma.
References
(1) Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci USA. 110:5294-300.
(2) ENCODE Project Consortium. 2012. An integrated encyclopedia of DNA elements in the human genome. Nature. 489:57-74.
(3) Eddy SR. 2012. The C-value paradox, junk DNA and ENCODE. Curr Biol. 22(21):R898-9.
(4) Graur D, Zheng Y, Price N, Azevedo RB, Zufall RA, Elhaik E. 2013. On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biol Evol. 5(3):578-90.
(5) Graur D. 2013. The Origin of Junk DNA: A Historical Whodunnit. Judge Starling; http://judgestarling.tumblr.com/post/64504735261/the-origin-of-junk-dna-a-historical-whodunnit
(6) Gregory TR. 2004. Insertion-deletion biases and the evolution of genome size. Gene. 324:15-34.
(7) Doolittle WF, Sapienza C. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature. 284(5757):601-3.
(8) Gregory TR, Hebert PD. 1999. The modulation of DNA content: proximate causes and ultimate consequences. Genome Res. 9(4):317-24.
(9) Bandea CI. 1990. A protective function for noncoding, or secondary DNA. Med Hypotheses. 31:33-4.
(10) Bandea CI. 2013. On the concept of biological function, junk DNA and the gospels of ENCODE and Graur et al. bioRxiv doi: 10.1101/000588; http://biorxiv.org/content/early/2013/11/18/000588
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
-
- Feb 2018
-
europepmc.org europepmc.org
-
On 2013 Dec 08, Claudiu Bandea commented:
Junk DNA is bunk, but not as suggested by ENCODE or Doolittle
“Do data from the Encyclopedia Of DNA Elements (ENCODE) project render the notion of junk DNA obsolete?” (1). Echoing much of the response by evolutionary biologists to ENCODE’s suggestion that more than 80% of the human genome is functional (2), Ford Doolittle’s answer is clearly ‘No’ (1).
However, Doolittle did not write 7 pages of small print text in PNAS (1) to discredit ENCODE’s questionable suggestion; many people only needed a paragraph or two (3; but see Ref 4). Instead, the author uses the ENCODE momentum to cover half of century of research and thinking on the evolution of genome size, junk DNA (jDNA) and the C-value enigma in search of solutions to these top remaining puzzles in genome biology. Doolittle navigates through deep conceptual gaps left open by decades of neglect in defining even the most basic notions, such as the meaning of biological function, and concludes his epic journey with a sensible prescription: “A larger theoretical framework, embracing informational and structural roles for DNA, neutral as well as adaptive causes of complexity, and selection as a multilevel phenomenon, is needed” and that, by building this theoretical framework, “Much that we now call junk could then become functional” (1).
Like other scholars in the field of genome evolution (see, for example, Refs 3 and 4), Doolittle starts building his theoretical framework by outlining convincing data and arguments that exclude informational roles for most jDNA. Addressing non-informational roles for jDNA, however, is a much more complex and confusing issue, to the extent that the author’s narrative gets entangled as it navigates very close to a school of red herrings: “I submit that, up until now, junk has been used to denote DNA whose presence cannot reasonably be explained by natural selection at the level of the organism for encoded informational roles” (bold added) (1).
However, ever since the term ‘junk DNA’ was introduced a few decades ago (apparently, in the 1960’s, not in the 70’s as previously assumed; see Ref. 5) as jargon for presumably non-functional genomic DNA in species with high C-value, it denoted DNA whose presence could not be reasonably explained by natural selection at the level of the organism for both informational and non-informational roles, and implying otherwise is, well, fishy. Moreover, saying that “junk advocates have to date generally considered that even DNA fulfilling bulk structural roles remains, in terms of encoded information, just junk” (1) is also deceiving considering that the nucleo-skeletal and nucleotypic theories, which “describe genome size variation as the outcome of selection via intermediate of cell size” (6) have dominated the thinking in field of genome size evolution for decades; obviously, it would be equally nonsensical to state that from the perspective of non-informational functions of jDNA, the informational DNA is just junk.
Nevertheless, it is admirable that Doolittle embraces the nucleo-skeletal and nucleotypic theories as pillars of his theoretical framework on genome size evolution and jDNA, because, as stated in the following excerpt, the nucleotypic theory was presented as a substitute for his previous ideas (7) about jDNA as ‘selfish DNA’: “Although some researchers continue to characterize much variation in genome size as a mere by-product of an intragenomic selfish DNA "free-for-all" there is increasing evidence for the primacy of selection in molding genome sizes via impacts on cell size and division rates” (8).
So, is Doolittle’s suggestion valid? I don’t think so, not until the nucleo-skeletal and nucleotypic hypotheses or other hierarchical selection theories clearly explain the C-value enigma, and not before they pass the formidable ‘onion test’ (see Ref 4). As previously suggested (9,10), the data and observations interpreted as evidence for these theories can be explained simply as accommodating or adaptive responses by the hosts to the presence of large quantities of genomic jDNA sequences, which are there for other reasons.
Is jDNA bunk? As proposed almost a quarter century ago (9) and re-emphasized more recently (10), there is strong evidence that jDNA serves as an adaptive defensive mechanism against insertional mutagenesis (in both germline and somatic cells) by endogenous and exogenous inserting elements, such as retroviruses, which in humans and other multicellular species can lead to a high incidence of uncontrolled cellular proliferation, or cancer. Expectedly, as an adaptive, genomic defense mechanism, the amount of protective jDNA varies from one species to another based on insertional mutagenesis activity and evolutionary constraints on genome size, which might explain the evolution of genome size and C-value enigma.
References
(1) Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci USA., 110:5294-300. Doolittle WF, 2013
(2) ENCODE Project Consortium. 2012. An integrated encyclopedia of DNA elements in the human genome. Nature, 489:57–74. ENCODE Project Consortium., 2012
(3) Eddy SR. 2012. The C-value paradox, junk DNA and ENCODE. Curr Biol. 6;22(21):R898-9; Eddy SR, 2012
(4) Graur D, Zheng Y, Price N, Azevedo RB, Zufall RA, Elhaik E. 2013. On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biol Evol., 5(3):578-90.Graur D, 2013
(5) Graur D. 2013. The Origin of Junk DNA: A Historical Whodunnit. Judge Starling; http://judgestarling.tumblr.com/post/64504735261/the-origin-of-junk-dna-a-historical-whodunnit
(6) Gregory TR. 2004. Insertion-deletion biases and the evolution of genome size. Gene, 324:15-34). Gregory TR, 2004
(7) Doolittle WF, Sapienza C. 1980. Selfish genes, the phenotype paradigm and genome evolution. Nature. 284(5757):601-3. Doolittle WF, 1980
(8) Gregory TR, Hebert PD. 1999. The modulation of DNA content: proximate causes and ultimate consequences. Genome Res. 9(4):317-24. Gregory TR, 1999
(9) Bandea CI. 1990. A protective function for noncoding, or secondary DNA. Med. Hypoth., 31:33-4. Bandea CI, 1990
(10) Bandea CI. 2013. On the concept of biological function, junk DNA and the gospels of ENCODE and Graur et al. bioRxiv doi: 10.1101/000588; http://biorxiv.org/content/early/2013/11/18/000588
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY. -
On 2014 Jul 16, Claudiu Bandea commented:
Closing the gap between ‘words’ and ‘facts’ in evaluating genome biology and the ENCODE project
When I referred to Ford Doolittle’s article (1) as “7 pages of small print text” (2), I meant it both, literary and figuratively. Indeed, Doolittle’s paper is a remarkable example of fine print, which is likely to induce heated arguments for a long time, just as the author concluded: “many of the most heated arguments in biology are not about facts at all but rather about the words that we use to describe what we think the facts might be.”
Remarkably, more than 40 years ago, Susumu Ohno started his famous paper on junk DNA (3) with a related statement: “Over the years, I have learned that there is no such thing as a fact. What passes for a fact is in truth a set of observations and its interpretation. Therefore, the interpretation is just as important to a fact as the observation itself.”
In my previous mini-essay (2) on Doolittle’s article (1), I made 3 main points:
(i) Ever since the notion of ‘junk DNA’ (jDNA) was introduced as a metaphor for presumably non-functional genomic DNA in species with relatively high C-value, it has been clearly used by the scientists in the field of genome biology and evolution to denote genomic DNA that has no biological function at all, whether informational (iDNA) or non-informational (niDNA), and implying otherwise is nonsensical.
(ii) The two main theories on putative non-informational functions for the so called jDNA, the nucleo-skeletal and the nucleotypic functions, which “describe genome size variation as the outcome of selection via the intermediate of cell size” (4), and which Doolittle uses as pillars for his theoretical framework on genome evolution and biology, do not explain the C-value paradox.
(iii) Apparently, Doolittle is not aware of the theory that most genomic niDNA, redefined as symbiotic DNA (sDNA), functions as a protective mechanism (adaptive genomic immunity) against deleterious insertional mutagenesis by endogenous and exogenous inserting elements, such as retroviruses, and that this theory is fully supported by the current data and observations and it explains the C-value paradox (5, 6).
Here, I attempt to further close the gap between 'words' and 'facts' in addressing the genome biology and in evaluating the ENCODE project.
In the introductory section, Doolittle outlines the premise for his paper: “a flurry of articles and letters”, published in Nature in other journals under the umbrella of the ENCODE project, “collectively claim function for the majority of the 3.2 Gb human genome”, which, if true, would debunk the notion of jDNA (1). The problem with Doolittle’s premise, however, is that it is not based on facts; indeed, the “flurry of ENCODE publications” did not claim that the majority of the human genome is functional (7). On the contrary, in what seems to have been a concerted, but tacit ‘silence policy,’ the ENCODE authors went out of their way not to address the ‘functionally’ of the human genome in their publications. In light of this fact, Doolittle had no choice but to build the premise for his paper on secondary sources offered by various science writers (8-10), who were apparently caught into a publicity stunt orchestrated on the side by a few of ENCODE scientists. Whether Doolittle’s approach of using secondary sources, which is a strong departure from conventional academic standards, sets up a hasty precedent for the scientific literature remains to be seen.
So, why wasn’t the ENCODE project designed in the context of the fundamental issues and knowledge about genome biology and evolution, such as the C-value paradox, limited sequence conservation among closely related species, mutational load, and the evolutionary origin of most genomic sequences from transposable elements? And why did the ENCODE researchers choose not to address these fundamental issues in their official publications? Obviously, this makes no sense considering that their massive and expensive project was funded specifically to annotate the ‘functional sequences’ of the human genome.
Fully addressing these questions might take us deep into the science of human behavior and might highlight serious deficiencies in our current system of funding science, which relies on a weak and closed peer review system (parenthetically, a sensible solution would be to replace this system, which is vulnerable to abuse, with a stronger and true peer review system open to all peers).
Nevertheless, it is inconceivable that the ENCODE leaders, who represent some of the finest science institutions in the world, were scientifically incompetent, as suggested by some critics of ENCODE (10), or were unaware of these fundamental issues in genome biology and evolution. On the contrary, it was their appreciation of this fundamental knowledge that prompted them to be silent, as this knowledge conflicts with some of their study objectives and raises inconvenient questions about the relevance of their study and results.
However, as illustrated by Doolittle’s article, the full significance of this fundamental knowledge on genome biology and evolution is not clearly recognized in the field, which has led to tremendous confusion and has allowed projects such as ENCODE to flourish. Unfortunately, the knowledge on genome biology and evolution has yet to crystallize into clear facts. Here, nevertheless, is one (in large print): based on the C-value paradox, limited sequence conservation, mutational load, and the evolutionary origin of most genomic sequences from transposable elements, it is clear that MOST OF THE HUMAN GENOME CANNOT HAVE INFORMATIONAL FUNCTIONS, period.
Now that we have cracked ENCODE’s ‘code of silence’, reset some of Doolittle’s small print, and crystallized the fact that only a small fraction of the human genome has informational functions, it is time to focus on the major question remaining in the field of genome evolution and biology:
Does most of the genome in organisms with relatively high C-values have non-informational functions, or is most of it non-functional, metaphorically speaking, junk?
References
(1) Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci USA, 110:5294-300.
(2) Bandea CI. 2013. Junk DNA is bunk, but not as suggested by ENCODE or Doolittle. PubMed Commons (National Library of Medicine; Bethesda, MD). Comment on: Doolittle WF, 2013.
(3) Ohno S. 1973. Evolutional reason for having so much junk DNA. In Modern Aspects of Cytogenetics: Constitutive Heterochromatin in Man (ed. R.A. Pfeiffer), pp. 169-173. F.K. Schattauer Verlag, Stuttgart, Germany.
(4) Gregory TR. 2004. Insertion-deletion biases and the evolution of genome size. Gene, 324:15-34.
(5) Bandea CI. 1990. A protective function for noncoding, or secondary DNA. Med Hypotheses, 31:33-4.
(6) Bandea CI. 2013. On the concept of biological function, junk DNA and the gospels of ENCODE and Graur et al. bioRxiv doi: 10.1101/000588; http://biorxiv.org/content/early/2013/11/18/000588
(7) ENCODE Project Consortium. 2012. An integrated encyclopedia of DNA elements in the human genome. Nature, 489:57–74.
(8) Kolata G. 2012 (September 5). Bits of mystery DNA, far from ‘junk’, play crucial role. The New York Times, Section A, p. 1.
(9) Anonymous. 2012. Cracking ENCODE. Lancet, 380:950.
(10) Pennisi E. 2012. Genomics. ENCODE project writes eulogy for junk DNA. Science, 337:1159–1161.
(11) Graur D et al. 2013. On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biol Evol, 5(3):578-90.
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.
On 2014 Jul 25, Claudiu Bandea commented:
Everlasting confusion on ‘functional DNA’ and ‘junk DNA’
As discussed by Doolittle (1), addressing and defining the functionality of genomic DNA can be challenging. However, our common sense combined with our philosophical instinct (we know a function when we see one) should allow us to sensibly address the biology of our genome. After all, the ENCODE ‘function fiasco’ was not the result of misunderstanding the concept of biological function, nor was it due to scientific incompetence as suggested by others (2). On the contrary, because the concept conflicted with some of the project’s objectives and with its significance, there was a concerted effort not to bring it forward (3); indeed, as clearly shown in a recent ENCODE publication (4), at least some ENCODE members seem well aware of the scientific rationale and criteria for addressing putative biological functions for genomic DNA.
Nevertheless, as concluded by Doolittle (1), poor use of words in communicating scientific observations and lack of attention to detail have led to significant misrepresentations and confusion. Here are a few examples spanning more than four decades.
In a recent study entitled “Multiple knockout mouse models reveal lncRNAs are required for life and brain,” which addressed putative biological functions of long noncoding RNAs (lncRNAs), it was concluded that “This study demonstrates that lncRNAs play critical roles in vivo…” (5). Unfortunately, both the title and the conclusion misrepresent the results; an accurate interpretation is that “Multiple knockout mouse models reveal that some (or a few) lncRNAs are required for life and brain,” and that “This study demonstrates that some (or a few) lncRNAs play critical roles in vivo….” As a matter of fact, because the study showed that only 5 of the 18 lncRNAs, which had been selected from hundreds of sequences as the best candidates for being functional, appeared to play such roles, a more appropriate scientific interpretation would be: “This study demonstrates that most lncRNAs (i.e. 13 out of 18) do not appear to play critical roles in vivo….” Moreover, if the goal is to evaluate this study by the highest scientific standards (as presumably required by eLife, the journal where it was published), then I would suggest that, owing to the lack of appropriate control experiments such as duplicate knockout experiments, we don’t know whether the observed dysfunctions were associated with specific lncRNAs or were caused by untargeted genome modifications introduced accidentally while generating the knockout mice, which in fact calls into question the validity of the entire study. As previously noted (3), all these problems reflect the deficiencies of the current limited and closed peer review system; indeed, a document reviewed by a few peers (usually 2 or 3) can hardly be considered peer-reviewed.
The next three examples illustrate some of the confusion surrounding the concepts of biological function and junk DNA (jDNA), albeit in a more subtle way.
In one of the first papers addressing the notion of jDNA (6), David Comings writes: “These considerations suggest that up to 20% of the genome is actively used and the remaining 80+% is junk. But being junk doesn’t mean it is entirely useless. Common sense suggests that anything that is completely useless would be discarded.”
In their iconic paper “Selfish DNA: the ultimate parasite” (7), Leslie Orgel and Francis Crick summarize their thoughts as follows: “The DNA of higher organisms usually falls into two classes, one specific and the other comparatively nonspecific. It seems plausible that most of the latter originated by the spreading of sequences which had little or no effect on the phenotype.”
More recently, in a publication addressing the ENCODE project (8), Sean Eddy remarked: “These data support a view that eukaryotic genomes contain a substantial fraction of DNA that serves little useful purpose for the organism, much of which has originated from the replication of transposable (selfish) elements.”
Although seemingly innocent, these quotations are relevant examples of poor use of words. Some of this confusion might dissipate once we recognize that if genomic sequences (i) are not entirely useless, (ii) have little effect on the phenotype, or (iii) serve little useful purpose for the organism, then these DNA sequences are functional, period.
In the next example, the protagonists are Michael Eisen, the host of a popular blog suggestively named “it is NOT junk”, and Ryan Gregory, the host of another popular blog, “Genomicron”. Interestingly, Eisen was the PNAS editor in charge of Doolittle’s article on jDNA (1), and, according to Doolittle, Gregory is “now the principal C-value theorist” (1). Undoubtedly, Eisen and Gregory are among the most knowledgeable and well-versed communicators on genome biology and jDNA and, therefore, their ‘words’ are representative of the thinking in the field.
Immediately after the publication of ENCODE’s flurry of articles and the associated publicity stunt orchestrated by a few ENCODE scientists, both Eisen and Gregory reacted forcefully, but apparently from opposite perspectives. In a post in which he blasts ENCODE as “a carefully orchestrated spectacle” (9), Eisen writes: “nobody actually thinks that non-coding DNA is ‘junk’ any more. It’s an idea that pretty much only appears in the popular press… It is dishonest – nobody can credibly claim this to be a finding of ENCODE….”
In a prompt response (10), entitled “Michael Eisen’s take on ENCODE — there’s no junk?”, Gregory questions Eisen’s perspective in a detailed and cynical manner. Fortunately, thanks to the open communication platform offered by these blogs, this scientific ‘drama’ ended within hours, when Eisen apparently clarified: “I was not saying that everybody knows that 100% of the genome is functional! I was saying that nobody thinks that 100% of non-coding DNA is non-functional” (10).
Is there an end to this distressing confusion in the field of genome biology? Unlikely, if the funding system continues to focus primarily on generating data and observations, while neglecting their interpretation and integration into productive conceptual frameworks; in other words, you only get what you pay for.
References
(1) Doolittle WF. 2013. Is junk DNA bunk? A critique of ENCODE. Proc Natl Acad Sci USA, 110:5294-300.
(2) Graur D et al. 2013. On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biol Evol, 5:578-90.
(3) Bandea CI. 2014. Closing the gap between ‘words’ and ‘facts’ in evaluating genome biology and the ENCODE project. PubMed Commons (National Library of Medicine; Bethesda, MD). Comment on: Doolittle WF, 2013.
(4) Kellis M et al. 2014. Defining functional DNA elements in the human genome. Proc Natl Acad Sci USA, 111:6131-8.
(5) Sauvageau M et al. 2014. Multiple knockout mouse models reveal lincRNAs are required for life and brain development. eLife, DOI: 10.7554/eLife.01749.
(6) Comings DE. 1972. The structure and function of chromatin. Adv Hum Genet, 3:237-431.
(7) Orgel LE, Crick FH. 1980. Selfish DNA: the ultimate parasite. Nature, 284:604-7.
(8) Eddy SR. 2012. The C-value paradox, junk DNA and ENCODE. Curr Biol, 22:898-9.
(9) Eisen M. 2012. This 100,000 word post on the ENCODE media bonanza will cure cancer. Blog: it is NOT junk. http://www.michaeleisen.org/blog/?p=1167
(10) Gregory TR. 2012. Michael Eisen’s take on ENCODE — there’s no junk? Blog: Genomicron. http://www.genomicron.evolverzone.com/2012/09/michael-eisens-take-on-encode-theres-no-junk/
This comment, imported by Hypothesis from PubMed Commons, is licensed under CC BY.