They published the results of a pilot study back in July 2007 (ENCODE, 2007) in which they analyzed a specific 1% of the human genome. That result suggested that much of our genome is transcribed at some time or another or in some cell type (pervasive transcription). The consortium also showed that the genome was littered with DNA binding sites that were frequently occupied by DNA binding proteins.
All of this suggested strongly that most of our genome has a function. However, in the actual paper the group was careful not to draw any firm conclusions.
... we also uncovered some surprises that challenge the current dogma on biological mechanisms. The generation of numerous intercalated transcripts spanning the majority of the genome has been repeatedly suggested, but this phenomenon has been met with mixed opinions about the biological importance of these transcripts. Our analyses of numerous orthogonal data sets firmly establish the presence of these transcripts, and thus the simple view of the genome as having a defined set of isolated loci transcribed independently does not seem to be accurate. Perhaps the genome encodes a network of transcripts, many of which are linked to protein-coding transcripts and to the majority of which we cannot (yet) assign a biological role. Our perspective of transcription and genes may have to evolve and also poses some interesting mechanistic questions. For example, how are splicing signals coordinated and used when there are so many overlapping primary transcripts? Similarly, to what extent does this reflect neutral turnover of reproducible transcripts with no biological role?

This didn't stop the hype. The results were widely interpreted as proof that most of our genome has a function and the result featured prominently in the creationist literature.
I don't blame science journalists for this. Lots of scientists also used the ENCODE result in 2007 to attack junk DNA. They honestly felt at the time that if a sequence was transcribed, no matter how rarely, it must have a function. They honestly felt that if a DNA binding protein bound to a piece of DNA then that site had a function.
Other scientists expressed skepticism over the interpretation of the ENCODE pilot project result. Some of them even disputed the data by showing that different techniques gave a different result on the pervasiveness of transcription. The most famous of these papers is the one from my colleagues here at the University of Toronto, Ben Blencowe and Tim Hughes (van Bakel et al. 2010). There was lots of activity in the blogosphere as well [Pervasive Transcription].
The bottom line is that after five years of debate and discussion it is well established that just because a fragment of DNA is transcribed does not mean that it has a function. Transcription could be accidental and the product could be junk RNA [Useful RNAs?] [What is a gene, post-ENCODE?] [Junk RNA]. We now know How to Evaluate Genome Level Transcription Papers.
I'm not saying the issue is settled, although I strongly favor the idea that most of our genome is junk. What I'm saying is that in spite of the hype in 2007 the supporters of junk DNA have made a good case and this is still a legitimate scientific controversy.
We have also pointed out that just because a site is occupied by a DNA binding protein does not mean that it is functional. In fact, once you understand how DNA binding proteins work you expect many of them to be sitting nonproductively at sites that resemble the actual functional binding site [DNA Binding Proteins] [Slip Slidin' Along - How DNA Binding Proteins Find Their Target]. It has been widely known since 1976 that the problem with large genomes is that they soak up DNA binding proteins that are binding nonspecifically to DNA (Yamamoto and Alberts, 1976). This is not controversial, if you know what you're talking about.
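The arithmetic behind that point is easy to check for yourself. Here is a back-of-the-envelope sketch (my own illustration, not a calculation from any of the papers cited here) of how many times a short recognition motif is expected to occur purely by chance in a genome the size of ours, assuming random sequence with uniform base composition:

```python
# Back-of-the-envelope estimate: expected chance occurrences of an exact
# transcription-factor-style motif in a random genome.
# Assumptions (illustrative only): uniform base composition (p = 0.25 per
# base), both strands scanned, genome size ~3.2 billion bp.

GENOME_SIZE = 3.2e9  # approximate haploid human genome, in bp

def expected_chance_sites(motif_length: int, genome_size: float = GENOME_SIZE) -> float:
    """Expected number of exact matches to a motif in random sequence,
    counting both strands."""
    per_position = 0.25 ** motif_length    # chance of a match at one position
    return 2 * genome_size * per_position  # factor of 2 for the two strands

for k in (6, 8, 10, 12):
    print(f"{k:2d} bp motif: ~{expected_chance_sites(k):,.0f} sites by chance")
```

Even a 10 bp motif turns up thousands of times by chance in a genome this big, which is why finding a protein sitting on a site that matches its consensus tells you very little, on its own, about whether that site does anything.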
Now comes the follow-up ENCODE study extended to cover (almost) the entire genome. The results are published in 30 papers, several of them in a single issue of Nature (Sept. 6, 2012) [Nature ENCODE: Research Papers]. I haven't read all the papers but my first impression is that there's not much that's new except that the dataset is now more complete. Here's what the consortium members say in the abstract [An integrated encyclopedia of DNA elements in the human genome].
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

Naturally this was interpreted by science journalists as proof that most of our genome isn't junk. Examples include, unfortunately, Ed Yong [ENCODE: the rough guide to the human genome], Fergus Walsh of the BBC [Detailed map of genome function], and Gina Kolata of The New York Times [Bits of Mystery DNA, Far From ‘Junk,’ Play Crucial Role].
UPDATE: Ryan Gregory has collected a bunch of articles in the popular press: The ENCODE media hype machine.

At least one science journalist has put his interpretation on video. Here's Ian Sample of The Guardian [What the Encode project tells us about the human genome and 'junk DNA' - video]. You really need to watch it to see the extent of the problem. I wonder how long this will stay up?
This is 2012. A simple Google search will reveal that the concept of junk DNA is still alive and well. A search like that will also reveal the problems with interpreting the ENCODE result since we've had years of debate over the initial pilot study. There's no excuse for this kind of sloppy journalism.
Science journalists have been badly burned several times in the past few years. Surely they should know by now that a single paper on a new fossil won't overthrow our understanding of human evolution [Good Science? Bad Science Journalism?] nor will a single paper on arsenic in DNA make me rewrite my textbook. Science doesn't work that way. A single study won't cause us to entirely re-think our concept of the genome even if it's in thirty papers in Nature.
Responsible science journalists should have dug deeper to find out whether the new ENCODE data were any better than the earlier data and whether the consortium's interpretation of the results was being widely accepted in the scientific community. They don't have an excuse this time.
[The scientists who wrote the paper and the scientists who reviewed it will get theirs in a separate post.]
The ENCODE Project Consortium (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447:799-816. [doi:10.1038/nature05874]
van Bakel, H., Nislow, C., Blencowe, B. and Hughes, T. (2010) Most "Dark Matter" Transcripts Are Associated With Known Genes. PLoS Biology 8: e1000371 [doi:10.1371/journal.pbio.1000371]
Yamamoto, K.R. and Alberts, B.M. (1976) Steroid Receptors: Elements for Modulation of Eukaryotic Transcription. Annu. Rev. Biochem. 45:721-746.