Thursday, September 06, 2012

The ENCODE Data Dump and the Responsibility of Scientists

A few hours ago I criticized science journalists for getting suckered by the hype surrounding the publication of 30 papers from the ENCODE Consortium on the function of the human genome [The ENCODE Data Dump and the Responsibility of Science Journalists].

They got their information from supposedly reputable scientists but that's not an excuse. It is the duty and responsibility of science journalists to be skeptical of what scientists say about their own work. In this particular case, the scientists are saying the same things that were thoroughly criticized in 2007 when the preliminary results were published.

I'm not letting the science journalists off the hook, but I reserve my harshest criticism for the scientists, especially Ewan Birney, who is the lead analysis coordinator for the project and who has taken on the role of spokesperson for the consortium. Unless other members of the consortium speak out, I'll assume they agree with Ewan Birney. They bear the same responsibility for what has happened.

Ewan Birney is listed as the corresponding author for the main summary paper in Nature: An integrated encyclopedia of DNA elements in the human genome. Here's the opening paragraph,
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
I've highlighted the main take-home message.

The papers show no such thing, as Ewan Birney admits on his own blog [ENCODE: My own thoughts].
It’s clear that 80% of the genome has a specific biochemical activity – whatever that might be. This question hinges on the word “functional” so let’s try to tackle this first. Like many English language words, “functional” is a very useful but context-dependent word. Does a “functional element” in the genome mean something that changes a biochemical property of the cell (i.e., if the sequence was not here, the biochemistry would be different) or is it something that changes a phenotypically observable trait that affects the whole organism? At their limits (considering all the biochemical activities being a phenotype), these two definitions merge. Having spent a long time thinking about and discussing this, not a single definition of “functional” works for all conversations. We have to be precise about the context. Pragmatically, in ENCODE we define our criteria as “specific biochemical activity” – for example, an assay that identifies a series of bases. This is not the entire genome (so, for example, things like “having a phosphodiester bond” would not qualify). We then subset this into different classes of assay; in decreasing order of coverage these are: RNA, “broad” histone modifications, “narrow” histone modifications, DNaseI hypersensitive sites, Transcription Factor ChIP-seq peaks, DNaseI Footprints, Transcription Factor bound motifs, and finally Exons.
In other words, "functional" simply means a little bit of DNA that's been identified in an assay of some sort or another.

For someone who claims to have "spent a long time thinking about and discussing this," that's a remarkably silly definition of function, and if you're using it to discount junk DNA it's downright disingenuous. Did Birney really not anticipate all the hype about refuting junk DNA? Come on, he couldn't have been that stupid, could he?

Here's the video he prepared with Magdalena Skipper, Senior Editor at Nature. Check out what she says at 2:28
The striking overall result that the ENCODE project reports is that they can assign a function, a biochemical function, to 80% of the human genome. The reason why this is striking is because, not such a long time ago, we still considered that the vast proportion of the human genome was simply junk because we know that it's only 3% that encodes proteins.
Where did she get that idea if not from Ewan Birney? Watch Birney's performance to see if he challenges this interpretation or supports the concept that most of the human genome is involved in a vast network of complex controls.

Scientists have a responsibility to be scrupulously accurate when they present their own work to the general public. That means they should recognize the difference between what the data actually says and their own interpretation of the data. When scientists know that there are other ways to interpret the data, they are obliged to mention that. That's the mark of a good scientist. In this case Birney is well aware of the controversy over interpreting pervasive transcription and the possible insignificance of a DNA binding site. He knows that because the ENCODE Consortium was challenged in 2007 when it presented the results of the pilot project (see The ENCODE Data Dump and the Responsibility of Science Journalists).

This is, unfortunately, another case of a scientist acting irresponsibly by distorting the importance and the significance of the data. It's getting to be a serious problem and it makes it hard to convey real science to the general public. The public now believes that the concept of junk DNA has been rejected by scientists and that our huge genome really is full of wonderful sophisticated control elements regulating the expression of every gene.

It's going to take a lot of effort to undo the damage caused by scientists like Ewan Birney.


  1. I agree. Birney's behavior and the content of the abstract are despicable. He's using equivocation regarding one definition of function, which he can demonstrate, to claim another definition of function, which he cannot demonstrate.

    However, journalism loves a "paradigm shift" and will lie to create one.

  2. Of course you will know more of them, but I have yet to meet a science communicator who understands science, full stop.

    "What is a P value?"


    "The text the scientist has given us has too many technical terms, let's just rewrite it how I think it is better and then publish it immediately without asking him to verify its correctness again."

  3. I find this more interesting than the usual bad science coverage, though, because the target appears to be scientists in other fields and moderately savvy laypeople.

    Before I was exposed to the substantial body of positive evidence in favour of a large portion of the genome being junk, I usually reacted to mentions of junk DNA with "argument from ignorance--it's probably regulatory." The sort of intellectual sloth that dismisses the unknown as unimportant really has plagued scientific communities from time to time, after all, and on the surface the idea of junk DNA really does seem to fit that pattern quite well.

    The ENCODE PR seems carefully calculated to make people say "Ha! I knew it all along!" It doesn't surprise me in the least that the misinformation is spreading like an Ebola outbreak.

  4. If he considers polyploidy for genome size differences he can't be a biologist. Where has he been since 2007? ENCODE was criticized not only in the blogosphere but also in peer-reviewed journals (e.g. on dark matter transcripts), though without the hype ENCODE is now stirring up.

    I guess one of the problems that causes situations like this is the different speed of technological and scientific development. Of course it is interesting how different parts of the genome interact with each other and how phenotypes are regulated. However, such research, especially if hundreds of people and millions of dollars are involved, doesn't ask proper questions. Otherwise you would have to accept hypotheses like "everything relates to everything or at least to something". Besides such big questions, Big Science is free of hypotheses. It collects data just because it can. One day, of course, Big Science has to publish to keep Big Science big. And to make it even bigger it has to publish as much data as possible, which has the nice side effect that the hollowness of at least some of Big Science's claims (in the case of ENCODE surely the "80% of the genome is functional" claim) can be hidden from the public and (more importantly) the funding agencies. Still, there is hope that it becomes obvious to them that the Emperor is naked or at least not fully dressed.

  5. I think that Ewan Birney is a scientist who spends most of his time surrounded by scientists and is used to people having intelligent discourse in which they listen to what he says in context and do not jump on soundbites. I am sure his interview was edited and I suspect that (as head of EBI) he is far more excited by the Gigabytes of data released and the 30 papers and the iPad app for exploring the results etc. etc. than he cares about people misinterpreting "junk DNA" arguments and getting hot under the collar.

    You claim: "I've highlighted the main take-home message."

    Crap. You have highlighted the sentence you want to attack. The take home message of ENCODE is that much more of the genome is active (but not necessarily important) than we thought and much more regulation is long-range. If you actually listen to Ewan Birney in the clip that you put up, he talks about moving away from an idea where regulation is neatly packaged and well organised, to a "jungle" - a complex network. That is NOT saying that "junk DNA" is dead. That is not what he is even interested in.

    The problem with the hype is places, such as this blog, jumping on soundbites and single sentences out of context and then going on crusades to shout them down as wrong just because you disapprove of their language. It comes across as emotional, irrational and driven by adherence to dogma. You know you are right, so no need to read the 30 papers (yawn) or explore the interesting question of what does it mean for something to be "functional". You don't even quote the bit where he justifies why he settled on using the 80% figure as a summary. Just attack, attack, attack, like any good Creationist would. (Not suggesting that you are a Creationist, just that you are behaving like one.)

    How any scientist can be faced with the publication of such an amazing resource as ENCODE, with all its data, all the questions it raises, all the interesting things to discuss (and not always agree about) and just focus on one or two sentences in the press releases that they don't like, boggles my mind. I know you are obsessed with neutrality and fighting Creationists but that is not all that science is about.

    I am reminded why I stopped reading this blog for a while and think it is time for another break. If you give people a bit more benefit of the doubt and remember that they are (in the case of Ewan Birney anyway) highly intelligent, who knows, you might actually learn something - but that does not seem to be the goal of this blog. It is so much easier to criticise science than do it. And so much easier to twist someone's words into something you know how to attack than actually try to understand their data and what they might really be saying.

    I would just ask you to consider all the damage that you are doing by your determination to drag the reputation of scientists - and science - through the mud, but I know it is probably futile, for frequenting this blog has made one thing very clear: Larry Moran is never wrong, has never made an error of judgement and already knows all the answers.

    (Feel free to unleash the ad hominems, for I won't be reading the replies.)

    1. cabbages,

      I'm inclined to agree, broadly. I think the Creationists are almost controlling the debate, because people who take the fight to them are hypersensitive over their familiar tendency to misuse terms, and crow over anything they can spin. But I don't know to what extent Birney knows or cares about what Creationists might think. Or whether it really matters - bollocks to 'em!

      I think the use of the term 'function' is debatable - but as a programmer, I can see how a bioinformatician might find that a perfectly reasonable term. Functional 'code' is code that does something, even if it is not important or relevant to the result. The genome is not code in a strict sense, but it is sufficiently code-like for it to be a useful metaphor.

      The salient point is that they have delineated a starting set - everything in the genome that has a biochemical consequence. Then you can start to winnow it for 'meaning' etc.

      Rumours of the death of junk would appear to be greatly exaggerated, despite this voluminous category of detectable activity. If Creationists see this as a 'victory' (one in which they played no part), then good luck to 'em.

    2. "How any scientist can be faced with the publication of such an amazing resource as ENCODE, with all its data, all the questions it raises, all the interesting things to discuss (and not always agree about) and just focus on one or two sentences in the press releases that they don't like, boggles my mind."
      ENCODE may indeed be a valid source of data, though some criticism of the validity of dark matter transcripts had already been made two years ago. However, Larry didn't complain about the ENCODE data but about the interpretation, which is contrary to biological knowledge. In contrast to the just-published data, the interpretation of large parts of the human genome as junk has been corroborated again and again since Ohno introduced the term. Junk DNA is compatible with the observed mutation rate and with the pedigrees of our species and its relatives, and it fits the well-characterized biology of repetitive elements. Even if there weren't any IDiots, YEC or OEC, ENCODE would still have to address the above-mentioned issues. You shouldn't blame Larry for pointing to them.
      Unfortunately, there is bad science by otherwise good scientists quite often. However, ignoring criticism and e.g. denying the C-value paradox by stating that this is due to polyploidy like Dr. Birney did on Twitter doesn’t help to make him look good.

    3. @Allan: "Functional 'code' is code that does something, even if it is not important or relevant to the result." - Really? If a code segment spends ages calculating something and then discards the result, we should call it functional?

    4. If I were a member of the ENCODE Consortium I would be really, really, pissed off at the people who hijacked the story by making stupid statements about the interpretation of their data.

      In fact, if there really are members of the consortium who are smart enough to see the problem then I expect them to be speaking out very soon.

      So far the silence is deafening.

    5. konrad: @Allan: "Functional 'code' is code that does something, even if it is not important or relevant to the result." - Really? If a code segment spends ages calculating something and then discards the result, we should call it functional?

      Yeah, OK ... maybe I would not defend that statement to the death! :0) But I was thinking of it as a legitimate synonym for 'executable'.

  6. Show me only evidence I agree with... (Friday, September 07, 2012 11:10:00 AM)

    Please tell me you recognize the irony in your statement "They got their information from supposedly reputable scientists but that's not an excuse. It is the duty and responsibility of science journalists to be skeptical of what scientists say about their own work."

    When a "creationist" or ID proponent makes this same claim they are ridiculed.

    That said, I do think the work should be challenged just like any other claim, and time will bear the proof.

    1. This has nothing to do with the evidence (which as far as I am aware is not controversial). It's about ENCODE's decision to use a severely misleading redefinition of "functional".

    2. When a "creationist" or ID proponent makes this same claim they are ridiculed.

      What, exactly, are you referring to?

      Are you saying that creationists are often skeptical of the results in the scientific literature?

      That's not been my experience. What they do is cherry pick the papers that support their position and tout them as the truth about science. They dismiss all scientific evidence that refutes their worldview by claiming that the scientists are either lying or extremely biased.

      That's not the same as legitimate skepticism.

    3. Censorship. It's the only thing that will protect us from the ID crowd. We must rigorously censor both research and the results from that research to ensure that nothing will ever be published that could possibly assist with their wrongheaded arguments. We know their arguments are impossibly wrong so therefore we must censor scientific inquiry and results to avoid the possibility that the public could be further confused by them.

      The ID crowd has been predicting the demise of the "junk DNA" hypothesis for years and we certainly don't want evidence to surface that essentially proves their assumptions to be correct.

      Rigorous censorship is the only way to protect the public.

    4. Censorship? It looks like Lenin and Stalin are back.

  7. Agreed completely. Unfortunately, with today's funding environment, tooting your own horn loudly and repeatedly is a condition for survival. If you don't do it, the loudest will get promoted/funded/published. This applies to everything from publishing (have you lately read any paper that wasn't "the first demonstration of..." or "highly relevant" for something or other) to funding (If I read another grant that claims to be "paradigm shifting" I'm going to scream). If this goes on for much longer, the postmodernists are going to end up being right about science being a social activity.

    1. That's the one thing the postmodernists _are_ right about.

  8. Nature News Blog has opened a discussion on the issue which you will find here:

    I've posted the following which is currently in moderation:

    Larry Moran’s complaint that ENCODE’s and Ewan Birney’s statement that 80% may be functional will feed all types of creationists was a minor concern. His foremost argument is that ENCODE ignores well-established biological knowledge. E.g., the presence of junk DNA is compatible with the observed mutation rates and with the pedigrees of our species and its relatives. It fits the well-characterized genetics and evolution of repetitive elements and doesn’t contradict the C-value paradox. Unfortunately, the above post doesn’t refer to the latter. The big difference in genome sizes between closely related species is in stark contrast to ENCODE’s 80% value for functional DNA in the human genome. Hopefully, Ewan Birney’s Twitter reply to the challenge of T. Ryan Gregory’s “onion test” was tongue-in-cheek, or does he really believe that C-value differences are due to polyploidy? Dr. Moran also criticizes the fact that ENCODE’s frivolously communicated 80% number is based on a very loose definition of functional DNA. He further points out that it was known decades before ENCODE published this notion that non-coding DNA is not the same as junk DNA. Isn’t checking the methodology and discussing the presented conclusions in the light of older research and current knowledge exactly what ENCODE authors would expect from their peers?

    1. I refuse to comment on that site because: (a) the terms and conditions take far too long to read, and (b) they ask for information that I don't want to give them.

  9. Thanks for this post. I think that Nature and M. Skipper have the biggest responsibility here. I read a lot of bullshit in this journal, but this is by far the worst.

  10. This thread may be stale, but I thought this post about a PLoS Medicine paper shows how the "spin" in the abstract correlates with the sensationalism in the science reporting.

    Could not have asked for better timing ;-)

    1. More embedding link problems. For the article, go to;