ENCODE researchers answered a bunch of questions on Reddit a few days ago. I asked them to give their opinion on how much junk DNA is in our genome but they declined to answer that question. However, I think we can get some idea about the current thinking in the leading labs by looking at the questions they did choose to answer. I don't think the picture is very encouraging. It's been almost five years since the ENCODE publicity disaster of September 2012. You'd think the researchers might have learned a thing or two about junk DNA since that fiasco.
The question and answer session on Reddit was prompted by the award of a new grant to ENCODE. They just received 31.5 million dollars to continue their search for functional regions in the human genome. You might have guessed that Dan Graur would have a few words to say about giving ENCODE even more money [Proof that 100% of the Human Genome is Functional & that It Was Created by a Very Intelligent Designer @ENCODE_NIH].
Here's the list of researchers who answered questions on Reddit (Feb. 9, 2017).
- Nadav Ahituv, UCSF professor in the department of bioengineering and therapeutic sciences. Interested in gene regulation and how its alteration leads to morphological differences between organisms and human disease. Loves science and juggling.
- Elise Feingold: Lead Program Director, Functional Genomics Program, NHGRI. I’ve been part of the ENCODE Project Management team since its start in 2003. I came up with the project’s name, ENCODE!
- Dan Gilchrist, Program Director, Computational Genomics and Data Science, NHGRI. I joined the ENCODE Project Management team in 2014. Interests include mechanisms of gene regulation, using informatics to address biological questions, surf fishing.
- Mike Pazin, Program Director, Functional Genomics Program, NHGRI. I’ve been part of the ENCODE Project Management team since 2011. My background is in chromatin structure and gene regulation. I love science, learning about how things work, and playing music.
- Yin Shen: Assistant Professor in Neurology and Institute for Human Genetics, UCSF. I am interested in how genetics and epigenetics contribute to human health and diseases, especially for the human brain and complex neurological diseases. If I am not doing science, I like experimenting in the kitchen.
When asked about repeat sequences (e.g., LTRs, LINEs, SINEs) in the genome, here's how Nadav Ahituv responded.
Great question and one that my lab is actually very interested in and has active research on! With time and a lot of cool research, repeats are being found to have important functions in our genome. Many of them have been what's called "exapted." This is a term used in evolutionary biology to describe a trait that has been co-opted for a use other than the one for which natural selection originally built it. There are several cases where repeats have been found to turn into additional exons of existing genes, or gene regulatory elements that regulate other genes and change genome structure. Of note also, in the new phase of ENCODE, what we affectionately call ENCODE phase 4, there is actually a computational group, led by Ting Wang from Washington University in St. Louis, who will specifically study the role of repeats in gene regulation. - Nadav
You can see that several readers on Reddit tried to set the record straight. This happened several times during the session, and every time ENCODE researchers were challenged they ignored the challenge. Here's how I would have responded.
Great question! The vast majority of repeat sequences appear to be junk DNA by any reasonable definition. They are mostly broken transposons and fragments of transposons. A tiny percentage have become secondarily functional by adopting new roles in gene expression but the overall picture indicates that most of these repeats have no biological function. This is the largest category of junk DNA.
Someone asked about noncoding RNAs, specifically whether variation in human populations could be used to confirm that they were functional. Here's how Nadav Ahituv responded.
Yes! Great question. There is beautiful work from Katie Pollard, David Haussler, Shyam Prabhakar, Jim Noonan and many others that used variation to find human accelerated sequences. These are sequences that are conserved in all mammals but changed significantly in humans, much more than expected by chance/neutral evolution. Many of them have been found to be functional enhancers and several have also been associated with human-specific diseases. -Nadav
Someone named "zmil" gave a much better response. Here's my answer.
Most noncoding RNAs aren't conserved, even in our closest relative. This is consistent with the idea that they are mostly junk RNA due to spurious transcription. That's the best explanation for the vast majority of these transcripts. However, there's always the remote possibility that a new functional gene could have arisen in the human lineage. When that happens, you can possibly detect it by looking for variation within the human population. A stretch of DNA that has fewer mutations than average may indicate that it's under negative selection and therefore functional. The experiments are difficult because these putative genes are quite small and the functional target size within the gene may be even smaller. Very few clear examples have been found. The evidence (lack of sequence conservation) still favors the conclusion that most noncoding RNAs are nonfunctional.
The initial questioner also asked, "What is the evolutionary advance to keep "neutral" sequences?" ENCODE declined to answer that question but "PsiWavefunction" gave good answers.
Another participant asked,
As a high school biology teacher, what I've been telling my students for several years is that only about 1.5% of the human genome encodes proteins, and the rest is:
- regulatory elements
- genes for structural and regulatory RNAs
- junk like pseudogenes and endogenous retroviruses
- duplications of various kinds
- stuff that may have a function but we have no idea what it is
As a high school level summary, was this a reasonably accurate picture of our knowledge of the genome ~10 years ago when I started teaching? What do you think the biggest revisions have been?
Yin Shen replies,
This is a pretty good summary. The lessons we learned in the past ten years include: 1. There are millions of non-coding regulatory elements, a much bigger number than the protein coding sequences. 2. The regulatory elements are cell type specific and they are the major driving force for cellular identity. 3. A majority of the genetic variations associated with complex diseases are located in these regulatory elements, therefore mutations in these regions can play important roles in individuals' susceptibility to diseases.
I replied by posting a link to: What's in Your Genome? There was no further response from ENCODE.
As for the lessons that Yin Shen supposedly learned in the past decade, all three of them are seriously flawed. If he really thinks there are millions of functional regulatory elements in the human genome, then his grant would not have been funded if I had reviewed it.
ENCODE was asked, "Based on what you are doing how much of our DNA would you reckon is actually junk and how much of our DNA actually has a function?" Here's how Nadav Ahituv responded. Keep in mind that he is a professor at the University of California at San Francisco. This is (was?) a very prestigious university.
Great question! Only 2% of our genome are genes that code for protein. Around 45% of our genome is actually made of what's called repeats, many of them viruses that were inserted into our genome. Various cool studies show that several of them have adapted new functions that made them 'stay' in our genome — like becoming parts of other genes or adopting a gene regulatory function (instructing genes when, where and at what levels to turn on). As for the remaining 53%, we see that a lot of it has regulatory function and other functions which we still don't know and which are fascinating in my mind to uncover.
The history of this field is also really fascinating – I recommend this article that does a great job describing when researchers first recognized the role of non-coding regulatory regions in the DNA (earlier than you might think!) Is Most of Our DNA Garbage? -Nadav.
I replied,
If we define a gene as a DNA sequence that's transcribed, then protein-coding genes occupy about 25% of our genome. That's because they are mostly introns. Most intron sequences are junk.
Transposon- and virus-related sequences make up a substantial percentage of our genome (probably >50%). Most of it is bits and pieces of defective transposons that look very much like junk. Some tiny percentage of these sequences have secondarily acquired a new function but the vast majority still has all the characteristics of junk DNA.
Proven regulatory sequences make up a very tiny percentage of the genome (less than 1%). Many researchers speculate that regulatory sequences cover a significant fraction of the genome but there's no solid evidence that this is true. If it were true, those thousands of sequences would have to be in the few percent of unknown conserved sequences; otherwise you have to postulate that they all evolved (and became fixed) in the human population within the last few million years. That's not very likely.
I'd like to ask each of the ENCODE researchers to give us their informed opinion (best guess) on the amount of junk DNA in our genome.
I think it's 90%. If this is true, then what is "dark matter"?
Do the ENCODE researchers agree that the null hypothesis is "no function" and function has to be proven in the face of abundant evidence that most of our genome is junk?
For those of you who don't want to slog through Carl Zimmer's article, non-coding regulatory sequences have been in the textbooks since the mid-1960s (more than half-a-century!).
There was no reply from ENCODE.
The researchers were asked, "What would be a good book to understand more about our genome? I have some intro biology and genetics books but they seem kind of outdated." They replied,
Mike Pazin said: The Deeper Genome, John Parrington; Homology, Genes, and Evolutionary Innovation, Gunter P. Wagner.
Elise Feingold said: For a more lay-oriented audience, I would recommend "The Gene: An Intimate History" by Siddhartha Mukherjee
Really!!! I'm not making this up. Those are actually the books they recommend to others.
I responded with links to my criticism of John Parrington's book. The fact that Mike Pazin would recommend this book indicates that he probably agrees with the author. Parrington is a great defender of ENCODE and the idea that our genome is chock full of sophisticated and mysterious regulatory sequences; hence, the "deeper" genome. If that's what Mike Pazin believes then I would have rejected his grant application.
Elise Feingold has been with ENCODE since the beginning. She recommends Mukherjee's book as a good source for information on the human genome! That's ridiculous. The average person would be hard-pressed to find any useful information about genomes in that book.
Here's another exchange that's very interesting. Do you think ENCODE answers the question?
Djebel1 asks,
- So, do you now agree that a confirmed chemical activity at a site is not equivalent to the site being functional? And that, no, not 80% of our DNA is functional?
- How did your point of view evolve on that matter since the 2012 controversy? How dissimilar were your own points of view as compared to the official press releases, saying that everything is functional?
Mike Pazin replies for ENCODE,
ENCODE 2 found biochemical signatures at 80% of the genome, adding up all signatures for all cell types. This was an important first pass. However, if one looks at particular biochemical marks (such as DNase) that are markers for particular candidate functions (regulatory DNA), the numbers are quite different (in this case about 10%). An important part of ENCODE 4 will be its specific focus on examining candidate elements to determine whether, when, and where they function in important human cell types. This will be the task of the new ENCODE characterization centers, two of which Yin and Nadav will be directing at UCSF.
It looks to me like they want to milk this controversy for as much money as possible. If that means giving up scientific integrity, then that's a small sacrifice.
Here's another exchange with a grad student who asked a pointed question.
zackroot asks,
Evo-Devo grad student here, it's great to see such an awesome genomics group for AMA!
Nowadays, genomic "dark matter" seems to be a heavy word implying a whole bunch of different things. Does your analysis include anything regarding transcriptomics or are you purely looking at "junk DNA"?
Nadav Ahituv responds for ENCODE,
Our group is mainly looking at gene regulatory elements such as promoters and enhancers that regulate transcription. Several of them are actually transcribed and are being referred to as enhancer RNA (eRNAs).
I don't really think we should be throwing the term "dark matter" around in whatever context we like. Stuff like that is what strains the trust between scientist and layman.
Personally I think it's mostly looking at "junk DNA" but also how "junk DNA" might interact with or affect other -omics. It'd be interesting to see their answer.
No further comment from ENCODE.
My response,
ENCODE is mostly looking at junk DNA and spurious transcription but they refuse to consider this possibility. They use the term "dark matter" to imply there's some mysterious function in all that excess DNA. However, after 12 years and several hundred million dollars they still haven't found it.
Here's one response from ENCODE that's almost correct!
Navaltactics asks,
My question is what classifies a sequence as "biologically relevant", and is a relevant sequence always relevant?
Elise Feingold replies,
Non-coding regulatory regions are often functional only in specific biological contexts, e.g., in specific cell types, during certain times in development or after particular environmental exposures. So a big challenge is assaying for function in the appropriate biological setting. If you don't find something has functional activity, it could be that you aren't looking for it in the right biological context or it's possible that those sequences have one function under one set of conditions and another function under a different set. It's also possible that we don't have the right set of tools to probe for the particular function. Or perhaps, it just isn't functional?
My response,
After half-a-century of studying genomes we have a pretty good idea that 90% of our genome is junk. The evidence is very solid and comes from several different sources. Thus, we can confidently conclude that much of what ENCODE identifies as "biochemically functional" is NOT biologically relevant. The only remaining step for ENCODE is to admit this and then tell us what small fraction of the genome is actually biologically functional. The best criterion for identifying true function is sequence conservation. In the absence of sequence conservation the burden of proof is on ENCODE researchers to identify specific nonconserved sequences that have a proven function. That can only be done by examining specific targets. So far, they have failed to identify more than a handful of nonconserved sequences that actually have a proven function.
There were tons of questions that weren't answered. I'm guessing the researchers were too busy to address all of the questions. That probably explains why they didn't respond to some of the more pointed questions and followups. I'm sure they would have liked to answer the questions about how much junk DNA there is in our genome but the day went by far too quickly.
At some point, the continuing attempt to find function where none exists has to stop being funded.