ENCODE, modENCODE and Gene definition

The ENCODE and modENCODE projects have recently achieved major milestones in decoding the meaning of genome sequences. Whereas the ENCODE consortium focuses on the human genome, modENCODE researchers study the genomes of Drosophila melanogaster (fruit fly) and Caenorhabditis elegans (worm).

The initial aim of the ENCODE project, launched in 2003, was to decode the information in just 1% of the genome (the “pilot project”) and to establish experimental techniques for future large-scale studies. In 2007 the project was expanded to cover the entire genome. The project officially ended this year, but consortium members still have a lot of work to finish. The consortium comprises 32 groups (institutes) and more than 440 scientists.

In terms of achievements, the ENCODE project has assigned some sort of biological function to roughly 80% of the human genome. This includes the discovery of roughly 70,000 promoters and 400,000 enhancer regions. The finding is surprising because it was widely believed that much of the human genome consists of “junk” DNA, i.e., sequences that don't contribute to any meaningful activity.

The major findings of the ENCODE and modENCODE projects include:

  • Transcription is pervasive, leading to significant biological “noise” (low-level transcripts, unspliced introns, etc.)
  • Genes have multiple transcription start sites
  • Regulatory sequences are symmetrically distributed around transcription start sites
  • Many novel non-coding transcripts were found, some of which overlap protein-coding genes
  • A variety of chromatin modification mechanisms exist

In light of the new findings, the researchers have revised the definition of a gene.

Old definition of a gene
A gene is a continuous stretch of DNA that contains start and stop sites and encodes RNA.

New definition of a gene
A gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.
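
One way to read “a union of genomic sequences” is as a set union of the coordinate intervals covered by a gene's transcripts. The sketch below illustrates that reading only; the transcript names and exon coordinates are made up, and the function union_of_intervals is a hypothetical helper, not part of any ENCODE tool.

```python
from typing import Dict, List, Tuple

# Half-open interval [start, end) on a single chromosome strand.
Interval = Tuple[int, int]


def union_of_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a minimal sorted list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical exon coordinates for three overlapping transcripts
# that are assumed to encode one coherent set of products.
transcripts: Dict[str, List[Interval]] = {
    "transcript_A": [(100, 200), (300, 400)],
    "transcript_B": [(150, 250), (300, 400)],
    "transcript_C": [(380, 450), (600, 700)],
}

# Under this reading, the "gene" is the union of all exon intervals.
gene_footprint = union_of_intervals(
    [exon for exons in transcripts.values() for exon in exons]
)
print(gene_footprint)  # [(100, 250), (300, 450), (600, 700)]
```

Note that the result is not one continuous stretch of DNA, as the old definition would require, but a set of possibly disjoint segments shared among overlapping products.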


