Researchers from the CNAG have contributed to the ENCODE project

5 September 2012 (updated 18 November 2020)
Source: National Institutes of Health (USA).

More than 10 years ago, the Human Genome Project produced an almost complete list of the 3 billion pairs of chemical letters in the DNA that embodies the genetic code, but it revealed nothing about how this blueprint works. Now, after five years of concerted effort by more than 440 researchers in 32 labs around the world, working collaboratively in the ENCODE Project, the first holistic view of how the human genome actually does its job has emerged. Researchers from the National Centre for Genomic Analysis (CNAG), based in the Barcelona Science Park, contributed to the project by working with the ENCODE RNA sequencing analysis working group.

During the ENCODE study, researchers found that more than 80 percent of the human genome sequence is linked to biological function. They also mapped more than 4 million regulatory regions where proteins interact with the DNA with exquisite specificity. These findings represent a significant advance in understanding the precise and complex controls over the expression of genetic information within a cell.

Moreover, the findings bring into much sharper focus the continually active genome in which proteins routinely turn genes on and off using sites that are sometimes at great distances from the genes they regulate; where sites on a chromosome interact with each other, also sometimes at great distances; where chemical modifications of DNA influence gene expression; and where various functional forms of RNA, a form of nucleic acid related to DNA, help regulate the whole system.

“During the early debates about the Human Genome Project, researchers had calculated that only a few percent of the sequence encoded proteins, the workhorses of the cell,” said Eric D. Green, director of the National Human Genome Research Institute (NHGRI), which led the ENCODE Project. “Early on, some scientists even argued that most of the genome was ‘junk.’ ENCODE now gives us much more appreciation of the complex molecular ballet that converts genetic information into living cells and organisms, and we can now say that there is very little, if any, junk DNA.”

The CNAG researchers Tyler Alioto, Paolo Ribeca and Micha Sammeth, led by Roderic Guigó from the Centre de Regulació Genòmica (CRG), worked with the ENCODE RNA sequencing analysis working group. More specifically, they helped set up methods to process and interpret the huge amount of RNA sequencing data produced by the ENCODE project.

The scale of the effort was remarkable. Hundreds of researchers across the United States, United Kingdom, Spain, Singapore and Japan performed more than 1,600 sets of experiments on 147 types of tissue, using technologies standardized across the consortium. The experiments relied on innovative uses of next-generation sequencing technologies. In total, ENCODE generated more than 15 trillion bytes of raw data and consumed the equivalent of more than 300 years of compute time to analyze.

The coordinated publication set includes one main integrative paper and five other papers in the journal Nature; 18 papers in Genome Research; and six papers in Genome Biology. The ENCODE data are so complex that the three journals have developed a pioneering way to present the information in an integrated form that they call “threads.”

The ENCODE data are rapidly becoming a fundamental resource for researchers seeking to understand human biology and disease. More than one hundred papers using ENCODE data have already been published by investigators who were not part of the ENCODE Project. For example, researchers studying the genetic basis of human diseases are using the ENCODE resource to sort through the many disease-associated variants, or markers, in an effort to determine which specific variants contribute to disease. These variants map not only to protein-coding regions of the genome but also to non-coding regions: the vast tracts of sequence between genes where ENCODE has identified many regulatory sites.

Identifying regulatory regions will also help researchers explain why different types of cells have different properties, for example why muscle cells generate force while liver cells break down food. Scientists know that muscle cells turn on some genes that only work in muscle, but it has not previously been possible to examine the complete set of regulatory elements that control the process.