
Researchers from the CNAG and the CRG push genome data analysis one step forward

 31.10.2012

The GEM project, led by Paolo Ribeca of the Centro Nacional de Análisis Genómico (CNAG), based in the Barcelona Science Park, and involving scientists from the CNAG and the Center for Genomic Regulation (CRG), has produced a tool for interpreting genomic data that is several times faster and much more accurate than the tools currently in use. The study has been published in the journal Nature Methods.


Because sequencing capacity is growing exponentially, efficient analysis tools are becoming essential for processing the vast amount of biological data being produced. This was the starting point that led researchers at the CRG and the CNAG to design a computer program that finds sequences in a reference genome quickly and accurately. Such tools, called ‘mappers’, are essential for interpreting data in genomic studies, as they perform the first analysis step in many biological experiments. The result, after five years of development, is the GEM (Genomic Multitool) mapper.

The GEM mapper is several times faster than other reference programs in the field: on a single CPU core it maps about 40 million sequences per hour to the human reference genome. Because it uses algorithms that guarantee no matches are missed, GEM is also much more accurate than comparable programs. In addition, GEM allows the search parameters to be tuned to the specific requirements of the biological experiment being performed, offering a versatility that most existing tools cannot match.

GEM's performance will help address a practical problem: the dramatic increase in the amount of sequencing data. As an example, the CNAG started operations in 2010 with a fleet of 12 second-generation sequencers that generated roughly 50 Gbases per day. Thanks to the recent spectacular advances in sequencing technology, today, only two and a half years later, the CNAG generates almost 20 times more data with the same number of sequencing machines. It would have been impossible, however, to increase the CNAG's computing resources at the same pace, a problem common to biomedical research everywhere in the world. Hence, the development of more efficient analysis tools like GEM is essential to keep up with the increasing rate of data production.
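
To give a feel for the scale involved, the figures quoted above can be combined into a rough back-of-the-envelope estimate of the computing time needed for mapping alone. The short Python sketch below is only an illustration: the function name is hypothetical, and the read length of 100 bases is an assumption that does not come from the article. Only the daily output (roughly 50 Gbases, growing almost 20-fold) and the throughput of about 40 million sequences per hour per core are taken from the text.

```python
def core_hours_per_day(gbases_per_day, read_length, reads_per_core_hour=40_000_000):
    """Estimate the core-hours needed to map one day of sequencing output.

    gbases_per_day      -- daily sequencing output in gigabases (from the article)
    read_length         -- bases per sequenced read (ASSUMPTION, not in the article)
    reads_per_core_hour -- mapper throughput on one CPU core (from the article)
    """
    reads_per_day = gbases_per_day * 1e9 / read_length
    return reads_per_day / reads_per_core_hour


READ_LENGTH = 100  # assumed read length in bases; hypothetical value

# Output in 2010 (about 50 Gbases/day) versus roughly 20 times more today.
for label, gbases in [("2010", 50), ("today", 50 * 20)]:
    hours = core_hours_per_day(gbases, READ_LENGTH)
    print(f"{label}: about {hours:.1f} core-hours of mapping per day")
```

Under these assumptions the estimate grows from about 12.5 to about 250 core-hours per day. A slower mapper would multiply both numbers by the same factor, which is why throughput matters when computing resources cannot grow as fast as sequencing output.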

The research was funded by the Spanish Ministerio de Educación y Ciencia (Consolider program), by the US National Institutes of Health/National Human Genome Research Institute, and by the European Union (READNA and ESGI programs).