Diversity is key: metagenomes from different environments can predict different types of gene functions (Graphic: F. Supek, IRB Barcelona).

The depths of the ocean and gut flora unravel the mystery of microbial genes

An international team led by computational biologist Fran Supek at IRB Barcelona in PCB has developed a machine learning method to predict the unknown functions of microbial genes. The system examines and compares the ‘big data’ available on the metagenomes of human and environmental microbiomes.


Understanding the functions of genes in bacteria that form part of the human microbiome—the collection of microbes found inside our bodies—is important because these genes might explain mechanisms of bacterial infection or cohabitation in the host, antibiotic resistance, or the many effects—positive and negative—that the microbiome has on human health.

Surprisingly, the functions of a huge number of microbial genes are still unknown. This knowledge gap can be thought of as “genomic dark matter” in microbes, and neither computational biology nor current lab techniques have been able to address it.

This challenge has now been tackled through an international collaboration between the Institute for Research in Biomedicine (IRB Barcelona) and two other interdisciplinary research centres, namely the IJS in Ljubljana (Slovenia) and RBI in Zagreb (Croatia). The findings have been published recently in Microbiome, the international journal of reference in microbiome research. The study was led by Fran Supek, computational biologist and leader of the Genome Data Science lab at IRB Barcelona, and first-authored by Vedrana Vidulin, a computer scientist affiliated to the centres in Slovenia and Croatia.

Intelligent prediction method

The researchers have developed a new computational method that can examine thousands of metagenomes simultaneously and identify the evolutionary signal that predicts the function of many microbial genes. The method analyses “big data” from human microbiomes (e.g. from the intestine or skin) and from other metagenomes (e.g. from the soil or ocean). It is based on a special kind of machine learning algorithm that builds “decision trees” able to predict hundreds of different functions at once, finding links between genes while also predicting what they do in the microbial cell.
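To make the idea of a single decision tree predicting several functions at once more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the four environments, the two function labels, the toy genes and the helper names (`impurity`, `build_tree`, `predict`) are all assumptions made up for illustration. Each gene is described by a presence/absence profile across metagenomes, and every split is chosen to reduce the uncertainty over all function labels together.

```python
# Toy multi-label decision tree. All data and names are hypothetical.
# Each gene has a presence/absence profile over four assumed environments
# (gut, skin, soil, ocean) and two assumed function labels (A, B).

TRAIN = [
    ([1, 1, 0, 0], [1, 0]),
    ([1, 0, 0, 0], [1, 0]),
    ([0, 0, 1, 1], [0, 1]),
    ([0, 0, 0, 1], [0, 1]),
    ([1, 0, 1, 1], [1, 1]),
    ([1, 1, 0, 1], [1, 1]),
]


def impurity(rows):
    """Size-weighted sum of p * (1 - p) over every function label."""
    if not rows:
        return 0.0
    n_labels = len(rows[0][1])
    total = 0.0
    for j in range(n_labels):
        p = sum(labels[j] for _, labels in rows) / len(rows)
        total += p * (1 - p)
    return total * len(rows)


def build_tree(rows, depth=0, max_depth=3):
    """Grow a tree whose leaves store one frequency per function label."""
    n_labels = len(rows[0][1])
    leaf = {"leaf": [sum(l[j] for _, l in rows) / len(rows)
                     for j in range(n_labels)]}
    current = impurity(rows)
    if depth >= max_depth or current == 0:
        return leaf
    best = None
    for f in range(len(rows[0][0])):
        present = [r for r in rows if r[0][f] == 1]
        absent = [r for r in rows if r[0][f] == 0]
        if not present or not absent:
            continue  # this environment does not separate the genes
        score = impurity(present) + impurity(absent)
        if best is None or score < best[0]:
            best = (score, f, present, absent)
    if best is None or best[0] >= current:
        return leaf  # no split improves on the current node
    _, f, present, absent = best
    return {
        "feature": f,
        "present": build_tree(present, depth + 1, max_depth),
        "absent": build_tree(absent, depth + 1, max_depth),
    }


def predict(tree, profile):
    """Follow the splits for one gene and return its per-label scores."""
    while "leaf" not in tree:
        branch = "present" if profile[tree["feature"]] == 1 else "absent"
        tree = tree[branch]
    return tree["leaf"]


tree = build_tree(TRAIN)
# An unseen gene found in gut, skin and soil samples but not in the ocean.
scores = predict(tree, [1, 1, 1, 0])
print(scores)
```

Because each leaf stores a score for every label, one tree answers all function questions for a gene in a single pass, rather than training a separate model per function.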

“This makes the algorithm very good at not getting confused by the noise in the metagenomic data, meaning that it is accurate and can confidently propose a biological role for a large number of genes with unknown functions. Intriguingly, it also proposes many additional functions for genes that already have some known role,” says Supek.

The most important finding to emerge from this research is that analysing human microbiomes together with other metagenomic data, such as those from the soil and ocean, allows researchers to assign hundreds of gene functions that have evaded computational genomics approaches until now. “In other words, metagenomes allow scientists to see what ordinary genomes don’t,” explains Supek, a Croatian researcher who was recently awarded a grant from the European Research Council (ERC).

► Reference article: 
Vedrana Vidulin, Tomislav Šmuc, Sašo Džeroski, Fran Supek. “The evolutionary signal in metagenome phyletic profiles predicts many gene functions”. Microbiome, 2018; 6 (1) DOI: 10.1186/s40168-018-0506-4
