Bioteque: a computational tool to harmonise biological knowledge
Scientists led by Dr. Patrick Aloy, ICREA researcher at IRB Barcelona, based in the Barcelona Science Park, have developed a computational tool to harmonise, integrate and simplify the enormous and growing amount of biological data available. The result is a knowledge graph that provides information on how different biological entities (such as genes, diseases, and cells) are related to each other, including more than 30 million functional interactions. The work, which is open access, has been published in Nature Communications.
The rapid development of the different disciplines in biological and biomedical research (such as genomics, proteomics, and transcriptomics) in recent decades has led to exponential growth in the amount of biological data available. For example, the volume of data managed by the European Bioinformatics Institute (EMBL-EBI) has grown from 40 petabytes to 250 petabytes in just 6 years.
Scientists led by Dr. Patrick Aloy, ICREA researcher and head of the Structural Bioinformatics and Network Biology laboratory at IRB Barcelona, have developed a computational tool to harmonise, integrate and simplify these data. The result is a knowledge graph that provides information on how different biological entities are related to each other, including more than 30 million functional interactions.
The Bioteque works by integrating different levels of biological complexity. For two given genes, for example, it can report whether they physically interact, whether they are active in the same type of cells, and whether they are related to the same disease. It can also predict the sensitivity or resistance of a type of cell to a specific drug.
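As a rough illustration of the kind of question described above, a knowledge graph can be sketched as a list of typed edges that is queried for two genes at once. This is only a toy sketch: the identifiers (`gene:A`, `cell:X`, `disease:D1`), the relation names, and the functions `build_index` and `compare_genes` are hypothetical and are not taken from the Bioteque or its data.

```python
from collections import defaultdict

# Hypothetical toy edges (source, relation, target); not real Bioteque data.
EDGES = [
    ("gene:A", "interacts_with", "gene:B"),
    ("gene:A", "expressed_in", "cell:X"),
    ("gene:B", "expressed_in", "cell:X"),
    ("gene:A", "associated_with", "disease:D1"),
    ("gene:B", "associated_with", "disease:D2"),
]


def build_index(edges):
    """Group edge targets by (source, relation) for quick lookup."""
    index = defaultdict(set)
    for source, relation, target in edges:
        index[(source, relation)].add(target)
    return index


def compare_genes(index, gene_a, gene_b):
    """Report how two genes are related across several relation types."""
    interact = (
        gene_b in index[(gene_a, "interacts_with")]
        or gene_a in index[(gene_b, "interacts_with")]
    )
    shared_cells = index[(gene_a, "expressed_in")] & index[(gene_b, "expressed_in")]
    shared_diseases = (
        index[(gene_a, "associated_with")] & index[(gene_b, "associated_with")]
    )
    return {
        "physically_interact": interact,
        "shared_cells": sorted(shared_cells),
        "shared_diseases": sorted(shared_diseases),
    }


report = compare_genes(build_index(EDGES), "gene:A", "gene:B")
print(report)
```

In this toy graph the two genes interact and share a cell type but no disease, which is the sort of multi-level answer the paragraph above describes.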
“This computational resource that we’ve developed is one of the first aimed at unifying biological information and it’s the only one to address such diversity and amount of data. It allows access, in an easy and harmonised way, to practically all the biological knowledge currently available, and it has enormous potential to accelerate biomedical research,” explains Dr. Patrick Aloy.
Almost 1,000 descriptors for 12 biological entities
The information held in the Bioteque is structured into 12 types of biological entities, such as genes, diseases, tissues, and cells. For each of these entities, the tool considers a series of descriptors or characteristics, for example, the pattern of mutations of a gene, the profile of physical interactions of the resulting proteins, the expression of that gene in different cell types, or its relationship with different diseases. Across the 12 biological entities, the system covers around 1,000 types of descriptors.
“We have worked with information from 150 different databases, so first we had to integrate them, that is, put them all in the same ‘language’. And then we converted that knowledge into numerical descriptors that could be interpreted by algorithms, and that way we could computationally exploit these networks and connections,” concludes Adrià Fernández, the first author of the article and a doctoral student in the same laboratory.
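The two steps mentioned in the quote, putting sources in the same "language" and then producing numerical descriptors, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the source records, the identifier mapping `ID_MAP`, and the functions `harmonise` and `to_descriptors` are all hypothetical, and the plain 0/1 indicator vectors used here are a simple stand-in, not the embedding method used in the article.

```python
# Hypothetical records from two sources that name the same genes differently.
SOURCE_1 = [("GENE_A", "disease:D1"), ("GENE_B", "disease:D2")]
SOURCE_2 = [("id:0001", "disease:D2"), ("id:0002", "disease:D3")]

# Hypothetical mapping from source-specific names to one shared identifier.
ID_MAP = {
    "GENE_A": "gene:A",
    "id:0001": "gene:A",
    "GENE_B": "gene:B",
    "id:0002": "gene:B",
}


def harmonise(records, id_map):
    """Rewrite every gene name into the shared vocabulary."""
    return {(id_map[gene], disease) for gene, disease in records}


def to_descriptors(pairs):
    """Turn (gene, disease) pairs into one 0/1 vector per gene."""
    columns = sorted({disease for _, disease in pairs})
    column_index = {disease: i for i, disease in enumerate(columns)}
    vectors = {}
    for gene, disease in pairs:
        vector = vectors.setdefault(gene, [0] * len(columns))
        vector[column_index[disease]] = 1
    return columns, vectors


# Merge both sources after harmonisation, then build the numeric vectors.
merged = harmonise(SOURCE_1, ID_MAP) | harmonise(SOURCE_2, ID_MAP)
columns, vectors = to_descriptors(merged)
print(columns)
print(vectors)
```

Once every entity has a vector over the same columns, standard algorithms can compare entities numerically, which is the general idea behind making the integrated knowledge machine-readable.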
The Bioteque will be expanded periodically with new databases as they are made public. The tool, the databases, and the algorithms are all open access and available at https://bioteque.irbbarcelona.org/.
» Link to the news: IRB Barcelona website
» Reference article: Fernández-Torras, A., Duran-Frigola, M., Bertoni, M. et al. Integrating and formatting biomedical data as pre-calculated knowledge graph embeddings in the Bioteque. Nat Commun 13, 5304 (2022). DOI: https://doi.org/10.1038/s41467-022-33026-0.