Bioinformatics | Cebib Centre For Biotechnology And Engineering

Bioinformatics

The Bioinformatics component of the Centre aims to develop effective tools for the simulation and prediction of complex processes: biological, medical, and social. This includes the analysis of genome sequences, protein structures, the spread of diseases, disaster management, and more. While these problems have always been computationally intensive, the rapid increase in available data makes them data-intensive as well. They are also energy-intensive, especially when solutions incorporate Artificial Intelligence (AI), an issue that is becoming an environmental concern worldwide. We therefore seek to develop algorithms and software that are efficient at scale, not only in running time but also in space and energy usage.

Some highlights of our developments include fast and highly compressed data representations for sequenced genomes that support complex pattern searches; a compressed filesystem for efficiently storing and transmitting sequenced genomes; Gdrift++, simulation software for population and evolutionary genetics based on approximate Bayesian computation; CaClust, a probabilistic graphical model that integrates deep whole-exome, single-cell RNA, and B-cell receptor sequencing data to infer clone genotypes, cell-to-clone mapping, and single-cell genotyping; parallel algorithms for the physical characterization of small non-coding RNA in bacterial genomes; Peptipedia 2.0, a public database for searching, characterizing, and analyzing peptide sequences in biotechnology and bioengineering applications; the Atacama Database, a site that gives access to the largest collection of information related to microorganisms from the Atacama Desert; new algorithms and mathematical models for predicting the effect of mutations on engineered enzymes and for classifying unknown proteins; and geoinformatics platforms and efficient algorithms for large-scale agent-based discrete-event simulation, which enable the modeling and simulation of complex dynamic systems to support decision-making in disaster risk management for red tide events, among other socioeconomic disasters.

We are now aggressively incorporating AI technology into our solutions, including research on GPU-level parallelism, while also researching energy-efficient algorithms to mitigate the environmental impact of AI.