Project Summary/Abstract
Microbial communities and their hosts play key roles in many applications, from protecting humans and plants
against disease to developing the next generation of biofuels and bioremediation systems needed for sustainable
growth. Gaining a deep understanding of the fundamental biology of these systems is key to harnessing their
potential. Advances in high-throughput multi-omics techniques such as metagenomics, metatranscriptomics,
exometabolomics, and proteomics allow us to capture multiple snapshots of these complex biological processes
simultaneously. These snapshots yield large-scale, high-dimensional datasets of omics features (e.g., microbial
species, microbial genes, proteins, and small molecules). The falling cost of these technologies has also allowed
researchers to collect more multi-omics time-series data. Together, these temporally resolved multi-omics features
can provide a comprehensive picture of biological processes and their underlying activities.
However, many of these well-designed multi-omics studies have not yet been analyzed to their full potential,
primarily because of the lack of appropriate tools and annotation databases for such analyses. For example,
systematically analyzing the time component of longitudinal data to characterize the temporal dynamics of omics
features in relation to disease activity remains an unmet need in many studies. Therefore, there is a critical need
for statistical tools that can substantially improve research infrastructure by integrating different data types
and systematically modeling the time component of longitudinal data.
The overarching goal of this project is to develop efficient, interpretable, and scalable tools, based on our
previously developed signal model called partially-observed Boolean dynamical systems (POBDS), that characterize
the time component and capture the dynamical behavior of microbial communities from multi-omics data. The original
contributions are organized into the following research goals:
(i) Developing novel POBDS-based methods capable of modeling multi-omics data obtained through various molecular
profiling technologies and across diverse diseases and domains.
(ii) Developing Bayesian optimization frameworks for efficient and scalable reconstruction of the network
topology of microbial communities (i.e., inferring the types of interactions among large numbers of genes,
bacteria, and other microbes) from high-dimensional multi-omics data.
(iii) Developing Bayesian reinforcement learning perturbation policies that reduce the amount of data required
for modeling and learning (overcoming the non-identifiability issue) and acquire the most informative data from
microbial communities.
All tools developed in this project will be released as user-friendly software freely accessible to other
researchers.