Scalable Bayesian Network analysis of multimodal FACS and SUMOylation data, with generalization to other big mixed biological datasets
Abstract
Bayesian (or Belief) Network (BN) modeling is a powerful tool that is emerging as one of the principal
methods for analyzing, exploring and visualizing multimodal (also known as mixed, or heterogeneous)
“big” biological data. We have previously developed a comprehensive set of BN algorithms and a
software package for analyzing heterogeneous big biological data. In recent years, we have applied
this package to datasets from a range of biological research domains, including chromatin interaction,
tRNA evolution, genetic epidemiology and metabolomics, cancer epidemiology, and single-cell
thymopoiesis. Work on three more projects (inferring immune signaling networks from FACS data,
genome-wide SUMOylation, and Alzheimer's genomic analysis) is in progress. In the course of this
work, we have identified crucial methodological “bottlenecks” that must be addressed to make BN
analysis universally usable in our general context, that is, big biological data containing large numbers
of variables of different types. These issues (scalability of the BN reconstruction process, handling of
mixed data types, and interpretation, evaluation and comparison of the resulting network models) have
not yet been adequately addressed in the field, which limits the usability of the otherwise powerful and
elegant BN approach.
Consequently, the primary goal of this project is to develop novel BN analysis algorithms with
emphasis on (a) scalability, (b) handling of mixed data types, and (c) interpretation and evaluation of
the resulting networks. We are particularly interested in BN analysis of the quantitative flow cytometry
(FACS) data generated by ongoing cancer immunogenetics research projects at City of Hope, because
this type of data exemplifies the challenges of BN modeling, and advances in algorithm and software
development for it should generalize to most other types of big biological data. We will subsequently
apply BN analysis to SUMOylation and chromatin interaction genomic data (also generated by ongoing
collaborative research projects at City of Hope) to further test generalizability and to produce
additional biological results.