Large-scale Disease Pathway Discovery by Integrating Tissue-specific Molecular Networks via Hierarchical Bayesian Inference on Graph Neural Networks

PROJECT SUMMARY

Analyzing molecular entities and the interplay between them provides key insights into genotype-phenotype relationships, the functioning of cells and tissues, and the etiology of diseases. However, identifying the pathways associated with diseases is a long-standing fundamental challenge. One major obstacle is that proteins associated with a single disease tend to form multiple disconnected pathway fragments that are sparsely embedded in protein-protein interaction (PPI) networks, and each localized connected module contains only a small fraction of the proteins. Moreover, because diseases manifest themselves in different tissues, the disease modules and their connectivity patterns tend to be tissue-specific. Although deep learning methods for pathway discovery show promising results, the topological information of disease proteins remains largely unexplored. In addition, deep learning methods for network integration fuse tissue-specific networks into a single generic representation that obscures their distinct yet related features. Recent advances in graph neural networks (GNNs) make it possible to exploit topological information in molecular networks. However, GNNs encode each node’s representation by aggregating information from its neighboring nodes within a neighborhood scope that must be pre-specified, so identifying and grouping the loosely linked pathway fragments can be computationally expensive and incomplete. Building on the PI’s pioneering work on GNN methods for biomedical interaction prediction, the overarching research goal for the next five years is to design and develop novel biostatistical graph learning strategies for the structural characterization of large-scale disease pathways and for systematic biomedical data analytics that give a holistic view of molecular and functional relationships.
The PI will model the GNNs’ exploration of the sparse connectivity patterns of disease proteins as stochastic processes, in order to infer the neighborhood scope over which GNNs should aggregate information and to adaptively sample interactions that identify critical sub-networks. This will make it possible to leverage both node features and topological information to improve predictive performance. To integrate tissue-specific PPI networks for disease pathway discovery, the PI will augment the GNN inference with a hierarchical framework. The framework will consist of a hyper-prior that shares complementary information between the molecular networks at the global level, and tissue-specific priors that capture their distinctive characteristics at the local level. The PI will design both case studies and comprehensive experiments to verify and interpret the predicted disease proteins and modules. Because molecular interactions control cellular processes involving complicated cascades of biochemical reactions and signaling pathways, dysregulation of any component of these pathways can lead to a broad spectrum of human pathologies, including cancer, cardiovascular disease, neurodegenerative conditions, and metabolic diseases. A holistic view of disease pathways will accelerate the development of new therapies to combat these diseases.
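
The two-level structure of the hierarchical framework described above (a shared global level plus tissue-specific priors) can be illustrated with a deliberately small sketch. Everything specific here is an assumption made for illustration and is not taken from the proposal: a single scalar parameter per tissue, Gaussian distributions at both levels, a plug-in estimate of the global mean, and the hypothetical function name `partial_pool`. The proposed method operates on GNN inference over PPI networks and is not reproduced here.

```python
from statistics import fmean


def partial_pool(tissue_estimates, obs_var, prior_var):
    """Shrink noisy per-tissue estimates toward a shared global mean.

    Assumed toy model (illustrative only, not the proposal's model):
        mu       : global-level mean, estimated here by a plug-in average
        theta_t  ~ Normal(mu, prior_var)     # tissue-specific prior
        y_t      ~ Normal(theta_t, obs_var)  # noisy estimate for tissue t
    """
    # Global level: information shared across all tissues.
    mu_hat = fmean(tissue_estimates.values())
    # Weight on the tissue's own estimate; the rest goes to the global mean.
    w = prior_var / (prior_var + obs_var)
    # Local level: Gaussian posterior mean of each tissue parameter given mu_hat.
    return {t: w * y + (1.0 - w) * mu_hat for t, y in tissue_estimates.items()}


# Hypothetical per-tissue scores for one candidate interaction.
estimates = {"brain": 0.9, "liver": 0.1, "heart": 0.5}
pooled = partial_pool(estimates, obs_var=1.0, prior_var=1.0)
```

A larger `prior_var` lets each tissue keep more of its own estimate, while a smaller one pulls all tissues toward the shared value; this is the trade-off between distinctive and complementary information that the framework is meant to balance.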