Abstract
Identifying the molecular causes of complex traits such as exceptional longevity (EL) is a formidable task.
The majority of machine learning algorithms generate mathematical correlations between genotypes and
phenotypes, but may fail to infer physiologically significant causes. A mechanistic understanding of how
individual molecular components work together in a system, and of how the system is affected by and adapts to
molecular changes, requires knowledge of molecular interactions across all biological levels, from DNA to
RNA to proteins to metabolites to organismal phenotypes. By integrating multi-omics data, recent approaches
in multi-modal machine learning and multi-layer network modeling promise to address this deficiency. However,
existing machine learning approaches are hampered by the high dimensionality, non-uniformity, numerous
confounders, and biological differences of multi-omics data across data resources, data domains, and species,
as well as by a lack of interpretability due to the black-box nature of machine learning models. We will develop a
transformative deep learning framework to address challenges for multi-omics data integration and predictive
modeling of causal genotype-EL associations. This project builds on our substantial preliminary results,
our successes in systems pharmacology for Alzheimer's disease drug discovery and in using C. elegans as a disease
and aging model, and close collaborations between experimental and computational laboratories. To discover
the molecular mechanisms of EL, we must overcome several obstacles. We will develop and
validate novel algorithms to (1) harmonize non-uniform data sets by removing environmental and biological
confounding factors (e.g., age and species) and technical biases (e.g., batch effects), (2) explicitly model the
biological information flow from DNA to RNA to proteins to metabolites to organismal phenotypes, and (3)
determine causal genetic factors and molecular interactions underlying EL. Specifically, we will: (1) develop
MuLGIT, a causal deep learning-powered cross-layer multi-omics harmonization and integration framework
that follows the central dogma of biology to decipher the molecular interplay underlying EL; (2) develop
PATH-AE, a transfer learning method for cross-species omics data integration and modeling, to elucidate
evolutionarily conserved and species-specific molecular determinants of EL; (3) identify molecular targets and
pharmaceutical agents associated with EL by combining new methodologies for multi-omics data integration with
state-of-the-art methods for chemical genomics and perturbation genomics; and (4) experimentally validate computational
predictions using C. elegans models. Completion of this project will allow us to identify novel biomarkers,
druggable targets, and pharmacological agents associated with EL.