AI-powered cross-level cross-species omics data integration to elucidate mechanisms of EL

Abstract

Identifying the molecular causes of complex traits such as exceptional longevity (EL) is a formidable task. Most machine learning algorithms capture statistical correlations between genotypes and phenotypes but may fail to infer physiologically meaningful causes. A mechanistic understanding of how individual molecular components work together in a system, and of how the system is affected by and adapts to molecular changes, requires knowledge of molecular interactions across all biological levels, from DNA to RNA to proteins to metabolites to organismal phenotypes. By integrating multi-omics data, recent approaches in multi-modal machine learning and multi-layer network modeling promise to address this deficiency. However, existing approaches are hampered by the high dimensionality, non-uniformity, numerous confounders, and biological differences of multi-omics data across data resources, data domains, and species. They are further limited by a lack of interpretability owing to the black-box nature of machine learning models. We will develop a transformative deep learning framework to address these challenges in multi-omics data integration and in predictive modeling of causal genotype-EL associations. This project builds on our substantial preliminary results, our successes in systems pharmacology for Alzheimer's disease drug discovery and in using C. elegans as disease and aging models, and close collaborations between experimental and computational laboratories. To overcome these obstacles and discover the molecular mechanisms of EL, we will develop and validate novel algorithms to 1) harmonize non-uniform data sets by removing environmental and biological confounding factors (e.g., age and species) and technical biases (e.g., batch effects), 2) explicitly model the biological information flow from DNA to RNA to proteins to metabolites to organismal phenotypes, and 3) determine the causal genetic factors and molecular interactions underlying EL. Specifically, we will: (1) develop MuLGIT, a causal deep learning-powered, cross-layer multi-omics harmonization and integration framework that follows the central dogma of biology, to decipher the molecular interplay underlying EL; (2) develop PATH-AE, a transfer learning method for cross-species omics data integration and modeling, to elucidate evolutionarily conserved and species-specific molecular determinants of EL; (3) identify molecular targets and pharmaceutical agents for EL by combining new methodologies for multi-omics data integration with state-of-the-art methods for chemical genomics and perturbation genomics; and (4) experimentally validate computational predictions using C. elegans models. Completion of this project will allow us to identify novel biomarkers, druggable targets, and pharmacological agents associated with EL.
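To make the harmonization goal in 1) concrete, the simplest baseline for removing known confounders (such as age) and batch effects is linear residualization. Each omics feature is regressed on the covariates, and only the residual is kept. The sketch below shows that generic baseline under a purely linear, additive assumption. It is not MuLGIT or PATH-AE, whose designs are not specified here. The function name `regress_out_covariates` and the synthetic data are hypothetical and serve only as an illustration.

```python
import numpy as np


def regress_out_covariates(X, covariates):
    """Remove linear effects of known covariates from every feature.

    Hypothetical helper (generic linear baseline, not the proposed method).
    X: (n_samples, n_features) omics matrix, e.g. log-expression values.
    covariates: (n_samples, n_covariates) numeric matrix, e.g. age plus
        batch indicators (one level dropped to avoid collinearity with
        the intercept).
    Returns residuals with each feature's mean added back.
    """
    n = X.shape[0]
    # Design matrix: an intercept column followed by the covariates.
    D = np.column_stack([np.ones(n), covariates])
    # Ordinary least squares for all features at once (X may be 2-D).
    beta, *_ = np.linalg.lstsq(D, X, rcond=None)
    residuals = X - D @ beta
    # OLS residuals have zero mean, so add the feature means back
    # to keep values on their original scale.
    return residuals + X.mean(axis=0)


# Synthetic example: 100 samples, 5 features, confounded by age and batch.
rng = np.random.default_rng(0)
n = 100
age = rng.uniform(20, 100, n)          # hypothetical ages
batch = rng.integers(0, 2, n)          # two hypothetical batches (0/1)
signal = rng.normal(size=(n, 5))       # biological signal of interest
X = signal + 0.05 * age[:, None] + 2.0 * batch[:, None]

C = np.column_stack([age, batch])
X_adj = regress_out_covariates(X, C)

# After adjustment, each feature is uncorrelated with batch and age.
print(np.corrcoef(X_adj[:, 0], batch)[0, 1])
print(np.corrcoef(X_adj[:, 0], age)[0, 1])
```

Both printed correlations should be zero up to floating-point error, because OLS residuals are orthogonal to every column of the design matrix. A linear adjustment like this handles only additive effects of measured confounders. It does not address nonlinear or unmeasured confounding, or biological differences across species, and these are the limitations that motivate the more flexible methods proposed in this project.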