Deep Learning for Single-Cell Genetics - ABSTRACT
Dissecting the genetic, molecular, and cellular heterogeneity of complex human diseases is imperative for understanding disease pathogenesis and advancing therapeutics. Recent progress in genetics and single-cell genomics has generated massive datasets, such as those from the UK Biobank and the Human Cell Atlas, providing great opportunities to investigate the genetic underpinnings of complex diseases and the cellular diversity across human tissues. However, existing computational and statistical methods are designed for either genetic or single-cell data analysis, and lack the seamless integration needed to link genetic variants to specific cellular processes. This limitation makes it difficult to distinguish disease-causing genetic factors from confounding elements, resulting in an incomplete understanding of cellular and molecular disease mechanisms. In response, we propose a pioneering research field, “single-cell genetics,” aimed at systematically integrating cutting-edge single-cell genomic data into genetic analysis. Our goal is to develop a suite of deep learning methods, serving as the methodological foundation for single-cell genetics, to address fundamental challenges in disease genetics, including variant interpretation, gene discovery, cell prioritization, and disease prediction. This initiative not only leverages recent breakthroughs in artificial intelligence and single-cell genomics to advance genetic discovery, but also explores a new direction that extends the boundaries of genetics towards single-cell resolution. Specifically, we will first develop a novel sequence-based deep learning model to predict gene expression from DNA sequences. This model, trained on extensive single-cell and individual-level data, will enable comprehensive characterization of cell-specific transcriptional regulation and accurate prediction of variant impact across individuals.
Second, we will design a unified deep learning framework that connects single-cell reference data with summary statistics from genetic association studies. Accommodating both common and rare variants, along with various types of single-cell profiling data, this versatile framework will enable the prioritization of disease-critical cells and the identification of cell-specific risk variants and genes. Finally, we will develop novel deep learning-based genetic risk scores that integrate single-cell-resolved annotations with common or rare variants for individualized disease prediction. This approach shifts from the conventional linear, purely statistical formulation of polygenic risk scores to a biologically informed, nonlinear framework, offering improved model interpretability and more accurate disease prediction. Overall, this proposal seeks to establish a next-generation paradigm for genetic analysis, provide foundational deep learning algorithms for understanding the genetic, molecular, and cellular underpinnings of complex diseases, and generate new actionable biological hypotheses about disease mechanisms and therapeutic targets. The resulting software and data resources will be made publicly available through open-access platforms, contributing valuable assets to the broader scientific community.