Statistical Methods for Precision Prevention - PROJECT SUMMARY/ABSTRACT

Prevention holds the greatest potential for reducing the cancer burden in the population. Recent breakthroughs in genomics technologies have propelled advances in deciphering molecular events and understanding the causes of cancer, leading to improved risk stratification and precision prevention. The objective of this application is to develop statistical and computational approaches that harness state-of-the-art genomic data and translate the resulting findings into precision prevention through risk-based intervention strategies. Tumors are heterogeneous, and unraveling this heterogeneity to distinguish indolent from aggressive tumors will improve understanding of the underlying disease process. Emerging spatial omics technology, which simultaneously profiles molecular features and their spatial locations, provides critical information about the tissue microenvironment that is essential for understanding disease development. Analyzing such data poses significant challenges, including non-comparability of spatial omics images across samples, high dimensionality, and limited sample sizes. The goal of Aim 1 is to develop (a) deep learning-based approaches that combine unsupervised and supervised loss functions to learn common features predictive of poor clinical outcomes and (b) robust and efficient data integration approaches for assessing the association of individuals' risk factors with these features, leveraging existing external biomarker summary information to improve efficiency while accounting for heterogeneity across data sources. Identifying risk factors associated with tumor subtypes linked to adverse outcomes can help stratify the population into risk levels, enabling tailored prevention approaches such as determining when to start screening. Evaluating such approaches requires careful cost-effectiveness analysis using large-scale observational data.
However, analyzing such data is complex because of confounding and non-random utilization of the intervention. Further, the timing of screening is continuous and subject to censoring, as some individuals may die or be diagnosed with the disease before initiating screening. The goal of Aim 2 is to develop statistical methods for assessing the cost-effectiveness of time-varying screening by (a) developing a causal-inference-based estimation procedure to quantify the benefit and cost of the intervention and (b) leveraging improved risk prediction from Aim 1 to determine “when to start screening” based on its impact on benefit and cost in the population. Through our collaboration, we will apply the methods to data from the Genetics and Epidemiology of Colorectal Cancer Consortium to gain insight into carcinogenesis and risk assessment, and we will evaluate their translational impact in the Women's Health Initiative cohort. Because our methods are also applicable to other studies, we will develop open-source R- or Python-based software packages, along with detailed manuals and data processing pipelines. These resources will be made available in public repositories or on our websites.