Domain adaptation approaches to unify established and emerging sequencing technologies
PROJECT SUMMARY / ABSTRACT
Advances in sequencing technologies provide new opportunities to interrogate biological systems from multiple perspectives. However, the introduction of new technologies highlights a problem many researchers face: missing data. Missing observations across technologies and biological states are a common problem in computational biology. This missingness can result from limitations in the technology, the rarity of a biological state, or limited adoption of the technology. When one technology yields sparse biological observations, existing, complementary data from an established technology can be leveraged to impute the missing observations. We address these issues using recent methodological advances in machine learning, focusing primarily on domain adaptation techniques. These techniques learn patterns in one dataset that can be adapted to another dataset, enabling cross-technology information sharing. Our proposal introduces a general framework in which domain adaptation techniques can be used to unite an emerging technology with a different but established technology. To highlight the broad utility of this approach, we apply this framework to three biomedical applications: 1) predicting cell-type-specific perturbation response in rheumatoid arthritis; 2) predicting tissue of origin from cell-free DNA (cfDNA); 3) predicting progenitor-specific gene signatures from cfDNA in acute myeloid leukemia (AML). The proposed aims not only unite existing and emerging sequencing technologies, but also enable the discovery of new biology that is difficult or infeasible to observe directly.

The proposed research builds on my experience using statistical approaches for transcriptomic data.
During the K99 phase, I will require further training from my mentoring team in deep generative modeling (Dr. Casey Greene), modeling of single-cell data (Dr. Fan Zhang), and modeling of cfDNA and chromatin accessibility (Dr. Srinivas Ramachandran). The research will be conducted in the Center for Health AI at the University of Colorado Anschutz Medical Campus. At this institution, I will have access to the Colorado Clinical and Translational Sciences Institute and the RNA Bioscience Initiative, which provide resources for building an interdisciplinary and translational research program. This training and these institutional resources will give me a solid foundation on which to build an independent research program focused on domain adaptation applications for high-throughput sequencing technologies.