Predicting the effects of genetic variants on chromatin accessibility with a deep learning approach
PROJECT SUMMARY/ABSTRACT
This project will assess a deep learning approach for predicting the effects of genetic variants on chromatin accessibility (CA). Human genetics research currently faces a knowledge gap in understanding the function and causality of genetic variants, since over 90% of genetic variants have been found within the non-coding region of the genome. Genome-wide association studies (GWAS) have identified many associations between genetic variants and traits, but they have yet to establish the molecular function of these variants. Molecular quantitative trait locus (QTL) analysis has been used to investigate variant function by identifying variants associated with molecular traits, such as chromatin accessibility QTLs (caQTLs). However, due to linkage disequilibrium, molecular QTL analysis cannot unambiguously identify causal variants, and it lacks the power to detect associations with rare genetic variants. Furthermore, it has been shown that allele-specific information, including allele-specific chromatin accessibility (ASCA), can increase the power to detect caQTLs, which could in turn improve machine learning model predictions. Machine learning methods offer an alternative approach to determining variant function: they have been used to infer the molecular function of genetic variants and have successfully predicted gene expression, CA, and transcription factor binding from DNA sequence. However, these models are trained solely on reference genome sequences and do not consider human genetic variation. This proposal will test the hypothesis that a machine learning model that incorporates genetic variation and allele-specific information will accurately predict the effects of both common and rare genetic variants on chromatin accessibility.
To investigate this hypothesis, the proposal has two specific aims: Aim 1 will develop a variant-aware neural network to predict the effect of genetic variants on CA. Aim 2 will predict the function of rare genetic variants. In summary, this proposal seeks to improve predictions of the molecular function of genetic variants in the non-coding region of the genome by assessing the utility of genetic variation and ASCA in a deep learning approach. The proposed study will make it possible to predict the function of rare genetic variants, which are likely to be highly important to disease traits but whose function cannot currently be uncovered using QTL-based methods. This training plan will give the applicant the opportunity to (1) develop expertise in machine learning in genomics, (2) gain skills in CRISPR-based genome editing, (3) improve scientific writing and communication skills, and (4) develop mentoring and teaching aptitude. These professional training goals will provide the applicant with the training and scientific experience required to obtain a postdoctoral fellowship and thereafter become an impactful independent research investigator at an R1 university.