Predicting the effects of genetic variants on chromatin accessibility with a deep learning approach - PROJECT SUMMARY/ABSTRACT
This project will assess a deep learning approach for predicting the effects of genetic variants on chromatin
accessibility (CA). A major knowledge gap in human genetics research is the function and causality of genetic
variants, since over 90% of genetic variants lie within the non-coding region of the genome. Genome-wide
association studies (GWAS) have identified associations between these variants and traits, but have yet to
establish the molecular function of these genetic variants.
Molecular quantitative trait locus (QTL) analysis has been used to investigate variant function by identifying
variants associated with molecular traits, such as chromatin accessibility QTLs (caQTLs). However, due to
linkage disequilibrium, the causal variants underlying molecular QTLs often cannot be resolved unambiguously;
in addition, QTL analyses lack the power to identify associations with rare genetic variants. Furthermore, it
has been shown that
allele-specific information, including allele-specific chromatin accessibility (ASCA), can increase the power to
detect caQTLs, potentially improving machine learning model predictions. An alternative approach to
determining variant function is the use of machine learning methods, which have been applied to infer the
molecular function of genetic variants and have achieved success at predicting gene expression, CA, and
transcription factor binding from DNA sequence. However, these machine learning models are solely trained
on reference genome sequences and do not consider human genetic variation. This proposal tests the
hypothesis that a machine learning model that incorporates genetic variation and allele-specific information
will accurately predict the effects of both common and rare genetic variants on chromatin accessibility. To
test this hypothesis, the project has two specific aims: Aim 1 will develop a
variant-aware neural network to predict the effect of genetic variants on CA. Aim 2 will predict the function of
rare genetic variants. In summary, this proposal strives to establish improved predictions of the molecular
function of genetic variants found in the non-coding region of the genome by assessing the utility of genetic
variation and ASCA with a deep learning approach. The proposed study will enable prediction of the
function of rare genetic variants, which are likely to be highly important to disease traits but whose function
cannot currently be uncovered using QTL-based methods. This training plan will provide the applicant the
opportunity to (1) develop expertise in machine learning in genomics, (2) gain skills in CRISPR-based genome
editing, (3) improve scientific writing and communication skills, and (4) develop mentoring and teaching
aptitude. These professional training goals will provide the applicant with the essential training and scientific
experience required to obtain a postdoctoral fellowship and thereafter become an impactful independent
research investigator at an R1 university.