Project Summary
Alcohol use disorder (AUD) is a major public health challenge in the USA and worldwide. According to the 2018
National Survey on Drug Use and Health, 14.4 million adults aged 18 and older had AUD, including 9.2 million men
and 5.3 million women. Furthermore, in 2014, alcohol-impaired driving accounted for 9,967 deaths in
the USA. Despite the importance of AUD, little research has sought to identify the predisposing biological
factors that may lead to its development. While predictive models have been successful in
distinguishing between individuals with AUD and healthy controls, models that identify in advance whether an
individual is prone to develop AUD are still lacking, and the biomarkers indicating a predisposition for AUD
remain unclear. To
address this, the Collaborative Study on the Genetics of Alcoholism (COGA) has recruited European American (EA)
and African American (AA) subjects aged 8-68, who have been followed longitudinally and evaluated for
AUD over 30 years. The subjects were also assessed through electrophysiology (EEG), single-nucleotide
polymorphism (SNP) genotyping, psychosocial and psychiatric evaluations, and demographic questionnaires. The goal of
our proposed study is to conduct secondary analyses of COGA’s rich multimodal longitudinal data to develop
models that accurately predict vulnerability to AUD before an individual develops the
disorder. Machine learning (ML) methods hold particular promise for addressing this problem. Over the last
decade, ML methods applied to complex biomedical data have generally outperformed classical regression
approaches, suggesting that multi-dimensional modeling of genetic, biological and psychosocial data may best
reflect the underlying pathophysiology of AUD. Thus, in this project, we will leverage innovative ML methods,
especially those based on deep and ensemble learning, and the rich COGA data to develop multimodal
predictive models of vulnerability to the disorder. Furthermore, most AUD predictive modeling
work has been conducted in EA populations, which calls for more studies among underrepresented groups,
including AA individuals and females, so that the benefits of precision medicine can reach all populations. Therefore, we
will conduct our predictive modeling analyses in subgroups stratified by age, sex, and ancestry (AA, EA). We
will also rigorously evaluate the developed predictive models in an independent validation set stratified by
the same criteria. Finally, we will employ systematic interpretation strategies for the models to identify EEG,
genetic (SNP, polygenic risk scores), psychosocial, psychiatric and demographic features that contribute most
strongly to accurate AUD prediction. At the conclusion of the secondary analyses outlined in this
proposal, we expect to have identified an accurate, generalizable multimodal predictive model of vulnerability
to AUD, along with features associated with this serious disorder. Our work is likely to
contribute to a deeper understanding of this major public health challenge, as well as to its personalized
diagnosis and treatment.