Project Summary
Psychotic and mood disorders are major drivers of disability and health care costs. There is
considerable clinical heterogeneity among patients. Developing clinically implementable machine learning (ML)
tools to enable accurate patient stratification is critically important for supporting effective personalized
treatment plans. Among the factors contributing to heterogeneity, childhood trauma is an under-recognized
source. Childhood trauma is common among adults with psychiatric disorders. Robust evidence
shows that: i) individuals exposed to childhood abuse are 2-3 times more likely to develop a psychiatric disorder
later in life, particularly psychosis; ii) childhood traumas impact critical windows of brain development and can
trigger the onset of psychosis; and iii) among patients with psychotic and mood disorders, childhood trauma
influences psychopathology and is associated with more severe symptoms, poorer long-term outcomes (longer and
more frequent relapses or rehospitalizations), substance abuse, treatment resistance, and poor social
functioning. Although evidence clearly indicates that childhood trauma contributes to psychiatric
risk and poor treatment outcomes, large-scale computational approaches to stratify subpopulations, extract
trauma features (e.g., frequency, type), and examine the impact of trauma features on
psychopathology and treatment outcome have yet to be developed. We propose to create gold-standard
annotations from electronic health records (EHRs) and to leverage natural language processing (NLP) and ML
methods to develop a standardized, reusable data model for automatically extracting trauma-related features,
complex concepts, and symptom dimensions from EHRs. We will train and evaluate a semi-supervised NLP
model, built as a joint sequence model that both identifies named entities and extracts the
relations between them. We will apply multiple strategies to validate the robustness of our model. Our proposed
NLP model is essentially a “computational version of a chart review” tool, designed to mimic human chart review
while running automatically and at scale. We will use this model to stratify psychosis subgroups (with
or without a childhood trauma history) and to correlate the extracted features with important clinical
outcome variables. Importantly, the annotation guidelines, corpus, and data model we develop will be
valuable resources to researchers in the field. The study builds on existing collaborations between a team
experienced in psychiatric phenotyping and application of EHRs, and a team active in developing and applying
emerging methods in ML to natural language data. The model architecture developed in this application will lay
the groundwork for a future clinical trial application.