Identification of Trauma-related Features in EHR Data for Patients with Psychosis and Mood Disorders - Project Summary

Psychotic and mood disorders are a major driver of disability and health care cost, and patients with these disorders show considerable clinical heterogeneity. Developing clinically implementable machine learning (ML) tools that stratify patients accurately is critically important for supporting effective personalized treatment plans. Among the factors contributing to this heterogeneity, childhood trauma is an under-recognized source, even though its prevalence is substantial among adults with psychiatric disorders. Robust evidence shows that: i) individuals exposed to childhood abuse are 2-3 times more likely to develop a psychiatric disorder later in life, particularly psychosis; ii) childhood trauma affects critical windows of brain development and can trigger the onset of psychosis; and iii) among patients with psychotic and mood disorders, childhood trauma influences psychopathology: it is associated with more severe symptoms, poorer long-term outcomes (longer and more frequent relapses or rehospitalizations), and substance abuse, and affected patients are often treatment resistant and function poorly in society. Although the evidence clearly indicates that childhood trauma contributes to psychiatric risk and poor treatment outcomes, large-scale computational approaches that stratify subpopulations, extract trauma features (e.g., frequency, type), and examine the impact of those features on psychopathology and treatment outcome have yet to be developed. We propose to create gold-standard annotations from electronic health records (EHRs) and to leverage natural language processing (NLP) and ML methods to develop a standardized, reusable data model for automatically extracting trauma-related features, complex concepts, and symptom dimensions from EHRs.
We will train and evaluate a semi-supervised NLP model, built as a joint sequence model that both identifies named entities and extracts the relations between them. We will apply multiple strategies to validate the robustness of the model. The proposed NLP model is essentially a “computational chart review” tool: it is designed to mimic human chart review but runs automatically and at scale. We will use this model to stratify psychosis subgroups (with or without a history of childhood trauma) and to correlate the extracted features with important clinical outcome variables. Importantly, the annotation guidelines, corpus, and data model that we develop will be valuable resources for researchers in the field. The study builds on existing collaborations between a team experienced in psychiatric phenotyping and the application of EHRs and a team active in developing and applying emerging ML methods to natural language data. The model architecture developed in this application will lay the groundwork for a future clinical trial application.
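
To make the idea of entity and relation output feeding subgroup stratification concrete, the following is a minimal illustrative sketch in Python. It is not the proposed semi-supervised model: the extraction results are written by hand, and every name in it (the `TRAUMA_TYPE`, `LIFE_STAGE`, and `OCCURRED_DURING` labels, the `NoteExtraction` record, the patient identifiers, and the note text) is a hypothetical placeholder rather than part of the project's annotation scheme or data model. It only shows how entities linked by relations could be used to split patients into groups with and without a documented childhood trauma history.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    label: str  # hypothetical entity type, e.g. "TRAUMA_TYPE"
    text: str
    start: int  # character offsets into the note text
    end: int


@dataclass(frozen=True)
class Relation:
    label: str  # hypothetical relation type, e.g. "OCCURRED_DURING"
    head: Entity
    tail: Entity


@dataclass
class NoteExtraction:
    """Hand-written stand-in for what a joint entity/relation model might emit."""
    patient_id: str
    entities: list = field(default_factory=list)
    relations: list = field(default_factory=list)


def span(note_text, phrase, label):
    """Build an Entity by locating `phrase` in `note_text` (illustration only)."""
    start = note_text.find(phrase)
    if start < 0:
        raise ValueError(f"{phrase!r} not found in note text")
    return Entity(label, phrase, start, start + len(phrase))


def has_childhood_trauma(note):
    """True if any trauma entity is linked to a childhood life-stage entity."""
    return any(
        r.label == "OCCURRED_DURING"
        and r.head.label == "TRAUMA_TYPE"
        and r.tail.label == "LIFE_STAGE"
        and r.tail.text.lower() == "childhood"
        for r in note.relations
    )


def stratify(notes):
    """Group patient ids by whether any of their notes documents childhood trauma."""
    flagged = {}
    for note in notes:
        flagged[note.patient_id] = (
            flagged.get(note.patient_id, False) or has_childhood_trauma(note)
        )
    groups = {"with_trauma": [], "without_trauma": []}
    for pid in sorted(flagged):
        groups["with_trauma" if flagged[pid] else "without_trauma"].append(pid)
    return groups


# Synthetic example; no real patient data.
text_a = "Reports physical abuse during childhood."
trauma = span(text_a, "physical abuse", "TRAUMA_TYPE")
stage = span(text_a, "childhood", "LIFE_STAGE")
example_notes = [
    NoteExtraction("P001", [trauma, stage],
                   [Relation("OCCURRED_DURING", trauma, stage)]),
    NoteExtraction("P002"),  # note with no trauma-related findings
]
groups = stratify(example_notes)
print(groups)
```

In this sketch a patient is placed in the trauma subgroup only when a trauma entity is explicitly linked to a childhood life stage, which is one possible reason for extracting relations jointly with entities rather than matching keywords alone. Real notes would also require handling of negation, uncertainty, and mentions about other people, none of which is modeled here.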