Joint learning methods for event and relation extraction from clinical narratives - Project Summary

Electronic health records (EHRs), which detail patient status and all aspects of clinical care, can greatly facilitate quality improvement and surveillance initiatives and can revolutionize clinical research. The unstructured clinical narratives in EHRs document critical information, including medical problems, treatments, and diagnostic tests, as well as the rationale for care and outcomes. Natural Language Processing (NLP) and Information Extraction (IE) systems target the identification of such critical information from clinical narratives. These systems extract clinical concepts such as medical problems, treatments, and tests; determine the attributes of these concepts to clarify their presence or absence and other details in a patient; and identify the interactions of these concepts with each other in terms of predefined relations. Most clinical NLP systems that tackle the extraction of this information are pipeline-based: extraction of clinical concepts precedes the determination of their attributes and of the relations between them. While producing promising results, these systems suffer from two major limitations: (1) when faced with data imbalance, they perform best on the more prevalent classes of observations found in the data and worse on the less prevalent ones, and (2) they allow errors to cascade between the components. These two limitations can also compound each other. As a result, the information extracted by NLP systems can be incomplete and coarse-grained, unable to support clinical applications that require a more fine-grained picture of the patient condition.
In this project, we propose to address these limitations on a clinical information extraction task that aims to capture a more complete picture of the patient condition with a novel, fine-grained, hierarchical schema for clinically-salient events and their relations. We define clinically-salient events as medical problems, treatments, and tests that are documented during patient care. We capture each event in a frame that consists of a trigger and a set of fine-grained attributes. We build event–event relations on top of events. To address data imbalance, we propose (i) a novel active learning framework that guides manual annotation efforts towards diverse and informative samples that can boost automated recognition of less prevalent attributes and relations. To address cascading errors, we propose (ii) a novel joint learning system that enables multiple tasks to inform each other for better performance across all tasks. We evaluate our work on multiple note types from multiple institutions. Expected outcomes include (1) a comprehensive heterogeneous gold-standard dataset created from multiple institutions for clinically-salient events and relations, (2) NLP methods that generate state-of-the-art results in extraction of events and relations, and (3) publications that document our findings. The annotation guidelines and schema, the gold-standard annotations, and the NLP models and tools created during the project will be shared with the research community.
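As an illustration only, the event frame described above (a trigger plus a set of fine-grained attributes, with event–event relations built on top of events) could be represented roughly as follows. This is a minimal sketch, not the project's schema: the class names `EventFrame` and `EventRelation`, the helper `make_event`, the attribute key `assertion`, and the relation label `treatment_improves_problem` are all hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EventFrame:
    """One clinically-salient event: a trigger span plus its attributes."""
    event_id: str
    event_type: str              # "problem", "treatment", or "test"
    trigger: Tuple[int, int]     # character offsets (start, end) in the note
    trigger_text: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class EventRelation:
    """A typed, directed relation between two events, referenced by id."""
    relation_type: str           # hypothetical label, not from the source
    source_id: str
    target_id: str


def make_event(note: str, text: str, event_id: str, event_type: str,
               **attributes: str) -> EventFrame:
    """Build a frame by locating the first occurrence of `text` in `note`."""
    start = note.index(text)     # raises ValueError if the trigger is absent
    return EventFrame(event_id, event_type, (start, start + len(text)),
                      text, dict(attributes))


def relations_for(event_id: str,
                  relations: List[EventRelation]) -> List[EventRelation]:
    """Return every relation in which the given event participates."""
    return [r for r in relations if event_id in (r.source_id, r.target_id)]


# Toy sentence invented for this example.
note = "Chest pain resolved after nitroglycerin."
problem = make_event(note, "Chest pain", "E1", "problem", assertion="present")
treatment = make_event(note, "nitroglycerin", "E2", "treatment")
relations = [EventRelation("treatment_improves_problem", "E2", "E1")]
```

Keeping relations as separate records that point to events by id mirrors the layering in the text: events are defined first, and relations are built on top of them.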