Cumulus: A Universal Research Sidecar for a SMART Learning Healthcare System

Applicant Name: Boston Children’s Hospital
Physical Address: 300 Longwood Ave, Boston, MA 02115-5724
Contact Name: Jamie Chan
Contact Phone Numbers (Voice, fax): 617-919-2729
E-mail address: osp@childrens.harvard.edu

Project Abstract

To achieve the National Academy of Medicine’s learning health system (LHS) vision, outcomes of Patients 1 through n should always inform the treatment of Patient n+1. This vision has been elusive. As highlighted in the Office of the National Coordinator for Health Information Technology (ONC) report, “Widespread adoption of electronic health record (EHR) systems and consumer electronics has resulted in large volumes of electronic health-related data. This has created opportunities for researchers to leverage the capabilities of an evolving digital architecture. However, progress remains slow because of challenges with both the data and the health information technology (IT) infrastructure that support research uses.” We propose Cumulus, which will be scalable across the health system and will support rapid learning with turnkey functions.

To apply knowledge gleaned from populations to individual patients, cohorts of patients with similar characteristics must be constructed and analyzed. EHRs are a natural source for these cohorts: their data are produced at scale and are a byproduct of care. Instead of promoting massive centralization of health data, we take a federated approach, with readily deployable local infrastructure (EHR “Sidecars”) and the capacity to query across sites. Ten years after our original ONC-funded SMART project, we propose working with ONC to scale tools and develop functions that leverage the SMART/HL7 Bulk Data Export application programming interface (API).
We architect open infrastructure, software, and technical specifications to embed LHS research capacity within commodity health IT infrastructure. The proposed products will be relevant to advancing the clinical trials enterprise, drug development, health services research, machine learning and artificial intelligence, implementation research, and regulatory science.

Because creating a rapid LHS is a bold, multi-faceted undertaking, our proposal is concrete and tightly scoped to address four objectives. Objective 1 recognizes that extracting EHR data currently requires specialized local teams and complex processes to transform data for an analytics engine. To address this gap and enable shared cohort definitions, we build on the SMART/HL7 Bulk Data API, creating a common FHIR-based schema with annotations that support efficient analytics, and a standardized, scalable transformation pipeline to populate it. Objective 2 ensures that information in the text of clinical notes can contribute, alongside structured data, to rapid learning. We incorporate open source natural language processing software into the Objective 1 pipeline, outputting FHIR data that augments the US Core Data for Interoperability (USCDI). Under Objective 3, we protect data leaving healthcare sites through de-identification, adapting existing open source anonymization tools to the Objective 1 pipeline. Objective 4 produces the Cumulus app, a user interface for identifying cohorts in the annotated FHIR data from Objectives 1 through 3.

We use commercial cloud services from major vendors, likely focusing initially on offerings that include FHIR servers and SMART launch capabilities. In the out years of LEAP, we develop federated query across Cumulus nodes for health system-level learning and define a formal API for Cumulus bulk data apps. Our extraordinary stakeholder panel will guide our design, review our results, and advise on a strategy to scale Cumulus.