Project Summary
Status epilepticus (SE) is a neurologic emergency associated with high risk of neurologic decline and
readmission. Mortality, length of stay, and cost all increase when patients in SE progress to refractory status
epilepticus (RSE). SE is clinically heterogeneous and broadly defined, which is a barrier to conducting
randomized trials and contributes to pervasive diagnostic delays and treatment variability. Further, rare
subtypes of SE, such as new-onset refractory status epilepticus (NORSE), remain poorly understood. Case
definitions that are extractable from the electronic health record (EHR) are necessary for population-level
surveillance of SE, including NORSE. Such surveillance would support identifying high-risk groups and
associated conditions and exposures, early diagnosis, determining incidence, establishing natural history,
and targeting therapies. EHR and administrative case definitions for SE do not exist, and current methods of
identifying patients with SE using only structured EHR data are prone to bias. More generally, prediction
models built only on structured data often have limited utility.
To our knowledge, our proposed project is the first attempt at large-scale multidimensional phenotyping for SE
using unstructured data. We hypothesize that building consensus around the spectrum of clinical phenotypes
of SE and using natural language processing (NLP) to identify and classify SE together constitute an
essential first step toward the creation of SE registries and of comparative effectiveness and pragmatic
trials of RSE prevention. In Aim 1, we will apply an innovative sequential mixed-methods approach, using
(a) a modified Delphi method to establish consensus on the labels that identify relevant information elements
(“ground truth”) and (b) a discrete choice experiment (DCE) to identify and rank time-evolving attributes of
SE. This will allow us to study whether attributes are weighted differently over the course of an SE
admission, and whether risk trajectories toward
developing RSE are identifiable. In Aim 2, we will leverage unstructured EHR data from two large academic
centers and apply NLP to develop a standardized model for extracting symptom dimensions, clinical features,
and complex concepts of SE from EHRs. Such a model could then categorize SE by clinical outcome,
specifically progression to RSE. This tool would lay an essential foundation for future comparative
effectiveness and pragmatic trials of potentially modifiable preventive factors for RSE. These trials would
in turn support the development of clinical decision support tools, quality metrics, and performance
measures for SE and RSE management.