Abstract: Data-Driven Sleep Biomarkers of Brain Health, Heart Health, and Mortality
Sleep state signals encode critical biological information about brain and cardiovascular health. However,
current approaches to analyzing polysomnography data (“sleep studies”) discard most of the collected information.
Relying on visual analysis and scoring rules dating from the 1960s, they yield relatively unsophisticated metrics
(e.g., sleep stages in 30-second epochs, the apnea-hypopnea index). Visual scoring is also limited by inconsistency
between scorers. Recent advances in computational science and Machine Learning (ML) / Artificial Intelligence
(AI) open the way to 1) standard scoring with unparalleled precision and consistency and 2) new data-driven,
quantitative measures. There is a critical unmet need for new tools, algorithms, and datasets that leverage modern
data science to develop robust sleep-based biomarkers of brain and cardiovascular health.
We propose to create a Complete AI Sleep Report (CAISR) algorithm that produces all standard sleep measures,
together with a progressively expanding library of novel analytics. We are ideally positioned to close this gap.
Across our six collaborating institutions, we will assemble sleep data from >200K patients (35,000 already
assembled). We have experience curating large clinical physiology and electronic medical record data for research;
we have already made progress building a scalable public data-sharing portal; we have deep expertise in basic
and translational sleep science; and we have an established record of successfully developing and validating
novel deep learning tools and algorithms to analyze sleep data.
Our long-term goal is to increase the value of sleep physiology data by replacing manual analysis with open-source,
data-driven AI approaches. Our central hypothesis is that sleep signals carry measurable latent
information about mortality and brain and heart health. Our specific aims are: 1) Create an online public portal
with de-identified polysomnograms (PSGs) and cross-sectional and longitudinal electronic health record (EHR)
data for >200K adult and pediatric patients; 2) Implement CAISR and validate that it generalizes across age,
sex, and race. CAISR will also be externally validated on >13,000 PSGs from public research cohorts; 3) Develop
AI algorithms that a) differentiate patients with vs. without existing brain and heart disease; b) predict primary
outcomes of all-cause and cardiovascular mortality, and secondary outcomes of heart disease (coronary artery
disease, myocardial infarction, congestive heart failure, atrial fibrillation, hypertension) and brain disease
(dementia, stroke, intracranial hemorrhage).
Completing these aims will lead to the following expected outcomes: (1) a public resource of sleep data across the
lifespan; (2) sleep scoring AI algorithms validated across age, sex, and ethnicity; and (3) predictors of mortality
and brain and heart health. These outcomes will generate new testable hypotheses, make sleep diagnostics more
accessible to socially and biologically underserved groups, and stimulate progress in data-driven sleep research.