New machine learning methods for extracting features from digital health data with applications to sleep apnea

ABSTRACT
While obstructive sleep apnea (OSA) is linked to metabolic syndrome, randomized clinical trials of positive airway pressure (PAP) therapy, a treatment for OSA, have produced inconclusive results regarding the effects of therapy on glucose metabolism and cardiovascular disease. Continuous glucose monitors (CGM) and ambulatory blood pressure monitors (ABPM) are increasingly used to elucidate the effects of OSA on glucose metabolism and blood pressure. However, there is a large gap between the complexity of data from wearable devices and the crude summary measures in common use (e.g., glucose time in range, 24-h blood pressure mean). Without advances in algorithms that better characterize temporal features in CGM and ABPM data, the adverse effects of OSA on glucose metabolism and blood pressure will likely remain underappreciated, and elucidating the heterogeneity of treatment effects with PAP therapy will remain difficult. Our proposal is motivated by data from a randomized clinical trial on the effects of PAP therapy on glycemic measures and blood pressure in patients with concurrent type 2 diabetes and OSA. Conventional CGM and ABPM summaries lack the sensitivity to differentiate control and treatment groups and to detect heterogeneity in treatment effects. The overall objective of this proposal is to develop novel statistical and machine learning methods that fully exploit CGM and ABPM data for precision phenotyping of glycemic and cardiovascular measures. To achieve our objective, we propose the following:
(1) To address the limited statistical power of existing methods for characterizing features of the glycemic state, we will develop a distributional data analysis framework for CGM-based glycemic measures that incorporates global and local temporal characteristics.
(2) To address the limitations of existing methods for ABPM data, which analyze each blood pressure modality (systolic, diastolic) separately and oversimplify time by dividing it into night (0-6 h) and day, we will develop a tensor data analysis framework for ABPM data that integrates all concurrent measurements (systolic, diastolic, mean arterial pressure, heart rate) aligned over the full 24-h period.
We will develop software to enable broad application of the proposed algorithms to other studies that collect CGM and ABPM data. Completion of these aims will provide the algorithmic and computational tools needed to test whether behavioral interventions and new pharmacotherapeutic agents improve glycemic status and blood pressure in population subsets, including those with prediabetes and hypertension. Reducing the metabolic and cardiovascular risk burden has unquestionable relevance for the prevention of cardiovascular morbidity and mortality.
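
The gap between a crude summary measure and a distributional representation of CGM data can be illustrated with a minimal sketch. This is not the proposed framework. It assumes the conventional 70-180 mg/dL target range for time in range, represents each glucose trace by its empirical quantile function, and compares traces by the L2 distance between quantile functions (an approximation of the 2-Wasserstein distance for one-dimensional distributions). The function names, grid size, and synthetic traces are hypothetical and chosen only for illustration.

```python
import numpy as np

def time_in_range(glucose, low=70.0, high=180.0):
    """Fraction of readings within [low, high] mg/dL (assumed conventional range)."""
    g = np.asarray(glucose, dtype=float)
    return float(np.mean((g >= low) & (g <= high)))

def quantile_function(glucose, n_grid=99):
    """Empirical quantile function evaluated on an even grid of probabilities."""
    probs = np.linspace(0.01, 0.99, n_grid)
    return probs, np.quantile(np.asarray(glucose, dtype=float), probs)

def quantile_l2_distance(g1, g2, n_grid=99):
    """L2 distance between quantile functions (approximates 2-Wasserstein in 1D)."""
    _, q1 = quantile_function(g1, n_grid)
    _, q2 = quantile_function(g2, n_grid)
    return float(np.sqrt(np.mean((q1 - q2) ** 2)))

# Two synthetic 24-h traces at 5-minute sampling (288 readings each).
# Trace A spends 25% of the time above range; trace B spends 25% below range.
trace_a = np.concatenate([np.full(72, 200.0), np.linspace(80.0, 170.0, 216)])
trace_b = np.concatenate([np.full(72, 60.0), np.linspace(110.0, 130.0, 216)])

tir_a = time_in_range(trace_a)
tir_b = time_in_range(trace_b)
dist_ab = quantile_l2_distance(trace_a, trace_b)

print(f"time in range: A={tir_a:.2f}, B={tir_b:.2f}")
print(f"quantile L2 distance between A and B: {dist_ab:.1f} mg/dL")
```

Both synthetic traces have the same time in range even though one reflects hyperglycemia and the other hypoglycemia, while the quantile-based distance separates them. Note that quantile functions alone discard temporal ordering, so they capture only the global distributional aspect mentioned in aim (1).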