Linking endotype and phenotype to understand COPD heterogeneity via deep learning and network science - Summary/Abstract Chronic obstructive pulmonary disease (COPD) is the 4th leading cause of death worldwide, resulting in an immense public health burden. The clinical manifestations of COPD are extremely heterogeneous, and disease course is affected by numerous endogenous and exogenous factors. Finding groups of patients with similar pathobiology is crucial for the accurate prediction of disease progression and the development of personalized treatments. Currently, clinical research is divided between two approaches to grouping patients: one based on phenotypic features such as lung function, exacerbation frequency and intensity, and presence of emphysema (clinical subtyping), and one based on the molecular composition of biological samples, as assessed through multi-omics assays (molecular subtyping). Although each approach has provided some insight into different groupings of COPD patients, little agreement has been found between the two classifications. As such, the connection between pathophysiological processes, exposures, and their phenotypic consequences is currently unclear. In this application, we propose to use deep neural network architectures to integrate phenotypic and genomic data of COPD subjects and construct integrated patient profiles that describe the phenotypic and molecular features of each patient simultaneously. These profiles will be used to cluster patients to find joint clinical and molecular subtypes (endotypes) for COPD and to predict disease outcomes across a 5-year time span. We will extract the characteristic clinical and molecular features of each endotype to obtain endotype-specific biomarkers and connect them to clinical manifestations of COPD. Finally, we will develop network-based approaches to understand the key molecular pathways and regulators associated with each endotype.
Achieving the objectives proposed in this plan will require a unique set of skills that span biology, network science, machine learning, and lung disease biology. Although Dr. Maiorino’s past career trajectory has prepared him well for the proposed research, advancing our current understanding of COPD heterogeneity is a challenging task that will require further training in specific areas. To this end, Dr. Maiorino has developed a comprehensive training program focusing on pulmonary disease biology, omics data integration, and high-dimensional statistics. He will take advantage of the rich intellectual environment offered by the Channing Division of Network Medicine and Harvard Medical School to attend courses and participate in regular meetings with his mentors and advisory board members. Altogether, Dr. Maiorino’s training and research plan will enable him to expand his current skill set and to develop into an independent investigator contributing to the advancement of precision medicine in COPD.