Project Summary
This proposal aims to develop, evaluate and disseminate novel statistical tools for rigorous investigation of the
clinical spectrum, biological underpinnings and social determinants of Severe Acute Respiratory Syndrome
Coronavirus 2 (SARS-CoV-2) infection, COronaVIrus Disease 2019 (COVID-19), and Post-Acute Sequelae of
SARS-CoV-2 infection (PASC). Given the enormous scale of the COVID-19 pandemic, the potential severity of PASC,
the complexity of available and anticipated data streams, and the paucity of biological and clinical knowledge,
there is an urgent need for novel and robust statistical methods to address the most pressing PASC-related
research questions. Advanced statistical methods for observational data can be leveraged to address many of
these questions; however, the identification, rigorous application and advancement of appropriate methods require
sophisticated understanding of both the clinical context and the nuanced capabilities of available methods. To
this end, we propose statistical innovations and novel translation of existing methods to significantly advance
PASC clinical research, bringing together a team of physician-scientists and biostatisticians who are deeply
embedded in COVID-19 clinical research to achieve the following specific aims: Aim 1: Develop and evaluate a
causal mediation analysis framework for investigating the mechanistic pathways from SARS-CoV-2 infection to
PASC and PASC recovery, including methods to accommodate time-varying and unevenly spaced mediators
and time-to-event outcomes; Aim 2: Apply, evaluate, and extend marginal structural models as a framework to
assess the impact of interventions on the likelihood of PASC-associated severe outcomes, in the context of
time-varying confounding, competing and semi-competing risks, interval censoring and unobserved disease
subtypes; Aim 3: Develop and apply methods for positive-unlabeled data, using an expectation-maximization
approach that leverages measured covariates and information on patient-level outcomes. The proposed
methods will be applied to emerging local and national observational and electronic health record (EHR) data resources. These
statistical innovations will transform our understanding of the clinical course of PASC as we lead their rigorous
application to several leading-edge data resources.