ABSTRACT
The COVID-19 pandemic has disproportionately affected African Americans (AA), causing high rates of severe
disease and death. Sickle cell disease (SCD), a life-threatening disorder primarily affecting AA (1 in 365 AA), has
magnified the impact of COVID-19 on the AA community. Prior research on COVID-19 in people with SCD is limited by
small sample sizes, heterogeneity in study designs, and data quality issues that call its generalizability into question.
The majority of these studies were conducted during the early phase of the pandemic and focused on
evaluating the associations of clinical risk factors with adverse outcomes. These studies did not evaluate the impact
of social risk factors (social vulnerability indicators, rural location, social determinants of health) and the secular
trends in the pandemic (viral variants, vaccines, antiviral therapies). Therefore, the true extent to which SCD is a
biological risk factor for COVID-19 severity and how it interacts with other known biological and social risk factors
for COVID-19 remain unclear. Furthermore, the prevalence of long COVID, the post-acute sequelae of SARS-CoV-2
infection (PASC), and the morbidity it causes are unknown in the SCD population. The proposed study will address these
knowledge gaps using a data-driven strategy. The main objective is to characterize the epidemiology and
outcomes of COVID-19 in SCD patients using the NIH-initiated National COVID Cohort Collaborative (N3C), a
harmonized electronic health record (EHR) repository. This registry is representative of the US population, with
~40 million patients [SCD patients, 15,169; COVID-positive SCD patients, 4,143]. Aim 1 will determine the biological and social risk factors
for adverse outcomes of COVID-19 in people with SCD. Sub-aim 1a will quantify the relative risk of COVID-19
adverse outcomes (severe COVID-19 and mortality) between propensity score (PS)-matched cohorts of SCD and
non-SCD patients and clarify whether SCD is a biological risk factor. To identify the key biological or social risk factors for
adverse outcomes, multivariate analyses will be performed. Sub-aim 1b will identify biological and social risk
factors for adverse outcomes in SCD patients. These analyses will be limited to the SCD cohort with COVID-19. Advanced
statistical methods will be employed to overcome the weaknesses inherent in the use of registry data (e.g.,
coding errors, missing data). The exploratory aim, Aim 2, will characterize the subphenotypes of long COVID
in AA individuals with SCD and determine whether they differ from those in the general population. This aim will use
existing machine learning methods available within N3C to identify clusters of long COVID using Human
Phenotype Ontology-encoded EHR data. These analyses will identify the clinical characteristics of the subphenotype
clusters of long COVID in AA individuals with SCD and compare them with those in the general population. Upon completion, our
study will provide real-world data to guide both clinical practice and public policymaking for preventing and
managing severe COVID-19 in people with SCD. Future studies will focus on identifying the predictors of developing
long COVID and understanding how it affects SCD comorbidity. Collectively, these discoveries will improve health
equity and the management of acute COVID-19, now an endemic disease, and long COVID in the SCD population.