Project Summary
In its 2019 Antibiotic Resistance Threats Report, the Centers for Disease Control and Prevention listed
Clostridioides (formerly Clostridium) difficile as an urgent threat. As the most common healthcare-associated
infection, it has an enormous impact on both the lives of individuals and the healthcare system at large.
C. difficile infection (CDI) is most often associated with recent antibiotic use, as broad-spectrum
antibiotics can disrupt the normal gut microbiota, which in turn allows C. difficile spores
to germinate and overwhelm the remaining microbiome that normally keeps vegetative C. difficile at bay.
Although the patient risk factors for CDI are fairly well understood, the role of genetic variation in the
infecting strain in progression to severe CDI is less so. Given the extensive diversity among common
C. difficile strains in both core-gene nucleotide sequences and gene content, it is likely
that strains differ significantly in how they interact with the host. Indeed, there
have been numerous reports of variation in the propensity for certain sequence types to cause severe disease,
although the genetic variation mediating strain-level differences is largely unknown.
In this proposal, I take a data-driven approach to identify genetic variants influencing patient immune
responses and clinical trajectories. To accomplish this, I will leverage a large data repository created through
comprehensive sampling of all C. difficile-positive cases at Michigan Medicine. Included in this repository are
1,678 whole-genome-sequenced C. difficile isolates, associated processed electronic health record data from
1,516 patients, and serum banked at the time of CDI for 1,178 patients. Serum cytokine levels have
already been determined for 220 of these patients. Preliminary studies conducted in support of this proposal
demonstrate that variation encoded in the genomes of infecting strains is predictive of both initial patient
immune responses and subsequent severe infections, supporting the contribution of strain genetic background
to patient clinical trajectories. I will build upon these studies to identify the specific variants, genes,
and pathways that mediate variation in clinical outcomes. To this end, I will employ a combination of
machine learning and bacterial genome-wide association studies (bGWAS) to gain insight into bacterial genetic
features that influence patient immune response as quantified by serum cytokine measurements, as well as
bacterial genetic variation associated with severe outcomes. I will then validate these bioinformatic findings by
comparing model predictions with observed 1) cytokine measures on
withheld serum samples, and 2) in vivo severity outcomes in a mouse model of CDI. The resulting understanding
of the genetic factors of C. difficile that impact patient cytokine response and severity outcome can then be
leveraged to improve current treatment strategies and to identify novel therapeutic targets against CDI.