Whole-genome sequencing (WGS) has transformed our ability to track the spread of pathogens in healthcare
settings. With the ability to identify patients linked by transmission has come the capacity to determine with
high confidence the role of certain hospital locations, contaminated infrastructure, and colonized healthcare
personnel in mediating the spread of infections in hospitals. Moreover, broad integration of genomic with
clinical data has the potential to identify not just pathways of transmission, but also patient characteristics and
hospital practices that influence organism-specific transmission rates. However, realizing the potential of WGS
as a tool for precision infection prevention will require overcoming critical barriers. The most significant
challenges stem from the role that epidemic lineages play in the overall antibiotic resistance epidemic. It has
been shown that the majority of antibiotic resistance in healthcare settings is due to the importation and spread
of epidemic lineages that have reached high prevalence in regional healthcare networks. Due to the high
prevalence of a small number of strains, it becomes challenging even with WGS to determine whether two
infected patients are linked by transmission within the hospital, or if one or both patients acquired their
infections during a previous community or healthcare exposure. The standard approach for discerning if two
patients are linked by transmission is to employ species-specific thresholds for the number of single nucleotide
variants (SNVs) separating the two patients' isolates: above the threshold, the patients are concluded not to be
linked by transmission, and below it, transmission is deemed likely. However, there is a great deal of evidence that
applying these SNV thresholds can lead to both false-positive and false-negative transmission inferences.
Sources of error include the difficulty of discriminating intra-facility transmission from recent transmission at a
connected healthcare facility, and higher-than-expected SNV differences between true transmission pairs due to
mutation accumulation during long-term colonization. Here, we seek to develop, validate, and apply sampling,
sequencing, and analysis strategies to enable accurate transmission inference in high-prevalence endemic
settings. In Aim 1, we will build on preliminary data showing that we can group patients linked by transmission
in an SNV-threshold-free manner, and evaluate several methods for detection of intra-facility transmission
clusters. In Aim 2, we will develop and apply population sequencing strategies to comprehensively detect and
track the spread of multiple strains between patients. In Aim 3, we will expand the analysis of population
sequencing data to incorporate sharing of unfixed alleles into transmission inference. Lastly, we will apply our
optimized genomic epidemiology toolkit to determine the relative contributions of importation, patient-to-patient
transmission, environmental contamination, and intra-patient evolution to the burden of colonization with five
high-priority multidrug-resistant organisms (MDROs) in an ICU over the course of a year. In total, we expect the results of this proposal to enable
the routine use of genomics to track and prevent the spread of infections in hospitals.
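
For illustration only, the standard SNV-threshold approach described above can be sketched as follows. This is a minimal sketch, not the method proposed here: the function names, the toy sequences, and the threshold value are hypothetical, the input is assumed to be pre-aligned sequences of equal length, and treating a distance equal to the threshold as a putative link is an assumption.

```python
from itertools import combinations


def snv_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions where both sequences have an unambiguous
    base (A, C, G, T) and the bases differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def putative_links(isolates: dict[str, str], threshold: int) -> list[tuple[str, str, int]]:
    """Return patient pairs whose isolates differ by at most `threshold` SNVs.

    `isolates` maps a hypothetical patient ID to that patient's aligned
    isolate sequence. Pairs above the threshold are treated as unlinked.
    """
    links = []
    for (pid_a, seq_a), (pid_b, seq_b) in combinations(sorted(isolates.items()), 2):
        distance = snv_distance(seq_a, seq_b)
        if distance <= threshold:  # assumption: ties count as linked
            links.append((pid_a, pid_b, distance))
    return links


# Toy data; real thresholds are species-specific and are not taken from this proposal.
toy_isolates = {
    "P1": "ACGTACGT",
    "P2": "ACGTACGA",
    "P3": "TTGTACCA",
}
toy_links = putative_links(toy_isolates, threshold=2)
```

Because a single cutoff is applied to every pair, such a rule cannot separate transmission within the hospital from recent transmission at a connected facility, and it misses true transmission pairs whose isolates diverged during long-term colonization, which are the error sources noted above.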