Transcriptomics compendia for the study of strain-level genetic diversity of the human skin microbiome - PROJECT SUMMARY/ABSTRACT
Staphylococcus epidermidis is a common commensal found across human skin, but it is also a hospital-acquired
pathogen. This dual lifestyle makes the microbe a pathogen of considerable concern and likely stems from the immense genetic
diversity across its many strains. However, our understanding of the functional consequences of this genetic
diversity is limited, in part by significant gaps in gene functional characterization (over 25% of genes have no
known function) and in part by the dynamic environmental effects of the complex polymicrobial settings in which
S. epidermidis is nearly always found, which can influence gene expression, function, and virulence.
Yet a comprehensive analysis of all gene functions in all S. epidermidis strains across multiple
pathogenicity-relevant environmental conditions would present a massive, intractable search space.
Multiple 'omics approaches can generate systematic assessments of gene function. For example, transcriptional
data and gene essentiality screen data can be readily generated for all genes, irrespective of their annotation
status, and interpreted within the context of genetic background and environmental conditions, making them
powerful tools for large-scale gene characterization. Currently, only a limited set of transcriptional and gene fitness
data exists for a few strains of S. epidermidis, but extensive analogous data have been generated for its more
deeply studied relative, the skin pathogen S. aureus. New algorithms that use existing data to transfer
knowledge from characterized genes, including those in S. aureus, to less-explored genes, including
strain-specific genes, could rapidly predict relevant gene functions that could then be tested experimentally.
Thus, my goal in this proposal is to develop computational tools that leverage existing transcriptomic
and gene essentiality data from S. aureus and S. epidermidis to identify functions for uncharacterized
genes in S. epidermidis that could determine a pathogenic vs. commensal lifestyle. In Aim 1, I will use
transfer learning to derive putative gene functions and benchmark the limits of this method with RNA-seq data
collected from multiple strains of S. epidermidis grown in polymicrobial communities on reconstructed human
epidermis. As a case study, I will then assess the resulting functional predictions for genes suggested to be
important in multiple stress responses by testing their contributions to growth in a phenotypic array with
stressors and their epistatic interactions with the stress-responsive transcription factor SrrA. In Aim 2, I will use
algorithms similar to those in Aim 1 but incorporate gene essentiality data to derive condition-specific gene
essentiality cliques, and then validate the resulting gene characterization cliques using gene knockdowns and
phenotypic arrays. The proposed work establishes a framework for developing tools for rapid hypothesis
generation, paired with focused experimental hypothesis testing, to identify the functional consequences of genetic
diversity across strains of the perplexing pathogen S. epidermidis.