Methods for profiling the cancer virome and microbial strain genetics - Project Summary

An increasing body of research links the human microbiome to cancer through a variety of mechanisms, including pathogenesis, immunity, and response to treatment (this is distinct from the more controversial, recently publicized question of putative tissue-resident microbes in cancer). Major knowledge gaps remain, however: while viruses such as HPV, Epstein-Barr virus, and the hepatitis viruses are well known to drive carcinogenesis, not one of the many recent large-scale cancer microbiome studies has tested for endogenous viral involvement. This is due not to a lack of interest, but to the unavailability of computational tools that accurately identify non-bacterial microbial community members. Similarly, just as small variations in the human genome can profoundly alter phenotype, dramatic genetic and phenotypic differences are common among microbial strains of the same species. Only within the past few years have accurate methods become available to genotype microbial community members at scale from shotgun metagenomes. Almost no studies have yet leveraged this new information, however, because no biostatistical methods have been established to do so. We will thus address both of these methodological gaps, in the analysis of cancer-associated viral microbiome members and of microbial strain genetic variants. First, we will develop computational methods for virome profiling from shotgun metagenomes and metatranscriptomes. These will be initially driven by applications to stool microbiomes from the colorectal cancer (CRC) adenoma-to-carcinoma continuum, but will also be appropriate for other cancer microbiomes. They will integrate viral nucleotide and amino acid reference sequences with the lab’s published taxonomic profiling algorithms, in combination with deep learning for viral identification and quantification from long reads and assembled contigs.
We will validate these methods and apply them to a meta-analysis of 3,512 CRC, adenoma, and control stool shotgun metagenomes that have been collected, curated, and uniformly preprocessed from 16 existing studies, in order to identify newly detectable viruses significantly enriched or depleted in the CRC gut. Second, we will develop biostatistical methods for the analysis of microbial strain genetic variants and phylogeny in malignancy-associated microbiomes. Again, these will be initially applied to CRC metagenomes, but will be applicable to any cancer microbiome. These methods will adapt a combination of regularized regression and phylogenetic generalized linear mixed models (PGLMMs) to detect strain-specific genetic elements and phylogenetic lineages whose presence or absence is significantly associated with cancer outcomes. We will validate and apply them to the same CRC meta-analysis data collection. This will provide the first identification of strain-specific microbial genes and evolutionary selection pressures in the endogenous microbiome that are specific to colorectal carcinogenesis.