The Human DNA virome: from petabase scale to single-cell resolution
PROJECT SUMMARY
At the turn of the millennium, sequencing one megabase of DNA cost thousands of dollars. Today, decoding that same megabase costs less than a penny. This dramatic reduction in cost has catalyzed a genomics revolution and produced petabases (millions of billions of bases) of DNA and RNA sequences from human cells and tissues. We hypothesize that this transformational capacity of next-generation sequencing will now advance our understanding of the human virome, both by illuminating the set of DNA viruses in human tissues and by profiling the host (human) cells that are infected. Here, we outline new computational and experimental methods to realize the unique potential of petabase-scale sequencing data for studying human DNA viruses. Our work will uncover foundational aspects of the human virome, including tissue tropism and cellular reservoirs for all DNA viruses. Further, we present complementary strategies, based on host genetic analyses and single-cell multi-omics, to define and characterize the molecular interactions of human cells associated with viral infection and latency. First, in Aim 1, we will develop methods to quantify latent viral features from petabases of unmapped whole-genome sequencing reads from hundreds of thousands of individuals. These new molecular variables will reveal the abundance of latent viral DNA in blood and will be paired with comprehensive host genotyping and phenotyping. We will determine host genetic factors associated with high viral levels and nominate phenotypes, including complex diseases, that may be driven by long-term latent infection. In Aim 2, we will extend our petabase-scale resource (Serratus) to create a ‘Digital Human Virome’ by uniformly processing billions of dollars' worth of public sequencing data from human cells and tissues to identify and quantify all DNA viruses.
Our resource will aggregate metadata on sex, cell or tissue of origin, disease status, and geographic location to create a Digital Human Virome for DNA viruses, revealing tissue tropism that can be mined for clinical associations, such as our recent discovery of HHV-6 reactivation in CAR T cells. Finally, in Aim 3, we will develop a new high-throughput single-cell multi-omics technology, termed ‘Latent-seq’, that will identify individual human cells harboring latent viruses and pair each with high-quality cell-state measurements. We will first establish and benchmark the assay using a set of well-defined cell lines before extending it to primary human tissues in collaboration with the Human Virome Program Consortium. Together, these workflows will define the human DNA virome in health and disease by leveraging this unique moment in the capacity of genomics technologies. Because every human is infected by endemic eukaryotic DNA viruses but only some individuals ever show symptoms, our systematic approaches will uncover new associations between molecular interactions and the human phenotypes intertwined with the human virome.