A de novo approach to detect and interrogate neglected HIV diversity - Project Summary/Abstract

The current HIV clade system is not well suited to monitoring viral changes in HIV transmission clusters over time. The clade system was established nearly 20 years ago, and its pre-selected clade references are rarely updated. Furthermore, rapid viral evolution can easily render the system suboptimal for tracking viral changes in epidemics. Worse, although most HIV research has long taken this clade system for granted, it remains unclear how well the system represents the actual spectrum of HIV-1 diversity. This proposed study aims to address this critical limitation of the current clade system and to improve its accuracy in identifying and tracking viral changes during the emergence and evolution of HIV clusters. Our central hypothesis is that some new viral diversity involved in the emergence and evolution of transmission clusters already exists in contemporary HIV epidemics; however, because appropriate investigation and detection tools are lacking, signals of this new diversity are overlooked or missed in epidemiological surveillance until the clusters expand and become threatening. Our hypothesis is based on our previous findings of new G and J lineages that fall outside the classic HIV clades, as well as on a growing number of global HIV sequences that cannot be genotyped under the current HIV clade system. The rationale for this proposed study is that new viral diversity information, which we expect to reveal by developing new tools for phylodynamic clustering, will improve our ability to track the emergence and evolution of expanding HIV clusters. In Aim 1, through rigorous quality control of genotyping information and extensive data mining, we will compile the first list of HIV-1 sequences that carry new diversity information beyond the traditional A-K clades.
In Aim 2, we will develop a novel clustering algorithm that minimizes misclassifications caused by incorrect or absent reference selection. This algorithm will be used to determine, for the first time, whether the new viral diversity beyond the current clade system has gained epidemic importance by forming clusters. Finally, in Aim 3, we will integrate the results from Aims 1 and 2 to construct a findable, accessible, and reusable web interface that facilitates efficient tracking of nontraditional HIV diversity in the emergence and evolution of expanding HIV clusters. This interface would be the first platform with features for tracking nontraditional HIV diversity that has been neglected in the past. To carry out this project, we will draw on more than 17 years of experience in HIV molecular epidemiology, computational biology, and algorithm and software development. By the end of this study, we will have helped fill both a knowledge gap concerning the new viral diversity beyond the current HIV clade system and a technical gap in detecting nontraditional viral clades. We expect to reveal a greater degree of HIV diversity than current HIV research and surveillance have recognized. Our proposed approach would allow HIV sequence data to be genotyped more accurately and thus help improve genomics-based HIV healthcare and treatment.