Project Summary/Abstract
The current HIV clade system is not well suited to monitoring viral changes in HIV transmission clusters over
time: the clade system was established nearly 20 years ago, and its pre-selected clade references are rarely
updated. Furthermore, rapid viral evolution can easily render this clade system suboptimal for tracking viral
changes in epidemics. Even worse, although most HIV research has long relied on this clade system as a matter of
course, the degree to which it represents the actual spectrum of HIV-1 diversity remains unclear. This proposed
study aims to address this critical limitation of the current clade system and to improve the accuracy with which
viral changes are identified and tracked during the emergence and evolution of HIV clusters.
Our central hypothesis is that some new viral diversity involved in the emergence and evolution of transmission
clusters already exists in contemporary HIV epidemics, but that, owing to a lack of appropriate investigative and
detection tools, signals of this new diversity are overlooked or missed in epidemiological surveillance until the
clusters expand and become threatening. This hypothesis is based on our previous findings of new G and J lineages
that fall outside the classic HIV clades, as well as on the growing number of HIV sequences worldwide that cannot
be genotyped under the current HIV clade system. The rationale for this study is that information on new viral
diversity, which we expect to uncover by developing new phylodynamic clustering tools, will effectively improve
our ability to track the emergence and evolution of expanding HIV clusters. In Aim 1, through rigorous quality
control of genotyping information and extensive data mining, we will compile the first curated list of HIV-1
sequences that carry diversity beyond the traditional A-K clades. In Aim 2, we will develop a novel clustering
algorithm that minimizes misclassifications arising from incorrect or missing reference sequences. We will use
this algorithm to determine, for the first time, whether viral diversity beyond the current clade system has
gained epidemic importance by forming clusters. Finally, in Aim 3, we will integrate the results of Aims 1 and 2
into a findable, accessible, and reusable web interface for efficiently tracking nontraditional HIV diversity in
the emergence and evolution of expanding HIV clusters. This interface will be the first platform designed to
track nontraditional HIV diversity that has been neglected in the past.
This project will draw on our more than 17 years of experience in HIV molecular epidemiology, computational
biology, and algorithm/software development. By the end of this study, we will help fill both a knowledge gap
regarding viral diversity beyond the current HIV clade system and a technical gap in detecting nontraditional
viral clades. We expect to reveal a degree of HIV diversity that may have been underestimated in current HIV
research and surveillance. Our proposed approach would allow HIV sequence data to be genotyped more accurately
and thus help improve genomics-based HIV healthcare and treatment.