PROJECT SUMMARY
In addition to the canonical double-helix B-DNA structure described by Watson and Crick, DNA can adopt
non-canonical, or non-B, conformations; sequence motifs able to form them make up approximately 13% of the human genome. This
project focuses on G-quadruplexes, the type of non-B DNA for which we have the strongest evidence of
genome-wide formation and functionality in human cells. There are more than 700,000 putative
G-quadruplex loci in the human genome. They constitute ~1% of the genome, compared to ~1.5% occupied by
protein-coding exons. Recent in vivo experiments showed that G-quadruplexes regulate key cellular processes
(e.g., chromatin organization and transcription). Thus we hypothesize that some groups of G-quadruplex loci
evolve under purifying selection. Yet G-quadruplexes may also impede DNA replication. Our
published preliminary results, based on the analysis of long-read sequencing data, demonstrated decreased
polymerization speed and increased polymerization errors at G-quadruplex loci genome-wide. We
hypothesize that the same phenomena occur in human cells and lead to increased mutagenesis at
G-quadruplex loci. Building upon our published and unpublished preliminary results, this project will
examine the contribution of G-quadruplex motifs to genome evolution, which has been critically
underexplored. Aim 1 will elucidate the mechanistic basis behind the increased mutation rate at G-quadruplex
loci, using state-of-the-art high-fidelity duplex sequencing. With in vivo experiments, we will test the hypothesis
that mutation rates are elevated specifically at G-quadruplex structures that form in human cells and are
associated with replication slowdown. With in vitro experiments, we will test the hypothesis that the two major
eukaryotic replicative polymerases (polymerases epsilon and delta, responsible for leading and lagging strand
synthesis, respectively) stall and have increased error frequencies at G-quadruplexes. Aim 2 will assess the
contribution of G-quadruplex loci to regional variation in mutation rates across the genome and will test the
hypothesis that G-quadruplex loci facilitate structural variation in human populations and chromosomal
rearrangements during evolution. This Aim will use advanced statistical techniques, including methods from
Functional Data Analysis. Finally, Aim 3 will examine selection acting on G-quadruplex loci
using classical and novel statistical tests. We will test the hypothesis that G-quadruplexes located in different
functional compartments of the genome experience different selective pressures; for example, promoter motifs are
expected to evolve under strong purifying selection. Moreover, we will investigate a potential association
between the biophysical stability of G-quadruplex structures and the strength of selection acting on them. This
Aim will also identify groups of physiologically relevant G-quadruplex loci that can guide future functional
studies. Overall, the project will substantially advance our understanding of the contribution of
G-quadruplexes to genome evolution and disease.