1. Project Summary
De novo and ultra-rare copy-number variants (CNVs) often underlie the genetic etiology of pediatric and
neurodevelopmental diseases. As such, CNVs provide opportunities to study critical dosage-sensitive genes, as
well as the timing and origin of structural variation (SV) formation. SV results from distinct mutational mechanisms,
including DNA recombination, replication, and repair-associated processes, each leaving specific genomic scars
and identifiable signatures that can be accessed with appropriate sequencing methodologies. We and others
have shown that DNA repair mechanisms, such as break-induced replication (BIR) and microhomology-mediated
break-induced replication (MMBIR), largely contribute to germline SV formation in genomic disorders as well as
somatic events in cancer. The error-prone nature of BIR/MMBIR may lead to SVs characterized as complex
genomic rearrangements (CGRs) due to insertions of templated segments at the junctions as well as
amplification or deletion of genomic segments concomitantly with inversion formation. Our preliminary data
indicate that BIR and MMBIR are prone to occur in genomic regions laden with large repeats, here called highly
similar intrachromosomal repeats (HSIRs), often leading to nonrecurrent CNVs that perturb nearby
dosage-sensitive genes. At least 70 genetic syndromes are known to be caused by nonrecurrent CNVs, but the
contribution of HSIRs to the underlying molecular mechanism has not been established. We hypothesize that
i) a relevant fraction of de novo nonrecurrent CNVs are generated by BIR, in which HSIRs provide
substrate for ectopic recombination and template-switching; ii) inverted and direct HSIRs have distinct
roles in the formation of such CNVs; iii) genetic diseases caused by nonrecurrent CNVs present highly
diverse genomic structure that contributes to variability in gene and disease expression. These
hypotheses will be tested through the following aims: (1) to identify nonrecurrent CNVs in disease cohorts
and to investigate the features of repeats at the breakpoint junctions (Aim 1); (2) to investigate whether the
genomic structure of pathogenic CNVs at the Xq28 locus contributes to allele-specific phenotypic differences
(Aim 2); and (3) to define the impact of the genomic structure of pathogenic CNVs on an individual's transcriptome,
with implications for disease expression (Aim 3). In all, we will combine extensive genomic and transcriptomic analysis
with robust phenotypic characterization to investigate the molecular properties of pathogenic HSIR-mediated
CNVs. This work will fill an important gap in knowledge concerning the role of genomic repeats
in the formation of SVs. Of particular interest is the establishment of the relative impact of HSIRs
on the generation of CGRs. Moreover, we will establish the clinical and biological relevance of HSIR-
mediated nonrecurrent CNVs for disease expression. In summary, this application will strongly impact our
understanding of human biological processes and disease mechanisms with broad implications for the diagnosis
of birth defects, neurodevelopment, and cancer.