PROJECT SUMMARY
Genomic structural variants (SVs), including deletions, duplications, insertions, inversions, and translocations
of sequences, are an abundant source of genetic variation. SVs have been linked to Mendelian diseases, complex
heritable diseases such as schizophrenia, and cancer. However, recent comparisons of extremely contiguous
genome assemblies of humans and the model organism Drosophila melanogaster have revealed that common
genotyping strategies relying on high-throughput short reads miss 40-80% of SVs, including SVs that affect
phenotypes. Thus, the contribution of SVs to disease and phenotypic variation remains grossly
underestimated. To accurately measure the contribution of SVs to deleterious genetic variation and trait
variation, we propose to create a comprehensive map of genome-wide SVs by comparing extremely
contiguous genome assemblies. However, contiguous de novo assembly of human genomes from high-coverage
(>50X) noisy long reads remains prohibitively expensive. I therefore propose to analyze SVs in the 25-fold
smaller genome of the model organism D. melanogaster, which has contributed substantially to our understanding
of the genetics of complex human diseases. The proposed research aims to study the fitness effects of
polymorphic SVs using de novo genome assemblies of 50 genetically diverse D. melanogaster strains that
are as complete and contiguous as the current D. melanogaster reference genome, arguably the best
metazoan genome assembly (Aim 1). I propose to use this comprehensive set of variants to infer the
distribution of fitness effects of these SVs and to estimate the proportion of adaptive SVs, both of which are
reliable proxies for the evolutionary and functional significance of SVs (Aim 1). Aim 1 will involve training in
theory and cutting-edge methods in molecular population genetics. Next, the proposed project will develop an
experimental approach to determine the fitness effects of variants whose organismal phenotypes are
unknown. As part of this effort, the project will develop genome-editing resources that enable rapid
transformation of one of our sequenced strains with SVs, so that the fitness effects of candidate SVs from
trait-mapping studies can be examined (Aim 2). Training in Aim 2 includes developing a CRISPR-Cas9 toolkit in
a common genetic background to investigate the functional effects of SVs. Finally, using the toolkit developed
in Aim 2, we propose to conduct high-throughput fitness assays to evaluate the selective effects of SVs under
specific selection conditions (Aim 3). The training portion of the proposed research will complement the
applicant’s previous experience and position him for a successful research career. The University of California,
Irvine, together with the Emerson and Long labs, has the resources and expertise to ensure the successful
completion of the training phase of the grant.