PROJECT SUMMARY/ABSTRACT
This proposal for the NIH Pathway to Independence Award (K99/R00) focuses on the training of Dr. PingHsun
Hsieh to become an independent investigator of large-scale genomics and human population genetics. Dr.
Hsieh is a population geneticist by training, and the proposed studies will advance his training into long-read-
based sequencing technologies and novel machine-learning approaches to study the fitness consequences of
new mutations, with a focus on structural variants (SVs), in humans and nonhuman primates. Another
essential component will be the development of resources identifying which types of new SVs are most likely to be
pathogenic and hence most worth further study by medical researchers. The methods developed in this work
will enable other researchers to conduct more hypothesis-free analyses of SVs in disease etiology.
Specifically, the training program will center on the study of the distribution of fitness effects of new SVs in
humans and nonhuman primates using high-quality SV calls and genotypes from several large-scale long- and
short-read sequencing projects. The mentored work will take place under the supervision of the primary
mentor, Dr. Evan Eichler, and the co-mentor, Dr. Sharon Browning, both at the University of Washington (UW).
The mentor and co-mentor are well-established experts in the characterization of genomic variations using
high-throughput technologies and the development of stochastic modeling methods for large-scale genetic
data, respectively. Dr. Hsieh will also receive guidance from a formal advisory committee and through
activities arranged by the Department of Genome Sciences (GS). GS is an optimal environment for the mentored
training, providing the candidate with access to outstanding scientists in areas including the genetics of model
organisms, disease, population genetics, and the development of high-throughput genomic technologies.
Although SVs found in nature are generally deemed deleterious given their size, they can also be beneficial, and
the distribution of fitness effects (DFE) of new SVs (i.e., the relative frequencies of beneficial, neutral, and
deleterious SVs) remains elusive. In the proposed studies, we will infer the DFE of new SVs and other variants
to assess their relative importance in nature, which in turn will help prioritize variants (e.g., SVs vs.
single-nucleotide variants [SNVs]) in medical genetics. Specifically, in the K99/R00 phases we will (1) infer the DFE
of new SVs and SNVs using a diverse panel of ~100 long-read and ~4,000 short-read high-coverage
human and nonhuman primate genomes; (2) compare the DFE of new mutations among primates using
contemporary and ancient DNA genomes; and (3) study the fitness effects and selective constraints on
diseases in different mutation categories in large cohorts of >20,000 genomes. The skills acquired through this
proposal are at the cutting edge and are tailored to give the candidate extensive knowledge in
new areas of genomics, which will be applicable to many organisms and diseases and critical to the
candidate’s future independent laboratory.