PROJECT SUMMARY/ABSTRACT
The role of germline genetic variation and viral infection in disease development and progression has been studied
extensively in adult tumors and autoimmune disease. Less attention has been paid to the interaction of these
factors with birth defects and pediatric malignancies, particularly acute myeloid leukemia (AML), which in the
youngest patients is driven almost exclusively by structural variants (SVs) with poorly understood etiology. The
prevalence of leukemia-associated gene fusion transcripts at live birth is 10- to 100-fold greater than the
incidence of childhood leukemia, which suggests other risk factors must interact with SVs. One possible
candidate for interaction is the presence of germline mutations, and preliminary analysis has shown that patients
harboring SVs are enriched for germline mutations in genes responsible for DNA double-stranded break repair.
Another candidate is the timing of viral infection in genetically predisposed individuals. Clinical trials of gene
therapy with viral vectors failed in part due to viral integrations activating oncogenes such as MECOM. Recent
work has shown a direct mechanism for derivative chromosome formation at the most common breakpoints in
leukemia, and human herpesviruses, including cytomegalovirus (CMV), are among the greatest risk factors for chromosomal
birth defects. Additionally, germline and somatic copy number and short sequence variants have been
documented affecting E26 transformation-specific (ETS) factors, which participate in high-risk gene fusions seen
in both solid and liquid tumors. These factors and their binding sites determine developmental fates across
tissues, yet their motifs are short tandem repeats, the single most variable class of features in the human
genome. Small changes in dosage, as created by disruptions in binding site motifs or variation in ramp
sequences, may be sufficient to predispose individuals to disease. The primary obstacle to studying these
mechanisms has long been the small sample sizes and biased coverage of cohorts assembled for rare and
childhood diseases. The vast quantity of whole-genome, whole-transcriptome, and long-read sequencing data
provided by the Gabriella Miller Kids First! (GMKF) Consortium, Therapeutically Applicable Research to
Generate Effective Treatments (TARGET), the X01 Long Read Pilot Project for omics-cold pediatric
leukemia patients, and others negates this obstacle. I posit that both the computational infrastructure and the
sample sizes required to address this urgent need are now in place, allowing us to determine whether predisposition
risk can be mitigated by screening or prophylaxis. The overall goal of my F99 training phase is to characterize
germline variants and functionally validate them in a zebrafish model of high-risk pediatric
AML. During my K00 phase, I propose to characterize germline regulatory, splicing, and structural variants,
as well as viral genomic integrations, as catalysts of leukemia risk. The training and data resulting from this
fellowship award will establish the foundation of scientific and professional skills for my career as an independent
researcher.