SUMMARY
Leukodystrophies (LD) are a group of rare genetic disorders that preferentially affect cerebral white matter (myelin)
in previously healthy children and are associated with an extremely poor prognosis. Some subtypes of LD, including
hypomyelinating LD (HLD), have a high DNA diagnostic rate (70%), whereas more complex forms mostly remain
unsolved even after standard short-read genome sequencing (srGS). We hypothesize that complex genetic
variants such as repeat expansions and other structural variants (SVs) mapping to blind spots of srGS account
for missed molecular diagnoses in LD. We have shown that long-read GS on the
PacBio Sequel IIe (HiFi-GS) can reveal single-nucleotide variants (SNVs) in difficult-to-map regions, repeat expansions,
and SVs throughout the human genome. More specifically, using our large HiFi-GS dataset (N=1191 individuals)
developed for understanding pediatric rare disease in the context of the Genomic Answers for Kids (GA4K)
program, we showed that HiFi-GS increased the discovery rate, yielding >4-fold more rare coding SVs than srGS.
Our goal here is first to apply AI/ML-assisted srGS analyses to LD cases to define a set of 125 unsolved LD cases.
Then, using the GA4K HiFi-GS resource as a reference, we will explore the utility of expanded long-read capabilities
(SV detection, methylation, personal haploid assemblies) for finely defining underexplored genomic
features in unsolved pediatric LD. By studying contiguous haploid assemblies, we aim to provide a full assessment
of maternal and paternal DNA even in complex repetitive regions of the genome. In parallel, we will study DNA
methylation signatures obtained from 5mC-HiFi-GS runs and establish the function of variation in non-coding regions
and the indirect effects of repeat expansions. Finally, we will validate the findings using functional data from patient-derived
induced pluripotent stem cells (iPSCs) to systematically study the impact on RNA and, for a subset, on myelination
(oligodendrocytes derived from iPSCs). We will also use the zebrafish model organism to functionally study new
LD disease genes and variants in vivo. We anticipate that the HiFi-GS platform, combined with a distinct clinical
endophenotype among rare neurological diseases, will maximize the potential for discovering new mechanisms for
“vanishing white matter” and, more generally, accelerate the development of third-generation tools for unsolved
genetic disease.