PROJECT SUMMARY
As much as 10% of the population suffers from a rare disease (RD); 80% of these diseases are caused by
gene mutations and up to 75% are present at birth or begin in childhood. Diagnosis of genetic diseases is often
problematic: roughly 25% of RD patients must wait between 5 and 30 years for a diagnosis, and about half of
the initial diagnoses are wrong. For many affected children, definitive diagnosis comes only after a protracted
and frustrating odyssey of visits to different specialists. Emerging genetic sequencing techniques offer the
possibility of shortening this long and costly path to diagnosis. Methods that determine sequence changes
across all genes (exome sequencing) or all genetic material (genome sequencing), collectively
referred to as Next-Generation Sequencing (NGS), were first used in 2010 to identify the genetic cause of a
disease and are now becoming routine in the clinic. The ability to make a diagnosis with NGS has more
than doubled since 2010 for children with suspected genetic diseases. The diagnostic analysis of NGS data
involves the assessment of tens of thousands (exome) or even millions (genome) of changes in the DNA
(variants), which requires sophisticated computer algorithms that can sift through these data to find the
cause. Our group has developed the Human Phenotype Ontology (HPO), a resource widely used around the
world for the computational analysis of clinical data in human genetics and pediatrics, allowing algorithms to
match the symptoms of a patient with database records of over 7,000 genetic diseases.
Our Exomiser software compares the clinical phenotypes of patients with known human diseases and
genetically modified animal models, and couples this with an analysis of the disease-causing potential of DNA
variants, greatly reducing the search space to identify the causal variant. Exomiser efficiently processes both
exome and genome data. In this proposal, we plan to extend Exomiser to utilize new genomic data types
including long-read genome sequencing and NGS-based analysis of RNA data, which will improve
pathogenicity prediction for structural variants (SVs) and for variants affecting gene expression or splicing. We
will also predict novel disease genes through characterization of networks of clinical phenotypes and the
molecular functions (pathways) of affected genes. We plan to use these algorithms to assess collections
(cohorts) of unsolved cases in projects such as the 100,000 Genomes Project. Our algorithmic approach will
be applied to intelligently reanalyze unsolved cases periodically as new information is added to the medical
literature. Finally, we will develop tools to integrate Exomiser into a wide range of settings by adding
support for standards developed by the Global Alliance for Genomics and Health (GA4GH). The proposed
advances will make Exomiser more efficient, more accurate, and easier for non-specialist pediatricians to use,
bringing genomic diagnostics to routine pediatric clinical care.