PROJECT SUMMARY
Leveraging the power of the human genome to understand the risks, causes, and treatments of human disease
remains a grand challenge for all of biology and medicine. Although sequencing costs have plummeted and
clinical sequencing has become commonplace, interpreting human genomes remains a highly challenging task.
We hypothesize that understanding the function of the genome and its products at the molecular, tissue, and
phenotypic levels using advanced machine learning will help unlock better genome interpretation, both for
scientific discovery and for improved clinical outcomes in genomic medicine. To that end, our team has spent
the past two decades developing computational models of biology, predicting how those models are perturbed
by changes in the genome, and using those perturbations to model phenotype and disease.
In this area, we have developed and published a number of widely used methods that predict the biochemical
and phenotypic effects of genetic variants and use those predictions to infer phenotype and pathogenicity. We
believe that a convergence is coming between the variability in clinical interpretation, high-throughput
biotechnology assays, and modern machine learning methodology, and that it will result in more accurate
clinical assessments and improved clinical care. Therefore, in this ambitious proposal, we address important
questions in variant and genome interpretation that are consistent with this view and with the mission
of the IGVF Consortium. Our major goals are (1) to develop advanced semi-supervised approaches that predict
which variants disrupt molecular function and/or can alter phenotypes; (2) to identify informative assays,
variants, and genes in order to automate experimental design, with an emphasis on resource allocation and on
reducing ascertainment bias in the Consortium; and (3) to develop machine learning approaches that integrate
these models into an IGVF Consortium workflow and enable interaction between computation and experiment,
thereby catalyzing advances in both genetic variant interpretation and predictive model development.