PROJECT SUMMARY / ABSTRACT
This grant proposal outlines a comprehensive plan to develop novel computational methods and software tools
for analyzing pangenomic data, with a focus on improving the accuracy and efficiency of variant calling and
genotyping, particularly for complex structural variants (SVs). The proposal is divided into five specific aims:
Aim 1: Create a pangenome mapper supporting long reads, which will enable accurate and efficient mapping
of long-range sequencing data to pangenome references.
Aim 2: Develop personalized pangenomes by rapidly and efficiently constructing a subset of a larger graph
based on an input sample's k-mers. This approach will tailor the pangenome to the sample being analyzed
and so improve performance in downstream analyses.
Aim 3: Create a pangenome variant calling and imputation method for unified genome inference, which will
combine imputation with read-based genotyping using machine learning to infer a more complete
representation of variation, including both small variants and SVs.
Aim 4: Genotype complex SVs involving protein-coding genes by identifying long segmental duplications,
grouping haplotypes, and developing targeted genotyping methods for long and short reads.
Aim 5: Develop mature rGFA-based variant calling for reporting both SVs and small variants within polymorphic
sequence, which will expand the current definition of reportable variation and provide pipelines that can report
tens of thousands of additional variants per sample.
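As a rough illustration of the k-mer-based subsetting idea in Aim 2, the sketch below scores each haplotype by
the fraction of its k-mers that also occur in the sample's reads and keeps the best-scoring ones. It is a toy
under stated assumptions, not the proposed method: the function names (kmers, personalize), the scoring rule,
and the parameters k and keep are hypothetical, and haplotypes are plain strings rather than paths in a graph.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def personalize(haplotypes, sample_reads, k=5, keep=2):
    """Keep the `keep` haplotypes whose k-mers are best supported by the sample.

    haplotypes: dict mapping a haplotype name to its sequence (hypothetical input format).
    sample_reads: iterable of read sequences from the sample.
    Returns (names of kept haplotypes, dict of per-haplotype scores).
    """
    # Collect every k-mer observed in the sample's reads.
    sample_kmers = set()
    for read in sample_reads:
        sample_kmers |= kmers(read, k)

    # Score = fraction of a haplotype's distinct k-mers that the sample contains.
    scores = {}
    for name, seq in haplotypes.items():
        hap_kmers = kmers(seq, k)
        scores[name] = len(hap_kmers & sample_kmers) / len(hap_kmers) if hap_kmers else 0.0

    # Highest score first; ties are broken by name so the result is deterministic.
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:keep], scores


if __name__ == "__main__":
    haps = {
        "hapA": "ACGTACGTTAGC",
        "hapB": "ACGTACCTTAGC",
        "hapC": "TTTTGGGGCCCC",
    }
    kept, scores = personalize(haps, ["ACGTACGTTAGC"], k=5, keep=2)
    print(kept, scores)
```

A real implementation would operate on graph paths and would need to account for sequencing errors and k-mer
multiplicity, which this sketch ignores.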
The proposal addresses the need for better computational tools for pangenome analysis, especially for complex
SVs. The proposed software tools and methods will enable researchers to analyze pangenomic data more
effectively and efficiently, leading to new insights into genetic variation and its role in disease and other
biological processes.