Rare variant analysis integrating large public DNA sequencing controls and gene expression data for discovering novel predisposition genes of orofacial clefts - Project Summary/Abstract Genetic variation contributes significantly to orofacial clefts. Identifying predisposition genes for orofacial clefts holds promise for early diagnosis, preventive measures, and targeted therapies. However, pinpointing genes that harbor rare causal variants remains challenging because typical sample sizes of hundreds or thousands of cases and controls provide limited statistical power. We propose using genetic burden analyses with large control sample sizes, together with gene expression data sets, to discover orofacial cleft predisposition genes. This approach leverages multiple genomics data sets: ~1,400 orofacial cleft samples from the Gabriella Miller Kids First Pediatric Research Program; public biobank-level summary counts from ~730,000 samples in the recently released Genome Aggregation Database (gnomAD) v4; and ~240 orofacial cleft samples and ~220,000 controls from the All of Us genomics data sets. Furthermore, transcriptomics data sets will be analyzed to enrich for potential causal genes, including bulk and single-cell RNA-seq data from craniofacial tissues and reference GTEx tissues. We will perform a discovery analysis, followed by validation and a combined analysis. This comprehensive approach aims to identify novel predisposition genes and variants associated with orofacial clefts, ultimately facilitating early diagnosis, risk stratification, improved understanding of disease etiology, and the development of targeted treatments.
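
As an illustration only, the kind of case-versus-public-control burden comparison described above can be sketched as a one-sided Fisher's exact test on per-gene carrier counts. This is a minimal sketch under stated assumptions, not the project's actual pipeline: the function names, the example counts, and the Bonferroni threshold for roughly 20,000 genes are all hypothetical. It also assumes that summary allele counts can stand in for carrier counts, which is only approximately true for rare variants.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def burden_p_value(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact test for carrier enrichment in cases.

    Returns P(X >= case_carriers), where X is hypergeometric: the number
    of carriers that land among the cases if carrier status is unrelated
    to case/control status.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    log_denom = log_comb(total, n_cases)
    upper = min(carriers, n_cases)
    # Work in log space so biobank-scale counts do not overflow.
    log_terms = [
        log_comb(carriers, k)
        + log_comb(total - carriers, n_cases - k)
        - log_denom
        for k in range(case_carriers, upper + 1)
    ]
    peak = max(log_terms)
    p = math.exp(peak) * sum(math.exp(t - peak) for t in log_terms)
    return min(p, 1.0)


# Hypothetical threshold: Bonferroni correction for ~20,000 genes.
GENOME_WIDE_ALPHA = 0.05 / 20000

# Hypothetical counts for a single gene (not real data).
p = burden_p_value(case_carriers=8, n_cases=1400,
                   control_carriers=300, n_controls=730000)
print(f"p = {p:.3g}, significant = {p < GENOME_WIDE_ALPHA}")
```

A real analysis would also need to account for differences in sequencing coverage, variant calling, and ancestry between the case cohort and the public controls, none of which this sketch addresses.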