Abstract
Natural genetic variation impacts most human diseases, yet predicting how regulatory variants control gene
expression and ultimately disease phenotypes poses considerable challenges. First, the polygenic inheritance
influencing most conditions requires consideration of a vast number of genes and regulatory elements. This
task is further complicated by the intricacy of gene regulation, in which 3D regulatory interactions can link
enhancers and genes over large genomic distances. Second, multiple interacting cell types are often
dysregulated in disease pathology. This necessitates understanding how the variants collectively associated
with a disease affect each cell type involved in the disease process, and how these dysregulated cellular
phenotypes in turn cross-regulate one another and drive later cellular states. In this IGVF project, we will use
rheumatoid arthritis (RA), a human autoimmune inflammatory disease, as a case study for developing robust
machine learning models of gene regulation that decipher the impact of genomic variation on multiple cellular
drivers of pathology, namely the inflammatory T cell and fibroblast subsets found in affected joint tissue. We
chose RA because of its public health importance, well-defined target tissue, access to clinical samples,
considerable knowledge of disease-associated gene loci, and our team’s complementary expertise in machine
learning, RA pathophysiology, immunology and inflammation, and single-cell functional genomics.
We will develop an advanced machine learning framework that models the effects of allelic variation on gene
regulatory networks, based on analysis of the epigenomes, transcriptomes, and connectomes of activated mouse
T cells and synovial fibroblasts, and we will extend these models to joint tissue and primary cells from RA patients.
We will train allele-specific gene regulatory models (GRMs) that account for long-range regulatory interactions
by integrating single-cell transcriptome and epigenome (sc-multiome) data with bulk 3D interactome analyses.
A notable feature of our approach is that we leverage the genetic diversity of F1 hybrid mice derived from
evolutionarily distant parental strains to provide robust training data for these models, and then apply these
advances to the human context through transfer learning. Highly parallelized Perturb-seq experiments with
single-cell multiomic readouts in primary synovial fibroblasts from RA patients will then be used to evaluate and
refine the regulatory models and to train network models that connect gene expression programs to phenotype.
Finally, we will combine spatial and single-cell transcriptomic profiling of inflamed joint samples from RA
patients to model the organization of, and interactions between, T cells and sedentary tissue-organizing
fibroblasts within local cellular communities.
The predictive GRMs generated in this study, along with the experimental systems developed for human
disease, will be readily transferable to other polygenic disorders in which complex gene regulatory networks
across multiple interacting cell types in affected tissues must be considered.