A cell-type specific atlas of TF-element connectivity across human tissues - PROJECT SUMMARY This proposal builds upon the Common Fund data resources GTEx and HuBMAP by integrating their single-cell/single-nucleus ATAC-seq (sc/snATAC-seq) data and deploying recently developed deep learning methods to establish a cell type-specific atlas of TF-element predictions. Across tissues and cell types, in health and disease, our genomes selectively activate and reorganize genes and cis-regulatory elements (CREs) to define diverse cell types. To accomplish this, hundreds of transcription factors (TFs) act together to determine the activity of millions of CREs, which in turn regulate the expression of ~25,000 genes. The vast majority of GWAS variants associated with common diseases and traits lie in CREs; a compelling hypothesis is therefore that these variants disrupt the binding of regulatory proteins. A grand challenge in biology is thus to identify the precise genomic locations of these regulatory proteins across all CREs in all cell types, in order to understand the function of non-coding genetic variation. The GTEx and HuBMAP Common Fund projects have generated a critical mass of sc/snATAC-seq datasets (~177 to date) across many different human donors and tissues, a valuable resource of single-cell data across human biology. We recently developed PRINT, a deep learning model that uses ATAC-seq data to more accurately reveal multiscale footprints of regulatory proteins on DNA (Hu et al. bioRxiv). Because ATAC-seq measures open chromatin, PRINT enables prediction of the binding of regulatory proteins, such as TFs, within regions of open chromatin. We will reprocess and harmonize the ~177 GTEx and HuBMAP sc/snATAC-seq datasets and supplement them with 375 ENCODE sc/snATAC-seq datasets from human cell types and tissues.
We will deploy PRINT to predict TF footprints in the GTEx, HuBMAP, and ENCODE sc/snATAC-seq data, and we will use existing deep learning models to annotate these footprints. Taken together, these analyses will enable us to characterize TF binding in human tissues and cell types and to resolve changes in CRE activity and TF binding across different cell types and differentiation trajectories in vivo. All data, code, and model predictions will be made available via the CFDE portal. We expect that these annotations will underpin CFDE user efforts to develop hypotheses about, and ultimately annotate, the function of genetic variants.