Multi-omic phenotyping of human transcriptional regulators
PROJECT SUMMARY
The Molecular Phenotypes of Null Alleles in Cells (MorPhiC) program is using multiple perturbation strategies to realize the NHGRI's vision of assigning function to every human gene. Strategies include pooled and individual gene knockouts and knockdowns (KDs), generated using CRISPR technologies and auxin-inducible degrons. Following these perturbations, molecular phenotypes of the cells are profiled longitudinally and at individual time points using bulk and single-cell (sc)RNA-seq. Perturbation strategies have intrinsic sources of variability, e.g., KD penetrance, while the single-cell sequencing approaches contribute technical noise, e.g., “dropout.” Quantifying and controlling this variability are crucial to ensure reliable phenotypic assessment and fulfill MorPhiC's goal to accurately catalog gene function. Given the critical role of transcription factors (TFs) in regulating cell state, all four MorPhiC Data Production Centers (DPCs) will perturb TFs and then profile cells using bulk or scRNA-seq. A wide range of other “regulatory phenotyping” data, including (bulk or single-cell) ATAC-seq, are being generated within MorPhiC, and TF ChIP-seq, Hi-C, and massively parallel reporter assay (MPRA) data are available from the ENCODE and Impact of Genomic Variation on Function (IGVF) consortia. To robustly define the regulatory impact of TF perturbation, we propose a JAX MorPhiC Data Analysis and Validation Center (DAV) to analyze these multi-modal data. Our team is uniquely positioned to establish this TF-focused DAV: we are co-located with the JAX MorPhiC DPC and have consortium-level collaborations with its PI, while our own work focuses on elucidating transcriptional regulation of genes and on developing robust computational methods through community efforts.
In Aim 1, we will quantify and control variability in perturbation-based regulatory phenotyping by using heterogeneous data generated within MorPhiC to isolate their technical noise characteristics and to derive a set of TF-gene target pairs (TF-GTs). We will then computationally simulate large-scale perturbation screens, through which we will perform power analysis to quantify data variability and make recommendations to mitigate it. In Aim 2, we will evaluate published gene regulatory network (GRN) inference methods. We will also conduct two “crowd-sourced” DREAM Challenges, in which community participants will develop GRN inference methods that we will objectively evaluate with MorPhiC data. Using top-performing methods, as well as a novel approach we are developing based on dynamical systems, we will perform in silico TF perturbation within the GRNs to prioritize TFs for experimental validation in MorPhiC. In Aim 3, we will further improve robustness of inferred TF-GTs by integrating them with TF ChIP-seq, Hi-C, and MPRA data, knockout mouse phenotyping data (KOMP2), and spatial transcriptomics data from JAX and MorPhiC. We will validate published methods for defining tissue-specific GRNs by overlapping them with relevant MorPhiC model systems and will then use them to predict TF-GTs in systems yet to be profiled by MorPhiC. Our Aims will bolster the field's ability to decipher the regulatory function of the ~1,600 TF genes within the human genome.