Learning gene regulatory networks under latent confounding and data dependence - Gene regulatory networks (GRNs) encode the complex regulatory relationships that govern the transcription and splicing of genes. Learning GRNs from data is a problem of fundamental importance in computational biology. In this project, we formulate GRN inference as a causal discovery problem through graphical modeling, which is an active research area in statistics and data science in its own right. Leveraging the large-scale RNA-seq data accumulated in the literature, we will develop statistical methods to infer the structure of GRNs and to identify direct causes of gene expression and alternative splicing. The proposed methodology is motivated by two well-known difficulties in learning GRNs: the existence of latent confounders and potential dependence in the data. We will develop a coordinated local network learning algorithm that is robust to latent confounding and computationally efficient. By identifying the parent set of a target gene, such as a transcription factor (TF), this method makes it possible to identify the regulatory effect of the TF on any other gene without learning a full network. To account for latent confounders, we propose to model a GRN as an acyclic directed mixed graph (ADMG), which has both directed and bidirected edges. A bidirected edge indicates that the two nodes (genes) share a common latent cause, or confounder. We will develop a novel method to learn the structure of ADMGs via a hybrid approach. Many single-cell RNA-seq data sets are generated from cells that may be dependent because of temporal or spatial association. We will develop a de-correlation approach to remove cell dependence in such single-cell data, so that existing GRN learning algorithms can be applied to the de-correlated data with improved accuracy.
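The de-correlation step is not specified in this summary. As an illustration only, the following minimal sketch shows one generic way dependence among cells could be removed: whitening the rows of an expression matrix with the Cholesky factor of a cell-by-cell covariance matrix. It assumes that this covariance is known and has an AR(1) form along a temporal ordering of cells; in practice it would have to be estimated. The names `decorrelate_rows` and `ar1_covariance`, the model `X = L Z`, and all numerical settings are hypothetical and are not taken from the proposed method.

```python
import numpy as np


def ar1_covariance(n, rho):
    """Hypothetical cell-by-cell covariance: AR(1) decay along a cell ordering."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def decorrelate_rows(X, Sigma):
    """Whiten the rows (cells) of X given a cell covariance Sigma.

    Assumes X = L @ Z with Sigma = L @ L.T and independent rows in Z,
    so that solving L @ Z = X recovers the independent rows.
    """
    L = np.linalg.cholesky(Sigma)   # lower-triangular factor, Sigma = L @ L.T
    return np.linalg.solve(L, X)    # equals inv(L) @ X without forming the inverse


# Simulated example: 200 dependent cells, 5 genes (illustrative sizes only).
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 5
Sigma = ar1_covariance(n_cells, rho=0.8)

Z = rng.standard_normal((n_cells, n_genes))   # independent cells
X = np.linalg.cholesky(Sigma) @ Z             # cells made dependent through Sigma

X_dec = decorrelate_rows(X, Sigma)            # de-correlated data
print(np.allclose(X_dec, Z))
```

Under these assumptions, `X_dec` has independent rows and could be passed to a structure learning algorithm that expects independent samples. The sketch does not address the harder problem of estimating the covariance among cells jointly with the network.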
The proposed research will advance statistical methods for GRN inference under complex and realistic settings and will also make substantial contributions to the general methodology for structure learning of graphical models. We further propose to model feedback loops in a GRN by a chain graph with latent variables, based on a causal interpretation of its undirected edges. Standard causal discovery methods assume independent data. De-correlating dependent data is an innovative idea with the potential to substantially improve the performance of many existing methods, and it holds great promise for fitting graphical models to RNA-seq data from dependent cell populations. RELEVANCE: Understanding the causal mechanisms underlying gene regulation is an important problem in medical science and public health. The high-level goal of this project is to develop novel mathematical models and statistical methods for constructing gene regulatory networks from biomedical data, motivated by several practical difficulties that existing approaches face. The new methods will facilitate causal discovery in many biomedical problems, such as the identification of potential causes of diseases.