Causal graphical methods for high-dimensional heterogeneous biomedical data - In the past decade, there has been an explosion of data collected from biological and biomedical systems, in both type and volume. Mining these high-dimensional, heterogeneous, and often dynamic datasets to make biologically or medically important inferences or to develop predictive models requires new, sophisticated data analytics methods. New machine learning methods have begun to meet this need, but most of them generate “black box” models that lack clear interpretability. These methods are also associative and therefore cannot disentangle the complex cause-effect relationships among features in a dataset. Directed causal graphical models (DCGMs) are a powerful tool for addressing both limitations. DCGMs, learned from observational datasets, can represent causal relationships between variables. This allows them to be used to generate mechanistic hypotheses and to construct parsimonious, causally informed predictive models. However, biomedical datasets often have features that make it difficult to construct causal graphical models over the full dataset, including data type heterogeneity, high dimensionality, multicollinearity, cyclicity, and nonstationarity. To address these problems, I propose to develop methods for learning causal graphs in datasets characterized by (1) a heterogeneous mixture of continuous, categorical, and censored variables, (2) high dimensionality and multicollinearity, and (3) cyclicity and nonstationarity. In Aim 1, I will develop a new causal discovery algorithm that accommodates continuous, categorical, and censored (e.g., survival) variables. In Aim 2, I will compare matrix decomposition and dimensionality reduction methods on their ability to learn a meaningful low-dimensional latent feature space for use in graph learning methods.
In Aim 3, I will develop a new method for causal discovery in dynamic, possibly cyclic, gene regulatory networks at single-cell resolution. In all cases, testing and validation will be performed on both synthetic and real-world, publicly available datasets. These methodological improvements constitute important steps forward in the field of causal discovery, and they can be used together or independently to provide a flexible and powerful platform for the analysis of a wide range of biomedical datasets. Once made available, they will enable researchers to make inferences about causal mechanisms, generate hypotheses, and build robust, parsimonious predictive models.