New Statistical Methods for Estimating Health Effects of Environmental Exposures

PROJECT ABSTRACT

This project will develop, test, and apply new statistical methods to address analytical challenges associated with three key research questions in environmental epidemiology: (1) how to utilize exposure data at fine spatial resolution in health analyses, (2) which individual- and area-level factors make an individual more vulnerable to harmful environmental exposures, and (3) what are the joint health effects of environmental exposure mixtures? We propose methods that draw on state-of-the-art and emerging ideas from functional data analysis, Bayesian analysis, and machine learning, while grounding the work in established environmental health study designs, current analytic practices, and real-world data sources. In Aim 1, we will develop Bayesian scalar-on-quantile-function regression models to estimate associations between aggregated health outcomes and population-wide exposure distributions. This aim is motivated by the increasing availability of spatially resolved exposure products. We will treat the exposure quantile function as a functional covariate to better characterize health effects associated with shifts in different parts of the exposure distribution. In Aim 2, we will extend Bayesian additive regression trees (BART) models for estimating heterogeneous associations so that they can handle matched case-control data arising from the case-crossover design. Because BART can capture nonlinear and complex interactions, it offers a strategy for considering multiple effect modifiers simultaneously. We will also develop tools to visualize and summarize heterogeneous associations. In Aim 3, we will develop Bayesian Gaussian process regression to estimate joint effects of environmental exposure mixtures. We will characterize the exposure-response surface within a more computationally practical and robust framework based on random Fourier feature approximation.
In this framework, we will also examine how to perform variable selection and dimension reduction, handle missing data, and characterize differences between subgroups. In Aim 4, we will create R packages that handle continuous, binary, count, and clustered outcomes to facilitate adoption of these methods by the wider scientific community. Each methods development area will be paired with motivating and ongoing epidemiologic studies. These include (1) estimating short-term health effects of wildfire-related air pollution and ambient temperature on emergency department (ED) visits in multiple U.S. states, (2) identifying heterogeneity in the effects of temperature on heat-related ED visits by individual comorbidity, medication, and residential built environment using electronic health data from Emory Healthcare, and (3) estimating associations between various environmental exposures and adverse pregnancy outcomes in deeply phenotyped African-American mother-child dyads and among deliveries at Emory Healthcare. Overall, this project will provide timely analytic methods motivated by methodological gaps, emerging data sources, and research priorities in environmental health. Our Bayesian approaches to distributional covariates, tree-based models for matched case-control data, and fast Gaussian process regression may also be relevant to other biomedical and public health research areas.
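
As a rough illustration of the random Fourier feature idea mentioned under Aim 3, the sketch below approximates a squared-exponential Gaussian process kernel with a finite set of random cosine features and then fits a conjugate Bayesian linear model on those features. It is not the project's method or software: the function names, the simulated three-exposure data, the lengthscale, the number of features, and the noise and prior variances are all assumptions chosen for illustration.

```python
import numpy as np


def rff_features(X, n_features, lengthscale, rng):
    """Random Fourier features for a squared-exponential kernel (illustrative)."""
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density, N(0, 1 / lengthscale^2).
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def rbf_kernel(X, lengthscale):
    """Exact squared-exponential kernel matrix, used only for comparison."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def posterior_mean_weights(Phi, y, noise_var, prior_var=1.0):
    """Posterior mean of w under w ~ N(0, prior_var * I), y = Phi w + noise."""
    n_features = Phi.shape[1]
    precision = Phi.T @ Phi / noise_var + np.eye(n_features) / prior_var
    return np.linalg.solve(precision, Phi.T @ y / noise_var)


# Simulated mixture of three exposures with a nonlinear, interacting response
# (hypothetical data, not from any study named in the abstract).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=200)

Phi = rff_features(X, n_features=2000, lengthscale=1.0, rng=rng)

# Phi @ Phi.T should be close to the exact kernel matrix.
max_kernel_error = np.max(np.abs(Phi @ Phi.T - rbf_kernel(X, 1.0)))

# Fitted exposure-response values at the observed exposure combinations.
w_mean = posterior_mean_weights(Phi, y, noise_var=0.01)
fitted = Phi @ w_mean
```

In this sketch the cost of the fit is governed by the number of random features rather than by the sample size, which is the usual motivation for the approximation when a Gaussian process is applied to larger datasets.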