Data science approaches to autism: environmental modulators of the transcriptome and gene-x-environment interactions - ABSTRACT The prevalence of autism spectrum disorder (ASD) has dramatically increased in recent years to a rate of 1 in 31 U.S. children. ASD risk is multifactorial, with genetic and environmental elements each playing a role. While major progress has been made in identifying causal genetic variants, environmental exposures, particularly prenatal ones, are increasingly recognized as critical contributors to neurodevelopmental risk. Over 200 high-production-volume chemicals are routinely detected in American adults, including pregnant women. However, the ways in which environmental factors contribute to ASD risk, and how they interact with genetic risk, remain unclear. Understanding how these environmental chemicals influence neurodevelopment at the cellular and molecular level is crucial for the development of mitigation strategies. Progress has been hampered by a lack of large-scale data assessing the impact of chemicals at the cellular and molecular level in human neural cells that faithfully model early human brain development. A major barrier has been the absence of scalable experimental platforms capable of evaluating compound exposures in ASD-relevant cell types with comprehensive molecular readouts. Most existing studies rely on immortalized cell lines, which do not faithfully model human brain development. In this project, “Data science approaches to autism: environmental modulators of the transcriptome and gene-x-environment interactions,” we address this substantial gap in the field by capitalizing on recent advances in genomics, stem cell technology, and data science. We will assess the activity of thousands of chemicals in the primary cell types implicated by ASD genetic risk: human neural progenitor cells and neurons.
We leverage human stem cell-based model systems, which have been shown to accurately and reliably model key aspects of human brain development. We couple these with high-throughput culturing and robotic systems, which provide a unique opportunity to efficiently screen the entire ToxCast II/III library of ~4,700 chemical compounds to which humans are exposed, in both primary human neural progenitor cells (phNPCs) and induced neurons (iNeurons). We use an unbiased, genome-wide measurement at the cellular level, single-nucleus RNA-seq, to comprehensively characterize the effects of chemical exposures on human neural development and to study the impacted gene regulatory networks and pathways. We will integrate these results with public exposure data and molecular profiling in the human brain to find overlapping pathways. We will then evaluate how genetic background modulates cellular responses by using cell villages to model gene–environment (GxE) interactions across a cohort of 150 donors, including 110 individuals with ASD and 40 neurotypical controls. Using this framework, we will map exposure-responsive expression quantitative trait loci (eQTLs) for more than 40 chemicals with a novel method developed by one of the PIs, enabling the identification of GxE interactions at scale. We will integrate these data with publicly available data sources (e.g., PsychENCODE and ECHO). We will model the biophysical interactions between relevant chemicals and proteins to provide molecular insights into the underlying mechanisms of action, namely the means by which compounds induce systemic changes in gene expression via perturbations in gene regulatory networks. We will build a public resource that will enable further mechanistic investigation.
Our project will: 1) provide insight into mechanisms of action underlying known autism risk factors, 2) identify novel chemical risk factors that impact essential pathways in early brain development, and 3) query how genetic factors modulate susceptibility and resilience to chemical exposures. These findings will be comprehensively integrated with existing databases and resources.