Understanding causes of autism, its heterogeneity, and rising prevalence with emerging methods in causal inference

Abstract

Over the past five decades, autism prevalence in the United States has risen from fewer than 1 in 2,000 children to approximately 1 in 31. Over the same period, the distribution of early-life exposures, such as parental age, maternal diabetes, and prenatal complications, has changed dramatically. These parallel trends raise two urgent questions: Are these early-life factors contributing to the increase in autism diagnoses? And how might they shape outcomes that are meaningful to autistic individuals and their families?

This project addresses these questions by applying modern causal inference methods to large, population-based datasets. Traditional regression models estimate associations; causal models, by contrast, allow researchers to simulate what would happen if a single exposure were changed by intervention, with pre-existing factors left as they were. Under explicit assumptions (for example, that the important confounders are measured), this enables estimation of both how much a risk factor contributes to autism diagnosis and related outcomes, such as communication or emotional regulation, and how much of the increase in autism prevalence can be attributed to historical shifts in these factors. By combining these models with flexible, data-driven approaches that capture the heterogeneity of autism, the project may also identify distinct pathways through which early-life factors lead to different autism phenotypes.

The study has four specific aims. First, the team will prepare two complementary datasets for causal analysis: (1) the Study to Explore Early Development, a multisite case-control study led by the CDC that includes extensive behavioral, developmental, perinatal, and biological data; and (2) electronic health records from a large Midwestern health system, linked to birth certificates. Second, causal models will be used to estimate the effects of early-life exposures on autism occurrence.
Third, exposures will be jointly modeled with autism phenotypes to uncover causal pathways that traditional analytic approaches may obscure. Fourth, these models will be applied to longitudinal data to estimate the portion of the increase in autism prevalence attributable to historical shifts in early-life exposures.

Findings will clarify which early-life factors have the strongest causal effects, which developmental outcomes they influence, and the extent to which they may have contributed to rising autism prevalence. Results will inform strategies for intervention and early identification and provide a foundation for prioritizing biological mechanisms and modifiable exposures. Code and models will be publicly released in a modular, template-based format to enable replication in other datasets such as CHARGE, SFARI, All of Us, and TriNetX. The research team brings expertise in autism epidemiology, causal inference, clinical informatics, and participatory methods, and includes individuals with lived experience of autism, helping ensure that the work is rigorous, actionable, and grounded in the perspectives of the autism community.
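The two causal quantities the abstract describes, the effect of an exposure on autism occurrence and the share of a prevalence increase attributable to a historical shift in that exposure, can be illustrated with a minimal sketch of the g-formula (standardization) on synthetic data. Everything here is hypothetical: the single discrete confounder L, the binary exposure A, the helper names, and the counts are invented for illustration and are not the project's data, models, or released code. The decomposition shown (keeping a later cohort's outcome model and swapping in an earlier cohort's exposure distribution) is one possible approach, not necessarily the one the project will use, and it assumes no unmeasured confounding given L, positivity, and an outcome model that is unaffected by the exposure shift.

```python
"""Toy g-formula (standardization) sketch on synthetic data.

All names, counts, and the single-confounder setup are hypothetical illustrations.
"""
from collections import Counter
from typing import Dict, List, Tuple

# (L, A, Y): discrete confounder, binary exposure, binary outcome (e.g., diagnosis).
Record = Tuple[int, int, int]


def expand(cells: Dict[Record, int]) -> List[Record]:
    """Turn {(L, A, Y): count} into a flat list of individual records."""
    return [key for key, n in cells.items() for _ in range(n)]


def prevalence(records: List[Record]) -> float:
    """Observed outcome prevalence P(Y=1)."""
    return sum(y for _, _, y in records) / len(records)


def crude_risk(records: List[Record], a: int) -> float:
    """Unadjusted P(Y=1 | A=a): an association, not a causal effect."""
    ys = [y for _, ai, y in records if ai == a]
    return sum(ys) / len(ys)


def risk_given(records: List[Record], a: int, l: int) -> float:
    """Empirical P(Y=1 | A=a, L=l); raises if the stratum is empty (positivity)."""
    ys = [y for li, ai, y in records if li == l and ai == a]
    if not ys:
        raise ValueError(f"positivity violated: no records with A={a}, L={l}")
    return sum(ys) / len(ys)


def exposure_prob(records: List[Record], l: int) -> float:
    """Empirical P(A=1 | L=l)."""
    a_vals = [ai for li, ai, _ in records if li == l]
    if not a_vals:
        raise ValueError(f"no records with L={l}")
    return sum(a_vals) / len(a_vals)


def g_formula_risk(records: List[Record], a: int) -> float:
    """Standardized risk if everyone had A=a: sum_l P(Y=1 | A=a, L=l) * P(L=l)."""
    n = len(records)
    l_counts = Counter(li for li, _, _ in records)
    return sum(risk_given(records, a, l) * c / n for l, c in l_counts.items())


def counterfactual_prevalence(late: List[Record], early: List[Record]) -> float:
    """Late-cohort prevalence if P(A=1 | L) had stayed at its early-cohort value.

    Keeps the late cohort's outcome model and confounder distribution and swaps
    in only the early cohort's exposure distribution within each stratum of L.
    """
    n = len(late)
    total = 0.0
    for l, c in Counter(li for li, _, _ in late).items():
        p1 = exposure_prob(early, l)
        mixed_risk = p1 * risk_given(late, 1, l) + (1 - p1) * risk_given(late, 0, l)
        total += mixed_risk * c / n
    return total


def fraction_attributable(late: List[Record], early: List[Record]) -> float:
    """Share of the observed prevalence rise removed by undoing the exposure shift."""
    observed_rise = prevalence(late) - prevalence(early)
    explained = prevalence(late) - counterfactual_prevalence(late, early)
    return explained / observed_rise


# Synthetic cohorts: exposure rises only in stratum L=1 (from 20% to 60%), and the
# later cohort also has a doubled baseline risk (e.g., changed diagnostic practice).
EARLY = expand({(0, 0, 1): 2, (0, 0, 0): 38, (0, 1, 1): 1, (0, 1, 0): 9,
                (1, 0, 1): 4, (1, 0, 0): 36, (1, 1, 1): 2, (1, 1, 0): 8})
LATE = expand({(0, 0, 1): 4, (0, 0, 0): 36, (0, 1, 1): 2, (0, 1, 0): 8,
               (1, 0, 1): 4, (1, 0, 0): 16, (1, 1, 1): 12, (1, 1, 0): 18})

crude_rd = crude_risk(LATE, 1) - crude_risk(LATE, 0)
causal_rd = g_formula_risk(LATE, 1) - g_formula_risk(LATE, 0)
print(f"crude risk difference:         {crude_rd:.3f}")
print(f"standardized risk difference:  {causal_rd:.3f}")
print(f"fraction of rise attributable: {fraction_attributable(LATE, EARLY):.3f}")
```

With these toy counts the crude risk difference (about 0.217) overstates the standardized one (0.150) because L is associated with both A and Y. Undoing the exposure shift lowers the later cohort's prevalence from 0.22 to 0.18, so roughly 31% of the simulated rise from 0.09 is attributed to the exposure; the remainder comes from the higher baseline risk built into the later cohort. Real analyses would replace the stratified proportions with fitted outcome models over many covariates.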