PROJECT SUMMARY
This application is being submitted in response to the Notice of Special Interest (NOSI) identified as NOT-CA-22-056.
Background. The specific aims of the parent grant (RF1AG071024) are to estimate the risk of mild cognitive
impairment (MCI), Alzheimer’s disease (AD), and AD-related dementias (ADRD) associated with wildfire
particulate matter (PM2.5) (Aim 1), to identify individual- and area-level susceptibility factors that exacerbate the
association between wildfire PM2.5 and MCI and AD/ADRD (Aim 2), and to estimate the risk of MCI and AD/ADRD
associated with living near a wildfire disaster and the extent to which specific sub-groups have better or worse
outcomes (Aim 3).
As part of the work conducted in Aims 1 and 2 of the parent R01, we are modeling daily exposure to wildfire-
specific PM2.5 levels using a two-stage machine learning (ML) approach. We have curated and processed a large
quantity of data from a range of sources including weather variables, satellite data, and Environmental Protection
Agency (EPA) monitor data, to model wildfire-specific PM2.5 levels. While we have expended
considerable effort on data curation, we have not focused on making the data Artificial Intelligence (AI)/ML-
ready and publicly available, both for our own researchers and for the broader research community. The data
sources required for effective wildfire analysis are disparate, difficult to access, and poorly suited to AI/ML
applications. Although the data is rich and publicly available through US agencies, acquiring it and preparing it
for analysis requires a significant investment from any researcher.
Overall Goals and Aims. With this administrative proposal, we plan to establish a new collaboration with AI/ML
and data experts at Harvard University with the goals of improving the wide range of data sources,
developing reproducible pipelines, annotating, documenting, and processing the data, ensuring computational
scalability, encouraging community engagement, and disseminating these important AI/ML-ready datasets for
the prediction of wildfire PM2.5 to a wider research community. Our specific aims are to improve the data for
AI/ML readiness (Aim 1), make the data publicly available to AI/ML applications (Aim 2), and demonstrate the
transformed data in an AI/ML application to predict wildfire PM2.5 exposure for California (Aim 3).
Impact. The final datasets will be AI/ML-ready, reproducible, and disseminated to a wide user base. We will build
a collaborative environment allowing both internal and external researchers to use, contribute to, and improve the
data inputs. This work will serve as a foundation, both for our group and for the broader community, for predicting
wildfire PM2.5 exposures across the entire US, and will strengthen the aims of the parent R01.