PROJECT SUMMARY/ABSTRACT
The long-term objective of this research is to provide a sustainable data infrastructure and web-based portal
to facilitate research into the environmental causes of disease. In this proposal, we undertake early-stage
development of technologies to demonstrate the utility of the proposed data and computing infrastructure.
Public health research about the environmental causes of disease is increasingly focused on multi-pollutant
exposures that occur over the lifespan. Compiling, managing, and analyzing environmental quality data
for use in epidemiologic studies, however, requires significant effort and expertise in computer
science, biostatistics, and exposure analysis, which may not be available to all researchers. To address this
challenge, our specific aims are to 1) develop a sustainable data infrastructure that integrates and
interpolates multiple environmental exposure data items to generate daily values on a 5-km spatial grid; 2)
develop an interactive web-based portal as the gateway for public health researchers to access and analyze
environmental quality data and epidemiologic data by integrating data mining, analytics,
visualization, validation, and dissemination functions; and 3) substantiate the utility of the data infrastructure and portal
for public health research through pilot studies with use cases in environmental epidemiology research. The
primary analysis methods in environmental epidemiology studies can be implemented by integrating
commercial and open-source software modules to analyze exposure data with temporal and spatial dimensions.
Open-source JavaScript libraries will be applied to enhance online data visualization, and
ArcGIS Server can be used to run spatial statistics over exposure and health outcome data. A special
layer will be implemented to enhance data privacy in online data integration and analytics. This
computing platform will also let users upload environmental quality data (such as data generated by local
sensor networks) and health outcome data (individual or ecological) for integration into the data
infrastructure. We will build the initial data infrastructure with temperature, criteria air pollutant, land use, traffic
network, and census data, and will add other variables as time permits. The research team has expertise in
data mining and privacy research, user design and software engineering, distributed and Internet GIS, and
high-performance geocomputation. Inclusion of public health researchers with expertise in environmental
epidemiology and exposure science will help to ensure that the design and functionality of the tools meet the
needs and expectations of public health researchers. We will test the functionality of the portal using a variety
of data types and test its usability through several pilot-scale epidemiologic studies. The results
will be used to develop user guidance documents and to disseminate this research through workshops.