Geostatistical Software for Non-Parametric Geostatistical Modeling of Uncertainty - 7. Project Summary/Abstract
A key component of any investigation of associations and/or cause-effect relationships between the
environment and health outcomes is the availability of accurate and precise models of exposure. Because the
cost of collecting field data is often prohibitive, it is critical to incorporate every available source of secondary
information to supplement sparse datasets. Secondary data can take many forms (e.g., continuous or categorical
measurement scales) and display different levels of reliability: hard vs. soft data (e.g., interval-type data,
probability distributions). Merging these different data layers while accounting for their spatial patterns,
their compositional nature (in the case of categorical attributes) and their local uncertainty is thus challenging.
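As a minimal sketch of what soft indicator coding can look like, the Python below codes three kinds of data at a set of thresholds z_k, using the classical indicator convention i(z_k) = 1 if z <= z_k, else 0: a hard measurement, an interval-type datum (indicators below and above the interval are known, those inside it are left undefined) and a local probability distribution (coded as its cumulative probability at each threshold). The function names, thresholds and the uniform distribution in the usage example are hypothetical illustrations, not the project's software.

```python
from typing import Callable, List, Optional, Sequence


def code_hard(z: float, thresholds: Sequence[float]) -> List[float]:
    """Hard datum: indicator is 1 if z <= z_k, else 0."""
    return [1.0 if z <= zk else 0.0 for zk in thresholds]


def code_interval(lo: float, hi: float,
                  thresholds: Sequence[float]) -> List[Optional[float]]:
    """Interval datum z in [lo, hi]: known 0 below lo, known 1 at or above hi,
    undefined (None) for thresholds that fall inside the interval."""
    coded: List[Optional[float]] = []
    for zk in thresholds:
        if zk < lo:
            coded.append(0.0)   # z >= lo > z_k, so z > z_k for sure
        elif zk >= hi:
            coded.append(1.0)   # z <= hi <= z_k, so z <= z_k for sure
        else:
            coded.append(None)  # the interval straddles z_k: no hard indicator
    return coded


def code_distribution(cdf: Callable[[float], float],
                      thresholds: Sequence[float]) -> List[float]:
    """Soft datum given as a local distribution: prior probability F(z_k)."""
    return [cdf(zk) for zk in thresholds]


if __name__ == "__main__":
    thresholds = [1.0, 3.0, 5.0, 7.0]  # hypothetical thresholds z_k

    def uniform_cdf(z: float) -> float:
        # Hypothetical soft datum: Z ~ Uniform(2, 6)
        return min(max((z - 2.0) / 4.0, 0.0), 1.0)

    print(code_hard(4.0, thresholds))                  # [0.0, 0.0, 1.0, 1.0]
    print(code_interval(2.0, 6.0, thresholds))         # [0.0, None, None, 1.0]
    print(code_distribution(uniform_cdf, thresholds))  # [0.0, 0.25, 0.75, 1.0]
```

The undefined entries of the interval datum are what a soft-kriging step would have to fill in, whereas the distribution-type datum already supplies a prior probability at every threshold.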
This SBIR project is developing the first commercial software to offer tools for soft indicator coding and
non-parametric geostatistical modeling of uncertainty. The research product will be a stand-alone desktop
space-time (ST) analysis and visualization tool, building on the legacy core software developed by BioMedware.
These tools will also be suited to the analysis of data outside the health sciences, such as in remote sensing,
geochemistry, urban infrastructure or soil science, significantly broadening the commercial market for the end
product. This project will accomplish four aims:
Develop an indicator kriging alternative to Poisson and binomial kriging for filtering the noise caused by the
small number problem and for disaggregating areal rate data (Area-to-Point kriging), while avoiding the
generation of negative kriging estimates.
Implement simplicial indicator kriging to predict the probability of occurrence of categorical data and,
using the composition of service lines (SL) in Flint, Michigan, as a case study, compare the accuracy of this
compositional approach with: 1) traditional indicator kriging, which can yield negative probabilities of
occurrence and probabilities that do not sum to one, and 2) a combination of machine learning and
Bayesian data analysis used by BlueConduit, a US leader in SL composition prediction.
Develop and test a prototype module that will guide non-expert users through the soft indicator coding of
information and variogram modeling, followed by spatial interpolation and cross-validation, based on
BioMedware’s space-time visualization and analysis technology.
Conduct a usability and user experience study and identify additional methods and tools to consider in
Phase II.
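The concern in the first aim about negative estimates arises because kriging weights are not constrained to be non-negative. As an illustrative sketch (not the project's indicator-based method), the stdlib-only Python below solves a 1D ordinary kriging system with a hypothetical Gaussian covariance model and hypothetical rates at three locations. The middle datum, screened by the datum closest to the target, receives a small negative weight, so a high rate there pulls the estimate below zero even though every observed rate is non-negative.

```python
import math
from typing import List, Sequence


def gaussian_cov(h: float, sill: float = 1.0, a: float = 1.0) -> float:
    """Hypothetical Gaussian covariance model C(h) = sill * exp(-(h/a)^2)."""
    return sill * math.exp(-(h / a) ** 2)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging_weights(xs: Sequence[float], x0: float) -> List[float]:
    """Ordinary kriging weights in 1D: sum_j w_j C(x_i - x_j) + mu = C(x_i - x0)
    for every datum i, plus the unbiasedness constraint sum_j w_j = 1."""
    n = len(xs)
    a = [[gaussian_cov(abs(xi - xj)) for xj in xs] + [1.0] for xi in xs]
    a.append([1.0] * n + [0.0])
    b = [gaussian_cov(abs(xi - x0)) for xi in xs] + [1.0]
    sol = solve(a, b)
    return sol[:n]  # drop the Lagrange multiplier mu


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0]      # hypothetical data locations
    rates = [0.0, 50.0, 0.0]  # hypothetical non-negative rates
    w = ordinary_kriging_weights(xs, x0=3.0)
    estimate = sum(wi * zi for wi, zi in zip(w, rates))
    print([round(wi, 3) for wi in w])  # the middle weight is negative
    print(round(estimate, 2))          # a negative "rate"
```

Negative weights of this kind are typical of continuous covariance models such as the Gaussian one (the screen effect); any estimator built as an unconstrained linear combination of the data can therefore leave the range of admissible values.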
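To illustrate the contrast drawn in the second aim, the sketch below applies the same hypothetical weights to three neighboring probability vectors in two ways: an ordinary (Euclidean) linear combination, as in traditional indicator kriging, and a linear combination in the Aitchison geometry of the simplex, the geometry in which simplicial indicator kriging operates. It assumes one scalar weight per neighbor shared by all categories, a simplification of how such weights would actually be obtained (traditional indicator kriging typically uses a separate variogram per category, which is also why its probabilities need not sum to one). The service-line categories, probabilities and weights are hypothetical.

```python
import math
from typing import List, Sequence


def closure(v: Sequence[float]) -> List[float]:
    """Rescale a positive vector so that its parts sum to one."""
    total = sum(v)
    return [x / total for x in v]


def euclidean_combination(weights: Sequence[float],
                          vectors: Sequence[Sequence[float]]) -> List[float]:
    """Ordinary linear combination, as in traditional indicator kriging."""
    return [sum(w * v[k] for w, v in zip(weights, vectors))
            for k in range(len(vectors[0]))]


def aitchison_combination(weights: Sequence[float],
                          compositions: Sequence[Sequence[float]]) -> List[float]:
    """Linear combination in the simplex, C(prod_i x_i ** w_i); this equals
    combining clr coordinates linearly and back-transforming with closure(exp(.))."""
    logs = [sum(w * math.log(c[k]) for w, c in zip(weights, compositions))
            for k in range(len(compositions[0]))]
    top = max(logs)  # subtract the max before exp() for numerical stability
    return closure([math.exp(x - top) for x in logs])


if __name__ == "__main__":
    # Hypothetical soft-coded probabilities of (lead, copper, galvanized) at
    # three neighboring service lines; zeros are avoided because log-ratios
    # require strictly positive parts.
    neighbors = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    weights = [0.7, 0.7, -0.4]  # hypothetical kriging weights, summing to 1
    print(euclidean_combination(weights, neighbors))  # galvanized part < 0
    print(aitchison_combination(weights, neighbors))  # all parts > 0, sum 1
```

Because the simplicial combination is built from products of strictly positive parts followed by closure, its output is always a valid probability vector, whatever the signs of the weights.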
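The variogram-modeling step mentioned in the third aim usually starts from an experimental indicator semivariogram. A minimal stdlib-only sketch, assuming 2D point coordinates, indicators coded at a single threshold and omnidirectional lag classes of a hypothetical width `lag` (`math.dist` requires Python 3.8 or later), is:

```python
import math
from typing import List, Optional, Sequence, Tuple


def indicator_semivariogram(coords: Sequence[Tuple[float, float]],
                            indicators: Sequence[float],
                            lag: float,
                            n_lags: int) -> List[Tuple[int, Optional[float], int]]:
    """Experimental indicator semivariogram
    gamma(h_k) = 1 / (2 N(h_k)) * sum over pairs in class k of (i_a - i_b)^2,
    where class k collects pair distances in [(k - 0.5) * lag, (k + 0.5) * lag).
    Returns (k, gamma or None if the class is empty, N(h_k)) for k = 1..n_lags;
    pairs closer than lag / 2 or beyond the last class are ignored."""
    sums = [0.0] * (n_lags + 1)
    counts = [0] * (n_lags + 1)
    for a in range(len(coords)):
        for b in range(a + 1, len(coords)):
            h = math.dist(coords[a], coords[b])
            k = math.floor(h / lag + 0.5)  # nearest lag class
            if 1 <= k <= n_lags:
                sums[k] += (indicators[a] - indicators[b]) ** 2
                counts[k] += 1
    return [(k, sums[k] / (2 * counts[k]) if counts[k] else None, counts[k])
            for k in range(1, n_lags + 1)]


if __name__ == "__main__":
    # Hypothetical transect with unit spacing and indicators 1, 1, 0, 0
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    for k, gamma, n_pairs in indicator_semivariogram(pts, [1.0, 1.0, 0.0, 0.0],
                                                     lag=1.0, n_lags=3):
        print(k, gamma, n_pairs)
```

For this transect the three lag classes give gamma = 1/6, 0.5 and 0.5 from 3, 2 and 1 pairs. A permissible model (e.g., spherical or exponential) would then be fitted to such points before kriging and cross-validation.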
These technological, scientific and commercial innovations will enhance our ability to geostatistically model
multivariate space-time phenomena and to compute estimates and their associated uncertainty at the scales
most relevant to environmental epidemiology (e.g., point locations, census tracts).