Geostatistical Software for Non-Parametric Geostatistical Modeling of Uncertainty - 7. Project Summary/Abstract
A key component in any investigation of association and/or cause-effect relationships between the
environment and health outcomes is the availability of accurate and precise models of exposure. Because the
cost of collecting field data is often prohibitive, it is critical to incorporate any source of secondary information
available to supplement sparse datasets. Secondary data can take many forms (e.g., continuous or categorical
measurement scale) and display different levels of reliability: hard data versus soft data (e.g., interval-type data,
probability distributions). Merging these different data layers while accounting for their spatial patterns,
compositional nature (in the case of categorical attributes) and local uncertainty is thus challenging. With the advent
of artificial intelligence (AI), particularly machine learning (ML), geostatistical predictive models have become
more sophisticated and effective. The marriage of geostatistics and AI empowers us to extract deeper insights
from spatial datasets, opening doors to predictive modeling, risk assessment, and optimized decision-making.
This SBIR project is developing the first commercial software to offer tools for soft indicator coding and non-
parametric geostatistical modeling of uncertainty, leveraging AI to analyze, interpret, and derive insights from
spatial data. The research product will be a stand-alone desktop (ST) analysis and visualization tool, building
on the legacy core software developed by BioMedware. These tools will also be suited to the analysis of data
outside the health sciences, such as in remote sensing, geochemistry, urban infrastructure or soil science,
significantly broadening the commercial market for the end product. This project will accomplish three aims:
Aim 1. Conduct further research developments to: 1) extend the new approach (quantile regression forest with
kriged data layer) developed in Phase I to include additional ML algorithms (i.e., support vector machines,
gradient boosting) and spatial data layers (e.g., eigenvectors of distance matrix) in the comparison study,
and 2) generalize cross-validation and performance measures to the multivariate case.
Aim 2. Complete a fully functional and tested soft indicator coding and ML geostatistical interpolation software
product ready for commercial distribution.
Aim 3. Conduct a formal usability study to evaluate the design of the prototype based on usability protocols
developed by the NIH involving (i) expert evaluation by the firm Tec-Ed and (ii) usability testing by
representative users.
These technological, scientific and commercial innovations will enhance our ability to build geostatistical models
of multivariate spatial phenomena and to compute estimates and their associated uncertainty at the scale (e.g., point
location, census-tract level) most relevant for environmental epidemiology.
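
For readers unfamiliar with soft indicator coding, the idea can be illustrated with a minimal sketch. It is not the project's implementation: the function names are hypothetical, and the Gaussian form chosen for the probability-distribution case is an assumption made only for illustration. Each datum is coded as a vector of cumulative indicators, one per threshold: a hard datum yields 0 or 1, an interval-type datum yields 0 or 1 outside the interval and an unknown value inside it, and a datum given as a probability distribution yields its cumulative probability at each threshold.

```python
from math import erf, nan, sqrt


def code_hard(value, thresholds):
    """Hard datum: indicator is 1 where value <= threshold, else 0."""
    return [1.0 if value <= t else 0.0 for t in thresholds]


def code_interval(lower, upper, thresholds):
    """Interval datum, known only to lie in (lower, upper].

    Thresholds at or below `lower` code as 0, thresholds at or above
    `upper` code as 1, and thresholds strictly inside the interval are
    left unknown (nan).
    """
    coded = []
    for t in thresholds:
        if t <= lower:
            coded.append(0.0)
        elif t >= upper:
            coded.append(1.0)
        else:
            coded.append(nan)
    return coded


def code_gaussian(mean, std, thresholds):
    """Soft datum given as a distribution (assumed Gaussian here):
    the indicator is replaced by the cumulative probability at each threshold."""
    return [0.5 * (1.0 + erf((t - mean) / (std * sqrt(2.0)))) for t in thresholds]


thresholds = [1.0, 2.0, 3.0]
hard = code_hard(2.0, thresholds)
interval = code_interval(1.5, 2.5, thresholds)
soft = code_gaussian(2.0, 1.0, thresholds)
```

In this sketch the three coded vectors share the same thresholds, so hard and soft data could in principle be combined in a single indicator-based analysis; how unknown values inside an interval are handled downstream is left open here.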