Gaining access to health data is a major barrier to developing and validating new AI methods for clinical
applications, since health data are protected by strict privacy laws. A significant obstacle in current data
provisioning is that existing methods for accessing and deidentifying health data are increasingly being
challenged for their effectiveness, with a common perception that it is generally impossible to fully
deidentify any health data set and still retain its utility for research purposes.
Synthetic data is a promising approach to this conundrum because it reconciles data innovation with
data privacy. The goal of synthetic data is to generate, from existing data, a dataset that is as realistic as
possible: one that maintains the statistical properties of the original dataset without risk of exposing
sensitive information. While synthetic data is not new in health care, it has so far been limited to
simple, single-modality, static datasets, which has severely constrained its impact.
The aim of this interdisciplinary research effort is the development of an algorithmic framework for the
faithful and privacy-preserving generation of heterogeneous, dynamic synthetic datasets to boost the
development of clinical decision support applications.
In the US, critical illness affects a significant number of people, with an estimated 4 million
admissions and 500,000 deaths per year. A sizable proportion of these patients suffer respiratory failure
requiring intubation. To increase the utility of algorithms in clinical settings such as the ICU, strategies
are needed to overcome barriers to the use of complex data. However, data to test and validate such
algorithms are difficult to obtain. The ICU is thus a prototypical setting where high-quality synthetic data
would be tremendously helpful in breaking through this data bottleneck while respecting health data privacy
laws. Accordingly, this project proposes to use a type of severe respiratory (lung) failure, acute respiratory
distress syndrome (ARDS), to study the use of synthetic data for the development of artificial
intelligence-based algorithms. Patients with ARDS experience substantial morbidity and mortality,
prolonged mechanical ventilation, high hospital-associated costs, and long-term physical and psychological
dysfunction. Using ARDS as an archetypical model to guide this research effort will ensure a successful
transition from theory to clinical practice.
RELEVANCE:
The results of this project will play a key role in advancing AI research in health care, especially in
high-risk, high-cost care settings such as the emergency department, operating room, and ICU. Specifically,
the project will improve detection and treatment of acute respiratory distress syndrome. More broadly, this
effort will contribute to more cost-efficient health care and improved patient outcomes.