PROJECT SUMMARY
There is a lack of well-developed machine learning tools for patient monitoring and diagnosis in the clinical hospital
setting. This absence is particularly relevant for an intensive care unit (ICU), where structured and
unstructured data are continuously recorded on numerous aspects of the health status of each patient. The
methods that have been developed are largely confined to the research literature, and they are focused
on models/algorithms trained on a single data type such as continuous response vitals data or natural language
data. Likewise, the evaluation of these machine learning tools is often focused on a single metric such
as out-of-sample prediction error. For machine learning tools to become truly effective, they need to be built
from models that are able to incorporate varying data types, simultaneously, for making inferences on the health
state of a patient, and they need to be evaluated on a variety of metrics. Some of these metrics must be precise
(e.g., out-of-sample prediction error and false negative/positive rates), but qualitative metrics must also be
considered, such as clinical utility/feasibility, scalability, and the practicality of the user interface for a clinician.
By analogy to hypothesis testing problems, there is an important difference between statistical significance and
practical significance. The proposed research is aimed at developing statistical methodology to address these
key aspects, and to engineer machine learning tools to be applied to hospital patient monitoring and diagnosis.
In particular, the focus is on rapid identification of critically ill patients at risk for bleeding and physiological
deterioration such as shock. For the purpose of this research, shock is divided into four categories: hypovolemic
shock, distributive shock, neurogenic shock, and cardiogenic shock. Historical ICU patient encounter data is
gathered with numerous examples of patients exhibiting each of these health states, as well as baseline
encounters exhibiting no shock. However, the timing and detection of shock by clinician diagnosis are imprecise
and subject to error. Accordingly, training data labels are only ever partially available, and the developed
machine learning methodology will account for the semi-supervised nature of the problem. To make inference
on the shock-related health state of ICU patients the machine learning methodology will incorporate a variety of
response data types. These emitted responses include continuously monitored vitals data, laboratory results,
functional waveform data on blood pressure, unstructured text data from clinician and procedural notes, and typical
cross-sectional data on medical history and demographic information. Addressing the data integration challenges of
building all of these responses into a single parsimonious model will be a strong contribution of the proposed
research. Additionally, the proposed research plan spans from the methodological development of the research
ideas to production-end software with a clinical user interface.