Most health systems attempt to measure patients' social risk factors, but such data collection is typically fraught
with operational and conceptual difficulties. Multi-domain screening questionnaires face reliability, validity, and
workflow challenges. Area-level data are not valid proxies for individual characteristics. Diagnosis codes are
underutilized. The day-to-day use of natural language processing (NLP) to extract social factors from text is
beyond the capacity of most organizations. Thus, health care organizations need more implementable and valid
approaches to measuring social factors. With such approaches, health systems can more
effectively address the negative cost, quality, and health outcomes associated with patients' social risk factors.
The objective of this proposal is to assess the validity of patient-level computable social factor phenotypes for
use in predicting patients' risk of increased healthcare costs and utilization. Computable phenotypes are com-
posites of characteristics defined through single data elements or a collection of data elements, observations or
events. Because these phenotypes derive from existing healthcare operations and electronic data systems, they
are well-positioned for widespread implementation. Our central hypothesis is that phenotypes computed from
existing structured demographic, clinical, and business operations data will support equally or more valid infer-
ences about patient social risks than other measurement approaches. Building upon strong preliminary data and
direction from experts in the field, we will determine the validity and usefulness of six novel social factor
phenotypes computed from already collected information within EHRs and health information exchanges (HIEs) through
the following aims. Aim 1: Assess the concurrent validity of patient-level computable social factor phenotypes.
We will compare the concurrent validity of computed phenotypes, multi-domain questionnaires, and NLP against
gold-standard measures of social factors in two health systems. Aim 2: Assess the predictive validity of
patient-level computable social factor phenotypes. We will assess the validity of computable phenotypes,
multi-domain questionnaires, NLP, and combined approaches in predicting costs and utilization. Aim 3: Assess the
reliability (bias) of patient-level computable social factor phenotypes across patient gender, race, ethnicity,
and age. We will assess the reproducibility of measurement approaches across underserved populations. We will
employ a multi-method
research approach to identify and mitigate potential bias. This project will lead to more valid and implementable
approaches to patient social factor measurement. The proposed research is significant because it directly ad-
dresses the challenges organizations face in addressing patients' social risks and will provide key inputs to
support organizations' efforts to achieve a learning health system. This proposal is innovative because it
advances the psychometrics of social factors and identifies novel uses of EHR and HIE data. By working with multiple and
diverse populations, we address the priority populations of socioeconomically disadvantaged, racial minority
populations, and the elderly.