Health equity and the impacts of EHR data bias associated with social determinants - Project Summary / Abstract

Achieving optimal health in the United States is challenging, in part due to inequities in social determinants of health (SDoH) such as financial security, experiences of discrimination, and healthcare access. These inequities may manifest as biases in the data collected in electronic health records (EHRs) during health care and, in turn, be propagated in the research and healthcare activities that use those data. In other words, real-world data will reflect real-world biases and inequities. A biased healthcare system will produce biased data, and analyses performed with biased data will produce biased results. Without appropriate understanding and intervention, these biases will perpetuate themselves, ultimately furthering inequity in health and healthcare.

Healthcare delivery increasingly relies on clinical risk prediction and risk assessment algorithms that use EHR data to help identify patients who are at risk, allocate health system resources, and inform healthcare decisions. Even algorithms designed to be equally valid for all patients will produce biased results when they are applied to biased data. To improve equity in health and healthcare, it is vital that we understand the biases in EHR data that are associated with SDoH and develop methods that ensure risk prediction algorithms produce valid results for all patients. Therefore, the objectives of the proposed work are to:

1) Characterize the patterns of bias in EHR data
2) Identify latent and observed factors that drive mechanisms of poor data quality
3) Evaluate the impact of data bias on clinical tasks that rely on EHR data
4) Evaluate structural modeling and debiasing methods to improve analyses conducted with EHR-derived datasets that contain bias

We will use data from OCHIN, a large community-based practice network that provided care for approximately 1.8 million unique patients between 2018 and 2020. First, we will identify associations between SDoH and EHR data quality. Second, we will evaluate the accuracy of a set of representative clinical risk prediction and risk assessment algorithms to characterize the relationship between EHR data quality, algorithm performance, and SDoH. Finally, using structural models and the relationships defined in the first two aims, we will model the performance of clinical risk prediction and assessment algorithms in the EHR and examine strategies for incorporating SDoH information to improve their accuracy and support appropriate clinical decision-making at the point of care.