For decades, diagnostic errors have constituted a blind spot in the effort to improve health care quality.
Compared with the multitude of metrics available to assess the quality of treatment, clinicians and policymakers
have few tools with which to measure and improve the quality of diagnostic decisions. Without better methods
to systematically measure the quality of diagnostic decisions at the clinician level, it will continue to be difficult to
identify patterns in diagnostic errors, categorize types and causes at scale, and develop and evaluate
interventions to prevent them. Our long-term goal is to develop tools to measure diagnostic quality across clinical
providers from large-scale data, and to build frameworks and knowledge to translate those measures into
appropriate interventions. The objective of this application is to apply and validate a system for measuring
diagnostic quality across radiologists in the setting of pneumonia diagnosis among 5.5 million visits with chest
X-rays in Veterans Health Administration (VHA) emergency departments (EDs). In this project, we will address
three challenges fundamental to any data-driven approach to measuring quality of diagnostic care. The first is a
lack of observable ground truth against which to benchmark diagnoses, particularly in large-scale data. This
challenge is especially problematic when policies seek to balance type I errors (false positives) against type II
errors (false negatives). Second, rates of diagnostic errors depend on the underlying prevalence of disease in
the patient population, which may be incompletely observed. Third, small case numbers per clinician can
complicate comparisons between clinicians, since measured differences may reflect underlying diagnostic
quality or may arise from random noise. We will address these challenges with a novel combination of methods
from statistical classification and applied economics, building on prior work. We propose the following specific
aims: (1) We will validate data-driven measures of pneumonia diagnoses and diagnostic outcomes. In prior
conceptual work building on the econometric literature of selection, we show that we may infer relative
differences in diagnostic quality—as differences in type I error rates and type II error rates—even if individual
type I errors are unobservable, under quasi-experimental assignment of cases to radiologists; (2) We will
interpret provider-level rates of type I error and type II error in a receiver operating characteristic (ROC) framework in
which diagnostic errors may arise from incorrect diagnostic thresholds (trading off type I and type II errors) or
poor diagnostic accuracy (incurring too many of both type I and type II errors); and (3) To explore the
determinants of clinician diagnostic quality, we will correlate our measures of diagnostic quality with the
characteristics and actions of thousands of radiologists. To assess the potential consequences of variation in diagnostic quality, we will
study health outcomes of patients quasi-experimentally assigned to radiologists of differing diagnostic quality.
Our project will lay the groundwork for data-driven measurement of diagnostic quality across clinical providers,
a necessary first step in understanding and improving the diagnostic performance of our health care system.
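
The distinction in aim (2) between an incorrect diagnostic threshold and poor diagnostic accuracy can be illustrated with a minimal sketch. It assumes a textbook equal-variance binormal signal model, which is our illustrative choice here and not necessarily the model the project will estimate; the function name `error_rates` and the parameters `accuracy` and `threshold` are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def error_rates(accuracy: float, threshold: float) -> tuple[float, float]:
    """Return (type I rate, type II rate) for one hypothetical radiologist.

    Assumed model (illustrative only): the radiologist observes a signal
    drawn from N(0, 1) when the patient has no pneumonia and from
    N(accuracy, 1) when the patient has pneumonia, and diagnoses
    pneumonia whenever the signal exceeds `threshold`.
    """
    type_1 = 1.0 - _STD_NORMAL.cdf(threshold)       # false positive rate
    type_2 = _STD_NORMAL.cdf(threshold - accuracy)  # false negative rate
    return type_1, type_2


# Same accuracy, stricter threshold: moves along one ROC curve,
# trading fewer type I errors for more type II errors.
baseline = error_rates(accuracy=2.0, threshold=1.0)
stricter = error_rates(accuracy=2.0, threshold=1.5)

# Lower accuracy: sits on a lower ROC curve, so both error rates
# are higher than baseline even at a comparably centered threshold.
less_accurate = error_rates(accuracy=1.0, threshold=0.5)

for label, rates in [("baseline", baseline),
                     ("stricter threshold", stricter),
                     ("lower accuracy", less_accurate)]:
    print(f"{label}: type I = {rates[0]:.3f}, type II = {rates[1]:.3f}")
```

In this sketch, changing only the threshold lowers one error rate at the expense of the other, whereas lowering accuracy raises both, which is the pattern the ROC framework uses to separate the two sources of diagnostic error.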