PROJECT SUMMARY
The National Institute on Aging (NIA) has recommended strengthening research infrastructures to address
future aging research questions (2016 Data Infrastructure Review Committee Report and PAR-16-367). In
particular, it recommends: 1) integrating biological data into larger population-based studies; 2) increasing
use of electronic health record (EHR) data and linking to medical care claims data; and 3) developing new
approaches to collecting data to answer important scientific questions about mechanisms of aging.
The Rochester Epidemiology Project (REP; NIA R01 AG034676) is a unique infrastructure for studies of aging,
because the REP collects longitudinal EHR data on all health conditions that come to medical attention for a
large, Midwestern population. Therefore, the REP allows investigators to study all age-related diseases and
outcomes. However, the REP has three significant gaps. First, the REP does not include biospecimens.
Second, the REP is missing health care delivered outside of the health care institutions that partner with the
REP, and it does not include information on filled prescriptions. Third, a significant proportion of EHR data is
difficult to access due to two factors: 1) the full text of the EHRs includes extensive clinical notes about aging
outcomes and geriatric syndromes, but these notes are not routinely coded for billing, and can only be
accessed through laborious manual review; and 2) the REP health care partners use three different EHR
systems, making it difficult to apply electronic data extraction tools across all partners.
To address these three gaps, we will develop an interdisciplinary collaboration across experts in aging
research, epidemiologic methods, biobanking, and medical informatics to create a new, comprehensive
research infrastructure (“Bio-REP”) to support aging research. In the R21 phase, we will build this
infrastructure by linking the REP data with Mayo Clinic Biobank biospecimens and with medical claims data
from the Centers for Medicare and Medicaid Services (CMS; Aim 1), and by extracting geriatric syndrome
data from the unstructured EHR clinical notes using natural language processing (NLP) techniques (Aim 2).
In the R33 phase, we will deploy the NLP algorithms developed in Aim 2 on the clinical notes from the two
additional EHR systems (Aim 3), and we will conduct two demonstration projects. First, we will
measure associations between novel aging-related biomarkers and aging-related outcomes (Aim 4). Second,
we will determine whether two common medications that are hypothesized to impact aging (metformin and
angiotensin receptor blockers) modify associations between aging biomarkers and aging outcomes (Aim 5).
The new, robust Bio-REP infrastructure will support a wide range of efficient, cost-effective observational
studies to characterize associations between aging-related biomarkers and specific diseases, geriatric
syndromes, and drug utilization. Such studies are urgently needed to inform the design of effective clinical trials to improve
the health span of the aging population.