PROJECT SUMMARY/ABSTRACT
Randomized clinical trials (RCTs) are the gold-standard method of evaluating treatments for cancer, a disease that
imposes immense health and economic burdens worldwide. However, practical considerations that allow an RCT to be
conducted typically require a relatively small sample size and restricted eligibility criteria such that the study
has inadequate power to generalize treatment effects to elderly patients or other under-represented patient
populations. On the other hand, massive real-world data (RWD) are increasingly captured by population-based
databases and registries, such as Surveillance, Epidemiology, and End Results (SEER), SEER-Medicare, and
National Cancer Database (NCDB), that have much broader demographic and clinical diversity compared to RCT
cohorts. Treatment evaluation using causal inference methods and RWD that were not collected purely for
research purposes is now frequently performed but is fraught with limitations such as confounding due to lack of
randomization. In fact, the agreement between RCT and RWD findings is often low in analyses of matched
RCT and RWD studies with the same treatment comparisons. Although several national organizations and
regulatory agencies have advocated using RWD to complement RCTs, methods that integrate these two potentially
complementary data sources and achieve better treatment evaluation over the use of a single data source alone
have yet to be developed.
This proposal is motivated by the PIs' collaborative work to study the safety and efficacy of treatment strategies
for elderly non-small cell lung cancer (NSCLC) and esophageal cancer patients by integrating data from multiple
sources: RCTs from NCI cooperative groups and real-world databases (e.g., SEER, SEER-Medicare, and
NCDB). The objective of this project is to develop new statistical methods for integrative analyses of RCTs and
RWD that can improve the generalizability of RCT findings to more diverse "real-world" patients and under-studied
populations and increase estimation efficiency, while avoiding the confounding bias inherent in RWD. In
Aim 1, we develop methods for statistical analysis of RCT data to compare chemoradiotherapy patterns for the
real-world and elderly NSCLC patients by leveraging the baseline covariates of comparable patients from SEER,
for whom both the temporal information on chemotherapy and radiation and the outcome are missing. Aims 2
and 3 focus on settings in which both the RCT and the RWD provide comparable covariates, treatment, and outcome
information. In Aim 2, we develop improved analysis of RCT data to evaluate trimodality therapy versus surgery
alone for the real-world and elderly esophageal cancer patients by exploiting the large sample size and predictive
power offered by the NCDB/SEER-Medicare. In Aim 3, we develop new efficient and data-adaptive methods to
estimate individualized treatment effects of adjuvant chemotherapy versus observation, possibly modified by age
and tumor size, for stage IB resected NSCLC patients by integrating RCT and NCDB data.