PROJECT SUMMARY
Management, treatment, and diagnostic approaches for non-small cell lung cancer (NSCLC) have evolved in the
last decade from primarily empirical methodologies to objective strategies that rely on clinical characteristics of
the patient and morphological features of the nodule¹. The United States Preventive
Services Task Force (USPSTF) recommends that high-risk individuals be screened yearly with low-dose
computed tomography (LDCT), as this screening practice provides high sensitivity with acceptable specificity for
lung cancer². However, the introduction of LDCT as the primary screening modality for lung cancer has increased
the identification of indeterminate nodules. The increased detection rates caused by this screening practice
decrease the overall quality of life for at-risk individuals through repeated follow-up and the frequent need for
invasive procedures for what is likely a benign nodule. In this training grant, we aim to improve these
outcomes by enhancing the performance of deep neural networks (DNNs) in data-scarce domains, specifically
lung cancer. The overall hypothesis of this proposal is that DNN classification accuracy of indeterminate
lung nodules will be significantly improved through the use of pre-specified malignant nodule and
parenchymal morphological features that would not be readily extractable by a DNN directly from the
LDCT scans. We will address this hypothesis and achieve the goals of this proposal by augmenting the National
Lung Screening Trial (NLST) dataset to infer important morphological parenchymal features for malignant nodule
classification and by using ancillary data from the COPDGene dataset. The experiments proposed in Aim 1 will
explore the impact of using augmented morphological parenchymal features on the classification performance
of our deep neural networks. Aim 2 will explore the relative contribution of a contextually similar dataset,
COPDGene, for classification and parameter tuning. The proposed work will yield improved methods for
classifying indeterminate pulmonary nodules as either malignant or benign via an innovative approach for
training DNNs using domain knowledge and contextually related datasets in data-scarce domains. Ultimately,
the application of these approaches will improve our understanding of those parenchymal morphological features
that are most critical for discriminating malignant from benign pulmonary nodules. In addition, the training I will receive in the
course of these studies related to generating CT markers, detecting early lung cancer pathogenesis, and
computational modeling will serve as a solid foundation for my future career as an independent biomedical
investigator.