Integration of epidemiology, pathology, immunology and outcomes in colorectal cancer - ABSTRACT
Machine learning has the potential to transform pathologic diagnosis and to address the very limited access to
expert pathology in low-income countries. Routine histology images of solid tumors contain an immense number
of visual features that can be extracted and processed by artificial intelligence tools like machine learning, which
excels at basic image analysis tasks such as tumor detection. In addition, machine learning can predict
clinically relevant features directly from histology images, including microsatellite instability (MSI) and immune
features that independently predict prognosis and response to therapy. This large, multicultural, racially and
ethnically diverse study uses images of whole slides from routinely collected clinical specimens and applies
computational pathology methods and digital spatial expression profiling, together with clinical, epidemiologic
and genetic data, to quantifiably improve colorectal cancer (CRC) diagnosis, prognosis and predictive models.
The study goals will be accomplished
through three specific aims. In Aim 1, we will apply novel machine learning algorithms to whole slide images
to reproducibly identify MSI, histopathologic and immune features of colorectal cancer in racially/ethnically
diverse populations. We will study hematoxylin and eosin (H&E) slides from 6,751 CRC cases, digitizing existing
slides from 5,551 CRC cases and slides from 1,200 new CRC cases with contemporaneous clinical and epidemiologic
data. Then, we will apply
deep learning methods to accurately identify histopathologic features and immune characteristics of CRC. We
will use a robust training, validation, and testing design (70%/15%/15%) to ensure the rigor and reproducibility of
our findings. In Aim 2, we will test whether machine learning algorithms that predict MSI and immune features
related to CRC prognosis improve with the addition of clinical, epidemiologic, and germline genetic data. We will
use machine learning and statistical methods to test whether algorithms developed in Aim 1 improve prediction of
overall survival and response to therapy with the addition of supplemental information beyond whole slide digital
images. Finally, in Aim 3, we will compare the information derived from digital spatial profiling of expressed
proteins in colorectal tumors with the information derived from Immunoscore quantification of lymphocyte
populations at the tumor center (CT) and the invasive margin (IM), and explore whether these measures improve
the models developed in Aims 1 and 2 in a subset of samples. We will perform GeoMx digital spatial profiling of
56 proteins expressed in 150 TNM stage I-III colorectal cancers to compare the performance of digital spatial
profiling to Immunoscore, a scoring system relying exclusively on expression patterns of CD3+ and CD8+ T cells.
This study takes advantage of pathologic, epidemiologic, clinical, immunologic and germline genetic data from
racially/ethnically diverse CRC patients from California, Detroit, New York, Florida, Puerto Rico, Israel and Spain.
Our overarching goal is to improve the efficiency of colorectal cancer diagnosis using clinically impactful immune
profiles.
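
The 70%/15%/15% training, validation, and testing design described in Aim 1 can be illustrated with a minimal sketch. It assumes a simple seeded random split at the case level; the abstract does not specify how cases are assigned to each set. The function name `split_cases`, the case identifiers, and the seed are hypothetical and used only for illustration. A real design would likely also stratify by site, race/ethnicity, or MSI status, which is not shown here.

```python
import random


def split_cases(case_ids, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle case IDs and partition them into train/validation/test lists.

    Hypothetical helper: the split is random and unstratified, which is an
    assumption of this sketch and not a detail taken from the abstract.
    """
    ids = list(case_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)

    n = len(ids)
    n_train = int(n * fractions[0])
    n_val = int(n * fractions[1])

    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder, so every case is used once
    return train, val, test


# Case counts from the abstract: 5,551 existing plus 1,200 new CRC cases.
n_existing, n_new = 5551, 1200
case_ids = [f"case_{i:05d}" for i in range(n_existing + n_new)]  # made-up IDs

train, val, test = split_cases(case_ids)
print(len(train), len(val), len(test))
```

Splitting by case rather than by slide or image tile keeps all images from one patient in a single set, which avoids leaking patient-specific features from training into testing.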