Major barriers to chronic graft-versus-host disease (cGVHD) research and preemptive treatment are the
inability to predict, early after allogeneic hematopoietic cell transplantation (HCT), who will develop cGVHD,
and the lack of specific and sensitive risk biomarkers that identify cGVHD before clinical symptoms appear.
This project will use previously collected plasma and peripheral blood mononuclear cell (PBMC) samples from
BMT CTN 0201 and 1202, multicenter pediatric and adult studies (NCT00075816, NCT01879072, and NCT02194439),
and the Pasquarello tissue bank at the Dana-Farber Cancer Institute to analyze proteomic and cellular
signatures associated with impending onset of clinical cGVHD and with overall survival, comparing machine
learning (ML) with established statistical methods. Proposed markers are based on previously published
and unpublished studies and will include other
novel or hypothesized factors. We will use the two BMT CTN biorepositories and the NCT02194439 biorepository,
with a combined sample size of ~1300 HCT patients (669 with cGVHD versus 664 non-cGVHD controls) sampled at
day +90 post-HCT, to measure 14 plasma proteins [Stimulation 2 (ST2; the interleukin (IL)-33 receptor),
chemokine (C-X-C motif) ligand 9 (CXCL9), matrix metalloproteinase 3 (MMP3), osteopontin (OPN), C-C motif
chemokine 15 (CCL15), CD163, CXCL10, IL17, BAFF, B7H3, DKK3, IL1RACP, MCSF, and CCL5]. We will also use mass
cytometry to measure 35 markers on 10+ cell populations, totaling up to 300 parameters, in a cohort of 200
patients with available PBMCs and paired plasma at day +90±10 post-HCT. We will then be in a unique position
in the field of cGVHD to
address major questions: (a) Do plasma biomarkers, cellular biomarkers, or the combination of both provide
the best specificity and sensitivity? (b) Can we increase the sensitivity and specificity of cGVHD
biomarker panels by using ML methods? (c) Can we discover new key biologic drivers of cGVHD using ML
algorithms? Because ML techniques are likely to provide better prediction when applied to large amounts of
data with high-dimensional covariates and nonlinear relationships, we hypothesize that ML analysis will
increase the sensitivity and specificity of our panels and provide finer biological granularity. Specific
Aim 1 will address whether a
day-90 panel of 14 plasma biomarkers, measured in samples from ~1300 patients and analyzed with ML, predicts
risk of cGVHD with higher specificity and sensitivity than established statistical methods. Specific Aim 2
will address whether a day-90±10 panel of 35 cellular biomarkers, measured by single-cell mass cytometry and
analyzed with ML, predicts development of cGVHD in a discovery cohort of 30 cases and 30 controls. Specific
Aim 3 will address whether, in a validation cohort of 200 paired plasma/PBMC samples, a comprehensive
day-90±10 proteomic biomarker panel alone, a cellular biomarker panel alone, or a combined proteomic and
cellular biomarker panel best improves prediction of cGVHD
risk. Upon completion, these studies will result in novel biomarker panels that may facilitate cGVHD risk
stratification for HCT patients and identify candidates for new preemptive approaches.