Abstract
Human cancer is a dynamic disease that develops over an extended time period through the accumulation of a
series of genetic alterations. Delineating the system dynamics of disease progression can significantly advance
our understanding of tumor biology, and lay a critical foundation for the development of improved cancer
diagnostics, prognostics and targeted therapeutics. Traditionally, system dynamics is studied through
time-course experiments that repeatedly sample the same cohort of subjects across an entire biological
process. However, due to ethical and economic constraints, it is not feasible to collect time-series data to study
human cancer, and typically we can only obtain profile data from excised tumor tissues. Consequently, while
major efforts continue to reveal the genomic events associated with human cancer, to date, it has been difficult
to put the identified changes in the context of the dynamic disease process. With the rapid development of
sequencing technology, many thousands of static tumor samples are being collected in large-scale cancer
studies. This provides us with a unique opportunity to develop a novel analytical strategy to use static data,
instead of time-course data, to study disease dynamics. Building on our previous work, we propose a
large-scale interdisciplinary research plan to develop a series of novel methods that enable the construction of
high-resolution cancer progression models by using massive static data, the identification of pivotal molecular
events that drive stepwise disease progression, and the visualization of identified changes in a cancer
development roadmap. If successfully implemented, this work can effectively overcome the existing sampling
limitations, and open a new avenue of research to study cancer dynamics by using vast tissue archives, instead
of performing resource-intensive or impractical time-course studies. The developed methods will be extensively
tested on 27 breast cancer datasets comprising ~9,000 samples. To our knowledge, no prior work has been
performed on this scale to study breast cancer dynamics. The analysis will result in the first working model of
breast cancer progression constructed by incorporating all genetic information. The constructed model can
provide a foundation for the visualization of key progressive molecular events and facilitate the identification of
pivotal driver genes and pathways and potential points of susceptibility for therapeutic intervention. Moreover,
interrogation of the constructed model will enable us to test novel hypotheses in silico and to prioritize
resources for more focused and detailed investigations experimentally. We expect that our work will have a
broad impact. Although in this study we focus mainly on breast cancer, the developed methods can also be
used to study other cancers and other human progressive diseases, where the lack of time-series data to study
system dynamics is a ubiquitous problem.