Realizing the promise of precision medicine requires an unprecedentedly rapid translation of biomedical
discoveries into clinical practice. However, although the literature describes many non-canonical disease
pathways and uncommon drug actions that are vital for understanding patient-specific disease mechanisms,
most of this knowledge is not organized in databases. Currently, it is curated manually or
semi-automatically, and only in a very limited scope. Meanwhile, the volume of biomedical information in
PubMed (currently 28 million publications) keeps growing by more than a million articles per year, which
demands more efficient and effective biocuration approaches.
To address this challenge, this project will develop a novel biocuration method for automatically
extracting disease pathways from the figures and text of biomedical articles.
Specific Aim 1: To develop focused benchmark sets of articles to assess the performance of the biocuration
Specific Aim 2: To develop a method for extraction of components of disease pathways from articles’ figures
based on deep-learning techniques.
Specific Aim 3: To develop a method for reconstruction of disease-specific pathways through enrichment
and through graph neural network (GNN) approaches.
Specific Aim 4: To conduct a comprehensive evaluation of the pipeline.
The overarching goal of this project is to develop a computer-based automatic biocuration ecosystem for
rapid transformation of free-text biomedical literature into a machine-processable format for medical
The overall impact of the proposed project will be to significantly improve health outcomes in
individualized patient cases by efficiently bringing the latest biomedical discoveries into a precision
medicine setting. It will especially benefit cancer patients, for whom up-to-date knowledge of newly
discovered molecular mechanisms and drug actions is critical.