Computational methods for characterizing sources of variability in drug response - Project Summary/Abstract
Drugs are approved by regulators because they are relatively safe and effective. However, there are typically unanswered basic science questions about the detailed mechanisms of action, the impacts of genetic and epigenetic variation, and the full range of phenotypic responses. This missing knowledge often leads to reduced efficacy and increased adverse events. My overall scientific goal is to generate a more complete understanding of the mechanisms of drug response and their sources of variation, in order to enable more precise drug therapies. The last two decades have seen an explosion of data relevant to drug response. We have abundant data about human genetic variation, as well as gene expression (and other omic) profiles that illuminate key cellular pathways in disease. Advances in protein 3D structure prediction provide useful models for most proteins, which enable proteome-wide screening for off-target drug interactions. Biobanks, electronic medical records (EMR), FDA adverse event databases, and medical claims data provide clinical information and environmental exposures. Each of these data sources has biases and blind spots. My lab has a track record of creating methods for the analysis of all these critical data types. We focus on computational/statistical approaches that integrate data at all scales, thus reducing the biases within individual scales. In 2000, we created the Pharmacogenetics Knowledgebase (PharmGKB), which curates information about how human genetic variation influences variation in drug response. PharmGKB has high-quality information for hundreds of drugs and genes, but pharmacogenetics typically explains far less than 50% of variation in drug response.
I hypothesize that a large fraction of the remaining variation can be explained by unknown off-targets, undiscovered pathways of drug response, genetic and epigenetic differences in expression, and differences in environment and disease physiology. Thus, my proposed work focuses on computational methods that use publicly available data to answer five driving questions: (1) What is the full set of clinical responses to drugs, alone and in combination? (2) What are the molecular targets (particularly off-targets) that are modulated by a drug? (3) What are the pathways that modulate drug response? (4) How does genetic variation in targets/pathways lead to variation in drug response? (5) How does epigenetic variation create variability in drug response? We will evaluate our methods with independent, held-out gold-standard data sets (to establish quantitative statistical performance), and collaborate with experimental colleagues to validate key novel hypotheses. We will focus on genes and pathways that are critical in drug response for under-studied diseases and in under-studied populations.