PROJECT SUMMARY
The Song Lab consists of computer scientists, statisticians, and mathematicians who are fully committed to advancing biology. We develop efficient computational tools and robust statistical methods to facilitate research across the broad biomedical community, and we also engage deeply in data analysis to make new biological discoveries. In particular, we have made notable contributions to the field of population genomics, where we have obtained significant theoretical results and developed useful inference tools that generalize to complex models and scale to big data. In the past five years, our research has branched out to other areas of genomics, including bulk and single-cell gene expression analysis, mRNA translation dynamics, structural biology, immunology, and metagenomics.
Technological advances in sequencing and experimental assays have greatly increased the availability of various kinds of genomic data, enabling us to catalog genetic and epigenetic variation in diverse populations and to probe fundamental biological processes (e.g., transcription and translation) in unprecedented detail. These developments provide many new opportunities for basic and biomedical research. However, the data are often noisy and multifaceted, and the underlying biology is highly complex, which poses both theoretical and computational challenges for analysis and interpretation. Realizing the full promise of the big data era in biology will require new efficient and robust statistical inference tools, as well as theoretical analysis of mathematical models. The central goal of our research program is to meet these important challenges.
Over the next five years, we will continue to carry out basic research in both population genomics and computational genomics and to develop a suite of useful analytical tools, with careful attention to sound mathematical modeling, rigorous statistical estimation, and computational scalability. In particular, we will tackle several key technical challenges in population genomics and develop both likelihood-based and likelihood-free methods to enable inference under more complex, realistic models than previously possible. We will also develop novel inference methods to analyze, integrate, and interpret various types of genomic data, and carry out theoretical analysis of mathematical models to elucidate the intricate details of both transcription and translation. In addition, we will continue to collaborate with empirical and experimental biologists to pursue basic research questions in biology, as we have done fruitfully in the past.