PROJECT SUMMARY/ABSTRACT:
A key remaining gap in our understanding of biological systems at the molecular level is how to structurally
annotate the “dark” protein families—the portion of protein families unsolved by experimental structure
determination techniques and inaccessible to homology modeling. Nearly a quarter of protein families are
currently dark, with molecular conformation completely unknown, and this gap is likely to expand further
with the rapid accumulation of new protein sequences without annotated structures. The key challenge is now
how to bridge this gap to gain a comprehensive understanding of biology and disease, thereby paving the way
to structure-based drug design at genomic scale. Computational protein modeling plays a central role in this effort
due to its scalability and genome-wide applicability. My laboratory focuses on the development and application
of novel data-driven computational modeling and refinement methods to increase the accuracy and coverage of
protein structure prediction at genomic scale, irrespective of homology. Future research will focus on improving
homology-free protein folding using multiscale de novo modeling driven by deep learning-based inter-residue
interactions, enhancing low-homology threading or fold recognition by formulating new algorithms for remote
template identification despite low evolutionary relatedness, and developing methods for high-resolution
restrained structure refinement guided by generalized ensemble search for driving computational models to
near-experimental accuracy. A proteome-wide computational modeling and refinement effort will be conducted,
leveraging our unique access to large-scale supercomputing infrastructure, to build high-confidence models
covering the dark protein families, which will be organized in a database for public access. This comprehensive
database of structural annotations will shed light on the structures, functions, and interactions of the dark
proteome, with broad implications for drug discovery and human health. Software and web servers will be freely
disseminated to help the worldwide community of biomedical researchers apply these methods to their specific
research problems, thus multiplying the impact of computational modeling on basic research in biology and
medicine. My research program will involve close collaborations with other NIGMS-supported investigators,
create training opportunities for the next generation of researchers including members from underrepresented
groups, and foster future research advances in structural bioinformatics and computational biology.