Abstract
Intrinsically disordered proteins and disordered regions (collectively termed IDPs) perform vital biological
functions in transcriptional regulation, cell differentiation, and DNA condensation. IDPs rapidly interconvert
between different conformations, imparting plasticity, forming transient contacts and promoting allostery. IDPs
also participate in phase transitions, forming liquid droplets. The droplets facilitate diverse biological processes
that require localization in different regions of the cell. Yet, principles for understanding how a protein's
sequence shapes its ensemble of disordered conformations to perform its function and to promote phase
separation are still lacking. While the simple metric of amino acid composition explains broad conformational
features (radius, scaling exponents) and trends, minor variations in sequence caused by post-translational
modifications (PTMs) or mutations can drastically alter disordered conformations and their functions. IDPs also
elude the traditional sequence alignment tools used to classify functionally similar proteins across species.
We propose to build a novel computational framework based on physico-chemical principles to describe the
ensemble of disordered conformations for IDPs with arbitrary sequence. To understand how PTMs/mutations
couple with diverse solution conditions to alter IDP conformation and the propensity of IDPs to phase separate,
we need computationally efficient models. The models must be capable of handling the combinatorial
challenge of analyzing multiple sequences and their variants arising from preferential mutations/modifications
and alternative splicing under diverse conditions. The same challenge is faced when seeking evolutionary signatures
of multiple sequences across different species. An integrated approach combining polymer physics, all-atom
simulation, and multiple experiments will be used to build coarse-grained models for such high-throughput analysis.
The proposed theoretical approach will i) provide guidance to determine how IDP conformations differ in vitro
and in vivo, ii) harness limited data (smFRET between specific probes) to make predictions for distances
between arbitrary residue pairs and iii) build a rigorous framework for comparing residue-pair specific
interaction parameters between different force fields and experiments, and suggest improvements, if needed.
The computationally efficient formalism will be applied at a large scale to provide a detailed description of
conformational ensembles, including residue-pair specific distance maps (beyond simple observables such as
radius of gyration, end-to-end distance, and scaling exponents), for sets of disordered proteins to understand
functional similarities and dissimilarities that sequence alignment alone cannot reveal. The formalism will also quantify
the susceptibility of IDPs to chemical modifications/mutations and environmental changes (pH, salinity) that alter
conformations and function and promote or suppress phase separation in IDP solutions.