Project Summary
In mammals, 5-methylcytosine is the most common form of DNA methylation, and methylation levels at
specific CpG sites correlate strongly with age. These correlations can be used to build machine
learning-based models that can accurately predict the age of biological samples. Because these models can
quantify age with very high accuracy, researchers have termed them epigenetic aging clocks (e.g., Horvath’s
pan-tissue epigenetic clock and Hannum’s blood-based epigenetic clock). However, the reliability of existing
epigenetic clocks is limited because they are built purely on correlations, and it is unclear whether
age-associated methylation changes causally contribute to aging-related phenotypes. A new generation of
epigenetic clocks built on causal information will be more reliable and can enable large-scale screening
of anti-aging interventions.
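As a rough illustration of how such a clock works, the sketch below applies a linear model to methylation beta values: the predicted age is an intercept plus a weighted sum of CpG methylation levels. The CpG identifiers, weights, and intercept are hypothetical placeholders, not coefficients from Horvath’s or Hannum’s clocks, and published clocks add steps (such as penalized regression to select CpGs, or a transformation of age) that are omitted here.

```python
from typing import Mapping

# Hypothetical coefficients for illustration only; real clocks use
# many more CpGs, with weights learned from training data.
EXAMPLE_WEIGHTS = {"cpg_A": 40.0, "cpg_B": -25.0, "cpg_C": 10.0}
EXAMPLE_INTERCEPT = 30.0


def predict_age(betas: Mapping[str, float],
                weights: Mapping[str, float] = EXAMPLE_WEIGHTS,
                intercept: float = EXAMPLE_INTERCEPT) -> float:
    """Return intercept + sum(weight * beta) over the clock's CpGs."""
    missing = [cpg for cpg in weights if cpg not in betas]
    if missing:
        raise KeyError(f"missing CpG sites: {missing}")
    for cpg in weights:
        # Beta values are methylation fractions, so they must lie in [0, 1].
        if not 0.0 <= betas[cpg] <= 1.0:
            raise ValueError(f"beta value for {cpg} must lie in [0, 1]")
    return intercept + sum(w * betas[cpg] for cpg, w in weights.items())


# Made-up sample, not real data.
sample = {"cpg_A": 0.5, "cpg_B": 0.2, "cpg_C": 0.8}
predicted = predict_age(sample)
```

Because the model is a weighted sum, each CpG’s contribution to the predicted age can be read directly from its weight, which is what makes it possible to restrict a clock to a chosen subset of sites.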
For the F99 phase of this proposal, I performed epigenome-wide Mendelian randomization to identify CpGs
potentially causal to aging-related traits. I then incorporated this causal information into epigenetic clock
models to build two causality-informed aging clocks, DamAge and AdaptAge, which are shown to separate
age-related damage from adaptation. I also built ClockBase, a database of over 300,000 experimental
samples from the Gene Expression Omnibus (GEO) with pre-calculated epigenetic ages. I plan to further
standardize the sample information using large language models and to apply the causality-informed
biomarkers to screen for anti-aging interventions.
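The summary does not state which Mendelian randomization estimator was used. As a minimal sketch of the general idea, the code below computes the Wald ratio for a single genetic instrument: the variant’s effect on the outcome (an aging-related trait) divided by its effect on the exposure (methylation at one CpG), with a first-order standard error. The function name, field names, and input values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class WaldEstimate:
    effect: float  # estimated causal effect of exposure on outcome
    se: float      # first-order (delta-method) standard error
    z: float       # effect divided by its standard error


def wald_ratio(beta_exposure: float, beta_outcome: float,
               se_outcome: float) -> WaldEstimate:
    """Single-instrument MR estimate: beta_outcome / beta_exposure."""
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    if se_outcome <= 0:
        raise ValueError("se_outcome must be positive")
    effect = beta_outcome / beta_exposure
    # First-order approximation: ignores uncertainty in beta_exposure.
    se = se_outcome / abs(beta_exposure)
    return WaldEstimate(effect=effect, se=se, z=effect / se)


# Hypothetical summary statistics for one variant, not real data.
est = wald_ratio(beta_exposure=0.5, beta_outcome=0.1, se_outcome=0.02)
```

An epigenome-wide analysis would repeat an estimate of this kind for each CpG with a usable instrument and then correct for multiple testing; those steps are outside this sketch.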
In the K00 phase, I will use protein language models and protein design tools to expand the universe of
anti-aging interventions. Specifically, I will study protein structural features across mammalian species
with different lifespans to identify the features associated with longevity. I will then incorporate this
information into protein design to optimize existing proteins to support a longer lifespan.
This proposal will advance our understanding of the molecular mechanisms underlying aging by incorporating
causality into epigenetic clock models. By distinguishing between age-related damage and adaptation, we can
develop more precise and informative aging biomarkers, which will have significant implications for aging
research and potential clinical applications. The K00 phase of the project will pioneer the application of protein
language models and protein design tools in aging research. Ultimately, it could pave the way for a completely
new branch of aging research: treating aging through the gradual redesign of the proteome.