SUMMARY
Between 2000, when the first version of Hypothesis Testing Using Phylogenies (HyPhy) was released, and
2022, the number of bases in GenBank increased ~250-fold, the number of sequenced genomes grew from a handful
to >3,500 [3], and the number of PubMed papers studying molecular evolution rose ~12-fold. Data generation has
ceased to be the bottleneck for biological and biomedical discoveries, and the new bottleneck is methods, tools,
and software for data analysis and interpretation. Comparative evolutionary analyses remain an essential and
powerful method for extracting meaning and insight from ever-expanding genomic sequence data. However, the lack
of emphasis on, and incentives for, developing, maintaining, benchmarking, and improving software and
analysis tools, despite their critical role in modern biology, biotechnology, and medicine, remains
a major concern. Analytical, infrastructure, and incentive challenges exposed by the genomic data deluge during
the COVID-19 pandemic were aptly summarized. Over the last quarter century, HyPhy has established itself as
a useful, popular, and durable platform for studying diverse evolutionary processes, such as natural selection
and recombination, across different taxonomic scales. Datamonkey, a web application providing free,
“one-click” access to popular HyPhy analyses, has seen increasing use by researchers worldwide over the last two
decades. Through continued methodological innovation, improvements in performance and scalability,
accessible web services, a focus on data visualization, and user support, HyPhy developers have been able to sustain
and increase the reach and impact of the program. This proposal seeks to improve the software, enhance
the biological realism, relevance, accessibility, and interpretability of the methods, design novel approaches to
address outstanding problems in evolutionary data analysis, and further lower access barriers to evolutionary
comparative analyses through integration with the Galaxy ecosystem.