Project Summary
SRI International and a group of collaborators propose to further develop the Escherichia coli EcoCyc database (DB).
EcoCyc is and will continue to be freely and openly available, and is accessible to scientists through the Internet, as
downloadable data files, and as a downloadable software application.
Scientists from multiple disciplines make wide use of EcoCyc; it has been cited 4,939 times, and from 2019–
2021, an average of 94,800 users per year visited the EcoCyc website. It serves as a general reference source on
E. coli for experimental biologists and is particularly useful for the analysis of functional-genomics experiments.
The DB serves computational biologists who are undertaking global studies of E. coli; metabolic engineers who are
developing new methods for the production of chemicals, including biofuels; and researchers and bioinformaticians who
are using EcoCyc as the gold-standard dataset to develop new computational methods, including the prediction of
operons, promoters, and protein functional linkages. Educators also use the DB.
We will update EcoCyc on an ongoing basis to reflect new information about the genes, metabolic pathways,
and regulatory interactions of this important model organism. Information will be integrated from the biomedical
literature and from large-scale experiments, such as data on gene essentiality, on nutrients supporting growth,
and on protein interactions. We will continue a comprehensive effort to refine steady-state metabolic
network models of E. coli by validating model predictions against many growth and non-growth conditions
for wild-type and knockout strains. The resulting models will have applications in antimicrobial drug
discovery and metabolic engineering, and the model development process will lead to many improvements in the
EcoCyc DB.
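As a rough illustration of what validating model predictions against growth and non-growth conditions can involve, the sketch below tallies agreement between predicted and observed growth calls. The function name, the (strain, medium) keys, and the boolean encoding are assumptions made for this example; they are not taken from EcoCyc or from Pathway Tools.

```python
from collections import Counter


def score_growth_predictions(predicted, observed):
    """Compare predicted and observed growth calls.

    Both arguments map a hypothetical (strain, medium) key to a boolean:
    True for growth, False for no growth. Only keys present in both
    mappings are scored.
    """
    counts = Counter()
    for key in predicted.keys() & observed.keys():
        p, o = predicted[key], observed[key]
        if p and o:
            counts["true_growth"] += 1
        elif not p and not o:
            counts["true_no_growth"] += 1
        elif p and not o:
            counts["false_growth"] += 1      # model grows, experiment does not
        else:
            counts["false_no_growth"] += 1   # model fails to grow, experiment does
    total = sum(counts.values())
    correct = counts["true_growth"] + counts["true_no_growth"]
    accuracy = correct / total if total else None
    return counts, accuracy
```

Disagreements in either direction are the interesting output here: each one points to a possible gap in the model or in the underlying DB content.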
We will launch a new effort to curate the genes and proteins of E. coli strains other than the strain served by
EcoCyc. Thousands of E. coli strains have been sequenced, yet in many cases their genome annotations are of low
quality. We will project curated gene annotations from EcoCyc and from other E. coli strains to orthologs of those
genes in the BioCyc DBs for hundreds of other E. coli strains, thus significantly improving the annotation quality of
those strains in a cost-effective fashion.
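The projection idea can be sketched in a few lines, under stated assumptions: a curated annotation is copied to each ortholog in a target strain only when the target gene currently carries an uninformative annotation. All names, the gene identifiers, and the overwrite rule are hypothetical choices for this example and do not describe the project's actual procedure.

```python
# Annotations treated as uninformative in this sketch (an assumption).
UNINFORMATIVE = {"", "hypothetical protein"}


def project_annotations(curated, orthologs, existing):
    """Return a copy of `existing` with curated annotations projected in.

    curated:   reference gene id -> curated product description
    orthologs: reference gene id -> list of target-strain gene ids
    existing:  target-strain gene id -> current product description
    """
    projected = dict(existing)  # leave the caller's mapping untouched
    for ref_gene, description in curated.items():
        for target_gene in orthologs.get(ref_gene, []):
            current = projected.get(target_gene, "")
            # Overwrite only annotations that carry no real information.
            if current.strip().lower() in UNINFORMATIVE:
                projected[target_gene] = description
    return projected
```

Keeping informative existing annotations is a conservative design choice; a real pipeline would also need a rule for resolving conflicts between sources.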
The project will also expand the Pathway Tools software used to query and analyze EcoCyc, for example by adding a
tool for projecting newly curated gene and protein annotations to orthologs in other E. coli strains.