Project Summary/Abstract
The African malaria mosquito Anopheles gambiae, because of its epidemiological importance, was the first
disease vector to have its genome sequenced, in 2002. Since then, the PEST strain assembly has remained the only
available chromosome-level genome reference for this major African malaria vector. Although this assembly has
been the workhorse for functional and population genomic studies of malaria mosquitoes for almost two decades,
it can no longer support analyses of the highest possible quality, as it is staggeringly imperfect by modern
standards. The assembly has serious deficiencies such as a large portion of unmapped contigs, sequencing and
physical gaps, incorrect order and orientation of some scaffolds, and the presence of haplotypes derived from
the sister species An. coluzzii. Moreover, the PEST strain of An. gambiae is no longer available, so the existing
assembly cannot be validated or improved with additional sequencing. As a result, a complete annotation and
an accurate functional characterization of the An. gambiae genome cannot be performed. Also, the lack of a
reliable reference represents a major impediment to population genomics studies, especially to those dealing
with structural genomic variations. For a long time, the high cost of sequencing and the sheer difficulty of genome
assembly have made major improvements to the mosquito genome assembly prohibitive. Novel long-read sequencing
technologies and innovative scaffolding approaches now allow the development of de novo chromosome-level genome
assemblies of superior quality at a reasonable cost. In addition, the availability of polytene chromosomes enables
high-resolution genome mapping in An. gambiae. The main goal of this R21 project is to develop a chromosome-level
genome assembly and to explore the structural genomic variations in the An. gambiae complex. This timely
project will meet the demand for a new, highly finished genome assembly for the major African malaria vector,
drawing on the innovative tools and expertise of the PI and Co-I. Briefly, the project’s specific aims
are to (1) obtain a contiguous genome assembly for An. gambiae using Oxford Nanopore and Illumina sequencing
and chromosome-scale Hi-C scaffolding; (2) validate the obtained assembly and construct a high-resolution
physical genome map for An. gambiae using fluorescence in situ hybridization; and (3) characterize structural
genomic variations in the An. gambiae complex. A new chromosome-level genome assembly for An. gambiae
will transform research by enabling the most complete functional annotation and the most detailed population genomic
analysis of malaria mosquitoes to date. The more complete assembly of heterochromatic sequences will improve our
understanding of the genomic “dark matter” and will stimulate epigenomic studies of this disease vector. The
scientific community will have free access to the new assembly through VEuPathDB and NCBI.