Gfastar: a C++ library and a tool suite to aid Telomere-to-Telomere genome assembly - Project Summary

The recent completion of a Telomere-to-Telomere (T2T) human genome has demonstrated that, in principle, existing sequencing technologies allow gapless and nearly error-free assembly of complex, human-sized genomes. Despite these technological advancements, the genome assemblies currently being generated and released in public archives are still incomplete and contain a significant number of errors, which can dramatically impact downstream analyses. Algorithms that can generate T2T genomes are still in their infancy. The few that are available require extensive manual validation and curation and have so far worked on only a handful of model species. Dedicated algorithms and software tools are essential for achieving T2T assembly completeness and accuracy in all species. In particular, T2T genome assembly requires extensive evaluation and sophisticated manipulation of genome assembly graphs, yet an efficient tool suite for this purpose is missing. To bridge this gap, gfastar, a suite of algorithms and tools created for the evaluation and manipulation of assembly graphs, will be further advanced and continuously maintained. Gfastar is under active development and is currently used by large-scale initiatives aimed at generating high-quality reference genomes, such as the Vertebrate Genomes Project. Gfastar is powered by a dedicated C++ library, gfalibs. Gfalibs will be expanded into a comprehensive library dedicated to genome sequences and assembly graphs, supporting multiple file formats commonly used by the genome assembly community (e.g. FASTA, FASTQ, GFA1/2, AGP, GAF, BAM, and FASTG), parallelized input/output (I/O) processing, and many other general-purpose functions and utilities. This library will be used extensively by the whole gfastar software ecosystem (rdeval, gfastats, gfalign, kcount, kreeq, teloscope, and gfase).
Several modules have already been implemented in gfastar. These existing modules will be expanded with additional functionality, and new tools will be developed. Together, these tools will contribute to the generation of T2T reference genomes at scale. As a whole, the gfastar tool suite will provide unparalleled algorithms and functionality for assembly graph evaluation, manipulation, and analysis, supporting the genomic community by helping improve the completeness and accuracy of genomes.