The Vertebrate Genomes Project

submitted by
Style Pass
2021-06-22 06:00:07

Reference genome assemblies provide a map of a species’ DNA sequence and its spatial context—that is, where along the chromosomes a specific piece of DNA sequence can be found. In the past, the generation of reference assemblies was prohibitively expensive and labour-intensive, so they were only produced for humans and the most important model organisms, and still contained gaps and errors. Draft genomes generated using more affordable second-generation sequencing technologies could be assembled for a larger number of species, but these were of lower quality because they were highly fragmented and their annotation was erroneous in some parts.

However, a complete understanding of evolutionary processes and other fundamental questions in biology requires high-quality reference genome assemblies of all species. Technological advances, improved computational methods and the ever-decreasing cost of sequencing enabled the Vertebrate Genomes Project (VGP), launched in 2017, to pursue the ambitious goal of producing a reference genome assembly for every extant vertebrate species on Earth. In the first phase of the project, the VGP has focused on testing and improving genome sequencing and assembly approaches, on assembling a first set of 260 high-quality genomes of species representing all vertebrate orders (work that is still in progress), and on the initial reporting of insights into genome evolution in vertebrates.

The milestone for phase II will be the production of assemblies for about 1,159 vertebrate families, and the milestone for phase III will be the generation of assemblies for more than 10,000 genera; finally, in phase IV, assemblies will be completed for all vertebrate species. All sequence data and assemblies are being made freely available as they are produced and can be downloaded or browsed at GenomeArk, GenBank, Ensembl, and UCSC.
