Improved reference genome for the domestic horse increases assembly contiguity and composition.
Commun Biol 2018; 1:197. [PMID: 30456315 PMCID: PMC6240028 DOI: 10.1038/s42003-018-0199-z]
[Citation(s) in RCA: 117] [Impact Index Per Article: 19.5] [Received: 04/25/2018] [Accepted: 10/16/2018] [Indexed: 11/30/2022]
Abstract
Recent advances in genomic sequencing technology and computational assembly methods have allowed scientists to improve reference genome assemblies in terms of contiguity and composition. EquCab2, a reference genome for the domestic horse, was released in 2007. Although of equal or better quality compared to other first-generation Sanger assemblies, it had many of the shortcomings common to them. In 2014, the equine genomics research community began a project to improve the reference sequence for the horse, building upon the solid foundation of EquCab2 and incorporating new short-read data, long-read data, and proximity ligation data. Here, we present EquCab3. The count of non-N bases in the incorporated chromosomes has increased from 2.33 Gb in EquCab2 to 2.41 Gb in EquCab3. Contiguity has also improved nearly 40-fold, to a contig N50 of 4.5 Mb, and scaffold contiguity has been enhanced to the point where all but one of the 32 chromosomes consists of a single scaffold.
Theodore Kalbfleisch et al. present an improved genome assembly for the domestic horse by combining short- and long-read data, as well as proximity ligation data. They improve contiguity of the assembly by 40-fold, with a 10-fold reduction in gaps.
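The abstract's two headline metrics, non-N base count and contig N50, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes that contigs are the stretches of a scaffold between runs of N (gap) characters, and it uses the common definition of N50 as the length of the sequence at which the running total, counted from the longest sequence down, first reaches half of the total length. The function names and the toy sequences are hypothetical and chosen only for illustration.

```python
import re


def contig_lengths(scaffolds):
    """Split each scaffold at runs of N (assumed to mark gaps) and
    return the lengths of the resulting contigs."""
    lengths = []
    for seq in scaffolds:
        for contig in re.split(r"[Nn]+", seq):
            if contig:  # skip empty pieces from leading/trailing gaps
                lengths.append(len(contig))
    return lengths


def n50(lengths):
    """Length of the sequence at which the cumulative total, taken from
    longest to shortest, first reaches half of the summed length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def non_n_bases(scaffolds):
    """Count bases that are not N, i.e. actual sequence rather than gap."""
    return sum(len(s) - s.upper().count("N") for s in scaffolds)


# Toy scaffolds, invented for illustration only (not horse sequence).
toy_scaffolds = ["ACGTACGTNNNNACGT", "ACGTAC", "NNACNNACGTACGTAC"]

scaffold_n50 = n50([len(s) for s in toy_scaffolds])
contig_n50 = n50(contig_lengths(toy_scaffolds))
sequence_bases = non_n_bases(toy_scaffolds)
```

On real assemblies the same calculation would run over sequences read from a FASTA file. Closing gaps merges contigs and raises the contig N50, while the non-N base count rises only when new sequence is added.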