Waterhouse RM, Aganezov S, Anselmetti Y, Lee J, Ruzzante L, Reijnders MJMF, Feron R, Bérard S, George P, Hahn MW, Howell PI, Kamali M, Koren S, Lawson D, Maslen G, Peery A, Phillippy AM, Sharakhova MV, Tannier E, Unger MF, Zhang SV, Alekseyev MA, Besansky NJ, Chauve C, Emrich SJ, Sharakhov IV. Evolutionary superscaffolding and chromosome anchoring to improve Anopheles genome assemblies. BMC Biol 2020; 18:1. [PMID: 31898513 PMCID: PMC6939337 DOI: 10.1186/s12915-019-0728-3] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 11/13/2019] [Accepted: 11/26/2019] [Indexed: 11/18/2022]
Abstract
Background: New sequencing technologies have lowered financial barriers to whole genome sequencing, but resulting assemblies are often fragmented and far from ‘finished’. Updating multi-scaffold drafts to chromosome-level status can be achieved through experimental mapping or re-sequencing efforts. Avoiding the costs associated with such approaches, comparative genomic analysis of gene order conservation (synteny) to predict scaffold neighbours (adjacencies) offers a potentially useful complementary method for improving draft assemblies.

Results: We evaluated and employed 3 gene synteny-based methods applied to 21 Anopheles mosquito assemblies to produce consensus sets of scaffold adjacencies. For subsets of the assemblies, we integrated these with additional supporting data to confirm and complement the synteny-based adjacencies: 6 with physical mapping data that anchor scaffolds to chromosome locations, 13 with paired-end RNA sequencing (RNAseq) data, and 3 with new assemblies based on re-scaffolding or long-read data. Our combined analyses produced 20 new superscaffolded assemblies with improved contiguities: 7 for which assignments of non-anchored scaffolds to chromosome arms span more than 75% of the assemblies, and a further 7 with chromosome anchoring including an 88% anchored Anopheles arabiensis assembly and, respectively, 73% and 84% anchored assemblies with comprehensively updated cytogenetic photomaps for Anopheles funestus and Anopheles stephensi.

Conclusions: Experimental data from probe mapping, RNAseq, or long-read technologies, where available, all contribute to successful upgrading of draft assemblies. Our evaluations show that gene synteny-based computational methods represent a valuable alternative or complementary approach. Our improved Anopheles reference assemblies highlight the utility of applying comparative genomics approaches to improve community genomic resources.
Affiliation(s)
- Robert M Waterhouse
  - Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Sergey Aganezov
  - Department of Computer Science, Princeton University, Princeton, NJ, 08450, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, 21218, USA
- Jiyoung Lee
  - The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
- Livio Ruzzante
  - Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Maarten J M F Reijnders
  - Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Romain Feron
  - Department of Ecology and Evolution, University of Lausanne, and Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Sèverine Bérard
  - ISEM, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Phillip George
  - Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
- Matthew W Hahn
  - Departments of Biology and Computer Science, Indiana University, Bloomington, IN, 47405, USA
- Paul I Howell
  - Centers for Disease Control and Prevention, Atlanta, GA, 30329, USA
- Maryam Kamali
  - Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA; Department of Medical Entomology and Parasitology, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Daniel Lawson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Gareth Maslen
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Ashley Peery
  - Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA
- Maria V Sharakhova
  - Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA; Laboratory of Ecology, Genetics and Environmental Protection, Tomsk State University, Tomsk, Russia, 634050
- Eric Tannier
  - Laboratoire de Biométrie et Biologie Evolutive, Université Lyon 1, Unité Mixte de Recherche 5558 Centre National de la Recherche Scientifique, 69622, Villeurbanne, France; Institut national de recherche en informatique et en automatique, Montbonnot, 38334, Grenoble, Rhône-Alpes, France
- Maria F Unger
  - Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, Galvin Life Sciences Building, Notre Dame, IN, 46556, USA
- Simo V Zhang
  - Departments of Biology and Computer Science, Indiana University, Bloomington, IN, 47405, USA
- Max A Alekseyev
  - Department of Mathematics and Computational Biology Institute, George Washington University, Ashburn, VA, 20147, USA
- Nora J Besansky
  - Eck Institute for Global Health and Department of Biological Sciences, University of Notre Dame, Galvin Life Sciences Building, Notre Dame, IN, 46556, USA
- Cedric Chauve
  - Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, V5A 1S6, Canada
- Scott J Emrich
  - Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, 37996, USA
- Igor V Sharakhov
  - The Interdisciplinary PhD Program in Genetics, Bioinformatics, and Computational Biology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA; Department of Entomology, Virginia Polytechnic Institute and State University, Blacksburg, VA, 24061, USA; Laboratory of Ecology, Genetics and Environmental Protection, Tomsk State University, Tomsk, Russia, 634050