9
Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P, Gruca A, Grynberg M, Kajava AV, Promponas VJ, Anisimova M, Jakobsen KS, Linke D. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 2019; 47:10994-11006. [PMID: 31584084 PMCID: PMC6868369 DOI: 10.1093/nar/gkz841] [Citation(s) in RCA: 155] [Impact Index Per Article: 31.0] [Received: 06/07/2019] [Revised: 09/03/2019] [Accepted: 10/01/2019] [Indexed: 12/13/2022]
Abstract
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.
Affiliation(s)
- Ole K Tørresen
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Bastiaan Star
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Pablo Mier
  - Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Miguel A Andrade-Navarro
  - Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Husch-Weg 15, 55128 Mainz, Germany
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, CB10 1SD, UK
- Patryk Jarnot
  - Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Aleksandra Gruca
  - Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Marcin Grynberg
  - Institute of Biochemistry and Biophysics PAS, Pawińskiego 5A, 02-106 Warsaw, Poland
- Andrey V Kajava
  - Centre de Recherche en Biologie cellulaire de Montpellier, UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, CEDEX 5, 34293 Montpellier, France
  - Institut de Biologie Computationnelle, 34095 Montpellier, France
- Vasilis J Promponas
  - Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, PO Box 20537, CY 1678 Nicosia, Cyprus
- Maria Anisimova
  - Institute of Applied Simulations, School of Life Sciences and Facility Management, Zurich University of Applied Sciences (ZHAW), Wädenswil, Switzerland
  - Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland
- Kjetill S Jakobsen
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
- Dirk Linke
  - Section for Genetics and Evolutionary Biology, Department of Biosciences, University of Oslo, NO-0316 Oslo, Norway
14
Schaper E, Korsunsky A, Pečerska J, Messina A, Murri R, Stockinger H, Zoller S, Xenarios I, Anisimova M. TRAL: tandem repeat annotation library. Bioinformatics 2015; 31:3051-3. [PMID: 25987568 DOI: 10.1093/bioinformatics/btv306] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/25/2015] [Accepted: 05/08/2015] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Currently, more than 40 sequence tandem repeat detectors are published, providing heterogeneous, partly complementary, partly conflicting results. RESULTS: We present TRAL, a tandem repeat annotation library that allows running and parsing of various detection outputs, clustering of redundant or overlapping annotations, several statistical frameworks for filtering false positive annotations, and importantly a tandem repeat annotation and refinement module based on circular profile hidden Markov models (cpHMMs). Using TRAL, we evaluated the performance of a multi-step tandem repeat annotation workflow on 547 085 sequences in UniProtKB/Swiss-Prot. The researcher can use these results to predict run-times for specific datasets, and to choose annotation complexity accordingly. AVAILABILITY AND IMPLEMENTATION: TRAL is an open-source Python 3 library and is available, together with documentation and tutorials via http://www.vital-it.ch/software/tral. CONTACT: elke.schaper@isb-sib.ch.
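As a rough illustration of what "clustering of redundant or overlapping annotations" can mean in practice, the minimal sketch below groups repeat annotations from several detectors whenever their coordinates overlap. It deliberately does not use TRAL's API: the `Annotation` tuple, the `cluster_overlapping` function, the half-open coordinate convention, and the detector names are all hypothetical and chosen only for this example.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical repeat annotation: half-open interval [start, end)."""
    start: int
    end: int
    detector: str  # illustrative label, not a real tool name


def cluster_overlapping(annotations: list[Annotation]) -> list[list[Annotation]]:
    """Group annotations whose intervals overlap (single linkage on coordinates)."""
    clusters: list[list[Annotation]] = []
    cluster_end = -1  # right-most end of the cluster currently being built
    for ann in sorted(annotations, key=lambda a: (a.start, a.end)):
        if clusters and ann.start < cluster_end:
            # Overlaps the current cluster: extend it.
            clusters[-1].append(ann)
            cluster_end = max(cluster_end, ann.end)
        else:
            # No overlap: start a new cluster.
            clusters.append([ann])
            cluster_end = ann.end
    return clusters


example = [
    Annotation(10, 40, "detector_A"),
    Annotation(35, 60, "detector_B"),
    Annotation(100, 130, "detector_A"),
]
clusters = cluster_overlapping(example)
```

Here the first two annotations share positions 35 to 39 and fall into one cluster, while the third stands alone. A real workflow would likely also compare repeat unit length or sequence similarity before treating two annotations as redundant.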
Affiliation(s)
- Elke Schaper
  - Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland; Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria; Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland; Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland; and Institute of Applied Simulations, School of Life Sciences und Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
- Alexander Korsunsky
  - Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland; Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria; Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland; Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland; and Institute of Applied Simulations, School of Life Sciences und Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
- Jūlija Pečerska
  - Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland; Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria; Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland; Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland; and Institute of Applied Simulations, School of Life Sciences und Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
- Antonio Messina
  - Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland; Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria; Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland; Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland; and Institute of Applied Simulations, School of Life Sciences und Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
- Riccardo Murri
  - Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland; Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria; Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland; Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland; and Institute of Applied Simulations, School of Life Sciences und Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
- Heinz Stockinger
  - Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland; Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria; Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland; Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland; and Institute of Applied Simulations, School of Life Sciences und Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
- Stefan Zoller
  - Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland; Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria; Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland; Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland; and Institute of Applied Simulations, School of Life Sciences und Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
- Ioannis Xenarios
  - Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland; Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria; Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland; Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland; and Institute of Applied Simulations, School of Life Sciences und Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland
- Maria Anisimova
  - Vital-IT group, SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, Quartier Sorge, 1015 Lausanne, Switzerland; Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland; Graz University of Technology, Institute of Molecular Biotechnology, 8010 Graz, Austria; Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland; Services and Support for Science IT, University of Zürich, 8057 Zürich, Switzerland; and Institute of Applied Simulations, School of Life Sciences und Facility Management, Zürich University of Applied Sciences, 8820 Wädenswil, Switzerland