51
Thompson JD, Linard B, Lecompte O, Poch O. A comprehensive benchmark study of multiple sequence alignment methods: current challenges and future perspectives. PLoS One 2011; 6:e18093. [PMID: 21483869 PMCID: PMC3069049 DOI: 10.1371/journal.pone.0018093] [Citation(s) in RCA: 129] [Impact Index Per Article: 9.9] [Received: 11/15/2010] [Accepted: 02/21/2011] [Indexed: 12/18/2022]
Abstract
Multiple comparison or alignment of protein sequences has become a fundamental tool in many different domains in modern molecular biology, from evolutionary studies to prediction of 2D/3D structure, molecular function and inter-molecular interactions, etc. By placing the sequence in the framework of the overall family, multiple alignments can be used to identify conserved features and to highlight differences or specificities. In this paper, we describe a comprehensive evaluation of many of the most popular methods for multiple sequence alignment (MSA), based on a new benchmark test set. The benchmark is designed to represent typical problems encountered when aligning the large protein sequence sets that result from today's high throughput biotechnologies. We show that alignment methods have significantly progressed and can now identify most of the shared sequence features that determine the broad molecular function(s) of a protein family, even for divergent sequences. However, we have identified a number of important challenges. First, the locally conserved regions, that reflect functional specificities or that modulate a protein's function in a given cellular context, are less well aligned. Second, motifs in natively disordered regions are often misaligned. Third, the badly predicted or fragmentary protein sequences, which make up a large proportion of today's databases, lead to a significant number of alignment errors. Based on this study, we demonstrate that the existing MSA methods can be exploited in combination to improve alignment accuracy, although novel approaches will still be needed to fully explore the most difficult regions. We then propose knowledge-enabled, dynamic solutions that will hopefully pave the way to enhanced alignment construction and exploitation in future evolutionary systems biology studies.
Affiliation(s)
- Julie D Thompson
- Département de Biologie Structurale et Génomique, IGBMC (Institut de Génétique et de Biologie Moléculaire et Cellulaire), CNRS/INSERM/Université de Strasbourg, Illkirch, France.
52
Vroling B, Sanders M, Baakman C, Borrmann A, Verhoeven S, Klomp J, Oliveira L, de Vlieg J, Vriend G. GPCRDB: information system for G protein-coupled receptors. Nucleic Acids Res 2011; 39:D309-19. [PMID: 21045054 PMCID: PMC3013641 DOI: 10.1093/nar/gkq1009] [Citation(s) in RCA: 115] [Impact Index Per Article: 8.8] [Received: 09/14/2010] [Accepted: 10/07/2010] [Indexed: 11/14/2022]
Abstract
The GPCRDB is a Molecular Class-Specific Information System (MCSIS) that collects, combines, validates and disseminates large amounts of heterogeneous data on G protein-coupled receptors (GPCRs). The GPCRDB contains experimental data on sequences, ligand-binding constants, mutations and oligomers, as well as many different types of computationally derived data such as multiple sequence alignments and homology models. The GPCRDB provides access to the data via a number of different access methods. It offers visualization and analysis tools, and a number of query systems. The data is updated automatically on a monthly basis. The GPCRDB can be found online at http://www.gpcr.org/7tm/.
Affiliation(s)
- Bas Vroling, Marijn Sanders, Coos Baakman, Annika Borrmann, Stefan Verhoeven, Jan Klomp, Laerte Oliveira, Jacob de Vlieg, Gert Vriend
- CMBI, NCMLS, Radboud University Nijmegen Medical Centre, Geert Grooteplein Zuid 26-28, 6525 GA Nijmegen, Department of Molecular Design and Informatics, MSD, Molenstraat 110, 5340 BH, Oss, The Netherlands and Department of Biophysics, Escola Paulista de Medicina, Federal University of São Paulo, São Paulo 04023-062, Brazil