1
|
Abstract
MOTIVATION To recognize remote relationships between RNA molecules, one must be able to align structures without regard to sequence similarity. We have implemented a method that is swift [O(n^2)], sensitive and tolerant of large gaps and insertions. Molecules are broken into overlapping fragments, which are characterized by their memberships in a probabilistic classification based on local geometry and H-bonding descriptors. This leads to a probabilistic similarity measure that is used in a conventional dynamic programming method. RESULTS Examples are given of database searching, the detection of structural similarities that would not be found using sequence-based methods, and comparisons with a previously published approach. AVAILABILITY AND IMPLEMENTATION Source code (C and Perl) and binaries for Linux are freely available at www.zbh.uni-hamburg.de/fries.
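The fragment-based scheme in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dot product of class-membership vectors as the similarity, the `shift` offset, the linear gap penalty and the local (Smith-Waterman style) recursion are all assumptions chosen for illustration. The point is only that a per-fragment probabilistic similarity feeds a standard O(n^2) dynamic programming table.

```python
def similarity_matrix(p_a, p_b, shift=0.5):
    """Score every fragment pair from two molecules.

    p_a, p_b: lists of class-membership probability vectors, one per fragment.
    The dot product is the chance that two fragments fall in the same class
    (assumed similarity measure); `shift` (hypothetical) makes unrelated
    pairs score below zero so that local alignment can terminate.
    """
    return [[sum(x * y for x, y in zip(row_a, row_b)) - shift for row_b in p_b]
            for row_a in p_a]


def local_align_score(sim, gap=1.0):
    """Best local alignment score over a precomputed similarity matrix.

    Fills an (n+1) x (m+1) table once, so time and memory are O(n*m).
    A linear gap cost is used here purely to keep the sketch short.
    """
    n, m = len(sim), len(sim[0])
    h = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            h[i][j] = max(0.0,
                          h[i - 1][j - 1] + sim[i - 1][j - 1],  # pair fragments i, j
                          h[i - 1][j] - gap,                    # skip a fragment of A
                          h[i][j - 1] - gap)                    # skip a fragment of B
            best = max(best, h[i][j])
    return best
```

Calling `local_align_score(similarity_matrix(p_a, p_b))` with two lists of membership vectors returns a single score; a traceback step would be needed to recover the aligned fragment pairs.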
Affiliation(s)
- Tim Wiegels
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, D-20146 Hamburg, Germany.
|
2
|
Chakraborty S, Rao BJ, Baker N, Asgeirsson B. Structural phylogeny by profile extraction and multiple superimposition using electrostatic congruence as a discriminator. Intrinsically Disordered Proteins 2013; 1. [PMID: 25364645 PMCID: PMC4212511 DOI: 10.4161/idp.25463] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
Abstract
Phylogenetic analysis of proteins using multiple sequence alignment (MSA) assumes an underlying evolutionary relationship in these proteins which occasionally remains undetected due to considerable sequence divergence. Structural alignment programs have been developed to unravel such fuzzy relationships. However, none of these structure-based methods has used electrostatic properties to discriminate between spatially equivalent residues. We present a methodology for MSA of a set of related proteins with known structures using electrostatic properties as an additional discriminator (STEEP). STEEP first extracts a profile, then generates a multiple structural superimposition providing a consolidated spatial framework for comparing residues and finally emits the MSA. Residues that are aligned differently by including or excluding electrostatic properties can be targeted by directed evolution experiments to transform the enzymatic properties of one protein into another. We have compared STEEP results to those obtained from an MSA program (ClustalW) and a structural alignment method (MUSTANG) for chymotrypsin serine proteases. Subsequently, we used PhyML to generate phylogenetic trees for the serine and metallo-β-lactamase superfamilies from the STEEP-generated MSA, and corroborated the accepted relationships in these superfamilies. We have observed that STEEP acts as a functional classifier when electrostatic congruence is used as a discriminator, and thus identifies potential targets for directed evolution experiments. In summary, STEEP is unique among phylogenetic methods for its ability to use electrostatic congruence to specify mutations that might be the source of the functional divergence in a protein family. Based on our results, we also hypothesize that the active site and its close vicinity contain enough information to infer the correct phylogeny for related proteins.
Affiliation(s)
- Sandeep Chakraborty
- Department of Biological Sciences, Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai 400 005, India
- Basuthkar J Rao
- Department of Biological Sciences, Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai 400 005, India
- Nathan Baker
- Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, Washington 99354, United States
- Bjarni Asgeirsson
- Science Institute, Department of Biochemistry, University of Iceland, Dunhaga 3, IS-107 Reykjavik, Iceland
|
3
|
Santos MA, Turinsky AL, Ong S, Tsai J, Berger MF, Badis G, Talukder S, Gehrke AR, Bulyk ML, Hughes TR, Wodak SJ. Objective sequence-based subfamily classifications of mouse homeodomains reflect their in vitro DNA-binding preferences. Nucleic Acids Res 2010; 38:7927-42. [PMID: 20705649 PMCID: PMC3001082 DOI: 10.1093/nar/gkq714] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 12/01/2022]
Abstract
Classifying proteins into subgroups with similar molecular function on the basis of sequence is an important step in deriving reliable functional annotations computationally. So far, however, available classification procedures have been evaluated against protein subgroups that are defined by experts using mainly qualitative descriptions of molecular function. Recently, in vitro DNA-binding preferences to all possible 8-nt DNA sequences have been measured for 178 mouse homeodomains using protein-binding microarrays, offering the unprecedented opportunity of evaluating the classification methods against quantitative measures of molecular function. To this end, we automatically derive homeodomain subtypes from the DNA-binding data and independently group the same domains using sequence information alone. We test five sequence-based methods, which use different sequence-similarity measures and algorithms to group sequences. Results show that methods that optimize the classification robustness reflect well the detailed functional specificity revealed by the experimental data. In some of these classifications, 73–83% of the subfamilies exactly correspond to, or are completely contained in, the function-based subtypes. Our findings demonstrate that certain sequence-based classifications are capable of yielding very specific molecular function annotations. The availability of quantitative descriptions of molecular function, such as DNA-binding data, will be a key factor in exploiting this potential in the future.
Affiliation(s)
- Miguel A Santos
- Molecular Structure and Function Program, Hospital for Sick Children, Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
|
4
|
Pisanti N, Soldano H, Carpentier M, Pothier J. A Relational Extension of the Notion of Motifs: Application to the Common 3D Protein Substructures Searching Problem. J Comput Biol 2009; 16:1635-60. [PMID: 20047489 DOI: 10.1089/cmb.2008.0019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
Affiliation(s)
- Nadia Pisanti
- Dipartimento di Informatica, Università di Pisa, Largo B. Pontecorvo, Pisa, Italy
- LIPN–UMR 7030 CNRS, Université Paris 13, Villetaneuse, France
- Henry Soldano
- LIPN–UMR 7030 CNRS, Université Paris 13, Villetaneuse, France
- Université Pierre et Marie Curie-Paris6, Atelier de BioInformatique, Paris, France
- Mathilde Carpentier
- Université Pierre et Marie Curie-Paris6, Equipe de Génomique Analytique, INSERM511 Paris, France
- Joel Pothier
- Université Pierre et Marie Curie-Paris6, Atelier de BioInformatique, Paris, France
|
5
|
Madhusudhan MS, Webb BM, Marti-Renom MA, Eswar N, Sali A. Alignment of multiple protein structures based on sequence and structure features. Protein Eng Des Sel 2009; 22:569-74. [PMID: 19587024 DOI: 10.1093/protein/gzp040] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.9] [Indexed: 11/14/2022]
Abstract
Comparing the structures of proteins is crucial to gaining insight into protein evolution and function. Here, we align the sequences of multiple protein structures by a dynamic programming optimization of a scoring function that is a sum of an affine gap penalty and terms dependent on various sequence and structure features (SALIGN). The features include amino acid residue type, residue position, residue accessible surface area, residue secondary structure state and the conformation of a short segment centered on the residue. The multiple alignment is built by following the 'guide' tree constructed from the matrix of all pairwise protein alignment scores. Importantly, the method does not depend on the exact values of various parameters, such as feature weights and gap penalties, because the optimal alignment across a range of parameter values is found. Using multiple structure alignments in the HOMSTRAD database, SALIGN was benchmarked against MUSTANG for multiple alignments as well as against TM-align and CE for pairwise alignments. On the average, SALIGN produces a 15% improvement in structural overlap over HOMSTRAD and 14% over MUSTANG, and yields more equivalent structural positions than TM-align and CE in 90% and 95% of cases, respectively. The utility of accurate multiple structure alignment is illustrated by its application to comparative protein structure modeling.
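The dynamic programming step with an affine gap penalty mentioned in this abstract can be sketched as follows. This is a generic Gotoh-style global alignment in Python, not SALIGN itself: the `score` callback stands in for the weighted sum of sequence and structure feature terms, and the names `gap_open` and `gap_extend` and their default values are assumptions made for the example.

```python
NEG = float("-inf")


def affine_global_score(score, n, m, gap_open=2.0, gap_extend=0.5):
    """Optimal global alignment score with affine gaps.

    score(i, j): similarity of position i in the first chain and position j
    in the second (hypothetical callback, e.g. a weighted feature sum).
    A gap of length L costs gap_open + (L - 1) * gap_extend.
    """
    # match[i][j]: best score with i and j aligned to each other
    # gap_b[i][j]: best score with position i aligned to a gap in the second chain
    # gap_a[i][j]: best score with position j aligned to a gap in the first chain
    match = [[NEG] * (m + 1) for _ in range(n + 1)]
    gap_b = [[NEG] * (m + 1) for _ in range(n + 1)]
    gap_a = [[NEG] * (m + 1) for _ in range(n + 1)]
    match[0][0] = 0.0
    for i in range(1, n + 1):
        gap_b[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):
        gap_a[0][j] = -(gap_open + (j - 1) * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = max(match[i - 1][j - 1], gap_b[i - 1][j - 1], gap_a[i - 1][j - 1])
            match[i][j] = prev + score(i - 1, j - 1)
            gap_b[i][j] = max(match[i - 1][j] - gap_open,
                              gap_b[i - 1][j] - gap_extend,
                              gap_a[i - 1][j] - gap_open)
            gap_a[i][j] = max(match[i][j - 1] - gap_open,
                              gap_a[i][j - 1] - gap_extend,
                              gap_b[i][j - 1] - gap_open)
    return max(match[n][m], gap_b[n][m], gap_a[n][m])
```

For two strings `a` and `b`, a simple identity score would be passed as `affine_global_score(lambda i, j: 1.0 if a[i] == b[j] else -1.0, len(a), len(b))`. Keeping three tables separates the cost of opening a gap from the cost of extending one, which is what makes the penalty affine rather than linear.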
Affiliation(s)
- M S Madhusudhan
- Department of Bioengineering and Therapeutic Sciences, University of California at San Francisco, San Francisco, CA 94158, USA
|
6
|
Abstract
Protein structures often show similarities to one another that would not be seen at the sequence level. Given the coordinates of a protein chain, the SALAMI server at www.zbh.uni-hamburg.de/salami will search the protein data bank and return a set of similar structures without using sequence information. The results page lists the related proteins, details of the sequence and structure similarity and implied sequence alignments. Via a simple structure viewer, one can view superpositions of query and library structures and finally download superimposed coordinates. The alignment method is very tolerant of large gaps and insertions, and tends to produce slightly longer alignments than other similar programs.
Affiliation(s)
- Thomas Margraf
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany.
|
7
|
Sun H, Ferhatosmanoglu H, Ota M, Wang Y. An enhanced partial order curve comparison algorithm and its application to analyzing protein folding trajectories. BMC Bioinformatics 2008; 9:344. [PMID: 18710565 PMCID: PMC2571979 DOI: 10.1186/1471-2105-9-344] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/14/2008] [Accepted: 08/18/2008] [Indexed: 11/13/2022]
Abstract
Background Understanding how proteins fold is essential to our quest in discovering how life works at the molecular level. Current computation power enables researchers to produce a huge amount of folding simulation data. Hence there is a pressing need to be able to interpret and identify novel folding features from them. Results In this paper, we model each folding trajectory as a multi-dimensional curve. We then develop an effective multiple curve comparison (MCC) algorithm, called the enhanced partial order (EPO) algorithm, to extract features from a set of diverse folding trajectories, including both successful and unsuccessful simulation runs. The EPO algorithm addresses several new challenges presented by comparing high dimensional curves coming from folding trajectories. A detailed case study on miniprotein Trp-cage [1] demonstrates that our algorithm can detect similarities at a rather low level, and extract biologically meaningful folding events. Conclusion The EPO algorithm is general and applicable to a wide range of applications. We demonstrate its generality and effectiveness by applying it to aligning multiple protein structures with low similarities. For users' convenience, we provide a web server for the algorithm at .
|
8
|
Wang X, Snoeyink J. Defining and computing optimum RMSD for gapped and weighted multiple-structure alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2008; 5:525-533. [PMID: 18989040 DOI: 10.1109/tcbb.2008.92] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 05/27/2023]
Abstract
Pairwise structure alignment commonly uses root mean square deviation (RMSD) to measure the structural similarity, and methods for optimizing RMSD are well established. We extend RMSD to weighted RMSD for multiple structures. By using multiplicative weights, we show that weighted RMSD for all pairs is the same as weighted RMSD to an average of the structures. Thus, using RMSD or weighted RMSD implies that the average is a consensus structure. Although we show that in general, the two tasks of finding the optimal translations and rotations for minimizing weighted RMSD cannot be separated for multiple structures like they can for pairs, an inherent difficulty and a fact ignored by previous work, we develop a near-linear iterative algorithm to converge weighted RMSD to a local minimum. 10,000 experiments of gapped alignment done on each of 23 protein families from HOMSTRAD (where each structure starts with a random translation and rotation) converge rapidly to the same minimum. Finally we propose a heuristic method to iteratively remove the effect of outliers and find well-aligned positions that determine the structural conserved region by modeling B-factors and deviations from the average positions as weights and iteratively assigning higher weights to better aligned atoms.
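The claim that RMSD over all pairs is equivalent to RMSD to an average can be checked numerically in its simplest, unweighted form. The Python sketch below is an illustration under that assumption only (the paper's multiplicative weights are not reproduced here): for n points occupying one aligned position, the sum of squared distances over all pairs equals n times the sum of squared distances to their mean.

```python
from itertools import combinations


def sq_dist(p, q):
    """Squared Euclidean distance between two coordinate tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def pairwise_sum(points):
    """Sum of squared distances over all unordered pairs of points."""
    return sum(sq_dist(p, q) for p, q in combinations(points, 2))


def to_average_sum(points):
    """Sum of squared distances from each point to the mean point."""
    n = len(points)
    avg = tuple(sum(coord) / n for coord in zip(*points))
    return sum(sq_dist(p, avg) for p in points)


# One aligned position taken from four superimposed structures (made-up data).
pts = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
print(pairwise_sum(pts), len(pts) * to_average_sum(pts))
```

Because the two quantities differ only by the constant factor n, minimizing one minimizes the other, which is why the average can serve as a consensus structure.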
Affiliation(s)
- Xueyi Wang
- Department of Mathematics and Computer Science, Northwest Nazarene University, 623 Holly St., Nampa, ID 83686-5897, USA.
|
9
|
Schenk G, Margraf T, Torda AE. Protein sequence and structure alignments within one framework. Algorithms Mol Biol 2008; 3:4. [PMID: 18380904 PMCID: PMC2390564 DOI: 10.1186/1748-7188-3-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 01/29/2008] [Accepted: 04/01/2008] [Indexed: 11/19/2022]
Abstract
Background Protein structure alignments are usually based on techniques very different from those used for sequence alignments. We propose a method that treats sequence, structure and even combined sequence + structure in a single framework. Using a probabilistic approach, we calculate a similarity measure which can be applied to fragments containing only protein sequence, structure or both simultaneously. Results Proof-of-concept results are given for the different problems. For sequence alignments, the methodology is no better than conventional methods. For structure alignments, the techniques are very fast, reliable and tolerant of a range of alignment parameters. Combined sequence and structure alignments may provide a more reliable alignment for pairs of proteins where pure structural alignments can be misled by repetitive elements or apparent symmetries. Conclusion The probabilistic framework has an elegance in principle, merging sequence and structure descriptors into a single framework. It has a practical use in fast structural alignments and a potential use in finding those examples where sequence and structural similarities apparently disagree.
|
10
|
Shatsky M, Nussinov R, Wolfson HJ. Algorithms for multiple protein structure alignment and structure-derived multiple sequence alignment. Methods Mol Biol 2008; 413:125-46. [PMID: 18075164 PMCID: PMC10773980 DOI: 10.1007/978-1-59745-574-9_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/25/2023]
Abstract
Primary amino acid content and the geometry of the folded protein 3D structure are major parameters of protein function. During the course of evolution the protein 3D structure is more preserved than its primary sequence. Thus, analysis of protein structures is expected to lead to a deep insight into protein function. Recognition of a structural core common to a set of protein structures serves as a basic tool for the studies of protein evolution and classification, analysis of similar structural motifs and functional binding sites, and for homology modeling and threading. In this chapter, we discuss several biologically related computational aspects of the multiple structure alignment and propose a method that provides solutions to these problems. Finally, we address the problem of structure-based multiple sequence alignment and propose an optimization method that unifies primary sequence and 3D structure information.
Affiliation(s)
- Maxim Shatsky
- School of Computer Science, Tel Aviv University, Tel Aviv, Israel
|
11
|
Via A, Peluso D, Gherardini PF, de Rinaldis E, Colombo T, Ausiello G, Helmer-Citterich M. 3dLOGO: a web server for the identification, analysis and use of conserved protein substructures. Nucleic Acids Res 2007; 35:W416-9. [PMID: 17488847 PMCID: PMC1933223 DOI: 10.1093/nar/gkm228] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022]
Abstract
3dLOGO is a web server for the identification and analysis of conserved protein 3D substructures. Given a set of residues in a PDB (Protein Data Bank) chain, the server detects the matching substructure(s) in a set of user-provided protein structures, generates a multiple structure alignment centered on the input substructures and highlights other residues whose structural conservation becomes evident after the defined superposition. Conserved residues are proposed to the user for highlighting functional areas, deriving refined structural motifs or building sequence patterns. Residue structural conservation can be visualized through an expressly designed Java application, 3dProLogo, which is a 3D implementation of a sequence logo. The 3dLOGO server, with related documentation, is available at http://3dlogo.uniroma2.it/
Affiliation(s)
- Allegra Via
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy, Bioinformatics Group, I.R.B.M. P. Angeletti, MRL-Rome, Via Pontina Km, 30600 Pomezia, Italy, Center for Comparative Functional Genomics, Department of Biology, New York University, NY 10003, USA and Systems Biology Group - Max-Delbrück-Centrum für Molekulare Medizin, Berlin
- *To whom correspondence should be addressed. +39 067259 4324; +39 067259 4314. Correspondence may also be addressed to Manuela Helmer-Citterich.
- Daniele Peluso
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy, Bioinformatics Group, I.R.B.M. P. Angeletti, MRL-Rome, Via Pontina Km, 30600 Pomezia, Italy, Center for Comparative Functional Genomics, Department of Biology, New York University, NY 10003, USA and Systems Biology Group - Max-Delbrück-Centrum für Molekulare Medizin, Berlin
- Pier Federico Gherardini
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy, Bioinformatics Group, I.R.B.M. P. Angeletti, MRL-Rome, Via Pontina Km, 30600 Pomezia, Italy, Center for Comparative Functional Genomics, Department of Biology, New York University, NY 10003, USA and Systems Biology Group - Max-Delbrück-Centrum für Molekulare Medizin, Berlin
- Emanuele de Rinaldis
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy, Bioinformatics Group, I.R.B.M. P. Angeletti, MRL-Rome, Via Pontina Km, 30600 Pomezia, Italy, Center for Comparative Functional Genomics, Department of Biology, New York University, NY 10003, USA and Systems Biology Group - Max-Delbrück-Centrum für Molekulare Medizin, Berlin
- Teresa Colombo
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy, Bioinformatics Group, I.R.B.M. P. Angeletti, MRL-Rome, Via Pontina Km, 30600 Pomezia, Italy, Center for Comparative Functional Genomics, Department of Biology, New York University, NY 10003, USA and Systems Biology Group - Max-Delbrück-Centrum für Molekulare Medizin, Berlin
- Gabriele Ausiello
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy, Bioinformatics Group, I.R.B.M. P. Angeletti, MRL-Rome, Via Pontina Km, 30600 Pomezia, Italy, Center for Comparative Functional Genomics, Department of Biology, New York University, NY 10003, USA and Systems Biology Group - Max-Delbrück-Centrum für Molekulare Medizin, Berlin
- Manuela Helmer-Citterich
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome Tor Vergata, 00133 Rome, Italy, Bioinformatics Group, I.R.B.M. P. Angeletti, MRL-Rome, Via Pontina Km, 30600 Pomezia, Italy, Center for Comparative Functional Genomics, Department of Biology, New York University, NY 10003, USA and Systems Biology Group - Max-Delbrück-Centrum für Molekulare Medizin, Berlin
|
12
|
Birzele F, Gewehr JE, Csaba G, Zimmer R. Vorolign--fast structural alignment using Voronoi contacts. Bioinformatics 2007; 23:e205-11. [PMID: 17237093 DOI: 10.1093/bioinformatics/btl294] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Indexed: 11/15/2022]
Abstract
Vorolign, a fast and flexible structural alignment method for two or more protein structures, is introduced. The method aligns protein structures using double dynamic programming and measures the similarity of two residues based on the evolutionary conservation of their corresponding Voronoi-contacts in the protein structure. This similarity function allows aligning protein structures even in cases where structural flexibilities exist. Multiple structural alignments are generated from a set of pairwise alignments using a consistency-based, progressive multiple alignment strategy. RESULTS The performance of Vorolign is evaluated for different applications of protein structure comparison, including automatic family detection as well as pairwise and multiple structure alignment. Vorolign accurately detects the correct family, superfamily or fold of a protein with respect to the SCOP classification on a set of difficult target structures. A scan against a database of >4000 proteins takes on average 1 min per target. The performance of Vorolign in calculating pairwise and multiple alignments is found to be comparable with other pairwise and multiple protein structure alignment methods. AVAILABILITY Vorolign is freely available for academic users as a web server at http://www.bio.ifi.lmu.de/Vorolign
Affiliation(s)
- Fabian Birzele
- Practical Informatics and Bioinformatics Group, Department of Informatics, Ludwig-Maximilians-University, Munich, Germany.
|
13
|
Konagurthu AS, Whisstock JC, Stuckey PJ, Lesk AM. MUSTANG: a multiple structural alignment algorithm. Proteins 2006; 64:559-74. [PMID: 16736488 DOI: 10.1002/prot.20921] [Citation(s) in RCA: 537] [Impact Index Per Article: 29.8] [Indexed: 11/07/2022]
Abstract
Multiple structural alignment is a fundamental problem in structural genomics. In this article, we define a reliable and robust algorithm, MUSTANG (MUltiple STructural AligNment AlGorithm), for the alignment of multiple protein structures. Given a set of protein structures, the program constructs a multiple alignment using the spatial information of the Cα atoms in the set. Broadly based on the progressive pairwise heuristic, this algorithm gains accuracy through novel and effective refinement phases. MUSTANG reports the multiple sequence alignment and the corresponding superposition of structures. Alignments generated by MUSTANG are compared with several hand-curated alignments in the literature as well as with the benchmark alignments of 1033 alignment families from the HOMSTRAD database. The performance of MUSTANG was compared with DALI at a pairwise level, and with other multiple structural alignment tools such as POSA, CE-MC, MALECON, and MultiProt. MUSTANG performs comparably to popular pairwise and multiple structural alignment tools for closely related proteins, and performs more reliably than other multiple structural alignment methods on hard data sets containing distantly related proteins or proteins that show conformational changes.
Affiliation(s)
- Arun S Konagurthu
- Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Melbourne, Victoria, 3010 Australia
|
14
|
Chen Y, Crippen GM. An iterative refinement algorithm for consistency based multiple structural alignment methods. Bioinformatics 2006; 22:2087-93. [PMID: 16809393 DOI: 10.1093/bioinformatics/btl351] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
MOTIVATION Multiple STructural Alignment (MSTA) provides valuable information for solving problems such as fold recognition. The consistency-based approach tries to find conflict-free subsets of alignments from a pre-computed all-to-all Pairwise Alignment Library (PAL). If large proportions of conflicts exist in the library, consistency can be hard to get. On the other hand, multiple structural superposition has been used in many MSTA methods to refine alignments. However, multiple structural superposition is dependent on alignments, and a superposition generated based on erroneous alignments is not guaranteed to be the optimal superposition. Correcting errors after making errors is not as good as avoiding errors from the beginning. Hence it is important to refine the pairwise library to reduce the number of conflicts before any consistency-based assembly. RESULTS We present an algorithm, Iterative Refinement of Induced Structural alignment (IRIS), to refine the PAL. A new measurement for the consistency of a library is also proposed. Experiments show that our algorithm can greatly improve T-COFFEE performance for less consistent pairwise alignment libraries. The final multiple alignment outperforms most state-of-the-art MSTA algorithms at assembling 15 transglycosidases. Results on three other benchmarks showed that the algorithm consistently improves multiple alignment performance. AVAILABILITY The C++ code of the algorithm is available upon request.
Affiliation(s)
- Yu Chen
- Bioinformatics Program, University of Michigan, Ann Arbor, MI 48109, USA
|
15
|
Ebert J, Brutlag D. Development and validation of a consistency based multiple structure alignment algorithm. Bioinformatics 2006; 22:1080-7. [PMID: 16473868 DOI: 10.1093/bioinformatics/btl046] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022]
Abstract
SUMMARY We introduce an algorithm that uses the information gained from simultaneous consideration of an entire group of related proteins to create multiple structure alignments (MSTAs). Consistency-based alignment (CBA) first harnesses the information contained within regions that are consistently aligned among a set of pairwise superpositions in order to realign pairs of proteins through both global and local refinement methods. It then constructs a multiple alignment that is maximally consistent with the improved pairwise alignments. We validate CBA's alignments by assessing their accuracy in regions where at least two of the aligned structures contain the same conserved sequence motif. RESULTS CBA correctly aligns well over 90% of motif residues in superpositions of proteins belonging to the same family or superfamily, and it outperforms a number of previously reported MSTA algorithms.
Affiliation(s)
- Jessica Ebert
- Program in Biophysics and Department of Biochemistry, Stanford University Stanford, CA 94305, USA
|
16
|
Lupyan D, Leo-Macias A, Ortiz AR. A new progressive-iterative algorithm for multiple structure alignment. Bioinformatics 2005; 21:3255-63. [PMID: 15941743 DOI: 10.1093/bioinformatics/bti527] [Citation(s) in RCA: 112] [Impact Index Per Article: 5.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION Multiple structure alignments are becoming important tools in many aspects of structural bioinformatics. The current explosion in the number of available protein structures demands multiple structural alignment algorithms with an adequate balance of accuracy and speed, for large scale applications in structural genomics, protein structure prediction and protein classification. RESULTS A new multiple structural alignment program, MAMMOTH-mult, is described. It is demonstrated that the alignments obtained with the new method are an improvement over previous manual or automatic alignments available in several widely used databases at all structural levels. Detailed analysis of the structural alignments for a few representative cases indicates that MAMMOTH-mult delivers biologically meaningful trees and conservation at the sequence and structural levels of functional motifs in the alignments. An important improvement over previous methods is the reduction in computational cost. Typical alignments take only a median time of 5 CPU seconds in a single R12000 processor. MAMMOTH-mult is particularly useful for large scale applications. AVAILABILITY http://ub.cbm.uam.es/mammoth/mult.
Affiliation(s)
- Dmitry Lupyan
- Department of Physiology and Biophysics, Mount Sinai School of Medicine, New York, NY 10029, USA
|
17
|
Abstract
MOTIVATION Existing comparisons of protein structures are not able to describe structural divergence and flexibility in the structures being compared because they focus on identifying a common invariant core and ignore parts of the structures outside this core. Understanding the structural divergence and flexibility is critical for studying the evolution of functions and specificities of proteins. RESULTS A new method of multiple protein structure alignment, POSA (Partial Order Structure Alignment), was developed using a partial order graph representation of multiple alignments. POSA has two unique features: (1) it identifies and classifies regions that are conserved only in a subset of input structures and (2) it allows internal rearrangements in protein structures. POSA outperforms other programs in the cases where structural flexibilities exist and provides new insights by visualizing the mosaic nature of multiple structural alignments. POSA is an ideal tool for studying the variation of protein structures within diverse structural families. AVAILABILITY POSA is freely available for academic users on a Web server at http://fatcat.burnham.org/POSA
Affiliation(s)
- Yuzhen Ye
- Program in Bioinformatics and Systems Biology, The Burnham Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA.
|
18
|
|
19
|
Sandelin E. Extracting multiple structural alignments from pairwise alignments: a comparison of a rigorous and a heuristic approach. Bioinformatics 2004; 21:1002-9. [PMID: 15531607 PMCID: PMC2692033 DOI: 10.1093/bioinformatics/bti117] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
MOTIVATION Multiple structural alignments (MSTAs) provide position-specific information on the sequence variability allowed by protein folds. This information can be exploited to better understand the evolution of proteins and the physical chemistry of polypeptide folding. Most MSTA methods rely on a pre-computed library of pairwise alignments. This library will in general contain conflicting residue equivalences, not all of which can be realized in the final MSTA. Hence, to build a consistent MSTA, these methods have to select a conflict-free subset of equivalences. RESULTS Using a dataset with 327 families from SCOP 1.63 we compare the ability of two different methods to select an optimal conflict-free subset of equivalences. One is an implementation of Reinert et al.'s integer linear programming formulation (ILP) of the maximum weight trace problem (Reinert et al., 1997, Proc. 1st Ann. Int. Conf. Comput. Mol. Biol. (RECOMB-97), ACM Press, New York). This ILP formulation is a rigorous approach but its complexity is difficult to predict. The other method is T-Coffee (Notredame et al., 2000), which uses a heuristic enhancement of the equivalence weights that allows it to use the speed and simplicity of the progressive alignment approach while still incorporating information from all alignments in each step of building the MSTA. We find that although the ILP formulation consistently selects a better set of conflict-free equivalences, the differences are small and the quality of the resulting MSTAs is essentially the same for both methods. Given its speed and predictable complexity, our results show that T-Coffee is an attractive alternative for producing high-quality MSTAs.
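The heuristic enhancement of equivalence weights attributed to T-Coffee can be sketched roughly as a triplet extension of the pairwise library. The Python below is an illustrative reading, not the T-Coffee source: residues are represented as hypothetical `(chain_id, position)` tuples, the library is a plain dict, and each pair's weight is increased by the smaller of the two weights linking it through every shared third residue.

```python
from collections import defaultdict
from itertools import combinations


def extend_library(weights):
    """Return extended weights for residue pairs.

    weights: dict mapping (res_x, res_y) -> weight, where each residue is a
    (chain_id, position) tuple taken from a pairwise alignment library.
    For every residue z equivalenced to both x and y, the pair (x, y)
    gains min(w(x, z), w(z, y)), so mutually consistent pairs are rewarded.
    """
    neighbours = defaultdict(dict)
    for (x, y), w in weights.items():
        neighbours[x][y] = w
        neighbours[y][x] = w

    extended = defaultdict(float)
    for (x, y), w in weights.items():
        extended[tuple(sorted((x, y)))] += w       # keep the direct evidence

    for z, around in neighbours.items():
        for x, y in combinations(sorted(around), 2):
            if x[0] != y[0]:                       # only pair residues from different chains
                extended[(x, y)] += min(around[x], around[y])
    return dict(extended)
```

With three chains whose first residues are all mutually equivalenced with weights 3 (A-B), 2 (B-C) and 1 (A-C), the extended weights become 4, 3 and 3: the weakly supported A-C pair is lifted by the agreement routed through B. A progressive alignment would then use these extended weights as its scoring scheme.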
Affiliation(s)
- Erik Sandelin
- Stockholm Bioinformatics Center, AlbaNova, Stockholm University, 106 91 Stockholm, Sweden.
|