51
Pei J, Kim BH, Grishin NV. PROMALS3D: a tool for multiple protein sequence and structure alignments. Nucleic Acids Res 2008; 36:2295-300. [PMID: 18287115 PMCID: PMC2367709 DOI: 10.1093/nar/gkn072] [Citation(s) in RCA: 1038] [Impact Index Per Article: 64.9] [Indexed: 11/17/2022]
Abstract
Although multiple sequence alignments (MSAs) are essential for a wide range of applications from structure modeling to prediction of functional sites, construction of accurate MSAs for distantly related proteins remains a largely unsolved problem. The rapidly increasing database of spatial structures is a valuable source to improve alignment quality. We explore the use of 3D structural information to guide sequence alignments constructed by our MSA program PROMALS. The resulting tool, PROMALS3D, automatically identifies homologs with known 3D structures for the input sequences, derives structural constraints through structure-based alignments and combines them with sequence constraints to construct consistency-based multiple sequence alignments. The output is a consensus alignment that brings together sequence and structural information about input proteins and their homologs. PROMALS3D can also align sequences of multiple input structures, with the output representing a multiple structure-based alignment refined in combination with sequence constraints. The advantage of PROMALS3D is that it gives researchers an easy way to produce high-quality alignments consistent with both sequences and structures of proteins. PROMALS3D outperforms a number of existing methods for constructing multiple sequence or structural alignments using both reference-dependent and reference-independent evaluation methods.
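Reference-dependent evaluation, as mentioned at the end of the abstract, is commonly done by counting how many residue pairs of a trusted reference alignment are reproduced by the alignment under test. The following is a minimal Python sketch of such a pairwise score (often called a Q-score). The function names and the toy sequences are illustrative assumptions and are not taken from PROMALS3D.

```python
def aligned_pairs(row_a, row_b):
    """Return the set of (i, j) residue index pairs aligned in two gapped rows."""
    pairs = set()
    i = j = 0  # residue counters for each sequence (gaps are not counted)
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def q_score(test, reference):
    """Fraction of reference-aligned residue pairs also aligned in the test alignment."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(*test)) / len(ref_pairs)


# Hypothetical toy alignments of the same two sequences (ACGT and ACTGT).
reference = ("AC-GT", "ACTGT")
test = ("ACG-T", "ACTGT")
score = q_score(test, reference)
```

Here three of the four reference pairs are recovered, so `score` is 0.75. A reference-independent measure would instead score the structural superposition implied by the alignment, which needs coordinates and is not sketched here.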
Affiliation(s)
- Jimin Pei
- Howard Hughes Medical Institute, Dallas, TX 75390, USA.
52
Kaminska KH, Baraniak U, Boniecki M, Nowaczyk K, Czerwoniec A, Bujnicki JM. Structural bioinformatics analysis of enzymes involved in the biosynthesis pathway of the hypermodified nucleoside ms(2)io(6)A37 in tRNA. Proteins 2008; 70:1-18. [PMID: 17910062 DOI: 10.1002/prot.21640] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Indexed: 11/11/2022]
Abstract
tRNAs from all organisms contain posttranscriptionally modified nucleosides, which are derived from the four canonical nucleosides. In most tRNAs that read codons beginning with U, adenosine in position 37, adjacent to the 3' position of the anticodon, is modified to N(6)-(Delta(2)-isopentenyl) adenosine (i(6)A). In many bacteria, such as Escherichia coli, this residue is typically hypermodified to N(6)-isopentenyl-2-thiomethyladenosine (ms(2)i(6)A). In a few bacteria, such as Salmonella typhimurium, ms(2)i(6)A can be further hydroxylated to N(6)-(cis-4-hydroxyisopentenyl)-2-thiomethyladenosine (ms(2)io(6)A). Although the enzymes that introduce the respective modifications (prenyltransferase MiaA, methylthiotransferase MiaB, and hydroxylase MiaE) have been identified, their structures remain unknown and sequence-function relationships remain obscure. We carried out sequence analysis and structure prediction of MiaA, MiaB, and MiaE, using the protein fold-recognition approach. Three-dimensional models of all three proteins were then built using a new modeling protocol designed to overcome uncertainties in the alignments and divergence between the templates. For MiaA and MiaB, the catalytic core was built based on the templates from the P-loop NTPase and Radical-SAM superfamilies, respectively. For MiaB, we have also modeled the C-terminal TRAM domain and the newly predicted N-terminal flavodoxin-fold domain. For MiaE, we confidently predict that it shares the three-dimensional fold with the ferritin-like four-helix bundle proteins and that it has a similar active site and mechanism of action to diiron carboxylate enzymes, in particular, methane monooxygenase (E.C.1.14.13.25) that catalyses the biological hydroxylation of alkanes. Our models provide the first structural platform for enzymes involved in the biosynthesis of i(6)A, ms(2)i(6)A, and ms(2)io(6)A, explain the data available from the literature and will help to design further experiments and interpret their results.
Affiliation(s)
- Katarzyna H Kaminska
- Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University, PL-61-614 Poznan, Poland
53
Abstract
Protein sequence alignment is the task of identifying evolutionarily or structurally related positions in a collection of amino acid sequences. Although the protein alignment problem has been studied for several decades, many recent studies have demonstrated considerable progress in improving the accuracy or scalability of multiple and pairwise alignment tools, or in expanding the scope of tasks handled by an alignment program. In this chapter, we review state-of-the-art protein sequence alignment and provide practical advice for users of alignment tools.
Affiliation(s)
- Chuong B Do
- Computer Science Department, Stanford University, Stanford, CA, USA
54
Affiliation(s)
- Cédric Notredame
- Information Génomique et Structurale, CNRS UPR2589, Institute for Structural Biology and Microbiology, Parc Scientifique de Luminy, Marseille, France.
55
Maravić Vlahovicek G, Cubrilo S, Tkaczuk KL, Bujnicki JM. Modeling and experimental analyses reveal a two-domain structure and amino acids important for the activity of aminoglycoside resistance methyltransferase Sgm. Biochimica et Biophysica Acta - Proteins and Proteomics 2007; 1784:582-90. [PMID: 18343347 DOI: 10.1016/j.bbapap.2007.09.009] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 07/31/2007] [Revised: 09/18/2007] [Accepted: 09/19/2007] [Indexed: 12/19/2022]
Abstract
Methyltransferases that carry out posttranscriptional N7-methylation of G1405 in 16S rRNA confer bacterial resistance to aminoglycoside antibiotics, including kanamycin and gentamicin. Genes encoding enzymes from this family (hereafter referred to as Arm, for aminoglycoside resistance methyltransferases) have been recently found to spread by horizontal gene transfer between various human pathogens. The knowledge of the Arm protein structure would lay the groundwork for the development of potential resistance inhibitors, which could be used to restore the potential of aminoglycosides to act against the resistant pathogens. We analyzed the sequence-function relationships of Sgm MTase, a member of the Arm family, by limited proteolysis and site-directed and random mutagenesis. We also modeled the structure of Sgm using bioinformatics techniques and used the model to provide a structural context for experimental results. We found that Sgm comprises two domains and we characterized a number of functionally compromised point mutants with substitutions of invariant or conserved residues. Our study provides a low-resolution (residue-level) model of sequence-structure-function relationships in the Arm family of enzymes and reveals the cofactor-binding and substrate-binding sites. These functional regions will be prime targets for further experimental and theoretical studies aimed at defining the reaction mechanism of m7 G1405 methylation, increasing the resolution of the model and developing Arm-specific inhibitors.
Affiliation(s)
- Gordana Maravić Vlahovicek
- Department of Biochemistry and Molecular Biology, Faculty of Pharmacy and Biochemistry, University of Zagreb, Ante Kovacića 1, 10000 Zagreb, Croatia.
56
Qi Y, Sadreyev RI, Wang Y, Kim BH, Grishin NV. A comprehensive system for evaluation of remote sequence similarity detection. BMC Bioinformatics 2007; 8:314. [PMID: 17725841 PMCID: PMC2031906 DOI: 10.1186/1471-2105-8-314] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Received: 04/04/2007] [Accepted: 08/28/2007] [Indexed: 11/25/2022]
Abstract
Background: Accurate and sensitive performance evaluation is crucial both for effective development of better structure prediction methods based on sequence similarity and for the comparative analysis of existing methods. To date, there has been no satisfactory comprehensive evaluation method that (i) is based on a large and statistically unbiased set of proteins with clearly defined relationships; and (ii) covers all performance aspects of sequence-based structure predictors, such as sensitivity and specificity, alignment accuracy and coverage, and structure template quality. Results: With the aim of designing such a method, we (i) select a statistically balanced set of divergent protein domains from SCOP, and define similarity relationships for the majority of these domains by complementing the best of information available in SCOP with a rigorous SVM-based algorithm; and (ii) develop protocols for the assessment of similarity detection and alignment quality from several complementary perspectives. The evaluation of similarity detection is based on ROC-like curves and includes several complementary approaches to the definition of true/false positives. Reference-dependent approaches use the 'gold standard' of pre-defined domain relationships and structure-based alignments. Reference-independent approaches assess the quality of structural match predicted by the sequence alignment, with respect to the whole domain length (global mode) or to the aligned region only (local mode). Similarly, the evaluation of alignment quality includes several reference-dependent and -independent measures, in global and local modes. As an illustration, we use our benchmark to compare the performance of several methods for the detection of remote sequence similarities, and show that different aspects of evaluation reveal different properties of the evaluated methods, highlighting their advantages, weaknesses, and potential for further development.
Conclusion: The presented benchmark provides a new tool for a statistically unbiased assessment of methods for remote sequence similarity detection, from various complementary perspectives. This tool should be useful both for users choosing the best method for a given purpose, and for developers designing new, more powerful methods. The benchmark set, reference alignments, and evaluation codes can be downloaded from .
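The ROC-like curves used for similarity detection can be illustrated with a short Python sketch: hits are ranked by score, and the cumulative counts of true and false positives are recorded as the ranking is traversed. This is a generic illustration under stated assumptions, not the benchmark's own code; the function name, the `(score, is_true_positive)` input layout, and the toy hits are hypothetical.

```python
def roc_like_curve(hits):
    """Return cumulative (false_positives, true_positives) counts along a ranked hit list.

    hits: iterable of (score, is_true_positive) tuples; a higher score means
    a more confident prediction.
    """
    ranked = sorted(hits, key=lambda hit: hit[0], reverse=True)
    tp = fp = 0
    curve = [(0, 0)]
    for _score, is_true in ranked:
        if is_true:
            tp += 1
        else:
            fp += 1
        curve.append((fp, tp))
    return curve


# Hypothetical toy hits: two true relationships and one false one.
toy_hits = [(0.7, True), (0.9, True), (0.8, False)]
curve = roc_like_curve(toy_hits)
```

For the toy input, `curve` is `[(0, 0), (0, 1), (1, 1), (1, 2)]`. How a hit is labelled true or false is exactly what the abstract's reference-dependent and reference-independent definitions vary; the sketch takes the labels as given.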
Affiliation(s)
- Yuan Qi
- Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
- Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Ruslan I Sadreyev
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
- Yong Wang
- Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
- Bong-Hyun Kim
- Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
- Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
- Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
57
Pei J, Kim BH, Tang M, Grishin NV. PROMALS web server for accurate multiple protein sequence alignments. Nucleic Acids Res 2007; 35:W649-52. [PMID: 17452345 PMCID: PMC1933189 DOI: 10.1093/nar/gkm227] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.3] [Indexed: 11/13/2022]
Abstract
Multiple sequence alignments are essential in homology inference, structure modeling, functional prediction and phylogenetic analysis. We developed a web server that constructs multiple protein sequence alignments using PROMALS, a progressive method that improves alignment quality by using additional homologs from PSI-BLAST searches and secondary structure predictions from PSIPRED. PROMALS shows higher alignment accuracy than other advanced methods, such as MUMMALS, ProbCons, MAFFT and SPEM. The PROMALS web server takes FASTA format protein sequences as input. The output includes a colored alignment augmented with information about sequence grouping, predicted secondary structures and positional conservation. The PROMALS web server is available at: http://prodata.swmed.edu/promals/
Affiliation(s)
- Jimin Pei
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 6001 Forest Park Road, Dallas, Texas 75390-9050, USA.
58
Structural and evolutionary bioinformatics of the SPOUT superfamily of methyltransferases. BMC Bioinformatics 2007; 8:73. [PMID: 17338813 PMCID: PMC1829167 DOI: 10.1186/1471-2105-8-73] [Citation(s) in RCA: 128] [Impact Index Per Article: 7.5] [Received: 11/20/2006] [Accepted: 03/05/2007] [Indexed: 11/29/2022]
Abstract
Background: SPOUT methyltransferases (MTases) are a large class of S-adenosyl-L-methionine-dependent enzymes that exhibit an unusual alpha/beta fold with a very deep topological knot. In 2001, when no crystal structures were available for any of these proteins, Anantharaman, Koonin, and Aravind identified homology between SpoU and TrmD MTases and defined the SPOUT superfamily. Since then, multiple crystal structures of knotted MTases have been solved and numerous new homologous sequences appeared in the databases. However, no comprehensive comparative analysis of these proteins has been carried out to classify them based on structural and evolutionary criteria and to guide functional predictions. Results: We carried out extensive searches of databases of protein structures and sequences to collect all members of previously identified SPOUT MTases, and to identify previously unknown homologs. Based on sequence clustering, characterization of domain architecture, structure predictions and sequence/structure comparisons, we re-defined families within the SPOUT superfamily and predicted putative active sites and biochemical functions for the so far uncharacterized members. We have also delineated the common core of SPOUT MTases and inferred a multiple sequence alignment for the conserved knot region, from which we calculated the phylogenetic tree of the superfamily. We have also studied phylogenetic distribution of different families, and used this information to infer the evolutionary history of the SPOUT superfamily. Conclusion: We present the first phylogenetic tree of the SPOUT superfamily since it was defined, together with a new scheme for its classification, and discussion about conservation of sequence and structure in different families, and their functional implications. We identified four protein families as new members of the SPOUT superfamily. Three of these families are functionally uncharacterized (COG1772, COG1901, and COG4080), and one (COG1756 represented by Nep1p) has been already implicated in RNA metabolism, but its biochemical function has been unknown. Based on the inference of orthologous and paralogous relationships between all SPOUT families we propose that the Last Universal Common Ancestor (LUCA) of all extant organisms contained at least three SPOUT members, ancestors of contemporary RNA MTases that carry out m1G, m3U, and 2'O-ribose methylation, respectively. In this work we also speculate on the origin of the knot and propose possible 'unknotted' ancestors. The results of our analysis provide a comprehensive 'roadmap' for experimental characterization of SPOUT MTases and interpretation of functional studies in the light of sequence-structure relationships.
59
Pei J, Grishin NV. PROMALS: towards accurate multiple sequence alignments of distantly related proteins. Bioinformatics 2007; 23:802-8. [PMID: 17267437 DOI: 10.1093/bioinformatics/btm017] [Citation(s) in RCA: 230] [Impact Index Per Article: 13.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Accurate multiple sequence alignments are essential in protein structure modeling, functional prediction and efficient planning of experiments. Although the alignment problem has attracted considerable attention, preparation of high-quality alignments for distantly related sequences remains a difficult task. RESULTS: We developed PROMALS, a multiple alignment method that shows promising results for protein homologs with sequence identity below 10%, aligning close to half of the amino acid residues correctly on average. This is about three times more accurate than traditional pairwise sequence alignment methods. The PROMALS algorithm derives its strength from several sources: (i) sequence database searches to retrieve additional homologs; (ii) accurate secondary structure prediction; (iii) a hidden Markov model that uses a novel combined scoring of amino acids and secondary structures; (iv) probabilistic consistency-based scoring applied to progressive alignment of profiles. Compared to the best alignment methods that do not use secondary structure prediction and database searches (e.g. MUMMALS, ProbCons and MAFFT), PROMALS is up to 30% more accurate, with improvement being most prominent for highly divergent homologs. Compared to SPEM and HHalign, which also employ database searches and secondary structure prediction, PROMALS shows an accuracy improvement of several percent. AVAILABILITY: The PROMALS web server is available at: http://prodata.swmed.edu/promals/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
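Probabilistic consistency-based scoring, item (iv) above, can be illustrated with a generic sketch. The abstract does not give the formula PROMALS uses, so the code below assumes the standard ProbCons-style transformation, in which the match probability for a residue pair in sequences x and y is re-estimated by averaging over paths through every sequence z in the set: P'(x_i ~ y_j) = (1/N) * sum over z and k of P(x_i ~ z_k) * P(z_k ~ y_j). All names and the toy probabilities are hypothetical.

```python
from itertools import combinations


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def consistency_transform(post, lengths):
    """Apply one round of a ProbCons-style consistency transformation.

    post[(x, y)] is a len(x)-by-len(y) matrix of pairwise match probabilities;
    lengths maps each sequence name to its length.
    """
    names = sorted(lengths)

    def get(x, y):
        if x == y:
            return identity(lengths[x])  # a sequence matches itself exactly
        return post[(x, y)] if (x, y) in post else transpose(post[(y, x)])

    updated = {}
    for x, y in combinations(names, 2):
        total = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in names:  # includes z == x and z == y via identity matrices
            prod = matmul(get(x, z), get(z, y))
            total = [[t + p for t, p in zip(tr, pr)] for tr, pr in zip(total, prod)]
        updated[(x, y)] = [[v / len(names) for v in row] for row in total]
    return updated


# Hypothetical one-residue sequences: a and b each align well to c but weakly to each other.
lengths = {"a": 1, "b": 1, "c": 1}
post = {("a", "b"): [[0.2]], ("a", "c"): [[0.9]], ("b", "c"): [[0.9]]}
updated = consistency_transform(post, lengths)
```

In the toy case the a-b match probability rises from 0.2 to (0.2 + 0.2 + 0.81) / 3, because both residues align confidently to the same residue of c. Real implementations typically use sparse matrices and apply the transformation to profile-profile posteriors rather than single sequences.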
Affiliation(s)
- Jimin Pei
- Howard Hughes Medical Institute, The University of Texas Southwestern Medical Center at Dallas, 6001 Forest Park Road, Dallas, TX 75390-9050, USA.