1
Cretin G, Périn C, Zimmermann N, Galochkina T, Gelly JC. ICARUS: flexible protein structural alignment based on Protein Units. Bioinformatics 2023; 39:btad459. [PMID: 37498544 PMCID: PMC10400377 DOI: 10.1093/bioinformatics/btad459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Revised: 07/04/2023] [Accepted: 07/26/2023] [Indexed: 07/28/2023]
Abstract
MOTIVATION Alignment of protein structures is a major problem in structural biology. The first approach commonly used is to treat proteins as rigid bodies. However, alignment of protein structures can be very complex due to conformational variability or to complex evolutionary relationships between proteins, such as insertions, circular permutations or repetitions. In such cases, introducing flexibility is useful for two reasons: (i) it can help compare two protein chains that adopt different conformational states, for example as a result of protein/ligand interactions or post-translational modifications, and (ii) it aids in identifying conserved regions in proteins that may have distant evolutionary relationships. RESULTS We propose ICARUS, a new approach for flexible structural alignment based on the identification of Protein Units, evolutionarily preserved structural descriptors of intermediate size, between secondary structures and domains. ICARUS significantly outperforms reference methods on a dataset of very difficult structural alignments. AVAILABILITY AND IMPLEMENTATION Code is freely available online at https://github.com/DSIMB/ICARUS.
Affiliation(s)
- Gabriel Cretin
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Charlotte Périn
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- TBI, Université de Toulouse, CNRS, INRAE, INSA, 31077 Toulouse, France
- Nicolas Zimmermann
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Tatiana Galochkina
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Jean-Christophe Gelly
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, F-75015 Paris, France
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
2
Jayaraman V, Toledo‐Patiño S, Noda‐García L, Laurino P. Mechanisms of protein evolution. Protein Sci 2022; 31:e4362. [PMID: 35762715 PMCID: PMC9214755 DOI: 10.1002/pro.4362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Revised: 05/11/2022] [Accepted: 05/14/2022] [Indexed: 11/06/2022]
Abstract
How do proteins evolve? How do changes in sequence mediate changes in protein structure, and in turn in function? This question has multiple angles, ranging from biochemistry and biophysics to evolutionary biology. This review provides a brief integrated view of some key mechanistic aspects of protein evolution. First, we explain how protein evolution is primarily driven by randomly acquired genetic mutations and selection for function, and how these mutations can even give rise to completely new folds. Then, we also comment on how phenotypic protein variability, including promiscuity, transcriptional and translational errors, may also accelerate this process, possibly via "plasticity-first" mechanisms. Finally, we highlight open questions in the field of protein evolution, with respect to the emergence of more sophisticated protein systems such as protein complexes, pathways, and the emergence of pre-LUCA enzymes.
Affiliation(s)
- Vijay Jayaraman
- Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel
- Saacnicteh Toledo‐Patiño
- Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Lianet Noda‐García
- Department of Plant Pathology and Microbiology, Institute of Environmental Sciences, Robert H. Smith Faculty of Agriculture, Food and Environment, Hebrew University of Jerusalem, Rehovot, Israel
- Paola Laurino
- Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
3
Daniluk P, Oleniecki T, Lesyng B. DAMA: a method for computing multiple alignments of protein structures using local structure descriptors. Bioinformatics 2021; 38:80-85. [PMID: 34396393 PMCID: PMC8696102 DOI: 10.1093/bioinformatics/btab571] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/24/2020] [Revised: 05/31/2021] [Accepted: 08/12/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION The well-known fact that protein structures are more conserved than their sequences forms the basis of several areas of computational structural biology. Methods based on structure analysis provide more complete information on residue conservation in evolutionary processes. This is crucial for determining evolutionary relationships between proteins and for identifying recurrent structural patterns in biomolecules involved in similar functions. However, algorithmic structural alignment is much more difficult than multiple sequence alignment. This study is devoted to the development and applications of DAMA, a novel and effective environment capable of computing and analyzing multiple structure alignments. RESULTS DAMA is based on local structural similarities: it uses local 3D structure descriptors and thus accounts for the nearest-neighbor molecular environments of aligned residues. It is constrained neither by protein topology nor by global structure. DAMA extends our previous study (DEDAL), which demonstrated the applicability of local descriptors to pairwise alignment problems. Since the multiple alignment problem is NP-complete, an effective heuristic approach has been developed without imposing any artificial constraints. The alignment algorithm searches for the largest consistent ensemble of similar descriptors. The new method is capable of capturing most of the biologically significant similarities present in canonical test sets and is discriminatory enough to prevent the emergence of larger but meaningless solutions. Tests performed on the test sets, including protein kinases, demonstrate DAMA's capability of identifying equivalent residues, which should be very useful in discovering the biological nature of protein similarity. Performance profiles show the advantage of DAMA over other methods, in particular when using the strict similarity measure QC, the ratio of correctly aligned columns, and when applying the methods to more difficult cases. AVAILABILITY AND IMPLEMENTATION DAMA is available online at http://dworkowa.imdik.pan.pl/EP/DAMA. Linux binaries of the software are available upon request. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
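The QC measure above is described only as the ratio of correctly aligned columns. The Python sketch below implements one common reading of that definition: a reference column counts as correct only if the test alignment holds exactly the same set of residues in a single column. The gapped-string input format, the `-` gap symbol, the rule that columns with fewer than two residues are ignored, and the function names are all assumptions made for illustration; this is not DAMA's own scoring code.

```python
from typing import FrozenSet, List, Set, Tuple

# A column is the set of (sequence index, ungapped residue index) it contains.
Column = FrozenSet[Tuple[int, int]]


def alignment_columns(rows: List[str], gap: str = "-") -> Set[Column]:
    """Convert a gapped alignment (one string per sequence) into residue columns.

    Gap characters are skipped; columns holding fewer than two residues are
    dropped because they assert no equivalence between sequences (assumption).
    """
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    counters = [0] * len(rows)  # next ungapped residue index in each sequence
    columns: Set[Column] = set()
    for col in range(len(rows[0])):
        members = []
        for s, row in enumerate(rows):
            if row[col] != gap:
                members.append((s, counters[s]))
                counters[s] += 1
        if len(members) >= 2:
            columns.add(frozenset(members))
    return columns


def column_score(test: List[str], reference: List[str]) -> float:
    """Fraction of reference columns reproduced exactly in the test alignment."""
    ref_cols = alignment_columns(reference)
    if not ref_cols:
        return 0.0
    return len(ref_cols & alignment_columns(test)) / len(ref_cols)
```

For example, `column_score(["AC-D", "ACD-"], ["ACD", "ACD"])` gives 2/3: the first two reference columns are reproduced, but the test alignment never places the two final residues together. Requiring exact column identity is what makes such a measure strict; a single misplaced residue invalidates the whole column.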
Affiliation(s)
- Paweł Daniluk
- Bioinformatics Laboratory, Mossakowski Medical Research Centre, Polish Academy of Sciences, 02-106 Warsaw, Poland
- Tymoteusz Oleniecki
- College of Inter-Faculty Individual Studies in Mathematics and Natural Sciences, University of Warsaw, 02-089 Warsaw, Poland
4
Talibart H, Coste F. PPalign: optimal alignment of Potts models representing proteins with direct coupling information. BMC Bioinformatics 2021; 22:317. [PMID: 34112081 PMCID: PMC8191105 DOI: 10.1186/s12859-021-04222-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/01/2020] [Accepted: 05/25/2021] [Indexed: 11/29/2022]
Abstract
Background To assign structural and functional annotations to the ever-increasing number of sequenced proteins, the main approach relies on sequence-based homology search methods, e.g. BLAST or the current state-of-the-art methods based on profile Hidden Markov Models, which rely on significant alignments of query sequences to annotated proteins or protein families. While powerful, these approaches do not take coevolution between residues into account. Taking advantage of recent advances in the field of contact prediction, we propose here to represent proteins by Potts models, which model direct couplings between positions in addition to positional composition, and to compare proteins by aligning these models. Due to non-local dependencies, the problem of aligning Potts models is hard and remains the main computational bottleneck for their use. Methods We introduce here an Integer Linear Programming formulation of the problem and PPalign, a program based on this formulation, to compute the optimal pairwise alignment of Potts models representing proteins in tractable time. The approach is assessed with respect to a non-redundant set of reference pairwise sequence alignments from the SISYPHUS benchmark that have the lowest sequence identity (between 3% and 20%) and that enable building reliable Potts models for each sequence to be aligned. This experimentation confirms that Potts models can be aligned in reasonable time (1′37″ on average on these alignments). The contribution of couplings is evaluated in comparison with HHalign and independent-site PPalign. Although Potts models were not fully optimized for alignment purposes and simple gap scores were used, PPalign yields a better mean F1 score and finds significantly better alignments than HHalign and PPalign without couplings in some cases. Conclusions These results show that pairwise couplings from protein Potts models can be used to improve the alignment of remotely related protein sequences in tractable time. Our experimentation nevertheless suggests that new research on the inference of Potts models is now needed to make them more comparable and suitable for homology search. We think that PPalign’s guaranteed optimality will be a powerful asset for unbiased investigations in this direction.
5
Le HT, Do PC, Le L. Grafting Methionine on 1F1 Ab Increases the Broad-Activity on HA Structural-Conserved Residues of H1, H2, and H3 Influenza A Viruses. Evol Bioinform Online 2021; 17:11769343211003082. [PMID: 33795930 PMCID: PMC7975486 DOI: 10.1177/11769343211003082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2020] [Accepted: 02/24/2021] [Indexed: 11/27/2022]
Abstract
A high level of mutation enables the influenza A virus to resist drugs that were previously effective against it. A portion of the structure of hemagglutinin (HA) is assumed to be well conserved to maintain its role in cellular fusion, and structure tends to be more conserved than sequence. We designed peptide inhibitors to target conserved residues on the HA surface, which were identified by structural alignment. Most of the conserved and strongly similar residues are located in the receptor-binding and esterase regions of the HA1 domain. In a later step, fragments of anti-HA antibodies were gathered and screened for their ability to bind the identified conserved residues. As a result, methionine obtained the best docking score within the −2.8 Å van der Waals radius when interacting with tyrosine, arginine, and glutamic acid. Then, the binding affinity and spectrum of the fragments were enhanced by grafting hotspot amino acids into the fragments to form peptide inhibitors. Our peptide inhibitor was able to form in silico contacts with a structurally conserved region across H1, H2, and H3 HA, with the binding site at the boundary between the HA1 and HA2 domains, spreading across different monomers, suggesting a new target for designing broad-spectrum antibodies and vaccines. This research presents an affordable method to design broad-spectrum peptide inhibitors using antibody fragments as a scaffold.
Affiliation(s)
- Hoa Thanh Le
- School of Biotechnology, International University, Ho Chi Minh City, Vietnam
- Vietnam National University, Ho Chi Minh City, Vietnam
- Phuc-Chau Do
- School of Biotechnology, International University, Ho Chi Minh City, Vietnam
- Vietnam National University, Ho Chi Minh City, Vietnam
- Ly Le
- School of Biotechnology, International University, Ho Chi Minh City, Vietnam
- Vietnam National University, Ho Chi Minh City, Vietnam
- Vingroup Big Data Institute, Hanoi, Vietnam
6
Searching protein space for ancient sub-domain segments. Curr Opin Struct Biol 2021; 68:105-112. [PMID: 33476896 DOI: 10.1016/j.sbi.2020.11.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/11/2020] [Accepted: 11/29/2020] [Indexed: 01/08/2023]
Abstract
Evolutionary processes that formed the current protein universe left their traces, among them homologous segments that recur, or are 'reused,' in multiple proteins. These reused segments, called 'themes,' can be found at various scales, the best known of which is the domain. Yet, recent studies have begun to focus on the evolutionary insights that can be derived from sub-domain-scale themes, which are candidates for traces of more ancient events. Characterizing these may provide clues to the emergence of domains. Particularly interesting are themes that are reused across dissimilar contexts, that is, where the rest of the protein domain differs. We survey computational studies identifying reused themes within different contexts at the sub-domain level.
7
Carpentier M, Chomilier J. Protein multiple alignments: sequence-based versus structure-based programs. Bioinformatics 2020; 35:3970-3980. [PMID: 30942864 DOI: 10.1093/bioinformatics/btz236] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/04/2018] [Revised: 03/05/2019] [Accepted: 04/02/2019] [Indexed: 11/14/2022]
Abstract
MOTIVATION Multiple sequence alignment programs have proved very useful and have already been evaluated in the literature, but alignment programs based on structure, or on both sequence and structure, have not. In the present article we evaluate the added value provided by considering structures. RESULTS We compared the multiple alignments produced by 25 programs based on sequence, structure or both with reference alignments deposited in five databases (BALIBASE 2 and 3, HOMSTRAD, OXBENCH and SISYPHUS). On the whole, the structure-based methods compute more reliable alignments than the sequence-based ones, and even than the sequence+structure-based programs, whatever the database. Two programs lead, MAMMOTH and MATRAS; nevertheless, MUSTANG, MATT, 3DCOMB, TCOFFEE+TM_ALIGN and TCOFFEE+SAP perform better for some alignments. The advantage of structure-based methods increases at low levels of sequence identity, and for residues in regular secondary structures or buried residues. Concerning gap management, sequence-based programs insert fewer gaps than structure-based programs. Concerning the databases, the alignments of the manually built databases are more challenging for the programs. AVAILABILITY AND IMPLEMENTATION All data and results presented in this study are available at: http://wwwabi.snv.jussieu.fr/people/mathilde/download/AliMulComp/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mathilde Carpentier
- Institut Systématique Evolution Biodiversité (ISYEB), Sorbonne Université, MNHN, CNRS, EPHE, Paris, France
- Jacques Chomilier
- Sorbonne Université, MNHN, CNRS, IRD, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), BiBiP, Paris, France
8
Rozewicki J, Li S, Amada KM, Standley DM, Katoh K. MAFFT-DASH: integrated protein sequence and structural alignment. Nucleic Acids Res 2019; 47:W5-W10. [PMID: 31062021 PMCID: PMC6602451 DOI: 10.1093/nar/gkz342] [Citation(s) in RCA: 242] [Impact Index Per Article: 48.4] [Received: 02/12/2019] [Revised: 04/07/2019] [Accepted: 04/25/2019] [Indexed: 12/22/2022]
Abstract
Here, we describe a web server that integrates structural alignments with the MAFFT multiple sequence alignment (MSA) tool. For this purpose, we have prepared a web-based Database of Aligned Structural Homologs (DASH), which provides structural alignments at the domain and chain levels for all proteins in the Protein Data Bank (PDB), and can be queried interactively or by a simple REST-like API. MAFFT-DASH integration can be invoked with a single flag on either the web (https://mafft.cbrc.jp/alignment/server/) or command-line versions of MAFFT. In our benchmarks using 878 cases from the BAliBase, HomFam, OXFam, Mattbench and SISYPHUS datasets, MAFFT-DASH showed 10-20% improvement over standard MAFFT for MSA problems with weak similarity, in terms of Sum-of-Pairs (SP), a measure of how well a program succeeds at aligning input sequences in comparison to a reference alignment. When MAFFT alignments were supplemented with homologous sequences, further improvement was observed. Potential applications of DASH beyond MSA enrichment include functional annotation through detection of remote homology and assembly of template libraries for homology modeling.
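The Sum-of-Pairs (SP) measure mentioned above is commonly computed as the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment. The Python sketch below assumes that definition, gapped strings as input and `-` as the gap symbol; the function names are hypothetical and this is not the benchmarking code used for MAFFT-DASH.

```python
from itertools import combinations
from typing import List, Set, Tuple

# (sequence a, residue index in a, sequence b, residue index in b), with a < b
Pair = Tuple[int, int, int, int]


def aligned_pairs(rows: List[str], gap: str = "-") -> Set[Pair]:
    """Collect every pair of residues placed in the same column of a gapped alignment."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    counters = [0] * len(rows)  # next ungapped residue index in each sequence
    pairs: Set[Pair] = set()
    for col in range(len(rows[0])):
        present = []
        for s, row in enumerate(rows):
            if row[col] != gap:
                present.append((s, counters[s]))
                counters[s] += 1
        # `present` is ordered by sequence index, so a < b in every pair
        for (a, i), (b, j) in combinations(present, 2):
            pairs.add((a, i, b, j))
    return pairs


def sum_of_pairs_score(test: List[str], reference: List[str]) -> float:
    """Fraction of reference residue pairs that the test alignment also aligns."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(test)) / len(ref)
```

With three sequences, `sum_of_pairs_score(["ACD-", "ACD-", "AC-D"], ["ACD", "ACD", "ACD"])` gives 7/9: the test alignment misplaces only the last residue of the third sequence, so it still recovers 7 of the 9 reference pairs. This partial credit is what distinguishes a pair-based score from a stricter all-or-nothing column score.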
Affiliation(s)
- John Rozewicki
- Department of Genome Informatics, Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita 565-0871, Japan
- Systems Immunology Laboratory, Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita 565-0871, Japan
- Songling Li
- Department of Genome Informatics, Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita 565-0871, Japan
- Systems Immunology Laboratory, Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita 565-0871, Japan
- Karlou Mar Amada
- Systems Immunology Laboratory, Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita 565-0871, Japan
- Daron M Standley
- Department of Genome Informatics, Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita 565-0871, Japan
- Systems Immunology Laboratory, Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita 565-0871, Japan
- Kazutaka Katoh
- Department of Genome Informatics, Genome Information Research Center, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita 565-0871, Japan
- Systems Immunology Laboratory, Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita 565-0871, Japan
9
Nute M, Saleh E, Warnow T. Evaluating Statistical Multiple Sequence Alignment in Comparison to Other Alignment Methods on Protein Data Sets. Syst Biol 2019; 68:396-411. [PMID: 30329135 PMCID: PMC6472439 DOI: 10.1093/sysbio/syy068] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/18/2018] [Revised: 09/27/2018] [Accepted: 10/11/2018] [Indexed: 01/15/2023]
Abstract
The estimation of multiple sequence alignments of protein sequences is a basic step in many bioinformatics pipelines, including protein structure prediction, protein family identification, and phylogeny estimation. Statistical coestimation of alignments and trees under stochastic models of sequence evolution has long been considered the most rigorous technique for estimating alignments and trees, but little is known about the accuracy of such methods on biological benchmarks. We report the results of an extensive study evaluating the most popular protein alignment methods as well as the statistical coestimation method BAli-Phy on 1192 protein data sets from established benchmarks as well as on 120 simulated data sets. Our study (which used more than 230 CPU years for the BAli-Phy analyses alone) shows that BAli-Phy has better precision and recall (with respect to the true alignments) than the other alignment methods on the simulated data sets but has consistently lower recall on the biological benchmarks (with respect to the reference alignments) than many of the other methods. In other words, we find that BAli-Phy systematically underaligns when operating on biological sequence data but shows no sign of this on simulated data. There are several potential causes for this change in performance, including model misspecification, errors in the reference alignments, and conflicts between structural alignment and evolutionary alignments, and future research is needed to determine the most likely explanation. We conclude with a discussion of the potential ramifications for each of these possibilities. [BAli-Phy; homology; multiple sequence alignment; protein sequences; structural alignment.]
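The underalignment finding above can be made concrete with precision and recall over aligned residue pairs: precision is the fraction of predicted pairs that also appear in the reference, and recall is the fraction of reference pairs that are predicted. The Python sketch below assumes those definitions for a single pairwise alignment given as two gapped rows with `-` as the gap symbol; the helper names are hypothetical and this is not the evaluation code used in the study.

```python
from typing import Set, Tuple


def pairs_from_rows(row_a: str, row_b: str, gap: str = "-") -> Set[Tuple[int, int]]:
    """Residue pairs (i, j) that a pairwise alignment places in the same column."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    i = j = 0  # ungapped residue indices in each sequence
    pairs: Set[Tuple[int, int]] = set()
    for x, y in zip(row_a, row_b):
        if x != gap and y != gap:
            pairs.add((i, j))
        if x != gap:
            i += 1
        if y != gap:
            j += 1
    return pairs


def precision_recall(
    predicted: Set[Tuple[int, int]], reference: Set[Tuple[int, int]]
) -> Tuple[float, float]:
    """Precision = shared / predicted pairs; recall = shared / reference pairs."""
    shared = len(predicted & reference)
    precision = shared / len(predicted) if predicted else 0.0
    recall = shared / len(reference) if reference else 0.0
    return precision, recall
```

An alignment that leaves homologous residues in gapped columns, such as `pairs_from_rows("AC--DE", "ACDE--")` scored against `pairs_from_rows("ACDE", "ACDE")`, yields precision 1.0 but recall 0.5: every pair it asserts is correct, yet half of the reference pairs are missing. This is the signature of underalignment: lower recall without a corresponding loss of precision.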
Affiliation(s)
- Michael Nute
- Department of Statistics, University of Illinois at Urbana-Champaign, 725 S Wright St #101, Champaign, IL 61820, USA
- Ehsan Saleh
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Ave, Urbana, IL 61801, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, 201 N. Goodwin Ave, Urbana, IL 61801, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, 1205 W. Clark St., Urbana, IL 61801, USA
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
10
RUPEE: A fast and accurate purely geometric protein structure search. PLoS One 2019; 14:e0213712. [PMID: 30875409 PMCID: PMC6420038 DOI: 10.1371/journal.pone.0213712] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 11/26/2018] [Accepted: 02/27/2019] [Indexed: 11/19/2022]
Abstract
Given the close relationship between protein structure and function, protein structure searches have long played an established role in bioinformatics. Despite their maturity, existing protein structure searches either use simplifying assumptions or compromise between fast response times and quality of results. These limitations can prevent the easy and efficient exploration of relationships between protein structures, which is the norm in other areas of inquiry. To address these limitations we have developed RUPEE, a fast and accurate purely geometric structure search combining techniques from information retrieval and big data with a novel approach to encoding sequences of torsion angles. Comparing our results to the output of mTM, SSM, and the CATHEDRAL structural scan, it is clear that RUPEE has set a new bar for purely geometric big data approaches to protein structure searches. RUPEE in top-aligned mode produces equal or better results than the best available protein structure searches, and RUPEE in fast mode demonstrates the fastest response times coupled with high quality results. The RUPEE protein structure search is available at https://ayoubresearch.com. Code and data are available at https://github.com/rayoub/rupee.
11
Alva V, Söding J, Lupas AN. A vocabulary of ancient peptides at the origin of folded proteins. eLife 2015; 4:e09410. [PMID: 26653858 PMCID: PMC4739770 DOI: 10.7554/elife.09410] [Citation(s) in RCA: 150] [Impact Index Per Article: 16.7] [Received: 06/13/2015] [Accepted: 12/13/2015] [Indexed: 01/01/2023]
Abstract
The seemingly limitless diversity of proteins in nature arose from only a few thousand domain prototypes, but the origin of these themselves has remained unclear. We are pursuing the hypothesis that they arose by fusion and accretion from an ancestral set of peptides active as co-factors in RNA-dependent replication and catalysis. Should this be true, contemporary domains may still contain vestiges of such peptides, which could be reconstructed by a comparative approach in the same way in which ancient vocabularies have been reconstructed by the comparative study of modern languages. To test this, we compared domains representative of known folds and identified 40 fragments whose similarity is indicative of common descent, yet which occur in domains currently not thought to be homologous. These fragments are widespread in the most ancient folds and enriched for iron-sulfur- and nucleic acid-binding. We propose that they represent the observable remnants of a primordial RNA-peptide world.
Affiliation(s)
- Vikram Alva
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Johannes Söding
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Andrei N Lupas
- Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
12
Terashi G, Takeda-Shitaka M. CAB-Align: A Flexible Protein Structure Alignment Method Based on the Residue-Residue Contact Area. PLoS One 2015; 10:e0141440. [PMID: 26502070 PMCID: PMC4621035 DOI: 10.1371/journal.pone.0141440] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/19/2015] [Accepted: 10/08/2015] [Indexed: 12/26/2022]
Abstract
Proteins are flexible, and this flexibility has an essential functional role. Flexibility can be observed in loop regions, rearrangements between secondary structure elements, and conformational changes between entire domains. However, most protein structure alignment methods treat protein structures as rigid bodies. Thus, these methods fail to identify the equivalences of residue pairs in regions with flexibility. In this study, we considered that the evolutionary relationship between proteins corresponds directly to the residue–residue physical contacts rather than the three-dimensional (3D) coordinates of proteins. Thus, we developed a new protein structure alignment method, contact area-based alignment (CAB-align), which uses the residue–residue contact area to identify regions of similarity. The main purpose of CAB-align is to identify homologous relationships at the residue level between related protein structures. The CAB-align procedure comprises two main steps: First, a rigid-body alignment method based on local and global 3D structure superposition is employed to generate a sufficient number of initial alignments. Then, iterative dynamic programming is executed to find the optimal alignment. We evaluated the performance and advantages of CAB-align based on four main points: (1) agreement with the gold standard alignment, (2) alignment quality based on an evolutionary relationship without 3D coordinate superposition, (3) consistency of the multiple alignments, and (4) classification agreement with the gold standard classification. 
Comparisons of CAB-align with other state-of-the-art protein structure alignment methods (TM-align, FATCAT, and DaliLite) using our benchmark dataset showed that CAB-align performed robustly in obtaining high-quality alignments and generating consistent multiple alignments with high coverage and accuracy rates, and it performed extremely well when discriminating between homologous and nonhomologous pairs of proteins in both single and multi-domain comparisons. The CAB-align software is freely available to academic users as stand-alone software at http://www.pharm.kitasato-u.ac.jp/bmd/bmd/Publications.html.
Affiliation(s)
- Genki Terashi
- School of Pharmacy, Kitasato University, Tokyo, Japan
13
Goncearenco A, Berezovsky IN. Protein function from its emergence to diversity in contemporary proteins. Phys Biol 2015; 12:045002. [PMID: 26057563 DOI: 10.1088/1478-3975/12/4/045002] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Indexed: 12/18/2022]
Abstract
The goal of this work is to learn from nature the rules that govern the evolution and design of protein function. The fundamental laws of physics underlie protein structure and all stages of protein evolution, determining optimal sizes and shapes at different levels of the structural hierarchy. We looked back to the very onset of protein evolution with the goal of finding elementary functions (EFs) that came from the prebiotic world and served as building blocks of the first enzymes. We defined the basic structural and functional units of biochemical reactions: elementary functional loops. The diversity of contemporary enzymes can be described via combinations of a limited number of elementary chemical reactions, many of which are performed by the descendants of primitive prebiotic peptides/proteins. By analyzing protein sequences we were able to identify EFs shared by seemingly unrelated protein superfamilies and folds and to unravel evolutionary relations between them. Binding and metabolic processing of metal- and nucleotide-containing cofactors and ligands are among the most abundant ancient EFs that became indispensable in many natural enzymes. Highly designable folds provide structural scaffolds for many different biochemical reactions. We show that contemporary proteins are built from a limited number of EFs, making their analysis instrumental for establishing the rules for protein design. Evolutionary studies help us accumulate a library of essential EFs and establish intricate relations between different folds and functional superfamilies. Generalized sequence-structure descriptors of EFs will become useful in the future design and engineering of desired enzymatic functions.
Affiliation(s)
- Alexander Goncearenco
- Computational Biology Unit and Department of Informatics, University of Bergen, N-5008 Bergen, Norway
14
Abstract
To explore protein space from a global perspective, we consider 9,710 SCOP (Structural Classification of Proteins) domains with up to 70% sequence identity and present all similarities among them as networks: In the "domain network," nodes represent domains, and edges connect domains that share "motifs," i.e., significantly sized segments of similar sequence and structure. We explore the dependence of the network on the thresholds that define the evolutionary relatedness of the domains. At excessively strict thresholds the network falls apart completely; for very lax thresholds, there are network paths between virtually all domains. Interestingly, at intermediate thresholds the network constitutes two regions that can be described as "continuous" versus "discrete." The continuous region comprises a large connected component, dominated by domains with alternating alpha and beta elements, and the discrete region includes the rest of the domains in isolated islands, each generally corresponding to a fold. We also construct the "motif network," in which nodes represent recurring motifs, and edges connect motifs that appear in the same domain. This network also features a large and highly connected component of motifs that originate from domains with alternating alpha/beta elements (and some all-alpha domains), and smaller isolated islands. Indeed, the motif network suggests that nature reuses such motifs extensively. The networks suggest evolutionary paths between domains and give hints about protein evolution and the underlying biophysics. They provide natural means of organizing protein space, and could be useful for the development of strategies for protein search and design.
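The threshold dependence described above (isolated islands at strict cutoffs, one giant component at lax ones) can be illustrated with a small union-find sketch. The domain IDs, scores and single similarity cutoff are hypothetical. The study's actual criteria for shared motifs, such as segment size and combined sequence and structure similarity, are not reproduced here.

```python
from collections import defaultdict

def components_at_threshold(nodes, edges, threshold):
    """Connected components of a domain network that keeps only edges whose
    similarity score is >= threshold (a higher threshold is stricter)."""
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in edges:
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two components

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    # Largest component first; ties broken by member names.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))

# Hypothetical toy data: domain IDs and pairwise motif-similarity scores.
NODES = ["d1", "d2", "d3", "d4", "d5"]
EDGES = [("d1", "d2", 0.9), ("d2", "d3", 0.6), ("d4", "d5", 0.4)]

for t in (0.95, 0.5, 0.3):
    print(t, components_at_threshold(NODES, EDGES, t))
```

At 0.95 every domain is its own island. Lowering the cutoff merges domains step by step until only two components remain.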
15
Kopec KO, Lupas AN. β-Propeller blades as ancestral peptides in protein evolution. PLoS One 2013; 8:e77074. [PMID: 24143202 PMCID: PMC3797127 DOI: 10.1371/journal.pone.0077074] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.1] [Received: 06/25/2013] [Accepted: 09/05/2013] [Indexed: 12/04/2022]
Abstract
Proteins of the β-propeller fold are ubiquitous in nature and widely used as structural scaffolds for ligand binding and enzymatic activity. This fold comprises between four and twelve four-stranded β-meanders, the so-called blades, which are arranged circularly around a central funnel-shaped pore. Despite the large size range of β-propellers, their blades frequently show sequence similarity indicative of a common ancestry, and it has been proposed that the majority of β-propellers arose divergently by amplification and diversification of an ancestral blade. Given the structural versatility of β-propellers and the hypothesis that the first folded proteins evolved from a simpler set of peptides, we investigated whether this blade may have given rise to other folds as well. Using sequence comparisons, we identified proteins of four other folds as potential homologs of β-propellers: the luminal domain of inositol-requiring enzyme 1 (IRE1-LD), type II β-prisms, β-pinwheels, and WW domains. Because, with increasing evolutionary distance and decreasing sequence length, the statistical significance of sequence comparisons becomes progressively harder to distinguish from the background of convergent similarities, we complemented our analyses with a new method that evaluates possible homology based on the correlation between sequence and structure similarity. Our results indicate a homologous relationship of IRE1-LD and type II β-prisms with β-propellers, and an analogous one for β-pinwheels and WW domains. Whereas IRE1-LD most likely originated by fold-changing mutations from a fully formed PQQ motif β-propeller, type II β-prisms originated by amplification and differentiation of a single blade, possibly also of the PQQ type. We conclude that both β-propellers and type II β-prisms arose by independent amplification of a blade-sized fragment, which represents a remnant of an ancient peptide world.
Affiliation(s)
- Klaus O. Kopec
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Tübingen, Baden-Württemberg, Germany
- Andrei N. Lupas
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Tübingen, Baden-Württemberg, Germany
16
Topham CM, Rouquier M, Tarrat N, André I. Adaptive Smith-Waterman residue match seeding for protein structural alignment. Proteins 2013; 81:1823-39. [DOI: 10.1002/prot.24327] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/23/2013] [Revised: 04/22/2013] [Accepted: 05/15/2013] [Indexed: 12/30/2022]
Affiliation(s)
- Christopher M. Topham
- Université de Toulouse, INSA, UPS, INP, LISBP; 135 Avenue de Rangueil F-31077 Toulouse France
- CNRS, UMR5504; F-31400 Toulouse France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés; F-31400 Toulouse France
- Mickaël Rouquier
- Université de Toulouse, INSA, UPS, INP, LISBP; 135 Avenue de Rangueil F-31077 Toulouse France
- CNRS, UMR5504; F-31400 Toulouse France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés; F-31400 Toulouse France
- Nathalie Tarrat
- Université de Toulouse, INSA, UPS, INP, LISBP; 135 Avenue de Rangueil F-31077 Toulouse France
- CNRS, UMR5504; F-31400 Toulouse France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés; F-31400 Toulouse France
- Isabelle André
- Université de Toulouse, INSA, UPS, INP, LISBP; 135 Avenue de Rangueil F-31077 Toulouse France
- CNRS, UMR5504; F-31400 Toulouse France
- INRA, UMR792 Ingénierie des Systèmes Biologiques et des Procédés; F-31400 Toulouse France
17
Affiliation(s)
- Rachel Kolodny
- Department of Computer Science, University of Haifa, Haifa 31905, Israel
- Leonid Pereyaslavets
- Department of Structural Biology, Stanford University, Stanford, California 94305
- Michael Levitt
- Department of Structural Biology, Stanford University, Stanford, California 94305
18
Minami S, Sawada K, Chikenji G. MICAN: a protein structure alignment algorithm that can handle Multiple-chains, Inverse alignments, C(α) only models, Alternative alignments, and Non-sequential alignments. BMC Bioinformatics 2013; 14:24. [PMID: 23331634 PMCID: PMC3637537 DOI: 10.1186/1471-2105-14-24] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 09/11/2012] [Accepted: 01/08/2013] [Indexed: 11/10/2022]
Abstract
Background Protein pairs that have the same secondary structure packing arrangement but different topologies have attracted much attention in terms of both the evolution and the physical chemistry of protein structures. Further investigation of such protein relationships would give us a hint as to how proteins can change their fold in the course of evolution, as well as an insight into the physico-chemical properties of secondary structure packing. For this purpose, highly accurate sequence-order-independent structure comparison methods are needed. Results We have developed a novel protein structure alignment algorithm, MICAN (a structure alignment algorithm that can handle Multiple-chain complexes, Inverse direction of secondary structures, Cα only models, Alternative alignments, and Non-sequential alignments). The algorithm was designed to identify the best structural alignment between protein pairs by disregarding the connectivity between secondary structure elements (SSE). One of the key features of the algorithm is the use of a multiple vector representation for each SSE, which enables it to correctly treat the bent or twisted nature of long SSEs. We compared MICAN with 9 other publicly available structure alignment programs, using both reference-dependent and reference-independent evaluation methods on a variety of benchmark test sets that include both sequential and non-sequential alignments. We show that MICAN outperforms the other existing methods in reproducing reference alignments of non-sequential test sets. Further, although MICAN does not specialize in sequential structure alignment, it showed top-level performance on the sequential test sets. We also show that MICAN is the fastest non-sequential structure alignment program among all the programs we examined here. Conclusions MICAN is the fastest and the most accurate program among the non-sequential alignment programs we examined here.
These results suggest that MICAN is a highly effective tool for automatically detecting non-trivial structural relationships of proteins, such as circular permutations and segment-swapping, many of which have so far been identified manually by human experts. The source code of MICAN is freely downloadable at http://www.tbp.cse.nagoya-u.ac.jp/MICAN.
Affiliation(s)
- Shintaro Minami
- Department of Computational Science and Engineering, Nagoya University, Nagoya 464-8603, Japan
19
Wohlers I, Andonov R, Klau GW. DALIX: optimal DALI protein structure alignment. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:26-36. [PMID: 23702541 DOI: 10.1109/tcbb.2012.143] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 06/02/2023]
Abstract
We present a mathematical model and exact algorithm for optimally aligning protein structures using the DALI scoring model. This scoring model is based on comparing the interresidue distance matrices of proteins and is used in the popular DALI software tool, a heuristic method for protein structure alignment. Our model and algorithm extend an integer linear programming approach that has previously been applied to the related, but simpler, contact map overlap problem. To this end, we introduce a novel type of constraint that handles negative score values and relax it in a Lagrangian fashion. The new algorithm, which we call DALIX, is applicable to any distance matrix-based scoring scheme. We also review options that allow fewer pairs of interresidue distances to be considered explicitly, because their large number hinders the optimization process. Using four known data sets of varying structural similarity, we compute many provably score-optimal DALI alignments. This made it possible, for the first time, to evaluate the DALI heuristic in sound mathematical terms. The results indicate that DALI usually computes optimal or close to optimal alignments. However, we detect a subset of small proteins for which DALI fails to generate any significant alignment, although such alignments do exist.
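The distance-matrix scoring discussed above, including its negative terms, can be made concrete with a minimal sketch. It evaluates a DALI-style elastic similarity score for one fixed alignment. The constants THETA = 0.20 and ALPHA = 20 Å reflect our understanding of the original DALI publication and should be treated as assumptions. The sketch does not reproduce DALIX's integer-programming optimisation or DALI's search heuristics.

```python
import math

# Constants of DALI's elastic similarity score as we understand the original
# DALI publication; treat them as assumptions of this sketch.
THETA = 0.20   # tolerated relative deviation between paired distances
ALPHA = 20.0   # width (angstroms) of the envelope that down-weights long distances

def dali_elastic_score(coords_a, coords_b, alignment):
    """Sum of elastic similarity terms over all pairs of aligned residues.

    coords_a, coords_b: lists of (x, y, z) C-alpha coordinates.
    alignment: list of (i, k) pairs meaning residue i of A matches residue k of B.
    """
    score = 0.0
    for i, k in alignment:
        for j, l in alignment:
            if i == j:
                score += THETA  # diagonal term
                continue
            d_a = math.dist(coords_a[i], coords_a[j])
            d_b = math.dist(coords_b[k], coords_b[l])
            d_mean = (d_a + d_b) / 2.0
            if d_mean == 0.0:
                continue  # degenerate input (coincident atoms)
            envelope = math.exp(-(d_mean / ALPHA) ** 2)
            # Negative whenever the relative deviation exceeds THETA.
            score += (THETA - abs(d_a - d_b) / d_mean) * envelope
    return score

A = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
B_bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
pairs = [(0, 0), (1, 1), (2, 2)]
print(dali_elastic_score(A, A, pairs), dali_elastic_score(A, B_bent, pairs))
```

In the bent example, the pair (0, 2) deviates by more than 20% of its mean distance, so it contributes a negative term. This is the kind of negative score value that the DALIX constraint has to handle.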
Affiliation(s)
- Inken Wohlers
- Genominformatik, Universität Duisburg-Essen/Universitätsklinikum, Germany.
20
Arriagada M, Poleksic A. On the difference in quality between current heuristic and optimal solutions to the protein structure alignment problem. Biomed Res Int 2012; 2013:459248. [PMID: 23509725 PMCID: PMC3591119 DOI: 10.1155/2013/459248] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 09/10/2012] [Accepted: 11/02/2012] [Indexed: 11/17/2022]
Abstract
The importance of pairwise protein structural comparison in biomedical research is fueling the search for algorithms capable of finding more accurate structural matches between two input proteins in a timely manner. In recent years, we have witnessed rapid advances in the development of methods for approximate and optimal solutions to the protein structure matching problem. Although slow, these methods can be extremely useful in assessing the accuracy of more efficient, heuristic algorithms. We use a recently developed approximation algorithm for protein structure matching to demonstrate that a deep search of the protein superposition space leads to increased alignment accuracy with respect to many well-established measures of alignment quality. The results of our study suggest that a large and important part of the protein superposition space remains unexplored by current techniques for protein structure alignment.
Affiliation(s)
- Mauricio Arriagada
- Department of Computer Science, School of Engineering, Pontificia Universidad Católica de Chile, 4860 Avenue Vicuña Mackenna, 6904411 Santiago, Chile
- Aleksandar Poleksic
- Department of Computer Science, University of Northern Iowa, 1227 West 27th Street, Cedar Falls, IA 50613, USA
21
Shih HH, Tu C, Cao W, Klein A, Ramsey R, Fennell BJ, Lambert M, Ní Shúilleabháin D, Autin B, Kouranova E, Laxmanan S, Braithwaite S, Wu L, Ait-Zahra M, Milici AJ, Dumin JA, LaVallie ER, Arai M, Corcoran C, Paulsen JE, Gill D, Cunningham O, Bard J, Mosyak L, Finlay WJJ. An ultra-specific avian antibody to phosphorylated tau protein reveals a unique mechanism for phosphoepitope recognition. J Biol Chem 2012; 287:44425-34. [PMID: 23148212 DOI: 10.1074/jbc.m112.415935] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Indexed: 11/06/2022]
Abstract
Highly specific antibodies to phosphoepitopes are valuable tools to study phosphorylation in disease states, but their discovery is largely empirical, and the molecular mechanisms mediating phosphospecific binding are poorly understood. Here, we report the generation and characterization of extremely specific recombinant chicken antibodies to three phosphoepitopes on the Alzheimer disease-associated protein tau. Each antibody shows full specificity for a single phosphopeptide. The chimeric IgG pT231/pS235_1 exhibits a K(D) of 0.35 nM in 1:1 binding to its cognate phosphopeptide. This IgG is murine ortholog-cross-reactive, specifically recognizing the pathological form of tau in brain samples from Alzheimer patients and a mouse model of tauopathy. To better understand the underlying binding mechanisms allowing such remarkable specificity, we determined the structure of pT231/pS235_1 Fab in complex with its cognate phosphopeptide at 1.9 Å resolution. The Fab fragment exhibits novel complementarity determining region (CDR) structures with a "bowl-like" conformation in CDR-H2 that tightly and specifically interacts with the phospho-Thr-231 phosphate group, as well as a long, disulfide-constrained CDR-H3 that mediates peptide recognition. This binding mechanism differs distinctly from either peptide- or hapten-specific antibodies described to date. Surface plasmon resonance analyses showed that pT231/pS235_1 binds a truly compound epitope, as neither phosphorylated Ser-235 nor free peptide shows any measurable binding affinity.
Affiliation(s)
- Heather H Shih
- Global Biotherapeutics Technologies, Pfizer Global Research & Development, Cambridge, Massachusetts 02140, USA
22
Wohlers I, Malod-Dognin N, Andonov R, Klau GW. CSA: comprehensive comparison of pairwise protein structure alignments. Nucleic Acids Res 2012; 40:W303-9. [PMID: 22553365 PMCID: PMC3394275 DOI: 10.1093/nar/gks362] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 01/30/2012] [Revised: 03/29/2012] [Accepted: 04/10/2012] [Indexed: 11/23/2022]
Abstract
CSA is a web server for the computation, evaluation and comprehensive comparison of pairwise protein structure alignments. Its exact alignment engine computes either optimal, top-scoring alignments or heuristic alignments with quality guarantee for the inter-residue distance-based scorings of contact map overlap, PAUL, DALI and MATRAS. These and additional, uploaded alignments are compared using a number of quality measures and intuitive visualizations. CSA brings new insight into the structural relationship of the protein pairs under investigation and is a valuable tool for studying structural similarities. It is available at http://csa.project.cwi.nl.
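Contact map overlap (CMO), one of the inter-residue distance-based scorings named above, counts contacts that an alignment preserves. A minimal sketch of evaluating CMO for one given alignment follows. The 8 Å C-alpha cutoff and the sequence-separation filter are illustrative assumptions, not CSA's settings. Finding the optimal alignment, which CSA's exact engine does, is a much harder combinatorial problem and is not shown.

```python
import math
from itertools import combinations

def contact_map(coords, cutoff=8.0, min_seq_sep=2):
    """Set of residue pairs (i, j), i < j, whose distance is below `cutoff`
    angstroms and which are at least `min_seq_sep` apart in sequence."""
    contacts = set()
    for i, j in combinations(range(len(coords)), 2):
        if j - i >= min_seq_sep and math.dist(coords[i], coords[j]) < cutoff:
            contacts.add((i, j))
    return contacts

def cmo_score(contacts_a, contacts_b, alignment):
    """Number of contacts of A whose aligned partners are also in contact in B."""
    mapping = dict(alignment)  # residue index in A -> residue index in B
    shared = 0
    for i, j in contacts_a:
        if i in mapping and j in mapping:
            k, l = mapping[i], mapping[j]
            if (min(k, l), max(k, l)) in contacts_b:
                shared += 1
    return shared

# Toy chain: four C-alpha atoms spaced 3.8 angstroms apart on a line.
line = [(3.8 * n, 0.0, 0.0) for n in range(4)]
cm = contact_map(line)
print(sorted(cm))
print(cmo_score(cm, cm, [(0, 0), (1, 1), (2, 2), (3, 3)]))
```

Comparing a structure with itself under the identity alignment recovers every contact. Swapping two aligned residues breaks the correspondence and lowers the count.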
Affiliation(s)
- Inken Wohlers
- Life Sciences Group, Centrum Wiskunde & Informatica, Science Park 123, 1098 XG Amsterdam, The Netherlands.
23
Affiliation(s)
- Spencer Bliven
- Bioinformatics Program, University of California, San Diego, La Jolla, California, United States of America
- Andreas Prlić
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California, United States of America
24
Kovalchuk N, Smith J, Bazanova N, Pyvovarenko T, Singh R, Shirley N, Ismagul A, Johnson A, Milligan AS, Hrmova M, Langridge P, Lopato S. Characterization of the wheat gene encoding a grain-specific lipid transfer protein TdPR61, and promoter activity in wheat, barley and rice. J Exp Bot 2012; 63:2025-40. [PMID: 22213809 DOI: 10.1093/jxb/err409] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/08/2023]
Abstract
The TaPR61 gene from bread wheat encodes a lipid transfer protein (LTP) with a hydrophobic signal peptide, predicted to direct the TaPR61 protein to the apoplast. Modelling of TaPR61 revealed the presence of an internal cavity which can accommodate at least two lipid molecules. The full-length gene, including the promoter sequence of a TaPR61 orthologue, was cloned from a BAC library of Triticum durum. Quantitative RT-PCR analysis revealed the presence of TaPR61 and TdPR61 mainly in grain. A transcriptional TdPR61 promoter-GUS fusion was stably transformed into wheat, barley, and rice. The strongest GUS expression in all three plants was found in the endosperm transfer cells, the embryo surrounding region (ESR), and in the embryo. The promoter is strong and has similar but not identical spatial patterns of activity in wheat, barley, and rice. These results suggest that the TdPR61 promoter will be a useful tool for improving grain quality by manipulating the quality and quantity of nutrient/lipid uptake to the endosperm and embryo. Mapping of regions important for the promoter function using transient expression assays in developing embryos resulted in the identification of two segments important for promoter activation in embryos. The putative cis-elements from the distal segment were used as bait in a yeast 1-hybrid (Y1H) screen of a cDNA library prepared from the liquid part of the wheat multinucleate syncytium. A transcription factor isolated in the screen is similar to BES1/BLZ1 from Arabidopsis, which is known to be a key transcriptional regulator of the brassinosteroid signalling pathway.
Affiliation(s)
- Nataliya Kovalchuk
- Australian Centre for Plant Functional Genomics, University of Adelaide, South Australia, Australia
25
Abstract
The wealth of available protein structural data provides an unprecedented opportunity to study and better understand the underlying principles of protein folding and protein structure evolution. A key to achieving this lies in the ability to analyse these data and to organize them in a coherent classification scheme. Over the past years several protein classifications have been developed that aim to group proteins based on their structural relationships. Some of these classification schemes explore the concept of structural neighbourhood (structural continuum), whereas others utilize the notion of protein evolution and thus provide a discrete rather than continuum view of protein structure space. This chapter presents a strategy for classification of proteins with known three-dimensional structure. Steps in the classification process along with basic definitions are introduced. Examples illustrating some fundamental concepts of protein folding and evolution with a special focus on the exceptions to them are presented.
26
Sun H, Sacan A, Ferhatosmanoglu H, Wang Y. Smolign: a spatial motifs-based protein multiple structural alignment method. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:249-261. [PMID: 21464513 DOI: 10.1109/tcbb.2011.67] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 05/30/2023]
Abstract
Availability of an effective tool for protein multiple structural alignment (MSTA) is essential for the discovery and analysis of biologically significant structural motifs that can help solve functional annotation and drug design problems. Existing MSTA methods collect residue correspondences mostly through pairwise comparison of consecutive fragments, which can lead to suboptimal alignments, especially when the similarity among the proteins is low. We introduce a novel strategy based on building a contact-window-based motif library from the protein structural data, discovering and extending common alignment seeds from this library, and optimally superimposing multiple structures according to these alignment seeds with an enhanced partial order curve comparison method. The ability of our strategy to detect multiple correspondences simultaneously, to catch alignments globally, and to support flexible alignments makes it a sensitive and robust automated algorithm that can expose similarities among protein structures even under low similarity conditions. Our method yields better alignment results than other popular MSTA methods on several protein structure data sets that span various structural folds and represent different protein similarity levels. A web-based alignment tool, a downloadable executable, and detailed alignment results for the data sets used here are available at http://sacan.biomed.drexel.edu/Smolign and http://bio.cse.ohio-state.edu/Smolign.
Affiliation(s)
- Hong Sun
- The Ohio State University, Columbus
27
Use of comparative genomics approaches to characterize interspecies differences in response to environmental chemicals: challenges, opportunities, and research needs. Toxicol Appl Pharmacol 2011; 271:372-85. [PMID: 22142766 DOI: 10.1016/j.taap.2011.11.011] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 06/08/2011] [Revised: 11/11/2011] [Accepted: 11/16/2011] [Indexed: 01/12/2023]
Abstract
A critical challenge for environmental chemical risk assessment is the characterization and reduction of uncertainties introduced when extrapolating inferences from one species to another. The purpose of this article is to explore the challenges, opportunities, and research needs surrounding the issue of how genomics data and computational and systems level approaches can be applied to inform differences in response to environmental chemical exposure across species. We propose that the data, tools, and evolutionary framework of comparative genomics be adapted to inform interspecies differences in chemical mechanisms of action. We compare and contrast existing approaches, from disciplines as varied as evolutionary biology, systems biology, mathematics, and computer science, that can be used, modified, and combined in new ways to discover and characterize interspecies differences in chemical mechanism of action which, in turn, can be explored for application to risk assessment. We consider how genetic, protein, pathway, and network information can be interrogated from an evolutionary biology perspective to effectively characterize variations in biological processes of toxicological relevance among organisms. We conclude that comparative genomics approaches show promise for characterizing interspecies differences in mechanisms of action, and further, for improving our understanding of the uncertainties inherent in extrapolating inferences across species in both ecological and human health risk assessment. To achieve long-term relevance and consistent use in environmental chemical risk assessment, improved bioinformatics tools, computational methods robust to data gaps, and quantitative approaches for conducting extrapolations across species are critically needed. Specific areas ripe for research to address these needs are recommended.
28
Salem S, Zaki MJ, Bystroff C. Iterative non-sequential protein structural alignment. J Bioinform Comput Biol 2011; 7:571-96. [PMID: 19507290 DOI: 10.1142/s0219720009004205] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/01/2008] [Revised: 11/05/2008] [Accepted: 11/06/2008] [Indexed: 11/18/2022]
Abstract
Structural similarity between proteins gives us insights into their evolutionary relationships when there is low sequence similarity. In this paper, we present a novel approach called SNAP for non-sequential pairwise structural alignment. Starting from an initial alignment, our approach iterates over a two-step process consisting of a superposition step and an alignment step, until convergence. We propose a novel greedy algorithm to construct both sequential and non-sequential alignments. The quality of SNAP alignments was assessed by comparing them against the manually curated reference alignments in the challenging SISY and RIPC datasets. Moreover, when applied to a dataset of 4410 protein pairs selected from the CATH database, SNAP produced longer alignments with lower rmsd than several state-of-the-art alignment methods. Classification of folds using SNAP alignments was both highly sensitive and highly selective. The SNAP software along with the datasets is available online at
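The abstract names a greedy alignment step but does not describe it. The following is a generic greedy one-to-one matching that illustrates how such a step can produce non-sequential alignments. It is not SNAP's algorithm. It assumes the two structures have already been superposed in the preceding superposition step, and the 5 Å cutoff is a hypothetical parameter.

```python
import math

def greedy_alignment(coords_a, coords_b, cutoff=5.0):
    """Greedy one-to-one residue matching on already-superposed coordinates.

    Candidate pairs within `cutoff` angstroms are accepted closest-first,
    skipping residues that are already matched. Sequence order is ignored,
    so the resulting alignment may be non-sequential.
    """
    candidates = []
    for i, p in enumerate(coords_a):
        for k, q in enumerate(coords_b):
            d = math.dist(p, q)
            if d <= cutoff:
                candidates.append((d, i, k))
    candidates.sort()  # closest pairs first
    used_a, used_b, pairs = set(), set(), []
    for _, i, k in candidates:
        if i not in used_a and k not in used_b:
            pairs.append((i, k))
            used_a.add(i)
            used_b.add(k)
    return sorted(pairs)

# Structure B visits the same positions as A in a permuted residue order.
A = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
B = [(8.2, 0.0, 0.0), (0.3, 0.0, 0.0), (4.1, 0.0, 0.0)]
print(greedy_alignment(A, B))
```

In an iterative scheme like the one described, the matched pairs would feed the next superposition, and the two steps would repeat until the alignment stops changing.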
Affiliation(s)
- Saeed Salem
- Department of Computer Science, Rensselaer Polytechnic Institute, 110 8th st. Troy, New York 12180, USA
- Mohammed J. Zaki
- Department of Computer Science, Rensselaer Polytechnic Institute, 110 8th st. Troy, New York 12180, USA
- Christopher Bystroff
- Department of Computer Science, Rensselaer Polytechnic Institute, 110 8th st. Troy, New York 12180, USA
- Department of Biology, Rensselaer Polytechnic Institute, 110 8th st. Troy, New York 12180, USA
29
Daniluk P, Lesyng B. A novel method to compare protein structures using local descriptors. BMC Bioinformatics 2011; 12:344. [PMID: 21849047 PMCID: PMC3179968 DOI: 10.1186/1471-2105-12-344] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 04/02/2011] [Accepted: 08/17/2011] [Indexed: 11/15/2022]
Abstract
Background Protein structure comparison is one of the most widely performed tasks in bioinformatics. However, currently used methods have problems with the so-called "difficult similarities", including considerable shifts and distortions of structure, sequential swaps and circular permutations. There is a demand for efficient and automated systems capable of overcoming these difficulties, which may lead to the discovery of previously unknown structural relationships. Results We present a novel method for protein structure comparison based on the formalism of local descriptors of protein structure - DEscriptor Defined Alignment (DEDAL). Local similarities identified by pairs of similar descriptors are extended into global structural alignments. We demonstrate the method's capability by aligning structures in difficult benchmark sets: curated alignments in the SISYPHUS database, as well as SISY and RIPC sets, including non-sequential and non-rigid-body alignments. On the most difficult RIPC set of sequence alignment pairs the method achieves an accuracy of 77% (the second best method tested achieves 60% accuracy). Conclusions DEDAL is fast enough to be used in whole proteome applications, and by lowering the threshold of detectable structure similarity it may shed additional light on molecular evolution processes. It is well suited to improving automatic classification of structure domains, helping analyze protein fold space, or to improving protein classification schemes. DEDAL is available online at http://bioexploratorium.pl/EP/DEDAL.
Affiliation(s)
- Paweł Daniluk
- Faculty of Physics, Department of Biophysics and CoE BioExploratorium, University of Warsaw, Żwirki i Wigury 93, Warsaw, Poland
30
Nguyen MN, Tan KP, Madhusudhan MS. CLICK--topology-independent comparison of biomolecular 3D structures. Nucleic Acids Res 2011; 39:W24-8. [PMID: 21602266 PMCID: PMC3125785 DOI: 10.1093/nar/gkr393] [Citation(s) in RCA: 100] [Impact Index Per Article: 7.7] [Received: 02/26/2011] [Revised: 04/19/2011] [Accepted: 05/03/2011] [Indexed: 01/28/2023]
Abstract
Our server, CLICK: http://mspc.bii.a-star.edu.sg/click, is capable of superimposing the 3D structures of any pair of biomolecules (proteins, DNA, RNA, etc.). The server makes use of the Cartesian coordinates of the molecules with the option of using other structural features such as secondary structure, solvent accessible surface area and residue depth to guide the alignment. CLICK first looks for cliques of points (3-7 residues) that are structurally similar in the pair of structures to be aligned. Using these local similarities, a one-to-one equivalence is charted between the residues of the two structures. A least square fit then superimposes the two structures. Our method is especially powerful in establishing protein relationships by detecting similarities in structural subdomains, domains and topological variants. CLICK has been extensively benchmarked and compared with other popular methods for protein and RNA structural alignments. In most cases, CLICK alignments were statistically significantly better in terms of structure overlap. The method also recognizes conformational changes that may have occurred in structural domains or subdomains in one structure with respect to the other. For this purpose, the server produces complementary alignments to maximize the extent of detectable similarity. Various examples showcase the utility of our web server.
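One way to judge whether a small group of residue correspondences (the 3-7 residue cliques mentioned above) is structurally similar is to compare internal distances. Internal distances do not change under rotation or translation, so the test can run before any superposition. The sketch below is a simplified compatibility test in that spirit and not CLICK's actual procedure. CLICK can also use secondary structure, solvent accessibility and residue depth. The 1 Å tolerance is a hypothetical parameter.

```python
import math
from itertools import combinations

def is_compatible_clique(coords_a, coords_b, match, tolerance=1.0):
    """match: list of (i, k) residue correspondences (e.g. 3-7 pairs).

    Returns True if every internal distance among the matched residues of A
    agrees with the corresponding distance in B to within `tolerance`
    angstroms.
    """
    for (i, k), (j, l) in combinations(match, 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[k], coords_b[l])
        if abs(d_a - d_b) > tolerance:
            return False
    return True

A = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0)]
B_shift = [(x + 10.0, y + 10.0, z + 10.0) for x, y, z in A]  # rigidly translated copy
B_bad = [(10.0, 10.0, 10.0), (13.8, 10.0, 10.0), (10.0, 18.0, 10.0)]
match = [(0, 0), (1, 1), (2, 2)]
print(is_compatible_clique(A, B_shift, match), is_compatible_clique(A, B_bad, match))
```

Compatible cliques found this way could then seed the residue equivalences and the final least-squares superposition that the abstract describes.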
Affiliation(s)
- M. N. Nguyen
- Bioinformatics Institute, 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Department of Biological Sciences, National University of Singapore and School of Biological Sciences, Nanyang Technological University, Singapore
- K. P. Tan
- Bioinformatics Institute, 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Department of Biological Sciences, National University of Singapore and School of Biological Sciences, Nanyang Technological University, Singapore
- M. S. Madhusudhan
- Bioinformatics Institute, 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Department of Biological Sciences, National University of Singapore and School of Biological Sciences, Nanyang Technological University, Singapore
31
Rocha J, Alberich R. The significance of the ProtDeform score for structure prediction and alignment. PLoS One 2011; 6:e20889. [PMID: 21738592 PMCID: PMC3125161 DOI: 10.1371/journal.pone.0020889] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/21/2011] [Accepted: 05/12/2011] [Indexed: 11/21/2022]
Abstract
BACKGROUND When a researcher uses a program to align two proteins and gets a score, one of her main concerns is how often the program gives a similar score to pairs that are or are not in the same fold. This issue was analysed in detail recently for the program TM-align with its associated TM-score. It was shown that because the TM-score is length independent, it allows a P-value and a hit probability to be defined depending only on the score. Also, it was found that the TM-scores of gapless alignments closely follow an Extreme Value Distribution (EVD). The program ProtDeform for structural protein alignment was developed recently and is characterised by the ability to propose different transformations of different protein regions. Our goal is to analyse its associated score to allow a researcher to have objective reasons to prefer one aligner over another, and to interpret the output better. RESULTS The study on the ProtDeform score reveals that it is length independent in a wider score range than TM-scores and that PD-scores of gapless (random) alignments also approximately follow an EVD. On the CASP8 predictions, PD-scores and TM-scores, with respect to native structures, are highly correlated (0.95), and show that around a fifth of the predictions have a quality as low as 99.5% of the random scores. Using the Gold Standard benchmark, ProtDeform has lower probabilities of error than TM-align, at a similar speed. The analysis is extended to homology discrimination showing that, again, ProtDeform offers higher hit probabilities than TM-align. Finally, we suggest using three different P-values according to the three different contexts: gapless alignments, optimised alignments for fold discrimination and optimised alignments for superfamily discrimination. In conclusion, PD-scores are at the very least as valuable for prediction scoring as TM-scores, and on the protein classification problem, even more reliable.
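Turning a score into a P-value through an extreme value distribution, as discussed above, can be sketched as follows. This assumes a type I (Gumbel) distribution of maxima whose location `mu` and scale `beta` have already been fitted to scores of random alignments; the function name and the parameter values in the usage lines are hypothetical placeholders, not the fits reported for PD-scores or TM-scores.

```python
import math


def evd_p_value(score, mu, beta):
    """P(S >= score) for a Gumbel (type I extreme value) distribution of maxima.

    mu (location) and beta (scale) must be fitted beforehand to scores of
    random alignments; nothing here estimates them.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    z = (score - mu) / beta
    if z < -50:
        # exp(-exp(-z)) underflows to 0, so the tail probability is 1.
        return 1.0
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-math.exp(-z))


# Usage with hypothetical parameters (not published values).
for s in (0.3, 0.4, 0.5):
    print(s, evd_p_value(s, mu=0.3, beta=0.03))
```

The `expm1` form keeps precision for high scores, where the tail probability is very small.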
Affiliation(s)
- Jairo Rocha
- Department of Mathematics and Computer Science and IUNICS, University of the Balearic Islands, Palma, Spain.
32
Nguyen MN, Madhusudhan MS. Biological insights from topology independent comparison of protein 3D structures. Nucleic Acids Res 2011; 39:e94. [PMID: 21596786 PMCID: PMC3152366 DOI: 10.1093/nar/gkr348] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]
Abstract
Comparing and classifying the three-dimensional (3D) structures of proteins is of crucial importance to molecular biology, from helping to determine the function of a protein to determining its evolutionary relationships. Traditionally, 3D structures are classified into groups of families that closely resemble the grouping according to their primary sequence. However, significant structural similarities exist at multiple levels between proteins that belong to these different structural families. In this study, we propose a new algorithm, CLICK, to capture such similarities. The method optimally superimposes a pair of protein structures independent of topology. Amino acid residues are represented by the Cartesian coordinates of a representative point (usually the Cα atom), side chain solvent accessibility, and secondary structure. Structural comparison is effected by matching cliques of points. CLICK was extensively benchmarked for alignment accuracy on four different sets: (i) 9537 pair-wise alignments between two structures with the same topology; (ii) 64 alignments from set (i) that were considered to constitute difficult alignment cases; (iii) 199 pair-wise alignments between proteins with similar structure but different topology; and (iv) 1275 pair-wise alignments of RNA structures. The accuracy of CLICK alignments was measured by the average structure overlap score and compared with other alignment methods, including HOMSTRAD, MUSTANG, Geometric Hashing, SALIGN, DALI, GANGSTA+, FATCAT, ARTS and SARA. On average, CLICK produces pair-wise alignments that are either comparable or statistically significantly more accurate than all of these other methods. We have used CLICK to uncover relationships between (previously) unrelated proteins. 
These new biological insights include: (i) detecting hinge regions in proteins where domain or sub-domains show flexibility; (ii) discovering similar small molecule binding sites from proteins of different folds and (iii) discovering topological variants of known structural/sequence motifs. Our method can generally be applied to compare any pair of molecular structures represented in Cartesian coordinates as exemplified by the RNA structure superimposition benchmark.
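Clique matching, as described above, relies on finding small sets of points whose internal geometry agrees in both structures. The following is a minimal sketch of such a test, assuming the points of the two cliques are already listed in corresponding order and using a single hypothetical distance tolerance; the actual CLICK criteria also draw on secondary structure and solvent accessibility, which are not modelled here.

```python
import math
from itertools import combinations


def cliques_match(clique_a, clique_b, tolerance=1.0):
    """Return True if two ordered point sets have matching internal distances.

    Point i of clique_a is assumed to correspond to point i of clique_b. The
    test passes when every pairwise distance differs by at most `tolerance`
    (in Angstrom). Only internal distances are compared, so the result does
    not depend on where either structure sits in space.
    """
    if len(clique_a) != len(clique_b):
        return False
    for i, j in combinations(range(len(clique_a)), 2):
        d_a = math.dist(clique_a[i], clique_a[j])
        d_b = math.dist(clique_b[i], clique_b[j])
        if abs(d_a - d_b) > tolerance:
            return False
    return True


# Usage: a translated copy matches; moving one point by 3 Angstrom does not.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0), (0.0, 0.0, 3.8)]
b = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in a]
c = b[:3] + [(10.0, -5.0, 8.8)]
print(cliques_match(a, b), cliques_match(a, c))
```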
Affiliation(s)
- Minh N Nguyen
- Bioinformatics Institute, 30 Biopolis Street, #07-01 Matrix, Singapore 138671
33
Venkateswaran JG, Song B, Kahveci T, Jermaine C. TRIAL: a tool for finding distant structural similarities. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:819-831. [PMID: 21393655 DOI: 10.1109/tcbb.2009.28] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/30/2023]
Abstract
Finding structural similarities in distantly related proteins can reveal functional relationships that cannot be identified using sequence comparison. Given two proteins A and B and a threshold of ε Å, we develop an algorithm, TRiplet-based Iterative ALignment (TRIAL), for computing the transformation of B that maximizes the number of aligned residues such that the root mean square deviation (RMSD) of the alignment is at most ε Å. Our algorithm is designed with the specific goal of effectively handling proteins with low similarity in primary structure, where existing algorithms perform particularly poorly. Experiments show that our method outperforms existing methods. TRIAL alignment brings the secondary structures of distantly related proteins to similar orientations. It also finds a larger number of secondary structure matches at lower RMSD values, with longer overall alignments. Its classification accuracy is up to 63 percent better than that of other methods, including CE and DALI. TRIAL successfully aligns 83 percent of the residues from the smaller protein in reasonable time, while other methods align only 29 to 65 percent of the residues for the same set of proteins.
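The objective that TRIAL optimizes (the most aligned residues with RMSD at most ε) is simple to evaluate once a transformation of B is fixed: for any set size k, the k closest residue pairs give the lowest RMSD, so scanning the sorted distances finds the largest admissible set. The sketch below assumes the candidate pairs and their distances under one fixed transformation are already known; it does not reproduce TRIAL's triplet-based iterative search over transformations, and the function name is hypothetical.

```python
import math


def max_pairs_within_rmsd(distances, epsilon):
    """Largest set of residue pairs whose RMSD is at most `epsilon`.

    `distances` are inter-residue distances (Angstrom) of candidate pairs under
    one fixed transformation. For a fixed size k the k smallest distances give
    the lowest RMSD, so it suffices to scan sorted prefixes.
    Returns (k, rmsd_of_chosen_pairs).
    """
    best_k, best_rmsd = 0, 0.0
    sum_sq = 0.0
    for k, d in enumerate(sorted(distances), start=1):
        sum_sq += d * d
        rmsd = math.sqrt(sum_sq / k)
        if rmsd > epsilon:
            break  # the prefix RMSD never decreases, so no larger k qualifies
        best_k, best_rmsd = k, rmsd
    return best_k, best_rmsd


# Usage: with epsilon = 1.2 Angstrom, the 4.0 Angstrom pair cannot be included.
k, rmsd = max_pairs_within_rmsd([0.5, 1.0, 1.5, 4.0], epsilon=1.2)
print(k, round(rmsd, 3))
```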
34
Goncearenco A, Berezovsky IN. Prototypes of elementary functional loops unravel evolutionary connections between protein functions. Bioinformatics 2010; 26:i497-503. [PMID: 20823313 PMCID: PMC2935408 DOI: 10.1093/bioinformatics/btq374] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]
Abstract
Motivation: Earlier studies of protein structure revealed closed loops with a characteristic size of 25–30 residues and ring-like shape as a basic universal structural element of globular proteins. Elementary functional loops (EFLs) have specific signatures and provide functional residues important for binding/activation and principal chemical transformation steps of the enzymatic reaction. The goal of this work is to show how these functional loops evolved from pre-domain peptides and to find a set of prototypes from which the EFLs of contemporary proteins originated. Results: This article describes a computational method for deriving prototypes of EFLs based on the sequences of complete genomes. The procedure comprises the iterative derivation of sequence profiles followed by their hierarchical clustering. The scoring function takes into account information content on profile positions, thus preserving the signature. The statistical significance of scores is evaluated from the empirical distribution of scores of the background model. A set of prototypes of EFLs from archaeal proteomes is derived. This set delineates evolutionary connections between major functions and illuminates how folds and functions emerged in pre-domain evolution as a combination of prototypes. Contact: Igor.Berezovsky@uni.no
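The abstract mentions a scoring function that accounts for the information content of profile positions. One common definition of per-column information content, in bits, is log2(20) minus the Shannon entropy of the residue frequencies in that column. The sketch below assumes that definition, ignores gaps and uses no pseudocounts; it is illustrative only and is not the scoring function of the paper. The function name is hypothetical.

```python
import math
from collections import Counter


def column_information_content(column, alphabet_size=20):
    """Information content (bits) of one alignment/profile column.

    IC = log2(alphabet_size) - H, where H is the Shannon entropy of the
    observed residue frequencies. Gap characters ('-') are ignored and no
    pseudocounts are added, so small columns give biased estimates.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    n = len(residues)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


# Usage: a fully conserved column carries the most information.
for col in ("CCCC", "CCCA", "ACDE"):
    print(col, round(column_information_content(col), 2))
```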
Affiliation(s)
- Alexander Goncearenco
- Bergen Center for Computational Science and Department of Informatics, University of Bergen, Bergen, Norway
35
Prlic A, Bliven S, Rose PW, Bluhm WF, Bizon C, Godzik A, Bourne PE. Pre-calculated protein structure alignments at the RCSB PDB website. Bioinformatics 2010; 26:2983-5. [PMID: 20937596 PMCID: PMC3003546 DOI: 10.1093/bioinformatics/btq572] [Citation(s) in RCA: 149] [Impact Index Per Article: 10.6] [Indexed: 11/17/2022]
Abstract
Summary: With the continuous growth of the RCSB Protein Data Bank (PDB), providing an up-to-date systematic structure comparison of all protein structures poses an ever growing challenge. Here, we present a comparison tool for calculating both 1D protein sequence and 3D protein structure alignments. This tool supports various applications at the RCSB PDB website. First, a structure alignment web service calculates pairwise alignments. Second, a stand-alone application runs alignments locally and visualizes the results. Third, pre-calculated 3D structure comparisons for the whole PDB are provided and updated on a weekly basis. These three applications allow users to discover novel relationships between proteins available either at the RCSB PDB or provided by the user. Availability and Implementation: A web user interface is available at http://www.rcsb.org/pdb/workbench/workbench.do. The source code is available under the LGPL license from http://www.biojava.org. A source bundle, prepared for local execution, is available from http://source.rcsb.org. Contact: andreas@sdsc.edu; pbourne@ucsd.edu
Affiliation(s)
- Andreas Prlic
- San Diego Supercomputer Center, University of California San Diego, 9500 Gilman Drive, Mailcode 0505 La Jolla, CA 92093-0505, USA.
36
Andreeva A, Murzin AG. Structural classification of proteins and structural genomics: new insights into protein folding and evolution. Acta Crystallogr Sect F Struct Biol Cryst Commun 2010; 66:1190-7. [PMID: 20944210 PMCID: PMC2954204 DOI: 10.1107/s1744309110007177] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 01/11/2010] [Accepted: 02/24/2010] [Indexed: 11/10/2022]
Abstract
During the past decade, the Protein Structure Initiative (PSI) centres have become major contributors of new families, superfamilies and folds to the Structural Classification of Proteins (SCOP) database. The PSI results have increased the diversity of protein structural space and accelerated our understanding of it. This review article surveys a selection of protein structures determined by the Joint Center for Structural Genomics (JCSG). It presents previously undescribed β-sheet architectures such as the double barrel and spiral β-roll and discusses new examples of unusual topologies and peculiar structural features observed in proteins characterized by the JCSG and other Structural Genomics centres.
Affiliation(s)
- Antonina Andreeva
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, England
- Alexey G. Murzin
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, England
37
Wohlers I, Domingues FS, Klau GW. Towards optimal alignment of protein structure distance matrices. Bioinformatics 2010; 26:2273-80. [PMID: 20639543 DOI: 10.1093/bioinformatics/btq420] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/15/2022]
Affiliation(s)
- Inken Wohlers
- CWI, Life Sciences Group, Amsterdam, The Netherlands.
38
Konagurthu AS, Reboul CF, Schmidberger JW, Irving JA, Lesk AM, Stuckey PJ, Whisstock JC, Buckle AM. MUSTANG-MR structural sieving server: applications in protein structural analysis and crystallography. PLoS One 2010; 5:e10048. [PMID: 20386610 PMCID: PMC2850368 DOI: 10.1371/journal.pone.0010048] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 12/18/2009] [Accepted: 03/16/2010] [Indexed: 11/19/2022]
Abstract
BACKGROUND A central tenet of structural biology is that related proteins of common function share structural similarity. This has key practical consequences for the derivation and analysis of protein structures, and is exploited by the process of "molecular sieving" whereby a common core is progressively distilled from a comparison of two or more protein structures. This paper reports a novel web server for "sieving" of protein structures, based on the multiple structural alignment program MUSTANG. METHODOLOGY/PRINCIPAL FINDINGS "Sieved" models are generated from MUSTANG-generated multiple alignment and superpositions by iteratively filtering out noisy residue-residue correspondences, until the resultant correspondences in the models are optimally "superposable" under a threshold of RMSD. This residue-level sieving is also accompanied by iterative elimination of the poorly fitting structures from the input ensemble. Therefore, by varying the thresholds of RMSD and the cardinality of the ensemble, multiple sieved models are generated for a given multiple alignment and superposition from MUSTANG. To aid the identification of structurally conserved regions of functional importance in an ensemble of protein structures, Lesk-Hubbard graphs are generated, plotting the number of residue correspondences in a superposition as a function of its corresponding RMSD. The conserved "core" (or typically active site) shows a linear trend, which becomes exponential as divergent parts of the structure are included into the superposition. CONCLUSIONS The application addresses two fundamental problems in structural biology: first, the identification of common substructures among structurally related proteins--an important problem in characterization and prediction of function; second, generation of sieved models with demonstrated uses in protein crystallographic structure determination using the technique of Molecular Replacement.
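The residue-level sieving described above can be illustrated for the pairwise case by repeatedly removing the correspondence with the largest deviation and re-superposing until the RMSD falls under a threshold. This is a minimal sketch with NumPy, assuming two structures with known residue correspondences; it omits MUSTANG-MR's elimination of whole structures from a multiple alignment, and the function names and the `min_pairs` stopping rule are hypothetical.

```python
import numpy as np


def _deviations_after_fit(P, Q):
    """Kabsch-superpose P onto Q and return the per-pair distances after fitting."""
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return np.linalg.norm(Pc @ R.T - Qc, axis=1)


def sieve(P, Q, rmsd_threshold, min_pairs=3):
    """Iteratively drop the worst-fitting correspondence and refit.

    P and Q are (N, 3) arrays of corresponding points. Stops when the RMSD of
    the retained pairs is <= rmsd_threshold or only `min_pairs` remain.
    Returns (indices_kept, rmsd).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    keep = np.arange(len(P))
    while True:
        dev = _deviations_after_fit(P[keep], Q[keep])
        rmsd = float(np.sqrt((dev ** 2).mean()))
        if rmsd <= rmsd_threshold or len(keep) <= min_pairs:
            return keep, rmsd
        keep = np.delete(keep, int(np.argmax(dev)))


# Usage: one badly misplaced correspondence is sieved out.
Q = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 3.5, 0.0], [4.0, 5.0, 3.0],
              [1.0, 4.0, 2.0], [-2.0, 2.0, 1.0], [-1.0, -2.0, 3.0], [2.0, -1.0, -2.0]])
P = Q.copy()
P[2] += [8.0, 5.0, -2.0]
kept, rmsd = sieve(P, Q, rmsd_threshold=0.5)
print(kept, round(rmsd, 3))
```

Recording the pair (number of retained correspondences, RMSD) at each iteration, instead of returning only the final state, would give the data points of a Lesk-Hubbard graph.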
Affiliation(s)
- Arun S. Konagurthu
- NICTA Victoria Research Laboratory at The University of Melbourne, The University of Melbourne, Melbourne, Australia
- Department of Computer Science and Software Engineering, The University of Melbourne, Victoria, Australia
- Cyril F. Reboul
- Department of Biochemistry and Molecular Biology, Monash University, Victoria, Australia
- ARC Centre of Excellence in Structural and Functional Microbial Genomics, Monash University, Victoria, Australia
- Jason W. Schmidberger
- Department of Biochemistry and Molecular Biology, Monash University, Victoria, Australia
- James A. Irving
- Department of Biochemistry and Molecular Biology, Monash University, Victoria, Australia
- Arthur M. Lesk
- The Huck Institute for Genomics, Proteomics, and Bioinformatics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, United States of America
- Peter J. Stuckey
- NICTA Victoria Research Laboratory at The University of Melbourne, The University of Melbourne, Melbourne, Australia
- Department of Computer Science and Software Engineering, The University of Melbourne, Victoria, Australia
- James C. Whisstock
- Department of Biochemistry and Molecular Biology, Monash University, Victoria, Australia
- ARC Centre of Excellence in Structural and Functional Microbial Genomics, Monash University, Victoria, Australia
- Ashley M. Buckle
- Department of Biochemistry and Molecular Biology, Monash University, Victoria, Australia
39
Abstract
Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernible sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution in the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space which attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co-occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.
Affiliation(s)
- Vikram Alva
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Tübingen 72076, Germany
40
Joerger AC, Fersht AR. The tumor suppressor p53: from structures to drug discovery. Cold Spring Harb Perspect Biol 2010; 2:a000919. [PMID: 20516128 DOI: 10.1101/cshperspect.a000919] [Citation(s) in RCA: 233] [Impact Index Per Article: 16.6] [Indexed: 02/07/2023]
Abstract
Even 30 years after its discovery, the tumor suppressor protein p53 is still somewhat of an enigma. p53's intimate and multifaceted role in the cell cycle is mirrored in its equally complex structural biology that is being unraveled only slowly. Here, we discuss key structural aspects of p53 function and its inactivation by oncogenic mutations. Concerted action of folded and intrinsically disordered domains of the highly dynamic p53 protein provides binding promiscuity and specificity, allowing p53 to process a myriad of cellular signals to maintain the integrity of the human genome. Importantly, progress in elucidating the structural biology of p53 and its partner proteins has opened various avenues for structure-guided rescue of p53 function in tumors. These emerging anticancer strategies include targeting mutant-specific lesions on the surface of destabilized cancer mutants with small molecules and selective inhibition of p53's degradative pathways.
Affiliation(s)
- Andreas C Joerger
- MRC Centre for Protein Engineering, Hills Road, Cambridge, United Kingdom.
41
Berbalk C, Schwaiger CS, Lackner P. Accuracy analysis of multiple structure alignments. Protein Sci 2009; 18:2027-35. [PMID: 19621383 DOI: 10.1002/pro.213] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 11/08/2022]
Abstract
Protein structure alignment methods are essential for many different challenges in protein science, such as the determination of relations between proteins in the fold space or the analysis and prediction of their biological function. A number of different pairwise and multiple structure alignment (MStA) programs have been developed and provided to the community. Prior knowledge of the expected alignment accuracy is desirable for the user of such tools. To obtain an estimate of the performance of current structure alignment methods, we compiled a test suite taken from the literature and the SISYPHUS database consisting of proteins that are difficult to align. Subsequently, different MStA programs were evaluated regarding alignment correctness and general limitations. The analysis shows that there are large differences in the success between the methods in terms of applicability and correctness. The latter ranges from 44 to 75% correct core positions. Taking only the best method result per test case, this number increases to 84%. We conclude that the methods available are applicable to difficult cases, but also that there is still room for improvement in both practicability and alignment correctness. An approach that combines the currently available methods supported by a proper score would be useful. Until then, a user should not rely on just a single program.
42
43
Structural relationships among proteins with different global topologies and their implications for function annotation strategies. Proc Natl Acad Sci U S A 2009; 106:17377-82. [PMID: 19805138 DOI: 10.1073/pnas.0907971106] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.5] [Indexed: 11/18/2022]
Abstract
It has become increasingly apparent that geometric relationships often exist between regions of two proteins that have quite different global topologies or folds. In this article, we examine whether such relationships can be used to infer a functional connection between the two proteins in question. We find, by considering a number of examples involving metal and cation binding, sugar binding, and aromatic group binding, that geometrically similar protein fragments can share related functions, even if they have been classified as belonging to different folds and topologies. Thus, the use of classifications inevitably limits the number of functional inferences that can be obtained from the comparative analysis of protein structures. In contrast, the development of interactive computational tools that recognize the "continuous" nature of protein structure/function space, by increasing the number of potentially meaningful relationships that are considered, may offer a dramatic enhancement in the ability to extract information from protein structure databases. We introduce the MarkUs server, that embodies this strategy and that is designed for a user interested in developing and validating specific functional hypotheses.
44
Abstract
MOTIVATION Structural alignment is an important tool for understanding the evolutionary relationships between proteins. However, finding the best pairwise structural alignment is difficult, due to the infinite number of possible superpositions of two structures. Unlike the sequence alignment problem, which has a polynomial time solution, the structural alignment problem has not even been classified as solvable. RESULTS We study one of the most widely used measures of protein structural similarity, defined as the number of pairs of residues in two proteins that can be superimposed under a predefined distance cutoff. We prove that, for any two proteins, this measure can be optimized for all but finitely many distance cutoffs. Our method leads to a series of algorithms for optimizing other structure similarity measures, including the measures commonly used in protein structure prediction experiments. We also present a polynomial time algorithm for finding a near-optimal superposition of two proteins. Aside from having a relatively low cost, the near-optimal algorithm returns a superposition of provable quality. In other words, the difference between the score of the returned superposition and the score of an optimal superposition can be explicitly computed and used to determine whether the returned superposition is, in fact, the best superposition. CONTACT poleksic@cs.uni.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
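The similarity measure studied above (the number of residue pairs superimposable under a distance cutoff) is simple to evaluate for a given superposition; the hard part, which the paper addresses, is choosing the superposition. The sketch below only scores a superposition that has already been applied, and adds a GDT_TS-style average over the 1, 2, 4 and 8 Å cutoffs used in CASP. Because GDT_TS proper optimises the superposition separately for each cutoff, the value computed here for one fixed superposition can only be lower or equal. Function names are hypothetical.

```python
import math


def pairs_within_cutoff(coords_a, coords_b, cutoff):
    """Count paired residues whose distance is at most `cutoff` (Angstrom).

    coords_a[i] and coords_b[i] are the coordinates of the i-th residue pair
    after a superposition has already been applied; this scores that single
    superposition and does not search for the optimal one.
    """
    return sum(1 for a, b in zip(coords_a, coords_b) if math.dist(a, b) <= cutoff)


def gdt_ts_fixed(coords_model, coords_ref, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS-style average (percent) over several cutoffs for ONE superposition."""
    n = len(coords_ref)
    fractions = [pairs_within_cutoff(coords_model, coords_ref, c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(fractions)


# Usage: pair distances here are 0.5, 1.5, 3.0 and 10.0 Angstrom.
model = [(0.0, 0.0, 0.5), (1.0, 0.0, 1.5), (2.0, 0.0, 3.0), (3.0, 0.0, 10.0)]
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(pairs_within_cutoff(model, ref, 2.0), gdt_ts_fixed(model, ref))
```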
Affiliation(s)
- Aleksandar Poleksic
- Department of Computer Science, University of Northern Iowa, Cedar Falls, IA 50614, USA.
45
Micheletti C, Orland H. MISTRAL: a tool for energy-based multiple structural alignment of proteins. Bioinformatics 2009; 25:2663-9. [PMID: 19692555 DOI: 10.1093/bioinformatics/btp506] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 11/14/2022]
Abstract
MOTIVATION The steady growth of the number of available protein structures has constantly motivated the development of new algorithms for detecting structural correspondences in proteins. Detecting structural equivalences in two or more proteins is computationally demanding as it typically entails the exploration of the combinatorial space of all possible amino acid pairings in the parent proteins. The search is often aided by the introduction of various constraints such as considering protein fragments, rather than single amino acids, and/or seeking only sequential correspondences in the given proteins. An additional challenge is the difficulty of associating a reliable a priori measure of statistical significance with a given alignment. RESULTS Here, we present and discuss MISTRAL (Multiple STRuctural ALignment), a novel strategy for multiple protein alignment based on the minimization of an energy function over the low-dimensional space of the relative rotations and translations of the molecules. The energy minimization avoids combinatorial searches and returns pairwise alignment scores for which a reliable a priori statistical significance can be given. AVAILABILITY MISTRAL is freely available for academic users as a standalone program and as a web service at http://ipht.cea.fr/protein.php.
Affiliation(s)
- Cristian Micheletti
- SISSA, CNR-INFM Democritos and Italian Institute of Technology, Via Beirut 2-4, 34014 Trieste, Italy.
46
Sippl MJ. Fold space unlimited. Curr Opin Struct Biol 2009; 19:312-20. [DOI: 10.1016/j.sbi.2009.03.010] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Received: 01/07/2009] [Revised: 02/16/2009] [Accepted: 03/16/2009] [Indexed: 11/25/2022]
47
Hasegawa H, Holm L. Advances and pitfalls of protein structural alignment. Curr Opin Struct Biol 2009; 19:341-8. [PMID: 19481444 DOI: 10.1016/j.sbi.2009.04.003] [Citation(s) in RCA: 303] [Impact Index Per Article: 20.2] [Received: 04/02/2009] [Accepted: 04/16/2009] [Indexed: 11/30/2022]
Abstract
Structure comparison opens a window into the distant past of protein evolution, which has been unreachable by sequence comparison alone. With 55,000 entries in the Protein Data Bank and about 500 new structures added each week, automated processing, comparison, and classification are necessary. A variety of methods use different representations, scoring functions, and optimization algorithms, and they generate contradictory results even for moderately distant structures. Sequence mutations, insertions, and deletions are accommodated by plastic deformations of the common core, retaining the precise geometry of the active site, and peripheral regions may refold completely. Therefore structure comparison methods that allow for flexibility and plasticity generate the most biologically meaningful alignments. Active research directions include both the search for fold invariant features and the modeling of structural transitions in evolution. Advances have been made in algorithmic robustness, multiple alignment, and speeding up database searches.
Affiliation(s)
- Hitomi Hasegawa
- Institute of Biotechnology, University of Helsinki, P.O. Box 56 (Viikinkaari 5), 00014 University of Helsinki, Finland
48
Rocha J, Segura J, Wilson RC, Dasgupta S. Flexible structural protein alignment by a sequence of local transformations. Bioinformatics 2009; 25:1625-31. [PMID: 19417057 PMCID: PMC2940242 DOI: 10.1093/bioinformatics/btp296] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]
Abstract
MOTIVATION Throughout evolution, homologous proteins have common regions that stay semi-rigid relative to each other and other parts that vary in a more noticeable way. In order to compare the increasing number of structures in the PDB, reliable and easy-to-use flexible geometrical alignments are needed. RESULTS We present a protein structure alignment method whose main feature is the ability to consider different rigid transformations at different sites, allowing for deformations beyond a global rigid transformation. The performance of the method is comparable with that of the best of the 10 aligners tested, regarding both the quality of the alignments with respect to hand-curated ones and the classification ability. An analysis of some structure pairs from the literature that need to be matched in a flexible fashion is shown. The use of a series of local transformations can be exported to other classifiers, and a future gold-standard protein similarity measure could benefit from it. AVAILABILITY A public server for the program is available at http://dmi.uib.es/ProtDeform/. SUPPLEMENTARY INFORMATION All data used, results and examples are available at http://dmi.uib.es/people/jairo/bio/ProtDeform.
Affiliation(s)
- Jairo Rocha
- Department of Mathematics and Computer Science, University of the Balearic Islands, Palma, Spain.
49
Tidow H, Andreeva A, Rutherford TJ, Fersht AR. Solution structure of the U11-48K CHHC zinc-finger domain that specifically binds the 5' splice site of U12-type introns. Structure 2009; 17:294-302. [PMID: 19217400 DOI: 10.1016/j.str.2008.11.013] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 10/01/2008] [Revised: 11/24/2008] [Accepted: 11/26/2008] [Indexed: 10/21/2022]
Abstract
The formation of stable 18S U11/U12 di-snRNPs before their association with the pre-mRNA is a characteristic feature of the minor spliceosome. During the spliceosomal assembly, the 18S snRNP binds cooperatively to the introns' 5' splice and branch point site. The molecular basis for this recognition is still unknown. Here, we report the solution structure of the U11-48K CHHC Zn finger, a domain unique to the minor spliceosome. The CHHC Zn-finger structure revealed an unexpected similarity to the TFIIIA domains, with distinct features originating from the type and separation of the zinc-coordinating residues. We show that this domain specifically binds the 5' splice site sequence of U12-type introns when base paired to U11 snRNA in vitro and hence may contribute to the U12 intron recognition. We propose a model in which the U11-48K Zn finger stabilizes U11-5' splice site base pairing and thus plays an important role during the minor spliceosome assembly.
Affiliation(s)
- Henning Tidow
- MRC Centre for Protein Engineering, Hills Road, Cambridge CB20QH, United Kingdom.
50
Cradle-loop barrels and the concept of metafolds in protein classification by natural descent. Curr Opin Struct Biol 2008; 18:358-65. [DOI: 10.1016/j.sbi.2008.02.006] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Received: 12/25/2007] [Accepted: 02/14/2008] [Indexed: 11/19/2022]