1
Vavra O, Tyzack J, Haddadi F, Stourac J, Damborsky J, Mazurenko S, Thornton JM, Bednar D. Large-scale annotation of biochemically relevant pockets and tunnels in cognate enzyme-ligand complexes. J Cheminform 2024; 16:114. [PMID: 39407342 PMCID: PMC11481355 DOI: 10.1186/s13321-024-00907-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Accepted: 09/16/2024] [Indexed: 10/19/2024] Open
Abstract
Tunnels in enzymes with buried active sites are key structural features allowing the entry of substrates and the release of products, thus contributing to the catalytic efficiency. Targeting the bottlenecks of protein tunnels is also a powerful protein engineering strategy. However, the identification of functional tunnels in multiple protein structures is a non-trivial task that can only be addressed computationally. We present a pipeline integrating automated structural analysis with an in-house machine-learning predictor for the annotation of protein pockets, followed by the calculation of the energetics of ligand transport via biochemically relevant tunnels. A thorough validation using eight distinct molecular systems revealed that CaverDock analysis of ligand un/binding is on par with time-consuming molecular dynamics simulations, but much faster. The optimized and validated pipeline was applied to annotate more than 17,000 cognate enzyme-ligand complexes. Analysis of ligand un/binding energetics indicates that the top priority tunnel has the most favourable energies in 75% of cases. Moreover, energy profiles of cognate ligands revealed that a simple geometry analysis can correctly identify tunnel bottlenecks only in 50% of cases. Our study provides essential information for the interpretation of results from tunnel calculation and energy profiling in mechanistic enzymology and protein engineering. We formulated several simple rules allowing identification of biochemically relevant tunnels based on the binding pockets, tunnel geometry, and ligand transport energy profiles.
Scientific contributions: The pipeline introduced in this work allows for the detailed analysis of a large set of protein-ligand complexes, focusing on transport pathways. We are introducing a novel predictor for determining the relevance of binding pockets for tunnel calculation. For the first time in the field, we present a high-throughput energetic analysis of ligand binding and unbinding, showing that approximate methods for these simulations can identify additional mutagenesis hotspots in enzymes compared to purely geometrical methods. The predictor is included in the supplementary material and can also be accessed at https://github.com/Faranehhad/Large-Scale-Pocket-Tunnel-Annotation.git. The tunnel data calculated in this study has been made publicly available as part of the ChannelsDB 2.0 database, accessible at https://channelsdb2.biodata.ceitec.cz/.
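The difference between a geometric and an energetic bottleneck, as contrasted in the abstract above, can be illustrated with a minimal sketch. It is not the authors' pipeline and does not call CaverDock; the function names, the tolerance parameter, the units and the profile values are all hypothetical. The geometric bottleneck is taken as the narrowest point of the tunnel and the energetic barrier as the highest point of the ligand energy profile, both indexed along the same tunnel coordinate.

```python
from typing import Sequence


def geometric_bottleneck(radii: Sequence[float]) -> int:
    """Index of the narrowest point along the tunnel."""
    return min(range(len(radii)), key=lambda i: radii[i])


def energetic_barrier(energies: Sequence[float]) -> int:
    """Index of the highest-energy point of the transport profile."""
    return max(range(len(energies)), key=lambda i: energies[i])


def bottlenecks_agree(radii: Sequence[float],
                      energies: Sequence[float],
                      tolerance: int = 1) -> bool:
    """True if both bottlenecks lie within `tolerance` steps of each other.

    `tolerance` is an illustrative parameter, not a value from the paper.
    """
    if len(radii) != len(energies):
        raise ValueError("profiles must share the same tunnel coordinate")
    gap = abs(geometric_bottleneck(radii) - energetic_barrier(energies))
    return gap <= tolerance


# Hypothetical profiles sampled at the same points along one tunnel
# (radii assumed in angstroms, energies assumed in kcal/mol).
radii = [2.4, 2.1, 1.3, 1.6, 2.0, 2.2, 2.5]
energies = [-5.1, -4.8, -4.0, -3.9, -2.2, -3.5, -4.6]

print(geometric_bottleneck(radii), energetic_barrier(energies))
print(bottlenecks_agree(radii, energies))
```

In this made-up profile the narrowest point and the highest-energy point sit two steps apart, which is the kind of disagreement the abstract reports for about half of the analysed cases.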
Affiliation(s)
- O Vavra
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00, Brno, Czech Republic
- International Clinical Research Center, St. Anne's University Hospital Brno, Pekařská 53, 656 91, Brno, Czech Republic
- J Tyzack
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK
- F Haddadi
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00, Brno, Czech Republic
- International Clinical Research Center, St. Anne's University Hospital Brno, Pekařská 53, 656 91, Brno, Czech Republic
- J Stourac
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00, Brno, Czech Republic
- International Clinical Research Center, St. Anne's University Hospital Brno, Pekařská 53, 656 91, Brno, Czech Republic
- J Damborsky
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00, Brno, Czech Republic
- International Clinical Research Center, St. Anne's University Hospital Brno, Pekařská 53, 656 91, Brno, Czech Republic
- S Mazurenko
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00, Brno, Czech Republic.
- International Clinical Research Center, St. Anne's University Hospital Brno, Pekařská 53, 656 91, Brno, Czech Republic.
- J M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge, CB10 1SD, UK.
- D Bednar
- Loschmidt Laboratories, Department of Experimental Biology and RECETOX, Faculty of Science, Masaryk University, Kamenice 5/A13, 625 00, Brno, Czech Republic.
- International Clinical Research Center, St. Anne's University Hospital Brno, Pekařská 53, 656 91, Brno, Czech Republic.
2
Guan Y, Mei J, Gao X, Wang C, Jia M, Ahmad S, Muhammad FN, Ai H. Prediction of the 3D conformation of a small peptide vaccine targeting Aβ42 oligomers. Phys Chem Chem Phys 2024; 26:20087-20102. [PMID: 39007924 DOI: 10.1039/d4cp02078b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
The original etiology of Alzheimer's disease (AD) is the deposition of amyloid-beta (Aβ) proteins, which starts from the aggregation of the Aβ oligomers. The optimal therapeutic strategy targeting Aβ oligomer aggregation is the development of AD vaccines. Despite the fact that positive progress has been made for experimental attempts at AD vaccines, the physicochemical and even structural properties of these AD vaccines remain unclear. In this study, through immunoinformatic and molecular dynamics (MD) simulations, we first designed and simulated an alternative of vaccine TAPAS and found that the structure of the alternative can reproduce the 3D conformation of TAPAS determined experimentally. Meanwhile, immunoinformatic methods were used to analyze the physicochemical properties of TAPAS, including immunogenicity, antigenicity, thermal stability, and solubility, which confirm well the efficacy and safety of the vaccine, and validate the scheme reliability of immunoinformatic and MD simulations in designing and simulating the TAPAS vaccine. Using the same scheme, we predicted the 3D conformation of the optimized ACI-24 peptide vaccine, an Aβ peptide with the first 15 residues of Aβ42 (Aβ1-15). The vaccine was verified once to be effective against both full-length Aβ1-42 and truncated Aβ4-42 aggregates, but an experimental 3D structure was absent. We have also explored the immune mechanism of the vaccine at the molecular level and found that the optimized ACI-24 and its analogues can block the growth of either full-length Aβ1-42 or truncated Aβ4-42 pentamer by contacting the hydrophobic residues within the N-terminus and β1 region on the contact surface of either pentamer. Additionally, residues (D1, D7, S8, H13, and Q15) were identified as the key residues of the vaccine to contact either of the two Aβ oligomers. 
This work provides a feasible implementation scheme of immunoinformatic and MD simulations for the development of AD small peptide vaccines, validating the power of the scheme as a parallel tool to the experimental approaches and injecting molecular-level information into the understanding and design of anti-AD vaccines.
Affiliation(s)
- Yvning Guan
- School of Chemistry and Chemical Engineering, University of Jinan, Jinan 250022, P. R. China.
- Jinfei Mei
- School of Chemistry and Chemical Engineering, University of Jinan, Jinan 250022, P. R. China.
- Xvzhi Gao
- School of Chemistry and Chemical Engineering, University of Jinan, Jinan 250022, P. R. China.
- Chuanbo Wang
- School of Chemistry and Chemical Engineering, University of Jinan, Jinan 250022, P. R. China.
- Mengke Jia
- School of Chemistry and Chemical Engineering, University of Jinan, Jinan 250022, P. R. China.
- Sajjad Ahmad
- School of Chemistry and Chemical Engineering, University of Jinan, Jinan 250022, P. R. China.
- Fahad Nouman Muhammad
- School of Chemistry and Chemical Engineering, University of Jinan, Jinan 250022, P. R. China.
- Hongqi Ai
- School of Chemistry and Chemical Engineering, University of Jinan, Jinan 250022, P. R. China.
3
van Kempen M, Kim SS, Tumescheit C, Mirdita M, Lee J, Gilchrist CLM, Söding J, Steinegger M. Fast and accurate protein structure search with Foldseek. Nat Biotechnol 2024; 42:243-246. [PMID: 37156916 PMCID: PMC10869269 DOI: 10.1038/s41587-023-01773-0] [Citation(s) in RCA: 447] [Impact Index Per Article: 447.0] [Received: 02/17/2022] [Accepted: 03/30/2023] [Indexed: 05/10/2023]
Abstract
As structure prediction methods are generating millions of publicly available protein structures, searching these databases is becoming a bottleneck. Foldseek aligns the structure of a query protein against a database by describing tertiary amino acid interactions within proteins as sequences over a structural alphabet. Foldseek decreases computation times by four to five orders of magnitude with 86%, 88% and 133% of the sensitivities of Dali, TM-align and CE, respectively.
Affiliation(s)
- Michel van Kempen
- Quantitative and Computational Biology Group, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- Stephanie S Kim
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Milot Mirdita
- Quantitative and Computational Biology Group, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Jeongjae Lee
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Johannes Söding
- Quantitative and Computational Biology Group, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany.
- Campus Institute Data Science (CIDAS), Göttingen, Germany.
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, South Korea.
- Artificial Intelligence Institute, Seoul National University, Seoul, South Korea.
- Institute of Molecular Biology and Genetics, Seoul National University, Seoul, South Korea.
4
Arshad N, Laurent-Rolle M, Ahmed WS, Hsu JCC, Mitchell SM, Pawlak J, Sengupta D, Biswas KH, Cresswell P. SARS-CoV-2 accessory proteins ORF7a and ORF3a use distinct mechanisms to downregulate MHC-I surface expression. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2022:2022.05.17.492198. [PMID: 35611331 PMCID: PMC9128780 DOI: 10.1101/2022.05.17.492198] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 01/05/2023]
Abstract
Major histocompatibility complex class I (MHC-I) molecules, which are dimers of a glycosylated polymorphic transmembrane heavy chain and the small protein β2-microglobulin (β2m), bind peptides in the endoplasmic reticulum that are generated by the cytosolic turnover of cellular proteins. In virus-infected cells these peptides may include those derived from viral proteins. Peptide-MHC-I complexes then traffic through the secretory pathway and are displayed at the cell surface where those containing viral peptides can be detected by CD8+ T lymphocytes that kill infected cells. Many viruses enhance their in vivo survival by encoding genes that downregulate MHC-I expression to avoid CD8+ T cell recognition. Here we report that two accessory proteins encoded by SARS-CoV-2, the causative agent of the ongoing COVID-19 pandemic, downregulate MHC-I expression using distinct mechanisms. One, ORF3a, a viroporin, reduces global trafficking of proteins, including MHC-I, through the secretory pathway. The second, ORF7a, interacts specifically with the MHC-I heavy chain, acting as a molecular mimic of β2m to inhibit its association. This slows the exit of properly assembled MHC-I molecules from the endoplasmic reticulum. We demonstrate that ORF7a reduces antigen presentation by the human MHC-I allele HLA-A*02:01. Thus, both ORF3a and ORF7a act post-translationally in the secretory pathway to lower surface MHC-I expression, with ORF7a exhibiting a novel and specific mechanism that allows immune evasion by SARS-CoV-2.
Significance Statement: Viruses may down-regulate MHC class I expression on infected cells to avoid elimination by cytotoxic T cells. We report that the accessory proteins ORF7a and ORF3a of SARS-CoV-2 mediate this function and delineate the two distinct mechanisms involved. While ORF3a inhibits global protein trafficking to the cell surface, ORF7a acts specifically on MHC-I by competing with β2m for binding to the MHC-I heavy chain. This is the first account of molecular mimicry of β2m as a viral mechanism of MHC-I down-regulation to facilitate immune evasion.
Affiliation(s)
- Najla Arshad
- Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Maudry Laurent-Rolle
- Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Section of Infectious Diseases, Department of Internal Medicine, Yale University School of Medicine, New Haven, CT 06520, USA
- Wesam S Ahmed
- Division of Biological and Biomedical Sciences, College of Health & Life Sciences, Hamad Bin Khalifa University, Education City, Qatar Foundation, Doha – 34110, Qatar
- Jack Chun-Chieh Hsu
- Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Susan M Mitchell
- Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Joanna Pawlak
- Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Section of Infectious Diseases, Department of Internal Medicine, Yale University School of Medicine, New Haven, CT 06520, USA
- Debrup Sengupta
- Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Kabir H Biswas
- Division of Biological and Biomedical Sciences, College of Health & Life Sciences, Hamad Bin Khalifa University, Education City, Qatar Foundation, Doha – 34110, Qatar
- Peter Cresswell
- Department of Immunobiology, Yale University School of Medicine, New Haven, CT 06520, USA
- Department of Cell Biology, Yale University School of Medicine, New Haven, CT 06520, USA
5
Krieger JM, Sorzano COS, Carazo JM, Bahar I. Protein dynamics developments for the large scale and cryoEM: case study of ProDy 2.0. Acta Crystallogr D Struct Biol 2022; 78:399-409. [PMID: 35362464 PMCID: PMC8972803 DOI: 10.1107/s2059798322001966] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/30/2021] [Accepted: 02/18/2022] [Indexed: 11/24/2022] Open
Abstract
Cryo-electron microscopy (cryoEM) has become a well established technique with the potential to produce structures of large and dynamic supramolecular complexes that are not amenable to traditional approaches for studying structure and dynamics. The size and low resolution of such molecular systems often make structural modelling and molecular dynamics simulations challenging and computationally expensive. This, together with the growing wealth of structural data arising from cryoEM and other structural biology methods, has driven a trend in the computational biophysics community towards the development of new pipelines for analysing global dynamics using coarse-grained models and methods. At the centre of this trend has been a return to elastic network models, normal mode analysis (NMA) and ensemble analyses such as principal component analysis, and the growth of hybrid simulation methodologies that make use of them. Here, this field is reviewed with a focus on ProDy, the Python application programming interface for protein dynamics, which has been developed over the last decade. Two key developments in this area are highlighted: (i) ensemble NMA towards extracting and comparing the signature dynamics of homologous structures, aided by the recent SignDy pipeline, and (ii) pseudoatom fitting for more efficient global dynamics analyses of large and low-resolution supramolecular assemblies from cryoEM, revisited in the CryoDy pipeline. It is believed that such a renewal and extension of old models and methods in new pipelines will be critical for driving the field forward into the next cryoEM revolution.
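As a rough illustration of the coarse-grained models the abstract refers to, the sketch below builds the connectivity (Kirchhoff) matrix of a Gaussian network model, one common elastic network model, from node coordinates. It is written in plain Python and deliberately does not use the ProDy API; the function name, the 7 Å cutoff and the toy coordinates are assumptions made for this example only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def kirchhoff(coords: Sequence[Point], cutoff: float = 7.0) -> List[List[float]]:
    """Connectivity matrix of a Gaussian network model.

    Off-diagonal entries are -1 for node pairs within `cutoff`, 0 otherwise;
    each diagonal entry is the number of contacts of that node.
    The cutoff value is an illustrative assumption.
    """
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


# Hypothetical toy system: four nodes on a line, 3.8 angstroms apart,
# standing in for C-alpha atoms or cryoEM pseudoatoms.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
gamma = kirchhoff(coords)
for row in gamma:
    print(row)
```

In a full analysis the normal modes would be obtained from the eigendecomposition of this matrix, which is left out here to keep the example free of external dependencies.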
Affiliation(s)
- James Michael Krieger
- Biocomputing Unit, Centro Nacional de Biotecnología (CSIC), Calle Darwin 3, 28049 Madrid, Spain
- Carlos Oscar S. Sorzano
- Biocomputing Unit, Centro Nacional de Biotecnología (CSIC), Calle Darwin 3, 28049 Madrid, Spain
- Jose Maria Carazo
- Biocomputing Unit, Centro Nacional de Biotecnología (CSIC), Calle Darwin 3, 28049 Madrid, Spain
- Ivet Bahar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, 800 Murdoch Building, 3420 Forbes Avenue, Pittsburgh, PA 15213, USA
6
A Comparative Evaluation of the Structural and Dynamic Properties of Insect Odorant Binding Proteins. Biomolecules 2022; 12:biom12020282. [PMID: 35204784 PMCID: PMC8961588 DOI: 10.3390/biom12020282] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/12/2022] [Revised: 01/23/2022] [Accepted: 01/24/2022] [Indexed: 02/01/2023] Open
Abstract
Insects devote a major part of their metabolic resources to the production of odorant binding proteins (OBPs). Although initially, these proteins were implicated in the solubilisation, binding and transport of semiochemicals to olfactory receptors, it is now recognised that they may play diverse, as yet uncharacterised, roles in insect physiology. The structures of these OBPs, the majority of which are known as “classical” OBPs, have shed some light on their potential functional roles. However, the dynamic properties of these proteins have received little attention despite their functional importance. Structural dynamics are encoded in the native protein fold and enable the adaptation of proteins to substrate binding. This paper provides a comparative review of the structural and dynamic properties of OBPs, making use of sequence/structure analysis, statistical and theoretical physics-based methods. It provides a new layer of information and additional methodological tools useful in unravelling the relationship between structure, dynamics and function of insect OBPs. The dynamic properties of OBPs, studied by means of elastic network models, reflect the similarities/dissimilarities observed in their respective structures and provides insights regarding protein motions that may have important implications for ligand recognition and binding. Furthermore, it was shown that the OBPs studied in this paper share conserved structural ‘core’ that may be of evolutionary and functional importance.
7
Steuer J, Kukharenko O, Riedmiller K, Hartig JS, Peter C. Guanidine-II aptamer conformations and ligand binding modes through the lens of molecular simulation. Nucleic Acids Res 2021; 49:7954-7965. [PMID: 34233001 PMCID: PMC8373139 DOI: 10.1093/nar/gkab592] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/30/2020] [Revised: 06/21/2021] [Accepted: 06/24/2021] [Indexed: 12/01/2022] Open
Abstract
Regulation of gene expression via riboswitches is a widespread mechanism in bacteria. Here, we investigate ligand binding of a member of the guanidine sensing riboswitch family, the guanidine-II riboswitch (Gd-II). It consists of two stem–loops forming a dimer upon ligand binding. Using extensive molecular dynamics simulations we have identified conformational states corresponding to ligand-bound and unbound states in a monomeric stem–loop of Gd-II and studied the selectivity of this binding. To characterize these states and ligand-dependent conformational changes we applied a combination of dimensionality reduction, clustering, and feature selection methods. In absence of a ligand, the shape of the binding pocket alternates between the conformation observed in presence of guanidinium and a collapsed conformation, which is associated with a deformation of the dimerization interface. Furthermore, the structural features responsible for the ability to discriminate against closely related analogs of guanidine are resolved. Based on these insights, we propose a mechanism that couples ligand binding to aptamer dimerization in the Gd-II system, demonstrating the value of computational methods in the field of nucleic acids research.
Affiliation(s)
- Jakob Steuer
- Department of Chemistry, University of Konstanz, 78457 Konstanz, Germany; Konstanz Research School Chemical Biology (KoRS-CB), University of Konstanz, 78457 Konstanz, Germany
- Oleksandra Kukharenko
- Department of Chemistry, University of Konstanz, 78457 Konstanz, Germany; Max Planck Institute for Polymer Research, 55128 Mainz, Germany
- Kai Riedmiller
- Department of Chemistry, University of Konstanz, 78457 Konstanz, Germany
- Jörg S Hartig
- Department of Chemistry, University of Konstanz, 78457 Konstanz, Germany; Konstanz Research School Chemical Biology (KoRS-CB), University of Konstanz, 78457 Konstanz, Germany
- Christine Peter
- Department of Chemistry, University of Konstanz, 78457 Konstanz, Germany; Konstanz Research School Chemical Biology (KoRS-CB), University of Konstanz, 78457 Konstanz, Germany
8
Roda S, Santiago G, Guallar V. Mapping enzyme-substrate interactions: its potential to study the mechanism of enzymes. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2020; 122:1-31. [PMID: 32951809 DOI: 10.1016/bs.apcsb.2020.06.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/23/2022]
Abstract
With the increase of the need to use more sustainable processes for the industry in our society, the modeling of enzymes has become crucial to fully comprehend their mechanism of action and use this knowledge to enhance and design their properties. A lot of methods to study enzymes computationally exist and they have been classified on sequence-based, structure-based, and the more new artificial intelligence-based ones. Albeit the abundance of methods to help predict the function of an enzyme, molecular modeling is crucial when trying to understand the enzyme mechanism, as they aim to correlate atomistic information with experimental data. Among them, methods that simulate the system dynamics at a molecular mechanics level of theory (classical force fields) have shown to offer a comprehensive study. In this book chapter, we will analyze these techniques, emphasizing the importance of precise modeling of enzyme-substrate interactions. In the end, a brief explanation of the transference of the information from research studies to the industry is given accompanied with two examples of family enzymes where their modeling has helped their exploitation.
Affiliation(s)
- Sergi Roda
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Victor Guallar
- Barcelona Supercomputing Center (BSC), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
9
Wen Z, He J, Huang SY. Topology-independent and global protein structure alignment through an FFT-based algorithm. Bioinformatics 2020; 36:478-486. [PMID: 31384919 DOI: 10.1093/bioinformatics/btz609] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/18/2019] [Revised: 07/22/2019] [Accepted: 08/02/2019] [Indexed: 12/12/2022] Open
Abstract
Motivation: Protein structure alignment is one of the fundamental problems in computational structure biology. A variety of algorithms have been developed to address this important issue in the past decade. However, due to their heuristic nature, current structure alignment methods may suffer from suboptimal alignment and/or over-fragmentation and thus lead to a biologically wrong alignment in some cases. To overcome these limitations, we have developed an accurate topology-independent and global structure alignment method through an FFT-based exhaustive search algorithm, which is referred to as FTAlign.
Results: Our FTAlign algorithm was extensively tested on six commonly used datasets and compared with seven state-of-the-art structure alignment approaches, TMalign, DeepAlign, Kpax, 3DCOMB, MICAN, SPalignNS and CLICK. It was shown that FTAlign outperformed the other methods in reproducing manually curated alignments and obtained a high success rate of 96.7 and 90.0% on two gold-standard benchmarks, MALIDUP and MALISAM, respectively. Moreover, FTAlign also achieved the overall best performance in terms of biologically meaningful structure overlap (SO) and TMscore on both the sequential alignment test sets including MALIDUP, MALISAM and 64 difficult cases from HOMSTRAD, and the non-sequential sets including MALIDUP-NS, MALISAM-NS, 199 topology-different cases, where FTAlign especially showed more advantage for non-sequential alignment. Despite its global search feature, FTAlign is also computationally efficient and can normally complete a pairwise alignment within one second.
Availability and implementation: http://huanglab.phys.hust.edu.cn/ftalign/.
Affiliation(s)
- Zeyu Wen
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
- Jiahua He
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
- Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
10
Mirzaei S, Razmara J, Lotfi S. GADP-align: A genetic algorithm and dynamic programming-based method for structural alignment of proteins. BIOIMPACTS 2020; 11:271-279. [PMID: 34631489 PMCID: PMC8494253 DOI: 10.34172/bi.2021.37] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2020] [Revised: 06/10/2020] [Accepted: 06/16/2020] [Indexed: 11/16/2022]
Abstract
Introduction: Similarity analysis of protein structure is considered as a fundamental step to give insight into the relationships between proteins. The primary step in structural alignment is looking for the optimal correspondence between residues of two structures to optimize the scoring function. An exhaustive search for finding such a correspondence between two structures is intractable.
Methods: In this paper, a hybrid method is proposed, namely GADP-align, for pairwise protein structure alignment. The proposed method looks for an optimal alignment using a hybrid method based on a genetic algorithm and an iterative dynamic programming technique. To this end, the method first creates an initial map of correspondence between secondary structure elements (SSEs) of two proteins. Then, a genetic algorithm combined with an iterative dynamic programming algorithm is employed to optimize the alignment.
Results: The GADP-align algorithm was employed to align 10 ‘difficult to align’ protein pairs in order to evaluate its performance. The experimental study shows that the proposed hybrid method produces highly accurate alignments in comparison with the methods using exactly the dynamic programming technique. Furthermore, the proposed method prevents the local optimal traps caused by the unsuitable initial guess of the corresponding residues.
Conclusion: The findings of this paper demonstrate that employing the genetic algorithm along with the dynamic programming technique yields highly accurate alignments between a protein pair by exploring the global alignment and avoiding trapping in local alignments.
Affiliation(s)
- Soraya Mirzaei
- Department of Computer Science, Faculty of Mathematics, Statistics, and Computer Science, University of Tabriz, Tabriz, Iran
- Jafar Razmara
- Department of Computer Science, Faculty of Mathematics, Statistics, and Computer Science, University of Tabriz, Tabriz, Iran
- Shahriar Lotfi
- Department of Computer Science, Faculty of Mathematics, Statistics, and Computer Science, University of Tabriz, Tabriz, Iran
11
Fallaize CJ, Green PJ, Mardia KV, Barber S. Bayesian protein sequence and structure alignment. J R Stat Soc Ser C Appl Stat 2020. [DOI: 10.1111/rssc.12394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Affiliation(s)
- Peter J. Green
- University of Bristol, UK
- University of Technology Sydney, Australia
12
Sequence Pattern for Supersecondary Structure of Sandwich-Like Proteins. Methods Mol Biol 2019. [PMID: 30945226 DOI: 10.1007/978-1-4939-9161-7_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
The goal is to define sequence characteristics of beta-sandwich proteins that are unique for the beta-sandwich supersecondary structure (SSS). Finding of the conserved residues that are critical for protein structure can often be accomplished with homology methods, but these methods are not always adequate as residues with similar structural role do not always occupy the same position as determined by sequence alignment. In this paper, we show how to identify residues that play the same structural role in the different proteins of the same SSS, even when these residue positions cannot be aligned with sequence alignment methods. The SSS characteristics are (a) a set of positions in each strand that are involved in the formation of a hydrophobic core, residue content, and correlations of residues at these key positions, (b) maximum allowable number of "low-frequency residues" for each strand, (c) minimum allowed number of "high-frequency" residues for each loop, and (d) minimum and maximum lengths of each loop. These sequence characteristics are referred to as "sequence pattern" for their respective SSS. The high specificity and sensitivity for a particular SSS are confirmed by applying this pattern to all protein structures in the SCOP data bank. We present here the pattern for one of the most common SSS of beta-sandwich proteins.
13
Gutierrez B, Escalera-Zamudio M, Pybus OG. Parallel molecular evolution and adaptation in viruses. Curr Opin Virol 2019; 34:90-96. [PMID: 30703578 PMCID: PMC7102768 DOI: 10.1016/j.coviro.2018.12.006] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 10/08/2018] [Accepted: 12/11/2018] [Indexed: 01/05/2023]
Abstract
Parallel molecular evolution is the independent evolution of the same genotype or phenotype from distinct ancestors. The simple genomes and rapid evolution of many viruses mean they are useful model systems for studying parallel evolution by natural selection. Parallel adaptation occurs in the context of several viral behaviours, including cross-species transmission, drug resistance, and host immune escape, and its existence suggests that at least some aspects of virus evolution and emergence are repeatable and predictable. We introduce examples of virus parallel evolution and summarise key concepts. We outline the difficulties in detecting parallel adaptation using virus genomes, with a particular focus on phylogenetic and structural approaches, and we discuss future approaches that may improve our understanding of the phenomenon.
Affiliation(s)
- Oliver G Pybus
- Department of Zoology, University of Oxford, Oxford, United Kingdom.
|
14
|
Fotoohifiroozabadi S, Mohamad MS, Deris S. NAHAL-Flex: A Numerical and Alphabetical Hinge Detection Algorithm for Flexible Protein Structure Alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:934-943. [PMID: 28534783 DOI: 10.1109/tcbb.2017.2705080] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]
Abstract
Flexible proteins are proteins that have conformational changes in their structures. Protein flexibility analysis is critical for classifying and understanding protein functionality. For that analysis, the hinge areas where proteins show flexibility must be detected. To detect the location of the hinges, previous methods have utilized the three-dimensional (3D) structure of proteins, which is computationally expensive. To reduce the computational complexity, this study proposes a novel text-based method using structural alphabets (SAs) for detecting the hinge position, called NAHAL-Flex. Protein structures were encoded to a particular type of SA called the protein folding shape code (PFSC), which remains unaffected by location, scale, and rotation. The flexible regions of the proteins are the only places in which letter sequences can be distorted. With this knowledge, it is possible to find the longest alignment path of two letter sequences using a dynamic programming (DP) algorithm. Then, the proposed method looks for regions where the alphabet sequence is distorted to find the most probable hinge positions. In order to reduce the number of hinge positions, a genetic algorithm (GA) was utilized to find the best candidate hinge points. To evaluate the method's effectiveness, four different flexible and rigid protein databases, including two small datasets and two large datasets, were utilized. For the small datasets, the NAHAL-Flex method was comparable to state-of-the-art structural flexible alignment methods. The results for the large datasets show that NAHAL-Flex outperforms some well-known alignment methods, e.g., DaliLite, Matt, DeepAlign, and TM-align; NAHAL-Flex was faster and its results were more accurate than the other methods.
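The dynamic programming step mentioned in this abstract can be illustrated with a longest-common-subsequence recurrence over two letter strings. This is only a minimal sketch, not the NAHAL-Flex implementation: the function name `longest_common_path` and the example strings are hypothetical, and the letters are placeholders rather than real PFSC codes.

```python
def longest_common_path(a: str, b: str) -> str:
    """Return one longest common subsequence of two letter sequences."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Trace back from the bottom-right corner to recover the matched letters.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            path.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(path))


# Hypothetical structural-alphabet strings that differ only in a middle segment.
print(longest_common_path("AAHHBBAA", "AAHHCCAA"))  # AAHHAA
```

Positions that drop out of the common path (here the `BB`/`CC` segment) are the kind of distorted region a hinge search would then inspect.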
|
15
|
GRAFENE: Graphlet-based alignment-free network approach integrates 3D structural and sequence (residue order) data to improve protein structural comparison. Sci Rep 2017; 7:14890. [PMID: 29097661 PMCID: PMC5668259 DOI: 10.1038/s41598-017-14411-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 11/29/2016] [Accepted: 10/11/2017] [Indexed: 12/26/2022] Open
Abstract
Initial protein structural comparisons were sequence-based. Since amino acids that are distant in the sequence can be close in the 3-dimensional (3D) structure, 3D contact approaches can complement sequence approaches. Traditional 3D contact approaches study 3D structures directly and are alignment-based. Instead, 3D structures can be modeled as protein structure networks (PSNs). Then, network approaches can compare proteins by comparing their PSNs. These can be alignment-based or alignment-free. We focus on the latter. Existing network alignment-free approaches have drawbacks: 1) They rely on naive measures of network topology. 2) They are not robust to PSN size. They cannot integrate 3) multiple PSN measures or 4) PSN data with sequence data, although this could improve comparison because the different data types capture complementary aspects of the protein structure. We address this by: 1) exploiting well-established graphlet measures via a new network alignment-free approach, 2) introducing normalized graphlet measures to remove the bias of PSN size, 3) allowing for integrating multiple PSN measures, and 4) using ordered graphlets to combine the complementary PSN data and sequence (specifically, residue order) data. We compare synthetic networks and real-world PSNs more accurately and faster than existing network (alignment-free and alignment-based), 3D contact, or sequence approaches.
|
16
|
Barlowe S, Coan HB, Youker RT. SubVis: an interactive R package for exploring the effects of multiple substitution matrices on pairwise sequence alignment. PeerJ 2017; 5:e3492. [PMID: 28674656 PMCID: PMC5490468 DOI: 10.7717/peerj.3492] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/16/2017] [Accepted: 05/27/2017] [Indexed: 01/13/2023] Open
Abstract
Understanding how proteins mutate is critical to solving a host of biological problems. Mutations occur when an amino acid is substituted for another in a protein sequence. The set of likelihoods for amino acid substitutions is stored in a matrix and input to alignment algorithms. The quality of the resulting alignment is used to assess the similarity of two or more sequences and can vary according to assumptions modeled by the substitution matrix. Substitution strategies with minor parameter variations are often grouped together in families. For example, the BLOSUM and PAM matrix families are commonly used because they provide a standard, predefined way of modeling substitutions. However, researchers often do not know if a given matrix family or any individual matrix within a family is the most suitable. Furthermore, predefined matrix families may inaccurately reflect a particular hypothesis that a researcher wishes to model or otherwise result in unsatisfactory alignments. In these cases, the ability to compare the effects of one or more custom matrices may be needed. This laborious process is often performed manually because the ability to simultaneously load multiple matrices and then compare their effects on alignments is not readily available in current software tools. This paper presents SubVis, an interactive R package for loading and applying multiple substitution matrices to pairwise alignments. Users can simultaneously explore alignments resulting from multiple predefined and custom substitution matrices. SubVis utilizes several of the alignment functions found in R, a common language among protein scientists. Functions are tied together with the Shiny platform which allows the modification of input parameters. Information regarding alignment quality and individual amino acid substitutions is displayed with the JavaScript language which provides interactive visualizations for revealing both high-level and low-level alignment information.
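To make the effect of the substitution matrix concrete, the sketch below scores the same pair of sequences with a global (Needleman-Wunsch) alignment under two scoring schemes. It is written in Python rather than R and does not use SubVis; the function names, the gap penalty, and both toy scoring schemes are assumptions made for illustration and are not BLOSUM or PAM values.

```python
def global_alignment_score(s, t, score, gap=-2):
    """Needleman-Wunsch optimal global alignment score with a linear gap penalty."""
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # substitution
                dp[i - 1][j] + gap,                            # gap in t
                dp[i][j - 1] + gap,                            # gap in s
            )
    return dp[n][m]


def identity_matrix(a, b):
    """Toy scheme: reward identities only."""
    return 1 if a == b else -1


# Hypothetical residue groups treated as conservative substitutions.
SIMILAR_GROUPS = ({"I", "L", "V"}, {"D", "E"}, {"K", "R"})


def conservative_matrix(a, b):
    """Toy scheme: identities score 2, substitutions within a group score 1."""
    if a == b:
        return 2
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return 1
    return -1


print(global_alignment_score("LKE", "IRD", identity_matrix))      # -3
print(global_alignment_score("LKE", "IRD", conservative_matrix))  # 3
```

The two sequences share no identical residues, so the identity scheme rates them as unrelated, while the scheme that credits conservative substitutions rates them as similar. This is the kind of difference a matrix-comparison tool is meant to surface.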
Affiliation(s)
- Scott Barlowe
- Department of Mathematics and Computer Science, Western Carolina University, Cullowhee, NC, United States of America
- Heather B Coan
- Department of Biology, Western Carolina University, Cullowhee, NC, United States of America
- Robert T Youker
- Department of Biology, Western Carolina University, Cullowhee, NC, United States of America
|
17
|
Collier JH, Allison L, Lesk AM, Stuckey PJ, Garcia de la Banda M, Konagurthu AS. Statistical inference of protein structural alignments using information and compression. Bioinformatics 2017; 33:1005-1013. [PMID: 28065899 DOI: 10.1093/bioinformatics/btw757] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/11/2016] [Accepted: 11/22/2016] [Indexed: 11/14/2022] Open
Abstract
Motivation Structural molecular biology depends crucially on computational techniques that compare protein three-dimensional structures and generate structural alignments (the assignment of one-to-one correspondences between subsets of amino acids based on atomic coordinates). Despite its importance, the structural alignment problem has not been formulated, much less solved, in a consistent and reliable way. To overcome these difficulties, we present here a statistical framework for the precise inference of structural alignments, built on the Bayesian and information-theoretic principle of Minimum Message Length (MML). The quality of any alignment is measured by its explanatory power: the amount of lossless compression achieved to explain the protein coordinates using that alignment. Results We have implemented this approach in MMLigner, the first program able to infer statistically significant structural alignments. We also demonstrate the reliability of MMLigner's alignment results when compared with the state of the art. Importantly, MMLigner can also discover different structural alignments of comparable quality, a challenging problem for oligomers and protein complexes. Availability and Implementation Source code, binaries and an interactive web version are available at http://lcb.infotech.monash.edu.au/mmligner. Contact arun.konagurthu@monash.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- James H Collier
- Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Lloyd Allison
- Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Arthur M Lesk
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
- Peter J Stuckey
- Department of Computing and Information Systems, University of Melbourne, Parkville, VIC 3010, Australia
- Arun S Konagurthu
- Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
|
18
|
Cao H, Lu Y. Using Variable-Length Aligned Fragment Pairs and an Improved Transition Function for Flexible Protein Structure Alignment. J Comput Biol 2017; 24:2-12. [DOI: 10.1089/cmb.2016.0135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Affiliation(s)
- Hu Cao
- School of Information Science and Engineering, Lanzhou University, Gansu 730000, Lanzhou, China
- Yonggang Lu
- School of Information Science and Engineering, Lanzhou University, Gansu 730000, Lanzhou, China
|
19
|
Zhou RB, Lu HM, Liu J, Shi JY, Zhu J, Lu QQ, Yin DC. A Systematic Analysis of the Structures of Heterologously Expressed Proteins and Those from Their Native Hosts in the RCSB PDB Archive. PLoS One 2016; 11:e0161254. [PMID: 27517583 PMCID: PMC4982684 DOI: 10.1371/journal.pone.0161254] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/15/2016] [Accepted: 08/02/2016] [Indexed: 11/18/2022] Open
Abstract
Recombinant expression of proteins has become an indispensable tool in modern-day research. The large yields of recombinantly expressed proteins accelerate the structural and functional characterization of proteins. Nevertheless, there are reports in the literature that recombinant proteins show some differences in structure and function compared with the native ones. More than 100,000 structures (from both recombinant and native sources) are now publicly available in the Protein Data Bank (PDB) archive, which makes it possible to investigate whether any proteins in the RCSB PDB archive have identical sequences but differ in structure. In this paper, we present the results of a systematic comparative study of the 3D structures of identical naturally purified versus recombinantly expressed proteins. The structural data and sequence information of the proteins were mined from the RCSB PDB archive. The combinatorial extension (CE), FATCAT-flexible and TM-Align methods were employed to align the protein structures. The root-mean-square distance (RMSD), TM-score, P-value, Z-score, secondary structural elements and hydrogen bonds were used to assess the structure similarity. A thorough analysis of the PDB archive generated 517 pairs of native and recombinant proteins that have identical sequences. No pairs of proteins had the same sequence and a significantly different structural fold, which supports the hypothesis that proteins expressed in a heterologous host usually fold correctly into their native forms.
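Two of the similarity measures named in this abstract, RMSD and TM-score, can be sketched for residue pairs that are already aligned and superposed. This is an illustrative sketch, not the code used in the study: the function names are hypothetical, the TM-score is evaluated for one fixed superposition instead of being maximised over superpositions as the full definition requires, and clamping the distance scale `d0` at 0.5 Å for short chains is an assumption of this sketch.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between paired, already superposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def tm_score_fixed(coords_a, coords_b, l_target):
    """TM-score of the aligned pairs for one fixed superposition.

    Uses d0 = 1.24 * (L - 15)^(1/3) - 1.8 with L the target length;
    the 0.5 lower bound on d0 is an assumption made here for short chains.
    """
    if l_target <= 0:
        raise ValueError("l_target must be positive")
    d0 = max(1.24 * max(l_target - 15, 0) ** (1.0 / 3.0) - 1.8, 0.5)
    total = sum(
        1.0 / (1.0 + (math.dist(p, q) / d0) ** 2)
        for p, q in zip(coords_a, coords_b)
    )
    return total / l_target


# Hypothetical C-alpha traces: a straight chain and a copy shifted by 3 Å.
native = [(float(i), 0.0, 0.0) for i in range(20)]
shifted = [(x + 3.0, y, z) for x, y, z in native]

print(rmsd(native, native), tm_score_fixed(native, native, 20))  # 0.0 1.0
print(rmsd(native, shifted))
```

RMSD grows without bound as structures diverge and depends on chain length, whereas the TM-score is normalised to the range (0, 1], which is one reason studies of this kind report both.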
Affiliation(s)
- Ren-Bin Zhou
- Institute for Special Environmental Biophysics, Key Laboratory for Space Bioscience and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi’an, Shaanxi, PR China
- Hui-Meng Lu
- Institute for Special Environmental Biophysics, Key Laboratory for Space Bioscience and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi’an, Shaanxi, PR China
- Jie Liu
- School of Computer Science and Technology, Xidian University, Xi’an, PR China
- Jian-Yu Shi
- Institute for Special Environmental Biophysics, Key Laboratory for Space Bioscience and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi’an, Shaanxi, PR China
- Jing Zhu
- Institute for Special Environmental Biophysics, Key Laboratory for Space Bioscience and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi’an, Shaanxi, PR China
- Qin-Qin Lu
- Institute for Special Environmental Biophysics, Key Laboratory for Space Bioscience and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi’an, Shaanxi, PR China
- Da-Chuan Yin
- Institute for Special Environmental Biophysics, Key Laboratory for Space Bioscience and Biotechnology, School of Life Sciences, Northwestern Polytechnical University, Xi’an, Shaanxi, PR China
|
20
|
Bietz S, Fährrolfes R, Rarey M. The Art of Compiling Protein Binding Site Ensembles. Mol Inform 2016; 35:593-598. [PMID: 27870245 DOI: 10.1002/minf.201600043] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 03/30/2016] [Accepted: 04/25/2016] [Indexed: 01/24/2023]
Abstract
Structure-based drug design starts with the collection, preparation, and initial analysis of protein structures. With more than 115,000 structures publicly available in the Protein Data Bank (PDB), fully automated processes reliably performing these important preprocessing steps are needed. Several tools are available for these tasks; however, most of them do not address the special needs of scientists interested in protein-ligand interactions. In this paper, we summarize our research activities towards an automated processing pipeline from raw PDB data towards ready-to-use protein binding site ensembles. Starting from a single protein structure, the pipeline covers the following phases: extracting structurally related binding sites from the PDB, aligning disconnected binding site sequences, resolving tautomeric forms and protonation, orienting hydrogens and flippable side-chains, structurally aligning the multitude of binding sites, and performing a reasonable reduction of ensemble structures. The pipeline, named SIENA, creates protein-structural ensembles for the analysis of protein flexibility and for molecular design efforts like docking or de novo design within seconds. For the first time, we are able to process the whole PDB in order to create a large collection of protein binding site ensembles. SIENA is available as part of the ZBH ProteinsPlus webserver under http://proteinsplus.zbh.uni-hamburg.de.
Affiliation(s)
- Stefan Bietz
- University of Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146, Hamburg, Germany
- Rainer Fährrolfes
- University of Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146, Hamburg, Germany
- Matthias Rarey
- University of Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146, Hamburg, Germany
|
21
|
Ritchie DW. Calculating and scoring high quality multiple flexible protein structure alignments. Bioinformatics 2016; 32:2650-8. [PMID: 27187202 DOI: 10.1093/bioinformatics/btw300] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/21/2015] [Accepted: 05/07/2016] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Calculating multiple protein structure alignments (MSAs) is important for understanding functional and evolutionary relationships between protein families, and for modeling protein structures by homology. While incorporating backbone flexibility promises to circumvent many of the limitations of rigid MSA algorithms, very few flexible MSA algorithms exist today. This article describes several novel improvements to the Kpax algorithm which allow high quality flexible MSAs to be calculated. This article also introduces a new Gaussian-based MSA quality measure called 'M-score', which circumvents the pitfalls of RMSD-based quality measures. RESULTS As well as calculating flexible MSAs, the new version of Kpax can also score MSAs from other aligners and from previously aligned reference datasets. Results are presented for a large-scale evaluation of the Homstrad, SABmark and SISY benchmark sets using Kpax and Matt as examples of state-of-the-art flexible aligners and 3DCOMB as an example of a state-of-the-art rigid aligner. These results demonstrate the utility of the M-score as a measure of MSA quality and show that high quality MSAs may be achieved when structural flexibility is properly taken into account. AVAILABILITY AND IMPLEMENTATION Kpax 5.0 may be downloaded for academic use at http://kpax.loria.fr/ CONTACT dave.ritchie@inria.fr SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
|
22
|
Babbitt GA, Coppola EE, Alawad MA, Hudson AO. Can all heritable biology really be reduced to a single dimension? Gene 2016; 578:162-8. [DOI: 10.1016/j.gene.2015.12.043] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 10/28/2015] [Revised: 12/16/2015] [Accepted: 12/17/2015] [Indexed: 12/23/2022]
|
23
|
Brown P, Pullan W, Yang Y, Zhou Y. Fast and accurate non-sequential protein structure alignment using a new asymmetric linear sum assignment heuristic. Bioinformatics 2015; 32:370-7. [PMID: 26454279 DOI: 10.1093/bioinformatics/btv580] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 06/30/2015] [Accepted: 10/04/2015] [Indexed: 01/24/2023] Open
Abstract
MOTIVATION The three-dimensional tertiary structure of a protein at near-atomic resolution provides insight into its function and evolution. As protein structure determines its functionality, similarity in structure usually implies similarity in function. As such, structure alignment techniques are often useful in the classification of protein function. Given the rapidly growing rate of new, experimentally determined structures being made available from repositories such as the Protein Data Bank, fast and accurate computational structure comparison tools are required. This paper presents SPalignNS, a non-sequential protein structure alignment tool using a novel asymmetrical greedy search technique. RESULTS The performance of SPalignNS was evaluated against existing sequential and non-sequential structure alignment methods by performing trials with commonly used datasets. These benchmark datasets used to gauge alignment accuracy include (i) 9538 pairwise alignments implied by the HOMSTRAD database of homologous proteins; (ii) a subset of 64 difficult alignments from set (i) that have low structure similarity; (iii) 199 pairwise alignments of proteins with similar structure but different topology; and (iv) a subset of 20 pairwise alignments from the RIPC set. SPalignNS is shown to achieve greater alignment accuracy (lower or comparable root-mean squared distance with increased structure overlap coverage) for all datasets, and the highest agreement with reference alignments from the challenging dataset (iv) above, when compared with both sequentially constrained alignments and other non-sequential alignments. AVAILABILITY AND IMPLEMENTATION SPalignNS was implemented in C++. The source code, binary executable, and a web server version are freely available at: http://sparks-lab.org CONTACT yaoqi.zhou@griffith.edu.au.
Affiliation(s)
- Peter Brown
- School of ICT, Griffith University, Gold Coast, QLD 4222, Australia
- Wayne Pullan
- School of ICT, Griffith University, Gold Coast, QLD 4222, Australia
- Yuedong Yang
- Institute for Glycomics, Griffith University, Gold Coast, QLD 4222, Australia
- Yaoqi Zhou
- School of ICT, Griffith University, Gold Coast, QLD 4222, Australia; Institute for Glycomics, Griffith University, Gold Coast, QLD 4222, Australia
|
24
|
Zhou CLE. CombAlign: a code for generating a one-to-many sequence alignment from a set of pairwise structure-based sequence alignments. Source Code for Biology and Medicine 2015; 10:9. [PMID: 26246852 PMCID: PMC4526201 DOI: 10.1186/s13029-015-0039-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/03/2015] [Accepted: 07/14/2015] [Indexed: 11/29/2022]
Abstract
Background In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. Results This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. Conclusions CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository. Electronic supplementary material The online version of this article (doi:10.1186/s13029-015-0039-1) contains supplementary material, which is available to authorized users.
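The idea of combining pairwise alignments that share a reference into one gapped, one-to-many alignment can be sketched as follows. This is not the CombAlign source code: the function name `merge_on_reference`, the input format (a list of `(reference_row, other_row)` string pairs), and the left-justified padding of insertions are all assumptions made for illustration.

```python
def merge_on_reference(pairs, gap="-"):
    """Merge pairwise alignments sharing one reference into a gapped multiple alignment.

    Returns the gapped reference row followed by one row per input pair.
    """
    ref_seq = pairs[0][0].replace(gap, "")
    n = len(ref_seq)

    parsed = []
    for ref_row, other_row in pairs:
        if len(ref_row) != len(other_row):
            raise ValueError("rows of a pairwise alignment must have equal length")
        if ref_row.replace(gap, "") != ref_seq:
            raise ValueError("all pairs must align the same reference sequence")
        inserts = [""] * (n + 1)  # residues inserted before reference position k
        aligned = []              # character opposite each reference residue
        k = 0
        for r, o in zip(ref_row, other_row):
            if r == gap:
                inserts[k] += o
            else:
                aligned.append(o)
                k += 1
        parsed.append((inserts, aligned))

    # Each insertion slot must be wide enough for the longest insertion seen there.
    widths = [max(len(ins[k]) for ins, _ in parsed) for k in range(n + 1)]

    def build(inserts, aligned):
        parts = []
        for k in range(n + 1):
            parts.append(inserts[k].ljust(widths[k], gap))
            if k < n:
                parts.append(aligned[k])
        return "".join(parts)

    rows = [build([""] * (n + 1), list(ref_seq))]
    rows.extend(build(ins, ali) for ins, ali in parsed)
    return rows


# Hypothetical pairwise alignments against the reference "ACDE".
for row in merge_on_reference([("AC-DE", "ACXDE"), ("ACDE", "A-DE")]):
    print(row)
# AC-DE
# ACXDE
# A--DE
```

Allowing gaps in the reference row is what lets an insertion in one sequence (the `X` above) appear in the merged alignment without shifting the other sequences out of register.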
Affiliation(s)
- Carol L Ecale Zhou
- Computational Biology Group, Global Security Computing Applications Division, Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, CA 94550 USA
|
25
|
AcconPred: Predicting Solvent Accessibility and Contact Number Simultaneously by a Multitask Learning Framework under the Conditional Neural Fields Model. BioMed Research International 2015; 2015:678764. [PMID: 26339631 PMCID: PMC4538422 DOI: 10.1155/2015/678764] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 12/27/2014] [Accepted: 03/11/2015] [Indexed: 12/14/2022]
Abstract
Motivation. The solvent accessibility of protein residues is one of the driving forces of protein folding, while the contact number of protein residues limits the possibilities of protein conformations. The de novo prediction of these properties from protein sequence is important for the study of protein structure and function. Although these two properties are certainly related to each other, it is challenging to exploit this dependency for the prediction. Method. We present a method, AcconPred, for predicting solvent accessibility and contact number simultaneously, which is based on a shared weight multitask learning framework under the CNF (conditional neural fields) model. The multitask learning framework on a collection of related tasks provides more accurate prediction than the framework trained only on a single task. The CNF method not only models the complex relationship between the input features and the predicted labels, but also exploits the interdependency among adjacent labels. Results. Trained on 5729 monomeric soluble globular protein datasets, AcconPred could reach 0.68 three-state accuracy for solvent accessibility and 0.75 correlation for contact number. Tested on the 105 CASP11 domain datasets for solvent accessibility, AcconPred could reach 0.64 accuracy, which outperforms existing methods.
|
26
|
DeepCNF-D: Predicting Protein Order/Disorder Regions by Weighted Deep Convolutional Neural Fields. Int J Mol Sci 2015; 16:17315-30. [PMID: 26230689 PMCID: PMC4581195 DOI: 10.3390/ijms160817315] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 05/28/2015] [Revised: 07/15/2015] [Accepted: 07/16/2015] [Indexed: 12/14/2022] Open
Abstract
Intrinsically disordered proteins or protein regions are involved in key biological processes including regulation of transcription, signal transduction, and alternative splicing. Accurately predicting order/disorder regions ab initio from the protein sequence is a prerequisite step for further analysis of functions and mechanisms for these disordered regions. This work presents a learning method, weighted DeepCNF (Deep Convolutional Neural Fields), to improve the accuracy of order/disorder prediction by exploiting the long-range sequential information and the interdependency between adjacent order/disorder labels and by assigning different weights for each label during training and prediction to solve the label imbalance issue. Evaluated by the CASP9 and CASP10 targets, our method obtains 0.855 and 0.898 AUC values, which are higher than the state-of-the-art single ab initio predictors.
|
27
|
Fuglebakk E, Tiwari SP, Reuter N. Comparing the intrinsic dynamics of multiple protein structures using elastic network models. Biochim Biophys Acta Gen Subj 2014; 1850:911-922. [PMID: 25267310 DOI: 10.1016/j.bbagen.2014.09.021] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.1] [Received: 06/30/2014] [Revised: 09/15/2014] [Accepted: 09/16/2014] [Indexed: 12/15/2022]
Abstract
BACKGROUND Elastic network models (ENMs) are based on the simple idea that a protein can be described as a set of particles connected by springs, which can then be used to describe its intrinsic flexibility using, for example, normal mode analysis. Since the introduction of the first ENM by Monique Tirion in 1996, several variants using coarser protein models have been proposed and their reliability for the description of protein intrinsic dynamics has been widely demonstrated. Lately an increasing number of studies have focused on the meaning of slow dynamics for protein function and its potential conservation through evolution. This leads naturally to comparisons of the intrinsic dynamics of multiple protein structures with varying levels of similarity. SCOPE OF REVIEW We describe computational strategies for calculating and comparing intrinsic dynamics of multiple proteins using elastic network models, as well as a selection of examples from the recent literature. MAJOR CONCLUSIONS The increasing interest in comparing dynamics across protein structures with various levels of similarity has led to the establishment and validation of reliable computational strategies using ENMs. Comparing dynamics has been shown to be a viable way of gaining a greater understanding of the mechanisms employed by proteins for their function. Choices of ENM parameters, structure alignment or similarity measures will likely influence the interpretation of the comparative analysis of protein motion. GENERAL SIGNIFICANCE Understanding the relation between protein function and dynamics is relevant to the fundamental understanding of protein structure-dynamics-function relationship. This article is part of a Special Issue entitled Recent developments of molecular dynamics.
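The "particles connected by springs" picture can be made concrete with the connectivity (Kirchhoff) matrix used in one common coarse-grained variant, the Gaussian network model. The sketch below is illustrative only and is not taken from the review: the function name, the 7 Å cutoff default, and the three-point example are assumptions, and the eigen-decomposition that yields the normal modes is left out to keep the example dependency-free.

```python
import math


def kirchhoff_matrix(coords, cutoff=7.0):
    """Connectivity matrix of an elastic network built on one node per residue.

    Off-diagonal entries are -1 for node pairs within `cutoff` (a spring),
    and each diagonal entry counts the springs attached to that node.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


# Hypothetical C-alpha positions spaced 5 Å apart along a line.
nodes = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
for row in kirchhoff_matrix(nodes):
    print(row)
# [1, -1, 0]
# [-1, 2, -1]
# [0, -1, 1]
```

The slow modes discussed in the abstract correspond to the smallest non-zero eigenvalues of this matrix, so the cutoff chosen here is one of the ENM parameters that can change a comparative analysis.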
Affiliation(s)
- Edvin Fuglebakk
- Department of Molecular Biology, University of Bergen, Pb. 7803, N-5020 Bergen, Norway; Computational Biology Unit, Department of Informatics, University of Bergen, Pb. 7803, N-5020 Bergen, Norway.
- Sandhya P Tiwari
- Department of Molecular Biology, University of Bergen, Pb. 7803, N-5020 Bergen, Norway; Computational Biology Unit, Department of Informatics, University of Bergen, Pb. 7803, N-5020 Bergen, Norway.
- Nathalie Reuter
- Department of Molecular Biology, University of Bergen, Pb. 7803, N-5020 Bergen, Norway; Computational Biology Unit, Department of Informatics, University of Bergen, Pb. 7803, N-5020 Bergen, Norway.
|