1
Zheng F, Liu Y, Yang Y, Wen Y, Li M. Assessing computational tools for predicting protein stability changes upon missense mutations using a new dataset. Protein Sci 2024; 33:e4861. [PMID: 38084013 PMCID: PMC10751734 DOI: 10.1002/pro.4861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 11/14/2023] [Accepted: 12/06/2023] [Indexed: 12/28/2023]
Abstract
Insight into how mutations affect protein stability is crucial for protein engineering, understanding genetic diseases, and exploring protein evolution. Numerous computational methods have been developed to predict the impact of amino acid substitutions on protein stability. Nevertheless, comparing these methods poses challenges due to variations in their training data. Moreover, it is observed that they tend to perform better at predicting destabilizing mutations than stabilizing ones. Here, we meticulously compiled a new dataset from three recently published databases: ThermoMutDB, FireProtDB, and ProThermDB. This dataset, which does not overlap with the well-established S2648 dataset, consists of 4038 single-point mutations, including over 1000 stabilizing mutations. We assessed these mutations using 27 computational methods, including the latest ones utilizing mega-scale stability datasets and transfer learning. We excluded entries with overlap or similarity to training datasets to ensure fairness. Pearson correlation coefficients for the tested tools ranged from 0.20 to 0.53 on unseen data, and none of the methods could accurately predict stabilizing mutations, even those performing well in anti-symmetric property analysis. While most methods present consistent trends for predicting destabilizing mutations across various properties such as solvent exposure and secondary conformation, stabilizing mutations do not exhibit a clear pattern. Our study also suggests that solely addressing training dataset bias may not significantly enhance accuracy of predicting stabilizing mutations. These findings emphasize the importance of developing precise predictive methods for stabilizing mutations.
Affiliation(s)
- Feifan Zheng
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yang Liu
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yan Yang
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Yuhao Wen
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
- Minghui Li
- MOE Key Laboratory of Geriatric Diseases and Immunology, School of Biology and Basic Medical Sciences, Suzhou Medical College of Soochow University, Suzhou, China
2
Wu H, Chen Q, Zhang W, Mu W. Overview of strategies for developing high thermostability industrial enzymes: Discovery, mechanism, modification and challenges. Crit Rev Food Sci Nutr 2021; 63:2057-2073. [PMID: 34445912 DOI: 10.1080/10408398.2021.1970508] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 12/21/2022]
Abstract
Biocatalysts such as enzymes are environmentally friendly and substrate-specific, which makes them preferred in the production of various industrial products. However, the strict reaction conditions in industry, including high temperature, organic solvents, strong acids and bases and other harsh environments, often destabilize enzymes, and thus substantially compromise their catalytic functions and greatly restrict their applications in the food, pharmaceutical, textile, bio-refining and feed industries. Therefore, developing industrial enzymes with high thermostability becomes very important in industry, as thermozymes have more advantages under high temperature. Discovering new thermostable enzymes using genome sequencing, metagenomics and sample isolation from extreme environments, or performing molecular modification of existing enzymes with poor thermostability using emerging protein engineering technology, has become an effective means of obtaining thermozymes. Based on the thermozymes as biocatalytic chips in industry, this review systematically analyzes the ways to discover thermostable enzymes from extreme environments, clarifies various interaction forces that affect the thermal stability of enzymes, and proposes different strategies to improve enzymes' thermostability. Furthermore, the latest developments in the thermal stability modification of industrial enzymes through rational design strategies are comprehensively introduced from a structure-activity relationship point of view. Challenges and future research perspectives are put forward as well.
Affiliation(s)
- Hao Wu
- State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
- Qiuming Chen
- State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
- Wenli Zhang
- State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China
- Wanmeng Mu
- State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, Jiangsu, China; International Joint Laboratory on Food Safety, Jiangnan University, Wuxi, Jiangsu, China
3
Chen Y, Lu H, Zhang N, Zhu Z, Wang S, Li M. PremPS: Predicting the impact of missense mutations on protein stability. PLoS Comput Biol 2020; 16:e1008543. [PMID: 33378330 PMCID: PMC7802934 DOI: 10.1371/journal.pcbi.1008543] [Citation(s) in RCA: 93] [Impact Index Per Article: 23.3] [Received: 07/23/2020] [Revised: 01/12/2021] [Accepted: 11/16/2020] [Indexed: 12/12/2022]
Abstract
Computational methods that predict protein stability changes induced by missense mutations have made a lot of progress over the past decades. Most of the available methods, however, have very limited accuracy in predicting stabilizing mutations because existing experimental sets are dominated by mutations reducing protein stability. Moreover, few approaches could consistently perform well across different test cases. To address these issues, we developed a new computational method, PremPS, to more accurately evaluate the effects of missense mutations on protein stability. The PremPS method is composed of only ten evolutionary- and structure-based features and parameterized on a balanced dataset with an equal number of stabilizing and destabilizing mutations. A comprehensive comparison of the predictive performance of PremPS with other available methods on nine benchmark datasets confirms that our approach consistently outperforms other methods and shows considerable improvement in estimating the impacts of stabilizing mutations. A protein could have multiple structures available, and if another structure of the same protein is used, the predicted change in stability for structure-based methods might be different. Thus, we further estimated the impact of using different structures on prediction accuracy, and demonstrate that our method performs well across different types of structures except for low-resolution structures and models built based on templates with low sequence identity. PremPS can be used for finding functionally important variants, revealing the molecular mechanisms of functional influences and protein design. PremPS is freely available at https://lilab.jysw.suda.edu.cn/research/PremPS/. It supports large-scale mutational scanning, takes about four minutes to perform calculations for a single mutation in a protein with ~300 residues, and requires ~0.4 seconds for each additional mutation.
Affiliation(s)
- Yuting Chen
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Haoyu Lu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Ning Zhang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Zefeng Zhu
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Shuqin Wang
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
- Minghui Li
- Center for Systems Biology, Department of Bioinformatics, School of Biology and Basic Medical Sciences, Soochow University, Suzhou, China
4
Zaucha J, Heinzinger M, Kulandaisamy A, Kataka E, Salvádor ÓL, Popov P, Rost B, Gromiha MM, Zhorov BS, Frishman D. Mutations in transmembrane proteins: diseases, evolutionary insights, prediction and comparison with globular proteins. Brief Bioinform 2020; 22:5872174. [PMID: 32672331 DOI: 10.1093/bib/bbaa132] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/14/2020] [Revised: 05/26/2020] [Accepted: 05/28/2020] [Indexed: 12/18/2022]
Abstract
Membrane proteins are unique in that they interact with lipid bilayers, making them indispensable for transporting molecules and relaying signals between and across cells. Due to the significance of the protein's functions, mutations often have profound effects on the fitness of the host. This is apparent both from experimental studies, which implicated numerous missense variants in diseases, as well as from evolutionary signals that allow elucidating the physicochemical constraints that intermembrane and aqueous environments bring. In this review, we report on the current state of knowledge acquired on missense variants (referred to as single amino acid variants) affecting membrane proteins as well as the insights that can be extrapolated from data already available. This includes an overview of the annotations for membrane protein variants that have been collated within databases dedicated to the topic, bioinformatics approaches that leverage evolutionary information in order to shed light on previously uncharacterized membrane protein structures or interaction interfaces, tools for predicting the effects of mutations tailored specifically towards the characteristics of membrane proteins as well as two clinically relevant case studies explaining the implications of mutated membrane proteins in cancer and cardiomyopathy.
Affiliation(s)
- Jan Zaucha
- Department of Bioinformatics of the TUM School of Life Sciences Weihenstephan in Freising, Germany
- Michael Heinzinger
- Department of Informatics, Bioinformatics and Computational Biology of the TUM Faculty of Informatics in Garching, Germany
- A Kulandaisamy
- Department of Biotechnology of the IIT Bhupat and Jyoti Mehta School of BioSciences in Madras, India
- Evans Kataka
- Department of Bioinformatics of the TUM School of Life Sciences Weihenstephan in Freising, Germany
- Óscar Llorian Salvádor
- Department of Informatics, Bioinformatics and Computational Biology of the TUM Faculty of Informatics in Garching, Germany
- Petr Popov
- Center for Computational and Data-Intensive Science and Engineering of the Skolkovo Institute of Science and Technology in Moscow, Russia
- Burkhard Rost
- Department of Informatics, Bioinformatics and Computational Biology at the TUM Faculty of Informatics in Garching, Germany
- Boris S Zhorov
- Department of Biochemistry and Biomedical Sciences, McMaster University in Hamilton, Canada
- Dmitrij Frishman
- Department of Bioinformatics at the TUM School of Life Sciences Weihenstephan in Freising, Germany
5
Marabotti A, Scafuri B, Facchiano A. Predicting the stability of mutant proteins by computational approaches: an overview. Brief Bioinform 2020; 22:5850907. [PMID: 32496523 DOI: 10.1093/bib/bbaa074] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 02/12/2020] [Revised: 04/07/2020] [Accepted: 04/10/2020] [Indexed: 01/06/2023]
Abstract
A very large number of computational methods to predict the change in thermodynamic stability of proteins due to mutations have been developed during the last 30 years, and many different web servers are currently available. Nevertheless, most of them suffer from severe drawbacks that decrease their general reliability and, consequently, their applicability to different goals such as protein engineering or the predictions of the effects of mutations in genetic diseases. In this review, we have summarized all the main approaches used to develop these tools, with a survey of the web servers currently available. Moreover, we have also reviewed the different assessments made during the years, in order to allow the reader to check directly the different performances of these tools, to select the one that best fits his/her needs, and to help naïve users in finding the best option for their needs.
6
Broom A, Trainor K, Jacobi Z, Meiering EM. Computational Modeling of Protein Stability: Quantitative Analysis Reveals Solutions to Pervasive Problems. Structure 2020; 28:717-726.e3. [DOI: 10.1016/j.str.2020.04.003] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/22/2019] [Revised: 03/26/2020] [Accepted: 04/06/2020] [Indexed: 12/20/2022]
7
Pandurangan AP, Blundell TL. Prediction of impacts of mutations on protein structure and interactions: SDM, a statistical approach, and mCSM, using machine learning. Protein Sci 2020; 29:247-257. [PMID: 31693276 PMCID: PMC6933854 DOI: 10.1002/pro.3774] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 08/30/2019] [Revised: 10/31/2019] [Accepted: 10/31/2019] [Indexed: 02/02/2023]
Abstract
Next-generation sequencing methods have not only allowed an understanding of genome sequence variation during the evolution of organisms but have also provided invaluable information about genetic variants in inherited disease and the emergence of resistance to drugs in cancers and infectious disease. A challenge is to distinguish mutations that are drivers of disease or drug resistance from passengers that are neutral or even selectively advantageous to the organism. This requires an understanding of impacts of missense mutations in gene expression and regulation, and on the disruption of protein function by modulating protein stability or disturbing interactions with proteins, nucleic acids, small molecule ligands, and other biological molecules. Experimental approaches to understanding differences between wild-type and mutant proteins are most accurate but are also time-consuming and costly. Computational tools used to predict the impacts of mutations can provide useful information more quickly. Here, we focus on two widely used structure-based approaches, originally developed in the Blundell lab: site-directed mutator (SDM), a statistical approach to analyze amino acid substitutions, and mutation cutoff scanning matrix (mCSM), which uses graph-based signatures to represent the wild-type structural environment and machine learning to predict the effect of mutations on protein stability. We also describe DUET, which uses machine learning to combine the two approaches. We discuss briefly the development of mCSM for understanding the impacts of mutations on interfaces with other proteins, nucleic acids, and ligands, and we exemplify the wide application of these approaches to understand human genetic disorders and drug resistance mutations relevant to cancer and mycobacterial infections.
STATEMENT FOR A BROADER AUDIENCE: Genetic or somatic changes in genes can lead to mutations in human proteins, which give rise to genetic disorders or cancer, or to genes of pathogens leading to drug resistance. Computer software described here, using statistical approaches or machine learning, uses the information from genome sequencing of humans and pathogens, together with experimental or modeled 3D structures of gene products, the proteins, to predict impacts of mutations in genetic disease, cancer and drug resistance.
Affiliation(s)
- Arun Prasad Pandurangan
- Department of Biochemistry, University of Cambridge, Cambridge, UK
- MRC Laboratory of Molecular Biology, Cambridge, UK
- Tom L. Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, UK
8
Hu Z, Yu C, Furutsuki M, Andreoletti G, Ly M, Hoskins R, Adhikari AN, Brenner SE. VIPdb, a genetic Variant Impact Predictor Database. Hum Mutat 2019; 40:1202-1214. [PMID: 31283070 PMCID: PMC7288905 DOI: 10.1002/humu.23858] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 06/17/2019] [Accepted: 06/27/2019] [Indexed: 12/30/2022]
Abstract
Genome sequencing identifies a vast number of genetic variants. Predicting these variants' molecular and clinical effects is one of the preeminent challenges in human genetics. Accurate prediction of the impact of genetic variants improves our understanding of how genetic information is conveyed to molecular and cellular functions, and is an essential step towards precision medicine. Over one hundred tools/resources have been developed specifically for this purpose. We summarize these tools, as well as their characteristics, in the genetic Variant Impact Predictor Database (VIPdb). This database will help researchers and clinicians explore appropriate tools, and inform the development of improved methods. VIPdb can be browsed and downloaded at https://genomeinterpretation.org/vipdb.
Affiliation(s)
- Zhiqiang Hu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Changhua Yu
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Department of Bioengineering, University of California, Berkeley, California 94720, USA
- Mabel Furutsuki
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720, USA
- Gaia Andreoletti
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Melissa Ly
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Division of Data Sciences, University of California, Berkeley, California 94720, USA
- Roger Hoskins
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Aashish N. Adhikari
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Steven E. Brenner
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
9
The state-of-the-art strategies of protein engineering for enzyme stabilization. Biotechnol Adv 2018; 37:530-537. [PMID: 31138425 DOI: 10.1016/j.biotechadv.2018.10.011] [Citation(s) in RCA: 89] [Impact Index Per Article: 14.8] [Received: 11/30/2017] [Revised: 10/12/2018] [Accepted: 10/25/2018] [Indexed: 12/11/2022]
Abstract
Enzymes generated by natural recruitment and protein engineering have contributed greatly to a wide range of applications. However, their insufficient stability is a bottleneck that limits the rapid development of biocatalysis. Novel approaches based on precise and global structural dissection, advanced gene manipulation, and combination with multidisciplinary techniques open a new horizon to generate stable enzymes efficiently. Here, we comprehensively introduce emerging advances in protein engineering strategies for enzyme stabilization. We then highlight practical cases to show the importance of enzyme stabilization in pharmaceutical and industrial applications. Combining computational enzyme design with molecular evolution holds considerable promise in this field.
10
Glusman G, Rose PW, Prlić A, Dougherty J, Duarte JM, Hoffman AS, Barton GJ, Bendixen E, Bergquist T, Bock C, Brunk E, Buljan M, Burley SK, Cai B, Carter H, Gao J, Godzik A, Heuer M, Hicks M, Hrabe T, Karchin R, Leman JK, Lane L, Masica DL, Mooney SD, Moult J, Omenn GS, Pearl F, Pejaver V, Reynolds SM, Rokem A, Schwede T, Song S, Tilgner H, Valasatava Y, Zhang Y, Deutsch EW. Mapping genetic variations to three-dimensional protein structures to enhance variant interpretation: a proposed framework. Genome Med 2017; 9:113. [PMID: 29254494 PMCID: PMC5735928 DOI: 10.1186/s13073-017-0509-y] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Indexed: 11/10/2022]
Abstract
The translation of personal genomics to precision medicine depends on the accurate interpretation of the multitude of genetic variants observed for each individual. However, even when genetic variants are predicted to modify a protein, their functional implications may be unclear. Many diseases are caused by genetic variants affecting important protein features, such as enzyme active sites or interaction interfaces. The scientific community has catalogued millions of genetic variants in genomic databases and thousands of protein structures in the Protein Data Bank. Mapping mutations onto three-dimensional (3D) structures enables atomic-level analyses of protein positions that may be important for the stability or formation of interactions; these may explain the effect of mutations and in some cases even open a path for targeted drug development. To accelerate progress in the integration of these data types, we held a two-day Gene Variation to 3D (GVto3D) workshop to report on the latest advances and to discuss unmet needs. The overarching goal of the workshop was to address the question: what can be done together as a community to advance the integration of genetic variants and 3D protein structures that could not be done by a single investigator or laboratory? Here we describe the workshop outcomes, review the state of the field, and propose the development of a framework with which to promote progress in this arena. The framework will include a set of standard formats, common ontologies, a common application programming interface to enable interoperation of the resources, and a Tool Registry to make it easy to find and apply the tools to specific analysis problems. Interoperability will enable integration of diverse data sources and tools and collaborative development of variant effect prediction methods.
Affiliation(s)
- Peter W Rose
- San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, 98093, USA
- Andreas Prlić
- San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, 98093, USA; RCSB Protein Data Bank, University of California San Diego, La Jolla, CA, 98093, USA
- José M Duarte
- RCSB Protein Data Bank, University of California San Diego, La Jolla, CA, 98093, USA
- Andrew S Hoffman
- Human Centered Design & Engineering, University of Washington, Seattle, WA, 98195, USA
- Geoffrey J Barton
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, DD1 5EH, UK
- Emøke Bendixen
- Department of Molecular Biology and Genetics, Aarhus University, 8000, Aarhus, Denmark
- Timothy Bergquist
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, 98109, USA
- Christian Bock
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, 98109, USA
- Elizabeth Brunk
- University of California San Diego, La Jolla, CA, 92093, USA
- Marija Buljan
- Institute of Molecular Systems Biology, ETH Zurich, CH-8093, Zurich, Switzerland
- Stephen K Burley
- San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, 98093, USA; RCSB Protein Data Bank, University of California San Diego, La Jolla, CA, 98093, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ, 08854, USA
- Binghuang Cai
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, 98109, USA
- Hannah Carter
- University of California San Diego, La Jolla, CA, 92093, USA
- JianJiong Gao
- Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, 10065, USA
- Adam Godzik
- SBP Medical Discovery Institute, La Jolla, CA, 92037, USA
- Michael Heuer
- AMPLab, University of California, Berkeley, CA, 94720, USA
- Thomas Hrabe
- SBP Medical Discovery Institute, La Jolla, CA, 92037, USA
- Rachel Karchin
- Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD, 21218, USA; Department of Oncology, Johns Hopkins Medicine, Baltimore, MD, 21287, USA
- Julia Koehler Leman
- Flatiron Institute, Center for Computational Biology, Simons Foundation, New York, NY, 10010, USA; Department of Biology and Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
- Lydie Lane
- SIB Swiss Institute of Bioinformatics and University of Geneva, CH-1211, Geneva, Switzerland
- David L Masica
- Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD, 21218, USA
- Sean D Mooney
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, 98109, USA
- John Moult
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD, 20850, USA; Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, 20742, USA
- Gilbert S Omenn
- Institute for Systems Biology, Seattle, WA, 98109, USA; Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109-2218, USA
- Frances Pearl
- School of Life Sciences, University of Sussex, Brighton, BN1 9QG, UK
- Vikas Pejaver
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, 98109, USA; The University of Washington eScience Institute, Seattle, WA, 98195, USA
- Ariel Rokem
- The University of Washington eScience Institute, Seattle, WA, 98195, USA
- Torsten Schwede
- SIB Swiss Institute of Bioinformatics and Biozentrum University of Basel, CH-4056, Basel, Switzerland
- Sicheng Song
- Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, 98109, USA
- Hagen Tilgner
- Brain and Mind Research Institute, Weill Cornell Medicine, New York City, NY, 10021, USA
- Yana Valasatava
- RCSB Protein Data Bank, University of California San Diego, La Jolla, CA, 98093, USA
- Yang Zhang
- Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109-2218, USA
11
Schomburg KT, Nittinger E, Meyder A, Bietz S, Schneider N, Lange G, Klein R, Rarey M. Prediction of protein mutation effects based on dehydration and hydrogen bonding - A large-scale study. Proteins 2017; 85:1550-1566. [DOI: 10.1002/prot.25315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2017] [Revised: 04/29/2017] [Accepted: 05/02/2017] [Indexed: 11/11/2022]
Affiliation(s)
- Karen T. Schomburg
- Universität Hamburg, ZBH - Center for Bioinformatics; Bundestrasse 43, Hamburg 20146, Germany
- Eva Nittinger
- Universität Hamburg, ZBH - Center for Bioinformatics; Bundestrasse 43, Hamburg 20146, Germany
- Agnes Meyder
- Universität Hamburg, ZBH - Center for Bioinformatics; Bundestrasse 43, Hamburg 20146, Germany
- Stefan Bietz
- Universität Hamburg, ZBH - Center for Bioinformatics; Bundestrasse 43, Hamburg 20146, Germany
- Nadine Schneider
- Universität Hamburg, ZBH - Center for Bioinformatics; Bundestrasse 43, Hamburg 20146, Germany
- Gudrun Lange
- Bayer CropScience AG, Industriepark Hoechst; G836, Frankfurt am Main 65926, Germany
- Robert Klein
- Bayer CropScience AG, Industriepark Hoechst; G836, Frankfurt am Main 65926, Germany
- Matthias Rarey
- Universität Hamburg, ZBH - Center for Bioinformatics; Bundestrasse 43, Hamburg 20146, Germany
12
Broom A, Jacobi Z, Trainor K, Meiering EM. Computational tools help improve protein stability but with a solubility tradeoff. J Biol Chem 2017; 292:14349-14361. [PMID: 28710274 DOI: 10.1074/jbc.m117.784165] [Citation(s) in RCA: 77] [Impact Index Per Article: 11.0] [Received: 03/03/2017] [Revised: 07/11/2017] [Indexed: 01/18/2023]
Abstract
Accurately predicting changes in protein stability upon amino acid substitution is a much sought after goal. Destabilizing mutations are often implicated in disease, whereas stabilizing mutations are of great value for industrial and therapeutic biotechnology. Increasing protein stability is an especially challenging task, with random substitution yielding stabilizing mutations in only ∼2% of cases. To overcome this bottleneck, computational tools that aim to predict the effect of mutations have been developed; however, achieving accuracy and consistency remains challenging. Here, we combined 11 freely available tools into a meta-predictor (meieringlab.uwaterloo.ca/stabilitypredict/). Validation against ∼600 experimental mutations indicated that our meta-predictor has improved performance over any of the individual tools. The meta-predictor was then used to recommend 10 mutations in a previously designed protein of moderate thermodynamic stability, ThreeFoil. Experimental characterization showed that four mutations increased protein stability and could be amplified through ThreeFoil's structural symmetry to yield several multiple mutants with >2-kcal/mol stabilization. By avoiding residues within functional sites, we could maintain ThreeFoil's glycan-binding capacity. Despite successfully achieving substantial stabilization, however, almost all mutations decreased protein solubility, the most common cause of protein design failure. Examination of the 600-mutation data set revealed that stabilizing mutations on the protein surface tend to increase hydrophobicity and that the individual tools favor this approach to gain stability. Thus, whereas currently available tools can increase protein stability and combining them into a meta-predictor yields enhanced reliability, improvements to the potentials/force fields underlying these tools are needed to avoid gaining protein stability at the cost of solubility.
Affiliation(s)
- Aron Broom
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Zachary Jacobi
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Kyle Trainor
- Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
|
13
|
Li M, Goncearenco A, Panchenko AR. Annotating Mutational Effects on Proteins and Protein Interactions: Designing Novel and Revisiting Existing Protocols. Methods Mol Biol 2017; 1550:235-260. [PMID: 28188534 PMCID: PMC5388446 DOI: 10.1007/978-1-4939-6747-6_17] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 05/02/2023]
Abstract
In this review we describe a protocol to annotate the effects of missense mutations on proteins, their functions, stability, and binding. For this purpose we present a collection of the most comprehensive databases which store different types of sequencing data on missense mutations, we discuss their relationships, possible intersections, and unique features. Next, we suggest an annotation workflow using the state-of-the art methods and highlight their usability, advantages, and limitations for different cases. Finally, we address a particularly difficult problem of deciphering the molecular mechanisms of mutations on proteins and protein complexes to understand the origins and mechanisms of diseases.
Affiliation(s)
- Minghui Li
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
- Alexander Goncearenco
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
- Anna R Panchenko
- National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
|
14
|
Striegel DA, Wojtowicz D, Przytycka TM, Periwal V. Correlated rigid modes in protein families. Phys Biol 2016; 13:025003. [PMID: 27063781 DOI: 10.1088/1478-3975/13/2/025003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/06/2023]
Abstract
A great deal of evolutionarily conserved information is contained in genomes and proteins. Enormous effort has been put into understanding protein structure and developing computational tools for protein folding, and many sophisticated approaches take structure and sequence homology into account. Several groups have applied statistical physics approaches to extracting information about proteins from sequences alone. Here, we develop a new method for sequence analysis based on first principles, in information theory, in statistical physics and in Bayesian analysis. We provide a complete derivation of our approach and we apply it to a variety of systems, to demonstrate its utility and its limitations. We show in some examples that phylogenetic alignments of amino-acid sequences of families of proteins imply the existence of a small number of modes that appear to be associated with correlated global variation. These modes are uncovered efficiently in our approach by computing a non-perturbative effective potential directly from the alignment. We show that this effective potential approaches a limiting form inversely with the logarithm of the number of sequences. Mapping symbol entropy flows along modes to underlying physical structures shows that these modes arise due to correlated compensatory adjustments. In the protein examples, these occur around functional binding pockets.
|
15
|
Abstract
Using structure and sequence based analysis we can engineer proteins to increase their thermal stability.
Affiliation(s)
- H. Pezeshgi Modarres
- Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California Berkeley, Berkeley, USA
- M. R. Mofrad
- Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California Berkeley, Berkeley, USA
- A. Sanati-Nezhad
- BioMEMS and Bioinspired Microfluidic Laboratory, Department of Mechanical and Manufacturing Engineering, University of Calgary, Calgary, Canada
|
16
|
Rohani L, Morton DJ, Wang XQ, Chaudhary J. Relative Stability of Wild-Type and Mutant p53 Core Domain: A Molecular Dynamic Study. J Comput Biol 2015; 23:80-89. [PMID: 26675082 DOI: 10.1089/cmb.2015.0163] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/31/2022] Open
Abstract
The p53 protein is a stress response protein that functions primarily as a tetrameric transcription factor. A tumor suppressor p53 binds to a specific DNA sequence and transactivates target genes, leading to cell cycle apoptosis. Encoded by the human gene TP53, p53 is a stress response protein that functions primarily as a tetrameric transcription factor. This gene regulates a large number of genes in response to a variety of cellular functions, including oncogene activation and DNA damage. Mutations in p53 are common in human cancer types. Herein we mutate a wild-type p53, 1TSR with four of its mutated proteins. The energy for the wild-type and mutated proteins is calculated by using molecular dynamics simulations along with simulated annealing. Our results show significant differences in energy between hotspot mutations and the wild type. Based on the findings, we investigate the correlation between molar masses of the target residue and the relative energy with respect to the wild type. Our results indicate that the relative energy changes play a pivotal role in bioactivity, in conformity with observations in the rate of mutation in biology.
Affiliation(s)
- Leyla Rohani
- Department of Physics and Center for Functional Nanoscale Materials, Clark Atlanta University, Atlanta, Georgia
- Derrick J Morton
- Department of Biology, Center for Cancer Research and Therapeutics Development, Clark Atlanta University, Atlanta, Georgia
- Xiao-Qian Wang
- Department of Physics and Center for Functional Nanoscale Materials, Clark Atlanta University, Atlanta, Georgia
- Jaideep Chaudhary
- Department of Biology, Center for Cancer Research and Therapeutics Development, Clark Atlanta University, Atlanta, Georgia
|
17
|
Abstract
Background Reliable prediction of stability changes in protein variants is an important aspect of computational protein design. A number of machine learning methods that allow a classification of stability changes knowing only the sequence of the protein emerged. However, their performance on amino acid substitutions of previously unseen non-homologous proteins is rather limited. Moreover, the performance varies for different types of mutations based on the secondary structure or accessible surface area of the mutation site. Results We proposed feature-based multiple models with each model designed for a specific type of mutations. The new method is composed of five models trained for mutations in exposed, buried, helical, sheet, and coil residues. The classification of a mutation as stabilising or destabilising is made as a consensus of two models, one selected based on the predicted accessible surface area and the other based on the predicted secondary structure of the mutation site. We refer to our new method as Evolutionary, Amino acid, and Structural Encodings with Multiple Models (EASE-MM). Cross-validation results show that EASE-MM provides a notable improvement to our previous work reaching a Matthews correlation coefficient of 0.44. EASE-MM was able to correctly classify 73% and 75% of stabilising and destabilising protein variants, respectively. Using an independent test set of 238 mutations, we confirmed our results in a comparison with related work. Conclusions EASE-MM not only outperformed other related methods but achieved more balanced results for different types of mutations based on the accessible surface area, secondary structure, or magnitude of stability changes. This can be attributed to using multiple models with the most relevant features selected for the given type of mutations. 
Therefore, our results support the presumption that different interactions govern stability changes in the exposed and buried residues or in residues with a different secondary structure.
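For readers unfamiliar with the metric, the Matthews correlation coefficient quoted above can be computed from a binary confusion matrix. The following is a minimal Python sketch; the function name and the counts are hypothetical illustrations and are not taken from the EASE-MM paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a two-class problem.

    Here "positive" is taken to mean stabilising and "negative"
    destabilising; that labelling is an assumption of this sketch.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: return 0 when any row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion-matrix counts, for illustration only.
mcc = matthews_corrcoef(tp=30, tn=50, fp=10, fn=10)
print(f"MCC = {mcc:.3f}")
```

An MCC of 0 corresponds to chance-level classification and 1 to perfect agreement, which is why it is often preferred over raw accuracy when stabilising and destabilising mutations are unevenly represented.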
|
18
|
Stefl S, Nishi H, Petukh M, Panchenko AR, Alexov E. Molecular mechanisms of disease-causing missense mutations. J Mol Biol 2013; 425:3919-36. [PMID: 23871686 DOI: 10.1016/j.jmb.2013.07.014] [Citation(s) in RCA: 187] [Impact Index Per Article: 17.0] [Received: 05/06/2013] [Revised: 07/04/2013] [Accepted: 07/10/2013] [Indexed: 12/23/2022]
Abstract
Genetic variations resulting in a change of amino acid sequence can have a dramatic effect on stability, hydrogen bond network, conformational dynamics, activity and many other physiologically important properties of proteins. The substitutions of only one residue in a protein sequence, so-called missense mutations, can be related to many pathological conditions and may influence susceptibility to disease and drug treatment. The plausible effects of missense mutations range from affecting the macromolecular stability to perturbing macromolecular interactions and cellular localization. Here we review the individual cases and genome-wide studies that illustrate the association between missense mutations and diseases. In addition, we emphasize that the molecular mechanisms of effects of mutations should be revealed in order to understand the disease origin. Finally, we report the current state-of-the-art methodologies that predict the effects of mutations on protein stability, the hydrogen bond network, pH dependence, conformational dynamics and protein function.
Affiliation(s)
- Shannon Stefl
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, SC 29634, USA
|
19
|
Sánchez-González G, Kim JK, Kim DS, Garduño-Juárez R. A beta-complex statistical four body contact potential combined with a hydrogen bond statistical potential recognizes the correct native structure from protein decoy sets. Proteins 2013; 81:1420-33. [PMID: 23568277 DOI: 10.1002/prot.24293] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 11/07/2012] [Revised: 03/04/2013] [Accepted: 03/22/2013] [Indexed: 11/10/2022]
Abstract
We present a new four-body knowledge-based potential for recognizing the native state of proteins from their misfolded states. This potential was extracted from a large set of protein structures determined by X-ray crystallography using BetaMol, a software based on the recent theory of the beta-complex (β-complex) and quasi-triangulation of the Voronoi diagram of spheres. This geometric construct reflects the size difference among atoms in their full Euclidean metric; property not accounted for in a typical 3D Delaunay triangulation. The ability of this potential to identify the native conformation over a large set of decoys was evaluated. Experiments show that this potential outperforms a potential constructed with a classical Delaunay triangulation in decoy discrimination tests. The addition of a statistical hydrogen bond potential to our four-body potential allows a significant improvement in the decoy discrimination, in such a way that we are able to predict successfully the native structure in 90% of cases.
|
20
|
Structure-based mutant stability predictions on proteins of unknown structure. J Biotechnol 2012; 161:287-93. [DOI: 10.1016/j.jbiotec.2012.06.020] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 03/21/2012] [Revised: 06/19/2012] [Accepted: 06/22/2012] [Indexed: 11/23/2022]
|
21
|
Toll-Riera M, Bostick D, Albà MM, Plotkin JB. Structure and age jointly influence rates of protein evolution. PLoS Comput Biol 2012; 8:e1002542. [PMID: 22693443 PMCID: PMC3364943 DOI: 10.1371/journal.pcbi.1002542] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 01/27/2012] [Accepted: 04/17/2012] [Indexed: 12/01/2022] Open
Abstract
What factors determine a protein's rate of evolution are actively debated. Especially unclear is the relative role of intrinsic factors of present-day proteins versus historical factors such as protein age. Here we study the interplay of structural properties and evolutionary age, as determinants of protein evolutionary rate. We use a large set of one-to-one orthologs between human and mouse proteins, with mapped PDB structures. We report that previously observed structural correlations also hold within each age group – including relationships between solvent accessibility, designability, and evolutionary rates. However, age also plays a crucial role: age modulates the relationship between solvent accessibility and rate. Additionally, younger proteins, despite being less designable, tend to evolve faster than older proteins. We show that previously reported relationships between age and rate cannot be explained by structural biases among age groups. Finally, we introduce a knowledge-based potential function to study the stability of proteins through large-scale computation. We find that older proteins are more stable for their native structure, and more robust to mutations, than younger ones. Our results underscore that several determinants, both intrinsic and historical, can interact to determine rates of protein evolution.
Rates of protein evolution vary dramatically within and between organisms. But the factors that determine a protein's evolutionary rate are still under debate, despite extensive studies over the past decade. Several determinants have been proposed, for example gene expression, the importance of the gene for the organism, the number of physical or genetic interactions it has, its structural characteristics, or when it originated. Here we study how age and structural characteristics interact with one another to influence evolutionary rates. We use a set of one-to-one orthologs of human and mouse proteins, with known crystal structures. We find that these two determinants interact: for example, the age of protein modulates how its structure correlates with evolutionary rate. Nonetheless, the influence of age on evolutionary rate cannot be explained by its interplay with structure.
Affiliation(s)
- Macarena Toll-Riera
- Evolutionary Genomics Group, Fundació Institut Municipal d'Investigació Mèdica (FIMIM)- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- David Bostick
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- M. Mar Albà
- Evolutionary Genomics Group, Fundació Institut Municipal d'Investigació Mèdica (FIMIM)- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- Joshua B. Plotkin
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
|
22
|
Capriotti E, Nehrt NL, Kann MG, Bromberg Y. Bioinformatics for personal genome interpretation. Brief Bioinform 2012; 13:495-512. [PMID: 22247263 DOI: 10.1093/bib/bbr070] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.8] [Indexed: 01/02/2023] Open
Abstract
An international consortium released the first draft sequence of the human genome 10 years ago. Although the analysis of this data has suggested the genetic underpinnings of many diseases, we have not yet been able to fully quantify the relationship between genotype and phenotype. Thus, a major current effort of the scientific community focuses on evaluating individual predispositions to specific phenotypic traits given their genetic backgrounds. Many resources aim to identify and annotate the specific genes responsible for the observed phenotypes. Some of these use intra-species genetic variability as a means for better understanding this relationship. In addition, several online resources are now dedicated to collecting single nucleotide variants and other types of variants, and annotating their functional effects and associations with phenotypic traits. This information has enabled researchers to develop bioinformatics tools to analyze the rapidly increasing amount of newly extracted variation data and to predict the effect of uncharacterized variants. In this work, we review the most important developments in the field--the databases and bioinformatics tools that will be of utmost importance in our concerted effort to interpret the human variome.
Affiliation(s)
- Emidio Capriotti
- Department of Mathematics and Computer Science, University of Balearic Islands, ctra. de Valldemossa Km 7.5, Palma de Mallorca, 07122 Spain
|
23
|
Li Y, Zhang J, Tai D, Middaugh CR, Zhang Y, Fang J. PROTS: a fragment based protein thermo-stability potential. Proteins 2011; 80:81-92. [PMID: 21976375 DOI: 10.1002/prot.23163] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 05/20/2011] [Revised: 07/18/2011] [Accepted: 07/31/2011] [Indexed: 12/30/2022]
Abstract
Designing proteins with enhanced thermo-stability has been a main focus of protein engineering because of its theoretical and practical significance. Despite extensive studies in the past years, a general strategy for stabilizing proteins still remains elusive. Thus effective and robust computational algorithms for designing thermo-stable proteins are in critical demand. Here we report PROTS, a sequential and structural four-residue fragment based protein thermo-stability potential. PROTS is derived from a nonredundant representative collection of thousands of thermophilic and mesophilic protein structures and a large set of point mutations with experimentally determined changes of melting temperatures. To the best of our knowledge, PROTS is the first protein stability predictor based on integrated analysis and mining of these two types of data. Besides conventional cross validation and blind testing, we introduce hypothetical reverse mutations as a means of testing the robustness of protein thermo-stability predictors. In all tests, PROTS demonstrates the ability to reliably predict mutation induced thermo-stability changes as well as classify thermophilic and mesophilic proteins. In addition, this white-box predictor allows easy interpretation of the factors that influence mutation induced protein stability changes at the residue level.
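The reverse-mutation test mentioned above rests on the idea that if a substitution A→B changes stability by some amount, the hypothetical reverse substitution B→A should change it by the opposite amount. A minimal Python sketch of two commonly used consistency checks follows; the function names and the numbers are illustrative assumptions, not the PROTS implementation or its data.

```python
import math
from statistics import mean


def antisymmetry_bias(forward, reverse):
    """Mean of (forward + reverse) / 2 over paired predictions.

    A perfectly antisymmetric predictor gives 0.
    """
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse must be paired")
    return mean((f + r) / 2 for f, r in zip(forward, reverse))


def pearson(x, y):
    """Plain Pearson correlation; -1 is ideal for forward vs. reverse."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical predicted stability changes for four mutation pairs.
forward = [1.5, -2.0, 0.5, -0.8]   # A -> B
reverse = [-1.2, 2.3, -0.4, 0.5]   # B -> A
bias = antisymmetry_bias(forward, reverse)
r_fr = pearson(forward, reverse)
print(f"bias = {bias:.3f}, forward/reverse correlation = {r_fr:.3f}")
```

A bias far from zero or a forward/reverse correlation far from -1 would suggest that a predictor has learned the skew of its training data rather than the underlying physics.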
Affiliation(s)
- Yunqi Li
- Applied Bioinformatics Laboratory, the University of Kansas, Lawrence, Kansas 66047, USA
|
24
|
Dehouck Y, Kwasigroch JM, Gilis D, Rooman M. PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinformatics 2011; 12:151. [PMID: 21569468 PMCID: PMC3113940 DOI: 10.1186/1471-2105-12-151] [Citation(s) in RCA: 367] [Impact Index Per Article: 28.2] [Received: 12/27/2010] [Accepted: 05/13/2011] [Indexed: 12/31/2022] Open
Abstract
Background The rational design of modified proteins with controlled stability is of extreme importance in a whole range of applications, notably in the biotechnological and environmental areas, where proteins are used for their catalytic or other functional activities. Future breakthroughs in medical research may also be expected from an improved understanding of the effect of naturally occurring disease-causing mutations on the molecular level. Results PoPMuSiC-2.1 is a web server that predicts the thermodynamic stability changes caused by single site mutations in proteins, using a linear combination of statistical potentials whose coefficients depend on the solvent accessibility of the mutated residue. PoPMuSiC presents good prediction performances (correlation coefficient of 0.8 between predicted and measured stability changes, in cross validation, after exclusion of 10% outliers). It is moreover very fast, allowing the prediction of the stability changes resulting from all possible mutations in a medium size protein in less than a minute. This unique functionality is user-friendly implemented in PoPMuSiC and is particularly easy to exploit. Another new functionality of our server concerns the estimation of the optimality of each amino acid in the sequence, with respect to the stability of the structure. It may be used to detect structural weaknesses, i.e. clusters of non-optimal residues, which represent particularly interesting sites for introducing targeted mutations. This sequence optimality data is also expected to have significant implications in the prediction and the analysis of particular structural or functional protein regions. To illustrate the interest of this new functionality, we apply it to a dataset of known catalytic sites, and show that a much larger than average concentration of structural weaknesses is detected, quantifying how these sites have been optimized for function rather than stability. 
Conclusion The freely available PoPMuSiC-2.1 web server is highly useful for identifying very rapidly a list of possibly relevant mutations with the desired stability properties, on which subsequent experimental studies can be focused. It can also be used to detect sequence regions corresponding to structural weaknesses, which could be functionally important or structurally delicate regions, with obvious applications in rational protein design.
Affiliation(s)
- Yves Dehouck
- Bioinformatique génomique et structurale, Université Libre de Bruxelles, Av. Fr. Roosevelt 50, CP165/61, 1050 Brussels, Belgium
|
25
|
Sun W, He J. From isotropic to anisotropic side chain representations: comparison of three models for residue contact estimation. PLoS One 2011; 6:e19238. [PMID: 21552527 PMCID: PMC3084275 DOI: 10.1371/journal.pone.0019238] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/10/2010] [Accepted: 03/29/2011] [Indexed: 11/19/2022] Open
Abstract
The criterion to determine residue contact is a fundamental problem in deriving knowledge-based mean-force potential energy calculations for protein structures. A frequently used criterion is to require the side chain center-to-center distance or the Cα-to-Cα atom distance to be within a pre-determined cutoff distance. However, the spatially anisotropic nature of the side chain determines that it is challenging to identify the contact pairs. This study compares three side chain contact models: the Atom Distance criteria (ADC) model, the Isotropic Sphere Side chain (ISS) model and the Anisotropic Ellipsoid Side chain (AES) model using 424 high resolution protein structures in the Protein Data Bank. The results indicate that the ADC model is the most accurate and ISS is the worst. The AES model eliminates about 95% of the incorrectly counted contact-pairs in the ISS model. Algorithm analysis shows that AES model is the most computationally intensive while ADC model has moderate computational cost. We derived a dataset of the mis-estimated contact pairs by AES model. The most misjudged pairs are Arg-Glu, Arg-Asp and Arg-Tyr. Such a dataset can be useful for developing the improved AES model by incorporating the pair-specific information for the cutoff distance.
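The simple distance-cutoff criterion that the abstract takes as its starting point can be sketched in a few lines of Python. This is only the baseline idea of one representative point per residue compared against a fixed cutoff; it is not the ADC, ISS, or AES model from the paper, and the 8 Å cutoff, the function name, and the coordinates are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def contacts_by_cutoff(points, cutoff=8.0):
    """Return index pairs (i, j), i < j, whose distance is within `cutoff`.

    `points` holds one representative 3D coordinate per residue, for
    example a side chain centre or an alpha carbon, in angstroms.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(points), 2)
        if math.dist(a, b) <= cutoff
    ]


# Hypothetical coordinates: three residues in a row and one far away.
points = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(contacts_by_cutoff(points))
```

Because each residue is reduced to a single point, a criterion like this ignores side chain shape and orientation, which is the limitation the anisotropic model in the abstract is meant to address.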
Affiliation(s)
- Weitao Sun
- Zhou Pei-Yuan Center for Applied Mathematics, Tsinghua University, Beijing, China
|
26
|
Ackerman SH, Gatti DL. The contribution of coevolving residues to the stability of KDO8P synthase. PLoS One 2011; 6:e17459. [PMID: 21408011 PMCID: PMC3052366 DOI: 10.1371/journal.pone.0017459] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 10/06/2010] [Accepted: 02/03/2011] [Indexed: 12/03/2022] Open
Abstract
Background The evolutionary tree of 3-deoxy-D-manno-octulosonate 8-phosphate (KDO8P) synthase (KDO8PS), a bacterial enzyme that catalyzes a key step in the biosynthesis of bacterial endotoxin, is evenly divided between metal and non-metal forms, both having similar structures, but diverging in various degrees in amino acid sequence. Mutagenesis, crystallographic and computational studies have established that only a few residues determine whether or not KDO8PS requires a metal for function. The remaining divergence in the amino acid sequence of KDO8PSs is apparently unrelated to the underlying catalytic mechanism. Methodology/Principal Findings The multiple alignment of all known KDO8PS sequences reveals that several residue pairs coevolved, an indication of their possible linkage to a structural constraint. In this study we investigated by computational means the contribution of coevolving residues to the stability of KDO8PS. We found that about 1/4 of all strongly coevolving pairs probably originated from cycles of mutation (decreasing stability) and suppression (restoring it), while the remaining pairs are best explained by a succession of neutral or nearly neutral covarions. Conclusions/Significance Both sequence conservation and coevolution are involved in the preservation of the core structure of KDO8PS, but the contribution of coevolving residues is, in proportion, smaller. This is because small stability gains or losses associated with selection of certain residues in some regions of the stability landscape of KDO8PS are easily offset by a large number of possible changes in other regions. While this effect increases the tolerance of KDO8PS to deleterious mutations, it also decreases the probability that specific pairs of residues could have a strong contribution to the thermodynamic stability of the protein.
Affiliation(s)
- Sharon H. Ackerman
- Department of Biochemistry and Molecular Biology, Wayne State University School of Medicine, Detroit, Michigan, United States of America
- Domenico L. Gatti
- Department of Biochemistry and Molecular Biology, Wayne State University School of Medicine, Detroit, Michigan, United States of America
- Cardiovascular Research Institute, Wayne State University School of Medicine, Detroit, Michigan, United States of America
|
27
|
Dong Q, Zhou S. Novel nonlinear knowledge-based mean force potentials based on machine learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:476-486. [PMID: 20820079 DOI: 10.1109/tcbb.2010.86] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 05/29/2023]
Abstract
The prediction of 3D structures of proteins from amino acid sequences is one of the most challenging problems in molecular biology. An essential task for solving this problem with coarse-grained models is to deduce effective interaction potentials. The development and evaluation of new energy functions is critical to accurately modeling the properties of biological macromolecules. Knowledge-based mean force potentials are derived from statistical analysis of proteins of known structures. Current knowledge-based potentials are almost in the form of weighted linear sum of interaction pairs. In this study, a class of novel nonlinear knowledge-based mean force potentials is presented. The potential parameters are obtained by nonlinear classifiers, instead of relative frequencies of interaction pairs against a reference state or linear classifiers. The support vector machine is used to derive the potential parameters on data sets that contain both native structures and decoy structures. Five knowledge-based mean force Boltzmann-based or linear potentials are introduced and their corresponding nonlinear potentials are implemented. They are the DIH potential (single-body residue-level Boltzmann-based potential), the DFIRE-SCM potential (two-body residue-level Boltzmann-based potential), the FS potential (two-body atom-level Boltzmann-based potential), the HR potential (two-body residue-level linear potential), and the T32S3 potential (two-body atom-level linear potential). Experiments are performed on well-established decoy sets, including the LKF data set, the CASP7 data set, and the Decoys “R”Us data set. The evaluation metrics include the energy Z score and the ability of each potential to discriminate native structures from a set of decoy structures. 
Experimental results show that all nonlinear potentials significantly outperform the corresponding Boltzmann-based or linear potentials, and the proposed discriminative framework is effective in developing knowledge-based mean force potentials. The nonlinear potentials can be widely used for ab initio protein structure prediction, model quality assessment, protein docking, and other challenging problems in computational biology.
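The energy Z score used as an evaluation metric above measures how far the native structure's energy lies below the decoy distribution. A minimal Python sketch follows, assuming the usual definition (native energy minus mean decoy energy, divided by the decoy standard deviation); the use of the population standard deviation, the function names, and the energies are assumptions of this sketch rather than details from the paper.

```python
from statistics import mean, pstdev


def energy_z_score(native_energy, decoy_energies):
    """Z score of the native energy relative to the decoy energies.

    More negative values mean the native structure is better separated
    from the decoys.
    """
    sigma = pstdev(decoy_energies)
    if sigma == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mean(decoy_energies)) / sigma


def native_is_ranked_first(native_energy, decoy_energies):
    """True when the native structure has the lowest energy of the set."""
    return all(native_energy < e for e in decoy_energies)


# Hypothetical energies in arbitrary units.
native = -10.0
decoys = [-4.0, -2.0, 0.0, 2.0, 4.0]
z = energy_z_score(native, decoys)
print(f"Z = {z:.2f}, native ranked first: {native_is_ranked_first(native, decoys)}")
```

Reporting both numbers is useful because a potential can rank the native structure first while separating it only marginally from the best decoy.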
Affiliation(s)
- Qiwen Dong
- Shanghai Key Lab of Intelligent Information Processing and the School of Computer Science, Fudan University, Old Yifu Building, Room 202-5, 220 Handan Road, Shanghai 200433, China
|
28
|
Masso M, Vaisman II. A structure-based computational mutagenesis elucidates the spectrum of stability-activity relationships in proteins. Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference 2011; 2011:3225-3228. [PMID: 22255026 DOI: 10.1109/iembs.2011.6090877] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/31/2023]
Abstract
Protein engineering experiments involving single amino acid substitutions are routinely implemented for the analysis of protein structure, stability, and function. The resulting change in just one of these characteristics relative to the native protein constitutes the focus of any single study, as is the case with predictive computational models developed for the same purpose. Other than investigations into stability-activity trade-offs specifically resulting from active site residue replacements in a few enzymes, a literature survey fails to reveal a comprehensive analysis of stability-activity relationships in proteins upon mutation. Here, we employ a computational mutagenesis for quantifying overall protein structural change upon mutation, which is applied to a dataset of 938 single residue replacements distributed at positions throughout twenty diverse proteins. These mutants are selected based on the availability of both experimental stability and activity change data, and their structural change data are used to characterize the full range of stability-activity relationships.
Affiliation(s)
- Majid Masso
- Laboratory for Structural Bioinformatics, School of Systems Biology, George Mason University, Manassas, VA 20110, USA.
|
29
|
Tian Y, Deutsch C, Krishnamoorthy B. Scoring function to predict solubility mutagenesis. Algorithms Mol Biol 2010; 5:33. [PMID: 20929563 PMCID: PMC2958853 DOI: 10.1186/1748-7188-5-33] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 06/01/2010] [Accepted: 10/07/2010] [Indexed: 11/16/2022] Open
Abstract
BACKGROUND: Mutagenesis is commonly used to engineer proteins with desirable properties not present in the wild type (WT) protein, such as increased or decreased stability, reactivity, or solubility. Experimentalists often have to choose a small subset of mutations from a large number of candidates to obtain the desired change, and computational techniques are invaluable to make the choices. While several such methods have been proposed to predict stability and reactivity mutagenesis, solubility has not received much attention.
RESULTS: We use concepts from computational geometry to define a three body scoring function that predicts the change in protein solubility due to mutations. The scoring function captures both sequence and structure information. By exploring the literature, we have assembled a substantial database of 137 single- and multiple-point solubility mutations. Our database is the largest such collection with structural information known so far. We optimize the scoring function using linear programming (LP) methods to derive its weights based on training. Starting with default values of 1, we find weights in the range [0,2] so that predictions of increase or decrease in solubility are optimized. We compare the LP method to the standard machine learning techniques of support vector machines (SVM) and the Lasso. Using statistics for leave-one-out (LOO), 10-fold, and 3-fold cross validations (CV) for training and prediction, we demonstrate that the LP method performs the best overall. For the LOOCV, the LP method has an overall accuracy of 81%.
AVAILABILITY: Executables of programs, tables of weights, and datasets of mutants are available from the following web page: http://www.wsu.edu/~kbala/OptSolMut.html.
Affiliation(s)
- Ye Tian
- Department of Mathematics, Washington State University, Pullman, WA 99164, USA
- Bala Krishnamoorthy
- Department of Mathematics, Washington State University, Pullman, WA 99164, USA
|
30
|
Masso M, Vaisman II. Knowledge-based computational mutagenesis for predicting the disease potential of human non-synonymous single nucleotide polymorphisms. J Theor Biol 2010; 266:560-8. [DOI: 10.1016/j.jtbi.2010.07.026] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 01/07/2010] [Revised: 04/25/2010] [Accepted: 07/21/2010] [Indexed: 10/19/2022]
|
31
|
Esque J, Oguey C, de Brevern AG. A novel evaluation of residue and protein volumes by means of Laguerre tessellation. J Chem Inf Model 2010; 50:947-60. [PMID: 20392096 DOI: 10.1021/ci9004892] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Indexed: 11/28/2022]
Abstract
Amino acids control the protein folding process and maintain its functional fold. This study underlines the interest of the Laguerre tessellation to determine relevant amino acid volumes in proteins. Previous studies used a limited number of proteins and only buried residues. The present computations improve the method and results on three main points: (i) a large, high-quality updated and refined data bank of proteins is used; (ii) all residues are taken into account, including those at the protein surface, thanks to (iii) the addition of a realistic solvent. The new values of the average and standard deviation of amino acid volumes show significant corrections with respect to previous studies. Another outcome of the method is the polyhedral protein/water interface area (PIA), which quantifies the exposure of atoms or residues to the solvent. We propose this PIA as a new, parameter-free alternative for measuring accessibility. The comparison with NACCESS is satisfactory; however, the methods disagree in pointing out buried residues: where NACCESS evaluates to zero, the exposure given by PIA ranges from 0 to 20%. Variations of average residue volumes have been analyzed under several conditions, e.g., how they depend on protein size and on secondary structure environments. As it is based on strong mathematical grounds and on numerous high-quality protein structures, our work gives a reliable methodology and up-to-date values of amino acid volumes and surface accessibility.
Affiliation(s)
- Jeremy Esque
- LPTM, CNRS UMR 8089, Université de Cergy Pontoise, 2 av. Adolphe Chauvin - 95302 Cergy-Pontoise, France.
|
32
|
Sun W, He J. Understanding on the residue contact network using the log-normal cluster model and the multilevel wheel diagram. Biopolymers 2010; 93:904-16. [DOI: 10.1002/bip.21494] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022]
|
33
|
|
34
|
Aita T, Nishigaki K, Husimi Y. Toward the fast blind docking of a peptide to a target protein by using a four-body statistical pseudo-potential. Comput Biol Chem 2010; 34:53-62. [DOI: 10.1016/j.compbiolchem.2009.10.005] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 02/03/2009] [Revised: 09/27/2009] [Accepted: 10/20/2009] [Indexed: 11/26/2022]
|
35
|
Betancourt MR. Another look at the conditions for the extraction of protein knowledge-based potentials. Proteins 2009; 76:72-85. [PMID: 19089977 DOI: 10.1002/prot.22320] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
Protein knowledge-based potentials are effective free energies obtained from databases of known protein structures. They are used to parameterize coarse-grained protein models in many folding simulation and structure prediction methods. Two common approaches are used in the derivation of knowledge-based potentials. One assumes that the energy parameters optimize the native structure stability. The other assumes that interaction events are related to their energies according to the Boltzmann distribution, and that they are distributed independently of other events, that is, the quasi-chemical approximation. Here, these assumptions are systematically tested by extracting contact energies from artificial databases of lattice proteins with predefined pairwise contact energies. Databases of protein sequences are designed to either satisfy the Boltzmann distribution at high or low temperatures, or to simultaneously optimize the native stability and folding kinetics. It is found that the quasi-chemical approximation, with the ideal reference state, accurately reproduces the true energies for high temperature Boltzmann distributed sequences (weakly interacting residues), but less accurately at low temperatures, where the sequences correspond to energy minima and the residues are strongly interacting. To overcome this problem, an iterative procedure for Boltzmann distributed sequences is introduced, which accounts for interacting residue correlations and eliminates the need for the quasi-chemical approximation. In this case, the energies are accurately reproduced at any ensemble temperature. However, when the database of sequences designed for optimal stability and kinetics is used, the energy correlation is less than optimal using either method, exhibiting random and systematic deviations from linearity. Therefore, the assumption that native structures are maximally stable or that sequences are determined according to the Boltzmann distribution seems to be inadequate for obtaining accurate energies. The limited number of sequences in the database and the inhomogeneous concentration of amino acids from one structure to another do not seem to be major obstacles for improving the quality of the extracted pairwise energies, with the exception of repulsive interactions.
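As an illustration only, the quasi-chemical approximation mentioned in this abstract can be sketched as a Boltzmann inversion of contact counts: the energy of a residue-type pair is taken as the negative log ratio of its observed contact frequency to the frequency expected if contact partners were paired at random. The function name `quasi_chemical_energies`, the random-mixing reference state built from contact-end mole fractions, and the two-letter H/P alphabet are assumptions made for this sketch; they are not taken from the paper and do not reproduce its lattice databases or its iterative procedure.

```python
import math
from collections import Counter

def quasi_chemical_energies(contacts, kT=1.0):
    """Estimate pairwise contact energies by Boltzmann inversion.

    contacts: iterable of (type_a, type_b) residue-type pairs observed in contact.
    Returns a dict mapping sorted (type_a, type_b) tuples to energies in units of kT.
    Hypothetical helper; the reference state is an assumed random-mixing model.
    """
    # Count unordered pairs, so ("H", "P") and ("P", "H") are the same contact type.
    pair_counts = Counter(tuple(sorted(pair)) for pair in contacts)
    total_contacts = sum(pair_counts.values())

    # Each contact has two ends; mole fractions are computed over contact ends.
    end_counts = Counter()
    for (a, b), n in pair_counts.items():
        end_counts[a] += n
        end_counts[b] += n
    total_ends = 2 * total_contacts

    energies = {}
    for (a, b), n in pair_counts.items():
        x_a = end_counts[a] / total_ends
        x_b = end_counts[b] / total_ends
        # Expected fraction under random pairing; unlike pairs can form two ways.
        expected = x_a * x_b * (1 if a == b else 2)
        observed = n / total_contacts
        energies[(a, b)] = -kT * math.log(observed / expected)
    return energies

# Toy database in which like residues contact each other more often than chance.
toy_contacts = [("H", "H")] * 4 + [("P", "P")] * 4 + [("H", "P")] * 2
toy_energies = quasi_chemical_energies(toy_contacts)
```

In this toy input, like-like contacts are over-represented relative to random mixing, so they receive negative (favorable) energies, while the under-represented H-P contact receives a positive one. Pair types that never occur are simply absent from the result, since their log ratio would be undefined.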
Affiliation(s)
- Marcos R Betancourt
- Department of Physics, Indiana University Purdue University Indianapolis, Indianapolis, Indiana 46202, USA.
|