1
Chu SKS, Narang K, Siegel JB. Protein stability prediction by fine-tuning a protein language model on a mega-scale dataset. PLoS Comput Biol 2024; 20:e1012248. [PMID: 39038042 PMCID: PMC11293664 DOI: 10.1371/journal.pcbi.1012248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 08/01/2024] [Accepted: 06/13/2024] [Indexed: 07/24/2024] Open
Abstract
Protein stability plays a crucial role in a variety of applications, such as food processing, therapeutics, and the identification of pathogenic mutations. Engineering campaigns commonly seek to improve protein stability, and there is a strong interest in streamlining these processes to enable rapid optimization of highly stabilized proteins with fewer iterations. In this work, we explore utilizing a mega-scale dataset to develop a protein language model optimized for stability prediction. ESMtherm is trained on the folding stability of 528k natural and de novo sequences derived from 461 protein domains and can accommodate deletions, insertions, and multiple-point mutations. We show that a protein language model can be fine-tuned to predict folding stability. ESMtherm performs reasonably on small protein domains and generalizes to sequences distal from the training set. Lastly, we discuss our model's limitations compared to other state-of-the-art methods in generalizing to larger protein scaffolds. Our results highlight the need for large-scale stability measurements on a diverse dataset that mirrors the distribution of sequence lengths commonly observed in nature.
Affiliation(s)
- Simon K. S. Chu
  - Biophysics Graduate Program, University of California Davis, Davis, California, United States of America
- Kush Narang
  - College of Biological Sciences, University of California Davis, Davis, California, United States of America
- Justin B. Siegel
  - Genome Center, University of California Davis, Davis, California, United States of America
  - Department of Chemistry, University of California Davis, Davis, California, United States of America
  - Department of Biochemistry and Molecular Medicine, University of California Davis, Davis, California, United States of America
2
Tsishyn M, Pucci F, Rooman M. Quantification of biases in predictions of protein-protein binding affinity changes upon mutations. Brief Bioinform 2023; 25:bbad491. [PMID: 38197311 PMCID: PMC10777193 DOI: 10.1093/bib/bbad491] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/30/2023] [Revised: 10/02/2023] [Accepted: 12/05/2023] [Indexed: 01/11/2024] Open
Abstract
Understanding the impact of mutations on protein-protein binding affinity is a key objective for a wide range of biotechnological applications and for shedding light on disease-causing mutations, which are often located at protein-protein interfaces. Over the past decade, many computational methods using physics-based and/or machine learning approaches have been developed to predict how protein binding affinity changes upon mutations. They all claim to achieve astonishing accuracy on both training and test sets, with performances on standard benchmarks such as SKEMPI 2.0 that seem overly optimistic. Here we benchmarked eight well-known and well-used predictors and identified their biases and dataset dependencies, using not only SKEMPI 2.0 as a test set but also deep mutagenesis data on the severe acute respiratory syndrome coronavirus 2 spike protein in complex with the human angiotensin-converting enzyme 2. We showed that, even though most of the tested methods reach a significant degree of robustness and accuracy, they suffer from limited generalizability properties and struggle to predict unseen mutations. Interestingly, the generalizability problems are more severe for pure machine learning approaches, while physics-based methods are less affected by this issue. Moreover, undesirable prediction biases toward specific mutation properties, the most marked being toward destabilizing mutations, are also observed and should be carefully considered by method developers. We conclude from our analyses that there is room for improvement in the prediction models and suggest ways to check, assess and improve their generalizability and robustness.
Affiliation(s)
- Matsvei Tsishyn
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Roosevelt Ave, 1050, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium
- Fabrizio Pucci
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Roosevelt Ave, 1050, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium
- Marianne Rooman
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Roosevelt Ave, 1050, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Brussels, Belgium
3
Kharrat M, Triki CC, Alila-Fersi O, Jallouli O, Khemakham B, Mallouli S, Maalej M, Ammar M, Frikha F, Kamoun F, Fakhfakh F. Combined in Silico Prediction Methods, Molecular Dynamic Simulation, and Molecular Docking of FOXG1 Missense Mutations: Effect on FoxG1 Structure and Its Interactions with DNA and Bmi-1 Protein. J Mol Neurosci 2022; 72:1695-1705. [PMID: 35654936 DOI: 10.1007/s12031-022-02032-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/16/2022] [Accepted: 05/23/2022] [Indexed: 11/28/2022]
Abstract
FoxG1 encoded by FOXG1 gene is a transcriptional factor interacting with the DNA of targeted genes as well as with several proteins to regulate the forebrain development. Mutations in the FOXG1 gene have been shown to cause a wide spectrum of brain disorders, including the congenital variant of Rett syndrome. In this study, the direct sequencing of FOXG1 gene revealed a novel c.645C > A (F215L) variant in the patient P1 and a de novo known one c.755G > A (G252D) in the patient P2. To investigate the putative impact of FOXG1 missense variants, a computational pipeline by the application of in silico prediction methods, molecular dynamic simulation, and molecular docking approaches was used. Bioinformatics analysis and molecular dynamics simulation have demonstrated that F215L and G252D variants found in the DNA binding domain are highly deleterious mutations that may cause the protein structure destabilization. On the other hand, molecular docking revealed that F215L mutant is likely to have a great impact on destabilizing the protein structure and the disruption of the Bmi-1 binding site quite significantly. Regarding G252D mutation, it seems to abolish the ability of FoxG1 to bind DNA target, affecting the transcriptional regulation of targeted genes. Our study highlights the usefulness of combined computational approaches, molecular dynamic simulation, and molecular docking for a better understanding of the dysfunctional effects of FOXG1 missense mutations and their role in the etiopathogenesis as well as in the genotype-phenotype correlation.
Affiliation(s)
- Marwa Kharrat
  - Laboratory of Molecular and Functional Genetics, Faculty of Science, Sfax University, Sfax, Tunisia
- Chahnez Charfi Triki
  - Child Neurology Department, Hedi Chaker Hospital, Sfax, Tunisia
  - Research Laboratory (LR19ES15), Sfax Medical School, Sfax University, Sfax, Tunisia
- Olfa Alila-Fersi
  - Laboratory of Molecular and Functional Genetics, Faculty of Science, Sfax University, Sfax, Tunisia
- Olfa Jallouli
  - Child Neurology Department, Hedi Chaker Hospital, Sfax, Tunisia
  - Research Laboratory (LR19ES15), Sfax Medical School, Sfax University, Sfax, Tunisia
- Bassem Khemakham
  - Laboratory of Plant Biotechnology, Faculty of Sciences of Sfax, Sfax University, Sfax, Tunisia
- Salma Mallouli
  - Child Neurology Department, Hedi Chaker Hospital, Sfax, Tunisia
  - Research Laboratory (LR19ES15), Sfax Medical School, Sfax University, Sfax, Tunisia
- Marwa Maalej
  - Laboratory of Molecular and Functional Genetics, Faculty of Science, Sfax University, Sfax, Tunisia
- Marwa Ammar
  - Laboratory of Molecular and Functional Genetics, Faculty of Science, Sfax University, Sfax, Tunisia
- Fakher Frikha
  - Laboratory of Molecular and Cellular Screening Processes, Center of Biotechnology of Sfax, University of Sfax, Sfax, Tunisia
- Fatma Kamoun
  - Child Neurology Department, Hedi Chaker Hospital, Sfax, Tunisia
  - Research Laboratory (LR19ES15), Sfax Medical School, Sfax University, Sfax, Tunisia
- Faiza Fakhfakh
  - Laboratory of Molecular and Functional Genetics, Faculty of Science, Sfax University, Sfax, Tunisia
4
Exonic SNP in MHC-DMB2 is associated with gene expression and humoral immunity in Japanese quails. Vet Immunol Immunopathol 2021; 239:110302. [PMID: 34311147 DOI: 10.1016/j.vetimm.2021.110302] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/13/2021] [Revised: 07/01/2021] [Accepted: 07/16/2021] [Indexed: 11/23/2022]
Abstract
The DMB2 gene is widely expressed at high levels in avian. This gene plays an important role in humoral immunity. The aim of this study was to investigate the effects of 361 G > C Single nucleotide polymorphism (SNP) on DMB2 protein structure and gene expression to determine how the 361 G > C SNP affects humoral immune response in Japanese quails. 0.2 mL of 5% sheep red blood cell (SRBC) was injected into breast muscle of 130 Japanese quails on 28 days. After DNA extraction, PCR was carried out to amplify a 333-base pair DNA fragment from the exon 2 of DMB2 gene. The pattern of all samples was determined through RFLP technique. PCR-RFLP results identified two alleles segregating (C, G) as three genotypes (CC, CG and GG) in Japanese Quails. The antibody response to SRBC with CC genotype was significantly higher than the CG and GG genotypes (P < 0.01). In silico analysis showed that the 361 G > C SNP has no effect on the physicochemical properties and 3D structure. The results of RT-qPCR indicated that the effect of genotype on gene expression is significant, so that the expression of CC genotype is more than CG and GG genotype. It can be inferred that the 361 G > C SNP in the exon 2 of MHC-DMB2 gene is not desirable. This mutation decreases humoral immune response by reducing DMB2 gene expression.
5
AbsoluRATE: An in-silico method to predict the aggregation kinetics of native proteins. Biochimica et Biophysica Acta-Proteins and Proteomics 2021; 1869:140682. [PMID: 34102324 DOI: 10.1016/j.bbapap.2021.140682] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/24/2021] [Revised: 05/12/2021] [Accepted: 06/04/2021] [Indexed: 12/12/2022]
Abstract
Protein aggregation has two aspects, namely, mechanistic and kinetics. Understanding protein aggregation kinetics is critical for prediction of progression of diseases caused by amyloidosis, accumulation of aggregates in biotherapeutics during storage and engineering commercial nano-biomaterials. In this work, we have collected experimentally determined absolute protein aggregation rates and developed an SVM based regression model to predict absolute rates of protein and peptide aggregation near-physiological conditions. The regression model achieved a correlation coefficient of 0.72 with MAE of 0.91 (natural log of k_app, where k_app is in hour⁻¹) using leave-one-out cross-validation on a dataset of 82 non-redundant proteins/peptides. The model accounts for the experimental conditions (such as temperature, pH, ionic and protein concentration) and sequence-based properties. The amino acid sequence features revealed by this model as being important for aggregation kinetics, are also associated with the aggregation mechanism. In particular, inherent aggregation propensity of the protein/peptide sequence and number of aggregation prone regions (APRs) unpunctuated by the gatekeeping residues, were found to play important roles in the prediction of the absolute aggregation rates. This analysis shows that mechanism and kinetics of protein aggregation are coupled via common sequence attributes. The aggregation kinetic prediction method developed in this work is available at https://web.iitm.ac.in/bioinfo2/absolurate-pred/index.html.
6
Huang P, Chu SKS, Frizzo HN, Connolly MP, Caster RW, Siegel JB. Evaluating Protein Engineering Thermostability Prediction Tools Using an Independently Generated Dataset. ACS Omega 2020; 5:6487-6493. [PMID: 32258884 PMCID: PMC7114132 DOI: 10.1021/acsomega.9b04105] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 12/02/2019] [Accepted: 03/06/2020] [Indexed: 05/04/2023]
Abstract
Engineering proteins to enhance thermal stability is a widely utilized approach for creating industrially relevant biocatalysts. The development of new experimental datasets and computational tools to guide these engineering efforts remains an active area of research. Thus, to complement the previously reported measures of T_50 and kinetic constants, we are reporting an expansion of our previously published dataset of mutants for β-glucosidase to include both measures of T_M and ΔΔG. For a set of 51 mutants, we found that T_50 and T_M are moderately correlated, with a Pearson correlation coefficient and Spearman's rank coefficient of 0.58 and 0.47, respectively, indicating that the two methods capture different physical features. The performance of predicted stability using nine computational tools was also evaluated on the dataset of 51 mutants, none of which are found to be strong predictors of the observed changes in T_50, T_M, or ΔΔG. Furthermore, the ability of the nine algorithms to predict the production of isolatable soluble protein was examined, which revealed that Rosetta ΔΔG, FoldX, DeepDDG, PoPMuSiC, and SDM were capable of predicting if a mutant could be produced and isolated as a soluble protein. These results further highlight the need for new algorithms for predicting modest, yet important, changes in thermal stability as well as a new utility for current algorithms for prescreening designs for the production of mutants that maintain fold and soluble production properties.
Affiliation(s)
- Peishan Huang
  - Biophysics Graduate Group, University of California, Davis 95616, California, United States
- Simon K. S. Chu
  - Biophysics Graduate Group, University of California, Davis 95616, California, United States
- Henrique N. Frizzo
  - Genome Center, University of California, Davis 95616, California, United States
- Morgan P. Connolly
  - Microbiology Graduate Group, University of California, Davis 95616, California, United States
- Ryan W. Caster
  - Genome Center, University of California, Davis 95616, California, United States
- Justin B. Siegel
  - Genome Center, University of California, Davis 95616, California, United States
  - Department of Biochemistry & Molecular Medicine, University of California, Davis 95616, California, United States
  - Department of Chemistry, University of California, Davis 95616, California, United States
7
Zhang R, Ni S, Kennedy MA. Crystal structure of Alr1298, a pentapeptide repeat protein from the cyanobacterium Nostoc sp. PCC 7120, determined at 2.1 Å resolution. Proteins 2020; 88:1143-1153. [PMID: 32092202 DOI: 10.1002/prot.25882] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/12/2019] [Revised: 02/13/2020] [Accepted: 02/13/2020] [Indexed: 02/03/2023]
Abstract
Nostoc sp. PCC 7120 are filamentous cyanobacteria capable of both oxygenic photosynthesis and nitrogen fixation, with the latter taking place in specialized cells known as heterocysts that terminally differentiate from vegetative cells under conditions of nitrogen starvation. Cyanobacteria have existed on earth for more than 2 billion years and are thought to be responsible for oxygenation of the earth's atmosphere. Filamentous cyanobacteria such as Nostoc sp. PCC 7120 may also represent the oldest multicellular organisms on earth that undergo cell differentiation. Pentapeptide repeat proteins (PRPs), which occur most abundantly in cyanobacteria, adopt a right-handed quadrilateral β-helical structure, also referred to as a repeat five residue (Rfr) fold, with four-consecutive pentapeptide repeats constituting a single coil in the β-helical structure. PRPs are predicted to exist in all compartments within cyanobacteria including the thylakoid and cell-wall membranes as well as the cytoplasm and thylakoid periplasmic space. Despite their intriguing structure and importance to understanding ancient cyanobacteria, the biochemical function of PRPs in cyanobacteria remains largely unknown. Here we report the crystal structure of Alr1298, a PRP from Nostoc sp. PCC 7120 predicted to reside in the cytoplasm. The structure displays the typical right-handed quadrilateral β-helical structure and includes a four-α-helix cluster capping the N-terminus and a single α-helix capping the C-terminus. A gene cluster analysis indicated that Alr1298 may belong to an operon linked to cell proliferation and/or thylakoid biogenesis. Elevated alr1298 gene expression following nitrogen starvation indicates that Alr1298 may play a role in response to nitrogen starvation and/or heterocyst differentiation.
Affiliation(s)
- Ruojing Zhang
  - Department of Chemistry and Biochemistry, Miami University, Oxford, Ohio
- Shuisong Ni
  - Department of Chemistry and Biochemistry, Miami University, Oxford, Ohio
- Michael A Kennedy
  - Department of Chemistry and Biochemistry, Miami University, Oxford, Ohio
8
Ding X, Zou Z, Brooks III CL. Deciphering protein evolution and fitness landscapes with latent space models. Nat Commun 2019; 10:5644. [PMID: 31822668 PMCID: PMC6904478 DOI: 10.1038/s41467-019-13633-0] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 01/15/2019] [Accepted: 11/12/2019] [Indexed: 12/03/2022] Open
Abstract
Protein sequences contain rich information about protein evolution, fitness landscapes, and stability. Here we investigate how latent space models trained using variational auto-encoders can infer these properties from sequences. Using both simulated and real sequences, we show that the low dimensional latent space representation of sequences, calculated using the encoder model, captures both evolutionary and ancestral relationships between sequences. Together with experimental fitness data and Gaussian process regression, the latent space representation also enables learning the protein fitness landscape in a continuous low dimensional space. Moreover, the model is also useful in predicting protein mutational stability landscapes and quantifying the importance of stability in shaping protein evolution. Overall, we illustrate that the latent space models learned using variational auto-encoders provide a mechanism for exploration of the rich data contained in protein sequences regarding evolution, fitness and stability and hence are well-suited to help guide protein engineering efforts.
Affiliation(s)
- Xinqiang Ding
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Zhengting Zou
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA
- Charles L Brooks III
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA
  - Biophysics Program, University of Michigan, Ann Arbor, MI, 48109, USA
9
Koirala M, Alexov E. Computational chemistry methods to investigate the effects caused by DNA variants linked with disease. Journal of Theoretical & Computational Chemistry 2019. [DOI: 10.1142/s0219633619300015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 02/06/2023]
Abstract
Computational chemistry offers variety of tools to study properties of biological macromolecules. These tools vary in terms of levels of details from quantum mechanical treatment to numerous macroscopic approaches. Here, we provide a review of computational chemistry algorithms and tools for modeling the effects of genetic variations and their association with diseases. Particular emphasis is given on modeling the effects of missense mutations on stability, conformational dynamics, binding, hydrogen bond network, salt bridges, and pH-dependent properties of the corresponding macromolecules. It is outlined that the disease may be caused by alteration of one or several of above-mentioned biophysical characteristics, and a successful prediction of pathogenicity requires detailed analysis of how the alterations affect the function of involved macromolecules. The review provides a short list of most commonly used algorithms to predict the molecular effects of mutations as well.
Affiliation(s)
- Mahesh Koirala
  - Department of Physics and Astronomy, Clemson University, Clemson, SC 29630, USA
- Emil Alexov
  - Department of Physics and Astronomy, Clemson University, Clemson, SC 29630, USA
10
Bandaru S, Alvala M, Nayarisseri A, Sharda S, Goud H, Mundluru HP, Singh SK. Molecular dynamic simulations reveal suboptimal binding of salbutamol in T164I variant of β2 adrenergic receptor. PLoS One 2017; 12:e0186666. [PMID: 29053759 PMCID: PMC5650161 DOI: 10.1371/journal.pone.0186666] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 05/17/2017] [Accepted: 10/05/2017] [Indexed: 01/09/2023] Open
Abstract
The natural variant C491T (rs1800088) in ADRB2 gene substitutes Threonine to Isoleucine at 164th position in β2AR and results in receptor sequestration and altered binding of agonists. Present investigation pursues to identify the effect of T164I variation on function and structure of β2AR through systematic computational approaches. The study, in addition, addresses altered binding of salbutamol in T164I variant through molecular dynamic simulations. Methods involving changes in free energy, solvent accessibility surface area, root mean square deviations and analysis of binding cavity revealed structural perturbations in receptor to incur upon T164I substitution. For comprehensive understanding of receptor upon substitution, OPLS force field aided molecular dynamic simulations were performed for 10 ns. Simulations revealed massive structural departure for T164I β2AR variant from the native state along with considerably higher root mean square fluctuations of residues near the cavity. Affinity prediction by molecular docking showed two folds reduced affinity of salbutamol in T164I variant. To validate the credibility docking results, simulations for ligand-receptor complex were performed which demonstrated unstable salbutamol-T164I β2AR complex formation. Further, analysis of interactions in course of simulations revealed reduced ligand-receptor interactions of salbutamol in T164I variant. Taken together, studies herein provide structural rationales for suboptimal binding of salbutamol in T164I variant through integrated molecular modeling approaches.
Affiliation(s)
- Srinivas Bandaru
  - Institute of Genetics and Hospital for Genetic Diseases, Osmania University, Hyderabad, India
  - Molecular Modeling Lab, Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research, Hyderabad, India
- Mallika Alvala
  - Molecular Modeling Lab, Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research, Hyderabad, India
- Anuraj Nayarisseri
  - In Silico Research Laboratory, Eminent Biosciences, Indore, Madhya Pradesh, India
  - Bioinformatics Research Laboratory, LeGene Biosciences Private Limited, Indore, Madhya Pradesh, India
  - Computer Aided Drug Designing and Molecular Modeling Lab, Department of Bioinformatics, Alagappa University, Karaikudi, Tamil Nadu, India
- Saphy Sharda
  - In Silico Research Laboratory, Eminent Biosciences, Indore, Madhya Pradesh, India
- Himshikha Goud
  - In Silico Research Laboratory, Eminent Biosciences, Indore, Madhya Pradesh, India
- Hema Prasad Mundluru
  - Institute of Genetics and Hospital for Genetic Diseases, Osmania University, Hyderabad, India
- Sanjeev Kumar Singh
  - Computer Aided Drug Designing and Molecular Modeling Lab, Department of Bioinformatics, Alagappa University, Karaikudi, Tamil Nadu, India
11
Lugo-Martinez J, Pejaver V, Pagel KA, Jain S, Mort M, Cooper DN, Mooney SD, Radivojac P. The Loss and Gain of Functional Amino Acid Residues Is a Common Mechanism Causing Human Inherited Disease. PLoS Comput Biol 2016; 12:e1005091. [PMID: 27564311 PMCID: PMC5001644 DOI: 10.1371/journal.pcbi.1005091] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 07/06/2015] [Accepted: 08/02/2016] [Indexed: 01/12/2023] Open
Abstract
Elucidating the precise molecular events altered by disease-causing genetic variants represents a major challenge in translational bioinformatics. To this end, many studies have investigated the structural and functional impact of amino acid substitutions. Most of these studies were however limited in scope to either individual molecular functions or were concerned with functional effects (e.g. deleterious vs. neutral) without specifically considering possible molecular alterations. The recent growth of structural, molecular and genetic data presents an opportunity for more comprehensive studies to consider the structural environment of a residue of interest, to hypothesize specific molecular effects of sequence variants and to statistically associate these effects with genetic disease. In this study, we analyzed data sets of disease-causing and putatively neutral human variants mapped to protein 3D structures as part of a systematic study of the loss and gain of various types of functional attribute potentially underlying pathogenic molecular alterations. We first propose a formal model to assess probabilistically function-impacting variants. We then develop an array of structure-based functional residue predictors, evaluate their performance, and use them to quantify the impact of disease-causing amino acid substitutions on catalytic activity, metal binding, macromolecular binding, ligand binding, allosteric regulation and post-translational modifications. We show that our methodology generates actionable biological hypotheses for up to 41% of disease-causing genetic variants mapped to protein structures suggesting that it can be reliably used to guide experimental validation. Our results suggest that a significant fraction of disease-causing human variants mapping to protein structures are function-altering both in the presence and absence of stability disruption. 
Identifying the molecular changes caused by mutations is a major challenge in understanding and treating human genetic disease. To address this problem, we have developed a wide range of profiling tools designed to predict specific types of functional site from protein 3D structures. We then apply these tools to data sets of inherited disease-associated and putatively neutral amino acid substitutions and estimate the relative contribution of the loss and gain of functional residues in disease. Our results suggest that alterations of molecular function are involved in a significant number of cases of human genetic disease and are over-represented as compared to putatively neutral variants. Additionally, we use experimental data to show that it is possible to computationally identify the loss of specific functional events in disease pathogenesis. Finally, our methodology can be used to reliably identify the potential molecular consequences of disease-causing genetic variants and hence prioritize experimental validation.
Affiliation(s)
- Jose Lugo-Martinez
  - Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana, United States of America
- Vikas Pejaver
  - Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana, United States of America
- Kymberleigh A. Pagel
  - Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana, United States of America
- Shantanu Jain
  - Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana, United States of America
- Matthew Mort
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, United Kingdom
- David N. Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, United Kingdom
- Sean D. Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, United States of America
- Predrag Radivojac
  - Department of Computer Science and Informatics, Indiana University, Bloomington, Indiana, United States of America
12
Structure Based Thermostability Prediction Models for Protein Single Point Mutations with Machine Learning Tools. PLoS One 2015; 10:e0138022. [PMID: 26361227 PMCID: PMC4567301 DOI: 10.1371/journal.pone.0138022] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 05/07/2015] [Accepted: 08/24/2015] [Indexed: 11/19/2022] Open
Abstract
Thermostability issue of protein point mutations is a common occurrence in protein engineering. An application which predicts the thermostability of mutants can be helpful for guiding decision making process in protein design via mutagenesis. An in silico point mutation scanning method is frequently used to find “hot spots” in proteins for focused mutagenesis. ProTherm (http://gibk26.bio.kyutech.ac.jp/jouhou/Protherm/protherm.html) is a public database that consists of thousands of protein mutants’ experimentally measured thermostability. Two data sets based on two differently measured thermostability properties of protein single point mutations, namely the unfolding free energy change (ddG) and melting temperature change (dTm) were obtained from this database. Folding free energy change calculation from Rosetta, structural information of the point mutations as well as amino acid physical properties were obtained for building thermostability prediction models with informatics modeling tools. Five supervised machine learning methods (support vector machine, random forests, artificial neural network, naïve Bayes classifier, K nearest neighbor) and partial least squares regression are used for building the prediction models. Binary and ternary classifications as well as regression models were built and evaluated. Data set redundancy and balancing, the reverse mutations technique, feature selection, and comparison to other published methods were discussed. Rosetta calculated folding free energy change ranked as the most influential features in all prediction models. Other descriptors also made significant contributions to increasing the accuracy of the prediction models.
|
13
|
Narayana Swamy A, Valasala H, Kamma S. In silico Evaluation of Nonsynonymous Single Nucleotide Polymorphisms in the ADIPOQ Gene Associated with Diabetes, Obesity, and Inflammation. Avicenna J Med Biotechnol 2015; 7:121-7. [PMID: 26306152 PMCID: PMC4508335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2015] [Accepted: 05/25/2015] [Indexed: 10/29/2022] Open
Abstract
BACKGROUND The human ADIPOQ gene encodes the protein hormone adiponectin, which is involved in regulating glucose levels as well as fatty acid breakdown. It is produced exclusively by adipose tissue and is abundant in the circulation, at a concentration of around 0.01% of total serum proteins, with an important effect on metabolism. METHODS The most deleterious nonsynonymous single nucleotide polymorphisms (nsSNPs) in the coding region of the ADIPOQ gene were investigated using SNP databases, and the detected nonsynonymous variants were analyzed in silico with respect to protein function using SIFT, PolyPhen-2, and PROVEAN, and with respect to protein stability using MUpro and I-Mutant2.0. RESULT A total of 58 nonsynonymous SNPs, consisting of 55 missense and 3 nonsense variations, were found in the ADIPOQ gene. Of the 55 missense variants, 14 were predicted to be damaging or deleterious by three different software programs (PolyPhen-2, SIFT, and PROVEAN), and 38 were predicted to be less stable (I-Mutant 2.0 and MUpro). In total, 10 of the 55 missense variants were predicted to be both deleterious and destabilizing. Additionally, the 3 nonsense variants were predicted to produce a truncated ADIPOQ protein. RMSD and total energy were calculated for 4 of the 10 nsSNPs that were both deleterious and destabilizing. CONCLUSION rs144526209 has a high root-mean-square deviation (RMSD) and a lower total energy value compared to the native modeled structure. It was concluded that this nsSNP, potentially functional and polymorphic in the ADIPOQ gene, might be associated with diabetes, obesity, and inflammation.
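The filtering logic described in this abstract (variants flagged by every function predictor, then intersected with variants flagged as destabilizing) can be sketched with plain set operations. The variant identifiers below are hypothetical placeholders, not the ADIPOQ results, and requiring agreement of both stability tools is an assumption about how the two tools were combined.

```python
def consensus(*calls: set) -> set:
    """Return the variants flagged by every predictor in `calls`."""
    if not calls:
        return set()
    return set.intersection(*calls)


# Hypothetical per-tool calls; "v1" .. "v5" are made-up variant identifiers.
function_calls = {
    "sift": {"v1", "v2", "v3"},
    "polyphen2": {"v1", "v2"},
    "provean": {"v1", "v2", "v4"},
}
stability_calls = {
    "imutant": {"v2", "v3"},
    "mupro": {"v2", "v5"},
}

deleterious = consensus(*function_calls.values())      # flagged by all three function tools
destabilizing = consensus(*stability_calls.values())   # flagged by both stability tools
both = deleterious & destabilizing                     # candidates for structural follow-up
```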
Affiliation(s)
- Sreenivasulu Kamma
- Corresponding author: Sreenivasulu Kamma, Ph.D., Department of Biotechnology, KL University, Vaddeswaram, Vijayawada, A.P India, Tel: +919 849519527, E-mail:
|
14
|
Rosse SA, Auer PL, Carlson CS. Functional annotation of putative regulatory elements at cancer susceptibility Loci. Cancer Inform 2014; 13:5-17. [PMID: 25288875 PMCID: PMC4179605 DOI: 10.4137/cin.s13789] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 03/16/2014] [Revised: 06/16/2014] [Accepted: 06/17/2014] [Indexed: 01/07/2023] Open
Abstract
Most cancer-associated genetic variants identified from genome-wide association studies (GWAS) do not obviously change protein structure, leading to the hypothesis that the associations are attributable to regulatory polymorphisms. Translating genetic associations into mechanistic insights can be facilitated by knowledge of the causal regulatory variant (or variants) responsible for the statistical signal. Experimental validation of candidate functional variants is onerous, making bioinformatic approaches necessary to prioritize candidates for laboratory analysis. Thus, a systematic approach for recognizing functional (and, therefore, likely causal) variants in noncoding regions is an important step toward interpreting cancer risk loci. This review provides a detailed introduction to current regulatory variant annotations, followed by an overview of how to leverage these resources to prioritize candidate functional polymorphisms in regulatory regions.
Affiliation(s)
- Stephanie A Rosse
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Paul L Auer
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
- Christopher S Carlson
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Department of Epidemiology, University of Washington, Seattle, WA, USA
|
15
|
Peterson LX, Kang X, Kihara D. Assessment of protein side-chain conformation prediction methods in different residue environments. Proteins 2014; 82:1971-84. [PMID: 24619909 PMCID: PMC5007623 DOI: 10.1002/prot.24552] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 01/15/2014] [Revised: 03/02/2014] [Accepted: 03/07/2014] [Indexed: 11/09/2022]
Abstract
Computational prediction of side-chain conformation is an important component of protein structure prediction. Accurate side-chain prediction is crucial for practical applications of protein structure models that need atomic-detailed resolution such as protein and ligand design. We evaluated the accuracy of eight side-chain prediction methods in reproducing the side-chain conformations of experimentally solved structures deposited to the Protein Data Bank. Prediction accuracy was evaluated for a total of four different structural environments (buried, surface, interface, and membrane-spanning) in three different protein types (monomeric, multimeric, and membrane). Overall, the highest accuracy was observed for buried residues in monomeric and multimeric proteins. Notably, side-chains at protein interfaces and membrane-spanning regions were better predicted than surface residues even though the methods did not all use multimeric and membrane proteins for training. Thus, we conclude that the current methods are as practically useful for modeling protein docking interfaces and membrane-spanning regions as for modeling monomers.
Affiliation(s)
- Lenna X. Peterson
- Department of Biological Sciences, Purdue University, West Lafayette IN, 47907, USA
- Xuejiao Kang
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette IN, 47907, USA
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
|
16
|
Dudek MJ. A detailed representation of electrostatic energy in prediction of sequence and pH dependence of protein stability. Proteins 2014; 82:2497-511. [DOI: 10.1002/prot.24613] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/08/2014] [Revised: 05/11/2014] [Accepted: 05/15/2014] [Indexed: 11/05/2022]
Affiliation(s)
- Michael J. Dudek
- Protabit LLC; 250 S Oak Knoll Ave. #211 Pasadena California 91101
|
17
|
Peterson TA, Doughty E, Kann MG. Towards precision medicine: advances in computational approaches for the analysis of human variants. J Mol Biol 2013; 425:4047-63. [PMID: 23962656 PMCID: PMC3807015 DOI: 10.1016/j.jmb.2013.08.008] [Citation(s) in RCA: 93] [Impact Index Per Article: 8.5] [Received: 06/04/2013] [Revised: 08/07/2013] [Accepted: 08/08/2013] [Indexed: 12/26/2022]
Abstract
Variations and similarities in our individual genomes are part of our history, our heritage, and our identity. Some human genomic variants are associated with common traits such as hair and eye color, while others are associated with susceptibility to disease or response to drug treatment. Identifying the human variations producing clinically relevant phenotypic changes is critical for providing accurate and personalized diagnosis, prognosis, and treatment for diseases. Furthermore, a better understanding of the molecular underpinning of disease can lead to development of new drug targets for precision medicine. Several resources have been designed for collecting and storing human genomic variations in highly structured, easily accessible databases. Unfortunately, a vast amount of information about these genetic variants and their functional and phenotypic associations is currently buried in the literature, accessible only by manual curation or by sophisticated text-mining technology that extracts the relevant information. In addition, the low cost of sequencing technologies coupled with increasing computational power has enabled the development of numerous computational methodologies to predict the pathogenicity of human variants. This review provides a detailed comparison of current human variant resources, including HGMD, OMIM, ClinVar, and UniProt/Swiss-Prot, followed by an overview of the computational methods and techniques used to leverage the available data to predict novel deleterious variants. We expect these resources and tools to become the foundation for understanding the molecular details of genomic variants leading to disease, which in turn will enable the promise of precision medicine.
Affiliation(s)
- Thomas A Peterson
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
- Emily Doughty
- Biomedical Informatics Program, Stanford University, Stanford, CA 94305, USA
- Maricel G Kann
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA
|
18
|
Verma R, Schwaneberg U, Roccatano D. Computer-Aided Protein Directed Evolution: a Review of Web Servers, Databases and other Computational Tools for Protein Engineering. Comput Struct Biotechnol J 2012; 2:e201209008. [PMID: 24688649 PMCID: PMC3962222 DOI: 10.5936/csbj.201209008] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Received: 07/31/2012] [Revised: 10/07/2012] [Accepted: 10/12/2012] [Indexed: 12/01/2022] Open
Abstract
The combination of computational and directed evolution methods has proven a winning strategy for protein engineering. We refer to this approach as computer-aided protein directed evolution (CAPDE), and this review summarizes recent developments in this rapidly growing field. We restrict ourselves to an overview of the availability, usability and limitations of web servers, databases and other computational tools proposed in the last five years. The goal of this review is to provide concise information about currently available computational resources to assist the design of directed evolution based protein engineering experiments.
Affiliation(s)
- Rajni Verma
- School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, 28759 Bremen, Germany; Department of Biotechnology, RWTH Aachen University, Worringer Weg 1, 52074 Aachen, Germany
- Ulrich Schwaneberg
- Department of Biotechnology, RWTH Aachen University, Worringer Weg 1, 52074 Aachen, Germany
- Danilo Roccatano
- School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, 28759 Bremen, Germany
|
19
|
In silico analysis of single nucleotide polymorphism (SNPs) in human β-globin gene. PLoS One 2011; 6:e25876. [PMID: 22028795 PMCID: PMC3197589 DOI: 10.1371/journal.pone.0025876] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 07/23/2011] [Accepted: 09/12/2011] [Indexed: 11/23/2022] Open
Abstract
Single amino acid substitutions in the globin chain are the most common forms of genetic variation that produce hemoglobinopathies, the most widespread inherited disorders worldwide. Several hemoglobinopathies result from homozygosity or compound heterozygosity for beta-globin (HBB) gene mutations, such as those producing sickle cell hemoglobin (HbS), HbC, HbD and HbE. Several of these mutations are deleterious and result in moderate to severe hemolytic anemia, with associated complications, requiring lifelong care and management. Even though many hemoglobinopathies result from single amino acid changes producing similar structural abnormalities, there are functional differences in the generated variants. Using in silico methods, we examined the genetic variations that can alter the expression and function of the HBB gene. Using the sequence homology-based Sorting Intolerant from Tolerant (SIFT) server, we searched for SNPs and found 200 (80%) non-synonymous polymorphisms to be deleterious. The structure-based method via the PolyPhen server indicated that 135 (40%) non-synonymous polymorphisms may modify protein function and structure. The Pupa Suite software showed that the SNPs will have a phenotypic consequence on the structure and function of the altered protein. Structure analysis was performed on the key mutations in the native protein coded by the HBB gene that cause hemoglobinopathies, such as HbC (E→K), HbD (E→Q), HbE (E→K) and HbS (E→V). Atomic Non-Local Environment Assessment (ANOLEA), Yet Another Scientific Artificial Reality Application (YASARA), the CHARMM-GUI webserver for macromolecular dynamics and mechanics, and Normal Mode Analysis, Deformation and Refinement (NOMAD-Ref) of the Gromacs server were used to perform molecular dynamics simulations and energy minimization calculations on β-chain residues of the HBB gene before and after mutation.
Furthermore, in the native and altered protein models, amino acid residues were determined and secondary structures were observed for solvent accessibility to confirm the protein stability. The functional study in this investigation may be a good model for additional future studies.
|
20
|
Horst JA, Wang K, Horst OV, Cunningham ML, Samudrala R. Disease risk of missense mutations using structural inference from predicted function. Curr Protein Pept Sci 2011; 11:573-88. [PMID: 20887259 DOI: 10.2174/138920310794109139] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 02/13/2010] [Accepted: 07/27/2010] [Indexed: 12/17/2022]
Abstract
Advancements in sequencing techniques place personalized genomic medicine upon the horizon, bringing along the responsibility of clinicians to understand the likelihood for a mutation to cause disease, and of scientists to separate etiology from nonpathologic variability. Pathogenicity is discernable from patterns of interactions between a missense mutation, the surrounding protein structure, and intermolecular interactions. Physicochemical stability calculations are not accessible without structures, as is the case for the vast majority of human proteins, so diagnostic accuracy remains in infancy. To model the effects of missense mutations on functional stability without structure, we combine novel protein sequence analysis algorithms to discern spatial distributions of sequence, evolutionary, and physicochemical conservation, through a new approach to optimize component selection. Novel components include a combinatory substitution matrix and two heuristic algorithms that detect positions which confer structural support to interaction interfaces. The method reaches 0.91 AUC in ten-fold cross-validation to predict alteration of function for 6,392 in vitro mutations. For clinical utility we trained the method on 7,022 disease associated missense mutations within the Online Mendelian inheritance in man amongst a larger randomized set. In a blinded prospective test to delineate mutations unique to 186 patients with craniosynostosis from those in the 95 highly variant Coriell controls and 1000 age matched controls, we achieved roughly 1/3 sensitivity and perfect specificity. The component algorithms retained during machine learning constitute novel protein sequence analysis techniques to describe environments supporting neutrality or pathology of mutations. This approach to pathogenetics enables new insight into the mechanistic relationship of missense mutations to disease phenotypes in our patients.
Affiliation(s)
- Jeremy A Horst
- Department of Microbiology School of Medicine, University of Washington, 1959 NE Pacific St 357132, Seattle, WA 98195, USA
|
21
|
Andreotti G, Guarracino MR, Cammisa M, Correra A, Cubellis MV. Prediction of the responsiveness to pharmacological chaperones: lysosomal human alpha-galactosidase, a case of study. Orphanet J Rare Dis 2010; 5:36. [PMID: 21138548 PMCID: PMC3016270 DOI: 10.1186/1750-1172-5-36] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 05/26/2010] [Accepted: 12/07/2010] [Indexed: 01/23/2023] Open
Abstract
Background The pharmacological chaperones therapy is a promising approach to cure genetic diseases. It relies on substrate competitors used at sub-inhibitory concentration which can be administered orally, reach difficult tissues and have low cost. Clinical trials are currently carried out for Fabry disease, a lysosomal storage disorder caused by inherited genetic mutations of alpha-galactosidase. Regrettably, not all genotypes respond to these drugs. Results We collected the experimental data available in literature on the enzymatic activity of ninety-six missense mutants of lysosomal alpha-galactosidase measured in the presence of pharmacological chaperones. We associated with each mutation seven features derived from the analysis of 3D-structure of the enzyme, two features associated with their thermo-dynamic stability and four features derived from sequence alone. Structural and thermodynamic analysis explains why some mutants of human lysosomal alpha-galactosidase cannot be rescued by pharmacological chaperones: approximately forty per cent of the non responsive cases examined can be correctly associated with a negative prognostic feature. They include mutations occurring in the active site pocket, mutations preventing disulphide bridge formation and severely destabilising mutations. Despite this finding, prediction of mutations responsive to pharmacological chaperones cannot be achieved with high accuracy relying on combinations of structure- and thermodynamic-derived features even with the aid of classical and state of the art statistical learning methods. We developed a procedure to predict responsive mutations with an accuracy as high as 87%: the method scores the mutations by using a suitable position-specific substitution matrix. Our approach is of general applicability since it does not require the knowledge of 3D-structure but relies only on the sequence. 
Conclusions Responsiveness to pharmacological chaperones depends on the structural/functional features of the disease-associated protein, whose complex interplay is best reflected on sequence conservation by evolutionary pressure. We propose a predictive method which can be applied to screen novel mutations of alpha galactosidase. The same approach can be extended on a genomic scale to find candidates for therapy with pharmacological chaperones among proteins with unknown tertiary structures.
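A minimal sketch of scoring a missense mutation with a position-specific substitution matrix, as this abstract describes in general terms. The matrix values, the zero threshold, and the direction of the rule (a higher score, meaning a better tolerated substitution, is treated as more likely to be responsive) are assumptions for illustration and are not taken from the paper.

```python
def substitution_score(pssm: list, position: int, mutant: str) -> float:
    """Look up the score of `mutant` at a zero-based sequence position."""
    return pssm[position][mutant]


def predict_responsive(pssm: list, position: int, mutant: str,
                       threshold: float = 0.0) -> bool:
    """Hypothetical decision rule: scores at or above the threshold count as responsive."""
    return substitution_score(pssm, position, mutant) >= threshold


# Toy two-position matrix with made-up scores for two residue types.
toy_pssm = [
    {"A": 2.0, "G": -1.0},
    {"A": -3.0, "G": 4.0},
]
```

Because the lookup needs only a sequence-derived matrix, a rule of this kind can be applied to proteins without a known 3D structure, which is the property the abstract emphasizes.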
|
22
|
Mehta KR, Chan YM, Lee MX, Yang CY, Voloshchuk N, Montclare JK. Mutagenesis of tGCN5 core region reveals two critical surface residues F90 and R140. Biochem Biophys Res Commun 2010; 400:363-8. [DOI: 10.1016/j.bbrc.2010.08.069] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/07/2010] [Accepted: 08/17/2010] [Indexed: 12/01/2022]
|
23
|
Huang RB, Du QS, Wang CH, Liao SM, Chou KC. A fast and accurate method for predicting pKa of residues in proteins. Protein Eng Des Sel 2010; 23:35-42. [PMID: 19926592 DOI: 10.1093/protein/gzp067] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022] Open
Abstract
Predicting the pH-activities of residues in proteins is an important problem in enzyme engineering and protein design. A novel predictor called 'Pred-pKa' was developed based on the physicochemical properties of amino acids and protein 3D structure. The Pred-pKa approach considers the influence of all other residues of the protein to predict the pKa value of an ionizable residue. An empirical equation was formulated in which the pKa value is a distance-dependent function of the physicochemical parameters of the 20 amino acid types, describing their electrostatic and van der Waals interactions as well as the effects of hydrogen bonds and solvation. Two sets of coefficients, {a_α} and {b_l}, are used in the predictor: {a_α} are the weight factors of the 20 amino acid types and {b_l} are the weight factors of the physicochemical properties of amino acids. An iterative double least squares procedure was proposed to solve for the two sets of weight factors alternately and iteratively on a training set. The two coefficient sets {a_α} and {b_l} thus obtained were used to predict the pKa values of residues in a protein. The average predictive error is ±0.6 pH units, and a prediction takes less than a minute on a common personal computer.
Affiliation(s)
- Ri-Bo Huang
- Guangxi Academy of Sciences, 98 Daling Road, Nanning, Guangxi 530004, People's Republic of China
|
24
|
Ozen A, Gönen M, Alpaydan E, Haliloğlu T. Machine learning integration for predicting the effect of single amino acid substitutions on protein stability. BMC Structural Biology 2009; 9:66. [PMID: 19840377 PMCID: PMC2777163 DOI: 10.1186/1472-6807-9-66] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 02/03/2009] [Accepted: 10/19/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Computational prediction of protein stability change due to single-site amino acid substitutions is of interest in protein design and analysis. We consider the following four ways to improve the performance of the currently available predictors: (1) We include additional sequence- and structure-based features, namely, the amino acid substitution likelihoods, the equilibrium fluctuations of the alpha- and beta-carbon atoms, and the packing density. (2) By implementing different machine learning integration approaches, we combine information from different features or representations. (3) We compare classification vs. regression methods to predict the sign vs. the output of stability change. (4) We allow a reject option for doubtful cases where the risk of misclassification is high. RESULTS We investigate three different approaches: early, intermediate and late integration, which respectively combine features, kernels over feature subsets, and decisions. We perform simulations on two data sets: (1) S1615 is used in previous studies, (2) S2783 is the updated version (as of July 2, 2009) extracted also from ProTherm. For S1615 data set, our highest accuracy using both sequence and structure information is 0.842 on cross-validation and 0.904 on testing using early integration. Newly added features, namely, local compositional packing and the mobility extent of the mutated residues, improve accuracy significantly with intermediate integration. For S2783 data set, we also train regression methods to estimate not only the sign but also the amount of stability change and apply risk-based classification to reject when the learner has low confidence and the loss of misclassification is high. The highest accuracy is 0.835 on cross-validation and 0.832 on testing using only sequence information. The percentage of false positives can be decreased to less than 0.005 by rejecting 10 per cent using late integration. 
CONCLUSION We find that in both early and late integration, combining inputs or decisions is useful in increasing accuracy. Intermediate integration allows assessing the contributions of individual features by looking at the assigned weights. Overall accuracy of regression is not better than that of classification but it has less false positives, especially when combined with the reject option. The server for stability prediction for three integration approaches and the data sets are available at http://www.prc.boun.edu.tr/appserv/prc/mlsta.
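The reject option and late integration described in this abstract can be illustrated with a small sketch. Averaging per-learner probabilities as the late-integration step, the 0.5 decision boundary, and the 0.1 reject margin are assumptions made for illustration; the paper's actual combination rule and risk-based criterion may differ.

```python
def late_integration(probabilities: list) -> float:
    """Combine per-learner probabilities of 'stabilizing' by simple averaging (assumed rule)."""
    if not probabilities:
        raise ValueError("need at least one learner output")
    return sum(probabilities) / len(probabilities)


def classify_with_reject(p_stabilizing: float, reject_margin: float = 0.1) -> str:
    """Predict the sign of the stability change, or reject when confidence is low."""
    if abs(p_stabilizing - 0.5) < reject_margin:
        return "reject"  # too close to the decision boundary to trust
    return "stabilizing" if p_stabilizing > 0.5 else "destabilizing"
```

Widening `reject_margin` trades coverage for fewer errors, which is the kind of trade-off behind rejecting a fixed share of doubtful cases.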
Affiliation(s)
- Ayşegül Ozen
- Department of Chemical Engineering, Polymer Research Center, Boğaziçi University, Istanbul, Turkey.
|
25
|
Lee S, Brown A, Pitt WR, Higueruelo AP, Gong S, Bickerton GR, Schreyer A, Tanramluk D, Baylay A, Blundell TL. Structural interactomics: informatics approaches to aid the interpretation of genetic variation and the development of novel therapeutics. Molecular BioSystems 2009; 5:1456-72. [DOI: 10.1039/b906402h] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 01/08/2023]
|
26
|
Demenkov PS, Aman EE, Ivanisenko VA. Prediction of the changes in thermodynamic stability of proteins caused by single amino acid substitutions. Biophysics (Nagoya-shi) 2008. [DOI: 10.1134/s0006350906070104] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022] Open
|
27
|
Bueno M, Camacho CJ, Sancho J. SIMPLE estimate of the free energy change due to aliphatic mutations: Superior predictions based on first principles. Proteins 2007; 68:850-62. [PMID: 17523191 DOI: 10.1002/prot.21453] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
The bioinformatics revolution of the last decade has been instrumental in the development of empirical potentials to quantitatively estimate protein interactions for modeling and design. Although computationally efficient, these potentials hide most of the relevant thermodynamics in 5-to-40 parameters that are fitted against a large experimental database. Here, we revisit this longstanding problem and show that a careful consideration of the change in hydrophobicity, electrostatics, and configurational entropy between the folded and unfolded state of aliphatic point mutations predicts 20-30% less false positives and yields more accurate predictions than any published empirical energy function. This significant improvement is achieved with essentially no free parameters, validating past theoretical and experimental efforts to understand the thermodynamics of protein folding. Our first principle analysis strongly suggests that both the solute-solute van der Waals interactions in the folded state and the electrostatics free energy change of exposed aliphatic mutations are almost completely compensated by similar interactions operating in the unfolded ensemble. Not surprisingly, the problem of properly accounting for the solvent contribution to the free energy of polar and charged group mutations, as well as of mutations that disrupt the protein backbone remains open.
Affiliation(s)
- Marta Bueno
- Department of Computational Biology, University of Pittsburgh, Pennsylvania, USA
|
28
|
Mechanisms for stabilisation and the maintenance of solubility in proteins from thermophiles. BMC Structural Biology 2007; 7:18. [PMID: 17394655 PMCID: PMC1851960 DOI: 10.1186/1472-6807-7-18] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.2] [Received: 11/17/2006] [Accepted: 03/29/2007] [Indexed: 01/26/2023]
Abstract
Background The database of protein structures contains representatives from organisms with a range of growth temperatures. Various properties have been studied in a search for the molecular basis of protein adaptation to higher growth temperature. Charged groups have emerged as key distinguishing factors for proteins from thermophiles and mesophiles. Results A dataset of 291 thermophile-derived protein structures is compared with mesophile proteins. Calculations of electrostatic interactions support the importance of charges, but indicate that increases in charge contribution to folded state stabilisation do not generally correlate with the numbers of charged groups. Relative propensities of charged groups vary, such as the substitution of glutamic for aspartic acid sidechains. Calculations suggest an energetic basis, with less dehydration for longer sidechains. Most other properties studied show weak or insignificant separation of proteins from moderate thermophiles or hyperthermophiles and mesophiles, including an estimate of the difference in sidechain rotameric entropy upon protein folding. An exception is increased burial of alanine and proline residues and decreased burial of phenylalanine, methionine, tyrosine and tryptophan in hyperthermophile proteins compared to those from mesophiles. Conclusion Since an increase in the number of charged groups for hyperthermophile proteins is separable from charged group contribution to folded state stability, we hypothesise that charged group propensity is important in the context of protein solubility and the prevention of aggregation. Accordingly we find some separation between mesophile and hyperthermophile proteins when looking at the largest surface patch that does not contain a charged sidechain. 
With regard to our observation that aromatic sidechains are less buried in hyperthermophile proteins, further analysis indicates that the placement of some of these groups may facilitate the reduction of folding fluctuations in proteins of the higher growth temperature organisms.
|
29
|
Huang LT, Gromiha MM, Hwang SF, Ho SY. Knowledge acquisition and development of accurate rules for predicting protein stability changes. Comput Biol Chem 2006; 30:408-15. [PMID: 17000135 DOI: 10.1016/j.compbiolchem.2006.06.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 03/30/2006] [Revised: 06/19/2006] [Accepted: 06/19/2006] [Indexed: 11/22/2022]
Abstract
Understanding the mechanisms by which protein stability changes is one of the most important and valuable tasks in molecular biology. Conventional methods for predicting protein stability changes mainly focus on improving prediction accuracy. However, it is also desirable to extract domain knowledge from large databases that is beneficial to accurate prediction of protein stability changes. This paper presents an interpretable prediction tree method (named iPTREE) that produces explanatory rules to uncover hidden knowledge while retaining high prediction accuracy, and consequently allows analysis of the factors influencing protein stability changes. To evaluate iPTREE and the knowledge it yields on protein stability changes, a thermodynamic dataset of 1615 single point mutants from ProTherm is adopted. As a predictor of protein stability changes, the rule-based approach achieves a prediction accuracy of 87%, which is better than other methods based on artificial neural networks (ANN) and support vector machines (SVM). Moreover, those methods lack the capacity for biological knowledge discovery. The human-interpretable rules produced by iPTREE reveal that temperature is a factor of concern in predicting protein stability changes. For example, one of the interpretable rules with high support is as follows: if the introduced residue type is alanine and the temperature is between 4 °C and 40 °C, then the stability change will be negative (destabilizing). The present study demonstrates that iPTREE can readily be applied to protein stability changes where more understandable knowledge is required.
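The example rule quoted in this abstract can be written as a small predicate. This is only a sketch of that single rule, not of iPTREE itself; the function name is hypothetical, and treating the 4 to 40 degrees C bounds as inclusive is an assumption.

```python
from typing import Optional


def alanine_temperature_rule(introduced_residue: str,
                             temperature_c: float) -> Optional[str]:
    """Return 'destabilizing' when the quoted rule fires, otherwise None (no prediction)."""
    # Rule from the abstract: introduced residue is alanine and
    # temperature lies between 4 and 40 degrees C (bounds assumed inclusive).
    if introduced_residue == "A" and 4.0 <= temperature_c <= 40.0:
        return "destabilizing"
    return None
```

A rule of this form is readable at a glance, which is the interpretability advantage the abstract contrasts with neural networks and support vector machines.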
Affiliation(s)
- Liang-Tsung Huang
- Institute of Information Engineering and Computer Science, Feng Chia University, Taichung 407, Taiwan
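The example rule quoted in the abstract can be written as a small predicate. The following is a minimal sketch, not the iPTREE implementation: the `Mutation` record, the `example_rule` function, the three-letter residue code, and the inclusive treatment of the 4 °C and 40 °C bounds are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mutation:
    """Hypothetical record holding the two features used by the quoted rule."""
    introduced_residue: str  # three-letter code of the new residue, e.g. "ALA"
    temperature_c: float     # measurement temperature in degrees Celsius


def example_rule(mutation: Mutation) -> Optional[str]:
    """Return 'destabilizing' when the quoted rule fires, otherwise None.

    Assumption: both temperature bounds are inclusive; the abstract only
    says "between 4 and 40 degrees C".
    """
    is_alanine = mutation.introduced_residue.upper() == "ALA"
    in_range = 4.0 <= mutation.temperature_c <= 40.0
    if is_alanine and in_range:
        return "destabilizing"
    return None  # the rule does not apply, so it makes no prediction


# Usage: the rule fires for an alanine substitution measured at 25 degrees C.
result = example_rule(Mutation("ALA", 25.0))
```

Returning `None` when the rule does not fire keeps "no prediction" distinct from a "stabilizing" call, which a full rule set would have to supply through other rules.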
|
30
|
Cheng J, Randall A, Baldi P. Prediction of protein stability changes for single-site mutations using support vector machines. Proteins 2006; 62:1125-32. [PMID: 16372356 DOI: 10.1002/prot.20810] [Citation(s) in RCA: 666] [Impact Index Per Article: 37.0] [Indexed: 12/19/2022]
Abstract
Accurate prediction of protein stability changes resulting from single amino acid mutations is important for understanding protein structures and designing new proteins. We use support vector machines to predict protein stability changes for single amino acid mutations leveraging both sequence and structural information. We evaluate our approach using cross-validation methods on a large dataset of single amino acid mutations. When only the sign of the stability changes is considered, the predictive method achieves 84% accuracy, a significant improvement over previously published results. Moreover, the experimental results show that the prediction accuracy obtained using sequence alone is close to the accuracy obtained using tertiary structure information. Because our method can accurately predict protein stability changes using primary sequence information only, it is applicable to many situations where the tertiary structure is unknown, overcoming a major limitation of previous methods which require tertiary information. The web server for predictions of protein stability changes upon mutations (MUpro), software, and datasets are available at http://www.igb.uci.edu/servers/servers.html.
Affiliation(s)
- Jianlin Cheng
- Institute for Genomics and Bioinformatics, School of Information and Computer Sciences, University of California, Irvine, California 92697-3425, USA
|
31
|
Hoppe C, Schomburg D. Prediction of protein thermostability with a direction- and distance-dependent knowledge-based potential. Protein Sci 2005; 14:2682-92. [PMID: 16155198 PMCID: PMC2253293 DOI: 10.1110/ps.04940705] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Indexed: 10/25/2022]
Abstract
The increasing use of enzymes in industrial processes and the importance of understanding protein folding and stability have led to several attempts to predict and quantify the effect of every possible amino acid exchange (mutation) on the thermostability of proteins. In this article we describe a knowledge-based discrimination function that acts as a fast and reliable guide in protein engineering and optimization. The function used consists of two parts, a pairwise energy function based on a distance- and direction-dependent atomic description of the amino acid environment, and a torsion angle energy function. In a first step a training set of 11 proteins including 646 mutant proteins with experimentally determined thermostability was used to optimize the knowledge-based energy functions. The resulting potential function was then tested using a test mutant database consisting of 918 various point mutations introduced in 27 proteins. The best correlation coefficient obtained for the experimental data and the predicted thermostability for the training set is r = 0.81 (561 data points). A total of 76% of the mutations could be predicted correctly as being either stabilizing or destabilizing. The results for the test set are r = 0.74 (747 data points) and 72%, respectively. The global correlation over the combined data (1308 mutants) obtained is 0.78.
Affiliation(s)
- Christian Hoppe
- Institut für Biochemie, Zülpicher Strasse 47, 50674 Köln, Germany
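The two figures of merit reported in the abstract, a correlation coefficient between experimental and predicted values and the fraction of mutations whose direction (stabilizing or destabilizing) is predicted correctly, can be computed as sketched below. The function names and the toy numbers are hypothetical and are not data from the paper; treating positive values as stabilizing is likewise an assumption, since the abstract does not state a sign convention.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def sign_accuracy(experimental, predicted):
    """Fraction of cases where prediction and experiment agree in direction.

    Assumption: values > 0 count as stabilizing; zero is grouped with the
    non-positive (destabilizing or neutral) class.
    """
    hits = sum(1 for e, p in zip(experimental, predicted) if (e > 0) == (p > 0))
    return hits / len(experimental)


# Toy, made-up stability changes used only to exercise the functions.
toy_experimental = [1.2, -0.5, -2.0, 0.3]
toy_predicted = [0.8, -0.1, -1.5, -0.2]

r = pearson_r(toy_experimental, toy_predicted)
accuracy = sign_accuracy(toy_experimental, toy_predicted)
```

The two measures answer different questions: a predictor can rank mutations well (high correlation) while still misclassifying the sign of small effects near zero, which is why abstracts in this list often report both.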
|
32
|
Gromiha MM, Selvaraj S. Inter-residue interactions in protein folding and stability. Prog Biophys Mol Biol 2004; 86:235-77. [PMID: 15288760 DOI: 10.1016/j.pbiomolbio.2003.09.003] [Citation(s) in RCA: 225] [Impact Index Per Article: 11.3] [Indexed: 12/01/2022]
Abstract
During the process of protein folding, the amino acid residues along the polypeptide chain interact with each other in a cooperative manner to form the stable native structure. The knowledge about inter-residue interactions in protein structures is very helpful to understand the mechanism of protein folding and stability. In this review, we introduce the classification of inter-residue interactions into short, medium and long range based on a simple geometric approach. The features of these interactions in different structural classes of globular and membrane proteins, and in various folds have been delineated. The development of contact potentials and the application of inter-residue contacts for predicting the structural class and secondary structures of globular proteins, solvent accessibility, fold recognition and ab initio tertiary structure prediction have been evaluated. Further, the relationship between inter-residue contacts and protein-folding rates has been highlighted. Moreover, the importance of inter-residue interactions in protein-folding kinetics and for understanding the stability of proteins has been discussed. In essence, the information gained from the studies on inter-residue interactions provides valuable insights for understanding protein folding and de novo protein design.
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Aomi Frontier Building 17F, 2-43 Aomi, Koto-ku, Tokyo 135-0064, Japan.
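One way to make a short/medium/long-range classification concrete is to bin residue pairs that are close in space by their separation along the sequence. The sketch below is an assumption-laden illustration rather than the procedure of the review: the 8 Å Cα distance cutoff, the separation thresholds (up to 2 residues short, 3 to 4 medium, more than 4 long), and the function name `classify_contacts` are not given in the abstract.

```python
import math


def classify_contacts(ca_coords, cutoff=8.0):
    """Count spatial contacts by sequence separation.

    ca_coords: list of (x, y, z) C-alpha coordinates in sequence order.
    cutoff:    distance in angstroms below which two residues are in contact
               (assumed value, not taken from the abstract).
    """
    counts = {"short": 0, "medium": 0, "long": 0}
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) > cutoff:
                continue  # not close in space, so not a contact
            separation = j - i
            if separation <= 2:
                counts["short"] += 1
            elif separation <= 4:
                counts["medium"] += 1
            else:
                counts["long"] += 1
    return counts


# Toy geometry: five residues spaced 10 angstroms apart on a line, plus a
# sixth residue folded back to sit 5 angstroms from the first one.
toy_coords = [(0, 0, 0), (10, 0, 0), (20, 0, 0), (30, 0, 0), (40, 0, 0), (0, 5, 0)]
toy_counts = classify_contacts(toy_coords)
```

In the toy geometry the only contact is between the first and sixth residues, five positions apart, so it is counted as long-range under the assumed thresholds.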
|
33
|
Kumar S, Nussinov R. Experiment-guided thermodynamic simulations on reversible two-state proteins: implications for protein thermostability. Biophys Chem 2004; 111:235-46. [PMID: 15501567 DOI: 10.1016/j.bpc.2004.06.005] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 03/01/2004] [Revised: 05/27/2004] [Accepted: 06/01/2004] [Indexed: 11/27/2022]
Abstract
Here, we perform protein thermodynamic simulations within a set of boundary conditions, effectively blanketing the experimental data. The thermodynamic parameters, melting temperature (TG), enthalpy change at the melting temperature (ΔHG) and heat capacity change (ΔCp) were systematically varied over the experimentally observed ranges for small single domain reversible two-state proteins. Parameter sets that satisfy the Gibbs-Helmholtz equation and yield a temperature of maximal stability (TS) around room temperature were selected. The results were divided into three categories by arbitrarily chosen TG ranges. The TG ranges in these categories correspond to typical values of the melting temperatures observed for the majority of the proteins from mesophilic, thermophilic and hyperthermophilic organisms. As expected, ΔCp values tend to be high in mesophiles and low in hyperthermophiles. An increase in TG is accompanied by an up-shift and broadening of the protein stability curves, albeit with a large scatter. Furthermore, the simulations reveal that the average ΔHG increases with TG up to approximately 360 K and becomes constant thereafter. ΔCp decreases with TG with different rates before and after approximately 360 K. This provides further justification for the separate grouping of proteins into thermophiles and hyperthermophiles to assess their thermodynamic differences. This analysis of the Gibbs-Helmholtz equation has allowed us to study the interdependence of the thermodynamic parameters TG, ΔHG and ΔCp and their derivatives in a more rigorous way than possible by the limited experimental protein thermodynamics data available in the literature. The results provide new insights into protein thermostability and suggest potential strategies for its manipulation.
Affiliation(s)
- Sandeep Kumar
- Department of Biological Sciences and Bioengineering, Indian Institute of Technology Kanpur, Kanpur, U.P. 208016, India
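The stability curve underlying such simulations follows from the Gibbs-Helmholtz equation. As a hedged sketch, assuming a temperature-independent heat capacity change, the unfolding free energy can be written as ΔG(T) = ΔH_G(1 − T/T_G) + ΔC_p[(T − T_G) − T ln(T/T_G)], and setting its temperature derivative to zero gives the temperature of maximal stability T_S = T_G exp(−ΔH_G/(T_G ΔC_p)). The function names and the parameter values below are illustrative and are not taken from the paper.

```python
import math


def delta_g(t, t_g, dh_g, dcp):
    """Unfolding free energy at temperature t (K).

    t_g:  melting temperature in K
    dh_g: enthalpy change at t_g (e.g. kcal/mol)
    dcp:  heat capacity change, assumed constant (e.g. kcal/mol/K)
    """
    return dh_g * (1.0 - t / t_g) + dcp * ((t - t_g) - t * math.log(t / t_g))


def temp_max_stability(t_g, dh_g, dcp):
    """Temperature at which delta_g is maximal (its derivative is zero)."""
    return t_g * math.exp(-dh_g / (t_g * dcp))


# Hypothetical parameter set, chosen only to exercise the functions.
T_G, DH_G, DCP = 330.0, 100.0, 2.0

t_s = temp_max_stability(T_G, DH_G, DCP)
dg_at_t_s = delta_g(t_s, T_G, DH_G, DCP)
```

Under the constant ΔC_p assumption, a larger ΔC_p makes the curve narrower, which is consistent with the abstract's observation that lower ΔC_p accompanies broader stability curves.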
|
34
|
Bordner AJ, Abagyan RA. Large-scale prediction of protein geometry and stability changes for arbitrary single point mutations. Proteins 2004; 57:400-13. [PMID: 15340927 DOI: 10.1002/prot.20185] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.7] [Indexed: 11/12/2022]
Abstract
We have developed a method to predict both the geometry and the relative stability of point mutants that may be used for arbitrary mutations. The geometry optimization procedure was first tested on a new benchmark of 2141 ordered pairs of X-ray crystal structures of proteins that differ by a single point mutation, the largest data set to date. An empirical energy function, which includes terms representing the energy contributions of the folded and denatured proteins and uses the predicted mutant side chain conformation, was fit to a training set consisting of half of a diverse set of 1816 experimental stability values for single point mutations in 81 different proteins. The data included a substantial number of small to large residue mutations not considered by previous prediction studies. After removing 22 (approximately 2%) outliers, the stability calculation gave a standard deviation of 1.08 kcal/mol with a correlation coefficient of 0.82. The prediction method was then tested on the remaining half of the experimental data, giving a standard deviation of 1.10 kcal/mol and covariance of 0.66 for 97% of the test set. A regression fit of the energy function to a subset of 137 mutants, for which both native and mutant structures were available, gave a prediction error comparable to that for the complete training set with predicted side chain conformations. We found that about half of the variation is due to conformation-independent residue contributions. Finally, a fit to the experimental stability data using these residue parameters exclusively suggests guidelines for improving protein stability in the absence of detailed structure information.
Affiliation(s)
- A J Bordner
- The Scripps Research Institute, 10550 North Torrey Pines Rd., Mail TPC-28, San Diego, California, USA.
|
35
|
Triantafillidou D, Persidou E, Lazarou D, Andrikopoulos P, Leontiadou F, Choli-Papadopoulou T. Structural destabilization of the recombinant thermophilic TthL11 ribosomal protein by a single amino acid substitution. Biol Chem 2004; 385:31-9. [PMID: 14977044 DOI: 10.1515/bc.2004.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Thermus thermophilus L11 protein has previously been reported to be resistant against tryptic and chymotryptic proteolysis under native conditions. With a single amino acid substitution, namely Trp101Arg, conformational changes were induced that resulted in the exhibition of specific amino acids that served as targets for tryptic and chymotryptic action and rendered the protein highly unstable even during purification. This unexpected process was evidenced by the isolation with size exclusion gel chromatography of the well-structured chymotryptic N-terminal domain in a high amount and its characterization both by Edman degradation and QTOF-EMS spectroscopy. On the other hand, the substitution of Val38Cys, which did not contribute to structural changes, indicates a very possible implication of this amino acid in the protein methylation process. The data reported in this work illustrate the distinctive amino acid dynamics in a thermophilic protein, which, while serving the function common to its counterparts from mesophilic organisms, has had to adapt to the extreme environmental conditions typical of thermophilic organisms.
Affiliation(s)
- Dimitra Triantafillidou
- Laboratory of Biochemistry, School of Chemistry, Aristotle University of Thessaloniki, GR-54006 Thessaloniki, Greece
|
36
|
Selvaraj S, Gromiha MM. Role of hydrophobic clusters and long-range contact networks in the folding of (alpha/beta)8 barrel proteins. Biophys J 2003; 84:1919-25. [PMID: 12609894 PMCID: PMC1302761 DOI: 10.1016/s0006-3495(03)75000-0] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.5] [Received: 04/24/2002] [Accepted: 11/13/2002] [Indexed: 10/21/2022] Open
Abstract
Analysis of the three-dimensional structures of (α/β)8 barrel proteins sheds light on the factors that are responsible for directing and maintaining their common fold. In this work, hydrophobically enriched clusters are identified in 92% of the considered (α/β)8 barrel proteins. The residue segments with hydrophobic clusters have high thermal stability. Further, these clusters are formed and stabilized through long-range interactions. Specifically, a network of long-range contacts connects adjacent β-strands of the (α/β)8 barrel domain and the hydrophobic clusters. The implications of hydrophobic clusters and long-range networks in providing a feasible common mechanism for the folding of (α/β)8 barrel proteins are proposed.
Affiliation(s)
- S Selvaraj
- Computational Biology Research Center (CBRC), Institute of Advanced Industrial Science and Technology (AIST) 2-41-6 Aomi, Koto-ku, Tokyo 135-0064, Japan
|
37
|
Zhou H, Zhou Y. Stability scale and atomic solvation parameters extracted from 1023 mutation experiments. Proteins 2002; 49:483-92. [PMID: 12402358 DOI: 10.1002/prot.10241] [Citation(s) in RCA: 60] [Impact Index Per Article: 2.7] [Indexed: 11/06/2022]
Abstract
The stability scale of 20 amino acid residues is derived from a database of 1023 mutation experiments on 35 proteins. The resulting scale of hydrophobic residues has an excellent correlation with the octanol-to-water transfer free energy corrected with an additional Flory-Huggins molar-volume term (correlation coefficient r = 0.95, slope = 1.05, and a near zero intercept). Thus, hydrophobic contribution to folding stability is characterized remarkably well by transfer experiments. However, no corresponding correlation is found for hydrophilic residues. Both the hydrophilic portion and the entire scale, however, correlate strongly with average burial accessible surface (r = 0.76 and 0.97, respectively). Such a strong correlation leads to a near uniform value of the atomic solvation parameters for atoms C, S, O/N, O(-0.5), and N(+0.5,1). All are in the range of 12-28 cal mol⁻¹ Å⁻², close to the original estimate of hydrophobic contribution of 25-30 cal mol⁻¹ Å⁻² to folding stability. Without any adjustable parameters, the new stability scale and new atomic solvation parameters yielded an accurate prediction of protein-protein binding free energy for a separate database of 21 protein-protein complexes (r = 0.80 and slope = 1.06, and r = 0.83 and slope = 0.93, respectively).
Affiliation(s)
- Hongyi Zhou
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology & Biophysics, State University of New York at Buffalo, Buffalo, New York 14214, USA
|
38
|
Abstract
Starting with the Protein Data Bank (PDB) as a common ancestor, the evolution of structural databases has been driven by the rapprochement of the structural world and the practical applications. The result is an impressive number of secondary structural databases that is welcomed by structural biologists and bioinformaticians but runs the risk of producing an embarrassment of riches among non-specialist users. Given that any profit depends on the number of customers, efficient interfaces between many structural data banks must be available to make their contents easily accessible. Increasing the information content of central structural repositories might be the best way to guide users through the many, sometimes overlapping databases.
Affiliation(s)
- Oliviero Carugo
- Protein Structure and Bioinformatics Group, International Centre for Genetic Engineering and Biotechnology, Area Science Park, Padriciano 99, 34012 Trieste, Italy.
|
39
|
Abstract
The structures of enzymes reflect two tendencies that appear opposed. On one hand, they fold into compact, stable structures; on the other hand, they bind a ligand and catalyze a reaction. To be stable, enzymes fold to maximize favorable interactions, forming a tightly packed hydrophobic core, exposing hydrophilic groups, and optimizing intramolecular hydrogen-bonding. To be functional, enzymes carve out an active site for ligand binding, exposing hydrophobic surface area, clustering like charges, and providing unfulfilled hydrogen bond donors and acceptors. Using AmpC β-lactamase, an enzyme that is well-characterized structurally and mechanistically, the relationship between enzyme stability and function was investigated by substituting key active-site residues and measuring the changes in stability and activity. Substitutions of catalytic residues Ser64, Lys67, Tyr150, Asn152, and Lys315 decrease the activity of the enzyme by 10³- to 10⁵-fold compared to wild-type. Concomitantly, many of these substitutions increase the stability of the enzyme significantly, by up to 4.7 kcal/mol. To determine the structural origins of stabilization, the crystal structures of four mutant enzymes were determined to between 1.90 Å and 1.50 Å resolution. These structures revealed several mechanisms by which stability was increased, including mimicry of the substrate by the substituted residue (S64D), relief of steric strain (S64G), relief of electrostatic strain (K67Q), and improved polar complementarity (N152H). These results suggest that the preorganization of functionality characteristic of active sites has come at a considerable cost to enzyme stability. In proteins of unknown function, the presence of such destabilized regions may indicate the presence of a binding site.
Affiliation(s)
- Beth M Beadle
- Department of Molecular Pharmacology and Biological Chemistry, Northwestern University School of Medicine, 303 East Chicago Avenue S215, Chicago, IL 60611-3008, USA
|
40
|
Gromiha MM, Oobatake M, Kono H, Uedaira H, Sarai A. Importance of mutant position in Ramachandran plot for predicting protein stability of surface mutations. Biopolymers 2002; 64:210-20. [PMID: 12115138 DOI: 10.1002/bip.10125] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.6] [Indexed: 11/12/2022]
Abstract
Understanding the mechanisms by which mutations affect protein stability is one of the most important problems in molecular biology. In this work, we analyzed the relationship between changes in protein stability caused by surface mutations and changes in 49 physicochemical, energetic, and conformational properties of amino acid residues. We found that the hydration entropy was the major contributor to the stability of surface mutations in helical segments; other properties responsible for size and volume of molecule also correlated significantly with stability. Classification of coil mutations based on their locations in the (phi-psi) map improved the correlation significantly, demonstrating the existence of a relationship between stability and strain energy, which indicates that the role of strain energy is very important for the stability of surface mutations. We observed that the inclusion of sequence and structural information raised the correlation, indicating the influence of surrounding residues on the stability of surface mutations. Further, we examined the previously reported "inverse relationship" between stability and hydrophobicity, and observed that the inverse hydrophobic effect was generally applicable only to coil mutations. The present study leads to a simple method for predicting protein stability changes caused by amino acid substitutions, which will be useful for protein engineering in designing novel proteins with increased stability and altered function.
Affiliation(s)
- M Michael Gromiha
- RIKEN Tsukuba Institute, Institute of Physical and Chemical Research, 3-1-1 Koyadai, Tsukuba, Ibaraki 305-0074, Japan
|
41
|
Guerois R, Nielsen JE, Serrano L. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J Mol Biol 2002; 320:369-87. [PMID: 12079393 DOI: 10.1016/s0022-2836(02)00442-4] [Citation(s) in RCA: 1295] [Impact Index Per Article: 58.9] [Indexed: 11/22/2022]
Abstract
We have developed a computer algorithm, FOLDEF (for FOLD-X energy function), to provide a fast and quantitative estimation of the importance of the interactions contributing to the stability of proteins and protein complexes. The predictive power of FOLDEF was tested on a very large set of point mutants (1088 mutants) spanning most of the structural environments found in proteins. FOLDEF uses a full atomic description of the structure of the proteins. The different energy terms taken into account in FOLDEF have been weighted using empirical data obtained from protein engineering experiments. First, we considered a training database of 339 mutants in nine different proteins and optimised the set of parameters and weighting factors that best accounted for the changes in stability of the mutants. The predictive power of the method was then tested using a blind test mutant database of 667 mutants, as well as a database of 82 protein-protein complex mutants. The global correlation obtained for 95% of the entire mutant database (1030 mutants) is 0.83 with a standard deviation of 0.81 kcal mol⁻¹ and a slope of 0.76. The present energy function uses a minimum of computational resources and can therefore easily be used in protein design algorithms, and in the field of protein structure and folding pathways prediction where one requires a fast and accurate energy function. FOLDEF is available via a web-interface at http://fold-x.embl-heidelberg.de
|
42
|
Gromiha MM, Uedaira H, An J, Selvaraj S, Prabakaran P, Sarai A. ProTherm, Thermodynamic Database for Proteins and Mutants: developments in version 3.0. Nucleic Acids Res 2002; 30:301-2. [PMID: 11752320 PMCID: PMC99068 DOI: 10.1093/nar/30.1.301] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022] Open
Abstract
The current release of ProTherm, Thermodynamic Database for Proteins and Mutants, contains more than 10 000 numerical data (300% of the first version) of several thermodynamic parameters, experimental methods and conditions, reversibility of folding, details about the surrounding residues in space for all mutants, structural, functional and literature information. In the current version, we have added information about the source of each protein, identification codes for SWISS-PROT and Protein Information Resource and unique Protein Data Bank (PDB) code for proteins with relevant source. We have also provided additional options to search for data based on PDB code, number of states and reversibility. ProTherm is cross-linked with other sequence, structural, functional and literature databases, and the mutant sites and surrounding residues are automatically mapped on the structure. The ProTherm database is freely available at http://www.rtc.riken.go.jp/jouhou/protherm/protherm.html.
Affiliation(s)
- M Michael Gromiha
- RIKEN Tsukuba Institute, Institute of Physical and Chemical Research, 3-1-1 Koyadai, Tsukuba, Ibaraki 305-0074, Japan
|
43
|
Gromiha MM, Thangakani AM. Role of medium- and long-range interactions to the stability of the mutants of T4 lysozyme. Prep Biochem Biotechnol 2001; 31:217-27. [PMID: 11513088 DOI: 10.1081/pb-100104905] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/03/2022]
Abstract
Inter-residue interactions play an important role in the folding and stability of protein molecules. In this work, we analyze the role of medium- and long-range interactions in the stability of T4 lysozyme mutants. We found that, in buried mutations, the increase in long-range contacts upon mutation destabilizes the protein, whereas, in surface mutations, the increase in long-range contacts increases the stability, indicating the importance of surrounding polar residues to the stability of surface mutations. Further, the increase in medium-range contacts decreases the stability of buried and surface mutations and a direct relationship is observed between the increase of medium-range contacts and increase in stability for partially buried/exposed mutations. Moreover, the relationship between amino acid properties and stability of T4 lysozyme mutants at positions Ile3, Phe53, and Leu99 showed that the effect of medium- and long-range contacts is less for buried mutations and the inter-residue contacts have significant correlation with the stability of partially buried mutations.
Affiliation(s)
- M M Gromiha
- RIKEN Tsukuba Institute, The Institute of Physical and Chemical Research, Ibaraki, Japan.
|
44
|
Gromiha MM. Important inter-residue contacts for enhancing the thermal stability of thermophilic proteins. Biophys Chem 2001; 91:71-7. [PMID: 11403885 DOI: 10.1016/s0301-4622(01)00154-5] [Citation(s) in RCA: 63] [Impact Index Per Article: 2.7] [Indexed: 11/26/2022]
Abstract
Proteins from thermophilic organisms exhibit high thermal stability, but have structures that are very similar to their mesophilic homologues. In order to gain insight into the basis of thermostability, we have analyzed the medium- and long-range contacts in mesophilic and thermophilic proteins of 16 different families. We found that the thermophiles prefer to have contacts between residues with hydrogen-bond-forming capability. Apart from hydrophobic contacts, more contacts are observed between polar and non-polar residues in thermophiles than mesophiles. Residue-wise analysis showed that Tyr has good contacts with several other residues, and Cys has considerably higher long-range contacts in thermophiles compared with mesophiles. Furthermore, the residues occurring in the range of 31-34 residues apart in the sequence contribute significant long-range contacts to the stability of thermophilic proteins.
Affiliation(s)
- M M Gromiha
- RIKEN Tsukuba Institute, 3-1-1 Koyadai, Tsukuba, Ibaraki 305-0074, Japan.
|
45
|
Gromiha MM. Factors influencing the stability of alpha-helices and beta-strands in thermophilic ribonuclease H. Prep Biochem Biotechnol 2001; 31:103-12. [PMID: 11426698 DOI: 10.1081/pb-100103376] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/03/2022]
Abstract
Understanding the influence of structural parameters is crucial to enhance the thermal stability of proteins. In this work, the stability (ΔG) of residues in different secondary structures of Ribonuclease H (RNase H) has been analyzed with 48 amino acid properties. The properties reflecting hydrophobicity show a good correlation with stability. Further, the linear distribution of surrounding hydrophobicity in α-helices, obtained from the three-dimensional structure of thermophilic RNase H, agrees well with experimental ΔG values. Moreover, the stability parameters correlate better in α-helices than in β-strand segments. Multiple regression analysis, incorporating combinations of three properties from among all possible combinations of the 48 properties, increased the correlation coefficient to 0.77.
Affiliation(s)
- M M Gromiha
- Tsukuba Life Science Center, The Institute of Physical and Chemical Research (RIKEN), Ibaraki, Japan
|
46
|
Chiu WL, Sze CN, Ip LN, Chan SK, Au-Yeung SC. NTDB: Thermodynamic Database for Nucleic Acids. Nucleic Acids Res 2001; 29:230-3. [PMID: 11125100 PMCID: PMC29845 DOI: 10.1093/nar/29.1.230] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022] Open
Abstract
A new thermodynamic database for normal and modified nucleic acids has been developed. This Thermodynamic Database for Nucleic Acids (NTDB) includes sequence, structure and thermodynamic information as well as experimental methods and conditions. In this release, there are 1851 sequences containing both normal and modified nucleic acids. A user-friendly web-based interface has been developed to allow data searching under different conditions. Useful thermodynamic tools for the study of nucleic acids have been collected and linked for easy usage. NTDB is available at http://ntdb.chem.cuhk.edu.hk.
Affiliation(s)
- W L Chiu
- Department of Chemistry, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong SAR, China
|
47
|
Muthusamy R, Gromiha MM, Ponnuswamy PK. On the thermal unfolding character of globular proteins. J Protein Chem 2000; 19:1-8. [PMID: 10882167 DOI: 10.1023/a:1007027623966] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
A theoretical model is presented to study the stepwise thermal unfolding of globular proteins using the stabilizing/destabilizing characters of amino acid residues in protein crystals. A multiple regression relation connecting the melting temperature and the amounts of stabilizing and destabilizing groups of residues in a protein, when used for the thermal behavior of peptide segments, provides reliable results on the stepwise unfolding nature of the protein. In ribonuclease A, the shell residues 16-22 are predicted to unfold earlier in the temperature range 30-45 °C; the β-sheet structures undergo thermal denaturation as a single cooperative unit and there is evidence indicating the segment 106-118 as a nucleation site. In ribonuclease S, the S-peptide unfolds earlier than S-protein. The predicted average and the range of melting temperatures, and the folding pathways of a set of globular proteins, agree very well with the experimental results. The results obtained in the present study indicate that (i) most of the nucleation parts possess high relative thermal stability, (ii) the unfolded state retains some residual structure, and (iii) some segments undergo gradual and overlapping thermal denaturation.
Affiliation(s)
- R Muthusamy
- Department of Physics, Bharathidasan University, Tamil Nadu, India
|