1
|
Fang X, Huang J, Zhang R, Wang F, Zhang Q, Li G, Yan J, Zhang H, Yan Y, Xu L. Convolution Neural Network-Based Prediction of Protein Thermostability. J Chem Inf Model 2019; 59:4833-4843. [PMID: 31657922 DOI: 10.1021/acs.jcim.9b00220] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
Most natural proteins exhibit poor thermostability, which limits their industrial application. Computer-aided rational design is an efficient purpose-oriented method that can improve protein thermostability. Numerous machine-learning-based methods have been designed to predict the changes in protein thermostability induced by mutations. However, all of these methods have certain limitations due to existing mutation coding methods that overlook protein sequence features. Here we propose a method to predict protein thermostability using convolutional neural networks based on an in-depth study of thermostability-related protein properties. This method comprises a three-dimensional coding algorithm, including protein mutation information and a strategy to extract neighboring features at protein mutation sites based on multiscale convolution. The accuracies on the S1615 and S388 data sets, which are widely used for protein thermostability predictions, reached 86.4 and 87%, respectively. The Matthews correlation coefficient was nearly double those produced using other methods. Furthermore, a model was constructed to predict the thermostability of Rhizomucor miehei lipase mutants based on the S3661 data set, a single amino acid mutation data set screened from the ProTherm protein thermodynamics database. Compared with the RIF strategy, which consists of three algorithms, i.e., Rosetta ddg monomer, I Mutant 3.0, and FoldX, the accuracy of the proposed method was higher (75.0 vs 66.7%), and the negative sample resolution was simultaneously enhanced. These results indicate that our prediction method more effectively assessed the protein thermostability and distinguished its features, making it a powerful tool to devise mutations that enhance the thermostability of proteins, particularly enzymes.
Collapse
Affiliation(s)
- Xingrong Fang
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology , Huazhong University of Science and Technology , Wuhan 430074 , P. R. China
| | - Jinsha Huang
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology , Huazhong University of Science and Technology , Wuhan 430074 , P. R. China
| | - Rui Zhang
- Editorial Board of the Journal of Wuhan Institute of Technology , Wuhan Institute of Technology , Wuhan 430074 , P. R. China
| | - Fei Wang
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology , Huazhong University of Science and Technology , Wuhan 430074 , P. R. China
| | - Qiuyu Zhang
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology , Huazhong University of Science and Technology , Wuhan 430074 , P. R. China
| | - Guanlin Li
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology , Huazhong University of Science and Technology , Wuhan 430074 , P. R. China
| | - Jinyong Yan
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology , Huazhong University of Science and Technology , Wuhan 430074 , P. R. China
| | - Houjin Zhang
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology , Huazhong University of Science and Technology , Wuhan 430074 , P. R. China
| | - Yunjun Yan
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology , Huazhong University of Science and Technology , Wuhan 430074 , P. R. China
| | - Li Xu
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology , Huazhong University of Science and Technology , Wuhan 430074 , P. R. China
| |
Collapse
|
2
|
Kulshreshtha S, Chaudhary V, Goswami GK, Mathur N. Computational approaches for predicting mutant protein stability. J Comput Aided Mol Des 2016; 30:401-12. [DOI: 10.1007/s10822-016-9914-3] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2016] [Accepted: 05/02/2016] [Indexed: 11/24/2022]
|
3
|
Folkman L, Stantic B, Sattar A, Zhou Y. EASE-MM: Sequence-Based Prediction of Mutation-Induced Stability Changes with Feature-Based Multiple Models. J Mol Biol 2016; 428:1394-1405. [DOI: 10.1016/j.jmb.2016.01.012] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2015] [Revised: 01/12/2016] [Accepted: 01/13/2016] [Indexed: 10/22/2022]
|
4
|
Gromiha MM, Anoosha P, Huang LT. Applications of Protein Thermodynamic Database for Understanding Protein Mutant Stability and Designing Stable Mutants. Methods Mol Biol 2016; 1415:71-89. [PMID: 27115628 DOI: 10.1007/978-1-4939-3572-7_4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/20/2023]
Abstract
Protein stability is the free energy difference between unfolded and folded states of a protein, which lies in the range of 5-25 kcal/mol. Experimentally, protein stability is measured with circular dichroism, differential scanning calorimetry, and fluorescence spectroscopy using thermal and denaturant denaturation methods. These experimental data have been accumulated in the form of a database, ProTherm, thermodynamic database for proteins and mutants. It also contains sequence and structure information of a protein, experimental methods and conditions, and literature information. Different features such as search, display, and sorting options and visualization tools have been incorporated in the database. ProTherm is a valuable resource for understanding/predicting the stability of proteins and it can be accessed at http://www.abren.net/protherm/ . ProTherm has been effectively used to examine the relationship among thermodynamics, structure, and function of proteins. We describe the recent progress on the development of methods for understanding/predicting protein stability, such as (1) general trends on mutational effects on stability, (2) relationship between the stability of protein mutants and amino acid properties, (3) applications of protein three-dimensional structures for predicting their stability upon point mutations, (4) prediction of protein stability upon single mutations from amino acid sequence, and (5) prediction methods for addressing double mutants. A list of online resources for predicting has also been provided.
Collapse
Affiliation(s)
- M Michael Gromiha
- Department of Biotechnology, Bhupat & Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600 036, India.
| | - P Anoosha
- Department of Biotechnology, Bhupat & Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600 036, India
| | - Liang-Tsung Huang
- Department of Medical Informatics, Tzu Chi University, Hualien, 970, Taiwan
| |
Collapse
|
5
|
Brender JR, Zhang Y. Predicting the Effect of Mutations on Protein-Protein Binding Interactions through Structure-Based Interface Profiles. PLoS Comput Biol 2015; 11:e1004494. [PMID: 26506533 PMCID: PMC4624718 DOI: 10.1371/journal.pcbi.1004494] [Citation(s) in RCA: 99] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2015] [Accepted: 08/06/2015] [Indexed: 11/18/2022] Open
Abstract
The formation of protein-protein complexes is essential for proteins to perform their physiological functions in the cell. Mutations that prevent the proper formation of the correct complexes can have serious consequences for the associated cellular processes. Since experimental determination of protein-protein binding affinity remains difficult when performed on a large scale, computational methods for predicting the consequences of mutations on binding affinity are highly desirable. We show that a scoring function based on interface structure profiles collected from analogous protein-protein interactions in the PDB is a powerful predictor of protein binding affinity changes upon mutation. As a standalone feature, the differences between the interface profile score of the mutant and wild-type proteins has an accuracy equivalent to the best all-atom potentials, despite being two orders of magnitude faster once the profile has been constructed. Due to its unique sensitivity in collecting the evolutionary profiles of analogous binding interactions and the high speed of calculation, the interface profile score has additional advantages as a complementary feature to combine with physics-based potentials for improving the accuracy of composite scoring approaches. By incorporating the sequence-derived and residue-level coarse-grained potentials with the interface structure profile score, a composite model was constructed through the random forest training, which generates a Pearson correlation coefficient >0.8 between the predicted and observed binding free-energy changes upon mutation. This accuracy is comparable to, or outperforms in most cases, the current best methods, but does not require high-resolution full-atomic models of the mutant structures. The binding interface profiling approach should find useful application in human-disease mutation recognition and protein interface design studies. Few proteins carry out their tasks in isolation. Instead, proteins combine with each other in complicated ways that can be affected by either the natural genetic variation that occurs among people or by disease causing mutations such as those that occur in cancer or in genetic disorders. To understand how these mutations affect our health, it is necessary to understand how mutations can affect the strength of the interactions that bind proteins together. This is a difficult task to do in a laboratory on a large scale and scientists are increasingly turning to computational methods to predict these effects in advance. We show that by looking at the multiple alignments of similar protein-protein complex structures at the interface regions, new constraints based on the evolution of the three dimensional structures of proteins can be made to predict which mutations are compatible with two proteins interacting and which are not.
Collapse
Affiliation(s)
- Jeffrey R. Brender
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
- * E-mail:
| |
Collapse
|
6
|
Computational and experimental approaches to reveal the effects of single nucleotide polymorphisms with respect to disease diagnostics. Int J Mol Sci 2014; 15:9670-717. [PMID: 24886813 PMCID: PMC4100115 DOI: 10.3390/ijms15069670] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2014] [Revised: 05/15/2014] [Accepted: 05/16/2014] [Indexed: 12/25/2022] Open
Abstract
DNA mutations are the cause of many human diseases and they are the reason for natural differences among individuals by affecting the structure, function, interactions, and other properties of DNA and expressed proteins. The ability to predict whether a given mutation is disease-causing or harmless is of great importance for the early detection of patients with a high risk of developing a particular disease and would pave the way for personalized medicine and diagnostics. Here we review existing methods and techniques to study and predict the effects of DNA mutations from three different perspectives: in silico, in vitro and in vivo. It is emphasized that the problem is complicated and successful detection of a pathogenic mutation frequently requires a combination of several methods and a knowledge of the biological phenomena associated with the corresponding macromolecules.
Collapse
|
7
|
Abstract
Background Reliable prediction of stability changes in protein variants is an important aspect of computational protein design. A number of machine learning methods that allow a classification of stability changes knowing only the sequence of the protein emerged. However, their performance on amino acid substitutions of previously unseen non-homologous proteins is rather limited. Moreover, the performance varies for different types of mutations based on the secondary structure or accessible surface area of the mutation site. Results We proposed feature-based multiple models with each model designed for a specific type of mutations. The new method is composed of five models trained for mutations in exposed, buried, helical, sheet, and coil residues. The classification of a mutation as stabilising or destabilising is made as a consensus of two models, one selected based on the predicted accessible surface area and the other based on the predicted secondary structure of the mutation site. We refer to our new method as Evolutionary, Amino acid, and Structural Encodings with Multiple Models (EASE-MM). Cross-validation results show that EASE-MM provides a notable improvement to our previous work reaching a Matthews correlation coefficient of 0.44. EASE-MM was able to correctly classify 73% and 75% of stabilising and destabilising protein variants, respectively. Using an independent test set of 238 mutations, we confirmed our results in a comparison with related work. Conclusions EASE-MM not only outperformed other related methods but achieved more balanced results for different types of mutations based on the accessible surface area, secondary structure, or magnitude of stability changes. This can be attributed to using multiple models with the most relevant features selected for the given type of mutations. Therefore, our results support the presumption that different interactions govern stability changes in the exposed and buried residues or in residues with a different secondary structure.
Collapse
|
8
|
Folkman L, Stantic B, Sattar A. Towards sequence-based prediction of mutation-induced stability changes in unseen non-homologous proteins. BMC Genomics 2014; 15 Suppl 1:S4. [PMID: 24564514 PMCID: PMC4046685 DOI: 10.1186/1471-2164-15-s1-s4] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023] Open
Abstract
Background Reliable prediction of stability changes induced by a single amino acid substitution is an important aspect of computational protein design. Several machine learning methods capable of predicting stability changes from the protein sequence alone have been introduced. Prediction performance of these methods is evaluated on mutations unseen during training. Nevertheless, different mutations of the same protein, and even the same residue, as encountered during training are commonly used for evaluation. We argue that a faithful evaluation can be achieved only when a method is tested on previously unseen proteins with low sequence similarity to the training set. Results We provided experimental evidence of the limitations of the evaluation commonly used for assessing the prediction performance. Furthermore, we demonstrated that the prediction of stability changes in previously unseen non-homologous proteins is a challenging task for currently available methods. To improve the prediction performance of our previously proposed method, we identified features which led to over-fitting and further extended the model with new features. The new method employs Evolutionary And Structural Encodings with Amino Acid parameters (EASE-AA). Evaluated with an independent test set of more than 600 mutations, EASE-AA yielded a Matthews correlation coefficient of 0.36 and was able to classify correctly 66% of the stabilising and 74% of the destabilising mutations. For real-value prediction, EASE-AA achieved the correlation of predicted and experimentally measured stability changes of 0.51. Conclusions Commonly adopted evaluation with mutations in the same protein, and even the same residue, randomly divided between the training and test sets lead to an overestimation of prediction performance. Therefore, stability changes prediction methods should be evaluated only on mutations in previously unseen non-homologous proteins. Under such an evaluation, EASE-AA predicts stability changes more reliably than currently available methods. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-S1-S4) contains supplementary material, which is available to authorized users.
Collapse
|