1
Rodella C, Lazaridi S, Lemmin T. TemBERTure: advancing protein thermostability prediction with deep learning and attention mechanisms. BIOINFORMATICS ADVANCES 2024; 4:vbae103. [PMID: 39040220 PMCID: PMC11262459 DOI: 10.1093/bioadv/vbae103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 06/14/2024] [Accepted: 07/12/2024] [Indexed: 07/24/2024]
Abstract
Motivation: Understanding protein thermostability is essential for numerous biotechnological applications, but traditional experimental methods are time-consuming, expensive, and error-prone. Recently, deep learning (DL) techniques from natural language processing (NLP) have been extended to biology, since the primary sequence of a protein can be viewed as a string of amino acids that follows a physicochemical grammar. Results: In this study, we developed TemBERTure, a DL framework that predicts thermostability class and melting temperature from protein sequences. Our findings emphasize the importance of data diversity for training robust models, especially the inclusion of sequences from a wider range of organisms. Additionally, we suggest using attention scores from DL models to gain deeper insight into protein thermostability. Analyzing these scores in conjunction with the 3D protein structure can improve understanding of the complex interactions among amino acid properties, their positions, and the surrounding microenvironment. By addressing the limitations of current prediction methods and introducing new avenues for exploration, this research paves the way for more accurate and informative protein thermostability predictions, ultimately accelerating advances in protein engineering. Availability and implementation: The TemBERTure model and data are available at https://github.com/ibmm-unibe-ch/TemBERTure.
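The abstract treats a protein sequence as a string of amino-acid tokens. A minimal sketch of that idea, assuming a hypothetical character-level vocabulary with BERT-style [CLS]/[SEP]/[PAD] markers (this is not TemBERTure's actual tokenizer), maps each residue to an integer ID that a transformer encoder could embed:

```python
# Hypothetical character-level vocabulary (not TemBERTure's tokenizer):
# special tokens first, then the 20 standard amino acids, then X for
# unknown or non-standard residues.
SPECIAL_TOKENS = ["[PAD]", "[CLS]", "[SEP]"]
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + list(AMINO_ACIDS) + ["X"])}


def tokenize(sequence: str, max_len: int = 16) -> list[int]:
    """Map a protein sequence to token IDs, truncated/padded to max_len."""
    residues = [aa if aa in VOCAB else "X" for aa in sequence.upper()]
    # Reserve two positions for the [CLS] and [SEP] markers.
    residues = residues[: max_len - 2]
    tokens = ["[CLS]"] + residues + ["[SEP]"]
    tokens += ["[PAD]"] * (max_len - len(tokens))
    return [VOCAB[t] for t in tokens]


if __name__ == "__main__":
    print(tokenize("MKTAYIAKQR"))
```

Each position in the resulting ID list corresponds to one residue, which is what makes per-residue attention scores mappable back onto the sequence and, from there, onto a 3D structure.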
Affiliation(s)
- Chiara Rodella
- Institute of Biochemistry and Molecular Medicine (IBMM), University of Bern, Bern CH-3012, Switzerland
- Graduate School for Cellular and Biomedical Sciences (GCB), University of Bern, Bern CH-3012, Switzerland
- Symela Lazaridi
- Institute of Biochemistry and Molecular Medicine (IBMM), University of Bern, Bern CH-3012, Switzerland
- Graduate School for Cellular and Biomedical Sciences (GCB), University of Bern, Bern CH-3012, Switzerland
- Thomas Lemmin
- Institute of Biochemistry and Molecular Medicine (IBMM), University of Bern, Bern CH-3012, Switzerland
2
Qiu Y, Huang T, Cai YD. Review of predicting protein stability changes upon variations. Proteomics 2024; 24:e2300371. [PMID: 38643379 DOI: 10.1002/pmic.202300371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 04/07/2024] [Accepted: 04/08/2024] [Indexed: 04/22/2024]
Abstract
Predicting how sequence variations alter protein stability is of great importance. Improving the thermal stability of proteins is important for biomedical and industrial applications. This review discusses the latest methods for predicting the effects of mutations on protein stability, databases containing protein mutations and thermodynamic parameters, and experimental techniques for efficiently assessing protein stability in high-throughput settings. Various publicly available databases for protein stability prediction are introduced. Furthermore, state-of-the-art computational approaches for predicting protein stability changes caused by variants are reviewed, with each method's feature types, base algorithm, and prediction results detailed. Additionally, some experimental approaches for verifying the predictions of computational methods are introduced. Finally, the review summarizes the progress and challenges of protein stability prediction and discusses potential models and directions for future research.
Affiliation(s)
- Yiling Qiu
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, China
- Tao Huang
- Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Yu-Dong Cai
- School of Life Sciences, Shanghai University, Shanghai, China
3
Huang A, Chen Z, Wu X, Yan W, Lu F, Liu F. Improving the thermal stability and catalytic activity of ulvan lyase by the combination of FoldX and KnowVolution campaign. Int J Biol Macromol 2024; 257:128577. [PMID: 38070809 DOI: 10.1016/j.ijbiomac.2023.128577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 11/22/2023] [Accepted: 12/01/2023] [Indexed: 01/26/2024]
Abstract
Thermal stability is one of the most important properties of ulvan lyases for their application in algal biomass degradation. The knowledge-gaining directed evolution (KnowVolution) protein engineering strategy can be used to improve the thermostability of ulvan lyase with less screening effort. Here, the changes in unfolding free energy (ΔΔG) for mutations in the loop region were calculated using FoldX, and four sites (D103, G104, T113, Q229) were selected for saturation mutagenesis, which identified a favorable single-site mutant, Q229M. Subsequently, iterative mutagenesis combining Q229M with the mutant N57P (previously obtained by our group) was carried out to further enhance the performance of ulvan lyase. The most beneficial variant, N57P/Q229M, exhibited 1.67-fold and 2-fold increases in residual activity compared to the wild type after incubation at 40 °C and 50 °C for 1 h, respectively. In addition, the variant produced 1.06 mg/mL of reducing sugar in 2 h, almost four times as much as the wild type. Molecular dynamics simulations revealed that the N57P/Q229M mutations enhanced structural rigidity by increasing the number of intramolecular hydrogen bonds. Meanwhile, a shorter proton transfer distance between the general base of the enzyme and the substrate contributed to glycosidic bond cleavage. Our research showed that in silico saturation mutagenesis using the position scan module in FoldX allowed faster screening of mutants with improved thermal stability, and that combining it with KnowVolution balanced thermal stability and enzyme activity in protein engineering.
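The saturation step screens every possible substitution at the four selected loop positions. A minimal sketch of that enumeration, using the residue identities D103, G104, T113, and Q229 from the abstract and standard wild-type/position/mutant notation (the list format is an assumption for illustration and is not FoldX's input syntax), shows that the screening space is 4 × 19 = 76 single mutants:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Loop positions selected in the abstract: (wild-type residue, position).
SITES = [("D", 103), ("G", 104), ("T", 113), ("Q", 229)]


def saturation_variants(sites):
    """Enumerate every single substitution at each site, e.g. 'Q229M'."""
    variants = []
    for wt, pos in sites:
        for mut in AMINO_ACIDS:
            if mut != wt:  # skip the "mutation" back to the wild-type residue
                variants.append(f"{wt}{pos}{mut}")
    return variants


if __name__ == "__main__":
    variants = saturation_variants(SITES)
    print(len(variants))        # 76
    print("Q229M" in variants)  # True
```

Scoring a list of this size computationally is cheap, which is why an in silico ΔΔG pre-screen can shrink the number of variants that must be tested experimentally.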
Affiliation(s)
- Ailan Huang
- College of Biotechnology, Tianjin University of Science & Technology, Tianjin, PR China
- Zhengqi Chen
- College of Biotechnology, Tianjin University of Science & Technology, Tianjin, PR China
- Xinming Wu
- College of Biotechnology, Tianjin University of Science & Technology, Tianjin, PR China
- Wenxing Yan
- College of Biotechnology, Tianjin University of Science & Technology, Tianjin, PR China
- Fuping Lu
- College of Biotechnology, Tianjin University of Science & Technology, Tianjin, PR China; Key Laboratory of Industrial Fermentation Microbiology, Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, Tianjin, PR China
- Fufeng Liu
- College of Biotechnology, Tianjin University of Science & Technology, Tianjin, PR China; Key Laboratory of Industrial Fermentation Microbiology, Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, Tianjin, PR China
4
Yu T, Boob AG, Volk MJ, Liu X, Cui H, Zhao H. Machine learning-enabled retrobiosynthesis of molecules. Nat Catal 2023. [DOI: 10.1038/s41929-022-00909-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
5
Kumar S, Duggineni VK, Singhania V, Misra SP, Deshpande PA. Unravelling and Quantifying the Biophysical–Biochemical Descriptors Governing Protein Thermostability by Machine Learning. ADVANCED THEORY AND SIMULATIONS 2023. [DOI: 10.1002/adts.202200703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/20/2023]
Affiliation(s)
- Shashi Kumar
- Quantum and Molecular Engineering Laboratory Department of Chemical Engineering Indian Institute of Technology Kharagpur Kharagpur 721302 India
- Vinay Kumar Duggineni
- Quantum and Molecular Engineering Laboratory Department of Chemical Engineering Indian Institute of Technology Kharagpur Kharagpur 721302 India
- Vibhuti Singhania
- Quantum and Molecular Engineering Laboratory Department of Chemical Engineering Indian Institute of Technology Kharagpur Kharagpur 721302 India
- Swayam Prabha Misra
- Quantum and Molecular Engineering Laboratory Department of Chemical Engineering Indian Institute of Technology Kharagpur Kharagpur 721302 India
- Parag A. Deshpande
- Quantum and Molecular Engineering Laboratory Department of Chemical Engineering Indian Institute of Technology Kharagpur Kharagpur 721302 India
6
Veiko VP, Antipov AN, Mordkovich NN, Okorokova NA, Safonova TN, Polyakov KM. The Thermostability of Nucleoside Phosphorylases from Prokaryotes. I. The Role of the Primary Structure of the N-terminal fragment of the Protein in the Thermostability of Uridine Phosphorylases. APPL BIOCHEM MICRO+ 2022. [DOI: 10.1134/s0003683822060151] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
Abstract
Mutant uridine phosphorylase genes from Shewanella oneidensis MR-1 (S. oneidensis) were constructed by site-directed mutagenesis, and producer strains for the corresponding recombinant proteins (F5I and F5G) were obtained in Escherichia coli cells. The mutant proteins were purified, and their physicochemical and enzymatic properties were studied. The N-terminal fragment of uridine phosphorylase was shown to play an important role in the thermal stabilization of the enzyme as a whole. The role of the phenylalanine residue at position 5 (F5) in the thermotolerance of uridine phosphorylases from gamma-proteobacteria was revealed.
7
Pan Q, Nguyen TB, Ascher DB, Pires DEV. Systematic evaluation of computational tools to predict the effects of mutations on protein stability in the absence of experimental structures. Brief Bioinform 2022; 23:bbac025. [PMID: 35189634 PMCID: PMC9155634 DOI: 10.1093/bib/bbac025] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 07/19/2021] [Revised: 01/13/2022] [Accepted: 01/30/2022] [Indexed: 12/26/2022]
Abstract
Changes in protein sequence can have dramatic effects on how proteins fold, and on their stability and dynamics. Over the last 20 years, pioneering methods have been developed to estimate the effects of missense mutations on protein stability, leveraging the growing availability of protein 3D structures. These methods, however, have been developed and validated using experimentally derived structures and biophysical measurements. A large proportion of protein structures remain to be experimentally elucidated, and while many studies have based their conclusions on predictions made using homology models, there has been no systematic evaluation of the reliability of these tools in the absence of experimental structural data. We have therefore systematically investigated the performance and robustness of ten widely used structural methods when presented with homology models built using templates at a range of sequence identity levels (from 15% to 95%), and contrasted their performance with that of sequence-based tools as a baseline. We found that performance indeed deteriorates on homology models built using templates with sequence identity below 40%, where sequence-based tools might become preferable. This was most marked for mutations in solvent-exposed residues and for stabilizing mutations. As structure prediction tools improve, the reliability of these predictors is expected to follow; however, we strongly suggest that these factors be taken into consideration when interpreting results from structure-based predictors of mutation effects on protein stability.
Affiliation(s)
- Qisheng Pan
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
- Thanh Binh Nguyen
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
- Department of Biochemistry, University of Cambridge, 80 Tennis Ct Rd, Cambridge CB2 1GA, UK
- Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria 3004, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane City, Queensland 4072, Australia
- Systems and Computational Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville, Victoria 3052, Australia
- School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria 3053, Australia
8
Dyer RP, Weiss GA. Making the cut with protease engineering. Cell Chem Biol 2022; 29:177-190. [PMID: 34921772 PMCID: PMC9127713 DOI: 10.1016/j.chembiol.2021.12.001] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 11/02/2020] [Revised: 07/30/2021] [Accepted: 11/29/2021] [Indexed: 12/30/2022]
Abstract
Proteases cut with enviable precision and regulate diverse molecular events in biology. Such qualities drive a seemingly inexhaustible appetite for proteases with new activities and capabilities. Comprising 25% of the total industrial enzyme market, proteases appear in consumer goods and processes such as detergents, textile processing, and numerous foods; in addition, proteases account for 25 US Food and Drug Administration-approved medicines and various research tools. Recent advances in protease engineering strategies address target specificity, catalytic efficiency, and stability. This guide to protease engineering surveys best practices and emerging strategies. We further highlight gaps and flexibilities inherent to each system that suggest opportunities for new technology development, along with engineered proteases that could solve challenges in proteomics, protein sequencing, and synthetic gene circuits.
Affiliation(s)
- Rebekah P Dyer
- Department of Molecular Biology and Biochemistry, University of California, Irvine, 1102 NS-2, Irvine, CA 92697-2025, USA
- Gregory A Weiss
- Department of Chemistry, University of California, Irvine, 1102 NS-2, Irvine, CA 92697-2025, USA; Department of Molecular Biology and Biochemistry, University of California, Irvine, 1102 NS-2, Irvine, CA 92697-2025, USA; Department of Pharmaceutical Sciences, University of California, Irvine, 1102 NS-2, Irvine, CA 92697-2025, USA
9
Abstract
Microbial community diversity is often correlated with physical environmental stresses such as acidity, salinity, and temperature. For example, species diversity usually declines with increasing temperature above 20°C. However, few studies have examined whether the genetic functional diversity of community metagenomes varies in the same way as species diversity along stress gradients. Here, we investigated bacterial communities in thermal spring sediments ranging from 21 to 88°C, representing communities of 330 to 3,800 bacterial and archaeal species based on 16S rRNA gene amplicon analysis. Metagenomes were sequenced, and Pfam abundances were used as a proxy for metagenomic functional diversity. Significant decreases in both species diversity and Pfam diversity were observed with increasing temperature. The relationship between Pfam diversity and species diversity followed a power function, with the steepest slopes in the high-temperature, low-diversity region of the gradient. Species additions to simple thermophilic communities added many new Pfams, while species additions to complex mesophilic communities added relatively few, indicating that species diversity does not approach saturation as rapidly as Pfam diversity does. Many Pfams appeared to have distinct temperature ceilings of 60 to 80°C. This study suggests that temperature stress limits both the taxonomic and the functional diversity of microbial communities, but in quantitatively different ways. Lower functional diversity at higher temperatures is probably due to two factors: (i) the absence of many enzymes not adapted to thermophilic conditions, and (ii) the fact that high-temperature communities are composed of fewer species with smaller average genomes and therefore contain fewer rare functions.
IMPORTANCE Only recently have microbial ecologists begun to assess quantitatively how microbial species diversity correlates with environmental factors like pH, temperature, and salinity. However, very few studies have examined how the number of distinct biochemical functions of microbial communities, termed functional diversity, varies with the same environmental factors. Our study examined 18 microbial communities sampled across a wide temperature gradient and found that increasing temperature reduced both species and functional diversity, but in different ways. Initially, functional diversity increased sharply with increasing species diversity but eventually plateaued, following a power function. This pattern has previously been predicted by theoretical models, but our study validates the predicted power function with field metagenomic data. This study also presents a unique overview of the distribution of metabolic functions along a temperature gradient, demonstrating that many functions have temperature "ceilings" above which they are no longer found.
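The reported relationship between Pfam diversity P and species diversity S follows a power function, P = a * S^b. One common way to estimate a and b, sketched here under the assumption of ordinary least squares on log-transformed values (the abstract does not state how the authors fitted the curve), is a straight-line fit of ln(P) against ln(S). The data below are synthetic and illustrative only, not values from the study:

```python
import math


def fit_power_law(species, pfams):
    """Fit pfams = a * species**b by least squares on ln(pfams) vs ln(species)."""
    xs = [math.log(s) for s in species]
    ys = [math.log(p) for p in pfams]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope in log-log space is the exponent
    a = math.exp(mean_y - b * mean_x)  # back-transformed intercept is the prefactor
    return a, b


if __name__ == "__main__":
    # Synthetic, noise-free data generated from pfams = 50 * species**0.4.
    species = [330, 800, 1500, 3800]
    pfams = [50 * s ** 0.4 for s in species]
    a, b = fit_power_law(species, pfams)
    print(round(a, 3), round(b, 3))  # 50.0 0.4
```

With an exponent b < 1, the slope dP/dS = a * b * S^(b - 1) shrinks as S grows, which matches the observation that adding species to simple, low-diversity communities contributes many new Pfams while adding them to complex communities contributes relatively few.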
10
Taking the leap between analytical chemistry and artificial intelligence: A tutorial review. Anal Chim Acta 2021; 1161:338403. [DOI: 10.1016/j.aca.2021.338403] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 12/04/2020] [Revised: 03/02/2021] [Accepted: 03/03/2021] [Indexed: 01/01/2023]
11
Sato Y, Okano K, Kimura H, Honda K. TEMPURA: Database of Growth TEMPeratures of Usual and RAre Prokaryotes. Microbes Environ 2021; 35. [PMID: 32727974 PMCID: PMC7511790 DOI: 10.1264/jsme2.me20074] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 11/29/2022]
Abstract
Growth temperature is one of the most representative biological parameters for characterizing living organisms. Prokaryotes have been isolated from various temperature environments and show wide diversity in their growth temperatures. We herein constructed a database of growth TEMPeratures of Usual and RAre prokaryotes (TEMPURA, http://togodb.org/db/tempura), which contains the minimum, optimum, and maximum growth temperatures of 8,639 prokaryotic strains. Growth temperature information is linked with taxonomy IDs, phylogenies, and genomic information. TEMPURA provides useful information to researchers working on biotechnological applications of extremophiles and their biomolecules as well as those performing fundamental studies on the physiological diversity of prokaryotes.
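TEMPURA stores a minimum, optimum, and maximum growth temperature for each strain. A minimal sketch of how such records might be represented and filtered locally, assuming a hypothetical record layout (the field names are illustrative, not the database's actual schema) and an illustrative 60 °C optimum cutoff for "thermophile" (cutoffs vary across the literature):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrowthRecord:
    """Hypothetical local representation of one strain's growth temperatures (°C)."""
    strain: str
    taxonomy_id: int
    t_min: float
    t_opt: float
    t_max: float

    def __post_init__(self):
        # Reject records whose temperatures are not ordered min <= opt <= max.
        if not (self.t_min <= self.t_opt <= self.t_max):
            raise ValueError(f"{self.strain}: expected t_min <= t_opt <= t_max")


def thermophiles(records, opt_cutoff=60.0):
    """Return records whose optimum growth temperature meets the cutoff, hottest first."""
    hits = [r for r in records if r.t_opt >= opt_cutoff]
    return sorted(hits, key=lambda r: r.t_opt, reverse=True)
```

Validating the min/opt/max ordering when a record is created catches transcription errors early, before the values are joined with taxonomy or genome data.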
Affiliation(s)
- Yu Sato
- International Center for Biotechnology, Osaka University
- Kenji Okano
- International Center for Biotechnology, Osaka University
- Hiroyuki Kimura
- Research Institute of Green Science and Technology, Shizuoka University; Department of Geosciences, Faculty of Science, Shizuoka University
- Kohsuke Honda
- International Center for Biotechnology, Osaka University
12
Mesbahuddin MS, Ganesan A, Kalyaanamoorthy S. Engineering stable carbonic anhydrases for CO2 capture: a critical review. Protein Eng Des Sel 2021; 34:6356912. [PMID: 34427656 DOI: 10.1093/protein/gzab021] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/17/2020] [Accepted: 07/16/2021] [Indexed: 11/14/2022]
Abstract
Targeted inhibition of misregulated protein-protein interactions (PPIs) has been a promising area of investigation in drug discovery and development for human diseases. However, many constraints remain, including shallow binding surfaces and dynamic conformation changes upon interaction. A particularly challenging aspect is the undesirable off-target effects caused by inherent structural similarity among the protein families. To tackle this problem, phage display has been used to engineer PPIs for high-specificity binders with improved binding affinity and greatly reduced undesirable interactions with closely related proteins. Although general steps of phage display are standardized, library design is highly variable depending on experimental contexts. Here in this review, we examined recent advances in the structure-based combinatorial library design and the advantages and limitations of different approaches. The strategies described here can be explored for other protein-protein interactions and aid in designing new libraries or improving on previous libraries.
Affiliation(s)
- Aravindhan Ganesan
- School of Pharmacy, University of Waterloo, Waterloo, Ontario N2G 1C5, Canada
13
Feng C, Ma Z, Yang D, Li X, Zhang J, Li Y. A Method for Prediction of Thermophilic Protein Based on Reduced Amino Acids and Mixed Features. Front Bioeng Biotechnol 2020; 8:285. [PMID: 32432088 PMCID: PMC7214540 DOI: 10.3389/fbioe.2020.00285] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/15/2020] [Accepted: 03/18/2020] [Indexed: 11/13/2022]
Abstract
The thermostability of proteins is a key factor in enzyme engineering, and a method that can distinguish thermophilic from non-thermophilic proteins would be helpful for enzyme design. In this study, we established a novel method that combines mixed features and machine learning to achieve this recognition task. In this method, an amino acid reduction scheme was adopted to recode the amino acid sequence. Then, physicochemical characteristics, auto-cross covariance (ACC), and reduced dipeptides were calculated and integrated into a mixed feature set, which was processed using correlation analysis, feature selection, and principal component analysis (PCA) to remove redundant information. Finally, four machine learning methods were trained and evaluated on a dataset of 500 proteins randomly sampled from 915 thermophilic proteins and 500 randomly sampled from 793 non-thermophilic proteins. Using 10-fold cross-validation, 98.2% of thermophilic and non-thermophilic proteins were correctly identified. Moreover, our analysis of the retained and removed features yielded information about crucial, unimportant, and insensitive elements, which also provides essential information for enzyme design.
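The feature-construction step recodes each sequence with a reduced amino acid alphabet and then counts dipeptides in that reduced alphabet. A minimal sketch, assuming a hypothetical five-group physicochemical reduction (the paper's actual reduction scheme, ACC terms, and feature selection are not reproduced here), shows how the dipeptide feature space shrinks from 20 × 20 = 400 to 5 × 5 = 25:

```python
from itertools import product

# Hypothetical 5-group reduction (illustrative only, not the paper's scheme).
REDUCTION_GROUPS = {
    "AVLIMC": "h",   # hydrophobic / aliphatic
    "FWY": "r",      # aromatic
    "STNQGPH": "p",  # polar, small, or special
    "KR": "+",       # positively charged
    "DE": "-",       # negatively charged
}
RESIDUE_TO_GROUP = {aa: g for members, g in REDUCTION_GROUPS.items() for aa in members}
GROUP_ALPHABET = sorted(set(RESIDUE_TO_GROUP.values()))


def reduce_sequence(seq: str) -> str:
    """Recode a sequence into the reduced alphabet, dropping non-standard residues."""
    return "".join(RESIDUE_TO_GROUP[aa] for aa in seq.upper() if aa in RESIDUE_TO_GROUP)


def reduced_dipeptide_composition(seq: str) -> dict[str, float]:
    """Frequency of each of the len(GROUP_ALPHABET)**2 reduced dipeptides."""
    reduced = reduce_sequence(seq)
    keys = ["".join(p) for p in product(GROUP_ALPHABET, repeat=2)]
    counts = dict.fromkeys(keys, 0)
    for i in range(len(reduced) - 1):
        counts[reduced[i : i + 2]] += 1
    total = max(len(reduced) - 1, 1)  # avoid division by zero for very short input
    return {k: c / total for k, c in counts.items()}
```

In the pipeline described above, vectors like this would be concatenated with the other feature families before correlation analysis, feature selection, and PCA.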
Affiliation(s)
- Changli Feng
- College of Information Science and Technology, Taishan University, Tai’an, China
- Zhaogui Ma
- College of Information Science and Technology, Taishan University, Tai’an, China
- Deyun Yang
- College of Information Science and Technology, Taishan University, Tai’an, China
- Xin Li
- College of Information Science and Technology, Taishan University, Tai’an, China
- Jun Zhang
- Department of Rehabilitation, General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Yanjuan Li
- Information and Computer Engineering College, Northeast Forestry University, Harbin, China
14
Fang X, Huang J, Zhang R, Wang F, Zhang Q, Li G, Yan J, Zhang H, Yan Y, Xu L. Convolution Neural Network-Based Prediction of Protein Thermostability. J Chem Inf Model 2019; 59:4833-4843. [PMID: 31657922 DOI: 10.1021/acs.jcim.9b00220] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/24/2022]
Abstract
Most natural proteins exhibit poor thermostability, which limits their industrial application. Computer-aided rational design is an efficient, purpose-oriented method for improving protein thermostability. Numerous machine-learning-based methods have been designed to predict the changes in protein thermostability induced by mutations. However, all of these methods have certain limitations because existing mutation coding methods overlook protein sequence features. Here we propose a method to predict protein thermostability using convolutional neural networks, based on an in-depth study of thermostability-related protein properties. The method comprises a three-dimensional coding algorithm that incorporates protein mutation information, and a strategy based on multiscale convolution that extracts neighboring features at protein mutation sites. The accuracies on the S1615 and S388 data sets, which are widely used for protein thermostability prediction, reached 86.4% and 87%, respectively, and the Matthews correlation coefficient was nearly double that of other methods. Furthermore, a model was constructed to predict the thermostability of Rhizomucor miehei lipase mutants based on the S3661 data set, a single amino acid mutation data set screened from the ProTherm protein thermodynamics database. Compared with the RIF strategy, which combines three algorithms (Rosetta ddg monomer, I-Mutant 3.0, and FoldX), the proposed method achieved higher accuracy (75.0% vs 66.7%) while also improving the resolution of negative samples. These results indicate that our prediction method assessed protein thermostability more effectively and distinguished its features, making it a powerful tool for devising mutations that enhance the thermostability of proteins, particularly enzymes.
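The abstract compares methods by accuracy and by the Matthews correlation coefficient (MCC), which remains informative when stabilizing and destabilizing classes are imbalanced. For a binary confusion matrix, MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). A minimal sketch with illustrative counts (not the paper's results):

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal is empty (the ratio is undefined).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Imbalanced toy counts: 10 actual positives, 90 actual negatives.
    tp, tn, fp, fn = 5, 85, 5, 5
    print(f"accuracy={accuracy(tp, tn, fp, fn):.2f}")     # accuracy=0.90
    print(f"MCC={matthews_corrcoef(tp, tn, fp, fn):.3f}")  # MCC=0.444
```

On these toy counts accuracy looks high because negatives dominate, while MCC, which penalizes errors on the minority class, is much lower; this is why a near-doubling of MCC is a stronger claim than a similar gain in accuracy.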
Affiliation(s)
- Xingrong Fang
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, P. R. China
- Jinsha Huang
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, P. R. China
- Rui Zhang
- Editorial Board of the Journal of Wuhan Institute of Technology, Wuhan Institute of Technology, Wuhan 430074, P. R. China
- Fei Wang
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, P. R. China
- Qiuyu Zhang
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, P. R. China
- Guanlin Li
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, P. R. China
- Jinyong Yan
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, P. R. China
- Houjin Zhang
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, P. R. China
- Yunjun Yan
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, P. R. China
- Li Xu
- Key Laboratory of Molecular Biophysics, Ministry of Education, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, P. R. China
15
Koirala M, Alexov E. Computational chemistry methods to investigate the effects caused by DNA variants linked with disease. JOURNAL OF THEORETICAL & COMPUTATIONAL CHEMISTRY 2019. [DOI: 10.1142/s0219633619300015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 02/06/2023]
Abstract
Computational chemistry offers a variety of tools to study the properties of biological macromolecules. These tools vary in level of detail, from quantum mechanical treatment to numerous macroscopic approaches. Here, we review computational chemistry algorithms and tools for modeling the effects of genetic variations and their association with diseases. Particular emphasis is given to modeling the effects of missense mutations on the stability, conformational dynamics, binding, hydrogen bond networks, salt bridges, and pH-dependent properties of the corresponding macromolecules. We outline how disease may be caused by alteration of one or several of the above-mentioned biophysical characteristics, and how successful prediction of pathogenicity requires detailed analysis of how these alterations affect the function of the macromolecules involved. The review also provides a short list of the most commonly used algorithms for predicting the molecular effects of mutations.
Affiliation(s)
- Mahesh Koirala
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29630, USA
- Emil Alexov
- Department of Physics and Astronomy, Clemson University, Clemson, SC 29630, USA
16
Li G, Rabe KS, Nielsen J, Engqvist MKM. Machine Learning Applied to Predicting Microorganism Growth Temperatures and Enzyme Catalytic Optima. ACS Synth Biol 2019; 8:1411-1420. [PMID: 31117361 DOI: 10.1021/acssynbio.9b00099] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Indexed: 12/29/2022]
Abstract
Enzymes that catalyze chemical reactions at high temperatures are used for industrial biocatalysis, for applications in molecular biology, and as highly evolvable starting points for protein engineering. The optimal growth temperature (OGT) of organisms is commonly used to estimate the stability of enzymes encoded in their genomes, but the number of experimentally determined OGT values is limited, particularly for thermophilic organisms. Here, we report on the development of a machine learning model that can accurately predict OGT for bacteria, archaea, and microbial eukaryotes directly from their proteome-wide 2-mer amino acid composition. The trained model is made freely available for reuse. In a subsequent step, we use OGT data in combination with the amino acid composition of individual enzymes to develop a second machine learning model for predicting enzyme catalytic temperature optima (Topt). The resulting model generates enzyme Topt estimates that are far superior to those obtained using OGT alone. Finally, we predict Topt for 6.5 million enzymes, covering 4447 enzyme classes, and make the resulting data set available to researchers. This work enables simple and rapid identification of enzymes that are potentially functional at extreme temperatures.
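The OGT model above takes a proteome-wide 2-mer (dipeptide) amino acid composition as input. As a rough illustration of what such a feature vector could look like, the sketch below pools overlapping dipeptide counts over every sequence in a proteome into a 400-element frequency vector. The function name `dipeptide_composition`, the choice to skip non-standard residues, and the normalization to unit sum are assumptions for illustration, not the authors' published featurization, and the downstream regression model is not shown.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides in a fixed order, so vectors are comparable
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(proteome):
    """Hypothetical featurization: 400 dipeptide frequencies pooled over
    all sequences in `proteome` (an iterable of sequence strings).

    Overlapping dipeptides are counted within each sequence, never across
    sequence boundaries; dipeptides containing non-standard residues
    (e.g. X, U) are skipped; counts are divided by the total number of
    counted dipeptides so the vector sums to 1.
    """
    counts = [0] * len(DIPEPTIDES)
    for seq in proteome:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            j = INDEX.get(seq[i:i + 2])
            if j is not None:  # None means a non-standard residue
                counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


if __name__ == "__main__":
    toy_proteome = ["MKVL", "KVXA"]  # made-up sequences
    features = dipeptide_composition(toy_proteome)
    for dp, f in zip(DIPEPTIDES, features):
        if f > 0:
            print(dp, round(f, 3))
```

Pooling across the whole proteome, rather than averaging per-protein vectors, weights longer proteins more heavily; either choice is defensible for a sketch like this.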
Affiliation(s)
- Gang Li
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
- Kersten S Rabe
- Institute for Biological Interfaces 1 (IBG 1), Karlsruhe Institute of Technology (KIT), Group for Molecular Evolution, 76131 Karlsruhe, Germany
- Jens Nielsen
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Martin K M Engqvist
- Department of Biology and Biological Engineering, Chalmers University of Technology, SE-412 96 Gothenburg, Sweden
17
Multidisciplinary involvement and potential of thermophiles. Folia Microbiol (Praha) 2018; 64:389-406. [PMID: 30386965 DOI: 10.1007/s12223-018-0662-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/10/2018] [Accepted: 10/25/2018] [Indexed: 12/15/2022]
Abstract
The full biotechnological exploitation of thermostable enzymes in industrial processes is necessary to realize their commercial and industrial value. Heat-tolerant and heat-resistant enzymes are key to the efficient and cost-effective conversion of substrates into useful products for commercial applications. Thermophiles, hyperthermophiles, and other microorganisms adapted to extreme temperatures (such as psychrophiles, which favor low temperatures) are a rich source of thermostable enzymes with broad-ranging thermal properties, whose structural and functional stability underpins a variety of technologies. These enzymes are under scrutiny for their great biotechnological potential. Temperature is one of the most critical parameters shaping the stability of microorganisms and their biomolecules under harsh environmental conditions. This review describes in detail the sources of thermophiles and thermostable enzymes from prokaryotes and eukaryotes (microbial cell factories). Furthermore, the review critically examines prospects for improving modern biocatalysts, their production, and their performance, with the aim of increasing their biotechnological value through higher standards, greater specificity and resistance, lower costs, etc. These thermostable and thermally adapted extremophilic enzymes have been used across a wide range of industries and span all six enzyme classes. In particular, this review aims to show the possibility of producing both high-value, low-volume products (e.g., fine chemicals) and low-value, high-volume products (e.g., fuels) with minimal changes to current industrial processes.
18
Verkhivker GM. Biophysical simulations and structure-based modeling of residue interaction networks in the tumor suppressor proteins reveal functional role of cancer mutation hotspots in molecular communication. Biochim Biophys Acta Gen Subj 2018; 1863:210-225. [PMID: 30339916 DOI: 10.1016/j.bbagen.2018.10.009] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 07/19/2018] [Revised: 10/06/2018] [Accepted: 10/13/2018] [Indexed: 12/19/2022]
Abstract
In the current study, we have combined molecular simulations and energetic analysis with dynamics-based network modeling and perturbation response scanning to determine the molecular signatures of mutational hotspot residues in the p53, PTEN, and SMAD4 tumor suppressor proteins. By examining the structure, energetics, and dynamics of these proteins, we have shown that inactivating mutations preferentially target a group of structurally stable residues that play a fundamental role in the global propagation of dynamic fluctuations and in mediating allosteric interaction networks. Through integration of long-range perturbation dynamics and network-based approaches, we have quantified the allosteric potential of residues in the studied proteins. The results have revealed that mutational hotspot sites often correspond to high-centrality mediating centers of the residue interaction networks that are responsible for coordinating global dynamic changes and allosteric signaling. Our findings have also suggested that structurally stable mutational hotspots can act as major effectors of allosteric interactions, and that mutations at these positions are typically associated with a severe phenotype. Modeling of shortest inter-residue pathways has shown that mutational hotspot sites can also serve as key mediating bridges of allosteric communication in the p53 and PTEN protein structures. Multiple regression models have indicated that the functional significance of mutational hotspots can be strongly associated with network signatures, which serve as robust predictors of critical regulatory positions responsible for the loss-of-function phenotype. The results of this computational investigation are compared with experimental studies and reveal molecular signatures of mutational hotspots, providing a plausible rationale for explaining and localizing disease-causing mutations in tumor suppressor genes.
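The abstract above identifies mutational hotspots as high-centrality nodes that act as bridges on shortest inter-residue pathways. As a minimal sketch of what shortest-path centrality means in a residue network, the code below computes betweenness centrality with Brandes' algorithm on a small, made-up contact graph. It assumes an unweighted, undirected graph; the residue labels R1 to R5 are hypothetical, and the study's simulation-derived networks and perturbation response scanning are not reproduced here.

```python
from collections import deque


def betweenness_centrality(adj):
    """Shortest-path betweenness for an unweighted, undirected graph.

    `adj` maps each node to a list of its neighbours and must be
    symmetric. Brandes' algorithm: one breadth-first search per source
    node, then back-propagation of pair dependencies.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = dict.fromkeys(adj, 0)    # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)    # -1 marks "not yet reached"
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:          # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):        # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2.0 for v, b in bc.items()}


if __name__ == "__main__":
    # Hypothetical contact graph: R2 links R1, R3 and R4; R4 links R5.
    contacts = {
        "R1": ["R2"],
        "R2": ["R1", "R3", "R4"],
        "R3": ["R2"],
        "R4": ["R2", "R5"],
        "R5": ["R4"],
    }
    scores = betweenness_centrality(contacts)
    for residue, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(residue, score)
```

In this toy graph, R2 lies on the shortest path between five residue pairs and R4 on three, so they would be flagged as the "mediating bridges"; the terminal residues score zero.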
Affiliation(s)
- Gennady M Verkhivker
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA 92618, USA
- Department of Pharmacology, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
19
McGuinness KN, Pan W, Sheridan RP, Murphy G, Crespo A. Role of simple descriptors and applicability domain in predicting change in protein thermostability. PLoS One 2018; 13:e0203819. [PMID: 30192891 PMCID: PMC6128648 DOI: 10.1371/journal.pone.0203819] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/27/2018] [Accepted: 08/28/2018] [Indexed: 01/07/2023]
Abstract
The melting temperature (Tm) of a protein is the temperature at which half of the protein population is in the folded state. Tm is therefore a measure of a protein's thermostability. Increasing the Tm of a protein is a critical goal in biotechnology and biomedicine. However, predicting the change in melting temperature (dTm) caused by a mutation at a single residue is difficult because it depends on an intricate balance of forces. Existing methods for predicting dTm have achieved similar levels of success, generally using complex models. We find that a machine learning model trained on a simple set of easy-to-calculate physicochemical descriptors of the mutation's local environment performs as well as more complicated machine learning models and is 2-6 orders of magnitude faster. Importantly, unlike most previous publications, we perform a blind prospective test of our simple model by designing 96 variants of a protein not in the training set. Results from retrospective and prospective predictions reveal the limited applicability domain of each model. This study highlights the current deficiencies in the available dTm dataset and calls on the community to systematically design a larger and more diverse experimental dataset of mutants so that dTm can be predicted prospectively with greater certainty.
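The definition of Tm above (half of the population folded) can be made concrete with the textbook two-state folding model. The sketch below assumes two-state unfolding and a temperature-independent unfolding enthalpy (zero heat-capacity change), so the unfolding free energy is dG(T) = dHm(1 - T/Tm) and the folded fraction is 1/(1 + exp(-dG/RT)). At T = Tm, dG = 0 and the folded fraction is exactly 0.5. The enthalpy and the wild-type and variant Tm values are made-up numbers, and this is not the descriptor-based model used in the study.

```python
import math

R = 8.314  # gas constant, J/(mol*K)


def fraction_folded(T, Tm, dHm):
    """Folded fraction at temperature T (kelvin) under a two-state model.

    Assumes a temperature-independent unfolding enthalpy dHm (J/mol),
    i.e. zero heat-capacity change, so that
    dG_unf(T) = dHm * (1 - T / Tm) and K_unf = exp(-dG_unf / (R * T)).
    """
    dG_unf = dHm * (1.0 - T / Tm)
    K_unf = math.exp(-dG_unf / (R * T))
    return 1.0 / (1.0 + K_unf)


if __name__ == "__main__":
    # Hypothetical wild type and stabilized variant
    dHm = 400e3                    # J/mol
    Tm_wt, Tm_mut = 330.0, 334.5   # kelvin
    print("dTm =", Tm_mut - Tm_wt, "K")
    for T in (320.0, 330.0, 340.0):
        print(T,
              round(fraction_folded(T, Tm_wt, dHm), 3),
              round(fraction_folded(T, Tm_mut, dHm), 3))
```

A positive dTm shifts the whole melting curve to higher temperatures, so at the wild-type Tm the variant is still more than half folded.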
Affiliation(s)
- Kenneth N. McGuinness
- Modeling and Informatics, Merck & Co., Inc., Kenilworth, New Jersey, United States of America
- Weilan Pan
- Biochemical Engineering and Structure, Merck & Co., Inc., Rahway, New Jersey, United States of America
- Robert P. Sheridan
- Modeling and Informatics, Merck & Co., Inc., Kenilworth, New Jersey, United States of America
- Grant Murphy
- Biochemical Engineering and Structure, Merck & Co., Inc., Rahway, New Jersey, United States of America
- Alejandro Crespo
- Modeling and Informatics, Merck & Co., Inc., Kenilworth, New Jersey, United States of America