1
Meng C, Yuan Y, Zhao H, Pei Y, Li Z. IIFS: An improved incremental feature selection method for protein sequence processing. Comput Biol Med 2023; 167:107654. [PMID: 37944304 DOI: 10.1016/j.compbiomed.2023.107654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 10/09/2023] [Accepted: 10/31/2023] [Indexed: 11/12/2023]
Abstract
MOTIVATION Discrete features can be obtained from protein sequences using a feature extraction method. These features are the basis of downstream processing of protein data, but because they are generally redundant, the important ones must be screened and selected. RESULT Here, we report IIFS, an improved incremental feature selection method that exploits a new subset search strategy to find the optimal feature set. IIFS combines nonadjacent sorting features to prevent the drawbacks of data explosion and excessive reliance on feature sorting results. Comparative experiments on 27 feature sorting datasets show that IIFS finds more accurate and important features than existing methods. The IIFS approach also handles data redundancy more efficiently and finds more representative and discriminatory features while ensuring minimal feature dimensionality and good evaluation metrics. Moreover, we wrap this method and deploy it on a web server for access at http://112.124.26.17:8005/.
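For orientation, the conventional incremental feature selection (IFS) baseline that this abstract improves on can be sketched as follows. This is not the IIFS subset search itself, which the abstract does not specify; it is a minimal sketch assuming a pre-ranked feature list and a caller-supplied scoring function. The names `incremental_feature_selection`, `score_subset`, and `toy_score` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    score_subset: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Conventional IFS: grow the subset one ranked feature at a time
    and keep the prefix with the highest score.

    ranked_features: features sorted from most to least important.
    score_subset: hypothetical evaluator, e.g. cross-validated accuracy.
    """
    if not ranked_features:
        raise ValueError("ranked_features must not be empty")

    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        candidate = list(ranked_features[:k])  # top-k prefix of the ranking
        score = score_subset(candidate)
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_subset, best_score = candidate, score
    return best_subset, best_score


def toy_score(subset: List[str]) -> float:
    """Toy evaluator (assumption): reward two 'useful' features, penalize size."""
    useful = {"f1", "f3"}
    return len(useful.intersection(subset)) - 0.1 * len(subset)


if __name__ == "__main__":
    print(incremental_feature_selection(["f1", "f2", "f3", "f4"], toy_score))
```

Because only prefixes of the ranking are evaluated, this baseline depends entirely on the sort order. The abstract names that dependence as the drawback IIFS is meant to reduce by combining nonadjacent features.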
Affiliation(s)
- Chaolu Meng
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China; Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, China
- Ye Yuan
- Beidahuang Industry Group General Hospital, Harbin, 150001, China
- Haiyan Zhao
- College of Integration of Traditional Chinese and Western Medicine to Southwest Medical University, Luzhou, Sichuan, 646000, China
- Yue Pei
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100190, China
- Zhi Li
- Department of Spleen and Stomach Diseases, The Affiliated Traditional Chinese Medicine Hospital of Southwest Medical University, Luzhou, Sichuan, 646000, China
2
Fang J. Predicting thermostability difference between cellular protein orthologs. Bioinformatics 2023; 39:btad504. [PMID: 37572303 PMCID: PMC10457660 DOI: 10.1093/bioinformatics/btad504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 05/02/2023] [Accepted: 08/11/2023] [Indexed: 08/14/2023]
Abstract
MOTIVATION Protein thermostability is of great interest, both in theory and in practice. RESULTS This study compared orthologous proteins with different cellular thermostability. A large number of physicochemical properties of proteins were calculated and used to develop a series of machine learning models for predicting cellular thermostability differences between orthologous proteins. Most of the important features in these models are also highly correlated with relative cellular thermostability. A comparison of the present study with previous comparisons of orthologous proteins from thermophilic and mesophilic organisms found that most highly correlated features are consistent across these studies, suggesting they may be important to protein thermostability. AVAILABILITY AND IMPLEMENTATION Data are freely available for download at https://github.com/fangj3/cellular-protein-thermostability-dataset.
Affiliation(s)
- Jianwen Fang
- Computational & Systems Biology Branch, Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, Rockville, MD 20850, United States
3
Kumar S, Duggineni VK, Singhania V, Misra SP, Deshpande PA. Unravelling and Quantifying the Biophysical–Biochemical Descriptors Governing Protein Thermostability by Machine Learning. Advanced Theory and Simulations 2023. [DOI: 10.1002/adts.202200703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/20/2023]
Affiliation(s)
- Shashi Kumar
- Quantum and Molecular Engineering Laboratory, Department of Chemical Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Vinay Kumar Duggineni
- Quantum and Molecular Engineering Laboratory, Department of Chemical Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Vibhuti Singhania
- Quantum and Molecular Engineering Laboratory, Department of Chemical Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Swayam Prabha Misra
- Quantum and Molecular Engineering Laboratory, Department of Chemical Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Parag A. Deshpande
- Quantum and Molecular Engineering Laboratory, Department of Chemical Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
4
Charoenkwan P, Chotpatiwetchkul W, Lee VS, Nantasenamat C, Shoombuatong W. A novel sequence-based predictor for identifying and characterizing thermophilic proteins using estimated propensity scores of dipeptides. Sci Rep 2021; 11:23782. [PMID: 34893688 PMCID: PMC8664844 DOI: 10.1038/s41598-021-03293-w] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 08/23/2021] [Accepted: 12/01/2021] [Indexed: 02/08/2023]
Abstract
Owing to their ability to maintain a thermodynamically stable fold at extremely high temperatures, thermophilic proteins (TPPs) play a critical role in basic research and a variety of applications in the food industry. As a result, the development of computational models for rapidly and accurately identifying novel TPPs from a large number of uncharacterized protein sequences is desirable. Although computational models have already been developed for characterizing thermophilic proteins, their performance and interpretability remain unsatisfactory. We present a novel sequence-based thermophilic protein predictor, termed SCMTPP, for improving model predictability and interpretability. First, an up-to-date and high-quality dataset consisting of 1853 TPPs and 3233 non-TPPs was compiled from published literature. Second, the SCMTPP predictor was created by combining the scoring card method (SCM) with estimated propensity scores of g-gap dipeptides. Benchmarking experiments revealed that SCMTPP had a cross-validation accuracy of 0.883, which was comparable to that of a support vector machine-based predictor (0.906-0.910) and 2-17% higher than that of commonly used machine learning models. Furthermore, SCMTPP outperformed the state-of-the-art approach (ThermoPred) on the independent test dataset, with accuracy and MCC of 0.865 and 0.731, respectively. Finally, the SCMTPP-derived propensity scores were used to elucidate the critical physicochemical properties for protein thermostability enhancement. In terms of interpretability and generalizability, comparative results showed that SCMTPP was effective for identifying and characterizing TPPs. We have implemented the proposed predictor as a user-friendly online web server at http://pmlabstack.pythonanywhere.com/SCMTPP in order to allow easy access to the model. SCMTPP is expected to be a powerful tool for facilitating community-wide efforts to identify TPPs on a large scale and guiding experimental characterization of TPPs.
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200 Thailand
- Warot Chotpatiwetchkul
- Applied Computational Chemistry Research Unit, Department of Chemistry, School of Science, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, 10520 Thailand
- Vannajan Sanghiran Lee
- Department of Chemistry, Centre of Theoretical and Computational Physics, Faculty of Science, University of Malaya, 50603 Kuala Lumpur, Malaysia
- Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700 Thailand
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
5
Guo Z, Wang P, Liu Z, Zhao Y. Discrimination of Thermophilic Proteins and Non-thermophilic Proteins Using Feature Dimension Reduction. Front Bioeng Biotechnol 2020; 8:584807. [PMID: 33195148 PMCID: PMC7642589 DOI: 10.3389/fbioe.2020.584807] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 07/18/2020] [Accepted: 09/11/2020] [Indexed: 01/19/2023]
Abstract
Thermophilicity is a very important property of proteins, as it sometimes determines denaturation and cell death. Thus, methods for predicting thermophilic proteins and non-thermophilic proteins are of interest and can contribute to the design and engineering of proteins. In this article, we describe the use of feature dimension reduction technology and LIBSVM to identify thermophilic proteins. The highest accuracy obtained by cross-validation was 96.02% with 119 parameters. When using only 16 features, we obtained an accuracy of 93.33%. We discuss the importance of the different characteristics in identification and report a comparison of the performance of support vector machine to that of other methods.
Affiliation(s)
- Zifan Guo
- School of Aeronautics and Astronautic, Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Pingping Wang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Zhendong Liu
- School of Computer Science and Technology, Shandong Jianzhu University, Jinan, China
- Yuming Zhao
- Information and Computer Engineering College, Northeast Forestry University, Harbin, China
6
Sanavia T, Birolo G, Montanucci L, Turina P, Capriotti E, Fariselli P. Limitations and challenges in protein stability prediction upon genome variations: towards future applications in precision medicine. Comput Struct Biotechnol J 2020; 18:1968-1979. [PMID: 32774791 PMCID: PMC7397395 DOI: 10.1016/j.csbj.2020.07.011] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 03/15/2020] [Revised: 07/10/2020] [Accepted: 07/14/2020] [Indexed: 12/13/2022]
Abstract
Protein stability predictions are becoming essential in medicine to develop novel immunotherapeutic agents and for drug discovery. Despite the large number of computational approaches for predicting the protein stability upon mutation, there are still critical unsolved problems: 1) the limited number of thermodynamic measurements for proteins provided by current databases; 2) the large intrinsic variability of ΔΔG values due to different experimental conditions; 3) biases in the development of predictive methods caused by ignoring the anti-symmetry of ΔΔG values between mutant and native protein forms; 4) over-optimistic prediction performance, due to sequence similarity between proteins used in training and test datasets. Here, we review these issues, highlighting new challenges required to improve current tools and to achieve more reliable predictions. In addition, we provide a perspective of how these methods will be beneficial for designing novel precision medicine approaches for several genetic disorders caused by mutations, such as cancer and neurodegenerative diseases.
Affiliation(s)
- Tiziana Sanavia
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Giovanni Birolo
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
- Ludovica Montanucci
- Department of Comparative Biomedicine and Food Science (BCA), University of Padova, Viale dell'Università 16, 35020 Legnaro, Italy
- Paola Turina
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via F. Selmi 3, 40126 Bologna, Italy
- Emidio Capriotti
- Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Via F. Selmi 3, 40126 Bologna, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Torino, Via Santena 19, 10126 Torino, Italy
7
Gado JE, Beckham GT, Payne CM. Improving Enzyme Optimum Temperature Prediction with Resampling Strategies and Ensemble Learning. J Chem Inf Model 2020; 60:4098-4107. [DOI: 10.1021/acs.jcim.0c00489] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/24/2023]
Affiliation(s)
- Japheth E. Gado
- Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States
- National Bioenergy Center, National Renewable Energy Laboratory, Golden, Colorado 80401, United States
- Gregg T. Beckham
- National Bioenergy Center, National Renewable Energy Laboratory, Golden, Colorado 80401, United States
- Christina M. Payne
- Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States
8
Fang J. A critical review of five machine learning-based algorithms for predicting protein stability changes upon mutation. Brief Bioinform 2019; 21:1285-1292. [PMID: 31273374 DOI: 10.1093/bib/bbz071] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 04/12/2019] [Revised: 05/14/2019] [Accepted: 05/16/2019] [Indexed: 01/02/2023]
Abstract
A number of machine learning (ML)-based algorithms have been proposed for predicting mutation-induced stability changes in proteins. In this critical review, we used hypothetical reverse mutations to evaluate the performance of five representative algorithms and found all of them suffer from the problem of overfitting. This approach is based on the fact that if a wild-type protein is more stable than a mutant protein, then the same mutant is less stable than the wild-type protein. We analyzed the underlying issues and suggest that the main causes of the overfitting problem include that the numbers of training cases were too small, and the features used in the models were not sufficiently informative for the task. We make recommendations on how to avoid overfitting in this important research area and improve the reliability and robustness of ML-based algorithms in general.
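The reverse-mutation check described in this abstract can be illustrated with a short sketch. It assumes each prediction is a signed stability change, so a perfectly consistent predictor returns exactly opposite values for a mutation and its hypothetical reverse. The function name `antisymmetry_report` and the numbers in the usage lines are illustrative assumptions, not values from the reviewed algorithms.

```python
from statistics import mean
from typing import Dict, Sequence


def antisymmetry_report(
    forward: Sequence[float], reverse: Sequence[float]
) -> Dict[str, float]:
    """Compare predictions for mutations and their hypothetical reverses.

    forward[i]: predicted stability change for wild-type -> mutant.
    reverse[i]: predicted stability change for mutant -> wild-type.
    A consistent predictor satisfies forward[i] == -reverse[i].
    """
    if len(forward) != len(reverse) or not forward:
        raise ValueError("forward and reverse must be non-empty and equal length")

    # For a consistent predictor every residual forward + reverse is zero.
    residuals = [f + r for f, r in zip(forward, reverse)]
    opposite_sign = sum(1 for f, r in zip(forward, reverse) if f * r < 0)
    return {
        "bias": mean(residuals) / 2,  # systematic shift shared by both directions
        "max_abs_residual": max(abs(x) for x in residuals),
        "sign_consistency": opposite_sign / len(forward),
    }


if __name__ == "__main__":
    # Illustrative numbers only: a consistent and an inconsistent predictor.
    print(antisymmetry_report([1.0, -0.5], [-1.0, 0.5]))
    print(antisymmetry_report([1.0, 2.0], [1.0, 2.0]))
```

A bias far from zero, or a low sign consistency, means the model assigns the same direction of change to a mutation and its reverse, which is the failure mode the review uses as evidence of overfitting.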
Affiliation(s)
- Jianwen Fang
- Computational & Systems Biology Branch, Biometric Research Program, Division of Cancer Treatment and Diagnosis, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA
9
Volkening JD, Stecker KE, Sussman MR. Proteome-wide Analysis of Protein Thermal Stability in the Model Higher Plant Arabidopsis thaliana. Mol Cell Proteomics 2019; 18:308-319. [PMID: 30401684 PMCID: PMC6356070 DOI: 10.1074/mcp.ra118.001124] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 10/01/2018] [Indexed: 12/16/2022]
Abstract
Modern tandem MS-based sequencing technologies allow for the parallel measurement of concentration and covalent modifications for proteins within a complex sample. Recently, this capability has been extended to probe a proteome's three-dimensional structure and conformational state by determining the thermal denaturation profile of thousands of proteins simultaneously. Although many animals and their resident microbes exist under a relatively narrow, regulated physiological temperature range, plants take on the often widely ranging temperature of their surroundings, possibly influencing the evolution of protein thermal stability. In this report we present the first in-depth look at the thermal proteome of a plant species, the model organism Arabidopsis thaliana. By profiling the melting curves of over 1700 Arabidopsis proteins using six biological replicates, we have observed significant correlation between protein thermostability and several known protein characteristics, including molecular weight and the composition ratio of charged to polar amino acids. We also report on a divergence of the thermostability of the core and regulatory domains of the plant 26S proteasome that may reflect a unique property of the way protein turnover is regulated during temperature stress. Lastly, the highly replicated database of Arabidopsis melting temperatures reported herein provides baseline data on the variability of protein behavior in the assay. Unfolding behavior and experiment-to-experiment variability were observed to be protein-specific traits, and thus these data can serve to inform the design and interpretation of future targeted assays to probe the conformational status of proteins from plants exposed to different chemical, environmental and genetic challenges.
Affiliation(s)
- Jeremy D Volkening
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706
- Kelly E Stecker
- Biomolecular Mass Spectrometry and Proteomics, Utrecht University, Utrecht, Netherlands
- Michael R Sussman
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706
10
A novel strategy to improve the thermostability of Penicillium camembertii mono- and di-acylglycerol lipase. Biochem Biophys Res Commun 2018; 500:639-644. [DOI: 10.1016/j.bbrc.2018.04.123] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/08/2018] [Accepted: 04/14/2018] [Indexed: 01/24/2023]
11
Maciejewska B, Źrubek K, Espaillat A, Wiśniewska M, Rembacz KP, Cava F, Dubin G, Drulis-Kawa Z. Modular endolysin of Burkholderia AP3 phage has the largest lysozyme-like catalytic subunit discovered to date and no catalytic aspartate residue. Sci Rep 2017; 7:14501. [PMID: 29109551 PMCID: PMC5674055 DOI: 10.1038/s41598-017-14797-9] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 07/20/2017] [Accepted: 10/16/2017] [Indexed: 01/19/2023]
Abstract
Endolysins are peptidoglycan-degrading enzymes utilized by bacteriophages to release the progeny from bacterial cells. The lytic properties of phage endolysins make them potential antibacterial agents for medical and industrial applications. Here, we present a comprehensive characterization of phage AP3 modular endolysin (AP3gp15) containing a cell wall binding domain and an enzymatic domain (DUF3380 by BLASTP), both widespread and conserved. Our structural analysis demonstrates the low similarity of the enzymatic domain to known lysozymes and an unusual catalytic centre characterized by only a single glutamic acid residue and no aspartic acid. Thus, our findings suggest distinguishing a novel class of muralytic enzymes having the activity and catalytic centre organization of DUF3380. The lack of amino acid sequence homology between AP3gp15 and other known muralytic enzymes may reflect the evolutionary convergence of analogous glycosidases. Moreover, the broad antibacterial spectrum, lack of cytotoxic effect on human cells and the stability characteristics of AP3 endolysin advocate for its future application development.
Affiliation(s)
- Barbara Maciejewska
- Institute of Genetics and Microbiology, University of Wroclaw, Przybyszewskiego 63/77, 51-148, Wroclaw, Poland
- Karol Źrubek
- Department of Microbiology, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, 30-387, Kraków, Poland
- Protein Crystallography Research Group, Malopolska Centre of Biotechnology, Gronostajowa 7A, 30-387, Krakow, Poland
- Akbar Espaillat
- Laboratory for Molecular Infection Medicine Sweden. Molecular Biology Department, Umeå University, SE-901 87, Umeå, Sweden
- Magdalena Wiśniewska
- Protein Crystallography Research Group, Malopolska Centre of Biotechnology, Gronostajowa 7A, 30-387, Krakow, Poland
- Krzysztof P Rembacz
- Department of Microbiology, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, 30-387, Kraków, Poland
- Protein Crystallography Research Group, Malopolska Centre of Biotechnology, Gronostajowa 7A, 30-387, Krakow, Poland
- Felipe Cava
- Laboratory for Molecular Infection Medicine Sweden. Molecular Biology Department, Umeå University, SE-901 87, Umeå, Sweden
- Grzegorz Dubin
- Department of Microbiology, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, 30-387, Kraków, Poland
- Protein Crystallography Research Group, Malopolska Centre of Biotechnology, Gronostajowa 7A, 30-387, Krakow, Poland
- Zuzanna Drulis-Kawa
- Institute of Genetics and Microbiology, University of Wroclaw, Przybyszewskiego 63/77, 51-148, Wroclaw, Poland
12
Frey SL, Todd J, Wurtzler E, Strelez CR, Wendell D. A non-foaming proteosurfactant engineered from Ranaspumin-2. Colloids Surf B Biointerfaces 2015; 133:239-45. [PMID: 26117804 DOI: 10.1016/j.colsurfb.2015.05.043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/26/2015] [Revised: 05/20/2015] [Accepted: 05/22/2015] [Indexed: 11/20/2022]
Abstract
Advances in biological surfactant proteins have already yielded a diverse range of benefits from dramatically improved survival rates for premature births to artificial photosynthesis. Presented here is the design, development, and analysis of a novel biosurfactant protein we call Surfactant Resisting Foam formatioN (SRFN). Starting with the Tungara frog's foam forming protein Ranaspumin-2, we have engineered a new surfactant protein with a destabilized hinge region to alter the kinetics and equilibrium of the protein structural transition from aqueous globular form to an extended surfactant structure at the air/water interface. SRFN is capable of approximately the same total surface tension reduction, but with the unique property of forming quickly collapsible foams. The difference in foam formation is attributed to the destabilizing glycine substitutions engineered into the hinge region. Surfactants used specifically to increase wettability, such as those used in agricultural applications, would benefit from this new proteosurfactant since foamed liquid has greater wind resistance and decreased dispersal. Indeed, given growing concern about organosilicone surfactant effects on declining bee populations, biological surfactant proteins have several unique advantages over more common amphiphiles in that they can be renewably sourced, are environmentally friendly, degrade readily into non-toxic byproducts, and reduce surface tension without deleterious effects on cell membranes.
Affiliation(s)
- Shelli L Frey
- Department of Chemistry, Gettysburg College, Gettysburg, PA 17325, United States
- Jacob Todd
- Department of Biomedical, Chemical and Environmental Engineering, Engineering Research Center, University of Cincinnati, Cincinnati, OH 45221, United States
- Elizabeth Wurtzler
- Department of Biomedical, Chemical and Environmental Engineering, Engineering Research Center, University of Cincinnati, Cincinnati, OH 45221, United States
- Carly R Strelez
- Department of Chemistry, Gettysburg College, Gettysburg, PA 17325, United States
- David Wendell
- Department of Biomedical, Chemical and Environmental Engineering, Engineering Research Center, University of Cincinnati, Cincinnati, OH 45221, United States
13
Feng PM, Ding H, Chen W, Lin H. Naïve Bayes classifier with feature selection to identify phage virion proteins. Computational and Mathematical Methods in Medicine 2013; 2013:530696. [PMID: 23762187 PMCID: PMC3671239 DOI: 10.1155/2013/530696] [Citation(s) in RCA: 107] [Impact Index Per Article: 9.7] [Received: 03/10/2013] [Revised: 04/16/2013] [Accepted: 04/28/2013] [Indexed: 12/31/2022]
Abstract
Knowledge about the protein composition of phage virions is a key step to understand the functions of phage virion proteins. However, the experimental method to identify virion proteins is time consuming and expensive. Thus, it is highly desirable to develop novel computational methods for phage virion protein identification. In this study, a Naïve Bayes based method was proposed to predict phage virion proteins using amino acid composition and dipeptide composition. In order to remove redundant information, a novel feature selection technique was employed to single out optimized features. In the jackknife test, the proposed method achieved an accuracy of 79.15% for phage virion and nonvirion protein classification, which is superior to that of other state-of-the-art classifiers. These results indicate that the proposed method could serve as an effective and promising high-throughput method in phage proteomics research.
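The two sequence encodings named in this abstract, amino acid composition and dipeptide composition, can be computed as in the sketch below. This is a generic illustration, not the authors' code. Two choices here are assumptions: the fixed 20-letter alphabet, and skipping residues or pairs that contain a nonstandard letter. The function names are hypothetical.

```python
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq: str) -> Dict[str, float]:
    """Fraction of each standard residue; nonstandard letters are ignored."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for aa in seq.upper():
        if aa in counts:
            counts[aa] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return {aa: n / total for aa, n in counts.items()}


def dipeptide_composition(seq: str) -> Dict[str, float]:
    """Fraction of each of the 400 adjacent residue pairs."""
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    seq = seq.upper()
    total = 0
    for a, b in zip(seq, seq[1:]):  # sliding window of width 2
        pair = a + b
        if pair in counts:  # skip pairs with a nonstandard letter
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return {pair: n / total for pair, n in counts.items()}


def feature_vector(seq: str) -> List[float]:
    """Concatenate both encodings into a 420-dimensional vector."""
    aac = amino_acid_composition(seq)
    dpc = dipeptide_composition(seq)
    return list(aac.values()) + list(dpc.values())


if __name__ == "__main__":
    print(len(feature_vector("MKTAYIAKQR")))
```

The 420 raw features are highly redundant for short sequences, because most dipeptides never occur. That redundancy is why the abstract pairs these encodings with a feature selection step.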
Affiliation(s)
- Peng-Mian Feng
- School of Public Health, Hebei United University, Tangshan 063000, China
- Hui Ding
- Key Laboratory for Neuroinformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Chen
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China
- Hao Lin
- Key Laboratory for Neuroinformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
14
An in silico method for designing thermostable variant of a dimeric mesophilic protein based on its 3D structure. J Mol Graph Model 2013; 42:92-103. [PMID: 23584153 DOI: 10.1016/j.jmgm.2013.02.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/01/2012] [Revised: 02/25/2013] [Accepted: 02/27/2013] [Indexed: 11/21/2022]
Abstract
Designing proteins with enhanced thermostability has been a major interest of protein engineering because of its potential industrial applications. Here, we present a computational method for designing a dimeric thermostable protein based on rational mutations of a mesophilic protein. Experimental and structural data indicate that the surface stability of a protein is a major factor controlling its denaturation, and that ion pairs are most efficient in enhancing the stability of the surfaces of the monomers and the interface between them. Our mutation-based strategy is to first identify several polar or charged residues on the protein surface that interact weakly with the rest of the protein, and then to replace the side chains of suitable neighboring residues to increase the interaction between these two residues. To stabilize the interface, mutations are made in the interface to form ion pairs between the monomers. Application of this design strategy to a homo-dimeric protein and a hetero-dimeric protein as examples has produced excellent results. In both cases the designed mutated proteins, including the individual monomers and the interfaces, were found to be considerably more stable than the respective mesophilic proteins as judged by self-energies and residue-wise interaction patterns. This method is easily applicable to any multimeric protein.
15
Basu S, Sen S. Do Homologous Thermophilic–Mesophilic Proteins Exhibit Similar Structures and Dynamics at Optimal Growth Temperatures? A Molecular Dynamics Simulation Study. J Chem Inf Model 2013; 53:423-34. [DOI: 10.1021/ci300474h] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 02/03/2023]
Affiliation(s)
- Sohini Basu
- Molecular modeling Section, Biolab, Chembiotek, TCG Lifesciences Ltd., Bengal Intelligent Park, Tower-B 2nd Floor, Block-EP & GP, Sector-V, Salt Lake Electronic Complex, Calcutta-700091, India
- Srikanta Sen
- Molecular modeling Section, Biolab, Chembiotek, TCG Lifesciences Ltd., Bengal Intelligent Park, Tower-B 2nd Floor, Block-EP & GP, Sector-V, Salt Lake Electronic Complex, Calcutta-700091, India
16
Holder T, Basquin C, Ebert J, Randel N, Jollivet D, Conti E, Jékely G, Bono F. Deep transcriptome-sequencing and proteome analysis of the hydrothermal vent annelid Alvinella pompejana identifies the CvP-bias as a robust measure of eukaryotic thermostability. Biol Direct 2013; 8:2. [PMID: 23324115 PMCID: PMC3564776 DOI: 10.1186/1745-6150-8-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 09/13/2012] [Accepted: 01/11/2013] [Indexed: 12/27/2022]
Abstract
BACKGROUND Alvinella pompejana is an annelid worm that inhabits deep-sea hydrothermal vent sites in the Pacific Ocean. Living at a depth of approximately 2500 meters, these worms experience extreme environmental conditions, including high temperature and pressure as well as high levels of sulfide and heavy metals. A. pompejana is one of the most thermotolerant metazoans, making this animal a subject of great interest for studies of eukaryotic thermoadaptation. RESULTS In order to complement existing EST resources we performed deep sequencing of the A. pompejana transcriptome. We identified several thousand novel protein-coding transcripts, nearly doubling the sequence data for this annelid. We then performed an extensive survey of previously established prokaryotic thermoadaptation measures to search for global signals of thermoadaptation in A. pompejana in comparison with mesophilic eukaryotes. In an orthologous set of 457 proteins, we found that the best indicator of thermoadaptation was the difference in frequency of charged versus polar residues (CvP-bias), which was highest in A. pompejana. CvP-bias robustly distinguished prokaryotic thermophiles from prokaryotic mesophiles, as well as the thermophilic fungus Chaetomium thermophilum from mesophilic eukaryotes. Experimental values for thermophilic proteins supported higher CvP-bias as a measure of thermal stability when compared to their mesophilic orthologs. Proteome-wide mean CvP-bias also correlated with the body temperatures of homeothermic birds and mammals. CONCLUSIONS Our work extends the transcriptome resources for A. pompejana and identifies the CvP-bias as a robust and widely applicable measure of eukaryotic thermoadaptation.
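The CvP-bias described in this abstract, the difference in frequency of charged versus polar residues, can be computed with a few lines of code. The residue sets below (charged D, E, K, R; polar N, Q, S, T) follow the commonly cited definition and are an assumption here, since the abstract does not list them. The function names are hypothetical.

```python
from statistics import mean
from typing import Iterable

CHARGED = frozenset("DEKR")  # assumed charged set: Asp, Glu, Lys, Arg
POLAR = frozenset("NQST")    # assumed polar set: Asn, Gln, Ser, Thr


def cvp_bias(seq: str) -> float:
    """Percentage of charged residues minus percentage of polar residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    charged = sum(1 for aa in seq if aa in CHARGED)
    polar = sum(1 for aa in seq if aa in POLAR)
    return 100.0 * (charged - polar) / len(seq)


def mean_cvp_bias(sequences: Iterable[str]) -> float:
    """Proteome-wide mean of the per-protein CvP-bias values."""
    return mean(cvp_bias(s) for s in sequences)


if __name__ == "__main__":
    print(cvp_bias("DEKRNQAAAA"))
```

A positive value means charged residues outnumber uncharged polar ones. The abstract reports that higher values of this measure track thermoadaptation across the organisms compared.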
Affiliation(s)
- Thomas Holder
- Max-Planck-Institute for Developmental Biology, Spemannstr. 35, Tübingen, D-72076, Germany
17
Li Y, Fang J. PROTS-RF: a robust model for predicting mutation-induced protein stability changes. PLoS One 2012; 7:e47247. [PMID: 23077576 PMCID: PMC3471942 DOI: 10.1371/journal.pone.0047247] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 04/30/2012] [Accepted: 09/11/2012] [Indexed: 11/19/2022]
Abstract
The ability to improve protein thermostability via protein engineering is of great scientific interest and also has significant practical value. In this report we present PROTS-RF, a robust model based on the Random Forest algorithm capable of predicting thermostability changes induced by not only single-, but also double- or multiple-point mutations. The model is built using 41 features including evolutionary information, secondary structure, solvent accessibility and a set of fragment-based features. It achieves accuracies of 0.799, 0.782 and 0.787, and areas under receiver operating characteristic (ROC) curves of 0.873, 0.868 and 0.862 for single-, double- and multiple-point mutation datasets, respectively. Contrary to previous suggestions, our results clearly demonstrate that a robust predictive model trained for predicting single point mutation induced thermostability changes can be capable of predicting double and multiple point mutations. It also shows high levels of robustness in the tests using hypothetical reverse mutations. We demonstrate that testing datasets created based on physical principles can be highly useful for testing the robustness of predictive models.
Affiliation(s)
- Yunqi Li
- Applied Bioinformatics Laboratory, The University of Kansas, Lawrence, Kansas, United States of America
- Jianwen Fang
- Applied Bioinformatics Laboratory, The University of Kansas, Lawrence, Kansas, United States of America
18
Zuo YC, Chen W, Fan GL, Li QZ. A similarity distance of diversity measure for discriminating mesophilic and thermophilic proteins. Amino Acids 2012; 44:573-80. [PMID: 22851052 DOI: 10.1007/s00726-012-1374-z] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 11/28/2011] [Accepted: 07/17/2012] [Indexed: 11/25/2022]
Abstract
The successful prediction of thermophilic proteins is useful for designing stable enzymes that are functional at high temperature. We have used the increment of diversity (ID), a novel amino acid composition-based similarity distance, in a 2-class K-nearest neighbor classifier to classify thermophilic and mesophilic proteins. The resulting KNN-ID classifier was developed to predict thermophilic proteins. Instead of extracting features from protein sequences as done previously, our approach was based on a diversity measure of symbol sequences. The similarity distance between each pair of protein sequences was first calculated to quantitatively measure the similarity level of one given sequence and the other. The class of the query protein is then determined using the K-nearest neighbor algorithm. Comparisons with multiple recently published methods showed that the KNN-ID proposed in this study outperforms the other methods. The improved predictive performance indicated it is a simple and effective classifier for discriminating thermophilic and mesophilic proteins. Finally, the influence of protein length and protein identity on prediction accuracy is discussed. The prediction model and dataset used in this article can be freely downloaded from http://wlxy.imu.edu.cn/college/biostation/fuwu/KNN-ID/index.htm.
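As a rough illustration of a composition-based increment of diversity, the sketch below uses the commonly cited form D(X) = N log N - sum(n_i log n_i) and ID(X, Y) = D(X + Y) - D(X) - D(Y), where n_i are residue counts and N is their total. The abstract does not state the formula, so this form, the natural logarithm, and the names `diversity` and `increment_of_diversity` are assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def diversity(counts: Counter) -> float:
    """Diversity measure D = N*log(N) - sum(n_i*log(n_i)) over nonzero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(
        n * math.log(n) for n in counts.values() if n > 0
    )


def increment_of_diversity(seq_a: str, seq_b: str) -> float:
    """ID(A, B) = D(A + B) - D(A) - D(B); smaller values mean more similar compositions."""
    counts_a = Counter(seq_a.upper())
    counts_b = Counter(seq_b.upper())
    merged = counts_a + counts_b  # element-wise sum of residue counts
    return diversity(merged) - diversity(counts_a) - diversity(counts_b)
```

In a K-nearest neighbor setting such a value would serve as the distance between a query sequence and each training sequence, with the query assigned to the majority class among its K closest neighbors.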
Affiliation(s)
- Yong-Chun Zuo
- School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China.
19
Dutta C, Paul S. Microbial lifestyle and genome signatures. Curr Genomics 2012; 13:153-62. [PMID: 23024607 PMCID: PMC3308326 DOI: 10.2174/138920212799860698] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.6] [Received: 08/18/2011] [Revised: 09/13/2011] [Accepted: 09/28/2011] [Indexed: 12/29/2022]
Abstract
Microbes are known for their unique ability to adapt to varying lifestyle and environment, even to the extreme or adverse ones. The genomic architecture of a microbe may bear the signatures not only of its phylogenetic position, but also of the kind of lifestyle to which it is adapted. The present review aims to provide an account of the specific genome signatures observed in microbes acclimatized to distinct lifestyles or ecological niches. Niche-specific signatures identified at different levels of microbial genome organization like base composition, GC-skew, purine-pyrimidine ratio, dinucleotide abundance, codon bias, oligonucleotide composition etc. have been discussed. Among the specific cases highlighted in the review are the phenomena of genome shrinkage in obligatory host-restricted microbes, genome expansion in strictly intra-amoebal pathogens, strand-specific codon usage in intracellular species, acquisition of genome islands in pathogenic or symbiotic organisms, discriminatory genomic traits of marine microbes with distinct trophic strategies, and conspicuous sequence features of certain extremophiles like those adapted to high temperature or high salinity.
Affiliation(s)
- Chitra Dutta
- Structural Biology & Bioinformatics Division, CSIR- Indian Institute of Chemical Biology, 4, Raja S. C. Mullick Road, Kolkata 700032, India
20
Wainreb G, Wolf L, Ashkenazy H, Dehouck Y, Ben-Tal N. Protein stability: a single recorded mutation aids in predicting the effects of other mutations in the same amino acid site. Bioinformatics 2011; 27:3286-92. [PMID: 21998155 PMCID: PMC3223369 DOI: 10.1093/bioinformatics/btr576] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Indexed: 12/22/2022]
Abstract
Motivation: Accurate prediction of protein stability is important for understanding the molecular underpinnings of diseases and for the design of new proteins. We introduce a novel approach for the prediction of changes in protein stability that arise from a single-site amino acid substitution; the approach uses available data on mutations occurring in the same position and in other positions. Our algorithm, named Pro-Maya (Protein Mutant stAbilitY Analyzer), combines a collaborative filtering baseline model, Random Forests regression and a diverse set of features. Pro-Maya predicts the stability free energy difference of mutant versus wild type, denoted as ΔΔG. Results: We evaluated our algorithm extensively using cross-validation on two previously utilized datasets of single amino acid mutations and a (third) validation set. The results indicate that using known ΔΔG values of mutations at the query position improves the accuracy of ΔΔG predictions for other mutations in that position. The accuracy of our predictions in such cases significantly surpasses that of similar methods, achieving, e.g., a Pearson's correlation coefficient of 0.79 and a root mean square error of 0.96 on the validation set. Because Pro-Maya uses a diverse set of features, including predictions using two other methods, it also performs slightly better than other methods in the absence of additional experimental data on the query positions. Availability: Pro-Maya is freely available via web server at http://bental.tau.ac.il/ProMaya. Contact: nirb@tauex.tau.ac.il; wolf@cs.tau.ac.il Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gilad Wainreb
- Department of Biochemistry and Molecular Biology, Tel-Aviv University, Ramat Aviv 69978, Israel
21
Lu JL, Hu XH, Hu DG. A new hybrid fractal algorithm for predicting thermophilic nucleotide sequences. J Theor Biol 2011; 293:74-81. [PMID: 22001320 DOI: 10.1016/j.jtbi.2011.09.028] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/10/2011] [Revised: 09/23/2011] [Accepted: 09/26/2011] [Indexed: 01/20/2023]
Abstract
Knowledge of the thermophilic mechanisms of organisms whose optimum growth temperature (OGT) ranges from 50 to 80 °C plays a major role in helping design stable proteins. How to predict whether a DNA sequence is thermophilic is a long-standing problem that has not yet been fully resolved. Chaos game representation (CGR) can expose the patterns hidden in DNA sequences and can visually reveal previously unknown structure. Fractal dimensions are good tools for measuring the sizes of complex, highly irregular geometric objects. In this paper, we convert every DNA sequence into a high-dimensional vector using the CGR algorithm and fractal dimension, and then predict the thermostability of the DNA sequence from these fractal features with a support vector machine (SVM). We conducted experiments on three groups: 17-dimensional, 65-dimensional, and 257-dimensional vectors. Each group is evaluated by a 10-fold cross-validation test. The group of 257-dimensional vectors gives the best results: the average accuracy is 0.9456 and the average MCC is 0.8878. The results are also compared with previous work using single CGR features. The comparison shows the high effectiveness of the new hybrid fractal algorithm.
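The chaos game representation step mentioned in this abstract can be sketched as follows. The corner assignment (A, C, G, T on the unit square), the starting point at the centre, and the names `cgr_points` and `cgr_grid_counts` are conventional choices assumed here for illustration; the abstract does not specify them, and the fractal-dimension and SVM stages are not reproduced.

```python
# Assumed corner layout on the unit square (a common convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(dna: str) -> list[tuple[float, float]]:
    """Map a DNA string to CGR points: each base moves halfway to its corner."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in dna.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def cgr_grid_counts(dna: str, k: int) -> list[list[int]]:
    """Count CGR points in a 2^k x 2^k grid, indexed as grid[row_y][col_x]."""
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    for x, y in cgr_points(dna):
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        grid[row][col] += 1
    return grid
```

Flattening such a grid gives a fixed-length count vector that could be passed to a classifier; how the reported 17-, 65- and 257-dimensional vectors were actually assembled is not stated in the abstract.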
Affiliation(s)
- Jin-Long Lu
- College of Science, Huazhong Agricultural University, Wuhan, PR China
22
Li Y, Zhang J, Tai D, Middaugh CR, Zhang Y, Fang J. PROTS: a fragment based protein thermo-stability potential. Proteins 2011; 80:81-92. [PMID: 21976375 DOI: 10.1002/prot.23163] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Received: 05/20/2011] [Revised: 07/18/2011] [Accepted: 07/31/2011] [Indexed: 12/30/2022]
Abstract
Designing proteins with enhanced thermo-stability has been a main focus of protein engineering because of its theoretical and practical significance. Despite extensive studies in the past years, a general strategy for stabilizing proteins still remains elusive. Thus effective and robust computational algorithms for designing thermo-stable proteins are in critical demand. Here we report PROTS, a sequential and structural four-residue fragment based protein thermo-stability potential. PROTS is derived from a nonredundant representative collection of thousands of thermophilic and mesophilic protein structures and a large set of point mutations with experimentally determined changes of melting temperatures. To the best of our knowledge, PROTS is the first protein stability predictor based on integrated analysis and mining of these two types of data. Besides conventional cross validation and blind testing, we introduce hypothetical reverse mutations as a means of testing the robustness of protein thermo-stability predictors. In all tests, PROTS demonstrates the ability to reliably predict mutation induced thermo-stability changes as well as classify thermophilic and mesophilic proteins. In addition, this white-box predictor allows easy interpretation of the factors that influence mutation induced protein stability changes at the residue level.
Affiliation(s)
- Yunqi Li
- Applied Bioinformatics Laboratory, the University of Kansas, Lawrence, Kansas 66047, USA
23
Nakariyakul S, Liu ZP, Chen L. Detecting thermophilic proteins through selecting amino acid and dipeptide composition features. Amino Acids 2011; 42:1947-53. [DOI: 10.1007/s00726-011-0923-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 01/25/2011] [Accepted: 04/20/2011] [Indexed: 11/29/2022]
24
Lin H, Chen W. Prediction of thermophilic proteins using feature selection technique. J Microbiol Methods 2010; 84:67-70. [PMID: 21044646 DOI: 10.1016/j.mimet.2010.10.013] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Received: 08/17/2010] [Revised: 10/15/2010] [Accepted: 10/19/2010] [Indexed: 11/16/2022]
Abstract
The thermostability of proteins is particularly relevant for enzyme engineering. Developing a computational method to identify mesophilic proteins would be helpful for protein engineering and design. In this work, we developed a support vector machine based method to predict thermophilic proteins using information on amino acid distribution and selected amino acid pairs. A reliable benchmark dataset including 915 thermophilic proteins and 793 non-thermophilic proteins was constructed for training and testing the proposed models. Results showed that 93.8% of thermophilic proteins and 92.7% of non-thermophilic proteins could be correctly predicted using jackknife cross-validation. The high prediction success rate indicates that this model can be applied to the design of stable proteins.
Affiliation(s)
- Hao Lin
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
25
Tian J, Wu N, Chu X, Fan Y. Predicting changes in protein thermostability brought about by single- or multi-site mutations. BMC Bioinformatics 2010; 11:370. [PMID: 20598148 PMCID: PMC2906492 DOI: 10.1186/1471-2105-11-370] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Received: 03/29/2010] [Accepted: 07/02/2010] [Indexed: 01/24/2023]
Abstract
Background An important aspect of protein design is the ability to predict changes in protein thermostability arising from single- or multi-site mutations. Protein thermostability is reflected in the change in free energy (ΔΔG) of thermal denaturation. Results We have developed predictive software, Prethermut, based on machine learning methods, to predict the effect of single- or multi-site mutations on protein thermostability. The input vector of Prethermut is based on known structural changes and empirical measurements of changes in potential energy due to protein mutations. Using a 10-fold cross-validation test on the M-dataset, consisting of 3366 mutant proteins from ProTherm, the classification accuracy of random forests and the regression accuracy of random forest regression were slightly better than those of support vector machines and support vector regression, whereas the overall accuracy of classification and the Pearson correlation coefficient of regression were 79.2% and 0.72, respectively. Prethermut performs better on proteins containing multi-site mutations than on those with single mutations. Conclusions The performance of Prethermut indicates that it is a useful tool for predicting changes in protein thermostability brought about by single- or multi-site mutations and will be valuable in the rational design of proteins.
Affiliation(s)
- Jian Tian
- Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China
26
Jordan DM, Ramensky VE, Sunyaev SR. Human allelic variation: perspective from protein function, structure, and evolution. Curr Opin Struct Biol 2010; 20:342-50. [PMID: 20399638 PMCID: PMC2921592 DOI: 10.1016/j.sbi.2010.03.006] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Received: 02/18/2010] [Accepted: 03/22/2010] [Indexed: 01/20/2023]
Abstract
It is widely anticipated that the coming year will be marked by the complete characterization of DNA sequence of protein-coding regions of thousands of human individuals. A number of existing computational methods use comparative protein sequence analysis and analysis of protein structure to predict the functional effect of coding human alleles. Functional and structural analysis of coding allelic variants can inform various aspects of research on human genetic variation. In population and evolutionary genetics it helps estimate the strength of purifying selection against deleterious missense mutations and study the imprint of demographic history on deleterious genetic variation. In medical genetics it may assist in the interpretation of uncharacterized mutations in genes involved in monogenic and oligogenic diseases. It has a potential to facilitate medical sequencing studies searching for genes underlying Mendelian diseases or harboring rare alleles involved in complex traits.
Affiliation(s)
- Daniel M. Jordan
- Division of Genetics, Brigham & Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
- Program in Biophysics, Harvard University, Cambridge, Massachusetts, USA
- Vasily E. Ramensky
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia
- Shamil R. Sunyaev
- Division of Genetics, Brigham & Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
27
Li Y, Middaugh CR, Fang J. A novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting relative thermostability of protein mutants. BMC Bioinformatics 2010; 11:62. [PMID: 20109199 PMCID: PMC3098108 DOI: 10.1186/1471-2105-11-62] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 09/28/2009] [Accepted: 01/28/2010] [Indexed: 11/10/2022]
Abstract
Background The ability to design thermostable proteins is theoretically important and practically useful. Robust and accurate algorithms, however, remain elusive. One critical problem is the lack of reliable methods to estimate the relative thermostability of possible mutants. Results We report a novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting the relative thermostability of protein mutants. The scoring function was developed based on an elaborate analysis of a set of features calculated or predicted from 540 pairs of hyperthermophilic and mesophilic protein ortholog sequences. It was constructed by a linear combination of ten important features identified by a feature ranking procedure based on the random forest classification algorithm. The weights of these features in the scoring function were fitted by a hill-climbing algorithm. This scoring function has shown an excellent ability to discriminate hyperthermophilic from mesophilic sequences. The prediction accuracies reached 98.9% and 97.3% in discriminating orthologous pairs in training and the holdout testing datasets, respectively. Moreover, the scoring function can distinguish non-homologous sequences with an accuracy of 88.4%. Additional blind tests using two datasets of experimentally investigated mutations demonstrated that the scoring function can be used to predict the relative thermostability of proteins and their mutants at very high accuracies (92.9% and 94.4%). We also developed an amino acid substitution preference matrix between mesophilic and hyperthermophilic proteins, which may be useful in designing more thermostable proteins. Conclusions We have presented a novel scoring function which can distinguish not only HP/MP ortholog pairs, but also non-homologous pairs at high accuracies. Most importantly, it can be used to accurately predict the relative stability of proteins and their mutants, as demonstrated in two blind tests. 
In addition, the residue substitution preference matrix assembled in this study may reflect the thermal adaptation induced substitution biases. A web server implementing the scoring function and the dataset used in this study are freely available at http://www.abl.ku.edu/thermorank/.
Affiliation(s)
- Yunqi Li
- Applied Bioinformatics Laboratory, the University of Kansas, Lawrence, KS 66047, USA
28
Basu S, Sen S. Turning a Mesophilic Protein into a Thermophilic One: A Computational Approach Based on 3D Structural Features. J Chem Inf Model 2009; 49:1741-50. [DOI: 10.1021/ci900183m] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/31/2023]
Affiliation(s)
- Sohini Basu
- Molecular Modeling Section, Biolab, Chembiotek, TCG Lifesciences Ltd., Bengal Intelligent Park, Tower-B 2nd Floor, Block-EP & GP, Sector-V, Salt Lake Electronic Complex, Calcutta-700091, India
- Srikanta Sen
- Molecular Modeling Section, Biolab, Chembiotek, TCG Lifesciences Ltd., Bengal Intelligent Park, Tower-B 2nd Floor, Block-EP & GP, Sector-V, Salt Lake Electronic Complex, Calcutta-700091, India
29
Huang LT, Gromiha MM. Reliable prediction of protein thermostability change upon double mutation from amino acid sequence. Bioinformatics 2009; 25:2181-7. [PMID: 19535532 DOI: 10.1093/bioinformatics/btp370] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Indexed: 12/27/2022]
Abstract
SUMMARY The accurate prediction of protein stability change upon mutation is one of the important issues for protein design. In this work, we have focused on the stability change of double mutations and systematically analyzed the wild-type and mutant residues, patterns in amino acid sequence and locations of mutants. Based on the sequence information of wild-type, mutant and three neighboring residues, we have presented a weighted decision table method (WET) for predicting the stability changes of 180 double mutants obtained from thermal (ΔΔG) denaturation. Using a 10-fold cross-validation test, our method showed a correlation of 0.75 between experimental and predicted values of stability changes, and an accuracy of 82.2% for discriminating the stabilizing and destabilizing mutants.
Affiliation(s)
- Liang-Tsung Huang
- Department of Computer Science and Information Engineering, Mingdao University, Changhua 523, Taiwan
30
Damborsky J, Brezovsky J. Computational tools for designing and engineering biocatalysts. Curr Opin Chem Biol 2009; 13:26-34. [PMID: 19297237 DOI: 10.1016/j.cbpa.2009.02.021] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.1] [Received: 12/02/2008] [Revised: 02/15/2009] [Accepted: 02/17/2009] [Indexed: 11/28/2022]
Abstract
Current computational tools to assist experimentalists for the design and engineering of proteins with desired catalytic properties are reviewed. The applications of these tools for de novo design of protein active sites, optimization of substrate access and product exit pathways, redesign of protein-protein interfaces, identification of neutral/advantageous/deleterious mutations in the libraries from directed evolution and stabilization of protein structures are described. Remarkable progress is seen in de novo design of enzymes catalyzing a chemical reaction for which a natural biocatalyst does not exist. Yet, constructed biocatalysts do not match natural enzymes in their efficiency, suggesting that more research is needed to capture all the important features of natural biocatalysts in theoretical designs.
Affiliation(s)
- Jiri Damborsky
- Institute of Experimental Biology and National Centre for Biomolecular Research, Masaryk University, Brno, Czech Republic.