1
Rodella C, Lazaridi S, Lemmin T. TemBERTure: advancing protein thermostability prediction with deep learning and attention mechanisms. Bioinformatics Advances 2024; 4:vbae103. [PMID: 39040220 PMCID: PMC11262459 DOI: 10.1093/bioadv/vbae103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Revised: 06/14/2024] [Accepted: 07/12/2024] [Indexed: 07/24/2024]
Abstract
Motivation: Understanding protein thermostability is essential for numerous biotechnological applications, but traditional experimental methods are time-consuming, expensive, and error-prone. Recently, deep learning (DL) techniques from natural language processing (NLP) were extended to the field of biology, since the primary sequence of proteins can be viewed as a string of amino acids that follow a physicochemical grammar.
Results: In this study, we developed TemBERTure, a DL framework that predicts thermostability class and melting temperature from protein sequences. Our findings emphasize the importance of data diversity for training robust models, especially by including sequences from a wider range of organisms. Additionally, we suggest using attention scores from DL models to gain deeper insights into protein thermostability. Analyzing these scores in conjunction with the 3D protein structure can enhance understanding of the complex interactions among amino acid properties, their positioning, and the surrounding microenvironment. By addressing the limitations of current prediction methods and introducing new exploration avenues, this research paves the way for more accurate and informative protein thermostability predictions, ultimately accelerating advancements in protein engineering.
Availability and implementation: The TemBERTure model and the data are available at: https://github.com/ibmm-unibe-ch/TemBERTure.
Affiliation(s)
- Chiara Rodella
  - Institute of Biochemistry and Molecular Medicine (IBMM), University of Bern, Bern CH-3012, Switzerland
  - Graduate School for Cellular and Biomedical Sciences (GCB), University of Bern, Bern CH-3012, Switzerland
- Symela Lazaridi
  - Institute of Biochemistry and Molecular Medicine (IBMM), University of Bern, Bern CH-3012, Switzerland
  - Graduate School for Cellular and Biomedical Sciences (GCB), University of Bern, Bern CH-3012, Switzerland
- Thomas Lemmin
  - Institute of Biochemistry and Molecular Medicine (IBMM), University of Bern, Bern CH-3012, Switzerland
2
Susanty M, Naim Mursalim MK, Hertadi R, Purwarianti A, Rajab TLE. Classifying alkaliphilic proteins using embeddings from protein language model. Comput Biol Med 2024; 173:108385. [PMID: 38547659 DOI: 10.1016/j.compbiomed.2024.108385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 03/22/2024] [Accepted: 03/24/2024] [Indexed: 04/17/2024]
Abstract
Alkaliphilic proteins have great potential as biocatalysts in biotechnology, especially for enzyme engineering. Extensive research has focused on exploring the enzymatic potential of alkaliphiles and characterizing alkaliphilic proteins. However, the current method for identifying these proteins requires wet-lab experiments and is time-consuming, labor-intensive, and expensive. Therefore, the development of a computational method for alkaliphilic protein identification would be invaluable for protein engineering and design. In this study, we present a novel approach that uses embeddings from a protein language model called ESM-2(3B) in a deep learning framework to classify alkaliphilic and non-alkaliphilic proteins. To our knowledge, this is the first attempt to employ embeddings from a pre-trained protein language model to classify alkaliphilic proteins. A reliable dataset comprising 1,002 alkaliphilic and 1,866 non-alkaliphilic proteins was constructed for training and testing the proposed model. The proposed model, dubbed ALPACA, achieves scores of 0.88, 0.84, and 0.75 for accuracy, F1-score, and Matthews correlation coefficient, respectively, on an independent dataset. ALPACA is likely to serve as a valuable resource for exploring protein alkalinity and its role in protein design and engineering.
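Several abstracts in this list report accuracy, F1-score, and the Matthews correlation coefficient (MCC) side by side. As a reading aid, the sketch below shows how these three metrics are conventionally computed from a binary confusion matrix. The function names and the counts in the usage line are illustrative assumptions and are not taken from any of the cited papers.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written in its count form."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, ranging from -1 to +1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion-matrix counts, for illustration only.
tp, tn, fp, fn = 80, 170, 15, 20
print(accuracy(tp, tn, fp, fn), f1_score(tp, fp, fn), mcc(tp, tn, fp, fn))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix symmetrically, which is one reason it is often reported for imbalanced datasets such as the 1,002 versus 1,866 split described above.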
Affiliation(s)
- Meredita Susanty
  - Institut Teknologi Bandung School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia
  - Universitas Pertamina, School of Computer Science, Jl Teuku Nyak Arief Jakarta Selatan DKI Jakarta, Indonesia
- Muhammad Khaerul Naim Mursalim
  - Institut Teknologi Bandung School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia
  - Universitas Universal, Kompleks Maha Vihara Duta Maitreya Bukit Beruntung, Sei Panas Batam, 29456, Kepulauan Riau, Indonesia
- Rukman Hertadi
  - Institut Teknologi Bandung Faculty of Math and Natural Sciences, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia
- Ayu Purwarianti
  - Institut Teknologi Bandung School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia
  - Center for Artificial Intelligence (U-CoE AI-VLB), Institut Teknologi Bandung, Bandung, Indonesia
- Tati LE Rajab
  - Institut Teknologi Bandung School of Electrical Engineering and Informatics, Jl. Ganesa 10, Bandung, Jawa Barat, Indonesia
3
Amangeldina A, Tan ZW, Berezovsky IN. Living in trinity of extremes: Genomic and proteomic signatures of halophilic, thermophilic, and pH adaptation. Curr Res Struct Biol 2024; 7:100129. [PMID: 38327713 PMCID: PMC10847869 DOI: 10.1016/j.crstbi.2024.100129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 01/16/2024] [Accepted: 01/16/2024] [Indexed: 02/09/2024]
Abstract
Since nucleic acids and proteins of unicellular prokaryotes are directly exposed to extreme environmental conditions, it is possible to explore the genomic-proteomic compositional determinants of the molecular mechanisms of adaptation they have developed in response to harsh environmental conditions. Using a wealth of currently available complete genomes/proteomes, we were able to explore signatures of adaptation to three environmental factors, pH, salinity, and temperature, observing major trends in the compositions of their nucleic acids and proteins. We derived predictors of thermostability, halophilic, and pH adaptations and complemented them with principal component analysis. We observed a clear difference between thermophilic and salinity/pH adaptations, whereas the latter two invoke seemingly overlapping mechanisms. The genome-proteome compositional trade-off reveals an intricate balance between the work of base pairing and base stacking in stabilization of coding DNA and r/tRNAs, and, at the same time, universal requirements for the stability and foldability of proteins regardless of the nucleotide biases. Nevertheless, we still found hidden fingerprints of ancient evolutionary connections between the nucleotide and amino acid compositions indicating their emergence, mutual evolution, and adjustment. The evolutionary perspective on the adaptation mechanisms is further studied here by means of the comparative analysis of genomic/proteomic traits of archaeal and bacterial species. The overall picture of genomic/proteomic signals of adaptation obtained here provides a foundation for future engineering and design of functional biomolecules resistant to harsh environments.
Affiliation(s)
- Aidana Amangeldina
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
  - Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore
- Zhen Wah Tan
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Igor N. Berezovsky
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
  - Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore
4
Haselbeck F, John M, Zhang Y, Pirnay J, Fuenzalida-Werner J, Costa R, Grimm D. Superior protein thermophilicity prediction with protein language model embeddings. NAR Genom Bioinform 2023; 5:lqad087. [PMID: 37829176 PMCID: PMC10566323 DOI: 10.1093/nargab/lqad087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Revised: 07/14/2023] [Accepted: 09/18/2023] [Indexed: 10/14/2023]
Abstract
Protein thermostability is important in many areas of biotechnology, including enzyme engineering and protein-hybrid optoelectronics. Ever-growing protein databases and information on stability at different temperatures allow the training of machine learning models to predict whether proteins are thermophilic. In silico predictions could reduce costs and accelerate the development process by guiding researchers to more promising candidates. Existing models for predicting protein thermophilicity rely mainly on features derived from physicochemical properties. Recently, modern protein language models that directly use sequence information have demonstrated superior performance in several tasks. In this study, we evaluate the usefulness of protein language model embeddings for thermophilicity prediction with ProLaTherm, a Protein Language model-based Thermophilicity predictor. ProLaTherm significantly outperforms all feature-, sequence- and literature-based comparison partners on multiple evaluation metrics. In terms of the Matthews correlation coefficient, ProLaTherm outperforms the second-best competitor by 18.1% in a nested cross-validation setup. Using proteins from species not overlapping with species from the training data, ProLaTherm outperforms all competitors by at least 9.7%. On these data, it misclassified only one nonthermophilic protein as thermophilic. Furthermore, it correctly identified 97.4% of all thermophilic proteins in our test set with an optimal growth temperature above 70°C.
Affiliation(s)
- Florian Haselbeck
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, 94315 Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, 94315 Straubing, Germany
- Maura John
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, 94315 Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, 94315 Straubing, Germany
- Yuqi Zhang
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, 94315 Straubing, Germany
- Jonathan Pirnay
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, 94315 Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, 94315 Straubing, Germany
- Juan Pablo Fuenzalida-Werner
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Chair of Biogenic Functional Materials, 94315 Straubing, Germany
- Rubén D Costa
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Chair of Biogenic Functional Materials, 94315 Straubing, Germany
- Dominik G Grimm
  - Technical University of Munich, Campus Straubing for Biotechnology and Sustainability, Bioinformatics, 94315 Straubing, Germany
  - Weihenstephan-Triesdorf University of Applied Sciences, Bioinformatics, 94315 Straubing, Germany
  - Technical University of Munich, TUM School of Computation, Information and Technology (CIT), 85748 Garching, Germany
5
Wan H, Zhang Y, Huang S. Prediction of thermophilic protein using 2-D general series correlation pseudo amino acid features. Methods 2023; 218:141-148. [PMID: 37604248 DOI: 10.1016/j.ymeth.2023.08.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 07/08/2023] [Accepted: 08/18/2023] [Indexed: 08/23/2023]
Abstract
The demand for thermophilic proteins has been increasing in protein engineering recently. Many machine-learning methods for identifying thermophilic proteins have emerged during this period. However, most machine learning-based thermophilic protein identification studies have focused only on accuracy. The relationship between the features' meaning and the proteins' physicochemical properties has yet to be studied in depth. In this article, we focused on the relationship between the features and the thermal stability of thermophilic proteins. This method used 2-D general series correlation pseudo amino acid (SC-PseAAC-General) features and achieved an accuracy of 82.76% using the J48 classifier. In addition, this research found higher frequencies of glutamic acid in thermophilic proteins, which help these proteins maintain their thermal stability by forming hydrogen bonds and salt bridges that prevent denaturation at high temperatures.
Affiliation(s)
- Hao Wan
  - College of Life Science, Qingdao University, Qingdao 266071, China
- Yanan Zhang
  - College of Life Science, Qingdao University, Qingdao 266071, China
- Shibo Huang
  - Beidahuang Industry Group General Hospital, Harbin 150001, China
6
Huang A, Lu F, Liu F. Discrimination of psychrophilic enzymes using machine learning algorithms with amino acid composition descriptor. Front Microbiol 2023; 14:1130594. [PMID: 36860491 PMCID: PMC9968940 DOI: 10.3389/fmicb.2023.1130594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Accepted: 01/23/2023] [Indexed: 02/16/2023]
Abstract
Introduction: Psychrophilic enzymes are a class of macromolecules with high catalytic activity at low temperatures. Cold-active enzymes, which are eco-friendly and cost-effective, have great potential for application in the detergent, textile, environmental remediation, pharmaceutical, and food industries. Compared with time-consuming and labor-intensive experiments, computational modeling, especially with machine learning (ML) algorithms, is a high-throughput screening tool for identifying psychrophilic enzymes efficiently.
Methods: In this study, the influence of four ML methods (support vector machine, K-nearest neighbor, random forest, and naïve Bayes) and three descriptors, i.e., amino acid composition (AAC), dipeptide combinations (DPC), and AAC + DPC, on model performance was systematically analyzed.
Results and discussion: Among the four ML methods, the support vector machine model based on the AAC descriptor achieved the best prediction accuracy, 80.6%, under 5-fold cross-validation. The AAC descriptor outperformed the DPC and AAC + DPC descriptors regardless of the ML method used. In addition, a comparison of amino acid frequencies between psychrophilic and non-psychrophilic proteins revealed that higher frequencies of Ala, Gly, Ser, and Thr and lower frequencies of Glu, Lys, Arg, Ile, Val, and Leu could be related to protein psychrophilicity. Further, ternary models were developed that could classify psychrophilic, mesophilic, and thermophilic proteins effectively. The predictive accuracy of the ternary classification model using the AAC descriptor with the support vector machine algorithm was 75.8%. These findings enhance our insight into the cold-adaptation mechanisms of psychrophilic proteins and aid in the design of engineered cold-active enzymes. Moreover, the proposed model could be used as a screening tool to identify novel cold-adapted proteins.
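The AAC, DPC, and AAC + DPC descriptors named in this abstract are simple frequency vectors. The sketch below shows one common way to compute them; it is an illustration under stated assumptions, not the authors' implementation. In particular, the residue ordering, the handling of non-standard residues (dropped before counting), and the function names `aac` and `dpc` are choices made here.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in a fixed order


def _clean(sequence: str) -> str:
    """Upper-case the sequence and drop anything outside the 20 standard residues."""
    return "".join(r for r in sequence.upper() if r in AMINO_ACIDS)


def aac(sequence: str) -> list[float]:
    """Amino acid composition: 20 values, the fraction of each residue."""
    seq = _clean(sequence)
    counts = Counter(seq)
    return [counts[a] / len(seq) if seq else 0.0 for a in AMINO_ACIDS]


def dpc(sequence: str) -> list[float]:
    """Dipeptide descriptor: 400 values, the fraction of each ordered residue pair."""
    seq = _clean(sequence)
    total = len(seq) - 1  # number of overlapping adjacent pairs
    pairs = Counter(seq[i:i + 2] for i in range(max(total, 0)))
    return [pairs[a + b] / total if total > 0 else 0.0
            for a, b in product(AMINO_ACIDS, repeat=2)]


# Hypothetical sequence; concatenating both vectors gives a 420-value AAC + DPC descriptor.
example = "MKTAYIAKQRQISFVKSHFSRQ"
features = aac(example) + dpc(example)
print(len(features))
```

The resulting vectors can be passed to any standard classifier. The size difference (20 values for AAC against 400 for DPC) is worth keeping in mind when comparing descriptors on small datasets.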
Affiliation(s)
- Ailan Huang
  - College of Biotechnology, Tianjin University of Science & Technology, Tianjin, China
- Fuping Lu
  - College of Biotechnology, Tianjin University of Science & Technology, Tianjin, China
  - Key Laboratory of Industrial Fermentation Microbiology, Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, Tianjin, China
- Fufeng Liu (corresponding author)
  - College of Biotechnology, Tianjin University of Science & Technology, Tianjin, China
  - Key Laboratory of Industrial Fermentation Microbiology, Ministry of Education, Tianjin Key Laboratory of Industrial Microbiology, Tianjin, China
7
Hossain D, Scott SH, Cluff T, Dukelow SP. The use of machine learning and deep learning techniques to assess proprioceptive impairments of the upper limb after stroke. J Neuroeng Rehabil 2023; 20:15. [PMID: 36707846 PMCID: PMC9881388 DOI: 10.1186/s12984-023-01140-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/01/2022] [Accepted: 01/18/2023] [Indexed: 01/28/2023]
Abstract
Background: Robots can generate rich kinematic datasets that have the potential to provide far more insight into impairments than standard clinical ordinal scales. Determining how to define the presence or absence of impairment in individuals using kinematic data, however, can be challenging. Machine learning techniques offer a potential solution to this problem. In the present manuscript we examine proprioception in stroke survivors using a robotic arm position matching task. Proprioception is impaired in 50-60% of stroke survivors and has been associated with poorer motor recovery and longer lengths of hospital stay. We present a simple cut-off score technique for individual kinematic parameters and an overall task score to determine impairment. We then compare the ability of different machine learning (ML) techniques and the above-mentioned task score to correctly classify individuals with or without stroke based on kinematic data.
Methods: Participants performed an Arm Position Matching (APM) task in an exoskeleton robot. The task produced 12 kinematic parameters that quantify multiple attributes of position sense. We first quantified impairment in individual parameters and an overall task score by determining if participants with stroke fell outside of the 95% cut-off score of control (normative) values. Then, we applied five machine learning algorithms (i.e., Logistic Regression, Decision Tree, Random Forest, Random Forest with Hyperparameters Tuning, and Support Vector Machine), and a deep learning algorithm (i.e., Deep Neural Network) to classify individual participants as to whether or not they had a stroke based only on kinematic parameters using a tenfold cross-validation approach.
Results: We recruited 429 participants with neuroimaging-confirmed stroke (< 35 days post-stroke) and 465 healthy controls. Depending on the APM parameter, we observed that 10.9-48.4% of stroke participants were impaired, while 44% were impaired based on their overall task score. The mean performance metrics of machine learning and deep learning models were: accuracy 82.4%, precision 85.6%, recall 76.5%, and F1 score 80.6%. All machine learning and deep learning models displayed similar classification accuracy; however, the Random Forest model had the highest numerical accuracy (83%). Our models showed higher sensitivity and specificity (AUC = 0.89) in classifying individual participants than the overall task score (AUC = 0.85) based on their performance in the APM task. We also found that variability was the most important feature in classifying performance in the APM task.
Conclusion: Our ML models displayed similar classification performance. ML models were able to integrate more kinematic information and relationships between variables into decision making and displayed better classification performance than the overall task score. ML may help to provide insight into individual kinematic features that have previously been overlooked with respect to clinical importance.
Affiliation(s)
- Delowar Hossain
  - Department of Clinical Neuroscience, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada (GRID: grid.22072.35; ISNI: 0000 0004 1936 7697)
- Stephen H. Scott
  - Department of Biomedical and Molecular Sciences, Queen’s University, Kingston, ON, Canada (GRID: grid.410356.5; ISNI: 0000 0004 1936 8331)
- Tyler Cluff
  - Faculty of Kinesiology, University of Calgary, Calgary, AB, Canada (GRID: grid.22072.35; ISNI: 0000 0004 1936 7697)
- Sean P. Dukelow
  - Department of Clinical Neuroscience, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada (GRID: grid.22072.35; ISNI: 0000 0004 1936 7697)
8
Kumar S, Duggineni VK, Singhania V, Misra SP, Deshpande PA. Unravelling and Quantifying the Biophysical–Biochemical Descriptors Governing Protein Thermostability by Machine Learning. Advanced Theory and Simulations 2023. [DOI: 10.1002/adts.202200703] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/20/2023]
Affiliation(s)
- Shashi Kumar
  - Quantum and Molecular Engineering Laboratory, Department of Chemical Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Vinay Kumar Duggineni
  - Quantum and Molecular Engineering Laboratory, Department of Chemical Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Vibhuti Singhania
  - Quantum and Molecular Engineering Laboratory, Department of Chemical Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Swayam Prabha Misra
  - Quantum and Molecular Engineering Laboratory, Department of Chemical Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
- Parag A. Deshpande
  - Quantum and Molecular Engineering Laboratory, Department of Chemical Engineering, Indian Institute of Technology Kharagpur, Kharagpur 721302, India
9
Harmalkar A, Rao R, Richard Xie Y, Honer J, Deisting W, Anlahr J, Hoenig A, Czwikla J, Sienz-Widmann E, Rau D, Rice AJ, Riley TP, Li D, Catterall HB, Tinberg CE, Gray JJ, Wei KY. Toward generalizable prediction of antibody thermostability using machine learning on sequence and structure features. MAbs 2023; 15:2163584. [PMID: 36683173 PMCID: PMC9872953 DOI: 10.1080/19420862.2022.2163584] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 06/16/2022] [Revised: 12/14/2022] [Accepted: 12/26/2022] [Indexed: 01/24/2023]
Abstract
Over the last three decades, the appeal of monoclonal antibodies (mAbs) as therapeutics has been steadily increasing, as evidenced by the FDA's recent landmark approval of the 100th mAb. Unlike mAbs that bind to single targets, multispecific biologics (msAbs) have garnered particular interest owing to the advantage of engaging distinct targets. One important modular component of msAbs is the single-chain variable fragment (scFv). Despite the exquisite specificity and affinity of these scFv modules, their relatively poor thermostability often hampers their development as a potential therapeutic drug. In recent years, engineering antibody sequences to enhance their stability by mutations has gained considerable momentum. As experimental methods for antibody engineering are time-intensive, laborious and expensive, computational methods serve as a fast and inexpensive alternative to conventional routes. In this work, we show two machine learning approaches to better classify thermostable scFv variants from sequence: first, pre-trained language models (PTLM) capturing functional effects of sequence variation, and second, a supervised convolutional neural network (CNN) trained with Rosetta energetic features. Both of these models are trained over temperature-specific data (TS50 measurements) derived from multiple libraries of scFv sequences. On out-of-distribution sequences (that is, sequences that are blind to the algorithm), we show that a sufficiently simple CNN model performs better than general pre-trained language models trained on diverse protein sequences (average Spearman correlation coefficient, ρ, of 0.4 as opposed to 0.15). On the other hand, an antibody-specific language model performs comparatively better than the CNN model on the same task (ρ = 0.52). Further, we demonstrate that for an independent mAb with available thermal melting temperatures for 20 experimentally characterized thermostable mutations, these models trained on TS50 data could identify 18 residue positions and 5 identical amino-acid mutations, showing remarkable generalizability. Our results suggest that such models can be broadly applicable for improving the biological characteristics of antibodies. Further, transferring such models for alternative physicochemical properties of scFvs can have potential applications in optimizing large-scale production and delivery of mAbs or bsAbs.
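The comparison in this abstract rests on the Spearman correlation coefficient ρ between model scores and measured stability. As a reading aid, here is a minimal pure-Python sketch of the standard definition (Pearson correlation of the ranks, with tied values given their average rank). The function names and the numbers in the usage lines are hypothetical and do not come from the paper.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; assumes both inputs have non-zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical model scores and measured stability values, for illustration only.
scores = [0.2, 0.9, 0.4, 0.7, 0.1]
measured = [51.0, 63.5, 60.0, 58.2, 49.9]
print(spearman(scores, measured))
```

Because ρ depends only on rank order, it rewards a model that sorts variants correctly even when its raw scores are not calibrated to temperature.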
Affiliation(s)
- Ameya Harmalkar
  - Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD, USA
- Roshan Rao
  - Electrical Engineering and Computer Science, University of California, Berkeley, CA, USA
- Yuxuan Richard Xie
  - Department of Bioengineering and Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Jonas Honer
  - Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany
- Wibke Deisting
  - Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany
- Jonas Anlahr
  - Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany
- Anja Hoenig
  - Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany
- Julia Czwikla
  - Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany
- Eva Sienz-Widmann
  - Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany
- Doris Rau
  - Therapeutic Discovery, Amgen Research (Munich) GmbH, Munich, Germany
- Austin J. Rice
  - Therapeutic Discovery, Amgen Research, Amgen Inc, Thousand Oaks, CA, USA
- Timothy P. Riley
  - Therapeutic Discovery, Amgen Research, Amgen Inc, Thousand Oaks, CA, USA
- Danqing Li
  - Therapeutic Discovery, Amgen Research, Amgen Inc, Thousand Oaks, CA, USA
- Jeffrey J. Gray
  - Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD, USA
- Kathy Y. Wei
  - Therapeutic Discovery, Amgen Research, Amgen Inc, South San Francisco, CA, USA
10
SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins. Comput Biol Med 2022; 146:105704. [PMID: 35690478 DOI: 10.1016/j.compbiomed.2022.105704] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 03/19/2022] [Revised: 05/15/2022] [Accepted: 06/04/2022] [Indexed: 11/22/2022]
Abstract
Thermophilic proteins (TPPs) are important in the field of protein biochemistry and the development of new enzymes. Thus, computational methods must be urgently developed to accurately and rapidly identify TPPs. To date, several computational methods have been developed for TPP identification; however, a few limitations in terms of performance and utility remain. In this study, we present a novel computational method, SAPPHIRE, to achieve more accurate identification of TPPs using only sequence information without any need for structural information. We combined twelve different feature encodings representing different perspectives and six popular machine learning algorithms to train 72 baseline models and extract the key information of TPPs. Subsequently, the informative predicted probabilities from the baseline models were mined and selected using a genetic algorithm in conjunction with a self-assessment-report approach. Finally, the meta-predictor, SAPPHIRE, was built and optimized by applying an optimal feature set. In the 10-fold cross-validation test, SAPPHIRE showed superior predictive performance compared with several baseline models. Moreover, in the independent test, SAPPHIRE yielded an accuracy of 0.942 and a Matthews correlation coefficient of 0.884, which were 7.68 and 5.12% higher, respectively, than those of existing methods. The proposed computational approach is anticipated to facilitate large-scale identification of TPPs and accelerate their applications in the food industry. The codes and datasets are available at https://github.com/plenoi/SAPPHIRE.
11
Charoenkwan P, Schaduangrat N, Hasan MM, Moni MA, Lió P, Shoombuatong W. Empirical comparison and analysis of machine learning-based predictors for predicting and analyzing of thermophilic proteins. EXCLI Journal 2022; 21:554-570. [PMID: 35651661 PMCID: PMC9150013 DOI: 10.17179/excli2022-4723] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/28/2022] [Accepted: 02/21/2022] [Indexed: 12/15/2022]
Abstract
Thermophilic proteins (TPPs) are critical for basic research and in the food industry due to their ability to maintain a thermodynamically stable fold at extremely high temperatures. Thus, the expeditious identification of novel TPPs from protein sequences through computational models is very desirable. Over the last few decades, a number of computational methods, especially machine learning (ML)-based methods, have been developed for in silico prediction of TPPs. Therefore, it is desirable to revisit these methods and summarize their advantages and disadvantages in order to develop new computational approaches that achieve more accurate prediction of TPPs. With this goal in mind, we comprehensively investigate a large collection of fourteen state-of-the-art TPP predictors in terms of their dataset size, feature encoding schemes, feature selection strategies, ML algorithms, evaluation strategies and web server/software usability. To the best of our knowledge, this article represents the first comprehensive review on the development of ML-based methods for in silico prediction of TPPs. These TPP predictors can be classified into two groups according to the interpretability of the ML algorithms employed (i.e., computational black-box methods and computational white-box methods). We also conducted a comparative study of several currently available TPP predictors on two benchmark datasets. Finally, we provide future perspectives for the design and development of new computational models for TPP prediction. We hope that this comprehensive review will help researchers select the TPP predictor most suitable for their purposes and provide useful perspectives for the development of more effective and accurate TPP predictors.
Collapse
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand, 50200
| | - Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
| | - Md Mehedi Hasan
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
| | - Mohammad Ali Moni
- School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, the University of Queensland, St Lucia, QLD 4072, Australia
| | - Pietro Lió
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
|
12
|
Ahmed Z, Zulfiqar H, Khan AA, Gul I, Dao FY, Zhang ZY, Yu XL, Tang L. iThermo: A Sequence-Based Model for Identifying Thermophilic Proteins Using a Multi-Feature Fusion Strategy. Front Microbiol 2022; 13:790063. [PMID: 35273581 PMCID: PMC8902591 DOI: 10.3389/fmicb.2022.790063] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/06/2021] [Accepted: 01/10/2022] [Indexed: 01/20/2023]
Abstract
Thermophilic proteins have important application value in biotechnology and industrial processes. The correct identification of thermophilic proteins provides important information for the application of these proteins in engineering. Biochemical identification of thermophilic proteins is laborious, time-consuming, and costly. Therefore, there is an urgent need for a fast and accurate method to identify thermophilic proteins. Considering this urgency, we constructed a reliable benchmark dataset containing 1,368 thermophilic and 1,443 non-thermophilic proteins. A multi-layer perceptron (MLP) model based on a multi-feature fusion strategy was proposed to discriminate thermophilic proteins from non-thermophilic proteins. On an independent dataset, the proposed model achieved an accuracy of 96.26%, which demonstrates that the model has good application prospects. To make the model convenient to use, a user-friendly software package called iThermo was established and can be freely accessed at http://lin-group.cn/server/iThermo/index.html. The high accuracy of the model and the practicability of the developed software package indicate that this study can accelerate the discovery and engineering application of thermally stable proteins.
Affiliation(s)
- Zahoor Ahmed
- School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hasan Zulfiqar
- School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Abdullah Aman Khan
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; Sichuan Artificial Intelligence Research Institute, Yibin, China
| | - Ijaz Gul
- School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Tsinghua Shenzhen International Graduate School, Institute of Biopharmaceutical and Health Engineering, Tsinghua University, Shenzhen, China
| | - Fu-Ying Dao
- School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Zhao-Yue Zhang
- School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Xiao-Long Yu
- School of Materials Science and Engineering, Hainan University, Haikou, China
| | - Lixia Tang
- School of Life Sciences and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
13
|
Charoenkwan P, Chotpatiwetchkul W, Lee VS, Nantasenamat C, Shoombuatong W. A novel sequence-based predictor for identifying and characterizing thermophilic proteins using estimated propensity scores of dipeptides. Sci Rep 2021; 11:23782. [PMID: 34893688 PMCID: PMC8664844 DOI: 10.1038/s41598-021-03293-w] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 08/23/2021] [Accepted: 12/01/2021] [Indexed: 02/08/2023]
Abstract
Owing to their ability to maintain a thermodynamically stable fold at extremely high temperatures, thermophilic proteins (TPPs) play a critical role in basic research and a variety of applications in the food industry. As a result, the development of computational models for rapidly and accurately identifying novel TPPs from a large number of uncharacterized protein sequences is desirable. Although computational models have already been developed for characterizing thermophilic proteins, their performance and interpretability remain unsatisfactory. We present a novel sequence-based thermophilic protein predictor, termed SCMTPP, for improving model predictability and interpretability. First, an up-to-date and high-quality dataset consisting of 1853 TPPs and 3233 non-TPPs was compiled from published literature. Second, the SCMTPP predictor was created by combining the scoring card method (SCM) with estimated propensity scores of g-gap dipeptides. Benchmarking experiments revealed that SCMTPP had a cross-validation accuracy of 0.883, which was comparable to that of a support vector machine-based predictor (0.906-0.910) and 2-17% higher than that of commonly used machine learning models. Furthermore, SCMTPP outperformed the state-of-the-art approach (ThermoPred) on the independent test dataset, with accuracy and MCC of 0.865 and 0.731, respectively. Finally, the SCMTPP-derived propensity scores were used to elucidate the critical physicochemical properties for protein thermostability enhancement. In terms of interpretability and generalizability, comparative results showed that SCMTPP was effective for identifying and characterizing TPPs. We have implemented the proposed predictor as a user-friendly online web server at http://pmlabstack.pythonanywhere.com/SCMTPP in order to allow easy access to the model. SCMTPP is expected to be a powerful tool for facilitating community-wide efforts to identify TPPs on a large scale and guiding experimental characterization of TPPs.
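The combination of g-gap dipeptide composition with propensity scores described in this abstract can be illustrated with a minimal sketch. It assumes that a scoring-card style score is the propensity-weighted sum of dipeptide frequencies; the function names, the toy propensity table, and the threshold are hypothetical and are not taken from SCMTPP.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptide_composition(sequence, gap=0):
    """Return the relative frequency of each residue pair separated by `gap` positions."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(sequence) - gap - 1):
        pair = sequence[i] + sequence[i + gap + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return {pair: 0.0 for pair in pairs}
    return {pair: n / total for pair, n in counts.items()}


def scoring_card_score(sequence, propensity, gap=0):
    """Weighted sum of dipeptide frequencies; unlisted dipeptides contribute zero."""
    composition = g_gap_dipeptide_composition(sequence, gap)
    return sum(propensity.get(pair, 0.0) * freq for pair, freq in composition.items())


# Hypothetical propensity scores, for illustration only (not values from SCMTPP).
toy_propensity = {"EK": 900.0, "KE": 850.0, "QS": 100.0}
toy_threshold = 300.0  # hypothetical decision threshold

score = scoring_card_score("EKEKEK", toy_propensity)
is_predicted_thermophilic = score > toy_threshold
```

Because the score is a plain weighted sum, each dipeptide's contribution can be read directly from the propensity table, which is the kind of interpretability the abstract emphasizes.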
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200 Thailand
| | - Warot Chotpatiwetchkul
- Applied Computational Chemistry Research Unit, Department of Chemistry, School of Science, King Mongkut’s Institute of Technology Ladkrabang, Bangkok, 10520 Thailand
| | - Vannajan Sanghiran Lee
- Department of Chemistry, Centre of Theoretical and Computational Physics, Faculty of Science, University of Malaya, 50603 Kuala Lumpur, Malaysia
| | - Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700 Thailand
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
|
14
|
Foroozandeh Shahraki M, Farhadyar K, Kavousi K, Azarabad MH, Boroomand A, Ariaeenejad S, Hosseini Salekdeh G. A generalized machine-learning aided method for targeted identification of industrial enzymes from metagenome: A xylanase temperature dependence case study. Biotechnol Bioeng 2020; 118:759-769. [PMID: 33095441 DOI: 10.1002/bit.27608] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/08/2020] [Revised: 09/23/2020] [Accepted: 10/11/2020] [Indexed: 11/08/2022]
Abstract
Growing industrial utilization of enzymes and the increasing availability of metagenomic data highlight the demand for effective methods of targeted identification and verification of novel enzymes from various environmental microbiota. Xylanases are a class of enzymes with numerous industrial applications and are involved in the degradation of xylose, a component of lignocellulose. The optimum temperature of enzymes is an essential factor to be considered when choosing appropriate biocatalysts for a particular purpose. Therefore, in silico prediction of this attribute is a significant cost and time-effective step in the effort to characterize novel enzymes. The objective of this study was to develop a computational method to predict the thermal dependence of xylanases. This tool was then implemented for targeted screening of putative xylanases with specific thermal dependencies from metagenomic data and resulted in the identification of three novel xylanases from sheep and cow rumen microbiota. Here we present thermal activity prediction for xylanase, a new sequence-based machine learning method that has been trained using a selected combination of various protein features. This random forest classifier discriminates non-thermophilic, thermophilic, and hyper-thermophilic xylanases. The model's performance was evaluated through multiple iterations of sixfold cross-validations as well as holdout tests, and it is freely accessible as a web-service at arimees.com.
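The evaluation protocol mentioned here, multiple iterations of sixfold cross-validation, can be sketched as repeated reshuffling of sample indices into six folds. This is a generic illustration using only the standard library; the number of repeats, the seeding scheme, and the function names are assumptions, not details from the paper.

```python
import random


def k_fold_indices(n_samples, k, seed):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_k_fold(n_samples, k=6, repeats=3):
    """Yield (repeat, train_indices, test_indices) for every fold of every repeat."""
    for repeat in range(repeats):
        folds = k_fold_indices(n_samples, k, seed=repeat)
        for held_out in range(k):
            test = folds[held_out]
            train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
            yield repeat, train, test


# 20 hypothetical samples, 3 repeats of sixfold cross-validation: 18 splits in total.
splits = list(repeated_k_fold(n_samples=20, k=6, repeats=3))
```

Each repeat uses a different shuffle, so averaging a metric over all splits gives an estimate that depends less on any single partition of the data.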
Affiliation(s)
- Mehdi Foroozandeh Shahraki
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
| | - Kiana Farhadyar
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
| | - Kaveh Kavousi
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
| | - Mohammad H Azarabad
- Laboratory of Complex Biological Systems and Bioinformatics (CBB), Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
| | - Amin Boroomand
- School of Natural Sciences, University of California Merced, Merced, California, USA
| | - Shohreh Ariaeenejad
- Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREEO), Karaj, Iran
| | - Ghasem Hosseini Salekdeh
- Department of Systems and Synthetic Biology, Agricultural Biotechnology Research Institute of Iran (ABRII), Agricultural Research Education and Extension Organization (AREEO), Karaj, Iran; Department of Molecular Sciences, Macquarie University, Sydney, New South Wales, Australia
|
15
|
Guo Z, Wang P, Liu Z, Zhao Y. Discrimination of Thermophilic Proteins and Non-thermophilic Proteins Using Feature Dimension Reduction. Front Bioeng Biotechnol 2020; 8:584807. [PMID: 33195148 PMCID: PMC7642589 DOI: 10.3389/fbioe.2020.584807] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 07/18/2020] [Accepted: 09/11/2020] [Indexed: 01/19/2023]
Abstract
Thermophilicity is a very important property of proteins, as it sometimes determines denaturation and cell death. Thus, methods for predicting thermophilic proteins and non-thermophilic proteins are of interest and can contribute to the design and engineering of proteins. In this article, we describe the use of feature dimension reduction technology and LIBSVM to identify thermophilic proteins. The highest accuracy obtained by cross-validation was 96.02% with 119 parameters. When using only 16 features, we obtained an accuracy of 93.33%. We discuss the importance of the different characteristics in identification and report a comparison of the performance of support vector machine to that of other methods.
Affiliation(s)
- Zifan Guo
- School of Aeronautics and Astronautic, Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
| | - Pingping Wang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Zhendong Liu
- School of Computer Science and Technology, Shandong Jianzhu University, Jinan, China
| | - Yuming Zhao
- Information and Computer Engineering College, Northeast Forestry University, Harbin, China
|
16
|
Abstract
Background: Thermophilic proteins maintain good activity at high temperature; studying them is therefore important for understanding the thermal stability of proteins.
Objective: To address the low precision and low efficiency of thermophilic protein prediction, this paper proposes a prediction method based on feature fusion and machine learning.
Methods: For the selected thermophilic datasets, each protein sequence was first characterized by feature fusion, combining g-gap dipeptide, entropy density and autocorrelation coefficient features. Kernel Principal Component Analysis (KPCA) was then used to reduce the dimension of the sequence features in order to shorten training time and improve efficiency. Finally, the classification model was built using a classification algorithm.
Results: A variety of classification algorithms were trained and tested on the selected thermophilic dataset. In this comparison, the accuracy of the Support Vector Machine (SVM) under the jackknife method was over 92%. The other evaluation indicators also showed that the SVM performed best.
Conclusion: Owing to an effective feature representation method and a robust classifier, the proposed method is suitable for predicting thermophilic proteins and is superior to most reported methods.
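The jackknife method named in the Results holds out each sample once, trains on the remainder, and reports the fraction of held-out samples predicted correctly. The sketch below illustrates that procedure only. To stay dependency-free it swaps in a simple nearest-centroid classifier where the paper uses an SVM, and the toy feature vectors are invented for illustration.

```python
def nearest_centroid_predict(train_x, train_y, query):
    """Predict the label whose class centroid is closest to `query` (squared Euclidean)."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1

    def distance(label):
        centroid = [s / counts[label] for s in sums[label]]
        return sum((q - c) ** 2 for q, c in zip(query, centroid))

    return min(sorted(sums), key=distance)


def jackknife_accuracy(x, y, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and report the fraction correct."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)


# Toy two-feature data, invented for illustration.
toy_x = [[0.9, 0.8], [1.0, 0.7], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1], [0.0, 0.3]]
toy_y = ["thermophilic"] * 3 + ["non-thermophilic"] * 3
accuracy = jackknife_accuracy(toy_x, toy_y)
```

Any classifier with the same `predict(train_x, train_y, query)` shape can be passed in, so the hold-one-out loop stays independent of the model being evaluated.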
Affiliation(s)
- Xian-Fang Wang
- School of Computer and Information Engineering, Henan Normal University, Henan, China
| | - Peng Gao
- School of Computer and Information Engineering, Henan Normal University, Henan, China
| | - Yi-Feng Liu
- School of Computer and Information Engineering, Henan Normal University, Henan, China
| | - Hong-Fei Li
- School of Computer and Information Engineering, Henan Normal University, Henan, China
| | - Fan Lu
- School of Computer and Information Engineering, Henan Normal University, Henan, China
|
17
|
Gado JE, Beckham GT, Payne CM. Improving Enzyme Optimum Temperature Prediction with Resampling Strategies and Ensemble Learning. J Chem Inf Model 2020; 60:4098-4107. [DOI: 10.1021/acs.jcim.0c00489] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 01/24/2023]
Affiliation(s)
- Japheth E. Gado
- Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States
- National Bioenergy Center, National Renewable Energy Laboratory, Golden, Colorado 80401, United States
| | - Gregg T. Beckham
- National Bioenergy Center, National Renewable Energy Laboratory, Golden, Colorado 80401, United States
| | - Christina M. Payne
- Department of Chemical and Materials Engineering, University of Kentucky, Lexington, Kentucky 40506, United States
|
18
|
Kumar S, Dangi AK, Shukla P, Baishya D, Khare SK. Thermozymes: Adaptive strategies and tools for their biotechnological applications. BIORESOURCE TECHNOLOGY 2019; 278:372-382. [PMID: 30709766 DOI: 10.1016/j.biortech.2019.01.088] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 11/14/2018] [Revised: 01/19/2019] [Accepted: 01/21/2019] [Indexed: 05/10/2023]
Abstract
In today's scenario of global climate change, there is a colossal demand for sustainable industrial processes and enzymes from thermophiles. Plausibly, thermozymes are an important toolkit, as they are known to be polyextremophilic in nature. Small genome size and diverse molecular conformational modifications have been implicated in devising adaptive strategies. Besides, the utilization of chemical technology and gene editing attributions according to mechanical necessities are additional key factors for efficacious bioprocess development. Microbial thermozymes have been extensively used in waste management, biofuel, food, paper, detergent, medicinal and pharmaceutical industries. To understand the strength of enzymes at higher temperatures, different models utilize X-ray structures of thermostable proteins, machine learning calculations and neural networks, but unified adaptive measures are yet to be fully understood. The present review provides recent updates on thermozymes and various interdisciplinary applications, including aspects of thermophile bioengineering utilizing synthetic biology and gene editing tools.
Affiliation(s)
- Sumit Kumar
- Enzyme and Microbial Biochemistry Laboratory, Department of Chemistry, Indian Institute of Technology Delhi, New Delhi 110016, India
| | - Arun K Dangi
- Enzyme Technology and Protein Bioinformatics Laboratory, Department of Microbiology, Maharshi Dayanand University, Rohtak, India
| | - Pratyoosh Shukla
- Enzyme Technology and Protein Bioinformatics Laboratory, Department of Microbiology, Maharshi Dayanand University, Rohtak, India
| | - Debabrat Baishya
- Department of Bioengineering and Technology, Institute of Science and Technology, Gauhati University, Guwahati 781014, Assam, India
| | - Sunil K Khare
- Enzyme and Microbial Biochemistry Laboratory, Department of Chemistry, Indian Institute of Technology Delhi, New Delhi 110016, India.
|
19
|
Volkening JD, Stecker KE, Sussman MR. Proteome-wide Analysis of Protein Thermal Stability in the Model Higher Plant Arabidopsis thaliana. Mol Cell Proteomics 2019; 18:308-319. [PMID: 30401684 PMCID: PMC6356070 DOI: 10.1074/mcp.ra118.001124] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 10/01/2018] [Indexed: 12/16/2022]
Abstract
Modern tandem MS-based sequencing technologies allow for the parallel measurement of concentration and covalent modifications for proteins within a complex sample. Recently, this capability has been extended to probe a proteome's three-dimensional structure and conformational state by determining the thermal denaturation profile of thousands of proteins simultaneously. Although many animals and their resident microbes exist under a relatively narrow, regulated physiological temperature range, plants take on the often widely ranging temperature of their surroundings, possibly influencing the evolution of protein thermal stability. In this report we present the first in-depth look at the thermal proteome of a plant species, the model organism Arabidopsis thaliana. By profiling the melting curves of over 1700 Arabidopsis proteins using six biological replicates, we have observed significant correlation between protein thermostability and several known protein characteristics, including molecular weight and the composition ratio of charged to polar amino acids. We also report on a divergence of the thermostability of the core and regulatory domains of the plant 26S proteasome that may reflect a unique property of the way protein turnover is regulated during temperature stress. Lastly, the highly replicated database of Arabidopsis melting temperatures reported herein provides baseline data on the variability of protein behavior in the assay. Unfolding behavior and experiment-to-experiment variability were observed to be protein-specific traits, and thus these data can serve to inform the design and interpretation of future targeted assays to probe the conformational status of proteins from plants exposed to different chemical, environmental and genetic challenges.
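A melting temperature can be read off a melting curve in several ways. One simple sketch, assuming the melting temperature is taken as the point where the soluble fraction falls to one half, interpolates linearly between the two measurements that bracket that midpoint. The definition, the function name, and the toy curve below are assumptions for illustration and do not reproduce the curve-fitting procedure of the study.

```python
def melting_temperature(temperatures, soluble_fraction, midpoint=0.5):
    """Interpolate the temperature at which the soluble fraction first drops to `midpoint`.

    Returns None if the curve never crosses the midpoint.
    """
    points = list(zip(temperatures, soluble_fraction))
    for (t_low, f_low), (t_high, f_high) in zip(points, points[1:]):
        if f_low >= midpoint > f_high:
            # Linear interpolation between the two bracketing measurements.
            return t_low + (f_low - midpoint) * (t_high - t_low) / (f_low - f_high)
    return None


# Toy melting curve, invented for illustration (temperatures in degrees Celsius).
toy_temperatures = [30, 40, 50, 60, 70]
toy_fractions = [1.00, 0.95, 0.70, 0.20, 0.05]
tm = melting_temperature(toy_temperatures, toy_fractions)
```

For the toy curve the midpoint is crossed between 50 and 60 degrees, so the estimate falls at 54 degrees. Real profiles are noisier, and fitting a sigmoid across all points is usually more robust than two-point interpolation.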
Affiliation(s)
- Jeremy D Volkening
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706
| | - Kelly E Stecker
- Biomolecular Mass Spectrometry and Proteomics, Utrecht University, Utrecht, Netherlands
| | - Michael R Sussman
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706
|
20
|
Panja AS, Nag A, Bandopadhyay B, Maiti S. Protein Stability Determination (PSD): A Tool for Proteomics Analysis. Curr Bioinform 2018. [DOI: 10.2174/1574893613666180315121614] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Abstract
Background: Protein Stability Determination (PSD) is a sequence-based bioinformatics tool which was developed by utilizing a large input of datasets of protein sequences in FASTA format. PSD can be used to analyze meta-proteomics data, which will help to predict and design thermozymes and mesozymes for academic and industrial purposes. PSD can also be used to analyze a protein sequence and predict whether it will be stable in a thermophilic or a mesophilic environment. Method and Results: The tool is written in Java, runs on any operating system, and provides a user-friendly graphical interface. It is a simple program and can predict the thermostability of proteins with >90% accuracy. PSD can also predict the nature of the constituent amino acids, i.e. acidic or basic, polar or nonpolar, etc. Conclusion: PSD is highly capable of determining the thermostability status of hypothetical or unknown peptides as well as meta-proteomics data from any established database. The utilities of PSD-driven analyses include predictions on the functional assignment of a protein. PSD also helps in designing peptides having flexible combinations of amino acids for functional stability. PSD is freely available at https://sourceforge.net/projects/protein-sequence-determination.
Affiliation(s)
- Anindya Sundar Panja
- Post Graduate Department of Biotechnology, Oriental Institute of Science and Technology, Vidyasagar University, Midnapore-721102, West Bengal, India
| | - Akash Nag
- Department of Computer science, University of Burdwan, India
| | - Bidyut Bandopadhyay
- Post Graduate Department of Biotechnology, Oriental Institute of Science and Technology, Vidyasagar University, Midnapore-721102, West Bengal, India
| | - Smarajit Maiti
- Post Graduate Department of Biochemistry and Biotechnology, Cell and Molecular Therapeutics Laboratory, Oriental Institute of Science and Technology, Vidyasagar University, Midnapore-721102, West Bengal, India
|
21
|
Tang H, Cao RZ, Wang W, Liu TS, Wang LM, He CM. A two-step discriminated method to identify thermophilic proteins. INT J BIOMATH 2017. [DOI: 10.1142/s1793524517500504] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Indexed: 11/18/2022]
Abstract
Improving thermostability of an enzyme can accelerate the relevant chemical reaction. Thus, the analysis and prediction of thermophilic proteins are conducive to protein engineering and enzyme design. In this study, a novel method based on two-step discrimination was proposed to distinguish between thermophilic and non-thermophilic proteins. The model was rigorously benchmarked on an objective dataset including 915 thermophilic proteins and 793 non-thermophilic proteins. Results showed that the overall accuracy of our method is 94.44% in 5-fold cross-validation, which is higher than those of other published methods. We believe that the two-step discriminated strategy will become a promising method in the relevant field of protein bioinformatics.
Affiliation(s)
- Hua Tang
- Department of Pathophysiology, Southwest Medical University, Luzhou 646000, P. R. China
| | - Ren-Zhi Cao
- Computer Science Department, Pacific Lutheran University, Tacoma WA 98447, USA
| | - Wen Wang
- Computer Science Department, Pacific Lutheran University, Tacoma WA 98447, USA
| | - Tie-Shan Liu
- Maize Institute, Shandong Academy of Agricultural Science, Jinan 250100, P. R. China
| | - Li-Ming Wang
- Maize Institute, Shandong Academy of Agricultural Science, Jinan 250100, P. R. China
| | - Chun-Mei He
- Maize Institute, Shandong Academy of Agricultural Science, Jinan 250100, P. R. China
|
22
|
Fan GL, Liu YL, Wang H. Identification of thermophilic proteins by incorporating evolutionary and acid dissociation information into Chou's general pseudo amino acid composition. J Theor Biol 2016; 407:138-142. [DOI: 10.1016/j.jtbi.2016.07.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 02/16/2016] [Revised: 06/24/2016] [Accepted: 07/07/2016] [Indexed: 10/21/2022]
|
23
|
Su JG, Han XM, Zhao SX, Hou YX, Li XY, Qi LS, Wang JH. Impacts of the charged residues mutation S48E/N62H on the thermostability and unfolding behavior of cold shock protein: insights from molecular dynamics simulation with Gō model. J Mol Model 2016; 22:91. [DOI: 10.1007/s00894-016-2958-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/18/2015] [Accepted: 03/07/2016] [Indexed: 10/22/2022]
|
24
|
Contribution of main chain and side chain atoms and their locations to the stability of thermophilic proteins. J Mol Graph Model 2016; 64:85-93. [DOI: 10.1016/j.jmgm.2016.01.001] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 11/23/2015] [Accepted: 01/03/2016] [Indexed: 11/21/2022]
|
25
|
Insights into the molecular basis of piezophilic adaptation: Extraction of piezophilic signatures. J Theor Biol 2015; 390:117-26. [PMID: 26656108 DOI: 10.1016/j.jtbi.2015.11.021] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/04/2015] [Revised: 11/06/2015] [Accepted: 11/21/2015] [Indexed: 11/20/2022]
Abstract
Piezophiles are organisms that can survive under extreme pressure conditions. However, the molecular basis of piezophilic adaptation is still poorly understood. Analysis of the protein sequence adjustments that have taken place during evolution can help to reveal the sequence adaptation parameters responsible for protein functional and structural adaptation at such high pressures. In this work we used an SVM classifier to filter strong instances and generated human-interpretable rules from these strong instances using the PART algorithm. These rules were analyzed to gain insights into the molecular signature patterns present in piezophilic proteins. For a detailed comparative study, the experiments were performed on piezophilic groups from three temperature ranges, namely psychrophilic-piezophilic, mesophilic-piezophilic, and thermophilic-piezophilic. The best classification results were obtained as we moved up the temperature range from psychrophilic-piezophilic to thermophilic-piezophilic. Based on the physicochemical classification of amino acids and using feature ranking algorithms, hydrophilic and polar amino acid groups have higher discriminative ability for the psychrophilic-piezophilic and mesophilic-piezophilic groups, along with hydrophobic and nonpolar amino acids for the thermophilic-piezophilic group. We also observed an overrepresentation of polar, hydrophilic and small amino acid groups in the discriminatory rules of all three temperature-range piezophiles, along with aliphatic, nonpolar and hydrophobic groups in the mesophilic-piezophilic and thermophilic-piezophilic groups.
|
26
|
Nagarajan R, Chothani SP, Ramakrishnan C, Sekijima M, Gromiha MM. Structure based approach for understanding organism specific recognition of protein-RNA complexes. Biol Direct 2015; 10:8. [PMID: 25886642 PMCID: PMC4352265 DOI: 10.1186/s13062-015-0039-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 11/19/2014] [Accepted: 02/03/2015] [Indexed: 12/11/2022]
Abstract
Background: Protein-RNA interactions perform diverse functions within the cell. Understanding the recognition mechanism of protein-RNA complexes has been a challenging task in molecular and computational biology. In earlier works, the recognition mechanisms have been studied for a specific complex or using a set of non-redundant complexes. In this work, we have constructed 18 sets of the same protein-RNA complexes belonging to different organisms from the Protein Data Bank (PDB). The similarities and differences in each set of complexes have been revealed in terms of various sequence and structure based features such as root mean square deviation, sequence homology, propensity of binding site residues, variance, conservation at binding sites, binding segments, binding motifs of amino acid residues and nucleotides, preferred amino acid-nucleotide pairs and influence of neighboring residues for binding. Results: We found that the proteins of mesophilic organisms have more binding sites than those of thermophiles and that the binding propensities of amino acid residues are distinct in E. coli, H. sapiens, S. cerevisiae, thermophiles and archaea. Proteins prefer to bind with RNA using a single residue segment in all the organisms, while RNA prefers to use a stretch of up to six nucleotides for binding with proteins. We have developed amino acid residue-nucleotide pair potentials for different organisms, which could be used for predicting the binding specificity. Further, molecular dynamics simulation studies on aspartyl tRNA synthetase complexed with aspartyl tRNA showed specific modes of recognition in E. coli, T. thermophilus and S. cerevisiae. Conclusion: Based on structural analysis and molecular dynamics simulations we suggest that the mode of recognition depends on the type of the organism in a protein-RNA complex. Reviewers: This article was reviewed by Sandor Pongor, Gajendra Raghava and Narayanaswamy Srinivasan.
Electronic supplementary material The online version of this article (doi:10.1186/s13062-015-0039-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Raju Nagarajan
- Department of Biotechnology, Bhupat Jyoti Metha School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India.
| | - Sonia Pankaj Chothani
- Department of Biotechnology, Bhupat Jyoti Metha School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India; Philips Research North America, 345 Scarborough Road, Briarcliff Manor, NY, 10510, USA.
| | - Chandrasekaran Ramakrishnan
- Department of Biotechnology, Bhupat Jyoti Metha School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India.
| | - Masakazu Sekijima
- Global Scientific Information and Computing Center (GSIC), Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8550, Japan.
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat Jyoti Metha School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India.
| |
Collapse
|
27
|
Development of a machine learning method to predict membrane protein-ligand binding residues using basic sequence information. Adv Bioinformatics 2015; 2015:843030. [PMID: 25802517 PMCID: PMC4329842 DOI: 10.1155/2015/843030] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 06/28/2014] [Revised: 01/07/2015] [Accepted: 01/08/2015] [Indexed: 12/26/2022] Open
Abstract
Locating ligand binding sites and finding the functionally important residues from protein sequences as well as structures has become one of the challenges in understanding protein function. Hence a Naïve Bayes classifier has been trained to predict whether a given amino acid residue in a membrane protein sequence is a ligand binding residue or not, using only sequence-based information. The input to the classifier consists of the features of the target residue and two sequence neighbors on each side of the target residue. The classifier is trained and evaluated on a nonredundant set of 42 sequences (chains with at least one transmembrane domain) from 31 alpha-helical membrane proteins. The classifier achieves an overall accuracy of 70.7% with 72.5% specificity and 61.1% sensitivity in identifying ligand binding residues from sequence. The classifier performs better when the sequence is encoded by PSI-BLAST generated PSSM profiles. Assessment of the predictions in the context of three-dimensional structures of proteins reveals the effectiveness of this method in identifying ligand binding sites from sequence information. In 83.3% (35 out of 42) of the proteins, the classifier identifies the ligand binding sites by correctly recognizing more than half of the binding residues. This will be useful to protein engineers in exploiting potential residues for functional assessment.
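The windowing and the reported metrics can be illustrated with a minimal Python sketch. It assumes a padding symbol `X` at the chain termini and uses hypothetical function names (`window_features`, `sensitivity_specificity`); the abstract does not say how the authors handled termini or encoded each residue, so this is only one plausible reading of "two sequence neighbors on each side".

```python
from typing import List, Sequence, Tuple


def window_features(sequence: str, flank: int = 2, pad: str = "X") -> List[str]:
    """Return one window of length 2*flank+1 per residue.

    The padding symbol at the termini is an assumption of this sketch.
    """
    padded = pad * flank + sequence + pad * flank
    width = 2 * flank + 1
    return [padded[i:i + width] for i in range(len(sequence))]


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP); labels are 1 (binding) or 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec
```

Each window would then be encoded (for example as residue identities or PSSM rows) before being passed to a classifier; that encoding step is left out here because the abstract gives no detail on it.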
|
28
|
Nath A, Subbiah K. Inferring biological basis about psychrophilicity by interpreting the rules generated from the correctly classified input instances by a classifier. Comput Biol Chem 2014; 53PB:198-203. [PMID: 25462328 DOI: 10.1016/j.compbiolchem.2014.10.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/28/2014] [Revised: 09/02/2014] [Accepted: 10/06/2014] [Indexed: 11/19/2022]
Abstract
Organisms thriving in extremely cold surroundings are called psychrophiles, and they present a wealth of knowledge about the sequence adjustments in proteins that occurred during adaptation to low temperatures. In this paper, we propose a new cascading model to investigate the basis for psychrophilicity. In this model, a superior classifier was used to discriminate psychrophilic from mesophilic protein sequences, and then the PART rule-generating algorithm was applied to the input instances that were correctly classified by the classifier, to generate human-interpretable rules. These derived rules were further validated on a structural dataset and finally analyzed to discover the underlying biological basis of psychrophilicity. In this study, we used as input features one of the key features of psychrophilic proteins accountable for remaining functional in extremely cold surroundings, i.e., global patterns of amino acid composition. The rotation forest classifier outperformed all the other classifiers with a maximum accuracy of 70.5% and a maximum AUC of 0.78. The effect of sequence length on the classification accuracy was also investigated. The analysis of the derived rules and interpretation of the results revealed some interesting phenomena, such as that the amino acids A, D, G, F, and S are over-represented, and T is under-represented, in psychrophilic proteins. These findings augment the existing domain knowledge of psychrophilic sequence features.
Affiliation(s)
- Abhigyan Nath
- Bioinformatics Section, Mahila Mahavidyalaya, Banaras Hindu University, Varanasi 221005, India.
- Karthikeyan Subbiah
- Department of Computer Science, Banaras Hindu University, Varanasi 221005, India.
|
29
|
Prediction of the determinants of thermal stability by linear discriminant analysis: the case of the glutamate dehydrogenase protein family. J Theor Biol 2014; 357:160-8. [PMID: 24853273 DOI: 10.1016/j.jtbi.2014.05.013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/03/2013] [Revised: 05/07/2014] [Accepted: 05/08/2014] [Indexed: 11/21/2022]
Abstract
Little is known about the determinants of thermal stability in individual protein families. Most of the knowledge on thermostability comes, in fact, from comparative analyses between large, and heterogeneous, sets of thermo- and mesophilic proteins. Here, we present a multivariate statistical approach aimed at detecting signature sequences for thermostability in a single protein family. It was applied to the glutamate dehydrogenase (GDH) family, which is a good model for investigating this peculiar process. The structure of GDH consists of six subunits, each of them organized into two domains. Formation of ion-pair networks on the surface of the protein subunits, or increase in the inter-subunit hydrophobic interactions, have been suggested as important factors for explaining stability at high temperatures. However, identification of the amino acid changes that are involved in this process still remains elusive. Our approach consisted of a linear discriminant analysis on a set of GDH sequences from Archaea and Bacteria (33 thermo- and 36 mesophilic GDHs). It led to detection of 3 amino acid clusters as the putative determinants of thermal stability. They were localized at the subunit interface or in close proximity to the binding site of the NAD(P)(+) coenzyme. Analysis within the clusters led to prediction of 8 critical amino acid sites. This approach could have a wide utility, in the light of the notion that each protein family seems to adopt its own strategy for achieving thermostability.
|
30
|
Wang L, Li C. Optimal subset selection of primary sequence features using the genetic algorithm for thermophilic proteins identification. Biotechnol Lett 2014; 36:1963-9. [PMID: 24930111 DOI: 10.1007/s10529-014-1577-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/26/2014] [Accepted: 05/28/2014] [Indexed: 10/25/2022]
Abstract
A genetic algorithm (GA) coupled with multiple linear regression (MLR) was used to extract useful features from amino acids and g-gap dipeptides for distinguishing between thermophilic and non-thermophilic proteins. The method was trained on a benchmark dataset of 915 thermophilic and 793 non-thermophilic proteins. The method reached an overall accuracy of 95.4% in a jackknife test using nine amino acids, 38 0-gap dipeptides and 29 1-gap dipeptides. The accuracy as a function of protein size ranged between 85.8 and 96.9%. The overall accuracies of three independent tests were 93, 93.4 and 91.8%. The observed results of detecting thermophilic proteins suggest that the GA-MLR approach described herein should be a powerful method for selecting features that describe thermostable machines and be an aid in the design of more stable proteins.
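A g-gap dipeptide is a pair of residues separated by g intervening positions, so g = 0 gives adjacent pairs and g = 1 gives pairs with one residue between them. The following Python sketch computes the 400-dimensional g-gap dipeptide composition; the function name and the normalisation by the number of valid pairs are assumptions of this sketch, not details taken from the paper.

```python
from collections import Counter
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptide_composition(sequence: str, g: int = 0) -> Dict[str, float]:
    """Frequency of each residue pair (a, b) where b lies g+1 positions after a."""
    pairs = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]
    counts = Counter(
        sequence[i] + sequence[i + g + 1]
        for i in range(len(sequence) - g - 1)
    )
    # Only pairs made of the 20 standard residues contribute to the total.
    total = sum(counts[p] for p in pairs)
    return {p: (counts[p] / total if total else 0.0) for p in pairs}
```

A feature selection step such as the GA-MLR procedure described in the abstract would then pick a subset of these 400 values per gap; that step is not shown here.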
Affiliation(s)
- LiQiang Wang
- Department of Biochemistry and Molecular Biology, College of Life Science, Nankai University, Weijin Road 94, Tianjin, 300071, China.
|
31
|
Ebrahimi M, Aghagolzadeh P, Shamabadi N, Tahmasebi A, Alsharifi M, Adelson DL, Hemmatzadeh F, Ebrahimie E. Understanding the underlying mechanism of HA-subtyping in the level of physic-chemical characteristics of protein. PLoS One 2014; 9:e96984. [PMID: 24809455 PMCID: PMC4014573 DOI: 10.1371/journal.pone.0096984] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/01/2013] [Accepted: 04/07/2014] [Indexed: 01/05/2023] Open
Abstract
The evolution of the influenza A virus to increase its host range is a major concern worldwide. Molecular mechanisms of increasing host range are largely unknown. Influenza surface proteins play determining roles in reorganization of host-sialic acid receptors and host range. In an attempt to uncover the physic-chemical attributes which govern HA subtyping, we performed a large-scale functional analysis of over 7000 sequences of 16 different HA subtypes. A large number (896) of physic-chemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine learning models which can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on the calculated protein characteristics dataset as well as 10 trimmed datasets generated by attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), percentage/frequency of Tyr, percentage of Cys, and frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features that are associated with HA subtyping. The Random Forest tree induction algorithm and the RBF kernel function of SVM (scaled by grid search) showed a high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption.
Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represents a new avenue for understanding and predicting the possible future structure of influenza pandemics.
Affiliation(s)
- Mansour Ebrahimi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
- Parisa Aghagolzadeh
- Department of Nephrology, Hypertension, and Clinical Pharmacology, University of Bern, Bern, Switzerland
- Narges Shamabadi
- Department of Biology, School of Basic Sciences, University of Qom, Qom, Iran
- Mohammed Alsharifi
- School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
- David L. Adelson
- School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
- Farhid Hemmatzadeh
- School of Animal and Veterinary Science, The University of Adelaide, Adelaide, Australia
- * E-mail: (FH); (EE)
- Esmaeil Ebrahimie
- School of Molecular and Biomedical Science, The University of Adelaide, Adelaide, Australia
- * E-mail: (FH); (EE)
|
32
|
AcalPred: a sequence-based tool for discriminating between acidic and alkaline enzymes. PLoS One 2013; 8:e75726. [PMID: 24130738 PMCID: PMC3794003 DOI: 10.1371/journal.pone.0075726] [Citation(s) in RCA: 81] [Impact Index Per Article: 7.4] [Received: 06/19/2013] [Accepted: 08/16/2013] [Indexed: 11/19/2022] Open
Abstract
The structure and activity of enzymes are influenced by the pH value of their surroundings. Although many enzymes work well in the pH range from 6 to 8, some specific enzymes have good efficiencies only in acidic (pH<5) or alkaline (pH>9) solution. Studies have demonstrated that the activities of enzymes correlate with their primary sequences. Judging enzyme adaptation to an acidic or alkaline environment from the amino acid sequence is crucial for clarifying molecular mechanisms and for designing highly efficient enzymes. In this study, we developed a sequence-based method to discriminate acidic enzymes from alkaline enzymes. Analysis of variance was used to choose the optimized discriminating features derived from g-gap dipeptide compositions, and a support vector machine was utilized to establish the prediction model. In the rigorous jackknife cross-validation, an overall accuracy of 96.7% was achieved. The method can correctly predict 96.3% of acidic and 97.1% of alkaline enzymes. Comparison between the proposed method and previous methods demonstrates that the proposed method is more accurate. On the basis of this proposed method, we have built an online web-server called AcalPred which can be freely accessed from the website (http://lin.uestc.edu.cn/server/AcalPred). We believe that AcalPred will become a powerful tool to study enzyme adaptation to acidic or alkaline environments.
|
33
|
Abstract
Background: Prediction of the optimal habitat conditions for a given bacterium, based on genome sequence alone would be of value for scientific as well as industrial purposes. One example of such a habitat adaptation is the requirement for oxygen. In spite of good genome data availability, there have been only a few prediction attempts of bacterial oxygen requirements, using genome sequences. Here, we describe a method for distinguishing aerobic, anaerobic and facultative anaerobic bacteria, based on genome sequence-derived input, using naive Bayesian inference. In contrast, other studies found in literature only demonstrate the ability to distinguish two classes at a time. Results: The results shown in the present study are as good as or better than comparable methods previously described in the scientific literature, with an arguably simpler method, when results are directly compared. This method further compares the performance of a single-step naive Bayesian prediction of the three included classifications, compared to a simple Bayesian network with two steps. A two-step network, distinguishing first respiring from non-respiring organisms, followed by the distinction of aerobe and facultative anaerobe organisms within the respiring group, is found to perform best. Conclusions: A simple naive Bayesian network based on the presence or absence of specific protein domains within a genome is an effective and easy way to predict bacterial habitat preferences, such as oxygen requirement.
Affiliation(s)
- Dan B Jensen
- Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- David W Ussery
- Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark; Comparative Genomics Group, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
|
34
|
Feng PM, Ding H, Chen W, Lin H. Naïve Bayes classifier with feature selection to identify phage virion proteins. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2013; 2013:530696. [PMID: 23762187 PMCID: PMC3671239 DOI: 10.1155/2013/530696] [Citation(s) in RCA: 107] [Impact Index Per Article: 9.7] [Received: 03/10/2013] [Revised: 04/16/2013] [Accepted: 04/28/2013] [Indexed: 12/31/2022]
Abstract
Knowledge about the protein composition of phage virions is a key step to understanding the functions of phage virion proteins. However, the experimental method to identify virion proteins is time consuming and expensive. Thus, it is highly desirable to develop novel computational methods for phage virion protein identification. In this study, a Naïve Bayes based method was proposed to predict phage virion proteins using amino acid composition and dipeptide composition. In order to remove redundant information, a novel feature selection technique was employed to single out optimized features. In the jackknife test, the proposed method achieved an accuracy of 79.15% for phage virion and nonvirion protein classification, which is superior to that of other state-of-the-art classifiers. These results indicate that the proposed method could be an effective and promising high-throughput method in phage proteomics research.
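The jackknife test mentioned here is leave-one-out cross-validation: each protein is held out in turn, the model is trained on the rest, and the held-out prediction is scored. A minimal Python sketch follows. The `nearest_neighbor` classifier is only a hypothetical stand-in so the example runs without external libraries; it is not the Naïve Bayes model of the paper.

```python
from typing import Callable, List

Vector = List[float]


def jackknife_accuracy(
    X: List[Vector],
    y: List[int],
    fit_predict: Callable[[List[Vector], List[int], Vector], int],
) -> float:
    """Leave-one-out accuracy: train on all samples but one, test on the held-out one."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if fit_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def nearest_neighbor(train_X: List[Vector], train_y: List[int], query: Vector) -> int:
    """Placeholder classifier (1-nearest neighbour by squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, query)) for row in train_X]
    return train_y[dists.index(min(dists))]
```

Any classifier with the same `fit_predict(train_X, train_y, query)` shape can be substituted, including one built on amino acid and dipeptide composition features.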
Affiliation(s)
- Peng-Mian Feng
- School of Public Health, Hebei United University, Tangshan 063000, China
- Hui Ding
- Key Laboratory for Neuroinformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Chen
- Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China
- Hao Lin
- Key Laboratory for Neuroinformation of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
35
|
Gromiha MM, Pathak MC, Saraboji K, Ortlund EA, Gaucher EA. Hydrophobic environment is a key factor for the stability of thermophilic proteins. Proteins 2013; 81:715-21. [PMID: 23319168 DOI: 10.1002/prot.24232] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.7] [Received: 09/05/2012] [Revised: 11/16/2012] [Accepted: 11/28/2012] [Indexed: 11/07/2022]
Affiliation(s)
- M Michael Gromiha
- Department of Biotechnology, Indian Institute of Technology Madras, Chennai 600036, Tamilnadu, India.
|
36
|
Lei JB, Yin JB, Shen HB. GFO: A data driven approach for optimizing the Gaussian function based similarity metric in computational biology. Neurocomputing 2013. [DOI: 10.1016/j.neucom.2012.07.003] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022]
|
37
|
Jensen DB, Vesth TC, Hallin PF, Pedersen AG, Ussery DW. Bayesian prediction of bacterial growth temperature range based on genome sequences. BMC Genomics 2012; 13 Suppl 7:S3. [PMID: 23282160 PMCID: PMC3521210 DOI: 10.1186/1471-2164-13-s7-s3] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 12/16/2022] Open
Abstract
Background The preferred habitat of a given bacterium can provide a hint of which types of enzymes of potential industrial interest it might produce. These might include enzymes that are stable and active at very high or very low temperatures. Being able to accurately predict this based on a genomic sequence would thus allow for an efficient and targeted search for production organisms, reducing the need for culturing experiments. Results This study found a total of 40 protein families useful for distinction between three thermophilicity classes (thermophiles, mesophiles and psychrophiles). The predictive performance of these protein families was compared to that of 87 basic sequence features (relative use of amino acids and codons, genomic and 16S rDNA AT content and genome size). When using naïve Bayesian inference, it was possible to correctly predict the optimal temperature range with a Matthews correlation coefficient of up to 0.68. The best predictive performance was always achieved by including protein families as well as structural features, compared to either of these alone. A dedicated computer program was created to perform these predictions. Conclusions This study shows that protein families associated with specific thermophilicity classes can provide effective input data for thermophilicity prediction, and that the naïve Bayesian approach is effective for such a task. The program created for this study is able to efficiently distinguish between thermophilic, mesophilic and psychrophilic adapted bacterial genomes.
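For readers unfamiliar with the Matthews correlation coefficient (MCC), the binary form is MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), ranging from −1 to 1. The Python sketch below implements that binary definition. The study distinguishes three classes, and the abstract does not state whether a one-vs-rest or a multi-class generalisation was used, so treating each class against the rest is an assumption here.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary MCC from confusion-matrix counts; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Returning 0.0 when a row or column of the confusion matrix is empty is a common convention rather than part of the definition.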
Affiliation(s)
- Dan B Jensen
- Technical University of Denmark, Center for Systems Biology, Denmark.
|
38
|
Gaspar ME, Csermely P. Rigidity and flexibility of biological networks. Brief Funct Genomics 2012; 11:443-56. [DOI: 10.1093/bfgp/els023] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Indexed: 01/30/2023] Open
|
39
|
Zuo YC, Chen W, Fan GL, Li QZ. A similarity distance of diversity measure for discriminating mesophilic and thermophilic proteins. Amino Acids 2012; 44:573-80. [PMID: 22851052 DOI: 10.1007/s00726-012-1374-z] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 11/28/2011] [Accepted: 07/17/2012] [Indexed: 11/25/2022]
Abstract
The successful prediction of thermophilic proteins is useful for designing stable enzymes that are functional at high temperature. We have used the increment of diversity (ID), a novel amino acid composition-based similarity distance, in a 2-class K-nearest neighbor classifier to classify thermophilic and mesophilic proteins, and the resulting KNN-ID classifier was successfully developed to predict thermophilic proteins. Instead of extracting features from protein sequences as done previously, our approach was based on a diversity measure of symbol sequences. The similarity distance between each pair of protein sequences was first calculated to quantitatively measure the similarity level of one given sequence and the other. The class of the query protein is then determined using the K-nearest neighbor algorithm. Comparisons with multiple recently published methods showed that the KNN-ID proposed in this study outperforms the other methods. The improved predictive performance indicated that it is a simple and effective classifier for discriminating thermophilic and mesophilic proteins. Finally, the influence of protein length and protein identity on prediction accuracy was discussed further. The prediction model and dataset used in this article can be freely downloaded from http://wlxy.imu.edu.cn/college/biostation/fuwu/KNN-ID/index.htm.
Affiliation(s)
- Yong-Chun Zuo
- School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China.
|
40
|
Hosseinzadeh F, Ebrahimi M, Goliaei B, Shamabadi N. Classification of lung cancer tumors based on structural and physicochemical properties of proteins by bioinformatics models. PLoS One 2012; 7:e40017. [PMID: 22829872 PMCID: PMC3400626 DOI: 10.1371/journal.pone.0040017] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 03/27/2012] [Accepted: 05/30/2012] [Indexed: 12/03/2022] Open
Abstract
Rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important in diagnosis of this disease. Furthermore, sequence-derived structural and physicochemical descriptors are very useful for machine learning prediction of protein structural and functional classes, for classifying proteins and for prediction performance. In this study, the classification of lung tumors based on 1497 attributes derived from structural and physicochemical properties of protein sequences (based on genes defined by microarray analysis) was investigated through a combination of attribute weighting, supervised and unsupervised clustering algorithms. Eighty percent of the weighting methods selected features such as autocorrelation, dipeptide composition and distribution of hydrophobicity as the most important protein attributes in classification of SCLC, NSCLC and COMMON classes of lung tumors. The same results were observed by most tree induction algorithms, while descriptors of hydrophobicity distribution were high in protein sequences COMMON in both groups and distribution of charge in these proteins was very low, showing that COMMON proteins were very hydrophobic. Furthermore, compositions of polar dipeptide in SCLC proteins were higher than in NSCLC proteins. Some clustering models (alone or in combination with attribute weighting algorithms) were able to nearly classify SCLC and NSCLC proteins. The Random Forest tree induction algorithm (evaluated by leave-one-out and 10-fold cross validation) showed more than 86% accuracy in clustering and predicting three different lung cancer tumors. Here, for the first time, the application of data mining tools to effectively classify three classes of lung cancer tumors, with regard to the importance of dipeptide composition, autocorrelation and distribution descriptors, has been reported.
Affiliation(s)
- Faezeh Hosseinzadeh
- Student at Laboratory of Biophysics and Molecular Biology, Institute of Biophysics and Biochemistry, University of Tehran, Tehran, Iran
- Mansour Ebrahimi
- Department of Biology at Basic science School & Bioinformatics Research Group, Green Research Center, University of Qom, Qom, Iran
- Bahram Goliaei
- Department of Medical Physics, Iran University of Medical Science, Tehran, Iran
- Narges Shamabadi
- Bioinformatics Research Group, Green Research Center, University of Qom, Qom, Iran
|
41
|
Lu JL, Hu XH, Hu DG. A new hybrid fractal algorithm for predicting thermophilic nucleotide sequences. J Theor Biol 2011; 293:74-81. [PMID: 22001320 DOI: 10.1016/j.jtbi.2011.09.028] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/10/2011] [Revised: 09/23/2011] [Accepted: 09/26/2011] [Indexed: 01/20/2023]
Abstract
Knowledge of the thermophilic mechanisms of organisms whose optimum growth temperature (OGT) ranges from 50 to 80 °C plays a major role in helping design stable proteins. How to predict whether a DNA sequence is thermophilic is a long-standing problem that has not been fully resolved. Chaos game representation (CGR) can investigate the patterns hidden in DNA sequences, and can visually reveal previously unknown structure. Fractal dimensions are good tools to measure the sizes of complex, highly irregular geometric objects. In this paper, we convert every DNA sequence into a high-dimensional vector by the CGR algorithm and fractal dimension, and then predict the thermostability of the DNA sequence from these fractal features with a support vector machine (SVM). We have conducted experiments on three groups: 17-dimensional vectors, 65-dimensional vectors, and 257-dimensional vectors. Each group is evaluated by the 10-fold cross-validation test. The group of 257-dimensional vectors gives the best results: the average accuracy is 0.9456 and the average MCC is 0.8878. The results are also compared with previous work using single CGR features. The comparison shows the high effectiveness of the new hybrid fractal algorithm.
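As a rough illustration of chaos game representation, the Python sketch below maps a DNA sequence to points in the unit square and counts them on a 2^k × 2^k grid. The corner assignment (A, C, G, T) is one common convention and the grid-count feature is an assumption; the abstract does not specify either. It may be worth noting that 17, 65 and 257 equal 4^2 + 1, 4^3 + 1 and 4^4 + 1, which would be consistent with 4^k grid cells plus a single fractal dimension, but the abstract does not confirm that construction.

```python
from typing import List, Tuple

# Corner assignment is an assumed (commonly used) convention.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(dna: str) -> List[Tuple[float, float]]:
    """Start at the centre; for each base move halfway toward its corner."""
    x, y = 0.5, 0.5
    points = []
    for base in dna.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_grid_counts(dna: str, k: int) -> List[List[int]]:
    """Count CGR points per cell of a 2^k x 2^k grid (each cell matches one k-mer)."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    # The first k-1 points do not yet encode a full k-mer, so they are skipped.
    for x, y in cgr_points(dna)[k - 1:]:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid
```

Flattening the grid for k = 2, 3 or 4 gives 16, 64 or 256 counts, which could be normalised and combined with other descriptors before training a classifier.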
Affiliation(s)
- Jin-Long Lu
- College of Science, Huazhong Agricultural University, Wuhan, PR China
|
42
|
Prediction of thermostability from amino acid attributes by combination of clustering with attribute weighting: a new vista in engineering enzymes. PLoS One 2011; 6:e23146. [PMID: 21853079 PMCID: PMC3154288 DOI: 10.1371/journal.pone.0023146] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Received: 04/13/2011] [Accepted: 07/06/2011] [Indexed: 11/19/2022] Open
Abstract
The engineering of thermostable enzymes is receiving increased attention. The paper, detergent, and biofuel industries, in particular, seek to use environmentally friendly enzymes instead of toxic chlorine chemicals. Enzymes typically function at temperatures below 60°C and denature if exposed to higher temperatures. In contrast, a small portion of enzymes can withstand higher temperatures as a result of various structural adaptations. Understanding the protein attributes that are involved in this adaptation is the first step toward engineering thermostable enzymes. We employed various supervised and unsupervised machine learning algorithms as well as attribute weighting approaches to find amino acid composition attributes that contribute to enzyme thermostability. Specifically, we compared two groups of enzymes: mesostable and thermostable enzymes. Furthermore, a combination of attribute weighting with supervised and unsupervised clustering algorithms was used for prediction and modelling of protein thermostability from amino acid composition properties. Mining a large number of protein sequences (2090) through a variety of machine learning algorithms, which were based on the analysis of more than 800 amino acid attributes, increased the accuracy of this study. Moreover, these models were successful in predicting thermostability from the primary structure of proteins. The results showed that expectation maximization clustering in combination with uncertainty and correlation attribute weighting algorithms can effectively (100%) classify thermostable and mesostable proteins. Seventy per cent of the weighting methods selected Gln content and frequency of hydrophilic residues as the most important protein attributes. On the dipeptide level, the frequency of Asn-Glu was the key factor in distinguishing mesostable from thermostable enzymes.
This study demonstrates the feasibility of predicting thermostability irrespective of sequence similarity and will serve as a basis for engineering thermostable enzymes in the laboratory.
|
43
|
Chen SA, Ou YY, Lee TY, Gromiha MM. Prediction of transporter targets using efficient RBF networks with PSSM profiles and biochemical properties. Bioinformatics 2011; 27:2062-7. [PMID: 21653515 DOI: 10.1093/bioinformatics/btr340] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Indexed: 11/13/2022]
Abstract
SUMMARY Transporters are proteins that are involved in the movement of ions or molecules across biological membranes. Currently, our knowledge about the functions of transporters is limited due to the paucity of their 3D structures. Hence, computational techniques are necessary to annotate the functions of transporters. In this work, we focused on an important functional aspect of transporters, namely annotation of targets for transport proteins. We have systematically analyzed four major classes of transporters with different transporter targets: (i) electron, (ii) protein/mRNA, (iii) ion and (iv) others, using amino acid properties. We have developed a radial basis function network-based method for predicting transport targets with amino acid properties and position specific scoring matrix profiles. Our method showed a 10-fold cross-validation accuracy of 90.1, 80.1, 70.3 and 82.3% for electron transporters, protein/mRNA transporters, ion transporters and others, respectively, in a dataset of 543 transporters. We have also evaluated the performance of the method with an independent dataset of 108 proteins and we obtained similar accuracy. We suggest that our method could be an effective tool for functional annotation of transport proteins. AVAILABILITY http://rbf.bioinfo.tw/~sachen/ttrbf.html
Affiliation(s)
- Shu-An Chen
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan
|
44
|
Nakariyakul S, Liu ZP, Chen L. Detecting thermophilic proteins through selecting amino acid and dipeptide composition features. Amino Acids 2011; 42:1947-53. [DOI: 10.1007/s00726-011-0923-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 01/25/2011] [Accepted: 04/20/2011] [Indexed: 11/29/2022]
|
45
|
Gorania M, Seker H, Haris PI. Predicting a protein's melting temperature from its amino acid sequence. Annu Int Conf IEEE Eng Med Biol Soc 2011; 2010:1820-3. [PMID: 21095941 DOI: 10.1109/iembs.2010.5626421] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
Melting temperature is an important characteristic feature of a protein and is used for various purposes such as in drug development. Currently, protein melting temperature is determined by laboratory methods such as Differential Scanning Calorimetry, Circular Dichroism, Fourier transform infrared spectroscopy and several other methods. These methods are laborious and costly. Therefore, we propose a novel bioinformatics-based method for predicting protein melting temperature from the amino acid sequence of a protein. This is not only a challenging task but has also been previously unexplored. For this study, the melting temperatures of 230 proteins from a range of organisms were collected along with their sequence information from the published literature. The melting temperatures of these proteins span a very large range, between 25°C and 113°C. The protein sequences are then used to derive two sets of sequence-driven features, namely amino acid composition (AAC) and pseudo-amino acid composition (PseudoAAC), to characterise the proteins. In order to predict the melting temperature, two different computational intelligence methods, namely artificial neural networks (ANN) and adaptive network-fuzzy inference system (ANFIS), were utilized. Amongst over 100 different models generated, the ANN produced the best model with the least error (0.01087 for the AAC and 0.01086 for the PseudoAAC). As both feature sets yielded quite similar error and computation of PseudoAAC is costly when compared to that of AAC, traditional AAC seems to be an effective feature set for predicting melting temperature. The results obtained in this study are very promising and, for the first time, show that the melting temperature of a protein can be predicted from its amino acid sequence only. Therefore, costly lab-based experiments may not be required to measure the melting temperature and the bioinformatics models can help speed up laboratory processes such as in drug development.
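As a rough illustration of the amino acid composition (AAC) feature set named in this abstract, the sketch below turns a sequence into a 20-dimensional vector of residue fractions. The function name, the residue ordering, and the choice to skip non-standard residue codes are assumptions made here for illustration; the abstract does not describe the authors' implementation.

```python
# Minimal sketch of amino acid composition (AAC): the fraction of each of the
# 20 standard residues in a sequence. Names and residue order are illustrative.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return a 20-element vector of residue fractions, ordered as AMINO_ACIDS."""
    # Assumption: characters outside the 20 standard residues (e.g. X, B, Z)
    # are ignored rather than counted.
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


# Usage: a toy four-residue sequence (two Ala, one Gly, one Lys).
features = amino_acid_composition("AAGK")
print(dict(zip(AMINO_ACIDS, features))["A"])
```

A vector like this could then be fed to any regression model; the ANN and ANFIS models mentioned in the abstract are not reproduced here.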
Affiliation(s)
- Malde Gorania
- Bio-Health Informatics Research Group at the Centre of Computational, Department of Informatics, Faculty of Technology, De Montfort University, UK LE11 9BH.
|
46
|
Discrimination of Golgi type II membrane proteins based on their hydropathy profiles and the amino acid propensities of their transmembrane regions. Biosci Biotechnol Biochem 2011; 75:82-8. [PMID: 21228484 DOI: 10.1271/bbb.100571] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/30/2022]
Abstract
Membrane proteins in the Golgi apparatus play important roles in biological functions, predominantly as catalysts related to post-translational modification of protein oligosaccharides. We succeeded in extracting the characteristics of Golgi type II membrane proteins computationally by comparison with those of Golgi no retention proteins, which are mainly localized in the plasma membrane. Golgi type II membrane proteins were detected by combining hydropathy alignment and a position-specific score matrix of the amino acid propensities around the transmembrane region. We achieved 96.2% sensitivity, 93.5% specificity, and a 0.949 success rate in a self-consistency test. In a 5-fold cross-validation test, 88.0% sensitivity, 85.5% specificity, and a 0.867 success rate were achieved.
|
47
|
Dabirmanesh B, Daneshjou S, Sepahi AA, Ranjbar B, Khavari-Nejad RA, Gill P, Heydari A, Khajeh K. Effect of ionic liquids on the structure, stability and activity of two related α-amylases. Int J Biol Macromol 2011; 48:93-7. [DOI: 10.1016/j.ijbiomac.2010.10.001] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Received: 08/09/2010] [Revised: 10/03/2010] [Accepted: 10/04/2010] [Indexed: 11/25/2022]
|
48
|
Lin H, Chen W. Prediction of thermophilic proteins using feature selection technique. J Microbiol Methods 2010; 84:67-70. [PMID: 21044646 DOI: 10.1016/j.mimet.2010.10.013] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Received: 08/17/2010] [Revised: 10/15/2010] [Accepted: 10/19/2010] [Indexed: 11/16/2022]
Abstract
The thermostability of proteins is particularly relevant for enzyme engineering. Developing a computational method to identify mesophilic proteins would be helpful for protein engineering and design. In this work, we developed a support vector machine-based method to predict thermophilic proteins using information on amino acid distribution and selected amino acid pairs. A reliable benchmark dataset including 915 thermophilic proteins and 793 non-thermophilic proteins was constructed for training and testing the proposed models. Results showed that 93.8% of thermophilic proteins and 92.7% of non-thermophilic proteins could be correctly predicted using jackknife cross-validation. The high prediction success rate indicates that this model can be applied to designing stable proteins.
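The jackknife (leave-one-out) evaluation reported in this abstract can be sketched as follows. To keep the example dependency-free, a simple nearest-centroid classifier stands in for the support vector machine, so this is not the authors' model; the function names and the toy feature vectors are hypothetical.

```python
# Sketch of jackknife (leave-one-out) cross-validation with per-class accuracy.
# A nearest-centroid classifier is used as a stand-in for the SVM in the abstract.

def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [v for v, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return min(centroids, key=lambda lab: dist2(centroids[lab], x))


def jackknife_per_class_accuracy(X, y):
    """Hold out each sample once, train on the rest, and report accuracy per class."""
    hits = {label: 0 for label in set(y)}
    totals = {label: 0 for label in set(y)}
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        totals[y[i]] += 1
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            hits[y[i]] += 1
    return {label: hits[label] / totals[label] for label in totals}


# Toy, made-up two-feature data: three samples per class.
X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9], [0.8, 1.0]]
y = ["mesophilic"] * 3 + ["thermophilic"] * 3
print(jackknife_per_class_accuracy(X, y))
```

Reporting accuracy per class mirrors how the abstract gives separate rates for thermophilic and non-thermophilic proteins.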
Affiliation(s)
- Hao Lin
- Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China.
|
49
|
Abstract
Neural networks are a class of intelligent learning machines establishing the relationships between descriptors of real-world objects. As optimisation tools they are also a class of computational algorithms implemented using statistical/numerical techniques for parameter estimation, model selection, and generalisation enhancement. In bioinformatics applications, neural networks have played an important role for classification, function approximation, knowledge discovery, and data visualisation. This chapter will focus on supervised neural networks and discuss their applications to bioinformatics.
|
50
|
Li Y, Middaugh CR, Fang J. A novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting relative thermostability of protein mutants. BMC Bioinformatics 2010; 11:62. [PMID: 20109199 PMCID: PMC3098108 DOI: 10.1186/1471-2105-11-62] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 09/28/2009] [Accepted: 01/28/2010] [Indexed: 11/10/2022]
Abstract
Background The ability to design thermostable proteins is theoretically important and practically useful. Robust and accurate algorithms, however, remain elusive. One critical problem is the lack of reliable methods to estimate the relative thermostability of possible mutants. Results We report a novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting the relative thermostability of protein mutants. The scoring function was developed based on an elaborate analysis of a set of features calculated or predicted from 540 pairs of hyperthermophilic and mesophilic protein ortholog sequences. It was constructed by a linear combination of ten important features identified by a feature ranking procedure based on the random forest classification algorithm. The weights of these features in the scoring function were fitted by a hill-climbing algorithm. This scoring function has shown an excellent ability to discriminate hyperthermophilic from mesophilic sequences. The prediction accuracies reached 98.9% and 97.3% in discriminating orthologous pairs in training and the holdout testing datasets, respectively. Moreover, the scoring function can distinguish non-homologous sequences with an accuracy of 88.4%. Additional blind tests using two datasets of experimentally investigated mutations demonstrated that the scoring function can be used to predict the relative thermostability of proteins and their mutants at very high accuracies (92.9% and 94.4%). We also developed an amino acid substitution preference matrix between mesophilic and hyperthermophilic proteins, which may be useful in designing more thermostable proteins. Conclusions We have presented a novel scoring function which can distinguish not only HP/MP ortholog pairs, but also non-homologous pairs at high accuracies. Most importantly, it can be used to accurately predict the relative stability of proteins and their mutants, as demonstrated in two blind tests. 
In addition, the residue substitution preference matrix assembled in this study may reflect substitution biases induced by thermal adaptation. A web server implementing the scoring function and the dataset used in this study are freely available at http://www.abl.ku.edu/thermorank/.
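The idea of a linear scoring function whose weights are fitted by hill climbing can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the published ThermoRank procedure: the perturbation scheme, step size, acceptance rule, and the two-feature toy pairs are all hypothetical, whereas the abstract describes ten selected features and 540 ortholog pairs.

```python
# Sketch: fit the weights of a linear score so that, for each ortholog pair,
# the hyperthermophilic member scores higher than the mesophilic member.
import random


def score(weights, features):
    """Linear combination of feature values."""
    return sum(w * f for w, f in zip(weights, features))


def pair_accuracy(weights, pairs):
    """Fraction of (hyper, meso) pairs where the hyperthermophilic score is higher."""
    correct = sum(1 for hyper, meso in pairs
                  if score(weights, hyper) > score(weights, meso))
    return correct / len(pairs)


def hill_climb(pairs, n_features, steps=200, step_size=0.1, seed=0):
    """Perturb one randomly chosen weight per step; keep strict improvements only."""
    rng = random.Random(seed)
    weights = [0.0] * n_features
    best = pair_accuracy(weights, pairs)
    for _ in range(steps):
        candidate = list(weights)
        i = rng.randrange(n_features)
        candidate[i] += rng.uniform(-step_size, step_size)
        acc = pair_accuracy(candidate, pairs)
        if acc > best:
            weights, best = candidate, acc
    return weights, best


# Toy, made-up pairs: (hyperthermophilic features, mesophilic features).
pairs = [([0.8, 0.2], [0.5, 0.4]),
         ([0.7, 0.1], [0.4, 0.3]),
         ([0.9, 0.3], [0.6, 0.5])]
weights, best = hill_climb(pairs, n_features=2)
print(weights, best)
```

Accepting only strict improvements keeps the sketch short; a practical fit would likely use restarts or a finer objective, since pairwise accuracy has large plateaus.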
Affiliation(s)
- Yunqi Li
- Applied Bioinformatics Laboratory, the University of Kansas, Lawrence, KS 66047, USA
|