1
Jia P, Zhang F, Wu C, Li M. A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond. Brief Bioinform 2024; 25:bbae162. [PMID: 38739759 PMCID: PMC11089422 DOI: 10.1093/bib/bbae162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 02/17/2024] [Accepted: 03/31/2024] [Indexed: 05/16/2024]
Abstract
Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein-ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein-ligand interactions. Here, we review a comprehensive set of over 160 protein-ligand interaction predictors, which cover protein-protein, protein-nucleic acid, protein-peptide and protein-other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.
Affiliation(s)
- Pengzhen Jia
- School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Fuhao Zhang
- School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Chaojin Wu
- School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Min Li
- School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China

2
Iwaniak A, Minkiewicz P, Darewicz M. Bioinformatics and bioactive peptides from foods: Do they work together? Adv Food Nutr Res 2024; 108:35-111. [PMID: 38461003 DOI: 10.1016/bs.afnr.2023.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/11/2024]
Abstract
We live in the Big Data Era, which affects many aspects of science, including research on bioactive peptides derived from foods, a focus of scientific interest over the last few decades. These two issues, i.e., the development of computer technologies and progress in the discovery of novel peptides with health-beneficial properties, are closely interrelated. This chapter presents example applications of bioinformatics to the study of biopeptides, focusing on the main aspects of peptide analysis as the starting point: (i) the role of peptide databases; (ii) aspects of bioactivity prediction; (iii) simulation of peptide release from proteins. Bioinformatics can also be used to predict other peptide features, including ADMET, QSAR, structure, and taste. As for the question posed in the title, "bioinformatics and bioactive peptides from foods: do they work together?", it is currently almost impossible to find peptide research that involves no bioinformatics. However, theoretical predictions are not equivalent to experimental work and always require critical scrutiny. The compatibility of in silico and in vitro results is also summarized herein.
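Point (iii), simulating peptide release from proteins, is typically done by applying an enzyme's cleavage specificity to a protein sequence in silico. The sketch below assumes the simplified trypsin rule (cut after K or R unless the next residue is P); it is an illustration of the idea, not the procedure of any particular tool discussed in the chapter, and real enzyme rules include further exceptions and missed cleavages.

```python
import re


def tryptic_digest(sequence: str) -> list[str]:
    """Simulate tryptic release of peptides from a protein sequence.

    Simplified rule (assumption): cut after K or R, except when the next
    residue is P. Missed cleavages and other exceptions are ignored.
    """
    # Zero-width match: a position preceded by K/R and not followed by P.
    # (re.split splits on empty matches in Python 3.7+.)
    fragments = re.split(r"(?<=[KR])(?!P)", sequence.upper())
    # A cut at the very end of the sequence yields an empty string; drop it.
    return [fragment for fragment in fragments if fragment]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the chapter.
    print(tryptic_digest("AKPGRDEK"))  # prints ['AKPGR', 'DEK']
```

In the toy sequence the first lysine is not cut because it is followed by proline, so only the arginine and the C-terminal lysine produce cleavage sites.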
Affiliation(s)
- Anna Iwaniak
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland
- Piotr Minkiewicz
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland
- Małgorzata Darewicz
- Chair of Food Biochemistry, Faculty of Food Science, University of Warmia and Mazury in Olsztyn, Olsztyn-Kortowo, Poland

3
Shi W, Singha M, Pu L, Srivastava G, Ramanujam J, Brylinski M. GraphSite: Ligand Binding Site Classification with Deep Graph Learning. Biomolecules 2022; 12:biom12081053. [PMID: 36008947 PMCID: PMC9405584 DOI: 10.3390/biom12081053] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/03/2022] [Revised: 07/18/2022] [Accepted: 07/20/2022] [Indexed: 12/10/2022]
Abstract
The binding of small organic molecules to protein targets is fundamental to a wide array of cellular functions. It is also routinely exploited to develop new therapeutic strategies against a variety of diseases. On that account, the ability to effectively detect and classify ligand binding sites in proteins is of paramount importance to modern structure-based drug discovery. These complex and non-trivial tasks require sophisticated algorithms from the field of artificial intelligence to achieve high prediction accuracy. In this communication, we describe GraphSite, a deep learning-based method utilizing a graph representation of local protein structures and a state-of-the-art graph neural network to classify ligand binding sites. Using neural weighted message passing layers to effectively capture the structural, physicochemical, and evolutionary characteristics of binding pockets mitigates model overfitting and improves classification accuracy. Indeed, comprehensive cross-validation benchmarks against a large dataset of binding pockets belonging to 14 diverse functional classes demonstrate that GraphSite yields a class-weighted F1-score of 81.7%, outperforming other approaches such as molecular docking and binding site matching. Further, it generalizes well to unseen data with an F1-score of 70.7%, which is the expected performance in real-world applications. We also discuss new directions to improve and extend GraphSite in the future.
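A class-weighted F1-score averages the per-class F1 values, weighting each class by its number of true instances (its support), so that frequent pocket classes count more than rare ones. The pure-Python sketch below illustrates that metric on hypothetical labels; it is not GraphSite's evaluation code.

```python
from collections import Counter


def weighted_f1(y_true: list, y_pred: list) -> float:
    """Support-weighted mean of per-class F1 scores (multi-class)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label in set(y_true) | set(y_pred):
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        # Classes that never occur in y_true have support 0 and add nothing.
        score += f1 * support[label] / total
    return score


if __name__ == "__main__":
    # Hypothetical pocket labels, not GraphSite data.
    truth = ["ATP", "ATP", "heme", "heme"]
    predicted = ["ATP", "heme", "heme", "heme"]
    print(round(weighted_f1(truth, predicted), 3))  # prints 0.733
```

Here the ATP class has F1 = 2/3 and the heme class has F1 = 0.8; each has support 2 of 4, so the weighted score is 11/15.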
Affiliation(s)
- Wentao Shi
- Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA 70803, USA
- Manali Singha
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Limeng Pu
- Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA
- Gopal Srivastava
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Jagannathan Ramanujam
- Division of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA 70803, USA
- Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA
- Michal Brylinski
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
- Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803, USA
- Correspondence: Tel.: +1-(225)-578-2791; Fax: +1-(225)-578-2597

4
Yamaguchi S, Nakashima H, Moriwaki Y, Terada T, Shimizu K. Prediction of protein mononucleotide binding sites using AlphaFold2 and machine learning. Comput Biol Chem 2022; 100:107744. [DOI: 10.1016/j.compbiolchem.2022.107744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 07/12/2022] [Accepted: 07/22/2022] [Indexed: 11/26/2022]

5
Gupta TK, Klumpe S, Gries K, Heinz S, Wietrzynski W, Ohnishi N, Niemeyer J, Spaniol B, Schaffer M, Rast A, Ostermeier M, Strauss M, Plitzko JM, Baumeister W, Rudack T, Sakamoto W, Nickelsen J, Schuller JM, Schroda M, Engel BD. Structural basis for VIPP1 oligomerization and maintenance of thylakoid membrane integrity. Cell 2021; 184:3643-3659.e23. [PMID: 34166613 DOI: 10.1016/j.cell.2021.05.011] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 08/03/2020] [Revised: 02/16/2021] [Accepted: 05/10/2021] [Indexed: 12/21/2022]
Abstract
Vesicle-inducing protein in plastids 1 (VIPP1) is essential for the biogenesis and maintenance of thylakoid membranes, which transform light into life. However, it is unknown how VIPP1 performs its vital membrane-remodeling functions. Here, we use cryo-electron microscopy to determine structures of cyanobacterial VIPP1 rings, revealing how VIPP1 monomers flex and interweave to form basket-like assemblies of different symmetries. Three VIPP1 monomers together coordinate a non-canonical nucleotide binding pocket on one end of the ring. Inside the ring's lumen, amphipathic helices from each monomer align to form large hydrophobic columns, enabling VIPP1 to bind and curve membranes. In vivo mutations in these hydrophobic surfaces cause extreme thylakoid swelling under high light, indicating an essential role of VIPP1 lipid binding in resisting stress-induced damage. Using cryo-correlative light and electron microscopy (cryo-CLEM), we observe oligomeric VIPP1 coats encapsulating membrane tubules within the Chlamydomonas chloroplast. Our work provides a structural foundation for understanding how VIPP1 directs thylakoid biogenesis and maintenance.
Affiliation(s)
- Tilak Kumar Gupta
- Department of Molecular Structural Biology, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany
- Sven Klumpe
- Department of Molecular Structural Biology, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany
- Karin Gries
- Molecular Biotechnology and Systems Biology, Technische Universität Kaiserslautern, 67663 Kaiserslautern, Germany
- Steffen Heinz
- Department of Molecular Plant Sciences, LMU Munich, 82152 Martinsried, Germany
- Wojciech Wietrzynski
- Department of Molecular Structural Biology, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany; Helmholtz Pioneer Campus, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Norikazu Ohnishi
- Institute of Plant Science and Resources, Okayama University, Kurashiki, Okayama 710-0046, Japan
- Justus Niemeyer
- Molecular Biotechnology and Systems Biology, Technische Universität Kaiserslautern, 67663 Kaiserslautern, Germany
- Benjamin Spaniol
- Molecular Biotechnology and Systems Biology, Technische Universität Kaiserslautern, 67663 Kaiserslautern, Germany
- Miroslava Schaffer
- Department of Molecular Structural Biology, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany
- Anna Rast
- Department of Molecular Structural Biology, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany; Department of Molecular Plant Sciences, LMU Munich, 82152 Martinsried, Germany
- Matthias Ostermeier
- Department of Molecular Plant Sciences, LMU Munich, 82152 Martinsried, Germany
- Mike Strauss
- Department of Anatomy and Cell Biology, McGill University, Montreal, QC H3A 17C, Canada
- Jürgen M Plitzko
- Department of Molecular Structural Biology, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany
- Wolfgang Baumeister
- Department of Molecular Structural Biology, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany
- Till Rudack
- Biospectroscopy, Center for Protein Diagnostics (PRODI), Ruhr University Bochum, 44801 Bochum, Germany; Department of Biophysics, Faculty of Biology & Biotechnology, Ruhr University Bochum, 44780 Bochum, Germany
- Wataru Sakamoto
- Institute of Plant Science and Resources, Okayama University, Kurashiki, Okayama 710-0046, Japan
- Jörg Nickelsen
- Department of Molecular Plant Sciences, LMU Munich, 82152 Martinsried, Germany
- Jan M Schuller
- SYNMIKRO Research Center and Department of Chemistry, Philipps-University Marburg, 35032 Marburg, Germany
- Michael Schroda
- Molecular Biotechnology and Systems Biology, Technische Universität Kaiserslautern, 67663 Kaiserslautern, Germany
- Benjamin D Engel
- Department of Molecular Structural Biology, Max Planck Institute of Biochemistry, 82152 Martinsried, Germany; Helmholtz Pioneer Campus, Helmholtz Zentrum München, 85764 Neuherberg, Germany; Department of Chemistry, Technical University of Munich, 85748 Garching, Germany

6
Sharma N, Patiyal S, Dhall A, Pande A, Arora C, Raghava GPS. AlgPred 2.0: an improved method for predicting allergenic proteins and mapping of IgE epitopes. Brief Bioinform 2020; 22:5985292. [PMID: 33201237 DOI: 10.1093/bib/bbaa294] [Citation(s) in RCA: 106] [Impact Index Per Article: 26.5] [Received: 06/21/2020] [Revised: 10/02/2020] [Accepted: 10/05/2020] [Indexed: 12/22/2022]
Abstract
AlgPred 2.0 is a web server developed for predicting allergenic proteins and allergenic regions in a protein. It is an updated version of AlgPred, which was developed in 2006. The dataset used for training, testing and validation consists of 10 075 allergens and 10 075 non-allergens. In addition, 10 451 experimentally validated immunoglobulin E (IgE) epitopes were used to identify antigenic regions in a protein. All models were trained on 80% of the data (the training dataset), and their performance was evaluated using 5-fold cross-validation. The final model trained on the training dataset was then evaluated on the remaining 20% of the data (the validation dataset); no two proteins in different sets share more than 40% similarity. First, a Basic Local Alignment Search Tool (BLAST) search was performed against the dataset, and allergens were predicted based on their level of similarity with known allergens. Second, IgE epitopes obtained from the IEDB database were searched in the dataset to predict allergens based on their presence in a protein. Third, motif-based approaches such as Multiple EM for Motif Elicitation/Motif Alignment and Search Tool (MEME/MAST) were used to predict allergens. Fourth, allergen prediction models were developed using a wide range of machine learning techniques. Finally, an ensemble approach was used to predict allergenic proteins by combining the prediction scores of the different approaches. Our best model achieved an area under the receiver operating characteristic curve (AUROC) of 0.98 and a Matthews correlation coefficient (MCC) of 0.85 on the validation dataset. The AlgPred 2.0 web server supports allergen prediction, IgE epitope mapping, motif search and BLAST search (https://webs.iiitd.edu.in/raghava/algpred2/).
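An AUROC of 0.98 can be read as a 98% chance that a randomly chosen allergen receives a higher prediction score than a randomly chosen non-allergen. The sketch below computes AUROC directly from that pairwise (Mann-Whitney) definition; the scores and the 1 = allergen, 0 = non-allergen encoding are hypothetical, and this is not the authors' evaluation code.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney U formulation). The loop is
    O(P * N), so it suits small illustrative inputs only.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical scores: two allergens (label 1), two non-allergens (label 0).
    print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # prints 0.75
```

Of the four allergen/non-allergen pairs, three are ranked correctly, giving 0.75; a perfect ranking gives 1.0 and random scores hover around 0.5.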
Affiliation(s)
- Neelam Sharma
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Sumeet Patiyal
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Akshara Pande
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Chakit Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India

7
Arora C, Kaur D, Lathwal A, Raghava GP. Risk prediction in cutaneous melanoma patients from their clinico-pathological features: superiority of clinical data over gene expression data. Heliyon 2020; 6:e04811. [PMID: 32913910 PMCID: PMC7472860 DOI: 10.1016/j.heliyon.2020.e04811] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/27/2020] [Revised: 06/19/2020] [Accepted: 08/25/2020] [Indexed: 12/26/2022]
Abstract
Risk assessment in cutaneous melanoma (CM) patients is one of the major challenges in the effective treatment of CM. Traditionally, clinico-pathological features such as Breslow thickness and American Joint Committee on Cancer (AJCC) tumor staging are used for this purpose. However, owing to advances in technology, most newly proposed risk prediction methods are based on gene-expression profiles (GEP). In this study, we set out to develop new GEP-based and clinico-pathological feature-based biomarkers and assessed their prognostic strength in comparison with existing prognostic methods. We developed risk prediction models using the expression of genes associated with different cancer-related pathways and obtained a maximum hazard ratio (HR) of 2.52 (p-value ~10⁻⁸) for the apoptotic pathway. Another model, based on a combination of apoptotic and notch pathway genes, boosted the HR to 2.57. Next, we developed models based on individual clinical features and obtained a maximum HR of 2.45 (p-value ~10⁻⁶) for Breslow thickness. We also developed models using the best features of both clinical and gene-expression data and obtained a maximum HR of 3.19 (p-value ~10⁻⁹). Finally, we developed a new ensemble method using clinical variables only and obtained a maximum HR of 6.40 (p-value ~10⁻¹⁵). Based on this method, a web server (https://webs.iiitd.edu.in/raghava/cmcrpred/) and an Android application on the Google Play Store, both named 'CMcrpred', are available to the scientific community. This study reveals that our new ensemble method, based only on clinico-pathological features, outperforms GEP-based methods as well as the currently used AJCC staging. It also highlights the need to explore the full potential of clinical variables for the prognostication of cancer patients.
Affiliation(s)
- Chakit Arora
- Department of Computational Biology, IIIT-Delhi, New Delhi, India
- Dilraj Kaur
- Department of Computational Biology, IIIT-Delhi, New Delhi, India
- Anjali Lathwal
- Department of Computational Biology, IIIT-Delhi, New Delhi, India

8
Hu X, Feng Z, Zhang X, Liu L, Wang S. The Identification of Metal Ion Ligand-Binding Residues by Adding the Reclassified Relative Solvent Accessibility. Front Genet 2020; 11:214. [PMID: 32265982 PMCID: PMC7096583 DOI: 10.3389/fgene.2020.00214] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/09/2019] [Accepted: 02/24/2020] [Indexed: 11/13/2022]
Abstract
Many proteins realize their specific functions by binding particular metal ion ligands during a cell's life cycle. The ability to correctly identify metal ion ligand-binding residues is valuable for human health and for drug design. Precisely identifying these residues, however, remains challenging. We present an improved computational approach for predicting the binding residues of 10 metal ion ligands (Zn²⁺, Cu²⁺, Fe²⁺, Fe³⁺, Co²⁺, Ca²⁺, Mg²⁺, Mn²⁺, Na⁺, and K⁺) by adding reclassified relative solvent accessibility (RSA). The best accuracy in fivefold cross-validation was higher than 77.9%, about 16% higher than the previous result on the same dataset. Different reclassifications of the RSA information were found to contribute differently to the identification of specific ligand-binding residues. Our study provides additional understanding of the effect of RSA on the identification of metal ion ligand-binding residues.
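Relative solvent accessibility is a residue's solvent-accessible surface area (SASA) divided by a reference maximum for its amino-acid type; "reclassifying" it means binning that continuous value into discrete exposure classes. The sketch below illustrates the idea. The maximum-ASA table (the theoretical values of Tien et al., 2013) and the bin thresholds are assumptions made for illustration, not the normalization or cut-offs used by Hu et al.

```python
# Theoretical maximum solvent-accessible surface areas (Å^2) per residue
# (Tien et al., 2013). Using this table is an assumption: the paper may
# normalise with a different scale.
MAX_ASA = {
    "A": 129.0, "R": 274.0, "N": 195.0, "D": 193.0, "C": 167.0,
    "E": 223.0, "Q": 225.0, "G": 104.0, "H": 224.0, "I": 197.0,
    "L": 201.0, "K": 236.0, "M": 224.0, "F": 240.0, "P": 159.0,
    "S": 155.0, "T": 172.0, "W": 285.0, "Y": 263.0, "V": 174.0,
}


def relative_solvent_accessibility(residue: str, sasa: float) -> float:
    """RSA = absolute SASA / maximum SASA for that residue type, capped at 1."""
    return min(sasa / MAX_ASA[residue], 1.0)


def reclassify(rsa: float, thresholds: tuple = (0.25, 0.5)) -> int:
    """Map an RSA value to a class index (0 = most buried).

    Each threshold the value reaches moves it up one class, so two
    thresholds give three classes. The cut-offs are hypothetical.
    """
    return sum(rsa >= t for t in sorted(thresholds))


if __name__ == "__main__":
    rsa = relative_solvent_accessibility("G", 52.0)  # 52 / 104 = 0.5
    print(rsa, reclassify(rsa), reclassify(rsa, thresholds=(0.2,)))  # prints 0.5 2 1
```

Passing different `thresholds` tuples yields different categorical encodings of the same RSA value, which mirrors the observation that different reclassifications contribute differently to prediction.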
Affiliation(s)
- Zhenxing Feng
- College of Sciences, Inner Mongolia University of Technology, Hohhot, China
- Xiaojin Zhang
- College of Sciences, Inner Mongolia University of Technology, Hohhot, China

9
Zhao J, Cao Y, Zhang L. Exploring the computational methods for protein-ligand binding site prediction. Comput Struct Biotechnol J 2020; 18:417-426. [PMID: 32140203 PMCID: PMC7049599 DOI: 10.1016/j.csbj.2020.02.008] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Received: 10/14/2019] [Revised: 01/23/2020] [Accepted: 02/11/2020] [Indexed: 12/21/2022]
Abstract
Proteins participate in various essential processes in vivo via interactions with other molecules. Identifying the residues participating in these interactions not only provides biological insights for protein function studies but also has great significance for drug discovery. Therefore, predicting protein-ligand binding sites has long been an area of intense research in bioinformatics and computer-aided drug discovery. In this review, we first introduce the research background of protein-ligand binding site prediction and then classify the methods into four categories, namely 3D structure-based, template similarity-based, traditional machine learning-based and deep learning-based methods. We describe representative algorithms in each category and elaborate on machine learning and deep learning-based prediction methods in more detail. Finally, we discuss trends and challenges in current research, such as the prediction of cryptic binding sites based on molecular dynamics simulation, and highlight prospective directions for the near future.
Affiliation(s)
- Jingtian Zhao
- College of Computer Science, Sichuan University, Chengdu 610065, China
- Yang Cao
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Le Zhang
- College of Computer Science, Sichuan University, Chengdu 610065, China

10
Le NQK, Ho QT, Ou YY. Using two-dimensional convolutional neural networks for identifying GTP binding sites in Rab proteins. J Bioinform Comput Biol 2020; 17:1950005. [PMID: 30866734 DOI: 10.1142/s0219720019500057] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/05/2023]
Abstract
Deep learning has been increasingly and widely used to solve numerous problems in various fields with state-of-the-art performance. It can also be applied in bioinformatics to reduce the need for manual feature extraction and to achieve high performance. This study uses deep learning to predict GTP binding sites in Rab proteins; GTP binding is one of the most vital molecular functions in life science. A functional loss of GTP binding sites in Rab proteins has been implicated in a variety of human diseases (choroideremia, intellectual disability, cancer, Parkinson's disease). Therefore, a precise model for identifying these sites is crucial for understanding these diseases and for identifying drug targets. Our deep learning model, combining a two-dimensional convolutional neural network with position-specific scoring matrix profiles, identified GTP binding residues in the independent dataset with a sensitivity of 92.3%, specificity of 99.8%, accuracy of 99.5%, and MCC of 0.92. Compared with other published works, this approach achieved a significant improvement. Through this study, we provide an effective model for predicting GTP binding sites in Rab proteins and a basis for further research applying deep learning in bioinformatics, especially in nucleotide binding site prediction.
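All four reported figures derive from a confusion matrix of binding (positive) and non-binding (negative) residues. Because non-binding residues vastly outnumber binding ones, accuracy largely tracks specificity, which is why MCC is reported alongside it. The sketch below computes the four metrics from hypothetical counts; it is not the authors' code.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Residue-level metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall on binding residues
    specificity = tn / (tn + fp)   # recall on non-binding residues
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical, heavily imbalanced counts (not from the paper).
    print(binary_metrics(tp=9, fp=10, tn=990, fn=1))
```

With these counts sensitivity is 0.9 and specificity 0.99, accuracy is about 0.989, yet MCC is only about 0.65: the ten false positives barely move accuracy but weigh heavily against the small number of true binding residues.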
Affiliation(s)
- Nguyen Quoc Khanh Le
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan 32003, R.O.C.; School of Humanities, Nanyang Technological University, 48 Nanyang Ave, Singapore 639798, Singapore
- Quang-Thai Ho
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan 32003, R.O.C.
- Yu-Yen Ou
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan 32003, R.O.C.

11
Agrawal P, Mishra G, Raghava GPS. SAMbinder: A Web Server for Predicting S-Adenosyl-L-Methionine Binding Residues of a Protein From Its Amino Acid Sequence. Front Pharmacol 2020; 10:1690. [PMID: 32082172 PMCID: PMC7002541 DOI: 10.3389/fphar.2019.01690] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/13/2019] [Accepted: 12/24/2019] [Indexed: 11/13/2022]
Abstract
MOTIVATION S-adenosyl-L-methionine (SAM) is an essential cofactor in biological systems and plays a key role in many diseases. A method for predicting SAM binding sites in a protein is needed for designing drugs against SAM-associated diseases. To the best of our knowledge, no existing method can predict the binding site of SAM in a given protein sequence. RESULT This manuscript describes SAMbinder, a method developed for predicting SAM-interacting residues in a protein from its primary sequence. All models were trained, tested, and evaluated on 145 SAM binding protein chains where no two chains have more than 40% sequence similarity. First, models were developed using different machine learning techniques on a balanced data set containing 2,188 SAM-interacting residues and an equal number of non-interacting residues. Our random forest-based model developed using binary profile features achieved a maximum Matthews Correlation Coefficient (MCC) of 0.42 with an area under the receiver operating characteristic curve (AUROC) of 0.79 on the validation data set. The performance of our models improved significantly, from MCC 0.42 to 0.61, when evolutionary information in the form of the position-specific scoring matrix (PSSM) profile was used as a feature. We also developed models on a realistic data set containing 2,188 SAM-interacting and 40,029 non-interacting residues and obtained a maximum MCC of 0.61 with an AUROC of 0.89. To evaluate the performance of our models, we used both internal and external cross-validation techniques. AVAILABILITY AND IMPLEMENTATION https://webs.iiitd.edu.in/raghava/sambinder/.
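A binary profile encodes each residue in a sliding window centred on the target residue as a one-hot vector over the 20 standard amino acids; concatenating the vectors gives one fixed-length feature vector per residue. The sketch below assumes a default window of 17 and an extra 21st slot for positions that fall off the sequence ends or hold non-standard residues; both choices are illustrative, not SAMbinder's documented settings.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_profile(sequence: str, center: int, window: int = 17) -> list[int]:
    """One-hot encode the window of residues centred on `center`.

    Each position contributes 21 bits: one per standard amino acid plus a
    padding/unknown slot (an illustrative assumption).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a centre residue")
    half = window // 2
    features = []
    for i in range(center - half, center + half + 1):
        bits = [0] * (len(AMINO_ACIDS) + 1)
        if 0 <= i < len(sequence) and sequence[i] in AMINO_ACIDS:
            bits[AMINO_ACIDS.index(sequence[i])] = 1
        else:
            bits[-1] = 1  # off the sequence end, or non-standard residue
        features.extend(bits)
    return features


if __name__ == "__main__":
    vector = binary_profile("MKVLAG", center=0, window=3)
    print(len(vector))  # prints 63 (3 positions x 21 bits)
```

Each residue's vector, labelled as SAM-interacting or non-interacting, would then serve as one training example for a classifier such as the random forest mentioned above.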
Affiliation(s)
- Piyush Agrawal
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Bioinformatics Center, CSIR-Institute of Microbial Technology, Chandigarh, India
- Gaurav Mishra
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Department of Electrical Engineering, Shiv Nadar University, Greater Noida, India
- Gajendra P. S. Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India

12
Kaur D, Arora C, Raghava GPS. A Hybrid Model for Predicting Pattern Recognition Receptors Using Evolutionary Information. Front Immunol 2020; 11:71. [PMID: 32082326 PMCID: PMC7002473 DOI: 10.3389/fimmu.2020.00071] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/18/2019] [Accepted: 01/13/2020] [Indexed: 12/17/2022]
Abstract
This study describes a method developed for predicting pattern recognition receptors (PRRs), which are an integral part of the immune system. The models developed here were trained and evaluated on the largest possible non-redundant sets of PRRs, obtained from PRRDB 2.0, and of non-pattern recognition receptors (Non-PRRs), obtained from Swiss-Prot. First, a similarity-based approach using BLAST was used to predict PRRs, with limited success because many queries returned no hits. Second, machine learning-based models were developed using sequence composition and achieved a maximum MCC of 0.63. In addition, models were developed using evolutionary information in the form of PSSM composition and achieved a maximum MCC of 0.66. Finally, we developed hybrid models that combine the BLAST similarity-based approach with machine learning-based models. Our best model, which combined BLAST with a PSSM-based model, achieved a maximum MCC of 0.82 with an AUROC of 0.95, exploiting the strengths of both similarity-based search and machine learning. To support the scientific community, we also developed a web server, "PRRpred", based on the best model from this study (http://webs.iiitd.edu.in/raghava/prrpred/).
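The abstract does not spell out how BLAST and the PSSM-based model were combined. One common way to build such a hybrid, sketched here purely as an assumption, is to shift the machine-learning score up or down by a fixed weight when BLAST finds a hit to a known positive or negative, and to fall back on the model alone when BLAST finds nothing, which sidesteps the no-hit problem described above. The weight and threshold values are hypothetical.

```python
from typing import Optional


def hybrid_score(ml_score: float, blast_hit: Optional[str], weight: float = 0.5) -> float:
    """Adjust a machine-learning score using a BLAST similarity search.

    blast_hit is "positive" if the best hit is a known PRR, "negative" if it
    is a known non-PRR, or None when BLAST finds no hit (the ML score is then
    used unchanged). The fixed weight is a hypothetical choice.
    """
    if blast_hit == "positive":
        return ml_score + weight
    if blast_hit == "negative":
        return ml_score - weight
    return ml_score


def is_prr(ml_score: float, blast_hit: Optional[str], threshold: float = 0.5) -> bool:
    """Classify as PRR when the hybrid score reaches a (hypothetical) threshold."""
    return hybrid_score(ml_score, blast_hit) >= threshold


if __name__ == "__main__":
    print(is_prr(0.3, "positive"), is_prr(0.3, None), is_prr(0.7, "negative"))
    # prints True False False
```

The design lets BLAST override the model only when it has evidence, while every query still receives a prediction from the machine-learning component.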
Affiliation(s)
- Dilraj Kaur
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Chakit Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India

13
Patiyal S, Agrawal P, Kumar V, Dhall A, Kumar R, Mishra G, Raghava GP. NAGbinder: An approach for identifying N-acetylglucosamine interacting residues of a protein from its primary sequence. Protein Sci 2020; 29:201-210. [PMID: 31654438 PMCID: PMC6933864 DOI: 10.1002/pro.3761] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 07/29/2019] [Revised: 10/24/2019] [Accepted: 10/24/2019] [Indexed: 12/14/2022]
Abstract
N-acetylglucosamine (NAG) is one of the eight essential saccharides required to maintain optimal health and the precise functioning of systems ranging from bacteria to humans. In the present study, we developed NAGbinder, a method that predicts the NAG-interacting residues in a protein from its primary sequence information. We extracted 231 NAG-interacting nonredundant protein chains from the Protein Data Bank, where no two sequences share more than 40% sequence identity. All prediction models were trained, validated, and evaluated on these 231 protein chains. First, prediction models were developed on balanced data consisting of 1,335 NAG-interacting residues and an equal number of noninteracting residues, using various window sizes. A Random Forest model using binary profiles with a window size of 9 performed best among all models for identifying NAG-interacting residues. It achieved the highest Matthews Correlation Coefficient (MCC) of 0.31 and 0.25, and Area Under Receiver Operating Curve (AUROC) of 0.73 and 0.70 on the training and validation data sets, respectively. We also developed prediction models on a realistic data set (1,335 NAG-interacting and 47,198 noninteracting residues) using the same approach, where the model achieved MCC of 0.26 and 0.27, and AUROC of 0.70 and 0.71, on the training and validation data sets, respectively. The practical success of our method can be gauged as follows: if a sequence of 1,000 amino acids is analyzed with our approach, 10 residues will be predicted as NAG-interacting, of which five are correct. The best models were incorporated into the standalone version and the web server available at https://webs.iiitd.edu.in/raghava/nagbinder/.
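The 1,000-residue illustration implies a precision of 5/10 = 0.5. Comparing that with the share of NAG-interacting residues in the realistic data set (1,335 of 48,533, about 2.8%) gives a rough fold enrichment over random guessing. Treating that share as the background rate for an arbitrary protein is an assumption made only for this back-of-the-envelope calculation.

```python
def enrichment_over_background(correct: int, predicted: int,
                               n_interacting: int, n_noninteracting: int):
    """Return (precision, background rate, fold enrichment)."""
    precision = correct / predicted
    background = n_interacting / (n_interacting + n_noninteracting)
    return precision, background, precision / background


if __name__ == "__main__":
    # Counts quoted in the NAGbinder abstract.
    precision, background, fold = enrichment_over_background(5, 10, 1335, 47198)
    print(f"precision={precision:.2f} background={background:.4f} fold={fold:.1f}")
    # prints precision=0.50 background=0.0275 fold=18.2
```

Under that assumption, a residue flagged as NAG-interacting is roughly 18 times more likely to be a true binder than a residue picked at random.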
Affiliation(s)
- Sumeet Patiyal
- Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, India
- Piyush Agrawal
- Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, India
- Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Vinod Kumar
- Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, India
- Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Anjali Dhall
- Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, India
- Rajesh Kumar
- Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, India
- Bioinformatics Centre, CSIR-Institute of Microbial Technology, Chandigarh, India
- Gaurav Mishra
- Department of Electrical Engineering, Shiv Nadar University, Greater Noida, Gautam Buddha Nagar, India
- Gajendra P.S. Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Delhi, India
14
Bagchi A. Latest trends in structure based drug design with protein targets. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2019; 121:1-23. [PMID: 32312418 DOI: 10.1016/bs.apcsb.2019.11.008] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
Structure-based drug design is an important endeavor in the field of structural bioinformatics. Previously, the entire process depended on wet-lab experiments to build libraries of ligand molecules, which were then tested to determine their binding efficacies with the protein target. However, this process is very lengthy and, above all, highly expensive. With the advent of supercomputers and increasing computational power, the search for suitable ligand molecules against target proteins has become more streamlined and cost-effective. Now the entire ligand search process is performed in silico with the help of virtual screening, molecular docking simulations and molecular dynamics studies. In the present chapter, a brief overview of the computational techniques involved in structure-based drug design is presented, with special emphasis on the thermodynamic principles behind molecular interactions.
Affiliation(s)
- Angshuman Bagchi
- Department of Biochemistry and Biophysics, University of Kalyani, Kalyani, West Bengal, India
15
Hu X, Ge R, Feng Z. Recognizing five molecular ligand-binding sites with similar chemical structure. J Comput Chem 2019; 41:110-118. [PMID: 31642535 DOI: 10.1002/jcc.26077] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/20/2019] [Revised: 08/27/2019] [Accepted: 08/31/2019] [Indexed: 02/03/2023]
Abstract
Accurate identification of ligand-binding sites and elucidation of the protein-ligand interaction mechanism are important for understanding protein function and designing new drugs. Meanwhile, accurate computational prediction and mechanism research are two grand challenges in proteomics. In this article, the ligand-binding residues of five ligands (ATP, ADP, GTP, GDP, and NAD) are predicted as a group, owing to their similar chemical structures and closely related biological functions. The data set of binding sites for the five ligands is collated from the BioLiP database. Then, five features, namely increment of diversity value, matrix scoring value, auto-covariance, secondary structure information, and surface accessibility information, are used for binding-site prediction. A support vector machine (SVM) model is trained with these five features to predict ligand-binding sites, and the prediction results are tested by fivefold cross-validation. The accuracies (Acc) for the five ligands (ATP, ADP, GTP, GDP, and NAD) are 77.4%, 71.2%, 82.1%, 82.9%, and 85.3%, respectively, and the corresponding Matthews correlation coefficients (MCC) are 0.549, 0.424, 0.643, 0.659, and 0.702. The results show that for ligands with similar chemical structures, the microenvironments of their binding sites and their sensitivities to features are similar, while differences in their ligand-binding properties still exist. © 2019 Wiley Periodicals, Inc.
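Hu et al. report accuracy and the Matthews correlation coefficient (MCC) for each ligand, and several other entries in this list use the same metrics. The sketch below computes both from confusion-matrix counts with the standard MCC formula; the function name and all counts are hypothetical. The second call illustrates why MCC is favored on imbalanced residue data: accuracy stays near 99% even though only half of the rare binding residues are recovered.

```python
import math


def accuracy_and_mcc(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Accuracy and Matthews correlation coefficient from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("empty confusion matrix")
    accuracy = (tp + tn) / total
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is reported as 0 when any marginal count is zero.
    mcc = 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator
    return accuracy, mcc


# Hypothetical counts for illustration only (not taken from any paper above).
acc, mcc = accuracy_and_mcc(tp=80, fp=20, tn=80, fn=20)
print(f"Acc = {acc:.1%}, MCC = {mcc:.3f}")  # Acc = 80.0%, MCC = 0.600

# Imbalanced case: few binding residues among many non-binding ones.
acc2, mcc2 = accuracy_and_mcc(tp=5, fp=5, tn=985, fn=5)
print(f"Acc = {acc2:.1%}, MCC = {mcc2:.3f}")  # Acc = 99.0%, MCC = 0.495
```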
Affiliation(s)
- Xiuzhen Hu
- Departments of Physics, College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- Riletu Ge
- Departments of Physics, College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- Zhenxing Feng
- Departments of Mathematics, College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
16
Bao Y, Marini S, Tamura T, Kamada M, Maegawa S, Hosokawa H, Song J, Akutsu T. Toward more accurate prediction of caspase cleavage sites: a comprehensive review of current methods, tools and features. Brief Bioinform 2019; 20:1669-1684. [PMID: 29860277 PMCID: PMC6917222 DOI: 10.1093/bib/bby041] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 02/26/2018] [Revised: 04/16/2018] [Indexed: 12/20/2022] Open
Abstract
As one of the few irreversible protein posttranslational modifications, proteolytic cleavage is involved in nearly all aspects of cellular activities, ranging from gene regulation to cell life-cycle regulation. Among the various protease-specific types of proteolytic cleavage, cleavages by caspases/granzyme B are considered essential in the initiation and execution of programmed cell death and inflammation processes. Although a number of substrates for both types of proteolytic cleavage have been experimentally identified, the complete repertoire of caspases and granzyme B substrates remains to be fully characterized. To tackle this issue and complement experimental efforts for substrate identification, systematic bioinformatics studies of known cleavage sites provide important insights into caspase/granzyme B substrate specificity, and facilitate the discovery of novel substrates. In this article, we review and benchmark 12 state-of-the-art sequence-based bioinformatics approaches and tools for caspases/granzyme B cleavage prediction. We evaluate and compare these methods in terms of their input/output, algorithms used, prediction performance, validation methods and software availability and utility. In addition, we construct independent data sets consisting of caspases/granzyme B substrates from different species and accordingly assess the predictive power of these different predictors for the identification of cleavage sites. We find that the prediction results are highly variable among different predictors. Furthermore, we experimentally validate the predictions of a case study by performing a caspase cleavage assay. We anticipate that this comprehensive review and survey analysis will provide an insightful resource for biologists and bioinformaticians who are interested in using and/or developing tools for caspase/granzyme B cleavage prediction.
Affiliation(s)
- Yu Bao
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Simone Marini
- Department of Computational Medicine and Bioinformatics, University of Michigan, 1241 E. Catherine St., 5940 Buhl, Ann Arbor 48109-5618, USA
- Takeyuki Tamura
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Mayumi Kamada
- Graduate School of Medicine, Kyoto University, Sakyo-ku, Kyoto 606-8507, Japan
- Shingo Maegawa
- Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
- Hiroshi Hosokawa
- Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan
- Jiangning Song
- Monash Biomedicine Discovery Institute, Monash Centre for Data Science and ARC Centre of Excellence in Advance Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
17
Gal A, Balicza P, Weaver D, Naghdi S, Joseph SK, Várnai P, Gyuris T, Horváth A, Nagy L, Seifert EL, Molnar MJ, Hajnóczky G. MSTO1 is a cytoplasmic pro-mitochondrial fusion protein, whose mutation induces myopathy and ataxia in humans. EMBO Mol Med 2018; 9:967-984. [PMID: 28554942 PMCID: PMC5494519 DOI: 10.15252/emmm.201607058] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Indexed: 02/06/2023] Open
Abstract
The protein MSTO1 has been localized to mitochondria and linked to mitochondrial morphology, but its specific role has remained unclear. We identified a c.22G > A (p.Val8Met) mutation of MSTO1 in patients with minor physical abnormalities, myopathy, ataxia, and neurodevelopmental impairments. Lactate stress test and myopathological results suggest mitochondrial dysfunction. In patient fibroblasts, MSTO1 mRNA and protein abundance are decreased, mitochondria display fragmentation, aggregation, and decreased network continuity and fusion activity. These characteristics can be reversed by genetic rescue. Short‐term silencing of MSTO1 in HeLa cells reproduced the impairment of mitochondrial morphology and dynamics observed in the fibroblasts without damaging bioenergetics. At variance with a previous report, we find MSTO1 to be localized in the cytoplasmic area with limited colocalization with mitochondria. MSTO1 interacts with the fusion machinery as a soluble factor at the cytoplasm‐mitochondrial outer membrane interface. After plasma membrane permeabilization, MSTO1 is released from the cells. Thus, an MSTO1 loss‐of‐function mutation is associated with a human disorder showing mitochondrial involvement. MSTO1 likely has a physiologically relevant role in mitochondrial morphogenesis by supporting mitochondrial fusion.
Affiliation(s)
- Aniko Gal
- MitoCare Center for Mitochondrial Imaging Research and Diagnostics, Department of Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, PA, USA
- Institute of Genomic Medicine and Rare Disorders, Semmelweis University, Budapest, Hungary
- Peter Balicza
- Institute of Genomic Medicine and Rare Disorders, Semmelweis University, Budapest, Hungary
- David Weaver
- MitoCare Center for Mitochondrial Imaging Research and Diagnostics, Department of Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, PA, USA
- Shamim Naghdi
- MitoCare Center for Mitochondrial Imaging Research and Diagnostics, Department of Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, PA, USA
- Suresh K Joseph
- MitoCare Center for Mitochondrial Imaging Research and Diagnostics, Department of Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, PA, USA
- Péter Várnai
- Department of Physiology, Semmelweis University, Budapest, Hungary
- Tibor Gyuris
- Department of Biochemistry and Molecular Biology, University of Debrecen, Debrecen, Hungary
- Attila Horváth
- Department of Biochemistry and Molecular Biology, University of Debrecen, Debrecen, Hungary
- Laszlo Nagy
- Department of Biochemistry and Molecular Biology, University of Debrecen, Debrecen, Hungary
- Erin L Seifert
- MitoCare Center for Mitochondrial Imaging Research and Diagnostics, Department of Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, PA, USA
- Maria Judit Molnar
- Institute of Genomic Medicine and Rare Disorders, Semmelweis University, Budapest, Hungary
- György Hajnóczky
- MitoCare Center for Mitochondrial Imaging Research and Diagnostics, Department of Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, PA, USA
18
Cao X, Hu X, Zhang X, Gao S, Ding C, Feng Y, Bao W. Identification of metal ion binding sites based on amino acid sequences. PLoS One 2017; 12:e0183756. [PMID: 28854211 PMCID: PMC5576659 DOI: 10.1371/journal.pone.0183756] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 05/13/2017] [Accepted: 08/10/2017] [Indexed: 11/26/2022] Open
Abstract
The identification of metal ion binding sites is important for protein function annotation and the design of new drug molecules. This study presents an effective method of analyzing and identifying the binding residues of metal ions based solely on sequence information. Ten metal ions were extracted from the BioLip database: Zn2+, Cu2+, Fe2+, Fe3+, Ca2+, Mg2+, Mn2+, Na+, K+ and Co2+. The analysis showed that Zn2+, Cu2+, Fe2+, Fe3+, and Co2+ were sensitive to the conservation of amino acids at binding sites, and promising results can be achieved using the Position Weight Scoring Matrix algorithm, with an accuracy of over 79.9% and a Matthews correlation coefficient of over 0.6. The binding sites of other metals can also be accurately identified using the Support Vector Machine algorithm with multifeature parameters as input. In addition, we found that Ca2+ was insensitive to hydrophobicity and hydrophilicity information and Mn2+ was insensitive to polarization charge information. An online server was constructed based on the framework of the proposed method and is freely available at http://60.31.198.140:8081/metal/HomePage/HomePage.html.
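Cao et al. score Zn2+, Cu2+, Fe2+, Fe3+ and Co2+ binding sites with a Position Weight Scoring Matrix. The abstract gives no construction details, so the following is only a generic log-odds position weight matrix sketch, assuming a uniform 1/20 background and a pseudocount of 1; `build_pwm`, `score_window` and the toy windows are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(windows: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log2-odds position weight matrix from equal-length windows around binding residues.

    Assumes a uniform background frequency of 1/20 for every amino acid.
    """
    if not windows:
        raise ValueError("need at least one window")
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pwm: list[dict[str, float]] = []
    for pos in range(length):
        counts = Counter(w[pos] for w in windows)
        # Only standard residues are counted; pseudocounts avoid log(0).
        total = sum(counts[aa] for aa in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        pwm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm


def score_window(pwm: list[dict[str, float]], window: str) -> float:
    """Sum of per-position log-odds; unknown residues contribute 0."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pwm, window))


# Toy windows (hypothetical) centred on metal-binding residues.
pwm = build_pwm(["HCH", "HCH", "HCE"])
print(score_window(pwm, "HCH") > score_window(pwm, "AAA"))  # True
```

A higher score means a window resembles the aligned binding-site windows more closely; a cutoff on this score would then separate predicted binding from non-binding residues.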
Affiliation(s)
- Xiaoyong Cao
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- Xiuzhen Hu
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- Xiaojin Zhang
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- Sujuan Gao
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- College of Sciences, Inner Mongolia Agricultural University, Hohhot, 010021, China
- Changjiang Ding
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
- Yonge Feng
- College of Sciences, Inner Mongolia Agricultural University, Hohhot, 010021, China
- Weihua Bao
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051, China
19
Le NQK, Ou YY. Incorporating efficient radial basis function networks and significant amino acid pairs for predicting GTP binding sites in transport proteins. BMC Bioinformatics 2016; 17:501. [PMID: 28155651 PMCID: PMC5259906 DOI: 10.1186/s12859-016-1369-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 11/13/2022] Open
Abstract
Background Guanine nucleotide-binding proteins (G-proteins) are known as molecular switches inside cells and are very important in transmitting signals from outside to inside the cell. In transport proteins especially, most G-proteins play an important role in membrane trafficking, which is necessary for transferring proteins and other molecules to a variety of destinations outside and inside the cell. Membrane trafficking is controlled by G-proteins via guanosine triphosphate (GTP) binding sites. Activated through their GTP binding sites, G-proteins are recruited to membrane vesicles by interacting with specific effector proteins. Without the interaction at GTP binding sites, G-proteins cannot become active in membrane trafficking, which can consequently cause many diseases, e.g., cancer, Parkinson's disease… Thus, it is very important to identify GTP binding sites in membrane trafficking in particular, and in transport proteins in general. Results We developed the proposed model using cross-validation and examined it with an independent data set. We achieved an accuracy of 95.6% in cross-validation and 98.7% on the independent data set. For newly discovered transport protein sequences, our approach performed remarkably better than similar methods such as GTPBinder, NsitePred and TargetSOS. Moreover, a user-friendly web server was developed for identifying GTP binding sites in transport proteins and is available to all users. Conclusions We developed a computational technique using PSSM profiles and significant amino acid pairs (SAAPs) for identifying GTP binding residues in transport proteins. Including SAAPs in the PSSM profiles significantly improved predictive performance on all measurement metrics. Furthermore, the proposed method could be a powerful tool for identifying GTP binding sites in new transport proteins and can provide useful information for biologists.
Affiliation(s)
- Nguyen-Quoc-Khanh Le
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan.
- Yu-Yen Ou
- Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, Taiwan.
20
Hu X, Wang K, Dong Q. Protein ligand-specific binding residue predictions by an ensemble classifier. BMC Bioinformatics 2016; 17:470. [PMID: 27855637 PMCID: PMC5114821 DOI: 10.1186/s12859-016-1348-3] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 08/02/2016] [Accepted: 11/10/2016] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Prediction of ligand binding sites is important to elucidate protein functions and is helpful for drug design. Although much progress has been made, many challenges still need to be addressed. Prediction methods need to be carefully developed to account for chemical and structural differences between ligands. RESULTS In this study, we present ligand-specific methods to predict the binding sites of protein-ligand interactions. First, a sequence-based method is proposed that only extracts features from protein sequence information, including evolutionary conservation scores and predicted structure properties. An improved AdaBoost algorithm is applied to address the serious imbalance problem between the binding and non-binding residues. Then, a combined method is proposed that combines the current template-free method and four other well-established template-based methods. Both methods predict the ligand binding sites along the sequences using a ligand-specific strategy that covers metal ions, acid radical ions, nucleotides and ferroheme. Testing on a well-established dataset showed that the proposed sequence-based method outperformed the profile-based method by 4-19% in terms of the Matthews correlation coefficient on different ligands. The combined method outperformed each of the individual methods, with an improvement in the average Matthews correlation coefficient of 5.55% over all ligands. The results also show that the ligand-specific methods significantly outperform the general-purpose methods, which confirms the necessity of developing elaborate ligand-specific methods for ligand binding site prediction. CONCLUSIONS Two efficient ligand-specific binding site predictors are presented. The standalone package is freely available for academic use at http://dase.ecnu.edu.cn/qwdong/TargetCom/TargetCom_standalone.tar.gz or upon request from the corresponding author.
Affiliation(s)
- Xiuzhen Hu
- College of Sciences, Inner Mongolia University of Technology, Hohhot, 010051 People’s Republic of China
- Kai Wang
- College of Animal Science and Technology, Jilin Agricultural University, Changchun, 130118 People’s Republic of China
- Qiwen Dong
- Institute for Data Science and Engineering, East China Normal University, Shanghai, 200062 People’s Republic of China
- Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055 People’s Republic of China
- Present Address: School of Computer Science and Software Engineering, East China Normal University, #3663, North Zhongshan RD, Shanghai, 200062 China
21
Fang C, Noguchi T, Yamana H. Analysis of evolutionary conservation patterns and their influence on identifying protein functional sites. J Bioinform Comput Biol 2015; 12:1440003. [PMID: 25362840 DOI: 10.1142/s0219720014400034] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Evolutionary conservation information contained in position-specific scoring matrices (PSSMs) has been widely adopted by sequence-based methods for identifying protein functional sites, because all functional sites, whether in ordered or disordered proteins, are found to be conserved to some extent. However, different functional sites have different conservation patterns: some are conserved in a linear, contextual manner, some are interspersed with highly variable residues, and others appear to be conserved independently. Each value in a PSSM is calculated independently of the others, without carrying the contextual information of residues in the sequence. Therefore, using the raw PSSM output for prediction fails to consider the relationship between the conservation patterns of residues and the distribution of conservation scores in PSSMs. To demonstrate the importance of combining PSSMs with the specific conservation patterns of functional sites for prediction, three different PSSM-based methods for identifying three kinds of functional sites were analyzed. The results suggest that different PSSM-based methods differ in their capability to identify different patterns of functional sites, and that better combining PSSMs with the specific conservation patterns of residues would greatly facilitate prediction.
Affiliation(s)
- Chun Fang
- Department of Computer Science and Engineering, Shandong University of Technology, Shandong 255049, P. R. China
22
Predicting flavin and nicotinamide adenine dinucleotide-binding sites in proteins using the fragment transformation method. BIOMED RESEARCH INTERNATIONAL 2015; 2015:402536. [PMID: 26000290 PMCID: PMC4426894 DOI: 10.1155/2015/402536] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/16/2014] [Accepted: 07/21/2014] [Indexed: 11/18/2022]
Abstract
We developed a computational method to identify NAD- and FAD-binding sites in proteins. First, we extracted from the Protein Data Bank the structures of proteins that bind at least one of these ligands. NAD-/FAD-binding residue templates were then constructed by identifying binding residues through the ligand-binding database BioLiP. The fragment transformation method was used to identify structures within query proteins that resembled the ligand-binding templates. By comparing residue types and their relative spatial positions, potential binding sites were identified and a ligand-binding potential was calculated for each residue. With the false positive rate set at 5%, our method predicted NAD- and FAD-binding sites at true positive rates of 67.1% and 68.4%, respectively. Our method provides excellent results for identifying FAD- and NAD-binding sites in proteins; most importantly, it verifies that FAD- and NAD-binding sites require conserved residue types and local structures.
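This abstract reports true positive rates at a false positive rate fixed at 5%. A sketch of how such an operating point can be read from per-residue binding potentials follows, assuming higher scores mean more likely binding; the function name and toy data are hypothetical, and this is illustrative rather than the authors' evaluation code.

```python
def tpr_at_fpr(scores: list[float], labels: list[int], max_fpr: float = 0.05) -> tuple[float, float]:
    """Lowest score threshold whose false positive rate stays <= max_fpr, and its TPR.

    Residues with score >= threshold are called binding; labels are 1 (binding) or 0.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need both binding and non-binding residues")
    best = (float("inf"), 0.0)  # threshold above every score: nothing predicted
    # Lowering the threshold can only add predictions, so FPR never decreases.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / negatives > max_fpr:
            break
        best = (threshold, tp / positives)
    return best


# Hypothetical per-residue binding potentials and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
threshold, tpr = tpr_at_fpr(scores, labels)
print(threshold, round(tpr, 3))  # 0.8 0.667
```

Because the scan stops at the first threshold that admits too many false positives, the returned TPR is the best one achievable without exceeding the chosen FPR.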
23
Hu J, He X, Yu DJ, Yang XB, Yang JY, Shen HB. A new supervised over-sampling algorithm with application to protein-nucleotide binding residue prediction. PLoS One 2014; 9:e107676. [PMID: 25229688 PMCID: PMC4168127 DOI: 10.1371/journal.pone.0107676] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 05/09/2014] [Accepted: 08/09/2014] [Indexed: 12/21/2022] Open
Abstract
Protein-nucleotide interactions are ubiquitous in a wide variety of biological processes. Accurately identifying interaction residues solely from protein sequences is useful for both protein function annotation and drug design, especially in the post-genomic era, as large volumes of protein data have not been functionally annotated. Protein-nucleotide binding residue prediction is a typical imbalanced learning problem, in which binding residues are far fewer in number than non-binding residues. Alleviating the severity of class imbalance has been demonstrated to be a promising means of improving the prediction performance of machine-learning-based predictors for class imbalance problems. However, little attention has been paid to the negative impact of class imbalance on protein-nucleotide binding residue prediction. In this study, we propose a new supervised over-sampling algorithm that synthesizes additional minority class samples to address class imbalance. The experimental results from protein-nucleotide interaction datasets demonstrate that the proposed supervised over-sampling algorithm can relieve the severity of class imbalance and help to improve prediction performance. Based on the proposed over-sampling algorithm, a predictor, called TargetSOS, is implemented for protein-nucleotide binding residue prediction. Cross-validation tests and independent validation tests demonstrate the effectiveness of TargetSOS. The web server and datasets used in this study are freely available at http://www.csbio.sjtu.edu.cn/bioinf/TargetSOS/.
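TargetSOS is built on a new supervised over-sampling algorithm whose details are not given in the abstract. For orientation only, the sketch below shows the classic SMOTE-style interpolation, a widely used baseline for the same class-imbalance problem; it is not the TargetSOS algorithm, and the function name and the `k` and `seed` parameters are assumptions.

```python
import random


def smote_like_oversample(
    minority: list[list[float]], n_new: int, k: int = 5, seed: int = 0
) -> list[list[float]]:
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (squared Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic: list[list[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [s for j, s in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda s: sq_dist(base, s))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


# Toy feature vectors for three binding (minority-class) residues.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_samples = smote_like_oversample(minority, n_new=10, k=2)
print(len(new_samples))  # 10
```

Appending the synthetic vectors to the binding-residue class shifts the class ratio before training; a supervised scheme such as the one the abstract proposes would additionally use classifier feedback to decide which samples to generate.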
Affiliation(s)
- Jun Hu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Xue He
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Changshu Institute, Nanjing University of Science and Technology, Changshu, Jiangsu, China
- Xi-Bei Yang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China
- Jing-Yu Yang
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
- Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China
24
Fang C, Noguchi T, Yamana H. Simplified sequence-based method for ATP-binding prediction using contextual local evolutionary conservation. Algorithms Mol Biol 2014; 9:7. [PMID: 24618258 PMCID: PMC3995811 DOI: 10.1186/1748-7188-9-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 10/31/2013] [Accepted: 03/05/2014] [Indexed: 12/23/2022] Open
Abstract
Background Identifying ligand-binding sites is a key step in annotating protein functions and finding applications in drug design. Currently, many sequence-based methods combine various predicted results from other classifiers, such as predicted secondary structure, predicted solvent accessibility and predicted disorder probabilities, with the position-specific scoring matrix (PSSM) as input for binding-site prediction. These predicted features not only easily result in a high-dimensional feature space, but also greatly increase the complexity of the algorithms. Moreover, the performance of these predictors is also largely influenced by the other classifiers. Results To verify that conservation is the most powerful attribute for identifying ligand-binding sites, and to show the importance of revising the PSSM to match the detailed conservation pattern of functional sites in prediction, we analyzed the adenosine-5'-triphosphate (ATP) ligand as an example and proposed a simple method for ATP-binding site prediction, named CLCLpred (Contextual Local evolutionary Conservation-based method for Ligand-binding prediction). Our method uses no predicted results from other classifiers as input; all features are extracted from the PSSM only. We tested our method on two separate data sets. Experimental results showed that, compared with nine other existing methods on the same data sets, our method achieved the best performance. Conclusions This study demonstrates that: 1) exploiting the signal from the detailed conservation pattern of residues will greatly facilitate the prediction of protein functional sites; and 2) local evolutionary conservation enables accurate prediction of ATP-binding sites directly from protein sequence.
25
Chauhan JS, Rao A, Raghava GPS. In silico platform for prediction of N-, O- and C-glycosites in eukaryotic protein sequences. PLoS One 2013; 8:e67008. [PMID: 23840574 PMCID: PMC3695939 DOI: 10.1371/journal.pone.0067008] [Citation(s) in RCA: 156] [Impact Index Per Article: 14.2] [Received: 10/11/2012] [Accepted: 05/17/2013] [Indexed: 11/19/2022] Open
Abstract
Glycosylation is one of the most abundant and important post-translational modifications of proteins. Glycosylated proteins (glycoproteins) are involved in various cellular biological functions such as protein folding, cell-cell interactions, cell recognition and host-pathogen interactions. A large number of eukaryotic glycoproteins also have therapeutic and potential technological applications. Therefore, characterization and analysis of glycosites (glycosylated residues) in these proteins is of great interest to biologists. To cater to these needs, a number of in silico tools have been developed over the years; however, better prediction tools are still needed. Therefore, in this study we have developed a new web server, GlycoEP, for more accurate prediction of N-linked, O-linked and C-linked glycosites in eukaryotic glycoproteins using two larger datasets, namely standard and advanced datasets. In the standard datasets, no two glycosylated proteins are more than 40% similar; the advanced datasets are highly non-redundant, with no two glycosites' patterns (as defined in methods) having more than 60% similarity. Further, based on our results with several algorithms developed using different machine-learning techniques, we found the Support Vector Machine (SVM) to be the optimal technique for developing glycosite prediction models. Accordingly, using our more stringent and non-redundant advanced datasets, the SVM-based models developed in this study achieved prediction accuracies of 84.26%, 86.87% and 91.43% with corresponding MCCs of 0.54, 0.20 and 0.78 for N-, O- and C-linked glycosites, respectively. The best performing models trained on advanced datasets were then implemented as a user-friendly web server, GlycoEP (http://www.imtech.res.in/raghava/glycoep/). Additionally, this server provides prediction models developed on standard datasets and allows users to scan sequons in input protein sequences.
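GlycoEP also lets users scan sequons in input sequences. The canonical N-linked glycosylation sequon is Asn-X-Ser/Thr, where X is any residue except proline. The hypothetical helper below only locates such motifs with a regular expression; it does not predict whether a site is actually glycosylated, ignores rarer non-canonical variants, and does not reproduce GlycoEP.

```python
import re

# N-linked sequon: Asn, then any residue except Pro, then Ser or Thr.
# The lookahead makes overlapping sequons (e.g. "NNST") all visible.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the Asn, matched triplet) for each N-X-S/T sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(sequence.upper())]


print(find_sequons("MNGTANPSNNST"))  # [(2, 'NGT'), (9, 'NNS'), (10, 'NST')]
```

The zero-width lookahead is what allows both NNS and NST to be reported in the run NNST; a plain `N[^P][ST]` pattern would consume the first match and miss the second.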
Affiliation(s)
- Alka Rao
- Protein Science and Engineering, Institute of Microbial Technology, Chandigarh, India
26
Parca L, Ferrè F, Ausiello G, Helmer-Citterich M. Nucleos: a web server for the identification of nucleotide-binding sites in protein structures. Nucleic Acids Res 2013; 41:W281-5. [PMID: 23703207 PMCID: PMC3692072 DOI: 10.1093/nar/gkt390] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]
Abstract
Nucleos is a web server for the identification of nucleotide-binding sites in protein structures. Nucleos compares the structure of a query protein against a set of known template 3D binding sites representing nucleotide modules, namely the nucleobase, carbohydrate and phosphate. Structural features, clustering and conservation are used to filter and score the predictions. The predicted nucleotide modules are then joined to build whole nucleotide-binding sites, which are ranked by their score. The server takes as input either the PDB code of the query protein structure or a user-submitted structure in PDB format. The output of Nucleos is composed of ranked lists of predicted nucleotide-binding sites divided by nucleotide type (e.g. ATP-like). For each ranked prediction, Nucleos provides detailed information about the score, the template structure and the structural match for each nucleotide module composing the nucleotide-binding site. The predictions on the query structure and the template-binding sites can be viewed directly on the web through a graphical applet. In 98% of the cases, the modules composing correct predictions belong to proteins with no homology relationship between each other, meaning that the identification of brand-new nucleotide-binding sites is possible using information from non-homologous proteins. Nucleos is available at http://nucleos.bio.uniroma2.it/nucleos/.
Affiliation(s)
- Luca Parca
- Department of Biology, Centre for Molecular Bioinformatics, University of Rome Tor Vergata, Via della Ricerca Scientifica snc, 00133 Rome, Italy
27
Hybrid approach for predicting coreceptor used by HIV-1 from its V3 loop amino acid sequence. PLoS One 2013; 8:e61437. [PMID: 23596523 PMCID: PMC3626595 DOI: 10.1371/journal.pone.0061437] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 04/04/2012] [Accepted: 03/13/2013] [Indexed: 12/18/2022]
Abstract
Background HIV-1 infects the host cell by interacting with the primary receptor CD4 and a coreceptor, CCR5 or CXCR4. Maraviroc, a CCR5 antagonist, binds to the CCR5 receptor. Thus, it is important to identify the coreceptor used by the HIV strains dominating in the patient. In the past, a number of experimental assays and in-silico techniques have been developed for predicting coreceptor tropism. These methods predict CCR5 (R5) tropic sequences with excellent accuracy but perform relatively poorly on CXCR4 (X4) tropic sequences. Therefore, any new method for accurate determination of coreceptor usage would be of paramount importance to the successful management of HIV-infected individuals. Results The dataset used in this study comprised 1799 R5-tropic and 598 X4-tropic third variable (V3) sequences of HIV-1. We compared the amino acid composition of both types of V3 sequences. Certain residues, e.g., Asparagine and Isoleucine, were preferred in R5-tropic sequences, whereas residues like Lysine, Arginine and Tryptophan were preferred in X4-tropic sequences. Initially, Support Vector Machine-based models were developed using amino acid composition, dipeptide composition and split amino acid composition, and these achieved accuracy of up to 90%. We used BLAST to discriminate R5- and X4-tropic sequences and correctly predicted 93.16% of R5- and 75.75% of X4-tropic sequences. To improve the prediction accuracy, a hybrid model was developed that achieved 91.66% sensitivity, 81.77% specificity, 89.19% accuracy and a Matthews Correlation Coefficient of 0.72. The performance of our models was also evaluated on an independent dataset (256 R5- and 81 X4-tropic sequences), where they achieved a maximum accuracy of 84.87% with a Matthews Correlation Coefficient of 0.63. Conclusion This study describes a highly efficient method for predicting HIV-1 coreceptor usage from V3 sequences.
To provide a service to the scientific community, a webserver, HIVcoPred, was developed (http://www.imtech.res.in/raghava/hivcopred/) for predicting coreceptor usage.
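The HIVcoPred models start from fixed-length composition features of the V3 sequence. Below is a minimal sketch of amino acid composition (20 values) and dipeptide composition (400 values), each expressed as percentages. The function names are hypothetical, and the exact normalisation used by HIVcoPred is an assumption. Split amino acid composition would apply `aac` separately to sequence segments; it is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aac(seq):
    """Amino acid composition: percentage of each standard residue in seq."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return [100.0 * seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: percentage of each of the 400 adjacent residue pairs."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard letters are skipped
            counts[pair] += 1
    total = len(seq) - 1
    return [100.0 * counts[d] / total for d in DIPEPTIDES]


# Hypothetical V3-like fragment, used only to show the shape of the feature vector.
v3 = "CTRPNNNTRKSIHIGPGRAFYATGEIIGDIRQAHC"
features = aac(v3) + dpc(v3)
print(len(features))  # 420
```

Concatenating the two descriptors gives a 420-dimensional vector that an SVM can consume regardless of the sequence length.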
28
Panwar B, Gupta S, Raghava GPS. Prediction of vitamin interacting residues in a vitamin binding protein using evolutionary information. BMC Bioinformatics 2013; 14:44. [PMID: 23387468 PMCID: PMC3577447 DOI: 10.1186/1471-2105-14-44] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 07/18/2012] [Accepted: 01/31/2013] [Indexed: 02/07/2023]
Abstract
BACKGROUND Vitamins are important cofactors in various enzymatic reactions. In the past, many inhibitors have been designed against vitamin binding pockets in order to inhibit vitamin-protein interactions. Thus, it is important to identify vitamin interacting residues in a protein. Vitamin-binding pockets can be detected on a protein if its tertiary structure is known; unfortunately, tertiary structures are available for only a limited number of proteins. Therefore, it is important to develop in-silico models for predicting vitamin interacting residues in a protein from its primary structure. RESULTS In this study, we first compared the protein-interacting residues of vitamins with those of other ligands using Two Sample Logo (TSL). ATP, GTP, NAD, FAD and mannose preferred {G,R,K,S,H}, {G,K,T,S,D,N}, {T,G,Y}, {G,Y,W} and {Y,D,W,N,E} residues, respectively, whereas vitamins preferred {Y,F,S,W,T,G,H} residues for interaction with proteins. Furthermore, distinct compositions of preferred and non-preferred residues, along with pattern specificity, were observed among different vitamin classes. Vitamins A, B and B6 preferred {F,I,W,Y,L,V}, {S,Y,G,T,H,W,N,E} and {S,T,G,H,Y,N} interacting residues, respectively. This suggested that the protein-binding patterns of vitamins differ from those of other ligands, and it motivated us to develop separate predictors for vitamins and their sub-classes. Four prediction modules have been developed: (i) vitamin interacting residues (VIRs), (ii) vitamin-A interacting residues (VAIRs), (iii) vitamin-B interacting residues (VBIRs) and (iv) pyridoxal-5-phosphate (vitamin B6) interacting residues (PLPIRs). We applied various classifiers as machine learning techniques, including SVM, BayesNet, NaiveBayes, ComplementNaiveBayes, NaiveBayesMultinomial, RandomForest and IBk, using binary and Position-Specific Scoring Matrix (PSSM) features of protein sequences.
Finally, we selected the best performing SVM modules and obtained the highest MCCs of 0.53, 0.48, 0.61 and 0.81 for VIRs, VAIRs, VBIRs and PLPIRs, respectively, using PSSM-based evolutionary information. All modules developed in this study were trained and tested on non-redundant datasets and evaluated using five-fold cross-validation. Their performance was also evaluated on balanced and different independent datasets. CONCLUSIONS This study demonstrates that it is possible to predict VIRs, VAIRs, VBIRs and PLPIRs from the evolutionary information of a protein sequence. To provide a service to the scientific community, we have developed a web-server and standalone software, VitaPred (http://crdd.osdd.net/raghava/vitapred/).
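Residue-level predictors such as VitaPred classify each residue from features of its sequence neighbourhood. The sketch below builds a window feature vector for every residue by concatenating per-residue rows over a centred window. The rows can be 21-dimensional binary (one-hot) vectors, as here, or PSSM rows, since the same windowing applies. Several details are assumptions rather than VitaPred's documented settings: the window size, zero padding at the termini, the 21st "other residue" slot, and the function names.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """21-dim binary vector: 20 standard residues plus one slot for anything else."""
    vec = [0] * 21
    idx = AMINO_ACIDS.find(residue.upper())
    vec[idx if idx >= 0 else 20] = 1
    return vec


def window_features(rows, window=17):
    """Concatenate per-residue feature rows over a window centred on each residue.

    rows: one feature list per residue (e.g. one-hot vectors or PSSM rows).
    Window positions that fall outside the sequence are zero-padded.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    dim = len(rows[0])
    out = []
    for i in range(len(rows)):
        feats = []
        for j in range(i - half, i + half + 1):
            feats.extend(rows[j] if 0 <= j < len(rows) else [0] * dim)
        out.append(feats)
    return out


seq = "MKTAYIAKQR"
X = window_features([one_hot(r) for r in seq], window=5)
print(len(X), len(X[0]))  # 10 residues, 5 * 21 = 105 features each
```

Because `window_features` only assumes equal-length rows, the same function can take an L × 20 PSSM instead of one-hot rows.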
Affiliation(s)
- Bharat Panwar
- Bioinformatics Centre, Institute of Microbial Technology (CSIR), Chandigarh, India
29
Finding protein targets for small biologically relevant ligands across fold space using inverse ligand binding predictions. Structure 2013; 20:1815-22. [PMID: 23141694 DOI: 10.1016/j.str.2012.09.011] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 06/07/2012] [Revised: 08/14/2012] [Accepted: 09/16/2012] [Indexed: 01/12/2023]
Abstract
Inverse ligand binding prediction utilizes a few protein-ligand (drug) complexes to predict other secondary therapeutic and off-targets of a given drug molecule on a proteomic scale. We adapt two binding site predictors, FINDSITE and SMAP, to perform the inverse predictions and evaluate them on over 30 representative ligands. Use of just one complex allows the identification of other protein targets; the availability of additional complexes improves the results. Both methods offer comparable quality when using three complexes with diverse proteins. SMAP is better when fewer complexes are available, while FINDSITE provides stronger predictions for smaller ligands. We propose a consensus that combines (and outperforms) the two complementary approaches implemented by FINDSITE and SMAP. Most importantly, we demonstrate that these methods successfully find distant targets that belong to structurally different folds compared to the proteins in the input complexes.
30
Parca L, Gherardini PF, Truglio M, Mangone I, Ferrè F, Helmer-Citterich M, Ausiello G. Identification of nucleotide-binding sites in protein structures: a novel approach based on nucleotide modularity. PLoS One 2012; 7:e50240. [PMID: 23209685 PMCID: PMC3507729 DOI: 10.1371/journal.pone.0050240] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/04/2012] [Accepted: 10/22/2012] [Indexed: 01/30/2023]
Abstract
Nucleotides are involved in several cellular processes, ranging from the transmission of genetic information to energy transfer and storage. Both sequence- and structure-based methods have been developed to predict the location of nucleotide-binding sites in proteins. Here we propose a novel methodology that leverages the observation that nucleotide-binding sites have a modular structure. Nucleotides are composed of identifiable fragments, i.e. the phosphate, the nucleobase and the carbohydrate moieties. These fragments are bound by specific structural motifs that recur in proteins of different folds. Moreover, these motifs behave as modules and are found in different combinations across fold space. Our method predicts binding sites for each nucleotide fragment by comparing a query protein with a database of templates extracted from proteins of known structure. Whenever a similarity is found, the fragment bound by the template is transferred onto the query protein, thus identifying a putative binding site. Predictions falling inside the surface of the protein are discarded, and the remaining ones are scored using clustering and conservation. The method ranks a correct prediction first in 48%, 48% and 68% of the analyzed proteins for the nucleobase, carbohydrate and phosphate, respectively. When the first five predictions are considered, these figures rise to 71%, 65% and 86%, respectively. Furthermore, we attempted to reconstruct the full structure of the binding site, starting from the predicted positions of the fragments. In 59% of the analyzed proteins, the method ranks a reconstructed binding site, or a part of it, first. Finally, we tested the reliability of our method in a real-world case in which it has to predict nucleotide-binding sites in unbound proteins. We analyzed proteins whose structures have been solved with and without the nucleotide and observed only small variations in the method's performance.
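The figures above are top-1 and top-5 success rates: the fraction of proteins for which a correct prediction appears among the k highest-ranked ones. Below is a minimal sketch of that computation. What counts as "correct" (for example, a distance cutoff to the true fragment position) is defined in the paper and is not reproduced here. Each protein's ranking is simply assumed to be a list of booleans ordered from best to worst score.

```python
def top_k_success(rankings, k):
    """Fraction of proteins whose k best-ranked predictions include a correct one.

    rankings: one list per protein, ordered from best to worst score,
    with True where the prediction is correct.
    """
    if not rankings:
        raise ValueError("no proteins to evaluate")
    hits = sum(1 for preds in rankings if any(preds[:k]))
    return hits / len(rankings)


# Three hypothetical proteins: correct at rank 1, at rank 3, and at rank 7.
rankings = [[True, False], [False, False, True], [False] * 6 + [True]]
print(top_k_success(rankings, 1), top_k_success(rankings, 5))
```

Proteins with fewer than k predictions are handled naturally, because slicing a short list simply returns all of it.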
Affiliation(s)
- Luca Parca
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- Mauro Truglio
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- Iolanda Mangone
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- Fabrizio Ferrè
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- Gabriele Ausiello
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
31
Chauhan JS, Bhat AH, Raghava GPS, Rao A. GlycoPP: a webserver for prediction of N- and O-glycosites in prokaryotic protein sequences. PLoS One 2012; 7:e40155. [PMID: 22808107 PMCID: PMC3392279 DOI: 10.1371/journal.pone.0040155] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 03/14/2012] [Accepted: 06/01/2012] [Indexed: 12/30/2022]
Abstract
Glycosylation is one of the most abundant post-translational modifications (PTMs) required for various structure/function modulations of proteins in a living cell. Although only recently elucidated in prokaryotes, this type of PTM is present across all three domains of life. In prokaryotes, two types of protein-glycan linkages are more widespread. In N-linked glycosylation, a glycan moiety is attached to the amide group of Asn; in O-linked glycosylation, it is attached to the hydroxyl group of Ser/Thr/Tyr. Because of their biological ubiquity, significance and technological applications, the study of prokaryotic glycoproteins is a fast-emerging area of research. Here we describe new Support Vector Machine (SVM) based algorithms (models) developed for predicting glycosylated residues (glycosites) with high accuracy in prokaryotic protein sequences. The models use binary profiles of patterns, composition profiles of patterns and position-specific scoring matrix profiles of patterns as training features. The study employs an extensive dataset of 107 N-linked and 116 O-linked glycosites extracted from 59 experimentally characterized glycoproteins of prokaryotes. This dataset includes validated N-glycosites from the phyla Crenarchaeota, Euryarchaeota (domain Archaea) and Proteobacteria (domain Bacteria), and validated O-glycosites from the phyla Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria (domain Bacteria). In view of the current understanding that glycosylation occurs on folded proteins in bacteria, hybrid models have been developed using information on predicted secondary structures and accessible surface area in various combinations with the training features. Using these models, N-glycosites and O-glycosites could be predicted with an accuracy of 82.71% (MCC 0.65) and 73.71% (MCC 0.48), respectively.
An evaluation of the best performing models on 28 independent prokaryotic glycoproteins confirms that these models are suitable for predicting N- and O-glycosites, with reasonably high confidence, in potential glycoproteins from the aforementioned organisms. A web server, GlycoPP, implementing these models is freely available at http://www.imtech.res.in/raghava/glycopp/.
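Most predictors in this list report accuracy alongside the Matthews correlation coefficient (MCC). MCC matters on imbalanced data: GlycoEP's O-linked model, for instance, reaches 86.87% accuracy but an MCC of only 0.20. Below is a minimal sketch of the standard confusion-matrix formulas. The counts in the examples are hypothetical. MCC is set to 0 when its denominator vanishes, which is a common convention but an assumption here.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts for a reasonable classifier.
print(binary_metrics(tp=90, fp=20, tn=80, fn=10))
# A trivial "always negative" predictor on a 10:90 imbalanced set:
# 90% accuracy, yet MCC is 0 because it never finds a positive.
print(binary_metrics(tp=0, fp=0, tn=90, fn=10))
```

The second call shows why accuracy alone can overstate performance on glycosite-style data, where true sites are rare.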
Affiliation(s)
- Jagat S. Chauhan
- Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Adil H. Bhat
- Protein Science and Engineering, Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Gajendra P. S. Raghava
- Bioinformatics Centre, Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
- Alka Rao
- Protein Science and Engineering, Institute of Microbial Technology, Council of Scientific and Industrial Research, Chandigarh, India
32
Xiong Y, Liu J, Zhang W, Zeng T. Prediction of heme binding residues from protein sequences with integrative sequence profiles. Proteome Sci 2012; 10 Suppl 1:S20. [PMID: 22759579 PMCID: PMC3380730 DOI: 10.1186/1477-5956-10-s1-s20] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 11/23/2022]
Abstract
Background Heme-protein interactions are essential for various biological processes such as electron transfer, catalysis, signal transduction and the control of gene expression. Knowledge of heme binding residues can provide crucial clues for understanding these activities and can aid functional annotation. However, little work has been done on predicting heme binding residues from protein sequence information. Methods We propose a sequence-based approach for accurate prediction of heme binding residues using a novel integrative sequence profile that couples position-specific scoring matrices with heme-specific physicochemical properties. To select the informative physicochemical properties, we design an intuitive feature selection scheme that combines a greedy strategy with correlation analysis. Results Our integrative sequence profile approach outperforms conventional methods that use amino acid and evolutionary information, in both 5-fold cross-validation and independent tests. Conclusions The novel integrative sequence profile feature achieves good performance using a reduced set of feature vector elements.
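The abstract describes feature selection that "combines a greedy strategy with correlation analysis" but does not spell out the procedure. One plausible reading is sketched below. Candidate features are ranked by absolute Pearson correlation with the binding label, then accepted greedily only if they are not strongly correlated with a feature already chosen. The redundancy threshold, the function names and the toy data are all hypothetical, and the authors' actual scheme may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def greedy_select(features, labels, max_redundancy=0.8, max_features=None):
    """features: dict name -> per-sample values; labels: 0/1 per sample."""
    # Rank candidates by how strongly they track the class label.
    ranked = sorted(features, key=lambda f: abs(pearson(features[f], labels)),
                    reverse=True)
    selected = []
    for name in ranked:
        # Greedy step: keep the feature only if it adds non-redundant information.
        if all(abs(pearson(features[name], features[s])) < max_redundancy
               for s in selected):
            selected.append(name)
        if max_features is not None and len(selected) >= max_features:
            break
    return selected


# Toy data: "b" is nearly a rescaled copy of "a", so it is dropped as redundant.
toy = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]}
print(greedy_select(toy, [0, 0, 1, 1]))
```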
Affiliation(s)
- Yi Xiong
- School of Computer, Wuhan University, Wuhan 430072, China.
33
Song J, Tan H, Wang M, Webb GI, Akutsu T. TANGLE: two-level support vector regression approach for protein backbone torsion angle prediction from primary sequences. PLoS One 2012; 7:e30361. [PMID: 22319565 PMCID: PMC3271071 DOI: 10.1371/journal.pone.0030361] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 10/05/2011] [Accepted: 12/14/2011] [Indexed: 12/29/2022]
Abstract
The protein backbone torsion angles Phi and Psi describe rotations around the N-Cα bond (Phi) and the Cα-C bond (Psi). Owing to the planarity of the linked rigid peptide bonds, these two angles essentially determine the backbone geometry of proteins. Accordingly, accurate prediction of backbone torsion angles from sequence information can assist protein structure prediction. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-valued torsion angle prediction. Its features are derived from amino acid sequences and include evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered regions, as well as other global sequence features. When evaluated on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle predictions are 27.8° and 44.6°, respectively. These are 1% and 3% lower, respectively, than those of ANGLOR, one of the state-of-the-art prediction tools. Moreover, TANGLE's predictions are significantly better than those of a random predictor built on an amino acid-specific basis, with p-values < 1.46e-147 and 7.97e-150, respectively, by the Wilcoxon signed-rank test. As a complementary approach to current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as useful restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/.
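Torsion angles are periodic, so a prediction of 170° against a true value of -170° is only 20° off, not 340°. Below is a minimal sketch of an MAE that respects the 360° period, a convention commonly used when evaluating Phi/Psi predictors. Whether TANGLE's reported MAEs are computed exactly this way is an assumption, and the function names are hypothetical.

```python
def angular_error(pred, true):
    """Absolute angular difference in degrees, taking the 360° period into account."""
    d = abs(pred - true) % 360.0
    return min(d, 360.0 - d)


def angular_mae(preds, trues):
    """Mean absolute error over paired predicted/true angles (degrees)."""
    if len(preds) != len(trues) or not preds:
        raise ValueError("need equal-length, non-empty inputs")
    return sum(angular_error(p, t) for p, t in zip(preds, trues)) / len(preds)


print(angular_error(170, -170))                  # 20.0, not 340.0
print(angular_mae([170, -60], [-170, -45]))      # (20 + 15) / 2
```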
Affiliation(s)
- Jiangning Song
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
- Hao Tan
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- Mingjun Wang
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Geoffrey I. Webb
- Faculty of Information Technology, Monash University, Melbourne, Victoria, Australia
- Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto, Japan
34
Chen K, Mizianty MJ, Kurgan L. Prediction and analysis of nucleotide-binding residues using sequence and sequence-derived structural descriptors. Bioinformatics 2011; 28:331-41. [PMID: 22130595 DOI: 10.1093/bioinformatics/btr657] [Citation(s) in RCA: 86] [Impact Index Per Article: 6.6] [Indexed: 11/14/2022]
Abstract
MOTIVATION Nucleotides are multifunctional molecules that are essential for numerous biological processes. They serve as sources for chemical energy, participate in the cellular signaling and they are involved in the enzymatic reactions. The knowledge of the nucleotide-protein interactions helps with annotation of protein functions and finds applications in drug design. RESULTS We propose a novel ensemble of accurate high-throughput predictors of binding residues from the protein sequence for ATP, ADP, AMP, GTP and GDP. Empirical tests show that our NsitePred method significantly outperforms existing predictors and approaches based on sequence alignment and residue conservation scoring. The NsitePred accurately finds more binding residues and binding sites and it performs particularly well for the sites with residues that are clustered close together in the sequence. The high predictive quality stems from the usage of novel, comprehensive and custom-designed inputs that utilize information extracted from the sequence, evolutionary profiles, several sequence-predicted structural descriptors and sequence alignment. Analysis of the predictive model reveals several sequence-derived hallmarks of nucleotide-binding residues; they are usually conserved and flanked by less conserved residues, and they are associated with certain arrangements of secondary structures and amino acid pairs in the specific neighboring positions in the sequence. AVAILABILITY http://biomine.ece.ualberta.ca/nSITEpred/ CONTACT lkurgan@ece.ualberta.ca SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
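The NsitePred analysis finds that nucleotide-binding residues are usually conserved and flanked by less conserved residues. A common way to quantify per-position conservation is the Shannon entropy of each alignment column. The sketch below defines conservation as log2(20) minus that entropy, so higher values mean more conserved. Several choices are assumptions and not necessarily NsitePred's own conservation inputs: this exact score, the function names, ignoring gaps, and treating all-gap columns as uninformative.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; gap characters '-' are ignored."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return math.log2(20)  # treat an all-gap column as uninformative
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def conservation_profile(alignment):
    """Per-position conservation for equal-length aligned sequences."""
    max_h = math.log2(20)
    return [max_h - column_entropy(col) for col in zip(*alignment)]


# Toy alignment: column 1 is invariant, column 2 is fully variable across 4 sequences.
print(conservation_profile(["AC", "AD", "AE", "AF"]))
```

A "conserved residue with less conserved flanks" pattern would appear as a local peak in this profile.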
Affiliation(s)
- Ke Chen
- School of Computer Science and Software Engineering, Tianjin Polytechnic University, Hedong District, Tianjin 300160, PR of China
35
Liu R, Hu J. HemeBIND: a novel method for heme binding residue prediction by combining structural and sequence information. BMC Bioinformatics 2011; 12:207. [PMID: 21612668 PMCID: PMC3124436 DOI: 10.1186/1471-2105-12-207] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Received: 11/06/2010] [Accepted: 05/26/2011] [Indexed: 02/03/2023]
Abstract
BACKGROUND Accurate prediction of binding residues involved in interactions between proteins and small ligands is one of the major challenges in structural bioinformatics. Heme is an essential and commonly used ligand that plays critical roles in electron transfer, catalysis, signal transduction and gene expression. Much effort has been devoted over the last decade to developing generic algorithms for ligand binding site prediction. However, no algorithm has been specifically designed to complement experimental techniques for identifying heme binding residues. Consequently, there is an urgent need for a computational method that recognizes these important residues. RESULTS Here we introduce an efficient algorithm, HemeBIND, for predicting heme binding residues by integrating structural and sequence information. We systematically investigated the characteristics of binding interfaces based on a non-redundant dataset of heme-protein complexes. Several sequence and structural attributes, such as evolutionary conservation, solvent accessibility, depth and protrusion, clearly distinguish heme binding from non-binding residues. These features can be used separately or in combination to build structure-based classifiers using support vector machines (SVM). The results showed that the information contained in these features is largely complementary, and their combination achieved the best performance. To further improve performance, we developed a post-processing procedure to reduce the number of false positives. In addition, we built a sequence-based classifier using SVM and sequence profiles as an alternative for cases where only sequence information is available. Finally, we employed a voting method to combine the outputs of the structure-based and sequence-based classifiers, which performed remarkably better than either classifier alone.
CONCLUSIONS HemeBIND is the first specialized algorithm used to predict binding residues in protein structures for heme ligands. Extensive experiments indicated that both the structure-based and sequence-based methods have effectively identified heme binding residues while the complementary relationship between them can result in a significant improvement in prediction performance. The value of our method is highlighted through the development of HemeBIND web server that is freely accessible at http://mleg.cse.sc.edu/hemeBIND/.
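HemeBIND combines its structure-based and sequence-based classifiers by voting. With only two voters, plain majority voting needs a tie-break, so the sketch below uses a weighted average of the two per-residue scores followed by a threshold. This is one simple way to realise such a combination, not HemeBIND's documented procedure. The weight, the threshold and the function name are hypothetical, and both inputs are assumed to be probabilities in [0, 1].

```python
def weighted_vote(structure_scores, sequence_scores, w_struct=0.6, threshold=0.5):
    """Combine per-residue scores from two classifiers by a weighted average.

    Returns True for residues predicted as heme-binding.
    """
    if len(structure_scores) != len(sequence_scores):
        raise ValueError("score lists must have equal length")
    w_seq = 1.0 - w_struct
    return [w_struct * s + w_seq * q >= threshold
            for s, q in zip(structure_scores, sequence_scores)]


# Three hypothetical residues scored by both classifiers.
print(weighted_vote([0.9, 0.2, 0.6], [0.1, 0.3, 0.4]))
```

The weight could be tuned by cross-validation so that the more reliable classifier dominates when the two disagree.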
Affiliation(s)
- Rong Liu
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA