1
|
Shafiee S, Fathi A, Taherzadeh G. SPPPred: Sequence-Based Protein-Peptide Binding Residue Prediction Using Genetic Programming and Ensemble Learning. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2029-2040. [PMID: 37015594 DOI: 10.1109/tcbb.2022.3230540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Peptide-binding proteins play significant roles in various applications such as gene expression, metabolism, signal transmission, DNA (Deoxyribose Nucleic Acid) repair, and replication. Investigating the binding residues in protein-peptide complexes, especially from their sequence only, is challenging experimentally and computationally. Although several computational approaches have been introduced to determine and predict these binding residues, there is still ample room to improve the prediction performance. In this work, we introduce a novel ensemble machine learning-based approach called SPPPred (Sequence-based Protein-Peptide binding residue Prediction) to predict protein-peptide binding residues. First, we extract relevant sequential information and employ genetic programming algorithm for feature construction to find more distinctive features. We then, in the next step, build an ensemble-based machine learning classifier to predict binding residues. The proposed method shows consistent and comparable performance on both ten-fold cross-validation and independent test set. Furthermore, SPPPred yields F-Measure (F-M), Accuracy(ACC), and Matthews' Correlation Coefficient (MCC) of 0.310, 0.949, and 0.230 on the independent test set, respectively, which outperforms other competing methods by approximately up to 9% on the independent test set. SPPPred is publicly available https://github.com/GTaherzadeh/SPPPred.git.
Collapse
|
2
|
Taherzadeh G, Campbell M, Zhou Y. Computational Prediction of N- and O-Linked Glycosylation Sites for Human and Mouse Proteins. Methods Mol Biol 2022; 2499:177-186. [PMID: 35696081 DOI: 10.1007/978-1-0716-2317-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Protein glycosylation is one of the most complex posttranslational modifications (PTM) that play a fundamental role in protein function. Identification and annotation of these sites using experimental approaches are challenging and time consuming. Hence, there is a demand to build fast and efficient computational methods to address this problem. Here, we present the SPRINT-Gly framework containing the largest dataset and a prediction model of glycosylation sites for a given protein sequence. In this framework, we construct a large dataset containing N- and O-linked glycosylation sites of human and mouse proteins, collected from different sources. We then introduce the SPRINT-Gly method to predict putative N- and O-linked sites. SPRINT-Gly is a machine learning-based approach consisting of a number of trained predictive models for glycosylation sites in both human and mouse proteins, separately. The method is built by incorporating sequence-based, predicted structural, and physicochemical information of the neighboring residues of each N- and O-linked glycosylation site and by training deep learning neural network and support vector machine as classifiers. SPRINT-Gly outperformed other existing methods by achieving 18% and 50% higher Matthew's correlation coefficient for N- and O-linked glycosylation site prediction, respectively. SPRINT-Gly is publicly available as an online and stand-alone predictor at https://sparks-lab.org/server/sprint-gly/ .
Collapse
Affiliation(s)
- Ghazaleh Taherzadeh
- Department of Mathematics and Computer Science, Wilkes University, Wilkes-Barre, PA, USA.
| | - Matthew Campbell
- Institute for Glycomics, Griffith University, Southport, QLD, Australia
| | - Yaoqi Zhou
- Institute for Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, China
| |
Collapse
|
3
|
Lensink MF, Brysbaert G, Mauri T, Nadzirin N, Velankar S, Chaleil RAG, Clarence T, Bates PA, Kong R, Liu B, Yang G, Liu M, Shi H, Lu X, Chang S, Roy RS, Quadir F, Liu J, Cheng J, Antoniak A, Czaplewski C, Giełdoń A, Kogut M, Lipska AG, Liwo A, Lubecka EA, Maszota-Zieleniak M, Sieradzan AK, Ślusarz R, Wesołowski PA, Zięba K, Del Carpio Muñoz CA, Ichiishi E, Harmalkar A, Gray JJ, Bonvin AMJJ, Ambrosetti F, Vargas Honorato R, Jandova Z, Jiménez-García B, Koukos PI, Van Keulen S, Van Noort CW, Réau M, Roel-Touris J, Kotelnikov S, Padhorny D, Porter KA, Alekseenko A, Ignatov M, Desta I, Ashizawa R, Sun Z, Ghani U, Hashemi N, Vajda S, Kozakov D, Rosell M, Rodríguez-Lumbreras LA, Fernandez-Recio J, Karczynska A, Grudinin S, Yan Y, Li H, Lin P, Huang SY, Christoffer C, Terashi G, Verburgt J, Sarkar D, Aderinwale T, Wang X, Kihara D, Nakamura T, Hanazono Y, Gowthaman R, Guest JD, Yin R, Taherzadeh G, Pierce BG, Barradas-Bautista D, Cao Z, Cavallo L, Oliva R, Sun Y, Zhu S, Shen Y, Park T, Woo H, Yang J, Kwon S, Won J, Seok C, Kiyota Y, Kobayashi S, Harada Y, Takeda-Shitaka M, Kundrotas PJ, Singh A, Vakser IA, Dapkūnas J, Olechnovič K, Venclovas Č, Duan R, Qiu L, Xu X, Zhang S, Zou X, Wodak SJ. Prediction of protein assemblies, the next frontier: The CASP14-CAPRI experiment. Proteins 2021; 89:1800-1823. [PMID: 34453465 PMCID: PMC8616814 DOI: 10.1002/prot.26222] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2021] [Revised: 07/24/2021] [Accepted: 08/05/2021] [Indexed: 12/19/2022]
Abstract
We present the results for CAPRI Round 50, the fourth joint CASP-CAPRI protein assembly prediction challenge. The Round comprised a total of twelve targets, including six dimers, three trimers, and three higher-order oligomers. Four of these were easy targets, for which good structural templates were available either for the full assembly, or for the main interfaces (of the higher-order oligomers). Eight were difficult targets for which only distantly related templates were found for the individual subunits. Twenty-five CAPRI groups including eight automatic servers submitted ~1250 models per target. Twenty groups including six servers participated in the CAPRI scoring challenge submitted ~190 models per target. The accuracy of the predicted models was evaluated using the classical CAPRI criteria. The prediction performance was measured by a weighted scoring scheme that takes into account the number of models of acceptable quality or higher submitted by each group as part of their five top-ranking models. Compared to the previous CASP-CAPRI challenge, top performing groups submitted such models for a larger fraction (70-75%) of the targets in this Round, but fewer of these models were of high accuracy. Scorer groups achieved stronger performance with more groups submitting correct models for 70-80% of the targets or achieving high accuracy predictions. Servers performed less well in general, except for the MDOCKPP and LZERD servers, who performed on par with human groups. In addition to these results, major advances in methodology are discussed, providing an informative overview of where the prediction of protein assemblies currently stands.
Collapse
Affiliation(s)
- Marc F Lensink
- CNRS UMR8576 UGSF, Institute for Structural and Functional Glycobiology, University of Lille, Lille, France
| | - Guillaume Brysbaert
- CNRS UMR8576 UGSF, Institute for Structural and Functional Glycobiology, University of Lille, Lille, France
| | - Théo Mauri
- CNRS UMR8576 UGSF, Institute for Structural and Functional Glycobiology, University of Lille, Lille, France
| | - Nurul Nadzirin
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | - Sameer Velankar
- Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
| | | | - Tereza Clarence
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, UK
| | - Paul A Bates
- Biomolecular Modelling Laboratory, The Francis Crick Institute, London, UK
| | - Ren Kong
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Bin Liu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Guangbo Yang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Ming Liu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Hang Shi
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Xufeng Lu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Shan Chang
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, China
| | - Raj S Roy
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Farhan Quadir
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Jian Liu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
| | - Anna Antoniak
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
| | | | - Artur Giełdoń
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
| | - Mateusz Kogut
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
| | | | - Adam Liwo
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
| | - Emilia A Lubecka
- Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Gdansk, Poland
| | | | | | - Rafał Ślusarz
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
| | - Patryk A Wesołowski
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
- Intercollegiate Faculty of Biotechnology, University of Gdansk and Medical University of Gdansk, Gdansk, Poland
| | - Karolina Zięba
- Faculty of Chemistry, University of Gdansk, Gdansk, Poland
| | | | - Eiichiro Ichiishi
- International University of Health and Welfare Hospital (IUHW Hospital), Nasushiobara City, Japan
| | - Ameya Harmalkar
- Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | - Jeffrey J Gray
- Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, USA
| | - Alexandre M J J Bonvin
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Francesco Ambrosetti
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Rodrigo Vargas Honorato
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Zuzana Jandova
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Brian Jiménez-García
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Panagiotis I Koukos
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Siri Van Keulen
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Charlotte W Van Noort
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Manon Réau
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Jorge Roel-Touris
- Computational Structural Biology Group, Bijvoet Centre for Biomolecular Research, Department of Chemistry, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Sergei Kotelnikov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Innopolis University, Russia
| | - Dzmitry Padhorny
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
| | - Kathryn A Porter
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
| | - Andrey Alekseenko
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
- Institute of Computer-Aided Design of the Russian Academy of Sciences, Moscow, Russia
| | - Mikhail Ignatov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
| | - Israel Desta
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
| | - Ryota Ashizawa
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
| | - Zhuyezi Sun
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
| | - Usman Ghani
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
| | - Nasser Hashemi
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
| | - Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts, USA
- Department of Chemistry, Boston University, Boston, Massachusetts, USA
| | - Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, USA
| | - Mireia Rosell
- Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC - Universidad de la Rioja - Gobierno de La Rioja, Logrono, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Luis A Rodríguez-Lumbreras
- Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC - Universidad de la Rioja - Gobierno de La Rioja, Logrono, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Juan Fernandez-Recio
- Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC - Universidad de la Rioja - Gobierno de La Rioja, Logrono, Spain
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | | | - Sergei Grudinin
- Université Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble, France
| | - Yumeng Yan
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Hao Li
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Peicong Lin
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, China
| | - Charles Christoffer
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
| | - Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Jacob Verburgt
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Daipayan Sarkar
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Tunde Aderinwale
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
| | - Xiao Wang
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
| | - Tsukasa Nakamura
- Graduate School of Information Sciences, Tohoku University, Sendai, Miyagi, Japan
| | - Yuya Hanazono
- Institute for Quantum Life Science, National Institutes for Quantum and Radiological Science and Technology, Tokai, Ibaraki, Japan
| | - Ragul Gowthaman
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland, USA
| | - Johnathan D Guest
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland, USA
| | - Rui Yin
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland, USA
| | - Ghazaleh Taherzadeh
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland, USA
| | - Brian G Pierce
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
- Department of Cell Biology and Molecular Genetics, University of Maryland, Maryland, USA
| | | | - Zhen Cao
- King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Luigi Cavallo
- King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Romina Oliva
- University of Naples "Parthenope", Napoli, Italy
| | - Yuanfei Sun
- Department of Electrical and Computer Engineering, Texas A&M University, Texas, USA
| | - Shaowen Zhu
- Department of Electrical and Computer Engineering, Texas A&M University, Texas, USA
| | - Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, Texas, USA
| | - Taeyong Park
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Hyeonuk Woo
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Jinsol Yang
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Sohee Kwon
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Jonghun Won
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Chaok Seok
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
| | - Yasuomi Kiyota
- School of Pharmacy, Kitasato University, Minato-ku, Tokyo, Japan
| | | | - Yoshiki Harada
- School of Pharmacy, Kitasato University, Minato-ku, Tokyo, Japan
| | | | - Petras J Kundrotas
- Computational Biology Program and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, USA
| | - Amar Singh
- Computational Biology Program and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, USA
| | - Ilya A Vakser
- Computational Biology Program and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, USA
| | - Justas Dapkūnas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Rui Duan
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
| | - Liming Qiu
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
| | - Xianjin Xu
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
| | - Shuang Zhang
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
| | - Xiaoqin Zou
- Institute for Data Science and Informatics, University of Missouri, Columbia, Missouri, USA
- Dalton Cardiovascular Research Center, University of Missouri, Columbia, Missouri, USA
- Department of Physics and Astronomy, University of Missouri, Columbia, Missouri, USA
- Department of Biochemistry, University of Missouri, Columbia, Missouri, USA
| | | |
Collapse
|
4
|
Yin R, Guest JD, Taherzadeh G, Gowthaman R, Mittra I, Quackenbush J, Pierce BG. Structural and energetic profiling of SARS-CoV-2 receptor binding domain antibody recognition and the impact of circulating variants. PLoS Comput Biol 2021; 17:e1009380. [PMID: 34491988 PMCID: PMC8448325 DOI: 10.1371/journal.pcbi.1009380] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2021] [Revised: 09/17/2021] [Accepted: 08/25/2021] [Indexed: 11/19/2022] Open
Abstract
The SARS-CoV-2 pandemic highlights the need for a detailed molecular understanding of protective antibody responses. This is underscored by the emergence and spread of SARS-CoV-2 variants, including Alpha (B.1.1.7) and Delta (B.1.617.2), some of which appear to be less effectively targeted by current monoclonal antibodies and vaccines. Here we report a high resolution and comprehensive map of antibody recognition of the SARS-CoV-2 spike receptor binding domain (RBD), which is the target of most neutralizing antibodies, using computational structural analysis. With a dataset of nonredundant experimentally determined antibody-RBD structures, we classified antibodies by RBD residue binding determinants using unsupervised clustering. We also identified the energetic and conservation features of epitope residues and assessed the capacity of viral variant mutations to disrupt antibody recognition, revealing sets of antibodies predicted to effectively target recently described viral variants. This detailed structure-based reference of antibody RBD recognition signatures can inform therapeutic and vaccine design strategies.
Collapse
Affiliation(s)
- Rui Yin
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, United States of America
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, United States of America
| | - Johnathan D. Guest
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, United States of America
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, United States of America
| | - Ghazaleh Taherzadeh
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, United States of America
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, United States of America
| | - Ragul Gowthaman
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, United States of America
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, United States of America
| | - Ipsa Mittra
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, United States of America
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, United States of America
| | - Jane Quackenbush
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, United States of America
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, United States of America
| | - Brian G. Pierce
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, Maryland, United States of America
- Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, United States of America
| |
Collapse
|
5
|
Ahmed S, Hossain Z, Uddin M, Taherzadeh G, Sharma A, Shatabda S, Dehzangi A. Accurate prediction of RNA 5-hydroxymethylcytosine modification by utilizing novel position-specific gapped k-mer descriptors. Comput Struct Biotechnol J 2020; 18:3528-3538. [PMID: 33304452 PMCID: PMC7701324 DOI: 10.1016/j.csbj.2020.10.032] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2020] [Revised: 10/30/2020] [Accepted: 10/30/2020] [Indexed: 12/13/2022] Open
Abstract
RNA modification is an essential step towards generation of new RNA structures. Such modification is potentially able to modify RNA function or its stability. Among different modifications, 5-Hydroxymethylcytosine (5hmC) modification of RNA exhibit significant potential for a series of biological processes. Understanding the distribution of 5hmC in RNA is essential to determine its biological functionality. Although conventional sequencing techniques allow broad identification of 5hmC, they are both time-consuming and resource-intensive. In this study, we propose a new computational tool called iRNA5hmC-PS to tackle this problem. To build iRNA5hmC-PS we extract a set of novel sequence-based features called Position-Specific Gapped k-mer (PSG k-mer) to obtain maximum sequential information. Our feature analysis shows that our proposed PSG k-mer features contain vital information for the identification of 5hmC sites. We also use a group-wise feature importance calculation strategy to select a small subset of features containing maximum discriminative information. Our experimental results demonstrate that iRNA5hmC-PS is able to enhance the prediction performance, dramatically. iRNA5hmC-PS achieves 78.3% prediction performance, which is 12.8% better than those reported in the previous studies. iRNA5hmC-PS is publicly available as an online tool at http://103.109.52.8:81/iRNA5hmC-PS. Its benchmark dataset, source codes, and documentation are available at https://github.com/zahid6454/iRNA5hmC-PS.
Collapse
Affiliation(s)
- Sajid Ahmed
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Zahid Hossain
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Mahtab Uddin
- Department of Natural Science, United International University, Dhaka, Bangladesh
| | - Ghazaleh Taherzadeh
- Institute for Bioscience and Biotechnology Research, University of Maryland, College Park, MD 20742, USA
| | - Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD 4111, Australia.,Department of Medical Science Mathematics, Tokyo Medical and Dental University (TMDU), Tokyo, Japan.,Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.,School of Engineering and Physics, University of the South Pacific, Suva, Fiji
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Abdollah Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ 08102, USA.,Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA
| |
Collapse
|
6
|
Dipta SR, Taherzadeh G, Ahmad MW, Arafat ME, Shatabda S, Dehzangi A. SEMal: Accurate protein malonylation site predictor using structural and evolutionary information. Comput Biol Med 2020; 125:104022. [PMID: 33022522 DOI: 10.1016/j.compbiomed.2020.104022] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2020] [Revised: 09/24/2020] [Accepted: 09/25/2020] [Indexed: 10/23/2022]
Abstract
Post Transactional Modification (PTM) is a vital process which plays an important role in a wide range of biological interactions. One of the most recently identified PTMs is Malonylation. It has been shown that Malonylation has an important impact on different biological pathways including glucose and fatty acid metabolism. Malonylation can be detected experimentally using mass spectrometry. However, this process is both costly and time-consuming which has inspired research to find more efficient and fast computational methods to solve this problem. This paper proposes a novel approach, called SEMal, to identify Malonylation sites in protein sequences. It uses both structural and evolutionary-based features to solve this problem. It also uses Rotation Forest (RoF) as its classification technique to predict Malonylation sites. To the best of our knowledge, our extracted features as well as our employed classifier have never been used for this problem. Compared to the previously proposed methods, SEMal outperforms them in all metrics such as sensitivity (0.94 and 0.89), accuracy (0.94 and 0.91), and Matthews correlation coefficient (0.88 and 0.82), for Homo Sapiens and Mus Musculus species, respectively. SEMal is publicly available as an online predictor at: http://brl.uiu.ac.bd/SEMal/.
Collapse
Affiliation(s)
- Shubhashis Roy Dipta
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Ghazaleh Taherzadeh
- Institute for Bioscience and Biotechnology Research, University of Maryland, College Park, MD, 20742, USA
| | - Md Wakil Ahmad
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Md Easin Arafat
- Institute of Information Technology, Jahangirnagar University, Savar, Dhaka, Bangladesh
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh.
| | - Abdollah Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ, 08102, USA; Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, 08102, USA.
| |
Collapse
|
7
|
Arafat ME, Ahmad MW, Shovan S, Dehzangi A, Dipta SR, Hasan MAM, Taherzadeh G, Shatabda S, Sharma A. Accurately Predicting Glutarylation Sites Using Sequential Bi-Peptide-Based Evolutionary Features. Genes (Basel) 2020; 11:E1023. [PMID: 32878321 PMCID: PMC7565944 DOI: 10.3390/genes11091023] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2020] [Revised: 08/19/2020] [Accepted: 08/27/2020] [Indexed: 02/07/2023] Open
Abstract
Post Translational Modification (PTM) is defined as the alteration of protein sequence upon interaction with different macromolecules after the translation process. Glutarylation is considered one of the most important PTMs, which is associated with a wide range of cellular functioning, including metabolism, translation, and specified separate subcellular localizations. During the past few years, a wide range of computational approaches has been proposed to predict Glutarylation sites. However, despite all the efforts that have been made so far, the prediction performance of the Glutarylation sites has remained limited. One of the main challenges to tackle this problem is to extract features with significant discriminatory information. To address this issue, we propose a new machine learning method called BiPepGlut using the concept of a bi-peptide-based evolutionary method for feature extraction. To build this model, we also use the Extra-Trees (ET) classifier for the classification purpose, which, to the best of our knowledge, has never been used for this task. Our results demonstrate BiPepGlut is able to significantly outperform previously proposed models to tackle this problem. BiPepGlut achieves 92.0%, 84.8%, 95.6%, 0.82, and 0.88 in accuracy, sensitivity, specificity, Matthew's Correlation Coefficient, and F1-score, respectively. BiPepGlut is implemented as a publicly available online predictor.
Collapse
Affiliation(s)
- Md. Easin Arafat
- Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh; (M.E.A.); (M.W.A.); (S.R.D.)
| | - Md. Wakil Ahmad
- Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh; (M.E.A.); (M.W.A.); (S.R.D.)
| | - S.M. Shovan
- Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi 6204, Bangladesh; (S.M.S.); (M.A.M.H.)
| | - Abdollah Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ 08102, USA;
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA
| | - Shubhashis Roy Dipta
- Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh; (M.E.A.); (M.W.A.); (S.R.D.)
| | - Md. Al Mehedi Hasan
- Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi 6204, Bangladesh; (S.M.S.); (M.A.M.H.)
| | - Ghazaleh Taherzadeh
- Institute for Bioscience and Biotechnology Research, University of Maryland, College Park, MD 20742, USA
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh; (M.E.A.); (M.W.A.); (S.R.D.)
| | - Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD 4111, Australia
- Department of Medical Science Mathematics, Tokyo Medical and Dental University (TMDU), Tokyo 113-8510, Japan
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
- School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji
| |
Collapse
|
8
|
Wardah W, Dehzangi A, Taherzadeh G, Rashid MA, Khan M, Tsunoda T, Sharma A. Predicting protein-peptide binding sites with a deep convolutional neural network. J Theor Biol 2020; 496:110278. [DOI: 10.1016/j.jtbi.2020.110278] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2019] [Revised: 04/05/2020] [Accepted: 04/08/2020] [Indexed: 10/24/2022]
|
9
|
Taherzadeh G, Dehzangi A, Golchin M, Zhou Y, Campbell MP. SPRINT-Gly: predicting N- and O-linked glycosylation sites of human and mouse proteins by using sequence and predicted structural properties. Bioinformatics 2020; 35:4140-4146. [PMID: 30903686 DOI: 10.1093/bioinformatics/btz215] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2018] [Revised: 03/03/2019] [Accepted: 03/21/2019] [Indexed: 12/19/2022] Open
Abstract
MOTIVATION Protein glycosylation is one of the most abundant post-translational modifications that plays an important role in immune responses, intercellular signaling, inflammation and host-pathogen interactions. However, due to the poor ionization efficiency and microheterogeneity of glycopeptides identifying glycosylation sites is a challenging task, and there is a demand for computational methods. Here, we constructed the largest dataset of human and mouse glycosylation sites to train deep learning neural networks and support vector machine classifiers to predict N-/O-linked glycosylation sites, respectively. RESULTS The method, called SPRINT-Gly, achieved consistent results between ten-fold cross validation and independent test for predicting human and mouse glycosylation sites. For N-glycosylation, a mouse-trained model performs equally well in human glycoproteins and vice versa, however, due to significant differences in O-linked sites separate models were generated. Overall, SPRINT-Gly is 18% and 50% higher in Matthews correlation coefficient than the next best method compared in N-linked and O-linked sites, respectively. This improved performance is due to the inclusion of novel structure and sequence-based features. AVAILABILITY AND IMPLEMENTATION http://sparks-lab.org/server/SPRINT-Gly/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
| | - Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, MD, USA
| | - Maryam Golchin
- School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
| | - Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia.,Institute for Glycomics, Griffith University, Parklands Drive, Gold Coast, QLD, Australia
| | - Matthew P Campbell
- Institute for Glycomics, Griffith University, Parklands Drive, Gold Coast, QLD, Australia
| |
Collapse
|
10
|
Abrahams JL, Taherzadeh G, Jarvas G, Guttman A, Zhou Y, Campbell MP. Recent advances in glycoinformatic platforms for glycomics and glycoproteomics. Curr Opin Struct Biol 2020. [PMID: 31874386 DOI: 10.1016/jsbi.2019.11.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/19/2023]
Abstract
Protein glycosylation is the most complex and prevalent post-translation modification in terms of the number of proteins modified and the diversity generated. To understand the functional roles of glycoproteins it is important to gain an insight into the repertoire of oligosaccharides present. The comparison and relative quantitation of glycoforms combined with site-specific identification and occupancy are necessary steps in this direction. Computational platforms have continued to mature assisting researchers with the interpretation of such glycomics and glycoproteomics data sets, but frequently support dedicated workflows and users rely on the manual interpretation of data to gain insights into the glycoproteome. The growth of site-specific knowledge has also led to the implementation of machine-learning algorithms to predict glycosylation which is now being integrated into glycoproteomics pipelines. This short review describes commercial and open-access databases and software with an emphasis on those that are actively maintained and designed to support current analytical workflows.
Collapse
Affiliation(s)
- Jodie L Abrahams
- Institute for Glycomics, Griffith University, Gold Coast, QLD, Australia
| | - Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
| | - Gabor Jarvas
- Translational Glycomics Research Group, Research Institute of Biomolecular and Chemical Engineering, University of Pannonia, Veszprém, Hungary; Horváth Csaba Laboratory of Bioseparation Sciences, Research Centre for Molecular Medicine, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
| | - Andras Guttman
- Translational Glycomics Research Group, Research Institute of Biomolecular and Chemical Engineering, University of Pannonia, Veszprém, Hungary; Horváth Csaba Laboratory of Bioseparation Sciences, Research Centre for Molecular Medicine, Faculty of Medicine, University of Debrecen, Debrecen, Hungary; SCIEX, Brea, CA, USA
| | - Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
| | - Matthew P Campbell
- Institute for Glycomics, Griffith University, Gold Coast, QLD, Australia.
| |
Collapse
|
11
|
Abrahams JL, Taherzadeh G, Jarvas G, Guttman A, Zhou Y, Campbell MP. Recent advances in glycoinformatic platforms for glycomics and glycoproteomics. Curr Opin Struct Biol 2019; 62:56-69. [PMID: 31874386 DOI: 10.1016/j.sbi.2019.11.009] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2019] [Revised: 11/05/2019] [Accepted: 11/15/2019] [Indexed: 12/16/2022]
Abstract
Protein glycosylation is the most complex and prevalent post-translation modification in terms of the number of proteins modified and the diversity generated. To understand the functional roles of glycoproteins it is important to gain an insight into the repertoire of oligosaccharides present. The comparison and relative quantitation of glycoforms combined with site-specific identification and occupancy are necessary steps in this direction. Computational platforms have continued to mature assisting researchers with the interpretation of such glycomics and glycoproteomics data sets, but frequently support dedicated workflows and users rely on the manual interpretation of data to gain insights into the glycoproteome. The growth of site-specific knowledge has also led to the implementation of machine-learning algorithms to predict glycosylation which is now being integrated into glycoproteomics pipelines. This short review describes commercial and open-access databases and software with an emphasis on those that are actively maintained and designed to support current analytical workflows.
Collapse
Affiliation(s)
- Jodie L Abrahams
- Institute for Glycomics, Griffith University, Gold Coast, QLD, Australia
| | - Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
| | - Gabor Jarvas
- Translational Glycomics Research Group, Research Institute of Biomolecular and Chemical Engineering, University of Pannonia, Veszprém, Hungary; Horváth Csaba Laboratory of Bioseparation Sciences, Research Centre for Molecular Medicine, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
| | - Andras Guttman
- Translational Glycomics Research Group, Research Institute of Biomolecular and Chemical Engineering, University of Pannonia, Veszprém, Hungary; Horváth Csaba Laboratory of Bioseparation Sciences, Research Centre for Molecular Medicine, Faculty of Medicine, University of Debrecen, Debrecen, Hungary; SCIEX, Brea, CA, USA
| | - Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
| | - Matthew P Campbell
- Institute for Glycomics, Griffith University, Gold Coast, QLD, Australia.
| |
Collapse
|
12
|
Dehzangi A, López Y, Taherzadeh G, Sharma A, Tsunoda T. SumSec: Accurate Prediction of Sumoylation Sites Using Predicted Secondary Structure. Molecules 2018; 23:E3260. [PMID: 30544729 PMCID: PMC6320791 DOI: 10.3390/molecules23123260] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2018] [Revised: 11/30/2018] [Accepted: 12/05/2018] [Indexed: 12/13/2022] Open
Abstract
Post Translational Modification (PTM) is defined as the modification of amino acids along the protein sequences after the translation process. These modifications significantly impact on the functioning of proteins. Therefore, having a comprehensive understanding of the underlying mechanism of PTMs turns out to be critical in studying the biological roles of proteins. Among a wide range of PTMs, sumoylation is one of the most important modifications due to its known cellular functions which include transcriptional regulation, protein stability, and protein subcellular localization. Despite its importance, determining sumoylation sites via experimental methods is time-consuming and costly. This has led to a great demand for the development of fast computational methods able to accurately determine sumoylation sites in proteins. In this study, we present a new machine learning-based method for predicting sumoylation sites called SumSec. To do this, we employed the predicted secondary structure of amino acids to extract two types of structural features from neighboring amino acids along the protein sequence which has never been used for this task. As a result, our proposed method is able to enhance the sumoylation site prediction task, outperforming previously proposed methods in the literature. SumSec demonstrated high sensitivity (0.91), accuracy (0.94) and MCC (0.88). The prediction accuracy achieved in this study is 21% better than those reported in previous studies. The script and extracted features are publicly available at: https://github.com/YosvanyLopez/SumSec.
Collapse
Affiliation(s)
- Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, MD 21251, USA.
| | - Yosvany López
- Genesis Institute of Genetic Research, Genesis Healthcare Co., Tokyo 150-6015, Japan.
| | - Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Gold Coast 4222, Australia.
| | - Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane 4111, Australia.
- School of Engineering & Physics, University of the South Pacific, Suva, Fiji.
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.
- CREST, JST, Tokyo 102-0076, Japan.
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo 113-8510, Japan.
| | - Tatsuhiko Tsunoda
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan.
- CREST, JST, Tokyo 102-0076, Japan.
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo 113-8510, Japan.
| |
Collapse
|
13
|
Abstract
Protein-carbohydrate interaction is essential for biological systems, and carbohydrate-binding proteins (CBPs) are important targets when designing antiviral and anticancer drugs. Due to the high cost and difficulty associated with experimental approaches, many computational methods have been developed as complementary approaches to predict CBPs or carbohydrate-binding sites. However, most of these computational methods are not publicly available. Here, we provide a comprehensive review of related studies and demonstrate our two recently developed bioinformatics methods. The method SPOT-CBP is a template-based method for detecting CBPs based on structure through structural homology search combined with a knowledge-based scoring function. This method can yield model complex structure in addition to accurate prediction of CBPs. Furthermore, it has been observed that similarly accurate predictions can be made using structures from homology modeling, which has significantly expanded its applicability. The other method, SPRINT-CBH, is a de novo approach that predicts binding residues directly from protein sequences by using sequence information and predicted structural properties. This approach does not need structurally similar templates and thus is not limited by the current database of known protein-carbohydrate complex structures. These two complementary methods are available at https://sparks-lab.org. © 2018 by John Wiley & Sons, Inc.
Collapse
Affiliation(s)
- Huiying Zhao
- Sun Yat-Sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China
| | - Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Gold Coast, Queensland, Australia
| | - Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Gold Coast, Queensland, Australia.,Institute for Glycomics, Griffith University, Gold Coast, Queensland, Australia
| | - Yuedong Yang
- School of Information and Communication Technology, Griffith University, Gold Coast, Queensland, Australia.,Institute for Glycomics, Griffith University, Gold Coast, Queensland, Australia.,School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
| |
Collapse
|
14
|
Taherzadeh G, Yang Y, Xu H, Xue Y, Liew AWC, Zhou Y. Predicting lysine-malonylation sites of proteins using sequence and predicted structural features. J Comput Chem 2018; 39:1757-1763. [DOI: 10.1002/jcc.25353] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2018] [Revised: 03/30/2018] [Accepted: 04/08/2018] [Indexed: 12/21/2022]
Affiliation(s)
- Ghazaleh Taherzadeh
- School of Information and Communication Technology; Griffith University, Parklands Drive; Southport Queensland 4222 Australia
| | - Yuedong Yang
- School of Data and Computer Science; Sun Yat-sen University; Guangzhou 510275 China
| | - Haodong Xu
- Key Laboratory of Molecular Biophysics of Ministry of Education, College of Life Science and Technology and the Collaborative Innovation Center for Biomedical Engineering; Huazhong University of Science and Technology; Wuhan Hubei 430074 China
| | - Yu Xue
- Key Laboratory of Molecular Biophysics of Ministry of Education, College of Life Science and Technology and the Collaborative Innovation Center for Biomedical Engineering; Huazhong University of Science and Technology; Wuhan Hubei 430074 China
| | - Alan Wee-Chung Liew
- School of Information and Communication Technology; Griffith University, Parklands Drive; Southport Queensland 4222 Australia
| | - Yaoqi Zhou
- School of Information and Communication Technology; Griffith University, Parklands Drive; Southport Queensland 4222 Australia
- Institute for Glycomics, Griffith University, Parklands Dr; Southport Queensland 4222 Australia
| |
Collapse
|
15
|
Dehzangi A, López Y, Lal SP, Taherzadeh G, Sattar A, Tsunoda T, Sharma A. Improving succinylation prediction accuracy by incorporating the secondary structure via helix, strand and coil, and evolutionary information from profile bigrams. PLoS One 2018; 13:e0191900. [PMID: 29432431 PMCID: PMC5809022 DOI: 10.1371/journal.pone.0191900] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2017] [Accepted: 01/12/2018] [Indexed: 11/18/2022] Open
Abstract
Post-translational modification refers to the biological mechanism involved in the enzymatic modification of proteins after being translated in the ribosome. This mechanism comprises a wide range of structural modifications, which bring dramatic variations to the biological function of proteins. One of the recently discovered modifications is succinylation. Although succinylation can be detected through mass spectrometry, its current experimental detection turns out to be a timely process unable to meet the exponential growth of sequenced proteins. Therefore, the implementation of fast and accurate computational methods has emerged as a feasible solution. This paper proposes a novel classification approach, which effectively incorporates the secondary structure and evolutionary information of proteins through profile bigrams for succinylation prediction. The proposed predictor, abbreviated as SSEvol-Suc, made use of the above features for training an AdaBoost classifier and consequently predicting succinylated lysine residues. When SSEvol-Suc was compared with four benchmark predictors, it outperformed them in metrics such as sensitivity (0.909), accuracy (0.875) and Matthews correlation coefficient (0.75).
Collapse
Affiliation(s)
- Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, Maryland, United States of America
| | - Yosvany López
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- * E-mail:
| | - Sunil Pranit Lal
- School of Engineering & Advanced Technology, Massey University, Palmerston North, New Zealand
| | - Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Queensland, Australia
| | - Abdul Sattar
- School of Information and Communication Technology, Griffith University, Queensland, Australia
- Institute for Integrated and Intelligent Systems, Griffith University, Queensland, Australia
| | - Tatsuhiko Tsunoda
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- CREST, JST, Tokyo, Japan
| | - Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Institute for Integrated and Intelligent Systems, Griffith University, Queensland, Australia
- School of Engineering & Physics, University of the South Pacific, Suva, Fiji
| |
Collapse
|
16
|
López Y, Sharma A, Dehzangi A, Lal SP, Taherzadeh G, Sattar A, Tsunoda T. Success: evolutionary and structural properties of amino acids prove effective for succinylation site prediction. BMC Genomics 2018; 19:923. [PMID: 29363424 PMCID: PMC5781056 DOI: 10.1186/s12864-017-4336-8] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Post-translational modification is considered an important biological mechanism with critical impact on the diversification of the proteome. Although a long list of such modifications has been studied, succinylation of lysine residues has recently attracted the interest of the scientific community. The experimental detection of succinylation sites is an expensive process, which consumes a lot of time and resources. Therefore, computational predictors of this covalent modification have emerged as a last resort to tackling lysine succinylation. RESULTS In this paper, we propose a novel computational predictor called 'Success', which efficiently uses the structural and evolutionary information of amino acids for predicting succinylation sites. To do this, each lysine was described as a vector that combined the above information of surrounding amino acids. We then designed a support vector machine with a radial basis function kernel for discriminating between succinylated and non-succinylated residues. We finally compared the Success predictor with three state-of-the-art predictors in the literature. As a result, our proposed predictor showed a significant improvement over the compared predictors in statistical metrics, such as sensitivity (0.866), accuracy (0.838) and Matthews correlation coefficient (0.677) on a benchmark dataset. CONCLUSIONS The proposed predictor effectively uses the structural and evolutionary information of the amino acids surrounding a lysine. The bigram feature extraction approach, while retaining the same number of features, facilitates a better description of lysines. A support vector machine with a radial basis function kernel was used to discriminate between modified and unmodified lysines. The aforementioned aspects make the Success predictor outperform three state-of-the-art predictors in succinylation detection.
Collapse
Affiliation(s)
- Yosvany López
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan. .,Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.
| | - Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan. .,Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia. .,School of Engineering & Physics, University of the South Pacific, Suva, Fiji.
| | - Abdollah Dehzangi
- Department of Computer Science, School of Computer, Mathematical, and Natural Sciences, Morgan State University, Baltimore, Maryland, USA
| | - Sunil Pranit Lal
- School of Engineering & Advanced Technology, Massey University, Palmerston North, New Zealand
| | - Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Brisbane, Australia
| | - Abdul Sattar
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia.,School of Information and Communication Technology, Griffith University, Brisbane, Australia
| | - Tatsuhiko Tsunoda
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan.,Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.,CREST, JST, Tokyo, 113-8510, Japan
| |
Collapse
|
17
|
Guruge I, Taherzadeh G, Zhan J, Zhou Y, Yang Y. B
-factor profile prediction for RNA flexibility using support vector machines. J Comput Chem 2017; 39:407-411. [DOI: 10.1002/jcc.25124] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2017] [Accepted: 11/07/2017] [Indexed: 12/12/2022]
Affiliation(s)
- Ivantha Guruge
- School of Information and Communication Technology and Institue for Glycomics; Griffith University, Parklands Drive; Southport Queensland 4215 Australia
| | - Ghazaleh Taherzadeh
- School of Information and Communication Technology and Institue for Glycomics; Griffith University, Parklands Drive; Southport Queensland 4215 Australia
| | - Jian Zhan
- School of Information and Communication Technology and Institue for Glycomics; Griffith University, Parklands Drive; Southport Queensland 4215 Australia
| | - Yaoqi Zhou
- School of Information and Communication Technology and Institue for Glycomics; Griffith University, Parklands Drive; Southport Queensland 4215 Australia
| | - Yuedong Yang
- School of Information and Communication Technology and Institue for Glycomics; Griffith University, Parklands Drive; Southport Queensland 4215 Australia
- School of Data and Computer Science; Sun Yat-sen University; Guangzhou 510275 China
| |
Collapse
|
18
|
Taherzadeh G, Zhou Y, Liew AWC, Yang Y. Structure-based prediction of protein– peptide binding regions using Random Forest. Bioinformatics 2017; 34:477-484. [DOI: 10.1093/bioinformatics/btx614] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2017] [Accepted: 09/25/2017] [Indexed: 11/12/2022] Open
Affiliation(s)
- Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD, Australia
| | - Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD, Australia
- Institute for Glycomics, Griffith University, Parklands Drive, Southport, QLD, Australia
| | - Alan Wee-Chung Liew
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD, Australia
| | - Yuedong Yang
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD, Australia
- Institute for Glycomics, Griffith University, Parklands Drive, Southport, QLD, Australia
- School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
| |
Collapse
|
19
|
Dehzangi A, López Y, Lal SP, Taherzadeh G, Michaelson J, Sattar A, Tsunoda T, Sharma A. PSSM-Suc: Accurately predicting succinylation using position specific scoring matrix into bigram for feature extraction. J Theor Biol 2017; 425:97-102. [PMID: 28483566 DOI: 10.1016/j.jtbi.2017.05.005] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2017] [Revised: 04/28/2017] [Accepted: 05/03/2017] [Indexed: 11/25/2022]
Abstract
Post-translational modification (PTM) is a covalent and enzymatic modification of proteins, which contributes to diversify the proteome. Despite many reported PTMs with essential roles in cellular functioning, lysine succinylation has emerged as a subject of particular interest. Because its experimental identification remains a costly and time-consuming process, computational predictors have been recently proposed for tackling this important issue. However, the performance of current predictors is still very limited. In this paper, we propose a new predictor called PSSM-Suc which employs evolutionary information of amino acids for predicting succinylated lysine residues. Here we described each lysine residue in terms of profile bigrams extracted from position specific scoring matrices. We compared the performance of PSSM-Suc to that of existing predictors using a widely used benchmark dataset. PSSM-Suc showed a significant improvement in performance over state-of-the-art predictors. Its sensitivity, accuracy and Matthews correlation coefficient were 0.8159, 0.8199 and 0.6396, respectively.
Collapse
Affiliation(s)
- Abdollah Dehzangi
- Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa, USA.
| | - Yosvany López
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.
| | - Sunil Pranit Lal
- School of Engineering & Advanced Technology, Massey University, New Zealand
| | - Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
| | - Jacob Michaelson
- Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa, USA
| | - Abdul Sattar
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia; Institute for Integrated and Intelligent Systems, Griffith University, Australia
| | - Tatsuhiko Tsunoda
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan; CREST, JST, Tokyo 113-8510, Japan
| | - Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan; Institute for Integrated and Intelligent Systems, Griffith University, Australia; School of Engineering & Physics, University of the South Pacific, Fiji
| |
Collapse
|
20
|
López Y, Dehzangi A, Lal SP, Taherzadeh G, Michaelson J, Sattar A, Tsunoda T, Sharma A. SucStruct: Prediction of succinylated lysine residues by using structural properties of amino acids. Anal Biochem 2017; 527:24-32. [PMID: 28363440 DOI: 10.1016/j.ab.2017.03.021] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2016] [Revised: 03/13/2017] [Accepted: 03/28/2017] [Indexed: 11/30/2022]
Abstract
Post-Translational Modification (PTM) is a biological reaction which contributes to diversify the proteome. Despite many modifications with important roles in cellular activity, lysine succinylation has recently emerged as an important PTM mark. It alters the chemical structure of lysines, leading to remarkable changes in the structure and function of proteins. In contrast to the huge amount of proteins being sequenced in the post-genome era, the experimental detection of succinylated residues remains expensive, inefficient and time-consuming. Therefore, the development of computational tools for accurately predicting succinylated lysines is an urgent necessity. To date, several approaches have been proposed but their sensitivity has been reportedly poor. In this paper, we propose an approach that utilizes structural features of amino acids to improve lysine succinylation prediction. Succinylated and non-succinylated lysines were first retrieved from 670 proteins and characteristics such as accessible surface area, backbone torsion angles and local structure conformations were incorporated. We used the k-nearest neighbors cleaning treatment for dealing with class imbalance and designed a pruned decision tree for classification. Our predictor, referred to as SucStruct (Succinylation using Structural features), proved to significantly improve performance when compared to previous predictors, with sensitivity, accuracy and Mathew's correlation coefficient equal to 0.7334-0.7946, 0.7444-0.7608 and 0.4884-0.5240, respectively.
Collapse
Affiliation(s)
- Yosvany López
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.
| | - Abdollah Dehzangi
- Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa, USA.
| | - Sunil Pranit Lal
- School of Engineering & Advanced Technology, Massey University, New Zealand
| | - Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
| | - Jacob Michaelson
- Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa, USA
| | - Abdul Sattar
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia; Institute for Integrated and Intelligent Systems, Griffith University, Australia
| | - Tatsuhiko Tsunoda
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan; CREST, JST, Tokyo 113-8510, Japan
| | - Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan; Institute for Integrated and Intelligent Systems, Griffith University, Australia
| |
Collapse
|
21
|
Taherzadeh G, Zhou Y, Liew AWC, Yang Y. Sequence-Based Prediction of Protein-Carbohydrate Binding Sites Using Support Vector Machines. J Chem Inf Model 2016; 56:2115-2122. [PMID: 27623166 DOI: 10.1021/acs.jcim.6b00320] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Abstract
Carbohydrate-binding proteins play significant roles in many diseases including cancer. Here, we established a machine-learning-based method (called sequence-based prediction of residue-level interaction sites of carbohydrates, SPRINT-CBH) to predict carbohydrate-binding sites in proteins using support vector machines (SVMs). We found that integrating evolution-derived sequence profiles with additional information on sequence and predicted solvent accessible surface area leads to a reasonably accurate, robust, and predictive method, with area under receiver operating characteristic curve (AUC) of 0.78 and 0.77 and Matthew's correlation coefficient of 0.34 and 0.29, respectively for 10-fold cross validation and independent test without balancing binding and nonbinding residues. The quality of the method is further demonstrated by having statistically significantly more binding residues predicted for carbohydrate-binding proteins than presumptive nonbinding proteins in the human proteome, and by the bias of rare alleles toward predicted carbohydrate-binding sites for nonsynonymous mutations from the 1000 genome project. SPRINT-CBH is available as an online server at http://sparks-lab.org/server/SPRINT-CBH .
Collapse
Affiliation(s)
- Ghazaleh Taherzadeh
- School of Information and Communication Technology and ‡Institute for Glycomics, Griffith University , Parklands Drive, Southport, Queensland 4215, Australia
| | - Yaoqi Zhou
- School of Information and Communication Technology and ‡Institute for Glycomics, Griffith University , Parklands Drive, Southport, Queensland 4215, Australia
| | - Alan Wee-Chung Liew
- School of Information and Communication Technology and ‡Institute for Glycomics, Griffith University , Parklands Drive, Southport, Queensland 4215, Australia
| | - Yuedong Yang
- School of Information and Communication Technology and ‡Institute for Glycomics, Griffith University , Parklands Drive, Southport, Queensland 4215, Australia
| |
Collapse
|
22
|
Taherzadeh G, Yang Y, Zhang T, Liew AW, Zhou Y. Sequence‐based prediction of protein–peptide binding sites using support vector machine. J Comput Chem 2016; 37:1223-9. [DOI: 10.1002/jcc.24314] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2015] [Accepted: 01/06/2016] [Indexed: 11/06/2022]
Affiliation(s)
- Ghazaleh Taherzadeh
- School of Information and Communication TechnologyGriffith UniversityParklands DriveSouthport Queensland4215 Australia
| | - Yuedong Yang
- School of Information and Communication TechnologyGriffith UniversityParklands DriveSouthport Queensland4215 Australia
- Institute for Glycomics, Griffith UniversityParklands DrSouthport Queensland4215 Australia
| | - Tuo Zhang
- Weill Cornell Medical College1300 York AvenueNew York, New York10065
| | - Alan Wee‐Chung Liew
- School of Information and Communication TechnologyGriffith UniversityParklands DriveSouthport Queensland4215 Australia
| | - Yaoqi Zhou
- School of Information and Communication TechnologyGriffith UniversityParklands DriveSouthport Queensland4215 Australia
- Institute for Glycomics, Griffith UniversityParklands DrSouthport Queensland4215 Australia
| |
Collapse
|