51
|
Khan YD, Rasool N, Hussain W, Khan SA, Chou KC. iPhosY-PseAAC: identify phosphotyrosine sites by incorporating sequence statistical moments into PseAAC. Mol Biol Rep 2018; 45:2501-2509. [DOI: 10.1007/s11033-018-4417-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2018] [Accepted: 10/01/2018] [Indexed: 10/28/2022]
|
52
|
Chen W, Ding H, Zhou X, Lin H, Chou KC. iRNA(m6A)-PseDNC: Identifying N 6-methyladenosine sites using pseudo dinucleotide composition. Anal Biochem 2018; 561-562:59-65. [PMID: 30201554 DOI: 10.1016/j.ab.2018.09.002] [Citation(s) in RCA: 126] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2018] [Revised: 08/31/2018] [Accepted: 09/03/2018] [Indexed: 01/28/2023]
Abstract
As a prevalent post-transcriptional modification, N6-methyladenosine (m6A) plays key roles in a series of biological processes. Although experimental technologies have been developed and applied to identify m6A sites, they are still cost-ineffective for transcriptome-wide detections of m6A. As good complements to the experimental techniques, some computational methods have been proposed to identify m6A sites. However, their performance remains unsatisfactory. In this study, we firstly proposed an Euclidean distance based method to construct a high quality benchmark dataset. By encoding the RNA sequences using pseudo nucleotide composition, a new predictor called iRNA(m6A)-PseDNC was developed to identify m6A sites in the Saccharomyces cerevisiae genome. It has been demonstrated by the 10-fold cross validation test that the performance of iRNA(m6A)-PseDNC is superior to the existing methods. Meanwhile, for the convenience of most experimental scientists, established at the site http://lin-group.cn/server/iRNA(m6A)-PseDNC.php is its web-server, by which users can easily get their desired results without need to go through the detailed mathematics. It is anticipated that iRNA(m6A)-PseDNC will become a useful high throughput tool for identifying m6A sites in the S. cerevisiae genome.
Collapse
Affiliation(s)
- Wei Chen
- School of Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, 063000, China; Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, 611730, China; Gordon Life Science Institute, Boston, MA, 02478, USA.
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China.
| | - Xu Zhou
- School of Sciences, Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, 063000, China.
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Gordon Life Science Institute, Boston, MA, 02478, USA.
| | - Kuo-Chen Chou
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Gordon Life Science Institute, Boston, MA, 02478, USA.
| |
Collapse
|
53
|
Qiu WR, Sun BQ, Xiao X, Xu ZC, Jia JH, Chou KC. iKcr-PseEns: Identify lysine crotonylation sites in histone proteins with pseudo components and ensemble classifier. Genomics 2018; 110:239-246. [DOI: 10.1016/j.ygeno.2017.10.008] [Citation(s) in RCA: 99] [Impact Index Per Article: 16.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2017] [Revised: 10/23/2017] [Accepted: 10/25/2017] [Indexed: 01/23/2023]
|
54
|
Akbar S, Hayat M. iMethyl-STTNC: Identification of N 6-methyladenosine sites by extending the idea of SAAC into Chou's PseAAC to formulate RNA sequences. J Theor Biol 2018; 455:205-211. [PMID: 30031793 DOI: 10.1016/j.jtbi.2018.07.018] [Citation(s) in RCA: 79] [Impact Index Per Article: 13.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2018] [Revised: 07/14/2018] [Accepted: 07/17/2018] [Indexed: 11/17/2022]
Abstract
N6- methyladenosine (m6A) is a vital post-transcriptional modification, which adds another layer of epigenetic regulation at RNA level. It chemically modifies mRNA that effects protein expression. RNA sequence contains many genetic code motifs (GAC). Among these codes, identification of methylated or not methylated GAC motif is highly indispensable. However, with a large number of RNA sequences generated in post-genomic era, it becomes a challenging task how to accurately and speedily characterize these sequences. In view of this, the concept of an intelligent is incorporated with a computational model that truly and fast reflects the motif of the desired classes. An intelligent computational model "iMethyl-STTNC" model is proposed for identification of methyladenosine sites in RNA. In the proposed study, four feature extraction techniques, such as; Pseudo-dinucleotide-composition, Pseudo-trinucleotide-composition, split-trinucleotide-composition, and split-tetra-nucleotides-composition (STTNC) are utilized for genuine numerical descriptors. Three different classification algorithms including probabilistic neural network, Support vector machine (SVM), and K-nearest neighbor are adopted for prediction. After examining the outcomes of prediction model on each feature spaces, SVM using STTNC feature space reported the highest accuracy of 69.84%, 91.84% on dataset1 and dataset2, respectively. The reported results show that our proposed predictor has achieved encouraging results compared to the present approaches, so far in the research. It is finally reckoned that our developed model might be beneficial for in-depth analysis of genomes and drug development.
Collapse
Affiliation(s)
- Shahid Akbar
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan
| | - Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan.
| |
Collapse
|
55
|
Ju Z, Wang SY. Prediction of citrullination sites by incorporating k-spaced amino acid pairs into Chou's general pseudo amino acid composition. Gene 2018; 664:78-83. [DOI: 10.1016/j.gene.2018.04.055] [Citation(s) in RCA: 76] [Impact Index Per Article: 12.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2017] [Revised: 03/23/2018] [Accepted: 04/18/2018] [Indexed: 01/09/2023]
|
56
|
Ning Q, Zhao X, Bao L, Ma Z, Zhao X. Detecting Succinylation sites from protein sequences using ensemble support vector machine. BMC Bioinformatics 2018; 19:237. [PMID: 29940836 PMCID: PMC6016146 DOI: 10.1186/s12859-018-2249-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2017] [Accepted: 06/14/2018] [Indexed: 12/14/2022] Open
Abstract
Background Lysine succinylation is a new kind of post-translational modification which plays a key role in protein conformation regulation and cellular function control. To understand the mechanism of succinylation profoundly, it is necessary to identify succinylation sites in proteins accurately. However, traditional methods, experimental approaches, are labor-intensive and time-consuming. Computational prediction methods have been proposed recent years, and they are popular because of their convenience and high speed. In this study, we developed a new method to predict succinylation sites in protein combining multiple features, including amino acid composition, binary encoding, physicochemical property and grey pseudo amino acid composition, with a feature selection scheme (information gain). And then, it was trained using SVM (Support Vector Machine) and an ensemble learning algorithm. Results The performance of this method was measured with an accuracy of 89.14% and a MCC (Matthew Correlation Coefficient) of 0.79 using 10-fold cross validation on training dataset and an accuracy of 84.5% and a MCC of 0.2 on independent dataset. Conclusions The conclusions made from this study can help to understand more of the succinylation mechanism. These results suggest that our method was very promising for predicting succinylation sites. The source code and data of this paper are freely available athttps://github.com/ningq669/PSuccE. Electronic supplementary material The online version of this article (10.1186/s12859-018-2249-4) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Qiao Ning
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
| | - Xiaosa Zhao
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
| | - Lingling Bao
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
| | - Zhiqiang Ma
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China.
| | - Xiaowei Zhao
- Key Laboratory of Intelligent Information Processing of Jilin Universities, Northeast Normal University, Changchun, 130117, China.
| |
Collapse
|
57
|
Mattice AMS, MacLean IA, Childers CL, Storey KB. -Characterization of pyruvate kinase from the anoxia tolerant turtle, Trachemys scripta elegans: a potential role for enzyme methylation during metabolic rate depression. PeerJ 2018; 6:e4918. [PMID: 29900073 PMCID: PMC5995096 DOI: 10.7717/peerj.4918] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2018] [Accepted: 05/16/2018] [Indexed: 01/21/2023] Open
Abstract
Background Pyruvate kinase (PK) is responsible for the final reaction in glycolysis. As PK is a glycolytic control point, the analysis of PK posttranslational modifications (PTM) and kinetic changes reveals a key piece of the reorganization of energy metabolism in an anoxia tolerant vertebrate. Methods To explore PK regulation, the enzyme was isolated from red skeletal muscle and liver of aerobic and 20-hr anoxia-exposed red eared-slider turtles (Trachemys scripta elegans). Kinetic analysis and immunoblotting were used to assess enzyme function and the corresponding covalent modifications to the enzymes structure during anoxia. Results Both muscle and liver isoforms showed decreased affinity for phosphoenolpyruvate substrate during anoxia, and muscle PK also had a lower affinity for ADP. I50 values for the inhibitors ATP and lactate were lower for PK from both tissues after anoxic exposure while I50 L-alanine was only reduced in the liver. Both isozymes showed significant increases in threonine phosphorylation (by 42% in muscle and 60% in liver) and lysine methylation (by 43% in muscle and 70% in liver) during anoxia which have been linked to suppression of PK activity in other organisms. Liver PK also showed a 26% decrease in tyrosine phosphorylation under anoxia. Discussion Anoxia responsive changes in turtle muscle and liver PK coordinate with an overall reduced activity state. This reduced affinity for the forward glycolytic reaction is likely a key component of the overall metabolic rate depression that supports long term survival in anoxia tolerant turtles. The coinciding methyl- and phospho- PTM alterations present the mechanism for tissue specific enzyme modification during anoxia.
Collapse
Affiliation(s)
- Amanda M S Mattice
- Institute of Biochemistry, Department of Biology, Carleton University, Ottawa, Canada
| | - Isabelle A MacLean
- Institute of Biochemistry, Department of Biology, Carleton University, Ottawa, Canada
| | - Christine L Childers
- Institute of Biochemistry, Department of Biology, Carleton University, Ottawa, Canada
| | - Kenneth B Storey
- Institute of Biochemistry, Department of Biology and Chemistry, Carleton University, Ottawa, Canada
| |
Collapse
|
58
|
Khan YD, Rasool N, Hussain W, Khan SA, Chou KC. iPhosT-PseAAC: Identify phosphothreonine sites by incorporating sequence statistical moments into PseAAC. Anal Biochem 2018; 550:109-116. [DOI: 10.1016/j.ab.2018.04.021] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/28/2018] [Revised: 04/19/2018] [Accepted: 04/21/2018] [Indexed: 01/29/2023]
|
59
|
Shi S, Wang L, Cao M, Chen G, Yu J. Proteomic analysis and prediction of amino acid variations that influence protein posttranslational modifications. Brief Bioinform 2018; 20:1597-1606. [DOI: 10.1093/bib/bby036] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2018] [Revised: 03/07/2018] [Indexed: 12/18/2022] Open
Abstract
Abstract
Accumulative studies have indicated that amino acid variations through changing the type of residues of the target sites or key flanking residues could directly or indirectly influence protein posttranslational modifications (PTMs) and bring about a detrimental effect on protein function. Computational mutation analysis can greatly narrow down the efforts on experimental work. To increase the utilization of current computational resources, we first provide an overview of computational prediction of amino acid variations that influence protein PTMs and their functional analysis. We also discuss the challenges that are faced while developing novel in silico approaches in the future. The development of better methods for mutation analysis-related protein PTMs will help to facilitate the development of personalized precision medicine.
Collapse
Affiliation(s)
- Shaoping Shi
- Department of Mathematics and Numerical Simulation and High-Performance Computing Laboratory, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
| | - Lina Wang
- Department of Science, Nanchang Institute of Technology, Nanchang, Jiangxi 330031, China
| | - Man Cao
- Department of Mathematics, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
| | - Guodong Chen
- Department of Mathematics, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
| | - Jialin Yu
- Department of Mathematics, School of Sciences, Nanchang University, Nanchang, Jiangxi 330031, China
| |
Collapse
|
60
|
Islam MM, Saha S, Rahman MM, Shatabda S, Farid DM, Dehzangi A. iProtGly-SS: Identifying protein glycation sites using sequence and structure based features. Proteins 2018; 86:777-789. [PMID: 29675975 DOI: 10.1002/prot.25511] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2017] [Revised: 02/27/2018] [Accepted: 04/14/2018] [Indexed: 12/20/2022]
Abstract
Glycation is chemical reaction by which sugar molecule bonds with a protein without the help of enzymes. This is often cause to many diseases and therefore the knowledge about glycation is very important. In this paper, we present iProtGly-SS, a protein lysine glycation site identification method based on features extracted from sequence and secondary structural information. In the experiments, we found the best feature groups combination: Amino Acid Composition, Secondary Structure Motifs, and Polarity. We used support vector machine classifier to train our model and used an optimal set of features using a group based forward feature selection technique. On standard benchmark datasets, our method is able to significantly outperform existing methods for glycation prediction. A web server for iProtGly-SS is implemented and publicly available to use: http://brl.uiu.ac.bd/iprotgly-ss/.
Collapse
Affiliation(s)
- Md Mofijul Islam
- Department of CSE, University of Dhaka, Dhaka, Bangladesh.,Department of CSE, United International University, Dhaka, Bangladesh
| | - Sanjay Saha
- Department of CSE, United International University, Dhaka, Bangladesh
| | | | - Swakkhar Shatabda
- Department of CSE, United International University, Dhaka, Bangladesh
| | - Dewan Md Farid
- Department of CSE, United International University, Dhaka, Bangladesh
| | - Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, MD, 21251, USA
| |
Collapse
|
61
|
iRNAm5C-PseDNC: identifying RNA 5-methylcytosine sites by incorporating physical-chemical properties into pseudo dinucleotide composition. Oncotarget 2018; 8:41178-41188. [PMID: 28476023 PMCID: PMC5522291 DOI: 10.18632/oncotarget.17104] [Citation(s) in RCA: 146] [Impact Index Per Article: 24.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2017] [Accepted: 03/15/2017] [Indexed: 01/24/2023] Open
Abstract
Occurring at cytosine (C) of RNA, 5-methylcytosine (m5C) is an important post-transcriptional modification (PTCM). The modification plays significant roles in biological processes by regulating RNA metabolism in both eukaryotes and prokaryotes. It may also, however, cause cancers and other major diseases. Given an uncharacterized RNA sequence that contains many C residues, can we identify which one of them can be of m5C modification, and which one cannot? It is no doubt a crucial problem, particularly with the explosive growth of RNA sequences in the postgenomic age. Unfortunately, so far no user-friendly web-server whatsoever has been developed to address such a problem. To meet the increasingly high demand from most experimental scientists working in the area of drug development, we have developed a new predictor called iRNAm5C-PseDNC by incorporating ten types of physical-chemical properties into pseudo dinucleotide composition via the auto/cross-covariance approach. Rigorous jackknife tests show that its anticipated accuracy is quite high. For most experimental scientists’ convenience, a user-friendly web-server for the predictor has been provided at http://www.jci-bioinfo.cn/iRNAm5C-PseDNC along with a step-by-step user guide, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. It has not escaped our notice that the approach presented here can also be used to deal with many other problems in genome analysis.
Collapse
|
62
|
iMem-2LSAAC: A two-level model for discrimination of membrane proteins and their types by extending the notion of SAAC into chou's pseudo amino acid composition. J Theor Biol 2018; 442:11-21. [DOI: 10.1016/j.jtbi.2018.01.008] [Citation(s) in RCA: 83] [Impact Index Per Article: 13.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2017] [Revised: 12/23/2017] [Accepted: 01/10/2018] [Indexed: 02/08/2023]
|
63
|
Qiu WR, Sun BQ, Xiao X, Xu ZC, Chou KC. iHyd-PseCp: Identify hydroxyproline and hydroxylysine in proteins by incorporating sequence-coupled effects into general PseAAC. Oncotarget 2018; 7:44310-44321. [PMID: 27322424 PMCID: PMC5190098 DOI: 10.18632/oncotarget.10027] [Citation(s) in RCA: 141] [Impact Index Per Article: 23.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2016] [Accepted: 05/29/2016] [Indexed: 12/30/2022] Open
Abstract
Protein hydroxylation is a posttranslational modification (PTM), in which a CH group in Pro (P) or Lys (K) residue has been converted into a COH group, or a hydroxyl group (−OH) is converted into an organic compound. Closely associated with cellular signaling activities, this type of PTM is also involved in some major diseases, such as stomach cancer and lung cancer. Therefore, from the angles of both basic research and drug development, we are facing a challenging problem: for an uncharacterized protein sequence containing many residues of P or K, which ones can be hydroxylated, and which ones cannot? With the explosive growth of protein sequences in the post-genomic age, the problem has become even more urgent. To address such a problem, we have developed a predictor called iHyd-PseCp by incorporating the sequence-coupled information into the general pseudo amino acid composition (PseAAC) and introducing the “Random Forest” algorithm to operate the calculation. Rigorous jackknife tests indicated that the new predictor remarkably outperformed the existing state-of-the-art prediction method for the same purpose. For the convenience of most experimental scientists, a user-friendly web-server for iHyd-PseCp has been established at http://www.jci-bioinfo.cn/iHyd-PseCp, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved.
Collapse
Affiliation(s)
- Wang-Ren Qiu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.,Department of Computer Science and Bond Life Science Center, University of Missouri, Columbia, MO, USA
| | - Bi-Qian Sun
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Xuan Xiao
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.,Gordon Life Science Institute, Boston, MA, USA
| | - Zhao-Chun Xu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA, USA.,Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia.,Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
| |
Collapse
|
64
|
Chen W, Feng P, Yang H, Ding H, Lin H, Chou KC. iRNA-AI: identifying the adenosine to inosine editing sites in RNA sequences. Oncotarget 2018; 8:4208-4217. [PMID: 27926534 PMCID: PMC5354824 DOI: 10.18632/oncotarget.13758] [Citation(s) in RCA: 199] [Impact Index Per Article: 33.2] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2016] [Accepted: 11/23/2016] [Indexed: 01/14/2023] Open
Abstract
Catalyzed by adenosine deaminase (ADAR), the adenosine to inosine (A-to-I) editing in RNA is not only involved in various important biological processes, but also closely associated with a series of major diseases. Therefore, knowledge about the A-to-I editing sites in RNA is crucially important for both basic research and drug development. Given an uncharacterized RNA sequence that contains many adenosine (A) residues, can we identify which one of them can be of A-to-I editing, and which one cannot? Unfortunately, so far no computational method whatsoever has been developed to address such an important problem based on the RNA sequence information alone. To fill this empty area, we have proposed a predictor called iRNA-AI by incorporating the chemical properties of nucleotides and their sliding occurrence density distribution along a RNA sequence into the general form of pseudo nucleotide composition (PseKNC). It has been shown by the rigorous jackknife test and independent dataset test that the performance of the proposed predictor is quite promising. For the convenience of most experimental scientists, a user-friendly web-server for iRNA-AI has been established at http://lin.uestc.edu.cn/server/iRNA-AI/, by which users can easily get their desired results without the need to go through the mathematical details.
Collapse
Affiliation(s)
- Wei Chen
- Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, Tangshan, China.,Gordon Life Science Institute, Belmont, Massachusetts, United States of America
| | - Pengmian Feng
- Hebei Province Key Laboratory of Occupational Health and Safety for Coal Industry, School of Public Health, North China University of Science and Technology, Tangshan, China
| | - Hui Yang
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hui Ding
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.,Gordon Life Science Institute, Belmont, Massachusetts, United States of America
| | - Kuo-Chen Chou
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.,Gordon Life Science Institute, Belmont, Massachusetts, United States of America
| |
Collapse
|
65
|
Dehzangi A, López Y, Lal SP, Taherzadeh G, Sattar A, Tsunoda T, Sharma A. Improving succinylation prediction accuracy by incorporating the secondary structure via helix, strand and coil, and evolutionary information from profile bigrams. PLoS One 2018; 13:e0191900. [PMID: 29432431 PMCID: PMC5809022 DOI: 10.1371/journal.pone.0191900] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2017] [Accepted: 01/12/2018] [Indexed: 11/18/2022] Open
Abstract
Post-translational modification refers to the biological mechanism involved in the enzymatic modification of proteins after being translated in the ribosome. This mechanism comprises a wide range of structural modifications, which bring dramatic variations to the biological function of proteins. One of the recently discovered modifications is succinylation. Although succinylation can be detected through mass spectrometry, its current experimental detection turns out to be a timely process unable to meet the exponential growth of sequenced proteins. Therefore, the implementation of fast and accurate computational methods has emerged as a feasible solution. This paper proposes a novel classification approach, which effectively incorporates the secondary structure and evolutionary information of proteins through profile bigrams for succinylation prediction. The proposed predictor, abbreviated as SSEvol-Suc, made use of the above features for training an AdaBoost classifier and consequently predicting succinylated lysine residues. When SSEvol-Suc was compared with four benchmark predictors, it outperformed them in metrics such as sensitivity (0.909), accuracy (0.875) and Matthews correlation coefficient (0.75).
Collapse
Affiliation(s)
- Abdollah Dehzangi
- Department of Computer Science, Morgan State University, Baltimore, Maryland, United States of America
| | - Yosvany López
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- * E-mail:
| | - Sunil Pranit Lal
- School of Engineering & Advanced Technology, Massey University, Palmerston North, New Zealand
| | - Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Queensland, Australia
| | - Abdul Sattar
- School of Information and Communication Technology, Griffith University, Queensland, Australia
- Institute for Integrated and Intelligent Systems, Griffith University, Queensland, Australia
| | - Tatsuhiko Tsunoda
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- CREST, JST, Tokyo, Japan
| | - Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Institute for Integrated and Intelligent Systems, Griffith University, Queensland, Australia
- School of Engineering & Physics, University of the South Pacific, Suva, Fiji
| |
Collapse
|
66
|
Jia C, Zuo Y, Zou Q. O-GlcNAcPRED-II: an integrated classification algorithm for identifying O-GlcNAcylation sites based on fuzzy undersampling and a K-means PCA oversampling technique. Bioinformatics 2018; 34:2029-2036. [DOI: 10.1093/bioinformatics/bty039] [Citation(s) in RCA: 97] [Impact Index Per Article: 16.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2017] [Accepted: 02/05/2018] [Indexed: 11/13/2022] Open
Affiliation(s)
- Cangzhi Jia
- Department of Mathematics, Dalian Maritime University, Dalian, China
| | - Yun Zuo
- Department of Mathematics, Dalian Maritime University, Dalian, China
| | - Quan Zou
- School of Computer Science and Technology, Tianjin University, Tianjin, China
| |
Collapse
|
67
|
Qiu WR, Xiao X, Xu ZC, Chou KC. iPhos-PseEn: identifying phosphorylation sites in proteins by fusing different pseudo components into an ensemble classifier. Oncotarget 2018; 7:51270-51283. [PMID: 27323404 PMCID: PMC5239474 DOI: 10.18632/oncotarget.9987] [Citation(s) in RCA: 132] [Impact Index Per Article: 22.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2016] [Accepted: 05/23/2016] [Indexed: 11/26/2022] Open
Abstract
Protein phosphorylation is a posttranslational modification (PTM or PTLM), where a phosphoryl group is added to the residue(s) of a protein molecule. The most commonly phosphorylated amino acids occur at serine (S), threonine (T), and tyrosine (Y). Protein phosphorylation plays a significant role in a wide range of cellular processes; meanwhile its dysregulation is also involved with many diseases. Therefore, from the angles of both basic research and drug development, we are facing a challenging problem: for an uncharacterized protein sequence containing many residues of S, T, or Y, which ones can be phosphorylated, and which ones cannot? To address this problem, we have developed a predictor called iPhos-PseEn by fusing four different pseudo component approaches (amino acids’ disorder scores, nearest neighbor scores, occurrence frequencies, and position weights) into an ensemble classifier via a voting system. Rigorous cross-validations indicated that the proposed predictor remarkably outperformed its existing counterparts. For the convenience of most experimental scientists, a user-friendly web-server for iPhos-PseEn has been established at http://www.jci-bioinfo.cn/iPhos-PseEn, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved.
Collapse
Affiliation(s)
- Wang-Ren Qiu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.,Department of Computer Science and Bond Life Science Center, University of Missouri, Columbia, MO, USA
| | - Xuan Xiao
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China.,Gordon Life Science Institute, Boston, MA, USA
| | - Zhao-Chun Xu
- Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA, USA.,Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia.,Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
| |
Collapse
|
68
|
Xiao X, Ye HX, Liu Z, Jia JH, Chou KC. iROS-gPseKNC: Predicting replication origin sites in DNA by incorporating dinucleotide position-specific propensity into general pseudo nucleotide composition. Oncotarget 2018; 7:34180-9. [PMID: 27147572 PMCID: PMC5085147 DOI: 10.18632/oncotarget.9057] [Citation(s) in RCA: 109] [Impact Index Per Article: 18.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2016] [Accepted: 04/09/2016] [Indexed: 11/25/2022] Open
Abstract
DNA replication, occurring in all living organisms and being the basis for biological inheritance, is the process of producing two identical replicas from one original DNA molecule. To in-depth understand such an important biological process and use it for developing new strategy against genetics diseases, the knowledge of duplication origin sites in DNA is indispensible. With the explosive growth of DNA sequences emerging in the postgenomic age, it is highly desired to develop high throughput tools to identify these regions purely based on the sequence information alone. In this paper, by incorporating the dinucleotide position-specific propensity information into the general pseudo nucleotide composition and using the random forest classifier, a new predictor called iROS-gPseKNC was proposed. Rigorously cross-validations have indicated that the proposed predictor is significantly better than the best existing method in sensitivity, specificity, overall accuracy, and stability. Furthermore, a user-friendly web-server for iROS-gPseKNC has been established at http://www.jci-bioinfo.cn/iROS-gPseKNC, by which users can easily get their desired results without the need to bother the complicated mathematics, which were presented just for the integrity of the methodology itself.
Collapse
Affiliation(s)
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, 333403, China.,Information School, ZheJiang Textile and Fashion College, NingBo, 315211, China.,Gordon Life Science Institute, Boston, Massachusetts, 02478, USA
| | - Han-Xiao Ye
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, 333403, China
| | - Zi Liu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
| | - Jian-Hua Jia
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen, 333403, China
| | - Kuo-Chen Chou
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, 21589, Saudi Arabia.,Gordon Life Science Institute, Boston, Massachusetts, 02478, USA
| |
Collapse
|
69
|
López Y, Sharma A, Dehzangi A, Lal SP, Taherzadeh G, Sattar A, Tsunoda T. Success: evolutionary and structural properties of amino acids prove effective for succinylation site prediction. BMC Genomics 2018; 19:923. [PMID: 29363424 PMCID: PMC5781056 DOI: 10.1186/s12864-017-4336-8] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Post-translational modification is considered an important biological mechanism with critical impact on the diversification of the proteome. Although a long list of such modifications has been studied, succinylation of lysine residues has recently attracted the interest of the scientific community. The experimental detection of succinylation sites is an expensive process, which consumes a lot of time and resources. Therefore, computational predictors of this covalent modification have emerged as a last resort to tackling lysine succinylation. RESULTS In this paper, we propose a novel computational predictor called 'Success', which efficiently uses the structural and evolutionary information of amino acids for predicting succinylation sites. To do this, each lysine was described as a vector that combined the above information of surrounding amino acids. We then designed a support vector machine with a radial basis function kernel for discriminating between succinylated and non-succinylated residues. We finally compared the Success predictor with three state-of-the-art predictors in the literature. As a result, our proposed predictor showed a significant improvement over the compared predictors in statistical metrics, such as sensitivity (0.866), accuracy (0.838) and Matthews correlation coefficient (0.677) on a benchmark dataset. CONCLUSIONS The proposed predictor effectively uses the structural and evolutionary information of the amino acids surrounding a lysine. The bigram feature extraction approach, while retaining the same number of features, facilitates a better description of lysines. A support vector machine with a radial basis function kernel was used to discriminate between modified and unmodified lysines. The aforementioned aspects make the Success predictor outperform three state-of-the-art predictors in succinylation detection.
Collapse
Affiliation(s)
- Yosvany López
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan. .,Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.
| | - Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan. .,Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia. .,School of Engineering & Physics, University of the South Pacific, Suva, Fiji.
| | - Abdollah Dehzangi
- Department of Computer Science, School of Computer, Mathematical, and Natural Sciences, Morgan State University, Baltimore, Maryland, USA
| | - Sunil Pranit Lal
- School of Engineering & Advanced Technology, Massey University, Palmerston North, New Zealand
| | - Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Brisbane, Australia
| | - Abdul Sattar
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia.,School of Information and Communication Technology, Griffith University, Brisbane, Australia
| | - Tatsuhiko Tsunoda
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan.,Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.,CREST, JST, Tokyo, 113-8510, Japan
| |
Collapse
|
70
|
Jia J, Liu Z, Xiao X, Liu B, Chou KC. iCar-PseCp: identify carbonylation sites in proteins by Monte Carlo sampling and incorporating sequence coupled effects into general PseAAC. Oncotarget 2018; 7:34558-70. [PMID: 27153555 PMCID: PMC5085176 DOI: 10.18632/oncotarget.9148] [Citation(s) in RCA: 149] [Impact Index Per Article: 24.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2016] [Accepted: 04/09/2016] [Indexed: 01/22/2023] Open
Abstract
Carbonylation is a posttranslational modification (PTM or PTLM), where a carbonyl group is added to lysine (K), proline (P), arginine (R), and threonine (T) residue of a protein molecule. Carbonylation plays an important role in orchestrating various biological processes but it is also associated with many diseases such as diabetes, chronic lung disease, Parkinson's disease, Alzheimer's disease, chronic renal failure, and sepsis. Therefore, from the angles of both basic research and drug development, we are facing a challenging problem: for an uncharacterized protein sequence containing many residues of K, P, R, or T, which ones can be carbonylated, and which ones cannot? To address this problem, we have developed a predictor called iCar-PseCp by incorporating the sequence-coupled information into the general pseudo amino acid composition, and balancing out skewed training dataset by Monte Carlo sampling to expand positive subset. Rigorous target cross-validations on a same set of carbonylation-known proteins indicated that the new predictor remarkably outperformed its existing counterparts. For the convenience of most experimental scientists, a user-friendly web-server for iCar-PseCp has been established at http://www.jci-bioinfo.cn/iCar-PseCp, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can also be used to analyze many other problems in computational proteomics.
Collapse
Affiliation(s)
- Jianhua Jia
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403 China.,Gordon Life Science Institute, Boston, MA 02478, USA
| | - Zi Liu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
| | - Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403 China.,Gordon Life Science Institute, Boston, MA 02478, USA
| | - Bingxiang Liu
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403 China
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, USA.,Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
| |
Collapse
|
71
|
iACP: a sequence-based tool for identifying anticancer peptides. Oncotarget 2017; 7:16895-909. [PMID: 26942877 PMCID: PMC4941358 DOI: 10.18632/oncotarget.7815] [Citation(s) in RCA: 300] [Impact Index Per Article: 42.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/06/2016] [Accepted: 02/11/2016] [Indexed: 02/07/2023] Open
Abstract
Cancer remains a major killer worldwide. Traditional methods of cancer treatment are expensive and have some deleterious side effects on normal cells. Fortunately, the discovery of anticancer peptides (ACPs) has paved a new way for cancer treatment. With the explosive growth of peptide sequences generated in the post genomic age, it is highly desired to develop computational methods for rapidly and effectively identifying ACPs, so as to speed up their application in treating cancer. Here we report a sequence-based predictor called iACP developed by the approach of optimizing the g-gap dipeptide components. It was demonstrated by rigorous cross-validations that the new predictor remarkably outperformed the existing predictors for the same purpose in both overall accuracy and stability. For the convenience of most experimental scientists, a publicly accessible web-server for iACP has been established at http://lin.uestc.edu.cn/server/iACP, by which users can easily obtain their desired results.
Collapse
|
72
|
Du PF, Zhao W, Miao YY, Wei LY, Wang L. UltraPse: A Universal and Extensible Software Platform for Representing Biological Sequences. Int J Mol Sci 2017; 18:ijms18112400. [PMID: 29135934 PMCID: PMC5713368 DOI: 10.3390/ijms18112400] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2017] [Revised: 11/01/2017] [Accepted: 11/03/2017] [Indexed: 01/12/2023] Open
Abstract
With the avalanche of biological sequences in public databases, one of the most challenging problems in computational biology is to predict their biological functions and cellular attributes. Most of the existing prediction algorithms can only handle fixed-length numerical vectors. Therefore, it is important to be able to represent biological sequences with various lengths using fixed-length numerical vectors. Although several algorithms, as well as software implementations, have been developed to address this problem, these existing programs can only provide a fixed number of representation modes. Every time a new sequence representation mode is developed, a new program will be needed. In this paper, we propose the UltraPse as a universal software platform for this problem. The function of the UltraPse is not only to generate various existing sequence representation modes, but also to simplify all future programming works in developing novel representation modes. The extensibility of UltraPse is particularly enhanced. It allows the users to define their own representation mode, their own physicochemical properties, or even their own types of biological sequences. Moreover, UltraPse is also the fastest software of its kind. The source code package, as well as the executables for both Linux and Windows platforms, can be downloaded from the GitHub repository.
Collapse
Affiliation(s)
- Pu-Feng Du
- School of Computer Science and Technology, Tianjin University, Tianjin 300350, China.
| | - Wei Zhao
- School of Computer Science and Technology, Tianjin University, Tianjin 300350, China.
| | - Yang-Yang Miao
- School of Computer Science and Technology, Tianjin University, Tianjin 300350, China.
- School of Chemical Engineering, Tianjin University, Tianjin 300350, China.
| | - Le-Yi Wei
- School of Computer Science and Technology, Tianjin University, Tianjin 300350, China.
| | - Likun Wang
- Institute of Systems Biomedicine, Beijing Key Laboratory of Tumor Systems Biology, Department of Pathology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China.
| |
Collapse
|
73
|
Ju Z, Cao JZ. Prediction of protein N-formylation using the composition of k-spaced amino acid pairs. Anal Biochem 2017; 534:40-45. [DOI: 10.1016/j.ab.2017.07.011] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2017] [Revised: 06/05/2017] [Accepted: 07/10/2017] [Indexed: 02/04/2023]
|
74
|
Prediction of lysine crotonylation sites by incorporating the composition of k -spaced amino acid pairs into Chou’s general PseAAC. J Mol Graph Model 2017; 77:200-204. [DOI: 10.1016/j.jmgm.2017.08.020] [Citation(s) in RCA: 70] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2017] [Revised: 08/21/2017] [Accepted: 08/21/2017] [Indexed: 12/11/2022]
|
75
|
Liu B, Wu H, Zhang D, Wang X, Chou KC. Pse-Analysis: a python package for DNA/RNA and protein/ peptide sequence analysis based on pseudo components and kernel methods. Oncotarget 2017; 8:13338-13343. [PMID: 28076851 PMCID: PMC5355101 DOI: 10.18632/oncotarget.14524] [Citation(s) in RCA: 114] [Impact Index Per Article: 16.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2016] [Accepted: 12/27/2016] [Indexed: 12/20/2022] Open
Abstract
To expedite the pace in conducting genome/proteome analysis, we have developed a Python package called Pse-Analysis. The powerful package can automatically complete the following five procedures: (1) sample feature extraction, (2) optimal parameter selection, (3) model training, (4) cross validation, and (5) evaluating prediction quality. All the work a user needs to do is to input a benchmark dataset along with the query biological sequences concerned. Based on the benchmark dataset, Pse-Analysis will automatically construct an ideal predictor, followed by yielding the predicted results for the submitted query samples. All the aforementioned tedious jobs can be automatically done by the computer. Moreover, the multiprocessing technique was adopted to enhance computational speed by about 6 folds. The Pse-Analysis Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/Pse-Analysis/, and can be directly run on Windows, Linux, and Unix.
Collapse
Affiliation(s)
- Bin Liu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.,Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.,Gordon Life Science Institute, Boston, Massachusetts, USA
| | - Hao Wu
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
| | - Deyuan Zhang
- School of Computer, Shenyang Aerospace University, Shenyang, Liaoning, China
| | - Xiaolong Wang
- School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China.,Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
| | - Kuo-Chen Chou
- Gordon Life Science Institute, Boston, Massachusetts, USA.,Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| |
Collapse
|
76
|
Kumar P, Joy J, Pandey A, Gupta D. PRmePRed: A protein arginine methylation prediction tool. PLoS One 2017; 12:e0183318. [PMID: 28813517 PMCID: PMC5557562 DOI: 10.1371/journal.pone.0183318] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2017] [Accepted: 08/02/2017] [Indexed: 12/16/2022] Open
Abstract
Protein methylation is an important Post-Translational Modification (PTMs) of proteins. Arginine methylation carries out and regulates several important biological functions, including gene regulation and signal transduction. Experimental identification of arginine methylation site is a daunting task as it is costly as well as time and labour intensive. Hence reliable prediction tools play an important task in rapid screening and identification of possible methylation sites in proteomes. Our preliminary assessment using the available prediction methods on collected data yielded unimpressive results. This motivated us to perform a comprehensive data analysis and appraisal of features relevant in the context of biological significance, that led to the development of a prediction tool PRmePRed with better performance. The PRmePRed perform reasonably well with an accuracy of 84.10%, 82.38% sensitivity, 83.77% specificity, and Matthew’s correlation coefficient of 66.20% in 10-fold cross-validation. PRmePRed is freely available at http://bioinfo.icgeb.res.in/PRmePRed/
Collapse
Affiliation(s)
- Pawan Kumar
- Translational Bioinformatics Group, ICGEB, New Delhi, India
| | - Joseph Joy
- Translational Bioinformatics Group, ICGEB, New Delhi, India
| | | | - Dinesh Gupta
- Translational Bioinformatics Group, ICGEB, New Delhi, India
- * E-mail:
| |
Collapse
|
77
|
Ju Z, He JJ. Prediction of lysine propionylation sites using biased SVM and incorporating four different sequence features into Chou's PseAAC. J Mol Graph Model 2017; 76:356-363. [PMID: 28763688 DOI: 10.1016/j.jmgm.2017.07.022] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2017] [Revised: 07/20/2017] [Accepted: 07/21/2017] [Indexed: 12/21/2022]
Abstract
Lysine propionylation is an important and common protein acylation modification in both prokaryotes and eukaryotes. To better understand the molecular mechanism of propionylation, it is important to identify propionylated substrates and their corresponding propionylation sites accurately. In this study, a novel bioinformatics tool named PropPred is developed to predict propionylation sites by using multiple feature extraction and biased support vector machine. On the one hand, various features are incorporated, including amino acid composition, amino acid factors, binary encoding, and the composition of k-spaced amino acid pairs. And the F-score feature method and the incremental feature selection algorithm are adopted to remove the redundant features. On the other hand, the biased support vector machine algorithm is used to handle the imbalanced problem in propionylation sites training dataset. As illustrated by 10-fold cross-validation, the performance of PropPred achieves a satisfactory performance with a Sensitivity of 70.03%, a Specificity of 75.61%, an accuracy of 75.02% and a Matthew's correlation coefficient of 0.3085. Feature analysis shows that some amino acid factors play the most important roles in the prediction of propionylation sites. These analysis and prediction results might provide some clues for understanding the molecular mechanisms of propionylation. A user-friendly web-server for PropPred is established at 123.206.31.171/PropPred/.
Collapse
Affiliation(s)
- Zhe Ju
- College of Science, Shenyang Aerospace University, 110136, People's Republic of China.
| | - Jian-Jun He
- College of Information and Communication Engineering, Dalian Minzu University, 116600, People's Republic of China.
| |
Collapse
|
78
|
Akbar S, Hayat M, Iqbal M, Jan MA. iACP-GAEnsC: Evolutionary genetic algorithm based ensemble classification of anticancer peptides by utilizing hybrid feature space. Artif Intell Med 2017; 79:62-70. [PMID: 28655440 DOI: 10.1016/j.artmed.2017.06.008] [Citation(s) in RCA: 94] [Impact Index Per Article: 13.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2017] [Revised: 06/12/2017] [Accepted: 06/16/2017] [Indexed: 01/10/2023]
Abstract
Cancer is a fatal disease, responsible for one-quarter of all deaths in developed countries. Traditional anticancer therapies such as, chemotherapy and radiation, are highly expensive, susceptible to errors and ineffective techniques. These conventional techniques induce severe side-effects on human cells. Due to perilous impact of cancer, the development of an accurate and highly efficient intelligent computational model is desirable for identification of anticancer peptides. In this paper, evolutionary intelligent genetic algorithm-based ensemble model, 'iACP-GAEnsC', is proposed for the identification of anticancer peptides. In this model, the protein sequences are formulated, using three different discrete feature representation methods, i.e., amphiphilic Pseudo amino acid composition, g-Gap dipeptide composition, and Reduce amino acid alphabet composition. The performance of the extracted feature spaces are investigated separately and then merged to exhibit the significance of hybridization. In addition, the predicted results of individual classifiers are combined together, using optimized genetic algorithm and simple majority technique in order to enhance the true classification rate. It is observed that genetic algorithm-based ensemble classification outperforms than individual classifiers as well as simple majority voting base ensemble. The performance of genetic algorithm-based ensemble classification is highly reported on hybrid feature space, with an accuracy of 96.45%. In comparison to the existing techniques, 'iACP-GAEnsC' model has achieved remarkable improvement in terms of various performance metrics. Based on the simulation results, it is observed that 'iACP-GAEnsC' model might be a leading tool in the field of drug design and proteomics for researchers.
Collapse
Affiliation(s)
- Shahid Akbar
- Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan.
| | - Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan.
| | - Muhammad Iqbal
- Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan.
| | - Mian Ahmad Jan
- Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan.
| |
Collapse
|
79
|
Porto WF, Pires ÁS, Franco OL. Antimicrobial activity predictors benchmarking analysis using shuffled and designed synthetic peptides. J Theor Biol 2017; 426:96-103. [PMID: 28536036 DOI: 10.1016/j.jtbi.2017.05.011] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2017] [Revised: 05/05/2017] [Accepted: 05/09/2017] [Indexed: 12/29/2022]
Abstract
The antimicrobial activity prediction tools aim to help the novel antimicrobial peptides (AMP) sequences discovery, utilizing machine learning methods. Such approaches have gained increasing importance in the generation of novel synthetic peptides by means of rational design techniques. This study focused on predictive ability of such approaches to determine the antimicrobial sequence activities, which were previously characterized at the protein level by in vitro studies. Using four web servers and one standalone software, we evaluated 78 sequences generated by the so-called linguistic model, being 40 designed and 38 shuffled sequences, with ∼60 and ∼25% of identity to AMPs, respectively. The ab initio molecular modelling of such sequences indicated that the structure does not affect the predictions, as both sets present similar structures. Overall, the systems failed on predicting shuffled versions of designed peptides, as they are identical in AMPs composition, which implies in accuracies below 30%. The prediction accuracy is negatively affected by the low specificity of all systems here evaluated, as they, on the other hand, reached 100% of sensitivity. Our results suggest that complementary approaches with high specificity, not necessarily high accuracy, should be developed to be used together with the current systems, overcoming their limitations.
Collapse
Affiliation(s)
- William F Porto
- Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia Universidade Católica de Brasília, Brasília, Distrito Federal, Brazil; Porto Reports, Brasília, Distrito Federal, Brazil
| | - Állan S Pires
- Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia Universidade Católica de Brasília, Brasília, Distrito Federal, Brazil
| | - Octavio L Franco
- Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia Universidade Católica de Brasília, Brasília, Distrito Federal, Brazil; S-Inova Biotech, Pós-graduação em Biotecnologia, Universidade Católica Dom Bosco, Campo Grande, Mato Grosso do Sul, Brazil.
| |
Collapse
|
80
|
Dehzangi A, López Y, Lal SP, Taherzadeh G, Michaelson J, Sattar A, Tsunoda T, Sharma A. PSSM-Suc: Accurately predicting succinylation using position specific scoring matrix into bigram for feature extraction. J Theor Biol 2017; 425:97-102. [PMID: 28483566 DOI: 10.1016/j.jtbi.2017.05.005] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/30/2017] [Revised: 04/28/2017] [Accepted: 05/03/2017] [Indexed: 11/25/2022]
Abstract
Post-translational modification (PTM) is a covalent and enzymatic modification of proteins, which contributes to diversify the proteome. Despite many reported PTMs with essential roles in cellular functioning, lysine succinylation has emerged as a subject of particular interest. Because its experimental identification remains a costly and time-consuming process, computational predictors have been recently proposed for tackling this important issue. However, the performance of current predictors is still very limited. In this paper, we propose a new predictor called PSSM-Suc which employs evolutionary information of amino acids for predicting succinylated lysine residues. Here we described each lysine residue in terms of profile bigrams extracted from position specific scoring matrices. We compared the performance of PSSM-Suc to that of existing predictors using a widely used benchmark dataset. PSSM-Suc showed a significant improvement in performance over state-of-the-art predictors. Its sensitivity, accuracy and Matthews correlation coefficient were 0.8159, 0.8199 and 0.6396, respectively.
Collapse
Affiliation(s)
- Abdollah Dehzangi
- Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa, USA.
| | - Yosvany López
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.
| | - Sunil Pranit Lal
- School of Engineering & Advanced Technology, Massey University, New Zealand
| | - Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
| | - Jacob Michaelson
- Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa, USA
| | - Abdul Sattar
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia; Institute for Integrated and Intelligent Systems, Griffith University, Australia
| | - Tatsuhiko Tsunoda
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan; CREST, JST, Tokyo 113-8510, Japan
| | - Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan; Institute for Integrated and Intelligent Systems, Griffith University, Australia; School of Engineering & Physics, University of the South Pacific, Fiji
| |
Collapse
|
81
|
López Y, Dehzangi A, Lal SP, Taherzadeh G, Michaelson J, Sattar A, Tsunoda T, Sharma A. SucStruct: Prediction of succinylated lysine residues by using structural properties of amino acids. Anal Biochem 2017; 527:24-32. [PMID: 28363440 DOI: 10.1016/j.ab.2017.03.021] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2016] [Revised: 03/13/2017] [Accepted: 03/28/2017] [Indexed: 11/30/2022]
Abstract
Post-Translational Modification (PTM) is a biological reaction which contributes to diversify the proteome. Despite many modifications with important roles in cellular activity, lysine succinylation has recently emerged as an important PTM mark. It alters the chemical structure of lysines, leading to remarkable changes in the structure and function of proteins. In contrast to the huge amount of proteins being sequenced in the post-genome era, the experimental detection of succinylated residues remains expensive, inefficient and time-consuming. Therefore, the development of computational tools for accurately predicting succinylated lysines is an urgent necessity. To date, several approaches have been proposed but their sensitivity has been reportedly poor. In this paper, we propose an approach that utilizes structural features of amino acids to improve lysine succinylation prediction. Succinylated and non-succinylated lysines were first retrieved from 670 proteins and characteristics such as accessible surface area, backbone torsion angles and local structure conformations were incorporated. We used the k-nearest neighbors cleaning treatment for dealing with class imbalance and designed a pruned decision tree for classification. Our predictor, referred to as SucStruct (Succinylation using Structural features), proved to significantly improve performance when compared to previous predictors, with sensitivity, accuracy and Mathew's correlation coefficient equal to 0.7334-0.7946, 0.7444-0.7608 and 0.4884-0.5240, respectively.
Collapse
Affiliation(s)
- Yosvany López
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan.
| | - Abdollah Dehzangi
- Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa, USA.
| | - Sunil Pranit Lal
- School of Engineering & Advanced Technology, Massey University, New Zealand
| | - Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
| | - Jacob Michaelson
- Department of Psychiatry, Carver College of Medicine, University of Iowa, Iowa, USA
| | - Abdul Sattar
- School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia; Institute for Integrated and Intelligent Systems, Griffith University, Australia
| | - Tatsuhiko Tsunoda
- Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan; Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan; CREST, JST, Tokyo 113-8510, Japan
| | - Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan; Institute for Integrated and Intelligent Systems, Griffith University, Australia
| |
Collapse
|
82
|
Wu C, Yao S, Li X, Chen C, Hu X. Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human. Int J Mol Sci 2017; 18:E420. [PMID: 28212312 PMCID: PMC5343954 DOI: 10.3390/ijms18020420] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2017] [Revised: 02/03/2017] [Accepted: 02/08/2017] [Indexed: 02/02/2023] Open
Abstract
DNA methylation plays a significant role in transcriptional regulation by repressing activity. Change of the DNA methylation level is an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can only assay a small proportion of CpG sites in the human genome, it is urgent to develop reliable computational models for predicting genome-wide DNA methylation. Here, we proposed a novel algorithm that accurately extracted sequence complexity features (seven features) and developed a support-vector-machine-based prediction model with integration of the reported DNA composition features (trinucleotide frequency and GC content, 65 features) by utilizing the methylation profiles of embryonic stem cells in human. The prediction results from 22 human chromosomes with size-varied windows showed that the 600-bp window achieved the best average accuracy of 94.7%. Moreover, comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data also demonstrated that our model has certain generalization ability. Finally, a statistical test of the experimental data and the predicted data on functional regions annotated by ChromHMM found that six out of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable in predicting DNA methylation.
Collapse
Affiliation(s)
- Chengchao Wu
- College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, China.
| | - Shixin Yao
- College of Science, Huazhong Agricultural University, Wuhan 430070, China.
| | - Xinghao Li
- College of Science, Huazhong Agricultural University, Wuhan 430070, China.
| | - Chujia Chen
- College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, China.
| | - Xuehai Hu
- College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, China.
| |
Collapse
|
83
|
Huang G, Chu C, Huang T, Kong X, Zhang Y, Zhang N, Cai YD. Exploring Mouse Protein Function via Multiple Approaches. PLoS One 2016; 11:e0166580. [PMID: 27846315 PMCID: PMC5112993 DOI: 10.1371/journal.pone.0166580] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2016] [Accepted: 10/31/2016] [Indexed: 01/16/2023] Open
Abstract
Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind. Therefore, accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we presented a novel approach to predict mouse protein functions. The approach was a sequential combination of a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in the leave-one-out and ten-fold cross-validations. For the results yielded by the leave-one-out cross-validation, although the similarity-based approach alone achieved an accuracy of 0.8756, it was unable to predict the functions of proteins with no homologues. Comparatively, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786. Although the accuracy was lower than that of the previous approach, it could predict the functions of almost all proteins, even proteins with no homologues. Therefore, the combined method balanced the advantages and disadvantages of both approaches to achieve efficient performance. Furthermore, the results yielded by the ten-fold cross-validation indicate that the combined method is still effective and stable when there are no close homologs are available. However, the accuracy of the predicted functions can only be determined according to known protein functions based on current knowledge. Many protein functions remain unknown. By exploring the functions of proteins for which the 1st-order predicted functions are wrong but the 2nd-order predicted functions are correct, the 1st-order wrongly predicted functions were shown to be closely associated with the genes encoding the proteins. The so-called wrongly predicted functions could also potentially be correct upon future experimental verification. Therefore, the accuracy of the presented method may be much higher in reality.
Collapse
Affiliation(s)
- Guohua Huang
- Department of Mathematics, Shaoyang University, Shaoyang, Hunan, 422000, China
| | - Chen Chu
- Institute of Biochemistry and Cell Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Xiangyin Kong
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| | - Yunhua Zhang
- College of Life Science, Anhui Agricultural University, Hefei, Anhui, 230036, China
| | - Ning Zhang
- Department of Biomedical Engineering, Tianjin Key Lab of Biomedical Engineering Measurement, Tianjin University, Tianjin, China
- * E-mail: (NZ); (Y-DC)
| | - Yu-Dong Cai
- School of Life Sciences, Shanghai University, 99 Shangda Road, Shanghai, 200444, China
- * E-mail: (NZ); (Y-DC)
| |
Collapse
|
84
|
Wen PP, Shi SP, Xu HD, Wang LN, Qiu JD. Accurate in silico prediction of species-specific methylation sites based on information gain feature optimization. Bioinformatics 2016; 32:3107-3115. [PMID: 27354692 DOI: 10.1093/bioinformatics/btw377] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2016] [Accepted: 06/13/2016] [Indexed: 02/04/2023] Open
Abstract
As one of the most important reversible types of post-translational modification, protein methylation catalyzed by methyltransferases carries many pivotal biological functions as well as many essential biological processes. Identification of methylation sites is prerequisite for decoding methylation regulatory networks in living cells and understanding their physiological roles. Experimental methods are limitations of labor-intensive and time-consuming. While in silicon approaches are cost-effective and high-throughput manner to predict potential methylation sites, but those previous predictors only have a mixed model and their prediction performances are not fully satisfactory now. Recently, with increasing availability of quantitative methylation datasets in diverse species (especially in eukaryotes), there is a growing need to develop a species-specific predictor. Here, we designed a tool named PSSMe based on information gain (IG) feature optimization method for species-specific methylation site prediction. The IG method was adopted to analyze the importance and contribution of each feature, then select the valuable dimension feature vectors to reconstitute a new orderly feature, which was applied to build the finally prediction model. Finally, our method improves prediction performance of accuracy about 15% comparing with single features. Furthermore, our species-specific model significantly improves the predictive performance compare with other general methylation prediction tools. Hence, our prediction results serve as useful resources to elucidate the mechanism of arginine or lysine methylation and facilitate hypothesis-driven experimental design and validation. AVAILABILITY AND IMPLEMENTATION The tool online service is implemented by C# language and freely available at http://bioinfo.ncu.edu.cn/PSSMe.aspx CONTACT: jdqiu@ncu.edu.cnSupplementary information: Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Ping-Ping Wen
- Department of Chemistry, Department of Mathematics, Nanchang University, Nanchang 330031, China
| | - Shao-Ping Shi
- Department of Chemistry, Department of Mathematics, Nanchang University, Nanchang 330031, China
| | - Hao-Dong Xu
- Department of Chemistry, Department of Mathematics, Nanchang University, Nanchang 330031, China
| | - Li-Na Wang
- Department of Chemistry, Department of Mathematics, Nanchang University, Nanchang 330031, China
| | - Jian-Ding Qiu
- Department of Chemistry, Department of Mathematics, Nanchang University, Nanchang 330031, China Department of Materials and Chemical Engineering, Pingxiang University, Pingxiang 337055, China
| |
Collapse
|
85
|
Improving N(6)-methyladenosine site prediction with heuristic selection of nucleotide physical-chemical properties. Anal Biochem 2016; 508:104-13. [PMID: 27293216 DOI: 10.1016/j.ab.2016.06.001] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2016] [Revised: 05/31/2016] [Accepted: 06/01/2016] [Indexed: 12/28/2022]
Abstract
N(6)-methyladenosine (m(6)A) is one of the most common and abundant post-transcriptional RNA modifications found in viruses and most eukaryotes. m(6)A plays an essential role in many vital biological processes to regulate gene expression. Because of its widespread distribution across the genomes, the identification of m(6)A sites from RNA sequences is of significant importance for better understanding the regulatory mechanism of m(6)A. Although progress has been achieved in m(6)A site prediction, challenges remain. This article aims to further improve the performance of m(6)A site prediction by introducing a new heuristic nucleotide physical-chemical property selection (HPCS) algorithm. The proposed HPCS algorithm can effectively extract an optimized subset of nucleotide physical-chemical properties under the prescribed feature representation for encoding an RNA sequence into a feature vector. We demonstrate the efficacy of the proposed HPCS algorithm under different feature representations, including pseudo dinucleotide composition (PseDNC), auto-covariance (AC), and cross-covariance (CC). Based on the proposed HPCS algorithm, we implemented an m(6)A site predictor, called M6A-HPCS, which is freely available at http://csbio.njust.edu.cn/bioinf/M6A-HPCS. Experimental results over rigorous jackknife tests on benchmark datasets demonstrated that the proposed M6A-HPCS achieves higher success rates and outperforms existing state-of-the-art sequence-based m(6)A site predictors.
Collapse
|
86
|
Ju Z, Cao JZ, Gu H. Predicting lysine phosphoglycerylation with fuzzy SVM by incorporating k-spaced amino acid pairs into Chou׳s general PseAAC. J Theor Biol 2016; 397:145-50. [DOI: 10.1016/j.jtbi.2016.02.020] [Citation(s) in RCA: 86] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2015] [Revised: 02/14/2016] [Accepted: 02/15/2016] [Indexed: 11/30/2022]
|
87
|
Jia J, Liu Z, Xiao X, Liu B, Chou KC. pSuc-Lys: Predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach. J Theor Biol 2016; 394:223-230. [DOI: 10.1016/j.jtbi.2016.01.020] [Citation(s) in RCA: 231] [Impact Index Per Article: 28.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2015] [Revised: 01/06/2016] [Accepted: 01/07/2016] [Indexed: 10/22/2022]
|
88
|
Zhao X, Ning Q, Ai M, Chai H, Yang G. Identification of S-glutathionylation sites in species-specific proteins by incorporating five sequence-derived features into the general pseudo-amino acid composition. J Theor Biol 2016; 398:96-102. [PMID: 27025952 DOI: 10.1016/j.jtbi.2016.03.030] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2015] [Revised: 02/29/2016] [Accepted: 03/17/2016] [Indexed: 11/25/2022]
Abstract
As a selective and reversible protein post-translational modification, S-glutathionylation generates mixed disulfides between glutathione (GSH) and cysteine residues, and plays an important role in regulating protein activity, stability, and redox regulation. To fully understand S-glutathionylation mechanisms, identification of substrates and specific S-Glutathionylated sites is crucial. Experimental identification of S-glutathionylated sites is labor-intensive and time consuming, so establishing an effective computational method is much desirable due to their convenient and fast speed. Therefore, in this study, a new bioinformatics tool named SSGlu (Species-Specific identification of Protein S-glutathionylation Sites) was developed to identify species-specific protein S-glutathionylated sites, utilizing support vector machines that combine multiple sequence-derived features with a two-step feature selection. By 5-fold cross validation, the performance of SSGlu was measured with an AUC of 0.8105 and 0.8041 for Homo sapiens and Mus musculus, respectively. Additionally, SSGlu was compared with the existing methods, and the higher MCC and AUC of SSGlu demonstrated that SSGlu was very promising to predict S-glutathionylated sites. Furthermore, a site-specific analysis showed that S-glutathionylation intimately correlated with the features derived from its surrounding sites. The conclusions derived from this study might help to understand more of the S-glutathionylation mechanism and guide the related experimental validation. For public access, SSGlu is freely accessible at http://59.73.198.144:8080/SSGlu/.
Collapse
Affiliation(s)
- Xiaowei Zhao
- School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China.
| | - Qiao Ning
- School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China
| | - Meiyue Ai
- School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China
| | - Haiting Chai
- School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China
| | - Guifu Yang
- School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China
| |
Collapse
|
89
|
iSuc-PseOpt: Identifying lysine succinylation sites in proteins by incorporating sequence-coupling effects into pseudo components and optimizing imbalanced training dataset. Anal Biochem 2016; 497:48-56. [DOI: 10.1016/j.ab.2015.12.009] [Citation(s) in RCA: 230] [Impact Index Per Article: 28.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2015] [Revised: 12/02/2015] [Accepted: 12/11/2015] [Indexed: 11/18/2022]
|
90
|
Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks. Sci Rep 2016; 6:19598. [PMID: 26797014 PMCID: PMC4726425 DOI: 10.1038/srep19598] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2015] [Accepted: 12/14/2015] [Indexed: 11/09/2022] Open
Abstract
The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named “DeepMethyl” to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.
Collapse
|
91
|
iPPBS-Opt: A Sequence-Based Ensemble Classifier for Identifying Protein-Protein Binding Sites by Optimizing Imbalanced Training Datasets. Molecules 2016; 21:E95. [PMID: 26797600 PMCID: PMC6274413 DOI: 10.3390/molecules21010095] [Citation(s) in RCA: 136] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2015] [Revised: 12/18/2015] [Accepted: 01/07/2016] [Indexed: 12/25/2022] Open
Abstract
Knowledge of protein-protein interactions and their binding sites is indispensable for in-depth understanding of the networks in living cells. With the avalanche of protein sequences generated in the postgenomic age, it is critical to develop computational methods for identifying in a timely fashion the protein-protein binding sites (PPBSs) based on the sequence information alone because the information obtained by this way can be used for both biomedical research and drug development. To address such a challenge, we have proposed a new predictor, called iPPBS-Opt, in which we have used: (1) the K-Nearest Neighbors Cleaning (KNNC) and Inserting Hypothetical Training Samples (IHTS) treatments to optimize the training dataset; (2) the ensemble voting approach to select the most relevant features; and (3) the stationary wavelet transform to formulate the statistical samples. Cross-validation tests by targeting the experiment-confirmed results have demonstrated that the new predictor is very promising, implying that the aforementioned practices are indeed very effective. Particularly, the approach of using the wavelets to express protein/peptide sequences might be the key in grasping the problem's essence, fully consistent with the findings that many important biological functions of proteins can be elucidated with their low-frequency internal motions. To maximize the convenience of most experimental scientists, we have provided a step-by-step guide on how to use the predictor's web server (http://www.jci-bioinfo.cn/iPPBS-Opt) to get the desired results without the need to go through the complicated mathematical equations involved.
Collapse
|
92
|
Aftabi Y, Colagar AH, Mehrnejad F. An in silico approach to investigate the source of the controversial interpretations about the phenotypic results of the human AhR-gene G1661A polymorphism. J Theor Biol 2016; 393:1-15. [PMID: 26776670 DOI: 10.1016/j.jtbi.2016.01.001] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2015] [Revised: 12/11/2015] [Accepted: 01/01/2016] [Indexed: 12/21/2022]
Abstract
Aryl hydrocarbon receptor (AhR) acts as an enhancer binding ligand-activated intracellular receptor. Chromatin remodeling components and general transcription factors such as TATA-binding protein (TBP) are evoked on AhR-target genes by interaction with its flexible transactivation domain (TAD). AhR-G1661A single nucleotide polymorphism (SNP: rs2066853) causes an arginine to lysine substitution in the acidic sub-domain of TAD at position 554 (R554K). Although, numerous studies associate the SNP with some abnormalities such as cancer, other reliable investigations refuse the associations. Consequently, the interpretation of the phenotypic results of G1661A-transition has been controversial. In this study, an in silico analysis were performed to investigate the possible effects of the transition on AhR-mRNA, protein structure, interaction properties and modifications. The analysis revealed that the R554K substitution affects secondary structure and solvent accessibility of adjacent residues. Also, it causes to decreasing of the AhR stability; altering the hydropathy features of the local sequence and changing the pattern of the residues at the binding site of the TAD-acidic sub-domain. Generating of new sites for ubiquitination and acetylation for AhR-K554 variant respectively at positions 544 and 560 was predicted. Our findings intensify the idea that the AhR-G1661A transition may affects AhR-TAD interactions, especially with the TBP, which influence AhR-target genes expression. However, the previously reported flexibility of the modular TAD could act as an intervening factor, moderate the SNP effects and causes distinct outcomes in different individuals and tissues.
Collapse
Affiliation(s)
- Younes Aftabi
- Department of Molecular and Cell Biology, Faculty of Basic Sciences, University of Mazandaran, Babolsar, Post Code: 47416-95447, Mazandaran, Iran
| | - Abasalt Hosseinzadeh Colagar
- Department of Molecular and Cell Biology, Faculty of Basic Sciences, University of Mazandaran, Babolsar, Post Code: 47416-95447, Mazandaran, Iran.
| | - Faramarz Mehrnejad
- Department of Life Science Engineering, Faculty of New Sciences & Technologies, University of Tehran, P.O. Box: 14395-1561, Tehran, Iran
| |
Collapse
|
93
|
Chen W, Feng P, Ding H, Lin H, Chou KC. iRNA-Methyl: Identifying N6-methyladenosine sites using pseudo nucleotide composition. Anal Biochem 2015; 490:26-33. [DOI: 10.1016/j.ab.2015.08.021] [Citation(s) in RCA: 254] [Impact Index Per Article: 28.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2015] [Revised: 08/13/2015] [Accepted: 08/16/2015] [Indexed: 10/23/2022]
|
94
|
Liu B, Fang L, Wang S, Wang X, Li H, Chou KC. Identification of microRNA precursor with the degenerate K-tuple or Kmer strategy. J Theor Biol 2015; 385:153-9. [DOI: 10.1016/j.jtbi.2015.08.025] [Citation(s) in RCA: 131] [Impact Index Per Article: 14.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2015] [Revised: 08/21/2015] [Accepted: 08/24/2015] [Indexed: 10/23/2022]
|
95
|
Jia J, Liu Z, Xiao X, Liu B, Chou KC. Identification of protein-protein binding sites by incorporating the physicochemical properties and stationary wavelet transforms into pseudo amino acid composition. J Biomol Struct Dyn 2015; 34:1946-61. [PMID: 26375780 DOI: 10.1080/07391102.2015.1095116] [Citation(s) in RCA: 88] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Abstract
With the explosive growth of protein sequences entering into protein data banks in the post-genomic era, it is highly demanded to develop automated methods for rapidly and effectively identifying the protein-protein binding sites (PPBSs) based on the sequence information alone. To address this problem, we proposed a predictor called iPPBS-PseAAC, in which each amino acid residue site of the proteins concerned was treated as a 15-tuple peptide segment generated by sliding a window along the protein chains with its center aligned with the target residue. The working peptide segment is further formulated by a general form of pseudo amino acid composition via the following procedures: (1) it is converted into a numerical series via the physicochemical properties of amino acids; (2) the numerical series is subsequently converted into a 20-D feature vector by means of the stationary wavelet transform technique. Formed by many individual "Random Forest" classifiers, the operation engine to run prediction is a two-layer ensemble classifier, with the 1st-layer voting out the best training data-set from many bootstrap systems and the 2nd-layer voting out the most relevant one from seven physicochemical properties. Cross-validation tests indicate that the new predictor is very promising, meaning that many important key features, which are deeply hidden in complicated protein sequences, can be extracted via the wavelets transform approach, quite consistent with the facts that many important biological functions of proteins can be elucidated with their low-frequency internal motions. The web server of iPPBS-PseAAC is accessible at http://www.jci-bioinfo.cn/iPPBS-PseAAC , by which users can easily acquire their desired results without the need to follow the complicated mathematical equations involved.
Collapse
Affiliation(s)
- Jianhua Jia
- a Computer Department , Jing-De-Zhen Ceramic Institute , Jing-De-Zhen 333403 , China
| | - Zi Liu
- a Computer Department , Jing-De-Zhen Ceramic Institute , Jing-De-Zhen 333403 , China
| | - Xuan Xiao
- a Computer Department , Jing-De-Zhen Ceramic Institute , Jing-De-Zhen 333403 , China.,c Gordon Life Science Institute , Boston , MA 02478 , USA
| | - Bingxiang Liu
- a Computer Department , Jing-De-Zhen Ceramic Institute , Jing-De-Zhen 333403 , China
| | - Kuo-Chen Chou
- b Center of Excellence in Genomic Medicine Research (CEGMR) , King Abdulaziz University , Jeddah 21589 , Saudi Arabia.,c Gordon Life Science Institute , Boston , MA 02478 , USA
| |
Collapse
|
96
|
Zhao X, Ning Q, Chai H, Ai M, Ma Z. PGlcS: Prediction of protein O-GlcNAcylation sites with multiple features and analysis. J Theor Biol 2015; 380:524-9. [DOI: 10.1016/j.jtbi.2015.06.026] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2015] [Revised: 06/01/2015] [Accepted: 06/02/2015] [Indexed: 10/23/2022]
|
97
|
Kabir M, Hayat M. iRSpot-GAEnsC: identifing recombination spots via ensemble classifier and extending the concept of Chou’s PseAAC to formulate DNA samples. Mol Genet Genomics 2015; 291:285-96. [DOI: 10.1007/s00438-015-1108-5] [Citation(s) in RCA: 95] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2015] [Accepted: 08/19/2015] [Indexed: 10/23/2022]
|
98
|
Yuan J, Xue B. Role of structural flexibility in the evolution of emerin. J Theor Biol 2015; 385:102-11. [PMID: 26319992 DOI: 10.1016/j.jtbi.2015.08.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2015] [Revised: 08/07/2015] [Accepted: 08/17/2015] [Indexed: 02/07/2023]
Abstract
Emerin is a short inner nuclear membrane protein with an LEM-domain at the N-terminal end and a transmembrane domain at the C-terminal end. The middle region of human emerin contains multiple binding motifs. Since emerin is often found in evolutionarily newer species, the functional conservation of emerin becomes an interesting topic. In this study, we have demonstrated that most of the functional motifs of emerin are intrinsically disordered or highly flexible. Many post-translational modification sites and mutation sites are associated with these disordered regions. The averaged substitution rates of most functional motifs between species correlate positively with the averaged disorder scores of those functional motifs. Human emerin sequence may have acquired new functions on protein-protein interaction through the formation of hydrophobic motifs in the middle region, which is resulted from accumulated mutations during the evolution process.
Collapse
Affiliation(s)
- Jia Yuan
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, 4202 E. Fowler Ave, ISA 2015, Tampa, FL 33620, USA
| | - Bin Xue
- Department of Cell Biology, Microbiology and Molecular Biology, School of Natural Sciences and Mathematics, College of Arts and Sciences, University of South Florida, 4202 E. Fowler Ave, ISA 2015, Tampa, FL 33620, USA.
| |
Collapse
|
99
|
Shi Y, Guo Y, Hu Y, Li M. Position-specific prediction of methylation sites from sequence conservation based on information theory. Sci Rep 2015. [PMID: 26202727 PMCID: PMC5378888 DOI: 10.1038/srep12403] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023] Open
Abstract
Protein methylation plays vital roles in many biological processes and has been implicated in various human diseases. To fully understand the mechanisms underlying methylation for use in drug design and work in methylation-related diseases, an initial but crucial step is to identify methylation sites. The use of high-throughput bioinformatics methods has become imperative to predict methylation sites. In this study, we developed a novel method that is based only on sequence conservation to predict protein methylation sites. Conservation difference profiles between methylated and non-methylated peptides were constructed by the information entropy (IE) in a wider neighbor interval around the methylation sites that fully incorporated all of the environmental information. Then, the distinctive neighbor residues were identified by the importance scores of information gain (IG). The most representative model was constructed by support vector machine (SVM) for Arginine and Lysine methylation, respectively. This model yielded a promising result on both the benchmark dataset and independent test set. The model was used to screen the entire human proteome, and many unknown substrates were identified. These results indicate that our method can serve as a useful supplement to elucidate the mechanism of protein methylation and facilitate hypothesis-driven experimental design and validation.
Collapse
Affiliation(s)
- Yinan Shi
- College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
| | - Yanzhi Guo
- College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
| | - Yayun Hu
- College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
| | - Menglong Li
- College of Chemistry, Sichuan University, Chengdu, Sichuan 610064, PR China
| |
Collapse
|
100
|
iPPI-Esml: An ensemble classifier for identifying the interactions of proteins by incorporating their physicochemical properties and wavelet transforms into PseAAC. J Theor Biol 2015; 377:47-56. [DOI: 10.1016/j.jtbi.2015.04.011] [Citation(s) in RCA: 243] [Impact Index Per Article: 27.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2015] [Revised: 04/07/2015] [Accepted: 04/09/2015] [Indexed: 12/24/2022]
|