51. Liu B, Yang F, Huang DS, Chou KC. iPromoter-2L: a two-layer predictor for identifying promoters and their types by multi-window-based PseKNC. Bioinformatics 2017; 34:33-40. [DOI: 10.1093/bioinformatics/btx579] [Citation(s) in RCA: 235] [Impact Index Per Article: 33.6] [Received: 07/13/2017] [Accepted: 09/13/2017] [Indexed: 12/30/2022]
Affiliation(s)
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
  - The Gordon Life Science Institute, Boston, MA, USA
- Fan Yang
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- De-Shuang Huang
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
- Kuo-Chen Chou
  - The Gordon Life Science Institute, Boston, MA, USA
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
  - Faculty of Computing and Information Technology in Rabigh, King Abdulaziz University, Jeddah, Saudi Arabia
52. Yu B, Lou L, Li S, Zhang Y, Qiu W, Wu X, Wang M, Tian B. Prediction of protein structural class for low-similarity sequences using Chou’s pseudo amino acid composition and wavelet denoising. J Mol Graph Model 2017; 76:260-273. [DOI: 10.1016/j.jmgm.2017.07.012] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Received: 04/21/2017] [Revised: 07/11/2017] [Accepted: 07/12/2017] [Indexed: 11/25/2022]
53. Du PF. Predicting Protein Submitochondrial Locations: The 10th Anniversary. Curr Genomics 2017; 18:316-321. [PMID: 29081687; PMCID: PMC5635615; DOI: 10.2174/1389202918666170228143256] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/01/2016] [Revised: 10/16/2016] [Accepted: 11/02/2016] [Indexed: 12/16/2022]
Abstract
Predicting protein submitochondrial location has been studied for about ten years. A number of methods have been developed. The prediction performances have been improved to an almost perfect level. In this review, we introduce the background of this research topic. We also compare the methods, the performances and the datasets that have been used by these studies. Towards the end, we provide hints for the future directions of this research topic.
Affiliation(s)
- Pu-Feng Du
  - School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
54. Cheng X, Zhao SG, Lin WZ, Xiao X, Chou KC. pLoc-mAnimal: predict subcellular localization of animal proteins with both single and multiple sites. Bioinformatics 2017; 33:3524-3531. [DOI: 10.1093/bioinformatics/btx476] [Citation(s) in RCA: 167] [Impact Index Per Article: 23.9] [Received: 04/18/2017] [Accepted: 07/22/2017] [Indexed: 12/24/2022]
Affiliation(s)
- Xiang Cheng
  - College of Information Science and Technology, Donghua University, Shanghai, China
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
- Shu-Guang Zhao
  - College of Information Science and Technology, Donghua University, Shanghai, China
- Wei-Zhong Lin
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
- Xuan Xiao
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
  - The Gordon Life Science Institute, Boston, MA, USA
- Kuo-Chen Chou
  - The Gordon Life Science Institute, Boston, MA, USA
  - Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
55. Feng P, Ding H, Yang H, Chen W, Lin H, Chou KC. iRNA-PseColl: Identifying the Occurrence Sites of Different RNA Modifications by Incorporating Collective Effects of Nucleotides into PseKNC. Mol Ther Nucleic Acids 2017; 7:155-163. [PMID: 28624191; PMCID: PMC5415964; DOI: 10.1016/j.omtn.2017.03.006] [Citation(s) in RCA: 215] [Impact Index Per Article: 30.7] [Received: 02/12/2017] [Revised: 03/16/2017] [Accepted: 03/17/2017] [Indexed: 11/23/2022]
Abstract
There are many different types of RNA modifications, which are essential for numerous biological processes. Knowledge about the occurrence sites of RNA modifications in its sequence is a key for in-depth understanding of their biological functions and mechanism. Unfortunately, it is both time-consuming and laborious to determine these sites purely by experiments alone. Although some computational methods were developed in this regard, each one could only be used to deal with some type of modification individually. To our knowledge, no method has thus far been developed that can identify the occurrence sites for several different types of RNA modifications with one seamless package or platform. To address such a challenge, a novel platform called "iRNA-PseColl" has been developed. It was formed by incorporating both the individual and collective features of the sequence elements into the general pseudo K-tuple nucleotide composition (PseKNC) of RNA via the chemicophysical properties and density distribution of its constituent nucleotides. Rigorous cross-validations have indicated that the anticipated success rates achieved by the proposed platform are quite high. To maximize the convenience for most experimental biologists, the platform's web-server has been provided at http://lin.uestc.edu.cn/server/iRNA-PseColl along with a step-by-step user guide that will allow users to easily achieve their desired results without the need to go through the mathematical details involved in this paper.
Affiliation(s)
- Pengmian Feng
  - Hebei Province Key Laboratory of Occupational Health and Safety for Coal Industry, School of Public Health, North China University of Science and Technology, Tangshan 063000, China
- Hui Ding
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Hui Yang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Wei Chen
  - Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China
  - Gordon Life Science Institute, Boston, MA 02478, USA
- Hao Lin
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Gordon Life Science Institute, Boston, MA 02478, USA
- Kuo-Chen Chou
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Gordon Life Science Institute, Boston, MA 02478, USA
56. Yang H, Li X, Cai Y, Wang Q, Li W, Liu G, Tang Y. In silico prediction of chemical subcellular localization via multi-classification methods. MedChemComm 2017; 8:1225-1234. [PMID: 30108833; PMCID: PMC6072212; DOI: 10.1039/c7md00074j] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 02/14/2017] [Accepted: 03/22/2017] [Indexed: 12/16/2022]
Abstract
Chemical subcellular localization is closely related to drug distribution in the body and hence important in drug discovery and design. Although many in vivo and in vitro methods have been developed, in silico methods play key roles in the prediction of chemical subcellular localization due to their low costs and high performance. For that purpose, machine learning-based methods were developed here. At first, 614 unique compounds localized in the lysosome, mitochondria, nucleus and plasma membrane were collected from the literature. 80% of the compounds were used to build the models and the rest as the external validation set. Both fingerprints and molecular descriptors were used to describe the molecules, and six machine learning methods were applied to build the multi-classification models. The performance of the models was measured by 5-fold cross-validation and external validation. We further detected key substructures for each localization and analyzed potential structure-localization relationships, which could be very helpful for molecular design and modification. The key substructures can also be used as features complementary to fingerprints to improve the performance of the models.
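The evaluation protocol described in this abstract (an 80% modelling set, a 20% external validation set, and 5-fold cross-validation on the modelling set) can be sketched as follows. This is only an illustrative sketch: the abstract does not say whether the split was random or stratified by localization class, so a plain seeded random split is assumed here, and the function names `split_external` and `k_fold_partitions` are hypothetical.

```python
import random


def split_external(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into a modelling set and an external set.

    Assumption: a simple random split; the abstract does not state the exact procedure.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def k_fold_partitions(train_indices, k=5):
    """Yield (fitting, validation) index lists for k-fold cross-validation."""
    # Deal the indices into k folds whose sizes differ by at most one.
    folds = [train_indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        fitting = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fitting, validation


# 614 is the number of compounds reported in the abstract.
train, external = split_external(614)
cv_partitions = list(k_fold_partitions(train, k=5))
```

Each `(fitting, validation)` pair would be used to train and score one model, with the external set held back until the end.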
Affiliation(s)
- Hongbin Yang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Xiao Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yingchun Cai
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Qin Wang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Weihua Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Guixia Liu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
- Yun Tang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China
57. Wang P, Ge R, Liu L, Xiao X, Li Y, Cai Y. Multi-label Learning for Predicting the Activities of Antimicrobial Peptides. Sci Rep 2017; 7:2202. [PMID: 28526820; PMCID: PMC5438384; DOI: 10.1038/s41598-017-01986-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 09/06/2016] [Accepted: 04/05/2017] [Indexed: 01/06/2023]
Abstract
Antimicrobial peptides (AMPs) are peptide antibiotics with a broad spectrum of antimicrobial activities. Activity prediction of AMPs from their amino acid sequences is of great therapeutic importance but imposes challenges on prediction methods due to label interactions. In this paper we propose a novel multi-label learning model to address this problem. A weighted K-nearest neighbor classifier is adopted for efficient representation learning of the sequence data. A multiple linear regression model is then employed to learn a mapping from the classifier score vectors to the target labels, with label correlations considered. Several popular multi-label learning algorithms and feature extraction methods were tested on a comprehensive, up-to-date AMP dataset with twelve biological activities covered and its filtered version with five activities covered. The experimental results showed that our proposed method has competitive performance with previous works and could be used as a powerful engine for activity prediction of AMPs.
Affiliation(s)
- Pu Wang
  - Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, China
  - Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, 518055, China
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, 333403, China
- Ruiquan Ge
  - Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, China
  - Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, 518055, China
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, China
- Liming Liu
  - Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, China
  - College of Mathematics and Statistics, Shenzhen University, Shenzhen, 518060, China
- Xuan Xiao
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, 333403, China
- Ye Li
  - Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, China
- Yunpeng Cai
  - Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, Shenzhen, Guangdong, 518055, China
58. Liu B, Yang F, Chou KC. 2L-piRNA: A Two-Layer Ensemble Classifier for Identifying Piwi-Interacting RNAs and Their Function. Mol Ther Nucleic Acids 2017. [PMID: 28624202; PMCID: PMC5415553; DOI: 10.1016/j.omtn.2017.04.008] [Citation(s) in RCA: 194] [Impact Index Per Article: 27.7] [Indexed: 11/24/2022]
Abstract
Involved with important cellular or gene functions and implicated with many kinds of cancers, piRNAs, or piwi-interacting RNAs, are of small non-coding RNA with around 19–33 nt in length. Given a small non-coding RNA molecule, can we predict whether it is of piRNA according to its sequence information alone? Furthermore, there are two types of piRNA: one has the function of instructing target mRNA deadenylation, and the other does not. Can we discriminate one from the other? With the avalanche of RNA sequences emerging in the postgenomic age, it is urgent to address the two problems for both basic research and drug development. Unfortunately, to the best of our knowledge, so far no computational methods whatsoever could be used to deal with the second problem, let alone deal with the two problems together. Here, by incorporating the physicochemical properties of nucleotides into the pseudo K-tuple nucleotide composition (PseKNC), we proposed a powerful predictor called 2L-piRNA. It is a two-layer ensemble classifier, in which the first layer is for identifying whether a query RNA molecule is piRNA or non-piRNA, and the second layer for identifying whether a piRNA is with or without the function of instructing target mRNA deadenylation. Rigorous cross-validations have indicated that the success rates achieved by the proposed predictor are quite high. For the convenience of most biologists and drug development scientists, the web server for 2L-piRNA has been established at http://bioinformatics.hitsz.edu.cn/2L-piRNA/, by which users can easily get their desired results without the need to go through the mathematical details.
Affiliation(s)
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
  - Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
  - Gordon Life Science Institute, Belmont, MA 02478, USA
- Fan Yang
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Kuo-Chen Chou
  - Gordon Life Science Institute, Belmont, MA 02478, USA
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia
59. Jiao YS, Du PF. Predicting protein submitochondrial locations by incorporating the positional-specific physicochemical properties into Chou's general pseudo-amino acid compositions. J Theor Biol 2017; 416:81-87. [DOI: 10.1016/j.jtbi.2016.12.026] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 09/21/2016] [Revised: 12/06/2016] [Accepted: 12/30/2016] [Indexed: 11/26/2022]
60. Zhou SF, Zhong WZ. Drug Design and Discovery: Principles and Applications. Molecules 2017; 22:279. [PMID: 28208821; PMCID: PMC6155886; DOI: 10.3390/molecules22020279] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 02/08/2017] [Revised: 02/08/2017] [Accepted: 02/09/2017] [Indexed: 12/23/2022]
Affiliation(s)
- Shu-Feng Zhou
  - Department of Bioengineering and Biotechnology, College of Chemical Engineering, Huaqiao University, Xiamen 361021, Fujian, China
- Wei-Zhu Zhong
  - Gordon Life Science Institute, Belmont, MA 02478, USA
61. Khan M, Hayat M, Khan SA, Iqbal N. Unb-DPC: Identify mycobacterial membrane protein types by incorporating un-biased dipeptide composition into Chou's general PseAAC. J Theor Biol 2017; 415:13-19. [DOI: 10.1016/j.jtbi.2016.12.004] [Citation(s) in RCA: 88] [Impact Index Per Article: 12.6] [Received: 08/02/2016] [Revised: 10/24/2016] [Accepted: 12/07/2016] [Indexed: 01/22/2023]
62. Ali F, Hayat M. Machine learning approaches for discrimination of Extracellular Matrix proteins using hybrid feature space. J Theor Biol 2016; 403:30-37. [DOI: 10.1016/j.jtbi.2016.05.011] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 12/25/2015] [Revised: 05/02/2016] [Accepted: 05/03/2016] [Indexed: 01/12/2023]
63. Identification of apolipoprotein using feature selection technique. Sci Rep 2016; 6:30441. [PMID: 27443605; PMCID: PMC4957217; DOI: 10.1038/srep30441] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 04/25/2016] [Accepted: 07/01/2016] [Indexed: 12/16/2022]
Abstract
Apolipoprotein is a kind of protein which can transport the lipids through the lymphatic and circulatory systems. The abnormal expression level of apolipoprotein always causes angiocardiopathy. Thus, correct recognition of apolipoprotein from proteomic data is very crucial to the comprehension of cardiovascular system and drug design. This study is to develop a computational model to predict apolipoproteins. In the model, the apolipoproteins and non-apolipoproteins were collected to form benchmark dataset. On the basis of the dataset, we extracted the g-gap dipeptide composition information from residue sequences to formulate protein samples. To exclude redundant information or noise, the analysis of variance (ANOVA)-based feature selection technique was proposed to find out the best feature subset. The support vector machine (SVM) was selected as discrimination algorithm. Results show that 96.2% of sensitivity and 99.3% of specificity were achieved in five-fold cross-validation. These findings open new perspectives to improve apolipoproteins prediction by considering the specific dipeptides. We expect that these findings will help to improve drug development in anti-angiocardiopathy disease.
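The g-gap dipeptide composition mentioned in this abstract can be illustrated with a short sketch. It counts residue pairs separated by `g` intervening positions and turns the counts into a 400-dimensional frequency vector. The details here are assumptions rather than the paper's exact formulation: the function name `g_gap_dipeptide_composition` is hypothetical, pairs containing non-standard residues are skipped, and frequencies are normalized by the number of counted pairs.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered residue pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide_composition(sequence, g=0):
    """Return the 400 g-gap dipeptide frequencies of a protein sequence.

    g=0 gives the ordinary (adjacent) dipeptide composition; g=1 pairs each
    residue with the one two positions downstream, and so on.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:  # assumption: skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Toy example: with g=1 the sequence "ACDA" yields the pairs "AD" and "CA".
features = g_gap_dipeptide_composition("ACDA", g=1)
```

A vector like `features` would then be passed to feature selection and a classifier; those later steps are not sketched here.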