251
Carvalho TFM, Silva JCF, Calil IP, Fontes EPB, Cerqueira FR. Rama: a machine learning approach for ribosomal protein prediction in plants. Sci Rep 2017; 7:16273. [PMID: 29176736 PMCID: PMC5701237 DOI: 10.1038/s41598-017-16322-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 07/05/2017] [Accepted: 10/30/2017] [Indexed: 12/04/2022]
Abstract
Ribosomal proteins (RPs) play a fundamental role in all types of cells, as they are major components of ribosomes, which are essential for the translation of mRNAs. Furthermore, these proteins are involved in various physiological and pathological processes. The intrinsic biological relevance of RPs has motivated advanced studies to identify as-yet-unrevealed RPs. In this work, we propose a new computational method, termed Rama, for the prediction of RPs, based on machine learning techniques, with a particular interest in plants. To perform an effective classification, Rama uses a set of fundamental attributes of the amino acid side chains and applies a two-step procedure to classify proteins with unknown function as RPs. The evaluation of the resultant predictive models showed that Rama could achieve mean sensitivity, precision, and specificity of 0.91, 0.91, and 0.82, respectively. Furthermore, a list of proteins that have no annotation in Phytozome v.10 but are annotated as RPs in Phytozome v.12 was correctly classified by our models. Additional computational experiments have also shown that Rama differentiates ribosomal proteins from RNA-binding proteins with high accuracy. Finally, two novel proteins of Arabidopsis thaliana were validated in biological experiments. Rama is freely available at http://inctipp.bioagro.ufv.br:8080/Rama.
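For readers less familiar with the three metrics quoted in this abstract, the sketch below shows how sensitivity, precision, and specificity are derived from confusion-matrix counts. The function name and the counts are hypothetical; the counts were picked only so that the ratios land on the reported means and are not taken from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, precision, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true RPs that were recovered
    precision = tp / (tp + fp)    # share of predicted RPs that are real RPs
    specificity = tn / (tn + fp)  # share of non-RPs that were correctly rejected
    return sensitivity, precision, specificity


# Hypothetical counts, for illustration only (not the paper's data).
sens, prec, spec = classification_metrics(tp=91, fp=9, tn=41, fn=9)
print(round(sens, 2), round(prec, 2), round(spec, 2))  # prints: 0.91 0.91 0.82
```

Note that sensitivity and precision share the numerator but not the denominator, which is why the two can coincide (as here) while specificity differs.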
Affiliation(s)
- José Cleydson F Silva
  - Computer Science Department, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil
  - National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil
- Iara Pinheiro Calil
  - National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil
- Elizabeth Pacheco Batista Fontes
  - National Institute of Science and Technology in Plant-Pest Interactions/BIOAGRO, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil
- Fabio Ribeiro Cerqueira
  - Computer Science Department, Universidade Federal de Viçosa, 36570-900, Minas Gerais, Brazil
  - Department of Production Engineering, Universidade Federal Fluminense, Petrópolis, 25650-050, Rio de Janeiro, Brazil
252
Cheng X, Xiao X, Chou KC. pLoc-mHum: predict subcellular localization of multi-location human proteins via general PseAAC to winnow out the crucial GO information. Bioinformatics 2017; 34:1448-1456. [DOI: 10.1093/bioinformatics/btx711] [Citation(s) in RCA: 127] [Impact Index Per Article: 18.1] [Received: 07/28/2017] [Accepted: 10/31/2017] [Indexed: 01/19/2023]
Affiliation(s)
- Xiang Cheng
  - Computer Science, Jingdezhen Ceramic Institute, Jingdezhen, China
  - Computational Biology, Gordon Life Science Institute, Boston, MA, USA
- Xuan Xiao
  - Computer Science, Jingdezhen Ceramic Institute, Jingdezhen, China
  - Computational Biology, Gordon Life Science Institute, Boston, MA, USA
- Kuo-Chen Chou
  - Computer Science, Jingdezhen Ceramic Institute, Jingdezhen, China
  - Computational Biology, Gordon Life Science Institute, Boston, MA, USA
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
253
Qiu WR, Sun BQ, Tang H, Huang J, Lin H. Identify and analysis crotonylation sites in histone by using support vector machines. Artif Intell Med 2017; 83:75-81. [DOI: 10.1016/j.artmed.2017.02.007] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 12/27/2016] [Revised: 01/25/2017] [Indexed: 10/20/2022]
254
A novel hierarchical selective ensemble classifier with bioinformatics application. Artif Intell Med 2017; 83:82-90. [DOI: 10.1016/j.artmed.2017.02.005] [Citation(s) in RCA: 124] [Impact Index Per Article: 17.7] [Received: 12/23/2016] [Revised: 02/09/2017] [Accepted: 02/10/2017] [Indexed: 10/20/2022]
255
Xu C, Ge L, Zhang Y, Dehmer M, Gutman I. Computational prediction of therapeutic peptides based on graph index. J Biomed Inform 2017; 75:63-69. [DOI: 10.1016/j.jbi.2017.09.011] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 07/14/2017] [Revised: 09/14/2017] [Accepted: 09/25/2017] [Indexed: 11/25/2022]
256
Liu H, Ren G, Hu H, Zhang L, Ai H, Zhang W, Zhao Q. LPI-NRLMF: lncRNA-protein interaction prediction by neighborhood regularized logistic matrix factorization. Oncotarget 2017; 8:103975-103984. [PMID: 29262614 PMCID: PMC5732780 DOI: 10.18632/oncotarget.21934] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 07/26/2017] [Accepted: 08/28/2017] [Indexed: 01/08/2023]
Abstract
LncRNA-protein interactions play important roles in many cellular processes, including signaling, transcriptional regulation, and even the generation and progression of complex diseases. However, experimental methods for determining proteins bound by a specific lncRNA remain expensive, difficult and time-consuming, and only a few theoretical approaches are available for predicting potential lncRNA-protein associations. In this study, we developed a novel matrix factorization approach to uncover lncRNA-protein relationships, namely lncRNA-protein interaction prediction by neighborhood regularized logistic matrix factorization (LPI-NRLMF). Moreover, the method is semi-supervised and does not need negative samples. The new model obtained reliable performance in leave-one-out cross-validation (an AUC of 0.9025 and an AUPR of 0.6924), which significantly improved on the prediction performance of previous models. Furthermore, the case study demonstrated that many lncRNA-protein interactions predicted by our method can be confirmed by experiments. It is anticipated that LPI-NRLMF could serve as a useful resource for identifying potential lncRNA-protein associations.
Affiliation(s)
- Hongsheng Liu
  - School of Life Science, Liaoning University, Shenyang, 110036, China
  - Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Liaoning Province, Shenyang, 110036, China
  - Engineering Laboratory for Molecular Simulation and Designing of Drug Molecules of Liaoning, Shenyang, 110036, China
- Guofei Ren
  - School of Information, Liaoning University, Shenyang, 110036, China
- Huan Hu
  - School of Life Science, Liaoning University, Shenyang, 110036, China
- Li Zhang
  - School of Life Science, Liaoning University, Shenyang, 110036, China
- Haixin Ai
  - School of Life Science, Liaoning University, Shenyang, 110036, China
- Wen Zhang
  - School of Computer, Wuhan University, Wuhan, 430072, China
- Qi Zhao
  - Research Center for Computer Simulating and Information Processing of Bio-macromolecules of Liaoning Province, Shenyang, 110036, China
  - School of Mathematics, Liaoning University, Shenyang, 110036, China
257
RIFS: a randomly restarted incremental feature selection algorithm. Sci Rep 2017; 7:13013. [PMID: 29026108 PMCID: PMC5638869 DOI: 10.1038/s41598-017-13259-6] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 05/08/2017] [Accepted: 09/21/2017] [Indexed: 11/24/2022]
Abstract
The advent of the big data era has imposed both running time and learning efficiency challenges on machine learning researchers. Biomedical OMIC research is one of these big data areas and has changed biomedical research drastically. But the high cost of data production and the difficulty of participant recruitment introduce the “large p small n” paradigm into biomedical research. Feature selection is usually employed to reduce the high number of biomedical features, so that a stable data-independent classification or regression model may be achieved. This study randomly changes the first element of the widely used incremental feature selection (IFS) strategy and selects the best feature subset, which may include features ranked low by statistical association evaluation algorithms, e.g. the t-test. The hypothesis is that two low-ranked features may be orchestrated to achieve a good classification performance. The proposed Randomly re-started Incremental Feature Selection (RIFS) algorithm demonstrates both higher classification accuracy and a smaller number of features than the existing algorithms. RIFS also outperforms the existing methylomic diagnosis model for prostate malignancy, with higher accuracy and fewer transcriptomic features.
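The restart idea in this abstract can be sketched as follows. This is a simplified illustration and not the authors' implementation: it assumes features are already ranked best-first, that a caller-supplied `score` function evaluates a candidate subset (for example by cross-validated accuracy), and that every consecutive run from each start position is evaluated. The names `rifs`, `starts`, `n_restarts`, and `toy_score` are hypothetical.

```python
import random


def rifs(ranked_features, score, starts=None, n_restarts=5, seed=0):
    """Sketch of incremental feature selection with random restart positions.

    ranked_features: feature names ordered best-first by some ranking.
    score: callable mapping a list of features to a number to maximise.
    starts: optional explicit start ranks; if None, rank 0 (plain IFS)
            plus a few randomly drawn ranks are used.
    """
    n = len(ranked_features)
    if starts is None:
        rng = random.Random(seed)
        starts = {0} | {rng.randrange(n) for _ in range(n_restarts)}
    best_subset, best_score = [], float("-inf")
    for start in sorted(set(starts)):
        subset = []
        # Grow the subset one consecutive ranked feature at a time.
        for feature in ranked_features[start:]:
            subset.append(feature)
            s = score(subset)
            # Prefer a higher score; on ties prefer the smaller subset.
            if s > best_score or (s == best_score and len(subset) < len(best_subset)):
                best_subset, best_score = list(subset), s
    return best_subset, best_score


def toy_score(subset):
    """Toy scorer: low-ranked 'c' and 'd' are only useful together."""
    if {"c", "d"} <= set(subset):
        return 1.0
    return 0.6 if "a" in subset else 0.1


ranked = ["a", "b", "c", "d", "e"]
print(rifs(ranked, toy_score, starts=[0]))     # plain IFS needs four features
print(rifs(ranked, toy_score, starts=[0, 2]))  # a restart at rank 2 finds ['c', 'd']
```

In the toy run, starting only from the top-ranked feature reaches the best score with four features, while a restart at rank 2 reaches the same score with two, which is the effect the abstract's hypothesis describes.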
258
Ju Z, Sun J, Li Y, Wang L. Predicting lysine glycation sites using bi-profile bayes feature extraction. Comput Biol Chem 2017; 71:98-103. [PMID: 29040908 DOI: 10.1016/j.compbiolchem.2017.10.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 05/31/2017] [Revised: 09/14/2017] [Accepted: 10/07/2017] [Indexed: 12/21/2022]
Abstract
Glycation is a nonenzymatic post-translational modification that has been found to be involved in various biological processes and is closely associated with many metabolic diseases. The accurate identification of glycation sites is important for understanding the underlying molecular mechanisms of glycation. As the traditional experimental methods are often labor-intensive and time-consuming, it is desirable to develop computational methods to predict glycation sites. In this study, a novel predictor named BPB_GlySite is proposed to predict lysine glycation sites by using bi-profile Bayes feature extraction and the support vector machine algorithm. As illustrated by 10-fold cross-validation, BPB_GlySite achieves a satisfactory performance with a sensitivity of 63.68%, a specificity of 72.60%, an accuracy of 69.63% and a Matthews correlation coefficient of 0.3499. Experimental results also indicate that BPB_GlySite significantly outperforms three existing glycation site predictors: NetGlycate, PreGly and Gly-PseAAC. Therefore, BPB_GlySite can be a useful bioinformatics tool for the prediction of glycation sites. A user-friendly web server for BPB_GlySite is established at 123.206.31.171/BPB_GlySite/.
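The abstract does not spell out the encoding, so the sketch below assumes the usual formulation of bi-profile Bayes: a peptide window is represented by the position-specific frequency of each of its residues in the positive training set, followed by the same frequencies from the negative training set. The function names and the three-residue toy windows are hypothetical, and no smoothing is applied.

```python
from collections import Counter


def position_profile(peptides):
    """Per-position residue frequencies for a list of equal-length peptides."""
    length = len(peptides[0])
    profile = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        profile.append({aa: c / len(peptides) for aa, c in counts.items()})
    return profile


def bpb_encode(peptide, pos_profile, neg_profile):
    """Concatenate positive-profile and negative-profile frequencies."""
    pos_part = [pos_profile[i].get(aa, 0.0) for i, aa in enumerate(peptide)]
    neg_part = [neg_profile[i].get(aa, 0.0) for i, aa in enumerate(peptide)]
    return pos_part + neg_part


# Toy windows centred on lysine (K); real windows would be much longer.
positives = ["AKG", "AKS"]
negatives = ["GKA", "SKA"]
pos_prof = position_profile(positives)
neg_prof = position_profile(negatives)
print(bpb_encode("AKG", pos_prof, neg_prof))  # prints: [1.0, 1.0, 0.5, 0.0, 1.0, 0.0]
```

The resulting vector has twice the window length and could then be passed to any classifier, such as the support vector machine named in the abstract.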
Affiliation(s)
- Zhe Ju
  - College of Science, Shenyang Aerospace University, 110136, People's Republic of China
- Juhe Sun
  - College of Science, Shenyang Aerospace University, 110136, People's Republic of China
- Yanjie Li
  - College of Science, Shenyang Aerospace University, 110136, People's Republic of China
- Li Wang
  - College of Science, Shenyang Aerospace University, 110136, People's Republic of China
259
Cheng X, Xiao X, Chou KC. pLoc-mGneg: Predict subcellular localization of Gram-negative bacterial proteins by deep gene ontology learning via general PseAAC. Genomics 2017; 110:S0888-7543(17)30102-7. [PMID: 28989035 DOI: 10.1016/j.ygeno.2017.10.002] [Citation(s) in RCA: 92] [Impact Index Per Article: 13.1] [Received: 09/12/2017] [Revised: 09/28/2017] [Accepted: 10/04/2017] [Indexed: 01/21/2023]
Abstract
Information on proteins' subcellular localization is crucially important for revealing their biological functions in a cell, the basic unit of life. With the avalanche of protein sequences generated in the postgenomic age, it is highly desirable to develop computational tools for the timely identification of their subcellular locations based on sequence information alone. The current study focuses on Gram-negative bacterial proteins. Although considerable efforts have been made in protein subcellular prediction, the problem is far from solved. This is because mounting evidence indicates that many Gram-negative bacterial proteins exist in two or more location sites. Unfortunately, most existing methods can deal with single-location proteins only. In fact, proteins with multiple locations may have special biological functions important for both basic research and drug design. In this study, by using multi-label theory, we developed a new predictor called "pLoc-mGneg" for predicting the subcellular localization of Gram-negative bacterial proteins with both single and multiple locations. Rigorous cross-validation on a high-quality benchmark dataset indicated that the proposed predictor is remarkably superior to "iLoc-Gneg", the state-of-the-art predictor for the same purpose. For the convenience of most experimental scientists, a user-friendly web server for the novel predictor has been established at http://www.jci-bioinfo.cn/pLoc-mGneg/, by which users can easily get their desired results without the need to go through the complicated mathematics involved.
Affiliation(s)
- Xiang Cheng
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
  - The Gordon Life Science Institute, Boston, MA 02478, USA
- Xuan Xiao
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen, China
  - The Gordon Life Science Institute, Boston, MA 02478, USA
- Kuo-Chen Chou
  - The Gordon Life Science Institute, Boston, MA 02478, USA
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Faculty of Computing and Information Technology in Rabigh, King Abdulaziz University, Jeddah, Saudi Arabia
260
Prediction of lysine crotonylation sites by incorporating the composition of k-spaced amino acid pairs into Chou’s general PseAAC. J Mol Graph Model 2017; 77:200-204. [DOI: 10.1016/j.jmgm.2017.08.020] [Citation(s) in RCA: 70] [Impact Index Per Article: 10.0] [Received: 07/25/2017] [Revised: 08/21/2017] [Accepted: 08/21/2017] [Indexed: 12/11/2022]
261
Manavalan B, Basith S, Shin TH, Choi S, Kim MO, Lee G. MLACP: machine-learning-based prediction of anticancer peptides. Oncotarget 2017; 8:77121-77136. [PMID: 29100375 PMCID: PMC5652333 DOI: 10.18632/oncotarget.20365] [Citation(s) in RCA: 176] [Impact Index Per Article: 25.1] [Received: 05/16/2017] [Accepted: 07/13/2017] [Indexed: 01/25/2023]
Abstract
Cancer is the second leading cause of death globally, and use of therapeutic peptides to target and kill cancer cells has received considerable attention in recent years. Identification of anticancer peptides (ACPs) through wet-lab experimentation is expensive and often time consuming; therefore, development of an efficient computational method is essential to identify potential ACP candidates prior to in vitro experimentation. In this study, we developed support vector machine- and random forest-based machine-learning methods for the prediction of ACPs using the features calculated from the amino acid sequence, including amino acid composition, dipeptide composition, atomic composition, and physicochemical properties. We trained our methods using the Tyagi-B dataset and determined the machine parameters by 10-fold cross-validation. Furthermore, we evaluated the performance of our methods on two benchmarking datasets, with our results showing that the random forest-based method outperformed the existing methods with an average accuracy and Matthews correlation coefficient value of 88.7% and 0.78, respectively. To assist the scientific community, we also developed a publicly accessible web server at www.thegleelab.org/MLACP.html.
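Two of the sequence-derived features named in this abstract, amino acid composition and dipeptide composition, are simple to compute. The sketch below uses their standard definitions; the function names are hypothetical, the input is assumed to be an uppercase sequence of at least two residues, and non-standard residues count toward the length but receive no feature of their own.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """20-dimensional vector: fraction of each standard residue in seq."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """400-dimensional vector: fraction of each adjacent residue pair in seq."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(a + b) / len(pairs)
            for a, b in product(AMINO_ACIDS, repeat=2)]


aac = amino_acid_composition("ACAC")
dpc = dipeptide_composition("ACAC")
print(aac[0], aac[1])  # fractions of A and C; prints: 0.5 0.5
print(len(dpc))        # prints: 400
```

Both vectors have a fixed length regardless of peptide length, which is what allows peptides of different sizes to be fed to classifiers such as support vector machines or random forests.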
Affiliation(s)
- Shaherin Basith
  - College of Pharmacy, Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul, Republic of Korea
- Tae Hwan Shin
  - Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
  - Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
- Sun Choi
  - College of Pharmacy, Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul, Republic of Korea
- Myeong Ok Kim
  - Division of Life Science and Applied Life Science (BK21 Plus), College of Natural Sciences, Gyeongsang National University, Jinju, Republic of Korea
- Gwang Lee
  - Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea
  - Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea
262
Sankari ES, Manimegalai D. Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets. J Theor Biol 2017; 435:208-217. [PMID: 28941868 DOI: 10.1016/j.jtbi.2017.09.018] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 04/13/2017] [Revised: 09/15/2017] [Accepted: 09/18/2017] [Indexed: 12/19/2022]
Abstract
Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Because of the large number of uncharacterized protein sequences in databases, traditional methods are very time-consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are used, the performance of various decision tree classifiers, such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree and REP (Reduced Error Pruning) tree, and of ensemble methods, such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among the various decision tree classifiers, Random forest performs well in less time, with a good accuracy of 96.35%. Another inference is that the RUS boost decision tree classifier is able to classify one or two samples in the class with very few samples, while the other classifiers, such as DT, Adaboost, Rotation forest and Random forest, are not sensitive to the classes with fewer samples. The performance of the decision tree classifiers is also compared with that of SVM (Support Vector Machine) and Naive Bayes classifiers.
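The random under-sampling step behind RUS boost can be illustrated in a few lines. This sketch covers only the sampling part, not the boosting loop, and it assumes the simplest policy of shrinking every class to the size of the smallest one; the function name and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = min(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # sample() draws without replacement, so no sample is duplicated.
        for x in rng.sample(xs, target):
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Toy imbalanced data: eight majority samples, two minority samples.
samples = list(range(10))
labels = ["maj"] * 8 + ["min"] * 2
new_samples, new_labels = random_undersample(samples, labels)
print(Counter(new_labels))  # both classes now have two samples
```

Training each boosting round on such a balanced draw is what lets the minority class influence the learned trees, at the cost of discarding some majority samples in each round.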
Affiliation(s)
- E Siva Sankari
  - Department of CSE, Government College of Engineering, Tirunelveli, Tamil Nadu, India
- D Manimegalai
  - Department of IT, National Engineering College, Kovilpatti, Tamil Nadu, India
263
Du QS, Wang SQ, Xie NZ, Wang QY, Huang RB, Chou KC. 2L-PCA: a two-level principal component analyzer for quantitative drug design and its applications. Oncotarget 2017; 8:70564-70578. [PMID: 29050302 PMCID: PMC5642577 DOI: 10.18632/oncotarget.19757] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 04/26/2017] [Accepted: 06/30/2017] [Indexed: 01/25/2023]
Abstract
A two-level principal component predictor (2L-PCA) was proposed based on the principal component analysis (PCA) approach. It can be used to quantitatively analyze various compounds and peptides about their functions or potentials to become useful drugs. One level is for dealing with the physicochemical properties of drug molecules, while the other level is for dealing with their structural fragments. The predictor has the self-learning and feedback features to automatically improve its accuracy. It is anticipated that 2L-PCA will become a very useful tool for timely providing various useful clues during the process of drug development.
Affiliation(s)
- Qi-Shi Du
  - State Key Laboratory of China for Biomass Energy Enzyme Technology, National Engineering Research Center of China for Non-Food Biorefinery, Guangxi Academy of Sciences, Nanning 530007, China
  - Gordon Life Science Institute, Boston, MA 02478, USA
- Shu-Qing Wang
  - School of Pharmacy, Tianjin Medical University, Tianjin 300070, China
- Neng-Zhong Xie
  - State Key Laboratory of China for Biomass Energy Enzyme Technology, National Engineering Research Center of China for Non-Food Biorefinery, Guangxi Academy of Sciences, Nanning 530007, China
- Qing-Yan Wang
  - State Key Laboratory of China for Biomass Energy Enzyme Technology, National Engineering Research Center of China for Non-Food Biorefinery, Guangxi Academy of Sciences, Nanning 530007, China
- Ri-Bo Huang
  - State Key Laboratory of China for Biomass Energy Enzyme Technology, National Engineering Research Center of China for Non-Food Biorefinery, Guangxi Academy of Sciences, Nanning 530007, China
- Kuo-Chen Chou
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Gordon Life Science Institute, Boston, MA 02478, USA
264
Liu B, Yang F, Huang DS, Chou KC. iPromoter-2L: a two-layer predictor for identifying promoters and their types by multi-window-based PseKNC. Bioinformatics 2017; 34:33-40. [DOI: 10.1093/bioinformatics/btx579] [Citation(s) in RCA: 235] [Impact Index Per Article: 33.6] [Received: 07/13/2017] [Accepted: 09/13/2017] [Indexed: 12/30/2022]
Affiliation(s)
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
  - The Gordon Life Science Institute, Boston, MA, USA
- Fan Yang
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- De-Shuang Huang
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
- Kuo-Chen Chou
  - The Gordon Life Science Institute, Boston, MA, USA
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
  - Faculty of Computing and Information Technology in Rabigh, King Abdulaziz University, Jeddah, Saudi Arabia
265
pLoc-mVirus: Predict subcellular localization of multi-location virus proteins via incorporating the optimal GO information into general PseAAC. Gene 2017; 628:315-321. [DOI: 10.1016/j.gene.2017.07.036] [Citation(s) in RCA: 135] [Impact Index Per Article: 19.3] [Received: 04/21/2017] [Revised: 07/08/2017] [Accepted: 07/11/2017] [Indexed: 12/25/2022]
266
Liu B, Wu H, Zhang D, Wang X, Chou KC. Pse-Analysis: a python package for DNA/RNA and protein/peptide sequence analysis based on pseudo components and kernel methods. Oncotarget 2017; 8:13338-13343. [PMID: 28076851 PMCID: PMC5355101 DOI: 10.18632/oncotarget.14524] [Citation(s) in RCA: 114] [Impact Index Per Article: 16.3] [Received: 12/02/2016] [Accepted: 12/27/2016] [Indexed: 12/20/2022]
Abstract
To expedite genome/proteome analysis, we have developed a Python package called Pse-Analysis. The package can automatically complete the following five procedures: (1) sample feature extraction, (2) optimal parameter selection, (3) model training, (4) cross-validation, and (5) evaluation of prediction quality. All a user needs to do is input a benchmark dataset along with the query biological sequences concerned. Based on the benchmark dataset, Pse-Analysis will automatically construct an ideal predictor and then yield the predicted results for the submitted query samples. All the aforementioned tedious jobs are done automatically by the computer. Moreover, multiprocessing was adopted to increase computational speed about 6-fold. The Pse-Analysis Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/Pse-Analysis/, and can be run directly on Windows, Linux, and Unix.
Affiliation(s)
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
  - Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
  - Gordon Life Science Institute, Boston, Massachusetts, USA
- Hao Wu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Deyuan Zhang
  - School of Computer, Shenyang Aerospace University, Shenyang, Liaoning, China
- Xiaolong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
  - Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, Massachusetts, USA
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
267
He L, Li Y, He RL, Yau SST. A novel alignment-free vector method to cluster protein sequences. J Theor Biol 2017; 427:41-52. [DOI: 10.1016/j.jtbi.2017.06.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 02/17/2017] [Revised: 05/04/2017] [Accepted: 06/02/2017] [Indexed: 11/29/2022]
268
pDHS-SVM: A prediction method for plant DNase I hypersensitive sites based on support vector machine. J Theor Biol 2017; 426:126-133. [DOI: 10.1016/j.jtbi.2017.05.030] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 01/16/2017] [Revised: 05/10/2017] [Accepted: 05/23/2017] [Indexed: 11/21/2022]
269
Huo H, Li T, Wang S, Lv Y, Zuo Y, Yang L. Prediction of presynaptic and postsynaptic neurotoxins by combining various Chou's pseudo components. Sci Rep 2017; 7:5827. [PMID: 28724993 PMCID: PMC5517432 DOI: 10.1038/s41598-017-06195-y] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 03/20/2017] [Accepted: 06/08/2017] [Indexed: 11/09/2022]
Abstract
Presynaptic and postsynaptic neurotoxins are two groups of neurotoxins. Identifying presynaptic and postsynaptic neurotoxins is an important task for the numerous newly found toxins. It is both costly and time-consuming to distinguish these two kinds of neurotoxins by experimental methods. As a complement, computational methods for predicting presynaptic and postsynaptic neurotoxins could provide useful information in a timely manner. In this study, we described four algorithms for predicting presynaptic and postsynaptic neurotoxins from sequence-driven features by using Increment of Diversity (ID), Multinomial Naive Bayes Classifier (MNBC), Random Forest (RF), and K-nearest Neighbours Classifier (IBK). Each protein sequence was encoded by pseudo amino acid (PseAA) compositions and three biological motif features, including MEME, Prosite and InterPro motif features. The Maximum Relevance Minimum Redundancy (MRMR) feature selection method was used to rank the PseAA compositions, and the 50 top-ranked features were selected to improve the prediction accuracy. The PseAA compositions and the three kinds of biological motif features were combined, and 12 different parameters, defined as P1-P12, were selected as the input parameters of ID, MNBC, RF, and IBK. The prediction results obtained in this study were significantly better than those of previously developed methods.
Affiliation(s)
- Haiyan Huo
  - Department of Environmental Engineering, Hohhot University for Nationalities, Hohhot, 010051, China
- Tao Li
  - College of Life Science, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Shiyuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yingli Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yongchun Zuo
  - The Key Laboratory of Mammalian Reproductive Biology and Biotechnology of the Ministry of Education, Inner Mongolia University, Hohhot, 010021, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
270
Ortiz-Martinez M, Gonzalez de Mejia E, García-Lara S, Aguilar O, Lopez-Castillo LM, Otero-Pappatheodorou JT. Antiproliferative effect of peptide fractions isolated from a quality protein maize, a white hybrid maize, and their derived peptides on hepatocarcinoma human HepG2 cells. J Funct Foods 2017. [DOI: 10.1016/j.jff.2017.04.015] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 01/07/2023]
271
A computational model for predicting integrase catalytic domain of retrovirus. J Theor Biol 2017; 423:63-70. [PMID: 28454901 DOI: 10.1016/j.jtbi.2017.04.020] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/16/2017] [Revised: 04/01/2017] [Accepted: 04/21/2017] [Indexed: 11/23/2022]
Abstract
The integrase catalytic domain (ICD) is an essential part of the retrovirus for the integration reaction, which enables its newly synthesized DNA to be incorporated into the DNA of infected cells. Owing to the crucial role of the ICD in retroviral replication and the absence of an equivalent of integrase in host cells, the ICD is a promising drug target for therapeutic intervention. However, the annotated ICDs in the UniProtKB database are still insufficient for a good understanding of their statistical characteristics. Accordingly, this work puts forward a computational ICD model to annotate these domains in retroviruses. The proposed model discovered 11,660 new putative ICDs after scanning sequences without ICD annotations. Subsequently, to provide more confidence in ICD prediction, it was tested under different cross-validation methods, compared with other database search tools, and verified on independent datasets. Furthermore, an evolutionary analysis performed on the annotated ICDs of retroviruses revealed a tight connection between ICD and retroviral classification. All the datasets involved in this paper and the application software tool of this model are available for free download at https://sourceforge.net/projects/icdtool/files/?source=navbar.
|
272
|
Akbar S, Hayat M, Iqbal M, Jan MA. iACP-GAEnsC: Evolutionary genetic algorithm based ensemble classification of anticancer peptides by utilizing hybrid feature space. Artif Intell Med 2017; 79:62-70. [PMID: 28655440 DOI: 10.1016/j.artmed.2017.06.008] [Citation(s) in RCA: 96] [Impact Index Per Article: 13.7] [Received: 04/26/2017] [Revised: 06/12/2017] [Accepted: 06/16/2017] [Indexed: 01/10/2023]
Abstract
Cancer is a fatal disease, responsible for one-quarter of all deaths in developed countries. Traditional anticancer therapies, such as chemotherapy and radiation, are expensive, error-prone, and often ineffective, and they induce severe side effects on human cells. Given the perilous impact of cancer, an accurate and efficient computational model for identifying anticancer peptides is desirable. In this paper, an evolutionary genetic algorithm-based ensemble model, 'iACP-GAEnsC', is proposed for the identification of anticancer peptides. In this model, the protein sequences are formulated using three discrete feature representation methods: amphiphilic pseudo amino acid composition, g-gap dipeptide composition, and reduced amino acid alphabet composition. The extracted feature spaces are investigated separately and then merged to show the benefit of hybridization. In addition, the predictions of the individual classifiers are combined using an optimized genetic algorithm and simple majority voting in order to enhance the true classification rate. The genetic algorithm-based ensemble classification outperforms both the individual classifiers and the simple majority voting-based ensemble. It performs best on the hybrid feature space, with an accuracy of 96.45%. Compared with existing techniques, the 'iACP-GAEnsC' model achieves a remarkable improvement across various performance metrics. Based on the simulation results, the 'iACP-GAEnsC' model might be a leading tool in the field of drug design and proteomics for researchers.
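As an illustration of one of the feature representations named in this abstract, the following is a minimal sketch of g-gap dipeptide composition. It is not the authors' implementation: the function name, the choice to skip non-standard residues, and the normalization by the number of counted pairs are assumptions made here for illustration.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptide_composition(sequence, g=1):
    """Return the 400 g-gap dipeptide frequencies of a peptide as a dict.

    A g-gap dipeptide is a pair of residues separated by g intervening
    positions (g=0 gives ordinary adjacent dipeptides). Pairs containing
    a non-standard residue are skipped (an assumption of this sketch).
    """
    sequence = sequence.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(sequence) - g - 1):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid pairs): return an all-zero vector.
        return {pair: 0.0 for pair in pairs}
    return {pair: count / total for pair, count in counts.items()}
```

For example, `g_gap_dipeptide_composition("ACAC", g=1)` counts the pairs `AA` and `CC` once each, so both receive a frequency of 0.5 and the remaining 398 entries are zero.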
Affiliation(s)
- Shahid Akbar
  - Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan.
- Maqsood Hayat
  - Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan.
- Muhammad Iqbal
  - Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan.
- Mian Ahmad Jan
  - Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan.
|
273
|
Feng P, Ding H, Yang H, Chen W, Lin H, Chou KC. iRNA-PseColl: Identifying the Occurrence Sites of Different RNA Modifications by Incorporating Collective Effects of Nucleotides into PseKNC. MOLECULAR THERAPY. NUCLEIC ACIDS 2017; 7:155-163. [PMID: 28624191 PMCID: PMC5415964 DOI: 10.1016/j.omtn.2017.03.006] [Citation(s) in RCA: 215] [Impact Index Per Article: 30.7] [Received: 02/12/2017] [Revised: 03/16/2017] [Accepted: 03/17/2017] [Indexed: 11/23/2022]
Abstract
There are many different types of RNA modifications, which are essential for numerous biological processes. Knowledge of where these modifications occur in an RNA sequence is key to an in-depth understanding of their biological functions and mechanisms. Unfortunately, it is both time-consuming and laborious to determine these sites by experiments alone. Although some computational methods have been developed for this purpose, each can deal with only one type of modification. To our knowledge, no method has thus far been developed that can identify the occurrence sites for several different types of RNA modifications with one seamless package or platform. To address this challenge, a novel platform called "iRNA-PseColl" has been developed. It was formed by incorporating both the individual and collective features of the sequence elements into the general pseudo K-tuple nucleotide composition (PseKNC) of RNA via the chemicophysical properties and density distribution of its constituent nucleotides. Rigorous cross-validations have indicated that the anticipated success rates achieved by the proposed platform are quite high. To maximize the convenience for most experimental biologists, the platform's web-server has been provided at http://lin.uestc.edu.cn/server/iRNA-PseColl along with a step-by-step user guide that will allow users to easily achieve their desired results without the need to go through the mathematical details involved in this paper.
Affiliation(s)
- Pengmian Feng
  - Hebei Province Key Laboratory of Occupational Health and Safety for Coal Industry, School of Public Health, North China University of Science and Technology, Tangshan, 063000, China
- Hui Ding
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Hui Yang
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Wei Chen
  - Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan 063000, China; Gordon Life Science Institute, Boston, MA 02478, USA.
- Hao Lin
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Gordon Life Science Institute, Boston, MA 02478, USA.
- Kuo-Chen Chou
  - Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 610054, China; Gordon Life Science Institute, Boston, MA 02478, USA.
|
274
|
Jia C, Zuo Y. S-SulfPred: A sensitive predictor to capture S-sulfenylation sites based on a resampling one-sided selection undersampling-synthetic minority oversampling technique. J Theor Biol 2017; 422:84-89. [DOI: 10.1016/j.jtbi.2017.03.031] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 10/17/2016] [Revised: 03/05/2017] [Accepted: 03/20/2017] [Indexed: 10/19/2022]
|
275
|
Yang H, Li X, Cai Y, Wang Q, Li W, Liu G, Tang Y. In silico prediction of chemical subcellular localization via multi-classification methods. MEDCHEMCOMM 2017; 8:1225-1234. [PMID: 30108833 PMCID: PMC6072212 DOI: 10.1039/c7md00074j] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 02/14/2017] [Accepted: 03/22/2017] [Indexed: 12/16/2022]
Abstract
Chemical subcellular localization is closely related to drug distribution in the body and hence important in drug discovery and design. Although many in vivo and in vitro methods have been developed, in silico methods play key roles in the prediction of chemical subcellular localization due to their low costs and high performance. For that purpose, machine learning-based methods were developed here. At first, 614 unique compounds localized in the lysosome, mitochondria, nucleus and plasma membrane were collected from the literature. 80% of the compounds were used to build the models and the rest as the external validation set. Both fingerprints and molecular descriptors were used to describe the molecules, and six machine learning methods were applied to build the multi-classification models. The performance of the models was measured by 5-fold cross-validation and external validation. We further detected key substructures for each localization and analyzed potential structure-localization relationships, which could be very helpful for molecular design and modification. The key substructures can also be used as features complementary to fingerprints to improve the performance of the models.
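The evaluation protocol described in this abstract (80% of compounds for model building, the rest for external validation, plus 5-fold cross-validation) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the split was drawn, so the random shuffle, the seed, the function name `split_and_fold`, and the round-robin fold assignment are all hypothetical choices.

```python
import random


def split_and_fold(n_items, train_fraction=0.8, n_folds=5, seed=0):
    """Split item indices into a training set, an external validation set,
    and cross-validation folds drawn from the training set.

    Assumes a simple random (non-stratified) split; the original study
    may have used a different scheme.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = int(round(n_items * train_fraction))
    train, external = indices[:n_train], indices[n_train:]
    # Round-robin assignment keeps the fold sizes within one item of each other.
    folds = [train[i::n_folds] for i in range(n_folds)]
    return train, external, folds


train, external, folds = split_and_fold(614)
# In each cross-validation round, one fold is held out and the other four are used for fitting.
```

With the 614 compounds mentioned in the abstract, this sketch yields 491 training and 123 external validation indices, and the five folds partition the training set.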
Affiliation(s)
- Hongbin Yang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China.
- Xiao Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China.
- Yingchun Cai
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China.
- Qin Wang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China.
- Weihua Li
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China.
- Guixia Liu
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China.
- Yun Tang
  - Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai 200237, China.
|
276
|
He S, Liu X, Wang Y, Xu S, Lu J, Yang C, Zhou S, Sun Y, Gui W, Qin W. An effective fault diagnosis approach based on optimal weighted least squares support vector machine. CAN J CHEM ENG 2017. [DOI: 10.1002/cjce.22865] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022]
Affiliation(s)
- Shiming He
  - State Key Laboratory of Industrial Control Technology; College of Control Science & Engineering; Zhejiang University; Hangzhou 310027 P. R. China
- Xinggao Liu
  - State Key Laboratory of Industrial Control Technology; College of Control Science & Engineering; Zhejiang University; Hangzhou 310027 P. R. China
- Yalin Wang
  - School of Information Science and Engineering; Central South University; Changsha 410083 P. R. China
- Shenghu Xu
  - China Petroleum Chemical Co Jiujiang branch; Jiujiang 332004 P. R. China
- Jiangang Lu
  - State Key Laboratory of Industrial Control Technology; College of Control Science & Engineering; Zhejiang University; Hangzhou 310027 P. R. China
- Chunhua Yang
  - School of Information Science and Engineering; Central South University; Changsha 410083 P. R. China
- Shengwu Zhou
  - China Petroleum Chemical Co Jiujiang branch; Jiujiang 332004 P. R. China
- Youxian Sun
  - State Key Laboratory of Industrial Control Technology; College of Control Science & Engineering; Zhejiang University; Hangzhou 310027 P. R. China
- Weihua Gui
  - School of Information Science and Engineering; Central South University; Changsha 410083 P. R. China
- Weizhong Qin
  - China Petroleum Chemical Co Jiujiang branch; Jiujiang 332004 P. R. China
|
277
|
Goede SL, de Galan BE, Leow MKS. Personalized glucose-insulin model based on signal analysis. J Theor Biol 2017; 419:333-342. [PMID: 28039012 DOI: 10.1016/j.jtbi.2016.12.018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/20/2016] [Revised: 12/04/2016] [Accepted: 12/26/2016] [Indexed: 10/20/2022]
Abstract
Glucose plasma measurements for diabetes patients are generally presented as a glucose concentration-time profile with 15-60 min time scale intervals. This limited resolution obscures detailed dynamic events of glucose appearance and metabolism. Measurement intervals of 15 min or more could contribute to imperfections in present diabetes treatment. High-resolution data from mixed meal tolerance tests (MMTT) for 24 type 1 and type 2 diabetes patients were used in our present modeling. We introduce a model based on the physiological properties of transport, storage and utilization. This logistic approach follows the principles of electrical network analysis and signal processing theory. The method mimics the physiological equivalent of glucose homeostasis, comprising meal ingestion and absorption via the gastrointestinal tract (GIT) through to the endocrine nexus between the liver and the pancreatic alpha and beta cells. This model demystifies the metabolic 'black box' by enabling in silico simulations and fitting of individual responses to clinical data. MMTT data measured at five-minute intervals from diabetic subjects yield two independent model parameters that characterize the complete glucose system response at a personalized level. From the individual data measurements, we obtain a model which can be analyzed with a standard electrical network simulator for diagnostics and treatment optimization. The insulin dosing time scale can be accurately adjusted to match the individual requirements of characterized diabetic patients without the physical burden of treatment.
Affiliation(s)
- Simon L Goede
  - Systems Research, Oterlekerweg 4, 1841 GP Stompetoren, The Netherlands.
- Bastiaan E de Galan
  - Department of General Internal Medicine of Radboud University Nijmegen Medical Centre, Postbus 9101, 6500 HB Nijmegen, The Netherlands.
- Melvin Khee Shing Leow
  - Dept of Endocrinology, Tan Tock Seng Hospital, Singapore 308433; Office of Clinical Sciences, Duke-NUS Graduate Medical School, Singapore; Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore.
|
278
|
Liu B, Yang F, Chou KC. 2L-piRNA: A Two-Layer Ensemble Classifier for Identifying Piwi-Interacting RNAs and Their Function. MOLECULAR THERAPY-NUCLEIC ACIDS 2017. [PMID: 28624202 PMCID: PMC5415553 DOI: 10.1016/j.omtn.2017.04.008] [Citation(s) in RCA: 194] [Impact Index Per Article: 27.7] [Indexed: 11/24/2022]
Abstract
Involved in important cellular or gene functions and implicated in many kinds of cancers, piRNAs, or piwi-interacting RNAs, are small non-coding RNAs around 19–33 nt in length. Given a small non-coding RNA molecule, can we predict whether it is a piRNA from its sequence information alone? Furthermore, there are two types of piRNA: one has the function of instructing target mRNA deadenylation, and the other does not. Can we discriminate one from the other? With the avalanche of RNA sequences emerging in the postgenomic age, it is urgent to address the two problems for both basic research and drug development. Unfortunately, to the best of our knowledge, no computational method has so far been available for the second problem, let alone for the two problems together. Here, by incorporating the physicochemical properties of nucleotides into the pseudo K-tuple nucleotide composition (PseKNC), we proposed a powerful predictor called 2L-piRNA. It is a two-layer ensemble classifier, in which the first layer identifies whether a query RNA molecule is a piRNA or a non-piRNA, and the second layer identifies whether a piRNA has the function of instructing target mRNA deadenylation. Rigorous cross-validations have indicated that the success rates achieved by the proposed predictor are quite high. For the convenience of most biologists and drug development scientists, the web server for 2L-piRNA has been established at http://bioinformatics.hitsz.edu.cn/2L-piRNA/, by which users can easily get their desired results without the need to go through the mathematical details.
Affiliation(s)
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China; Gordon Life Science Institute, Belmont, MA 02478, USA.
- Fan Yang
  - School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong 518055, China
- Kuo-Chen Chou
  - Gordon Life Science Institute, Belmont, MA 02478, USA; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Center of Excellence in Genomic Medicine Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia.
|
279
|
Cheng X, Zhao SG, Xiao X, Chou KC. iATC-mHyb: a hybrid multi-label classifier for predicting the classification of anatomical therapeutic chemicals. Oncotarget 2017; 8:58494-58503. [PMID: 28938573 PMCID: PMC5601669 DOI: 10.18632/oncotarget.17028] [Citation(s) in RCA: 96] [Impact Index Per Article: 13.7] [Received: 02/08/2017] [Accepted: 03/28/2017] [Indexed: 01/18/2023] Open
Abstract
Recommended by the World Health Organization (WHO), drug compounds have been classified into 14 main ATC (Anatomical Therapeutic Chemical) classes according to their therapeutic and chemical characteristics. Given an uncharacterized compound, can we develop a computational method to quickly identify which ATC class or classes it belongs to? The information thus obtained would help adjust our focus and selection in a timely manner, significantly speeding up the drug development process. But this problem is by no means an easy one, since some drug compounds may belong to two or more ATC classes. To address this problem, using the DO (Drug Ontology) approach based on the ChEBI (Chemical Entities of Biological Interest) database, we developed a predictor called iATC-mDO. Subsequently, hybridizing it with an existing drug ATC classifier, we constructed a predictor called iATC-mHyb. Rigorous cross-validation, assessed with five different metrics, has demonstrated that iATC-mHyb is remarkably superior to the best existing predictor in identifying the ATC classes for drug compounds. For the convenience of most experimental scientists, a user-friendly web-server for iATC-mHyb has been established at http://www.jci-bioinfo.cn/iATC-mHyb, by which users can easily get their desired results without the need to go through the complicated mathematical equations involved.
Affiliation(s)
- Xiang Cheng
  - College of Information Science and Technology, Donghua University, Shanghai 201620, China; Computer Department, Jingdezhen Ceramic Institute, Jingdezhen 333001, China
- Shu-Guang Zhao
  - College of Information Science and Technology, Donghua University, Shanghai 201620, China
- Xuan Xiao
  - Computer Department, Jingdezhen Ceramic Institute, Jingdezhen 333001, China; Gordon Life Science Institute, Boston, MA 02478, USA
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA 02478, USA; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
|
280
|
Yang L, Wang S, Zhou M, Chen X, Jiang W, Zuo Y, Lv Y. Molecular classification of prostate adenocarcinoma by the integrated somatic mutation profiles and molecular network. Sci Rep 2017; 7:738. [PMID: 28389666 PMCID: PMC5429686 DOI: 10.1038/s41598-017-00872-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 12/16/2016] [Accepted: 03/20/2017] [Indexed: 01/01/2023] Open
Abstract
Prostate cancer is one of the most common cancers in men and a leading cause of cancer death worldwide, displaying a broad range of heterogeneity in terms of clinical and molecular behavior. Increasing evidence suggests that classifying prostate cancers into distinct molecular subtypes is critical to exploring the potential molecular variation underlying this heterogeneity and to better treat this cancer. In this study, the somatic mutation profiles of prostate cancer were downloaded from the TCGA database and used as the source nodes of the random walk with restart algorithm (RWRA) for generating smoothed mutation profiles in the STRING network. The smoothed mutation profiles were selected as the input matrix of the Graph-regularized Nonnegative Matrix Factorization (GNMF) for classifying patients into distinct molecular subtypes. The results were associated with most of the clinical and pathological outcomes. In addition, some bioinformatics analyses were performed for the robust subtyping, and good results were obtained. These results indicated that prostate cancers can be usefully classified according to their mutation profiles, and we hope that these subtypes will help improve the treatment stratification of this cancer in the future.
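The network smoothing step mentioned in this abstract relies on a random walk with restart. The sketch below shows the generic iteration p ← (1 − r)·W·p + r·p0 on a small adjacency matrix. It is a textbook formulation written for illustration, not the authors' code: the function name, the restart probability of 0.5, the column normalization, and the convergence tolerance are all assumptions.

```python
def random_walk_with_restart(adjacency, seed_scores, restart=0.5,
                             tol=1e-9, max_iter=1000):
    """Propagate seed scores over a network by random walk with restart.

    adjacency   : square list-of-lists with non-negative edge weights
    seed_scores : initial score per node (e.g. 1 for a mutated gene, else 0)
    Returns the approximate steady-state score of every node.
    """
    n = len(adjacency)
    total = sum(seed_scores)
    if total <= 0:
        raise ValueError("at least one seed node must have a positive score")
    # Column-normalise so that each column of W sums to 1 (or 0 if isolated).
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0
          for j in range(n)] for i in range(n)]
    p0 = [s / total for s in seed_scores]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path network A - B - C with a single seed at A.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
smoothed = random_walk_with_restart(path, [1, 0, 0])
```

In this toy example the seed node keeps the largest score and the score decays with distance from it, which is the smoothing behaviour the abstract refers to.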
Affiliation(s)
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China.
- Shiyuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Meng Zhou
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Xiaowen Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Wei Jiang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
- Yongchun Zuo
  - The Key Laboratory of Mammalian Reproductive Biology and Biotechnology of the Ministry of Education, Inner Mongolia University, Hohhot, 010021, China.
- Yingli Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China.
|
281
|
Sequence-based discrimination of protein-RNA interacting residues using a probabilistic approach. J Theor Biol 2017; 418:77-83. [DOI: 10.1016/j.jtbi.2017.01.040] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/09/2016] [Revised: 01/06/2017] [Accepted: 01/27/2017] [Indexed: 11/22/2022]
|
282
|
PDC-SGB: Prediction of effective drug combinations using a stochastic gradient boosting algorithm. J Theor Biol 2017; 417:1-7. [DOI: 10.1016/j.jtbi.2017.01.019] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 09/26/2016] [Revised: 01/06/2017] [Accepted: 01/14/2017] [Indexed: 12/12/2022]
|
283
|
Wu C, Yao S, Li X, Chen C, Hu X. Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human. Int J Mol Sci 2017; 18:E420. [PMID: 28212312 PMCID: PMC5343954 DOI: 10.3390/ijms18020420] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/03/2017] [Revised: 02/03/2017] [Accepted: 02/08/2017] [Indexed: 02/02/2023] Open
Abstract
DNA methylation plays a significant role in transcriptional regulation by repressing activity. Change of the DNA methylation level is an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can only assay a small proportion of CpG sites in the human genome, it is urgent to develop reliable computational models for predicting genome-wide DNA methylation. Here, we proposed a novel algorithm that accurately extracted sequence complexity features (seven features) and developed a support-vector-machine-based prediction model with integration of the reported DNA composition features (trinucleotide frequency and GC content, 65 features) by utilizing the methylation profiles of embryonic stem cells in human. The prediction results from 22 human chromosomes with size-varied windows showed that the 600-bp window achieved the best average accuracy of 94.7%. Moreover, comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data also demonstrated that our model has certain generalization ability. Finally, a statistical test of the experimental data and the predicted data on functional regions annotated by ChromHMM found that six out of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable in predicting DNA methylation.
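The 65 DNA composition features mentioned in this abstract are presumably the 64 trinucleotide frequencies plus the GC content of a sequence window. A minimal sketch of such a feature vector follows; the function name, the handling of ambiguous bases such as N, and the feature ordering are assumptions made here, and the seven sequence complexity features are not reproduced.

```python
from itertools import product

# All 64 trinucleotides in a fixed (lexicographic) order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def composition_features(window):
    """Return 64 trinucleotide frequencies followed by GC content (65 values).

    Trinucleotides containing anything other than A, C, G or T are ignored
    (an assumption of this sketch).
    """
    window = window.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(window) - 2):
        tri = window[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    freqs = [counts[t] / total if total else 0.0 for t in TRINUCLEOTIDES]
    n_valid = sum(window.count(base) for base in "ACGT")
    gc = (window.count("G") + window.count("C")) / n_valid if n_valid else 0.0
    return freqs + [gc]
```

A vector like this could be computed for each window (for example the 600-bp windows discussed in the abstract) and passed to any classifier that accepts fixed-length numeric input.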
Affiliation(s)
- Chengchao Wu
  - College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, China.
- Shixin Yao
  - College of Science, Huazhong Agricultural University, Wuhan 430070, China.
- Xinghao Li
  - College of Science, Huazhong Agricultural University, Wuhan 430070, China.
- Chujia Chen
  - College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, China.
- Xuehai Hu
  - College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, China.
|
284
|
Meher PK, Sahu TK, Saini V, Rao AR. Predicting antimicrobial peptides with improved accuracy by incorporating the compositional, physico-chemical and structural features into Chou's general PseAAC. Sci Rep 2017; 7:42362. [PMID: 28205576 PMCID: PMC5304217 DOI: 10.1038/srep42362] [Citation(s) in RCA: 305] [Impact Index Per Article: 43.6] [Received: 10/24/2016] [Accepted: 01/09/2017] [Indexed: 11/13/2022] Open
Abstract
Antimicrobial peptides (AMPs) are important components of the innate immune system that have been found to be effective against disease-causing pathogens. Identification of AMPs through wet-lab experiments is expensive. Therefore, an efficient computational tool is essential for identifying the best candidate AMPs prior to in vitro experimentation. In this study, we developed a support vector machine (SVM) based computational approach for predicting AMPs with improved accuracy. First, compositional, physico-chemical and structural features of the peptides were generated; these were then used as input to the SVM for prediction of AMPs. The proposed approach achieved higher accuracy than several existing approaches when compared on a benchmark dataset. Based on the proposed approach, an online prediction server, iAMPpred, has also been developed to help the scientific community in predicting AMPs; it is freely accessible at http://cabgrid.res.in:8080/amppred/. The proposed approach is expected to supplement the tools and techniques that have been developed in the past for prediction of AMPs.
Affiliation(s)
- Prabina Kumar Meher
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi-110012, India
- Tanmaya Kumar Sahu
  - Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi-110012, India
- Varsha Saini
  - Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi-110012, India; Department of Bioinformatics, Janta Vedic College, Baraut, Baghpat-250611, Uttar Pradesh, India
- Atmakuri Ramakrishna Rao
  - Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi-110012, India
|
285
|
Zheng Y, Li H, Wang Y, Meng H, Zhang Q, Zhao X. Evolutionary mechanism and biological functions of 8-mers containing CG dinucleotide in yeast. Chromosome Res 2017; 25:173-189. [PMID: 28181048 DOI: 10.1007/s10577-017-9554-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/31/2016] [Revised: 12/27/2016] [Accepted: 01/27/2017] [Indexed: 01/01/2023]
Abstract
The rules of non-random k-mer usage and their biological functions are worthy of special attention. First, the article studied human 8-mer spectra and found that, when the 8-mers were classified into three subsets under each of the 16 dinucleotide classifications, only the spectra of the cytosine-guanine (CG) dinucleotide classification formed independent unimodal distributions. Second, the distribution rules were reproduced in seven other species, including yeast, which showed that the phenomenon is universal across species. We therefore proposed two theoretical conjectures: (1) CG1 motifs (8-mers including one CG) are the nucleosome-binding motifs. (2) CG2 motifs (8-mers including two or more CGs) are the modular units of CpG islands. Our conjectures were confirmed in yeast by the following results: the maximum average area under the receiver operating characteristic curve (AUC) was obtained from CG1 information when nucleosome core sequences and linker sequences were distinguished by the three CG subsets; there was a one-to-one relationship between abundant CG1 signal regions and histone positions; the sequence changes of squeezed nucleosomes were related to the strength of CG1 signals; and an AUC value of 0.986 was obtained from CG2 information when CpG islands and non-CpG islands were distinguished by the three CG subsets.
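The three CG subsets described in this abstract can be illustrated with a short sketch that assigns each 8-mer to a class according to how many CG dinucleotides it contains. The label `CG0` for 8-mers without any CG, and the function names, are assumptions made here; the abstract itself names only CG1 and CG2.

```python
from collections import Counter


def cg_class(kmer):
    """Classify a k-mer by its number of CG dinucleotides.

    CG0: no CG (label assumed here), CG1: exactly one CG,
    CG2: two or more CGs.
    """
    n_cg = kmer.upper().count("CG")
    if n_cg == 0:
        return "CG0"
    if n_cg == 1:
        return "CG1"
    return "CG2"


def kmer_class_counts(sequence, k=8):
    """Count how many overlapping k-mers of a sequence fall into each CG class."""
    counts = Counter({"CG0": 0, "CG1": 0, "CG2": 0})
    for i in range(len(sequence) - k + 1):
        counts[cg_class(sequence[i:i + k])] += 1
    return counts
```

Because `CG` cannot overlap with itself, `str.count` gives the exact number of CG dinucleotides in each k-mer.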
Affiliation(s)
- Yan Zheng
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
- Hong Li
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China; No. 235, West University Street, Hohhot, Inner Mongolia, China.
- Yue Wang
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
- Hu Meng
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
- Qiang Zhang
  - College of Science, Inner Mongolia Agricultural University, Hohhot, 010018, China
- Xiaoqing Zhao
  - Biotechnology research centre, Inner Mongolia Academy of Agricultural and Animal Husbandry Science, Hohhot, 010021, China
|
286
|
Khan M, Hayat M, Khan SA, Iqbal N. Unb-DPC: Identify mycobacterial membrane protein types by incorporating un-biased dipeptide composition into Chou's general PseAAC. J Theor Biol 2017; 415:13-19. [DOI: 10.1016/j.jtbi.2016.12.004] [Citation(s) in RCA: 88] [Impact Index Per Article: 12.6] [Received: 08/02/2016] [Revised: 10/24/2016] [Accepted: 12/07/2016] [Indexed: 01/22/2023]
|
287
|
Xiao X, Cheng X, Su S, Mao Q, Chou KC. pLoc-mGpos: Incorporate Key Gene Ontology Information into General PseAAC for Predicting Subcellular Localization of Gram-Positive Bacterial Proteins. Natural Science 2017. [DOI: 10.4236/ns.2017.99032] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Indexed: 11/20/2022]
|
288
|
Liu B, Wu H, Chou KC. Pse-in-One 2.0: An Improved Package of Web Servers for Generating Various Modes of Pseudo Components of DNA, RNA, and Protein Sequences. Natural Science 2017. [DOI: 10.4236/ns.2017.94007] [Citation(s) in RCA: 91] [Impact Index Per Article: 13.0] [Indexed: 11/20/2022]
|
289
|
He W, Jia C. EnhancerPred2.0: predicting enhancers and their strength based on position-specific trinucleotide propensity and electron–ion interaction potential feature selection. Mol Biosyst 2017; 13:767-774. [DOI: 10.1039/c7mb00054e] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 12/16/2022]
Abstract
Enhancers are cis-acting elements that play major roles in upregulating eukaryotic gene expression by providing binding sites for transcription factors and their complexes.
Affiliation(s)
- Wenying He
- Department of Mathematics, Dalian Maritime University, Dalian 116026, China
| | - Cangzhi Jia
- Department of Mathematics, Dalian Maritime University, Dalian 116026, China
| |
|
290
|
Identification of DEP domain-containing proteins by a machine learning method and experimental analysis of their expression in human HCC tissues. Sci Rep 2016; 6:39655. [PMID: 28000796 PMCID: PMC5175133 DOI: 10.1038/srep39655] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 09/28/2016] [Accepted: 11/24/2016] [Indexed: 12/23/2022] Open
Abstract
The Dishevelled/EGL-10/Pleckstrin (DEP) domain-containing (DEPDC) proteins have seven members. However, whether this superfamily can be distinguished from other proteins based only on amino acid sequences remains unknown. Here, we describe a computational method to separate DEPDCs from non-DEPDCs. First, we examined the Pfam numbers of the known DEPDCs and used the longest sequence for each Pfam to construct a phylogenetic tree. Subsequently, we extracted 188-dimensional (188D) and 20D features of DEPDCs and non-DEPDCs and classified them with a random forest classifier. We also mined the motifs of human DEPDCs to find the related domains. Finally, we designed experimental methods to verify human DEPDC expression at the mRNA level in hepatocellular carcinoma (HCC) and adjacent normal tissues. The phylogenetic analysis showed that the DEPDC superfamily can be divided into three clusters. Moreover, the 188D and 20D features can both be used to effectively distinguish the two protein types. Motif analysis revealed that the DEP and RhoGAP domains were common in human DEPDCs, and DEPDCs were widely expressed in human HCC and the adjacent tissues, although their regulation was not identical. In conclusion, we successfully constructed a binary classifier for DEPDCs and experimentally verified their expression in human HCC tissues.
|
291
|
Lin W, Xu D. Imbalanced multi-label learning for identifying antimicrobial peptides and their functional types. Bioinformatics 2016; 32:3745-3752. [PMID: 27565585 PMCID: PMC5167070 DOI: 10.1093/bioinformatics/btw560] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Received: 05/28/2016] [Revised: 08/07/2016] [Accepted: 08/22/2016] [Indexed: 01/06/2023] Open
Abstract
MOTIVATION: With the rapid increase of infection resistance to antibiotics, it is urgent to find novel infection therapeutics. In recent years, antimicrobial peptides (AMPs) have been utilized as potential alternatives for infection therapeutics. AMPs are key components of the innate immune system and can protect the host from various pathogenic bacteria. Identifying AMPs and their functional types has led to many studies, and various predictors using machine learning have been developed. However, there is room for improvement; in particular, no predictor takes into account the lack of balance among different functional AMPs. RESULTS: In this paper, a new synthetic minority over-sampling technique on imbalanced and multi-label datasets, referred to as ML-SMOTE, was designed for processing and identifying AMPs' functional families. A novel multi-label classifier, MLAMP, was also developed using ML-SMOTE and grey pseudo amino acid composition. The classifier obtained 0.4846 subset accuracy and 0.16 hamming loss. AVAILABILITY AND IMPLEMENTATION: A user-friendly web-server for MLAMP was established at http://www.jci-bioinfo.cn/MLAMP. CONTACTS: linweizhong@jci.edu.cn or xudong@missouri.edu.
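The two multi-label metrics reported in this abstract, subset accuracy and Hamming loss, can be sketched as follows. This uses the standard textbook definitions and assumes each sample's labels are given as a set of integer label indices; the exact formulas used by the authors are not stated in the abstract, and the toy data below are invented for illustration only.

```python
from typing import Sequence, Set


def subset_accuracy(y_true: Sequence[Set[int]], y_pred: Sequence[Set[int]]) -> float:
    """Fraction of samples whose predicted label set matches the true set exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return exact / len(y_true)


def hamming_loss(y_true: Sequence[Set[int]], y_pred: Sequence[Set[int]], n_labels: int) -> float:
    """Fraction of (sample, label) pairs that are predicted incorrectly.

    The symmetric difference t ^ p holds the labels that are either
    missed or wrongly assigned for one sample.
    """
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * n_labels)


# Toy example with 4 hypothetical functional types (labels 0..3).
truth = [{0, 1}, {2}, {1, 3}]
preds = [{0, 1}, {2, 3}, {1}]
print(subset_accuracy(truth, preds))
print(hamming_loss(truth, preds, n_labels=4))
```

Subset accuracy is strict (one wrong label makes the whole sample count as a miss), which is why it can be low even when the Hamming loss is small.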
Affiliation(s)
- Weizhong Lin
- Information Engineering School, Jingdezhen Ceramic Institute, Jingdezhen 333406, China
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
| | - Dong Xu
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
| |
|
292
|
Jia C, He W. EnhancerPred: a predictor for discovering enhancers based on the combination and selection of multiple features. Sci Rep 2016; 6:38741. [PMID: 27941893 PMCID: PMC5150536 DOI: 10.1038/srep38741] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Received: 09/30/2016] [Accepted: 11/11/2016] [Indexed: 12/31/2022] Open
Abstract
Enhancers are cis elements that play an important role in regulating gene expression by enhancing it. Recent studies of modifications revealed that enhancers are a large group of functional elements with many different subgroups, which have different biological activities and regulatory effects on target genes. As powerful auxiliary tools, several computational methods have been proposed to distinguish enhancers from other regulatory elements, but only one method has considered clustering them into subgroups. In this study, we developed a predictor (called EnhancerPred) to distinguish between enhancers and nonenhancers and to determine enhancers' strength. A two-step wrapper-based feature selection method was applied to the high-dimensional feature vector from bi-profile Bayes and pseudo-nucleotide composition. Finally, the combination of 104 features from bi-profile Bayes, 1 feature from nucleotide composition and 9 features from pseudo-nucleotide composition yielded the best performance for identifying enhancers and nonenhancers, with an overall accuracy (Acc) of 77.39%. The combination of 89 features from bi-profile Bayes and 10 features from pseudo-nucleotide composition yielded the best performance for identifying strong and weak enhancers, with an overall Acc of 68.19%. The feature optimization process showed that a separate model is needed to distinguish strong enhancers from weak enhancers.
Affiliation(s)
- Cangzhi Jia
- Department of Mathematics, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China
| | - Wenying He
- Department of Mathematics, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China
| |
|
293
|
Cai Y, Liao Z, Ju Y, Liu J, Mao Y, Liu X. Resistance gene identification from Larimichthys crocea with machine learning techniques. Sci Rep 2016; 6:38367. [PMID: 27922074 PMCID: PMC5138596 DOI: 10.1038/srep38367] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/01/2016] [Accepted: 11/08/2016] [Indexed: 12/11/2022] Open
Abstract
Research on resistance genes (R-genes) plays a vital role in bioinformatics: R-genes, which give rise to the corresponding resistance proteins through transcription and translation, allow an organism to cope with adverse changes in the external environment. Identifying and predicting R-genes of Larimichthys crocea (L. crocea) is therefore meaningful, and it benefits breeding and the marine environment as well. Many of L. crocea's immune mechanisms have been explored by biological methods; however, much about them is still unclear. To extend the limited understanding of L. crocea's immune mechanisms and to detect new R-genes and R-gene-like genes, this paper proposes a combined prediction method that extracts and classifies features of the available genomic data by machine learning. The effectiveness of the feature extraction and classification methods in identifying potential novel R-genes was evaluated, and different statistical analyses were used to explore the reliability of the prediction method, which can help us further understand the immune mechanisms of L. crocea against pathogens. A web server called LCRG-Pred is available at http://server.malab.cn/rg_lc/.
Affiliation(s)
- Yinyin Cai
- School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005, China
- State Key Laboratory of Large Yellow Croaker Breeding, Ningde Fufa Fisheries Company Limited, Ningde, 352000, China
| | - Zhijun Liao
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, 350122, China
| | - Ying Ju
- School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005, China
| | - Juan Liu
- School of Aerospace Engineering, Xiamen University, Xiamen, Fujian 361005, China
| | - Yong Mao
- State Key Laboratory of Large Yellow Croaker Breeding, Ningde Fufa Fisheries Company Limited, Ningde, 352000, China
- College of Ocean and Earth Sciences, Xiamen University, Xiamen, 361102, China
| | - Xiangrong Liu
- School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005, China
- State Key Laboratory of Large Yellow Croaker Breeding, Ningde Fufa Fisheries Company Limited, Ningde, 352000, China
| |
|
294
|
Yang L, Wang S, Zhou M, Chen X, Zuo Y, Lv Y. Characterization of BioPlex network by topological properties. J Theor Biol 2016; 409:148-154. [PMID: 27552850 DOI: 10.1016/j.jtbi.2016.08.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2016] [Revised: 07/28/2016] [Accepted: 08/20/2016] [Indexed: 11/16/2022]
Abstract
Protein-protein interaction (PPI) networks are emerging as valuable prototypes to study important problems in molecular cellular biology and systems biomedicine. An analysis of the topological properties of a PPI network is very helpful for understanding the function and structure of networks. In this study, we analyzed the topological patterns in the BioPlex network containing interactions among 10,961 proteins; most interactions were previously undocumented. The BioPlex network is a comprehensive map of human protein interactions and represents the first phase of a long-term effort to profile the entire human ORFEOME collection. We observed that the BioPlex network shares several topological properties with other biological networks. We also quantified correlation profiles for the BioPlex network and compared them to randomized versions of the same network. We found that for the BioPlex network, edges between proteins with intermediate degrees were strongly suppressed, whereas edges between low-connected proteins were favored. Finally, the degrees of essential genes were compared with the degrees of non-essential genes and randomly selected proteins. There were no significant differences between the groups.
Affiliation(s)
- Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Meng Zhou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Xiaowen Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Yongchun Zuo
- The National Research Center for Animal Transgenic Biotechnology, Inner Mongolia University, Hohhot 010021, China
| | - Yingli Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
| |
|
295
|
Kuo TH, Li KB. Predicting Protein-Protein Interaction Sites Using Sequence Descriptors and Site Propensity of Neighboring Amino Acids. Int J Mol Sci 2016; 17:ijms17111788. [PMID: 27792167 PMCID: PMC5133789 DOI: 10.3390/ijms17111788] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 09/08/2016] [Revised: 10/14/2016] [Accepted: 10/18/2016] [Indexed: 12/17/2022] Open
Abstract
Information about the interface sites of Protein–Protein Interactions (PPIs) is useful for much biological research. However, despite the advancement of experimental techniques, the identification of PPI sites remains a challenging task. Using a statistical learning technique, we proposed a computational tool for predicting PPI sites. As an alternative to similar approaches requiring structural information, the proposed method takes all of its input from protein sequences. In addition to typical sequence features, our method takes into consideration that interaction sites are not randomly distributed over the protein sequence. We characterized this positional preference using protein complexes with known structures, proposed a numerical index to estimate the propensity and then incorporated the index into a learning system. The resulting predictor, without using structural information, yields an area under the ROC curve (AUC) of 0.675, recall of 0.597, precision of 0.311 and accuracy of 0.583 in a ten-fold cross-validation experiment. This performance is comparable to that of the previous approach in which structural information was used. Upon introducing B-factor data to our predictor, we demonstrated that the AUC can be further improved to 0.750. The tool is accessible at http://bsaltools.ym.edu.tw/predppis.
Affiliation(s)
- Tzu-Hao Kuo
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei 112, Taiwan.
| | - Kuo-Bin Li
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei 112, Taiwan.
- Office of Information Management, National Yang-Ming University Hospital, Yilan 260, Taiwan.
| |
|
296
|
Fan GL, Liu YL, Wang H. Identification of thermophilic proteins by incorporating evolutionary and acid dissociation information into Chou's general pseudo amino acid composition. J Theor Biol 2016; 407:138-142. [DOI: 10.1016/j.jtbi.2016.07.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 02/16/2016] [Revised: 06/24/2016] [Accepted: 07/07/2016] [Indexed: 10/21/2022]
|
297
|
Li GQ, Liu Z, Shen HB, Yu DJ. TargetM6A: Identifying N6-Methyladenosine Sites From RNA Sequences via Position-Specific Nucleotide Propensities and a Support Vector Machine. IEEE Trans Nanobioscience 2016; 15:674-682. [DOI: 10.1109/tnb.2016.2599115] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Indexed: 11/05/2022]
|
298
|
Gogoi D, Baruah VJ, Chaliha AK, Kakoti BB, Sarma D, Buragohain AK. 3D pharmacophore-based virtual screening, docking and density functional theory approach towards the discovery of novel human epidermal growth factor receptor-2 (HER2) inhibitors. J Theor Biol 2016; 411:68-80. [PMID: 27693363 DOI: 10.1016/j.jtbi.2016.09.016] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 06/13/2016] [Revised: 09/06/2016] [Accepted: 09/20/2016] [Indexed: 11/24/2022]
Abstract
Human epidermal growth factor receptor 2 (HER2) is one of the four members of the epidermal growth factor receptor (EGFR) family and is expressed to facilitate cellular proliferation across various tissue types. Therapies targeting HER2, which is a transmembrane glycoprotein with tyrosine kinase activity, offer promising prospects, especially in breast and gastric/gastroesophageal cancer patients. Persistence of both primary and acquired resistance to various routine drugs/antibodies is a disappointing outcome in the treatment of many HER2-positive cancer patients, and overcoming it requires new and improved strategies. In this study, novel HER2 inhibitors with an improved therapeutic index were identified using a highly correlating (r=0.975) ligand-based pharmacophore model (Hypo1). Hypo1 was generated from a training set of 22 compounds with HER2 inhibitory activity, and this well-validated hypothesis was subsequently used as a 3D query to screen compounds in a total of four databases, of which two were natural product databases. Further, these compounds were analyzed for compliance with Veber's drug-likeness rule and optimum ADMET parameters. The selected compounds were then subjected to molecular docking and Density Functional Theory (DFT) analysis to discern their molecular interactions at the active site of HER2. The findings presented here would be an important starting point towards the development of novel HER2 inhibitors using well-validated computational techniques.
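The Veber drug-likeness filter mentioned in this abstract can be sketched as a simple threshold check. The sketch assumes the commonly cited form of the rule (at most 10 rotatable bonds and a polar surface area of at most 140 Å²); the abstract does not state the thresholds or software the authors used, and the `Descriptors` class and the example values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    rotatable_bonds: int
    polar_surface_area: float  # in square angstroms


def passes_veber(d: Descriptors,
                 max_rotatable_bonds: int = 10,
                 max_psa: float = 140.0) -> bool:
    """Return True if both Veber criteria are satisfied."""
    return (d.rotatable_bonds <= max_rotatable_bonds
            and d.polar_surface_area <= max_psa)


# Illustrative, made-up descriptor values for two candidate compounds.
candidates = {
    "compound_A": Descriptors(rotatable_bonds=6, polar_surface_area=95.2),
    "compound_B": Descriptors(rotatable_bonds=13, polar_surface_area=120.0),
}
kept = [name for name, d in candidates.items() if passes_veber(d)]
print(kept)
```

In a screening pipeline, a filter like this would typically run on the pharmacophore hits before the more expensive docking and DFT steps.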
Affiliation(s)
- Dhrubajyoti Gogoi
- DBT-Bioinformatics Infrastructure Facility, Centre for Biotechnology and Bioinformatics, School of Science and Engineering, Dibrugarh University, Dibrugarh, Assam, India
| | - Vishwa Jyoti Baruah
- DBT-Bioinformatics Infrastructure Facility, Centre for Biotechnology and Bioinformatics, School of Science and Engineering, Dibrugarh University, Dibrugarh, Assam, India
| | - Amrita Kashyap Chaliha
- DBT-Bioinformatics Infrastructure Facility, Centre for Biotechnology and Bioinformatics, School of Science and Engineering, Dibrugarh University, Dibrugarh, Assam, India
| | - Bibhuti Bhushan Kakoti
- DBT-Bioinformatics Infrastructure Facility, Centre for Biotechnology and Bioinformatics, School of Science and Engineering, Dibrugarh University, Dibrugarh, Assam, India
| | - Diganta Sarma
- Department of Chemistry, School of Science and Engineering, Dibrugarh University, Dibrugarh, Assam, India
| | - Alak Kumar Buragohain
- DBT-Bioinformatics Infrastructure Facility, Centre for Biotechnology and Bioinformatics, School of Science and Engineering, Dibrugarh University, Dibrugarh, Assam, India.
| |
|
299
|
Li FM, Wang XQ. Identifying anticancer peptides by using improved hybrid compositions. Sci Rep 2016; 6:33910. [PMID: 27670968 PMCID: PMC5037382 DOI: 10.1038/srep33910] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 07/12/2016] [Accepted: 09/02/2016] [Indexed: 12/20/2022] Open
Abstract
Cancer is one of the main threats to human life. Identification of anticancer peptides is important for developing effective anticancer drugs. In this paper, we developed an improved predictor to identify anticancer peptides. The amino acid composition (AAC), the average chemical shifts (acACS) and the reduced amino acid composition (RAAC) were selected as features to predict anticancer peptides with a support vector machine (SVM). The overall prediction accuracy reached 93.61% in the jackknife test. The results indicated that the combined parameters were helpful for the prediction of anticancer peptides.
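Of the three feature types named in this abstract, the amino acid composition (AAC) is the simplest to illustrate. The sketch below assumes the usual definition (the fraction of each of the 20 standard residues in the peptide); the residue ordering, the handling of non-standard characters, and the example peptide are choices made here for illustration, not details taken from the paper.

```python
from collections import Counter
from typing import List

# The 20 standard amino acids in alphabetical one-letter order (an arbitrary
# but fixed ordering so that every peptide maps to the same 20 positions).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return a 20-dimensional vector of residue frequencies.

    Characters outside the 20 standard residues are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical short peptide used only to show the call.
features = amino_acid_composition("GLFDIVKKVVGALGSL")
print(len(features), round(sum(features), 6))
```

A vector like this, possibly concatenated with other descriptors, is the kind of fixed-length input that a classifier such as an SVM requires.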
Affiliation(s)
- Feng-Min Li
- College of Science, Inner Mongolia Agricultural University, Hohhot, 010018, China
| | - Xiao-Qian Wang
- College of Science, Inner Mongolia Agricultural University, Hohhot, 010018, China
| |
|
300
|
Oshikoya KA, Oreagba IA, Godman B, Oguntayo FS, Fadare J, Orubu S, Massele A, Senbanjo IO. Potential drug-drug interactions in paediatric outpatient prescriptions in Nigeria and implications for the future. Expert Rev Clin Pharmacol 2016; 9:1505-1515. [PMID: 27592636 DOI: 10.1080/17512433.2016.1232619] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 02/06/2023]
Abstract
BACKGROUND: Information regarding the incidence of drug-drug interactions (DDIs) and adverse drug events (ADEs) among paediatric patients in Nigeria is limited. METHODS: Prospective clinical audit among paediatric outpatients in four general hospitals in Nigeria over a 3-month period. Details of ADEs documented in case files were extracted. RESULTS: Among 1233 eligible patients, 208 (16.9%) received prescriptions with at least one potential DDI. Seven drug classes were implicated with antimalarial combination therapies predominating. Exposure mostly to a single potential DDI, commonly involved promethazine, artemether/lumefantrine, ciprofloxacin and artemether/lumefantrine. Exposure mostly to major and serious, and moderate and clinically significant, potential DDIs. Overall exposure similar across all age groups and across genders. A significant association was seen between severity of potential DDIs and age. Only 48 (23.1%) of these patients presented at follow-up clinics with only 15 reporting ADEs. CONCLUSION: There was exposure to potential DDIs in this population. However, potential DDIs were associated with only a few reported ADEs.
Affiliation(s)
- Kazeem Adeola Oshikoya
- Pharmacology Department, Lagos State University College of Medicine, Ikeja, Nigeria
| | - Ibrahim Adekunle Oreagba
- Pharmacology, Therapeutic and Toxicology Department, College of Medicine, University of Lagos, Idiaraba, Nigeria
| | - Brian Godman
- Division of Clinical Pharmacology, Karolinska Institute, Stockholm, Sweden
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, United Kingdom
| | - Fisayo Solomon Oguntayo
- Pharmacology, Therapeutic and Toxicology Department, College of Medicine, University of Lagos, Idiaraba, Nigeria
| | - Joseph Fadare
- Department of Pharmacology, Ekiti State University, Ado-Ekiti, Nigeria
| | - Samuel Orubu
- Faculty of Pharmacy, Niger Delta University, Wilberforce Island, Nigeria
| | - Amos Massele
- Department of Clinical Pharmacology, School of Medicine, University of Botswana, Gaborone, Botswana
| | | |
|