1
Bai H, Li QZ, Qi YC, Zhai YY, Jin W. The prediction of tumor and normal tissues based on the DNA methylation values of ten key sites. Biochimica et Biophysica Acta. Gene Regulatory Mechanisms 2022; 1865:194841. [PMID: 35798200 DOI: 10.1016/j.bbagrm.2022.194841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2022] [Revised: 05/28/2022] [Accepted: 06/28/2022] [Indexed: 06/15/2023]
Abstract
Abnormal DNA methylation can alter gene expression to promote or inhibit tumorigenesis in colon adenocarcinoma (COAD). However, finding the important genes and key sites of abnormal DNA methylation that lead to the occurrence of COAD is still a challenging task. Here, we studied the effects of DNA methylation in 12 types of genomic features on changes in gene expression in COAD, and identified 10 important COAD-related genes and the key abnormal DNA methylation sites. The effects of the important genes on prognosis were verified by survival analysis. Moreover, the important genes were shown to participate in cancer pathways and to be hub genes in a co-expression network. Based on the DNA methylation levels at the ten sites, a least diversity increment algorithm is proposed for predicting tumor and normal tissues in seventeen cancer types. Good results are obtained in the jackknife test. For example, the predictive accuracies are 94.17%, 91.28%, 89.04% and 88.89% for COAD, rectum adenocarcinoma, pancreatic adenocarcinoma and cholangiocarcinoma, respectively. Finally, by computing the enrichment scores of infiltrating immunocytes and the activity of immune pathways, we found that the genes are highly correlated with the immune microenvironment.
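The abstract names a least diversity increment rule without giving its formula. The sketch below shows the commonly used increment-of-diversity measure, D(X) = N ln N - sum(n_i ln n_i) and ID(X, S) = D(X + S) - D(X) - D(S), with a sample assigned to the class that gives the smallest increment. How the methylation values are turned into counts is not stated in the abstract; the binned count vectors, the natural logarithm and all function names here are assumptions for illustration only.

```python
import math

def diversity(counts):
    """D(X) = N*ln(N) - sum(n_i*ln(n_i)); zero counts contribute nothing."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)

def increment_of_diversity(x, s):
    """ID(X, S) = D(X + S) - D(X) - D(S); a smaller value means X resembles S more."""
    merged = [a + b for a, b in zip(x, s)]
    return diversity(merged) - diversity(x) - diversity(s)

def predict(sample, class_profiles):
    """Pick the class label whose profile yields the least increment of diversity."""
    return min(class_profiles,
               key=lambda label: increment_of_diversity(sample, class_profiles[label]))

# Hypothetical binned count vectors (assumed, not taken from the paper)
profiles = {"tumor": [80, 10, 10], "normal": [10, 10, 80]}
label = predict([8, 1, 1], profiles)
```

A sample whose counts are proportional to a class profile gives an increment of zero for that class, which is why the minimum is used as the decision rule.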
Affiliation(s)
- Hui Bai
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Qian-Zhong Li
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, Inner Mongolia University, Hohhot 010070, China
- Ye-Chen Qi
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Yuan-Yuan Zhai
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Wen Jin
  - Inner Mongolia Key Laboratory of Gene Regulation of the Metabolic Disease, Department of Clinical Medical Research Center, Inner Mongolia People's Hospital, Hohhot 010010, China
2
Liu T, Chen J, Zhang Q, Hippe K, Hunt C, Le T, Cao R, Tang H. The Development of Machine Learning Methods in discriminating Secretory Proteins of Malaria Parasite. Curr Med Chem 2021; 29:807-821. [PMID: 34636289 DOI: 10.2174/0929867328666211005140625] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/24/2021] [Revised: 07/28/2021] [Accepted: 08/15/2021] [Indexed: 11/22/2022]
Abstract
Malaria caused by Plasmodium falciparum is one of the major infectious diseases in the world. It is essential to exploit an effective method to predict secretory proteins of malaria parasites to develop effective cures and treatment. Biochemical assays can provide details for accurate identification of the secretory proteins, but these methods are expensive and time-consuming. In this paper, we summarized the machine learning-based identification algorithms and compared the construction strategies between different computational methods. Also, we discussed the use of machine learning to improve the ability of algorithms to identify proteins secreted by malaria parasites.
Affiliation(s)
- Ting Liu
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Jiamao Chen
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Qian Zhang
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Kyle Hippe
  - Department of Computer Science, Pacific Lutheran University, United States
- Cassandra Hunt
  - Department of Computer Science, Pacific Lutheran University, United States
- Thu Le
  - Department of Computer Science, Pacific Lutheran University, United States
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, United States
- Hua Tang
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
3
Cao Y, Yu C, Huang S, Wang S, Zuo Y, Yang L. Characterization and Prediction of Presynaptic and Postsynaptic Neurotoxins Based on Reduced Amino Acids and Biological Properties. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200707150512] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Background:
Presynaptic and postsynaptic neurotoxins are two important neurotoxins. Due to the important role of presynaptic and postsynaptic neurotoxins in pharmacology and neuroscience, identification of them becomes very important in biology.
Method:
In this study, the statistical test and F-score were used to calculate the difference between amino acids and biological properties. The support vector machine was used to predict the presynaptic and postsynaptic neurotoxins by using the reduced amino acid alphabet types.
Results:
By using the reduced amino acid alphabet as the input parameters of support vector machine, the overall accuracy of our classifier had increased to 91.07%, which was the highest overall accuracy in this study. When compared with the other published methods, better predictive results were obtained by our classifier.
Conclusion:
In summary, we analyzed the differences between two neurotoxins in amino acids and biological properties, and constructed a classifier that could predict these two neurotoxins by using the reduced amino acid alphabet.
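To make the idea of a reduced amino acid alphabet concrete, the following sketch maps the 20 residues onto a handful of groups and computes the composition of the reduced sequence as a feature vector. The six-group scheme, the group letters and the function names are hypothetical; the abstract does not say which reduction scheme the authors used, and the SVM step is not shown.

```python
from collections import Counter

# Hypothetical six-group reduction by broad physicochemical class (an assumption,
# not the scheme from the paper). Each key is the letter that represents the group.
GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # glycine / proline
}
RESIDUE_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}

def reduce_sequence(seq):
    """Rewrite a protein sequence in the reduced alphabet, skipping unknown residues."""
    return "".join(RESIDUE_TO_GROUP[aa] for aa in seq if aa in RESIDUE_TO_GROUP)

def reduced_composition(seq):
    """Return the frequency of each reduced letter, in the fixed order of GROUPS."""
    reduced = reduce_sequence(seq)
    counts = Counter(reduced)
    n = len(reduced)
    return [counts[g] / n if n else 0.0 for g in GROUPS]

features = reduced_composition("AKKD")
```

The resulting vector has one entry per group rather than one per residue, which is the dimensionality reduction the abstract relies on before classification.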
Affiliation(s)
- Yiyin Cao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Chunlu Yu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shenghui Huang
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Shiyuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yongchun Zuo
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
4
Xie W, Feng YE. Prediction of the Disordered Regions of Intrinsically Disordered Proteins Based on the Molecular Functions. Protein Pept Lett 2020; 27:279-286. [PMID: 30819075 DOI: 10.2174/0929866526666190226160629] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/08/2018] [Revised: 01/03/2019] [Accepted: 02/08/2019] [Indexed: 01/29/2023]
Abstract
BACKGROUND: Intrinsically disordered proteins lack a well-defined three-dimensional structure under physiological conditions while still possessing essential biological functions. They take part in various physiological processes such as signal transduction, transcription and posttranslational modification. The disordered regions are the main functional sites of intrinsically disordered proteins. Therefore, research on the disordered regions has become a hot topic. OBJECTIVE: In this paper, our aim is to analyze the features of disordered regions with different molecular functions and to predict the different disordered regions using valid features. METHODS: According to their molecular functions, we first divided the intrinsically disordered proteins in the DisProt database into six classes. Then, we extracted four features using bioinformatics methods, namely Amino Acid Index (AAIndex), codon frequency (Codon), three kinds of protein secondary structure compositions (3PSS) and Chemical Shifts (CSs), and used these features to predict the disordered regions of the different functions with a Support Vector Machine (SVM). RESULTS: The best overall accuracy was 99.29% using chemical shifts (CSs) as the feature. In feature fusion, the overall accuracy reached 88.70% using CSs+AAIndex as features, and 86.09% using CSs+AAIndex+Codon+3PSS as features. CONCLUSION: We predicted and analyzed the disordered regions based on their molecular functions. The results showed that the prediction performance can be improved by adding chemical shifts and AAIndex as features, especially chemical shifts. Moreover, the chemical shift was the most effective feature in the prediction. We hope that our results will be constructive for the study of intrinsically disordered proteins.
Affiliation(s)
- WeiXia Xie
  - College of Science, Inner Mongolia Agriculture University, Hohhot 010018, China
- Yong E Feng
  - College of Science, Inner Mongolia Agriculture University, Hohhot 010018, China
5
Bian H, Guo M, Wang J. Recognition of Mitochondrial Proteins in Plasmodium Based on the Tripeptide Composition. Front Cell Dev Biol 2020; 8:578901. [PMID: 33043014 PMCID: PMC7525148 DOI: 10.3389/fcell.2020.578901] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/01/2020] [Accepted: 08/13/2020] [Indexed: 01/31/2023] [Open Access]
Abstract
Mitochondria play essential roles in eukaryotic cells, especially in Plasmodium cells. They have several unusual evolutionary and functional features that are incredibly vital for disease diagnosis and drug design. Thus, predicting mitochondrial proteins of Plasmodium has become a worthwhile work. However, existing computational methods can only predict mitochondrial proteins of Plasmodium falciparum (P. falciparum for short), and these methods have low accuracy. It is highly desirable to design a classifier with high accuracy for predicting mitochondrial proteins for all Plasmodium species, not only P. falciparum. We proposed a novel method, named as PM-OTC, for predicting mitochondrial proteins in Plasmodium. PM-OTC uses the Support Vector Machine (SVM) as the classifier and the selected tripeptide composition as the features. We adopted the 5-fold cross-validation method to train and test PM-OTC. Results demonstrate that PM-OTC achieves an accuracy of 94.91%, and performances of PM-OTC are superior to other methods.
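As an illustration of the feature type PM-OTC is built on, the sketch below computes the full tripeptide composition of a sequence: the normalized frequency of each of the 20^3 = 8000 possible tripeptides. It is a minimal sketch only; the feature selection that yields the "selected" tripeptides and the SVM training are not shown, and the function and variable names are assumptions rather than the authors' code.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 8000 tripeptides in a fixed order, plus a lookup from tripeptide to vector index
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPEPTIDES)}

def tripeptide_composition(seq):
    """Return an 8000-dimensional vector of tripeptide frequencies for one sequence."""
    vec = [0.0] * len(TRIPEPTIDES)
    # Overlapping windows of length 3; windows with non-standard residues are ignored
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    valid = [w for w in windows if w in INDEX]
    for w in valid:
        vec[INDEX[w]] += 1.0
    if valid:
        vec = [v / len(valid) for v in vec]
    return vec

features = tripeptide_composition("ACDAC")
```

Because most of the 8000 entries are zero for any single protein, some form of feature selection before classification is a natural design choice, which is consistent with the abstract's mention of selected tripeptides.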
Affiliation(s)
- Haodong Bian
  - School of Computer Science, Inner Mongolia University, Hohhot, China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
  - Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, China
- Juan Wang
  - School of Computer Science, Inner Mongolia University, Hohhot, China
  - State Key Laboratories of Reproductive Regulation & Breeding of Grassland Livestock, Hohhot, China
6
Abstract
During the last three decades or so, many efforts have been made to study the protein cleavage sites by some disease-causing enzymes, such as HIV (Human Immunodeficiency Virus) protease and SARS (Severe Acute Respiratory Syndrome) coronavirus main proteinase. It has become increasingly clear via this mini-review that the motivation driving the aforementioned studies is quite wise, and that the results acquired through these studies are very rewarding, particularly for developing peptide drugs.
Affiliation(s)
- Kuo-Chen Chou
  - Gordon Life Science Institute, Boston, MA 02478, United States
7
MPPIF-Net: Identification of Plasmodium Falciparum Parasite Mitochondrial Proteins Using Deep Features with Multilayer Bi-directional LSTM. Processes (Basel) 2020. [DOI: 10.3390/pr8060725] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 12/29/2022] [Open Access]
Abstract
Mitochondrial proteins of Plasmodium falciparum (MPPF) are an important target for anti-malarial drugs, but identifying them through manual experimentation is costly, and in turn the production of related drugs by pharmaceutical institutions takes a long time. Therefore, it is highly desirable for pharmaceutical companies to develop a computationally automated and reliable approach to identify these proteins precisely, resulting in appropriate drug production in a timely manner. In this direction, several computationally intelligent techniques have been developed that extract local features from biological sequences using machine learning methods, followed by various classifiers to discriminate the nature of proteins. Unfortunately, these techniques perform poorly at capturing contextual features from sequence patterns, yielding non-representative classifiers. In this paper, we propose a sequence-based framework to extract deep and representative features that are trustworthy for identifying Plasmodium mitochondrial proteins. The backbone of the proposed framework is the MPPF identification-net (MPPIF-Net), which is based on a convolutional neural network (CNN) with multilayer bi-directional long short-term memory (MBD-LSTM). MPPIF-Net takes protein sequences as input and passes them through several convolution and pooling layers to extract learned features. We pass these features into our sequence learning mechanism, MBD-LSTM, which is trained to classify them into their relevant classes. The proposed model is experimentally evaluated on a newly prepared dataset, PF2095, and two existing benchmark datasets, PF175 and MPD, using the holdout method. The proposed method achieved 97.6%, 97.1%, and 99.5% testing accuracy on the PF2095, PF175, and MPD datasets, respectively, outperforming the state-of-the-art approaches.
8
Liu T, Tang H. A Brief Survey of Machine Learning Methods in Identification of Mitochondria Proteins in Malaria Parasite. Curr Pharm Des 2020; 26:3049-3058. [PMID: 32156226 DOI: 10.2174/1381612826666200310122324] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/01/2020] [Accepted: 02/10/2020] [Indexed: 11/22/2022]
Abstract
The number of human deaths caused by malaria is increasing day-by-day. In fact, the mitochondrial proteins of the malaria parasite play vital roles in the organism. For developing effective drugs and vaccines against infection, it is necessary to accurately identify mitochondrial proteins of the malaria parasite. Although precise details for the mitochondrial proteins can be provided by biochemical experiments, they are expensive and time-consuming. In this review, we summarized the machine learning-based methods for mitochondrial proteins identification in the malaria parasite and compared the construction strategies of these computational methods. Finally, we also discussed the future development of mitochondrial proteins recognition with algorithms.
Affiliation(s)
- Ting Liu
  - Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou 646000, China
- Hua Tang
  - Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou 646000, China
9
Yu B, Qiu W, Chen C, Ma A, Jiang J, Zhou H, Ma Q. SubMito-XGBoost: predicting protein submitochondrial localization by fusing multiple feature information and eXtreme gradient boosting. Bioinformatics 2019; 36:1074-1081. [DOI: 10.1093/bioinformatics/btz734] [Citation(s) in RCA: 98] [Impact Index Per Article: 19.6] [Received: 01/25/2019] [Revised: 09/04/2019] [Accepted: 09/25/2019] [Indexed: 11/13/2022] [Open Access]
Abstract
Motivation
Mitochondria are an essential organelle in most eukaryotes. They not only play an important role in energy metabolism but also take part in many critical cytopathological processes. Abnormal mitochondria can trigger a series of human diseases, such as Parkinson's disease, multifactor disorder and Type-II diabetes. Protein submitochondrial localization enables the understanding of protein function in studying disease pathogenesis and drug design.
Results
We proposed a new method, SubMito-XGBoost, for protein submitochondrial localization prediction. Three steps are included: (i) the g-gap dipeptide composition (g-gap DC), pseudo-amino acid composition (PseAAC), auto-correlation function (ACF) and Bi-gram position-specific scoring matrix (Bi-gram PSSM) are employed to extract protein sequence features, (ii) Synthetic Minority Oversampling Technique (SMOTE) is used to balance samples, and the ReliefF algorithm is applied for feature selection and (iii) the obtained feature vectors are fed into XGBoost to predict protein submitochondrial locations. SubMito-XGBoost has obtained satisfactory prediction results by the leave-one-out-cross-validation (LOOCV) compared with existing methods. The prediction accuracies of the SubMito-XGBoost method on the two training datasets M317 and M983 were 97.7% and 98.9%, which are 2.8–12.5% and 3.8–9.9% higher than other methods, respectively. The prediction accuracy of the independent test set M495 was 94.8%, which is significantly better than the existing studies. The proposed method also achieves satisfactory predictive performance on plant and non-plant protein submitochondrial datasets. SubMito-XGBoost also plays an important role in new drug design for the treatment of related diseases.
Availability and implementation
The source codes and data are publicly available at https://github.com/QUST-AIBBDRC/SubMito-XGBoost/.
Supplementary information
Supplementary data are available at Bioinformatics online.
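Step (i) of the method summarized above includes the g-gap dipeptide composition. A minimal sketch of that single feature follows: it counts residue pairs separated by g intervening positions and normalizes by the number of pairs, giving a 400-dimensional vector. The function and variable names are assumptions, g = 0 is taken to mean adjacent residues, and this is not the code from the authors' repository.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {d: i for i, d in enumerate(DIPEPTIDES)}

def ggap_dipeptide_composition(seq, g=0):
    """Frequencies of residue pairs (seq[i], seq[i + g + 1]) as a 400-dimensional vector."""
    vec = [0.0] * len(DIPEPTIDES)
    # Pair each residue with the one g positions further than its direct neighbour
    pairs = [seq[i] + seq[i + g + 1] for i in range(len(seq) - g - 1)]
    valid = [p for p in pairs if p in INDEX]
    for p in valid:
        vec[INDEX[p]] += 1.0
    if valid:
        vec = [v / len(valid) for v in vec]
    return vec

adjacent = ggap_dipeptide_composition("ACAC", g=0)
one_gap = ggap_dipeptide_composition("ACAC", g=1)
```

Varying g lets the same 400-dimensional representation capture longer-range residue correlations than the plain dipeptide composition.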
Affiliation(s)
- Bin Yu
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
  - School of Life Sciences, University of Science and Technology of China, Hefei 230027, China
  - Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
  - School of Mathematics and Statistics, Changsha University of Science and Technology, Changsha 410114, China
- Wenying Qiu
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
  - Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Cheng Chen
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
  - Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Anjun Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Jing Jiang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
  - School of Aerospace Engineering, Xiamen University, Xiamen 361001, China
- Hongyan Zhou
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, China
  - Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao 266061, China
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
10
Chou KC. Proposing Pseudo Amino Acid Components is an Important Milestone for Proteome and Genome Analyses. Int J Pept Res Ther 2019. [DOI: 10.1007/s10989-019-09910-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/28/2022]
11
Yonge F, Weixia X. Identification of Mitochondrial Proteins of Malaria Parasite Adding the New Parameter. Lett Org Chem 2019. [DOI: 10.2174/1570178615666180608100348] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
Malaria is a serious infectious disease caused by Plasmodium falciparum (P. falciparum). Mitochondrial proteins of P. falciparum are regarded as effective drug targets against malaria. Thus, it is necessary to accurately identify mitochondrial proteins of the malaria parasite. Many algorithms have been proposed for predicting mitochondrial proteins of the malaria parasite and have yielded good results. However, the parameters used by these methods were primarily based on amino acid sequences. In this study, we added a novel parameter, based on protein secondary structure, for predicting mitochondrial proteins of the malaria parasite. Firstly, we extracted three feature parameters, namely three kinds of protein secondary structure compositions (3PSS), 20 amino acid compositions (20AAC) and 400 dipeptide compositions (400DC), and used analysis of variance (ANOVA) to screen the 400 dipeptides. Secondly, we used these features to predict mitochondrial proteins of the malaria parasite with a support vector machine (SVM). Finally, we found that 1) adding the protein secondary structure feature (3PSS) can indeed improve the prediction accuracy, which demonstrates that protein secondary structure is a valid feature for predicting mitochondrial proteins of the malaria parasite; and 2) feature combination can improve the prediction results, while feature selection can reduce the dimension and simplify the calculation. We achieved a sensitivity (Sn) of 98.16%, a specificity (Sp) of 97.64% and an overall accuracy (Acc) of 97.88%, with a Matthews correlation coefficient (MCC) of 0.957, using 3PSS+20AAC+34DC as features in 15-fold cross-validation. Compared with similar work on the same dataset, this result shows the superiority of our method.
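The four reported measures follow the standard definitions for a binary classifier. The sketch below computes Sn, Sp, Acc and MCC from the entries of a confusion matrix; the function name and the example counts are illustrative assumptions and are not the counts behind the figures reported in the abstract.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Return (Sn, Sp, Acc, MCC) from true/false positive and negative counts.

    Assumes both classes are present, i.e. tp + fn > 0 and tn + fp > 0.
    """
    sn = tp / (tp + fn)                      # sensitivity: recall on positives
    sp = tn / (tn + fp)                      # specificity: recall on negatives
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc

# Illustrative counts only
sn, sp, acc, mcc = binary_metrics(tp=90, fn=10, tn=80, fp=20)
```

MCC is worth reporting alongside accuracy because it stays informative when the positive and negative classes differ in size.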
Affiliation(s)
- Feng Yonge
  - College of Science, Inner Mongolia Agriculture University, Hohhot 010018, China
- Xie Weixia
  - College of Science, Inner Mongolia Agriculture University, Hohhot 010018, China
12
Akbar S, Hayat M, Kabir M, Iqbal M. iAFP-gap-SMOTE: An Efficient Feature Extraction Scheme Gapped Dipeptide Composition is Coupled with an Oversampling Technique for Identification of Antifreeze Proteins. Lett Org Chem 2019. [DOI: 10.2174/1570178615666180816101653] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/22/2022]
Abstract
Antifreeze proteins (AFPs) play distinctive roles in maintaining the homeostasis of living organisms and protect their cells and bodies from freezing in extremely cold conditions. Owing to the high diversity of protein sequences and structures, discriminating AFPs from non-AFPs through experimental approaches is expensive and lengthy. It is, therefore, highly desirable to propose a computationally intelligent, high-throughput model that identifies AFPs quickly and accurately. To this end, a new predictor called “iAFP-gap-SMOTE” is proposed for the identification of AFPs. Protein sequences are expressed using three numerical feature extraction schemes, namely Split Amino Acid Composition, G-gap Dipeptide Composition and Reduced Amino Acid Alphabet Composition. With an imbalanced dataset, the classification hypothesis is usually biased towards the majority class. The Synthetic Minority Over-sampling Technique (SMOTE) is employed to increase the instances of the minority class and control this bias. A 10-fold cross-validation test is applied to appraise the success rates of the “iAFP-gap-SMOTE” model. In the empirical investigation, the “iAFP-gap-SMOTE” model obtained 95.02% accuracy. The comparison suggests that the accuracy of the “iAFP-gap-SMOTE” model is higher than that of the existing techniques in the literature. Our proposed model “iAFP-gap-SMOTE” might be helpful for the research community and academia.
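The oversampling step can be sketched in a few lines. SMOTE creates each synthetic minority sample by picking a minority point, choosing one of its k nearest minority neighbours, and interpolating at a random position on the segment between them. The version below is a plain-Python illustration under those standard rules; the function name, parameters and toy data are assumptions, and it is not the implementation used by the authors.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority-class feature vectors.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself), by squared distance
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        # Interpolate at a random point between x and the chosen neighbour
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

# Toy two-dimensional minority class (illustrative values only)
minority = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
new_points = smote(minority, n_new=10, k=2)
```

Oversampling of this kind is normally applied to the training folds only, so that synthetic points never leak into the data used for evaluation.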
Affiliation(s)
- Shahid Akbar
  - Department of Computer Science, Abdul Wali Khan University, Mardan, KP 23200, Pakistan
- Maqsood Hayat
  - Department of Computer Science, Abdul Wali Khan University, Mardan, KP 23200, Pakistan
- Muhammad Kabir
  - Department of Computer Science, Abdul Wali Khan University, Mardan, KP 23200, Pakistan
- Muhammad Iqbal
  - Department of Computer Science, Abdul Wali Khan University, Mardan, KP 23200, Pakistan
13
Pan Y, Wang S, Zhang Q, Lu Q, Su D, Zuo Y, Yang L. Analysis and prediction of animal toxins by various Chou's pseudo components and reduced amino acid compositions. J Theor Biol 2018; 462:221-229. [PMID: 30452961 DOI: 10.1016/j.jtbi.2018.11.010] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 09/25/2018] [Revised: 11/06/2018] [Accepted: 11/15/2018] [Indexed: 01/19/2023]
Abstract
Animal toxin proteins are disulfide-rich small peptides detected in venomous species. They are used as pharmacological tools and therapeutic agents in medicine because of the high specificity of their targets. The successful analysis and prediction of toxin proteins may be of great significance for pharmacological and therapeutic research on toxins. In this study, significant differences were found between toxins and non-toxins in amino acid compositions and several important biological properties. A random forest was proposed for the first time to predict animal toxin proteins, using 400 pseudo amino acid compositions and the dipeptide compositions of the reduced amino acid alphabet as input parameters. Based on the dipeptide composition of the reduced amino acid alphabet with 13 reduced amino acids, the best overall accuracy of 85.71% was obtained. These results indicate that our algorithm is an efficient tool for animal toxin prediction.
Affiliation(s)
- Yi Pan
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shiyuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Qi Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Qianzi Lu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Dongqing Su
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yongchun Zuo
  - The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
14
Poonam, Gupta Y, Gupta N, Singh S, Wu L, Chhikara BS, Rawat M, Rathi B. Multistage inhibitors of the malaria parasite: Emerging hope for chemoprotection and malaria eradication. Med Res Rev 2018; 38:1511-1535. [DOI: 10.1002/med.21486] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 09/12/2017] [Revised: 12/09/2017] [Accepted: 12/26/2017] [Indexed: 12/13/2022]
Affiliation(s)
- Poonam
  - Department of Chemistry, Miranda House, University of Delhi, India
- Yash Gupta
  - National Institute of Malaria Research (ICMR), New Delhi, India
- Nikesh Gupta
  - Special Centre for Nanosciences, Jawaharlal Nehru University, New Delhi, India
- Snigdha Singh
  - Laboratory for Translational Chemistry and Drug Discovery, Department of Chemistry, Hansraj College University Enclave, University of Delhi, Delhi, India
- Lidong Wu
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Key Laboratory of Control of Quality and Safety for Aquatic Products, Ministry of Agriculture, Chinese Academy of Fishery Sciences, Beijing, China
- Manmeet Rawat
  - Department of Internal Medicine, University of New Mexico School of Medicine, Albuquerque, NM, USA
- Brijesh Rathi
  - Laboratory for Translational Chemistry and Drug Discovery, Department of Chemistry, Hansraj College University Enclave, University of Delhi, Delhi, India
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA, USA
15
Akbar S, Hayat M, Iqbal M, Jan MA. iACP-GAEnsC: Evolutionary genetic algorithm based ensemble classification of anticancer peptides by utilizing hybrid feature space. Artif Intell Med 2017; 79:62-70. [PMID: 28655440 DOI: 10.1016/j.artmed.2017.06.008] [Citation(s) in RCA: 94] [Impact Index Per Article: 13.4] [Received: 04/26/2017] [Revised: 06/12/2017] [Accepted: 06/16/2017] [Indexed: 01/10/2023]
Abstract
Cancer is a fatal disease, responsible for one-quarter of all deaths in developed countries. Traditional anticancer therapies, such as chemotherapy and radiation, are highly expensive, error-prone and often ineffective. These conventional techniques also induce severe side effects on human cells. Given the perilous impact of cancer, an accurate and highly efficient intelligent computational model is desirable for the identification of anticancer peptides. In this paper, an evolutionary intelligent genetic algorithm-based ensemble model, 'iACP-GAEnsC', is proposed for the identification of anticancer peptides. In this model, the protein sequences are formulated using three different discrete feature representation methods, i.e., amphiphilic pseudo amino acid composition, g-gap dipeptide composition, and reduced amino acid alphabet composition. The performance of the extracted feature spaces is investigated separately, and the spaces are then merged to show the significance of hybridization. In addition, the predicted results of the individual classifiers are combined using an optimized genetic algorithm and a simple majority technique in order to enhance the true classification rate. The genetic algorithm-based ensemble classification outperforms both the individual classifiers and the simple majority voting-based ensemble. The genetic algorithm-based ensemble achieves its highest performance on the hybrid feature space, with an accuracy of 96.45%. In comparison with existing techniques, the 'iACP-GAEnsC' model achieves a remarkable improvement in terms of various performance metrics. Based on the simulation results, the 'iACP-GAEnsC' model might be a leading tool in the field of drug design and proteomics for researchers.
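The two ways of combining classifier outputs mentioned in the abstract can be contrasted with a small sketch. Simple majority voting counts each classifier's prediction equally, while a weighted vote lets some classifiers count more. In the paper the weighting is found by a genetic algorithm; that search is not reproduced here, and the weights, labels and function names below are hypothetical placeholders.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across classifiers."""
    scores = {}
    for label, weight in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)

# Predictions of three hypothetical base classifiers for one peptide
votes = ["ACP", "non-ACP", "ACP"]
simple = majority_vote(votes)
# Hypothetical weights standing in for the output of a genetic-algorithm search
weighted = weighted_vote(votes, [0.2, 0.7, 0.1])
```

With these placeholder weights the two rules disagree, which illustrates how an optimized weighting can overturn a simple majority when one base classifier is judged more reliable.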
Affiliation(s)
- Shahid Akbar
- Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan.
- Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan.
- Muhammad Iqbal
- Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan.
- Mian Ahmad Jan
- Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan.
|
16
|
Liu B, Wu H, Chou KC. Pse-in-One 2.0: An Improved Package of Web Servers for Generating Various Modes of Pseudo Components of DNA, RNA, and Protein Sequences. Natural Science 2017. [DOI: 10.4236/ns.2017.94007] [Citation(s) in RCA: 91] [Impact Index Per Article: 13.0] [Indexed: 11/20/2022]
|
17
|
Li L, Yu S, Xiao W, Li Y, Hu W, Huang L, Zheng X, Zhou S, Yang H. Protein submitochondrial localization from integrated sequence representation and SVM-based backward feature extraction. MOLECULAR BIOSYSTEMS 2015; 11:170-7. [PMID: 25335193 DOI: 10.1039/c4mb00340c] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 02/05/2023]
Abstract
The mitochondrion, a tiny energy factory, plays an important role in various biological processes in most eukaryotic cells.
Affiliation(s)
- Liqi Li
- Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Sanjiu Yu
- Institute of Cardiovascular Diseases of PLA, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Weidong Xiao
- Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Yongsheng Li
- Institute of Cancer, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Wenjuan Hu
- Department of Pathophysiology and High Altitude Pathology, College of High Altitude Military Medicine, Third Military Medical University, Chongqing 400038, China
- Lan Huang
- Institute of Cardiovascular Diseases of PLA, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Xiaoqi Zheng
- Department of Mathematics, Shanghai Normal University, Shanghai 200234, China
- Shiwen Zhou
- National Drug Clinical Trial Institution, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
- Hua Yang
- Department of General Surgery, Xinqiao Hospital, Third Military Medical University, Chongqing 400037, China
|
18
|
Ding H, Li D. Identification of mitochondrial proteins of malaria parasite using analysis of variance. Amino Acids 2014; 47:329-33. [DOI: 10.1007/s00726-014-1862-4] [Citation(s) in RCA: 76] [Impact Index Per Article: 7.6] [Received: 09/29/2014] [Accepted: 10/27/2014] [Indexed: 10/24/2022]
|
19
|
Predicting the types of J-proteins using clustered amino acids. BIOMED RESEARCH INTERNATIONAL 2014; 2014:935719. [PMID: 24804260 PMCID: PMC3996952 DOI: 10.1155/2014/935719] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 01/24/2014] [Revised: 03/04/2014] [Accepted: 03/13/2014] [Indexed: 01/24/2023]
Abstract
J-proteins are molecular chaperones present in a wide variety of organisms, from prokaryotes to eukaryotes. Based on their domain organization, J-proteins can be classified into four types: Type I, Type II, Type III, and Type IV. Different types of J-proteins play distinct roles in influencing cancer properties and cell death. Thus, reliably annotating the types of J-proteins is essential to better understand their molecular functions. In the present work, a support vector machine-based method was developed to identify the types of J-proteins using the tripeptide composition of a reduced amino acid alphabet. In jackknife cross-validation, a maximum overall accuracy of 94% was achieved on a stringent benchmark dataset. We also analyzed the amino acid compositions using analysis of variance and found distinct distributions of amino acids in each family of J-proteins. To enhance the practical value of the proposed model, an online web server was developed and can be freely accessed.
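The tripeptide composition of a reduced amino acid alphabet can be sketched as follows. The six-cluster grouping below is a hypothetical one chosen for illustration; the abstract does not state which clustering the authors used, and the function names are likewise assumptions.

```python
from itertools import product

# Hypothetical 6-cluster alphabet, for illustration only (not from the paper)
CLUSTERS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # conformationally special
}
RESIDUE_TO_CLUSTER = {aa: label for label, members in CLUSTERS.items() for aa in members}


def reduce_sequence(sequence):
    """Map each standard residue to its cluster label; drop anything else."""
    return "".join(RESIDUE_TO_CLUSTER[aa] for aa in sequence if aa in RESIDUE_TO_CLUSTER)


def n_peptide_composition(sequence, n=3):
    """Frequencies of all n-mers over the reduced alphabet (6**n features)."""
    reduced = reduce_sequence(sequence)
    keys = ["".join(p) for p in product(sorted(CLUSTERS), repeat=n)]
    counts = dict.fromkeys(keys, 0)
    windows = max(len(reduced) - n + 1, 0)  # number of overlapping n-mers
    for i in range(windows):
        counts[reduced[i:i + n]] += 1
    return {k: (counts[k] / windows if windows else 0.0) for k in keys}
```

With six clusters the tripeptide vector has 216 dimensions instead of the 8,000 needed for the full 20-letter alphabet, which is the usual motivation for reducing the alphabet before counting n-peptides.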
|
20
|
Das Roy R, Dash D. Selection of relevant features from amino acids enables development of robust classifiers. Amino Acids 2014; 46:1343-51. [PMID: 24604165 DOI: 10.1007/s00726-014-1697-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 07/03/2013] [Accepted: 02/14/2014] [Indexed: 12/30/2022]
Abstract
Machine learning (ML) has been extensively applied to develop models and to understand high-throughput data of biological processes. However, new ML models, trained with novel experimental results, need to be built regularly for more precise predictions. ML methods build models from numeric data, whereas biological data are generally textual (DNA, protein sequences) or images and need feature calculation algorithms to generate quantitative features. Programming skills along with domain knowledge are required to develop these algorithms. The process of knowledge discovery through ML is therefore slowed by the lack of generic tools to construct features and to build models directly from the data. Hence, we developed a schema that calculates about 5,000 features, selects relevant features, and develops protein classifiers from the training data. To demonstrate the general applicability and robustness of our method, fungal adhesins and nuclear receptor proteins were used to build classifiers, which outperformed existing classifiers when tested on independent data. Next, we built a classifier for mitochondrial proteins of Plasmodium falciparum, which causes human malaria, because the latest corresponding classifiers are not publicly accessible. Our classifier attained 98.18 % accuracy and a Matthews correlation coefficient of 0.95 in fivefold cross-validation and outperformed existing classifiers on an independent test set. We implemented this schema as a user-friendly, open-source application, Pro-Gyan ( http://code.google.com/p/pro-gyan/ ), to build and share executable classifiers without programming knowledge.
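The two metrics quoted in this abstract, accuracy and the Matthews correlation coefficient (MCC), can both be computed from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function names and the convention of returning 0.0 when the MCC denominator is zero are choices made here, not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when a row or column of the matrix is empty (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike accuracy, MCC stays low when a classifier simply favours the majority class, which is why it is often reported alongside accuracy on imbalanced protein datasets.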
Affiliation(s)
- Rishi Das Roy
- GN Ramachandran Knowledge Centre for Genome Informatics, CSIR-Institute of Genomics and Integrative Biology, Mall Road, Delhi, 110007, India
|
21
|
Feng PM, Chen W, Lin H, Chou KC. iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. Anal Biochem 2013; 442:118-25. [DOI: 10.1016/j.ab.2013.05.024] [Citation(s) in RCA: 230] [Impact Index Per Article: 20.9] [Received: 04/12/2013] [Revised: 05/21/2013] [Accepted: 05/22/2013] [Indexed: 01/22/2023]
|
22
|
Mirza MT, Khan A, Tahir M, Lee YS. MitProt-Pred: Predicting mitochondrial proteins of Plasmodium falciparum parasite using diverse physiochemical properties and ensemble classification. Comput Biol Med 2013; 43:1502-11. [PMID: 24034742 DOI: 10.1016/j.compbiomed.2013.07.024] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 04/22/2013] [Revised: 07/19/2013] [Accepted: 07/24/2013] [Indexed: 10/26/2022]
Abstract
Mitochondrial proteins of Plasmodium falciparum are important targets for anti-malarial drugs. Experimental approaches for detecting mitochondrial proteins are costly and time-consuming. Therefore, MitProt-Pred was developed; it uses Bi-profile Bayes, Pseudo Average Chemical Shift, Split Amino Acid Composition, and Pseudo Amino Acid Composition based features of the protein sequences. A hybrid feature space is also constructed by combining the individual feature spaces. These feature spaces are learned and exploited through an SVM-based ensemble. MitProt-Pred achieved significantly improved prediction performance on two standard datasets. We also developed a score-level ensemble, which outperforms the feature-level ensemble.
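The difference between a feature-level and a score-level ensemble can be shown with a minimal sketch. It assumes that each classifier outputs a score in [0, 1] for the positive class and that scores are fused by a weighted average; the abstract does not specify the fusion rule, so the function names, the weights, and the 0.5 threshold are all hypothetical.

```python
def feature_level_fusion(feature_vectors):
    """Concatenate per-representation feature vectors into one hybrid vector,
    which would then be fed to a single classifier."""
    hybrid = []
    for vec in feature_vectors:
        hybrid.extend(vec)
    return hybrid


def score_level_fusion(scores, weights=None):
    """Combine the output scores of separately trained classifiers
    by a weighted average (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def predict(scores, threshold=0.5, weights=None):
    """Return 1 (positive class) if the fused score reaches the threshold, else 0."""
    return int(score_level_fusion(scores, weights) >= threshold)
```

In the feature-level case one model sees every feature at once, while in the score-level case each feature space keeps its own classifier and only the outputs are merged.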
Affiliation(s)
- Muhammad Tayyeb Mirza
- Pattern Recognition Laboratory, Department of Computer and Information Sciences, PIEAS, Nilore, Islamabad, Pakistan
|
23
|
Determination of protein subcellular localization in apicomplexan parasites. Trends Parasitol 2012; 28:546-54. [PMID: 22995720 DOI: 10.1016/j.pt.2012.08.008] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 07/18/2012] [Revised: 08/22/2012] [Accepted: 08/24/2012] [Indexed: 11/20/2022]
Abstract
Parasites from the phylum Apicomplexa include causative agents of serious diseases such as malaria (Plasmodium spp.) and toxoplasmosis (Toxoplasma gondii). Apicomplexan parasites infect thousands of types of animal cells and send their proteins to an array of compartments within their own cell, as well as exporting proteins into and beyond their host cell. Ascertaining the destinations to which individual proteins are delivered allows researchers to better understand parasite biology and to identify potential targets for therapeutic interventions. Our toolkit for establishing subcellular locations of apicomplexan proteins is becoming more extensive and specialized, and here we review developments in this technology.
|
24
|
Chen W, Feng P, Lin H. Prediction of ketoacyl synthase family using reduced amino acid alphabets. J Ind Microbiol Biotechnol 2011; 39:579-84. [PMID: 22042516 DOI: 10.1007/s10295-011-1047-z] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 08/10/2011] [Accepted: 10/04/2011] [Indexed: 11/28/2022]
Abstract
Ketoacyl synthases are enzymes involved in fatty acid synthesis and can be classified into five families based on primary sequence similarity. Different families have different catalytic mechanisms. Developing cost-effective computational models to identify the family of ketoacyl synthases will be helpful for enzyme engineering and in knowing individual enzymes' catalytic mechanisms. In this work, a support vector machine-based method was developed to predict ketoacyl synthase family using the n-peptide composition of reduced amino acid alphabets. In jackknife cross-validation, the model based on the 2-peptide composition of a reduced amino acid alphabet of size 13 yielded the best overall accuracy of 96.44% with average accuracy of 93.36%, which is superior to other state-of-the-art methods. This result suggests that the information provided by n-peptide compositions of reduced amino acid alphabets provides efficient means for enzyme family classification and that the proposed model can be efficiently used for ketoacyl synthase family annotation.
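The abstract reports both an overall accuracy and an average accuracy for a five-family problem. One common reading is that overall accuracy is the fraction of all sequences classified correctly, while average accuracy is the mean of the per-family accuracies. The sketch below follows that reading, which is an assumption here rather than a definition taken from the paper.

```python
from collections import Counter


def overall_and_average_accuracy(y_true, y_pred):
    """Return (overall accuracy, mean of per-class accuracies)."""
    total = Counter(y_true)  # number of samples in each true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    overall = sum(correct.values()) / len(y_true)
    average = sum(correct[c] / total[c] for c in total) / len(total)
    return overall, average
```

The two values diverge when families differ in size: a large, easy family lifts the overall accuracy, while a small, hard family pulls the average accuracy down.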
Affiliation(s)
- Wei Chen
- Department of Physics, College of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China.
|