1
|
Yan J, Zhang B, Zhou M, Campbell-Valois FX, Siu SWI. A deep learning method for predicting the minimum inhibitory concentration of antimicrobial peptides against Escherichia coli using Multi-Branch-CNN and Attention. mSystems 2023; 8:e0034523. [PMID: 37431995 PMCID: PMC10506472 DOI: 10.1128/msystems.00345-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Accepted: 05/31/2023] [Indexed: 07/12/2023] Open
Abstract
Antimicrobial peptides (AMPs) are a promising alternative to antibiotics to combat drug resistance in pathogenic bacteria. However, the development of AMPs with high potency and specificity remains a challenge, and new tools to evaluate antimicrobial activity are needed to accelerate the discovery process. Therefore, we proposed MBC-Attention, a combination of a multi-branch convolutional neural network architecture and attention mechanisms, to predict the experimental minimum inhibitory concentration of peptides against Escherichia coli. The optimal MBC-Attention model achieved an average Pearson correlation coefficient (PCC) of 0.775 and a root mean squared error (RMSE) of 0.533 (log μM) in three independent tests of randomly drawn sequences from the data set. This corresponds to a 5-12% improvement in PCC and a 6-13% improvement in RMSE compared to 17 traditional machine learning models and 2 optimally tuned models using random forest and support vector machine. Ablation studies confirmed that the two proposed attention mechanisms, global attention and local attention, contributed largely to the performance improvement. IMPORTANCE: Antimicrobial peptides (AMPs) are potential candidates for replacing conventional antibiotics to combat drug resistance in pathogenic bacteria. Therefore, it is necessary to evaluate the antimicrobial activity of AMPs quantitatively. However, wet-lab experiments are labor-intensive and time-consuming. To accelerate the evaluation process, we developed a deep learning method called MBC-Attention to regress the experimental minimum inhibitory concentration of AMPs against Escherichia coli. The proposed model outperforms traditional machine learning methods. Data, scripts to reproduce experiments, and the final production models are available on GitHub.
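The two metrics reported in this abstract, PCC and RMSE, can be illustrated with a minimal sketch. This is not the authors' evaluation script; the function names and the sample values are hypothetical and chosen only to show how the two quantities are computed on log-scale MIC values.

```python
import math


def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


if __name__ == "__main__":
    # Hypothetical log10(MIC in uM) values, made up for illustration only.
    measured = [0.5, 1.2, 1.9, 2.4]
    predicted = [0.7, 1.0, 2.1, 2.0]
    print("PCC :", pearson_cc(measured, predicted))
    print("RMSE:", rmse(measured, predicted))
```

A higher PCC and a lower RMSE both indicate better agreement, which is why the abstract reports improvements in the two metrics separately.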
Affiliation(s)
- Jielu Yan
  - PAMI Research Group, Department of Computer and Information Science, University of Macau, Taipa, Macau, China
- Bob Zhang
  - PAMI Research Group, Department of Computer and Information Science, University of Macau, Taipa, Macau, China
- Mingliang Zhou
  - School of Computer Science, Chongqing University, Shapingba, Chongqing, China
- François-Xavier Campbell-Valois
  - Host-Microbe Interactions Laboratory, Center for Chemical and Synthetic Biology, Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, Ontario, Canada
  - Centre for Infection, Immunity, and Inflammation, University of Ottawa, Ottawa, Ontario, Canada
  - Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, Ontario, Canada
- Shirley W. I. Siu
  - Institute of Science and Environment, University of Saint Joseph, Macau, China
|
2
|
Hayat M, Tahir M, Alarfaj FK, Alturki R, Gazzawe F. NLP-BCH-Ens: NLP-based intelligent computational model for discrimination of malaria parasite. Comput Biol Med 2022; 149:105962. [DOI: 10.1016/j.compbiomed.2022.105962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 07/29/2022] [Accepted: 08/13/2022] [Indexed: 11/03/2022]
|
3
|
Feng C, Wu J, Wei H, Xu L, Zou Q. CRCF: A Method of Identifying Secretory Proteins of Malaria Parasites. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2149-2157. [PMID: 34061749 DOI: 10.1109/tcbb.2021.3085589] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
Malaria is a mosquito-borne disease that results in millions of cases and deaths annually. The development of a fast computational method that identifies secretory proteins of the malaria parasite is important for research on antimalarial drugs and vaccines. Thus, a method was developed to identify the secretory proteins of malaria parasites. In this method, a reduced alphabet was selected to recode the original protein sequence. A feature synthesis method was used to synthesise three different types of feature information. Finally, the random forest method was used as a classifier to identify the secretory proteins. In addition, a web server was developed to share the proposed algorithm. Experiments using the benchmark dataset demonstrated that the overall accuracy achieved by the proposed method was greater than 97.8 percent using the 10-fold cross-validation method. Furthermore, the reduced schemes and characteristic performance analyses are discussed.
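Recoding a sequence with a reduced alphabet, as described in this abstract, can be sketched as follows. The abstract does not state which reduction scheme was selected, so the six-group clustering below is a hypothetical example, as are the names `GROUPS`, `recode` and `reduced_composition`.

```python
# Hypothetical six-group clustering of the 20 standard amino acids.
# This is an illustrative assumption, not the scheme used in the paper.
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]

# Map every residue to the first letter of its group (the group representative).
REDUCE = {residue: group[0] for group in GROUPS for residue in group}
REPRESENTATIVES = "".join(group[0] for group in GROUPS)


def recode(sequence):
    """Rewrite a protein sequence in the reduced alphabet.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    return "".join(REDUCE[residue] for residue in sequence.upper())


def reduced_composition(sequence):
    """Fraction of each reduced letter in the recoded sequence."""
    reduced = recode(sequence)
    return {rep: reduced.count(rep) / len(reduced) for rep in REPRESENTATIVES}


if __name__ == "__main__":
    peptide = "MKVLDE"  # made-up example sequence
    print(recode(peptide))
    print(reduced_composition(peptide))
```

The resulting composition vector has one entry per group instead of one per amino acid, which is the dimensionality reduction that makes such features attractive as classifier input.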
|
4
|
Zou H, Yang F, Yin Z. iTTCA-MFF: identifying tumor T cell antigens based on multiple feature fusion. Immunogenetics 2022; 74:447-454. [PMID: 35246701 DOI: 10.1007/s00251-022-01258-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Accepted: 02/26/2022] [Indexed: 11/05/2022]
Abstract
Cancer is a terrible disease, and recent studies have reported that tumor T cell antigens (TTCAs) may play a promising role in cancer treatment. Since experimental methods are still expensive and time-consuming, it is highly desirable to develop automatic computational methods to identify tumor T cell antigens from the huge number of natural and synthetic peptides. Hence, in this study, a novel computational model called iTTCA-MFF was proposed to identify TTCAs. To describe the sequence effectively, the physicochemical (PC) properties of amino acids and the residue pairwise energy content matrix (RECM) were first employed to encode peptide sequences. Then, two different approaches, covariance and Pearson's correlation coefficient (PCC), were used to collect discriminative information from the PC and RECM matrices. Next, an effective feature selection approach called the least absolute shrinkage and selection operator (LASSO) was adopted to select the optimal features. These selected features were fed into a support vector machine (SVM) for identifying TTCAs. We performed experiments on two different datasets. The experimental results indicated that the proposed method is promising and may play a complementary role to the existing methods for identifying TTCAs. The datasets and code are available at https://figshare.com/articles/online_resource/iTTCA-MFF/17636120 .
Affiliation(s)
- Hongliang Zou
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, 330003, China
- Fan Yang
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, 330003, China
- Zhijian Yin
  - School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, 330003, China
|
5
|
Liu T, Chen J, Zhang Q, Hippe K, Hunt C, Le T, Cao R, Tang H. The Development of Machine Learning Methods in discriminating Secretory Proteins of Malaria Parasite. Curr Med Chem 2021; 29:807-821. [PMID: 34636289 DOI: 10.2174/0929867328666211005140625] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/24/2021] [Revised: 07/28/2021] [Accepted: 08/15/2021] [Indexed: 11/22/2022]
Abstract
Malaria caused by Plasmodium falciparum is one of the major infectious diseases in the world. An effective method for predicting the secretory proteins of malaria parasites is essential for developing effective cures and treatments. Biochemical assays can provide details for accurate identification of the secretory proteins, but these methods are expensive and time-consuming. In this paper, we summarized the machine learning-based identification algorithms and compared the construction strategies of the different computational methods. We also discussed the use of machine learning to improve the ability of algorithms to identify proteins secreted by malaria parasites.
Affiliation(s)
- Ting Liu
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Jiamao Chen
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Qian Zhang
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
- Kyle Hippe
  - Department of Computer Science, Pacific Lutheran University, United States
- Cassandra Hunt
  - Department of Computer Science, Pacific Lutheran University, United States
- Thu Le
  - Department of Computer Science, Pacific Lutheran University, United States
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, United States
- Hua Tang
  - School of Basic Medical Sciences, Southwest Medical University, Luzhou, China
|
6
|
Li H, Du H, Wang X, Gao P, Liu Y, Lin W. Remarks on Computational Method for Identifying Acid and Alkaline Enzymes. Curr Pharm Des 2020; 26:3105-3114. [PMID: 32552636 DOI: 10.2174/1381612826666200617170826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2020] [Accepted: 05/07/2020] [Indexed: 11/22/2022]
Abstract
The catalytic efficiency of enzymes is thousands of times higher than that of ordinary catalysts, so they are widely used in industrial and medical fields. However, because enzymes are proteins, they can be denatured and inactivated by high temperature or by strongly acidic or strongly alkaline environments. It is well known that most enzymes work well at a pH of 6-8, while some special enzymes remain active only in an alkaline environment with pH > 8 or an acidic environment with pH < 6. Therefore, the identification of acidic and alkaline enzymes has become a key task for industrial production. Because of the wide variety of enzymes, determining the acidity or alkalinity of an enzyme experimentally is laborious and sometimes infeasible. Converting protein sequences into numerical features and building computational models can efficiently and accurately identify the acidity and alkalinity of enzymes. This review summarizes progress in the numerical features used to represent proteins and in computational methods for identifying acidic and alkaline enzymes. We hope that this paper will provide useful ideas and guidance for computationally classifying acidic and alkaline enzymes.
Affiliation(s)
- Hongfei Li
  - School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
- Haoze Du
  - Department of Computer Science, Wake Forest University, Winston-Salem, NC, 27109, United States
- Xianfang Wang
  - School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
- Peng Gao
  - School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
- Yifeng Liu
  - School of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, China
- Weizhong Lin
  - Department of Computer Science, University of Missouri, Columbia, MO, 65211, United States
|
7
|
Yu Y, Wang S, Wang Y, Cao Y, Yu C, Pan Y, Su D, Lu Q, Zuo Y, Yang L. Using Reduced Amino Acid Alphabet and Biological Properties to Analyze and Predict Animal Neurotoxin Protein. Curr Drug Metab 2020; 21:810-817. [PMID: 32433000 DOI: 10.2174/1389200221666200520090555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2019] [Revised: 01/07/2020] [Accepted: 01/15/2020] [Indexed: 11/22/2022]
Abstract
AIMS: Because of their high affinity for specific target sites, animal neurotoxin proteins are often used as pharmacological tools and therapeutic agents in medicine to gain deep insights into the function of the nervous system. BACKGROUND AND OBJECTIVE: Animal neurotoxin proteins are one of the most common functional groups among the animal toxin proteins. Thus, it is very important to characterize and predict them. METHODS: In this study, the differences between animal neurotoxin proteins and non-toxin proteins were analyzed. RESULT: Significant differences were found between them. In addition, a support vector machine was proposed to predict animal neurotoxin proteins. Our classifier achieved an overall accuracy of 96.46%. Furthermore, random forest and k-nearest neighbors were applied to the same prediction task. CONCLUSION: The comparison indicated that the predictive performance of our classifier was better than that of the other two algorithms.
Affiliation(s)
- Yao Yu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shiyuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yakun Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yiyin Cao
  - Public Health College, Harbin Medical University, Harbin 150081, China
- Chunlu Yu
  - Public Health College, Harbin Medical University, Harbin 150081, China
- Yi Pan
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Dongqing Su
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Qianzi Lu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
|
8
|
Zheng L, Huang S, Mu N, Zhang H, Zhang J, Chang Y, Yang L, Zuo Y. RAACBook: a web server of reduced amino acid alphabet for sequence-dependent inference by using Chou's five-step rule. Database: The Journal of Biological Databases and Curation 2020; 2019:5650975. [PMID: 31802128 PMCID: PMC6893003 DOI: 10.1093/database/baz131] [Citation(s) in RCA: 41] [Impact Index Per Article: 10.3] [Received: 06/20/2019] [Revised: 10/16/2019] [Accepted: 10/17/2019] [Indexed: 12/12/2022]
Abstract
Reducing the amino acid alphabet can significantly simplify protein complexity, which could improve computational efficiency, decrease information redundancy and reduce the chance of overfitting. Although some reduced alphabets have been proposed, different classification rules could produce distinctive results for protein sequence analysis. Thus, a systematic framework for reduced alphabets is needed. In this work, we constructed a comprehensive web server called RAACBook for protein sequence analysis and machine learning application by integrating reduced alphabets. The web server contains three parts: (i) 74 types of reduced amino acid alphabet were manually extracted to generate 673 reduced amino acid clusters (RAACs) for dealing with unique protein problems. Users can easily select the desired RAACs from a multilayer browser tool. (ii) An online tool was developed to analyze the primary sequence of a protein. The tool can produce K-tuple reduced amino acid composition by defining three correlation parameters (K-tuple, g-gap, λ-correlation). The results are visualized as a sequence alignment, a merged RAA composition, a feature distribution and a logo of the reduced sequence. (iii) A machine learning server is provided to train protein classification models based on K-tuple RAAC. The optimal model can be selected according to the evaluation indexes (ROC, AUC, MCC, etc.). In conclusion, RAACBook presents a powerful and user-friendly service for protein sequence analysis and computational proteomics. RAACBook is freely available at http://bioinfor.imu.edu.cn/raacbook. Database URL: http://bioinfor.imu.edu.cn/raacbook
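The g-gap parameter mentioned in part (ii) can be illustrated for the K = 2 case. The sketch below follows one common reading of g-gap pair composition, in which two residues separated by g intervening positions are counted as a pair; whether RAACBook uses exactly this definition is an assumption, and the function name and example alphabet are hypothetical.

```python
from collections import Counter
from itertools import product


def g_gap_pair_composition(reduced_seq, alphabet, g=0):
    """Frequency of every ordered residue pair separated by g positions.

    reduced_seq is expected to be already recoded into `alphabet`.
    With g=0 this is the ordinary dipeptide (2-tuple) composition.
    """
    all_pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    counts = Counter(
        reduced_seq[i] + reduced_seq[i + g + 1]
        for i in range(len(reduced_seq) - g - 1)
    )
    total = sum(counts.values())
    # One feature per possible pair, so the vector length is len(alphabet) ** 2.
    return {p: (counts[p] / total if total else 0.0) for p in all_pairs}


if __name__ == "__main__":
    # Made-up sequence over a hypothetical three-letter reduced alphabet.
    features = g_gap_pair_composition("AKAADD", "AKD", g=1)
    print(features)
```

With a reduced alphabet of size n, the pair feature vector has n² entries rather than 400, which is the reduction in dimensionality the abstract credits with lowering redundancy and overfitting risk.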
Affiliation(s)
- Lei Zheng
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Shenghui Huang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Nengjiang Mu
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Haoyue Zhang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Jiayu Zhang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Yu Chang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Baojian Road No.157, Harbin 150081, China
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Zhaojun Road No.24, Hohhot, 010070, China
|
9
|
Liu T, Tang H. A Brief Survey of Machine Learning Methods in Identification of Mitochondria Proteins in Malaria Parasite. Curr Pharm Des 2020; 26:3049-3058. [PMID: 32156226 DOI: 10.2174/1381612826666200310122324] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/01/2020] [Accepted: 02/10/2020] [Indexed: 11/22/2022]
Abstract
The number of human deaths caused by malaria is increasing day by day. The mitochondrial proteins of the malaria parasite play vital roles in the organism. To develop effective drugs and vaccines against infection, it is necessary to accurately identify mitochondrial proteins of the malaria parasite. Although biochemical experiments can provide precise details about mitochondrial proteins, they are expensive and time-consuming. In this review, we summarized the machine learning-based methods for identifying mitochondrial proteins in the malaria parasite and compared the construction strategies of these computational methods. Finally, we discussed the future development of algorithmic recognition of mitochondrial proteins.
Affiliation(s)
- Ting Liu
  - Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou 646000, China
- Hua Tang
  - Department of Pathophysiology, Key Laboratory of Medical Electrophysiology, Ministry of Education, Southwest Medical University, Luzhou 646000, China
|
10
|
Identifying FL11 subtype by characterizing tumor immune microenvironment in prostate adenocarcinoma via Chou's 5-steps rule. Genomics 2020; 112:1500-1515. [DOI: 10.1016/j.ygeno.2019.08.021] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 06/28/2019] [Revised: 08/03/2019] [Accepted: 08/26/2019] [Indexed: 12/14/2022]
|
11
|
pLoc_bal-mHum: Predict subcellular localization of human proteins by PseAAC and quasi-balancing training dataset. Genomics 2019; 111:1274-1282. [DOI: 10.1016/j.ygeno.2018.08.007] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 05/03/2018] [Revised: 08/14/2018] [Accepted: 08/16/2018] [Indexed: 12/17/2022]
|
12
|
Narwani TJ, Craveur P, Shinada NK, Floch A, Santuz H, Vattekatte AM, Srinivasan N, Rebehmed J, Gelly JC, Etchebest C, de Brevern AG. Discrete analyses of protein dynamics. J Biomol Struct Dyn 2019; 38:2988-3002. [PMID: 31361191 DOI: 10.1080/07391102.2019.1650112] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 01/07/2023]
Abstract
Protein structures are highly dynamic macromolecules. These dynamics are often analysed through experimental and/or computational methods only for an isolated or a limited number of proteins. Here, we explore large-scale protein dynamics simulation to observe the dynamics of local protein conformations from different perspectives. We analysed molecular dynamics to investigate protein flexibility locally, using classical approaches such as RMSf and solvent accessibility, but also innovative approaches such as local entropy. First, we focussed on classical secondary structures and analysed specifically how β-strands, β-turns, and bends evolve during molecular simulations. We underlined an interesting specific bias between β-turns and bends, which are considered as the same category although their dynamics differ. Second, we used a structural alphabet that is able to approximate every part of the protein structure conformations, namely protein blocks (PBs), to analyse (i) how each initial local protein conformation evolves during dynamics and (ii) whether some exchange can exist among these PBs. Interestingly, the results are far more complex than a simple regular/rigid and coil/flexible exchange. Abbreviations: Neq, number of equivalent; PB, protein blocks; PDB, Protein DataBank; RMSf, root mean square fluctuations. Communicated by Ramaswamy H. Sarma.
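The Neq abbreviation listed in this abstract refers to an entropy-based measure of how many protein blocks a position effectively samples. A minimal sketch follows, assuming the usual definition from the protein blocks literature, Neq = exp(-Σ f_x ln f_x), where f_x is the observed frequency of block x at one position; the abstract itself does not give the formula, and the function name and example trajectories are hypothetical.

```python
import math
from collections import Counter


def neq(pb_states):
    """Equivalent number of protein blocks observed at one position.

    pb_states is the sequence of PB letters assigned to a single residue
    position across the frames of a simulation.
    Returns 1.0 when only one block is ever seen, and k when k blocks
    are sampled with equal frequency.
    """
    n = len(pb_states)
    counts = Counter(pb_states)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return math.exp(entropy)


if __name__ == "__main__":
    # Made-up per-frame PB assignments for two hypothetical positions.
    rigid_position = "mmmmmmmm"
    flexible_position = "mmddffkk"
    print(neq(rigid_position))
    print(neq(flexible_position))
```

Because the value is the exponential of the Shannon entropy, it reads directly as a count of equally populated conformations, which makes it easier to compare across positions than the raw entropy.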
Affiliation(s)
- Tarun Jairaj Narwani
  - Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France
  - Laboratoire D'Excellence GR-Ex, Paris, France
  - Institut National de la Transfusion Sanguine (INTS), Paris, France
- Pierrick Craveur
  - Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France
  - Laboratoire D'Excellence GR-Ex, Paris, France
  - Institut National de la Transfusion Sanguine (INTS), Paris, France
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Nicolas K Shinada
  - Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France
  - Laboratoire D'Excellence GR-Ex, Paris, France
  - Institut National de la Transfusion Sanguine (INTS), Paris, France
  - Discngine, SAS, Paris, France
- Aline Floch
  - Laboratoire D'Excellence GR-Ex, Paris, France
  - Etablissement Français du Sang Ile de France, Créteil, France
  - IMRB - INSERM U955 Team 2 « Transfusion et Maladies du Globule Rouge », Paris Est-Créteil Univ, Créteil, France
  - UPEC, Université Paris Est-Créteil, Créteil, France
- Hubert Santuz
  - Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France
  - Laboratoire D'Excellence GR-Ex, Paris, France
  - Institut National de la Transfusion Sanguine (INTS), Paris, France
- Akhila Melarkode Vattekatte
  - Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France
  - Laboratoire D'Excellence GR-Ex, Paris, France
  - Institut National de la Transfusion Sanguine (INTS), Paris, France
  - Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France
- Joseph Rebehmed
  - Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France
  - Laboratoire D'Excellence GR-Ex, Paris, France
  - Institut National de la Transfusion Sanguine (INTS), Paris, France
  - Department of Computer Science and Mathematics, Lebanese American University, Byblos, Lebanon
- Jean-Christophe Gelly
  - Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France
  - Laboratoire D'Excellence GR-Ex, Paris, France
  - Institut National de la Transfusion Sanguine (INTS), Paris, France
  - Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France
  - IBL, Paris, France
- Catherine Etchebest
  - Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France
  - Laboratoire D'Excellence GR-Ex, Paris, France
  - Institut National de la Transfusion Sanguine (INTS), Paris, France
  - Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France
- Alexandre G de Brevern
  - Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France
  - Laboratoire D'Excellence GR-Ex, Paris, France
  - Institut National de la Transfusion Sanguine (INTS), Paris, France
  - Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France
  - IBL, Paris, France
|
13
|
Zuo Y, Chang Y, Huang S, Zheng L, Yang L, Cao G. iDEF-PseRAAC: Identifying the Defensin Peptide by Using Reduced Amino Acid Composition Descriptor. Evol Bioinform Online 2019; 15:1176934319867088. [PMID: 31391777 PMCID: PMC6669840 DOI: 10.1177/1176934319867088] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/18/2019] [Accepted: 07/08/2019] [Indexed: 11/18/2022] Open
Abstract
Defensins, one of the major classes of host defense peptides, play a significant role in innate immunity and have evolved in almost all living organisms. High-throughput computational methods can help in designing drugs or other medical means to defend against pathogens. To take up this challenge, an up-to-date server based on a rigorous benchmark dataset, referred to as iDEF-PseRAAC, was designed in this study for predicting the defensin family. By extracting primary sequence compositions based on different types of reduced amino acid alphabet, the best overall accuracy of the selected feature subset reached 92.38%. Therefore, we conclude that the information provided by the many types of amino acid reduction offers an efficient and rational methodology for defensin identification. A free online server is available for academic users at http://bioinfor.imu.edu.cn/idpf. We expect that iDEF-PseRAAC may be a promising tool for the functional annotation of defensin proteins.
Affiliation(s)
- Yongchun Zuo
  - College of Veterinary Medicine, Inner Mongolia Agricultural University, Hohhot, China
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Yu Chang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Shenghui Huang
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Lei Zheng
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Guifang Cao
  - College of Veterinary Medicine, Inner Mongolia Agricultural University, Hohhot, China
|
14
|
Characterization of human proteins with different subcellular localizations by topological and biological properties. Genomics 2018; 111:1831-1838. [PMID: 30543849 DOI: 10.1016/j.ygeno.2018.12.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/23/2018] [Revised: 12/02/2018] [Accepted: 12/07/2018] [Indexed: 11/20/2022]
Abstract
Knowing where a protein is localized can provide valuable information for elucidating its function. In recent years, with the advances of human genomics and proteomics, it has become possible to characterize human proteins that are located in different subcellular localizations. In this study, we used topological properties and biological properties to characterize human proteins with six subcellular localizations. Almost all of these properties were found to be significantly different among the six protein categories. Network topology analysis indicated that several significant topological properties, including the degree and k-core, were higher for the mitochondrial proteins. Biological property analysis showed that the nuclear proteins appeared to be correlated with important biological functions. We hope these findings may help in comprehensively understanding the biological function of proteins and in predicting protein subcellular localization in humans.
|
15
|
Pan Y, Wang S, Zhang Q, Lu Q, Su D, Zuo Y, Yang L. Analysis and prediction of animal toxins by various Chou's pseudo components and reduced amino acid compositions. J Theor Biol 2018; 462:221-229. [PMID: 30452961 DOI: 10.1016/j.jtbi.2018.11.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 09/25/2018] [Revised: 11/06/2018] [Accepted: 11/15/2018] [Indexed: 01/19/2023]
Abstract
Animal toxin proteins are disulfide-rich small peptides found in venomous species. They are used as pharmacological tools and therapeutic agents in medicine because of their high target specificity. The successful analysis and prediction of toxin proteins may have important significance for pharmacological and therapeutic research on toxins. In this study, significant differences were found between the toxins and the non-toxins in amino acid compositions and several important biological properties. The random forest was first proposed to predict the animal toxin proteins by selecting 400 pseudo amino acid compositions and the dipeptide compositions of the reduced amino acid alphabet as the input parameters. Based on the dipeptide composition of a reduced amino acid alphabet with 13 reduced amino acids, the best overall accuracy of 85.71% was obtained. These results indicated that our algorithm was an efficient tool for animal toxin prediction.
Affiliation(s)
- Yi Pan
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shiyuan Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Qi Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Qianzi Lu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Dongqing Su
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yongchun Zuo
  - State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Lei Yang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
|
16
|
Zhang Q, Wang S, Pan Y, Su D, Lu Q, Zuo Y, Yang L. Characterization of proteins in different subcellular localizations for Escherichia coli K12. Genomics 2018; 111:1134-1141. [PMID: 30026105 DOI: 10.1016/j.ygeno.2018.07.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/02/2018] [Revised: 07/07/2018] [Accepted: 07/11/2018] [Indexed: 10/28/2022]
Abstract
Comprehensive knowledge of protein subcellular localization is an important step toward understanding protein function. Recent advances in systems biology have allowed us to develop more accurate methods for characterizing proteins at the subcellular localization level. In this study, an analysis method was developed to characterize the topological and biological properties of the cytoplasmic proteins, inner membrane proteins, outer membrane proteins and periplasmic proteins in Escherichia coli (E. coli). Statistically significant differences were found in all topological and biological properties among proteins in different subcellular localizations. In addition, the differences in the compositions of the 20 amino acids were analyzed for the four protein categories, and significant differences were found for all 20 amino acids. These findings may be helpful for understanding the relationship between protein subcellular localization and biological function.
Affiliation(s)
- Qi Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Shiyuan Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yi Pan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Dongqing Su
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Qianzi Lu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yongchun Zuo
- The State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot 010070, China
- Lei Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
|
17
|
Ju Z, He JJ. Prediction of lysine glutarylation sites by maximum relevance minimum redundancy feature selection. Anal Biochem 2018; 550:1-7. [DOI: 10.1016/j.ab.2018.04.005] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 01/17/2018] [Revised: 04/05/2018] [Accepted: 04/06/2018] [Indexed: 12/17/2022]
|
18
|
Lai HY, Chen XX, Chen W, Tang H, Lin H. Sequence-based predictive modeling to identify cancerlectins. Oncotarget 2018; 8:28169-28175. [PMID: 28423655 PMCID: PMC5438640 DOI: 10.18632/oncotarget.15963] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Received: 01/18/2017] [Accepted: 02/24/2017] [Indexed: 11/25/2022] Open
Abstract
Lectins are a diverse group of glycoproteins or carbohydrate-binding proteins that are widely distributed across species. They can specifically recognize and exclusively bind to certain kinds of saccharide groups. Cancerlectins are a group of lectins that are closely related to cancer and play a major role in the initiation, survival, growth, metastasis and spread of tumors. Several computational methods have emerged to discriminate cancerlectins from non-cancerlectins, which promote the study of pathogenic mechanisms and the clinical treatment of cancer. However, the predictive accuracies of most of these techniques are very limited. In this work, a benchmark dataset was constructed from the CancerLectinDB database, a new amino acid sequence-based strategy for feature description was developed, and the binomial distribution was then applied to screen the optimal feature set. Ultimately, an SVM-based predictor was built to distinguish cancerlectins from non-cancerlectins; it achieved an accuracy of 77.48% with an AUC of 85.52% in jackknife cross-validation. The results show that the prediction model performs better than published predictive tools.
Affiliation(s)
- Hong-Yan Lai
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Xin-Xin Chen
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
- Wei Chen
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China; Department of Physics, School of Sciences, and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, China
- Hua Tang
- Department of Pathophysiology, Southwest Medical University, Luzhou, China
- Hao Lin
- Key Laboratory for Neuro-Information of Ministry of Education, School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
|
19
|
iDPF-PseRAAAC: A Web-Server for Identifying the Defensin Peptide Family and Subfamily Using Pseudo Reduced Amino Acid Alphabet Composition. PLoS One 2015; 10:e0145541. [PMID: 26713618 PMCID: PMC4694767 DOI: 10.1371/journal.pone.0145541] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 09/17/2015] [Accepted: 12/04/2015] [Indexed: 11/29/2022] Open
Abstract
Defensins, one of the most abundant classes of antimicrobial peptides, are an essential part of the innate immunity that has evolved in most living organisms, from lower organisms to humans. To identify specific defensins as interesting antifungal leads, in this study we constructed a more rigorous benchmark dataset and developed the iDPF-PseRAAAC server to predict the defensin family and subfamily. Using reduced dipeptide compositions, the overall accuracy of the proposed method increased to 95.10% for the defensin family and 98.39% for the vertebrate subfamily, which is higher than the accuracy of other methods. The jackknife test shows an improvement of more than 4% compared with the previous method. A free online server was further established for the convenience of most experimental scientists at http://wlxy.imu.edu.cn/college/biostation/fuwu/iDPF-PseRAAAC/index.asp. A friendly guide is provided to describe how to use the web server. We anticipate that iDPF-PseRAAAC may become a useful high-throughput tool for both basic research and drug design.
|
20
|
Sankarasubramanian J, Vishnu US, Dinakaran V, Sridhar J, Gunasekaran P, Rajendhran J. Computational prediction of secretion systems and secretomes of Brucella: identification of novel type IV effectors and their interaction with the host. Mol Biosyst 2015; 12:178-90. [PMID: 26575364 DOI: 10.1039/c5mb00607d] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 01/18/2023]
Abstract
Brucella spp. are facultative intracellular pathogens that cause brucellosis in various mammals including humans. Brucella survive inside the host cells by forming vacuoles and subverting host defence systems. This study aimed to predict the secretion systems and the secretomes of Brucella spp. from 39 complete genome sequences available in the databases. Furthermore, an attempt was made to identify the type IV secretion effectors and their interactions with host proteins. We predicted the secretion systems of Brucella by the KEGG pathway and SecReT4. Brucella secretomes and type IV effectors (T4SEs) were predicted through genome-wide screening using JVirGel and S4TE, respectively. Protein-protein interactions of Brucella T4SEs with their hosts were analyzed by HPIDB 2.0. Genes coding for Sec and Tat pathways of secretion and type I (T1SS), type IV (T4SS) and type V (T5SS) secretion systems were identified, and they are conserved in all the species of Brucella. In addition to the well-known VirB operon coding for the type IV secretion system (T4SS), we have identified the presence of additional genes showing homology with T4SS of other organisms. On the whole, 10.26 to 14.94% of total proteomes were found to be either secreted (secretome) or membrane associated (membrane proteome). Approximately 1.7 to 3.0% of total proteomes were identified as type IV secretion effectors (T4SEs). Prediction of protein-protein interactions showed 29 and 36 host-pathogen specific interactions between Bos taurus (cattle)-B. abortus and Ovis aries (sheep)-B. melitensis, respectively. Functional characterization of the predicted T4SEs and their interactions with their respective hosts may reveal the secrets of host specificity of Brucella.
Affiliation(s)
- Jagadesan Sankarasubramanian
- Department of Genetics, School of Biological Sciences, Madurai Kamaraj University, Madurai, 625021, Tamil Nadu, India.
|
21
|
Fan GL, Zhang XY, Liu YL, Nang Y, Wang H. DSPMP: Discriminating secretory proteins of malaria parasite by hybridizing different descriptors of Chou's pseudo amino acid patterns. J Comput Chem 2015; 36:2317-27. [PMID: 26484844 DOI: 10.1002/jcc.24210] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 07/10/2015] [Revised: 08/20/2015] [Accepted: 08/23/2015] [Indexed: 12/28/2022]
Abstract
Identification of the proteins secreted by the malaria parasite is important for developing effective drugs and vaccines against infection. Therefore, we developed an improved predictor called "DSPMP" (Discriminating Secretory Proteins of Malaria Parasite) to identify the secretory proteins of the malaria parasite by integrating several vector features using support vector machine-based methods. DSPMP achieved an overall predictive accuracy of 98.61%, which is superior to that of the existing predictors in this field. We show that our method is capable of identifying the secretory proteins of the malaria parasite and found that the amino acid composition for buried and exposed sequences, denoted by AAC(b/e), was the most important feature for constructing the predictor. This article not only introduces a novel method for detecting the important features of sample proteins related to the malaria parasite but also provides a useful tool for tackling general protein-related problems. The DSPMP webserver is freely available at http://202.207.14.87:8032/fuwu/DSPMP/index.asp.
Affiliation(s)
- Guo-Liang Fan
- Department of Physics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
- Xiao-Yan Zhang
- Department of Physics, Inner Mongolia University of Technology, Hohhot, 010051, China
- Yan-Ling Liu
- Department of Physics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
- Yi Nang
- Department of Physics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
- Hui Wang
- Department of Physics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China
|
22
|
Feng YE. Identify Secretory Protein of Malaria Parasite with Modified Quadratic Discriminant Algorithm and Amino Acid Composition. Interdiscip Sci 2015; 8:156-161. [DOI: 10.1007/s12539-015-0112-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 09/25/2014] [Revised: 12/15/2014] [Accepted: 03/16/2015] [Indexed: 12/13/2022]
|
23
|
Zuo YC, Su WX, Zhang SH, Wang SS, Wu CY, Yang L, Li GP. Discrimination of membrane transporter protein types using K-nearest neighbor method derived from the similarity distance of total diversity measure. Mol Biosyst 2015; 11:950-7. [PMID: 25607774 DOI: 10.1039/c4mb00681j] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Indexed: 01/08/2023]
Abstract
Membrane transporters play crucial roles in the fundamental cellular processes of living organisms. Computational techniques are needed to annotate transporter functions. In this study, a multi-class K-nearest neighbor classifier based on the increment of diversity (KNN-ID) was developed to discriminate membrane transporter types, with the increment of diversity (ID) introduced as a novel similarity distance. Comparisons with multiple recently published methods showed that the proposed KNN-ID method outperformed the other methods, obtaining more than 20% improvement in overall accuracy. The overall prediction accuracy reached 83.1% when K was set to 2. The prediction sensitivity reached 76.7%, 89.1% and 80.1% for channels/pores, electrochemical potential-driven transporters and primary active transporters, respectively. Discrimination and comparison between any two different classes of transporters further demonstrated that the proposed method is a promising classifier that can play a complementary role in facilitating the functional assignment of transporters.
Affiliation(s)
- Yong-Chun Zuo
- The Key Laboratory of Mammalian Reproductive Biology and Biotechnology of the Ministry of Education, College of Life Sciences, Inner Mongolia University, Hohhot, 010021, China.
|
24
|
Improved prediction of residue flexibility by embedding optimized amino acid grouping into RSA-based linear models. Amino Acids 2014; 46:2665-80. [DOI: 10.1007/s00726-014-1817-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 04/16/2014] [Accepted: 07/21/2014] [Indexed: 11/26/2022]
|
25
|
Zuo YC, Peng Y, Liu L, Chen W, Yang L, Fan GL. Predicting peroxidase subcellular location by hybridizing different descriptors of Chou's pseudo amino acid patterns. Anal Biochem 2014; 458:14-9. [DOI: 10.1016/j.ab.2014.04.032] [Citation(s) in RCA: 81] [Impact Index Per Article: 8.1] [Received: 02/23/2014] [Revised: 04/22/2014] [Accepted: 04/25/2014] [Indexed: 11/28/2022]
|
26
|
Human proteins characterization with subcellular localizations. J Theor Biol 2014; 358:61-73. [PMID: 24862400 DOI: 10.1016/j.jtbi.2014.05.008] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 02/09/2014] [Revised: 05/04/2014] [Accepted: 05/05/2014] [Indexed: 11/20/2022]
Abstract
Proteins are responsible for performing the vast majority of cellular functions which are critical to a cell's survival. The knowledge of the subcellular localization of proteins can provide valuable information about their molecular functions. Therefore, one of the fundamental goals in cell biology and proteomics is to analyze the subcellular localizations and functions of these proteins. Recent large-scale human genomics and proteomics studies have made it possible to characterize human proteins at a subcellular localization level. In this study, according to the annotation in Swiss-Prot, 8842 human proteins were classified into seven subcellular localizations. Human proteins in the seven subcellular localizations were compared by using topological properties, biological properties, codon usage indices, mRNA expression levels, protein complexity and physicochemical properties. All these properties were found to be significantly different in the seven categories. In addition, based on these properties and pseudo-amino acid compositions, a machine learning classifier was built for the prediction of protein subcellular localization. The study presented here was an attempt to address the aforementioned properties for comparing human proteins of different subcellular localizations. We hope our findings presented in this study may provide important help for the prediction of protein subcellular localization and for understanding the general function of human proteins in cells.
|
27
|
Fan GL, Li QZ. Discriminating bioluminescent proteins by incorporating average chemical shift and evolutionary information into the general form of Chou's pseudo amino acid composition. J Theor Biol 2013; 334:45-51. [DOI: 10.1016/j.jtbi.2013.06.003] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 04/24/2013] [Revised: 05/30/2013] [Accepted: 06/03/2013] [Indexed: 01/22/2023]
|
28
|
Predicting acidic and alkaline enzymes by incorporating the average chemical shift and gene ontology informations into the general form of Chou's PseAAC. Process Biochem 2013. [DOI: 10.1016/j.procbio.2013.05.012] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Indexed: 11/20/2022]
|
29
|
Predicting secretory proteins of malaria parasite by incorporating sequence evolution information into pseudo amino acid composition via grey system model. PLoS One 2012. [PMID: 23189138 PMCID: PMC3506597 DOI: 10.1371/journal.pone.0049040] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Indexed: 01/31/2023] Open
Abstract
Malaria has become a cause of poverty and a major hindrance to economic development. The culprit of the disease is the parasite, which secretes an array of proteins within the host erythrocyte to facilitate its own survival. Accordingly, the secretory proteins of the malaria parasite have become a logical target for drug design against malaria. Unfortunately, with the increasing resistance to the drugs thus developed, the situation has become more complicated. To cope with the drug resistance problem, one strategy is to identify in a timely manner the proteins secreted by the malaria parasite, which can serve as potential drug targets. However, it is both expensive and time-consuming to identify the secretory proteins of the malaria parasite by experiments alone. To expedite the development of effective drugs against malaria, a computational predictor called "iSMP-Grey" was developed that can identify the secretory proteins of the malaria parasite based on protein sequence information alone. During the prediction process, a protein sample was formulated as a 60D (dimensional) feature vector formed by incorporating the sequence evolution information into the general form of PseAAC (pseudo amino acid composition) via a grey system model, which is particularly useful for solving complicated problems that lack sufficient information or need to process uncertain information. The jackknife test showed that iSMP-Grey achieved an overall success rate of 94.8%, remarkably higher than those of the existing predictors in this area. As a user-friendly web-server, iSMP-Grey is freely accessible to the public at http://www.jci-bioinfo.cn/iSMP-Grey. Moreover, for the convenience of most experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematical equations involved in this paper.
|
30
|
Zuo YC, Chen W, Fan GL, Li QZ. A similarity distance of diversity measure for discriminating mesophilic and thermophilic proteins. Amino Acids 2012; 44:573-80. [PMID: 22851052 DOI: 10.1007/s00726-012-1374-z] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 11/28/2011] [Accepted: 07/17/2012] [Indexed: 11/25/2022]
Abstract
The successful prediction of thermophilic proteins is useful for designing stable enzymes that are functional at high temperature. We have used the increment of diversity (ID), a novel amino acid composition-based similarity distance, in a 2-class K-nearest neighbor classifier (KNN-ID) to classify thermophilic and mesophilic proteins. Instead of extracting features from protein sequences as done previously, our approach was based on a diversity measure of symbol sequences. The similarity distance between each pair of protein sequences was first calculated to quantify how similar one sequence is to the other. The class of the query protein is then determined using the K-nearest neighbor algorithm. Comparisons with multiple recently published methods showed that the KNN-ID proposed in this study outperforms the other methods. The improved predictive performance indicated that it is a simple and effective classifier for discriminating thermophilic and mesophilic proteins. Finally, the influence of protein length and protein identity on prediction accuracy was discussed. The prediction model and dataset used in this article can be freely downloaded from http://wlxy.imu.edu.cn/college/biostation/fuwu/KNN-ID/index.htm.
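The pairwise-distance-then-vote procedure described in this abstract can be sketched roughly as follows. This is an illustration only, not the authors' code: it assumes the commonly used definition of the diversity measure, D(X) = N log N - sum(n_i log n_i), with ID(X, Y) = D(X + Y) - D(X) - D(Y), applied to raw amino acid counts. The function names and the toy training sequences are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Raw counts of the 20 standard amino acids in a sequence."""
    counts = Counter(seq.upper())
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]

def diversity(counts):
    """Diversity measure D(X) = N log N - sum(n_i log n_i) (assumed definition)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)

def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); zero when the compositions are proportional."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)

def knn_id_predict(query, training, k=1):
    """Label a query by majority vote among the k training sequences with the smallest ID."""
    q = composition(query)
    ranked = sorted(
        training,
        key=lambda item: increment_of_diversity(q, composition(item[0])),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data, invented for illustration; the labels carry no biological meaning.
training = [("AAAAKKKK", "thermophilic"), ("GGGGSSSS", "mesophilic")]
print(knn_id_predict("AAKK", training, k=1))
```

Because the ID is computed directly on count vectors, no separate feature-extraction step is needed, which matches the abstract's point about working from a diversity measure of symbol sequences.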
Affiliation(s)
- Yong-Chun Zuo
- School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China.
|
31
|
Chen W, Feng P, Lin H. Prediction of ketoacyl synthase family using reduced amino acid alphabets. J Ind Microbiol Biotechnol 2011; 39:579-84. [PMID: 22042516 DOI: 10.1007/s10295-011-1047-z] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Received: 08/10/2011] [Accepted: 10/04/2011] [Indexed: 11/28/2022]
Abstract
Ketoacyl synthases are enzymes involved in fatty acid synthesis and can be classified into five families based on primary sequence similarity. Different families have different catalytic mechanisms. Developing cost-effective computational models to identify the family of ketoacyl synthases will be helpful for enzyme engineering and in knowing individual enzymes' catalytic mechanisms. In this work, a support vector machine-based method was developed to predict ketoacyl synthase family using the n-peptide composition of reduced amino acid alphabets. In jackknife cross-validation, the model based on the 2-peptide composition of a reduced amino acid alphabet of size 13 yielded the best overall accuracy of 96.44% with average accuracy of 93.36%, which is superior to other state-of-the-art methods. This result suggests that the information provided by n-peptide compositions of reduced amino acid alphabets provides efficient means for enzyme family classification and that the proposed model can be efficiently used for ketoacyl synthase family annotation.
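As a rough illustration of the 2-peptide composition of a reduced amino acid alphabet mentioned in this abstract, the sketch below maps each residue to a group representative and counts adjacent pairs. The grouping `EXAMPLE_GROUPS` is a hypothetical 10-letter alphabet chosen only for demonstration; the abstract does not give the size-13 alphabet used in the paper, and the function names are likewise assumptions.

```python
from itertools import product

# Hypothetical grouping for illustration only (10 groups covering all 20 residues).
# The paper's actual size-13 alphabet is not stated in the abstract.
EXAMPLE_GROUPS = ["AG", "C", "DE", "FWY", "H", "ILMV", "KR", "NQ", "P", "ST"]

def build_mapping(groups):
    """Map every amino acid to the first letter of its group."""
    return {aa: group[0] for group in groups for aa in group}

def reduced_dipeptide_composition(seq, groups):
    """Frequencies of adjacent letter pairs after reducing the alphabet."""
    mapping = build_mapping(groups)
    # Residues outside the grouping (e.g. X) are skipped.
    reduced = [mapping[aa] for aa in seq.upper() if aa in mapping]
    letters = sorted({group[0] for group in groups})
    counts = {a + b: 0 for a, b in product(letters, repeat=2)}
    for a, b in zip(reduced, reduced[1:]):
        counts[a + b] += 1
    total = max(len(reduced) - 1, 1)  # avoid division by zero for very short input
    return {pair: n / total for pair, n in counts.items()}

features = reduced_dipeptide_composition("AGDE", EXAMPLE_GROUPS)
print(len(features))  # one feature per ordered pair of reduced letters
```

With a reduced alphabet of size m, the feature vector has m * m entries instead of the 400 of the full dipeptide composition, which is the main practical appeal of the reduction; the resulting vector could then be passed to a classifier such as a support vector machine.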
Affiliation(s)
- Wei Chen
- Department of Physics, College of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan 063000, China.
|
32
|
Using increment of diversity to predict mitochondrial proteins of malaria parasite: integrating pseudo-amino acid composition and structural alphabet. Amino Acids 2010; 42:1309-16. [DOI: 10.1007/s00726-010-0825-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 10/12/2010] [Accepted: 12/17/2010] [Indexed: 11/29/2022]
|
33
|
Joseph AP, Agarwal G, Mahajan S, Gelly JC, Swapna LS, Offmann B, Cadet F, Bornot A, Tyagi M, Valadié H, Schneider B, Etchebest C, Srinivasan N, De Brevern AG. A short survey on protein blocks. Biophys Rev 2010; 2:137-147. [PMID: 21731588 DOI: 10.1007/s12551-010-0036-1] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.9] [Indexed: 11/24/2022] Open
Abstract
Protein structures are classically described in terms of secondary structures. Even if the regular secondary structures have relevant physical meaning, their recognition from atomic coordinates has some important limitations, such as uncertainties in the assignment of boundaries of helical and β-strand regions. Further, on average about 50% of all residues are assigned to an irregular state, i.e., the coil. Thus different research teams have focused on abstracting the conformation of the protein backbone in localized short stretches. Using different geometric measures, local stretches in protein structures are clustered into a chosen number of states. A prototype representative of the local structures in each cluster is generally defined. These libraries of local structure prototypes are called "structural alphabets". We have developed a structural alphabet, named Protein Blocks, not only to approximate protein structures, but also to predict them from sequence. Since its development, we and other teams have explored numerous new research fields using this structural alphabet. We review here some of the most interesting applications.
Affiliation(s)
- Agnel Praveen Joseph
- DSIMB, Dynamique des Structures et Interactions des Macromolécules Biologiques Université Paris-Diderot - Paris VII INTS INSERM : U665 INTS, 6 rue Alexandre Cabanel, 75739 Paris Cedex 15, France
|