1
Reis WF, Silva MES, Gondim ACS, Torres RCF, Carneiro RF, Nagano CS, Sampaio AH, Teixeira CS, Gomes LCBF, Sousa BL, Andrade AL, Teixeira EH, Vasconcelos MA. Glucose-Binding Dioclea bicolor Lectin (DBL): Purification, Characterization, Structural Analysis, and Antibacterial Properties. Protein J 2024; 43:559-576. [PMID: 38615284 DOI: 10.1007/s10930-024-10199-9] [Accepted: 04/07/2024] [Indexed: 04/15/2024]
Abstract
In this study, we purified a lectin from the seeds of Dioclea bicolor (DBL) by affinity purification. Electrophoresis revealed three bands in DBL, corresponding to the α, β, and γ chains, with molecular masses of approximately 29, 14, and 12 kDa, respectively. Gel filtration chromatography indicated that native DBL has a molecular mass of approximately 100 kDa, consistent with a tetramer. Interestingly, DBL-induced hemagglutination was inhibited by several glucosides and mannosides, as well as by ampicillin and tetracycline, with minimum inhibitory concentration (MIC) values of 1.56-50 mM. The complete amino acid sequence of DBL comprises 237 residues and shows high similarity to other Diocleinae lectins. Circular dichroism showed that the secondary structure of DBL is predominantly β-sheet. Furthermore, the predicted DBL structure had a Discrete Optimized Protein Energy (DOPE) score of -26,642.69141 and a normalized DOPE score of -1.84041. The predicted 3D structure indicated that the DBL monomer consists of a β-sandwich. Molecular docking showed interactions between DBL and α-D-glucose, N-acetyl-D-glucosamine, α-D-mannose, α-methyl-D-mannoside, ampicillin, and tetracycline. In addition, DBL showed antimicrobial activity with an MIC of 125 μg/mL and exerted synergistic effects in combination with ampicillin and tetracycline (fractional inhibitory concentration index ≤ 0.5). DBL also significantly inhibited biofilm formation (p < 0.05) and showed no toxicity in murine fibroblasts. These results suggest that DBL exhibits antimicrobial activity and acts synergistically with antibiotics.
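The synergy criterion quoted above (FICI ≤ 0.5) follows the usual convention, typically applied to checkerboard assays, in which the fractional inhibitory concentration index sums, for each agent, its MIC in combination divided by its MIC alone. A minimal Python sketch of that standard formula follows; the function names and the MIC values in the usage line are hypothetical, not data from the DBL study, and the cut-offs above 0.5 follow one common convention that varies between laboratories.

```python
def fic_index(mic_a_alone: float, mic_a_combo: float,
              mic_b_alone: float, mic_b_combo: float) -> float:
    """FICI = MIC_A(combination)/MIC_A(alone) + MIC_B(combination)/MIC_B(alone)."""
    return mic_a_combo / mic_a_alone + mic_b_combo / mic_b_alone


def interpret_fici(fici: float) -> str:
    # Assumed convention: <= 0.5 synergy, > 4 antagonism, otherwise no interaction.
    # Some studies further split the middle band into "additive" and "indifferent".
    if fici <= 0.5:
        return "synergy"
    if fici > 4.0:
        return "antagonism"
    return "no interaction"


# Hypothetical MICs (ug/mL): agent A 128 alone, 32 combined; agent B 8 alone, 1 combined.
fici = fic_index(128.0, 32.0, 8.0, 1.0)
print(fici, interpret_fici(fici))  # 0.375 synergy
```

Both ratios are computed against each agent's own MIC alone, so the index is dimensionless and the two agents may be measured in different units.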
Affiliation(s)
- Willian F Reis
- Departamento de Ciências da Natureza E da Terra, Universidade Do Estado de Minas Gerais, Unidade de Divinópolis, Divinópolis, MG, Brazil
- Marcos E S Silva
- Faculdade de Educação de Itapipoca, Universidade Estadual Do Ceará, Itapipoca, CE, Brazil
- Faculdade de Ciências Exatas E Naturais, Universidade Do Estado Do Rio Grande Do Norte, Mossoró, RN, Brazil
- Ana C S Gondim
- Departamento de Química Orgânica E Inorgânica, Universidade Federal Do Ceará, Fortaleza, CE, Brazil
- Renato C F Torres
- Centro de Ciências Agrárias E da Biodiversidade, Universidade Federal Do Cariri, Crato, CE, Brazil
- Rômulo F Carneiro
- Laboratório de Biotecnologia Marinha - BioMar-Lab, Departamento de Engenharia de Pesca, Universidade Federal Do Ceará, Fortaleza, CE, Brazil
- Celso S Nagano
- Laboratório de Biotecnologia Marinha - BioMar-Lab, Departamento de Engenharia de Pesca, Universidade Federal Do Ceará, Fortaleza, CE, Brazil
- Alexandre H Sampaio
- Laboratório de Biotecnologia Marinha - BioMar-Lab, Departamento de Engenharia de Pesca, Universidade Federal Do Ceará, Fortaleza, CE, Brazil
- Claudener S Teixeira
- Centro de Ciências Agrárias E da Biodiversidade, Universidade Federal Do Cariri, Crato, CE, Brazil
- Lenita C B F Gomes
- Faculdade de Filosofia Dom Aureliano Matos, Universidade Estadual Do Ceará, Limoeiro Do Norte, CE, Brazil
- Bruno L Sousa
- Faculdade de Filosofia Dom Aureliano Matos, Universidade Estadual Do Ceará, Limoeiro Do Norte, CE, Brazil
- Alexandre L Andrade
- Laboratório Integrado de Biomoléculas - LIBS, Departamento de Patologia E Medicina Legal, Universidade Federal Do Ceará, Fortaleza, CE, Brazil
- Edson H Teixeira
- Laboratório Integrado de Biomoléculas - LIBS, Departamento de Patologia E Medicina Legal, Universidade Federal Do Ceará, Fortaleza, CE, Brazil
- Mayron A Vasconcelos
- Departamento de Ciências da Natureza E da Terra, Universidade Do Estado de Minas Gerais, Unidade de Divinópolis, Divinópolis, MG, Brazil
- Faculdade de Educação de Itapipoca, Universidade Estadual Do Ceará, Itapipoca, CE, Brazil
- Faculdade de Ciências Exatas E Naturais, Universidade Do Estado Do Rio Grande Do Norte, Mossoró, RN, Brazil
2
Abbass J, Parisi C. Machine learning-based prediction of proteins' architecture using sequences of amino acids and structural alphabets. J Biomol Struct Dyn 2024:1-16. [PMID: 38505995 DOI: 10.1080/07391102.2024.2328736] [Received: 11/28/2023] [Accepted: 03/05/2024] [Indexed: 03/21/2024]
Abstract
In addition to the growing number of protein structures determined by wet-laboratory experiments and deposited in the PDB repository, AlphaFold predictions have contributed to a much larger database of protein structures. Annotating such a vast number of structures has become an increasingly challenging task. CATH is widely recognized as one of the most common platforms for addressing this challenge: it classifies proteins based on their structural and evolutionary relationships, offering the scientific community an invaluable resource for uncovering various properties, including functional annotations. Because CATH annotation involves some human intervention, keeping up with the classification of rapidly expanding protein structure repositories has become exceedingly difficult, so there is a pressing need for a fully automated approach. Meanwhile, the abundance of protein sequences generated by next-generation sequencing technologies, which lack structural annotations, presents an additional challenge to the scientific community. Consequently, high-precision 'pre-annotation' of protein sequences with structural features could prove highly advantageous. In this paper, after a thorough investigation, we introduce a novel machine-learning model that can classify any protein domain, whether or not its structure is known, into one of the 40 main CATH Architectures. We achieve an F1 score of 0.92 using only the amino acid sequence and 0.94 using both the amino acid sequence and the sequence of structural alphabets. Communicated by Ramaswamy H. Sarma.
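The abstract reports F1 scores over 40 classes without stating how per-class scores are combined. Assuming macro-averaging (an assumption; micro or weighted averaging would give different numbers), the metric can be sketched in plain Python as the unweighted mean of per-class F1 values, each the harmonic mean of precision and recall. The function name is hypothetical, and the CATH architecture codes in the example serve only as class labels.

```python
from collections import Counter


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)  # predictions made per class (precision denominator)
    true_count = Counter(y_true)  # actual members per class (recall denominator)
    scores = []
    for label in labels:
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / true_count[label] if true_count[label] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels only: per-class F1 is 2/3, 2/3 and 1, so the macro average is 7/9.
y_true = ["1.10", "1.10", "2.40", "3.40"]
y_pred = ["1.10", "2.40", "2.40", "3.40"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

Macro-averaging weights rare architectures as heavily as common ones, which matters when the 40 classes are unevenly populated.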
Affiliation(s)
- Jad Abbass
- School of Computer Science and Mathematics, Kingston University, London, UK
- Charles Parisi
- School of Computer Science and Mathematics, Kingston University, London, UK
- Telecom Physique Strasbourg, Strasbourg University, Strasbourg, France
3
Wu J, Qing H, Ouyang J, Zhou J, Gao Z, Mason CE, Liu Z, Shi T. HiFun: homology independent protein function prediction by a novel protein-language self-attention model. Brief Bioinform 2023; 24:bbad311. [PMID: 37649370 DOI: 10.1093/bib/bbad311] [Received: 03/13/2023] [Revised: 07/31/2023] [Accepted: 08/08/2023] [Indexed: 09/01/2023]
Abstract
Protein function prediction based on amino acid sequence alone is an extremely challenging but important task, especially in metagenomics and metatranscriptomics, where novel proteins from new microorganisms are being uncovered at an exponential rate. Many of these proteins have extremely low homology to known proteins and cannot be annotated with homology-based or information-integrative methods. To overcome this problem, we proposed a Homology Independent protein Function annotation method (HiFun) based on a unified deep-learning model that treats the sequence as a protein language. The robustness of HiFun was evaluated using the benchmark datasets and metrics of the CAFA3 challenge. To demonstrate the utility of HiFun, we annotated 2 212 663 unknown proteins and discovered novel motifs in the UHGP-50 catalog. We demonstrated that HiFun can extract latent function-related structural features, which enables it to annotate the functions of non-homologous proteins. HiFun can substantially improve the annotation of newly discovered proteins and expand our understanding of how microorganisms adapt to various ecological niches. Moreover, we provide a free, publicly accessible web service at http://www.unimd.org/HiFun that requires only protein sequences as input, offering researchers an efficient and practical platform for predicting protein functions.
Affiliation(s)
- Jun Wu
- Center for Bioinformatics and Computational Biology, the Institute of Biomedical Sciences and The School of Life Sciences, East China Normal University, Shanghai 200241, China
- Haipeng Qing
- Center for Bioinformatics and Computational Biology, the Institute of Biomedical Sciences and The School of Life Sciences, East China Normal University, Shanghai 200241, China
- Jian Ouyang
- Center for Bioinformatics and Computational Biology, the Institute of Biomedical Sciences and The School of Life Sciences, East China Normal University, Shanghai 200241, China
- Jiajia Zhou
- Center for Bioinformatics and Computational Biology, the Institute of Biomedical Sciences and The School of Life Sciences, East China Normal University, Shanghai 200241, China
- Zihao Gao
- Center for Bioinformatics and Computational Biology, the Institute of Biomedical Sciences and The School of Life Sciences, East China Normal University, Shanghai 200241, China
- Zhichao Liu
- Nonclinical Drug Safety, Boehringer Ingelheim Pharmaceuticals, Inc., Ridgefield, Connecticut 06877, United States
- Tieliu Shi
- Center for Bioinformatics and Computational Biology, the Institute of Biomedical Sciences and The School of Life Sciences, East China Normal University, Shanghai 200241, China
- School of Statistics, Key Laboratory of Advanced Theory and Application in Statistics and Data Science-MOE, East China Normal University, Shanghai 200062, China
- Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University & Capital Medical University, Beijing 100083, China
4
Li Y, Wei Y, Xu S, Tan Q, Zong L, Wang J, Wang Y, Chen J, Hong L, Li Y. AcrNET: predicting anti-CRISPR with deep learning. Bioinformatics 2023; 39:btad259. [PMID: 37084259 PMCID: PMC10174705 DOI: 10.1093/bioinformatics/btad259] [Received: 05/23/2022] [Revised: 04/08/2023] [Accepted: 04/12/2023] [Indexed: 04/22/2023]
Abstract
MOTIVATION Anti-CRISPR proteins, an important group of proteins discovered in phages, inhibit the activity of the bacterial immune system (i.e. CRISPR-Cas), offering promise for gene editing and phage therapy. However, predicting and discovering anti-CRISPR proteins is challenging because of their high variability and fast evolution. Existing biological studies rely on known CRISPR and anti-CRISPR pairs, which may not be practical given their huge number. Computational methods, in turn, struggle with prediction performance. To address these issues, we propose a novel deep neural network for anti-CRISPR analysis (AcrNET), which achieves significant performance gains. RESULTS In both cross-fold and cross-dataset validation, our method outperforms the state-of-the-art methods. Notably, AcrNET improves prediction performance by at least 15% in F1 score on the cross-dataset test compared with the state-of-the-art deep learning method. Moreover, AcrNET is the first computational method to predict detailed anti-CRISPR classes, which may help illuminate the anti-CRISPR mechanism. By taking advantage of the Transformer protein language model ESM-1b, pre-trained on 250 million protein sequences, AcrNET overcomes the data scarcity problem. Extensive experiments and analysis suggest that the Transformer model features, evolutionary features, and local structure features complement each other, which indicates the critical properties of anti-CRISPR proteins. AlphaFold prediction, motif analysis, and docking experiments further demonstrate that AcrNET implicitly captures the evolutionarily conserved pattern and the interaction between anti-CRISPR proteins and their targets. AVAILABILITY AND IMPLEMENTATION Web server: https://proj.cse.cuhk.edu.hk/aihlab/AcrNET/. Training code and pre-trained model are available at.
Affiliation(s)
- Yunxiang Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong SAR 999077, China
- Yumeng Wei
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong SAR 999077, China
- Sheng Xu
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong SAR 999077, China
- Qingxiong Tan
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong SAR 999077, China
- Licheng Zong
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong SAR 999077, China
- Jiuming Wang
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong SAR 999077, China
- Yixuan Wang
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong SAR 999077, China
- Jiayang Chen
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong SAR 999077, China
- Liang Hong
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong SAR 999077, China
- Yu Li
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, Hong Kong SAR 999077, China
5
Chen Y, Gao L, Zhang T. Stack-VTP: prediction of vesicle transport proteins based on stacked ensemble classifier and evolutionary information. BMC Bioinformatics 2023; 24:137. [PMID: 37029385 PMCID: PMC10080812 DOI: 10.1186/s12859-023-05257-5] [Received: 08/07/2022] [Accepted: 03/28/2023] [Indexed: 04/09/2023]
Abstract
Vesicle transport proteins play an important role in the transmembrane transport of molecules and are also relevant to biomedicine, which makes their identification particularly important. We propose a method based on ensemble learning and evolutionary information to identify vesicle transport proteins. First, we preprocess the imbalanced dataset by random undersampling. Second, we extract a position-specific scoring matrix (PSSM) from each protein sequence, derive AADP-PSSM and RPSSM features from the PSSM, and use the Max-Relevance-Max-Distance (MRMD) algorithm to select the optimal feature subset. Finally, the optimal feature subset is fed into a stacked classifier to identify vesicle transport proteins. On the independent test set, the accuracy (ACC), sensitivity (SN), and specificity (SP) of our method are 82.53%, 0.774, and 0.836, respectively, which are 0.76%, 0.013, and 0.007 higher than those of the current state-of-the-art methods.
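AADP-PSSM, as commonly defined in the PSSM-feature literature, concatenates a 20-dimensional amino acid composition of the PSSM (column means) with a 400-dimensional dipeptide composition (mean products of scores at adjacent positions), giving 420 features. The sketch below assumes an L x 20 PSSM already loaded as a list of rows; the function name is hypothetical, and Stack-VTP's exact preprocessing (for example, whether raw log-odds scores are first scaled with a logistic function) is not described in the abstract, so treat this as a generic illustration rather than the authors' code.

```python
def aadp_pssm(pssm: list[list[float]]) -> list[float]:
    """Return a 420-dim AADP-PSSM vector: 20 AAC-PSSM values followed by 400 DPC-PSSM values.

    pssm is an L x 20 matrix (one row per residue, one column per amino acid type), L >= 2.
    """
    length = len(pssm)
    if length < 2:
        raise ValueError("AADP-PSSM needs at least two residues")
    # AAC-PSSM: mean score of each amino acid column over all residues.
    aac = [sum(row[j] for row in pssm) / length for j in range(20)]
    # DPC-PSSM: for each column pair (i, j), mean of P[k][i] * P[k+1][j] over adjacent positions.
    dpc = [
        sum(pssm[k][i] * pssm[k + 1][j] for k in range(length - 1)) / (length - 1)
        for i in range(20)
        for j in range(20)
    ]
    return aac + dpc


toy = [[1.0] * 20, [2.0] * 20, [3.0] * 20]  # hypothetical 3-residue PSSM
features = aadp_pssm(toy)
print(len(features), features[0], features[20])  # 420 2.0 4.0
```

The fixed output length is what makes this kind of feature usable with standard classifiers: sequences of any length map to the same 420-dimensional space before feature selection such as MRMD.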
Affiliation(s)
- Yu Chen
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Lixin Gao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Tianjiao Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
6
Zhu L, Wang X, Li F, Song J. PreAcrs: a machine learning framework for identifying anti-CRISPR proteins. BMC Bioinformatics 2022; 23:444. [PMID: 36284264 PMCID: PMC9597991 DOI: 10.1186/s12859-022-04986-3] [Received: 06/26/2022] [Accepted: 10/14/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Anti-CRISPR proteins are potent modulators that inhibit the CRISPR-Cas immune system and have great potential as genome-editing tools in gene editing and gene therapy. Extensive studies have shown that anti-CRISPR proteins are essential for modifying endogenous genes, promoting the RNA-guided binding and cleavage of DNA or RNA substrates. In recent years, identifying and characterizing anti-CRISPR proteins has become an active and significant research topic in bioinformatics. However, because most anti-CRISPR proteins share little similarity with currently known ones, traditional screening methods are time-consuming and inefficient. Machine learning methods, with their powerful predictive capability, could fill this gap and provide a new perspective on anti-CRISPR protein identification. RESULTS Here, we present a novel machine learning ensemble predictor, called PreAcrs, to identify anti-CRISPR proteins directly from protein sequences. Three features and eight different machine learning algorithms were used to train PreAcrs. PreAcrs outperformed other existing methods and significantly improved the prediction accuracy for identifying anti-CRISPR proteins. CONCLUSIONS In summary, the PreAcrs predictor achieved competitive accuracy and robustness in predicting new anti-CRISPR proteins. We anticipate that PreAcrs will be a valuable tool for researchers to speed up the research process. The source code is available at https://github.com/Lyn-666/anti_CRISPR.git
Affiliation(s)
- Lin Zhu
- Institute for Advanced Study, Shenzhen University, Shenzhen, China
- Xiaoyu Wang
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
- Fuyi Li
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC Australia
- Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800 Australia
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800 Australia
7
Xu Y, Wojtczak D. Dive into machine learning algorithms for influenza virus host prediction with hemagglutinin sequences. Biosystems 2022; 220:104740. [DOI: 10.1016/j.biosystems.2022.104740] [Received: 01/31/2022] [Revised: 07/02/2022] [Accepted: 07/16/2022] [Indexed: 11/26/2022]
8
Pritam M, Singh G, Kumar R, Singh SP. Screening of potential antigens from whole proteome and development of multi-epitope vaccine against Rhizopus delemar using immunoinformatics approaches. J Biomol Struct Dyn 2022; 41:2118-2145. [PMID: 35067195 DOI: 10.1080/07391102.2022.2028676] [Indexed: 01/14/2023]
Abstract
Mucormycosis is a deadly fungal disease mainly caused by Rhizopus oryzae (strain 99-880), also known as Rhizopus delemar. Mucormycosis previously occurred in immunocompromised patients with diabetes mellitus, cancer, organ transplants, etc., but cases increased drastically during the ongoing COVID-19 pandemic. Despite several available therapies and antifungal treatments, the mortality rate of mucormycosis exceeds 50%. No vaccine against mucormycosis is currently available on the market, so a highly effective vaccine is urgently needed. In the present study, we screened four genome-derived predicted antigens (GDPA) through sequential filtering of the whole proteome of R. delemar using different benchmarked bioinformatics tools. These four GDPA, along with four randomly selected experimentally reported antigens (ERA), were used to predict B- and T-cell epitopes and to design two potential multi-epitope vaccine candidates that can induce both innate and adaptive immunity against R. delemar. In addition, comparative immune simulation studies were performed, and in silico cloning was carried out using L. lactis as an expression system for possible use as oral vaccines. This is the first multi-epitope vaccine designed against R. delemar through a systematic reverse vaccinology and immunoinformatics pipeline. Although wet-lab experimental validation of the designed vaccines is required before testing in preclinical models, the current study will help significantly reduce the cost of experimentation and improve the efficacy of vaccine therapy against mucormycosis and other pathogenic diseases. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Manisha Pritam
- Amity Institute of Biotechnology, Amity University Uttar Pradesh, Lucknow, India
- Garima Singh
- Amity Institute of Biotechnology, Amity University Uttar Pradesh, Lucknow, India
- Rajnish Kumar
- Amity Institute of Biotechnology, Amity University Uttar Pradesh, Lucknow, India
9
Gong Y, Dong B, Zhang Z, Zhai Y, Gao B, Zhang T, Zhang J. VTP-Identifier: Vesicular Transport Proteins Identification Based on PSSM Profiles and XGBoost. Front Genet 2022; 12:808856. [PMID: 35047020 PMCID: PMC8762342 DOI: 10.3389/fgene.2021.808856] [Received: 11/04/2021] [Accepted: 11/29/2021] [Indexed: 11/13/2022]
Abstract
Vesicular transport proteins are related to many human diseases and threaten human health when they undergo pathological changes. Protein function prediction has been one of the most intensively studied topics in bioinformatics. In this work, we developed a useful tool to identify vesicular transport proteins. Our strategy extracts transition probability composition, autocovariance transformation, and other information from the position-specific scoring matrix (PSSM) as feature vectors. EditedNearestNeighbours (ENN) is used to address class imbalance in the data set, and the Max-Relevance-Max-Distance (MRMD) algorithm is adopted to reduce the dimension of the feature vector. We used 5-fold cross-validation and an independent test set to evaluate our model. On the test set, VTP-Identifier outperformed GRU, with an accuracy, Matthews correlation coefficient (MCC), and area under the ROC curve (AUC) of 83.6%, 0.531, and 0.873, respectively.
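Accuracy and the Matthews correlation coefficient reported above are both computed from the binary confusion matrix; MCC uses all four cells and stays informative when classes are imbalanced, which is why it is often reported alongside accuracy in this setting. A minimal sketch with hypothetical counts (the function names are illustrative, not taken from VTP-Identifier):

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts: 8 true positives, 8 true negatives, 2 false positives, 2 false negatives.
print(accuracy(8, 8, 2, 2), mcc(8, 8, 2, 2))  # 0.8 0.6
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), so an accuracy of 83.6% alongside an MCC of 0.531 indicates a useful but far from perfect classifier.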
Affiliation(s)
- Yue Gong
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Benzhi Dong
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Zixiao Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Yixiao Zhai
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Bo Gao
- Department of Radiology, The Second Affiliated Hospital, Harbin Medical University, Harbin, China
- Tianjiao Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Jingyu Zhang
- Department of Neurology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, China
10
Yang C, Yang Z, Tong K, Wang J, Yang W, Yu R, Jiang F, Ji Y. Homology modeling and molecular docking simulation of martentoxin as a specific inhibitor of the BK channel. Ann Transl Med 2022; 10:71. [PMID: 35282126 PMCID: PMC8848368 DOI: 10.21037/atm-21-6967] [Received: 12/11/2021] [Accepted: 01/13/2022] [Indexed: 11/18/2022]
Abstract
Background The large-conductance calcium-activated potassium channel (BK channel) is gated by both voltage and calcium ions and is widely distributed in excitable and nonexcitable cells. The BK channel plays an important role in epilepsy and other diseases, but BK channel subtype-specific drugs are still extremely rare. Martentoxin was previously isolated from the venom of members of Scorpionidae and shown to consist of 37 amino acids. Research has shown that martentoxin is more pharmacologically selective for the BK channel than for other potassium channels. It is therefore important to study the mechanism of interaction between martentoxin and BK channels. Methods The three-dimensional structure of the BK channel pore region was constructed by homology modeling, and the key amino acid sites of the BK channel that interact with martentoxin were analyzed by protein-protein docking, molecular dynamics simulation, and virtual alanine mutation. Results Based on homology modeling of the BK channel pore structure and protein-protein docking analysis, Phe1, Lys28, and Arg35 of martentoxin were found to be key amino acids in the toxin-BK channel interaction. Conclusions This study reveals the structural basis of the martentoxin-BK channel interaction. These results will contribute to the design of BK channel-specific blockers based on the structure of martentoxin.
Affiliation(s)
- Chao Yang
- Translational Institute for Cancer Pain, Chongming Hospital Affiliated to Shanghai University of Medicine and Health Sciences (Xinhua Hospital Chongming Branch), Shanghai, China
- Zihao Yang
- College of Life Sciences and Food Engineering, Huaiyin Institute of Technology, Huai'an, China
- Kuiyuan Tong
- College of Life Sciences and Food Engineering, Huaiyin Institute of Technology, Huai'an, China
- Jiawei Wang
- School of Life and Medicine Sciences, Shanghai University, Shanghai, China
- Wanli Yang
- Translational Institute for Cancer Pain, Chongming Hospital Affiliated to Shanghai University of Medicine and Health Sciences (Xinhua Hospital Chongming Branch), Shanghai, China
- Ruihua Yu
- Translational Institute for Cancer Pain, Chongming Hospital Affiliated to Shanghai University of Medicine and Health Sciences (Xinhua Hospital Chongming Branch), Shanghai, China
- Feng Jiang
- Translational Institute for Cancer Pain, Chongming Hospital Affiliated to Shanghai University of Medicine and Health Sciences (Xinhua Hospital Chongming Branch), Shanghai, China
- Yonghua Ji
- Translational Institute for Cancer Pain, Chongming Hospital Affiliated to Shanghai University of Medicine and Health Sciences (Xinhua Hospital Chongming Branch), Shanghai, China
- School of Life and Medicine Sciences, Shanghai University, Shanghai, China
11
Jia Y, Huang S, Zhang T. KK-DBP: A Multi-Feature Fusion Method for DNA-Binding Protein Identification Based on Random Forest. Front Genet 2021; 12:811158. [PMID: 34912382 PMCID: PMC8667860 DOI: 10.3389/fgene.2021.811158] [Received: 11/08/2021] [Accepted: 11/15/2021] [Indexed: 02/04/2023]
Abstract
A DNA-binding protein (DBP) is a protein with a specialized DNA-binding domain that is associated with many important molecular biological mechanisms. The rapid development of computational methods has made it possible to predict DBPs on a large scale; however, existing methods do not fully integrate DBP-related features, resulting in imprecise predictions. In this article, we develop a DNA-binding protein identification method called KK-DBP. To improve prediction accuracy, we propose a feature extraction method that fuses multiple PSSM features. On the independent test dataset PDB186, KK-DBP achieves a prediction accuracy of 81.22%, the highest among all existing methods.
Affiliation(s)
- Yuran Jia
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
- Shan Huang
- Department of Neurology, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Tianjiao Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, China
12
Sousa ARDO, Andrade FRN, Chaves RP, Sousa BLD, Lima DBD, Souza RODS, da Silva CGL, Teixeira CS, Sampaio AH, Nagano CS, Carneiro RF. Structural characterization of a galectin isolated from the marine sponge Chondrilla caribensis with leishmanicidal potential. Biochim Biophys Acta Gen Subj 2021; 1865:129992. [PMID: 34508835 DOI: 10.1016/j.bbagen.2021.129992] [Received: 06/08/2021] [Revised: 08/11/2021] [Accepted: 08/19/2021] [Indexed: 01/08/2023]
Abstract
BACKGROUND Solving the primary structure of lectins leads to an understanding of their physiological roles within an organism and their biotechnological potential. Only eight sponge lectins have had their primary structures fully determined. METHODS The primary structure of CCL, the Chondrilla caribensis lectin, was determined by tandem mass spectrometry. The three-dimensional structure was predicted, and the protein-carbohydrate interaction was analysed by molecular docking. Furthermore, anti-leishmanial activity was evaluated in assays with Leishmania infantum. RESULTS The amino acid sequence consists of 142 residues, with a calculated molecular mass of 15,443 Da. The lectin has a galectin-like domain architecture. As in other sponge galectins, the signature sequence of a highly conserved domain was identified in CCL, with some modifications. CCL exhibits a typical galectin structure consisting of a β-sandwich. Molecular docking showed that the amino acids interacting with CCL ligands at the monosaccharide-binding site are mostly the same as those conserved in this family of lectins. Through its interaction with L. infantum glycans, CCL inhibited the development of this parasite. CCL also induced apoptosis after eliciting ROS production and altering the membrane integrity of Leishmania infantum promastigotes. CONCLUSIONS CCL joins the small group of sponge lectins with a determined primary structure and has very high biotechnological potential owing to its promising results against the pathogens that cause leishmaniasis. GENERAL SIGNIFICANCE Because determining the primary structure is important for biological studies, CCL may now become a sponge galectin with an exciting future in the field of human health.
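A "calculated molecular mass" such as the 15,443 Da above is usually obtained by summing average residue masses over the sequence and adding one water molecule for the free termini. The CCL sequence is not reproduced in this listing, so the sketch below is illustrated only on a dipeptide. It assumes standard average residue masses rounded to four decimals and ignores post-translational modifications and disulfide bonds; the table and function names are hypothetical, and dedicated tools may use slightly different mass tables, so small discrepancies are expected.

```python
# Approximate average residue masses (Da) of the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one H2O added back for the N- and C-termini


def average_mass(sequence: str) -> float:
    """Average molecular mass (Da) of an unmodified linear peptide."""
    try:
        residues = sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper())
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None
    return residues + WATER


print(round(average_mass("GA"), 2))  # 146.15
```

For a 142-residue protein, the same sum would simply run over the full one-letter sequence.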
Affiliation(s)
- Andressa Rocha de Oliveira Sousa
- Laboratório de Biotecnologia Marinha - BioMar-Lab, Departamento de Engenharia de Pesca, Universidade Federal do Ceará, Campus do Pici s/n, bloco 871, 60440-970 Fortaleza, Ceará, Brazil
- Francisco Regivânio Nascimento Andrade
- Laboratório de Biotecnologia Marinha - BioMar-Lab, Departamento de Engenharia de Pesca, Universidade Federal do Ceará, Campus do Pici s/n, bloco 871, 60440-970 Fortaleza, Ceará, Brazil
- Renata Pinheiro Chaves
- Laboratório de Biotecnologia Marinha - BioMar-Lab, Departamento de Engenharia de Pesca, Universidade Federal do Ceará, Campus do Pici s/n, bloco 871, 60440-970 Fortaleza, Ceará, Brazil
- Bruno Lopes de Sousa
- Faculdade de Filosofia Dom Aureliano Matos, Universidade Estadual do Ceará, Brazil
- Claudener Souza Teixeira
- Centro de Ciências Agrárias e da Biodiversidade, Universidade Federal do Cariri, Crato, CE, Brazil
- Alexandre Holanda Sampaio
- Laboratório de Biotecnologia Marinha - BioMar-Lab, Departamento de Engenharia de Pesca, Universidade Federal do Ceará, Campus do Pici s/n, bloco 871, 60440-970 Fortaleza, Ceará, Brazil
- Celso Shiniti Nagano
- Laboratório de Biotecnologia Marinha - BioMar-Lab, Departamento de Engenharia de Pesca, Universidade Federal do Ceará, Campus do Pici s/n, bloco 871, 60440-970 Fortaleza, Ceará, Brazil
- Rômulo Farias Carneiro
- Laboratório de Biotecnologia Marinha - BioMar-Lab, Departamento de Engenharia de Pesca, Universidade Federal do Ceará, Campus do Pici s/n, bloco 871, 60440-970 Fortaleza, Ceará, Brazil
13
Zervou MA, Doutsi E, Pavlidis P, Tsakalides P. Structural classification of proteins based on the computationally efficient recurrence quantification analysis and horizontal visibility graphs. Bioinformatics 2021; 37:1796-1804. [PMID: 34048559 DOI: 10.1093/bioinformatics/btab407] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/11/2021] [Revised: 04/13/2021] [Accepted: 05/27/2021] [Indexed: 11/14/2022]
Abstract
MOTIVATION Protein structural class prediction is one of the most significant problems in bioinformatics, as it plays a prominent role in understanding the function and evolution of proteins. Designing a prediction method that is computationally efficient yet accurate remains a pressing issue, especially for sequences for which sufficient homologous information cannot be obtained from existing protein sequence databases. Several studies demonstrate the potential of using chaos game representation (CGR) together with time series analysis tools such as recurrence quantification analysis (RQA), complex networks, horizontal visibility graphs (HVG) and others. However, most existing works involve a large number of features and require an exhaustive, time-consuming search for the optimal parameters. To address these problems, this work adopts generalized multidimensional recurrence quantification analysis (GmdRQA) as an efficient tool that can process a multidimensional time series concurrently and reduce the number of features. In addition, two data-driven algorithms, average mutual information (AMI) and false nearest neighbors (FNN), are used to determine the optimal GmdRQA parameters quickly yet precisely. RESULTS Combining GmdRQA with the HVG improves classification accuracy. Experimental evaluation on a real benchmark dataset demonstrates that our methods achieve performance similar to the state of the art, but at a smaller computational cost. AVAILABILITY The code to reproduce all the results is available at https://github.com/aretiz/protein_structure_classification/tree/main. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
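The horizontal visibility graph (HVG) mentioned above follows a simple rule: two points of a series are linked when every value strictly between them is lower than both. A minimal Python sketch of that standard rule, assuming the protein has already been mapped to a numeric series (the CGR-based mapping used in the paper is not shown); `horizontal_visibility_graph` and `degree_sequence` are hypothetical names:

```python
def horizontal_visibility_graph(series):
    """Return HVG edges (i, j), i < j, of a 1-D numeric series.

    i and j are linked when every value strictly between them is lower
    than both series[i] and series[j]. Naive sketch, O(n^2) worst case.
    """
    n = len(series)
    edges = set()
    for i in range(n - 1):
        blocker = float("-inf")  # tallest value seen strictly between i and j
        for j in range(i + 1, n):
            if blocker < min(series[i], series[j]):
                edges.add((i, j))
            blocker = max(blocker, series[j])
            if blocker >= series[i]:
                break  # a value at least as tall as series[i] hides everything beyond it
    return edges


def degree_sequence(n, edges):
    """Count how many HVG neighbours each of the n nodes has."""
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return degree
```

Degree statistics of this graph are one common source of HVG features; which features the authors actually extract is not stated in the abstract.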
Affiliation(s)
- Michaela Areti Zervou
- Department of Computer Science, University of Crete, Heraklion, 700 13, Greece; Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, 700 13, Greece
- Effrosyni Doutsi
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, 700 13, Greece
- Pavlos Pavlidis
- Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, 700 13, Greece
- Panagiotis Tsakalides
- Department of Computer Science, University of Crete, Heraklion, 700 13, Greece; Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, 700 13, Greece
14
iT3SE-PX: Identification of Bacterial Type III Secreted Effectors Using PSSM Profiles and XGBoost Feature Selection. Computational and Mathematical Methods in Medicine 2021; 2021:6690299. [PMID: 33505516 PMCID: PMC7806399 DOI: 10.1155/2021/6690299] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/20/2020] [Revised: 12/24/2020] [Accepted: 12/26/2020] [Indexed: 11/18/2022]
Abstract
Identification of bacterial type III secreted effectors (T3SEs) has become a popular research topic in bioinformatics because of its crucial role in understanding host-pathogen interactions and developing better therapeutic targets against pathogens. However, recognizing all effector proteins with traditional experimental approaches is often time-consuming and laborious. Therefore, developing computational methods that accurately predict putative novel effectors is important for reducing the number of biological experiments needed for validation. In this study, we proposed a method, called iT3SE-PX, to identify T3SEs based solely on protein sequences. First, three kinds of features were extracted from the position-specific scoring matrix (PSSM) profiles to help train a machine learning (ML) model. Then, the extreme gradient boosting (XGBoost) algorithm was used to rank these features according to their classification ability. Finally, the optimal features were selected as inputs to a support vector machine (SVM) classifier to predict T3SEs. On the two benchmark datasets, we conducted a 100-time randomized 5-fold cross-validation (CV) and an independent test, respectively. The experimental results demonstrated that the proposed method achieved superior performance compared with most existing methods and could serve as a useful tool for identifying putative T3SEs given only sequence information.
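The evaluation above uses a 100-time randomized 5-fold cross-validation. A minimal Python sketch of how such splits can be generated, assuming stratification by class (the abstract does not say whether folds were stratified) and one fresh random seed per repetition; feature extraction, XGBoost ranking, and the SVM are not shown, and the function names are hypothetical:

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Partition sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)  # randomize which samples land in which fold
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)  # deal round-robin across folds
    return folds


def repeated_cv_splits(labels, k=5, repeats=100):
    """Yield (train, test) index lists for `repeats` randomized k-fold rounds."""
    for r in range(repeats):
        folds = stratified_kfold(labels, k=k, seed=r)
        for f in range(k):
            test = folds[f]
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield train, test
```

Every repetition yields `k` train/test pairs, so `repeats=100, k=5` gives 500 splits; a metric would then be averaged over all of them.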
15
Zhang J, Lv L, Lu D, Kong D, Al-Alashaari MAA, Zhao X. Variable selection from a feature representing protein sequences: a case of classification on bacterial type IV secreted effectors. BMC Bioinformatics 2020; 21:480. [PMID: 33109082 PMCID: PMC7590791 DOI: 10.1186/s12859-020-03826-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/21/2020] [Accepted: 10/19/2020] [Indexed: 12/13/2022]
Abstract
Background Classification of proteins with specific functions is momentous for biological research. Encoding approaches of protein sequences for feature extraction play an important role in protein classification. Many computational methods (namely classifiers) are used to classify protein sequences according to various encoding approaches. Commonly, protein sequences keep certain labels corresponding to different categories of biological functions (e.g., bacterial type IV secreted effectors or not), which makes protein prediction a fantasy. For protein prediction, a kernel set of protein sequences with labels certified by biological experiments would have to exist in advance; however, such a set is hardly ever seen in prevailing research. Therefore, unsupervised learning rather than supervised learning (e.g., classification) should be considered. For protein classification, various classifiers may help to evaluate the effectiveness of different encoding approaches. Besides, variable selection from an encoded feature representing protein sequences is an important issue that also needs to be considered. Results Focusing on the latter problem, we propose a new method for variable selection from an encoded feature representing protein sequences. Taking as a case a benchmark dataset of 1947 protein sequences, composed of 399 bacterial type IV secreted effectors (T4SEs) and 1548 non-T4SEs, experiments are conducted to identify T4SEs from protein sequences. Comparable, quantified results are obtained using only certain components of the encoded feature, i.e., the position-specific scoring matrix, which indicates the effectiveness of our method. Conclusions Certain variables, rather than the whole encoded feature they belong to, do work for discriminating between different types of proteins. In addition, ensemble classifiers with an automatic assignment of different base classifiers do achieve better classification results.
Affiliation(s)
- Jian Zhang
- College of Artificial Intelligence, Wuxi Vocational College of Science and Technology, No. 8 Xinxi Road, Wuxi, 214028, China
- Lixin Lv
- College of Artificial Intelligence, Wuxi Vocational College of Science and Technology, No. 8 Xinxi Road, Wuxi, 214028, China
- Donglei Lu
- College of Artificial Intelligence, Wuxi Vocational College of Science and Technology, No. 8 Xinxi Road, Wuxi, 214028, China
- Denan Kong
- College of Information and Computer Engineering, Northeast Forestry University, No. 26 Hexing Road, Harbin, 150040, China
- Xudong Zhao
- College of Information and Computer Engineering, Northeast Forestry University, No. 26 Hexing Road, Harbin, 150040, China
16
Wang J, Dai W, Li J, Xie R, Dunstan RA, Stubenrauch C, Zhang Y, Lithgow T. PaCRISPR: a server for predicting and visualizing anti-CRISPR proteins. Nucleic Acids Res 2020; 48:W348-W357. [PMID: 32459325 PMCID: PMC7319593 DOI: 10.1093/nar/gkaa432] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 03/02/2020] [Revised: 04/22/2020] [Accepted: 05/13/2020] [Indexed: 01/09/2023]
Abstract
Anti-CRISPRs are widespread amongst bacteriophages and promote bacteriophage infection by inactivating the bacterial host's CRISPR–Cas defence system. Identifying and characterizing anti-CRISPR proteins opens an avenue to explore and control CRISPR–Cas machineries for the development of new CRISPR–Cas based biotechnological and therapeutic tools. Past studies have identified anti-CRISPRs in several model phage genomes, but screening comprehensively, accurately, and efficiently for anti-CRISPRs in genome and metagenome sequence data remains a challenge. Here, we have developed an ensemble learning based predictor, PaCRISPR, to accurately identify anti-CRISPRs from protein datasets derived from genome and metagenome sequencing projects. PaCRISPR employs different types of feature recognition united within an ensemble framework. Extensive cross-validation and independent tests show that PaCRISPR achieves significantly more accurate performance than homology-based baseline predictors and an existing toolkit. The performance of PaCRISPR was further validated by discovering anti-CRISPRs that were not part of its training data but were recently demonstrated to function as anti-CRISPRs during phage infection. Data visualization of anti-CRISPR relationships, highlighting sequence similarity and phylogenetic considerations, is part of the output of the PaCRISPR toolkit, which is freely available at http://pacrispr.erc.monash.edu/.
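The abstract says PaCRISPR unites several feature types within an ensemble framework but does not state how member predictions are combined. One common choice is soft voting, sketched below in Python under that assumption: each base predictor returns a probability, and the ensemble averages them (optionally weighted) and applies a threshold. The `Scorer` type, `soft_vote`, `predict_anti_crispr`, and the 0.5 threshold are hypothetical:

```python
from typing import Callable, Optional, Sequence

# A base predictor maps a protein sequence to a probability in [0, 1].
Scorer = Callable[[str], float]


def soft_vote(scorers: Sequence[Scorer], sequence: str,
              weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean of the base predictors' probabilities (soft voting)."""
    if not scorers:
        raise ValueError("need at least one base predictor")
    if weights is None:
        weights = [1.0] * len(scorers)
    if len(weights) != len(scorers):
        raise ValueError("need exactly one weight per base predictor")
    total = sum(weights)
    return sum(w * score(sequence) for w, score in zip(weights, scorers)) / total


def predict_anti_crispr(scorers: Sequence[Scorer], sequence: str,
                        threshold: float = 0.5) -> bool:
    """Call a protein a putative anti-CRISPR when the ensemble score reaches threshold."""
    return soft_vote(scorers, sequence) >= threshold
```

Soft voting keeps each member's confidence, whereas hard (majority) voting would discard it; real base predictors would replace the toy scorers used when trying this out.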
Affiliation(s)
- Jiawei Wang
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
- Wei Dai
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Jiahui Li
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Ruopeng Xie
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Rhys A Dunstan
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
- Christopher Stubenrauch
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, VIC 3800, Australia
- Yanju Zhang
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Trevor Lithgow
- To whom correspondence should be addressed. Tel: +61 3 9902 9217; Fax: +61 3 9905 3726
17
Pritam M, Singh G, Swaroop S, Singh AK, Pandey B, Singh SP. A cutting-edge immunoinformatics approach for design of multi-epitope oral vaccine against dreadful human malaria. Int J Biol Macromol 2020; 158:159-179. [PMID: 32360460 PMCID: PMC7189201 DOI: 10.1016/j.ijbiomac.2020.04.191] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 01/02/2020] [Revised: 03/28/2020] [Accepted: 04/22/2020] [Indexed: 12/18/2022]
Abstract
Human malaria is a pathogenic disease mainly caused by Plasmodium falciparum, which was responsible for about 405,000 deaths globally in 2018. To date, several vaccine candidates have been evaluated for prevention, but they failed to produce optimal outcomes at various preclinical/clinical stages. This study designs polypeptide vaccines (PVs) against human malaria that cover almost all stages of the Plasmodium life cycle, using 5 genome-derived predicted antigenic proteins (GDPAP). For the development of a multi-immune inducer, 15 PVs were initially designed using a T-cell epitope ensemble covering >99% of the human population, together with linear B-cell epitopes, with or without adjuvants. Immune simulation of the PVs showed higher levels of T-cell and B-cell activity than positive and negative vaccine controls. Furthermore, in silico cloning and codon optimization of the PVs, followed by enhanced expression in the Lactococcus lactis host system, were also explored. Although the study has sound theoretical and in silico findings, in vitro/in vivo evaluation is imperative to establish the immunogenicity and safety of the PVs for managing P. falciparum infection in the future.
Affiliation(s)
- Manisha Pritam
- Amity Institute of Biotechnology, Amity University Uttar Pradesh, Lucknow Campus, Lucknow 226028, India
- Garima Singh
- Amity Institute of Biotechnology, Amity University Uttar Pradesh, Lucknow Campus, Lucknow 226028, India
- Suchit Swaroop
- Experimental & Public Health Lab, Department of Zoology, University of Lucknow, Lucknow 226007, India
- Akhilesh Kumar Singh
- Department of Biotechnology, Mahatma Gandhi Central University, Bihar 845401, India
- Brijesh Pandey
- Department of Biotechnology, Mahatma Gandhi Central University, Bihar 845401, India
18
Ge Y, Zhao S, Zhao X. A step-by-step classification algorithm of protein secondary structures based on double-layer SVM model. Genomics 2020; 112:1941-1946. [DOI: 10.1016/j.ygeno.2019.11.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/05/2019] [Revised: 10/15/2019] [Accepted: 11/11/2019] [Indexed: 11/26/2022]
19
Apurva M, Mazumdar H. Predicting structural class for protein sequences of 40% identity based on features of primary and secondary structure using Random Forest algorithm. Comput Biol Chem 2020; 84:107164. [DOI: 10.1016/j.compbiolchem.2019.107164] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/01/2019] [Revised: 10/25/2019] [Accepted: 11/10/2019] [Indexed: 02/08/2023]
20
Guo L, Wang S, Li M, Cao Z. Accurate classification of membrane protein types based on sequence and evolutionary information using deep learning. BMC Bioinformatics 2019; 20:700. [PMID: 31874615 PMCID: PMC6929490 DOI: 10.1186/s12859-019-3275-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/18/2022]
Abstract
Background Membrane proteins play an important role in the life activities of organisms. Knowing membrane protein types provides clues for understanding the structure and function of proteins. Though various computational methods for predicting membrane protein types have been developed, the results still do not meet the expectations of researchers. Results We propose two deep learning models to process sequence information and evolutionary information, respectively. Both models obtained better results than traditional machine learning models. Furthermore, to improve the performance of the sequence information model, we also provide a new vector representation method to replace the one-hot encoding, whose overall success rate improved by 3.81% and 6.55% on two datasets. Finally, a more effective model is obtained by fusing the above two models, whose overall success rate reached 95.68% and 92.98% on two datasets. Conclusion The final experimental results show that our method is more effective than existing methods for predicting membrane protein types, which can help laboratory researchers to identify the type of novel membrane proteins.
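The abstract says the authors replaced one-hot encoding with a new vector representation that it does not describe. For reference, a minimal Python sketch of the one-hot baseline, in which each residue becomes a 20-dimensional indicator row. The fixed length `max_len`, truncation, zero-padding, and all-zero rows for non-standard residues are assumptions for illustration; `one_hot` is a hypothetical name:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one column per standard residue
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence, max_len):
    """Encode a protein as a max_len x 20 one-hot matrix (list of rows).

    Longer sequences are truncated, shorter ones zero-padded, and
    non-standard residues (e.g. X, B, Z) become all-zero rows.
    """
    matrix = []
    for aa in sequence.upper()[:max_len]:
        row = [0] * len(AMINO_ACIDS)
        if aa in INDEX:
            row[INDEX[aa]] = 1
        matrix.append(row)
    while len(matrix) < max_len:
        matrix.append([0] * len(AMINO_ACIDS))  # padding row
    return matrix
```

This encoding treats all residues as equally dissimilar, which is one motivation for learned or property-based representations such as the one the authors propose.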
Affiliation(s)
- Lei Guo
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, People's Republic of China
- Shunfang Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, People's Republic of China
- Mingyuan Li
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, People's Republic of China
- Zicheng Cao
- School of Public Health (Shenzhen), Sun Yat-sen University, Guangzhou, 510006, People's Republic of China
21
Wang S, Wang X. Prediction of protein structural classes by different feature expressions based on 2-D wavelet denoising and fusion. BMC Bioinformatics 2019; 20:701. [PMID: 31874617 PMCID: PMC6929547 DOI: 10.1186/s12859-019-3276-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/07/2023]
Abstract
BACKGROUND Protein structural class prediction is a heavily researched subject in bioinformatics that plays a vital role in protein functional analysis, protein fold recognition, rational drug design and other related fields. However, when traditional feature expression methods are adopted, the features usually contain considerable redundant information, which leads to a very low recognition rate of protein structural classes. RESULTS We constructed a prediction model based on wavelet denoising using different feature expression methods. A new fusion idea, first fuse and then denoise, is proposed in this article. Two types of pseudo amino acid composition are used to distill feature vectors. Then, a two-dimensional (2-D) wavelet denoising algorithm is used to remove the redundant information from the two extracted feature vectors. The two feature vectors based on parallel 2-D wavelet denoising are fused, and the resulting method is called PWD-FU-PseAAC. The related source code is available at https://github.com/Xiaoheng-Wang12/Wang-xiaoheng/tree/master. CONCLUSIONS Experimental verification on three low-similarity datasets suggests that the proposed model achieves notably good results in predicting protein structural classes.
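The abstract applies 2-D wavelet denoising to PseAAC feature vectors. As a much-simplified illustration, the Python sketch below performs one level of a 1-D orthonormal Haar transform on a feature vector, soft-thresholds the detail coefficients, and inverts the transform. The wavelet family, single decomposition level, 1-D layout, and fixed threshold `t` are all assumptions rather than the authors' settings, and the function names are hypothetical:

```python
import math


def haar_forward(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Undo haar_forward, rebuilding each adjacent pair of values."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def soft_threshold(values, t):
    """Shrink each coefficient toward zero by t; those with |v| <= t become 0."""
    return [math.copysign(max(abs(v) - t, 0.0), v) for v in values]


def haar_denoise(x, t):
    """Soft-threshold the detail coefficients of a one-level Haar transform."""
    if len(x) % 2:
        raise ValueError("this sketch needs an even-length vector")
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, t))
```

With `t = 0` the vector is reconstructed unchanged; with a large `t` each adjacent pair collapses to its mean, which is the smoothing effect that denoising relies on.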
Affiliation(s)
- Shunfang Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, People's Republic of China.
- Xiaoheng Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, People's Republic of China
22
Zhu XJ, Feng CQ, Lai HY, Chen W, Hao L. Predicting protein structural classes for low-similarity sequences by evaluating different features. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2018.10.007] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Indexed: 12/15/2022]
23
A novel feature selection method to predict protein structural class. Comput Biol Chem 2018; 76:118-129. [DOI: 10.1016/j.compbiolchem.2018.06.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/05/2018] [Revised: 05/14/2018] [Accepted: 06/30/2018] [Indexed: 01/05/2023]
24
Wang J, Yang B, Revote J, Leier A, Marquez-Lago TT, Webb G, Song J, Chou KC, Lithgow T. POSSUM: a bioinformatics toolkit for generating numerical sequence feature descriptors based on PSSM profiles. Bioinformatics 2018; 33:2756-2758. [PMID: 28903538 DOI: 10.1093/bioinformatics/btx302] [Citation(s) in RCA: 107] [Impact Index Per Article: 17.8] [Received: 03/12/2017] [Accepted: 05/09/2017] [Indexed: 11/13/2022]
Abstract
Summary Evolutionary information in the form of a Position-Specific Scoring Matrix (PSSM) is a widely used and highly informative representation of protein sequences. Accordingly, PSSM-based feature descriptors have been successfully applied to improve the performance of various predictors of protein attributes. Even though a number of algorithms have been proposed in previous studies, there is currently no universal web server or toolkit available for generating this wide variety of descriptors. Here, we present POSSUM (Position-Specific Scoring matrix-based feature generator for machine learning), a versatile toolkit with an online web server that can generate 21 types of PSSM-based feature descriptors, thereby addressing a crucial need for bioinformaticians and computational biologists. We envisage that this comprehensive toolkit will be widely used as a powerful tool to facilitate feature extraction, selection, and benchmarking of machine learning-based models, thereby contributing to a more effective analysis and modeling pipeline for bioinformatics research. Availability and implementation http://possum.erc.monash.edu/. Contact trevor.lithgow@monash.edu or jiangning.song@monash.edu. Supplementary information Supplementary data are available at Bioinformatics online.
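POSSUM is described as generating 21 types of PSSM-based descriptors. The Python sketch below shows one descriptor in that spirit, a PSSM-composition-style 400-dimensional vector, computed from a PSSM already parsed into an L x 20 list of rows in the usual PSI-BLAST column order. It is an illustration under assumptions (in particular, dividing by sequence length and skipping non-standard residues), not POSSUM's own code or its exact formula; `pssm_composition` is a hypothetical name:

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # PSI-BLAST PSSM column order


def pssm_composition(sequence, pssm):
    """Collapse an L x 20 PSSM into a flat 400-dimensional vector.

    Block k (20 values) sums the PSSM rows at positions whose residue is
    AMINO_ACIDS[k]; every entry is then divided by L (assumed normalization).
    """
    if not sequence or len(sequence) != len(pssm):
        raise ValueError("need a non-empty sequence with one PSSM row per residue")
    blocks = [[0.0] * 20 for _ in range(20)]
    for aa, row in zip(sequence.upper(), pssm):
        if len(row) != 20:
            raise ValueError("each PSSM row needs 20 scores")
        k = AMINO_ACIDS.find(aa)
        if k < 0:
            continue  # skip non-standard residues such as X
        for j, score in enumerate(row):
            blocks[k][j] += score
    length = len(sequence)
    return [value / length for block in blocks for value in block]
```

The output length is fixed at 400 regardless of protein length, which is what makes such descriptors usable as classifier input. Raw PSSM scores are often rescaled (for example with a logistic function) first; this sketch leaves that choice out.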
Affiliation(s)
- Jiawei Wang
- Biomedicine Discovery Institute, Monash University, VIC 3800, Australia
- Bingjiao Yang
- College of Mechanical Engineering, Yanshan University, Qinhuangdao 066004, China
- Jerico Revote
- Biomedicine Discovery Institute, Monash University, VIC 3800, Australia
- André Leier
- Informatics Institute and Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Tatiana T Marquez-Lago
- Informatics Institute and Department of Genetics, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Geoffrey Webb
- Monash Centre for Data Science, Faculty of Information Technology
- Jiangning Song
- Biomedicine Discovery Institute, Monash University, VIC 3800, Australia; Monash Centre for Data Science, Faculty of Information Technology; ARC Centre of Excellence for Advanced Molecular Imaging, Monash University, VIC 3800, Australia
- Kuo-Chen Chou
- Gordon Life Science Institute, Boston, MA 02478, USA; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Trevor Lithgow
- Biomedicine Discovery Institute, Monash University, VIC 3800, Australia
25
Liang Y, Zhang S. Predict protein structural class by incorporating two different modes of evolutionary information into Chou's general pseudo amino acid composition. J Mol Graph Model 2017; 78:110-117. [DOI: 10.1016/j.jmgm.2017.10.003] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 08/30/2017] [Revised: 10/03/2017] [Accepted: 10/03/2017] [Indexed: 11/27/2022]
26
Yu B, Lou L, Li S, Zhang Y, Qiu W, Wu X, Wang M, Tian B. Prediction of protein structural class for low-similarity sequences using Chou's pseudo amino acid composition and wavelet denoising. J Mol Graph Model 2017; 76:260-273. [DOI: 10.1016/j.jmgm.2017.07.012] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Received: 04/21/2017] [Revised: 07/11/2017] [Accepted: 07/12/2017] [Indexed: 11/25/2022]
27
Yuan M, Yang Z, Huang G, Ji G. Feature selection by maximizing correlation information for integrated high-dimensional protein data. Pattern Recognit Lett 2017. [DOI: 10.1016/j.patrec.2017.03.011] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/27/2022]
28
Carneiro RF, Torres RCF, Chaves RP, de Vasconcelos MA, de Sousa BL, Goveia ACR, Arruda FV, Matos MNC, Matthews-Cascon H, Freire VN, Teixeira EH, Nagano CS, Sampaio AH. Purification, Biochemical Characterization, and Amino Acid Sequence of a Novel Type of Lectin from Aplysia dactylomela Eggs with Antibacterial/Antibiofilm Potential. Marine Biotechnology (New York, N.Y.) 2017; 19:49-64. [PMID: 28150103 DOI: 10.1007/s10126-017-9728-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 05/04/2016] [Accepted: 01/08/2017] [Indexed: 06/06/2023]
Abstract
A new lectin from Aplysia dactylomela eggs (ADEL) was isolated by affinity chromatography on HCl-activated Sepharose™ media. Hemagglutination caused by ADEL was inhibited by several galactosides, mainly galacturonic acid (Ka = 6.05 × 10⁶ M⁻¹). The primary structure of ADEL consists of 217 residues, including 11 half-cystines involved in five intrachain and one interchain disulfide bond, resulting in a molecular mass of 57,228 ± 2 Da, as determined by matrix-assisted laser desorption/ionization time of flight mass spectrometry. ADEL showed high similarity with lectins isolated from Aplysia eggs, but not with other known lectins, indicating that these lectins could be grouped into a new family of animal lectins. Three glycosylation sites were found in its polypeptide backbone. Data from peptide-N-glycosidase F digestion and MS suggest that all oligosaccharides attached to ADEL are high in mannose. The secondary structure of ADEL is predominantly β-sheet, and its tertiary structure is sensitive to the presence of ligands, as observed by CD. A 3D structure model of ADEL was created and shows two domains connected by a short loop. Domain A is composed of a flat three-stranded and a curved five-stranded β-sheet, while domain B presents a flat three-stranded and a curved four-stranded β-sheet. Molecular docking revealed favorable binding energies for interactions between lectin and galacturonic acid, lactose, galactosamine, and galactose. Moreover, ADEL was able to agglutinate and inhibit biofilm formation of Staphylococcus aureus, suggesting that this lectin may be a potential alternative to conventional use of antimicrobial agents in the treatment of infections caused by Staphylococcal biofilms.
Affiliation(s)
- Rômulo Farias Carneiro
- Laboratório de Biotecnologia Marinha - BioMar-Lab, Departamento de Engenharia de Pesca, Universidade Federal do Ceará, Campus do Pici s/n, bloco 871, Av. Mister Hull, Box 6043, Fortaleza, Ceará, 60440-970, Brazil
- Renato Cézar Farias Torres
- Laboratório de Biotecnologia Marinha - BioMar-Lab, Departamento de Engenharia de Pesca, Universidade Federal do Ceará, Campus do Pici s/n, bloco 871, Av. Mister Hull, Box 6043, Fortaleza, Ceará, 60440-970, Brazil
- Renata Pinheiro Chaves
- Laboratório de Biotecnologia Marinha - BioMar-Lab, Departamento de Engenharia de Pesca, Universidade Federal do Ceará, Campus do Pici s/n, bloco 871, Av. Mister Hull, Box 6043, Fortaleza, Ceará, 60440-970, Brazil
- Mayron Alves de Vasconcelos
- Laboratório Integrado de Biomoléculas - LIBS, Departamento de Patologia e Medicina Legal, Universidade Federal do Ceará, Monsenhor Furtado, s/n, Fortaleza, Ceará, 60430-160, Brazil
- Bruno Lopes de Sousa
- Departamento de Física, Universidade Federal do Ceará, Campus do Pici s/n, bloco 871, Fortaleza, Ceará, 60440-970, Brazil
- André Castelo Rodrigues Goveia
- Laboratório de Biotecnologia Marinha - BioMar-Lab, Departamento de Engenharia de Pesca, Universidade Federal do Ceará, Campus do Pici s/n, bloco 871, Av. Mister Hull, Box 6043, Fortaleza, Ceará, 60440-970, Brazil
- Francisco Vassiliepe Arruda
- Laboratório Integrado de Biomoléculas - LIBS, Departamento de Patologia e Medicina Legal, Universidade Federal do Ceará, Monsenhor Furtado, s/n, Fortaleza, Ceará, 60430-160, Brazil
- Maria Nágila Carneiro Matos
- Laboratório de Biotecnologia Marinha - BioMar-Lab, Departamento de Engenharia de Pesca, Universidade Federal do Ceará, Campus do Pici s/n, bloco 871, Av. Mister Hull, Box 6043, Fortaleza, Ceará, 60440-970, Brazil
- Helena Matthews-Cascon
- Laboratório de Invertebrados Marinhos do Ceará - LIMCE, Departamento de Biologia, Universidade Federal do Ceará, Campus do Pici s/n, bloco 906, Fortaleza, CE, 60455-760, Brazil
- Valder Nogueira Freire
- Departamento de Física, Universidade Federal do Ceará, Campus do Pici s/n, bloco 871, Fortaleza, Ceará, 60440-970, Brazil
- Edson Holanda Teixeira
- Laboratório Integrado de Biomoléculas - LIBS, Departamento de Patologia e Medicina Legal, Universidade Federal do Ceará, Monsenhor Furtado, s/n, Fortaleza, Ceará, 60430-160, Brazil
- Celso Shiniti Nagano
- Laboratório de Biotecnologia Marinha - BioMar-Lab, Departamento de Engenharia de Pesca, Universidade Federal do Ceará, Campus do Pici s/n, bloco 871, Av. Mister Hull, Box 6043, Fortaleza, Ceará, 60440-970, Brazil
- Alexandre Holanda Sampaio
- Laboratório de Biotecnologia Marinha - BioMar-Lab, Departamento de Engenharia de Pesca, Universidade Federal do Ceará, Campus do Pici s/n, bloco 871, Av. Mister Hull, Box 6043, Fortaleza, Ceará, 60440-970, Brazil
29
Xu Y, Li L, Ding J, Wu LY, Mai G, Zhou F. Gly-PseAAC: Identifying protein lysine glycation through sequences. Gene 2017; 602:1-7. [DOI: 10.1016/j.gene.2016.11.021] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 05/19/2016] [Revised: 08/29/2016] [Accepted: 11/10/2016] [Indexed: 11/29/2022]
30
Fan GL, Liu YL, Wang H. Identification of thermophilic proteins by incorporating evolutionary and acid dissociation information into Chou's general pseudo amino acid composition. J Theor Biol 2016; 407:138-142. [DOI: 10.1016/j.jtbi.2016.07.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 02/16/2016] [Revised: 06/24/2016] [Accepted: 07/07/2016] [Indexed: 10/21/2022]
31
Kong L, Kong L, Jing R. Improving the Prediction of Protein Structural Class for Low-Similarity Sequences by Incorporating Evolutionary and Structural Information. Journal of Advanced Computational Intelligence and Intelligent Informatics 2016. [DOI: 10.20965/jaciii.2016.p0402] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
Protein structural class prediction is beneficial for studying protein function, regulation and interactions. However, protein structural class prediction for low-similarity sequences (i.e., below 40% pairwise sequence similarity) remains a challenging problem. In this study, a novel computational method is proposed to accurately predict protein structural class for low-similarity sequences. This method is based on a support vector machine in conjunction with integrated features from evolutionary information generated with the position-specific iterated basic local alignment search tool (PSI-BLAST) and predicted secondary structure. Prediction accuracies evaluated by jackknife tests are reported on two widely used low-similarity benchmark datasets (25PDB and 1189), reaching overall accuracies of 89.3% and 87.9%, respectively, which are significantly higher than those achieved by state-of-the-art protein structural class prediction methods. The experimental results suggest that our method could serve as an effective alternative to existing methods in protein structural classification, especially for low-similarity sequences.
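The jackknife test used to report these accuracies is leave-one-out cross-validation: each of the N proteins is held out once, a classifier is trained on the remaining N - 1, and the overall accuracy is the fraction of held-out proteins whose class is predicted correctly. The minimal Python sketch below shows only that evaluation loop. As an assumption for illustration, a simple nearest-centroid classifier stands in for the paper's support vector machine, and the function names and feature vectors are hypothetical rather than PSI-BLAST or secondary-structure features.

```python
from math import dist
from typing import Dict, Hashable, List, Sequence


def nearest_centroid_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[Hashable],
    query: Sequence[float],
) -> Hashable:
    """Stand-in classifier (assumption, not the paper's SVM):
    predict the class whose mean feature vector is closest to the query."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for x, y in zip(train_x, train_y):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}
    return min(centroids, key=lambda y: dist(centroids[y], query))


def jackknife_accuracy(
    xs: Sequence[Sequence[float]], ys: Sequence[Hashable]
) -> float:
    """Leave-one-out (jackknife) overall accuracy; needs at least two samples."""
    correct = 0
    for i in range(len(xs)):
        # Hold out sample i and train on all the others.
        train_x = list(xs[:i]) + list(xs[i + 1:])
        train_y = list(ys[:i]) + list(ys[i + 1:])
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)
```

For example, `jackknife_accuracy([[0.0], [1.0], [10.0]], ["a", "a", "b"])` gives 2/3: when the only class-"b" protein is held out, no training example of that class remains, so it cannot be predicted correctly. This is one reason small classes can depress jackknife accuracy.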
32
Prediction of sumoylation sites in proteins using linear discriminant analysis. Gene 2016; 576:99-104. [DOI: 10.1016/j.gene.2015.09.072] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 04/09/2015] [Revised: 08/24/2015] [Accepted: 09/28/2015] [Indexed: 01/05/2023]
33
Liu L, Cui J, Zhou J. A Novel Prediction Method of Protein Structural Classes Based on Protein Super-Secondary Structure. Journal of Computer and Communications 2016. [DOI: 10.4236/jcc.2016.415005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]
34
Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM. Computational and Mathematical Methods in Medicine 2015; 2015:370756. [PMID: 26788119 PMCID: PMC4693000 DOI: 10.1155/2015/370756] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 08/31/2015] [Revised: 11/19/2015] [Accepted: 12/01/2015] [Indexed: 11/17/2022]
Abstract
Prediction of protein structural classes for low-similarity sequences is useful for understanding the fold patterns, regulation, functions, and interactions of proteins. Feature extraction is known to be important for protein structural class prediction, and it mainly uses the protein primary sequence, the predicted secondary structure sequence, and the position-specific scoring matrix (PSSM). Currently, prediction based solely on the PSSM has played a key role in improving prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP that fuses the consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on the PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. A 700-dimensional (700D) feature vector is constructed, and its dimension is reduced to 224D using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on the 1189, 25PDB, and 640 datasets. Comparison of our results with existing PSSM-based methods demonstrates that our method achieves favorable and competitive performance. It offers an important complement to other PSSM-based methods for predicting the structural classes of low-similarity sequences.
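As a rough illustration of the autocovariance transformation (ACT) named in this abstract, the Python sketch below computes, for each PSSM column and each lag from 1 to a chosen maximum, the average product of mean-centred scores that are `lag` residues apart. This is one common formulation and an assumption here; the segmented variant and the exact normalisation used in CSP-SegPseP-SegACP may differ, and `pssm_autocovariance` and `max_lag` are hypothetical names. Because the output length depends only on the number of columns and `max_lag`, proteins of different lengths map to vectors of the same size.

```python
from typing import List, Sequence


def pssm_autocovariance(pssm: Sequence[Sequence[float]], max_lag: int) -> List[float]:
    """Fixed-length ACT vector of size n_cols * max_lag from an L x n_cols PSSM.

    Assumed formula, for lag g and column j:
        AC(g, j) = 1/(L - g) * sum_i (P[i][j] - mean_j) * (P[i + g][j] - mean_j)
    Features are ordered by lag first, then by column.
    """
    length = len(pssm)
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must be at least 1 and smaller than the sequence length")
    n_cols = len(pssm[0])
    # Mean score of each PSSM column over the whole sequence.
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features
```

For a standard L x 20 PSSM and `max_lag = 10`, the vector always has 200 entries, whatever L is (as long as L exceeds 10).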
35
Xu Y, Ding YX, Ding J, Wu LY, Deng NY. Phogly–PseAAC: Prediction of lysine phosphoglycerylation in proteins incorporating with position-specific propensity. J Theor Biol 2015; 379:10-5. [DOI: 10.1016/j.jtbi.2015.04.016] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 01/22/2015] [Revised: 03/17/2015] [Accepted: 04/11/2015] [Indexed: 01/04/2023]
36
Abbass J, Nebel JC. Customised fragments libraries for protein structure prediction based on structural class annotations. BMC Bioinformatics 2015; 16:136. [PMID: 25925397 PMCID: PMC4419399 DOI: 10.1186/s12859-015-0576-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 12/08/2014] [Accepted: 04/17/2015] [Indexed: 12/05/2022]
Abstract
Background: Since experimental techniques are time-consuming and costly, in silico protein structure prediction is essential for producing conformations of protein targets. When homologous structures are not available, fragment-based protein structure prediction has become the approach of choice. However, it still has many issues, including poor performance when target lengths exceed 100 residues, excessive running times and sub-optimal energy functions. Taking advantage of the reliable performance of structural class prediction software, we propose to address some of the limitations of fragment-based methods by integrating structural constraints into their fragment selection process.
Results: Using Rosetta, a state-of-the-art fragment-based protein structure prediction package, we evaluated our proposed pipeline on 70 former CASP targets containing up to 150 amino acids. Using either CATH- or SCOP-based structural class annotations, the enhancement of structure prediction performance is highly significant in terms of both GDT_TS (at least +2.6, p-values < 0.0005) and RMSD (−0.4, p-values < 0.005). Although the CATH and SCOP classifications differ, they perform similarly. Moreover, proteins from all structural classes benefit from the proposed methodology. Further analysis also shows that methods relying on class-based fragments produce conformations that are more relevant to the user and converge more quickly towards the best model as estimated by GDT_TS (up to 10% on average). This supports our hypothesis that using structurally relevant templates not only reduces the size of the conformation space to be explored but also focuses the search on a more relevant region.
Conclusions: Since our methodology produces models whose quality is up to 7% higher on average than those generated by a standard fragment-based predictor, we believe it should be considered before conducting any fragment-based protein structure prediction.
Despite such progress, ab initio prediction remains a challenging task, especially for proteins of average and large size. Apart from improving search strategies and energy functions, integrating additional constraints seems a promising route, especially if they can be accurately predicted from sequence alone.
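GDT_TS, one of the two metrics quoted in this abstract, is the mean, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of model residues (usually Cα atoms) lying within the cutoff of their native positions; RMSD is the root-mean-square of the same per-residue deviations. The Python sketch below computes both for a model and native structure whose coordinates are assumed to be already superimposed. That is a simplification: the standard GDT procedure searches over many superpositions to maximise the count under each cutoff, so a single fixed superposition can underestimate GDT_TS. The function names are hypothetical.

```python
from math import dist, sqrt
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def _check(model: Sequence[Coord], native: Sequence[Coord]) -> None:
    if len(model) != len(native) or not model:
        raise ValueError("model and native must be non-empty and of equal length")


def rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """Root-mean-square deviation for already-superimposed coordinates."""
    _check(model, native)
    squared = [dist(m, n) ** 2 for m, n in zip(model, native)]
    return sqrt(sum(squared) / len(squared))


def gdt_ts_fixed(
    model: Sequence[Coord],
    native: Sequence[Coord],
    cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0),
) -> float:
    """GDT_TS-like score (0-100) under ONE fixed superposition (simplification)."""
    _check(model, native)
    deviations = [dist(m, n) for m, n in zip(model, native)]
    # Percentage of residues within each distance cutoff.
    percentages = [
        100.0 * sum(d <= c for d in deviations) / len(deviations) for c in cutoffs
    ]
    return sum(percentages) / len(cutoffs)
```

With per-residue deviations of 0, 1.5, 3 and 10 Å, the four cutoffs capture 25%, 50%, 75% and 75% of residues, so `gdt_ts_fixed` returns 56.25, while the single 10 Å outlier dominates the RMSD of about 5.27 Å. This contrast is why the two metrics are usually reported together.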
Affiliation(s)
- Jad Abbass
- Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE, UK.
- Jean-Christophe Nebel
- Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE, UK.
37
Zhang L, Zhao X, Kong L. Predict protein structural class for low-similarity sequences by evolutionary difference information into the general form of Chou's pseudo amino acid composition. J Theor Biol 2014; 355:105-10. [PMID: 24735902 DOI: 10.1016/j.jtbi.2014.04.008] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Received: 12/06/2013] [Revised: 02/26/2014] [Accepted: 04/04/2014] [Indexed: 10/25/2022]
Abstract
Knowledge of protein structural class plays an important role in characterizing the overall folding type of a given protein. At present, extracting sequence information solely from the protein sequence remains a challenge for structural class prediction of low-similarity sequences. In this study, a novel sequence representation method based on the position-specific scoring matrix is proposed for protein structural class prediction. Using a defined evolutionary difference formula, proteins of varying length are expressed as vectors of uniform dimension that represent evolutionary difference information between adjacent residues of a given protein. To implement and evaluate the proposed method, a support vector machine and jackknife tests are employed on three widely used datasets, 25PDB, 1189 and 640, with sequence similarity lower than 25%, 40% and 25%, respectively. Comparison of our results with previous methods shows that our method may provide a promising way to predict protein structural class, especially for low-similarity sequences.
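The abstract does not state the evolutionary difference formula itself. One reading consistent with its description (differences between adjacent residues, summarised as a fixed-length vector) is to average, over all adjacent residue pairs, the squared difference between PSSM column i of one residue and column j of the next residue, giving n × n features for an n-column PSSM (400 for the usual 20 columns). The Python sketch below implements that reading; it is an assumption, the function name is hypothetical, and the paper's exact formula, normalisation and PSSM preprocessing may differ.

```python
from typing import List, Sequence


def evolutionary_difference_features(pssm: Sequence[Sequence[float]]) -> List[float]:
    """Map an L x n PSSM to a fixed n * n vector (assumed formula):

        D(i, j) = 1/(L - 1) * sum_k (P[k][i] - P[k + 1][j]) ** 2

    Features are ordered row-major over (i, j).
    """
    length = len(pssm)
    if length < 2:
        raise ValueError("need at least two residues to form adjacent pairs")
    n = len(pssm[0])
    features: List[float] = []
    for i in range(n):
        for j in range(n):
            # Squared difference between column i at residue k and column j at residue k + 1.
            total = sum((pssm[k][i] - pssm[k + 1][j]) ** 2 for k in range(length - 1))
            features.append(total / (length - 1))
    return features
```

Whatever the protein length, a 20-column PSSM yields a 400-dimensional vector, which is the property that lets a single SVM accept proteins of varying length.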
Affiliation(s)
- Lichao Zhang
- College of Marine Life Science, Ocean University of China, Yushan Road, Qingdao 266003, PR China
- Xiqiang Zhao
- College of Mathematical Science, Ocean University of China, Songling Road, Qingdao 266100, PR China.
- Liang Kong
- College of Mathematics and Information Technology, Hebei Normal University of Science and Technology, Qinhuangdao 066004, PR China
38
Kong L, Zhang L. Novel structure-driven features for accurate prediction of protein structural class. Genomics 2014; 103:292-7. [DOI: 10.1016/j.ygeno.2014.04.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 11/09/2013] [Revised: 04/05/2014] [Accepted: 04/07/2014] [Indexed: 11/25/2022]
39
PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations. PLoS One 2014; 9:e92863. [PMID: 24675610 PMCID: PMC3968047 DOI: 10.1371/journal.pone.0092863] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 10/14/2013] [Accepted: 02/27/2014] [Indexed: 02/05/2023]
Abstract
Protein structure prediction is critical to the functional annotation of massively accumulated biological sequences, which creates an urgent need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction has become an increasingly challenging task. Among most homology-based approaches, the accuracies of protein structural class prediction are sufficiently high for high-similarity datasets but still far from satisfactory for low-similarity datasets, i.e., below 40% pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high- and low-similarity datasets. This method is based on a Support Vector Machine (SVM) in conjunction with integrated features from the position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors by recursively removing the feature with the lowest ranking score. The top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low-similarity datasets.
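SVM-RFE, as summarised in this abstract, repeatedly trains an SVM, scores the surviving features, and discards the lowest-ranked one. The Python sketch below illustrates that loop under stated assumptions: the ranking score is the squared weight of a linear SVM (the usual linear form of SVM-RFE, assumed rather than taken from PSSP-RFE), the SVM itself is a minimal bias-free Pegasos-style sub-gradient trainer instead of a full SVM package, and `train_linear_svm`, `svm_rfe`, `lam` and `epochs` are hypothetical names and defaults.

```python
from typing import List, Sequence


def train_linear_svm(
    xs: Sequence[Sequence[float]],
    ys: Sequence[int],
    lam: float = 0.1,
    epochs: int = 50,
) -> List[float]:
    """Bias-free linear SVM trained with Pegasos-style sub-gradient steps.

    Labels must be +1 or -1. Samples are visited in a fixed order, so the
    result is deterministic (the original Pegasos samples at random).
    """
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)
            # Margin is evaluated with the current weights, before the update.
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrinkage from the L2 regulariser.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                # Hinge-loss sub-gradient step for a margin violation.
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def svm_rfe(xs: Sequence[Sequence[float]], ys: Sequence[int]) -> List[int]:
    """Return feature indices in elimination order; the last one survives longest."""
    remaining = list(range(len(xs[0])))
    eliminated: List[int] = []
    while len(remaining) > 1:
        # Retrain on the surviving features only.
        subset = [[x[j] for j in remaining] for x in xs]
        w = train_linear_svm(subset, ys)
        # Ranking score: squared weight; drop the lowest-ranked feature.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return eliminated + remaining
```

On a toy set where feature 0 separates the classes, feature 1 is always zero and feature 2 is a scaled-down copy of feature 0 (for example rows `[2.0, 0.0, 0.1]`, `[3.0, 0.0, 0.15]`, `[-2.0, 0.0, -0.1]`, `[-3.0, 0.0, -0.15]` with labels `[1, 1, -1, -1]`), `svm_rfe` returns `[1, 2, 0]`. Retraining after each removal is what distinguishes recursive elimination from ranking all features once.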