Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zhang S, Ye F, Yuan X. Using principal component analysis and support vector machine to predict protein structural class for low-similarity sequences via PSSM. J Biomol Struct Dyn 2012;29:634-42. [PMID: 22545994 DOI: 10.1080/07391102.2011.672627] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]

For:	Zhang S, Ye F, Yuan X. Using principal component analysis and support vector machine to predict protein structural class for low-similarity sequences via PSSM. J Biomol Struct Dyn 2012;29:634-42. [PMID: 22545994 DOI: 10.1080/07391102.2011.672627] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]

Number

Cited by Other Article(s)

Zhang T, Jia Y, Li H, Xu D, Zhou J, Wang G. CRISPRCasStack: a stacking strategy-based ensemble learning framework for accurate identification of Cas proteins. Brief Bioinform 2022;23:6674167. [PMID: 35998924 DOI: 10.1093/bib/bbac335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2022] [Revised: 07/13/2022] [Accepted: 07/23/2022] [Indexed: 11/12/2022] Open

Mohammadi A, Zahiri J, Mohammadi S, Khodarahmi M, Arab SS. PSSMCOOL: A Comprehensive R Package for Generating Evolutionary-based Descriptors of Protein Sequences from PSSM Profiles. BIOLOGY METHODS AND PROTOCOLS 2022;7:bpac008. [PMID: 35388370 PMCID: PMC8977839 DOI: 10.1093/biomethods/bpac008] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/03/2021] [Revised: 01/21/2022] [Indexed: 11/14/2022]

Gong Y, Dong B, Zhang Z, Zhai Y, Gao B, Zhang T, Zhang J. VTP-Identifier: Vesicular Transport Proteins Identification Based on PSSM Profiles and XGBoost. Front Genet 2022;12:808856. [PMID: 35047020 PMCID: PMC8762342 DOI: 10.3389/fgene.2021.808856] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2021] [Accepted: 11/29/2021] [Indexed: 11/13/2022] Open

Rout RK, Umer S, Sheikh S, Sindhwani S, Pati S. EightyDVec: a method for protein sequence similarity analysis using physicochemical properties of amino acids. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING: IMAGING & VISUALIZATION 2022. [DOI: 10.1080/21681163.2021.1956369] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]

Bankapur S, Patil N. Enhanced Protein Structural Class Prediction Using Effective Feature Modeling and Ensemble of Classifiers. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021;18:2409-2419. [PMID: 32149653 DOI: 10.1109/tcbb.2020.2979430] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]

Abstract

Protein Secondary Structural Class (PSSC) information is important in investigating further challenges of protein sequences like protein fold recognition, protein tertiary structure prediction, and analysis of protein functions for drug discovery. Identification of PSSC using biological methods is time-consuming and cost-intensive. Several computational models have been developed to predict the structural class; however, they lack in generalization of the model. Hence, predicting PSSC based on protein sequences is still proving to be an uphill task. In this article, we proposed an effective, novel and generalized prediction model consisting of a feature modeling and an ensemble of classifiers. The proposed feature modeling extracts discriminating information (features) by leveraging three techniques: (i) Embedding - features are extracted on the basis of spatial residue arrangements of the sequences using word embedding approaches; (ii) SkipXGram Bi-gram - various sets of skipped bi-gram features are extracted from the sequences; and (iii) General Statistical (GS) based features are extracted which covers the global information of structural sequences. The combined effective sets of features are trained and classified using an ensemble of three classifiers: Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Machines (GBM). The proposed model when assessed on five benchmark datasets (high and low sequence similarity), viz. z277, z498, 25PDB, 1189, and FC699, reported an overall accuracy of 93.55, 97.58, 81.82, 81.11, and 93.93 percent respectively. The proposed model is further validated on a large-scale updated low similarity ( ≤ 25%) dataset, where it achieved an overall accuracy of 81.11 percent. The proposed generalized model is robust and consistently outperformed several state-of-the-art models on all the five benchmark datasets.

Collapse

Zervou MA, Doutsi E, Pavlidis P, Tsakalides P. Structural classification of proteins based on the computationally efficient recurrence quantification analysis and horizontal visibility graphs. Bioinformatics 2021;37:1796-1804. [PMID: 34048559 DOI: 10.1093/bioinformatics/btab407] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/11/2021] [Revised: 04/13/2021] [Accepted: 05/27/2021] [Indexed: 11/14/2022] Open

Zhang Q, Liu P, Wang X, Zhang Y, Han Y, Yu B. StackPDB: Predicting DNA-binding proteins based on XGB-RFE feature optimization and stacked ensemble classifier. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.106921] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/08/2023]

Zhang J, Lv L, Lu D, Kong D, Al-Alashaari MAA, Zhao X. Variable selection from a feature representing protein sequences: a case of classification on bacterial type IV secreted effectors. BMC Bioinformatics 2020;21:480. [PMID: 33109082 PMCID: PMC7590791 DOI: 10.1186/s12859-020-03826-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2020] [Accepted: 10/19/2020] [Indexed: 12/13/2022] Open

Abstract

Background

Classification of certain proteins with specific functions is momentous for biological research. Encoding approaches of protein sequences for feature extraction play an important role in protein classification. Many computational methods (namely classifiers) are used for classification on protein sequences according to various encoding approaches. Commonly, protein sequences keep certain labels corresponding to different categories of biological functions (e.g., bacterial type IV secreted effectors or not), which makes protein prediction a fantasy. As to protein prediction, a kernel set of protein sequences keeping certain labels certified by biological experiments should be existent in advance. However, it has been hardly ever seen in prevailing researches. Therefore, unsupervised learning rather than supervised learning (e.g. classification) should be considered. As to protein classification, various classifiers may help to evaluate the effectiveness of different encoding approaches. Besides, variable selection from an encoded feature representing protein sequences is an important issue that also needs to be considered.

Results

Focusing on the latter problem, we propose a new method for variable selection from an encoded feature representing protein sequences. Taking a benchmark dataset containing 1947 protein sequences as a case, experiments are made to identify bacterial type IV secreted effectors (T4SE) from protein sequences, which are composed of 399 T4SE and 1548 non-T4SE. Comparable and quantified results are obtained only using certain components of the encoded feature, i.e., position-specific scoring matix, and that indicates the effectiveness of our method.

Conclusions

Certain variables other than an encoded feature they belong to do work for discrimination between different types of proteins. In addition, ensemble classifiers with an automatic assignment of different base classifiers do achieve a better classification result.

Collapse

Zhang Y, Yu S, Xie R, Li J, Leier A, Marquez-Lago TT, Akutsu T, Smith AI, Ge Z, Wang J, Lithgow T, Song J. PeNGaRoo, a combined gradient boosting and ensemble learning framework for predicting non-classical secreted proteins. Bioinformatics 2020;36:704-712. [PMID: 31393553 DOI: 10.1093/bioinformatics/btz629] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2019] [Revised: 07/17/2019] [Accepted: 08/07/2019] [Indexed: 12/17/2022] Open

Abstract

MOTIVATION

Gram-positive bacteria have developed secretion systems to transport proteins across their cell wall, a process that plays an important role during host infection. These secretion mechanisms have also been harnessed for therapeutic purposes in many biotechnology applications. Accordingly, the identification of features that select a protein for efficient secretion from these microorganisms has become an important task. Among all the secreted proteins, 'non-classical' secreted proteins are difficult to identify as they lack discernable signal peptide sequences and can make use of diverse secretion pathways. Currently, several computational methods have been developed to facilitate the discovery of such non-classical secreted proteins; however, the existing methods are based on either simulated or limited experimental datasets. In addition, they often employ basic features to train the models in a simple and coarse-grained manner. The availability of more experimentally validated datasets, advanced feature engineering techniques and novel machine learning approaches creates new opportunities for the development of improved predictors of 'non-classical' secreted proteins from sequence data.

RESULTS

In this work, we first constructed a high-quality dataset of experimentally verified 'non-classical' secreted proteins, which we then used to create benchmark datasets. Using these benchmark datasets, we comprehensively analyzed a wide range of features and assessed their individual performance. Subsequently, we developed a two-layer Light Gradient Boosting Machine (LightGBM) ensemble model that integrates several single feature-based models into an overall prediction framework. At this stage, LightGBM, a gradient boosting machine, was used as a machine learning approach and the necessary parameter optimization was performed by a particle swarm optimization strategy. All single feature-based LightGBM models were then integrated into a unified ensemble model to further improve the predictive performance. Consequently, the final ensemble model achieved a superior performance with an accuracy of 0.900, an F-value of 0.903, Matthew's correlation coefficient of 0.803 and an area under the curve value of 0.963, and outperforming previous state-of-the-art predictors on the independent test. Based on our proposed optimal ensemble model, we further developed an accessible online predictor, PeNGaRoo, to serve users' demands. We believe this online web server, together with our proposed methodology, will expedite the discovery of non-classically secreted effector proteins in Gram-positive bacteria and further inspire the development of next-generation predictors.

AVAILABILITY AND IMPLEMENTATION

http://pengaroo.erc.monash.edu/.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

Affiliation(s)

Yanju Zhang Bioinformatics Group, School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
Sha Yu Bioinformatics Group, School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China.,Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, VIC 3800, Australia
Ruopeng Xie Bioinformatics Group, School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China.,Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, VIC 3800, Australia
Jiahui Li Bioinformatics Group, School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China.,Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
André Leier Department of Genetics, AL, USA.,Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
Tatiana T Marquez-Lago Department of Genetics, AL, USA.,Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
Tatsuya Akutsu Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
A Ian Smith Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, VIC 3800, Australia.,ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, VIC 3800, Australia
Zongyuan Ge Monash e-Research Centre and Faculty of Engineering, Monash University, Melbourne, VIC 3800, Australia
Jiawei Wang Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
Trevor Lithgow Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
Jiangning Song Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, VIC 3800, Australia.,ARC Centre of Excellence in Advanced Molecular Imaging, Monash University, VIC 3800, Australia

Collapse

Ge Y, Zhao S, Zhao X. A step-by-step classification algorithm of protein secondary structures based on double-layer SVM model. Genomics 2020;112:1941-1946. [DOI: 10.1016/j.ygeno.2019.11.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2019] [Revised: 10/15/2019] [Accepted: 11/11/2019] [Indexed: 11/26/2022]

Kong M, Zhang Y, Xu D, Chen W, Dehmer M. FCTP-WSRC: Protein-Protein Interactions Prediction via Weighted Sparse Representation Based Classification. Front Genet 2020;11:18. [PMID: 32117437 PMCID: PMC7010952 DOI: 10.3389/fgene.2020.00018] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2019] [Accepted: 01/07/2020] [Indexed: 12/21/2022] Open

Identification of amyloidogenic peptides via optimized integrated features space based on physicochemical properties and PSSM. Anal Biochem 2019;583:113362. [PMID: 31310738 DOI: 10.1016/j.ab.2019.113362] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2019] [Revised: 07/09/2019] [Accepted: 07/12/2019] [Indexed: 01/08/2023]

Ali F, Ahmed S, Swati ZNK, Akbar S. DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information. J Comput Aided Mol Des 2019;33:645-658. [PMID: 31123959 DOI: 10.1007/s10822-019-00207-x] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2019] [Accepted: 05/18/2019] [Indexed: 12/28/2022]

Abstract

DNA-binding proteins (DBPs) participate in various biological processes including DNA replication, recombination, and repair. In the human genome, about 6-7% of these proteins are utilized for genes encoding. DBPs shape the DNA into a compact structure known chromatin while some of these proteins regulate the chromosome packaging and transcription process. In the pharmaceutical industry, DBPs are used as a key component of antibiotics, steroids, and cancer drugs. These proteins also involve in biophysical, biological, and biochemical studies of DNA. Due to the crucial role in various biological activities, identification of DBPs is a hot issue in protein science. A series of experimental and computational methods have been proposed, however, some methods didn't achieve the desired results while some are inadequate in its accuracy and authenticity. Still, it is highly desired to present more intelligent computational predictors. In this work, we introduce an innovative computational method namely DP-BINDER based on physicochemical and evolutionary information. We captured local highly decisive features from physicochemical properties of primary protein sequences via normalized Moreau-Broto autocorrelation (NMBAC) and evolutionary information by position specific scoring matrix-transition probability composition (PSSM-TPC) and pseudo-position specific scoring matrix (PsePSSM) using training and independent datasets. The optimal features were selected by the support vector machine-recursive feature elimination and correlation bias reduction (SVM-RFE + CBR) from fused features and were fed into random forest (RF) and support vector machine (SVM). Our method attained 92.46% and 89.58% accuracy with jackknife and ten-fold cross-validation, respectively on the training dataset, while 81.17% accuracy on the independent dataset for prediction of DBPs. These results demonstrate that our method attained the highest success rate in the literature. The superiority of DP-BINDER over existing approaches due to several reasons including abstraction of local dominant features via effective feature descriptors, utilization of appropriate feature selection algorithms and effective classifier.

Collapse

Wang J, Li J, Yang B, Xie R, Marquez-Lago TT, Leier A, Hayashida M, Akutsu T, Zhang Y, Chou KC, Selkrig J, Zhou T, Song J, Lithgow T. Bastion3: a two-layer ensemble predictor of type III secreted effectors. Bioinformatics 2018;35:2017-2028. [PMID: 30388198 PMCID: PMC7963071 DOI: 10.1093/bioinformatics/bty914] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2018] [Revised: 10/15/2018] [Accepted: 10/31/2018] [Indexed: 01/31/2023] Open

Abstract

MOTIVATION

Type III secreted effectors (T3SEs) can be injected into host cell cytoplasm via type III secretion systems (T3SSs) to modulate interactions between Gram-negative bacterial pathogens and their hosts. Due to their relevance in pathogen-host interactions, significant computational efforts have been put toward identification of T3SEs and these in turn have stimulated new T3SE discoveries. However, as T3SEs with new characteristics are discovered, these existing computational tools reveal important limitations: (i) most of the trained machine learning models are based on the N-terminus (or incorporating also the C-terminus) instead of the proteins' complete sequences, and (ii) the underlying models (trained with classic algorithms) employed only few features, most of which were extracted based on sequence-information alone. To achieve better T3SE prediction, we must identify more powerful, informative features and investigate how to effectively integrate these into a comprehensive model.

RESULTS

In this work, we present Bastion3, a two-layer ensemble predictor developed to accurately identify type III secreted effectors from protein sequence data. In contrast with existing methods that employ single models with few features, Bastion3 explores a wide range of features, from various types, trains single models based on these features and finally integrates these models through ensemble learning. We trained the models using a new gradient boosting machine, LightGBM and further boosted the models' performances through a novel genetic algorithm (GA) based two-step parameter optimization strategy. Our benchmark test demonstrates that Bastion3 achieves a much better performance compared to commonly used methods, with an ACC value of 0.959, F-value of 0.958, MCC value of 0.917 and AUC value of 0.956, which comprehensively outperformed all other toolkits by more than 5.6% in ACC value, 5.7% in F-value, 12.4% in MCC value and 5.8% in AUC value. Based on our proposed two-layer ensemble model, we further developed a user-friendly online toolkit, maximizing convenience for experimental scientists toward T3SE prediction. With its design to ease future discoveries of novel T3SEs and improved performance, Bastion3 is poised to become a widely used, state-of-the-art toolkit for T3SE prediction.

AVAILABILITY AND IMPLEMENTATION

http://bastion3.erc.monash.edu/.

CONTACT

selkrig@embl.de or wyztli@163.com or or trevor.lithgow@monash.edu.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

Affiliation(s)

Jiawei Wang Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia
Jiahui Li Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia,Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
Bingjiao Yang School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
Ruopeng Xie School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
Tatiana T Marquez-Lago Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA,Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
André Leier Department of Genetics, School of Medicine, University of Alabama at Birmingham, AL, USA,Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, AL, USA
Morihiro Hayashida National Institute of Technology, Matsue College, Matsue, Shimane, Japan
Tatsuya Akutsu Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
Yanju Zhang School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
Kuo-Chen Chou Gordon Life Science Institute, Boston, MA, USA,Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China,Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia
Joel Selkrig European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
Tieli Zhou Department of Clinical Laboratory, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
Jiangning Song To whom correspondence should be addressed.
Trevor Lithgow Infection and Immunity Program, Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC, Australia

Collapse

Zhang S, Liang Y. Predicting apoptosis protein subcellular localization by integrating auto-cross correlation and PSSM into Chou’s PseAAC. J Theor Biol 2018;457:163-169. [DOI: 10.1016/j.jtbi.2018.08.042] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2018] [Revised: 08/25/2018] [Accepted: 08/31/2018] [Indexed: 10/28/2022]

Göktepe YE, Kodaz H. Prediction of Protein-Protein Interactions Using An Effective Sequence Based Combined Method. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.03.062] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/17/2022]

Wang J, Yang B, Revote J, Leier A, Marquez-Lago TT, Webb G, Song J, Chou KC, Lithgow T. POSSUM: a bioinformatics toolkit for generating numerical sequence feature descriptors based on PSSM profiles. Bioinformatics 2018;33:2756-2758. [PMID: 28903538 DOI: 10.1093/bioinformatics/btx302] [Citation(s) in RCA: 107] [Impact Index Per Article: 17.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2017] [Accepted: 05/09/2017] [Indexed: 11/13/2022] Open

Srivastava A, Kumar M. Prediction of zinc binding sites in proteins using sequence derived information. J Biomol Struct Dyn 2018;36:4413-4423. [PMID: 29241411 DOI: 10.1080/07391102.2017.1417910] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/30/2023]

An Ensemble Classifier with Random Projection for Predicting Protein–Protein Interactions Using Sequence and Evolutionary Information. APPLIED SCIENCES-BASEL 2018. [DOI: 10.3390/app8010089] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/31/2023]

Xie S, Li Z, Hu H. Protein secondary structure prediction based on the fuzzy support vector machine with the hyperplane optimization. Gene 2017;642:74-83. [PMID: 29104167 DOI: 10.1016/j.gene.2017.11.005] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2017] [Revised: 10/29/2017] [Accepted: 11/02/2017] [Indexed: 11/30/2022]

Liang Y, Zhang S. Predict protein structural class by incorporating two different modes of evolutionary information into Chou's general pseudo amino acid composition. J Mol Graph Model 2017;78:110-117. [DOI: 10.1016/j.jmgm.2017.10.003] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2017] [Revised: 10/03/2017] [Accepted: 10/03/2017] [Indexed: 11/27/2022]

Yu B, Lou L, Li S, Zhang Y, Qiu W, Wu X, Wang M, Tian B. Prediction of protein structural class for low-similarity sequences using Chou’s pseudo amino acid composition and wavelet denoising. J Mol Graph Model 2017;76:260-273. [DOI: 10.1016/j.jmgm.2017.07.012] [Citation(s) in RCA: 60] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2017] [Revised: 07/11/2017] [Accepted: 07/12/2017] [Indexed: 11/25/2022]

Li Y, Song T, Yang J, Zhang Y, Yang J. An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids. PLoS One 2016;11:e0167430. [PMID: 27918587 PMCID: PMC5137889 DOI: 10.1371/journal.pone.0167430] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2016] [Accepted: 11/14/2016] [Indexed: 11/30/2022] Open

Kong L, Kong L, Jing R. Improving the Prediction of Protein Structural Class for Low-Similarity Sequences by Incorporating Evolutionaryand Structural Information. JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS 2016. [DOI: 10.20965/jaciii.2016.p0402] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Yang R, Zhang C, Gao R, Zhang L. A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data. Int J Mol Sci 2016;17:218. [PMID: 26861308 PMCID: PMC4783950 DOI: 10.3390/ijms17020218] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2015] [Accepted: 01/26/2016] [Indexed: 01/08/2023] Open

Chen J, Xu H, He PA, Dai Q, Yao Y. A multiple information fusion method for predicting subcellular locations of two different types of bacterial protein simultaneously. Biosystems 2016;139:37-45. [DOI: 10.1016/j.biosystems.2015.12.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2015] [Revised: 10/08/2015] [Accepted: 12/10/2015] [Indexed: 12/14/2022]

Prediction of Protein Structural Classes for Low-Similarity Sequences Based on Consensus Sequence and Segmented PSSM. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2015;2015:370756. [PMID: 26788119 PMCID: PMC4693000 DOI: 10.1155/2015/370756] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 08/31/2015] [Revised: 11/19/2015] [Accepted: 12/01/2015] [Indexed: 11/17/2022]

Briefing in application of machine learning methods in ion channel prediction. ScientificWorldJournal 2015;2015:945927. [PMID: 25961077 PMCID: PMC4415473 DOI: 10.1155/2015/945927] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2014] [Accepted: 09/11/2014] [Indexed: 01/09/2023] Open

Abbass J, Nebel JC. Customised fragments libraries for protein structure prediction based on structural class annotations. BMC Bioinformatics 2015;16:136. [PMID: 25925397 PMCID: PMC4419399 DOI: 10.1186/s12859-015-0576-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2014] [Accepted: 04/17/2015] [Indexed: 12/05/2022] Open

Abstract

Background

Since experimental techniques are time and cost consuming, in silico protein structure prediction is essential to produce conformations of protein targets. When homologous structures are not available, fragment-based protein structure prediction has become the approach of choice. However, it still has many issues including poor performance when targets’ lengths are above 100 residues, excessive running times and sub-optimal energy functions. Taking advantage of the reliable performance of structural class prediction software, we propose to address some of the limitations of fragment-based methods by integrating structural constraints in their fragment selection process.

Results

Using Rosetta, a state-of-the-art fragment-based protein structure prediction package, we evaluated our proposed pipeline on 70 former CASP targets containing up to 150 amino acids. Using either CATH or SCOP-based structural class annotations, enhancement of structure prediction performance is highly significant in terms of both GDT_TS (at least +2.6, p-values < 0.0005) and RMSD (−0.4, p-values < 0.005). Although CATH and SCOP classifications are different, they perform similarly. Moreover, proteins from all structural classes benefit from the proposed methodology. Further analysis also shows that methods relying on class-based fragments produce conformations which are more relevant to user and converge quicker towards the best model as estimated by GDT_TS (up to 10% in average). This substantiates our hypothesis that usage of structurally relevant templates conducts to not only reducing the size of the conformation space to be explored, but also focusing on a more relevant area.

Conclusions

Since our methodology produces models the quality of which is up to 7% higher in average than those generated by a standard fragment-based predictor, we believe it should be considered before conducting any fragment-based protein structure prediction. Despite such progress, ab initio prediction remains a challenging task, especially for proteins of average and large sizes. Apart from improving search strategies and energy functions, integration of additional constraints seems a promising route, especially if they can be accurately predicted from sequence alone.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0576-2) contains supplementary material, which is available to authorized users.

Collapse

Ding S, Yan S, Qi S, Li Y, Yao Y. A protein structural classes prediction method based on PSI-BLAST profile. J Theor Biol 2014;353:19-23. [DOI: 10.1016/j.jtbi.2014.02.034] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2013] [Revised: 01/27/2014] [Accepted: 02/24/2014] [Indexed: 11/27/2022]

Zhang L, Zhao X, Kong L. Predict protein structural class for low-similarity sequences by evolutionary difference information into the general form of Chou's pseudo amino acid composition. J Theor Biol 2014;355:105-10. [PMID: 24735902 DOI: 10.1016/j.jtbi.2014.04.008] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2013] [Revised: 02/26/2014] [Accepted: 04/04/2014] [Indexed: 10/25/2022]

PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations. PLoS One 2014;9:e92863. [PMID: 24675610 PMCID: PMC3968047 DOI: 10.1371/journal.pone.0092863] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2013] [Accepted: 02/27/2014] [Indexed: 02/05/2023] Open

Ding S, Li Y, Shi Z, Yan S. A protein structural classes prediction method based on predicted secondary structure and PSI-BLAST profile. Biochimie 2014;97:60-5. [DOI: 10.1016/j.biochi.2013.09.013] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2013] [Accepted: 09/16/2013] [Indexed: 10/26/2022]

Dehzangi A, Paliwal K, Lyons J, Sharma A, Sattar A. Proposing a highly accurate protein structural class predictor using segmentation-based features. BMC Genomics 2014;15 Suppl 1:S2. [PMID: 24564476 PMCID: PMC4046757 DOI: 10.1186/1471-2164-15-s1-s2] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Zhang S, Liang Y, Yuan X. Improving the prediction accuracy of protein structural class: Approached with alternating word frequency and normalized Lempel–Ziv complexity. J Theor Biol 2014;341:71-7. [DOI: 10.1016/j.jtbi.2013.10.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2013] [Revised: 09/08/2013] [Accepted: 10/08/2013] [Indexed: 10/26/2022]

Chang JM, Taly JF, Erb I, Sung TY, Hsu WL, Tang CY, Notredame C, Su ECY. Efficient and interpretable prediction of protein functional classes by correspondence analysis and compact set relations. PLoS One 2013;8:e75542. [PMID: 24146760 PMCID: PMC3795737 DOI: 10.1371/journal.pone.0075542] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/23/2013] [Accepted: 08/19/2013] [Indexed: 11/24/2022] Open

Abstract

Predicting protein functional classes such as localization sites and modifications plays a crucial role in function annotation. Given a tremendous amount of sequence data yielded from high-throughput sequencing experiments, the need of efficient and interpretable prediction strategies has been rapidly amplified. Our previous approach for subcellular localization prediction, PSLDoc, archives high overall accuracy for Gram-negative bacteria. However, PSLDoc is computational intensive due to incorporation of homology extension in feature extraction and probabilistic latent semantic analysis in feature reduction. Besides, prediction results generated by support vector machines are accurate but generally difficult to interpret.

In this work, we incorporate three new techniques to improve efficiency and interpretability. First, homology extension is performed against a compact non-redundant database using a fast search model to reduce running time. Second, correspondence analysis (CA) is incorporated as an efficient feature reduction to generate a clear visual separation of different protein classes. Finally, functional classes are predicted by a combination of accurate compact set (CS) relation and interpretable one-nearest neighbor (1-NN) algorithm. Besides localization data sets, we also apply a human protein kinase set to validate generality of our proposed method.

Experiment results demonstrate that our method make accurate prediction in a more efficient and interpretable manner. First, homology extension using a fast search on a compact database can greatly accelerate traditional running time up to twenty-five times faster without sacrificing prediction performance. This suggests that computational costs of many other predictors that also incorporate homology information can be largely reduced. In addition, CA can not only efficiently identify discriminative features but also provide a clear visualization of different functional classes. Moreover, predictions based on CS achieve 100% precision. When combined with 1-NN on unpredicted targets by CS, our method attains slightly better or comparable performance compared with the state-of-the-art systems.

Collapse

You ZH, Lei YK, Zhu L, Xia J, Wang B. Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis. BMC Bioinformatics 2013;14 Suppl 8:S10. [PMID: 23815620 PMCID: PMC3654889 DOI: 10.1186/1471-2105-14-s8-s10] [Citation(s) in RCA: 150] [Impact Index Per Article: 13.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Protein-protein interactions (PPIs) play crucial roles in the execution of various cellular processes and form the basis of biological mechanisms. Although large amount of PPIs data for different species has been generated by high-throughput experimental techniques, current PPI pairs obtained with experimental methods cover only a fraction of the complete PPI networks, and further, the experimental methods for identifying PPIs are both time-consuming and expensive. Hence, it is urgent and challenging to develop automated computational methods to efficiently and accurately predict PPIs.

RESULTS

We present here a novel hierarchical PCA-EELM (principal component analysis-ensemble extreme learning machine) model to predict protein-protein interactions only using the information of protein sequences. In the proposed method, 11188 protein pairs retrieved from the DIP database were encoded into feature vectors by using four kinds of protein sequences information. Focusing on dimension reduction, an effective feature extraction method PCA was then employed to construct the most discriminative new feature set. Finally, multiple extreme learning machines were trained and then aggregated into a consensus classifier by majority voting. The ensembling of extreme learning machine removes the dependence of results on initial random weights and improves the prediction performance.

CONCLUSIONS

When performed on the PPI data of Saccharomyces cerevisiae, the proposed method achieved 87.00% prediction accuracy with 86.15% sensitivity at the precision of 87.59%. Extensive experiments are performed to compare our method with state-of-the-art techniques Support Vector Machine (SVM). Experimental results demonstrate that proposed PCA-EELM outperforms the SVM method by 5-fold cross-validation. Besides, PCA-EELM performs faster than PCA-SVM based method. Consequently, the proposed approach can be considered as a new promising and powerful tools for predicting PPI with excellent performance and less time.

Collapse

Exploring Potential Discriminatory Information Embedded in PSSM to Enhance Protein Structural Class Prediction Accuracy. ACTA ACUST UNITED AC 2013. [DOI: 10.1007/978-3-642-39159-0_19] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/09/2023]