Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Qiu JD, Luo SH, Huang JH, Liang RP. Using support vector machines to distinguish enzymes: approached by incorporating wavelet transform. J Theor Biol 2008;256:625-31. [PMID: 19049810 DOI: 10.1016/j.jtbi.2008.10.026] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2008] [Revised: 09/26/2008] [Accepted: 10/20/2008] [Indexed: 10/21/2022]

For:	Qiu JD, Luo SH, Huang JH, Liang RP. Using support vector machines to distinguish enzymes: approached by incorporating wavelet transform. J Theor Biol 2008;256:625-31. [PMID: 19049810 DOI: 10.1016/j.jtbi.2008.10.026] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2008] [Revised: 09/26/2008] [Accepted: 10/20/2008] [Indexed: 10/21/2022]

Number

Cited by Other Article(s)

Buton N, Coste F, Le Cunff Y. Predicting enzymatic function of protein sequences with attention. Bioinformatics 2023;39:btad620. [PMID: 37874958 PMCID: PMC10612403 DOI: 10.1093/bioinformatics/btad620] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2022] [Revised: 09/11/2023] [Accepted: 10/22/2023] [Indexed: 10/26/2023] Open

Rappoport D, Jinich A. Enzyme Substrate Prediction from Three-Dimensional Feature Representations Using Space-Filling Curves. J Chem Inf Model 2023;63:1637-1648. [PMID: 36802628 DOI: 10.1021/acs.jcim.3c00005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/22/2023]

Duhan N, Norton JM, Kaundal R. deepNEC: a novel alignment-free tool for the identification and classification of nitrogen biochemical network-related enzymes using deep learning. Brief Bioinform 2022;23:6553605. [PMID: 35325031 DOI: 10.1093/bib/bbac071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2021] [Revised: 01/25/2022] [Accepted: 02/10/2022] [Indexed: 11/12/2022] Open

Abstract

Nitrogen is essential for life and its transformations are an important part of the global biogeochemical cycle. Being an essential nutrient, nitrogen exists in a range of oxidation states from +5 (nitrate) to -3 (ammonium and amino-nitrogen), and its oxidation and reduction reactions catalyzed by microbial enzymes determine its environmental fate. The functional annotation of the genes encoding the core nitrogen network enzymes has a broad range of applications in metagenomics, agriculture, wastewater treatment and industrial biotechnology. This study developed an alignment-free computational approach to determine the predicted nitrogen biochemical network-related enzymes from the sequence itself. We propose deepNEC, a novel end-to-end feature selection and classification model training approach for nitrogen biochemical network-related enzyme prediction. The algorithm was developed using Deep Learning, a class of machine learning algorithms that uses multiple layers to extract higher-level features from the raw input data. The derived protein sequence is used as an input, extracting sequential and convolutional features from raw encoded protein sequences based on classification rather than traditional alignment-based methods for enzyme prediction. Two large datasets of protein sequences, enzymes and non-enzymes were used to train the models with protein sequence features like amino acid composition, dipeptide composition (DPC), conformation transition and distribution, normalized Moreau-Broto (NMBroto), conjoint and quasi order, etc. The k-fold cross-validation and independent testing were performed to validate our model training. deepNEC uses a four-tier approach for prediction; in the first phase, it will predict a query sequence as enzyme or non-enzyme; in the second phase, it will further predict and classify enzymes into nitrogen biochemical network-related enzymes or non-nitrogen metabolism enzymes; in the third phase, it classifies predicted enzymes into nine nitrogen metabolism classes; and in the fourth phase, it predicts the enzyme commission number out of 20 classes for nitrogen metabolism. Among all, the DPC + NMBroto hybrid feature gave the best prediction performance (accuracy of 96.15% in k-fold training and 93.43% in independent testing) with an Matthews correlation coefficient (0.92 training and 0.87 independent testing) in phase I; phase II (accuracy of 99.71% in k-fold training and 98.30% in independent testing); phase III (overall accuracy of 99.03% in k-fold training and 98.98% in independent testing); phase IV (overall accuracy of 99.05% in k-fold training and 98.18% in independent testing), the DPC feature gave the best prediction performance. We have also implemented a homology-based method to remove false negatives. All the models have been implemented on a web server (prediction tool), which is freely available at http://bioinfo.usu.edu/deepNEC/.

Collapse

Wang S, Deng L, Xia X, Cao Z, Fei Y. Predicting antifreeze proteins with weighted generalized dipeptide composition and multi-regression feature selection ensemble. BMC Bioinformatics 2021;22:340. [PMID: 34162327 PMCID: PMC8220696 DOI: 10.1186/s12859-021-04251-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2021] [Accepted: 06/09/2021] [Indexed: 12/15/2022] Open

Abstract

BACKGROUND

Antifreeze proteins (AFPs) are a group of proteins that inhibit body fluids from growing to ice crystals and thus improve biological antifreeze ability. It is vital to the survival of living organisms in extremely cold environments. However, little research is performed on sequences feature extraction and selection for antifreeze proteins classification in the structure and function prediction, which is of great significance.

RESULTS

In this paper, to predict the antifreeze proteins, a feature representation of weighted generalized dipeptide composition (W-GDipC) and an ensemble feature selection based on two-stage and multi-regression method (LRMR-Ri) are proposed. Specifically, four feature selection algorithms: Lasso regression, Ridge regression, Maximal information coefficient and Relief are used to select the feature sets, respectively, which is the first stage of LRMR-Ri method. If there exists a common feature subset among the above four sets, it is the optimal subset; otherwise we use Ridge regression to select the optimal subset from the public set pooled by the four sets, which is the second stage of LRMR-Ri. The LRMR-Ri method combined with W-GDipC was performed both on the antifreeze proteins dataset (binary classification), and on the membrane protein dataset (multiple classification). Experimental results show that this method has good performance in support vector machine (SVM), decision tree (DT) and stochastic gradient descent (SGD). The values of ACC, RE and MCC of LRMR-Ri and W-GDipC with antifreeze proteins dataset and SVM classifier have reached as high as 95.56%, 97.06% and 0.9105, respectively, much higher than those of each single method: Lasso, Ridge, Mic and Relief, nearly 13% higher than single Lasso for ACC.

CONCLUSION

The experimental results show that the proposed LRMR-Ri and W-GDipC method can significantly improve the accuracy of antifreeze proteins prediction compared with other similar single feature methods. In addition, our method has also achieved good results in the classification and prediction of membrane proteins, which verifies its widely reliability to a certain extent.

Collapse

Semwal R, Aier I, Tyagi P, Varadwaj PK. DeEPn: a deep neural network based tool for enzyme functional annotation. J Biomol Struct Dyn 2020;39:2733-2743. [PMID: 32274968 DOI: 10.1080/07391102.2020.1754292] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]

Li Y, Wang S, Umarov R, Xie B, Fan M, Li L, Gao X. DEEPre: sequence-based enzyme EC number prediction by deep learning. Bioinformatics 2018;34:760-769. [PMID: 29069344 PMCID: PMC6030869 DOI: 10.1093/bioinformatics/btx680] [Citation(s) in RCA: 124] [Impact Index Per Article: 20.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2017] [Accepted: 10/20/2017] [Indexed: 11/15/2022] Open

Dalkiran A, Rifaioglu AS, Martin MJ, Cetin-Atalay R, Atalay V, Doğan T. ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature. BMC Bioinformatics 2018;19:334. [PMID: 30241466 PMCID: PMC6150975 DOI: 10.1186/s12859-018-2368-y] [Citation(s) in RCA: 73] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2018] [Accepted: 09/10/2018] [Indexed: 11/29/2022] Open

Morris JH, Apeltsin L, Newman AM, Baumbach J, Wittkop T, Su G, Bader GD, Ferrin TE. clusterMaker: a multi-algorithm clustering plugin for Cytoscape. BMC Bioinformatics 2011;12:436. [PMID: 22070249 PMCID: PMC3262844 DOI: 10.1186/1471-2105-12-436] [Citation(s) in RCA: 411] [Impact Index Per Article: 31.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2011] [Accepted: 11/09/2011] [Indexed: 12/02/2022] Open

Abstract

Background

In the post-genomic era, the rapid increase in high-throughput data calls for computational tools capable of integrating data of diverse types and facilitating recognition of biologically meaningful patterns within them. For example, protein-protein interaction data sets have been clustered to identify stable complexes, but scientists lack easily accessible tools to facilitate combined analyses of multiple data sets from different types of experiments. Here we present clusterMaker, a Cytoscape plugin that implements several clustering algorithms and provides network, dendrogram, and heat map views of the results. The Cytoscape network is linked to all of the other views, so that a selection in one is immediately reflected in the others. clusterMaker is the first Cytoscape plugin to implement such a wide variety of clustering algorithms and visualizations, including the only implementations of hierarchical clustering, dendrogram plus heat map visualization (tree view), k-means, k-medoid, SCPS, AutoSOME, and native (Java) MCL.

Results

Results are presented in the form of three scenarios of use: analysis of protein expression data using a recently published mouse interactome and a mouse microarray data set of nearly one hundred diverse cell/tissue types; the identification of protein complexes in the yeast Saccharomyces cerevisiae; and the cluster analysis of the vicinal oxygen chelate (VOC) enzyme superfamily. For scenario one, we explore functionally enriched mouse interactomes specific to particular cellular phenotypes and apply fuzzy clustering. For scenario two, we explore the prefoldin complex in detail using both physical and genetic interaction clusters. For scenario three, we explore the possible annotation of a protein as a methylmalonyl-CoA epimerase within the VOC superfamily. Cytoscape session files for all three scenarios are provided in the Additional Files section.

Conclusions

The Cytoscape plugin clusterMaker provides a number of clustering algorithms and visualizations that can be used independently or in combination for analysis and visualization of biological data sets, and for confirming or generating hypotheses about biological function. Several of these visualizations and algorithms are only available to Cytoscape users through the clusterMaker plugin. clusterMaker is available via the Cytoscape plugin manager.

Collapse

Qiu JD, Sun XY, Suo SB, Shi SP, Huang SY, Liang RP, Zhang L. Predicting homo-oligomers and hetero-oligomers by pseudo-amino acid composition: An approach from discrete wavelet transformation. Biochimie 2011;93:1132-8. [DOI: 10.1016/j.biochi.2011.03.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2010] [Accepted: 03/28/2011] [Indexed: 12/16/2022]

Shi SP, Qiu JD, Sun XY, Huang JH, Huang SY, Suo SB, Liang RP, Zhang L. Identify submitochondria and subchloroplast locations with pseudo amino acid composition: approach from the strategy of discrete wavelet transform feature extraction. BIOCHIMICA ET BIOPHYSICA ACTA-MOLECULAR CELL RESEARCH 2011;1813:424-30. [PMID: 21255619 DOI: 10.1016/j.bbamcr.2011.01.011] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/30/2010] [Revised: 01/06/2011] [Accepted: 01/10/2011] [Indexed: 12/11/2022]

AFP-Pred: A random forest approach for predicting antifreeze proteins from sequence-derived properties. J Theor Biol 2010;270:56-62. [PMID: 21056045 DOI: 10.1016/j.jtbi.2010.10.037] [Citation(s) in RCA: 191] [Impact Index Per Article: 13.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2010] [Revised: 10/29/2010] [Accepted: 10/29/2010] [Indexed: 12/11/2022]

Qiu JD, Sun XY, Huang JH, Liang RP. Prediction of the Types of Membrane Proteins Based on Discrete Wavelet Transform and Support Vector Machines. Protein J 2010;29:114-9. [PMID: 20165909 DOI: 10.1007/s10930-010-9230-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Qiu JD, Luo SH, Huang JH, Sun XY, Liang RP. Predicting subcellular location of apoptosis proteins based on wavelet transform and support vector machine. Amino Acids 2009;38:1201-8. [DOI: 10.1007/s00726-009-0331-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2009] [Accepted: 07/20/2009] [Indexed: 11/28/2022]