Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Vries JK, Munshi R, Tobi D, Klein-Seetharaman J, Benos PV, Bahar I. A sequence alignment-independent method for protein classification. ACTA ACUST UNITED AC 2005;3:137-48. [PMID: 15693739 DOI: 10.2165/00822942-200403020-00008] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]

For:	Vries JK, Munshi R, Tobi D, Klein-Seetharaman J, Benos PV, Bahar I. A sequence alignment-independent method for protein classification. ACTA ACUST UNITED AC 2005;3:137-48. [PMID: 15693739 DOI: 10.2165/00822942-200403020-00008] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]

Number

Cited by Other Article(s)

Vaz JM, Balaji S. Convolutional neural networks (CNNs): concepts and applications in pharmacogenomics. Mol Divers 2021;25:1569-1584. [PMID: 34031788 PMCID: PMC8342355 DOI: 10.1007/s11030-021-10225-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2021] [Accepted: 04/21/2021] [Indexed: 12/17/2022]

Li M, Ling C, Xu Q, Gao J. Classification of G-protein coupled receptors based on a rich generation of convolutional neural network, N-gram transformation and multiple sequence alignments. Amino Acids 2017;50:255-266. [PMID: 29151135 DOI: 10.1007/s00726-017-2512-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2017] [Accepted: 11/14/2017] [Indexed: 10/18/2022]

Abstract

Sequence classification is crucial in predicting the function of newly discovered sequences. In recent years, the prediction of the incremental large-scale and diversity of sequences has heavily relied on the involvement of machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting valuable features. In this work, we propose a feature-enhanced protein classification approach, considering the rich generation of multiple sequence alignment algorithms, N-gram probabilistic language model and the deep learning technique. The essence behind the proposed method is that if each group of sequences can be represented by one feature sequence, composed of homologous sites, there should be less loss when the sequence is rebuilt, when a more relevant sequence is added to the group. On the basis of this consideration, the prediction becomes whether a query sequence belonging to a group of sequences can be transferred to calculate the probability that the new feature sequence evolves from the original one. The proposed work focuses on the hierarchical classification of G-protein Coupled Receptors (GPCRs), which begins by extracting the feature sequences from the multiple sequence alignment results of the GPCRs sub-subfamilies. The N-gram model is then applied to construct the input vectors. Finally, these vectors are imported into a convolutional neural network to make a prediction. The experimental results elucidate that the proposed method provides significant performance improvements. The classification error rate of the proposed method is reduced by at least 4.67% (family level I) and 5.75% (family Level II), in comparison with the current state-of-the-art methods. The implementation program of the proposed work is freely available at: https://github.com/alanFchina/CNN .

Collapse

Ganapathiraju MK, Mitchell AD, Thahir M, Motwani K, Ananthasubramanian S. Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences. J Bioinform Comput Biol 2012;10:1250016. [PMID: 22817111 DOI: 10.1142/s0219720012500163] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Santos AR, Santos MA, Baumbach J, McCulloch JA, Oliveira GC, Silva A, Miyoshi A, Azevedo V. A singular value decomposition approach for improved taxonomic classification of biological sequences. BMC Genomics 2011;12 Suppl 4:S11. [PMID: 22369633 PMCID: PMC3287580 DOI: 10.1186/1471-2164-12-s4-s11] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open

Hawkins T, Kihara D. FUNCTION PREDICTION OF UNCHARACTERIZED PROTEINS. J Bioinform Comput Biol 2011;5:1-30. [PMID: 17477489 DOI: 10.1142/s0219720007002503] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2006] [Revised: 09/23/2006] [Accepted: 10/10/2006] [Indexed: 11/18/2022]

Hong H, Hong Q, Perkins R, Shi L, Fang H, Su Z, Dragan Y, Fuscoe JC, Tong W. The accurate prediction of protein family from amino acid sequence by measuring features of sequence fragments. J Comput Biol 2010;16:1671-88. [PMID: 20047490 DOI: 10.1089/cmb.2008.0115] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Mansoori E, Zolghadri M, Katebi S. Protein Superfamily Classification Using Fuzzy Rule-Based Classifier. IEEE Trans Nanobioscience 2009;8:92-9. [DOI: 10.1109/tnb.2009.2016484] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Vries JK, Liu X. Subfamily specific conservation profiles for proteins based on n-gram patterns. BMC Bioinformatics 2008;9:72. [PMID: 18234090 PMCID: PMC2267698 DOI: 10.1186/1471-2105-9-72] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2007] [Accepted: 01/30/2008] [Indexed: 11/10/2022] Open

Otaki JM, Gotoh T, Yamamoto H. Potential implications of availability of short amino acid sequences in proteins: an old and new approach to protein decoding and design. BIOTECHNOLOGY ANNUAL REVIEW 2008;14:109-41. [PMID: 18606361 DOI: 10.1016/s1387-2656(08)00004-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

Abstract

Three-dimensional structure of a protein molecule is primarily determined by its amino acid sequence, and thus the elucidation of general rules embedded in amino acid sequences is of great importance in protein science and engineering. To extract valuable information from sequences, we propose an analytical method in which a protein sequence is considered to be constructed by serial superimpositions of short amino acid sequences of n amino acid sets, especially triplets (3-aa sets). Using the comprehensive nonredundant protein database, we first examined "availability" of all possible combinatorial sets of 8,000 triplet species. Availability score was mathematically defined as an indicator for the relative "preference" or "avoidance" for a given short constituent sequence to be used in protein chain. Availability scores of real proteins were clearly biased against those of randomly generated proteins. We found many triplet species that occurred in the database more than expected or less than expected. Such bias was extended to longer sets, and we found that some species of pentats (5-aa sets) that occurred reasonably frequently in the randomly generated protein population did not occur at all in any real proteins known today. Availability score was dependent on species, potentially serving as a phylogenetic indicator. Furthermore, we suggest possibilities of various biotechnological applications of characteristic short sequences such as human-specific and pathogen-specific short sequences obtained from availability analysis. Availability score was also dependent on secondary structures, potentially serving as a structural indicator. Availability analysis on triplets may be combined with a comprehensive data collection on the varphi and psi peptide-bond angles of the amino acid at the center of each triplet, i.e., a collection of Ramachandran plots for each triplet. These triplet characters, together with other physicochemical data, will provide us with basic information between protein sequence and structure, by which structure prediction and engineering may be greatly facilitated. Availability analysis may also be useful in identifying word processing units in amino acid sequences based on an analogy to natural languages. Together with other approaches, availability analysis will elucidate general rules hidden in the primary sequences and eventually contributes to rebuilding the paradigm of protein science.

Collapse

Vries JK, Liu X, Bahar I. The relationship between n-gram patterns and protein secondary structure. Proteins 2007;68:830-8. [PMID: 17523186 DOI: 10.1002/prot.21480] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Tobi D, Bahar I. Recruitment of rare 3-grams at functional sites: is this a mechanism for increasing enzyme specificity? BMC Bioinformatics 2007;8:226. [PMID: 17598909 PMCID: PMC1950313 DOI: 10.1186/1471-2105-8-226] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2007] [Accepted: 06/28/2007] [Indexed: 11/10/2022] Open

Levy ED, Ouzounis CA, Gilks WR, Audit B. Probabilistic annotation of protein sequences based on functional classifications. BMC Bioinformatics 2005;6:302. [PMID: 16354297 PMCID: PMC1361783 DOI: 10.1186/1471-2105-6-302] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2005] [Accepted: 12/14/2005] [Indexed: 11/17/2022] Open

Ganapathiraju M, Balakrishnan N, Reddy R, Klein-Seetharaman J. Computational Biology and Language. AMBIENT INTELLIGENCE FOR SCIENTIFIC DISCOVERY 2005. [DOI: 10.1007/978-3-540-32263-4_2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]