Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Rigoutsos I, Floratos A, Parida L, Gao Y, Platt D. The emergence of pattern discovery techniques in computational biology. Metab Eng 2000;2:159-77. [PMID: 11056059 DOI: 10.1006/mben.2000.0151] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.2] [Indexed: 11/22/2022]

Cited by Other Article(s)

Baichoo S, Ouzounis CA. Computational complexity of algorithms for sequence comparison, short-read assembly and genome alignment. Biosystems 2017;156-157:72-85. [PMID: 28392341 DOI: 10.1016/j.biosystems.2017.03.003] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 02/07/2017] [Revised: 03/21/2017] [Accepted: 03/22/2017] [Indexed: 12/12/2022]

Extended GT-STAF information indices based on Markov approximation models. Chem Phys Lett 2013. [DOI: 10.1016/j.cplett.2013.03.057] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]

Avoiding cross-bifix-free binary words. Acta Inform 2013. [DOI: 10.1007/s00236-013-0176-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/27/2022]

Mukherjee S, Mitra S. Hidden Markov models, grammars, and biology: a tutorial. J Bioinform Comput Biol 2011;3:491-526. [PMID: 15852517 DOI: 10.1142/s0219720005001077] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 04/23/2004] [Revised: 01/05/2004] [Accepted: 01/06/2005] [Indexed: 11/18/2022]

Pérez AJ, Rodríguez A, Trelles O, Thode G. A computational strategy for protein function assignment which addresses the multidomain problem. Comp Funct Genomics 2010;3:423-40. [PMID: 18629055 PMCID: PMC2447339 DOI: 10.1002/cfg.208] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 03/15/2002] [Accepted: 08/12/2002] [Indexed: 11/25/2022]

Ferreira PG, Azevedo PJ. Evaluating deterministic motif significance measures in protein databases. Algorithms Mol Biol 2007;2:16. [PMID: 18157916 PMCID: PMC2254621 DOI: 10.1186/1748-7188-2-16] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 05/15/2007] [Accepted: 12/24/2007] [Indexed: 11/13/2022]

Elloumi F, Nason M. SEARCHPATTOOL: a new method for mining the most specific frequent patterns for binding sites with application to prokaryotic DNA sequences. BMC Bioinformatics 2007;8:354. [PMID: 17883842 PMCID: PMC2082047 DOI: 10.1186/1471-2105-8-354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2007] [Accepted: 09/20/2007] [Indexed: 11/18/2022]

Davey NE, Shields DC, Edwards RJ. SLiMDisc: short, linear motif discovery, correcting for common evolutionary descent. Nucleic Acids Res 2006;34:3546-54. [PMID: 16855291 PMCID: PMC1524906 DOI: 10.1093/nar/gkl486] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.7] [Indexed: 01/31/2023]

Bridging Lossy and Lossless Compression by Motif Pattern Discovery. Lecture Notes in Computer Science 2006. [DOI: 10.1007/11889342_51] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 01/15/2023]

Tao T, Zhai CX, Lu X, Fang H. A study of statistical methods for function prediction of protein motifs. Appl Bioinformatics 2005;3:115-24. [PMID: 15693737 DOI: 10.2165/00822942-200403020-00006] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/02/2022]

Darzentas N, Rigoutsos I, Ouzounis CA. Sensitive detection of sequence similarity using combinatorial pattern discovery: A challenging study of two distantly related protein families. Proteins 2005;61:926-37. [PMID: 16224785 DOI: 10.1002/prot.20608] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]

Stenberg P, Pettersson F, Saura AO, Berglund A, Larsson J. Sequence signature analysis of chromosome identity in three Drosophila species. BMC Bioinformatics 2005;6:158. [PMID: 15975141 PMCID: PMC1181806 DOI: 10.1186/1471-2105-6-158] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 12/13/2004] [Accepted: 06/23/2005] [Indexed: 11/30/2022]

Blekas K, Fotiadis DI, Likas A. Motif-based protein sequence classification using neural networks. J Comput Biol 2005;12:64-82. [PMID: 15725734 DOI: 10.1089/cmb.2005.12.64] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/12/2022]

Reliable detection of episodes in event sequences. Knowl Inf Syst 2005. [DOI: 10.1007/s10115-004-0174-5] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 10/26/2022]

Helman P, Veroff R, Atlas SR, Willman C. A Bayesian network classification methodology for gene expression data. J Comput Biol 2005;11:581-615. [PMID: 15579233 DOI: 10.1089/cmb.2004.11.581] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]

Abstract

We present new techniques for the application of a Bayesian network learning framework to the problem of classifying gene expression data. The focus on classification permits us to develop techniques that address in several ways the complexities of learning Bayesian nets. Our classification model reduces the Bayesian network learning problem to the problem of learning multiple subnetworks, each consisting of a class label node and its set of parent genes. We argue that this classification model is more appropriate for the gene expression domain than are other structurally similar Bayesian network classification models, such as Naive Bayes and Tree Augmented Naive Bayes (TAN), because our model is consistent with prior domain experience suggesting that a relatively small number of genes, taken in different combinations, is required to predict most clinical classes of interest. Within this framework, we consider two different approaches to identifying parent sets which are supported by the gene expression observations and any other currently available evidence. One approach employs a simple greedy algorithm to search the universe of all genes; the second approach develops and applies a gene selection algorithm whose results are incorporated as a prior to enable an exhaustive search for parent sets over a restricted universe of genes. Two other significant contributions are the construction of classifiers from multiple, competing Bayesian network hypotheses and algorithmic methods for normalizing and binning gene expression data in the absence of prior expert knowledge. Our classifiers are developed under a cross validation regimen and then validated on corresponding out-of-sample test sets. The classifiers attain a classification rate in excess of 90% on out-of-sample test sets for two publicly available datasets. We present an extensive compilation of results reported in the literature for other classification methods run against these same two datasets. Our results are comparable to, or better than, any we have found reported for these two sets, when a train-test protocol as stringent as ours is followed.

Huynh T, Rigoutsos I. The web server of IBM's Bioinformatics and Pattern Discovery group: 2004 update. Nucleic Acids Res 2004;32:W10-5. [PMID: 15215340 PMCID: PMC441505 DOI: 10.1093/nar/gkh367] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]

Bahr U, Darai G. Re-evaluation and in silico annotation of the Tupaia herpesvirus proteins. Virus Genes 2004;28:99-120. [PMID: 14739655 DOI: 10.1023/b:viru.0000012267.97659.e0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]

Abstract

Herpesviruses represent an exceptionally suitable model to analyze evolutionary old pathogens, their competency to adapt to existing and changing molecular niches in host species, and the modulation of the gene content and function to comply with the requirements of life. The basis for numerous studies dealing with these questions are reliable statements about the gene content of herpesviral genomes and the functions of viral proteins. The recent determination of the coding strategy of the chimpanzee cytomegalovirus genome and the re-evaluation of the gene content of the human cytomegalovirus genome made it also necessary to restructure the putative transcription map of the Tupaia herpesvirus (THV) genome. Twenty-three THV-specific ORFs formerly predicted to be coding for viral proteins were deleted from the THV transcription map resulting in a gene layout that is now characterized by the presence of conserved genes in the genome center, that probably reflect the genome structure of common herpesviral ancestors, and species-specific genes at the termini. The conserved regions in the THV genome are characterized by high G + C contents between 60% and 80%, a high CpG dinucleotide frequency, and the presence of densely packed putative CpG islands. The genome termini seem to provide the requirements of large scale rearrangements and complements of the gene content to adapt to new environmental demands. With the help of the recently designed method of dictionary-driven, pattern-based protein annotation it was possible to assign putative functions to almost all potential THV proteins, e.g. 123 were found to be putative membrane or secreted proteins, putative signal domains were identified in 69, and 29 proteins were predicted to be glycosylated. The present study adds new aspects to the knowledge about the precise gene composition of herpesvirus genomes and viral protein functions that are of exceptional importance for studies dealing with the phylogeny, the evolution, vaccine vector development, virus-host interactions, pathogenesis and the determination of protein functions of herpesviruses.

A Tennis Video Indexing Approach Through Pattern Discovery in Interactive Process. Advances in Multimedia Information Processing - PCM 2004 2004. [DOI: 10.1007/978-3-540-30541-5_7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 01/07/2023]

Apostolico A, Parida L. Incremental Paradigms of Motif Discovery. J Comput Biol 2004;11:15-25. [PMID: 15072686 DOI: 10.1089/106652704773416867] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022]

Huynh T, Rigoutsos I, Parida L, Platt D, Shibuya T. The web server of IBM's Bioinformatics and Pattern Discovery group. Nucleic Acids Res 2003;31:3645-50. [PMID: 12824385 PMCID: PMC169027 DOI: 10.1093/nar/gkg621] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 02/13/2003] [Revised: 04/08/2003] [Accepted: 04/08/2003] [Indexed: 11/12/2022]

Rigoutsos I, Novotny J, Huynh T, Chin-Bow ST, Parida L, Platt D, Coleman D, Shenk T. In silico pattern-based analysis of the human cytomegalovirus genome. J Virol 2003;77:4326-44. [PMID: 12634390 PMCID: PMC150618 DOI: 10.1128/jvi.77.7.4326-4344.2003] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.5] [Received: 07/10/2002] [Accepted: 12/23/2002] [Indexed: 11/20/2022]

Bicciato S, Pandin M, Didonè G, Di Bello C. Pattern identification and classification in gene expression data using an autoassociative neural network model. Biotechnol Bioeng 2003;81:594-606. [PMID: 12514809 DOI: 10.1002/bit.10505] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]

Abstract

The application of DNA microarray technology for analysis of gene expression creates enormous opportunities to accelerate the pace in understanding living systems and identification of target genes and pathways for drug development and therapeutic intervention. Parallel monitoring of the expression profiles of thousands of genes seems particularly promising for a deeper understanding of cancer biology and the identification of molecular signatures supporting the histological classification schemes of neoplastic specimens. However, the increasing volume of data generated by microarray experiments poses the challenge of developing equally efficient methods and analysis procedures to extract, interpret, and upgrade the information content of these databases. Herein, a computational procedure for pattern identification, feature extraction, and classification of gene expression data through the analysis of an autoassociative neural network model is described. The identified patterns and features contain critical information about gene-phenotype relationships observed during changes in cell physiology. They represent a rational and dimensionally reduced base for understanding the basic biology of the onset of diseases, defining targets of therapeutic intervention, and developing diagnostic tools for the identification and classification of pathological states. The proposed method has been tested on two different microarray datasets: Golub's analysis of acute human leukemia [Golub et al. (1999) Science 286:531-537], and the human colon adenocarcinoma study presented by Alon et al. [1999; Proc Natl Acad Sci USA 97:10101-10106]. The analysis of the neural network internal structure allows the identification of specific phenotype markers and the extraction of peculiar associations among genes and physiological states. At the same time, the neural network outputs provide assignment to multiple classes, such as different pathological conditions or tissue samples, for previously unseen instances.

Rigoutsos I, Huynh T, Floratos A, Parida L, Platt D. Dictionary-driven protein annotation. Nucleic Acids Res 2002;30:3901-16. [PMID: 12202776 PMCID: PMC137405 DOI: 10.1093/nar/gkf464] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.9] [Received: 01/17/2002] [Revised: 06/04/2002] [Accepted: 06/04/2002] [Indexed: 11/14/2022]

Abstract

Computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from the sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences that are being deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach that is centered around and exploits the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes. Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were released publicly after we built the Bio-Dictionary that is used in our experiments. Finally, we have computed the annotations of more than 70 complete genomes and made them available on the World Wide Web at http://cbcsrv.watson.ibm.com/Annotations/.

Shibuya T, Rigoutsos I. Dictionary-driven prokaryotic gene finding. Nucleic Acids Res 2002;30:2710-25. [PMID: 12060689 PMCID: PMC117281 DOI: 10.1093/nar/gkf338] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]

Burgard AP, Moore GL, Maranas CD. Review of the TEIRESIAS-based tools of the IBM Bioinformatics and Pattern Discovery Group. Metab Eng 2001;3:285-8. [PMID: 11676564 DOI: 10.1006/mben.2001.0195] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]