Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Apostolico A, Parida L. Incremental Paradigms of Motif Discovery. J Comput Biol 2004;11:15-25. [PMID: 15072686 DOI: 10.1089/106652704773416867] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022]

Cited by Other Article(s)

Pizzi C, Ornamenti M, Spangaro S, Rombo SE, Parida L. Efficient Algorithms for Sequence Analysis with Entropic Profiles. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018;15:117-128. [PMID: 28113780 DOI: 10.1109/tcbb.2016.2620143] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 06/06/2023]

Comin M, Verzotto D. Beyond Fixed-Resolution Alignment-Free Measures for Mammalian Enhancers Sequence Comparison. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014;11:628-637. [PMID: 26356333 DOI: 10.1109/tcbb.2014.2306830] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]

Entropic Profiles, Maximal Motifs and the Discovery of Significant Repetitions in Genomic Sequences. 2014. [DOI: 10.1007/978-3-662-44753-6_12] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]

Panni S, Rombo SE. Searching for repetitions in biological networks: methods, resources and tools. Brief Bioinform 2013;16:118-36. [PMID: 24300112 DOI: 10.1093/bib/bbt084] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 01/11/2023]

Image Classification Based on 2D Feature Motifs. 2013. [DOI: 10.1007/978-3-642-40769-7_30] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1]

Comin M, Verzotto D. Alignment-free phylogeny of whole genomes using underlying subwords. Algorithms Mol Biol 2012;7:34. [PMID: 23216990 PMCID: PMC3549825 DOI: 10.1186/1748-7188-7-34] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Received: 09/19/2012] [Accepted: 11/29/2012] [Indexed: 11/24/2022]

Abstract

Background

With the progress of modern sequencing technologies, a large number of complete genomes are now available. Traditionally, the comparison of two related genomes is carried out by sequence alignment. There are cases where alignment cannot be applied, for example when two genomes do not share the same set of genes, or when they cannot be aligned to each other because of low sequence similarity, rearrangements and inversions, or, more specifically, because of their lengths when the organisms belong to different species. In these cases, complete genomes can be compared only with ad hoc methods, usually called alignment-free methods.

Methods

In this paper we propose a distance function based on subword compositions, called the Underlying Approach (UA). We prove that the matching statistics, a popular concept in the field of string algorithms that captures the statistics of common words between two sequences, can be derived from a small set of “independent” subwords, namely the irredundant common subwords. We define a distance-like measure based on these subwords such that each region of the genomes contributes only once, which avoids counting shared subwords multiple times. In a nutshell, this filter discards subwords occurring in regions covered by other, more significant subwords.

Results

The Underlying Approach (UA) builds a scoring function based on this set of patterns, called the underlying set. We prove that this set is, by construction, linear in the size of the input, free of overlaps, and efficient to construct. Results show the validity of our method in the reconstruction of phylogenetic trees, where the Underlying Approach outperforms current state-of-the-art methods. Moreover, we show that the accuracy of UA is achieved with a very small number of subwords, which in some cases carry meaningful biological information.

Availability

http://www.dei.unipd.it/~ciompin/main/underlying.html
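
As an illustration of the matching statistics mentioned in the Methods paragraph above, the following is a minimal brute-force sketch. It assumes the common definition in which, for each position i of the first sequence, the statistic is the length of the longest substring starting at i that also occurs somewhere in the second sequence. The function name `matching_statistics` is hypothetical, and this quadratic-time loop is not the authors' method, which derives the statistics from irredundant common subwords.

```python
def matching_statistics(s1: str, s2: str) -> list[int]:
    """Return ms where ms[i] is the length of the longest prefix of s1[i:]
    that occurs as a substring of s2 (brute force, for illustration only)."""
    ms = []
    for i in range(len(s1)):
        length = 0
        # Extend the match one character at a time while it still occurs in s2.
        while i + length < len(s1) and s1[i:i + length + 1] in s2:
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    # "A" occurs in "CGA" but "AC" does not; "CG" occurs but "CGT" does not.
    print(matching_statistics("ACGT", "CGA"))  # [1, 2, 1, 0]
```

A long run of high values indicates a region of the first sequence that is shared with the second, which is the kind of signal a subword-based distance can aggregate.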

Cunial F, Apostolico A. Phylogeny Construction with Rigid Gapped Motifs. J Comput Biol 2012;19:911-27. [DOI: 10.1089/cmb.2012.0060] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]

Zhao X, Sze SH. Motif finding in DNA sequences based on skipping nonconserved positions in background Markov chains. J Comput Biol 2011;18:759-70. [PMID: 21554019 DOI: 10.1089/cmb.2010.0197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]

Comin M, Verzotto D. The irredundant class method for remote homology detection of protein sequences. J Comput Biol 2011;18:1819-29. [PMID: 21548811 DOI: 10.1089/cmb.2010.0171] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022]

Grossi R, Pietracaprina A, Pisanti N, Pucci G, Upfal E, Vandin F. MADMX: A Strategy for Maximal Dense Motif Extraction. J Comput Biol 2011;18:535-45. [DOI: 10.1089/cmb.2010.0177] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 01/01/2023]

Apostolico A, Comin M, Parida L. VARUN: discovering extensible motifs under saturation constraints. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2010;7:752-62. [PMID: 21030741 DOI: 10.1109/tcbb.2008.123] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/30/2023]

Comin M, Verzotto D. Classification of protein sequences by means of irredundant patterns. BMC Bioinformatics 2010;11 Suppl 1:S16. [PMID: 20122187 PMCID: PMC3009487 DOI: 10.1186/1471-2105-11-s1-s16] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 11/24/2022]

Rombo SE. Optimal extraction of motif patterns in 2D. Inform Process Lett 2009. [DOI: 10.1016/j.ipl.2009.06.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]

Grossi R, Pietracaprina A, Pisanti N, Pucci G, Upfal E, Vandin F. MADMX: A Novel Strategy for Maximal Dense Motif Extraction. Lecture Notes in Computer Science 2009. [DOI: 10.1007/978-3-642-04241-6_30] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/04/2023]

Zhang S, Su W, Yang J. ARCS-Motif: discovering correlated motifs from unaligned biological sequences. Bioinformatics 2008;25:183-9. [PMID: 19073591 DOI: 10.1093/bioinformatics/btn609] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]

Parida L. Discovering Topological Motifs Using a Compact Notation. J Comput Biol 2007;14:300-23. [PMID: 17563313 DOI: 10.1089/cmb.2006.0142] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]

Optimal Offline Extraction of Irredundant Motif Bases. 2007. [DOI: 10.1007/978-3-540-73545-8_36] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2]

Zhang Y, Zaki MJ. EXMOTIF: efficient structured motif extraction. Algorithms Mol Biol 2006;1:21. [PMID: 17109757 PMCID: PMC1698483 DOI: 10.1186/1748-7188-1-21] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Received: 07/23/2006] [Accepted: 11/16/2006] [Indexed: 11/22/2022]

Bridging Lossy and Lossless Compression by Motif Pattern Discovery. Lecture Notes in Computer Science 2006. [DOI: 10.1007/11889342_51] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 01/15/2023]

Pisanti N, Crochemore M, Grossi R, Sagot MF. Bases of motifs for generating repeated patterns with wild cards. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2005;2:40-50. [PMID: 17044163 DOI: 10.1109/tcbb.2005.5] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 05/12/2023]