Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zhang Y, Zaki MJ. EXMOTIF: efficient structured motif extraction. Algorithms Mol Biol 2006;1:21. [PMID: 17109757 PMCID: PMC1698483 DOI: 10.1186/1748-7188-1-21] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2006] [Accepted: 11/16/2006] [Indexed: 11/22/2022] Open

For:	Zhang Y, Zaki MJ. EXMOTIF: efficient structured motif extraction. Algorithms Mol Biol 2006;1:21. [PMID: 17109757 PMCID: PMC1698483 DOI: 10.1186/1748-7188-1-21] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2006] [Accepted: 11/16/2006] [Indexed: 11/22/2022] Open

Number

Cited by Other Article(s)

Mohanty S, Pattnaik PK, Al-Absi AA, Kang DK. A Review on Planted (l, d) Motif Discovery Algorithms for Medical Diagnose. SENSORS (BASEL, SWITZERLAND) 2022;22:1204. [PMID: 35161949 PMCID: PMC8838483 DOI: 10.3390/s22031204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/01/2021] [Revised: 01/19/2022] [Accepted: 01/31/2022] [Indexed: 11/16/2022]

Yoshida T, Takeuchi I, Karasuyama M. Distance metric learning for graph structured data. Mach Learn 2021. [DOI: 10.1007/s10994-021-06009-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Saidi R, Dhifli W, Maddouri M, Mephu Nguifo E. Efficiently Mining Recurrent Substructures from Protein Three-Dimensional Structure Graphs. J Comput Biol 2019;26:561-571. [DOI: 10.1089/cmb.2018.0171] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Ayad LAK, Pissis SPP, Retha A. libFLASM: a software library for fixed-length approximate string matching. BMC Bioinformatics 2016;17:454. [PMID: 27832739 PMCID: PMC5103500 DOI: 10.1186/s12859-016-1320-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2016] [Accepted: 11/03/2016] [Indexed: 01/06/2023] Open

Abstract

Background

Approximate string matching is the problem of finding all factors of a given text that are at a distance at most k from a given pattern. Fixed-length approximate string matching is the problem of finding all factors of a text of length n that are at a distance at most k from any factor of length ℓ of a pattern of length m. There exist bit-vector techniques to solve the fixed-length approximate string matching problem in time \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal {O}(m\lceil \ell /w \rceil n)$\end{document}O(m⌈ℓ/w⌉n) and space \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$\mathcal {O}(m\lceil \ell /w\rceil)$\end{document}O(m⌈ℓ/w⌉) under the edit and Hamming distance models, where w is the size of the computer word; as such these techniques are independent of the distance threshold k or the alphabet size. Fixed-length approximate string matching is a generalisation of approximate string matching and, hence, has numerous direct applications in computational molecular biology and elsewhere.

Results

We present and make available libFLASM, a free open-source C++ software library for solving fixed-length approximate string matching under both the edit and the Hamming distance models. Moreover we describe how fixed-length approximate string matching is applied to solve real problems by incorporating libFLASM into established applications for multiple circular sequence alignment as well as single and structured motif extraction. Specifically, we describe how it can be used to improve the accuracy of multiple circular sequence alignment in terms of the inferred likelihood-based phylogenies; and we also describe how it is used to efficiently find motifs in molecular sequences representing regulatory or functional regions. The comparison of the performance of the library to other algorithms show how it is competitive, especially with increasing distance thresholds.

Conclusions

Fixed-length approximate string matching is a generalisation of the classic approximate string matching problem. We present libFLASM, a free open-source C++ software library for solving fixed-length approximate string matching. The extensive experimental results presented here suggest that other applications could benefit from using libFLASM, and thus further maintenance and development of libFLASM is desirable.

Collapse

Pari A, Baraani A, Parseh S. NSAMD: A new approach to discover structured contiguous substrings in sequence datasets using Next-Symbol-Array. Comput Biol Chem 2016;64:384-395. [PMID: 27620380 DOI: 10.1016/j.compbiolchem.2016.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2015] [Revised: 08/25/2016] [Accepted: 09/03/2016] [Indexed: 10/21/2022]

Abstract

In many sequence data mining applications, the goal is to find frequent substrings. Some of these applications like extracting motifs in protein and DNA sequences are looking for frequently occurring approximate contiguous substrings called simple motifs. By approximate we mean that some mismatches are allowed during similarity test between substrings, and it helps to discover unknown patterns. Structured motifs in DNA sequences are frequent structured contiguous substrings which contains two or more simple motifs. There are some works that have been done to find simple motifs but these works have problems such as low scalability, high execution time, no guarantee to find all patterns, and low flexibility in adaptation to other application. The Flame is the only algorithm that can find all unknown structured patterns in a dataset and has solved most of these problems but its scalability for very large sequences is still weak. In this research a new approach named Next-Symbol-Array based Motif Discovery (NSAMD) is represented to improve scalability in extracting all unknown simple and structured patterns. To reach this goal a new data structure has been presented called Next-Symbol-Array. This data structure makes change in how to find patterns by NSAMD in comparison with Flame and helps to find structured motif faster. Proposed algorithm is as accurate as Flame and extracts all existing patterns in dataset. Performance comparisons show that NSAMD outperforms Flame in extracting structured motifs in both execution time (51% faster) and memory usage (more than 99%). Proposed algorithm is slower in extracting simple motifs but considerable improvement in memory usage (more than 99%) makes NSAMD more scalable than Flame. This advantage of NSAMD is very important in biological applications in which very large sequences are applied.

Collapse

Leoncini M, Montangero M, Pellegrini M, Tillan KP. CMStalker: A Combinatorial Tool for Composite Motif Discovery. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015;12:1123-1136. [PMID: 26451824 DOI: 10.1109/tcbb.2014.2359444] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]

Federico M, Leoncini M, Montangero M, Valente P. Direct vs 2-stage approaches to structured motif finding. Algorithms Mol Biol 2012;7:20. [PMID: 22908910 PMCID: PMC3564690 DOI: 10.1186/1748-7188-7-20] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2011] [Accepted: 07/25/2012] [Indexed: 12/03/2022] Open

Abstract

Background

The notion of DNA motif is a mathematical abstraction used to model regions of the DNA (known as Transcription Factor Binding Sites, or TFBSs) that are bound by a given Transcription Factor to regulate gene expression or repression. In turn, DNA structured motifs are a mathematical counterpart that models sets of TFBSs that work in concert in the gene regulations processes of higher eukaryotic organisms. Typically, a structured motif is composed of an ordered set of isolated (or simple) motifs, separated by a variable, but somewhat constrained number of “irrelevant” base-pairs. Discovering structured motifs in a set of DNA sequences is a computationally hard problem that has been addressed by a number of authors using either a direct approach, or via the preliminary identification and successive combination of simple motifs.

Results

We describe a computational tool, named SISMA, for the de-novo discovery of structured motifs in a set of DNA sequences. SISMA is an exact, enumerative algorithm, meaning that it finds all the motifs conforming to the specifications. It does so in two stages: first it discovers all the possible component simple motifs, then combines them in a way that respects the given constraints. We developed SISMA mainly with the aim of understanding the potential benefits of such a 2-stage approach w.r.t. direct methods. In fact, no 2-stage software was available for the general problem of structured motif discovery, but only a few tools that solved restricted versions of the problem. We evaluated SISMA against other published tools on a comprehensive benchmark made of both synthetic and real biological datasets. In a significant number of cases, SISMA outperformed the competitors, exhibiting a good performance also in most of the cases in which it was inferior.

Conclusions

A reflection on the results obtained lead us to conclude that a 2-stage approach can be implemented with many advantages over direct approaches. Some of these have to do with greater modularity, ease of parallelization, and the possibility to perform adaptive searches of structured motifs. As another consideration, we noted that most hard instances for SISMA were easy to detect in advance. In these cases one may initially opt for a direct method; or, as a viable alternative in most laboratories, one could run both direct and 2-stage tools in parallel, halting the computations when the first halts.

Collapse

Temporal Pattern Mining for Medical Applications. ACTA ACUST UNITED AC 2012. [DOI: 10.1007/978-3-642-23151-3_2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/19/2023]

Saidi R, Maddouri M, Mephu Nguifo E. Protein sequences classification by means of feature extraction with substitution matrices. BMC Bioinformatics 2010;11:175. [PMID: 20377887 PMCID: PMC2868007 DOI: 10.1186/1471-2105-11-175] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/04/2009] [Accepted: 04/08/2010] [Indexed: 11/10/2022] Open

Dorr DH, Denton AM. Establishing relationships among patterns in stock market data. DATA KNOWL ENG 2009. [DOI: 10.1016/j.datak.2008.10.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]

Zaki MJ, Karypis G, Yang J. Data Mining in Bioinformatics (BIOKDD). Algorithms Mol Biol 2007;2:4. [PMID: 17428327 PMCID: PMC1852315 DOI: 10.1186/1748-7188-2-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2006] [Accepted: 04/11/2007] [Indexed: 11/10/2022] Open

Zhang Y, Zaki MJ. SMOTIF: efficient structured pattern and profile motif search. Algorithms Mol Biol 2006;1:22. [PMID: 17118189 PMCID: PMC1679804 DOI: 10.1186/1748-7188-1-22] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2006] [Accepted: 11/21/2006] [Indexed: 11/29/2022] Open