Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Zambelli F, Pesole G, Pavesi G. Motif discovery and transcription factor binding sites before and after the next-generation sequencing era. Brief Bioinform 2012;14:225-37. [PMID: 22517426 PMCID: PMC3603212 DOI: 10.1093/bib/bbs016] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open

For:	Zambelli F, Pesole G, Pavesi G. Motif discovery and transcription factor binding sites before and after the next-generation sequencing era. Brief Bioinform 2012;14:225-37. [PMID: 22517426 PMCID: PMC3603212 DOI: 10.1093/bib/bbs016] [Citation(s) in RCA: 93] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open

Number

Cited by Other Article(s)

Badia-I-Mompel P, Wessels L, Müller-Dott S, Trimbour R, Ramirez Flores RO, Argelaguet R, Saez-Rodriguez J. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023;24:739-754. [PMID: 37365273 DOI: 10.1038/s41576-023-00618-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/12/2023] [Indexed: 06/28/2023]

Tognon M, Giugno R, Pinello L. A survey on algorithms to characterize transcription factor binding sites. Brief Bioinform 2023;24:bbad156. [PMID: 37099664 PMCID: PMC10422928 DOI: 10.1093/bib/bbad156] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2023] [Revised: 03/27/2023] [Accepted: 04/01/2023] [Indexed: 04/28/2023] Open

Farhadi F, Allahbakhsh M, Maghsoudi A, Armin N, Amintoosi H. DiMo: discovery of microRNA motifs using deep learning and motif embedding. Brief Bioinform 2023;24:bbad182. [PMID: 37165972 DOI: 10.1093/bib/bbad182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2022] [Revised: 04/17/2023] [Accepted: 04/21/2023] [Indexed: 05/12/2023] Open

Monteiro LDFR, Giraldi LA, Winck FV. From Feasting to Fasting: The Arginine Pathway as a Metabolic Switch in Nitrogen-Deprived Chlamydomonas reinhardtii. Cells 2023;12:1379. [PMID: 37408213 PMCID: PMC10216424 DOI: 10.3390/cells12101379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2023] [Revised: 05/09/2023] [Accepted: 05/10/2023] [Indexed: 07/07/2023] Open

Abstract

The metabolism of the model microalgae Chlamydomonas reinhardtii under nitrogen deprivation is of special interest due to its resulting increment of triacylglycerols (TAGs), that can be applied in biotechnological applications. However, this same condition impairs cell growth, which may limit the microalgae's large applications. Several studies have identified significant physiological and molecular changes that occur during the transition from an abundant to a low or absent nitrogen supply, explaining in detail the differences in the proteome, metabolome and transcriptome of the cells that may be responsible for and responsive to this condition. However, there are still some intriguing questions that reside in the core of the regulation of these cellular responses that make this process even more interesting and complex. In this scenario, we reviewed the main metabolic pathways that are involved in the response, mining and exploring, through a reanalysis of omics data from previously published datasets, the commonalities among the responses and unraveling unexplained or non-explored mechanisms of the possible regulatory aspects of the response. Proteomics, metabolomics and transcriptomics data were reanalysed using a common strategy, and an in silico gene promoter motif analysis was performed. Together, these results identified and suggested a strong association between the metabolism of amino acids, especially arginine, glutamate and ornithine pathways to the production of TAGs, via the de novo synthesis of lipids. Furthermore, our analysis and data mining indicate that signalling cascades orchestrated with the indirect participation of phosphorylation, nitrosylation and peroxidation events may be essential to the process. The amino acid pathways and the amount of arginine and ornithine available in the cells, at least transiently during nitrogen deprivation, may be in the core of the post-transcriptional, metabolic regulation of this complex phenomenon. Their further exploration is important to the discovery of novel advances in the understanding of microalgae lipids' production.

Collapse

Maseko NN, Steenkamp ET, Wingfield BD, Wilken PM. An in Silico Approach to Identifying TF Binding Sites: Analysis of the Regulatory Regions of BUSCO Genes from Fungal Species in the Ceratocystidaceae Family. Genes (Basel) 2023;14:genes14040848. [PMID: 37107606 PMCID: PMC10137650 DOI: 10.3390/genes14040848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2023] [Revised: 03/26/2023] [Accepted: 03/27/2023] [Indexed: 04/03/2023] Open

Deyneko IV. Guidelines on the performance evaluation of motif recognition methods in bioinformatics. Front Genet 2023;14:1135320. [PMID: 36824436 PMCID: PMC9941176 DOI: 10.3389/fgene.2023.1135320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2022] [Accepted: 01/19/2023] [Indexed: 02/09/2023] Open

Scagnoli F, Palma A, Favia A, Scuoppo C, Illi B, Nasi S. A New Insight into MYC Action: Control of RNA Polymerase II Methylation and Transcription Termination. Biomedicines 2023;11:biomedicines11020412. [PMID: 36830948 PMCID: PMC9952900 DOI: 10.3390/biomedicines11020412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2022] [Revised: 01/16/2023] [Accepted: 01/26/2023] [Indexed: 02/01/2023] Open

Yu Q, Zhang X, Hu Y, Chen S, Yang L. A Method for Predicting DNA Motif Length Based On Deep Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023;20:61-73. [PMID: 35275822 DOI: 10.1109/tcbb.2022.3158471] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]

NetREX-CF integrates incomplete transcription factor data with gene expression to reconstruct gene regulatory networks. Commun Biol 2022;5:1282. [PMID: 36418514 PMCID: PMC9684490 DOI: 10.1038/s42003-022-04226-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2021] [Accepted: 11/04/2022] [Indexed: 11/25/2022] Open

Zhang S, Ma A, Zhao J, Xu D, Ma Q, Wang Y. Assessing deep learning methods in cis-regulatory motif finding based on genomic sequencing data. Brief Bioinform 2022;23:bbab374. [PMID: 34607350 PMCID: PMC8769700 DOI: 10.1093/bib/bbab374] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2021] [Revised: 08/22/2021] [Accepted: 08/23/2021] [Indexed: 12/28/2022] Open

Kuiper M, Bonello J, Fernández-Breis JT, Bucher P, Futschik ME, Gaudet P, Kulakovskiy IV, Licata L, Logie C, Lovering RC, Makeev VJ, Orchard S, Panni S, Perfetto L, Sant D, Schulz S, Zerbino DR, Lægreid A. The Gene Regulation Knowledge Commons: The action area of GREEKC. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2021;1865:194768. [PMID: 34757206 DOI: 10.1016/j.bbagrm.2021.194768] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/20/2021] [Revised: 10/18/2021] [Accepted: 10/20/2021] [Indexed: 02/08/2023]

Affiliation(s)

Martin Kuiper Systems Biology Group, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway.
Joseph Bonello Faculty of Information & Communication Technology, University of Malta, Msida, Malta
Jesualdo Tomás Fernández-Breis Departamento de Informática y Sistemas, Universidad de Murcia, IMIB-Arrixaca, CP 30100 Murcia, Spain
Philipp Bucher Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Amphipôle, 1015 Lausanne, Switzerland
Matthias E Futschik Systems Biology and Bioinformatics Laboratory (SysBioLab), Centre of Marine Sciences (CCMAR), University of Algarve, 8005-139 Faro, Portugal
Pascale Gaudet SIB Swiss Institute of Bioinformatics, 1 Rue Michel-Servet, 1204 Geneva, Switzerland
Ivan V Kulakovskiy Institute of Protein Research, Russian Academy of Sciences, Institutskaya 4, 142290 Pushchino, Russia
Luana Licata Department of Biology, University of Rome Tor Vergata, Rome, Italy
Colin Logie Department of Molecular Biology, Faculty of Science, Radboud University, PO Box 9101, Nijmegen 6500HG, the Netherlands
Ruth C Lovering Functional Gene Annotation, Pre-clinical and Fundamental Science, Institute of Cardiovascular Science, University College London, 5 University Street, London WC1E 6JF, UK
Vsevolod J Makeev Vavilov Institute of General Genetics, Russian Academy of Sciences, Gubkina 3, 119991 Moscow, Russia
Sandra Orchard European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Simona Panni Department DIBEST, University of Calabria, Rende, Italy
Livia Perfetto Fondazione Human Technopole, Department of Biology, Via Cristina Belgioioso, 171, 20157 Milan, Italy
David Sant Department of Biomedical Informatics, University of Utah, 421 Wakara Way #140, Salt Lake City, UT 84108, United States
Stefan Schulz Institute of Medical Informatics, Statistics and Documentation, Medical University of Graz, Auenbruggerpl. 2, Graz, Austria
Daniel R Zerbino European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Astrid Lægreid Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, 7491 Trondheim, Norway

Collapse

Castellana S, Biagini T, Parca L, Petrizzelli F, Bianco SD, Vescovi AL, Carella M, Mazza T. A comparative benchmark of classic DNA motif discovery tools on synthetic data. Brief Bioinform 2021;22:6341664. [PMID: 34351399 DOI: 10.1093/bib/bbab303] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2021] [Revised: 07/08/2021] [Accepted: 07/15/2021] [Indexed: 01/01/2023] Open

Li JY, Jin S, Tu XM, Ding Y, Gao G. Identifying complex motifs in massive omics data with a variable-convolutional layer in deep neural network. Brief Bioinform 2021;22:6312656. [PMID: 34219140 DOI: 10.1093/bib/bbab233] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2021] [Revised: 05/25/2021] [Accepted: 05/28/2021] [Indexed: 01/10/2023] Open

Geete K, Pandey M. Robust Transcription Factor Binding Site Prediction Using Deep Neural Networks. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200429121156] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Identification of Cis-Regulatory Sequences Controlling Pollen-Specific Expression of Hydroxyproline-Rich Glycoprotein Genes in Arabidopsis thaliana. PLANTS 2020;9:plants9121751. [PMID: 33322028 PMCID: PMC7763877 DOI: 10.3390/plants9121751] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/31/2020] [Revised: 11/26/2020] [Accepted: 12/07/2020] [Indexed: 02/06/2023]

He Y, Shen Z, Zhang Q, Wang S, Huang DS. A survey on deep learning in DNA/RNA motif mining. Brief Bioinform 2020;22:5916939. [PMID: 33005921 PMCID: PMC8293829 DOI: 10.1093/bib/bbaa229] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2020] [Revised: 08/19/2020] [Accepted: 08/24/2020] [Indexed: 01/18/2023] Open

Sultan I, Fromion V, Schbath S, Nicolas P. Statistical modelling of bacterial promoter sequences for regulatory motif discovery with the help of transcriptome data: application to Listeria monocytogenes. J R Soc Interface 2020;17:20200600. [PMID: 33023397 PMCID: PMC7653377 DOI: 10.1098/rsif.2020.0600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2020] [Accepted: 09/10/2020] [Indexed: 11/12/2022] Open

DNA motif discovery using chemical reaction optimization. EVOLUTIONARY INTELLIGENCE 2020. [DOI: 10.1007/s12065-020-00444-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022]

Zhou J, Lu Q, Gui L, Xu R, Long Y, Wang H. MTTFsite: cross-cell type TF binding site prediction by using multi-task learning. Bioinformatics 2020;35:5067-5077. [PMID: 31161194 PMCID: PMC6954652 DOI: 10.1093/bioinformatics/btz451] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2019] [Revised: 05/19/2019] [Accepted: 05/30/2019] [Indexed: 12/30/2022] Open

Abstract

Motivation

The prediction of transcription factor binding sites (TFBSs) is crucial for gene expression analysis. Supervised learning approaches for TFBS predictions require large amounts of labeled data. However, many TFs of certain cell types either do not have sufficient labeled data or do not have any labeled data.

Results

In this paper, a multi-task learning framework (called MTTFsite) is proposed to address the lack of labeled data problem by leveraging on labeled data available in cross-cell types. The proposed MTTFsite contains a shared CNN to learn common features for all cell types and a private CNN for each cell type to learn private features. The common features are aimed to help predicting TFBSs for all cell types especially those cell types that lack labeled data. MTTFsite is evaluated on 241 cell type TF pairs and compared with a baseline method without using any multi-task learning model and a fully shared multi-task model that uses only a shared CNN and do not use private CNNs. For cell types with insufficient labeled data, results show that MTTFsite performs better than the baseline method and the fully shared model on more than 89% pairs. For cell types without any labeled data, MTTFsite outperforms the baseline method and the fully shared model by more than 80 and 93% pairs, respectively. A novel gene expression prediction method (called TFChrome) using both MTTFsite and histone modification features is also presented. Results show that TFBSs predicted by MTTFsite alone can achieve good performance. When MTTFsite is combined with histone modification features, a significant 5.7% performance improvement is obtained.

Availability and implementation

The resource and executable code are freely available at http://hlt.hitsz.edu.cn/MTTFsite/ and http://www.hitsz-hlt.com:8080/MTTFsite/.

Supplementary information

Supplementary data are available at Bioinformatics online.

Collapse

Höllbacher B, Balázs K, Heinig M, Uhlenhaut NH. Seq-ing answers: Current data integration approaches to uncover mechanisms of transcriptional regulation. Comput Struct Biotechnol J 2020;18:1330-1341. [PMID: 32612756 PMCID: PMC7306512 DOI: 10.1016/j.csbj.2020.05.018] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2020] [Revised: 05/21/2020] [Accepted: 05/23/2020] [Indexed: 02/06/2023] Open

Ronzio M, Zambelli F, Dolfini D, Mantovani R, Pavesi G. Integrating Peak Colocalization and Motif Enrichment Analysis for the Discovery of Genome-Wide Regulatory Modules and Transcription Factor Recruitment Rules. Front Genet 2020;11:72. [PMID: 32153638 PMCID: PMC7046753 DOI: 10.3389/fgene.2020.00072] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2019] [Accepted: 01/22/2020] [Indexed: 12/14/2022] Open

Li T, Zhang X, Luo F, Wu FX, Wang J. MultiMotifMaker: A Multi-Thread Tool for Identifying DNA Methylation Motifs from Pacbio Reads. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020;17:220-225. [PMID: 30059318 DOI: 10.1109/tcbb.2018.2861399] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]

Yu Q, Zhao X, Huo H. A new algorithm for DNA motif discovery using multiple sample sequence sets. J Bioinform Comput Biol 2019;17:1950021. [PMID: 31617465 DOI: 10.1142/s0219720019500215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Lan G, Zhou J, Xu R, Lu Q, Wang H. Cross-Cell-Type Prediction of TF-Binding Site by Integrating Convolutional Neural Network and Adversarial Network. Int J Mol Sci 2019;20:ijms20143425. [PMID: 31336830 PMCID: PMC6679139 DOI: 10.3390/ijms20143425] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2019] [Revised: 06/27/2019] [Accepted: 07/08/2019] [Indexed: 01/18/2023] Open

Hu J, Wang J, Lin J, Liu T, Zhong Y, Liu J, Zheng Y, Gao Y, He J, Shang X. MD-SVM: a novel SVM-based algorithm for the motif discovery of transcription factor binding sites. BMC Bioinformatics 2019;20:200. [PMID: 31074373 PMCID: PMC6509868 DOI: 10.1186/s12859-019-2735-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open

Transcription-dependent spreading of the Dal80 yeast GATA factor across the body of highly expressed genes. PLoS Genet 2019;15:e1007999. [PMID: 30818362 PMCID: PMC6413948 DOI: 10.1371/journal.pgen.1007999] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2018] [Revised: 03/12/2019] [Accepted: 01/31/2019] [Indexed: 12/30/2022] Open

Dao P, Hoinka J, Takahashi M, Zhou J, Ho M, Wang Y, Costa F, Rossi JJ, Backofen R, Burnett J, Przytycka TM. AptaTRACE Elucidates RNA Sequence-Structure Motifs from Selection Trends in HT-SELEX Experiments. Cell Syst 2019;3:62-70. [PMID: 27467247 DOI: 10.1016/j.cels.2016.07.003] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2016] [Revised: 06/24/2016] [Accepted: 07/01/2016] [Indexed: 10/21/2022]

Tran NTL, Huang CH. Performance evaluation for MOTIFSIM. Biol Proced Online 2018;20:23. [PMID: 30574025 PMCID: PMC6299673 DOI: 10.1186/s12575-018-0088-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/04/2018] [Accepted: 12/07/2018] [Indexed: 11/10/2022] Open

Abstract

Background

Previous studies show various results obtained from different motif finders for an identical dataset. This is largely due to the fact that these tools use different strategies and possess unique features for discovering the motifs. Hence, using multiple tools and methods has been suggested because the motifs commonly reported by them are more likely to be biologically significant.

Results

The common significant motifs from multiple tools can be obtained by using MOTIFSIM tool. In this work, we evaluated the performance of MOTIFSIM in three aspects. First, we compared the pair-wise comparison technique of MOTIFSIM with the un-gapped Smith-Waterman algorithm and four common distance metrics: average Kullback-Leibler, average log-likelihood ratio, Chi-Square distance, and Pearson Correlation Coefficient. Second, we compared the performance of MOTIFSIM with RSAT Matrix-clustering tool for motif clustering. Lastly, we evaluated the performances of nineteen motif finders and the reliability of MOTIFSIM for identifying the common significant motifs from multiple tools.

Conclusions

The pair-wise comparison results reveal that MOTIFSIM attains better performance than the un-gapped Smith-Waterman algorithm and four distance metrics. The clustering results also demonstrate that MOTIFSIM achieves similar or even better performance than RSAT Matrix-clustering. Furthermore, the findings indicate if the motif detection does not require a special tool for detecting a specific type of motif then using multiple motif finders and combining with MOTIFSIM for obtaining the common significant motifs, it improved the results for DNA motif detection.

Electronic supplementary material

The online version of this article (10.1186/s12575-018-0088-3) contains supplementary material, which is available to authorized users.

Collapse

Tran NTL, Huang CH. MODSIDE: a motif discovery pipeline and similarity detector. BMC Genomics 2018;19:755. [PMID: 30340511 PMCID: PMC6194616 DOI: 10.1186/s12864-018-5148-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2018] [Accepted: 10/08/2018] [Indexed: 01/06/2023] Open

Lee NK, Li X, Wang D. A comprehensive survey on genetic algorithms for DNA motif prediction. Inf Sci (N Y) 2018. [DOI: 10.1016/j.ins.2018.07.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]

Guca E, Suñol D, Ruiz L, Konkol A, Cordero J, Torner C, Aragon E, Martin-Malpartida P, Riera A, Macias MJ. TGIF1 homeodomain interacts with Smad MH1 domain and represses TGF-β signaling. Nucleic Acids Res 2018;46:9220-9235. [PMID: 30060237 PMCID: PMC6158717 DOI: 10.1093/nar/gky680] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2018] [Accepted: 07/17/2018] [Indexed: 12/16/2022] Open

Affiliation(s)

Ewelina Guca Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, Barcelona 08028, Spain
David Suñol Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, Barcelona 08028, Spain
Lidia Ruiz Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, Barcelona 08028, Spain
Agnieszka Konkol Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, Barcelona 08028, Spain
Jorge Cordero Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, Barcelona 08028, Spain
Carles Torner Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, Barcelona 08028, Spain
Eric Aragon Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, Barcelona 08028, Spain
Pau Martin-Malpartida Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, Barcelona 08028, Spain
Antoni Riera Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, Barcelona 08028, Spain Departament de Química Inorgànica i Orgànica, Secció de Química Orgànica, Universitat de Barcelona, Martí i Franquès 1-11, 08028, Barcelona, Spain
Maria J Macias Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, Barcelona 08028, Spain ICREA, Passeig Lluís Companys 23, 08010-Barcelona, Spain To whom correspondence should be addressed. Tel: +34 934037189;

Collapse

Al-Ouran R, Schmidt R, Naik A, Jones J, Drews F, Juedes D, Elnitski L, Welch L. Discovering Gene Regulatory Elements Using Coverage-Based Heuristics. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018;15:1290-1300. [PMID: 26540692 DOI: 10.1109/tcbb.2015.2496261] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]

Yu Q, Wei D, Huo H. SamSelect: a sample sequence selection algorithm for quorum planted motif search on large DNA datasets. BMC Bioinformatics 2018;19:228. [PMID: 29914360 PMCID: PMC6006848 DOI: 10.1186/s12859-018-2242-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2018] [Accepted: 06/12/2018] [Indexed: 01/08/2023] Open

Lee NK, Azizan FL, Wong YS, Omar N. DeepFinder: An integration of feature-based and deep learning approach for DNA motif discovery. BIOTECHNOL BIOTEC EQ 2018. [DOI: 10.1080/13102818.2018.1438209] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022] Open

Mitra S, Biswas A, Narlikar L. DIVERSITY in binding, regulation, and evolution revealed from high-throughput ChIP. PLoS Comput Biol 2018;14:e1006090. [PMID: 29684008 PMCID: PMC5933800 DOI: 10.1371/journal.pcbi.1006090] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2017] [Revised: 05/03/2018] [Accepted: 03/14/2018] [Indexed: 12/27/2022] Open

Abstract

Genome-wide in vivo protein-DNA interactions are routinely mapped using high-throughput chromatin immunoprecipitation (ChIP). ChIP-reported regions are typically investigated for enriched sequence-motifs, which are likely to model the DNA-binding specificity of the profiled protein and/or of co-occurring proteins. However, simple enrichment analyses can miss insights into the binding-activity of the protein. Note that ChIP reports regions making direct contact with the protein as well as those binding through intermediaries. For example, consider a ChIP experiment targeting protein X, which binds DNA at its cognate sites, but simultaneously interacts with four other proteins. Each of these proteins also binds to its own specific cognate sites along distant parts of the genome, a scenario consistent with the current view of transcriptional hubs and chromatin loops. Since ChIP will pull down all X-associated regions, the final reported data will be a union of five distinct sets of regions, each containing binding sites of one of the five proteins, respectively. Characterizing all five different motifs and the corresponding sets is important to interpret the ChIP experiment and ultimately, the role of X in regulation. We present diversity which attempts exactly this: it partitions the data so that each partition can be characterized with its own de novo motif. Diversity uses a Bayesian approach to identify the optimal number of motifs and the associated partitions, which together explain the entire dataset. This is in contrast to standard motif finders, which report motifs individually enriched in the data, but do not necessarily explain all reported regions. We show that the different motifs and associated regions identified by diversity give insights into the various complexes that may be forming along the chromatin, something that has so far not been attempted from ChIP data. Webserver at http://diversity.ncl.res.in/; standalone (Mac OS X/Linux) from https://github.com/NarlikarLab/DIVERSITY/releases/tag/v1.0.0.

A high-throughput chromatin immunoprecipitation (ChIP) experiment identifies genomic regions bound by a protein in vivo. Current motif-discovery approaches seek an enriched motif signature in the reported regions, which they can attribute to the protein’s binding preferences. However, Diversity models the fact that since a ChIP experiment pulls down regions participating in all complexes involving the profiled protein, the reported regions are in all likelihood, a collection of different types of protein-DNA contacts. Diversity asks a different question: what sequence component caused a specific region to be reported in a ChIP experiment? The answer, in combination with additional data such as sequence conservation, SNPs, chromatin structure, downstream gene-expression, etc. can yield insights into the diverse regulatory mechanisms at play. The added benefits of a webserver and a standalone parallel version make diversity a practical tool for discovering new biology from ChIP experiments.

Collapse

Guo Y, Tian K, Zeng H, Guo X, Gifford DK. A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction. Genome Res 2018;28:891-900. [PMID: 29654070 PMCID: PMC5991515 DOI: 10.1101/gr.226852.117] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2017] [Accepted: 04/04/2018] [Indexed: 12/15/2022]

Tiana M, Acosta-Iborra B, Puente-Santamaría L, Hernansanz-Agustin P, Worsley-Hunt R, Masson N, García-Rio F, Mole D, Ratcliffe P, Wasserman WW, Jimenez B, del Peso L. The SIN3A histone deacetylase complex is required for a complete transcriptional response to hypoxia. Nucleic Acids Res 2018;46:120-133. [PMID: 29059365 PMCID: PMC5758878 DOI: 10.1093/nar/gkx951] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/22/2017] [Revised: 10/02/2017] [Accepted: 10/06/2017] [Indexed: 01/02/2023] Open

Affiliation(s)

Maria Tiana Departamento de Bioquímica, Universidad Autónoma de Madrid (UAM) and Instituto de Investigaciones Biomédicas ‘Alberto Sols’ (CSIC-UAM), 28029 Madrid, Spain IdiPaz, Instituto de Investigación Sanitaria del Hospital Universitario La Paz, 28029 Madrid, Spain CIBER de Enfermedades Respiratorias (CIBERES), Instituto de Salud Carlos III, 28029 Madrid, Spain
Barbara Acosta-Iborra Departamento de Bioquímica, Universidad Autónoma de Madrid (UAM) and Instituto de Investigaciones Biomédicas ‘Alberto Sols’ (CSIC-UAM), 28029 Madrid, Spain
Laura Puente-Santamaría Departamento de Bioquímica, Universidad Autónoma de Madrid (UAM) and Instituto de Investigaciones Biomédicas ‘Alberto Sols’ (CSIC-UAM), 28029 Madrid, Spain
Pablo Hernansanz-Agustin Departamento de Bioquímica, Universidad Autónoma de Madrid (UAM) and Instituto de Investigaciones Biomédicas ‘Alberto Sols’ (CSIC-UAM), 28029 Madrid, Spain Servicio Inmunología, Hospital Universitario de La Princesa, Instituto de Investigación Sanitaria del hospital de La Princesa, 28006 Madrid, Spain
Rebecca Worsley-Hunt Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia Vancouver, British Columbia V5Z 4H4, Canada
Norma Masson Target Discovery Institute, University of Oxford, Oxford OX3 7FZ, UK
Francisco García-Rio IdiPaz, Instituto de Investigación Sanitaria del Hospital Universitario La Paz, 28029 Madrid, Spain CIBER de Enfermedades Respiratorias (CIBERES), Instituto de Salud Carlos III, 28029 Madrid, Spain Servicio de Neumología, Hospital Universitario La Paz, Instituto de Investigación Sanitaria del hospital de La Paz, 28029 Madrid, Spain
David Mole Henry Wellcome Building for Molecular Physiology, University of Oxford, Oxford OX3 7BN, UK
Peter Ratcliffe Target Discovery Institute, University of Oxford, Oxford OX3 7FZ, UK
Wyeth W Wasserman Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Department of Medical Genetics, University of British Columbia Vancouver, British Columbia V5Z 4H4, Canada
Benilde Jimenez Departamento de Bioquímica, Universidad Autónoma de Madrid (UAM) and Instituto de Investigaciones Biomédicas ‘Alberto Sols’ (CSIC-UAM), 28029 Madrid, Spain IdiPaz, Instituto de Investigación Sanitaria del Hospital Universitario La Paz, 28029 Madrid, Spain CIBER de Enfermedades Respiratorias (CIBERES), Instituto de Salud Carlos III, 28029 Madrid, Spain
Luis del Peso Departamento de Bioquímica, Universidad Autónoma de Madrid (UAM) and Instituto de Investigaciones Biomédicas ‘Alberto Sols’ (CSIC-UAM), 28029 Madrid, Spain IdiPaz, Instituto de Investigación Sanitaria del Hospital Universitario La Paz, 28029 Madrid, Spain CIBER de Enfermedades Respiratorias (CIBERES), Instituto de Salud Carlos III, 28029 Madrid, Spain

Collapse

Vishnevsky OV, Bocharnikov AV, Kolchanov NA. Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets. J Bioinform Comput Biol 2017;16:1740012. [PMID: 29281953 DOI: 10.1142/s0219720017400121] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Fu H, Zhang X. Noncoding Variants Functional Prioritization Methods Based on Predicted Regulatory Factor Binding Sites. Curr Genomics 2017;18:322-331. [PMID: 29081688 PMCID: PMC5635616 DOI: 10.2174/1389202918666170228143619] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2016] [Revised: 10/16/2016] [Accepted: 11/02/2016] [Indexed: 12/31/2022] Open

Abstract

BACKGROUNDS

With the advent of the post genomic era, the research for the genetic mechanism of the diseases has found to be increasingly depended on the studies of the genes, the gene-networks and gene-protein interaction networks. To explore gene expression and regulation, the researchers have carried out many studies on transcription factors and their binding sites (TFBSs). Based on the large amount of transcription factor binding sites predicting values in the deep learning models, further computation and analysis have been done to reveal the relationship between the gene mutation and the occurrence of the disease. It has been demonstrated that based on the deep learning methods, the performances of the prediction for the functions of the noncoding variants are outperforming than those of the conventional methods. The research on the prediction for functions of Single Nucleotide Polymorphisms (SNPs) is expected to uncover the mechanism of the gene mutation affection on traits and diseases of human beings.

RESULTS

We reviewed the conventional TFBSs identification methods from different perspectives. As for the deep learning methods to predict the TFBSs, we discussed the related problems, such as the raw data preprocessing, the structure design of the deep convolution neural network (CNN) and the model performance measure et al. And then we summarized the techniques that usually used in finding out the functional noncoding variants from de novo sequence.

CONCLUSION

Along with the rapid development of the high-throughout assays, more and more sample data and chromatin features would be conducive to improve the prediction accuracy of the deep convolution neural network for TFBSs identification. Meanwhile, getting more insights into the deep CNN framework itself has been proved useful for both the promotion on model performance and the development for more suitable design to sample data. Based on the feature values predicted by the deep CNN model, the prioritization model for functional noncoding variants would contribute to reveal the affection of gene mutation on the diseases.

Collapse

Zhang H, Zhu L, Huang DS. WSMD: weakly-supervised motif discovery in transcription factor ChIP-seq data. Sci Rep 2017;7:3217. [PMID: 28607381 PMCID: PMC5468353 DOI: 10.1038/s41598-017-03554-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2016] [Accepted: 05/02/2017] [Indexed: 01/24/2023] Open

Tran NTL, Huang CH. Cloud-based MOTIFSIM: Detecting Similarity in Large DNA Motif Data Sets. J Comput Biol 2017;24:450-459. [DOI: 10.1089/cmb.2016.0080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Liu B, Yang J, Li Y, McDermaid A, Ma Q. An algorithmic perspective of de novo cis-regulatory motif finding based on ChIP-seq data. Brief Bioinform 2017;19:1069-1081. [DOI: 10.1093/bib/bbx026] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/20/2016] [Indexed: 01/06/2023] Open

Kumagai Y, Vandenbon A, Teraguchi S, Akira S, Suzuki Y. Genome-wide map of RNA degradation kinetics patterns in dendritic cells after LPS stimulation facilitates identification of primary sequence and secondary structure motifs in mRNAs. BMC Genomics 2016;17:1032. [PMID: 28155712 PMCID: PMC5259865 DOI: 10.1186/s12864-016-3325-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open

Abstract

BACKGROUND

Immune cells have to change their gene expression patterns dynamically in response to external stimuli such as lipopolysaccharide (LPS). The gene expression is regulated at multiple steps in eukaryotic cells, in which control of RNA levels at both the transcriptional level and the post-transcriptional level plays important role. Impairment of the control leads to aberrant immune responses such as excessive or impaired production of cytokines. However, genome-wide studies focusing on the post-transcriptional control were relatively rare until recently. Moreover, several RNA cis elements and RNA-binding proteins have been found to be involved in the process, but our general understanding remains poor, partly because identification of regulatory RNA motifs is very challenging in spite of its importance. We took advantage of genome-wide measurement of RNA degradation in combination with estimation of degradation kinetics by qualitative approach, and performed de novo prediction of RNA sequence and structure motifs.

METHODS

To classify genes by their RNA degradation kinetics, we first measured RNA degradation time course in mouse dendritic cells after LPS stimulation and the time courses were clustered to estimate degradation kinetics and to find patterns in the kinetics. Then genes were clustered by their similarity in degradation kinetics patterns. The 3' UTR sequences of a cluster was subjected to de novo sequence or structure motif prediction.

RESULTS

The quick degradation kinetics was found to be strongly associated with lower gene expression level, immediate regulation (both induction and repression) of gene expression level, and longer 3' UTR length. De novo sequence motif prediction found AU-rich element-like and TTP-binding sequence-like motifs which are enriched in quickly degrading genes. De novo structure motif prediction found a known functional motif, namely stem-loop structure containing sequence bound by RNA-binding protein Roquin and Regnase-1, as well as unknown motifs.

CONCLUSIONS

The current study indicated that degradation kinetics patterns lead to classification different from that by gene expression and the differential classification facilitates identification of functional motifs. Identification of novel motif candidates implied post-transcriptional controls different from that by known pairs of RNA-binding protein and RNA motif.

Collapse

Zhang Y, Wang P, Yan M. An Entropy-Based Position Projection Algorithm for Motif Discovery. BIOMED RESEARCH INTERNATIONAL 2016;2016:9127474. [PMID: 27882329 PMCID: PMC5110948 DOI: 10.1155/2016/9127474] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/17/2016] [Revised: 09/20/2016] [Accepted: 10/05/2016] [Indexed: 12/31/2022]

Yu Q, Huo H, Feng D. PairMotifChIP: A Fast Algorithm for Discovery of Patterns Conserved in Large ChIP-seq Data Sets. BIOMED RESEARCH INTERNATIONAL 2016;2016:4986707. [PMID: 27843946 PMCID: PMC5098105 DOI: 10.1155/2016/4986707] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/22/2016] [Revised: 09/04/2016] [Accepted: 09/27/2016] [Indexed: 11/18/2022]

Mathelier A, Xin B, Chiu TP, Yang L, Rohs R, Wasserman WW. DNA Shape Features Improve Transcription Factor Binding Site Predictions In Vivo. Cell Syst 2016;3:278-286.e4. [PMID: 27546793 PMCID: PMC5042832 DOI: 10.1016/j.cels.2016.07.001] [Citation(s) in RCA: 84] [Impact Index Per Article: 10.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2015] [Revised: 03/04/2016] [Accepted: 06/30/2016] [Indexed: 01/09/2023]

Guo H, Huo H, Yu Q. SMCis: An Effective Algorithm for Discovery of Cis-Regulatory Modules. PLoS One 2016;11:e0162968. [PMID: 27637070 PMCID: PMC5026350 DOI: 10.1371/journal.pone.0162968] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2016] [Accepted: 08/31/2016] [Indexed: 12/02/2022] Open

Liu B, Zhang H, Zhou C, Li G, Fennell A, Wang G, Kang Y, Liu Q, Ma Q. An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes. BMC Genomics 2016;17:578. [PMID: 27507169 PMCID: PMC4977642 DOI: 10.1186/s12864-016-2982-x] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2016] [Accepted: 07/29/2016] [Indexed: 11/10/2022] Open

Abstract

Background

Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, has several difficulties for optimizing the selection of orthologous data and reducing the false positives in motif prediction.

Results

Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP³). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to Escherichia coli k12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP³ consistently outperformed other popular motif finding tools. We have integrated MP³ into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes.

Conclusion

The performance evaluation indicated that MP³ is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance progress in elucidating transcription regulation mechanism, thus provide benefit to the genomic research community and prokaryotic genome researchers in particular.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-016-2982-x) contains supplementary material, which is available to authorized users.

Collapse

Yu Q, Huo H, Zhao R, Feng D, Vitter JS, Huan J. RefSelect: a reference sequence selection algorithm for planted (l, d) motif search. BMC Bioinformatics 2016;17 Suppl 9:266. [PMID: 27454113 PMCID: PMC4959363 DOI: 10.1186/s12859-016-1130-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open

Al-Okaily A, Huang CH. ET-Motif: Solving the Exact (l, d)-Planted Motif Problem Using Error Tree Structure. J Comput Biol 2016;23:615-23. [PMID: 27152692 DOI: 10.1089/cmb.2015.0238] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open