1
Zhang Y, Wang P, Yan M. An Entropy-Based Position Projection Algorithm for Motif Discovery. BioMed Research International 2016; 2016:9127474. [PMID: 27882329 PMCID: PMC5110948 DOI: 10.1155/2016/9127474] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/17/2016] [Revised: 09/20/2016] [Accepted: 10/05/2016] [Indexed: 12/31/2022]
Abstract
The motif discovery problem is crucial for understanding the structure and function of gene expression. Over the past decades, many attempts at motif finding using consensus and probabilistic training models have been successful. However, most existing motif discovery algorithms are still time-consuming or easily trapped in a local optimum. To overcome these shortcomings, we propose an entropy-based position projection algorithm, called EPP, which uses a projection process to divide the dataset and searches for the best local optimal solution. Experimental results on real DNA sequences, Tompa data, and ChIP-seq data show that EPP is advantageous for the motif discovery problem and outperforms widely used current algorithms.
Affiliation(s)
- Yipu Zhang
- Department of Automation, School of Electronics and Control Engineering, Chang'An University, Xi'an 710064, China
- Ping Wang
- Department of Automation, School of Electronics and Control Engineering, Chang'An University, Xi'an 710064, China
- Maode Yan
- Department of Automation, School of Electronics and Control Engineering, Chang'An University, Xi'an 710064, China
2
Harigua-Souiai E, Cortes-Ciriano I, Desdouits N, Malliavin TE, Guizani I, Nilges M, Blondel A, Bouvier G. Identification of binding sites and favorable ligand binding moieties by virtual screening and self-organizing map analysis. BMC Bioinformatics 2015; 16:93. [PMID: 25888251 PMCID: PMC4381396 DOI: 10.1186/s12859-015-0518-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/01/2014] [Accepted: 02/24/2015] [Indexed: 11/24/2022] Open
Abstract
Background: Identifying druggable cavities on a protein surface is a crucial step in structure-based drug design. The cavities have to present suitable size and shape, as well as appropriate chemical complementarity with ligands.
Results: We present a novel cavity prediction method that analyzes results of virtual screening of specific ligands or fragment libraries by means of Self-Organizing Maps. We demonstrate the method with two thoroughly studied proteins where it successfully identified their active sites (AS) and relevant secondary binding sites (BS). Moreover, known active ligands mapped the AS better than inactive ones. Interestingly, docking a naive fragment library brought even more insight. We then systematically applied the method to the 102 targets from the DUD-E database, where it showed a 90% identification rate of the AS among the first three consensual clusters of the SOM, and in 82% of the cases as the first one. Further analysis by chemical decomposition of the fragments improved BS prediction. Chemical substructures that are representative of the active ligands preferentially mapped in the AS.
Conclusion: The new approach provides valuable information both on relevant BSs and on chemical features promoting bioactivity.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0518-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Emna Harigua-Souiai
- Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France; Laboratory of Molecular Epidemiology and Experimental Pathology - LR11IPT04, Institut Pasteur de Tunis, Université Tunis el Manar - Tunisia, 13, Place Pasteur, Tunis, 1002, Tunisia; University of Carthage, Faculty of sciences of Bizerte - Tunisia, Jarzouna, 7021, Tunisia.
- Isidro Cortes-Ciriano
- Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France.
- Nathan Desdouits
- Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France.
- Thérèse E Malliavin
- Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France.
- Ikram Guizani
- Laboratory of Molecular Epidemiology and Experimental Pathology - LR11IPT04, Institut Pasteur de Tunis, Université Tunis el Manar - Tunisia, 13, Place Pasteur, Tunis, 1002, Tunisia.
- Michael Nilges
- Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France.
- Arnaud Blondel
- Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France.
- Guillaume Bouvier
- Institut Pasteur, Unité de Bioinformatique Structurale, CNRS UMR 3528, Département de Biologie Structurale et Chimie, 25, rue du Dr Roux, Paris, 75015, France.
4
Hassanien AE, Al-Shammari ET, Ghali NI. Computational intelligence techniques in bioinformatics. Comput Biol Chem 2013; 47:37-47. [PMID: 23891719 DOI: 10.1016/j.compbiolchem.2013.04.007] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 02/07/2013] [Revised: 04/06/2013] [Accepted: 04/24/2013] [Indexed: 10/26/2022]
Abstract
Computational intelligence (CI) is a well-established paradigm with current systems having many of the characteristics of biological computers and capable of performing a variety of tasks that are difficult to do using conventional techniques. It is a methodology involving adaptive mechanisms and/or an ability to learn that facilitate intelligent behavior in complex and changing environments, such that the system is perceived to possess one or more attributes of reason, such as generalization, discovery, association and abstraction. The objective of this article is to present to the CI and bioinformatics research communities some of the state of the art in CI applications to bioinformatics and to motivate research in new trend-setting directions. In this article, we present an overview of the CI techniques in bioinformatics. We show how CI techniques, including neural networks, restricted Boltzmann machines, deep belief networks, fuzzy logic, rough sets, evolutionary algorithms (EA), genetic algorithms (GA), swarm intelligence, artificial immune systems and support vector machines, can be successfully employed to tackle various problems such as gene expression clustering and classification, protein sequence classification, gene selection, DNA fragment assembly, multiple sequence alignment, and protein function and structure prediction. We discuss some representative methods to provide inspiring examples that illustrate how CI can be used to address these problems and how bioinformatics data can be characterized by CI. Challenges to be addressed and future directions of research are also presented, and an extensive bibliography is included.
Affiliation(s)
- Aboul Ella Hassanien
- Faculty of Computers and Information, Cairo University, 5 Ahmed Zewal Street, Orman, Giza, Egypt; Scientific Research Group in Egypt (SRGE), Egypt.
5
Sahu TK, Rao AR, Vasisht S, Singh N, Singh UP. Computational approaches, databases and tools for in silico motif discovery. Interdiscip Sci 2012; 4:239-255. [PMID: 23354813 DOI: 10.1007/s12539-012-0141-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/17/2011] [Revised: 04/12/2012] [Accepted: 06/13/2012] [Indexed: 06/01/2023]
Abstract
Motifs are biologically significant fragments of nucleotide or peptide sequences that follow a specific pattern. Motifs are categorized as structural motifs and sequence motifs. These are discovered by phylogenetic studies of similar genes across species. Structural motifs are formed by three-dimensional arrangements of amino acids consisting of two or more α helices or β strands, whereas sequence motifs are formed by nucleotide fragments appearing in the exons of a gene. The arrangement of residues in structural motifs may not be continuous, while it is continuous in sequence motifs. Sequence motifs may encode structural motifs. The algorithms used for motif discovery are an important part of bio-computational studies. The purpose of motif discovery is to identify patterns in biopolymer (nucleotide or protein) sequences in order to understand the structure and function of the molecules and their evolutionary aspects. The main aim of this paper is to provide a systematic review of the different approaches, databases and tools used in motif discovery.
Affiliation(s)
- Tanmaya Kumar Sahu
- Centre for Agricultural Bioinformatics, Indian Agricultural Statistics Research Institute, New Delhi, India
6
Zhao H, Yang Y, Zhou Y. Structure-based prediction of DNA-binding proteins by structural alignment and a volume-fraction corrected DFIRE-based energy function. Bioinformatics 2010; 26:1857-63. [PMID: 20525822 DOI: 10.1093/bioinformatics/btq295] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.8] [Indexed: 12/28/2022] Open
Abstract
MOTIVATION: Template-based prediction of DNA binding proteins requires not only structural similarity between target and template structures but also prediction of binding affinity between the target and DNA to ensure binding. Here, we propose to predict protein-DNA binding affinity by introducing a new volume-fraction correction to a statistical energy function based on a distance-scaled, finite, ideal-gas reference (DFIRE) state.
RESULTS: We showed that this energy function, together with the structural alignment program TM-align, achieves a Matthews correlation coefficient (MCC) of 0.76 with an accuracy of 98%, a precision of 93% and a sensitivity of 64% for predicting DNA binding proteins in a benchmark of 179 DNA binding proteins and 3797 non-binding proteins. The MCC value is substantially higher than the best MCC value of 0.69 given by previous methods. Application of this method to 2235 structural genomics targets uncovered 37 as DNA binding proteins, of which 27 (73%) are putatively DNA binding and only 1 has annotated functions that do not include DNA binding, while the remaining proteins have unknown function. The method provides a highly accurate and sensitive technique for structure-based prediction of DNA binding proteins.
AVAILABILITY: The method is implemented as a part of the Structure-based function-Prediction On-line Tools (SPOT) package available at http://sparks.informatics.iupui.edu/spot
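The benchmark figures quoted in this abstract can be cross-checked against one another. The sketch below is an illustration only: the abstract does not report a confusion matrix, so the counts are back-calculated from the stated sensitivity, precision and class sizes and then rounded to integers. That rounding is an assumption, and the function names are hypothetical.

```python
from math import sqrt


def confusion_from_rates(n_pos, n_neg, sensitivity, precision):
    """Back-calculate approximate (TP, FP, TN, FN) counts.

    Assumption: counts are rounded to the nearest integer; the abstract
    reports only the rates, not the counts themselves.
    """
    tp = round(sensitivity * n_pos)   # sensitivity = TP / (TP + FN)
    fn = n_pos - tp
    fp = round(tp / precision - tp)   # precision = TP / (TP + FP)
    tn = n_neg - fp
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


# Class sizes and rates as quoted in the abstract.
tp, fp, tn, fn = confusion_from_rates(179, 3797, sensitivity=0.64, precision=0.93)
print(tp, fp, tn, fn)
print(round(mcc(tp, fp, tn, fn), 2), round(accuracy(tp, fp, tn, fn), 2))
```

Under these assumptions the reconstructed counts give an MCC that rounds to 0.76 and an accuracy that rounds to 0.98, in line with the quoted values. The check also shows why accuracy is less informative than MCC here: with 3797 negatives against 179 positives, accuracy stays high even though about a third of the true DNA binding proteins are missed.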
Affiliation(s)
- Huiying Zhao
- School of Informatics, Indiana University Purdue University, Indianapolis, IN 46202, USA