1
Mortz M, Dégletagne C, Romestaing C, Duchamp C. Comparative genomic analysis identifies small open reading frames (sORFs) with peptide-encoding features in avian 16S rDNA. Genomics 2019; 112:1120-1127. [PMID: 31247329 DOI: 10.1016/j.ygeno.2019.06.026] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/05/2019] [Revised: 06/01/2019] [Accepted: 06/21/2019] [Indexed: 12/14/2022]
Abstract
The mitochondrial genome (mt-DNA) functional repertoire has recently been enriched in mammals by the identification of functional small open reading frames (sORFs) embedded in ribosomal DNAs. Through comparative genomic analyses the presence of putatively functional sORFs was investigated in birds. Alignment of available avian mt-DNA sequences revealed highly conserved regions containing four putative sORFs that presented low insertion/deletion polymorphism rate (<0.1%) and preserved in frame start/stop codons in >80% of species. Detected sORFs included avian homologs of human Humanin and Short-Humanin-Like-Peptide 6 and two new sORFs not yet described in mammals. The amino-acid sequences of the four putative encoded peptides were strongly conserved among birds, with amino-acid p-distances (5.6 to 25.4%) similar to those calculated for typical avian mt-DNA-encoded proteins (14.8%). Conservation resulted from either drastic conservation of the nucleotide sequence or negative selection pressure. These data extend to birds the possibility that mitochondrial rDNA may encode small bioactive peptides.
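As a rough illustration of the amino-acid p-distance mentioned in the abstract, the sketch below computes the proportion of differing residues between aligned peptide sequences. It is not the authors' pipeline: the function names, the pairwise-deletion handling of gaps, and the toy alignment are all assumptions made for the example.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues over the aligned columns
    where neither sequence has a gap (pairwise deletion, an assumption)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_p_distance(alignment):
    """Average p-distance over all pairs of sequences in the alignment."""
    pairs = list(combinations(alignment.values(), 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment, not data from the paper.
toy = {
    "species_a": "MAPRGFSCLL",
    "species_b": "MAPRGFNCLL",
    "species_c": "MTPRGF-CLL",
}
print(f"mean p-distance: {100 * mean_p_distance(toy):.1f}%")
```

Multiplying the result by 100 gives a percentage comparable in form to the 5.6 to 25.4% range quoted above.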
Affiliation(s)
- Mathieu Mortz
- Université de Lyon, Laboratoire d'Ecologie des Hydrosystèmes Naturels et Anthropisés, UMR 5023 CNRS, Université Claude Bernard Lyon 1, ENTPE, Villeurbanne Cedex, France
- Cyril Dégletagne
- Université de Lyon, Laboratoire d'Ecologie des Hydrosystèmes Naturels et Anthropisés, UMR 5023 CNRS, Université Claude Bernard Lyon 1, ENTPE, Villeurbanne Cedex, France
- Caroline Romestaing
- Université de Lyon, Laboratoire d'Ecologie des Hydrosystèmes Naturels et Anthropisés, UMR 5023 CNRS, Université Claude Bernard Lyon 1, ENTPE, Villeurbanne Cedex, France
- Claude Duchamp
- Université de Lyon, Laboratoire d'Ecologie des Hydrosystèmes Naturels et Anthropisés, UMR 5023 CNRS, Université Claude Bernard Lyon 1, ENTPE, Villeurbanne Cedex, France.
2
Malikanti R, Vadija R, Veeravarapu H, Mustyala KK, Malkhed V, Vuruputuri U. Identification of small molecular ligands as potent inhibitors of fatty acid metabolism in Mycobacterium tuberculosis. J Mol Struct 2017. [DOI: 10.1016/j.molstruc.2017.08.090] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 02/01/2023]
3
Blanco E, Corominas M. CBS: an open platform that integrates predictive methods and epigenetics information to characterize conserved regulatory features in multiple Drosophila genomes. BMC Genomics 2012; 13:688. [PMID: 23228284 PMCID: PMC3564944 DOI: 10.1186/1471-2164-13-688] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/24/2012] [Accepted: 11/28/2012] [Indexed: 12/11/2022]
Abstract
Background: Information about the composition of regulatory regions is of great value for designing experiments to functionally characterize gene expression. The multiplicity of available applications to predict transcription factor binding sites in a particular locus contrasts with the substantial computational expertise that is demanded to manipulate them, which may constitute a potential barrier for the experimental community. Results: CBS (Conserved regulatory Binding Sites, http://compfly.bio.ub.es/CBS) is a public platform of evolutionarily conserved binding sites and enhancers predicted in multiple Drosophila genomes that is furnished with published chromatin signatures associated to transcriptionally active regions and other experimental sources of information. The rapid access to this novel body of knowledge through a user-friendly web interface enables non-expert users to identify the binding sequences available for any particular gene, transcription factor, or genome region. Conclusions: The CBS platform is a powerful resource that provides tools for data mining individual sequences and groups of co-expressed genes with epigenomics information to conduct regulatory screenings in Drosophila.
Affiliation(s)
- Enrique Blanco
- Departament de Genètica and Institut de Biomedicina (IBUB), Universitat de Barcelona, Av. Diagonal 643, 08028, Barcelona, Spain.
4
Bi C. Memetic algorithms for de novo motif-finding in biomedical sequences. Artif Intell Med 2012; 56:1-17. [DOI: 10.1016/j.artmed.2012.04.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/22/2011] [Revised: 04/03/2012] [Accepted: 04/10/2012] [Indexed: 11/26/2022]
5
Maity TS, Close DW, Valdez YE, Nowak-Lovato K, Marti-Arbona R, Nguyen TT, Unkefer PJ, Hong-Geller E, Bradbury ARM, Dunbar J. Discovery of DNA operators for TetR and MarR family transcription factors from Burkholderia xenovorans. Microbiology (Reading) 2012; 158:571-582. [DOI: 10.1099/mic.0.055129-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 12/27/2022]
Affiliation(s)
- Tuhin Subhra Maity
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Devin W. Close
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Yolanda E. Valdez
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Kristy Nowak-Lovato
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Tinh T. Nguyen
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Pat J. Unkefer
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- John Dunbar
- Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
6
Aerts S. Computational strategies for the genome-wide identification of cis-regulatory elements and transcriptional targets. Curr Top Dev Biol 2012; 98:121-45. [PMID: 22305161 DOI: 10.1016/b978-0-12-386499-4.00005-7] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 12/13/2022]
Abstract
Transcription factors (TFs) are key proteins that decode the information in our genome to express a precise and unique set of proteins and RNA molecules in each cell type in our body. These factors play a pivotal role in all biological processes, including the determination of a cell's fate during development and the maintenance of a cell's physiological function. To achieve this, a TF binds to specific DNA sequences in the noncoding part of the genome, recruits chromatin modifiers and cofactors, and directs the transcription initiation rate of its "target genes." Therefore, a key challenge in deciphering a transcriptional switch is to identify the direct target genes of the master regulators that control the switch, the cis-regulatory elements implementing (auto-)regulatory loops, and the target genes of all the TFs in the downstream regulatory network. A better knowledge of a TF's targetome during specification and differentiation of a particular cell type will generate mechanistic insight into its developmental program. Here, I review computational strategies and methods to predict transcriptional targets by genome-wide searches for TF binding sites using position weight matrices, motif clusters, phylogenetic footprinting, chromatin binding and accessibility data, enhancer classification, motif enrichment, and gene expression signatures.
Affiliation(s)
- Stein Aerts
- Laboratory of Computational Biology, Center for Human Genetics, Katholieke Universiteit Leuven, Leuven, Belgium
7
Zhou Q. On weight matrix and free energy models for sequence motif detection. J Comput Biol 2011; 17:1621-38. [PMID: 21128853 DOI: 10.1089/cmb.2009.0142] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
Abstract
The problem of motif detection can be formulated as the construction of a discriminant function to separate sequences of a specific pattern from background. In computational biology, motif detection is used to predict DNA binding sites of a transcription factor (TF), mostly based on the weight matrix (WM) model or the Gibbs free energy (FE) model. However, despite the wide applications, theoretical analysis of these two models and their predictions is still lacking. We derive asymptotic error rates of prediction procedures based on these models under different data generation assumptions. This allows a theoretical comparison between the WM-based and the FE-based predictions in terms of asymptotic efficiency. Applications of the theoretical results are demonstrated with empirical studies on ChIP-seq data and protein binding microarray data. We find that, irrespective of underlying data generation mechanisms, the FE approach shows higher or comparable predictive power relative to the WM approach when the number of observed binding sites used for constructing a discriminant decision is not too small.
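To make the weight matrix (WM) model concrete, here is a minimal sketch of a log-odds weight matrix used as a discriminant score against a uniform background. It illustrates the general WM idea only, not the paper's estimators or its free energy model; the pseudocount, the uniform background, and the toy binding sites are assumptions.

```python
import math

BASES = "ACGT"


def build_log_odds(sites, background=None, pseudocount=1.0):
    """Build a log2-odds weight matrix from equal-length binding sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for pos in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {b: math.log2((counts[b] / total) / background[b]) for b in BASES}
        )
    return matrix


def score(matrix, window):
    """Sum of per-position log-odds: the WM discriminant for one window."""
    return sum(column[base] for column, base in zip(matrix, window))


def best_hit(matrix, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(matrix)
    if len(sequence) < width:
        raise ValueError("sequence shorter than the motif")
    return max(
        (score(matrix, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical binding sites and target sequence, for illustration only.
sites = ["TGACTC", "TGACTA", "TGAGTC", "TGACTC"]
wm = build_log_odds(sites)
hit_score, hit_start = best_hit(wm, "AAATGACTCAAA")
```

A window is called a binding site when its score exceeds a chosen threshold; the choice of that threshold is what determines the error rates analysed in the abstract.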
Affiliation(s)
- Qing Zhou
- Department of Statistics, University of California, Los Angeles, California 90095, USA.
8
Sánchez-Cabo F, Rainer J, Dopazo A, Trajanoski Z, Hackl H. Insights into global mechanisms and disease by gene expression profiling. Methods Mol Biol 2011; 719:269-98. [PMID: 21370089 DOI: 10.1007/978-1-61779-027-0_13] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/28/2023]
Abstract
Transcriptomics has played an essential role as proof of concept in the development of experimental and bioinformatics approaches for the generation and analysis of Omics data. We are giving an introduction on how large-scale technologies for gene expression profiling, especially microarrays, have changed the view from studying single molecular events to a systems level view of global mechanisms in a cell, the biological processes, and their pathological mutations. The main platforms available for gene expression profiling (from microarrays to RNA-seq) are presented and the general concepts that need to be taken into account for proper data analysis in order to extract objective and general conclusions from transcriptomics experiments are introduced. We also describe the available main bioinformatics resources used for this purpose.
Affiliation(s)
- Fátima Sánchez-Cabo
- Genomics Unit, Centro Nacional de Investigaciones Cardiovasculares, Madrid, Spain
9
Mason MJ, Plath K, Zhou Q. Identification of context-dependent motifs by contrasting ChIP binding data. Bioinformatics 2010; 26:2826-32. [PMID: 20870645 DOI: 10.1093/bioinformatics/btq546] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022]
Abstract
MOTIVATION: DNA binding proteins play crucial roles in the regulation of gene expression. Transcription factors (TFs) activate or repress genes directly while other proteins influence chromatin structure for transcription. Binding sites of a TF exhibit a similar sequence pattern called a motif. However, a one-to-one map does not exist between each TF and motif. Many TFs in a protein family may recognize the same motif with subtle nucleotide differences leading to different binding affinities. Additionally, a particular TF may bind different motifs under certain conditions, for example in the presence of different co-regulators. The availability of genome-wide binding data of multiple collaborative TFs makes it possible to detect such context-dependent motifs. RESULTS: We developed a contrast motif finder (CMF) for the de novo identification of motifs that are differentially enriched in two sets of sequences. Applying this method to a number of TF binding datasets from mouse embryonic stem cells, we demonstrate that CMF achieves substantially higher accuracy than several well-known motif finding methods. By contrasting sequences bound by distinct sets of TFs, CMF identified two different motifs that may be recognized by Oct4 dependent on the presence of another co-regulator and detected subtle motif signals that may be associated with potential competitive binding between Sox2 and Tcf3. AVAILABILITY: The software CMF is freely available for academic use at www.stat.ucla.edu/~zhou/CMF.
Affiliation(s)
- Mike J Mason
- Department of Statistics, University of California, Los Angeles, CA 90095, USA
10
Fuellen G. Evolution of gene regulation--on the road towards computational inferences. Brief Bioinform 2010; 12:122-31. [PMID: 20702596 DOI: 10.1093/bib/bbq060] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
If fragments of DNA are transcribed (expressed), they deserve to be called (parts of) a gene. Whether transcription takes place depends on the 'gene regulatory network'. This network is defined as the complex interplay of the sequence, biochemical modifications and structure of the chromosomal DNA with the regulatory proteins/RNA (transcription factors, co-factors, regulating RNA and the transcriptional apparatus itself). Gene regulatory networks play a role in various stages of development as well as in the maintenance of the organism; in this review we will concentrate on the former. Their evolutionary reconstruction is daunting (to say the least), and bioinformatics tools are in their infancy. However, gain of understanding offers a reward beyond itself, since evolutionary considerations can enable discoveries in the first place, e.g. the computational identification of conserved transcription factor binding sites. We discuss the evolution of gene regulation in the context of the 'Genetic Theory of Morphological Evolution' as described by Carroll, identifying those parts of the theory that are relevant for bioinformatics, and their implications. We discuss the important question of how bioinformatics analysis results on the evolution of gene regulation may be validated. Finally, we briefly exemplify use of the UCSC genome browser, exploiting its pre-computed alignments to describe the evolution of gene regulation.
Affiliation(s)
- Georg Fuellen
- Institute for Biostatistics and Informatics in Medicine and Ageing Research-IBIMA, University of Rostock, Medical Faculty, Ernst-Heydemann-Str. 8, 18057 Rostock, Germany.
11
Genome-wide identification of cis-regulatory motifs and modules underlying gene coregulation using statistics and phylogeny. Proc Natl Acad Sci U S A 2010; 107:14615-20. [PMID: 20671200 DOI: 10.1073/pnas.1002876107] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 01/14/2023]
Abstract
Cell fate determination depends in part on the establishment of specific transcriptional programs of gene expression. These programs result from the interpretation of the genomic cis-regulatory information by sequence-specific factors. Decoding this information in sequenced genomes is an important issue. Here, we developed statistical analysis tools to computationally identify the cis-regulatory elements that control gene expression in a set of coregulated genes. Starting with a small number of validated and/or predicted cis-regulatory modules (CRMs) in a reference species as a training set, but with no a priori knowledge of the factors acting in trans, we computationally predicted transcription factor binding sites (TFBSs) and genomic CRMs underlying coregulation. This method was applied to the gene expression program active in Drosophila melanogaster sensory organ precursor cells (SOPs), a specific type of neural progenitor cells. Mutational analysis showed that four, including one newly characterized, out of the five top-ranked families of predicted TFBSs were required for SOP-specific gene expression. Additionally, 19 out of the 29 top-ranked predicted CRMs directed gene expression in neural progenitor cells, i.e., SOPs or larval brain neuroblasts, with a notable fraction active in SOPs (11/29). We further identified the lola gene as the target of two SOP-specific CRMs and found that the lola gene contributed to SOP specification. The statistics and phylogeny-based tools described here can be more generally applied to identify the cis-regulatory elements of specific gene regulatory networks in any family of related species with sequenced genomes.
12
Meng G, Mosig A, Vingron M. A computational evaluation of over-representation of regulatory motifs in the promoter regions of differentially expressed genes. BMC Bioinformatics 2010; 11:267. [PMID: 20487530 PMCID: PMC3098066 DOI: 10.1186/1471-2105-11-267] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 07/03/2009] [Accepted: 05/20/2010] [Indexed: 12/28/2022]
Abstract
Background: Observed co-expression of a group of genes is frequently attributed to co-regulation by shared transcription factors. This assumption has led to the hypothesis that promoters of co-expressed genes should share common regulatory motifs, which forms the basis for numerous computational tools that search for these motifs. While frequently explored for yeast, the validity of the underlying hypothesis has not been assessed systematically in mammals. This demonstrates the need for a systematic and quantitative evaluation to what degree co-expressed genes share over-represented motifs for mammals. Results: We identified 33 experiments for human and mouse in the ArrayExpress Database where transcription factors were manipulated and which exhibited a significant number of differentially expressed genes. We checked for over-representation of transcription factor binding sites in up- or down-regulated genes using the over-representation analysis tool oPOSSUM. In 25 out of 33 experiments, this procedure identified the binding matrices of the affected transcription factors. We also carried out de novo prediction of regulatory motifs shared by differentially expressed genes. Again, the detected motifs shared significant similarity with the matrices of the affected transcription factors. Conclusions: Our results support the claim that functional regulatory motifs are over-represented in sets of differentially expressed genes and that they can be detected with computational methods.
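The notion of motif over-representation in a gene set can be illustrated with a generic one-sided hypergeometric test. This is only a sketch of the general idea and should not be read as the statistic oPOSSUM itself computes; the function name and the gene counts in the example are hypothetical.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` genes are sampled without replacement
    from `population` genes, `successes` of which carry the motif."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total


# Hypothetical numbers: 20000 genes in the background, 500 with a predicted
# site for the factor, 100 differentially expressed genes, 10 of them with a site.
p_value = hypergeom_tail(20000, 500, 100, 10)
print(f"over-representation p-value: {p_value:.2e}")
```

With these made-up counts about 2.5 motif-bearing genes would be expected by chance among the 100 differentially expressed genes, so observing 10 yields a small p-value.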
Affiliation(s)
- Guofeng Meng
- CAS-MPG Partner Institute and Key Laboratory for Computational Biology, Shanghai Institutes for Biological Sciences, 320 Yue Yang Road, 200031, Shanghai, China.
13
van Hijum SAFT, Medema MH, Kuipers OP. Mechanisms and evolution of control logic in prokaryotic transcriptional regulation. Microbiol Mol Biol Rev 2009; 73:481-509, Table of Contents. [PMID: 19721087 PMCID: PMC2738135 DOI: 10.1128/mmbr.00037-08] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.8] [Indexed: 11/20/2022]
Abstract
A major part of organismal complexity and versatility of prokaryotes resides in their ability to fine-tune gene expression to adequately respond to internal and external stimuli. Evolution has been very innovative in creating intricate mechanisms by which different regulatory signals operate and interact at promoters to drive gene expression. The regulation of target gene expression by transcription factors (TFs) is governed by control logic brought about by the interaction of regulators with TF binding sites (TFBSs) in cis-regulatory regions. A factor that in large part determines the strength of the response of a target to a given TF is motif stringency, the extent to which the TFBS fits the optimal TFBS sequence for a given TF. Advances in high-throughput technologies and computational genomics allow reconstruction of transcriptional regulatory networks in silico. To optimize the prediction of transcriptional regulatory networks, i.e., to separate direct regulation from indirect regulation, a thorough understanding of the control logic underlying the regulation of gene expression is required. This review summarizes the state of the art of the elements that determine the functionality of TFBSs by focusing on the molecular biological mechanisms and evolutionary origins of cis-regulatory regions.
Affiliation(s)
- Sacha A F T van Hijum
- Molecular Genetics, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Kerklaan 30, 9751 NN Haren, The Netherlands.
14
Affiliation(s)
- P Librado
- Departament de Genètica, Facultat de Biologia and Institut de Recerca de la Biodiversitat, Universitat de Barcelona, Barcelona, Spain
15
Abstract
The BioSapiens network has developed a distributed infrastructure for genome and proteome annotation by laboratories anywhere in the world.