1
|
Stevens KM, Warnecke T. Histone variants in archaea - An undiscovered country. Semin Cell Dev Biol 2023; 135:50-58. [PMID: 35221208 DOI: 10.1016/j.semcdb.2022.02.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/12/2021] [Revised: 02/20/2022] [Accepted: 02/20/2022] [Indexed: 12/23/2022]
Abstract
Exchanging core histones in the nucleosome for paralogous variants can have important functional ramifications. Many of these variants, and their physiological roles, have been characterized in exquisite detail in model eukaryotes, including humans. In comparison, our knowledge of histone biology in archaea remains rudimentary. This is true in particular for our knowledge of histone variants. Many archaea encode several histone genes that differ in sequence, but do these paralogs make distinct, adaptive contributions to genome organization and regulation in a manner comparable to eukaryotes? Below, we review what we know about histone variants in archaea at the level of structure, regulation, and evolution. In all areas, our knowledge pales when compared to the wealth of insight that has been gathered for eukaryotes. Recent findings, however, provide tantalizing glimpses into a rich and largely undiscovered country that is at times familiar and eukaryote-like and at times strange and uniquely archaeal. We sketch a preliminary roadmap for further exploration of this country; an undertaking that may ultimately shed light not only on chromatin biology in archaea but also on the origin of histone-based chromatin in eukaryotes.
Affiliation(s)
- Kathryn M Stevens
- Medical Research Council London Institute of Medical Sciences, London, United Kingdom; Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom
| | - Tobias Warnecke
- Medical Research Council London Institute of Medical Sciences, London, United Kingdom; Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, United Kingdom.
| |
|
2
|
Minazzato G, Gasparrini M, Heroux A, Sernova NV, Rodionov DA, Cianci M, Sorci L, Raffaelli N. Bacterial NadQ (COG4111) is a Nudix-like, ATP-responsive regulator of NAD biosynthesis. J Struct Biol 2022; 214:107917. [PMID: 36332744 DOI: 10.1016/j.jsb.2022.107917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 10/13/2022] [Accepted: 10/27/2022] [Indexed: 11/06/2022]
Abstract
Nicotinamide-adenine dinucleotide (NAD) is centrally important to metabolic reactions that involve redox chemistry. In bacteria, NAD biosynthesis is controlled by different transcription factors, depending on the species. Among the four regulators identified so far, the protein NadQ is reported to act as a repressor of the de novo NAD biosynthetic pathway in proteobacteria. Using comparative genomics, a systematic reconstruction of NadQ regulons in thousands of fully sequenced bacterial genomes has been performed, confirming that NadQ is present in α-proteobacteria and some β- and γ-proteobacteria, including pathogens like Bordetella pertussis and Neisseria meningitidis, where it likely controls de novo NAD biosynthesis. Through mobility shift assay and mutagenesis, the DNA binding activity of NadQ from Agrobacterium tumefaciens was experimentally validated and determined to be suppressed by ATP. The crystal structures of NadQ in native form and in complex with ATP were determined, indicating that NadQ is a dimer, with each monomer composed of an N-terminal Nudix domain hosting the effector binding site and a C-terminal winged helix-turn-helix domain that binds DNA. Within the dimer, we found one ATP molecule bound, at saturating concentration of the ligand, in keeping with an intrinsic asymmetry of the quaternary structure. Overall, this study provided the basis for depicting a working model of NadQ regulation mechanism.
Affiliation(s)
- Gabriele Minazzato
- Department of Agricultural, Food and Environmental Sciences, Polytechnic University of Marche, Ancona, Italy
| | - Massimiliano Gasparrini
- Department of Agricultural, Food and Environmental Sciences, Polytechnic University of Marche, Ancona, Italy
| | - Annie Heroux
- Elettra - Sincrotrone Trieste S.C.P.A., Basovizza, Italy
| | - Natalia V Sernova
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
| | - Dmitry A Rodionov
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia; Sanford-Burnham-Prebys Medical Discovery Institute, La Jolla, CA, USA
| | - Michele Cianci
- Department of Agricultural, Food and Environmental Sciences, Polytechnic University of Marche, Ancona, Italy
| | - Leonardo Sorci
- Department of Materials, Environmental Sciences and Urban Planning, Division of Bioinformatics and Biochemistry, Polytechnic University of Marche, Ancona, Italy.
| | - Nadia Raffaelli
- Department of Agricultural, Food and Environmental Sciences, Polytechnic University of Marche, Ancona, Italy.
| |
|
3
|
Suvorova IA, Gelfand MS. Comparative Analysis of the IclR-Family of Bacterial Transcription Factors and Their DNA-Binding Motifs: Structure, Positioning, Co-Evolution, Regulon Content. Front Microbiol 2021; 12:675815. [PMID: 34177859 PMCID: PMC8222616 DOI: 10.3389/fmicb.2021.675815] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/04/2021] [Accepted: 05/14/2021] [Indexed: 11/13/2022]
Abstract
The IclR-family is a large group of transcription factors (TFs) regulating various biological processes in diverse bacteria. Using comparative genomics techniques, we have identified binding motifs of IclR-family TFs, reconstructed regulons and analyzed their content, finding co-occurrences between the regulated COGs (clusters of orthologous genes), useful for future functional characterizations of TFs and their regulated genes. We describe two main types of IclR-family motifs, similar in sequence but different in the arrangement of the half-sites (boxes), with GKTYCRYW3-4RYGRAMC and TGRAACAN1-2TGTTYCA consensuses, and also predict that TFs in 32 orthologous groups have binding sites comprised of three boxes with alternating direction, which implies two possible alternative modes of dimerization of TFs. We identified trends in site positioning relative to the translational gene start, and show that TFs in 94 orthologous groups bind tandem sites with 18-22 nucleotides between their centers. We predict protein-DNA contacts via the correlation analysis of nucleotides in binding sites and amino acids of the DNA-binding domain of TFs, and show that the majority of interacting positions and predicted contacts are similar for both types of motifs and conform well both to available experimental data and to general protein-DNA interaction trends.
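The two consensus strings quoted in this abstract use IUPAC degenerate nucleotide codes, so they can be turned into search patterns. The following is a minimal sketch, not the authors' pipeline. It assumes that a code followed by a number or range (as in `W3-4` or `N1-2`) means that code repeated that many times; the function name `consensus_to_regex` is hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus such as 'TGRAACAN1-2TGTTYCA' into a regex.

    Assumption (not stated in the abstract): a code followed by 'a-b'
    or a single number means that code repeated a to b times.
    """
    parts = []
    # Each token is one IUPAC letter with an optional repeat count or range.
    for code, lo, hi in re.findall(r"([A-Z])(?:(\d+)(?:-(\d+))?)?", consensus):
        cls = IUPAC[code]
        if lo:
            cls += "{%s,%s}" % (lo, hi or lo)
        parts.append(cls)
    return "".join(parts)

pattern = consensus_to_regex("TGRAACAN1-2TGTTYCA")
hit = re.search(pattern, "CCTGAAACAGTGTTCCAGG")
```

A regex of this kind only finds exact matches to the degenerate consensus; real binding-site searches usually score candidates with a position weight matrix instead.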
Affiliation(s)
- Inna A Suvorova
- Institute for Information Transmission Problems of Russian Academy of Sciences (The Kharkevich Institute), Moscow, Russia
| | - Mikhail S Gelfand
- Institute for Information Transmission Problems of Russian Academy of Sciences (The Kharkevich Institute), Moscow, Russia; Skolkovo Institute of Science and Technology, Moscow, Russia
| |
|
4
|
Suvorova IA, Gelfand MS. Comparative Genomic Analysis of the Regulation of Aromatic Metabolism in Betaproteobacteria. Front Microbiol 2019; 10:642. [PMID: 30984152 PMCID: PMC6449761 DOI: 10.3389/fmicb.2019.00642] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/14/2018] [Accepted: 03/14/2019] [Indexed: 01/23/2023]
Abstract
Aromatic compounds are a common carbon and energy source for many microorganisms, some of which can even degrade toxic chloroaromatic xenobiotics. This comparative study of aromatic metabolism in 32 Betaproteobacteria species describes the links between several transcription factors (TFs) that control benzoate (BenR, BenM, BoxR, BzdR), catechol (CatR, CatM, BenM), chlorocatechol (ClcR), methylcatechol (MmlR), 2,4-dichlorophenoxyacetate (TfdR, TfdS), phenol (AphS, AphR, AphT), biphenyl (BphS), and toluene (TbuT) metabolism. We characterize the complexity and variability in the organization of aromatic metabolism operons and the structure of regulatory networks that may differ even between closely related species. Generally, the upper parts of pathways, rare pathway variants, and degradative pathways of exotic and complex, in particular, xenobiotic compounds are often controlled by a single TF, while the regulation of more common and/or central parts of the aromatic metabolism may vary widely and often involves several TFs with shared and/or dual, or cascade regulation. The most frequent and at the same time variable connections exist between AphS, AphR, AphT, and BenR. We have identified a novel LysR-family TF that regulates the metabolism of catechol (or some catechol derivative) and either substitutes CatR(M)/BenM, or shares functions with it. We have also predicted several new members of aromatic metabolism regulons, in particular, some COGs regulated by several different TFs.
Affiliation(s)
- Inna A Suvorova
- Institute for Information Transmission Problems RAS (The Kharkevich Institute), Moscow, Russia
| | - Mikhail S Gelfand
- Institute for Information Transmission Problems RAS (The Kharkevich Institute), Moscow, Russia; Faculty of Computer Science, Higher School of Economics, Moscow, Russia; Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Russia
| |
|
5
|
Phyletic Distribution and Lineage-Specific Domain Architectures of Archaeal Two-Component Signal Transduction Systems. J Bacteriol 2018; 200:JB.00681-17. [PMID: 29263101 PMCID: PMC5847659 DOI: 10.1128/jb.00681-17] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 11/09/2017] [Accepted: 12/11/2017] [Indexed: 12/14/2022]
Abstract
The two-component signal transduction (TCS) machinery is a key mechanism of sensing environmental changes in the prokaryotic world. TCS systems have been characterized thoroughly in bacteria but to a much lesser extent in archaea. Here, we provide an updated census of more than 2,000 histidine kinases and response regulators encoded in 218 complete archaeal genomes, as well as unfinished genomes available from metagenomic data. We describe the domain architectures of the archaeal TCS components, including several novel output domains, and discuss the evolution of the archaeal TCS machinery. The distribution of TCS systems in archaea is strongly biased, with high levels of abundance in haloarchaea and thaumarchaea but none detected in the sequenced genomes from the phyla Crenarchaeota, Nanoarchaeota, and Korarchaeota. The archaeal sensor histidine kinases are generally similar to their well-studied bacterial counterparts but are often located in the cytoplasm and carry multiple PAS and/or GAF domains. In contrast, archaeal response regulators differ dramatically from the bacterial ones. Most archaeal genomes do not encode any of the major classes of bacterial response regulators, such as the DNA-binding transcriptional regulators of the OmpR/PhoB, NarL/FixJ, NtrC, AgrA/LytR, and ActR/PrrA families and the response regulators with GGDEF and/or EAL output domains. Instead, archaea encode multiple copies of response regulators containing either the stand-alone receiver (REC) domain or combinations of REC with PAS and/or GAF domains. Therefore, the prevailing mechanism of archaeal TCS signaling appears to be via a variety of protein-protein interactions, rather than direct transcriptional regulation.
IMPORTANCE: Although the Archaea represent a separate domain of life, their signaling systems have been assumed to be closely similar to the bacterial ones. A study of the domain architectures of the archaeal two-component signal transduction (TCS) machinery revealed an overall similarity of archaeal and bacterial sensory modules but substantial differences in the signal output modules. The prevailing mechanism of archaeal TCS signaling appears to involve various protein-protein interactions rather than direct transcription regulation. The complete list of histidine kinases and response regulators encoded in the analyzed archaeal genomes is available online at http://www.ncbi.nlm.nih.gov/Complete_Genomes/TCSarchaea.html.
|
6
|
Suvorova IA, Rodionov DA. Comparative genomics of pyridoxal 5'-phosphate-dependent transcription factor regulons in Bacteria. Microb Genom 2016; 2:e000047. [PMID: 28348826 PMCID: PMC5320631 DOI: 10.1099/mgen.0.000047] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 11/02/2015] [Accepted: 12/16/2015] [Indexed: 12/13/2022]
Abstract
The MocR-subfamily transcription factors (MocR-TFs) characterized by the GntR-family DNA-binding domain and aminotransferase-like sensory domain are broadly distributed among certain lineages of Bacteria. Characterized MocR-TFs bind pyridoxal 5'-phosphate (PLP) and control transcription of genes involved in PLP, gamma aminobutyric acid (GABA) and taurine metabolism via binding specific DNA operator sites. To identify putative target genes and DNA binding motifs of MocR-TFs, we performed comparative genomics analysis of over 250 bacterial genomes. The reconstructed regulons for 825 MocR-TFs comprise structural genes from over 200 protein families involved in diverse biological processes. Using the genome context and metabolic subsystem analysis we tentatively assigned functional roles for 38 out of 86 orthologous groups of studied regulators. Most of these MocR-TF regulons are involved in PLP metabolism, as well as utilization of GABA, taurine and ectoine. The remaining studied MocR-TF regulators presumably control genes encoding enzymes involved in reduction/oxidation processes, various transporters and PLP-dependent enzymes, for example aminotransferases. Predicted DNA binding motifs of MocR-TFs are generally similar in each orthologous group and are characterized by two to four repeated sequences. Identified motifs were classified according to their structures. Motifs with direct and/or inverted repeat symmetry constitute the majority of inferred DNA motifs, suggesting preferable TF dimerization in head-to-tail or head-to-head configuration. The obtained genomic collection of in silico reconstructed MocR-TF motifs and regulons in Bacteria provides a basis for future experimental characterization of molecular mechanisms for various regulators in this family.
Affiliation(s)
- Inna A. Suvorova
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Science, Moscow, Russia
| | - Dmitry A. Rodionov
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Science, Moscow, Russia
- Sanford-Burnham-Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Correspondence: D. A. Rodionov
| |
|
7
|
Gasch P, Fundinger M, Müller JT, Lee T, Bailey-Serres J, Mustroph A. Redundant ERF-VII Transcription Factors Bind to an Evolutionarily Conserved cis-Motif to Regulate Hypoxia-Responsive Gene Expression in Arabidopsis. Plant Cell 2016; 28:160-80. [PMID: 26668304 PMCID: PMC4746684 DOI: 10.1105/tpc.15.00866] [Citation(s) in RCA: 183] [Impact Index Per Article: 22.9] [Received: 10/09/2015] [Accepted: 12/01/2015] [Indexed: 05/08/2023]
Abstract
The response of Arabidopsis thaliana to low-oxygen stress (hypoxia), such as during shoot submergence or root waterlogging, includes increasing the levels of ∼50 hypoxia-responsive gene transcripts, many of which encode enzymes associated with anaerobic metabolism. Upregulation of over half of these mRNAs involves stabilization of five group VII ethylene response factor (ERF-VII) transcription factors, which are routinely degraded via the N-end rule pathway of proteolysis in an oxygen- and nitric oxide-dependent manner. Despite their importance, neither the quantitative contribution of individual ERF-VIIs nor the cis-regulatory elements they govern are well understood. Here, using single- and double-null mutants, the constitutively synthesized ERF-VIIs RELATED TO APETALA2.2 (RAP2.2) and RAP2.12 are shown to act redundantly as principal activators of hypoxia-responsive genes; constitutively expressed RAP2.3 contributes to this redundancy, whereas the hypoxia-induced HYPOXIA RESPONSIVE ERF1 (HRE1) and HRE2 play minor roles. An evolutionarily conserved 12-bp cis-regulatory motif that binds to and is sufficient for activation by RAP2.2 and RAP2.12 is identified through a comparative phylogenetic motif search, promoter dissection, yeast one-hybrid assays, and chromatin immunopurification. This motif, designated the hypoxia-responsive promoter element, is enriched in promoters of hypoxia-responsive genes in multiple species.
Affiliation(s)
- Philipp Gasch
- Plant Physiology, University Bayreuth, 95440 Bayreuth, Germany
| | | | - Jana T Müller
- Plant Physiology, University Bayreuth, 95440 Bayreuth, Germany
| | - Travis Lee
- Center for Plant Cell Biology and Botany and Plant Sciences Department, University of California, Riverside, California 92521
| | - Julia Bailey-Serres
- Center for Plant Cell Biology and Botany and Plant Sciences Department, University of California, Riverside, California 92521
| | | |
|
8
|
The Identification of Cis-Regulatory Sequence Motifs in Gene Promoters Based on SNP Information. Methods Mol Biol 2016; 1482:31-47. [PMID: 27557759 DOI: 10.1007/978-1-4939-6396-6_3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022]
Abstract
Conservation of particular molecular sequence motifs throughout evolution is a strong indicator of their functional relevance as selective pressure likely prevented the accumulation of mutations. Known as "phylogenetic footprinting", this rationale has been exploited for the identification of novel functional motifs using sequence information from sequence alignments of diverse species, in particular transcription factor binding site motifs in aligned gene promoter sequences of orthologous genes. With the rapid advances of sequencing technologies, whole genome sequence information is accumulating not only across different species, but increasingly for variants of the same species exhibiting relatively little sequence variability, primarily present as single nucleotide polymorphisms (SNPs). Here, we lay out the basic strategy for the identification of functional cis-regulatory motifs in gene promoter regions based on SNP information.
|
9
|
Suvorova IA, Korostelev YD, Gelfand MS. GntR Family of Bacterial Transcription Factors and Their DNA Binding Motifs: Structure, Positioning and Co-Evolution. PLoS One 2015; 10:e0132618. [PMID: 26151451 PMCID: PMC4494728 DOI: 10.1371/journal.pone.0132618] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 03/31/2015] [Accepted: 06/16/2015] [Indexed: 12/03/2022]
Abstract
The GntR family of transcription factors (TFs) is a large group of proteins present in diverse bacteria and regulating various biological processes. Here we use the comparative genomics approach to reconstruct regulons and identify binding motifs of regulators from three subfamilies of the GntR family, FadR, HutC, and YtrA. Using these data, we attempt to predict DNA-protein contacts by analyzing correlations between binding motifs in DNA and amino acid sequences of TFs. We identify pairs of positions with high correlation between amino acids and nucleotides for FadR, HutC, and YtrA subfamilies and show that most predicted DNA-protein interactions are quite similar in all subfamilies and conform well to the experimentally identified contacts formed by FadR from E. coli and AraR from B. subtilis. The most frequent predicted contacts in the analyzed subfamilies are Arg-G, Asn-A, Asp-C. We also analyze the divergon structure and preferred site positions relative to regulated genes in the FadR and HutC subfamilies. A single site in a divergon usually regulates both operons and is approximately in the middle of the intergenic area. Double sites are either involved in the co-operative regulation of both operons and then are in the center of the intergenic area, or each site in the pair independently regulates its own operon and tends to be near it. We also identify additional candidate TF-binding boxes near palindromic binding sites of TFs from the FadR, HutC, and YtrA subfamilies, which may play a role in the binding of additional TF-subunits.
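The abstract does not say which correlation statistic was used to pair amino acid positions with nucleotide positions. As one illustration of the general idea, the sketch below computes mutual information between one protein alignment column and one binding-site column, where entry i of each column comes from the same TF and its cognate site. Mutual information is a swapped-in measure chosen for this example, and the function name and toy data are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(aa_column, nt_column):
    """Mutual information (in bits) between paired residue/base observations.

    aa_column[i] is the amino acid at a fixed alignment position in TF i;
    nt_column[i] is the base at a fixed position of that TF's binding site.
    """
    n = len(aa_column)
    joint = Counter(zip(aa_column, nt_column))
    p_aa = Counter(aa_column)
    p_nt = Counter(nt_column)
    return sum(
        (c / n) * log2((c / n) / ((p_aa[a] / n) * (p_nt[b] / n)))
        for (a, b), c in joint.items()
    )

# Toy data: Arg always pairs with G and Asn with A, so the columns co-vary.
covarying = mutual_information("RRNN", "GGAA")
# Toy data: the base is unrelated to the residue.
independent = mutual_information("RRNN", "GAGA")
```

High-scoring position pairs are only candidates for direct contacts; with few sequences, mutual information is biased upward and would need a significance correction.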
Affiliation(s)
- Inna A. Suvorova
- Research and Training Center on Bioinformatics, Institute for Information Transmission Problems RAS (The Kharkevich Institute), Moscow, Russia
| | - Yuri D. Korostelev
- Research and Training Center on Bioinformatics, Institute for Information Transmission Problems RAS (The Kharkevich Institute), Moscow, Russia
| | - Mikhail S. Gelfand
- Research and Training Center on Bioinformatics, Institute for Information Transmission Problems RAS (The Kharkevich Institute), Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, Moscow State University, Moscow, Russia
| |
|
10
|
Leyn SA, Rodionova IA, Li X, Rodionov DA. Novel Transcriptional Regulons for Autotrophic Cycle Genes in Crenarchaeota. J Bacteriol 2015; 197:2383-91. [PMID: 25939834 PMCID: PMC4524184 DOI: 10.1128/jb.00249-15] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 03/30/2015] [Accepted: 04/29/2015] [Indexed: 12/18/2022]
Abstract
Autotrophic microorganisms are able to utilize carbon dioxide as their only carbon source, or, alternatively, many of them can grow heterotrophically on organics. Different variants of autotrophic pathways have been identified in various lineages of the phylum Crenarchaeota. Aerobic members of the order Sulfolobales utilize the hydroxypropionate-hydroxybutyrate cycle (HHC) to fix inorganic carbon, whereas anaerobic Thermoproteales use the dicarboxylate-hydroxybutyrate cycle (DHC). Knowledge of transcriptional regulation of autotrophic pathways in Archaea is limited. We applied a comparative genomics approach to predict novel autotrophic regulons in the Crenarchaeota. We report identification of two novel DNA motifs associated with the autotrophic pathway genes in the Sulfolobales (HHC box) and Thermoproteales (DHC box). Based on genome context evidence, the HHC box regulon was attributed to a novel transcription factor from the TrmB family named HhcR. Orthologs of HhcR are present in all Sulfolobales genomes but were not found in other lineages. A predicted HHC box regulatory motif was confirmed by in vitro binding assays with the recombinant HhcR protein from Metallosphaera yellowstonensis. For the DHC box regulon, we assigned a different potential regulator, named DhcR, which is restricted to the order Thermoproteales. DhcR in Thermoproteus neutrophilus (Tneu_0751) was previously identified as a DNA-binding protein with high affinity for the promoter regions of two autotrophic operons. The global HhcR and DhcR regulons reconstructed by comparative genomics were reconciled with available omics data in Metallosphaera and Thermoproteus spp. The identified regulons constitute two novel mechanisms for transcriptional control of autotrophic pathways in the Crenarchaeota.
IMPORTANCE: Little is known about transcriptional regulation of carbon dioxide fixation pathways in Archaea. We previously applied the comparative genomics approach for reconstruction of DtxR family regulons in diverse lineages of Archaea. Here, we utilize similar computational approaches to identify novel regulatory motifs for genes that are autotrophically induced in microorganisms from two lineages of Crenarchaeota and to reconstruct the respective regulons. The predicted novel regulons in archaeal genomes control the majority of autotrophic pathway genes and also other carbon and energy metabolism genes. The HhcR regulon was experimentally validated by DNA-binding assays in Metallosphaera spp. Novel regulons described for the first time in this work provide a basis for understanding the mechanisms of transcriptional regulation of autotrophic pathways in Archaea.
Affiliation(s)
- Semen A Leyn
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
| | - Irina A Rodionova
- Sanford-Burnham Medical Research Institute, La Jolla, California, USA
| | - Xiaoqing Li
- Sanford-Burnham Medical Research Institute, La Jolla, California, USA
| | - Dmitry A Rodionov
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia; Sanford-Burnham Medical Research Institute, La Jolla, California, USA
| |
|
11
|
Fernandez L, Mercader JM, Planas-Fèlix M, Torrents D. Adaptation to environmental factors shapes the organization of regulatory regions in microbial communities. BMC Genomics 2014; 15:877. [PMID: 25294412 PMCID: PMC4287501 DOI: 10.1186/1471-2164-15-877] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 02/04/2014] [Accepted: 09/24/2014] [Indexed: 11/10/2022]
Abstract
Background: It has been shown in a number of metagenomic studies that the addition and removal of specific genes have allowed microbiomes to adapt to specific environmental conditions by losing and gaining specific functions. But it is not known whether and how the regulation of gene expression also contributes to adaptation. Results: We have here characterized and analyzed the metaregulome of three different environments, as well as their impact on the adaptation to particular variable physico-chemical conditions. For this, we have developed a computational protocol to extract regulatory regions and their corresponding transcription factor binding sites directly from metagenomic reads and applied it to three well-known environments: Acid Mine, Whale Fall, and Waseca Farm. Taking the density of regulatory sites in promoters as a measure of the potential and complexity of gene regulation, we found it to be quantitatively the same in all three environments, despite their different physico-chemical conditions and species composition. However, we found that each environment distributes its regulatory potential differently across its functional space. Among the functions with highest regulatory potential in each niche, we found significant enrichment of processes related to sensing and buffering external variable factors specific to each environment, for example, the availability of co-factors in deep sea, of oligosaccharides in soil and the regulation of pH in the acid mine. Conclusions: These results highlight the potential impact of gene regulation in the adaptation of bacteria to the different habitats through the distribution of their regulatory potential among specific functions, and point to critical environmental factors that challenge the growth of any microbial community.
Affiliation(s)
| | | | | | - David Torrents
- Joint IRB-BSC program on Computational Biology, BSC, Jordi Girona, 29, 08034 Barcelona, Spain.
| |
|
12
|
|
13
|
Chen R, Peng Y, Choi B, Xu J, Hu H. A private DNA motif finding algorithm. J Biomed Inform 2014; 50:122-32. [PMID: 24412700 DOI: 10.1016/j.jbi.2013.12.016] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/25/2013] [Revised: 11/25/2013] [Accepted: 12/29/2013] [Indexed: 10/25/2022]
Abstract
With the increasing availability of genomic sequence data, numerous methods have been proposed for finding DNA motifs. The discovery of DNA motifs serves a critical step in many biological applications. However, the privacy implication of DNA analysis is normally neglected in the existing methods. In this work, we propose a private DNA motif finding algorithm in which a DNA owner's privacy is protected by a rigorous privacy model, known as ∊-differential privacy. It provides provable privacy guarantees that are independent of adversaries' background knowledge. Our algorithm makes use of the n-gram model and is optimized for processing large-scale DNA sequences. We evaluate the performance of our algorithm over real-life genomic data and demonstrate the promise of integrating privacy into DNA motif finding.
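The abstract names two ingredients, an n-gram model and ∊-differential privacy, without giving the algorithm. The sketch below is not the authors' method. It shows only the textbook Laplace mechanism applied to n-gram counts, under the stated assumption that each owner contributes one sequence and at most `cap` n-grams from it, so that adding or removing one owner changes the count vector by at most `cap` in L1 norm. All function and parameter names are hypothetical.

```python
import random
from collections import Counter
from itertools import product

def ngram_counts(seqs, n, cap):
    """Count n-grams, keeping at most `cap` n-grams per sequence.

    The cap bounds how much one owner's sequence can change the counts.
    """
    counts = Counter()
    for s in seqs:
        grams = [s[i:i + n] for i in range(len(s) - n + 1)][:cap]
        counts.update(grams)
    return counts

def noisy_ngram_counts(seqs, n, epsilon, cap, rng=random):
    """Release every possible n-gram count with Laplace(cap / epsilon) noise.

    Noise is added to zero counts too, so the set of released n-grams
    does not itself reveal which n-grams occur in the data.
    """
    counts = ngram_counts(seqs, n, cap)
    scale = cap / epsilon
    noisy = {}
    for gram in map("".join, product("ACGT", repeat=n)):
        # The difference of two i.i.d. exponentials is Laplace-distributed.
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        noisy[gram] = counts[gram] + noise
    return noisy

exact = ngram_counts(["ACGT", "ACGA"], 2, cap=10)
released = noisy_ngram_counts(["ACGT", "ACGA"], 2, epsilon=1.0, cap=10,
                              rng=random.Random(0))
```

Smaller `epsilon` or a larger `cap` means more noise, which is the privacy-utility trade-off any such scheme has to manage.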
Affiliation(s)
- Rui Chen
- Department of Computer Science, Hong Kong Baptist University, Hong Kong.
| | - Yun Peng
- School of Computer Engineering, Nanyang Technological University, Singapore.
| | - Byron Choi
- Department of Computer Science, Hong Kong Baptist University, Hong Kong.
| | - Jianliang Xu
- Department of Computer Science, Hong Kong Baptist University, Hong Kong.
| | - Haibo Hu
- Department of Computer Science, Hong Kong Baptist University, Hong Kong.
| |
|
14
|
Galperin MY, Koonin EV. Comparative Genomics Approaches to Identifying Functionally Related Genes. Algorithms for Computational Biology 2014. [DOI: 10.1007/978-3-319-07953-0_1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/03/2023]
|
15
|
Abbas MM, Abouelhoda M, Bahig HM. A hybrid method for the exact planted (l, d) motif finding problem and its parallelization. BMC Bioinformatics 2012; 13 Suppl 17:S10. [PMID: 23281969 PMCID: PMC3521218 DOI: 10.1186/1471-2105-13-s17-s10] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 12/03/2022]
Abstract
Background: Given a set of DNA sequences s_1, ..., s_t, the (l, d) motif problem is to find an l-length motif sequence M, not necessarily existing in any of the input sequences, such that for each sequence s_i, 1 ≤ i ≤ t, there is at least one subsequence differing with at most d mismatches from M. Many exact algorithms have been developed to solve the motif finding problem in the last three decades. However, the problem is still challenging and its solution is limited to small values of l and d. Results: In this paper we present a new efficient method to improve the performance of the exact algorithms for the motif finding problem. Our method is composed of two main steps: First, we process q ≤ t sequences to find candidate motifs. Second, the candidate motifs are searched in the remaining sequences. For both steps, we use the best available algorithms. Our method is a hybrid one, because it integrates currently existing algorithms to achieve the best running time. In this paper, we show how the optimal value of q is determined to achieve the best running time. Our experimental results show that there is about 24% speed-up achieved by our method compared to the best existing algorithm. Furthermore, we also present a parallel version of our method running on shared memory architecture. Our experiments show that the performance of our algorithm scales linearly with the number of processors. Using the parallel version, we were able to solve the challenging (21, 8) instance using 8 processors in 20.42 hours instead of 6.68 days of the serial version. Conclusions: Our method speeds up the solution of the exact motif problem. Our method is generic, because it can accommodate any new faster algorithm based on traditional methods. We expect that our method will help to discover longer motifs. The software we developed is available for free for academic research at http://www.nubios.nileu.edu.eg/tools/hymotif.
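The problem definition and the two-step outline in this abstract can be made concrete with a small exhaustive search. This is a sketch only: it copies the two-step shape (candidates from the first q sequences, then verification in the rest) but uses naive enumeration of all 4^l strings in place of the specialised exact algorithms the paper combines, so it is practical only for very small l. The function names are hypothetical.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def occurs_within(motif, seq, d):
    """True if some window of seq differs from motif in at most d positions."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))

def planted_motifs(seqs, l, d, q=None):
    """Return every l-mer that occurs with at most d mismatches in each sequence."""
    q = len(seqs) if q is None else q
    # Step 1: candidate motifs consistent with the first q sequences.
    candidates = [m for m in map("".join, product("ACGT", repeat=l))
                  if all(occurs_within(m, s, d) for s in seqs[:q])]
    # Step 2: keep only candidates that also occur in the remaining sequences.
    return [m for m in candidates
            if all(occurs_within(m, s, d) for s in seqs[q:])]

seqs = ["AACGT", "ACGTT", "GACGT"]
exact_hits = planted_motifs(seqs, l=4, d=0, q=2)
```

The choice of q changes only the running time, not the result: a small q gives a cheap first step but many candidates to verify, which is the trade-off the abstract says the authors optimise.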
Affiliation(s)
- Mostafa M Abbas
- Department of Basic Sciences, Faculty of Engineering, Sinai University, El-Arish, Egypt.
|
16
|
Sernova NV, Gelfand MS. Comparative genomics of CytR, an unusual member of the LacI family of transcription factors. PLoS One 2012; 7:e44194. [PMID: 23028500 PMCID: PMC3454398 DOI: 10.1371/journal.pone.0044194] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2012] [Accepted: 07/30/2012] [Indexed: 11/19/2022] Open
Abstract
CytR is a transcription regulator from the LacI family, present in some gamma-proteobacteria including Escherichia coli and known not only for its cellular role, control of transport and utilization of nucleosides, but also for a number of unusual structural properties. The present study addressed three related problems: the structure of CytR-binding sites and motifs, their evolutionary conservation, and the identification of new members of the CytR regulon. While the majority of CytR-binding sites are imperfect inverted repeats situated between binding sites for another transcription factor, CRP, other architectures were observed, in particular, direct repeats. While the similarity between sites for different genes in one genome is rather low, and hence the consensus motif is weak, there is high conservation of orthologous sites in different genomes (mainly in the Enterobacteriales), arguing for the presence of specific CytR-DNA contacts. Over larger evolutionary distances, candidate CytR sites may migrate, but the approximate distance between flanking CRP sites tends to be conserved, which demonstrates that the overall structure of the CRP-CytR-DNA complex is gene-specific. The analysis yielded candidate CytR-binding sites for orthologs of known regulon members in less studied genomes of the Enterobacteriales and Vibrionales and identified a new candidate member of the CytR regulon, encoding a transporter named NupT (YcdZ).
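The distinction this abstract draws between imperfect inverted repeats and direct repeats can be made concrete with a short sketch. It is illustrative only: the half-site length, the helper names and the example sequences are assumptions, and the sequences are invented rather than actual CytR sites.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeat_mismatches(site, half_len):
    """Mismatches between the left half-site and the reverse complement of
    the right half-site; 0 means a perfect inverted repeat."""
    left, right = site[:half_len], site[-half_len:]
    return sum(a != b for a, b in zip(left, reverse_complement(right)))


def direct_repeat_mismatches(site, half_len):
    """Mismatches between the left and right half-sites read in the same
    orientation; 0 means a perfect direct repeat."""
    left, right = site[:half_len], site[-half_len:]
    return sum(a != b for a, b in zip(left, right))


# Invented 13-bp sites: two 5-bp half-sites separated by a 3-bp spacer.
print(inverted_repeat_mismatches("TGCAAAAATTGCA", 5))  # perfect inverted repeat
print(inverted_repeat_mismatches("TGCAAAAATTGCG", 5))  # one mismatch: imperfect
print(direct_repeat_mismatches("TGCAAAAATGCAA", 5))    # perfect direct repeat
```

A site would count as an imperfect inverted repeat under this sketch when the mismatch count is small but nonzero; the threshold is a modelling choice that the abstract does not specify.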
Affiliation(s)
- Natalia V. Sernova
- A.A.Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences (IITP RAS), Moscow, Russia
| | - Mikhail S. Gelfand
- A.A.Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences (IITP RAS), Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, M.V.Lomonosov Moscow State University, Moscow, Russia
|
17
|
Lim CK, Hassan KA, Penesyan A, Loper JE, Paulsen IT. The effect of zinc limitation on the transcriptome of Pseudomonas protegens Pf-5. Environ Microbiol 2012; 15:702-15. [DOI: 10.1111/j.1462-2920.2012.02849.x] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2012] [Revised: 07/09/2012] [Accepted: 07/21/2012] [Indexed: 02/03/2023]
Affiliation(s)
- Chee Kent Lim
- Department of Chemistry and Biomolecular Sciences; Macquarie University; Sydney; NSW; Australia
| | - Karl A. Hassan
- Department of Chemistry and Biomolecular Sciences; Macquarie University; Sydney; NSW; Australia
| | - Anahit Penesyan
- Department of Chemistry and Biomolecular Sciences; Macquarie University; Sydney; NSW; Australia
| | - Joyce E. Loper
- USDA-ARS Horticultural Crops Research Laboratory and Department of Botany and Plant Pathology; Oregon State University; Corvallis; OR; USA
| | - Ian T. Paulsen
- Department of Chemistry and Biomolecular Sciences; Macquarie University; Sydney; NSW; Australia
|
18
|
Kim D, Hong JSJ, Qiu Y, Nagarajan H, Seo JH, Cho BK, Tsai SF, Palsson BØ. Comparative analysis of regulatory elements between Escherichia coli and Klebsiella pneumoniae by genome-wide transcription start site profiling. PLoS Genet 2012; 8:e1002867. [PMID: 22912590 PMCID: PMC3415461 DOI: 10.1371/journal.pgen.1002867] [Citation(s) in RCA: 111] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2012] [Accepted: 06/14/2012] [Indexed: 01/08/2023] Open
Abstract
Genome-wide transcription start site (TSS) profiles of the enterobacteria Escherichia coli and Klebsiella pneumoniae were experimentally determined through modified 5′ RACE followed by deep sequencing of intact primary mRNA. This identified 3,746 and 3,143 TSSs for E. coli and K. pneumoniae, respectively. Experimentally determined TSSs were then used to define promoter regions and 5′ UTRs upstream of coding genes. Comparative analysis of these regulatory elements revealed the use of multiple TSSs, identical sequence motifs of promoter and Shine-Dalgarno sequence, reflecting conserved gene expression apparatuses between the two species. In both species, over 70% of primary transcripts were expressed from operons having orthologous genes during exponential growth. However, expressed orthologous genes in E. coli and K. pneumoniae showed a strikingly different organization of upstream regulatory regions with only 20% identical promoters with TSSs in both species. Over 40% of promoters had TSSs identified in only one species, despite conserved promoter sequences existing in the other species. 662 conserved promoters having TSSs in both species resulted in the same number of comparable 5′ UTR pairs, and that regulatory element was found to be the most variant region in sequence among promoter, 5′ UTR, and ORF. In K. pneumoniae, 48 sRNAs were predicted and 36 of them were expressed during exponential growth. Among them, 34 orthologous sRNAs between two species were analyzed in depth, and the analysis showed that many sRNAs of K. pneumoniae, including pleiotropic sRNAs such as rprA, arcZ, and sgrS, may work in the same way as in E. coli. These results reveal a new dimension of comparative genomics such that a comparison of two genomes needs to be comprehensive over all levels of genome organization. 
To investigate similarities and differences between closely related species, most comparative genomics studies compare the gene content that is shared by or specific to each genome. However, it is also important to investigate the differences in non-coding regulatory elements because they influence the transcriptional and post-transcriptional processes. Thus, we performed a genome-wide profiling of transcription start sites (TSSs) in two species, E. coli K-12 MG1655 and K. pneumoniae MGH78578. Experimental identification of TSSs is important for precise definition of promoter regions and 5′ untranslated regions upstream of coding genes. Comparative analysis of these regulatory elements revealed the use of multiple TSSs and identical sequence motifs for the promoter and the Shine-Dalgarno sequence. However, we observed that the upstream regulatory regions of the majority of operons having orthologous genes were organized with different usage of promoters and TSSs, resulting in diverse and complex gene regulation. We also found that the 5′ UTR is the least conserved regulatory element in sequence between the two species. Moreover, 34 orthologous sRNAs between E. coli and K. pneumoniae were analyzed in depth. The analysis suggested that many K. pneumoniae sRNAs might regulate their target genes as in E. coli.
Affiliation(s)
- Donghyuk Kim
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
| | - Jay Sung-Joong Hong
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
| | - Yu Qiu
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
| | - Harish Nagarajan
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
| | - Joo-Hyun Seo
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
| | - Byung-Kwan Cho
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
| | - Shih-Feng Tsai
- Division of Molecular and Genomic Medicine, National Health Research Institutes, Miaoli, Taiwan
| | - Bernhard Ø. Palsson
- Department of Bioengineering, University of California San Diego, La Jolla, California, United States of America
|
19
|
Wilbanks EG, Larsen DJ, Neches RY, Yao AI, Wu CY, Kjolby RAS, Facciotti MT. A workflow for genome-wide mapping of archaeal transcription factors with ChIP-seq. Nucleic Acids Res 2012; 40:e74. [PMID: 22323522 PMCID: PMC3378898 DOI: 10.1093/nar/gks063] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open
Abstract
Deciphering the structure of gene regulatory networks across the tree of life remains one of the major challenges in postgenomic biology. We present a novel ChIP-seq workflow for the archaea using the model organism Halobacterium salinarum sp. NRC-1 and demonstrate its application for mapping the genome-wide binding sites of natively expressed transcription factors. This end-to-end pipeline is the first protocol for ChIP-seq in archaea, with methods and tools for each stage from gene tagging to data analysis and biological discovery. Genome-wide binding sites for transcription factors with many binding sites (TfbD) are identified with sensitivity, while retaining specificity in the identification of the smaller regulons (bacteriorhodopsin-activator protein). Chromosomal tagging of target proteins with a compact epitope facilitates a standardized and cost-effective workflow that is compatible with high-throughput immunoprecipitation of natively expressed transcription factors. The Pique package, an open-source bioinformatics method, is presented for identification of binding events. Relative to ChIP-Chip and qPCR, this workflow offers a robust catalog of protein–DNA binding events with improved spatial resolution and significantly decreased cost. While this study focuses on the application of ChIP-seq in H. salinarum sp. NRC-1, our workflow can also be adapted for use in other archaea and bacteria with basic genetic tools.
Affiliation(s)
- Elizabeth G Wilbanks
- University of California Davis, Department of Biomedical Engineering and Genome Center, One Shields Avenue, Davis, CA 95616, USA.
|
20
|
Bordron P, Eveillard D, Rusu I. Integrated analysis of the gene neighbouring impact on bacterial metabolic networks. IET Syst Biol 2011; 5:261-8. [PMID: 21823757 DOI: 10.1049/iet-syb.2010.0070] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
Different levels of abstraction are needed to represent a living system. Unfortunately, information of different natures cannot be superposed in an obvious way and instead requires a dedicated framework. Because biological abstractions, i.e., genomic or metabolic information, can be easily represented as graphs, it is intuitive to integrate them into a single graph, in which one can perform graph analysis for investigating a given biological assumption. This study follows such a philosophy and completes a genome and metabolome combination. Within such an integrated framework, and as an illustration, we applied a graph analysis that automatically investigates the impact of gene adjacency to predict functional relationships between genes and reactions. Our approach, called SIPPER, creates a weighted graph, in which the weights rely on the given relationship between genes, and computes (alternative) chains of reactions catalysed by genes. This method, as a generalisation of methods already published, can be easily adapted to several biological assumptions, properties or measures. This paper evaluates SIPPER on Escherichia coli. We automatically extract subgraphs, called k-SIPs, and quantify their interest in both genomic and metabolic contexts by showing functional compounds like operons or functional modules.
Affiliation(s)
- P Bordron
- Université de Nantes, Computational Biology Group (ComBi) - LINA, CNRS UMR 6241, Nantes, France.
|
21
|
Rodionov DA, Novichkov PS, Stavrovskaya ED, Rodionova IA, Li X, Kazanov MD, Ravcheev DA, Gerasimova AV, Kazakov AE, Kovaleva GY, Permina EA, Laikova ON, Overbeek R, Romine MF, Fredrickson JK, Arkin AP, Dubchak I, Osterman AL, Gelfand MS. Comparative genomic reconstruction of transcriptional networks controlling central metabolism in the Shewanella genus. BMC Genomics 2011; 12 Suppl 1:S3. [PMID: 21810205 PMCID: PMC3223726 DOI: 10.1186/1471-2164-12-s1-s3] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Background: Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in bacteria is one of the critical tasks of modern genomics. The Shewanella genus comprises metabolically versatile gamma-proteobacteria, whose lifestyles and natural environments are substantially different from those of Escherichia coli and other model bacterial species. Comparative genomics approaches and computational identification of regulatory sites are useful for the in silico reconstruction of transcriptional regulatory networks in bacteria. Results: To explore conservation and variations in the Shewanella transcriptional networks, we analyzed the repertoire of transcription factors and performed genomics-based reconstruction and comparative analysis of regulons in 16 Shewanella genomes. The inferred regulatory network includes 82 transcription factors and their DNA binding sites, 8 riboswitches and 6 translational attenuators. Forty-five regulons were newly inferred from the genome context analysis, whereas others were propagated from previously characterized regulons in the Enterobacteria and Pseudomonas spp. Multiple variations in regulatory strategies between the Shewanella spp. and E. coli include regulon contraction and expansion (as in the case of PdhR, HexR, FadR), numerous cases of recruiting non-orthologous regulators to control equivalent pathways (e.g. PsrA for fatty acid degradation) and, conversely, orthologous regulators to control distinct pathways (e.g. TyrR, ArgR, Crp). Conclusions: We tentatively defined the first reference collection of ~100 transcriptional regulons in 16 Shewanella genomes. The resulting regulatory network contains ~600 regulated genes per genome that are mostly involved in metabolism of carbohydrates, amino acids, fatty acids, vitamins, metals, and stress responses. Several reconstructed regulons including NagR for N-acetylglucosamine catabolism were experimentally validated in S. oneidensis MR-1. Analysis of correlations in gene expression patterns helps to interpret the reconstructed regulatory network. The inferred regulatory interactions will provide additional regulatory constraints for an integrated model of metabolism and regulation in S. oneidensis MR-1.
Affiliation(s)
- Dmitry A Rodionov
- Sanford-Burnham Medical Research Institute, La Jolla, California, USA.
|
22
|
Comparative genomic analysis of the hexuronate metabolism genes and their regulation in gammaproteobacteria. J Bacteriol 2011; 193:3956-63. [PMID: 21622752 DOI: 10.1128/jb.00277-11] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
The hexuronate metabolism in Escherichia coli is regulated by two related transcription factors from the FadR subfamily of the GntR family, UxuR and ExuR. UxuR controls the d-glucuronate metabolism, while ExuR represses genes involved in the metabolism of all hexuronates. We use a comparative genomics approach to reconstruct the hexuronate metabolic pathways and transcriptional regulons in gammaproteobacteria. We demonstrate differences in the binding motifs of UxuR and ExuR, identify new candidate members of the UxuR/ExuR regulons, and describe the links between the UxuR/ExuR regulons and the adjacent regulons UidR, KdgR, and YjjM. We provide experimental evidence that two predicted members of the UxuR regulon, yjjM and yjjN, are the subject of complex regulation by this transcription factor in E. coli.
|
23
|
Rodionov DA, Yang C, Li X, Rodionova IA, Wang Y, Obraztsova AY, Zagnitko OP, Overbeek R, Romine MF, Reed S, Fredrickson JK, Nealson KH, Osterman AL. Genomic encyclopedia of sugar utilization pathways in the Shewanella genus. BMC Genomics 2010; 11:494. [PMID: 20836887 PMCID: PMC2996990 DOI: 10.1186/1471-2164-11-494] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2010] [Accepted: 09/13/2010] [Indexed: 11/16/2022] Open
Abstract
Background: Carbohydrates are a primary source of carbon and energy for many bacteria. Accurate projection of known carbohydrate catabolic pathways across diverse bacteria with complete genomes constitutes a substantial challenge due to frequent variations in components of these pathways. To address a practically and fundamentally important challenge of reconstruction of carbohydrate utilization machinery in any microorganism directly from its genomic sequence, we combined a subsystems-based comparative genomic approach with experimental validation of selected bioinformatic predictions by a combination of biochemical, genetic and physiological experiments. Results: We applied this integrated approach to systematically map carbohydrate utilization pathways in 19 genomes from the Shewanella genus. The obtained genomic encyclopedia of sugar utilization includes ~170 protein families (mostly metabolic enzymes, transporters and transcriptional regulators) spanning 17 distinct pathways with a mosaic distribution across Shewanella species, providing insights into their ecophysiology and adaptive evolution. Phenotypic assays revealed a remarkable consistency between the predicted and observed phenotype (the ability to utilize an individual sugar as a sole source of carbon and energy) over the entire matrix of tested strains and sugars. Comparison of the reconstructed catabolic pathways with E. coli identified multiple differences that are manifested at various levels, from the presence or absence of certain sugar catabolic pathways, nonorthologous gene replacements and alternative biochemical routes to a different organization of transcription regulatory networks. Conclusions: The reconstructed sugar catabolome in Shewanella spp. includes 62 novel isofunctional families of enzymes, transporters, and regulators. In addition to improving our knowledge of genomics and functional organization of carbohydrate utilization in Shewanella, this study led to a substantial expansion of our current version of the Genomic Encyclopedia of Carbohydrate Utilization. A systematic and iterative application of this approach to multiple taxonomic groups of bacteria will further enhance it, creating a knowledge base adequate for the efficient analysis of any newly sequenced genome as well as of the emerging metagenomic data.
Affiliation(s)
- Dmitry A Rodionov
- Burnham Institute for Medical Research, La Jolla, California 92037, USA
|
24
|
Sahota G, Stormo GD. Novel sequence-based method for identifying transcription factor binding sites in prokaryotic genomes. Bioinformatics 2010; 26:2672-7. [PMID: 20807838 DOI: 10.1093/bioinformatics/btq501] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Computational techniques for microbial genomic sequence analysis are becoming increasingly important. With next-generation sequencing technology and the human microbiome project underway, current sequencing capacity is significantly greater than the speed at which organisms of interest can be studied experimentally. Most related computational work has been focused on sequence assembly, gene annotation and metabolic network reconstruction. We have developed a method that primarily uses available sequence data to determine prokaryotic transcription factor (TF) binding specificities. RESULTS: Specificity determining residues (critical residues) were identified from crystal structures of DNA-protein complexes and TFs with the same critical residues were grouped into specificity classes. The putative binding regions for each class were defined as the set of promoters for each TF itself (autoregulatory) and the immediately upstream and downstream operons. MEME was used to find putative motifs within each separate class. Tests on the LacI and TetR TF families, using RegulonDB annotated sites, showed prediction sensitivities of 86% and 80%, respectively. AVAILABILITY: http://ural.wustl.edu/~gsahota/HTHmotif/
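The grouping step in this abstract, in which TFs with the same critical residues are placed in one specificity class, can be sketched as a simple dictionary grouping. The sketch assumes the TF sequences are already aligned so that the same column index refers to the same structural position. The function name, the toy sequences and the chosen positions are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def specificity_classes(aligned_tfs, critical_positions):
    """Group TFs by the residues found at the given alignment columns.

    aligned_tfs: dict mapping TF name -> aligned protein sequence
    critical_positions: 0-based column indices of the critical residues
    Returns a dict mapping residue signature -> list of TF names.
    """
    classes = defaultdict(list)
    for name, seq in aligned_tfs.items():
        signature = "".join(seq[p] for p in critical_positions)
        classes[signature].append(name)
    return dict(classes)


# Toy data (made up): columns 1 and 2 play the role of critical residues.
toy_tfs = {"TF_A": "MKRLS", "TF_B": "MKRVT", "TF_C": "MKQLS"}
print(specificity_classes(toy_tfs, [1, 2]))
```

In the workflow the abstract describes, the promoters associated with each class would then be pooled and passed to a motif finder such as MEME; that step is not shown here.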
Affiliation(s)
- Gurmukh Sahota
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO 63108, USA
|
25
|
Novichkov PS, Rodionov DA, Stavrovskaya ED, Novichkova ES, Kazakov AE, Gelfand MS, Arkin AP, Mironov AA, Dubchak I. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach. Nucleic Acids Res 2010; 38:W299-307. [PMID: 20542910 PMCID: PMC2896116 DOI: 10.1093/nar/gkq531] [Citation(s) in RCA: 113] [Impact Index Per Article: 8.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022] Open
Abstract
The RegPredict web server provides comparative genomics tools for the reconstruction and analysis of microbial regulons. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. Two major workflows currently implemented in RegPredict are: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences and on ortholog and operon predictions from MicrobesOnline. The interactive web interface of RegPredict integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov.
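Workflow (i), regulon reconstruction for a known regulatory motif, rests on scoring candidate sites against a positional weight matrix. The sketch below shows one common way to do this: log-odds weights with a pseudocount and a uniform background. It does not use RegPredict's code or API. The function names, the pseudocount, the background frequency and the example sites are all assumptions made for illustration.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds positional weight matrix from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(pwm)
    return [
        (start, sum(pwm[j][sequence[start + j]] for j in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


# Invented training sites and target sequence, for illustration only.
known_sites = ["TGTGA", "TGTGA", "TGTGC", "TGAGA"]
pwm = build_pwm(known_sites)
best_start, best_score = max(scan("CCCTGTGACCC", pwm), key=lambda hit: hit[1])
print(best_start, round(best_score, 2))
```

A regulon reconstruction would then keep windows scoring above a chosen threshold in the upstream regions of candidate genes; choosing that threshold is left out of this sketch.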
|
26
|
Kendall SL, Movahedzadeh F, Wietzorrek A, Stoker NG. Microarray analysis of bacterial gene expression: towards the regulome. Comp Funct Genomics 2010; 3:352-4. [PMID: 18629272 PMCID: PMC2448436 DOI: 10.1002/cfg.193] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2002] [Accepted: 06/10/2002] [Indexed: 12/17/2022] Open
Abstract
Microarray technology allows co-regulated genes to be identified. In order to identify genes that are controlled by specific regulators, gene expression can be compared in mutant and wild-type bacteria. However, there are a number of pitfalls with this approach; in particular, the regulator may not be active under the conditions in which the wild-type strain is cultured. Once co-regulated genes have been identified, protein-binding motifs can be identified. By combining these data with a map of promoters, or operons (the operome), the regulatory networks in the cell (the regulome) can start to be built up.
Affiliation(s)
- Sharon L Kendall
- Department of Pathology and Infectious Diseases, Royal Veterinary College, Royal College Street, London NW1 0TU, UK.
|
27
|
Gu Y, Ding Y, Ren C, Sun Z, Rodionov DA, Zhang W, Yang S, Yang C, Jiang W. Reconstruction of xylose utilization pathway and regulons in Firmicutes. BMC Genomics 2010; 11:255. [PMID: 20406496 PMCID: PMC2873477 DOI: 10.1186/1471-2164-11-255] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2010] [Accepted: 04/21/2010] [Indexed: 11/10/2022] Open
Abstract
Background: Many Firmicutes bacteria, including solvent-producing clostridia such as Clostridium acetobutylicum, are able to utilize xylose, an abundant carbon source in nature. Nevertheless, homology searches failed to recognize all the genes for the complete xylose and xyloside utilization pathway in most of them. Moreover, the regulatory mechanisms of xylose catabolism in many Firmicutes other than Bacillus spp. remained unclear. Results: A comparative genomic approach was used to reconstruct the xylose and xyloside utilization pathway and analyze its regulatory mechanisms in 24 genomes of the Firmicutes. A novel xylose isomerase that is not homologous to previously characterized xylose isomerases was identified in C. acetobutylicum and several other Clostridia species. The candidate genes for the xylulokinase, xylose transporters, and the transcriptional regulator of xylose metabolism (XylR) were unambiguously assigned in all of the analyzed species based on the analysis of conserved chromosomal gene clustering and regulons. The predicted functions of these genes in C. acetobutylicum were experimentally confirmed through a combination of genetic and biochemical techniques. XylR regulons were reconstructed by identification and comparative analysis of XylR-binding sites upstream of xylose and xyloside utilization genes. A novel XylR-binding DNA motif, which is exceptionally distinct from the DNA motif known for Bacillus XylR, was identified in three Clostridiales species and experimentally validated in C. acetobutylicum by an electrophoretic mobility shift assay. Conclusions: This study provided comprehensive insights into xylose catabolism and its regulation in diverse Firmicutes bacteria, especially Clostridia species, and paved the way for improving the xylose utilization capability of C. acetobutylicum by genetic engineering in the future.
Affiliation(s)
- Yang Gu
- Key Laboratory of Synthetic Biology, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
|
28
|
The effect of orthology and coregulation on detecting regulatory motifs. PLoS One 2010; 5:e8938. [PMID: 20140085 PMCID: PMC2815771 DOI: 10.1371/journal.pone.0008938] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2009] [Accepted: 01/05/2010] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows integrating orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well adapted towards using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. METHODOLOGY: We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of the motif detection algorithms with an evolutionary model, performed compared to MEME, a more classical motif detection algorithm that treats orthologs independently. RESULTS AND CONCLUSIONS: Under certain conditions, detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights into this relationship provide information useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE.
|
29
|
Zare-Mirakabad F, Ahrabian H, Sadeghi M, Hashemifar S, Nowzari-Dalini A, Goliaei B. Genetic algorithm for dyad pattern finding in DNA sequences. Genes Genet Syst 2009; 84:81-93. [DOI: 10.1266/ggs.84.81] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open
Affiliation(s)
- Fatemeh Zare-Mirakabad
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran
| | - Hayedeh Ahrabian
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran
- Center of Excellence in Biomathematics, School of Mathematics, Statistics, and Computer Science, University of Tehran
| | - Mehdi Sadeghi
- National Institute of Genetic Engineering and Biotechnology
- School of Computer Science, Institute for Studies in Theoretical Physics and Mathematics (IPM)
| | - Somaieh Hashemifar
- Center of Excellence in Biomathematics, School of Mathematics, Statistics, and Computer Science, University of Tehran
| | - Abbas Nowzari-Dalini
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran
- Center of Excellence in Biomathematics, School of Mathematics, Statistics, and Computer Science, University of Tehran
| | - Bahram Goliaei
- Department of Bioinformatics, Institute of Biochemistry and Biophysics, University of Tehran
|
30
|
Li E, Reich CI, Olsen GJ. A whole-genome approach to identifying protein binding sites: promoters in Methanocaldococcus (Methanococcus) jannaschii. Nucleic Acids Res 2008; 36:6948-58. [PMID: 18981048 PMCID: PMC2602779 DOI: 10.1093/nar/gkm499] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
We have adapted an electrophoretic mobility shift assay (EMSA) to isolate genomic DNA fragments that bind the archaeal transcription initiation factors TATA-binding protein (TBP) and transcription factor B (TFB) to perform a genome-wide search for promoters. Mobility-shifted fragments were cloned, tested for their ability to compete with known promoter-containing fragments for a limited concentration of transcription factors, and sequenced. We applied the method to search for promoters in the genome of Methanocaldococcus jannaschii. Selection was most efficient for promoters of tRNA genes and genes for several presumed small non-coding RNAs (ncRNA). Protein-coding gene promoters were dramatically underrepresented relative to their frequency in the genome. The repeated isolation of these genomic regions was partially rectified by including a hybridization-based screening. Sequence alignment of the affinity-selected promoters revealed previously identified TATA box, BRE, and the putative initiator element. In addition, the conserved bases immediately upstream and downstream of the BRE and TATA box suggest that the composition and structure of archaeal natural promoters are more complicated.
Affiliation(s)
- Enhu Li
- Division of Biology, California Institute of Technology, Pasadena, CA 91125, USA
|
31
|
Abstract
The specific and tightly controlled transport of numerous nutrients and metabolites across cellular membranes is crucial to all forms of life. However, many of the transporter proteins involved have yet to be identified, including the vitamin transporters in various human pathogens, whose growth depends strictly on vitamin uptake. Comparative analysis of the ever-growing collection of microbial genomes coupled with experimental validation enables the discovery of such transporters. Here, we used this approach to discover an abundant class of vitamin transporters in prokaryotes with an unprecedented architecture. These transporters have energy-coupling modules comprised of a conserved transmembrane protein and two nucleotide binding proteins similar to those of ATP binding cassette (ABC) transporters, but unlike ABC transporters, they use small integral membrane proteins to capture specific substrates. We identified 21 families of these substrate capture proteins, each with a different specificity predicted by genome context analyses. Roughly half of the substrate capture proteins (335 cases) have a dedicated energizing module, but in 459 cases distributed among almost 100 gram-positive bacteria, including numerous human pathogens, different and unrelated substrate capture proteins share the same energy-coupling module. The shared use of energy-coupling modules was experimentally confirmed for folate, thiamine, and riboflavin transporters. We propose the name energy-coupling factor transporters for the new class of membrane transporters.
|
32
|
Comparative genomics of regulation of fatty acid and branched-chain amino acid utilization in proteobacteria. J Bacteriol 2008; 191:52-64. [PMID: 18820024 DOI: 10.1128/jb.01175-08] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
Bacteria can use branched-chain amino acids (ILV, i.e., isoleucine, leucine, valine) and fatty acids (FAs) as sole carbon and energy sources converting ILV into acetyl-coenzyme A (CoA), propanoyl-CoA, and propionyl-CoA, respectively. In this work, we used the comparative genomic approach to identify candidate transcriptional factors and DNA motifs that control ILV and FA utilization pathways in proteobacteria. The metabolic regulons were characterized based on the identification and comparison of candidate transcription factor binding sites in groups of phylogenetically related genomes. The reconstructed ILV/FA regulatory network demonstrates considerable variability and involves six transcriptional factors from the MerR, TetR, and GntR families binding to 11 distinct DNA motifs. The ILV degradation genes in gamma- and betaproteobacteria are regulated mainly by a novel regulator from the MerR family (e.g., LiuR in Pseudomonas aeruginosa) (40 species); in addition, the TetR-type regulator LiuQ was identified in some betaproteobacteria (eight species). Besides the core set of ILV utilization genes, the LiuR regulon in some lineages is expanded to include genes from other metabolic pathways, such as the glyoxylate shunt and glutamate synthase in Shewanella species. The FA degradation genes are controlled by four regulators including FadR in gammaproteobacteria (34 species), PsrA in gamma- and betaproteobacteria (45 species), FadP in betaproteobacteria (14 species), and LiuR orthologs in alphaproteobacteria (22 species). The remarkable variability of the regulatory systems associated with the FA degradation pathway is discussed from functional and evolutionary points of view.
|
33
|
Rodionov DA, Li X, Rodionova IA, Yang C, Sorci L, Dervyn E, Martynowski D, Zhang H, Gelfand MS, Osterman AL. Transcriptional regulation of NAD metabolism in bacteria: genomic reconstruction of NiaR (YrxA) regulon. Nucleic Acids Res 2008; 36:2032-46. [PMID: 18276644 PMCID: PMC2330245 DOI: 10.1093/nar/gkn046] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
A comparative genomic approach was used to reconstruct transcriptional regulation of NAD biosynthesis in bacteria containing orthologs of the Bacillus subtilis gene yrxA, a previously identified niacin-responsive repressor of NAD de novo synthesis. Members of the YrxA family (re-named here NiaR) are broadly conserved in the Bacillus/Clostridium group and in the deeply branching Fusobacteria and Thermotogales lineages. We analyzed upstream regions of genes associated with NAD biosynthesis to identify candidate NiaR-binding DNA motifs and assess the NiaR regulon content in these species. Representatives of the two distinct types of candidate NiaR-binding sites, characteristic of the Firmicutes and Thermotogales, were verified by an electrophoretic mobility shift assay. In addition to transcriptional control of the nadABC genes, the NiaR regulon in some species extends to niacin salvage (the pncAB genes) and includes uncharacterized membrane proteins possibly involved in niacin transport. The involvement in niacin uptake proposed for one of these proteins (re-named NiaP), encoded by the B. subtilis gene yceI, was experimentally verified. In addition to bacteria, members of the NiaP family are conserved in multicellular eukaryotes, including humans, pointing to possible NiaP involvement in niacin utilization in these organisms. Overall, the analysis of the NiaR and NrtR regulons (described in the accompanying paper) revealed mechanisms of transcriptional regulation of NAD metabolism in nearly a hundred diverse bacteria.
|
34
|
Rodionov DA, De Ingeniis J, Mancini C, Cimadamore F, Zhang H, Osterman AL, Raffaelli N. Transcriptional regulation of NAD metabolism in bacteria: NrtR family of Nudix-related regulators. Nucleic Acids Res 2008; 36:2047-59. [PMID: 18276643 PMCID: PMC2330246 DOI: 10.1093/nar/gkn047] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022] Open
Abstract
A novel family of transcription factors responsible for regulation of various aspects of NAD synthesis in a broad range of bacteria was identified by a comparative genomics approach. Regulators of this family (here termed NrtR for Nudix-related transcriptional regulators), currently annotated as ADP-ribose pyrophosphatases from the Nudix family, are composed of an N-terminal Nudix-like effector domain and a C-terminal DNA-binding HTH-like domain. NrtR regulons were reconstructed in diverse bacterial genomes by identification and comparative analysis of NrtR-binding sites upstream of genes involved in NAD biosynthetic pathways. The candidate NrtR-binding DNA motifs showed significant variability between microbial lineages, although a common consensus sequence could be traced for most of them. Bioinformatics predictions were experimentally validated by gel mobility shift assays for two NrtR family representatives. ADP-ribose, the product of glycohydrolytic cleavage of NAD, was found to suppress the in vitro binding of NrtR proteins to their DNA target sites. In addition to a major role in the direct regulation of NAD homeostasis, some members of the NrtR family appear to have been recruited for the regulation of other metabolic pathways, including sugar pentoses utilization and biogenesis of phosphoribosyl pyrophosphate. This work and the accompanying study of the NiaR regulon demonstrate significant variability of regulatory strategies for control of the NAD metabolic pathway in bacteria.
|
35
|
TrpY regulation of trpB2 transcription in Methanothermobacter thermautotrophicus. J Bacteriol 2008; 190:2637-41. [PMID: 18263726 DOI: 10.1128/jb.01926-07] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
TrpY binds specifically to TRP box sequences upstream of trpB2, but the repression of trpB2 transcription requires additional TrpY assembly that is stimulated by but not dependent on the presence of tryptophan. Inhibitory complex formation is prevented by insertions within the regulatory region and by a G149R substitution in TrpY, even though TrpY(G149R) retains both TRP box DNA- and tryptophan-binding abilities.
|
36
|
SIGffRid: a tool to search for sigma factor binding sites in bacterial genomes using comparative approach and biologically driven statistics. BMC Bioinformatics 2008; 9:73. [PMID: 18237374 PMCID: PMC2375139 DOI: 10.1186/1471-2105-9-73] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2007] [Accepted: 01/31/2008] [Indexed: 11/10/2022] Open
Abstract
Background Many programs have been developed to identify transcription factor binding sites. However, most of them are not able to infer two-word motifs with variable spacer lengths. This case is encountered for RNA polymerase Sigma (σ) Factor Binding Sites (SFBSs) usually composed of two boxes, called -35 and -10 in reference to the transcription initiation point. Our goal is to design an algorithm detecting SFBS by using combinational and statistical constraints deduced from biological observations. Results We describe a new approach to identify SFBSs by comparing two related bacterial genomes. The method, named SIGffRid (SIGma Factor binding sites Finder using R'MES to select Input Data), performs a simultaneous analysis of pairs of promoter regions of orthologous genes. SIGffRid uses a prior identification of over-represented patterns in whole genomes as selection criteria for potential -35 and -10 boxes. These patterns are then grouped using pairs of short seeds (of which one is possibly gapped), allowing a variable-length spacer between them. Next, the motifs are extended guided by statistical considerations, a feature that ensures a selection of motifs with statistically relevant properties. We applied our method to the pair of related bacterial genomes of Streptomyces coelicolor and Streptomyces avermitilis. Cross-check with the well-defined SFBSs of the SigR regulon in S. coelicolor is detailed, validating the algorithm. SFBSs for HrdB and BldN were also found; and the results suggested some new targets for these σ factors. In addition, consensus motifs for BldD and new SFBSs binding sites were defined, overlapping previously proposed consensuses. Relevant tests were carried out also on bacteria with moderate GC content (i.e. Escherichia coli/Salmonella typhimurium and Bacillus subtilis/Bacillus licheniformis pairs). Motifs of house-keeping σ factors were found as well as other SFBSs such as that of SigW in Bacillus strains. 
Conclusion We demonstrate that our approach, combining statistical and biological criteria, successfully predicts SFBSs. The method's versatility allows the recognition of other kinds of two-box regulatory sites.
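The two-box site model described in this abstract (a -35 box and a -10 box separated by a variable-length spacer) can be illustrated with a minimal sketch. This is not the SIGffRid algorithm, which selects over-represented patterns statistically across two genomes; it only shows what a "two-word motif with variable spacer" looks like as a search pattern. The function name, the default box sequences (the canonical E. coli σ70 consensus hexamers) and the spacer range are assumptions chosen for illustration.

```python
import re

def find_two_box_sites(seq, box35="TTGACA", box10="TATAAT",
                       min_spacer=15, max_spacer=19):
    """Return (start, spacer_length, matched_text) for each two-box hit.

    Hypothetical helper: exact-match boxes only, no statistics.
    """
    # A lookahead lets hits that start at different positions overlap;
    # the lazy quantifier reports the shortest valid spacer per start.
    pattern = re.compile(
        rf"(?=({re.escape(box35)})"
        rf"([ACGT]{{{min_spacer},{max_spacer}}}?)"
        rf"({re.escape(box10)}))"
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        matched = m.group(1) + m.group(2) + m.group(3)
        hits.append((m.start(), len(m.group(2)), matched))
    return hits

# Toy promoter: -35 box, 17-nt spacer, -10 box.
toy = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
sites = find_two_box_sites(toy)
```

A real tool would replace the exact-match boxes with degenerate or statistically scored motifs; the variable spacer is the part this sketch is meant to show.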
|
37
|
Kurmangaliyev YZ, Gelfand MS. Computational analysis of splicing errors and mutations in human transcripts. BMC Genomics 2008; 9:13. [PMID: 18194514 PMCID: PMC2234086 DOI: 10.1186/1471-2164-9-13] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2007] [Accepted: 01/14/2008] [Indexed: 01/10/2023] Open
Abstract
Background Most retained introns found in human cDNAs generated by high-throughput sequencing projects seem to result from underspliced transcripts, and thus they capture intermediate steps of pre-mRNA splicing. On the other hand, mutations in splice sites cause exon skipping of the respective exon or activation of pre-existing cryptic sites. Both types of events reflect properties of the splicing mechanism. Results The retained introns were significantly shorter than constitutive ones, and skipped exons are shorter than exons with cryptic sites. Both donor and acceptor splice sites of retained introns were weaker than splice sites of constitutive introns. The authentic acceptor sites affected by mutations were significantly weaker in exons with activated cryptic sites than in skipped exons. The distance from a mutated splice site to the nearest equivalent site is significantly shorter in cases of activated cryptic sites compared to exon skipping events. The prevalence of retained introns within genes monotonically increased in the 5'-to-3' direction (more retained introns close to the 3'-end), consistent with the model of co-transcriptional splicing. The density of exonic splicing enhancers was higher, and the density of exonic splicing silencers lower in retained introns compared to constitutive ones and in exons with cryptic sites compared to skipped exons. Conclusion Thus the analysis of retained introns in human cDNA, exons skipped due to mutations in splice sites and exons with cryptic sites produced results consistent with the intron definition mechanism of splicing of short introns, co-transcriptional splicing, dependence of splicing efficiency on the splice site strength and the density of candidate exonic splicing enhancers and silencers. These results are consistent with other, recently published analyses.
Affiliation(s)
- Yerbol Z Kurmangaliyev
- Institute for Information Transmission Problems (the Kharkevich Institute) RAS, Bolshoi Karetny pereulok 19, Moscow, 127994, Russia.
|
38
|
Ermakova EO, Nurtdinov RN, Gelfand MS. Overlapping alternative donor splice sites in the human genome. J Bioinform Comput Biol 2008; 5:991-1004. [PMID: 17933007 DOI: 10.1142/s0219720007003089] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/16/2007] [Revised: 05/30/2007] [Accepted: 06/01/2007] [Indexed: 11/18/2022]
Abstract
Over 50% of donor splice sites in the human genome have a potential alternative donor site at a distance of three to six nucleotides. Conservation of these potential sites is determined by the consensus requirements and by its exonic or intronic location. Several hundred pairs of overlapping sites are confirmed to be alternatively spliced as both sites in a pair are supported by a protein, by a full-length mRNA, or by expressed sequence tags (ESTs) from at least two independent clone libraries. Overlapping sites may clash with consensus requirements. Pairs with a site shift of four nucleotides are the most abundant, despite the frameshift in the protein-coding region that they introduce. The site usage in pairs is usually uneven, and the major site is more frequently conserved in other mammalian genomes. Overlapping alternative donor sites and acceptor sites may have different functional roles: alternative splicing of overlapping acceptor sites leads mainly to microvariations in protein sequences; whereas alternative donor sites often lead to frameshifts and thus either yield major differences in the protein sequence and structure, or generate nonsense-mediated decay-inducing mRNA isoforms likely involved in regulated unproductive splicing pathways.
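The frame arithmetic behind this abstract can be sketched briefly: a shift between two donor sites that is not a multiple of three changes the reading frame, which is why the abundant 4-nt pairs introduce frameshifts. In the sketch below a "potential donor site" is reduced to a bare GT dinucleotide, which is a strong simplifying assumption (the paper applies consensus requirements), and the function name is hypothetical.

```python
def overlapping_donor_candidates(seq, min_shift=3, max_shift=6):
    """List GT pairs whose distance falls in [min_shift, max_shift].

    Returns tuples (first_pos, second_pos, shift, frame_preserving).
    """
    seq = seq.upper()
    gt_positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    pairs = []
    for a in gt_positions:
        for b in gt_positions:
            shift = b - a
            if min_shift <= shift <= max_shift:
                # Only shifts divisible by 3 keep the reading frame.
                pairs.append((a, b, shift, shift % 3 == 0))
    return pairs

pairs = overlapping_donor_candidates("CAGGTAAGTGTC")
```

In this toy sequence the GT at position 3 pairs with the GT at position 7 (shift 4, frameshifting) and with the GT at position 9 (shift 6, frame-preserving).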
Affiliation(s)
- Ekaterina O Ermakova
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Bolshoi Karetny per. 19, 127994 Moscow, Russia.
|
39
|
Spitalny P, Thomm M. A polymerase III-like reinitiation mechanism is operating in regulation of histone expression in archaea. Mol Microbiol 2007; 67:958-70. [PMID: 18182021 PMCID: PMC2253867 DOI: 10.1111/j.1365-2958.2007.06084.x] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
An archaeal histone gene from the hyperthermophile Pyrococcus furiosus containing four consecutive putative oligo-dT terminator sequences was used as a model system to investigate termination signals and the mechanism of termination in vitro. The archaeal RNA polymerase terminated with high efficiency at the first terminator at 90°C when it contained five to six T residues, at 80°C readthrough was significantly increased. A putative hairpin structure upstream of the first terminator had no effect on termination efficiency. Template competition experiments starting with RNA polymerase molecules engaged in ternary complexes revealed recycling of RNA polymerase from the terminator to the promoter of the same template. This facilitated reinitiation was dependent upon the presence of a terminator sequence suggesting that pausing at the terminator is required for recycling as in the RNA polymerase III system. Replacement of the sequences immediately downstream of the oligo-dT terminator by an AT-rich segment improved termination efficiency. Both AT-rich and GC-rich downstream sequences seemed to impair the facilitated reinitiation pathway. Our data suggest that recycling is dependent on a subtle interplay of pausing of RNA polymerase at the terminator and RNA polymerase translocation beyond the oligo-dT termination signal that is dramatically affected by downstream sequences.
Affiliation(s)
- Patrizia Spitalny
- Department of Microbiology, University of Regensburg, Universitätsstrasse 31, 93053 Regensburg, Germany
|
40
|
Cross-talk Between Iron and Nitrogen Regulatory Networks in Anabaena (Nostoc) sp. PCC 7120: Identification of Overlapping Genes in FurA and NtcA Regulons. J Mol Biol 2007; 374:267-81. [DOI: 10.1016/j.jmb.2007.09.010] [Citation(s) in RCA: 86] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2007] [Revised: 08/30/2007] [Accepted: 09/04/2007] [Indexed: 01/26/2023]
|
41
|
Abstract
BACKGROUND Unraveling the mechanisms that regulate gene expression is a major challenge in biology. An important task in this challenge is to identify regulatory elements, especially the binding sites in deoxyribonucleic acid (DNA) for transcription factors. These binding sites are short DNA segments that are called motifs. Recent advances in genome sequence availability and in high-throughput gene expression analysis technologies have allowed for the development of computational methods for motif finding. As a result, a large number of motif finding algorithms have been implemented and applied to various motif models over the past decade. This survey reviews the latest developments in DNA motif finding algorithms. RESULTS Earlier algorithms use promoter sequences of coregulated genes from a single genome and search for statistically overrepresented motifs. Recent algorithms are designed to use phylogenetic footprinting or orthologous sequences and also an integrated approach where promoter sequences of coregulated genes and phylogenetic footprinting are used. All the algorithms studied have been reported to correctly detect the motifs that have been previously detected by laboratory experimental approaches, and some algorithms were able to find novel motifs. However, most of these motif finding algorithms have been shown to work successfully in yeast and other lower organisms, but perform significantly worse in higher organisms. CONCLUSION Despite considerable efforts to date, DNA motif finding remains a complex challenge for biologists and computer scientists. Researchers have taken many different approaches in developing motif discovery tools and the progress made in this area of research is very encouraging. Performance comparison of different motif finding tools and identification of the best tools have proven to be a difficult task because tools are designed based on algorithms and motif models that are diverse and complex, and our incomplete understanding of the biology of regulatory mechanisms does not always provide adequate evaluation of underlying algorithms over motif models.
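The "overrepresented motif" idea used by the earlier algorithms in this survey can be sketched as a k-mer enrichment count: words that occur more often in promoters of coregulated genes than in a background set are motif candidates. The sketch below is a deliberately naive illustration, not any surveyed tool; the function names, the pseudocount and the frequency-ratio score are assumptions.

```python
from collections import Counter

def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def enriched_kmers(promoters, background, k=6, pseudocount=1.0):
    """Rank k-mers by foreground/background frequency ratio (descending)."""
    fg = kmer_counts(promoters, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values()) + pseudocount
    bg_total = sum(bg.values()) + pseudocount
    scored = []
    for kmer, n in fg.items():
        fg_freq = (n + pseudocount) / fg_total
        bg_freq = (bg[kmer] + pseudocount) / bg_total  # Counter gives 0 if absent
        scored.append((kmer, fg_freq / bg_freq))
    return sorted(scored, key=lambda item: item[1], reverse=True)

promoters = ["AATGACTCATT", "CCTGACTCAGG", "GGTGACTCACC"]
background = ["ACGTACGTACGTACGT", "GGGGCCCCAAAATTTT"]
ranking = enriched_kmers(promoters, background, k=7)
```

Here the shared word `TGACTCA` ranks first because it is the only 7-mer present in all three toy promoters. Practical tools add proper significance testing and tolerate mismatches, which this sketch omits.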
Affiliation(s)
- Modan K Das
- Computer Science Department, Oklahoma State University, Stillwater, Oklahoma 74078, USA
- USDA-ARS, Department of Plant Sciences, University of Arizona, Tucson, Arizona 85721, USA
| | - Ho-Kwok Dai
- Computer Science Department, Oklahoma State University, Stillwater, Oklahoma 74078, USA
|
42
|
Kovaleva GY, Gelfand MS. Transcriptional regulation of the methionine and cysteine transport and metabolism in streptococci. FEMS Microbiol Lett 2007; 276:207-15. [DOI: 10.1111/j.1574-6968.2007.00934.x] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
|
43
|
Genome-wide analysis of chlamydiae for promoters that phylogenetically footprint. Res Microbiol 2007; 158:685-93. [PMID: 18039561 DOI: 10.1016/j.resmic.2007.08.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2007] [Accepted: 08/22/2007] [Indexed: 11/23/2022]
Abstract
Currently, there is a lack of phylogenetic footprinting programmes that can take advantage of multiple whole genome sequences of different species within the same bacterial genus. Therefore, we have developed and tested a position weight matrix-based programme called Footy, that performs genome-wide analysis of bacterial genomes for promoters that phylogenetically footprint. When Footy was used to analyse the non-coding regions upstream of genes from three chlamydial species for promoters that phylogenetically footprint, it predicted a total of 42 promoters, of which 41 were new. Ten of the 41 new promoters predicted by Footy were biologically assayed in Chlamydia trachomatis by mapping the 5' end of the transcripts for the associated genes. The primer extension assay validated seven of the 10 promoters. When Footy was compared to two other accepted methods for genome-wide prediction of promoters in bacteria (the standard PWM method and MITRA), Footy performed as well as or better than these programmes. This paper, therefore, shows the value of a bioinformatics programme able to perform genome-wide analysis of bacteria for promoters that phylogenetically footprint.
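For readers unfamiliar with the "standard PWM method" mentioned in this abstract, the following minimal sketch builds a log-odds position weight matrix from aligned sites and scans a sequence with it. It does not reproduce Footy or its phylogenetic footprinting step; the function names, the uniform background of 0.25 and the pseudocount are assumptions, and the input is assumed to contain only A, C, G and T.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PWM (list of per-position dicts) from aligned sites."""
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(length):
        column = [s[i] for s in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total)
                            / background)
            for base in "ACGT"
        })
    return pwm

def scan(seq, pwm):
    """Score every window of the PWM's width; returns (position, score)."""
    width = len(pwm)
    return [
        (i, sum(pwm[j][seq[i + j]] for j in range(width)))
        for i in range(len(seq) - width + 1)
    ]

def best_hit(seq, pwm):
    """Return the highest-scoring (position, score) window."""
    return max(scan(seq, pwm), key=lambda hit: hit[1])

sites = ["TATAAT", "TATAAT", "TACAAT", "TATACT"]
pwm = build_pwm(sites)
hit = best_hit("GGGCTATAATGCC", pwm)
```

In the toy sequence the best window starts at position 4, where the consensus `TATAAT` occurs. A footprinting approach would additionally require the hit to be conserved upstream of orthologous genes in related genomes.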
|
44
|
Affiliation(s)
- Dmitry A Rodionov
- Burnham Institute for Medical Research, La Jolla, California 92037, USA.
|
45
|
Tsiganova MO, Gelfand MS, Ravcheev DA. Regulation of bacterial respiration: Comparison of microarray and comparative genomics data. Mol Biol 2007. [DOI: 10.1134/s0026893307030168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
|
46
|
Cubonová L, Sandman K, Karr EA, Cochran AJ, Reeve JN. Spontaneous trpY mutants and mutational analysis of the TrpY archaeal transcription regulator. J Bacteriol 2007; 189:4338-42. [PMID: 17400746 PMCID: PMC1913389 DOI: 10.1128/jb.00164-07] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2007] [Accepted: 03/19/2007] [Indexed: 11/20/2022] Open
Abstract
Over 90% of Methanothermobacter thermautotrophicus mutants isolated as spontaneously resistant to 5-methyl tryptophan had mutations in trpY. Most were single-base-pair substitutions that identified separate DNA- and tryptophan-binding regions in TrpY. In vivo and in vitro studies revealed that DNA binding was sufficient for TrpY repression of trpY transcription but that TrpY must bind DNA and tryptophan to assemble a complex that represses trpEGCFBAD.
Affiliation(s)
- L'ubomíra Cubonová
- Department of Microbiology, Ohio State University, Columbus, OH 43210-1292, USA
|
47
|
Gvakharia BO, Permina EA, Gelfand MS, Bottomley PJ, Sayavedra-Soto LA, Arp DJ. Global transcriptional response of Nitrosomonas europaea to chloroform and chloromethane. Appl Environ Microbiol 2007; 73:3440-5. [PMID: 17369330 PMCID: PMC1907119 DOI: 10.1128/aem.02831-06] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
Upon exposure of Nitrosomonas europaea to chloroform (7 μM, 1 h), transcripts for 175 of 2,460 genes were found at higher levels in treated cells than in untreated cells and transcripts for 501 genes were found at lower levels. With chloromethane (3.2 mM, 1 h), transcripts for 67 genes were at higher levels and transcripts for 148 genes were at lower levels. Transcripts for 37 genes were at higher levels following both treatments and included genes for heat shock proteins, sigma-factors of the extracytoplasmic function subfamily, and toxin-antitoxin loci. N. europaea has higher levels of transcripts for a variety of defense genes when exposed to chloroform or chloromethane.
Affiliation(s)
- Barbara O Gvakharia
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
|
48
|
Wang LY, Snyder M, Gerstein M. BoCaTFBS: a boosted cascade learner to refine the binding sites suggested by ChIP-chip experiments. Genome Biol 2007; 7:R102. [PMID: 17078876 PMCID: PMC1794589 DOI: 10.1186/gb-2006-7-11-r102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2006] [Revised: 08/29/2006] [Accepted: 11/01/2006] [Indexed: 11/23/2022] Open
Abstract
BoCaTFBS, a new method that combines noisy data from ChIP-chip experiments with known binding-site patterns, is described and applied to the ENCODE project. Comprehensive mapping of transcription factor binding sites is essential in postgenomic biology. For this, we propose a mining approach combining noisy data from ChIP (chromatin immunoprecipitation)-chip experiments with known binding site patterns. Our method (BoCaTFBS) uses boosted cascades of classifiers for optimum efficiency, in which components are alternating decision trees; it exploits interpositional correlations; and it explicitly integrates massive negative information from ChIP-chip experiments. We applied BoCaTFBS within the ENCODE project and showed that it outperforms many traditional binding site identification methods (for instance, profiles).
Affiliation(s)
- Lu-yong Wang
- Integrated Data Systems Department, Siemens Corporate Research, 755 College Road East, Princeton, New Jersey 08540, USA
| | - Michael Snyder
- Department of Molecular, Cellular, and Developmental Biology, KBT 926, 266 Whitney Ave, Yale University, New Haven, Connecticut 06520, USA
| | - Mark Gerstein
- Department of Molecular Biophysics and Biochemistry, Bass 432A, 266 Whitney Ave, Yale University, New Haven, CT 06520, USA
- Program in Computational Biology and Bioinformatics, Bass 432A, 266 Whitney Ave, Yale University, New Haven, CT 06520, USA
- Department of Computer Science, 51 Prospect Street, Yale University, New Haven, Connecticut 06520, USA
|
49
|
Ravcheev DA, Gerasimova AV, Mironov AA, Gelfand MS. Comparative genomic analysis of regulation of anaerobic respiration in ten genomes from three families of gamma-proteobacteria (Enterobacteriaceae, Pasteurellaceae, Vibrionaceae). BMC Genomics 2007; 8:54. [PMID: 17313674 PMCID: PMC1805755 DOI: 10.1186/1471-2164-8-54] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2006] [Accepted: 02/21/2007] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Gamma-proteobacteria, such as Escherichia coli, can use a variety of respiratory substrates employing numerous aerobic and anaerobic respiratory systems controlled by multiple transcription regulators. Thus, in E. coli, global control of respiration is mediated by four transcription factors, Fnr, ArcA, NarL and NarP. However, in other Gamma-proteobacteria the composition of global respiration regulators may be different. RESULTS In this study we applied a comparative genomic approach to the analysis of three global regulatory systems, Fnr, ArcA and NarP. These systems were studied in available genomes containing these three regulators, but lacking NarL. So, we considered several representatives of Pasteurellaceae, Vibrionaceae and Yersinia spp. As a result, we identified new regulon members, functioning in respiration, central metabolism (glycolysis, gluconeogenesis, pentose phosphate pathway, citrate cycle, metabolism of pyruvate and lactate), metabolism of carbohydrates and fatty acids, transcriptional regulation and transport, in particular: the ATP synthase operon atpIBEFHAGCD, the Na+-exporting NADH dehydrogenase operon nqrABCDEF, and the D-amino acids dehydrogenase operon dadAX. Using an extension of the comparative technique, we demonstrated taxon-specific changes in regulatory interactions and predicted taxon-specific regulatory cascades. CONCLUSION A comparative genomic technique was applied to the analysis of global regulation of respiration in ten gamma-proteobacterial genomes. Three structurally different but functionally related regulatory systems were described. A correlation between the regulon size and the position of a transcription factor in regulatory cascades was observed: regulators with larger regulons tend to occupy top positions in the cascades. On the other hand, there is no obvious link to differences in the species' lifestyles and metabolic capabilities.
Affiliation(s)
- Dmitry A Ravcheev
- Lomonosov Moscow State University, Department of Bioengineering and Bioinformatics, Moscow, 119992, Russia
- Institute for Information Transmission Problems, Moscow, 127994, Russia
| | | | - Andrey A Mironov
- Lomonosov Moscow State University, Department of Bioengineering and Bioinformatics, Moscow, 119992, Russia
- Institute for Information Transmission Problems, Moscow, 127994, Russia
- State Scientific Center GosNIIGenetika, Moscow, 113545, Russia
| | - Mikhail S Gelfand
- Lomonosov Moscow State University, Department of Bioengineering and Bioinformatics, Moscow, 119992, Russia
- Institute for Information Transmission Problems, Moscow, 127994, Russia
- State Scientific Center GosNIIGenetika, Moscow, 113545, Russia
|
50
|
Abstract
As the molecular adapters between codons and amino acids, transfer-RNAs are pivotal molecules of the genetic code. The coding properties of a tRNA molecule do not reside only in its primary sequence. Posttranscriptional nucleoside modifications, particularly in the anticodon loop, can modify cognate codon recognition, affect aminoacylation properties, or stabilize the codon-anticodon wobble base pairing to prevent ribosomal frameshifting. Despite a wealth of biophysical and structural knowledge of the tRNA modifications themselves, their pathways of biosynthesis had been until recently only partially characterized. This discrepancy was mainly due to the lack of obvious phenotypes for tRNA modification-deficient strains and to the difficulty of the biochemical assays used to detect tRNA modifications. However, the availability of hundreds of whole-genome sequences has allowed the identification of many of these missing tRNA-modification genes. This chapter reviews the methods that were used to identify these genes with a special emphasis on the comparative genomic approaches. Methods that link gene and function but do not rely on sequence homology will be detailed, with examples taken from the tRNA modification field.
|