1
Baumgarten N, Rumpf L, Kessler T, Schulz MH. A statistical approach for identifying single nucleotide variants that affect transcription factor binding. iScience 2024; 27:109765. [PMID: 38736546 PMCID: PMC11088338 DOI: 10.1016/j.isci.2024.109765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 01/30/2024] [Accepted: 04/15/2024] [Indexed: 05/14/2024]
Abstract
Non-coding variants located within regulatory elements may alter gene expression by modifying transcription factor (TF) binding sites, thereby leading to functional consequences. Different TF models are being used to assess the effect of DNA sequence variants, such as single nucleotide variants (SNVs). Existing methods are often slow and do not assess the statistical significance of their results. We investigated the distribution of absolute maximal differential TF binding scores computed with general computational models of TF binding. We find that a modified Laplace distribution can adequately approximate the empirical distributions. A benchmark on in vitro and in vivo datasets showed that our approach improves upon an existing method in terms of performance and speed. Applications to eQTLs and to a genome-wide association study illustrate the usefulness of our statistics by highlighting cell type-specific regulators and target genes. An implementation of our approach is freely available on GitHub and as a bioconda package.
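To make the idea of assigning significance through a fitted background distribution concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes an ordinary two-parameter Laplace distribution as a stand-in for their modified Laplace distribution (whose exact form is not given in the abstract). The sketch fits that distribution to a background sample of differential scores by maximum likelihood and returns an upper-tail p-value for an observed score. The function names and all numeric values are hypothetical.

```python
import math
import statistics


def fit_laplace(samples):
    """Maximum-likelihood fit of an ordinary Laplace(mu, b) distribution.

    Assumption: a plain Laplace is used here, not the modified Laplace
    from the paper. The MLE of the location is the sample median, and the
    MLE of the scale is the mean absolute deviation from that median.
    """
    mu = statistics.median(samples)
    b = sum(abs(s - mu) for s in samples) / len(samples)
    if b <= 0:
        raise ValueError("scale is zero: all samples are identical")
    return mu, b


def laplace_upper_tail(x, mu, b):
    """Return P(X >= x) for X ~ Laplace(mu, b) (closed-form survival function)."""
    if x >= mu:
        return 0.5 * math.exp(-(x - mu) / b)
    return 1.0 - 0.5 * math.exp((x - mu) / b)


if __name__ == "__main__":
    # Hypothetical background of differential binding scores; in practice
    # these would come from scoring many variants with a TF model.
    background = [0.1, -0.3, 0.4, 0.0, -0.2, 0.25, -0.05, 0.6, -0.4, 0.15]
    mu, b = fit_laplace(background)
    observed = 1.2  # hypothetical score for the SNV being tested
    p = laplace_upper_tail(observed, mu, b)
    print(f"mu={mu:.3f} b={b:.3f} p={p:.4g}")
```

Because the tail probability has a closed form, each test needs one exponential once the background is fitted. A different (for example, modified) distribution would change only the two functions above.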
Affiliation(s)
- Nina Baumgarten
- Institute of Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany
- Institute for Computational Genomic Medicine, Goethe University, 60590 Frankfurt am Main, Germany
- Institute for Computer Science, Goethe University, 60590 Frankfurt am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590 Frankfurt am Main, Germany
- Laura Rumpf
- Institute of Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany
- Institute for Computational Genomic Medicine, Goethe University, 60590 Frankfurt am Main, Germany
- Institute for Computer Science, Goethe University, 60590 Frankfurt am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590 Frankfurt am Main, Germany
- Thorsten Kessler
- German Heart Centre Munich, Department of Cardiology, School of Medicine and Health, Technical University of Munich, 80636 Munich, Germany
- German Centre for Cardiovascular Research, Partner Site Munich Heart Alliance, 80636 Munich, Germany
- Marcel H. Schulz
- Institute of Cardiovascular Regeneration, Goethe University, 60590 Frankfurt am Main, Germany
- Institute for Computational Genomic Medicine, Goethe University, 60590 Frankfurt am Main, Germany
- Institute for Computer Science, Goethe University, 60590 Frankfurt am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590 Frankfurt am Main, Germany
2
Abdullahi KB. Kabirian-based optinalysis: A conceptually grounded framework for symmetry/asymmetry, similarity/dissimilarity and identity/unidentity estimations in mathematical structures and biological sequences. MethodsX 2023; 11:102400. [PMID: 37928104 PMCID: PMC10622715 DOI: 10.1016/j.mex.2023.102400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 09/24/2023] [Indexed: 11/07/2023]
Abstract
This paper introduces "Kabirian-based optinalysis (KBO)," a pioneering framework that addresses the long-standing challenges in estimating symmetry/asymmetry, similarity/dissimilarity, and identity/unidentity within mathematical structures and biological sequences. Existing methods often lack a strong theoretical foundation, leading to inconsistencies and limitations. Kabirian-based optinalysis draws inspiration from isomorphism and automorphism, providing a theoretically grounded framework that unifies estimation methodologies. It introduces the concepts of optiscale, autoreflective pairing, and isoreflective pairing, among others, which ensure invariance and robustness under various mathematical transformations and establish functional bijectivity for isomorphic or automorphic structures. This not only overcomes previous limitations but also offers precise and interpretable estimations. Additionally, the framework introduces "geometrical pairwise analysis" (GPA) to improve sensitivity to position-specific and character-specific variations in biological sequences. This novel approach enhances the accuracy of sequence similarity assessments, surpassing the constraints of conventional methods. The novelty of this work extends beyond mathematics and biology, impacting diverse fields such as computer science, data analysis, pattern recognition, and evolutionary biology. Kabirian-based optinalysis presents a holistic and theoretically grounded solution that has the potential to revolutionize the analysis of complex structures and sequences, opening new horizons for interdisciplinary research.
• Inspired by automorphism and isomorphism, Kabirian-based optinalysis introduces a new, paradigm-shifting, unified approach to estimations in mathematical structures and biological sequences, with a solid conceptual and theoretical foundation.
• The GPA method enhances pairwise sequence similarity estimation by being sensitive to position-specific and character-specific variations and by providing a comprehensive characterization of these features.
Affiliation(s)
- Kabir Bindawa Abdullahi
- Department of Biology, Faculty of Natural and Applied Sciences, Umaru Musa Yar'adua University, P.M.B. 2218, Katsina, Katsina State, Nigeria
3
Wang Z, Peng C, Wu W, Yan C, Lv Y, Li JT. Developmental regulation of conserved non-coding element evolution provides insights into limb loss in squamates. SCIENCE CHINA. LIFE SCIENCES 2023; 66:2399-2414. [PMID: 37256419 DOI: 10.1007/s11427-023-2362-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 05/09/2023] [Indexed: 06/01/2023]
Abstract
Limb loss has evolved repeatedly across squamate lineages. Here, based on three de novo-assembled genomes of limbless lizards from different lineages, we showed that divergence of conserved non-coding elements (CNEs) played an important role in limb development. These CNEs were associated with genes required for limb initiation and outgrowth, and with regulatory signals in the early stage of limb development. Importantly, we identified extensive insertions and deletions (InDels) in the CNEs, ranging in number from 111 to 756. Most of these CNEs with InDels were lineage-specific in the limbless squamates. Genes near these InDel CNEs, such as Tbx4, Fgf10, and Gli3, are important for early limb formation. Functional experiments showed that both nucleotide mutations and InDels affected the regulatory function of the CNEs. Our study provides molecular evidence underlying limb loss in squamate reptiles from a developmental perspective and sheds light on the importance of regulatory element InDels in phenotypic evolution.
Affiliation(s)
- Zeng Wang
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Changjun Peng
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Wei Wu
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Chaochao Yan
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
- Yunyun Lv
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
- College of Life Science, Neijiang Normal University, Neijiang, 641100, China
- Jia-Tang Li
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Southeast Asia Biodiversity Research Institute, Chinese Academy of Sciences, Yezin, Nay Pyi Taw, 05282, Myanmar
4
Peng C, Wu DD, Ren JL, Peng ZL, Ma Z, Wu W, Lv Y, Wang Z, Deng C, Jiang K, Parkinson CL, Qi Y, Zhang ZY, Li JT. Large-scale snake genome analyses provide insights into vertebrate development. Cell 2023; 186:2959-2976.e22. [PMID: 37339633 DOI: 10.1016/j.cell.2023.05.030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Revised: 04/06/2023] [Accepted: 05/19/2023] [Indexed: 06/22/2023]
Abstract
Snakes are a remarkable squamate lineage with unique morphological adaptations, especially those related to the evolution of vertebrate skeletons, organs, and sensory systems. To clarify the genetic underpinnings of snake phenotypes, we assembled and analyzed 14 de novo genomes from 12 snake families. We also investigated the genetic basis of the morphological characteristics of snakes using functional experiments. We identified genes, regulatory elements, and structural variations that have potentially contributed to the evolution of limb loss, an elongated body plan, asymmetrical lungs, sensory systems, and digestive adaptations in snakes. We identified some of the genes and regulatory elements that might have shaped the evolution of vision, the skeletal system and diet in blind snakes, and thermoreception in infrared-sensitive snakes. Our study provides insights into the evolution and development of snakes and vertebrates.
Affiliation(s)
- Changjun Peng
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610040, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Dong-Dong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Jin-Long Ren
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610040, China
- Zhong-Liang Peng
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610040, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zhifei Ma
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610040, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Wei Wu
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610040, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yunyun Lv
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610040, China; College of Life Science, Neijiang Normal University, Neijiang, Sichuan 641100, China
- Zeng Wang
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610040, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Cao Deng
- Departments of Bioinformatics, DNA Stories Bioinformatics Center, Chengdu 610000, China
- Ke Jiang
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610040, China
- Yin Qi
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610040, China
- Zhi-Yi Zhang
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610040, China
- Jia-Tang Li
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610040, China; University of Chinese Academy of Sciences, Beijing 100049, China; Southeast Asia Biodiversity Research Institute, Chinese Academy of Sciences, Yezin, Nay Pyi Taw 05282, Myanmar.
5
Batyrshina ZS, Shavit R, Yaakov B, Bocobza S, Tzin V. The transcription factor TaMYB31 regulates the benzoxazinoid biosynthetic pathway in wheat. JOURNAL OF EXPERIMENTAL BOTANY 2022; 73:5634-5649. [PMID: 35554544 PMCID: PMC9467655 DOI: 10.1093/jxb/erac204] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 10/06/2021] [Accepted: 05/10/2022] [Indexed: 05/13/2023]
Abstract
Benzoxazinoids are specialized metabolites that are highly abundant in staple crops, such as maize and wheat. Although their biosynthesis has been studied for several decades, the regulatory mechanisms of the benzoxazinoid pathway remain unknown. Here, we report that the wheat transcription factor MYB31 functions as a regulator of benzoxazinoid biosynthesis genes. A transcriptomic analysis of tetraploid wheat (Triticum turgidum) tissue revealed the up-regulation of two TtMYB31 homoeologous genes upon aphid and caterpillar feeding. TaMYB31 gene silencing in the hexaploid wheat Triticum aestivum significantly reduced benzoxazinoid metabolite levels and increased susceptibility to herbivores: aphid progeny production, caterpillar body weight gain, and spider mite oviposition all significantly increased in TaMYB31-silenced plants. A comprehensive transcriptomic analysis of hexaploid wheat revealed that the TaMYB31 gene is co-expressed with the target benzoxazinoid biosynthetic (Bx) genes under several biotic and environmental conditions. We therefore analyzed the effect of abiotic stresses on benzoxazinoid levels and discovered a strong accumulation of these compounds in the leaves. The results of a dual fluorescence assay indicated that TaMYB31 binds to the Bx1 and Bx4 gene promoters, thereby activating the transcription of genes involved in the benzoxazinoid pathway. Our finding is the first report of a transcriptional regulation mechanism for the benzoxazinoid pathway in wheat.
Affiliation(s)
- Zhaniya S Batyrshina
- French Associates Institute for Agriculture and Biotechnology of Drylands, Jacob Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, Midreshet Ben Gurion, 8499000, Israel
- Reut Shavit
- French Associates Institute for Agriculture and Biotechnology of Drylands, Jacob Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, Midreshet Ben Gurion, 8499000, Israel
- Beery Yaakov
- French Associates Institute for Agriculture and Biotechnology of Drylands, Jacob Blaustein Institutes for Desert Research, Ben-Gurion University of the Negev, Midreshet Ben Gurion, 8499000, Israel
- Samuel Bocobza
- Department of Ornamentals and Biotechnology, Institute of Plant Sciences, Agricultural Research Organization, The Volcani Center, 68 Hamakabim Road, 7528809, Rishon LeZion, Israel
6
The zinc-finger bearing xenogeneic silencer MucR in α-proteobacteria balances adaptation and regulatory integrity. THE ISME JOURNAL 2022; 16:738-749. [PMID: 34584215 PMCID: PMC8857273 DOI: 10.1038/s41396-021-01118-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 06/26/2021] [Revised: 09/07/2021] [Accepted: 09/10/2021] [Indexed: 02/08/2023]
Abstract
Foreign AT-rich genes drive bacterial adaptation to new niches while challenging the existing regulation network. Here we report that MucR, a conserved regulator in α-proteobacteria, balances adaptation and regulatory integrity in Sinorhizobium fredii, a facultative microsymbiont of legumes. Chromatin immunoprecipitation sequencing coupled with transcriptomic data reveal that average transcription levels of both target and non-target genes, under free-living and symbiotic conditions, increase with their conservation levels. Targets involved in environmental adaptation and symbiosis belong to the genus or species core genome and can be repressed or activated by MucR in a condition-dependent manner, implying regulatory integrations. However, most targets are enriched in strain-specific genes of lower expression levels and higher AT%. Within each conservation level, targets have higher AT% and average transcription levels than non-target genes and can be further up-regulated in the mucR mutant. This is consistent with higher AT% of spacers between -35 and -10 elements of promoters for target genes, which enhances transcription. The MucR recruitment level linearly increases with AT% and the number of a flexible pattern (with periodic repeats of Ts) of target sequences. Collectively, MucR directly represses AT-rich foreign genes with predisposed high transcription potential while progressive erosions of its target sites facilitate regulatory integrations of foreign genes.
7
Zeng C, Takeda A, Sekine K, Osato N, Fukunaga T, Hamada M. Bioinformatics Approaches for Determining the Functional Impact of Repetitive Elements on Non-coding RNAs. Methods Mol Biol 2022; 2509:315-340. [PMID: 35796972 DOI: 10.1007/978-1-0716-2380-0_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
As large numbers of non-coding RNAs (ncRNAs) have been annotated, repetitive sequences have been found to constitute functional components (termed repetitive elements) of ncRNAs that perform specific biological functions. Bioinformatics analysis is a powerful tool for improving our understanding of the role of repetitive elements in ncRNAs. This chapter summarizes recent findings that reveal the role of repetitive elements in ncRNAs. Furthermore, relevant bioinformatics approaches are systematically reviewed, providing a resource for studying the functional impact of repetitive elements on ncRNAs.
Affiliation(s)
- Chao Zeng
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan.
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), Tokyo, Japan.
- Atsushi Takeda
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
- Kotaro Sekine
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
- Naoki Osato
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
- Tsukasa Fukunaga
- Waseda Institute for Advanced Study, Waseda University, Tokyo, Japan
- Michiaki Hamada
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan.
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), Tokyo, Japan.
8
Hochfeld LM, Bertolini M, Broadley D, Botchkareva NV, Betz RC, Schoch S, Nöthen MM, Heilmann-Heimbach S. Evidence for a functional interaction of WNT10A and EBF1 in male-pattern baldness. PLoS One 2021; 16:e0256846. [PMID: 34506541 PMCID: PMC8432770 DOI: 10.1371/journal.pone.0256846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2021] [Accepted: 08/17/2021] [Indexed: 11/19/2022]
Abstract
More than 300 genetic risk loci have been identified for male pattern baldness (MPB), but little is known about the exact molecular mechanisms through which the associated variants exert their effects on MPB pathophysiology. Here, we aimed to further elucidate the regulatory architecture of the MPB risk locus on chromosome (chr.) 2q35, where we have previously reported a regulatory effect of the MPB lead variant on the expression of WNT10A. A search of the HaploReg database for regulatory annotations revealed that the association signal at 2q35 maps to a binding site for the transcription factor EBF1, whose gene is located at a second MPB risk locus on chr. 5q33.3. To investigate a potential interaction between EBF1 and WNT10A during MPB development, we performed in vitro luciferase reporter assays as well as expression analyses and immunofluorescence co-stainings in microdissected human hair follicles. Our experiments confirm that EBF1 activates the WNT10A promoter and that the WNT10A/EBF1 interaction is impacted by the allelic expression of the MPB risk allele at 2q35. Expression analyses across different hair cycle phases and immunohistochemical (co-)stainings against WNT10A and EBF1 suggest that the EBF1/WNT10A interaction is predominantly relevant for hair shaft formation during anagen. Based on these findings, we suggest a functional mechanism at the 2q35 risk locus for MPB in which a risk allele-associated reduction in WNT10A promoter activation via EBF1 decreases WNT10A expression and eventually leads to anagen shortening, which is frequently observed in MPB-affected hair follicles. To our knowledge, this study is the first follow-up study on MPB to demonstrate a functional interaction between two MPB risk loci, and it sheds light on the underlying pathophysiological mechanism at these loci.
Affiliation(s)
- Lara M. Hochfeld
- Institute of Human Genetics, University of Bonn, School of Medicine & University Hospital Bonn, Bonn, Germany
- Marta Bertolini
- Monasterium Laboratory, Skin and Hair Research Solutions GmbH, Münster, Germany
- David Broadley
- Centre for Skin Sciences, Faculty of Life Sciences, University of Bradford, Bradford, England, United Kingdom
- Natalia V. Botchkareva
- Centre for Skin Sciences, Faculty of Life Sciences, University of Bradford, Bradford, England, United Kingdom
- Regina C. Betz
- Institute of Human Genetics, University of Bonn, School of Medicine & University Hospital Bonn, Bonn, Germany
- Susanne Schoch
- Department of Neuropathology, University of Bonn Medical Center, Bonn, Germany
- Markus M. Nöthen
- Institute of Human Genetics, University of Bonn, School of Medicine & University Hospital Bonn, Bonn, Germany
- Stefanie Heilmann-Heimbach
- Institute of Human Genetics, University of Bonn, School of Medicine & University Hospital Bonn, Bonn, Germany
9
Revolutionizing enzyme engineering through artificial intelligence and machine learning. Emerg Top Life Sci 2021; 5:113-125. [PMID: 33835131 DOI: 10.1042/etls20200257] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 12/17/2020] [Revised: 03/17/2021] [Accepted: 03/22/2021] [Indexed: 12/20/2022]
Abstract
The combinatorial space of an enzyme sequence is astronomically large, and exploring it with contemporary experimental techniques is arduous and often ineffective. Multi-objective goals, such as simultaneously improving the selectivity, solubility, and activity of an enzyme, are difficult to reach with restricted mutagenesis and combinatorial search. Traditional enzyme engineering approaches have a limited scope for complex optimization because they require a priori knowledge or impose the experimental burden of screening huge protein libraries. The recent surge in high-throughput experimental methods, including next-generation sequencing and automated screening, has flooded molecular biology with big data, which requires us to rethink our current approaches to enzyme engineering. Artificial intelligence (AI) and machine learning (ML) have great potential to revolutionize smart enzyme engineering without the explicit need for a complete understanding of the underlying molecular system. Here, we portray the role and position of AI techniques in enzyme engineering along with their scope and limitations. In addition, we explain how the traditional approaches of directed evolution and rational design can be extended through AI tools. Recent successful examples of AI-assisted enzyme engineering projects and their deviation from traditional approaches are highlighted. A comprehensive picture of current challenges and future avenues for AI in enzyme engineering is also discussed.
10
Schield DR, Pasquesi GIM, Perry BW, Adams RH, Nikolakis ZL, Westfall AK, Orton RW, Meik JM, Mackessy SP, Castoe TA. Snake Recombination Landscapes Are Concentrated in Functional Regions despite PRDM9. Mol Biol Evol 2021; 37:1272-1294. [PMID: 31926008 DOI: 10.1093/molbev/msaa003] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Indexed: 12/14/2022]
Abstract
Meiotic recombination in vertebrates is concentrated in hotspots throughout the genome. The location and stability of hotspots have been linked to the presence or absence of PRDM9, leading to two primary models for hotspot evolution derived from mammals and birds. Species with PRDM9-directed recombination have rapid turnover of hotspots concentrated in intergenic regions (i.e., mammals), whereas hotspots in species lacking PRDM9 are concentrated in functional regions and have greater stability over time (i.e., birds). Snakes possess PRDM9, yet virtually nothing is known about snake recombination. Here, we examine the recombination landscape and test hypotheses about the roles of PRDM9 in rattlesnakes. We find substantial variation in recombination rate within and among snake chromosomes, and positive correlations between recombination rate and gene density, GC content, and genetic diversity. Like mammals, snakes appear to have a functional and active PRDM9, but rather than being directed away from genes, snake hotspots are concentrated in promoters and functional regions, a pattern previously associated only with species that lack a functional PRDM9. Snakes therefore provide a unique example of recombination landscapes in which PRDM9 is functional, yet recombination hotspots are associated with functional genic regions: a combination of features that defy existing paradigms for recombination landscapes in vertebrates. Our findings also provide evidence that high recombination rates are a shared feature of vertebrate microchromosomes. Our results challenge previous assumptions about the adaptive role of PRDM9 and highlight the diversity of recombination landscape features among vertebrate lineages.
Affiliation(s)
- Drew R Schield
- Department of Biology, University of Texas at Arlington, Arlington, TX
- Blair W Perry
- Department of Biology, University of Texas at Arlington, Arlington, TX
- Richard H Adams
- Department of Biology, University of Texas at Arlington, Arlington, TX
- Department of Computer & Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL
- Richard W Orton
- Department of Biology, University of Texas at Arlington, Arlington, TX
- Jesse M Meik
- Department of Biological Sciences, Tarleton State University, Stephenville, TX
- Stephen P Mackessy
- School of Biological Sciences, University of Northern Colorado, Greeley, CO
- Todd A Castoe
- Department of Biology, University of Texas at Arlington, Arlington, TX
11
Hale MD, Parrott BB. Assessing the Ability of Developmentally Precocious Estrogen Signaling to Recapitulate Ovarian Transcriptomes and Follicle Dynamics in Alligators from a Contaminated Lake. ENVIRONMENTAL HEALTH PERSPECTIVES 2020; 128:117003. [PMID: 33186072 PMCID: PMC7665278 DOI: 10.1289/ehp6627] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/03/2019] [Revised: 10/09/2020] [Accepted: 10/16/2020] [Indexed: 06/11/2023]
Abstract
BACKGROUND Concern has grown in recent decades over anthropogenic contaminants that interfere with the functioning of endocrine hormones. However, mechanisms connecting developmental processes to pathologies associated with endocrine-disrupting chemical (EDC) exposure are poorly understood in naturally exposed populations.
OBJECTIVES We sought to a) characterize divergence in ovarian transcriptomic and follicular profiles between alligators originating from a historically EDC-contaminated site, Lake Apopka, and a reference site; b) test the ability of developmentally precocious estrogen exposure to recapitulate site-associated patterns of divergence; and c) test whether treatment with exogenous follicle-stimulating hormone (FSH) is capable of rescuing phenotypes associated with contaminant exposure and/or embryonic estrogen treatment.
METHODS Alligator eggs were collected from a contaminated site and a reference site, and a subset of eggs from the reference site was treated with estradiol (E2) during embryonic development prior to gonadal differentiation. After hatching, alligators were raised under controlled laboratory conditions for 5 months. Juveniles from both sites were divided into FSH-treated and untreated groups. Histological analyses and RNA-sequencing were conducted to characterize divergence in ovarian follicle dynamics and transcriptomes between sites, between reference and E2-treated animals, and between FSH-treated and nontreated animals.
RESULTS We observed broad site-of-origin divergence in ovarian transcriptomes and reductions in ovarian follicle density between juvenile alligators from Lake Apopka and the reference site. Treating embryos from the reference site with E2 overwhelmingly recapitulated transcriptional and histological alterations observed in Lake Apopka juveniles. Ovarian phenotypes observed in Lake Apopka alligators or resulting from estrogen treatment were only partially rescued by treatment with exogenous FSH.
DISCUSSION Recapitulation of ovarian abnormalities by precocious E2 revealed a relatively simple mechanism underlying contaminant-induced pathologies in a historical example of environmental endocrine disruption. Findings reported here support a model where the developmental timing of estrogen signaling has the potential to permanently alter ovarian organization and function. https://doi.org/10.1289/EHP6627.
Affiliation(s)
- Matthew D. Hale
- Savannah River Ecology Laboratory, Aiken, South Carolina, USA
- Odum School of Ecology, University of Georgia, Athens, Georgia, USA
- Benjamin B. Parrott
- Savannah River Ecology Laboratory, Aiken, South Carolina, USA
- Odum School of Ecology, University of Georgia, Athens, Georgia, USA
12
Carazo F, Romero JP, Rubio A. Upstream analysis of alternative splicing: a review of computational approaches to predict context-dependent splicing factors. Brief Bioinform 2020; 20:1358-1375. [PMID: 29390045 DOI: 10.1093/bib/bby005] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 10/24/2017] [Revised: 12/14/2017] [Indexed: 12/13/2022]
Abstract
Alternative splicing (AS) has been shown to play a pivotal role in the development of diseases, including cancer. Specifically, all the hallmarks of cancer (angiogenesis, cell immortality, avoiding immune system response, etc.) have been found to have a counterpart in aberrant splicing of key genes. Identifying the context-specific regulators of splicing provides valuable information for finding new biomarkers and defining alternative therapeutic strategies. The computational models that identify these regulators are not trivial and require three conceptual steps: detecting AS events, identifying the splicing factors that potentially regulate these events, and contextualizing this information for a specific experiment. In this work, we review the algorithmic methodologies developed for each of these tasks and discuss the main strengths and weaknesses of each step of the pipeline. Finally, a case study is detailed to help the reader understand the potential and limitations of this computational approach.
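The first of the three steps, detecting AS events, usually ends with a per-event quantity that later steps can compare across samples; a common choice is the percent spliced in (PSI) of an exon. A minimal sketch, assuming junction read counts for one cassette exon are already available; the function name and the simple per-junction normalization are illustrative assumptions, not the procedure of any particular tool covered by the review.

```python
def percent_spliced_in(inclusion_reads: int, exclusion_reads: int,
                       inclusion_junctions: int = 2,
                       exclusion_junctions: int = 1) -> float:
    """Percent spliced in (PSI) of a cassette exon from junction read counts.

    Reads supporting inclusion can come from two junctions (upstream and
    downstream of the exon) while reads supporting skipping come from one,
    so each count is divided by its number of junctions before the ratio.
    """
    inc = inclusion_reads / inclusion_junctions
    exc = exclusion_reads / exclusion_junctions
    if inc + exc == 0:
        raise ValueError("no informative reads for this event")
    return inc / (inc + exc)


# Hypothetical counts for one exon in two conditions.
psi_tumor = percent_spliced_in(inclusion_reads=80, exclusion_reads=10)
psi_normal = percent_spliced_in(inclusion_reads=20, exclusion_reads=40)
# Differential splicing is often summarized as delta-PSI between conditions.
delta_psi = psi_tumor - psi_normal
```

A positive delta-PSI here means the exon is included more often in the first condition; deciding whether such a shift is significant needs replicates and a statistical model, which this sketch leaves out.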
13
Fostier J. BLAMM: BLAS-based algorithm for finding position weight matrix occurrences in DNA sequences on CPUs and GPUs. BMC Bioinformatics 2020; 21:81. [PMID: 32164557 PMCID: PMC7068855 DOI: 10.1186/s12859-020-3348-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
BACKGROUND The identification of all matches of a large set of position weight matrices (PWMs) in long DNA sequences requires significant computational resources, for which a number of efficient yet complex algorithms have been proposed. RESULTS We propose BLAMM, a simple and efficient tool inspired by high performance computing techniques. The workload is expressed in terms of matrix-matrix products that are evaluated with high efficiency using optimized BLAS library implementations. The algorithm is easy to parallelize and implement on CPUs and GPUs and has a runtime that is independent of the selected p-value. In terms of single-core performance, it is competitive with state-of-the-art software for PWM matching while being much more efficient when using multithreading. Additionally, BLAMM requires negligible memory. For example, both strands of the entire human genome can be scanned for 1404 PWMs in the JASPAR database in 13 min with a p-value of 10⁻⁴ using a 36-core machine. On a dual GPU system, the same task can be performed in under 5 min. CONCLUSIONS BLAMM is an efficient tool for identifying PWM matches in large DNA sequences. Its C++ source code is available under the GNU General Public License Version 3 at https://github.com/biointec/blamm.
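The key reformulation is that scoring every window of a sequence against every PWM is one matrix-matrix product: one matrix holds the one-hot-encoded windows (one per row), the other holds the flattened PWMs (one per column). A minimal NumPy sketch of that idea, assuming all PWMs have the same length and the sequence contains only A, C, G and T; BLAMM itself is written in C++ and its exact data layout may differ. Only the forward strand is scanned; the reverse strand could be covered by also scanning the reverse complement.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as an (L, 4) 0/1 matrix with columns A, C, G, T."""
    idx = np.array([BASES.index(b) for b in seq.upper()])
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), idx] = 1.0
    return out


def scan_pwms(seq: str, pwms: np.ndarray, threshold: float):
    """Score every window of `seq` against every PWM with a single matrix product.

    pwms has shape (n_pwms, k, 4): weights of n PWMs that all have length k.
    Returns (window_start, pwm_index, score) for every score >= threshold.
    """
    n_pwms, k, _ = pwms.shape
    x = one_hot(seq)
    n_windows = len(seq) - k + 1
    # Row i is the flattened one-hot encoding of seq[i:i+k].
    windows = np.stack([x[i:i + k].ravel() for i in range(n_windows)])
    # Column j is the flattened weight matrix of PWM j (same position/base order).
    weights = pwms.reshape(n_pwms, k * 4).T
    # (n_windows, n_pwms); NumPy typically hands this product to an optimized BLAS.
    scores = windows @ weights
    # The threshold is applied only after the product, so the cost of the
    # product itself does not depend on it.
    hits = np.argwhere(scores >= threshold)
    return [(int(i), int(j), float(scores[i, j])) for i, j in hits]


# Hypothetical 3-bp PWM rewarding "ACG": +1 for the matching base, -1 otherwise.
pwm = -np.ones((1, 3, 4))
for pos, base in enumerate("ACG"):
    pwm[0, pos, BASES.index(base)] = 1.0

hits = scan_pwms("TTACGTACGA", pwm, threshold=3.0)
print(hits)  # [(2, 0, 3.0), (6, 0, 3.0)]
```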
Affiliation(s)
- Jan Fostier
- Department of Information Technology - IDLab, Ghent University - imec, Technologiepark 126, Ghent (Zwijnaarde), B-9052, Belgium.
14
Li Y, Liu Y, Juedes D, Drews F, Bunescu R, Welch L. Set cover-based methods for motif selection. Bioinformatics 2020; 36:1044-1051. [PMID: 31665223 PMCID: PMC7703758 DOI: 10.1093/bioinformatics/btz697] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/11/2018] [Revised: 08/13/2019] [Accepted: 09/13/2019] [Indexed: 11/14/2022]
Abstract
Motivation De novo motif discovery algorithms find statistically over-represented sequence motifs that may function as transcription factor binding sites. Current methods often report large numbers of motifs, making further analyses and experimental validation difficult. The motif selection problem seeks to identify a minimal set of putative regulatory motifs that characterize sequences of interest (e.g. ChIP-Seq binding regions). Results In this study, the motif selection problem is mapped to variants of the set cover problem, which are solved by tabu search and by relaxed integer linear programming (RILP). The algorithms are employed to analyze 349 ChIP-Seq experiments from the ENCODE project, yielding a small number of high-quality motifs that represent putative binding sites of primary factors and cofactors. Specifically, when compared with the motifs reported by Kheradpour and Kellis, the set cover-based algorithms produced motif sets covering 35% more peaks for 11 TFs and identified 4 more putative cofactors for 6 TFs. Moreover, a systematic evaluation using nested cross-validation revealed that the RILP algorithm selected fewer motifs and was able to cover 6% more peaks and 3% fewer background regions, which reduced the error rate by 7%. Availability and implementation The source code of the algorithms and all the datasets are available at https://github.com/YichaoOU/Set_cover_tools. Supplementary information Supplementary data are available at Bioinformatics online.
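Motif selection maps onto set cover by treating each motif as the set of peaks in which it occurs and asking for a small set of motifs whose union covers the peaks. The paper solves variants of this problem with tabu search and RILP; the sketch below swaps in the classic greedy heuristic instead, purely to illustrate the mapping. Motif names and peak IDs are hypothetical, and the background regions the paper also considers are ignored.

```python
def greedy_motif_cover(coverage: dict[str, set[int]], peaks: set[int]) -> list[str]:
    """Greedy set cover: repeatedly pick the motif covering most uncovered peaks.

    coverage maps a motif name to the set of peak IDs in which the motif occurs.
    This is the classic greedy approximation, not the tabu search or RILP
    used in the paper. Ties go to the motif listed first in `coverage`.
    """
    uncovered = set(peaks)
    selected = []
    while uncovered:
        best = max(coverage, key=lambda m: len(coverage[m] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:  # the remaining peaks contain none of the motifs
            break
        selected.append(best)
        uncovered -= gain
    return selected


# Hypothetical motif occurrences across six peaks.
coverage = {
    "motif_A": {1, 2, 3, 4},
    "motif_B": {3, 4, 5},
    "motif_C": {5, 6},
    "motif_D": {1, 6},
}
selected = greedy_motif_cover(coverage, peaks={1, 2, 3, 4, 5, 6})
print(selected)  # ['motif_A', 'motif_C']
```

Greedy selection is fast but can return more motifs than an optimal cover, which is one motivation for the exact-leaning formulations the paper uses.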
Affiliation(s)
- Yichao Li
- Department of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, USA
- Yating Liu
- Department of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, USA
- David Juedes
- Department of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, USA
- Frank Drews
- Department of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, USA
- Razvan Bunescu
- Department of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, USA
- Lonnie Welch
- Department of Electrical Engineering and Computer Science, Ohio University, Athens, OH 45701, USA
15
Shen Y, Pan X, Yang J. Gene regulation and prognostic indicators of lung squamous cell carcinoma: TCGA-derived miRNA/mRNA sequencing and DNA methylation data. J Cell Physiol 2019; 234:22896-22910. [PMID: 31169310 DOI: 10.1002/jcp.28852] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/07/2018] [Revised: 03/15/2019] [Accepted: 05/01/2019] [Indexed: 11/07/2022]
Abstract
Lung squamous cell carcinoma (LSCC) is a common cancer worldwide, and this study aimed to investigate the key regulatory networks and prognostic indicators of LSCC. MicroRNA (miRNA)/messenger RNA (mRNA) sequencing and DNA methylation data were obtained from The Cancer Genome Atlas. Differentially expressed miRNAs (DEmiRNAs) and genes (DEGs) were identified with the limma package. Then, the transcription factors (TFs) of DEmiRNAs/DEGs, as well as the targets of miRNAs, were predicted with the TFmiR online tool. Aberrant methylation in TF binding sites (TFBSs) within promoters was detected using the t test. Finally, integrated network and survival analyses were conducted using SPSS software. We obtained 104 DEmiRNAs and 4,491 DEGs and validated 2,113 DEGs (VDEGs). Then, 103 TFs, 295 TFs, and 14 DEmiRNAs were predicted to target 95 DEmiRNAs, 821 DEGs, and 283 DEGs, respectively. After TF-DEmiRNA/DEG and TF-DEmiRNA-DEG networks were constructed (e.g., E2F1-CDC25A, miR-29a-RAN, miR-326-TBL1XR1), five feedforward loops between ZEB1 and miR-141/200a/200b/200c/429 were found. Furthermore, the VDEGs CDC25A, RAN, and TBL1XR1, as well as miR-130b and miR-590, were negatively correlated with survival rates. E2F1-CDC25A, miR-29a-RAN, miR-326-TBL1XR1, and the feedforward loops between ZEB1/ZEB2 and miR-141/200a/200b/200c/429 might participate in LSCC development. Compared with BEAS-2B cells, SK-MES-1 cells showed higher expression of miR-141, miR-200a, miR-200b, and miR-200c but lower expression of ZEB1. Overexpression of miR-200c significantly attenuated the expression of ZEB1 and ZEB2 and inhibited the proliferation and migration of SK-MES-1 cells (all p < 0.05). In addition, CDC25A, miR-200a, miR-200b, miR-200c, miR-130b, and miR-590 are potential prognostic indicators of LSCC.
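In network-motif terms, a feedforward loop is a three-node pattern in which a regulator X controls Y and Z while Y also controls Z; in TF-miRNA networks, X, Y and Z can be TFs, miRNAs or genes. The study built its networks with TFmiR; the sketch below only shows how such triples can be enumerated from a directed edge list, and all node names are hypothetical.

```python
from collections import defaultdict


def feedforward_loops(edges):
    """Return sorted (x, y, z) triples with edges x->y, x->z and y->z."""
    targets = defaultdict(set)
    for src, dst in edges:
        targets[src].add(dst)
    loops = []
    for x in list(targets):
        for y in targets[x]:
            # z must be regulated both directly by x and via y.
            for z in targets.get(y, ()):
                if z in targets[x] and len({x, y, z}) == 3:
                    loops.append((x, y, z))
    return sorted(loops)


# Hypothetical regulatory edges (regulator -> target).
edges = [("TF1", "miR1"), ("TF1", "gene1"), ("miR1", "gene1"), ("miR1", "gene2")]
loops = feedforward_loops(edges)
print(loops)  # [('TF1', 'miR1', 'gene1')]
```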
Affiliation(s)
- Yuzhou Shen
- Department of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiaotong University, Shanghai, China
- Xufeng Pan
- Department of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiaotong University, Shanghai, China
- Jun Yang
- Department of Thoracic Surgery, Shanghai Chest Hospital, Shanghai Jiaotong University, Shanghai, China
16
Schield DR, Card DC, Hales NR, Perry BW, Pasquesi GM, Blackmon H, Adams RH, Corbin AB, Smith CF, Ramesh B, Demuth JP, Betrán E, Tollis M, Meik JM, Mackessy SP, Castoe TA. The origins and evolution of chromosomes, dosage compensation, and mechanisms underlying venom regulation in snakes. Genome Res 2019; 29:590-601. [PMID: 30898880 PMCID: PMC6442385 DOI: 10.1101/gr.240952.118] [Citation(s) in RCA: 96] [Impact Index Per Article: 19.2] [Received: 06/21/2018] [Accepted: 02/15/2019] [Indexed: 01/28/2023]
Abstract
Here we use a chromosome-level genome assembly of a prairie rattlesnake (Crotalus viridis), together with Hi-C, RNA-seq, and whole-genome resequencing data, to study key features of genome biology and evolution in reptiles. We identify the rattlesnake Z Chromosome, including the recombining pseudoautosomal region, and find evidence for partial dosage compensation driven by an evolutionary accumulation of a female-biased up-regulation mechanism. Comparative analyses with other amniotes provide new insight into the origins, structure, and function of reptile microchromosomes, which we demonstrate have markedly different structure and function compared to macrochromosomes. Snake microchromosomes are also enriched for venom genes, which we show have evolved through multiple tandem duplication events in multiple gene families. By overlaying chromatin structure information and gene expression data, we find evidence for venom gene-specific chromatin contact domains and identify how chromatin structure guides precise expression of multiple venom gene families. Further, we find evidence for venom gland-specific transcription factor activity and characterize a complement of mechanisms underlying venom production and regulation. Our findings reveal novel and fundamental features of reptile genome biology, provide insight into the regulation of snake venom, and broadly highlight the biological insight enabled by chromosome-level genome assemblies.
Affiliation(s)
- Drew R Schield
- Department of Biology, University of Texas at Arlington, Arlington, Texas 76010, USA
- Daren C Card
- Department of Biology, University of Texas at Arlington, Arlington, Texas 76010, USA
- Nicole R Hales
- Department of Biology, University of Texas at Arlington, Arlington, Texas 76010, USA
- Blair W Perry
- Department of Biology, University of Texas at Arlington, Arlington, Texas 76010, USA
- Giulia M Pasquesi
- Department of Biology, University of Texas at Arlington, Arlington, Texas 76010, USA
- Heath Blackmon
- Department of Biology, Texas A&M University, College Station, Texas 77843, USA
- Richard H Adams
- Department of Biology, University of Texas at Arlington, Arlington, Texas 76010, USA
- Andrew B Corbin
- Department of Biology, University of Texas at Arlington, Arlington, Texas 76010, USA
- Cara F Smith
- School of Biological Sciences, University of Northern Colorado, Greeley, Colorado 80639, USA
- Balan Ramesh
- Department of Biology, University of Texas at Arlington, Arlington, Texas 76010, USA
- Jeffery P Demuth
- Department of Biology, University of Texas at Arlington, Arlington, Texas 76010, USA
- Esther Betrán
- Department of Biology, University of Texas at Arlington, Arlington, Texas 76010, USA
- Marc Tollis
- School of Life Sciences, Arizona State University, Tempe, Arizona 85287, USA
- Jesse M Meik
- Department of Biological Sciences, Tarleton State University, Stephenville, Texas 76402, USA
- Stephen P Mackessy
- School of Biological Sciences, University of Northern Colorado, Greeley, Colorado 80639, USA
- Todd A Castoe
- Department of Biology, University of Texas at Arlington, Arlington, Texas 76010, USA
17
Bioinformatics Approaches to Gain Insights into cis-Regulatory Motifs Involved in mRNA Localization. Adv Exp Med Biol 2019; 1203:165-194. [PMID: 31811635 DOI: 10.1007/978-3-030-31434-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/21/2022]
Abstract
Messenger RNA (mRNA) is a fundamental intermediate in the expression of proteins. As an integral part of this important process, protein production can be localized by targeting mRNA to a specific subcellular compartment. The subcellular destination of an mRNA is thought to be governed by a region of its primary sequence or secondary structure, which in turn dictates the recruitment of trans-acting factors, such as RNA-binding proteins or regulatory RNAs, to form a messenger ribonucleoprotein particle. This molecular ensemble is required for precise spatiotemporal control of gene expression. In the context of RNA localization, the binding preferences of an RNA-binding protein define a motif, and one or more instances of a given motif constitute a localization element (zip code). In this chapter, we first discuss the cis-regulatory motifs previously identified as mRNA localization elements. We then describe motif representation in terms of entropy and information content and offer an overview of motif databases and search algorithms. Finally, we outline the motif topology of asymmetrically localized mRNA molecules.
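For a motif position with letter frequencies p_b over a four-letter alphabet, the Shannon entropy is H = -Σ p_b log2 p_b, and the information content under a uniform background is IC = log2 4 - H = 2 - H bits: a fully conserved position carries 2 bits and a uniformly random one carries 0. A minimal sketch computing per-position IC from aligned RNA sites; the example sites are hypothetical, and the small-sample correction often applied in practice is omitted.

```python
import math
from collections import Counter


def column_information(sites: list[str], alphabet: str = "ACGU") -> list[float]:
    """Per-position information content (bits) of aligned, equal-length sites.

    IC = log2(|alphabet|) - H, where H = -sum(p * log2 p) is the Shannon
    entropy of the observed letter frequencies at that position. Assumes a
    uniform background; no small-sample correction is applied.
    """
    max_bits = math.log2(len(alphabet))
    ic = []
    for column in zip(*sites):
        counts = Counter(column)
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(max_bits - entropy)
    return ic


# Hypothetical aligned localization-element instances.
sites = ["ACGU", "ACGA", "ACUA", "ACUC"]
ic = column_information(sites)
print(ic)  # [2.0, 2.0, 1.0, 0.5]
```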
18
Dempster-Shafer Theory for the Prediction of Auxin-Response Elements (AuxREs) in Plant Genomes. Biomed Res Int 2018; 2018:3837060. [PMID: 30515394 PMCID: PMC6236769 DOI: 10.1155/2018/3837060] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/17/2018] [Accepted: 10/15/2018] [Indexed: 11/17/2022]
Abstract
Auxin is a major regulator of plant growth and development; its action involves transcriptional activation. Identifying auxin-response elements (AuxREs) is one of the most important steps toward understanding auxin regulation of gene expression. Over the past few years, a large number of motif identification tools have been developed. Despite considerable effort by computational biologists, building reliable models to predict regulatory elements remains a difficult challenge. In this work, we propose a data fusion approach for predicting AuxREs. Our method is based on the combined use of Dempster-Shafer evidence theory and fuzzy theory. To evaluate our model, we used it to scan the DORNRÖSCHEN promoter. All proven AuxREs present in the promoter were detected, and at a threshold of 0.9 there were no false positives. Comparing the results of our model with those of several previous motif-finding tools shows that our model predicts AuxREs more successfully than the other tools and produces fewer false positives. Comparing results before and after combination shows the importance of Dempster-Shafer combination in reducing false positives and improving the reliability of prediction. For an overall evaluation, we compared the performance of our approach with that of other methods: the data fusion method had the highest sensitivity (Sn) and positive predictive value (PPV).
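Dempster's rule of combination fuses two mass functions m1 and m2 defined over subsets of a frame of discernment: the combined mass of a set A is the sum of m1(B)·m2(C) over all pairs with B ∩ C = A, divided by 1 - K, where K is the total mass of pairs whose intersection is empty (the conflict). A minimal sketch with the frame {AuxRE, not_AuxRE} and two hypothetical evidence sources; the masses are made up, and the fuzzy-theory component of the paper is not reproduced.

```python
from itertools import product


def dempster_combine(m1: dict[frozenset, float],
                     m2: dict[frozenset, float]) -> dict[frozenset, float]:
    """Combine two mass functions with Dempster's rule.

    Keys are focal sets (frozensets of hypotheses); values are masses summing
    to 1. Mass from pairs with an empty intersection (conflict K) is dropped
    and the remaining masses are renormalized by 1 - K.
    """
    combined: dict[frozenset, float] = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        a = b & c
        if a:
            combined[a] = combined.get(a, 0.0) + mb * mc
        else:
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {a: m / (1.0 - conflict) for a, m in combined.items()}


SITE, NOT_SITE = frozenset({"AuxRE"}), frozenset({"not_AuxRE"})
EITHER = SITE | NOT_SITE  # mass on the whole frame expresses ignorance

# Hypothetical masses from two evidence sources for one candidate site.
m_source1 = {SITE: 0.6, NOT_SITE: 0.1, EITHER: 0.3}
m_source2 = {SITE: 0.5, NOT_SITE: 0.2, EITHER: 0.3}
fused = dempster_combine(m_source1, m_source2)
```

Here the two sources largely agree, so the mass on AuxRE rises to about 0.76 after combination, above what either source assigns on its own.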
19
Krystkowiak I, Manguy J, Davey NE. PSSMSearch: a server for modeling, visualization, proteome-wide discovery and annotation of protein motif specificity determinants. Nucleic Acids Res 2018; 46:W235-W241. [PMID: 29873773 PMCID: PMC6030969 DOI: 10.1093/nar/gky426] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 02/05/2018] [Revised: 04/11/2018] [Accepted: 05/15/2018] [Indexed: 11/29/2022]
Abstract
There is a pressing need for in silico tools that can aid in the identification of the complete repertoire of protein binding (SLiMs, MoRFs, miniMotifs) and modification (moiety attachment/removal, isomerization, cleavage) motifs. We have created PSSMSearch, an interactive web-based tool for rapid statistical modeling, visualization, discovery and annotation of protein motif specificity determinants to discover novel motifs in a proteome-wide manner. PSSMSearch analyses proteomes for regions with significant similarity to a motif specificity determinant model built from a set of aligned motif-containing peptides. Multiple scoring methods are available to build a position-specific scoring matrix (PSSM) describing the motif specificity determinant model. This model can then be modified by a user to add prior knowledge of specificity determinants through an interactive PSSM heatmap. PSSMSearch includes a statistical framework to calculate the significance of specificity determinant model matches against a proteome of interest. PSSMSearch also includes the SLiMSearch framework's annotation, motif functional analysis and filtering tools to highlight relevant discriminatory information. Additional tools to annotate statistically significant shared keywords and GO terms, or experimental evidence of interaction with a motif-recognizing protein have been added. Finally, PSSM-based conservation metrics have been created for taxonomic range analyses. The PSSMSearch web server is available at http://slim.ucd.ie/pssmsearch/.
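One of the simplest ways to turn a set of aligned motif-containing peptides into a PSSM is a log-odds matrix with pseudocounts. The sketch below builds one under a uniform amino-acid background and scores a peptide by summing its per-position weights; the pseudocount of 1, the uniform background and the example peptides are assumptions, and PSSMSearch's own scoring methods and significance framework are not reproduced.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, pseudocount=1.0, background=None):
    """Log-odds PSSM (bits) from aligned, equal-length peptides.

    Each cell is log2(p / q): p = (count + pseudocount * q) / (n + pseudocount)
    blends observed counts with the background frequency q, so residues never
    seen at a position still get a finite (negative) weight.
    """
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    n = len(peptides)
    pssm = []
    for column in zip(*peptides):
        row = {}
        for aa, q in background.items():
            p = (column.count(aa) + pseudocount * q) / (n + pseudocount)
            row[aa] = math.log2(p / q)
        pssm.append(row)
    return pssm


def score(pssm, peptide):
    """Sum of per-position log-odds; higher means closer to the motif model."""
    return sum(row[aa] for row, aa in zip(pssm, peptide))


# Hypothetical aligned motif instances.
peptides = ["LPPLP", "LPPIP", "MPPLP", "LPPVP"]
pssm = build_pssm(peptides)
print(score(pssm, "LPPLP") > score(pssm, "GGGGG"))  # True
```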
Affiliation(s)
- Izabella Krystkowiak
- Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland
- UCD School of Medicine & Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
- Jean Manguy
- Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland
- UCD School of Medicine & Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
- Food for Health Ireland, University College Dublin, Belfield, Dublin 4, Ireland
- Norman E Davey
- Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland
- UCD School of Medicine & Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
20
Gao L, Bao W, Zhang H, Yuan CA, Huang DS. Fast sequence analysis based on diamond sampling. PLoS One 2018; 13:e0198922. [PMID: 29953448 PMCID: PMC6023231 DOI: 10.1371/journal.pone.0198922] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/27/2018] [Accepted: 05/29/2018] [Indexed: 12/02/2022]
Abstract
In both DNA and protein sequences, position weight matrices (PWMs) are an important way to model motifs. As sequencing technology advances, the volume of sequence data is growing explosively, so faster search algorithms are needed to keep up with demand. In this paper, we propose a method for speeding up the search for candidate transcription factor binding sites (TFBSs) that lets users specify a p-value threshold to obtain the desired trade-off between speed and sensitivity for a particular sequence analysis. Moreover, the proposed method can also be generalized to large-scale annotation and sequencing projects.
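Letting users pick a p-value threshold requires translating that p-value into a score cutoff for the PWM. The diamond-sampling speedup itself is not reproduced here; instead, a minimal sketch of a standard way to obtain the cutoff: round the weights to integers, build the exact score distribution of a random window under an i.i.d. background by dynamic programming, and take the lowest score whose upper-tail probability stays within p. The uniform background, the scale factor of 100 and the toy matrix are assumptions.

```python
from collections import defaultdict


def score_threshold(pwm, p_value, background=(0.25, 0.25, 0.25, 0.25), scale=100):
    """Lowest score t with P(score >= t) <= p_value for a random window.

    pwm: list of k rows, each with 4 weights (A, C, G, T).
    Weights are scaled and rounded to integers so that the exact null
    distribution of window scores can be built position by position.
    Returns the cutoff in the original weight units (inf if none qualifies).
    """
    dist = {0: 1.0}  # integer score -> probability, after 0 positions
    for row in pwm:
        nxt = defaultdict(float)
        for s, prob in dist.items():
            for w, q in zip(row, background):
                nxt[s + round(w * scale)] += prob * q
        dist = nxt
    tail, threshold = 0.0, None
    for s in sorted(dist, reverse=True):  # accumulate the upper tail
        if tail + dist[s] > p_value:
            break
        tail += dist[s]
        threshold = s
    return float("inf") if threshold is None else threshold / scale


# Hypothetical 3-position PWM rewarding A, C, G: +1 for a match, -1 otherwise.
pwm = [[1, -1, -1, -1], [-1, 1, -1, -1], [-1, -1, 1, -1]]
print(score_threshold(pwm, 0.02))  # 3.0
print(score_threshold(pwm, 0.2))   # 1.0
```

A stricter p-value gives a higher cutoff and fewer reported sites, which is the speed/sensitivity trade-off the abstract refers to.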
Affiliation(s)
- Liangxin Gao
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
- Wenzhen Bao
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
- Hongbo Zhang
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
- Chang-An Yuan
- Science Computing and Intelligent Information Processing of GuangXi Higher Education Key Laboratory, Guangxi Teachers Education University, Nanning, Guangxi, China
- De-Shuang Huang
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
21
Improving Gene Regulatory Network Inference by Incorporating Rates of Transcriptional Changes. Sci Rep 2017; 7:17244. [PMID: 29222512 PMCID: PMC5722905 DOI: 10.1038/s41598-017-17143-1] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 08/31/2017] [Accepted: 11/22/2017] [Indexed: 11/18/2022]
Abstract
Organisms respond to changes in their environment through transcriptional regulatory networks (TRNs). The regulatory hierarchy of these networks can be inferred from expression data. Computational approaches to identify TRNs can be applied in any species where quality RNA can be acquired. However, ChIP-Seq and similar validation methods are challenging to employ in non-model species. Improving the accuracy of computational inference methods can significantly reduce the cost and time of subsequent validation experiments. We have developed ExRANGES, an approach that improves the ability to computationally infer TRNs from time series expression data. ExRANGES utilizes both the rate of change in expression and the absolute expression level to identify TRN connections. We evaluated ExRANGES in five data sets from different model systems. ExRANGES improved the identification of experimentally validated transcription factor targets for all species tested, even in unevenly spaced and sparse data sets. This improved ability to predict known regulator-target relationships enhances the utility of network inference approaches in non-model species where experimental validation is challenging. We integrated ExRANGES with two different network construction approaches and implemented it as an R package available at http://github.com/DohertyLab/ExRANGES. To install the package, run devtools::install_github("DohertyLab/ExRANGES") in R.
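ExRANGES' own weighting scheme is described in the paper and is not reproduced here. As a minimal sketch of the underlying ingredient only, the code below computes per-interval rates of change that account for uneven sampling and compares a regulator with a candidate target by correlating those rate profiles, alongside the plain expression levels. The time points, gene profiles and the choice of Pearson correlation are hypothetical.

```python
def rates_of_change(times, values):
    """Slope between consecutive samples: (v[i+1] - v[i]) / (t[i+1] - t[i]).

    Dividing by the time step keeps rates comparable when sampling is uneven.
    """
    return [(values[i + 1] - values[i]) / (times[i + 1] - times[i])
            for i in range(len(times) - 1)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical, unevenly spaced time course (hours).
times = [0, 1, 3, 4, 8]
regulator = [1.0, 3.0, 4.0, 2.0, 1.0]
target = [2.0, 5.0, 7.0, 3.0, 2.5]

level_similarity = pearson(regulator, target)
rate_similarity = pearson(rates_of_change(times, regulator),
                          rates_of_change(times, target))
```

Comparing rates rather than levels emphasizes coordinated changes between time points, which is the intuition the abstract describes; how the two signals are weighted in practice is a design decision left to the actual method.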
22
Spadafore M, Najarian K, Boyle AP. A proximity-based graph clustering method for the identification and application of transcription factor clusters. BMC Bioinformatics 2017; 18:530. [PMID: 29187152 PMCID: PMC5706350 DOI: 10.1186/s12859-017-1935-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/26/2017] [Accepted: 11/14/2017] [Indexed: 01/05/2023]
Abstract
BACKGROUND Transcription factors (TFs) form a complex regulatory network within the cell that is crucial to cell functioning and human health. While methods to determine where a TF binds to DNA are well established, these methods provide no information about how TFs interact with one another when they do bind. TFs tend to bind the genome in clusters, and current methods to identify these clusters are either limited in scope, unable to detect relationships beyond motif similarity, or not applied to TF-TF interactions. METHODS Here, we present a proximity-based graph clustering approach to identify TF clusters using either ChIP-seq or motif search data. We use TF co-occurrence to construct a filtered, normalized adjacency matrix and use the Markov Clustering Algorithm to partition the graph while maintaining TF-cluster and cluster-cluster interactions. We then apply our graph structure beyond clustering, using it to increase the accuracy of motif-based TFBS searching for an example TF. RESULTS We show that our method produces small, manageable clusters that encapsulate many known, experimentally validated transcription factor interactions and that our method is capable of capturing interactions that motif similarity methods might miss. Our graph structure is able to significantly increase the accuracy of motif TFBS searching, demonstrating that the TF-TF connections within the graph correlate with biological TF-TF interactions. CONCLUSION The interactions identified by our method correspond to biological reality and allow for fast exploration of TF clustering and regulatory dynamics.
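The clustering step relies on the Markov Clustering Algorithm (MCL), which alternates expansion (multiplying a column-stochastic matrix by itself) with inflation (raising entries to a power and renormalizing columns) until flow concentrates inside clusters. A minimal pure-Python sketch, assuming a small symmetric co-occurrence matrix is already built; the paper's filtering and normalization are not reproduced, and the inflation value, iteration count, added self-loops and 1e-6 cutoff are illustrative choices.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, iterations=50):
    """Basic MCL on a symmetric, non-negative adjacency matrix (lists of lists)."""
    n = len(adjacency)
    # Add self-loops so every node keeps some flow on itself.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                                     # expansion
        m = normalize_columns([[x ** inflation for x in row] for row in m])  # inflation
    # Rows that still carry flow are attractors; the columns they hold flow
    # for are the members of that attractor's cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Hypothetical co-occurrence graph: two groups of three TFs joined by edge 2-3.
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = markov_cluster(adjacency)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
```

Plain lists keep the sketch dependency-free; a genome-scale TF graph would call for sparse matrices and pruning of small entries between iterations.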
Affiliation(s)
- Maxwell Spadafore
- University of Michigan Medical School, 1301 Catherine, Ann Arbor, 48109-5624 USA
- Kayvan Najarian
- University of Michigan Department of Computational Medicine and Bioinformatics, 100 Washtenaw Avenue, Ann Arbor, 48109 USA
- University of Michigan Medical School Department of Emergency Medicine, 1500 E Medical Center Drive, Ann Arbor, 48109 USA
- Alan P. Boyle
- University of Michigan Department of Computational Medicine and Bioinformatics, 100 Washtenaw Avenue, Ann Arbor, 48109 USA
- University of Michigan Department of Genetics, 1241 E Catherine, Ann Arbor, 48109 USA
23
Rice ES, Kohno S, John JS, Pham S, Howard J, Lareau LF, O'Connell BL, Hickey G, Armstrong J, Deran A, Fiddes I, Platt RN, Gresham C, McCarthy F, Kern C, Haan D, Phan T, Schmidt C, Sanford JR, Ray DA, Paten B, Guillette LJ, Green RE. Improved genome assembly of American alligator genome reveals conserved architecture of estrogen signaling. Genome Res 2017; 27:686-696. [PMID: 28137821 PMCID: PMC5411764 DOI: 10.1101/gr.213595.116] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 08/01/2016] [Accepted: 12/13/2016] [Indexed: 12/12/2022]
Abstract
The American alligator, Alligator mississippiensis, like all crocodilians, has temperature-dependent sex determination, in which the sex of an embryo is determined by the incubation temperature of the egg during a critical period of development. The lack of genetic differences between male and female alligators leaves open the question of how the genes responsible for sex determination and differentiation are regulated. Insight into this question comes from the fact that exposing an embryo incubated at a male-producing temperature to estrogen causes it to develop ovaries. Because estrogen response elements are known to regulate genes over long distances, a contiguous genome assembly is crucial for predicting and understanding their impact. We present an improved assembly of the American alligator genome, scaffolded with in vitro proximity ligation (Chicago) data. We use this assembly to scaffold two other crocodilian genomes based on synteny. We perform RNA sequencing of tissues from American alligator embryos to find genes that are differentially expressed between embryos incubated at male- versus female-producing temperatures. Finally, we use the improved contiguity of our assembly along with the current model of CTCF-mediated chromatin looping to predict regions of the genome likely to contain estrogen-responsive genes. We find that these regions are significantly enriched for genes with female-biased expression in developing gonads after the critical period during which sex is determined by incubation temperature. We thus conclude that estrogen signaling is a major driver of female-biased gene expression in gonads after the temperature-sensitive period.
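One simplified reading of the CTCF looping model is that CTCF sites split a chromosome into insulated domains and a regulatory element acts only on genes within its own domain. The sketch below applies that rule to estrogen response elements (EREs) on a single chromosome using binary search; the coordinates, gene names and the same-domain rule are assumptions for illustration, not the paper's exact procedure.

```python
from bisect import bisect_right


def domain_index(position, ctcf_sites):
    """Index of the CTCF-bounded interval containing `position` (sites sorted)."""
    return bisect_right(ctcf_sites, position)


def genes_sharing_domain_with_ere(genes, ere_positions, ctcf_sites):
    """Genes whose TSS lies in a CTCF-bounded domain that also contains an ERE.

    genes maps a gene name to its TSS coordinate; all coordinates refer to
    the same chromosome.
    """
    sites = sorted(ctcf_sites)
    ere_domains = {domain_index(p, sites) for p in ere_positions}
    return sorted(g for g, tss in genes.items()
                  if domain_index(tss, sites) in ere_domains)


# Hypothetical coordinates on one scaffold.
ctcf = [10_000, 50_000, 120_000]
genes = {"geneA": 20_000, "geneB": 45_000, "geneC": 80_000, "geneD": 130_000}
eres = [30_000, 140_000]
candidates = genes_sharing_domain_with_ere(genes, eres, ctcf)
print(candidates)  # ['geneA', 'geneB', 'geneD']
```

This is also why assembly contiguity matters here: if a scaffold breaks inside a domain, an ERE and its target gene can end up on different scaffolds and the link is lost.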
Affiliation(s)
- Edward S Rice
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Satomi Kohno
- Department of Biology, St. Cloud State University, St. Cloud, Minnesota 56301, USA
- John St John
- Driver Group, LLC, San Francisco, California 94158, USA
- Son Pham
- BioTuring, Incorporated, San Diego, California 92121, USA
- Jonathan Howard
- Department of Biochemistry, Stanford University, Stanford, California 94305, USA
- Liana F Lareau
- California Institute for Quantitative Biosciences, University of California, Berkeley, California 94720, USA
- Brendan L O'Connell
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Dovetail Genomics, LLC, Santa Cruz, California 95060, USA
- Glenn Hickey
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Joel Armstrong
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Alden Deran
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Ian Fiddes
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Roy N Platt
- Department of Biological Sciences, Texas Tech University, Lubbock, Texas 79409, USA
- Cathy Gresham
- Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, Mississippi State, Mississippi 39762, USA
- Fiona McCarthy
- School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, Arizona 85721, USA
- Colin Kern
- Department of Animal Science, University of California, Davis, California 95616, USA
- David Haan
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Tan Phan
- HCM University of Science, Ho Chí Minh, Vietnam 748500
- Carl Schmidt
- Department of Animal and Food Sciences, University of Delaware, Newark, Delaware 19717, USA
- Jeremy R Sanford
- Department of Molecular, Cell, and Developmental Biology, University of California, Santa Cruz, California 95064, USA
- David A Ray
- Department of Biological Sciences, Texas Tech University, Lubbock, Texas 79409, USA
- Benedict Paten
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
- Louis J Guillette
- Department of Obstetrics and Gynecology, Marine Biomedicine and Environmental Science Center, Hollings Marine Laboratory, Medical University of South Carolina, Charleston, South Carolina 29412, USA
- Richard E Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- California Institute for Quantitative Biosciences, University of California, Berkeley, California 94720, USA
- Dovetail Genomics, LLC, Santa Cruz, California 95060, USA
24
Jayaram N, Usvyat D, Martin ACR. Evaluating tools for transcription factor binding site prediction. BMC Bioinformatics 2016; 17:547. [PMID: 27806697 PMCID: PMC6889335 DOI: 10.1186/s12859-016-1298-9] [Received: 06/19/2016] [Accepted: 10/20/2016] [Indexed: 12/21/2022]
Abstract
Background: Binding of transcription factors to transcription factor binding sites (TFBSs) is key to the mediation of transcriptional regulation. Information on experimentally validated functional TFBSs is limited, and consequently there is a need for accurate prediction of TFBSs for gene annotation and for applications such as evaluating the effects of single nucleotide variations in causing disease. TFBSs are generally recognized by scanning a position weight matrix (PWM) against DNA using one of a number of available computer programs. We therefore set out to evaluate the best tools that can be used locally (and are therefore suitable for large-scale analyses) for creating PWMs from high-throughput ChIP-Seq data and for scanning them against DNA. Results: Using ENCODE ChIP-Seq data, we evaluated a set of de novo motif discovery tools that could be downloaded and installed locally, and showed that rGADEM was the best-performing tool. TFBS prediction tools used to scan PWMs against DNA fall into two classes: those that predict individual TFBSs and those that identify clusters. Our evaluation showed that FIMO and MCAST, respectively, performed best in these two classes. Conclusions: Selecting the best-performing tools for generating PWMs from ChIP-Seq data and for scanning PWMs against DNA has the potential to improve prediction of precise transcription factor binding sites within regions identified by ChIP-Seq experiments, for gene finding, for understanding regulation and for evaluating the effects of single nucleotide variations in causing disease.
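To make the idea of "scanning a PWM against DNA" concrete, here is a minimal Python sketch. It assumes a log2-odds PWM built from a position frequency matrix with a uniform background and a pseudocount, scores only the forward strand, and uses made-up motif counts; it is not how FIMO or MCAST are implemented (FIMO, for instance, also converts scores to p-values). The hypothetical `snv_delta` helper compares the best window score of a reference and an alternative sequence, which is one simple way to ask whether an SNV changes a predicted binding site.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, background=None, pseudocount=0.25):
    """Convert a position frequency matrix (list of {base: count} dicts)
    into a log2-odds position weight matrix."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of an uppercase ACGT sequence (forward strand only).
    Returns a list of (start, score) pairs."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        hits.append((start, sum(pwm[i][base] for i, base in enumerate(window))))
    return hits


def snv_delta(ref_seq, alt_seq, pwm):
    """Hypothetical helper: best alternative score minus best reference score."""
    best = lambda seq: max(score for _, score in scan(seq, pwm))
    return best(alt_seq) - best(ref_seq)


# Illustrative 3-bp motif with hypothetical counts that strongly prefer "TGA".
counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
pwm = log_odds_pwm(counts)
ref_hits = scan("CCTGACC", pwm)
delta = snv_delta("CCTGACC", "CCAGACC", pwm)  # T>A inside the site
```

The best window in the reference starts at offset 2 ("TGA"), and the T>A substitution lowers the best achievable score, so `delta` is negative. A production scanner would also score the reverse complement and attach a significance estimate to each score.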
Affiliation(s)
- Narayan Jayaram
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
- Daniel Usvyat
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
- Andrew C R Martin
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
25
O'Neill PK, Erill I. Parametric bootstrapping for biological sequence motifs. BMC Bioinformatics 2016; 17:406. [PMID: 27716039 PMCID: PMC5052923 DOI: 10.1186/s12859-016-1246-8] [Received: 02/20/2016] [Accepted: 09/08/2016] [Indexed: 11/10/2022]
Abstract
Background: Biological sequence motifs drive the specific interactions of proteins and nucleic acids. Accordingly, the effective computational discovery and analysis of such motifs is a central theme in bioinformatics. Many practical questions about the properties of motifs can be recast as random sampling problems. In this light, the task is to determine for a given motif whether a certain feature of interest is statistically unusual among relevantly similar alternatives. Despite the generality of this framework, its use has been frustrated by the difficulties of defining an appropriate reference class of motifs for comparison and of sampling from it effectively. Results: We define two distributions over the space of all motifs of a given dimension. The first is the maximum entropy distribution subject to mean information content, and the second is the truncated uniform distribution over all motifs having information content within a given interval. We derive exact sampling algorithms for each. As a proof of concept, we employ these sampling methods to analyze a broad collection of prokaryotic and eukaryotic transcription factor binding site motifs. In addition to positional information content, we consider the informational Gini coefficient (IGC) of the motif, a measure of the degree to which information is evenly distributed throughout a motif’s positions. We find that both prokaryotic and eukaryotic motifs tend to exhibit higher IGCs than would be expected by chance under either reference distribution. As a second application, we apply maximum entropy sampling to the motif p-value problem and use it to give elementary derivations of two new estimators. Conclusions: Despite the historical centrality of biological sequence motif analysis, this study constitutes, to our knowledge, the first use of principled null hypotheses for sequence motifs given information content.
Through their use, we are able to characterize for the first time differences in global motif statistics between biological motifs and their null distributions. In particular, we observe that biological sequence motifs show an unusual distribution of IGC, presumably due to biochemical constraints on the mechanisms of direct read-out.
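The informational Gini coefficient mentioned above can be illustrated with a short Python sketch. It assumes that a DNA motif is given as per-column base counts, that each column's information content is 2 bits minus its Shannon entropy (no small-sample correction), and that the IGC is the ordinary Gini coefficient of those per-column values computed from the mean absolute difference. The paper's exact definition may differ in detail, and the example motifs are hypothetical.

```python
import math


def column_information(column):
    """Information content in bits of one motif column (counts for A, C, G, T),
    computed as 2 minus the Shannon entropy; no small-sample correction."""
    total = sum(column)
    entropy = 0.0
    for count in column:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return 2.0 - entropy


def gini(values):
    """Gini coefficient via the mean absolute difference:
    G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean)."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    diff_sum = sum(abs(a - b) for a in values for b in values)
    return diff_sum / (2 * n * n * mean)


def informational_gini(motif):
    """motif: list of columns, each a list of 4 counts (A, C, G, T)."""
    return gini([column_information(col) for col in motif])


# Hypothetical motifs: information spread evenly vs. concentrated in two columns.
even_motif = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0], [0, 0, 0, 10]]
uneven_motif = [[10, 0, 0, 0], [0, 10, 0, 0], [5, 5, 5, 5], [5, 5, 5, 5]]
```

For `even_motif` every column carries 2 bits, so the coefficient is 0; for `uneven_motif` the per-column values are 2, 2, 0 and 0 bits, giving a coefficient of 0.5. A higher value therefore means the information is concentrated in fewer positions.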
Affiliation(s)
- Patrick K O'Neill
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, 21250, US
- Ivan Erill
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, 21250, US
26
Hacisuleyman E, Shukla CJ, Weiner CL, Rinn JL. Function and evolution of local repeats in the Firre locus. Nat Commun 2016; 7:11021. [PMID: 27009974 PMCID: PMC4820808 DOI: 10.1038/ncomms11021] [Received: 09/03/2015] [Accepted: 02/07/2016] [Indexed: 11/23/2022]
Abstract
Repetitive sequences, such as transposable elements (TEs), make up more than half of the human and mouse genomes and have been implicated in many biological processes. In contrast, much less is known about other repeats, such as local repeats that occur in multiple instances within a given locus in the genome but not elsewhere. Here, we systematically characterize local repeats in the genomic locus of the Firre long noncoding RNA (lncRNA). We find a conserved function for the RRD repeat as a ribonucleic nuclear retention signal that is sufficient to retain an otherwise cytoplasmic mRNA in the nucleus. We also identified a repeat, termed R0, that can function as a DNA enhancer element within the intronic sequences of Firre. Collectively, our data suggest that local repeats can have diverse functionalities and molecular modalities in the Firre locus and perhaps more globally in other lncRNAs. Mammalian genomes contain multiple repetitive sequences such as transposable elements and local repeats. Here, the authors show that the conserved long non-coding RNA Firre contains repeats that act as nuclear retention signals and a DNA enhancer element.
Affiliation(s)
- Ezgi Hacisuleyman
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA
- Department of Stem Cell and Regenerative Biology, Harvard University, 7 Divinity Avenue, Room 305, Cambridge, Massachusetts 02138, USA
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
- Chinmay J Shukla
- Department of Stem Cell and Regenerative Biology, Harvard University, 7 Divinity Avenue, Room 305, Cambridge, Massachusetts 02138, USA
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
- Department of Biological and Biomedical Sciences, Harvard University, Boston, Massachusetts 02115, USA
- Catherine L Weiner
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA
- Department of Stem Cell and Regenerative Biology, Harvard University, 7 Divinity Avenue, Room 305, Cambridge, Massachusetts 02138, USA
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
- John L Rinn
- Department of Stem Cell and Regenerative Biology, Harvard University, 7 Divinity Avenue, Room 305, Cambridge, Massachusetts 02138, USA
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA
- Department of Pathology, Beth Israel Deaconess Medical Center, Boston, Massachusetts 02215, USA
27
Wu X, Ruan L, Yang Y, Mei Q. Identification of crucial regulatory relationships between long non-coding RNAs and protein-coding genes in lung squamous cell carcinoma. Mol Cell Probes 2016; 30:146-52. [PMID: 26928440 DOI: 10.1016/j.mcp.2016.02.009] [Received: 09/11/2015] [Revised: 01/22/2016] [Accepted: 02/19/2016] [Indexed: 01/25/2023]
Abstract
PURPOSE: This study aimed to analyze the relationships between long non-coding RNAs (lncRNAs) and protein-coding genes in lung squamous cell carcinoma (LUSC). METHODS: RNA-seq data of LUSC deposited in the TCGA database were used to identify differentially expressed protein-coding genes (DECGs) and differentially expressed lncRNA genes (DE-lncRNAs) between LUSC samples and normal samples. Functional enrichment analysis of DECGs was then performed. Subsequently, the target genes and regulators of DE-lncRNAs were predicted from the DECGs. Additionally, expression levels of target genes of DE-lncRNAs were validated by RT-qPCR after silencing of the DE-lncRNAs. RESULTS: In total, 5162 differentially expressed genes (DEGs) were identified in the LUSC samples, seven of which were upregulated lncRNA genes. The upregulated DECGs were enriched in GO terms such as RNA binding and metabolic process, whereas the downregulated DECGs were enriched in GO terms such as cell cycle. Furthermore, the lncRNAs PVT1 and TERC targeted multiple DECGs. PVT1 targeted genes related to the cell cycle (e.g. POLA2, POLD1, MCM4, MCM5 and MCM6), and reduced expression of PVT1 decreased the expression of these genes. TERC regulated several genes (e.g. NDUFAB1, NDUFA11 and NDUFB5), and reduced expression of TERC increased the expression of these genes. Additionally, PVT1 was regulated by multiple transcription factors (TFs) identified among the DECGs, such as HSF1, and TERC was modulated by TFs such as PIR. CONCLUSION: The regulatory relationships between PVT1 and its targets and regulators, and between TERC and its targets and regulators, may play crucial roles in the progression of LUSC.
Affiliation(s)
- Xiaofen Wu
- Department of Gerontology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Lei Ruan
- Department of Gerontology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Yi Yang
- Department of Gerontology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Qi Mei
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
28
van Loo KMJ, Schaub C, Pitsch J, Kulbida R, Opitz T, Ekstein D, Dalal A, Urbach H, Beck H, Yaari Y, Schoch S, Becker AJ. Zinc regulates a key transcriptional pathway for epileptogenesis via metal-regulatory transcription factor 1. Nat Commun 2015; 6:8688. [PMID: 26498180 PMCID: PMC4846312 DOI: 10.1038/ncomms9688] [Received: 03/16/2015] [Accepted: 09/18/2015] [Indexed: 11/29/2022]
Abstract
Temporal lobe epilepsy (TLE) is the most common focal seizure disorder in adults. In many patients, transient brain insults, including status epilepticus (SE), are followed by a latent period of epileptogenesis, preceding the emergence of clinical seizures. In experimental animals, transcriptional upregulation of CaV3.2 T-type Ca(2+)-channels, resulting in an increased propensity for burst discharges of hippocampal neurons, is an important trigger for epileptogenesis. Here we provide evidence that the metal-regulatory transcription factor 1 (MTF1) mediates the increase of CaV3.2 mRNA and intrinsic excitability consequent to a rise in intracellular Zn(2+) that is associated with SE. Adeno-associated viral (rAAV) transfer of MTF1 into murine hippocampi leads to increased CaV3.2 mRNA. Conversely, rAAV-mediated expression of a dominant-negative MTF1 abolishes SE-induced CaV3.2 mRNA upregulation and attenuates epileptogenesis. Finally, data from hippocampi resected from patients surgically treated for pharmacoresistant TLE support the Zn(2+)-MTF1-CaV3.2 cascade, thus providing new vistas for preventing and treating TLE.
Affiliation(s)
- Karen M. J. van Loo
- Section for Translational Epilepsy Research, Department of Neuropathology, University of Bonn Medical Center, Bonn 53105, Germany
- Christina Schaub
- Laboratory for Experimental Epileptology and Cognition Research, Department of Epileptology, University of Bonn Medical Center, Bonn 53105, Germany
- Department of Neurology, University of Bonn Medical Center, Bonn 53105, Germany
- Julika Pitsch
- Section for Translational Epilepsy Research, Department of Neuropathology, University of Bonn Medical Center, Bonn 53105, Germany
- Rebecca Kulbida
- Section for Translational Epilepsy Research, Department of Neuropathology, University of Bonn Medical Center, Bonn 53105, Germany
- Thoralf Opitz
- Laboratory for Experimental Epileptology and Cognition Research, Department of Epileptology, University of Bonn Medical Center, Bonn 53105, Germany
- Dana Ekstein
- Department of Medical Neurobiology, IMRIC, Hebrew University–Hadassah School of Medicine, Jerusalem 91120, Israel
- Department of Neurology, Hadassah-Hebrew University Medical Center, Jerusalem 91120, Israel
- Adam Dalal
- Department of Medical Neurobiology, IMRIC, Hebrew University–Hadassah School of Medicine, Jerusalem 91120, Israel
- Horst Urbach
- Department of Neuroradiology, Medical Center University of Freiburg, Freiburg 79106, Germany
- Heinz Beck
- Laboratory for Experimental Epileptology and Cognition Research, Department of Epileptology, University of Bonn Medical Center, Bonn 53105, Germany
- Yoel Yaari
- Department of Medical Neurobiology, IMRIC, Hebrew University–Hadassah School of Medicine, Jerusalem 91120, Israel
- Susanne Schoch
- Section for Translational Epilepsy Research, Department of Neuropathology, University of Bonn Medical Center, Bonn 53105, Germany
- Albert J. Becker
- Section for Translational Epilepsy Research, Department of Neuropathology, University of Bonn Medical Center, Bonn 53105, Germany
29
Abreu VAC, Almeida S, Tiwari S, Hassan SS, Mariano D, Silva A, Baumbach J, Azevedo V, Röttger R. CMRegNet-An interspecies reference database for corynebacterial and mycobacterial regulatory networks. BMC Genomics 2015; 16:452. [PMID: 26062809 PMCID: PMC4464113 DOI: 10.1186/s12864-015-1631-0] [Received: 04/15/2014] [Accepted: 05/14/2015] [Indexed: 11/10/2022]
Abstract
Background: Organisms use a multitude of mechanisms to respond to changing environmental conditions, maintain their functional homeostasis and overcome stress. One of the most important mechanisms is transcriptional gene regulation. In-depth study of the transcriptional gene regulatory network can lead to various practical applications, creating a greater understanding of how organisms control their cellular behavior. Description: In this work, we present a new database, CMRegNet, for the gene regulatory networks of Corynebacterium glutamicum ATCC 13032 and Mycobacterium tuberculosis H37Rv. We furthermore transferred the known networks of these model organisms to 18 other non-model but phylogenetically close species (target organisms) of the CMNR group. In contrast to other network transfers, we used two model organisms for the first time, resulting in a more diverse and complete network for the target organisms. Conclusion: CMRegNet provides easy access to a total of 3,103 known regulations in C. glutamicum ATCC 13032 and M. tuberculosis H37Rv and to 38,940 evolutionarily conserved interactions for 18 non-model species of the CMNR group. This makes CMRegNet, to date, the most comprehensive database of regulatory interactions of CMNR bacteria. The content of CMRegNet is publicly available online via a web interface at http://lgcm.icb.ufmg.br/cmregnet.
Affiliation(s)
- Vinicius A C Abreu
- Graduate Program in Bioinformatics, Institute of Biological Sciences, Federal University of Minas Gerais (Universidade Federal de Minas Gerais), Belo Horizonte, Minas Gerais, Brazil.
- Sintia Almeida
- Graduate Program in Bioinformatics, Institute of Biological Sciences, Federal University of Minas Gerais (Universidade Federal de Minas Gerais), Belo Horizonte, Minas Gerais, Brazil
- Sandeep Tiwari
- Graduate Program in Bioinformatics, Institute of Biological Sciences, Federal University of Minas Gerais (Universidade Federal de Minas Gerais), Belo Horizonte, Minas Gerais, Brazil
- Syed Shah Hassan
- Graduate Program in Bioinformatics, Institute of Biological Sciences, Federal University of Minas Gerais (Universidade Federal de Minas Gerais), Belo Horizonte, Minas Gerais, Brazil
- Diego Mariano
- Graduate Program in Bioinformatics, Institute of Biological Sciences, Federal University of Minas Gerais (Universidade Federal de Minas Gerais), Belo Horizonte, Minas Gerais, Brazil
- Artur Silva
- Institute of Biological Sciences, Federal University of Pará, Belém, Pará, Brazil
- Jan Baumbach
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Vasco Azevedo
- Graduate Program in Bioinformatics, Institute of Biological Sciences, Federal University of Minas Gerais (Universidade Federal de Minas Gerais), Belo Horizonte, Minas Gerais, Brazil
- Richard Röttger
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Computational Systems Biology, Max Planck Institute for Informatics, Campus E 2.1, 66123, Saarbrücken, Germany
30
Identification and characterization of VpsR and VpsT binding sites in Vibrio cholerae. J Bacteriol 2015; 197:1221-35. [PMID: 25622616 DOI: 10.1128/jb.02439-14] [Indexed: 11/20/2022]
Abstract
The ability to form biofilms is critical for environmental survival and transmission of Vibrio cholerae, a facultative human pathogen responsible for the disease cholera. Biofilm formation is controlled by several transcriptional regulators and alternative sigma factors. In this study, we report that the two main positive regulators of biofilm formation, VpsR and VpsT, bind to nonoverlapping target sequences in the regulatory region of vpsL in vitro. VpsR binds to a proximal site (the R1 box) as well as a distal site (the R2 box) with respect to the transcriptional start site identified upstream of vpsL. The VpsT binding site (the T box) is located between the R1 and R2 boxes. While mutations in the T and R boxes resulted in a decrease in vpsL expression, deletion of the T and R2 boxes resulted in an increase in vpsL expression. Analysis of the role of H-NS in vpsL expression revealed that deletion of hns resulted in enhanced vpsL expression. The level of vpsL expression was higher in an hns vpsT double mutant than in the parental strain but lower than that in an hns mutant. In silico analysis of the regulatory regions of the VpsR and VpsT targets resulted in the identification of conserved recognition motifs for VpsR and VpsT and revealed that operons involved in biofilm formation and vpsT are coregulated by VpsR and VpsT. Furthermore, a comparative genomics analysis revealed substantial variability in the promoter region of the vpsT and vpsL genes among extant V. cholerae isolates, suggesting that regulation of biofilm formation is under active selection. IMPORTANCE: Vibrio cholerae causes cholera and is a natural inhabitant of aquatic environments. One critical factor that is important for environmental survival and transmission of V. cholerae is the microbe's ability to form biofilms, which are surface-associated communities encased in a matrix composed of the exopolysaccharide VPS (Vibrio polysaccharide), proteins, and nucleic acids.
Two proteins, VpsR and VpsT, positively regulate VPS production and biofilm formation. We characterized the structural features of the promoter of the vpsL gene, determined the target sequences recognized by VpsT and VpsR, and analyzed their distribution and conservation patterns in multiple V. cholerae isolates. This work fills a fundamental gap in our understanding of the regulatory mechanisms employed by the master regulators VpsR and VpsT in controlling biofilm matrix production.
31
Kelley DR, Hendrickson DG, Tenen D, Rinn JL. Transposable elements modulate human RNA abundance and splicing via specific RNA-protein interactions. Genome Biol 2014; 15:537. [PMID: 25572935 PMCID: PMC4272801 DOI: 10.1186/s13059-014-0537-5] [Received: 09/17/2014] [Accepted: 11/07/2014] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Transposable elements (TEs) have significantly influenced the evolution of transcriptional regulatory networks in the human genome. Post-transcriptional regulation of human genes by TE-derived sequences has been observed in specific contexts, but has yet to be systematically and comprehensively investigated. Here, we study a collection of 75 CLIP-Seq experiments mapping the RNA binding sites for a diverse set of 51 human proteins to explore the role of TEs in post-transcriptional regulation of human mRNAs and lncRNAs via RNA-protein interactions. RESULTS: We detect widespread interactions between RNA binding proteins (RBPs) and many families of TE-derived sequences in the CLIP-Seq data. Further, alignment coverage peaks on specific positions of the TE consensus sequences, illuminating a diversity of TE-specific RBP binding motifs. Evidence of binding and conservation of these motifs in the nonrepetitive transcriptome suggests that TEs have generally appropriated existing sequence preferences of the RBPs. Depletion assays for numerous RBPs show that TE-derived binding sites affect transcript abundance and splicing similarly to nonrepetitive sites. However, in a few cases the effect of RBP binding depends on the specific TE family bound; for example, the ubiquitously expressed RBP HuR confers transcript stability unless bound to an Alu element. CONCLUSIONS: Our meta-analysis suggests a widespread role for TEs in shaping RNA-protein regulatory networks in the human genome.
32
Yin J, Morrissey ME, Shine L, Kennedy C, Higgins DG, Kennedy BN. Genes and signaling networks regulated during zebrafish optic vesicle morphogenesis. BMC Genomics 2014; 15:825. [PMID: 25266257 PMCID: PMC4190348 DOI: 10.1186/1471-2164-15-825] [Received: 02/09/2014] [Accepted: 09/24/2014] [Indexed: 11/23/2022]
Abstract
BACKGROUND: The genetic cascades underpinning vertebrate early eye morphogenesis are poorly understood. One gene family essential for eye morphogenesis encodes the retinal homeobox (Rx) transcription factors. Mutations in the human retinal homeobox gene (RAX) can lead to gross morphological phenotypes ranging from microphthalmia to anophthalmia. Zebrafish rx3 null mutants produce a similarly striking eyeless phenotype with an associated expanded forebrain. We therefore used zebrafish rx3-/- mutants as a model to uncover an Rx3-regulated gene network during early eye morphogenesis. RESULTS: Rx3-regulated genes were identified using whole transcriptomic sequencing (RNA-seq) of rx3-/- mutants and morphologically wild-type siblings during optic vesicle morphogenesis. A gene co-expression network was then constructed for the Rx3-regulated genes, identifying gene cross-talk during early eye development. Genes highly connected in the network are hub genes, which tend to exhibit larger expression changes between rx3-/- mutants and siblings with a normal phenotype. Hub genes down-regulated in rx3-/- mutants encompass homeodomain transcription factors and mediators of retinoid signaling, both associated with eye development and known human eye disorders. In contrast, genes up-regulated in rx3-/- mutants are centered on Wnt signaling pathways, associated with brain development and disorders. The temporal expression pattern of Rx3-regulated genes was further profiled during early development, from the maternal stage until visual function is fully mature. Rx3-regulated genes exhibited synchronized expression patterns and a transition of gene expression during the early segmentation stage, when Rx3 was highly expressed. Furthermore, the promoters of most of these deregulated genes are enriched in multiple RAX-binding motif sequences. CONCLUSIONS: Here, we assembled a comprehensive model of Rx3-regulated genes during early eye morphogenesis.
Rx3 promotes optic vesicle morphogenesis and represses brain development through a highly correlated and modulated network, exhibiting repression of genes mediating Wnt signaling and concomitant enhanced expression of homeodomain transcription factors and retinoid-signaling genes.
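The abstract defines hub genes as the most highly connected genes in a co-expression network but does not say how that network was built. The Python sketch below assumes the simplest construction: an unweighted graph that links two genes when the absolute Pearson correlation of their expression profiles reaches a hard threshold, with genes then ranked by degree. The threshold, gene names and expression values are hypothetical, and weighted approaches such as WGCNA are a common alternative.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def coexpression_degrees(expression, threshold=0.9):
    """expression: dict mapping gene -> expression values across samples.
    Two genes are linked when |r| >= threshold; returns gene -> degree."""
    genes = list(expression)
    degree = {g: 0 for g in genes}
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if abs(pearson(expression[g1], expression[g2])) >= threshold:
                degree[g1] += 1
                degree[g2] += 1
    return degree


def hub_genes(expression, threshold=0.9, top=1):
    """Return the `top` genes with the highest degree (ties keep input order)."""
    degree = coexpression_degrees(expression, threshold)
    return sorted(degree, key=lambda g: degree[g], reverse=True)[:top]


# Hypothetical profiles across five samples.
expression = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],  # perfectly correlated with geneA
    "geneC": [5, 4, 3, 2, 1],   # perfectly anti-correlated with geneA
    "geneD": [1, 3, 1, 3, 1],   # uncorrelated with the others
}
degrees = coexpression_degrees(expression, threshold=0.9)
```

Here geneA, geneB and geneC are each linked to the other two, while geneD stays isolated. Using the absolute correlation treats strong negative co-expression as a link too, which is a design choice rather than a requirement.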
Affiliation(s)
- Jun Yin
- UCD Conway Institute, UCD School of Medicine and Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
- Maria E Morrissey
- UCD Conway Institute, UCD School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin 4, Ireland
- Lisa Shine
- UCD Conway Institute, UCD School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin 4, Ireland
- Ciarán Kennedy
- UCD Conway Institute, UCD School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin 4, Ireland
- Desmond G Higgins
- UCD Conway Institute, UCD School of Medicine and Medical Science, University College Dublin, Belfield, Dublin 4, Ireland
- Breandán N Kennedy
- UCD Conway Institute, UCD School of Biomolecular and Biomedical Science, University College Dublin, Belfield, Dublin 4, Ireland
33
Large-scale investigation of human TF-miRNA relations based on coexpression profiles. Biomed Res Int 2014; 2014:623078. [PMID: 24995316 PMCID: PMC4068100 DOI: 10.1155/2014/623078] [Received: 03/11/2014] [Revised: 05/02/2014] [Accepted: 05/18/2014] [Indexed: 02/05/2023]
Abstract
Noncoding, endogenous microRNAs (miRNAs) are fairly well known for regulating gene expression rather than encoding proteins. Dysregulation of miRNA genes, whether upregulation or downregulation, may lead to severe diseases or oncogenesis, especially when the affected miRNA is involved in important biological reactions or pathways. Thus, in recent years, how miRNA genes are transcriptionally regulated has received attention alongside miRNA target recognition. In this study, a large-scale investigation of novel cis- and trans-elements was undertaken to further determine the TF-miRNA regulatory relations needed to unravel the transcriptional regulation of miRNA genes. Based on miRNA and annotated gene expression profiles, the term “coTFBS” was introduced to detect common transcription factors and the corresponding binding sites within the promoter regions of each miRNA and its coexpressed annotated genes. A computational pipeline was established to filter out redundancy caused by short sequence motifs in the TFBS pattern search. Eventually, we identified more convincing TF-miRNA regulatory relations for 225 human miRNAs. This information is helpful for understanding miRNA functions and provides knowledge for evaluating therapeutic potential in clinical research. Once expression profiles are available for most miRNAs in the latest database, TF candidates for more miRNAs can be explored with this filtering approach.
34
Kerpedjiev P, Frellsen J, Lindgreen S, Krogh A. Adaptable probabilistic mapping of short reads using position specific scoring matrices. BMC Bioinformatics 2014; 15:100. [PMID: 24717095 PMCID: PMC4021105 DOI: 10.1186/1471-2105-15-100] [Received: 03/03/2014] [Accepted: 03/28/2014] [Indexed: 11/10/2022]
Abstract
Background: Modern DNA sequencing methods produce vast amounts of data that often require mapping to a reference genome. Most existing programs use the number of mismatches between the read and the genome as a measure of quality. This approach lacks a statistical foundation and can, for some data types, result in many wrongly mapped reads. Here we present a probabilistic mapping method based on position-specific scoring matrices, which can take into account not only the quality scores of the reads but also user-specified models of evolution and data-specific biases. Results: We show how evolution, data-specific biases, and sequencing errors are naturally dealt with probabilistically. Our method achieves better results than Bowtie and BWA on simulated and real ancient and PAR-CLIP reads, as well as on simulated reads from the AT-rich organism P. falciparum, when modeling the biases of these data. For simulated Illumina reads, the method has consistently higher sensitivity for both single-end and paired-end data. We also show that our probabilistic approach can limit the problem of random matches from short reads of contamination and that it improves the mapping of real reads from one organism (D. melanogaster) to a related genome (D. simulans). Conclusion: The presented work is an implementation of a novel approach to short read mapping in which quality scores, prior mismatch probabilities and mapping qualities are handled in a statistically sound manner. The resulting implementation provides not only a tool for biologists working with low-quality and/or biased sequencing data but also a demonstration of the feasibility of using a probability-based alignment method on real and simulated data sets.
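The core idea of turning a read and its quality scores into a position-specific scoring matrix can be sketched in a few lines of Python. This is a simplified illustration rather than the authors' implementation. It assumes Phred-scaled qualities, spreads each position's error probability evenly over the three other bases, and scores only ungapped placements against an uppercase ACGT reference. It omits the evolution models, bias models and mapping-quality calculation described in the abstract, and the sequences are hypothetical.

```python
import math

BASES = "ACGT"


def read_to_pssm(read, qualities):
    """Turn a read plus Phred qualities into a per-position probability matrix:
    the called base gets 1 - e and the other three bases share e equally."""
    pssm = []
    for base, q in zip(read, qualities):
        error = 10 ** (-q / 10)
        row = {b: error / 3 for b in BASES}
        row[base] = 1 - error
        pssm.append(row)
    return pssm


def best_ungapped_hit(pssm, reference):
    """Log-likelihood of the read at every ungapped reference offset;
    returns (offset, log_likelihood) for the best-scoring position."""
    width = len(pssm)
    best = None
    for offset in range(len(reference) - width + 1):
        window = reference[offset:offset + width]
        ll = sum(math.log(pssm[i][ref_base]) for i, ref_base in enumerate(window))
        if best is None or ll > best[1]:
            best = (offset, ll)
    return best


pssm = read_to_pssm("ACGT", [30, 30, 30, 30])
hit = best_ungapped_hit(pssm, "GGGACGTTT")  # exact match starts at offset 3
```

Because a mismatch is scored by the probability the read assigns to the reference base, a mismatch at a low-quality position (for example Phred 2) costs far less log-likelihood than the same mismatch at Phred 30. A plain mismatch count cannot express this distinction.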
Affiliation(s)
- Anders Krogh
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen, Denmark.
|
35
|
Combining docking site and phosphosite predictions to find new substrates: identification of smoothelin-like-2 (SMTNL2) as a c-Jun N-terminal kinase (JNK) substrate. Cell Signal 2013; 25:2518-29. [PMID: 23981301 DOI: 10.1016/j.cellsig.2013.08.004] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 04/22/2013] [Revised: 08/05/2013] [Accepted: 08/06/2013] [Indexed: 12/20/2022]
Abstract
Specific docking interactions between mitogen-activated protein kinases (MAPKs), their regulators, and their downstream substrates are crucial for efficient and accurate signal transmission. To identify novel substrates of the c-Jun N-terminal kinase (JNK) family of MAPKs, we searched the human genome for proteins that contained (1) a predicted JNK-docking site (D-site) and (2) a cluster of putative JNK target phosphosites located close to the D-site. Here we describe a novel JNK substrate that emerged from this analysis, the functionally uncharacterized protein smoothelin-like 2 (SMTNL2). SMTNL2 protein bound with high affinity to multiple MAPKs, including JNK1-3 and ERK2; furthermore, the identity of conserved amino acids in the predicted docking site (residues 180-193) was necessary for this high-affinity binding. In addition, purified full-length SMTNL2 protein was phosphorylated by JNK1-3 in vitro, and this required the integrity of the D-site. Using mass spectrometry and mutagenesis, we identified four D-site-dependent phosphoacceptor sites in close proximity to the docking site, at S217, S241, T236 and T239. A short peptide comprising the SMTNL2 D-site inhibited JNK-mediated phosphorylation of the ATF2 transcription factor, showing that SMTNL2 can compete with other substrates for JNK binding. Moreover, when transfected into HEK293 cells, SMTNL2 was phosphorylated by endogenous JNK in a D-site-dependent manner, on the same residues identified in vitro. SMTNL2 protein was expressed in many mammalian tissues, with notably high expression in skeletal muscle. Consistent with the hypothesis that SMTNL2 has a function in skeletal muscle, SMTNL2 protein expression was strongly induced during the transition from myoblasts to myotubes in differentiating C2C12 cells.
|
36
|
Meyer F, Kurtz S, Beckstette M. Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns. BMC Bioinformatics 2013; 14:226. [PMID: 23865810 PMCID: PMC3765529 DOI: 10.1186/1471-2105-14-226] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/27/2013] [Accepted: 07/11/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enough for searching large sequence databases because of the high computational costs of the underlying sequence-structure alignment problem. RESULTS We present new fast index-based and online algorithms for approximate matching of RNA sequence-structure patterns supporting a full set of edit operations on single bases and base pairs. Our methods efficiently compute semi-global alignments of structural RNA patterns and substrings of the target sequence whose costs satisfy a user-defined sequence-structure edit distance threshold. For this purpose, we introduce a new computing scheme to optimally reuse the entries of the required dynamic programming matrices for all substrings and combine it with a technique for avoiding the alignment computation of non-matching substrings. Our new index-based methods exploit suffix arrays preprocessed from the target database and achieve running times that are sublinear in the size of the searched sequences. To support the description of RNA molecules that fold into complex secondary structures with multiple ordered sequence-structure patterns, we use fast algorithms for the local or global chaining of approximate sequence-structure pattern matches. The chaining step removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our improved online algorithm is up to 45 times faster than the best previous method. Our best new index-based algorithm achieves a speedup of a factor of 560. CONCLUSIONS The presented methods achieve considerable speedups compared to the best previous method. This, together with the expected sublinear running time of the presented index-based algorithms, allows for the first time approximate matching of RNA sequence-structure patterns in large sequence databases. Beyond the algorithmic contributions, we provide RaligNAtor, a robust and well-documented open-source software package implementing the algorithms presented in this manuscript. The RaligNAtor software is available at http://www.zbh.uni-hamburg.de/ralignator.
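The core task described above is semi-global alignment of a pattern against every substring of a target under an edit-distance threshold. The Python sketch below covers only the sequence-level part of that task, using Sellers' classic dynamic program. It is an illustration rather than RaligNAtor's method, and the name `approx_matches` is hypothetical. The sketch does not model any of the following, all of which the abstract describes:

- base pairs or structure edit operations
- index-based search over suffix arrays
- reuse of dynamic programming matrices across substrings

```python
def approx_matches(pattern, text, k):
    """Semi-global approximate matching (Sellers' algorithm), sequence only.

    Returns (end, distance) for every end position in `text` (exclusive
    index) where some substring ending there is within edit distance k
    of `pattern`.
    """
    m = len(pattern)
    # Column for the empty text prefix: aligning pattern[:i] to nothing costs i.
    prev = list(range(m + 1))
    hits = []
    for j, c in enumerate(text, start=1):
        # Row 0 is always 0: a match may start at any text position for free.
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur[i] = min(prev[i - 1] + cost,  # match / substitution
                         prev[i] + 1,         # extra base in the text
                         cur[i - 1] + 1)      # pattern base missing from the text
        if cur[m] <= k:
            hits.append((j, cur[m]))
        prev = cur
    return hits


print(approx_matches("GATC", "AAGATCAA", 0))
```

Keeping only one DP column at a time gives O(m) memory and O(mn) time. The abstract's speedups come from going beyond this baseline.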
Affiliation(s)
- Fernando Meyer
- Center for Bioinformatics, University of Hamburg, Bundesstrasse 43, Hamburg 20146, Germany.
|
37
|
Schlüter JP, Reinkensmeier J, Barnett MJ, Lang C, Krol E, Giegerich R, Long SR, Becker A. Global mapping of transcription start sites and promoter motifs in the symbiotic α-proteobacterium Sinorhizobium meliloti 1021. BMC Genomics 2013; 14:156. [PMID: 23497287 PMCID: PMC3616915 DOI: 10.1186/1471-2164-14-156] [Citation(s) in RCA: 117] [Impact Index Per Article: 10.6] [Received: 11/15/2012] [Accepted: 02/12/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Sinorhizobium meliloti is a soil-dwelling α-proteobacterium that possesses a large, tripartite genome and engages in a nitrogen-fixing symbiosis with its plant hosts. Although much is known about this important model organism, global characterization of genetic regulatory circuits has been hampered by a lack of information about transcription and promoters. RESULTS Using an RNAseq approach and RNA populations representing 16 different growth and stress conditions, we comprehensively mapped S. meliloti transcription start sites (TSS). Our work identified 17,001 TSS that we grouped into six categories based on the genomic context of their transcripts: mRNA (4,430 TSS assigned to 2,657 protein-coding genes), leaderless mRNAs (171), putative mRNAs (425), internal sense transcripts (7,650), antisense RNA (3,720), and trans-encoded sRNAs (605). We used this TSS information to identify transcription factor binding sites and putative promoter sequences recognized by seven of the 15 known S. meliloti σ factors (σ70, σ54, σH1, σH2, σE1, σE2, and σE9). Altogether, we predicted 2,770 new promoter sequences, including 1,302 located upstream of protein-coding genes and 722 located upstream of antisense RNA or trans-encoded sRNA genes. To validate promoter predictions for targets of the general stress response σ factor, RpoE2 (σE2), we identified rpoE2-dependent genes using microarrays and confirmed TSS for a subset of these by 5' RACE mapping. CONCLUSIONS By identifying TSS and promoters on a global scale, our work provides a firm foundation for the continued study of S. meliloti gene expression with relation to gene organization, σ factors and other transcription factors, and regulatory RNAs.
Affiliation(s)
- Jan-Philip Schlüter
- Institute of Biology III, Faculty of Biology, Albert-Ludwigs-Universität Freiburg, Freiburg, Germany
|
38
|
Pernhorst K, van Loo KMJ, von Lehe M, Priebe L, Cichon S, Herms S, Hoffmann P, Helmstaedter C, Sander T, Schoch S, Becker AJ. Rs6295 promoter variants of the serotonin type 1A receptor are differentially activated by c-Jun in vitro and correlate to transcript levels in human epileptic brain tissue. Brain Res 2013; 1499:136-44. [PMID: 23333373 DOI: 10.1016/j.brainres.2012.12.045] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 06/11/2012] [Revised: 12/19/2012] [Accepted: 12/29/2012] [Indexed: 01/20/2023]
Abstract
Many brain disorders, including epilepsy, migraine and depression, manifest with episodic symptoms that may last for various time intervals. Transient alterations of neuronal function, such as those related to serotonin homeostasis, generally underlie this phenomenon. Several single nucleotide polymorphisms (SNPs) in gene promoters associated with these diseases have been described. For obvious reasons, their regulatory roles in gene expression, particularly in human brain tissue, remain largely enigmatic. The rs6295 G-/C-allelic variant is located in the promoter region of the human HTR1a gene, encoding the G-protein-coupled receptor for 5-hydroxytryptamine (5HT1AR). In addition to reported transcriptional repressor binding, our bioinformatic analyses predicted a reduced binding affinity of the transcription factor (TF) c-Jun for the G-allele. In vitro luciferase transfection assays revealed that c-Jun (a) activates the rs6295 C-allelic variant significantly more strongly than the G-allelic variant and (b) efficiently antagonizes the repressive effect of Hes5 on the promoter. The G-allele of rs6295 is known to be associated with aspects of major depression and migraine. To address a potential role of rs6295 variants in human brain tissue, we isolated DNA and mRNA from fresh frozen hippocampal tissue of pharmacoresistant temporal lobe epilepsy (TLE) patients (n=140) after epilepsy surgery for seizure control. We carried out SNP genotyping studies and mRNA analyses to determine HTR1a mRNA expression in human hippocampal samples stratified according to the rs6295 allelic variant. HTR1a mRNA was significantly more abundant in hippocampal samples of TLE patients homozygous for the rs6295 C-allele than in those with the GG-genotype. These data may point to a novel transcriptional 5HT1AR 'receptoropathy' that depends on the rs6295 allelic variant and on c-Jun.
Affiliation(s)
- Katharina Pernhorst
- Department of Neuropathology, University of Bonn Medical Center, Sigmund-Freud Str. 25, Bonn 53105, Germany
|
39
|
High-resolution detection of DNA binding sites of the global transcriptional regulator GlxR in Corynebacterium glutamicum. Microbiology (Reading) 2013; 159:12-22. [DOI: 10.1099/mic.0.062059-0] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 11/18/2022]
|
40
|
Saxena RK, Penmetsa RV, Upadhyaya HD, Kumar A, Carrasquilla-Garcia N, Schlueter JA, Farmer A, Whaley AM, Sarma BK, May GD, Cook DR, Varshney RK. Large-scale development of cost-effective single-nucleotide polymorphism marker assays for genetic mapping in pigeonpea and comparative mapping in legumes. DNA Res 2012; 19:449-61. [PMID: 23103470 PMCID: PMC3514856 DOI: 10.1093/dnares/dss025] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Indexed: 11/14/2022]
Abstract
Single-nucleotide polymorphisms (SNPs, >2000) were discovered by using RNA-seq and allele-specific sequencing approaches in pigeonpea (Cajanus cajan). To make SNP genotyping cost-effective, competitive allele-specific polymerase chain reaction (KASPar) assays were successfully developed for 1616 SNPs and referred to as PKAMs (pigeonpea KASPar assay markers). Screening of PKAMs on 24 genotypes [23 from cultivated species and 1 wild species (Cajanus scarabaeoides)] defined a set of 1154 polymorphic markers (77.4%) with polymorphism information content (PIC) values ranging from 0.04 to 0.38. One thousand and ninety-four PKAMs showed polymorphisms between parental lines of the reference mapping population (C. cajan ICP 28 × C. scarabaeoides ICPW 94). By using high-quality marker genotyping data on 167 F2 lines from the population, a comprehensive genetic map comprising 875 PKAMs with an average inter-marker distance of 1.11 cM was developed. Thirty-five previously mapped simple sequence repeat markers were integrated into the PKAM map, and an integrated genetic map of 996.21 cM was constructed. Mapped PKAMs showed the highest degree of synteny with the genome of Glycine max, followed by Medicago truncatula and Lotus japonicus, and the least with Vigna unguiculata. These PKAMs will be useful for genetics research and breeding applications in pigeonpea and for utilizing genome information from other legume species.
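The abstract reports PIC values between 0.04 and 0.38 but does not state its formula. Assuming the widely used definition of Botstein et al. (1980), a biallelic SNP can reach at most 0.375, at equal allele frequencies. That ceiling matches the reported upper bound. The Python sketch below shows this; the function name `pic` is hypothetical.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980).

    PIC = 1 - sum_i p_i^2 - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    correction = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return 1 - homozygosity - correction


# For a biallelic SNP, PIC peaks at equal allele frequencies.
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=lambda p: pic([p, 1 - p]))
print(best_p, pic([best_p, 1 - best_p]))  # prints: 0.5 0.375
```

Markers with strongly skewed allele frequencies fall toward the low end of the reported range. For example, `pic([0.9, 0.1])` is about 0.16.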
Affiliation(s)
- Rachit K Saxena
- Center of Excellence in Genomics (CEG), International Crops Research Institute for Semi-Arid Tropics (ICRISAT), Patancheru 502324, India
|
41
|
Phenylacetic acid catabolism and its transcriptional regulation in Corynebacterium glutamicum. Appl Environ Microbiol 2012; 78:5796-804. [PMID: 22685150 DOI: 10.1128/aem.01588-12] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 11/20/2022]
Abstract
The industrially important organism Corynebacterium glutamicum has been characterized in recent years for its robust ability to assimilate aromatic compounds. In this study, C. glutamicum strain AS 1.542 was investigated for its ability to catabolize phenylacetic acid (PAA). The paa genes were identified; they are organized as a continuous paa gene cluster. The type strain of C. glutamicum, ATCC 13032, is not able to catabolize PAA, but the recombinant strain ATCC 13032/pEC-K18mob2::paa gained the ability to grow on PAA. The paaR gene, encoding a TetR family transcription regulator, was studied in detail. Disruption of paaR in strain AS 1.542 resulted in transcriptional increases of all paa genes. Transcription start sites and putative promoter regions were determined. An imperfect palindromic motif (5'-ACTNACCGNNCGNNCGGTNAGT-3'; 22 bp) was identified in the upstream regions of paa genes. Electrophoretic mobility shift assays (EMSA) demonstrated specific binding of PaaR to this motif, and phenylacetyl coenzyme A (PA-CoA) blocked binding. It was concluded that PaaR is the negative regulator of PAA degradation and that PA-CoA is the PaaR effector. In addition, GlxR binding sites were found, and binding to GlxR was confirmed. Therefore, PAA catabolism in C. glutamicum is regulated by the pathway-specific repressor PaaR, and also likely by the global transcription regulator GlxR. By comparative genomic analysis, we reconstructed orthologous PaaR regulons in 57 species, including species of Actinobacteria, Proteobacteria, and Flavobacteria, that carry PAA utilization genes and operate by conserved binding motifs, suggesting that PaaR-like regulation might commonly exist in these bacteria.
|
42
|
Barzantny H, Schröder J, Strotmeier J, Fredrich E, Brune I, Tauch A. The transcriptional regulatory network of Corynebacterium jeikeium K411 and its interaction with metabolic routes contributing to human body odor formation. J Biotechnol 2012; 159:235-48. [DOI: 10.1016/j.jbiotec.2012.01.021] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 10/27/2011] [Revised: 01/12/2012] [Accepted: 01/17/2012] [Indexed: 01/08/2023]
|
43
|
Sun Y, Buhler J, Yuan C. Designing filters for fast-known NcRNA identification. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:774-87. [PMID: 22084145 DOI: 10.1109/tcbb.2011.149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Detecting members of known noncoding RNA (ncRNA) families in genomic DNA is an important part of sequence annotation. However, the most widely used tool for modeling ncRNA families, the covariance model (CM), incurs a high computational cost when used for genome-wide search. This cost can be reduced by using a filter to exclude sequences that are unlikely to contain the ncRNA of interest, applying the CM only where it is likely to match strongly. Despite recent advances, designing an efficient filter that can detect ncRNA instances lacking strong conservation while excluding most irrelevant sequences remains challenging. In this work, we design three types of filters based on multiple secondary structure profiles (SSPs). An SSP augments a regular profile (i.e., a position weight matrix) with secondary structure information but can still be efficiently scanned against long sequences. Multi-SSP-based filters combine evidence from multiple SSP matches and can achieve high sensitivity and specificity. Our SSP-based filters are extensively tested on the BRAliBase III data set, Rfam 9.0, and a published soil metagenomic data set. In addition, we compare the SSP-based filters with several other ncRNA search tools, including Infernal (with profile HMMs as filters), ERPIN, and tRNAscan-SE. Our experiments demonstrate that carefully designed SSP filters can achieve significant speedup over unfiltered CM search while maintaining high sensitivity for various ncRNA families. The designed filters and filter-scanning programs are available at our website: www.cse.msu.edu/~yannisun/ssp/.
Affiliation(s)
- Yanni Sun
- Department of Computer Science and Engineering, Michigan State University, 3115 Engineering Building, East Lansing, MI 48824, USA.
|
44
|
Regulatory Snapshots: integrative mining of regulatory modules from expression time series and regulatory networks. PLoS One 2012; 7:e35977. [PMID: 22563474 PMCID: PMC3341384 DOI: 10.1371/journal.pone.0035977] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 11/02/2011] [Accepted: 03/24/2012] [Indexed: 12/15/2022]
Abstract
Explaining regulatory mechanisms is crucial to understanding complex cellular responses leading to system perturbations. Some strategies reverse engineer regulatory interactions from experimental data, while others identify functional regulatory units (modules) under the assumption that biological systems yield a modular organization. Most modular studies focus on network structure and static properties, ignoring that gene regulation is largely driven by stimulus-response behavior. Expression time series are key to gaining insight into dynamics but have been insufficiently explored by current methods, which often (1) apply generic algorithms unsuited for expression analysis over time, due to an inability to maintain the chronology of events or incorporate time dependency; (2) ignore local patterns, abundant in most interesting cases of transcriptional activity; (3) neglect physical binding or lack automatic association of regulators, focusing mainly on expression patterns; or (4) limit the discovery to a predefined number of modules. We propose Regulatory Snapshots, an integrative mining approach to identify regulatory modules over time by combining transcriptional control with response, while overcoming the above challenges. Temporal biclustering is first used to reveal transcriptional modules composed of genes showing coherent expression profiles over time. Personalized ranking is then applied to prioritize prominent regulators targeting the modules at each time point using a network of documented regulatory associations and the expression data. Finally, custom graphics depict the regulatory activity in a module at consecutive time points (snapshots). Regulatory Snapshots successfully unraveled modules underlying yeast response to heat shock and human epithelial-to-mesenchymal transition, based on regulations documented in the YEASTRACT and JASPAR databases, respectively, and available expression data. Regulatory players involved in functionally enriched processes related to these biological events were identified. Ranking scores further suggested the ability to discern the primary role of a gene (target or regulator). A prototype is available at: http://kdbio.inesc-id.pt/software/regulatorysnapshots.
|
45
|
Promoter variants determine γ-aminobutyric acid homeostasis-related gene transcription in human epileptic hippocampi. J Neuropathol Exp Neurol 2012; 70:1080-8. [PMID: 22082659 DOI: 10.1097/nen.0b013e318238b9af] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 10/15/2022]
Abstract
The functional consequences of single nucleotide polymorphisms associated with episodic brain disorders such as epilepsy and depression are unclear. Allelic associations with generalized epilepsies have been reported for single nucleotide polymorphisms rs1883415 (ALDH5A1; succinic semialdehyde dehydrogenase) and rs4906902 (GABRB3; GABAA β3), both of which are present in the 5' regulatory region of genes involved in γ-aminobutyric acid (GABA) homeostasis. To address their allelic association with episodic brain disorders and allele-specific impact on the transcriptional regulation of these genes in human brain tissue, DNA and messenger RNA (mRNA) isolated from hippocampi were obtained at epilepsy surgery of 146 pharmacoresistant mesial temporal lobe epilepsy (mTLE) patients and from 651 healthy controls. We found that the C allele of rs1883415 is enriched in mTLE patients compared with controls. By real-time quantitative reverse transcription-polymerase chain reaction analyses, individuals homozygous for the C allele showed higher ALDH5A1 mRNA expression. The rs4906902 G allele of the GABRB3 gene was overrepresented in mTLE patients with depression; individuals homozygous for the G allele showed reduced GABRB3 mRNA expression. Bioinformatic analyses suggest that rs1883415 and rs4906902 alter the DNA binding affinity of the transcription factors Egr-3 in ALDH5A1 and MEF-2 in GABRB3 promoters, respectively. Using in vitro luciferase transfection assays, we observed that, in both cases, the transcription factors regulate gene expression depending on the allelic variant in the same direction as in the human hippocampi. Our data suggest that distinct promoter variants may sensitize individuals for differential, potentially stimulus-induced alterations of GABA homeostasis-relevant gene expression. This might contribute to the episodic onset of symptoms and point to new targets for pharmacotherapies.
|
46
|
Pauling J, Röttger R, Neuner A, Salgado H, Collado-Vides J, Kalaghatgi P, Azevedo V, Tauch A, Pühler A, Baumbach J. On the trail of EHEC/EAEC--unraveling the gene regulatory networks of human pathogenic Escherichia coli bacteria. Integr Biol (Camb) 2012; 4:728-33. [PMID: 22318347 DOI: 10.1039/c2ib00132b] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022]
Abstract
Pathogenic Escherichia coli, such as Enterohemorrhagic E. coli (EHEC) and Enteroaggregative E. coli (EAEC), are globally widespread bacteria. Some may cause the hemolytic uremic syndrome (HUS). Different strains cause epidemics all over the world. Recently, we observed an epidemic outbreak of a multi-resistant EHEC strain in Western Europe, mainly in Germany. The Robert Koch Institute reports >4300 infections and >50 deaths (July, 2011). Farmers lost several million EUR because the origin of infection was unclear. Here, we contribute to the currently ongoing research with a computer-aided study of EHEC transcriptional regulatory interactions, a network of genetic switches that control, for instance, pathogenicity, survival and reproduction of bacterial cells. Our strategy is to utilize knowledge of gene regulatory networks from the evolutionary relative E. coli K-12, a harmless strain mainly used for wet lab studies. To provide high-potential candidates for human pathogenic E. coli bacteria, such as EHEC, we developed EhecRegNet, an integrated online database and analysis platform. We utilize 3489 known regulations from E. coli K-12 for predictions of yet unknown gene regulatory interactions in 16 human pathogens. For these strains we predict 40,913 regulatory interactions. EhecRegNet is based on the identification of evolutionarily conserved regulatory sites within the DNA of the harmless E. coli K-12 and the pathogens. Identifying and characterizing EHEC's genetic control mechanism network on a large scale will allow for a better understanding of its survival and infection strategies. This will support the development of urgently needed new treatments. EhecRegNet is online via http://www.ehecregnet.de.
Affiliation(s)
- Josch Pauling
- Computational Systems Biology, Max Planck Institute for Informatics, Germany
|
47
|
Pauling J, Röttger R, Tauch A, Azevedo V, Baumbach J. CoryneRegNet 6.0--Updated database content, new analysis methods and novel features focusing on community demands. Nucleic Acids Res 2011; 40:D610-4. [PMID: 22080556 PMCID: PMC3245100 DOI: 10.1093/nar/gkr883] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Indexed: 12/03/2022]
Abstract
Post-genomic analysis techniques such as next-generation sequencing have produced vast amounts of data about microorganisms, including genetic sequences, their functional annotations and gene regulatory interactions. The latter are genetic mechanisms that control a cell's characteristics, for instance, pathogenicity as well as survival and reproduction strategies. CoryneRegNet is the reference database and analysis platform for corynebacterial gene regulatory networks. In this article we introduce the updated version 6.0 of CoryneRegNet and describe the updated database content, which includes 6352 corynebacterial regulatory interactions compared with 4928 interactions in release 5.0 and 3235 regulations in release 4.0. We also demonstrate how we support the community by integrating analysis and visualization features for transiently imported custom data, such as gene regulatory interactions. Furthermore, with release 6.0, we provide easy-to-use functions that allow the user to submit data for persistent storage in the CoryneRegNet database. Thus, it offers its users important options that address community demands. CoryneRegNet is publicly available at http://www.coryneregnet.de.
Affiliation(s)
- Josch Pauling
- Computational Systems Biology, Max Planck Institute for Informatics, Campus E1.4, 66123 Saarbrücken, Germany
|
48
|
Varshney RK, Chen W, Li Y, Bharti AK, Saxena RK, Schlueter JA, Donoghue MTA, Azam S, Fan G, Whaley AM, Farmer AD, Sheridan J, Iwata A, Tuteja R, Penmetsa RV, Wu W, Upadhyaya HD, Yang SP, Shah T, Saxena KB, Michael T, McCombie WR, Yang B, Zhang G, Yang H, Wang J, Spillane C, Cook DR, May GD, Xu X, Jackson SA. Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat Biotechnol 2011; 30:83-9. [PMID: 22057054 DOI: 10.1038/nbt.2022] [Citation(s) in RCA: 432] [Impact Index Per Article: 33.2] [Received: 07/19/2011] [Accepted: 10/03/2011] [Indexed: 11/08/2022]
Abstract
Pigeonpea is an important legume food crop grown primarily by smallholder farmers in many semi-arid tropical regions of the world. We used the Illumina next-generation sequencing platform to generate 237.2 Gb of sequence, which, along with Sanger-based bacterial artificial chromosome end sequences and a genetic map, we assembled into scaffolds representing 72.7% (605.78 Mb) of the 833.07 Mb pigeonpea genome. Genome analysis predicted 48,680 genes for pigeonpea and also showed the potential role that certain gene families, for example, drought tolerance-related genes, have played throughout the domestication of pigeonpea and the evolution of its ancestors. Although we found a few segmental duplication events, we did not observe the recent genome-wide duplication events observed in soybean. This reference genome sequence will facilitate the identification of the genetic basis of agronomically important traits and accelerate the development of improved pigeonpea varieties that could improve food security in many developing countries.
Affiliation(s)
- Rajeev K Varshney
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India.
|
49
|
Gonçalves JP, Francisco AP, Mira NP, Teixeira MC, Sá-Correia I, Oliveira AL, Madeira SC. TFRank: network-based prioritization of regulatory associations underlying transcriptional responses. Bioinformatics 2011; 27:3149-57. [PMID: 21965816 DOI: 10.1093/bioinformatics/btr546] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/15/2022]
Abstract
MOTIVATION Uncovering mechanisms underlying gene expression control is crucial to understanding complex cellular responses. Studies in gene regulation often aim to identify regulatory players involved in a biological process of interest, either transcription factors coregulating a set of target genes or genes eventually controlled by a set of regulators. These are frequently prioritized with respect to a context-specific relevance score. Current approaches rely on relevance measures accounting exclusively for direct transcription factor-target interactions, namely overrepresentation of binding sites or target ratios. Gene regulation has, however, intricate behavior with overlapping, indirect effects that should not be neglected. In addition, the rapid accumulation of regulatory data already enables the prediction of large-scale networks suitable for higher-level exploration by methods based on graph theory. A paradigm shift is thus emerging, where isolated and constrained analyses will likely be replaced by whole-network, systemic-aware strategies. RESULTS We present TFRank, a graph-based framework to prioritize regulatory players involved in transcriptional responses within the regulatory network of an organism, whereby every regulatory path containing genes of interest is explored and incorporated into the analysis. TFRank selected important regulators of yeast adaptation to stress induced by quinine and acetic acid, which were missed by a direct-effect approach. Notably, they reportedly confer resistance toward the chemicals. In a preliminary study in humans, TFRank unveiled regulators involved in breast tumor growth and metastasis when applied to genes whose expression signatures correlated with short interval to metastasis.
|
50
|
PATRIC: the comprehensive bacterial bioinformatics resource with a focus on human pathogenic species. Infect Immun 2011; 79:4286-98. [PMID: 21896772 DOI: 10.1128/iai.00207-11] [Citation(s) in RCA: 201] [Impact Index Per Article: 15.5] [Indexed: 12/13/2022]
Abstract
Funded by the National Institute of Allergy and Infectious Diseases, the Pathosystems Resource Integration Center (PATRIC) is a genomics-centric relational database and bioinformatics resource designed to assist scientists in infectious-disease research. Specifically, PATRIC provides scientists with (i) a comprehensive bacterial genomics database, (ii) a plethora of associated data relevant to genomic analysis, and (iii) an extensive suite of computational tools and platforms for bioinformatics analysis. While the primary aim of PATRIC is to advance the knowledge underlying the biology of human pathogens, all publicly available genome-scale data for bacteria are compiled and continually updated, thereby enabling comparative analyses to reveal the basis for differences between infectious free-living and commensal species. Herein we summarize the major features available at PATRIC, dividing the resources into two major categories: (i) organisms, genomes, and comparative genomics and (ii) recurrent integration of community-derived associated data. Additionally, we present two experimental designs typical of bacterial genomics research and report on the execution of both projects using only PATRIC data and tools. These applications encompass a broad range of the data and analysis tools available, illustrating practical uses of PATRIC for the biologist. Finally, a summary of PATRIC's outreach activities, collaborative endeavors, and future research directions is provided.
|