1. Perdas E, Gadzalska K, Hrytsiuk I, Borowiec M, Fendler W, Młynarski W. Case report: Neonatal diabetes mellitus with congenital hypothyroidism as a result of biallelic heterozygous mutations in GLIS3 gene. Pediatr Diabetes 2022; 23:668-674. [PMID: 35394098 DOI: 10.1111/pedi.13341] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/16/2021] [Revised: 03/25/2022] [Accepted: 04/04/2022] [Indexed: 11/30/2022]
Abstract
Neonatal diabetes mellitus with congenital hypothyroidism (NDH) syndrome (MIM# 610199) is a rare disease caused by autosomal recessive mutations in the GLIS3 gene. GLIS3 is an important transcription factor that may act as both a repressor and an activator of transcription. To date, 22 cases of NDH syndrome from 16 families and 11 countries have been described. Herein, we report a child who developed diabetes during the first week of life. Additionally, she suffered from congenital hypothyroidism, cardiac abnormalities, and polycystic kidney disease. Genetic analysis revealed that the patient is a carrier of two novel heterozygous mutations, p.Pro444fsdelG (c.1330delC) and p.His647Arg (c.1940A > G), in the GLIS3 gene, inherited from her clinically healthy father and mother, respectively. Bioinformatic tools (SIFT, PolyPhen2, PROVEAN and SWISS-MODEL) predicted that the p.His647Arg (c.1940A > G) variant has a strong detrimental effect and disturbs the Kruppel-like zinc finger domain. All but one of the cases of NDH syndrome described so far have been caused by homozygous mutations of GLIS3, making this the second case of pathogenic compound heterozygosity of GLIS3 worldwide. The case poses substantial clinical novelty and details an interesting interplay between the observed variants and GLIS3 expression, which seems to be autoregulated. Hence, the damaging missense mutation may further reduce the expression of any remaining functional alleles. This case report expands our understanding of the clinical phenotype, treatment approaches, and outcome of infants with GLIS3 mutations and indicates the need for further research to deepen our understanding of the role of GLIS3.
Affiliation(s)
- Ewelina Perdas: Department of Biostatistics and Translational Medicine, Medical University of Lodz, Lodz, Poland
- Karolina Gadzalska: Department of Clinical and Laboratory Genetics, Medical University of Lodz, Lodz, Poland
- Ihor Hrytsiuk: Western Ukrainian Specialised Children's Medical Centre, Lviv, Ukraine
- Maciej Borowiec: Department of Clinical and Laboratory Genetics, Medical University of Lodz, Lodz, Poland
- Wojciech Fendler: Department of Biostatistics and Translational Medicine, Medical University of Lodz, Lodz, Poland
- Wojciech Młynarski: Department of Pediatrics, Oncology and Hematology, Medical University of Lodz, Lodz, Poland
2. Harrison PM. fLPS 2.0: rapid annotation of compositionally-biased regions in biological sequences. PeerJ 2021; 9:e12363. [PMID: 34760378 PMCID: PMC8557692 DOI: 10.7717/peerj.12363] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/23/2021] [Accepted: 09/30/2021] [Indexed: 12/12/2022]
Abstract
Compositionally-biased (CB) regions in biological sequences are enriched for a subset of sequence residue types. These can be shorter regions with a concentrated bias (i.e., those termed ‘low-complexity’), or longer regions that have a compositional skew. These regions comprise a prominent class of the uncharacterized ‘dark matter’ of the protein universe. Here, I report the latest version of the fLPS package for the annotation of CB regions, which includes added consideration of DNA sequences, to label the eight possible biased regions of DNA. In this version, the user is now able to restrict analysis to a specified subset of residue types, and also to filter for previously annotated domains to enable detection of discontinuous CB regions. A ‘thorough’ option has been added which enables the labelling of subtler biases, typically made from a skew for several residue types. In the output, protein CB regions are now labelled with bias classes reflecting the physico-chemical character of the biasing residues. The fLPS 2.0 package is available from: https://github.com/pmharrison/flps2 or in a Supplemental File of this paper.
Affiliation(s)
- Paul M Harrison: Department of Biology, McGill University, Montreal, QC, Canada
3. Tribbles Pseudokinase 3 Contributes to Cancer Stemness of Endometrial Cancer Cells by Regulating β-Catenin Expression. Cancers (Basel) 2020; 12:cancers12123785. [PMID: 33334065 PMCID: PMC7765506 DOI: 10.3390/cancers12123785] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/05/2020] [Revised: 11/28/2020] [Accepted: 12/10/2020] [Indexed: 01/18/2023]
Simple Summary
Endometrial cancer (EC) is the second most common female malignancy worldwide, but its pathogenesis is not fully understood. Tribbles pseudokinase 3 (TRIB3) is a scaffold protein that may regulate multiple cellular processes by organizing binding partner proteins involved in signal transduction pathways. The goal of this study is to investigate whether TRIB3 is involved in the malignant features of EC. Our data demonstrate that TRIB3 positively regulates the cancer stem-cell activity and in vivo tumorigenicity of EC cells by modulating β-catenin signaling through direct interaction with the ELF4 transcription factor. Our results could lead to new insight for developing a novel therapeutic strategy for EC by targeting TRIB3.
Abstract
Endometrial cancer (EC) is the second most common gynecological malignancy worldwide. Tribbles pseudokinase 3 (TRIB3) is a scaffolding protein that regulates intracellular signal transduction, and its role in tumor development is controversial. Here, we investigated the biological function of TRIB3 in EC. We found that the messenger RNA (mRNA) expression level of TRIB3 was significantly and positively correlated with shorter overall survival of EC patients in The Cancer Genome Atlas database. The protein expression of TRIB3 was found to be significantly increased in EC cancer stem cells (CSCs) enriched by tumorsphere cultivation. Knockdown of TRIB3 in EC cells suppressed tumorsphere formation, the expression of cancer stemness genes, and in vivo tumorigenesis. The expression of β-catenin at both the protein and the mRNA levels was downregulated upon TRIB3 silencing. TRIB3 was found to interact with E74 Like ETS transcription factor 4 (ELF4) in the nucleus and bound to ELF4 consensus sites within the catenin beta 1 (CTNNB1) promoter in EC cell lines. These data indicated that TRIB3 may regulate CTNNB1 transcription by enhancing the recruitment of ELF4 to the CTNNB1 promoter. In conclusion, our results suggest that TRIB3 plays an oncogenic role in EC and positively regulates the self-renewal and tumorigenicity of EC-CSCs. Targeting TRIB3 is considered a potential therapeutic strategy in future EC therapy.
4. Dreos R, Ambrosini G, Groux R, Cavin Périer R, Bucher P. The eukaryotic promoter database in its 30th year: focus on non-vertebrate organisms. Nucleic Acids Res 2016; 45:D51-D55. [PMID: 27899657 PMCID: PMC5210552 DOI: 10.1093/nar/gkw1069] [Citation(s) in RCA: 174] [Impact Index Per Article: 21.8] [Received: 09/15/2016] [Revised: 10/21/2016] [Accepted: 10/24/2016] [Indexed: 01/21/2023]
Abstract
We present an update of the Eukaryotic Promoter Database EPD (http://epd.vital-it.ch), more specifically on the EPDnew division, which contains comprehensive organism-specific transcription start site (TSS) collections automatically derived from next generation sequencing (NGS) data. Thanks to the abundant release of new high-throughput transcript mapping data (CAGE, TSS-seq, GRO-cap), the database could be extended to plant and fungal species. We further report on the expansion of the mass genome annotation (MGA) repository containing promoter-relevant chromatin profiling data and on improvements for the EPD entry viewers. Finally, we present a new data access tool, ChIP-Extract, which enables computational biologists to extract diverse types of promoter-associated data in numerical table formats that are readily imported into statistical analysis platforms such as R.
Affiliation(s)
- René Dreos: Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland
- Giovanna Ambrosini: Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland; Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
- Romain Groux: Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland
- Philipp Bucher: Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland; Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
5. Yella VR, Bansal M. In silico Identification of Eukaryotic Promoters. Systems and Synthetic Biology 2015. [DOI: 10.1007/978-94-017-9514-2_4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
6. Dreos R, Ambrosini G, Périer RC, Bucher P. The Eukaryotic Promoter Database: expansion of EPDnew and new promoter analysis tools. Nucleic Acids Res 2014; 43:D92-6. [PMID: 25378343 PMCID: PMC4383928 DOI: 10.1093/nar/gku1111] [Citation(s) in RCA: 207] [Impact Index Per Article: 20.7] [Indexed: 11/17/2022]
Abstract
We present an update of EPDnew (http://epd.vital-it.ch), a recently introduced part of the Eukaryotic Promoter Database (EPD) which has been described in more detail in a previous NAR Database Issue. EPD is an old database of experimentally characterized eukaryotic POL II promoters, which are conceptually defined as transcription initiation sites or regions. EPDnew is a collection of automatically compiled, organism-specific promoter lists complementing the old corpus of manually compiled promoter entries of EPD. This new part is exclusively derived from next generation sequencing data from high-throughput promoter mapping experiments. We report on the recent growth of EPDnew, its extension to additional model organisms and its improved integration with other bioinformatics resources developed by our group, in particular the Signal Search Analysis and ChIP-Seq web servers.
Affiliation(s)
- René Dreos: Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland
- Giovanna Ambrosini: Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland; Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
- Rouayda Cavin Périer: Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
- Philipp Bucher: Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland; Swiss Institute for Experimental Cancer Research (ISREC), School of Life Sciences, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland
7. Dreos R, Ambrosini G, Cavin Périer R, Bucher P. EPD and EPDnew, high-quality promoter resources in the next-generation sequencing era. Nucleic Acids Res 2012. [PMID: 23193273 PMCID: PMC3531148 DOI: 10.1093/nar/gks1233] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.9] [Indexed: 12/16/2022]
Abstract
The Eukaryotic Promoter Database (EPD), available online at http://epd.vital-it.ch, is a collection of experimentally defined eukaryotic POL II promoters which has been maintained for more than 25 years. A promoter is represented by a single position in the genome, typically the major transcription start site (TSS). EPD primarily serves biologists interested in analysing the motif content, chromatin structure or DNA methylation status of co-regulated promoter subsets. Initially, promoter evidence came from TSS mapping experiments targeted at single genes and published in journal articles. Today, the TSS positions provided by EPD are inferred from next-generation sequencing data distributed in electronic form. Traditionally, EPD has been a high-quality database with low coverage. The focus of recent efforts has been to reach complete gene coverage for important model organisms. To this end, we introduced a new section called EPDnew, which is automatically assembled from multiple, carefully selected input datasets. As another novelty, we started to use chromatin signatures in addition to mRNA 5′ tags to locate promoters of weakly expressed genes. Regarding user interfaces, we introduced a new promoter viewer which enables users to explore promoter-defining experimental evidence in a UCSC genome browser window.
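The convention mentioned in this abstract, a promoter reduced to a single genomic position at the major TSS, can be illustrated with a small sketch. This is not EPD's pipeline: the helper name `major_tss`, the input format (a list of 5′ tag start coordinates for one gene), and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def major_tss(tag_positions):
    """Return the coordinate supported by the most 5' tags.

    Hypothetical helper, not part of EPD. Ties are broken by taking the
    smallest coordinate, which is an arbitrary choice for this sketch.
    """
    if not tag_positions:
        raise ValueError("no tags supplied")
    counts = Counter(tag_positions)
    best = max(counts.values())
    return min(pos for pos, n in counts.items() if n == best)


# Made-up tag start coordinates for a single gene.
tags = [1002, 1005, 1005, 1005, 1006, 1010, 1010]
print(major_tss(tags))
```

A real resource would also need strand handling and evidence from several datasets; the sketch covers only the "one position per promoter" idea.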
Affiliation(s)
- René Dreos: Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland
8. Zheng G, Liu Q, Ding G, Wei C, Li Y. Towards biological characters of interactions between transcription factors and their DNA targets in mammals. BMC Genomics 2012; 13:388. [PMID: 22888987 PMCID: PMC3472306 DOI: 10.1186/1471-2164-13-388] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/29/2012] [Accepted: 06/29/2012] [Indexed: 01/07/2023]
Abstract
Background: In the post-genomic era, the study of transcriptional regulation is pivotal to decoding genetic information. Transcription factors (TFs) are central proteins for transcriptional regulation, and interactions between TFs and their DNA targets (TFBSs) are important for the expression of downstream genes. However, the lack of knowledge about interactions between TFs and TFBSs still hampers investigation of the mechanism of transcription.
Results: To expand the knowledge about interactions between TFs and TFBSs, three biological features (sequence feature, structure feature, and evolution feature) were utilized to build TFBS identification models for studying binding preference between TFs and their DNA targets in mammals. Results show that each feature performs fairly well in capturing TFBSs, and that the hybrid model combining all three features is more robust for TFBS identification. Subsequently, correspondence between TFs and their TFBSs was investigated to explore interactions among them in mammals. Results indicate that TFs and TFBSs are reciprocal at the sequence, structure, and evolution levels.
Conclusions: Our work demonstrates that, to some extent, TFs and TFBSs have developed a coevolutionary relationship in order to keep their physical binding and maintain their regulatory functions. In summary, our work will help to understand transcriptional regulation and to interpret the binding mechanism between proteins and DNA.
Affiliation(s)
- Guangyong Zheng: Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
9. Tazearslan C, Cho M, Suh Y. Discovery of functional gene variants associated with human longevity: opportunities and challenges. J Gerontol A Biol Sci Med Sci 2011; 67:376-83. [PMID: 22156437 DOI: 10.1093/gerona/glr200] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
Age is a major risk factor for many human diseases. Extremely long-lived individuals, such as centenarians, have managed to ward off age-related diseases and serve as human models to search for the genetic factors that influence longevity. The discovery of evolutionarily conserved pathways with major impact on life span in animal models has provided tantalizing opportunities to test the relevance of these pathways for human longevity. Here we specifically focus on the insulin/insulin-like growth factor-1 signaling as a prime candidate pathway. Coupled with the rapid advances in ultra high-throughput sequencing technologies, it is now feasible to comprehensively analyze all possible sequence variants in candidate genes segregating with a longevity phenotype and to investigate the functional consequences of the associated variants. A better understanding of the functional genes that affect healthy longevity in humans may lead to a rational basis for intervention strategies that can delay or prevent age-related diseases.
Affiliation(s)
- Cagdas Tazearslan: Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
10. Thorsen K, Schepeler T, Øster B, Rasmussen MH, Vang S, Wang K, Hansen KQ, Lamy P, Pedersen JS, Eller A, Mansilla F, Laurila K, Wiuf C, Laurberg S, Dyrskjøt L, Ørntoft TF, Andersen CL. Tumor-specific usage of alternative transcription start sites in colorectal cancer identified by genome-wide exon array analysis. BMC Genomics 2011; 12:505. [PMID: 21999571 PMCID: PMC3208247 DOI: 10.1186/1471-2164-12-505] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Received: 04/04/2011] [Accepted: 10/14/2011] [Indexed: 12/18/2022]
Abstract
BACKGROUND: Approximately half of all human genes use alternative transcription start sites (TSSs) to control mRNA levels and broaden the transcriptional output in healthy tissues. Aberrant expression patterns promoting carcinogenesis, however, may arise from alternative promoter usage.
RESULTS: By profiling 108 colorectal samples using exon arrays, we identified nine genes (TCF12, OSBPL1A, TRAK1, ANK3, CHEK1, UGP2, LMO7, ACSL5, and SCIN) showing tumor-specific alternative TSS usage in both adenoma and cancer samples relative to normal mucosa. Analysis of independent exon array data sets corroborated these findings. Additionally, we confirmed the observed patterns for selected mRNAs using quantitative real-time reverse-transcription PCR. Interestingly, for some of the genes, the tumor-specific TSS usage was not restricted to colorectal cancer. A comprehensive survey of the nine genes in lung, bladder, liver, prostate, gastric, and brain cancer revealed significantly altered mRNA isoform ratios for CHEK1, OSBPL1A, and TCF12 in a subset of these cancer types. To identify the mechanism responsible for the shift in alternative TSS usage, we antagonized the Wnt-signaling pathway in DLD1 and Ls174T colorectal cancer cell lines, which remarkably led to a shift in the preferred TSS for both OSBPL1A and TRAK1. This indicated a regulatory role of the Wnt pathway in selecting TSS, possibly also involving TP53 and SOX9, as their transcription binding sites were enriched in the promoters of the tumor preferred isoforms together with their mRNA levels being increased in tumor samples. Finally, to evaluate the prognostic impact of the altered TSS usage, immunohistochemistry was used to show deregulation of the total protein levels of both TCF12 and OSBPL1A, corresponding to the mRNA levels observed. Furthermore, the level of nuclear TCF12 had a significant correlation to progression free survival in a cohort of 248 stage II colorectal cancer samples.
CONCLUSIONS: Alternative TSS usage in colorectal adenoma and cancer samples has been shown for nine genes, and OSBPL1A and TRAK1 were found to be regulated in vitro by Wnt signaling. TCF12 protein expression was upregulated in cancer samples and correlated with progression free survival.
Affiliation(s)
- Kasper Thorsen: Department of Molecular Medicine, Aarhus University Hospital, Skejby, 8200 Aarhus N, Denmark
11. Ghedira K, Hornischer K, Konovalova T, Jenhani AZ, Benkahla A, Kel A. Identification of key mechanisms controlling gene expression in Leishmania infected macrophages using genome-wide promoter analysis. Infection, Genetics and Evolution 2010; 11:769-77. [PMID: 21093613 DOI: 10.1016/j.meegid.2010.10.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 05/25/2010] [Revised: 10/18/2010] [Accepted: 10/19/2010] [Indexed: 01/15/2023]
Abstract
The present study describes the in silico prediction of the regulatory network of Leishmania-infected human macrophages. The construction of the gene regulatory network requires the identification of Transcription Factor Binding Sites (TFBSs) in the regulatory regions (promoters, enhancers) of genes that are regulated upon Leishmania infection. The promoters of human, mouse, rat, dog and chimpanzee genes were first identified in the whole genomes using available experimental data on full length cDNA sequences or deep CAGE tag data (DBTSS, FANTOM3, FANTOM4), mRNA models (ENSEMBL), or hand-annotated data (EPD, TRANSFAC). A phylogenetic footprinting analysis and a MATCH analysis of the promoter sequences were then performed to predict TFBSs. Then, an SQL database was established that integrates all results of the promoter analysis as well as other genome annotation information obtained from ENSEMBL, TRANSFAC, TRED (Transcription Regulatory Element Database), ORegAnno and the ENCODE project. Finally, publicly available expression data from human Leishmania-infected macrophages were analyzed using the genome-wide information on predicted TFBSs with the computer system ExPlain™. The gene regulatory network was constructed and activated signal transduction pathways were revealed. The Irak1 pathway was identified as a key pathway regulating gene expression changes in Leishmania-infected macrophages.
Affiliation(s)
- Kais Ghedira: Laboratory of Immunology, Vaccinology, and Molecular Genetics, Institut Pasteur de Tunis, 13 place Pasteur BP 74, Tunis, Tunisia
12. Chen T, Yu WH, Izard J, Baranova OV, Lakshmanan A, Dewhirst FE. The Human Oral Microbiome Database: a web accessible resource for investigating oral microbe taxonomic and genomic information. Database: The Journal of Biological Databases and Curation 2010; 2010:baq013. [PMID: 20624719 PMCID: PMC2911848 DOI: 10.1093/database/baq013] [Citation(s) in RCA: 718] [Impact Index Per Article: 51.3] [Indexed: 01/29/2023]
Abstract
The human oral microbiome is the most studied human microflora, but 53% of the species have not yet been validly named and 35% remain uncultivated. The uncultivated taxa are known primarily from 16S rRNA sequence information. Sequence information tied solely to obscure isolate or clone numbers, and usually lacking accurate phylogenetic placement, is a major impediment to working with human oral microbiome data. The goal of creating the Human Oral Microbiome Database (HOMD) is to provide the scientific community with a body site-specific comprehensive database for the more than 600 prokaryote species that are present in the human oral cavity, based on a curated 16S rRNA gene-based provisional naming scheme. Currently, two primary types of information are provided in HOMD: taxonomic and genomic. Named oral species and taxa identified from 16S rRNA gene sequence analysis of oral isolates and cloning studies were placed into defined 16S rRNA phylotypes and each given a unique Human Oral Taxon (HOT) number. The HOT interlinks phenotypic, phylogenetic, genomic, clinical and bibliographic information for each taxon. A BLAST search tool is provided to match user 16S rRNA gene sequences to a curated, full length, 16S rRNA gene reference data set. For genomic analysis, HOMD provides a comprehensive set of analysis tools and maintains frequently updated annotations for all the human oral microbial genomes that have been sequenced and publicly released. Oral bacterial genome sequences, determined as part of the Human Microbiome Project, are being added to the HOMD as they become available. We provide HOMD as a conceptual model for the presentation of microbiome data for other human body sites. Database URL: http://www.homd.org
Affiliation(s)
- Tsute Chen: The Forsyth Institute, Boston, MA 02115, USA
13. Kumar GR, Sakthivel K, Sundaram R, Neeraja C, Balachandran S, Rani NS, Viraktamath B, Madhav M. Allele mining in crops: Prospects and potentials. Biotechnol Adv 2010; 28:451-61. [DOI: 10.1016/j.biotechadv.2010.02.007] [Citation(s) in RCA: 88] [Impact Index Per Article: 6.3] [Received: 06/02/2009] [Revised: 09/21/2009] [Accepted: 09/25/2009] [Indexed: 12/26/2022]
14. Tian S, Haney RA, Feder ME. Phylogeny disambiguates the evolution of heat-shock cis-regulatory elements in Drosophila. PLoS One 2010; 5:e10669. [PMID: 20498853 PMCID: PMC2871787 DOI: 10.1371/journal.pone.0010669] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Received: 02/05/2010] [Accepted: 04/23/2010] [Indexed: 11/19/2022]
Abstract
Heat-shock genes have a well-studied control mechanism for their expression that is mediated through cis-regulatory motifs known as heat-shock elements (HSEs). The evolution of important features of this control mechanism has not been investigated in detail, however. Here we exploit the genome sequencing of multiple Drosophila species, combined with a wealth of available information on the structure and function of HSEs in D. melanogaster, to undertake this investigation. We find that in single-copy heat shock genes, entire HSEs have evolved or disappeared 14 times, and the phylogenetic approach bounds the timing and direction of these evolutionary events in relation to speciation. In contrast, in the multi-copy gene Hsp70, the number of HSEs is nearly constant across species. HSEs evolve in size, position, and sequence within heat-shock promoters. In turn, functional significance of certain features is implicated by preservation despite this evolutionary change; these features include tail-to-tail arrangements of HSEs, gapped HSEs, and the presence or absence of entire HSEs. The variation among Drosophila species indicates that the cis-regulatory encoding of responsiveness to heat and other stresses is diverse. The broad dimensions of variation uncovered are particularly important as they suggest a substantial challenge for functional studies.
Affiliation(s)
- Sibo Tian: Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois, United States of America
- Robert A. Haney: Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois, United States of America
- Martin E. Feder: Department of Organismal Biology and Anatomy, University of Chicago, Chicago, Illinois, United States of America
15. Solovyev VV, Shahmuradov IA, Salamov AA. Identification of promoter regions and regulatory sites. Methods Mol Biol 2010; 674:57-83. [PMID: 20827586 DOI: 10.1007/978-1-60761-854-6_5] [Citation(s) in RCA: 85] [Impact Index Per Article: 6.1] [Indexed: 05/27/2023]
Abstract
Promoter sequences are the main regulatory elements of gene expression. Their recognition by computer algorithms is fundamental for understanding gene expression patterns, cell specificity and development. This chapter describes the advanced approaches to identify promoters in animal, plant and bacterial sequences. Also, we discuss an approach to identify statistically significant regulatory motifs in genomic sequences.
16. Contrasting patterns of transposable element insertions in Drosophila heat-shock promoters. PLoS One 2009; 4:e8486. [PMID: 20041194 PMCID: PMC2793543 DOI: 10.1371/journal.pone.0008486] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 05/05/2009] [Accepted: 11/22/2009] [Indexed: 01/22/2023]
Abstract
The proximal promoter regions of heat-shock genes harbor a remarkable number of P transposable element (TE) insertions relative to both positive and negative control proximal promoter regions in natural populations of Drosophila melanogaster. We have screened the sequenced genomes of 12 species of Drosophila to test whether this pattern is unique to these populations. In the 12 species' genomes, transposable element insertions are no more abundant in promoter regions of single-copy heat-shock genes than in promoters with similar or dissimilar architecture. Also, insertions appear randomly distributed across the promoter region, whereas in D. melanogaster natural populations insertions clustered near the transcription start site in promoters of single-copy heat-shock genes. Hsp70 promoters exhibit more TE insertions per promoter than all other gene sets in the 12 species, as they do in natural populations of D. melanogaster. Insertions in the Hsp70 promoter region, however, cluster away from the transcription start site in the 12 species, but near it in natural populations of D. melanogaster. These results suggest that D. melanogaster heat-shock promoters are unique in terms of their interaction with transposable elements, and confirm that Hsp70 promoters are distinctive in TE insertions across Drosophila.
17. Rach EA, Yuan HY, Majoros WH, Tomancak P, Ohler U. Motif composition, conservation and condition-specificity of single and alternative transcription start sites in the Drosophila genome. Genome Biol 2009; 10:R73. [PMID: 19589141 PMCID: PMC2728527 DOI: 10.1186/gb-2009-10-7-r73] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.1] [Received: 12/29/2008] [Revised: 04/21/2009] [Accepted: 07/09/2009] [Indexed: 01/05/2023]
Abstract
A map of transcription start sites across the Drosophila genome, providing insights into initiation patterns and spatiotemporal conditions.
Background: Transcription initiation is a key component in the regulation of gene expression. mRNA 5' full-length sequencing techniques have enhanced our understanding of mammalian transcription start sites (TSSs), revealing different initiation patterns on a genomic scale.
Results: To identify TSSs in Drosophila melanogaster, we applied a hierarchical clustering strategy on available 5' expressed sequence tags (ESTs) and identified a high quality set of 5,665 TSSs for approximately 4,000 genes. We distinguished two initiation patterns: 'peaked' TSSs, and 'broad' TSS cluster groups. Peaked promoters were found to contain location-specific sequence elements; conversely, broad promoters were associated with non-location-specific elements. In alignments across other Drosophila genomes, conservation levels of sequence elements exceeded 90% within the melanogaster subgroup, but dropped considerably for distal species. Elements in broad promoters had lower levels of conservation than those in peaked promoters. When characterizing the distributions of ESTs, 64% of TSSs showed distinct associations to one out of eight different spatiotemporal conditions. Available whole-genome tiling array time series data revealed different temporal patterns of embryonic activity across the majority of genes with distinct alternative promoters. Many genes with maternally inherited transcripts were found to have alternative promoters utilized later in development. Core promoters of maternally inherited transcripts showed differences in motif composition compared to zygotically active promoters.
Conclusions: Our study provides a comprehensive map of Drosophila TSSs and the conditions under which they are utilized. Distinct differences in motif associations with initiation pattern and spatiotemporal utilization illustrate the complex regulatory code of transcription initiation.
Affiliation(s)
- Elizabeth A Rach
- Program in Computational Biology and Bioinformatics, Duke University, Science Drive, Durham, NC 27708, USA
|
18
|
Parida SK, Dalal V, Singh AK, Singh NK, Mohapatra T. Genic non-coding microsatellites in the rice genome: characterization, marker design and use in assessing genetic and evolutionary relationships among domesticated groups. BMC Genomics 2009; 10:140. [PMID: 19335879 PMCID: PMC2680414 DOI: 10.1186/1471-2164-10-140] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Received: 07/26/2008] [Accepted: 03/31/2009] [Indexed: 11/13/2022] Open
Abstract
Background: Completely sequenced plant genomes provide scope for designing a large number of microsatellite markers, which are useful in various aspects of crop breeding and genetic analysis. With the objective of developing genic but non-coding microsatellite (GNMS) markers for the rice (Oryza sativa L.) genome, we characterized the frequency and relative distribution of microsatellite repeat-motifs in 18,935 predicted protein coding genes including 14,308 putative promoter sequences. Results: We identified 19,555 perfect GNMS repeats with densities ranging from 306.7/Mb in chromosome 1 to 450/Mb in chromosome 12 with an average of 357.5 GNMS per Mb. The average microsatellite density was maximum in the 5' untranslated regions (UTRs) followed by those in introns, promoters, 3'UTRs and minimum in the coding sequences (CDS). Primers were designed for 17,966 (92%) GNMS repeats, including 4,288 (94%) hypervariable class I types, which were bin-mapped on the rice genome. The GNMS markers were most polymorphic in the intronic region (73.3%) followed by markers in the promoter region (53.3%) and least in the CDS (26.6%). The robust polymerase chain reaction (PCR) amplification efficiency and high polymorphic potential of GNMS markers over genic coding and random genomic microsatellite markers suggest their immediate use in efficient genotyping applications in rice. A set of these markers could assess genetic diversity and establish phylogenetic relationships among domesticated rice cultivar groups. We also demonstrated the usefulness of orthologous and paralogous conserved non-coding microsatellite (CNMS) markers, identified in the putative rice promoter sequences, for comparative physical mapping and understanding of evolutionary and gene regulatory complexities among rice and other members of the grass family. The divergence between long-grained aromatics and subspecies japonica was estimated to be more recent (0.004 Mya) compared to short-grained aromatics from japonica (0.006 Mya) and long-grained aromatics from subspecies indica (0.014 Mya). Conclusion: Our analyses showed that GNMS markers with their high polymorphic potential would be preferred candidate functional markers in various marker-based applications in rice genetics, genomics and breeding. The CNMS markers provided encouraging implications for their use in comparative genome mapping and understanding of evolutionary complexities in rice and other members of grass family.
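The repeat counts and per-megabase densities reported in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `find_perfect_repeats` and `density_per_mb`, the unit-length range, and the minimum repeat count are all assumptions chosen for illustration, since the abstract does not state the detection criteria.

```python
import re
from typing import List, Tuple


def find_perfect_repeats(
    seq: str, min_unit: int = 2, max_unit: int = 6, min_repeats: int = 4
) -> List[Tuple[int, str, int]]:
    """Return (start, unit, copies) for perfect tandem repeats.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # A k-mer followed by at least (min_repeats - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves built from a shorter repeated unit
            # (for example "ATAT" is just the dinucleotide "AT" twice).
            if any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


def density_per_mb(n_repeats: int, seq_length_bp: int) -> float:
    """Repeats per megabase: count divided by length, scaled to 1,000,000 bp."""
    return n_repeats / seq_length_bp * 1_000_000


if __name__ == "__main__":
    toy = "GGG" + "AT" * 5 + "CCGTT" + "CAG" * 4 + "T"  # made-up sequence
    print(find_perfect_repeats(toy))
```

On a real genome the same density calculation would be applied per chromosome or per genic region (UTR, intron, promoter, CDS) to compare categories.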
Affiliation(s)
- Swarup Kumar Parida
- National Research Centre on Plant Biotechnology, Indian Agricultural Research Institute, New Delhi, India.
|
19
|
Portales-Casamar E, Kirov S, Lim J, Lithwick S, Swanson MI, Ticoll A, Snoddy J, Wasserman WW. PAZAR: a framework for collection and dissemination of cis-regulatory sequence annotation. Genome Biol 2008; 8:R207. [PMID: 17916232 PMCID: PMC2246282 DOI: 10.1186/gb-2007-8-10-r207] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.6] [Received: 04/30/2007] [Revised: 09/05/2007] [Accepted: 09/28/2007] [Indexed: 01/29/2023] Open
Abstract
PAZAR is an open-access and open-source database of transcription factor and regulatory sequence annotation with associated web interface and programming tools for data submission and extraction. Curated boutique data collections can be maintained and disseminated through the unified schema of the mall-like PAZAR repository. The Pleiades Promoter Project collection of brain-linked regulatory sequences is introduced to demonstrate the depth of annotation possible within PAZAR. PAZAR, located at , is open for business.
Affiliation(s)
- Elodie Portales-Casamar
- Centre for Molecular Medicine and Therapeutics, CFRI, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada.
|
20
|
Hansen MA, Nielsen JE, Retelska D, Larsen N, Leffers H. A shared promoter region suggests a common ancestor for the human VCX/Y, SPANX, and CSAG gene families and the murine CYPT family. Mol Reprod Dev 2008; 75:219-29. [PMID: 17342728 DOI: 10.1002/mrd.20651] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
Many testis-specific genes from the sex chromosomes are subject to rapid evolution, which can make it difficult to identify murine genes in the human genome. The murine CYPT gene family includes 15 members, but orthologs were undetectable in the human genome. However, using a refined homology search, sequences corresponding to the shared promoter region of the CYPT family were identified at 39 loci. Most loci were located immediately upstream of genes belonging to the VCX/Y, SPANX, or CSAG gene families. Sequence comparison of the loci revealed a conserved CYPT promoter-like (CPL) element featuring TATA and CCAAT boxes. The expression of members of the three families harboring the CPL resembled the murine expression of the CYPT family, with weak expression in late pachytene spermatocytes and predominant expression in spermatids, but some genes were also weakly expressed in somatic cells and in other germ cell types. The genomic regions harboring the gene families were rich in direct and inverted segmental duplications (SD), which may facilitate gene conversion and rapid evolution. The conserved CPL and the common expression profiles suggest that the human VCX/Y, SPANX, and CSAG2 gene families together with the murine SPANX gene and the CYPT family may share a common ancestor. Finally, we present evidence that VCX/Y and SPANX may be paralogs with a similar protein structure consisting of C-terminal acidic repeats of variable lengths.
Affiliation(s)
- Martin A Hansen
- Department of Growth and Reproduction, Rigshospitalet, Copenhagen University Hospital, Blegdamsvej, Denmark.
|
21
|
Faiger H, Ivanchenko M, Haran TE. Nearest-neighbor non-additivity versus long-range non-additivity in TATA-box structure and its implications for TBP-binding mechanism. Nucleic Acids Res 2007; 35:4409-19. [PMID: 17576671 PMCID: PMC1935006 DOI: 10.1093/nar/gkm451] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022] Open
Abstract
TBP recognizes its target sites, TATA boxes, by recognizing their sequence-dependent structure and flexibility. Studying this mode of TATA-box recognition, termed ‘indirect readout’, is important for elucidating the binding mechanism in this system, as well as for developing methods to locate new binding sites in genomic DNA. We determined the binding stability and TBP-induced TATA-box bending for consensus-like TATA boxes. In addition, we calculated the individual information score of all studied sequences. We show that various non-additive effects exist in TATA boxes, dependent on their structural properties. By several criteria, we divide TATA boxes into two main groups. The first group contains sequences with 3–4 consecutive adenines. Sequences in this group have a rigid context-independent cooperative structure, best described by a nearest-neighbor non-additive model. Sequences in the second group have a flexible, context-dependent conformation, which cannot be described by an additive model or by a nearest-neighbor non-additive model. Classifying TATA boxes by these and other structural rules clarifies the different recognition pathways and binding mechanisms used by TBP upon binding to different TATA boxes. We discuss the structural and evolutionary sources of the difficulties in predicting new binding sites by probabilistic weight-matrix methods for proteins in which indirect readout is dominant.
Affiliation(s)
- Tali E. Haran
- *To whom correspondence should be addressed. 972 4 8293767; 972 4 8225153
|
22
|
Schmid CD, Sengstag T, Bucher P, Delorenzi M. MADAP, a flexible clustering tool for the interpretation of one-dimensional genome annotation data. Nucleic Acids Res 2007; 35:W201-5. [PMID: 17526516 PMCID: PMC1933235 DOI: 10.1093/nar/gkm343] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022] Open
Abstract
A recurring task in the analysis of mass genome annotation data from high-throughput technologies is the identification of peaks or clusters in a noisy signal profile. Examples of such applications are the definition of promoters on the basis of transcription start site profiles, the mapping of transcription factor binding sites based on ChIP-chip data and the identification of quantitative trait loci (QTL) from whole genome SNP profiles. Input to such an analysis is a set of genome coordinates associated with counts or intensities. The output consists of a discrete number of peaks with respective volumes, extensions and center positions. We have developed for this purpose a flexible one-dimensional clustering tool, called MADAP, which we make available as a web server and as standalone program. A set of parameters enables the user to customize the procedure to a specific problem. The web server, which returns results in textual and graphical form, is useful for small to medium-scale applications, as well as for evaluation and parameter tuning in view of large-scale applications, requiring a local installation. The program written in C++ can be freely downloaded from ftp://ftp.epd.unil.ch/pub/software/unix/madap. The MADAP web server can be accessed at http://www.isrec.isb-sib.ch/madap/.
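The input and output contract described in this abstract (genome coordinates with counts in; peaks with volume, extension and center position out) can be illustrated with a minimal Python sketch. This is not MADAP's algorithm, which the abstract does not specify; it is a simple gap-based grouping, and the names `cluster_1d` and `max_gap` are hypothetical.

```python
from typing import Dict, List, Tuple


def _summarize(members: List[Tuple[int, float]]) -> Dict[str, float]:
    """Collapse one group of (position, count) pairs into a peak record."""
    volume = sum(count for _, count in members)
    if volume > 0:
        # Count-weighted mean position.
        center = sum(pos * count for pos, count in members) / volume
    else:
        # Fall back to the plain mean when every count is zero.
        center = sum(pos for pos, _ in members) / len(members)
    return {
        "start": members[0][0],
        "end": members[-1][0],
        "volume": volume,
        "center": center,
    }


def cluster_1d(
    points: List[Tuple[int, float]], max_gap: int = 50
) -> List[Dict[str, float]]:
    """Group sorted positions into peaks; a gap larger than max_gap starts a new peak."""
    peaks: List[Dict[str, float]] = []
    current: List[Tuple[int, float]] = []
    for pos, count in sorted(points):
        if current and pos - current[-1][0] > max_gap:
            peaks.append(_summarize(current))
            current = []
        current.append((pos, count))
    if current:
        peaks.append(_summarize(current))
    return peaks


if __name__ == "__main__":
    # Made-up tag counts at three genome positions.
    for peak in cluster_1d([(110, 6), (100, 2), (400, 1)], max_gap=50):
        print(peak)
```

The single `max_gap` parameter stands in for the set of tunable parameters the abstract mentions; a real tool would expose more of them.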
|
23
|
Cullen ME, Barton PJR. Mapping transcriptional start sites and in silico DNA footprinting. Methods Mol Biol 2007; 366:203-16. [PMID: 17568126 DOI: 10.1007/978-1-59745-030-0_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/15/2023]
Abstract
Determination of a gene's transcriptional start site underlies the identification of the proximal promoter region and thus facilitates the subsequent analysis of components controlling its expression, namely, cis-acting regulatory elements and their cognate binding proteins. It also enables assembly of meaningful reporter constructs to examine promoter function in different cellular contexts. In this chapter, basic protocols for two experimental approaches to transcriptional start site determination are described: primer extension analysis and the ribonuclease protection assay. Consideration is also given to RNA sources, RNA purification, and primer design. The explosion in genomic DNA and mRNA sequence information derived from genomic sequencing projects, expressed sequence tags and microarrays, combined with in silico analysis, such as automated sequence annotation and gene identification algorithms, now provides an alternative source of detailed information on gene structure and expression. Two approaches to the in silico identification of transcription factor binding sites are described.
Affiliation(s)
- Martin E Cullen
- Heart Science Centre, National Heart and Lung Institute, Imperial College London, Harefield, Middlesex, UK
|
24
|
Davuluri RV. Bioinformatics tools for modeling transcription factor target genes and epigenetic changes. Methods Mol Biol 2007; 408:129-151. [PMID: 18314581 DOI: 10.1007/978-1-59745-547-3_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
The combinatorial control of gene regulatory switches involves both transcription factor (TF) complexes and associated epigenetic modifications to the chromatin template. Novel high-throughput technologies, such as chromatin immunoprecipitation microarrays (ChIP-chip), have enabled genome-wide in vivo identification of TF target regulatory regions and related epigenetic modifications, which led to the view of highly dynamic TF-DNA interactions in activated or repressed promoters. Consequently, modeling and elucidating the combinatorial interaction of TFs and corresponding cis-regulatory modules in target promoters is of paramount interest. An estimated 5% of the genes in mammalian genomes code for TF proteins, and computational modeling of cis-regulatory logic would rapidly increase the pace of experimental confirmation of TF target promoters at the bench. The purpose of this chapter is to discuss the use of different bioinformatics tools for predicting the target genes of TFs of interest in mammalian genomes, and the application of these methods in the analysis of ChIP-chip experimental data. The author describes the most commonly used databases and prediction programs that are available on the World Wide Web and demonstrates the use of some of these programs by an example. A list of these programs is provided along with their web Uniform Resource Locators (URLs), and guidelines for successful application are suggested.
Affiliation(s)
- Ramana V Davuluri
- OSU Comprehensive Cancer Center, Ohio State University, Columbus, USA
|
25
|
Jin VX, Rabinovich A, Squazzo SL, Green R, Farnham PJ. A computational genomics approach to identify cis-regulatory modules from chromatin immunoprecipitation microarray data--a case study using E2F1. Genome Res 2006; 16:1585-95. [PMID: 17053090 PMCID: PMC1665642 DOI: 10.1101/gr.5520206] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Indexed: 12/22/2022]
Abstract
Advances in high-throughput technologies, such as ChIP-chip, and the completion of human and mouse genomic sequences now allow analysis of the mechanisms of gene regulation on a systems level. In this study, we have developed a computational genomics approach (termed ChIPModules), which begins with experimentally determined binding sites and integrates positional weight matrices constructed from transcription factor binding sites, a comparative genomics approach, and statistical learning methods to identify transcriptional regulatory modules. We began with E2F1 binding site information obtained from ChIP-chip analyses of ENCODE regions, from both HeLa and MCF7 cells. Our approach not only distinguished targets from nontargets with a high specificity, but it also identified five regulatory modules for E2F1. One of the identified modules predicted a colocalization of E2F1 and AP-2alpha on a set of target promoters with an intersite distance of <270 bp. We tested this prediction using ChIP-chip assays with arrays containing approximately 14,000 human promoters. We found that both E2F1 and AP-2alpha bind within the predicted distance to a large number of human promoters, demonstrating the strength of our sequence-based, unbiased, and universal protocol. Finally, we have used our ChIPModules approach to develop a database that includes thousands of computationally identified and/or experimentally verified E2F1 target promoters.
Affiliation(s)
- Victor X. Jin
- Department of Pharmacology and the Genome Center, University of California–Davis, Davis, California 95616, USA
- Alina Rabinovich
- Department of Pharmacology and the Genome Center, University of California–Davis, Davis, California 95616, USA
- Sharon L. Squazzo
- Department of Pharmacology and the Genome Center, University of California–Davis, Davis, California 95616, USA
- Roland Green
- NimbleGen Systems Inc., Madison, Wisconsin 53711, USA
- Peggy J. Farnham
- Department of Pharmacology and the Genome Center, University of California–Davis, Davis, California 95616, USA
- Corresponding author. E-mail ; fax (530) 754-9658
|
26
|
Tummala R, Sinha S. Differentiation-specific transcriptional regulation of the ESE-2 gene by a novel keratinocyte-restricted factor. J Cell Biochem 2006; 97:766-81. [PMID: 16229011 DOI: 10.1002/jcb.20685] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 12/11/2022]
Abstract
Epithelium specific Ets-2 (ESE-2), an epithelium-specific ETS-domain transcription factor, is highly expressed in differentiated keratinocytes. To understand the molecular mechanisms that govern the cell-type and differentiation-specific expression of ESE-2 in keratinocytes, we have focused our studies on the identification and characterization of its cis-regulatory elements. We first performed DNase I hypersensitive site mapping and demonstrated that the promoter region of ESE-2 is in open chromatin conformation in differentiated keratinocytes. Next, we performed transient transfection assays with several 5' serially deleted constructs containing segments of the ESE-2 promoter. These experiments have led to the identification of a short fragment that shows remarkable sequence conservation between several species and harbors most of the transcriptional activity. Interestingly, a high level of transcriptional activity was only observed when the transfected keratinocytes were induced to differentiate by increasing the calcium concentration in the cell-culture medium. To identify the factors that mediate the transcriptional activity, we analyzed this segment by mutational and electrophoretic mobility shift assay (EMSA) experiments. Our studies have identified a critical stretch of nucleotides that is important for both basal and calcium-responsive reporter activity and that binds to a nuclear factor, keratinocyte restricted factor (KRF). KRF is a novel transcription factor that is restricted to nuclear extracts isolated from keratinocytes and that binds to unique DNA sequences, which do not resemble any known consensus binding motif for transcription factors. Our preliminary experiments shed light on the biochemical nature of KRF and set the stage for future studies in identification of KRF and testing its role in governing ESE-2 gene expression in vivo.
Affiliation(s)
- Ramakumar Tummala
- Department of Biochemistry, State University of New York at Buffalo, Buffalo, New York 14214, USA
|
27
|
Podvinec M, Meyer UA. Prediction of cis-regulatory elements for drug-activated transcription factors in the regulation of drug-metabolising enzymes and drug transporters. Expert Opin Drug Metab Toxicol 2006; 2:367-79. [PMID: 16863440 DOI: 10.1517/17425255.2.3.367] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]
Abstract
The expression of drug-metabolising enzymes is affected by many endogenous and exogenous factors, including sex, age, diet and exposure to xenobiotics and drugs. To understand fully how the organism metabolises a drug, these alterations in gene expression must be taken into account. The central process, the definition of likely regulatory elements in the genes coding for enzymes and transporters involved in drug disposition, can be vastly accelerated using existing and emerging bioinformatics methods to unravel the regulatory networks causing drug-mediated induction of genes. Here, various approaches to predict transcription factor interactions with regulatory DNA elements are reviewed.
Affiliation(s)
- Michael Podvinec
- Swiss Institute of Bioinformatics and Biozentrum, University of Basel, Klingelbergstrasse 50-70, CH-4056 Basel, Switzerland.
|
28
|
Gershenzon NI, Trifonov EN, Ioshikhes IP. The features of Drosophila core promoters revealed by statistical analysis. BMC Genomics 2006; 7:161. [PMID: 16790048 PMCID: PMC1538597 DOI: 10.1186/1471-2164-7-161] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.1] [Received: 04/04/2006] [Accepted: 06/21/2006] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND: Experimental investigation of transcription is still a very labor- and time-consuming process. Only a few transcription initiation scenarios have been studied in detail. The mechanism of interaction between basal machinery and promoter, in particular core promoter elements, is not known for the majority of identified promoters. In this study, we reveal various transcription initiation mechanisms by statistical analysis of 3393 nonredundant Drosophila promoters. RESULTS: Using Drosophila-specific position-weight matrices, we identified promoters containing TATA box, Initiator, Downstream Promoter Element (DPE), and Motif Ten Element (MTE), as well as core elements discovered in Human (TFIIB Recognition Element (BRE) and Downstream Core Element (DCE)). Promoters utilizing known synergetic combinations of two core elements (TATA_Inr, Inr_MTE, Inr_DPE, and DPE_MTE) were identified. We also establish the existence of promoters with potentially novel synergetic combinations: TATA_DPE and TATA_MTE. Our analysis revealed several motifs with the features of promoter elements, including possible novel core promoter element(s). Comparison of Human and Drosophila showed consistent percentages of promoters with TATA, Inr, DPE, and synergetic combinations thereof, as well as most of the same functional and mutual positions of the core elements. No statistical evidence of MTE utilization in Human was found. Distinct nucleosome positioning in particular promoter classes was revealed. CONCLUSION: We present lists of promoters that potentially utilize the aforementioned elements/combinations. The number of these promoters is two orders of magnitude larger than the number of promoters in which transcription initiation was experimentally studied. The sequences are ready to be experimentally tested or used for further statistical analysis. The developed approach may be utilized for other species.
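Scanning promoters with position-weight matrices, as mentioned in this abstract, can be sketched in a few lines of Python. This is a generic log-odds scan under stated assumptions, not the authors' procedure: the counts in `toy_counts` are invented for illustration (they are not the Drosophila matrices from the paper), and the uniform background, the pseudocount, and the function names are likewise assumptions.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def log_odds_matrix(
    counts: List[Dict[str, int]], pseudocount: float = 1.0, background: float = 0.25
) -> List[Dict[str, float]]:
    """Turn per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + pseudocount * len(BASES)
        matrix.append(
            {
                b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
                for b in BASES
            }
        )
    return matrix


def best_hit(sequence: str, matrix: List[Dict[str, float]]) -> Tuple[int, float]:
    """Return (offset, score) of the highest-scoring window, or (-1, -inf) if none."""
    sequence = sequence.upper()
    width = len(matrix)
    best = (-1, float("-inf"))
    for i in range(len(sequence) - width + 1):
        window = sequence[i : i + width]
        if any(ch not in BASES for ch in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(matrix[j][ch] for j, ch in enumerate(window))
        if score > best[1]:
            best = (i, score)
    return best


if __name__ == "__main__":
    # Invented counts for a 4-bp TATA-like motif; purely illustrative.
    toy_counts = [
        {"T": 9, "A": 1},
        {"A": 9, "T": 1},
        {"T": 9, "A": 1},
        {"A": 9, "T": 1},
    ]
    print(best_hit("GGCTATAGG", log_odds_matrix(toy_counts)))
```

In practice a score threshold and a position constraint relative to the transcription start site would decide whether a promoter is counted as containing the element.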
Affiliation(s)
- Naum I Gershenzon
- Department of Biomedical Informatics, The Ohio State University, 333 West 10th Avenue, Columbus OH 43210, USA
- Department of Physics, Wright State University, Dayton OH 45435, USA
- Edward N Trifonov
- Genome Diversity Center, Institute of Evolution, University of Haifa, Haifa 31905, Israel
- Ilya P Ioshikhes
- Department of Biomedical Informatics, The Ohio State University, 333 West 10th Avenue, Columbus OH 43210, USA
|
29
|
Whitfield EJ, Pruess M, Apweiler R. Bioinformatics database infrastructure for biotechnology research. J Biotechnol 2006; 124:629-39. [PMID: 16757051 DOI: 10.1016/j.jbiotec.2006.04.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 10/14/2005] [Revised: 03/06/2006] [Accepted: 04/03/2006] [Indexed: 10/24/2022]
Abstract
Many databases are available that provide valuable data resources for the biotechnological researcher. According to their core data, they can be divided into different types. Some databases provide primary data, like all published nucleotide sequences, others deal with protein sequences. In addition to these two basic types of databases, a huge number of more specialized resources are available, like databases about protein structures, protein identification, special features of genes and/or proteins, or certain organisms. Furthermore, some resources offer integrated views on different types of data, allowing the user to do easy customized queries over large datasets and to compare different types of data.
Affiliation(s)
- Eleanor J Whitfield
- EMBL-EBI, Wellcome Trust Genome Campus, Hinxton Hall, Hinxton, Cambs CB10 1SD, UK.
|
30
|
Klimova NV, Levitsky VG, Ignatieva EV, Vasiliev GV, Kobzev VF, Busygina TV, Merkulova TI, Kolchanov NA. Potential binding sites for SF-1: Recognition by the SiteGA method, experimental verification, and search for new target genes. Mol Biol 2006. [DOI: 10.1134/s0026893306030125] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
|
31
|
Romano RA, Birkaya B, Sinha S. Defining the regulatory elements in the proximal promoter of DeltaNp63 in keratinocytes: Potential roles for Sp1/Sp3, NF-Y, and p63. J Invest Dermatol 2006; 126:1469-79. [PMID: 16645595 DOI: 10.1038/sj.jid.5700297] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.6] [Indexed: 11/09/2022]
Abstract
p63, a homolog of the tumor suppressor p53, plays an important role in the formation of stratified epithelia such as those in the epidermis of the skin. The p63 gene gives rise to multiple functionally distinct protein isoforms, including the DeltaNp63 class of isoforms, which lacks the N-terminal transactivation domain and is synthesized from an internal promoter. DeltaNp63 proteins are the predominant isoforms expressed in keratinocytes and are thought to be important for maintenance of the proliferative capacity of these cells. Here, we have examined the transcriptional control mechanisms that govern the expression of DeltaNp63 in keratinocytes. We first performed DNase I hypersensitive site mapping and demonstrated that the promoter region of DeltaNp63 is in an open chromatin state in keratinocytes. To identify the cis-elements that regulate DeltaNp63, we have performed transient transfection assays in keratinocytes with several DeltaNp63 promoter constructs. This identified a short evolutionarily conserved fragment that harbors most of the transcriptional activity of the DeltaNp63 promoter. Biochemical studies of this element have revealed critical roles for CCAAT-box-binding factor (CBF/NF-Y) and the Sp1/Sp3 family of proteins. In addition, our data suggest that DeltaNp63 is recruited to and can activate its own promoter, possibly through protein-protein interactions, thus providing an auto-regulatory loop. These studies support the notion that unique and distinct pathways control the expression of individual p53 family members and their various isoforms.
Affiliation(s)
- Rose-Anne Romano
- Department of Biochemistry, State University of New York at Buffalo, Buffalo, New York, USA
|
32
|
Le Texier V, Riethoven JJ, Kumanduri V, Gopalakrishnan C, Lopez F, Gautheret D, Thanaraj TA. AltTrans: transcript pattern variants annotated for both alternative splicing and alternative polyadenylation. BMC Bioinformatics 2006; 7:169. [PMID: 16556303 PMCID: PMC1435940 DOI: 10.1186/1471-2105-7-169] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 12/14/2005] [Accepted: 03/23/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The three major mechanisms that regulate transcript formation involve the selection of alternative sites for transcription start (TS), splicing, and polyadenylation. Currently there are efforts that collect data and annotation individually for each of these variants. It is important to take an integrated view of these data sets and to derive a data set of alternate transcripts along with consolidated annotation. We have previously developed computational pipelines that generate value-added data at genome scale on individual variant types; these include AltSplice on splicing and AltPAS on polyadenylation. We now extend these pipelines and integrate the resultant data sets to facilitate an integrated view of the contributions from splicing and polyadenylation in the formation of transcript variants. DESCRIPTION: The AltSplice pipeline examines gene-transcript alignments and delineates alternative splice events and splice patterns; this pipeline is extended as AltTrans to delineate isoform transcript patterns, for each of which both the introns/exons and the 'terminating' polyA site are delineated; EST/mRNA sequences that qualify the transcript pattern confirm both the underlying splicing and polyadenylation. The AltPAS pipeline examines gene-transcript alignments and delineates all potential polyA sites irrespective of underlying splicing patterns. Resultant polyA sites from both AltTrans and AltPAS are merged. The generated database reports data on alternative splicing, alternative polyadenylation and the resultant alternate transcript patterns; the basal data is annotated for various biological features. The data (named integrated AltTrans data) generated for both human and mouse is made available through the Alternate Transcript Diversity web site at http://www.ebi.ac.uk/atd/. CONCLUSION: The reported data set presents alternate transcript patterns that are annotated for both alternative splicing and alternative polyadenylation. Results based on current transcriptome data indicate that the contribution of alternative splicing is larger than that of alternative polyadenylation.
Affiliation(s)
- Vincent Le Texier
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Jean-Jack Riethoven
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- 18 Crispin Close, Haverhill, Suffolk, CB9 9PT, UK
- Vasudev Kumanduri
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Chellappa Gopalakrishnan
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Fabrice Lopez
- INSERM ERM206, Université de la Méditerranée, Luminy case 928 – 13 288 Marseille Cedex 09, France
- Daniel Gautheret
- INSERM ERM206, Université de la Méditerranée, Luminy case 928 – 13 288 Marseille Cedex 09, France
- Thangavel Alphonse Thanaraj
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- 4 Copperfields, Saffron Walden, Essex, CB11 4FG, UK
|
33
|
Sun H, Palaniswamy SK, Pohar TT, Jin VX, Huang THM, Davuluri RV. MPromDb: an integrated resource for annotation and visualization of mammalian gene promoters and ChIP-chip experimental data. Nucleic Acids Res 2006; 34:D98-103. [PMID: 16381984 PMCID: PMC1347458 DOI: 10.1093/nar/gkj096] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Indexed: 01/27/2023] Open
Abstract
We have developed Mammalian Promoter Database (MPromDb), a novel database that integrates gene promoters with experimentally supported annotation of transcription start sites, cis-regulatory elements, CpG islands and chromatin immunoprecipitation microarray (ChIP-chip) experimental results with intuitively designed presentation. Release 1.0 of MPromDb currently contains 36 407 promoters and first exons (19 170 from human, 15 953 from mouse and 1284 from rat), 3739 transcription factor (TF)-binding sites (2027 from human, 1181 mouse and 531 rat) and 224 TFs with links to PubMed and GenBank references. Target promoters of TFs that have been identified by ChIP-chip assay are integrated into the database. MPromDb serves as a portal for genome-wide promoter analysis of data generated by ChIP-chip experimental studies. MPromDb can be accessed from .
Affiliation(s)
- Ramana V. Davuluri
- To whom correspondence should be addressed. Tel: +1 614 688 3088; Fax: +1 614 688 4006;
34
Schmid CD, Perier R, Praz V, Bucher P. EPD in its twentieth year: towards complete promoter coverage of selected model organisms. Nucleic Acids Res 2006; 34:D82-5. [PMID: 16381980 PMCID: PMC1347508 DOI: 10.1093/nar/gkj146] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.2] [Indexed: 11/13/2022] Open
Abstract
The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters, experimentally defined by a transcription start site (TSS). Access to promoter sequences is provided by pointers to positions in the corresponding genomes. Promoter evidence comes from conventional TSS mapping experiments for individual genes, or, starting from release 73, from mass genome annotation projects. Subsets of promoter sequences with customized 5′ and 3′ extensions can be downloaded from the EPD website. The focus of current development efforts is to reach complete promoter coverage for important model organisms as soon as possible. To speed up this process, a new class of preliminary promoter entries has been introduced as of release 83, which requires less stringent admission criteria. As part of a continuous integration process, new web-based interfaces have been developed, which allow joint analysis of promoter sequences with other bioinformatics resources developed by our group, in particular programs offered by the Signal Search Analysis Server, and gene expression data stored in the CleanEx database. EPD can be accessed at .
Affiliation(s)
- Christoph D. Schmid
- Swiss Institute of Bioinformatics, Chemin des Boveresses 155, CH-1066 Epalinges, Switzerland
- Rouaïda Perier
- Swiss Institute of Bioinformatics, Chemin des Boveresses 155, CH-1066 Epalinges, Switzerland
- Viviane Praz
- Swiss Institute of Bioinformatics, Chemin des Boveresses 155, CH-1066 Epalinges, Switzerland
- Philipp Bucher
- Swiss Institute of Bioinformatics, Chemin des Boveresses 155, CH-1066 Epalinges, Switzerland
- Swiss Institute for Experimental Cancer Research, Chemin des Boveresses 155, CH-1066 Epalinges, Switzerland
- To whom correspondence should be addressed. Tel: +41 21 6925892 (ext. 58); Fax: +41 21 652 5945;
35
Kawaji H, Kasukawa T, Fukuda S, Katayama S, Kai C, Kawai J, Carninci P, Hayashizaki Y. CAGE Basic/Analysis Databases: the CAGE resource for comprehensive promoter analysis. Nucleic Acids Res 2006; 34:D632-6. [PMID: 16381948 PMCID: PMC1347397 DOI: 10.1093/nar/gkj034] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.7] [Indexed: 11/13/2022] Open
Abstract
The Cap-analysis gene expression (CAGE) Basic and Analysis Databases store an original resource produced by CAGE, which measures expression levels of transcription starting sites by sequencing large amounts of transcript 5′ ends, termed CAGE tags. Millions of human and mouse high-quality CAGE tags derived from different conditions in >20 tissues consisting of >250 RNA samples are essential for identification of novel promoters and for promoter characterization in terms of expression profiles. The CAGE Basic Database is the primary database of the CAGE resource, covering RNA samples, CAGE libraries, CAGE clone and tag sequences, and so on. The CAGE Analysis Database stores promoter-related information, such as counts of related transcripts, CpG islands and conserved genomic regions. It also provides expression profiles at base pair and promoter levels. Both databases are based on the same framework: CAGE tag starting sites, tag clusters for defining promoters, and transcriptional units (TUs). Their associations and TU attributes are available to find promoters of interest. These databases were provided for Functional Annotation Of Mouse 3 (FANTOM3), an international collaboration research project focusing on expanding the transcriptome and subsequent analyses. Now access is free for all users through the World Wide Web at .
Affiliation(s)
- Hideya Kawaji
- NTT Software Corporation, Teisan Kannai Building 209, Yamashita-cho Naka-ku, Yokohama, Kanagawa, 231-8551, Japan
- Takeya Kasukawa
- NTT Software Corporation, Teisan Kannai Building 209, Yamashita-cho Naka-ku, Yokohama, Kanagawa, 231-8551, Japan
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Shiro Fukuda
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Shintaro Katayama
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- To whom correspondence should be addressed. Tel: +81 45 503 9222; Fax: +81 45 503 9216;
- Chikatoshi Kai
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Jun Kawai
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Piero Carninci
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Yoshihide Hayashizaki
- Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
- Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
36
Faiger H, Ivanchenko M, Cohen I, Haran TE. TBP flanking sequences: asymmetry of binding, long-range effects and consensus sequences. Nucleic Acids Res 2006; 34:104-19. [PMID: 16407329 PMCID: PMC1326239 DOI: 10.1093/nar/gkj414] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Indexed: 01/10/2023] Open
Abstract
We carried out in vitro selection experiments to systematically probe the effects of TATA-box flanking sequences on its interaction with the TATA-box binding protein (TBP). This study validates our previous hypothesis that the effect of the flanking sequences on TBP/TATA-box interactions is much more significant when the TATA box has a context-dependent DNA structure. Several interesting observations, with implications for protein-DNA interactions in general, came out of this study. (i) Selected sequences are selection-method specific and TATA-box dependent. (ii) The variability in binding stability as a function of the flanking sequences for (T-A)4 boxes is as large as the variability in binding stability as a function of the core TATA box itself. Thus, for (T-A)4 boxes the flanking sequences completely dominate and determine the binding interaction. (iii) The binding stabilities of all but one of the individual selected sequences of the (T-A)4 form are significantly higher than that of their mononucleotide-based consensus sequence. (iv) Even though the (T-A)4 sequence is symmetric, the flanking sequence pattern is asymmetric. We propose that the plasticity of (T-A)n sequences increases the number of conformationally distinct TATA boxes without the need to extend the TBP contact region beyond the eight-base-pair long TATA box.
Affiliation(s)
- Tali E. Haran
- To whom correspondence should be addressed. Tel: +972 4 8293767; Fax: +972 4 8225153;
37
Montgomery SB, Griffith OL, Sleumer MC, Bergman CM, Bilenky M, Pleasance ED, Prychyna Y, Zhang X, Jones SJM. ORegAnno: an open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation. Bioinformatics 2006; 22:637-40. [PMID: 16397004 DOI: 10.1093/bioinformatics/btk027] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.3] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Our understanding of gene regulation is currently limited by our ability to collectively synthesize and catalogue transcriptional regulatory elements stored in scientific literature. Over the past decade, this task has become increasingly challenging as the accrual of biologically validated regulatory sequences has accelerated. To meet this challenge, novel community-based approaches to regulatory element annotation are required. SUMMARY Here, we present the Open Regulatory Annotation (ORegAnno) database as a dynamic collection of literature-curated regulatory regions, transcription factor binding sites and regulatory mutations (polymorphisms and haplotypes). ORegAnno has been designed to manage the submission, indexing and validation of new annotations from users worldwide. Submissions to ORegAnno are immediately cross-referenced to EnsEMBL, dbSNP, Entrez Gene, the NCBI Taxonomy database and PubMed, where appropriate. AVAILABILITY ORegAnno is available directly through MySQL, Web services, and online at http://www.oreganno.org. All software is licensed under the Lesser GNU Public License (LGPL).
Affiliation(s)
- S B Montgomery
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada V5Z 4E6
38
Roepcke S, Zhi D, Vingron M, Arndt PF. Identification of highly specific localized sequence motifs in human ribosomal protein gene promoters. Gene 2006; 365:48-56. [PMID: 16343812 DOI: 10.1016/j.gene.2005.09.033] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Received: 03/24/2005] [Revised: 07/22/2005] [Accepted: 09/27/2005] [Indexed: 11/28/2022]
Abstract
For ribosomal protein (RP) genes the start of transcription is rigidly controlled to maintain the 5'-TOP signal on the messenger RNA. The responsible regulatory mechanism is not yet fully understood. Careful comparative analysis of their proximal promoter sequences reveals common characteristics and thus provides clues to the underlying mechanism. We have extracted the proximal promoters of the 80 human cytosolic ribosomal protein genes together with the orthologous mouse sequences. After annotating the set with transcription factor binding sites based on the available literature, we searched for over-represented sequence motifs. We uncovered a novel motif that is localized at a fixed distance downstream of the transcription start. 31 out of the 80 promoters contain the motif in the same orientation around position +62 (standard deviation 6). A second evolutionarily conserved and palindromic motif is found 13 times in the RP promoter set, 9 instances of which are located upstream around position -40. In addition, we see a characteristic profile of the GC-content and of the CpG dinucleotide frequencies. Our results support a model for the transcription of ribosomal protein genes in which the maintenance of the accurate start of transcription is provided by specific transcription factors. Such a factor binds the target DNA at a fixed location relative to the TSS, and possibly interacts directly with the basal transcription machinery.
Affiliation(s)
- Stefan Roepcke
- Max Planck Institute for Molecular Genetics, Ihnestr. 73, 14195 Berlin, Germany.
39
Kimura K, Wakamatsu A, Suzuki Y, Ota T, Nishikawa T, Yamashita R, Yamamoto JI, Sekine M, Tsuritani K, Wakaguri H, Ishii S, Sugiyama T, Saito K, Isono Y, Irie R, Kushida N, Yoneyama T, Otsuka R, Kanda K, Yokoi T, Kondo H, Wagatsuma M, Murakawa K, Ishida S, Ishibashi T, Takahashi-Fujii A, Tanase T, Nagai K, Kikuchi H, Nakai K, Isogai T, Sugano S. Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes. Genome Res 2006; 16:55-65. [PMID: 16344560 PMCID: PMC1356129 DOI: 10.1101/gr.4039406] [Citation(s) in RCA: 371] [Impact Index Per Article: 20.6] [Received: 04/15/2005] [Accepted: 09/19/2005] [Indexed: 12/21/2022]
Abstract
By analyzing 1,780,295 5'-end sequences of human full-length cDNAs derived from 164 kinds of oligo-cap cDNA libraries, we identified 269,774 independent positions of transcriptional start sites (TSSs) for 14,628 human RefSeq genes. These TSSs were clustered into 30,964 clusters that were separated from each other by more than 500 bp and thus are very likely to constitute mutually distinct alternative promoters. To our surprise, at least 7674 (52%) human RefSeq genes were subject to regulation by putative alternative promoters (PAPs). On average, there were 3.1 PAPs per gene, with the composition of one CpG-island-containing promoter per 2.6 CpG-less promoters. In 17% of the PAP-containing loci, tissue-specific use of the PAPs was observed. The richest tissue sources of the tissue-specific PAPs were testis and brain. It was also intriguing that the PAP-containing promoters were enriched in the genes encoding signal transduction-related proteins and were rarer in the genes encoding extracellular proteins, possibly reflecting the varied functional requirement for and the restricted expression of those categories of genes, respectively. The patterns of the first exons were highly diverse as well. On average, there were 7.7 different splicing types of first exons per locus partly produced by the PAPs, suggesting that a wide variety of transcripts can be achieved by this mechanism. Our findings suggest that use of alternate promoters and consequent alternative use of first exons should play a pivotal role in generating the complexity required for the highly elaborated molecular systems in humans.
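The clustering criterion mentioned in this abstract (TSS clusters separated from each other by more than 500 bp) can be illustrated with a minimal Python sketch. The abstract does not describe the authors' actual procedure, so the single-pass gap rule below is an assumption, and the function name `cluster_tss`, the parameter `min_gap` and the example coordinates are hypothetical.

```python
from typing import List


def cluster_tss(positions: List[int], min_gap: int = 500) -> List[List[int]]:
    """Group TSS positions so that neighbouring clusters are more than min_gap bp apart.

    Assumption: a simple gap rule on sorted coordinates; not the published method.
    """
    clusters: List[List[int]] = []
    for pos in sorted(set(positions)):
        # Extend the current cluster while the gap to its last TSS is at most min_gap.
        if clusters and pos - clusters[-1][-1] <= min_gap:
            clusters[-1].append(pos)
        else:
            # Gap exceeds min_gap (or this is the first TSS): start a new cluster.
            clusters.append([pos])
    return clusters


# Hypothetical TSS coordinates for one gene on one strand (illustrative only).
example = [100, 130, 620, 1500, 1510, 2100]
clusters = cluster_tss(example)
print(len(clusters), clusters)
```

Under this rule each resulting cluster would correspond to one putative alternative promoter; real data would also need to be handled per gene and per strand.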
Affiliation(s)
- Kouichi Kimura
- Life Science Research Laboratory, Central Research Laboratory, Hitachi, Ltd., Kokubunji, Tokyo, 185-8601, Japan
40
Abstract
An unsupervised clustering of 4541 DNA sequences containing active promoter regions from vertebrate and arthropod classes (including their viral genes) was performed. All necessary information was solely gathered a priori from the DNA sequences by measuring frequencies of tri-nucleotides and tetra-nucleotides. We employed Super Paramagnetic Clustering, a novel clustering algorithm based on physical properties of an inhomogeneous granular ferromagnet. This method utilizes Swendsen-Wang cluster Monte Carlo simulations to distinguish clusters by measuring pairs of correlation functions from different resolutions. We identified two strongly separated clusters of human viral genes corresponding to the Epstein-Barr virus and the Herpes Simplex virus type 1. In addition, vertebrate and arthropod sequences were successfully separated into two different classes with merely 9.25% of arthropod sequences being misclassified. From a functional perspective, these sequences have high gene function correlations with sequences from the vertebrate cluster. By tuning a clustering parameter, Super Paramagnetic Clustering was able to classify the vertebrate class further into two major clusters, in which a large number of housekeeping genes and tissue-specific genes were found, respectively. The indications came from observation of gene expression function and consensus transcription factors which were found grouped together in specific positions of the DNA sequences.
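The feature extraction step named in this abstract (frequencies of tri-nucleotides and tetra-nucleotides) can be sketched in Python as follows. This is only an illustration: the abstract does not say whether overlapping windows, strand symmetry or a particular normalisation were used, so those choices, along with the function name `kmer_frequencies`, are assumptions. The clustering step itself (Super Paramagnetic Clustering) is not shown.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_frequencies(seq: str, k: int) -> Dict[str, float]:
    """Relative frequency of every DNA k-mer, counted over overlapping windows.

    Assumption: overlapping windows on one strand, normalised to sum to 1.
    """
    seq = seq.upper()
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Skip windows containing ambiguous bases such as N.
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    return {
        "".join(kmer): (counts["".join(kmer)] / total if total else 0.0)
        for kmer in product("ACGT", repeat=k)
    }


# Toy sequence (illustrative only): 64 tri-nucleotide and 256 tetra-nucleotide features.
tri = kmer_frequencies("ACGTACGTAC", 3)
tetra = kmer_frequencies("ACGTACGTAC", 4)
features = list(tri.values()) + list(tetra.values())
print(len(features))
```

Concatenating the two dictionaries gives a fixed-length vector per sequence, which is the kind of input an unsupervised clustering method can work with.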
Affiliation(s)
- Sugiarto Radjiman
- Department of Computational Science, National University of Singapore, 117543 Singapore, Republic of Singapore.
41
Narang V, Sung WK, Mittal A. Computational modeling of oligonucleotide positional densities for human promoter prediction. Artif Intell Med 2005; 35:107-19. [PMID: 16076553 DOI: 10.1016/j.artmed.2005.02.005] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Received: 10/30/2004] [Revised: 01/31/2005] [Accepted: 02/22/2005] [Indexed: 11/18/2022]
Abstract
OBJECTIVE The gene promoter region controls transcriptional initiation of a gene, which is the most important step in gene regulation. In-silico detection of promoter regions in genomic sequences has a number of applications in gene discovery and understanding gene expression regulation. However, computational prediction of eukaryotic pol-II promoters has remained a difficult task. This paper introduces a novel statistical technique for detecting promoter regions in long genomic sequences. METHOD A number of existing techniques analyze the occurrence frequencies of oligonucleotides in promoter sequences as compared to other genomic regions. In contrast, the present work studies the positional densities of oligonucleotides in promoter sequences. The analysis does not require any non-promoter sequence dataset or any model of the background oligonucleotide content of the genome. The statistical model learnt from a dataset of promoter sequences automatically recognizes a number of transcription factor binding sites simultaneously with their occurrence positions relative to the transcription start site. Based on this model, a continuous naïve Bayes classifier is developed for the detection of human promoters and transcription start sites in genomic sequences. RESULTS The present study extends the scope of statistical models in general promoter modeling and prediction. Promoter sequence features learnt by the model correlate well with known biological facts. Results of human transcription start site prediction compare favorably with existing 2nd generation promoter prediction tools.
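The idea of a positional density, as opposed to an overall occurrence frequency, can be illustrated with a small Python sketch. It only counts where an oligonucleotide starts across TSS-aligned promoters; the density model and the continuous naïve Bayes classifier described in the abstract are not reproduced here. The function name `positional_density` and the toy sequences are hypothetical.

```python
from typing import List


def positional_density(promoters: List[str], oligo: str) -> List[float]:
    """Fraction of aligned promoters in which `oligo` starts at each position.

    Assumption: all sequences are aligned so that index i is the same offset
    from the transcription start site in every promoter.
    """
    if not promoters:
        return []
    length = len(promoters[0])
    if any(len(p) != length for p in promoters):
        raise ValueError("promoters must be aligned to the same length")
    n_pos = length - len(oligo) + 1
    density = [0.0] * max(n_pos, 0)
    for seq in promoters:
        for i in range(n_pos):
            # str.startswith with a start index checks for a match at position i.
            if seq.startswith(oligo, i):
                density[i] += 1.0 / len(promoters)
    return density


# Toy aligned sequences (illustrative only).
toy = ["GGTATAAAGG", "CCTATAAACC", "GGGGTATAAA", "ACGTACGTAC"]
density = positional_density(toy, "TATAAA")
print(density)
```

A peak in such a profile marks a preferred position for the oligonucleotide relative to the TSS, which is the kind of signal a position-aware promoter model can exploit without a non-promoter background set.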
Affiliation(s)
- Vipin Narang
- Department of Computer Science, S16 #06-02, 3 Science Drive 2, National University of Singapore, Singapore 117543, Singapore.
42
Lee DH, Gershenzon N, Gupta M, Ioshikhes IP, Reinberg D, Lewis BA. Functional characterization of core promoter elements: the downstream core element is recognized by TAF1. Mol Cell Biol 2005; 25:9674-86. [PMID: 16227614 PMCID: PMC1265815 DOI: 10.1128/mcb.25.21.9674-9686.2005] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.6] [Indexed: 11/20/2022] Open
Abstract
Downstream elements are a newly appreciated class of core promoter elements of RNA polymerase II-transcribed genes. The downstream core element (DCE) was discovered in the human beta-globin promoter, and its sequence composition is distinct from that of the downstream promoter element (DPE). We show here that the DCE is a bona fide core promoter element present in a large number of promoters and with high incidence in promoters containing a TATA motif. Database analysis indicates that the DCE is found in diverse promoters, supporting its functional relevance in a variety of promoter contexts. The DCE consists of three subelements, and DCE function is recapitulated in a TFIID-dependent manner. Subelement 3 can function independently of the other two and shows a TFIID requirement as well. UV photo-cross-linking results demonstrate that TAF1/TAF(II)250 interacts with the DCE subelement DNA in a sequence-dependent manner. These data show that downstream elements consist of at least two types, those of the DPE class and those of the DCE class; they function via different DNA sequences and interact with different transcription activation factors. Finally, these data argue that TFIID is, in fact, a core promoter recognition complex.
Affiliation(s)
- Dong-Hoon Lee
- Department of Biochemistry, Robert Wood Johnson Medical School, 683 Hoes Lane, Piscataway, NJ 08854, USA
43
Abstract
Currently, more than 10 million DNA sequence variations have been uncovered in the human genome. The most detailed variation discovery efforts have focused on candidate genes involved in cardiovascular disease or in susceptibilities associated with exposure to environmental agents. Here we provide an overview of natural genetic variation from the literature and in 510 human candidate genes resequenced for variation discovery. The average human gene contains 126 biallelic polymorphisms, 46 of which are common (≥5% minor allele frequency) and 5 of which are found in coding regions. Using this complete picture of genetic diversity, we explore conservation, signatures of selection, and historical recombination to mine information useful for candidate gene association studies. In general, we find that the patterns of human gene variation suggest that no one approach will be appropriate for genetic association studies across all genes. Therefore, many different approaches may be required to identify the elusive genotypes associated with common human phenotypes.
Affiliation(s)
- Dana C Crawford
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
44
Suhre K, Audic S, Claverie JM. Mimivirus gene promoters exhibit an unprecedented conservation among all eukaryotes. Proc Natl Acad Sci U S A 2005; 102:14689-93. [PMID: 16203998 PMCID: PMC1239944 DOI: 10.1073/pnas.0506465102] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.3] [Received: 05/12/2005] [Indexed: 11/18/2022] Open
Abstract
The initial analysis of the recently sequenced genome of Acanthamoeba polyphaga Mimivirus, the largest known double-stranded DNA virus, predicted a proteome of size and complexity more akin to small parasitic bacteria than to other nucleocytoplasmic large DNA viruses and identified numerous functions never before described in a virus. It has been proposed that the Mimivirus lineage could have emerged before the individualization of cellular organisms from the three domains of life. An exhaustive in silico analysis of the noncoding moiety of all known viral genomes now uncovers the unprecedented perfect conservation of an AAAATTGA motif in close to 50% of the Mimivirus genes. This motif preferentially occurs in genes transcribed from the predicted leading strand and is associated with functions required early in the viral infectious cycle, such as transcription and protein translation. A comparison with the known promoter of unicellular eukaryotes, amoebal protists in particular, strongly suggests that the AAAATTGA motif is the structural equivalent of the TATA box core promoter element. This element is specific to the Mimivirus lineage and may correspond to an ancestral promoter structure predating the radiation of the eukaryotic kingdoms. This unprecedented conservation of core promoter regions is another exceptional feature of Mimivirus that again raises the question of its evolutionary origin.
Affiliation(s)
- Karsten Suhre
- Information Génomique et Structurale, Centre National de la Recherche Scientifique, Institut de Biologie Structurale et Microbiologie, 13402 Marseille, France.
45
Fukue Y, Sumida N, Tanase JI, Ohyama T. A highly distinctive mechanical property found in the majority of human promoters and its transcriptional relevance. Nucleic Acids Res 2005; 33:3821-7. [PMID: 16027106 PMCID: PMC1175459 DOI: 10.1093/nar/gki700] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Indexed: 12/23/2022] Open
Abstract
A recent study revealed that TATA boxes and initiator sequences have a common anomalous mechanical property, i.e. they comprise distinctive flexible and rigid sequences when compared with the other parts of the promoter region. In the present study, using the flexibility parameters from two different models, we calculated the average flexibility profiles of 1004 human promoters that do not contain canonical promoter elements, such as a TATA box, initiator (Inr) sequence, downstream promoter element or a GC box, and those of 382 human promoters that contain the GC box only. Here, we show that they have a common characteristic mechanical property that is strikingly similar to those of the TATA box-containing or Inr-containing promoters. Their most interesting feature is that the TATA- or Inr-corresponding region lies in the several nucleotides around the transcription start site. We have also found that a dinucleotide step from −1 to +1 (transcription start site) has a slight tendency to adopt CA that is known to be flexible. We also demonstrate that certain synthetic DNA fragments designed to mimic the average mechanical property of these 1386 promoters can drive transcription. This distinctive mechanical property may be the hallmark of a promoter.
Affiliation(s)
- Takashi Ohyama
- To whom correspondence should be addressed. Tel: +81 78 435 2547; Fax: +81 78 435 2539;
46
Vishnevsky OV, Kolchanov NA. ARGO: a web system for the detection of degenerate motifs and large-scale recognition of eukaryotic promoters. Nucleic Acids Res 2005; 33:W417-22. [PMID: 15980502 PMCID: PMC1160220 DOI: 10.1093/nar/gki459] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 02/14/2005] [Revised: 04/13/2005] [Accepted: 04/13/2005] [Indexed: 11/13/2022] Open
Abstract
Reliable recognition of the promoters in eukaryotic genomes remains an open issue. This is largely owing to the poor understanding of the features of the structural-functional organization of the eukaryotic promoters essential for their function and recognition. However, it was demonstrated that detection of ensembles of regulatory signals characteristic of specific promoter groups increases the accuracy of promoter recognition and prediction of specific expression features of the queried genes. The ARGO_Motifs package was developed for the detection of sets of region-specific degenerate oligonucleotide motifs in the regulatory regions of the eukaryotic genes. The ARGO_Viewer package was developed for the recognition of tissue-specific gene promoters based on the presence and distribution of oligonucleotide motifs obtained by the ARGO_Motifs program. Analysis and recognition of tissue-specific promoters in five gene samples demonstrated high quality of promoter recognition. The public version of the ARGO system is available at http://wwwmgs2.bionet.nsc.ru/argo/ and http://emj-pc.ics.uci.edu/argo/.
Affiliation(s)
- Oleg V Vishnevsky
- Institute of Cytology and Genetics, SB RAS Lavrentyev Avenue, 10, Novosibirsk, 630090, Russia.
47
Qian J, Esumi N, Chen Y, Wang Q, Chowers I, Zack DJ. Identification of regulatory targets of tissue-specific transcription factors: application to retina-specific gene regulation. Nucleic Acids Res 2005; 33:3479-91. [PMID: 15967807 PMCID: PMC1153713 DOI: 10.1093/nar/gki658] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.6] [Received: 03/07/2005] [Revised: 04/28/2005] [Accepted: 05/26/2005] [Indexed: 01/22/2023] Open
Abstract
Identification of tissue-specific gene regulatory networks can yield insights into the molecular basis of a tissue's development, function and pathology. Here, we present a computational approach designed to identify potential regulatory target genes of photoreceptor cell-specific transcription factors (TFs). The approach is based on the hypothesis that genes related to the retina in terms of expression, disease and/or function are more likely to be the targets of retina-specific TFs than other genes. A list of genes that are preferentially expressed in retina was obtained by integrating expressed sequence tag, SAGE and microarray datasets. The regulatory targets of retina-specific TFs are enriched in this set of retina-related genes. A Bayesian approach was employed to integrate information about binding site location relative to a gene's transcription start site. Our method was applied to three retina-specific TFs, CRX, NRL and NR2E3, and a number of potential targets were predicted. To experimentally assess the validity of the bioinformatic predictions, mobility shift, transient transfection and chromatin immunoprecipitation assays were performed with five predicted CRX targets, and the results were suggestive of CRX regulation in 5/5, 3/5 and 4/5 cases, respectively. Together, these experiments strongly suggest that RP1, GUCY2D and ABCA4 are novel targets of CRX.
Affiliation(s)
- Jiang Qian
- Wilmer Institute, Johns Hopkins University School of Medicine Baltimore, MD 21287, USA.
48
Kanhere A, Bansal M. Structural properties of promoters: similarities and differences between prokaryotes and eukaryotes. Nucleic Acids Res 2005; 33:3165-75. [PMID: 15939933 PMCID: PMC1143579 DOI: 10.1093/nar/gki627] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.6] [Indexed: 12/27/2022] Open
Abstract
During the process of transcription, RNA polymerase can precisely locate a promoter sequence in the complex maze of a genome. Several experimental studies and computational analyses have shown that promoter sequences apparently possess some special properties, such as unusual DNA structures and low stability, which make them distinct from the rest of the genome. But most of these studies have been carried out on a particular set of promoter sequences or on promoter sequences from similar organisms. To examine whether the promoters from a wide variety of organisms share these special properties, we have carried out an analysis of sets of promoters from bacteria, vertebrates and plants. These promoters were analyzed with respect to three predicted properties relevant to transcription, namely DNA curvature, bendability and stability. All the promoter sequences are predicted to share certain features, such as stability and bendability profiles, but there are significant differences in DNA curvature profiles and nucleotide composition between the different organisms. These similarities and differences are correlated with some of the known facts about the transcription process in the promoters from the three groups of organisms.
Affiliation(s)
- Manju Bansal
- To whom correspondence should be addressed. Tel: +91 80 2293 2534; Fax: +91 80 2360 0535;
|
49
|
Kersey P, Bower L, Morris L, Horne A, Petryszak R, Kanz C, Kanapin A, Das U, Michoud K, Phan I, Gattiker A, Kulikova T, Faruque N, Duggan K, Mclaren P, Reimholz B, Duret L, Penel S, Reuter I, Apweiler R. Integr8 and Genome Reviews: integrated views of complete genomes and proteomes. Nucleic Acids Res 2005; 33:D297-302. [PMID: 15608201 PMCID: PMC539993 DOI: 10.1093/nar/gki039] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.0] [Indexed: 11/12/2022]
Abstract
Integr8 is a new web portal for exploring the biology of organisms with completely deciphered genomes. For over 190 species, Integr8 provides access to general information, recent publications, and a detailed statistical overview of the genome and proteome of the organism. The preparation of this analysis is supported through Genome Reviews, a new database of bacterial and archaeal DNA sequences in which annotation has been upgraded (compared to the original submission) through the integration of data from many sources, including the EMBL Nucleotide Sequence Database, the UniProt Knowledgebase, InterPro, CluSTr, GOA and HOGENOM. Integr8 also allows users to customize their own interactive analysis, and to download both customized and prepared datasets for their own use. Integr8 is available at http://www.ebi.ac.uk/integr8.
Affiliation(s)
- Paul Kersey
- The EMBL Outstation-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
50
|
Barta E, Sebestyén E, Pálfy TB, Tóth G, Ortutay CP, Patthy L. DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants. Nucleic Acids Res 2005; 33:D86-90. [PMID: 15608291 PMCID: PMC540051 DOI: 10.1093/nar/gki097] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
DoOP (http://doop.abc.hu/) is a database of eukaryotic promoter sequences (upstream regions) aiming to facilitate the recognition of regulatory sites conserved between species. The annotated first exons of human and Arabidopsis thaliana genes were used as queries in BLAST searches to collect the most closely related orthologous first exon sequences from Chordata and Viridiplantae species. DNA segments of up to 3000 bp upstream from these first exons constitute the clusters in the chordate and plant sections of the Database of Orthologous Promoters. Release 1.0 of DoOP contains 21 061 chordate clusters from 284 different species and 7548 plant clusters from 269 different species. The database can be used to find and retrieve promoter sequences of a given gene from various species, and it is also suitable for viewing the most trivial conserved sequence blocks in the orthologous upstream regions. Users can search DoOP with either sequence or text (annotation) to find promoter clusters of various genes. In addition to the sequence data, the positions of the conserved sequence blocks derived from multiple alignments, the positions of repetitive elements and the positions of transcription start sites known from the Eukaryotic Promoter Database (EPD) can be viewed graphically.
Affiliation(s)
- Endre Barta
- Agricultural Biotechnology Center, Gödöllő, Szent-Györgyi Albert u. 4, H-2100, Hungary.
|