151
|
Computational prediction of transcription factor binding sites based on an integrative approach incorporating genomic and epigenomic features. Genes Genomics 2014. [DOI: 10.1007/s13258-013-0136-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
152
|
Li Y, Umbach DM, Li L. T-KDE: a method for genome-wide identification of constitutive protein binding sites from multiple ChIP-seq data sets. BMC Genomics 2014; 15:27. [PMID: 24428924 PMCID: PMC3903014 DOI: 10.1186/1471-2164-15-27] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/26/2013] [Accepted: 01/13/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A protein may bind to its target DNA sites constitutively, i.e., regardless of cell type. Intuitively, constitutive binding sites should be biologically functional. A prerequisite for understanding their functional relevance is knowing all their locations for a protein of interest. Genome-wide discovery of constitutive binding sites requires robust and efficient computational methods to integrate results from numerous binding experiments. Such methods are lacking, however. RESULTS To locate constitutive binding sites for a protein using ChIP-seq data for that protein from multiple cell lines, we developed a method, T-KDE, which combines a binary range tree with a kernel density estimator. Using 132 CTCF (CCCTC-binding factor) ChIP-seq datasets, we showed that the number of constitutive sites identified by T-KDE is robust to the choice of tuning parameter and that T-KDE identifies binding site locations more accurately than a binning approach. Furthermore, T-KDE can identify constitutive sites that are missed by a motif-based approach either because a bound site failed to reach the motif significance cutoff or because the peak sequence scanned was too short. By studying sites declared constitutive by T-KDE but not by the motif-based approach, we discovered two new CTCF motif variants. Using ENCODE data on 22 transcription factors (TF) in 132 cell lines, we identified constitutive binding sites for each TF and provide evidence that, for some TFs, they may be biologically meaningful. CONCLUSIONS T-KDE is an efficient and effective method to predict constitutive protein binding sites using ChIP-seq peaks from multiple cell lines. Besides constitutive binding sites for a given protein, T-KDE can identify genomic "hot spots" where several different proteins bind and, conversely, cell-type-specific sites bound by a given protein.
Affiliation(s)
- Leping Li
  - Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, Morrisville, NC 27709, USA.
|
153
|
Lo Giacco D, Chianese C, Ars E, Ruiz-Castañé E, Forti G, Krausz C. Recurrent X chromosome-linked deletions: discovery of new genetic factors in male infertility. J Med Genet 2014; 51:340-4. [DOI: 10.1136/jmedgenet-2013-101988] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 01/07/2023]
|
154
|
Guo J, Li T, Schipper J, Nilson KA, Fordjour FK, Cooper JJ, Gordân R, Price DH. Sequence specificity incompletely defines the genome-wide occupancy of Myc. Genome Biol 2014; 15:482. [PMID: 25287278 PMCID: PMC4242493 DOI: 10.1186/s13059-014-0482-3] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 06/17/2014] [Accepted: 09/22/2014] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND The Myc-Max heterodimer is a transcription factor that regulates expression of a large number of genes. Genome occupancy of Myc-Max is thought to be driven by Enhancer box (E-box) DNA elements, CACGTG or variants, to which the heterodimer binds in vitro. RESULTS By analyzing ChIP-Seq datasets, we demonstrate that the positions occupied by Myc-Max across the human genome correlate with the RNA polymerase II (Pol II) transcription machinery significantly better than with E-boxes. Metagene analyses show that in promoter regions, Myc is uniformly positioned about 100 bp upstream of essentially all promoter proximal paused polymerases with Max about 15 bp upstream of Myc. We re-evaluate the DNA binding properties of full length Myc-Max proteins. Electrophoretic mobility shift assay results demonstrate Myc-Max heterodimers display significant sequence preference, but have high affinity for any DNA. Quantification of the relative affinities of Myc-Max for all possible 8-mers using universal protein-binding microarray assays shows that sequences surrounding core 6-mers significantly affect binding. Compared to the in vitro sequence preferences, Myc-Max genomic occupancy measured by ChIP-Seq is largely, although not completely, independent of sequence specificity. CONCLUSIONS We quantified the affinity of Myc-Max to all possible 8-mers and compared this with the sites of Myc binding across the human genome. Our results indicate that the genomic occupancy of Myc cannot be explained by its intrinsic DNA specificity and suggest that the transcription machinery and associated promoter accessibility play a predominant role in Myc recruitment.
Affiliation(s)
- Jiannan Guo
  - Department of Biochemistry, University of Iowa, Iowa City, IA 52242 USA
- Tiandao Li
  - Department of Biochemistry, University of Iowa, Iowa City, IA 52242 USA
  - The Genome Institute, Washington University in St. Louis, St. Louis, MO 63108 USA
- Joshua Schipper
  - Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708 USA
- Kyle A Nilson
  - Molecular and Cellular Biology Program, University of Iowa, Iowa City, IA 52242 USA
- Francis K Fordjour
  - Department of Biochemistry, University of Iowa, Iowa City, IA 52242 USA
- Jeffrey J Cooper
  - Department of Biochemistry, University of Iowa, Iowa City, IA 52242 USA
- Raluca Gordân
  - Institute for Genome Sciences and Policy, Duke University, Durham, NC 27708 USA
- David H Price
  - Department of Biochemistry, University of Iowa, Iowa City, IA 52242 USA
  - Molecular and Cellular Biology Program, University of Iowa, Iowa City, IA 52242 USA
|
155
|
Predeus AV, Gopalakrishnan S, Huang Y, Tang J, Feeney AJ, Oltz EM, Artyomov MN. Targeted chromatin profiling reveals novel enhancers in Ig H and Ig L chain loci. J Immunol 2013; 192:1064-70. [PMID: 24353267 DOI: 10.4049/jimmunol.1302800] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 11/19/2022]
Abstract
The assembly and expression of mouse Ag receptor genes are controlled by a collection of cis-acting regulatory elements, including transcriptional promoters and enhancers. Although many powerful enhancers have been identified for Ig (Ig) and TCR (Tcr) loci, it remained unclear whether additional regulatory elements remain undiscovered. In this study, we use chromatin profiling of pro-B cells to define 38 epigenetic states in mouse Ag receptor loci, each of which reflects a distinct regulatory potential. One of these chromatin states corresponds to known transcriptional enhancers and identifies a new set of candidate elements in all three Ig loci. Four of the candidates were subjected to functional assays, and all four exhibit enhancer activity in B but not in T lineage cells. The new regulatory elements identified by focused chromatin profiling most likely have important functions in the creation, refinement, and expression of Ig repertoires.
Affiliation(s)
- Alexander V Predeus
  - Department of Pathology, Washington University School of Medicine, St. Louis, MO 63110
|
156
|
Brignull LM, Czimmerer Z, Saidi H, Daniel B, Villela I, Bartlett NW, Johnston SL, Meira LB, Nagy L, Nohturfft A. Reprogramming of lysosomal gene expression by interleukin-4 and Stat6. BMC Genomics 2013; 14:853. [PMID: 24314139 PMCID: PMC3880092 DOI: 10.1186/1471-2164-14-853] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 07/24/2013] [Accepted: 11/26/2013] [Indexed: 01/05/2023] Open
Abstract
Background Lysosomes play important roles in multiple aspects of physiology, but the problem of how the transcription of lysosomal genes is coordinated remains incompletely understood. The goal of this study was to illuminate the physiological contexts in which lysosomal genes are coordinately regulated and to identify transcription factors involved in this control. Results As transcription factors and their target genes are often co-regulated, we performed meta-analyses of array-based expression data to identify regulators whose mRNA profiles are highly correlated with those of a core set of lysosomal genes. Among the ~50 transcription factors that rank highest by this measure, 65% are involved in differentiation or development, and 22% have been implicated in interferon signaling. The most strongly correlated candidate was Stat6, a factor commonly activated by interleukin-4 (IL-4) or IL-13. Publicly available chromatin immunoprecipitation (ChIP) data from alternatively activated mouse macrophages show that lysosomal genes are overrepresented among Stat6-bound targets. Quantification of RNA from wild-type and Stat6-deficient cells indicates that Stat6 promotes the expression of over 100 lysosomal genes, including hydrolases, subunits of the vacuolar H+ ATPase and trafficking factors. While IL-4 inhibits and activates different sets of lysosomal genes, Stat6 mediates only the activating effects of IL-4, by promoting increased expression and by neutralizing undefined inhibitory signals induced by IL-4. Conclusions The current data establish Stat6 as a broadly acting regulator of lysosomal gene expression in mouse macrophages. Other regulators whose expression correlates with lysosomal genes suggest that lysosome function is frequently re-programmed during differentiation, development and interferon signaling.
Affiliation(s)
- Axel Nohturfft
  - Division of Biomedical Sciences, Molecular and Metabolic Signaling Centre, St. George's University of London, Cranmer Terrace, London SW17 0RE, UK.
|
157
|
Bailey T, Krajewski P, Ladunga I, Lefebvre C, Li Q, Liu T, Madrigal P, Taslim C, Zhang J. Practical guidelines for the comprehensive analysis of ChIP-seq data. PLoS Comput Biol 2013; 9:e1003326. [PMID: 24244136 PMCID: PMC3828144 DOI: 10.1371/journal.pcbi.1003326] [Citation(s) in RCA: 164] [Impact Index Per Article: 14.9] [Indexed: 12/25/2022] Open
Abstract
Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping such protein-DNA interactions in vivo using ChIP-seq presents multiple challenges not only in sample preparation and sequencing but also for computational analysis. Here, we present step-by-step guidelines for the computational analysis of ChIP-seq data. We address all the major steps in the analysis of ChIP-seq data: sequencing depth selection, quality checking, mapping, data normalization, assessment of reproducibility, peak calling, differential binding analysis, controlling the false discovery rate, peak annotation, visualization, and motif analysis. At each step in our guidelines we discuss some of the software tools most frequently used. We also highlight the challenges and problems associated with each step in ChIP-seq data analysis. We present a concise workflow for the analysis of ChIP-seq data in Figure 1 that complements and expands on the recommendations of the ENCODE and modENCODE projects. Each step in the workflow is described in detail in the following sections.
Affiliation(s)
- Timothy Bailey
  - Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
- Pawel Krajewski
  - Department of Biometry and Bioinformatics, Institute of Plant Genetics, Polish Academy of Sciences, Poznań, Poland
- Istvan Ladunga
  - Department of Statistics, Beadle Center, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
- Celine Lefebvre
  - Inserm U981, Cancer Institute Gustave Roussy, Villejuif, France
- Qunhua Li
  - Department of Statistics, Penn State University, University Park, Pennsylvania, United States of America
- Tao Liu
  - Department of Biochemistry, University at Buffalo, Buffalo, New York, United States of America
- Pedro Madrigal
  - Department of Biometry and Bioinformatics, Institute of Plant Genetics, Polish Academy of Sciences, Poznań, Poland
- Cenny Taslim
  - Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
- Jie Zhang
  - Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, United States of America
|
158
|
Capra JA, Erwin GD, McKinsey G, Rubenstein JLR, Pollard KS. Many human accelerated regions are developmental enhancers. Philos Trans R Soc Lond B Biol Sci 2013; 368:20130025. [PMID: 24218637 PMCID: PMC3826498 DOI: 10.1098/rstb.2013.0025] [Citation(s) in RCA: 137] [Impact Index Per Article: 12.5] [Indexed: 01/04/2023] Open
Abstract
The genetic changes underlying the dramatic differences in form and function between humans and other primates are largely unknown, although it is clear that gene regulatory changes play an important role. To identify regulatory sequences with potentially human-specific functions, we and others used comparative genomics to find non-coding regions conserved across mammals that have acquired many sequence changes in humans since divergence from chimpanzees. These regions are good candidates for performing human-specific regulatory functions. Here, we analysed the DNA sequence, evolutionary history, histone modifications, chromatin state and transcription factor (TF) binding sites of a combined set of 2649 non-coding human accelerated regions (ncHARs) and predicted that at least 30% of them function as developmental enhancers. We prioritized the predicted ncHAR enhancers using analysis of TF binding site gain and loss, along with the functional annotations and expression patterns of nearby genes. We then tested both the human and chimpanzee sequence for 29 ncHARs in transgenic mice, and found 24 novel developmental enhancers active in both species, 17 of which had very consistent patterns of activity in specific embryonic tissues. Of these ncHAR enhancers, five drove expression patterns suggestive of different activity for the human and chimpanzee sequence at embryonic day 11.5. The changes to human non-coding DNA in these ncHAR enhancers may modify the complex patterns of gene expression necessary for proper development in a human-specific manner and are thus promising candidates for understanding the genetic basis of human-specific biology.
Affiliation(s)
- John A Capra
  - Gladstone Institutes, University of California, San Francisco, CA 94158, USA
|
159
|
Maeso I, Irimia M, Tena JJ, Casares F, Gómez-Skarmeta JL. Deep conservation of cis-regulatory elements in metazoans. Philos Trans R Soc Lond B Biol Sci 2013; 368:20130020. [PMID: 24218633 DOI: 10.1098/rstb.2013.0020] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 12/20/2022] Open
Abstract
Despite the vast morphological variation observed across phyla, animals share multiple basic developmental processes orchestrated by a common ancestral gene toolkit. These genes interact with each other building complex gene regulatory networks (GRNs), which are encoded in the genome by cis-regulatory elements (CREs) that serve as computational units of the network. Although GRN subcircuits involved in ancient developmental processes are expected to be at least partially conserved, identification of CREs that are conserved across phyla has remained elusive. Here, we review recent studies that revealed such deeply conserved CREs do exist, discuss the difficulties associated with their identification and describe new approaches that will facilitate this search.
Affiliation(s)
- Ignacio Maeso
  - Department of Zoology, University of Oxford, Oxford, UK
|
160
|
Xie D, Boyle AP, Wu L, Zhai J, Kawli T, Snyder M. Dynamic trans-acting factor colocalization in human cells. Cell 2013; 155:713-24. [PMID: 24243024 DOI: 10.1016/j.cell.2013.09.043] [Citation(s) in RCA: 107] [Impact Index Per Article: 9.7] [Received: 05/07/2013] [Revised: 07/13/2013] [Accepted: 08/27/2013] [Indexed: 01/02/2023]
Abstract
Different trans-acting factors (TFs) collaborate and act in concert at distinct loci to perform accurate regulation of their target genes. To date, the cobinding of TF pairs has been investigated in a limited context both in terms of the number of factors within a cell type and across cell types and the extent of combinatorial colocalizations. Here, we use an approach to analyze TF colocalization within a cell type and across multiple cell lines at an unprecedented level. We extend this approach with large-scale mass spectrometry analysis of immunoprecipitations of 50 TFs. Our combined approach reveals large numbers of interesting TF-TF associations. We observe extensive change in TF colocalizations both within a cell type exposed to different conditions and across multiple cell types. We show distinct functional annotations and properties of different TF cobinding patterns and provide insights into the complex regulatory landscape of the cell.
Affiliation(s)
- Dan Xie
  - Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
|
161
|
Ferg M, Armant O, Yang L, Dickmeis T, Rastegar S, Strähle U. Gene transcription in the zebrafish embryo: regulators and networks. Brief Funct Genomics 2013; 13:131-43. [PMID: 24152666 DOI: 10.1093/bfgp/elt044] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 01/08/2023] Open
Abstract
The precise spatial and temporal control of gene expression is a key process in the development, maintenance and regeneration of the vertebrate body. A substantial proportion of vertebrate genomes encode genes that control the transcription of the genetic information into mRNA. The zebrafish is particularly well suited to investigate gene regulatory networks underlying the control of gene expression during development due to the external development of its transparent embryos and the increasingly sophisticated tools for genetic manipulation available for this model system. We review here recent data on the analysis of cis-regulatory modules, transcriptional regulators and their integration into gene regulatory networks in the zebrafish, using the developing spinal cord as an example.
Affiliation(s)
- Marco Ferg
  - Institute of Toxicology and Genetics, Karlsruhe Institute of Technology (KIT), Postfach 3640, 76021 Karlsruhe, Germany.
|
162
|
Foley JW, Sidow A. Transcription-factor occupancy at HOT regions quantitatively predicts RNA polymerase recruitment in five human cell lines. BMC Genomics 2013; 14:720. [PMID: 24138567 PMCID: PMC3826616 DOI: 10.1186/1471-2164-14-720] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 05/13/2013] [Accepted: 10/04/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND High-occupancy target (HOT) regions are compact genome loci occupied by many different transcription factors (TFs). HOT regions were initially defined in invertebrate model organisms, and we here show that they are a ubiquitous feature of the human gene-regulation landscape. RESULTS We identified HOT regions by a comprehensive analysis of ChIP-seq data from 96 DNA-associated proteins in 5 human cell lines. Most HOT regions co-localize with RNA polymerase II binding sites, but many are not near the promoters of annotated genes. At HOT promoters, TF occupancy is strongly predictive of transcription preinitiation complex recruitment and moderately predictive of initiating Pol II recruitment, but only weakly predictive of elongating Pol II and RNA transcript abundance. TF occupancy varies quantitatively within human HOT regions; we used this variation to discover novel associations between TFs. The sequence motif associated with any given TF's direct DNA binding is somewhat predictive of its empirical occupancy, but a great deal of occupancy occurs at sites without the TF's motif, implying indirect recruitment by another TF whose motif is present. CONCLUSIONS Mammalian HOT regions are regulatory hubs that integrate the signals from diverse regulatory pathways to quantitatively tune the promoter for RNA polymerase II recruitment.
Affiliation(s)
- Joseph W Foley
  - Department of Genetics, Stanford University, 300 Pasteur Drive, Stanford, California 94305, USA
  - Current address: Douglas Mental Health University Institute, McGill University, 6875 Boulevard LaSalle, Verdun, Québec H4H 1R3, Canada
- Arend Sidow
  - Department of Genetics, Stanford University, 300 Pasteur Drive, Stanford, California 94305, USA
  - Department of Pathology, Stanford University, 300 Pasteur Drive, Stanford, California 94305, USA
|
163
|
Khurana E, Fu Y, Colonna V, Mu XJ, Kang HM, Lappalainen T, Sboner A, Lochovsky L, Chen J, Harmanci A, Das J, Abyzov A, Balasubramanian S, Beal K, Chakravarty D, Challis D, Chen Y, Clarke D, Clarke L, Cunningham F, Evani US, Flicek P, Fragoza R, Garrison E, Gibbs R, Gümüş ZH, Herrero J, Kitabayashi N, Kong Y, Lage K, Liluashvili V, Lipkin SM, MacArthur DG, Marth G, Muzny D, Pers TH, Ritchie GRS, Rosenfeld JA, Sisu C, Wei X, Wilson M, Xue Y, Yu F, Dermitzakis ET, Yu H, Rubin MA, Tyler-Smith C, Gerstein M. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science 2013; 342:1235587. [PMID: 24092746 PMCID: PMC3947637 DOI: 10.1126/science.1235587] [Citation(s) in RCA: 270] [Impact Index Per Article: 24.5] [Indexed: 12/15/2022]
Abstract
Interpreting variants, especially noncoding ones, in the increasing number of personal genomes is challenging. We used patterns of polymorphisms in functionally annotated regions in 1092 humans to identify deleterious variants; then we experimentally validated candidates. We analyzed both coding and noncoding regions, with the former corroborating the latter. We found regions particularly sensitive to mutations ("ultrasensitive") and variants that are disruptive because of mechanistic effects on transcription-factor binding (that is, "motif-breakers"). We also found variants in regions with higher network centrality tend to be deleterious. Insertions and deletions followed a similar pattern to single-nucleotide variants, with some notable exceptions (e.g., certain deletions and enhancers). On the basis of these patterns, we developed a computational tool (FunSeq), whose application to ~90 cancer genomes reveals nearly a hundred candidate noncoding drivers.
Affiliation(s)
- Ekta Khurana
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Yao Fu
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Vincenza Colonna
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
  - Institute of Genetics and Biophysics, National Research Council (CNR), 80131 Naples, Italy
- Xinmeng Jasmine Mu
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Hyun Min Kang
  - Center for Statistical Genetics, Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Tuuli Lappalainen
  - Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland
  - Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, 1211 Geneva, Switzerland
  - Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
- Andrea Sboner
  - Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA
- Lucas Lochovsky
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Jieming Chen
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, CT 06520, USA
- Arif Harmanci
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Jishnu Das
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Alexej Abyzov
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Suganthi Balasubramanian
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
- Kathryn Beal
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Dimple Chakravarty
  - Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA
- Daniel Challis
  - Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
- Yuan Chen
  - Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, CB10 1SA, UK
- Declan Clarke
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
- Laura Clarke
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Fiona Cunningham
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Uday S. Evani
  - Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Robert Fragoza
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Erik Garrison
  - Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
- Richard Gibbs
  - Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX 77030, USA
- Zeynep H. Gümüş
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA
  - Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY, 10065, USA
- Javier Herrero
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Naoki Kitabayashi
  - Institute for Precision Medicine and the Department of Pathology and Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian Hospital, New York, NY 10065, USA
- Yong Kong
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Keck Biotechnology Resource Laboratory, Yale University, New Haven, CT 06511, USA
- Kasper Lage
  - Pediatric Surgical Research Laboratories, MassGeneral Hospital for Children, Massachusetts General Hospital, Boston, MA 02114, USA
  - Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
  - Harvard Medical School, Boston, MA 02115, USA
  - Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
  - Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
- Vaja Liluashvili
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA
  - Department of Physiology and Biophysics, Weill Cornell Medical College, New York, NY, 10065, USA
| | - Steven M. Lipkin
- Department of Medicine, Weill Cornell Medical College, New York, NY
10065, USA
| | - Daniel G. MacArthur
- Analytical and Translational Genetics Unit, Massachusetts General
Hospital, Boston, MA 02114, USA
- Program in Medical and Population Genetics, Broad Institute of
Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA 02142,
USA
| | - Gabor Marth
- Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
| | - Donna Muzny
- Baylor College of Medicine, Human Genome Sequencing Center,
Houston, TX 77030, USA
| | - Tune H. Pers
- Center for Biological Sequence Analysis, Department of Systems
Biology, Technical University of Denmark, Lyngby, Denmark
- Division of Endocrinology and Center for Basic and Translational
Obesity Research, Children’s Hospital, Boston, MA 02115, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Graham R. S. Ritchie
- European Molecular Biology Laboratory, European Bioinformatics
Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Jeffrey A. Rosenfeld
- Department of Medicine, Rutgers New Jersey Medical School, Newark,
NJ 07101, USA
- IST/High Performance and Research Computing, Rutgers University
Newark, NJ 07101, USA
- Sackler Institute for Comparative Genomics, American Museum of
Natural History, New York, NY 10024, USA
- Cristina Sisu
- Program in Computational Biology and Bioinformatics, Yale
University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale
University, New Haven, CT 06520, USA
- Xiaomu Wei
- Weill Institute for Cell and Molecular Biology, Cornell University,
Ithaca, NY 14853, USA
- Department of Medicine, Weill Cornell Medical College, New York, NY
10065, USA
- Michael Wilson
- Program in Computational Biology and Bioinformatics, Yale
University, New Haven, CT 06520, USA
- Child Study Center, Yale University, New Haven, CT 06520, USA
- Yali Xue
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus,
Cambridge, CB10 1SA, UK
- Fuli Yu
- Baylor College of Medicine, Human Genome Sequencing Center,
Houston, TX 77030, USA
- Emmanouil T. Dermitzakis
- Department of Genetic Medicine and Development, University of Geneva
Medical School, 1211 Geneva, Switzerland
- Institute for Genetics and Genomics in Geneva (iGE3), University of
Geneva, 1211 Geneva, Switzerland
- Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
- Haiyuan Yu
- Department of Biological Statistics and Computational Biology,
Cornell University, Ithaca, NY 14853, USA
- Weill Institute for Cell and Molecular Biology, Cornell University,
Ithaca, NY 14853, USA
- Mark A. Rubin
- Institute for Precision Medicine and the Department of Pathology and
Laboratory Medicine, Weill Cornell Medical College and New York-Presbyterian
Hospital, New York, NY 10065, USA
- Chris Tyler-Smith
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus,
Cambridge, CB10 1SA, UK
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale
University, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale
University, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, New Haven, CT
06520, USA
|
164
|
Ago1 Interacts with RNA polymerase II and binds to the promoters of actively transcribed genes in human cancer cells. PLoS Genet 2013; 9:e1003821. [PMID: 24086155 PMCID: PMC3784563 DOI: 10.1371/journal.pgen.1003821] [Citation(s) in RCA: 98] [Impact Index Per Article: 8.9] [Received: 03/06/2013] [Accepted: 07/24/2013] [Indexed: 12/31/2022]
Abstract
Argonaute proteins are often credited for their cytoplasmic activities in which they function as central mediators of the RNAi platform and microRNA (miRNA)-mediated processes. They also facilitate heterochromatin formation and establishment of repressive epigenetic marks in the nucleus of fission yeast and plants. However, the nuclear functions of Ago proteins in mammalian cells remain elusive. In the present study, we combine ChIP-seq (chromatin immunoprecipitation coupled with massively parallel sequencing) with biochemical assays to show that nuclear Ago1 directly interacts with RNA Polymerase II and is widely associated with chromosomal loci throughout the genome with preferential enrichment in promoters of transcriptionally active genes. Additional analyses show that nuclear Ago1 regulates the expression of Ago1-bound genes that are implicated in oncogenic pathways including cell cycle progression, growth, and survival. Our findings reveal the first landscape of human Ago1-chromosomal interactions, which may play a role in the oncogenic transcriptional program of cancer cells.

Argonaute (Ago) proteins are an evolutionarily conserved family of proteins indispensable for a gene regulation mechanism known as RNA interference (RNAi), which is mediated by small RNAs including microRNA (miRNA) and small interfering RNA (siRNA) and occurs mainly in the cytoplasm. In mammalian cells, however, the function of Agos in the nucleus is largely unknown despite a few examples in which Agos are shown to be involved in regulating gene transcription and alternative splicing. In this study, by taking a genome-wide approach, we found that human Ago1, but not Ago2, is pervasively associated with gene regulatory sequences known as promoters and interacts with the core component of the gene transcription machinery to exert a positive impact on gene expression in cancer cells. Strikingly, the genes bound and regulated by Ago1 are mostly genes that stimulate cell growth and survival, and are known to be involved in the development of cancer. The findings from our study unveil an unexpected role of nuclear Ago1 in regulating gene expression, which may be important both in normal cellular processes and in diseases such as cancer.
|
165
|
Giacopelli F, Cappato S, Tonachini L, Mura M, Di Lascio S, Fornasari D, Ravazzolo R, Bocciardi R. Identification and characterization of regulatory elements in the promoter of ACVR1, the gene mutated in Fibrodysplasia Ossificans Progressiva. Orphanet J Rare Dis 2013; 8:145. [PMID: 24047559 PMCID: PMC4015442 DOI: 10.1186/1750-1172-8-145] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/16/2013] [Accepted: 09/03/2013] [Indexed: 12/12/2022]
Abstract
Background The ACVR1 gene encodes a type I receptor for bone morphogenetic proteins (BMPs). Mutations in the ACVR1 gene are associated with Fibrodysplasia Ossificans Progressiva (FOP), a rare and extremely disabling disorder characterized by congenital malformation of the great toes and progressive heterotopic endochondral ossification in muscles and other non-skeletal tissues. Several aspects of FOP pathophysiology are still poorly understood, including mechanisms regulating ACVR1 expression. This work aimed to identify regulatory elements that control ACVR1 gene transcription. Methods and results We first characterized the structure and composition of human ACVR1 gene transcripts by identifying the transcription start site, and then characterized a 2.9 kb upstream region. This region showed strong activating activity when tested by reporter gene assays in transfected cells. We identified specific elements within the 2.9 kb region that are important for transcription factor binding using deletion constructs, co-transfection experiments with plasmids expressing selected transcription factors, site-directed mutagenesis of consensus binding-site sequences, and by protein/DNA binding assays. We also characterized a GC-rich minimal promoter region containing binding sites for the Sp1 transcription factor. Conclusions Our results showed that several transcription factors such as Egr-1, Egr-2, ZBTB7A/LRF, and Hey1, regulate the ACVR1 promoter by binding to the -762/-308 region, which is essential to confer maximal transcriptional activity. The Sp1 transcription factor acts at the most proximal promoter segment upstream of the transcription start site. We observed significant differences in different cell types suggesting tissue specificity of transcriptional regulation. These findings provide novel insights into the molecular mechanisms that regulate expression of the ACVR1 gene and that could be targets of new strategies for future therapeutic treatments.
Affiliation(s)
- Francesca Giacopelli
- Department of Neurosciences, Rehabilitation, Ophthalmology, Genetics, Maternal and Child Health and CEBR, Università degli Studi di Genova, Genova, Italy.
|
166
|
Lehrach H. DNA sequencing methods in human genetics and disease research. F1000Prime Rep 2013; 5:34. [PMID: 24049638 PMCID: PMC3768324 DOI: 10.12703/p5-34] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/20/2022]
Abstract
DNA sequencing has revolutionized biological and medical research, and is poised to have a similar impact in medicine. This tool is just one of a number of developments in our capability to identify, quantitate and functionally characterize the components of the biological networks keeping us healthy or making us sick, but in many respects it has played the leading role in this process. The new technologies do, however, also provide a bridge between genotype and phenotype, both in man and model (as well as all other) organisms, revolutionize the identification of elements involved in a multitude of human diseases or other phenotypes, and generate a wealth of medically relevant information on every single person, as the basis of a truly personalized medicine of the future.
Affiliation(s)
- Hans Lehrach
- Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany
- Dahlem Centre for Genome Research and Medical Systems Biology, Fabeckstrasse 60-62, 14195 Berlin, Germany
- Alacris Theranostics GmbH, Fabeckstrasse 60-62, 14195 Berlin, Germany
|
167
|
De S, Pedersen BS, Kechris K. The dilemma of choosing the ideal permutation strategy while estimating statistical significance of genome-wide enrichment. Brief Bioinform 2013; 15:919-28. [PMID: 23956260 DOI: 10.1093/bib/bbt053] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022]
Abstract
Integrative analyses of genomic, epigenomic and transcriptomic features for human and various model organisms have revealed that many such features are nonrandomly distributed in the genome. Significant enrichment (or depletion) of genomic features is anticipated to be biologically important. Detection of genomic regions having enrichment of certain features and estimation of corresponding statistical significance rely on the expected null distribution generated by a permutation model. We discuss different genome-wide permutation approaches, present examples where the permutation strategy affects the null model and show that the confidence in estimating statistical significance of genome-wide enrichment might depend on the choice of the permutation approach. In those cases, where biologically relevant constraints are unclear, it is preferable to examine whether key conclusions are consistent, irrespective of the choice of the randomization strategy.
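As a rough illustration of the point in this abstract, the sketch below computes an empirical enrichment p-value for the same toy data under two different null models. It is not the authors' code: the function names, the toy coordinates, and the two strategies (independent uniform placement versus a single circular shift that preserves spacing between features) are assumptions chosen only to show how the choice of permutation scheme can change the null distribution.

```python
import random

def count_hits(points, regions):
    """Count points that fall inside any half-open region [start, end)."""
    return sum(any(s <= p < e for s, e in regions) for p in points)

def uniform_null(points, genome_len, rng):
    """Strategy A (assumed): place every point independently and uniformly."""
    return [rng.randrange(genome_len) for _ in points]

def circular_shift_null(points, genome_len, rng):
    """Strategy B (assumed): rotate all points by one offset, keeping their spacing."""
    offset = rng.randrange(genome_len)
    return [(p + offset) % genome_len for p in points]

def empirical_p(points, regions, genome_len, strategy, n_perm=1000, seed=0):
    """One-sided empirical p-value for enrichment, with the usual +1 correction."""
    rng = random.Random(seed)
    observed = count_hits(points, regions)
    as_extreme = sum(
        count_hits(strategy(points, genome_len, rng), regions) >= observed
        for _ in range(n_perm)
    )
    return observed, (as_extreme + 1) / (n_perm + 1)

# Hypothetical toy data: clustered features on a 10 kb "genome".
GENOME_LEN = 10_000
REGIONS = [(1_000, 1_500), (6_000, 6_400)]
POINTS = [1_010, 1_050, 1_100, 1_200, 1_300, 6_050, 6_100, 8_000, 9_000, 9_500]

obs, p_uniform = empirical_p(POINTS, REGIONS, GENOME_LEN, uniform_null)
_, p_shift = empirical_p(POINTS, REGIONS, GENOME_LEN, circular_shift_null)
print(f"observed hits: {obs}")
print(f"p (uniform placement): {p_uniform:.4f}")
print(f"p (circular shift):    {p_shift:.4f}")
```

Because the circular shift keeps the clustering of the features intact, its null distribution is typically wider than the one from independent placement, so the two p-values need not agree. Comparing them is one simple way to check whether a conclusion is robust to the randomization strategy.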
|
168
|
Yan J, Enge M, Whitington T, Dave K, Liu J, Sur I, Schmierer B, Jolma A, Kivioja T, Taipale M, Taipale J. Transcription factor binding in human cells occurs in dense clusters formed around cohesin anchor sites. Cell 2013; 154:801-13. [PMID: 23953112 DOI: 10.1016/j.cell.2013.07.034] [Citation(s) in RCA: 271] [Impact Index Per Article: 24.6] [Received: 02/13/2013] [Revised: 05/23/2013] [Accepted: 07/23/2013] [Indexed: 10/26/2022]
Abstract
During cell division, transcription factors (TFs) are removed from chromatin twice, during DNA synthesis and during condensation of chromosomes. How TFs can efficiently find their sites following these stages has been unclear. Here, we have analyzed the binding pattern of expressed TFs in human colorectal cancer cells. We find that binding of TFs is highly clustered and that the clusters are enriched in binding motifs for several major TF classes. Strikingly, almost all clusters are formed around cohesin, and loss of cohesin decreases both DNA accessibility and binding of TFs to clusters. We show that cohesin remains bound in S phase, holding the nascent sister chromatids together at the TF cluster sites. Furthermore, cohesin remains bound to the cluster sites when TFs are evicted in early M phase. These results suggest that cohesin-binding functions as a cellular memory that promotes re-establishment of TF clusters after DNA replication and chromatin condensation.
Affiliation(s)
- Jian Yan
- Science for Life Laboratory, Department of Biosciences and Nutrition, Karolinska Institutet, Stockholm 14183, Sweden
|
169
|
Blobel GA, Hardison RC. A cluster to remember. Cell 2013; 154:718-20. [PMID: 23953105 PMCID: PMC3878159 DOI: 10.1016/j.cell.2013.07.041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
Based on a massive transcription factor location analysis within a single cell type, in this issue Yan et al. find that the great majority of occupancies occur within dense clusters of up to 100 factors that almost invariably contain cohesins. Retention of cohesins at cluster sites during mitosis raises the possibility that they contribute to transcriptional memory during the cell cycle.
Affiliation(s)
- Gerd A. Blobel
- Division of Hematology, The Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA 19104, USA
- Ross C. Hardison
- Department of Biochemistry and Molecular Biology, Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, University Park, PA 16802, USA
|
170
|
Cheng Q, Kazemian M, Pham H, Blatti C, Celniker SE, Wolfe SA, Brodsky MH, Sinha S. Computational identification of diverse mechanisms underlying transcription factor-DNA occupancy. PLoS Genet 2013; 9:e1003571. [PMID: 23935523 PMCID: PMC3731213 DOI: 10.1371/journal.pgen.1003571] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 11/26/2012] [Accepted: 05/02/2013] [Indexed: 12/13/2022]
Abstract
ChIP-based genome-wide assays of transcription factor (TF) occupancy have emerged as a powerful, high-throughput method to understand transcriptional regulation, especially on a global scale. This has led to great interest in the underlying biochemical mechanisms that direct TF-DNA binding, with the ultimate goal of computationally predicting a TF's occupancy profile in any cellular condition. In this study, we examined the influence of various potential determinants of TF-DNA binding on a much larger scale than previously undertaken. We used a thermodynamics-based model of TF-DNA binding, called “STAP,” to analyze 45 TF-ChIP data sets from Drosophila embryonic development. We built a cross-validation framework that compares a baseline model, based on the ChIP'ed (“primary”) TF's motif, to more complex models where binding by secondary TFs is hypothesized to influence the primary TF's occupancy. Candidate interacting TFs were chosen based on RNA-SEQ expression data from the time point of the ChIP experiment. We found widespread evidence of both cooperative and antagonistic effects by secondary TFs, and explicitly quantified these effects. We were able to identify multiple classes of interactions, including (1) long-range interactions between primary and secondary motifs (separated by ≤150 bp), suggestive of indirect effects such as chromatin remodeling, (2) short-range interactions with specific inter-site spacing biases, suggestive of direct physical interactions, and (3) overlapping binding sites suggesting competitive binding. Furthermore, by factoring out the previously reported strong correlation between TF occupancy and DNA accessibility, we were able to categorize the effects into those that are likely to be mediated by the secondary TF's effect on local accessibility and those that utilize accessibility-independent mechanisms. Finally, we conducted in vitro pull-down assays to test model-based predictions of short-range cooperative interactions, and found that seven of the eight TF pairs tested physically interact and that some of these interactions mediate cooperative binding to DNA.

Chromatin Immunoprecipitation (ChIP)-based genome-wide assays of transcription factor (TF) occupancy have emerged as a powerful, high-throughput method to understand transcriptional regulation, especially on a global scale. Here, we utilize 45 ChIP-chip and ChIP-SEQ data sets from Drosophila to explore the underlying mechanisms of TF-DNA binding. For this, we employ a biophysically motivated computational model, in conjunction with over 300 TF motifs (binding specificities) as well as gene expression and DNA accessibility data from different developmental stages in Drosophila embryos. Our findings provide robust statistical evidence of the role played by TF-TF interactions in shaping genome-wide TF-DNA binding profiles, and thus in directing gene regulation. Our method allows us to go beyond simply recognizing the existence of such interactions, to quantifying their effects on TF occupancy. We are able to categorize the probable mechanisms of these effects as involving direct physical interactions versus accessibility-mediated indirect interactions, long-range versus short-range interactions, and cooperative versus antagonistic interactions. Our analysis reveals widespread evidence of combinatorial regulation present in recently generated ChIP data sets, and sets the stage for rich integrative models of the future that will predict cell type-specific TF occupancy values from sequence and expression data.
Affiliation(s)
- Qiong Cheng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Majid Kazemian
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Hannah Pham
- Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Charles Blatti
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Susan E. Celniker
- Department of Genome Dynamics, Berkeley Drosophila Genome Project, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Scot A. Wolfe
- Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Michael H. Brodsky
- Program in Gene Function and Expression, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Department of Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- * E-mail: (MHB); (SS)
- Saurabh Sinha
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Institute of Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- * E-mail: (MHB); (SS)
|
171
|
Schlesinger F, Smith AD, Gingeras TR, Hannon GJ, Hodges E. De novo DNA demethylation and noncoding transcription define active intergenic regulatory elements. Genome Res 2013; 23:1601-14. [PMID: 23811145 PMCID: PMC3787258 DOI: 10.1101/gr.157271.113] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Indexed: 11/24/2022]
Abstract
Deep sequencing of mammalian DNA methylomes has uncovered a previously unpredicted number of discrete hypomethylated regions in intergenic space (iHMRs). Here, we combined whole-genome bisulfite sequencing data with extensive gene expression and chromatin-state data to define functional classes of iHMRs, and to reconstruct the dynamics of their establishment in a developmental setting. Comparing HMR profiles in embryonic stem and primary blood cells, we show that iHMRs mark an exclusive subset of active DNase hypersensitive sites (DHS), and that both developmentally constitutive and cell-type-specific iHMRs display chromatin states typical of distinct regulatory elements. We also observe that iHMR changes are more predictive of nearby gene activity than the promoter HMR itself, and that expression of noncoding RNAs within the iHMR accompanies full activation and complete demethylation of mature B cell enhancers. Conserved sequence features corresponding to iHMR transcript start sites, including a discernible TATA motif, suggest a conserved, functional role for transcription in these regions. Similarly, we explored both primate-specific and human population variation at iHMRs, finding that while enhancer iHMRs are more variable in sequence and methylation status than any other functional class, conservation of the TATA box is highly predictive of iHMR maintenance, reflecting the impact of sequence plasticity and transcriptional signals on iHMR establishment. Overall, our analysis allowed us to construct a three-step timeline in which (1) intergenic DHS are pre-established in the stem cell, (2) partial demethylation of blood-specific intergenic DHSs occurs in blood progenitors, and (3) complete iHMR formation and transcription coincide with enhancer activation in lymphoid-specified cells.
Affiliation(s)
- Felix Schlesinger
- Watson School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA
|
172
|
Kilpatrick AM, Ward B, Aitken S. MCOIN: a novel heuristic for determining transcription factor binding site motif width. Algorithms Mol Biol 2013; 8:16. [PMID: 23806098 PMCID: PMC3716798 DOI: 10.1186/1748-7188-8-16] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/18/2012] [Accepted: 06/24/2013] [Indexed: 12/02/2022]
Abstract
BACKGROUND In transcription factor binding site discovery, the true width of the motif to be discovered is generally not known a priori. The ability to compute the most likely width of a motif is therefore a highly desirable property for motif discovery algorithms. However, this is a challenging computational problem as a result of changing model dimensionality at changing motif widths. The complexity of the problem is increased as the discovered model at the true motif width need not be the most statistically significant in a set of candidate motif models. Further, the core motif discovery algorithm used cannot guarantee to return the best possible result at each candidate width. RESULTS We present MCOIN, a novel heuristic for automatically determining transcription factor binding site motif width, based on motif containment and information content. Using realistic synthetic data and previously characterised prokaryotic data, we show that MCOIN outperforms the current most popular method (E-value of the resulting multiple alignment) as a predictor of motif width, based on mean absolute error. MCOIN is also shown to choose models which better match known sites at higher levels of motif conservation, based on ROC analysis. CONCLUSIONS We demonstrate the performance of MCOIN as part of a deterministic motif discovery algorithm and conclude that MCOIN outperforms current methods for determining motif width.
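The abstract names information content as one of the two ingredients of the heuristic. The sketch below shows only the standard per-column information content of a motif count matrix, assuming a uniform background over A, C, G and T. It is not the MCOIN procedure itself, and the function names and example counts are hypothetical.

```python
import math

def column_ic(counts):
    """Information content (bits) of one motif column given A, C, G, T counts.

    Assumes a uniform background, so the maximum is log2(4) = 2 bits.
    """
    total = sum(counts)
    ic = 2.0
    for c in counts:
        if c > 0:
            p = c / total
            ic += p * math.log2(p)  # adds -H(column) term by term
    return ic

def motif_ic(count_matrix):
    """Total information content of a motif: the sum over its columns."""
    return sum(column_ic(col) for col in count_matrix)

# Hypothetical 3-column motif; each row is one column's (A, C, G, T) counts.
example_motif = [
    (10, 0, 0, 0),  # fully conserved position
    (5, 5, 0, 0),   # two equally likely bases
    (5, 5, 5, 5),   # no preference at all
]

for i, col in enumerate(example_motif, start=1):
    print(f"column {i}: {column_ic(col):.2f} bits")
print(f"motif total: {motif_ic(example_motif):.2f} bits")
```

Widening a candidate motif with uninformative flanking columns adds close to zero bits, which is the intuition behind using information content when comparing candidate widths.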
Affiliation(s)
- Alastair M Kilpatrick
- School of Informatics, University of Edinburgh, Informatics Forum, 10 Crichton Street, EH8 9AB Edinburgh, Scotland
- Bruce Ward
- School of Biological Sciences, University of Edinburgh, Darwin Building, King’s Buildings, Mayfield Road, EH9 3JR Edinburgh, Scotland
- Stuart Aitken
- School of Informatics, University of Edinburgh, Informatics Forum, 10 Crichton Street, EH8 9AB Edinburgh, Scotland
|
173
|
Whyte WA, Orlando DA, Hnisz D, Abraham BJ, Lin CY, Kagey MH, Rahl PB, Lee TI, Young RA. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 2013; 153:307-19. [PMID: 23582322 DOI: 10.1016/j.cell.2013.03.035] [Citation(s) in RCA: 2784] [Impact Index Per Article: 253.1] [Received: 11/16/2012] [Revised: 02/25/2013] [Accepted: 03/25/2013] [Indexed: 02/07/2023]
Abstract
Master transcription factors Oct4, Sox2, and Nanog bind enhancer elements and recruit Mediator to activate much of the gene expression program of pluripotent embryonic stem cells (ESCs). We report here that the ESC master transcription factors form unusual enhancer domains at most genes that control the pluripotent state. These domains, which we call super-enhancers, consist of clusters of enhancers that are densely occupied by the master regulators and Mediator. Super-enhancers differ from typical enhancers in size, transcription factor density and content, ability to activate transcription, and sensitivity to perturbation. Reduced levels of Oct4 or Mediator cause preferential loss of expression of super-enhancer-associated genes relative to other genes, suggesting how changes in gene expression programs might be accomplished during development. In other more differentiated cells, super-enhancers containing cell-type-specific master transcription factors are also found at genes that define cell identity. Super-enhancers thus play key roles in the control of mammalian cell identity.
Affiliation(s)
- Warren A Whyte
- Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142, USA
|
174
|
Harmston N, Lenhard B. Chromatin and epigenetic features of long-range gene regulation. Nucleic Acids Res 2013; 41:7185-99. [PMID: 23766291 PMCID: PMC3753629 DOI: 10.1093/nar/gkt499] [Citation(s) in RCA: 88] [Impact Index Per Article: 8.0] [Indexed: 01/21/2023]
Abstract
The precise regulation of gene transcription during metazoan development is controlled by a complex system of interactions between transcription factors, histone modifications and modifying enzymes and chromatin conformation. Developments in chromosome conformation capture technologies have revealed that interactions between regions of chromatin are pervasive and highly cell-type specific. The movement of enhancers and promoters in and out of higher-order chromatin structures within the nucleus are associated with changes in expression and histone modifications. However, the factors responsible for mediating these changes and determining enhancer:promoter specificity are still not completely known. In this review, we summarize what is known about the patterns of epigenetic and chromatin features characteristic of elements involved in long-range interactions. In addition, we review the insights into both local and global patterns of chromatin interactions that have been revealed by the latest experimental and computational methods.
Affiliation(s)
- Nathan Harmston
- MRC Clinical Sciences Centre, Faculty of Medicine, Imperial College, London W12 0NN, UK, Institute of Clinical Sciences, Faculty of Medicine, Imperial College, London W12 0NN, UK and Department of Informatics, University of Bergen, Thromøhlensgate 55, N-5008 Bergen, Norway
|
175
|
Chica C, Szarzynska B, Chen-Min-Tao R, Duvernois-Berthet E, Kassam M, Colot V, Roudier F. Profiling spatial enrichment of chromatin marks suggests an additional epigenomic dimension in gene regulation. Frontiers in Life Science 2013. [DOI: 10.1080/21553769.2013.844734] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/29/2023]
|
176
|
Wang C, Zhang MQ, Zhang Z. Computational identification of active enhancers in model organisms. Genomics Proteomics Bioinformatics 2013; 11:142-50. [PMID: 23685394 PMCID: PMC4357786 DOI: 10.1016/j.gpb.2013.04.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 12/28/2012] [Revised: 04/01/2013] [Accepted: 04/20/2013] [Indexed: 12/11/2022]
Abstract
As a class of cis-regulatory elements, enhancers were first identified nearly 30 years ago as genomic regions that are able to markedly increase the transcription of genes. Enhancers can regulate gene expression in a cell-type-specific and developmental-stage-specific manner. Although experimental technologies have been developed to identify enhancers genome-wide, the design principle of the regulatory elements and the way they rewire the transcriptional regulatory network tempo-spatially are far from clear. At present, developing predictive methods for enhancers, particularly for the cell-type-specific activity of enhancers, is central to computational biology. In this review, we survey the current computational approaches for active enhancer prediction and discuss future directions.
Affiliation(s)
- Chengqi Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
- Michael Q. Zhang
- Department of Molecular Cell Biology, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
- Bioinformatics Division, Center for Synthetic and Systems Biology, TNLIST, Tsinghua University, Beijing 100084, China
- Zhihua Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China
|
177
|
Abstract
By its very nature, genomics produces large, high-dimensional datasets that are well suited to analysis by machine learning approaches. Here, we explain some key aspects of machine learning that make it useful for genome annotation, with illustrative examples from ENCODE.
Affiliation(s)
- Kevin Y Yip
- Program in Computational Biology and Bioinformatics, Yale University, 260/266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, 260/266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Chao Cheng
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, NH 03755, USA
- Institute for Quantitative Biomedical Sciences, Norris Cotton Cancer Center, Geisel School of Medicine at Dartmouth, Lebanon, NH 03766, USA
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, 260/266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, 260/266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, 51 Prospect Street, New Haven, CT 06511, USA
|
178
|
Qu H, Fang X. A brief review on the Human Encyclopedia of DNA Elements (ENCODE) project. Genomics Proteomics Bioinformatics 2013; 11:135-41. [PMID: 23722115 PMCID: PMC4357814 DOI: 10.1016/j.gpb.2013.05.001] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.0] [Received: 05/10/2013] [Revised: 05/15/2013] [Accepted: 05/18/2013] [Indexed: 12/18/2022]
Abstract
The ENCyclopedia Of DNA Elements (ENCODE) project is an international research consortium that aims to identify all functional elements in the human genome sequence. The second phase of the project comprised 1640 datasets from 147 different cell types, yielding a set of 30 publications across several journals. These data revealed that 80.4% of the human genome displays some functionality in at least one cell type. Many of these regulatory elements are physically associated with one another and further form a network or three-dimensional conformation to affect gene expression. These elements are also related to sequence variants associated with diseases or traits. All these findings provide us new insights into the organization and regulation of genes and genome, and serve as an expansive resource for understanding human health and disease.
|
179
|
Abstract
The ENCyclopedia Of DNA Elements (ENCODE) project is a public research consortium that aims to identify all functional elements of the human genome sequence. The project comprised 1640 data sets from 147 different cell types, and the findings were released in a coordinated set of 34 publications across several journals. The ENCODE publications report that 80.4% of the human genome displays some functionality. These data have important implications for interpreting results from large-scale genetics studies. We review some of the key findings from the ENCODE publications and discuss how they can influence or inform further investigations into the genetic factors contributing to neuropsychiatric disorders.
|
180
|
Ernst J, Kellis M. Interplay between chromatin state, regulator binding, and regulatory motifs in six human cell types. Genome Res 2013; 23:1142-54. [PMID: 23595227 PMCID: PMC3698507 DOI: 10.1101/gr.144840.112] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Indexed: 12/20/2022]
Abstract
The regions bound by sequence-specific transcription factors can be highly variable across different cell types despite the static nature of the underlying genome sequence. This has been partly attributed to changes in chromatin accessibility, but a systematic picture has been hindered by the lack of large-scale data sets. Here, we use 456 binding experiments for 119 regulators and 84 chromatin maps generated by the ENCODE in six human cell types, and relate those to a global map of regulatory motif instances for these factors. We find specific and robust chromatin state preferences for each regulator beyond the previously reported open-chromatin association, suggesting a much richer chromatin landscape beyond simple accessibility. The preferentially bound chromatin states of regulators were enriched for sequence motifs of regulators relative to all states, suggesting that these preferences are at least partly encoded by the genomic sequence. Relative to all regions bound by a regulator, however, regulatory motifs were surprisingly depleted in the regulator's preferentially bound states, suggesting additional non-sequence-specific binding beyond the level predicted by the regulatory motifs. Such permissive binding was largely restricted to open-chromatin regions showing histone modification marks characteristic of active enhancer and promoter regions, whereas open-chromatin regions lacking such marks did not show permissive binding. Lastly, the vast majority of cobinding of regulator pairs is predicted by the chromatin state preferences of individual regulators. Overall, our results suggest a joint role of sequence motifs and specific chromatin states beyond mere accessibility in mediating regulator binding dynamics across different cell types.
Affiliation(s)
- Jason Ernst
- Department of Biological Chemistry, David Geffen School of Medicine, University of California, Los Angeles, USA
|
181
|
de Graaf CA, van Steensel B. Chromatin organization: form to function. Curr Opin Genet Dev 2013; 23:185-90. [DOI: 10.1016/j.gde.2012.11.011] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 10/15/2012] [Revised: 11/12/2012] [Accepted: 11/19/2012] [Indexed: 11/17/2022]
|
182
|
Van Nostrand EL, Kim SK. Integrative analysis of C. elegans modENCODE ChIP-seq data sets to infer gene regulatory interactions. Genome Res 2013; 23:941-53. [PMID: 23531767 PMCID: PMC3668362 DOI: 10.1101/gr.152876.112] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 11/24/2022]
Abstract
The C. elegans modENCODE Consortium has defined in vivo binding sites for a large array of transcription factors by ChIP-seq. In this article, we present examples that illustrate how this compendium of ChIP-seq data can drive biological insights not possible with analysis of individual factors. First, we analyze the number of independent factors bound to the same locus, termed transcription factor complexity, and find that low-complexity sites are more likely to respond to altered expression of a single bound transcription factor. Next, we show that comparison of binding sites for the same factor across developmental stages can reveal insight into the regulatory network of that factor, as we find that the transcription factor UNC-62 has distinct binding profiles at different stages due to distinct cofactor co-association as well as tissue-specific alternative splicing. Finally, we describe an approach to infer potential regulators of gene expression changes found in profiling experiments (such as DNA microarrays) by screening these altered genes to identify significant enrichment for targets of a transcription factor identified in ChIP-seq data sets. After confirming that this approach can correctly identify the upstream regulator on expression data sets for which the regulator was previously known, we applied this approach to identify novel candidate regulators of transcriptional changes with age. The analysis revealed nine candidate aging regulators, of which three were previously known to have a role in longevity. We experimentally showed that two of the new candidate aging regulators can extend lifespan when overexpressed, indicating that this approach can identify novel functional regulators of complex processes.
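The enrichment screen described in this abstract can be illustrated with a minimal sketch. The abstract does not say which statistical test the authors used; the hypergeometric test below is a common choice for this kind of overlap question and is an assumption here, as are the function name and all the numbers.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_targets, n_changed, n_overlap):
    """Upper-tail hypergeometric p-value (hypothetical helper).

    n_genes   : total number of genes considered
    n_targets : genes bound by the transcription factor (ChIP-seq targets)
    n_changed : genes with altered expression in the profiling experiment
    n_overlap : altered genes that are also targets

    Returns P(X >= n_overlap) when n_changed genes are drawn without
    replacement from n_genes, of which n_targets are targets.
    """
    upper = min(n_targets, n_changed)
    total = comb(n_genes, n_changed)
    tail = sum(
        comb(n_targets, k) * comb(n_genes - n_targets, n_changed - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Illustrative numbers only (not taken from the paper).
p = hypergeom_enrichment_p(n_genes=10, n_targets=4, n_changed=3, n_overlap=2)
print(f"enrichment p-value: {p:.4f}")
```

A small p-value would suggest that the factor's targets are over-represented among the altered genes; in practice the p-values from screening many factors would also need multiple-testing correction.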
Affiliation(s)
- Eric L Van Nostrand
- Department of Genetics and Department of Developmental Biology, Stanford University Medical Center, Stanford, California 94305, USA
|
183
|
Klein K, Zanger UM. Pharmacogenomics of Cytochrome P450 3A4: Recent Progress Toward the "Missing Heritability" Problem. Front Genet 2013; 4:12. [PMID: 23444277 PMCID: PMC3580761 DOI: 10.3389/fgene.2013.00012] [Citation(s) in RCA: 161] [Impact Index Per Article: 14.6] [Received: 10/18/2012] [Accepted: 01/26/2013] [Indexed: 12/19/2022] Open
Abstract
CYP3A4 is the most important drug metabolizing enzyme in adult humans because of its prominent expression in liver and gut and because of its broad substrate specificity, which includes drugs from most therapeutic categories and many endogenous substances. Expression and function of CYP3A4 vary extensively both intra- and interindividually thus contributing to unpredictable drug response and toxicity. A multitude of environmental, genetic, and physiological factors are known to influence CYP3A4 expression and activity. Among the best predictable sources of variation are drug–drug interactions, which are either caused by pregnane X-receptor (PXR), constitutive androstane receptor (CAR) mediated gene induction, or by inhibition through coadministered drugs or other chemicals, including also plant and food ingredients. Among physiological and pathophysiological factors are hormonal status, age, and gender, the latter of which was shown to result in higher levels in females compared to males, as well as inflammatory processes that downregulate CYP3A4 transcription. Despite the influence of these non-genetic factors, the genetic influence on CYP3A4 activity was estimated in previous twin studies and using information on repeated drug administration to account for 66% up to 88% of the interindividual variation. Although many single nucleotide polymorphisms (SNPs) within the CYP3A locus have been identified, genetic association studies have so far failed to explain a major part of the phenotypic variability. The term “missing heritability” has been used to denominate the gap between expected and known genetic contribution, e.g., for complex diseases, and is also used here in analogy. 
In this review we summarize CYP3A4 pharmacogenetics/genomics from the early inheritance estimations up to the most recent genetic and clinical studies, including new findings about SNPs in CYP3A4 (*22) and other genes (P450 oxidoreductase (POR), peroxisome proliferator-activated receptor alpha (PPARA)) with possible contribution to CYP3A4 variable expression.
Affiliation(s)
- Kathrin Klein
- Dr. Margarete Fischer Bosch Institute of Clinical Pharmacology, Stuttgart, Germany; University of Tübingen, Tübingen, Germany
|
184
|
Kindt ASD, Navarro P, Semple CAM, Haley CS. The genomic signature of trait-associated variants. BMC Genomics 2013; 14:108. [PMID: 23418889 PMCID: PMC3600003 DOI: 10.1186/1471-2164-14-108] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 07/12/2012] [Accepted: 02/11/2013] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Genome-wide association studies have identified thousands of SNP variants associated with hundreds of phenotypes. For most associations the causal variants and the molecular mechanisms underlying pathogenesis remain unknown. Exploration of the underlying functional annotations of trait-associated loci has thrown some light on their potential roles in pathogenesis. However, there are some shortcomings of the methods used to date, which may undermine efforts to prioritize variants for further analyses. Here, we introduce and apply novel methods to rigorously identify annotation classes showing enrichment or depletion of trait-associated variants taking into account the underlying associations due to co-location of different functional annotations and linkage disequilibrium. RESULTS We assessed enrichment and depletion of variants in publicly available annotation classes such as genic regions, regulatory features, measures of conservation, and patterns of histone modifications. We used logistic regression to build a multivariate model that identified the most influential functional annotations for trait-association status of genome-wide significant variants. SNPs associated with all of the enriched annotations were 8 times more likely to be trait-associated variants than SNPs annotated with none of them. Annotations associated with chromatin state together with prior knowledge of the existence of a local expression QTL (eQTL) were the most important factors in the final logistic regression model. Surprisingly, despite the widespread use of evolutionary conservation to prioritize variants for study we find only modest enrichment of trait-associated SNPs in conserved regions. CONCLUSION We established odds ratios of functional annotations that are more likely to contain significantly trait-associated SNPs, for the purpose of prioritizing GWAS hits for further studies. 
Additionally, we estimated the relative and combined influence of the different genomic annotations, which may facilitate future prioritization methods by adding substantial information.
Affiliation(s)
- Alida S D Kindt
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, EH4 2XU, Edinburgh, UK
| | - Pau Navarro
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, EH4 2XU, Edinburgh, UK
| | - Colin A M Semple
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, EH4 2XU, Edinburgh, UK
| | - Chris S Haley
- MRC Human Genetics Unit, MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, EH4 2XU, Edinburgh, UK
|
185
|
Abstract
Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.
Affiliation(s)
- Elaine R Mardis
- The Genome Institute at Washington University School of Medicine, St. Louis, Missouri 63108, USA.
|
186
|
Hoffman MM, Ernst J, Wilder SP, Kundaje A, Harris RS, Libbrecht M, Giardine B, Ellenbogen PM, Bilmes JA, Birney E, Hardison RC, Dunham I, Kellis M, Noble WS. Integrative annotation of chromatin elements from ENCODE data. Nucleic Acids Res 2012; 41:827-41. [PMID: 23221638 PMCID: PMC3553955 DOI: 10.1093/nar/gks1284] [Citation(s) in RCA: 369] [Impact Index Per Article: 30.8] [Indexed: 02/01/2023] Open
Abstract
The ENCODE Project has generated a wealth of experimental information mapping diverse chromatin properties in several human cell lines. Although each such data track is independently informative toward the annotation of regulatory elements, their interrelations contain much richer information for the systematic annotation of regulatory elements. To uncover these interrelations and to generate an interpretable summary of the massive datasets of the ENCODE Project, we apply unsupervised learning methodologies, converting dozens of chromatin datasets into discrete annotation maps of regulatory regions and other chromatin elements across the human genome. These methods rediscover and summarize diverse aspects of chromatin architecture, elucidate the interplay between chromatin activity and RNA transcription, and reveal that a large proportion of the genome lies in a quiescent state, even across multiple cell types. The resulting annotations of non-coding regulatory elements correlate strongly with mammalian evolutionary constraint, and provide an unbiased approach for evaluating metrics of evolutionary constraint in human. Lastly, we use the regulatory annotations to revisit previously uncharacterized disease-associated loci, resulting in focused, testable hypotheses through the lens of the chromatin landscape.
Affiliation(s)
- Michael M Hoffman
- Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, WA 98195-5065, USA
|
187
|
Seemab U, Ain QU, Nawaz MS, Saeed Z, Rashid S. TrFAST: a tool to predict signaling pathway-specific transcription factor binding sites. GENOMICS PROTEOMICS & BIOINFORMATICS 2012; 10:354-9. [PMID: 23317703 PMCID: PMC5054711 DOI: 10.1016/j.gpb.2012.06.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2012] [Revised: 06/10/2012] [Accepted: 06/12/2012] [Indexed: 11/30/2022]
Abstract
Recent advances in the development of high-throughput tools have revolutionized our understanding of the molecular mechanisms underlying normal and dysfunctional biological processes. Here we present a novel computational tool, transcription factor search and analysis tool (TrFAST), which was developed for the in silico analysis of transcription factor binding sites (TFBSs) of signaling pathway-specific TFs. TrFAST facilitates searching as well as comparative analysis of regulatory motifs through an exact pattern matching algorithm followed by the graphical representation of matched binding sites in multiple sequences up to 50 kb in length. TrFAST reduces the number of comparisons through its exact pattern matching strategy. In contrast to the pre-existing tools that find TFBSs in a single sequence, TrFAST seeks out the desired pattern in multiple sequences simultaneously. It counts the GC content within the given multiple sequence data set and assembles the combinational details of consensus sequence(s) located at these regions, thereby generating a visual display based on the abundance of unique patterns. Comparative regulatory region analysis of multiple orthologous sequences simultaneously enhances the features of TrFAST and provides significant insight into the conservation of non-coding cis-regulatory elements. TrFAST is freely available at http://www.fi-pk.com/trfast.html.
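As a rough illustration of the two operations this abstract mentions, exact pattern matching across several sequences and GC-content counting, the sketch below uses plain Python string search. It is not TrFAST's implementation; the function names, the example motif, and the sequences are hypothetical.

```python
def find_exact_matches(sequences, pattern):
    """Return {name: [0-based start positions]} for exact, possibly
    overlapping occurrences of pattern in each named sequence."""
    pattern = pattern.upper()
    hits = {}
    for name, seq in sequences.items():
        seq = seq.upper()
        positions = []
        start = seq.find(pattern)
        while start != -1:
            positions.append(start)
            # Step one base forward so overlapping matches are reported.
            start = seq.find(pattern, start + 1)
        hits[name] = positions
    return hits


def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical input: two short sequences and one motif.
seqs = {"seq_a": "TTGACGTCATTGACGTCA", "seq_b": "cccc"}
print(find_exact_matches(seqs, "TGACGTCA"))
print({name: round(gc_content(s), 3) for name, s in seqs.items()})
```

Searching every sequence in one pass over a dictionary mirrors the multi-sequence comparison the abstract describes; a production tool would more likely use an indexed or automaton-based matcher for 50 kb inputs.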
Affiliation(s)
- Umair Seemab
- National Centre for Bioinformatics, Quaid-i-Azam University, Islamabad 44000, Pakistan.
|
188
|
Marsman J, Horsfield JA. Long distance relationships: enhancer-promoter communication and dynamic gene transcription. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2012; 1819:1217-27. [PMID: 23124110 DOI: 10.1016/j.bbagrm.2012.10.008] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.8] [Received: 08/13/2012] [Revised: 10/18/2012] [Accepted: 10/22/2012] [Indexed: 11/27/2022]
Abstract
The three-dimensional regulation of gene transcription involves loop formation between enhancer and promoter elements, controlling spatiotemporal gene expression in multicellular organisms. Enhancers are usually located in non-coding DNA and can activate gene transcription by recruiting transcription factors, chromatin remodeling factors and RNA Polymerase II. Research over the last few years has revealed that enhancers have tell-tale characteristics that facilitate their detection by several approaches, although the hallmarks of enhancers are not always uniform. Enhancers likely play an important role in the activation of genes by functioning as a primary point of contact for transcriptional activators, and by making physical contact with gene promoters often by means of a chromatin loop. Although numerous transcriptional regulators participate in the formation of chromatin loops that bring enhancers into proximity with promoters, the mechanism(s) of enhancer-promoter connectivity remain enigmatic. Here we discuss enhancer function, review some of the many proteins shown to be involved in establishing enhancer-promoter loops, and describe the dynamics of enhancer-promoter contacts during development, differentiation and in specific cell types.
Affiliation(s)
- Judith Marsman
- Department of Pathology, The University of Otago, Dunedin, New Zealand
|
189
|
Architecture of the human regulatory network derived from ENCODE data. Nature 2012; 489:91-100. [PMID: 22955619 DOI: 10.1038/nature11245] [Citation(s) in RCA: 1086] [Impact Index Per Article: 90.5] [Received: 12/09/2011] [Accepted: 05/22/2012] [Indexed: 12/21/2022]
Abstract
Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial, co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.
|
190
|
Pei B, Sisu C, Frankish A, Howald C, Habegger L, Mu XJ, Harte R, Balasubramanian S, Tanzer A, Diekhans M, Reymond A, Hubbard TJ, Harrow J, Gerstein MB. The GENCODE pseudogene resource. Genome Biol 2012; 13:R51. [PMID: 22951037 PMCID: PMC3491395 DOI: 10.1186/gb-2012-13-9-r51] [Citation(s) in RCA: 253] [Impact Index Per Article: 21.1] [Received: 03/23/2012] [Revised: 05/30/2012] [Accepted: 06/25/2012] [Indexed: 12/11/2022] Open
Abstract
Background Pseudogenes have long been considered as nonfunctional genomic sequences. However, recent evidence suggests that many of them might have some form of biological activity, and the possibility of functionality has increased interest in their accurate annotation and integration with functional genomics data. Results As part of the GENCODE annotation of the human genome, we present the first genome-wide pseudogene assignment for protein-coding genes, based on both large-scale manual annotation and in silico pipelines. A key aspect of this coupled approach is that it allows us to identify pseudogenes in an unbiased fashion as well as untangle complex events through manual evaluation. We integrate the pseudogene annotations with the extensive ENCODE functional genomics information. In particular, we determine the expression level, transcription-factor and RNA polymerase II binding, and chromatin marks associated with each pseudogene. Based on their distribution, we develop simple statistical models for each type of activity, which we validate with large-scale RT-PCR-Seq experiments. Finally, we compare our pseudogenes with conservation and variation data from primate alignments and the 1000 Genomes project, producing lists of pseudogenes potentially under selection. Conclusions At one extreme, some pseudogenes possess conventional characteristics of functionality; these may represent genes that have recently died. On the other hand, we find interesting patterns of partial activity, which may suggest that dead genes are being resurrected as functioning non-coding RNAs. The activity data of each pseudogene are stored in an associated resource, psiDR, which will be useful for the initial identification of potentially functional pseudogenes.
Affiliation(s)
- Baikang Pei
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
|
191
|
Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi AM, Tanzer A, Lagarde J, Lin W, Schlesinger F, Xue C, Marinov GK, Khatun J, Williams BA, Zaleski C, Rozowsky J, Röder M, Kokocinski F, Abdelhamid RF, Alioto T, Antoshechkin I, Baer MT, Bar NS, Batut P, Bell K, Bell I, Chakrabortty S, Chen X, Chrast J, Curado J, Derrien T, Drenkow J, Dumais E, Dumais J, Duttagupta R, Falconnet E, Fastuca M, Fejes-Toth K, Ferreira P, Foissac S, Fullwood MJ, Gao H, Gonzalez D, Gordon A, Gunawardena H, Howald C, Jha S, Johnson R, Kapranov P, King B, Kingswood C, Luo OJ, Park E, Persaud K, Preall JB, Ribeca P, Risk B, Robyr D, Sammeth M, Schaffer L, See LH, Shahab A, Skancke J, Suzuki AM, Takahashi H, Tilgner H, Trout D, Walters N, Wang H, Wrobel J, Yu Y, Ruan X, Hayashizaki Y, Harrow J, Gerstein M, Hubbard T, Reymond A, Antonarakis SE, Hannon G, Giddings MC, Ruan Y, Wold B, Carninci P, Guigó R, Gingeras TR. Landscape of transcription in human cells. Nature 2012; 489:101-8. [PMID: 22955620 PMCID: PMC3684276 DOI: 10.1038/nature11233] [Citation(s) in RCA: 3828] [Impact Index Per Article: 319.0] [Received: 12/10/2011] [Accepted: 05/15/2012] [Indexed: 02/07/2023]
Abstract
Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.
Affiliation(s)
- Sarah Djebali
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Carrie A. Davis
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Angelika Merkel
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Alex Dobin
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Timo Lassmann
- RIKEN Yokohama Institute, RIKEN Omics Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa Japan 230-0045
| | - Ali M. Mortazavi
- California Institute of Technology, Division of Biology, 91125. 2 Beckman Institute, Pasadena, CA USA 91125
- University of California Irvine, Dept. of Developmental and Cell Biology, 2300 Biological Sciences III, Irvine, CA USA 92697
| | - Andrea Tanzer
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Julien Lagarde
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Wei Lin
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Felix Schlesinger
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Chenghai Xue
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Georgi K. Marinov
- California Institute of Technology, Division of Biology, 91125. 2 Beckman Institute, Pasadena, CA USA 91125
| | - Jainab Khatun
- Boise State University, College of Arts & Sciences, 1910 University Dr. Boise, ID USA 83725
| | - Brian A. Williams
- California Institute of Technology, Division of Biology, 91125. 2 Beckman Institute, Pasadena, CA USA 91125
| | - Chris Zaleski
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Joel Rozowsky
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520
| | - Maik Röder
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Felix Kokocinski
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire United Kingdom CB10 1SA
| | - Rehab F. Abdelhamid
- RIKEN Yokohama Institute, RIKEN Omics Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa Japan 230-0045
| | - Tyler Alioto
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Igor Antoshechkin
- California Institute of Technology, Division of Biology, 91125. 2 Beckman Institute, Pasadena, CA USA 91125
| | - Michael T. Baer
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Nadav S. Bar
- Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
| | - Philippe Batut
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Kimberly Bell
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Ian Bell
- Affymetrix, Inc, 3380 Central Expressway, Santa Clara, CA. USA 95051
| | - Sudipto Chakrabortty
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Xian Chen
- University of North Carolina at Chapel Hill, Department of Biochemistry & Biophysics, 120 Mason Farm Rd., Chapel Hill, NC USA 27599
| | - Jacqueline Chrast
- University of Lausanne, Center for Integrative Genomics, Genopode building, Lausanne, Switzerland 1015
| | - Joao Curado
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Thomas Derrien
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88 . Barcelona, Catalunya, Spain 08003
| | - Jorg Drenkow
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Erica Dumais
- Affymetrix, Inc, 3380 Central Expressway, Santa Clara, CA. USA 95051
| | - Jacqueline Dumais
- Affymetrix, Inc, 3380 Central Expressway, Santa Clara, CA. USA 95051
| | - Radha Duttagupta
- Affymetrix, Inc, 3380 Central Expressway, Santa Clara, CA. USA 95051
| | - Emilie Falconnet
- University of Geneva Medical School, Department of Genetic Medicine and Development and iGE3 Institute of Genetics and Genomics of Geneva, 1 rue Michel-Servet, Geneva, Switzerland 1015
| | - Meagan Fastuca
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Kata Fejes-Toth
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Pedro Ferreira
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88. Barcelona, Catalunya, Spain 08003
| | - Sylvain Foissac
- Affymetrix, Inc, 3380 Central Expressway, Santa Clara, CA. USA 95051
| | - Melissa J. Fullwood
- Genome Institute of Singapore, Genome Technology and Biology, 60 Biopolis Street, #02-01, Genome, Singapore, Singapore 138672
| | - Hui Gao
- Affymetrix, Inc, 3380 Central Expressway, Santa Clara, CA. USA 95051
| | - David Gonzalez
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88. Barcelona, Catalunya, Spain 08003
| | - Assaf Gordon
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Harsha Gunawardena
- University of North Carolina at Chapel Hill, Department of Biochemistry & Biophysics, 120 Mason Farm Rd., Chapel Hill, NC USA 27599
| | - Cedric Howald
- University of Lausanne, Center for Integrative Genomics, Genopode building, Lausanne, Switzerland 1015
| | - Sonali Jha
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Rory Johnson
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88. Barcelona, Catalunya, Spain 08003
| | - Philipp Kapranov
- Affymetrix, Inc, 3380 Central Expressway, Santa Clara, CA. USA 95051
- St. Laurent Institute, One Kendall Square, Cambridge, MA
| | - Brandon King
- California Institute of Technology, Division of Biology, 91125. 2 Beckman Institute, Pasadena, CA USA 91125
| | - Colin Kingswood
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88. Barcelona, Catalunya, Spain 08003
| | - Oscar J. Luo
- Genome Institute of Singapore, Genome Technology and Biology, 60 Biopolis Street, #02-01, Genome, Singapore, Singapore 138672
| | - Eddie Park
- University of California Irvine, Dept. of Developmental and Cell Biology, 2300 Biological Sciences III, Irvine, CA USA 92697
| | - Kimberly Persaud
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Jonathan B. Preall
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Paolo Ribeca
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88. Barcelona, Catalunya, Spain 08003
| | - Brian Risk
- Boise State University, College of Arts & Sciences, 1910 University Dr. Boise, ID USA 83725
| | - Daniel Robyr
- University of Geneva Medical School, Department of Genetic Medicine and Development and iGE3 Institute of Genetics and Genomics of Geneva, 1 rue Michel-Servet, Geneva, Switzerland 1015
| | - Michael Sammeth
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88. Barcelona, Catalunya, Spain 08003
| | - Lorian Schaffer
- California Institute of Technology, Division of Biology, 91125. 2 Beckman Institute, Pasadena, CA USA 91125
| | - Lei-Hoon See
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Atif Shahab
- Genome Institute of Singapore, Genome Technology and Biology, 60 Biopolis Street, #02-01, Genome, Singapore, Singapore 138672
| | - Jorgen Skancke
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88. Barcelona, Catalunya, Spain 08003
- Department of Chemical Engineering, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
| | - Ana Maria Suzuki
- RIKEN Yokohama Institute, RIKEN Omics Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa Japan 230-0045
| | - Hazuki Takahashi
- RIKEN Yokohama Institute, RIKEN Omics Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa Japan 230-0045
| | - Hagen Tilgner
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88. Barcelona, Catalunya, Spain 08003
| | - Diane Trout
- California Institute of Technology, Division of Biology, 91125. 2 Beckman Institute, Pasadena, CA USA 91125
| | - Nathalie Walters
- University of Lausanne, Center for Integrative Genomics, Genopode building, Lausanne, Switzerland 1015
| | - Huaien Wang
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - John Wrobel
- Boise State University, College of Arts & Sciences, 1910 University Dr. Boise, ID USA 83725
| | - Yanbao Yu
- University of North Carolina at Chapel Hill, Department of Biochemistry & Biophysics, 120 Mason Farm Rd., Chapel Hill, NC USA 27599
| | - Xiaoan Ruan
- Genome Institute of Singapore, Genome Technology and Biology, 60 Biopolis Street, #02-01, Genome, Singapore, Singapore 138672
| | - Yoshihide Hayashizaki
- RIKEN Yokohama Institute, RIKEN Omics Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa Japan 230-0045
| | - Jennifer Harrow
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire United Kingdom CB10 1SA
| | - Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520
- Department of Computer Science, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520
| | - Tim Hubbard
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire United Kingdom CB10 1SA
| | - Alexandre Reymond
- University of Lausanne, Center for Integrative Genomics, Genopode building, Lausanne, Switzerland 1015
| | - Stylianos E. Antonarakis
- University of Geneva Medical School, Department of Genetic Medicine and Development and iGE3 Institute of Genetics and Genomics of Geneva, 1 rue Michel-Servet, Geneva, Switzerland 1015
| | - Gregory Hannon
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
| | - Morgan C. Giddings
- Boise State University, College of Arts & Sciences, 1910 University Dr. Boise, ID USA 83725
- University of North Carolina at Chapel Hill, Department of Biochemistry & Biophysics, 120 Mason Farm Rd., Chapel Hill, NC USA 27599
| | - Yijun Ruan
- Genome Institute of Singapore, Genome Technology and Biology, 60 Biopolis Street, #02-01, Genome, Singapore, Singapore 138672
| | - Barbara Wold
- California Institute of Technology, Division of Biology, 91125. 2 Beckman Institute, Pasadena, CA USA 91125
| | - Piero Carninci
- RIKEN Yokohama Institute, RIKEN Omics Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa Japan 230-0045
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88. Barcelona, Catalunya, Spain 08003
| | - Thomas R. Gingeras
- Cold Spring Harbor Laboratory, Functional Genomics, 1 Bungtown Rd. Cold Spring Harbor, NY, USA 11742
- Affymetrix, Inc, 3380 Central Expressway, Santa Clara, CA. USA 95051
| |
|
192
|
Abstract
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
|