1
Breeze CE, Reynolds AP, van Dongen J, Dunham I, Lazar J, Neph S, Vierstra J, Bourque G, Teschendorff AE, Stamatoyannopoulos JA, Beck S. eFORGE v2.0: updated analysis of cell type-specific signal in epigenomic data. Bioinformatics 2020; 35:4767-4769. [PMID: 31161210 PMCID: PMC6853678 DOI: 10.1093/bioinformatics/btz456] [Citation(s) in RCA: 68] [Impact Index Per Article: 17.0] [Received: 11/11/2018] [Revised: 04/24/2019] [Accepted: 05/29/2019] [Indexed: 12/31/2022]
Abstract
SUMMARY: The Illumina Infinium EPIC BeadChip is a new high-throughput array for DNA methylation analysis, extending the earlier 450k array by over 400 000 new sites. Previously, a method named eFORGE was developed to provide insights into cell type-specific and cell-composition effects for 450k data. Here, we present a significantly updated and improved version of eFORGE that can analyze both EPIC and 450k array data. New features include analysis of chromatin states, transcription factor motifs and DNase I footprints, providing tools for epigenome-wide association study interpretation and epigenome editing.
AVAILABILITY AND IMPLEMENTATION: eFORGE v2.0 is implemented as a web tool available from https://eforge.altiusinstitute.org and https://eforge-tf.altiusinstitute.org/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Charles E Breeze
  - Medical Genomics Group, UCL Cancer Institute, University College London, London WC1E 6BT, UK; Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
- Alex P Reynolds
  - Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
- Jenny van Dongen
  - Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam 1081BT, The Netherlands
- Ian Dunham
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge CB10 1SD, UK
- John Lazar
  - Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
- Shane Neph
  - Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
- Jeff Vierstra
  - Altius Institute for Biomedical Sciences, Seattle, WA 98121, USA
- Guillaume Bourque
  - Department of Human Genetics, McGill University and Génome Québec Innovation Center, Montréal H3A 0G1, Canada
- Andrew E Teschendorff
  - CAS Key Lab of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; Statistical Genomics Group, UCL Cancer Institute, University College London, London WC1E 6BT, UK
- Stephan Beck
  - Medical Genomics Group, UCL Cancer Institute, University College London, London WC1E 6BT, UK
2
Sullivan AM, Arsovski AA, Thompson A, Sandstrom R, Thurman RE, Neph S, Johnson AK, Sullivan ST, Sabo PJ, Neri FV, Weaver M, Diegel M, Nemhauser JL, Stamatoyannopoulos JA, Bubb KL, Queitsch C. Mapping and Dynamics of Regulatory DNA in Maturing Arabidopsis thaliana Siliques. Front Plant Sci 2019; 10:1434. [PMID: 31798605 PMCID: PMC6868056 DOI: 10.3389/fpls.2019.01434] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 06/21/2019] [Accepted: 10/16/2019] [Indexed: 05/04/2023]
Abstract
The genome is reprogrammed during development to produce diverse cell types, largely through altered expression and activity of key transcription factors. The accessibility and critical functions of epidermal cells have made them a model for connecting transcriptional events to development in a range of model systems. In Arabidopsis thaliana and many other plants, fertilization triggers differentiation of specialized epidermal seed coat cells that have a unique morphology caused by large extracellular deposits of polysaccharides. Here, we used DNase I-seq to generate regulatory landscapes of A. thaliana seeds at two critical time points in seed coat maturation (4 and 7 DPA), enriching for seed coat cells with the INTACT method. We found over 3,000 developmentally dynamic regulatory DNA elements and explored their relationship with nearby gene expression. The dynamic regulatory elements were enriched for motifs for several transcription factor families, most notably the TCP family at the earlier time point and the MYB family at the later one. To assess the extent to which the observed regulatory sites in seeds added to previously known regulatory sites in A. thaliana, we compared our data to 11 other data sets generated with 7-day-old seedlings for diverse tissues and conditions. Surprisingly, over a quarter of the regulatory, i.e. accessible, bases observed in seeds were novel. Notably, plant regulatory landscapes from different tissues, cell types, or developmental stages were more dynamic than those generated from bulk tissue in response to environmental perturbations, highlighting the importance of extending studies of regulatory DNA to single tissues and cell types during development.
Affiliation(s)
- Andrej A. Arsovski
  - Department of Biology, University of Washington, Seattle, WA, United States
- Agnieszka Thompson
  - Department of Genome Sciences, University of Washington, Seattle, WA, United States
- Richard Sandstrom
  - Department of Genome Sciences, University of Washington, Seattle, WA, United States
- Robert E. Thurman
  - Department of Genome Sciences, University of Washington, Seattle, WA, United States
- Shane Neph
  - Department of Genome Sciences, University of Washington, Seattle, WA, United States
- Audra K. Johnson
  - Department of Genome Sciences, University of Washington, Seattle, WA, United States
- Shawn T. Sullivan
  - Department of Genome Sciences, University of Washington, Seattle, WA, United States
- Peter J. Sabo
  - Department of Genome Sciences, University of Washington, Seattle, WA, United States
- Fidencio V. Neri
  - Department of Genome Sciences, University of Washington, Seattle, WA, United States
- Molly Weaver
  - Department of Genome Sciences, University of Washington, Seattle, WA, United States
- Morgan Diegel
  - Department of Genome Sciences, University of Washington, Seattle, WA, United States
- Kerry L. Bubb
  - Department of Genome Sciences, University of Washington, Seattle, WA, United States
  - *Correspondence: Kerry L. Bubb,
- Christine Queitsch
  - Department of Genome Sciences, University of Washington, Seattle, WA, United States
3
Abstract
The bulk of modern genomics research includes, in part, analyses of large data sets, such as those derived from high resolution, high-throughput experiments, that make computations challenging. The BEDOPS toolkit offers a broad spectrum of fundamental analysis capabilities to query, operate on, and compare quantitatively genomic data sets of any size and number. The toolkit facilitates the construction of complex analysis pipelines that remain efficient in both memory and time by chaining together combinations of its complementary components. The principal utilities accept raw or compressed data in a flexible format, and they provide built-in features to expedite parallel computations.
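As a rough illustration of the kind of streaming set operation this abstract describes, the sketch below merges overlapping intervals from a sorted, BED-like input using Python generators, so that stages can be chained without holding the whole data set in memory. It is not BEDOPS code and does not reflect how BEDOPS is implemented; the function names `parse_bed` and `merge_sorted`, the sample coordinates, and the assumption that input is already sorted by chromosome and start position are all chosen here for illustration only.

```python
from typing import Iterable, Iterator, Tuple

# (chromosome, start, end); half-open coordinates, as in the BED format
Interval = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> Iterator[Interval]:
    """Yield (chrom, start, end) from BED-like lines, skipping short or blank lines."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            yield fields[0], int(fields[1]), int(fields[2])


def merge_sorted(intervals: Iterable[Interval]) -> Iterator[Interval]:
    """Merge overlapping or adjacent intervals from a stream sorted by (chrom, start).

    Only one pending interval is kept at a time, so memory use is constant
    regardless of input size.
    """
    current = None
    for chrom, start, end in intervals:
        if current is not None and chrom == current[0] and start <= current[2]:
            # Overlaps (or touches) the pending interval: extend it.
            current = (chrom, current[1], max(current[2], end))
        else:
            if current is not None:
                yield current
            current = (chrom, start, end)
    if current is not None:
        yield current


# Hypothetical sample data, already sorted by chromosome and start.
example = [
    "chr1\t10\t20",
    "chr1\t15\t30",
    "chr1\t40\t50",
    "chr2\t5\t8",
]

# Stages are chained lazily: parsing feeds merging one record at a time.
merged = list(merge_sorted(parse_bed(example)))
print(merged)
```

Because each stage is a generator, further hypothetical stages (filtering, counting) could be appended to the chain in the same way, which is the general idea behind composing small utilities into a pipeline.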
Affiliation(s)
- Shane Neph
  - Department of Genome Sciences, Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, Seattle, WA, 98195, USA
- Alex P Reynolds
  - Department of Genome Sciences, Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, Seattle, WA, 98195, USA
- M Scott Kuehn
  - Opower Inc., 760 Market Street, San Francisco, CA, 94102, USA
- John A Stamatoyannopoulos
  - Altius Institute for Biomedical Sciences, 2211 Elliott Avenue, Seattle, WA, 98195, USA.
  - Department of Medicine, University of Washington, Seattle, WA, USA.
  - Department of Genome Sciences, University of Washington, Seattle, WA, USA.
4
Yue F, Cheng Y, Breschi A, Vierstra J, Wu W, Ryba T, Sandstrom R, Ma Z, Davis C, Pope BD, Shen Y, Pervouchine DD, Djebali S, Thurman RE, Kaul R, Rynes E, Kirilusha A, Marinov GK, Williams BA, Trout D, Amrhein H, Fisher-Aylor K, Antoshechkin I, DeSalvo G, See LH, Fastuca M, Drenkow J, Zaleski C, Dobin A, Prieto P, Lagarde J, Bussotti G, Tanzer A, Denas O, Li K, Bender MA, Zhang M, Byron R, Groudine MT, McCleary D, Pham L, Ye Z, Kuan S, Edsall L, Wu YC, Rasmussen MD, Bansal MS, Kellis M, Keller CA, Morrissey CS, Mishra T, Jain D, Dogan N, Harris RS, Cayting P, Kawli T, Boyle AP, Euskirchen G, Kundaje A, Lin S, Lin Y, Jansen C, Malladi VS, Cline MS, Erickson DT, Kirkup VM, Learned K, Sloan CA, Rosenbloom KR, Lacerda de Sousa B, Beal K, Pignatelli M, Flicek P, Lian J, Kahveci T, Lee D, Kent WJ, Ramalho Santos M, Herrero J, Notredame C, Johnson A, Vong S, Lee K, Bates D, Neri F, Diegel M, Canfield T, Sabo PJ, Wilken MS, Reh TA, Giste E, Shafer A, Kutyavin T, Haugen E, Dunn D, Reynolds AP, Neph S, Humbert R, Hansen RS, De Bruijn M, Selleri L, Rudensky A, Josefowicz S, Samstein R, Eichler EE, Orkin SH, Levasseur D, Papayannopoulou T, Chang KH, Skoultchi A, Gosh S, Disteche C, Treuting P, Wang Y, Weiss MJ, Blobel GA, Cao X, Zhong S, Wang T, Good PJ, Lowdon RF, Adams LB, Zhou XQ, Pazin MJ, Feingold EA, Wold B, Taylor J, Mortazavi A, Weissman SM, Stamatoyannopoulos JA, Snyder MP, Guigo R, Gingeras TR, Gilbert DM, Hardison RC, Beer MA, Ren B. A comparative encyclopedia of DNA elements in the mouse genome. Nature 2015; 515:355-64. [PMID: 25409824 PMCID: PMC4266106 DOI: 10.1038/nature13992] [Citation(s) in RCA: 1135] [Impact Index Per Article: 126.1] [Received: 02/03/2014] [Accepted: 10/24/2014] [Indexed: 12/11/2022]
Abstract
The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.
Affiliation(s)
- Feng Yue
  - [1] Ludwig Institute for Cancer Research and University of California, San Diego School of Medicine, 9500 Gilman Drive, La Jolla, California 92093, USA. [2] Department of Biochemistry and Molecular Biology, College of Medicine, The Pennsylvania State University, Hershey, Pennsylvania 17033, USA
- Yong Cheng
  - Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477 Stanford, California 94305, USA
- Alessandra Breschi
  - Bioinformatics and Genomics, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, 08003 Barcelona, Catalonia, Spain
- Jeff Vierstra
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Weisheng Wu
  - Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Tyrone Ryba
  - Department of Biological Science, 319 Stadium Drive, Florida State University, Tallahassee, Florida 32306-4295, USA
- Richard Sandstrom
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Zhihai Ma
  - Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477 Stanford, California 94305, USA
- Carrie Davis
  - Functional Genomics, Cold Spring Harbor Laboratory, Bungtown Road, Cold Spring Harbor, New York 11724, USA
- Benjamin D Pope
  - Department of Biological Science, 319 Stadium Drive, Florida State University, Tallahassee, Florida 32306-4295, USA
- Yin Shen
  - Ludwig Institute for Cancer Research and University of California, San Diego School of Medicine, 9500 Gilman Drive, La Jolla, California 92093, USA
- Dmitri D Pervouchine
  - Bioinformatics and Genomics, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, 08003 Barcelona, Catalonia, Spain
- Sarah Djebali
  - Bioinformatics and Genomics, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, 08003 Barcelona, Catalonia, Spain
- Robert E Thurman
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Rajinder Kaul
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Eric Rynes
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Anthony Kirilusha
  - Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
- Georgi K Marinov
  - Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
- Brian A Williams
  - Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
- Diane Trout
  - Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
- Henry Amrhein
  - Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
- Katherine Fisher-Aylor
  - Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
- Igor Antoshechkin
  - Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
- Gilberto DeSalvo
  - Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
- Lei-Hoon See
  - Functional Genomics, Cold Spring Harbor Laboratory, Bungtown Road, Cold Spring Harbor, New York 11724, USA
- Meagan Fastuca
  - Functional Genomics, Cold Spring Harbor Laboratory, Bungtown Road, Cold Spring Harbor, New York 11724, USA
- Jorg Drenkow
  - Functional Genomics, Cold Spring Harbor Laboratory, Bungtown Road, Cold Spring Harbor, New York 11724, USA
- Chris Zaleski
  - Functional Genomics, Cold Spring Harbor Laboratory, Bungtown Road, Cold Spring Harbor, New York 11724, USA
- Alex Dobin
  - Functional Genomics, Cold Spring Harbor Laboratory, Bungtown Road, Cold Spring Harbor, New York 11724, USA
- Pablo Prieto
  - Bioinformatics and Genomics, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, 08003 Barcelona, Catalonia, Spain
- Julien Lagarde
  - Bioinformatics and Genomics, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, 08003 Barcelona, Catalonia, Spain
- Giovanni Bussotti
  - Bioinformatics and Genomics, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, 08003 Barcelona, Catalonia, Spain
- Andrea Tanzer
  - [1] Bioinformatics and Genomics, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, 08003 Barcelona, Catalonia, Spain. [2] Department of Theoretical Chemistry, Faculty of Chemistry, University of Vienna, Waehringerstrasse 17/3/303, A-1090 Vienna, Austria
- Olgert Denas
  - Departments of Biology and Mathematics and Computer Science, Emory University, O. Wayne Rollins Research Center, 1510 Clifton Road NE, Atlanta, Georgia 30322, USA
- Kanwei Li
  - Departments of Biology and Mathematics and Computer Science, Emory University, O. Wayne Rollins Research Center, 1510 Clifton Road NE, Atlanta, Georgia 30322, USA
- M A Bender
  - [1] Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA. [2] Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
- Miaohua Zhang
  - Basic Science Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
- Rachel Byron
  - Basic Science Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
- Mark T Groudine
  - [1] Basic Science Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA. [2] Department of Radiation Oncology, University of Washington, Seattle, Washington 98195, USA
- David McCleary
  - Ludwig Institute for Cancer Research and University of California, San Diego School of Medicine, 9500 Gilman Drive, La Jolla, California 92093, USA
- Long Pham
  - Ludwig Institute for Cancer Research and University of California, San Diego School of Medicine, 9500 Gilman Drive, La Jolla, California 92093, USA
- Zhen Ye
  - Ludwig Institute for Cancer Research and University of California, San Diego School of Medicine, 9500 Gilman Drive, La Jolla, California 92093, USA
- Samantha Kuan
  - Ludwig Institute for Cancer Research and University of California, San Diego School of Medicine, 9500 Gilman Drive, La Jolla, California 92093, USA
- Lee Edsall
  - Ludwig Institute for Cancer Research and University of California, San Diego School of Medicine, 9500 Gilman Drive, La Jolla, California 92093, USA
- Yi-Chieh Wu
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts 02139, USA
- Matthew D Rasmussen
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts 02139, USA
- Mukul S Bansal
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts 02139, USA
- Manolis Kellis
  - [1] Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts 02139, USA. [2] Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA
- Cheryl A Keller
  - Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Christapher S Morrissey
  - Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Tejaswini Mishra
  - Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Deepti Jain
  - Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Nergiz Dogan
  - Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Robert S Harris
  - Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Philip Cayting
  - Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477 Stanford, California 94305, USA
- Trupti Kawli
  - Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477 Stanford, California 94305, USA
- Alan P Boyle
  - Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477 Stanford, California 94305, USA
- Ghia Euskirchen
  - Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477 Stanford, California 94305, USA
- Anshul Kundaje
  - Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477 Stanford, California 94305, USA
- Shin Lin
  - Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477 Stanford, California 94305, USA
- Yiing Lin
  - Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477 Stanford, California 94305, USA
- Camden Jansen
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, California 92697, USA
- Venkat S Malladi
  - Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477 Stanford, California 94305, USA
- Melissa S Cline
  - Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), Santa Cruz, California 95064, USA
- Drew T Erickson
  - Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477 Stanford, California 94305, USA
- Vanessa M Kirkup
  - Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), Santa Cruz, California 95064, USA
- Katrina Learned
  - Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), Santa Cruz, California 95064, USA
- Cricket A Sloan
  - Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477 Stanford, California 94305, USA
- Kate R Rosenbloom
  - Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), Santa Cruz, California 95064, USA
- Beatriz Lacerda de Sousa
  - Departments of Obstetrics/Gynecology and Pathology, and Center for Reproductive Sciences, University of California San Francisco, San Francisco, California 94143, USA
- Kathryn Beal
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Miguel Pignatelli
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jin Lian
  - Yale University, Department of Genetics, PO Box 208005, 333 Cedar Street, New Haven, Connecticut 06520-8005, USA
- Tamer Kahveci
  - Computer & Information Sciences & Engineering, University of Florida, Gainesville, Florida 32611, USA
- Dongwon Lee
  - McKusick-Nathans Institute of Genetic Medicine and Department of Biomedical Engineering, Johns Hopkins University, 733 N. Broadway, BRB 573 Baltimore, Maryland 21205, USA
- W James Kent
  - Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), Santa Cruz, California 95064, USA
- Miguel Ramalho Santos
  - Departments of Obstetrics/Gynecology and Pathology, and Center for Reproductive Sciences, University of California San Francisco, San Francisco, California 94143, USA
- Javier Herrero
  - [1] European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. [2] Bill Lyons Informatics Centre, UCL Cancer Institute, University College London, London WC1E 6DD, UK
- Cedric Notredame
  - Bioinformatics and Genomics, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, 08003 Barcelona, Catalonia, Spain
- Audra Johnson
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Shinny Vong
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Kristen Lee
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Daniel Bates
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Fidencio Neri
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Morgan Diegel
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Theresa Canfield
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Peter J Sabo
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Matthew S Wilken
  - Department of Biological Structure, University of Washington, HSB I-516, 1959 NE Pacific Street, Seattle, Washington 98195, USA
- Thomas A Reh
  - Department of Biological Structure, University of Washington, HSB I-516, 1959 NE Pacific Street, Seattle, Washington 98195, USA
- Erika Giste
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Anthony Shafer
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Tanya Kutyavin
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Eric Haugen
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Douglas Dunn
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Alex P Reynolds
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Shane Neph
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Richard Humbert
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- R Scott Hansen
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Marella De Bruijn
  - MRC Molecular Haematology Unit, University of Oxford, Oxford OX3 9DS, UK
- Licia Selleri
  - Department of Cell and Developmental Biology, Weill Cornell Medical College, New York, New York 10065, USA
- Alexander Rudensky
  - HHMI and Ludwig Center at Memorial Sloan Kettering Cancer Center, Immunology Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
- Steven Josefowicz
  - HHMI and Ludwig Center at Memorial Sloan Kettering Cancer Center, Immunology Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
- Robert Samstein
  - HHMI and Ludwig Center at Memorial Sloan Kettering Cancer Center, Immunology Program, Memorial Sloan Kettering Cancer Center, New York, New York 10065, USA
- Evan E Eichler
  - Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Stuart H Orkin
  - Dana Farber Cancer Institute, Harvard Medical School, Cambridge, Massachusetts 02138, USA
- Dana Levasseur
  - University of Iowa Carver College of Medicine, Department of Internal Medicine, Iowa City, Iowa 52242, USA
- Thalia Papayannopoulou
  - Division of Hematology, Department of Medicine, University of Washington, Seattle, Washington 98195, USA
- Kai-Hsin Chang
  - University of Iowa Carver College of Medicine, Department of Internal Medicine, Iowa City, Iowa 52242, USA
- Arthur Skoultchi
  - Department of Cell Biology, Albert Einstein College of Medicine, Bronx, New York 10461, USA
- Srikanta Gosh
  - Department of Cell Biology, Albert Einstein College of Medicine, Bronx, New York 10461, USA
- Christine Disteche
  - Department of Pathology, University of Washington, Seattle, Washington 98195, USA
- Piper Treuting
  - Department of Comparative Medicine, University of Washington, Seattle, Washington 98195, USA
- Yanli Wang
  - Bioinformatics and Genomics program, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Mitchell J Weiss
  - Department of Hematology, St Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Gerd A Blobel
  - [1] Division of Hematology, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA. [2] Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Xiaoyi Cao
  - Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
- Sheng Zhong
  - Department of Bioengineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA
- Ting Wang
  - Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri 63108, USA
- Peter J Good
  - NHGRI, National Institutes of Health, 5635 Fishers Lane, Bethesda, Maryland 20892-9307, USA
- Rebecca F Lowdon
  - NHGRI, National Institutes of Health, 5635 Fishers Lane, Bethesda, Maryland 20892-9307, USA
- Leslie B Adams
  - NHGRI, National Institutes of Health, 5635 Fishers Lane, Bethesda, Maryland 20892-9307, USA
- Xiao-Qiao Zhou
  - NHGRI, National Institutes of Health, 5635 Fishers Lane, Bethesda, Maryland 20892-9307, USA
- Michael J Pazin
  - NHGRI, National Institutes of Health, 5635 Fishers Lane, Bethesda, Maryland 20892-9307, USA
- Elise A Feingold
  - NHGRI, National Institutes of Health, 5635 Fishers Lane, Bethesda, Maryland 20892-9307, USA
- Barbara Wold
  - Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
- James Taylor
  - Departments of Biology and Mathematics and Computer Science, Emory University, O. Wayne Rollins Research Center, 1510 Clifton Road NE, Atlanta, Georgia 30322, USA
- Ali Mortazavi
  - Department of Developmental and Cell Biology, University of California, Irvine, Irvine, California 92697, USA
- Sherman M Weissman
  - Yale University, Department of Genetics, PO Box 208005, 333 Cedar Street, New Haven, Connecticut 06520-8005, USA
- Michael P Snyder
  - Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477 Stanford, California 94305, USA
- Roderic Guigo
  - Bioinformatics and Genomics, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, 08003 Barcelona, Catalonia, Spain
- Thomas R Gingeras
  - Functional Genomics, Cold Spring Harbor Laboratory, Bungtown Road, Cold Spring Harbor, New York 11724, USA
- David M Gilbert
  - Department of Biological Science, 319 Stadium Drive, Florida State University, Tallahassee, Florida 32306-4295, USA
- Ross C Hardison
  - Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
- Michael A Beer
  - McKusick-Nathans Institute of Genetic Medicine and Department of Biomedical Engineering, Johns Hopkins University, 733 N. Broadway, BRB 573 Baltimore, Maryland 21205, USA
- Bing Ren
  - Ludwig Institute for Cancer Research and University of California, San Diego School of Medicine, 9500 Gilman Drive, La Jolla, California 92093, USA
5
Stergachis AB, Neph S, Sandstrom R, Haugen E, Reynolds AP, Zhang M, Byron R, Canfield T, Stelhing-Sun S, Lee K, Thurman RE, Vong S, Bates D, Neri F, Diegel M, Giste E, Dunn D, Vierstra J, Hansen RS, Johnson AK, Sabo PJ, Wilken MS, Reh TA, Treuting PM, Kaul R, Groudine M, Bender MA, Borenstein E, Stamatoyannopoulos JA. Conservation of trans-acting circuitry during mammalian regulatory evolution. Nature 2015; 515:365-70. [PMID: 25409825 PMCID: PMC4405208 DOI: 10.1038/nature13972] [Citation(s) in RCA: 176] [Impact Index Per Article: 19.6] [Received: 02/21/2014] [Accepted: 10/15/2014] [Indexed: 12/27/2022]
Abstract
The basic body plan and major physiological axes have been highly conserved during mammalian evolution, yet only a small fraction of the human genome sequence appears to be subject to evolutionary constraint. To quantify cis- versus trans-acting contributions to mammalian regulatory evolution, we performed genomic DNase I footprinting of the mouse genome across 25 cell and tissue types, collectively defining ∼8.6 million transcription factor (TF) occupancy sites at nucleotide resolution. Here we show that mouse TF footprints conjointly encode a regulatory lexicon that is ∼95% similar with that derived from human TF footprints. However, only ∼20% of mouse TF footprints have human orthologues. Despite substantial turnover of the cis-regulatory landscape, nearly half of all pairwise regulatory interactions connecting mouse TF genes have been maintained in orthologous human cell types through evolutionary innovation of TF recognition sequences. Furthermore, the higher-level organization of mouse TF-to-TF connections into cellular network architectures is nearly identical with human. Our results indicate that evolutionary selection on mammalian gene regulation is targeted chiefly at the level of trans-regulatory circuitry, enabling and potentiating cis-regulatory plasticity.
Mouse genomic footprinting reveals conservation of transcription factor (TF) recognition repertoires and trans-regulatory circuitry despite massive turnover of DNA elements that contact TFs in vivo.
Having generated genomic DNase I footprinting data of the mouse genome across 25 cell and tissue types, these authors use these data to quantify cis-versus-trans regulatory contributions to mammalian regulatory evolution. They describe more than 600 motifs that collectively are over 95% similar to those recognized in vivo by human transcription factors (TFs). Despite substantial turnover of the cis-regulatory landscape around each TF gene, nearly half of all pairwise regulatory interactions connecting mouse TF genes have been maintained in orthologous human cell types through evolutionary innovation of TF recognition sequences. Mouse and human TF regulatory networks are particularly similar at the highest level of organization. The work was performed as part of the mouse ENCODE project.
Affiliation(s)
- Andrew B Stergachis
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Shane Neph
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Richard Sandstrom
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Eric Haugen
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Alex P Reynolds
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Miaohua Zhang
- Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
| | - Rachel Byron
- Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA
| | - Theresa Canfield
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Sandra Stelhing-Sun
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Kristen Lee
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Robert E Thurman
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Shinny Vong
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Daniel Bates
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Fidencio Neri
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Morgan Diegel
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Erika Giste
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Douglas Dunn
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Jeff Vierstra
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - R Scott Hansen
- [1] Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA [2] Department of Medicine, University of Washington, Seattle, Washington 98195, USA
| | - Audra K Johnson
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Peter J Sabo
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
| | - Matthew S Wilken
- Department of Biological Structure, University of Washington, Seattle, Washington 98195, USA
| | - Thomas A Reh
- Department of Biological Structure, University of Washington, Seattle, Washington 98195, USA
| | - Piper M Treuting
- Department of Comparative Medicine, University of Washington, Seattle, Washington 98195, USA
| | - Rajinder Kaul
- [1] Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA [2] Department of Medicine, University of Washington, Seattle, Washington 98195, USA
| | - Mark Groudine
- [1] Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA [2] Division of Radiation Oncology, University of Washington, Seattle, Washington 98195, USA
| | - M A Bender
- [1] Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA [2] Department of Pediatrics, University of Washington, Seattle, Washington 98195, USA
| | - Elhanan Borenstein
- [1] Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA [2] Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98102, USA [3] Santa Fe Institute, Santa Fe, New Mexico 87501, USA
| | - John A Stamatoyannopoulos
- [1] Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA [2] Department of Medicine, University of Washington, Seattle, Washington 98195, USA
|
6
|
Sullivan AM, Arsovski AA, Lempe J, Bubb KL, Weirauch MT, Sabo PJ, Sandstrom R, Thurman RE, Neph S, Reynolds AP, Stergachis AB, Vernot B, Johnson AK, Haugen E, Sullivan ST, Thompson A, Neri FV, Weaver M, Diegel M, Mnaimneh S, Yang A, Hughes TR, Nemhauser JL, Queitsch C, Stamatoyannopoulos JA. Mapping and dynamics of regulatory DNA and transcription factor networks in A. thaliana. Cell Rep 2014; 8:2015-2030. [PMID: 25220462 DOI: 10.1016/j.celrep.2014.08.019] [Citation(s) in RCA: 159] [Impact Index Per Article: 15.9] [Received: 12/04/2013] [Revised: 05/20/2014] [Accepted: 08/07/2014] [Indexed: 01/23/2023]
Abstract
Our understanding of gene regulation in plants is constrained by our limited knowledge of plant cis-regulatory DNA and its dynamics. We mapped DNase I hypersensitive sites (DHSs) in A. thaliana seedlings and used genomic footprinting to delineate ∼700,000 sites of in vivo transcription factor (TF) occupancy at nucleotide resolution. We show that variation associated with 72 diverse quantitative phenotypes localizes within DHSs. TF footprints encode an extensive cis-regulatory lexicon subject to recent evolutionary pressures, and widespread TF binding within exons may have shaped codon usage patterns. The architecture of A. thaliana TF regulatory networks is strikingly similar to that of animals in spite of diverged regulatory repertoires. We analyzed regulatory landscape dynamics during heat shock and photomorphogenesis, disclosing thousands of environmentally sensitive elements and enabling mapping of key TF regulatory circuits underlying these fundamental responses. Our results provide an extensive resource for the study of A. thaliana gene regulation and functional biology.
Affiliation(s)
| | - Andrej A Arsovski
- Department of Biology, University of Washington, Seattle, WA 98195, USA
| | - Janne Lempe
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Kerry L Bubb
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Matthew T Weirauch
- Center for Autoimmune Genomics and Etiology (CAGE) and Divisions of Biomedical Informatics and Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
| | - Peter J Sabo
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Richard Sandstrom
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Robert E Thurman
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Shane Neph
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Alex P Reynolds
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Andrew B Stergachis
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Benjamin Vernot
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Audra K Johnson
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Eric Haugen
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Shawn T Sullivan
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Agnieszka Thompson
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Fidencio V Neri
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Molly Weaver
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Morgan Diegel
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Sanie Mnaimneh
- Donnelly Centre and Department of Molecular Genetics, University of Toronto, Toronto ON M5S 3E1, Canada
| | - Ally Yang
- Donnelly Centre and Department of Molecular Genetics, University of Toronto, Toronto ON M5S 3E1, Canada
| | - Timothy R Hughes
- Donnelly Centre and Department of Molecular Genetics, University of Toronto, Toronto ON M5S 3E1, Canada; Canadian Institute for Advanced Research (CIFAR) Program in Genetic Networks, Toronto ON M5G 1Z8, Canada
| | | | - Christine Queitsch
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
|
7
|
Stergachis AB, Neph S, Reynolds A, Humbert R, Miller B, Paige SL, Vernot B, Cheng JB, Thurman RE, Sandstrom R, Haugen E, Heimfeld S, Murry CE, Akey JM, Stamatoyannopoulos JA. Developmental fate and cellular maturity encoded in human regulatory DNA landscapes. Cell 2013; 154:888-903. [PMID: 23953118 DOI: 10.1016/j.cell.2013.07.020] [Citation(s) in RCA: 222] [Impact Index Per Article: 20.2] [Received: 02/13/2013] [Revised: 04/16/2013] [Accepted: 07/12/2013] [Indexed: 10/26/2022]
Abstract
Cellular-state information between generations of developing cells may be propagated via regulatory regions. We report consistent patterns of gain and loss of DNase I-hypersensitive sites (DHSs) as cells progress from embryonic stem cells (ESCs) to terminal fates. DHS patterns alone convey rich information about cell fate and lineage relationships distinct from information conveyed by gene expression. Developing cells share a proportion of their DHS landscapes with ESCs; that proportion decreases continuously in each cell type as differentiation progresses, providing a quantitative benchmark of developmental maturity. Developmentally stable DHSs densely encode binding sites for transcription factors involved in autoregulatory feedback circuits. In contrast to normal cells, cancer cells extensively reactivate silenced ESC DHSs and those from developmental programs external to the cell lineage from which the malignancy derives. Our results point to changes in regulatory DNA landscapes as quantitative indicators of cell-fate transitions, lineage relationships, and dysfunction.
Affiliation(s)
- Andrew B Stergachis
- Department of Genome Sciences, University of Washington, Seattle, WA 98109, USA
| | - Shane Neph
- Department of Genome Sciences, University of Washington, Seattle, WA 98109, USA
| | - Alex Reynolds
- Department of Genome Sciences, University of Washington, Seattle, WA 98109, USA
| | - Richard Humbert
- Department of Genome Sciences, University of Washington, Seattle, WA 98109, USA
| | - Brady Miller
- Department of Genome Sciences, University of Washington, Seattle, WA 98109, USA; Department of Medicine, Division of Hematology, University of Washington, Seattle, WA 98195, USA
| | - Sharon L Paige
- Department of Pathology, University of Washington, Seattle, WA 98109, USA; Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA 98109, USA
| | - Benjamin Vernot
- Department of Genome Sciences, University of Washington, Seattle, WA 98109, USA
| | - Jeffrey B Cheng
- Department of Dermatology, University of California, San Francisco, CA 94143, USA
| | - Robert E Thurman
- Department of Genome Sciences, University of Washington, Seattle, WA 98109, USA
| | - Richard Sandstrom
- Department of Genome Sciences, University of Washington, Seattle, WA 98109, USA
| | - Eric Haugen
- Department of Genome Sciences, University of Washington, Seattle, WA 98109, USA
| | - Shelly Heimfeld
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Charles E Murry
- Department of Pathology, University of Washington, Seattle, WA 98109, USA; Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA 98109, USA; Department of Bioengineering, University of Washington, Seattle, WA 98109, USA; Department of Medicine, Division of Cardiology, University of Washington, Seattle, WA 98195, USA
| | - Joshua M Akey
- Department of Genome Sciences, University of Washington, Seattle, WA 98109, USA
| | - John A Stamatoyannopoulos
- Department of Genome Sciences, University of Washington, Seattle, WA 98109, USA; Department of Medicine, Division of Oncology, University of Washington, Seattle, WA 98195, USA
|
8
|
Vernot B, Stergachis AB, Maurano MT, Vierstra J, Neph S, Thurman RE, Stamatoyannopoulos JA, Akey JM. Personal and population genomics of human regulatory variation. Genome Res 2013; 22:1689-97. [PMID: 22955981 PMCID: PMC3431486 DOI: 10.1101/gr.134890.111] [Citation(s) in RCA: 91] [Impact Index Per Article: 8.3] [Indexed: 11/24/2022]
Abstract
The characteristics and evolutionary forces acting on regulatory variation in humans remain elusive because of the difficulty in defining functionally important noncoding DNA. Here, we combine genome-scale maps of regulatory DNA marked by DNase I hypersensitive sites (DHSs) from 138 cell and tissue types with whole-genome sequences of 53 geographically diverse individuals in order to better delimit the patterns of regulatory variation in humans. We estimate that individuals likely harbor many more functionally important variants in regulatory DNA compared with protein-coding regions, although they are likely to have, on average, smaller effect sizes. Moreover, we demonstrate that there is significant heterogeneity in the level of functional constraint in regulatory DNA among different cell types. We also find marked variability in functional constraint among transcription factor motifs in regulatory DNA, with sequence motifs for major developmental regulators, such as HOX proteins, exhibiting levels of constraint comparable to protein-coding regions. Finally, we perform a genome-wide scan of recent positive selection and identify hundreds of novel substrates of adaptive regulatory evolution that are enriched for biologically interesting pathways such as melanogenesis and adipocytokine signaling. These data and results provide new insights into patterns of regulatory variation in individuals and populations and demonstrate that a large proportion of functionally important variation lies beyond the exome.
Affiliation(s)
- Benjamin Vernot
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
|
9
|
Samstein RM, Arvey A, Josefowicz SZ, Peng X, Reynolds A, Sandstrom R, Neph S, Sabo P, Kim JM, Liao W, Li MO, Leslie C, Stamatoyannopoulos JA, Rudensky AY. Foxp3 exploits a pre-existent enhancer landscape for regulatory T cell lineage specification. Cell 2012; 151:153-66. [PMID: 23021222 PMCID: PMC3493256 DOI: 10.1016/j.cell.2012.06.053] [Citation(s) in RCA: 355] [Impact Index Per Article: 29.6] [Received: 04/19/2012] [Revised: 06/06/2012] [Accepted: 06/29/2012] [Indexed: 12/13/2022]
Abstract
Regulatory T (Treg) cells, whose identity and function are defined by the transcription factor Foxp3, are indispensable for immune homeostasis. It is unclear whether Foxp3 exerts its Treg lineage specification function through active modification of the chromatin landscape and establishment of new enhancers or by exploiting a pre-existing enhancer landscape. Analysis of the chromatin accessibility of Foxp3-bound enhancers in Treg and Foxp3-negative T cells showed that Foxp3 was bound overwhelmingly to preaccessible enhancers occupied by its cofactors in precursor cells or a structurally related predecessor. Furthermore, the bulk of Foxp3-bound Treg cell enhancers lacking in Foxp3(-) CD4(+) cells became accessible upon T cell receptor activation prior to Foxp3 expression, and only a small subset associated with several functionally important genes were exclusively Treg cell specific. Thus, in a late cellular differentiation process, Foxp3 defines Treg cell functionality in an "opportunistic" manner by largely exploiting the preformed enhancer network instead of establishing a new enhancer landscape.
Affiliation(s)
- Robert M Samstein
- Howard Hughes Medical Institute, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA
|
10
|
Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Sheffield NC, Stergachis AB, Wang H, Vernot B, Garg K, Sandstrom R, Bates D, Canfield TK, Diegel M, Dunn D, Ebersol AK, Frum T, Giste E, Harding L, Johnson AK, Johnson EM, Kutyavin T, Lajoie B, Lee BK, Lee K, London D, Lotakis D, Neph S, Neri F, Nguyen ED, Reynolds AP, Roach V, Safi A, Sanchez ME, Sanyal A, Shafer A, Simon JM, Song L, Vong S, Weaver M, Zhang Z, Zhang Z, Lenhard B, Tewari M, Dorschner MO, Hansen RS, Navas PA, Stamatoyannopoulos G, Iyer VR, Lieb JD, Sunyaev SR, Akey JM, Sabo PJ, Kaul R, Furey TS, Dekker J, Crawford GE, Stamatoyannopoulos JA. The accessible chromatin landscape of the human genome. Nature 2012; 489:75-82. [PMID: 22955617 PMCID: PMC3721348 DOI: 10.1038/nature11232] [Citation(s) in RCA: 1898] [Impact Index Per Article: 158.2] [Received: 12/15/2011] [Accepted: 05/15/2012] [Indexed: 02/07/2023]
Abstract
DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify ∼2.9 million DHSs that encompass virtually all known experimentally validated cis-regulatory sequences and expose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements using ENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation and regulatory factor occupancy patterns. We connect ∼580,000 distal DHSs with their target promoters, revealing systematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibility at many regulatory regions is organized with dozens to hundreds of co-activated elements, and the transcellular DNase I sensitivity pattern at a given region can predict cell-type-specific functional behaviours. The DHS landscape shows signatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent and immortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected link between chromatin accessibility, proliferative potential and patterns of human variation.
Affiliation(s)
- Robert E. Thurman
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Eric Rynes
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Richard Humbert
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Jeff Vierstra
- Department of Genome Sciences, University of Washington, Seattle, WA
| | | | - Eric Haugen
- Department of Genome Sciences, University of Washington, Seattle, WA
| | | | | | - Hao Wang
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Benjamin Vernot
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Kavita Garg
- Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - Richard Sandstrom
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Daniel Bates
- Department of Genome Sciences, University of Washington, Seattle, WA
| | | | - Morgan Diegel
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Douglas Dunn
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Abigail K. Ebersol
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Tristan Frum
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Erika Giste
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Lisa Harding
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Audra K. Johnson
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Ericka M. Johnson
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Tanya Kutyavin
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Bryan Lajoie
- Program in Gene Function, University of Massachusetts Medical School, Worcester, MA
| | - Bum-Kyu Lee
- Institute for Cellular and Molecular Biology, University of Texas, Austin, TX
| | - Kristen Lee
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Darin London
- Institute for Genome Sciences and Policy, Duke University, Durham, NC
| | - Dimitra Lotakis
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Shane Neph
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Fidencio Neri
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Eric D. Nguyen
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Alex P. Reynolds
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Vaughn Roach
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Alexias Safi
- Institute for Genome Sciences and Policy, Duke University, Durham, NC
| | - Minerva E. Sanchez
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Amartya Sanyal
- Program in Gene Function, University of Massachusetts Medical School, Worcester, MA
| | - Anthony Shafer
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Jeremy M. Simon
- Department of Biology, University of North Carolina, Chapel Hill, NC
| | - Lingyun Song
- Institute for Genome Sciences and Policy, Duke University, Durham, NC
| | - Shinny Vong
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Molly Weaver
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Zhancheng Zhang
- Department of Biology, University of North Carolina, Chapel Hill, NC
| | - Zhuzhu Zhang
- Department of Biology, University of North Carolina, Chapel Hill, NC
| | - Boris Lenhard
- Bergen Center for Computational Science, University of Bergen, Bergen, Norway
| | - Muneesh Tewari
- Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - Michael O. Dorschner
- Dept. of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA
| | - R. Scott Hansen
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Patrick A. Navas
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | | | - Vishwanath R. Iyer
- Institute for Cellular and Molecular Biology, University of Texas, Austin, TX
| | - Jason D. Lieb
- Department of Biology, University of North Carolina, Chapel Hill, NC
| | - Shamil R. Sunyaev
- Dept. of Medicine, Division of Genetics, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA
| | - Joshua M. Akey
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Peter J. Sabo
- Department of Genome Sciences, University of Washington, Seattle, WA
| | - Rajinder Kaul
- Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA
| | - Terrence S. Furey
- Department of Biology, University of North Carolina, Chapel Hill, NC
| | - Job Dekker
- Program in Gene Function, University of Massachusetts Medical School, Worcester, MA
| | | | - John A. Stamatoyannopoulos
- Department of Genome Sciences, University of Washington, Seattle, WA
- Department of Medicine, Division of Oncology, University of Washington, Seattle, WA
|
11
|
Neph S, Vierstra J, Stergachis AB, Reynolds AP, Haugen E, Vernot B, Thurman RE, Sandstrom R, Johnson AK, Maurano MT, Humbert R, Rynes E, Wang H, Vong S, Lee K, Bates D, Diegel M, Roach V, Dunn D, Neri J, Schafer A, Hansen RS, Kutyavin T, Giste E, Weaver M, Canfield T, Sabo P, Zhang M, Balasundaram G, Byron R, MacCoss MJ, Akey JM, Bender M, Groudine M, Kaul R, Stamatoyannopoulos JA. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 2012; 489:83-90. [PMID: 22955618 PMCID: PMC3736582 DOI: 10.1038/nature11212] [Citation(s) in RCA: 566] [Impact Index Per Article: 47.2] [Received: 12/11/2011] [Accepted: 05/10/2012] [Indexed: 01/04/2023]
Abstract
Regulatory factor binding to genomic DNA protects the underlying sequence from cleavage by DNase I, leaving nucleotide-resolution footprints. Using genomic DNase I footprinting across 41 diverse cell and tissue types, we detected 45 million transcription factor occupancy events within regulatory regions, representing differential binding to 8.4 million distinct short sequence elements. Here we show that this small genomic sequence compartment, roughly twice the size of the exome, encodes an expansive repertoire of conserved recognition sequences for DNA-binding proteins that nearly doubles the size of the human cis-regulatory lexicon. We find that genetic variants affecting allelic chromatin states are concentrated in footprints, and that these elements are preferentially sheltered from DNA methylation. High-resolution DNase I cleavage patterns mirror nucleotide-level evolutionary conservation and track the crystallographic topography of protein-DNA interfaces, indicating that transcription factor structure has been evolutionarily imprinted on the human genome sequence. We identify a stereotyped 50-base-pair footprint that precisely defines the site of transcript origination within thousands of human promoters. Finally, we describe a large collection of novel regulatory factor recognition motifs that are highly conserved in both sequence and function, and exhibit cell-selective occupancy patterns that closely parallel major regulators of development, differentiation and pluripotency.
Affiliation(s)
- Shane Neph
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Jeff Vierstra
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | | | - Alex P. Reynolds
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Eric Haugen
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Benjamin Vernot
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Robert E. Thurman
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Richard Sandstrom
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Audra K. Johnson
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Matthew T. Maurano
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Richard Humbert
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Eric Rynes
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Hao Wang
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Shinny Vong
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Kristen Lee
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Daniel Bates
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Morgan Diegel
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Vaughn Roach
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Douglas Dunn
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Jun Neri
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Anthony Schafer
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - R. Scott Hansen
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195
| | - Tanya Kutyavin
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Erika Giste
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Molly Weaver
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Theresa Canfield
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Peter Sabo
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Miaohua Zhang
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
| | | | - Rachel Byron
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
| | - Michael J. MacCoss
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Joshua M. Akey
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
| | - Michael Bender
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
| | - Mark Groudine
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98109
| | - Rajinder Kaul
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195
| | - John A. Stamatoyannopoulos
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Division of Oncology, Department of Medicine, University of Washington, Seattle, WA 98195
|
12
|
Neph S, Stergachis AB, Reynolds A, Sandstrom R, Borenstein E, Stamatoyannopoulos JA. Circuitry and dynamics of human transcription factor regulatory networks. Cell 2012; 150:1274-86. [PMID: 22959076 DOI: 10.1016/j.cell.2012.04.040] [Citation(s) in RCA: 373] [Impact Index Per Article: 31.1] [Received: 01/17/2012] [Revised: 03/19/2012] [Accepted: 04/23/2012] [Indexed: 12/20/2022]
Abstract
The combinatorial cross-regulation of hundreds of sequence-specific transcription factors (TFs) defines a regulatory network that underlies cellular identity and function. Here we use genome-wide maps of in vivo DNase I footprints to assemble an extensive core human regulatory network comprising connections among 475 sequence-specific TFs and to analyze the dynamics of these connections across 41 diverse cell and tissue types. We find that human TF networks are highly cell selective and are driven by cohorts of factors that include regulators with previously unrecognized roles in control of cellular identity. Moreover, we identify many widely expressed factors that impact transcriptional regulatory networks in a cell-selective manner. Strikingly, in spite of their inherent diversity, all cell-type regulatory networks independently converge on a common architecture that closely resembles the topology of living neuronal networks. Together, our results provide an extensive description of the circuitry, dynamics, and organizing principles of the human TF regulatory network.
Affiliation(s)
- Shane Neph
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
|
13
|
Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom R, Qu H, Brody J, Shafer A, Neri F, Lee K, Kutyavin T, Stehling-Sun S, Johnson AK, Canfield TK, Giste E, Diegel M, Bates D, Hansen RS, Neph S, Sabo PJ, Heimfeld S, Raubitschek A, Ziegler S, Cotsapas C, Sotoodehnia N, Glass I, Sunyaev SR, Kaul R, Stamatoyannopoulos JA. Systematic localization of common disease-associated variation in regulatory DNA. Science 2012; 337:1190-5. [PMID: 22955828 DOI: 10.1126/science.1222794] [Citation(s) in RCA: 2409] [Impact Index Per Article: 200.8] [Indexed: 12/11/2022]
Abstract
Genome-wide association studies have identified many noncoding variants associated with common diseases and traits. We show that these variants are concentrated in regulatory DNA marked by deoxyribonuclease I (DNase I) hypersensitive sites (DHSs). Eighty-eight percent of such DHSs are active during fetal development and are enriched in variants associated with gestational exposure-related phenotypes. We identified distant gene targets for hundreds of variant-containing DHSs that may explain phenotype associations. Disease-associated variants systematically perturb transcription factor recognition sequences, frequently alter allelic chromatin states, and form regulatory networks. We also demonstrated tissue-selective enrichment of more weakly disease-associated variants within DHSs and the de novo identification of pathogenic cell types for Crohn's disease, multiple sclerosis, and an electrocardiogram trait, without prior knowledge of physiological mechanisms. Our results suggest pervasive involvement of regulatory DNA variation in common human disease and provide pathogenic insights into diverse disorders.
Affiliation(s)
- Matthew T Maurano
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
|
14
|
Neph S, Kuehn MS, Reynolds AP, Haugen E, Thurman RE, Johnson AK, Rynes E, Maurano MT, Vierstra J, Thomas S, Sandstrom R, Humbert R, Stamatoyannopoulos JA. BEDOPS: high-performance genomic feature operations. Bioinformatics 2012; 28:1919-20. [PMID: 22576172 DOI: 10.1093/bioinformatics/bts277] [Citation(s) in RCA: 567] [Impact Index Per Article: 47.3] [Indexed: 11/12/2022]
Abstract
The large and growing number of genome-wide datasets highlights the need for high-performance feature analysis and data comparison methods, in addition to efficient data storage and retrieval techniques. We introduce BEDOPS, a software suite for common genomic analysis tasks which offers improved flexibility, scalability and execution time characteristics over previously published packages. The suite includes a utility to compress large inputs into a lossless format that can provide greater space savings and faster data extractions than alternatives. AVAILABILITY http://code.google.com/p/bedops/ includes binaries, source and documentation.
Affiliation(s)
- Shane Neph
- Department of Genome Sciences and Department of Medicine, University of Washington, Seattle, WA 98195, USA.
|
15
|
Mercer TR, Neph S, Dinger ME, Crawford J, Smith MA, Shearwood AMJ, Haugen E, Bracken CP, Rackham O, Stamatoyannopoulos JA, Filipovska A, Mattick JS. The human mitochondrial transcriptome. Cell 2011; 146:645-58. [PMID: 21854988 DOI: 10.1016/j.cell.2011.06.051] [Citation(s) in RCA: 590] [Impact Index Per Article: 45.4] [Received: 11/30/2010] [Revised: 06/15/2011] [Accepted: 06/27/2011] [Indexed: 11/27/2022]
Abstract
The human mitochondrial genome comprises a distinct genetic system transcribed as precursor polycistronic transcripts that are subsequently cleaved to generate individual mRNAs, tRNAs, and rRNAs. Here, we provide a comprehensive analysis of the human mitochondrial transcriptome across multiple cell lines and tissues. Using directional deep sequencing and parallel analysis of RNA ends, we demonstrate wide variation in mitochondrial transcript abundance and precisely resolve transcript processing and maturation events. We identify previously undescribed transcripts, including small RNAs, and observe the enrichment of several nuclear RNAs in mitochondria. Using high-throughput in vivo DNaseI footprinting, we establish the global profile of DNA-binding protein occupancy across the mitochondrial genome at single-nucleotide resolution, revealing regulatory features at mitochondrial transcription initiation sites and functional insights into disease-associated variants. This integrated analysis of the mitochondrial transcriptome reveals unexpected complexity in the regulation, expression, and processing of mitochondrial RNA and provides a resource for future studies of mitochondrial function (accessed at http://mitochondria.matticklab.com).
Affiliation(s)
- Tim R Mercer
- Institute for Molecular Bioscience, The University of Queensland, Brisbane QLD 4072, Australia
|
16
|
Attanasio C, Reymond A, Humbert R, Lyle R, Kuehn MS, Neph S, Sabo PJ, Goldy J, Weaver M, Haydock A, Lee K, Dorschner M, Dermitzakis ET, Antonarakis SE, Stamatoyannopoulos JA. Assaying the regulatory potential of mammalian conserved non-coding sequences in human cells. Genome Biol 2008; 9:R168. [PMID: 19055709 PMCID: PMC2646272 DOI: 10.1186/gb-2008-9-12-r168] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 06/09/2008] [Revised: 09/24/2008] [Accepted: 12/02/2008] [Indexed: 01/26/2023] Open
Abstract
The fraction of experimentally active conserved non-coding sequences within any given cell type is low, so classical assays are unlikely to expose their potential. BACKGROUND Conserved non-coding sequences in the human genome are approximately tenfold more abundant than known genes, and have been hypothesized to mark the locations of cis-regulatory elements. However, the global contribution of conserved non-coding sequences to the transcriptional regulation of human genes is currently unknown. Deeply conserved elements shared between humans and teleost fish predominantly flank genes active during morphogenesis and are enriched for positive transcriptional regulatory elements. However, such deeply conserved elements account for <1% of the conserved non-coding sequences in the human genome, which are predominantly mammalian. RESULTS We explored the regulatory potential of a large sample of these 'common' conserved non-coding sequences using a variety of classic assays, including chromatin remodeling, and enhancer/repressor and promoter activity. When tested across diverse human model cell types, we find that the fraction of experimentally active conserved non-coding sequences within any given cell type is low (approximately 5%), and that this proportion increases only modestly when considered collectively across cell types. CONCLUSIONS The results suggest that classic assays of cis-regulatory potential are unlikely to expose the functional potential of the substantial majority of mammalian conserved non-coding sequences in the human genome.
Affiliation(s)
- Catia Attanasio
- Department of Genetic Medicine and Development, University of Geneva Medical School, 1 rue Michel Servet, 1211, Geneva 4, Switzerland.
|
17
|
Yao Z, Barrick J, Weinberg Z, Neph S, Breaker R, Tompa M, Ruzzo WL. A computational pipeline for high-throughput discovery of cis-regulatory noncoding RNA in prokaryotes. PLoS Comput Biol 2008; 3:e126. [PMID: 17616982 PMCID: PMC1913097 DOI: 10.1371/journal.pcbi.0030126] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.3] [Received: 02/05/2007] [Accepted: 05/17/2007] [Indexed: 01/11/2023] Open
Abstract
Noncoding RNAs (ncRNAs) are important functional RNAs that do not code for proteins. We present a highly efficient computational pipeline for discovering cis-regulatory ncRNA motifs de novo. The pipeline differs from previous methods in that it is structure-oriented, does not require a multiple-sequence alignment as input, and is capable of detecting RNA motifs with low sequence conservation. We also integrate RNA motif prediction with RNA homolog search, which improves the quality of the RNA motifs significantly. Here, we report the results of applying this pipeline to Firmicute bacteria. Our top-ranking motifs include most known Firmicute elements found in the RNA family database (Rfam). Comparing our motif models with Rfam's hand-curated motif models, we achieve high accuracy in both membership prediction and base-pair–level secondary structure prediction (at least 75% average sensitivity and specificity on both tasks). Of the ncRNA candidates not in Rfam, we find compelling evidence that some of them are functional, and analyze several potential ribosomal protein leaders in depth.
For decades, scientists believed that, with a few key exceptions, RNA played a secondary role in the cell. Recent discoveries have sharply revised this simple picture, revealing widespread, diverse, and surprisingly sophisticated roles for RNA. For example, many bacteria use RNA elements called “riboswitches” to switch various gene activities on or off in response to extremely sensitive detection of specific molecules. Discovery of new functional RNA elements remains a very challenging task, both computationally and experimentally. It is computationally difficult largely because of the importance of an RNA molecule's 3-D structure, and the fact that molecules with very different nucleotide sequences can fold into the same shape. In this paper, we propose a computational procedure, based on comparing the genomes of multiple bacteria, for discovery of novel RNAs. Unlike most previous approaches, ours does not require a letter-by-letter alignment of these diverse genomes, making it more applicable to RNA elements whose structure, but not nucleotide sequence, has been preserved through evolution. In an extensive test on the Firmicutes, a bacterial phylum containing well-studied organisms such as Bacillus subtilis and important pathogens such as anthrax, we recover most known noncoding RNA elements, as well as making many novel predictions.
Affiliation(s)
- Zizhen Yao
- Department of Computer Science and Engineering, University of Washington, Seattle, Washington, USA.
|
18
|
Weinberg Z, Barrick JE, Yao Z, Roth A, Kim JN, Gore J, Wang JX, Lee ER, Block KF, Sudarsan N, Neph S, Tompa M, Ruzzo WL, Breaker RR. Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline. Nucleic Acids Res 2007; 35:4809-19. [PMID: 17621584 PMCID: PMC1950547 DOI: 10.1093/nar/gkm487] [Citation(s) in RCA: 231] [Impact Index Per Article: 13.6] [Indexed: 12/14/2022] Open
Abstract
We applied a computational pipeline based on comparative genomics to bacteria, and identified 22 novel candidate RNA motifs. We predicted six to be riboswitches, which are mRNA elements that regulate gene expression on binding a specific metabolite. In separate studies, we confirmed that two of these are novel riboswitches. Three other riboswitch candidates are upstream of either a putative transporter gene in the order Lactobacillales, citric acid cycle genes in Burkholderiales or molybdenum cofactor biosynthesis genes in several phyla. The remaining riboswitch candidate, the widespread Genes for the Environment, for Membranes and for Motility (GEMM) motif, is associated with genes important for natural competence in Vibrio cholerae and the use of metal ions as electron acceptors in Geobacter sulfurreducens. Among the other motifs, one has a genetic distribution similar to a previously published candidate riboswitch, ykkC/yxkD, but has a different structure. We identified possible non-coding RNAs in five phyla, and several additional cis-regulatory RNAs, including one in ε-proteobacteria (upstream of purD, involved in purine biosynthesis), and one in Cyanobacteria (within an ATP synthase operon). These candidate RNAs add to the growing list of RNA motifs involved in multiple cellular processes, and suggest that many additional RNAs remain to be discovered.
Affiliation(s)
- Zasha Weinberg
- Department of Molecular, Cellular and Developmental Biology, Howard Hughes Medical Institute, Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
- *To whom correspondence should be addressed. (203) 432-6554; (203) 432-6161
| | - Jeffrey E. Barrick
- Department of Molecular, Cellular and Developmental Biology, Howard Hughes Medical Institute, Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
| | - Zizhen Yao
- Department of Molecular, Cellular and Developmental Biology, Howard Hughes Medical Institute, Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
| | - Adam Roth
- Department of Molecular, Cellular and Developmental Biology, Howard Hughes Medical Institute, Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
| | - Jane N. Kim
- Department of Molecular, Cellular and Developmental Biology, Howard Hughes Medical Institute, Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
| | - Jeremy Gore
- Department of Molecular, Cellular and Developmental Biology, Howard Hughes Medical Institute, Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
| | - Joy Xin Wang
- Department of Molecular, Cellular and Developmental Biology, Howard Hughes Medical Institute, Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
| | - Elaine R. Lee
- Department of Molecular, Cellular and Developmental Biology, Howard Hughes Medical Institute, Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
| | - Kirsten F. Block
- Department of Molecular, Cellular and Developmental Biology, Howard Hughes Medical Institute, Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
| | - Narasimhan Sudarsan
- Department of Molecular, Cellular and Developmental Biology, Howard Hughes Medical Institute, Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
| | - Shane Neph
- Department of Molecular, Cellular and Developmental Biology, Howard Hughes Medical Institute, Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
| | - Martin Tompa
- Department of Molecular, Cellular and Developmental Biology, Howard Hughes Medical Institute, Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
| | - Walter L. Ruzzo
- Department of Molecular, Cellular and Developmental Biology, Howard Hughes Medical Institute, Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
| | - Ronald R. Breaker
- Department of Molecular, Cellular and Developmental Biology, Howard Hughes Medical Institute, Department of Molecular Biophysics and Biochemistry, Yale University, Box 208103, New Haven, CT 06520-8103, USA; Department of Computer Science and Engineering and Department of Genome Sciences, University of Washington, Box 352350, Seattle, WA 98195-2350, USA
| |
|
19
|
Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, Kuehn MS, Taylor CM, Neph S, Koch CM, Asthana S, Malhotra A, Adzhubei I, Greenbaum JA, Andrews RM, Flicek P, Boyle PJ, Cao H, Carter NP, Clelland GK, Davis S, Day N, Dhami P, Dillon SC, Dorschner MO, Fiegler H, Giresi PG, Goldy J, Hawrylycz M, Haydock A, Humbert R, James KD, Johnson BE, Johnson EM, Frum TT, Rosenzweig ER, Karnani N, Lee K, Lefebvre GC, Navas PA, Neri F, Parker SCJ, Sabo PJ, Sandstrom R, Shafer A, Vetrie D, Weaver M, Wilcox S, Yu M, Collins FS, Dekker J, Lieb JD, Tullius TD, Crawford GE, Sunyaev S, Noble WS, Dunham I, Denoeud F, Reymond A, Kapranov P, Rozowsky J, Zheng D, Castelo R, Frankish A, Harrow J, Ghosh S, Sandelin A, Hofacker IL, Baertsch R, Keefe D, Dike S, Cheng J, Hirsch HA, Sekinger EA, Lagarde J, Abril JF, Shahab A, Flamm C, Fried C, Hackermüller J, Hertel J, Lindemeyer M, Missal K, Tanzer A, Washietl S, Korbel J, Emanuelsson O, Pedersen JS, Holroyd N, Taylor R, Swarbreck D, Matthews N, Dickson MC, Thomas DJ, Weirauch MT, Gilbert J, Drenkow J, Bell I, Zhao X, Srinivasan KG, Sung WK, Ooi HS, Chiu KP, Foissac S, Alioto T, Brent M, Pachter L, Tress ML, Valencia A, Choo SW, Choo CY, Ucla C, Manzano C, Wyss C, Cheung E, Clark TG, Brown JB, Ganesh M, Patel S, Tammana H, Chrast J, Henrichsen CN, Kai C, Kawai J, Nagalakshmi U, Wu J, Lian Z, Lian J, Newburger P, Zhang X, Bickel P, Mattick JS, Carninci P, Hayashizaki Y, Weissman S, Hubbard T, Myers RM, Rogers J, Stadler PF, Lowe TM, Wei CL, Ruan Y, Struhl K, Gerstein M, Antonarakis SE, Fu Y, Green ED, Karaöz U, Siepel A, Taylor J, Liefer LA, Wetterstrand KA, Good PJ, Feingold EA, Guyer MS, Cooper GM, Asimenos G, Dewey CN, Hou M, Nikolaev S, Montoya-Burgos JI, Löytynoja A, Whelan S, Pardi F, Massingham T, Huang H, Zhang NR, Holmes I, Mullikin JC, Ureta-Vidal A, Paten B, Seringhaus M, Church D, Rosenbloom K, Kent WJ, Stone EA, Batzoglou S, Goldman N, Hardison RC, Haussler D, 
Miller W, Sidow A, Trinklein ND, Zhang ZD, Barrera L, Stuart R, King DC, Ameur A, Enroth S, Bieda MC, Kim J, Bhinge AA, Jiang N, Liu J, Yao F, Vega VB, Lee CWH, Ng P, Shahab A, Yang A, Moqtaderi Z, Zhu Z, Xu X, Squazzo S, Oberley MJ, Inman D, Singer MA, Richmond TA, Munn KJ, Rada-Iglesias A, Wallerman O, Komorowski J, Fowler JC, Couttet P, Bruce AW, Dovey OM, Ellis PD, Langford CF, Nix DA, Euskirchen G, Hartman S, Urban AE, Kraus P, Van Calcar S, Heintzman N, Kim TH, Wang K, Qu C, Hon G, Luna R, Glass CK, Rosenfeld MG, Aldred SF, Cooper SJ, Halees A, Lin JM, Shulha HP, Zhang X, Xu M, Haidar JNS, Yu Y, Ruan Y, Iyer VR, Green RD, Wadelius C, Farnham PJ, Ren B, Harte RA, Hinrichs AS, Trumbower H, Clawson H, Hillman-Jackson J, Zweig AS, Smith K, Thakkapallayil A, Barber G, Kuhn RM, Karolchik D, Armengol L, Bird CP, de Bakker PIW, Kern AD, Lopez-Bigas N, Martin JD, Stranger BE, Woodroffe A, Davydov E, Dimas A, Eyras E, Hallgrímsdóttir IB, Huppert J, Zody MC, Abecasis GR, Estivill X, Bouffard GG, Guan X, Hansen NF, Idol JR, Maduro VVB, Maskeri B, McDowell JC, Park M, Thomas PJ, Young AC, Blakesley RW, Muzny DM, Sodergren E, Wheeler DA, Worley KC, Jiang H, Weinstock GM, Gibbs RA, Graves T, Fulton R, Mardis ER, Wilson RK, Clamp M, Cuff J, Gnerre S, Jaffe DB, Chang JL, Lindblad-Toh K, Lander ES, Koriabine M, Nefedov M, Osoegawa K, Yoshinaga Y, Zhu B, de Jong PJ. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007; 447:799-816. [PMID: 17571346 PMCID: PMC2212820 DOI: 10.1038/nature05874] [Citation(s) in RCA: 3782] [Impact Index Per Article: 222.5] [Indexed: 01/17/2023]
Abstract
We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
|
20
|
Abstract
Phylogenetic footprinting is a method for the discovery of regulatory elements in a set of homologous regulatory regions, usually collected from multiple species. It does so by identifying the most conserved motifs in those homologous regions. This note describes web software that has been designed specifically for this purpose in prokaryotic genomes, making use of the phylogenetic relationships among the homologous sequences in order to make more accurate predictions. The software is called MicroFootPrinter and is available at .
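The core idea in this abstract, finding the motif that is best conserved across a set of homologous regions, can be illustrated with a deliberately simplified sketch. This is not MicroFootPrinter's algorithm: it ignores the phylogenetic tree entirely and simply scores each k-mer of the first sequence by its total best-match Hamming distance in the other sequences. The function names and the toy sequences are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_match_distance(kmer: str, seq: str) -> int:
    """Smallest Hamming distance between kmer and any window of seq."""
    k = len(kmer)
    return min(hamming(kmer, seq[i:i + k]) for i in range(len(seq) - k + 1))


def most_conserved_kmer(seqs: list[str], k: int) -> tuple[str, int]:
    """Return (kmer, total_mismatches) for the k-mer of seqs[0] that is
    best conserved across the remaining sequences.

    Simplifying assumptions (not from the source): every sequence has
    length >= k, all species are weighted equally, and no phylogeny is used.
    """
    ref, others = seqs[0], seqs[1:]
    best = None
    for i in range(len(ref) - k + 1):
        kmer = ref[i:i + k]
        score = sum(best_match_distance(kmer, s) for s in others)
        # Strict '<' keeps the leftmost k-mer when scores tie.
        if best is None or score < best[1]:
            best = (kmer, score)
    return best


# Toy homologous regions that share the hexamer TTGACA.
regions = ["AAGTTGACACC", "CCTTGACAGG", "GTTGACATTT"]
print(most_conserved_kmer(regions, 6))
```

A phylogeny-aware method would instead weight mismatches by the evolutionary distance between species, so that conservation across distant relatives counts for more than conservation between close ones.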
Affiliation(s)
| | - Martin Tompa
- To whom correspondence should be addressed. Tel: +1 206 543 9263; Fax: +1 206 543 8331;
| |
|