1
Wirthlin ME, Schmid TA, Elie JE, Zhang X, Kowalczyk A, Redlich R, Shvareva VA, Rakuljic A, Ji MB, Bhat NS, Kaplow IM, Schäffer DE, Lawler AJ, Wang AZ, Phan BN, Annaldasula S, Brown AR, Lu T, Lim BK, Azim E, Clark NL, Meyer WK, Pond SLK, Chikina M, Yartsev MM, Pfenning AR, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. Vocal learning-associated convergent evolution in mammalian proteins and regulatory elements. Science 2024; 383:eabn3263. [PMID: 38422184 DOI: 10.1126/science.abn3263] [Received: 11/18/2021] [Accepted: 02/20/2024] [Indexed: 03/02/2024]
Abstract
Vocal production learning ("vocal learning") is a convergently evolved trait in vertebrates. To identify brain genomic elements associated with mammalian vocal learning, we integrated genomic, anatomical, and neurophysiological data from the Egyptian fruit bat (Rousettus aegyptiacus) with analyses of the genomes of 215 placental mammals. First, we identified a set of proteins evolving more slowly in vocal learners. Then, we discovered a vocal motor cortical region in the Egyptian fruit bat, an emergent vocal learner, and leveraged that knowledge to identify active cis-regulatory elements in the motor cortex of vocal learners. Machine learning methods applied to motor cortex open chromatin revealed 50 enhancers robustly associated with vocal learning whose activity tended to be lower in vocal learners. Our research implicates convergent losses of motor cortex regulatory elements in mammalian vocal learning evolution.
Affiliation(s)
- Morgan E Wirthlin
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Tobias A Schmid
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94708, USA
- Julie E Elie
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94708, USA
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA 94708, USA
- Xiaomeng Zhang
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Amanda Kowalczyk
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Ruby Redlich
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Varvara A Shvareva
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94708, USA
- Ashley Rakuljic
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94708, USA
- Maria B Ji
- Department of Psychology, University of California, Berkeley, Berkeley, CA 94708, USA
- Ninad S Bhat
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94708, USA
- Irene M Kaplow
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
- Daniel E Schäffer
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Alyssa J Lawler
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Andrew Z Wang
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- BaDoi N Phan
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Siddharth Annaldasula
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Ashley R Brown
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Tianyu Lu
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Byung Kook Lim
- Neurobiology section, Division of Biological Science, University of California, San Diego, La Jolla, CA 92093, USA
- Eiman Azim
- Molecular Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Nathan L Clark
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Wynn K Meyer
- Department of Biological Sciences, Lehigh University, Bethlehem, PA 18015, USA
- Maria Chikina
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
- Michael M Yartsev
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94708, USA
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA 94708, USA
- Andreas R Pfenning
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
2
Raney BJ, Barber GP, Benet-Pagès A, Casper J, Clawson H, Cline M, Diekhans M, Fischer C, Navarro Gonzalez J, Hickey G, Hinrichs A, Kuhn R, Lee B, Lee C, Le Mercier P, Miga K, Nassar L, Nejad P, Paten B, Perez G, Schmelter D, Speir M, Wick B, Zweig A, Haussler D, Kent W, Haeussler M. The UCSC Genome Browser database: 2024 update. Nucleic Acids Res 2024; 52:D1082-D1088. [PMID: 37953330 PMCID: PMC10767968 DOI: 10.1093/nar/gkad987] [Received: 09/15/2023] [Revised: 10/06/2023] [Accepted: 10/17/2023] [Indexed: 11/14/2023]
Abstract
The UCSC Genome Browser (https://genome.ucsc.edu) is a web-based genomic visualization and analysis tool that serves data to over 7,000 distinct users per day worldwide. It provides annotation data on thousands of genome assemblies, ranging from human to SARS-CoV2. This year, we have introduced new data from the Human Pangenome Reference Consortium and on viral genomes including SARS-CoV2. We have added 1,200 new genomes to our GenArk genome system, increasing the overall diversity of our genomic representation. We have added support for nine new user-contributed track hubs to our public hub system. Additionally, we have released 29 new tracks on the human genome and 11 new tracks on the mouse genome. Collectively, these new features expand both the breadth and depth of the genomic knowledge that we share publicly with users worldwide.
Affiliation(s)
- Brian J Raney
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Galt P Barber
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Anna Benet-Pagès
- Institute of Neurogenomics, Helmholtz Zentrum München GmbH - German Research Center for Environmental Health, 85764 Neuherberg, Germany
- Medical Genetics Center (Medizinisch Genetisches Zentrum), Munich 80335, Germany
- Jonathan Casper
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Hiram Clawson
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Melissa S Cline
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Mark Diekhans
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Clayton Fischer
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Glenn Hickey
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian T Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Christopher M Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Phillipe Le Mercier
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, CMU, 1 Michel Servet, 1211 Geneva 4, Switzerland
- Karen H Miga
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Luis R Nassar
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Parisa Nejad
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Benedict Paten
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Gerardo Perez
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Daniel Schmelter
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Matthew L Speir
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brittney D Wick
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ann S Zweig
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- David Haussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- W James Kent
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Maximilian Haeussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
3
Kuderna LFK, Ulirsch JC, Rashid S, Ameen M, Sundaram L, Hickey G, Cox AJ, Gao H, Kumar A, Aguet F, Christmas MJ, Clawson H, Haeussler M, Janiak MC, Kuhlwilm M, Orkin JD, Bataillon T, Manu S, Valenzuela A, Bergman J, Rouselle M, Silva FE, Agueda L, Blanc J, Gut M, de Vries D, Goodhead I, Harris RA, Raveendran M, Jensen A, Chuma IS, Horvath JE, Hvilsom C, Juan D, Frandsen P, Schraiber JG, de Melo FR, Bertuol F, Byrne H, Sampaio I, Farias I, Valsecchi J, Messias M, da Silva MNF, Trivedi M, Rossi R, Hrbek T, Andriaholinirina N, Rabarivola CJ, Zaramody A, Jolly CJ, Phillips-Conroy J, Wilkerson G, Abee C, Simmons JH, Fernandez-Duque E, Kanthaswamy S, Shiferaw F, Wu D, Zhou L, Shao Y, Zhang G, Keyyu JD, Knauf S, Le MD, Lizano E, Merker S, Navarro A, Nadler T, Khor CC, Lee J, Tan P, Lim WK, Kitchener AC, Zinner D, Gut I, Melin AD, Guschanski K, Schierup MH, Beck RMD, Karakikes I, Wang KC, Umapathy G, Roos C, Boubli JP, Siepel A, Kundaje A, Paten B, Lindblad-Toh K, Rogers J, Marques Bonet T, Farh KKH. Identification of constrained sequence elements across 239 primate genomes. Nature 2024; 625:735-742. [PMID: 38030727 PMCID: PMC10808062 DOI: 10.1038/s41586-023-06798-8] [Received: 02/09/2023] [Accepted: 10/30/2023] [Indexed: 12/01/2023]
Abstract
Noncoding DNA is central to our understanding of human gene regulation and complex diseases [1,2], and measuring the evolutionary sequence constraint can establish the functional relevance of putative regulatory elements in the human genome [3-9]. Identifying the genomic elements that have become constrained specifically in primates has been hampered by the faster evolution of noncoding DNA compared to protein-coding DNA [10], the relatively short timescales separating primate species [11], and the previously limited availability of whole-genome sequences [12]. Here we construct a whole-genome alignment of 239 species, representing nearly half of all extant species in the primate order. Using this resource, we identified human regulatory elements that are under selective constraint across primates and other mammals at a 5% false discovery rate. We detected 111,318 DNase I hypersensitivity sites and 267,410 transcription factor binding sites that are constrained specifically in primates but not across other placental mammals and validate their cis-regulatory effects on gene expression. These regulatory elements are enriched for human genetic variants that affect gene expression and complex traits and diseases. Our results highlight the important role of recent evolution in regulatory sequence elements differentiating primates, including humans, from other placental mammals.
Affiliation(s)
- Lukas F K Kuderna
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Jacob C Ulirsch
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Sabrina Rashid
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Mohamed Ameen
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Laksshman Sundaram
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Glenn Hickey
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Anthony J Cox
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Hong Gao
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Arvind Kumar
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Francois Aguet
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Matthew J Christmas
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Hiram Clawson
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Mareike C Janiak
- School of Science, Engineering and Environment, University of Salford, Salford, UK
- Martin Kuhlwilm
- Department of Evolutionary Anthropology, University of Vienna, Vienna, Austria
- Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Vienna, Austria
- Joseph D Orkin
- Département d'Anthropologie, Université de Montréal, Montréal, Quebec, Canada
- Thomas Bataillon
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Shivakumara Manu
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India
- Alejandro Valenzuela
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Juraj Bergman
- Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
- Section for Ecoinformatics and Biodiversity, Department of Biology, Aarhus University, Aarhus, Denmark
- Felipe Ennes Silva
- Research Group on Primate Biology and Conservation, Mamirauá Institute for Sustainable Development, Tefé, Brazil
- Evolutionary Biology and Ecology (EBE), Département de Biologie des Organismes, Université libre de Bruxelles (ULB), Brussels, Belgium
- Lidia Agueda
- Centro Nacional de Analisis Genomico (CNAG), Barcelona, Spain
- Julie Blanc
- Centro Nacional de Analisis Genomico (CNAG), Barcelona, Spain
- Marta Gut
- Centro Nacional de Analisis Genomico (CNAG), Barcelona, Spain
- Dorien de Vries
- School of Science, Engineering and Environment, University of Salford, Salford, UK
- Ian Goodhead
- School of Science, Engineering and Environment, University of Salford, Salford, UK
- R Alan Harris
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Muthuswamy Raveendran
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Axel Jensen
- Department of Ecology and Genetics, Animal Ecology, Uppsala University, Uppsala, Sweden
- Julie E Horvath
- North Carolina Museum of Natural Sciences, Raleigh, NC, USA
- Department of Biological and Biomedical Sciences, North Carolina Central University, Durham, NC, USA
- Department of Biological Sciences, North Carolina State University, Raleigh, NC, USA
- Department of Evolutionary Anthropology, Duke University, Durham, NC, USA
- Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- David Juan
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Joshua G Schraiber
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
- Fabrício Bertuol
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Brazil
- Hazel Byrne
- Department of Anthropology, University of Utah, Salt Lake City, UT, USA
- Izeni Farias
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Brazil
- João Valsecchi
- Research Group on Terrestrial Vertebrate Ecology, Mamirauá Institute for Sustainable Development, Tefé, Brazil
- Rede de Pesquisa em Diversidade, Conservação e Uso da Fauna da Amazônia - RedeFauna, Manaus, Brazil
- Comunidad de Manejo de Fauna Silvestre en la Amazonía y en Latinoamérica-ComFauna, Iquitos, Peru
- Malu Messias
- Universidade Federal de Rondônia, Porto Velho, Brazil
- Mihir Trivedi
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India
- Rogerio Rossi
- Instituto de Biociências, Universidade Federal do Mato Grosso, Cuiabá, Brazil
- Tomas Hrbek
- Universidade Federal do Amazonas, Departamento de Genética, Laboratório de Evolução e Genética Animal (LEGAL), Manaus, Brazil
- Department of Biology, Trinity University, San Antonio, TX, USA
- Nicole Andriaholinirina
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, Madagascar
- Clément J Rabarivola
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, Madagascar
- Alphonse Zaramody
- Life Sciences and Environment, Technology and Environment of Mahajanga, University of Mahajanga, Mahajanga, Madagascar
- Clifford J Jolly
- Department of Anthropology, New York University, New York, NY, USA
- Jane Phillips-Conroy
- Department of Neuroscience, Washington University School of Medicine in St Louis, St Louis, MO, USA
- Gregory Wilkerson
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Bastrop, TX, USA
- Christian Abee
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Bastrop, TX, USA
- Joe H Simmons
- Keeling Center for Comparative Medicine and Research, MD Anderson Cancer Center, Bastrop, TX, USA
- Sree Kanthaswamy
- School of Interdisciplinary Forensics, Arizona State University, Phoenix, AZ, USA
- California National Primate Research Center, University of California, Davis, CA, USA
- Fekadu Shiferaw
- Guinea Worm Eradication Program, The Carter Center Ethiopia, Addis Ababa, Ethiopia
- Dongdong Wu
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Long Zhou
- Center for Evolutionary and Organismal Biology, Zhejiang University School of Medicine, Hangzhou, China
- Yong Shao
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Guojie Zhang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Center for Evolutionary and Organismal Biology, Zhejiang University School of Medicine, Hangzhou, China
- Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Liangzhu Laboratory, Zhejiang University Medical Center, Hangzhou, China
- Women's Hospital, School of Medicine, Zhejiang University, Hangzhou, China
- Julius D Keyyu
- Tanzania Wildlife Research Institute (TAWIRI), Arusha, Tanzania
- Sascha Knauf
- Institute of International Animal Health/One Health, Friedrich-Loeffler-Institut, Federal Research Institute for Animal Health, Greifswald-Insel Riems, Germany
- Professorship for International Animal Health/One Health, Faculty of Veterinary Medicine, Justus Liebig University, Giessen, Germany
- Minh D Le
- Department of Environmental Ecology, Faculty of Environmental Sciences, University of Science and Central Institute for Natural Resources and Environmental Studies, Vietnam National University, Hanoi, Vietnam
- Esther Lizano
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain
- Stefan Merker
- Department of Zoology, State Museum of Natural History Stuttgart, Stuttgart, Germany
- Arcadi Navarro
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Barcelonaβeta Brain Research Center, Pasqual Maragall Foundation, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Tilo Nadler
- Cuc Phuong Commune, Nho Quan District, Vietnam
- Chiea Chuen Khor
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- Patrick Tan
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
- SingHealth Duke-NUS Institute of Precision Medicine (PRISM), Singapore, Singapore
- Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore, Singapore
- Weng Khong Lim
- SingHealth Duke-NUS Institute of Precision Medicine (PRISM), Singapore, Singapore
- Cancer and Stem Cell Biology Program, Duke-NUS Medical School, Singapore, Singapore
- SingHealth Duke-NUS Genomic Medicine Centre, Singapore, Singapore
- Andrew C Kitchener
- Department of Natural Sciences, National Museums Scotland, Edinburgh, UK
- School of Geosciences, Edinburgh, UK
- Dietmar Zinner
- Cognitive Ethology Laboratory, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany
- Department of Primate Cognition, Georg-August-Universität Göttingen, Göttingen, Germany
- Leibniz ScienceCampus Primate Cognition, Göttingen, Germany
- Ivo Gut
- Centro Nacional de Analisis Genomico (CNAG), Barcelona, Spain
- Amanda D Melin
- Department of Anthropology and Archaeology, University of Calgary, Calgary, Alberta, Canada
- Department of Medical Genetics, University of Calgary, Calgary, Alberta, Canada
- Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Alberta, Canada
- Katerina Guschanski
- Department of Ecology and Genetics, Animal Ecology, Uppsala University, Uppsala, Sweden
- Institute of Ecology and Evolution, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Robin M D Beck
- School of Science, Engineering and Environment, University of Salford, Salford, UK
- Ioannis Karakikes
- Cardiovascular Institute, Stanford University, Stanford, CA, USA
- Department of Cardiothoracic Surgery, Stanford University, Stanford, CA, USA
- Kevin C Wang
- Department of Cancer Biology, Stanford University, Stanford, CA, USA
- Department of Dermatology, Stanford University School of Medicine, Stanford, CA, USA
- Veterans Affairs Palo Alto Healthcare System, Palo Alto, CA, USA
- Govindhaswamy Umapathy
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
- Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India
- Christian Roos
- Gene Bank of Primates and Primate Genetics Laboratory, German Primate Center, Leibniz Institute for Primate Research, Göttingen, Germany
- Jean P Boubli
- School of Science, Engineering and Environment, University of Salford, Salford, UK
- Adam Siepel
- Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Anshul Kundaje
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Department of Genetics, Stanford University, Stanford, CA, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jeffrey Rogers
- Human Genome Sequencing Center and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Tomas Marques Bonet
- IBE, Institute of Evolutionary Biology (UPF-CSIC), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Centro Nacional de Analisis Genomico (CNAG), Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
- Kyle Kai-How Farh
- Illumina Artificial Intelligence Laboratory, Illumina, San Diego, CA, USA
4
Clawson H, Lee BT, Raney BJ, Barber GP, Casper J, Diekhans M, Fischer C, Gonzalez JN, Hinrichs AS, Lee CM, Nassar LR, Perez G, Wick B, Schmelter D, Speir ML, Armstrong J, Zweig AS, Kuhn RM, Kirilenko BM, Hiller M, Haussler D, Kent WJ, Haeussler M. GenArk: towards a million UCSC genome browsers. Genome Biol 2023; 24:217. [PMID: 37784172 PMCID: PMC10544498 DOI: 10.1186/s13059-023-03057-x] [Received: 03/15/2023] [Accepted: 09/11/2023] [Indexed: 10/04/2023]
Abstract
Interactive graphical genome browsers are essential tools in genomics, but they do not contain all the recent genome assemblies. We create Genome Archive (GenArk) collection of UCSC Genome Browsers from NCBI assemblies. Built on our established track hub system, this enables fast visualization of annotations. Assemblies come with gene models, repeat masks, BLAT, and in silico PCR. Users can add annotations via track hubs and custom tracks. We can bulk-import third-party resources, demonstrated with TOGA and Ensembl gene models for hundreds of assemblies. Three thousand two hundred sixty-nine GenArk assemblies are listed at https://hgdownload.soe.ucsc.edu/hubs/ and can be searched for on the Genome Browser gateway page.
Affiliation(s)
- Hiram Clawson
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Brian T Lee
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Brian J Raney
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Galt P Barber
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Jonathan Casper
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Mark Diekhans
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Clay Fischer
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Angie S Hinrichs
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Christopher M Lee
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Luis R Nassar
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Gerardo Perez
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Brittney Wick
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Daniel Schmelter
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Matthew L Speir
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Joel Armstrong
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Ann S Zweig
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Robert M Kuhn
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- Bogdan M Kirilenko
- LOEWE Centre for Translational Biodiversity Genomics, Senckenberganlage 25, 60325, Frankfurt, Germany
- Senckenberg Research Institute, Senckenberganlage 25, 60325, Frankfurt, Germany
- Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438, Frankfurt, Germany
- Michael Hiller
- LOEWE Centre for Translational Biodiversity Genomics, Senckenberganlage 25, 60325, Frankfurt, Germany
- Senckenberg Research Institute, Senckenberganlage 25, 60325, Frankfurt, Germany
- Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438, Frankfurt, Germany
- David Haussler
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
- W James Kent
- Genomics Institute, University of California, Santa Cruz, CA, 95064, USA
5
Christmas MJ, Kaplow IM, Genereux DP, Dong MX, Hughes GM, Li X, Sullivan PF, Hindle AG, Andrews G, Armstrong JC, Bianchi M, Breit AM, Diekhans M, Fanter C, Foley NM, Goodman DB, Goodman L, Keough KC, Kirilenko B, Kowalczyk A, Lawless C, Lind AL, Meadows JRS, Moreira LR, Redlich RW, Ryan L, Swofford R, Valenzuela A, Wagner F, Wallerman O, Brown AR, Damas J, Fan K, Gatesy J, Grimshaw J, Johnson J, Kozyrev SV, Lawler AJ, Marinescu VD, Morrill KM, Osmanski A, Paulat NS, Phan BN, Reilly SK, Schäffer DE, Steiner C, Supple MA, Wilder AP, Wirthlin ME, Xue JR, Birren BW, Gazal S, Hubley RM, Koepfli KP, Marques-Bonet T, Meyer WK, Nweeia M, Sabeti PC, Shapiro B, Smit AFA, Springer MS, Teeling EC, Weng Z, Hiller M, Levesque DL, Lewin HA, Murphy WJ, Navarro A, Paten B, Pollard KS, Ray DA, Ruf I, Ryder OA, Pfenning AR, Lindblad-Toh K, Karlsson EK, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue 
JR, Zhang X. Evolutionary constraint and innovation across hundreds of placental mammals. Science 2023. [PMID: 37104599 DOI: 10.1126/science.abn3943] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
Zoonomia is the largest comparative genomics resource for mammals produced to date. By aligning genomes for 240 species, we identify bases that, when mutated, are likely to affect fitness and alter disease risk. At least 332 million bases (~10.7%) in the human genome are unusually conserved across species (evolutionarily constrained) relative to neutrally evolving repeats, and 4552 ultraconserved elements are nearly perfectly conserved. Of 101 million significantly constrained single bases, 80% are outside protein-coding exons and half have no functional annotations in the Encyclopedia of DNA Elements (ENCODE) resource. Changes in genes and regulatory elements are associated with exceptional mammalian traits, such as hibernation, that could inform therapeutic development. Earth's vast and imperiled biodiversity offers distinctive power for identifying genetic variants that affect genome function and organismal phenotypes.
Affiliation(s)
- Matthew J Christmas
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Irene M Kaplow
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | | | - Michael X Dong
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Graham M Hughes
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Xue Li
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Morningside Graduate School of Biomedical Sciences, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Patrick F Sullivan
- Department of Genetics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Allyson G Hindle
- School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
| | - Gregory Andrews
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Joel C Armstrong
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Matteo Bianchi
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Ana M Breit
- School of Biology and Ecology, University of Maine, Orono, ME 04469, USA
| | - Mark Diekhans
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Cornelia Fanter
- School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
| | - Nicole M Foley
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
| | - Daniel B Goodman
- Department of Microbiology and Immunology, University of California San Francisco, San Francisco, CA 94143, USA
| | | | - Kathleen C Keough
- Fauna Bio, Inc., Emeryville, CA 94608, USA
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
- Gladstone Institutes, San Francisco, CA 94158, USA
| | - Bogdan Kirilenko
- Faculty of Biosciences, Goethe-University, 60438 Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
| | - Amanda Kowalczyk
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Colleen Lawless
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Abigail L Lind
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
- Gladstone Institutes, San Francisco, CA 94158, USA
| | - Jennifer R S Meadows
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Lucas R Moreira
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Ruby W Redlich
- Department of Biological Sciences, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Louise Ryan
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Ross Swofford
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Alejandro Valenzuela
- Department of Experimental and Health Sciences, Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, 08003 Barcelona, Spain
| | - Franziska Wagner
- Museum of Zoology, Senckenberg Natural History Collections Dresden, 01109 Dresden, Germany
| | - Ola Wallerman
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Ashley R Brown
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Joana Damas
- The Genome Center, University of California Davis, Davis, CA 95616, USA
| | - Kaili Fan
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
| | - Jenna Grimshaw
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
| | - Jeremy Johnson
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Sergey V Kozyrev
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Alyssa J Lawler
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Biological Sciences, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Voichita D Marinescu
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Kathleen M Morrill
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Morningside Graduate School of Biomedical Sciences, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Austin Osmanski
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Nicole S Paulat
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
| | - BaDoi N Phan
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Steven K Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
| | - Daniel E Schäffer
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Cynthia Steiner
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
| | - Megan A Supple
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Aryn P Wilder
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
| | - Morgan E Wirthlin
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Allen Institute for Brain Science, Seattle, WA 98109, USA
| | - James R Xue
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
| | - Bruce W Birren
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Steven Gazal
- Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | | | - Klaus-Peter Koepfli
- Center for Species Survival, Smithsonian's National Zoo and Conservation Biology Institute, Washington, DC 20008, USA
- Computer Technologies Laboratory, ITMO University, St. Petersburg 197101, Russia
- Smithsonian-Mason School of Conservation, George Mason University, Front Royal, VA 22630, USA
| | - Tomas Marques-Bonet
- Catalan Institution of Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), 08036 Barcelona, Spain
- Department of Medicine and Life Sciences, Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Barcelona, Spain
| | - Wynn K Meyer
- Department of Biological Sciences, Lehigh University, Bethlehem, PA 18015, USA
| | - Martin Nweeia
- Department of Comprehensive Care, School of Dental Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
- Department of Vertebrate Zoology, Canadian Museum of Nature, Ottawa, Ontario K2P 2R1, Canada
- Department of Vertebrate Zoology, Smithsonian Institution, Washington, DC 20002, USA
- Narwhal Genome Initiative, Department of Restorative Dentistry and Biomaterials Sciences, Harvard School of Dental Medicine, Boston, MA 02115, USA
| | - Pardis C Sabeti
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Howard Hughes Medical Institute, Harvard University, Cambridge, MA 02138, USA
| | - Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | | | - Mark S Springer
- Department of Evolution, Ecology and Organismal Biology, University of California Riverside, Riverside, CA 92521, USA
| | - Emma C Teeling
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Michael Hiller
- Faculty of Biosciences, Goethe-University, 60438 Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
| | | | - Harris A Lewin
- The Genome Center, University of California Davis, Davis, CA 95616, USA
- Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA
- John Muir Institute for the Environment, University of California Davis, Davis, CA 95616, USA
| | - William J Murphy
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
| | - Arcadi Navarro
- Catalan Institution of Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
- Department of Medicine and Life Sciences, Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, 08003 Barcelona, Spain
- BarcelonaBeta Brain Research Center, Pasqual Maragall Foundation, 08005 Barcelona, Spain
- CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), 08003 Barcelona, Spain
| | - Benedict Paten
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Katherine S Pollard
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
- Gladstone Institutes, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
| | - David A Ray
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
| | - Irina Ruf
- Division of Messel Research and Mammalogy, Senckenberg Research Institute and Natural History Museum Frankfurt, 60325 Frankfurt am Main, Germany
| | - Oliver A Ryder
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
- Department of Evolution, Behavior and Ecology, School of Biological Sciences, University of California San Diego, La Jolla, CA 92039, USA
| | - Andreas R Pfenning
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Kerstin Lindblad-Toh
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Elinor K Karlsson
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA 01605, USA
|
6
|
Kaplow IM, Lawler AJ, Schäffer DE, Srinivasan C, Sestili HH, Wirthlin ME, Phan BN, Prasad K, Brown AR, Zhang X, Foley K, Genereux DP, Karlsson EK, Lindblad-Toh K, Meyer WK, Pfenning AR, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. Relating enhancer genetic variation across mammals to complex phenotypes using machine learning. Science 2023; 380:eabm7993. [PMID: 37104615 DOI: 10.1126/science.abm7993] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/29/2023]
Abstract
Protein-coding differences between species often fail to explain phenotypic diversity, suggesting the involvement of genomic elements that regulate gene expression such as enhancers. Identifying associations between enhancers and phenotypes is challenging because enhancer activity can be tissue-dependent and functionally conserved despite low sequence conservation. We developed the Tissue-Aware Conservation Inference Toolkit (TACIT) to associate candidate enhancers with species' phenotypes using predictions from machine learning models trained on specific tissues. Applying TACIT to associate motor cortex and parvalbumin-positive interneuron enhancers with neurological phenotypes revealed dozens of enhancer-phenotype associations, including brain size-associated enhancers that interact with genes implicated in microcephaly or macrocephaly. TACIT provides a foundation for identifying enhancers associated with the evolution of any convergently evolved phenotype in any large group of species with aligned genomes.
Affiliation(s)
- Irene M Kaplow
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Alyssa J Lawler
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Daniel E Schäffer
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Chaitanya Srinivasan
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Heather H Sestili
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Morgan E Wirthlin
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
| | - BaDoi N Phan
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
| | - Kavya Prasad
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Ashley R Brown
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Xiaomeng Zhang
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Kathleen Foley
- Department of Biological Sciences, Lehigh University, Bethlehem, PA, USA
| | - Diane P Genereux
- Broad Institute, Cambridge, MA, USA
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Elinor K Karlsson
- Broad Institute, Cambridge, MA, USA
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Kerstin Lindblad-Toh
- Broad Institute, Cambridge, MA, USA
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Wynn K Meyer
- Department of Biological Sciences, Lehigh University, Bethlehem, PA, USA
| | - Andreas R Pfenning
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Biology, Carnegie Mellon University, Pittsburgh, PA, USA
|
7
|
Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales AE, Ahmed AW, Kontopoulos DG, Hilgers L, Lindblad-Toh K, Karlsson EK, Hiller M, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. Integrating gene annotation with orthology inference at scale. Science 2023; 380:eabn3107. [PMID: 37104600 DOI: 10.1126/science.abn3107] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/29/2023]
Abstract
Annotating coding genes and inferring orthologs are two classical challenges in genomics and evolutionary biology that have traditionally been approached separately, limiting scalability. We present TOGA (Tool to infer Orthologs from Genome Alignments), a method that integrates structural gene annotation and orthology inference. TOGA implements a different paradigm to infer orthologous loci, improves ortholog detection and annotation of conserved genes compared with state-of-the-art methods, and handles even highly fragmented assemblies. TOGA scales to hundreds of genomes, which we demonstrate by applying it to 488 placental mammal and 501 bird assemblies, creating the largest comparative gene resources so far. Additionally, TOGA detects gene losses, enables selection screens, and automatically provides a superior measure of mammalian genome quality. TOGA is a powerful and scalable method to annotate and compare genes in the genomic era.
Affiliation(s)
- Bogdan M Kirilenko
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
| | - Chetan Munegowda
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
| | - Ekaterina Osipova
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
| | - David Jebb
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
| | - Virag Sharma
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
| | - Moritz Blumer
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
| | - Ariadna E Morales
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
| | - Alexis-Walid Ahmed
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
| | - Dimitrios-Georgios Kontopoulos
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
| | - Leon Hilgers
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
| | - Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 751 32 Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Elinor K Karlsson
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Michael Hiller
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
|
8
|
Wilder AP, Supple MA, Subramanian A, Mudide A, Swofford R, Serres-Armero A, Steiner C, Koepfli KP, Genereux DP, Karlsson EK, Lindblad-Toh K, Marques-Bonet T, Munoz Fuentes V, Foley K, Meyer WK, Ryder OA, Shapiro B, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. The contribution of historical processes to contemporary extinction risk in placental mammals. Science 2023; 380:eabn5856. [PMID: 37104572 PMCID: PMC10184782 DOI: 10.1126/science.abn5856] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/29/2023]
Abstract
Species persistence can be influenced by the amount, type, and distribution of diversity across the genome, suggesting a potential relationship between historical demography and resilience. In this study, we surveyed genetic variation across single genomes of 240 mammals that compose the Zoonomia alignment to evaluate how historical effective population size (Ne) affects heterozygosity and deleterious genetic load and how these factors may contribute to extinction risk. We find that species with smaller historical Ne carry a proportionally larger burden of deleterious alleles owing to long-term accumulation and fixation of genetic load and have a higher risk of extinction. This suggests that historical demography can inform contemporary resilience. Models that included genomic data were predictive of species' conservation status, suggesting that, in the absence of adequate census or ecological data, genomic information may provide an initial risk assessment.
Affiliation(s)
- Aryn P Wilder
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
| | - Megan A Supple
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA 95064, USA
- Howard Hughes Medical Institute, University of California, Santa Cruz, CA 95064, USA
| | | | | | - Ross Swofford
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Aitor Serres-Armero
- Institute of Evolutionary Biology, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona 08003, Spain
| | - Cynthia Steiner
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
| | - Klaus-Peter Koepfli
- Smithsonian-Mason School of Conservation, George Mason University, Front Royal, VA 22630, USA
- Center for Species Survival, Smithsonian Conservation Biology Institute, National Zoological Park, Washington, DC 20008, USA
- Computer Technologies Laboratory, ITMO University, St. Petersburg 197101, Russia
| | | | - Elinor K Karlsson
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
| | - Kerstin Lindblad-Toh
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala 751 32, Sweden
| | - Tomas Marques-Bonet
- Institute of Evolutionary Biology, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona 08003, Spain
- Catalan Institution of Research and Advanced Studies, Barcelona 08010, Spain
- Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona 08028, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona 08193, Spain
| | - Violeta Munoz Fuentes
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Kathleen Foley
- College of Law, University of Iowa, Iowa City, IA 52242, USA
- Department of Biological Sciences, Lehigh University, Bethlehem, PA 18015, USA
| | - Wynn K Meyer
- Department of Biological Sciences, Lehigh University, Bethlehem, PA 18015, USA
| | - Oliver A Ryder
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
- Department of Evolution, Behavior and Ecology, Division of Biology, University of California, San Diego, La Jolla, CA 92039, USA
| | - Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA 95064, USA
- Howard Hughes Medical Institute, University of California, Santa Cruz, CA 95064, USA
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
9
Andrews G, Fan K, Pratt HE, Phalke N, Karlsson EK, Lindblad-Toh K, Gazal S, Moore JE, Weng Z, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. Mammalian evolution of human cis-regulatory elements and transcription factor binding sites. Science 2023; 380:eabn7930. [PMID: 37104580 DOI: 10.1126/science.abn7930] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/29/2023]
Abstract
Understanding the regulatory landscape of the human genome is a long-standing objective of modern biology. Using the reference-free alignment across 241 mammalian genomes produced by the Zoonomia Consortium, we charted evolutionary trajectories for 0.92 million human candidate cis-regulatory elements (cCREs) and 15.6 million human transcription factor binding sites (TFBSs). We identified 439,461 cCREs and 2,024,062 TFBSs under evolutionary constraint. Genes near constrained elements perform fundamental cellular processes, whereas genes near primate-specific elements are involved in environmental interaction, including odor perception and immune response. About 20% of TFBSs are transposable element-derived and exhibit intricate patterns of gains and losses during primate evolution whereas sequence variants associated with complex traits are enriched in constrained TFBSs. Our annotations illuminate the regulatory functions of the human genome.
Affiliation(s)
- Gregory Andrews
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Kaili Fan
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Henry E Pratt
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Nishigandha Phalke
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Elinor K Karlsson
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
  - Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA 01605, USA
- Kerstin Lindblad-Toh
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75132 Uppsala, Sweden
- Steven Gazal
  - Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
  - Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Jill E Moore
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Zhiping Weng
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
10
Foley NM, Mason VC, Harris AJ, Bredemeyer KR, Damas J, Lewin HA, Eizirik E, Gatesy J, Karlsson EK, Lindblad-Toh K, Springer MS, Murphy WJ, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. A genomic timescale for placental mammal evolution. Science 2023; 380:eabl8189. [PMID: 37104581 DOI: 10.1126/science.abl8189] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/29/2023]
Abstract
The precise pattern and timing of speciation events that gave rise to all living placental mammals remain controversial. We provide a comprehensive phylogenetic analysis of genetic variation across an alignment of 241 placental mammal genome assemblies, addressing prior concerns regarding limited genomic sampling across species. We compared neutral genome-wide phylogenomic signals using concatenation and coalescent-based approaches, interrogated phylogenetic variation across chromosomes, and analyzed extensive catalogs of structural variants. Interordinal relationships exhibit relatively low rates of phylogenomic conflict across diverse datasets and analytical methods. Conversely, X-chromosome versus autosome conflicts characterize multiple independent clades that radiated during the Cenozoic. Genomic time trees reveal an accumulation of cladogenic events before and immediately after the Cretaceous-Paleogene (K-Pg) boundary, implying important roles for Cretaceous continental vicariance and the K-Pg extinction in the placental radiation.
Affiliation(s)
- Nicole M Foley
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Victor C Mason
  - Institute of Cell Biology, University of Bern, Bern, Switzerland
- Andrew J Harris
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
  - Interdisciplinary Program in Genetics and Genomics, Texas A&M University, College Station, TX, USA
- Kevin R Bredemeyer
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
  - Interdisciplinary Program in Genetics and Genomics, Texas A&M University, College Station, TX, USA
- Joana Damas
  - The Genome Center, University of California, Davis, CA, USA
- Harris A Lewin
  - The Genome Center, University of California, Davis, CA, USA
  - Department of Evolution and Ecology, University of California, Davis, CA, USA
- Eduardo Eizirik
  - School of Health and Life Sciences, Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, Brazil
- John Gatesy
  - Division of Vertebrate Zoology, American Museum of Natural History, New York, NY, USA
- Elinor K Karlsson
  - Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
  - Program in Molecular Medicine, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Kerstin Lindblad-Toh
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
  - Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Mark S Springer
  - Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA, USA
- William J Murphy
  - Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
  - Interdisciplinary Program in Genetics and Genomics, Texas A&M University, College Station, TX, USA
11
Xue JR, Mackay-Smith A, Mouri K, Garcia MF, Dong MX, Akers JF, Noble M, Li X, Lindblad-Toh K, Karlsson EK, Noonan JP, Capellini TD, Brennand KJ, Tewhey R, Sabeti PC, Reilly SK, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. The functional and evolutionary impacts of human-specific deletions in conserved elements. Science 2023; 380:eabn2253. [PMID: 37104592 DOI: 10.1126/science.abn2253] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/29/2023]
Abstract
Conserved genomic sequences disrupted in humans may underlie uniquely human phenotypic traits. We identified and characterized 10,032 human-specific conserved deletions (hCONDELs). These short (average 2.56 base pairs) deletions are enriched for human brain functions across genetic, epigenomic, and transcriptomic datasets. Using massively parallel reporter assays in six cell types, we discovered 800 hCONDELs conferring significant differences in regulatory activity, half of which enhance rather than disrupt regulatory function. We highlight several hCONDELs with putative human-specific effects on brain development, including HDAC5, CPEB4, and PPP2CA. Reverting an hCONDEL to the ancestral sequence alters the expression of LOXL2 and developmental genes involved in myelination and synaptic function. Our data provide a rich resource to investigate the evolutionary mechanisms driving new traits in humans and other species.
Affiliation(s)
- James R Xue
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for System Biology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Ava Mackay-Smith
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Michael X Dong
  - Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Jared F Akers
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Mark Noble
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Xue Li
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA, USA
  - Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA, USA
- Kerstin Lindblad-Toh
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Elinor K Karlsson
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA, USA
  - Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA, USA
- James P Noonan
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA
  - Department of Neuroscience, Yale School of Medicine, New Haven, CT, USA
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
- Terence D Capellini
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Kristen J Brennand
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA
  - Department of Psychiatry, Yale University, New Haven, CT, USA
- Ryan Tewhey
  - The Jackson Laboratory, Bar Harbor, ME, USA
  - Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
  - Graduate School of Biomedical Sciences, Tufts University School of Medicine, Boston, MA, USA
- Pardis C Sabeti
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Center for System Biology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
  - Department of Immunology and Infectious Disease, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Steven K Reilly
  - Department of Genetics, Yale School of Medicine, New Haven, CT, USA
12
Clawson H, Lee BT, Raney BJ, Barber GP, Casper J, Diekhans M, Fischer C, Gonzalez JN, Hinrichs AS, Lee CM, Nassar LR, Perez G, Wick B, Schmelter D, Speir ML, Armstrong J, Zweig AS, Kuhn RM, Kirilenko BM, Hiller M, Haussler D, Kent WJ, Haeussler M. GenArk: Towards a million UCSC Genome Browsers. Res Sq 2023:rs.3.rs-2697398. [PMID: 37066427 PMCID: PMC10104252 DOI: 10.21203/rs.3.rs-2697398/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/18/2023]
Abstract
Interactive graphical genome browsers are essential tools for biologists working with DNA sequences. Although tens of thousands of new genome assemblies have become available over the last decade, accessibility is limited by the work involved in manually creating browsers and curating annotations. The results can push the limits of data storage infrastructure. To facilitate managing this increasing number of genome assemblies, we created the Genome Archive (GenArk) collection of UCSC Genome Browsers from assemblies hosted at NCBI(1). Built on our established assembly hub system, this collection enables fast, on-demand visualization of chromosome regions without requiring a database server. Available annotations include gene models, some mapped through whole-genome alignments, repeat masks, GC content, and others. We also modified our popular BLAT(2) aligner and in-silico PCR to support a large number of genomes using limited RAM. Users can upload additional annotations themselves via track hubs(3) and custom tracks. We can import more annotations in bulk from third-party resources, demonstrated here with TOGA(4) gene models. 2,430 GenArk assemblies are listed at https://hgdownload.soe.ucsc.edu/hubs/ and can be found by searching on the main UCSC gateway page. We will continue to add human high-quality assemblies and for other organisms, we are looking forward to receiving requests from the research community for ever more browsers and whole-genome alignments via http://genome.ucsc.edu/assemblyRequest.html.
Affiliation(s)
- Hiram Clawson
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Brian T Lee
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Brian J Raney
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Galt P Barber
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Jonathan Casper
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Mark Diekhans
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Clay Fischer
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Christopher M Lee
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Luis R Nassar
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Gerardo Perez
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Brittney Wick
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Daniel Schmelter
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Matthew L Speir
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Joel Armstrong
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Ann S Zweig
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Robert M Kuhn
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Bogdan M Kirilenko
  - LOEWE Centre for Translational Biodiversity Genomics, Senckenberganlage 25, 60325 Frankfurt, Germany
  - Senckenberg Research Institute, Senckenberganlage 25, 60325 Frankfurt, Germany
  - Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438 Frankfurt, Germany
- Michael Hiller
  - LOEWE Centre for Translational Biodiversity Genomics, Senckenberganlage 25, 60325 Frankfurt, Germany
  - Senckenberg Research Institute, Senckenberganlage 25, 60325 Frankfurt, Germany
  - Institute of Cell Biology and Neuroscience, Faculty of Biosciences, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438 Frankfurt, Germany
- David Haussler
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- W James Kent
  - Genomics Institute, University of California, Santa Cruz, CA 95064, USA
13
Nassar LR, Barber GP, Benet-Pagès A, Casper J, Clawson H, Diekhans M, Fischer C, Gonzalez JN, Hinrichs A, Lee B, Lee C, Muthuraman P, Nguy B, Pereira T, Nejad P, Perez G, Raney B, Schmelter D, Speir M, Wick B, Zweig A, Haussler D, Kuhn R, Haeussler M, Kent W. The UCSC Genome Browser database: 2023 update. Nucleic Acids Res 2022; 51:D1188-D1195. [PMID: 36420891 PMCID: PMC9825520 DOI: 10.1093/nar/gkac1072] [Citation(s) in RCA: 121] [Impact Index Per Article: 60.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2022] [Revised: 10/14/2022] [Accepted: 10/25/2022] [Indexed: 11/26/2022] Open
Abstract
The UCSC Genome Browser (https://genome.ucsc.edu) is an omics data consolidator, graphical viewer, and general bioinformatics resource that continues to serve the community as it enters its 23rd year. This year has seen an emphasis in clinical data, with new tracks and an expanded Recommended Track Sets feature on hg38 as well as the addition of a single cell track group. SARS-CoV-2 continues to remain a focus, with regular annotation updates to the browser and continued curation of our phylogenetic sequence placing tool, hgPhyloPlace, whose tree has now reached over 12M sequences. Our GenArk resource has also grown, offering over 2500 hubs and a system for users to request any absent assemblies. We have expanded our bigBarChart display type and created new ways to visualize data via bigRmsk and dynseq display. Displaying custom annotations is now easier due to our chromAlias system which eliminates the requirement for renaming sequence names to the UCSC standard. Users involved in data generation may also be interested in our new tools and trackDb settings which facilitate the creation and display of their custom annotations.
Affiliation(s)
- Luis R Nassar
  - To whom correspondence should be addressed. Tel: +1 305 205 9160;
- Galt P Barber
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Anna Benet-Pagès
  - Institute of Neurogenomics, Helmholtz Zentrum München GmbH - German Research Center for Environmental Health, 85764 Neuherberg, Germany
  - Medical Genetics Center (Medizinisch Genetisches Zentrum), Munich 80335, Germany
- Jonathan Casper
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Hiram Clawson
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Mark Diekhans
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Clay Fischer
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian T Lee
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Christopher M Lee
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Pranav Muthuraman
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Beagan Nguy
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Tiana Pereira
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Parisa Nejad
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Gerardo Perez
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian J Raney
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Daniel Schmelter
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Matthew L Speir
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brittney D Wick
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ann S Zweig
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- David Haussler
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Robert M Kuhn
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Maximilian Haeussler
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- W James Kent
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
14
Doronina L, Reising O, Clawson H, Churakov G, Schmitz J. Euarchontoglires Challenged by Incomplete Lineage Sorting. Genes (Basel) 2022; 13:774. [PMID: 35627160 PMCID: PMC9141288 DOI: 10.3390/genes13050774] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2022] [Revised: 04/08/2022] [Accepted: 04/20/2022] [Indexed: 11/17/2022] Open
Abstract
Euarchontoglires, once described as Supraprimates, comprise primates, colugos, tree shrews, rodents, and lagomorphs in a clade that evolved about 90 million years ago (mya) from a shared ancestor with Laurasiatheria. The rapid speciation of groups within Euarchontoglires, and the subsequent inherent incomplete marker fixation in ancestral lineages, led to challenged attempts at phylogenetic reconstructions, particularly for the phylogenetic position of tree shrews. To resolve this conundrum, we sampled genome-wide presence/absence patterns of transposed elements (TEs) from all representatives of Euarchontoglires. This specific marker system has the advantage that phylogenetic diagnostic characters can be extracted in a nearly unbiased fashion genome-wide from reference genomes. Their insertions are virtually free of homoplasy. We simultaneously employed two computational tools, the genome presence/absence compiler (GPAC) and 2-n-way, to find a maximum of diagnostic insertions from more than 3 million TE positions. From 361 extracted diagnostic TEs, 132 provide significant support for the current resolution of Primatomorpha (Primates plus Dermoptera), 94 support the union of Euarchonta (Primates, Dermoptera, plus Scandentia), and 135 marker insertion patterns support a variety of alternative phylogenetic scenarios. Thus, whole genome-level analysis and a virtually homoplasy-free marker system offer an opportunity to finally resolve the notorious phylogenetic challenges that nature produces in rapidly diversifying groups.
Affiliation(s)
- Liliya Doronina
  - Institute of Experimental Pathology, ZMBE, University of Münster, 48149 Münster, Germany
- Olga Reising
  - Institute of Experimental Pathology, ZMBE, University of Münster, 48149 Münster, Germany
- Hiram Clawson
  - Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA
- Gennady Churakov
  - Institute of Experimental Pathology, ZMBE, University of Münster, 48149 Münster, Germany
- Jürgen Schmitz
  - Institute of Experimental Pathology, ZMBE, University of Münster, 48149 Münster, Germany
  - EvoPAD-RTG, University of Münster, 48149 Münster, Germany
15
Benet-Pagès A, Rosenbloom KR, Nassar LR, Lee CM, Raney BJ, Clawson H, Schmelter D, Casper J, Gonzalez JN, Perez G, Lee BT, Zweig AS, James Kent W, Haeussler M, Kuhn RM. Variant Interpretation: UCSC Genome Browser Recommended Track Sets. Hum Mutat 2022; 43:998-1011. [PMID: 35088925 PMCID: PMC9288501 DOI: 10.1002/humu.24335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2021] [Revised: 11/30/2021] [Accepted: 01/25/2022] [Indexed: 11/11/2022]
Abstract
The UCSC Genome Browser has been an important tool for genomics and clinical genetics since the sequence of the human genome was first released in 2000. As it has grown in scope to display more types of data, it has also grown more complicated. The data, which are dispersed at many locations worldwide, are collected into one view on the Browser, where the graphical interface presents the data in one location. This supports the expertise of the researcher to interpret variants in the genome. Because the analysis of Single Nucleotide Variants (SNVs) and Copy Number Variants (CNVs) requires interpretation of data at very different genomic scales, different data resources are required. We present here several Recommended Track Sets designed to facilitate the interpretation of variants in the clinic, offering quick access to datasets relevant to the appropriate scale.
Affiliation(s)
- Anna Benet-Pagès
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
  - Medical Genetics Center (MGZ), Munich, Germany
- Kate R Rosenbloom
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Luis R Nassar
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Christopher M Lee
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Brian J Raney
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Hiram Clawson
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Daniel Schmelter
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Jonathan Casper
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Gerardo Perez
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Brian T Lee
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Ann S Zweig
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- W James Kent
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
- Robert M Kuhn
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, 95064, USA
16
Lee BT, Barber GP, Benet-Pagès A, Casper J, Clawson H, Diekhans M, Fischer C, Gonzalez JN, Hinrichs A, Lee C, Muthuraman P, Nassar L, Nguy B, Pereira T, Perez G, Raney B, Rosenbloom K, Schmelter D, Speir M, Wick B, Zweig A, Haussler D, Kuhn R, Haeussler M, Kent W. The UCSC Genome Browser database: 2022 update. Nucleic Acids Res 2022; 50:D1115-D1122. [PMID: 34718705 PMCID: PMC8728131 DOI: 10.1093/nar/gkab959] [Citation(s) in RCA: 129] [Impact Index Per Article: 64.5] [Received: 09/15/2021] [Revised: 09/30/2021] [Accepted: 10/04/2021] [Indexed: 11/25/2022] Open
Abstract
The UCSC Genome Browser, https://genome.ucsc.edu, is a graphical viewer for exploring genome annotations. The website provides integrated tools for visualizing, comparing, analyzing, and sharing both publicly available and user-generated genomic datasets. Data highlights this year include a collection of easily accessible public hub assemblies on new organisms, now featuring BLAT alignment and PCR capabilities, and new and updated clinical tracks (gnomAD, DECIPHER, CADD, REVEL). We introduced a new Track Sets feature and enhanced variant displays to aid in the interpretation of clinical data. We also added a tool to rapidly place new SARS-CoV-2 genomes in a global phylogenetic tree enabling researchers to view the context of emerging mutations in our SARS-CoV-2 Genome Browser. Other new software focuses on usability features, including more informative mouseover displays and new fonts.
Affiliation(s)
- Brian T Lee
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Galt P Barber
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Anna Benet-Pagès
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Medical Genetics Center (Medizinisch Genetisches Zentrum), Munich 80335, Germany
- Jonathan Casper
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Hiram Clawson
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Mark Diekhans
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Clay Fischer
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Christopher M Lee
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Pranav Muthuraman
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Luis R Nassar
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Beagan Nguy
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Tiana Pereira
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Gerardo Perez
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian J Raney
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Kate R Rosenbloom
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Daniel Schmelter
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Matthew L Speir
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brittney D Wick
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ann S Zweig
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- David Haussler
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Robert M Kuhn
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Maximilian Haeussler
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- W James Kent
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
17
Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S, Uliano-Silva M, Chow W, Fungtammasan A, Kim J, Lee C, Ko BJ, Chaisson M, Gedman GL, Cantin LJ, Thibaud-Nissen F, Haggerty L, Bista I, Smith M, Haase B, Mountcastle J, Winkler S, Paez S, Howard J, Vernes SC, Lama TM, Grutzner F, Warren WC, Balakrishnan CN, Burt D, George JM, Biegler MT, Iorns D, Digby A, Eason D, Robertson B, Edwards T, Wilkinson M, Turner G, Meyer A, Kautt AF, Franchini P, Detrich HW, Svardal H, Wagner M, Naylor GJP, Pippel M, Malinsky M, Mooney M, Simbirsky M, Hannigan BT, Pesout T, Houck M, Misuraca A, Kingan SB, Hall R, Kronenberg Z, Sović I, Dunn C, Ning Z, Hastie A, Lee J, Selvaraj S, Green RE, Putnam NH, Gut I, Ghurye J, Garrison E, Sims Y, Collins J, Pelan S, Torrance J, Tracey A, Wood J, Dagnew RE, Guan D, London SE, Clayton DF, Mello CV, Friedrich SR, Lovell PV, Osipova E, Al-Ajli FO, Secomandi S, Kim H, Theofanopoulou C, Hiller M, Zhou Y, Harris RS, Makova KD, Medvedev P, Hoffman J, Masterson P, Clark K, Martin F, Howe K, Flicek P, Walenz BP, Kwak W, Clawson H, Diekhans M, Nassar L, Paten B, Kraus RHS, Crawford AJ, Gilbert MTP, Zhang G, Venkatesh B, Murphy RW, Koepfli KP, Shapiro B, Johnson WE, Di Palma F, Marques-Bonet T, Teeling EC, Warnow T, Graves JM, Ryder OA, Haussler D, O'Brien SJ, Korlach J, Lewin HA, Howe K, Myers EW, Durbin R, Phillippy AM, Jarvis ED. Towards complete and error-free genome assemblies of all vertebrate species. Nature 2021; 592:737-746. [PMID: 33911273 PMCID: PMC8081667 DOI: 10.1038/s41586-021-03451-0] [Citation(s) in RCA: 617] [Impact Index Per Article: 205.7] [Received: 05/22/2020] [Accepted: 03/12/2021] [Indexed: 02/02/2023]
Abstract
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1-4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
Affiliation(s)
- Arang Rhie
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Shane A McCarthy
  - Department of Genetics, University of Cambridge, Cambridge, UK
  - Wellcome Sanger Institute, Cambridge, UK
- Olivier Fedrigo
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Joana Damas
  - The Genome Center, University of California Davis, Davis, CA, USA
- Giulio Formenti
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Marcela Uliano-Silva
  - Leibniz Institute for Zoo and Wildlife Research, Department of Evolutionary Genetics, Berlin, Germany
  - Berlin Center for Genomics in Biodiversity Research, Berlin, Germany
- Juwan Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Chul Lee
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Byung June Ko
  - Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- Mark Chaisson
  - University of Southern California, Los Angeles, CA, USA
- Gregory L Gedman
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Lindsey J Cantin
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Francoise Thibaud-Nissen
  - National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
- Leanne Haggerty
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Iliana Bista
  - Department of Genetics, University of Cambridge, Cambridge, UK
  - Wellcome Sanger Institute, Cambridge, UK
- Bettina Haase
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Sylke Winkler
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - DRESDEN-concept Genome Center, Dresden, Germany
- Sadye Paez
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Sonja C Vernes
  - Neurogenetics of Vocal Communication Group, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
  - Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
  - School of Biology, University of St Andrews, St Andrews, UK
- Tanya M Lama
  - University of Massachusetts Cooperative Fish and Wildlife Research Unit, Amherst, MA, USA
- Frank Grutzner
  - School of Biological Science, The Environment Institute, University of Adelaide, Adelaide, South Australia, Australia
- Wesley C Warren
  - Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
- Dave Burt
  - UQ Genomics, University of Queensland, Brisbane, Queensland, Australia
- Julia M George
  - Department of Biological Sciences, Clemson University, Clemson, SC, USA
- Matthew T Biegler
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- David Iorns
  - The Genetic Rescue Foundation, Wellington, New Zealand
- Andrew Digby
  - Kākāpō Recovery, Department of Conservation, Invercargill, New Zealand
- Daryl Eason
  - Kākāpō Recovery, Department of Conservation, Invercargill, New Zealand
- Bruce Robertson
  - Department of Zoology, University of Otago, Dunedin, New Zealand
- Mark Wilkinson
  - Department of Life Sciences, Natural History Museum, London, UK
- George Turner
  - School of Natural Sciences, Bangor University, Gwynedd, UK
- Axel Meyer
  - Department of Biology, University of Konstanz, Konstanz, Germany
- Andreas F Kautt
  - Department of Biology, University of Konstanz, Konstanz, Germany
  - Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Paolo Franchini
  - Department of Biology, University of Konstanz, Konstanz, Germany
- H William Detrich
  - Department of Marine and Environmental Sciences, Northeastern University Marine Science Center, Nahant, MA, USA
- Hannes Svardal
  - Department of Biology, University of Antwerp, Antwerp, Belgium
  - Naturalis Biodiversity Center, Leiden, The Netherlands
- Maximilian Wagner
  - Institute of Biology, Karl-Franzens University of Graz, Graz, Austria
- Gavin J P Naylor
  - Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
- Martin Pippel
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology, Dresden, Germany
- Milan Malinsky
  - Wellcome Sanger Institute, Cambridge, UK
  - Zoological Institute, University of Basel, Basel, Switzerland
- Trevor Pesout
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Ivan Sović
  - Pacific Biosciences, Menlo Park, CA, USA
  - Digital BioLogic, Ivanić-Grad, Croatia
- Zemin Ning
  - Wellcome Sanger Institute, Cambridge, UK
- Joyce Lee
  - Bionano Genomics, San Diego, CA, USA
- Richard E Green
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
  - Dovetail Genomics, Santa Cruz, CA, USA
- Ivo Gut
  - CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
  - Universitat Pompeu Fabra, Barcelona, Spain
- Jay Ghurye
  - Dovetail Genomics, Santa Cruz, CA, USA
  - Department of Computer Science, University of Maryland College Park, College Park, MD, USA
- Erik Garrison
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Ying Sims
  - Wellcome Sanger Institute, Cambridge, UK
- Dengfeng Guan
  - Department of Genetics, University of Cambridge, Cambridge, UK
  - School of Computer Science and Technology, Center for Bioinformatics, Harbin Institute of Technology, Harbin, China
- Sarah E London
  - Department of Psychology, Institute for Mind and Biology, University of Chicago, Chicago, IL, USA
- David F Clayton
  - Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
- Claudio V Mello
  - Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
- Samantha R Friedrich
  - Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
- Peter V Lovell
  - Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
- Ekaterina Osipova
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology, Dresden, Germany
  - Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
- Farooq O Al-Ajli
  - Monash University Malaysia Genomics Facility, School of Science, Selangor Darul Ehsan, Malaysia
  - Tropical Medicine and Biology Multidisciplinary Platform, Monash University Malaysia, Selangor Darul Ehsan, Malaysia
  - Qatar Falcon Genome Project, Doha, Qatar
- Heebal Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
  - Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
  - eGnome, Inc., Seoul, Republic of Korea
- Michael Hiller
  - LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany
  - Senckenberg Research Institute, Frankfurt, Germany
  - Goethe-University, Faculty of Biosciences, Frankfurt, Germany
- Robert S Harris
  - Department of Biology, Pennsylvania State University, University Park, PA, USA
- Kateryna D Makova
  - Department of Biology, Pennsylvania State University, University Park, PA, USA
  - Center for Medical Genomics, Pennsylvania State University, University Park, PA, USA
  - Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
- Paul Medvedev
  - Center for Medical Genomics, Pennsylvania State University, University Park, PA, USA
  - Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
  - Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Jinna Hoffman
  - National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
- Patrick Masterson
  - National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
- Karen Clark
  - National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
- Fergal Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Kevin Howe
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Brian P Walenz
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Woori Kwak
  - eGnome, Inc., Seoul, Republic of Korea
  - Hoonygen, Seoul, Korea
- Hiram Clawson
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Luis Nassar
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Robert H S Kraus
  - Department of Biology, University of Konstanz, Konstanz, Germany
  - Department of Migration, Max Planck Institute of Animal Behavior, Radolfzell, Germany
- Andrew J Crawford
  - Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- M Thomas P Gilbert
  - Center for Evolutionary Hologenomics, The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
  - University Museum, NTNU, Trondheim, Norway
- Guojie Zhang
  - China National Genebank, BGI-Shenzhen, Shenzhen, China
  - Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
  - State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
  - Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- Byrappa Venkatesh
  - Institute of Molecular and Cell Biology, A*STAR, Biopolis, Singapore, Singapore
- Robert W Murphy
  - Centre for Biodiversity, Royal Ontario Museum, Toronto, Ontario, Canada
- Klaus-Peter Koepfli
  - Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Washington, DC, USA
- Beth Shapiro
  - Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Warren E Johnson
  - Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Washington, DC, USA
  - The Walter Reed Biosystematics Unit, Museum Support Center MRC-534, Smithsonian Institution, Suitland, MD, USA
  - Walter Reed Army Institute of Research, Silver Spring, MD, USA
- Federica Di Palma
  - Department of Biological Sciences, Earlham Institute, University of East Anglia, Norwich, UK
- Tomas Marques-Bonet
  - Institute of Evolutionary Biology (UPF-CSIC), PRBB, Barcelona, Spain
  - Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain
  - Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
  - Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain
- Emma C Teeling
  - School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
- Tandy Warnow
  - Department of Computer Science, The University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Oliver A Ryder
  - San Diego Zoo Global, Escondido, CA, USA
  - Department of Evolution, Behavior, and Ecology, University of California San Diego, La Jolla, CA, USA
- David Haussler
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
  - Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
- Stephen J O'Brien
  - Laboratory of Genomics Diversity-Center for Computer Technologies, ITMO University, St. Petersburg, Russian Federation
  - Guy Harvey Oceanographic Center, Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Fort Lauderdale, FL, USA
- Harris A Lewin
  - The Genome Center, University of California Davis, Davis, CA, USA
  - Department of Evolution and Ecology, University of California Davis, Davis, CA, USA
  - John Muir Institute for the Environment, University of California Davis, Davis, CA, USA
- Eugene W Myers
  - Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
  - Center for Systems Biology, Dresden, Germany
  - Faculty of Computer Science, Technical University Dresden, Dresden, Germany
- Richard Durbin
  - Department of Genetics, University of Cambridge, Cambridge, UK
  - Wellcome Sanger Institute, Cambridge, UK
- Adam M Phillippy
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Erich D Jarvis
  - Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
18
Navarro Gonzalez J, Zweig AS, Speir ML, Schmelter D, Rosenbloom KR, Raney BJ, Powell CC, Nassar LR, Maulding ND, Lee CM, Lee BT, Hinrichs AS, Fyfe AC, Fernandes JD, Diekhans M, Clawson H, Casper J, Benet-Pagès A, Barber GP, Haussler D, Kuhn RM, Haeussler M, Kent WJ. The UCSC Genome Browser database: 2021 update. Nucleic Acids Res 2021; 49:D1046-D1057. [PMID: 33221922 PMCID: PMC7779060 DOI: 10.1093/nar/gkaa1070] [Citation(s) in RCA: 273] [Impact Index Per Article: 91.0] [Received: 09/21/2020] [Revised: 10/19/2020] [Accepted: 11/18/2020] [Indexed: 12/11/2022] Open
Abstract
For more than two decades, the UCSC Genome Browser database (https://genome.ucsc.edu) has provided high-quality genomics data visualization and genome annotations to the research community. As the field of genomics grows and more data become available, new modes of display are required to accommodate new technologies. New features released this past year include a Hi-C heatmap display, a phased family trio display for VCF files, and various track visualization improvements. Striving to keep data up-to-date, new updates to gene annotations include GENCODE Genes, NCBI RefSeq Genes, and Ensembl Genes. New data tracks added for human and mouse genomes include the ENCODE registry of candidate cis-regulatory elements, promoters from the Eukaryotic Promoter Database, and NCBI RefSeq Select and Matched Annotation from NCBI and EMBL-EBI (MANE). Within weeks of learning about the outbreak of coronavirus, UCSC released a genome browser, with detailed annotation tracks, for the SARS-CoV-2 RNA reference assembly.
Affiliation(s)
- Ann S Zweig
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Matthew L Speir
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Daniel Schmelter
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Kate R Rosenbloom
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian J Raney
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Conner C Powell
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Luis R Nassar
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Nathan D Maulding
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Christopher M Lee
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian T Lee
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Alastair C Fyfe
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Jason D Fernandes
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Mark Diekhans
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Hiram Clawson
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Jonathan Casper
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Anna Benet-Pagès
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Medical Genetics Center (MGZ), Munich, Germany
- Galt P Barber
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- David Haussler
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
  - Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Robert M Kuhn
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Maximilian Haeussler
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- W James Kent
  - Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
19
Fernandes JD, Hinrichs AS, Clawson H, Gonzalez JN, Lee BT, Nassar LR, Raney BJ, Rosenbloom KR, Nerli S, Rao AA, Schmelter D, Fyfe A, Maulding N, Zweig AS, Lowe TM, Ares M, Corbet-Detig R, Kent WJ, Haussler D, Haeussler M. The UCSC SARS-CoV-2 Genome Browser. Nat Genet 2020; 52:991-998. [PMID: 32908258 PMCID: PMC8016453 DOI: 10.1038/s41588-020-0700-8] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Indexed: 11/13/2022]
Abstract
Background: Researchers are generating molecular data pertaining to the SARS-CoV-2 RNA genome and its proteins at an unprecedented rate during the COVID-19 pandemic. As a result, there is a critical need for rapid and continuously updated access to the latest molecular data in a format in which all data can be quickly cross-referenced and compared. We adapted our genome browser visualization tool to the viral genome for this purpose. Molecular data, curated from published studies or from database submissions, are mapped to the viral genome and grouped together into “annotation tracks” where they can be visualized along the linear map of the viral genome sequence and programmatically downloaded in standard format for analysis. Results: The UCSC Genome Browser for SARS-CoV-2 (https://genome.ucsc.edu/covid19.html) provides continuously updated access to the mutations in the many thousands of SARS-CoV-2 genomes deposited in GISAID and the international nucleotide sequencing databases, displayed alongside phylogenetic trees. These data are augmented with alignments of bat, pangolin, and other animal and human coronavirus genomes, including per-base evolutionary rate analysis. All available annotations are cross-referenced on the virus genome, including those from major databases (PDB, RFAM, IEDB, UniProt) as well as up-to-date individual results from preprints. Annotated data include predicted and validated immune epitopes, promising antibodies, RT-PCR and sequencing primers, CRISPR guides (from research, diagnostics, vaccines, and therapies), and points of interaction between human and viral genes. As a community resource, any user can add manual annotations which are quality checked and shared publicly on the browser the next day. Conclusions: We invite all investigators to contribute additional data and annotations to this resource to accelerate research and development activities globally.
Contact us at genome-www@soe.ucsc.edu with data suggestions or requests for support for adding data. Rapid sharing of data will accelerate SARS-CoV-2 research, especially when researchers take time to integrate their data with those from other labs on a widely-used community browser platform with standardized machine-readable data formats, such as the SARS-CoV-2 Genome Browser.
Affiliation(s)
- Jason D Fernandes
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
  - Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Angie S Hinrichs
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Hiram Clawson
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Brian T Lee
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Luis R Nassar
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Brian J Raney
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Kate R Rosenbloom
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Santrupti Nerli
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Arjun A Rao
  - ImmunoX Initiative, University of California San Francisco, San Francisco, CA, USA
- Daniel Schmelter
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Alastair Fyfe
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Nathan Maulding
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Ann S Zweig
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Todd M Lowe
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
  - Center for Molecular Biology of RNA, University of California Santa Cruz, Santa Cruz, CA, USA
- Manuel Ares
  - Molecular, Cell and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
  - Center for Molecular Biology of RNA, University of California Santa Cruz, Santa Cruz, CA, USA
- Russ Corbet-Detig
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- W James Kent
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- David Haussler
  - Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
  - Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
  - Center for Molecular Biology of RNA, University of California Santa Cruz, Santa Cruz, CA, USA
| | | |
20
Lee CM, Barber GP, Casper J, Clawson H, Diekhans M, Gonzalez JN, Hinrichs AS, Lee BT, Nassar LR, Powell CC, Raney BJ, Rosenbloom KR, Schmelter D, Speir ML, Zweig AS, Haussler D, Haeussler M, Kuhn RM, Kent WJ. UCSC Genome Browser enters 20th year. Nucleic Acids Res 2020; 48:D756-D761. [PMID: 31691824 PMCID: PMC7145642 DOI: 10.1093/nar/gkz1012] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Received: 09/11/2019] [Revised: 10/16/2019] [Accepted: 10/25/2019] [Indexed: 12/27/2022]
Abstract
The University of California Santa Cruz Genome Browser website (https://genome.ucsc.edu) enters its 20th year of providing high-quality genomics data visualization and genome annotations to the research community. In the past year, we have added a new option to our web BLAT tool that allows search against all genomes, a single-cell expression viewer (https://cells.ucsc.edu), a ‘lollipop’ plot display mode for high-density variation data, a RESTful API for data extraction and a custom-track backup feature. New datasets include Tabula Muris single-cell expression data, GeneHancer regulatory annotations, The Cancer Genome Atlas Pan-Cancer variants, Genome Reference Consortium Patch sequences, new ENCODE transcription factor binding site peaks and clusters, the Database of Genomic Variants Gold Standard Variants, Genomenon Mastermind variants and three new multi-species alignment tracks.
Affiliation(s)
- Christopher M Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Galt P Barber
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Jonathan Casper
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Hiram Clawson
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Mark Diekhans
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian T Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Luis R Nassar
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Conner C Powell
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian J Raney
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Kate R Rosenbloom
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Daniel Schmelter
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Matthew L Speir
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ann S Zweig
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- David Haussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Maximilian Haeussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Robert M Kuhn
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- W James Kent
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
21
Haeussler M, Zweig AS, Tyner C, Speir ML, Rosenbloom KR, Raney BJ, Lee CM, Lee BT, Hinrichs AS, Gonzalez JN, Gibson D, Diekhans M, Clawson H, Casper J, Barber GP, Haussler D, Kuhn RM, Kent WJ. The UCSC Genome Browser database: 2019 update. Nucleic Acids Res 2020; 47:D853-D858. [PMID: 30407534 PMCID: PMC6323953 DOI: 10.1093/nar/gky1095] [Citation(s) in RCA: 505] [Impact Index Per Article: 126.3] [Received: 09/14/2018] [Accepted: 10/19/2018] [Indexed: 01/17/2023]
Abstract
The UCSC Genome Browser (https://genome.ucsc.edu) is a graphical viewer for exploring genome annotations. For almost two decades, the Browser has provided visualization tools for genetics and molecular biology and continues to add new data and features. This year, we added a new tool that lets users interactively arrange existing graphing tracks into new groups. Other software additions include new formats for chromosome interactions, a ChIP-Seq peak display for track hubs and improved support for HGVS. On the annotation side, we have added gnomAD, TCGA expression, RefSeq Functional elements, GTEx eQTLs, CRISPR Guides, SNPpedia and created a 30-way primate alignment on the human genome. Nine assemblies now have RefSeq-mapped gene models.
Affiliation(s)
- Maximilian Haeussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ann S Zweig
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Cath Tyner
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Matthew L Speir
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Kate R Rosenbloom
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian J Raney
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Christopher M Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian T Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- David Gibson
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Mark Diekhans
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Hiram Clawson
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Jonathan Casper
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Galt P Barber
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- David Haussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Robert M Kuhn
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- W James Kent
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
22
Casper J, Zweig AS, Villarreal C, Tyner C, Speir ML, Rosenbloom KR, Raney BJ, Lee CM, Lee BT, Karolchik D, Hinrichs AS, Haeussler M, Guruvadoo L, Navarro Gonzalez J, Gibson D, Fiddes IT, Eisenhart C, Diekhans M, Clawson H, Barber GP, Armstrong J, Haussler D, Kuhn RM, Kent WJ. The UCSC Genome Browser database: 2018 update. Nucleic Acids Res 2019; 46:D762-D769. [PMID: 29106570 PMCID: PMC5753355 DOI: 10.1093/nar/gkx1020] [Citation(s) in RCA: 338] [Impact Index Per Article: 67.6] [Received: 09/15/2017] [Accepted: 10/18/2017] [Indexed: 12/14/2022]
Abstract
The UCSC Genome Browser (https://genome.ucsc.edu) provides a web interface for exploring annotated genome assemblies. The assemblies and annotation tracks are updated on an ongoing basis—12 assemblies and more than 28 tracks were added in the past year. Two recent additions are a display of CRISPR/Cas9 guide sequences and an interactive navigator for gene interactions. Other upgrades from the past year include a command-line version of the Variant Annotation Integrator, support for Human Genome Variation Society variant nomenclature input and output, and a revised highlighting tool that now supports multiple simultaneous regions and colors.
Affiliation(s)
- Jonathan Casper
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ann S Zweig
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Chris Villarreal
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Cath Tyner
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Matthew L Speir
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Kate R Rosenbloom
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian J Raney
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Christopher M Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian T Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Donna Karolchik
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Maximilian Haeussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Luvina Guruvadoo
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- David Gibson
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ian T Fiddes
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Mark Diekhans
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Hiram Clawson
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Galt P Barber
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Joel Armstrong
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- David Haussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Robert M Kuhn
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- W James Kent
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
23
Doronina L, Reising O, Clawson H, Ray DA, Schmitz J. True Homoplasy of Retrotransposon Insertions in Primates. Syst Biol 2019; 68:482-493. [PMID: 30445649 DOI: 10.1093/sysbio/syy076] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 08/17/2018] [Revised: 11/05/2018] [Accepted: 11/13/2018] [Indexed: 01/24/2023]
Abstract
How reliable are the presence/absence insertion patterns of the supposedly homoplasy-free retrotransposons, which were randomly inserted in the quasi infinite genomic space? To systematically examine this question in an up-to-date, multigenome comparison, we screened millions of primate transposed Alu SINE elements for incidences of homoplasious precise insertions and deletions. In genome-wide analyses, we identified and manually verified nine cases of precise parallel Alu insertions of apparently identical elements at orthologous positions in two ape lineages and twelve incidences of precise deletions of previously established SINEs. Correspondingly, eight precise parallel insertions and no exact deletions were detected in a comparison of lemuriform primate and human insertions spanning the range of primate diversity. With an overall frequency of homoplasious Alu insertions of only 0.01% (for human-chimpanzee-rhesus macaque) and 0.02-0.04% (for human-bushbaby-lemurs) and precise Alu deletions of 0.001-0.002% (for human-chimpanzee-rhesus macaque), real homoplasy is not considered to be a quantitatively relevant source of evolutionary noise. Thus, presence/absence patterns of Alu retrotransposons and, presumably, all LINE1-mobilized elements represent indeed the virtually homoplasy-free markers they are considered to be. Therefore, ancestral incomplete lineage sorting and hybridization remain the only serious sources of conflicting presence/absence patterns of retrotransposon insertions, and as such are detectable and quantifiable. [Homoplasy; precise deletions; precise parallel insertions; primates; retrotransposons.].
Affiliation(s)
- Liliya Doronina
- Institute of Experimental Pathology (ZMBE), University of Münster, Von-Esmarch-Str. 56, D-48149 Münster, Germany
- Olga Reising
- Institute of Experimental Pathology (ZMBE), University of Münster, Von-Esmarch-Str. 56, D-48149 Münster, Germany
- Hiram Clawson
- Department of Biomolecular Engineering, University of California, 1156 High Street, Santa Cruz, CA, USA
- David A Ray
- Department of Biological Sciences, Texas Tech University, 2901 Main Street, Lubbock, TX, USA
- Jürgen Schmitz
- Institute of Experimental Pathology (ZMBE), University of Münster, Von-Esmarch-Str. 56, D-48149 Münster, Germany
24
Doronina L, Churakov G, Kuritzin A, Shi J, Baertsch R, Clawson H, Schmitz J. Speciation network in Laurasiatheria: retrophylogenomic signals. Genome Res 2017; 27:997-1003. [PMID: 28298429 PMCID: PMC5453332 DOI: 10.1101/gr.210948.116] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 06/03/2016] [Accepted: 02/23/2017] [Indexed: 11/24/2022]
Abstract
Rapid species radiation due to adaptive changes or occupation of new ecospaces challenges our understanding of ancestral speciation and the relationships of modern species. At the molecular level, rapid radiation with successive speciations over short time periods (too short to fix polymorphic alleles) is described as incomplete lineage sorting. Incomplete lineage sorting leads to random fixation of genetic markers and hence, random signals of relationships in phylogenetic reconstructions. The situation is further complicated when you consider that the genome is a mosaic of ancestral and modern incompletely sorted sequence blocks that leads to reconstructed affiliations to one or the other relative, depending on the fixation of their shared ancestral polymorphic alleles. The laurasiatherian relationships among Chiroptera, Perissodactyla, Cetartiodactyla, and Carnivora present a prime example for such enigmatic affiliations. We performed whole-genome screenings for phylogenetically diagnostic retrotransposon insertions involving the representatives bat (Chiroptera), horse (Perissodactyla), cow (Cetartiodactyla), and dog (Carnivora), and extracted among 162,000 preselected cases 102 virtually homoplasy-free, phylogenetically informative retroelements to draw a complete picture of the highly complex evolutionary relations within Laurasiatheria. All possible evolutionary scenarios received considerable retrotransposon support, leaving us with a network of affiliations. However, the Cetartiodactyla-Carnivora relationship as well as the basal position of Chiroptera and an ancestral laurasiatherian hybridization process did exhibit some very clear, distinct signals. The significant accordance of retrotransposon presence/absence patterns and flanking nucleotide changes suggests an important influence of mosaic genome structures in the reconstruction of species histories.
Affiliation(s)
- Liliya Doronina
- Institute of Experimental Pathology, ZMBE, University of Münster, 48149 Münster, Germany
- Gennady Churakov
- Institute of Experimental Pathology, ZMBE, University of Münster, 48149 Münster, Germany
- Institute for Evolution and Biodiversity, University of Münster, 48149 Münster, Germany
- Andrej Kuritzin
- Department of System Analysis, Saint Petersburg State Institute of Technology, 190013 St. Petersburg, Russia
- Jingjing Shi
- Institute of Experimental Pathology, ZMBE, University of Münster, 48149 Münster, Germany
- Robert Baertsch
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Hiram Clawson
- Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064, USA
- Jürgen Schmitz
- Institute of Experimental Pathology, ZMBE, University of Münster, 48149 Münster, Germany
25
Tyner C, Barber GP, Casper J, Clawson H, Diekhans M, Eisenhart C, Fischer CM, Gibson D, Gonzalez JN, Guruvadoo L, Haeussler M, Heitner S, Hinrichs AS, Karolchik D, Lee BT, Lee CM, Nejad P, Raney BJ, Rosenbloom KR, Speir ML, Villarreal C, Vivian J, Zweig AS, Haussler D, Kuhn RM, Kent WJ. The UCSC Genome Browser database: 2017 update. Nucleic Acids Res 2017; 45:D626-D634. [PMID: 27899642 PMCID: PMC5210591 DOI: 10.1093/nar/gkw1134] [Citation(s) in RCA: 197] [Impact Index Per Article: 28.1] [Received: 09/14/2016] [Revised: 10/17/2016] [Accepted: 10/31/2016] [Indexed: 12/14/2022]
Abstract
Since its 2001 debut, the University of California, Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu/) team has provided continuous support to the international genomics and biomedical communities through a web-based, open source platform designed for the fast, scalable display of sequence alignments and annotations landscaped against a vast collection of quality reference genome assemblies. The browser's publicly accessible databases are the backbone of a rich, integrated bioinformatics tool suite that includes a graphical interface for data queries and downloads, alignment programs, command-line utilities and more. This year's highlights include newly designed home and gateway pages; a new 'multi-region' track display configuration for exon-only, gene-only and custom regions visualization; new genome browsers for three species (brown kiwi, crab-eating macaque and Malayan flying lemur); eight updated genome assemblies; extended support for new data types such as CRAM, RNA-seq expression data and long-range chromatin interaction pairs; and the unveiling of a new supported mirror site in Japan.
Affiliation(s)
- Cath Tyner
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Galt P Barber
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Jonathan Casper
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Hiram Clawson
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Mark Diekhans
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Clayton M Fischer
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- David Gibson
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Luvina Guruvadoo
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Maximilian Haeussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Steve Heitner
- Emory University School of Medicine, Atlanta, GA 30322, USA
- Angie S Hinrichs
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Donna Karolchik
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian T Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Christopher M Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Parisa Nejad
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian J Raney
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Kate R Rosenbloom
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Matthew L Speir
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Chris Villarreal
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- John Vivian
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ann S Zweig
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- David Haussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Howard Hughes Medical Institute, University of California Santa Cruz, CA 95064, USA
- Robert M Kuhn
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- W James Kent
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
26
Speir ML, Zweig AS, Rosenbloom KR, Raney BJ, Paten B, Nejad P, Lee BT, Learned K, Karolchik D, Hinrichs AS, Heitner S, Harte RA, Haeussler M, Guruvadoo L, Fujita PA, Eisenhart C, Diekhans M, Clawson H, Casper J, Barber GP, Haussler D, Kuhn RM, Kent WJ. The UCSC Genome Browser database: 2016 update. Nucleic Acids Res 2015; 44:D717-25. [PMID: 26590259 PMCID: PMC4702902 DOI: 10.1093/nar/gkv1275] [Citation(s) in RCA: 334] [Impact Index Per Article: 37.1] [Received: 09/11/2015] [Accepted: 11/03/2015] [Indexed: 01/19/2023]
Abstract
For the past 15 years, the UCSC Genome Browser (http://genome.ucsc.edu/) has served the international research community by offering an integrated platform for viewing and analyzing information from a large database of genome assemblies and their associated annotations. The UCSC Genome Browser has been under continuous development since its inception with new data sets and software features added frequently. Some release highlights of this year include new and updated genome browsers for various assemblies, including bonobo and zebrafish; new gene annotation sets; improvements to track and assembly hub support; and a new interactive tool, the “Data Integrator”, for intersecting data from multiple tracks. We have greatly expanded the data sets available on the most recent human assembly, hg38/GRCh38, to include updated gene prediction sets from GENCODE, more phenotype- and disease-associated variants from ClinVar and ClinGen, more genomic regulatory data, and a new multiple genome alignment.
Affiliation(s)
- Matthew L Speir
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ann S Zweig
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Kate R Rosenbloom
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian J Raney
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Benedict Paten
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Parisa Nejad
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian T Lee
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Katrina Learned
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Donna Karolchik
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Steve Heitner
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Maximilian Haeussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Luvina Guruvadoo
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Pauline A Fujita
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA 94143, USA
- Mark Diekhans
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Hiram Clawson
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Jonathan Casper
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Galt P Barber
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- David Haussler
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Howard Hughes Medical Institute, University of California Santa Cruz, CA 95064, USA
- Robert M Kuhn
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- W James Kent
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
27
Doronina L, Churakov G, Shi J, Brosius J, Baertsch R, Clawson H, Schmitz J. Exploring Massive Incomplete Lineage Sorting in Arctoids (Laurasiatheria, Carnivora). Mol Biol Evol 2015; 32:3194-204. [PMID: 26337548 DOI: 10.1093/molbev/msv188] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022]
Abstract
Freed from the competition of large raptors, Paleocene carnivores could expand their newly acquired habitats in search of prey. Such changing conditions might have led to their successful distribution and rapid radiation. Today, molecular evolutionary biologists are faced, however, with the consequences of such accelerated adaptive radiations, because they led to sequential speciation more rapidly than phylogenetic markers could be fixed. The repercussion is that current genealogies based on such markers are incongruent with species trees. Our aim was to explore such conflicting phylogenetic zones of evolution during the early arctoid radiation, especially to distinguish diagnostic from misleading phylogenetic signals, and to examine other carnivore-related speciation events. We applied a combination of high-throughput computational strategies to screen carnivore and related genomes in silico for randomly inserted retroposed elements that we then used to identify inconsistent phylogenetic patterns in the Arctoidea group, which is well known for phylogenetic discordances. Our combined retrophylogenomic and in vitro wet lab approach detected hundreds of carnivore-specific insertions, many of them confirming well-established splits or identifying and solving conflicting species distributions. Our systematic genome-wide screens for Long INterspersed Elements detected homoplasy-free markers with insertion-specific truncation points that we used to distinguish phylogenetically informative markers from conflicting signals. The results were independently confirmed by phylogenetic diagnostic Short INterspersed Elements. As statistical analysis ruled out ancestral hybridization, these doubly verified but still conflicting patterns were statistically determined to be genomic remnants from a time of ancestral incomplete lineage sorting that especially accompanied large parts of Arctoidea evolution.
Affiliation(s)
- Liliya Doronina
- Institute of Experimental Pathology, ZMBE, University of Münster, Münster, Germany
- Gennady Churakov
- Institute of Experimental Pathology, ZMBE, University of Münster, Münster, Germany
- Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Jingjing Shi
- Institute of Experimental Pathology, ZMBE, University of Münster, Münster, Germany
- Jürgen Brosius
- Institute of Experimental Pathology, ZMBE, University of Münster, Münster, Germany
- Institute of Evolutionary and Medical Genomics, Brandenburg Medical School (MHB), Neuruppin, Germany
- Robert Baertsch
- Department of Biomolecular Engineering, University of California, Santa Cruz
- Hiram Clawson
- Department of Biomolecular Engineering, University of California, Santa Cruz
- Jürgen Schmitz
- Institute of Experimental Pathology, ZMBE, University of Münster, Münster, Germany
28
Rosenbloom KR, Armstrong J, Barber GP, Casper J, Clawson H, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, Harte RA, Heitner S, Hickey G, Hinrichs AS, Hubley R, Karolchik D, Learned K, Lee BT, Li CH, Miga KH, Nguyen N, Paten B, Raney BJ, Smit AFA, Speir ML, Zweig AS, Haussler D, Kuhn RM, Kent WJ. The UCSC Genome Browser database: 2015 update. Nucleic Acids Res 2014; 43:D670-81. [PMID: 25428374 PMCID: PMC4383971 DOI: 10.1093/nar/gku1177] [Citation(s) in RCA: 690] [Impact Index Per Article: 69.0] [Indexed: 12/15/2022]
Abstract
Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), 'mined the web' for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled.
Affiliation(s)
- Kate R Rosenbloom
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Joel Armstrong
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Galt P Barber
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Jonathan Casper
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Hiram Clawson
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Mark Diekhans
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Timothy R Dreszer
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Pauline A Fujita
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Luvina Guruvadoo
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Maximilian Haeussler
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Rachel A Harte
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Steve Heitner
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Glenn Hickey
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Robert Hubley
- Institute for Systems Biology, Seattle, WA 98109, USA
- Donna Karolchik
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Katrina Learned
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Brian T Lee
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Chin H Li
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Karen H Miga
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Ngan Nguyen
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Benedict Paten
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Brian J Raney
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Matthew L Speir
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- Ann S Zweig
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- David Haussler
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA; Howard Hughes Medical Institute, UCSC, Santa Cruz, CA 95064, USA
- Robert M Kuhn
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
- W James Kent
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
29
Haeussler M, Karolchik D, Clawson H, Raney BJ, Rosenbloom KR, Fujita PA, Hinrichs AS, Speir ML, Eisenhart C, Zweig AS, Haussler D, Kent WJ. The UCSC Ebola Genome Portal. PLoS Curr 2014; 6. [PMID: 25685613 PMCID: PMC4318873 DOI: 10.1371/currents.outbreaks.386ab0964ab4d6c8cb550bfb6071d822] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 01/09/2023]
Abstract
Background:
With the Ebola epidemic raging out of control in West Africa, there has been a flurry of research into the Ebola virus, resulting in the generation of much genomic data.
Methods:
In response to the clear need for tools that integrate multiple strands of research around molecular sequences, we have created the University of California Santa Cruz (UCSC) Ebola Genome Browser, an adaptation of our popular UCSC Genome Browser web tool, which can be used to view the Ebola virus genome sequence from GenBank and nearly 30 annotation tracks generated by mapping external data to the reference sequence. Significant annotations include a multiple alignment comprising 102 Ebola genomes from the current outbreak, 56 from previous outbreaks, and 2 Marburg genomes as an outgroup; a gene track curated by NCBI; protein annotations curated by UniProt and antibody-binding epitopes curated by IEDB. We have extended the Genome Browser’s multiple alignment color-coding scheme to distinguish mutations resulting from non-synonymous coding changes, synonymous changes, or changes in untranslated regions.
Discussion:
Our Ebola Genome portal at http://genome.ucsc.edu/ebolaPortal/ links to the Ebola virus Genome Browser and an aggregate of useful information, including a collection of Ebola antibodies we are curating.
Affiliation(s)
- Donna Karolchik
- CBSE, University of California Santa Cruz, Santa Cruz, California, USA
- Hiram Clawson
- CBSE, University of California Santa Cruz, Santa Cruz, California, USA
- Brian J Raney
- CBSE, University of California Santa Cruz, Santa Cruz, California, USA
- Kate R Rosenbloom
- CBSE, University of California Santa Cruz, Santa Cruz, California, USA
- Pauline A Fujita
- CBSE, University of California Santa Cruz, Santa Cruz, California, USA
- Angie S Hinrichs
- Genomics Institute, University of California Santa Cruz, Santa Cruz, California, USA
- Chris Eisenhart
- CBSE, University of California Santa Cruz, Santa Cruz, California, USA
- Ann S Zweig
- CBSE, University of California Santa Cruz, Santa Cruz, California, USA
- David Haussler
- CBSE, University of California Santa Cruz, Santa Cruz, California, USA
- W James Kent
- CBSE, University of California Santa Cruz, Santa Cruz, California, USA
30
Haeussler M, Raney BJ, Hinrichs AS, Clawson H, Zweig AS, Karolchik D, Casper J, Speir ML, Haussler D, Kent WJ. Navigating protected genomics data with UCSC Genome Browser in a Box. Bioinformatics 2014; 31:764-6. [PMID: 25348212 PMCID: PMC4341066 DOI: 10.1093/bioinformatics/btu712] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Indexed: 01/17/2023] Open
Abstract
Summary: Genome Browser in a Box (GBiB) is a small virtual machine version of the popular University of California Santa Cruz (UCSC) Genome Browser that can be run on a researcher's own computer. Once GBiB is installed, a standard web browser is used to access the virtual server and add personal data files from the local hard disk. Annotation data are loaded on demand through the Internet from UCSC or can be downloaded to the local computer for faster access. Availability and implementation: Software downloads and installation instructions are freely available for non-commercial use at https://genome-store.ucsc.edu/. GBiB requires the installation of open-source software VirtualBox, available for all major operating systems, and the UCSC Genome Browser, which is open source and free for non-commercial use. Commercial use of GBiB and the Genome Browser requires a license (http://genome.ucsc.edu/license/). Contact: genome@soe.ucsc.edu
Affiliation(s)
- Maximilian Haeussler
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Brian J Raney
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Angie S Hinrichs
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Hiram Clawson
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Ann S Zweig
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Donna Karolchik
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Jonathan Casper
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Matthew L Speir
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- David Haussler
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- W James Kent
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
31
Nguyen N, Hickey G, Raney BJ, Armstrong J, Clawson H, Zweig A, Karolchik D, Kent WJ, Haussler D, Paten B. Comparative assembly hubs: web-accessible browsers for comparative genomics. Bioinformatics 2014; 30:3293-301. [PMID: 25138168 DOI: 10.1093/bioinformatics/btu534] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 12/31/2022]
Abstract
MOTIVATION: Researchers now have access to large volumes of genome sequences for comparative analysis, some generated by the plethora of public sequencing projects and, increasingly, from individual efforts. It is not possible, or necessarily desirable, that the public genome browsers attempt to curate all these data. Instead, a wealth of powerful tools is emerging to empower users to create their own visualizations and browsers. RESULTS: We introduce a pipeline to easily generate collections of Web-accessible UCSC Genome Browsers interrelated by an alignment. It is intended to democratize our comparative genomic browser resources, serving the broad and growing community of evolutionary genomicists and facilitating easy public sharing via the Internet. Using the alignment, all annotations and the alignment itself can be efficiently viewed with reference to any genome in the collection, symmetrically. A new, intelligently scaled alignment display makes it simple to view all changes between the genomes at all levels of resolution, from substitutions to complex structural rearrangements, including duplications. To demonstrate this work, we create a comparative assembly hub containing 57 Escherichia coli and 9 Shigella genomes and show examples that highlight their unique biology. AVAILABILITY AND IMPLEMENTATION: The source code is available as open source at: https://github.com/glennhickey/progressiveCactus. The E. coli and Shigella genome hub is now a public hub listed on the UCSC browser public hubs Web page.
Affiliation(s)
- Ngan Nguyen
- Center for Biomolecular Sciences and Engineering, CBSE/ITI, UC Santa Cruz, 1156 High St, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Glenn Hickey
- Center for Biomolecular Sciences and Engineering, CBSE/ITI, UC Santa Cruz, 1156 High St, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Brian J Raney
- Center for Biomolecular Sciences and Engineering, CBSE/ITI, UC Santa Cruz, 1156 High St, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Joel Armstrong
- Center for Biomolecular Sciences and Engineering, CBSE/ITI, UC Santa Cruz, 1156 High St, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Hiram Clawson
- Center for Biomolecular Sciences and Engineering, CBSE/ITI, UC Santa Cruz, 1156 High St, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Ann Zweig
- Center for Biomolecular Sciences and Engineering, CBSE/ITI, UC Santa Cruz, 1156 High St, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Donna Karolchik
- Center for Biomolecular Sciences and Engineering, CBSE/ITI, UC Santa Cruz, 1156 High St, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- William James Kent
- Center for Biomolecular Sciences and Engineering, CBSE/ITI, UC Santa Cruz, 1156 High St, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- David Haussler
- Center for Biomolecular Sciences and Engineering, CBSE/ITI, UC Santa Cruz, 1156 High St, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Benedict Paten
- Center for Biomolecular Sciences and Engineering, CBSE/ITI, UC Santa Cruz, 1156 High St, Santa Cruz, CA 95064, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
32
Karolchik D, Barber GP, Casper J, Clawson H, Cline MS, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, Harte RA, Heitner S, Hinrichs AS, Learned K, Lee BT, Li CH, Raney BJ, Rhead B, Rosenbloom KR, Sloan CA, Speir ML, Zweig AS, Haussler D, Kuhn RM, Kent WJ. The UCSC Genome Browser database: 2014 update. Nucleic Acids Res 2014; 42:D764-70. [PMID: 24270787 PMCID: PMC3964947 DOI: 10.1093/nar/gkt1168] [Citation(s) in RCA: 550] [Impact Index Per Article: 55.0] [Received: 09/18/2013] [Revised: 10/30/2013] [Accepted: 10/30/2013] [Indexed: 12/17/2022] Open
Abstract
The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a large collection of organisms, primarily vertebrates, with an emphasis on the human and mouse genomes. The Browser's web-based tools provide an integrated environment for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic data sets. As of September 2013, the database contained genomic sequence and a basic set of annotation 'tracks' for ∼90 organisms. Significant new annotations include a 60-species multiple alignment conservation track on the mouse, updated UCSC Genes tracks for human and mouse, and several new sets of variation and ENCODE data. New software tools include a Variant Annotation Integrator that returns predicted functional effects of a set of variants uploaded as a custom track, an extension to UCSC Genes that displays haplotype alleles for protein-coding genes and an expansion of data hubs that includes the capability to display remotely hosted user-provided assembly sequence in addition to annotation data. To improve European access, we have added a Genome Browser mirror (http://genome-euro.ucsc.edu) hosted at Bielefeld University in Germany.
Affiliation(s)
- Donna Karolchik
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Galt P. Barber
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Jonathan Casper
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Hiram Clawson
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Melissa S. Cline
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Mark Diekhans
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Timothy R. Dreszer
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Pauline A. Fujita
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Luvina Guruvadoo
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Maximilian Haeussler
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Rachel A. Harte
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Steve Heitner
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Angie S. Hinrichs
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Katrina Learned
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Brian T. Lee
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Chin H. Li
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Brian J. Raney
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Brooke Rhead
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Kate R. Rosenbloom
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Cricket A. Sloan
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Matthew L. Speir
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Ann S. Zweig
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- David Haussler
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- Robert M. Kuhn
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
- W. James Kent
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), 1156 High Street, Santa Cruz, CA 95064, USA, Computational Biology Graduate Group, University of California Berkeley, Berkeley, CA 94720, USA, Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Stanford, CA 94305, USA and Howard Hughes Medical Institute, Center for Biomolecular Science and Engineering, UCSC, 1156 High Street, Santa Cruz, CA 95064, USA
33
Raney BJ, Dreszer TR, Barber GP, Clawson H, Fujita PA, Wang T, Nguyen N, Paten B, Zweig AS, Karolchik D, Kent WJ. Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics 2013; 30:1003-5. [PMID: 24227676 PMCID: PMC3967101 DOI: 10.1093/bioinformatics/btt637] [Citation(s) in RCA: 286] [Impact Index Per Article: 26.0] [Indexed: 12/28/2022] Open
Abstract
SUMMARY: Track data hubs provide an efficient mechanism for visualizing remotely hosted Internet-accessible collections of genome annotations. Hub datasets can be organized, configured and fully integrated into the University of California Santa Cruz (UCSC) Genome Browser and accessed through the familiar browser interface. For the first time, individuals can use the complete browser feature set to view custom datasets without the overhead of setting up and maintaining a mirror. AVAILABILITY AND IMPLEMENTATION: Source code for the BigWig, BigBed and Genome Browser software is freely available for non-commercial use at http://hgdownload.cse.ucsc.edu/admin/jksrc.zip, implemented in C and supported on Linux. Binaries for the BigWig and BigBed creation and parsing utilities may be downloaded at http://hgdownload.cse.ucsc.edu/admin/exe/. Binary Alignment/Map (BAM) and Variant Call Format (VCF)/tabix utilities are available from http://samtools.sourceforge.net/ and http://vcftools.sourceforge.net/. The UCSC Genome Browser is publicly accessible at http://genome.ucsc.edu.
Affiliation(s)
- Brian J Raney
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA and Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO 63108, USA
34
Meyer LR, Zweig AS, Hinrichs AS, Karolchik D, Kuhn RM, Wong M, Sloan CA, Rosenbloom KR, Roe G, Rhead B, Raney BJ, Pohl A, Malladi VS, Li CH, Lee BT, Learned K, Kirkup V, Hsu F, Heitner S, Harte RA, Haeussler M, Guruvadoo L, Goldman M, Giardine BM, Fujita PA, Dreszer TR, Diekhans M, Cline MS, Clawson H, Barber GP, Haussler D, Kent WJ. The UCSC Genome Browser database: extensions and updates 2013. Nucleic Acids Res 2013; 41:D64-9. [PMID: 23155063 PMCID: PMC3531082 DOI: 10.1093/nar/gks1048] [Citation(s) in RCA: 612] [Impact Index Per Article: 55.6] [Received: 09/19/2012] [Accepted: 10/08/2012] [Indexed: 11/14/2022]
Abstract
The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic datasets. As of September 2012, genomic sequence and a basic set of annotation 'tracks' are provided for 63 organisms, including 26 mammals, 13 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms, yeast and sea hare. In the past year 19 new genome assemblies have been added, and we anticipate releasing another 28 in early 2013. Further, a large number of annotation tracks have been either added, updated by contributors or remapped to the latest human reference genome. Among these are an updated UCSC Genes track for human and mouse assemblies. We have also introduced several features to improve usability, including new navigation menus. This article provides an update to the UCSC Genome Browser database, which has been previously featured in the Database issue of this journal.
Affiliation(s)
- Laurence R. Meyer, Ann S. Zweig, Angie S. Hinrichs, Donna Karolchik, Robert M. Kuhn, Matthew Wong, Cricket A. Sloan, Kate R. Rosenbloom, Greg Roe, Brooke Rhead, Brian J. Raney, Andy Pohl, Venkat S. Malladi, Chin H. Li, Brian T. Lee, Katrina Learned, Vanessa Kirkup, Fan Hsu, Steve Heitner, Rachel A. Harte, Maximilian Haeussler, Luvina Guruvadoo, Mary Goldman, Belinda M. Giardine, Pauline A. Fujita, Timothy R. Dreszer, Mark Diekhans, Melissa S. Cline, Hiram Clawson, Galt P. Barber, David Haussler, W. James Kent
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA, Centre for Genomic Regulation (CRG), C/ Dr. Aiguader, 88, 08003 Barcelona, Spain, Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802 and Howard Hughes Medical Institute, UCSC, Santa Cruz, CA 95064, USA
35
Rosenbloom KR, Dreszer TR, Long JC, Malladi VS, Sloan CA, Raney BJ, Cline MS, Karolchik D, Barber GP, Clawson H, Diekhans M, Fujita PA, Goldman M, Gravell RC, Harte RA, Hinrichs AS, Kirkup VM, Kuhn RM, Learned K, Maddren M, Meyer LR, Pohl A, Rhead B, Wong MC, Zweig AS, Haussler D, Kent WJ. ENCODE whole-genome data in the UCSC Genome Browser: update 2012. Nucleic Acids Res 2012; 40:D912-7. [PMID: 22075998 PMCID: PMC3245183 DOI: 10.1093/nar/gkr1012] [Citation(s) in RCA: 207] [Impact Index Per Article: 17.3] [Received: 09/15/2011] [Revised: 10/18/2011] [Accepted: 10/20/2011] [Indexed: 11/23/2022]
Abstract
The Encyclopedia of DNA Elements (ENCODE) Consortium is entering its 5th year of production-level effort generating high-quality whole-genome functional annotations of the human genome. The past year has brought the ENCODE compendium of functional elements to critical mass, with a diverse set of 27 biochemical assays now covering 200 distinct human cell types. Within the mouse genome, which has been under study by ENCODE groups for the past 2 years, 37 cell types have been assayed. Over 2000 individual experiments have been completed and submitted to the Data Coordination Center for public use. UCSC makes this data available on the quality-reviewed public Genome Browser (http://genome.ucsc.edu) and on an early-access Preview Browser (http://genome-preview.ucsc.edu). Visual browsing, data mining and download of raw and processed data files are all supported. An ENCODE portal (http://encodeproject.org) provides specialized tools and information about the ENCODE data sets.
Affiliation(s)
- Kate R Rosenbloom
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
36
Dreszer TR, Karolchik D, Zweig AS, Hinrichs AS, Raney BJ, Kuhn RM, Meyer LR, Wong M, Sloan CA, Rosenbloom KR, Roe G, Rhead B, Pohl A, Malladi VS, Li CH, Learned K, Kirkup V, Hsu F, Harte RA, Guruvadoo L, Goldman M, Giardine BM, Fujita PA, Diekhans M, Cline MS, Clawson H, Barber GP, Haussler D, Kent WJ. The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Res 2012; 40:D918-23. [PMID: 22086951 PMCID: PMC3245018 DOI: 10.1093/nar/gkr1055] [Citation(s) in RCA: 273] [Impact Index Per Article: 22.8] [Received: 09/15/2011] [Revised: 10/18/2011] [Accepted: 10/25/2011] [Indexed: 01/05/2023]
Abstract
The University of California Santa Cruz Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analyzing and sharing both publicly available and user-generated genomic data sets. In the past year, the local database has been updated with four new species assemblies, and we anticipate another four will be released by the end of 2011. Further, a large number of annotation tracks have been either added, updated by contributors, or remapped to the latest human reference genome. Among these are new phenotype and disease annotations, UCSC genes, and a major dbSNP update, which required new visualization methods. Growing beyond the local database, this year we have introduced 'track data hubs', which allow the Genome Browser to provide access to remotely located sets of annotations. This feature is designed to significantly extend the number and variety of annotation tracks that are publicly available for visualization and analysis from within our site. We have also introduced several usability features including track search and a context-sensitive menu of options available with a right-click anywhere on the Browser's image.
Affiliation(s)
- Timothy R. Dreszer, Donna Karolchik, Ann S. Zweig, Angie S. Hinrichs, Brian J. Raney, Robert M. Kuhn, Laurence R. Meyer, Mathew Wong, Cricket A. Sloan, Kate R. Rosenbloom, Greg Roe, Brooke Rhead, Andy Pohl, Venkat S. Malladi, Chin H. Li, Katrina Learned, Vanessa Kirkup, Fan Hsu, Rachel A. Harte, Luvina Guruvadoo, Mary Goldman, Belinda M. Giardine, Pauline A. Fujita, Mark Diekhans, Melissa S. Cline, Hiram Clawson, Galt P. Barber, David Haussler, W. James Kent
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA, Centre for Genomic Regulation (CRG), Barcelona, Spain, Center for Comparative Genomics and Bioinformatics, Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802 and Howard Hughes Medical Institute, UCSC, Santa Cruz, CA 95064, USA
|
37
|
Gerstein MB, Lu ZJ, Van Nostrand EL, Cheng C, Arshinoff BI, Liu T, Yip KY, Robilotto R, Rechtsteiner A, Ikegami K, Alves P, Chateigner A, Perry M, Morris M, Auerbach RK, Feng X, Leng J, Vielle A, Niu W, Rhrissorrakrai K, Agarwal A, Alexander RP, Barber G, Brdlik CM, Brennan J, Brouillet JJ, Carr A, Cheung MS, Clawson H, Contrino S, Dannenberg LO, Dernburg AF, Desai A, Dick L, Dosé AC, Du J, Egelhofer T, Ercan S, Euskirchen G, Ewing B, Feingold EA, Gassmann R, Good PJ, Green P, Gullier F, Gutwein M, Guyer MS, Habegger L, Han T, Henikoff JG, Henz SR, Hinrichs A, Holster H, Hyman T, Iniguez AL, Janette J, Jensen M, Kato M, Kent WJ, Kephart E, Khivansara V, Khurana E, Kim JK, Kolasinska-Zwierz P, Lai EC, Latorre I, Leahey A, Lewis S, Lloyd P, Lochovsky L, Lowdon RF, Lubling Y, Lyne R, MacCoss M, Mackowiak SD, Mangone M, McKay S, Mecenas D, Merrihew G, Miller DM, Muroyama A, Murray JI, Ooi SL, Pham H, Phippen T, Preston EA, Rajewsky N, Rätsch G, Rosenbaum H, Rozowsky J, Rutherford K, Ruzanov P, Sarov M, Sasidharan R, Sboner A, Scheid P, Segal E, Shin H, Shou C, Slack FJ, Slightam C, Smith R, Spencer WC, Stinson EO, Taing S, Takasaki T, Vafeados D, Voronina K, Wang G, Washington NL, Whittle CM, Wu B, Yan KK, Zeller G, Zha Z, Zhong M, Zhou X, Ahringer J, Strome S, Gunsalus KC, Micklem G, Liu XS, Reinke V, Kim SK, Hillier LW, Henikoff S, Piano F, Snyder M, Stein L, Lieb JD, Waterston RH. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 2010; 330:1775-87. [PMID: 21177976 PMCID: PMC3142569 DOI: 10.1126/science.1196914] [Citation(s) in RCA: 741] [Impact Index Per Article: 52.9] [Indexed: 12/26/2022]
Abstract
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
Affiliation(s)
- Mark B. Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, 51 Prospect Street, New Haven, CT 06511, USA
| | - Zhi John Lu
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Eric L. Van Nostrand
- Department of Genetics, Stanford University Medical Center, Stanford, CA 94305, USA
| | - Chao Cheng
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Bradley I. Arshinoff
- Ontario Institute for Cancer Research, 101 College Street, Suite 800, Toronto, Ontario M5G 0A3, Canada
- Department of Molecular Genetics, University of Toronto, 27 King's College Circle, Toronto, Ontario M5S 1A1, Canada
| | - Tao Liu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA
- Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA
| | - Kevin Y. Yip
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Rebecca Robilotto
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Andreas Rechtsteiner
- Molecular, Cell, and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Kohta Ikegami
- Department of Biology and Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Pedro Alves
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Aurelien Chateigner
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK, and Cambridge Systems Biology Centre, Tennis Court Road, Cambridge CB2 1QR, UK
| | - Marc Perry
- Ontario Institute for Cancer Research, 101 College Street, Suite 800, Toronto, Ontario M5G 0A3, Canada
| | - Mitzi Morris
- Center for Genomics and Systems Biology, Department of Biology, New York University, 1009 Silver Center, 100 Washington Square East, New York, NY 10003–6688, USA
| | - Raymond K. Auerbach
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Xin Feng
- Ontario Institute for Cancer Research, 101 College Street, Suite 800, Toronto, Ontario M5G 0A3, Canada
- Department of Biomedical Engineering, State University of New York at Stonybrook, Stonybrook, NY 11794, USA
| | - Jing Leng
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Anne Vielle
- Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK
| | - Wei Niu
- Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06824, USA
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520–8005, USA
| | - Kahn Rhrissorrakrai
- Center for Genomics and Systems Biology, Department of Biology, New York University, 1009 Silver Center, 100 Washington Square East, New York, NY 10003–6688, USA
| | - Ashish Agarwal
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Computer Science, Yale University, 51 Prospect Street, New Haven, CT 06511, USA
| | - Roger P. Alexander
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Galt Barber
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064 USA
| | - Cathleen M. Brdlik
- Department of Genetics, Stanford University Medical Center, Stanford, CA 94305, USA
| | - Jennifer Brennan
- Department of Biology and Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Adrian Carr
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK, and Cambridge Systems Biology Centre, Tennis Court Road, Cambridge CB2 1QR, UK
| | - Ming-Sin Cheung
- Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK
| | - Hiram Clawson
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064 USA
| | - Sergio Contrino
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK, and Cambridge Systems Biology Centre, Tennis Court Road, Cambridge CB2 1QR, UK
| | - Abby F. Dernburg
- Howard Hughes Medical Institute, Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA, and Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Arshad Desai
- Ludwig Institute Cancer Research/Department of Cellular and Molecular Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093–0653, USA
| | - Lindsay Dick
- David Rockefeller Graduate Program, Rockefeller University, 1230 York Avenue New York, NY 10065, USA
| | - Andréa C. Dosé
- Howard Hughes Medical Institute, Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA, and Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Jiang Du
- Department of Computer Science, Yale University, 51 Prospect Street, New Haven, CT 06511, USA
| | - Thea Egelhofer
- Molecular, Cell, and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Sevinc Ercan
- Department of Biology and Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Ghia Euskirchen
- Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06824, USA
| | - Brent Ewing
- Department of Genome Sciences, University of Washington School of Medicine, William H. Foege Building S350D, 1705 NE Pacific Street, Post Office Box 355065, Seattle, WA 98195–5065, USA
| | - Elise A. Feingold
- Division of Extramural Research, National Human Genome Research Institute, National Institutes of Health, 5635 Fishers Lane, Suite 4076, Bethesda, MD 20892–9305, USA
| | - Reto Gassmann
- Ludwig Institute Cancer Research/Department of Cellular and Molecular Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093–0653, USA
| | - Peter J. Good
- Division of Extramural Research, National Human Genome Research Institute, National Institutes of Health, 5635 Fishers Lane, Suite 4076, Bethesda, MD 20892–9305, USA
| | - Phil Green
- Department of Genome Sciences, University of Washington School of Medicine, William H. Foege Building S350D, 1705 NE Pacific Street, Post Office Box 355065, Seattle, WA 98195–5065, USA
| | - Francois Gullier
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK, and Cambridge Systems Biology Centre, Tennis Court Road, Cambridge CB2 1QR, UK
| | - Michelle Gutwein
- Center for Genomics and Systems Biology, Department of Biology, New York University, 1009 Silver Center, 100 Washington Square East, New York, NY 10003–6688, USA
| | - Mark S. Guyer
- Division of Extramural Research, National Human Genome Research Institute, National Institutes of Health, 5635 Fishers Lane, Suite 4076, Bethesda, MD 20892–9305, USA
| | - Lukas Habegger
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Ting Han
- Life Sciences Institute, Department of Human Genetics, University of Michigan, 210 Washtenaw Avenue, Ann Arbor, MI 48109–2216, USA
| | - Jorja G. Henikoff
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109, USA
| | - Stefan R. Henz
- Max Planck Institute for Developmental Biology, Spemannstrasse 37-39, 72076 Tübingen, Germany
| | - Angie Hinrichs
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064 USA
| | - Heather Holster
- Roche NimbleGen, 500 South Rosa Road, Madison, WI 53719, USA
| | - Tony Hyman
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany
| | - A. Leo Iniguez
- Roche NimbleGen, 500 South Rosa Road, Madison, WI 53719, USA
| | - Judith Janette
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520–8005, USA
| | - Morten Jensen
- Department of Biology and Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Masaomi Kato
- Department of Molecular, Cellular and Developmental Biology, Post Office Box 208103, Yale University, New Haven, CT 06520, USA
| | - W. James Kent
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA 95064 USA
| | - Ellen Kephart
- Ontario Institute for Cancer Research, 101 College Street, Suite 800, Toronto, Ontario M5G 0A3, Canada
| | - Vishal Khivansara
- Life Sciences Institute, Department of Human Genetics, University of Michigan, 210 Washtenaw Avenue, Ann Arbor, MI 48109–2216, USA
| | - Ekta Khurana
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - John K. Kim
- Life Sciences Institute, Department of Human Genetics, University of Michigan, 210 Washtenaw Avenue, Ann Arbor, MI 48109–2216, USA
| | - Paulina Kolasinska-Zwierz
- Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK
| | - Eric C. Lai
- Sloan-Kettering Institute, 1275 York Avenue, Post Office Box 252, New York, NY 10065, USA
| | - Isabel Latorre
- Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK
| | - Amber Leahey
- Department of Genome Sciences, University of Washington School of Medicine, William H. Foege Building S350D, 1705 NE Pacific Street, Post Office Box 355065, Seattle, WA 98195–5065, USA
| | - Suzanna Lewis
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mailstop 64-121, Berkeley, CA 94720 USA
| | - Paul Lloyd
- Ontario Institute for Cancer Research, 101 College Street, Suite 800, Toronto, Ontario M5G 0A3, Canada
| | - Lucas Lochovsky
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Rebecca F. Lowdon
- Division of Extramural Research, National Human Genome Research Institute, National Institutes of Health, 5635 Fishers Lane, Suite 4076, Bethesda, MD 20892–9305, USA
| | - Yaniv Lubling
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, 76100, Israel
| | - Rachel Lyne
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK, and Cambridge Systems Biology Centre, Tennis Court Road, Cambridge CB2 1QR, UK
| | - Michael MacCoss
- Department of Genome Sciences, University of Washington School of Medicine, William H. Foege Building S350D, 1705 NE Pacific Street, Post Office Box 355065, Seattle, WA 98195–5065, USA
| | - Sebastian D. Mackowiak
- Max-Delbrück-Centrum für Molekulare Medizin, Division of Systems Biology, Robert-Rössle-Strasse 10, D-13125 Berlin-Buch, Germany
| | - Marco Mangone
- Center for Genomics and Systems Biology, Department of Biology, New York University, 1009 Silver Center, 100 Washington Square East, New York, NY 10003–6688, USA
| | - Sheldon McKay
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11542 USA
| | - Desirea Mecenas
- Center for Genomics and Systems Biology, Department of Biology, New York University, 1009 Silver Center, 100 Washington Square East, New York, NY 10003–6688, USA
| | - Gennifer Merrihew
- Department of Genome Sciences, University of Washington School of Medicine, William H. Foege Building S350D, 1705 NE Pacific Street, Post Office Box 355065, Seattle, WA 98195–5065, USA
| | - David M. Miller
- Department of Cell and Developmental Biology, Vanderbilt University, 465 21st Avenue South, Nashville, TN 37232–8240, USA
| | - Andrew Muroyama
- Ludwig Institute Cancer Research/Department of Cellular and Molecular Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093–0653, USA
| | - John I. Murray
- Department of Genome Sciences, University of Washington School of Medicine, William H. Foege Building S350D, 1705 NE Pacific Street, Post Office Box 355065, Seattle, WA 98195–5065, USA
| | - Siew-Loon Ooi
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109, USA
| | - Hoang Pham
- Howard Hughes Medical Institute, Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA, and Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
| | - Taryn Phippen
- Molecular, Cell, and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Elicia A. Preston
- Department of Genome Sciences, University of Washington School of Medicine, William H. Foege Building S350D, 1705 NE Pacific Street, Post Office Box 355065, Seattle, WA 98195–5065, USA
| | - Nikolaus Rajewsky
- Max-Delbrück-Centrum für Molekulare Medizin, Division of Systems Biology, Robert-Rössle-Strasse 10, D-13125 Berlin-Buch, Germany
| | - Gunnar Rätsch
- Friedrich Miescher Laboratory of the Max Planck Society, Spemannstrasse 39, 72076 Tübingen, Germany
| | - Heidi Rosenbaum
- Roche NimbleGen, 500 South Rosa Road, Madison, WI 53719, USA
| | - Joel Rozowsky
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Kim Rutherford
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK, and Cambridge Systems Biology Centre, Tennis Court Road, Cambridge CB2 1QR, UK
| | - Peter Ruzanov
- Ontario Institute for Cancer Research, 101 College Street, Suite 800, Toronto, Ontario M5G 0A3, Canada
| | - Mihail Sarov
- Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany
| | - Rajkumar Sasidharan
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Andrea Sboner
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Paul Scheid
- Center for Genomics and Systems Biology, Department of Biology, New York University, 1009 Silver Center, 100 Washington Square East, New York, NY 10003–6688, USA
| | - Eran Segal
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, 76100, Israel
| | - Hyunjin Shin
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA
- Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA
| | - Chong Shou
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Frank J. Slack
- Department of Molecular, Cellular and Developmental Biology, Post Office Box 208103, Yale University, New Haven, CT 06520, USA
| | - Cindie Slightam
- Department of Developmental Biology, Stanford University Medical Center, 279 Campus Drive, Stanford, CA 94305–5329, USA
| | - Richard Smith
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK, and Cambridge Systems Biology Centre, Tennis Court Road, Cambridge CB2 1QR, UK
| | - William C. Spencer
- Department of Cell and Developmental Biology, Vanderbilt University, 465 21st Avenue South, Nashville, TN 37232–8240, USA
| | - E. O. Stinson
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mailstop 64-121, Berkeley, CA 94720 USA
| | - Scott Taing
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA
| | - Teruaki Takasaki
- Molecular, Cell, and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Dionne Vafeados
- Department of Genome Sciences, University of Washington School of Medicine, William H. Foege Building S350D, 1705 NE Pacific Street, Post Office Box 355065, Seattle, WA 98195–5065, USA
| | - Ksenia Voronina
- Ludwig Institute Cancer Research/Department of Cellular and Molecular Medicine, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093–0653, USA
| | - Guilin Wang
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520–8005, USA
| | - Nicole L. Washington
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mailstop 64-121, Berkeley, CA 94720 USA
| | - Christina M. Whittle
- Department of Biology and Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Beijing Wu
- Department of Developmental Biology, Stanford University Medical Center, 279 Campus Drive, Stanford, CA 94305–5329, USA
| | - Koon-Kiu Yan
- Program in Computational Biology and Bioinformatics, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, Bass 432, 266 Whitney Avenue, New Haven, CT 06520, USA
| | - Georg Zeller
- Friedrich Miescher Laboratory of the Max Planck Society, Spemannstrasse 39, 72076 Tübingen, Germany
- European Molecular Biology Laboratory, 69117 Heidelberg, Germany
| | - Zheng Zha
- Ontario Institute for Cancer Research, 101 College Street, Suite 800, Toronto, Ontario M5G 0A3, Canada
| | - Mei Zhong
- Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06824, USA
| | - Xingliang Zhou
- Department of Biology and Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Julie Ahringer
- Wellcome Trust/Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK
| | - Susan Strome
- Molecular, Cell, and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Kristin C. Gunsalus
- Center for Genomics and Systems Biology, Department of Biology, New York University, 1009 Silver Center, 100 Washington Square East, New York, NY 10003–6688, USA
- New York University, Abu Dhabi, United Arab Emirates
| | - Gos Micklem
- Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK, and Cambridge Systems Biology Centre, Tennis Court Road, Cambridge CB2 1QR, UK
| | - X. Shirley Liu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA
- Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA
| | - Valerie Reinke
- Department of Genetics, Yale University School of Medicine, New Haven, CT 06520–8005, USA
| | - Stuart K. Kim
- Department of Genetics, Stanford University Medical Center, Stanford, CA 94305, USA
- Department of Developmental Biology, Stanford University Medical Center, 279 Campus Drive, Stanford, CA 94305–5329, USA
| | - LaDeana W. Hillier
- Department of Genome Sciences, University of Washington School of Medicine, William H. Foege Building S350D, 1705 NE Pacific Street, Post Office Box 355065, Seattle, WA 98195–5065, USA
| | - Steven Henikoff
- Basic Sciences Division, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109, USA
| | - Fabio Piano
- Center for Genomics and Systems Biology, Department of Biology, New York University, 1009 Silver Center, 100 Washington Square East, New York, NY 10003–6688, USA
- New York University, Abu Dhabi, United Arab Emirates
| | - Michael Snyder
- Department of Genetics, Stanford University Medical Center, Stanford, CA 94305, USA
- Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06824, USA
| | - Lincoln Stein
- Ontario Institute for Cancer Research, 101 College Street, Suite 800, Toronto, Ontario M5G 0A3, Canada
- Department of Molecular Genetics, University of Toronto, 27 King's College Circle, Toronto, Ontario M5S 1A1, Canada
- Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY 11542 USA
| | - Jason D. Lieb
- Department of Biology and Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Robert H. Waterston
- Department of Genome Sciences, University of Washington School of Medicine, William H. Foege Building S350D, 1705 NE Pacific Street, Post Office Box 355065, Seattle, WA 98195–5065, USA
|
38
|
Raney BJ, Cline MS, Rosenbloom KR, Dreszer TR, Learned K, Barber GP, Meyer LR, Sloan CA, Malladi VS, Roskin KM, Suh BB, Hinrichs AS, Clawson H, Zweig AS, Kirkup V, Fujita PA, Rhead B, Smith KE, Pohl A, Kuhn RM, Karolchik D, Haussler D, Kent WJ. ENCODE whole-genome data in the UCSC genome browser (2011 update). Nucleic Acids Res 2010; 39:D871-5. [PMID: 21037257 PMCID: PMC3013645 DOI: 10.1093/nar/gkq1017] [Citation(s) in RCA: 155] [Impact Index Per Article: 11.1] [Indexed: 01/19/2023] Open
Abstract
The ENCODE project is an international consortium with a goal of cataloguing all the functional elements in the human genome. The ENCODE Data Coordination Center (DCC) at the University of California, Santa Cruz serves as the central repository for ENCODE data. In this role, the DCC offers a collection of high-throughput, genome-wide data generated with technologies such as ChIP-Seq, RNA-Seq, DNA digestion and others. This data helps illuminate transcription factor-binding sites, histone marks, chromatin accessibility, DNA methylation, RNA expression, RNA binding and other cell-state indicators. It includes sequences with quality scores, alignments, signals calculated from the alignments, and in most cases, element or peak calls calculated from the signal data. Each data set is available for visualization and download via the UCSC Genome Browser (http://genome.ucsc.edu/). ENCODE data can also be retrieved using a metadata system that captures the experimental parameters of each assay. The ENCODE web portal at UCSC (http://encodeproject.org/) provides information about the ENCODE data and links for access.
Affiliation(s)
- Brian J Raney
- Center for Biomolecular Science and Engineering, School of Engineering and Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
|
39
|
Fujita PA, Rhead B, Zweig AS, Hinrichs AS, Karolchik D, Cline MS, Goldman M, Barber GP, Clawson H, Coelho A, Diekhans M, Dreszer TR, Giardine BM, Harte RA, Hillman-Jackson J, Hsu F, Kirkup V, Kuhn RM, Learned K, Li CH, Meyer LR, Pohl A, Raney BJ, Rosenbloom KR, Smith KE, Haussler D, Kent WJ. The UCSC Genome Browser database: update 2011. Nucleic Acids Res 2010; 39:D876-82. [PMID: 20959295 PMCID: PMC3242726 DOI: 10.1093/nar/gkq963] [Citation(s) in RCA: 841] [Impact Index Per Article: 60.1] [Indexed: 12/18/2022] Open
Abstract
The University of California, Santa Cruz Genome Browser (http://genome.ucsc.edu) offers online access to a database of genomic sequence and annotation data for a wide variety of organisms. The Browser also has many tools for visualizing, comparing and analyzing both publicly available and user-generated genomic data sets, aligning sequences and uploading user data. Among the features released this year are a gene search tool and annotation track drag-reorder functionality as well as support for BAM and BigWig/BigBed file formats. New display enhancements include overlay of multiple wiggle tracks through use of transparent coloring, options for displaying transformed wiggle data, a 'mean+whiskers' windowing function for display of wiggle data at high zoom levels, and more color schemes for microarray data. New data highlights include seven new genome assemblies, a Neandertal genome data portal, phenotype and disease association data, a human RNA editing track, and a zebrafish Conservation track. We also describe updates to existing tracks.
Affiliation(s)
- Pauline A Fujita
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA
|
40
|
Rhead B, Karolchik D, Kuhn RM, Hinrichs AS, Zweig AS, Fujita PA, Diekhans M, Smith KE, Rosenbloom KR, Raney BJ, Pohl A, Pheasant M, Meyer LR, Learned K, Hsu F, Hillman-Jackson J, Harte RA, Giardine B, Dreszer TR, Clawson H, Barber GP, Haussler D, Kent WJ. The UCSC Genome Browser database: update 2010. Nucleic Acids Res 2009; 38:D613-9. [PMID: 19906737 PMCID: PMC2808870 DOI: 10.1093/nar/gkp939] [Citation(s) in RCA: 500] [Impact Index Per Article: 33.3] [Indexed: 11/14/2022] Open
Abstract
The University of California, Santa Cruz (UCSC) Genome Browser website (http://genome.ucsc.edu/) provides a large database of publicly available sequence and annotation data along with an integrated tool set for examining and comparing the genomes of organisms, aligning sequence to genomes, and displaying and sharing users’ own annotation data. As of September 2009, genomic sequence and a basic set of annotation ‘tracks’ are provided for 47 organisms, including 14 mammals, 10 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms and a yeast. New data highlights this year include an updated human genome browser, a 44-species multiple sequence alignment track, improved variation and phenotype tracks and 16 new genome-wide ENCODE tracks. New features include drag-and-zoom navigation, a Wiki track for user-added annotations, new custom track formats for large datasets (bigBed and bigWig), a new multiple alignment output tool, links to variation and protein structure tools, in silico PCR utility enhancements, and improved track configuration tools.
Affiliation(s)
- Brooke Rhead
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
41
Kuhn RM, Karolchik D, Zweig AS, Wang T, Smith KE, Rosenbloom KR, Rhead B, Raney BJ, Pohl A, Pheasant M, Meyer L, Hsu F, Hinrichs AS, Harte RA, Giardine B, Fujita P, Diekhans M, Dreszer T, Clawson H, Barber GP, Haussler D, Kent WJ. The UCSC Genome Browser Database: update 2009. Nucleic Acids Res 2008; 37:D755-61. [PMID: 18996895 PMCID: PMC2686463 DOI: 10.1093/nar/gkn875] [Citation(s) in RCA: 303] [Impact Index Per Article: 18.9] [Indexed: 11/12/2022]
Abstract
The UCSC Genome Browser Database (GBD, http://genome.ucsc.edu) is a publicly available collection of genome assembly sequence data and integrated annotations for a large number of organisms, including extensive comparative-genomic resources. In the past year, 13 new genome assemblies have been added, including two important primate species, orangutan and marmoset, bringing the total to 46 assemblies for 24 different vertebrates and 39 assemblies for 22 different invertebrate animals. The GBD datasets may be viewed graphically with the UCSC Genome Browser, which uses a coordinate-based display system allowing users to juxtapose a wide variety of data. These data include all mRNAs from GenBank mapped to all organisms, RefSeq alignments, gene predictions, regulatory elements, gene expression data, repeats, SNPs and other variation data, as well as pairwise and multiple-genome alignments. A variety of other bioinformatics tools are also provided, including BLAT, the Table Browser, the Gene Sorter, the Proteome Browser, VisiGene and Genome Graphs.
Affiliation(s)
- R M Kuhn
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
42
Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F, Kober KM, Miller W, Pedersen JS, Pohl A, Raney BJ, Rhead B, Rosenbloom KR, Smith KE, Stanke M, Thakkapallayil A, Trumbower H, Wang T, Zweig AS, Haussler D, Kent WJ. The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res 2007; 36:D773-9. [PMID: 18086701 PMCID: PMC2238835 DOI: 10.1093/nar/gkm966] [Citation(s) in RCA: 403] [Impact Index Per Article: 23.7] [Indexed: 01/06/2023]
Abstract
The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes. Seventeen new assemblies have been added to the database in the past year, for a total coverage of 19 vertebrate and 21 invertebrate species as of September 2007. For each assembly, the GBD contains a collection of annotation data aligned to the genomic sequence. Highlights of this year's additions include a 28-species human-based vertebrate conservation annotation, an enhanced UCSC Genes set, and more human variation, MGC, and ENCODE data. The database is optimized for fast interactive performance with a set of web-based tools that may be used to view, manipulate, filter and download the annotation data. New toolset features include the Genome Graphs tool for displaying genome-wide data sets, session saving and sharing, better custom track management, expanded Genome Browser configuration options and a Genome Browser wiki site. The downloadable GBD data, the companion Genome Browser toolset and links to documentation and related information can be found at: http://genome.ucsc.edu/.
Affiliation(s)
- D Karolchik
- Center for Biomolecular Science and Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA.
43
Margulies EH, Cooper GM, Asimenos G, Thomas DJ, Dewey CN, Siepel A, Birney E, Keefe D, Schwartz AS, Hou M, Taylor J, Nikolaev S, Montoya-Burgos JI, Löytynoja A, Whelan S, Pardi F, Massingham T, Brown JB, Bickel P, Holmes I, Mullikin JC, Ureta-Vidal A, Paten B, Stone EA, Rosenbloom KR, Kent WJ, Bouffard GG, Guan X, Hansen NF, Idol JR, Maduro VVB, Maskeri B, McDowell JC, Park M, Thomas PJ, Young AC, Blakesley RW, Muzny DM, Sodergren E, Wheeler DA, Worley KC, Jiang H, Weinstock GM, Gibbs RA, Graves T, Fulton R, Mardis ER, Wilson RK, Clamp M, Cuff J, Gnerre S, Jaffe DB, Chang JL, Lindblad-Toh K, Lander ES, Hinrichs A, Trumbower H, Clawson H, Zweig A, Kuhn RM, Barber G, Harte R, Karolchik D, Field MA, Moore RA, Matthewson CA, Schein JE, Marra MA, Antonarakis SE, Batzoglou S, Goldman N, Hardison R, Haussler D, Miller W, Pachter L, Green ED, Sidow A. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res 2007; 17:760-74. [PMID: 17567995 PMCID: PMC1891336 DOI: 10.1101/gr.6034307] [Citation(s) in RCA: 170] [Impact Index Per Article: 10.0] [Indexed: 01/10/2023]
Abstract
A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization.
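The overlap statistic reported above (the share of constrained sequence that overlaps no experimentally identified functional element) can be sketched as an interval computation. This is a minimal illustration, assuming half-open `(start, end)` intervals on a single chromosome and a per-base definition of overlap; the function names are hypothetical, and the abstract does not state whether the authors counted bases or elements.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def overlap_bases(a, b):
    """Count bases shared by two interval sets (two-pointer sweep)."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def fraction_unannotated(constrained, functional):
    """Fraction of constrained bases that overlap no functional element."""
    total = sum(e - s for s, e in merge_intervals(constrained))
    if total == 0:
        return 0.0
    return 1 - overlap_bases(constrained, functional) / total
```

For example, `fraction_unannotated([(0, 10), (20, 30)], [(5, 25)])` covers 10 of the 20 constrained bases, so half of them remain unannotated.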
Affiliation(s)
- Elliott H Margulies
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA.
44
Birney E, Stamatoyannopoulos JA, Dutta A, Guigó R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, Kuehn MS, Taylor CM, Neph S, Koch CM, Asthana S, Malhotra A, Adzhubei I, Greenbaum JA, Andrews RM, Flicek P, Boyle PJ, Cao H, Carter NP, Clelland GK, Davis S, Day N, Dhami P, Dillon SC, Dorschner MO, Fiegler H, Giresi PG, Goldy J, Hawrylycz M, Haydock A, Humbert R, James KD, Johnson BE, Johnson EM, Frum TT, Rosenzweig ER, Karnani N, Lee K, Lefebvre GC, Navas PA, Neri F, Parker SCJ, Sabo PJ, Sandstrom R, Shafer A, Vetrie D, Weaver M, Wilcox S, Yu M, Collins FS, Dekker J, Lieb JD, Tullius TD, Crawford GE, Sunyaev S, Noble WS, Dunham I, Denoeud F, Reymond A, Kapranov P, Rozowsky J, Zheng D, Castelo R, Frankish A, Harrow J, Ghosh S, Sandelin A, Hofacker IL, Baertsch R, Keefe D, Dike S, Cheng J, Hirsch HA, Sekinger EA, Lagarde J, Abril JF, Shahab A, Flamm C, Fried C, Hackermüller J, Hertel J, Lindemeyer M, Missal K, Tanzer A, Washietl S, Korbel J, Emanuelsson O, Pedersen JS, Holroyd N, Taylor R, Swarbreck D, Matthews N, Dickson MC, Thomas DJ, Weirauch MT, Gilbert J, Drenkow J, Bell I, Zhao X, Srinivasan KG, Sung WK, Ooi HS, Chiu KP, Foissac S, Alioto T, Brent M, Pachter L, Tress ML, Valencia A, Choo SW, Choo CY, Ucla C, Manzano C, Wyss C, Cheung E, Clark TG, Brown JB, Ganesh M, Patel S, Tammana H, Chrast J, Henrichsen CN, Kai C, Kawai J, Nagalakshmi U, Wu J, Lian Z, Lian J, Newburger P, Zhang X, Bickel P, Mattick JS, Carninci P, Hayashizaki Y, Weissman S, Hubbard T, Myers RM, Rogers J, Stadler PF, Lowe TM, Wei CL, Ruan Y, Struhl K, Gerstein M, Antonarakis SE, Fu Y, Green ED, Karaöz U, Siepel A, Taylor J, Liefer LA, Wetterstrand KA, Good PJ, Feingold EA, Guyer MS, Cooper GM, Asimenos G, Dewey CN, Hou M, Nikolaev S, Montoya-Burgos JI, Löytynoja A, Whelan S, Pardi F, Massingham T, Huang H, Zhang NR, Holmes I, Mullikin JC, Ureta-Vidal A, Paten B, Seringhaus M, Church D, Rosenbloom K, Kent WJ, Stone EA, Batzoglou S, Goldman N, Hardison RC, Haussler D, 
Miller W, Sidow A, Trinklein ND, Zhang ZD, Barrera L, Stuart R, King DC, Ameur A, Enroth S, Bieda MC, Kim J, Bhinge AA, Jiang N, Liu J, Yao F, Vega VB, Lee CWH, Ng P, Shahab A, Yang A, Moqtaderi Z, Zhu Z, Xu X, Squazzo S, Oberley MJ, Inman D, Singer MA, Richmond TA, Munn KJ, Rada-Iglesias A, Wallerman O, Komorowski J, Fowler JC, Couttet P, Bruce AW, Dovey OM, Ellis PD, Langford CF, Nix DA, Euskirchen G, Hartman S, Urban AE, Kraus P, Van Calcar S, Heintzman N, Kim TH, Wang K, Qu C, Hon G, Luna R, Glass CK, Rosenfeld MG, Aldred SF, Cooper SJ, Halees A, Lin JM, Shulha HP, Zhang X, Xu M, Haidar JNS, Yu Y, Ruan Y, Iyer VR, Green RD, Wadelius C, Farnham PJ, Ren B, Harte RA, Hinrichs AS, Trumbower H, Clawson H, Hillman-Jackson J, Zweig AS, Smith K, Thakkapallayil A, Barber G, Kuhn RM, Karolchik D, Armengol L, Bird CP, de Bakker PIW, Kern AD, Lopez-Bigas N, Martin JD, Stranger BE, Woodroffe A, Davydov E, Dimas A, Eyras E, Hallgrímsdóttir IB, Huppert J, Zody MC, Abecasis GR, Estivill X, Bouffard GG, Guan X, Hansen NF, Idol JR, Maduro VVB, Maskeri B, McDowell JC, Park M, Thomas PJ, Young AC, Blakesley RW, Muzny DM, Sodergren E, Wheeler DA, Worley KC, Jiang H, Weinstock GM, Gibbs RA, Graves T, Fulton R, Mardis ER, Wilson RK, Clamp M, Cuff J, Gnerre S, Jaffe DB, Chang JL, Lindblad-Toh K, Lander ES, Koriabine M, Nefedov M, Osoegawa K, Yoshinaga Y, Zhu B, de Jong PJ. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007; 447:799-816. [PMID: 17571346 PMCID: PMC2212820 DOI: 10.1038/nature05874] [Citation(s) in RCA: 3782] [Impact Index Per Article: 222.5] [Indexed: 01/17/2023]
Abstract
We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
45
Thomas DJ, Rosenbloom KR, Clawson H, Hinrichs AS, Trumbower H, Raney BJ, Karolchik D, Barber GP, Harte RA, Hillman-Jackson J, Kuhn RM, Rhead BL, Smith KE, Thakkapallayil A, Zweig AS, Haussler D, Kent WJ. The ENCODE Project at UC Santa Cruz. Nucleic Acids Res 2007; 35:D663-7. [PMID: 17166863 PMCID: PMC1781110 DOI: 10.1093/nar/gkl1017] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.8] [Received: 08/15/2006] [Revised: 11/01/2006] [Accepted: 11/02/2006] [Indexed: 12/02/2022]
Abstract
The goal of the Encyclopedia Of DNA Elements (ENCODE) Project is to identify all functional elements in the human genome. The pilot phase is for comparison of existing methods and for the development of new methods to rigorously analyze a defined 1% of the human genome sequence. Experimental datasets are focused on the origin of replication, DNase I hypersensitivity, chromatin immunoprecipitation, promoter function, gene structure, pseudogenes, non-protein-coding RNAs, transcribed RNAs, multiple sequence alignment and evolutionarily constrained elements. The ENCODE project at UCSC website (http://genome.ucsc.edu/ENCODE) is the primary portal for the sequence-based data produced as part of the ENCODE project. In the pilot phase of the project, over 30 labs provided experimental results for a total of 56 browser tracks supported by 385 database tables. The site provides researchers with a number of tools that allow them to visualize and analyze the data as well as download data for local analyses. This paper describes the portal to the data, highlights the data that has been made available, and presents the tools that have been developed within the ENCODE project. Access to the data and types of interactive analysis that are possible are illustrated through supplemental examples.
Affiliation(s)
- Daryl J Thomas
- Department of Biomolecular Engineering, University of California at Santa Cruz, Santa Cruz, CA 95064, USA.
46
Kuhn RM, Karolchik D, Zweig AS, Trumbower H, Thomas DJ, Thakkapallayil A, Sugnet CW, Stanke M, Smith KE, Siepel A, Rosenbloom KR, Rhead B, Raney BJ, Pohl A, Pedersen JS, Hsu F, Hinrichs AS, Harte RA, Diekhans M, Clawson H, Bejerano G, Barber GP, Baertsch R, Haussler D, Kent WJ. The UCSC genome browser database: update 2007. Nucleic Acids Res 2006; 35:D668-73. [PMID: 17142222 PMCID: PMC1669757 DOI: 10.1093/nar/gkl928] [Citation(s) in RCA: 226] [Impact Index Per Article: 12.6] [Indexed: 01/12/2023]
Abstract
The University of California, Santa Cruz Genome Browser Database contains, as of September 2006, sequence and annotation data for the genomes of 13 vertebrate and 19 invertebrate species. The Genome Browser displays a wide variety of annotations at all scales from the single nucleotide level up to a full chromosome and includes assembly data, genes and gene predictions, mRNA and EST alignments, and comparative genomics, regulation, expression and variation data. The database is optimized for fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data. In the past year, 22 new assemblies and several new sets of human variation annotation have been released. New features include VisiGene, a fully integrated in situ hybridization image browser; phyloGif, for drawing evolutionary tree diagrams; a redesigned Custom Track feature; an expanded SNP annotation track; and many new display options. The Genome Browser, other tools, downloadable data files and links to documentation and other information can be found at .
Affiliation(s)
- R M Kuhn
- Center for Biomolecular Science and Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA.
47
Kabat JL, Barberan-Soler S, McKenna P, Clawson H, Farrer T, Zahler AM. Intronic alternative splicing regulators identified by comparative genomics in nematodes. PLoS Comput Biol 2006; 2:e86. [PMID: 16839192 PMCID: PMC1500816 DOI: 10.1371/journal.pcbi.0020086] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.5] [Received: 01/04/2006] [Accepted: 05/30/2006] [Indexed: 11/18/2022]
Abstract
Many alternative splicing events are regulated by pentameric and hexameric intronic sequences that serve as binding sites for splicing regulatory factors. We hypothesized that intronic elements that regulate alternative splicing are under selective pressure for evolutionary conservation. Using a Wobble Aware Bulk Aligner genomic alignment of Caenorhabditis elegans and Caenorhabditis briggsae, we identified 147 alternatively spliced cassette exons that exhibit short regions of high nucleotide conservation in the introns flanking the alternative exon. In vivo experiments on the alternatively spliced let-2 gene confirm that these conserved regions can be important for alternative splicing regulation. Conserved intronic element sequences were collected into a dataset and the occurrence of each pentamer and hexamer motif was counted. We compared the frequency of pentamers and hexamers in the conserved intronic elements to a dataset of all C. elegans intron sequences in order to identify short intronic motifs that are more likely to be associated with alternative splicing. High-scoring motifs were examined for upstream or downstream preferences in introns surrounding alternative exons. Many of the high-scoring nematode pentamer and hexamer motifs correspond to known mammalian splicing regulatory sequences, such as (T)GCATG, indicating that the mechanism of alternative splicing regulation is well conserved in metazoans. A comparison of the analysis of the conserved intronic elements, and analysis of the entire introns flanking these same exons, reveals that focusing on intronic conservation can increase the sensitivity of detecting putative splicing regulatory motifs. This approach also identified novel sequences whose role in splicing is under investigation and has allowed us to take a step forward in defining a catalog of splicing regulatory elements for an organism. 
In vivo experiments confirm that one novel high-scoring sequence from our analysis, (T)CTATC, is important for alternative splicing regulation of the unc-52 gene.
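The motif-counting step described in this abstract (comparing pentamer and hexamer frequencies in conserved intronic elements against all intron sequences) can be sketched as follows. This is an illustrative assumption rather than the authors' scoring scheme: the enrichment here is a plain frequency ratio with a pseudocount, and the names `count_kmers` and `kmer_enrichment` are hypothetical.

```python
from collections import Counter


def count_kmers(seqs, k):
    """Count every overlapping k-mer made only of A, C, G, T."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip N and other symbols
                counts[kmer] += 1
    return counts


def kmer_enrichment(conserved, background, k, pseudocount=1.0):
    """Ratio of k-mer frequency in conserved elements vs. background.

    The pseudocount keeps the ratio finite for k-mers that are absent
    from the background set (an assumed smoothing choice).
    """
    fg = count_kmers(conserved, k)
    bg = count_kmers(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_kmers = 4 ** k
    scores = {}
    for kmer, count in fg.items():
        f_fg = (count + pseudocount) / (fg_total + pseudocount * n_kmers)
        f_bg = (bg[kmer] + pseudocount) / (bg_total + pseudocount * n_kmers)
        scores[kmer] = f_fg / f_bg
    return scores
```

Sorting the returned dictionary by score, once for `k=5` and once for `k=6`, would give a ranked list of candidate motifs; a ratio above 1 means the motif is more frequent in the conserved elements than in the background.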
Affiliation(s)
- Jennifer L Kabat
- Department of Molecular, Cell, and Developmental Biology and Center for Molecular Biology of RNA, University of California Santa Cruz, Santa Cruz, California, USA
48
Hinrichs AS, Karolchik D, Baertsch R, Barber GP, Bejerano G, Clawson H, Diekhans M, Furey TS, Harte RA, Hsu F, Hillman-Jackson J, Kuhn RM, Pedersen JS, Pohl A, Raney BJ, Rosenbloom KR, Siepel A, Smith KE, Sugnet CW, Sultan-Qurraie A, Thomas DJ, Trumbower H, Weber RJ, Weirauch M, Zweig AS, Haussler D, Kent WJ. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res 2006; 34:D590-8. [PMID: 16381938 PMCID: PMC1347506 DOI: 10.1093/nar/gkj144] [Citation(s) in RCA: 847] [Impact Index Per Article: 47.1] [Indexed: 11/12/2022]
Abstract
The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, mRNA and expressed sequence tag evidence, comparative genomics, regulation, expression and variation data. The database is optimized to support fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data. The Genome Browser displays a wide variety of annotations at all scales from single nucleotide level up to a full chromosome. The Table Browser provides direct access to the database tables and sequence data, enabling complex queries on genome-wide datasets. The Proteome Browser graphically displays protein properties. The Gene Sorter allows filtering and comparison of genes by several metrics including expression data and several gene properties. BLAT and In Silico PCR search for sequences in entire genomes in seconds. These tools are highly integrated and provide many hyperlinks to other databases and websites. The GBD, browsing tools, downloadable data files and links to documentation and other information can be found at .
Affiliation(s)
- A S Hinrichs
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA.
49
Abstract
The University of California Santa Cruz (UCSC) Known Genes dataset is constructed by a fully automated process, based on protein data from Swiss-Prot/TrEMBL (UniProt) and the associated mRNA data from Genbank. The detailed steps of this process are described. Extensive cross-references from this dataset to other genomic and proteomic data were constructed. For each known gene, a details page is provided containing rich information about the gene, together with extensive links to other relevant genomic, proteomic and pathway data. As of July 2005, the UCSC Known Genes are available for human, mouse and rat genomes. The Known Genes serves as a foundation to support several key programs: the Genome Browser, Proteome Browser, Gene Sorter and Table Browser offered at the UCSC website. All the associated data files and program source code are also available. They can be accessed at http://genome.ucsc.edu. The genomic coverage of UCSC Known Genes, RefSeq, Ensembl Genes, H-Invitational and CCDS is analyzed. Although UCSC Known Genes offers the highest genomic and CDS coverage among major human and mouse gene sets, more detailed analysis suggests all of them could be further improved.
Affiliation(s)
- Fan Hsu
- Center for Biomolecular Science and Engineering, School of Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
50
Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, Weinstock GM, Wilson RK, Gibbs RA, Kent WJ, Miller W, Haussler D. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res 2005; 15:1034-50. [PMID: 16024819 PMCID: PMC1182216 DOI: 10.1101/gr.3715005] [Citation(s) in RCA: 2757] [Impact Index Per Article: 145.1] [Received: 01/19/2005] [Accepted: 06/02/2005] [Indexed: 11/24/2022]
Abstract
We have conducted a comprehensive search for conserved elements in vertebrate genomes, using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes). Parallel searches have been performed with multiple alignments of four insect species (three species of Drosophila and Anopheles gambiae), two species of Caenorhabditis, and seven species of Saccharomyces. Conserved elements were identified with a computer program called phastCons, which is based on a two-state phylogenetic hidden Markov model (phylo-HMM). PhastCons works by fitting a phylo-HMM to the data by maximum likelihood, subject to constraints designed to calibrate the model across species groups, and then predicting conserved elements based on this model. The predicted elements cover roughly 3%-8% of the human genome (depending on the details of the calibration procedure) and substantially higher fractions of the more compact Drosophila melanogaster (37%-53%), Caenorhabditis elegans (18%-37%), and Saccharomyces cerevisiae (47%-68%) genomes. From yeasts to vertebrates, in order of increasing genome size and general biological complexity, increasing fractions of conserved bases are found to lie outside of the exons of known protein-coding genes. In all groups, the most highly conserved elements (HCEs), by log-odds score, are hundreds or thousands of bases long. These elements share certain properties with ultraconserved elements, but they tend to be longer and less perfectly conserved, and they overlap genes of somewhat different functional categories. In vertebrates, HCEs are associated with the 3' UTRs of regulatory genes, stable gene deserts, and megabase-sized regions rich in moderately conserved noncoding sequences. Noncoding HCEs also show strong statistical evidence of an enrichment for RNA secondary structure.
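The two-state idea behind this approach can be illustrated with a deliberately simplified sketch. It is not phastCons: a real phylo-HMM derives emission probabilities from phylogenetic substitution models and is fitted by maximum likelihood, whereas the code below assumes a plain two-state HMM whose only observation per alignment column is whether all species agree. The probabilities `p_match` and `p_stay` and both function names are hypothetical choices for illustration, and the states are decoded with the Viterbi algorithm.

```python
import math


def identity_columns(rows):
    """Return 1 for alignment columns where all rows share one base."""
    return [1 if len(set(col)) == 1 and col[0] != "-" else 0
            for col in zip(*rows)]


def viterbi_two_state(obs, p_match=(0.95, 0.5), p_stay=(0.9, 0.9)):
    """Most likely state path; 0 = conserved, 1 = non-conserved.

    p_match[s] is the assumed probability that a column is identical
    in state s; p_stay[s] is the probability of remaining in state s.
    """
    if not obs:
        return []

    def emit(state, o):
        p = p_match[state]
        return math.log(p if o == 1 else 1 - p)

    def trans(a, b):
        return math.log(p_stay[a] if a == b else 1 - p_stay[a])

    # Uniform start distribution over the two states.
    score = [math.log(0.5) + emit(s, obs[0]) for s in (0, 1)]
    back = []
    for o in obs[1:]:
        new_score, ptr = [], []
        for b in (0, 1):
            cands = [score[a] + trans(a, b) for a in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + emit(b, o))
        score = new_score
        back.append(ptr)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

Runs of state 0 in the returned path play the role of predicted conserved elements; changing `p_stay` shifts the expected element length, loosely analogous to the calibration constraints mentioned in the abstract.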
Affiliation(s)
- Adam Siepel
- Center for Biomolecular Science and Engineering, University of California, Santa Cruz, Santa Cruz, California 95064, USA.