1
|
Wirthlin ME, Schmid TA, Elie JE, Zhang X, Kowalczyk A, Redlich R, Shvareva VA, Rakuljic A, Ji MB, Bhat NS, Kaplow IM, Schäffer DE, Lawler AJ, Wang AZ, Phan BN, Annaldasula S, Brown AR, Lu T, Lim BK, Azim E, Clark NL, Meyer WK, Pond SLK, Chikina M, Yartsev MM, Pfenning AR, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. Vocal learning-associated convergent evolution in mammalian proteins and regulatory elements. Science 2024; 383:eabn3263. [PMID: 38422184 DOI: 10.1126/science.abn3263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2021] [Accepted: 02/20/2024] [Indexed: 03/02/2024]
Abstract
Vocal production learning ("vocal learning") is a convergently evolved trait in vertebrates. To identify brain genomic elements associated with mammalian vocal learning, we integrated genomic, anatomical, and neurophysiological data from the Egyptian fruit bat (Rousettus aegyptiacus) with analyses of the genomes of 215 placental mammals. First, we identified a set of proteins evolving more slowly in vocal learners. Then, we discovered a vocal motor cortical region in the Egyptian fruit bat, an emergent vocal learner, and leveraged that knowledge to identify active cis-regulatory elements in the motor cortex of vocal learners. Machine learning methods applied to motor cortex open chromatin revealed 50 enhancers robustly associated with vocal learning whose activity tended to be lower in vocal learners. Our research implicates convergent losses of motor cortex regulatory elements in mammalian vocal learning evolution.
Collapse
Affiliation(s)
- Morgan E Wirthlin
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Tobias A Schmid
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94708, USA
| | - Julie E Elie
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94708, USA
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA 94708, USA
| | - Xiaomeng Zhang
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Amanda Kowalczyk
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Ruby Redlich
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Varvara A Shvareva
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94708, USA
| | - Ashley Rakuljic
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94708, USA
| | - Maria B Ji
- Department of Psychology, University of California, Berkeley, Berkeley, CA 94708, USA
| | - Ninad S Bhat
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94708, USA
| | - Irene M Kaplow
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
| | - Daniel E Schäffer
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Alyssa J Lawler
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Andrew Z Wang
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - BaDoi N Phan
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Siddharth Annaldasula
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Ashley R Brown
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Tianyu Lu
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Byung Kook Lim
- Neurobiology section, Division of Biological Science, University of California, San Diego, La Jolla, CA 92093, USA
| | - Eiman Azim
- Molecular Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA
| | - Nathan L Clark
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Wynn K Meyer
- Department of Biological Sciences, Lehigh University, Bethlehem, PA 18015, USA
| | | | - Maria Chikina
- Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
| | - Michael M Yartsev
- Helen Wills Neuroscience Institute, University of California, Berkeley, Berkeley, CA 94708, USA
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA 94708, USA
| | - Andreas R Pfenning
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
2
|
Christmas MJ, Kaplow IM, Genereux DP, Dong MX, Hughes GM, Li X, Sullivan PF, Hindle AG, Andrews G, Armstrong JC, Bianchi M, Breit AM, Diekhans M, Fanter C, Foley NM, Goodman DB, Goodman L, Keough KC, Kirilenko B, Kowalczyk A, Lawless C, Lind AL, Meadows JRS, Moreira LR, Redlich RW, Ryan L, Swofford R, Valenzuela A, Wagner F, Wallerman O, Brown AR, Damas J, Fan K, Gatesy J, Grimshaw J, Johnson J, Kozyrev SV, Lawler AJ, Marinescu VD, Morrill KM, Osmanski A, Paulat NS, Phan BN, Reilly SK, Schäffer DE, Steiner C, Supple MA, Wilder AP, Wirthlin ME, Xue JR, Birren BW, Gazal S, Hubley RM, Koepfli KP, Marques-Bonet T, Meyer WK, Nweeia M, Sabeti PC, Shapiro B, Smit AFA, Springer MS, Teeling EC, Weng Z, Hiller M, Levesque DL, Lewin HA, Murphy WJ, Navarro A, Paten B, Pollard KS, Ray DA, Ruf I, Ryder OA, Pfenning AR, Lindblad-Toh K, Karlsson EK, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. Evolutionary constraint and innovation across hundreds of placental mammals. Science 2023. [PMID: 37104599 DOI: 0.1126/science.abn3943] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
Zoonomia is the largest comparative genomics resource for mammals produced to date. By aligning genomes for 240 species, we identify bases that, when mutated, are likely to affect fitness and alter disease risk. At least 332 million bases (~10.7%) in the human genome are unusually conserved across species (evolutionarily constrained) relative to neutrally evolving repeats, and 4552 ultraconserved elements are nearly perfectly conserved. Of 101 million significantly constrained single bases, 80% are outside protein-coding exons and half have no functional annotations in the Encyclopedia of DNA Elements (ENCODE) resource. Changes in genes and regulatory elements are associated with exceptional mammalian traits, such as hibernation, that could inform therapeutic development. Earth's vast and imperiled biodiversity offers distinctive power for identifying genetic variants that affect genome function and organismal phenotypes.
Collapse
Affiliation(s)
- Matthew J Christmas
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Irene M Kaplow
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | | | - Michael X Dong
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Graham M Hughes
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Xue Li
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Morningside Graduate School of Biomedical Sciences, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Patrick F Sullivan
- Department of Genetics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Allyson G Hindle
- School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
| | - Gregory Andrews
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Joel C Armstrong
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Matteo Bianchi
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Ana M Breit
- School of Biology and Ecology, University of Maine, Orono, ME 04469, USA
| | - Mark Diekhans
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Cornelia Fanter
- School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
| | - Nicole M Foley
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
| | - Daniel B Goodman
- Department of Microbiology and Immunology, University of California San Francisco, San Francisco, CA 94143, USA
| | | | - Kathleen C Keough
- Fauna Bio, Inc., Emeryville, CA 94608, USA
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
- Gladstone Institutes, San Francisco, CA 94158, USA
| | - Bogdan Kirilenko
- Faculty of Biosciences, Goethe-University, 60438 Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
| | - Amanda Kowalczyk
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Colleen Lawless
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Abigail L Lind
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
- Gladstone Institutes, San Francisco, CA 94158, USA
| | - Jennifer R S Meadows
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Lucas R Moreira
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Ruby W Redlich
- Department of Biological Sciences, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Louise Ryan
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Ross Swofford
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Alejandro Valenzuela
- Department of Experimental and Health Sciences, Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, 08003 Barcelona, Spain
| | - Franziska Wagner
- Museum of Zoology, Senckenberg Natural History Collections Dresden, 01109 Dresden, Germany
| | - Ola Wallerman
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Ashley R Brown
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Joana Damas
- The Genome Center, University of California Davis, Davis, CA 95616, USA
| | - Kaili Fan
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
| | - Jenna Grimshaw
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
| | - Jeremy Johnson
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Sergey V Kozyrev
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Alyssa J Lawler
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Biological Sciences, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Voichita D Marinescu
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Kathleen M Morrill
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Morningside Graduate School of Biomedical Sciences, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Austin Osmanski
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Nicole S Paulat
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
| | - BaDoi N Phan
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Steven K Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
| | - Daniel E Schäffer
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Cynthia Steiner
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
| | - Megan A Supple
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Aryn P Wilder
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
| | - Morgan E Wirthlin
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Allen Institute for Brain Science, Seattle, WA 98109, USA
| | - James R Xue
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
| | - Bruce W Birren
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Steven Gazal
- Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | | | - Klaus-Peter Koepfli
- Center for Species Survival, Smithsonian's National Zoo and Conservation Biology Institute, Washington, DC 20008, USA
- Computer Technologies Laboratory, ITMO University, St. Petersburg 197101, Russia
- Smithsonian-Mason School of Conservation, George Mason University, Front Royal, VA 22630, USA
| | - Tomas Marques-Bonet
- Catalan Institution of Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), 08036 Barcelona, Spain
- Department of Medicine and Life Sciences, Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Barcelona, Spain
| | - Wynn K Meyer
- Department of Biological Sciences, Lehigh University, Bethlehem, PA 18015, USA
| | - Martin Nweeia
- Department of Comprehensive Care, School of Dental Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
- Department of Vertebrate Zoology, Canadian Museum of Nature, Ottawa, Ontario K2P 2R1, Canada
- Department of Vertebrate Zoology, Smithsonian Institution, Washington, DC 20002, USA
- Narwhal Genome Initiative, Department of Restorative Dentistry and Biomaterials Sciences, Harvard School of Dental Medicine, Boston, MA 02115, USA
| | - Pardis C Sabeti
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Howard Hughes Medical Institute, Harvard University, Cambridge, MA 02138, USA
| | - Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | | | - Mark S Springer
- Department of Evolution, Ecology and Organismal Biology, University of California Riverside, Riverside, CA 92521, USA
| | - Emma C Teeling
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Michael Hiller
- Faculty of Biosciences, Goethe-University, 60438 Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
| | | | - Harris A Lewin
- The Genome Center, University of California Davis, Davis, CA 95616, USA
- Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA
- John Muir Institute for the Environment, University of California Davis, Davis, CA 95616, USA
| | - William J Murphy
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
| | - Arcadi Navarro
- Catalan Institution of Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
- Department of Medicine and Life Sciences, Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, 08003 Barcelona, Spain
- BarcelonaBeta Brain Research Center, Pasqual Maragall Foundation, 08005 Barcelona, Spain
- CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), 08003 Barcelona, Spain
| | - Benedict Paten
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Katherine S Pollard
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
- Gladstone Institutes, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
| | - David A Ray
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
| | - Irina Ruf
- Division of Messel Research and Mammalogy, Senckenberg Research Institute and Natural History Museum Frankfurt, 60325 Frankfurt am Main, Germany
| | - Oliver A Ryder
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
- Department of Evolution, Behavior and Ecology, School of Biological Sciences, University of California San Diego, La Jolla, CA 92039, USA
| | - Andreas R Pfenning
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Kerstin Lindblad-Toh
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Elinor K Karlsson
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA 01605, USA
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
3
|
Kaplow IM, Lawler AJ, Schäffer DE, Srinivasan C, Sestili HH, Wirthlin ME, Phan BN, Prasad K, Brown AR, Zhang X, Foley K, Genereux DP, Karlsson EK, Lindblad-Toh K, Meyer WK, Pfenning AR, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. Relating enhancer genetic variation across mammals to complex phenotypes using machine learning. Science 2023; 380:eabm7993. [PMID: 37104615 DOI: 10.1126/science.abm7993] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/29/2023]
Abstract
Protein-coding differences between species often fail to explain phenotypic diversity, suggesting the involvement of genomic elements that regulate gene expression such as enhancers. Identifying associations between enhancers and phenotypes is challenging because enhancer activity can be tissue-dependent and functionally conserved despite low sequence conservation. We developed the Tissue-Aware Conservation Inference Toolkit (TACIT) to associate candidate enhancers with species' phenotypes using predictions from machine learning models trained on specific tissues. Applying TACIT to associate motor cortex and parvalbumin-positive interneuron enhancers with neurological phenotypes revealed dozens of enhancer-phenotype associations, including brain size-associated enhancers that interact with genes implicated in microcephaly or macrocephaly. TACIT provides a foundation for identifying enhancers associated with the evolution of any convergently evolved phenotype in any large group of species with aligned genomes.
Collapse
Affiliation(s)
- Irene M Kaplow
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Alyssa J Lawler
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Daniel E Schäffer
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Chaitanya Srinivasan
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Heather H Sestili
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Morgan E Wirthlin
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
| | - BaDoi N Phan
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
| | - Kavya Prasad
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Ashley R Brown
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Xiaomeng Zhang
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Kathleen Foley
- Department of Biological Sciences, Lehigh University, Bethlehem, PA, USA
| | - Diane P Genereux
- Broad Institute, Cambridge, MA, USA
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Elinor K Karlsson
- Broad Institute, Cambridge, MA, USA
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Kerstin Lindblad-Toh
- Broad Institute, Cambridge, MA, USA
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Wynn K Meyer
- Department of Biological Sciences, Lehigh University, Bethlehem, PA, USA
| | - Andreas R Pfenning
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Department of Biology, Carnegie Mellon University, Pittsburgh, PA, USA
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
4
|
Christmas MJ, Kaplow IM, Genereux DP, Dong MX, Hughes GM, Li X, Sullivan PF, Hindle AG, Andrews G, Armstrong JC, Bianchi M, Breit AM, Diekhans M, Fanter C, Foley NM, Goodman DB, Goodman L, Keough KC, Kirilenko B, Kowalczyk A, Lawless C, Lind AL, Meadows JRS, Moreira LR, Redlich RW, Ryan L, Swofford R, Valenzuela A, Wagner F, Wallerman O, Brown AR, Damas J, Fan K, Gatesy J, Grimshaw J, Johnson J, Kozyrev SV, Lawler AJ, Marinescu VD, Morrill KM, Osmanski A, Paulat NS, Phan BN, Reilly SK, Schäffer DE, Steiner C, Supple MA, Wilder AP, Wirthlin ME, Xue JR, Birren BW, Gazal S, Hubley RM, Koepfli KP, Marques-Bonet T, Meyer WK, Nweeia M, Sabeti PC, Shapiro B, Smit AFA, Springer MS, Teeling EC, Weng Z, Hiller M, Levesque DL, Lewin HA, Murphy WJ, Navarro A, Paten B, Pollard KS, Ray DA, Ruf I, Ryder OA, Pfenning AR, Lindblad-Toh K, Karlsson EK. Evolutionary constraint and innovation across hundreds of placental mammals. Science 2023; 380:eabn3943. [PMID: 37104599 PMCID: PMC10250106 DOI: 10.1126/science.abn3943] [Citation(s) in RCA: 34] [Impact Index Per Article: 34.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2021] [Accepted: 12/16/2022] [Indexed: 04/29/2023]
Abstract
Zoonomia is the largest comparative genomics resource for mammals produced to date. By aligning genomes for 240 species, we identify bases that, when mutated, are likely to affect fitness and alter disease risk. At least 332 million bases (~10.7%) in the human genome are unusually conserved across species (evolutionarily constrained) relative to neutrally evolving repeats, and 4552 ultraconserved elements are nearly perfectly conserved. Of 101 million significantly constrained single bases, 80% are outside protein-coding exons and half have no functional annotations in the Encyclopedia of DNA Elements (ENCODE) resource. Changes in genes and regulatory elements are associated with exceptional mammalian traits, such as hibernation, that could inform therapeutic development. Earth's vast and imperiled biodiversity offers distinctive power for identifying genetic variants that affect genome function and organismal phenotypes.
Collapse
Affiliation(s)
- Matthew J. Christmas
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Irene M. Kaplow
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | | | - Michael X. Dong
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Graham M. Hughes
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Xue Li
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Morningside Graduate School of Biomedical Sciences, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Patrick F. Sullivan
- Department of Genetics, University of North Carolina Medical School, Chapel Hill, NC 27599, USA
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
| | - Allyson G. Hindle
- School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
| | - Gregory Andrews
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Joel C. Armstrong
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Matteo Bianchi
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Ana M. Breit
- School of Biology and Ecology, University of Maine, Orono, ME 04469, USA
| | - Mark Diekhans
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Cornelia Fanter
- School of Life Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
| | - Nicole M. Foley
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
| | - Daniel B. Goodman
- Department of Microbiology and Immunology, University of California San Francisco, San Francisco, CA 94143, USA
| | | | - Kathleen C. Keough
- Fauna Bio, Inc., Emeryville, CA 94608, USA
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
- Gladstone Institutes, San Francisco, CA 94158, USA
| | - Bogdan Kirilenko
- Faculty of Biosciences, Goethe-University, 60438 Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
| | - Amanda Kowalczyk
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Colleen Lawless
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Abigail L. Lind
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
- Gladstone Institutes, San Francisco, CA 94158, USA
| | - Jennifer R. S. Meadows
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Lucas R. Moreira
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Ruby W. Redlich
- Department of Biological Sciences, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Louise Ryan
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Ross Swofford
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Alejandro Valenzuela
- Department of Experimental and Health Sciences, Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, 08003 Barcelona, Spain
| | - Franziska Wagner
- Museum of Zoology, Senckenberg Natural History Collections Dresden, 01109 Dresden, Germany
| | - Ola Wallerman
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Ashley R. Brown
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Joana Damas
- The Genome Center, University of California Davis, Davis, CA 95616, USA
| | - Kaili Fan
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
| | - Jenna Grimshaw
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
| | - Jeremy Johnson
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Sergey V. Kozyrev
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Alyssa J. Lawler
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Biological Sciences, Mellon College of Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Voichita D. Marinescu
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Kathleen M. Morrill
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Morningside Graduate School of Biomedical Sciences, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Austin Osmanski
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Nicole S. Paulat
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
| | - BaDoi N. Phan
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Medical Scientist Training Program, University of Pittsburgh School of Medicine, Pittsburgh, PA 15261, USA
| | - Steven K. Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT 06510, USA
| | - Daniel E. Schäffer
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Cynthia Steiner
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
| | - Megan A. Supple
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Aryn P. Wilder
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
| | - Morgan E. Wirthlin
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Allen Institute for Brain Science, Seattle, WA 98109, USA
| | - James R. Xue
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
| | | | - Bruce W. Birren
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Steven Gazal
- Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | | | - Klaus-Peter Koepfli
- Center for Species Survival, Smithsonian’s National Zoo and Conservation Biology Institute, Washington, DC 20008, USA
- Computer Technologies Laboratory, ITMO University, St. Petersburg 197101, Russia
- Smithsonian-Mason School of Conservation, George Mason University, Front Royal, VA 22630, USA
| | - Tomas Marques-Bonet
- Catalan Institution of Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), 08036 Barcelona, Spain
- Department of Medicine and Life Sciences, Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, 08193 Cerdanyola del Vallès, Barcelona, Spain
| | - Wynn K. Meyer
- Department of Biological Sciences, Lehigh University, Bethlehem, PA 18015, USA
| | - Martin Nweeia
- Department of Comprehensive Care, School of Dental Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
- Department of Vertebrate Zoology, Canadian Museum of Nature, Ottawa, Ontario K2P 2R1, Canada
- Department of Vertebrate Zoology, Smithsonian Institution, Washington, DC 20002, USA
- Narwhal Genome Initiative, Department of Restorative Dentistry and Biomaterials Sciences, Harvard School of Dental Medicine, Boston, MA 02115, USA
| | - Pardis C. Sabeti
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
- Howard Hughes Medical Institute, Harvard University, Cambridge, MA 02138, USA
| | - Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA 95064, USA
- Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | | | - Mark S. Springer
- Department of Evolution, Ecology and Organismal Biology, University of California Riverside, Riverside, CA 92521, USA
| | - Emma C. Teeling
- School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Michael Hiller
- Faculty of Biosciences, Goethe-University, 60438 Frankfurt, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
| | | | - Harris A. Lewin
- The Genome Center, University of California Davis, Davis, CA 95616, USA
- Department of Evolution and Ecology, University of California Davis, Davis, CA 95616, USA
- John Muir Institute for the Environment, University of California Davis, Davis, CA 95616, USA
| | - William J. Murphy
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX 77843, USA
| | - Arcadi Navarro
- Catalan Institution of Research and Advanced Studies (ICREA), 08010 Barcelona, Spain
- Department of Medicine and Life Sciences, Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, 08003 Barcelona, Spain
- BarcelonaBeta Brain Research Center, Pasqual Maragall Foundation, 08005 Barcelona, Spain
- CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), 08003 Barcelona, Spain
| | - Benedict Paten
- Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA
| | - Katherine S. Pollard
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA 94158, USA
- Gladstone Institutes, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
| | - David A. Ray
- Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
| | - Irina Ruf
- Division of Messel Research and Mammalogy, Senckenberg Research Institute and Natural History Museum Frankfurt, 60325 Frankfurt am Main, Germany
| | - Oliver A. Ryder
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
- Department of Evolution, Behavior and Ecology, School of Biological Sciences, University of California San Diego, La Jolla, CA 92039, USA
| | - Andreas R. Pfenning
- Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Kerstin Lindblad-Toh
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Elinor K. Karlsson
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA 01605, USA
| |
Collapse
|
5
|
Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales AE, Ahmed AW, Kontopoulos DG, Hilgers L, Lindblad-Toh K, Karlsson EK, Hiller M, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. Integrating gene annotation with orthology inference at scale. Science 2023; 380:eabn3107. [PMID: 37104600 DOI: 10.1126/science.abn3107] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/29/2023]
Abstract
Annotating coding genes and inferring orthologs are two classical challenges in genomics and evolutionary biology that have traditionally been approached separately, limiting scalability. We present TOGA (Tool to infer Orthologs from Genome Alignments), a method that integrates structural gene annotation and orthology inference. TOGA implements a different paradigm to infer orthologous loci, improves ortholog detection and annotation of conserved genes compared with state-of-the-art methods, and handles even highly fragmented assemblies. TOGA scales to hundreds of genomes, which we demonstrate by applying it to 488 placental mammal and 501 bird assemblies, creating the largest comparative gene resources so far. Additionally, TOGA detects gene losses, enables selection screens, and automatically provides a superior measure of mammalian genome quality. TOGA is a powerful and scalable method to annotate and compare genes in the genomic era.
Collapse
Affiliation(s)
- Bogdan M Kirilenko
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
| | - Chetan Munegowda
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
| | - Ekaterina Osipova
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
| | - David Jebb
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
| | - Virag Sharma
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
| | - Moritz Blumer
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
| | - Ariadna E Morales
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
| | - Alexis-Walid Ahmed
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
| | - Dimitrios-Georgios Kontopoulos
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
| | - Leon Hilgers
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
| | - Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 751 32 Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Elinor K Karlsson
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Michael Hiller
- Max Planck Institute of Molecular Cell Biology and Genetics, 01307 Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, 01187 Dresden, Germany
- Center for Systems Biology Dresden, 01307 Dresden, Germany
- LOEWE Centre for Translational Biodiversity Genomics, 60325 Frankfurt, Germany
- Senckenberg Research Institute, 60325 Frankfurt, Germany
- Goethe University Frankfurt, Faculty of Biosciences, 60438 Frankfurt, Germany
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
6
|
Wilder AP, Supple MA, Subramanian A, Mudide A, Swofford R, Serres-Armero A, Steiner C, Koepfli KP, Genereux DP, Karlsson EK, Lindblad-Toh K, Marques-Bonet T, Munoz Fuentes V, Foley K, Meyer WK, Ryder OA, Shapiro B, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. The contribution of historical processes to contemporary extinction risk in placental mammals. Science 2023; 380:eabn5856. [PMID: 37104572 PMCID: PMC10184782 DOI: 10.1126/science.abn5856] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/29/2023]
Abstract
Species persistence can be influenced by the amount, type, and distribution of diversity across the genome, suggesting a potential relationship between historical demography and resilience. In this study, we surveyed genetic variation across single genomes of 240 mammals that compose the Zoonomia alignment to evaluate how historical effective population size (Ne) affects heterozygosity and deleterious genetic load and how these factors may contribute to extinction risk. We find that species with smaller historical Ne carry a proportionally larger burden of deleterious alleles owing to long-term accumulation and fixation of genetic load and have a higher risk of extinction. This suggests that historical demography can inform contemporary resilience. Models that included genomic data were predictive of species' conservation status, suggesting that, in the absence of adequate census or ecological data, genomic information may provide an initial risk assessment.
Collapse
Affiliation(s)
- Aryn P Wilder
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
| | - Megan A Supple
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA 95064, USA
- Howard Hughes Medical Institute, University of California, Santa Cruz, CA 95064, USA
| | | | | | - Ross Swofford
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
| | - Aitor Serres-Armero
- Institute of Evolutionary Biology, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona 08003, Spain
| | - Cynthia Steiner
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
| | - Klaus-Peter Koepfli
- Smithsonian-Mason School of Conservation, George Mason University, Front Royal, VA 22630, USA
- Center for Species Survival, Smithsonian Conservation Biology Institute, National Zoological Park, Washington, DC 30008, USA
- Computer Technologies Laboratory, ITMO University, St. Petersburg 197101, Russia
| | | | - Elinor K Karlsson
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
| | - Kerstin Lindblad-Toh
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala 751 32, Sweden
| | - Tomas Marques-Bonet
- Institute of Evolutionary Biology, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona 08003, Spain
- Catalan Institution of Research and Advanced Studies, Barcelona 08010, Spain
- Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona 08028, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona 08193, Spain
| | - Violeta Munoz Fuentes
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Kathleen Foley
- College of Law, University of Iowa, Iowa City, IA 52242, USA
- Department of Biological Sciences, Lehigh University, Bethlehem, PA 18015, USA
| | - Wynn K Meyer
- Department of Biological Sciences, Lehigh University, Bethlehem, PA 18015, USA
| | - Oliver A Ryder
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, CA 92027, USA
- Department of Evolution, Behavior and Ecology, Division of Biology, University of California, San Diego, La Jolla, CA 92039, USA
| | - Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA 95064, USA
- Howard Hughes Medical Institute, University of California, Santa Cruz, CA 95064, USA
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
7
|
Andrews G, Fan K, Pratt HE, Phalke N, Karlsson EK, Lindblad-Toh K, Gazal S, Moore JE, Weng Z, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. Mammalian evolution of human cis-regulatory elements and transcription factor binding sites. Science 2023; 380:eabn7930. [PMID: 37104580 DOI: 10.1126/science.abn7930] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/29/2023]
Abstract
Understanding the regulatory landscape of the human genome is a long-standing objective of modern biology. Using the reference-free alignment across 241 mammalian genomes produced by the Zoonomia Consortium, we charted evolutionary trajectories for 0.92 million human candidate cis-regulatory elements (cCREs) and 15.6 million human transcription factor binding sites (TFBSs). We identified 439,461 cCREs and 2,024,062 TFBSs under evolutionary constraint. Genes near constrained elements perform fundamental cellular processes, whereas genes near primate-specific elements are involved in environmental interaction, including odor perception and immune response. About 20% of TFBSs are transposable element-derived and exhibit intricate patterns of gains and losses during primate evolution whereas sequence variants associated with complex traits are enriched in constrained TFBSs. Our annotations illuminate the regulatory functions of the human genome.
Collapse
Affiliation(s)
- Gregory Andrews
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Kaili Fan
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Henry E Pratt
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Nishigandha Phalke
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Elinor K Karlsson
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA 01605, USA
| | - Kerstin Lindblad-Toh
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75132 Uppsala, Sweden
| | - Steven Gazal
- Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
| | - Jill E Moore
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Chan Medical School, Worcester, MA, USA
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
8
|
Foley NM, Mason VC, Harris AJ, Bredemeyer KR, Damas J, Lewin HA, Eizirik E, Gatesy J, Karlsson EK, Lindblad-Toh K, Springer MS, Murphy WJ, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. A genomic timescale for placental mammal evolution. Science 2023; 380:eabl8189. [PMID: 37104581 DOI: 10.1126/science.abl8189] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/29/2023]
Abstract
The precise pattern and timing of speciation events that gave rise to all living placental mammals remain controversial. We provide a comprehensive phylogenetic analysis of genetic variation across an alignment of 241 placental mammal genome assemblies, addressing prior concerns regarding limited genomic sampling across species. We compared neutral genome-wide phylogenomic signals using concatenation and coalescent-based approaches, interrogated phylogenetic variation across chromosomes, and analyzed extensive catalogs of structural variants. Interordinal relationships exhibit relatively low rates of phylogenomic conflict across diverse datasets and analytical methods. Conversely, X-chromosome versus autosome conflicts characterize multiple independent clades that radiated during the Cenozoic. Genomic time trees reveal an accumulation of cladogenic events before and immediately after the Cretaceous-Paleogene (K-Pg) boundary, implying important roles for Cretaceous continental vicariance and the K-Pg extinction in the placental radiation.
Collapse
Affiliation(s)
- Nicole M Foley
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
| | - Victor C Mason
- Institute of Cell Biology, University of Bern, Bern, Switzerland
| | - Andrew J Harris
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Interdisciplinary Program in Genetics and Genomics, Texas A&M University, College Station, TX, USA
| | - Kevin R Bredemeyer
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Interdisciplinary Program in Genetics and Genomics, Texas A&M University, College Station, TX, USA
| | - Joana Damas
- The Genome Center, University of California, Davis, CA, USA
| | - Harris A Lewin
- The Genome Center, University of California, Davis, CA, USA
- Department of Evolution and Ecology, University of California, Davis, CA, USA
| | - Eduardo Eizirik
- School of Health and Life Sciences, Pontifical Catholic University of Rio Grande do Sul, Porto Alegre, Brazil
| | - John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY, USA
| | - Elinor K Karlsson
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA 01605, USA
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Program in Molecular Medicine, University of Massachussetts Chan Medical School, Worcester, MA 01605, USA
| | - Kerstin Lindblad-Toh
- Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, 751 32 Uppsala, Sweden
| | - Mark S Springer
- Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA, USA
| | - William J Murphy
- Veterinary Integrative Biosciences, Texas A&M University, College Station, TX, USA
- Interdisciplinary Program in Genetics and Genomics, Texas A&M University, College Station, TX, USA
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
9
|
Osmanski AB, Paulat NS, Korstian J, Grimshaw JR, Halsey M, Sullivan KAM, Moreno-Santillán DD, Crookshanks C, Roberts J, Garcia C, Johnson MG, Densmore LD, Stevens RD, Rosen J, Storer JM, Hubley R, Smit AFA, Dávalos LM, Karlsson EK, Lindblad-Toh K, Ray DA. Insights into mammalian TE diversity through the curation of 248 genome assemblies. Science 2023; 380:eabn1430. [PMID: 37104570 DOI: 10.1126/science.abn1430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2021] [Accepted: 10/28/2022] [Indexed: 04/29/2023]
Abstract
We examined transposable element (TE) content of 248 placental mammal genome assemblies, the largest de novo TE curation effort in eukaryotes to date. We found that although mammals resemble one another in total TE content and diversity, they show substantial differences with regard to recent TE accumulation. This includes multiple recent expansion and quiescence events across the mammalian tree. Young TEs, particularly long interspersed elements, drive increases in genome size, whereas DNA transposons are associated with smaller genomes. Mammals tend to accumulate only a few types of TEs at any given time, with one TE type dominating. We also found association between dietary habit and the presence of DNA transposon invasions. These detailed annotations will serve as a benchmark for future comparative TE analyses among placental mammals.
Collapse
Affiliation(s)
- Austin B Osmanski
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | - Nicole S Paulat
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | - Jenny Korstian
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | - Jenna R Grimshaw
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | - Michaela Halsey
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | - Kevin A M Sullivan
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | | | | | - Jacquelyn Roberts
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | - Carlos Garcia
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | - Matthew G Johnson
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | | | - Richard D Stevens
- Department of Natural Resources Management and Natural Science Research Laboratory, Museum of Texas Tech University, Lubbock, TX, USA
| | - Jeb Rosen
- Institute for Systems Biology, Seattle, WA, USA
| | | | | | | | - Liliana M Dávalos
- Department of Ecology & Evolution, Stony Brook University, Stony Brook, NY, USA
- Consortium for Inter-Disciplinary Environmental Research, Stony Brook University, Stony Brook, NY, USA
| | - Elinor K Karlsson
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Kerstin Lindblad-Toh
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA, USA
- Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA, USA
| | - David A Ray
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| |
Collapse
|
10
|
Xue JR, Mackay-Smith A, Mouri K, Garcia MF, Dong MX, Akers JF, Noble M, Li X, Lindblad-Toh K, Karlsson EK, Noonan JP, Capellini TD, Brennand KJ, Tewhey R, Sabeti PC, Reilly SK, Andrews G, Armstrong JC, Bianchi M, Birren BW, Bredemeyer KR, Breit AM, Christmas MJ, Clawson H, Damas J, Di Palma F, Diekhans M, Dong MX, Eizirik E, Fan K, Fanter C, Foley NM, Forsberg-Nilsson K, Garcia CJ, Gatesy J, Gazal S, Genereux DP, Goodman L, Grimshaw J, Halsey MK, Harris AJ, Hickey G, Hiller M, Hindle AG, Hubley RM, Hughes GM, Johnson J, Juan D, Kaplow IM, Karlsson EK, Keough KC, Kirilenko B, Koepfli KP, Korstian JM, Kowalczyk A, Kozyrev SV, Lawler AJ, Lawless C, Lehmann T, Levesque DL, Lewin HA, Li X, Lind A, Lindblad-Toh K, Mackay-Smith A, Marinescu VD, Marques-Bonet T, Mason VC, Meadows JRS, Meyer WK, Moore JE, Moreira LR, Moreno-Santillan DD, Morrill KM, Muntané G, Murphy WJ, Navarro A, Nweeia M, Ortmann S, Osmanski A, Paten B, Paulat NS, Pfenning AR, Phan BN, Pollard KS, Pratt HE, Ray DA, Reilly SK, Rosen JR, Ruf I, Ryan L, Ryder OA, Sabeti PC, Schäffer DE, Serres A, Shapiro B, Smit AFA, Springer M, Srinivasan C, Steiner C, Storer JM, Sullivan KAM, Sullivan PF, Sundström E, Supple MA, Swofford R, Talbot JE, Teeling E, Turner-Maier J, Valenzuela A, Wagner F, Wallerman O, Wang C, Wang J, Weng Z, Wilder AP, Wirthlin ME, Xue JR, Zhang X. The functional and evolutionary impacts of human-specific deletions in conserved elements. Science 2023; 380:eabn2253. [PMID: 37104592 DOI: 10.1126/science.abn2253] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/29/2023]
Abstract
Conserved genomic sequences disrupted in humans may underlie uniquely human phenotypic traits. We identified and characterized 10,032 human-specific conserved deletions (hCONDELs). These short (average 2.56 base pairs) deletions are enriched for human brain functions across genetic, epigenomic, and transcriptomic datasets. Using massively parallel reporter assays in six cell types, we discovered 800 hCONDELs conferring significant differences in regulatory activity, half of which enhance rather than disrupt regulatory function. We highlight several hCONDELs with putative human-specific effects on brain development, including HDAC5, CPEB4, and PPP2CA. Reverting an hCONDEL to the ancestral sequence alters the expression of LOXL2 and developmental genes involved in myelination and synaptic function. Our data provide a rich resource to investigate the evolutionary mechanisms driving new traits in humans and other species.
Collapse
Affiliation(s)
- James R Xue
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for System Biology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - Ava Mackay-Smith
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | | | | | - Michael X Dong
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Jared F Akers
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | - Mark Noble
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | - Xue Li
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA, USA
- Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA, USA
| | - Kerstin Lindblad-Toh
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
| | - Elinor K Karlsson
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA, USA
- Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA, USA
| | - James P Noonan
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Department of Neuroscience, Yale School of Medicine, New Haven, CT, USA
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
| | - Terence D Capellini
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Human Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - Kristen J Brennand
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Department of Psychiatry, Yale University, New Haven, CT, USA
| | - Ryan Tewhey
- The Jackson Laboratory, Bar Harbor, ME, USA
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, ME, USA
- Graduate School of Biomedical Sciences Tufts University School of Medicine, Boston, MA, USA
| | - Pardis C Sabeti
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for System Biology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Department of Immunology and Infectious Disease, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Steven K Reilly
- Department of Genetics, Yale School of Medicine, New Haven, CT, USA
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
11
|
Paulat NS, Storer JM, Moreno-Santillán DD, Osmanski AB, Sullivan KAM, Grimshaw JR, Korstian J, Halsey M, Garcia CJ, Crookshanks C, Roberts J, Smit AFA, Hubley R, Rosen J, Teeling EC, Vernes SC, Myers E, Pippel M, Brown T, Hiller M, Rojas D, Dávalos LM, Lindblad-Toh K, Karlsson EK, Ray DA. Chiropterans are a hotspot for horizontal transfer of DNA transposons in Mammalia. Mol Biol Evol 2023; 40:7128099. [PMID: 37071810 PMCID: PMC10162687 DOI: 10.1093/molbev/msad092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2022] [Revised: 04/04/2023] [Accepted: 04/10/2023] [Indexed: 04/20/2023] Open
Abstract
Horizontal transfer of transposable elements is an important mechanism contributing to genetic diversity and innovation. Bats (order Chiroptera) have repeatedly been shown to experience horizontal transfer of transposable elements at what appears to be a high rate compared to other mammals. We investigated the occurrence of horizontally transferred DNA transposons involving bats. We found over 200 putative horizontally transferred elements within bats; sixteen transposons were shared across distantly related mammalian clades and two other elements were shared with a fish and two lizard species. Our results indicate that bats are a hotspot for horizontal transfer of DNA transposons. These events broadly coincide with the diversification of several bat clades, supporting the hypothesis that DNA transposon invasions have contributed to genetic diversification of bats.
Collapse
Affiliation(s)
- Nicole S Paulat
- Department of Biological Sciences, Texas Tech University; Lubbock, TX 79409, USA
| | | | | | - Austin B Osmanski
- Department of Biological Sciences, Texas Tech University; Lubbock, TX 79409, USA
| | - Kevin A M Sullivan
- Department of Biological Sciences, Texas Tech University; Lubbock, TX 79409, USA
| | - Jenna R Grimshaw
- Department of Biological Sciences, Texas Tech University; Lubbock, TX 79409, USA
| | - Jennifer Korstian
- Department of Biological Sciences, Texas Tech University; Lubbock, TX 79409, USA
| | - Michaela Halsey
- Department of Biological Sciences, Texas Tech University; Lubbock, TX 79409, USA
| | - Carlos J Garcia
- Department of Biological Sciences, Texas Tech University; Lubbock, TX 79409, USA
| | - Claudia Crookshanks
- Department of Biological Sciences, Texas Tech University; Lubbock, TX 79409, USA
| | - Jaquelyn Roberts
- Department of Biological Sciences, Texas Tech University; Lubbock, TX 79409, USA
| | | | - Robert Hubley
- Institute for Systems Biology; Seattle, WA 98109, USA
| | - Jeb Rosen
- Institute for Systems Biology; Seattle, WA 98109, USA
| | - Emma C Teeling
- School of Biology and Environmental Science, University College Dublin; Belfield, Dublin 4, Ireland
| | - Sonja C Vernes
- Neurogenetics of Vocal Communication Group, Max Planck Institute for Psycholinguistics; 6525 XD, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour; 6525 AJ, Nijmegen, The Netherlands
- School of Biology, The University of St Andrews; Fife KY16 9ST, UK
| | - Eugene Myers
- Max-Planck-Institute of Molecular Cell Biology and Genetics Dresden, 28271, Germany
| | - Martin Pippel
- Max Planck Institute of Molecular Cell Biology and Genetics; 01307, Dresden, Germany
| | - Thomas Brown
- Max Planck Institute of Molecular Cell Biology and Genetics; 01307, Dresden, Germany
| | - Michael Hiller
- LOEWE Centre for Translational Biodiversity Genomics; 60325, Frankfurt, Germany
| | - Danny Rojas
- Department of Natural Sciences and Mathematics, Pontificia Universidad Javeriana Cali, Valle del Cauca, Colombia
| | - Liliana M Dávalos
- Department of Ecology and Evolution, Stony Brook University; Stony Brook, NY 11790, USA
- Consortium for Inter-Disciplinary Environmental Research, Stony Brook University; Stony Brook, NY 11790, USA
| | - Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University; Uppsala, 751 32, Sweden
- Broad Institute of MIT and Harvard; Cambridge, MA 02139, USA
| | - Elinor K Karlsson
- Broad Institute of MIT and Harvard; Cambridge, MA 02139, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School; Worcester, MA 01605, USA
- Program in Molecular Medicine, UMass Chan Medical School; Worcester, MA 01605, USA
| | - David A Ray
- Department of Biological Sciences, Texas Tech University; Lubbock, TX 79409, USA
| |
Collapse
|
12
|
Hubley R, Wheeler TJ, Smit AFA. Accuracy of multiple sequence alignment methods in the reconstruction of transposable element families. NAR Genom Bioinform 2022; 4:lqac040. [PMID: 35591887 PMCID: PMC9112768 DOI: 10.1093/nargab/lqac040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2021] [Revised: 03/29/2022] [Accepted: 04/29/2022] [Indexed: 02/06/2023] Open
Abstract
The construction of a high-quality multiple sequence alignment (MSA) from copies of a transposable element (TE) is a critical step in the characterization of a new TE family. Most studies of MSA accuracy have been conducted on protein or RNA sequence families, where structural features and strong signals of selection may assist with alignment. Less attention has been given to the quality of sequence alignments involving neutrally evolving DNA sequences such as those resulting from TE replication. Transposable element sequences are challenging to align due to their wide divergence ranges, fragmentation, and predominantly-neutral mutation patterns. To gain insight into the effects of these properties on MSA accuracy, we developed a simulator of TE sequence evolution, and used it to generate a benchmark with which we evaluated the MSA predictions produced by several popular aligners, along with Refiner, a method we developed in the context of our RepeatModeler software. We find that MAFFT and Refiner generally outperform other aligners for low to medium divergence simulated sequences, while Refiner is uniquely effective when tasked with aligning high-divergent and fragmented instances of a family.
Collapse
Affiliation(s)
- Robert Hubley
- Institute for Systems Biology, Seattle, WA 98109, USA
| | - Travis J Wheeler
- Department of Computer Science, University of Montana, Missoula, MT 59801, USA
| | | |
Collapse
|
13
|
Storer JM, Hubley R, Rosen J, Smit AFA. Methodologies for the De novo Discovery of Transposable Element Families. Genes (Basel) 2022; 13:709. [PMID: 35456515 PMCID: PMC9025800 DOI: 10.3390/genes13040709] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2022] [Revised: 04/14/2022] [Accepted: 04/15/2022] [Indexed: 02/07/2023] Open
Abstract
The discovery and characterization of transposable element (TE) families are crucial tasks in the process of genome annotation. Careful curation of TE libraries for each organism is necessary as each has been exposed to a unique and often complex set of TE families. De novo methods have been developed; however, a fully automated and accurate approach to the development of complete libraries remains elusive. In this review, we cover established methods and recent developments in de novo TE analysis. We also present various methodologies used to assess these tools and discuss opportunities for further advancement of the field.
Collapse
Affiliation(s)
| | | | | | - Arian F. A. Smit
- Institute for Systems Biology, Seattle, WA 98109, USA; (J.M.S.); (R.H.); (J.R.)
| |
Collapse
|
14
|
Hoyt SJ, Storer JM, Hartley GA, Grady PGS, Gershman A, de Lima LG, Limouse C, Halabian R, Wojenski L, Rodriguez M, Altemose N, Rhie A, Core LJ, Gerton JL, Makalowski W, Olson D, Rosen J, Smit AFA, Straight AF, Vollger MR, Wheeler TJ, Schatz MC, Eichler EE, Phillippy AM, Timp W, Miga KH, O’Neill RJ. From telomere to telomere: The transcriptional and epigenetic state of human repeat elements. Science 2022; 376:eabk3112. [PMID: 35357925 PMCID: PMC9301658 DOI: 10.1126/science.abk3112] [Citation(s) in RCA: 108] [Impact Index Per Article: 54.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Mobile elements and repetitive genomic regions are sources of lineage-specific genomic innovation and uniquely fingerprint individual genomes. Comprehensive analyses of such repeat elements, including those found in more complex regions of the genome, require a complete, linear genome assembly. We present a de novo repeat discovery and annotation of the T2T-CHM13 human reference genome. We identified previously unknown satellite arrays, expanded the catalog of variants and families for repeats and mobile elements, characterized classes of complex composite repeats, and located retroelement transduction events. We detected nascent transcription and delineated CpG methylation profiles to define the structure of transcriptionally active retroelements in humans, including those in centromeres. These data expand our insight into the diversity, distribution, and evolution of repetitive regions that have shaped the human genome.
Collapse
Affiliation(s)
- Savannah J. Hoyt
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | | | - Gabrielle A. Hartley
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Patrick G. S. Grady
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
| | | | - Charles Limouse
- Department of Biochemistry, Stanford University, Stanford, CA, USA
| | - Reza Halabian
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Luke Wojenski
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Matias Rodriguez
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Nicolas Altemose
- Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Leighton J. Core
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
| | | | - Wojciech Makalowski
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Daniel Olson
- Department of Computer Science, University of Montana, Missoula, MT, USA
| | - Jeb Rosen
- Institute for Systems Biology, Seattle, WA, USA
| | | | | | - Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Travis J. Wheeler
- Department of Computer Science, University of Montana, Missoula, MT, USA
| | - Michael C. Schatz
- Department of Computer Science and Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Rachel J. O’Neill
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Genetics and Genome Sciences, UConn Health, Farmington, CT, USA
| |
Collapse
|
15
|
Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, Aganezov S, Hoyt SJ, Diekhans M, Logsdon GA, Alonge M, Antonarakis SE, Borchers M, Bouffard GG, Brooks SY, Caldas GV, Chen NC, Cheng H, Chin CS, Chow W, de Lima LG, Dishuck PC, Durbin R, Dvorkina T, Fiddes IT, Formenti G, Fulton RS, Fungtammasan A, Garrison E, Grady PG, Graves-Lindsay TA, Hall IM, Hansen NF, Hartley GA, Haukness M, Howe K, Hunkapiller MW, Jain C, Jain M, Jarvis ED, Kerpedjiev P, Kirsche M, Kolmogorov M, Korlach J, Kremitzki M, Li H, Maduro VV, Marschall T, McCartney AM, McDaniel J, Miller DE, Mullikin JC, Myers EW, Olson ND, Paten B, Peluso P, Pevzner PA, Porubsky D, Potapova T, Rogaev EI, Rosenfeld JA, Salzberg SL, Schneider VA, Sedlazeck FJ, Shafin K, Shew CJ, Shumate A, Sims Y, Smit AFA, Soto DC, Sović I, Storer JM, Streets A, Sullivan BA, Thibaud-Nissen F, Torrance J, Wagner J, Walenz BP, Wenger A, Wood JMD, Xiao C, Yan SM, Young AC, Zarate S, Surti U, McCoy RC, Dennis MY, Alexandrov IA, Gerton JL, O’Neill RJ, Timp W, Zook JM, Schatz MC, Eichler EE, Miga KH, Phillippy AM. The complete sequence of a human genome. Science 2022; 376:44-53. [PMID: 35357919 PMCID: PMC9186530 DOI: 10.1126/science.abj6987] [Citation(s) in RCA: 894] [Impact Index Per Article: 447.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
Collapse
Affiliation(s)
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | - Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego; La Jolla, CA, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
| | - Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
| | - Nicolas Altemose
- Department of Bioengineering, University of California, Berkeley; Berkeley, CA, USA
| | - Lev Uralsky
- Sirius University of Science and Technology; Sochi, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
| | - Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University; Baltimore, MD, USA
| | - Sergey Aganezov
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | - Savannah J. Hoyt
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | - Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
| | - Michael Alonge
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | | | | | - Gerard G. Bouffard
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Shelise Y. Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Gina V. Caldas
- Department of Molecular and Cell Biology, University of California, Berkeley; Berkeley, CA, USA
| | - Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | - Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute; Boston, MA
- Department of Biomedical Informatics, Harvard Medical School; Boston, MA
| | | | | | | | - Philip C. Dishuck
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
| | - Richard Durbin
- Wellcome Sanger Institute; Cambridge, UK
- Department of Genetics, University of Cambridge; Cambridge, UK
| | - Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
| | | | - Giulio Formenti
- Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University; New York, NY, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
| | - Robert S. Fulton
- Department of Genetics, Washington University School of Medicine; St. Louis, MO, USA
| | | | - Erik Garrison
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- University of Tennessee Health Science Center; Memphis, TN, USA
| | - Patrick G.S. Grady
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
| | | | - Ira M. Hall
- Department of Genetics, Yale University School of Medicine; New Haven, CT, USA
| | - Nancy F. Hansen
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Gabrielle A. Hartley
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | | | | | - Chirag Jain
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Department of Computational and Data Sciences, Indian Institute of Science; Bangalore KA, India
| | - Miten Jain
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | - Erich D. Jarvis
- Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University; New York, NY, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
| | | | - Melanie Kirsche
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | - Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA, USA
| | | | - Milinn Kremitzki
- McDonnell Genome Institute, Washington University in St. Louis; St. Louis, MO, USA
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute; Boston, MA
- Department of Biomedical Informatics, Harvard Medical School; Boston, MA
| | - Valerie V. Maduro
- Undiagnosed Diseases Program, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Tobias Marschall
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics; Düsseldorf, Germany
| | - Ann M. McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | - Jennifer McDaniel
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
| | - Danny E. Miller
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children’s Hospital; Seattle, WA, USA
| | - James C. Mullikin
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Eugene W. Myers
- Max-Planck Institute of Molecular Cell Biology and Genetics; Dresden, Germany
| | - Nathan D. Olson
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | | | - Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research; Kansas City, MO, USA
| | - Evgeny I. Rogaev
- Sirius University of Science and Technology; Sochi, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
- Department of Psychiatry, University of Massachusetts Medical School; Worcester, MA, USA
- Faculty of Biology, Lomonosov Moscow State University; Moscow, Russia
| | | | - Steven L. Salzberg
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
| | - Valerie A. Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
| | - Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine; Houston TX, USA
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
| | - Colin J. Shew
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
| | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
| | - Ying Sims
- Wellcome Sanger Institute; Cambridge, UK
| | | | - Daniela C. Soto
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
| | - Ivan Sović
- Pacific Biosciences; Menlo Park, CA, USA
- Digital BioLogic d.o.o.; Ivanić-Grad, Croatia
| | | | - Aaron Streets
- Department of Bioengineering, University of California, Berkeley; Berkeley, CA, USA
- Chan Zuckerberg Biohub; San Francisco, CA, USA
| | - Beth A. Sullivan
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine; Durham, NC, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
| | | | - Justin Wagner
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
| | - Brian P. Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| | | | | | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
| | - Stephanie M. Yan
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
| | - Alice C. Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
| | - Urvashi Surti
- Department of Pathology, University of Pittsburgh; Pittsburgh, PA, USA
| | - Rajiv C. McCoy
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
| | - Megan Y. Dennis
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
| | - Ivan A. Alexandrov
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
- Research Center of Biotechnology of the Russian Academy of Sciences; Moscow, Russia
| | - Jennifer L. Gerton
- Stowers Institute for Medical Research; Kansas City, MO, USA
- Department of Biochemistry and Molecular Biology, University of Kansas Medical School; Kansas City, MO, USA
| | - Rachel J. O’Neill
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
| | - Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University; Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
| | - Justin M. Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
| | - Michael C. Schatz
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
| | - Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
| |
Collapse
|
16
|
Locke DP, Hillier LW, Warren WC, Worley KC, Nazareth LV, Muzny DM, Yang SP, Wang Z, Chinwalla AT, Minx P, Mitreva M, Cook L, Delehaunty KD, Fronick C, Schmidt H, Fulton LA, Fulton RS, Nelson JO, Magrini V, Pohl C, Graves TA, Markovic C, Cree A, Dinh HH, Hume J, Kovar CL, Fowler GR, Lunter G, Meader S, Heger A, Ponting CP, Marques-Bonet T, Alkan C, Chen L, Cheng Z, Kidd JM, Eichler EE, White S, Searle S, Vilella AJ, Chen Y, Flicek P, Ma J, Raney B, Suh B, Burhans R, Herrero J, Haussler D, Faria R, Fernando O, Darré F, Farré D, Gazave E, Oliva M, Navarro A, Roberto R, Capozzi O, Archidiacono N, Della Valle G, Purgato S, Rocchi M, Konkel MK, Walker JA, Ullmer B, Batzer MA, Smit AFA, Hubley R, Casola C, Schrider DR, Hahn MW, Quesada V, Puente XS, Ordoñez GR, López-Otín C, Vinar T, Brejova B, Ratan A, Harris RS, Miller W, Kosiol C, Lawson HA, Taliwal V, Martins AL, Siepel A, RoyChoudhury A, Ma X, Degenhardt J, Bustamante CD, Gutenkunst RN, Mailund T, Dutheil JY, Hobolth A, Schierup MH, Ryder OA, Yoshinaga Y, de Jong PJ, Weinstock GM, Rogers J, Mardis ER, Gibbs RA, Wilson RK. Author Correction: Comparative and demographic analysis of orang-utan genomes. Nature 2022; 608:E36. [PMID: 35962045 PMCID: PMC9402433 DOI: 10.1038/s41586-022-04799-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Affiliation(s)
- Devin P. Locke
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - LaDeana W. Hillier
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Wesley C. Warren
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Kim C. Worley
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Lynne V. Nazareth
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Donna M. Muzny
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Shiaw-Pyng Yang
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Zhengyuan Wang
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Asif T. Chinwalla
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Pat Minx
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Makedonka Mitreva
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Lisa Cook
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Kim D. Delehaunty
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Catrina Fronick
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Heather Schmidt
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Lucinda A. Fulton
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Robert S. Fulton
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Joanne O. Nelson
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Vincent Magrini
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Craig Pohl
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Tina A. Graves
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Chris Markovic
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Andy Cree
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Huyen H. Dinh
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Jennifer Hume
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Christie L. Kovar
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Gerald R. Fowler
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Gerton Lunter
- grid.4991.50000 0004 1936 8948MRC Functional Genomics Unit and Department of Physiology, Anatomy and Genetics, University of Oxford, Le Gros Clark Building, Oxford, UK ,grid.270683.80000 0004 0641 4511Wellcome Trust Centre for Human Genetics, Oxford, UK
| | - Stephen Meader
- grid.4991.50000 0004 1936 8948MRC Functional Genomics Unit and Department of Physiology, Anatomy and Genetics, University of Oxford, Le Gros Clark Building, Oxford, UK
| | - Andreas Heger
- grid.4991.50000 0004 1936 8948MRC Functional Genomics Unit and Department of Physiology, Anatomy and Genetics, University of Oxford, Le Gros Clark Building, Oxford, UK
| | - Chris P. Ponting
- grid.4991.50000 0004 1936 8948MRC Functional Genomics Unit and Department of Physiology, Anatomy and Genetics, University of Oxford, Le Gros Clark Building, Oxford, UK
| | - Tomas Marques-Bonet
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington USA ,grid.5612.00000 0001 2172 2676IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, Barcelona, Spain
| | - Can Alkan
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington USA
| | - Lin Chen
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington USA
| | - Ze Cheng
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington USA
| | - Jeffrey M. Kidd
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington USA
| | - Evan E. Eichler
- grid.34477.330000000122986657Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington USA ,grid.413575.10000 0001 2167 1581Howard Hughes Medical Institute, Seattle, Washington USA
| | - Simon White
- grid.10306.340000 0004 0606 5382Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK
| | - Stephen Searle
- grid.10306.340000 0004 0606 5382Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK
| | - Albert J. Vilella
- grid.52788.300000 0004 0427 7672European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge UK
| | - Yuan Chen
- grid.52788.300000 0004 0427 7672European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge UK
| | - Paul Flicek
- grid.52788.300000 0004 0427 7672European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge UK
| | - Jian Ma
- grid.205975.c0000 0001 0740 6917Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California USA ,grid.35403.310000 0004 1936 9991Present Address: Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, Illinois USA
| | - Brian Raney
- grid.205975.c0000 0001 0740 6917Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California USA
| | - Bernard Suh
- grid.205975.c0000 0001 0740 6917Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California USA
| | - Richard Burhans
- grid.29857.310000 0001 2097 4281Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, Pennsylvania, USA
| | - Javier Herrero
- grid.52788.300000 0004 0427 7672European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge UK
| | - David Haussler
- grid.205975.c0000 0001 0740 6917Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California USA
| | - Rui Faria
- grid.5612.00000 0001 2172 2676IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, Barcelona, Spain ,grid.5808.50000 0001 1503 7226CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Campus Agrário de Vairão, Vairão, Portugal
| | - Olga Fernando
- grid.5612.00000 0001 2172 2676IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, Barcelona, Spain ,grid.10772.330000000121511713Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Oeiras, Portugal
| | - Fleur Darré
- grid.5612.00000 0001 2172 2676IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, Barcelona, Spain
| | - Domènec Farré
- grid.5612.00000 0001 2172 2676IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, Barcelona, Spain
| | - Elodie Gazave
- grid.5612.00000 0001 2172 2676IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, Barcelona, Spain
| | - Meritxell Oliva
- grid.5612.00000 0001 2172 2676IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, Barcelona, Spain
| | - Arcadi Navarro
- grid.5612.00000 0001 2172 2676IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, Barcelona, Spain ,grid.425902.80000 0000 9601 989XICREA (Institució Catalana de Recerca i Estudis Avançats) and INB (Instituto Nacional de Bioinformática) PRBB, Doctor Aiguader, 88, Barcelona, Spain
| | - Roberta Roberto
- grid.7644.10000 0001 0120 3326Department of Biology, University of Bari, Bari, Italy
| | - Oronzo Capozzi
- grid.7644.10000 0001 0120 3326Department of Biology, University of Bari, Bari, Italy
| | | | - Giuliano Della Valle
- grid.6292.f0000 0004 1757 1758Department of Biology, University of Bologna, Bologna, Italy
| | - Stefania Purgato
- grid.6292.f0000 0004 1757 1758Department of Biology, University of Bologna, Bologna, Italy
| | - Mariano Rocchi
- grid.7644.10000 0001 0120 3326Department of Biology, University of Bari, Bari, Italy
| | - Miriam K. Konkel
- grid.64337.350000 0001 0662 7451Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana USA
| | - Jerilyn A. Walker
- grid.64337.350000 0001 0662 7451Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana USA
| | - Brygg Ullmer
- grid.64337.350000 0001 0662 7451Center for Computation and Technology, Department of Computer Sciences, Louisiana State University, Baton Rouge, Louisiana USA
| | - Mark A. Batzer
- grid.64337.350000 0001 0662 7451Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana USA
| | - Arian F. A. Smit
- grid.64212.330000 0004 0463 2320Institute for Systems Biology, Seattle, Washington USA
| | - Robert Hubley
- grid.64212.330000 0004 0463 2320Institute for Systems Biology, Seattle, Washington USA
| | - Claudio Casola
- grid.411377.70000 0001 0790 959XDepartment of Biology and School of Informatics and Computing, Indiana University, Bloomington, Indiana USA
| | - Daniel R. Schrider
- grid.411377.70000 0001 0790 959XDepartment of Biology and School of Informatics and Computing, Indiana University, Bloomington, Indiana USA
| | - Matthew W. Hahn
- grid.411377.70000 0001 0790 959XDepartment of Biology and School of Informatics and Computing, Indiana University, Bloomington, Indiana USA
| | - Victor Quesada
- grid.10863.3c0000 0001 2164 6351Instituto Universitario de Oncologia, Departamento de Bioquimica y Biologia Molecular, Universidad de Oviedo, Oviedo, Spain
| | - Xose S. Puente
- grid.10863.3c0000 0001 2164 6351Instituto Universitario de Oncologia, Departamento de Bioquimica y Biologia Molecular, Universidad de Oviedo, Oviedo, Spain
| | - Gonzalo R. Ordoñez
- grid.10863.3c0000 0001 2164 6351Instituto Universitario de Oncologia, Departamento de Bioquimica y Biologia Molecular, Universidad de Oviedo, Oviedo, Spain
| | - Carlos López-Otín
- grid.10863.3c0000 0001 2164 6351Instituto Universitario de Oncologia, Departamento de Bioquimica y Biologia Molecular, Universidad de Oviedo, Oviedo, Spain
| | - Tomas Vinar
- grid.7634.60000000109409708Faculty of Mathematics, Physics and Informatics, Comenius University, Mlynska Dolina, Bratislava, Slovakia
| | - Brona Brejova
- grid.7634.60000000109409708Faculty of Mathematics, Physics and Informatics, Comenius University, Mlynska Dolina, Bratislava, Slovakia
| | - Aakrosh Ratan
- grid.29857.310000 0001 2097 4281Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, Pennsylvania, USA
| | - Robert S. Harris
- grid.29857.310000 0001 2097 4281Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, Pennsylvania, USA
| | - Webb Miller
- grid.29857.310000 0001 2097 4281Center for Comparative Genomics and Bioinformatics, Penn State University, University Park, Pennsylvania, USA
| | - Carolin Kosiol
- Institut für Populations genetik, Vetmeduni Vienna, Wien, Austria
| | - Heather A. Lawson
- grid.4367.60000 0001 2355 7002Department of Anatomy and Neurobiology, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Vikas Taliwal
- grid.5386.8000000041936877XDepartment of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York USA
| | - André L. Martins
- grid.5386.8000000041936877XDepartment of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York USA
| | - Adam Siepel
- grid.5386.8000000041936877XDepartment of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York USA
| | - Arindam RoyChoudhury
- grid.21729.3f0000000419368729Department of Biostatistics, Columbia University, New York, New York USA
| | - Xin Ma
- grid.5386.8000000041936877XDepartment of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York USA
| | - Jeremiah Degenhardt
- grid.5386.8000000041936877XDepartment of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York USA
| | - Carlos D. Bustamante
- grid.168010.e0000000419368956Department of Genetics, Stanford University, Stanford, California USA
| | - Ryan N. Gutenkunst
- grid.134563.60000 0001 2168 186XDepartment of Molecular and Cellular Biology, University of Arizona, Tucson, Arizona USA
| | - Thomas Mailund
- grid.7048.b0000 0001 1956 2722Bioinformatics Research Centre, Aarhus University, Aarhus C, Denmark
| | - Julien Y. Dutheil
- grid.7048.b0000 0001 1956 2722Bioinformatics Research Centre, Aarhus University, Aarhus C, Denmark
| | - Asger Hobolth
- grid.7048.b0000 0001 1956 2722Bioinformatics Research Centre, Aarhus University, Aarhus C, Denmark
| | - Mikkel H. Schierup
- grid.7048.b0000 0001 1956 2722Bioinformatics Research Centre, Aarhus University, Aarhus C, Denmark
| | - Oliver A. Ryder
- grid.452788.40000 0004 0458 5309San Diego Zoo’s Institute for Conservation Research, Escondido, California USA
| | - Yuko Yoshinaga
- grid.414016.60000 0004 0433 7727Children’s Hospital Oakland Research Institute, Oakland, California USA
| | - Pieter J. de Jong
- grid.414016.60000 0004 0433 7727Children’s Hospital Oakland Research Institute, Oakland, California USA
| | - George M. Weinstock
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Jeffrey Rogers
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Elaine R. Mardis
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| | - Richard A. Gibbs
- grid.39382.330000 0001 2160 926XHuman Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas USA
| | - Richard K. Wilson
- grid.4367.60000 0001 2355 7002The Genome Center at Washington University, Washington University School of Medicine, Saint Louis, Missouri USA
| |
Collapse
|
17
|
Parisot N, Vargas-Chávez C, Goubert C, Baa-Puyoulet P, Balmand S, Beranger L, Blanc C, Bonnamour A, Boulesteix M, Burlet N, Calevro F, Callaerts P, Chancy T, Charles H, Colella S, Da Silva Barbosa A, Dell'Aglio E, Di Genova A, Febvay G, Gabaldón T, Galvão Ferrarini M, Gerber A, Gillet B, Hubley R, Hughes S, Jacquin-Joly E, Maire J, Marcet-Houben M, Masson F, Meslin C, Montagné N, Moya A, Ribeiro de Vasconcelos AT, Richard G, Rosen J, Sagot MF, Smit AFA, Storer JM, Vincent-Monegat C, Vallier A, Vigneron A, Zaidman-Rémy A, Zamoum W, Vieira C, Rebollo R, Latorre A, Heddi A. The transposable element-rich genome of the cereal pest Sitophilus oryzae. BMC Biol 2021; 19:241. [PMID: 34749730 PMCID: PMC8576890 DOI: 10.1186/s12915-021-01158-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2021] [Accepted: 09/27/2021] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND The rice weevil Sitophilus oryzae is one of the most important agricultural pests, causing extensive damage to cereal in fields and to stored grains. S. oryzae has an intracellular symbiotic relationship (endosymbiosis) with the Gram-negative bacterium Sodalis pierantonius and is a valuable model to decipher host-symbiont molecular interactions. RESULTS We sequenced the Sitophilus oryzae genome using a combination of short and long reads to produce the best assembly for a Curculionidae species to date. We show that S. oryzae has undergone successive bursts of transposable element (TE) amplification, representing 72% of the genome. In addition, we show that many TE families are transcriptionally active, and changes in their expression are associated with insect endosymbiotic state. S. oryzae has undergone a high gene expansion rate, when compared to other beetles. Reconstruction of host-symbiont metabolic networks revealed that, despite its recent association with cereal weevils (30 kyear), S. pierantonius relies on the host for several amino acids and nucleotides to survive and to produce vitamins and essential amino acids required for insect development and cuticle biosynthesis. CONCLUSIONS Here we present the genome of an agricultural pest beetle, which may act as a foundation for pest control. In addition, S. oryzae may be a useful model for endosymbiosis, and studying TE evolution and regulation, along with the impact of TEs on eukaryotic genomes.
Collapse
Affiliation(s)
- Nicolas Parisot
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Carlos Vargas-Chávez
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Institute for Integrative Systems Biology (I2SySBio), Universitat de València and Spanish Research Council (CSIC), València, Spain
- Present Address: Institute of Evolutionary Biology (IBE), CSIC-Universitat Pompeu Fabra, Barcelona, Spain
| | - Clément Goubert
- Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Université Lyon 1, Université Lyon, Villeurbanne, France
- Department of Molecular Biology and Genetics, Cornell University, 526 Campus Rd, Ithaca, New York, 14853, USA
- Present Address: Human Genetics, McGill University, Montreal, QC, Canada
| | | | - Séverine Balmand
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Louis Beranger
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Caroline Blanc
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Aymeric Bonnamour
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Matthieu Boulesteix
- Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Université Lyon 1, Université Lyon, Villeurbanne, France
| | - Nelly Burlet
- Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Université Lyon 1, Université Lyon, Villeurbanne, France
| | - Federica Calevro
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Patrick Callaerts
- Department of Human Genetics, Laboratory of Behavioral and Developmental Genetics, KU Leuven, University of Leuven, B-3000, Leuven, Belgium
| | - Théo Chancy
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Hubert Charles
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- ERABLE European Team, INRIA, Rhône-Alpes, France
| | - Stefano Colella
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Present Address: LSTM, Laboratoire des Symbioses Tropicales et Méditerranéennes, IRD, CIRAD, INRAE, SupAgro, Univ Montpellier, Montpellier, France
| | - André Da Silva Barbosa
- INRAE, Sorbonne Université, CNRS, IRD, UPEC, Université de Paris, Institute of Ecology and Environmental Sciences of Paris, Versailles, France
| | - Elisa Dell'Aglio
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Alex Di Genova
- Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Université Lyon 1, Université Lyon, Villeurbanne, France
- ERABLE European Team, INRIA, Rhône-Alpes, France
- Instituto de Ciencias de la Ingeniería, Universidad de O'Higgins, Rancagua, Chile
| | - Gérard Febvay
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Toni Gabaldón
- Life Sciences, Barcelona Supercomputing Centre (BSC-CNS), Barcelona, Spain
- Mechanisms of Disease, Institute for Research in Biomedicine (IRB), Barcelona, Spain
- Institut Catalan de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | | | - Alexandra Gerber
- Laboratório de Bioinformática, Laboratório Nacional de Computação Científica, Petrópolis, Brazil
| | - Benjamin Gillet
- Institut de Génomique Fonctionnelle de Lyon (IGFL), Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS UMR 5242, Lyon, France
| | | | - Sandrine Hughes
- Institut de Génomique Fonctionnelle de Lyon (IGFL), Université de Lyon, Ecole Normale Supérieure de Lyon, CNRS UMR 5242, Lyon, France
| | - Emmanuelle Jacquin-Joly
- INRAE, Sorbonne Université, CNRS, IRD, UPEC, Université de Paris, Institute of Ecology and Environmental Sciences of Paris, Versailles, France
| | - Justin Maire
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Present Address: School of BioSciences, The University of Melbourne, Parkville, VIC, 3010, Australia
| | | | - Florent Masson
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Present Address: Global Health Institute, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015, Lausanne, Switzerland
| | - Camille Meslin
- INRAE, Sorbonne Université, CNRS, IRD, UPEC, Université de Paris, Institute of Ecology and Environmental Sciences of Paris, Versailles, France
| | - Nicolas Montagné
- INRAE, Sorbonne Université, CNRS, IRD, UPEC, Université de Paris, Institute of Ecology and Environmental Sciences of Paris, Versailles, France
| | - Andrés Moya
- Institute for Integrative Systems Biology (I2SySBio), Universitat de València and Spanish Research Council (CSIC), València, Spain
- Foundation for the Promotion of Sanitary and Biomedical Research of Valencian Community (FISABIO), València, Spain
| | | | - Gautier Richard
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653, Le Rheu, France
| | - Jeb Rosen
- Institute for Systems Biology, Seattle, WA, USA
| | - Marie-France Sagot
- Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Université Lyon 1, Université Lyon, Villeurbanne, France
- ERABLE European Team, INRIA, Rhône-Alpes, France
| | | | | | | | - Agnès Vallier
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Aurélien Vigneron
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
- Present Address: Department of Evolutionary Ecology, Institute for Organismic and Molecular Evolution, Johannes Gutenberg University, 55128, Mainz, Germany
| | - Anna Zaidman-Rémy
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Waël Zamoum
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France
| | - Cristina Vieira
- Laboratoire de Biométrie et Biologie Evolutive, UMR5558, Université Lyon 1, Université Lyon, Villeurbanne, France.
- ERABLE European Team, INRIA, Rhône-Alpes, France.
| | - Rita Rebollo
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France.
| | - Amparo Latorre
- Institute for Integrative Systems Biology (I2SySBio), Universitat de València and Spanish Research Council (CSIC), València, Spain.
- Foundation for the Promotion of Sanitary and Biomedical Research of Valencian Community (FISABIO), València, Spain.
| | - Abdelaziz Heddi
- Univ Lyon, INSA Lyon, INRAE, BF2I, UMR 203, 69621 Villeurbanne, France.
| |
Collapse
|
18
|
Abstract
Transposable elements (TEs) have the ability to alter individual genomic landscapes and shape the course of evolution for species in which they reside. Such profound changes can be understood by studying the biology of the organism and the interplay of the TEs it hosts. Characterizing and curating TEs across a wide range of species is a fundamental first step in this endeavor. This protocol employs techniques honed while developing TE libraries for a wide range of organisms and specifically addresses: (1) the extension of truncated de novo results into full-length TE families; (2) the iterative refinement of TE multiple sequence alignments; and (3) the use of alignment visualization to assess model completeness and subfamily structure. © 2021 Wiley Periodicals LLC. Basic Protocol: Extension and edge polishing of consensi and seed alignments derived from de novo repeat finders Support Protocol: Generating seed alignments using a library of consensi and a genome assembly.
Collapse
Affiliation(s)
| | | | - Jeb Rosen
- Institute for Systems Biology, Seattle, Washington
| | | |
Collapse
|
19
|
Konkel MK, Ullmer B, Arceneaux EL, Sanampudi S, Brantley SA, Hubley R, Smit AFA, Batzer MA. Discovery of a new repeat family in the Callithrix jacchus genome. Genome Res 2016; 26:649-59. [PMID: 26916108 PMCID: PMC4864456 DOI: 10.1101/gr.199075.115] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2015] [Accepted: 02/23/2016] [Indexed: 11/24/2022]
Abstract
We identified a novel repeat family, termed Platy-1, in the Callithrix jacchus (common marmoset) genome that arose around the time of the divergence of platyrrhines and catarrhines and established itself as a repeat family in New World monkeys (NWMs). A full-length Platy-1 element is ∼100 bp in length, making it the shortest known short interspersed element (SINE) in primates, and harbors features characteristic of non-LTR retrotransposons. We identified 2268 full-length Platy-1 elements across 62 subfamilies in the common marmoset genome. Our subfamily reconstruction and phylogenetic analyses support Platy-1 propagation throughout the evolution of NWMs in the lineage leading to C. jacchus Platy-1 appears to have reached its amplification peak in the common ancestor of current day marmosets and has since moderately declined. However, identification of more than 200 Platy-1 elements identical to their respective consensus sequence, and the presence of polymorphic elements within common marmoset populations, suggests ongoing retrotransposition activity. Platy-1, a SINE, appears to have originated from an Alu element, and hence is likely derived from 7SL RNA. Our analyses illustrate the birth of a new repeat family and its propagation dynamics in the lineage leading to the common marmoset over the last 40 million years.
Collapse
Affiliation(s)
- Miriam K Konkel
- Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana 70803, USA
| | - Brygg Ullmer
- School of Electrical Engineering and Computer Science, Center for Computation and Technology, Louisiana State University, Baton Rouge, Louisiana 70803, USA
| | - Erika L Arceneaux
- Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana 70803, USA
| | - Sreeja Sanampudi
- Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana 70803, USA
| | - Sarah A Brantley
- Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana 70803, USA
| | - Robert Hubley
- Institute for Systems Biology, Seattle, Washington 98109-5263, USA
| | - Arian F A Smit
- Institute for Systems Biology, Seattle, Washington 98109-5263, USA
| | - Mark A Batzer
- Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana 70803, USA
| |
Collapse
|
20
|
Hubley R, Finn RD, Clements J, Eddy SR, Jones TA, Bao W, Smit AFA, Wheeler TJ. The Dfam database of repetitive DNA families. Nucleic Acids Res 2015; 44:D81-9. [PMID: 26612867 PMCID: PMC4702899 DOI: 10.1093/nar/gkv1272] [Citation(s) in RCA: 391] [Impact Index Per Article: 43.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2015] [Accepted: 11/03/2015] [Indexed: 11/20/2022] Open
Abstract
Repetitive DNA, especially that due to transposable elements (TEs), makes up a large fraction of many genomes. Dfam is an open access database of families of repetitive DNA elements, in which each family is represented by a multiple sequence alignment and a profile hidden Markov model (HMM). The initial release of Dfam, featured in the 2013 NAR Database Issue, contained 1143 families of repetitive elements found in humans, and was used to produce more than 100 Mb of additional annotation of TE-derived regions in the human genome, with improved speed. Here, we describe recent advances, most notably expansion to 4150 total families including a comprehensive set of known repeat families from four new organisms (mouse, zebrafish, fly and nematode). We describe improvements to coverage, and to our methods for identifying and reducing false annotation. We also describe updates to the website interface. The Dfam website has moved to http://dfam.org. Seed alignments, profile HMMs, hit lists and other underlying data are available for download.
Collapse
Affiliation(s)
- Robert Hubley
- Institute for Systems Biology, Seattle, WA 98109, USA
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1RQ, UK
| | - Jody Clements
- HHMI Janelia Research Campus, Ashburn, VA 20147, USA
| | - Sean R Eddy
- Howard Hughes Medical Institute, Harvard University, Cambridge, MA 02138, USA
| | - Thomas A Jones
- Howard Hughes Medical Institute, Harvard University, Cambridge, MA 02138, USA
| | - Weidong Bao
- Genetic Information Research Institute, Los Altos, CA 94022, USA
| | | | | |
Collapse
|
21
|
Chong AY, Kojima KK, Jurka J, Ray DA, Smit AFA, Isberg SR, Gongora J. Evolution and gene capture in ancient endogenous retroviruses - insights from the crocodilian genomes. Retrovirology 2014; 11:71. [PMID: 25499090 PMCID: PMC4299795 DOI: 10.1186/s12977-014-0071-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2013] [Accepted: 08/07/2014] [Indexed: 01/12/2023] Open
Abstract
Background Crocodilians are thought to be hosts to a diverse and divergent complement of endogenous retroviruses (ERVs) but a comprehensive investigation is yet to be performed. The recent sequencing of three crocodilian genomes provides an opportunity for a more detailed and accurate representation of the ERV diversity that is present in these species. Here we investigate the diversity, distribution and evolution of ERVs from the genomes of three key crocodilian species, and outline the key processes driving crocodilian ERV proliferation and evolution. Results ERVs and ERV related sequences make up less than 2% of crocodilian genomes. We recovered and described 45 ERV groups within the three crocodilian genomes, many of which are species specific. We have also revealed a new class of ERV, ERV4, which appears to be common to crocodilians and turtles, and currently has no characterised exogenous counterpart. For the first time, we formally describe the characteristics of this ERV class and its classification relative to other recognised ERV and retroviral classes. This class shares some sequence similarity and sequence characteristics with ERV3, although it is phylogenetically distinct from the other ERV classes. We have also identified two instances of gene capture by crocodilian ERVs, one of which, the capture of a host KIT-ligand mRNA has occurred without the loss of an ERV domain. Conclusions This study indicates that crocodilian ERVs comprise a wide variety of lineages, many of which appear to reflect ancient infections. In particular, ERV4 appears to have a limited host range, with current data suggesting that it is confined to crocodilians and some lineages of turtles. Also of interest are two ERV groups that demonstrate evidence of host gene capture. This study provides a framework to facilitate further studies into non-mammalian vertebrates and highlights the need for further studies into such species. Electronic supplementary material The online version of this article (doi:10.1186/s12977-014-0071-2) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Amanda Y Chong
- Faculty of Veterinary Science, University of Sydney, Sydney, NSW, 2006, Australia.
| | - Kenji K Kojima
- Genetic Information Research Institute, Los Altos, CA, 94022, USA.
| | - Jerzy Jurka
- Genetic Information Research Institute, Los Altos, CA, 94022, USA.
| | - David A Ray
- Department of Biochemistry, Molecular Biology, Plant Pathology and Entomology, Mississippi State University, Starkville, Mississippi State, 39762, USA. .,Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Starkville, Mississippi State, 39762, USA. .,Current Address: Department of Biological Sciences, Texas Tech University, Lubbock, TX, 79409, USA.
| | - Arian F A Smit
- Institute for Systems Biology, Seattle, WA, 98109-5234, USA.
| | - Sally R Isberg
- Faculty of Veterinary Science, University of Sydney, Sydney, NSW, 2006, Australia. .,Centre for Crocodile Research, Noonamah, NT, 0837, Australia.
| | - Jaime Gongora
- Faculty of Veterinary Science, University of Sydney, Sydney, NSW, 2006, Australia.
| |
Collapse
|
22
|
Rosenbloom KR, Armstrong J, Barber GP, Casper J, Clawson H, Diekhans M, Dreszer TR, Fujita PA, Guruvadoo L, Haeussler M, Harte RA, Heitner S, Hickey G, Hinrichs AS, Hubley R, Karolchik D, Learned K, Lee BT, Li CH, Miga KH, Nguyen N, Paten B, Raney BJ, Smit AFA, Speir ML, Zweig AS, Haussler D, Kuhn RM, Kent WJ. The UCSC Genome Browser database: 2015 update. Nucleic Acids Res 2014; 43:D670-81. [PMID: 25428374 PMCID: PMC4383971 DOI: 10.1093/nar/gku1177] [Citation(s) in RCA: 690] [Impact Index Per Article: 69.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open
Abstract
Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), 'mined the web' for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled.
Collapse
Affiliation(s)
- Kate R Rosenbloom
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Joel Armstrong
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Galt P Barber
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Jonathan Casper
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Hiram Clawson
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Mark Diekhans
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Timothy R Dreszer
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Pauline A Fujita
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Luvina Guruvadoo
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Maximilian Haeussler
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Rachel A Harte
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Steve Heitner
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Glenn Hickey
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Angie S Hinrichs
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Robert Hubley
- Institute for Systems Biology, Seattle, WA 98109, USA
| | - Donna Karolchik
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Katrina Learned
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Brian T Lee
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Chin H Li
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Karen H Miga
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Ngan Nguyen
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Benedict Paten
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Brian J Raney
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | | | - Matthew L Speir
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - Ann S Zweig
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - David Haussler
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA Howard Hughes Medical Institute, UCSC, Santa Cruz, CA 95064, USA
| | - Robert M Kuhn
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| | - W James Kent
- Center for Biomolecular Science and Engineering, CBSE, UC Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
| |
Collapse
|
23
|
Abstract
A common practice in computational genomic analysis is to use a set of ‘background’ sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such ‘background’ sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by ‘shuffling’ real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at http://repeatmasker.org/garlic/.
Collapse
Affiliation(s)
- Juan Caballero
- Institute for Systems Biology, 401 Terry Ave. N, Seattle, WA 98109, USA
| | - Arian F A Smit
- Institute for Systems Biology, 401 Terry Ave. N, Seattle, WA 98109, USA
| | - Leroy Hood
- Institute for Systems Biology, 401 Terry Ave. N, Seattle, WA 98109, USA
| | - Gustavo Glusman
- Institute for Systems Biology, 401 Terry Ave. N, Seattle, WA 98109, USA
| |
Collapse
|
24
|
Wheeler TJ, Clements J, Eddy SR, Hubley R, Jones TA, Jurka J, Smit AFA, Finn RD. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res 2012. [PMID: 23203985 PMCID: PMC3531169 DOI: 10.1093/nar/gks1265] [Citation(s) in RCA: 178] [Impact Index Per Article: 14.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022] Open
Abstract
We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross_match and BLAST variants, as well as Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus sequence search methods on a large human benchmark, while maintaining low false discovery rates, and coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat file format or in the form of MySQL table dumps.
Collapse
|
25
|
Warren WC, Clayton DF, Ellegren H, Arnold AP, Hillier LW, Künstner A, Searle S, White S, Vilella AJ, Fairley S, Heger A, Kong L, Ponting CP, Jarvis ED, Mello CV, Minx P, Lovell P, Velho TAF, Ferris M, Balakrishnan CN, Sinha S, Blatti C, London SE, Li Y, Lin YC, George J, Sweedler J, Southey B, Gunaratne P, Watson M, Nam K, Backström N, Smeds L, Nabholz B, Itoh Y, Whitney O, Pfenning AR, Howard J, Völker M, Skinner BM, Griffin DK, Ye L, McLaren WM, Flicek P, Quesada V, Velasco G, Lopez-Otin C, Puente XS, Olender T, Lancet D, Smit AFA, Hubley R, Konkel MK, Walker JA, Batzer MA, Gu W, Pollock DD, Chen L, Cheng Z, Eichler EE, Stapley J, Slate J, Ekblom R, Birkhead T, Burke T, Burt D, Scharff C, Adam I, Richard H, Sultan M, Soldatov A, Lehrach H, Edwards SV, Yang SP, Li X, Graves T, Fulton L, Nelson J, Chinwalla A, Hou S, Mardis ER, Wilson RK. The genome of a songbird. Nature 2010; 464:757-62. [PMID: 20360741 PMCID: PMC3187626 DOI: 10.1038/nature08819] [Citation(s) in RCA: 597] [Impact Index Per Article: 42.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2009] [Accepted: 01/06/2010] [Indexed: 01/16/2023]
Abstract
The zebra finch is an important model organism in several fields with unique relevance to human neuroscience. Like other songbirds, the zebra finch communicates through learned vocalizations, an ability otherwise documented only in humans and a few other animals and lacking in the chicken-the only bird with a sequenced genome until now. Here we present a structural, functional and comparative analysis of the genome sequence of the zebra finch (Taeniopygia guttata), which is a songbird belonging to the large avian order Passeriformes. We find that the overall structures of the genomes are similar in zebra finch and chicken, but they differ in many intrachromosomal rearrangements, lineage-specific gene family expansions, the number of long-terminal-repeat-based retrotransposons, and mechanisms of sex chromosome dosage compensation. We show that song behaviour engages gene regulatory networks in the zebra finch brain, altering the expression of long non-coding RNAs, microRNAs, transcription factors and their targets. We also show evidence for rapid molecular evolution in the songbird lineage of genes that are regulated during song experience. These results indicate an active involvement of the genome in neural processes underlying vocal communication and identify potential genetic substrates for the evolution and regulation of this behaviour.
Collapse
Affiliation(s)
- Wesley C Warren
- The Genome Center, Washington University School of Medicine, Campus Box 8501, 4444 Forest Park Avenue, St Louis, Missouri 63108, USA.
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
26
|
Roach JC, Glusman G, Smit AFA, Huff CD, Hubley R, Shannon PT, Rowen L, Pant KP, Goodman N, Bamshad M, Shendure J, Drmanac R, Jorde LB, Hood L, Galas DJ. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 2010; 328:636-9. [PMID: 20220176 DOI: 10.1126/science.1186802] [Citation(s) in RCA: 729] [Impact Index Per Article: 52.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Abstract
We analyzed the whole-genome sequences of a family of four, consisting of two siblings and their parents. Family-based sequencing allowed us to delineate recombination sites precisely, identify 70% of the sequencing errors (resulting in > 99.999% accuracy), and identify very rare single-nucleotide polymorphisms. We also directly estimated a human intergeneration mutation rate of approximately 1.1 x 10(-8) per position per haploid genome. Both offspring in this family have two recessive disorders: Miller syndrome, for which the gene was concurrently identified, and primary ciliary dyskinesia, for which causative genes have been previously identified. Family-based genome analysis enabled us to narrow the candidate genes for both of these Mendelian disorders to only four. Our results demonstrate the value of complete genome sequencing in families.
Collapse
Affiliation(s)
- Jared C Roach
- Institute for Systems Biology, Seattle, WA 98103, USA
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
27
|
Warren WC, Hillier LW, Marshall Graves JA, Birney E, Ponting CP, Grützner F, Belov K, Miller W, Clarke L, Chinwalla AT, Yang SP, Heger A, Locke DP, Miethke P, Waters PD, Veyrunes F, Fulton L, Fulton B, Graves T, Wallis J, Puente XS, López-Otín C, Ordóñez GR, Eichler EE, Chen L, Cheng Z, Deakin JE, Alsop A, Thompson K, Kirby P, Papenfuss AT, Wakefield MJ, Olender T, Lancet D, Huttley GA, Smit AFA, Pask A, Temple-Smith P, Batzer MA, Walker JA, Konkel MK, Harris RS, Whittington CM, Wong ESW, Gemmell NJ, Buschiazzo E, Vargas Jentzsch IM, Merkel A, Schmitz J, Zemann A, Churakov G, Kriegs JO, Brosius J, Murchison EP, Sachidanandam R, Smith C, Hannon GJ, Tsend-Ayush E, McMillan D, Attenborough R, Rens W, Ferguson-Smith M, Lefèvre CM, Sharp JA, Nicholas KR, Ray DA, Kube M, Reinhardt R, Pringle TH, Taylor J, Jones RC, Nixon B, Dacheux JL, Niwa H, Sekita Y, Huang X, Stark A, Kheradpour P, Kellis M, Flicek P, Chen Y, Webber C, Hardison R, Nelson J, Hallsworth-Pepin K, Delehaunty K, Markovic C, Minx P, Feng Y, Kremitzki C, Mitreva M, Glasscock J, Wylie T, Wohldmann P, Thiru P, Nhan MN, Pohl CS, Smith SM, Hou S, Nefedov M, de Jong PJ, Renfree MB, Mardis ER, Wilson RK. Genome analysis of the platypus reveals unique signatures of evolution. Nature 2008; 453:175-83. [PMID: 18464734 PMCID: PMC2803040 DOI: 10.1038/nature06936] [Citation(s) in RCA: 475] [Impact Index Per Article: 29.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2007] [Accepted: 03/25/2008] [Indexed: 12/18/2022]
Abstract
We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.
Collapse
Affiliation(s)
- Wesley C Warren
- Genome Sequencing Center, Washington University School of Medicine, Campus Box 8501, 4444 Forest Park Avenue, St Louis, Missouri 63108, USA.
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
28
|
Han K, Konkel MK, Xing J, Wang H, Lee J, Meyer TJ, Huang CT, Sandifer E, Hebert K, Barnes EW, Hubley R, Miller W, Smit AFA, Ullmer B, Batzer MA. Mobile DNA in Old World monkeys: a glimpse through the rhesus macaque genome. Science 2007; 316:238-40. [PMID: 17431169 DOI: 10.1126/science.1139462] [Citation(s) in RCA: 104] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]
Abstract
The completion of the draft sequence of the rhesus macaque genome allowed us to study the genomic composition and evolution of transposable elements in this representative of the Old World monkey lineage, a group of diverse primates closely related to humans. The L1 family of long interspersed elements appears to have evolved as a single lineage, and Alu elements have evolved into four currently active lineages. We also found evidence of elevated horizontal transmissions of retroviruses and the absence of DNA transposon activity in the Old World monkey lineage. In addition, approximately 100 precursors of composite SVA (short interspersed element, variable number of tandem repeat, and Alu) elements were identified, with the majority being shared by the common ancestor of humans and rhesus macaques. Mobile elements compose roughly 50% of primate genomes, and our findings illustrate their diversity and strong influence on genome evolution between closely related species.
Collapse
Affiliation(s)
- Kyudong Han
- Department of Biological Sciences, Biological Computation and Visualization Center, Center for Bio-Modular Multi-Scale Systems, Louisiana State University, Baton Rouge, LA 70803, USA
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
29
|
Gibbs RA, Rogers J, Katze MG, Bumgarner R, Weinstock GM, Mardis ER, Remington KA, Strausberg RL, Venter JC, Wilson RK, Batzer MA, Bustamante CD, Eichler EE, Hahn MW, Hardison RC, Makova KD, Miller W, Milosavljevic A, Palermo RE, Siepel A, Sikela JM, Attaway T, Bell S, Bernard KE, Buhay CJ, Chandrabose MN, Dao M, Davis C, Delehaunty KD, Ding Y, Dinh HH, Dugan-Rocha S, Fulton LA, Gabisi RA, Garner TT, Godfrey J, Hawes AC, Hernandez J, Hines S, Holder M, Hume J, Jhangiani SN, Joshi V, Khan ZM, Kirkness EF, Cree A, Fowler RG, Lee S, Lewis LR, Li Z, Liu YS, Moore SM, Muzny D, Nazareth LV, Ngo DN, Okwuonu GO, Pai G, Parker D, Paul HA, Pfannkoch C, Pohl CS, Rogers YH, Ruiz SJ, Sabo A, Santibanez J, Schneider BW, Smith SM, Sodergren E, Svatek AF, Utterback TR, Vattathil S, Warren W, White CS, Chinwalla AT, Feng Y, Halpern AL, Hillier LW, Huang X, Minx P, Nelson JO, Pepin KH, Qin X, Sutton GG, Venter E, Walenz BP, Wallis JW, Worley KC, Yang SP, Jones SM, Marra MA, Rocchi M, Schein JE, Baertsch R, Clarke L, Csürös M, Glasscock J, Harris RA, Havlak P, Jackson AR, Jiang H, Liu Y, Messina DN, Shen Y, Song HXZ, Wylie T, Zhang L, Birney E, Han K, Konkel MK, Lee J, Smit AFA, Ullmer B, Wang H, Xing J, Burhans R, Cheng Z, Karro JE, Ma J, Raney B, She X, Cox MJ, Demuth JP, Dumas LJ, Han SG, Hopkins J, Karimpour-Fard A, Kim YH, Pollack JR, Vinar T, Addo-Quaye C, Degenhardt J, Denby A, Hubisz MJ, Indap A, Kosiol C, Lahn BT, Lawson HA, Marklein A, Nielsen R, Vallender EJ, Clark AG, Ferguson B, Hernandez RD, Hirani K, Kehrer-Sawatzki H, Kolb J, Patil S, Pu LL, Ren Y, Smith DG, Wheeler DA, Schenck I, Ball EV, Chen R, Cooper DN, Giardine B, Hsu F, Kent WJ, Lesk A, Nelson DL, O'brien WE, Prüfer K, Stenson PD, Wallace JC, Ke H, Liu XM, Wang P, Xiang AP, Yang F, Barber GP, Haussler D, Karolchik D, Kern AD, Kuhn RM, Smith KE, Zwieg AS. Evolutionary and biomedical insights from the rhesus macaque genome. Science 2007; 316:222-34. [PMID: 17431167 DOI: 10.1126/science.1139247] [Citation(s) in RCA: 989] [Impact Index Per Article: 58.2] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
The rhesus macaque (Macaca mulatta) is an abundant primate species that diverged from the ancestors of Homo sapiens about 25 million years ago. Because they are genetically and physiologically similar to humans, rhesus monkeys are the most widely used nonhuman primate in basic and applied biomedical research. We determined the genome sequence of an Indian-origin Macaca mulatta female and compared the data with chimpanzees and humans to reveal the structure of ancestral primate genomes and to identify evidence for positive selection and lineage-specific expansions and contractions of gene families. A comparison of sequences from individual animals was used to investigate their underlying genetic diversity. The complete description of the macaque genome blueprint enhances the utility of this animal model for biomedical research and improves our understanding of the basic biology of the species.
Collapse
|
30
|
Glusman G, Qin S, El-Gewely MR, Siegel AF, Roach JC, Hood L, Smit AFA. A third approach to gene prediction suggests thousands of additional human transcribed regions. PLoS Comput Biol 2006; 2:e18. [PMID: 16543943 PMCID: PMC1391917 DOI: 10.1371/journal.pcbi.0020018] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2005] [Accepted: 01/25/2006] [Indexed: 12/26/2022] Open
Abstract
The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent "genomic deserts."
Collapse
|
31
|
Churakov G, Smit AFA, Brosius J, Schmitz J. A novel abundant family of retroposed elements (DAS-SINEs) in the nine-banded armadillo (Dasypus novemcinctus). Mol Biol Evol 2004; 22:886-93. [PMID: 15616140 DOI: 10.1093/molbev/msi071] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
About half of the mammalian genome is composed of retroposons. Long interspersed elements (LINEs) and short interspersed elements (SINEs) are the most abundant repetitive elements and account for about 21% and 13% of the human genome, respectively. SINEs have been detected in all major mammalian lineages, except for the South American order Xenarthra, also termed Edentata (armadillos, anteaters, and sloths). Investigating this order, we discovered a novel high-copy-number family of tRNA derived SINEs in the nine-banded armadillo Dasypus novemcinctus, a species that successfully crossed the Central American land bridge to North America in the Pliocene. A specific computer algorithm was developed, and we detected and extracted 687 specific SINEs from databases. Termed DAS-SINEs, we further divided them into six distinct subfamilies. We extracted tRNA(Ala)-derived monomers, two types of dimers, and three subfamilies of chimeric fusion products of a tRNA(Ala) domain and an approximately 180-nt sequence of thus far unidentified origin. Comparisons of secondary structures of the DAS-SINEs' tRNA domains suggest selective pressure to maintain a tRNA-like D-arm structure in the respective founder RNAs, as shown by compensatory mutations. By analysis of subfamily-specific genetic variability, comparison of the proportion of direct repeats, and analysis of self-integrations as well as key events of dimerization and deletions or insertions, we were able to delineate the evolutionary history of the DAS-SINE subfamilies.
Collapse
Affiliation(s)
- Gennady Churakov
- Institute of Experimental Pathology (ZMBE), University of Münster, Münster, Germany
| | | | | | | |
Collapse
|
32
|
Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AFA, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, Haussler D, Miller W. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res 2004; 14:708-15. [PMID: 15060014 PMCID: PMC383317 DOI: 10.1101/gr.1933104] [Citation(s) in RCA: 1037] [Impact Index Per Article: 51.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
We define a "threaded blockset," which is a novel generalization of the classic notion of a multiple alignment. A new computer program called TBA (for "threaded blockset aligner") builds a threaded blockset under the assumption that all matching segments occur in the same order and orientation in the given sequences; inversions and duplications are not addressed. TBA is designed to be appropriate for aligning many, but by no means all, megabase-sized regions of multiple mammalian genomes. The output of TBA can be projected onto any genome chosen as a reference, thus guaranteeing that different projections present consistent predictions of which genomic positions are orthologous. This capability is illustrated using a new visualization tool to view TBA-generated alignments of vertebrate Hox clusters from both the mammalian and fish perspectives. Experimental evaluation of alignment quality, using a program that simulates evolutionary change in genomic sequences, indicates that TBA is more accurate than earlier programs. To perform the dynamic-programming alignment step, TBA runs a stand-alone program called MULTIZ, which can be used to align highly rearranged or incompletely sequenced genomes. We describe our use of MULTIZ to produce the whole-genome multiple alignments at the Santa Cruz Genome Browser.
Collapse
Affiliation(s)
- Mathieu Blanchette
- Howard Hughes Medical Institute, University of California at Santa Cruz, Santa Cruz, California 95064, USA
| | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|