1
Haese-Hill W, Crouch K, Otto TD. Annotation and visualization of parasite, fungi and arthropod genomes with Companion. Nucleic Acids Res 2024; 52:W39-W44. [PMID: 38752499 PMCID: PMC11223846 DOI: 10.1093/nar/gkae378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2024] [Revised: 04/06/2024] [Accepted: 04/30/2024] [Indexed: 07/06/2024] Open
Abstract
As sequencing genomes has become increasingly popular, the need for annotation of the resulting assemblies is growing. Structural and functional annotation is still challenging as it includes finding the correct gene sequences, annotating other elements such as RNA and being able to submit those data to databases to share it with the community. Compared to de novo assembly where contiguous chromosomes are a sign of high quality, it is difficult to visualize and assess the quality of annotation. We developed the Companion web server to allow non-experts to annotate their genome using a reference-based method, enabling them to assess the output before submitting to public databases. In this update paper, we describe how we have included novel methods for gene finding and made the Companion server more efficient for annotation of genomes of up to 1 Gb in size. The reference set was increased to include genomes of interest for human and animal health from the fungi and arthropod kingdoms. We show that Companion outperforms existing comparable tools where closely related references are available.
Affiliation(s)
- Kathryn Crouch
  - School of Infection & Immunity, University of Glasgow, UK
- Thomas D Otto
  - School of Infection & Immunity, University of Glasgow, UK
  - LPHI, CNRS, INSERM, Université de Montpellier, France
2
Kim BY, Gellert HR, Church SH, Suvorov A, Anderson SS, Barmina O, Beskid SG, Comeault AA, Crown KN, Diamond SE, Dorus S, Fujichika T, Hemker JA, Hrcek J, Kankare M, Katoh T, Magnacca KN, Martin RA, Matsunaga T, Medeiros MJ, Miller DE, Pitnick S, Schiffer M, Simoni S, Steenwinkel TE, Syed ZA, Takahashi A, Wei KHC, Yokoyama T, Eisen MB, Kopp A, Matute D, Obbard DJ, O’Grady PM, Price DK, Toda MJ, Werner T, Petrov DA. Single-fly genome assemblies fill major phylogenomic gaps across the Drosophilidae Tree of Life. PLoS Biol 2024; 22:e3002697. [PMID: 39024225 PMCID: PMC11257246 DOI: 10.1371/journal.pbio.3002697] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 06/03/2024] [Indexed: 07/20/2024] Open
Abstract
Long-read sequencing is driving rapid progress in genome assembly across all major groups of life, including species of the family Drosophilidae, a longtime model system for genetics, genomics, and evolution. We previously developed a cost-effective hybrid Oxford Nanopore (ONT) long-read and Illumina short-read sequencing approach and used it to assemble 101 drosophilid genomes from laboratory cultures, greatly increasing the number of genome assemblies for this taxonomic group. The next major challenge is to address the laboratory culture bias in taxon sampling by sequencing genomes of species that cannot easily be reared in the lab. Here, we build upon our previous methods to perform amplification-free ONT sequencing of single wild flies obtained either directly from the field or from ethanol-preserved specimens in museum collections, greatly improving the representation of lesser studied drosophilid taxa in whole-genome data. Using Illumina Novaseq X Plus and ONT P2 sequencers with R10.4.1 chemistry, we set a new benchmark for inexpensive hybrid genome assembly at US $150 per genome while assembling genomes from as little as 35 ng of genomic DNA from a single fly. We present 183 new genome assemblies for 179 species as a resource for drosophilid systematics, phylogenetics, and comparative genomics. Of these genomes, 62 are from pooled lab strains and 121 from single adult flies. Despite the sample limitations of working with small insects, most single-fly diploid assemblies are comparable in contiguity (>1 Mb contig N50), completeness (>98% complete dipteran BUSCOs), and accuracy (>QV40 genome-wide with ONT R10.4.1) to assemblies from inbred lines. We present a well-resolved multi-locus phylogeny for 360 drosophilid and 4 outgroup species encompassing all publicly available (as of August 2023) genomes for this group. 
Finally, we present a Progressive Cactus whole-genome, reference-free alignment built from a subset of 298 suitably high-quality drosophilid genomes. The new assemblies and alignment, along with updated laboratory protocols and computational pipelines, are released as an open resource and as a tool for studying evolution at the scale of an entire insect family.
Affiliation(s)
- Bernard Y. Kim
  - Department of Biology, Stanford University, Stanford, California, United States of America
- Hannah R. Gellert
  - Department of Biology, Stanford University, Stanford, California, United States of America
- Samuel H. Church
  - Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- Anton Suvorov
  - Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia, United States of America
- Sean S. Anderson
  - Department of Biology, University of North Carolina Chapel Hill, Chapel Hill, North Carolina, United States of America
- Olga Barmina
  - Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America
- Sofia G. Beskid
  - Department of Biology, Stanford University, Stanford, California, United States of America
- Aaron A. Comeault
  - School of Environmental and Natural Sciences, Bangor University, Bangor, United Kingdom
- K. Nicole Crown
  - Department of Biology, Case Western Reserve University, Cleveland, Ohio, United States of America
- Sarah E. Diamond
  - Department of Biology, Case Western Reserve University, Cleveland, Ohio, United States of America
- Steve Dorus
  - Center for Reproductive Evolution, Department of Biology, Syracuse University, Syracuse, New York, United States of America
- Takako Fujichika
  - Department of Biological Sciences, Tokyo Metropolitan University, Tokyo, Japan
- James A. Hemker
  - Department of Developmental Biology, Stanford University, Stanford, California, United States of America
- Jan Hrcek
  - Institute of Entomology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Maaria Kankare
  - Department of Biological and Environmental Science, University of Jyväskylä, Jyväskylä, Finland
- Toru Katoh
  - Department of Biological Sciences, Hokkaido University, Sapporo, Japan
- Karl N. Magnacca
  - Hawaii Invertebrate Program, Division of Forestry & Wildlife, Honolulu, Hawaii, United States of America
- Ryan A. Martin
  - Department of Biology, Case Western Reserve University, Cleveland, Ohio, United States of America
- Teruyuki Matsunaga
  - Department of Complexity Science and Engineering, The University of Tokyo, Tokyo, Japan
- Matthew J. Medeiros
  - Pacific Biosciences Research Center, University of Hawaiʻi, Mānoa, Hawaii, United States of America
- Danny E. Miller
  - Division of Genetic Medicine, Department of Pediatrics; Department of Laboratory Medicine and Pathology, University of Washington, Seattle, Washington, United States of America
- Scott Pitnick
  - Center for Reproductive Evolution, Department of Biology, Syracuse University, Syracuse, New York, United States of America
- Michele Schiffer
  - Daintree Rainforest Observatory, James Cook University, Townsville, Australia
- Sara Simoni
  - Department of Biology, Stanford University, Stanford, California, United States of America
- Zeeshan A. Syed
  - Center for Reproductive Evolution, Department of Biology, Syracuse University, Syracuse, New York, United States of America
- Aya Takahashi
  - Department of Biological Sciences, Tokyo Metropolitan University, Tokyo, Japan
- Kevin H-C. Wei
  - Department of Zoology, The University of British Columbia, Vancouver, Canada
- Tsuya Yokoyama
  - Department of Biology, Stanford University, Stanford, California, United States of America
- Michael B. Eisen
  - Department of Cell and Molecular Biology, University of California Berkeley, Berkeley, California, United States of America
  - Howard Hughes Medical Institute, University of California Berkeley, Berkeley, California, United States of America
- Artyom Kopp
  - Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America
- Daniel Matute
  - Department of Biology, University of North Carolina Chapel Hill, Chapel Hill, North Carolina, United States of America
- Darren J. Obbard
  - Institute of Ecology and Evolution, University of Edinburgh, Edinburgh, United Kingdom
- Patrick M. O’Grady
  - Department of Entomology, Cornell University, Ithaca, New York, United States of America
- Donald K. Price
  - School of Life Sciences, University of Nevada Las Vegas, Las Vegas, Nevada, United States of America
- Thomas Werner
  - Department of Biological Sciences, Michigan Technological University, Houghton, Michigan, United States of America
- Dmitri A. Petrov
  - Department of Biology, Stanford University, Stanford, California, United States of America
  - CZ Biohub, Investigator, San Francisco, California, United States of America
3
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bornberg-Bauer E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJC, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PGS, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Pond SLK, LaPolice TM, Lee C, Lewis AP, Loh YHE, Masterson P, McGarvey KM, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O'Neill RJ, Eichler EE, Phillippy AM. The complete sequence and comparative analysis of ape sex chromosomes. Nature 2024; 630:401-411. [PMID: 38811727 PMCID: PMC11168930 DOI: 10.1038/s41586-024-07473-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 04/26/2024] [Indexed: 05/31/2024]
Abstract
Apes possess two sex chromosomes: the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility [1]. The X chromosome is vital for reproduction and cognition [2]. Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements, owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species.
Affiliation(s)
- Brandon D Pickett
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Monika Cechova
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Karol Pal
  - Penn State University, University Park, PA, USA
- Sergey Nurk
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- DongAhn Yoo
  - University of Washington School of Medicine, Seattle, WA, USA
- Qiuhui Li
  - Johns Hopkins University, Baltimore, MD, USA
- Prajna Hebbar
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Erich Bornberg-Bauer
  - University of Münster, Münster, Germany
  - MPI for Developmental Biology, Tübingen, Germany
- Gerard G Bouffard
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Shelise Y Brooks
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Lucia Carbone
  - Oregon Health and Science University, Portland, OR, USA
  - Oregon National Primate Research Center, Hillsboro, OR, USA
- Laura Carrel
  - Penn State University School of Medicine, Hershey, PA, USA
- Chen-Shan Chin
  - Foundation of Biological Data Sciences, Belmont, CA, USA
- Mark Diekhans
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Amalia Dutra
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Gage H Garcia
  - University of Washington School of Medicine, Seattle, WA, USA
- Diana Haddad
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Pille Hallast
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Glenn Hickey
  - University of California Santa Cruz, Santa Cruz, CA, USA
- David A Hillis
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Hyeonsoo Jeong
  - University of Washington School of Medicine, Seattle, WA, USA
- Charles Lee
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Yong-Hwee E Loh
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Patrick Masterson
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Kelly M McGarvey
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Karen H Miga
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Evgenia Pak
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Benedict Paten
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Arang Rhie
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Joana L Rocha
  - University of California Berkeley, Berkeley, CA, USA
- Fedor Ryabov
  - Masters Program in National Research, University Higher School of Economics, Moscow, Russia
- Samuel Sacco
  - University of California Santa Cruz, Santa Cruz, CA, USA
- Steven J Solar
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Sweetalana
  - Penn State University, University Park, PA, USA
- Alex Sweeten
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - Johns Hopkins University, Baltimore, MD, USA
- Françoise Thibaud-Nissen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Mario Ventura
  - Università degli Studi di Bari Aldo Moro, Bari, Italy
- Alice C Young
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Xinru Zhang
  - Penn State University, University Park, PA, USA
- Soojin V Yi
  - University of California Santa Barbara, Santa Barbara, CA, USA
- Sergey Koren
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Evan E Eichler
  - University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Adam M Phillippy
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
4
Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y, Marschall T, Li H, Paten B, Abel HJ, Antonacci-Fulton LL, Asri M, Baid G, Baker CA, Belyaeva A, Billis K, Bourque G, Buonaiuto S, Carroll A, Chaisson MJP, Chang PC, Chang XH, Cheng H, Chu J, Cody S, Colonna V, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Doerr D, Ebert P, Ebler J, Eichler EE, Eizenga JM, Fairley S, Fedrigo O, Felsenfeld AL, Feng X, Fischer C, Flicek P, Formenti G, Frankish A, Fulton RS, Gao Y, Garg S, Garrison E, Garrison NA, Giron CG, Green RE, Groza C, Guarracino A, Haggerty L, Hall IM, Harvey WT, Haukness M, Haussler D, Heumos S, Hickey G, Hoekzema K, Hourlier T, Howe K, Jain M, Jarvis ED, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Li H, Liao WW, Lu S, Lu TY, Lucas JK, Magalhães H, Marco-Sola S, Marijon P, Markello C, Marschall T, Martin FJ, McCartney A, McDaniel J, Miga KH, Mitchell MW, Monlong J, Mountcastle J, Munson KM, Mwaniki MN, Nattestad M, Novak AM, Nurk S, Olsen HE, Olson ND, Paten B, Pesout T, Phillippy AM, Popejoy AB, Porubsky D, Prins P, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Sibbesen JA, Sirén J, Smith MW, Sofia HJ, Tayoun ANA, Thibaud-Nissen F, Tomlinson C, Tricomi FF, Villani F, Vollger MR, Wagner J, Walenz B, Wang T, Wood JMD, Zimin AV, Zook JM. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat Biotechnol 2024; 42:663-673. [PMID: 37165083 PMCID: PMC10638906 DOI: 10.1038/s41587-023-01793-w] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 10/06/2022] [Accepted: 04/18/2023] [Indexed: 05/12/2023]
Abstract
Pangenome references address biases of reference genomes by storing a representative set of diverse haplotypes and their alignment, usually as a graph. Alternate alleles determined by variant callers can be used to construct pangenome graphs, but advances in long-read sequencing are leading to widely available, high-quality phased assemblies. Constructing a pangenome graph directly from assemblies, as opposed to variant calls, leverages the graph's ability to represent variation at different scales. Here we present the Minigraph-Cactus pangenome pipeline, which creates pangenomes directly from whole-genome alignments, and demonstrate its ability to scale to 90 human haplotypes from the Human Pangenome Reference Consortium. The method builds graphs containing all forms of genetic variation while allowing use of current mapping and genotyping tools. We measure the effect of the quality and completeness of reference genomes used for analysis within the pangenomes and show that using the CHM13 reference from the Telomere-to-Telomere Consortium improves the accuracy of our methods. We also demonstrate construction of a Drosophila melanogaster pangenome.
Affiliation(s)
- Glenn Hickey
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
  - These authors contributed equally: Glenn Hickey, Jean Monlong
- Jean Monlong
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
  - These authors contributed equally: Glenn Hickey, Jean Monlong
- Jana Ebler
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Adam M. Novak
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Jordan M. Eizenga
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Yan Gao
  - Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Tobias Marschall
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Heng Li
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Haley J. Abel
  - Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Mobin Asri
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Carl A. Baker
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Konstantinos Billis
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Guillaume Bourque
  - Department of Human Genetics, McGill University, Montreal, QC, Canada
  - Canadian Center for Computational Genomics, McGill University, Montreal, QC, Canada
  - Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
- Silvia Buonaiuto
  - Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Mark J. P. Chaisson
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Xian H. Chang
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Haoyu Cheng
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Justin Chu
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Sarah Cody
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Vincenza Colonna
  - Institute of Genetics and Biophysics, National Research Council, Naples, Italy
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Robert M. Cook-Deegan
  - Arizona State University, Barrett and O’Connor Washington Center, Washington, DC, USA
- Omar E. Cornejo
  - Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Daniel Doerr
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Peter Ebert
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Jana Ebler
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Evan E. Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Jordan M. Eizenga
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Susan Fairley
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Olivier Fedrigo
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Adam L. Felsenfeld
  - National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
- Xiaowen Feng
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Christian Fischer
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Giulio Formenti
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Adam Frankish
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Robert S. Fulton
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Yan Gao
  - Center for Computational and Genomic Medicine, The Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Shilpa Garg
  - Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
- Erik Garrison
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Nanibaa’ A. Garrison
  - Institute for Society and Genetics, College of Letters and Science, University of California, Los Angeles, Los Angeles, CA, USA
  - Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
  - Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Carlos Garcia Giron
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Richard E. Green
  - Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, CA, USA
  - Dovetail Genomics, Scotts Valley, CA, USA
- Cristian Groza
  - Quantitative Life Sciences, McGill University, Montreal, QC, Canada
- Andrea Guarracino
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
  - Genomics Research Centre, Human Technopole, Milan, Italy
- Leanne Haggerty
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Ira M. Hall
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
  - Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- William T. Harvey
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Marina Haukness
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- David Haussler
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Simon Heumos
  - Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
  - Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, Germany
- Glenn Hickey
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
  - These authors contributed equally: Glenn Hickey, Jean Monlong
- Kendra Hoekzema
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Thibaut Hourlier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Kerstin Howe
  - Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
- Miten Jain
  - Northeastern University, Boston, MA, USA
- Erich D. Jarvis
  - Howard Hughes Medical Institute, Chevy Chase, MD, USA
  - Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
  - Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
- Hanlee P. Ji
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Eimear E. Kenny
  - Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Barbara A. Koenig
  - Program in Bioethics and Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Jan O. Korbel
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
  - European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
- Jennifer Kordosky
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Sergey Koren
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- HoJoon Lee
  - Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Alexandra P. Lewis
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Heng Li
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Wen-Wei Liao
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
  - Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
  - Division of Biology and Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Shuangjia Lu
  - Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Tsung-Yu Lu
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Julian K. Lucas
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Hugo Magalhães
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Santiago Marco-Sola
  - Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
  - Departament d’Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona, Spain
- Pierre Marijon
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Charles Markello
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Tobias Marschall
  - Center for Digital Medicine, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  - Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Fergal J. Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Ann McCartney
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Jennifer McDaniel
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Karen H. Miga
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Jean Monlong
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
  - These authors contributed equally: Glenn Hickey, Jean Monlong
- Katherine M. Munson
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Adam M. Novak
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Sergey Nurk
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Hugh E. Olsen
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
- Nathan D. Olson
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Trevor Pesout
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Alice B. Popejoy
- Department of Public Health Sciences, University of California, Davis, Davis, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Daniela Puiu
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Allison A. Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Samuel Sacco
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Ashley D. Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
| | - Valerie A. Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Baergen I. Schultz
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
| | | | - Jonas A. Sibbesen
- Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
| | - Jouni Sirén
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Michael W. Smith
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
| | - Heidi J. Sofia
- National Institutes of Health (NIH)–National Human Genome Research Institute, Bethesda, MD, USA
| | - Ahmad N. Abou Tayoun
- Al Jalila Genomics Center of Excellence, Al Jalila Children’s Specialty Hospital, Dubai, UAE
- Center for Genomic Discovery, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Chad Tomlinson
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Brian Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ting Wang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | | | - Aleksey V. Zimin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| |
|
5
|
Darian JC, Kundu R, Rajaby R, Sung WK. Constructing telomere-to-telomere diploid genome by polishing haploid nanopore-based assembly. Nat Methods 2024; 21:574-583. [PMID: 38459383 DOI: 10.1038/s41592-023-02141-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Accepted: 11/30/2023] [Indexed: 03/10/2024]
Abstract
Draft genomes generated from Oxford Nanopore Technologies (ONT) long reads are known to have a higher error rate. Although existing genome polishers can enhance their quality, the error rate (including mismatches, indels and switching errors between paternal and maternal haplotypes) can be significant. Here, we develop two polishers, hypo-short and hypo-hybrid to address this issue. Hypo-short utilizes Illumina short reads to polish an ONT-based draft assembly, resulting in a high-quality assembly with low error rates and switching errors. Expanding on this, hypo-hybrid incorporates ONT long reads to further refine the assembly into a diploid representation. Leveraging on hypo-hybrid, we have created a diploid genome assembly pipeline called hypo-assembler. Hypo-assembler automates the generation of highly accurate, contiguous and nearly complete diploid assemblies using ONT long reads, Illumina short reads and optionally Hi-C reads. Notably, our solution even allows for the production of telomere-to-telomere diploid genomes with additional manual steps. As a proof of concept, we successfully assembled a fully phased telomere-to-telomere diploid genome of HG00733, achieving a quality value exceeding 50.
Affiliation(s)
| | - Ritu Kundu
- School of Computing, National University of Singapore, Singapore, Singapore
| | | | - Wing-Kin Sung
- School of Computing, National University of Singapore, Singapore, Singapore
- Genome Institute of Singapore, Singapore, Singapore
- Department of Chemical Pathology, The Chinese University of Hong Kong, Hong Kong, China
- JC STEM Laboratory of Computational Genomics, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, China
- Hong Kong Genome Institute, Hong Kong, China
| |
|
6
|
van Westerhoven AC, Aguilera-Galvez C, Nakasato-Tagami G, Shi-Kunne X, Martinez de la Parte E, Chavarro-Carrero E, Meijer HJG, Feurtey A, Maryani N, Ordóñez N, Schneiders H, Nijbroek K, Wittenberg AHJ, Hofstede R, García-Bastidas F, Sørensen A, Swennen R, Drenth A, Stukenbrock EH, Kema GHJ, Seidl MF. Segmental duplications drive the evolution of accessory regions in a major crop pathogen. New Phytol 2024; 242:610-625. [PMID: 38402521 DOI: 10.1111/nph.19604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2023] [Accepted: 02/01/2024] [Indexed: 02/26/2024]
Abstract
Many pathogens evolved compartmentalized genomes with conserved core and variable accessory regions (ARs) that carry effector genes mediating virulence. The fungal plant pathogen Fusarium oxysporum has such ARs, often spanning entire chromosomes. The presence of specific ARs influences the host range, and horizontal transfer of ARs can modify the pathogenicity of the receiving strain. However, how these ARs evolve in strains that infect the same host remains largely unknown. We defined the pan-genome of 69 diverse F. oxysporum strains that cause Fusarium wilt of banana, a significant constraint to global banana production, and analyzed the diversity and evolution of the ARs. Accessory regions in F. oxysporum strains infecting the same banana cultivar are highly diverse, and we could not identify any shared genomic regions and in planta-induced effectors. We demonstrate that segmental duplications drive the evolution of ARs. Furthermore, we show that recent segmental duplications specifically in accessory chromosomes cause the expansion of ARs in F. oxysporum. Taken together, we conclude that extensive recent duplications drive the evolution of ARs in F. oxysporum, which contribute to the evolution of virulence.
Affiliation(s)
- Anouk C van Westerhoven
- Laboratory of Phytopathology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
- Department of Biology, Theoretical Biology & Bioinformatics, Utrecht University, Padualaan 8, 3584 CH, Utrecht, the Netherlands
| | - Carolina Aguilera-Galvez
- Laboratory of Phytopathology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Giuliana Nakasato-Tagami
- Laboratory of Phytopathology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Xiaoqian Shi-Kunne
- Laboratory of Phytopathology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Einar Martinez de la Parte
- Laboratory of Phytopathology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Edgar Chavarro-Carrero
- Laboratory of Phytopathology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Harold J G Meijer
- Laboratory of Phytopathology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
- Department Biointeractions and Plant Health, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Alice Feurtey
- Christian-Albrechts University of Kiel, Christian-Albrechts-Platz 4, 24118, Kiel, Germany
- Max Planck Institute for Evolutionary Biology, August-Thienemann-Straße 2, 24306, Plön, Germany
- Plant Pathology, Eidgenössische Technische Hochschule Zürich, Rämistrasse 101, 8092, Zürich, Switzerland
| | - Nani Maryani
- Biology Education, Universitas Sultan Ageng Tirtayasa, Jalan Raya Palka No.Km 3, 42163, Banten, Indonesia
| | - Nadia Ordóñez
- Laboratory of Phytopathology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Harrie Schneiders
- KeyGene, Agro Business Park 90, 6708 PW, Wageningen, the Netherlands
| | - Koen Nijbroek
- KeyGene, Agro Business Park 90, 6708 PW, Wageningen, the Netherlands
| | | | - Rene Hofstede
- KeyGene, Agro Business Park 90, 6708 PW, Wageningen, the Netherlands
| | | | - Anker Sørensen
- KeyGene, Agro Business Park 90, 6708 PW, Wageningen, the Netherlands
| | - Ronny Swennen
- Division of Crop Biotechnics, Laboratory of Tropical Crop Improvement, Catholic University of Leuven, Oude Markt 13, 3000, Leuven, Belgium
- International Institute of Tropical Agriculture, Plot 15 Naguru E Rd, Kampala, PO Box 7878, Uganda
| | - Andre Drenth
- The University of Queensland, St Lucia, 4072, Brisbane, Queensland, Australia
| | - Eva H Stukenbrock
- Christian-Albrechts University of Kiel, Christian-Albrechts-Platz 4, 24118, Kiel, Germany
- Max Planck Institute for Evolutionary Biology, August-Thienemann-Straße 2, 24306, Plön, Germany
| | - Gert H J Kema
- Laboratory of Phytopathology, Wageningen University, Droevendaalsesteeg 1, 6708 PB, Wageningen, the Netherlands
| | - Michael F Seidl
- Department of Biology, Theoretical Biology & Bioinformatics, Utrecht University, Padualaan 8, 3584 CH, Utrecht, the Netherlands
| |
|
7
|
Wilder AP, Steiner CC, Hendricks S, Haller BC, Kim C, Korody ML, Ryder OA. Genetic load and viability of a future restored northern white rhino population. Evol Appl 2024; 17:e13683. [PMID: 38617823 PMCID: PMC11009427 DOI: 10.1111/eva.13683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 03/04/2024] [Accepted: 03/06/2024] [Indexed: 04/16/2024] Open
Abstract
As biodiversity loss outpaces recovery, conservationists are increasingly turning to novel tools for preventing extinction, including cloning and in vitro gametogenesis of biobanked cells. However, restoration of populations can be hindered by low genetic diversity and deleterious genetic load. The persistence of the northern white rhino (Ceratotherium simum cottoni) now depends on the cryopreserved cells of 12 individuals. These banked genomes have higher genetic diversity than southern white rhinos (C. s. simum), a sister subspecies that successfully recovered from a severe bottleneck, but the potential impact of genetic load is unknown. We estimated how demographic history has shaped genome-wide genetic load in nine northern and 13 southern white rhinos. The bottleneck left southern white rhinos with more fixed and homozygous deleterious alleles and longer runs of homozygosity, whereas northern white rhinos retained more deleterious alleles masked in heterozygosity. To gauge the impact of genetic load on the fitness of a northern white rhino population restored from biobanked cells, we simulated recovery using fitness of southern white rhinos as a benchmark for a viable population. Unlike traditional restoration, cell-derived founders can be reintroduced in subsequent generations to boost lost genetic diversity and relieve inbreeding. In simulations with repeated reintroduction of founders into a restored population, the fitness cost of genetic load remained lower than that borne by southern white rhinos. Without reintroductions, rapid growth of the restored population (>20-30% per generation) would be needed to maintain comparable fitness. Our results suggest that inbreeding depression from genetic load is not necessarily a barrier to recovery of the northern white rhino and demonstrate how restoration from biobanked cells relieves some constraints of conventional restoration from a limited founder pool. 
Established conservation methods that protect healthy populations will remain paramount, but emerging technologies hold promise to bolster these tools to combat the extinction crisis.
Affiliation(s)
- Aryn P. Wilder
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, California, USA
| | - Cynthia C. Steiner
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, California, USA
| | - Sarah Hendricks
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, California, USA
- Institute for Interdisciplinary Data Sciences, University of Idaho, Moscow, Idaho, USA
| | | | - Chang Kim
- University of California, Santa Cruz Genomics Institute, Santa Cruz, California, USA
- Department of Neurological Surgery, University of California, San Francisco, California, USA
| | - Marisa L. Korody
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, California, USA
| | - Oliver A. Ryder
- Conservation Genetics, San Diego Zoo Wildlife Alliance, Escondido, California, USA
| |
|
8
|
Mao Y, Harvey WT, Porubsky D, Munson KM, Hoekzema K, Lewis AP, Audano PA, Rozanski A, Yang X, Zhang S, Yoo D, Gordon DS, Fair T, Wei X, Logsdon GA, Haukness M, Dishuck PC, Jeong H, Del Rosario R, Bauer VL, Fattor WT, Wilkerson GK, Mao Y, Shi Y, Sun Q, Lu Q, Paten B, Bakken TE, Pollen AA, Feng G, Sawyer SL, Warren WC, Carbone L, Eichler EE. Structurally divergent and recurrently mutated regions of primate genomes. Cell 2024; 187:1547-1562.e13. [PMID: 38428424 PMCID: PMC10947866 DOI: 10.1016/j.cell.2024.01.052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 11/26/2023] [Accepted: 01/31/2024] [Indexed: 03/03/2024]
Abstract
We sequenced and assembled using multiple long-read sequencing technologies the genomes of chimpanzee, bonobo, gorilla, orangutan, gibbon, macaque, owl monkey, and marmoset. We identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. We estimate that 819.47 Mbp or ∼27% of the genome has been affected by SVs across primate evolution. We identify 1,607 structurally divergent regions wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (e.g., CARD, C4, and OLAH gene families) and additional lineage-specific genes are generated (e.g., CKAP2, VPS36, ACBD7, and NEK5 paralogs), becoming targets of rapid chromosomal diversification and positive selection (e.g., RGPD gene family). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species.
Affiliation(s)
- Yafei Mao
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China.
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Allison Rozanski
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Xiangyu Yang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Shilong Zhang
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - DongAhn Yoo
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David S Gordon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Tyler Fair
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
| | - Xiaoxi Wei
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Hyeonsoo Jeong
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ricardo Del Rosario
- McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Vanessa L Bauer
- BioFrontiers Institute, Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
| | - Will T Fattor
- BioFrontiers Institute, Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
| | - Gregory K Wilkerson
- Department of Veterinary Sciences, Michale E. Keeling Center for Comparative Medicine and Research, The University of Texas MD Anderson Cancer Center, Bastrop, TX, USA; Department of Clinical Sciences, North Carolina State University, Raleigh, NC, USA
| | - Yuxiang Mao
- Institute of Neuroscience, State Key Laboratory of Neuroscience, Center for Excellence in Brain Science & Intelligence Technology, Chinese Academy of Sciences, Shanghai, China; Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai, China
| | - Yongyong Shi
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China; Institute of Neuroscience, State Key Laboratory of Neuroscience, Center for Excellence in Brain Science & Intelligence Technology, Chinese Academy of Sciences, Shanghai, China; Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai, China
| | - Qiang Sun
- Institute of Neuroscience, State Key Laboratory of Neuroscience, Center for Excellence in Brain Science & Intelligence Technology, Chinese Academy of Sciences, Shanghai, China; Shanghai Center for Brain Science and Brain-Inspired Intelligence Technology, Shanghai, China
| | - Qing Lu
- Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Ministry of Education, Shanghai Jiao Tong University, Shanghai, China
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | - Alex A Pollen
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Guoping Feng
- McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Sara L Sawyer
- BioFrontiers Institute, Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO, USA
| | - Wesley C Warren
- Department of Animal Sciences, Bond Life Sciences Center, University of Missouri, Columbia, MO, USA; Department of Surgery, School of Medicine, University of Missouri, Columbia, MO, USA; Institute of Data Science and Informatics, University of Missouri, Columbia, MO, USA
| | - Lucia Carbone
- Department of Medicine, Knight Cardiovascular Institute, Oregon Health and Science University, Portland, OR, USA; Division of Genetics, Oregon National Primate Research Center, Beaverton, OR, USA; Department of Molecular and Medical Genetics, Oregon Health and Science University, Portland, OR, USA; Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA.
| |
|
9
|
Webster TH, Vannan A, Pinto BJ, Denbrock G, Morales M, Dolby GA, Fiddes IT, DeNardo DF, Wilson MA. Lack of Dosage Balance and Incomplete Dosage Compensation in the ZZ/ZW Gila Monster (Heloderma suspectum) Revealed by De Novo Genome Assembly. Genome Biol Evol 2024; 16:evae018. [PMID: 38319079 PMCID: PMC10950046 DOI: 10.1093/gbe/evae018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 01/23/2024] [Accepted: 01/23/2024] [Indexed: 02/07/2024] Open
Abstract
Reptiles exhibit a variety of modes of sex determination, including both temperature-dependent and genetic mechanisms. Among those species with genetic sex determination, sex chromosomes of varying heterogamety (XX/XY and ZZ/ZW) have been observed with different degrees of differentiation. Karyotype studies have demonstrated that Gila monsters (Heloderma suspectum) have ZZ/ZW sex determination and this system is likely homologous to the ZZ/ZW system in the Komodo dragon (Varanus komodoensis), but little else is known about their sex chromosomes. Here, we report the assembly and analysis of the Gila monster genome. We generated a de novo draft genome assembly for a male using 10X Genomics technology. We further generated and analyzed short-read whole genome sequencing and whole transcriptome sequencing data for three males and three females. By comparing female and male genomic data, we identified four putative Z chromosome scaffolds. These putative Z chromosome scaffolds are homologous to Z-linked scaffolds identified in the Komodo dragon. Further, by analyzing RNAseq data, we observed evidence of incomplete dosage compensation between the Gila monster Z chromosome and autosomes and a lack of balance in Z-linked expression between the sexes. In particular, we observe lower expression of the Z in females (ZW) than males (ZZ) on a global basis, though we find evidence suggesting local gene-by-gene compensation. This pattern has been observed in most other ZZ/ZW systems studied to date and may represent a general pattern for female heterogamety in vertebrates.
Affiliation(s)
- Timothy H Webster
- Department of Anthropology, University of Utah, Salt Lake City, UT, USA
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Annika Vannan
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Brendan J Pinto
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Department of Zoology, Milwaukee Public Museum, Milwaukee, WI, USA
| | - Grant Denbrock
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Matheo Morales
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Department of Genetics, Yale University, New Haven, CT, USA
| | - Greer A Dolby
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL, USA
| | | | - Dale F DeNardo
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Melissa A Wilson
- School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Center for Mechanisms of Evolution, Biodesign Institute, Tempe, AZ, USA
| |
|
10
|
Kautt AF, Chen J, Lewarch CL, Hu C, Turner K, Lassance JM, Baier F, Bedford NL, Bendesky A, Hoekstra HE. Evolution of gene expression across brain regions in behaviourally divergent deer mice. Mol Ecol 2024:e17270. [PMID: 38263608 DOI: 10.1111/mec.17270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 01/03/2024] [Accepted: 01/08/2024] [Indexed: 01/25/2024]
Abstract
The evolution of innate behaviours is ultimately due to genetic variation likely acting in the nervous system. Gene regulation may be particularly important because it can evolve in a modular brain-region specific fashion through the concerted action of cis- and trans-regulatory changes. Here, to investigate transcriptional variation and its regulatory basis across the brain, we perform RNA sequencing (RNA-Seq) on ten brain subregions in two sister species of deer mice (Peromyscus maniculatus and P. polionotus), which differ in a range of innate behaviours, including their social system, and their F1 hybrids. We find that most of the variation in gene expression distinguishes subregions, followed by species. Interspecific differential expression (DE) is pervasive (52-59% of expressed genes), whereas the number of DE genes between sexes is modest overall (~3%). Interestingly, the identity of DE genes varies considerably across brain regions. Much of this modularity is due to cis-regulatory divergence, and while 43% of genes were consistently assigned to the same gene regulatory class across subregions (e.g. conserved, cis- or trans-regulatory divergence), a similar number were assigned to two or more different gene regulatory classes. Together, these results highlight the modularity of gene expression differences and divergence in the brain, which may be key to explain how the evolution of brain gene expression can contribute to the astonishing diversity of animal behaviours.
Affiliation(s)
- Andreas F Kautt
- Department of Organismic & Evolutionary Biology, Department of Molecular & Cellular Biology, Center for Brain Science, Harvard University, Cambridge, Massachusetts, USA
| | - Jenny Chen
- Department of Organismic & Evolutionary Biology, Department of Molecular & Cellular Biology, Center for Brain Science, Harvard University, Cambridge, Massachusetts, USA
| | - Caitlin L Lewarch
- Department of Organismic & Evolutionary Biology, Department of Molecular & Cellular Biology, Center for Brain Science, Harvard University, Cambridge, Massachusetts, USA
| | - Caroline Hu
- Department of Organismic & Evolutionary Biology, Department of Molecular & Cellular Biology, Center for Brain Science, Harvard University, Cambridge, Massachusetts, USA
| | - Kyle Turner
- Department of Organismic & Evolutionary Biology, Department of Molecular & Cellular Biology, Center for Brain Science, Harvard University, Cambridge, Massachusetts, USA
| | - Jean-Marc Lassance
- Department of Organismic & Evolutionary Biology, Department of Molecular & Cellular Biology, Center for Brain Science, Harvard University, Cambridge, Massachusetts, USA
| | - Felix Baier
- Department of Organismic & Evolutionary Biology, Department of Molecular & Cellular Biology, Center for Brain Science, Harvard University, Cambridge, Massachusetts, USA
| | - Nicole L Bedford
- Department of Organismic & Evolutionary Biology, Department of Molecular & Cellular Biology, Center for Brain Science, Harvard University, Cambridge, Massachusetts, USA
| | - Andres Bendesky
- Department of Organismic & Evolutionary Biology, Department of Molecular & Cellular Biology, Center for Brain Science, Harvard University, Cambridge, Massachusetts, USA
| | - Hopi E Hoekstra
- Department of Organismic & Evolutionary Biology, Department of Molecular & Cellular Biology, Center for Brain Science, Harvard University, Cambridge, Massachusetts, USA
|
11
|
Fair T, Pavlovic BJ, Schaefer NK, Pollen AA. Mapping cis- and trans-regulatory target genes of human-specific deletions. bioRxiv 2023:2023.12.27.573461. [PMID: 38234800 PMCID: PMC10793408 DOI: 10.1101/2023.12.27.573461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Deletion of functional sequence is predicted to represent a fundamental mechanism of molecular evolution [1,2]. Comparative genetic studies of primates [2,3] have identified thousands of human-specific deletions (hDels), and the cis-regulatory potential of short (≤31 base pairs) hDels has been assessed using reporter assays [4]. However, how structural variant-sized (≥50 base pairs) hDels influence molecular and cellular processes in their native genomic contexts remains unexplored. Here, we design genome-scale libraries of single-guide RNAs targeting 7.2 megabases of sequence in 6,358 hDels and present a systematic CRISPR interference (CRISPRi) screening approach to identify hDels that modify cellular proliferation in chimpanzee pluripotent stem cells. By intersecting hDels with chromatin state features and performing single-cell CRISPRi (Perturb-seq) to identify their cis- and trans-regulatory target genes, we discovered 19 hDels controlling gene expression. We highlight two hDels, hDel_2247 and hDel_585, with tissue-specific activity in the liver and brain, respectively. Our findings reveal a molecular and cellular role for sequences lost in the human lineage and establish a framework for functionally interrogating human-specific genetic variants.
Affiliation(s)
- Tyler Fair
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
- Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Bryan J Pavlovic
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Nathan K Schaefer
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Alex A Pollen
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
|
12
|
Makova KD, Pickett BD, Harris RS, Hartley GA, Cechova M, Pal K, Nurk S, Yoo D, Li Q, Hebbar P, McGrath BC, Antonacci F, Aubel M, Biddanda A, Borchers M, Bomberg E, Bouffard GG, Brooks SY, Carbone L, Carrel L, Carroll A, Chang PC, Chin CS, Cook DE, Craig SJ, de Gennaro L, Diekhans M, Dutra A, Garcia GH, Grady PG, Green RE, Haddad D, Hallast P, Harvey WT, Hickey G, Hillis DA, Hoyt SJ, Jeong H, Kamali K, Kosakovsky Pond SL, LaPolice TM, Lee C, Lewis AP, Loh YHE, Masterson P, McCoy RC, Medvedev P, Miga KH, Munson KM, Pak E, Paten B, Pinto BJ, Potapova T, Rhie A, Rocha JL, Ryabov F, Ryder OA, Sacco S, Shafin K, Shepelev VA, Slon V, Solar SJ, Storer JM, Sudmant PH, Sweetalana, Sweeten A, Tassia MG, Thibaud-Nissen F, Ventura M, Wilson MA, Young AC, Zeng H, Zhang X, Szpiech ZA, Huber CD, Gerton JL, Yi SV, Schatz MC, Alexandrov IA, Koren S, O’Neill RJ, Eichler E, Phillippy AM. The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes. bioRxiv 2023:2023.11.30.569198. [PMID: 38077089 PMCID: PMC10705393 DOI: 10.1101/2023.11.30.569198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2023]
Abstract
Apes possess two sex chromosomes: the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution.
These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.
Affiliation(s)
| | - Brandon D. Pickett
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Monika Cechova
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Karol Pal
- Penn State University, University Park, PA, USA
| | - Sergey Nurk
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - DongAhn Yoo
- University of Washington School of Medicine, Seattle, WA, USA
| | - Qiuhui Li
- Johns Hopkins University, Baltimore, MD, USA
| | - Prajna Hebbar
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | | | | | - Erich Bomberg
- University of Münster, Münster, Germany
- MPI for Developmental Biology, Tübingen, Germany
| | - Gerard G. Bouffard
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Shelise Y. Brooks
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Lucia Carbone
- Oregon Health & Science University, Portland, OR, USA
- Oregon National Primate Research Center, Hillsboro, OR, USA
| | - Laura Carrel
- Penn State University School of Medicine, Hershey, PA, USA
| | | | | | - Chen-Shan Chin
- Foundation of Biological Data Sciences, Belmont, CA, USA
| | | | | | | | - Mark Diekhans
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - Amalia Dutra
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Gage H. Garcia
- University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Diana Haddad
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Pille Hallast
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | - Glenn Hickey
- University of California Santa Cruz, Santa Cruz, CA, USA
| | - David A. Hillis
- University of California Santa Barbara, Santa Barbara, CA, USA
| | | | - Hyeonsoo Jeong
- University of Washington School of Medicine, Seattle, WA, USA
| | | | | | | | - Charles Lee
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
| | | | | | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Karen H. Miga
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | - Evgenia Pak
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Benedict Paten
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | - Arang Rhie
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
| | | | - Samuel Sacco
- University of California Santa Cruz, Santa Cruz, CA, USA
| | | | | | | | - Steven J. Solar
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Sweetalana
- Penn State University, University Park, PA, USA
| | - Alex Sweeten
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Johns Hopkins University, Baltimore, MD, USA
| | | | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | | | | | - Alice C. Young
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Xinru Zhang
- Penn State University, University Park, PA, USA
| | | | | | | | - Soojin V. Yi
- University of California Santa Barbara, Santa Barbara, CA, USA
| | | | | | - Sergey Koren
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Evan Eichler
- University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Adam M. Phillippy
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
|
13
|
Pérez-Umphrey AA, Settlecowski AE, Elbers JP, Williams ST, Jonsson CB, Bonisoli-Alquati A, Snider AM, Taylor SS. Genetic variants associated with hantavirus infection in a reservoir host are related to regulation of inflammation and immune surveillance. Infect Genet Evol 2023; 116:105525. [PMID: 37956745 DOI: 10.1016/j.meegid.2023.105525] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 10/14/2023] [Accepted: 11/10/2023] [Indexed: 11/15/2023]
Abstract
The immunogenetics of wildlife populations influence the epidemiology and evolutionary dynamic of the host-pathogen system. Profiling immune gene diversity present in wildlife may be especially important for those species that, while not at risk of disease or extinction themselves, are host to diseases that are a threat to humans, other wildlife, or livestock. Hantaviruses (genus: Orthohantavirus) are globally distributed zoonotic RNA viruses with pathogenic strains carried by a diverse group of rodent hosts. The marsh rice rat (Oryzomys palustris) is the reservoir host of Orthohantavirus bayoui, a hantavirus that causes fatal cases of hantavirus cardiopulmonary syndrome in humans. We performed a genome wide association study (GWAS) using the rice rat "immunome" (i.e., all exons related to the immune response) to identify genetic variants associated with infection status in wild-caught rice rats naturally infected with their endemic strain of hantavirus. First, we created an annotated reference genome using 10× Chromium Linked Reads sequencing technology. This reference genome was used to create custom baits which were then used to target enrich prepared rice rat libraries (n = 128) and isolate their immunomes prior to sequencing. Top SNPs in the association test were present in four genes (Socs5, Eprs, Mrc1, and Il1f8) which have not been previously implicated in hantavirus infections. However, these genes correspond with other loci or pathways with established importance in hantavirus susceptibility or infection tolerance in reservoir hosts: the JAK/STAT, MHC, and NFκB. These results serve as informative markers for future exploration and highlight the importance of immune pathways that repeatedly emerge across hantavirus systems. Our work aids in creating cross-species comparisons for better understanding mechanisms of genetic susceptibility and host-pathogen coevolution in hantavirus systems.
Affiliation(s)
- Anna A Pérez-Umphrey
- School of Renewable Natural Resources, Louisiana State University and AgCenter, 227 RNR Building, Baton Rouge, LA 70803, USA.
| | - Amie E Settlecowski
- School of Renewable Natural Resources, Louisiana State University and AgCenter, 227 RNR Building, Baton Rouge, LA 70803, USA
| | - Jean P Elbers
- School of Renewable Natural Resources, Louisiana State University and AgCenter, 227 RNR Building, Baton Rouge, LA 70803, USA; Institute of Medical Genetics, Center for Pathobiochemistry and Genetics, Medical University of Vienna, Währinger Straße 10, 1090 Vienna, Austria
| | - S Tyler Williams
- School of Renewable Natural Resources, Louisiana State University and AgCenter, 227 RNR Building, Baton Rouge, LA 70803, USA
| | - Colleen B Jonsson
- Department of Microbiology, Immunology and Biochemistry, College of Medicine, University of Tennessee Health Science Center, University of Tennessee, 858 Madison Ave., Memphis, TN 38163, USA
| | - Andrea Bonisoli-Alquati
- School of Renewable Natural Resources, Louisiana State University and AgCenter, 227 RNR Building, Baton Rouge, LA 70803, USA; Department of Biological Sciences, California State Polytechnic University-Pomona, Pomona, CA 91768, USA
| | - Allison M Snider
- School of Renewable Natural Resources, Louisiana State University and AgCenter, 227 RNR Building, Baton Rouge, LA 70803, USA
| | - Sabrina S Taylor
- School of Renewable Natural Resources, Louisiana State University and AgCenter, 227 RNR Building, Baton Rouge, LA 70803, USA
|
14
|
He Y, Chu Y, Guo S, Hu J, Li R, Zheng Y, Ma X, Du Z, Zhao L, Yu W, Xue J, Bian W, Yang F, Chen X, Zhang P, Wu R, Ma Y, Shao C, Chen J, Wang J, Li J, Wu J, Hu X, Long Q, Jiang M, Ye H, Song S, Li G, Wei Y, Xu Y, Ma Y, Chen Y, Wang K, Bao J, Xi W, Wang F, Ni W, Zhang M, Yu Y, Li S, Kang Y, Gao Z. T2T-YAO: A Telomere-to-telomere Assembled Diploid Reference Genome for Han Chinese. Genomics Proteomics Bioinformatics 2023; 21:1085-1100. [PMID: 37595788 PMCID: PMC11082261 DOI: 10.1016/j.gpb.2023.08.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/18/2023] [Revised: 08/01/2023] [Accepted: 08/08/2023] [Indexed: 08/20/2023]
Abstract
Since its initial release in 2001, the human reference genome has undergone continuous improvement in quality, and the recently released telomere-to-telomere (T2T) version, T2T-CHM13, reaches its highest level of continuity and accuracy after 20 years of effort by working on a simplified, nearly homozygous genome of a hydatidiform mole cell line. Here, to provide an authentic complete diploid human genome reference for the Han Chinese, the largest population in the world, we assembled the genome of a male Han Chinese individual, T2T-YAO, which includes T2T assemblies of all the 22 + X + M and 22 + Y chromosomes in both haploids. The quality of T2T-YAO is much better than that of all currently available diploid assemblies, and its haploid version, T2T-YAO-hp, generated by selecting the better assembly for each autosome, reaches the top quality of fewer than one error per 29.5 Mb, even higher than that of T2T-CHM13. Derived from an individual living in the aboriginal region of the Han population, T2T-YAO shows clear ancestry and potential genetic continuity from the ancient ancestors. Each haplotype of T2T-YAO possesses ∼330 Mb of exclusive sequence, ∼3100 unique genes, and tens of thousands of nucleotide and structural variations as compared with CHM13, highlighting the necessity of a population-stratified reference genome. The construction of T2T-YAO, an accurate and authentic representative of the Chinese population, would enable precise delineation of genomic variations and advance our understanding of the heritability of diseases and phenotypes, especially within the context of the unique variations of the Chinese population.
Affiliation(s)
- Yukun He
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China
| | - Yanan Chu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Shuming Guo
- Linfen Clinical Medicine Research Center, Linfen 041000, China; Institute of Chest and Lung Diseases, Shanxi Medical University, Taiyuan 030001, China
| | - Jiang Hu
- GrandOmics Biosciences Co., Ltd, Wuhan 430076, China
| | - Ran Li
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Yali Zheng
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Xinqian Ma
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Zhenglin Du
- Institute of PSI Genomics, Wenzhou 325024, China
| | - Lili Zhao
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Wenyi Yu
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Jianbo Xue
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Wenjie Bian
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Feifei Yang
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Xi Chen
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Pingan Zhang
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Rihan Wu
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Yifan Ma
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Changjun Shao
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Jing Chen
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Jian Wang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
| | - Jiwei Li
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Jing Wu
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Xiaoyi Hu
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Qiuyue Long
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Mingzheng Jiang
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Hongli Ye
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Shixu Song
- Department of Respiratory, Critical Care and Sleep Medicine, Xiang'an Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361101, China
| | - Guangyao Li
- Linfen Clinical Medicine Research Center, Linfen 041000, China
| | - Yue Wei
- Linfen Clinical Medicine Research Center, Linfen 041000, China
| | - Yu Xu
- Beijing Jishuitan Hospital, Capital Medical University, Beijing 100035, China
| | - Yanliang Ma
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Yanwen Chen
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Keqiang Wang
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Jing Bao
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Wen Xi
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Fang Wang
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Wentao Ni
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Moqin Zhang
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Yan Yu
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Shengnan Li
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China
| | - Yu Kang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100490, China.
| | - Zhancheng Gao
- Department of Respiratory and Critical Care Medicine, Peking University People's Hospital, Beijing 100044, China; Institute of Chest and Lung Diseases, Shanxi Medical University, Taiyuan 030001, China; Beijing Key Laboratory of Genome and Precision Medicine Technologies, Beijing 100101, China.
|
15
|
Kersten O, Star B, Krabberød AK, Atmore LM, Tørresen OK, Anker-Nilssen T, Descamps S, Strøm H, Johansson US, Sweet PR, Jakobsen KS, Boessenkool S. Hybridization of Atlantic puffins in the Arctic coincides with 20th-century climate change. Sci Adv 2023; 9:eadh1407. [PMID: 37801495 PMCID: PMC10558128 DOI: 10.1126/sciadv.adh1407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 09/06/2023] [Indexed: 10/08/2023]
Abstract
The Arctic is experiencing the fastest rates of global warming, leading to shifts in the distribution of its biota and increasing the potential for hybridization. However, genomic evidence of recent hybridization events in the Arctic remains unexpectedly rare. Here, we use whole-genome sequencing of contemporary and 122-year-old historical specimens to investigate the origin of an Arctic hybrid population of Atlantic puffins (Fratercula arctica) on Bjørnøya, Norway. We show that the hybridization between the High Arctic, large-bodied subspecies F. a. naumanni and the temperate, smaller-sized subspecies F. a. arctica began as recently as six generations ago due to an unexpected southward range expansion of F. a. naumanni. Moreover, we find a significant temporal loss of genetic diversity across Arctic and temperate puffin populations. Our observations provide compelling genomic evidence of the impacts of recent distributional shifts and loss of diversity in Arctic communities during the 20th century.
Affiliation(s)
- Oliver Kersten
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Oslo, Norway
| | - Bastiaan Star
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Oslo, Norway
| | - Anders K. Krabberød
- Section for Genetics and Evolutionary Biology (Evogene), Department of Biosciences, University of Oslo, Oslo, Norway
| | - Lane M. Atmore
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Oslo, Norway
| | - Ole K. Tørresen
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Oslo, Norway
| | | | | | - Hallvard Strøm
- Norwegian Polar Institute, Fram Centre, Langnes, Tromsø, Norway
| | | | - Paul R. Sweet
- American Museum of Natural History, New York, NY, USA
| | - Kjetill S. Jakobsen
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Oslo, Norway
| | - Sanne Boessenkool
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Oslo, Norway
|
16
|
Kim BY, Gellert HR, Church SH, Suvorov A, Anderson SS, Barmina O, Beskid SG, Comeault AA, Crown KN, Diamond SE, Dorus S, Fujichika T, Hemker JA, Hrcek J, Kankare M, Katoh T, Magnacca KN, Martin RA, Matsunaga T, Medeiros MJ, Miller DE, Pitnick S, Simoni S, Steenwinkel TE, Schiffer M, Syed ZA, Takahashi A, Wei KHC, Yokoyama T, Eisen MB, Kopp A, Matute D, Obbard DJ, O'Grady PM, Price DK, Toda MJ, Werner T, Petrov DA. Single-fly assemblies fill major phylogenomic gaps across the Drosophilidae Tree of Life. bioRxiv 2023:2023.10.02.560517. [PMID: 37873137 PMCID: PMC10592941 DOI: 10.1101/2023.10.02.560517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Long-read sequencing is driving rapid progress in genome assembly across all major groups of life, including species of the family Drosophilidae, a longtime model system for genetics, genomics, and evolution. We previously developed a cost-effective hybrid Oxford Nanopore (ONT) long-read and Illumina short-read sequencing approach and used it to assemble 101 drosophilid genomes from laboratory cultures, greatly increasing the number of genome assemblies for this taxonomic group. The next major challenge is to address the laboratory culture bias in taxon sampling by sequencing genomes of species that cannot easily be reared in the lab. Here, we build upon our previous methods to perform amplification-free ONT sequencing of single wild flies obtained either directly from the field or from ethanol-preserved specimens in museum collections, greatly improving the representation of lesser studied drosophilid taxa in whole-genome data. Using Illumina NovaSeq X Plus and ONT P2 sequencers with R10.4.1 chemistry, we set a new benchmark for inexpensive hybrid genome assembly at US $150 per genome while assembling genomes from as little as 35 ng of genomic DNA from a single fly. We present 183 new genome assemblies for 179 species as a resource for drosophilid systematics, phylogenetics, and comparative genomics. Of these genomes, 62 are from pooled lab strains and 121 from single adult flies. Despite the sample limitations of working with small insects, most single-fly diploid assemblies are comparable in contiguity (>1 Mb contig N50), completeness (>98% complete dipteran BUSCOs), and accuracy (>QV40 genome-wide with ONT R10.4.1) to assemblies from inbred lines. We present a well-resolved multi-locus phylogeny for 360 drosophilid and 4 outgroup species encompassing all publicly available (as of August 2023) genomes for this group.
Finally, we present a Progressive Cactus whole-genome, reference-free alignment built from a subset of 298 suitably high-quality drosophilid genomes. The new assemblies and alignment, along with updated laboratory protocols and computational pipelines, are released as an open resource and as a tool for studying evolution at the scale of an entire insect family.
Affiliation(s)
| | | | - Samuel H Church
- Department of Ecology and Evolutionary Biology, Yale University, USA
| | - Anton Suvorov
- Department of Biological Sciences, Virginia Tech, USA
| | - Sean S Anderson
- Department of Biology, University of North Carolina Chapel Hill, USA
| | - Olga Barmina
- Department of Evolution and Ecology, University of California Davis, USA
| | | | - Aaron A Comeault
- School of Environmental and Natural Sciences, Bangor University, UK
| | - K Nicole Crown
- Department of Biology, Case Western Reserve University, USA
| | | | - Steve Dorus
- Center for Reproductive Evolution, Department of Biology, Syracuse University, USA
| | - Takako Fujichika
- Department of Biological Sciences, Tokyo Metropolitan University, Japan
| | - James A Hemker
- Department of Developmental Biology, Stanford University, USA
| | - Jan Hrcek
- Institute of Entomology, Biology Centre, Czech Academy of Sciences, Czechia
| | - Maaria Kankare
- Department of Biological and Environmental Science, University of Jyväskylä, Finland
| | - Toru Katoh
- Department of Biological Sciences, Hokkaido University, Japan
| | - Karl N Magnacca
- Hawaii Invertebrate Program, Division of Forestry & Wildlife, State of Hawaii, USA
| | - Ryan A Martin
- Department of Biology, Case Western Reserve University, USA
| | - Teruyuki Matsunaga
- Department of Complexity Science and Engineering, The University of Tokyo, Japan
| | | | - Danny E Miller
- Division of Genetic Medicine, Department of Pediatrics; Department of Laboratory Medicine and Pathology, University of Washington, USA
| | - Scott Pitnick
- Center for Reproductive Evolution, Department of Biology, Syracuse University, USA
| | - Sara Simoni
- Department of Biology, Stanford University, USA
| | | | - Michele Schiffer
- Daintree Rainforest Observatory, James Cook University, Australia
| | - Zeeshan A Syed
- Center for Reproductive Evolution, Department of Biology, Syracuse University, USA
| | - Aya Takahashi
- Department of Biological Sciences, Tokyo Metropolitan University, Japan
| | - Kevin H-C Wei
- Department of Zoology, The University of British Columbia
| | | | - Michael B Eisen
- Department of Cell and Molecular Biology, University of California Berkeley, United States
- Howard Hughes Medical Institute, University of California Berkeley, United States
| | - Artyom Kopp
- Department of Evolution and Ecology, University of California Davis, USA
| | - Daniel Matute
- Department of Biology, University of North Carolina Chapel Hill, USA
| | - Darren J Obbard
- Institute of Ecology and Evolution, University of Edinburgh, UK
| | | | - Donald K Price
- School of Life Sciences, University of Nevada Las Vegas, USA
| | | | - Thomas Werner
- Department of Biological Sciences, Michigan Technological University, USA
| | - Dmitri A Petrov
- Department of Biology, Stanford University, USA
- CZ Biohub, Investigator
| |
|
17
|
Nguyen ED, Fard VN, Kim BY, Collins S, Galey M, Nelson BR, Wakenight P, Gable SM, McKenna A, Bammler TK, MacDonald J, Okamura DM, Shendure J, Beier DR, Ramirez JM, Majesky MW, Millen KJ, Tollis M, Miller DE. Genome Report: chromosome-scale genome assembly of the African spiny mouse (Acomys cahirinus). G3 (BETHESDA, MD.) 2023; 13:jkad177. [PMID: 37552705 PMCID: PMC10542272 DOI: 10.1093/g3journal/jkad177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 04/03/2023] [Accepted: 06/23/2023] [Indexed: 08/10/2023]
Abstract
There is increasing interest in the African spiny mouse (Acomys cahirinus) as a model organism because of its ability for regeneration of tissue after injury in skin, muscle, and internal organs such as the kidneys. A high-quality reference genome is needed to better understand these regenerative properties at the molecular level. Here, we present an improved reference genome for A. cahirinus generated from long Nanopore sequencing reads. We confirm the quality of our annotations using RNA sequencing data from 4 different tissues. Our genome is of higher contiguity and quality than previously reported genomes from this species and will facilitate ongoing efforts to better understand the regenerative properties of this organism.
Affiliation(s)
- Elizabeth Dong Nguyen
- Department of Pediatrics, University of Washington, Seattle, WA 98195, USA
- Center for Developmental Biology & Regenerative Medicine, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA
| | - Vahid Nikoonejad Fard
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
| | - Bernard Y Kim
- Department of Biology, Stanford University, Stanford, CA 94305, USA
| | - Sarah Collins
- Center for Developmental Biology & Regenerative Medicine, Seattle Children's Research Institute, Seattle, WA 98101, USA
| | - Miranda Galey
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA 98195, USA
| | - Branden R Nelson
- Center for Integrative Brain Research, Seattle Children's Research Institute, Seattle, WA 98101, USA
| | - Paul Wakenight
- Center for Integrative Brain Research, Seattle Children's Research Institute, Seattle, WA 98101, USA
| | - Simone M Gable
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
| | - Aaron McKenna
- Department of Molecular & Systems Biology, Dartmouth Geisel School of Medicine, Lebanon, NH 03755, USA
| | - Theo K Bammler
- Department of Environmental & Occupational Health Sciences, University of Washington, Seattle, WA 98195, USA
| | - Jim MacDonald
- Department of Environmental & Occupational Health Sciences, University of Washington, Seattle, WA 98195, USA
| | - Daryl M Okamura
- Department of Pediatrics, University of Washington, Seattle, WA 98195, USA
- Center for Developmental Biology & Regenerative Medicine, Seattle Children's Research Institute, Seattle, WA 98101, USA
| | - Jay Shendure
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, Seattle, WA 98195, USA
- Institute of Stem Cell & Regenerative Medicine, University of Washington, Seattle, WA 98195, USA
| | - David R Beier
- Department of Pediatrics, University of Washington, Seattle, WA 98195, USA
- Center for Developmental Biology & Regenerative Medicine, Seattle Children's Research Institute, Seattle, WA 98101, USA
| | - Jan Marino Ramirez
- Center for Integrative Brain Research, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Department of Neurological Surgery, University of Washington, Seattle, WA 98195, USA
| | - Mark W Majesky
- Department of Pediatrics, University of Washington, Seattle, WA 98195, USA
- Center for Developmental Biology & Regenerative Medicine, Seattle Children's Research Institute, Seattle, WA 98101, USA
- Institute of Stem Cell & Regenerative Medicine, University of Washington, Seattle, WA 98195, USA
- Department of Laboratory Medicine & Pathology, University of Washington, Seattle, WA 98195, USA
| | - Kathleen J Millen
- Department of Pediatrics, University of Washington, Seattle, WA 98195, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA
- Center for Integrative Brain Research, Seattle Children's Research Institute, Seattle, WA 98101, USA
| | - Marc Tollis
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
| | - Danny E Miller
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA 98195, USA
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA 98195, USA
- Department of Laboratory Medicine & Pathology, University of Washington, Seattle, WA 98195, USA
| |
|
18
|
Rhie A, Nurk S, Cechova M, Hoyt SJ, Taylor DJ, Altemose N, Hook PW, Koren S, Rautiainen M, Alexandrov IA, Allen J, Asri M, Bzikadze AV, Chen NC, Chin CS, Diekhans M, Flicek P, Formenti G, Fungtammasan A, Garcia Giron C, Garrison E, Gershman A, Gerton JL, Grady PGS, Guarracino A, Haggerty L, Halabian R, Hansen NF, Harris R, Hartley GA, Harvey WT, Haukness M, Heinz J, Hourlier T, Hubley RM, Hunt SE, Hwang S, Jain M, Kesharwani RK, Lewis AP, Li H, Logsdon GA, Lucas JK, Makalowski W, Markovic C, Martin FJ, Mc Cartney AM, McCoy RC, McDaniel J, McNulty BM, Medvedev P, Mikheenko A, Munson KM, Murphy TD, Olsen HE, Olson ND, Paulin LF, Porubsky D, Potapova T, Ryabov F, Salzberg SL, Sauria MEG, Sedlazeck FJ, Shafin K, Shepelev VA, Shumate A, Storer JM, Surapaneni L, Taravella Oill AM, Thibaud-Nissen F, Timp W, Tomaszkiewicz M, Vollger MR, Walenz BP, Watwood AC, Weissensteiner MH, Wenger AM, Wilson MA, Zarate S, Zhu Y, Zook JM, Eichler EE, O'Neill RJ, Schatz MC, Miga KH, Makova KD, Phillippy AM. The complete sequence of a human Y chromosome. Nature 2023; 621:344-354. [PMID: 37612512 PMCID: PMC10752217 DOI: 10.1038/s41586-023-06457-y] [Citation(s) in RCA: 71] [Impact Index Per Article: 71.0] [Received: 12/02/2022] [Accepted: 07/19/2023] [Indexed: 08/25/2023]
Abstract
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
Affiliation(s)
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Oxford Nanopore Technologies Inc., Oxford, UK
| | - Monika Cechova
- Faculty of Informatics, Masaryk University, Brno, Czech Republic
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Savannah J Hoyt
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Dylan J Taylor
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Nicolas Altemose
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
| | - Paul W Hook
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Ivan A Alexandrov
- Federal Research Center of Biotechnology of the Russian Academy of Sciences, Moscow, Russia
- Center for Algorithmic Biotechnology, Saint Petersburg State University, St Petersburg, Russia
- Department of Anatomy and Anthropology and Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv-Yafo, Israel
| | - Jamie Allen
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Mobin Asri
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, CA, USA
| | - Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Chen-Shan Chin
- GeneDX Holdings Corp, Stamford, CT, USA
- Foundation of Biological Data Science, Belmont, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
| | | | | | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Ariel Gershman
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Jennifer L Gerton
- Stowers Institute for Medical Research, Kansas City, MO, USA
- University of Kansas Medical Center, Kansas City, MO, USA
| | - Patrick G S Grady
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Reza Halabian
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Nancy F Hansen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Robert Harris
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | - Gabrielle A Hartley
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Jakob Heinz
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | - Sarah E Hunt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Stephen Hwang
- XDBio Program, Johns Hopkins University, Baltimore, MD, USA
| | - Miten Jain
- Department of Bioengineering, Department of Physics, Northeastern University, Boston, MA, USA
| | - Rupesh K Kesharwani
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Glennis A Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Julian K Lucas
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Wojciech Makalowski
- Institute of Bioinformatics, Faculty of Medicine, University of Münster, Münster, Germany
| | - Christopher Markovic
- Genome Technology Access Center at the McDonnell Genome Institute, Washington University, St. Louis, MO, USA
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Ann M Mc Cartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Rajiv C McCoy
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Jennifer McDaniel
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Brandy M McNulty
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Paul Medvedev
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Saint Petersburg State University, St Petersburg, Russia
- UCL Queen Square Institute of Neurology, UCL, London, UK
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Terence D Murphy
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Hugh E Olsen
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Nathan D Olson
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Tamara Potapova
- Stowers Institute for Medical Research, Kansas City, MO, USA
| | - Fedor Ryabov
- Masters Program in National Research University Higher School of Economics, Moscow, Russia
| | - Steven L Salzberg
- Departments of Biomedical Engineering, Computer Science, and Biostatistics, Johns Hopkins University, Baltimore, MD, USA
| | | | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
- Department of Computer Science, Rice University, Houston, TX, USA
| | | | | | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | | | - Likhitha Surapaneni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Angela M Taravella Oill
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Marta Tomaszkiewicz
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Department of Biomedical Engineering, Pennsylvania State University, State College, PA, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Allison C Watwood
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | | | | | - Melissa A Wilson
- Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, USA
| | - Samantha Zarate
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Yiming Zhu
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA
| | - Justin M Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Investigator, Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Rachel J O'Neill
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA
- Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- Department of Genetics and Genome Sciences, UConn Health, Farmington, CT, USA
| | - Michael C Schatz
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Karen H Miga
- Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA, USA
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| |
|
19
|
Peláez JN, Gloss AD, Goldman-Huertas B, Kim B, Lapoint RT, Pimentel-Solorio G, Verster KI, Aguilar JM, Nelson Dittrich AC, Singhal M, Suzuki HC, Matsunaga T, Armstrong EE, Charboneau JLM, Groen SC, Hembry DH, Ochoa CJ, O’Connor TK, Prost S, Zaaijer S, Nabity PD, Wang J, Rodas E, Liang I, Whiteman NK. Evolution of chemosensory and detoxification gene families across herbivorous Drosophilidae. G3 (BETHESDA, MD.) 2023; 13:jkad133. [PMID: 37317982 PMCID: PMC10411586 DOI: 10.1093/g3journal/jkad133] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2023] [Revised: 03/19/2023] [Accepted: 05/31/2023] [Indexed: 06/16/2023]
Abstract
Herbivorous insects are exceptionally diverse, accounting for a quarter of all known eukaryotic species, but the genomic basis of adaptations that enabled this dietary transition remains poorly understood. Many studies have suggested that expansions and contractions of chemosensory and detoxification gene families (genes directly mediating interactions with plant chemical defenses) underlie successful plant colonization. However, this hypothesis has been challenging to test because the origins of herbivory in many insect lineages are ancient (>150 million years ago (mya)), obscuring genomic evolutionary patterns. Here, we characterized chemosensory and detoxification gene family evolution across Scaptomyza, a genus nested within Drosophila that includes a recently derived (<15 mya) herbivore lineage of mustard (Brassicales) specialists and carnation (Caryophyllaceae) specialists, and several nonherbivorous species. Comparative genomic analyses revealed that herbivorous Scaptomyza has among the smallest chemosensory and detoxification gene repertoires across 12 drosophilid species surveyed. Rates of gene turnover averaged across the herbivore clade were significantly higher than background rates in over half of the surveyed gene families. However, gene turnover was more limited along the ancestral herbivore branch, with only gustatory receptors and odorant-binding proteins experiencing strong losses. The genes most significantly impacted by gene loss, duplication, or changes in selective constraint were those involved in detecting compounds associated with feeding on living plants (bitter or electrophilic phytotoxins) or their ancestral diet (fermenting plant volatiles). These results provide insight into the molecular and evolutionary mechanisms of plant-feeding adaptations and highlight gene candidates that have also been linked to other dietary transitions in Drosophila.
Affiliation(s)
- Julianne N Peláez
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Department of Biology, Brandeis University, Waltham, MA 02453, USA
| | - Andrew D Gloss
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Department of Biology and Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
| | - Benjamin Goldman-Huertas
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
| | - Bernard Kim
- Department of Biology, Stanford University, Palo Alto, CA 94305, USA
| | - Richard T Lapoint
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | | | - Kirsten I Verster
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Department of Biology, Stanford University, Palo Alto, CA 94305, USA
| | - Jessica M Aguilar
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
| | - Anna C Nelson Dittrich
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
| | - Malvika Singhal
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Department of Chemistry & Biochemistry, University of Oregon, Eugene, OR 97403, USA
| | - Hiromu C Suzuki
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
| | - Teruyuki Matsunaga
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
| | - Ellie E Armstrong
- Department of Biology, Stanford University, Palo Alto, CA 94305, USA
| | - Joseph L M Charboneau
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
| | - Simon C Groen
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Department of Biology and Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Department of Nematology, University of California Riverside, Riverside, CA 92521, USA
- Department of Botany and Plant Sciences, University of California Riverside, Riverside, CA 92521, USA
- Center for Plant Cell Biology and Institute for Integrative Genome Biology, University of California Riverside, Riverside, CA 92521, USA
| | - David H Hembry
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Department of Biology, University of Texas Permian Basin, Odessa, TX 79762, USA
| | - Christopher J Ochoa
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Molecular Biology Institute, University of California Los Angeles, Los Angeles, CA 90095, USA
| | - Timothy K O’Connor
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
| | - Stefan Prost
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Department of Biology, Stanford University, Palo Alto, CA 94305, USA
| | - Sophie Zaaijer
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Jacobs Institute, Cornell Tech, New York, NY 10044, USA
- FIND Genomics, New York, NY 10044, USA
| | - Paul D Nabity
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Department of Botany and Plant Sciences, University of California Riverside, Riverside, CA 92521, USA
| | - Jiarui Wang
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Department of Biomedical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA 90007, USA
| | - Esteban Rodas
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
| | - Irene Liang
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
| | - Noah K Whiteman
- Department of Integrative Biology, University of California Berkeley, Berkeley, CA 94720, USA
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, CA 94720, USA
| |
|
20
|
She R, Fair T, Schaefer NK, Saunders RA, Pavlovic BJ, Weissman JS, Pollen AA. Comparative landscape of genetic dependencies in human and chimpanzee stem cells. Cell 2023; 186:2977-2994.e23. [PMID: 37343560 PMCID: PMC10461406 DOI: 10.1016/j.cell.2023.05.043] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/31/2022] [Revised: 03/14/2023] [Accepted: 05/26/2023] [Indexed: 06/23/2023]
Abstract
Comparative studies of great apes provide a window into our evolutionary past, but the extent and identity of cellular differences that emerged during hominin evolution remain largely unexplored. We established a comparative loss-of-function approach to evaluate whether human cells exhibit distinct genetic dependencies. By performing genome-wide CRISPR interference screens in human and chimpanzee pluripotent stem cells, we identified 75 genes with species-specific effects on cellular proliferation. These genes comprised coherent processes, including cell-cycle progression and lysosomal signaling, which we determined to be human-derived by comparison with orangutan cells. Human-specific robustness to CDK2 and CCNE1 depletion persisted in neural progenitor cells and cerebral organoids, supporting the G1-phase length hypothesis as a potential evolutionary mechanism in human brain expansion. Our findings demonstrate that evolutionary changes in human cells reshaped the landscape of essential genes and establish a platform for systematically uncovering latent cellular and molecular differences between species.
Affiliation(s)
- Richard She
- Whitehead Institute for Biomedical Research, Cambridge, MA, USA
| | - Tyler Fair
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA; Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, USA
| | - Nathan K Schaefer
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Reuben A Saunders
- Whitehead Institute for Biomedical Research, Cambridge, MA, USA; Department of Cellular and Molecular Pharmacology, University of California at San Francisco, San Francisco, CA, USA
| | - Bryan J Pavlovic
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
| | - Jonathan S Weissman
- Whitehead Institute for Biomedical Research, Cambridge, MA, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA; Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, USA; David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
| | - Alex A Pollen
- Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
|
21
|
Liao WW, Asri M, Ebler J, Doerr D, Haukness M, Hickey G, Lu S, Lucas JK, Monlong J, Abel HJ, Buonaiuto S, Chang XH, Cheng H, Chu J, Colonna V, Eizenga JM, Feng X, Fischer C, Fulton RS, Garg S, Groza C, Guarracino A, Harvey WT, Heumos S, Howe K, Jain M, Lu TY, Markello C, Martin FJ, Mitchell MW, Munson KM, Mwaniki MN, Novak AM, Olsen HE, Pesout T, Porubsky D, Prins P, Sibbesen JA, Sirén J, Tomlinson C, Villani F, Vollger MR, Antonacci-Fulton LL, Baid G, Baker CA, Belyaeva A, Billis K, Carroll A, Chang PC, Cody S, Cook DE, Cook-Deegan RM, Cornejo OE, Diekhans M, Ebert P, Fairley S, Fedrigo O, Felsenfeld AL, Formenti G, Frankish A, Gao Y, Garrison NA, Giron CG, Green RE, Haggerty L, Hoekzema K, Hourlier T, Ji HP, Kenny EE, Koenig BA, Kolesnikov A, Korbel JO, Kordosky J, Koren S, Lee H, Lewis AP, Magalhães H, Marco-Sola S, Marijon P, McCartney A, McDaniel J, Mountcastle J, Nattestad M, Nurk S, Olson ND, Popejoy AB, Puiu D, Rautiainen M, Regier AA, Rhie A, Sacco S, Sanders AD, Schneider VA, Schultz BI, Shafin K, Smith MW, Sofia HJ, Abou Tayoun AN, Thibaud-Nissen F, Tricomi FF, Wagner J, Walenz B, Wood JMD, Zimin AV, Bourque G, Chaisson MJP, Flicek P, Phillippy AM, Zook JM, Eichler EE, Haussler D, Wang T, Jarvis ED, Miga KH, Garrison E, Marschall T, Hall IM, Li H, Paten B. A draft human pangenome reference. Nature 2023; 617:312-324. [PMID: 37165242 PMCID: PMC10172123 DOI: 10.1038/s41586-023-05896-x] [Citation(s) in RCA: 216] [Impact Index Per Article: 216.0] [Received: 07/09/2022] [Accepted: 02/28/2023] [Indexed: 05/12/2023]
Abstract
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Affiliation(s)
- Wen-Wei Liao
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
- Division of Biology and Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
| | - Mobin Asri
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Daniel Doerr
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Marina Haukness
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Glenn Hickey
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Shuangjia Lu
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
| | - Julian K Lucas
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Jean Monlong
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Haley J Abel
- Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Silvia Buonaiuto
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
| | - Xian H Chang
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Justin Chu
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Vincenza Colonna
- Institute of Genetics and Biophysics, National Research Council, Naples, Italy
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Jordan M Eizenga
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Xiaowen Feng
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Christian Fischer
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Robert S Fulton
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Shilpa Garg
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Copenhagen, Denmark
| | - Cristian Groza
- Quantitative Life Sciences, McGill University, Montréal, Québec, Canada
| | - Andrea Guarracino
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Genomics Research Centre, Human Technopole, Milan, Italy
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Simon Heumos
- Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
- Biomedical Data Science, Department of Computer Science, University of Tübingen, Tübingen, Germany
| | - Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Hinxton, Cambridge, UK
| | - Miten Jain
- Northeastern University, Boston, MA, USA
| | - Tsung-Yu Lu
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Charles Markello
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Fergal J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Adam M Novak
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Hugh E Olsen
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Trevor Pesout
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Pjotr Prins
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Jonas A Sibbesen
- Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark
| | - Jouni Sirén
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Chad Tomlinson
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Flavia Villani
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Division of Medical Genetics, University of Washington School of Medicine, Seattle, WA, USA
| | | | | | - Carl A Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Konstantinos Billis
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | | | | | - Sarah Cody
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | | | - Robert M Cook-Deegan
- Barrett and O'Connor Washington Center, Arizona State University, Washington, DC, USA
| | - Omar E Cornejo
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
| | - Mark Diekhans
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Peter Ebert
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
- Core Unit Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
| | - Susan Fairley
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Adam L Felsenfeld
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | - Giulio Formenti
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
| | - Adam Frankish
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Yan Gao
- Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
| | - Nanibaa' A Garrison
- Institute for Society and Genetics, College of Letters and Science, University of California, Los Angeles, CA, USA
- Institute for Precision Health, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Division of General Internal Medicine and Health Services Research, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
| | - Carlos Garcia Giron
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Richard E Green
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
- Dovetail Genomics, Scotts Valley, CA, USA
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Thibaut Hourlier
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Hanlee P Ji
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Eimear E Kenny
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA
| | - Barbara A Koenig
- Program in Bioethics and Institute for Human Genetics, University of California, San Francisco, CA, USA
| | | | - Jan O Korbel
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Jennifer Kordosky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - HoJoon Lee
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Hugo Magalhães
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Santiago Marco-Sola
- Computer Sciences Department, Barcelona Supercomputing Center, Barcelona, Spain
- Departament d'Arquitectura de Computadors i Sistemes Operatius, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Pierre Marijon
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Ann McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | | | | | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Alice B Popejoy
- Department of Public Health Sciences, University of California, Davis, CA, USA
| | - Daniela Puiu
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Allison A Regier
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Samuel Sacco
- Department of Ecology and Evolutionary Biology, University of California, Santa Cruz, CA, USA
| | - Ashley D Sanders
- Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine in the Helmholtz Association, Berlin, Germany
| | - Valerie A Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Baergen I Schultz
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | | | - Michael W Smith
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | - Heidi J Sofia
- National Institutes of Health (NIH)-National Human Genome Research Institute, Bethesda, MD, USA
| | - Ahmad N Abou Tayoun
- Al Jalila Genomics Center of Excellence, Al Jalila Children's Specialty Hospital, Dubai, UAE
- Center for Genomic Discovery, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai, UAE
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Francesca Floriana Tricomi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Brian Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | - Aleksey V Zimin
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA
| | - Guillaume Bourque
- Department of Human Genetics, McGill University, Montréal, Québec, Canada
- Canadian Center for Computational Genomics, McGill University, Montréal, Québec, Canada
- Institute for the Advanced Study of Human Biology (WPI-ASHBi), Kyoto University, Kyoto, Japan
| | - Mark J P Chaisson
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - David Haussler
- Genomics Institute, University of California, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Ting Wang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
| | - Erich D Jarvis
- Vertebrate Genome Laboratory, The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Karen H Miga
- Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Erik Garrison
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
| | - Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
| | - Ira M Hall
- Department of Genetics, Yale University School of Medicine, New Haven, CT, USA
- Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA
| | - Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Benedict Paten
- Genomics Institute, University of California, Santa Cruz, CA, USA
|
22
|
Webster TH, Vannan A, Pinto BJ, Denbrock G, Morales M, Dolby GA, Fiddes IT, DeNardo DF, Wilson MA. Incomplete dosage balance and dosage compensation in the ZZ/ZW Gila monster (Heloderma suspectum) revealed by de novo genome assembly. bioRxiv 2023:2023.04.26.538436. [PMID: 37163099 PMCID: PMC10168389 DOI: 10.1101/2023.04.26.538436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Reptiles exhibit a variety of modes of sex determination, including both temperature-dependent and genetic mechanisms. Among those species with genetic sex determination, sex chromosomes of varying heterogamety (XX/XY and ZZ/ZW) have been observed with different degrees of differentiation. Karyotype studies have demonstrated that Gila monsters (Heloderma suspectum) have ZZ/ZW sex determination and this system is likely homologous to the ZZ/ZW system in the Komodo dragon (Varanus komodoensis), but little else is known about their sex chromosomes. Here, we report the assembly and analysis of the Gila monster genome. We generated a de novo draft genome assembly for a male using 10X Genomics technology. We further generated and analyzed short-read whole genome sequencing and whole transcriptome sequencing data for three males and three females. By comparing female and male genomic data, we identified four putative Z-chromosome scaffolds. These putative Z-chromosome scaffolds are homologous to Z-linked scaffolds identified in the Komodo dragon. Further, by analyzing RNAseq data, we observed evidence of incomplete dosage compensation between the Gila monster Z chromosome and autosomes and a lack of balance in Z-linked expression between the sexes. In particular, we observe lower expression of the Z in females (ZW) than males (ZZ) on a global basis, though we find evidence suggesting local gene-by-gene compensation. This pattern has been observed in most other ZZ/ZW systems studied to date and may represent a general pattern for female heterogamety in vertebrates.
Affiliation(s)
- Timothy H. Webster
- Department of Anthropology, University of Utah, Salt Lake City, UT
- School of Life Sciences, Arizona State University, Tempe, AZ
| | - Annika Vannan
- School of Life Sciences, Arizona State University, Tempe, AZ
| | - Brendan J. Pinto
- School of Life Sciences, Arizona State University, Tempe, AZ
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ
- Department of Zoology, Milwaukee Public Museum, Milwaukee, WI, USA
| | - Grant Denbrock
- School of Life Sciences, Arizona State University, Tempe, AZ
| | - Matheo Morales
- School of Life Sciences, Arizona State University, Tempe, AZ
- Department of Genetics, Yale University, New Haven, CT
| | - Greer A. Dolby
- School of Life Sciences, Arizona State University, Tempe, AZ
- Center for Mechanisms of Evolution, Biodesign Institute, Tempe, AZ
| | | | - Dale F. DeNardo
- School of Life Sciences, Arizona State University, Tempe, AZ
| | - Melissa A. Wilson
- School of Life Sciences, Arizona State University, Tempe, AZ
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ
- Center for Mechanisms of Evolution, Biodesign Institute, Tempe, AZ
|
23
|
Draper J, Philipp J, Neeb Z, Thomas R, Katzman S, Salama S, Haussler D, Sanford JR. Isoform-specific translational control is evolutionarily conserved in primates. bioRxiv 2023:2023.04.21.537863. [PMID: 37131629 PMCID: PMC10153275 DOI: 10.1101/2023.04.21.537863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Alternative splicing (AS) alters messenger RNA (mRNA) coding capacity, localization, stability, and translation. Here we use comparative transcriptomics to identify cis-acting elements coupling AS to translational control (AS-TC). We sequenced total cytosolic and polyribosome-associated mRNA from human, chimpanzee, and orangutan induced pluripotent stem cells (iPSCs), revealing thousands of transcripts with splicing differences between subcellular fractions. We found both conserved and species-specific polyribosome association patterns for orthologous splicing events. Intriguingly, alternative exons with similar polyribosome profiles between species have stronger sequence conservation than exons with lineage-specific ribosome association. These data suggest that sequence variation underlies differences in the polyribosome association. Accordingly, single nucleotide substitutions in luciferase reporters designed to model exons with divergent polyribosome profiles are sufficient to regulate translational efficiency. We used position specific weight matrices to interpret exons with species-specific polyribosome association profiles, finding that polymorphic sites frequently alter recognition motifs for trans-acting RNA binding proteins. Together, our results show that AS can regulate translation by remodeling the cis-regulatory landscape of mRNA isoforms.
|
24
|
Nguyen ED, Fard VN, Kim BY, Collins S, Galey M, Nelson BR, Wakenight P, Gable SM, McKenna A, Bammler TK, MacDonald J, Okamura DM, Shendure J, Beier DR, Ramirez JM, Majesky MW, Millen KJ, Tollis M, Miller DE. GENOME REPORT: Chromosome-scale genome assembly of the African spiny mouse (Acomys cahirinus). bioRxiv 2023:2023.04.03.535372. [PMID: 37066261 PMCID: PMC10103962 DOI: 10.1101/2023.04.03.535372] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
There is increasing interest in the African spiny mouse (Acomys cahirinus) as a model organism because of its ability to regenerate tissue after injury in skin, muscle, and internal organs such as the kidneys. A high-quality reference genome is needed to better understand these regenerative properties at the molecular level. Here, we present an improved reference genome for A. cahirinus generated from long Nanopore sequencing reads. We confirm the quality of our annotations using RNA sequencing data from four different tissues. Our genome is of higher contiguity and quality than previously reported genomes from this species and will facilitate ongoing efforts to better understand the regenerative properties of this organism.
Affiliation(s)
- Elizabeth Dong Nguyen
- Department of Pediatrics, University of Washington, Seattle, WA
- Center for Developmental Biology & Regenerative Medicine, Seattle Children’s Research Institute, Seattle, WA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA
| | - Vahid Nikoonejad Fard
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ
| | - Bernard Y. Kim
- Department of Biology, Stanford University, Stanford, CA
| | - Sarah Collins
- Center for Developmental Biology & Regenerative Medicine, Seattle Children’s Research Institute, Seattle, WA
| | - Miranda Galey
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA
| | - Branden R. Nelson
- Center for Integrative Brain Research, Seattle Children’s Research Institute, Seattle, WA
| | - Paul Wakenight
- Center for Integrative Brain Research, Seattle Children’s Research Institute, Seattle, WA
| | - Simone M. Gable
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ
| | - Aaron McKenna
- Department of Molecular & Systems Biology, Dartmouth Geisel School of Medicine, Lebanon, NH
| | - Theo K. Bammler
- Department of Environmental & Occupational Health Sciences, University of Washington, Seattle, WA
| | - Jim MacDonald
- Department of Environmental & Occupational Health Sciences, University of Washington, Seattle, WA
| | - Daryl M. Okamura
- Department of Pediatrics, University of Washington, Seattle, WA
- Center for Developmental Biology & Regenerative Medicine, Seattle Children’s Research Institute, Seattle, WA
| | - Jay Shendure
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA
- Department of Genome Sciences, University of Washington, Seattle, WA
- Allen Discovery Center for Cell Lineage Tracing, Seattle, WA
- Howard Hughes Medical Institute, Seattle, WA
- Institute of Stem Cell & Regenerative Medicine, University of Washington, Seattle, WA
| | - David R. Beier
- Department of Pediatrics, University of Washington, Seattle, WA
- Center for Developmental Biology & Regenerative Medicine, Seattle Children’s Research Institute, Seattle, WA
| | - Jan Marino Ramirez
- Center for Integrative Brain Research, Seattle Children’s Research Institute, Seattle, WA
- Department of Neurological Surgery, University of Washington, Seattle, WA
| | - Mark W. Majesky
- Department of Pediatrics, University of Washington, Seattle, WA
- Center for Developmental Biology & Regenerative Medicine, Seattle Children’s Research Institute, Seattle, WA
- Institute of Stem Cell & Regenerative Medicine, University of Washington, Seattle, WA
- Department of Laboratory Medicine & Pathology, University of Washington, Seattle, WA
| | - Kathleen J. Millen
- Department of Pediatrics, University of Washington, Seattle, WA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA
- Center for Integrative Brain Research, Seattle Children’s Research Institute, Seattle, WA
| | - Marc Tollis
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ
| | - Danny E. Miller
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA
- Division of Genetic Medicine, Department of Pediatrics, University of Washington, Seattle, WA
- Department of Laboratory Medicine & Pathology, University of Washington, Seattle, WA
|
25
|
Pelaez JN, Gloss AD, Goldman-Huertas B, Kim B, Lapoint RT, Pimentel-Solorio G, Verster KI, Aguilar JM, Dittrich ACN, Singhal M, Suzuki HC, Matsunaga T, Armstrong EE, Charboneau JL, Groen SC, Hembry DH, Ochoa CJ, O’Connor TK, Prost S, Zaaijer S, Nabity PD, Wang J, Rodas E, Liang I, Whiteman NK. Evolution of chemosensory and detoxification gene families across herbivorous Drosophilidae. bioRxiv 2023:2023.03.16.532987. [PMID: 36993186 PMCID: PMC10055167 DOI: 10.1101/2023.03.16.532987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Herbivorous insects are exceptionally diverse, accounting for a quarter of all known eukaryotic species, but the genetic basis of adaptations that enabled this dietary transition remains poorly understood. Many studies have suggested that expansions and contractions of chemosensory and detoxification gene families - genes directly mediating interactions with plant chemical defenses - underlie successful plant colonization. However, this hypothesis has been challenging to test because the origins of herbivory in many lineages are ancient (>150 million years ago [mya]), obscuring genomic evolutionary patterns. Here, we characterized chemosensory and detoxification gene family evolution across Scaptomyza, a genus nested within Drosophila that includes a recently derived (<15 mya) herbivore lineage of mustard (Brassicales) specialists and carnation (Caryophyllaceae) specialists, and several non-herbivorous species. Comparative genomic analyses revealed that herbivorous Scaptomyza have among the smallest chemosensory and detoxification gene repertoires across 12 drosophilid species surveyed. Rates of gene turnover averaged across the herbivore clade were significantly higher than background rates in over half of the surveyed gene families. However, gene turnover was more limited along the ancestral herbivore branch, with only gustatory receptors and odorant binding proteins experiencing strong losses. The genes most significantly impacted by gene loss, duplication, or changes in selective constraint were those involved in detecting compounds associated with feeding on plants (bitter or electrophilic phytotoxins) or their ancestral diet (yeast and fruit volatiles). These results provide insight into the molecular and evolutionary mechanisms of plant-feeding adaptations and highlight strong gene candidates that have also been linked to other dietary transitions in Drosophila.
Affiliation(s)
- Julianne N. Pelaez
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94720, USA
  - Department of Biology, Brandeis University, Waltham, MA 02453, USA
- Andrew D. Gloss
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
  - Department of Biology and Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
- Benjamin Goldman-Huertas
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94720, USA
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Bernard Kim
  - Department of Biology, Stanford University, Palo Alto, CA 94305, USA
- Richard T. Lapoint
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
  - National Center for Biotechnology Information, Bethesda, MD 20894, USA
- Kirsten I. Verster
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94720, USA
  - Department of Biology, Stanford University, Palo Alto, CA 94305, USA
- Jessica M. Aguilar
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94720, USA
- Anna C. Nelson Dittrich
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
  - Boyce Thompson Institute, Ithaca, NY 14853, USA
- Malvika Singhal
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94720, USA
  - Department of Chemistry & Biochemistry, University of Oregon, Eugene, OR 97403, USA
- Hiromu C. Suzuki
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94720, USA
- Teruyuki Matsunaga
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94720, USA
- Joseph L.M. Charboneau
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
- Simon C. Groen
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94720, USA
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
  - Department of Biology and Center for Genomics and Systems Biology, New York University, New York, NY 10003, USA
  - Department of Nematology, University of California-Riverside, Riverside, CA 92521, USA
  - Department of Botany and Plant Sciences, University of California-Riverside, Riverside, CA 92521, USA
  - Center for Plant Cell Biology and Institute for Integrative Genome Biology, University of California-Riverside, Riverside, CA 92521, USA
- David H. Hembry
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
  - Department of Biology, University of Texas Permian Basin, Odessa, TX 79762, USA
- Christopher J. Ochoa
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94720, USA
  - Molecular Biology Institute, University of California-Los Angeles, Los Angeles, CA 90095, USA
- Timothy K. O’Connor
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94720, USA
- Stefan Prost
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94720, USA
  - Department of Biology, Stanford University, Palo Alto, CA 94305, USA
- Sophie Zaaijer
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
  - Jacobs Institute, Cornell Tech, New York, NY 10044, USA
  - FIND Genomics, New York, NY 10044, USA
- Paul D. Nabity
  - Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
  - Department of Botany and Plant Sciences, University of California-Riverside, Riverside, CA 92521, USA
- Jiarui Wang
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94720, USA
  - Department of Biomedical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA 90007, USA
- Esteban Rodas
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94720, USA
- Irene Liang
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94720, USA
- Noah K. Whiteman
  - Department of Integrative Biology, University of California-Berkeley, Berkeley, CA 94720, USA
  - Department of Molecular and Cell Biology, University of California-Berkeley, Berkeley, CA 94720, USA
26
Li H. Protein-to-genome alignment with miniprot. Bioinformatics 2023; 39:btad014. [PMID: 36648328 PMCID: PMC9869432 DOI: 10.1093/bioinformatics/btad014] [Citation(s) in RCA: 52] [Impact Index Per Article: 52.0] [Received: 10/13/2022] [Revised: 12/25/2022] [Accepted: 01/16/2023] [Indexed: 01/18/2023] Open
Abstract
MOTIVATION: Protein-to-genome alignment is critical to annotating genes in non-model organisms. While there are a few tools for this purpose, all of them were developed over 10 years ago and did not incorporate the latest advances in alignment algorithms. They are inefficient and could not keep up with the rapid production of new genomes and quickly growing protein databases. RESULTS: Here, we describe miniprot, a new aligner for mapping protein sequences to a complete genome. Miniprot integrates recent techniques such as k-mer sketch and vectorized dynamic programming. It is tens of times faster than existing tools while achieving comparable accuracy on real data. AVAILABILITY AND IMPLEMENTATION: https://github.com/lh3/miniprot.
Affiliation(s)
- Heng Li
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
27
Frankish A, Carbonell-Sala S, Diekhans M, Jungreis I, Loveland J, Mudge J, Sisu C, Wright J, Arnan C, Barnes I, Banerjee A, Bennett R, Berry A, Bignell A, Boix C, Calvet F, Cerdán-Vélez D, Cunningham F, Davidson C, Donaldson S, Dursun C, Fatima R, Giorgetti S, Giron C, Gonzalez J, Hardy M, Harrison P, Hourlier T, Hollis Z, Hunt T, James B, Jiang Y, Johnson R, Kay M, Lagarde J, Martin F, Gómez L, Nair S, Ni P, Pozo F, Ramalingam V, Ruffier M, Schmitt B, Schreiber J, Steed E, Suner MM, Sumathipala D, Sycheva I, Uszczynska-Ratajczak B, Wass E, Yang Y, Yates A, Zafrulla Z, Choudhary J, Gerstein M, Guigo R, Hubbard TJP, Kellis M, Kundaje A, Paten B, Tress M, Flicek P. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res 2022; 51:D942-D949. [PMID: 36420896 PMCID: PMC9825462 DOI: 10.1093/nar/gkac1071] [Citation(s) in RCA: 72] [Impact Index Per Article: 36.0] [Received: 09/22/2022] [Revised: 10/15/2022] [Accepted: 11/07/2022] [Indexed: 11/27/2022] Open
Abstract
GENCODE produces high quality gene and transcript annotation for the human and mouse genomes. All GENCODE annotation is supported by experimental data and serves as a reference for genome biology and clinical genomics. The GENCODE consortium generates targeted experimental data, develops bioinformatic tools and carries out analyses that, along with externally produced data and methods, support the identification and annotation of transcript structures and the determination of their function. Here, we present an update on the annotation of human and mouse genes, including developments in the tools, data, analyses and major collaborations which underpin this progress. For example, we report the creation of a set of non-canonical ORFs identified in GENCODE transcripts, the LRGASP collaboration to assess the use of long transcriptomic data to build transcript models, the progress in collaborations with RefSeq and UniProt to increase convergence in the annotation of human and mouse protein-coding genes, the propagation of GENCODE across the human pan-genome and the development of new tools to support annotation of regulatory features by GENCODE. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
Affiliation(s)
- Adam Frankish
  - To whom correspondence should be addressed. Tel: +44 1223 494388; Fax: +44 1223 484696;
- Sílvia Carbonell-Sala
  - Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Mark Diekhans
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Irwin Jungreis
  - MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Jane E Loveland
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jonathan M Mudge
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cristina Sisu
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Department of Life Sciences, Brunel University London, Uxbridge UB8 3PH, UK
- James C Wright
  - Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
- Carme Arnan
  - Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- If Barnes
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Abhimanyu Banerjee
  - Department of Genetics, Stanford University, Palo Alto, CA, USA
  - Department of Computer Science, Stanford University, Palo Alto, CA, USA
- Ruth Bennett
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Andrew Berry
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Alexandra Bignell
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carles Boix
  - MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Ferriol Calvet
  - Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Daniel Cerdán-Vélez
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Fiona Cunningham
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Claire Davidson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sarah Donaldson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Cagatay Dursun
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Reham Fatima
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Stefano Giorgetti
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carlos García Giron
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jose Manuel Gonzalez
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Matthew Hardy
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Peter W Harrison
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Thibaut Hourlier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Zoe Hollis
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Toby Hunt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Benjamin James
  - MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Yunzhe Jiang
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Rory Johnson
  - Department of Medical Oncology, Bern University Hospital, Murtenstrasse 35, 3008 Bern, Switzerland
  - School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, D04 V1W8, Ireland
- Mike Kay
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Julien Lagarde
  - Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
- Fergal J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Laura Martínez Gómez
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Surag Nair
  - Department of Genetics, Stanford University, Palo Alto, CA, USA
  - Department of Computer Science, Stanford University, Palo Alto, CA, USA
- Pengyu Ni
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Fernando Pozo
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Vivek Ramalingam
  - Department of Genetics, Stanford University, Palo Alto, CA, USA
  - Department of Computer Science, Stanford University, Palo Alto, CA, USA
- Magali Ruffier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Bianca M Schmitt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jacob M Schreiber
  - Department of Genetics, Stanford University, Palo Alto, CA, USA
  - Department of Computer Science, Stanford University, Palo Alto, CA, USA
- Emily Steed
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Marie-Marthe Suner
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Dulika Sumathipala
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Irina Sycheva
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Barbara Uszczynska-Ratajczak
  - Computational Biology of Noncoding RNA, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Noskowskiego 12/14, 61-704 Poznan, Poland
- Elizabeth Wass
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Yucheng T Yang
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai 200433, China
- Andrew Yates
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Zahoor Zafrulla
  - Department of Genetics, Stanford University, Palo Alto, CA, USA
  - Department of Computer Science, Stanford University, Palo Alto, CA, USA
- Jyoti S Choudhary
  - Functional Proteomics, Division of Cancer Biology, Institute of Cancer Research, 237 Fulham Road, London SW3 6JB, UK
- Mark Gerstein
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Roderic Guigo
  - Department of Bioinformatics and Genomics, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Catalonia, Spain
  - Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra (UPF), Barcelona, E-08003 Catalonia, Spain
- Tim J P Hubbard
  - Department of Medical and Molecular Genetics, King's College London, Guys Hospital, Great Maze Pond, London SE1 9RT, UK
- Manolis Kellis
  - MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar St, Cambridge, MA 02139, USA
  - Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Anshul Kundaje
  - Department of Genetics, Stanford University, Palo Alto, CA, USA
  - Department of Computer Science, Stanford University, Palo Alto, CA, USA
- Benedict Paten
  - UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA 95064, USA
- Michael L Tress
  - Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Calle Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
28
Wibowo AT, Antunez-Sanchez J, Dawson A, Price J, Meehan C, Wrightsman T, Collenberg M, Bezrukov I, Becker C, Benhamed M, Weigel D, Gutierrez-Marcos J. Predictable and stable epimutations induced during clonal plant propagation with embryonic transcription factor. PLoS Genet 2022; 18:e1010479. [PMID: 36383565 PMCID: PMC9731469 DOI: 10.1371/journal.pgen.1010479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Revised: 12/08/2022] [Accepted: 10/15/2022] [Indexed: 11/17/2022] Open
Abstract
Clonal propagation is frequently used in commercial plant breeding and biotechnology programs because it minimizes genetic variation, yet it is not uncommon to observe clonal plants with stable phenotypic changes, a phenomenon known as somaclonal variation. Several studies have linked epigenetic modifications induced during regeneration with this newly acquired phenotypic variation. However, the factors that determine the extent of somaclonal variation and the molecular changes underpinning this process remain poorly understood. To address this gap in our knowledge, we compared clonally propagated Arabidopsis thaliana plants derived from somatic embryogenesis using two different embryonic transcription factors, RWP-RK DOMAIN-CONTAINING 4 (RKD4) or LEAFY COTYLEDON2 (LEC2), and from two epigenetically distinct founder tissues. We found that both the epi(genetic) status of the explant and the regeneration protocol employed play critical roles in shaping the molecular and phenotypic landscape of clonal plants. Phenotypic variation in regenerated plants can be largely explained by the inheritance of tissue-specific DNA methylation imprints, which are associated with specific transcriptional and metabolic changes in sexual progeny of clonal plants. For instance, regenerants were particularly affected by the inheritance of root-specific epigenetic imprints, which were associated with an increased accumulation of salicylic acid in leaves and accelerated plant senescence. Collectively, our data reveal specific pathways underpinning the phenotypic and molecular variation that arise and accumulate in clonal plant populations.
Affiliation(s)
- Anjar Tri Wibowo
  - School of Life Science, University of Warwick, Coventry, United Kingdom
  - Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tubingen, Germany
  - Department of Biology, Faculty of Science and Technology, Airlangga University, Surabaya City, East Java, Indonesia
- Alexander Dawson
  - School of Life Science, University of Warwick, Coventry, United Kingdom
- Jonathan Price
  - School of Life Science, University of Warwick, Coventry, United Kingdom
- Cathal Meehan
  - School of Life Science, University of Warwick, Coventry, United Kingdom
- Travis Wrightsman
  - Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tubingen, Germany
- Maximillian Collenberg
  - Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tubingen, Germany
- Ilja Bezrukov
  - Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tubingen, Germany
- Claude Becker
  - Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tubingen, Germany
  - Ludwig-Maximilians-University of Munich, Faculty of Biology, Biocenter, Martinsried, Germany
- Moussa Benhamed
  - Université Paris-Saclay, Centre National de la Recherche Scientifique, Institut National De La Recherche Agronomique, University of Évry, Institute of Plant Sciences Paris-Saclay (IPS2), Orsay, France
- Detlef Weigel
  - Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tubingen, Germany
  - * E-mail: (DW); (JGM)
- Jose Gutierrez-Marcos
  - School of Life Science, University of Warwick, Coventry, United Kingdom
  - * E-mail: (DW); (JGM)
29
Fernandes Santos CA, Rodrigues da Costa S, Silva Boiteux L, Grattapaglia D, Silva-Junior OB. Genetic associations with resistance to Meloidogyne enterolobii in guava (Psidium sp.) using cross-genera SNPs and comparative genomics to Eucalyptus highlight evolutionary conservation across the Myrtaceae. PLoS One 2022; 17:e0273959. [PMID: 36322533 PMCID: PMC9629644 DOI: 10.1371/journal.pone.0273959] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Accepted: 10/14/2022] [Indexed: 11/07/2022] Open
Abstract
Tropical fruit tree species constitute a yet untapped supply of outstanding diversity of taste and nutritional value, barely developed from the genetics standpoint, with scarce or no genomic resources to tackle the challenges arising in modern breeding practice. We generated a de novo genome assembly of Psidium guajava, the super fruit “apple of the tropics”, and successfully transferred 14,268 SNP probesets from Eucalyptus to Psidium at the nucleotide level, to detect genomic loci linked to resistance to the root knot nematode (RKN) Meloidogyne enterolobii derived from the wild relative P. guineense. Loci significantly associated with resistance across alternative analytical frameworks were detected at two SNPs on chromosome 3 in a pseudo-assembly of the Psidium guajava genome built using a syntenic path approach with the Eucalyptus grandis genome to determine the order and orientation of the contigs. The P. guineense-derived resistance response to RKN and disease onset is conceivably triggered by mineral nutrients and phytohormone homeostasis or signaling with the involvement of the miRNA pathway. Hotspots of mapped resistance quantitative trait loci and functional annotation in the same genomic region of Eucalyptus provide further indirect support to our results, highlighting the evolutionary conservation of genomes across genera of Myrtaceae in the adaptation to pathogens. Marker-assisted introgression of the mapped resistance loci should accelerate the development of improved guava cultivars and hybrid rootstocks.
Affiliation(s)
- Soniane Rodrigues da Costa
  - Graduate program in Genetic Resources, Universidade Estadual de Feira de Santana, Feira de Santana, Bahia, Brazil
- Dario Grattapaglia
  - Embrapa Genetic Resources and Biotechnology (CENARGEN), Brasília, Distrito Federal, Brazil
  - * E-mail:
30
Yoo D, Park J, Lee C, Song I, Lee YH, Yun T, Lee H, Heguy A, Han JY, Dasen JS, Kim H, Baek M. Little skate genome provides insights into genetic programs essential for limb-based locomotion. eLife 2022; 11:e78345. [PMID: 36288084 PMCID: PMC9605692 DOI: 10.7554/elife.78345] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/03/2022] [Accepted: 10/10/2022] [Indexed: 11/13/2022] Open
Abstract
The little skate Leucoraja erinacea, a cartilaginous fish, displays pelvic fin driven walking-like behavior using genetic programs and neuronal subtypes similar to those of land vertebrates. However, mechanistic studies on little skate motor circuit development have been limited, due to the lack of a high-quality reference genome. Here, we generated an assembly of the little skate genome, with precise gene annotation and structures, which allowed post-genome analysis of spinal motor neurons (MNs) essential for locomotion. Through interspecies comparison of mouse, skate and chicken MN transcriptomes, shared and divergent gene expression profiles were identified. Comparison of accessible chromatin regions between mouse and skate MNs predicted shared transcription factor (TF) motifs with divergent ones, which could be used for achieving differential regulation of MN-expressed genes. A greater number of TF motif predictions were observed in MN-expressed genes in mouse than in little skate. These findings suggest conserved and divergent molecular mechanisms controlling MN development of vertebrates during evolution, which might contribute to intricate gene regulatory networks in the emergence of a more sophisticated motor system in tetrapods.
Affiliation(s)
- DongAhn Yoo
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Junhee Park
  - Department of Brain Sciences, DGIST, Daegu, Republic of Korea
- Chul Lee
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Injun Song
  - Department of Brain Sciences, DGIST, Daegu, Republic of Korea
- Young Ho Lee
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Tery Yun
  - Department of Brain Sciences, DGIST, Daegu, Republic of Korea
- Hyemin Lee
  - Department of Biology, Graduate School of Arts and Science, NYU, New York, United States
- Adriana Heguy
  - Genome Technology Center, Division for Advanced Research Technologies, and Department of Pathology, NYU School of Medicine, New York, United States
- Jae Yong Han
  - Department of Agricultural Biotechnology, Seoul National University, Seoul, Republic of Korea
- Jeremy S Dasen
  - Neuroscience Institute, Department of Neuroscience and Physiology, New York University School of Medicine, New York, United States
- Heebal Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
  - Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
  - eGnome, Inc, Seoul, Republic of Korea
- Myungin Baek
  - Department of Brain Sciences, DGIST, Daegu, Republic of Korea
31
Bayega A, Oikonomopoulos S, Wang YC, Ragoussis J. Improved Nanopore full-length cDNA sequencing by PCR-suppression. Front Genet 2022; 13:1031355. [PMID: 36324505 PMCID: PMC9618600 DOI: 10.3389/fgene.2022.1031355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 09/30/2022] [Indexed: 11/29/2022] Open
Abstract
Full-length transcript sequencing remains a main goal of RNA sequencing. However, even the application of long-read sequencing technologies such as Oxford Nanopore Technologies still fails to yield full-length transcript sequencing for a significant portion of sequenced reads. Since these technologies can sequence reads that are far longer than the longest known processed transcripts, the lack of efficiency to obtain full-length transcripts from good quality RNAs stems from library preparation inefficiency rather than the presence of degraded RNA molecules. It has previously been shown that addition of inverted terminal repeats in cDNA during reverse transcription followed by single-primer PCR creates a PCR suppression effect that prevents amplification of short molecules, thus enriching the library for longer transcripts. We adapted this method for Nanopore cDNA library preparation and show that not only is PCR efficiency increased but gene body coverage is dramatically improved. The results show that implementation of this simple strategy will result in better quality full-length RNA sequencing data and make full-length transcript sequencing possible for most sequenced reads.
Affiliation(s)
- Anthony Bayega
  - Department of Human Genetics, McGill University Genome Centre, McGill University, Montréal, QC, Canada
- Spyros Oikonomopoulos
  - Department of Human Genetics, McGill University Genome Centre, McGill University, Montréal, QC, Canada
- Yu Chang Wang
  - Department of Human Genetics, McGill University Genome Centre, McGill University, Montréal, QC, Canada
- Jiannis Ragoussis
  - Department of Human Genetics, McGill University Genome Centre, McGill University, Montréal, QC, Canada
  - Department of Bioengineering, McGill University, Montréal, QC, Canada
  - *Correspondence: Jiannis Ragoussis,
32
Kim J, Lee C, Ko BJ, Yoo DA, Won S, Phillippy AM, Fedrigo O, Zhang G, Howe K, Wood J, Durbin R, Formenti G, Brown S, Cantin L, Mello CV, Cho S, Rhie A, Kim H, Jarvis ED. False gene and chromosome losses in genome assemblies caused by GC content variation and repeats. Genome Biol 2022; 23:204. [PMID: 36167554 PMCID: PMC9516821 DOI: 10.1186/s13059-022-02765-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 06/16/2021] [Accepted: 09/02/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Many short-read genome assemblies have been found to be incomplete and contain mis-assemblies. The Vertebrate Genomes Project has been producing new reference genome assemblies with an emphasis on being as complete and error-free as possible, which requires utilizing long reads, long-range scaffolding data, new assembly algorithms, and manual curation. A more thorough evaluation of the recent references relative to prior assemblies can provide a detailed overview of the types and magnitude of improvements. RESULTS: Here we evaluate new vertebrate genome references relative to the previous assemblies for the same species and, in two cases, the same individuals, including a mammal (platypus), two birds (zebra finch, Anna's hummingbird), and a fish (climbing perch). We find that up to 11% of genomic sequence is entirely missing in the previous assemblies. In the Vertebrate Genomes Project zebra finch assembly, we identify eight new GC- and repeat-rich micro-chromosomes with high gene density. The impact of missing sequences is biased towards GC-rich 5'-proximal promoters and 5' exon regions of protein-coding genes and long non-coding RNAs. Between 26 and 60% of genes include structural or sequence errors that could lead to misunderstanding of their function when using the previous genome assemblies. CONCLUSIONS: Our findings reveal novel regulatory landscapes and protein coding sequences that have been greatly underestimated in previous assemblies and are now present in the Vertebrate Genomes Project reference genomes.
Affiliation(s)
- Juwan Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Chul Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Byung June Ko
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- Dong Ahn Yoo
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Sohyoung Won
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Olivier Fedrigo
- Vertebrate Genome Lab, The Rockefeller University, New York City, USA
- Guojie Zhang
- BGI-Shenzhen, Shenzhen, 518083, China
- Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Universitetsparken 15, 2100, Copenhagen, Denmark
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, 650223, China
- Richard Durbin
- Wellcome Sanger Institute, Cambridge, UK
- Department of Genetics, University of Cambridge, Cambridge, UK
- Giulio Formenti
- Vertebrate Genome Lab, The Rockefeller University, New York City, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, USA
- Samara Brown
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, USA
- Lindsey Cantin
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, USA
- Claudio V Mello
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, 97239, USA
- Seoae Cho
- eGnome, Inc, Seoul, Republic of Korea
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
- Heebal Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- eGnome, Inc, Seoul, Republic of Korea
- Erich D Jarvis
- Vertebrate Genome Lab, The Rockefeller University, New York City, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York City, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
33
An enhancer of Agouti contributes to parallel evolution of cryptically colored beach mice. Proc Natl Acad Sci U S A 2022; 119:e2202862119. [PMID: 35776547 PMCID: PMC9271204 DOI: 10.1073/pnas.2202862119] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/24/2022] Open
Abstract
Identifying the genetic basis of repeatedly evolved traits provides a way to reconstruct their evolutionary history and ultimately investigate the predictability of evolution. Here, we focus on the oldfield mouse (Peromyscus polionotus), which occurs in the southeastern United States, where it exhibits considerable color variation. Dorsal coats range from dark brown in mainland mice to near white in mice inhabiting sandy beaches; this light pelage has evolved independently on Florida's Gulf and Atlantic coasts as camouflage from predators. To facilitate genomic analyses, we first generated a chromosome-level genome assembly of Peromyscus polionotus subgriseus. Next, in a uniquely variable mainland population (Peromyscus polionotus albifrons), we scored 23 pigment traits and performed targeted resequencing in 168 mice. We find that pigment variation is strongly associated with an ∼2-kb region ∼5 kb upstream of the Agouti signaling protein coding region. Using a reporter-gene assay, we demonstrate that this regulatory region contains an enhancer that drives expression in the dermis of mouse embryos during the establishment of pigment prepatterns. Moreover, extended tracts of homozygosity in this Agouti region indicate that the light allele experienced recent and strong positive selection. Notably, this same light allele appears fixed in both Gulf and Atlantic coast beach mice, despite these populations being separated by >1,000 km. Together, our results suggest that this identified Agouti enhancer allele has been maintained in mainland populations as standing genetic variation and from there, has spread to and been selected in two independent beach mouse lineages, thereby facilitating their rapid and parallel evolution.
34
Santander MD, Maronna MM, Ryan JF, Andrade SCS. The state of Medusozoa genomics: current evidence and future challenges. Gigascience 2022; 11:6586816. [PMID: 35579552 PMCID: PMC9112765 DOI: 10.1093/gigascience/giac036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/10/2021] [Revised: 02/18/2022] [Accepted: 03/15/2022] [Indexed: 12/13/2022] Open
Abstract
Medusozoa is a widely distributed ancient lineage that harbors one-third of Cnidaria diversity divided into 4 classes. This clade is characterized by the succession of stages and modes of reproduction during metagenic lifecycles, and includes some of the most plastic body plans and life cycles among animals. The characterization of traditional genomic features, such as chromosome numbers and genome sizes, was rather overlooked in Medusozoa and many evolutionary questions still remain unanswered. Modern genomic DNA sequencing in this group started in 2010 with the publication of the Hydra vulgaris genome and has experienced an exponential increase in the past 3 years. Therefore, an update of the state of Medusozoa genomics is warranted. We reviewed different sources of evidence, including cytogenetic records and high-throughput sequencing projects. We focused on 4 main topics that would be relevant for the broad Cnidaria research community: (i) taxonomic coverage of genomic information; (ii) continuity, quality, and completeness of high-throughput sequencing datasets; (iii) overview of the Medusozoa specific research questions approached with genomics; and (iv) the accessibility of data and metadata. We highlight a lack of standardization in genomic projects and their reports, and reinforce a series of recommendations to enhance future collaborative research.
Affiliation(s)
- Mylena D Santander
- Correspondence address. Mylena D. Santander, Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade São Paulo, 277 Rua do Matão, Cidade Universitária, São Paulo 05508-090, Brazil. E-mail:
- Maximiliano M Maronna
- Correspondence address. Maximiliano M. Maronna, Departamento de Zoologia, Instituto de Biociências, Universidade de São Paulo, 101 Rua do Matão Cidade Universitária, São Paulo 05508-090, Brazil. E-mail:
- Joseph F Ryan
- Whitney Laboratory for Marine Bioscience, University of Florida, 9505 Ocean Shore Blvd, St. Augustine, FL 32080, USA
- Department of Biology, University of Florida, 220 Bartram Hall, Gainesville, FL 32611, USA
- Sónia C S Andrade
- Departamento de Genética e Biologia Evolutiva, Instituto de Biociências, Universidade São Paulo, 277 Rua do Matão, Cidade Universitária, São Paulo 05508-090, Brazil
35
Alexandre NM, Haji D, Bakhtiari M, Chatla K, Aguilar JM, Arzumanova K, Whiteman NK. A Reference Genome Assembly of Hybrid-Derived California Wild Radish (Raphanus sativus × raphanistrum). J Hered 2022; 113:197-204. [PMID: 35575080 PMCID: PMC9113464 DOI: 10.1093/jhered/esab076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/21/2021] [Accepted: 11/24/2021] [Indexed: 01/30/2023] Open
Abstract
For agriculturally important plants, pollination and herbivory are 2 ecological factors that play into the success of crop yields. Each is also important in natural environments where invasive plants and their effect on species interactions may alter the native ecology. The California Wild Radish (Raphanus sativus × raphanistrum), a hybrid derived from an agriculturally important crop and a nonnative cultivar, is common in California. Remarkably, it has recently replaced wild populations of both progenitor species. Experiments on phenotypic variation for petal color and antiherbivore defenses suggest both pairs of polymorphisms are maintained as a result of pollinator- and herbivore-mediated natural selection. This species provides an opportunity to understand how natural selection shapes the evolution of ecologically important traits when traits are constrained by 2 opposing forces. Here we provide the genome assembly of the California Wild Radish displaying improvement to currently existing genomes for agronomically important crucifers. This genome sequence provides the tools to dissect the genomic architecture of traits related to herbivory and pollination using natural variation in the wild as well as the ability to infer demographic and selective history in the context of hybridization. Study systems like these will improve our understanding and predictions of evolutionary change for correlated traits.
Affiliation(s)
- Nicolas M Alexandre
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA
- Diler Haji
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA
- Moe Bakhtiari
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA
- Kamalakar Chatla
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA
- Jessica M Aguilar
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA
- Ksenia Arzumanova
- Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA
- Noah K Whiteman
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA, USA
36
Nurk S, Koren S, Rhie A, Rautiainen M, Bzikadze AV, Mikheenko A, Vollger MR, Altemose N, Uralsky L, Gershman A, Aganezov S, Hoyt SJ, Diekhans M, Logsdon GA, Alonge M, Antonarakis SE, Borchers M, Bouffard GG, Brooks SY, Caldas GV, Chen NC, Cheng H, Chin CS, Chow W, de Lima LG, Dishuck PC, Durbin R, Dvorkina T, Fiddes IT, Formenti G, Fulton RS, Fungtammasan A, Garrison E, Grady PG, Graves-Lindsay TA, Hall IM, Hansen NF, Hartley GA, Haukness M, Howe K, Hunkapiller MW, Jain C, Jain M, Jarvis ED, Kerpedjiev P, Kirsche M, Kolmogorov M, Korlach J, Kremitzki M, Li H, Maduro VV, Marschall T, McCartney AM, McDaniel J, Miller DE, Mullikin JC, Myers EW, Olson ND, Paten B, Peluso P, Pevzner PA, Porubsky D, Potapova T, Rogaev EI, Rosenfeld JA, Salzberg SL, Schneider VA, Sedlazeck FJ, Shafin K, Shew CJ, Shumate A, Sims Y, Smit AFA, Soto DC, Sović I, Storer JM, Streets A, Sullivan BA, Thibaud-Nissen F, Torrance J, Wagner J, Walenz BP, Wenger A, Wood JMD, Xiao C, Yan SM, Young AC, Zarate S, Surti U, McCoy RC, Dennis MY, Alexandrov IA, Gerton JL, O’Neill RJ, Timp W, Zook JM, Schatz MC, Eichler EE, Miga KH, Phillippy AM. The complete sequence of a human genome. Science 2022; 376:44-53. [PMID: 35357919 PMCID: PMC9186530 DOI: 10.1126/science.abj6987] [Citation(s) in RCA: 1034] [Impact Index Per Article: 517.0] [Indexed: 12/11/2022]
Abstract
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
Affiliation(s)
- Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Mikko Rautiainen
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego; La Jolla, CA, USA
- Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
- Mitchell R. Vollger
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Nicolas Altemose
- Department of Bioengineering, University of California, Berkeley; Berkeley, CA, USA
- Lev Uralsky
- Sirius University of Science and Technology; Sochi, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
- Ariel Gershman
- Department of Molecular Biology and Genetics, Johns Hopkins University; Baltimore, MD, USA
- Sergey Aganezov
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Savannah J. Hoyt
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
- Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Michael Alonge
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Gerard G. Bouffard
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Shelise Y. Brooks
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Gina V. Caldas
- Department of Molecular and Cell Biology, University of California, Berkeley; Berkeley, CA, USA
- Nae-Chyun Chen
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Haoyu Cheng
- Department of Data Sciences, Dana-Farber Cancer Institute; Boston, MA
- Department of Biomedical Informatics, Harvard Medical School; Boston, MA
- Philip C. Dishuck
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Richard Durbin
- Wellcome Sanger Institute; Cambridge, UK
- Department of Genetics, University of Cambridge; Cambridge, UK
- Tatiana Dvorkina
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
- Giulio Formenti
- Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University; New York, NY, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
- Robert S. Fulton
- Department of Genetics, Washington University School of Medicine; St. Louis, MO, USA
- Erik Garrison
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- University of Tennessee Health Science Center; Memphis, TN, USA
- Patrick G.S. Grady
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
- Ira M. Hall
- Department of Genetics, Yale University School of Medicine; New Haven, CT, USA
- Nancy F. Hansen
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Gabrielle A. Hartley
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
- Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Chirag Jain
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Department of Computational and Data Sciences, Indian Institute of Science; Bangalore KA, India
- Miten Jain
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Erich D. Jarvis
- Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University; New York, NY, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
- Melanie Kirsche
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA, USA
- Milinn Kremitzki
- McDonnell Genome Institute, Washington University in St. Louis; St. Louis, MO, USA
- Heng Li
- Department of Data Sciences, Dana-Farber Cancer Institute; Boston, MA
- Department of Biomedical Informatics, Harvard Medical School; Boston, MA
- Valerie V. Maduro
- Undiagnosed Diseases Program, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Tobias Marschall
- Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics; Düsseldorf, Germany
- Ann M. McCartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Jennifer McDaniel
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
- Danny E. Miller
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children’s Hospital; Seattle, WA, USA
- James C. Mullikin
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Comparative Genomics Analysis Unit, Cancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Eugene W. Myers
- Max-Planck Institute of Molecular Cell Biology and Genetics; Dresden, Germany
- Nathan D. Olson
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Pavel A. Pevzner
- Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA, USA
- David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Tamara Potapova
- Stowers Institute for Medical Research; Kansas City, MO, USA
- Evgeny I. Rogaev
- Sirius University of Science and Technology; Sochi, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
- Department of Psychiatry, University of Massachusetts Medical School; Worcester, MA, USA
- Faculty of Biology, Lomonosov Moscow State University; Moscow, Russia
- Steven L. Salzberg
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
- Valerie A. Schneider
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
- Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine; Houston TX, USA
- Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Colin J. Shew
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
- Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
- Ying Sims
- Wellcome Sanger Institute; Cambridge, UK
- Daniela C. Soto
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
- Ivan Sović
- Pacific Biosciences; Menlo Park, CA, USA
- Digital BioLogic d.o.o.; Ivanić-Grad, Croatia
- Aaron Streets
- Department of Bioengineering, University of California, Berkeley; Berkeley, CA, USA
- Chan Zuckerberg Biohub; San Francisco, CA, USA
- Beth A. Sullivan
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine; Durham, NC, USA
- Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
- Justin Wagner
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
- Brian P. Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health; Bethesda, MD, USA
- Stephanie M. Yan
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
- Alice C. Young
- NIH Intramural Sequencing Center, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD, USA
- Samantha Zarate
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Urvashi Surti
- Department of Pathology, University of Pittsburgh; Pittsburgh, PA, USA
- Rajiv C. McCoy
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
- Megan Y. Dennis
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis; CA, USA
- Ivan A. Alexandrov
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University; Saint Petersburg, Russia
- Vavilov Institute of General Genetics; Moscow, Russia
- Research Center of Biotechnology of the Russian Academy of Sciences; Moscow, Russia
- Jennifer L. Gerton
- Stowers Institute for Medical Research; Kansas City, MO, USA
- Department of Biochemistry and Molecular Biology, University of Kansas Medical School; Kansas City, MO, USA
- Rachel J. O’Neill
- Institute for Systems Genomics and Department of Molecular and Cell Biology, University of Connecticut; Storrs, CT, USA
- Winston Timp
- Department of Molecular Biology and Genetics, Johns Hopkins University; Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University; Baltimore, MD, USA
- Justin M. Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology; Gaithersburg, MD, USA
- Michael C. Schatz
- Department of Computer Science, Johns Hopkins University; Baltimore, MD, USA
- Department of Biology, Johns Hopkins University; Baltimore, MD, USA
- Evan E. Eichler
- Department of Genome Sciences, University of Washington School of Medicine; Seattle, WA, USA
- Howard Hughes Medical Institute; Chevy Chase, MD, USA
- Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz; Santa Cruz, CA, USA
- Department of Biomolecular Engineering, University of California Santa Cruz, CA, USA
- Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health; Bethesda, MD USA
37
Schmitz MT, Sandoval K, Chen CP, Mostajo-Radji MA, Seeley WW, Nowakowski TJ, Ye CJ, Paredes MF, Pollen AA. The development and evolution of inhibitory neurons in primate cerebrum. Nature 2022; 603:871-877. [PMID: 35322231 PMCID: PMC8967711 DOI: 10.1038/s41586-022-04510-w] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 03/19/2021] [Accepted: 02/01/2022] [Indexed: 12/14/2022]
Abstract
Neuroanatomists have long speculated that expanded primate brains contain an increased morphological diversity of inhibitory neurons (INs) [1], and recent studies have identified primate-specific neuronal populations at the molecular level [2]. However, we know little about the developmental mechanisms that specify evolutionarily novel cell types in the brain. Here, we reconstruct gene expression trajectories specifying INs generated throughout the neurogenic period in macaques and mice by analysing the transcriptomes of 250,181 cells. We find that the initial classes of INs generated prenatally are largely conserved among mammals. Nonetheless, we identify two contrasting developmental mechanisms for specifying evolutionarily novel cell types during prenatal development. First, we show that recently identified primate-specific TAC3 striatal INs are specified by a unique transcriptional programme in progenitors followed by induction of a distinct suite of neuropeptides and neurotransmitter receptors in new-born neurons. Second, we find that multiple classes of transcriptionally conserved olfactory bulb (OB)-bound precursors are redirected to expanded primate white matter and striatum. These classes include a novel peristriatal class of striatum laureatum neurons that resemble dopaminergic periglomerular cells of the OB. We propose an evolutionary model in which conserved initial classes of neurons supplying the smaller primate OB are reused in the enlarged striatum and cortex. Together, our results provide a unified developmental taxonomy of initial classes of mammalian INs and reveal multiple developmental mechanisms for neural cell type evolution.
Evolutionary modelling shows that an initial set of inhibitory neurons serving olfactory bulbs may have been repurposed to diversify the taxonomy of interneurons found in the expanded striata and cortices in primates.
Affiliation(s)
- Matthew T Schmitz
- Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Kadellyn Sandoval
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Christopher P Chen
- Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Mohammed A Mostajo-Radji
- Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- William W Seeley
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Tomasz J Nowakowski
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Department of Anatomy, University of California, San Francisco, San Francisco, CA, USA
- Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, San Francisco, CA, USA
- Department of Neurological Surgery, University of California, San Francisco, San Francisco, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Chun Jimmie Ye
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
- Mercedes F Paredes
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Alex A Pollen
- Eli and Edythe Broad Center for Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
- Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
38
Hiraoka S, Sumida T, Hirai M, Toyoda A, Kawagucci S, Yokokawa T, Nunoura T. Diverse DNA modification in marine prokaryotic and viral communities. Nucleic Acids Res 2022; 50:1531-1550. [PMID: 35051998 PMCID: PMC8919816 DOI: 10.1093/nar/gkab1292] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/01/2021] [Revised: 11/30/2021] [Accepted: 12/17/2021] [Indexed: 11/15/2022] Open
Abstract
DNA chemical modifications, including methylation, are widespread and play important roles in prokaryotes and viruses. However, current knowledge of these modification systems is severely biased towards a limited number of culturable prokaryotes, despite the fact that a vast majority of microorganisms have not yet been cultured. Here, using single-molecule real-time sequencing, we conducted culture-independent 'metaepigenomic' analyses (an integrated analysis of metagenomics and epigenomics) of marine microbial communities. A total of 233 and 163 metagenomic-assembled genomes (MAGs) were constructed from diverse prokaryotes and viruses, respectively, and 220 modified motifs and 276 DNA methyltransferases (MTases) were identified. Most of the MTase genes were not genetically linked with the endonuclease genes predicted to be involved in defense mechanisms against extracellular DNA. The MTase-motif correspondence found in the MAGs revealed 10 novel pairs, 5 of which showed novel specificities and experimentally confirmed the catalytic specificities of the MTases. We revealed novel alternative specificities in MTases that are highly conserved in Alphaproteobacteria, which may enhance our understanding of the co-evolutionary history of the methylation systems and the genomes. Our findings highlight diverse unexplored DNA modifications that potentially affect the ecology and evolution of prokaryotes and viruses in nature.
Affiliation(s)
- Satoshi Hiraoka
- Research Center for Bioscience and Nanoscience (CeBN), Research Institute for Marine Resources Utilization, Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka, Kanagawa 237-0061, Japan
| | - Tomomi Sumida
- Research Center for Bioscience and Nanoscience (CeBN), Research Institute for Marine Resources Utilization, Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka, Kanagawa 237-0061, Japan
| | - Miho Hirai
- Institute for Extra-cutting-edge Science and Technology Avant-garde Research (X-star), Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka, Kanagawa 237-0061, Japan
| | - Atsushi Toyoda
- Advanced Genomics Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
| | - Shinsuke Kawagucci
- Institute for Extra-cutting-edge Science and Technology Avant-garde Research (X-star), Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka, Kanagawa 237-0061, Japan; Marine Biodiversity and Environmental Assessment Research Center (BioEnv), Research Institute for Global Change (RIGC), Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka, Kanagawa 237-0061, Japan
| | - Taichi Yokokawa
- Institute for Extra-cutting-edge Science and Technology Avant-garde Research (X-star), Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka, Kanagawa 237-0061, Japan
| | - Takuro Nunoura
- Research Center for Bioscience and Nanoscience (CeBN), Research Institute for Marine Resources Utilization, Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka, Kanagawa 237-0061, Japan
| |
|
39
|
Pucker B, Irisarri I, de Vries J, Xu B. Plant genome sequence assembly in the era of long reads: Progress, challenges and future directions. QUANTITATIVE PLANT BIOLOGY 2022; 3:e5. [PMID: 37077982 PMCID: PMC10095996 DOI: 10.1017/qpb.2021.18] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 06/29/2021] [Revised: 11/24/2021] [Accepted: 12/21/2021] [Indexed: 05/03/2023]
Abstract
Third-generation long-read sequencing is transforming plant genomics. Oxford Nanopore Technologies and Pacific Biosciences are offering competing long-read sequencing technologies and enable plant scientists to investigate even large and complex plant genomes. Sequencing projects can be conducted by single research groups and sequences of smaller plant genomes can be completed within days. This also resulted in an increased investigation of genomes from multiple species in large scale to address fundamental questions associated with the origin and evolution of land plants. Increased accessibility of sequencing devices and user-friendly software allows more researchers to get involved in genomics. Current challenges are accurately resolving diploid or polyploid genome sequences and better accounting for the intra-specific diversity by switching from the use of single reference genome sequences to a pangenome graph.
Affiliation(s)
- Boas Pucker
- Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
- Institute of Plant Biology & Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Author for correspondence: Boas Pucker
| | - Iker Irisarri
- Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Göttingen, Germany
- Campus Institute Data Science (CIDAS), University of Goettingen, Göttingen, Germany
| | - Jan de Vries
- Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Göttingen, Germany
- Campus Institute Data Science (CIDAS), University of Goettingen, Göttingen, Germany
- Department of Applied Bioinformatics, Göttingen Center for Molecular Biosciences (GZMB), University of Goettingen, Göttingen, Germany
| | - Bo Xu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
| |
|
40
|
Salama SR. The Complexity of the Mammalian Transcriptome. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2022; 1363:11-22. [PMID: 35220563 DOI: 10.1007/978-3-030-92034-0_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/16/2022]
Abstract
Draft genome assemblies for multiple mammalian species combined with new technologies to map transcripts from diverse RNA samples to these genomes developed in the early 2000s revealed that the mammalian transcriptome was vastly larger and more complex than previously anticipated. Efforts to comprehensively catalog the identity and features of transcripts present in a variety of species, tissues and cell lines revealed that a large fraction of the mammalian genome is transcribed in at least some settings. A large number of these transcripts encode long non-coding RNAs (lncRNAs). Many lncRNAs overlap or are anti-sense to protein coding genes and others overlap small RNAs. However, a large number are independent of any previously known mRNA or small RNA. While the functions of a majority of these lncRNAs are unknown, many appear to play roles in gene regulation. Many lncRNAs have species-specific and cell type specific expression patterns and their evolutionary origins are varied. While technological challenges have hindered getting a full picture of the diversity and transcript structure of all of the transcripts arising from lncRNA loci, new technologies including single molecule nanopore sequencing and single cell RNA sequencing promise to generate a comprehensive picture of the mammalian transcriptome.
Affiliation(s)
- Sofie R Salama
- UC Santa Cruz Genomics Institute, Department of Biomolecular Engineering and Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA, USA.
| |
|
41
|
Shafin K, Pesout T, Chang PC, Nattestad M, Kolesnikov A, Goel S, Baid G, Kolmogorov M, Eizenga JM, Miga KH, Carnevali P, Jain M, Carroll A, Paten B. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat Methods 2021; 18:1322-1332. [PMID: 34725481 PMCID: PMC8571015 DOI: 10.1038/s41592-021-01299-w] [Citation(s) in RCA: 106] [Impact Index Per Article: 35.3] [Received: 03/08/2021] [Accepted: 09/06/2021] [Indexed: 01/15/2023]
Abstract
Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data have demonstrated a long read length, but current interpretation methods for their novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single-nucleotide-variant identification method at the whole-genome scale and produces high-quality single-nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with superior performance over the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio HiFi-polished).
Affiliation(s)
| | - Trevor Pesout
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | | | | | | | | | | | | | - Karen H Miga
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | - Miten Jain
- UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
| | | | | |
|
42
|
Sætre CLC, Eroukhmanoff F, Rönkä K, Kluen E, Thorogood R, Torrance J, Tracey A, Chow W, Pelan S, Howe K, Jakobsen KS, Tørresen OK. A Chromosome-Level Genome Assembly of the Reed Warbler (Acrocephalus scirpaceus). Genome Biol Evol 2021; 13:6367782. [PMID: 34499122 PMCID: PMC8459166 DOI: 10.1093/gbe/evab212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/06/2021] [Indexed: 11/13/2022] Open
Abstract
The reed warbler (Acrocephalus scirpaceus) is a long-distance migrant passerine with a wide distribution across Eurasia. This species has fascinated researchers for decades, especially its role as host of a brood parasite, and its capacity for rapid phenotypic change in the face of climate change. Currently, it is expanding its range northwards in Europe, and is altering its migratory behavior in certain areas. Thus, there is great potential to discover signs of recent evolution and its impact on the genomic composition of the reed warbler. Here, we present a high-quality reference genome for the reed warbler, based on PacBio, 10×, and Hi-C sequencing. The genome has an assembly size of 1,075,083,815 bp with a scaffold N50 of 74,438,198 bp and a contig N50 of 12,742,779 bp. BUSCO analysis using aves_odb10 as a model showed that 95.7% of BUSCO genes were complete. We found unequivocal evidence of two separate macrochromosomal fusions in the reed warbler genome, in addition to the previously identified fusion between chromosome Z and a part of chromosome 4A in the Sylvioidea superfamily. We annotated 14,645 protein-coding genes, and a BUSCO analysis of the protein sequences indicated 97.5% completeness. This reference genome will serve as an important resource, and will provide new insights into the genomic effects of evolutionary drivers such as coevolution, range expansion, and adaptations to climate change, as well as chromosomal rearrangements in birds.
Affiliation(s)
| | | | - Katja Rönkä
- HiLIFE Helsinki Institute of Life Sciences, University of Helsinki, Finland; Research Programme in Organismal and Evolutionary Biology, Faculty of Biological and Environmental Sciences, University of Helsinki, Finland
| | - Edward Kluen
- HiLIFE Helsinki Institute of Life Sciences, University of Helsinki, Finland; Research Programme in Organismal and Evolutionary Biology, Faculty of Biological and Environmental Sciences, University of Helsinki, Finland
| | - Rose Thorogood
- HiLIFE Helsinki Institute of Life Sciences, University of Helsinki, Finland; Research Programme in Organismal and Evolutionary Biology, Faculty of Biological and Environmental Sciences, University of Helsinki, Finland
| | - James Torrance
- Tree of Life, Wellcome Sanger Institute, Cambridge, United Kingdom
| | - Alan Tracey
- Tree of Life, Wellcome Sanger Institute, Cambridge, United Kingdom
| | - William Chow
- Tree of Life, Wellcome Sanger Institute, Cambridge, United Kingdom
| | - Sarah Pelan
- Tree of Life, Wellcome Sanger Institute, Cambridge, United Kingdom
| | - Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Cambridge, United Kingdom
| | - Kjetill S Jakobsen
- Centre for Ecological and Evolutionary Synthesis, University of Oslo, Norway
| | - Ole K Tørresen
- Centre for Ecological and Evolutionary Synthesis, University of Oslo, Norway
| |
|
43
|
Abstract
The reference human genome sequence is inarguably the most important and widely used resource in the fields of human genetics and genomics. It has transformed the conduct of biomedical sciences and brought invaluable benefits to the understanding and improvement of human health. However, the commonly used reference sequence has profound limitations, because across much of its span, it represents the sequence of just one human haplotype. This single, monoploid reference structure presents a critical barrier to representing the broad genomic diversity in the human population. In this review, we discuss the modernization of the reference human genome sequence to a more complete reference of human genomic diversity, known as a human pangenome.
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute and Department of Biomedical Engineering, University of California, Santa Cruz, California 95064, USA
| | - Ting Wang
- Department of Genetics, Edison Family Center for Genome Sciences and Systems Biology, and McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| |
|
44
|
Kim BY, Wang JR, Miller DE, Barmina O, Delaney E, Thompson A, Comeault AA, Peede D, D'Agostino ERR, Pelaez J, Aguilar JM, Haji D, Matsunaga T, Armstrong EE, Zych M, Ogawa Y, Stamenković-Radak M, Jelić M, Veselinović MS, Tanasković M, Erić P, Gao JJ, Katoh TK, Toda MJ, Watabe H, Watada M, Davis JS, Moyle LC, Manoli G, Bertolini E, Košťál V, Hawley RS, Takahashi A, Jones CD, Price DK, Whiteman N, Kopp A, Matute DR, Petrov DA. Highly contiguous assemblies of 101 drosophilid genomes. eLife 2021; 10:e66405. [PMID: 34279216 PMCID: PMC8337076 DOI: 10.7554/elife.66405] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Received: 01/11/2021] [Accepted: 07/16/2021] [Indexed: 12/13/2022] Open
Abstract
Over 100 years of studies in Drosophila melanogaster and related species in the genus Drosophila have facilitated key discoveries in genetics, genomics, and evolution. While high-quality genome assemblies exist for several species in this group, they only encompass a small fraction of the genus. Recent advances in long-read sequencing allow high-quality genome assemblies for tens or even hundreds of species to be efficiently generated. Here, we utilize Oxford Nanopore sequencing to build an open community resource of genome assemblies for 101 lines of 93 drosophilid species encompassing 14 species groups and 35 sub-groups. The genomes are highly contiguous and complete, with an average contig N50 of 10.5 Mb and greater than 97% BUSCO completeness in 97/101 assemblies. We show that Nanopore-based assemblies are highly accurate in coding regions, particularly with respect to coding insertions and deletions. These assemblies, along with a detailed laboratory protocol and assembly pipelines, are released as a public resource and will serve as a starting point for addressing broad questions of genetics, ecology, and evolution at the scale of hundreds of species.
Affiliation(s)
- Bernard Y Kim
- Department of Biology, Stanford University, Stanford, United States
| | - Jeremy R Wang
- Department of Genetics, University of North Carolina, Chapel Hill, United States
| | - Danny E Miller
- Department of Pediatrics, Division of Genetic Medicine, University of Washington and Seattle Children’s Hospital, Seattle, United States
| | - Olga Barmina
- Department of Evolution and Ecology, University of California Davis, Davis, United States
| | - Emily Delaney
- Department of Evolution and Ecology, University of California Davis, Davis, United States
| | - Ammon Thompson
- Department of Evolution and Ecology, University of California Davis, Davis, United States
| | - Aaron A Comeault
- School of Natural Sciences, Bangor University, Bangor, United Kingdom
| | - David Peede
- Biology Department, University of North Carolina, Chapel Hill, United States
| | | | - Julianne Pelaez
- Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
| | - Jessica M Aguilar
- Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
| | - Diler Haji
- Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
| | - Teruyuki Matsunaga
- Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
| | | | - Molly Zych
- Molecular and Cellular Biology Program, University of Washington, Seattle, United States
| | - Yoshitaka Ogawa
- Department of Biological Sciences, Tokyo Metropolitan University, Hachioji, Japan
| | | | - Mihailo Jelić
- Faculty of Biology, University of Belgrade, Belgrade, Serbia
| | | | - Marija Tanasković
- University of Belgrade, Institute for Biological Research "Siniša Stanković", National Institute of Republic of Serbia, Belgrade, Serbia
| | - Pavle Erić
- University of Belgrade, Institute for Biological Research "Siniša Stanković", National Institute of Republic of Serbia, Belgrade, Serbia
| | - Jian-Jun Gao
- School of Ecology and Environmental Science, Yunnan University, Kunming, China
| | - Takehiro K Katoh
- School of Ecology and Environmental Science, Yunnan University, Kunming, China
| | | | - Hideaki Watabe
- Biological Laboratory, Sapporo College, Hokkaido University of Education, Sapporo, Japan
| | - Masayoshi Watada
- Graduate School of Science and Engineering, Ehime University, Matsuyama, Japan
| | - Jeremy S Davis
- Department of Biology, University of Kentucky, Lexington, United States
| | - Leonie C Moyle
- Department of Biology, Indiana University, Bloomington, United States
| | - Giulia Manoli
- Neurobiology and Genetics, Theodor Boveri Institute, Biocentre, University of Würzburg, Würzburg, Germany
| | - Enrico Bertolini
- Neurobiology and Genetics, Theodor Boveri Institute, Biocentre, University of Würzburg, Würzburg, Germany
| | - Vladimír Košťál
- Institute of Entomology, Biology Centre, Academy of Sciences of the Czech Republic, Prague, Czech Republic
| | - R Scott Hawley
- Department of Molecular and Integrative Physiology, University of Kansas Medical Center, Stowers Institute for Medical Research, Kansas City, United States
| | - Aya Takahashi
- Department of Biological Sciences, Tokyo Metropolitan University, Hachioji, Japan
| | - Corbin D Jones
- Biology Department, University of North Carolina, Chapel Hill, United States
| | - Donald K Price
- School of Life Science, University of Nevada, Las Vegas, United States
| | - Noah Whiteman
- Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
| | - Artyom Kopp
- Department of Evolution and Ecology, University of California Davis, Davis, United States
| | - Daniel R Matute
- Biology Department, University of North Carolina, Chapel Hill, United States
| | - Dmitri A Petrov
- Department of Biology, Stanford University, Stanford, United States
| |
|
45
|
Mao Y, Catacchio CR, Hillier LW, Porubsky D, Li R, Sulovari A, Fernandes JD, Montinaro F, Gordon DS, Storer JM, Haukness M, Fiddes IT, Murali SC, Dishuck PC, Hsieh P, Harvey WT, Audano PA, Mercuri L, Piccolo I, Antonacci F, Munson KM, Lewis AP, Baker C, Underwood JG, Hoekzema K, Huang TH, Sorensen M, Walker JA, Hoffman J, Thibaud-Nissen F, Salama SR, Pang AWC, Lee J, Hastie AR, Paten B, Batzer MA, Diekhans M, Ventura M, Eichler EE. A high-quality bonobo genome refines the analysis of hominid evolution. Nature 2021; 594:77-81. [PMID: 33953399 PMCID: PMC8172381 DOI: 10.1038/s41586-021-03519-x] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 08/21/2020] [Accepted: 04/07/2021] [Indexed: 12/17/2022]
Abstract
The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation [1,2]. Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes [1,3–5] and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome. A high-quality bonobo genome assembly provides insights into incomplete lineage sorting in hominids and its relevance to gene evolution and the genetic relationship among living hominids.
Affiliation(s)
- Yafei Mao
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - LaDeana W Hillier
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Ruiyang Li
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Arvis Sulovari
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Jason D Fernandes
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Francesco Montinaro
- Department of Biology, University of Bari, Bari, Italy; Estonian Biocentre, Institute of Genomics, Tartu, Estonia
| | - David S Gordon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | | | - Marina Haukness
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Ian T Fiddes
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Shwetha Canchi Murali
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| | - Philip C Dishuck
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - PingHsun Hsieh
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - William T Harvey
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Peter A Audano
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | | | | | - Katherine M Munson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Carl Baker
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | | | - Kendra Hoekzema
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Tzu-Hsueh Huang
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Melanie Sorensen
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Jerilyn A Walker
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
| | - Jinna Hoffman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Françoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Sofie R Salama
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA; Howard Hughes Medical Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | | | - Joyce Lee
- Bionano Genomics, San Diego, CA, USA
| | | | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Mark A Batzer
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, CA, USA
| | - Mario Ventura
- Department of Biology, University of Bari, Bari, Italy.
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
| |
|
46
|
Warren WC, Harris RA, Haukness M, Fiddes IT, Murali SC, Fernandes J, Dishuck PC, Storer JM, Raveendran M, Hillier LW, Porubsky D, Mao Y, Gordon D, Vollger MR, Lewis AP, Munson KM, DeVogelaere E, Armstrong J, Diekhans M, Walker JA, Tomlinson C, Graves-Lindsay TA, Kremitzki M, Salama SR, Audano PA, Escalona M, Maurer NW, Antonacci F, Mercuri L, Maggiolini FAM, Catacchio CR, Underwood JG, O'Connor DH, Sanders AD, Korbel JO, Ferguson B, Kubisch HM, Picker L, Kalin NH, Rosene D, Levine J, Abbott DH, Gray SB, Sanchez MM, Kovacs-Balint ZA, Kemnitz JW, Thomasy SM, Roberts JA, Kinnally EL, Capitanio JP, Skene JHP, Platt M, Cole SA, Green RE, Ventura M, Wiseman RW, Paten B, Batzer MA, Rogers J, Eichler EE. Sequence diversity analyses of an improved rhesus macaque genome enhance its biomedical utility. Science 2021; 370:370/6523/eabc6617. [PMID: 33335035 DOI: 10.1126/science.abc6617] [Citation(s) in RCA: 73] [Impact Index Per Article: 24.3] [Received: 05/07/2020] [Accepted: 10/29/2020] [Indexed: 12/15/2022]
Abstract
The rhesus macaque (Macaca mulatta) is the most widely studied nonhuman primate (NHP) in biomedical research. We present an updated reference genome assembly (Mmul_10, contig N50 = 46 Mbp) that increases the sequence contiguity 120-fold and annotate it using 6.5 million full-length transcripts, thus improving our understanding of gene content, isoform diversity, and repeat organization. With the improved assembly of segmental duplications, we discovered new lineage-specific genes and expanded gene families that are potentially informative in studies of evolution and disease susceptibility. Whole-genome sequencing (WGS) data from 853 rhesus macaques identified 85.7 million single-nucleotide variants (SNVs) and 10.5 million indel variants, including potentially damaging variants in genes associated with human autism and developmental delay, providing a framework for developing noninvasive NHP models of human disease.
Affiliation(s)
- Wesley C Warren
- Department of Animal Sciences, Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA; Department of Surgery, School of Medicine, University of Missouri, Columbia, MO 65211, USA; Institute of Data Science and Informatics, University of Missouri, Columbia, MO 65211, USA
| | - R Alan Harris
- Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Marina Haukness
- Computational Genomics Laboratory, University of California-Santa Cruz, Santa Cruz, CA 95064, USA
| | | | - Shwetha C Murali
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| | - Jason Fernandes
- Department of Biomolecular Engineering, University of California-Santa Cruz, Santa Cruz, CA 95064, USA
| | - Philip C Dishuck
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Jessica M Storer
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA; Institute for Systems Biology, Seattle, WA 98109, USA
| | - Muthuswamy Raveendran
- Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - LaDeana W Hillier
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - David Porubsky
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Yafei Mao
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - David Gordon
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| | - Mitchell R Vollger
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Alexandra P Lewis
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Katherine M Munson
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Elizabeth DeVogelaere
- Computational Genomics Laboratory, University of California-Santa Cruz, Santa Cruz, CA 95064, USA
| | - Joel Armstrong
- Computational Genomics Laboratory, University of California-Santa Cruz, Santa Cruz, CA 95064, USA
| | - Mark Diekhans
- Computational Genomics Laboratory, University of California-Santa Cruz, Santa Cruz, CA 95064, USA
| | - Jerilyn A Walker
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
| | - Chad Tomlinson
- McDonnell Genome Institute, Washington University, St. Louis, MO 63108, USA
| | | | - Milinn Kremitzki
- McDonnell Genome Institute, Washington University, St. Louis, MO 63108, USA
| | - Sofie R Salama
- Department of Biomolecular Engineering, University of California-Santa Cruz, Santa Cruz, CA 95064, USA
| | - Peter A Audano
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
| | - Merly Escalona
- Department of Biomolecular Engineering, University of California-Santa Cruz, Santa Cruz, CA 95064, USA
| | - Nicholas W Maurer
- Department of Biomolecular Engineering, University of California-Santa Cruz, Santa Cruz, CA 95064, USA
| | | | - Ludovica Mercuri
- Department of Biology, University of Bari 'Aldo Moro', 70125 Bari, Italy
| | | | | | | | - David H O'Connor
- Department of Pathology and Laboratory Medicine, Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, WI 53711, USA
| | - Ashley D Sanders
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Jan O Korbel
- European Molecular Biology Laboratory, Genome Biology Unit, Heidelberg, Germany
| | - Betsy Ferguson
- Division of Genetics, Oregon National Primate Research Center, Oregon Health and Science University, Beaverton, OR 97006, USA
| | | | - Louis Picker
- Oregon National Primate Research Center and Vaccine and Gene Therapy Institute, Oregon Health Sciences University, Beaverton, OR 97006, USA
| | - Ned H Kalin
- Department of Psychiatry, University of Wisconsin School of Medicine and Public Health, Madison, WI 53719, USA
| | - Douglas Rosene
- Department of Anatomy and Neurobiology, Boston University School of Medicine, Boston, MA 02118, USA
| | - Jon Levine
- Department of Neuroscience, University of Wisconsin, Madison, WI 53175, USA.,Wisconsin National Primate Research Center, University of Wisconsin, Madison, WI 53171, USA
| | - David H Abbott
- Wisconsin National Primate Research Center, University of Wisconsin, Madison, WI 53171, USA
- Department of Obstetrics and Gynecology, Wisconsin National Primate Research Center, University of Wisconsin, Madison, WI 53715, USA
| | - Stanton B Gray
- The University of Texas MD Anderson Cancer Center, Michale E. Keeling Center for Comparative Medicine and Research, Bastrop, TX 78602, USA
| | - Mar M Sanchez
- Yerkes National Primate Research Center, Atlanta, GA 30329, USA
- Department of Psychiatry and Behavioral Sciences, Emory University School of Medicine, Atlanta, GA 30329, USA
| | - Joseph W Kemnitz
- Wisconsin National Primate Research Center, University of Wisconsin, Madison, WI 53171, USA
- Department of Cell and Regenerative Biology, University of Wisconsin, Madison, WI 53706, USA
| | - Sara M Thomasy
- Department of Surgical and Radiological Sciences, School of Veterinary Medicine, University of California-Davis, Davis, CA 95616, USA
- Department of Ophthalmology and Vision Science, School of Medicine, University of California-Davis, Davis, CA 95817, USA
| | - Erin L Kinnally
- California National Primate Research Center, Davis, CA 95616, USA
- Department of Psychology, University of California, Davis, CA 95616, USA
| | - John P Capitanio
- California National Primate Research Center, Davis, CA 95616, USA
- Department of Psychology, University of California, Davis, CA 95616, USA
| | - J H Pate Skene
- Department of Neurobiology, Duke University School of Medicine, Durham, NC 27710, USA
| | - Michael Platt
- Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Shelley A Cole
- Population Health Program, Texas Biomedical Research Institute and Southwest National Primate Research Center, San Antonio, TX 78227, USA
| | - Richard E Green
- Department of Biomolecular Engineering, University of California-Santa Cruz, Santa Cruz, CA 95064, USA
| | - Mario Ventura
- Department of Biology, University of Bari 'Aldo Moro', 70125 Bari, Italy
| | - Roger W Wiseman
- Department of Pathology and Laboratory Medicine, Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, WI 53711, USA
| | - Benedict Paten
- Computational Genomics Laboratory, University of California-Santa Cruz, Santa Cruz, CA 95064, USA
| | - Mark A Batzer
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, USA
| | - Jeffrey Rogers
- Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Evan E Eichler
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
|
47
|
Minkin I, Medvedev P. Scalable multiple whole-genome alignment and locally collinear block construction with SibeliaZ. Nat Commun 2020; 11:6327. [PMID: 33303762 PMCID: PMC7728760 DOI: 10.1038/s41467-020-19777-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 09/28/2019] [Accepted: 10/29/2020] [Indexed: 11/29/2022] Open
Abstract
Multiple whole-genome alignment is a challenging problem in bioinformatics. Despite many successes, current methods are not able to keep up with the growing number, length, and complexity of assembled genomes, especially when computational resources are limited. Approaches based on compacted de Bruijn graphs to identify and extend anchors into locally collinear blocks have potential for scalability, but current methods do not scale to mammalian genomes. We present an algorithm, SibeliaZ-LCB, for identifying collinear blocks in closely related genomes based on analysis of the de Bruijn graph. We further incorporate this into a multiple whole-genome alignment pipeline called SibeliaZ. SibeliaZ shows run-time improvements over other methods while maintaining accuracy. On sixteen recently-assembled strains of mice, SibeliaZ runs in under 16 hours on a single machine, while other tools did not run to completion for eight mice within a week. SibeliaZ makes a significant step towards improving scalability of multiple whole-genome alignment and collinear block reconstruction algorithms on a single machine.
Multiple whole-genome alignment is a challenging problem in bioinformatics, especially when computational resources are limited. Here the authors present SibeliaZ, an algorithm and software based on analysis of de Bruijn graphs, which provides improved computational efficiency and scalability.
Affiliation(s)
- Ilia Minkin
- Department of Computer Science and Engineering, The Pennsylvania State University, 506 Wartik Lab University Park, University Park, PA, 16802, USA
| | - Paul Medvedev
- Department of Computer Science and Engineering, The Pennsylvania State University, 506 Wartik Lab University Park, University Park, PA, 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 506 Wartik Lab University Park, University Park, PA, 16802, USA
- Center for Computational Biology and Bioinformatics, The Pennsylvania State University, 506 Wartik Lab University Park, University Park, PA, 16802, USA
|
48
|
Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J, Genereux D, Johnson J, Marinescu VD, Alföldi J, Harris RS, Lindblad-Toh K, Haussler D, Karlsson E, Jarvis ED, Zhang G, Paten B. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature 2020; 587:246-251. [PMID: 33177663 PMCID: PMC7673649 DOI: 10.1038/s41586-020-2871-y] [Citation(s) in RCA: 190] [Impact Index Per Article: 47.5] [Received: 08/24/2019] [Accepted: 07/27/2020] [Indexed: 12/11/2022]
Abstract
New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies [1-3]. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database [4] increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies [5] are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus [6], a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.
Affiliation(s)
- Joel Armstrong
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
| | - Glenn Hickey
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
| | - Ian T Fiddes
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
| | - Adam M Novak
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
| | - Alden Deran
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
| | - Qi Fang
- BGI-Shenzhen, Beishan Industrial Zone, Shenzhen, China
- Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Duo Xie
- BGI-Shenzhen, Beishan Industrial Zone, Shenzhen, China
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, China
| | - Shaohong Feng
- BGI-Shenzhen, Beishan Industrial Zone, Shenzhen, China
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
| | - Josefin Stiller
- Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Diane Genereux
- Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
| | - Jeremy Johnson
- Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
| | - Voichita Dana Marinescu
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - Jessica Alföldi
- Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
| | - Robert S Harris
- Department of Biology, The Pennsylvania State University, University Park, PA, USA
| | - Kerstin Lindblad-Toh
- Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
| | - David Haussler
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Elinor Karlsson
- Broad Institute of Harvard and Massachusetts Institute of Technology (MIT), Cambridge, MA, USA
- Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA, USA
- Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Erich D Jarvis
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Guojie Zhang
- Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
|
49
|
Jung H, Ventura T, Chung JS, Kim WJ, Nam BH, Kong HJ, Kim YO, Jeon MS, Eyun SI. Twelve quick steps for genome assembly and annotation in the classroom. PLoS Comput Biol 2020; 16:e1008325. [PMID: 33180771 PMCID: PMC7660529 DOI: 10.1371/journal.pcbi.1008325] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/13/2022] Open
Abstract
Eukaryotic genome sequencing and de novo assembly, once the exclusive domain of well-funded international consortia, have become increasingly affordable, thus fitting the budgets of individual research groups. Third-generation long-read DNA sequencing technologies are increasingly used, providing extensive genomic toolkits that were once reserved for a few select model organisms. Generating high-quality genome assemblies and annotations for many aquatic species still presents significant challenges due to their large genome sizes, complexity, and high chromosome numbers. Indeed, selecting the most appropriate sequencing and software platforms and annotation pipelines for a new genome project can be daunting because tools often only work in limited contexts. In genomics, generating a high-quality genome assembly/annotation has become an indispensable tool for better understanding the biology of any species. Herein, we state 12 steps to help researchers get started in genome projects by presenting guidelines that are broadly applicable (to any species), sustainable over time, and cover all aspects of genome assembly and annotation projects from start to finish. We review some commonly used approaches, including practical methods to extract high-quality DNA and choices for the best sequencing platforms and library preparations. In addition, we discuss the range of potential bioinformatics pipelines, including structural and functional annotations (e.g., transposable elements and repetitive sequences). This paper also includes information on how to build a wide community for a genome project, the importance of data management, and how to make the data and results Findable, Accessible, Interoperable, and Reusable (FAIR) by submitting them to a public repository and sharing them with the research community.
Affiliation(s)
- Hyungtaek Jung
- School of Biological Sciences, The University of Queensland, St Lucia, Queensland, Australia
- Centre for Agriculture and Bioeconomy, Queensland University of Technology, Brisbane, Queensland, Australia
| | - Tomer Ventura
- Genecology Research Centre, School of Science and Engineering, University of the Sunshine Coast, Sippy Downs, Queensland, Australia
| | - J. Sook Chung
- Institute of Marine and Environmental Technology, University of Maryland Center for Environmental Science, Baltimore, Maryland, United States of America
| | - Woo-Jin Kim
- Genetics and Breeding Research Center, National Institute of Fisheries Science, Geoje, Korea
| | - Bo-Hye Nam
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
| | - Hee Jeong Kong
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
| | - Young-Ok Kim
- Biotechnology Research Division, National Institute of Fisheries Science, Busan, Korea
| | - Min-Seung Jeon
- Department of Life Science, Chung-Ang University, Seoul, Korea
| | - Seong-il Eyun
- Department of Life Science, Chung-Ang University, Seoul, Korea
|
50
|
Mahtani-Williams S, Fulton W, Desvars-Larrive A, Lado S, Elbers JP, Halpern B, Herczeg D, Babocsay G, Lauš B, Nagy ZT, Jablonski D, Kukushkin O, Orozco-terWengel P, Vörös J, Burger PA. Landscape Genomics of a Widely Distributed Snake, Dolichophis caspius (Gmelin, 1789) across Eastern Europe and Western Asia. Genes (Basel) 2020; 11:1218. [PMID: 33080926 PMCID: PMC7603136 DOI: 10.3390/genes11101218] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/31/2020] [Revised: 10/02/2020] [Accepted: 10/15/2020] [Indexed: 11/29/2022] Open
Abstract
Across the distribution of the Caspian whipsnake (Dolichophis caspius), populations have become increasingly disconnected due to habitat alteration. To understand population dynamics and this widespread but locally endangered snake’s adaptive potential, we investigated population structure, admixture, and effective migration patterns. We took a landscape-genomic approach to identify selected genotypes associated with environmental variables relevant to D. caspius. With double-digest restriction-site associated DNA (ddRAD) sequencing of 53 samples resulting in 17,518 single nucleotide polymorphisms (SNPs), we identified 8 clusters within D. caspius reflecting complex evolutionary patterns of the species. Estimated Effective Migration Surfaces (EEMS) revealed higher-than-average gene flow in most of the Balkan Peninsula and lower-than-average gene flow along the middle section of the Danube River. Landscape genomic analysis identified 751 selected genotypes correlated with 7 climatic variables. Isothermality correlated with the highest number of selected genotypes (478) located in 41 genes, followed by annual range (127) and annual mean temperature (87). We conclude that environmental variables, especially the day-to-night temperature oscillation in comparison to the summer-to-winter oscillation, may have an important role in the distribution and adaptation of D. caspius.
Affiliation(s)
- Sarita Mahtani-Williams
- Research Institute of Wildlife Ecology, Vetmeduni Vienna, Savoyenstrasse 1, A-1160 Vienna, Austria
- Cardiff School of Biosciences, Cardiff University, The Sir Martin Evans Building, Museum Ave, Cardiff CF10 3AX, UK
- Fundación Charles Darwin, Avenida Charles Darwin s/n, Casilla 200144, Puerto Ayora EC-200350, Ecuador
| | - William Fulton
- Research Institute of Wildlife Ecology, Vetmeduni Vienna, Savoyenstrasse 1, A-1160 Vienna, Austria
- Cardiff School of Biosciences, Cardiff University, The Sir Martin Evans Building, Museum Ave, Cardiff CF10 3AX, UK
| | - Amelie Desvars-Larrive
- Research Institute of Wildlife Ecology, Vetmeduni Vienna, Savoyenstrasse 1, A-1160 Vienna, Austria
- Institute of Food Safety, Food Technology and Veterinary Public Health, Vetmeduni Vienna, Veterinaerplatz 1, A-1210 Vienna, Austria
- Complexity Science Hub Vienna, Josefstädter Straße 39, A-1080 Vienna, Austria
| | - Sara Lado
- Research Institute of Wildlife Ecology, Vetmeduni Vienna, Savoyenstrasse 1, A-1160 Vienna, Austria
| | - Jean Pierre Elbers
- Research Institute of Wildlife Ecology, Vetmeduni Vienna, Savoyenstrasse 1, A-1160 Vienna, Austria
| | - Bálint Halpern
- MME Birdlife Hungary, Költő utca 21., H-1121 Budapest, Hungary
| | - Dávid Herczeg
- Lendület Evolutionary Ecology Research Group, Centre for Agricultural Research, Plant Protection Institute, Herman Ottó út 15., H-1022 Budapest, Hungary
| | - Gergely Babocsay
- MME Birdlife Hungary, Költő utca 21., H-1121 Budapest, Hungary
- Mátra Museum of the Hungarian Natural History Museum, Kossuth Lajos utca 40., H-3200 Gyöngyös, Hungary
| | - Boris Lauš
- Association HYLA, Lipocac I., No. 7, C-10000 Zagreb, Croatia
| | - Zoltán Tamás Nagy
- Independent Researcher, Hielscherstraße 25, D-13158 Berlin, Germany
| | - Daniel Jablonski
- Department of Zoology, Comenius University in Bratislava, Ilkovičova 6, Mlynská Dolina, S-84215 Bratislava, Slovakia
| | - Oleg Kukushkin
- Department of Biodiversity Studies and Ecological Monitoring, T. I. Vyazemsky Karadag Scientific Station–Nature Reserve–Branch of Institute of Biology of the Southern Seas of the Russian Academy of Sciences, Nauki Street 24, R-298188 Theodosia, Crimea
- Department of Herpetology, Zoological Institute of the Russian Academy of Sciences, Universitetskaya Embankment 1, R-199034 Saint Petersburg, Russia
| | - Pablo Orozco-terWengel
- Cardiff School of Biosciences, Cardiff University, The Sir Martin Evans Building, Museum Ave, Cardiff CF10 3AX, UK
| | - Judit Vörös
- Department of Zoology, Hungarian Natural History Museum, Baross u. 13., H-1088 Budapest, Hungary
- Molecular Taxonomy Laboratory, Hungarian Natural History Museum, Ludovika tér 2-6., H-1083 Budapest, Hungary
- Correspondence: (J.V.); (P.A.B.)
| | - Pamela Anna Burger
- Research Institute of Wildlife Ecology, Vetmeduni Vienna, Savoyenstrasse 1, A-1160 Vienna, Austria
- Correspondence: (J.V.); (P.A.B.)
|