1
|
Roberts JR, Bernstein JM, Austin CC, Hains T, Mata J, Kieras M, Pirro S, Ruane S. Whole snake genomes from eighteen families of snakes (Serpentes: Caenophidia) and their applications to systematics. J Hered 2024; 115:487-497. [PMID: 38722259 DOI: 10.1093/jhered/esae026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 05/08/2024] [Indexed: 08/21/2024] Open
Abstract
We present genome assemblies for 18 snake species representing 18 families (Serpentes: Caenophidia): Acrochordus granulatus, Aparallactus werneri, Boaedon fuliginosus, Calamaria suluensis, Cerberus rynchops, Grayia smithii, Imantodes cenchoa, Mimophis mahfalensis, Oxyrhabdium leporinum, Pareas carinatus, Psammodynastes pulverulentus, Pseudoxenodon macrops, Pseudoxyrhopus heterurus, Sibynophis collaris, Stegonotus admiraltiensis, Toxicocalamus goodenoughensis, Trimeresurus albolabris, and Tropidonophis doriae. From these new genome assemblies, we extracted thousands of loci commonly used in systematic and phylogenomic studies on snakes, including target-capture datasets composed of ultraconserved elements (UCEs) and anchored hybrid enriched loci (AHEs), as well as traditional Sanger loci. Phylogenies inferred from the two target-capture loci datasets were identical to each other and strongly congruent with previously published snake phylogenies. To show the additional utility of these non-model genomes for investigative evolutionary research, we mined the genome assemblies of two New Guinea island endemics in our dataset (S. admiraltiensis and T. doriae) for the ATP1a3 gene, a thoroughly researched indicator of resistance to toad toxin ingestion by squamates. We find that both these snakes possess the genotype for toad toxin resistance despite their endemism to New Guinea, a region devoid of toads until the human-mediated introduction of Cane Toads in the 1930s. These species possess identical substitutions that suggest the same bufotoxin resistance as their Australian congeners (Stegonotus australis and Tropidonophis mairii), which forage on invasive Cane Toads. Herein, we show the utility of short-read high-coverage genomes and help reduce the deficit of available squamate genomes with associated voucher specimens.
Affiliation(s)
- Jackson R Roberts
  - Division of Zoology, Sternberg Museum of Natural History, Fort Hays State University, Hays, KS 67601, United States
  - Division of Herpetology, Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, United States
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, United States
- Justin M Bernstein
  - Center for Genomics, University of Kansas, Lawrence, KS 66045, United States
  - Department of Biology, University of Texas at Arlington, Arlington, TX 76010, United States
- Christopher C Austin
  - Division of Herpetology, Museum of Natural Science, Louisiana State University, Baton Rouge, LA 70803, United States
  - Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70803, United States
- Taylor Hains
  - Committee on Evolutionary Biology, University of Chicago, Chicago, IL 60637, United States
  - Life Sciences Section, Negaunee Integrative Research Center, The Field Museum of Natural History, Chicago, IL 60637, United States
- Joshua Mata
  - Amphibian and Reptile Collection, The Field Museum of Natural History, Chicago, IL 60605, United States
- Michael Kieras
  - Iridian Genomes, Inc., Bethesda, MD 20817, United States
- Stacy Pirro
  - Iridian Genomes, Inc., Bethesda, MD 20817, United States
- Sara Ruane
  - Life Sciences Section, Negaunee Integrative Research Center, The Field Museum of Natural History, Chicago, IL 60637, United States
  - Amphibian and Reptile Collection, The Field Museum of Natural History, Chicago, IL 60605, United States
|
2
|
Xie H, Linning-Duffy K, Demireva EY, Toh H, Abolibdeh B, Shi J, Zhou B, Iwase S, Yan L. CRISPR-based genome editing of a diurnal rodent, Nile grass rat (Arvicanthis niloticus). BMC Biol 2024; 22:144. [PMID: 38956550 PMCID: PMC11218167 DOI: 10.1186/s12915-024-01943-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 06/21/2024] [Indexed: 07/04/2024] Open
Abstract
BACKGROUND: Diurnal and nocturnal mammals have evolved distinct pathways to optimize survival for their chronotype-specific lifestyles. Conventional rodent models, being nocturnal, may not sufficiently recapitulate the biology of diurnal humans in health and disease. Although diurnal rodents are potentially advantageous for translational research, until recently, they have not been genetically tractable. The present study aims to address this major limitation by developing experimental procedures necessary for genome editing in a well-established diurnal rodent model, the Nile grass rat (Arvicanthis niloticus).
RESULTS: A superovulation protocol was established, which yielded nearly 30 eggs per female grass rat. Fertilized eggs were cultured in a modified rat 1-cell embryo culture medium (mR1ECM), in which grass rat embryos developed from the 1-cell stage into blastocysts. A CRISPR-based approach was then used for gene editing in vivo and in vitro, targeting Retinoic acid-induced 1 (Rai1), the causal gene for Smith-Magenis Syndrome, a neurodevelopmental disorder. The CRISPR reagents were delivered in vivo by electroporation using an improved Genome-editing via Oviductal Nucleic Acids Delivery (i-GONAD) method. The in vivo approach produced several edited founder grass rats with Rai1 null mutations, which showed stable transmission of the targeted allele to the next generation. CRISPR reagents were also microinjected into 2-cell embryos in vitro. Large deletion of the Rai1 gene was confirmed in 70% of the embryos injected, demonstrating high-efficiency genome editing in vitro.
CONCLUSION: We have established a set of methods that enabled the first successful CRISPR-based genome editing in Nile grass rats. The methods developed will guide future genome editing of this and other diurnal rodent species, which will promote greater utility of these models in basic and translational research.
Affiliation(s)
- Huirong Xie
  - Transgenic and Genome Editing Facility, Institute for Quantitative Health Science & Engineering, Research Technology Support Facility, Michigan State University, East Lansing, MI, 48824, USA.
- Elena Y Demireva
  - Transgenic and Genome Editing Facility, Institute for Quantitative Health Science & Engineering, Research Technology Support Facility, Michigan State University, East Lansing, MI, 48824, USA
- Huishi Toh
  - Neuroscience Research Institute, University of California Santa Barbara, Santa Barbara, USA
- Bana Abolibdeh
  - Transgenic and Genome Editing Facility, Institute for Quantitative Health Science & Engineering, Research Technology Support Facility, Michigan State University, East Lansing, MI, 48824, USA
- Jiaming Shi
  - Department of Psychology, Michigan State University, East Lansing, MI, 48824, USA
- Bo Zhou
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, USA
  - Department of Pediatrics, University of Michigan Medical School, Ann Arbor, USA
- Shigeki Iwase
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, USA
  - Department of Pediatrics, University of Michigan Medical School, Ann Arbor, USA
- Lily Yan
  - Department of Psychology, Michigan State University, East Lansing, MI, 48824, USA.
  - Neuroscience Program, Michigan State University, East Lansing, USA.
|
3
|
Gupta A, Mirarab S, Turakhia Y. Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES. bioRxiv 2024:2024.05.27.596098. [PMID: 38854139 PMCID: PMC11160643 DOI: 10.1101/2024.05.27.596098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2024]
Abstract
Inference of species trees plays a crucial role in advancing our understanding of evolutionary relationships and has immense significance for diverse biological and medical applications. Extensive genome sequencing efforts are currently in progress across a broad spectrum of life forms, holding the potential to unravel the intricate branching patterns within the tree of life. However, estimating species trees starting from raw genome sequences is quite challenging, and the current cutting-edge methodologies require a series of error-prone steps that are neither entirely automated nor standardized. In this paper, we present ROADIES, a novel pipeline for species tree inference from raw genome assemblies that is fully automated, easy to use, scalable, free from reference bias, and provides flexibility to adjust the tradeoff between accuracy and runtime. The ROADIES pipeline eliminates the need to align whole genomes, choose a single reference species, or pre-select loci such as functional genes found using cumbersome annotation steps. Moreover, it leverages recent advances in phylogenetic inference to allow multi-copy genes, eliminating the need to detect orthology. Using the genomic datasets released from large-scale sequencing consortia across three diverse life forms (placental mammals, pomace flies, and birds), we show that ROADIES infers species trees that are comparable in quality with the state-of-the-art approaches but in a fraction of the time. By incorporating optimal approaches and automating all steps from assembled genomes to species and gene trees, ROADIES is poised to improve the accuracy, scalability, and reproducibility of phylogenomic analyses.
Affiliation(s)
- Anshu Gupta
  - Department of Computer Science and Engineering, University of California, San Diego; San Diego, CA 92093, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California, San Diego; San Diego, CA 92093, USA
- Yatish Turakhia
  - Department of Electrical and Computer Engineering, University of California, San Diego; San Diego, CA 92093, USA
|
4
|
Cenzato D, Lipták Z. A survey of BWT variants for string collections. Bioinformatics 2024; 40:btae333. [PMID: 38788221 DOI: 10.1093/bioinformatics/btae333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Revised: 04/13/2024] [Accepted: 05/23/2024] [Indexed: 05/26/2024]
Abstract
MOTIVATION: In recent years, the focus of bioinformatics research has moved from individual sequences to collections of sequences. Given the fundamental role of the Burrows-Wheeler Transform (BWT) in string processing, a number of dedicated tools have been developed for computing the BWT of string collections. While the focus has been on improving efficiency, both in space and time, the exact definition of the BWT employed has not been at the center of attention. As we show in this paper, the different tools in use often compute non-equivalent BWT variants: the resulting transforms can differ from each other significantly, including the number r of runs, a central parameter of the BWT. Moreover, with many tools, the transform depends on the input order of the collection. In other words, on the same dataset, the same tool may output different transforms if the dataset is given in a different order.
RESULTS: We studied 18 dedicated tools for computing the BWT of string collections and were able to identify 6 different BWT variants computed by these tools. We review the differences between these BWT variants, both from a theoretical and from a practical point of view, comparing them on 8 real-life biological datasets with different characteristics. We find that the differences can be extensive, depending on the datasets, and are largest on collections of many similar short sequences. The parameter r, the number of runs of the BWT, also shows notable variation between the different BWT variants; on our datasets, it varied by a multiplicative factor of up to 4.2.
AVAILABILITY: Source code and scripts to replicate the results and download the data used in the article are available at https://github.com/davidecenzato/BWT-variants-for-string-collections.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
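The order dependence described in this abstract can be illustrated with a minimal sketch. It assumes one simple convention, chosen here purely for illustration and not necessarily matching any of the six variants the authors identify: the strings are joined with a shared `$` separator, a final `#` sentinel is appended, and the BWT is read off the sorted rotations. The names `bwt`, `collection_bwt`, and `count_runs` are hypothetical, and the quadratic rotation sort is only suitable for toy inputs.

```python
from itertools import groupby, permutations


def bwt(text: str) -> str:
    """Return the BWT of `text` by sorting all of its rotations.

    `text` is assumed to end with a sentinel that occurs nowhere else
    and sorts before every other character.
    """
    n = len(text)
    order = sorted(range(n), key=lambda i: text[i:] + text[:i])
    # The BWT character of a rotation is the one preceding its start.
    return "".join(text[i - 1] for i in order)


def collection_bwt(strings) -> str:
    """BWT of a collection under an illustrative concatenation convention.

    Assumption: each string is followed by '$' and the whole text by '#'
    ('#' < '$' < letters in ASCII). Real tools differ in this choice.
    """
    return bwt("$".join(strings) + "$#")


def count_runs(s: str) -> int:
    """Number r of maximal runs of equal characters in `s`."""
    return sum(1 for _ in groupby(s))


if __name__ == "__main__":
    collection = ["ACGT", "ACGA", "TCGA"]
    # Same strings, different input orders: print each transform and its r.
    for order in permutations(collection):
        transform = collection_bwt(order)
        print(order, transform, count_runs(transform))
```

Running the script prints one transform per input order, which makes it easy to see whether the output, and its run count r, changes when only the order of the collection changes.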
Affiliation(s)
- Davide Cenzato
  - Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University, Venice, Italy
- Zsuzsanna Lipták
  - Department of Computer Science, University of Verona, Verona, Italy
|
5
|
Chao KH, Heinz JM, Hoh C, Mao A, Shumate A, Pertea M, Salzberg SL. Combining DNA and protein alignments to improve genome annotation with LiftOn. bioRxiv 2024:2024.05.16.593026. [PMID: 38798552 PMCID: PMC11118573 DOI: 10.1101/2024.05.16.593026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
As the number and variety of assembled genomes continue to grow, the number of annotated genomes is falling behind, particularly for eukaryotes. DNA-based mapping tools help to address this challenge, but they are only able to transfer annotation between closely related species. Here we introduce LiftOn, a homology-based software tool that integrates DNA and protein alignments to enhance the accuracy of genome-scale annotation and to allow mapping between relatively distant species. LiftOn's protein-centric algorithm considers both types of alignments, chooses optimal open reading frames, resolves overlapping gene loci, and finds additional gene copies where they exist. LiftOn can reliably transfer annotation between genomes representing members of the same species, as we demonstrate on human, mouse, honey bee, rice, and Arabidopsis thaliana. It can further map annotation effectively across species pairs as far apart as mouse and rat or Drosophila melanogaster and D. erecta.
Affiliation(s)
- Kuan-Hao Chao
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Jakob M. Heinz
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Celine Hoh
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
- Alan Mao
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Alaina Shumate
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Mihaela Pertea
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Steven L Salzberg
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
  - Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21218, USA
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21211, USA
|
6
|
Hogg CJ. Translating genomic advances into biodiversity conservation. Nat Rev Genet 2024; 25:362-373. [PMID: 38012268 DOI: 10.1038/s41576-023-00671-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/12/2023] [Indexed: 11/29/2023]
Abstract
A key action of the new Global Biodiversity Framework is the maintenance of genetic diversity in all species to safeguard their adaptive potential. To achieve this goal, a translational mindset, which aims to convert results of basic research into direct practical benefits, needs to be applied to biodiversity conservation. Despite much discussion on the value of genomics to conservation, a disconnect between those generating genomic resources and those applying them to biodiversity management remains. As global efforts to generate reference genomes for non-model species increase, investment into practical biodiversity applications is critically important. Applications such as understanding population and multispecies diversity and longitudinal monitoring need support alongside education for policymakers on integrating the data into evidence-based decisions. Without such investment, the opportunity to revolutionize global biodiversity conservation using genomics will not be fully realized.
Affiliation(s)
- Carolyn J Hogg
  - School of Life & Environmental Sciences, The University of Sydney, Sydney, NSW, Australia.
|
7
|
Demian WL, Cormier O, Mossman K. Immunological features of bats: resistance and tolerance to emerging viruses. Trends Immunol 2024; 45:198-210. [PMID: 38453576 DOI: 10.1016/j.it.2024.01.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 01/30/2024] [Accepted: 01/31/2024] [Indexed: 03/09/2024]
Abstract
Bats are among the most diverse mammalian species, representing over 20% of mammalian diversity. The past two decades have witnessed a disproportionate spillover of viruses from bats to humans compared with other mammalian hosts, attributed to the viral richness within bats, their phylogenetic likeness to humans, and increased human contact with wildlife. Unique evolutionary adaptations in bat genomes, particularly in antiviral protection and immune tolerance genes, enable bats to serve as reservoirs for pandemic-inducing viruses. Here, we discuss current limitations and advances made in understanding the role of bats as drivers of pandemic zoonoses. We also discuss novel technologies that have revealed spatial, dynamic, and physiological factors driving virus and host coevolution.
Affiliation(s)
- Wael L Demian
  - Department of Medicine, McMaster University, Hamilton, Ontario, Canada; McMaster Immunology Research Centre, McMaster University, Hamilton, Ontario, Canada; Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- Olga Cormier
  - Department of Medicine, McMaster University, Hamilton, Ontario, Canada; McMaster Immunology Research Centre, McMaster University, Hamilton, Ontario, Canada; Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada
- Karen Mossman
  - Department of Medicine, McMaster University, Hamilton, Ontario, Canada; McMaster Immunology Research Centre, McMaster University, Hamilton, Ontario, Canada; Michael G. DeGroote Institute for Infectious Disease Research, McMaster University, Hamilton, Ontario, Canada.
|
8
|
Bálint B, Merényi Z, Hegedüs B, Grigoriev IV, Hou Z, Földi C, Nagy LG. ContScout: sensitive detection and removal of contamination from annotated genomes. Nat Commun 2024; 15:936. [PMID: 38296951 PMCID: PMC10831095 DOI: 10.1038/s41467-024-45024-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Accepted: 01/08/2024] [Indexed: 02/02/2024] Open
Abstract
Contamination of genomes is an increasingly recognized problem affecting several downstream applications, from comparative evolutionary genomics to metagenomics. Here we introduce ContScout, a precise tool for eliminating foreign sequences from annotated genomes. It achieves high specificity and sensitivity on synthetic benchmark data even when the contaminant is a closely related species, outperforms competing tools, and can distinguish horizontal gene transfer from contamination. A screen of 844 eukaryotic genomes for contamination identified bacteria as the most common source, followed by fungi and plants. Furthermore, we show that contaminants in ancestral genome reconstructions lead to erroneous early origins of genes and inflate gene loss rates, leading to a false notion of complex ancestral genomes. Taken together, we offer here a tool for sensitive removal of foreign proteins, identify and remove contaminants from diverse eukaryotic genomes and evaluate their impact on phylogenomic analyses.
Affiliation(s)
- Balázs Bálint
  - Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary
- Zsolt Merényi
  - Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary
- Botond Hegedüs
  - Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary
- Igor V Grigoriev
  - U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
  - Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, 94720, USA
- Zhihao Hou
  - Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary
  - Doctoral School of Biology, Faculty of Science and Informatics, University of Szeged, Szeged, 6720, Hungary
- Csenge Földi
  - Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary
  - Doctoral School of Biology, Faculty of Science and Informatics, University of Szeged, Szeged, 6720, Hungary
- László G Nagy
  - Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary.
|
9
|
Taft JM, Tolley KA, Alexander GJ, Geneva AJ. De Novo Whole Genome Assemblies for Two Southern African Dwarf Chameleons (Bradypodion, Chamaeleonidae). Genome Biol Evol 2023; 15:evad182. [PMID: 37847614 PMCID: PMC10603767 DOI: 10.1093/gbe/evad182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Revised: 09/15/2023] [Accepted: 09/28/2023] [Indexed: 10/19/2023] Open
Abstract
A complete and high-quality reference genome has become a fundamental tool for the study of functional, comparative, and evolutionary genomics. However, efforts to produce high-quality genomes for African taxa are lagging given the limited access to sufficient resources and technologies. The southern African dwarf chameleons (Bradypodion) are a relatively young lineage, with a large body of evidence demonstrating the highly adaptive capacity of these lizards. Bradypodion are known for their habitat specialization, with evidence of convergent phenotypes across the phylogeny. However, the underlying genetic architecture of these phenotypes remains unknown for Bradypodion, and without adequate genomic resources, many evolutionary questions cannot be answered. We present de novo assembled whole genomes for Bradypodion pumilum and Bradypodion ventrale, using Pacific Biosciences long-read sequencing data. BUSCO analysis revealed that 96.36% of single copy orthologs were present in the B. pumilum genome and 94% in B. ventrale. Moreover, these genomes boast scaffold N50 of 389.6 and 374.9 Mb, respectively. Based on a whole genome alignment of both Bradypodion genomes, B. pumilum is highly syntenic with B. ventrale. Furthermore, Bradypodion is also syntenic with Anolis lizards, despite the divergence between these lineages estimated to be nearly 170 Ma. Coalescent analysis of the genomic data also suggests that historical changes in effective population size for these species correspond to notable shifts in the southern African environment. These high-quality Bradypodion genome assemblies will support future research on the evolutionary history, diversification, and genetic underpinnings of adaptation in Bradypodion.
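Scaffold N50, the contiguity metric quoted in this abstract, is the length L such that scaffolds of length at least L together contain at least half of the total assembly length. A minimal sketch of that calculation follows; the helper name `n50` and the example lengths are hypothetical and are not taken from the Bradypodion assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length
    of the scaffold at which the running total first reaches half of the
    summed assembly length.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Made-up scaffold lengths in Mb, for illustration only.
    example = [80, 70, 50, 40, 30, 20]
    print(n50(example))
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembly, which is why it is usually reported alongside completeness measures such as BUSCO.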
Affiliation(s)
- Jody M Taft
  - School of Animal, Plant and Environmental Sciences, University of the Witwatersrand, Johannesburg, South Africa
  - South African National Biodiversity Institute, Kirstenbosch Research Centre, Claremont, South Africa
- Krystal A Tolley
  - South African National Biodiversity Institute, Kirstenbosch Research Centre, Claremont, South Africa
  - Centre for Ecological Genomics and Wildlife Conservation, University of Johannesburg, Johannesburg, South Africa
- Graham J Alexander
  - School of Animal, Plant and Environmental Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Anthony J Geneva
  - Department of Biology, Center for Computational and Integrative Biology, Rutgers University–Camden, Camden, New Jersey, USA
|
10
|
Thorburn DMJ, Sagonas K, Binzer-Panchal M, Chain FJJ, Feulner PGD, Bornberg-Bauer E, Reusch TBH, Samonte-Padilla IE, Milinski M, Lenz TL, Eizaguirre C. Origin matters: Using a local reference genome improves measures in population genomics. Mol Ecol Resour 2023; 23:1706-1723. [PMID: 37489282 DOI: 10.1111/1755-0998.13838] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/01/2023] [Revised: 05/10/2023] [Accepted: 06/02/2023] [Indexed: 07/26/2023]
Abstract
Genome sequencing enables answering fundamental questions about the genetic basis of adaptation, population structure and epigenetic mechanisms. Yet, we usually need a suitable reference genome for mapping population-level resequencing data. In some model systems, multiple reference genomes are available, which raises the challenging task of determining which reference genome best suits the data. Here, we compared the use of two different reference genomes for the three-spined stickleback (Gasterosteus aculeatus), one novel genome derived from a European gynogenetic individual and the published reference genome of a North American individual. Specifically, we investigated the impact of using a local reference versus one generated from a distinct lineage on several common population genomics analyses. Through mapping genome resequencing data of 60 sticklebacks from across Europe and North America, we demonstrate that genetic distance among samples and the reference genomes impacts downstream analyses. Using a local reference genome increased mapping efficiency and genotyping accuracy, effectively retaining more and better data. Despite comparable distributions of the metrics generated across the genome using SNP data (i.e. π, Tajima's D and FST), window-based statistics using different references resulted in different outlier genes and enriched gene functions. A marker-based analysis of DNA methylation distributions had a comparably high overlap in outlier genes and functions, yet with distinct differences depending on the reference genome. Overall, our results highlight how using a local reference genome decreases reference bias to increase confidence in downstream analyses of the data. Such results have significant implications in all reference-genome-based population genomic analyses.
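Two of the SNP-based statistics named in this abstract, nucleotide diversity (π) and Tajima's D, can be sketched for a single window as follows. This is an illustrative textbook calculation, not the authors' pipeline: it assumes aligned haploid sequences of equal length with no missing data, and the function names are hypothetical. Tajima's D is degenerate for fewer than four sequences, so the sketch rejects such input.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Count alignment columns that carry more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / len(pairs)


def nucleotide_diversity(seqs):
    """Per-site nucleotide diversity (pi) for one window."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def tajimas_d(seqs):
    """Tajima's D from the standard constants a1, a2, b1, b2, c1, c2, e1, e2."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    k = mean_pairwise_differences(seqs)
    # Difference between the pairwise and Watterson estimates, scaled by
    # the estimated standard deviation of that difference.
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    window = ["AAAA", "AAAT", "AATT", "ATTT"]  # toy alignment
    print(nucleotide_diversity(window), tajimas_d(window))
```

Since both statistics are computed from genotypes called against a reference, changes in which sites are callable or how they are called can shift window values, which is consistent with the reference-dependent outliers the abstract reports.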
Affiliation(s)
- Doko-Miles J Thorburn
  - School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
  - Department of Life Sciences, Imperial College London, London, UK
- Kostas Sagonas
  - School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
  - Department of Zoology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
- Mahesh Binzer-Panchal
  - Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, National Bioinformatics Infrastructure Sweden (NBIS), Uppsala University, Uppsala, Sweden
- Frederic J J Chain
  - Department of Biological Sciences, University of Massachusetts Lowell, Lowell, Massachusetts, USA
- Philine G D Feulner
  - Department of Fish Ecology and Evolution, Centre of Ecology, Evolution and Biogeochemistry, EAWAG Swiss Federal Institute of Aquatic Science and Technology, Kastanienbaum, Switzerland
  - Division of Aquatic Ecology and Evolution, Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
- Erich Bornberg-Bauer
  - Evolutionary Bioinformatics, Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Thorsten B H Reusch
  - Marine Evolutionary Ecology, GEOMAR Helmholtz Centre for Ocean Research, Kiel, Germany
- Irene E Samonte-Padilla
  - Department of Evolutionary Ecology, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Manfred Milinski
  - Department of Evolutionary Ecology, Max Planck Institute for Evolutionary Biology, Plön, Germany
- Tobias L Lenz
  - Research Group for Evolutionary Immunogenomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
  - Research Unit for Evolutionary Immunogenomics, Department of Biology, University of Hamburg, Hamburg, Germany
- Christophe Eizaguirre
  - School of Biological and Chemical Sciences, Queen Mary University of London, London, UK
|
11
|
Xie H, Linning-Duffy K, Demireva EY, Toh H, Abolibdeh B, Shi J, Zhou B, Iwase S, Yan L. CRISPR-based Genome Editing of a Diurnal Rodent, Nile Grass Rat (Arvicanthis niloticus). bioRxiv 2023:2023.08.23.553600. [PMID: 37662225 PMCID: PMC10473663 DOI: 10.1101/2023.08.23.553600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2023]
Abstract
Diurnal and nocturnal mammals have evolved distinct pathways to optimize survival for their chronotype-specific lifestyles. Conventional rodent models, being nocturnal, may not sufficiently recapitulate the biology of diurnal humans in health and disease. Although diurnal rodents are potentially advantageous for translational research, until recently, they have not been genetically tractable. Here, we address this major limitation by demonstrating the first successful CRISPR genome editing of the Nile grass rat (Arvicanthis niloticus), a valuable diurnal rodent. We establish methods for superovulation; embryo development, manipulation, and culture; and pregnancy maintenance to guide future genome editing of this and other diurnal rodent species.
|
12
|
Pinto BJ, Gamble T, Smith CH, Wilson MA. A lizard is never late: Squamate genomics as a recent catalyst for understanding sex chromosome and microchromosome evolution. J Hered 2023; 114:445-458. [PMID: 37018459 PMCID: PMC10445521 DOI: 10.1093/jhered/esad023] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/20/2023] [Accepted: 04/03/2023] [Indexed: 04/07/2023] Open
Abstract
In 2011, the first high-quality genome assembly of a squamate reptile (lizard or snake) was published for the green anole. Dozens of genome assemblies were subsequently published over the next decade, yet these assemblies were largely inadequate for answering fundamental questions regarding genome evolution in squamates due to their lack of contiguity or annotation. As the "genomics age" was beginning to hit its stride in many organismal study systems, progress in squamates was largely stagnant following the publication of the green anole genome. In fact, zero high-quality (chromosome-level) squamate genomes were published between the years 2012 and 2017. However, since 2018, an exponential increase in high-quality genome assemblies has materialized with 24 additional high-quality genomes published for species across the squamate tree of life. As the field of squamate genomics is rapidly evolving, we provide a systematic review from an evolutionary genomics perspective. We collated a near-complete list of publicly available squamate genome assemblies from more than half-a-dozen international and third-party repositories and systematically evaluated them with regard to their overall quality, phylogenetic breadth, and usefulness for continuing to provide accurate and efficient insights into genome evolution across squamate reptiles. This review both highlights and catalogs the currently available genomic resources in squamates and their ability to address broader questions in vertebrates, specifically sex chromosome and microchromosome evolution, while addressing why squamates may have received less historical focus and why their progress in genomics has lagged behind peer taxa.
Affiliation(s)
- Brendan J Pinto
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, United States
- Department of Zoology, Milwaukee Public Museum, Milwaukee, WI, United States
| | - Tony Gamble
- Department of Zoology, Milwaukee Public Museum, Milwaukee, WI, United States
- Department of Biological Sciences, Marquette University, Milwaukee, WI, United States
- Bell Museum of Natural History, University of Minnesota, St Paul, MN, United States
| | - Chase H Smith
- Department of Integrative Biology, University of Texas at Austin, Austin, TX, United States
| | - Melissa A Wilson
- School of Life Sciences, Arizona State University, Tempe, AZ, United States
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, United States
- Center for Mechanisms of Evolution, Biodesign Institute, Tempe, AZ, United States
|
13
|
Garg KM, Lamba V, Sanyal A, Dovih P, Chattopadhyay B. Next Generation Sequencing Revolutionizes Organismal Biology Research in Bats. J Mol Evol 2023:10.1007/s00239-023-10107-2. [PMID: 37154841 PMCID: PMC10166039 DOI: 10.1007/s00239-023-10107-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/25/2022] [Accepted: 03/29/2023] [Indexed: 05/10/2023]
Abstract
The advent of next generation sequencing technologies (NGS) has greatly accelerated our understanding of critical aspects of organismal biology from non-model organisms. Bats form a particularly interesting group in this regard, as genomic data have helped unearth a vast spectrum of idiosyncrasies in bat genomes associated with bat biology, physiology, and evolution. Bats are important bioindicators and are keystone species in many ecosystems. They often live in proximity to humans and are frequently associated with emerging infectious diseases, including the COVID-19 pandemic. Nearly four dozen bat genomes have been published to date, ranging from drafts to chromosomal level assemblies. Genomic investigations in bats have also become critical to our understanding of disease biology and host-pathogen coevolution. In addition to whole genome sequencing, low-coverage genomic data such as reduced representation libraries and resequencing data have contributed significantly to our understanding of the evolution of natural populations and their responses to climatic and anthropogenic perturbations. In this review, we discuss how genomic data have enhanced our understanding of physiological adaptations in bats (particularly related to ageing, immunity, and diet), pathogen discovery, and host-pathogen coevolution. In comparison, the application of NGS to population genomics, conservation, biodiversity assessment, and functional genomics has been appreciably slower. We reviewed the current areas of focus, identifying emerging topical research directions and providing a roadmap for future genomic studies in bats.
Affiliation(s)
- Kritika M Garg
- Centre for Interdisciplinary Archaeological Research, Ashoka University, Sonipat, Haryana, 131029, India
- Department of Biology, Ashoka University, Sonipat, Haryana, 131029, India
- Centre for Climate Change and Sustainability (3CS), Ashoka University, Sonipat, Haryana, 131029, India
| | - Vinita Lamba
- Trivedi School of Biosciences, Ashoka University, Sonipat, Haryana, 131029, India
- J. William Fulbright College of Arts and Sciences, Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA
| | - Avirup Sanyal
- Trivedi School of Biosciences, Ashoka University, Sonipat, Haryana, 131029, India
- Ecology and Evolution, National Centre for Biological Sciences, Bangalore, 560065, India
| | - Pilot Dovih
- Centre for Climate Change and Sustainability (3CS), Ashoka University, Sonipat, Haryana, 131029, India
- Ecology and Evolution, National Centre for Biological Sciences, Bangalore, 560065, India
- School of Chemistry and Biotechnology, Sastra University, Thanjavur, Tamil Nadu, 613401, India
| | - Balaji Chattopadhyay
- Centre for Climate Change and Sustainability (3CS), Ashoka University, Sonipat, Haryana, 131029, India.
- Trivedi School of Biosciences, Ashoka University, Sonipat, Haryana, 131029, India.
|
14
|
Osmanski AB, Paulat NS, Korstian J, Grimshaw JR, Halsey M, Sullivan KAM, Moreno-Santillán DD, Crookshanks C, Roberts J, Garcia C, Johnson MG, Densmore LD, Stevens RD, Rosen J, Storer JM, Hubley R, Smit AFA, Dávalos LM, Karlsson EK, Lindblad-Toh K, Ray DA. Insights into mammalian TE diversity through the curation of 248 genome assemblies. Science 2023; 380:eabn1430. [PMID: 37104570 PMCID: PMC11103246 DOI: 10.1126/science.abn1430] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 11/05/2021] [Accepted: 10/28/2022] [Indexed: 04/29/2023]
Abstract
We examined transposable element (TE) content of 248 placental mammal genome assemblies, the largest de novo TE curation effort in eukaryotes to date. We found that although mammals resemble one another in total TE content and diversity, they show substantial differences with regard to recent TE accumulation. This includes multiple recent expansion and quiescence events across the mammalian tree. Young TEs, particularly long interspersed elements, drive increases in genome size, whereas DNA transposons are associated with smaller genomes. Mammals tend to accumulate only a few types of TEs at any given time, with one TE type dominating. We also found an association between dietary habit and the presence of DNA transposon invasions. These detailed annotations will serve as a benchmark for future comparative TE analyses among placental mammals.
Affiliation(s)
- Austin B. Osmanski
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | - Nicole S. Paulat
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | - Jenny Korstian
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | - Jenna R. Grimshaw
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | - Michaela Halsey
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | | | | | | | - Jacquelyn Roberts
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | - Carlos Garcia
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | - Matthew G. Johnson
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
| | | | - Richard D. Stevens
- Department of Natural Resources Management and Natural Science Research Laboratory, Museum of Texas Tech University, Lubbock, TX, USA
| | | | - Jeb Rosen
- Institute for Systems Biology, Seattle, WA, USA
| | | | | | | | - Liliana M. Dávalos
- Department of Ecology & Evolution, Stony Brook University, Stony Brook, NY, USA
- Consortium for Inter-Disciplinary Environmental Research, Stony Brook University, Stony Brook, NY, USA
| | - Elinor K. Karlsson
- Department of Medical Biochemistry and Microbiology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Kerstin Lindblad-Toh
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Program in Bioinformatics and Integrative Biology, UMass Chan Medical School, Worcester, MA, USA
- Program in Molecular Medicine, UMass Chan Medical School, Worcester, MA, USA
| | - David A. Ray
- Department of Biological Sciences, Texas Tech University, Lubbock, TX, USA
|
15
|
Pinto BJ, Gamble T, Smith CH, Wilson MA. A lizard is never late: squamate genomics as a recent catalyst for understanding sex chromosome and microchromosome evolution. bioRxiv 2023:2023.01.20.524006. [PMID: 37034614 PMCID: PMC10081179 DOI: 10.1101/2023.01.20.524006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/19/2023]
Abstract
In 2011, the first high-quality genome assembly of a squamate reptile (lizard or snake) was published for the green anole. Dozens of genome assemblies were subsequently published over the next decade, yet these assemblies were largely inadequate for answering fundamental questions regarding genome evolution in squamates due to their lack of contiguity or annotation. As the "genomics age" was beginning to hit its stride in many organismal study systems, progress in squamates was largely stagnant following the publication of the green anole genome. In fact, zero high-quality (chromosome-level) squamate genomes were published between 2012 and 2017. However, since 2018, an exponential increase in high-quality genome assemblies has materialized with 24 additional high-quality genomes published for species across the squamate tree of life. As the field of squamate genomics is rapidly evolving, we provide a systematic review from an evolutionary genomics perspective. We collated a near-complete list of publicly available squamate genome assemblies from more than half-a-dozen international and third-party repositories and systematically evaluated them with regard to their overall quality, phylogenetic breadth, and usefulness for continuing to provide accurate and efficient insights into genome evolution across squamate reptiles. This review both highlights and catalogs the currently available genomic resources in squamates and their ability to address broader questions in vertebrates, specifically sex chromosome and microchromosome evolution, while addressing why squamates may have received less historical focus and why their progress in genomics has lagged behind peer taxa.
Affiliation(s)
- Brendan J Pinto
- School of Life Sciences, Arizona State University, Tempe, AZ USA
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ USA
- Department of Zoology, Milwaukee Public Museum, Milwaukee, WI USA
| | - Tony Gamble
- Department of Zoology, Milwaukee Public Museum, Milwaukee, WI USA
- Department of Biological Sciences, Marquette University, Milwaukee, WI USA
- Bell Museum of Natural History, University of Minnesota, St Paul, MN USA
| | - Chase H Smith
- Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
| | - Melissa A Wilson
- School of Life Sciences, Arizona State University, Tempe, AZ USA
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ USA
- Center for Mechanisms of Evolution, Biodesign Institute, Tempe, AZ USA
|
16
|
Silva L, Antunes A. Omics and Remote Homology Integration to Decipher Protein Functionality. Methods Mol Biol 2023; 2627:61-81. [PMID: 36959442 DOI: 10.1007/978-1-0716-2974-1_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
In recent years, several "omics" technologies based on specific biomolecules (DNA, RNA, proteins, or metabolites) have gained growing importance in the scientific field. Although each omics field has its own laboratory protocols, they share a background of bioinformatic tools for data integration and analysis. A recent subset of bioinformatic tools, based on available templates or remote homology protocols, allows fast, high-accuracy computational prediction of protein structures. The rapid prediction of currently unsolved protein structures, together with recent omics findings, allows a boost of scientific advances in multiple fields such as cancer, longevity, immunity, mitochondrial function, toxicology, drug design, biosensors, and recombinant protein engineering. In this chapter, we assess methodological approaches for the integration of omics and remote homology inferences to decipher protein functionality, opening the door to the next era of biological knowledge.
Affiliation(s)
- Liliana Silva
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Porto, Portugal
| | - Agostinho Antunes
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal.
- Department of Biology, Faculty of Sciences, University of Porto, Porto, Portugal.
|
17
|
Woo C, Kumari P, Eo KY, Lee WS, Kimura J, Yamamoto N. Combining vertebrate mitochondrial 12S rRNA gene sequencing and shotgun metagenomic sequencing to investigate the diet of the leopard cat (Prionailurus bengalensis) in Korea. PLoS One 2023; 18:e0281245. [PMID: 36719887 PMCID: PMC9888693 DOI: 10.1371/journal.pone.0281245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Accepted: 01/18/2023] [Indexed: 02/01/2023] Open
Abstract
The leopard cat (Prionailurus bengalensis), an endangered species in South Korea, is a small feline widely distributed in Asia. Here, we investigated the diet of leopard cats in the inland areas of Korea by examining their fecal contents using vertebrate mitochondrial 12S rRNA gene sequencing and shotgun metagenomic sequencing. Shotgun metagenomic sequencing revealed that the feces were rich in DNA not only of vertebrates but also of arthropods and plants, but care should be taken when using shotgun metagenomic sequencing to identify vertebrates at low taxonomic levels (e.g., genus level), as it was often erroneous. Meanwhile, vertebrate mitochondrial 12S rRNA gene sequencing was found to be accurate in the genus-level identification, as the genera identified were consistent with the Korean fauna. We found that small mammals such as murids were their main prey. By using these two sequencing methods in combination, this study demonstrated that accurate information about the overall dietary content and vertebrate prey of leopard cats could be obtained. We expect that the continued community efforts to expand the genome database of wildlife, including vertebrates, will alleviate the problem of erroneous identification of prey at low taxonomic levels by shotgun metagenomic sequencing in the near future.
Affiliation(s)
- Cheolwoon Woo
- Department of Environmental Health Sciences, Graduate School of Public Health, Seoul National University, Seoul, Republic of Korea
| | - Priyanka Kumari
- Department of Environmental Health Sciences, Graduate School of Public Health, Seoul National University, Seoul, Republic of Korea
- Institute of Health and Environment, Graduate School of Public Health, Seoul National University, Seoul, Republic of Korea
| | - Kyung Yeon Eo
- Department of Animal Health and Welfare, College of Healthcare and Biotechnology, Semyung University, Jecheon, Republic of Korea
| | - Woo-Shin Lee
- Department of Forest Sciences, College of Agriculture and Life Science, Seoul National University, Seoul, Republic of Korea
| | - Junpei Kimura
- College of Veterinary Medicine, Seoul National University, Seoul, Republic of Korea
| | - Naomichi Yamamoto
- Department of Environmental Health Sciences, Graduate School of Public Health, Seoul National University, Seoul, Republic of Korea
- Institute of Health and Environment, Graduate School of Public Health, Seoul National University, Seoul, Republic of Korea
|
18
|
The Current Developments in Medicinal Plant Genomics Enabled the Diversification of Secondary Metabolites' Biosynthesis. Int J Mol Sci 2022; 23:ijms232415932. [PMID: 36555572 PMCID: PMC9781956 DOI: 10.3390/ijms232415932] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/16/2022] [Revised: 12/04/2022] [Accepted: 12/09/2022] [Indexed: 12/23/2022] Open
Abstract
Medicinal plants produce important substrates for their adaptation and defenses against environmental factors and, at the same time, are used for traditional medicine and industrial additives. Plants produce relatively small amounts of secondary metabolites through biosynthesis. Recently, the whole-genome sequencing of medicinal plants and the identification of secondary metabolite production were revolutionized by the rapid development and low cost of sequencing technology. Advances in functional genomics, such as transcriptomics, proteomics, and metabolomics, pave the way for discoveries in secondary metabolites and related key genes. The multi-omics approaches can offer tremendous insight into the variety, distribution, and development of biosynthetic gene clusters (BGCs). Although many reviews have reported on the plant and medicinal plant genome, chemistry, and pharmacology, there is no review giving a comprehensive report about the medicinal plant genome and multi-omics approaches to study the biosynthesis pathway of secondary metabolites. Here, we introduce the medicinal plant genome and the application of multi-omics tools for identifying genes related to the biosynthesis pathway of secondary metabolites. Moreover, we explore comparative genomics and polyploidy for gene family analysis in medicinal plants. This study promotes medicinal plant genomics, which contributes to the biosynthesis and screening of plant substrates and plant-based drugs and improves the research efficiency of traditional medicine.
|
19
|
Premzl M. Revised eutherian gene collections. BMC Genom Data 2022; 23:56. [PMID: 35870891 PMCID: PMC9308196 DOI: 10.1186/s12863-022-01071-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2021] [Accepted: 07/13/2022] [Indexed: 11/24/2022] Open
Abstract
Objectives: The most recent research projects in the scientific field of eutherian comparative genomics included intentions to sequence every extant eutherian species genome in the foreseeable future, so that future revisions and updates of eutherian gene data sets were expected. Data description: Using 35 public eutherian reference genomic sequence assemblies and freely available software, the eutherian comparative genomic analysis protocol RRID:SCR_014401 was published as guidance against potential genomic sequence errors. The protocol curated 14 eutherian third-party data gene data sets, including, in aggregate, 2615 complete coding sequences that were deposited in the European Nucleotide Archive. The published eutherian gene collections were used in revisions and updates of eutherian gene data set classifications and nomenclatures that included gene annotations, phylogenetic analyses and protein molecular evolution analyses.
|
20
|
Çilingir FG, A'Bear L, Hansen D, Davis LR, Bunbury N, Ozgul A, Croll D, Grossen C. Chromosome-level genome assembly for the Aldabra giant tortoise enables insights into the genetic health of a threatened population. Gigascience 2022; 11:giac090. [PMID: 36251273 PMCID: PMC9553416 DOI: 10.1093/gigascience/giac090] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/09/2022] [Revised: 07/22/2022] [Accepted: 09/12/2022] [Indexed: 11/04/2022] Open
Abstract
BACKGROUND: The Aldabra giant tortoise (Aldabrachelys gigantea) is one of only two giant tortoise species left in the world. The species is endemic to Aldabra Atoll in Seychelles and is listed as Vulnerable on the International Union for Conservation of Nature Red List (v2.3) due to its limited distribution and threats posed by climate change. Genomic resources for A. gigantea are lacking, hampering conservation efforts for both wild and ex situ populations. A high-quality genome would also open avenues to investigate the genetic basis of the species' exceptionally long life span. FINDINGS: We produced the first chromosome-level de novo genome assembly of A. gigantea using PacBio High-Fidelity sequencing and high-throughput chromosome conformation capture. We produced a 2.37-Gbp assembly with a scaffold N50 of 148.6 Mbp and a resolution into 26 chromosomes. RNA sequencing-assisted gene model prediction identified 23,953 protein-coding genes and 1.1 Gbp of repetitive sequences. Synteny analyses among turtle genomes revealed high levels of chromosomal collinearity even among distantly related taxa. To assess the utility of the high-quality assembly for species conservation, we performed a low-coverage resequencing of 30 individuals from wild populations and two zoo individuals. Our genome-wide population structure analyses detected genetic population structure in the wild and identified the most likely origin of the zoo-housed individuals. We further identified putatively deleterious mutations to be monitored. CONCLUSIONS: We establish a high-quality chromosome-level reference genome for A. gigantea and one of the most complete turtle genomes available. We show that low-coverage whole-genome resequencing, for which alignment to the reference genome is a necessity, is a powerful tool to assess the population structure of the wild population and reveal the geographic origins of ex situ individuals relevant for genetic diversity management and rewilding efforts.
Affiliation(s)
- F Gözde Çilingir
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich 8057, Switzerland
| | - Luke A'Bear
- Seychelles Islands Foundation, Victoria, Republic of Seychelles
| | - Dennis Hansen
- Zoological Museum, University of Zurich, Zurich 8006, Switzerland
- Indian Ocean Tortoise Alliance, Ile Cerf, Victoria, Republic of Seychelles
| | | | - Nancy Bunbury
- Seychelles Islands Foundation, Victoria, Republic of Seychelles
- Centre for Ecology and Conservation, College of Life and Environmental Sciences, University of Exeter, Penryn, Cornwall, TR10 9FE, UK
| | - Arpat Ozgul
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich 8057, Switzerland
| | - Daniel Croll
- Institute of Biology, University of Neuchâtel, Neuchâtel 2000, Switzerland
| | - Christine Grossen
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich 8057, Switzerland
|
21
|
Ko BJ, Lee C, Kim J, Rhie A, Yoo DA, Howe K, Wood J, Cho S, Brown S, Formenti G, Jarvis ED, Kim H. Widespread false gene gains caused by duplication errors in genome assemblies. Genome Biol 2022; 23:205. [PMID: 36167596 PMCID: PMC9516828 DOI: 10.1186/s13059-022-02764-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 06/16/2021] [Accepted: 09/02/2022] [Indexed: 12/22/2022] Open
Abstract
Background: False duplications in genome assemblies lead to false biological conclusions. We quantified false duplications in popularly used previous genome assemblies for platypus, zebra finch, and Anna’s Hummingbird, and their new counterparts of the same species generated by the Vertebrate Genomes Project, whose pipeline attempted to eliminate false duplications through haplotype phasing and purging. These assemblies are among the first generated by the Vertebrate Genomes Project where there was a prior chromosomal level reference assembly to compare with. Results: Whole genome alignments revealed that 4 to 16% of the sequences are falsely duplicated in the previous assemblies, impacting hundreds to thousands of genes. These lead to overestimated gene family expansions. The main source of the false duplications is heterotype duplications, where the haplotype sequences were relatively more divergent than other parts of the genome, leading the assembly algorithms to classify them as separate genes or genomic regions. A minor source is sequencing errors. Ancient ATP nucleotide binding gene families have a higher prevalence of false duplications compared to other gene families. Although present in a smaller proportion, we observe false duplications remaining in the Vertebrate Genomes Project assemblies that can be identified and purged. Conclusions: This study highlights the need for more advanced assembly methods that better separate haplotypes and sequence errors, and the need for cautious analyses on gene gains. Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-022-02764-1.
Affiliation(s)
- Byung June Ko
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
| | - Chul Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - Juwan Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, USA
| | - Dong Ahn Yoo
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | | | | | - Seoae Cho
- eGnome, Inc, Seoul, Republic of Korea
| | - Samara Brown
- Laboratory of the Neurogenetics of Language, The Rockefeller University, New York, NY, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Giulio Formenti
- Laboratory of the Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Erich D Jarvis
- Laboratory of the Neurogenetics of Language, The Rockefeller University, New York, NY, USA; Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Heebal Kim
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea; eGnome, Inc, Seoul, Republic of Korea
|
22
|
Descorps-Declère S, Richard GF. Megasatellite formation and evolution in vertebrate genes. Cell Rep 2022; 40:111347. [PMID: 36103826 DOI: 10.1016/j.celrep.2022.111347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2021] [Revised: 04/28/2022] [Accepted: 08/23/2022] [Indexed: 11/03/2022] Open
Abstract
Since formation of the first proto-eukaryotes, gene repertoire and genome complexity have significantly increased. Among genetic elements responsible for this increase are tandem repeats. Here we describe a genome-wide analysis of large tandem repeats, called megasatellites, in 58 vertebrate genomes. Two bursts occurred, one after the radiation between Agnatha and Gnathostomata fishes and the second one in therian mammals. Megasatellites are enriched in subtelomeric regions and frequently encoded in genes involved in transcription regulation, intracellular trafficking, and cell membrane metabolism, reminiscent of what is observed in fungus genomes. The presence of many introns within young megasatellites suggests that an exon-intron DNA segment is first duplicated and amplified before accumulation of mutations in intronic parts partially erases the megasatellite in such a way that it becomes detectable only in exons. Our results suggest that megasatellite formation and evolution is a dynamic and still ongoing process in vertebrate genomes.
Affiliation(s)
- Stéphane Descorps-Declère
- Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, 25 rue du Dr Roux, 75015 Paris, France.
| | - Guy-Franck Richard
- Institut Pasteur, Université Paris Cité, CNRS UMR3525, Natural & Synthetic Genome Instabilities, 25 rue du Dr Roux, 75015 Paris, France.
|
23
|
Kille B, Balaji A, Sedlazeck FJ, Nute M, Treangen TJ. Multiple genome alignment in the telomere-to-telomere assembly era. Genome Biol 2022; 23:182. [PMID: 36038949 PMCID: PMC9421119 DOI: 10.1186/s13059-022-02735-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/20/2021] [Accepted: 07/21/2022] [Indexed: 01/22/2023] Open
Abstract
With the arrival of telomere-to-telomere (T2T) assemblies of the human genome comes the computational challenge of efficiently and accurately constructing multiple genome alignments at an unprecedented scale. By identifying nucleotides across genomes which share a common ancestor, multiple genome alignments commonly serve as the bedrock for comparative genomics studies. In this review, we provide an overview of the algorithmic template that most multiple genome alignment methods follow. We also discuss prospective areas of improvement of multiple genome alignment for keeping up with continuously arriving high-quality T2T assembled genomes and for unlocking clinically-relevant insights.
Affiliation(s)
- Bryce Kille
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Advait Balaji
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Michael Nute
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA.
| |
|
24
|
Abstract
Paleoproteomics, the study of ancient proteins, is a rapidly growing field at the intersection of molecular biology, paleontology, archaeology, paleoecology, and history. Paleoproteomics research leverages the longevity and diversity of proteins to explore fundamental questions about the past. While its origins predate the characterization of DNA, it was only with the advent of soft ionization mass spectrometry that the study of ancient proteins became truly feasible. Technological gains over the past 20 years have allowed increasing opportunities to better understand preservation, degradation, and recovery of the rich bioarchive of ancient proteins found in the archaeological and paleontological records. Growing from a handful of studies in the 1990s on individual highly abundant ancient proteins, paleoproteomics today is an expanding field with diverse applications ranging from the taxonomic identification of highly fragmented bones and shells and the phylogenetic resolution of extinct species to the exploration of past cuisines from dental calculus and pottery food crusts and the characterization of past diseases. More broadly, these studies have opened new doors in understanding past human-animal interactions, the reconstruction of past environments and environmental changes, the expansion of the hominin fossil record through large scale screening of nondiagnostic bone fragments, and the phylogenetic resolution of the vertebrate fossil record. Even with these advances, much of the ancient proteomic record still remains unexplored. Here we provide an overview of the history of the field, a summary of the major methods and applications currently in use, and a critical evaluation of current challenges. 
We conclude by looking to the future, for which innovative solutions and emerging technology will play an important role in enabling us to access the still unexplored "dark" proteome, allowing for a fuller understanding of the role ancient proteins can play in the interpretation of the past.
Affiliation(s)
- Christina Warinner
- Department of Anthropology, Harvard University, Cambridge, Massachusetts 02138, United States
- Department of Archaeogenetics, Max Planck Institute for Evolutionary Anthropology, Leipzig 04103, Germany
| | - Kristine Korzow Richter
- Department of Anthropology, Harvard University, Cambridge, Massachusetts 02138, United States
| | - Matthew J. Collins
- Department of Archaeology, Cambridge University, Cambridge CB2 3DZ, United Kingdom
- Section for Evolutionary Genomics, Globe Institute, University of Copenhagen, Copenhagen 1350, Denmark
| |
|
25
|
Phillips JD, Gillis DJ, Hanner RH. Lack of Statistical Rigor in DNA Barcoding Likely Invalidates the Presence of a True Species' Barcode Gap. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.859099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
Abstract
DNA barcoding has been largely successful in satisfactorily exposing levels of standing genetic diversity for a wide range of taxonomic groups through the employment of only one or a few universal gene markers. However, sufficient coverage of geographically-broad intra-specific haplotype variation within genomic databases like the Barcode of Life Data Systems (BOLD) and GenBank remains relatively sparse. As reference sequence libraries continue to grow exponentially in size, there is now the need to identify novel ways of meaningfully analyzing vast amounts of available DNA barcode data. This is an important issue to address promptly for the routine tasks of specimen identification and species discovery, which have seen broad adoption in areas as diverse as regulatory forensics and resource conservation. Here, it is demonstrated that the interpretation of DNA barcoding data is lacking in statistical rigor. To highlight this, focus is set specifically on one key concept that has become a household name in the field: the DNA barcode gap. Arguments outlined herein specifically center on DNA barcoding in animal taxa and stem from three angles: (1) the improper allocation of specimen sampling effort necessary to capture adequate levels of within-species genetic variation, (2) failing to properly visualize intra-specific and interspecific genetic distances, and (3) the inconsistent, inappropriate use, or absence of statistical inferential procedures in DNA barcoding gap analyses. Furthermore, simple statistical solutions are outlined which can greatly propel the use of DNA barcoding as a tool to irrefutably match unknowns to knowns on the basis of the barcoding gap with a high degree of confidence. Proposed methods examined herein are illustrated through application to DNA barcode sequence data from Canadian Pacific fish species as a case study.
|
26
|
Išerić H, Alkan C, Hach F, Numanagić I. Fast characterization of segmental duplication structure in multiple genome assemblies. Algorithms Mol Biol 2022; 17:4. [PMID: 35303886 PMCID: PMC8932185 DOI: 10.1186/s13015-022-00210-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/16/2021] [Accepted: 02/08/2022] [Indexed: 11/29/2022] Open
Abstract
MOTIVATION: The increasing availability of high-quality genome assemblies raised interest in the characterization of genomic architecture. Major architectural elements, such as common repeats and segmental duplications (SDs), increase genome plasticity that stimulates further evolution by changing the genomic structure and inventing new genes. Optimal computation of SDs within a genome requires quadratic-time local alignment algorithms that are impractical due to the size of most genomes. Additionally, to perform evolutionary analysis, one needs to characterize SDs in multiple genomes and find relations between those SDs and unique (non-duplicated) segments in other genomes. A naïve approach consisting of multiple sequence alignment would make the optimal solution to this problem even more impractical. Thus there is a need for fast and accurate algorithms to characterize SD structure in multiple genome assemblies to better understand the evolutionary forces that shaped the genomes of today. RESULTS: Here we introduce a new approach, BISER, to quickly detect SDs in multiple genomes and identify elementary SDs and core duplicons that drive the formation of such SDs. BISER improves earlier tools by (i) scaling the detection of SDs with low homology to multiple genomes while introducing further 7-33× speed-ups over the existing tools, and by (ii) characterizing elementary SDs and detecting core duplicons to help trace the evolutionary history of duplications to as far as 300 million years. AVAILABILITY AND IMPLEMENTATION: BISER is implemented in Seq programming language and is publicly available at https://github.com/0xTCG/biser .
Affiliation(s)
- Hamza Išerić
- Department of Computer Science, University of Victoria, Victoria, BC, V8P 5C2, Canada
| | - Can Alkan
- Department of Computer Engineering, Bilkent University, 06800, Ankara, Turkey
| | - Faraz Hach
- Vancouver Prostate Centre, Vancouver, BC, V6H 3Z6, Canada
- Department of Urologic Sciences, University of British Columbia, Vancouver, BC, V5Z 1M9, Canada
| | - Ibrahim Numanagić
- Department of Computer Science, University of Victoria, Victoria, BC, V8P 5C2, Canada.
| |
|
27
|
Ramos L, Antunes A. Decoding sex: Elucidating sex determination and how high-quality genome assemblies are untangling the evolutionary dynamics of sex chromosomes. Genomics 2022; 114:110277. [PMID: 35104609 DOI: 10.1016/j.ygeno.2022.110277] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/07/2021] [Revised: 12/22/2021] [Accepted: 01/26/2022] [Indexed: 11/28/2022]
Abstract
Sexual reproduction is a diverse and widespread process. In gonochoristic species, the differentiation of sexes occurs through diverse mechanisms, influenced by environmental and genetic factors. In most vertebrates, a master-switch gene is responsible for triggering a sex determination network. However, only a few genes have acquired master-switch functions, and this process is associated with the evolution of sex-chromosomes, which have a significant influence in evolution. Additionally, their highly repetitive regions impose challenges for high-quality sequencing, even using high-throughput, state-of-the-art techniques. Here, we review the mechanisms involved in sex determination and their role in the evolution of species, particularly vertebrates, focusing on sex chromosomes and the challenges involved in sequencing these genomic elements. We also address the improvements provided by the growth of sequencing projects, by generating a massive number of near-gapless, telomere-to-telomere, chromosome-level, phased assemblies, increasing the number and quality of sex-chromosome sequences available for further studies.
Affiliation(s)
- Luana Ramos
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208 Porto, Portugal; Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal
| | - Agostinho Antunes
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208 Porto, Portugal; Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal.
| |
|
28
|
Hogg CJ, Ottewell K, Latch P, Rossetto M, Biggs J, Gilbert A, Richmond S, Belov K. Threatened Species Initiative: Empowering conservation action using genomic resources. Proc Natl Acad Sci U S A 2022; 119:e2115643118. [PMID: 35042806 PMCID: PMC8795520 DOI: 10.1073/pnas.2115643118] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Indexed: 12/15/2022] Open
Abstract
Globally, 15,521 animal species are listed as threatened by the International Union for the Conservation of Nature, and of these less than 3% have genomic resources that can inform conservation management. To combat this, global genome initiatives are developing genomic resources, yet production of a reference genome alone does not conserve a species. The reference genome allows us to develop a suite of tools to understand both genome-wide and functional diversity within and between species. Conservation practitioners can use these tools to inform their decision-making. But, at present there is an implementation gap between the release of genome information and the use of genomic data in applied conservation by conservation practitioners. In May 2020, we launched the Threatened Species Initiative and brought a consortium of genome biologists, population biologists, bioinformaticians, population geneticists, and ecologists together with conservation agencies across Australia, including government, zoos, and nongovernment organizations. Our objective is to create a foundation of genomic data to advance our understanding of key Australian threatened species, and ultimately empower conservation practitioners to access and apply genomic data to their decision-making processes through a web-based portal. Currently, we are developing genomic resources for 61 threatened species from a range of taxa, across Australia, with more than 130 collaborators from government, academia, and conservation organizations. Developed in direct consultation with government threatened-species managers and other conservation practitioners, herein we present our framework for meeting their needs and our systematic approach to integrating genomics into threatened species recovery.
Affiliation(s)
- Carolyn J Hogg
- School of Life & Environmental Science, University of Sydney, Sydney, NSW 2006, Australia
| | - Kym Ottewell
- Conservation Science Centre, Department of Biodiversity, Conservation, & Attractions, Kensington, WA 6151, Australia
| | - Peter Latch
- Australian Government Department of Agriculture, Water & Environment, Canberra, ACT 2600, Australia
| | - Maurizio Rossetto
- Research Centre for Ecosystem Resilience, Australian Institute of Botanical Science, The Royal Botanic Garden Sydney, Sydney, NSW 2000, Australia
| | - James Biggs
- Zoo and Aquarium Association Australasia, Mosman, NSW 2088, Australia
| | | | | | - Katherine Belov
- School of Life & Environmental Science, University of Sydney, Sydney, NSW 2006, Australia
| |
|
29
|
Peel E, Silver L, Brandies P, Hogg CJ, Belov K. A reference genome for the critically endangered woylie, Bettongia penicillata ogilbyi. GIGABYTE 2021; 2021:gigabyte35. [PMID: 36824341 PMCID: PMC9650285 DOI: 10.46471/gigabyte.35] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/15/2021] [Accepted: 12/08/2021] [Indexed: 11/09/2022] Open
Abstract
Biodiversity is declining globally, and Australia has one of the worst extinction records for mammals. The development of sequencing technologies means that genomic approaches are now available as important tools for wildlife conservation and management. Despite this, genome sequences are available for only 5% of threatened Australian species. Here we report the first reference genome for the woylie (Bettongia penicillata ogilbyi), a critically endangered marsupial from Western Australia, and the first genome within the Potoroidae family. The woylie reference genome was generated using Pacific Biosciences HiFi long-reads, resulting in a 3.39 Gbp assembly with a scaffold N50 of 6.49 Mbp and 86.5% complete mammalian BUSCOs. Assembly of a global transcriptome from pouch skin, tongue, heart and blood RNA-seq reads was used to guide annotation with Fgenesh++, resulting in the annotation of 24,655 genes. The woylie reference genome is a valuable resource for conservation, management and investigations into disease-induced decline of this critically endangered marsupial.
Affiliation(s)
- Emma Peel
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
| | - Luke Silver
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
| | - Parice Brandies
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
| | - Carolyn J. Hogg
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
| | - Katherine Belov
- School of Life and Environmental Sciences, The University of Sydney, Sydney, New South Wales, Australia
| |
|
30
|
Phenotyping in the era of genomics: MaTrics—a digital character matrix to document mammalian phenotypic traits. Mamm Biol 2021. [DOI: 10.1007/s42991-021-00192-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
A new and uniquely structured matrix of mammalian phenotypes, MaTrics (Mammalian Traits for Comparative Genomics) in a digital form is presented. By focussing on mammalian species for which genome assemblies are available, MaTrics provides an interface between mammalogy and comparative genomics. MaTrics was developed within a project aimed to find genetic causes of phenotypic traits of mammals using Forward Genomics. This approach requires genomes and comprehensive and recorded information on homologous phenotypes that are coded as discrete categories in a matrix. MaTrics is an evolving online resource providing information on phenotypic traits in numeric code; traits are coded either as absent/present or with several states as multistate. The state record for each species is linked to at least one reference (e.g., literature, photographs, histological sections, CT scans, or museum specimens) and so MaTrics contributes to digitalization of museum collections. Currently, MaTrics covers 147 mammalian species and includes 231 characters related to structure, morphology, physiology, ecology, and ethology and available in a machine actionable NEXUS-format*. Filling MaTrics revealed substantial knowledge gaps, highlighting the need for phenotyping efforts. Studies based on selected data from MaTrics and using Forward Genomics identified associations between genes and certain phenotypes ranging from lifestyles (e.g., aquatic) to dietary specializations (e.g., herbivory, carnivory). These findings motivate the expansion of phenotyping in MaTrics by filling research gaps and by adding taxa and traits. Only databases like MaTrics will provide machine actionable information on phenotypic traits, an important limitation to genomics. MaTrics is available within the data repository Morph·D·Base (www.morphdbase.de).
|
31
|
Bravo GA, Schmitt CJ, Edwards SV. What Have We Learned from the First 500 Avian Genomes? ANNUAL REVIEW OF ECOLOGY, EVOLUTION, AND SYSTEMATICS 2021. [DOI: 10.1146/annurev-ecolsys-012121-085928] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/14/2022]
Abstract
The increased capacity of DNA sequencing has significantly advanced our understanding of the phylogeny of birds and the proximate and ultimate mechanisms molding their genomic diversity. In less than a decade, the number of available avian reference genomes has increased to over 500—approximately 5% of bird diversity—placing birds in a privileged position to advance the fields of phylogenomics and comparative, functional, and population genomics. Whole-genome sequence data, as well as indels and rare genomic changes, are further resolving the avian tree of life. The accumulation of bird genomes, increasingly with long-read sequence data, greatly improves the resolution of genomic features such as germline-restricted chromosomes and the W chromosome, and is facilitating the comparative integration of genotypes and phenotypes. Community-based initiatives such as the Bird 10,000 Genomes Project and Vertebrate Genome Project are playing a fundamental role in amplifying and coalescing a vibrant international program in avian comparative genomics.
Affiliation(s)
- Gustavo A. Bravo
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts 02138, USA
| | - C. Jonathan Schmitt
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts 02138, USA
| | - Scott V. Edwards
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts 02138, USA
| |
|
32
|
Abstract
The reference human genome sequence is inarguably the most important and widely used resource in the fields of human genetics and genomics. It has transformed the conduct of biomedical sciences and brought invaluable benefits to the understanding and improvement of human health. However, the commonly used reference sequence has profound limitations, because across much of its span, it represents the sequence of just one human haplotype. This single, monoploid reference structure presents a critical barrier to representing the broad genomic diversity in the human population. In this review, we discuss the modernization of the reference human genome sequence to a more complete reference of human genomic diversity, known as a human pangenome.
Affiliation(s)
- Karen H Miga
- UC Santa Cruz Genomics Institute and Department of Biomedical Engineering, University of California, Santa Cruz, California 95064, USA
| | - Ting Wang
- Department of Genetics, Edison Family Center for Genome Sciences and Systems Biology, and McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA
| |
|
33
|
Xu Z, Dixon JR. Genome reconstruction and haplotype phasing using chromosome conformation capture methodologies. Brief Funct Genomics 2021; 19:139-150. [PMID: 31875884 DOI: 10.1093/bfgp/elz026] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/12/2019] [Revised: 09/06/2019] [Accepted: 09/15/2019] [Indexed: 12/22/2022] Open
Abstract
Genomic analysis of individuals or organisms is predicated on the availability of high-quality reference and genotype information. With the rapidly dropping costs of high-throughput DNA sequencing, this is becoming readily available for diverse organisms and for increasingly large populations of individuals. Despite these advances, there are still aspects of genome sequencing that remain challenging for existing sequencing methods. This includes the generation of long-range contiguity during genome assembly, identification of structural variants in both germline and somatic tissues, the phasing of haplotypes in diploid organisms and the resolution of genome sequence for organisms derived from complex samples. These types of information are valuable for understanding the role of genome sequence and genetic variation on genome function, and numerous approaches have been developed to address them. Recently, chromosome conformation capture (3C) experiments, such as the Hi-C assay, have emerged as powerful tools to aid in these challenges for genome reconstruction. We will review the current use of Hi-C as a tool for aiding in genome sequencing, addressing the applications, strengths, limitations and potential future directions for the use of 3C data in genome analysis. We argue that unique features of Hi-C experiments make this data type a powerful tool to address challenges in genome sequencing, and that future integration of Hi-C data with alternative sequencing assays will facilitate the continuing revolution in genomic analysis and genome sequencing.
|
34
|
Measurement of Genetic Mobility Using a Transposon-Based Marker System in Sorghum. Methods Mol Biol 2021. [PMID: 33900606 DOI: 10.1007/978-1-0716-1134-0_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Transposable elements (TEs) are ubiquitous repetitive components of eukaryotic organisms that show mobility in the genome against diverse stresses. TEs contribute considerably to the size, structure, and plasticity of genomes and also play an active role in genome evolution by helping their hosts adapt to novel conditions by conferring useful characteristics. We developed a simple and rapid method for investigation of genetic mobility and diversity among TEs in combination with a target region amplification polymorphism (TE-TRAP) marker system in gamma-irradiated sorghum mutants. The TE-TRAP marker system reveals a high level of genetic diversity, which provides a useful marker resource for genetic mobility research.
|
35
|
Khorsand P, Denti L, Bonizzoni P, Chikhi R, Hormozdiari F. Comparative genome analysis using sample-specific string detection in accurate long reads. BIOINFORMATICS ADVANCES 2021; 1:vbab005. [PMID: 36700094 PMCID: PMC9710709 DOI: 10.1093/bioadv/vbab005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/28/2023]
Abstract
Motivation: Comparative genome analysis of two or more whole-genome sequenced (WGS) samples is at the core of most applications in genomics. These include the discovery of genomic differences segregating in populations, case-control analysis in common diseases and diagnosing rare disorders. With the current progress of accurate long-read sequencing technologies (e.g. circular consensus sequencing from PacBio sequencers), we can dive into studying repeat regions of the genome (e.g. segmental duplications) and hard-to-detect variants (e.g. complex structural variants). Results: We propose a novel framework for comparative genome analysis through the discovery of strings that are specific to one genome ('samples-specific' strings). We have developed a novel, accurate and efficient computational method for the discovery of sample-specific strings between two groups of WGS samples. The proposed approach will give us the ability to perform comparative genome analysis without the need to map the reads and is not hindered by shortcomings of the reference genome and mapping algorithms. We show that the proposed approach is capable of accurately finding sample-specific strings representing nearly all variation (>98%) reported across pairs or trios of WGS samples using accurate long reads (e.g. PacBio HiFi data). Availability and implementation: Data, code and instructions for reproducing the results presented in this manuscript are publicly available at https://github.com/Parsoa/PingPong. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
| | - Luca Denti
- Department of Computational Biology, Institut Pasteur, Paris 75015, France
| | | | - Paola Bonizzoni
- Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milano, 20126, Italy
| | - Rayan Chikhi
- Department of Computational Biology, Institut Pasteur, Paris 75015, France
| | - Fereydoun Hormozdiari
- Genome Center, UC Davis, Davis, CA 95616, USA
- UC Davis MIND Institute, Sacramento, CA 95817, USA
- Department of Biochemistry and Molecular Medicine, Sacramento, UC Davis, Sacramento, CA 95817, USA
| |
|
36
|
Hoencamp C, Dudchenko O, Elbatsh AMO, Brahmachari S, Raaijmakers JA, van Schaik T, Sedeño Cacciatore Á, Contessoto VG, van Heesbeen RGHP, van den Broek B, Mhaskar AN, Teunissen H, St Hilaire BG, Weisz D, Omer AD, Pham M, Colaric Z, Yang Z, Rao SSP, Mitra N, Lui C, Yao W, Khan R, Moroz LL, Kohn A, St Leger J, Mena A, Holcroft K, Gambetta MC, Lim F, Farley E, Stein N, Haddad A, Chauss D, Mutlu AS, Wang MC, Young ND, Hildebrandt E, Cheng HH, Knight CJ, Burnham TLU, Hovel KA, Beel AJ, Mattei PJ, Kornberg RD, Warren WC, Cary G, Gómez-Skarmeta JL, Hinman V, Lindblad-Toh K, Di Palma F, Maeshima K, Multani AS, Pathak S, Nel-Themaat L, Behringer RR, Kaur P, Medema RH, van Steensel B, de Wit E, Onuchic JN, Di Pierro M, Lieberman Aiden E, Rowland BD. 3D genomics across the tree of life reveals condensin II as a determinant of architecture type. Science 2021; 372:984-989. [PMID: 34045355 PMCID: PMC8172041 DOI: 10.1126/science.abe2218] [Citation(s) in RCA: 143] [Impact Index Per Article: 35.8] [Received: 08/19/2020] [Accepted: 04/16/2021] [Indexed: 01/01/2023]
Abstract
We investigated genome folding across the eukaryotic tree of life. We find two types of three-dimensional (3D) genome architectures at the chromosome scale. Each type appears and disappears repeatedly during eukaryotic evolution. The type of genome architecture that an organism exhibits correlates with the absence of condensin II subunits. Moreover, condensin II depletion converts the architecture of the human genome to a state resembling that seen in organisms such as fungi or mosquitoes. In this state, centromeres cluster together at nucleoli, and heterochromatin domains merge. We propose a physical model in which lengthwise compaction of chromosomes by condensin II during mitosis determines chromosome-scale genome architecture, with effects that are retained during the subsequent interphase. This mechanism likely has been conserved since the last common ancestor of all eukaryotes.
Affiliation(s)
- Claire Hoencamp
- Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, Netherlands
| | - Olga Dudchenko
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA
| | - Ahmed M O Elbatsh
- Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, Netherlands
| | | | - Jonne A Raaijmakers
- Division of Cell Biology, Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, Netherlands
| | - Tom van Schaik
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, Netherlands
| | | | - Vinícius G Contessoto
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA
- Department of Physics, Institute of Biosciences, Letters and Exact Sciences, São Paulo State University (UNESP), São José do Rio Preto - SP, 15054-000, Brazil
| | - Roy G H P van Heesbeen
- Division of Cell Biology, Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, Netherlands
| | - Bram van den Broek
- BioImaging Facility, Netherlands Cancer Institute, 1066 CX Amsterdam, Netherlands
| | - Aditya N Mhaskar
- Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, Netherlands
| | - Hans Teunissen
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, Netherlands
| | - Brian Glenn St Hilaire
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - David Weisz
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Arina D Omer
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
| | - Melanie Pham
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
| | - Zane Colaric
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
| | - Zhenzhen Yang
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech, Pudong 201210, China
| | - Suhas S P Rao
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Namita Mitra
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Christopher Lui
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
| | - Weijie Yao
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
| | - Ruqayya Khan
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Leonid L Moroz
- Whitney Laboratory and Department of Neuroscience, University of Florida, Gainesville, FL 32611, USA
| | - Andrea Kohn
- Whitney Laboratory and Department of Neuroscience, University of Florida, Gainesville, FL 32611, USA
| | - Judy St Leger
- Department of Biosciences, Cornell University College of Veterinary Medicine, Ithaca, NY 14853, USA
| | | | | | | | - Fabian Lim
- Department of Medicine and Molecular Biology, University of California, San Diego, La Jolla, CA 92093, USA
| | - Emma Farley
- Department of Medicine and Molecular Biology, University of California, San Diego, La Jolla, CA 92093, USA
| | - Nils Stein
- Leibniz Institute of Plant Genetics and Crop Plant Research (IPK Gatersleben), 06466 Seeland, Germany
- Center of Integrated Breeding Research (CiBreed), Department of Crop Sciences, Georg-August-University Göttingen, 37075 Göttingen, Germany
- UWA School of Agriculture and Environment, The University of Western Australia, Perth, WA 6009, Australia
| | - Alexander Haddad
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA
| | - Daniel Chauss
- National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892, USA
| | - Ayse Sena Mutlu
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Meng C Wang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Huffington Center on Aging, Baylor College of Medicine, Houston, TX 77030, USA
- Howard Hughes Medical Institute, Baylor College of Medicine, Houston, TX 77030, USA
| | - Neil D Young
- Faculty of Veterinary and Agricultural Sciences, University of Melbourne, Parkville, VIC 3010, Australia
| | - Evin Hildebrandt
- Avian Diseases and Oncology Laboratory, US Department of Agriculture, Agricultural Research Service, East Lansing, MI 48823, USA
| | - Hans H Cheng
- Avian Diseases and Oncology Laboratory, US Department of Agriculture, Agricultural Research Service, East Lansing, MI 48823, USA
| | | | - Theresa L U Burnham
- Department of Wildlife, Fish, and Conservation Biology, University of California, Davis, Davis, CA 95616, USA
- Coastal and Marine Institute and Department of Biology, San Diego State University, San Diego, CA 92106, USA
| | - Kevin A Hovel
- Coastal and Marine Institute and Department of Biology, San Diego State University, San Diego, CA 92106, USA
| | - Andrew J Beel
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Pierre-Jean Mattei
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Roger D Kornberg
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Wesley C Warren
- Department of Animal Sciences, University of Missouri, Columbia, MO 65211, USA
| | - Gregory Cary
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
| | - José Luis Gómez-Skarmeta
- Centro Andaluz de Biología del Desarrollo CSIC, Universidad Pablo de Olavide, 41013 Sevilla, Spain
| | - Veronica Hinman
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Kerstin Lindblad-Toh
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 751 23 Uppsala, Sweden
| | - Federica Di Palma
- Department of Biological Sciences, University of East Anglia, Norwich NR4 7TJ, UK
| | - Kazuhiro Maeshima
- Genome Dynamics Laboratory, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Department of Genetics, Sokendai (Graduate University for Advanced Studies), Mishima, Shizuoka 411-8540, Japan
| | - Asha S Multani
- Department of Genetics, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Sen Pathak
- Department of Genetics, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Liesl Nel-Themaat
- Department of Genetics, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Richard R Behringer
- Department of Genetics, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Parwinder Kaur
- UWA School of Agriculture and Environment, The University of Western Australia, Perth, WA 6009, Australia
| | - René H Medema
- Division of Cell Biology, Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, Netherlands
| | - Bas van Steensel
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, Netherlands
| | - Elzo de Wit
- Division of Gene Regulation, Oncode Institute, Netherlands Cancer Institute, 1066 CX Amsterdam, Netherlands
| | - José N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA
- Departments of Physics and Astronomy, Chemistry, and Biosciences, Rice University, Houston, TX 77005, USA
| | - Michele Di Pierro
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA
- Department of Physics, Northeastern University, Boston, MA 02115, USA
| | - Erez Lieberman Aiden
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005, USA
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech, Pudong 201210, China
- UWA School of Agriculture and Environment, The University of Western Australia, Perth, WA 6009, Australia
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Benjamin D Rowland
- Division of Gene Regulation, Netherlands Cancer Institute, 1066 CX Amsterdam, Netherlands.
|
37
|
Garg S. Computational methods for chromosome-scale haplotype reconstruction. Genome Biol 2021; 22:101. [PMID: 33845884 PMCID: PMC8040228 DOI: 10.1186/s13059-021-02328-9] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 01/10/2021] [Accepted: 03/25/2021] [Indexed: 12/13/2022] Open
Abstract
High-quality chromosome-scale haplotype sequences of diploid genomes, polyploid genomes, and metagenomes provide important insights into genetic variation associated with disease and biodiversity. However, whole-genome short read sequencing does not yield haplotype information spanning whole chromosomes directly. Computational assembly of shorter haplotype fragments is required for haplotype reconstruction, which can be challenging owing to limited fragment lengths and high haplotype and repeat variability across genomes. Recent advancements in long-read and chromosome-scale sequencing technologies, alongside computational innovations, are improving the reconstruction of haplotypes at the level of whole chromosomes. Here, we review and discuss recent methodological progress and perspectives in these areas.
Affiliation(s)
- Shilpa Garg
- Department of Biology, University of Copenhagen, Copenhagen, Denmark.
|
38
|
Rhie A, McCarthy SA, Fedrigo O, Damas J, Formenti G, Koren S, Uliano-Silva M, Chow W, Fungtammasan A, Kim J, Lee C, Ko BJ, Chaisson M, Gedman GL, Cantin LJ, Thibaud-Nissen F, Haggerty L, Bista I, Smith M, Haase B, Mountcastle J, Winkler S, Paez S, Howard J, Vernes SC, Lama TM, Grutzner F, Warren WC, Balakrishnan CN, Burt D, George JM, Biegler MT, Iorns D, Digby A, Eason D, Robertson B, Edwards T, Wilkinson M, Turner G, Meyer A, Kautt AF, Franchini P, Detrich HW, Svardal H, Wagner M, Naylor GJP, Pippel M, Malinsky M, Mooney M, Simbirsky M, Hannigan BT, Pesout T, Houck M, Misuraca A, Kingan SB, Hall R, Kronenberg Z, Sović I, Dunn C, Ning Z, Hastie A, Lee J, Selvaraj S, Green RE, Putnam NH, Gut I, Ghurye J, Garrison E, Sims Y, Collins J, Pelan S, Torrance J, Tracey A, Wood J, Dagnew RE, Guan D, London SE, Clayton DF, Mello CV, Friedrich SR, Lovell PV, Osipova E, Al-Ajli FO, Secomandi S, Kim H, Theofanopoulou C, Hiller M, Zhou Y, Harris RS, Makova KD, Medvedev P, Hoffman J, Masterson P, Clark K, Martin F, Howe K, Flicek P, Walenz BP, Kwak W, Clawson H, Diekhans M, Nassar L, Paten B, Kraus RHS, Crawford AJ, Gilbert MTP, Zhang G, Venkatesh B, Murphy RW, Koepfli KP, Shapiro B, Johnson WE, Di Palma F, Marques-Bonet T, Teeling EC, Warnow T, Graves JM, Ryder OA, Haussler D, O'Brien SJ, Korlach J, Lewin HA, Howe K, Myers EW, Durbin R, Phillippy AM, Jarvis ED. Towards complete and error-free genome assemblies of all vertebrate species. Nature 2021; 592:737-746. [PMID: 33911273 PMCID: PMC8081667 DOI: 10.1038/s41586-021-03451-0] [Citation(s) in RCA: 1111] [Impact Index Per Article: 277.8] [Received: 05/22/2020] [Accepted: 03/12/2021] [Indexed: 02/02/2023]
Abstract
High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species. To address this issue, the international Genome 10K (G10K) consortium has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.
Affiliation(s)
- Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Shane A McCarthy
- Department of Genetics, University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
| | - Olivier Fedrigo
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
| | - Joana Damas
- The Genome Center, University of California Davis, Davis, CA, USA
| | - Giulio Formenti
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Marcela Uliano-Silva
- Leibniz Institute for Zoo and Wildlife Research, Department of Evolutionary Genetics, Berlin, Germany
- Berlin Center for Genomics in Biodiversity Research, Berlin, Germany
| | | | | | - Juwan Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - Chul Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
| | - Byung June Ko
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
| | - Mark Chaisson
- University of Southern California, Los Angeles, CA, USA
| | - Gregory L Gedman
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Lindsey J Cantin
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - Francoise Thibaud-Nissen
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
| | - Leanne Haggerty
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Iliana Bista
- Department of Genetics, University of Cambridge, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
| | | | - Bettina Haase
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
| | | | - Sylke Winkler
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- DRESDEN-concept Genome Center, Dresden, Germany
| | - Sadye Paez
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | | | - Sonja C Vernes
- Neurogenetics of Vocal Communication Group, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
- School of Biology, University of St Andrews, St Andrews, UK
| | - Tanya M Lama
- University of Massachusetts Cooperative Fish and Wildlife Research Unit, Amherst, MA, USA
| | - Frank Grutzner
- School of Biological Science, The Environment Institute, University of Adelaide, Adelaide, South Australia, Australia
| | - Wesley C Warren
- Bond Life Sciences Center, University of Missouri, Columbia, MO, USA
| | | | - Dave Burt
- UQ Genomics, University of Queensland, Brisbane, Queensland, Australia
| | - Julia M George
- Department of Biological Sciences, Clemson University, Clemson, SC, USA
| | - Matthew T Biegler
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA
| | - David Iorns
- The Genetic Rescue Foundation, Wellington, New Zealand
| | - Andrew Digby
- Kākāpō Recovery, Department of Conservation, Invercargill, New Zealand
| | - Daryl Eason
- Kākāpō Recovery, Department of Conservation, Invercargill, New Zealand
| | - Bruce Robertson
- Department of Zoology, University of Otago, Dunedin, New Zealand
| | | | - Mark Wilkinson
- Department of Life Sciences, Natural History Museum, London, UK
| | - George Turner
- School of Natural Sciences, Bangor University, Gwynedd, UK
| | - Axel Meyer
- Department of Biology, University of Konstanz, Konstanz, Germany
| | - Andreas F Kautt
- Department of Biology, University of Konstanz, Konstanz, Germany
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - Paolo Franchini
- Department of Biology, University of Konstanz, Konstanz, Germany
| | - H William Detrich
- Department of Marine and Environmental Sciences, Northeastern University Marine Science Center, Nahant, MA, USA
| | - Hannes Svardal
- Department of Biology, University of Antwerp, Antwerp, Belgium
- Naturalis Biodiversity Center, Leiden, The Netherlands
| | - Maximilian Wagner
- Institute of Biology, Karl-Franzens University of Graz, Graz, Austria
| | - Gavin J P Naylor
- Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
| | - Martin Pippel
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Center for Systems Biology, Dresden, Germany
| | - Milan Malinsky
- Wellcome Sanger Institute, Cambridge, UK
- Zoological Institute, University of Basel, Basel, Switzerland
| | | | | | | | - Trevor Pesout
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | | | | | | | | | | | - Ivan Sović
- Pacific Biosciences, Menlo Park, CA, USA
- Digital BioLogic, Ivanić-Grad, Croatia
| | | | - Zemin Ning
- Wellcome Sanger Institute, Cambridge, UK
| | | | - Joyce Lee
- Bionano Genomics, San Diego, CA, USA
| | | | - Richard E Green
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Dovetail Genomics, Santa Cruz, CA, USA
| | | | - Ivo Gut
- CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
| | - Jay Ghurye
- Dovetail Genomics, Santa Cruz, CA, USA
- Department of Computer Science, University of Maryland College Park, College Park, MD, USA
| | - Erik Garrison
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Ying Sims
- Wellcome Sanger Institute, Cambridge, UK
| | | | | | | | | | | | | | - Dengfeng Guan
- Department of Genetics, University of Cambridge, Cambridge, UK
- School of Computer Science and Technology, Center for Bioinformatics, Harbin Institute of Technology, Harbin, China
| | - Sarah E London
- Department of Psychology, Institute for Mind and Biology, University of Chicago, Chicago, IL, USA
| | - David F Clayton
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, USA
| | - Claudio V Mello
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
| | - Samantha R Friedrich
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
| | - Peter V Lovell
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
| | - Ekaterina Osipova
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Center for Systems Biology, Dresden, Germany
- Max Planck Institute for the Physics of Complex Systems, Dresden, Germany
| | - Farooq O Al-Ajli
- Monash University Malaysia Genomics Facility, School of Science, Selangor Darul Ehsan, Malaysia
- Tropical Medicine and Biology Multidisciplinary Platform, Monash University Malaysia, Selangor Darul Ehsan, Malaysia
- Qatar Falcon Genome Project, Doha, Qatar
| | | | - Heebal Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul, Republic of Korea
- eGnome, Inc., Seoul, Republic of Korea
| | | | - Michael Hiller
- LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany
- Senckenberg Research Institute, Frankfurt, Germany
- Goethe-University, Faculty of Biosciences, Frankfurt, Germany
| | | | - Robert S Harris
- Department of Biology, Pennsylvania State University, University Park, PA, USA
| | - Kateryna D Makova
- Department of Biology, Pennsylvania State University, University Park, PA, USA
- Center for Medical Genomics, Pennsylvania State University, University Park, PA, USA
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
| | - Paul Medvedev
- Center for Medical Genomics, Pennsylvania State University, University Park, PA, USA
- Center for Computational Biology and Bioinformatics, Pennsylvania State University, University Park, PA, USA
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, USA
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA
| | - Jinna Hoffman
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
| | - Patrick Masterson
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
| | - Karen Clark
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, USA
| | - Fergal Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Kevin Howe
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Brian P Walenz
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Woori Kwak
- eGnome, Inc., Seoul, Republic of Korea
- Hoonygen, Seoul, Korea
| | - Hiram Clawson
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Luis Nassar
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
| | - Robert H S Kraus
- Department of Biology, University of Konstanz, Konstanz, Germany
- Department of Migration, Max Planck Institute of Animal Behavior, Radolfzell, Germany
| | - Andrew J Crawford
- Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
| | - M Thomas P Gilbert
- Center for Evolutionary Hologenomics, The GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
- University Museum, NTNU, Trondheim, Norway
| | - Guojie Zhang
- China National Genebank, BGI-Shenzhen, Shenzhen, China
- Villum Center for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
| | - Byrappa Venkatesh
- Institute of Molecular and Cell Biology, A*STAR, Biopolis, Singapore, Singapore
| | - Robert W Murphy
- Centre for Biodiversity, Royal Ontario Museum, Toronto, Ontario, Canada
| | - Klaus-Peter Koepfli
- Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Washington, DC, USA
| | - Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Warren E Johnson
- Smithsonian Conservation Biology Institute, Center for Species Survival, National Zoological Park, Washington, DC, USA
- The Walter Reed Biosystematics Unit, Museum Support Center MRC-534, Smithsonian Institution, Suitland, MD, USA
- Walter Reed Army Institute of Research, Silver Spring, MD, USA
| | - Federica Di Palma
- Department of Biological Sciences, Earlham Institute, University of East Anglia, Norwich, UK
| | - Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), PRBB, Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Emma C Teeling
- School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
| | - Tandy Warnow
- Department of Computer Science, The University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | | | - Oliver A Ryder
- San Diego Zoo Global, Escondido, CA, USA
- Department of Evolution, Behavior, and Ecology, University of California San Diego, La Jolla, CA, USA
| | - David Haussler
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Stephen J O'Brien
- Laboratory of Genomics Diversity-Center for Computer Technologies, ITMO University, St. Petersburg, Russian Federation
- Guy Harvey Oceanographic Center, Halmos College of Natural Sciences and Oceanography, Nova Southeastern University, Fort Lauderdale, FL, USA
| | | | - Harris A Lewin
- The Genome Center, University of California Davis, Davis, CA, USA
- Department of Evolution and Ecology, University of California Davis, Davis, CA, USA
- John Muir Institute for the Environment, University of California Davis, Davis, CA, USA
| | | | - Eugene W Myers
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany.
- Center for Systems Biology, Dresden, Germany.
- Faculty of Computer Science, Technical University Dresden, Dresden, Germany.
| | - Richard Durbin
- Department of Genetics, University of Cambridge, Cambridge, UK.
- Wellcome Sanger Institute, Cambridge, UK.
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA.
| | - Erich D Jarvis
- Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA.
- Laboratory of Neurogenetics of Language, The Rockefeller University, New York, NY, USA.
- Howard Hughes Medical Institute, Chevy Chase, MD, USA.
|
39
|
Wang J, Itgen MW, Wang H, Gong Y, Jiang J, Li J, Sun C, Sessions SK, Mueller RL. Gigantic Genomes Provide Empirical Tests of Transposable Element Dynamics Models. Genomics Proteomics Bioinformatics 2021; 19:123-139. [PMID: 33677107 PMCID: PMC8498967 DOI: 10.1016/j.gpb.2020.11.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/25/2020] [Revised: 11/29/2020] [Accepted: 11/30/2020] [Indexed: 12/12/2022]
Abstract
Transposable elements (TEs) are a major determinant of eukaryotic genome size. The collective properties of a genomic TE community reveal the history of TE/host evolutionary dynamics and impact present-day host structure and function, from genome to organism levels. In rare cases, TE community/genome size has greatly expanded in animals, associated with increased cell size and changes to anatomy and physiology. Here, we characterize the TE landscape of the genome and transcriptome in an amphibian with a giant genome, the caecilian Ichthyophis bannanicus, which we show has a genome size of 12.2 Gb. Amphibians are an important model system because the clade includes independent cases of genomic gigantism. The I. bannanicus genome differs compositionally from other giant amphibian genomes, but shares a low rate of ectopic recombination-mediated deletion. We examine TE activity using expression and divergence plots; TEs account for 15% of somatic transcription, and most superfamilies appear active. We quantify TE diversity in the caecilian, as well as other vertebrates with a range of genome sizes, using diversity indices commonly applied in community ecology. We synthesize previous models that integrate TE abundance, diversity, and activity, and test whether the caecilian meets model predictions for genomes with high TE abundance. We propose thorough, consistent characterization of TEs to strengthen future comparative analyses. Such analyses will ultimately be required to reveal whether the divergent TE assemblages found across convergent gigantic genomes reflect fundamental shared features of TE/host genome evolutionary dynamics.
Affiliation(s)
- Jie Wang
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China.
| | - Michael W Itgen
- Department of Biology, Colorado State University, Fort Collins, CO 80523, USA
| | - Huiju Wang
- School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China
| | - Yuzhou Gong
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
| | - Jianping Jiang
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
| | - Jiatang Li
- CAS Key Laboratory of Mountain Ecological Restoration and Bioresource Utilization & Ecological Restoration and Biodiversity Conservation Key Laboratory of Sichuan Province, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China
| | - Cheng Sun
- Institute of Apicultural Research, Chinese Academy of Agricultural Sciences, Beijing 100093, China
|
40
|
Brandies PA, Hogg CJ. Ten simple rules for getting started with command-line bioinformatics. PLoS Comput Biol 2021; 17:e1008645. [PMID: 33600404 PMCID: PMC7891784 DOI: 10.1371/journal.pcbi.1008645] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022] Open
Affiliation(s)
- Parice A. Brandies
- School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, Sydney, New South Wales, Australia
| | - Carolyn J. Hogg
- School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, Sydney, New South Wales, Australia
|
41
|
Peel E, Frankenberg S, Hogg CJ, Pask A, Belov K. Annotation of immune genes in the extinct thylacine (Thylacinus cynocephalus). Immunogenetics 2021; 73:263-275. [PMID: 33544183 DOI: 10.1007/s00251-020-01197-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 08/31/2020] [Accepted: 11/24/2020] [Indexed: 11/28/2022]
Abstract
Advances in genome sequencing technology have enabled genomes of extinct species to be sequenced. However, given the fragmented nature of these genome assemblies, it is not clear whether it is possible to comprehensively annotate highly variable and repetitive genes such as those involved in immunity. As such, immune genes have only been investigated in a handful of extinct genomes, mainly in human lineages. In 2018 the genome of the thylacine (Thylacinus cynocephalus), a carnivorous marsupial from Tasmania that went extinct in 1936, was sequenced. Here we attempt to characterise the immune repertoire of the thylacine and determine similarity to its closest relative with a genome available, the Tasmanian devil (Sarcophilus harrisii), as well as other marsupials. Members from all major immune gene families were identified. However, variable regions could not be characterised, and complex families such as the major histocompatibility complex (MHC) were highly fragmented and located across multiple small scaffolds. As such, at a gene level we were unable to reconstruct full-length coding sequences for the majority of thylacine immune genes. Despite this, we identified genes encoding functionally important receptors and immune effector molecules, which suggests the functional capacity of the thylacine immune system was similar to other mammals. However, the high number of partial immune gene sequences identified limits our ability to reconstruct an accurate picture of the thylacine immune repertoire.
Affiliation(s)
- Emma Peel
- School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, Sydney, NSW, Australia
| | | | - Carolyn J Hogg
- School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, Sydney, NSW, Australia
| | - Andrew Pask
- School of BioSciences, The University of Melbourne, Vic, Australia
| | - Katherine Belov
- School of Life and Environmental Sciences, Faculty of Science, The University of Sydney, Sydney, NSW, Australia.
|
42
|
Santana FL, Estrada K, Ortiz E, Corzo G. Reptilian β-defensins: Expanding the repertoire of known crocodylian peptides. Peptides 2021; 136:170473. [PMID: 33309943 DOI: 10.1016/j.peptides.2020.170473] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/10/2020] [Revised: 11/26/2020] [Accepted: 12/03/2020] [Indexed: 01/31/2023]
Abstract
One of the major families of host defense peptides (HDPs) in vertebrates are β-defensins. They constitute important components of innate immunity and have remained an interesting topic of research for more than two decades. While many β-defensin sequences in mammals and birds have been identified and their properties and functions characterized, β-defensin peptides from other groups of vertebrates, particularly reptiles, are still largely unexplored. In this review, we focus on reptilian β-defensins and summarize different aspects of their biology, such as their genomic organization, evolution, structure, and biological activities. Reptilian β-defensin genes exhibit similar genomic organization to birds and their number and gene structure are variable among different species. During the evolution of reptiles, several gene duplication and deletion events have occurred and the functional diversification of β-defensins has been mainly driven by positive selection. These peptides display broad antimicrobial activity in vitro, but a deeper understanding of their mechanisms of action in vivo, including their role as immunomodulators, is still lacking. Reptilian β-defensins constitute unique polypeptide sequences to expand our current understanding of innate immunity in these animals and elucidate core biological functions of this family of HDPs across amniotes.
Affiliation(s)
- Felix L Santana
- Departamento de Medicina Molecular y Bioprocesos, Instituto de Biotecnología, Universidad Nacional Autónoma de México, A.P. 510-3, Cuernavaca Mor., 62250, Mexico.
| | - Karel Estrada
- Unidad de Secuenciación Masiva y Bioinformática, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, Mexico
| | - Ernesto Ortiz
- Departamento de Medicina Molecular y Bioprocesos, Instituto de Biotecnología, Universidad Nacional Autónoma de México, A.P. 510-3, Cuernavaca Mor., 62250, Mexico
| | - Gerardo Corzo
- Departamento de Medicina Molecular y Bioprocesos, Instituto de Biotecnología, Universidad Nacional Autónoma de México, A.P. 510-3, Cuernavaca Mor., 62250, Mexico.
|
43
|
Abstract
Understanding the genetic mechanisms underlying particular adaptations/phenotypes of organisms is one of the core issues of evolutionary biology. The use of genomic data has greatly advanced our understanding of this issue, as well as other aspects of evolutionary biology, including molecular adaptation, speciation, and even conservation of endangered species. Despite these well-recognized advantages, the use of genomic data is still limited in non-mammalian vertebrate groups, partly due to the difficulties in assembling large or highly heterozygous genomes. Although this is particularly the case for amphibians, several comparative and population genomic analyses have shed light on the speciation and adaptation processes of amphibians in a complex landscape, offering hope for a wider application of genomics in groups of organisms previously believed to be challenging. At the same time, these pioneering studies also reveal numerous challenges in studying the molecular adaptations and/or phenotypic evolutionary mechanisms of amphibians. In this review, we first summarize recent progress in the study of adaptive evolution of amphibians based on genomic data, and then give perspectives on how to effectively identify key pathways underlying the evolution of complex traits in the genomic era, as well as directions for future research.
Affiliation(s)
- Yan-Bo Sun
- Laboratory of Ecology and Evolutionary Biology, Yunnan University, Kunming, Yunnan 650091, China; State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
| | - Yi Zhang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
| | - Kai Wang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China; Sam Noble Oklahoma Museum of Natural History and Department of Biology, University of Oklahoma, Norman, Oklahoma 73072, USA
|
44
|
Murphy WJ, Foley NM, Bredemeyer KR, Gatesy J, Springer MS. Phylogenomics and the Genetic Architecture of the Placental Mammal Radiation. Annu Rev Anim Biosci 2020; 9:29-53. [PMID: 33228377 DOI: 10.1146/annurev-animal-061220-023149] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 11/09/2022]
Abstract
The genomes of placental mammals are being sequenced at an unprecedented rate. Alignments of hundreds, and one day thousands, of genomes spanning the rich living and extinct diversity of species offer unparalleled power to resolve phylogenetic controversies, identify genomic innovations of adaptation, and dissect the genetic architecture of reproductive isolation. We highlight outstanding questions about the earliest phases of placental mammal diversification and the promise of newer methods, as well as remaining challenges, toward using whole genome data to resolve placental mammal phylogeny. The next phase of mammalian comparative genomics will see the completion and application of finished-quality, gapless genome assemblies from many ordinal lineages and closely related species. Interspecific comparisons between the most hypervariable genomic loci will likely reveal large, but heretofore mostly underappreciated, effects on population divergence, morphological innovation, and the origin of new species.
Affiliation(s)
- William J Murphy
- Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
| | - Nicole M Foley
- Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
| | - Kevin R Bredemeyer
- Veterinary Integrative Biosciences, Texas A&M University, College Station, Texas 77843, USA
| | - John Gatesy
- Division of Vertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
| | - Mark S Springer
- Department of Evolution, Ecology and Organismal Biology, University of California, Riverside, California 92521, USA
|
45
|
Feng S, Stiller J, Deng Y, Armstrong J, Fang Q, Reeve AH, Xie D, Chen G, Guo C, Faircloth BC, Petersen B, Wang Z, Zhou Q, Diekhans M, Chen W, Andreu-Sánchez S, Margaryan A, Howard JT, Parent C, Pacheco G, Sinding MHS, Puetz L, Cavill E, Ribeiro ÂM, Eckhart L, Fjeldså J, Hosner PA, Brumfield RT, Christidis L, Bertelsen MF, Sicheritz-Ponten T, Tietze DT, Robertson BC, Song G, Borgia G, Claramunt S, Lovette IJ, Cowen SJ, Njoroge P, Dumbacher JP, Ryder OA, Fuchs J, Bunce M, Burt DW, Cracraft J, Meng G, Hackett SJ, Ryan PG, Jønsson KA, Jamieson IG, da Fonseca RR, Braun EL, Houde P, Mirarab S, Suh A, Hansson B, Ponnikas S, Sigeman H, Stervander M, Frandsen PB, van der Zwan H, van der Sluis R, Visser C, Balakrishnan CN, Clark AG, Fitzpatrick JW, Bowman R, Chen N, Cloutier A, Sackton TB, Edwards SV, Foote DJ, Shakya SB, Sheldon FH, Vignal A, Soares AER, Shapiro B, González-Solís J, Ferrer-Obiol J, Rozas J, Riutort M, Tigano A, Friesen V, Dalén L, Urrutia AO, Székely T, Liu Y, Campana MG, Corvelo A, Fleischer RC, Rutherford KM, Gemmell NJ, Dussex N, Mouritsen H, Thiele N, Delmore K, Liedvogel M, Franke A, Hoeppner MP, Krone O, Fudickar AM, Milá B, Ketterson ED, Fidler AE, Friis G, Parody-Merino ÁM, Battley PF, Cox MP, Lima NCB, Prosdocimi F, Parchman TL, Schlinger BA, Loiselle BA, Blake JG, Lim HC, Day LB, Fuxjager MJ, Baldwin MW, Braun MJ, Wirthlin M, Dikow RB, Ryder TB, Camenisch G, Keller LF, DaCosta JM, Hauber ME, Louder MIM, Witt CC, McGuire JA, Mudge J, Megna LC, Carling MD, Wang B, Taylor SA, Del-Rio G, Aleixo A, Vasconcelos ATR, Mello CV, Weir JT, Haussler D, Li Q, Yang H, Wang J, Lei F, Rahbek C, Gilbert MTP, Graves GR, Jarvis ED, Paten B, Zhang G. Dense sampling of bird diversity increases power of comparative genomics. Nature 2020; 587:252-257. 
[PMID: 33177665 PMCID: PMC7759463 DOI: 10.1038/s41586-020-2873-9] [Citation(s) in RCA: 206] [Impact Index Per Article: 41.2] [Received: 08/09/2019] [Accepted: 07/27/2020] [Indexed: 12/13/2022]
Abstract
Whole-genome sequencing projects are increasingly populating the tree of life and characterizing biodiversity [1-4]. Sparse taxon sampling has previously been proposed to confound phylogenetic inference [5], and captures only a fraction of the genomic diversity. Here we report a substantial step towards the dense representation of avian phylogenetic and molecular diversity, by analysing 363 genomes from 92.4% of bird families, including 267 newly sequenced genomes produced for phase II of the Bird 10,000 Genomes (B10K) Project. We use this comparative genome dataset in combination with a pipeline that leverages a reference-free whole-genome alignment to identify orthologous regions in greater numbers than has previously been possible and to recognize genomic novelties in particular bird lineages. The densely sampled alignment provides a single-base-pair map of selection, has more than doubled the fraction of bases that are confidently predicted to be under conservation and reveals extensive patterns of weak selection in predominantly non-coding DNA. Our results demonstrate that increasing the diversity of genomes used in comparative studies can reveal more shared and lineage-specific variation, and improve the investigation of genomic characteristics. We anticipate that this genomic resource will offer new perspectives on evolutionary processes in cross-species comparative analyses and assist in efforts to conserve species.
Affiliation(s)
- Shaohong Feng
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
- BGI-Shenzhen, Shenzhen, China
| | - Josefin Stiller
- Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Yuan Deng
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
- Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Joel Armstrong
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
| | - Qi Fang
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
- Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Andrew Hart Reeve
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
| | - Duo Xie
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, China
| | - Guangji Chen
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
- BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, China
| | - Chunxue Guo
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
| | - Brant C Faircloth
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Museum of Natural Science, Louisiana State University, Baton Rouge, LA, USA
| | - Bent Petersen
- Centre of Excellence for Omics-Driven Computational Biodiscovery (COMBio), Faculty of Applied Sciences, AIMST University, Kedah, Malaysia
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Zongji Wang
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
- MOE Laboratory of Biosystems Homeostasis and Protection, Life Sciences Institute, Zhejiang University, Hangzhou, China
- Department of Neuroscience and Developmental Biology, University of Vienna, Vienna, Austria
| | - Qi Zhou
- MOE Laboratory of Biosystems Homeostasis and Protection, Life Sciences Institute, Zhejiang University, Hangzhou, China
- Department of Neuroscience and Developmental Biology, University of Vienna, Vienna, Austria
- Center for Reproductive Medicine, The 2nd Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Mark Diekhans
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
| | - Wanjun Chen
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
| | - Sergio Andreu-Sánchez
- Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Ashot Margaryan
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Institute of Molecular Biology, National Academy of Sciences, Yerevan, Armenia
| | | | | | - George Pacheco
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Mikkel-Holger S Sinding
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Lara Puetz
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Emily Cavill
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Ângela M Ribeiro
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
| | - Leopold Eckhart
- Department of Dermatology, Medical University of Vienna, Vienna, Austria
| | - Jon Fjeldså
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Center for Macroecology, Evolution, and Climate, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
| | - Peter A Hosner
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
- Center for Macroecology, Evolution, and Climate, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
| | - Robb T Brumfield
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Museum of Natural Science, Louisiana State University, Baton Rouge, LA, USA
| | - Les Christidis
- Southern Cross University, Coffs Harbour, New South Wales, Australia
| | - Mads F Bertelsen
- Centre for Zoo and Wild Animal Health, Copenhagen Zoo, Frederiksberg, Denmark
| | - Thomas Sicheritz-Ponten
- Centre of Excellence for Omics-Driven Computational Biodiscovery (COMBio), Faculty of Applied Sciences, AIMST University, Kedah, Malaysia
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | | | | | - Gang Song
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Environmental Futures Research Institute, Griffith University, Nathan, Queensland, Australia
| | - Gerald Borgia
- Department of Biology, University of Maryland, College Park, MD, USA
| | - Santiago Claramunt
- Department of Natural History, Royal Ontario Museum, Toronto, Ontario, Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
| | - Irby J Lovette
- Cornell Lab of Ornithology, Cornell University, Ithaca, NY, USA
| | - Saul J Cowen
- Biodiversity and Conservation Science, Department of Biodiversity Conservation and Attractions, Perth, Western Australia, Australia
| | - Peter Njoroge
- Ornithology Section, Zoology Department, National Museums of Kenya, Nairobi, Kenya
| | | | - Oliver A Ryder
- San Diego Zoo Institute for Conservation Research, Escondido, CA, USA
- Evolution, Behavior, and Ecology, Division of Biology, University of California San Diego, La Jolla, CA, USA
| | - Jérôme Fuchs
- Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
| | - Michael Bunce
- Trace and Environmental DNA (TrEnD) Laboratory, School of Molecular and Life Sciences, Curtin University, Western Australia, Perth, Australia
| | - David W Burt
- UQ Genomics, University of Queensland, Brisbane, Queensland, Australia
| | - Joel Cracraft
- Department of Ornithology, American Museum of Natural History, New York, NY, USA
| | | | - Shannon J Hackett
- Integrative Research Center, Field Museum of Natural History, Chicago, IL, USA
| | - Peter G Ryan
- FitzPatrick Institute of African Ornithology, University of Cape Town, Cape Town, South Africa
| | - Knud Andreas Jønsson
- Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
| | - Ian G Jamieson
- Department of Zoology, University of Otago, Dunedin, New Zealand
| | - Rute R da Fonseca
- Center for Macroecology, Evolution, and Climate, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
| | - Edward L Braun
- Department of Biology, University of Florida, Gainesville, FL, USA
| | - Peter Houde
- Department of Biology, New Mexico State University, Las Cruces, NM, USA
| | - Siavash Mirarab
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
| | - Alexander Suh
- Department of Ecology and Genetics - Evolutionary Biology, Evolutionary Biology Centre (EBC), Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Department of Organismal Biology - Systematic Biology, Evolutionary Biology Centre (EBC), Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- School of Biological Sciences, University of East Anglia, Norwich, UK
| | - Bengt Hansson
- Department of Biology, Lund University, Lund, Sweden
| | - Suvi Ponnikas
- Department of Biology, Lund University, Lund, Sweden
| | - Hanna Sigeman
- Department of Biology, Lund University, Lund, Sweden
| | - Martin Stervander
- Department of Biology, Lund University, Lund, Sweden
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR, USA
| | - Paul B Frandsen
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, UT, USA
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
| | | | - Rencia van der Sluis
- Focus Area for Human Metabolomics, North-West University, Potchefstroom, South Africa
| | - Carina Visser
- Department of Animal Sciences, University of Pretoria, Pretoria, South Africa
| | | | - Andrew G Clark
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
| | | | - Reed Bowman
- Avian Ecology Program, Archbold Biological Station, Venus, FL, USA
| | - Nancy Chen
- Department of Biology, University of Rochester, Rochester, NY, USA
| | - Alison Cloutier
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
| | | | - Scott V Edwards
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
- Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA
| | - Dustin J Foote
- Department of Biology, East Carolina University, Greenville, NC, USA
- Sylvan Heights Bird Park, Scotland Neck, NC, USA
| | - Subir B Shakya
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Museum of Natural Science, Louisiana State University, Baton Rouge, LA, USA
| | - Frederick H Sheldon
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Museum of Natural Science, Louisiana State University, Baton Rouge, LA, USA
| | - Alain Vignal
- GenPhySE, INRA, INPT, INP-ENVT, Université de Toulouse, Castanet-Tolosan, France
| | - André E R Soares
- Laboratório Nacional de Computação Científica, Petrópolis, Brazil
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Beth Shapiro
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, CA, USA
- Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Jacob González-Solís
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Departament de Biologia Evolutiva, Ecologia i Ciències Ambientals (BEECA), Universitat de Barcelona, Barcelona, Spain
| | - Joan Ferrer-Obiol
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona, Spain
| | - Julio Rozas
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona, Spain
| | - Marta Riutort
- Institut de Recerca de la Biodiversitat (IRBio), Universitat de Barcelona, Barcelona, Spain
- Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona, Spain
| | - Anna Tigano
- Department of Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH, USA
- Department of Biology, Queen's University, Kingston, Ontario, Canada
| | - Vicki Friesen
- Department of Biology, Queen's University, Kingston, Ontario, Canada
| | - Love Dalén
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Centre for Palaeogenetics, Stockholm, Sweden
| | - Araxi O Urrutia
- Milner Centre for Evolution, University of Bath, Bath, UK
- Instituto de Ecologia, UNAM, Mexico City, Mexico
| | - Tamás Székely
- Milner Centre for Evolution, University of Bath, Bath, UK
| | - Yang Liu
- State Key Laboratory of Biocontrol, School of Ecology, Sun Yat-sen University, Guangzhou, China
| | - Michael G Campana
- Center for Conservation Genomics, Smithsonian Conservation Biology Institute, Smithsonian Institution, Washington, DC, USA
| | | | - Robert C Fleischer
- Center for Conservation Genomics, Smithsonian Conservation Biology Institute, Smithsonian Institution, Washington, DC, USA
| | - Kim M Rutherford
- Department of Anatomy, University of Otago, Dunedin, New Zealand
| | - Neil J Gemmell
- Department of Anatomy, University of Otago, Dunedin, New Zealand
| | - Nicolas Dussex
- Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden
- Centre for Palaeogenetics, Stockholm, Sweden
- Department of Anatomy, University of Otago, Dunedin, New Zealand
| | - Henrik Mouritsen
- AG Neurosensory Sciences, Institut für Biologie und Umweltwissenschaften, University of Oldenburg, Oldenburg, Germany
| | - Nadine Thiele
- AG Neurosensory Sciences, Institut für Biologie und Umweltwissenschaften, University of Oldenburg, Oldenburg, Germany
| | - Kira Delmore
- Biology Department, Texas A&M University, College Station, TX, USA
- MPRG Behavioural Genomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Miriam Liedvogel
- MPRG Behavioural Genomics, Max Planck Institute for Evolutionary Biology, Plön, Germany
| | - Andre Franke
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
| | - Marc P Hoeppner
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
| | - Oliver Krone
- Department of Wildlife Diseases, Leibniz Institute for Zoo and Wildlife Research, Berlin, Germany
| | - Adam M Fudickar
- Environmental Resilience Institute, Indiana University, Bloomington, IN, USA
| | - Borja Milá
- National Museum of Natural Sciences, Spanish National Research Council (CSIC), Madrid, Spain
| | | | - Andrew Eric Fidler
- Institute of Marine Science, University of Auckland, Auckland, New Zealand
| | - Guillermo Friis
- Center for Genomics and Systems Biology, Department of Biology, New York University - Abu Dhabi, Abu Dhabi, UAE
| | | | - Phil F Battley
- Wildlife and Ecology Group, Massey University, Palmerston North, New Zealand
| | - Murray P Cox
- School of Fundamental Sciences, Massey University, Palmerston North, New Zealand
| | - Nicholas Costa Barroso Lima
- Laboratório Nacional de Computação Científica, Petrópolis, Brazil
- Departamento de Bioquímica e Biologia Molecular, Centro de Ciências, Universidade Federal do Ceará, Fortaleza, Brazil
| | - Francisco Prosdocimi
- Laboratório de Genômica e Biodiversidade, Instituto de Bioquímica Médica Leopoldo de Meis, Rio de Janeiro, Brazil
| | | | - Barney A Schlinger
- Department of Integrative Biology and Physiology, UCLA, Los Angeles, CA, USA
- Smithsonian Tropical Research Institute, Panama City, Panama
| | - Bette A Loiselle
- Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA
- Center for Latin American Studies, University of Florida, Gainesville, FL, USA
| | - John G Blake
- Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA
| | - Haw Chuan Lim
- Center for Conservation Genomics, Smithsonian Conservation Biology Institute, Smithsonian Institution, Washington, DC, USA
- Department of Biology, George Mason University, Fairfax, VA, USA
| | - Lainy B Day
- Department of Biology and Neuroscience Minor, University of Mississippi, University, MS, USA
| | - Matthew J Fuxjager
- Department of Ecology and Evolutionary Biology, Brown University, Providence, RI, USA
| | | | - Michael J Braun
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Behavior, Ecology, Evolution and Systematics Program, University of Maryland, College Park, MD, USA
| | - Morgan Wirthlin
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Rebecca B Dikow
- Data Science Lab, Office of the Chief Information Officer, Smithsonian Institution, Washington, DC, USA
| | - T Brandt Ryder
- Migratory Bird Center, Smithsonian National Zoological Park and Conservation Biology Institute, Washington, DC, USA
| | - Glauco Camenisch
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
| | - Lukas F Keller
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
| | | | - Mark E Hauber
- Department of Evolution, Ecology, and Behavior, School of Integrative Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Matthew I M Louder
- Department of Biology, East Carolina University, Greenville, NC, USA
- Department of Evolution, Ecology, and Behavior, School of Integrative Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- International Research Center for Neurointelligence, University of Tokyo, Tokyo, Japan
| | - Christopher C Witt
- Museum of Southwestern Biology, Department of Biology, University of New Mexico, Albuquerque, NM, USA
| | - Jimmy A McGuire
- Museum of Vertebrate Zoology, Department of Integrative Biology, University of California, Berkeley, Berkeley, CA, USA
| | - Joann Mudge
- National Center for Genome Resources, Santa Fe, NM, USA
| | - Libby C Megna
- Department of Zoology and Physiology, University of Wyoming, Laramie, WY, USA
| | - Matthew D Carling
- Department of Zoology and Physiology, University of Wyoming, Laramie, WY, USA
| | - Biao Wang
- School of BioSciences, The University of Melbourne, Melbourne, Victoria, Australia
| | - Scott A Taylor
- Department of Ecology and Evolutionary Biology, University of Colorado Boulder, Boulder, CO, USA
| | - Glaucia Del-Rio
- Museum of Natural Science, Louisiana State University, Baton Rouge, LA, USA
| | - Alexandre Aleixo
- Finnish Museum of Natural History, University of Helsinki, Helsinki, Finland
| | | | - Claudio V Mello
- Department of Behavioral Neuroscience, Oregon Health and Science University, Portland, OR, USA
| | - Jason T Weir
- Department of Natural History, Royal Ontario Museum, Toronto, Ontario, Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Department of Biological Sciences, University of Toronto Scarborough, Toronto, Ontario, Canada
| | - David Haussler
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
| | - Qiye Li
- China National GeneBank, BGI-Shenzhen, Shenzhen, China
- BGI-Shenzhen, Shenzhen, China
| | - Huanming Yang
- BGI-Shenzhen, Shenzhen, China
- James D. Watson Institute of Genome Sciences, Hangzhou, China
| | | | - Fumin Lei
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
| | - Carsten Rahbek
- Center for Macroecology, Evolution, and Climate, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
- Danish Institute for Advanced Study, University of Southern Denmark, Odense, Denmark
- Institute of Ecology, Peking University, Beijing, China
- Department of Life Sciences, Imperial College London, Ascot, UK
| | - M Thomas P Gilbert
- Section for Evolutionary Genomics, The GLOBE Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- University Museum, Norwegian University of Science and Technology, Trondheim, Norway
| | - Gary R Graves
- Center for Macroecology, Evolution, and Climate, GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
- Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
| | - Erich D Jarvis
- Duke University Medical Center, Durham, NC, USA
- The Rockefeller University, New York, NY, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA.
| | - Guojie Zhang
- China National GeneBank, BGI-Shenzhen, Shenzhen, China.
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China.
- Villum Centre for Biodiversity Genomics, Section for Ecology and Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
- Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China.
|
46
|
Turakhia Y, Chen HI, Marcovitz A, Bejerano G. A fully-automated method discovers loss of mouse-lethal and human-monogenic disease genes in 58 mammals. Nucleic Acids Res 2020; 48:e91. [PMID: 32614390 PMCID: PMC7498332 DOI: 10.1093/nar/gkaa550] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/23/2020] [Revised: 05/23/2020] [Accepted: 06/23/2020] [Indexed: 01/20/2023] Open
Abstract
Gene losses provide an insightful route for studying the morphological and physiological adaptations of species, but their discovery is challenging. Existing genome annotation tools focus on annotating intact genes and do not attempt to distinguish nonfunctional genes from genes missing annotation due to sequencing and assembly artifacts. Previous attempts to annotate gene losses have required significant manual curation, which hampers their scalability for the ever-increasing deluge of newly sequenced genomes. Using extreme sequence erosion (amino acid deletions and substitutions) and sister species support as an unambiguous signature of loss, we developed an automated approach for detecting high-confidence gene loss events across a species tree. Our approach relies solely on gene annotation in a single reference genome, raw assemblies for the remaining species to analyze, and the associated phylogenetic tree for all organisms involved. Using human as reference, we discovered over 400 unique human ortholog erosion events across 58 mammals. This includes dozens of clade-specific losses of genes that result in early mouse lethality or are associated with severe human congenital diseases. Our discoveries yield intriguing potential for translational medical genetics and evolutionary biology, and our approach is readily applicable to large-scale genome sequencing efforts across the tree of life.
Affiliation(s)
- Yatish Turakhia
  - Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
- Heidi I Chen
  - Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA
- Amir Marcovitz
  - Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA
- Gill Bejerano
  - Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA
  - Department of Computer Science, Stanford University, Stanford, CA 94305, USA
  - Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
  - Department of Pediatrics, Stanford University, Stanford, CA 94305, USA
47
Fouret J, Brunet FG, Binet M, Aurine N, Enchéry F, Croze S, Guinier M, Goumaidi A, Preininger D, Volff JN, Bailly-Bechet M, Lachuer J, Horvat B, Legras-Lachuer C. Sequencing the Genome of Indian Flying Fox, Natural Reservoir of Nipah Virus, Using Hybrid Assembly and Conservative Secondary Scaffolding. Front Microbiol 2020; 11:1807. [PMID: 32849415 PMCID: PMC7403528 DOI: 10.3389/fmicb.2020.01807] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 01/15/2020] [Accepted: 07/09/2020] [Indexed: 11/20/2022]
Abstract
Indian fruit bats, flying fox Pteropus medius was identified as an asymptomatic natural host of recently emerged Nipah virus, which is known to induce a severe infectious disease in humans. The absence of P. medius genome sequence presents an important obstacle for further studies of virus–host interactions and better understanding of mechanisms of zoonotic viral emergence. Generation of the high-quality genome sequence is often linked to a considerable effort associated to elevated costs. Although secondary scaffolding methods have reduced sequencing expenses, they imply the development of new tools for the integration of different data sources to achieve more reliable sequencing results. We initially sequenced the P. medius genome using the combination of Illumina paired-end and Nanopore sequencing, with a depth of 57.4x and 6.1x, respectively. Then, we introduced the novel scaff2link software to integrate multiple sources of information for secondary scaffolding, allowing to remove the association with discordant information among two sources. Different quality metrics were next produced to validate the benefits from secondary scaffolding. The P. medius genome, assembled by this method, has a length of 1,985 Mb and consists of 33,613 contigs and 16,113 scaffolds with an NG50 of 19 Mb. At least 22.5% of the assembled sequences is covered by interspersed repeats already described in other species and 19,823 coding genes are annotated. Phylogenetic analysis demonstrated the clustering of P. medius genome with two other Pteropus bat species, P. alecto and P. vampyrus, for which genome sequences are currently available. SARS-CoV entry receptor ACE2 sequence of P. medius was 82.7% identical with ACE2 of Rhinolophus sinicus bats, thought to be the natural host of SARS-CoV. 
Altogether, our results confirm that a lower depth of sequencing is enough to obtain a valuable genome sequence, using secondary scaffolding approaches and demonstrate the benefits of the scaff2link application. The genome sequence is now available to the scientific community to (i) proceed with further genomic analysis of P. medius, (ii) to characterize the underlying mechanism allowing Nipah virus maintenance and perpetuation in its bat host, and (iii) to monitor their evolutionary pathways toward a better understanding of bats’ ability to control viral infections.
Affiliation(s)
- Julien Fouret
  - CIRI, International Center for Infectiology Research, Team Immunobiology of Viral Infections, Univ Lyon, INSERM U1111, CNRS UMR 5308, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, Lyon, France
  - Viroscan3D, Trévoux, France
- Frédéric G Brunet
  - Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, CNRS UMR 5242, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, Lyon, France
- Martin Binet
  - CIRI, International Center for Infectiology Research, Team Immunobiology of Viral Infections, Univ Lyon, INSERM U1111, CNRS UMR 5308, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, Lyon, France
  - Viroscan3D, Trévoux, France
- Noémie Aurine
  - CIRI, International Center for Infectiology Research, Team Immunobiology of Viral Infections, Univ Lyon, INSERM U1111, CNRS UMR 5308, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, Lyon, France
- Francois Enchéry
  - CIRI, International Center for Infectiology Research, Team Immunobiology of Viral Infections, Univ Lyon, INSERM U1111, CNRS UMR 5308, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, Lyon, France
- Séverine Croze
  - Plateforme Profilexpert, Université Claude Bernard Lyon 1, Lyon, France
- Jean-Nicolas Volff
  - Institut de Génomique Fonctionnelle de Lyon, Université de Lyon, CNRS UMR 5242, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, Lyon, France
- Joël Lachuer
  - Cancer Research Center of Lyon, INSERM 1052/CNRS 5286, Université de Lyon, Lyon, France
  - Plateforme Profilexpert, Université Claude Bernard Lyon 1, Lyon, France
- Branka Horvat
  - CIRI, International Center for Infectiology Research, Team Immunobiology of Viral Infections, Univ Lyon, INSERM U1111, CNRS UMR 5308, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon 1, Lyon, France
- Catherine Legras-Lachuer
  - Viroscan3D, Trévoux, France
  - Ecologie Microbienne, CNRS UMR 5557, LEM, INRA, VetAgro Sup, Université Claude Bernard Lyon 1, Villeurbanne, France
48
Castrignanò T, Gioiosa S, Flati T, Cestari M, Picardi E, Chiara M, Fratelli M, Amente S, Cirilli M, Tangaro MA, Chillemi G, Pesole G, Zambelli F. ELIXIR-IT HPC@CINECA: high performance computing resources for the bioinformatics community. BMC Bioinformatics 2020; 21:352. [PMID: 32838759 PMCID: PMC7446135 DOI: 10.1186/s12859-020-03565-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 01/11/2023]
Abstract
BACKGROUND The advent of Next Generation Sequencing (NGS) technologies and the concomitant reduction in sequencing costs allows unprecedented high throughput profiling of biological systems in a cost-efficient manner. Modern biological experiments are increasingly becoming both data and computationally intensive and the wealth of publicly available biological data is introducing bioinformatics into the "Big Data" era. For these reasons, the effective application of High Performance Computing (HPC) architectures is becoming progressively more recognized also by bioinformaticians. Here we describe HPC resources provisioning pilot programs dedicated to bioinformaticians, run by the Italian Node of ELIXIR (ELIXIR-IT) in collaboration with CINECA, the main Italian supercomputing center. RESULTS Starting from April 2016, CINECA and ELIXIR-IT launched the pilot Call "ELIXIR-IT HPC@CINECA", offering streamlined access to HPC resources for bioinformatics. Resources are made available either through web front-ends to dedicated workflows developed at CINECA or by providing direct access to the High Performance Computing systems through a standard command-line interface tailored for bioinformatics data analysis. This allows to offer to the biomedical research community a production scale environment, continuously updated with the latest available versions of publicly available reference datasets and bioinformatic tools. Currently, 63 research projects have gained access to the HPC@CINECA program, for a total handout of ~ 8 Millions of CPU/hours and, for data storage, ~ 100 TB of permanent and ~ 300 TB of temporary space. CONCLUSIONS Three years after the beginning of the ELIXIR-IT HPC@CINECA program, we can appreciate its impact over the Italian bioinformatics community and draw some considerations. Several Italian researchers who applied to the program have gained access to one of the top-ranking public scientific supercomputing facilities in Europe. 
Those investigators had the opportunity to sensibly reduce computational turnaround times in their research projects and to process massive amounts of data, pursuing research approaches that would have been otherwise difficult or impossible to undertake. Moreover, by taking advantage of the wealth of documentation and training material provided by CINECA, participants had the opportunity to improve their skills in the usage of HPC systems and be better positioned to apply to similar EU programs of greater scale, such as PRACE. To illustrate the effective usage and impact of the resources awarded by the program - in different research applications - we report five successful use cases, which have already published their findings in peer-reviewed journals.
Affiliation(s)
- Tiziana Castrignanò
  - Department of Ecological and Biological Sciences (DEB), University of Tuscia, Viterbo, Italy
- Silvia Gioiosa
  - CINECA, SuperComputing Applications and Innovation Department, Rome, Italy
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (IBIOM-CNR), Bari, Italy
- Tiziano Flati
  - CINECA, SuperComputing Applications and Innovation Department, Rome, Italy
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (IBIOM-CNR), Bari, Italy
- Mirko Cestari
  - CINECA, SuperComputing Applications and Innovation Department, Rome, Italy
- Ernesto Picardi
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (IBIOM-CNR), Bari, Italy
  - Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari "A. Moro", Bari, Italy
- Matteo Chiara
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (IBIOM-CNR), Bari, Italy
  - Department of Biosciences, University of Milan, Milan, Italy
- Maddalena Fratelli
  - IRCCS-Istituto di Ricerche Farmacologiche "Mario Negri", Milano, Milan, Italy
- Stefano Amente
  - Department of Molecular Medicine and Medical Biotechnologies, University of Naples 'Federico II', Naples, Italy
- Marco Cirilli
  - Department of Agricultural and Environmental Sciences - Production, Landscape, Agroenergy (DISAA), University of Milan, Milan, Italy
- Marco Antonio Tangaro
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (IBIOM-CNR), Bari, Italy
- Giovanni Chillemi
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (IBIOM-CNR), Bari, Italy
  - Department for Innovation in Biological, Agro-food and Forest systems (DIBAF), University of Tuscia, Viterbo, Italy
- Graziano Pesole
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (IBIOM-CNR), Bari, Italy
  - Department of Biosciences, Biotechnology and Biopharmaceutics, University of Bari "A. Moro", Bari, Italy
- Federico Zambelli
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (IBIOM-CNR), Bari, Italy
  - Department of Biosciences, University of Milan, Milan, Italy
49
Medina JJ, Maley JM, Sannapareddy S, Medina NN, Gilman CM, McCormack JE. A rapid and cost-effective pipeline for digitization of museum specimens with 3D photogrammetry. PLoS One 2020; 15:e0236417. [PMID: 32790700 PMCID: PMC7425849 DOI: 10.1371/journal.pone.0236417] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 01/13/2020] [Accepted: 07/06/2020] [Indexed: 01/06/2023]
Abstract
Natural history collections are yielding more information as digitization brings specimen data to researchers, connects specimens across museums, and as new technologies allow for more large-scale data collection. Therefore, a key goal in specimen digitization is developing methods that both increase access and allow for the highest yield of phenomic data. 3D digitization is increasingly popular because it has the potential to meet both aspects of that key goal. However, current methods overlook or do not prioritize some of the most sought-after phenotypic traits, those involving the external appearance of specimens, especially color. Here, we introduce an efficient and cost-effective pipeline for 3D photogrammetry to capture the external appearance of natural history specimens and other museum objects. 3D photogrammetry aligns and compares sets of dozens, hundreds, or even thousands of photos to create 3D models. The hardware set-up requires little physical space and around $3,000 in initial investment, while the software pipeline requires $1,400/year for proprietary software subscriptions (with open-source alternatives). The creation of each 3D model takes 1-2 hours/specimen and much of the software pipeline is automated with minimal supervision required, including the onerous step of mesh processing. We showcase the method by creating 3D models for most of the type specimens in the Moore Laboratory of Zoology bird collection and show that digital bill measurements are comparable to hand-taken measurements. Color data, while not included as part of this pipeline, is easily extractable from the models and one of the most promising areas of data collection. Future advances can adapt the method for ultraviolet reflectance capture and increased efficiency and model quality. Combined with genomic data, phenomic data from 3D models including photogrammetry will open new doors to understanding organismal evolution.
Affiliation(s)
- Joshua J. Medina
  - Moore Laboratory of Zoology, Occidental College, Los Angeles, CA, United States of America
- James M. Maley
  - Moore Laboratory of Zoology, Occidental College, Los Angeles, CA, United States of America
- Siddharth Sannapareddy
  - Moore Laboratory of Zoology, Occidental College, Los Angeles, CA, United States of America
- Noah N. Medina
  - Moore Laboratory of Zoology, Occidental College, Los Angeles, CA, United States of America
- Cyril M. Gilman
  - Moore Laboratory of Zoology, Occidental College, Los Angeles, CA, United States of America
- John E. McCormack
  - Moore Laboratory of Zoology, Occidental College, Los Angeles, CA, United States of America
50
Alves LQ, Ruivo R, Fonseca MM, Lopes-Marques M, Ribeiro P, Castro LFC. PseudoChecker: an integrated online platform for gene inactivation inference. Nucleic Acids Res 2020; 48:W321-W331. [PMID: 32449938 PMCID: PMC7319564 DOI: 10.1093/nar/gkaa408] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/10/2020] [Revised: 04/22/2020] [Accepted: 05/06/2020] [Indexed: 01/21/2023]
Abstract
The rapid expansion of high-quality genome assemblies, exemplified by ongoing initiatives such as the Genome-10K and i5k, demands novel automated methods to approach comparative genomics. Of these, the study of inactivating mutations in the coding region of genes, or pseudogenization, as a source of evolutionary novelty is mostly overlooked. Thus, to address such evolutionary/genomic events, a systematic, accurate and computationally automated approach is required. Here, we present PseudoChecker, the first integrated online platform for gene inactivation inference. Unlike the few existing methods, our comparative genomics-based approach displays full automation, a built-in graphical user interface and a novel index, PseudoIndex, for an empirical evaluation of the gene coding status. As a multi-platform online service, PseudoChecker simplifies access and usability, allowing a fast identification of disruptive mutations. An analysis of 30 genes previously reported to be eroded in mammals, and 30 viable genes from the same lineages, demonstrated that PseudoChecker was able to correctly infer 97% of loss events and 95% of functional genes, confirming its reliability. PseudoChecker is freely available, without login required, at http://pseudochecker.ciimar.up.pt.
Affiliation(s)
- Luís Q Alves
  - CIIMAR-Interdisciplinary Centre of Marine and Environmental Research, U. Porto-University of Porto, Matosinhos, 4450-208, Portugal
- Raquel Ruivo
  - CIIMAR-Interdisciplinary Centre of Marine and Environmental Research, U. Porto-University of Porto, Matosinhos, 4450-208, Portugal
- Miguel M Fonseca
  - CIIMAR-Interdisciplinary Centre of Marine and Environmental Research, U. Porto-University of Porto, Matosinhos, 4450-208, Portugal
- Mónica Lopes-Marques
  - CIIMAR-Interdisciplinary Centre of Marine and Environmental Research, U. Porto-University of Porto, Matosinhos, 4450-208, Portugal
- Pedro Ribeiro
  - CRACS & INESC-TEC Department of Computer Science, FCUP, Porto, 4169-007, Portugal
- L Filipe C Castro
  - CIIMAR-Interdisciplinary Centre of Marine and Environmental Research, U. Porto-University of Porto, Matosinhos, 4450-208, Portugal
  - Department of Biology, FCUP, Porto, 4169-007, Portugal