51
Caswell J, Gans JD, Generous N, Hudson CM, Merkley E, Johnson C, Oehmen C, Omberg K, Purvine E, Taylor K, Ting CL, Wolinsky M, Xie G. Defending Our Public Biological Databases as a Global Critical Infrastructure. Front Bioeng Biotechnol 2019; 7:58. [PMID: 31024904 PMCID: PMC6460893 DOI: 10.3389/fbioe.2019.00058] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 01/06/2019] [Accepted: 03/04/2019] [Indexed: 11/13/2022]
Abstract
Progress in modern biology is being driven, in part, by the large amounts of freely available data in public resources such as the International Nucleotide Sequence Database Collaboration (INSDC), the world's primary database of biological sequence (and related) information. INSDC and similar databases have dramatically increased the pace of fundamental biological discovery and enabled a host of innovative therapeutic, diagnostic, and forensic applications. However, as high-value, openly shared resources with a high degree of assumed trust, these repositories share compelling similarities to the early days of the Internet. Consequently, as public biological databases continue to increase in size and importance, we expect that they will face the same threats as undefended cyberspace. There is a unique opportunity, before a significant breach and loss of trust occurs, to ensure they evolve with quality and security as a design philosophy rather than costly "retrofitted" mitigations. This Perspective surveys some potential quality assurance and security weaknesses in existing open genomic and proteomic repositories, describes methods to mitigate the likelihood of both intentional and unintentional errors, and offers recommendations for risk mitigation based on lessons learned from cybersecurity.
Affiliation(s)
- Jacob Caswell
  - Sandia National Laboratories, Albuquerque, NM, United States
- Jason D Gans
  - Los Alamos National Laboratory, Bioscience Division, Los Alamos, NM, United States
- Nicholas Generous
  - Los Alamos National Laboratory, Global Security Directorate, Los Alamos, NM, United States
- Corey M Hudson
  - Sandia National Laboratories, Livermore, CA, United States
- Eric Merkley
  - Pacific Northwest National Laboratory, Richland, WA, United States
- Curtis Johnson
  - Sandia National Laboratories, Albuquerque, NM, United States
- Kristin Omberg
  - Pacific Northwest National Laboratory, Richland, WA, United States
- Emilie Purvine
  - Pacific Northwest National Laboratory, Richland, WA, United States
- Karen Taylor
  - Pacific Northwest National Laboratory, Richland, WA, United States
- Murray Wolinsky
  - Los Alamos National Laboratory, Bioscience Division, Los Alamos, NM, United States
- Gary Xie
  - Los Alamos National Laboratory, Bioscience Division, Los Alamos, NM, United States

52
Laumer CE. Inferring Ancient Relationships with Genomic Data: A Commentary on Current Practices. Integr Comp Biol 2019; 58:623-639. [PMID: 29982611 DOI: 10.1093/icb/icy075] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 01/10/2023]
Abstract
Contemporary phylogeneticists enjoy an embarrassment of riches, not only in the volumes of data now available, but also in the diversity of bioinformatic tools for handling these data. Here, I discuss a subset of these tools I consider well-suited to the task of inferring ancient relationships with coding sequence data in particular, encompassing data generation, orthology assignment, alignment and gene tree inference, supermatrix construction, and analysis under the best-fitting models applicable to large-scale datasets. Throughout, I compare and critique methods, considering both their theoretical principles and the details of their implementation, and offering practical tips on usage where appropriate. I also entertain different motivations for analyzing what are almost always originally DNA sequence data as codons, amino acids, and higher-order recodings. Although presented in a linear order, I see value in using the diversity of tools available to us to assess the sensitivity of clades of biological interest to different gene and taxon sets and analytical modes, which can be an indication of the presence of systematic error, of which a few forms remain poorly controlled by even the best available inference methods.
Affiliation(s)
- Christopher E Laumer
  - EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, EMBL-EBI South Building, Hinxton CB10 1SD, UK

53
Galtier N, Roux C, Rousselle M, Romiguier J, Figuet E, Glémin S, Bierne N, Duret L. Codon Usage Bias in Animals: Disentangling the Effects of Natural Selection, Effective Population Size, and GC-Biased Gene Conversion. Mol Biol Evol 2019; 35:1092-1103. [PMID: 29390090 DOI: 10.1093/molbev/msy015] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Indexed: 12/13/2022]
Abstract
Selection on codon usage bias is well documented in a number of microorganisms. Whether codon usage is also generally shaped by natural selection in large organisms, despite their relatively small effective population size (Ne), is unclear. In animals, the population genetics of codon usage bias has only been studied in a handful of model organisms so far, and can be affected by confounding, nonadaptive processes such as GC-biased gene conversion and experimental artefacts. Using population transcriptomics data, we analyzed the relationship between codon usage, gene expression, allele frequency distribution, and recombination rate in 30 nonmodel species of animals, each from a different family, covering a wide range of effective population sizes. We disentangled the effects of translational selection and GC-biased gene conversion on codon usage by separately analyzing GC-conservative and GC-changing mutations. We report evidence for effective translational selection on codon usage in large-Ne species of animals, but not in small-Ne ones, in agreement with the nearly neutral theory of molecular evolution. C- and T-ending codons tend to be preferred over synonymous G- and A-ending ones, for reasons that remain to be determined. In contrast, we uncovered a conspicuous effect of GC-biased gene conversion, which is widespread in animals and the main force determining the fate of AT↔GC mutations. Intriguingly, the strength of its effect was uncorrelated with Ne.
Affiliation(s)
- Nicolas Galtier
  - UMR5554, Institut des Sciences de l'Evolution, University Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Camille Roux
  - UMR5554, Institut des Sciences de l'Evolution, University Montpellier, CNRS, IRD, EPHE, Montpellier, France
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
  - UMR 8198 - Evo-Eco-Paleo, CNRS, Université de Lille-Sciences et Technologies, Villeneuve d'Ascq, France
- Marjolaine Rousselle
  - UMR5554, Institut des Sciences de l'Evolution, University Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Jonathan Romiguier
  - UMR5554, Institut des Sciences de l'Evolution, University Montpellier, CNRS, IRD, EPHE, Montpellier, France
  - Department of Ecology and Evolution, University of Lausanne, Lausanne, Switzerland
- Emeric Figuet
  - UMR5554, Institut des Sciences de l'Evolution, University Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Sylvain Glémin
  - UMR5554, Institut des Sciences de l'Evolution, University Montpellier, CNRS, IRD, EPHE, Montpellier, France
  - Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Nicolas Bierne
  - UMR5554, Institut des Sciences de l'Evolution, University Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Laurent Duret
  - Laboratoire de Biométrie et Biologie Evolutive, UMR 5558, CNRS, Université de Lyon, Université Lyon 1, Villeurbanne, France

54
Rousselle M, Laverré A, Figuet E, Nabholz B, Galtier N. Influence of Recombination and GC-biased Gene Conversion on the Adaptive and Nonadaptive Substitution Rate in Mammals versus Birds. Mol Biol Evol 2019; 36:458-471. [PMID: 30590692 PMCID: PMC6389324 DOI: 10.1093/molbev/msy243] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 12/22/2022]
Abstract
Recombination is expected to affect functional sequence evolution in several ways. On the one hand, recombination is thought to improve the efficiency of multilocus selection by dissipating linkage disequilibrium. On the other hand, natural selection can be counteracted by recombination-associated transmission distorters such as GC-biased gene conversion (gBGC), which tends to promote G and C alleles irrespective of their fitness effect in high-recombining regions. It has been suggested that gBGC might impact coding sequence evolution in vertebrates, and particularly the ratio of nonsynonymous to synonymous substitution rates (dN/dS). However, distinctive gBGC patterns have been reported in mammals and birds, maybe reflecting the documented contrasts in evolutionary dynamics of recombination rate between these two taxa. Here, we explore how recombination and gBGC affect coding sequence evolution in mammals and birds by analyzing proteome-wide data in six species of Galloanserae (fowls) and six species of catarrhine primates. We estimated the dN/dS ratio and rates of adaptive and nonadaptive evolution in bins of genes of increasing recombination rate, separately analyzing AT → GC, GC → AT, and G ↔ C/A ↔ T mutations. We show that in both taxa, recombination and gBGC entail a decrease in dN/dS. Our analysis indicates that recombination enhances the efficiency of purifying selection by lowering Hill-Robertson effects, whereas gBGC leads to an overestimation of the adaptive rate of AT → GC mutations. Finally, we report a mutagenic effect of recombination, which is independent of gBGC.
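The three mutation categories analyzed separately in this abstract (AT → GC, GC → AT, and the GC-conservative G ↔ C/A ↔ T changes) can be illustrated with a minimal Python sketch. The function names `classify_mutation` and `tally_mutation_classes` and the class labels are illustrative assumptions, not taken from the paper, and the sketch covers only the classification step, not the dN/dS estimation.

```python
from collections import Counter

BASES = {"A", "C", "G", "T"}
STRONG = {"G", "C"}  # G and C are the bases favored by gBGC


def classify_mutation(ancestral, derived):
    """Assign a point mutation to one of the three classes named in the abstract."""
    a, d = ancestral.upper(), derived.upper()
    if a not in BASES or d not in BASES or a == d:
        raise ValueError(f"not a point mutation between DNA bases: {ancestral!r} -> {derived!r}")
    a_strong, d_strong = a in STRONG, d in STRONG
    if not a_strong and d_strong:
        return "AT->GC"           # raises GC content
    if a_strong and not d_strong:
        return "GC->AT"           # lowers GC content
    return "GC-conservative"      # G<->C or A<->T, GC content unchanged


def tally_mutation_classes(mutations):
    """Count mutations per class from (ancestral, derived) pairs."""
    return Counter(classify_mutation(a, d) for a, d in mutations)
```

For example, `tally_mutation_classes([("A", "G"), ("C", "T"), ("G", "C")])` counts one mutation in each class.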
Affiliation(s)
- Alexandre Laverré
  - ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Emeric Figuet
  - ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Benoit Nabholz
  - ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France
- Nicolas Galtier
  - ISEM, Université de Montpellier, CNRS, IRD, EPHE, Montpellier, France

55
Contamination in Low Microbial Biomass Microbiome Studies: Issues and Recommendations. Trends Microbiol 2019; 27:105-117. [DOI: 10.1016/j.tim.2018.11.003] [Citation(s) in RCA: 421] [Impact Index Per Article: 84.2] [Received: 06/06/2018] [Revised: 09/06/2018] [Accepted: 11/05/2018] [Indexed: 01/18/2023]

56
Hooper R, Brealey JC, van der Valk T, Alberdi A, Durban JW, Fearnbach H, Robertson KM, Baird RW, Bradley Hanson M, Wade P, Gilbert MTP, Morin PA, Wolf JBW, Foote AD, Guschanski K. Host-derived population genomics data provides insights into bacterial and diatom composition of the killer whale skin. Mol Ecol 2019; 28:484-502. [PMID: 30187987 PMCID: PMC6487819 DOI: 10.1111/mec.14860] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 03/14/2018] [Revised: 08/15/2018] [Accepted: 08/27/2018] [Indexed: 12/20/2022]
Abstract
Recent exploration into the interactions and relationship between hosts and their microbiota has revealed a connection between many aspects of the host's biology, health and associated micro-organisms. Whereas amplicon sequencing has traditionally been used to characterize the microbiome, the increasing number of published population genomics data sets offers an underexploited opportunity to study microbial profiles from the host shotgun sequencing data. Here, we use sequence data originally generated from killer whale Orcinus orca skin biopsies for population genomics, to characterize the skin microbiome and investigate how host social and geographical factors influence the microbial community composition. Having identified 845 microbial taxa from 2.4 million reads that did not map to the killer whale reference genome, we found that both ecotypic and geographical factors influence community composition of killer whale skin microbiomes. Furthermore, we uncovered key taxa that drive the microbiome community composition and showed that they are embedded in unique networks, one of which is tentatively linked to diatom presence and poor skin condition. Community composition differed between Antarctic killer whales with and without diatom coverage, suggesting that the previously reported episodic migrations of Antarctic killer whales to warmer waters associated with skin turnover may control the effects of potentially pathogenic bacteria such as Tenacibaculum dicentrarchi. Our work demonstrates the feasibility of microbiome studies from host shotgun sequencing data and highlights the importance of metagenomics in understanding the relationship between host and microbial ecology.
Affiliation(s)
- Rebecca Hooper
  - Animal Ecology, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Jaelle C. Brealey
  - Animal Ecology, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Tom van der Valk
  - Animal Ecology, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Antton Alberdi
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen K, Denmark
- John W. Durban
  - Marine Mammal and Turtle Division, Southwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, La Jolla, California
- Holly Fearnbach
  - SR3, SeaLife Response, Rehabilitation, and Research, Seattle, Washington
- Kelly M. Robertson
  - Marine Mammal and Turtle Division, Southwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, La Jolla, California
- M. Bradley Hanson
  - Northwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Seattle, Washington
- Paul Wade
  - National Marine Mammal Laboratory, Alaska Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, Seattle, Washington
- M. Thomas P. Gilbert
  - Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen K, Denmark
  - NTNU University Museum, Trondheim, Norway
- Phillip A. Morin
  - Marine Mammal and Turtle Division, Southwest Fisheries Science Center, National Marine Fisheries Service, National Oceanic and Atmospheric Administration, La Jolla, California
- Jochen B. W. Wolf
  - Science of Life Laboratories and Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
  - Section of Evolutionary Biology, Faculty of Biology, LMU Munich, Munich, Germany
- Andrew D. Foote
  - Molecular Ecology and Fisheries Genetics Laboratory, School of Biological Sciences, Bangor University, Bangor, Gwynedd, UK
- Katerina Guschanski
  - Animal Ecology, Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden

57
Straube N, Li C, Mertzen M, Yuan H, Moritz T. A phylogenomic approach to reconstruct interrelationships of main clupeocephalan lineages with a critical discussion of morphological apomorphies. BMC Evol Biol 2018; 18:158. [PMID: 30352561 PMCID: PMC6199709 DOI: 10.1186/s12862-018-1267-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/26/2017] [Accepted: 09/26/2018] [Indexed: 11/24/2022]
Abstract
BACKGROUND: Previous molecular studies on the phylogeny and classification of clupeocephalan fishes revealed numerous new taxonomic entities. For re-analysing these taxa, we perform target gene capturing and subsequent next generation sequencing of putative ortholog exons of major clupeocephalan lineages. Sequence information for the RNA bait design was derived from publicly available genomes of bony fishes. Newly acquired sequence data comprising > 800 exon sequences was subsequently used for phylogenetic reconstructions. RESULTS: Our results support monophyletic Otomorpha comprising Alepocephaliformes. Within Ostariophysi, Gonorynchiformes are sister to a clade comprising Cypriniformes, Characiformes, Siluriformes and Gymnotiformes, where the interrelationships of Characiformes, Siluriformes and Gymnotiformes remain enigmatic. Euteleosts comprise four major clades: Lepidogalaxiiformes, Protacanthopterygii, Stomiatii, and Galaxiiformes plus Neoteleostei. The monotypic Lepidogalaxiiformes form the sister-group to all remaining euteleosts. Protacanthopterygii, comprising Argentini-, Esoci- and Salmoniformes, is sister to Stomiatii (Osmeriformes and Stomiatiformes) and Galaxiiformes plus Neoteleostei. CONCLUSIONS: Several proposed monophyla defined by morphological apomorphies within the clupeocephalan phylogeny are confirmed by the phylogenetic estimates presented herein. However, other morphologically described groups cannot be reconciled with molecular phylogenies. Thus, numerous morphological apomorphies of supposed monophyla are called into question. The interpretation of suggested morphological synapomorphies of otomorph fishes is strongly affected by the inclusion of deep-sea inhabiting, and to that effect morphologically adapted, Alepocephaliformes.
Our revision of these potential synapomorphies, in the context that Alepocephaliformes are otomorph fishes, reveals that only a single character of the total nine characters proposed as synapomorphic for the group is clearly valid for all otomorphs. Three further characters remain possible apomorphies since their status remains unclear in the deep-sea adapted Alepocephaliformes showing developmental lag and lacking a swim bladder. Further, our analysis places Galaxiiformes as sister group to neoteleosts, which contradicts some previous molecular phylogenetic studies. This needs further investigation from a morphological perspective, as suggested synapomorphies for several euteleostean lineages are challenged or still lacking. For the verification of results presented herein, a denser phylogenomic-level taxon sampling should be applied.
Affiliation(s)
- Nicolas Straube
  - Institut für Zoologie & Evolutionsbiologie, Friedrich-Schiller-Universität Jena, Erbertstraße 1, 07743 Jena, Germany
  - Zoologische Staatssammlung München, Staatliche Naturwissenschaftliche Sammlungen Bayerns, Münchhausenstraße 21, 81247 Munich, Germany
- Chenhong Li
  - Key Laboratory of Exploration and Utilization of Aquatic, Genetic Resources, Shanghai Ocean University, Ministry of Education, Shanghai, 201306 China
- Matthias Mertzen
  - Institut für Zoologie & Evolutionsbiologie, Friedrich-Schiller-Universität Jena, Erbertstraße 1, 07743 Jena, Germany
  - Deutsches Meeresmuseum, Katharinenberg 14-20, 18439 Stralsund, Germany
- Hao Yuan
  - Key Laboratory of Exploration and Utilization of Aquatic, Genetic Resources, Shanghai Ocean University, Ministry of Education, Shanghai, 201306 China
- Timo Moritz
  - Institut für Zoologie & Evolutionsbiologie, Friedrich-Schiller-Universität Jena, Erbertstraße 1, 07743 Jena, Germany
  - Deutsches Meeresmuseum, Katharinenberg 14-20, 18439 Stralsund, Germany

58
Cornet L, Meunier L, Van Vlierberghe M, Léonard RR, Durieu B, Lara Y, Misztak A, Sirjacobs D, Javaux EJ, Philippe H, Wilmotte A, Baurain D. Consensus assessment of the contamination level of publicly available cyanobacterial genomes. PLoS One 2018; 13:e0200323. [PMID: 30044797 PMCID: PMC6059444 DOI: 10.1371/journal.pone.0200323] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 04/17/2018] [Accepted: 06/22/2018] [Indexed: 12/31/2022]
Abstract
Publicly available genomes are crucial for phylogenetic and metagenomic studies, in which contaminating sequences can be the cause of major problems. This issue is expected to be especially important for Cyanobacteria because axenic strains are notoriously difficult to obtain and keep in culture. Yet, despite their great scientific interest, no data are currently available concerning the quality of publicly available cyanobacterial genomes. As reliably detecting contaminants is a complex task, we designed a pipeline combining six methods in a consensus strategy to assess the contamination level of 440 genome assemblies of Cyanobacteria. Two methods are based on published reference databases of ribosomal genes (SSU rRNA 16S and ribosomal proteins), one is indirectly based on a reference database of marker genes (CheckM), and three are based on complete genome analysis. Among those genome-wide methods, Kraken and DIAMOND blastx share the same reference database that we derived from Ensembl Bacteria, whereas CONCOCT does not require any reference database, instead relying on differences in DNA tetramer frequencies. Given that all the six methods appear to have their own strengths and limitations, we used the consensus of their rankings to infer that >5% of cyanobacterial genome assemblies are highly contaminated by foreign DNA (i.e., contaminants were detected by 5 or 6 methods). Our results will help researchers to check the quality of publicly available genomic data before use in their own analyses. Moreover, we argue that journals should make mandatory the submission of raw read data along with genome assemblies in order to facilitate the detection of contaminants in sequence databases.
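The "detected by 5 or 6 methods" criterion in this abstract can be sketched as a simple vote count. This is a simplified illustration only: the abstract describes a consensus of rankings, whereas the sketch below assumes each method has already been reduced to a yes/no contamination call. The method keys and the function names `count_detections` and `is_highly_contaminated` are hypothetical labels, not identifiers from the authors' pipeline.

```python
from typing import Mapping

# Hypothetical keys for the six methods named in the abstract.
METHODS = (
    "ssu_rrna_16s",
    "ribosomal_proteins",
    "checkm",
    "kraken",
    "diamond_blastx",
    "concoct",
)


def count_detections(calls: Mapping[str, bool]) -> int:
    """Number of methods that flagged the assembly as contaminated."""
    missing = [m for m in METHODS if m not in calls]
    if missing:
        raise KeyError(f"no call recorded for: {', '.join(missing)}")
    return sum(bool(calls[m]) for m in METHODS)


def is_highly_contaminated(calls: Mapping[str, bool], min_methods: int = 5) -> bool:
    """Apply the '5 or 6 methods' threshold quoted in the abstract."""
    return count_detections(calls) >= min_methods
```

Requiring agreement from at least five of six methods trades sensitivity for specificity, which fits the abstract's remark that each method has its own strengths and limitations.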
Affiliation(s)
- Luc Cornet
  - InBioS–PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium
  - UR Geology–Palaeobiogeology-Palaeobotany-Palaeopalynology, University of Liège, Liège, Belgium
- Loïc Meunier
  - InBioS–PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium
- Mick Van Vlierberghe
  - InBioS–PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium
- Raphaël R. Léonard
  - InBioS–PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium
  - InBioS–CIP, Macromolecular Crystallography, University of Liège, Liège, Belgium
- Benoit Durieu
  - InBioS–CIP, Centre for Protein Engineering, University of Liège, Liège, Belgium
- Yannick Lara
  - InBioS–CIP, Centre for Protein Engineering, University of Liège, Liège, Belgium
- Agnieszka Misztak
  - InBioS–PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium
  - Intercollegiate Faculty of Biotechnology UG-MUG, Gdansk, Poland
- Damien Sirjacobs
  - InBioS–PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium
- Emmanuelle J. Javaux
  - UR Geology–Palaeobiogeology-Palaeobotany-Palaeopalynology, University of Liège, Liège, Belgium
- Hervé Philippe
  - Centre for Biodiversity Theory and Modelling, Moulis, France
- Annick Wilmotte
  - InBioS–CIP, Centre for Protein Engineering, University of Liège, Liège, Belgium
- Denis Baurain
  - InBioS–PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium

59
Waldron FM, Stone GN, Obbard DJ. Metagenomic sequencing suggests a diversity of RNA interference-like responses to viruses across multicellular eukaryotes. PLoS Genet 2018; 14:e1007533. [PMID: 30059538 PMCID: PMC6085071 DOI: 10.1371/journal.pgen.1007533] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 03/15/2018] [Revised: 08/09/2018] [Accepted: 07/04/2018] [Indexed: 11/24/2022]
Abstract
RNA interference (RNAi)-related pathways target viruses and transposable element (TE) transcripts in plants, fungi, and ecdysozoans (nematodes and arthropods), giving protection against infection and transmission. In each case, this produces abundant TE and virus-derived 20-30 nt small RNAs, which provide a characteristic signature of RNAi-mediated defence. The broad phylogenetic distribution of the Argonaute and Dicer-family genes that mediate these pathways suggests that defensive RNAi is ancient, and probably shared by most animal (metazoan) phyla. Indeed, while vertebrates had been thought an exception, it has recently been argued that mammals also possess an antiviral RNAi pathway, although its immunological relevance is currently uncertain and the viral small RNAs (viRNAs) are not easily detectable. Here we use a metagenomic approach to test for the presence of viRNAs in five species from divergent animal phyla (Porifera, Cnidaria, Echinodermata, Mollusca, and Annelida), and in a brown alga, which represents an independent origin of multicellularity from plants, fungi, and animals. We use metagenomic RNA sequencing to identify around 80 virus-like contigs in these lineages, and small RNA sequencing to identify viRNAs derived from those viruses. We identified 21U small RNAs derived from an RNA virus in the brown alga, reminiscent of plant and fungal viRNAs, despite the deep divergence between these lineages. However, contrary to our expectations, we were unable to identify canonical (i.e. Drosophila- or nematode-like) viRNAs in any of the animals, despite the widespread presence of abundant micro-RNAs, and somatic transposon-derived piwi-interacting RNAs. We did identify a distinctive group of small RNAs derived from RNA viruses in the mollusc. However, unlike ecdysozoan viRNAs, these had a piRNA-like length distribution but lacked key signatures of piRNA biogenesis.
We also identified primary piRNAs derived from putatively endogenous copies of DNA viruses in the cnidarian and the echinoderm, and an endogenous RNA virus in the mollusc. The absence of canonical virus-derived small RNAs from our samples may suggest that the majority of animal phyla lack an antiviral RNAi response. Alternatively, these phyla could possess an antiviral RNAi response resembling that reported for vertebrates, with cryptic viRNAs not detectable through simple metagenomic sequencing of wild-type individuals. In either case, our findings show that the antiviral RNAi responses of arthropods and nematodes, which are highly divergent from each other and from that of plants and fungi, are also highly diverged from the most likely ancestral metazoan state.
Affiliation(s)
- Fergal M. Waldron
  - Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh, United Kingdom
- Graham N. Stone
  - Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh, United Kingdom
- Darren J. Obbard
  - Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Edinburgh, United Kingdom
  - Centre for Immunity Infection and Evolution, University of Edinburgh, Ashworth Laboratories, Edinburgh, United Kingdom

60
Sheik CS, Reese BK, Twing KI, Sylvan JB, Grim SL, Schrenk MO, Sogin ML, Colwell FS. Identification and Removal of Contaminant Sequences From Ribosomal Gene Databases: Lessons From the Census of Deep Life. Front Microbiol 2018; 9:840. [PMID: 29780369 PMCID: PMC5945997 DOI: 10.3389/fmicb.2018.00840] [Citation(s) in RCA: 69] [Impact Index Per Article: 11.5] [Received: 01/17/2018] [Accepted: 04/12/2018] [Indexed: 11/15/2022]
Abstract
Earth’s subsurface environment is one of the largest, yet least studied, biomes on Earth, and many questions remain regarding what microorganisms are indigenous to the subsurface. Through the activity of the Census of Deep Life (CoDL) and the Deep Carbon Observatory, an open access 16S ribosomal RNA gene sequence database from diverse subsurface environments has been compiled. However, due to low quantities of biomass in the deep subsurface, the potential for incorporation of contaminants from reagents used during sample collection, processing, and/or sequencing is high. Thus, to understand the ecology of subsurface microorganisms (i.e., the distribution, richness, or survival), it is necessary to minimize, identify, and remove contaminant sequences that will skew the relative abundances of all taxa in the sample. In this meta-analysis, we identify putative contaminants associated with the CoDL dataset, recommend best practices for removing contaminants from samples, and propose a series of best practices for subsurface microbiology sampling. The most abundant putative contaminant genera observed, independent of evenness across samples, were Propionibacterium, Aquabacterium, Ralstonia, and Acinetobacter, while the top five most frequently observed genera were Pseudomonas, Propionibacterium, Acinetobacter, Ralstonia, and Sphingomonas. The majority of the most frequently observed genera (high evenness) were associated with reagent or potential human contamination. Additionally, in DNA extraction blanks, we observed potential archaeal contaminants, including methanogens, which have not been discussed in previous contamination studies. Such contaminants would directly affect the interpretation of subsurface molecular studies, as methanogenesis is an important subsurface biogeochemical process.
Utilizing previously identified contaminant genera, we found that ∼27% of the total dataset were identified as contaminant sequences that likely originate from DNA extraction and DNA cleanup methods. Thus, controls must be taken at every step of the collection and processing procedure when working with low-biomass environments such as, but not limited to, portions of Earth’s deep subsurface. Taken together, we stress that the CoDL dataset is an incredible resource for the broader research community interested in subsurface life, and steps to remove contamination-derived sequences must be taken prior to using this dataset.
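The kind of genus-level filtering this abstract describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: the function name `remove_contaminant_genera` is hypothetical, the input is assumed to be a table of read counts per genus, and the contaminant list below simply reuses genera named in the abstract as a stand-in for a curated list.

```python
# Illustrative stand-in for a curated contaminant list (genera named in the abstract).
EXAMPLE_CONTAMINANT_GENERA = frozenset(
    {"Pseudomonas", "Propionibacterium", "Acinetobacter", "Ralstonia", "Sphingomonas"}
)


def remove_contaminant_genera(genus_counts, contaminant_genera=EXAMPLE_CONTAMINANT_GENERA):
    """Drop reads assigned to listed genera.

    Returns the filtered counts and the fraction of all reads that was removed.
    """
    total = sum(genus_counts.values())
    kept = {g: n for g, n in genus_counts.items() if g not in contaminant_genera}
    removed = total - sum(kept.values())
    fraction_removed = removed / total if total else 0.0
    return kept, fraction_removed
```

Reporting the removed fraction alongside the filtered table makes it easy to see how much of a sample was attributed to contamination, in the spirit of the ∼27% figure quoted above.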
Affiliation(s)
- Cody S Sheik
  - Department of Biology and Large Lakes Observatory, University of Minnesota Duluth, Duluth, MN, United States
- Brandi Kiel Reese
  - Department of Life Sciences, Texas A&M University Corpus Christi, Corpus Christi, TX, United States
- Katrina I Twing
  - Department of Biology, The University of Utah, Salt Lake City, UT, United States
- Jason B Sylvan
  - Department of Oceanography, Texas A&M University, College Station, TX, United States
- Sharon L Grim
  - Department of Earth and Environmental Sciences, University of Michigan, Ann Arbor, MI, United States
- Matthew O Schrenk
  - Department of Earth and Environmental Sciences, Michigan State University, East Lansing, MI, United States
- Mitchell L Sogin
  - Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA, United States
- Frederick S Colwell
  - College of Earth, Ocean, and Atmospheric Sciences, Oregon State University, Corvallis, OR, United States
|
61
|
Tang KW, Larsson E. Tumour virology in the era of high-throughput genomics. Philos Trans R Soc Lond B Biol Sci 2018; 372:rstb.2016.0265. [PMID: 28893932 PMCID: PMC5597732 DOI: 10.1098/rstb.2016.0265] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Accepted: 06/09/2017] [Indexed: 12/12/2022] Open
Abstract
With the advent of massively parallel sequencing, oncogenic viruses in tumours can now be detected in an unbiased and comprehensive manner. Additionally, new viruses or strains can be discovered based on sequence similarity with known viruses. Using this approach, the causative agent for Merkel cell carcinoma was identified. Subsequent studies using data from large collections of tumours have confirmed models built during decades of hypothesis-driven and low-throughput research, and a more detailed and comprehensive description of virus-tumour associations has emerged. Notably, large cohorts and high sequencing depth, in combination with newly developed bioinformatical techniques, have made it possible to rule out several suggested virus-tumour associations with a high degree of confidence. In this review we discuss possibilities, limitations and insights gained from using massively parallel sequencing to characterize tumours with viral content, with emphasis on detection of viral sequences and genomic integration events. This article is part of the themed issue 'Human oncogenic viruses'.
Affiliation(s)
- Ka-Wei Tang
  - Department of Infectious Diseases, Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Medicinaregatan 9A, 405 30 Gothenburg, Sweden
- Erik Larsson
  - Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Medicinaregatan 9A, 405 30 Gothenburg, Sweden
|
62
|
Simion P, Belkhir K, François C, Veyssier J, Rink JC, Manuel M, Philippe H, Telford MJ. A software tool 'CroCo' detects pervasive cross-species contamination in next generation sequencing data. BMC Biol 2018; 16:28. [PMID: 29506533 PMCID: PMC5838952 DOI: 10.1186/s12915-018-0486-7] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 08/17/2017] [Accepted: 01/11/2018] [Indexed: 01/20/2023] Open
Abstract
Background: Multiple RNA samples are frequently processed together and often mixed before multiplex sequencing in the same sequencing run. While different samples can be separated post sequencing using sample barcodes, the possibility of cross contamination between biological samples from different species that have been processed or sequenced in parallel has the potential to be extremely deleterious for downstream analyses.
Results: We present CroCo, a software package for identifying and removing such cross contaminants from assembled transcriptomes. Using multiple, recently published sequence datasets, we show that cross contamination is consistently present at varying levels in real data. Using real and simulated data, we demonstrate that CroCo detects contaminants efficiently and correctly. Using a real example from a molecular phylogenetic dataset, we show that contaminants, if not eliminated, can have a decisive, deleterious impact on downstream comparative analyses.
Conclusions: Cross contamination is pervasive in new and published datasets and, if undetected, can have serious deleterious effects on downstream analyses. CroCo is a database-independent, multi-platform tool, designed for ease of use, that efficiently and accurately detects and removes cross contamination in assembled transcriptomes to avoid these problems. We suggest that the use of CroCo should become a standard cleaning step when processing multiple samples for transcriptome sequencing.
Electronic supplementary material: The online version of this article (10.1186/s12915-018-0486-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Paul Simion
  - Institut des Sciences de l'Evolution (ISEM), UMR 5554, CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine (IBPS), Evolution Paris-Seine (UMR7138), Case 05, 7 Quai St Bernard, 75005, Paris, France
- Khalid Belkhir
  - Institut des Sciences de l'Evolution (ISEM), UMR 5554, CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France
- Clémentine François
  - Institut des Sciences de l'Evolution (ISEM), UMR 5554, CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France
- Julien Veyssier
  - Institut des Sciences de l'Evolution (ISEM), UMR 5554, CNRS, IRD, EPHE, Université de Montpellier, Montpellier, France
- Jochen C Rink
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse 108, 01307, Dresden, Germany
- Michaël Manuel
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine (IBPS), Evolution Paris-Seine (UMR7138), Case 05, 7 Quai St Bernard, 75005, Paris, France
- Hervé Philippe
  - Centre de Théorisation et de Modélisation de la Biodiversité, Station d'Ecologie Théorique et Expérimentale, UMR CNRS 5321, Moulis, 09200, France
  - Département de Biochimie, Centre Robert-Cedergren, Université de Montréal, Montréal, H3C 3J7, Québec, Canada
- Maximilian J Telford
  - Centre for Life's Origins and Evolution, Department of Genetics Evolution and Environment, University College London, Darwin Building, Gower Street, London, WC1E 6BT, UK
|
63
|
Leduc-Robert G, Maddison WP. Phylogeny with introgression in Habronattus jumping spiders (Araneae: Salticidae). BMC Evol Biol 2018; 18:24. [PMID: 29471785 PMCID: PMC5824460 DOI: 10.1186/s12862-018-1137-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 04/05/2017] [Accepted: 02/15/2018] [Indexed: 01/12/2023] Open
Abstract
Background: Habronattus is a diverse clade of jumping spiders with complex courtship displays and repeated evolution of Y chromosomes. A well-resolved species phylogeny would provide an important framework to study these traits, but has not yet been achieved, in part because the few genes available in past studies gave conflicting signals. Such discordant gene trees could be the result of incomplete lineage sorting (ILS) in recently diverged parts of the phylogeny, but there are indications that introgression could be a source of conflict.
Results: To infer Habronattus phylogeny and investigate the cause of gene tree discordance, we assembled transcriptomes for 34 Habronattus species and 2 outgroups. The concatenated 2.41 Mb of nuclear data (1877 loci) resolved phylogeny by Maximum Likelihood (ML) with high bootstrap support (95-100%) at most nodes, with some uncertainty surrounding the relationships of H. icenoglei, H. cambridgei, H. oregonensis, and Pellenes canadensis. Species tree analyses by ASTRAL and SVDQuartets gave almost completely congruent results. Several nodes in the ML phylogeny from 12.33 kb of mitochondrial data are incongruent with the nuclear phylogeny and indicate possible mitochondrial introgression: the internal relationships of the americanus and the coecatus groups, the relationship between the altanus, decorus, banksi, and americanus group, and between H. clypeatus and the coecatus group. To determine the relative contributions of ILS and introgression, we analyzed gene tree discordance for nuclear loci longer than 1 kb using Bayesian Concordance Analysis (BCA) for the americanus group (679 loci) and the VCCR clade (viridipes/clypeatus/coecatus/roberti groups) (517 loci) and found signals of introgression in both. Finally, we tested specifically for introgression in the concatenated nuclear matrix with Patterson’s D statistics and DFOIL. We found nuclear introgression resulting in substantial admixture between americanus group species, between H. roberti and the clypeatus group, and between the clypeatus and coecatus groups.
Conclusions: Our results indicate that the phylogenetic history of Habronattus is predominantly a diverging tree, but that hybridization may have been common between phylogenetically distant species, especially in subgroups with complex courtship displays.
Electronic supplementary material: The online version of this article (10.1186/s12862-018-1137-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Wayne P Maddison
  - Department of Zoology, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
  - Department of Botany and Beaty Biodiversity Museum, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
|
64
|
Medd NC, Fellous S, Waldron FM, Xuéreb A, Nakai M, Cross JV, Obbard DJ. The virome of Drosophila suzukii, an invasive pest of soft fruit. Virus Evol 2018; 4:vey009. [PMID: 29644097 PMCID: PMC5888908 DOI: 10.1093/ve/vey009] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Indexed: 12/21/2022] Open
Abstract
Drosophila suzukii (Matsumura) is one of the most damaging and costly pests to invade temperate horticultural regions in recent history. Conventional control of this pest is challenging, and an environmentally benign microbial biopesticide is highly desirable. A thorough exploration of the pathogens infecting this pest is not only the first step on the road to the development of an effective biopesticide, but also provides a valuable comparative dataset for the study of viruses in the model family Drosophilidae. Here we use a metatranscriptomic approach to identify viruses infecting this fly in both its native (Japanese) and invasive (British and French) ranges. We describe eighteen new RNA viruses, including members of the Picornavirales, Mononegavirales, Bunyavirales, Chuviruses, Nodaviridae, Tombusviridae, Reoviridae, and Nidovirales, and discuss their phylogenetic relationships with previously known viruses. We also detect 18 previously described viruses of other Drosophila species that appear to be associated with D. suzukii in the wild.
Affiliation(s)
- Nathan C Medd
  - Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Charlotte Auerbach Road, Edinburgh EH9 3FL, UK
- Simon Fellous
  - Centre de Biologie pour la Gestion des Populations, INRA, 755 avenue du Campus Agropolis, 34988, Montferrier-sur-Lez cedex, France
- Fergal M Waldron
  - Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Charlotte Auerbach Road, Edinburgh EH9 3FL, UK
- Anne Xuéreb
  - Centre de Biologie pour la Gestion des Populations, INRA, 755 avenue du Campus Agropolis, 34988, Montferrier-sur-Lez cedex, France
- Madoka Nakai
  - Tokyo University of Agriculture and Technology, Saiwaicho, Fuchu, Tokyo 183-8509, Japan
- Jerry V Cross
  - NIAB EMR, New Road, East Malling, Kent, ME19 6BJ, UK
- Darren J Obbard
  - Institute of Evolutionary Biology, University of Edinburgh, Ashworth Laboratories, Charlotte Auerbach Road, Edinburgh EH9 3FL, UK
  - Centre for Immunity, Infection and Evolution, University of Edinburgh, Ashworth Laboratories, Charlotte Auerbach Road, Edinburgh EH9 3FL, UK
|
65
|
Peccoud J, Cordaux R, Gilbert C. Analyzing Horizontal Transfer of Transposable Elements on a Large Scale: Challenges and Prospects. Bioessays 2017; 40. [DOI: 10.1002/bies.201700177] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 09/27/2017] [Revised: 11/22/2017] [Indexed: 02/06/2023]
Affiliation(s)
- Jean Peccoud
  - UMR CNRS 7267; Ecologie et Biologie des Interactions; Equipe Ecologie Evolution Symbiose; Université de Poitiers; 86000 Poitiers France
- Richard Cordaux
  - UMR CNRS 7267; Ecologie et Biologie des Interactions; Equipe Ecologie Evolution Symbiose; Université de Poitiers; 86000 Poitiers France
- Clément Gilbert
  - UMR CNRS 9191; UMR 247 IRD Laboratoire Evolution, Génomes, Comportement, Écologie; Université Paris-Sud; 91198 Gif-sur-Yvette France
|