1
Nakazato T, Jinbo U. Cross-sectional use of barcode of life data system and GenBank as DNA barcoding databases for the advancement of museomics. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.966605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] [Open Access]
Abstract
Museomics is an approach to the DNA sequencing of museum specimens that can generate both biodiversity and sequence information. In this study, we surveyed both the biodiversity information-based database BOLD (Barcode of Life Data System) and the sequence information database GenBank, using DNA barcoding data as an example, with the aim of integrating the data from these two databases. DNA barcoding is a method of identifying species from DNA sequences by using short genetic markers. We surveyed how many entries had biodiversity information (such as links to BOLD and specimen IDs) by downloading all fish, insect, and flowering plant data available from GenBank Nucleotide; a BOLD ID was assigned to 26.2% of the insect entries. In the same way, we downloaded the respective BOLD data and checked the status of links to sequence information. We also investigated how many species these databases cover, and 7,693 species were found to exist only in BOLD. In the future, as museomics develops as a field, the targeted sequences will be extended beyond DNA barcodes to mitochondrial genomes, other genes, and genome sequences. Consequently, the value of the sequence data will increase. In addition, various species will be sequenced and, thus, biodiversity information, such as the evidence specimen photographs used as a basis for species identification, will become even more indispensable. This study contributes to the acceleration of museomics-associated research by using databases in a cross-sectional manner.
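The cross-database survey in this abstract reduces to two small computations: the share of GenBank entries that carry a BOLD link, and the species found in only one database or in both. The following Python sketch illustrates them under stated assumptions. The record layout (dicts with `species` and `bold_id` keys) and both function names are hypothetical, chosen for illustration; they are not the authors' pipeline or the API of either database.

```python
def bold_link_rate(genbank_records):
    """Fraction of records with a non-empty 'bold_id' (hypothetical field name)."""
    if not genbank_records:
        return 0.0
    linked = sum(1 for rec in genbank_records if rec.get("bold_id"))
    return linked / len(genbank_records)


def species_coverage(bold_records, genbank_records):
    """Count species present only in BOLD, only in GenBank, or in both.

    Names are compared after trimming whitespace and lower-casing, which is
    a simplification: real data would also need synonym handling.
    """
    bold = {rec["species"].strip().lower() for rec in bold_records}
    genbank = {rec["species"].strip().lower() for rec in genbank_records}
    return {
        "bold_only": len(bold - genbank),
        "genbank_only": len(genbank - bold),
        "shared": len(bold & genbank),
    }


# Toy records, invented for illustration only.
genbank_demo = [
    {"species": "Danio rerio", "bold_id": "DEMO001-10"},
    {"species": "Apis mellifera", "bold_id": None},
    {"species": "Oryza sativa", "bold_id": None},
    {"species": "Apis mellifera", "bold_id": "DEMO002-11"},
]
bold_demo = [{"species": "Danio rerio"}, {"species": "Bombus terrestris"}]

print(bold_link_rate(genbank_demo))
print(species_coverage(bold_demo, genbank_demo))
```

Set difference and intersection keep the comparison independent of how many records each species has, which matters because one species can have thousands of barcode entries.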
2
Porter TM, Hajibabaei M. MetaWorks: A flexible, scalable bioinformatic pipeline for high-throughput multi-marker biodiversity assessments. PLoS One 2022; 17:e0274260. [PMID: 36174014 PMCID: PMC9521933 DOI: 10.1371/journal.pone.0274260] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 04/22/2022] [Accepted: 08/24/2022] [Indexed: 01/04/2023] [Open Access]
Abstract
Multi-marker metabarcoding is increasingly being used to generate biodiversity information across different domains of life, from microbes to fungi to animals, for example in molecular ecology and biomonitoring applications in sectors ranging from academic research to regulatory agencies and industry. Current popular bioinformatic pipelines support microbial and fungal marker analysis, while ad hoc methods are often used to process animal metabarcode markers from the same study. MetaWorks provides a harmonized processing environment, pipeline, and taxonomic assignment approach for demultiplexed Illumina reads for all biota using a wide range of metabarcoding markers such as 16S, ITS, and COI. A Conda environment is provided to quickly gather most of the programs and dependencies for the pipeline. Several workflows are provided, including taxonomic assignment of exact sequence variants, optional generation of operational taxonomic units, and single-read processing. Pipelines are automated using Snakemake to minimize user intervention and facilitate scalability. All pipelines use the RDP classifier to provide taxonomic assignments with confidence measures. We extend the functionality of the RDP classifier for taxonomically assigning 16S (bacteria), ITS (fungi), and 28S (fungi), to also support COI (eukaryotes), rbcL (eukaryotes, land plants, diatoms), 12S (fish, vertebrates), 18S (eukaryotes, diatoms) and ITS (fungi, plants). MetaWorks handles ITS by trimming flanking conserved rRNA gene regions, and handles protein-coding genes by providing two options for removing obvious pseudogenes. MetaWorks can be downloaded from https://github.com/terrimporter/MetaWorks and quickstart instructions, pipeline details, and a tutorial for new users can be found at https://terrimporter.github.io/MetaWorksSite.
Affiliation(s)
- Teresita M. Porter
  - Centre for Biodiversity Genomics @ Biodiversity Institute of Ontario & Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
- Mehrdad Hajibabaei
  - Centre for Biodiversity Genomics @ Biodiversity Institute of Ontario & Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
3
Hervé A, Domaizon I, Baudoin JM, Dejean T, Gibert P, Jean P, Peroux T, Raymond JC, Valentini A, Vautier M, Logez M. Spatio-temporal variability of eDNA signal and its implication for fish monitoring in lakes. PLoS One 2022; 17:e0272660. [PMID: 35960745 PMCID: PMC9374266 DOI: 10.1371/journal.pone.0272660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Accepted: 07/24/2022] [Indexed: 11/18/2022] [Open Access]
Abstract
Environmental DNA (eDNA) metabarcoding is revolutionizing the monitoring of aquatic biodiversity. The use of eDNA has the potential to enable non-invasive, cost-effective, time-efficient and high-sensitivity monitoring of fish assemblages. Although the capacity of eDNA metabarcoding to describe fish assemblages is recognised, research efforts are still needed to better assess the spatial and temporal variability of the eDNA signal and to ultimately design an optimal sampling strategy for eDNA monitoring. In this context, we sampled three different lakes (a dam reservoir, a shallow eutrophic lake and a deep oligotrophic lake) every 6 weeks for 1 year. We performed four types of sampling for each lake (integrative sampling of sub-surface water along transects on the left shore, the right shore and above the deepest zone, and point sampling in deeper layers near the lake bottom) to explore the spatial variability of the eDNA signal at the lake scale over a period of 1 year. A metabarcoding approach was applied to analyse the 92 eDNA samples in order to obtain fish species inventories which were compared with traditional fish monitoring methods (standardized gillnet samplings). Several species known to be present in these lakes were only detected by eDNA, confirming the higher sensitivity of this technique in comparison with gillnetting. The eDNA signal varied spatially, with shoreline samples being richer in species than the other samples. Furthermore, deep-water samplings appeared to be non-relevant for regularly mixed lakes, where the eDNA signal was homogeneously distributed. These results also demonstrate a clear temporal variability of the eDNA signal that seems to be related to species phenology, with most of the species detected in spring during the spawning period on shores, but also a peak of detection in winter for salmonid and coregonid species during their reproduction period. 
These results contribute to our understanding of the spatio-temporal distribution of eDNA in lakes and allow us to provide methodological recommendations regarding where and when to sample eDNA for fish monitoring in lakes.
Affiliation(s)
- Alix Hervé
  - SPYGEN, Le Bourget du Lac, France
  - Pole R&D ECLA, Le Bourget-du-Lac, France
  - INRAE, Aix Marseille Université, RECOVER, Aix-en-Provence, France
- Isabelle Domaizon
  - Pole R&D ECLA, Le Bourget-du-Lac, France
  - INRAE, UMR CARRTEL, Thonon-les-Bains, France
- Jean-Marc Baudoin
  - Pole R&D ECLA, Le Bourget-du-Lac, France
  - OFB, Direction de la Recherche et de l’Appui Scientifique, Route Cézanne, Aix-en-Provence, France
- Pierre Gibert
  - Pole R&D ECLA, Le Bourget-du-Lac, France
  - INRAE, Aix Marseille Université, RECOVER, Aix-en-Provence, France
- Tiphaine Peroux
  - Pole R&D ECLA, Le Bourget-du-Lac, France
  - INRAE, Aix Marseille Université, RECOVER, Aix-en-Provence, France
- Jean-Claude Raymond
  - Pole R&D ECLA, Le Bourget-du-Lac, France
  - OFB, DR AURA, Thonon-les-Bains, France
- Marine Vautier
  - Pole R&D ECLA, Le Bourget-du-Lac, France
  - INRAE, UMR CARRTEL, Thonon-les-Bains, France
- Maxime Logez
  - Pole R&D ECLA, Le Bourget-du-Lac, France
  - INRAE, Aix Marseille Université, RECOVER, Aix-en-Provence, France
  - INRAE, UR RIVERLY, Villeurbanne, France
4
Hoban ML, Whitney J, Collins AG, Meyer C, Murphy KR, Reft AJ, Bemis KE. Skimming for barcodes: rapid production of mitochondrial genome and nuclear ribosomal repeat reference markers through shallow shotgun sequencing. PeerJ 2022; 10:e13790. [PMID: 35959477 PMCID: PMC9359134 DOI: 10.7717/peerj.13790] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/03/2022] [Accepted: 07/05/2022] [Indexed: 01/17/2023] [Open Access]
Abstract
DNA barcoding is critical to conservation and biodiversity research, yet public reference databases are incomplete. Existing barcode databases are biased toward cytochrome oxidase subunit I (COI) and frequently lack associated voucher specimens or geospatial metadata, which can hinder reliable species assignments. The emergence of metabarcoding approaches such as environmental DNA (eDNA) has necessitated multiple marker techniques combined with barcode reference databases backed by voucher specimens. Reference barcodes have traditionally been generated by Sanger sequencing; however, sequencing multiple markers is costly for large numbers of specimens, requires multiple separate PCR reactions, and limits resulting sequences to targeted regions. High-throughput sequencing techniques such as genome skimming enable assembly of complete mitogenomes, which contain the most commonly used barcoding loci (e.g., COI, 12S, 16S), as well as nuclear ribosomal repeat regions (e.g., ITS1&2, 18S). We evaluated the feasibility of genome skimming to generate barcode reference databases for marine fishes by assembling complete mitogenomes and nuclear ribosomal repeats. We tested genome skimming across a taxonomically diverse selection of 12 marine fish species from the collections of the National Museum of Natural History, Smithsonian Institution. We generated two sequencing libraries per species to test the impact of shearing method (enzymatic or mechanical), extraction method (kit-based or automated), and input DNA concentration. We produced complete mitogenomes for all non-chondrichthyans (11/12 species) and assembled nuclear ribosomal repeats (18S-ITS1-5.8S-ITS2-28S) for all taxa. The quality and completeness of mitogenome assemblies were not impacted by shearing method, extraction method or input DNA concentration. Our results reaffirm that genome skimming is an efficient and (at scale) cost-effective method to generate all mitochondrial and common nuclear DNA barcoding loci for multiple species simultaneously, which has great potential to scale for future projects and facilitate completing barcode reference databases for marine fishes.
Affiliation(s)
- Mykle L. Hoban
  - Hawai‘i Institute of Marine Biology, University of Hawai‘i at Mānoa, Kāne‘ohe, Hawai‘i, United States of America
- Jonathan Whitney
  - Pacific Islands Fisheries Science Center, National Oceanic and Atmospheric Administration, Honolulu, Hawai‘i, United States of America
- Allen G. Collins
  - NOAA National Systematics Laboratory, National Museum of Natural History, Smithsonian Institution, Washington, D.C., United States of America
- Christopher Meyer
  - Department of Invertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, D.C., United States of America
- Katherine R. Murphy
  - Laboratories of Analytical Biology, National Museum of Natural History, Smithsonian Institution, Washington, D.C., United States of America
- Abigail J. Reft
  - NOAA National Systematics Laboratory, National Museum of Natural History, Smithsonian Institution, Washington, D.C., United States of America
- Katherine E. Bemis
  - NOAA National Systematics Laboratory, National Museum of Natural History, Smithsonian Institution, Washington, D.C., United States of America
5
Vasconcelos S, Nunes GL, Dias MC, Lorena J, Oliveira RRM, Lima TGL, Pires ES, Valadares RBS, Alves R, Watanabe MTC, Zappi DC, Hiura AL, Pastore M, Vasconcelos LV, Mota NFO, Viana PL, Gil ASB, Simões AO, Imperatriz‐Fonseca VL, Harley RM, Giulietti AM, Oliveira G. Unraveling the plant diversity of the Amazonian canga through DNA barcoding. Ecol Evol 2021; 11:13348-13362. [PMID: 34646474 PMCID: PMC8495817 DOI: 10.1002/ece3.8057] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/17/2021] [Revised: 08/03/2021] [Accepted: 08/11/2021] [Indexed: 01/04/2023] [Open Access]
Abstract
The canga of the Serra dos Carajás, in Eastern Amazon, is home to a unique open plant community, harboring several endemic and rare species. Although a complete flora survey has been recently published, scarce to no genetic information is available for most plant species of the ironstone outcrops of the Serra dos Carajás. In this scenario, DNA barcoding appears as a fast and effective approach to assess the genetic diversity of the Serra dos Carajás flora, considering the growing need for robust biodiversity conservation planning in such an area with industrial mining activities. Thus, after testing eight different DNA barcode markers (matK, rbcL, rpoB, rpoC1, atpF-atpH, psbK-psbI, trnH-psbA, and ITS2), we chose rbcL and ITS2 as the most suitable markers for a broad application in the regional flora. Here we describe DNA barcodes for 1,130 specimens of 538 species, 323 genera, and 115 families of vascular plants from a highly diverse flora in the Amazon basin, with a total of 344 species being barcoded for the first time. In addition, we assessed the potential of using DNA metabarcoding of bulk samples for surveying plant diversity in the canga. Upon achieving the first comprehensive DNA barcoding effort directed to a complete flora in the Brazilian Amazon, we discuss the relevance of our results to guide future conservation measures in the Serra dos Carajás.
Affiliation(s)
- Mariana C. Dias
  - Instituto Tecnológico Vale, Belém, Brazil
  - Programa Interunidades de Pós‐Graduação em Bioinformática, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Renato R. M. Oliveira
  - Instituto Tecnológico Vale, Belém, Brazil
  - Programa Interunidades de Pós‐Graduação em Bioinformática, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Daniela C. Zappi
  - Instituto Tecnológico Vale, Belém, Brazil
  - Instituto de Ciências Biológicas, Universidade de Brasília, Brasília, Brazil
- Mayara Pastore
  - Instituto Tecnológico Vale, Belém, Brazil
  - Coordenação de Botânica, Museu Paraense Emílio Goeldi, Belém, Brazil
- Liziane V. Vasconcelos
  - Instituto Tecnológico Vale, Belém, Brazil
  - Programa de Pós‐Graduação em Ecologia, Universidade Federal do Pará, Belém, Brazil
- Nara F. O. Mota
  - Instituto Tecnológico Vale, Belém, Brazil
  - Coordenação de Botânica, Museu Paraense Emílio Goeldi, Belém, Brazil
- Pedro L. Viana
  - Coordenação de Botânica, Museu Paraense Emílio Goeldi, Belém, Brazil
- André S. B. Gil
  - Coordenação de Botânica, Museu Paraense Emílio Goeldi, Belém, Brazil
- André O. Simões
  - Departamento de Biologia Vegetal, Universidade Estadual de Campinas, Campinas, Brazil
- Ana M. Giulietti
  - Instituto Tecnológico Vale, Belém, Brazil
  - Programa de Pós‐Graduação em Botânica, Universidade Estadual de Feira de Santana, Feira de Santana, Brazil
6
Anslan S, Sachs M, Rancilhac L, Brinkmann H, Petersen J, Künzel S, Schwarz A, Arndt H, Kerney R, Vences M. Diversity and substrate-specificity of green algae and other micro-eukaryotes colonizing amphibian clutches in Germany, revealed by DNA metabarcoding. Naturwissenschaften 2021; 108:29. [PMID: 34181110 PMCID: PMC8238718 DOI: 10.1007/s00114-021-01734-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/17/2020] [Revised: 03/20/2021] [Accepted: 05/02/2021] [Indexed: 02/17/2023]
Abstract
Amphibian clutches are colonized by diverse but poorly studied communities of micro-organisms. One of the best known is the unicellular green alga Oophila amblystomatis, but the occurrence and role of other micro-organisms in the capsular chamber surrounding amphibian clutches have remained largely unstudied. Here, we undertook a multi-marker DNA metabarcoding study to characterize the community of algae and other micro-eukaryotes associated with agile frog (Rana dalmatina) clutches. Sampling was performed at three small ponds in Germany, from four substrates: water, sediment, tree leaves from the bottom of the pond, and R. dalmatina clutches. Sampling substrate strongly determined the community compositions of algae and other micro-eukaryotes. Therefore, as expected, the frog clutch-associated communities formed clearly distinct clusters. Clutch-associated communities in our study were structured by a plethora of not only green algae, but also diatoms and other ochrophytes. The most abundant operational taxonomic units (OTUs) in clutch samples were taxa from Chlamydomonas and Oophila, but also from Nitzschia and other ochrophytes. Sequences of Oophila "Clade B" were found exclusively in clutches. Based on additional phylogenetic analyses of 18S rDNA and of a matrix of 18 nuclear genes derived from transcriptomes, we confirmed in our samples the existence of two distinct clades of green algae assigned to Oophila in past studies. We hypothesize that "Clade B" algae correspond to the true Oophila, whereas "Clade A" algae are a series of Chlorococcum species that, along with other green algae, ochrophytes and protists, colonize amphibian clutches opportunistically and are often cultured from clutch samples due to their robust growth performance. The clutch-associated communities were subject to filtering by sampling location, suggesting that the taxa colonizing amphibian clutches can drastically differ depending on environmental conditions.
Affiliation(s)
- Sten Anslan
  - Zoological Institute, Technische Universität Braunschweig, Braunschweig, Germany
- Maria Sachs
  - Institute of Zoology, University of Cologne, Zülpicherstr. 47b, 50674, Köln, Germany
- Lois Rancilhac
  - Zoological Institute, Technische Universität Braunschweig, Braunschweig, Germany
- Henner Brinkmann
  - Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, 38124, Braunschweig, Germany
- Jörn Petersen
  - Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7B, 38124, Braunschweig, Germany
- Sven Künzel
  - Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, 24306, Plön, Germany
- Anja Schwarz
  - Institute of Geosystems and Bioindication, Technische Universität Braunschweig, Braunschweig, Germany
- Hartmut Arndt
  - Institute of Zoology, University of Cologne, Zülpicherstr. 47b, 50674, Köln, Germany
- Ryan Kerney
  - Department of Biology, Gettysburg College, Gettysburg, PA, USA
- Miguel Vences
  - Zoological Institute, Technische Universität Braunschweig, Braunschweig, Germany
7
Avanesyan A, Sutton H, Lamp WO. Choosing an Effective PCR-Based Approach for Diet Analysis of Insect Herbivores: A Systematic Review. J Econ Entomol 2021; 114:1035-1046. [PMID: 33822094 DOI: 10.1093/jee/toab057] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/27/2020] [Indexed: 06/12/2023]
Abstract
Identification of ingested plant species using polymerase chain reaction (PCR)-based methods is an increasingly useful yet challenging approach to accurately determine the diet composition of insect herbivores and thus their trophic interactions. A typical process for detecting the DNA of ingested plants involves choosing a DNA extraction method, a genomic target region, and/or the best approach for accurate plant species identification. The wide range of available techniques makes it difficult to choose the most appropriate method for accurate and timely identification of ingested plants from insect guts. In our study, we reviewed the commonly used PCR-based approaches in studies published from 1977 to 2019, to provide researchers with information on the tools which have been shown to be effective for obtaining and identifying ingested plants. Our results showed that, among the five insect orders used in the retrieved studies, Coleoptera and Hemiptera were prevalent (33 and 28% of all the records, respectively). In 79% of the studies a DNA barcoding approach was employed. In a substantial number of studies Qiagen DNA extraction kits and the CTAB protocol were used (43 and 23%, respectively). Of all records, 65% used a single locus as a targeted plant DNA fragment; trnL, rbcL, and ITS regions were the most frequently used loci. Sequencing was the dominant DNA verification approach (70% of all records). This review provides important information on the availability of successfully used PCR-based approaches to identify ingested plant DNA in insect guts, and suggests potential directions for future studies on plant-insect trophic interactions.
Affiliation(s)
- Alina Avanesyan
  - Department of Entomology, University of Maryland, 4291 Fieldhouse Drive, 4112 Plant Sciences, College Park, MD 20742, USA
- Hannah Sutton
  - Department of Entomology, University of Maryland, 4291 Fieldhouse Drive, 4112 Plant Sciences, College Park, MD 20742, USA
- William O Lamp
  - Department of Entomology, University of Maryland, 4291 Fieldhouse Drive, 4112 Plant Sciences, College Park, MD 20742, USA
8
Fuhrmann N, Kaiser TS. The importance of DNA barcode choice in biogeographic analyses - a case study on marine midges of the genus Clunio. Genome 2020; 64:242-252. [PMID: 32510236 DOI: 10.1139/gen-2019-0191] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]
Abstract
DNA barcodes are widely used for species identification and biogeographic studies. Here, we compare the use of full mitochondrial genomes versus DNA barcodes and other mitochondrial DNA fragments for biogeographic and ecological analyses. Our dataset comprised 120 mitochondrial genomes from the genus Clunio (Diptera: Chironomidae), representing five populations from two closely related species (Clunio marinus and Clunio balticus) and three ecotypes. We extracted cytochrome c oxidase subunit I (COI) barcodes and partitioned the mitochondrial genomes into non-overlapping windows of 750 or 1500 bp. Haplotype networks and diversity indices were compared for these windows and full mitochondrial genomes (15.4 kb). Full mitochondrial genomes indicate complete geographic isolation between populations, but do not allow for conclusions on the separation of ecotypes or species. COI barcodes have comparatively few polymorphisms, which is ideal for species identification, but do not resolve geographic isolation. Many of the similarly sized 750 bp windows have higher nucleotide and haplotype diversity than COI barcodes, but still do not resolve biogeography. Only when the window size is increased to 1500 bp do two windows resolve biogeography reasonably well. Our results suggest that the design and use of DNA barcodes in biogeographic studies must be carefully evaluated for each investigated species.
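The window-based comparison in this abstract can be illustrated with a short Python sketch that cuts an alignment into non-overlapping windows and computes nucleotide diversity (mean pairwise proportion of differing sites) and Nei's haplotype diversity for a set of aligned sequences. The function names and toy sequences are assumptions made for illustration; this is not the authors' code, and it ignores gaps and ambiguous bases.

```python
from collections import Counter
from itertools import combinations


def windows(length, size):
    """Yield (start, end) for non-overlapping windows; a trailing partial window is dropped."""
    for start in range(0, length - size + 1, size):
        yield start, start + size


def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites across aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    site_count = len(seqs[0])
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * site_count)


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        return 0.0
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


# Toy alignment, invented for illustration only.
demo = ["AAAAAAAA", "AAATAAAA", "AATTAAAA"]
for start, end in windows(len(demo[0]), 4):
    chunk = [s[start:end] for s in demo]
    print(start, end, nucleotide_diversity(chunk), haplotype_diversity(chunk))
```

Computing both indices per window makes it easy to see which stretches of the genome carry the variation, which is the comparison the abstract draws between COI and other windows.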
Affiliation(s)
- Nico Fuhrmann
  - Max Planck Institute for Evolutionary Biology, Max Planck Research Group "Biological Clocks", August-Thienemann-Strasse 2, 24306 Plön, Germany
- Tobias S Kaiser
  - Max Planck Institute for Evolutionary Biology, Max Planck Research Group "Biological Clocks", August-Thienemann-Strasse 2, 24306 Plön, Germany
9
Salvi D, Berrilli E, D'Alessandro P, Biondi M. Sharpening the DNA barcoding tool through a posteriori taxonomic validation: The case of Longitarsus flea beetles (Coleoptera: Chrysomelidae). PLoS One 2020; 15:e0233573. [PMID: 32437469 PMCID: PMC7241800 DOI: 10.1371/journal.pone.0233573] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/12/2020] [Accepted: 05/07/2020] [Indexed: 11/30/2022] [Open Access]
Abstract
The accuracy of the DNA barcoding tool depends on the existence of a comprehensive archived library of sequences reliably determined at species level by expert taxonomists. However, misidentifications are not infrequent, especially following large-scale DNA barcoding campaigns on diverse and taxonomically complex groups. In this study we used the species-rich flea beetle genus Longitarsus, which requires a high level of expertise for morphological species identification, as a case study to assess the accuracy of the DNA barcoding tool following several optimization procedures. We built a cox1 reference database of 1502 sequences representing 78 Longitarsus species, among which 117 sequences (32 species) were newly generated using a non-invasive DNA extraction method that allows reference voucher specimens to be kept. Within this dataset we identified 69 taxonomic inconsistencies using barcoding gap analysis and tree topology methods. Threshold optimisation and a posteriori taxonomic revision based on newly generated reference sequences and metadata resolved 44 sequences with ambiguous or incorrect identification and significantly improved DNA barcoding accuracy and identification efficacy. Unresolved taxonomic uncertainties, due to overlapping intra- and inter-specific levels of divergence, mainly concern the Longitarsus pratensis species complex and the polyphyletic groups L. melanocephalus, L. nigrofasciatus and L. erro. Errors of this type indicate either poorly established taxonomy or biological processes that make mtDNA groups poorly predictive of species boundaries (e.g. recent speciation or interspecific hybridisation), thus providing directions for further integrative taxonomic and evolutionary studies. Overall, this study underlines the importance of reference vouchers and high-quality metadata associated with sequences in reference databases and corroborates, once again, the key role of taxonomists in every step of the DNA barcoding pipeline in order to generate and maintain a correct and functional reference library.
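Barcoding gap analysis, as mentioned in this abstract, compares the largest distance within a species with the smallest distance to any other species; a species whose gap is zero or negative is a candidate taxonomic inconsistency. The Python sketch below shows one simple way to compute this with uncorrected p-distances. The function names, the choice of p-distance, and the toy records are assumptions for illustration and do not reproduce the authors' analysis or thresholds.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcoding_gaps(records):
    """Map each species to (max_intra, min_inter, gap).

    records is an iterable of (species, aligned_sequence) pairs.
    min_inter and gap are None when only one species is present.
    """
    records = list(records)
    max_intra = {species: 0.0 for species, _ in records}
    min_inter = {species: None for species, _ in records}
    for (sp1, seq1), (sp2, seq2) in combinations(records, 2):
        dist = p_distance(seq1, seq2)
        if sp1 == sp2:
            max_intra[sp1] = max(max_intra[sp1], dist)
        else:
            for species in (sp1, sp2):
                if min_inter[species] is None or dist < min_inter[species]:
                    min_inter[species] = dist
    result = {}
    for species, intra in max_intra.items():
        inter = min_inter[species]
        gap = None if inter is None else inter - intra
        result[species] = (intra, inter, gap)
    return result


# Toy records, invented for illustration only.
demo_records = [
    ("sp_A", "AAAAAAAA"),
    ("sp_A", "AAAAAAAT"),
    ("sp_B", "TTTTAAAA"),
    ("sp_B", "TTTTAAAT"),
]
for species, values in barcoding_gaps(demo_records).items():
    print(species, values)
```

In practice a model-corrected distance and a curated alignment would replace the p-distance on raw strings, but the intra versus inter comparison stays the same.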
Affiliation(s)
- Daniele Salvi
  - Department of Health, Life and Environmental Sciences, University of L'Aquila, Coppito, L'Aquila, Italy
  - CIBIO-InBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Vairão, Portugal
- Emanuele Berrilli
  - Department of Health, Life and Environmental Sciences, University of L'Aquila, Coppito, L'Aquila, Italy
- Paola D'Alessandro
  - Department of Health, Life and Environmental Sciences, University of L'Aquila, Coppito, L'Aquila, Italy
- Maurizio Biondi
  - Department of Health, Life and Environmental Sciences, University of L'Aquila, Coppito, L'Aquila, Italy
10
Turon X, Antich A, Palacín C, Præbel K, Wangensteen OS. From metabarcoding to metaphylogeography: separating the wheat from the chaff. Ecol Appl 2020; 30:e02036. [PMID: 31709684 PMCID: PMC7078904 DOI: 10.1002/eap.2036] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 05/13/2019] [Revised: 07/31/2019] [Accepted: 10/03/2019] [Indexed: 05/31/2023]
Abstract
Metabarcoding is by now a well-established method for biodiversity assessment in terrestrial, freshwater, and marine environments. Metabarcoding data sets are usually used for α- and β-diversity estimates, that is, interspecies (or inter-MOTU [molecular operational taxonomic unit]) patterns. However, the use of hypervariable metabarcoding markers may provide an enormous amount of intraspecies (intra-MOTU) information, mostly untapped so far. The use of cytochrome oxidase (COI) amplicons is gaining momentum in metabarcoding studies targeting eukaryote richness. COI has long been the marker of choice in population genetics and phylogeographic studies. Therefore, COI metabarcoding data sets may be used to study intraspecies patterns and phylogeographic features for hundreds of species simultaneously, opening a new field that we propose to call metaphylogeography. The main challenge for the implementation of this approach is the separation of erroneous sequences from true intra-MOTU variation. Here, we develop a cleaning protocol based on changes in entropy of the different codon positions of the COI sequence, together with co-occurrence patterns of sequences. Using a data set of community DNA from several benthic littoral communities in the Mediterranean and Atlantic seas, we first tested by simulation on a subset of sequences a two-step cleaning approach consisting of a denoising step followed by a minimal abundance filtering. The procedure was then applied to the whole data set. We obtained a total of 563 MOTUs that were usable for phylogeographic inference. We used semiquantitative rank data instead of read abundances to perform AMOVAs and haplotype networks. Genetic variability was mainly concentrated within samples, but with an important between-seas component as well. There were intergroup differences in the amount of variability between and within communities in each sea. For two species, the results could be compared with traditional Sanger sequence data available for the same zones, giving similar patterns. Our study shows that metabarcoding data can be used to infer intra- and interpopulation genetic variability of many species at a time, providing a new method with great potential for basic biogeography, connectivity and dispersal studies, and for the more applied fields of conservation genetics, invasion genetics, and design of protected areas.
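The abstract mentions a cleaning protocol based on entropy changes at the different codon positions of COI. As a rough illustration of the quantity involved, the Python sketch below computes the mean Shannon entropy per codon position for a set of aligned, in-frame coding sequences. It is a sketch under stated assumptions: the function names, the `frame` parameter, and the toy sequences are hypothetical, and the denoising and abundance-filtering steps of the published protocol are not reproduced here.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the characters observed in one alignment column."""
    total = len(column)
    counts = Counter(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def codon_position_entropy(seqs, frame=0):
    """Mean column entropy at codon positions 1, 2 and 3.

    seqs must be aligned and of equal length; frame is the index of the first
    base of the first complete codon (an assumption supplied by the caller).
    """
    sums = [0.0, 0.0, 0.0]
    counts = [0, 0, 0]
    for i in range(frame, len(seqs[0])):
        pos = (i - frame) % 3
        sums[pos] += column_entropy([s[i] for s in seqs])
        counts[pos] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Toy in-frame sequences varying only at one third codon position (invented data).
demo_seqs = ["ATGAAA", "ATGAAC", "ATGAAG", "ATGAAT"]
print(codon_position_entropy(demo_seqs))
```

Comparing the three values before and after a cleaning step is one way to check whether variation removed by the step was concentrated at particular codon positions.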
Affiliation(s)
- Xavier Turon
  - Department of Marine Ecology, Centre for Advanced Studies of Blanes (CEAB, CSIC), Blanes, Catalonia, Spain
- Adrià Antich
  - Department of Marine Ecology, Centre for Advanced Studies of Blanes (CEAB, CSIC), Blanes, Catalonia, Spain
- Creu Palacín
  - Department of Evolutionary Biology, Ecology and Environmental Sciences, and Institute of Biodiversity Research (IRBio), University of Barcelona, Barcelona, Catalonia, Spain
- Kim Præbel
  - Norwegian College of Fishery Science, UiT the Arctic University of Norway, Tromsø, Norway
11
Leray M, Knowlton N, Ho SL, Nguyen BN, Machida RJ. GenBank is a reliable resource for 21st century biodiversity research. Proc Natl Acad Sci U S A 2019; 116:22651-22656. [PMID: 31636175 PMCID: PMC6842603 DOI: 10.1073/pnas.1911714116] [Citation(s) in RCA: 80] [Impact Index Per Article: 16.0] [Indexed: 01/19/2023] [Open Access]
Abstract
Traditional methods of characterizing biodiversity are increasingly being supplemented and replaced by approaches based on DNA sequencing alone. These approaches commonly involve extraction and high-throughput sequencing of bulk samples from biologically complex communities or samples of environmental DNA (eDNA). In such cases, vouchers for individual organisms are rarely obtained, often unidentifiable, or unavailable. Thus, identifying these sequences typically relies on comparisons with sequences from genetic databases, particularly GenBank. While concerns have been raised about biases and inaccuracies in laboratory and analytical methods, comparatively little attention has been paid to the taxonomic reliability of GenBank itself. Here we analyze the metazoan mitochondrial sequences of GenBank using a combination of distance-based clustering and phylogenetic analysis. Because of their comparatively rapid evolutionary rates and consequent high taxonomic resolution, mitochondrial sequences represent an invaluable resource for the detection of the many small and often undescribed organisms that represent the bulk of animal diversity. We show that metazoan identifications in GenBank are surprisingly accurate, even at low taxonomic levels (likely <1% error rate at the genus level). This stands in contrast to previously voiced concerns based on limited analyses of particular groups and the fact that individual researchers currently submit annotated sequences to GenBank without significant external taxonomic validation. Our encouraging results suggest that the rapid uptake of DNA-based approaches is supported by a bioinformatic infrastructure capable of assessing both the losses to biodiversity caused by global change and the effectiveness of conservation efforts aimed at slowing or reversing these losses.
Affiliation(s)
- Matthieu Leray
  - Smithsonian Tropical Research Institute, Smithsonian Institution, Panama City, 0843-03092, Republic of Panama
- Nancy Knowlton
  - National Museum of Natural History, Smithsonian Institution, Washington, DC 20560
- Shian-Lei Ho
  - Biodiversity Research Centre, Academia Sinica, 115-29 Taipei, Taiwan
- Bryan N Nguyen
  - National Museum of Natural History, Smithsonian Institution, Washington, DC 20560
  - Department of Biological Sciences, The George Washington University, Washington, DC 20052
  - Computational Biology Institute, Milken Institute School of Public Health, The George Washington University, Washington, DC 20052
- Ryuji J Machida
  - Biodiversity Research Centre, Academia Sinica, 115-29 Taipei, Taiwan
12
McGee KM, Robinson CV, Hajibabaei M. Gaps in DNA-Based Biomonitoring Across the Globe. Front Ecol Evol 2019. [DOI: 10.3389/fevo.2019.00337] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Indexed: 11/13/2022] [Open Access]
13
Cristescu ME, Donaldson MR. Genome's 60th anniversary. Genome 2019; 62:iii-iv. [PMID: 31050572 DOI: 10.1139/gen-2019-0057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]