1
Noll NW, Scherber C, Schäffler L. taxalogue: a toolkit to create comprehensive CO1 reference databases. PeerJ 2023; 11:e16253. [PMID: 38077427 PMCID: PMC10702336 DOI: 10.7717/peerj.16253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 09/18/2023] [Indexed: 12/18/2023] Open
Abstract
Background: Taxonomic identification through DNA barcodes gained considerable traction through the invention of next-generation sequencing and DNA metabarcoding. Metabarcoding allows for the simultaneous identification of thousands of organisms from bulk samples with high taxonomic resolution. However, reliable identifications can only be achieved with comprehensive and curated reference databases. Therefore, custom reference databases are often created to meet the needs of specific research questions. Due to taxonomic inconsistencies, formatting issues, and technical difficulties, building a custom reference database requires tremendous effort. Here, we present taxalogue, an easy-to-use software for creating comprehensive and customized reference databases that provide clean and taxonomically harmonized records. In combination with extensive geographical filtering options, taxalogue opens up new possibilities for generating and testing evolutionary hypotheses.
Methods: taxalogue collects DNA sequences from several online sources and combines them into a reference database. Taxonomic incongruencies between the different data sources can be harmonized according to available taxonomies. Dereplication and various filtering options are available regarding sequence quality or metadata information. taxalogue is implemented in the open-source Ruby programming language, and the source code is available at https://github.com/nwnoll/taxalogue. We benchmark four reference databases by sequence identity against eight queries from different localities and trapping devices. Subsamples from each reference database were used to compare how well another one is covered.
Results: taxalogue produces reference databases with the best coverage at high identities for most tested queries, enabling more accurate, reliable predictions with higher certainty than the other benchmarked reference databases. Additionally, the performance of taxalogue is more consistent while providing good coverage for a variety of habitats, regions, and sampling methods. taxalogue simplifies the creation of reference databases and makes the process reproducible and transparent. Multiple available output formats for commonly used downstream applications facilitate the easy adoption of taxalogue in many different software pipelines. The resulting reference databases improve the taxonomic classification accuracy through high coverage of the query sequences at high identities.
Affiliation(s)
- Niklas W. Noll
  - Centre for Biodiversity Monitoring and Conservation Science, Leibniz Institute for the Analysis of Biodiversity Change, Bonn, North Rhine-Westphalia, Germany
- Christoph Scherber
  - Centre for Biodiversity Monitoring and Conservation Science, Leibniz Institute for the Analysis of Biodiversity Change, Bonn, North Rhine-Westphalia, Germany
- Livia Schäffler
  - Centre for Biodiversity Monitoring and Conservation Science, Leibniz Institute for the Analysis of Biodiversity Change, Bonn, North Rhine-Westphalia, Germany
2
Mugnai F, Costantini F, Chenuil A, Leduc M, Gutiérrez Ortega JM, Meglécz E. Be positive: customized reference databases and new, local barcodes balance false taxonomic assignments in metabarcoding studies. PeerJ 2023; 11:e14616. [PMID: 36643652 PMCID: PMC9835706 DOI: 10.7717/peerj.14616] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/08/2022] [Accepted: 12/01/2022] [Indexed: 01/11/2023] Open
Abstract
Background: In metabarcoding analyses, the taxonomic assignment is crucial to place sequencing data in biological and ecological contexts. This fundamental step depends on a reference database, which should have a good taxonomic coverage to avoid unassigned sequences. However, this goal is rarely achieved in many geographic regions and for several taxonomic groups. On the other hand, more is not necessarily better, as sequences in reference databases belonging to taxonomic groups out of the studied region/environment context might lead to false assignments.
Methods: We investigated the effect of using several subsets of a cytochrome c oxidase subunit I (COI) reference database on taxonomic assignment. Published metabarcoding sequences from the Mediterranean Sea were assigned to taxa using COInr, which is a comprehensive, non-redundant and recent database of COI sequences obtained both from BOLD and NCBI, and two of its subsets: (i) all sequences except insects (COInr-WO-Insecta), which represent the overwhelming majority of COInr database, but are irrelevant for marine samples, and (ii) all sequences from taxonomic families present in the Mediterranean Sea (COInr-Med). Four different algorithms for taxonomic assignment were employed in parallel to evaluate differences in their output and data consistency.
Results: The reduction of the database to more specific custom subsets increased the number of unassigned sequences. Nevertheless, since most of them were incorrectly assigned by the less specific databases, this is a positive outcome. Moreover, the taxonomic resolution (the lowest taxonomic level to which a sequence is attributed) of several sequences tended to increase when using customized databases. These findings clearly indicated the need for customized databases adapted to each study. However, the very high proportion of unassigned sequences points to the need to enrich the local database with new barcodes specifically obtained from the studied region and/or taxonomic group. Including novel local barcodes to the COI database proved to be very profitable: by adding only 116 new barcodes sequenced in our laboratory, thus increasing the reference database by only 0.04%, we were able to improve the resolution for ca. 0.6-1% of the Amplicon Sequence Variants (ASVs).
Affiliation(s)
- Francesco Mugnai
  - Department of Biological, Geological and Environmental Sciences (BiGeA), University of Bologna, Ravenna, Italy
- Federica Costantini
  - Department of Biological, Geological and Environmental Sciences (BiGeA), University of Bologna, Ravenna, Italy
  - Consorzio Nazionale Interuniversitario per le Scienze del Mare (CoNISMa), Roma, Italy
- Anne Chenuil
  - Aix Marseille Univ, Avignon Université, CNRS, IRD, IMBE, Marseille, France
- Emese Meglécz
  - Aix Marseille Univ, Avignon Université, CNRS, IRD, IMBE, Marseille, France
3
Gold Z, Curd EE, Goodwin KD, Choi ES, Frable BW, Thompson AR, Walker HJ, Burton RS, Kacev D, Martz LD, Barber PH. Improving metabarcoding taxonomic assignment: A case study of fishes in a large marine ecosystem. Mol Ecol Resour 2021; 21:2546-2564. [PMID: 34235858 DOI: 10.1111/1755-0998.13450] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 05/13/2020] [Revised: 05/25/2021] [Accepted: 06/03/2021] [Indexed: 01/08/2023]
Abstract
DNA metabarcoding is an important tool for molecular ecology. However, its effectiveness hinges on the quality of reference sequence databases and classification parameters employed. Here we evaluate the performance of MiFish 12S taxonomic assignments using a case study of California Current Large Marine Ecosystem fishes to determine best practices for metabarcoding. Specifically, we use a taxonomy cross-validation by identity framework to compare classification performance between a global database comprised of all available sequences and a curated database that only includes sequences of fishes from the California Current Large Marine Ecosystem. We demonstrate that the regional database provides higher assignment accuracy than the comprehensive global database. We also document a tradeoff between accuracy and misclassification across a range of taxonomic cutoff scores, highlighting the importance of parameter selection for taxonomic classification. Furthermore, we compared assignment accuracy with and without the inclusion of additionally generated reference sequences. To this end, we sequenced tissue from 597 species using the MiFish 12S primers, adding 252 species to GenBank's existing 550 California Current Large Marine Ecosystem fish sequences. We then compared species and reads identified from seawater environmental DNA samples using global databases with and without our generated references, and the regional database. The addition of new references allowed for the identification of 16 additional native taxa representing 17.0% of total reads from eDNA samples, including species with vast ecological and economic value. Together these results demonstrate the importance of comprehensive and curated reference databases for effective metabarcoding and the need for locus-specific validation efforts.
Affiliation(s)
- Zachary Gold
  - Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, California, USA
- Emily E Curd
  - Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, California, USA
- Kelly D Goodwin
  - Ocean Chemistry and Ecosystems Division, Atlantic Oceanographic and Meteorological Laboratory, National Oceanic and Atmospheric Administration, Stationed at Southwest Fisheries Science Center, La Jolla, California, USA
- Emma S Choi
  - Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, USA
- Benjamin W Frable
  - Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, USA
- Andrew R Thompson
  - Southwest Fisheries Science Center, National Oceanic and Atmospheric Administration, La Jolla, California, USA
- Harold J Walker
  - Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, USA
- Ronald S Burton
  - Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, USA
- Dovi Kacev
  - Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, USA
- Lucas D Martz
  - Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, USA
- Paul H Barber
  - Department of Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, California, USA
4
Rossmann S, Lysøe E, Skogen M, Talgø V, Brurberg MB. DNA Metabarcoding Reveals Broad Presence of Plant Pathogenic Oomycetes in Soil From Internationally Traded Plants. Front Microbiol 2021; 12:637068. [PMID: 33841362 PMCID: PMC8027490 DOI: 10.3389/fmicb.2021.637068] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/02/2020] [Accepted: 03/03/2021] [Indexed: 01/04/2023] Open
Abstract
Plants with roots and soil clumps transported over long distances in plant trading can harbor plant pathogenic oomycetes, facilitating disease outbreaks that threaten ecosystems, biodiversity, and food security. Tools to detect the presence of such oomycetes with a sufficiently high throughput and broad scope are currently not part of international phytosanitary testing regimes. In this work, DNA metabarcoding targeting the internal transcribed spacer (ITS) region was employed to broadly detect and identify oomycetes present in soil from internationally shipped plants. This method was compared to traditional isolation-based detection and identification after an enrichment step. DNA metabarcoding showed widespread presence of potentially plant pathogenic Phytophthora and Pythium species in internationally transported rhizospheric soil with Pythium being the overall most abundant genus observed. Baiting, a commonly employed enrichment method for Phytophthora species, led to an increase of golden-brown algae in the soil samples, but did not increase the relative or absolute abundance of potentially plant pathogenic oomycetes. Metabarcoding of rhizospheric soil yielded DNA sequences corresponding to oomycete isolates obtained after enrichment and identified them correctly but did not always detect the isolated oomycetes in the same samples. This work provides a proof of concept and outlines necessary improvements for the use of environmental DNA (eDNA) and metabarcoding as a standalone phytosanitary assessment tool for broad detection and identification of plant pathogenic oomycetes.
Affiliation(s)
- Simeon Rossmann
  - Division of Biotechnology and Plant Health, Norwegian Institute of Bioeconomy Research (NIBIO), Ås, Norway
- Erik Lysøe
  - Division of Biotechnology and Plant Health, Norwegian Institute of Bioeconomy Research (NIBIO), Ås, Norway
- Monica Skogen
  - Division of Biotechnology and Plant Health, Norwegian Institute of Bioeconomy Research (NIBIO), Ås, Norway
- Venche Talgø
  - Division of Biotechnology and Plant Health, Norwegian Institute of Bioeconomy Research (NIBIO), Ås, Norway
- May Bente Brurberg
  - Division of Biotechnology and Plant Health, Norwegian Institute of Bioeconomy Research (NIBIO), Ås, Norway
  - Department of Plant Sciences, Norwegian University of Life Sciences (NMBU), Ås, Norway
5
Arranz V, Pearman WS, Aguirre JD, Liggins L. MARES, a replicable pipeline and curated reference database for marine eukaryote metabarcoding. Sci Data 2020; 7:209. [PMID: 32620910 PMCID: PMC7334202 DOI: 10.1038/s41597-020-0549-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/30/2019] [Accepted: 05/27/2020] [Indexed: 11/09/2022] Open
Abstract
The use of DNA metabarcoding to characterise the biodiversity of environmental and community samples has exploded in recent years. However, taxonomic inferences from these studies are contingent on the quality and completeness of the sequence reference database used to characterise sample species-composition. In response, studies often develop custom reference databases to improve species assignment. The disadvantage of this approach is that it limits the potential for database re-use, and the transferability of inferences across studies. Here, we present the MARine Eukaryote Species (MARES) reference database for use in marine metabarcoding studies, created using a transparent and reproducible pipeline. MARES includes all COI sequences available in GenBank and BOLD for marine taxa, unified into a single taxonomy. Our pipeline facilitates the curation of sequences, synonymization of taxonomic identifiers used by different repositories, and formatting these data for use in taxonomic assignment tools. Overall, MARES provides a benchmark COI reference database for marine eukaryotes, and a standardised pipeline for (re)producing reference databases enabling integration and fair comparison of marine DNA metabarcoding results.
- Measurement(s): DNA
- Technology Type(s): bioinformatics analysis
- Factor Type(s): DNA sequence
- Sample Characteristic - Organism: Eukaryota
- Sample Characteristic - Environment: marine environment
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12324122
Affiliation(s)
- Vanessa Arranz
  - School of Natural and Computational Sciences, Massey University Auckland, Albany, Auckland, 0745, New Zealand
- William S Pearman
  - School of Natural and Computational Sciences, Massey University Auckland, Albany, Auckland, 0745, New Zealand
- J David Aguirre
  - School of Natural and Computational Sciences, Massey University Auckland, Albany, Auckland, 0745, New Zealand
- Libby Liggins
  - School of Natural and Computational Sciences, Massey University Auckland, Albany, Auckland, 0745, New Zealand
6
Richardson RT, Sponsler DB, McMinn‐Sauder H, Johnson RM. MetaCurator: A hidden Markov model‐based toolkit for extracting and curating sequences from taxonomically‐informative genetic markers. Methods Ecol Evol 2019. [DOI: 10.1111/2041-210x.13314] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022]
Affiliation(s)
- Douglas B. Sponsler
  - Department of Entomology, Pennsylvania State University, University Park, PA, USA
  - Department of Botany, The Academy of Natural Sciences of Drexel University, Philadelphia, PA, USA
- Reed M. Johnson
  - Department of Entomology, The Ohio State University, Columbus, OH, USA
7
Piper AM, Batovska J, Cogan NOI, Weiss J, Cunningham JP, Rodoni BC, Blacket MJ. Prospects and challenges of implementing DNA metabarcoding for high-throughput insect surveillance. Gigascience 2019; 8:giz092. [PMID: 31363753 PMCID: PMC6667344 DOI: 10.1093/gigascience/giz092] [Citation(s) in RCA: 82] [Impact Index Per Article: 16.4] [Received: 01/11/2019] [Revised: 06/25/2019] [Accepted: 07/09/2019] [Indexed: 12/21/2022] Open
Abstract
Trap-based surveillance strategies are widely used for monitoring of invasive insect species, aiming to detect newly arrived exotic taxa as well as track the population levels of established or endemic pests. Where these surveillance traps have low specificity and capture non-target endemic species in excess of the target pests, the need for extensive specimen sorting and identification creates a major diagnostic bottleneck. While the recent development of standardized molecular diagnostics has partly alleviated this requirement, the single specimen per reaction nature of these methods does not readily scale to the sheer number of insects trapped in surveillance programmes. Consequently, target lists are often restricted to a few high-priority pests, allowing unanticipated species to avoid detection and potentially establish populations. DNA metabarcoding has recently emerged as a method for conducting simultaneous, multi-species identification of complex mixed communities and may lend itself ideally to rapid diagnostics of bulk insect trap samples. Moreover, the high-throughput nature of recent sequencing platforms could enable the multiplexing of hundreds of diverse trap samples on a single flow cell, thereby providing the means to dramatically scale up insect surveillance in terms of both the quantity of traps that can be processed concurrently and number of pest species that can be targeted. In this review of the metabarcoding literature, we explore how DNA metabarcoding could be tailored to the detection of invasive insects in a surveillance context and highlight the unique technical and regulatory challenges that must be considered when implementing high-throughput sequencing technologies into sensitive diagnostic applications.
Affiliation(s)
- Alexander M Piper
  - Agriculture Victoria Research, AgriBio Centre, 5 Ring Road, Bundoora 3083, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora 3083, VIC, Australia
- Jana Batovska
  - Agriculture Victoria Research, AgriBio Centre, 5 Ring Road, Bundoora 3083, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora 3083, VIC, Australia
- Noel O I Cogan
  - Agriculture Victoria Research, AgriBio Centre, 5 Ring Road, Bundoora 3083, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora 3083, VIC, Australia
- John Weiss
  - Agriculture Victoria Research, AgriBio Centre, 5 Ring Road, Bundoora 3083, VIC, Australia
- John Paul Cunningham
  - Agriculture Victoria Research, AgriBio Centre, 5 Ring Road, Bundoora 3083, VIC, Australia
- Brendan C Rodoni
  - Agriculture Victoria Research, AgriBio Centre, 5 Ring Road, Bundoora 3083, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora 3083, VIC, Australia
- Mark J Blacket
  - Agriculture Victoria Research, AgriBio Centre, 5 Ring Road, Bundoora 3083, VIC, Australia
8
Richardson RT, Curtis HR, Matcham EG, Lin C, Suresh S, Sponsler DB, Hearon LE, Johnson RM. Quantitative multi‐locus metabarcoding and waggle dance interpretation reveal honey bee spring foraging patterns in Midwest agroecosystems. Mol Ecol 2019; 28:686-697. [DOI: 10.1111/mec.14975] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 09/17/2018] [Revised: 11/09/2018] [Accepted: 11/19/2018] [Indexed: 12/17/2022]
Affiliation(s)
- Hailey R. Curtis
  - College of Veterinary Medicine, The Ohio State University, Columbus, Ohio
- Emma G. Matcham
  - Department of Horticulture and Crop Science, The Ohio State University, Columbus, Ohio
- Chia‐Hua Lin
  - Department of Entomology, The Ohio State University, Columbus, Ohio
- Sreelakshmi Suresh
  - Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, Ohio
- Douglas B. Sponsler
  - Department of Entomology, Pennsylvania State University, University Park, Pennsylvania
- Luke E. Hearon
  - Department of Entomology, The Ohio State University, Columbus, Ohio
- Reed M. Johnson
  - Department of Entomology, The Ohio State University, Columbus, Ohio