51
Zhang AB, Feng J, Ward RD, Wan P, Gao Q, Wu J, Zhao WZ. A new method for species identification via protein-coding and non-coding DNA barcodes by combining machine learning with bioinformatic methods. PLoS One 2012; 7:e30986. [PMID: 22363527 PMCID: PMC3282726 DOI: 10.1371/journal.pone.0030986] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 07/01/2011] [Accepted: 12/29/2011] [Indexed: 11/19/2022] Open
Abstract
Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) that address this issue by combining machine learning and bioinformatics to assign species from both coding and non-coding sequences. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing neighbor-joining (NJ) and maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also outperformed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although in that case the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95% CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95% CI: 96.60-99.37%) for 1,094 brown algae queries, both using ITS barcodes.
Affiliation(s)
- Ai-bing Zhang
- College of Life Sciences, Capital Normal University, Beijing, People's Republic of China.
52
Zhang AB, Muster C, Liang HB, Zhu CD, Crozier R, Wan P, Feng J, Ward RD. A fuzzy-set-theory-based approach to analyse species membership in DNA barcoding. Mol Ecol 2011; 21:1848-63. [PMID: 21883585 DOI: 10.1111/j.1365-294x.2011.05235.x] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Indexed: 11/30/2022]
Abstract
Reliable assignment of an unknown query sequence to its correct species remains a methodological problem for the growing field of DNA barcoding. While great advances have been achieved recently, species identification from barcodes can still be unreliable if the relevant biodiversity has been insufficiently sampled. We here propose a new notion of species membership for DNA barcoding (fuzzy membership, based on fuzzy set theory) and illustrate its successful application to four real data sets (bats, fishes, butterflies and flies) with more than 5000 random simulations. Two of the data sets comprise especially dense species/population-level samples. In comparison with current DNA barcoding methods, the newly proposed minimum distance (MD) plus fuzzy set approach and another computationally simple method, 'best close match', outperform two computationally sophisticated methods (Bayesian and BootstrapNJ). The new method proposed here has great power in reducing false-positive species identification compared with other methods when conspecifics of the query are absent from the reference database.
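The abstract above contrasts the fuzzy-set approach with 'best close match'. The fuzzy membership function itself is not specified here, so the following is only a minimal Python sketch of best close match as it is commonly formulated: a query takes the species of its nearest reference sequence if that distance is within a threshold set at the 95th percentile of intraspecific pairwise distances; equally close matches from several species are 'ambiguous', and a nearest match beyond the threshold gives no identification. The uncorrected p-distance and all function names are illustrative assumptions, and sequences are assumed to be pre-aligned to equal length.

```python
import math
from itertools import combinations

VALID = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences (assumed model).

    Sites with a gap or ambiguity code in either sequence are skipped.
    """
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in VALID and y in VALID]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def intraspecific_threshold(refs, quantile=0.95):
    """Nearest-rank quantile of all within-species pairwise distances.

    refs: list of (species, aligned_sequence) tuples.
    """
    dists = sorted(
        p_distance(s1, s2)
        for (sp1, s1), (sp2, s2) in combinations(refs, 2)
        if sp1 == sp2
    )
    if not dists:
        raise ValueError("need at least one species with two or more references")
    rank = max(1, math.ceil(quantile * len(dists)))
    return dists[rank - 1]


def best_close_match(query, refs, threshold):
    """Return ('identified', species), ('ambiguous', None) or ('no_id', None)."""
    scored = [(p_distance(query, seq), sp) for sp, seq in refs]
    best = min(d for d, _ in scored)
    if best > threshold:
        return ("no_id", None)  # nearest reference is too far away
    tied_species = {sp for d, sp in scored if d == best}
    if len(tied_species) > 1:
        return ("ambiguous", None)  # equally close matches from several species
    return ("identified", tied_species.pop())


if __name__ == "__main__":
    # Hypothetical toy references: two species, two sequences each.
    refs = [("A", "ACGTACGTAC"), ("A", "ACGTACGTAA"),
            ("B", "TTGTACGTAC"), ("B", "TTGTACGTAA")]
    t = intraspecific_threshold(refs)
    print(best_close_match("ACGTACGTAC", refs, t))  # ('identified', 'A')
```

The 'no_id' outcome is the part relevant to the false-positive problem the abstract raises: when conspecifics are missing from the reference set, a query whose nearest match lies beyond the threshold is left unassigned rather than forced onto the wrong species.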
Affiliation(s)
- A-B Zhang
- College of Life Sciences, Capital Normal University, Beijing 100048, China.
53
Smith MA, Eveleigh ES, McCann KS, Merilo MT, McCarthy PC, Van Rooyen KI. Barcoding a quantified food web: crypsis, concepts, ecology and hypotheses. PLoS One 2011; 6:e14424. [PMID: 21754977 PMCID: PMC3130735 DOI: 10.1371/journal.pone.0014424] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Received: 07/26/2010] [Accepted: 10/22/2010] [Indexed: 11/19/2022] Open
Abstract
The efficient and effective monitoring of individuals and populations is critically dependent on correct species identification. While this point may seem obvious, identifying the majority of the more than 100 natural enemies involved in the spruce budworm (Choristoneura fumiferana, SBW) food web remains a non-trivial endeavor. Insect parasitoids play a major role in the processes governing the population dynamics of SBW throughout eastern North America. However, these species are at the leading edge of the taxonomic impediment, and integrating standardized identification capacity into existing field programs would provide clear benefits. We asked to what extent DNA barcoding the SBW food web would alter our understanding of the diversity and connectance of the food web and the frequency of generalists vs. specialists in different forest habitats. We DNA barcoded over 10% of the insects collected from the SBW food web in three New Brunswick forest plots from 1983 to 1993. For 30% of these specimens, we amplified at least one additional nuclear region. When the nodes of the food web were estimated from barcode divergences (using molecular operational taxonomic units (MOTU) or phylogenetic diversity (PD)), the food web became much more diverse and connectance was reduced. We tested one measure of food web structure (the "bird feeder effect") and found no difference compared to the morphologically based predictions. Many, but not all, of the presumably polyphagous parasitoids now appear to be morphologically cryptic host specialists. To our knowledge, this project is the first to barcode a food web in which interactions have already been well documented and described in space, time and abundance. It is poised to become a system in which field-based methods deliver the identification capacity required by forestry scientists. 
Food web barcoding provided an effective tool for the accurate identification of all species involved in the cascading effects of future budworm outbreaks. Integrating standardized barcodes within food webs may ultimately change the face of community ecology. This will be most poignantly felt in food webs that have not yet been quantified. Here, more accurate and precise connections will be within the grasp of any researcher for the first time.
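The drop in connectance reported above follows from simple arithmetic when one node is split into several. As a minimal sketch, assuming directed connectance C = L/S² (L trophic links, S nodes; the abstract does not state which connectance definition was used), the Python below compares a hypothetical web in which one generalist parasitoid attacks three hosts with the same web after that parasitoid is resolved into three cryptic host specialists. All host and parasitoid names are invented for illustration.

```python
def connectance(links):
    """Directed connectance C = L / S**2 for a set of (resource, consumer) links."""
    nodes = {node for link in links for node in link}
    if not nodes:
        raise ValueError("empty food web")
    return len(links) / len(nodes) ** 2


# Hypothetical web: one generalist parasitoid "P" recorded on three hosts.
morphology_web = {("H1", "P"), ("H2", "P"), ("H3", "P")}
# Same interactions after barcodes split "P" into three host specialists.
barcode_web = {("H1", "P1"), ("H2", "P2"), ("H3", "P3")}

if __name__ == "__main__":
    print(connectance(morphology_web))  # 3 links / 4**2 nodes = 0.1875
    print(connectance(barcode_web))     # 3 links / 6**2 nodes = 0.0833...
```

The number of links is unchanged, but S rises from 4 to 6, so C falls. Under the alternative definition L/(S(S-1)) the values change (0.25 to 0.1) but the direction of the effect is the same.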
Affiliation(s)
- M Alex Smith
- Biodiversity Institute of Ontario and Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada.
54
Matzen da Silva J, Creer S, dos Santos A, Costa AC, Cunha MR, Costa FO, Carvalho GR. Systematic and evolutionary insights derived from mtDNA COI barcode diversity in the Decapoda (Crustacea: Malacostraca). PLoS One 2011; 6:e19449. [PMID: 21589909 PMCID: PMC3093375 DOI: 10.1371/journal.pone.0019449] [Citation(s) in RCA: 83] [Impact Index Per Article: 6.4] [Received: 11/08/2010] [Accepted: 04/06/2011] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Decapods are the most recognizable of all crustaceans and comprise a dominant group of benthic invertebrates of the continental shelf and slope, including many species of economic importance. Of the 17,635 morphologically described Decapoda species, only 5.4% are represented by COI barcode region sequences. It therefore remains a challenge to compile regional databases that identify and analyse the extent and patterns of decapod diversity throughout the world. METHODOLOGY/PRINCIPAL FINDINGS We contributed 101 decapod species from the North East Atlantic, the Gulf of Cadiz and the Mediterranean Sea, of which 81 species represent novel COI records. Within the newly generated dataset, 3.6% of the species barcodes conflicted with the assigned morphological taxonomic identification, highlighting both the apparent taxonomic ambiguity among certain groups and the need for an accelerated and independent taxonomic approach. Using the combined COI barcode projects from the Barcode of Life Database, we provide the most comprehensive COI data set so far examined for the Order (1572 sequences of 528 species, 213 genera, and 67 families). Patterns within families show a general predicted molecular hierarchy, but the scale of divergence at each taxonomic level appears to vary extensively between families. The observed ranges of mean K2P distance were: within species 0.285% to 1.375%, within genus 6.376% to 20.924% and within family 11.392% to 25.617%. Nucleotide composition varied greatly across decapods, with GC content ranging from 30.8% to 49.4%. CONCLUSIONS/SIGNIFICANCE Decapod biological diversity was quantified by identifying putative cryptic species, allowing a rapid assessment of taxon diversity in groups that have until now received limited morphological and systematic examination. We highlight taxonomic groups or species with unusual nucleotide composition or evolutionary rates. 
Such data are relevant to strategies for conservation of existing decapod biodiversity, as well as elucidating the mechanisms and constraints shaping the patterns observed.
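The K2P distances and GC percentages quoted above come from standard formulas. The following minimal Python sketch (not the authors' pipeline; function names are illustrative) computes the Kimura two-parameter distance d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of sites showing transition and transversion differences, and GC content as the fraction of G and C among unambiguous bases. It assumes pre-aligned sequences and skips gaps and ambiguity codes; multiply by 100 to express results as percentages, as in the abstract.

```python
import math

VALID = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a: str, b: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    Sites with a gap or ambiguity code in either sequence are skipped.
    """
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in VALID or y not in VALID:
            continue
        sites += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("K2P undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous A/C/G/T bases."""
    bases = [c for c in seq.upper() if c in VALID]
    if not bases:
        raise ValueError("no unambiguous bases")
    return sum(c in "GC" for c in bases) / len(bases)


if __name__ == "__main__":
    # One transition in ten sites: d = -0.5 * ln(0.8), about 0.1116 (11.16%).
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
    print(gc_content("GGCCAATT"))  # 0.5
```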
Affiliation(s)
- Joana Matzen da Silva
- Molecular Ecology and Fisheries Genetics Laboratory, School of Biological Sciences, Environment Centre for Wales, Bangor University, Bangor, Wales, United Kingdom.
55
Pacheco MA, Battistuzzi FU, Lentino M, Aguilar RF, Kumar S, Escalante AA. Evolution of modern birds revealed by mitogenomics: timing the radiation and origin of major orders. Mol Biol Evol 2011; 28:1927-42. [PMID: 21242529 DOI: 10.1093/molbev/msr014] [Citation(s) in RCA: 149] [Impact Index Per Article: 11.5] [Indexed: 11/13/2022] Open
Abstract
Mitochondrial (mt) genes and genomes are among the major sources of data for evolutionary studies in birds. This places mitogenomic studies in birds at the core of intense debates in avian evolutionary biology. Indeed, complete mt genomes are actively being used to unveil the phylogenetic relationships among major orders, whereas single genes (e.g., cytochrome c oxidase I [COX1]) are considered standard for species identification and defining species boundaries (DNA barcoding). In this investigation, we study the time of origin and evolutionary relationships among Neoaves orders using complete mt genomes. First, we were able to resolve polytomies previously observed at the deep nodes of the Neoaves phylogeny by analyzing 80 mt genomes, including 17 new sequences reported in this investigation. As an example, we found evidence indicating that columbiforms and charadriiforms are sister groups. Overall, our analyses indicate that, with improved taxonomic sampling, complete mt genomes can resolve the evolutionary relationships among major bird groups. Second, we used our phylogenetic hypotheses to estimate the time of origin of major avian orders as a way to test whether their diversification took place prior to the Cretaceous/Tertiary (K/T) boundary. Such timetrees were estimated using several molecular dating approaches and conservative calibration points. Although our time estimates were slightly younger than those reported by others, most of the major orders originated prior to the K/T boundary. Finally, we used our timetrees to estimate the rate of evolution of each mt gene. We found great variation in the mutation rates among mt genes and within different bird groups. COX1 was the gene with the least variation among Neoaves orders and the least rate heterogeneity across lineages. Such findings support the choice of COX1 among mt genes as the target for developing DNA barcoding approaches in birds.
Affiliation(s)
- M Andreína Pacheco
- Center for Evolutionary Medicine and Informatics, The Biodesign Institute, Arizona State University, AZ, USA
56
Sherwood AR, Kurihara A, Conklin KY, Sauvage T, Presting GG. The Hawaiian Rhodophyta Biodiversity Survey (2006-2010): a summary of principal findings. BMC Plant Biol 2010; 10:258. [PMID: 21092229 PMCID: PMC3012605 DOI: 10.1186/1471-2229-10-258] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 07/27/2010] [Accepted: 11/22/2010] [Indexed: 05/25/2023]
Abstract
BACKGROUND The Hawaiian red algal flora is diverse, isolated, and well studied from a morphological and anatomical perspective, making it an excellent candidate for assessment using a combination of traditional taxonomic and molecular approaches. Acquiring and making these biodiversity data freely available in a timely manner ensures that other researchers can incorporate these baseline findings into phylogeographic studies of Hawaiian red algae or red algae found in other locations. RESULTS A total of 1,946 accessions are represented in the collections from 305 different geographical locations in the Hawaiian archipelago. These accessions represent 24 orders, 49 families, 152 genera and 252 species/subspecific taxa of red algae. One order of red algae (the Rhodachlyales) was recognized in Hawaii for the first time and 196 new island distributional records were determined from the survey collections. One family and four genera are reported for the first time from Hawaii, and multiple species descriptions are in progress for newly discovered taxa. A total of 2,418 sequences were generated for Hawaiian red algae in the course of this study--915 for the nuclear LSU marker, 864 for the plastidial UPA marker, and 639 for the mitochondrial COI marker. These baseline molecular data are presented as neighbor-joining trees to illustrate degrees of divergence within and among taxa. The LSU marker was typically most conserved, followed by UPA and COI. Phylogenetic analysis of a set of concatenated LSU, UPA and COI sequences recovered a tree that broadly resembled the current understanding of florideophyte red algal relationships, but bootstrap support was largely absent above the ordinal level. Phylogeographic trends are reported here for some common taxa within the Hawaiian Islands and include examples of those with, as well as without, intraspecific variation. 
CONCLUSIONS The UPA and COI markers were determined to be the most useful of the three and are recommended for inclusion in future algal biodiversity surveys. Molecular data for the survey provide the most extensive assessment of Hawaiian red algal diversity and, in combination with the morphological/anatomical and distributional data collected as part of the project, provide a solid baseline data set for future studies of the flora. The data are freely available via the Hawaiian Algal Database (HADB), which was designed and constructed to accommodate the results of the project. We present the first DNA sequence reference collection for a tropical Pacific seaweed flora, whose value extends beyond Hawaii since many Hawaiian taxa are shared with other tropical areas.
Collapse
Affiliation(s)
- Alison R Sherwood
- Botany Department, 3190 Maile Way, University of Hawaii, Honolulu, HI USA 96822
- Akira Kurihara
- Botany Department, 3190 Maile Way, University of Hawaii, Honolulu, HI USA 96822
- Kobe University Research Center for Inland Seas, 1-1 Rokkodai, Kobe 657-8501 Japan
- Kimberly Y Conklin
- Botany Department, 3190 Maile Way, University of Hawaii, Honolulu, HI USA 96822
- Hawaii Institute of Marine Biology, P.O. Box 1346, Kaneohe, HI USA 96744
- Thomas Sauvage
- Botany Department, 3190 Maile Way, University of Hawaii, Honolulu, HI USA 96822
- Gernot G Presting
- Department of Molecular Biosciences and Bioengineering, 1955 East-West Rd, University of Hawaii, Honolulu, HI USA 96822
57
Abstract
This study reports DNA barcodes for more than 1300 Lepidoptera species from the eastern half of North America, establishing that 99.3 per cent of these species possess diagnostic barcode sequences. Intraspecific divergences averaged just 0.43 per cent among this assemblage, but most values were lower. The mean was elevated by deep barcode divergences (greater than 2%) in 5.1 per cent of the species, often involving the sympatric occurrence of two barcode clusters. A few of these cases have been analysed in detail, revealing species overlooked by the current taxonomic system. This study also provided a large-scale test of the extent of regional divergence in barcode sequences, indicating that geographical differentiation in the Lepidoptera of eastern North America is small, even when comparisons involve populations as much as 2800 km apart. The present results affirm that a highly effective system for the identification of Lepidoptera in this region can be built with few records per species because of the limited intra-specific variation. As most terrestrial and marine taxa are likely to possess a similar pattern of population structure, an effective DNA-based identification system can be developed with modest effort.
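The abstract above summarizes intraspecific divergence as a mean across species and flags 'deep' divergences above 2%. A minimal Python sketch of that bookkeeping follows; it assumes uncorrected p-distances on pre-aligned sequences (the abstract does not name the distance model) and counts a species as deep when its maximum pairwise divergence exceeds the cutoff. Function names and the toy records are hypothetical.

```python
from itertools import combinations
from statistics import mean

VALID = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance; sites with gaps/ambiguity codes are skipped."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in VALID and y in VALID]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def intraspecific_summary(records, deep_cutoff=0.02):
    """Per-species (mean, max, is_deep) of within-species pairwise distances.

    records: iterable of (species, aligned_sequence). Species with a single
    sequence are skipped because they have no within-species pairs.
    """
    by_species = {}
    for species, seq in records:
        by_species.setdefault(species, []).append(seq)
    summary = {}
    for species, seqs in by_species.items():
        if len(seqs) < 2:
            continue
        dists = [p_distance(a, b) for a, b in combinations(seqs, 2)]
        summary[species] = (mean(dists), max(dists), max(dists) > deep_cutoff)
    return summary


if __name__ == "__main__":
    records = [
        ("X", "A" * 100), ("X", "A" * 99 + "G"),    # 1% apart
        ("Y", "C" * 100), ("Y", "C" * 97 + "TTT"),  # 3% apart: deep
        ("Z", "G" * 100),                           # singleton, skipped
    ]
    summary = intraspecific_summary(records)
    deep = [sp for sp, (_, _, is_deep) in summary.items() if is_deep]
    print(deep)  # ['Y']
```

The proportion `len(deep) / len(summary)` corresponds to the kind of figure reported in the abstract (5.1 per cent of species with deep divergences), and the mean of the per-species means to the 0.43 per cent average.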
Affiliation(s)
- Paul D N Hebert
- Biodiversity Institute of Ontario, University of Guelph, Guelph, Ontario, Canada, N1G 2W1.