1
|
Bergmann T. CAOS-R: Character-Based Barcoding. Methods Mol Biol 2024; 2744:347-357. [PMID: 38683330 DOI: 10.1007/978-1-0716-3581-0_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2024]
Abstract
CAOS-Barcoding is a culmination of traditional taxonomy and modern DNA barcoding. CAOS identifies taxa by diagnostic characters as is done in traditional taxonomy and produces an identification matrix for taxon discrimination similar to DNA barcoding distance matrices. Here, I describe how to set up the CAOS-Barcoder and CAOS-Classifier software, which input data is needed, and how to interpret the output data. With the CAOS-Barcoder, single marker or concatenated data can be processed into diagnostic barcodes for taxon discrimination. The CAOS-Classifier can use the diagnostic barcodes for specimen identification.
Affiliation(s)
- Tjard Bergmann
- Division of Ecology and Evolution, University of Veterinary Medicine Hannover, Hannover, Germany.
|
2
|
Barcoding Atlantic Canada's mesopelagic and upper bathypelagic marine fishes. PLoS One 2017; 12:e0185173. [PMID: 28931082 PMCID: PMC5607201 DOI: 10.1371/journal.pone.0185173] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 06/17/2017] [Accepted: 09/07/2017] [Indexed: 12/02/2022]
Abstract
DNA barcode sequences were developed from 557 mesopelagic and upper bathypelagic teleost specimens collected in waters off Atlantic Canada. Confident morphological identifications were available for 366 specimens, of 118 species and 93 genera, which yielded 328 haplotypes. Five of the species were novel to the Barcode of Life Database (BOLD). Most of the 118 species conformed to expectations of monophyly and the presence of a “barcode gap”, though some known weaknesses in existing taxonomy were confirmed and a deficiency in published keys was revealed. Of the specimens for which no firm morphological identification was available, 156 were successfully identified to species, and a further 11 to genus, using their barcode sequences and a combination of distance- and character-based methods. The remaining 24 specimens were from species for which no reference barcode is yet available or else ones confused by apparent misidentification of publicly available sequences in BOLD. Addition of the new sequences to those previously in BOLD contributed support to recent taxonomic revisions of Chiasmodon and Poromitra, while it also revealed 18 cases of potential cryptic speciation. Most of the latter appear to result from genetic divergence among populations in different ocean basins, while the general lack of strong horizontal environmental gradients within the deep sea has allowed morphology to be conserved. Other examples of divergence appear to distinguish individuals living under the sub-tropical gyre of the North Atlantic from those under that ocean’s sub-polar gyre. In contrast, the available sequences for two myctophid species, Benthosema glaciale and Notoscopelus elongatus, showed genetic structuring on finer geographic scales. The observed structure was not consistent with recent suggestions that “resident” populations of myctophids can maintain allopatry despite the mixing of ocean waters. Rather, it indicates that the very rapid speciation characteristic of the Myctophidae is both on-going and detectable using barcodes.
|
3
|
Rach J, Bergmann T, Paknia O, DeSalle R, Schierwater B, Hadrys H. The marker choice: Unexpected resolving power of an unexplored CO1 region for layered DNA barcoding approaches. PLoS One 2017; 12:e0174842. [PMID: 28406914 PMCID: PMC5390999 DOI: 10.1371/journal.pone.0174842] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 10/28/2016] [Accepted: 03/16/2017] [Indexed: 01/13/2023]
Abstract
The potential of DNA barcoding approaches to identify single species and characterize species compositions strongly depends on the marker choice. The prominent “Folmer region”, a 648 basepair fragment at the 5’ end of the mitochondrial CO1 gene, has been traditionally applied as a universal DNA barcoding region for metazoans. In order to find a suitable marker for biomonitoring odonates (dragonflies and damselflies), we here explore a new region of the CO1 gene (CO1B) for DNA barcoding in 51 populations of 23 dragonfly and damselfly species. We compare the “Folmer region”, the mitochondrial ND1 gene (NADH dehydrogenase 1) and the new CO1 region with regard to (i) speed and reproducibility of sequence generation, (ii) levels of homoplasy and (iii) numbers of diagnostic characters for discriminating closely related sister taxa and populations. The performances of the gene regions regarding these criteria were quite different. Both, the amplification of CO1B and ND1 was highly reproducible and CO1B showed the highest potential for discriminating sister taxa at different taxonomic levels. In contrast, the amplification of the “Folmer region” using the universal primers was difficult and the third codon positions of this fragment have experienced nucleotide substitution saturation. Most important, exploring this new barcode region of the CO1 gene identified a higher discriminating power between closely related sister taxa. Together with the design of layered barcode approaches adapted to the specific taxonomic “environment”, this new marker will further enhance the discrimination power at the species level.
Affiliation(s)
- Jessica Rach
- ITZ, Ecology & Evolution, TiHo Hannover, Hannover, D-30559, Germany
- Tjard Bergmann
- ITZ, Ecology & Evolution, TiHo Hannover, Hannover, D-30559, Germany
- Omid Paknia
- ITZ, Ecology & Evolution, TiHo Hannover, Hannover, D-30559, Germany
- Rob DeSalle
- Sackler Institute of Comparative Genomics, American Museum of Natural History, New York, NY 10024, United States of America
- Bernd Schierwater
- ITZ, Ecology & Evolution, TiHo Hannover, Hannover, D-30559, Germany
- Sackler Institute of Comparative Genomics, American Museum of Natural History, New York, NY 10024, United States of America
- Heike Hadrys
- ITZ, Ecology & Evolution, TiHo Hannover, Hannover, D-30559, Germany
- Sackler Institute of Comparative Genomics, American Museum of Natural History, New York, NY 10024, United States of America
|
4
|
González-Castro M, Rosso JJ, Mabragaña E, Díaz de Astarloa JM. Surfing among species, populations and morphotypes: Inferring boundaries between two species of new world silversides (Atherinopsidae). C R Biol 2015; 339:10-23. [PMID: 26705969 DOI: 10.1016/j.crvi.2015.11.004] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 10/10/2015] [Revised: 11/17/2015] [Accepted: 11/18/2015] [Indexed: 11/16/2022]
Abstract
Atherinopsidae are widespread freshwater and shallow marine fish with singular economic importance. Morphological, genetical and life cycles differences between marine and estuarine populations were already reported in this family, suggesting ongoing speciation. Also, coexistence and interbreeding between closely related species were documented. The aim of this study was to infer boundaries among: (A) Odontesthes bonariensis and O. argentinensis at species level, and intermediate morphs; (B) the population of O. argentinensis of Mar Chiquita Lagoon and its marine conspecifics. To achieve this, we integrated, meristic, Geometrics Morphometrics and DNA Barcode approaches. Four groups were discriminated and subsequently characterized according to their morphological traits, shape and meristic characters. No shared haplotypes between O. bonariensis and O. argentinensis were found. Significative-meristic and body shape differences between the Mar Chiquita and marine individuals of O. argentinensis were found, suggesting they behave as well differentiated populations, or even incipient ecological species. The fact that the Odontesthes morphotypes shared haplotypes with both, O. argentinensis and O. bonariensis, but also possess meristic and morphometric distinctive traits open new questions related to the origin of this morphogroup.
Affiliation(s)
- Mariano González-Castro
- Grupo de Biotaxonomía Morfológica y molecular de peces, IIMyC-CONICET, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Juan José Rosso
- Grupo de Biotaxonomía Morfológica y molecular de peces, IIMyC-CONICET, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Ezequiel Mabragaña
- Grupo de Biotaxonomía Morfológica y molecular de peces, IIMyC-CONICET, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Juan Martín Díaz de Astarloa
- Grupo de Biotaxonomía Morfológica y molecular de peces, IIMyC-CONICET, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
|
5
|
Chen W, Ma X, Shen Y, Mao Y, He S. The fish diversity in the upper reaches of the Salween River, Nujiang River, revealed by DNA barcoding. Sci Rep 2015; 5:17437. [PMID: 26616046 PMCID: PMC4663501 DOI: 10.1038/srep17437] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 03/13/2015] [Accepted: 10/29/2015] [Indexed: 11/09/2022]
Abstract
Nujiang River (NR), an essential component of the biodiversity hotspot of the Mountains of Southwest China, possesses a characteristic fish fauna and contains endemic species. Although previous studies on fish diversity in the NR have primarily consisted of listings of the fish species observed during field collections, in our study, we DNA-barcoded 1139 specimens belonging to 46 morphologically distinct fish species distributed throughout the NR basin by employing multiple analytical approaches. According to our analyses, DNA barcoding is an efficient method for the identification of fish by the presence of barcode gaps. However, three invasive species are characterized by deep conspecific divergences, generating multiple lineages and Operational Taxonomic Units (OTUs), implying the possibility of cryptic species. At the other end of the spectrum, ten species (from three genera) that are characterized by an overlap between their intra- and interspecific genetic distances form a single genetic cluster and share haplotypes. The neighbor-joining phenogram, Barcode Index Numbers (BINs) and Automatic Barcode Gap Discovery (ABGD) identified 43 putative species, while the General Mixed Yule-coalescence (GMYC) identified five more OTUs. Thus, our study established a reliable DNA barcode reference library for the fish in the NR and sheds new light on the local fish diversity.
Affiliation(s)
- Weitao Chen
- The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei, 430072, China; Graduate School of Chinese Academy of Sciences, Beijing, 10001, China
- Xiuhui Ma
- School of Life Science, Southwest University, Beibei, Chongqing, 400715, China
- Yanjun Shen
- The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei, 430072, China; Graduate School of Chinese Academy of Sciences, Beijing, 10001, China
- Yuntao Mao
- The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei, 430072, China; Graduate School of Chinese Academy of Sciences, Beijing, 10001, China
- Shunping He
- The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei, 430072, China
|
6
|
Li L, Ji G, Ye C, Shu C, Zhang J, Liang C. PlantOrDB: a genome-wide ortholog database for land plants and green algae. BMC Plant Biol 2015; 15:161. [PMID: 26112452 PMCID: PMC4481079 DOI: 10.1186/s12870-015-0531-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/01/2015] [Accepted: 05/21/2015] [Indexed: 05/07/2023]
Abstract
BACKGROUND: Genes with different functions are originally generated from some ancestral genes by gene duplication, mutation and functional recombination. It is widely accepted that orthologs are homologous genes that evolved from speciation events, while paralogs are homologous genes that resulted from gene duplication events. With the rapid increase of genomic data, identifying and distinguishing these genes among different species is becoming an important part of functional genomics research. DESCRIPTION: Using 35 plant and 6 green algal genomes from Phytozome v9, we clustered 1,291,670 peptide sequences into 49,355 homologous gene families in terms of sequence similarity. For each gene family, we have generated a peptide sequence alignment and phylogenetic tree, and identified the speciation/duplication events for every node within the tree. For each node, we also identified and highlighted diagnostic characters that facilitate appropriate addition of a new query sequence into the existing phylogenetic tree and sequence alignment of its best matched gene family. Based on a desired species or subgroup of all species, users can view the phylogenetic tree, sequence alignment and diagnostic characters for a given gene family selectively. PlantOrDB not only allows users to identify orthologs or paralogs from phylogenetic trees, but also provides all orthologs that are built using the Reciprocal Best Hit (RBH) pairwise alignment method. Users can upload their own sequences to find the best matched gene families, and visualize their query sequences within the relevant phylogenetic trees and sequence alignments. CONCLUSION: PlantOrDB ( http://bioinfolab.miamioh.edu/plantordb ) is a genome-wide ortholog database for land plants and green algae. PlantOrDB offers highly interactive visualization, accurate query classification and powerful search functions useful for functional genomic research.
Affiliation(s)
- Lei Li
- Department of Automation, Xiamen University, Fujian, 361005, China.
- Department of Biology, Miami University, Oxford, OH, 45056, USA.
- Guoli Ji
- Department of Automation, Xiamen University, Fujian, 361005, China.
- Innovation Center for Cell Signaling Network, Xiamen University, Xiamen, Fujian, 361005, China.
- Congting Ye
- Department of Automation, Xiamen University, Fujian, 361005, China.
- Department of Biology, Miami University, Oxford, OH, 45056, USA.
- Changlong Shu
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China.
- Jie Zhang
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China.
- Chun Liang
- Department of Biology, Miami University, Oxford, OH, 45056, USA.
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, 100193, China.
|
7
|
Abstract
Accurate identification of unknown specimens by means of DNA barcoding is contingent on the presence of a DNA barcoding gap, among other factors, as its absence may result in dubious specimen identifications - false negatives or positives. Whereas the utility of DNA barcoding would be greatly reduced in the absence of a distinct and sufficiently sized barcoding gap, the limits of intraspecific and interspecific distances are seldom thoroughly inspected across comprehensive sampling. The present study aims to illuminate this aspect of barcoding in a comprehensive manner for the animal phylum Annelida. All cytochrome c oxidase subunit I sequences (cox1 gene; the chosen region for zoological DNA barcoding) present in GenBank for Annelida, as well as for "Polychaeta", "Oligochaeta", and Hirudinea separately, were downloaded and curated for length, coverage and potential contaminations. The final datasets consisted of 9782 (Annelida), 5545 ("Polychaeta"), 3639 ("Oligochaeta"), and 598 (Hirudinea) cox1 sequences and these were either (i) used as is in an automated global barcoding gap detection analysis or (ii) further analyzed for genetic distances, separated into bins containing intraspecific and interspecific comparisons and plotted in a graph to visualize any potential global barcoding gap. Over 70 million pairwise genetic comparisons were made and results suggest that although there is a tendency towards separation, no distinct or sufficiently sized global barcoding gap exists in either of the datasets rendering future barcoding efforts at risk of erroneous specimen identifications (but local barcoding gaps may still exist allowing for the identification of specimens at lower taxonomic ranks). This seems to be especially true for earthworm taxa, which account for fully 35% of the total number of interspecific comparisons that show 0% divergence.
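The separation of pairwise distances into intraspecific and interspecific bins described in this abstract can be illustrated with a minimal Python sketch. It assumes pre-aligned sequences of equal length and uses an uncorrected p-distance; the abstract does not state which distance model the study used. The function names `p_distance` and `barcoding_gap` are hypothetical and are not taken from the study.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, skipping sites with a gap or N in either sequence."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in "-N" and y not in "-N"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def barcoding_gap(records):
    """records: list of (species, aligned_sequence) tuples.

    Returns (max_intra, min_inter, gap), where gap = min_inter - max_intra.
    A positive gap means the two bins do not overlap in this data set.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        # Same species name -> intraspecific bin, otherwise interspecific bin.
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not intra or not inter:
        raise ValueError("need both intraspecific and interspecific comparisons")
    max_intra, min_inter = max(intra), min(inter)
    return max_intra, min_inter, min_inter - max_intra


# Toy data, invented for illustration only.
toy = [
    ("sp_A", "AAAAAAAAAA"),
    ("sp_A", "AAAAAAAAAT"),
    ("sp_B", "CCCCCAAAAA"),
    ("sp_B", "CCCCCAAAAA"),
]
max_intra, min_inter, gap = barcoding_gap(toy)
print(max_intra, min_inter, gap)
```

A zero or negative `gap` corresponds to the overlap situation the abstract reports at the global scale. A full analysis would inspect the whole distribution of both bins rather than only their extremes.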
Affiliation(s)
- Sebastian Kvist
- Museum of Comparative Zoology, Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
|
8
|
Fish pathogens near the Arctic Circle: molecular, morphological and ecological evidence for unexpected diversity of Diplostomum (Digenea: Diplostomidae) in Iceland. Int J Parasitol 2014; 44:703-15. [DOI: 10.1016/j.ijpara.2014.04.009] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Received: 01/01/2014] [Revised: 03/21/2014] [Accepted: 04/16/2014] [Indexed: 11/30/2022]
|
9
|
Kvist S. Barcoding in the dark?: A critical view of the sufficiency of zoological DNA barcoding databases and a plea for broader integration of taxonomic knowledge. Mol Phylogenet Evol 2013; 69:39-45. [DOI: 10.1016/j.ympev.2013.05.012] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Received: 12/11/2012] [Revised: 05/14/2013] [Accepted: 05/16/2013] [Indexed: 12/16/2022]
|
10
|
Wong EHK, Shivji MS, Hanner RH. Identifying sharks with DNA barcodes: assessing the utility of a nucleotide diagnostic approach. Mol Ecol Resour 2009; 9 Suppl s1:243-56. [PMID: 21564984 DOI: 10.1111/j.1755-0998.2009.02653.x] [Citation(s) in RCA: 103] [Impact Index Per Article: 9.4] [Indexed: 01/18/2023]
Abstract
Shark fisheries worldwide are mostly unmanaged, but the burgeoning shark fin industry in the last few decades has made monitoring catch and trade of these animals critical. As a tool for molecular species identification, DNA barcoding offers significant potential. However, the genetic distance-based approach towards species identification employed by the Barcode of Life Data Systems may oftentimes lack the specificity needed for regulatory or legal applications that require unambiguous identification results. This is because such specificity is not typically realized by anything less than a 100% match of the query sequence to an entry in the reference database using genetic distance. Although various divergence thresholds have been proposed to define acceptable levels of intraspecific variation, enough exceptions exist to cast reasonable doubt on many less than exact matches using a distance-based approach for the identification of unknowns. An alternative approach relies on the identification of discrete molecular characters that can be used to unambiguously diagnose species. The objective of this study was to assess the performance differences between these competing approaches by examining more than 1000 DNA barcodes representing nearly 20% of all known elasmobranch species. Our results demonstrate that a character-based, nucleotide diagnostic (ND) approach to barcode identification is feasible and also provides novel insights into the structure of haplotype diversity among closely related species of sharks. Considerations for the use of NDs in applied fields are also explored.
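The character-based idea in this abstract, finding discrete nucleotide states that diagnose a species, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the ND procedure used in the study: it assumes a pre-computed alignment and reports only "pure" single-site diagnostics, meaning a base fixed in one taxon and absent from all others at that site. The function name `diagnostic_positions` is hypothetical.

```python
from collections import defaultdict


def diagnostic_positions(aligned):
    """aligned: list of (taxon, sequence) tuples, all sequences the same length.

    Returns {taxon: {position: base}} with 0-based positions where every
    sequence of the taxon carries the same base and no other taxon shows it.
    """
    by_taxon = defaultdict(list)
    for taxon, seq in aligned:
        by_taxon[taxon].append(seq.upper())

    length = len(aligned[0][1])
    result = {taxon: {} for taxon in by_taxon}
    for pos in range(length):
        # Set of bases observed at this site, per taxon.
        states = {t: {s[pos] for s in seqs} for t, seqs in by_taxon.items()}
        for taxon, bases in states.items():
            if len(bases) != 1:
                continue  # polymorphic within the taxon: not a pure diagnostic
            base = next(iter(bases))
            if base in "-N?":
                continue  # gaps and ambiguity codes are not treated as characters
            others = set().union(*(b for t, b in states.items() if t != taxon))
            if base not in others:
                result[taxon][pos] = base
    return result


# Toy alignment, invented for illustration only.
toy = [
    ("sp_A", "ACGTAC"),
    ("sp_A", "ACGTAT"),
    ("sp_B", "ACCTAC"),
    ("sp_B", "ACCTAC"),
    ("sp_C", "TCGTAC"),
]
print(diagnostic_positions(toy))
```

In this toy set, `sp_A` has no pure single-site diagnostic, which is the kind of case where combinations of sites would be needed. The sketch does not search for such combinations.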
Affiliation(s)
- Eugene H-K Wong
- Department of Integrative Biology, University of Guelph, 50 Stone Road East, Guelph, ON, Canada N1G 2W1, Guy Harvey Research Institute and Save Our Seas Shark Center, Nova Southeastern University, 8000 North Ocean Drive, Dania Beach, FL 33004, USA
|
11
|
Bergmann T, Rach J, Damm S, DeSalle R, Schierwater B, Hadrys H. The potential of distance-based thresholds and character-based DNA barcoding for defining problematic taxonomic entities by CO1 and ND1. Mol Ecol Resour 2013; 13:1069-81. [DOI: 10.1111/1755-0998.12125] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 07/06/2012] [Accepted: 04/09/2013] [Indexed: 11/28/2022]
Affiliation(s)
- T. Bergmann
- ITZ Ecology & Evolution; TiHo Hannover; Bünteweg 17d; D-30559; Hannover; Germany
- J. Rach
- ITZ Ecology & Evolution; TiHo Hannover; Bünteweg 17d; D-30559; Hannover; Germany
- S. Damm
- ITZ Ecology & Evolution; TiHo Hannover; Bünteweg 17d; D-30559; Hannover; Germany
- R. DeSalle
- American Museum of Natural History; The Sackler Institute for Comparative Genomics; New York; NY; 10024; USA
|
12
|
van Velzen R, Weitschek E, Felici G, Bakker FT. DNA barcoding of recently diverged species: relative performance of matching methods. PLoS One 2012; 7:e30490. [PMID: 22272356 PMCID: PMC3260286 DOI: 10.1371/journal.pone.0030490] [Citation(s) in RCA: 124] [Impact Index Per Article: 10.3] [Received: 06/22/2011] [Accepted: 12/22/2011] [Indexed: 12/23/2022]
Abstract
Recently diverged species are challenging for identification, yet they are frequently of special interest scientifically as well as from a regulatory perspective. DNA barcoding has proven instrumental in species identification, especially in insects and vertebrates, but for the identification of recently diverged species it has been reported to be problematic in some cases. Problems are mostly due to incomplete lineage sorting or simply lack of a 'barcode gap' and probably related to large effective population size and/or low mutation rate. Our objective was to compare six methods in their ability to correctly identify recently diverged species with DNA barcodes: neighbor joining and parsimony (both tree-based), nearest neighbor and BLAST (similarity-based), and the diagnostic methods DNA-BAR, and BLOG. We analyzed simulated data assuming three different effective population sizes as well as three selected empirical data sets from published studies. Results show, as expected, that success rates are significantly lower for recently diverged species (∼75%) than for older species (∼97%) (P<0.00001). Similarity-based and diagnostic methods significantly outperform tree-based methods, when applied to simulated DNA barcode data (P<0.00001). The diagnostic method BLOG had highest correct query identification rate based on simulated (86.2%) as well as empirical data (93.1%), indicating that it is a consistently better method overall. Another advantage of BLOG is that it offers species-level information that can be used outside the realm of DNA barcoding, for instance in species description or molecular detection assays. Even though we can confirm that identification success based on DNA barcoding is generally high in our data, recently diverged species remain difficult to identify. Nevertheless, our results contribute to improved solutions for their accurate identification.
Affiliation(s)
- Robin van Velzen
- Biosystematics Group, Wageningen University, Wageningen, The Netherlands.
|
13
|
Little DP. DNA barcode sequence identification incorporating taxonomic hierarchy and within taxon variability. PLoS One 2011; 6:e20552. [PMID: 21857897 PMCID: PMC3156709 DOI: 10.1371/journal.pone.0020552] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.5] [Received: 11/18/2010] [Accepted: 05/04/2011] [Indexed: 11/19/2022]
Abstract
For DNA barcoding to succeed as a scientific endeavor an accurate and expeditious query sequence identification method is needed. Although a global multiple-sequence alignment can be generated for some barcoding markers (e.g. COI, rbcL), not all barcoding markers are as structurally conserved (e.g. matK). Thus, algorithms that depend on global multiple-sequence alignments are not universally applicable. Some sequence identification methods that use local pairwise alignments (e.g. BLAST) are unable to accurately differentiate between highly similar sequences and are not designed to cope with hierarchic phylogenetic relationships or within taxon variability. Here, I present a novel alignment-free sequence identification algorithm--BRONX--that accounts for observed within taxon variability and hierarchic relationships among taxa. BRONX identifies short variable segments and corresponding invariant flanking regions in reference sequences. These flanking regions are used to score variable regions in the query sequence without the production of a global multiple-sequence alignment. By incorporating observed within taxon variability into the scoring procedure, misidentifications arising from shared alleles/haplotypes are minimized. An explicit treatment of more inclusive terminals allows for separate identifications to be made for each taxonomic level and/or for user-defined terminals. BRONX performs better than all other methods when there is imperfect overlap between query and reference sequences (e.g. mini-barcode queries against a full-length barcode database). BRONX consistently produced better identifications at the genus-level for all query types.
Affiliation(s)
- Damon P Little
- Lewis B. and Dorothy Cullman Program for Molecular Systematics, The New York Botanical Garden, Bronx, New York, United States of America.
|
14
|
Oceguera-Figueroa A, León-Règagnon V, Siddall ME. DNA barcoding reveals Mexican diversity within the freshwater leech genus Helobdella (Annelida: Glossiphoniidae). Mitochondrial DNA 2011; 21 Suppl 1:24-9. [PMID: 21271855 DOI: 10.3109/19401736.2010.527965] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
We investigated the genetic distances and taxonomic status among species of Helobdella, a genus of non-blood-feeding leeches, based on mitochondrial cytochrome c oxidase subunit I sequences. Sampling included 20 specimens representing nine nominal species collected in 11 states in Mexico as well as previously published sequences of different species of Helobdella from several places. A neighbor-joining tree, as well as identification of diagnostic nucleotides, was used to suggest the presence of seven species of Helobdella in Mexico including potentially two undescribed forms.
|
15
|
Abstract
More than 230,000 known species representing 31 metazoan phyla populate the world's oceans. Perhaps another 1,000,000 or more species remain to be discovered. There is reason for concern that species extinctions may out-pace discovery, especially in diverse and endangered marine habitats such as coral reefs. DNA barcodes (i.e., short DNA sequences for species recognition and discrimination) are useful tools to accelerate species-level analysis of marine biodiversity and to facilitate conservation efforts. This review focuses on the usual barcode region for metazoans: an approximately 648 base-pair region of the mitochondrial cytochrome c oxidase subunit I (COI) gene. Barcodes have also been used for population genetic and phylogeographic analysis, identification of prey in gut contents, detection of invasive species, forensics, and seafood safety. More controversially, barcodes have been used to delimit species boundaries, reveal cryptic species, and discover new species. Emerging frontiers are the use of barcodes for rapid and increasingly automated biodiversity assessment by high-throughput sequencing, including environmental barcoding and the use of barcodes to detect species for which formal identification or scientific naming may never be possible.
Affiliation(s)
- Ann Bucklin
- Department of Marine Sciences, University of Connecticut, Groton, Connecticut 06340, USA.
|
16
|
Goldstein PZ, DeSalle R. Integrating DNA barcode data and taxonomic practice: Determination, discovery, and description. Bioessays 2010; 33:135-47. [DOI: 10.1002/bies.201000036] [Citation(s) in RCA: 242] [Impact Index Per Article: 17.3] [Indexed: 02/01/2023]
|
17
|
Damm S, Schierwater B, Hadrys H. An integrative approach to species discovery in odonates: from character-based DNA barcoding to ecology. Mol Ecol 2010; 19:3881-93. [PMID: 20701681 DOI: 10.1111/j.1365-294x.2010.04720.x] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Indexed: 11/28/2022]
Abstract
Modern taxonomy requires an analytical approach incorporating all lines of evidence into decision-making. Such an approach can enhance both species identification and species discovery. The character-based DNA barcode method provides a molecular data set that can be incorporated into classical taxonomic data such that the discovery of new species can be made in an analytical framework that includes multiple sources of data. We here illustrate such a corroborative framework in a dragonfly model system that permits the discovery of two new, but visually cryptic species. In the African dragonfly genus Trithemis three distinct genetic clusters can be detected which could not be identified by using classical taxonomic characters. In order to test the hypothesis of two new species, DNA-barcodes from different sequence markers (ND1 and COI) were combined with morphological, ecological and biogeographic data sets. Phylogenetic analyses and incorporation of all data sets into a scheme called taxonomic circle highly supports the hypothesis of two new species. Our case study suggests an analytical approach to modern taxonomy that integrates data sets from different disciplines, thereby increasing the ease and reliability of both species discovery and species assignment.
Affiliation(s)
- Sandra Damm
- ITZ, Ecology & Evolution, TiHo Hannover, Hannover, Germany.
|
18
|
Kvist S, Sarkar IN, Erséus C. Genetic variation and phylogeny of the cosmopolitan marine genus Tubificoides (Annelida: Clitellata: Naididae: Tubificinae). Mol Phylogenet Evol 2010; 57:687-702. [PMID: 20801225 DOI: 10.1016/j.ympev.2010.08.018] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 02/15/2010] [Revised: 08/16/2010] [Accepted: 08/17/2010] [Indexed: 11/25/2022]
Abstract
Prior attempts to resolve the phylogenetic relationships of the cosmopolitan, marine clitellate genus Tubificoides, using only morphology, resulted in unresolved trees. In this study, three mitochondrial and three nuclear loci (5912 aligned sites) were analyzed, representing 14 morphologically separate species. Genetic distances within and between these forms on the basis of the mitochondrial genes (COI, 16S and 12S) revealed that 18 distinct mitochondrial lineages were represented in the data set. After analyzing also nuclear data (28S, 18S and ITS) we conclude that 17 separately evolving lineages (i.e., phylogenetic species) were represented, including three new, cryptic species closely related to T. pseudogaster, T. amplivasatus and T. insularis, respectively. Special emphasis was put on the DNA barcoding gene (COI), which was subject to haplotype diversity analysis and, for four species, diagnostic position (as determined by the Characteristic Attribute Organization System [CAOS]) screening. Typically, the intralineage variation was 1-2 orders of magnitude smaller than the interlineage divergence, making COI useful for identification of species within Tubificoides. The genetic data corroborate that many of the morphospecies are coherent but widely distributed metapopulations. Monophyly of the genus is supported and the evolutionary history of parts of the genus is revealed by phylogenetic analysis of the combined data set. A northern hemisphere origin of the genus is suggested, and most of the widely distributed species are members of one particular clade. Two morphological characters previously emphasized in Tubificoides taxonomy (hair chaetae and cuticular papillation) were optimized on the phylogenetic tree, revealing considerable homoplasy, belying the utility of these features as phylogenetic markers.
Affiliation(s)
- Sebastian Kvist
- Department of Zoology, University of Gothenburg, Box 463, SE-405 30 Göteborg, Sweden.
|
19
|
Pettengill JB, Neel MC. An evaluation of candidate plant DNA barcodes and assignment methods in diagnosing 29 species in the genus Agalinis (Orobanchaceae). Am J Bot 2010; 97:1391-406. [PMID: 21616891 DOI: 10.3732/ajb.0900176] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Indexed: 05/16/2023]
Abstract
PREMISE OF THE STUDY: DNA barcoding has been proposed as a useful technique within many disciplines (e.g., conservation biology and forensics) for determining the taxonomic identity of a sample based on nucleotide similarity to samples of known taxonomy. Application of DNA barcoding to plants has primarily focused on evaluating the success of candidate barcodes across a broad spectrum of evolutionary divergence. Less attention has been paid to evaluating performance when distinguishing congeners or to differential success of analytical techniques despite the fact that the practical application and utility of barcoding hinges on the ability to distinguish closely related species. METHODS: We tested the ability to distinguish among 92 samples representing 29 putative species in the genus Agalinis (Orobanchaceae) using 13 candidate barcodes and three analytical methods (i.e., threshold genetic distances, hierarchical tree-based, and diagnostic character differences). Due to questions regarding evolutionary distinctiveness of some taxa, we evaluated success under two taxonomic hypotheses. KEY RESULTS: The psbA-trnH and trnT-trnL barcodes in conjunction with the "best close match" distance-based method best met the objectives of DNA barcoding. Success was also a function of the taxonomy used. CONCLUSIONS: In addition to accurately identifying query sequences, our results showed that DNA barcoding is useful for detecting taxonomic uncertainty; determining whether erroneous taxonomy or incomplete lineage sorting is the cause requires additional information provided by traditional taxonomic approaches. The magnitude of differentiation within and among the Agalinis species sampled suggests that our results inform how DNA barcoding will perform among closely related species in other genera.
Affiliation(s)
- James B Pettengill
- Behavior, Ecology, Evolution, and Systematics Graduate Program, 2174 Plant Sciences Building, University of Maryland College Park, College Park, Maryland 20742
|
20
|
Genetic identification of Southern Ocean octopod samples using mtCOI. C R Biol 2010; 333:395-404. [PMID: 20451881 DOI: 10.1016/j.crvi.2010.02.002] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 01/11/2010] [Accepted: 02/01/2010] [Indexed: 11/23/2022]
Abstract
East Antarctic octopods were identified by sequencing mtCOI and using four analytical approaches: neighbor-joining on Kimura 2-parameter distances, character-based analysis, BLAST, and Bayesian inference of phylogeny. Although the distance-based analytical approaches identified a high proportion of the sequences (99.5% to genus and 88.1% to species level), these results are undermined by the absence of a clear gap between intra- and interspecific variation. The character-based approach gave highly conflicting results compared to the distance-based methods and failed to identify apomorphic characters for many of the species. While a DNA-independent approach is necessary for validation of the method comparisons, crude morphological observations give early support to the distance-based results and indicate extensive range expansions of several species compared to previous studies. Furthermore, distance-based phylogenetic methods nevertheless group specimens into plausible species clades that are highly useful in non-taxonomical or non-systematic studies.
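For readers unfamiliar with the distance measure named in this abstract, the following is a minimal sketch of the standard Kimura 2-parameter formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions between two aligned sequences. The function name `k2p_distance` and the handling of gaps and ambiguous bases (such sites are simply skipped) are assumptions made for illustration; this is not the pipeline used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term_1 = 1 - 2 * p - q
    term_2 = 1 - 2 * q
    if term_1 <= 0 or term_2 <= 0:
        return float("inf")       # sequences too divergent for the formula
    return -0.5 * math.log(term_1 * math.sqrt(term_2))

# One transition in ten sites: P = 0.1, Q = 0
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

A "barcode gap" in this setting means that the largest such distance within a species is clearly smaller than the smallest distance between species, which is the condition the abstract reports as missing.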
|
21
|
Casiraghi M, Labra M, Ferri E, Galimberti A, De Mattia F. DNA barcoding: a six-question tour to improve users' awareness about the method. Brief Bioinform 2010; 11:440-53. [PMID: 20156987 DOI: 10.1093/bib/bbq003] [Citation(s) in RCA: 103] [Impact Index Per Article: 7.4] [Indexed: 11/12/2022] [Open access]
Abstract
DNA barcoding is a recent and widely used molecular-based identification system that aims to identify biological specimens and to assign them to a given species. However, DNA barcoding is even more than this, and besides many practical uses, it can be considered the core of an integrated taxonomic system, where bioinformatics plays a key role. DNA barcoding data could be interpreted in different ways depending on the examined taxa, but the technique relies on standardized approaches, methods and analyses. The existing reference towards a common way to treat DNA barcoding data, analyses and results is the Barcode of Life Data Systems. However, the scientific community has produced in recent years a number of alternative methods to manage barcoding data. The present work starts from this point, because users should be aware of the consequences their choices produce on the results. Despite the fact that a strict standardization is the essence of DNA barcoding, we propose a tour of six questions to improve the users' awareness about the method, the correct use of concepts and alternative tools provided by the scientific community.
Affiliation(s)
- Maurizio Casiraghi
- ZooPlantLab, Dipartimento di Biotecnologie e Bioscienze, Università degli Studi di Milano Bicocca, Piazza della Scienza 2 - 20126, Milan, Italy.
|
22
|
Naro-Maciel E, Reid B, FitzSimmons NN, Le M, DeSalle R, Amato G. DNA barcodes for globally threatened marine turtles: a registry approach to documenting biodiversity. Mol Ecol Resour 2010; 10:252-63. [DOI: 10.1111/j.1755-0998.2009.02747.x] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Indexed: 11/29/2022]
Affiliation(s)
- Eugenia Naro-Maciel
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
- Center for Biodiversity and Conservation, American Museum of Natural History, New York, NY 10024, USA
- Brendan Reid
- Department of Ecology, Evolution and Environmental Biology, Columbia University, New York, NY 10027, USA
- Nancy N. FitzSimmons
- Institute for Applied Ecology, University of Canberra, Canberra, ACT 2601, Australia
- Minh Le
- Center for Natural Resources and Environmental Studies, Vietnam National University, 19 Le Thanh Tong St., Hanoi, Vietnam
- Department of Herpetology, American Museum of Natural History, New York, NY 10024, USA
- Rob DeSalle
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
- George Amato
- Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, NY 10024, USA
|
23
|
Austerlitz F, David O, Schaeffer B, Bleakley K, Olteanu M, Leblois R, Veuille M, Laredo C. DNA barcode analysis: a comparison of phylogenetic and statistical classification methods. BMC Bioinformatics 2009; 10 Suppl 14:S10. [PMID: 19900297 PMCID: PMC2775147 DOI: 10.1186/1471-2105-10-s14-s10] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.0] [Indexed: 11/21/2022] [Open access]
Abstract
Background: DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential transpecific polymorphism. In this context, we examine several assignation methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species. Results: No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter most influencing the performance of the various methods was molecular diversity of the data. Addition of genetically independent loci - nuclear genes - improved the predictive performance of most methods. Conclusion: The study implies that taxonomists can influence the quality of their analyses either by choosing a method best-adapted to the configuration of their sample, or, given a certain method, increasing the sample size or altering the amount of molecular diversity. This can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. In the latter case, they may also have to modify their data analysis method.
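The "one nearest neighbour" rule singled out in this abstract can be sketched in a few lines: a query barcode receives the species label of the most similar reference sequence. The sketch below assumes pre-aligned sequences and uses the uncorrected p-distance (proportion of differing sites); both the distance choice and the names `p_distance` and `nearest_neighbour` are illustrative assumptions, not the authors' implementation.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring gaps ('-') and 'N'."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            differing += 1
    # With no comparable sites the sequences are treated as maximally distant.
    return differing / compared if compared else float("inf")

def nearest_neighbour(query, references):
    """Return the species label of the closest reference sequence.

    references: list of (species_label, aligned_sequence) pairs.
    Ties are resolved in favour of the earliest reference in the list.
    """
    label, _ = min(references, key=lambda ref: p_distance(query, ref[1]))
    return label

# Hypothetical toy reference library
references = [
    ("sp_A", "ACGTACGT"),
    ("sp_A", "ACGTACGA"),
    ("sp_B", "TCGAACCT"),
]
print(nearest_neighbour("ACGTACCT", references))  # sp_A
```

Because the rule has no parameters to tune, its behaviour depends only on the reference library and the distance measure, which is consistent with the robustness the abstract reports.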
Affiliation(s)
- Frederic Austerlitz
- CNRS, Laboratoire Ecologie Systématique et Evolution, UMR 8079, Orsay, F-91405, France.
|
24
|
Abstract
Background: According to many field experts, specimen classification based on morphological keys needs to be supported with automated techniques based on the analysis of DNA fragments. The most successful results in this area are those obtained from a particular fragment of mitochondrial DNA, the gene cytochrome c oxidase I (COI) (the "barcode"). Since 2004 the Consortium for the Barcode of Life (CBOL) has promoted the collection of barcode specimens and the development of methods to analyze the barcode for several tasks, among which is the identification of rules to correctly classify an individual into its species by reading its barcode. Results: We adopt a Logic Mining method based on two optimization models and present the results obtained on two datasets where a number of COI fragments are used to describe the individuals that belong to different species. The method proposed exhibits high correct recognition rates on a training-testing split of the available data using a small proportion of the information available (e.g., correct recognition approx. 97% when only 20 sites of the 648 available are used). The method is able to provide compact formulas on the values (A, C, G, T) at the selected sites that synthesize the characteristics of each species, which is relevant information for taxonomists. Conclusion: We have presented a Logic Mining technique designed to analyze barcode data and to provide detailed output of interest to the taxonomists and the barcode community represented in the CBOL Consortium. The method has proven to be effective, efficient and precise.
Affiliation(s)
- Paola Bertolazzi
- Istituto di Analisi dei Sistemi e Informatica Antonio Ruberti, Consiglio Nazionale delle Ricerche, Viale Manzoni 30, 00185, Rome, Italy.
|
25
|
Lee ES, Son DS, Kim SH, Lee J, Jo J, Han J, Kim H, Lee HJ, Choi HY, Jung Y, Park M, Lim YS, Kim K, Shim Y, Kim BC, Lee K, Huh N, Ko C, Park K, Lee JW, Choi YS, Kim J. Prediction of recurrence-free survival in postoperative non-small cell lung cancer patients by using an integrated model of clinical information and gene expression. Clin Cancer Res 2009; 14:7397-404. [PMID: 19010856 DOI: 10.1158/1078-0432.ccr-07-4937] [Citation(s) in RCA: 205] [Impact Index Per Article: 13.7] [Indexed: 11/16/2022]
Abstract
PURPOSE: One of the main challenges of lung cancer research is identifying patients at high risk for recurrence after surgical resection. Simple, accurate, and reproducible methods of evaluating individual risks of recurrence are needed. EXPERIMENTAL DESIGN: Based on a combined analysis of time-to-recurrence data, censoring information, and microarray data from a set of 138 patients, we selected statistically significant genes thought to be predictive of disease recurrence. The number of genes was further reduced by eliminating those whose expression levels were not reproducible by real-time quantitative PCR. Within these variables, a recurrence prediction model was constructed using Cox proportional hazard regression and validated via two independent cohorts (n = 56 and n = 59). RESULTS: After performing a log-rank test of the microarray data and successively selecting genes based on real-time quantitative PCR analysis, the most significant 18 genes had P values of <0.05. After subsequent stepwise variable selection based on gene expression information and clinical variables, the recurrence prediction model consisted of six genes (CALB1, MMP7, SLC1A7, GSTA1, CCL19, and IFI44). Two pathologic variables, pStage and cellular differentiation, were developed. Validation by two independent cohorts confirmed that the proposed model is significantly accurate (P = 0.0314 and 0.0305, respectively). The predicted median recurrence-free survival times for each patient correlated well with the actual data. CONCLUSIONS: We have developed an accurate, technically simple, and reproducible method for predicting individual recurrence risks. This model would potentially be useful in developing customized strategies for managing lung cancer.
Affiliation(s)
- Eung-Sirk Lee
- Cancer Research Center, Center for Clinical Research, Samsung Biomedical Research Institute, Seoul, South Korea
|
26
|
Abstract
OrthologID (http://nypg.bio.nyu.edu/orthologid/) allows for the rapid and accurate identification of gene orthology within a character-based phylogenetic framework. The Web application has two functions - an orthologous group search and a query orthology classification. The former determines orthologous gene sets for complete genomes and identifies diagnostic characters that define each orthologous gene set; and the latter allows for the classification of unknown query sequences to orthology groups. The first module of the Web application, the gene family generator, uses an E-value based approach to sort genes into gene families. An alignment constructor then aligns members of gene families and the resulting gene family alignments are submitted to the tree builder to obtain gene family guide trees. Finally, the diagnostics generator extracts diagnostic characters from guide trees and these diagnostics are used to determine gene orthology for query sequences.
Affiliation(s)
- Mary Egan
- Department of Biology, Montclair State University, Montclair, NJ, USA
|
27
|
Four years of DNA barcoding: Current advances and prospects. Infect Genet Evol 2008; 8:727-36. [DOI: 10.1016/j.meegid.2008.05.005] [Citation(s) in RCA: 240] [Impact Index Per Article: 15.0] [Received: 01/09/2008] [Revised: 05/23/2008] [Accepted: 05/27/2008] [Indexed: 11/21/2022]
|
28
|
Abstract
The success of character-based DNA barcoding depends on the efficient identification of diagnostic character states from molecular sequences that have been organized hierarchically (e.g. according to phylogenetic methods). Similarly, the reliability of these identified diagnostic character states must be assessed according to their ability to diagnose new sequences. Here, a set of software tools is presented that implement the previously described Characteristic Attribute Organization System for both diagnostic identification and diagnostic-based classification. The software is publicly available from http://sarkarlab.mbl.edu/CAOS.
Affiliation(s)
- Indra Neil Sarkar
- MBLWHOI Library, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA, Sackler Institute for Comparative Genomics, American Museum of Natural History, 79th Street at CPW, New York, NY 10024, USA
|
29
|
Rach J, DeSalle R, Sarkar I, Schierwater B, Hadrys H. Character-based DNA barcoding allows discrimination of genera, species and populations in Odonata. Proc Biol Sci 2008; 275:237-47. [PMID: 17999953 PMCID: PMC2212734 DOI: 10.1098/rspb.2007.1290] [Citation(s) in RCA: 189] [Impact Index Per Article: 11.8] [Received: 09/19/2007] [Revised: 10/17/2007] [Accepted: 10/18/2007] [Indexed: 11/12/2022] [Open access]
Abstract
DNA barcoding has become a promising means for identifying organisms of all life stages. Currently, phenetic approaches and tree-building methods have been used to define species boundaries and discover 'cryptic species'. However, a universal threshold of genetic distance values to distinguish taxonomic groups cannot be determined. As an alternative, DNA barcoding approaches can be 'character based', whereby species are identified through the presence or absence of discrete nucleotide substitutions (character states) within a DNA sequence. We demonstrate the potential of character-based DNA barcodes by analysing 833 odonate specimens from 103 localities belonging to 64 species. A total of 54 species and 22 genera could be discriminated reliably through unique combinations of character states within only one mitochondrial gene region (NADH dehydrogenase 1). Character-based DNA barcodes were further successfully established at a population level discriminating seven population-specific entities out of a total of 19 populations belonging to three species. Thus, for the first time, DNA barcodes have been found to identify entities below the species level that may constitute separate conservation units or even species units. Our findings suggest that character-based DNA barcoding can be a rapid and reliable means for (i) the assignment of unknown specimens to a taxonomic group, (ii) the exploration of diagnosability of conservation units, and (iii) complementing taxonomic identification systems.
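The idea of identifying taxa through character states, as summarized in this abstract, can be illustrated with a deliberately simplified sketch. It finds "pure" diagnostic positions in an alignment: sites where every sequence of one taxon shares a single base that no other taxon shows. This flat comparison is an assumption made for illustration; it does not reproduce the guide-tree-based CAOS procedure, and the names `diagnostic_characters` and `classify` are hypothetical.

```python
def diagnostic_characters(aligned):
    """Find pure diagnostic character states per taxon.

    aligned: dict mapping taxon name -> list of aligned sequences
             (all sequences must have the same length).
    Returns: dict mapping taxon name -> {position: base}.
    """
    length = len(next(iter(aligned.values()))[0])
    diagnostics = {taxon: {} for taxon in aligned}
    for pos in range(length):
        # Set of bases observed at this position within each taxon
        states = {t: {seq[pos] for seq in seqs} for t, seqs in aligned.items()}
        for taxon, observed in states.items():
            if len(observed) != 1:
                continue  # polymorphic within the taxon, so not diagnostic
            base = next(iter(observed))
            others = (states[o] for o in states if o != taxon)
            if all(base not in other for other in others):
                diagnostics[taxon][pos] = base
    return diagnostics

def classify(query, diagnostics):
    """Assign a query to the single taxon whose diagnostics it all matches."""
    hits = [
        taxon for taxon, chars in diagnostics.items()
        if chars and all(query[pos] == base for pos, base in chars.items())
    ]
    return hits[0] if len(hits) == 1 else None

# Hypothetical toy alignment
reference = {
    "sp_A": ["ACGTAC", "ACGTAT"],
    "sp_B": ["ACCTGC", "ACCTGC"],
    "sp_C": ["TCGTGC", "TCGTGA"],
}
barcodes = diagnostic_characters(reference)
print(barcodes)                      # {'sp_A': {4: 'A'}, 'sp_B': {2: 'C'}, 'sp_C': {0: 'T'}}
print(classify("ACCTGC", barcodes))  # sp_B
```

Unlike a distance threshold, this kind of rule either finds a discrete state that separates a taxon or reports none, which is why taxa without any diagnostic position remain undiscriminated.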
Affiliation(s)
- J Rach
- ITZ, Ecology and Evolution, TiHo Hannover, Bünteweg 17d, 30559 Hannover, Germany
- R DeSalle
- Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
- I.N Sarkar
- MBLWHOI Library, Marine Biological Laboratory, Woods Hole, MA 02543, USA
- B Schierwater
- ITZ, Ecology and Evolution, TiHo Hannover, Bünteweg 17d, 30559 Hannover, Germany
- Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA
- H Hadrys
- ITZ, Ecology and Evolution, TiHo Hannover, Bünteweg 17d, 30559 Hannover, Germany
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT 06520-8104, USA
|
30
|
Zhao Y, Chen Y, Zhang X. A Novel Ensemble Approach for Cancer Data Classification. Advances in Neural Networks – ISNN 2007; 2007. [DOI: 10.1007/978-3-540-72393-6_143] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 02/02/2023]
|
31
|
Kelly RP, Sarkar IN, Eernisse DJ, DeSalle R. DNA barcoding using chitons (genus Mopalia). Mol Ecol Notes 2007. [DOI: 10.1111/j.1471-8286.2006.01641.x] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.7] [Indexed: 11/27/2022]
|
32
|
DeSalle R. What’s in a character? J Biomed Inform 2006; 39:6-17. [PMID: 16384747 DOI: 10.1016/j.jbi.2005.11.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 09/09/2005] [Revised: 11/04/2005] [Accepted: 11/05/2005] [Indexed: 11/22/2022]
Abstract
Systematic analyses are included as integral parts of bioinformatic analysis. The use of phenetic and phylogenetic trees in many of the newer areas of biology creates a need for bioinformaticists to understand more completely the nuances of systematic analysis. Any description in comparative biology universally begins with what information to use in the comparative endeavor. Phylogenetic approaches are no different. The diversity of approaches and phylogenetic questions in systematics has sometimes hindered a precise understanding of what primary data should be collected to perform such analyses. In addition, one should always keep in mind that the systematic organization of entities in nature strives not only to organize those entities in an objective, repeatable and operational way, but also to organize the attributes of the entities in a similar hierarchical context. This paper attempts to describe characters as the basis of all comparative analysis, to describe the diverse kinds of primary data that exist today in biology, genomics, and bioinformatics, and to place these kinds of primary data in the context of the established approaches to tree building.
Affiliation(s)
- Rob DeSalle
- Division of Invertebrates and the Molecular Systematics Laboratories, American Museum of Natural History, 79th Street at Central Park West, New York, NY 10024, USA.
|
33
|
Hong JH, Cho SB. The classification of cancer based on DNA microarray data that uses diverse ensemble genetic programming. Artif Intell Med 2006; 36:43-58. [PMID: 16102956 DOI: 10.1016/j.artmed.2005.06.002] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.2] [Received: 12/21/2004] [Revised: 05/11/2005] [Accepted: 06/17/2005] [Indexed: 11/23/2022]
Abstract
OBJECTIVE: The classification of cancer based on gene expression data is one of the most important procedures in bioinformatics. In order to obtain highly accurate results, ensemble approaches have been applied when classifying DNA microarray data. Diversity is very important in these ensemble approaches, but it is difficult to apply conventional diversity measures when there are only a few training samples available. The key issue that needs to be addressed under such circumstances is the development of a new ensemble approach that can enhance the successful classification of these datasets. MATERIALS AND METHODS: An effective ensemble approach that uses diversity in genetic programming is proposed. This diversity is measured by comparing the structure of the classification rules instead of estimating output-based diversity. RESULTS: Experiments performed on common gene expression datasets (such as the lymphoma cancer dataset, lung cancer dataset and ovarian cancer dataset) demonstrate the performance of the proposed method in relation to the conventional approaches. CONCLUSION: Diversity measured by comparing the structure of the classification rules obtained by genetic programming is useful to improve the performance of the ensemble classifier.
Affiliation(s)
- Jin-Hyuk Hong
- Department of Computer Science, Yonsei University, 134 Sinchon-dong, Sudaemoon-ku, Seoul 120-749, Republic of Korea
|
34
|
|