1
Mahmoud MAB. Classification of DNA Sequence Based on a Non-gradient Algorithm: Pseudoinverse Learners. Methods Mol Biol 2024; 2744:359-373. [PMID: 38683331 DOI: 10.1007/978-1-0716-3581-0_23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2024]
Abstract
This chapter proposes a prototype-based classification approach for analyzing DNA barcodes that uses a spectral representation of DNA sequences and a non-gradient neural network. Biological sequences can be viewed as high-dimensional data whose dimension is not fixed, because it corresponds to the length of each sequence. Numerical encoding, through computational procedures such as one-hot encoding (OHE), plays an important role in DNA sequence analysis. However, OHE has two disadvantages: (1) it adds no information that could serve as an additional predictive variable, and (2) if the variable has many classes, OHE greatly expands the feature space. To address these shortcomings, this chapter proposes a computationally efficient framework for classifying DNA sequences of living organisms in the image domain. The proposed strategy uses a multilayer perceptron trained by a pseudoinverse learning autoencoder (PILAE) algorithm, for which neither the learning control parameters nor the number of hidden layers has to be specified during training. In the reported comparison, the PILAE classifier outperforms other deep neural network (DNN) strategies such as the VGG-16 and Xception models.
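To make the feature-space cost of one-hot encoding concrete, here is a minimal Python sketch (not the chapter's code) that one-hot encodes a DNA sequence over the alphabet A, C, G, T. The function names `one_hot` and `flatten`, and the choice to map ambiguous symbols such as N to an all-zero row, are illustrative assumptions. A sequence of length L becomes an L x 4 matrix, so the flattened feature vector has 4L entries.

```python
ALPHABET = "ACGT"


def one_hot(seq: str) -> list[list[int]]:
    """Return an L x 4 one-hot matrix (columns in A, C, G, T order).

    Ambiguous symbols such as 'N' become an all-zero row; this is an
    assumption of the sketch, not a rule from the chapter.
    """
    return [[1 if base == b else 0 for b in ALPHABET] for base in seq.upper()]


def flatten(matrix: list[list[int]]) -> list[int]:
    """Concatenate the rows into one feature vector of length 4 * L."""
    return [value for row in matrix for value in row]


if __name__ == "__main__":
    for base, row in zip("ACGTN", one_hot("ACGTN")):
        print(base, row)
    # A 650 bp sequence already needs 650 * 4 = 2600 input features.
    print(len(flatten(one_hot("A" * 650))))
```

Each position spends four dimensions to carry a single symbol, and the width grows linearly with sequence length, which is the feature-space expansion listed above as a drawback of OHE.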
Affiliation(s)
- Mohammed A B Mahmoud
- Faculty of Computer Science, October University for Modern Sciences and Arts, Cairo, Egypt.
2
Filip E, Strzała T, Stępień E, Cembrowska-Lech D. Universal mtDNA fragment for Cervidae barcoding species identification using phylogeny and preliminary analysis of machine learning approach. Sci Rep 2023; 13:9133. [PMID: 37277428 DOI: 10.1038/s41598-023-35637-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Accepted: 05/21/2023] [Indexed: 06/07/2023]
Abstract
The aim of the study was to identify species of free-living animals from total DNA obtained from bone material, by analyzing mtDNA fragments with molecular methods and accurate bioinformatics tools, namely a Bayesian approach and a machine learning approach. We present a case study of successful species identification from degraded bone samples using short mtDNA fragments. To improve barcoding, we used both molecular and bioinformatics methods. We obtained a partial sequence of the mitochondrial cytochrome b (Cytb) gene for Capreolus capreolus, Dama dama, and Cervus elaphus that can be used for species affiliation. The new sequences have been deposited in GenBank, enriching the existing Cervidae mtDNA database. We also analysed the effect of barcodes on species identification from the perspective of the machine learning approach. The machine learning approaches BLOG and WEKA were compared with a distance-based method (TaxonDNA) and a tree-based method (NJ tree) in terms of the discrimination accuracy of single barcodes. The results indicated that BLOG, WEKA's SMO classifier, and the NJ tree discriminated Cervidae species better than TaxonDNA, with BLOG and WEKA's SMO classifier performing best.
Affiliation(s)
- Ewa Filip
- Institute of Biology, University of Szczecin, Wąska 13, 71-415, Szczecin, Poland.
- The Centre for Molecular Biology and Biotechnology, University of Szczecin, Szczecin, Poland.
- Tomasz Strzała
- Department of Genetics, Faculty of Biology and Animal Science, Wrocław University of Environmental and Life Sciences, Wrocław, Poland
- Edyta Stępień
- Institute of Marine and Environmental Sciences, University of Szczecin, Adama Mickiewicza 16, 70-383, Szczecin, Poland
- Danuta Cembrowska-Lech
- Institute of Biology, University of Szczecin, Wąska 13, 71-415, Szczecin, Poland
- Sanprobi Sp. z o. o. Sp. k., Kurza Stopka 5C, 70-535, Szczecin, Poland
3
Dev SA, Unnikrishnan R, Prathibha PS, Sijimol K, Sreekumar VB, AzharAli A, Anoop EV, Viswanath S. Artificial intelligence in timber forensics employing DNA barcode database. 3 Biotech 2023; 13:183. [PMID: 37193334 PMCID: PMC10182240 DOI: 10.1007/s13205-023-03604-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Accepted: 05/03/2023] [Indexed: 05/18/2023]
Abstract
Extreme difficulty in identifying the species of illegally sourced wood with conventional tools has accelerated illicit logging, leading to the destruction of natural resources in India. The study therefore focused primarily on developing a DNA barcode database for 41 commercial timber tree species in south India that are highly vulnerable to adulteration. The database was validated with an integrated approach involving the wood anatomical features of traded wood samples collected from south India. Traded wood samples were first identified from their wood anatomical features using the IAWA list of microscopic features for hardwood identification. The barcode gene regions recommended by the Consortium for the Barcode of Life (CBOL) (rbcL, matK & psbA-trnH) were used to develop the DNA barcode database. Second, we used the artificial intelligence (AI) analytical platform Waikato Environment for Knowledge Analysis (WEKA) to analyze the DNA barcode sequence database, which can add precision, speed, and accuracy to the entire identification process. Of the four classification algorithms implemented in WEKA, SMO performed best, assigning individual samples to their respective sequence database of biological reference materials (BRM) with 100% accuracy, indicating its efficiency in authenticating traded timber species. A major advantage of AI is its ability to analyze huge data sets with greater precision; it also provides a broad platform for rapid species authentication, which reduces human labor and time. Supplementary Information: The online version contains supplementary material available at 10.1007/s13205-023-03604-0.
Affiliation(s)
- Suma Arun Dev
- Forest Genetic & Biotechnology Division, Kerala Forest Research Institute, Peechi, Thrissur, Kerala 680653 India
- Remya Unnikrishnan
- Forest Genetic & Biotechnology Division, Kerala Forest Research Institute, Peechi, Thrissur, Kerala 680653 India
- Cochin University of Science & Technology, Kochi, Kerala India
- P. S. Prathibha
- Forest Genetic & Biotechnology Division, Kerala Forest Research Institute, Peechi, Thrissur, Kerala 680653 India
- K. Sijimol
- Forest Genetic & Biotechnology Division, Kerala Forest Research Institute, Peechi, Thrissur, Kerala 680653 India
- V. B. Sreekumar
- Forest Genetic & Biotechnology Division, Kerala Forest Research Institute, Peechi, Thrissur, Kerala 680653 India
- A. AzharAli
- Department of Forest Products and Utilization, College of Forestry, Kerala Agricultural University, Vellanikara, Thrissur, Kerala 680654 India
- E. V. Anoop
- Department of Forest Products and Utilization, College of Forestry, Kerala Agricultural University, Vellanikara, Thrissur, Kerala 680654 India
- Syam Viswanath
- Forest Genetic & Biotechnology Division, Kerala Forest Research Institute, Peechi, Thrissur, Kerala 680653 India
4
Mohd Salleh MH, Esa Y, Mohamed R. Global Terrapin Character-Based DNA Barcodes: Assessment of the Mitochondrial COI Gene and Conservation Status Revealed a Putative Cryptic Species. Animals (Basel) 2023; 13:1720. [PMID: 37889683 PMCID: PMC10251852 DOI: 10.3390/ani13111720] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/16/2023] [Revised: 02/14/2023] [Accepted: 02/17/2023] [Indexed: 06/29/2023]
Abstract
Technological and analytical advances in studying the evolutionary biology, ecology, and conservation of the Southern River Terrapin (Batagur affinis ssp.) are being realised through molecular approaches, including DNA barcoding. We evaluated the use of COI DNA barcodes in Malaysia's Southern River Terrapin population to better understand the species' genetic divergence and other genetic characteristics. We evaluated 26 sequences: four from field specimens of Southern River Terrapins obtained in Bota Kanan, Perak, Malaysia, and Kuala Berang, Terengganu, Malaysia, and 22 sequences from global terrapins previously deposited in the Barcode of Life Data Systems (BOLD) and GenBank. The species belong to three families: eight Geoemydidae species (18%), three Emydidae species (6%), and one Pelomedusidae species (2%). Of the 12 terrapin species sampled for this study, the IUCN Red List classifies 25% of the samples as critically endangered (CR) and 8% as endangered (EN). Sixteen haplotypes were found, including new haplotypes from the world's terrapins. Intraspecific distances between the COI gene sequences were calculated using the Kimura two-parameter (K2P) model and indicated a potential cryptic species between the Northern River Terrapin (Batagur baska) and the Southern River Terrapin (Batagur affinis affinis). Bayesian phylogenetic analysis also placed both species in the same lineage. The BLASTn search matched B. affinis to B. baska at 100%. The Jalview alignment showed almost identical sequences for both species. The Southern River Terrapin (B. affinis affinis) from the west coast of Peninsular Malaysia shared the same haplotype (Hap_1) as the Northern River Terrapin from India, whereas B. affinis edwardmolli from the east coast of Peninsular Malaysia formed Hap_16.
The COI analysis found new haplotypes and showed that DNA barcodes are an excellent way to measure population diversity.
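The K2P distance mentioned above separates transitions (A<->G, C<->T) from transversions. With P and Q the proportions of compared sites that differ by a transition and by a transversion, the distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The following is a minimal Python sketch of that formula for two pre-aligned sequences; skipping gapped or ambiguous sites (pairwise deletion) is an assumption of the sketch, and it is not the software the authors used.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter (K2P) distance between two aligned sequences.

    Sites with a gap or ambiguous base in either sequence are skipped
    (pairwise deletion; an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    valid = PURINES | PYRIMIDINES
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("distance undefined: sequences are too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

For example, one transition over ten comparable sites gives P = 0.1, Q = 0, and d = -1/2 ln(0.8), roughly 0.112.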
Affiliation(s)
- Mohd Hairul Mohd Salleh
- Department of Aquaculture, Faculty of Agriculture, Universiti Putra Malaysia, Serdang 43400, Malaysia
- Royal Malaysian Customs Department, Persiaran Perdana, Presint 2, Putrajaya 62596, Malaysia
- Yuzine Esa
- Department of Aquaculture, Faculty of Agriculture, Universiti Putra Malaysia, Serdang 43400, Malaysia
- International Institute of Aquaculture and Aquatic Sciences, Universiti Putra Malaysia, Lot 960 Jalan Kemang 6, Port Dickson 71050, Malaysia
- Rozihan Mohamed
- Department of Aquaculture, Faculty of Agriculture, Universiti Putra Malaysia, Serdang 43400, Malaysia
5
Yang CH, Wu KC, Chuang LY, Chang HW. DeepBarcoding: Deep Learning for Species Classification Using DNA Barcoding. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2158-2165. [PMID: 33600318 DOI: 10.1109/tcbb.2021.3056570] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/12/2023]
Abstract
DNA barcodes, which are short sequence fragments, are used for species identification. Because advances in sequencing technologies allow DNA sequences from different organisms to be acquired easily and rapidly, DNA barcodes have received increasing emphasis, and DNA sequence analysis tools play an increasingly crucial role in species identification. This study proposed DeepBarcoding, a deep learning framework for species classification using DNA barcodes. DeepBarcoding takes raw sequence data as input, represents its one-hot encoding as a one-dimensional image, and analyzes it with a deep convolutional neural network combined with a fully connected deep neural network. It achieved an average accuracy above 90% on both simulated and real datasets. Although deep learning yields outstanding performance for species classification with DNA sequences, its application remains challenging. The DeepBarcoding model is a potential tool for species classification and may help advance DNA barcode-based species identification.
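To illustrate what treating a one-hot encoding as a one-dimensional image means, the Python sketch below (not the authors' DeepBarcoding code) stores the encoding as four channels of length L and slides a single convolution filter along it, followed by a ReLU. The hand-set filter weights, the motif "GAT", and the bias of -2 are illustrative assumptions chosen so the filter fires only on an exact motif match; in a trained CNN of the kind described, the weights are learned and many filters feed into fully connected layers.

```python
ALPHABET = "ACGT"


def one_hot_channels(seq: str) -> list[list[int]]:
    """Encode a sequence as a 4-channel 1-D 'image'.

    channels[c][i] is 1 when base i equals ALPHABET[c], else 0.
    """
    return [[1 if base == b else 0 for base in seq.upper()] for b in ALPHABET]


def conv1d_relu(channels, kernel, bias=0.0):
    """Valid (no padding), stride-1 1-D convolution with one filter, then ReLU.

    kernel[c][j] is the weight applied to channel c at offset j.
    """
    width = len(channels[0])
    k = len(kernel[0])
    out = []
    for i in range(width - k + 1):
        s = bias
        for c in range(len(channels)):
            for j in range(k):
                s += kernel[c][j] * channels[c][i + j]
        out.append(max(0.0, s))  # ReLU keeps only positive responses
    return out


if __name__ == "__main__":
    image = one_hot_channels("TTGATCC")
    motif_filter = one_hot_channels("GAT")  # hypothetical hand-set weights
    print(conv1d_relu(image, motif_filter, bias=-2.0))
```

Only the window that reads exactly "GAT" scores 3 - 2 = 1; every other window scores at most 0 after the bias and is zeroed by the ReLU.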
6
van Bemmelen van der Plaat A, van Treuren R, van Hintum TJL. Reliable genomic strategies for species classification of plant genetic resources. BMC Bioinformatics 2021; 22:173. [PMID: 33789577 PMCID: PMC8011391 DOI: 10.1186/s12859-021-04018-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/10/2020] [Accepted: 02/11/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: To address the need for easy and reliable species classification in plant genetic resources collections, we assessed the potential of five classifiers (Random Forest, Neighbour-Joining, 1-Nearest Neighbour, a conservative variety of 3-Nearest Neighbours, and Naive Bayes). We investigated the effects of the number of accessions per species and of the misclassification rate on classification success, and validated the generic value of the results with three complete datasets. RESULTS: We found the conservative variety of 3-Nearest Neighbours to be the most reliable classifier when varying species representation and misclassification rate. The analysis of the three complete datasets showed that this finding has generic value. Additionally, we present various options for marker selection for classification tasks such as these. CONCLUSIONS: Large-scale genomic data are increasingly being produced for genetic resources collections. These data are useful for addressing species classification issues regarding crop wild relatives and for improving genebank documentation. Implementing a classification method that can improve the quality of poor datasets without gold-standard training data is considered an innovative and efficient way to improve genebank documentation.
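The abstract does not define the "conservative variety" of 3-Nearest Neighbours. One plausible reading, assumed here and not taken from the paper, is that an accession is assigned a species only when all three nearest reference accessions agree, and is otherwise left unclassified. A minimal Python sketch of that rule, using Hamming distance over equal-length marker strings (also an assumption), is:

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length marker profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have equal length")
    return sum(x != y for x, y in zip(a, b))


def conservative_knn(query: str, reference: list[tuple[str, str]], k: int = 3):
    """Assign a species only if all k nearest reference accessions agree.

    reference is a list of (profile, species) pairs. Returns the species,
    or None when the neighbours disagree (the accession stays unclassified).
    Ties in distance are broken by reference order (sorted() is stable).
    """
    neighbours = sorted(reference, key=lambda item: hamming(query, item[0]))[:k]
    votes = Counter(species for _, species in neighbours)
    species, count = votes.most_common(1)[0]
    return species if count == k else None
```

Returning None instead of a majority vote trades coverage for reliability: questionable accessions are flagged for review rather than silently assigned.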
Affiliation(s)
- Rob van Treuren
- Centre for Genetic Resources, Wageningen University and Research, P.O. Box 16, 6700 AA, Wageningen, The Netherlands
- Theo J L van Hintum
- Centre for Genetic Resources, Wageningen University and Research, P.O. Box 16, 6700 AA, Wageningen, The Netherlands
7
8
Nie R, Wei J, Zhang S, Vogler AP, Wu L, Konstantinov AS, Li W, Yang X, Xue H. Diversification of mitogenomes in three sympatric Altica flea beetles (Insecta, Chrysomelidae). Zool Scr 2019. [DOI: 10.1111/zsc.12371] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Affiliation(s)
- Rui‐E Nie
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology Chinese Academy of Sciences Beijing China
- Jing Wei
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology Chinese Academy of Sciences Beijing China
- University of Chinese Academy of Sciences Beijing China
- Shou‐Ke Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology Chinese Academy of Sciences Beijing China
- Research Institute of Subtropical Forestry Chinese Academy of Forestry Fuyang China
- Alfried P. Vogler
- Department of Life Sciences Natural History Museum London UK
- Department of Life Sciences, Silwood Park Campus Imperial College London Ascot UK
- Ling Wu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology Chinese Academy of Sciences Beijing China
- College of Life Sciences Hebei University Baoding China
- Wen‐Zhu Li
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology Chinese Academy of Sciences Beijing China
- Xing‐Ke Yang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology Chinese Academy of Sciences Beijing China
- Huai‐Jun Xue
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology Chinese Academy of Sciences Beijing China
9
Kreuzer M, Howard C, Adhikari B, Pendry CA, Hawkins JA. Phylogenomic Approaches to DNA Barcoding of Herbal Medicines: Developing Clade-Specific Diagnostic Characters for Berberis. Front Plant Sci 2019; 10:586. [PMID: 31139202 PMCID: PMC6527895 DOI: 10.3389/fpls.2019.00586] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/29/2018] [Accepted: 04/18/2019] [Indexed: 05/12/2023]
Abstract
DNA barcoding of herbal medicines has mainly been concerned with authenticating products in trade and has raised awareness of species substitution and adulteration. More recently, DNA barcodes have been included in pharmacopoeias, providing tools for regulatory purposes. The DNA barcoding regions commonly used in plants often fail to resolve identification to species level. This can be especially challenging in evolutionarily complex groups where incipient or reticulate speciation is ongoing. In this study, we take a phylogenomic approach, analyzing whole plastid sequences from the evolutionarily complex genus Berberis in order to develop DNA barcodes for the medicinally important species Berberis aristata. The phylogeny reconstructed from an alignment of ∼160 kbp of chloroplast DNA for 57 species reveals that the pharmacopoeial species in question is polyphyletic, complicating development of a species-specific DNA barcode. Instead, we propose a clade-specific DNA barcode, using our phylogeny to define Operational Phylogenetic Units (OPUs). The plastid alignment is then reduced to small, informative DNA regions that include nucleotides diagnostic for these OPUs. These DNA barcodes were tested on commercial samples and shown to discriminate plants in trade, and therefore to meet the requirement of a pharmacopoeial standard. The proposed method provides an innovative approach for inferring DNA barcodes for evolutionarily complex groups for regulatory purposes and quality control.
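The step of reducing the plastid alignment to nucleotides diagnostic for each OPU can be sketched as a column scan: a site is treated as diagnostic for a group if every member shares one nucleotide that no sample outside the group carries. This is a simplified Python illustration under that assumed definition, not the authors' pipeline; the function name, the toy alignment, and the decision to ignore gap characters are hypothetical.

```python
def diagnostic_sites(alignment: dict[str, str], groups: dict[str, str], target: str):
    """Find columns whose nucleotide is fixed in the target group and absent elsewhere.

    alignment maps sample name -> aligned sequence (all the same length);
    groups maps sample name -> group label (e.g. an OPU).
    Returns a list of (column index, diagnostic base) pairs.
    """
    inside = [alignment[n] for n in alignment if groups[n] == target]
    outside = [alignment[n] for n in alignment if groups[n] != target]
    if not inside or not outside:
        raise ValueError("need the target group and at least one other group")
    sites = []
    for i in range(len(inside[0])):
        in_bases = {seq[i] for seq in inside}
        out_bases = {seq[i] for seq in outside}
        # One shared base inside the group, never seen outside it;
        # gaps are not accepted as diagnostic (assumption).
        if len(in_bases) == 1:
            base = next(iter(in_bases))
            if base != "-" and base not in out_bases:
                sites.append((i, base))
    return sites


if __name__ == "__main__":
    toy = {"s1": "ACGTA", "s2": "ACGTT", "s3": "TCGAA", "s4": "TCCAA"}  # hypothetical data
    opus = {"s1": "OPU1", "s2": "OPU1", "s3": "OPU2", "s4": "OPU2"}
    print(diagnostic_sites(toy, opus, "OPU1"))
```

Short regions that concentrate such columns are natural candidates for a clade-specific barcode, because a query only needs to match the diagnostic bases of one OPU.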
Affiliation(s)
- Marco Kreuzer
- School of Biological Sciences, University of Reading, Reading, United Kingdom
- Caroline Howard
- BP-NIBSC Herbal Laboratory, National Institute for Biological Standards and Control, Potters Bar, United Kingdom
- Julie A. Hawkins
- School of Biological Sciences, University of Reading, Reading, United Kingdom
10
He T, Jiao L, Wiedenhoeft AC, Yin Y. Machine learning approaches outperform distance- and tree-based methods for DNA barcoding of Pterocarpus wood. Planta 2019; 249:1617-1625. [PMID: 30825008 DOI: 10.1007/s00425-019-03116-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/21/2018] [Accepted: 02/20/2019] [Indexed: 05/10/2023]
Abstract
Machine-learning approaches (MLAs) for DNA barcoding outperform distance- and tree-based methods in identification accuracy and cost-effectiveness for species-level identification of wood. DNA barcoding is a promising tool to combat illegal logging and associated trade, and reliable, efficient analytical methods are essential for its extensive application in the wood trade and, more broadly, in the forensics of natural materials. In this study, 120 DNA sequences of four barcodes (ITS2, matK, ndhF-rpl32, and rbcL) generated in our previous study and 85 downloaded from the National Center for Biotechnology Information (NCBI) were collected to establish a reference data set for six commercial Pterocarpus woods. MLAs (BLOG, BP-neural network, SMO and J48) were compared with distance-based (TaxonDNA) and tree-based (NJ tree) methods on identification accuracy and cost-effectiveness across these six species, and were also applied to discriminate the CITES-listed species Pterocarpus santalinus from its anatomically similar species P. tinctorius for forensic identification. MLAs provided higher identification accuracy (30.8-100%) than distance-based (15.1-97.4%) and tree-based methods (11.1-87.5%), with SMO performing best among the machine learning classifiers. The two-locus combination ITS2 + matK with the SMO classifier exhibited the highest resolution (100%) with the fewest barcodes for discriminating the six Pterocarpus species. The CITES-listed species P. santalinus was successfully discriminated from P. tinctorius by MLAs using a single barcode, ndhF-rpl32. This study shows that, for forensic application in DNA barcoding of Pterocarpus wood, MLAs provided higher identification accuracy and cost-effectiveness than other analytical methods.
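For readers unfamiliar with distance-based identification, the Python sketch below implements a simplified rule in the spirit of the "best close match" criterion associated with TaxonDNA: a query takes the species of its closest reference sequence, provided that distance is within a threshold and all equally close matches agree. The uncorrected p-distance, the fixed 3% threshold, and the function names are illustrative assumptions, not TaxonDNA's code or defaults.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gaps ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def best_close_match(query: str, reference: list[tuple[str, str]], threshold: float = 0.03) -> str:
    """Identify a query from (sequence, species) reference pairs.

    Returns a species name, "ambiguous" if equally close matches belong to
    different species, or "no match" if the closest match lies beyond the
    threshold (hypothetical 3% default).
    """
    distances = [(p_distance(query, seq), species) for seq, species in reference]
    best = min(d for d, _ in distances)
    if best > threshold:
        return "no match"
    closest = {species for d, species in distances if d == best}
    return closest.pop() if len(closest) == 1 else "ambiguous"
```

Unlike the classifiers compared above, this rule learns nothing from the reference set beyond raw distances, which is one reason distance-based methods can lose accuracy when species are closely related.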
Affiliation(s)
- Tuo He
- Department of Wood Anatomy and Utilization, Chinese Research Institute of Wood Industry, Chinese Academy of Forestry, Beijing, 100091, China
- Wood Collections (WOODPEDIA), Chinese Academy of Forestry, Beijing, 100091, China
- Forest Products Laboratory, Center for Wood Anatomy Research, USDA Forest Service, Madison, WI, 53726, USA
- Department of Botany, University of Wisconsin, Madison, WI, 53706, USA
- Lichao Jiao
- Department of Wood Anatomy and Utilization, Chinese Research Institute of Wood Industry, Chinese Academy of Forestry, Beijing, 100091, China
- Wood Collections (WOODPEDIA), Chinese Academy of Forestry, Beijing, 100091, China
- Alex C Wiedenhoeft
- Forest Products Laboratory, Center for Wood Anatomy Research, USDA Forest Service, Madison, WI, 53726, USA
- Department of Botany, University of Wisconsin, Madison, WI, 53706, USA
- Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN, 47907, USA
- Ciências Biológicas (Botânica), Universidade Estadual Paulista, Botucatu, São Paulo, Brazil
- Yafang Yin
- Department of Wood Anatomy and Utilization, Chinese Research Institute of Wood Industry, Chinese Academy of Forestry, Beijing, 100091, China.
- Wood Collections (WOODPEDIA), Chinese Academy of Forestry, Beijing, 100091, China.
11
Phillips JD, Gillis DJ, Hanner RH. Incomplete estimates of genetic diversity within species: Implications for DNA barcoding. Ecol Evol 2019; 9:2996-3010. [PMID: 30891232 PMCID: PMC6406011 DOI: 10.1002/ece3.4757] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 05/17/2018] [Revised: 09/03/2018] [Accepted: 10/12/2018] [Indexed: 02/01/2023]
Abstract
DNA barcoding has greatly accelerated the pace of specimen identification to the species level, as well as species delineation. Whereas applying DNA barcoding to match unknown specimens to known species is straightforward, its use for species delimitation is more controversial, because species discovery hinges critically on present levels of haplotype diversity and on the patterning of standing genetic variation within and between species. Typical sample sizes for molecular biodiversity assessment using DNA barcodes range from 5 to 10 individuals per species. However, the sample sizes necessary to fully gauge haplotype variation at the species level are presumed to be strongly taxon-specific. Importantly, little attention has been paid to determining the specimen sample sizes needed to reveal the majority of intraspecific haplotype variation within any one species. In this paper, we briefly outline the current literature and methods on intraspecific sample size estimation for assessing COI DNA barcode haplotype sampling completeness. By reviewing foundational statistical and population genetic models, with specific application to ray-finned fishes (Chordata: Actinopterygii), we stress the importance of adequate sample sizes for studies of molecular biodiversity across a variety of metazoan taxa. Finally, promising avenues for further research in this area are highlighted.
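One common way to judge whether a sample captures most intraspecific haplotype variation is a haplotype accumulation curve: the expected number of distinct haplotypes observed as more individuals are sampled. If the curve is still rising at the largest sample size, more specimens are likely needed. The Python sketch below estimates such a curve by averaging over random sampling orders; it illustrates the general idea rather than code from the paper, and the permutation count and seed are arbitrary.

```python
import random


def haplotype_accumulation(haplotypes: list[str], permutations: int = 1000, seed: int = 1) -> list[float]:
    """Mean number of distinct haplotypes seen after sampling 1..N individuals.

    haplotypes holds one haplotype label per sampled individual. Each
    permutation is one random sampling order without replacement; the
    returned curve is the average over all permutations.
    """
    rng = random.Random(seed)
    n = len(haplotypes)
    totals = [0] * n
    for _ in range(permutations):
        order = haplotypes[:]
        rng.shuffle(order)
        seen = set()
        for i, h in enumerate(order):
            seen.add(h)
            totals[i] += len(seen)
    return [t / permutations for t in totals]


if __name__ == "__main__":
    # Hypothetical species sample: 10 individuals, 3 haplotypes, one of them rare.
    sample = ["H1"] * 6 + ["H2"] * 3 + ["H3"]
    for size, mean in enumerate(haplotype_accumulation(sample), start=1):
        print(size, round(mean, 2))
```

In this toy sample the single rare haplotype is usually missed at the 5 to 10 individuals mentioned above, which is the kind of incompleteness the paper warns about.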
Affiliation(s)
- Jarrett D. Phillips
- School of Computer Science, University of Guelph, Guelph, Ontario, Canada
- Centre for Biodiversity Genomics, Biodiversity Institute of Ontario, University of Guelph, Guelph, Ontario, Canada
- Daniel J. Gillis
- School of Computer Science, University of Guelph, Guelph, Ontario, Canada
- Robert H. Hanner
- Centre for Biodiversity Genomics, Biodiversity Institute of Ontario, University of Guelph, Guelph, Ontario, Canada
- Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada
12
Meher PK, Sahu TK, Gahoi S, Tomar R, Rao AR. funbarRF: DNA barcode-based fungal species prediction using multiclass Random Forest supervised learning model. BMC Genet 2019; 20:2. [PMID: 30616524 PMCID: PMC6323839 DOI: 10.1186/s12863-018-0710-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/13/2018] [Accepted: 12/26/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Identification of unknown fungal species aids the conservation of fungal diversity. Because many fungal species cannot be cultured, morphological identification of those species is almost impossible; DNA barcoding, however, can be employed to identify them. For fungal taxonomy prediction, the ITS (internal transcribed spacer) region of rDNA (ribosomal DNA) is used as the barcode. Although computational prediction of fungal species has become feasible with the huge volume of barcode sequences available in the public domain, it remains challenging because of the high degree of variability among ITS regions within species. RESULTS: A Random Forest (RF)-based predictor was built for identification of unknown fungal species. The reference and query sequences were mapped onto numeric features based on gapped base pair compositions and then used as the training and test sets, respectively, for prediction of fungal species using RF. Accuracy exceeded 85% when 4 sequences per species were used in the reference set, and stabilized at ~88% when ≥7 sequences per species were used for training. The proposed model achieved comparable accuracy when evaluated against existing methods through a cross-validation procedure, and it also outperformed several existing models used to identify species other than fungi. CONCLUSIONS: An online prediction server "funbarRF" is established at http://cabgrid.res.in:8080/funbarrf/ for fungal species identification. In addition, an R package, funbarRF (https://cran.r-project.org/web/packages/funbarRF/), is available for prediction using high-throughput sequence data. This work is expected to support future efforts toward fungal taxonomy assignment based on DNA barcodes.
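The abstract does not spell out how gapped base pair compositions are computed. Under one plausible definition, assumed here, for each gap g the sequence is scanned for pairs of bases separated by g intervening positions, and the 16 possible pairs (AA through TT) are counted and normalized, so every sequence, whatever its length, maps to a fixed-length vector that a Random Forest can consume. The Python sketch below shows that feature extraction only; it is not funbarRF's code, and the `max_gap` default is arbitrary.

```python
from itertools import product

BASES = "ACGT"
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]  # AA, AC, ..., TT (16 pairs)


def gapped_pair_composition(seq: str, max_gap: int = 2) -> list[float]:
    """Relative frequencies of base pairs separated by g intervening bases, g = 0..max_gap.

    Returns 16 * (max_gap + 1) features. Pairs containing a non-ACGT symbol
    are skipped (an assumption of this sketch).
    """
    seq = seq.upper()
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - gap - 1):
            pair = seq[i] + seq[i + gap + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        # Normalize so sequences of different lengths are comparable.
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features
```

With g = 0 the first 16 features are ordinary dinucleotide frequencies; larger gaps add information about base co-occurrence at a distance while keeping the vector length independent of the ITS length.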
Affiliation(s)
- Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
- Tanmaya Kumar Sahu
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
- Shachi Gahoi
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
- Ruchi Tomar
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
- Department of Bioinformatics, Janta Vedic College, Baraut, Baghpat, Uttar Pradesh 250611 India
- Atmakuri Ramakrishna Rao
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
13
Martín M, Zhang LF, Fernández-López J, Dueñas M, Rodríguez-Armas J, Beltrán-Tejera E, Telleria M. Hyphoderma paramacaronesicum sp. nov. (Meruliaceae, Polyporales, Basidiomycota), a cryptic lineage to H. macaronesicum. Fungal Syst Evol 2018; 2:57-68. [PMID: 32467888 PMCID: PMC7225581 DOI: 10.3114/fuse.2018.02.05] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 12/04/2022]
Abstract
This article re-evaluates the taxonomy of Hyphoderma macaronesicum based on various strategies, including the cohesion species recognition method through haplotype networks, multilocus genetic analyses using the genealogical concordance phylogenetic concept, as well as species tree reconstruction. The following loci were examined: the internal transcribed spacers of nuclear ribosomal DNA (ITS nrDNA), the intergenic spacers of nuclear ribosomal DNA (IGS nrDNA), two fragments of the protein-coding RNA polymerase II subunit 2 (RPB2), and two fragments of the translation elongation factor 1-α (EF1-α). Our results indicate that the name H. macaronesicum includes at least two separate species, one of which is newly described as Hyphoderma paramacaronesicum. The two species are readily distinguished based on the various loci analysed, namely ITS, IGS, RPB2 and EF1-α.
Affiliation(s)
- M.P. Martín
- Departamento de Micología, Real Jardín Botánico, RJB-CSIC, Plaza de Murillo 2, 28014 Madrid, Spain
- L.-F. Zhang
- Departamento de Micología, Real Jardín Botánico, RJB-CSIC, Plaza de Murillo 2, 28014 Madrid, Spain
- J. Fernández-López
- Departamento de Micología, Real Jardín Botánico, RJB-CSIC, Plaza de Murillo 2, 28014 Madrid, Spain
- M. Dueñas
- Departamento de Micología, Real Jardín Botánico, RJB-CSIC, Plaza de Murillo 2, 28014 Madrid, Spain
- J.L. Rodríguez-Armas
- Departamento de Botánica, Ecología y Fisiología Vegetal, Universidad de La Laguna, 38200 La Laguna, Tenerife, Islas Canarias, Spain
- E. Beltrán-Tejera
- Departamento de Botánica, Ecología y Fisiología Vegetal, Universidad de La Laguna, 38200 La Laguna, Tenerife, Islas Canarias, Spain
- M.T. Telleria
- Departamento de Micología, Real Jardín Botánico, RJB-CSIC, Plaza de Murillo 2, 28014 Madrid, Spain
14
Yang CH, Wu KC, Chuang LY, Chang HW. Decision Tree Algorithm-Generated Single-Nucleotide Polymorphism Barcodes of rbcL Genes for 38 Brassicaceae Species Tagging. Evol Bioinform Online 2018; 14:1176934318760856. [PMID: 29551885 PMCID: PMC5846911 DOI: 10.1177/1176934318760856] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/21/2017] [Accepted: 01/24/2018] [Indexed: 01/17/2023]
Abstract
DNA barcode sequences are accumulating in large data sets. A barcode is generally a sequence longer than 1000 base pairs, which creates a computational burden. Although DNA barcodes were originally envisioned as straightforward species tags, their use for identification currently receives little emphasis. Single-nucleotide polymorphism (SNP) association studies suggest that SNPs may be ideal targets for feature selection to discriminate between species. We hypothesize that SNP-based barcodes may be more effective than full-length DNA barcode sequences for species discrimination. To address this issue, we tested a ribulose diphosphate carboxylase (rbcL) SNP barcoding (RSB) strategy using a decision tree algorithm. After alignment and trimming, 31 SNPs were discovered in the rbcL sequences of 38 Brassicaceae plant species. In constructing the decision tree, these SNPs were used to set up decision rules that split the sequences into two groups, level by level. After processing, 37 nodes and 31 loci were required to discriminate the 38 species. Finally, sequence tags consisting of 31 rbcL SNP barcodes were identified for discriminating the 38 Brassicaceae species based on the decision tree-selected SNP pattern using the RSB method. Taken together, this study provides the rationale that the SNP aspect of the rbcL DNA barcode is a useful and effective sequence for tagging 38 Brassicaceae species.
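The level-by-level splitting described above can be illustrated with a small Python sketch: each internal node tests one SNP column ("is the base at this column X?") and splits the remaining species into two groups until every species stands alone. The most-balanced-split criterion, the function names, and the toy SNP strings are assumptions for illustration; the study's decision tree algorithm may use a different splitting criterion. Any binary tree that separates n species this way has n - 1 internal nodes, which is consistent with the 37 nodes reported for 38 species.

```python
def build_snp_tree(seqs: dict[str, str]):
    """Recursively split species into two groups using one SNP test per node.

    seqs maps species name -> aligned SNP string. Each node picks the
    (column, base) test that splits the current species most evenly (an
    assumed criterion). Returns (tree, number_of_internal_nodes), where a
    node is (column, base, left_subtree, right_subtree) and a leaf is a name.
    """
    names = sorted(seqs)
    if len(names) == 1:
        return names[0], 0
    best = None
    for col in range(len(seqs[names[0]])):
        for base in sorted({seqs[n][col] for n in names}):
            left = [n for n in names if seqs[n][col] == base]
            right = [n for n in names if seqs[n][col] != base]
            if left and right:
                balance = abs(len(left) - len(right))
                if best is None or balance < best[0]:
                    best = (balance, col, base, left, right)
    if best is None:
        raise ValueError("identical SNP strings cannot be separated: " + ", ".join(names))
    _, col, base, left, right = best
    left_tree, left_nodes = build_snp_tree({n: seqs[n] for n in left})
    right_tree, right_nodes = build_snp_tree({n: seqs[n] for n in right})
    return (col, base, left_tree, right_tree), 1 + left_nodes + right_nodes


def classify(tree, snps: str) -> str:
    """Follow the SNP tests from the root down to a species leaf."""
    while isinstance(tree, tuple):
        col, base, left, right = tree
        tree = left if snps[col] == base else right
    return tree
```

The SNP columns used anywhere in the tree form the species tag; a query only needs its bases at those columns to be routed to a leaf.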
Affiliation(s)
- Cheng-Hong Yang
- Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan; Graduate Institute of Clinical Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Kuo-Chuan Wu
- Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan; Department of Computer Science and Information Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
- Li-Yeh Chuang
- Department of Chemical Engineering, Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, Taiwan
- Hsueh-Wei Chang
- Institute of Medical Science and Technology, National Sun Yat-sen University, Kaohsiung, Taiwan; Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan; Department of Biomedical Science and Environmental Biology, Kaohsiung Medical University, Kaohsiung, Taiwan
15
Mishra P, Shukla AK, Sundaresan V. Candidate DNA Barcode Tags Combined With High Resolution Melting (Bar-HRM) Curve Analysis for Authentication of Senna alexandrina Mill. With Validation in Crude Drugs. Front Plant Sci 2018; 9:283. [PMID: 29593755 PMCID: PMC5859231 DOI: 10.3389/fpls.2018.00283] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 08/17/2017] [Accepted: 02/19/2018] [Indexed: 05/07/2023]
Abstract
Senna alexandrina (Fabaceae) is a globally recognized medicinal plant for its laxative properties as well as the only source of sennosides, and is a highly exported bulk herb from India. It is procured exclusively from limited cultivation, which leads to risks of deliberate or unintended adulteration. Market raw materials are in powdered or finished-product form, which makes authentication difficult. Here, DNA barcode tags based on chloroplast genes (rbcL and matK) and intergenic spacers (psbA-trnH and ITS) were developed for S. alexandrina along with the allied species. The ability of the ITS1 region to discriminate among the Senna species led to the present proposal of ITS1 tags as a successful barcode. Further, these tags were coupled with high-resolution melting (HRM) curve analysis in a real-time PCR genotyping method to derive Bar-HRM (Barcoding-HRM) assays. Suitable HRM primer sets were designed through SNP detection and mutation scanning in genomic signatures of Senna species. The melting profiles of S. alexandrina and S. italica subsp. micrantha were almost identical, while the remaining five species were clearly separated and can therefore be differentiated by the HRM method. The sensitivity of the method was used to authenticate market samples [Herbal Sample Assays (HSAs)]. HSA01 (S. alexandrina crude drug sample from Bangalore) and HSA06 (S. alexandrina crude drug sample from Tuticorin, Tamil Nadu, India) were found to be highly contaminated with S. italica subsp. micrantha. Species admixtures mixed in varying percentages were identified sensitively, with contamination detected at levels as low as 1%. The melting profiles of PCR amplicons are clearly distinct, which enables authentic differentiation of species by the HRM method. This study reveals that DNA barcoding coupled with HRM is an efficient molecular tool to authenticate Senna herbal products in the market for quality control in the drug supply chain.
CIMAP Communication Number: CIMAP/PUB/2017/31.
Affiliation(s)
- Priyanka Mishra
- Plant Biology and Systematics, CSIR-Central Institute of Medicinal and Aromatic Plants, Research Centre, Bangalore, India
- Ashutosh K. Shukla
- Biotechnology Division, CSIR-Central Institute of Medicinal and Aromatic Plants, Lucknow, India
- Velusamy Sundaresan
- Plant Biology and Systematics, CSIR-Central Institute of Medicinal and Aromatic Plants, Research Centre, Bangalore, India
- *Correspondence: Velusamy Sundaresan
16
Mallo D, Posada D. Multilocus inference of species trees and DNA barcoding. Philos Trans R Soc Lond B Biol Sci 2017; 371:rstb.2015.0335. [PMID: 27481787 PMCID: PMC4971187 DOI: 10.1098/rstb.2015.0335] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Accepted: 04/10/2016] [Indexed: 11/30/2022]
Abstract
The unprecedented amount of data resulting from next-generation sequencing has opened a new era in phylogenetic estimation. Although large datasets should, in theory, increase phylogenetic resolution, massive multilocus datasets have uncovered a great deal of phylogenetic incongruence among different genomic regions, due both to stochastic error and to the action of different evolutionary processes such as incomplete lineage sorting, gene duplication and loss, and horizontal gene transfer. This incongruence violates one of the fundamental assumptions of the DNA barcoding approach, which assumes that gene history and species history are identical. In this review, we explain some of the most important challenges we will have to face to reconstruct the history of species, and the advantages and disadvantages of different strategies for the phylogenetic analysis of multilocus data. In particular, we describe the evolutionary events that can generate species-tree/gene-tree discordance, compare the most popular methods for species tree reconstruction, highlight the challenges we need to face when using them and discuss their potential utility in barcoding. Current barcoding methods sacrifice a great amount of statistical power by considering only one locus, and a transition to multilocus barcodes would not only improve current barcoding methods, but also facilitate an eventual transition to species-tree-based barcoding strategies, which could better accommodate scenarios where the barcode gap is too small or nonexistent. This article is part of the themed issue ‘From DNA barcodes to biomes’.
Affiliation(s)
- Diego Mallo
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo 36310, Spain
- David Posada
- Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo 36310, Spain
17
Khawaldeh S, Pervaiz U, Elsharnoby M, Alchalabi AE, Al-Zubi N. Taxonomic Classification for Living Organisms Using Convolutional Neural Networks. Genes (Basel) 2017; 8:genes8110326. [PMID: 29149087 PMCID: PMC5704239 DOI: 10.3390/genes8110326] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 09/11/2017] [Revised: 11/05/2017] [Accepted: 11/14/2017] [Indexed: 12/11/2022]
Abstract
Taxonomic classification has a wide range of applications, such as finding out more about evolutionary history. Compared with the estimated number of organisms that nature harbors, our knowledge of the specific classes to which they belong is far from complete. The classification of living organisms can be done with many machine learning techniques; in this study, it is performed using convolutional neural networks. Moreover, a DNA encoding technique is incorporated into the algorithm to increase performance and avoid misclassifications. The proposed algorithm outperformed state-of-the-art algorithms in terms of accuracy and sensitivity, which illustrates a high potential for using it in many other applications in genome analysis.
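The abstract does not say which DNA encoding the network uses. One-hot encoding is a common way to turn a sequence into a fixed-width numeric matrix for a CNN, so the Python sketch below assumes it; `one_hot` and `pad_to` are hypothetical helper names, and mapping ambiguous symbols such as N to all-zero rows is likewise an assumption.

```python
BASES = "ACGT"
BASE_INDEX = {b: i for i, b in enumerate(BASES)}

def one_hot(seq):
    """Encode a DNA string as rows of 4 indicators in A, C, G, T order.
    Any other symbol (e.g. N or a gap) becomes an all-zero row (assumption)."""
    rows = []
    for ch in seq.upper():
        row = [0, 0, 0, 0]
        if ch in BASE_INDEX:
            row[BASE_INDEX[ch]] = 1
        rows.append(row)
    return rows

def pad_to(rows, length):
    """Truncate or zero-pad to a fixed number of rows so sequences can be batched."""
    rows = rows[:length]
    return rows + [[0, 0, 0, 0] for _ in range(length - len(rows))]

matrix = pad_to(one_hot("acgN"), 6)
print(len(matrix), matrix[0], matrix[3])  # 6 [1, 0, 0, 0] [0, 0, 0, 0]
```

Each sequence becomes a length-by-4 matrix, so a one-dimensional convolution sliding along the sequence axis can learn motif-like filters regardless of where a pattern occurs.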
Affiliation(s)
- Saed Khawaldeh
- Erasmus+ Joint Master Program in Medical Imaging and Applications, University of Burgundy, 21000 Dijon, France.
- Erasmus+ Joint Master Program in Medical Imaging and Applications, UNICLAM, 03043 Cassino FR, Italy.
- Erasmus+ Joint Master Program in Medical Imaging and Applications, University of Girona, 17004 Girona, Spain.
- Graduate School of Natural and Applied Sciences, Istanbul Sehir University, 34865 Kartal/İstanbul, Turkey.
- Department of Electrical Engineering and Automation, Aalto University, 02150 Espoo, Finland.
- Usama Pervaiz
- Erasmus+ Joint Master Program in Medical Imaging and Applications, University of Burgundy, 21000 Dijon, France.
- Erasmus+ Joint Master Program in Medical Imaging and Applications, UNICLAM, 03043 Cassino FR, Italy.
- Erasmus+ Joint Master Program in Medical Imaging and Applications, University of Girona, 17004 Girona, Spain.
- Mohammed Elsharnoby
- Graduate School of Natural and Applied Sciences, Istanbul Sehir University, 34865 Kartal/İstanbul, Turkey.
- Alaa Eddin Alchalabi
- Graduate School of Natural and Applied Sciences, Istanbul Sehir University, 34865 Kartal/İstanbul, Turkey.
- Nayel Al-Zubi
- Department of Computer Engineering, Al-Balqa' Applied University, 19117 Al-Salt, Jordan.
18
Mishra P, Kumar A, Sivaraman G, Shukla AK, Kaliamoorthy R, Slater A, Velusamy S. Character-based DNA barcoding for authentication and conservation of IUCN Red listed threatened species of genus Decalepis (Apocynaceae). Sci Rep 2017; 7:14910. [PMID: 29097709 PMCID: PMC5668324 DOI: 10.1038/s41598-017-14887-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 01/11/2017] [Accepted: 10/09/2017] [Indexed: 11/09/2022]
Abstract
The steno-endemic species of the genus Decalepis are highly threatened by destructive wild harvesting. The medicinally important fleshy tuberous roots of Decalepis hamiltonii are traded as a substitute to meet the international market demand for Hemidesmus indicus. In addition, the tuberous roots of all three species of Decalepis possess similar exudates and texture, which challenges the ability of conventional techniques alone to perform accurate species authentication. This study was undertaken to generate DNA barcodes that could be used to monitor and curtail the illegal trade of these endangered species. The DNA barcode reference library was developed on the BOLD database platform for the candidate barcodes rbcL, matK, psbA-trnH, ITS and ITS2. With matK and ITS, the average intraspecific variations (0-0.27%) were less than the distances to the nearest neighbour (0.4-11.67%). Anchoring the coding region rbcL in a multigene tiered approach, the combination rbcL + matK + ITS yielded 100% species resolution with the fewest loci, using either the PAUP or the BLOG method to support a character-based approach. A species-specific SNP position (230 bp) in the matK region that is characteristic of D. hamiltonii could be used to design specific assays, enhancing its applicability for direct use in CITES enforcement to distinguish it from H. indicus.
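The comparison of average intraspecific variation with the distance to the nearest neighbour can be sketched in Python as follows. This is an illustration only, not the BOLD implementation: it assumes uncorrected p-distances on pre-aligned sequences (BOLD analyses often use Kimura 2-parameter distances instead), and `p_distance` and `barcode_gap` are hypothetical names.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, ignoring gap positions."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def barcode_gap(samples):
    """samples: dict species -> list of aligned sequences (at least two species).

    Returns species -> (mean intraspecific distance, nearest-neighbour distance),
    where the nearest-neighbour distance is the smallest distance from any
    sequence of the species to any sequence of another species.
    """
    result = {}
    for sp, seqs in samples.items():
        intra = [p_distance(a, b) for a, b in combinations(seqs, 2)]
        mean_intra = sum(intra) / len(intra) if intra else 0.0  # singletons get 0.0
        nn = min(p_distance(a, b)
                 for other, other_seqs in samples.items() if other != sp
                 for a in seqs for b in other_seqs)
        result[sp] = (mean_intra, nn)
    return result

# Toy data (hypothetical, not from the study)
samples = {"sp1": ["AAAAAAAAAA", "AAAAAAAAAT"], "sp2": ["CCCCCAAAAA"]}
for sp, (intra, nn) in barcode_gap(samples).items():
    print(sp, intra, nn, "gap" if intra < nn else "no gap")
```

In this simple sense a species shows a gap when its mean intraspecific distance is below its nearest-neighbour distance, which is the relation the abstract reports for matK and ITS; stricter analyses compare the maximum, rather than the mean, intraspecific distance.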
Affiliation(s)
- Priyanka Mishra
- Plant Biology and Systematics, CSIR - Central Institute of Medicinal and Aromatic Plants, Research Center, Allalsandra, GKVK Post, Bengaluru, 560065, Karnataka, India
- Amit Kumar
- Plant Biology and Systematics, CSIR - Central Institute of Medicinal and Aromatic Plants, Research Center, Allalsandra, GKVK Post, Bengaluru, 560065, Karnataka, India
- Gokul Sivaraman
- Plant Biology and Systematics, CSIR - Central Institute of Medicinal and Aromatic Plants, Research Center, Allalsandra, GKVK Post, Bengaluru, 560065, Karnataka, India
- Ashutosh K Shukla
- Biotechnology Division, CSIR - Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, 226015, Uttar Pradesh, India
- Ravikumar Kaliamoorthy
- School of Conservation, TransDisciplinary University, 74/2, Jarakabande Kaval, Post Attur, Via Yelahanka, Bangalore, 560064, Karnataka, India
- Adrian Slater
- Biomolecular Technology Group, Faculty of Health and Life Sciences, De Montfort University, Leicester, LE1 9BH, UK
- Sundaresan Velusamy
- Plant Biology and Systematics, CSIR - Central Institute of Medicinal and Aromatic Plants, Research Center, Allalsandra, GKVK Post, Bengaluru, 560065, Karnataka, India.
19
Hou G, Chen WT, Lu HS, Cheng F, Xie SG. Developing a DNA barcode library for perciform fishes in the South China Sea: Species identification, accuracy and cryptic diversity. Mol Ecol Resour 2017; 18:137-146. [DOI: 10.1111/1755-0998.12718] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 03/01/2017] [Revised: 06/14/2017] [Accepted: 08/14/2017] [Indexed: 11/26/2022]
Affiliation(s)
- Gang Hou
- Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- University of Chinese Academy of Sciences, Beijing, China
- Guangdong Ocean University, Zhanjiang, China
- Wei-Tao Chen
- Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- University of Chinese Academy of Sciences, Beijing, China
- Fei Cheng
- Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Song-Guang Xie
- Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China
- Huai'an Research Center, Institute of Hydrobiology, Chinese Academy of Sciences, Huai'an, China
20
Raclariu AC, Mocan A, Popa MO, Vlase L, Ichim MC, Crisan G, Brysting AK, de Boer H. Veronica officinalis Product Authentication Using DNA Metabarcoding and HPLC-MS Reveals Widespread Adulteration with Veronica chamaedrys. Front Pharmacol 2017; 8:378. [PMID: 28674497 PMCID: PMC5474480 DOI: 10.3389/fphar.2017.00378] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 04/23/2017] [Accepted: 05/31/2017] [Indexed: 11/13/2022]
Abstract
Studying herbal products derived from local and traditional knowledge and their value chains is one of the main challenges in ethnopharmacology. The majority of these products have a long history of use, but non-harmonized trade and differences in regulatory policies between countries affect their value chains and lead to concerns over product efficacy, safety and quality. Veronica officinalis L. (common speedwell), a member of the Plantaginaceae family, has a long history of use in European traditional medicine, mainly in central eastern Europe and the Balkans. However, no specified control tests are available either to establish the quality of derived herbal products or to discriminate it from its most common substitute, V. chamaedrys L. (germander speedwell). In this study, we use DNA metabarcoding and high-performance liquid chromatography coupled with mass spectrometry (HPLC-MS) to authenticate sixteen V. officinalis herbal products and compare the potential of the two approaches to detect substitution, adulteration and the use of unreported constituents. HPLC-MS showed high resolution in detecting phytochemical target compounds, but did not enable detection of specific plant species in the products. DNA metabarcoding detected V. officinalis in only 15% of the products, whereas it detected V. chamaedrys in 62% of the products. The results confirm that DNA metabarcoding can be used to test for the presence of Veronica species, to detect substitution and/or admixture of other Veronica species, and to simultaneously detect all other species present. Our results show that none of the herbal products contained exactly the species listed on the label, and all included substitutes, contaminants or fillers. This study highlights the need for authentication of raw herbals along the value chain of these products.
An integrative methodology can assess both the quality of herbal products in terms of target compound concentrations and species composition, as well as admixture and substitution with other chemical compounds and plants.
Affiliation(s)
- Ancuta C Raclariu
- Plant Evolution and Metabarcoding Group, Natural History Museum, University of Oslo, Oslo, Norway; Stejarul Research Centre for Biological Sciences, National Institute of Research and Development for Biological Sciences (NIRDBS), Piatra Neamţ, Romania
- Andrei Mocan
- Department of Pharmaceutical Botany, Faculty of Pharmacy, Iuliu Hatieganu University of Medicine and Pharmacy, Cluj-Napoca, Romania; ICHAT and Institute for Life Sciences, University of Agricultural Sciences and Veterinary Medicine of Cluj-Napoca, Cluj-Napoca, Romania
- Madalina O Popa
- Stejarul Research Centre for Biological Sciences, National Institute of Research and Development for Biological Sciences (NIRDBS), Piatra Neamţ, Romania
- Laurian Vlase
- Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Oslo, Norway
- Mihael C Ichim
- Stejarul Research Centre for Biological Sciences, National Institute of Research and Development for Biological Sciences (NIRDBS), Piatra Neamţ, Romania
- Gianina Crisan
- Department of Pharmaceutical Botany, Faculty of Pharmacy, Iuliu Hatieganu University of Medicine and Pharmacy, Cluj-Napoca, Romania
- Anne K Brysting
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis (CEES), Oslo, Norway
- Hugo de Boer
- Plant Evolution and Metabarcoding Group, Natural History Museum, University of Oslo, Oslo, Norway; Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
21
Yang J, Vázquez L, Chen X, Li H, Zhang H, Liu Z, Zhao G. Development of Chloroplast and Nuclear DNA Markers for Chinese Oaks (Quercus Subgenus Quercus) and Assessment of Their Utility as DNA Barcodes. Front Plant Sci 2017; 8:816. [PMID: 28579999 PMCID: PMC5437370 DOI: 10.3389/fpls.2017.00816] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 01/23/2017] [Accepted: 05/01/2017] [Indexed: 05/04/2023]
Abstract
Chloroplast DNA (cpDNA) is frequently used in studies of species demography, evolution, and species discrimination in plants. However, the lack of efficient and universal markers often poses particular challenges for genetic studies across different plant groups. In this study, chloroplast genomes from two closely related species (Quercus rubra and Castanea mollissima) in Fagaceae were compared to explore universal cpDNA markers for the Chinese oak species in Quercus subgenus Quercus, a diverse species group without sufficient molecular differentiation. From this comparison, nine and 14 plastid markers were selected as barcoding and phylogeographic candidates, respectively, for the Chinese oaks. Five (psbA-trnH, matK-trnK, ycf3-trnS, matK, and ycf1) of the nine plastid candidate barcodes, together with newly designed ITS and a single-copy nuclear gene (SAP), were then tested on 35 Chinese oak species using four different barcoding approaches (genetic distance-, BLAST-, character-, and tree-based methods). The four methods showed different species identification power, with the character-based method performing best. None of the seven barcodes tested showed a barcoding gap across the Chinese oaks, while ITS and psbA-trnH provided the highest species resolution (30.30%) with the character- and BLAST-based methods, respectively. The six-marker combination (psbA-trnH + matK-trnK + matK + ycf1 + ITS + SAP) showed the best species resolution (84.85%) using the character-based method for barcoding the Chinese oaks. The barcoding results provided additional implications for the taxonomy of the Chinese oaks in subg. Quercus, basically identifying three major infrageneric clades of the Chinese oaks (corresponding to Groups Quercus, Cerris, and Ilex) with reference to the previous phylogenetic classification of Quercus, whereas the morphology-based allocations proposed for the Chinese oaks in subg. Quercus were challenged.
A low variation rate of the chloroplast genome, and complex speciation patterns involving incomplete lineage sorting, interspecific hybridization and introgression, possibly have negative impacts on the species assignment and phylogeny of oak species.
Affiliation(s)
- Jia Yang
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi'an, China
- Lucía Vázquez
- Biology Department, University of Illinois at Springfield, Springfield, IL, United States
- Xiaodan Chen
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi'an, China
- Huimin Li
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi'an, China
- Hao Zhang
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi'an, China
- Zhanlin Liu
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi'an, China
- Guifang Zhao
- Key Laboratory of Resource Biology and Biotechnology in Western China, Ministry of Education, College of Life Sciences, Northwest University, Xi'an, China
22
Horn T, Häser A. Bamboo tea: reduction of taxonomic complexity and application of DNA diagnostics based on rbcL and matK sequence data. PeerJ 2016; 4:e2781. [PMID: 27957401 PMCID: PMC5149056 DOI: 10.7717/peerj.2781] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/24/2016] [Accepted: 11/10/2016] [Indexed: 11/30/2022]
Abstract
Background: Names used in ingredient lists of food products are trivial names and are by their nature rarely precise. The most recent scientific interpretation of the term bamboo (Bambusoideae, Poaceae) comprises over 1,600 distinct species. In the European Union, only a few of these exotic species are well-known sources of food ingredients (i.e., bamboo sprouts) and are thus not considered novel foods, which would require safety assessments before corresponding products are marketed. In contrast, the use of bamboo leaves and their taxonomic origin is mostly unclear. Nevertheless, products containing bamboo leaves are currently marketed.
Methods: We analysed bamboo species and tea products containing bamboo leaves using anatomical leaf characters and DNA sequence data. To reduce the taxonomic complexity associated with the term bamboo, we used a phylogenetic framework to trace the origin of DNA from commercially available bamboo leaves within the bambusoid subfamily. For authentication purposes, we introduced a simple PCR-based test distinguishing genuine bamboo from other leaf components and assessed the diagnostic potential of rbcL and matK to resolve taxonomic entities within the bamboo subfamily and tribes.
Results: Based on anatomical and DNA data, we were able to trace the taxonomic origin of bamboo leaves used in products to the genera Phyllostachys and Pseudosasa from the temperate “woody” bamboo tribe (Arundinarieae). Currently available rbcL and matK sequence data allow the character-based diagnosis of 80% of the represented bamboo genera. We detected adulteration by carnation in four of eight tea products and, after adapting our objectives, could trace the taxonomic origin of the adulterant to Dianthus chinensis (Caryophyllaceae), a well-known traditional Chinese medicine with contraindications for pregnant women.
Affiliation(s)
- Thomas Horn
- Molecular Cell Biology, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Annette Häser
- Molecular Cell Biology, Karlsruhe Institute of Technology, Karlsruhe, Germany
23
Meher PK, Sahu TK, Rao AR. Identification of species based on DNA barcode using k-mer feature vector and Random forest classifier. Gene 2016; 592:316-24. [PMID: 27393648 DOI: 10.1016/j.gene.2016.07.010] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 05/02/2016] [Revised: 07/02/2016] [Accepted: 07/04/2016] [Indexed: 11/17/2022]
Abstract
DNA barcoding is a molecular diagnostic method that allows automated and accurate identification of species based on a short, standardized fragment of DNA. In this study, an attempt has been made to develop a computational approach for identifying species by comparing their barcodes with the barcode sequences of known species in a reference library. Each barcode sequence was first mapped onto a numeric feature vector based on k-mer frequencies, and the Random forest methodology was then applied to the transformed dataset for species identification. When compared using real and simulated datasets, the proposed approach outperformed similarity-based, tree-based and diagnostic-based approaches, and was found comparable with existing supervised-learning-based approaches in terms of species identification success rate. Based on the proposed approach, an online web interface, SPIDBAR, has also been developed and made freely available at http://cabgrid.res.in:8080/spidbar/ for species identification by taxonomists.
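The first step of this approach, mapping each barcode onto a k-mer frequency vector, can be sketched in Python as below. The choice of k, the lexicographic ordering of k-mers and the skipping of windows that contain ambiguous bases are assumptions for illustration, not details taken from the paper; `kmer_vector` is a hypothetical name.

```python
from itertools import product

def kmer_vector(seq, k=3):
    """Relative frequencies of all 4**k DNA k-mers, in lexicographic order (AAA, AAC, ...).
    Sliding windows that contain a symbol other than A, C, G or T are skipped."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        j = index.get(seq[start:start + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)

vec = kmer_vector("AAAC", k=2)  # windows: AA, AA, AC
print(len(vec), vec[:2])
```

Every sequence, whatever its length, becomes a vector of the same size (4**k), so a standard classifier can be trained on the resulting matrix. With scikit-learn, for example, `RandomForestClassifier().fit(X, y)` would take these vectors as `X` and species labels as `y`; the parameter settings actually used in the paper are not given in the abstract.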
Affiliation(s)
- Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
- Tanmaya Kumar Sahu
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
- A R Rao
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
24
Wu HY, Wang YH, Xie Q, Ke YL, Bu WJ. Molecular classification based on apomorphic amino acids (Arthropoda, Hexapoda): Integrative taxonomy in the era of phylogenomics. Sci Rep 2016; 6:28308. [PMID: 27312960 PMCID: PMC4911608 DOI: 10.1038/srep28308] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/02/2016] [Accepted: 05/31/2016] [Indexed: 11/10/2022]
Abstract
With major developments in sequencing technologies and systematic methods, our understanding of evolutionary relationships at deeper levels within the tree of life has greatly improved over the last decade. However, the current taxonomic methodology is insufficient to describe the growing levels of diversity in a standardised and general way, owing to the limitations of using only morphological traits to describe clades. Herein, we propose the idea of a molecular classification based on hierarchical and discrete amino acid characters. Clades are classified based on the results of phylogenetic analyses and described using amino acids with group specificity in phylograms. Applications to recently published phylogenomic datasets of insects, together with 15 transcriptomes sequenced de novo in this study, demonstrate that such a methodology can accommodate various higher taxonomic ranks. The approach has the advantage of describing organisms in a standard and discrete way within a phylogenetic framework, thereby facilitating the recognition of clades from the view of the whole lineage, as indicated by the PhyloCode. By combining identification keys and phylogenies, molecular classification based on hierarchical and discrete characters may greatly boost the progress of integrative taxonomy.
Affiliation(s)
- Hao-Yang Wu
- Institute of Entomology, College of Life Sciences, Nankai University, Tianjin 300071, China
- Yan-Hui Wang
- Institute of Entomology, College of Life Sciences, Nankai University, Tianjin 300071, China
- College of Computer and Control Engineering, Nankai University, 38 Tongyan Road, Haihe Education Park, Jinnan District, Tianjin 300350, China
- Qiang Xie
- Institute of Entomology, College of Life Sciences, Nankai University, Tianjin 300071, China
- Yun-Ling Ke
- Guangdong Entomological Institute, Guangzhou 510260, China
- Wen-Jun Bu
- Institute of Entomology, College of Life Sciences, Nankai University, Tianjin 300071, China
25
Zou S, Li Q. Pay Attention to the Overlooked Cryptic Diversity in Existing Barcoding Data: the Case of Mollusca with Character-Based DNA Barcoding. Mar Biotechnol (NY) 2016; 18:327-335. [PMID: 26899167 DOI: 10.1007/s10126-016-9692-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/28/2015] [Accepted: 01/28/2016] [Indexed: 06/05/2023]
Abstract
Amid the global biodiversity crisis, DNA barcoding aims at fast species identification and the revelation of cryptic species diversity. For more than 10 years, large amounts of DNA barcode data have been accumulating in publicly available databases, most of them analysed with distance or tree-building methods whose suitability, especially for revealing cryptic species, has often been debated. In this context, overlooked cryptic diversity may exist in the available barcoding data. Character-based DNA barcoding, however, has a good chance of detecting this overlooked cryptic diversity. In this study, marine mollusks were used as an ideal case for detecting overlooked potential cryptic species in existing cytochrome c oxidase I (COI) sequences with character-based DNA barcoding. A total of 1081 COI sequences of mollusks, belonging to 176 species of 25 families of Gastropoda, Cephalopoda, and Lamellibranchia, were subjected to character analysis. Overall, the character-based barcoding results were consistent with previous distance and tree-building analyses for species discrimination. More importantly, quite a number of the species analyzed were divided into distinct clades with unique diagnostic characters. Based on the concept of cryptic species revelation in character-based barcoding, these species divided into separate taxonomic groups might be potential cryptic species. The detection of overlooked potential cryptic diversity shows that the character-based barcoding mode has advantages for revealing cryptic biodiversity. As DNA barcoding develops, making the best use of existing barcoding data deserves attention for species conservation.
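The idea of a unique diagnostic character can be illustrated with a simplified Python sketch: for each group, it lists alignment positions where every member shares one nucleotide that no sequence outside the group has. This covers only the simplest kind of character (often called a simple pure character); character-based tools such as CAOS also handle other character types, which are not shown. The group names and `pure_diagnostic_sites` are hypothetical.

```python
def pure_diagnostic_sites(groups):
    """groups: dict group name -> list of aligned sequences of equal length.

    Returns group -> list of (position, state) where every member of the group
    shares `state` and no sequence outside the group has that state there.
    Gap characters ('-') are never reported as diagnostic.
    """
    length = len(next(iter(groups.values()))[0])
    result = {}
    for name, seqs in groups.items():
        others = [s for g, ss in groups.items() if g != name for s in ss]
        sites = []
        for i in range(length):
            states = {s[i] for s in seqs}
            if len(states) == 1:  # fixed within the group
                state = states.pop()
                if state != "-" and all(o[i] != state for o in others):
                    sites.append((i, state))  # absent from every other group
        result[name] = sites
    return result

# Toy alignment (hypothetical data, not from the study)
groups = {"clade1": ["ACGT", "ACGA"], "clade2": ["TCGT", "TCCA"]}
print(pure_diagnostic_sites(groups))
# {'clade1': [(0, 'A')], 'clade2': [(0, 'T')]}
```

Run on clades found within one nominal species, such a check would flag clades that carry their own diagnostic characters, which is the pattern the study interprets as potential cryptic species.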
Affiliation(s)
- Shanmei Zou
- Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, Qingdao, 266003, China
- Jiangsu Provincial Key Laboratory of Marine Biology, College of Resources and Environmental Science, Nanjing Agricultural University, Nanjing, 210095, China
- Qi Li
- Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, Qingdao, 266003, China.
26
Zou S, Fei C, Song J, Bao Y, He M, Wang C. Combining and Comparing Coalescent, Distance and Character-Based Approaches for Barcoding Microalgaes: A Test with Chlorella-Like Species (Chlorophyta). PLoS One 2016; 11:e0153833. [PMID: 27092945 PMCID: PMC4841637 DOI: 10.1371/journal.pone.0153833] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 11/17/2015] [Accepted: 03/13/2016] [Indexed: 01/19/2023]
Abstract
Several different barcoding methods for distinguishing species have been advanced, but which method is best is still controversial. Chlorella is becoming particularly promising for the development of second-generation biofuels. However, the taxonomy of Chlorella-like organisms is easily confused. Here we report a comprehensive barcoding analysis of Chlorella-like species from Chlorella, Chloroidium, Dictyosphaerium and Actinastrum based on rbcL, ITS, tufA and 16S sequences to test the efficiency of traditional barcoding, GMYC, ABGD, PTP, P ID and character-based barcoding methods. First, the barcoding results gave new insights into the taxonomic assessment of the Chlorella-like organisms studied, including clear species discrimination and the resolution of potentially cryptic species complexes in C. sorokiniana, D. ehrenbergianum and C. vulgaris. tufA proved to be the most efficient barcoding locus and thus could serve as a potential "specific barcode" for Chlorella-like species. 16S failed to discriminate most closely related species. The resolution of the GMYC, PTP, P ID, ABGD and character-based barcoding methods varied among the rbcL, ITS and tufA genes. The best resolution for species differentiation appeared in the tufA analysis, where the GMYC, PTP, ABGD and character-based approaches produced consistent groups while the PTP method over-split the taxa. The character analysis of rbcL, ITS and tufA sequences could clearly distinguish all taxonomic groups, respectively, including the potentially cryptic lineages, with many character attributes. Thus, character-based barcoding provides an attractive complement to coalescent and distance-based barcoding. Our study represents a test demonstrating the efficiency of multiple DNA barcoding in the species discrimination of microalgae.
Affiliation(s)
- Shanmei Zou
- Jiangsu Provincial Key Laboratory of Marine Biology, College of Resources and Environmental Science, Nanjing Agricultural University, Nanjing 210095, PR China
- Cong Fei
- Jiangsu Provincial Key Laboratory of Marine Biology, College of Resources and Environmental Science, Nanjing Agricultural University, Nanjing 210095, PR China
- Jiameng Song
- Jiangsu Provincial Key Laboratory of Marine Biology, College of Resources and Environmental Science, Nanjing Agricultural University, Nanjing 210095, PR China
- Yachao Bao
- Jiangsu Provincial Key Laboratory of Marine Biology, College of Resources and Environmental Science, Nanjing Agricultural University, Nanjing 210095, PR China
- Meilin He
- Jiangsu Provincial Key Laboratory of Marine Biology, College of Resources and Environmental Science, Nanjing Agricultural University, Nanjing 210095, PR China
- Changhai Wang
- Jiangsu Provincial Key Laboratory of Marine Biology, College of Resources and Environmental Science, Nanjing Agricultural University, Nanjing 210095, PR China
27
Weitschek E, Cunial F, Felici G. LAF: Logic Alignment Free and its application to bacterial genomes classification. BioData Min 2015; 8:39. [PMID: 26664519 PMCID: PMC4673791 DOI: 10.1186/s13040-015-0073-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 03/30/2015] [Accepted: 11/30/2015] [Indexed: 12/24/2022]
Abstract
Alignment-free algorithms can be used to estimate the similarity of biological sequences and hence are often applied to the phylogenetic reconstruction of genomes. Most of these algorithms rely on comparing the frequencies of all the distinct substrings of fixed length (k-mers) that occur in the analyzed sequences. In this paper, we present Logic Alignment Free (LAF), a method that combines alignment-free techniques and rule-based classification algorithms in order to assign biological samples to their taxa. This method searches for a minimal subset of k-mers whose relative frequencies are used to build classification models as disjunctive-normal-form logic formulas (if-then rules). We successfully apply LAF to the classification of bacterial genomes into their corresponding taxonomy. In particular, we obtain reliable classification at different taxonomic levels by extracting a handful of rules, each based on the frequencies of just a few k-mers. State-of-the-art methods for adjusting k-mer frequencies to the character distribution of the underlying genomes have a negligible impact on classification performance, suggesting that the signal of each class is strong and that LAF is effective in identifying it.
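LAF builds its rules on relative k-mer frequencies. The feature vector such a method starts from can be sketched as follows; this is an illustrative helper under our own assumptions (overlapping windows, windows containing non-ACGT symbols discarded), with a hypothetical name, and it omits LAF's search for a minimal k-mer subset and its rule induction.

```python
from collections import Counter
from itertools import product
from typing import Dict


def kmer_relative_frequencies(seq: str, k: int) -> Dict[str, float]:
    """Relative frequency of every possible DNA k-mer, counted over overlapping windows."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Start from all 4**k words so every sequence yields a vector of the same shape.
    vector = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for word, n in counts.items():
        if word in vector:  # windows with gaps or ambiguity codes are dropped
            vector[word] = n
    total = sum(vector.values())
    return {w: (n / total if total else 0.0) for w, n in vector.items()}


freqs = kmer_relative_frequencies("ACGTAC", 2)
print(freqs["AC"], freqs["CG"], len(freqs))  # 0.4 0.2 16
```

Because every sequence maps to the same 4^k coordinates regardless of its length, these vectors can be fed directly to a rule learner or any other classifier without aligning the sequences.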
Affiliation(s)
- Emanuel Weitschek
- Department of Engineering, Uninettuno International University, Corso Vittorio Emanuele II, 39, Rome, 00186 Italy ; Institute of Systems Analysis and Computer Science "A. Ruberti", National Research Council, Via dei Taurini 19, Rome, 00185 Italy
- Fabio Cunial
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, P.O. Box 68 (Gustaf Hällströmin katu 2b), Helsinki, FI-00014 Finland
- Giovanni Felici
- Institute of Systems Analysis and Computer Science "A. Ruberti", National Research Council, Via dei Taurini 19, Rome, 00185 Italy
28
Chen W, Ma X, Shen Y, Mao Y, He S. The fish diversity in the upper reaches of the Salween River, Nujiang River, revealed by DNA barcoding. Sci Rep 2015; 5:17437. [PMID: 26616046 PMCID: PMC4663501 DOI: 10.1038/srep17437] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 03/13/2015] [Accepted: 10/29/2015] [Indexed: 11/09/2022]
Abstract
Nujiang River (NR), an essential component of the biodiversity hotspot of the Mountains of Southwest China, possesses a characteristic fish fauna and contains endemic species. Whereas previous studies on fish diversity in the NR have primarily consisted of lists of the fish species observed during field collections, in our study we DNA-barcoded 1139 specimens belonging to 46 morphologically distinct fish species distributed throughout the NR basin, employing multiple analytical approaches. According to our analyses, DNA barcoding is an efficient method for identifying fish, as indicated by the presence of barcode gaps. However, three invasive species are characterized by deep conspecific divergences, generating multiple lineages and Operational Taxonomic Units (OTUs), which implies the possibility of cryptic species. At the other end of the spectrum, ten species (from three genera) that are characterized by an overlap between their intra- and interspecific genetic distances form a single genetic cluster and share haplotypes. The neighbor-joining phenogram, Barcode Index Numbers (BINs) and Automatic Barcode Gap Discovery (ABGD) identified 43 putative species, while the General Mixed Yule-coalescence (GMYC) identified five more OTUs. Thus, our study established a reliable DNA barcode reference library for the fish in the NR and shed new light on the local fish diversity.
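A barcode gap exists for a species when its largest intraspecific distance is smaller than its smallest distance to any other species; overlap, as reported above for ten species, means no gap. The following is a minimal sketch of that check, assuming pre-aligned sequences and the uncorrected p-distance (many barcoding studies use the K2P model instead, so treat the distance as a simplification). The function names are hypothetical and this is not the authors' analysis.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences; gaps/ambiguities skipped."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def barcode_gap(samples: List[Tuple[str, str]]) -> Dict[str, Tuple[float, float, bool]]:
    """samples: (species, aligned sequence) pairs. For each species return
    (max intraspecific distance, min distance to another species, gap present?)."""
    by_species: Dict[str, List[str]] = {}
    for species, seq in samples:
        by_species.setdefault(species, []).append(seq)
    report = {}
    for species, seqs in by_species.items():
        intra = [p_distance(a, b) for a, b in combinations(seqs, 2)]
        inter = [p_distance(a, b) for a in seqs
                 for other, oseqs in by_species.items() if other != species
                 for b in oseqs]
        max_intra = max(intra, default=0.0)  # singletons have no intraspecific pairs
        min_inter = min(inter) if inter else float("inf")
        report[species] = (max_intra, min_inter, max_intra < min_inter)
    return report


samples = [("sp1", "AAAAAAAAAA"), ("sp1", "AAAAAAAAAT"),
           ("sp2", "CCAAAAAAAA"), ("sp2", "CCAAAAAAAT")]
print(barcode_gap(samples)["sp1"])  # (0.1, 0.2, True)
```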
Affiliation(s)
- Weitao Chen
- The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei, 430072, China; Graduate School of Chinese Academy of Sciences, Beijing, 10001, China
- Xiuhui Ma
- School of Life Science, Southwest University, Beibei, Chongqing, 400715, China
- Yanjun Shen
- The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei, 430072, China; Graduate School of Chinese Academy of Sciences, Beijing, 10001, China
- Yuntao Mao
- The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei, 430072, China; Graduate School of Chinese Academy of Sciences, Beijing, 10001, China
- Shunping He
- The Key Laboratory of Aquatic Biodiversity and Conservation of Chinese Academy of Sciences, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei, 430072, China
29
Bhagwat RM, Dholakia BB, Kadoo NY, Balasundaran M, Gupta VS. Two New Potential Barcodes to Discriminate Dalbergia Species. PLoS One 2015; 10:e0142965. [PMID: 26569490 PMCID: PMC4646644 DOI: 10.1371/journal.pone.0142965] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 10/10/2014] [Accepted: 10/29/2015] [Indexed: 12/13/2022]
Abstract
DNA barcoding enables precise identification of species from analysis of the unique DNA sequence of a target gene. The present study was undertaken to develop barcodes for different species of the genus Dalbergia, an economically important timber plant that is widely distributed in the tropics. Ten Dalbergia species selected from the Western Ghats of India were evaluated using three regions of the plastid genome (matK, rbcL, trnH-psbA), a nuclear transcribed spacer (nrITS) and their combinations, in order to discriminate them at the species level. Five criteria, (i) inter- and intraspecific distances, (ii) Neighbor Joining (NJ) trees, (iii) Best Match (BM) and Best Close Match (BCM), (iv) a character-based rank test and (v) the Wilcoxon signed rank test, were used for species discrimination. Among the evaluated loci, rbcL had the highest success rate for amplification and sequencing (97.6%), followed by matK (97.0%), trnH-psbA (94.7%) and nrITS (80.5%). The inter- and intraspecific distances, along with the Wilcoxon signed rank test, indicated a higher divergence for nrITS. The BM and BCM approaches revealed the highest rate of correct species identification (100%) with the matK, matK+rbcL and matK+trnH-psbA loci. These three loci, along with nrITS, were further supported by the character-based identification method. Considering the overall performance of these loci and their ranking with different approaches, we suggest matK and matK+rbcL as the most suitable barcodes to unambiguously differentiate Dalbergia species. These findings will potentially be helpful in delineating the various species of the genus Dalbergia, as well as other related genera.
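Best Match (BM) and Best Close Match (BCM) score each query by its nearest reference sequence(s). The sketch below follows the scheme as it is usually described (commonly attributed to Meier et al., 2006): a query is "correct" if all equally close best matches are conspecific, "ambiguous" if they mix its own species with others, and "incorrect" if none are conspecific; BCM additionally returns "no id" when the best match exceeds a distance threshold. It assumes pre-aligned sequences, an uncorrected p-distance, a leave-one-out setup in which the caller excludes the query from `refs`, and a caller-supplied threshold; all names are hypothetical.

```python
from typing import List, Optional, Tuple


def p_distance(a: str, b: str) -> float:
    """Uncorrected distance over sites where both sequences have A/C/G/T.
    Returns 1.0 (maximally distant) when there are no comparable sites."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    return sum(x != y for x, y in pairs) / len(pairs) if pairs else 1.0


def best_match(query: Tuple[str, str], refs: List[Tuple[str, str]],
               threshold: Optional[float] = None) -> str:
    """Score one (species, sequence) query against (species, sequence) references.
    threshold=None gives BM; a numeric threshold gives BCM."""
    q_species, q_seq = query
    dists = [(p_distance(q_seq, seq), species) for species, seq in refs]
    best = min(d for d, _ in dists)
    if threshold is not None and best > threshold:
        return "no id"  # BCM: nearest match is too distant to trust
    tied = {species for d, species in dists if d == best}
    if tied == {q_species}:
        return "correct"
    if q_species in tied:
        return "ambiguous"
    return "incorrect"


refs = [("A", "AAAA"), ("B", "CCCC"), ("B", "AAAT")]
print(best_match(("A", "AAAC"), refs))                 # ambiguous
print(best_match(("A", "AAAC"), refs, threshold=0.1))  # no id
```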
Affiliation(s)
- Rasika M. Bhagwat
- Plant Molecular Biology Group, Biochemical Sciences Division, CSIR-National Chemical Laboratory, Pune, Maharashtra, India
- Bhushan B. Dholakia
- Plant Molecular Biology Group, Biochemical Sciences Division, CSIR-National Chemical Laboratory, Pune, Maharashtra, India
- Narendra Y. Kadoo
- Plant Molecular Biology Group, Biochemical Sciences Division, CSIR-National Chemical Laboratory, Pune, Maharashtra, India
- M. Balasundaran
- Forest Genetics and Biotechnology Division, Kerala Forest Research Institute, Peechi, Thrissur, Kerala, India
- Vidya S. Gupta
- Plant Molecular Biology Group, Biochemical Sciences Division, CSIR-National Chemical Laboratory, Pune, Maharashtra, India
30
Fiannaca A, La Rosa M, Rizzo R, Urso A. A k-mer-based barcode DNA classification methodology based on spectral representation and a neural gas network. Artif Intell Med 2015; 64:173-84. [PMID: 26170017 DOI: 10.1016/j.artmed.2015.06.002] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 08/27/2014] [Revised: 05/25/2015] [Accepted: 06/25/2015] [Indexed: 11/28/2022]
Abstract
OBJECTIVES In this paper, we propose an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering. METHODS In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, ripper, naïve Bayes, ridor, and classification tree, also considering short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted on 10 real barcode datasets belonging to different animal species, which were provided by the online resource "Barcode of Life Database". RESULTS The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when classification is performed on a set of short DNA sequences randomly extracted from the original data. For example, the proposed method reaches an accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best other classifier (random forest) reaches an accuracy of 20.9%. CONCLUSIONS Our results indicate a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments.
Affiliation(s)
- Antonino Fiannaca
- Institute of High-Performance Computing and Networking, National Research Council of Italy, Viale delle Scienze, Ed. 11, 90128 Palermo, Italy.
- Massimo La Rosa
- Institute of High-Performance Computing and Networking, National Research Council of Italy, Viale delle Scienze, Ed. 11, 90128 Palermo, Italy
- Riccardo Rizzo
- Institute of High-Performance Computing and Networking, National Research Council of Italy, Viale delle Scienze, Ed. 11, 90128 Palermo, Italy
- Alfonso Urso
- Institute of High-Performance Computing and Networking, National Research Council of Italy, Viale delle Scienze, Ed. 11, 90128 Palermo, Italy
31
Mutanen M, Kekkonen M, Prosser SWJ, Hebert PDN, Kaila L. One species in eight: DNA barcodes from type specimens resolve a taxonomic quagmire. Mol Ecol Resour 2015; 15:967-84. [PMID: 25524367 PMCID: PMC4964951 DOI: 10.1111/1755-0998.12361] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 02/03/2014] [Revised: 12/04/2014] [Accepted: 12/08/2014] [Indexed: 11/26/2022]
Abstract
Each holotype specimen provides the only objective link to a particular Linnean binomen. Sequence information from holotypes is increasingly valuable due to the growing use of DNA barcodes in taxonomy. As type specimens are often old, it may only be possible to recover fragmentary sequence information from them. We tested the efficacy of short sequences from type specimens in resolving a challenging taxonomic puzzle: the Elachista dispunctella complex, which includes 64 described species with minuscule morphological differences. We applied a multistep procedure to resolve the taxonomy of this species complex. First, we sequenced a large number of newly collected specimens and as many holotypes as possible. Second, we used all sequences >400 bp to examine species boundaries. We employed three unsupervised methods (BIN, ABGD, GMYC) with specified criteria for handling discordant results and examined diagnostic bases from each delineated putative species (operational taxonomic units, OTUs). Third, we evaluated the morphological characters of each OTU. Finally, we associated short barcodes from types with the delineated OTUs. In this step, we employed various supervised methods, including distance-based, tree-based and character-based ones. We recovered 658 bp barcode sequences from 194 of 215 fresh specimens and an average of 141 bp from 33 of 42 holotypes. We observed strong congruence among all methods and good correspondence with morphology. We demonstrate potential pitfalls of tree-, distance- and character-based approaches when associating sequences of varied length. Our results suggest that sequences as short as 56 bp can often provide valuable taxonomic information. The results support significant taxonomic oversplitting of species in the Elachista dispunctella complex.
Affiliation(s)
- Marko Mutanen
- Biodiversity Unit, Department of Biology, University of Oulu, P.O. Box 3000, FI-90014, Oulu, Finland
- Mari Kekkonen
- Zoology Unit, Finnish Museum of Natural History, University of Helsinki, P.O. Box 17, FI-00014, Helsinki, Finland; Biodiversity Institute of Ontario, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Sean W J Prosser
- Biodiversity Institute of Ontario, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Paul D N Hebert
- Biodiversity Institute of Ontario, University of Guelph, Guelph, ON, N1G 2W1, Canada
- Lauri Kaila
- Zoology Unit, Finnish Museum of Natural History, University of Helsinki, P.O. Box 17, FI-00014, Helsinki, Finland
32
Liu Q, Zhu F, Zhong G, Wang Y, Fang M, Xiao R, Cai Y, Guo P. COI-based barcoding of Chinese vipers (Reptilia: Squamata: Viperidae). AMPHIBIA-REPTILIA 2015. [DOI: 10.1163/15685381-00003012] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/12/2023]
Abstract
DNA barcoding seeks to assemble a standardized reference library for rapid and unambiguous identification of species, and can be used to screen for potentially cryptic species. The 5′ region of cytochrome oxidase subunit I (COI), a mitochondrial DNA (mtDNA) gene fragment, has been proposed as a universal marker for this purpose among animals. However, DNA barcoding of reptiles is still supported by only a few datasets compared with other groups. We investigated the use of COI to discriminate 34 putative species of vipers, representing almost 92% of the recorded species in China. Based on a total of 241 sequences, our results indicated that the average intraspecific variability (0.0198) tends to be one-sixth of the average interspecific divergence (0.0931), but no barcoding gap was detected between them. The threshold method, BLOG analyses and tree-based methods could all identify species with a high success rate. These results consistently suggest the usefulness and reliability of the DNA barcoding approach in Chinese vipers.
Affiliation(s)
- Qin Liu
- College of Life Sciences and Food Engineering, Yibin University, Yibin 644007, China
- College of Life Sciences, Sichuan University, Chengdu 610064, China
- Fei Zhu
- College of Life Sciences and Food Engineering, Yibin University, Yibin 644007, China
- College of Life Sciences, Sichuan University, Chengdu 610064, China
- Guanghui Zhong
- College of Life Sciences and Food Engineering, Yibin University, Yibin 644007, China
- College of Tourism and Urban-Rural Planning, Chengdu University of Technology, Chengdu 610059, China
- Yunyu Wang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
- Min Fang
- College of Life Sciences and Food Engineering, Yibin University, Yibin 644007, China
- Rong Xiao
- College of Life Sciences and Food Engineering, Yibin University, Yibin 644007, China
- Yansen Cai
- Department of Medical Biology and Genetics, Luzhou Medical College, Luzhou, 646000, China
- Peng Guo
- College of Life Sciences and Food Engineering, Yibin University, Yibin 644007, China
33
Čandek K, Kuntner M. DNA barcoding gap: reliable species identification over morphological and geographical scales. Mol Ecol Resour 2014; 15:268-77. [DOI: 10.1111/1755-0998.12304] [Citation(s) in RCA: 120] [Impact Index Per Article: 12.0] [Received: 05/10/2014] [Revised: 07/06/2014] [Accepted: 07/16/2014] [Indexed: 12/23/2022]
Affiliation(s)
- Klemen Čandek
- Institute of Biology; Scientific Research Centre of the Slovenian Academy of Sciences and Arts; Novi Trg 2 1000 Ljubljana Slovenia
- Matjaž Kuntner
- Institute of Biology; Scientific Research Centre of the Slovenian Academy of Sciences and Arts; Novi Trg 2 1000 Ljubljana Slovenia
- Centre for Behavioural Ecology and Evolution; College of Life Sciences; Hubei University; 368 Youyi Road 430062 Wuhan China
- Department of Entomology; National Museum of Natural History; Smithsonian Institution; PO Box 37012 Washington DC 20013-7012 USA
34
Classification of selectively constrained DNA elements using feature vectors and rule-based classifiers. Genomics 2014; 104:79-86. [PMID: 25058025 DOI: 10.1016/j.ygeno.2014.07.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 06/09/2014] [Accepted: 07/15/2014] [Indexed: 12/29/2022]
Abstract
Little work has been done on analyzing the composition of conserved non-coding elements (CNEs), which are identified by comparing two or more genomes and are found in all metazoan genomes. Here we present an analysis of CNEs with a methodology that takes into account word occurrence at various length scales, in the form of a feature vector representation and rule-based classifiers. We apply our approach to both protein-coding exons and CNEs originating from the human, insect (Drosophila melanogaster) and worm (Caenorhabditis elegans) genomes, which are either identified in the present study or obtained from the literature. Alignment-free feature vector representation of sequences combined with rule-based classification methods leads to successful classification of the different CNE classes. Biologically meaningful results are derived by comparison with the genomic signatures approach, and classification rates for a variety of functional elements of the genomes, along with surrogates, are presented.
35
Li J, Zheng X, Cai Y, Zhang X, Yang M, Yue B, Li J. DNA barcoding of Murinae (Rodentia: Muridae) and Arvicolinae (Rodentia: Cricetidae) distributed in China. Mol Ecol Resour 2014; 15:153-67. [PMID: 24838015 DOI: 10.1111/1755-0998.12279] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 01/30/2014] [Revised: 04/24/2014] [Accepted: 05/07/2014] [Indexed: 12/25/2022]
Abstract
Identification of rodents is very difficult, mainly due to high morphological similarity and controversial taxonomy. In this study, mitochondrial cytochrome oxidase subunit I (COI) was used as a DNA barcode to identify the Murinae and Arvicolinae species distributed in China and to facilitate systematic studies of Rodentia. In total, 242 sequences (31 species, 11 genera) from Murinae and 130 sequences (23 species, 6 genera) from Arvicolinae were investigated, of which 90 individuals were novel. Genetic distance, the threshold method, tree-based methods, online BLAST and BLOG were employed to analyse the data sets. There was no obvious barcode gap. The average K2P distance within species and genera was 2.10% and 12.61% in Murinae, and 2.86% and 11.80% in Arvicolinae, respectively. The optimal threshold was 5.62% for Murinae and 3.34% for Arvicolinae. All phylogenetic trees exhibited similar topologies and could distinguish 90.32% of the surveyed species in Murinae and 82.60% in Arvicolinae with high support values. BLAST analyses yielded similar results, with identification success rates of 92.15% and 93.85% for Murinae and Arvicolinae, respectively. BLOG successfully authenticated 100% of the detected species except Leopoldamys edwardsi, based on the latest taxonomic revision. Our results support the species status of the recently recognized Micromys erythrotis, Eothenomys tarquinius and E. hintoni and confirm the important roles of comprehensive taxonomy and accurate morphological identification in DNA barcoding studies. We believe that, when proper analytic methods are applied or combined, DNA barcoding could serve as an accurate and effective species identification approach for Murinae and Arvicolinae based on a proper taxonomic framework.
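The abstract reports average K2P distances and an optimal distance threshold. As a hedged illustration, the Kimura two-parameter distance between two aligned sequences is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transition and transversion differences. The function below computes it; its name and the choice to skip gaps and ambiguity codes pair by pair are our assumptions, and this is not the authors' pipeline.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.
    Sites where either sequence is not A/C/G/T are skipped (pairwise deletion)."""
    transitions = transversions = sites = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return math.inf  # saturated: the logarithms are undefined
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # 0.1116
```

Under the threshold method, a query would be assigned to its nearest reference only when this distance falls below a cut-off, such as the 5.62% (0.0562) reported above for Murinae.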
Affiliation(s)
- Jing Li
- Key Laboratory of Bio-Resources and Eco-Environment (Ministry of Education), College of Life Sciences, Sichuan University, Chengdu, 610065, China
36
Weitschek E, Fiscon G, Felici G. Supervised DNA Barcodes species classification: analysis, comparisons and results. BioData Min 2014; 7:4. [PMID: 24721333 PMCID: PMC4022351 DOI: 10.1186/1756-0381-7-4] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 11/18/2013] [Accepted: 04/05/2014] [Indexed: 11/15/2022]
Abstract
BACKGROUND Specific fragments from short portions of DNA (e.g., mitochondrial, nuclear, and plastid sequences) have been defined as DNA Barcodes and can be used as markers for organisms of the main kingdoms of life. Species classification with DNA Barcode sequences has been proven effective on different organisms. Indeed, specific gene regions have been identified as Barcodes: COI in animals, rbcL and matK in plants, and ITS in fungi. The classification problem consists of assigning an unknown specimen to a known species by analyzing its Barcode. This task has to be supported with reliable methods and algorithms. METHODS In this work, the efficacy of supervised machine learning methods for classifying species with DNA Barcode sequences is shown. The Weka software suite, which includes a collection of supervised classification methods, is adopted to address the task of DNA Barcode analysis. Classifier families are tested on synthetic and empirical datasets belonging to the animal, fungus, and plant kingdoms. In particular, the function-based method Support Vector Machines (SVM), the rule-based RIPPER, the decision tree C4.5, and the Naïve Bayes method are considered. Additionally, the classification results are compared with those of ad hoc and well-established DNA Barcode classification methods. RESULTS Software that converts DNA Barcode FASTA sequences to the Weka format is released, to adapt different input formats and to allow execution of the classification procedure. The analysis of results on synthetic and real datasets shows that SVM and Naïve Bayes on average outperform the other classifiers considered, although they do not provide a human-interpretable classification model. Rule-based methods have slightly inferior classification performance, but deliver the species-specific positions and nucleotide assignments.
On synthetic data, the supervised machine learning methods obtain superior classification performance with respect to the traditional DNA Barcode classification methods. On empirical data, their classification performance is at a level comparable to the other methods. CONCLUSIONS The classification analysis shows that supervised machine learning methods are promising candidates for successfully handling the DNA Barcoding species classification problem, obtaining excellent performance. To conclude, a powerful tool to perform species identification is now available to the DNA Barcoding community.
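The authors release their own FASTA-to-Weka converter; the code below is not that tool. It is a minimal sketch of the idea under stated assumptions: sequences are already aligned to a common length, each FASTA header ends with the species label after a `|` (a hypothetical convention), each alignment position becomes one nominal ARFF attribute, and gaps or ambiguity codes become Weka's missing-value marker `?`. Function names are hypothetical.

```python
from typing import List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; multi-line sequences are joined."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def _quote(value: str) -> str:
    """Single-quote an ARFF string value, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def fasta_to_arff(text: str, relation: str = "barcodes") -> str:
    """Convert aligned FASTA with 'id|species' headers into Weka ARFF text."""
    records = read_fasta(text)
    if not records:
        raise ValueError("no FASTA records found")
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences must be aligned to a common length")
    labels = [header.split("|")[-1].strip() for header, _ in records]
    lines = [f"@RELATION {_quote(relation)}", ""]
    lines += [f"@ATTRIBUTE pos{i + 1} {{A,C,G,T}}" for i in range(length)]
    lines.append("@ATTRIBUTE class {" + ",".join(_quote(c) for c in sorted(set(labels))) + "}")
    lines += ["", "@DATA"]
    for (_, seq), label in zip(records, labels):
        # Gaps and ambiguity codes become the ARFF missing-value marker '?'.
        values = [base if base in "ACGT" else "?" for base in seq]
        lines.append(",".join(values + [_quote(label)]))
    return "\n".join(lines) + "\n"


example = ">s1|Homo sapiens\nACG-\n>s2|Pan\nACGT\n"
print(fasta_to_arff(example))
```

One nominal attribute per position is what lets a rule learner such as RIPPER report species-specific positions and nucleotides, as the abstract describes.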
Affiliation(s)
- Emanuel Weitschek
- Department of Engineering, Roma Tre University, Via della Vasca Navale, 79, 00146 Rome, Italy
- Institute of Systems Analysis and Computer Science Antonio Ruberti, National Research Council, Viale Manzoni, 30, 00185 Rome, Italy
- Giulia Fiscon
- Institute of Systems Analysis and Computer Science Antonio Ruberti, National Research Council, Viale Manzoni, 30, 00185 Rome, Italy
- Department of Computer, Control, and Management Engineering, Sapienza University, Via Ariosto, 25, 00185 Rome, Italy
- Giovanni Felici
- Institute of Systems Analysis and Computer Science Antonio Ruberti, National Research Council, Viale Manzoni, 30, 00185 Rome, Italy
37
Fan L, Hui JHL, Yu ZG, Chu KH. VIP Barcoding: composition vector-based software for rapid species identification based on DNA barcoding. Mol Ecol Resour 2014; 14:871-81. [PMID: 24479510 DOI: 10.1111/1755-0998.12235] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 08/09/2013] [Revised: 01/22/2014] [Accepted: 01/24/2014] [Indexed: 12/17/2022]
Abstract
Species identification based on short sequences of DNA markers, that is, DNA barcoding, has emerged as an integral part of modern taxonomy. However, software for the analysis of large and multilocus barcoding data sets is scarce. The Basic Local Alignment Search Tool (BLAST) is currently the fastest tool capable of handling large databases (e.g. >5000 sequences), but its accuracy is a concern, and it has been criticized for its local optimization. On the other hand, current, more accurate software requires sequence alignment or complex calculations, which are time-consuming when dealing with large data sets during data preprocessing or during the search stage. Therefore, it is imperative to develop a practical program for both accurate and scalable species identification for DNA barcoding. In this context, we present VIP Barcoding: user-friendly software with a graphical user interface for rapid DNA barcoding. It adopts a hybrid, two-stage algorithm. First, an alignment-free composition vector (CV) method is used to reduce the search space by screening a reference database. The alignment-based K2P distance nearest-neighbour method is then employed to analyse the smaller data set generated in the first stage. In comparison with other software, we demonstrate that VIP Barcoding has (i) higher accuracy than Blastn and several alignment-free methods and (ii) higher scalability than alignment-based distance methods and character-based methods. These results suggest that this platform is able to deal accurately with both large-scale and multilocus barcoding data and can contribute to DNA barcoding for modern taxonomy. VIP Barcoding is free and available at http://msl.sls.cuhk.edu.hk/vipbarcoding/.
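The two-stage idea (cheap alignment-free screening, then a more careful distance on a shortlist) can be sketched as follows. This is a simplified illustration, not VIP Barcoding's algorithm: stage 1 uses plain k-mer count profiles compared by cosine similarity rather than the composition-vector method, stage 2 uses an uncorrected p-distance on pre-aligned sequences instead of K2P, and the function names and defaults (`k=3`, `keep=2`) are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def kmer_counts(seq: str, k: int) -> Counter:
    """Counts of overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two k-mer count profiles (Counter returns 0 for absent words)."""
    dot = sum(n * b[w] for w, n in a.items())
    norm = math.sqrt(sum(n * n for n in a.values())) * math.sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0


def p_distance(a: str, b: str) -> float:
    """Uncorrected distance over sites where both aligned sequences have A/C/G/T."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    return sum(x != y for x, y in pairs) / len(pairs) if pairs else 1.0


def two_stage_identify(query: str, refs: List[Tuple[str, str]], k: int = 3, keep: int = 2) -> str:
    """Stage 1: shortlist the `keep` (species, sequence) references with the most similar
    k-mer profile. Stage 2: return the species of the closest shortlisted reference."""
    q_profile = kmer_counts(query, k)
    ranked = sorted(refs, key=lambda r: cosine(q_profile, kmer_counts(r[1], k)), reverse=True)
    shortlist = ranked[:keep]
    return min(shortlist, key=lambda r: p_distance(query, r[1]))[0]


refs = [("sp1", "ACGTACGTAC"), ("sp2", "TTTTTTTTTT"), ("sp3", "ACGTACGTTT")]
print(two_stage_identify("ACGTACGTAA", refs))  # sp1
```

Stage 1 never needs an alignment, so on a large reference database most sequences are discarded cheaply; only the `keep` survivors pay for the alignment-based comparison.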
Affiliation(s)
- Long Fan
- School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
38
de Boer HJ, Ouarghidi A, Martin G, Abbad A, Kool A. DNA barcoding reveals limited accuracy of identifications based on folk taxonomy. PLoS One 2014; 9:e84291. [PMID: 24416210 PMCID: PMC3885563 DOI: 10.1371/journal.pone.0084291] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 04/24/2013] [Accepted: 11/13/2013] [Indexed: 11/18/2022]
Abstract
BACKGROUND The trade in plant roots as traditional medicine is an important source of income for many people around the world. Destructive harvesting practices threaten the existence of some plant species. Harvesters of medicinal roots identify the collected species according to their own folk taxonomies, but once the dried or powdered roots enter the chain of commercialization, accurate identification becomes more challenging. METHODOLOGY A survey of morphological diversity among four root products traded in the medina of Marrakech was conducted. Fifty-one root samples were selected for molecular identification by DNA barcoding with three markers: trnH-psbA, rpoC1, and ITS. Sequences were searched using BLAST against a tailored reference database of Moroccan medicinal plants and their closest relatives submitted to NCBI GenBank. PRINCIPAL FINDINGS Combining psbA-trnH, rpoC1, and ITS allowed the majority of the market samples to be identified to species level. Few of the species-level barcoding identifications matched the scientific names given in the literature, including the most authoritative and widely cited pharmacopeia. CONCLUSIONS/SIGNIFICANCE The four root complexes selected from the medicinal plant products traded in Marrakech all comprise more than one species, but not those previously asserted. The findings have major implications for monitoring the trade in endangered plant species, as morphology-based species identifications alone may not be accurate. As a result, trade in certain species may be overestimated, whereas the commercialization of other species may not be recorded at all.
Affiliation(s)
- Hugo J. de Boer
- Department of Organismal Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden
- Naturalis Biodiversity Center, Leiden, The Netherlands
- Natural History Museum, University of Oslo, Oslo, Norway
- Abderrahim Ouarghidi
- Faculty of Science Semlalia, Cadi Ayyad University, Marrakech, Morocco
- Global Diversity Foundation, Marrakech, Morocco
- Gary Martin
- Global Diversity Foundation, Marrakech, Morocco
- Abdelaziz Abbad
- Faculty of Science Semlalia, Cadi Ayyad University, Marrakech, Morocco
- Anneleen Kool
- Natural History Museum, University of Oslo, Oslo, Norway

39
Nagy ZT, Sonet G, Mortelmans J, Vandewynkel C, Grootaert P. Using DNA barcodes for assessing diversity in the family Hybotidae (Diptera, Empidoidea). Zookeys 2013:263-78. [PMID: 24453562 PMCID: PMC3890682 DOI: 10.3897/zookeys.365.6070] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 08/07/2013] [Accepted: 11/27/2013] [Indexed: 12/31/2022]
Abstract
Empidoidea is one of the largest extant lineages of flies, but phylogenetic relationships among species of this group are poorly investigated and global diversity remains scarcely assessed. In this context, one of the most enigmatic empidoid families is Hybotidae. Within the framework of a pilot study, we barcoded 339 specimens of Old World hybotids belonging to 164 species and 22 genera (plus two Empis as outgroups) and attempted to evaluate whether patterns of intra- and interspecific divergences match the current taxonomy. We used a large sampling of diverse Hybotidae. The material came from the Palaearctic (Belgium, France, Portugal and Russian Caucasus), the Afrotropic (Democratic Republic of the Congo) and the Oriental realms (Singapore and Thailand). In the process, we optimized lab protocols for barcoding hybotids. Although DNA barcodes generally distinguished recognized taxa well, the study also revealed a number of unexpected phenomena: e.g., undescribed taxa found within morphologically very similar or identical specimens, especially when geographic distance was large; some morphologically distinct species that showed no genetic divergence; and different patterns of intraspecific divergence between populations or closely related species. Using COI sequences and simple Neighbour-Joining tree reconstructions, the monophyly of many species- and genus-level taxa was well supported, but more inclusive taxonomic levels did not receive significant bootstrap support. We conclude that in hybotids DNA barcoding may well be used to identify species, provided that two main constraints are considered. First, incomplete barcoding libraries hinder efficient (correct) identification; therefore, extra effort is needed to increase the representation of hybotids in these databases. Second, the spatial scale of sampling has to be taken into account, and especially for widespread species or species complexes with unclear taxonomy, an integrative approach has to be used to clarify species boundaries and identities.
Affiliation(s)
- Zoltán T Nagy
- Royal Belgian Institute of Natural Sciences, OD Taxonomy and Phylogeny (JEMU), Rue Vautierstraat 29, 1000 Brussels, Belgium
- Gontran Sonet
- Royal Belgian Institute of Natural Sciences, OD Taxonomy and Phylogeny (JEMU), Rue Vautierstraat 29, 1000 Brussels, Belgium
- Jonas Mortelmans
- Royal Belgian Institute of Natural Sciences, OD Taxonomy and Phylogeny (Entomology), Rue Vautierstraat 29, 1000 Brussels, Belgium
- Camille Vandewynkel
- Laboratoire des Sciences de l'eau et environnement, Faculté des Sciences et Techniques, Avenue Albert Thomas, 23, 87060 Limoges, France
- Patrick Grootaert
- Royal Belgian Institute of Natural Sciences, OD Taxonomy and Phylogeny (Entomology), Rue Vautierstraat 29, 1000 Brussels, Belgium

40
Frey JE, Guillén L, Frey B, Samietz J, Rull J, Aluja M. Developing diagnostic SNP panels for the identification of true fruit flies (Diptera: Tephritidae) within the limits of COI-based species delimitation. BMC Evol Biol 2013; 13:106. [PMID: 23718854 PMCID: PMC3682933 DOI: 10.1186/1471-2148-13-106] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Received: 11/05/2012] [Accepted: 05/15/2013] [Indexed: 11/14/2022]
Abstract
Background Rapid and reliable identification of quarantine pests is essential for plant inspection services to prevent the introduction of invasive species. For insects, this may be a serious problem when dealing with morphologically similar cryptic species complexes and early developmental stages that lack distinctive characters useful for taxonomic identification. DNA-based barcoding could solve many of these problems. The standard barcode fragment, an approximately 650-base-pair sequence from the 5′ end of the mitochondrial cytochrome oxidase I (COI) gene, enables differentiation of a very wide range of arthropods. However, problems remain in some taxa, such as Tephritidae, where recent genetic differentiation among some of the described species hinders accurate molecular discrimination. Results In order to explore the full species discrimination potential of COI, we sequenced the barcoding region of the COI gene of a range of economically important Tephritid species and complemented these data with all GenBank and BOLD entries for the systematic group available as of January 2012. We explored the limits of species delimitation of this barcode fragment among 193 putative Tephritid species and established operational taxonomic units (OTUs) between which discrimination is reliably possible. Furthermore, to enable future development of rapid diagnostic assays based on this sequence information, we characterized all single nucleotide polymorphisms (SNPs) and established “near-minimal” sets of SNPs that differentiate among all included OTUs with at least three and four SNPs, respectively. Conclusions We found that although several species cannot be differentiated based on the genetic diversity observed in COI and hence form composite OTUs, 85% of all OTUs correspond to described species. Because our SNP panels are developed based on all currently available sequence information and rely on a minimal pairwise difference of three SNPs, they are highly reliable and hence represent an important resource for developing taxon-specific diagnostic assays. For selected cases, possible explanations for composite OTUs are discussed.
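To make the "minimal pairwise difference" criterion concrete, here is a minimal Python sketch, not the authors' procedure. It assumes each OTU is represented by a single aligned, gap-free sequence (a simplification, since real OTUs carry within-OTU variation) and uses a greedy, set-cover-style heuristic to add alignment columns until every OTU pair differs at k or more of them. The function names `snp_differences`, `min_pairwise_difference` and `greedy_panel` and the toy sequences are hypothetical.

```python
from itertools import combinations


def snp_differences(seq_a, seq_b, positions):
    """Count panel positions at which two aligned sequences differ."""
    return sum(1 for p in positions if seq_a[p] != seq_b[p])


def min_pairwise_difference(otus, positions):
    """Smallest number of differing panel positions over all OTU pairs."""
    return min(
        snp_differences(otus[a], otus[b], positions)
        for a, b in combinations(otus, 2)
    )


def greedy_panel(otus, k):
    """Greedily add alignment columns until every OTU pair differs at >= k of them.

    Set-cover style heuristic: the panel is small but not guaranteed minimal.
    Raises ValueError if some pair cannot be separated by k positions.
    """
    length = len(next(iter(otus.values())))
    pairs = list(combinations(otus, 2))
    panel = []
    while True:
        # Pairs still separated by fewer than k panel positions.
        deficient = [
            (a, b) for a, b in pairs
            if snp_differences(otus[a], otus[b], panel) < k
        ]
        if not deficient:
            return panel
        # Score each unused column by how many deficient pairs it separates.
        gains = {
            p: sum(1 for a, b in deficient if otus[a][p] != otus[b][p])
            for p in range(length)
            if p not in panel
        }
        best = max(gains, key=gains.get, default=None)
        if best is None or gains[best] == 0:
            raise ValueError(f"cannot separate every OTU pair by {k} SNPs")
        panel.append(best)


if __name__ == "__main__":
    # Hypothetical toy alignment, one representative sequence per OTU.
    otus = {"OTU_A": "ACGTACGTAC", "OTU_B": "ACGAACGTTC", "OTU_C": "TCGTACCTAC"}
    panel = greedy_panel(otus, k=2)
    print(panel, min_pairwise_difference(otus, panel))  # [0, 3, 6, 8] 2
```

The greedy choice keeps the panel small but, like any set-cover heuristic, does not guarantee the smallest possible one, which matches the spirit of a "near-minimal" panel.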
Affiliation(s)
- Juerg E Frey
- Federal Department of Economic Affairs FDEA, Agroscope Changins-Wädenswil Research Station ACW, Department of Plant Protection, Wädenswil, Switzerland.

41
Bhargava M, Sharma A. DNA barcoding in plants: evolution and applications of in silico approaches and resources. Mol Phylogenet Evol 2013; 67:631-41. [PMID: 23500333 DOI: 10.1016/j.ympev.2013.03.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 09/04/2012] [Revised: 02/28/2013] [Accepted: 03/01/2013] [Indexed: 02/03/2023]
Abstract
Bioinformatics has played an important role in the analysis of DNA barcoding data. The DNA barcoding process begins with collecting the available data from existing databases. Many databases have been developed in recent years, e.g. MMDBD (Medicinal Materials DNA Barcode Database), BioBarcode, etc. Where sequences are not available, they have to be generated in vitro, and the recently developed software ecoPrimers can help with this step. This is followed by multiple sequence alignment. Basic sequence statistics and phylogenetic analyses can then be computed with MEGA and PHYLIP/PAUP, respectively. Several recent tools for in silico and statistical analysis designed specifically for barcoding, viz. CAOS (Character-Based DNA Barcoding), BRONX (DNA Barcode Sequence Identification Incorporating Taxonomic Hierarchy and within Taxon Variability), Spider (Analysis of species identity and evolution, particularly DNA barcoding), jMOTU and Taxonerator (Turning DNA Barcode Sequences into Annotated OTUs), OTUbase (Analysis of OTU data and taxonomic data), SAP (Statistical Assignment Package), etc., are discussed and analysed in this review. The paper presents a comprehensive overview of the various in silico methods, tools, software and databases used for DNA barcoding of plants.
Affiliation(s)
- Mili Bhargava
- Biotechnology Division, Central Institute of Medicinal and Aromatic Plants, Council of Scientific and Industrial Research, PO, Lucknow 226 015, India.

42
D’Amato ME, Alechine E, Cloete KW, Davison S, Corach D. Where is the game? Wild meat products authentication in South Africa: a case study. Investigative Genetics 2013; 4:6. [PMID: 23452350 PMCID: PMC3621286 DOI: 10.1186/2041-2223-4-6] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 11/29/2012] [Accepted: 02/14/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Wild animals' meat is extensively consumed in South Africa, being obtained from ranching, farming or hunting. To test the authenticity of the commercial labels of meat products in the local market, we obtained DNA sequence information from 146 samples (14 beef and 132 game labels) for barcoding cytochrome c oxidase subunit I and partial cytochrome b and mitochondrial fragments. The reliability of species assignments was evaluated using BLAST searches in GenBank, maximum likelihood phylogenetic analysis and the character-based method implemented in BLOG. The Kimura two-parameter intra- and interspecific variation was evaluated for all matched species. RESULTS The combined application of similarity, phylogenetic and character-based methods proved successful in species identification. Game meat samples showed 76.5% substitution; no beef samples were substituted. The substitutions involved a variety of domestic species (cattle, horse, pig, lamb), common game species in the market (kudu, gemsbok, ostrich, impala, springbok), uncommon species in the market (giraffe, waterbuck, bushbuck, duiker, mountain zebra) and extra-continental species (kangaroo). The mountain zebra Equus zebra is an International Union for Conservation of Nature (IUCN) red-listed species. We also detected Damaliscus pygargus, which is composed of two subspecies, one of which is listed by IUCN as 'near threatened'; however, these mitochondrial fragments were insufficient to distinguish between the subspecies. The genetic distance between African ungulate species often overlaps with within-species distance in cases of recent speciation events, and strong phylogeographic structure produces within-species distances similar to the commonly accepted distances between species. CONCLUSIONS The reliability of commercial labeling of game meat in South Africa is very poor. The extensive substitution of wild game has important implications for conservation and commerce, and for consumers making decisions on the basis of health, religious beliefs or personal choices. Distance would be a poor indicator for identifying African ungulate species. The efficiency of the character-based method relies on the availability of large reference data sets. The current greater availability of cytochrome b data would make this the marker of choice for African ungulates. The encountered problems of incomplete or erroneous information in databases are discussed.
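The Kimura two-parameter (K2P) distance mentioned in this abstract corrects the raw proportion of differences by separating transitions (P) from transversions (Q): d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). A minimal Python sketch of that standard formula follows. Skipping gapped or ambiguous sites is an assumption of this sketch, not necessarily how the study handled them, and `k2p_distance` is a hypothetical name.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or a non-ACGT symbol are skipped.
    """
    valid = PURINES | PYRIMIDINES
    sites = transitions = transversions = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in valid or y not in valid:
            continue
        sites += 1
        if x == y:
            continue
        # A<->G and C<->T are transitions; purine<->pyrimidine changes are transversions.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # P: proportion of transitions
    q = transversions / sites  # Q: proportion of transversions
    a, b = 1 - 2 * p - q, 1 - 2 * q
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


if __name__ == "__main__":
    # One A->G transition over 10 comparable sites: P = 0.1, Q = 0.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```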
Affiliation(s)
- Maria Eugenia D’Amato
- Biotechnology Department, Forensic DNA Lab, University of the Western Cape, Modderdam Road, Bellville, 7535, South Africa
- Evguenia Alechine
- Servicio de Huellas Digitales Genéticas, School of Pharmacy and Biochemistry, University of Buenos Aires, Junín 956, Buenos Aires, 1113, Argentina
- Kevin Wesley Cloete
- Biotechnology Department, Forensic DNA Lab, University of the Western Cape, Modderdam Road, Bellville, 7535, South Africa
- Sean Davison
- Biotechnology Department, Forensic DNA Lab, University of the Western Cape, Modderdam Road, Bellville, 7535, South Africa
- Daniel Corach
- Servicio de Huellas Digitales Genéticas, School of Pharmacy and Biochemistry, University of Buenos Aires, Junín 956, Buenos Aires, 1113, Argentina

43
Weitschek E, Van Velzen R, Felici G, Bertolazzi P. BLOG 2.0: a software system for character-based species classification with DNA Barcode sequences. What it does, how to use it. Mol Ecol Resour 2013; 13:1043-6. [PMID: 23350601 DOI: 10.1111/1755-0998.12073] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 06/06/2012] [Revised: 12/19/2012] [Accepted: 12/22/2012] [Indexed: 11/30/2022]
Abstract
BLOG (Barcoding with LOGic) is a diagnostic and character-based DNA Barcode analysis method. Its aim is to classify specimens to species based on DNA Barcode sequences and on a supervised machine learning approach, using classification rules that compactly characterize species in terms of DNA Barcode locations of key diagnostic nucleotides. The BLOG 2.0 software, its fundamental modules, online/offline user interfaces and recent improvements are described. These improvements affect both methodology and software design, and lead to the availability of different releases on the website http://dmb.iasi.cnr.it/blog-downloads.php. Previous and new experimental tests show that BLOG 2.0 outperforms previous versions as well as other DNA Barcode analysis methods.
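BLOG itself learns logic formulas over diagnostic barcode positions, and its algorithm is not reproduced here. As a much simpler illustration of the character-based idea, the sketch below (hypothetical names `diagnostic_characters` and `classify`) keeps, for each species, single alignment positions where that species is fixed for a nucleotide that no other training species carries, then assigns a query to the species whose diagnostic characters it matches most often. It assumes equal-length aligned sequences and toy data.

```python
from collections import defaultdict


def diagnostic_characters(training):
    """Per species, find positions fixed for one nucleotide that no other
    species in the training set carries at that position.

    training: list of (species, aligned_sequence) pairs, all the same length.
    Returns {species: {position: nucleotide}}.
    """
    # species -> position -> set of nucleotides observed
    states = defaultdict(lambda: defaultdict(set))
    for species, seq in training:
        for pos, base in enumerate(seq):
            states[species][pos].add(base)
    rules = {}
    for species, by_pos in states.items():
        rules[species] = {}
        for pos, bases in by_pos.items():
            if len(bases) != 1:
                continue  # variable within the species, so not diagnostic
            (base,) = bases
            if all(base not in states[other][pos]
                   for other in states if other != species):
                rules[species][pos] = base
    return rules


def classify(query, rules):
    """Return the species whose diagnostic characters the query matches most
    often, or None if it matches none (ties go to the first species)."""
    scores = {
        species: sum(query[pos] == base for pos, base in chars.items())
        for species, chars in rules.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    training = [("sp1", "ACGTA"), ("sp1", "ACGTT"), ("sp2", "ATGCA"),
                ("sp2", "ATGCA"), ("sp3", "GCGCA")]
    rules = diagnostic_characters(training)
    print(rules)                     # {'sp1': {3: 'T'}, 'sp2': {1: 'T'}, 'sp3': {0: 'G'}}
    print(classify("ACGTC", rules))  # sp1
```

Single-position rules are the simplest possible "classification rules"; combining positions with logical AND/OR, as BLOG's formulas do, is what allows more compact and robust characterizations.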
Affiliation(s)
- Emanuel Weitschek
- Institute of Systems Analysis and Computer Science A. Ruberti, National Research Council, Viale Manzoni 30, 00185, Rome, Italy; Department of Informatics and Automation, Università degli Studi Roma Tre, Via della Vasca Navale 79, 00146, Rome, Italy

44
Zou S, Li Q, Kong L. Monophyly, distance and character-based multigene barcoding reveal extraordinary cryptic diversity in Nassarius: a complex and dangerous community. PLoS One 2012; 7:e47276. [PMID: 23071774 PMCID: PMC3469534 DOI: 10.1371/journal.pone.0047276] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 07/03/2012] [Accepted: 09/10/2012] [Indexed: 11/24/2022]
Abstract
BACKGROUND Correct identification of marine organisms and the revelation of their cryptic biodiversity are pressing, since marine life is important in maintaining the balance of ecological systems and faces a biodiversity crisis and food-safety problems. DNA barcoding has proved successful in providing resolution beyond the boundaries of morphological information. Nassarius, the common mudsnail, plays an important role in the marine environment and raises food-safety problems, but its classification is quite confused because of its complex morphological diversity. METHODOLOGY/PRINCIPAL FINDINGS Here we report a comprehensive barcoding analysis of 22 Nassarius species. We integrated mitochondrial and nuclear sequences and morphological characters to identify the 13 Nassarius species studied and to reveal four cryptic species and one pair of synonyms. Distance, monophyly, and character-based barcoding methods were employed. CONCLUSIONS/SIGNIFICANCE Such successful identification and unexpected cryptic discovery are significant for Nassarius in food safety and species conservation, and remind us to pay more attention to the hidden cryptic biodiversity ignored in marine life. Distance, monophyly, and character-based barcoding methods are all very helpful in identification, but the character-based method shows some advantages.
Affiliation(s)
- Shanmei Zou
- Key Laboratory of Mariculture Ministry of Education, Ocean University of China, Qingdao, China
- Qi Li
- Key Laboratory of Mariculture Ministry of Education, Ocean University of China, Qingdao, China
- Lingfeng Kong
- Key Laboratory of Mariculture Ministry of Education, Ocean University of China, Qingdao, China

45
Coissac E, Riaz T, Puillandre N. Bioinformatic challenges for DNA metabarcoding of plants and animals. Mol Ecol 2012; 21:1834-47. [PMID: 22486822 DOI: 10.1111/j.1365-294x.2012.05550.x] [Citation(s) in RCA: 160] [Impact Index Per Article: 13.3] [Indexed: 11/28/2022]
Abstract
Almost all empirical studies in ecology have to identify the species involved in the ecological process under examination. DNA metabarcoding, which couples the principles of DNA barcoding with next-generation sequencing technology, provides an opportunity to easily produce large amounts of data on biodiversity. Microbiologists have long used metabarcoding approaches, but use of this technique in the assessment of biodiversity in plant and animal communities is under-explored. Despite its relationship with DNA barcoding, several unique features of DNA metabarcoding justify the development of specific data analysis methodologies. In this review, we describe the bioinformatics tools available for DNA metabarcoding of plants and animals, and we revisit others developed for DNA barcoding or microbial metabarcoding. We also discuss the principles and associated tools for evaluating and comparing DNA barcodes in the context of DNA metabarcoding, for designing new custom-made barcodes adapted to specific ecological questions, for dealing with PCR and sequencing errors, and for inferring taxonomic data from sequences.
Affiliation(s)
- Eric Coissac
- Laboratoire d'Ecologie Alpine, CNRS UMR 5553, Université Joseph Fourier, Grenoble, France.

46
Weitschek E, Lo Presti A, Drovandi G, Felici G, Ciccozzi M, Ciotti M, Bertolazzi P. Human polyomaviruses identification by logic mining techniques. Virol J 2012; 9:58. [PMID: 22385517 PMCID: PMC3307486 DOI: 10.1186/1743-422x-9-58] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 01/20/2011] [Accepted: 03/02/2012] [Indexed: 11/16/2022]
Abstract
Background Differences in genomic sequences are crucial for the classification of viruses into different species. In this work, viral DNA sequences belonging to the human polyomaviruses BKPyV, JCPyV, KIPyV, WUPyV, and MCPyV are analyzed using a logic data mining method in order to identify the nucleotides that distinguish the five human polyomaviruses. Results The approach presented in this work is successful, as it discovers several logic rules that effectively characterize the five studied polyomaviruses. The identified logic rules precisely separate each viral type from the others and can assign an unknown DNA sequence to one of the five analyzed polyomaviruses. Conclusions The data mining analysis is performed by considering both the complete viral sequences and the sequences of the different gene regions separately, obtaining extremely high correct recognition rates in both cases.
Affiliation(s)
- Emanuel Weitschek
- Institute of Systems Analysis and Computer Science "A. Ruberti", National Research Council, Viale Manzoni 30, 00185 Rome, Italy.

47
van Velzen R, Weitschek E, Felici G, Bakker FT. DNA barcoding of recently diverged species: relative performance of matching methods. PLoS One 2012; 7:e30490. [PMID: 22272356 PMCID: PMC3260286 DOI: 10.1371/journal.pone.0030490] [Citation(s) in RCA: 124] [Impact Index Per Article: 10.3] [Received: 06/22/2011] [Accepted: 12/22/2011] [Indexed: 12/23/2022]
Abstract
Recently diverged species are challenging for identification, yet they are frequently of special interest scientifically as well as from a regulatory perspective. DNA barcoding has proven instrumental in species identification, especially in insects and vertebrates, but for the identification of recently diverged species it has been reported to be problematic in some cases. Problems are mostly due to incomplete lineage sorting or simply the lack of a 'barcode gap', and are probably related to large effective population size and/or low mutation rate. Our objective was to compare six methods in their ability to correctly identify recently diverged species with DNA barcodes: neighbor joining and parsimony (both tree-based), nearest neighbor and BLAST (similarity-based), and the diagnostic methods DNA-BAR and BLOG. We analyzed simulated data assuming three different effective population sizes as well as three selected empirical data sets from published studies. Results show, as expected, that success rates are significantly lower for recently diverged species (∼75%) than for older species (∼97%) (P<0.00001). Similarity-based and diagnostic methods significantly outperform tree-based methods when applied to simulated DNA barcode data (P<0.00001). The diagnostic method BLOG had the highest correct query identification rate on both simulated (86.2%) and empirical data (93.1%), indicating that it is a consistently better method overall. Another advantage of BLOG is that it offers species-level information that can be used outside the realm of DNA barcoding, for instance in species description or molecular detection assays. Even though we can confirm that identification success based on DNA barcoding is generally high in our data, recently diverged species remain difficult to identify. Nevertheless, our results contribute to improved solutions for their accurate identification.
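A species has a "barcode gap" when its largest within-species distance is smaller than its smallest distance to any other species. The following sketch checks this for toy data using uncorrected p-distances. The choice of p-distance rather than a corrected model such as K2P, the convention that singletons get a within-species distance of 0.0, and the names `p_distance` and `barcode_gap` are assumptions for illustration, not the study's protocol.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p-distance: proportion of aligned sites that differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(specimens):
    """Return {species: (max intraspecific, min interspecific) distance}.

    specimens: list of (species, aligned_sequence). Singletons get a
    maximum intraspecific distance of 0.0 by convention in this sketch.
    """
    intra, inter = {}, {}
    for (sp_a, seq_a), (sp_b, seq_b) in combinations(specimens, 2):
        d = p_distance(seq_a, seq_b)
        if sp_a == sp_b:
            intra[sp_a] = max(intra.get(sp_a, 0.0), d)
        else:
            for sp in (sp_a, sp_b):
                inter[sp] = min(inter.get(sp, float("inf")), d)
    species = dict.fromkeys(sp for sp, _ in specimens)  # first-seen order
    return {sp: (intra.get(sp, 0.0), inter.get(sp, float("inf"))) for sp in species}


if __name__ == "__main__":
    specimens = [("sp1", "AAAAAAAAAA"), ("sp1", "AAAAAAAAAT"),
                 ("sp2", "CCAAAAAAAA"), ("sp2", "CCAAAAAAAG")]
    for sp, (max_intra, min_inter) in barcode_gap(specimens).items():
        # A gap exists when the largest intraspecific distance is below
        # the smallest interspecific one.
        print(sp, max_intra, min_inter, max_intra < min_inter)
```

For recently diverged species the two numbers tend to overlap, which is exactly the situation in which distance thresholds fail and diagnostic methods are reported to do better.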
Affiliation(s)
- Robin van Velzen
- Biosystematics Group, Wageningen University, Wageningen, The Netherlands.

48
Zou S, Li Q, Kong L, Yu H, Zheng X. Comparing the usefulness of distance, monophyly and character-based DNA barcoding methods in species identification: a case study of neogastropoda. PLoS One 2011; 6:e26619. [PMID: 22039517 PMCID: PMC3200347 DOI: 10.1371/journal.pone.0026619] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Received: 07/04/2011] [Accepted: 09/29/2011] [Indexed: 02/01/2023]
Abstract
BACKGROUND DNA barcoding has recently been proposed as a promising tool for rapid species identification in a wide range of animal taxa. Two broad methods (distance- and monophyly-based) have been used. The first relies on the degree of DNA sequence variation within and between species, while the second requires the recovery of species as discrete clades (monophyly) on a phylogenetic tree. Nevertheless, some issues complicate the use of both methods. A recently applied technique, the character-based DNA barcode method, instead characterizes species through a unique combination of diagnostic characters. METHODOLOGY/PRINCIPAL FINDINGS Here we analyzed 108 COI and 102 16S rDNA sequences of 40 species of Neogastropoda from a wide phylogenetic range to assess the performance of distance, monophyly and character-based methods of DNA barcoding. The distance-based method performed poorly in terms of species identification for both COI and 16S rDNA genes. Obvious overlap between intraspecific and interspecific divergences was found for both genes. The "10× rule" threshold resulted in lumping about half of distinct species for both genes. The neighbour-joining phylogenetic tree of COI could distinguish all species studied. However, the 16S rDNA tree could not distinguish some closely related species. In contrast, the character-based barcode method successfully identified 100% of the neogastropod species included for both genes, and performed well in discriminating neogastropod genera. CONCLUSIONS/SIGNIFICANCE The present study demonstrates the effectiveness of the character-based barcoding method for species identification at different taxonomic levels, especially for discriminating closely related species. While distance and monophyly-based methods commonly use COI as the ideal gene for barcoding, the character-based approach can perform well for species identification using relatively conserved gene markers (e.g., 16S rDNA in this study). Nevertheless, distance and monophyly-based methods, especially the monophyly-based method, can still be used to flag species.
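The "10× rule" sets a species threshold at ten times the mean intraspecific distance. The sketch below shows one way to apply it to toy data. Pooling all within-species pairs into a single mean, comparing that threshold against the mean distance between each pair of species, using uncorrected p-distances, and the names `p_distance` and `ten_x_rule` are all assumptions of this illustration rather than the study's exact protocol. Species pairs falling under the threshold are the ones the rule would lump together.

```python
from itertools import combinations
from statistics import mean


def p_distance(a, b):
    """Uncorrected p-distance: proportion of aligned sites that differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def ten_x_rule(specimens, factor=10.0):
    """Apply a '10x rule' threshold to (species, aligned_sequence) pairs.

    threshold = factor * mean of all within-species pairwise distances.
    Returns the threshold and the species pairs whose mean between-species
    distance falls below it, i.e. the pairs the rule would lump together.
    """
    by_species = {}
    for sp, seq in specimens:
        by_species.setdefault(sp, []).append(seq)
    intra = [p_distance(a, b)
             for seqs in by_species.values()
             for a, b in combinations(seqs, 2)]
    if not intra:
        raise ValueError("need at least one species with two or more specimens")
    threshold = factor * mean(intra)
    lumped = []
    for sp_a, sp_b in combinations(sorted(by_species), 2):
        d = mean(p_distance(a, b)
                 for a in by_species[sp_a] for b in by_species[sp_b])
        if d < threshold:
            lumped.append((sp_a, sp_b))
    return threshold, lumped


if __name__ == "__main__":
    specimens = [("sp1", "A" * 20), ("sp1", "A" * 19 + "T"),
                 ("sp2", "G" * 20), ("sp2", "G" * 19 + "C"),
                 ("sp3", "A" * 18 + "CC")]
    threshold, lumped = ten_x_rule(specimens)
    print(threshold, lumped)
```

In this toy run sp3 differs from sp1 by only two sites, so the pair falls under the threshold; the same mechanism, applied to real data with overlapping intra- and interspecific divergences, explains how a fixed threshold can lump distinct species.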
Affiliation(s)
- Shanmei Zou
- Key Laboratory of Mariculture Ministry of Education Ocean University of China, Qingdao, China
- Qi Li
- Key Laboratory of Mariculture Ministry of Education Ocean University of China, Qingdao, China
- Lingfeng Kong
- Key Laboratory of Mariculture Ministry of Education Ocean University of China, Qingdao, China
- Hong Yu
- Key Laboratory of Mariculture Ministry of Education Ocean University of China, Qingdao, China
- Xiaodong Zheng
- Key Laboratory of Mariculture Ministry of Education Ocean University of China, Qingdao, China

49
The Barcode of Life Data Portal: bridging the biodiversity informatics divide for DNA barcoding. PLoS One 2011; 6:e14689. [PMID: 21818249 PMCID: PMC3144886 DOI: 10.1371/journal.pone.0014689] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 07/24/2010] [Accepted: 01/20/2011] [Indexed: 11/25/2022]
Abstract
With the volume of molecular sequence data being systematically generated globally, there is a need for centralized resources for data exploration and analytics. DNA Barcode initiatives are on track to generate a compendium of molecular sequence-based signatures for identifying animals and plants. To date, data exploration and analytic tools for these data have been available only in boutique form, often a frustrating hurdle for many researchers who may not have the resources to install or implement algorithms described by the analytic community. The Barcode of Life Data Portal (BDP) is a first step towards integrating the latest biodiversity informatics innovations with molecular sequence data from DNA barcoding. Through the establishment of community-driven standards, based on discussion with the Data Analysis Working Group (DAWG) of the Consortium for the Barcode of Life (CBOL), the BDP provides an infrastructure for incorporating existing and next-generation DNA barcode analytic applications in an open forum.

50
Abstract
More than 230,000 known species representing 31 metazoan phyla populate the world's oceans. Perhaps another 1,000,000 or more species remain to be discovered. There is reason for concern that species extinctions may outpace discovery, especially in diverse and endangered marine habitats such as coral reefs. DNA barcodes (i.e., short DNA sequences for species recognition and discrimination) are useful tools to accelerate species-level analysis of marine biodiversity and to facilitate conservation efforts. This review focuses on the usual barcode region for metazoans: an approximately 648-base-pair region of the mitochondrial cytochrome c oxidase subunit I (COI) gene. Barcodes have also been used for population genetic and phylogeographic analysis, identification of prey in gut contents, detection of invasive species, forensics, and seafood safety. More controversially, barcodes have been used to delimit species boundaries, reveal cryptic species, and discover new species. Emerging frontiers are the use of barcodes for rapid and increasingly automated biodiversity assessment by high-throughput sequencing, including environmental barcoding, and the use of barcodes to detect species for which formal identification or scientific naming may never be possible.
Affiliation(s)
- Ann Bucklin
- Department of Marine Sciences, University of Connecticut, Groton, Connecticut 06340, USA.