1
Ratnasingham S, Wei C, Chan D, Agda J, Agda J, Ballesteros-Mejia L, Boutou HA, El Bastami ZM, Ma E, Manjunath R, Rea D, Ho C, Telfer A, McKeowan J, Rahulan M, Steinke C, Dorsheimer J, Milton M, Hebert PDN. BOLD v4: A Centralized Bioinformatics Platform for DNA-Based Biodiversity Data. Methods Mol Biol 2024; 2744:403-441. [PMID: 38683334 DOI: 10.1007/978-1-0716-3581-0_26] [Indexed: 05/01/2024]
Abstract
BOLD, the Barcode of Life Data System, supports the acquisition, storage, validation, analysis, and publication of DNA barcodes, activities requiring the integration of molecular, morphological, and distributional data. Its pivotal role in curating the reference library of DNA barcodes, coupled with its data management and analysis capabilities, makes it a central resource for biodiversity science. It enables rapid, accurate identification of specimens and also reveals patterns of genetic diversity and evolutionary relationships among taxa. Launched in 2005, BOLD has become an increasingly powerful tool for advancing the understanding of planetary biodiversity. It currently hosts 17 million specimen records and 14 million barcodes that provide coverage for more than a million species from every continent and ocean. The platform has the long-term goal of providing a consistent, accurate system for identifying all species of eukaryotes. BOLD's integrated analytical tools, full data lifecycle support, and secure collaboration framework distinguish it from other biodiversity platforms. BOLD v4 brought enhanced data management and analysis capabilities as well as novel functionality for data dissemination and publication. Its next version will include features to strengthen its utility to the research community, governments, industry, and society at large.
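Specimen identification of the kind described above rests on comparing a query barcode against a reference library. The sketch below is a toy nearest-match lookup by percent identity over pre-aligned sequences. It is not BOLD's identification engine; the function names, the gap/`N` handling, and the two-record reference set are hypothetical illustrations.

```python
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent of matching sites between two pre-aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped rather than counted as mismatches.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        matches += int(x == y)
    return 100.0 * matches / compared if compared else 0.0


def best_match(query: str, library: Dict[str, str]) -> Tuple[str, float]:
    """Return (label, identity) for the reference record closest to the query."""
    return max(
        ((label, percent_identity(query, seq)) for label, seq in library.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical reference library keyed by taxon label.
reference = {"Species A": "ACGTACGTAC", "Species B": "ACGTTCGTTC"}
print(best_match("ACGTACGTAA", reference))  # ('Species A', 90.0)
```

A real identification workflow would also apply a similarity threshold and report ties or multiple close hits; this sketch returns only the single best hit.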
Affiliation(s)
- Catherine Wei
- Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
- Dean Chan
- Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
- Jireh Agda
- Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
- Josh Agda
- Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
- Hamza Ait Boutou
- Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
- Eddie Ma
- Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
- Ramya Manjunath
- Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
- Dana Rea
- Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
- Chris Ho
- Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
- Angela Telfer
- Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
- Jaclyn McKeowan
- Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
- Miduna Rahulan
- Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
- Claudia Steinke
- Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
- Justin Dorsheimer
- Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
- Megan Milton
- Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
- Paul D N Hebert
- College of Biological Science, University of Guelph, Guelph, ON, Canada
2
de Benedetta F, Gargiulo S, Miele F, Figlioli L, Innangi M, Audisio P, Nugnes F, Bernardo U. The spread of Carpophilus truncatus is on the razor's edge between an outbreak and a pest invasion. Sci Rep 2022; 12:18841. [PMID: 36344625 PMCID: PMC9640586 DOI: 10.1038/s41598-022-23520-2] [Received: 08/11/2022] [Accepted: 11/01/2022] [Indexed: 11/09/2022]
Abstract
In 2019, an outbreak of a sap beetle infesting stored walnut fruits occurred in southern Italy (Campania). Monitoring began to assess the spread and impact of the pest in walnut orchards and warehouses, and an integrative characterization identified the beetle as Carpophilus truncatus. This species has long been present in Europe, where it was rare and harmless until recently. We also show that this species is the same one recently recorded on two other continents, Latin America and Australia, where it is causing massive damage to walnut and almond fruits. The sharing of a mitochondrial haplotype among populations recorded on three continents suggests that a worldwide invasion might be ongoing. A Geographic Profiling approach indicated that the more virulent population was first introduced in Italy, and the climatic conditions of the areas where C. truncatus is currently widespread and harmful indicate that the entire world walnut production is in jeopardy, as this species could adapt to any of the main walnut and almond production areas.
Affiliation(s)
- Flavia de Benedetta
- Institute for Sustainable Plant Protection - IPSP-CNR, National Research Council, P.le E. Fermi, 1, 80055 Portici, NA Italy; Department of Agricultural Sciences, University of Napoli Federico II, Via Università, 100, 80055 Portici, NA Italy
- Simona Gargiulo
- Institute for Sustainable Plant Protection - IPSP-CNR, National Research Council, P.le E. Fermi, 1, 80055 Portici, NA Italy
- Fortuna Miele
- Institute for Sustainable Plant Protection - IPSP-CNR, National Research Council, P.le E. Fermi, 1, 80055 Portici, NA Italy
- Laura Figlioli
- Institute for Sustainable Plant Protection - IPSP-CNR, National Research Council, P.le E. Fermi, 1, 80055 Portici, NA Italy
- Michele Innangi
- Department of Biosciences and Territory, University of Molise, Contrada Fonte Lappone, 86090 Pesche, IS Italy
- Paolo Audisio
- Department of Biology and Biotechnologies “C. Darwin”, “Sapienza” University of Rome, Via A. Borelli, 50, 00161 Rome, Italy
- Francesco Nugnes
- Institute for Sustainable Plant Protection - IPSP-CNR, National Research Council, P.le E. Fermi, 1, 80055 Portici, NA Italy
- Umberto Bernardo
- Institute for Sustainable Plant Protection - IPSP-CNR, National Research Council, P.le E. Fermi, 1, 80055 Portici, NA Italy
3
Phillips JD, Gillis DJ, Hanner RH. Lack of Statistical Rigor in DNA Barcoding Likely Invalidates the Presence of a True Species' Barcode Gap. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.859099] [Indexed: 11/13/2022]
Abstract
DNA barcoding has been largely successful in exposing levels of standing genetic diversity for a wide range of taxonomic groups through the use of only one or a few universal gene markers. However, coverage of geographically broad intraspecific haplotype variation within genomic databases such as the Barcode of Life Data System (BOLD) and GenBank remains relatively sparse. As reference sequence libraries continue to grow exponentially, there is now a need for novel ways of meaningfully analyzing the vast amounts of available DNA barcode data. This issue is important to address promptly for the routine tasks of specimen identification and species discovery, which have seen broad adoption in areas as diverse as regulatory forensics and resource conservation. Here, it is demonstrated that the interpretation of DNA barcoding data lacks statistical rigor. To highlight this, the focus is placed on one key concept that has become a household name in the field: the DNA barcode gap. The arguments outlined herein center on DNA barcoding in animal taxa and stem from three angles: (1) the improper allocation of the specimen sampling effort necessary to capture adequate levels of within-species genetic variation, (2) the failure to properly visualize intraspecific and interspecific genetic distances, and (3) the inconsistent or inappropriate use, or absence, of statistical inferential procedures in DNA barcode gap analyses. Furthermore, simple statistical solutions are outlined that can greatly advance the use of DNA barcoding as a tool to irrefutably match unknowns to knowns on the basis of the barcode gap with a high degree of confidence. The proposed methods are illustrated through application to DNA barcode sequence data from Canadian Pacific fish species as a case study.
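The barcode gap discussed above is conventionally reported per species by comparing the largest within-species distance with the smallest between-species distance. The sketch below computes that conventional point estimate using uncorrected p-distances on pre-aligned toy sequences; the distance choice, function names, and data are assumptions for illustration. It performs no sampling-effort correction or statistical inference, which is exactly the shortcoming the abstract raises.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(records: List[Tuple[str, str]]) -> Dict[str, float]:
    """Per species: smallest interspecific distance minus largest intraspecific distance.

    Positive values indicate a gap; zero or negative values indicate overlap.
    """
    species = {sp for sp, _ in records}
    max_intra = {sp: 0.0 for sp in species}
    min_inter = {sp: float("inf") for sp in species}
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2:
            max_intra[sp1] = max(max_intra[sp1], d)
        else:
            min_inter[sp1] = min(min_inter[sp1], d)
            min_inter[sp2] = min(min_inter[sp2], d)
    return {sp: min_inter[sp] - max_intra[sp] for sp in species}


# Toy data: two specimens for each of two hypothetical species.
records = [
    ("sp_A", "AAAAAAAAAA"),
    ("sp_A", "AAAAAAAAAT"),
    ("sp_B", "TTTAAAAAAA"),
    ("sp_B", "TTTAAAAAAT"),
]
for sp, gap in sorted(barcode_gap(records).items()):
    print(sp, round(gap, 3))
```

A species represented by a single specimen gets a maximum intraspecific distance of 0 here, which inflates its apparent gap; that is one concrete form of the sampling problem described in point (1).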
4
Bhaskar R, Das MK, Sharon EA, Kumar RR, R. G. C. Genetic identification of marine eels (Anguilliformes: Congroidei) through DNA barcoding from Kasimedu fishing harbour. Mitochondrial DNA B Resour 2021; 6:3354-3361. [PMID: 34790868 PMCID: PMC8592592 DOI: 10.1080/23802359.2021.1996291] [Indexed: 11/03/2022]
Abstract
With their snake-like body shape, marine eels have fascinated biologists for centuries. Information on the molecular taxonomy of marine eels from the southeast Indian region is scarce; hence, the present study aimed to barcode marine eels collected from Kasimedu fishing harbor, Chennai, Tamil Nadu. A total of 44 specimens were collected and barcoded with the COI marker. The evolutionary history was inferred using Bayesian analysis (BA). We observed 17 species in 10 genera and 4 families of the suborder Congroidei, of which the genera Ariosoma and Conger were predominant. Species of the families Muraenesocidae and Congridae are highly variable. The average Kimura two-parameter (K2P) distances within species, genera, and families were 3.08%, 6.80%, and 13.80%, respectively. The maximum genetic distance (0.307) was observed between Muraenesox cinereus and Ariosoma sp.1. The BA tree topology revealed distinct clusters in concurrence with the taxonomic status of the species. A deeper split was observed in Uroconger lepturus. We sequenced the first barcodes of Sauromuraenesox vorax and of the new species Ophichthus chennaiensis, filling a gap in the identification of these taxa in the Indian context. Morphological and genetic identifications of the species analyzed matched, based on the cluster analyses performed (BINs and ASAP). This demonstrates that the COI gene sequence is suitable for phylogenetic analysis and species identification.
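The K2P distances reported above follow Kimura's two-parameter model, which separates transitions (A↔G, C↔T) from transversions. With P the proportion of transitional differences and Q the proportion of transversional differences, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). A minimal pairwise sketch follows, assuming pre-aligned sequences and skipping sites with gaps or ambiguous bases; that site-handling convention is an assumption, not necessarily the one used in this study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a: str, b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("K2P distance undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# One transition (A->G) over 10 comparable sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # 0.1116
```

Multiplying by 100 gives the percentage form used in the abstract (for example, the mean within-species distance of 3.08%).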
Affiliation(s)
- Ranjana Bhaskar
- Zoological Survey of India, Southern Regional Centre, Chennai, India
- Mrinal Kumar Das
- Zoological Survey of India, Marine Biology Regional Centre, Chennai, India
- E. Agnita Sharon
- Zoological Survey of India, Southern Regional Centre, Chennai, India
- Chandika R. G.
- Zoological Survey of India, Southern Regional Centre, Chennai, India
5
Arning N, Sheppard SK, Bayliss S, Clifton DA, Wilson DJ. Machine learning to predict the source of campylobacteriosis using whole genome data. PLoS Genet 2021; 17:e1009436. [PMID: 34662334 PMCID: PMC8553134 DOI: 10.1371/journal.pgen.1009436] [Received: 02/20/2021] [Revised: 10/28/2021] [Accepted: 08/26/2021] [Indexed: 11/18/2022]
Abstract
Campylobacteriosis is among the world's most common foodborne illnesses and is caused predominantly by the bacterium Campylobacter jejuni. Effective interventions require determining the infection source, which is challenging because transmission occurs via multiple sources such as contaminated meat, poultry, and drinking water. Strain variation has enabled source tracking based on allelic variation in multi-locus sequence typing (MLST) genes, allowing isolates from infected individuals to be attributed to specific animal or environmental reservoirs. However, the accuracy of probabilistic attribution models has been limited by the ability to differentiate isolates based on just 7 MLST genes. Here, we broaden the input data spectrum to include core genome MLST (cgMLST) and whole genome sequences (WGS), and implement multiple machine learning algorithms, allowing more accurate source attribution. We increase attribution accuracy from 64% using the standard iSource population genetic approach to 71% for MLST, 85% for cgMLST and 78% for k-merized WGS data using the classifier we named aiSource. To gain insight beyond the source model prediction, we use Bayesian inference to analyse the relative affinity of C. jejuni strains to infect humans and identify potential differences in source-to-human transmission ability among clonally related isolates in the most common disease-causing lineage (ST-21 clonal complex). By combining machine learning and population genetics in generalizable, computationally efficient methods, we provide a scalable approach to global disease surveillance that can continuously incorporate novel samples for source attribution and identify fine-scale variation in transmission potential.
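"K-merized WGS data", as mentioned above, means representing each genome by counts of its length-k substrings. The abstract does not state the k value or how aiSource builds its feature matrix, so the sketch below is a generic illustration under assumptions: canonical k-mers (a k-mer and its reverse complement counted together, since reads or contigs can come from either strand), k chosen by the caller, and a dense count matrix over the union of observed k-mers. The function names and isolate sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_counts(seq: str, k: int) -> Counter:
    """Count canonical k-mers, skipping windows that contain non-ACGT characters."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts


def feature_matrix(genomes: Dict[str, str], k: int) -> Tuple[List[str], List[List[int]]]:
    """Rows follow the dict's insertion order; columns are the sorted union of k-mers."""
    per_genome = [kmer_counts(seq, k) for seq in genomes.values()]
    vocab = sorted(set().union(*per_genome))
    rows = [[c[kmer] for kmer in vocab] for c in per_genome]
    return vocab, rows


vocab, rows = feature_matrix({"isolate1": "ACGTA", "isolate2": "AACGN"}, k=2)
print(vocab, rows)  # ['AA', 'AC', 'CG', 'TA'] [[0, 2, 1, 1], [1, 1, 1, 0]]
```

Each row could then be passed to a classifier; which algorithms aiSource actually compares is described in the paper, not reproduced here.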
Affiliation(s)
- Nicolas Arning
- Big Data Institute, Nuffield Department of Population Health, University of Oxford, Li Ka Shing Centre for Health Information and Discovery, Old Road Campus, Oxford, United Kingdom
- Samuel K. Sheppard
- The Milner Centre of Evolution, Department of Biology & Biochemistry, University of Bath, Claverton Down, Bath, United Kingdom
- Sion Bayliss
- The Milner Centre of Evolution, Department of Biology & Biochemistry, University of Bath, Claverton Down, Bath, United Kingdom
- David A. Clifton
- Department of Engineering Science, University of Oxford, Oxford, UK; Oxford-Suzhou Centre for Advanced Research, Suzhou, China
- Daniel J. Wilson
- Big Data Institute, Nuffield Department of Population Health, University of Oxford, Li Ka Shing Centre for Health Information and Discovery, Old Road Campus, Oxford, United Kingdom
6
Liu Y, Zhang M, Chen X, Chen X, Hu Y, Gao J, Pan W, Xin Y, Wu J, Du Y, Zhang X. Developing an efficient DNA barcoding system to differentiate between Lilium species. BMC Plant Biol 2021; 21:465. [PMID: 34645404 PMCID: PMC8513328 DOI: 10.1186/s12870-021-03229-6] [Received: 09/19/2020] [Accepted: 09/23/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND Lilium is an important ornamental bulb crop that also has medicinal properties and is edible. Species within the genus Lilium share very similar morphological and macroscopic characteristics, so they cannot be easily and clearly distinguished from one another. To date, no efficient species-specific markers have been developed for classifying wild lily species, which hinders further characterization of their medicinal properties. RESULTS To develop a simple and reliable identification system for Lilium, 45 representative species from 6 sections were used to develop a DNA barcoding system based on DNA sequence polymorphisms. In this study, we assessed five commonly used DNA barcode candidates (ITS, rbcL, ycf1b, matK and psbA-trnH) and five novel barcode candidates obtained from highly variable chloroplast genomic regions (trnL-trnF, trnS-trnG, trnF-ndhJ, trnP-psaJ-rpl33 and psbB-psbH). We showed that a combination of three DNA barcodes (ITS plus the novel regions trnP-psaJ-rpl33 and psbB-psbH) could be used efficiently as a genetic marker to distinguish between lily species, as assessed by methods including DnaSP, BI and ML trees, and Pair Wise Group (PWG) analysis. CONCLUSIONS A rapid and reliable DNA barcoding method was developed for all 45 wild Lilium species by using ITS, trnP-psaJ-rpl33, and psbB-psbH as DNA barcoding markers. The method can be used in the classification of wild Lilium species, especially endangered species, and also provides an effective method for selective lily breeding.
Affiliation(s)
- Yixin Liu
- Beijing Academy of Agriculture and Forestry Sciences, Beijing, 100097, China
- Mingfang Zhang
- Beijing Academy of Agriculture and Forestry Sciences, Beijing, 100097, China
- Xuqing Chen
- Beijing Academy of Agriculture and Forestry Sciences, Beijing, 100097, China
- Xi Chen
- Beijing Academy of Agriculture and Forestry Sciences, Beijing, 100097, China
- School of Landscape Architecture, Beijing Forestry University, Beijing, 100083, China
- Yue Hu
- Beijing Academy of Agriculture and Forestry Sciences, Beijing, 100097, China
- Junlian Gao
- Beijing Academy of Agriculture and Forestry Sciences, Beijing, 100097, China
- Wenqiang Pan
- Beijing Key Laboratory of Development and Quality Control of Ornamental Crops, Department of Ornamental Horticulture and Landscape Architecture, China Agricultural University, Beijing, 100193, China
- Yin Xin
- Beijing Academy of Agriculture and Forestry Sciences, Beijing, 100097, China
- Beijing Key Laboratory of Development and Quality Control of Ornamental Crops, Department of Ornamental Horticulture and Landscape Architecture, China Agricultural University, Beijing, 100193, China
- Jian Wu
- Beijing Key Laboratory of Development and Quality Control of Ornamental Crops, Department of Ornamental Horticulture and Landscape Architecture, China Agricultural University, Beijing, 100193, China
- Yunpeng Du
- Beijing Academy of Agriculture and Forestry Sciences, Beijing, 100097, China
- Xiuhai Zhang
- Beijing Academy of Agriculture and Forestry Sciences, Beijing, 100097, China
7
Gutierrez MAC, Lopez ROH, Ramos AT, Vélez ID, Gomez RV, Arrivillaga-Henríquez J, Uribe S. DNA barcoding of Lutzomyia longipalpis species complex (Diptera: Psychodidae), suggests the existence of 8 candidate species. Acta Trop 2021; 221:105983. [PMID: 34048789 DOI: 10.1016/j.actatropica.2021.105983] [Received: 09/30/2020] [Revised: 04/27/2021] [Accepted: 05/20/2021] [Indexed: 10/21/2022]
Abstract
The sand fly Lutzomyia (L.) longipalpis has been implicated as the primary vector of Leishmania infantum, the causative agent of visceral leishmaniasis (VL). In addition, it has been associated with atypical cutaneous leishmaniasis transmission in the Neotropics and Central America. The existence of an L. longipalpis species complex has been suggested, with important implications for leishmaniasis epidemiology; however, the delimitation of the species composing it remains controversial. DNA barcoding based on cox1 sequence variation was used to identify molecular operational taxonomic units (MOTUs) in L. longipalpis, including the previously described L. pseudolongipalpis. Genetic variation was analyzed with tree-based and distance-based methods. Fifty-five haplotypes were obtained from 103 sequences and assigned to MOTUs, with a clear separation and high correspondence of individuals to groups. Maximum likelihood and Bayesian phylogenetic analyses showed eight MOTUs (100% bootstrap) with high genetic divergence (12.6%). These data suggest that the L. longipalpis complex consists of at least 8 lineages that may represent species. Additional morphological and molecular analyses of L. longipalpis from Colosó (Caribbean ecoregion) would be desirable, considering that specimens from that area grouped with L. pseudolongipalpis, a species of the complex previously described from Venezuela that has not been recorded in Colombia.
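Reducing 103 sequences to 55 haplotypes, as described above, amounts to grouping identical sequences. A minimal sketch follows, assuming sequences are already aligned and trimmed to the same length and treating any character difference (including gaps) as a distinct haplotype; dedicated tools such as DnaSP may handle gaps and missing data differently. The sample IDs and the frequency-then-sequence ordering of haplotype labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def collapse_haplotypes(seqs: Dict[str, str]) -> Dict[str, List[str]]:
    """Group sample IDs by identical sequence; label haplotypes H1, H2, ... by frequency."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample_id, seq in seqs.items():
        groups[seq.upper()].append(sample_id)
    # Most frequent haplotype first; ties broken by sequence for a deterministic order.
    ordered = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))
    return {f"H{i + 1}": sorted(ids) for i, (_, ids) in enumerate(ordered)}


haplotypes = collapse_haplotypes({"s1": "ACGT", "s2": "acgt", "s3": "ACGA"})
print(len(haplotypes), haplotypes)  # 2 {'H1': ['s1', 's2'], 'H2': ['s3']}
```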
8
Porter TM, Hajibabaei M. Profile hidden Markov model sequence analysis can help remove putative pseudogenes from DNA barcoding and metabarcoding datasets. BMC Bioinformatics 2021; 22:256. [PMID: 34011275 PMCID: PMC8136176 DOI: 10.1186/s12859-021-04180-x] [Received: 01/26/2021] [Accepted: 05/10/2021] [Indexed: 11/29/2022]
Abstract
BACKGROUND Pseudogenes are non-functional copies of protein-coding genes that typically follow a different molecular evolutionary path from functional genes. Including pseudogene sequences in DNA barcoding and metabarcoding analyses can lead to misleading results. None of the most widely used bioinformatic pipelines for processing marker gene (metabarcode) high-throughput sequencing data specifically accounts for the presence of pseudogenes in protein-coding marker genes. The purpose of this study is to develop a method to screen for nuclear mitochondrial DNA segments (nuMTs) in large COI datasets. We do this by: (1) describing gene and nuMT characteristics from an artificial COI barcode dataset, (2) showing the impact of two different pseudogene removal methods on perturbed community datasets with simulated nuMTs, and (3) incorporating a pseudogene filtering step into a bioinformatic pipeline that can be used to process Illumina paired-end COI metabarcode sequences. Open reading frame length and sequence bit scores from hidden Markov model (HMM) profile analysis were used to detect pseudogenes. RESULTS Our simulations showed that nuMTs were more difficult to identify from shorter amplicon sequences, such as those typically used in metabarcoding, than from the full-length DNA barcodes used to construct barcode libraries. It was also more difficult to identify nuMTs in datasets with a high percentage of nuMTs. Existing bioinformatic pipelines used to process metabarcode sequences already remove some nuMTs, especially in the rare sequence removal step, but adding a pseudogene filtering step can remove up to 5% of sequences even when other filtering steps are in place. CONCLUSIONS Open reading frame length filtering alone or combined with hidden Markov model profile analysis can be used to effectively screen out apparent pseudogenes from large datasets.
There is more to learn from COI nuMTs, such as their frequency in DNA barcoding and metabarcoding studies, their taxonomic distribution, and their evolution. Thus, we encourage the submission of verified COI nuMTs to public databases to facilitate future studies.
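Open reading frame length filtering, as described above, flags sequences whose longest stop-codon-free stretch is unusually short, since a functional COI barcode should translate without internal stops while a frameshifted or stop-containing copy is a candidate nuMT. The sketch below is a simplified illustration, not the authors' pipeline. It assumes forward-oriented sequences, scans only the three forward frames, requires no start codon (barcode fragments are internal to the gene), treats TAA and TAG as the only stops (as in the invertebrate mitochondrial code, NCBI translation table 5; other taxa need a different stop set), and uses a hypothetical `min_fraction` cutoff in place of the authors' actual criterion.

```python
from typing import Dict

# Assumption: stop codons of the invertebrate mitochondrial code (NCBI table 5).
STOP_CODONS = {"TAA", "TAG"}


def longest_orf_nt(seq: str) -> int:
    """Length in nucleotides of the longest stop-free run of codons in the 3 forward frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                run = 0  # a stop codon breaks the reading frame
            else:
                run += 3
                best = max(best, run)
    return best


def orf_filter(seqs: Dict[str, str], min_fraction: float = 0.9) -> Dict[str, str]:
    """Keep sequences whose longest ORF spans at least min_fraction of their length."""
    return {name: s for name, s in seqs.items()
            if s and longest_orf_nt(s) >= min_fraction * len(s)}


reads = {"read1": "ATGAAACCC", "read2": "TAATAATAA"}
print(orf_filter(reads))  # {'read1': 'ATGAAACCC'}
```

The abstract notes that shorter amplicons make nuMTs harder to detect; in this sketch that shows up as short reads having few codons in which a frameshift-induced stop can appear.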
Affiliation(s)
- T M Porter
- Department of Integrative Biology and Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, Canada
- M Hajibabaei
- Department of Integrative Biology and Centre for Biodiversity Genomics, University of Guelph, 50 Stone Road East, Guelph, ON, Canada
9
Hassan SS, Aljabali AAA, Panda PK, Ghosh S, Attrish D, Choudhury PP, Seyran M, Pizzol D, Adadi P, Abd El-Aziz TM, Soares A, Kandimalla R, Lundstrom K, Lal A, Azad GK, Uversky VN, Sherchan SP, Baetas-da-Cruz W, Uhal BD, Rezaei N, Chauhan G, Barh D, Redwan EM, Dayhoff GW, Bazan NG, Serrano-Aroca Á, El-Demerdash A, Mishra YK, Palu G, Takayama K, Brufsky AM, Tambuwala MM. A unique view of SARS-CoV-2 through the lens of ORF8 protein. Comput Biol Med 2021; 133:104380. [PMID: 33872970 PMCID: PMC8049180 DOI: 10.1016/j.compbiomed.2021.104380] [Received: 03/02/2021] [Revised: 04/01/2021] [Accepted: 04/02/2021] [Indexed: 01/07/2023]
Abstract
Immune evasion is one of the unique characteristics of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and is attributed to its ORF8 protein. This protein modulates adaptive host immunity through down-regulation of MHC-I (major histocompatibility complex class I) molecules, and innate immune responses by suppressing the host's interferon-mediated antiviral response. To understand the host's immune perspective with respect to the ORF8 protein, a comprehensive study of the ORF8 protein and its mutations was performed. Chemical and structural properties of ORF8 proteins from different hosts, such as human, bat, and pangolin, suggest that the ORF8 of SARS-CoV-2 is much closer to the ORF8 of Bat RaTG13-CoV than to that of Pangolin-CoV. Eighty-seven mutations across unique variants of ORF8 in SARS-CoV-2 can be grouped into four classes based on their predicted effects (Hussain et al., 2021) [1]. Based on the geo-locations and timescale of sample collection, a possible flow of mutations was built. Furthermore, conclusive flows of combined mutations were found through sequence similarity analyses and consideration of amino acid conservation phylogenies. Therefore, this study seeks to highlight the uniqueness of the rapidly evolving SARS-CoV-2 through the lens of ORF8.
Affiliation(s)
- Sk Sarif Hassan
- Department of Mathematics, Pingla Thana Mahavidyalaya, Maligram, 721140, India
- Alaa A A Aljabali
- Department of Pharmaceutics and Pharmaceutical Technology, Yarmouk University-Faculty of Pharmacy, Irbid, 566, Jordan
- Pritam Kumar Panda
- Condensed Matter Theory Group, Materials Theory Division, Department of Physics and Astronomy, Uppsala University, Box 516, SE-751 20, Uppsala, Sweden
- Shinjini Ghosh
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, 700009, West Bengal, India
- Diksha Attrish
- Dr. B. R. Ambedkar Centre for Biomedical Research (ACBR), University of Delhi (North Campus), Delhi, 110007, India
- Pabitra Pal Choudhury
- Applied Statistics Unit, Indian Statistical Institute, Kolkata, 700108, West Bengal, India
- Murat Seyran
- Doctoral Studies in Natural and Technical Sciences (SPL 44), University of Vienna, Austria
- Damiano Pizzol
- Italian Agency for Development Cooperation - Khartoum, Sudan Street 33, Al Amarat, Sudan
- Parise Adadi
- Department of Food Science, University of Otago, Dunedin, 9054, New Zealand
- Tarek Mohamed Abd El-Aziz
- Zoology Department, Faculty of Science, Minia University, El-Minia, 61519, Egypt; Department of Cellular and Integrative Physiology, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Dr, San Antonio, TX, 78229-3900, USA
- Antonio Soares
- Department of Cellular and Integrative Physiology, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Dr, San Antonio, TX, 78229-3900, USA
- Ramesh Kandimalla
- CSIR-Indian Institute of Chemical Technology, Uppal Road, Tarnaka, Hyderabad, 500007, Telangana State, India
- Amos Lal
- Division of Pulmonary and Critical Care Medicine, Mayo Clinic, Rochester, MN, USA
- Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, 33612, USA
- Samendra P Sherchan
- Department of Environmental Health Sciences, Tulane University, New Orleans, LA, 70112, USA
- Wagner Baetas-da-Cruz
- Translational Laboratory in Molecular Physiology, Centre for Experimental Surgery, College of Medicine, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil
- Bruce D Uhal
- Department of Physiology, Michigan State University, East Lansing, MI, 48824, USA
- Nima Rezaei
- Research Center for Immunodeficiencies, Pediatrics Center of Excellence, Children's Medical Center, Tehran University of Medical Sciences, Tehran, Iran; Network of Immunity in Infection, Malignancy and Autoimmunity (NIIMA), Universal Scientific Education and Research Network (USERN), Stockholm, Sweden
- Gaurav Chauhan
- School of Engineering and Sciences, Tecnológico de Monterrey, Av. Eugenio Garza Sada 2501 Sur, 64849 Monterrey, NL, Mexico; Tecnológico de Monterrey, Campus Monterrey, Monterrey, Nuevo León, Mexico
- Debmalya Barh
- Centre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology (IIOAB), Patna, India
- Elrashdy M Redwan
- King Abdulaziz University, Faculty of Science, Department of Biological Science, Saudi Arabia
- Guy W Dayhoff
- Department of Chemistry, College of Arts and Sciences, University of South Florida, Tampa, FL, 33620, USA
- Nicolas G Bazan
- Neuroscience Center of Excellence, School of Medicine, Louisiana State University Health New Orleans, New Orleans, LA, 70112, USA
- Ángel Serrano-Aroca
- Biomaterials and Bioengineering Lab, Translational Research Centre San Alberto Magno, Catholic University of Valencia San Vicente Mártir, C/Guillem de Castro 94, 46001, Valencia, Spain
- Amr El-Demerdash
- Natural Products and Medicinal Chemistry Department, Institut de Chimie des Substances Naturelles, Gif-sur-Yvette, France
- Yogendra K Mishra
- University of Southern Denmark, Mads Clausen Institute, NanoSYD, Alsion 2, 6400 Sønderborg, Denmark
- Giorgio Palu
- Department of Molecular Medicine, University of Padova, Italy
- Kazuo Takayama
- Center for iPS Cell Research and Application, Kyoto University, Kyoto, 606-8397, Japan
- Adam M Brufsky
- University of Pittsburgh School of Medicine, Department of Medicine, Division of Hematology/Oncology, UPMC Hillman Cancer Center, Pittsburgh, PA, USA
- Murtaza M Tambuwala
- School of Pharmacy and Pharmaceutical Science, Ulster University, Coleraine BT52 1SA, Northern Ireland, UK
10
Beyond the comfort zone: amphibian diversity and distribution in the West Sahara-Sahel using mtDNA and nuDNA barcoding and spatial modelling. Conserv Genet 2021. [DOI: 10.1007/s10592-021-01331-8] [Indexed: 11/26/2022]
11
van Bemmelen van der Plaat A, van Treuren R, van Hintum TJL. Reliable genomic strategies for species classification of plant genetic resources. BMC Bioinformatics 2021; 22:173. [PMID: 33789577 PMCID: PMC8011391 DOI: 10.1186/s12859-021-04018-6] [Received: 02/10/2020] [Accepted: 02/11/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND To address the need for easy and reliable species classification in plant genetic resources collections, we assessed the potential of five classifiers (Random Forest, Neighbour-Joining, 1-Nearest Neighbour, a conservative variety of 3-Nearest Neighbours, and Naive Bayes). We investigated the effects of the number of accessions per species and the misclassification rate on classification success, and validated the generality of the results with three complete datasets. RESULTS We found the conservative variety of 3-Nearest Neighbours to be the most reliable classifier when varying species representation and misclassification rate. The analysis of the three complete datasets showed that this finding holds generally. Additionally, we present various options for marker selection for classification tasks such as these. CONCLUSIONS Large-scale genomic data are increasingly being produced for genetic resources collections. These data are useful for addressing species classification issues regarding crop wild relatives and for improving genebank documentation. Implementing a classification method that can improve the quality of bad datasets without gold-standard training data is considered an innovative and efficient way to improve genebank documentation.
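The abstract does not define the "conservative variety" of 3-Nearest Neighbours. One plausible reading, assumed here for illustration only, is that an accession receives a species label only when all three nearest training accessions agree, and is otherwise left unclassified for manual review. The sketch uses Hamming distance over integer-coded marker genotypes; the coding, the distance, the function names, and the data are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Accession = Tuple[Sequence[int], str]  # (marker genotype vector, species label)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of markers at which two genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def conservative_knn(query: Sequence[int], training: List[Accession], k: int = 3) -> Optional[str]:
    """Return a species label only if all k nearest neighbours agree; otherwise None."""
    if len(training) < k:
        return None
    neighbours = sorted(training, key=lambda item: hamming(query, item[0]))[:k]
    labels = {label for _, label in neighbours}
    return labels.pop() if len(labels) == 1 else None


training = [
    ((0, 0, 0, 0), "A"),
    ((0, 0, 0, 1), "A"),
    ((0, 0, 1, 1), "A"),
    ((1, 1, 1, 1), "B"),
    ((1, 1, 1, 0), "B"),
]
print(conservative_knn((0, 0, 0, 0), training))  # A
print(conservative_knn((0, 1, 1, 1), training))  # None (neighbours disagree)
```

The trade-off is coverage for precision: ambiguous accessions are flagged rather than forced into a species, which is consistent with the goal of repairing genebank records without gold-standard training data.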
Affiliation(s)
- Rob van Treuren
- Centre for Genetic Resources, Wageningen University and Research, P.O. Box 16, 6700 AA, Wageningen, The Netherlands
- Theo J L van Hintum
- Centre for Genetic Resources, Wageningen University and Research, P.O. Box 16, 6700 AA, Wageningen, The Netherlands
12
Martin BT, Chafin TK, Douglas MR, Placyk JS, Birkhead RD, Phillips CA, Douglas ME. The choices we make and the impacts they have: Machine learning and species delimitation in North American box turtles (Terrapene spp.). Mol Ecol Resour 2021; 21:2801-2817. [PMID: 33566450 DOI: 10.1111/1755-0998.13350] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/15/2020] [Revised: 01/20/2021] [Accepted: 02/05/2021] [Indexed: 12/26/2022]
Abstract
Model-based approaches that attempt to delimit species are hampered by computational limitations as well as the unfortunate tendency of users to disregard algorithmic assumptions. Alternatives are clearly needed, and machine learning (M-L) is attractive in this regard because it functions without the need to explicitly define a species concept. Unfortunately, its performance will vary according to which (of several) bioinformatic parameters are invoked. Herein, we gauge the effectiveness of M-L-based species-delimitation algorithms by parsing 64 variably filtered versions of a ddRAD-derived SNP data set collected from North American box turtles (Terrapene spp.). Our filtering strategies included: (i) minor allele frequencies (MAF) of 5%, 3%, 1%, and 0% (= none), and (ii) maximum missing data per-individual/per-population at 25%, 50%, 75%, and 100% (= no filtering). We found that species delimitation via unsupervised M-L impacted the signal-to-noise ratio in our data, as well as the discordance among resolved clades. The latter may also reflect biogeographic history, gene flow, incomplete lineage sorting, or combinations thereof (as corroborated from previously observed patterns of differential introgression). Our results substantiate M-L as a viable species-delimitation method, but also demonstrate how commonly observed patterns of phylogenetic discordance can seriously impact M-L classification.
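The two filters named in the abstract can be sketched as follows. This is a minimal illustration, assuming diploid biallelic genotypes coded 0/1/2 (copies of the alternate allele) with None for missing calls, per-individual filtering only (no per-population step), and individuals filtered before loci; the ordering and function names are our assumptions:

```python
from __future__ import annotations

from typing import List, Optional, Tuple


def minor_allele_frequency(column: List[Optional[int]]) -> float:
    """MAF of one biallelic SNP from genotypes coded 0/1/2 (None = missing)."""
    called = [g for g in column if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(matrix: List[List[Optional[int]]], min_maf: float,
                max_missing: float) -> Tuple[List[int], List[int]]:
    """Drop individuals (rows) whose missing fraction exceeds `max_missing`,
    then drop loci (columns) whose MAF is below `min_maf`.

    min_maf=0.0 and max_missing=1.0 reproduce the "no filtering" settings.
    Returns (kept_row_indices, kept_column_indices).
    """
    n_loci = len(matrix[0])
    kept_rows = [i for i, row in enumerate(matrix)
                 if sum(g is None for g in row) / n_loci <= max_missing]
    kept_cols = [j for j in range(n_loci)
                 if minor_allele_frequency([matrix[i][j] for i in kept_rows]) >= min_maf]
    return kept_rows, kept_cols
```

Looping over the 4 MAF values and 4 missing-data thresholds gives 16 combinations per filtering scheme; how the authors arrived at 64 versions is not detailed in the abstract.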
Affiliation(s)
- Bradley T Martin
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
- Tyler K Chafin
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
- Marlis R Douglas
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
- John S Placyk
- Department of Biology, University of Texas, Tyler, TX, USA; Science Division, Trinity Valley Community College, Athens, Texas, USA
- Christopher A Phillips
- Illinois Natural History Survey, Prairie Research Institute, University of Illinois, Champaign, IL, USA
- Michael E Douglas
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
13
Diet Composition of the Wild Stump-Tailed Macaque (Macaca arctoides) in Perlis State Park, Peninsular Malaysia, Using a Chloroplast tRNL DNA Metabarcoding Approach: A Preliminary Study. Animals (Basel) 2020; 10:ani10122215. [PMID: 33255964 PMCID: PMC7761072 DOI: 10.3390/ani10122215] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/20/2020] [Revised: 11/22/2020] [Accepted: 11/24/2020] [Indexed: 11/22/2022]
Abstract
Simple Summary
This study investigated the plant diet of wild Macaca arctoides in the Malaysia–Thailand border region using chloroplast tRNL DNA metabarcoding, a comprehensive molecular technique for assessing the foods eaten by primates. We chose chloroplast tRNL because this region has been widely used to identify plant species. Chloroplast tRNL DNA was amplified and sequenced on the Illumina MiniSeq platform. Sequences were analyzed with the CLC Genomic Workbench software version 12.0 to characterize the plant diet of M. arctoides. Across these samples, we identified 29 plant orders, 46 families, 124 genera, and 145 species. As the first such report from Malaysia, the findings provide important insight into the diet of wild M. arctoides, which in Malaysia are found only in Perlis State Park.
Abstract
Understanding dietary diversity is a fundamental task in studying the stump-tailed macaque, Macaca arctoides, in its natural habitat. However, direct feeding observation and morphological identification of food remains in fecal samples are ineffective and nearly impossible in natural habitats because this species is sensitive to human presence. As ecological methods are challenging and time-consuming, DNA metabarcoding offers a more powerful assessment of the diet. We used a chloroplast tRNL DNA metabarcoding approach to identify the diversity of plants consumed by free-ranging M. arctoides in the Malaysia–Thailand border region located in Perlis State Park, Peninsular Malaysia. DNA was extracted from three fecal samples, and chloroplast tRNL DNA was amplified and sequenced on the Illumina MiniSeq platform. Sequences were analyzed using the CLC Genomic Workbench software. A total of 145 plant species from 46 families were identified as being consumed by M. arctoides.
The most abundant species were yellow saraca, Saraca thaipingensis (11.70%), common fig, Ficus carica (9.33%), aramata, Clathrotropis brachypetala (5.90%), sea fig, Ficus superba (5.44%), and envireira, Malmea dielsiana (1.70%). However, Clathrotropis and Malmea are not considered Malaysian trees; their assignment reflects the limited Malaysian plant DNA reference data available. Our study is the first to identify plant taxa consumed by stump-tailed macaques to the species level using DNA metabarcoding. This result provides important insight into the diet of wild M. arctoides, which in Malaysia are found only in Perlis State Park.
14
Sohsah GN, Ibrahimzada AR, Ayaz H, Cakmak A. Scalable classification of organisms into a taxonomy using hierarchical supervised learners. J Bioinform Comput Biol 2020; 18:2050026. [PMID: 33125294 DOI: 10.1142/s0219720020500262] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Accurately identifying organisms from partially available genetic material is an important task in exploring the phylogenetic diversity of an environment. Specific fragments of an organism's DNA sequence have been defined as DNA barcodes and can be used as markers to identify species efficiently and effectively. Existing DNA barcode-based classification approaches suffer from three major issues: (i) most assume that classification is performed within a given taxonomic class and/or that input sequences are pre-aligned; (ii) high-performing classifiers, such as SVMs, cannot scale to large taxonomies because of their high memory requirements; and (iii) mutations and noise in input DNA sequences greatly reduce the taxonomic classification score. To address these issues, we propose a multi-level hierarchical classifier framework that automatically assigns taxonomy labels to DNA sequences. We use an alignment-free approach, the spectrum kernel method, for feature extraction. We build a proof-of-concept hierarchical classifier with two levels and evaluate it on real DNA sequence data from the Barcode of Life Data System. We demonstrate that the proposed framework achieves a higher F1-score than regular classifiers. In addition, the hierarchical framework scales better to large datasets, enabling researchers to apply classifiers with high classification performance and high memory requirements to large datasets. Furthermore, we show that the proposed framework is more robust to mutations and noise in sequence data than non-hierarchical classifiers.
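The abstract names the spectrum kernel as its alignment-free feature extractor. A minimal sketch of the standard k-spectrum kernel (the inner product of two sequences' k-mer count vectors), assuming k = 3 by default and skipping k-mers that contain ambiguous bases or gaps; these settings are illustrative rather than the authors', and the hierarchical classifier itself is not shown:

```python
from collections import Counter
from math import sqrt

VALID_BASES = frozenset("ACGT")


def kmer_spectrum(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers of a DNA sequence, skipping any k-mer
    that contains a character other than A/C/G/T (e.g. N or '-')."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1)
                   if set(seq[i:i + k]) <= VALID_BASES)


def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """k-spectrum kernel: dot product of the two k-mer count vectors.
    No alignment is needed, because only shared k-mer counts matter."""
    sa, sb = kmer_spectrum(a, k), kmer_spectrum(b, k)
    return sum(count * sb[kmer] for kmer, count in sa.items())


def normalized_spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Cosine-normalized kernel in [0, 1], so longer sequences do not dominate."""
    denom = sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    return spectrum_kernel(a, b, k) / denom if denom else 0.0
```

A point mutation changes at most k overlapping k-mers, so similarity degrades gradually rather than collapsing, which is consistent with the robustness to noise the abstract reports.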
Affiliation(s)
- Gihad N Sohsah
- Department of Computer Science, Istanbul Sehir University, Istanbul, Turkey
- Huzeyfe Ayaz
- Department of Computer Science, Istanbul Sehir University, Istanbul, Turkey
- Ali Cakmak
- Department of Computer Engineering, Istanbul Technical University, Istanbul, Turkey
15
Koroiva R, Rodrigues LRR, Santana DJ. DNA barcoding for identification of anuran species in the central region of South America. PeerJ 2020; 8:e10189. [PMID: 33150083 PMCID: PMC7585382 DOI: 10.7717/peerj.10189] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/26/2020] [Accepted: 09/24/2020] [Indexed: 11/20/2022]
Abstract
The use of COI barcodes for specimen identification and species discovery has proved a useful molecular approach for the study of Anura. Here, we establish a comprehensive amphibian barcode reference database for a central area of South America, in particular for specimens collected in Mato Grosso do Sul state (Brazil), and evaluate the applicability of the COI gene for species-level identification. Both distance- and tree-based methods were applied to assess species boundaries, and the accuracy of specimen identification was evaluated. A total of 204 mitochondrial COI barcode sequences from 22 genera and 59 species (19 of them newly barcoded) were evaluated. Our results indicate that morphological and molecular identifications converge for most species and that specimen identification efficiency is high; however, some species may contain cryptic species, as suggested by high intraspecific variation. Thus, we show that COI sequencing can be used to identify anuran species present in this region.
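One common distance-based check of species boundaries is the "barcode gap": a species is well separated when its largest intraspecific distance is smaller than its smallest distance to any other species. A minimal sketch, assuming pre-aligned sequences of equal length and uncorrected p-distances with gap/N sites skipped; the abstract does not state which distance model or thresholds the authors used, and the function names are hypothetical:

```python
from __future__ import annotations

from itertools import combinations
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned
    sequences, ignoring positions where either has a gap or ambiguity."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def barcode_gap(seqs: Dict[str, List[str]]) -> Dict[str, bool]:
    """For each species, True if max intraspecific distance < min distance
    to any other species. Needs at least two species; a singleton species
    has an intraspecific distance of 0 by convention here."""
    result = {}
    for sp, own in seqs.items():
        intra = max((p_distance(a, b) for a, b in combinations(own, 2)), default=0.0)
        inter = min(p_distance(a, b)
                    for other, theirs in seqs.items() if other != sp
                    for a in own for b in theirs)
        result[sp] = intra < inter
    return result
```

A species flagged False here is a candidate for the situation the abstract describes, where high intraspecific variation may point to cryptic species.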
Affiliation(s)
- Ricardo Koroiva
- Departamento de Sistemática e Ecologia, Universidade Federal da Paraíba, João Pessoa, Paraíba, Brazil
- Diego José Santana
- Instituto de Biociências, Universidade Federal de Mato Grosso do Sul, Campo Grande, Mato Grosso do Sul, Brazil
16
Rampazzo F, Tosi F, Tedeschi P, Gion C, Arcangeli G, Brandolini V, Giovanardi O, Maietti A, Berto D. Preliminary multi analytical approach to address geographic traceability at the intraspecific level in Scombridae family. Isotopes Environ Health Stud 2020; 56:260-279. [PMID: 32216466 DOI: 10.1080/10256016.2020.1739671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2019] [Accepted: 01/29/2020] [Indexed: 06/10/2023]
Abstract
The globalization of seafood marketing has increased the demand for effective fish traceability that enhances consumer confidence in food safety. In this study, an integrated multi-analytical approach based on two different and independent analytical techniques (carbon and nitrogen stable isotope analysis and fatty acid analysis) was applied to identify different fish species and trace their geographical provenance. The investigation focused on four species (Thunnus thynnus, Thunnus alalunga, Auxis rochei and Scomber scombrus) belonging to the family Scombridae. DNA barcoding confirmed genus and species for S. scombrus and A. rochei, but only genus for T. alalunga and T. thynnus. Carbon and nitrogen stable isotope results revealed different fish diets and trophic positions, whereas fatty acid analysis showed that unsaturated compounds prevailed (∼60%) over saturated ones, with variation among species and geographical areas, particularly in the percentages of docosahexaenoic and eicosapentaenoic acids. Principal component analysis applied to the stable isotope and fatty acid data discriminated well among species and their geographical catch areas. This multidisciplinary analytical approach could be a promising tool for identifying commercial fish and tracing their origin to protect consumer health.
Affiliation(s)
- Federico Rampazzo
- Italian National Institute for Environmental Protection and Research (ISPRA), Chioggia (VE), Italy
- Federica Tosi
- Istituto Zooprofilattico Sperimentale delle Venezie, Legnaro (PD), Italy
- Paola Tedeschi
- Department of Chemical and Pharmaceutical Sciences, University of Ferrara (FE), Ferrara (FE), Italy
- Claudia Gion
- Italian National Institute for Environmental Protection and Research (ISPRA), Chioggia (VE), Italy
- Giuseppe Arcangeli
- Istituto Zooprofilattico Sperimentale delle Venezie, Legnaro (PD), Italy
- Vincenzo Brandolini
- Department of Chemical and Pharmaceutical Sciences, University of Ferrara (FE), Ferrara (FE), Italy
- Otello Giovanardi
- Italian National Institute for Environmental Protection and Research (ISPRA), Chioggia (VE), Italy
- Annalisa Maietti
- Department of Chemical and Pharmaceutical Sciences, University of Ferrara (FE), Ferrara (FE), Italy
- Daniela Berto
- Italian National Institute for Environmental Protection and Research (ISPRA), Chioggia (VE), Italy
17
Yang CQ, Lv Q, Zhang AB. Sixteen Years of DNA Barcoding in China: What Has Been Done? What Can Be Done? Front Ecol Evol 2020. [DOI: 10.3389/fevo.2020.00057] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/12/2022]
18
He S, Tian Y, Feng S, Wu Y, Shen X, Chen K, He Y, Sun Q, Li X, Xu J, Wen Z, Qu JY. In vivo single-cell lineage tracing in zebrafish using high-resolution infrared laser-mediated gene induction microscopy. eLife 2020; 9:e52024. [PMID: 31904340 PMCID: PMC7018510 DOI: 10.7554/elife.52024] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/20/2019] [Accepted: 01/04/2020] [Indexed: 12/15/2022]
Abstract
Heterogeneity is widespread among cell types, both during development and at homeostasis. Investigating heterogeneity is crucial for comprehensively understanding the ontogeny, dynamics, and function of specific cell types. Traditional bulk-labeling techniques cannot dissect heterogeneity within a cell population, while the single-cell lineage-tracing methods developed in the last decade struggle to achieve high-fidelity single-cell labeling and long-term in vivo observation simultaneously. In this work, we developed a high-precision infrared laser-evoked gene operator heat-shock system, which uses laser-induced CreERT2 combined with a loxP-DsRedx-loxP-GFP reporter to achieve precise single-cell labeling and tracing. In vivo studies indicated that this system can precisely label single cells in the brain, muscle, and hematopoietic system of the zebrafish embryo. Using this system, we traced the hematopoietic potential of hemogenic endothelium (HE) in the posterior blood island (PBI) of the zebrafish embryo and found that HEs in the PBI are heterogeneous, comprising at least myeloid unipotent and myeloid-lymphoid bipotent subtypes.
Affiliation(s)
- Sicong He
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Kowloon, China
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
- Ye Tian
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
- Division of Life Science, The Hong Kong University of Science and Technology, Kowloon, China
- Shachuan Feng
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
- Division of Life Science, The Hong Kong University of Science and Technology, Kowloon, China
- Yi Wu
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
- Division of Life Science, The Hong Kong University of Science and Technology, Kowloon, China
- Xinwei Shen
- Department of Mathematics, The Hong Kong University of Science and Technology, Kowloon, China
- Kani Chen
- Department of Mathematics, The Hong Kong University of Science and Technology, Kowloon, China
- Yingzhu He
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Kowloon, China
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
- Qiqi Sun
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Kowloon, China
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
- Xuesong Li
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Kowloon, China
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
- Jin Xu
- Division of Cell, Developmental and Integrative Biology, School of Medicine, South China University of Technology, Guangzhou, China
- Zilong Wen
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
- Division of Life Science, The Hong Kong University of Science and Technology, Kowloon, China
- Jianan Y Qu
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Kowloon, China
- State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Kowloon, China
- Center of Systems Biology and Human Health, The Hong Kong University of Science and Technology, Kowloon, China
19
Swain SN, Makunin A, Dora AS, Barik TK. SNP barcoding based on decision tree algorithm: A new tool for identification of mosquito species with special reference to Anopheles. Acta Trop 2019; 199:105152. [PMID: 31445898 DOI: 10.1016/j.actatropica.2019.105152] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/30/2019] [Revised: 07/29/2019] [Accepted: 08/20/2019] [Indexed: 02/01/2023]
Abstract
Molecular taxonomy-based identification of species using DNA barcodes is extensively used in evolutionary systematics. Almost all DNA barcodes contain detailed information on the barcoding gene along with sequence regions that are uninformative for identifying a particular species. Therefore, a technique is needed to remove or reduce these uninformative sequences and to create species-specific barcodes for differentiation. The single-nucleotide variation among sequences, assayed by single nucleotide polymorphism (SNP) genotyping, can be used to develop a rapid, reliable, and high-throughput assay for distinguishing known species. SNPs act as important hereditary markers for uncovering evolutionary history and normal genetic polymorphisms. With this in mind, we propose a decision tree-based barcoding (DTB) algorithm that generates SNP barcodes from the DNA barcode sequences of several evolutionarily related species to accurately identify a single species. To this end, we analyzed mitochondrial COI gene sequences of 64 Anopheles mosquito species. After alignment and truncation, 32 SNPs were discovered in the COI sequences and then used to set up the decision rules for constructing the decision tree. The DTB algorithm generates 126 nodes and 32 loci for discriminating the 64 Anopheles species. We conclude that the DTB method is useful and effective for generating sequence tags for Anopheles species identification.
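The abstract does not say how the DTB algorithm chooses the SNP at each node. A minimal sketch of decision-tree SNP barcoding, assuming (for illustration only) one aligned, trimmed COI sequence per species and a greedy rule that splits on the SNP whose alleles divide the remaining species most evenly (highest entropy); all function names are hypothetical:

```python
from __future__ import annotations

from collections import defaultdict
from math import log2
from typing import Dict, List


def snp_sites(aligned: Dict[str, str]) -> List[int]:
    """Alignment columns at which not all species share the same base."""
    length = len(next(iter(aligned.values())))
    return [i for i in range(length) if len({s[i] for s in aligned.values()}) > 1]


def split_entropy(aligned: Dict[str, str], site: int) -> float:
    """Shannon entropy (bits) of the allele groups that `site` creates."""
    counts: Dict[str, int] = defaultdict(int)
    for seq in aligned.values():
        counts[seq[site]] += 1
    n = len(aligned)
    return -sum(c / n * log2(c / n) for c in counts.values())


def build_tree(aligned: Dict[str, str], sites: List[int]):
    """Recursively split species on the most evenly dividing SNP site.

    Internal nodes are {"site": i, "branches": {base: subtree}}; a leaf is a
    sorted list of species that the remaining sites cannot tell apart.
    """
    species = sorted(aligned)
    if len(species) == 1 or not sites:
        return species
    best = max(sites, key=lambda i: split_entropy(aligned, i))
    if split_entropy(aligned, best) == 0:  # no remaining site separates them
        return species
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for sp in species:
        groups[aligned[sp][best]][sp] = aligned[sp]
    rest = [i for i in sites if i != best]
    return {"site": best,
            "branches": {base: build_tree(sub, rest) for base, sub in sorted(groups.items())}}


def classify(tree, query: str) -> List[str]:
    """Walk the tree with an aligned query; [] if an allele was never seen."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(query[tree["site"]])
        if tree is None:
            return []
    return tree
```

Only the SNP positions tested along a root-to-leaf path are needed to identify a specimen, which is the sense in which such a tree compresses a full barcode into a short set of diagnostic loci.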
20
Derkarabetian S, Castillo S, Koo PK, Ovchinnikov S, Hedin M. A demonstration of unsupervised machine learning in species delimitation. Mol Phylogenet Evol 2019; 139:106562. [PMID: 31323334 PMCID: PMC6880864 DOI: 10.1016/j.ympev.2019.106562] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 03/17/2019] [Revised: 07/03/2019] [Accepted: 07/15/2019] [Indexed: 01/13/2023]
Abstract
One major challenge to delimiting species with genetic data is successfully differentiating population structure from species-level divergence, an issue exacerbated in taxa inhabiting naturally fragmented habitats. Many fields of science are now using machine learning, and in evolutionary biology supervised machine learning has recently been used to infer species boundaries. These supervised methods require training data with associated labels. Conversely, unsupervised machine learning (UML) uses inherent data structure and does not require user-specified training labels, potentially providing more objectivity in species delimitation. In the context of integrative taxonomy, we demonstrate the utility of three UML approaches (random forests, variational autoencoders, t-distributed stochastic neighbor embedding) for species delimitation in an arachnid taxon with high population genetic structure (Opiliones, Laniatores, Metanonychus). We find that UML approaches successfully cluster samples according to species-level divergences and not high levels of population structure, while model-based validation methods severely over-split putative species. UML offers intuitive data visualization in two-dimensional space and the ability to accommodate various data types, and it has potential in many areas of systematic and evolutionary biology. We argue that machine learning methods are ideally suited for species delimitation and may perform well in many natural systems and across taxa with diverse biological characteristics.
Affiliation(s)
- Shahan Derkarabetian
- Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA 02138, United States; Department of Biology, San Diego State University, San Diego, CA 92182, United States; Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, Riverside, CA 92521, United States.
- Stephanie Castillo
- Department of Biology, San Diego State University, San Diego, CA 92182, United States; Department of Entomology, University of California, Riverside, Riverside, CA 92521, United States
- Peter K Koo
- Howard Hughes Medical Institute, Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, United States
- Sergey Ovchinnikov
- Center for Systems Biology, Harvard University, Cambridge, MA 02138, United States
- Marshal Hedin
- Department of Biology, San Diego State University, San Diego, CA 92182, United States
21
Kreuzer M, Howard C, Adhikari B, Pendry CA, Hawkins JA. Phylogenomic Approaches to DNA Barcoding of Herbal Medicines: Developing Clade-Specific Diagnostic Characters for Berberis. Front Plant Sci 2019; 10:586. [PMID: 31139202 PMCID: PMC6527895 DOI: 10.3389/fpls.2019.00586] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 11/29/2018] [Accepted: 04/18/2019] [Indexed: 05/12/2023]
Abstract
DNA barcoding of herbal medicines has been mainly concerned with authentication of products in trade and has raised awareness of species substitution and adulteration. More recently, DNA barcodes have been included in pharmacopoeias, providing tools for regulatory purposes. The DNA barcoding regions commonly used in plants often fail to resolve identification to species level. This can be especially challenging in evolutionarily complex groups where incipient or reticulate speciation is ongoing. In this study, we take a phylogenomic approach, analyzing whole plastid sequences from the evolutionarily complex genus Berberis in order to develop DNA barcodes for the medicinally important species Berberis aristata. The phylogeny reconstructed from an alignment of ∼160 kbp of chloroplast DNA for 57 species reveals that the pharmacopoeial species in question is polyphyletic, complicating development of a species-specific DNA barcode. Instead, we propose a clade-specific DNA barcode, using our phylogeny to define Operational Phylogenetic Units (OPUs). The plastid alignment is then reduced to small, informative DNA regions that include nucleotides diagnostic for these OPUs. These DNA barcodes were tested on commercial samples and shown to discriminate plants in trade, and therefore to meet the requirement of a pharmacopoeial standard. The proposed method provides an innovative approach for inferring DNA barcodes for evolutionarily complex groups for regulatory purposes and quality control.
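The abstract reduces the plastid alignment to regions containing nucleotides diagnostic for each OPU. A minimal sketch of one way to find such characters, assuming a strict definition (a base fixed in every clade member and absent from every sequence outside the clade, with gaps and ambiguous bases never counted as diagnostic); the authors' exact criterion is not given in the abstract, and `diagnostic_sites` is a hypothetical name:

```python
from __future__ import annotations

from typing import Dict, List, Set, Tuple


def diagnostic_sites(aligned: Dict[str, str], clade: Set[str]) -> List[Tuple[int, str]]:
    """Return (position, base) pairs where all clade members share one base
    that no sequence outside the clade carries at that position."""
    inside = [aligned[name] for name in clade]
    outside = [seq for name, seq in aligned.items() if name not in clade]
    sites = []
    for i in range(len(inside[0])):
        bases = {s[i] for s in inside}
        if len(bases) != 1:  # clade is polymorphic here, so not diagnostic
            continue
        base = bases.pop()
        if base in "ACGT" and all(s[i] != base for s in outside):
            sites.append((i, base))
    return sites
```

Short windows of the alignment that contain several such positions would then be candidates for the "small, informative DNA regions" the abstract describes.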
Affiliation(s)
- Marco Kreuzer
- School of Biological Sciences, University of Reading, Reading, United Kingdom
- Caroline Howard
- BP-NIBSC Herbal Laboratory, National Institute for Biological Standards and Control, Potters Bar, United Kingdom
- Julie A. Hawkins
- School of Biological Sciences, University of Reading, Reading, United Kingdom
22
Yang CH, Wu KC, Chuang LY, Chang HW. Decision Theory-Based COI-SNP Tagging Approach for 126 Scombriformes Species Tagging. Front Genet 2019; 10:259. [PMID: 31001317 PMCID: PMC6456664 DOI: 10.3389/fgene.2019.00259] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/29/2018] [Accepted: 03/08/2019] [Indexed: 12/02/2022]
Abstract
The mitochondrial gene cytochrome c oxidase I (COI) is commonly used for DNA barcoding in animals. However, most of the COI barcode nucleotides are conserved and sequences longer than about 650 base pairs increase the computational burden for species identification. To solve this problem, we propose a decision theory-based COI SNP tagging (DCST) approach that focuses on the discrimination of species using single nucleotide polymorphisms (SNPs) as the variable nucleotides of the sequences of a group of species. Using the example of 126 teleost mackerel fish species (order: Scombriformes), we identified 281 SNPs by alignment and trimming of their COI sequences. After decision rule making, 49 SNPs in 126 fish species were determined using the scoring system of the DCST approach. These COI-SNP barcodes were finally transformed into one-dimensional barcode images. Our proposed DCST approach simplifies the computational complexity and identifies the most effective and fewest SNPs to resolve or discriminate species for species tagging.
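The DCST scoring system itself is not described in the abstract. A minimal sketch of the underlying goal (a small set of SNP columns that still separates every pair of species), swapping in a plain greedy set-cover heuristic rather than the authors' decision-theory scoring; greedy selection is not guaranteed to find the true minimum, and the function names are hypothetical:

```python
from __future__ import annotations

from itertools import combinations
from typing import Dict, List


def greedy_snp_tags(aligned: Dict[str, str]) -> List[int]:
    """Greedily pick alignment columns until every pair of species differs
    at one or more chosen columns. Raises ValueError if two species have
    identical sequences and therefore cannot be separated."""
    names = sorted(aligned)
    length = len(aligned[names[0]])
    unresolved = set(combinations(names, 2))
    chosen: List[int] = []
    while unresolved:
        # Column that separates the largest number of still-unresolved pairs.
        best = max(range(length),
                   key=lambda i: sum(aligned[a][i] != aligned[b][i] for a, b in unresolved))
        resolved = {(a, b) for a, b in unresolved if aligned[a][best] != aligned[b][best]}
        if not resolved:
            raise ValueError(f"indistinguishable species: {sorted(unresolved)}")
        chosen.append(best)
        unresolved -= resolved
    return sorted(chosen)


def snp_barcode(seq: str, sites: List[int]) -> str:
    """Short species tag made of the bases at the selected SNP columns."""
    return "".join(seq[i] for i in sites)
```

Each species' tag is just its bases at the chosen columns; a string like this is the kind of compact code that could then be rendered as a one-dimensional barcode image, as the abstract describes.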
Affiliation(s)
- Cheng-Hong Yang
- Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan; Biomedical Engineering, Kaohsiung Medical University, Kaohsiung, Taiwan
- Kuo-Chuan Wu
- Department of Electronic Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan; Department of Computer Science and Information Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
- Li-Yeh Chuang
- Department of Chemical Engineering and Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, Taiwan
- Hsueh-Wei Chang
- Institute of Medical Science and Technology, National Sun Yat-sen University, Kaohsiung, Taiwan; Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan; Department of Biomedical Science and Environmental Biology, Kaohsiung Medical University, Kaohsiung, Taiwan
23
Lin SW, Lopardo L, Haase M, Uhl G. Taxonomic revision of the dwarf spider genus Shaanxinus Tanasevitch, 2006 (Araneae, Linyphiidae, Erigoninae), with new species from Taiwan and Vietnam. Org Divers Evol 2019. [DOI: 10.1007/s13127-018-00389-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/29/2022]
24
Phillips JD, Gillis DJ, Hanner RH. Incomplete estimates of genetic diversity within species: Implications for DNA barcoding. Ecol Evol 2019; 9:2996-3010. [PMID: 30891232 PMCID: PMC6406011 DOI: 10.1002/ece3.4757] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 05/17/2018] [Revised: 09/03/2018] [Accepted: 10/12/2018] [Indexed: 02/01/2023]
Abstract
DNA barcoding has greatly accelerated the pace of specimen identification to the species level, as well as species delineation. Whereas matching unknown specimens to known species with DNA barcodes is straightforward, its use for species delimitation is more controversial, because species discovery hinges critically on present levels of haplotype diversity and on the patterning of standing genetic variation within and between species. Typical sample sizes for molecular biodiversity assessment using DNA barcodes range from 5 to 10 individuals per species. However, the sample sizes necessary to fully gauge haplotype variation at the species level are presumed to be strongly taxon-specific. Importantly, little attention has been paid to determining the specimen sample sizes necessary to reveal the majority of intraspecific haplotype variation within any one species. In this paper, we briefly outline the current literature and methods on intraspecific sample size estimation for assessing COI DNA barcode haplotype sampling completeness. The importance of adequate sample sizes for studies of molecular biodiversity is stressed, with application to a variety of metazoan taxa, through a review of foundational statistical and population genetic models, with specific application to ray-finned fishes (Chordata: Actinopterygii). Finally, promising avenues for further research in this area are highlighted.
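One simple model behind such sample-size reasoning: if individuals are sampled at random from a large population, a haplotype at frequency p is observed at least once among n individuals with probability 1 - (1 - p)^n. A minimal sketch that inverts this formula; it assumes independent draws and considers one haplotype at a time, and it is not the specific estimator reviewed by the authors:

```python
from math import ceil, log


def detection_probability(p: float, n: int) -> float:
    """Probability that a haplotype at population frequency p appears at
    least once among n independently sampled individuals."""
    return 1.0 - (1.0 - p) ** n


def min_sample_size(p: float, confidence: float = 0.95) -> int:
    """Smallest n with detection_probability(p, n) >= confidence,
    from solving 1 - (1 - p)^n >= confidence for n."""
    if not (0 < p < 1 and 0 < confidence < 1):
        raise ValueError("p and confidence must lie strictly between 0 and 1")
    return ceil(log(1.0 - confidence) / log(1.0 - p))
```

Under these assumptions, detecting a haplotype at 30% frequency with 95% probability already needs 9 individuals, and one at 10% frequency needs 29, which illustrates why 5 to 10 specimens per species can miss much intraspecific variation.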
Affiliation(s)
- Jarrett D. Phillips
- School of Computer Science, University of Guelph, Guelph, Ontario, Canada
- Centre for Biodiversity Genomics, Biodiversity Institute of Ontario, University of Guelph, Guelph, Ontario, Canada
- Daniel J. Gillis
- School of Computer Science, University of Guelph, Guelph, Ontario, Canada
- Robert H. Hanner
- Centre for Biodiversity Genomics, Biodiversity Institute of Ontario, University of Guelph, Guelph, Ontario, Canada
- Department of Integrative Biology, University of Guelph, Guelph, Ontario, Canada

25
Ren F, Wang Y, Xu Z, Li Y, Xin T, Zhou J, Qi Y, Wei X, Yao H, Song J. DNA barcoding of Corydalis, the most taxonomically complicated genus of Papaveraceae. Ecol Evol 2019; 9:1934-1945. [PMID: 30847083 PMCID: PMC6392370 DOI: 10.1002/ece3.4886] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 07/23/2018] [Revised: 11/02/2018] [Accepted: 12/06/2018] [Indexed: 02/06/2023]
Abstract
The genus Corydalis is recognized as one of the most taxonomically challenging plant taxa. It is mainly distributed in the Himalaya-Hengduan Mountains, a global biodiversity hotspot. To date, no effective solution for species discrimination and taxonomic assignment in Corydalis has been developed. In this study, five nuclear and chloroplast DNA regions (ITS, ITS2, matK, rbcL, and psbA-trnH) were first screened for their ability to discriminate Corydalis species in order to eliminate inefficient regions, and the three best-performing regions (ITS, ITS2 and matK) were then evaluated in 131 samples representing 28 species from 11 sections of four subgenera of Corydalis using three analytical approaches (NJ, ML and MP trees; K2P distance; and BLAST). The approaches differed in species identification power, with BLAST performing best. Among single barcodes, ITS exhibited the highest identification success rate (65.2%), and the combination ITS + matK (69.6%) provided the highest species resolution among all single barcodes and their combinations. Three Pharmacopoeia-recorded medicinal plants and their materia medica were identified successfully based on the ITS and ITS2 regions. In the phylogenetic analysis, the sections Thalictrifoliae, Sophorocapnos, Racemosae, Aulacostigma, and Corydalis formed well-supported separate lineages. We thus hypothesize that these five sections should be classified as an independent subgenus and that the genus should be divided into three subgenera. In this study, DNA barcoding provided relatively high species discrimination power, indicating that it can be used for species discrimination in this taxonomically complicated genus and as a potential tool for the authentication of materia medica belonging to Corydalis.
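The K2P (Kimura two-parameter) distance named above corrects the raw proportion of differing sites for multiple substitutions by treating transitions (A/G, C/T) and transversions separately: d = -0.5 ln(1 - 2P - Q) - 0.25 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences. A minimal Python sketch follows, assuming two pre-aligned sequences; skipping gaps and ambiguity codes site by site (pairwise deletion) is an assumption here, and other treatments of missing data are possible.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or a non-ACGT character are
    skipped (pairwise deletion)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("K2P distance undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))  # one transition in 10 sites
```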
Affiliation(s)
- Feng‐Ming Ren
- Key Lab of Chinese Medicine Resources Conservation, State Administration of Traditional Chinese Medicine of the People's Republic of China, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Chongqing Institute of Medicinal Plant Cultivation, Research and Utilization on Characteristic Biological Resources of Sichuan and Chongqing Co‐construction Lab, Chinese Medicine Breeding and Evaluation Engineering Technology Research Center of Chongqing, Chongqing, China
- Ying‐Wei Wang
- Beijing Botanical Garden, Institute of Botany, Chinese Academy of Sciences, Beijing, China
- Zhi‐Chao Xu
- Key Lab of Chinese Medicine Resources Conservation, State Administration of Traditional Chinese Medicine of the People's Republic of China, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Ying Li
- Key Lab of Chinese Medicine Resources Conservation, State Administration of Traditional Chinese Medicine of the People's Republic of China, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Tian‐Yi Xin
- Key Lab of Chinese Medicine Resources Conservation, State Administration of Traditional Chinese Medicine of the People's Republic of China, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Jian‐Guo Zhou
- Key Lab of Chinese Medicine Resources Conservation, State Administration of Traditional Chinese Medicine of the People's Republic of China, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Yao‐Dong Qi
- Key Lab of Chinese Medicine Resources Conservation, State Administration of Traditional Chinese Medicine of the People's Republic of China, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Xue‐Ping Wei
- Key Lab of Chinese Medicine Resources Conservation, State Administration of Traditional Chinese Medicine of the People's Republic of China, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Hui Yao
- Key Lab of Chinese Medicine Resources Conservation, State Administration of Traditional Chinese Medicine of the People's Republic of China, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China
- Jing‐Yuan Song
- Key Lab of Chinese Medicine Resources Conservation, State Administration of Traditional Chinese Medicine of the People's Republic of China, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College, Beijing, China

26
Meher PK, Sahu TK, Gahoi S, Tomar R, Rao AR. funbarRF: DNA barcode-based fungal species prediction using multiclass Random Forest supervised learning model. BMC Genet 2019; 20:2. [PMID: 30616524 PMCID: PMC6323839 DOI: 10.1186/s12863-018-0710-z] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/13/2018] [Accepted: 12/26/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND Identification of unknown fungal species aids the conservation of fungal diversity. Because many fungal species cannot be cultured, morphological identification of those species is almost impossible; DNA barcoding, however, can be employed to identify them. For fungal taxonomy prediction, the ITS (internal transcribed spacer) region of rDNA (ribosomal DNA) is used as the barcode. Although computational prediction of fungal species has become feasible with the large volume of barcode sequences in the public domain, it remains challenging because of the high degree of variability among ITS regions within species. RESULTS A Random Forest (RF)-based predictor was built to identify unknown fungal species. The reference and query sequences were mapped onto numeric features based on gapped base-pair compositions and then used as training and test sets, respectively, for prediction of fungal species using RF. Accuracy exceeded 85% when 4 sequences per species were used in the reference set, and stabilized at ~88% when ≥7 sequences per species were used to train the model. The proposed model achieved comparable accuracy when evaluated against existing methods through cross-validation. It also outperformed several existing models used to identify species other than fungi. CONCLUSIONS An online prediction server, "funbarRF", is available at http://cabgrid.res.in:8080/funbarrf/ for fungal species identification. In addition, an R package, funbarRF ( https://cran.r-project.org/web/packages/funbarRF/ ), is available for prediction using high-throughput sequence data. This work is expected to support future efforts toward fungal taxonomy assignment based on DNA barcodes.
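One way to read "gapped base-pair composition" is the relative frequency of each ordered nucleotide pair (x, y) in which y lies g + 1 positions after x, computed for several gap sizes g. The sketch below assumes that reading; the exact gap range and normalisation used by funbarRF may differ, and the function name and max_gap default are hypothetical. The resulting fixed-length vectors, one per reference sequence, could then be used to train a multiclass random forest (for example scikit-learn's RandomForestClassifier) with species names as class labels; that step is omitted to keep the sketch dependency-free.

```python
from itertools import product

BASES = "ACGT"
# The 16 ordered nucleotide pairs: AA, AC, AG, ..., TT
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]


def gapped_pair_composition(seq, max_gap=3):
    """Relative frequencies of nucleotide pairs (x, y) where y lies g + 1
    positions after x, for g = 0..max_gap. Returns 16 * (max_gap + 1) numbers."""
    seq = seq.upper()
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - gap - 1):
            pair = seq[i] + seq[i + gap + 1]
            if pair in counts:  # skip pairs involving N, gaps, etc.
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


vector = gapped_pair_composition("ACGTACGTNNACGT", max_gap=2)
print(len(vector))  # 48 features: 16 pairs x 3 gap sizes
```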
Affiliation(s)
- Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
- Tanmaya Kumar Sahu
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
- Shachi Gahoi
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
- Ruchi Tomar
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India
- Department of Bioinformatics, Janta Vedic College, Baraut, Baghpat, Uttar Pradesh 250611 India
- Atmakuri Ramakrishna Rao
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012 India

27
Authentication of Herbal Medicines Dipsacus asper and Phlomoides umbrosa Using DNA Barcodes, Chloroplast Genome, and Sequence Characterized Amplified Region (SCAR) Marker. Molecules 2018; 23:molecules23071748. [PMID: 30018232 PMCID: PMC6099718 DOI: 10.3390/molecules23071748] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 06/28/2018] [Revised: 07/13/2018] [Accepted: 07/15/2018] [Indexed: 01/21/2023]
Abstract
Dried roots of Dipsacus asper (Caprifoliaceae) are used as important traditional herbal medicines in Korea. However, in Korean herbal markets the roots are often sold mixed with, or contaminated by, Dipsacus japonicus. Furthermore, the dried roots of Phlomoides umbrosa (Lamiaceae) are used indiscriminately with those of D. asper, given the confusingly similar Korean names Sok-Dan and Han-Sok-Dan for D. asper and P. umbrosa, respectively. Although D. asper and P. umbrosa are important herbal medicines, the molecular marker and genomic information available for these species is limited. In this study, we analysed DNA barcodes to distinguish among D. asper, D. japonicus, and P. umbrosa and sequenced the chloroplast (CP) genomes of D. asper and D. japonicus. The CP genomes of D. asper and D. japonicus were 160,530 and 160,371 bp in length, respectively, and were highly divergent from those of other Caprifoliaceae species. Phylogenetic analysis revealed a monophyletic group within Caprifoliaceae. We also developed novel sequence characterised amplified region (SCAR) markers to distinguish among D. asper, D. japonicus, and P. umbrosa. Our results provide important taxonomic, phylogenetic, and evolutionary information on Dipsacus species. The SCAR markers developed here will be useful for the authentication of herbal medicines.

28
Yang F, Ding F, Chen H, He M, Zhu S, Ma X, Jiang L, Li H. DNA Barcoding for the Identification and Authentication of Animal Species in Traditional Medicine. Evid Based Complement Alternat Med 2018; 2018:5160254. [PMID: 29849709 PMCID: PMC5937547 DOI: 10.1155/2018/5160254] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 12/19/2017] [Accepted: 03/11/2018] [Indexed: 02/06/2023]
Abstract
Animal-based traditional medicine not only plays a significant role in therapeutic practices worldwide but also provides a potential compound library for drug discovery. However, persistent hunting and illegal trade markedly threaten numerous medicinal animal species, and increasing demand further encourages the emergence of various adulterants. Because conventional methods make it difficult and time-consuming to examine processed products or to identify morphologically similar animal species, novel authentication methods for animal-based traditional medicine are urgently needed. Over the last decade, DNA barcoding has offered an accurate and efficient strategy for identifying known species and discovering unknown ones through analysis of sequence variation in a standardized region of DNA. Recent studies have shown that DNA barcoding, as well as minibarcoding and metabarcoding, can identify animal species and discriminate authentic materials from adulterants in various types of traditional medicine, including raw materials, processed products, and complex preparations. These techniques can also detect unlabelled and threatened animal species in traditional medicine. Here, we review recent progress in DNA barcoding for the identification and authentication of animal species used in traditional medicine, providing a reference for quality control and trade supervision of animal-based traditional medicine.
Affiliation(s)
- Fan Yang
- Institute of Forensic Science, Ministry of Public Security, Beijing 100038, China
- Beijing Engineering Research Center of Crime Scene Evidence Examination, Institute of Forensic Science, Beijing 100038, China
- Fei Ding
- Center for Bioresources & Drug Discovery and School of Biosciences & Biopharmaceutics, Guangdong Pharmaceutical University, Guangzhou, Guangdong 510006, China
- Hong Chen
- Center for Bioresources & Drug Discovery and School of Biosciences & Biopharmaceutics, Guangdong Pharmaceutical University, Guangzhou, Guangdong 510006, China
- Mingqi He
- Center for Bioresources & Drug Discovery and School of Biosciences & Biopharmaceutics, Guangdong Pharmaceutical University, Guangzhou, Guangdong 510006, China
- Shixin Zhu
- Center for Bioresources & Drug Discovery and School of Biosciences & Biopharmaceutics, Guangdong Pharmaceutical University, Guangzhou, Guangdong 510006, China
- Xin Ma
- Institute of Forensic Science, Ministry of Public Security, Beijing 100038, China
- Beijing Engineering Research Center of Crime Scene Evidence Examination, Institute of Forensic Science, Beijing 100038, China
- Li Jiang
- Institute of Forensic Science, Ministry of Public Security, Beijing 100038, China
- Beijing Engineering Research Center of Crime Scene Evidence Examination, Institute of Forensic Science, Beijing 100038, China
- Haifeng Li
- Center for Bioresources & Drug Discovery and School of Biosciences & Biopharmaceutics, Guangdong Pharmaceutical University, Guangzhou, Guangdong 510006, China

29
Song C, Lin XL, Wang Q, Wang XH. DNA barcodes successfully delimit morphospecies in a superdiverse insect genus. Zool Scr 2018. [DOI: 10.1111/zsc.12284] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Indexed: 11/27/2022]
Affiliation(s)
- Chao Song
- College of Life Sciences, Nankai University, Tianjin, China
- Xiao-Long Lin
- Department of Natural History, NTNU University Museum, Norwegian University of Science and Technology, Trondheim, Norway
- Qian Wang
- Tianjin Key Laboratory of Aqua-Ecology & Aquaculture, College of Fisheries, Tianjin Agricultural University, Tianjin, China
- Xin-Hua Wang
- College of Life Sciences, Nankai University, Tianjin, China

30
Tahir A, Hussain F, Ahmed N, Ghorbani A, Jamil A. Assessing universality of DNA barcoding in geographically isolated selected desert medicinal species of Fabaceae and Poaceae. PeerJ 2018; 6:e4499. [PMID: 29576968 PMCID: PMC5855882 DOI: 10.7717/peerj.4499] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/08/2017] [Accepted: 02/21/2018] [Indexed: 01/14/2023]
Abstract
In pursuit of fast and accurate species-level molecular identification methods, we tested seven DNA barcodes, namely the single regions ITS2, matK and rbcLa and the combinations ITS2+matK, ITS2+rbcLa, matK+rbcLa and ITS2+matK+rbcLa, for their capacity to identify frequently consumed but geographically isolated medicinal species of Fabaceae and Poaceae indigenous to the Cholistan desert. Data were analysed by BLASTn sequence similarity, pairwise sequence divergence in TAXONDNA, and phylogenetic (neighbour-joining and maximum-likelihood trees) methods. Comparison of the barcode regions showed that ITS2 had the highest number of variable sites (209/360 for the tested Fabaceae and 106/365 for the Poaceae species), the highest species-level identification (40%) in the BLASTn procedure, a distinct DNA barcoding gap, 100% correct species identification with the best match (BM) and best close match (BCM) functions of TAXONDNA, and a clear clustering pattern with high nodal support in the phylogenetic trees of both families. ITS2+matK+rbcLa followed ITS2 in species-level identification capacity. The study concludes by advocating DNA barcoding as an effective tool for species identification and ITS2 as the best barcode region for identifying medicinal species of Fabaceae and Poaceae. The research has practical potential in pharmacovigilance, the trade of medicinal plants and biodiversity conservation.
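Figures such as 209/360 read most naturally as the number of variable (polymorphic) alignment columns out of the aligned length. Assuming that reading, a short Python sketch of the count follows; the abstract does not say how gaps and ambiguous bases were treated, so ignoring gaps, N and ? is an assumption here, and other ambiguity codes (for example R or Y) would be counted as distinct states. The function name and toy alignment are hypothetical.

```python
def count_variable_sites(alignment, missing="-N?"):
    """Count alignment columns with more than one nucleotide state.

    Characters in `missing` (gaps, N, ?) are ignored when deciding whether
    a column is variable. Returns (variable_sites, alignment_length)."""
    if not alignment:
        raise ValueError("empty alignment")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    variable = 0
    for column in zip(*(seq.upper() for seq in alignment)):
        states = {base for base in column if base not in missing}
        if len(states) > 1:
            variable += 1
    return variable, length


# Hypothetical three-sequence alignment
print(count_variable_sites(["ACGTA", "ACGTT", "AGGT-"]))  # (2, 5)
```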
Affiliation(s)
- Aisha Tahir
- Department of Biochemistry, Faculty of Science, University of Agriculture, Faisalabad, Pakistan
- Fatma Hussain
- Department of Biochemistry, Faculty of Science, University of Agriculture, Faisalabad, Pakistan
- Nisar Ahmed
- Centre of Agricultural Biochemistry and Biotechnology, University of Agriculture, Faisalabad, Pakistan
- Amer Jamil
- Department of Biochemistry, Faculty of Science, University of Agriculture, Faisalabad, Pakistan

31
Shi ZY, Yang CQ, Hao MD, Wang XY, Ward RD, Zhang AB. FuzzyID2: A software package for large data set species identification via barcoding and metabarcoding using hidden Markov models and fuzzy set methods. Mol Ecol Resour 2017; 18:666-675. [DOI: 10.1111/1755-0998.12738] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 06/10/2016] [Revised: 10/20/2017] [Accepted: 11/12/2017] [Indexed: 11/29/2022]
Affiliation(s)
- Zhi-yong Shi
- College of Life Sciences, Capital Normal University, Beijing, China
- Cai-qing Yang
- College of Life Sciences, Capital Normal University, Beijing, China
- Meng-di Hao
- College of Life Sciences, Capital Normal University, Beijing, China
- Xiao-yang Wang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China
- Kunming College of Life Sciences, University of Chinese Academy of Sciences, Kunming, Yunnan, China
- Robert D. Ward
- CSIRO National Research Collections Australia, Hobart, TAS, Australia
- Ai-bing Zhang
- College of Life Sciences, Capital Normal University, Beijing, China

32
Yu M, Jiao L, Guo J, Wiedenhoeft AC, He T, Jiang X, Yin Y. DNA barcoding of vouchered xylarium wood specimens of nine endangered Dalbergia species. Planta 2017; 246:1165-1176. [PMID: 28825134 DOI: 10.1007/s00425-017-2758-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 05/31/2017] [Accepted: 08/08/2017] [Indexed: 06/07/2023]
Abstract
ITS2 + trnH-psbA was the best DNA barcode combination for resolving the Dalbergia wood species studied. We demonstrate the feasibility of building a DNA barcode reference database using xylarium wood specimens. The increase in illegal logging and timber trade of CITES-listed tropical species necessitates the development of unambiguous species-level identification methods. For these methods to be fully functional and deployable for law enforcement, they must work on wood or wood products. DNA barcoding of wood has been promoted as a promising tool for species identification; however, the main barrier to its extensive application is the lack of a comprehensive and reliable DNA reference library of barcodes from wood. In this study, xylarium wood specimens of nine Dalbergia species were selected from the Wood Collection of the Chinese Academy of Forestry, and DNA was extracted from them for PCR amplification of eight potential DNA barcode sequences (ITS2, matK, trnL, trnH-psbA, trnV-trnM1, trnV-trnM2, trnC-petN, and trnS-trnG). The barcodes were tested singly and in combination for species-level discrimination ability by tree-based [neighbor-joining (NJ)] and distance-based (TaxonDNA) methods. Combinations of DNA barcodes discriminated the Dalbergia species studied better than any single DNA marker, with the two-marker combination ITS2 + trnH-psbA analyzed with NJ trees performing best (100% accuracy). These barcodes are relatively short regions (<350 bp), and amplification succeeded at a high rate (≥90%) using wood as the source material, a prerequisite for applying DNA barcoding to the timber trade. The present results demonstrate the feasibility of using vouchered xylarium specimens to build DNA barcoding reference databases.
Affiliation(s)
- Min Yu
- Department of Wood Anatomy and Utilization, Chinese Research Institute of Wood Industry, Chinese Academy of Forestry, Beijing, 100091, China
- Wood Collections (WOODPEDIA), Chinese Academy of Forestry, Beijing, 100091, China
- Lichao Jiao
- Department of Wood Anatomy and Utilization, Chinese Research Institute of Wood Industry, Chinese Academy of Forestry, Beijing, 100091, China
- Wood Collections (WOODPEDIA), Chinese Academy of Forestry, Beijing, 100091, China
- Juan Guo
- Department of Wood Anatomy and Utilization, Chinese Research Institute of Wood Industry, Chinese Academy of Forestry, Beijing, 100091, China
- Wood Collections (WOODPEDIA), Chinese Academy of Forestry, Beijing, 100091, China
- Alex C Wiedenhoeft
- Center for Wood Anatomy Research, USDA Forest Service, Forest Products Laboratory, Madison, WI, 53726, USA
- Department of Botany, University of Wisconsin, Madison, WI, 53706, USA
- Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN, 47907, USA
- Ciências Biológicas (Botânica), Universidade Estadual Paulista, Botucatu, São Paulo, Brazil
- Tuo He
- Department of Wood Anatomy and Utilization, Chinese Research Institute of Wood Industry, Chinese Academy of Forestry, Beijing, 100091, China
- Wood Collections (WOODPEDIA), Chinese Academy of Forestry, Beijing, 100091, China
- Xiaomei Jiang
- Department of Wood Anatomy and Utilization, Chinese Research Institute of Wood Industry, Chinese Academy of Forestry, Beijing, 100091, China
- Wood Collections (WOODPEDIA), Chinese Academy of Forestry, Beijing, 100091, China
- Yafang Yin
- Department of Wood Anatomy and Utilization, Chinese Research Institute of Wood Industry, Chinese Academy of Forestry, Beijing, 100091, China
- Wood Collections (WOODPEDIA), Chinese Academy of Forestry, Beijing, 100091, China

33
Development of 21 polymorphic microsatellite markers for the black-banded sea krait, Laticauda semifasciata (Elapidae: Laticaudinae), and cross-species amplification for two other congeneric species. Genes Genomics 2017; 40:447-454. [DOI: 10.1007/s13258-017-0626-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/14/2017] [Accepted: 10/24/2017] [Indexed: 12/21/2022]

34
Mishra P, Kumar A, Sivaraman G, Shukla AK, Kaliamoorthy R, Slater A, Velusamy S. Character-based DNA barcoding for authentication and conservation of IUCN Red listed threatened species of genus Decalepis (Apocynaceae). Sci Rep 2017; 7:14910. [PMID: 29097709 PMCID: PMC5668324 DOI: 10.1038/s41598-017-14887-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 01/11/2017] [Accepted: 10/09/2017] [Indexed: 11/09/2022]
Abstract
The steno-endemic species of the genus Decalepis are highly threatened by destructive wild harvesting. The medicinally important fleshy tuberous roots of Decalepis hamiltonii are traded as a substitute for Hemidesmus indicus to meet international market demand. In addition, the tuberous roots of all three Decalepis species have similar exudates and texture, which limits the ability of conventional techniques alone to authenticate species accurately. This study was undertaken to generate DNA barcodes that could be used to monitor and curtail the illegal trade of these endangered species. A DNA barcode reference library was developed on the BOLD database platform for the candidate barcodes rbcL, matK, psbA-trnH, ITS and ITS2. With matK and ITS, the average intraspecific variation (0-0.27%) was lower than the distance to the nearest neighbour (0.4-11.67%). Anchoring the coding region rbcL in a multigene tiered approach, the combination rbcL + matK + ITS yielded 100% species resolution with the fewest loci, using either PAUP or BLOG to support a character-based approach. A species-specific SNP position (230 bp) in the matK region that is characteristic of D. hamiltonii could be used to design specific assays, enhancing its applicability for direct use in CITES enforcement to distinguish it from H. indicus.
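Comparing intraspecific variation with the distance to the nearest neighbour is the usual barcode gap check. A minimal Python sketch follows, assuming pre-aligned sequences and uncorrected p-distance (a K2P function could be passed as `dist` instead); the record format, function names and toy data are hypothetical. The abstract compares average intraspecific distances, whereas the `gap` flag here uses the stricter maximum intraspecific distance.

```python
from collections import defaultdict
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, ignoring gaps and non-ACGT characters."""
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def barcode_gap(records, dist=p_distance):
    """records: list of (species, aligned_sequence) pairs.

    For each species, report mean and maximum intraspecific distance and the
    distance to the nearest heterospecific sequence (nearest neighbour)."""
    intra = defaultdict(list)
    nearest = defaultdict(lambda: float("inf"))
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = dist(s1, s2)
        if sp1 == sp2:
            intra[sp1].append(d)
        else:
            nearest[sp1] = min(nearest[sp1], d)
            nearest[sp2] = min(nearest[sp2], d)
    summary = {}
    for sp in {sp for sp, _ in records}:
        # Singletons have no intraspecific comparison; 0.0 is a placeholder.
        dists = intra.get(sp, [0.0])
        summary[sp] = {
            "mean_intra": sum(dists) / len(dists),
            "max_intra": max(dists),
            "nearest_neighbour": nearest[sp],
            "gap": max(dists) < nearest[sp],
        }
    return summary


# Hypothetical toy records for two species
toy = [("A", "AAAAAAAAAA"), ("A", "AAAAAAAAAC"),
       ("B", "CCCAAAAAAA"), ("B", "CCCAAAAAAC")]
for species, stats in sorted(barcode_gap(toy).items()):
    print(species, stats)
```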
Affiliation(s)
- Priyanka Mishra
- Plant Biology and Systematics, CSIR - Central Institute of Medicinal and Aromatic Plants, Research Center, Allalsandra, GKVK Post, Bengaluru, 560065, Karnataka, India
- Amit Kumar
- Plant Biology and Systematics, CSIR - Central Institute of Medicinal and Aromatic Plants, Research Center, Allalsandra, GKVK Post, Bengaluru, 560065, Karnataka, India
- Gokul Sivaraman
- Plant Biology and Systematics, CSIR - Central Institute of Medicinal and Aromatic Plants, Research Center, Allalsandra, GKVK Post, Bengaluru, 560065, Karnataka, India
- Ashutosh K Shukla
- Biotechnology Division, CSIR - Central Institute of Medicinal and Aromatic Plants, P.O. CIMAP, Lucknow, 226015, Uttar Pradesh, India
- Ravikumar Kaliamoorthy
- School of Conservation, TransDisciplinary University, 74/2, Jarakabande Kaval, Post Attur, Via Yelahanka, Bangalore, 560064, Karnataka, India
- Adrian Slater
- Biomolecular Technology Group, Faculty of Health and Life Sciences, De Montfort University, Leicester, LE1 9BH, UK
- Sundaresan Velusamy
- Plant Biology and Systematics, CSIR - Central Institute of Medicinal and Aromatic Plants, Research Center, Allalsandra, GKVK Post, Bengaluru, 560065, Karnataka, India

35
Birch JL, Walsh NG, Cantrill DJ, Holmes GD, Murphy DJ. Testing efficacy of distance and tree-based methods for DNA barcoding of grasses (Poaceae tribe Poeae) in Australia. PLoS One 2017; 12:e0186259. [PMID: 29084279 PMCID: PMC5662090 DOI: 10.1371/journal.pone.0186259] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 05/12/2017] [Accepted: 09/28/2017] [Indexed: 01/09/2023]
Abstract
In Australia, Poaceae tribe Poeae are represented by 19 genera and 99 species, including economically and environmentally important native and introduced pasture grasses [e.g. Poa (Tussock-grasses) and Lolium (Ryegrasses)]. We used this tribe, which is well characterised with regard to morphological diversity and evolutionary relationships, to test the efficacy of DNA barcoding methods. A reference library was generated that included 93.9% of species in Australia (408 individuals, [Formula: see text] = 3.7 individuals per species). Molecular data were generated for the official plant barcoding markers (rbcL, matK) and the nuclear ribosomal internal transcribed spacer (ITS) region. We investigated the accuracy of specimen identifications using distance-based (nearest neighbour, best close match, and threshold identification) and tree-based (maximum likelihood, Bayesian inference) methods, and applied species discovery methods (automatic barcode gap discovery, Poisson tree processes) to the molecular data to assess congruence with recognised species. Across all methods, the success rate of specimen identification was high at genus rank (87.5-99.5%) and low at species rank (25.6-44.6%). Distance- and tree-based methods were equally ineffective at identifying specimens to species rank (26.1-44.6% and 25.6-31.3%, respectively). The ITS marker achieved the highest success rate for specimen identification at both generic and species ranks across the majority of methods. For distance-based analyses, the best close match method provided the greatest accuracy for identification of individuals, with a high percentage of "correct" (97.6%) and a low percentage of "incorrect" (0.3%) generic identifications based on the ITS marker. For tribe Poeae, and likely for other grass lineages, sequence data in the standard DNA barcode markers are not variable enough for accurate identification of specimens to species rank.
For recently diverged grass species, similar challenges are encountered when applying genetic and morphological data to species delimitation, with taxonomic signal limited by extensive infraspecific variation and shared polymorphisms among species in both data types.
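The best-close-match criterion can be sketched as follows. This is a simplified reading of the approach implemented in TaxonDNA, not its code: TaxonDNA derives the threshold from the distribution of pairwise intraspecific distances, whereas here it is simply passed in, and the function names, tie tolerance, toy sequences and species labels are all hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing sites, ignoring gaps and non-ACGT characters."""
    sites = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in sites) / len(sites)


def best_close_match(query, references, threshold, dist=p_distance, tol=1e-12):
    """Simplified best-close-match identification.

    references: list of (species, aligned_sequence). Returns
    ("no_match", None) if the closest reference is farther than threshold,
    ("identified", species) if all equally close best matches share one
    species, and ("ambiguous", [species, ...]) otherwise."""
    if not references:
        raise ValueError("empty reference library")
    scored = [(dist(query, seq), sp) for sp, seq in references]
    best = min(d for d, _ in scored)
    if best > threshold:
        return ("no_match", None)
    tied = {sp for d, sp in scored if d - best <= tol}
    if len(tied) == 1:
        return ("identified", tied.pop())
    return ("ambiguous", sorted(tied))


# Hypothetical toy reference library
refs = [("A", "ACGTACGTAC"), ("A", "ACGTACGTAA"), ("B", "TTGTACGTAC")]
print(best_close_match("ACGTACGTAC", refs, threshold=0.05))
```

Returning "no_match" rather than forcing an assignment is what distinguishes this criterion from a plain nearest-neighbour (best match) rule, which always reports the closest reference however distant it is.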
Affiliation(s)
- Joanne L. Birch
- Royal Botanic Gardens Victoria, Melbourne, Victoria, Australia

36
Zheng X, Zhang P, Liao B, Li J, Liu X, Shi Y, Cheng J, Lai Z, Xu J, Chen S. A Comprehensive Quality Evaluation System for Complex Herbal Medicine Using PacBio Sequencing, PCR-Denaturing Gradient Gel Electrophoresis, and Several Chemical Approaches. Front Plant Sci 2017; 8:1578. [PMID: 28955365 PMCID: PMC5601397 DOI: 10.3389/fpls.2017.01578] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/23/2017] [Accepted: 08/29/2017] [Indexed: 06/01/2023]
Abstract
Herbal medicine is a major component of complementary and alternative medicine, contributing significantly to the health of many people and communities. Quality control of herbal medicine is crucial to ensure that it is safe to use. Here, we investigated a comprehensive quality evaluation system for a classic herbal medicine, Danggui Buxue Formula, applying genetics-based and analytical chemistry approaches to authenticate its samples and evaluate their quality. For authenticity, we successfully applied two novel technologies, third-generation sequencing and PCR-DGGE (denaturing gradient gel electrophoresis), to analyze the ingredient composition of the tested samples. For quality evaluation, we used high-performance liquid chromatography assays to determine the content of chemical markers, which helps estimate the dosage relationship between the formula's two raw materials, the plant roots of Huangqi and Danggui. Surveys for several exogenous contaminants were then conducted to further assess the efficacy and safety of the samples. In conclusion, the quality evaluation system demonstrated here can potentially address the authenticity, quality, and safety of herbal medicines, providing novel insight for improving their overall quality control. Highlight: We established a comprehensive quality evaluation system for herbal medicine by combining two genetics-based approaches, third-generation sequencing and DGGE (denaturing gradient gel electrophoresis), with analytical chemistry approaches to authenticate the samples and characterize their quality.
Affiliation(s)
- Xiasheng Zheng
- Key Laboratory of Beijing for Identification and Safety Evaluation of Chinese Medicine, Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Key Laboratory of Technologies and Applications of Ultrafine Granular Powder of Herbal Medicine, State Administration of Traditional Chinese Medicine, Zhongshan Zhongzhi Pharmaceutical Group Limited, Zhongshan, China
- Guangdong Provincial Key Laboratory of New Drug Development and Research of Chinese Medicine, Guangzhou University of Chinese Medicine, Guangzhou, China
- Peng Zhang
- Key Laboratory of Beijing for Identification and Safety Evaluation of Chinese Medicine, Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing, China
- Baosheng Liao
- Key Laboratory of Beijing for Identification and Safety Evaluation of Chinese Medicine, Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Jing Li
- Traditional Chinese Medicine Gynecology Laboratory in Lingnan Medical Research Center, Guangzhou University of Chinese Medicine, Guangzhou, China
- Xingyun Liu
- Key Laboratory of Technologies and Applications of Ultrafine Granular Powder of Herbal Medicine, State Administration of Traditional Chinese Medicine, Zhongshan Zhongzhi Pharmaceutical Group Limited, Zhongshan, China
- Yuhua Shi
- Key Laboratory of Beijing for Identification and Safety Evaluation of Chinese Medicine, Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Jinle Cheng
- Key Laboratory of Technologies and Applications of Ultrafine Granular Powder of Herbal Medicine, State Administration of Traditional Chinese Medicine, Zhongshan Zhongzhi Pharmaceutical Group Limited, Zhongshan, China
- Zhitian Lai
- Key Laboratory of Technologies and Applications of Ultrafine Granular Powder of Herbal Medicine, State Administration of Traditional Chinese Medicine, Zhongshan Zhongzhi Pharmaceutical Group Limited, Zhongshan, China
- Jiang Xu
- Key Laboratory of Beijing for Identification and Safety Evaluation of Chinese Medicine, Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
- Shilin Chen
- Key Laboratory of Beijing for Identification and Safety Evaluation of Chinese Medicine, Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing, China
37
The utility of mtDNA and rDNA for barcoding and phylogeny of plant-parasitic nematodes from Longidoridae (Nematoda, Enoplea). Sci Rep 2017; 7:10905. [PMID: 28883648 PMCID: PMC5589882 DOI: 10.1038/s41598-017-11085-4] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 04/10/2017] [Accepted: 08/11/2017] [Indexed: 11/15/2022]
Abstract
The traditional identification of plant-parasitic nematode species by morphological and morphometric study is very difficult because high morphological variability can lead to considerable overlap of many characteristics and to their ambiguous interpretation. It is therefore essential to implement approaches that ensure accurate species identification. DNA barcoding aids identification and advances species discovery. This study evaluated the mitochondrial marker cytochrome c oxidase subunit 1 (coxI) as a barcode for Longidoridae species identification and as a phylogenetic marker. The results showed that mitochondrial and ribosomal markers could be used as barcoding markers, except for some species of the Xiphinema americanum group. The ITS1 region showed promise for barcoding-based species identification because of clear molecular variability among species. Some species showed considerable molecular variability in coxI. Analysis of the newly provided sequences together with those deposited in GenBank revealed plausible misidentifications, so the use of voucher and topotype specimens is a priority for this group of nematodes. The use of coxI and the D2 and D3 expansion segments of the 28S rRNA gene did not resolve the phylogeny at the genus level.
38
Hosein FN, Austin N, Maharaj S, Johnson W, Rostant L, Ramdass AC, Rampersad SN. Utility of DNA barcoding to identify rare endemic vascular plant species in Trinidad. Ecol Evol 2017; 7:7311-7333. [PMID: 28944019 PMCID: PMC5606854 DOI: 10.1002/ece3.3220] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 08/10/2016] [Revised: 05/17/2017] [Accepted: 06/12/2017] [Indexed: 02/06/2023]
Abstract
The islands of the Caribbean are considered to be a "biodiversity hotspot." Collectively, a high level of endemism has been reported for several plant groups in this region. Biodiversity conservation should, in part, be informed by the taxonomy, population status, and distribution of flora. One taxonomic impediment to species inventory and management is correct identification, as conventional morphology-based assessment is subject to several caveats. DNA barcoding can be a useful tool to quickly and accurately identify species and has the potential to prompt the discovery of new species. In this study, the ability of DNA barcoding to confirm the identities of 14 endangered endemic vascular plant species in Trinidad was assessed using three DNA barcodes (matK, rbcL, and rpoC1). Herbarium identifications had previously been made for all species under study. The matK, rbcL, and rpoC1 markers successfully amplified target regions for seven of the 14 species. The rpoC1 sequences required extensive editing and were unusable. The rbcL primers produced the cleanest reads; however, matK appeared to be superior to rbcL on a number of parameters, including the level of DNA polymorphism in the sequences, genetic distance, reference library coverage based on BLASTN statistics, direct sequence comparisons under the "best match" and "best close match" criteria, and, finally, the degree of clustering with moderate to strong bootstrap support (>60%) in neighbor-joining tree-based comparisons. The performance of both markers appeared to be species-specific for the parameters examined. Overall, the Trinidad sequences were accurately identified to the genus level for all endemic plant species successfully amplified and sequenced with both the matK and rbcL markers. DNA barcoding can contribute to taxonomic and biodiversity research and will complement efforts to select taxa for various molecular ecology and population genetics studies.
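The "best match" and "best close match" criteria mentioned above compare a query barcode with every reference sequence in a library. The Python sketch below is a simplified illustration under stated assumptions, not the authors' pipeline: sequences are assumed to be pre-aligned, distances are uncorrected p-distances that skip gap and N sites, a tie between different species counts as "ambiguous", and the best-close-match threshold is a user-supplied parameter. All function names are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences, skipping gap/N sites."""
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue  # ignore sites that are missing or ambiguous in either sequence
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else 1.0


def best_match(query: str, references: list[tuple[str, str]]) -> tuple[str, float]:
    """Assign the species of the closest reference; a tie between species is ambiguous."""
    scored = [(p_distance(query, seq), species) for species, seq in references]
    best = min(distance for distance, _ in scored)
    tied_species = {species for distance, species in scored if distance == best}
    label = tied_species.pop() if len(tied_species) == 1 else "ambiguous"
    return label, best


def best_close_match(query: str, references: list[tuple[str, str]],
                     threshold: float) -> tuple[str, float]:
    """Like best_match, but report 'no id' when the closest reference is too distant."""
    label, distance = best_match(query, references)
    if distance > threshold:
        return "no id", distance
    return label, distance


refs = [("A", "ACGTACGTAC"), ("A", "ACGTACGTAA"), ("B", "TCGTTCGTAC")]
print(best_close_match("ACGTACGTAC", refs, threshold=0.03))  # ('A', 0.0)
```

In the published criterion the threshold is usually derived from the distribution of intraspecific distances in the reference library; here it is simply passed in, which keeps the sketch short.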
Affiliation(s)
- Fazeeda N. Hosein
- Faculty of Science and Technology, Department of Life Sciences, The University of the West Indies, St. Augustine, Trinidad and Tobago, West Indies
- Nigel Austin
- Faculty of Science and Technology, Department of Life Sciences, The University of the West Indies, St. Augustine, Trinidad and Tobago, West Indies
- Shobha Maharaj
- Faculty of Science and Technology, Department of Life Sciences, The University of the West Indies, St. Augustine, Trinidad and Tobago, West Indies
- Winston Johnson
- Faculty of Science and Technology, Department of Life Sciences, The University of the West Indies, St. Augustine, Trinidad and Tobago, West Indies
- Luke Rostant
- Faculty of Science and Technology, Department of Life Sciences, The University of the West Indies, St. Augustine, Trinidad and Tobago, West Indies
- Amanda C. Ramdass
- Faculty of Science and Technology, Department of Life Sciences, The University of the West Indies, St. Augustine, Trinidad and Tobago, West Indies
- Sephra N. Rampersad
- Faculty of Science and Technology, Department of Life Sciences, The University of the West Indies, St. Augustine, Trinidad and Tobago, West Indies
39
Mishra P, Kumar A, Nagireddy A, Shukla AK, Sundaresan V. Evaluation of single and multilocus DNA barcodes towards species delineation in complex tree genus Terminalia. PLoS One 2017; 12:e0182836. [PMID: 28829803 PMCID: PMC5567895 DOI: 10.1371/journal.pone.0182836] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 01/03/2017] [Accepted: 07/25/2017] [Indexed: 11/19/2022]
Abstract
DNA barcoding is used as a universal tool for delimiting species boundaries in taxonomically challenging groups, with different plastid and nuclear regions (rbcL, matK, ITS, and psbA-trnH) recommended as primary DNA barcodes for plants. We evaluated the feasibility of using these regions in the species-rich, pantropical genus Terminalia, whose complex taxonomy includes various overlapping morphotypes. Terminalia bellerica and T. chebula are ingredients of the famous Ayurvedic Rasayana formulation Triphala, used for detoxification and rejuvenation. High demand for extracted phytochemicals, as well as the high trade value of several species, makes correct identification of traded plant material essential. Three different analytical methods with single and multilocus barcoding regions were tested to develop a DNA barcode reference library from 222 individuals representing 41 Terminalia species. All the single barcodes tested had lower discriminatory power than the multilocus regions, and the combination matK+ITS had the highest resolution rate (94.44%). With matK and ITS, the average intraspecific distance (0.0188±0.0019) was smaller than the distance to the nearest neighbour (0.106±0.009). Distance-based Neighbour Joining analysis outperformed the character-based Maximum Parsimony method in identifying traded species prone to adulteration, such as T. arjuna, T. chebula, and T. tomentosa. rbcL proved to be a highly conserved region, with only 3.45% variability across all sequences. The recommended barcode combination, rbcL+matK, performed poorly in the genus Terminalia. Given the limited resolution of single regions, the present study proposes the combination matK+ITS as the most successful barcode for Terminalia.
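The comparison above, average intraspecific distance versus distance to the nearest neighbour, is the usual check for a "barcoding gap". The sketch below computes both quantities per species from aligned sequences. It assumes an uncorrected p-distance on equal-length sequences (the study may have used a different distance model), and it defines the hypothetical `has_gap` flag as "maximum intraspecific distance is smaller than the nearest-neighbour distance", which is one common convention rather than the paper's stated rule.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap_summary(samples: dict[str, list[str]]) -> dict[str, dict]:
    """Per species: mean/max intraspecific distance and nearest-neighbour distance."""
    summary = {}
    for species, seqs in samples.items():
        intra = [p_distance(a, b) for a, b in combinations(seqs, 2)]
        # smallest distance from any member of this species to any other species
        nearest = min(
            (p_distance(a, b)
             for other, other_seqs in samples.items() if other != species
             for a in seqs for b in other_seqs),
            default=float("inf"),
        )
        max_intra = max(intra, default=0.0)
        summary[species] = {
            "mean_intra": sum(intra) / len(intra) if intra else 0.0,
            "max_intra": max_intra,
            "nearest_neighbour": nearest,
            "has_gap": max_intra < nearest,
        }
    return summary


samples = {"T_a": ["AAAAAAAAAA", "AAAAAAAAAC"], "T_b": ["CCCCCAAAAA", "CCCCCAAAAC"]}
print(barcode_gap_summary(samples)["T_a"])
# {'mean_intra': 0.1, 'max_intra': 0.1, 'nearest_neighbour': 0.5, 'has_gap': True}
```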
Affiliation(s)
- Priyanka Mishra
- Plant Biology and Systematics, CSIR-Central Institute of Medicinal and Aromatic Plants, Research Center, Bengaluru, Karnataka, India
- Amit Kumar
- Plant Biology and Systematics, CSIR-Central Institute of Medicinal and Aromatic Plants, Research Center, Bengaluru, Karnataka, India
- Akshitha Nagireddy
- Plant Biology and Systematics, CSIR-Central Institute of Medicinal and Aromatic Plants, Research Center, Bengaluru, Karnataka, India
- Ashutosh K. Shukla
- Biotechnology Division, CSIR-Central Institute of Medicinal and Aromatic Plants, Lucknow, Uttar Pradesh, India
- Velusamy Sundaresan
- Plant Biology and Systematics, CSIR-Central Institute of Medicinal and Aromatic Plants, Research Center, Bengaluru, Karnataka, India
40
Lourenço J, Watkins ER, Obolski U, Peacock SJ, Morris C, Maiden MCJ, Gupta S. Lineage structure of Streptococcus pneumoniae may be driven by immune selection on the groEL heat-shock protein. Sci Rep 2017; 7:9023. [PMID: 28831154 PMCID: PMC5567354 DOI: 10.1038/s41598-017-08990-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 03/07/2017] [Accepted: 07/20/2017] [Indexed: 12/29/2022]
Abstract
Populations of Streptococcus pneumoniae (SP) are typically structured into groups of closely related organisms or lineages, but it is not clear whether they are maintained by selection or neutral processes. Here, we attempt to address this question by applying a machine learning technique to SP whole genomes. Our results indicate that lineages evolved through immune selection on the groEL chaperone protein. The groEL protein is part of the groESL operon and enables a large range of proteins to fold correctly within the physical environment of the nasopharynx, thereby explaining why lineage structure is so stable within SP despite high levels of genetic transfer. SP is also antigenically diverse, exhibiting a variety of distinct capsular serotypes. Associations exist between lineage and capsular serotype but these can be easily perturbed, such as by vaccination. Overall, our analyses indicate that the evolution of SP can be conceptualized as the rearrangement of modular functional units occurring on several different timescales under different pressures: some patterns have locked in early (such as the epistatic interactions between groESL and a constellation of other genes) and preserve the differentiation of lineages, while others (such as the associations between capsular serotype and lineage) remain in continuous flux.
Affiliation(s)
- José Lourenço
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- Uri Obolski
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- Samuel J Peacock
- Department of Zoology, University of Oxford, Oxford, United Kingdom
- Sunetra Gupta
- Department of Zoology, University of Oxford, Oxford, United Kingdom
41
Izan S, Esselink D, Visser RGF, Smulders MJM, Borm T. De Novo Assembly of Complete Chloroplast Genomes from Non-model Species Based on a K-mer Frequency-Based Selection of Chloroplast Reads from Total DNA Sequences. Front Plant Sci 2017; 8:1271. [PMID: 28824658 PMCID: PMC5539191 DOI: 10.3389/fpls.2017.01271] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 10/12/2016] [Accepted: 07/05/2017] [Indexed: 05/11/2023]
Abstract
Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads derived from the chloroplast genome. Until now, these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This re-sequencing approach may select against structural differences between genomes, especially in non-model species for which no close relatives have been sequenced. The alternative approach is to assemble the chloroplast genome de novo from total genomic DNA sequences. In this study, we used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads and assembled them using a highly integrated and automated custom pipeline. Our strategy includes steps aimed at optimizing assemblies and filling gaps left by coverage variation in the WGS dataset. To demonstrate the universality of our approach, we assembled de novo three complete chloroplast genomes from plant species with a range of nuclear genome sizes: Solanum lycopersicum (0.9 Gb), Aegilops tauschii (4 Gb), and Paphiopedilum henryanum (25 Gb). We also highlight the need to optimize the choice of k and the amount of data used. This new and cost-effective method for de novo short-read assembly will facilitate the study of complete chloroplast genomes with more accurate analyses and inferences, especially in non-model plant genomes.
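The core idea of k-mer frequency-based read selection can be sketched as follows: chloroplast DNA is present in many copies per cell, so its k-mers occur far more often in a WGS read set than typical nuclear k-mers. This Python sketch counts k-mers across all reads and keeps reads whose k-mers are mostly high-frequency. It is only an illustration of the principle, not the authors' pipeline: the thresholds `min_count` and `min_fraction` are hypothetical parameters, reverse-complement (canonical) k-mers are not merged, and no assembly or gap filling is performed.

```python
from collections import Counter
from typing import Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def select_high_copy_reads(reads: list[str], k: int, min_count: int,
                           min_fraction: float) -> list[str]:
    """Keep reads in which at least min_fraction of k-mers occur >= min_count times overall."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))  # build the k-mer frequency table

    selected = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read is shorter than k
        high = sum(1 for kmer in read_kmers if counts[kmer] >= min_count)
        if high / len(read_kmers) >= min_fraction:
            selected.append(read)
    return selected


reads = ["ACGTACGT"] * 5 + ["TTGCAAGG"]
print(len(select_high_copy_reads(reads, k=4, min_count=5, min_fraction=0.8)))  # 5
```

As the abstract notes, the choice of k and the amount of input data matter: a k that is too small makes nuclear k-mers look repetitive, while too little data blurs the separation between high- and low-copy k-mers.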
Affiliation(s)
- Shairul Izan
- Plant Breeding, Wageningen University and Research, Wageningen, Netherlands
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Malaysia
- Danny Esselink
- Plant Breeding, Wageningen University and Research, Wageningen, Netherlands
- Theo Borm
- Plant Breeding, Wageningen University and Research, Wageningen, Netherlands
42
Differentiating Authentic Adenophorae Radix from Its Adulterants in Commercially-Processed Samples Using Multiplexed ITS Sequence-Based SCAR Markers. Appl Sci (Basel) 2017. [DOI: 10.3390/app7070660] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 12/17/2022]
43
Yang CH, Wu KC, Dahms HU, Chuang LY, Chang HW. Single nucleotide polymorphism barcoding of cytochrome c oxidase I sequences for discriminating 17 species of Columbidae by decision tree algorithm. Ecol Evol 2017; 7:4717-4725. [PMID: 28690801 PMCID: PMC5496562 DOI: 10.1002/ece3.3045] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 10/27/2016] [Accepted: 04/16/2017] [Indexed: 01/08/2023]
Abstract
DNA barcodes are widely used in taxonomy, systematics, species identification, food safety, and forensic science. Most conventional DNA barcodes contain the full sequence of a given barcoding gene, yet most of that sequence does not vary and is uninformative for a given group of taxa within a monophylum. Here we propose a method that reduces the number of noninformative nucleotides in a barcoding sequence of a major taxon, such as the prokaryotes or the eukaryotic animals, plants, or fungi. Genotyping the actual sequence differences, single nucleotide polymorphisms (SNPs), provides a tool for developing a rapid, reliable, and high-throughput assay to discriminate between known species. Here, we investigated SNPs as robust markers of genetic variation for identifying different pigeon species based on available cytochrome c oxidase I (COI) data. We propose a decision tree-based SNP barcoding (DTSB) algorithm in which SNP patterns are selected from the DNA barcoding sequences of several evolutionarily related species in order to identify a single species, using pigeons as an example. This approach can make use of any established barcoding system. As a first example, we applied DTSB to the mitochondrial COI sequences of 17 pigeon species (Columbidae, Aves) after sequence trimming and alignment. SNPs were selected according to the decision-tree rules to yield species-specific SNP barcodes. Using the DTSB method, the shortest barcode, about 11 bp long, was then generated to discriminate the 17 pigeon species. This method provides a sequence alignment and decision tree approach to parsimoniously assign a unique, shortest SNP barcode to any known species of a chosen monophyletic taxon for which a barcoding sequence is available.
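To make the idea of a short SNP barcode concrete, the sketch below picks a small set of alignment columns that together distinguish every species. It is explicitly not the DTSB algorithm: instead of building a decision tree, it swaps in a simple greedy heuristic that repeatedly chooses the column separating the most still-unresolved species pairs, so the result is not guaranteed to be the shortest possible barcode. It assumes one aligned consensus sequence per species; the function name is hypothetical and positions are 0-based.

```python
from itertools import combinations


def greedy_snp_barcode(consensus: dict[str, str]) -> tuple[list[int], dict[str, str]]:
    """Greedily pick alignment columns until every pair of species differs somewhere.

    consensus maps species name -> aligned sequence (all the same length).
    Returns the chosen 0-based columns and each species' short SNP barcode.
    """
    species = sorted(consensus)
    length = len(consensus[species[0]]) if species else 0
    unresolved = set(combinations(species, 2))
    chosen: list[int] = []
    while unresolved:
        best_pos, best_split = None, set()
        for pos in range(length):
            # still-unresolved species pairs that this column tells apart
            split = {(a, b) for a, b in unresolved if consensus[a][pos] != consensus[b][pos]}
            if len(split) > len(best_split):
                best_pos, best_split = pos, split
        if best_pos is None:
            raise ValueError("some species cannot be separated by any column")
        chosen.append(best_pos)
        unresolved -= best_split
    chosen.sort()
    barcodes = {sp: "".join(consensus[sp][p] for p in chosen) for sp in species}
    return chosen, barcodes


toy = {"s1": "AAAA", "s2": "AACA", "s3": "AGAA", "s4": "AGCA"}
print(greedy_snp_barcode(toy))
# ([1, 2], {'s1': 'AA', 's2': 'AC', 's3': 'GA', 's4': 'GC'})
```

A decision tree, as in DTSB, can reuse different columns on different branches, which may yield shorter species-specific barcodes than a single global column set like this one.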
Affiliation(s)
- Cheng-Hong Yang
- Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan
- Graduate Institute of Clinical Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
- Kuo-Chuan Wu
- Department of Electronic Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan
- Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan
- Hans-Uwe Dahms
- Department of Biomedical Science and Environmental Biology, Kaohsiung Medical University, Kaohsiung, Taiwan
- Li-Yeh Chuang
- Department of Chemical Engineering and Institute of Biotechnology and Chemical Engineering, I-Shou University, Kaohsiung, Taiwan
- Hsueh-Wei Chang
- Department of Biomedical Science and Environmental Biology, Kaohsiung Medical University, Kaohsiung, Taiwan
- Institute of Medical Science and Technology, National Sun Yat-sen University, Kaohsiung, Taiwan
- Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Research Center for Natural Products and Drug Development, Kaohsiung Medical University, Kaohsiung, Taiwan
44
Mishra P, Kumar A, Rodrigues V, Shukla AK, Sundaresan V. Feasibility of nuclear ribosomal region ITS1 over ITS2 in barcoding taxonomically challenging genera of subtribe Cassiinae (Fabaceae). PeerJ 2016; 4:e2638. [PMID: 27994958 PMCID: PMC5162394 DOI: 10.7717/peerj.2638] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 08/03/2016] [Accepted: 09/30/2016] [Indexed: 12/13/2022]
Abstract
PREMISE OF THE STUDY The internal transcribed spacer (ITS) region is situated between 18S and 26S in a polycistronic rRNA precursor transcript. It has proven to be the most commonly sequenced region across plant species for resolving phylogenetic relationships at shallow to deep taxonomic levels. Despite several taxonomic revisions of Cassiinae, a stable phylogeny remains elusive at the molecular level, particularly concerning the delineation of species in the genera Cassia, Senna, and Chamaecrista. This study compares the potential of ITS datasets (ITS1, ITS2, and concatenated) to resolve the underlying morphological disparity in these highly complex genera, in order to assess their discriminatory power as potential barcode candidates in Cassiinae. METHODOLOGY A combination of experimental data and an in-silico approach, based on threshold genetic distances, sequence similarity, and hierarchical tree-based methods, was used to assess the discriminating power of the ITS datasets for 18 different species of the Cassiinae complex. Lab-generated sequences were compared against those available in GenBank using BLAST, aligned with MUSCLE 3.8.31, and analysed in PAUP 4.0 and BEAST 1.8 using parsimony ratchet, maximum likelihood, and Bayesian inference (BI) methods of gene and species tree reconciliation with bootstrapping. The DNA barcoding gap was assessed with the Kimura two-parameter (K2P) distance model in TaxonDNA and MEGA. PRINCIPAL FINDINGS Based on the K2P distance, significant divergence between inter- and intraspecific genetic distances was observed, and a DNA barcoding gap was clearly present. The ITS1 region identified 81.63% and 90% of species using the TaxonDNA and BI methods, respectively. The PWG-distance method, based on simple pairwise matching, indicated the significance of ITS1, which had the highest numbers of variable (210) and informative (206) sites.
The BI tree-based methods outperformed the similarity-based methods, producing well-resolved phylogenetic trees with many nodes well supported by bootstrap analyses. CONCLUSION The reticulated phylogenetic hypothesis based on the ITS1 region largely supported the relationships among Cassiinae species established by traditional morphological methods. The ITS1 region showed higher discriminatory power and more desirable characteristics than ITS2 and ITS1+2, and is therefore proposed as the locus of choice. Considering the complexity of the group and the underlying biological ambiguities, the results presented here are encouraging for developing DNA barcoding as a useful tool for resolving taxonomic challenges in conjunction with a morphological framework.
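The Kimura two-parameter (K2P) distance used above separates transitions (A↔G, C↔T) from transversions (purine↔pyrimidine). With P the proportion of transition differences and Q the proportion of transversion differences, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The minimal Python sketch below applies that formula to two aligned sequences; skipping non-ACGT sites (gaps, N) is an assumption of this sketch, and tools such as MEGA or TaxonDNA may treat missing data differently.

```python
import math

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def k2p_distance(a: str, b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites where either sequence is not an unambiguous A/C/G/T base are skipped.
    """
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))  # 0.1116
```

One transition in ten sites gives a raw p-distance of 0.1, while the K2P correction returns about 0.1116, reflecting multiple substitutions that may have occurred at the same site.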
Affiliation(s)
- Priyanka Mishra
- Department of Plant Biology & Systematics, CSIR-Central Institute of Medicinal and Aromatic Plants, Research Center, Bangalore, Karnataka, India
- Amit Kumar
- Department of Plant Biology & Systematics, CSIR-Central Institute of Medicinal and Aromatic Plants, Research Center, Bangalore, Karnataka, India
- Vereena Rodrigues
- Department of Plant Biology & Systematics, CSIR-Central Institute of Medicinal and Aromatic Plants, Research Center, Bangalore, Karnataka, India
- Ashutosh K Shukla
- Biotechnology Division, CSIR-Central Institute of Medicinal and Aromatic Plants, Lucknow, Uttar Pradesh, India
- Velusamy Sundaresan
- Department of Plant Biology & Systematics, CSIR-Central Institute of Medicinal and Aromatic Plants, Research Center, Bangalore, Karnataka, India
45
Fiscon G, Weitschek E, Cella E, Lo Presti A, Giovanetti M, Babakir-Mina M, Ciotti M, Ciccozzi M, Pierangeli A, Bertolazzi P, Felici G. MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification. BioData Min 2016; 9:38. [PMID: 27980679 PMCID: PMC5139023 DOI: 10.1186/s13040-016-0116-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/04/2016] [Accepted: 11/20/2016] [Indexed: 12/04/2022]
Abstract
Background Continuous improvements in next-generation sequencing technologies have led to ever-increasing collections of genomic sequences, which are not easily characterized by biologists and whose analysis requires huge computational effort. Species classification has emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g., multiple-alignment, phylogenetic-tree, statistical, and character-based methods. Results We propose a supervised method based on a genetic algorithm to identify small genomic subsequences that discriminate among different species. The method identifies multiple subsequences of bounded length with the same information power in a given genomic region. The algorithm was successfully evaluated by integrating it into a rule-based classification framework and applying it to three different biological data sets: Influenza, Polyoma, and Rhino virus sequences. Conclusions We discovered a large number of small subsequences that can be used to identify each virus type with high accuracy and low computational time and that also help to characterize different genomic regions. With their length bounded to 20, our method found 1164 characterizing subsequences for all the Influenza virus subtypes, 194 for all the Polyoma viruses, and 11 for Rhino viruses. The abundance of small separating subsequences extracted for each genomic region may be an important support for quick and robust virus identification. Finally, useful biological information can be derived from the relative location and abundance of such subsequences along the different regions.
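MISSEL searches for short, bounded-length subsequences that separate one class of sequences from the others. The sketch below illustrates only the target concept and swaps in a different technique: instead of MISSEL's genetic algorithm, it exhaustively enumerates all substrings up to `max_len` and keeps those present in every sequence of a class and absent from all other classes. That brute-force approach is practical only for short sequences and small bounds, and unlike MISSEL it neither works region by region nor builds a rule-based classifier. The function names are hypothetical.

```python
def substrings(seq: str, max_len: int) -> set[str]:
    """All substrings of seq with length 1..max_len."""
    return {seq[i:i + n] for n in range(1, max_len + 1) for i in range(len(seq) - n + 1)}


def class_specific_subsequences(labelled: list[tuple[str, str]],
                                max_len: int) -> dict[str, set[str]]:
    """For each class, substrings found in all its sequences and in no other class."""
    by_class: dict[str, list[str]] = {}
    for label, seq in labelled:
        by_class.setdefault(label, []).append(seq)

    result = {}
    for label, seqs in by_class.items():
        # present in every sequence of this class
        shared = set.intersection(*(substrings(s, max_len) for s in seqs))
        # present in at least one sequence of any other class
        elsewhere = set().union(
            *(substrings(s, max_len)
              for other, other_seqs in by_class.items() if other != label
              for s in other_seqs)
        )
        result[label] = shared - elsewhere
    return result


data = [("X", "ACGTT"), ("X", "ACGTA"), ("Y", "TTTTT")]
print(class_specific_subsequences(data, max_len=3)["Y"])  # {'TTT'}
```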
Affiliation(s)
- Giulia Fiscon
- Institute of Systems Analysis and Computer Science A. Ruberti (IASI), National Research Council (CNR), Via dei Taurini 19, Rome, 00185 Italy
- Emanuel Weitschek
- Institute of Systems Analysis and Computer Science A. Ruberti (IASI), National Research Council (CNR), Via dei Taurini 19, Rome, 00185 Italy
- Department of Engineering, Uninettuno International University, Corso Vittorio Emanuele II 39, Rome, 00186 Italy
- Eleonora Cella
- Department of Infectious Diseases, Istituto Superiore di Sanita, Viale Regina Margherita 299, Rome, 00161 Italy
- Public Health and Infectious Diseases, Sapienza University, Piazzale Aldo Moro 5, Rome, 00185 Italy
- Alessandra Lo Presti
- Department of Infectious Diseases, Istituto Superiore di Sanita, Viale Regina Margherita 299, Rome, 00161 Italy
- Marta Giovanetti
- Department of Infectious Diseases, Istituto Superiore di Sanita, Viale Regina Margherita 299, Rome, 00161 Italy
- Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica 1, Rome, 00133 Italy
- Marco Ciotti
- Laboratory of Molecular Virology, Polyclinic Tor Vergata Foundation, Viale Oxford 81, Rome, 00133 Italy
- Massimo Ciccozzi
- Institute of Systems Analysis and Computer Science A. Ruberti (IASI), National Research Council (CNR), Via dei Taurini 19, Rome, 00185 Italy
- Department of Infectious Diseases, Istituto Superiore di Sanita, Viale Regina Margherita 299, Rome, 00161 Italy
- Alessandra Pierangeli
- Virology Laboratory, Department of Molecular Medicine, Sapienza University, Viale di Porta Tiburtina 2, Rome, 00185 Italy
- Paola Bertolazzi
- Institute of Systems Analysis and Computer Science A. Ruberti (IASI), National Research Council (CNR), Via dei Taurini 19, Rome, 00185 Italy
- Giovanni Felici
- Institute of Systems Analysis and Computer Science A. Ruberti (IASI), National Research Council (CNR), Via dei Taurini 19, Rome, 00185 Italy
46
Yang Z, Landry JF, Hebert PDN. A DNA Barcode Library for North American Pyraustinae (Lepidoptera: Pyraloidea: Crambidae). PLoS One 2016; 11:e0161449. [PMID: 27736878 PMCID: PMC5063472 DOI: 10.1371/journal.pone.0161449] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 04/15/2016] [Accepted: 08/06/2016] [Indexed: 11/24/2022]
Abstract
Although members of the crambid subfamily Pyraustinae are frequently important crop pests, their identification is often difficult because many species lack conspicuous diagnostic morphological characters. DNA barcoding employs sequence diversity in a short standardized gene region to facilitate specimen identification and species discovery. This study provides a DNA barcode reference library for North American pyraustines based on the analysis of 1589 sequences recovered from 137 nominal species, 87% of the fauna. Data from 125 species were barcode compliant (>500 bp, <1% ambiguous bases [N]), and 99 of these taxa each formed a distinct cluster assigned to a single BIN. The other 26 species were assigned to 56 BINs, reflecting frequent cases of deep intraspecific sequence divergence and a few instances of barcode sharing, for a total of 155 BINs. Two systems for OTU designation, ABGD and BIN, were examined to check the correspondence between current taxonomy and sequence clusters. The BIN system performed better than ABGD in delimiting closely related species, while OTU counts with ABGD were influenced by the value used for relative gap width. Species with low or no interspecific divergence may represent cases of unrecognized synonymy, whereas those with high intraspecific divergence require further taxonomic scrutiny, as they may involve cryptic diversity. The barcode library developed in this study will also help to advance understanding of relationships among species of Pyraustinae.
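The barcode-compliance criteria quoted above (>500 bp, <1% N) and the idea of grouping sequences into OTUs by distance can be sketched in a few lines. The compliance check follows the two numbers stated in the abstract only; the full BOLD compliance standard may include further criteria. The clustering step is a deliberately simplified stand-in: single-linkage clustering at a fixed, illustrative distance threshold, which is neither the BIN (RESL) algorithm nor ABGD. Function names and the threshold are hypothetical.

```python
def is_barcode_compliant(seq: str, min_length: int = 500, max_n_fraction: float = 0.01) -> bool:
    """True if the sequence is longer than min_length bp and has < max_n_fraction N bases."""
    bases = seq.upper().replace("-", "")
    if len(bases) <= min_length:
        return False
    return bases.count("N") / len(bases) < max_n_fraction


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage_clusters(seqs: list[str], threshold: float) -> list[list[int]]:
    """Group sequence indices so that any pair within threshold ends up in one cluster."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters: dict[int, list[int]] = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


print(is_barcode_compliant("A" * 600))  # True
print(single_linkage_clusters(["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC"], 0.1))  # [[0, 1], [2]]
```

Single linkage can chain distant sequences together through intermediates, which is one reason dedicated methods such as RESL refine clusters beyond a single fixed threshold.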
Affiliation(s)
- Zhaofu Yang
- Key laboratory of Plant Protection Resources and Pest Management, Ministry of Education, Northwest A&F University, Yangling, Shaanxi, China
- College of Plant Protection, Northwest A&F University, Yangling, Shaanxi, China
- Jean-François Landry
- Agriculture and Agri-Food Canada, Ottawa Research & Development Centre, Ottawa, Ontario, Canada
- Paul D. N. Hebert
- Centre for Biodiversity Genomics, Biodiversity Institute of Ontario, University of Guelph, Guelph, Ontario, Canada
47
Holovachov O. Metabarcoding of marine nematodes - evaluation of reference datasets used in tree-based taxonomy assignment approach. Biodivers Data J 2016:e10021. [PMID: 27932919 PMCID: PMC5136706 DOI: 10.3897/bdj.4.e10021] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 07/27/2016] [Accepted: 09/15/2016] [Indexed: 11/30/2022]
Abstract
Background Metabarcoding is becoming a common tool for assessing and comparing the diversity of organisms in environmental samples. Identification of OTUs is one of the critical steps in the process, and several taxonomy assignment methods have been proposed for this task. This publication evaluates the quality of reference datasets, alongside several alignment and phylogeny inference methods, used in one of these taxonomy assignment methods, the tree-based approach. This approach assigns anonymous OTUs to taxonomic categories based on the relative placement of OTUs and reference sequences on the cladogram and the support these placements receive. New information In the tree-based taxonomy assignment approach, reliable identification of anonymous OTUs depends on their placement in monophyletic, highly supported clades together with identified reference taxa. It therefore requires a high-quality reference dataset. Resolution of phylogenetic trees is strongly affected by the presence of erroneous sequences, as well as by the alignment and phylogeny inference methods used in the process. Two preparation steps are essential for the successful application of the tree-based taxonomy assignment approach. Completing these preparation steps is expected to decrease the number of unassigned OTUs and thus improve the results of the tree-based taxonomy assignment approach.
48
Meher PK, Sahu TK, Rao AR. Identification of species based on DNA barcode using k-mer feature vector and Random forest classifier. Gene 2016; 592:316-24. [PMID: 27393648 DOI: 10.1016/j.gene.2016.07.010] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 05/02/2016] [Revised: 07/02/2016] [Accepted: 07/04/2016] [Indexed: 11/17/2022]
Abstract
DNA barcoding is a molecular diagnostic method that allows automated and accurate identification of species based on a short, standardized fragment of DNA. This study develops a computational approach that identifies a species by comparing its barcode with the barcode sequences of known species in a reference library. Each barcode sequence was first mapped onto a numeric feature vector based on k-mer frequencies, and the Random forest method was then applied to the transformed dataset for species identification. On both real and simulated datasets, the proposed approach outperformed similarity-based, tree-based, and diagnostic-based approaches and was comparable to existing supervised-learning-based approaches in species identification success rate. Based on the proposed approach, an online web interface, SPIDBAR, has also been developed and made freely available at http://cabgrid.res.in:8080/spidbar/ for species identification by taxonomists.
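The k-mer feature mapping described in this abstract can be sketched with the Python standard library. This is a minimal sketch under assumptions the abstract does not state: k = 3, relative (not raw) frequencies, and skipping windows that contain non-ACGT characters. The classifier below is a deliberately simple substitute, a 1-nearest-neighbour rule on Euclidean distance (`nearest_species`, hypothetical), and not the Random forest used by Meher et al.; it only shows the vectors in use.

```python
from __future__ import annotations

import math
from itertools import product


def kmer_vector(seq: str, k: int = 3) -> list[float]:
    """Map a DNA sequence to the relative frequencies of all 4**k k-mers over A, C, G, T.

    k-mers are counted with a sliding window of step 1; windows containing any
    other character (N, gaps, IUPAC ambiguity codes) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]  # fixed lexicographic feature order
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def nearest_species(query: str, library: list[tuple[str, str]], k: int = 3) -> str:
    """Stand-in classifier (NOT Random forest): label of the closest reference vector."""
    q = kmer_vector(query, k)
    label, _ = min(library, key=lambda item: math.dist(q, kmer_vector(item[1], k)))
    return label


library = [("Species A", "AAAAAAAAAA"), ("Species B", "CGCGCGCGCG")]
print(nearest_species("AAAAAAAAAC", library))  # Species A
```

To follow the paper's design more closely, the same vectors could be stacked into a feature matrix and passed to a Random forest implementation such as scikit-learn's `RandomForestClassifier` (`fit(X, y)`, then `predict`). The point of the transformation is that every sequence, whatever its length, becomes a vector of the same 4**k dimensions, so no alignment is needed before classification.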
Affiliation(s)
- Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
- Tanmaya Kumar Sahu
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
- A R Rao
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
49
Somervuo P, Koskela S, Pennanen J, Nilsson RH, Ovaskainen O. Unbiased probabilistic taxonomic classification for DNA barcoding. Bioinformatics 2016; 32:2920-7. [DOI: 10.1093/bioinformatics/btw346] [Citation(s) in RCA: 59] [Impact Index Per Article: 7.4] [Received: 09/15/2015] [Accepted: 05/27/2016] [Indexed: 11/14/2022]
50
Luo A, Lan H, Ling C, Zhang A, Shi L, Ho SYW, Zhu C. A simulation study of sample size for DNA barcoding. Ecol Evol 2015; 5:5869-79. [PMID: 26811761 PMCID: PMC4717336 DOI: 10.1002/ece3.1846] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 09/17/2015] [Revised: 10/20/2015] [Accepted: 10/21/2015] [Indexed: 01/31/2023]
Abstract
For some groups of organisms, DNA barcoding can provide a useful tool in taxonomy, evolutionary biology, and biodiversity assessment. However, the efficacy of DNA barcoding depends on the degree of sampling per species, because a large enough sample size is needed to provide a reliable estimate of genetic polymorphism and for delimiting species. We used a simulation approach to examine the effects of sample size on four estimators of genetic polymorphism related to DNA barcoding: mismatch distribution, nucleotide diversity, the number of haplotypes, and maximum pairwise distance. Our results showed that mismatch distributions derived from subsamples of ≥20 individuals usually bore a close resemblance to that of the full dataset. Estimates of nucleotide diversity from subsamples of ≥20 individuals tended to be bell‐shaped around that of the full dataset, whereas estimates from smaller subsamples were not. As expected, greater sampling generally led to an increase in the number of haplotypes. We also found that subsamples of ≥20 individuals allowed a good estimate of the maximum pairwise distance of the full dataset, while smaller ones were associated with a high probability of underestimation. Overall, our study confirms the expectation that larger samples are beneficial for the efficacy of DNA barcoding and suggests that a minimum sample size of 20 individuals is needed in practice for each population.
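The four estimators named in this abstract can be computed directly from a set of aligned sequences. This is a minimal sketch under stated assumptions: sequences are aligned to equal length, distances are uncorrected p-distances, and nucleotide diversity (pi) is taken as the mean proportion of differing sites over all sequence pairs. The abstract does not state the distance model used, and the `subsample` helper is a hypothetical illustration of comparing subsample estimates with full-dataset values, not the authors' simulation procedure.

```python
from __future__ import annotations

import random
from collections import Counter
from itertools import combinations
from typing import Callable


def differences(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def mismatch_distribution(seqs: list[str]) -> Counter:
    """How many sequence pairs differ at 0, 1, 2, ... sites."""
    return Counter(differences(a, b) for a, b in combinations(seqs, 2))


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean proportion of differing sites over all pairs of sequences (pi)."""
    pairs = list(combinations(seqs, 2))
    return sum(differences(a, b) for a, b in pairs) / (len(pairs) * len(seqs[0]))


def haplotype_count(seqs: list[str]) -> int:
    """Number of distinct sequences (haplotypes)."""
    return len(set(seqs))


def max_pairwise_distance(seqs: list[str]) -> float:
    """Largest uncorrected p-distance between any two sequences."""
    return max(differences(a, b) for a, b in combinations(seqs, 2)) / len(seqs[0])


def subsample(seqs: list[str], n: int, estimator: Callable[[list[str]], float],
              replicates: int = 100, seed: int = 0) -> list[float]:
    """Apply an estimator to random subsamples of n sequences drawn without replacement."""
    rng = random.Random(seed)
    return [estimator(rng.sample(seqs, n)) for _ in range(replicates)]


seqs = ["AAAA", "AAAT", "AATT", "AAAA"]
print(mismatch_distribution(seqs))             # Counter({1: 3, 2: 2, 0: 1})
print(round(nucleotide_diversity(seqs), 4))    # 0.2917 (7 differences / (6 pairs * 4 sites))
print(haplotype_count(seqs))                   # 3
print(max_pairwise_distance(seqs))             # 0.5

full_max = max_pairwise_distance(seqs)
sub_max = subsample(seqs, 2, max_pairwise_distance, replicates=20)
print(sum(m < full_max for m in sub_max), "of 20 subsamples underestimate the maximum")
```

Because a subsample's largest pairwise distance can never exceed that of the full dataset, the maximum-distance estimator can only err downward, which is consistent with the underestimation the authors report for small subsamples.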
Affiliation(s)
- Arong Luo
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Haiqiang Lan
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; School of Statistics and Mathematics, Yunnan University of Finance and Economics, Kunming 650221, China
- Cheng Ling
- Department of Computer Science and Technology, College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
- Aibing Zhang
- College of Life Sciences, Capital Normal University, Beijing 100048, China
- Lei Shi
- School of Statistics and Mathematics, Yunnan University of Finance and Economics, Kunming 650221, China
- Simon Y W Ho
- School of Biological Sciences, University of Sydney, Sydney, New South Wales 2006, Australia
- Chaodong Zhu
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China