1
Sukumaran J, Meila M. Piikun: an information theoretic toolkit for analysis and visualization of species delimitation metric space. BMC Bioinformatics 2024; 25:385. [PMID: 39695946 DOI: 10.1186/s12859-024-05997-y] [Received: 05/01/2024] [Accepted: 11/21/2024] [Indexed: 12/20/2024]
Abstract
BACKGROUND: Existing software for comparison of species delimitation models does not provide a (true) metric or distance function between species delimitation models, nor a way to compare these models in terms of relative clustering differences along a lattice of partitions. RESULTS: Piikun is a Python package for analyzing and visualizing species delimitation models in an information theoretic framework that, in addition to classic measures of information such as the entropy and mutual information [1], provides for the calculation of the Variation of Information (VI) criterion [2], a true metric or distance function for species delimitation models that is aligned with the lattice of partitions. CONCLUSIONS: Piikun is available under the MIT license from its public repository (https://github.com/jeetsukumaran/piikun), and can be installed locally using the Python package manager 'pip'.
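For two partitions X and Y of the same set of individuals, the Variation of Information is commonly defined as VI(X, Y) = H(X) + H(Y) - 2 I(X; Y), where H is the entropy and I the mutual information. The sketch below computes these quantities with the Python standard library only. It does not use Piikun's own API; the function names and the label-list representation of a species delimitation model are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a partition given as one label per individual."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two partitions of the same individuals."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), with counts substituted for probabilities
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def variation_of_information(a, b):
    """VI(a, b) = H(a) + H(b) - 2 * I(a; b)."""
    if len(a) != len(b):
        raise ValueError("both partitions must label the same individuals")
    return entropy(a) + entropy(b) - 2.0 * mutual_information(a, b)


# Hypothetical delimitation models for six individuals.
model_1 = ["sp1", "sp1", "sp1", "sp2", "sp2", "sp3"]
model_2 = ["A", "A", "B", "B", "B", "C"]
print(variation_of_information(model_1, model_2))
```

Because VI depends only on how individuals are grouped, relabeling the species in one model leaves the distance unchanged, and two models that group the individuals identically are at distance zero.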
Affiliation(s)
- Jeet Sukumaran
  - Biology, San Diego State University, San Diego, CA, USA.
- Marina Meila
  - Statistics, University of Washington, Seattle, 10587, WA, USA
2
Vences M, Patmanidis S, Schmidt JC, Matschiner M, Miralles A, Renner SS. Hapsolutely: a user-friendly tool integrating haplotype phasing, network construction, and haploweb calculation. Bioinformatics Advances 2024; 4:vbae083. [PMID: 38895561 PMCID: PMC11184345 DOI: 10.1093/bioadv/vbae083] [Received: 01/23/2024] [Revised: 05/15/2024] [Accepted: 06/04/2024] [Indexed: 06/21/2024]
Abstract
Motivation: Haplotype networks are a routine approach to visualize relationships among alleles. Such visual analysis of single-locus data is still of importance, especially in species diagnosis and delimitation, where a limited amount of sequence data usually are available and sufficient, along with other datasets in the framework of integrative taxonomy. In diploid organisms, this often requires separating (phasing) sequences with heterozygotic positions, and typically separate programs are required for phasing, reformatting of input files, and haplotype network construction. We therefore developed Hapsolutely, a user-friendly program with an ergonomic graphical user interface that integrates haplotype phasing from single-locus sequences with five approaches for network/genealogy reconstruction. Results: Among the novel options implemented, Hapsolutely integrates phasing and graphical reconstruction steps of haplotype networks, supports input of species partition data in the common SPART and SPART-XML formats, and calculates and visualizes haplowebs and fields for recombination, thus allowing graphical comparison of allele distribution and allele sharing among subsets for the purpose of species delimitation. The new tool has been specifically developed with a focus on the workflow in alpha-taxonomy, where exploring fields for recombination across alternative species partitions may help species delimitation. Availability and implementation: Hapsolutely is written in Python, and integrates code from Phase, SeqPHASE, and PopART in C++ and Haxe. Compiled stand-alone executables for MS Windows and Mac OS along with a detailed manual can be downloaded from https://www.itaxotools.org; the source code is openly available on GitHub (https://github.com/iTaxoTools/Hapsolutely).
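In a haploweb, haplotypes that co-occur in a heterozygous individual are linked, and a field for recombination is usually taken to be a group of individuals connected through such shared alleles. The sketch below illustrates that idea as a connected-components computation over already phased data. It is not Hapsolutely's implementation; the function name, the input layout (a dict mapping each individual to its two haplotype labels), and the example data are assumptions made for illustration.

```python
def fields_for_recombination(individuals):
    """Group diploid individuals into fields for recombination.

    `individuals` maps an individual's name to a pair of phased haplotype
    labels. Two haplotypes are linked when they co-occur in one individual;
    each connected component of linked haplotypes defines one field.
    """
    parent = {}

    def find(h):
        # Union-find lookup with path halving.
        parent.setdefault(h, h)
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    def union(h1, h2):
        r1, r2 = find(h1), find(h2)
        if r1 != r2:
            parent[r2] = r1

    # Link the two haplotypes carried by each individual.
    for hap1, hap2 in individuals.values():
        union(hap1, hap2)

    # Collect individuals by the component their haplotypes belong to.
    fields = {}
    for name, (hap1, _) in individuals.items():
        fields.setdefault(find(hap1), set()).add(name)
    return sorted(sorted(members) for members in fields.values())


# Hypothetical phased genotypes.
sample = {
    "ind1": ("H1", "H2"),
    "ind2": ("H2", "H3"),
    "ind3": ("H4", "H4"),  # homozygote
    "ind4": ("H4", "H5"),
}
print(fields_for_recombination(sample))
```

Here ind1 and ind2 fall into one field because they share haplotype H2, while ind3 and ind4 form a second field through H4.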
Affiliation(s)
- Miguel Vences
  - Division of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, 38106 Braunschweig, Germany
- Stefanos Patmanidis
  - Department of Computer Science, School of Electrical and Computer Engineering, National Technical University of Athens, 15780 Athens, Greece
- Jan-Christopher Schmidt
  - Division of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, 38106 Braunschweig, Germany
- Aurélien Miralles
  - Division of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, 38106 Braunschweig, Germany
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d’Histoire Naturelle, CNRS, Sorbonne Université, EPHE, 75005 Paris, France
- Susanne S Renner
  - Department of Biology, Washington University, Saint Louis, MO 63130, United States
3
Vuataz L, Reding JP, Reding A, Roesti C, Stoffel C, Vinçon G, Gattolliat JL. A comprehensive DNA barcoding reference database for Plecoptera of Switzerland. Sci Rep 2024; 14:6322. [PMID: 38491157 PMCID: PMC10943188 DOI: 10.1038/s41598-024-56930-5] [Received: 12/30/2023] [Accepted: 03/12/2024] [Indexed: 03/18/2024]
Abstract
DNA barcoding is an essential tool in modern biodiversity sciences. Despite considerable work to barcode the tree of life, many groups, including insects, remain partially or totally unreferenced, preventing barcoding from reaching its full potential. Aquatic insects, especially the three orders Ephemeroptera, Plecoptera, and Trichoptera (EPT), are key freshwater quality indicators worldwide. Among them, Plecoptera (stoneflies), which are among the most sensitive aquatic insects to habitat modification, play a central role in river monitoring surveys. Here, we present an update of the Plecoptera reference database for (meta)barcoding in Switzerland, now covering all 118 species known from this country. Fresh specimens, mostly from rare or localized species, were collected, and 151 new CO1 barcodes were generated. These were merged with the 422 previously published sequences, resulting in a dataset of 573 barcoded specimens. Our CO1 dataset was delimited in 115 CO1 clusters based on a priori morphological identifications, of which 17% are newly reported for Switzerland, and 4% are newly reported globally. Among the 115 CO1 clusters, 85% showed complete congruence with morphology. Distance-based analysis indicated local barcoding gaps in 97% of the CO1 clusters. This study significantly improves the Swiss reference database for stoneflies, enhancing future species identification accuracy and biodiversity monitoring. Additionally, this work reveals cryptic diversity and incongruence between morphology and barcodes, both presenting valuable opportunities for future integrative taxonomic studies. Voucher specimens, DNA extractions and reference barcodes are available for future developments, including metabarcoding and environmental DNA surveys.
Affiliation(s)
- Laurent Vuataz
  - Département de zoologie, Palais de Rumine, Muséum cantonal des sciences naturelles, Place Riponne 6, 1005, Lausanne, Switzerland.
  - Department of Ecology and Evolution, University of Lausanne (UNIL), 1015, Lausanne, Switzerland.
- Céline Stoffel
  - Département de zoologie, Palais de Rumine, Muséum cantonal des sciences naturelles, Place Riponne 6, 1005, Lausanne, Switzerland
  - Department of Ecology and Evolution, University of Lausanne (UNIL), 1015, Lausanne, Switzerland
- Jean-Luc Gattolliat
  - Département de zoologie, Palais de Rumine, Muséum cantonal des sciences naturelles, Place Riponne 6, 1005, Lausanne, Switzerland
  - Department of Ecology and Evolution, University of Lausanne (UNIL), 1015, Lausanne, Switzerland
4
Miralles A, Puillandre N, Vences M. DNA Barcoding in Species Delimitation: From Genetic Distances to Integrative Taxonomy. Methods Mol Biol 2024; 2744:77-104. [PMID: 38683312 DOI: 10.1007/978-1-0716-3581-0_4] [Indexed: 05/01/2024]
Abstract
Over the past two decades, DNA barcoding has become the most popular exploration approach in molecular taxonomy, whether for identification, discovery, delimitation, or description of species. The present contribution focuses on the utility of DNA barcoding for taxonomic research activities related to species delimitation, emphasizing the following aspects: (1) to what extent DNA barcoding can be a valuable ally for fundamental taxonomic research, (2) its methodological and theoretical limitations, (3) the conceptual background and practical use of pairwise distances between DNA barcode sequences in taxonomy, and (4) the different ways in which DNA barcoding can be combined with complementary means of investigation within a broader integrative framework. In this chapter, we recall and discuss the key conceptual advances that have led to the so-called renaissance of taxonomy, elaborate a detailed glossary for the terms specific to this discipline (see Glossary in Chap. 35), and propose a newly designed step-by-step species delimitation protocol starting from DNA barcode data that includes steps from the preliminary elaboration of an optimal sampling strategy to the final decision-making process which potentially leads to nomenclatural changes.
Affiliation(s)
- Aurélien Miralles
  - Department of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, Braunschweig, Germany
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum national d'Histoire naturelle, CNRS, Sorbonne Université, EPHE, Paris, France
- Nicolas Puillandre
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum national d'Histoire naturelle, CNRS, Sorbonne Université, EPHE, Paris, France
- Miguel Vences
  - Department of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, Braunschweig, Germany.
5
Vences M, Miralles A, DeSalle R. A Glossary of DNA Barcoding Terms. Methods Mol Biol 2024; 2744:561-572. [PMID: 38683343 DOI: 10.1007/978-1-0716-3581-0_35] [Indexed: 05/01/2024]
Abstract
This chapter provides a reference glossary for the protocols in this volume. We have chosen only the very basic terms in the DNA barcode lexicon to include, and provide clear and concise definitions of these terms. We hope the reader finds this glossary useful.
Affiliation(s)
- Miguel Vences
  - Department of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, Braunschweig, Germany
- Aurélien Miralles
  - Department of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, Braunschweig, Germany
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum national d'Histoire naturelle, CNRS, Sorbonne Université, EPHE, Paris, France
- Robert DeSalle
  - Division of Invertebrate Zoology, American Museum of Natural History, New York, NY, USA.
6
Puillandre N, Miralles A, Brouillet S, Fedosov A, Fischell F, Patmanidis S, Vences M. Species Delimitation and Exploration of Species Partitions with ASAP and LIMES. Methods Mol Biol 2024; 2744:313-334. [PMID: 38683328 DOI: 10.1007/978-1-0716-3581-0_20] [Indexed: 05/01/2024]
Abstract
DNA barcoding plays an important role in exploring undescribed biodiversity and is increasingly used to delimit lineages at the species level (see Chap. 4 by Miralles et al.). Although several approaches and programs have been developed to perform species delimitation from datasets of single-locus DNA sequences, such as DNA barcodes, most of these were not initially provided as user-friendly GUI-driven executables. In spite of their differences, most of these tools share the same goal, i.e., inferring de novo a partition of subsets, potentially each representing a distinct species. More recently, a proposed common exchange format for the resulting species partitions (SPART) has been implemented by several of these tools, paving the way toward developing an interoperable digital environment entirely dedicated to integrative and comparative species delimitation. In this chapter, we provide detailed protocols for the use of two bioinformatic tools, one for single locus molecular species delimitation (ASAP) and one for statistical comparison of species partitions resulting from any kind of species delimitation analyses (LIMES).
Affiliation(s)
- Nicolas Puillandre
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Paris, France
- Aurélien Miralles
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Paris, France
  - Department of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, Braunschweig, Germany
- Sophie Brouillet
  - Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, Paris, France
- Alexander Fedosov
  - Department of Zoology, Swedish Museum of Natural History, Stockholm, Sweden
- Frank Fischell
  - Institute of Zoology, University of Cologne, Köln, Germany
- Stefanos Patmanidis
  - School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
- Miguel Vences
  - Department of Evolutionary Biology, Zoological Institute, Technische Universität Braunschweig, Braunschweig, Germany.
7
Hubert N, Phillips JD, Hanner RH. Delimiting Species with Single-Locus DNA Sequences. Methods Mol Biol 2024; 2744:53-76. [PMID: 38683311 DOI: 10.1007/978-1-0716-3581-0_3] [Indexed: 05/01/2024]
Abstract
DNA sequences are increasingly used for large-scale biodiversity inventories. Because these genetic data avoid the time-consuming initial sorting of specimens based on their phenotypic attributes, they have been recently incorporated into taxonomic workflows for overlooked and diverse taxa. Major statistical developments have accompanied this new practice, and several models have been proposed to delimit species with single-locus DNA sequences. However, proposed approaches to date make different assumptions regarding taxon lineage history, leading to strong discordance whenever comparisons are made among methods. Distance-based methods, such as Automatic Barcode Gap Discovery (ABGD) and Assemble Species by Automatic Partitioning (ASAP), rely on the detection of a barcode gap (i.e., the lack of overlap in the distributions of intraspecific and interspecific genetic distances) and the associated threshold in genetic distances. Network-based methods, as exemplified by the REfined Single Linkage (RESL) algorithm for the generation of Barcode Index Numbers (BINs), use connectivity statistics to hierarchically cluster related haplotypes into molecular operational taxonomic units (MOTUs) which serve as species proxies. Tree-based methods, including Poisson Tree Processes (PTP) and the General Mixed Yule Coalescent (GMYC), fit statistical models to phylogenetic trees by maximum likelihood or Bayesian frameworks. Multiple webservers and stand-alone versions of these methods are now available, complicating decision-making regarding the most appropriate approach to use for a given taxon of interest. For instance, tree-based methods require an initial phylogenetic reconstruction, and multiple options are now available for this purpose such as RAxML and BEAST. Across all examined species delimitation methods, judicious parameter setting is paramount, as different model parameterizations can lead to differing conclusions.
The objective of this chapter is to guide users step-by-step through all the procedures involved for each of these methods, while aggregating all necessary information required to conduct these analyses. The "Materials" section details how to prepare and format input files, including options to align sequences and conduct tree reconstruction with Maximum Likelihood and Bayesian inference. The "Methods" section presents the procedure and options available to conduct species delimitation analyses, including distance-, network-, and tree-based models. Finally, limits and future developments are discussed in the "Notes" section. Most importantly, species delimitation methods discussed herein are categorized based on five indicators: reliability, availability, scalability, understandability, and usability, all of which are fundamental properties needed for any approach to gain unanimous adoption within the DNA barcoding community moving forward.
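The barcode gap described in this abstract can be illustrated with a small calculation: compute pairwise distances between aligned sequences, split them into intraspecific and interspecific sets, and check whether the smallest interspecific distance exceeds the largest intraspecific one. The sketch below assumes that species labels are already known and uses uncorrected p-distances. ABGD and ASAP instead infer the partition and threshold from the distances themselves, so this is not a reimplementation of either method; the function names and the toy sequences are assumptions made for illustration.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Uncorrected p-distance: share of differing sites, ignoring gapped positions."""
    sites = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no comparable sites between the two sequences")
    return sum(a != b for a, b in sites) / len(sites)


def barcode_gap(sequences, species):
    """Return min(interspecific distance) - max(intraspecific distance).

    A positive value means the two distance distributions do not overlap.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(sequences)), 2):
        d = p_distance(sequences[i], sequences[j])
        (intra if species[i] == species[j] else inter).append(d)
    if not intra or not inter:
        raise ValueError("need both within- and between-species comparisons")
    return min(inter) - max(intra)


# Hypothetical aligned barcodes with a priori species labels.
seqs = ["ACGTACGTAC", "ACGTACGTAT", "TCGAACGGAC", "TCGAACGGAT"]
labels = ["spA", "spA", "spB", "spB"]
print(barcode_gap(seqs, labels))
```

In this toy data the largest within-species distance is 0.1 and the smallest between-species distance is 0.3, so the gap is positive.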
Affiliation(s)
- Nicolas Hubert
  - UMR ISEM (IRD, UM, CNRS), Université de Montpellier, Montpellier, France.
- Jarrett D Phillips
  - School of Computer Science, University of Guelph, Guelph, ON, Canada
  - Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
- Robert H Hanner
  - Department of Integrative Biology, University of Guelph, Guelph, ON, Canada