1
Salinas NR, Eshel G, Coruzzi GM, DeSalle R, Tessler M, Little DP. BAD2matrix: Phylogenomic matrix concatenation, indel coding, and more. APPLICATIONS IN PLANT SCIENCES 2024; 12:e11604. [PMID: 39628543 PMCID: PMC11610412 DOI: 10.1002/aps3.11604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 02/24/2024] [Accepted: 03/16/2024] [Indexed: 12/06/2024]
Abstract
Premise: Common steps in phylogenomic matrix production include biological sequence concatenation, morphological data concatenation, insertion/deletion (indel) coding, gene content (presence/absence) coding, removing uninformative characters for parsimony analysis, recoding with reduced amino acid alphabets, and occupancy filtering. Existing software does not accomplish these tasks on a phylogenomic scale using a single program.
Methods and Results: BAD2matrix is a Python script that performs the above-mentioned steps in phylogenomic matrix construction for DNA or amino acid sequences as well as morphological data. The script works in UNIX-like environments (e.g., Linux, macOS, Windows Subsystem for Linux).
Conclusions: BAD2matrix helps simplify phylogenomic pipelines and can be downloaded from https://github.com/dpl10/BAD2matrix/tree/master under a GNU General Public License v2.
Affiliation(s)
- Nelson R. Salinas
  - Lewis B. and Dorothy Cullman Program for Molecular Systematics, The New York Botanical Garden, Bronx, New York, USA
- Gil Eshel
  - Center for Genomics and Systems Biology, New York University, New York, New York, USA
- Gloria M. Coruzzi
  - Center for Genomics and Systems Biology, New York University, New York, New York, USA
- Rob DeSalle
  - Institute for Comparative Genomics, American Museum of Natural History, New York, New York, USA
- Michael Tessler
  - Lewis B. and Dorothy Cullman Program for Molecular Systematics, The New York Botanical Garden, Bronx, New York, USA
  - Institute for Comparative Genomics, American Museum of Natural History, New York, New York, USA
  - Department of Biology, Medgar Evers College, City University of New York, Brooklyn, New York, USA
- Damon P. Little
  - Lewis B. and Dorothy Cullman Program for Molecular Systematics, The New York Botanical Garden, Bronx, New York, USA
2
Molecular tools for resolving Merodon ruficornis group (Diptera, Syrphidae) taxonomy. ORG DIVERS EVOL 2022. [DOI: 10.1007/s13127-022-00571-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
3
Boutte J, Fishbein M, Straub SCK. NGS-Indel Coder v2.0: A Streamlined Pipeline to Code Indel Characters in Phylogenomic Data. Methods Mol Biol 2022; 2512:61-72. [PMID: 35817999 DOI: 10.1007/978-1-0716-2429-6_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Hypothesized evolutionary insertions and deletions in nucleic acid sequences (indels) contain significant phylogenetic information and can be integrated in phylogenomic analyses. However, assemblies of short reads obtained from next-generation sequencing (NGS) technologies can contain errors that result in falsely inferred indels that need to be detected and omitted to avoid inclusion in phylogenetic analysis. Here, we detail the commands that comprise a new version of the NGS-Indel Coder pipeline, which was developed to validate indels using assembly read depth.
Affiliation(s)
- Julien Boutte
  - Department of Biology, Hobart and William Smith Colleges, Geneva, NY, USA
- Mark Fishbein
  - Department of Plant Biology, Ecology and Evolution, Oklahoma State University, Stillwater, OK, USA
- Shannon C K Straub
  - Department of Biology, Hobart and William Smith Colleges, Geneva, NY, USA
4
Echevarría LY, De la Riva I, Venegas PJ, Rojas-Runjaic FJM, Dias IR, Castroviejo-Fisher S. Total evidence and sensitivity phylogenetic analyses of egg-brooding frogs (Anura: Hemiphractidae). Cladistics 2021; 37:375-401. [PMID: 34478194 DOI: 10.1111/cla.12447] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Accepted: 11/10/2020] [Indexed: 01/06/2023]
Abstract
We study the phylogenetic relationships of egg-brooding frogs, a group of 118 neotropical species, unique among anurans by having embryos with large bell-shaped gills and females carrying their eggs on the dorsum, exposed or inside a pouch. We assembled a total evidence dataset of published and newly generated data containing 51 phenotypic characters and DNA sequences of 20 loci for 143 hemiphractids and 127 outgroup terminals. We performed six analytical strategies combining different optimality criteria (parsimony and maximum likelihood), alignment methods (tree- and similarity-alignment), and three different indel coding schemes (fifth character state, unknown nucleotide, and presence/absence characters matrix). Furthermore, we analyzed a subset of the total evidence dataset to evaluate the impact of phenotypic characters on hemiphractid phylogenetic relationships. Our main results include: (i) monophyly of Hemiphractidae and its six genera for all our analyses, novel relationships among hemiphractid genera, and non-monophyly of Hemiphractinae according to our preferred phylogenetic hypothesis; (ii) non-monophyly of current supraspecific taxonomies of Gastrotheca, an updated taxonomy is provided; (iii) previous differences among studies were mainly caused by differences in analytical factors, not by differences in character/taxon sampling; (iv) optimality criteria, alignment method, and indel coding caused differences among optimal topologies, in that order of degree; (v) in most cases, parsimony analyses are more sensitive to the addition of phenotypic data than maximum likelihood analyses; (vi) adding phenotypic data resulted in an increase of shared clades for most analyses.
Affiliation(s)
- Lourdes Y Echevarría
  - Laboratório de Sistemática de Vertebrados, Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Av. Ipiranga 6681, Porto Alegre, RS, 90619-900, Brazil
  - División de Herpetología-Centro de Ornitología y Biodiversidad (CORBIDI), Urb. Huertos de San Antonio, Santa Rita No. 105 Of. 202, Surco, Lima, Perú
- Ignacio De la Riva
  - Museo Nacional de Ciencias Naturales-CSIC, C/José Gutiérrez Abascal 2, Madrid, 28006, Spain
- Pablo J Venegas
  - División de Herpetología-Centro de Ornitología y Biodiversidad (CORBIDI), Urb. Huertos de San Antonio, Santa Rita No. 105 Of. 202, Surco, Lima, Perú
- Iuri R Dias
  - Graduate Program in Zoology, Universidade Estadual de Santa Cruz, Rodovia Jorge Amado, km 16, Ilhéus, Bahia, 45662-900, Brazil
- Santiago Castroviejo-Fisher
  - Laboratório de Sistemática de Vertebrados, Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Av. Ipiranga 6681, Porto Alegre, RS, 90619-900, Brazil
  - Department of Herpetology, American Museum of Natural History, New York, NY, 10024, USA
5
Straub SCK, Boutte J, Fishbein M, Livshultz T. Enabling evolutionary studies at multiple scales in Apocynaceae through Hyb-Seq. APPLICATIONS IN PLANT SCIENCES 2020; 8:e11400. [PMID: 33304663 PMCID: PMC7705337 DOI: 10.1002/aps3.11400] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/01/2020] [Accepted: 09/12/2020] [Indexed: 05/07/2023]
Abstract
PREMISE: Apocynaceae is the 10th largest flowering plant family and a focus for study of plant-insect interactions, especially as mediated by secondary metabolites. However, it has few genomic resources relative to its size. Target capture sequencing is a powerful approach for genome reduction that facilitates studies requiring data from the nuclear genome in non-model taxa, such as Apocynaceae.
METHODS: Transcriptomes were used to design probes for targeted sequencing of putatively single-copy nuclear genes across Apocynaceae. The sequences obtained were used to assess the success of the probe design, the intrageneric and intraspecific variation in the targeted genes, and the utility of the genes for inferring phylogeny.
RESULTS: From 853 candidate nuclear genes, 835 were consistently recovered in single copy and were variable enough for phylogenomics. The inferred gene trees were useful for coalescent-based species tree analysis, which showed all subfamilies of Apocynaceae as monophyletic, while also resolving relationships among species within the genus Apocynum. Intraspecific comparison of Elytropus chilensis individuals revealed numerous single-nucleotide polymorphisms with potential for use in population-level studies.
DISCUSSION: Community use of this Hyb-Seq probe set will facilitate and promote progress in the study of Apocynaceae across scales from population genomics to phylogenomics.
Affiliation(s)
- Shannon C. K. Straub
  - Department of Biology, Hobart and William Smith Colleges, 300 Pulteney Street, Geneva, New York 14456, USA
- Julien Boutte
  - Department of Biology, Hobart and William Smith Colleges, 300 Pulteney Street, Geneva, New York 14456, USA
- Mark Fishbein
  - Department of Plant Biology, Ecology, and Evolution, Oklahoma State University, 301 Physical Sciences, Stillwater, Oklahoma 74078, USA
- Tatyana Livshultz
  - Department of Biodiversity, Earth, and Environmental Sciences and the Academy of Natural Sciences, Drexel University, 1900 Benjamin Franklin Parkway, Philadelphia, Pennsylvania 19103, USA