1
Wong MK, Chen WJ. Exploring the phylogeny and depth evolution of cusk eels and their relatives (Ophidiiformes: Ophidioidei). Mol Phylogenet Evol 2024; 199:108164. [PMID: 39084413 DOI: 10.1016/j.ympev.2024.108164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 07/27/2024] [Accepted: 07/27/2024] [Indexed: 08/02/2024]
Abstract
With 289 known species in 51 genera, the ophidiiform family Ophidiidae together with their relatives from the Carapidae (36 species in eight genera) of the same suborder Ophidioidei dominate the deep sea, but some occur also in shallow water habitats. Despite their high species diversity in the deep sea and wide bathymetric distributions, their phylogenetic relationships and evolution remain unexplored due in part to sampling difficulties. Thanks to the biodiversity exploratory program entitled "Tropical Deep-Sea Benthos" and joint efforts between Taiwan and French teams for sampling from different localities across the Indo-West Pacific over the last two decades, we are able to compile comprehensive datasets for investigations. In this study, 59 samples representing 36 of 59 known ophidioid genera are selected and used to construct a multi-gene dataset to infer the phylogenetic relationships of ophidioid fishes and their relatives. Our results reveal that the Ophidiidae forms a paraphyletic group with respect to the Carapidae. The four main clades of Ophidioidei resolved are the (1) clade comprising species from the subfamily Brotulinae; (2) clade that includes species in the genera Acanthonus and Xyelacyba; (3) clade grouping Hypopleuron caninum with species from the family Carapidae; and (4) clade containing the species in the subfamily Brotulotaenilinae, Neobythitinae (in part), and Ophidiinae. Accordingly, we suggest the following new revisions based on our results and proposed morphological diagnoses. The subfamily Brotulinae should be elevated to the family level. The genera Xyelacyba and probably Tauredophidium (unsampled in this study) should be included in the newly established family Acanthonidae with Acanthonus. The families Carapidae and Ophidiidae are re-defined. 
Our time-calibrated phylogenetic and ancestral depth reconstructions enable us to clarify the evolutionary history of ophidiiform fishes and infer past patterns of species distributions at different depths. While Ophidiiformes is inferred to have originated in shallow waters around 96.25 million years ago (Mya), the common ancestor to the Ophidioidei is inferred to have invaded the deep sea around 90.22 Mya, the dates coinciding with the global anoxic event of the OAE2. The observed bathymetric distribution patterns in Ophidioidei most likely point to the mesopelagic zone as the center of origin and diversification. This was followed by multiple events of depth transitions or range expansions towards either shallower waters or greater depth zones, which were likely triggered by past climate changes during the Paleogene-Neogene.
Affiliation(s)
- Man-Kwan Wong
  - Institute of Oceanography, National Taiwan University, No.1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan.
- Wei-Jen Chen
  - Institute of Oceanography, National Taiwan University, No.1, Sec. 4, Roosevelt Road, Taipei 10617, Taiwan.
2
Frost LA, Bedoya AM, Lagomarsino LP. Artifactual Orthologs and the Need for Diligent Data Exploration in Complex Phylogenomic Datasets: A Museomic Case Study from the Andean Flora. Syst Biol 2024; 73:308-322. [PMID: 38170162 DOI: 10.1093/sysbio/syad076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 11/20/2023] [Accepted: 01/02/2024] [Indexed: 01/05/2024]
Abstract
The Andes mountains of western South America are a globally important biodiversity hotspot, yet there is a paucity of resolved phylogenies for plant clades from this region. Filling an important gap in our understanding of the world's richest flora, we present the first phylogeny of Freziera (Pentaphylacaceae), an Andean-centered, cloud forest radiation. Our dataset was obtained via hybrid-enriched target sequence capture of Angiosperms353 universal loci for 50 of the ca. 75 spp., obtained almost entirely from herbarium specimens. We identify high phylogenomic complexity in Freziera, including the presence of data artifacts. Via by-eye observation of gene trees, detailed examination of warnings from recently improved assembly pipelines, and gene tree filtering, we identified that artifactual orthologs (i.e., the presence of only one copy of a multicopy gene due to differential assembly) were an important source of gene tree heterogeneity that had a negative impact on phylogenetic inference and support. These artifactual orthologs may be common in plant phylogenomic datasets, where multiple instances of genome duplication are common. After accounting for artifactual orthologs as a source of gene tree error, we identified a significant, but nonspecific signal of introgression using Patterson's D and f4 statistics. Despite phylogenomic complexity, we were able to resolve Freziera into 9 well-supported subclades whose evolution has been shaped by multiple evolutionary processes, including incomplete lineage sorting, historical gene flow, and gene duplication. Our results highlight the complexities of plant phylogenomics, which are heightened in Andean radiations, and show the impact of filtering data processing artifacts and standard filtering approaches on phylogenetic inference.
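For readers unfamiliar with the Patterson's D statistic named in this abstract, the following is a minimal Python sketch of the standard ABBA-BABA calculation, D = (ABBA - BABA) / (ABBA + BABA), on a single four-taxon alignment. It is not the authors' pipeline: the function name `patterson_d`, the assumed topology (((P1, P2), P3), Outgroup), and the toy sequences are all illustrative assumptions, and real analyses typically work from allele frequencies and add block-jackknife significance testing.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Return (D, n_abba, n_baba) for four aligned sequences.

    Assumed tree: (((P1, P2), P3), Outgroup). The outgroup allele is
    treated as ancestral ("A") and the other allele as derived ("B").
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        site = (a, b, c, o)
        # Skip gaps/ambiguity codes and any site that is not biallelic.
        if any(x not in "ACGT" for x in site) or len(set(site)) != 2:
            continue
        if a == o and b == c:      # ABBA: P2 and P3 share the derived allele
            abba += 1
        elif b == o and a == c:    # BABA: P1 and P3 share the derived allele
            baba += 1
    total = abba + baba
    d = (abba - baba) / total if total else 0.0
    return d, abba, baba


# Toy alignment (hypothetical data): two ABBA sites, one BABA site,
# one invariant site, and one uninformative site.
d, n_abba, n_baba = patterson_d("AAGAA", "GGAAA", "GGGAG", "AAAAA")
print(d, n_abba, n_baba)
```

A positive D indicates an excess of derived alleles shared between P2 and P3; a value near zero is what incomplete lineage sorting alone would be expected to produce.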
Affiliation(s)
- Laura A Frost
  - Shirley C. Tucker Herbarium, Department of Biological Sciences, Louisiana State University, Life Science Annex Building A257, Baton Rouge, LA 70803, USA
  - Biology Department, University of South Alabama, 5871 USA N Dr, Mobile, AL 36688, USA
- Ana M Bedoya
  - Shirley C. Tucker Herbarium, Department of Biological Sciences, Louisiana State University, Life Science Annex Building A257, Baton Rouge, LA 70803, USA
- Laura P Lagomarsino
  - Shirley C. Tucker Herbarium, Department of Biological Sciences, Louisiana State University, Life Science Annex Building A257, Baton Rouge, LA 70803, USA
3
Myers BM, Burns KJ, Clark CJ, Brelsford A. Sampling affects population genetic inference: A case study of the Allen's (Selasphorus sasin) and rufous hummingbird (Selasphorus rufus). J Hered 2023; 114:625-636. [PMID: 37455658 DOI: 10.1093/jhered/esad044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2022] [Accepted: 07/12/2023] [Indexed: 07/18/2023]
Abstract
Gene flow can affect evolutionary inference when species are undersampled. Here, we evaluate the effects of gene flow and geographic sampling on demographic inference of 2 hummingbirds that hybridize, Allen's hummingbird (Selasphorus sasin) and rufous hummingbird (Selasphorus rufus). Using whole-genome data and extensive geographic sampling, we find widespread connectivity, with introgression far beyond the Allen's × rufous hybrid zone, although the Z chromosome resists introgression beyond the hybrid zone. We test alternative hypotheses of speciation history of Allen's, rufous, and Calliope (S. calliope) hummingbird and find that rufous hummingbird is the sister taxon to Allen's hummingbird, and Calliope hummingbird is the outgroup. A model treating the 2 subspecies of Allen's hummingbird as a single panmictic population fit observed genetic data better than models treating the subspecies as distinct populations, in contrast to morphological and behavioral differences and analyses of spatial population structure. With additional sampling, our study builds upon recent studies that came to conflicting conclusions regarding the evolutionary histories of these 2 species. Our results stress the importance of thorough geographic sampling when assessing demographic history in the presence of gene flow.
Affiliation(s)
- Brian M Myers
  - Department of Biological Sciences, San Diego State University, San Diego, CA, United States
- Kevin J Burns
  - Department of Biological Sciences, San Diego State University, San Diego, CA, United States
- Christopher J Clark
  - Department of Evolution, Ecology, and Organismal Biology, Spieth Hall, University of California, Riverside, CA, United States
- Alan Brelsford
  - Department of Evolution, Ecology, and Organismal Biology, Spieth Hall, University of California, Riverside, CA, United States
4
Vera-Paz SI, Granados Mendoza C, Díaz Contreras Díaz DD, Jost M, Salazar GA, Rossado AJ, Montes-Azcué CA, Hernández-Gutiérrez R, Magallón S, Sánchez-González LA, Gouda EJ, Cabrera LI, Ramírez-Morillo IM, Flores-Cruz M, Granados-Aguilar X, Martínez-García AL, Hornung-Leoni CT, Barfuss MH, Wanke S. Plastome phylogenomics reveals an early Pliocene North- and Central America colonization by long-distance dispersal from South America of a highly diverse bromeliad lineage. FRONTIERS IN PLANT SCIENCE 2023; 14:1205511. [PMID: 37426962 PMCID: PMC10326849 DOI: 10.3389/fpls.2023.1205511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Accepted: 05/31/2023] [Indexed: 07/11/2023]
Abstract
Understanding the spatial and temporal frameworks of species diversification is fundamental in evolutionary biology. Assessing the geographic origin and dispersal history of highly diverse lineages of rapid diversification can be hindered by the lack of appropriately sampled, resolved, and strongly supported phylogenetic contexts. The use of currently available cost-efficient sequencing strategies allows for the generation of a substantial amount of sequence data for dense taxonomic samplings, which together with well-curated geographic information and biogeographic models allow us to formally test the mode and tempo of dispersal events occurring in quick succession. Here, we assess the spatial and temporal frameworks for the origin and dispersal history of the expanded clade K, a highly diverse Tillandsia subgenus Tillandsia (Bromeliaceae, Poales) lineage hypothesized to have undergone a rapid radiation across the Neotropics. We assembled full plastomes from Hyb-Seq data for a dense taxon sampling of the expanded clade K plus a careful selection of outgroup species and used them to estimate a time-calibrated phylogenetic framework. This dated phylogenetic hypothesis was then used to perform biogeographic model tests and ancestral area reconstructions based on a comprehensive compilation of geographic information. The expanded clade K colonized North and Central America, specifically the Mexican transition zone and the Mesoamerican dominion, by long-distance dispersal from South America at least 4.86 Mya, when most of the Mexican highlands were already formed. Several dispersal events occurred subsequently northward to the southern Nearctic region, eastward to the Caribbean, and southward to the Pacific dominion during the last 2.8 Mya, a period characterized by pronounced climate fluctuations, derived from glacial-interglacial climate oscillations, and substantial volcanic activity, mainly in the Trans-Mexican Volcanic Belt.
Our taxon sampling design allowed us to calibrate for the first time several nodes, not only within the expanded clade K focal group but also in other Tillandsioideae lineages. We expect that this dated phylogenetic framework will facilitate future macroevolutionary studies and provide reference age estimates to perform secondary calibrations for other Tillandsioideae lineages.
Affiliation(s)
- Sandra I. Vera-Paz
  - Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Carolina Granados Mendoza
  - Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - Institut für Botanik, Technische Universität Dresden, Dresden, Germany
- Daniel D. Díaz Contreras Díaz
  - Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Matthias Jost
  - Institut für Botanik, Technische Universität Dresden, Dresden, Germany
- Gerardo A. Salazar
  - Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Andrés J. Rossado
  - Laboratorio de Sistemática de Plantas Vasculares, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Claudia A. Montes-Azcué
  - Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Rebeca Hernández-Gutiérrez
  - Department of Evolution, Ecology, and Organismal Biology, University of California, Riverside, CA, United States
- Susana Magallón
  - Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Luis A. Sánchez-González
  - Museo de Zoología “Alfonso L. Herrera”, Departamento de Biología Evolutiva, Facultad de Ciencias, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Eric J. Gouda
  - Botanical Garden, Utrecht University, Utrecht, Netherlands
- Lidia I. Cabrera
  - Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- María Flores-Cruz
  - Departamento El Hombre y su Ambiente, División de Ciencias Biológicas y de la Salud, Universidad Autónoma Metropolitana, Unidad Xochimilco, Mexico City, Mexico
- Xochitl Granados-Aguilar
  - Departamento de Ecología Evolutiva, Instituto de Ecología, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Ana L. Martínez-García
  - Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - Centro de Investigaciones Biológicas, Herbario HGOM, Instituto de Ciencias Básicas e Ingeniería, Universidad Autónoma del Estado de Hidalgo, Hidalgo, Mexico
- Claudia T. Hornung-Leoni
  - Centro de Investigaciones Biológicas, Herbario HGOM, Instituto de Ciencias Básicas e Ingeniería, Universidad Autónoma del Estado de Hidalgo, Hidalgo, Mexico
- Michael H.J. Barfuss
  - Department of Botany and Biodiversity Research, University of Vienna, Vienna, Austria
- Stefan Wanke
  - Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Mexico City, Mexico
  - Institut für Botanik, Technische Universität Dresden, Dresden, Germany
5
Yi H, Dong S, Yang L, Wang J, Kidner C, Kang M. Genome-wide data reveal cryptic diversity and hybridization in a group of tree ferns. Mol Phylogenet Evol 2023; 184:107801. [PMID: 37088242 DOI: 10.1016/j.ympev.2023.107801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 04/07/2023] [Accepted: 04/18/2023] [Indexed: 04/25/2023]
Abstract
Discovery of cryptic diversity is essential to understanding both the process of speciation and the conservation of species. Determining species boundaries in fern lineages represents a major challenge due to lack of morphologically diagnostic characters and frequent hybridization. Genomic data have substantially enhanced our understanding of the speciation process, increased the resolution of species delimitation studies, and led to the discovery of cryptic diversity. Here, we employed restriction-site-associated DNA sequencing (RAD-seq) and integrated phylogenomic and population genomic analyses to investigate phylogenetic relationships and evolutionary history of 16 tree ferns with marginate scales (Cyatheaceae) from China and Vietnam. We conducted multiple species delimitation analyses using the multispecies coalescent (MSC) model and novel approaches based on genealogical divergence index (gdi) and isolation by distance (IBD). In addition, we inferred species trees using concatenation and several coalescent-based methods, and assessed hybridization patterns and rate of gene flow across the phylogeny. We obtained highly supported and generally congruent phylogenies inferred from concatenated and summary-coalescent methods, and the monophyly of all currently recognized species was strongly supported. Our results revealed substantial evidence of cryptic diversity in three widely distributed Gymnosphaera species, each of which was composed of two highly structured lineages that may correspond to cryptic species. We found that hybridization was fairly common between not only closely related species, but also distantly related species. Collectively, it appears that scaly tree ferns may contain cryptic diversity and hybridization has played an important role throughout the evolutionary history of this group.
Affiliation(s)
- Huiqin Yi
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China; South China National Botanical Garden, Guangzhou 510650, China
- Shiying Dong
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China; South China National Botanical Garden, Guangzhou 510650, China
- Lihua Yang
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China; South China National Botanical Garden, Guangzhou 510650, China
- Jing Wang
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China; South China National Botanical Garden, Guangzhou 510650, China
- Catherine Kidner
  - Institute of Molecular Plant Sciences, University of Edinburgh, Daniel Rutherford Building Max Born Crescent, The King's Buildings, Edinburgh EH9 3BF, UK; Royal Botanic Garden Edinburgh, 20a Inverleith Row, Edinburgh EH3 5LR, UK
- Ming Kang
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China; South China National Botanical Garden, Guangzhou 510650, China.
6
Ortiz-Sepulveda CM, Genete M, Blassiau C, Godé C, Albrecht C, Vekemans X, Van Bocxlaer B. Target enrichment of long open reading frames and ultraconserved elements to link microevolution and macroevolution in non-model organisms. Mol Ecol Resour 2023; 23:659-679. [PMID: 36349833 DOI: 10.1111/1755-0998.13735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2021] [Revised: 10/09/2022] [Accepted: 10/19/2022] [Indexed: 11/10/2022]
Abstract
Despite the increasing accessibility of high-throughput sequencing, obtaining high-quality genomic data on non-model organisms without proximate well-assembled and annotated genomes remains challenging. Here, we describe a workflow that takes advantage of distant genomic resources and ingroup transcriptomes to select and jointly enrich long open reading frames (ORFs) and ultraconserved elements (UCEs) from genomic samples for integrative studies of microevolutionary and macroevolutionary dynamics. This workflow is applied to samples of the African unionid bivalve tribe Coelaturini (Parreysiinae) at basin and continent-wide scales. Our results indicate that ORFs are efficiently captured without prior identification of intron-exon boundaries. The enrichment of UCEs was less successful, but nevertheless produced substantial data sets. Exploratory continent-wide phylogenetic analyses with ORF supercontigs (>515,000 parsimony informative sites) resulted in a fully resolved phylogeny, the backbone of which was also retrieved with UCEs (>11,000 informative sites). Variant calling on ORFs and UCEs of Coelaturini from the Malawi Basin produced ~2000 SNPs per population pair. Estimates of nucleotide diversity and population differentiation were similar for ORFs and UCEs. They were low compared to previous estimates in molluscs, but comparable to those in recently diversifying Malawi cichlids and other taxa at an early stage of speciation. Skimming off-target sequence data from the same enriched libraries of Coelaturini from the Malawi Basin, we reconstructed the maternally-inherited mitogenome, which displays the gene order inferred for the most recent common ancestor of Unionidae. Overall, our workflow and results provide exciting perspectives for integrative genomic studies of microevolutionary and macroevolutionary dynamics in non-model organisms.
Affiliation(s)
- Mathieu Genete
  - CNRS, Univ. Lille, UMR 8198 - Evo-Eco-Paleo, F-59000 Lille, France
- Cécile Godé
  - CNRS, Univ. Lille, UMR 8198 - Evo-Eco-Paleo, F-59000 Lille, France
- Christian Albrecht
  - Department of Animal Ecology and Systematics, Justus Liebig University, D-35392 Giessen, Germany; Department of Biology, Mbarara University of Science and Technology, Mbarara, Uganda
- Xavier Vekemans
  - CNRS, Univ. Lille, UMR 8198 - Evo-Eco-Paleo, F-59000 Lille, France
7
Nunes R, Storer C, Doleck T, Kawahara AY, Pierce NE, Lohman DJ. Predictors of sequence capture in a large-scale anchored phylogenomics project. Front Ecol Evol 2022. [DOI: 10.3389/fevo.2022.943361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2023]
Abstract
Next-generation sequencing (NGS) technologies have revolutionized phylogenomics by decreasing the cost and time required to generate sequence data from multiple markers or whole genomes. Further, the fragmented DNA of biological specimens collected decades ago can be sequenced with NGS, reducing the need for collecting fresh specimens. Sequence capture, also known as anchored hybrid enrichment, is a method to produce reduced representation libraries for NGS sequencing. The technique uses single-stranded oligonucleotide probes that hybridize with pre-selected regions of the genome that are sequenced via NGS, culminating in a dataset of numerous orthologous loci from multiple taxa. Phylogenetic analyses using these sequences have the potential to resolve deep and shallow phylogenetic relationships. Identifying the factors that affect sequence capture success could save time, money, and valuable specimens that might be destructively sampled despite low likelihood of sequencing success. We investigated the impacts of specimen age, preservation method, and DNA concentration on sequence capture (number of captured sequences and sequence quality) while accounting for taxonomy and extracted tissue type in a large-scale butterfly phylogenomics project. This project used two probe sets to extract 391 loci or a subset of 13 loci from over 6,000 butterfly specimens. We found that sequence capture is a resilient method capable of amplifying loci in samples of varying age (0–111 years), preservation method (alcohol, papered, pinned), and DNA concentration (0.020–316 ng/μl). Regression analyses demonstrate that sequence capture is positively correlated with DNA concentration. However, sequence capture and DNA concentration are negatively correlated with sample age and preservation method. Our findings suggest that sequence capture projects should prioritize the use of alcohol-preserved samples younger than 20 years old when available.
In the absence of such specimens, dried samples of any age can yield sequence data, albeit with returns that diminish with increasing age.
8
Zhang C, Mirarab S. Weighting by Gene Tree Uncertainty Improves Accuracy of Quartet-based Species Trees. Mol Biol Evol 2022; 39:6750035. [PMID: 36201617 PMCID: PMC9750496 DOI: 10.1093/molbev/msac215] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 02/08/2022] [Revised: 09/20/2022] [Accepted: 10/03/2022] [Indexed: 01/07/2023]
Abstract
Phylogenomic analyses routinely estimate species trees using methods that account for gene tree discordance. However, the most scalable species tree inference methods, which summarize independently inferred gene trees to obtain a species tree, are sensitive to hard-to-avoid errors introduced in the gene tree estimation step. This dilemma has created much debate on the merits of concatenation versus summary methods and practical obstacles to using summary methods more widely and to the exclusion of concatenation. The most successful attempt at making summary methods resilient to noisy gene trees has been contracting low support branches from the gene trees. Unfortunately, this approach requires arbitrary thresholds and poses new challenges. Here, we introduce threshold-free weighting schemes for the quartet-based species tree inference, the metric used in the popular method ASTRAL. By reducing the impact of quartets with low support or long terminal branches (or both), weighting provides stronger theoretical guarantees and better empirical performance than the unweighted ASTRAL. Our simulations show that weighting improves accuracy across many conditions and reduces the gap with concatenation in conditions with low gene tree discordance and high noise. On empirical data, weighting improves congruence with concatenation and increases support. Together, our results show that weighting, enabled by a new optimization algorithm we introduce, improves the utility of summary methods and can reduce the incongruence often observed across analytical pipelines.
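The idea of down-weighting poorly supported quartets can be illustrated with a small Python sketch. This is a deliberately simplified toy, not the weighting scheme or optimization algorithm of the paper: it assumes each gene tree contributes one resolved quartet on four taxa together with a single precomputed support value in [0, 1], and the names `canonical`, `tally`, and `best_topology` are hypothetical.

```python
from collections import defaultdict


def canonical(topology):
    """Order-independent key for a quartet such as (("A", "B"), ("C", "D"))."""
    left, right = topology
    return frozenset([frozenset(left), frozenset(right)])


def tally(gene_quartets, weighted=True):
    """Sum votes per quartet topology.

    gene_quartets: iterable of (topology, support) pairs, one per gene tree.
    With weighted=False every gene tree counts as 1 (unweighted voting);
    with weighted=True each vote counts in proportion to its support.
    """
    scores = defaultdict(float)
    for topology, support in gene_quartets:
        scores[canonical(topology)] += support if weighted else 1.0
    return dict(scores)


def best_topology(scores):
    """Return the topology key with the highest total score."""
    return max(scores, key=scores.get)


# Hypothetical input: one confident gene tree and two noisy ones.
genes = [
    ((("A", "B"), ("C", "D")), 0.95),
    ((("A", "C"), ("B", "D")), 0.30),
    ((("A", "C"), ("B", "D")), 0.35),
]
print(best_topology(tally(genes, weighted=False)) == canonical((("A", "C"), ("B", "D"))))
print(best_topology(tally(genes, weighted=True)) == canonical((("A", "B"), ("C", "D"))))
```

In this toy case the two weakly supported gene trees outvote the confident one under unweighted counting, while support weighting reverses the outcome without requiring any contraction threshold.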
Affiliation(s)
- Chao Zhang
  - Bioinformatics and Systems Biology, UC San Diego, La Jolla, CA, USA
9
Mahbub S, Sawmya S, Saha A, Reaz R, Rahman MS, Bayzid MS. Quartet Based Gene Tree Imputation Using Deep Learning Improves Phylogenomic Analyses Despite Missing Data. JOURNAL OF COMPUTATIONAL BIOLOGY : A JOURNAL OF COMPUTATIONAL MOLECULAR CELL BIOLOGY 2022; 29:1156-1172. [PMID: 36048555 DOI: 10.1089/cmb.2022.0212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. However, for a combination of reasons (ranging from sampling biases to more biological causes, as in gene birth and loss), gene trees are often incomplete, meaning that not all species of interest have a common set of genes. Incomplete gene trees can potentially impact the accuracy of phylogenomic inference. We, for the first time, introduce the problem of imputing the quartet distribution induced by a set of incomplete gene trees, which involves adding the missing quartets back to the quartet distribution. We present Quartet based Gene tree Imputation using Deep Learning (QT-GILD), an automated and specially tailored unsupervised deep learning technique, accompanied by cues from natural language processing, which learns the quartet distribution in a given set of incomplete gene trees and generates a complete set of quartets accordingly. QT-GILD is a general-purpose technique needing no explicit modeling of the subject system or reasons for missing data or gene tree heterogeneity. Experimental studies on a collection of simulated and empirical datasets suggest that QT-GILD can effectively impute the quartet distribution, which results in a dramatic improvement in the species tree accuracy. Remarkably, QT-GILD not only imputes the missing quartets but can also account for gene tree estimation error. Therefore, QT-GILD advances the state-of-the-art in species tree estimation from gene trees in the face of missing data.
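To make the notion of "missing quartets" concrete, the sketch below lists the four-taxon subsets that no gene tree in a collection can inform, given only each gene tree's leaf set. It is an illustrative simplification and not QT-GILD itself, which learns and imputes the quartet distribution with a deep learning model; the function name `missing_quartets` and the toy leaf sets are assumptions.

```python
from itertools import combinations


def missing_quartets(all_taxa, gene_tree_leafsets):
    """Return the 4-taxon subsets not fully contained in any gene tree.

    all_taxa: iterable of every species of interest.
    gene_tree_leafsets: iterable of sets, the taxa present in each gene tree.
    """
    covered = set()
    for leaves in gene_tree_leafsets:
        # A gene tree can only induce quartets on taxa it actually contains.
        covered.update(combinations(sorted(leaves), 4))
    return [q for q in combinations(sorted(all_taxa), 4) if q not in covered]


# Hypothetical example: five species, two incomplete gene trees.
taxa = {"A", "B", "C", "D", "E"}
leafsets = [{"A", "B", "C", "D"}, {"A", "B", "C", "E"}]
print(missing_quartets(taxa, leafsets))
```

With five species there are five possible quartets; the two incomplete gene trees above cover only two of them, so three taxon quartets carry no signal at all unless they are imputed.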
Affiliation(s)
- Sazan Mahbub
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh; Department of Computer Science, University of Maryland, College Park, Maryland, USA
- Shashata Sawmya
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Arpita Saha
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Rezwana Reaz
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- M Sohel Rahman
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Md Shamsuzzoha Bayzid
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
10
Out of chaos: Phylogenomics of Asian Sonerileae. Mol Phylogenet Evol 2022; 175:107581. [PMID: 35810973 DOI: 10.1016/j.ympev.2022.107581] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2022] [Revised: 05/23/2022] [Accepted: 05/26/2022] [Indexed: 11/22/2022]
Abstract
Sonerileae is a diverse Melastomataceae lineage comprising ca. 1000 species in 44 genera, with >70% of genera and species distributed in Asia. Asian Sonerileae are taxonomically intractable with obscure generic circumscriptions. The backbone phylogeny of this group remains poorly resolved, possibly due to complexity caused by rapid species radiation in early and middle Miocene, which hampers further systematic study. Here, we used genome resequencing data to reconstruct the phylogeny of Asian Sonerileae. Three parallel datasets, viz. single-copy ortholog (SCO), genomic SNPs, and whole plastome, were assembled from genome resequencing data of 205 species for this purpose. Based on these genome-scale data, we provided the first well resolved phylogeny of Asian Sonerileae, with 34 major clades identified and 74% of the interclade relationships consistently resolved by both SCO and genomic data. Meanwhile, widespread phylogenetic discordance was detected among SCO gene trees as well as species trees reconstructed using different tree estimation methods (concatenation/site-based coalescent method/summary method) or different datasets (SCO/genomic/plastome). We explored sources of discordance using multiple approaches and found that the observed discordance in Asian Sonerileae was mainly caused by a combination of biased distribution of missing data, random noise from uninformative genes, incomplete lineage sorting, and hybridization/introgression. Exploration of these sources can enable us to generate hypotheses for future testing, which is the first step towards understanding the evolution of Asian Sonerileae. We also detected high levels of homoplasy for some characters traditionally used in taxonomy, which explains current chaotic generic delimitations. The backbone phylogeny of Asian Sonerileae revealed in this study offers a solid basis for future taxonomic revision at the generic level.
|
11
|
Xiong H, Wang D, Shao C, Yang X, Yang J, Ma T, Davis CC, Liu L, Xi Z. Species Tree Estimation and the Impact of Gene Loss Following Whole-Genome Duplication. Syst Biol 2022; 71:1348-1361. [PMID: 35689633 PMCID: PMC9558847 DOI: 10.1093/sysbio/syac040] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/08/2021] [Revised: 06/03/2022] [Accepted: 06/07/2022] [Indexed: 12/02/2022]
Abstract
Whole-genome duplication (WGD) occurs broadly and repeatedly across the history of eukaryotes and is recognized as a prominent evolutionary force, especially in plants. Immediately following WGD, most genes are present in two copies as paralogs. Due to this redundancy, one copy of a paralog pair commonly undergoes pseudogenization and is eventually lost. When speciation occurs shortly after WGD, however, differential loss of paralogs may lead to spurious phylogenetic inference resulting from the inclusion of pseudoorthologs, that is, paralogous genes mistakenly identified as orthologs because they are present in single copies within each sampled species. The influence and impact of including pseudoorthologs versus true orthologs as a result of gene extinction (or incomplete laboratory sampling) are only recently gaining empirical attention in the phylogenomics community. Moreover, few studies have yet investigated this phenomenon in an explicit coalescent framework. Here, using mathematical models, numerous simulated data sets, and two newly assembled empirical data sets, we assess the effect of pseudoorthologs on species tree estimation under varying degrees of incomplete lineage sorting (ILS) and differential gene loss scenarios following WGD. When gene loss occurs along the terminal branches of the species tree, alignment-based (BPP) and gene-tree-based (ASTRAL, MP-EST, and STAR) coalescent methods are adversely affected as the degree of ILS increases. This can be greatly improved by sampling a sufficiently large number of genes. Under the same circumstances, however, concatenation methods consistently estimate incorrect species trees as the number of genes increases. Additionally, pseudoorthologs can greatly mislead species tree inference when gene loss occurs along the internal branches of the species tree. Here, both coalescent and concatenation methods yield inconsistent results. These results underscore the importance of understanding the influence of pseudoorthologs in the phylogenomics era. [Coalescent method; concatenation method; incomplete lineage sorting; pseudoorthologs; single-copy gene; whole-genome duplication.]
Affiliation(s)
- Haifeng Xiong
  - Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Danying Wang
  - Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Chen Shao
  - Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Xuchen Yang
  - Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Jialin Yang
  - Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Tao Ma
  - Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
- Charles C Davis
  - Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, MA 02138, USA
- Liang Liu
  - Department of Statistics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Zhenxiang Xi
  - Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, China
|
12
|
DeRaad DA, McCormack JE, Chen N, Peterson AT, Moyle RG. Combining Species Delimitation, Species Trees, and Tests for Gene Flow Clarifies Complex Speciation in Scrub-Jays. Syst Biol 2022; 71:1453-1470. [PMID: 35552760 DOI: 10.1093/sysbio/syac034] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/28/2021] [Revised: 05/02/2022] [Accepted: 05/06/2022] [Indexed: 11/13/2022]
Abstract
Complex speciation, involving rapid divergence and multiple bouts of post-divergence gene flow, can obfuscate phylogenetic relationships and species limits. In North America, cases of complex speciation are common, due at least in part to the cyclical Pleistocene glacial history of the continent. Scrub-jays in the genus Aphelocoma provide a useful case study in complex speciation because their range throughout North America is structured by phylogeographic barriers with multiple cases of secondary contact between divergent lineages. Here, we show that a comprehensive approach to genomic reconstruction of evolutionary history, i.e., synthesizing results from species delimitation, species tree reconstruction, demographic model testing, and tests for gene flow, is capable of clarifying evolutionary history despite complex speciation. We find concordant evidence across all statistical approaches for the distinctiveness of an endemic southern Mexico lineage (A. w. sumichrasti), culminating in support for the species status of this lineage under any commonly applied species concept. We also find novel genomic evidence for the species status of a Texas endemic lineage, A. w. texana, for which equivocal species delimitation results were clarified by demographic modeling and spatially explicit models of gene flow. Finally, we find that complex signatures of both ancient and modern gene flow between the non-sister California Scrub-Jay (A. californica) and Woodhouse's Scrub-Jay (A. woodhouseii) result in discordant gene trees throughout the species' genomes despite clear support for their overall isolation and species status. In sum, we find that a multi-faceted approach to genomic analysis can increase our understanding of complex speciation histories, even in well-studied groups. Given the emerging recognition that complex speciation is relatively commonplace, the comprehensive framework that we demonstrate for interrogation of species limits and evolutionary history using genomic data can provide a necessary roadmap for disentangling the impacts of gene flow and incomplete lineage sorting to better understand the systematics of other groups with similarly complex evolutionary histories.
Affiliation(s)
- Devon A DeRaad
  - Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence KS, 66045, USA
- John E McCormack
  - Moore Laboratory of Zoology, Occidental College, Los Angeles, CA, 90041, USA
- Nancy Chen
  - Department of Biology, University of Rochester, Rochester, NY, 14627, USA
- A Townsend Peterson
  - Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence KS, 66045, USA
- Robert G Moyle
  - Biodiversity Institute and Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence KS, 66045, USA
|
13
|
Hancock ZB, Lehmberg ES, Blackmon H. Phylogenetics in Space: How Continuous Spatial Structure Impacts Tree Inference. Mol Phylogenet Evol 2022; 173:107505. [PMID: 35577296 DOI: 10.1016/j.ympev.2022.107505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2021] [Revised: 04/08/2022] [Accepted: 05/06/2022] [Indexed: 11/26/2022]
Abstract
The tendency to discretize biology permeates taxonomy and systematics, leading to models that simplify the often continuous nature of populations. Even when the assumption of panmixia is relaxed, most models still assume some degree of discrete structure. The multispecies coalescent has emerged as a powerful model in phylogenetics, but in its common implementation is entirely space-independent - what we call the "missing z-axis". In this article, we review the many lines of evidence for how continuous spatial structure can impact phylogenetic inference. We illustrate and expand on these by using complex continuous-space demographic models that include distinct modes of speciation. We find that the impact of spatial structure permeates all aspects of phylogenetic inference, including gene tree stoichiometry, topological and branch-length variance, network estimation, and species delimitation. We conclude by utilizing our results to suggest how researchers can identify spatial structure in phylogenetic datasets.
|
14
|
Willson J, Roddur MS, Liu B, Zaharias P, Warnow T. DISCO: Species Tree Inference using Multicopy Gene Family Tree Decomposition. Syst Biol 2022; 71:610-629. [PMID: 34450658 PMCID: PMC9016570 DOI: 10.1093/sysbio/syab070] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/24/2021] [Revised: 08/18/2021] [Accepted: 08/23/2021] [Indexed: 11/21/2022]
Abstract
Species tree inference from gene family trees is a significant problem in computational biology. However, gene tree heterogeneity, which can be caused by several factors including gene duplication and loss, makes the estimation of species trees very challenging. While there have been several species tree estimation methods introduced in recent years to specifically address gene tree heterogeneity due to gene duplication and loss (such as DupTree, FastMulRFS, ASTRAL-Pro, and SpeciesRax), many incur high cost in terms of both running time and memory. We introduce a new approach, DISCO, that decomposes the multi-copy gene family trees into many single-copy trees, which allows methods previously designed for species tree inference in a single-copy gene tree context to be used. We prove that using DISCO with ASTRAL (i.e., ASTRAL-DISCO) is statistically consistent under the GDL model, provided that ASTRAL-Pro correctly roots and tags each gene family tree. We evaluate DISCO paired with different methods for estimating species trees from single-copy genes (e.g., ASTRAL, ASTRID, and IQ-TREE) under a wide range of model conditions, and establish that high accuracy can be obtained even when ASTRAL-Pro is not able to correctly root and tag the gene family trees. We also compare results using MI, an alternative decomposition strategy from Yang Y. and Smith S.A. (2014), and find that DISCO provides better accuracy, most likely as a result of covering more of the gene family tree leafset in the output decomposition. [Concatenation analysis; gene duplication and loss; species tree inference; summary method.]
Affiliation(s)
- James Willson
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Mrinmoy Saha Roddur
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Baqiao Liu
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Paul Zaharias
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Tandy Warnow
  - Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
15
|
Rahman MA, Tutul AA, Abdullah SM, Bayzid MS. CHAPAO: Likelihood and hierarchical reference-based representation of biomolecular sequences and applications to compressing multiple sequence alignments. PLoS One 2022; 17:e0265360. [PMID: 35436292 PMCID: PMC9015123 DOI: 10.1371/journal.pone.0265360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2021] [Accepted: 02/28/2022] [Indexed: 11/18/2022]
Abstract
Background
High-throughput experimental technologies are generating tremendous amounts of genomic data, offering valuable resources to answer important questions and extract biological insights. Storing this sheer amount of genomic data has become a major concern in bioinformatics. General purpose compression techniques (e.g. gzip, bzip2, 7-zip) are being widely used due to their pervasiveness and relatively good speed. However, they are not customized for genomic data and may fail to leverage special characteristics and redundancy of the biomolecular sequences.
Results
We present a new lossless compression method, CHAPAO (COmpressing Alignments using Hierarchical and Probabilistic Approach), which is specifically designed for multiple sequence alignments (MSAs) of biomolecular data and offers very good compression gain. We have introduced a novel hierarchical referencing technique to represent biomolecular sequences, which combines likelihood-based analyses of the sequence similarities with graph-theoretic algorithms. We performed an extensive evaluation study using a collection of real biological data from the avian phylogenomics project, the 1000 plants project (1KP), and 16S and 23S rRNA datasets. We report the performance of CHAPAO in comparison with general purpose compression techniques as well as with MFCompress and Nucleotide Archival Format (NAF), two of the best known methods specifically designed for FASTA files. Experimental results suggest that CHAPAO offers significant improvements in compression gain over most other alternative methods. CHAPAO is freely available as open-source software at https://github.com/ashiq24/CHAPAO.
Conclusion
CHAPAO advances the state-of-the-art in compression algorithms and represents a potential alternative to the general purpose compression techniques as well as to the existing specialized compression techniques for biomolecular sequences.
Affiliation(s)
- Md Ashiqur Rahman
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Abdullah Aman Tutul
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Sifat Muhammad Abdullah
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Md. Shamsuzzoha Bayzid
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
|
16
|
Zhu T, Flouri T, Yang Z. A simulation study to examine the impact of recombination on phylogenomic inferences under the multispecies coalescent model. Mol Ecol 2022; 31:2814-2829. [PMID: 35313033 PMCID: PMC9321900 DOI: 10.1111/mec.16433] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/17/2021] [Revised: 01/25/2022] [Accepted: 02/28/2022] [Indexed: 11/28/2022]
Affiliation(s)
- Tianqi Zhu
  - Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
  - Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Tomáš Flouri
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
|
17
|
Giaretta A, Murphy B, Maurin O, Mazine FF, Sano P, Lucas E. Phylogenetic Relationships Within the Hyper-Diverse Genus Eugenia (Myrtaceae: Myrteae) Based on Target Enrichment Sequencing. Front Plant Sci 2022; 12:759460. [PMID: 35185945 PMCID: PMC8855041 DOI: 10.3389/fpls.2021.759460] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/16/2021] [Accepted: 11/29/2021] [Indexed: 06/14/2023]
Abstract
Eugenia is one of the most taxonomically challenging lineages of flowering plants, in which morphological delimitation has changed over the last few years as a result of recent phylogenetic studies based on molecular data. Efforts, until now, have been limited to Sanger sequencing of mostly plastid markers. These phylogenetic studies indicate 11 clades formalized as infrageneric groups. However, relationships among these clades are poorly supported at key nodes and inconsistent between studies, particularly along the backbone and within Eugenia sect. Umbellatae, which encompasses ca. 700 species. To resolve and better understand systematic discordance, 54 Eugenia taxa were subjected to phylogenomic Hyb-Seq using 353 low-copy nuclear genes. Twenty species trees based on coding and non-coding loci of nuclear and plastid datasets were recovered using coalescent and concatenated approaches. Concordant and conflicting topologies were assessed by comparing tree landscapes, topology tests, and gene and site concordance factors. The topologies are similar except between nuclear and plastid datasets. The coalescent trees better accommodate disparity in the intron dataset, which contains more parsimony informative sites, while concatenated trees recover more conservative topologies, as they have a narrower distribution in the tree landscape. This suggests that highly supported phylogenetic relationships determined in previous studies do not necessarily indicate overwhelming concordant signal. Congruence must be interpreted carefully, especially in concatenated datasets. Despite this, the congruence between the multi-species coalescent (MSC) approach and concatenated tree topologies found here is notable. Our analysis does not support Eugenia subg. Pseudeugenia or sect. Pilothecium, as currently circumscribed, suggesting necessary taxonomic reassessment. Five clades are further discussed within Eugenia sect. Umbellatae as progress toward its division into workable clades. While targeted sequencing provides a massive quantity of data that improves phylogenetic resolution in Eugenia, uncertainty still remains in Eugenia sect. Umbellatae. The general pattern of higher site concordance factor (CF) than gene CF in the backbone of Eugenia suggests stochastic error from limited signal. Tree landscapes in combination with concordance factor scores, as implemented here, provide a comprehensive approach that incorporates several phylogenetic hypotheses. We believe the protocols employed here will be of use for future investigations on the evolutionary history of Myrtaceae.
Affiliation(s)
- Augusto Giaretta
  - Faculdade de Ciências Biológicas e Ambientais, Universidade Federal da Grande Dourados, Unidade II, Dourados, Brazil
  - Laboratório de Sistemática Vegetal, Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
- Bruce Murphy
  - Jodrell Laboratory, Royal Botanic Gardens, Kew, Surrey, United Kingdom
  - Department of Life Sciences, Imperial College, London, United Kingdom
- Olivier Maurin
  - Jodrell Laboratory, Royal Botanic Gardens, Kew, Surrey, United Kingdom
- Fiorella F. Mazine
  - Centro de Ciências e Tecnologias para a Sustentabilidade, Universidade Federal de São Carlos, Campus Sorocaba, Sorocaba, Brazil
- Paulo Sano
  - Laboratório de Sistemática Vegetal, Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
- Eve Lucas
  - Herbarium, Royal Botanic Gardens, Kew, Surrey, United Kingdom
|
18
|
Jiao X, Flouri T, Yang Z. Multispecies coalescent and its applications to infer species phylogenies and cross-species gene flow. Natl Sci Rev 2022; 8:nwab127. [PMID: 34987842 PMCID: PMC8692950 DOI: 10.1093/nsr/nwab127] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 06/02/2021] [Revised: 07/10/2021] [Accepted: 07/11/2021] [Indexed: 02/06/2023]
Abstract
Multispecies coalescent (MSC) is the extension of the single-population coalescent model to multiple species. It integrates the phylogenetic process of species divergences and the population genetic process of coalescent, and provides a powerful framework for a number of inference problems using genomic sequence data from multiple species, including estimation of species divergence times and population sizes, estimation of species trees accommodating discordant gene trees, inference of cross-species gene flow and species delimitation. In this review, we introduce the major features of the MSC model, discuss full-likelihood and heuristic methods of species tree estimation and summarize recent methodological advances in inference of cross-species gene flow. We discuss the statistical and computational challenges in the field and research directions where breakthroughs may be likely in the next few years.
Affiliation(s)
- Xiyun Jiao
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Tomáš Flouri
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
- Ziheng Yang
  - Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK
|
19
|
Abstract
Motivation
Phylogenomics faces a dilemma: on the one hand, the most accurate species and gene tree estimation methods are those that co-estimate them; on the other hand, these co-estimation methods do not scale to moderately large numbers of species. The summary-based methods, which first infer gene trees independently and then combine them, are much more scalable but are prone to gene tree estimation error, which is inevitable when inferring trees from limited-length data. Gene tree estimation error is not just random noise and can create biases such as long-branch attraction.
Results
We introduce a scalable likelihood-based approach to co-estimation under the multi-species coalescent model. The method, called quartet co-estimation (QuCo), takes as input independently inferred distributions over gene trees and computes the most likely species tree topology and internal branch length for each quartet, marginalizing over gene tree topologies and ignoring branch lengths by making several simplifying assumptions. It then updates the gene tree posterior probabilities based on the species tree. The focus on gene tree topologies and the heuristic division into quartets enables fast likelihood calculations. We benchmark our method with extensive simulations for quartet trees in zones known to produce biased species trees and further with larger trees. We also run QuCo on a biological dataset of bees. Our results show better accuracy than the summary-based approach ASTRAL run on estimated gene trees.
Availability and implementation
QuCo is available on https://github.com/maryamrabiee/quco.
Supplementary information
Supplementary data are available at Bioinformatics online.
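As a rough illustration of the per-quartet likelihood idea mentioned in this abstract, the sketch below uses the standard multispecies coalescent result for four taxa: with an internal branch of length t (in coalescent units), the gene tree topology matching the species tree has probability 1 - (2/3)e^(-t) and each of the two alternatives has probability (1/3)e^(-t). It is a simplified stand-in that works from hard counts of gene tree topologies, not the QuCo implementation, which takes distributions over gene trees as input; the function names are hypothetical.

```python
import math


def quartet_topology_probs(t):
    """Gene tree topology probabilities for a four-taxon species tree under
    the multispecies coalescent. t is the internal branch length in
    coalescent units. Index 0 is the topology matching the species tree."""
    discordant = math.exp(-t) / 3.0
    return (1.0 - 2.0 * discordant, discordant, discordant)


def ml_quartet(counts):
    """Maximum-likelihood species quartet from gene tree topology counts.

    counts holds the number of gene trees showing each of the three possible
    unrooted quartet topologies. Returns (index of the best topology,
    estimated internal branch length in coalescent units).
    Hypothetical helper for illustration only."""
    n = sum(counts)
    if n == 0:
        raise ValueError("need at least one gene tree")
    # The most frequent topology maximizes the likelihood.
    best = max(range(3), key=lambda i: counts[i])
    p = counts[best] / n
    if p <= 1.0 / 3.0:
        return best, 0.0        # no signal beyond a star tree
    if p >= 1.0:
        return best, math.inf   # no discordance observed
    # Invert p = 1 - (2/3) * exp(-t) for t.
    return best, -math.log(1.5 * (1.0 - p))


if __name__ == "__main__":
    topology, length = ml_quartet((60, 20, 20))
    print(topology, round(length, 4))
```

Using counts rather than posterior distributions keeps the example short; a co-estimation method would instead weight each gene by its topology probabilities before maximizing.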
Affiliation(s)
- Maryam Rabiee
  - Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093, USA
|
20
|
Matschiner M. Species Tree Inference with SNP Data. Methods Mol Biol 2022; 2512:23-44. [PMID: 35817997 DOI: 10.1007/978-1-0716-2429-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
While the inference of species trees from molecular sequences has become a common type of analysis in studies of species diversification, few programs so far allow for the use of single-nucleotide polymorphisms (SNPs) for the same purpose. In this book chapter, I discuss the use of the Bayesian program SNAPP, which infers the species tree by mathematically integrating over all possible genealogies at each SNP. In particular, I focus on a molecular clock model developed for SNAPP, allowing the inference of divergence times together with the species tree topology and the population size, directly from SNP datasets in variant call format. With the growing availability of SNP datasets for multiple closely related species, this approach is becoming increasingly relevant for the reconstruction of the temporal framework of recent species diversification.
Affiliation(s)
- Michael Matschiner
  - Department of Palaeontology and Museum, University of Zurich, Zurich, Switzerland
  - Natural History Museum, University of Oslo, Oslo, Norway
|
21
|
Zhu Q, Mirarab S. Assembling a Reference Phylogenomic Tree of Bacteria and Archaea by Summarizing Many Gene Phylogenies. Methods Mol Biol 2022; 2569:137-165. [PMID: 36083447 DOI: 10.1007/978-1-0716-2691-7_7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/24/2023]
Abstract
Phylogenomics is the inference of phylogenetic trees based on multiple marker genes sampled in the genomes of interest. An important challenge in phylogenomics is the potential incongruence among the evolutionary histories of individual genes, which can be widespread in microorganisms due to the prevalence of horizontal gene transfer. This protocol introduces the procedures for building a phylogenetic tree of a large number of microbial genomes using a broad sampling of marker genes that are representative of whole-genome evolution. The protocol highlights the use of a gene tree summary method, which can effectively reconstruct the species tree while accounting for the topological conflicts among individual gene trees. The pipeline described in this protocol is scalable to tens of thousands of genomes while retaining high accuracy. We discuss multiple software tools, libraries, and scripts to enable convenient adoption of the protocol. The protocol is suitable for microbiology and microbiome studies based on public genomes and metagenomic data.
Affiliation(s)
- Qiyun Zhu
  - Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
- Siavash Mirarab
  - Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA, USA
|
22
|
How challenging RADseq data turned out to favor coalescent-based species tree inference. A case study in Aichryson (Crassulaceae). Mol Phylogenet Evol 2021; 167:107342. [PMID: 34785384 DOI: 10.1016/j.ympev.2021.107342] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/03/2020] [Revised: 07/05/2021] [Accepted: 10/29/2021] [Indexed: 12/24/2022]
Abstract
Analysing multiple genomic regions while incorporating detection and qualification of discordance among regions has become standard for understanding phylogenetic relationships. In plants, which usually have comparatively large genomes, this is feasible through the combination of reduced-representation library (RRL) methods and high-throughput sequencing, enabling the cost-effective acquisition of genomic data for thousands of loci from hundreds of samples. One popular RRL method is RADseq. A major disadvantage of established RADseq approaches is the rather short fragment and sequencing range, leading to loci of little individual phylogenetic information. This issue hampers the application of coalescent-based species tree inference. The modified RADseq protocol presented here, which targets ca. 5,000 loci of 300-600 nt length sequenced with the latest short-read-sequencing (SRS) technology, has the potential to overcome this drawback. To illustrate the advantages of this approach we use the study group Aichryson Webb & Berthelott (Crassulaceae), a plant genus that diversified on the Canary Islands. The data analysis approach used here aims at careful quality control of the long-loci dataset. It involves an informed selection of thresholds for accurate clustering; a thorough exploration of locus properties, such as locus length, coverage and variability, to identify potentially biased data; and a comparative phylogenetic inference of filtered datasets, accompanied by an evaluation of resulting BS support and gene and site concordance factor values, to improve overall resolution of the resulting phylogenetic trees. The final dataset contains variable loci with an average length of 373 nt and facilitates species tree estimation using a coalescent-based summary approach. Additional improvements brought by the approach are critically discussed.
|
23
|
Li J, Zhang Y, Ruhsam M, Milne RI, Wang Y, Wu D, Jia S, Tao T, Mao K. Seeing through the hedge: Phylogenomics of Thuja (Cupressaceae) reveals prominent incomplete lineage sorting and ancient introgression for Tertiary relict flora. Cladistics 2021; 38:187-203. [PMID: 34551153 DOI: 10.1111/cla.12491] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/29/2021] [Revised: 08/15/2021] [Accepted: 08/27/2021] [Indexed: 12/16/2022]
Abstract
The Eastern Asia (EA) - North America (NA) disjunction is a well-known biogeographic pattern of the Tertiary relict flora; however, few studies have investigated the evolutionary history of this disjunction using a phylogenomic approach. Here, we used 2369 single copy nuclear genes and nearly full plastomes to reconstruct the evolutionary history of the small Tertiary relict genus Thuja, which consists of five disjunctly distributed species. The nuclear species tree strongly supported an EA clade Thuja standishii-Thuja sutchuenensis and a "disjunct clade", where western NA species T. plicata is sister to an EA-eastern NA disjunct Thuja occidentalis-Thuja koraiensis group. Our results suggested that the observed topological discordance among the gene trees as well as the cytonuclear discordance is mainly due to incomplete lineage sorting, probably facilitated by the fast diversification of Thuja around the Early Miocene and the large effective population sizes of ancestral lineages. Furthermore, approximately 20% of the T. sutchuenensis nuclear genome is derived from an unknown ancestral lineage of Thuja, which might explain the close resemblance of its cone morphology to that of an ancient fossil species. Overall, our study demonstrates that single genes may not resolve interspecific relationships for disjunct taxa, and that more reliable results will come from hundreds or thousands of loci, revealing a more complex evolutionary history. This will steadily improve our understanding of their origin and evolution.
Collapse
Affiliation(s)
- Jialiang Li
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, Sichuan, 610065, China
| | - Yujiao Zhang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, Sichuan, 610065, China
| | - Markus Ruhsam
- Royal Botanic Garden Edinburgh, 20A Inverleith Row, Edinburgh, EH3 5LR, UK
| | - Richard Ian Milne
- Institute of Molecular Plant Sciences, The University of Edinburgh, Edinburgh, EH9 3JH, UK
| | - Yi Wang
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, Sichuan, 610065, China
| | - Dayu Wu
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, Sichuan, 610065, China
| | - Shiyu Jia
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, Sichuan, 610065, China
| | - Tongzhou Tao
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, Sichuan, 610065, China
| | - Kangshan Mao
- Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, Sichuan, 610065, China; College of Science, Tibet University, Lhasa, Xizang Autonomous Region, 850012, China
|
24
|
Stringer DN, Bertozzi T, Meusemann K, Delean S, Guzik MT, Tierney SM, Mayer C, Cooper SJB, Javidkar M, Zwick A, Austin AD. Development and evaluation of a custom bait design based on 469 single-copy protein-coding genes for exon capture of isopods (Philosciidae: Haloniscus). PLoS One 2021; 16:e0256861. [PMID: 34534224 PMCID: PMC8448321 DOI: 10.1371/journal.pone.0256861] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2020] [Accepted: 08/17/2021] [Indexed: 12/02/2022] Open
Abstract
Transcriptome-based exon capture approaches, along with next-generation sequencing, are allowing for the rapid and cost-effective production of extensive and informative phylogenomic datasets from non-model organisms for phylogenetics and population genetics research. These approaches generally employ a reference genome to infer the intron-exon structure of targeted loci and preferentially select longer exons. However, in the absence of an existing and well-annotated genome, we applied this exon capture method directly, without initially identifying intron-exon boundaries for bait design, to a group of highly diverse Haloniscus (Philosciidae), paraplatyarthrid and armadillid isopods, and examined the performance of our methods and bait design for phylogenetic inference. Here, we identified an isopod-specific set of single-copy protein-coding loci, and a custom bait design to capture targeted regions from 469 genes, and analysed the resulting sequence data with a mapping approach and newly-created post-processing scripts. We effectively recovered a large and informative dataset comprising both short (<100 bp) and longer (>300 bp) exons, with high uniformity in sequencing depth. We were also able to successfully capture exon data from up to 16-year-old museum specimens along with more distantly related outgroup taxa, and efficiently pool multiple samples prior to capture. Our well-resolved phylogenies highlight the overall utility of this methodological approach and custom bait design, which offer enormous potential for application to future isopod, as well as broader crustacean, molecular studies.
Affiliation(s)
- Danielle N. Stringer
- Australian Centre for Evolutionary Biology and Biodiversity, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- South Australian Museum, Adelaide, South Australia, Australia
| | - Terry Bertozzi
- Australian Centre for Evolutionary Biology and Biodiversity, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- South Australian Museum, Adelaide, South Australia, Australia
| | - Karen Meusemann
- Evolutionary Biology and Ecology, Institute for Biology I, University of Freiburg, Freiburg, Germany
- Australian National Insect Collection, CSIRO National Research Collections Australia, Acton, Australian Capital Territory, Australia
- Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Bonn, Germany
| | - Steven Delean
- School of Biological Sciences and the Environment Institute, The University of Adelaide, Adelaide, South Australia, Australia
| | - Michelle T. Guzik
- Australian Centre for Evolutionary Biology and Biodiversity, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
| | - Simon M. Tierney
- Australian Centre for Evolutionary Biology and Biodiversity, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- Hawkesbury Institute for the Environment, Western Sydney University, Richmond, New South Wales, Australia
| | - Christoph Mayer
- Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Bonn, Germany
| | - Steven J. B. Cooper
- Australian Centre for Evolutionary Biology and Biodiversity, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- South Australian Museum, Adelaide, South Australia, Australia
| | - Mohammad Javidkar
- Australian Centre for Evolutionary Biology and Biodiversity, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
| | - Andreas Zwick
- Australian National Insect Collection, CSIRO National Research Collections Australia, Acton, Australian Capital Territory, Australia
| | - Andrew D. Austin
- Australian Centre for Evolutionary Biology and Biodiversity, School of Biological Sciences, The University of Adelaide, Adelaide, South Australia, Australia
- South Australian Museum, Adelaide, South Australia, Australia
|
25
|
Forthman M, Braun EL, Kimball RT. Gene tree quality affects empirical coalescent branch length estimation. ZOOL SCR 2021. [DOI: 10.1111/zsc.12512] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Affiliation(s)
- Michael Forthman
- Department of Entomology & Nematology, University of Florida, Gainesville, FL, USA
- California State Collection of Arthropods, Plant Pest Diagnostics Branch, California Department of Food & Agriculture, Sacramento, CA, USA
| | - Edward L. Braun
- Department of Biology, University of Florida, Gainesville, FL, USA
| | | |
|
26
|
Esquerré D, Keogh JS, Demangel D, Morando M, Avila LJ, Sites JW, Ferri-Yáñez F, Leaché AD. Rapid radiation and rampant reticulation: Phylogenomics of South American Liolaemus lizards. Syst Biol 2021; 71:286-300. [PMID: 34259868 DOI: 10.1093/sysbio/syab058] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2019] [Revised: 06/25/2021] [Accepted: 06/30/2021] [Indexed: 01/09/2023] Open
Abstract
Understanding the factors that cause heterogeneity among gene trees can increase the accuracy of species trees. Discordant signals across the genome are commonly produced by incomplete lineage sorting (ILS) and introgression, which in turn can result in reticulate evolution. Species tree inference using the multispecies coalescent is designed to deal with ILS and is robust to low levels of introgression, but extensive introgression violates the fundamental assumption that relationships are strictly bifurcating. In this study, we explore the phylogenomics of the iconic Liolaemus subgenus of South American lizards, a group of over 100 species mostly distributed in and around the Andes mountains. Using mitochondrial DNA (mtDNA) and genome-wide restriction-site associated DNA sequencing (RADseq; nDNA hereafter), we inferred a time-calibrated mtDNA gene tree, nDNA species trees, and phylogenetic networks. We found high levels of discordance between mtDNA and nDNA, which we attribute in part to extensive ILS resulting from rapid diversification. These data also reveal extensive and deep introgression, which combined with rapid diversification, explain the high level of phylogenetic discordance. We discuss these findings in the context of Andean orogeny and glacial cycles that fragmented, expanded, and contracted species distributions. Finally, we use the new phylogeny to resolve long-standing taxonomic issues in one of the most studied lizard groups in the New World.
Affiliation(s)
- Damien Esquerré
- Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT, Australia
| | - J Scott Keogh
- Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, ACT, Australia
| | | | - Mariana Morando
- Instituto Patagónico para el Estudio de los Ecosistemas Continentales (IPEEC- CONICET), Puerto Madryn, Chubut, Argentina
| | - Luciano J Avila
- Instituto Patagónico para el Estudio de los Ecosistemas Continentales (IPEEC- CONICET), Puerto Madryn, Chubut, Argentina
| | - Jack W Sites
- Department of Biology and M.L. Bean Life Science Museum, Brigham Young University, Provo, Utah, USA
| | - Francisco Ferri-Yáñez
- Departamento de Biogeografía y Cambio Global, Museo Nacional de Ciencias Naturales, CSIC & Laboratorio Internacional en Cambio Global CSIC-PUC (LINCGlobal), Calle José Gutiérrez Abascal, 2, 28006, Madrid, Spain
| | - Adam D Leaché
- Department of Biology & Burke Museum of Natural History and Culture, University of Washington, Seattle, Washington, USA
|
27
|
Vázquez-Miranda H, Barker FK. Autosomal, sex-linked and mitochondrial loci resolve evolutionary relationships among wrens in the genus Campylorhynchus. Mol Phylogenet Evol 2021; 163:107242. [PMID: 34224849 DOI: 10.1016/j.ympev.2021.107242] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2020] [Revised: 06/14/2021] [Accepted: 06/29/2021] [Indexed: 01/18/2023]
Abstract
Although there is general consensus that sampling of multiple genetic loci is critical in accurate reconstruction of species trees, the exact numbers and the best types of molecular markers remain an open question. In particular, the phylogenetic utility of sex-linked loci is underexplored. Here, we sample all species and 70% of the named diversity of the New World wren genus Campylorhynchus using sequences from 23 loci, to evaluate the effects of linkage on efficiency in recovering a well-supported tree for the group. At a tree-wide level, we found that most loci supported fewer than half the possible clades and that sex-linked loci produced similar resolution to slower-coalescing autosomal markers, controlling for locus length. By contrast, we did find evidence that linkage affected the efficiency of recovery of individual relationships; as few as two sex-linked loci were necessary to resolve a selection of clades with long to medium subtending branches, whereas 4-6 autosomal loci were necessary to achieve comparable results. These results support an expanded role for sampling of the avian Z chromosome in phylogenetic studies, including target enrichment approaches. Our concatenated and species tree analyses represent significant improvements in our understanding of diversification in Campylorhynchus, and suggest a relatively complex scenario for its radiation across the Miocene/Pliocene boundary, with multiple invasions of South America.
Affiliation(s)
- Hernán Vázquez-Miranda
- Departamento de Zoología, Instituto de Biología, Universidad Nacional Autónoma de México, Ciudad de México C.P. 04510, Mexico
| | - F Keith Barker
- Department of Ecology, Evolution and Behavior, Bell Museum of Natural History, University of Minnesota, 40 Gortner Laboratory, 1479 Gortner Avenue, Saint Paul, MN 55108, USA
|
28
|
Mahbub M, Wahab Z, Reaz R, Rahman MS, Bayzid MS. wQFM: Highly Accurate Genome-scale Species Tree Estimation from Weighted Quartets. Bioinformatics 2021; 37:3734-3743. [PMID: 34086858 DOI: 10.1093/bioinformatics/btab428] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2020] [Revised: 05/24/2021] [Accepted: 06/03/2021] [Indexed: 02/01/2023] Open
Abstract
MOTIVATION Species tree estimation from genes sampled from throughout the whole genome is complicated due to the gene tree-species tree discordance. Incomplete lineage sorting (ILS) is one of the most frequent causes for this discordance, where alleles can coexist in populations for periods that may span several speciation events. Quartet-based summary methods for estimating species trees from a collection of gene trees are becoming popular due to their high accuracy and statistical guarantee under ILS. Generating quartets with appropriate weights, where weights correspond to the relative importance of quartets, and subsequently amalgamating the weighted quartets to infer a single coherent species tree can allow for a statistically consistent way of estimating species trees. However, handling weighted quartets is challenging. RESULTS We propose wQFM, a highly accurate method for species tree estimation from multi-locus data, by extending the quartet FM (QFM) algorithm to a weighted setting. wQFM was assessed on a collection of simulated and real biological datasets, including the avian phylogenomic dataset which is one of the largest phylogenomic datasets to date. We compared wQFM with wQMC, which is the best alternate method for weighted quartet amalgamation, and with ASTRAL, which is one of the most accurate and widely used coalescent-based species tree estimation methods. Our results suggest that wQFM matches or improves upon the accuracy of wQMC and ASTRAL. AVAILABILITY wQFM is available in open source form at https://github.com/Mahim1997/wQFM-2020. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mahim Mahbub
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
| | - Zahin Wahab
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
| | - Rezwana Reaz
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
| | - M Saifur Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
| | - Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka-1205, Bangladesh
|
29
|
Shen XX, Steenwyk JL, Rokas A. Dissecting incongruence between concatenation- and quartet-based approaches in phylogenomic data. Syst Biol 2021; 70:997-1014. [PMID: 33616672 DOI: 10.1093/sysbio/syab011] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2020] [Revised: 02/10/2021] [Accepted: 02/17/2021] [Indexed: 12/12/2022] Open
Abstract
Topological conflict or incongruence is widespread in phylogenomic data. Concatenation- and coalescent-based approaches often result in incongruent topologies, but the causes of this conflict can be difficult to characterize. We examined incongruence stemming from conflict between likelihood-based signal (quantified by the difference in gene-wise log likelihood score or ΔGLS) and quartet-based topological signal (quantified by the difference in gene-wise quartet score or ΔGQS) for every gene in three phylogenomic studies in animals, fungi, and plants, which were chosen because their concatenation-based IQ-TREE (T1) and quartet-based ASTRAL (T2) phylogenies are known to produce eight conflicting internal branches (bipartitions). By comparing the types of phylogenetic signal for all genes in these three data matrices, we found that 30% - 36% of genes in each data matrix are inconsistent, that is, each of these genes has higher log likelihood score for T1 versus T2 (i.e., ΔGLS >0) whereas its T1 topology has lower quartet score than its T2 topology (i.e., ΔGQS <0) or vice versa. Comparison of inconsistent and consistent genes using a variety of metrics (e.g., evolutionary rate, gene tree topology, distribution of branch lengths, hidden paralogy, and gene tree discordance) showed that inconsistent genes are more likely to recover neither T1 nor T2 and have higher levels of gene tree discordance than consistent genes. Simulation analyses demonstrate that removal of inconsistent genes from datasets with low levels of incomplete lineage sorting (ILS) and low and medium levels of gene tree estimation error (GTEE) reduced incongruence and increased accuracy. In contrast, removal of inconsistent genes from datasets with medium and high ILS levels and high GTEE levels eliminated or extensively reduced incongruence, but the resulting congruent species phylogenies were not always topologically identical to the true species trees.
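The consistency criterion described in this abstract can be illustrated with a short sketch. It assumes that ΔGLS and ΔGQS have already been computed for each gene as the T1 value minus the T2 value. The `GeneSignal` record, the `classify` function, and the example numbers are hypothetical and are not taken from the study. Genes with a zero difference are labelled "uninformative" here, which is a choice made for this sketch only.

```python
from dataclasses import dataclass


@dataclass
class GeneSignal:
    """Per-gene signal differences (hypothetical record for illustration)."""
    name: str
    delta_gls: float  # log-likelihood score under T1 minus score under T2
    delta_gqs: float  # quartet score under T1 minus score under T2


def classify(gene: GeneSignal) -> str:
    """Label a gene by whether its two signal types favour the same tree."""
    if gene.delta_gls == 0 or gene.delta_gqs == 0:
        # Assumption of this sketch: a zero difference favours neither tree.
        return "uninformative"
    same_sign = (gene.delta_gls > 0) == (gene.delta_gqs > 0)
    return "consistent" if same_sign else "inconsistent"


# Made-up example values, not data from the paper.
genes = [
    GeneSignal("gene_a", 4.2, 12.0),    # both favour T1
    GeneSignal("gene_b", 1.7, -5.0),    # likelihood favours T1, quartets favour T2
    GeneSignal("gene_c", -3.1, -8.0),   # both favour T2
    GeneSignal("gene_d", -0.4, 2.0),    # likelihood favours T2, quartets favour T1
]

labels = {g.name: classify(g) for g in genes}
inconsistent_fraction = sum(v == "inconsistent" for v in labels.values()) / len(genes)
```

With these made-up values, two of the four genes are flagged as inconsistent, so `inconsistent_fraction` is 0.5.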
Affiliation(s)
- Xing-Xing Shen
- State Key Laboratory of Rice Biology and Ministry of Agriculture Key Lab of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, China; Institute of Insect Sciences, Zhejiang University, Hangzhou, China
| | - Jacob L Steenwyk
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
| | - Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
|
30
|
Rabiee M, Mirarab S. SODA: Multi-locus species delimitation using quartet frequencies. Bioinformatics 2021; 36:5623-5631. [PMID: 33555318 DOI: 10.1093/bioinformatics/btaa1010] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2020] [Revised: 10/19/2020] [Accepted: 11/21/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Species delimitation, the process of deciding how to group a set of organisms into units called species, is one of the most challenging problems in evolutionary computational biology. While many methods exist for species delimitation, most based on the coalescent theory, few are scalable to very large datasets, and methods that scale tend not to be accurate. Species delimitation is closely related to species tree inference from discordant gene trees, a problem that has enjoyed rapid advances in recent years. RESULTS In this paper, we build on the accuracy and scalability of recent quartet-based methods for species tree estimation and propose a new method called SODA for species delimitation. SODA relies heavily on a recently developed method for testing zero branch length in species trees. In extensive simulations, we show that SODA can easily scale to very large datasets while maintaining high accuracy. AVAILABILITY The code and data presented here are available on https://github.com/maryamrabiee/SODA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Maryam Rabiee
- Computer Science and Engineering, University of California, San Diego, US
| | - Siavash Mirarab
- Electrical and Computer Engineering, University of California, San Diego, US
|
31
|
|
32
|
Zhu T, Yang Z. Complexity of the simplest species tree problem. Mol Biol Evol 2021; 38:3993-4009. [PMID: 33492385 PMCID: PMC8382899 DOI: 10.1093/molbev/msab009] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2020] [Revised: 01/04/2021] [Accepted: 01/13/2021] [Indexed: 02/06/2023] Open
Abstract
The multispecies coalescent model provides a natural framework for species tree estimation accounting for gene-tree conflicts. Although a number of species tree methods under the multispecies coalescent have been suggested and evaluated using simulation, their statistical properties remain poorly understood. Here, we use mathematical analysis aided by computer simulation to examine the identifiability, consistency, and efficiency of different species tree methods in the case of three species and three sequences under the molecular clock. We consider four major species-tree methods including concatenation, two-step, independent-sites maximum likelihood, and maximum likelihood. We develop approximations that predict that the probit transform of the species tree estimation error decreases linearly with the square root of the number of loci. Even in this simplest case, major differences exist among the methods. Full-likelihood methods are considerably more efficient than summary methods such as concatenation and two-step. They also provide estimates of important parameters such as species divergence times and ancestral population sizes, whereas these parameters are not identifiable by summary methods. Our results highlight the need to improve the statistical efficiency of summary methods and the computational efficiency of full likelihood methods of species tree estimation.
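The scaling relation stated in this abstract, that the probit transform of the estimation error falls linearly with the square root of the number of loci, can be written as probit(P_e) ≈ a − b·√L with b > 0. The sketch below only illustrates the shape of that relation. The constants `a` and `b` and the function name `predicted_error` are hypothetical, and the values used are not estimates from the paper.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def predicted_error(num_loci: int, a: float, b: float) -> float:
    """Error probability P_e whose probit equals a - b * sqrt(num_loci).

    a and b are illustrative constants; in practice they would depend on
    the method and on the species tree parameters.
    """
    return STANDARD_NORMAL.cdf(a - b * sqrt(num_loci))


# Hypothetical constants chosen only to show the trend.
a, b = 0.0, 0.1
errors = {n: predicted_error(n, a, b) for n in (0, 25, 100, 400)}

# Applying the probit (inverse normal CDF) recovers the linear form.
probit_at_100 = STANDARD_NORMAL.inv_cdf(errors[100])
```

Under this form, quadrupling the number of loci doubles the drop in the probit of the error. That is why two methods with different slopes `b` can need very different numbers of loci to reach the same accuracy.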
Affiliation(s)
- Tianqi Zhu
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; Key Laboratory of Random Complex Structures and Data Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
| | - Ziheng Yang
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; Department of Genetics, University College London, Gower Street, London WC1E 6BT, UK
|
33
|
Koch H, DeGiorgio M. Maximum Likelihood Estimation of Species Trees from Gene Trees in the Presence of Ancestral Population Structure. Genome Biol Evol 2020; 12:3977-3995. [PMID: 32022857 PMCID: PMC7061232 DOI: 10.1093/gbe/evaa022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 01/23/2020] [Indexed: 11/12/2022] Open
Abstract
Though large multilocus genomic data sets have led to overall improvements in phylogenetic inference, they have posed the new challenge of addressing conflicting signals across the genome. In particular, ancestral population structure, which has been uncovered in a number of diverse species, can skew gene tree frequencies, thereby hindering the performance of species tree estimators. Here we develop a novel maximum likelihood method, termed TASTI (Taxa with Ancestral structure Species Tree Inference), that can infer phylogenies under such scenarios, and find that it has increasing accuracy with increasing numbers of input gene trees, contrasting with the relatively poor performances of methods not tailored for ancestral structure. Moreover, we propose a supertree approach that allows TASTI to scale computationally with increasing numbers of input taxa. We use genetic simulations to assess TASTI's performance in the three- and four-taxon settings and demonstrate the application of TASTI on a six-species Afrotropical mosquito data set. Finally, we have implemented TASTI in an open-source software package for ease of use by the scientific community.
Affiliation(s)
- Hillary Koch
- Department of Statistics, Pennsylvania State University
| | - Michael DeGiorgio
- Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University
|
34
|
Portik DM, Wiens JJ. Do Alignment and Trimming Methods Matter for Phylogenomic (UCE) Analyses? Syst Biol 2020; 70:440-462. [PMID: 32797207 DOI: 10.1093/sysbio/syaa064] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2019] [Revised: 08/02/2020] [Accepted: 08/03/2020] [Indexed: 11/14/2022] Open
Abstract
Alignment is a crucial issue in molecular phylogenetics because different alignment methods can potentially yield very different topologies for individual genes. But it is unclear if the choice of alignment methods remains important in phylogenomic analyses, which incorporate data from hundreds or thousands of genes. For example, problematic biases in alignment might be multiplied across many loci, whereas alignment errors in individual genes might become irrelevant. The issue of alignment trimming (i.e., removing poorly aligned regions or missing data from individual genes) is also poorly explored. Here, we test the impact of 12 different combinations of alignment and trimming methods on phylogenomic analyses. We compare these methods using published phylogenomic data from ultraconserved elements (UCEs) from squamate reptiles (lizards and snakes), birds, and tetrapods. We compare the properties of alignments generated by different alignment and trimming methods (e.g., length, informative sites, missing data). We also test whether these data sets can recover well-established clades when analyzed with concatenated (RAxML) and species-tree methods (ASTRAL-III), using the full data (~5000 loci) and subsampled data sets (10% and 1% of loci). We show that different alignment and trimming methods can significantly impact various aspects of phylogenomic data sets (e.g., length, informative sites). However, these different methods generally had little impact on the recovery and support values for well-established clades, even across very different numbers of loci. Nevertheless, our results suggest several "best practices" for alignment and trimming. Intriguingly, the choice of phylogenetic methods impacted the phylogenetic results most strongly, with concatenated analyses recovering significantly more well-established clades (with stronger support) than the species-tree analyses.
[Alignment; concatenated analysis; phylogenomics; sequence length heterogeneity; species-tree analysis; trimming].
Affiliation(s)
- Daniel M Portik
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA; California Academy of Sciences, San Francisco, CA 94118, USA
| | - John J Wiens
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721, USA
|
35
|
Huang J, Flouri T, Yang Z. A Simulation Study to Examine the Information Content in Phylogenomic Data Sets under the Multispecies Coalescent Model. Mol Biol Evol 2020; 37:3211-3224. [DOI: 10.1093/molbev/msaa166] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023] Open
Abstract
We use computer simulation to examine the information content in multilocus data sets for inference under the multispecies coalescent model. Inference problems considered include estimation of evolutionary parameters (such as species divergence times, population sizes, and cross-species introgression probabilities), species tree estimation, and species delimitation based on Bayesian comparison of delimitation models. We found that the number of loci is the most influential factor for almost all inference problems examined. Although the number of sequences per species does not appear to be important to species tree estimation, it is very influential to species delimitation. Increasing the number of sites and the per-site mutation rate both increase the mutation rate for the whole locus and these have the same effect on estimation of parameters, but the sequence length has a greater effect than the per-site mutation rate for species tree estimation. We discuss the computational costs when the data size increases and provide guidelines concerning the subsampling of genomic data to enable the application of full-likelihood methods of inference.
Affiliation(s)
- Jun Huang
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
- Department of Mathematics, Beijing Jiaotong University, Beijing, P.R. China
| | - Tomáš Flouri
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
| | - Ziheng Yang
- Department of Genetics, Evolution and Environment, University College London, London, United Kingdom
|
36
|
Bagley JC, Uribe-Convers S, Carlsen MM, Muchhala N. Utility of targeted sequence capture for phylogenomics in rapid, recent angiosperm radiations: Neotropical Burmeistera bellflowers as a case study. Mol Phylogenet Evol 2020; 152:106769. [PMID: 32081762 DOI: 10.1016/j.ympev.2020.106769] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2019] [Revised: 02/10/2020] [Accepted: 02/12/2020] [Indexed: 02/06/2023]
Abstract
Targeted sequence capture is a promising approach for large-scale phylogenomics. However, rapid evolutionary radiations pose significant challenges for phylogenetic inference (e.g. incomplete lineage sorting (ILS), phylogenetic noise), and the ability of targeted nuclear loci to resolve species trees despite such issues remains poorly studied. We test the utility of targeted sequence capture for inferring phylogenetic relationships in rapid, recent angiosperm radiations, focusing on Burmeistera bellflowers (Campanulaceae), which diversified into ~130 species over less than 3 million years. We compared phylogenies estimated from supercontig (exons plus flanking sequences), exon-only, and flanking-only datasets with 506-546 loci (~4.7 million bases) for 46 Burmeistera species/lineages and 10 outgroup taxa. Nuclear loci resolved backbone nodes and many congruent internal relationships with high support in concatenation and coalescent-based species tree analyses, and inferences were largely robust to effects of missing taxa and base composition biases. Nevertheless, species trees were incongruent between datasets, and gene trees exhibited remarkably high levels of conflict (~4-60% congruence, ~40-99% conflict) not simply driven by poor gene tree resolution. Higher gene tree heterogeneity at shorter branches suggests an important role of ILS, as expected for rapid radiations. Phylogenetic informativeness analyses also suggest this incongruence has resulted from low resolving power at short internal branches, consistent with ILS, and homoplasy at deeper nodes, with exons exhibiting much greater risk of incorrect topologies due to homoplasy than other datasets. Our findings suggest that targeted sequence capture is feasible for resolving rapid, recent angiosperm radiations, and that results based on supercontig alignments containing nuclear exons and flanking sequences have higher phylogenetic utility and accuracy than either alone.
We use our results to make practical recommendations for future target capture-based studies of Burmeistera and other rapid angiosperm radiations, including that such studies should analyze supercontigs to maximize the phylogenetic information content of loci.
Affiliation(s)
- Justin C Bagley
- Department of Biology, University of Missouri-St. Louis, St. Louis, MO 63121, USA; Department of Biology, Virginia Commonwealth University, Richmond, VA 23284, USA.
- Simon Uribe-Convers
- Department of Biology, University of Missouri-St. Louis, St. Louis, MO 63121, USA
- Mónica M Carlsen
- Research Department, Science and Conservation Division, Missouri Botanical Garden, St. Louis, MO 63110, USA
- Nathan Muchhala
- Department of Biology, University of Missouri-St. Louis, St. Louis, MO 63121, USA
|
37
|
Islam M, Sarker K, Das T, Reaz R, Bayzid MS. STELAR: a statistically consistent coalescent-based species tree estimation method by maximizing triplet consistency. BMC Genomics 2020; 21:136. [PMID: 32039704 PMCID: PMC7011378 DOI: 10.1186/s12864-020-6519-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/22/2019] [Accepted: 01/20/2020] [Indexed: 12/14/2022] Open
Abstract
Background: Species tree estimation is frequently based on phylogenomic approaches that use multiple genes from throughout the genome. However, estimating a species tree from a collection of gene trees can be complicated due to the presence of gene tree incongruence resulting from incomplete lineage sorting (ILS), which is modelled by the multi-species coalescent process. Maximum likelihood and Bayesian MCMC methods can potentially result in accurate trees, but they do not scale well to large datasets. Results: We present STELAR (Species Tree Estimation by maximizing tripLet AgReement), a new fast and highly accurate statistically consistent coalescent-based method for estimating species trees from a collection of gene trees. We formalized the constrained triplet consensus (CTC) problem and showed that the solution to the CTC problem is a statistically consistent estimate of the species tree under the multi-species coalescent (MSC) model. STELAR is an efficient dynamic programming based solution to the CTC problem which is highly accurate and scalable. We evaluated the accuracy of STELAR in comparison with SuperTriplets, which is an alternate fast and highly accurate triplet-based supertree method, and with MP-EST and ASTRAL, two of the most popular and accurate coalescent-based methods. Experimental results suggest that STELAR matches the accuracy of ASTRAL and improves on MP-EST and SuperTriplets. Conclusions: Theoretical and empirical results (on both simulated and real biological datasets) suggest that STELAR is a valuable technique for species tree estimation from gene tree distributions.
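The triplet-agreement objective named in this abstract can be illustrated with a small sketch. The Python below is not STELAR's dynamic-programming algorithm and is not taken from its code; it only scores a given candidate species tree by counting how many rooted triplets in the gene trees it agrees with. The nested-tuple tree encoding and all function names are assumptions made for illustration.

```python
from itertools import combinations

def leaves(tree):
    """Leaf set of a rooted tree encoded as nested tuples of taxon names (assumed encoding)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clusters(tree):
    """Leaf sets below every internal node, root included."""
    if isinstance(tree, str):
        return []
    out = [leaves(tree)]
    for child in tree:
        out.extend(clusters(child))
    return out

def triplet_topology(tree_clusters, a, b, c):
    """Pair grouped to the exclusion of the third taxon, or None if unresolved."""
    for x, y, z in ((a, b, c), (a, c, b), (b, c, a)):
        if any(x in cl and y in cl and z not in cl for cl in tree_clusters):
            return frozenset((x, y))
    return None

def triplet_agreement(species_tree, gene_trees):
    """Count (gene tree, triplet) pairs whose topology matches the candidate species tree."""
    species_clusters = clusters(species_tree)
    taxa = sorted(leaves(species_tree))
    score = 0
    for gene_tree in gene_trees:
        gene_clusters = clusters(gene_tree)
        gene_taxa = leaves(gene_tree)
        for a, b, c in combinations(taxa, 3):
            if not {a, b, c} <= gene_taxa:
                continue  # triplet not present in this gene tree
            expected = triplet_topology(species_clusters, a, b, c)
            if expected is not None and expected == triplet_topology(gene_clusters, a, b, c):
                score += 1
    return score

# Hypothetical toy data: three gene trees on four taxa.
gene_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    ((("A", "B"), "C"), "D"),
]
print(triplet_agreement((("A", "B"), ("C", "D")), gene_trees))  # 6
print(triplet_agreement((("A", "C"), ("B", "D")), gene_trees))  # 5
```

On this toy input the first candidate agrees with more gene-tree triplets and would be preferred under a triplet-agreement criterion. An exhaustive comparison like this is only feasible for very small trees, which is why a method of this kind needs a more efficient search.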
Affiliation(s)
- Mazharul Islam
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- Kowshika Sarker
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- Trisha Das
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh
- Rezwana Reaz
- Department of Computer Science, The University of Texas at Austin, Texas, 78712, USA
- Md Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, 1205, Bangladesh.
|
38
|
Olave M, Meyer A. Implementing Large Genomic Single Nucleotide Polymorphism Data Sets in Phylogenetic Network Reconstructions: A Case Study of Particularly Rapid Radiations of Cichlid Fish. Syst Biol 2020; 69:848-862. [DOI: 10.1093/sysbio/syaa005] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2019] [Revised: 01/09/2020] [Accepted: 01/23/2020] [Indexed: 12/23/2022] Open
Abstract
The Midas cichlids of the Amphilophus citrinellus spp. species complex from Nicaragua (13 species) are an extraordinary example of adaptive and rapid radiation (<24,000 years old). The evolutionary history of these cichlids is very challenging to infer in phylogenetic analyses, due to the apparent prevalence of incomplete lineage sorting (ILS), as well as past and current gene flow. Assuming solely a vertical transfer of genetic material from an ancestral lineage to new lineages is not appropriate in many cases of genes transferred horizontally in nature. Recently developed methods to infer phylogenetic networks under such circumstances might be able to circumvent these problems. These models accommodate not just ILS, but also gene flow, under the multispecies network coalescent (MSNC) model, processes that are at work in young, hybridizing, and/or rapidly diversifying lineages. There are currently only a few programs available that implement MSNC for estimating phylogenetic networks. Here, we present a novel way to incorporate single nucleotide polymorphism (SNP) data into the currently available PhyloNetworks program. Based on simulations, we demonstrate that SNPs can provide enough power to recover the true phylogenetic network. We also show that it can accurately infer the true network more often than other similar SNP-based programs (PhyloNet and HyDe). Moreover, our approach results in a faster algorithm compared to the original pipeline in PhyloNetworks, without losing power. We also applied our new approach to infer the phylogenetic network of Midas cichlid radiation. We implemented the most comprehensive genomic data set to date (RADseq data set of 679 individuals and >37K SNPs from 19 ingroup lineages) and present estimated phylogenetic networks for this extremely young and fast-evolving radiation of cichlid fish. We demonstrate that the MSNC is more appropriate than the multispecies coalescent alone for the analysis of this rapid radiation. [Genomics; multispecies network coalescent; phylogenetic networks; phylogenomics; RADseq; SNPs.]
Affiliation(s)
- Melisa Olave
- Department of Biology, University of Konstanz, 78457 Konstanz, Germany
- Axel Meyer
- Department of Biology, University of Konstanz, 78457 Konstanz, Germany
|
39
|
Moumi NA, Das B, Tasnim Promi Z, Bristy NA, Bayzid MS. Quartet-based inference of cell differentiation trees from ChIP-Seq histone modification data. PLoS One 2019; 14:e0221270. [PMID: 31557185 PMCID: PMC6762093 DOI: 10.1371/journal.pone.0221270] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2019] [Accepted: 08/04/2019] [Indexed: 01/23/2023] Open
Abstract
Understanding cell differentiation, the process that generates distinct cell types, plays a pivotal role in developmental and evolutionary biology. Transcriptomic information and epigenetic marks are useful to elucidate hierarchical developmental relationships among cell-types. Standard phylogenetic approaches such as maximum parsimony, maximum likelihood and neighbor joining have previously been applied to ChIP-Seq histone modification data to infer cell-type trees, showing how diverse types of cells are related. In this study, we demonstrate the applicability and suitability of quartet-based phylogenetic tree estimation techniques for constructing cell-type trees. We propose two quartet-based pipelines for constructing cell phylogeny. Our methods were assessed for their validity in inferring hierarchical differentiation processes of various cell-types in H3K4me3, H3K27me3, H3K36me3, and H3K27ac histone mark data. We also propose a robust metric for evaluating cell-type trees.
Affiliation(s)
- Nazifa Ahmed Moumi
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Badhan Das
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Zarin Tasnim Promi
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Nishat Anjum Bristy
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Md. Shamsuzzoha Bayzid
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
|
40
|
Yuan H, Atta C, Tornabene L, Li C. Assexon: Assembling Exon Using Gene Capture Data. Evol Bioinform Online 2019; 15:1176934319874792. [PMID: 31523128 PMCID: PMC6732846 DOI: 10.1177/1176934319874792] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2019] [Accepted: 08/19/2019] [Indexed: 12/30/2022] Open
Abstract
Exon capture across species has been one of the most broadly applied approaches to acquire multi-locus data in phylogenomic studies of non-model organisms. Methods for assembling loci from short-read sequences (e.g., Illumina platforms) that rely on mapping reads to a reference genome may not be suitable for studies comprising species across a wide phylogenetic spectrum; thus, de novo assembling methods are more generally applied. Current approaches for assembling targeted exons from short reads are not particularly optimized as they cannot (1) assemble loci with low read depth, (2) handle large files efficiently, and (3) reliably address issues with paralogs. Thus, we present Assexon: a streamlined pipeline that de novo assembles targeted exons and their flanking sequences from raw reads. We tested our method using reads from Lepisosteus osseus (4.37 Gb) and Boleophthalmus pectinirostris (2.43 Gb), which were captured using baits designed from the genome sequences of Lepisosteus oculatus and Oreochromis niloticus, respectively. We compared performance of Assexon to PHYLUCE and HybPiper, which are commonly used pipelines to assemble ultra-conserved element (UCE) and Hyb-seq data. A custom exon capture analysis pipeline (CP) developed by Yuan et al was compared as well. Assexon accurately assembled 3400 to 3800 (20%-28%) more loci than PHYLUCE and 1900 to 2300 (8%-14%) more loci than HybPiper across different levels of phylogenetic divergence. Assexon ran at least twice as fast as PHYLUCE and HybPiper. Number of loci assembled using CP was comparable with Assexon in both tests, while Assexon ran at least 7 times faster than CP. In addition, some steps of CP require the user's interaction and are not fully automated, and this user time was not counted in our calculation. Both Assexon and CP retrieved no paralogs in the testing runs, but PHYLUCE and HybPiper did. In conclusion, Assexon is a tool for accurate and efficient assembling of large read sets from exon capture experiments. Furthermore, Assexon includes scripts to filter poorly aligned coding regions and flanking regions, calculate summary statistics of loci, and select loci with reliable phylogenetic signal. Assexon is available at https://github.com/yhadevol/Assexon.
Affiliation(s)
- Hao Yuan
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution (Shanghai Ocean University), Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Shanghai Ocean University), Ministry of Education, Shanghai, China
- Calder Atta
- School of Aquatic and Fishery Sciences and the Burke Museum of Natural History and Culture, University of Washington, Seattle, WA, USA
- Luke Tornabene
- School of Aquatic and Fishery Sciences and the Burke Museum of Natural History and Culture, University of Washington, Seattle, WA, USA
- Chenhong Li
- Shanghai Universities Key Laboratory of Marine Animal Taxonomy and Evolution (Shanghai Ocean University), Shanghai, China; Shanghai Collaborative Innovation for Aquatic Animal Genetics and Breeding, Shanghai, China; Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Shanghai Ocean University), Ministry of Education, Shanghai, China
|
41
|
Bribiesca-Contreras G, Pineda-Enríquez T, Márquez-Borrás F, Solís-Marín FA, Verbruggen H, Hugall AF, O'Hara TD. Dark offshoot: Phylogenomic data sheds light on the evolutionary history of a new species of cave brittle star. Mol Phylogenet Evol 2019; 136:151-163. [PMID: 30981811 DOI: 10.1016/j.ympev.2019.04.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2018] [Revised: 02/28/2019] [Accepted: 04/10/2019] [Indexed: 11/28/2022]
Abstract
Caves are a useful system for testing evolutionary and biogeographic hypotheses, as they are isolated, and their environmental conditions have resulted in adaptive selection across different taxa. Although in recent years many more cave species have been discovered, cave-dwelling members of the class Ophiuroidea (brittle stars) remain scarce. Out of the more than two thousand species of brittle stars described to date, only three are regarded as true cave-dwellers. These occurrences represent rare colonising events, compared to other groups that are known to have successfully diversified in these systems. A third species from an anchihaline cave system in the Yucatan Peninsula, Mexico, has been previously identified from cytochrome oxidase I (COI) barcodes. In this study, we reassess the species boundaries of this putative cave species using a phylogenomic dataset (20 specimens in 13 species, 100 exons, 18.7 kbp). We perform species delimitation analyses using robust full-coalescent methods for discovery and validation of hypotheses on species boundaries, as well as infer its phylogenetic relationships with species distributed in adjacent marine regions, in order to investigate the origin of this cave-adapted species. We assess which hypotheses on the origin of subterranean taxa can be applied to this species by taking into account its placement within the genus Ophionereis and its demographic history. We provide a detailed description of Ophionereis commutabilis n. sp., and evaluate its morphological characters in the light of its successful adaptation to life in caves.
Affiliation(s)
- Guadalupe Bribiesca-Contreras
- Museum Victoria, GPO Box 666, Melbourne 3001, Australia; School of Biosciences, University of Melbourne, Victoria 3010, Australia.
- Tania Pineda-Enríquez
- Department of Biology, Division of Invertebrate Zoology, Florida Museum of Natural History, University of Florida, Gainesville, FL, USA; Natural History Museum of Los Angeles County, 900 Exposition Blvd, Los Angeles, CA 90007, USA
- Francisco Márquez-Borrás
- Laboratorio de Sistemática y Ecología de Equinodermos, Instituto de Ciencias del Mar y Limnología, Universidad Nacional Autónoma de México, Circuito Universitario s/n, Ciudad de México CP 04510, Mexico; Posgrado en Ciencias del Mar y Limnología, Universidad Nacional Autónoma de México, Circuito Universitario s/n, Ciudad de México CP 04510, Mexico
- Francisco A Solís-Marín
- Laboratorio de Sistemática y Ecología de Equinodermos, Instituto de Ciencias del Mar y Limnología, Universidad Nacional Autónoma de México, Circuito Universitario s/n, Ciudad de México CP 04510, Mexico
- Heroen Verbruggen
- School of Biosciences, University of Melbourne, Victoria 3010, Australia
|
42
|
Roch S, Nute M, Warnow T. Long-Branch Attraction in Species Tree Estimation: Inconsistency of Partitioned Likelihood and Topology-Based Summary Methods. Syst Biol 2019; 68:281-297. [PMID: 30247732 DOI: 10.1093/sysbio/syy061] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2018] [Accepted: 09/12/2018] [Indexed: 11/13/2022] Open
Abstract
With advances in sequencing technologies, there are now massive amounts of genomic data from across all life, leading to the possibility that a robust Tree of Life can be constructed. However, "gene tree heterogeneity", which is when different genomic regions can evolve differently, is a common phenomenon in multi-locus data sets, and reduces the accuracy of standard methods for species tree estimation that do not take this heterogeneity into account. New methods have been developed for species tree estimation that specifically address gene tree heterogeneity, and that have been proven to converge to the true species tree when the number of loci and number of sites per locus both increase (i.e., the methods are said to be "statistically consistent"). Yet, little is known about the biologically realistic condition where the number of sites per locus is bounded. We show that when the sequence length of each locus is bounded (by any arbitrarily chosen value), the most common approaches to species tree estimation that take heterogeneity into account (i.e., traditional fully partitioned concatenated maximum likelihood and newer approaches, called summary methods, that estimate the species tree by combining estimated gene trees) are not statistically consistent, even when the heterogeneity is extremely constrained. The main challenge is the presence of conditions such as long branch attraction that create biased tree estimation when the number of sites is restricted. Hence, our study uncovers a fundamental challenge to species tree estimation using both traditional and new methods.
Affiliation(s)
- Sebastien Roch
- Department of Mathematics, University of Wisconsin-Madison, 480 Lincoln Dr, Madison, WI 53706, USA
- Michael Nute
- Department of Statistics, The University of Illinois at Urbana-Champaign, 725 S Wright St #101, Champaign, IL 61820, USA
- Tandy Warnow
- Department of Computer Science, The University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, IL 61801-2302, USA
|
43
|
Shin S, Clarke DJ, Lemmon AR, Moriarty Lemmon E, Aitken AL, Haddad S, Farrell BD, Marvaldi AE, Oberprieler RG, McKenna DD. Phylogenomic Data Yield New and Robust Insights into the Phylogeny and Evolution of Weevils. Mol Biol Evol 2018; 35:823-836. [PMID: 29294021 DOI: 10.1093/molbev/msx324] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023] Open
Abstract
The phylogeny and evolution of weevils (the beetle superfamily Curculionoidea) have been extensively studied, but many relationships, especially in the large family Curculionidae (true weevils; >50,000 species), remain uncertain. We used phylogenomic methods to obtain DNA sequences from 522 protein-coding genes for representatives of all families of weevils and all subfamilies of Curculionidae. Most of our phylogenomic results had strong statistical support, and the inferred relationships were generally congruent with those reported in previous studies, but with some interesting exceptions. Notably, the backbone relationships of the weevil phylogeny were consistently strongly supported, and the former Nemonychidae (pine flower snout beetles) were polyphyletic, with the subfamily Cimberidinae (here elevated to Cimberididae) placed as sister group of all other weevils. The clade comprising the sister families Brentidae (straight-snouted weevils) and Curculionidae was maximally supported and the composition of both families was firmly established. The contributions of substitution modeling, codon usage and/or mutational bias to differences between trees reconstructed from amino acid and nucleotide sequences were explored. A reconstructed timetree for weevils is consistent with a Mesozoic radiation of gymnosperm-associated taxa to form most extant families and diversification of Curculionidae alongside flowering plants (first monocots, then other groups), beginning in the Cretaceous.
Affiliation(s)
- Seunggwan Shin
- Department of Biological Sciences, University of Memphis, Memphis, TN
- Dave J Clarke
- Department of Biological Sciences, University of Memphis, Memphis, TN
- Alan R Lemmon
- Department of Scientific Computing, Florida State University, Tallahassee, FL
- Stephanie Haddad
- Department of Biological Sciences, University of Memphis, Memphis, TN
- Brian D Farrell
- Museum of Comparative Zoology, Harvard University, Cambridge, MA
- Adriana E Marvaldi
- CONICET, División Entomología, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina
- Duane D McKenna
- Department of Biological Sciences, University of Memphis, Memphis, TN
|
44
|
Espeland M, Breinholt JW, Barbosa EP, Casagrande MM, Huertas B, Lamas G, Marín MA, Mielke OH, Miller JY, Nakahara S, Tan D, Warren AD, Zacca T, Kawahara AY, Freitas AV, Willmott KR. Four hundred shades of brown: Higher level phylogeny of the problematic Euptychiina (Lepidoptera, Nymphalidae, Satyrinae) based on hybrid enrichment data. Mol Phylogenet Evol 2019; 131:116-124. [DOI: 10.1016/j.ympev.2018.10.039] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2017] [Revised: 08/06/2018] [Accepted: 10/30/2018] [Indexed: 12/25/2022]
|
45
|
Sousa-Santos C, Jesus TF, Fernandes C, Robalo JI, Coelho MM. Fish diversification at the pace of geomorphological changes: evolutionary history of western Iberian Leuciscinae (Teleostei: Leuciscidae) inferred from multilocus sequence data. Mol Phylogenet Evol 2018; 133:263-285. [PMID: 30583043 DOI: 10.1016/j.ympev.2018.12.020] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2018] [Revised: 12/12/2018] [Accepted: 12/15/2018] [Indexed: 01/05/2023]
Abstract
The evolutionary history of western Iberian Leuciscinae, obligatory freshwater fish, is directly linked to the evolution of the hydrographic network of the Iberian Peninsula after its isolation from the rest of Europe, which involved dramatic rearrangements such as the transition from endorheic lakes to open basins draining to the Atlantic. Previous phylogenetic research on western Iberian leuciscines, using mainly mitochondrial DNA and more recently one or two nuclear genes, has found contradictory results and there remain many unresolved issues regarding species relationships, taxonomy, and evolutionary history. Moreover, there is a lack of integration between phylogenetic and divergence time estimates and information on the timing of geomorphological changes and paleobasin rearrangements in the Iberian Peninsula. This study presents the first comprehensive fossil-calibrated multilocus coalescent species tree of western Iberian Leuciscinae (including 14 species of Achondrostoma, Iberochondrostoma, Pseudochondrostoma and Squalius endemic to the Iberian Peninsula, seven of which are endemic to Portugal) based on seven nuclear genes, and from which we infer their biogeographic history by comparing divergence time estimates to known dated geological events. The phylogenetic pattern suggests slow-paced evolution of leuciscines during the Early-Middle Miocene endorheic phase of the main Iberian river basins, with the shift to exorheism in the late Neogene-Quaternary allowing westward dispersals that resulted in many cladogenetic events and a high rate of endemism in western Iberia. The results of this study also: (i) confirm the paraphyly of S. pyrenaicus with respect to S. carolitertii, and thus the possible presence of a new taxon in the Portuguese Tagus currently assigned to S. pyrenaicus; (ii) support the taxonomic separation of the Guadiana and Sado populations of S. pyrenaicus; (iii) show the need for further population sampling and taxonomic research to clarify the phylogenetic status of A. arcasii from the Minho basin and of the I. lusitanicum populations in the Sado and Tagus basins; and (iv) indicate that A. occidentale, I. olisiponensis and P. duriensis are the most ancient lineages within their respective genera.
Affiliation(s)
- C Sousa-Santos
- MARE - Marine and Environmental Sciences Centre, ISPA-Instituto Universitário, Rua Jardim do Tabaco 34, 1149-041 Lisbon, Portugal.
- T F Jesus
- cE3c - Center for Ecology, Evolution and Environmental Changes, Departamento de Biologia Animal, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016 Lisbon, Portugal; Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina da Universidade de Lisboa, Av. Professor Egaz Moniz, 1649-028 Lisbon, Portugal.
- C Fernandes
- cE3c - Center for Ecology, Evolution and Environmental Changes, Departamento de Biologia Animal, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016 Lisbon, Portugal.
- J I Robalo
- MARE - Marine and Environmental Sciences Centre, ISPA-Instituto Universitário, Rua Jardim do Tabaco 34, 1149-041 Lisbon, Portugal.
- M M Coelho
- cE3c - Center for Ecology, Evolution and Environmental Changes, Departamento de Biologia Animal, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016 Lisbon, Portugal.
|
46
|
Smissen PJ, Rowe KC. Repeated biome transitions in the evolution of Australian rodents. Mol Phylogenet Evol 2018; 128:182-191. [DOI: 10.1016/j.ympev.2018.07.015] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2017] [Revised: 06/26/2018] [Accepted: 07/16/2018] [Indexed: 12/31/2022]
|
47
|
Kang Q, Schardl CL, Moore N, Yoshida R. CURatio: Genome-wide phylogenomic analysis method using ratios of total branch lengths. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 17:10.1109/TCBB.2018.2878564. [PMID: 30387738 PMCID: PMC7372714 DOI: 10.1109/tcbb.2018.2878564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Evolutionary hypotheses provide important underpinnings of biological and medical sciences, and a comprehensive, genome-wide understanding of evolutionary relationships among organisms is needed to test and refine such hypotheses. Theory and empirical evidence clearly indicate that phylogenies (trees) of different genes (loci) should not display precisely matching topologies. The main reason for such phylogenetic incongruence is reticulated evolutionary history of most species due to meiotic sexual recombination in eukaryotes, or horizontal transfers of genetic material in prokaryotes. Nevertheless, many genes should display topologically related phylogenies, and should group into one or more (for genetic hybrids) clusters in poly-dimensional "tree space". Unusual evolutionary histories or effects of selection may result in "outlier" genes with phylogenies that fall outside the main distribution(s) of trees in tree space. We present a new phylogenomic method, CURatio, which uses ratios of total branch lengths in gene trees to help identify phylogenetic outliers in a given set of ortholog groups from multiple genomes. An advantage of CURatio over other methods is that genes absent from and/or duplicated in some genomes can be included in the analysis. We conducted a simulation study under the coalescent model, and showed that, given sufficient species depth and topological difference, these ratios are significantly higher for the "outlier" gene phylogenies. Also, we applied CURatio to a set of annotated genomes of the fungal family Clavicipitaceae, and identified alkaloid biosynthesis genes as outliers, probably due to a history of duplication and loss. The source code is available at https://github.com/QiwenKang/CURatio, and the empirical data set on Clavicipitaceae and simulated data set are available at Mendeley https://data.mendeley.com/datasets/mrxts7wjrr/1.
|
48
|
Degnan JH. Modeling Hybridization Under the Network Multispecies Coalescent. Syst Biol 2018; 67:786-799. [PMID: 29846734 PMCID: PMC6101600 DOI: 10.1093/sysbio/syy040] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2017] [Revised: 05/13/2018] [Accepted: 05/16/2018] [Indexed: 11/13/2022] Open
Abstract
Simultaneously modeling hybridization and the multispecies coalescent is becoming increasingly common, and inference of species networks in this context is now implemented in several software packages. This article addresses some of the conceptual issues and decisions to be made in this modeling, including whether or not to use branch lengths and issues with model identifiability. This article is based on a talk given at a Spotlight Session at the Evolution 2017 meeting in Portland, Oregon. This session included several talks about modeling hybridization and gene flow in the presence of incomplete lineage sorting. Other talks given at this meeting are also included in this special issue of Systematic Biology.
Affiliation(s)
- James H Degnan
- Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM 87131, USA
|
49
|
Adams NE, Inoue K, Seidel RA, Lang BK, Berg DJ. Isolation drives increased diversification rates in freshwater amphipods. Mol Phylogenet Evol 2018; 127:746-757. [PMID: 29908996 DOI: 10.1016/j.ympev.2018.06.022] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2017] [Revised: 02/21/2018] [Accepted: 06/11/2018] [Indexed: 01/06/2023]
Abstract
Vicariance and dispersal events affect current biodiversity patterns in desert springs. Whether major diversification events are due to environmental changes leading to radiation or due to isolation resulting in relict species is largely unknown. We seek to understand whether the Gammarus pecos species complex underwent major diversification events due to environmental changes in the area leading either to radiation into novel habitats, or formation of relicts due to isolation. Specifically, we tested the hypothesis that Gammarus in the northern Chihuahuan Desert of New Mexico and Texas, USA are descendants of an ancient marine lineage now containing multiple undescribed species. We sequenced a nuclear (28S) and two mitochondrial (16S, COI) genes from gammarid amphipods representing 16 desert springs in the northern Chihuahuan Desert. We estimated phylogenetic relationships, divergence times, and diversification rates of the Gammarus pecos complex. Our results revealed that the region contained two evolutionarily independent lineages: a younger Freshwater Lineage that shared a most-recent-common-ancestor with an older Saline Lineage ∼66.3 MYA (95.6-42.4 MYA). Each spring system generally formed a monophyletic clade based on the concatenated dataset. Freshwater Lineage diversification rates were 2.0-9.8 times higher than rates of the Saline Lineage. A series of post-Cretaceous colonizations by ancestral Gammarus taxa was likely followed by isolation. Paleo-geological, hydrological, and climatic events in the Neogene-to-Quaternary periods (23.03 MYA - present) in western North America promoted allopatric speciation of both lineages. We suggest that Saline Lineage populations include two undescribed Gammarus species, while the Freshwater Lineage shows repetition of fine-scale genetic structure in all major clades suggesting incipient speciation. 
Such ongoing speciation suggests that this region will continue to be a biodiversity hotspot for amphipods and other freshwater taxa.
Affiliation(s)
- Nicole E Adams
- Department of Biology, Miami University, Oxford, OH 45056, United States
- Kentaro Inoue
- Department of Biology, Miami University, Oxford, OH 45056, United States
- Richard A Seidel
- Department of Biology, Miami University, Oxford, OH 45056, United States
- Brian K Lang
- New Mexico Department of Game and Fish, Santa Fe, NM 87507, United States
- David J Berg
- Department of Biology, Miami University, Hamilton, OH 45011, United States
|
50
|
Almendra AL, González-Cózatl FX, Engstrom MD, Rogers DS. Evolutionary relationships and climatic niche evolution in the genus Handleyomys (Sigmodontinae: Oryzomyini). Mol Phylogenet Evol 2018; 128:12-25. [PMID: 29906608 DOI: 10.1016/j.ympev.2018.06.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 06/14/2017] [Revised: 06/04/2018] [Accepted: 06/11/2018] [Indexed: 11/17/2022]
Abstract
Mesoamerica is considered a biodiversity hot spot with levels of endemism and species diversity likely underestimated. Unfortunately, the region continues to experience some of the highest deforestation rates in the world. For mammals, the evolutionary relationships of many endemic taxa are controversial, as is the case for members of the genus Handleyomys. Estimation of a time-calibrated hypothesis for the evolution of these six genera (Euryoryzomys, Handleyomys, Hylaeamys, Nephelomys, Oecomys and Transandinomys) supported a monophyletic Handleyomys sensu lato. Based on their distinctive morphology and the amount of inter-generic genetic divergence, Handleyomys sensu stricto, H. alfaroi, the H. chapmani, and the H. melanotis species groups warrant recognition as separate genera. In addition, species delimitation documents the existence of cryptic species-level lineages within H. alfaroi and H. rostratus. Cryptic lineages within H. rostratus exhibited significant niche differentiation, but this was not the pattern among species-level clades within H. alfaroi. Similarly, age-range correlations revealed that niche evolution within Handleyomys is not correlated with evolutionary time; instead, ancestral climate tolerance reconstructions show niche disparities at specific diversification events within the chapmani and melanotis species groups, while the climatic niche of the remaining species of Handleyomys tended to be conservative.
Affiliation(s)
- Ana Laura Almendra
- Centro de Investigación en Biodiversidad y Conservación, Universidad Autónoma del Estado de Morelos, Avenida Universidad 1001, Chamilpa, Cuernavaca, Morelos C.P. 62209, Mexico
- Francisco X González-Cózatl
- Centro de Investigación en Biodiversidad y Conservación, Universidad Autónoma del Estado de Morelos, Avenida Universidad 1001, Chamilpa, Cuernavaca, Morelos C.P. 62209, Mexico
- Mark D Engstrom
- Centre for Biodiversity and Conservation Biology, Royal Ontario Museum, 100 Queen's Park, Toronto, Ontario M5S 2C6, Canada
- Duke S Rogers
- Department of Biology and M. L. Bean Life Science Museum, Brigham Young University, Provo, UT 84602, USA
|