1. Tabatabaee Y, Roch S, Warnow T. QR-STAR: A Polynomial-Time Statistically Consistent Method for Rooting Species Trees Under the Coalescent. J Comput Biol 2023; 30:1146-1181. [PMID: 37902986 DOI: 10.1089/cmb.2023.0185] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2023]
Abstract
We address the problem of rooting an unrooted species tree given a set of unrooted gene trees, under the assumption that gene trees evolve within the model species tree under the multispecies coalescent (MSC) model. Quintet Rooting (QR) is a polynomial-time algorithm that was recently proposed for this problem, which is based on the theory developed by Allman, Degnan, and Rhodes that proves the identifiability of rooted 5-taxon trees from unrooted gene trees under the MSC. However, although QR had good accuracy in simulations, its statistical consistency was left as an open problem. We present QR-STAR, a variant of QR with an additional step and a different cost function, and prove that it is statistically consistent under the MSC. Moreover, we derive sample complexity bounds for QR-STAR and show that a particular variant of it based on "short quintets" has polynomial sample complexity. Finally, our simulation study under a variety of model conditions shows that QR-STAR matches or improves on the accuracy of QR. QR-STAR is available in open-source form on GitHub.
Affiliation(s)
- Yasamin Tabatabaee
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
- Sebastien Roch
  - Department of Mathematics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Tandy Warnow
  - Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA
2. Deng Z, Botas J, Cantalapiedra CP, Hernández-Plaza A, Burguet-Castell J, Huerta-Cepas J. PhyloCloud: an online platform for making sense of phylogenomic data. Nucleic Acids Res 2022; 50:W577-W582. [PMID: 35544233 PMCID: PMC9252743 DOI: 10.1093/nar/gkac324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Revised: 04/18/2022] [Accepted: 05/03/2022] [Indexed: 11/14/2022]
Abstract
Phylogenomics data have grown exponentially over the last decades. It is currently common for genome-wide projects to generate hundreds or even thousands of phylogenetic trees and multiple sequence alignments, which may also be very large in size. However, the analysis and interpretation of such data still depends on custom bioinformatic and visualisation workflows that are largely unattainable for non-expert users. Here, we present PhyloCloud, an online platform aimed at hosting, indexing and exploring large phylogenetic tree collections, providing also seamless access to common analyses and operations, such as node annotation, searching, topology editing, automatic tree rooting, orthology detection and more. In addition, PhyloCloud provides quick access to tools that allow users to build their own phylogenies using fast predefined workflows, graphically compare tree topologies, or query taxonomic databases such as NCBI or GTDB. Finally, PhyloCloud offers a novel tree visualisation system based on ETE Toolkit v4.0, which can be used to explore very large trees and enhance them with custom annotations and multiple sequence alignments. The platform allows for sharing tree collections and specific tree views via private links, or making them fully public, serving also as a repository of phylogenomic data. PhyloCloud is available at https://phylocloud.cgmlab.org.
Affiliation(s)
- Ziqi Deng
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) and Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), 28223 Madrid, Spain
- Jorge Botas
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) and Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), 28223 Madrid, Spain
- Carlos P Cantalapiedra
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) and Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), 28223 Madrid, Spain
- Ana Hernández-Plaza
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) and Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), 28223 Madrid, Spain
- Jordi Burguet-Castell
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) and Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), 28223 Madrid, Spain
- Jaime Huerta-Cepas
  - Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) and Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA-CSIC), 28223 Madrid, Spain
3. Superson A, Battistuzzi F. Exclusion of fast evolving genes or fast evolving sites produces different archaean phylogenies. Mol Phylogenet Evol 2022; 170:107438. [DOI: 10.1016/j.ympev.2022.107438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2019] [Revised: 01/07/2022] [Accepted: 02/03/2022] [Indexed: 11/26/2022]
4. McLean BS, Bell KC, Cook JA. SNP-based Phylogenomic Inference in Holarctic Ground Squirrels (Urocitellus). Mol Phylogenet Evol 2022; 169:107396. [PMID: 35031463 DOI: 10.1016/j.ympev.2022.107396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2021] [Revised: 12/02/2021] [Accepted: 12/08/2021] [Indexed: 11/24/2022]
Abstract
Resolution of rapid evolutionary radiations requires harvesting maximal signal from phylogenomic datasets. However, studies of non-model clades often target conserved loci that are characterized by reduced information content, which can negatively affect gene tree precision and species tree accuracy. Single nucleotide polymorphism (SNP)-based methods are an underutilized but potentially valuable tool for estimating phylogeny and divergence times because they do not rely on resolved gene trees, allowing information from many or all variant loci to be leveraged in species tree reconstruction. We evaluated the utility of SNP-based methods in resolving phylogeny of Holarctic ground squirrels (Urocitellus), a radiation that has been difficult to disentangle, even in prior phylogenomic studies. We inferred phylogeny from a dataset of >3,000 ultraconserved element loci (UCEs) using two methods (SNAPP, SVDquartets) and compared our results with a new mitogenome phylogeny. We also systematically evaluated how phasing of UCEs improves per-locus information content, and inference of topology and other parameters within each of these SNP-based methods. Phasing improved topological resolution and branch length estimation at shallow levels (within species complexes), but less so at deeper levels, likely reflecting true uncertainty due to ancestral polymorphisms segregating in these rapidly diverging lineages. We resolved several key clades in Urocitellus and present targeted opportunities for future phylogenomic inquiry. Our results extend the roadmap for use of SNPs to address vertebrate radiations and support comparative analyses at multiple temporal scales.
Affiliation(s)
- Bryan S McLean
  - University of North Carolina Greensboro, Department of Biology, Greensboro, NC 27402 USA.
- Kayce C Bell
  - Natural History Museum of Los Angeles County, Department of Mammalogy, Los Angeles, CA 90007 USA.
- Joseph A Cook
  - University of New Mexico, Department of Biology and Museum of Southwestern Biology, Albuquerque, NM 87131 USA.
5. Derilus D, Rahman MZ, Serrano AE, Massey SE. Proteome size reduction in Apicomplexans is linked with loss of DNA repair and host redundant pathways. Infect Genet Evol 2020; 87:104642. [PMID: 33296723 DOI: 10.1016/j.meegid.2020.104642] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/09/2020] [Revised: 11/07/2020] [Accepted: 11/23/2020] [Indexed: 11/29/2022]
Abstract
Apicomplexans are alveolate parasites which include Plasmodium falciparum, the main cause of malaria, one of the world's biggest killers from infectious disease. Apicomplexans are characterized by a reduction in proteome size, which appears to result from metabolic and functional simplification, commensurate with their parasitic lifestyle. However, other factors may also help to explain gene loss such as population bottlenecks experienced during transmission, and the effect of reducing the overall genomic information content. The latter constitutes an 'informational constraint', which is proposed to exert a selective pressure to evolve and maintain genes involved in informational fidelity and error correction, proportional to the quantity of information in the genome (which approximates to proteome size). The dynamics of gene loss was examined in 41 Apicomplexan genomes using orthogroup analysis. We show that loss of genes involved in amino acid metabolism and steroid biosynthesis can be explained by metabolic redundancy with the host. We also show that there is a marked tendency to lose DNA repair genes as proteome size is reduced. This may be explained by a reduction in size of the informational constraint and can help to explain elevated mutation rates in pathogens with reduced genome size. Multiple Sequentially Markovian Coalescent (MSMC) analysis indicates a recent bottleneck, consistent with predictions generated using allele-based population genetics approaches, implying that relaxed selection pressure due to reduced population size might have contributed to gene loss. However, the non-randomness of pathways that are lost challenges this scenario. Lastly, we identify unique orthogroups in malaria-causing Plasmodium species that infect humans, with a high proportion of membrane associated proteins. Thus, orthogroup analysis appears useful for identifying novel candidate pathogenic factors in parasites, when there is a wide sample of genomes available.
Affiliation(s)
- D Derilus
  - Environmental Sciences Department, University of Puerto Rico-Rio Piedras, United States of America
- M Z Rahman
  - Biology Department, University of Puerto Rico-Rio Piedras, United States of America
- A E Serrano
  - Department of Microbiology, University of Puerto Rico-School of Medicine, Medical Sciences, United States of America
- S E Massey
  - Biology Department, University of Puerto Rico-Rio Piedras, United States of America.
6. Zhou M, Yang G, Sun G, Guo Z, Gong X, Pan Y. Resolving complicated relationships of the Panax bipinnatifidus complex in southwestern China by RAD-seq data. Mol Phylogenet Evol 2020; 149:106851. [PMID: 32438045 DOI: 10.1016/j.ympev.2020.106851] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/03/2019] [Revised: 04/30/2020] [Accepted: 05/04/2020] [Indexed: 11/26/2022]
Abstract
The P. bipinnatifidus complex included most of the Panax species distributed in Sino-Himalaya regions except for P. pseudoginseng, P. stipuleanatus and P. notoginseng. However, the delimitation and identification of these taxa within the species complex are very difficult due to the existence of morphological intermediates, and their evolutionary relationships remain unresolved although several studies have been carried out based on traditional DNA markers. The taxonomic uncertainty hinders the identification, conservation and exploration of these wild populations of Panax. To study this species complex, we employed ddRAD-seq data of these taxa from 18 different localities of southwestern China, using two RAD analysis pipelines, STACKS and pyRAD. Based on the results of phylogenetic analysis, the species complex was divided into four clades with high support, which largely agreed with morphologically described species. Two clades, corresponding to P. vietnamensis and P. zingiberensis, respectively, were sister groups, indicating that these two species had a closer genetic relationship; the third clade consisted of samples with bamboo-like rhizomes and was named the P. wangianus clade, and the fourth one, with moniliform rhizomes, was named the P. bipinnatifidus clade. The population genetic structure analysis and D-statistics test showed localized admixture among these species, which indicated that introgression had occurred among the related lineages continuously distributed in southeastern Yunnan and adjacent regions.
Affiliation(s)
- Mingmei Zhou
  - Department of Economic Plants and Biotechnology, Yunnan Key Laboratory for Wild Plant Resources, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Guoqian Yang
  - School of Agriculture and Biology, Shanghai Jiao Tong University, Shanghai 200240, China
- Guiling Sun
  - Institute of Plant Stress Biology, State Key Laboratory of Cotton Biology, Henan University, Henan 475001, China
- Zhenhua Guo
  - Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China
- Xun Gong
  - Department of Economic Plants and Biotechnology, Yunnan Key Laboratory for Wild Plant Resources, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China.
- Yuezhi Pan
  - Department of Economic Plants and Biotechnology, Yunnan Key Laboratory for Wild Plant Resources, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China.
7. Karin BR, Gamble T, Jackman TR. Optimizing Phylogenomics with Rapidly Evolving Long Exons: Comparison with Anchored Hybrid Enrichment and Ultraconserved Elements. Mol Biol Evol 2020; 37:904-922. [PMID: 31710677 PMCID: PMC7038749 DOI: 10.1093/molbev/msz263] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/17/2022]
Abstract
Marker selection has emerged as an important component of phylogenomic study design due to rising concerns of the effects of gene tree estimation error, model misspecification, and data-type differences. Researchers must balance various trade-offs associated with locus length and evolutionary rate among other factors. The most commonly used reduced representation data sets for phylogenomics are ultraconserved elements (UCEs) and Anchored Hybrid Enrichment (AHE). Here, we introduce Rapidly Evolving Long Exon Capture (RELEC), a new set of loci that targets single exons that are both rapidly evolving (evolutionary rate faster than RAG1) and relatively long in length (>1,500 bp), while at the same time avoiding paralogy issues across amniotes. We compare the RELEC data set to UCEs and AHE in squamate reptiles by aligning and analyzing orthologous sequences from 17 squamate genomes, composed of 10 snakes and 7 lizards. The RELEC data set (179 loci) outperforms AHE and UCEs by maximizing per-locus genetic variation while maintaining presence and orthology across a range of evolutionary scales. RELEC markers show higher phylogenetic informativeness than UCE and AHE loci, and RELEC gene trees show greater similarity to the species tree than AHE or UCE gene trees. Furthermore, with fewer loci, RELEC remains computationally tractable for full Bayesian coalescent species tree analyses. We contrast RELEC to and discuss important aspects of comparable methods, and demonstrate how RELEC may be the most effective set of loci for resolving difficult nodes and rapid radiations. We provide several resources for capturing or extracting RELEC loci from other amniote groups.
Affiliation(s)
- Benjamin R Karin
  - Department of Biology, Villanova University, Villanova, PA
  - Museum of Vertebrate Zoology and Department of Integrative Biology, University of California, Berkeley, CA
- Tony Gamble
  - Department of Biological Sciences, Marquette University, Milwaukee, WI
  - Milwaukee Public Museum, Milwaukee, WI
  - Bell Museum of Natural History, University of Minnesota, St. Paul, MN
- Todd R Jackman
  - Department of Biology, Villanova University, Villanova, PA
8. Hellmuth M, Huber KT, Moulton V. Reconciling event-labeled gene trees with MUL-trees and species networks. J Math Biol 2019; 79:1885-1925. [PMID: 31410552 DOI: 10.1007/s00285-019-01414-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/21/2018] [Revised: 05/08/2019] [Indexed: 11/30/2022]
Abstract
Phylogenomics commonly aims to construct evolutionary trees from genomic sequence information. One way to approach this problem is to first estimate event-labeled gene trees (i.e., rooted trees whose non-leaf vertices are labeled by speciation or gene duplication events), and to then look for a species tree which can be reconciled with this tree through a reconciliation map between the trees. In practice, however, it can happen that there is no such map from a given event-labeled tree to any species tree. An important situation where this might arise is where the species evolution is better represented by a network instead of a tree. In this paper, we therefore consider the problem of reconciling event-labeled trees with species networks. In particular, we prove that any event-labeled gene tree can be reconciled with some network and that, under certain mild assumptions on the gene tree, the network can even be assumed to be multi-arc free. To prove this result, we show that we can always reconcile the gene tree with some multi-labeled (MUL-)tree, which can then be "folded up" to produce the desired reconciliation and network. In addition, we study the interplay between reconciliation maps from event-labeled gene trees to MUL-trees and networks. Our results could be useful for understanding how genomes have evolved after undergoing complex evolutionary events such as polyploidy.
Affiliation(s)
- Marc Hellmuth
  - Institute of Mathematics and Computer Science, University of Greifswald, Greifswald, Germany
  - Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Katharina T Huber
  - School of Computing Sciences, University of East Anglia, Norwich, UK
- Vincent Moulton
  - School of Computing Sciences, University of East Anglia, Norwich, UK
9. Tagliacollo VA, Lanfear R. Estimating Improved Partitioning Schemes for Ultraconserved Elements. Mol Biol Evol 2019; 35:1798-1811. [PMID: 29659989 PMCID: PMC5995204 DOI: 10.1093/molbev/msy069] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Indexed: 12/11/2022]
Abstract
Ultraconserved elements (UCEs) are popular markers for phylogenomic studies. They are relatively simple to collect from distantly-related organisms, and contain sufficient information to infer relationships at almost all taxonomic levels. Most studies of UCEs use partitioning to account for variation in rates and patterns of molecular evolution among sites, for example by estimating an independent model of molecular evolution for each UCE. However, rates and patterns of molecular evolution vary substantially within as well as between UCEs, suggesting that there may be opportunities to improve how UCEs are partitioned for phylogenetic inference. We propose and evaluate new partitioning methods for phylogenomic studies of UCEs: Sliding-Window Site Characteristics (SWSC), and UCE Site Position (UCESP). The first method uses site characteristics such as entropy, multinomial likelihood, and GC content to generate partitions that account for heterogeneity in rates and patterns of molecular evolution within each UCE. The second method groups together nucleotides that are found in similar physical locations within the UCEs. We examined the new methods with seven published data sets from a variety of taxa. We demonstrate that the UCESP method generates partitions that are worse than other strategies used to partition UCE data sets (e.g., one partition per UCE). The SWSC method, particularly when based on site entropies, generates partitions that account for within-UCE heterogeneity and leads to large increases in the model fit. All of the methods, code, and data used in this study are available from https://github.com/Tagliacollo/PartitionUCE. Simplified code for implementing the best method, the SWSC-EN, is available from https://github.com/Tagliacollo/PFinderUCE-SWSC-EN.
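As a rough illustration of the "site entropy" characteristic named in this abstract, the sketch below computes the Shannon entropy of each alignment column. It is not taken from the authors' repositories: the function name `site_entropies`, the toy alignment, and the choice to ignore gaps and ambiguity codes are all assumptions made for illustration, and the sliding-window partitioning step itself is not shown.

```python
import math
from collections import Counter


def site_entropies(alignment):
    """Return the Shannon entropy (in bits) of each column of an alignment.

    `alignment` is a list of equal-length nucleotide strings. Only A, C, G
    and T are counted; gaps and ambiguity codes are ignored (an assumption
    of this sketch, not necessarily what the published method does).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")

    entropies = []
    for col in range(length):
        # Count the unambiguous bases observed at this site.
        counts = Counter(
            seq[col].upper() for seq in alignment if seq[col].upper() in "ACGT"
        )
        total = sum(counts.values())
        if total == 0:
            # Column with no usable characters: treat as zero entropy.
            entropies.append(0.0)
            continue
        probs = [n / total for n in counts.values()]
        entropies.append(sum(-p * math.log2(p) for p in probs))
    return entropies


# Hypothetical toy alignment: an invariant site scores 0 bits,
# and a site split evenly between two bases scores 1 bit.
toy = ["ACGT", "ACGA", "ACTA", "AGTA"]
print(site_entropies(toy))
```

Low-entropy columns correspond to conserved sites and high-entropy columns to variable ones, which is the kind of within-UCE heterogeneity the abstract says entropy-based partitions are meant to capture.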
Affiliation(s)
- Victor A Tagliacollo
  - Programa de Pós-graduação Ciências do Ambiente (CIAMB), Universidade Federal do Tocantins, Palmas, Tocantins, Brazil
  - Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australia
- Robert Lanfear
  - Ecology and Evolution, Research School of Biology, Australian National University, Canberra, Australia
10. Zhang LN, Ma PF, Zhang YX, Zeng CX, Zhao L, Li DZ. Using nuclear loci and allelic variation to disentangle the phylogeny of Phyllostachys (Poaceae, Bambusoideae). Mol Phylogenet Evol 2019; 137:222-235. [PMID: 31112779 DOI: 10.1016/j.ympev.2019.05.011] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/07/2018] [Revised: 05/16/2019] [Accepted: 05/17/2019] [Indexed: 11/18/2022]
Abstract
With the development of sequencing technologies, the use of multiple nuclear genes has become conventional for resolving difficult phylogenies. However, this technique also presents challenges due to gene-tree discordance, as a result of incomplete lineage sorting (ILS) and reticulate evolution. Although alleles can show sequence variation within individuals, which contain information regarding the evolution of organisms, they continue to be ignored in almost all phylogenetic analyses using randomly phased genome sequences. Here, we tried to incorporate alleles from multiple nuclear loci to study the phylogeny of the economically important bamboo genus Phyllostachys (Poaceae, Bambusoideae). Obtaining a total of 3926 sequences, we documented extensive allelic variation for 61 genes from 39 sampled species. Using datasets consisting of selected alleles, we demonstrated substantial discordance among phylogenetic relationships inferred from different alleles, as well as between concatenation and coalescent methods. Furthermore, ILS and hybridization were suggested to be underlying causes of the discordant phylogenetic signals. Taking these possible causes for conflicting phylogenetic results into consideration, we recovered the monophyly of Phyllostachys and its two morphology-defined sections. Our study also suggests that alleles deserve more attention in phylogenetic studies, since ignoring them can yield highly supported but spurious phylogenies. Meanwhile, alleles are helpful for unraveling complex evolutionary processes, particularly hybridization.
Affiliation(s)
- Li-Na Zhang
  - Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China; College of Life Science, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
- Peng-Fei Ma
  - Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China
- Yu-Xiao Zhang
  - Yunnan Academy of Biodiversity, Southwest Forestry University, Kunming, Yunnan 650224, China
- Chun-Xia Zeng
  - Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China
- Lei Zhao
  - Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China
- De-Zhu Li
  - Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, Yunnan 650201, China.
11. Roch S, Nute M, Warnow T. Long-Branch Attraction in Species Tree Estimation: Inconsistency of Partitioned Likelihood and Topology-Based Summary Methods. Syst Biol 2019; 68:281-297. [PMID: 30247732 DOI: 10.1093/sysbio/syy061] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Received: 03/16/2018] [Accepted: 09/12/2018] [Indexed: 11/13/2022]
Abstract
With advances in sequencing technologies, there are now massive amounts of genomic data from across all life, leading to the possibility that a robust Tree of Life can be constructed. However, "gene tree heterogeneity", which is when different genomic regions can evolve differently, is a common phenomenon in multi-locus data sets, and reduces the accuracy of standard methods for species tree estimation that do not take this heterogeneity into account. New methods have been developed for species tree estimation that specifically address gene tree heterogeneity, and that have been proven to converge to the true species tree when the number of loci and number of sites per locus both increase (i.e., the methods are said to be "statistically consistent"). Yet, little is known about the biologically realistic condition where the number of sites per locus is bounded. We show that when the sequence length of each locus is bounded (by any arbitrarily chosen value), the most common approaches to species tree estimation that take heterogeneity into account (i.e., traditional fully partitioned concatenated maximum likelihood and newer approaches, called summary methods, that estimate the species tree by combining estimated gene trees) are not statistically consistent, even when the heterogeneity is extremely constrained. The main challenge is the presence of conditions such as long branch attraction that create biased tree estimation when the number of sites is restricted. Hence, our study uncovers a fundamental challenge to species tree estimation using both traditional and new methods.
Affiliation(s)
- Sebastien Roch
  - Department of Mathematics, University of Wisconsin-Madison, 480 Lincoln Dr, Madison, WI 53706, USA
- Michael Nute
  - Department of Statistics, The University of Illinois at Urbana-Champaign, 725 S Wright St #101, Champaign, IL 61820, USA
- Tandy Warnow
  - Department of Computer Science, The University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue, Urbana, IL 61801-2302, USA
12. Recknagel H, Kamenos NA, Elmer KR. Common lizards break Dollo’s law of irreversibility: Genome-wide phylogenomics support a single origin of viviparity and re-evolution of oviparity. Mol Phylogenet Evol 2018; 127:579-588. [DOI: 10.1016/j.ympev.2018.05.029] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 11/25/2017] [Revised: 04/12/2018] [Accepted: 05/22/2018] [Indexed: 01/03/2023]
13. Eberle J, Dimitrov D, Valdez-Mondragón A, Huber BA. Microhabitat change drives diversification in pholcid spiders. BMC Evol Biol 2018; 18:141. [PMID: 30231864 PMCID: PMC6145181 DOI: 10.1186/s12862-018-1244-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 01/02/2018] [Accepted: 08/16/2018] [Indexed: 02/04/2023]
Abstract
BACKGROUND: Microhabitat changes are thought to be among the main drivers of diversification. However, this conclusion is mostly based on studies on vertebrates. Here, we investigate the influence of microhabitat on diversification rates in pholcid spiders (Araneae, Pholcidae). Diversification analyses were conducted in the framework of the largest molecular phylogeny of pholcid spiders to date based on three nuclear and three mitochondrial loci from 600 species representing more than 85% of the currently described pholcid genera. RESULTS: Assessments of ancestral microhabitat revealed frequent evolutionary change. In particular, within the largest subfamily Pholcinae, numerous changes from near-ground habitats towards leaves and back were found. In general, taxa occupying leaves and large sheltered spaces had higher diversification rates than ground-dwelling taxa. Shifts in speciation rate were found in leaf- and space-dwelling taxa. CONCLUSIONS: Our analyses result in one of the most comprehensive phylogenies available for a major spider family and provide a framework for any subsequent studies of pholcid spider biology. Diversification analyses strongly suggest that microhabitat is an important factor influencing diversification patterns in pholcid spiders.
Affiliation(s)
- Jonas Eberle
  - Alexander Koenig Research Museum of Zoology, Adenauerallee 160, 53113 Bonn, Germany
- Dimitar Dimitrov
  - Center for Macroecology, Evolution and Climate, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark
  - Natural History Museum, University of Oslo, PO Box 1172 Blindern, NO-0318 Oslo, Norway
  - Department of Natural History, University Museum of Bergen, University of Bergen, PO Box 7800, NO-5020 Bergen, Norway
- Alejandro Valdez-Mondragón
  - Alexander Koenig Research Museum of Zoology, Adenauerallee 160, 53113 Bonn, Germany
  - Instituto de Biologia UNAM, sede Tlaxcala. Contiguo FES-Zaragoza Campus III, Ex Fábrica San Manuel de Morcom s/n, San Miguel Contla, Municipio de Santa Cruz Tlaxcala, C.P. 90640 Tlaxcala, Mexico
- Bernhard A. Huber
  - Alexander Koenig Research Museum of Zoology, Adenauerallee 160, 53113 Bonn, Germany
14. McLean BS, Bell KC, Allen JM, Helgen KM, Cook JA. Impacts of Inference Method and Data set Filtering on Phylogenomic Resolution in a Rapid Radiation of Ground Squirrels (Xerinae: Marmotini). Syst Biol 2018; 68:298-316. [DOI: 10.1093/sysbio/syy064] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 02/17/2017] [Accepted: 09/12/2018] [Indexed: 12/20/2022]
Affiliation(s)
- Bryan S Mclean
- Department of Biology and Museum of Southwestern Biology, 1 University of New Mexico, MSC03-2020, Albuquerque, NM 87131, USA
- Florida Museum of Natural History, University of Florida, 1659 Museum Road, Gainesville, FL 32611, USA
| | - Kayce C Bell
- Department of Biology and Museum of Southwestern Biology, 1 University of New Mexico, MSC03-2020, Albuquerque, NM 87131, USA
- Department of Invertebrate Zoology, Smithsonian Institution National Museum of Natural History, P.O. Box 37012, MRC 163, Washington, DC 20013-7012, USA
| | - Julie M Allen
- Department of Biology, University of Nevada, 1664 N. Virginia Street, Reno, NV 89557, USA
| | - Kristofer M Helgen
- Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of Adelaide, North Terrace, Adelaide SA 5005, Australia
| | - Joseph A Cook
- Department of Biology and Museum of Southwestern Biology, 1 University of New Mexico, MSC03-2020, Albuquerque, NM 87131, USA
| |
|
15
|
Pratas D, Hosseini M, Grilo G, Pinho AJ, Silva RM, Caetano T, Carneiro J, Pereira F. Metagenomic Composition Analysis of an Ancient Sequenced Polar Bear Jawbone from Svalbard. Genes (Basel) 2018; 9:E445. [PMID: 30200636 PMCID: PMC6162538 DOI: 10.3390/genes9090445] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/29/2018] [Revised: 09/03/2018] [Accepted: 09/03/2018] [Indexed: 12/17/2022] Open
Abstract
The sequencing of ancient DNA samples provides a novel way to find, characterize, and distinguish exogenous genomes of endogenous targets. After sequencing, computational composition analysis enables filtering of undesired sources in the focal organism, with the purpose of improving the quality of assemblies and subsequent data analysis. More importantly, such analysis allows extinct and extant species to be identified without requiring a specific or new sequencing run. However, the identification of exogenous organisms is a complex task, given the nature and degradation of the samples, and the evident necessity of using efficient computational tools, which rely on algorithms that are both fast and highly sensitive. In this work, we relied on a fast and highly sensitive tool, FALCON-meta, which measures similarity against whole-genome reference databases, to analyse the metagenomic composition of an ancient polar bear (Ursus maritimus) jawbone fossil. The fossil was collected in Svalbard, Norway, and has an estimated age of 110,000 to 130,000 years. The FASTQ samples contained 349 GB of nonamplified shotgun sequencing data. We identified and localized, relative to the FASTQ samples, the genomes with significant similarities to reference microbial genomes, including those of viruses, bacteria, and archaea, and to fungal, mitochondrial, and plastidial sequences. Among other striking features, we found significant similarities between modern-human, some bacterial and viral sequences (contamination) and the organelle sequences of wild carrot and tomato relative to the whole samples. For each exogenous candidate, we ran a damage pattern analysis, which in addition to revealing shallow levels of damage in the plant candidates, identified the source as contamination.
Affiliation(s)
- Diogo Pratas
- Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal.
| | - Morteza Hosseini
- Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal.
- Department of Electronics, Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal.
| | - Gonçalo Grilo
- Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal.
- Department of Electronics, Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal.
| | - Armando J Pinho
- Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal.
- Department of Electronics, Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal.
| | - Raquel M Silva
- Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal.
- Department of Medical Sciences, University of Aveiro, 3810-193 Aveiro, Portugal.
- Institute for Biomedicine, University of Aveiro, 3810-193 Aveiro, Portugal.
| | - Tânia Caetano
- Department of Biology, University of Aveiro, 3810-193 Aveiro, Portugal.
- Centre for Environmental and Marine Studies, University of Aveiro, 3810-193 Aveiro, Portugal.
| | - João Carneiro
- Interdisciplinary Centre of Marine and Environmental Research, University of Porto, 4450-208 Matosinhos, Portugal.
| | - Filipe Pereira
- Interdisciplinary Centre of Marine and Environmental Research, University of Porto, 4450-208 Matosinhos, Portugal.
| |
|
16
|
Bangs MR, Douglas MR, Mussmann SM, Douglas ME. Unraveling historical introgression and resolving phylogenetic discord within Catostomus (Osteichthys: Catostomidae). BMC Evol Biol 2018; 18:86. [PMID: 29879898 PMCID: PMC5992631 DOI: 10.1186/s12862-018-1197-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 03/12/2018] [Accepted: 05/18/2018] [Indexed: 01/25/2023] Open
Abstract
BACKGROUND Porous species boundaries can be a source of conflicting hypotheses, particularly when coupled with variable data and/or methodological approaches. Their impacts can often be magnified when non-model organisms with complex histories of reticulation are investigated. One such example is the genus Catostomus (Osteichthys, Catostomidae), a freshwater fish clade with conflicting morphological and mitochondrial phylogenies. The former is hypothesized as reflecting the presence of admixed genotypes within morphologically distinct lineages, whereas the latter is interpreted as the presence of distinct morphologies that emerged multiple times through convergent evolution. We tested these hypotheses using multiple methods, including multispecies coalescent and concatenated approaches. Patterson's D-statistic was applied to resolve potential discord, examine introgression, and test the putative hybrid origin of two species. We also applied naïve binning to explore potential effects of concatenation. RESULTS We employed 14,007 loci generated from ddRAD sequencing of 184 individuals to derive the first highly supported nuclear phylogeny for Catostomus. Our phylogenomic analyses largely agreed with a morphological interpretation, with the exception of the placement of Xyrauchen texanus, which differs from both morphological and mitochondrial phylogenies. Additionally, our evaluation of the putative hybrid species C. columbianus revealed a lack of introgression and instead matched the mitochondrial phylogeny. Furthermore, D-statistic tests clarified all discrepancies based solely on mitochondrial data, with agreement among topologies derived from concatenation and multispecies coalescent approaches. Extensive historic introgression was detected across six species-pairs. Potential endemism in the Virgin and Little Colorado Rivers was also apparent, and the former genus Pantosteus was derived as monophyletic, save for C. columbianus.
CONCLUSIONS Complex reticulated histories detected herein support the hypothesis that introgression was responsible for conflicts that occurred within the mitochondrial phylogeny, and explains discrepancies found between it and previous morphological phylogenies. Additionally, the hybrid origin of C. columbianus was refuted, but with the caveat that more fine-grain sampling is still needed. Our diverse phylogenomic approaches provided largely concordant results, with naïve binning useful in exploring the single conflict. Considerable diversity was found within Catostomus across southwestern North America, with two drainages [Virgin River (UT) and Little Colorado River (AZ)] reflecting unique composition.
Affiliation(s)
- Max R Bangs
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, 72701, USA; School of Fisheries, Aquaculture and Aquatic Sciences, Auburn University, Auburn, AL, 36849, USA.
| | - Marlis R Douglas
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, 72701, USA
| | - Steven M Mussmann
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, 72701, USA
| | - Michael E Douglas
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR, 72701, USA
| |
|
17
|
Menardo F, Wicker T, Keller B. Reconstructing the Evolutionary History of Powdery Mildew Lineages (Blumeria graminis) at Different Evolutionary Time Scales with NGS Data. Genome Biol Evol 2018; 9:446-456. [PMID: 28164219 PMCID: PMC5381671 DOI: 10.1093/gbe/evx008] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Accepted: 01/30/2017] [Indexed: 01/25/2023] Open
Abstract
Blumeria graminis (Ascomycota) includes fungal pathogens that infect numerous grasses and cereals. Despite its economic impact on agriculture and its scientific importance in plant–pathogen interaction studies, the evolution of different lineages with different host ranges is poorly understood. Moreover, the taxonomy of grass powdery mildew is rather exceptional: there is only one described species (B. graminis) subdivided in different formae speciales (ff.spp.), which are defined by their host range. In this study we applied phylogenomic and population genomic methods to whole genome sequence data of 31 isolates of B. graminis belonging to different ff.spp. and reconstructed the evolutionary relationships between different lineages. The results of the phylogenomic analysis support a pattern of co-evolution between some of the ff.spp. and their host plant. In addition, we identified exceptions to this pattern, namely host jump events and the recent radiation of a clade less than 280,000 years ago. Furthermore, we found a high level of gene tree incongruence localized in the youngest clade. To distinguish between incomplete lineage sorting and lateral gene flow, we applied a coalescent-based method of demographic inference and found evidence of horizontal gene flow between recently diverged lineages. Overall we found that different processes shaped the diversification of B. graminis: co-evolution with the host species, host jump and fast radiation. Our study is an example of how genomic data can resolve complex evolutionary histories of cryptic lineages at different time scales, dealing with incomplete lineage sorting and lateral gene flow.
|
18
|
Christensen S, Molloy EK, Vachaspati P, Warnow T. OCTAL: Optimal Completion of gene trees in polynomial time. Algorithms Mol Biol 2018; 13:6. [PMID: 29568323 PMCID: PMC5853121 DOI: 10.1186/s13015-018-0124-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 10/25/2017] [Accepted: 03/06/2018] [Indexed: 12/16/2022] Open
Abstract
Background For a combination of reasons (including data generation protocols, approaches to taxon and gene sampling, and gene birth and loss), estimated gene trees are often incomplete, meaning that they do not contain all of the species of interest. As incomplete gene trees can impact downstream analyses, accurate completion of gene trees is desirable. Results We introduce the Optimal Tree Completion problem, a general optimization problem that involves completing an unrooted binary tree (i.e., adding missing leaves) so as to minimize its distance from a reference tree on a superset of the leaves. We present OCTAL, an algorithm that finds an optimal solution to this problem when the distance between trees is defined using the Robinson–Foulds (RF) distance, and we prove that OCTAL runs in $O(n^2)$ time, where n is the total number of species. We report on a simulation study in which gene trees can differ from the species tree due to incomplete lineage sorting, and estimated gene trees are completed using OCTAL with a reference tree based on a species tree estimated from the multi-locus dataset. OCTAL produces completed gene trees that are closer to the true gene trees than an existing heuristic approach in ASTRAL-II, but the accuracy of a completed gene tree computed by OCTAL depends on how topologically similar the reference tree (typically an estimated species tree) is to the true gene tree. Conclusions OCTAL is a useful technique for adding missing taxa to incomplete gene trees and provides good accuracy under a wide range of model conditions. However, results show that OCTAL’s accuracy can be reduced when incomplete lineage sorting is high, as the reference tree can be far from the true gene tree. Hence, this study suggests that OCTAL would benefit from using other types of reference trees instead of species trees when there are large topological distances between true gene trees and species trees. Electronic supplementary material The online version of this article (10.1186/s13015-018-0124-5) contains supplementary material, which is available to authorized users.
|
19
|
Dornburg A, Townsend JP, Wang Z. Maximizing Power in Phylogenetics and Phylogenomics: A Perspective Illuminated by Fungal Big Data. ADVANCES IN GENETICS 2017; 100:1-47. [PMID: 29153398 DOI: 10.1016/bs.adgen.2017.09.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/14/2022]
Abstract
Since its original inception over 150 years ago by Darwin, we have made tremendous progress toward the reconstruction of the Tree of Life. In particular, the transition from analyzing datasets comprised of small numbers of loci to those comprised of hundreds of loci, if not entire genomes, has aided in resolving some of the most vexing of evolutionary problems while giving us a new perspective on biodiversity. Correspondingly, phylogenetic trees have taken a central role in fields that span ecology, conservation, and medicine. However, the rise of big data has also presented phylogenomicists with a new set of challenges to experimental design, quantitative analyses, and computation. The sequencing of a number of very first genomes presented significant challenges to phylogenetic inference, leading fungal phylogenomicists to begin addressing pitfalls and postulating solutions to the issues that arise from genome-scale analyses relevant to any lineage across the Tree of Life. Here we highlight insights from fungal phylogenomics for topics including systematics and species delimitation, ecological and phenotypic diversification, and biogeography while providing an overview of progress made on the reconstruction of the fungal Tree of Life. Finally, we provide a review of considerations to phylogenomic experimental design for robust tree inference. We hope that this special issue of Advances in Genetics not only excites the continued progress of fungal evolutionary biology but also motivates the interdisciplinary development of new theory and methods designed to maximize the power of genomic scale data in phylogenetic analyses.
Affiliation(s)
- Alex Dornburg
- North Carolina Museum of Natural Sciences, Raleigh, NC, United States
| | | | - Zheng Wang
- Yale University, New Haven, CT, United States.
| |
|
20
|
Edwards SV, Cloutier A, Baker AJ. Conserved Nonexonic Elements: A Novel Class of Marker for Phylogenomics. Syst Biol 2017; 66:1028-1044. [PMID: 28637293 PMCID: PMC5790140 DOI: 10.1093/sysbio/syx058] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 12/02/2016] [Revised: 06/03/2017] [Accepted: 06/06/2017] [Indexed: 01/12/2023] Open
Abstract
Noncoding markers have a particular appeal as tools for phylogenomic analysis because, at least in vertebrates, they appear less subject to strong variation in GC content among lineages. Thus far, ultraconserved elements (UCEs) and introns have been the most widely used noncoding markers. Here we analyze and study the evolutionary properties of a new type of noncoding marker, conserved nonexonic elements (CNEEs), which consists of noncoding elements that are estimated to evolve slower than the neutral rate across a set of species. Although they often include UCEs, CNEEs are distinct from UCEs because they are not ultraconserved, and, most importantly, the core region alone is analyzed, rather than both the core and its flanking regions. Using a data set of 16 birds plus an alligator outgroup, and ∼3600-∼3800 loci per marker type, we found that although CNEEs were less variable than bioinformatically derived UCEs or introns and in some cases exhibited a slower approach to branch resolution as determined by phylogenomic subsampling, the quality of CNEE alignments was superior to those of the other markers, with fewer gaps and missing species. Phylogenetic resolution using coalescent approaches was comparable among the three marker types, with most nodes being fully and congruently resolved. Comparison of phylogenetic results across the three marker types indicated that one branch, the sister group to the passerine + falcon clade, was resolved differently and with moderate (>70%) bootstrap support between CNEEs and UCEs or introns. Overall, CNEEs appear to be promising as phylogenomic markers, yielding phylogenetic resolution as high as for UCEs and introns but with fewer gaps, less ambiguity in alignments and with patterns of nucleotide substitution more consistent with the assumptions of commonly used methods of phylogenetic analysis.
Affiliation(s)
- Scott V. Edwards
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, 26 Oxford Street, Harvard University, Cambridge, MA 02138 USA
| | - Alison Cloutier
- Department of Organismic and Evolutionary Biology and Museum of Comparative Zoology, 26 Oxford Street, Harvard University, Cambridge, MA 02138 USA
- Department of Natural History, Royal Ontario Museum, 100 Queen’s Park, Toronto, Ontario, M5S 2C6 Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcox Street, Toronto, Ontario, M5S 3B2 Canada
| | - Allan J. Baker
- Department of Natural History, Royal Ontario Museum, 100 Queen’s Park, Toronto, Ontario, M5S 2C6 Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, 25 Willcox Street, Toronto, Ontario, M5S 3B2 Canada
| |
|
21
|
Molloy EK, Warnow T. To Include or Not to Include: The Impact of Gene Filtering on Species Tree Estimation Methods. Syst Biol 2017; 67:285-303. [DOI: 10.1093/sysbio/syx077] [Citation(s) in RCA: 138] [Impact Index Per Article: 19.7] [Received: 12/08/2016] [Accepted: 09/13/2017] [Indexed: 01/27/2023] Open
Affiliation(s)
- Erin K Molloy
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| | - Tandy Warnow
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA
| |
|
22
|
Liu Y, Li D, Zhang Q, Song C, Zhong C, Zhang X, Wang Y, Yao X, Wang Z, Zeng S, Wang Y, Guo Y, Wang S, Li X, Li L, Liu C, McCann HC, He W, Niu Y, Chen M, Du L, Gong J, Datson PM, Hilario E, Huang H. Rapid radiations of both kiwifruit hybrid lineages and their parents shed light on a two-layer mode of species diversification. THE NEW PHYTOLOGIST 2017; 215:877-890. [PMID: 28543189 DOI: 10.1111/nph.14607] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 11/30/2016] [Accepted: 04/04/2017] [Indexed: 05/20/2023]
Abstract
Reticulate speciation caused by interspecific hybridization is now recognized as an important mechanism in the creation of biological diversity. However, depicting the patterns of phylogenetic networks for lineages that have undergone interspecific gene flow is challenging. Here we sequenced 25 taxa representing natural diversity in the genus Actinidia with an average mapping depth of 26× on the reference genome to reconstruct their reticulate history. We found evidence, including significant gene tree discordance, cytonuclear conflicts, and changes in genome-wide heterozygosity across taxa, collectively supporting extensive reticulation in the genus. Furthermore, at least two separate parental species pairs were involved in the repeated origin of the hybrid lineages, in some of which a further phase of syngameon was triggered. On the basis of the elucidated hybridization relationships, we obtained a highly resolved backbone phylogeny consisting of taxa exhibiting no evidence of hybrid origin. The backbone taxa have distinct demographic histories and are the product of recent rounds of rapid radiations via sorting of ancestral variation under variable climatic and ecological conditions. Our results suggest a mode for consecutive plant diversification through two layers of radiations, consisting of the rapid evolution of backbone lineages and the formation of hybrid swarms derived from these lineages.
Affiliation(s)
- Yifei Liu
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization and Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, The Chinese Academy of Sciences, Guangzhou, Guangdong, 510650, China
| | - Dawei Li
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
| | - Qiong Zhang
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
| | - Chi Song
- Wuhan Benagen Tech Solutions Company Limited, Wuhan, Hubei, 430070, China
| | - Caihong Zhong
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
| | - Xudong Zhang
- Wuhan Benagen Tech Solutions Company Limited, Wuhan, Hubei, 430070, China
| | - Ying Wang
- Wuhan Benagen Tech Solutions Company Limited, Wuhan, Hubei, 430070, China
| | - Xiaohong Yao
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
| | - Zupeng Wang
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization and Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, The Chinese Academy of Sciences, Guangzhou, Guangdong, 510650, China
| | - Shaohua Zeng
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization and Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, The Chinese Academy of Sciences, Guangzhou, Guangdong, 510650, China
| | - Ying Wang
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization and Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, The Chinese Academy of Sciences, Guangzhou, Guangdong, 510650, China
| | - Yangtao Guo
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization and Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, The Chinese Academy of Sciences, Guangzhou, Guangdong, 510650, China
| | - Shuaibin Wang
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization and Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, The Chinese Academy of Sciences, Guangzhou, Guangdong, 510650, China
| | - Xinwei Li
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
| | - Li Li
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
| | - Chunyan Liu
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
| | - Honour C McCann
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization and Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, The Chinese Academy of Sciences, Guangzhou, Guangdong, 510650, China
- New Zealand Institute for Advanced Study, Massey University, Auckland, 0745, New Zealand
| | - Weiming He
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization and Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, The Chinese Academy of Sciences, Guangzhou, Guangdong, 510650, China
| | - Yan Niu
- Wuhan Benagen Tech Solutions Company Limited, Wuhan, Hubei, 430070, China
| | - Min Chen
- Wuhan Benagen Tech Solutions Company Limited, Wuhan, Hubei, 430070, China
| | - Liuwen Du
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
| | - Junjie Gong
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
| | - Paul M Datson
- The New Zealand Institute for Plant and Food Research Limited, Mt Albert Research Centre, Auckland, 1142, New Zealand
| | - Elena Hilario
- The New Zealand Institute for Plant and Food Research Limited, Mt Albert Research Centre, Auckland, 1142, New Zealand
| | - Hongwen Huang
- Key Laboratory of Plant Resources Conservation and Sustainable Utilization and Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, The Chinese Academy of Sciences, Guangzhou, Guangdong, 510650, China
- Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Chinese Academy of Sciences, Wuhan, Hubei, 430074, China
| |
|
23
|
Harish A, Kurland CG. Empirical genome evolution models root the tree of life. Biochimie 2017; 138:137-155. [DOI: 10.1016/j.biochi.2017.04.014] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 05/26/2016] [Accepted: 04/25/2017] [Indexed: 01/05/2023]
|
24
|
Harish A, Kurland CG. Akaryotes and Eukaryotes are independent descendants of a universal common ancestor. Biochimie 2017; 138:168-183. [PMID: 28461155 DOI: 10.1016/j.biochi.2017.04.013] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 01/03/2017] [Accepted: 04/25/2017] [Indexed: 11/29/2022]
Abstract
We reconstructed a global tree of life (ToL) with non-reversible and non-stationary models of genome evolution that root trees intrinsically. We implemented Bayesian model selection tests and compared the statistical support for four conflicting ToL hypotheses. We show that reconstructions obtained with a Bayesian implementation (Klopfstein et al., 2015) are consistent with reconstructions obtained with an empirical Sankoff parsimony (ESP) implementation (Harish et al., 2013). Both are based on the genome contents of coding sequences for protein domains (superfamilies) from hundreds of genomes. Thus, we conclude that the independent descent of Eukaryotes and Akaryotes (archaea and bacteria) from the universal common ancestor (UCA) is the most probable as well as the most parsimonious hypothesis for the evolutionary origins of extant genomes. Reconstructions of ancestral proteomes by both Bayesian and ESP methods suggest that at least 70% of unique domain-superfamilies known in extant species were present in the UCA. In addition, identification of a vast majority (96%) of the mitochondrial superfamilies in the UCA proteome precludes a symbiotic hypothesis for the origin of eukaryotes. Accordingly, neither the archaeal origin of eukaryotes nor the bacterial origin of mitochondria is supported by the data. The proteomic complexity of the UCA suggests that the evolution of cellular phenotypes in the two primordial lineages, Akaryotes and Eukaryotes, was driven largely by duplication of common superfamilies as well as by loss of unique superfamilies. Finally, innovation of novel superfamilies has played a surprisingly small role in the evolution of Akaryotes and only a marginal role in the evolution of Eukaryotes.
Affiliation(s)
- Ajith Harish
- Department of Cell and Molecular Biology, Structural and Molecular Biology Program, Uppsala University, Uppsala, Sweden.
| | - Charles G Kurland
- Department of Biology, Microbial Ecology Program, Lund University, Lund, Sweden.
| |
|
25
|
Arcila D, Ortí G, Vari R, Armbruster JW, Stiassny MLJ, Ko KD, Sabaj MH, Lundberg J, Revell LJ, Betancur-R R. Genome-wide interrogation advances resolution of recalcitrant groups in the tree of life. Nat Ecol Evol 2017; 1:20. [PMID: 28812610 DOI: 10.1038/s41559-016-0020] [Citation(s) in RCA: 150] [Impact Index Per Article: 21.4] [Received: 06/10/2016] [Accepted: 10/25/2016] [Indexed: 12/21/2022]
Abstract
Much progress has been achieved in disentangling evolutionary relationships among species in the tree of life, but some taxonomic groups remain difficult to resolve despite increasing availability of genome-scale data sets. Here we present a practical approach to studying ancient divergences in the face of high levels of conflict, based on explicit gene genealogy interrogation (GGI). We show its efficacy in resolving the controversial relationships within the largest freshwater fish radiation (Otophysi) based on newly generated DNA sequences for 1,051 loci from 225 species. Initial results using a suite of standard methodologies revealed conflicting phylogenetic signal, which supports ten alternative evolutionary histories among early otophysan lineages. By contrast, GGI revealed that the vast majority of gene genealogies supports a single tree topology grounded on morphology that was not obtained by previous molecular studies. We also reanalysed published data sets for exemplary groups with recalcitrant resolution to assess the power of this approach. GGI supports the notion that ctenophores are the earliest-branching animal lineage, and adds insight into relationships within clades of yeasts, birds and mammals. GGI opens up a promising avenue to account for incompatible signals in large data sets and to discern between estimation error and actual biological conflict explaining gene tree discordance.
Affiliation(s)
- Dahiana Arcila
- Department of Biological Sciences, The George Washington University, 2023 G Street NW, Washington DC 20052, USA; Department of Vertebrate Zoology, National Museum of Natural History Smithsonian Institution, PO Box 37012, MRC 159, Washington DC 20013, USA
| | - Guillermo Ortí
- Department of Biological Sciences, The George Washington University, 2023 G Street NW, Washington DC 20052, USA
| | - Richard Vari
- Department of Vertebrate Zoology, National Museum of Natural History Smithsonian Institution, PO Box 37012, MRC 159, Washington DC 20013, USA
| | | | - Melanie L J Stiassny
- Department of Ichthyology, Division of Vertebrate Zoology, American Museum of Natural History, New York, New York 10024, USA
| | - Kyung D Ko
- Department of Biological Sciences, The George Washington University, 2023 G Street NW, Washington DC 20052, USA
| | - Mark H Sabaj
- Department of Ichthyology, The Academy of Natural Sciences, 1900 Benjamin Franklin Parkway, Philadelphia, Pennsylvania 19103, USA
| | - John Lundberg
- Department of Ichthyology, The Academy of Natural Sciences, 1900 Benjamin Franklin Parkway, Philadelphia, Pennsylvania 19103, USA
| | - Liam J Revell
- Department of Biology, University of Massachusetts Boston, Boston, Massachusetts 02125, USA
| | - Ricardo Betancur-R
- Department of Vertebrate Zoology, National Museum of Natural History Smithsonian Institution, PO Box 37012, MRC 159, Washington DC 20013, USA.,Department of Biology, University of Puerto Rico - Río Piedras, PO Box 23360, San Juan, Puerto Rico
| |
|
26
|
Li X, Hao B, Pan D, Schneeweiss GM. Marker Development for Phylogenomics: The Case of Orobanchaceae, a Plant Family with Contrasting Nutritional Modes. Front Plant Sci 2017; 8:1973. [PMID: 29218053 PMCID: PMC5704539 DOI: 10.3389/fpls.2017.01973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2017] [Accepted: 11/01/2017] [Indexed: 05/02/2023]
Abstract
Phylogenomic approaches, employing next-generation sequencing (NGS) techniques, have revolutionized systematic and evolutionary biology. Target enrichment is an efficient and cost-effective method in phylogenomics and is becoming increasingly popular. Depending on availability and quality of reference data as well as on biological features of the study system, (semi-)automated identification of suitable markers will require specific bioinformatic pipelines. Here, we established a highly flexible bioinformatic pipeline, BaitsFinder, to identify putative orthologous single copy genes (SCGs) and to construct bait sequences in a single workflow. Additionally, this pipeline has been constructed to be able to cope with challenging data sets, such as the nutritionally heterogeneous plant family Orobanchaceae. To this end, we used transcriptome data of differing quality available for four Orobanchaceae species and, as reference, SCG data from monkeyflower (Erythranthe guttata, syn. Mimulus g.; 1,915 genes) and tomato (Solanum lycopersicum; 391 genes). Depending on whether gaps were permitted in initial BLAST searches of the four Orobanchaceae species against the reference, our pipeline identified 1,307 and 981 SCGs with average lengths of 994 bp and 775 bp, respectively. Automated bait sequence construction (using 2× tiling) resulted in 38,170 and 21,856 bait sequences, respectively. In comparison to the recently published MarkerMiner 1.0 pipeline, BaitsFinder identified about 1.6 times as many SCGs (of at least 900 bp length). Skipping steps specific to analyses of Orobanchaceae, BaitsFinder was successfully used in a group of non-parasitic plants (three Asteraceae species and, as reference, SCG data from Arabidopsis thaliana based on previously compiled SCGs). Thus, BaitsFinder is expected to be broadly applicable in groups where only transcriptomes or partial genome data of differing quality are available.
|
27
|
Zhao L, Li X, Zhang N, Zhang SD, Yi TS, Ma H, Guo ZH, Li DZ. Phylogenomic analyses of large-scale nuclear genes provide new insights into the evolutionary relationships within the rosids. Mol Phylogenet Evol 2016; 105:166-176. [DOI: 10.1016/j.ympev.2016.06.007] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 11/28/2015] [Revised: 06/06/2016] [Accepted: 06/27/2016] [Indexed: 12/28/2022]
|