1
|
McLay TGB, Fowler RM, Fahey PS, Murphy DJ, Udovicic F, Cantrill DJ, Bayly MJ. Phylogenomics reveals extreme gene tree discordance in a lineage of dominant trees: hybridization, introgression, and incomplete lineage sorting blur deep evolutionary relationships despite clear species groupings in Eucalyptus subgenus Eudesmia. Mol Phylogenet Evol 2023; 187:107869. [PMID: 37423562 DOI: 10.1016/j.ympev.2023.107869] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2023] [Revised: 06/29/2023] [Accepted: 06/30/2023] [Indexed: 07/11/2023]
Abstract
Eucalypts are a large and ecologically important group of plants on the Australian continent, and understanding their evolution is important in understanding evolution of the unique Australian flora. Previous phylogenies using plastome DNA, nuclear-ribosomal DNA, or random genome-wide SNPs, have been confounded by limited genetic sampling or by idiosyncratic biological features of the eucalypts, including widespread plastome introgression. Here we present phylogenetic analyses of Eucalyptus subgenus Eudesmia (22 species from western, northern, central and eastern Australia), in the first study to apply a target-capture sequencing approach using custom, eucalypt-specific baits (of 568 genes) to a lineage of Eucalyptus. Multiple accessions of all species were included, and target-capture data were supplemented by separate analyses of plastome genes (average of 63 genes per sample). Analyses revealed a complex evolutionary history likely shaped by incomplete lineage sorting and hybridization. Gene tree discordance generally increased with phylogenetic depth. Species, or groups of species, toward the tips of the tree are mostly supported, and three major clades are identified, but the branching order of these clades cannot be confirmed with confidence. Multiple approaches to filtering the nuclear dataset, by removing genes or samples, could not reduce gene tree conflict or resolve these relationships. Despite inherent complexities in eucalypt evolution, the custom bait kit devised for this research will be a powerful tool for investigating the evolutionary history of eucalypts more broadly.
Collapse
Affiliation(s)
- Todd G B McLay
- Royal Botanic Gardens Victoria, Melbourne 3004, Vic, Australia; School of BioSciences, The University of Melbourne, Parkville 3010, Vic, Australia.
| | - Rachael M Fowler
- School of BioSciences, The University of Melbourne, Parkville 3010, Vic, Australia
| | - Patrick S Fahey
- Research Centre for Ecosystem Resilience, The Royal Botanic Garden Sydney, Sydney 2000, NSW, Australia; Qld Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia 4072, Qld, Australia
| | - Daniel J Murphy
- Royal Botanic Gardens Victoria, Melbourne 3004, Vic, Australia; School of BioSciences, The University of Melbourne, Parkville 3010, Vic, Australia
| | - Frank Udovicic
- Royal Botanic Gardens Victoria, Melbourne 3004, Vic, Australia
| | - David J Cantrill
- Royal Botanic Gardens Victoria, Melbourne 3004, Vic, Australia; School of BioSciences, The University of Melbourne, Parkville 3010, Vic, Australia
| | - Michael J Bayly
- School of BioSciences, The University of Melbourne, Parkville 3010, Vic, Australia
| |
Collapse
|
2
|
Avni E, Snir S. A New Quartet-Based Statistical Method for Comparing Sets of Gene Trees Is Developed Using a Generalized Hoeffding Inequality. J Comput Biol 2018; 26:27-37. [PMID: 30422680 DOI: 10.1089/cmb.2018.0129] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Extracting the strength of the tree signal that is encompassed by a collection of gene trees is an exceptionally challenging problem in phylogenomics. Often, this problem not only involves the construction of individual phylogenies based on different genes, which may be a difficult endeavor on its own, but is also exacerbated by many factors that create conflicts between the evolutionary histories of different gene families, such as duplications or losses of genes; hybridization events; incomplete lineage sorting; and horizontal gene transfer, the latter two play central roles in the evolution of eukaryotes and prokaryotes, respectively. In this work, we tackle the aforementioned problem by focusing on quartet trees, which are the most basic unit of information in the context of unrooted phylogenies. In the first part, we show how a theorem of Janson that generalizes the classical Hoeffding inequality can be used to develop a statistical test involving quartets. In the second part, we study real and simulated data using this theoretical advancement, thus demonstrating how the significance of the differences between sets of quartets can be assessed. Our results are particularly intriguing since they nonstandardly require the analysis of dependent random variables.
Collapse
Affiliation(s)
- Eliran Avni
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
| | - Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, Israel
| |
Collapse
|
3
|
Fleischauer M, Böcker S. BCD Beam Search: considering suboptimal partial solutions in Bad Clade Deletion supertrees. PeerJ 2018; 6:e4987. [PMID: 29900080 PMCID: PMC5995099 DOI: 10.7717/peerj.4987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2018] [Accepted: 05/26/2018] [Indexed: 11/20/2022] Open
Abstract
Supertree methods enable the reconstruction of large phylogenies. The supertree problem can be formalized in different ways in order to cope with contradictory information in the input. Some supertree methods are based on encoding the input trees in a matrix; other methods try to find minimum cuts in some graph. Recently, we introduced Bad Clade Deletion (BCD) supertrees which combines the graph-based computation of minimum cuts with optimizing a global objective function on the matrix representation of the input trees. The BCD supertree method has guaranteed polynomial running time and is very swift in practice. The quality of reconstructed supertrees was superior to matrix representation with parsimony (MRP) and usually on par with SuperFine for simulated data; but particularly for biological data, quality of BCD supertrees could not keep up with SuperFine supertrees. Here, we present a beam search extension for the BCD algorithm that keeps alive a constant number of partial solutions in each top-down iteration phase. The guaranteed worst-case running time of the new algorithm is still polynomial in the size of the input. We present an exact and a randomized subroutine to generate suboptimal partial solutions. Both beam search approaches consistently improve supertree quality on all evaluated datasets when keeping 25 suboptimal solutions alive. Supertree quality of the BCD Beam Search algorithm is on par with MRP and SuperFine even for biological data. This is the best performance of a polynomial-time supertree algorithm reported so far.
Collapse
Affiliation(s)
| | - Sebastian Böcker
- Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany
| |
Collapse
|
4
|
Abstract
Supertree methods merge a set of overlapping phylogenetic trees into a supertree containing all taxa of the input trees. The challenge in supertree reconstruction is the way of dealing with conflicting information in the input trees. Many different algorithms for different objective functions have been suggested to resolve these conflicts. In particular, there exist methods based on encoding the source trees in a matrix, where the supertree is constructed applying a local search heuristic to optimize the respective objective function. We present a novel heuristic supertree algorithm called Bad Clade Deletion (BCD) supertrees. It uses minimum cuts to delete a locally minimal number of columns from such a matrix representation so that it is compatible. This is the complement problem to Matrix Representation with Compatibility (Maximum Split Fit). Our algorithm has guaranteed polynomial worst-case running time and performs swiftly in practice. Different from local search heuristics, it guarantees to return the directed perfect phylogeny for the input matrix, corresponding to the parent tree of the input trees, if one exists. Comparing supertrees to model trees for simulated data, BCD shows a better accuracy (F1 score) than the state-of-the-art algorithms SuperFine (up to 3%) and Matrix Representation with Parsimony (up to 7%); at the same time, BCD is up to 7 times faster than SuperFine, and up to 600 times faster than Matrix Representation with Parsimony. Finally, using the BCD supertree as a starting tree for a combined Maximum Likelihood analysis using RAxML, we reach significantly improved accuracy (1% higher F1 score) and running time (1.7-fold speedup).
Collapse
Affiliation(s)
- Markus Fleischauer
- Chair for Bioinformatics, Institute for Computer Science, Friedrich-Schiller-University Jena, Jena, Germany
| | - Sebastian Böcker
- Chair for Bioinformatics, Institute for Computer Science, Friedrich-Schiller-University Jena, Jena, Germany
| |
Collapse
|
5
|
Fleischauer M, Böcker S. Collecting reliable clades using the Greedy Strict Consensus Merger. PeerJ 2016; 4:e2172. [PMID: 27375971 PMCID: PMC4928488 DOI: 10.7717/peerj.2172] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2015] [Accepted: 06/03/2016] [Indexed: 12/02/2022] Open
Abstract
Supertree methods combine a set of phylogenetic trees into a single supertree. Similar to supermatrix methods, these methods provide a way to reconstruct larger parts of the Tree of Life, potentially evading the computational complexity of phylogenetic inference methods such as maximum likelihood. The supertree problem can be formalized in different ways, to cope with contradictory information in the input. Many supertree methods have been developed. Some of them solve NP-hard optimization problems like the well-known Matrix Representation with Parsimony, while others have polynomial worst-case running time but work in a greedy fashion (FlipCut). Both can profit from a set of clades that are already known to be part of the supertree. The Superfine approach shows how the Greedy Strict Consensus Merger (GSCM) can be used as preprocessing to find these clades. We introduce different scoring functions for the GSCM, a randomization, as well as a combination thereof to improve the GSCM to find more clades. This helps, in turn, to improve the resolution of the GSCM supertree. We find this modifications to increase the number of true positive clades by 18% compared to the currently used Overlap scoring.
Collapse
Affiliation(s)
- Markus Fleischauer
- Lehrstuhl für Bioinformatik, Friedrich-Schiller Universität , Jena , Thüringen , Germany
| | - Sebastian Böcker
- Lehrstuhl für Bioinformatik, Friedrich-Schiller Universität , Jena , Thüringen , Germany
| |
Collapse
|
6
|
Bhattacharyya S, Mukherjee J. COSPEDTree: COuplet Supertree by Equivalence Partitioning of Taxa Set and DAG Formation. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015; 12:590-603. [PMID: 26357270 DOI: 10.1109/tcbb.2014.2366778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
From a set of phylogenetic trees with overlapping taxa set, a supertree exhibits evolutionary relationships among all input taxa. The key is to resolve the contradictory relationships with respect to input trees, between individual taxa subsets. Formulation of this NP hard problem employs either local search heuristics to reduce tree search space, or resolves the conflicts with respect to fixed or varying size subtree level decompositions. Different approximation techniques produce supertrees with considerable performance variations. Moreover, the majority of the algorithms involve high computational complexity, thus not suitable for use on large biological data sets. Current study presents COSPEDTree, a novel method for supertree construction. The technique resolves source tree conflicts by analyzing couplet (taxa pair) relationships for each source trees. Subsequently, individual taxa pairs are resolved with a single relation. To prioritize the consensus relations among individual taxa pairs for resolving them, greedy scoring is employed to assign higher score values for the consensus relations among a taxa pair. Selected set of relations resolving individual taxa pairs is subsequently used to construct a directed acyclic graph (DAG). Vertices of DAG represents a taxa subset inferred from the same speciation event. Thus, COSPEDTree can generate non-binary supertrees as well. Depth first traversal on this DAG yields final supertree. According to the performance metrics on branch dissimilarities (such as FP, FN and RF), COSPEDTree produces mostly conservative, well resolved supertrees. Specifically, RF metrics are mostly lower compared to the reference approaches, and FP values are lower apart from only strictly conservative (or veto) approaches. COSPEDTree has worst case time and space complexities of cubic and quadratic order, respectively, better or comparable to the reference approaches. Such high performance and low computational costs enable COSPEDTree to be applied on large scale biological data sets.
Collapse
|
7
|
Curnick DJ, Head CEI, Huang D, Crabbe MJC, Gollock M, Hoeksema BW, Johnson KG, Jones R, Koldewey HJ, Obura DO, Rosen BR, Smith DJ, Taylor ML, Turner JR, Wren S, Redding DW. Setting evolutionary-based conservation priorities for a phylogenetically data-poor taxonomic group (Scleractinia). Anim Conserv 2015. [DOI: 10.1111/acv.12185] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- D. J. Curnick
- Centre for Biodiversity and Environment Research; Department of Genetics, Evolution and Environment; University College London; London UK
- Zoological Society of London; London UK
| | - C. E. I. Head
- Zoological Society of London; London UK
- Department of Zoology; University of Oxford; Oxford UK
| | - D. Huang
- Department of Biological Sciences; National University of Singapore; Singapore
| | - M. J. C. Crabbe
- Department of Zoology; University of Oxford; Oxford UK
- Institute of Biomedical and Environmental Science and Technology; Faculty of Creative Arts, Technologies and Science; University of Bedfordshire; Luton UK
| | | | - B. W. Hoeksema
- Department of Marine Zoology; Naturalis Biodiversity Center; Leiden The Netherlands
| | - K. G. Johnson
- Department of Earth Sciences; Natural History Museum; London UK
| | - R. Jones
- Zoological Society of London; London UK
| | | | - D. O. Obura
- Coastal Oceans Research and Development in the Indian Ocean (CORDIO) East Africa; Mombasa Kenya
| | - B. R. Rosen
- Department of Earth Sciences; Natural History Museum; London UK
| | - D. J. Smith
- Coral Reef Research Unit; University of Essex; Colchester UK
| | - M. L. Taylor
- Department of Zoology; University of Oxford; Oxford UK
| | - J. R. Turner
- School of Ocean Sciences; Bangor University; Anglesey UK
| | - S. Wren
- Zoological Society of London; London UK
- Department of Zoology; University of Otago; Dunedin New Zealand
| | - D. W. Redding
- Centre for Biodiversity and Environment Research; Department of Genetics, Evolution and Environment; University College London; London UK
| |
Collapse
|
8
|
Lang JM, Darling AE, Eisen JA. Phylogeny of bacterial and archaeal genomes using conserved genes: supertrees and supermatrices. PLoS One 2013; 8:e62510. [PMID: 23638103 PMCID: PMC3636077 DOI: 10.1371/journal.pone.0062510] [Citation(s) in RCA: 92] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2012] [Accepted: 03/26/2013] [Indexed: 11/29/2022] Open
Abstract
Over 3000 microbial (bacterial and archaeal) genomes have been made publically available to date, providing an unprecedented opportunity to examine evolutionary genomic trends and offering valuable reference data for a variety of other studies such as metagenomics. The utility of these genome sequences is greatly enhanced when we have an understanding of how they are phylogenetically related to each other. Therefore, we here describe our efforts to reconstruct the phylogeny of all available bacterial and archaeal genomes. We identified 24, single-copy, ubiquitous genes suitable for this phylogenetic analysis. We used two approaches to combine the data for the 24 genes. First, we concatenated alignments of all genes into a single alignment from which a Maximum Likelihood (ML) tree was inferred using RAxML. Second, we used a relatively new approach to combining gene data, Bayesian Concordance Analysis (BCA), as implemented in the BUCKy software, in which the results of 24 single-gene phylogenetic analyses are used to generate a "primary concordance" tree. A comparison of the concatenated ML tree and the primary concordance (BUCKy) tree reveals that the two approaches give similar results, relative to a phylogenetic tree inferred from the 16S rRNA gene. After comparing the results and the methods used, we conclude that the current best approach for generating a single phylogenetic tree, suitable for use as a reference phylogeny for comparative analyses, is to perform a maximum likelihood analysis of a concatenated alignment of conserved, single-copy genes.
Collapse
Affiliation(s)
- Jenna Morgan Lang
- Department of Medical Microbiology and Immunology and Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America
- Department of Energy Joint Genome Institute, Walnut Creek, California, United States of America
| | - Aaron E. Darling
- Department of Medical Microbiology and Immunology and Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America
| | - Jonathan A. Eisen
- Department of Medical Microbiology and Immunology and Department of Evolution and Ecology, University of California Davis, Davis, California, United States of America
- Department of Energy Joint Genome Institute, Walnut Creek, California, United States of America
| |
Collapse
|
9
|
Kelly S, Maini PK. DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments. PLoS One 2013; 8:e58537. [PMID: 23554899 PMCID: PMC3598851 DOI: 10.1371/journal.pone.0058537] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2012] [Accepted: 02/07/2013] [Indexed: 01/03/2023] Open
Abstract
The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.
Collapse
Affiliation(s)
- Steven Kelly
- Department of Plant Sciences, University of Oxford, Oxford, United Kingdom.
| | | |
Collapse
|
10
|
Yang J, Grünewald S, Wan XF. Quartet-net: a quartet-based method to reconstruct phylogenetic networks. Mol Biol Evol 2013; 30:1206-17. [PMID: 23493256 DOI: 10.1093/molbev/mst040] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Phylogenetic networks can model reticulate evolutionary events such as hybridization, recombination, and horizontal gene transfer. However, reconstructing such networks is not trivial. Popular character-based methods are computationally inefficient, whereas distance-based methods cannot guarantee reconstruction accuracy because pairwise genetic distances only reflect partial information about a reticulate phylogeny. To balance accuracy and computational efficiency, here we introduce a quartet-based method to construct a phylogenetic network from a multiple sequence alignment. Unlike distances that only reflect the relationship between a pair of taxa, quartets contain information on the relationships among four taxa; these quartets provide adequate capacity to infer a more accurate phylogenetic network. In applications to simulated and biological data sets, we demonstrate that this novel method is robust and effective in reconstructing reticulate evolutionary events and it has the potential to infer more accurate phylogenetic distances than other conventional phylogenetic network construction methods such as Neighbor-Joining, Neighbor-Net, and Split Decomposition. This method can be used in constructing phylogenetic networks from simple evolutionary events involving a few reticulate events to complex evolutionary histories involving a large number of reticulate events. A software called "Quartet-Net" is implemented and available at http://sysbio.cvm.msstate.edu/QuartetNet/.
Collapse
Affiliation(s)
- Jialiang Yang
- Department of Basic Sciences, College of Veterinary Medicine, Mississippi State University, USA
| | | | | |
Collapse
|
11
|
Grünewald S, Spillner A, Bastkowski S, Bögershausen A, Moulton V. SuperQ: computing supernetworks from quartets. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2013; 10:151-60. [PMID: 23702551 DOI: 10.1109/tcbb.2013.8] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/10/2023]
Abstract
Supertrees are a commonly used tool in phylogenetics to summarize collections of partial phylogenetic trees. As a generalization of supertrees, phylogenetic supernetworks allow, in addition, the visual representation of conflict between the trees that is not possible to observe with a single tree. Here, we introduce SuperQ, a new method for constructing such supernetworks (SuperQ is freely available at >www.uea.ac.uk/computing/superq.). It works by first breaking the input trees into quartet trees, and then stitching these together to form a special kind of phylogenetic network, called a split network. This stitching process is performed using an adaptation of the QNet method for split network reconstruction employing a novel approach to use the branch lengths from the input trees to estimate the branch lengths in the resulting network. Compared with previous supernetwork methods, SuperQ has the advantage of producing a planar network. We compare the performance of SuperQ to the Z-closure and Q-imputation supernetwork methods, and also present an analysis of some published data sets as an illustration of its applicability.
Collapse
Affiliation(s)
- Stefan Grünewald
- CAS-MPG Partner Institute for Computational Biology, Chinese Academy of Sciences, Shanghai, China.
| | | | | | | | | |
Collapse
|
12
|
Holland BR, Jarvis PD, Sumner JG. Low-Parameter Phylogenetic Inference Under the General Markov Model. Syst Biol 2012; 62:78-92. [DOI: 10.1093/sysbio/sys072] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Affiliation(s)
- Barbara R. Holland
- School of Mathematics and Physics, University of Tasmania, Hobart 7001, Australia
| | - Peter D. Jarvis
- School of Mathematics and Physics, University of Tasmania, Hobart 7001, Australia
| | - Jeremy G. Sumner
- School of Mathematics and Physics, University of Tasmania, Hobart 7001, Australia
| |
Collapse
|
13
|
Parks DH, Beiko RG. Measuring Community Similarity with Phylogenetic Networks. Mol Biol Evol 2012; 29:3947-58. [DOI: 10.1093/molbev/mss200] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open
|
14
|
Lapierre P, Lasek-Nesselquist E, Gogarten JP. The impact of HGT on phylogenomic reconstruction methods. Brief Bioinform 2012; 15:79-90. [DOI: 10.1093/bib/bbs050] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023] Open
|
15
|
Hassanzadeh R, Eslahchi C, Sung WK. Constructing phylogenetic supernetworks based on simulated annealing. Mol Phylogenet Evol 2012; 63:738-44. [DOI: 10.1016/j.ympev.2012.02.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2011] [Revised: 01/29/2012] [Accepted: 02/15/2012] [Indexed: 10/28/2022]
|
16
|
Swenson MS, Suri R, Linder CR, Warnow T. SuperFine: Fast and Accurate Supertree Estimation. Syst Biol 2011; 61:214-27. [DOI: 10.1093/sysbio/syr092] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Affiliation(s)
- M. Shel Swenson
- Department of Computer Science, The University of Texas at Austin, Austin, TX, USA
| | - Rahul Suri
- Department of Computer Science, The University of Texas at Austin, Austin, TX, USA
| | - C. Randal Linder
- Section of Integrative Biology, School of Biological Sciences, The University of Texas at Austin, Austin, TX, USA
| | - Tandy Warnow
- Department of Computer Science, The University of Texas at Austin, Austin, TX, USA
| |
Collapse
|
17
|
Kupczok A. Split-based computation of majority-rule supertrees. BMC Evol Biol 2011; 11:205. [PMID: 21752249 PMCID: PMC3169514 DOI: 10.1186/1471-2148-11-205] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2010] [Accepted: 07/13/2011] [Indexed: 12/02/2022] Open
Abstract
Background Supertree methods combine overlapping input trees into a larger supertree. Here, I consider split-based supertree methods that first extract the split information of the input trees and subsequently combine this split information into a phylogeny. Well known split-based supertree methods are matrix representation with parsimony and matrix representation with compatibility. Combining input trees on the same taxon set, as in the consensus setting, is a well-studied task and it is thus desirable to generalize consensus methods to supertree methods. Results Here, three variants of majority-rule (MR) supertrees that generalize majority-rule consensus trees are investigated. I provide simple formulas for computing the respective score for bifurcating input- and supertrees. These score computations, together with a heuristic tree search minmizing the scores, were implemented in the python program PluMiST (Plus- and Minus SuperTrees) available from http://www.cibiv.at/software/plumist. The different MR methods were tested by simulation and on real data sets. The search heuristic was successful in combining compatible input trees. When combining incompatible input trees, especially one variant, MR(-) supertrees, performed well. Conclusions The presented framework allows for an efficient score computation of three majority-rule supertree variants and input trees. I combined the score computation with a heuristic search over the supertree space. The implementation was tested by simulation and on real data sets and showed promising results. Especially the MR(-) variant seems to be a reasonable score for supertree reconstruction. Generalizing these computations to multifurcating trees is an open problem, which may be tackled using this framework.
Collapse
Affiliation(s)
- Anne Kupczok
- Center for Integrative Bioinformatics Vienna, Max F, Perutz Laboratories, University of Vienna, Medical University of Vienna, University of Veterinary Medicine Vienna, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria.
| |
Collapse
|
18
|
Swenson MS, Suri R, Linder CR, Warnow T. An experimental study of Quartets MaxCut and other supertree methods. Algorithms Mol Biol 2011; 6:7. [PMID: 21504600 PMCID: PMC3101644 DOI: 10.1186/1748-7188-6-7] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2010] [Accepted: 04/19/2011] [Indexed: 12/01/2022] Open
Abstract
BACKGROUND Supertree methods represent one of the major ways by which the Tree of Life can be estimated, but despite many recent algorithmic innovations, matrix representation with parsimony (MRP) remains the main algorithmic supertree method. RESULTS We evaluated the performance of several supertree methods based upon the Quartets MaxCut (QMC) method of Snir and Rao and showed that two of these methods usually outperform MRP and five other supertree methods that we studied, under many realistic model conditions. However, the QMC-based methods have scalability issues that may limit their utility on large datasets. We also observed that taxon sampling impacted supertree accuracy, with poor results obtained when all of the source trees were only sparsely sampled. Finally, we showed that the popular optimality criterion of minimizing the total topological distance of the supertree to the source trees is only weakly correlated with supertree topological accuracy. Therefore evaluating supertree methods on biological datasets is problematic. CONCLUSIONS Our results show that supertree methods that improve upon MRP are possible, and that an effort should be made to produce scalable and robust implementations of the most accurate supertree methods. Also, because topological accuracy depends upon taxon sampling strategies, attempts to construct very large phylogenetic trees using supertree methods should consider the selection of source tree datasets, as well as supertree methods. Finally, since supertree topological error is only weakly correlated with the supertree's topological distance to its source trees, development and testing of supertree methods presents methodological challenges.
Collapse
Affiliation(s)
- M Shel Swenson
- Department of Computer Science, The University of Texas at Austin, Austin TX, USA
| | - Rahul Suri
- Department of Computer Science, The University of Texas at Austin, Austin TX, USA
| | - C Randal Linder
- Section of Integrative Biology, The University of Texas at Austin, Austin TX, USA
| | - Tandy Warnow
- Department of Computer Science, The University of Texas at Austin, Austin TX, USA
| |
Collapse
|
19
|
Campbell V, Legendre P, Lapointe FJ. The performance of the Congruence Among Distance Matrices (CADM) test in phylogenetic analysis. BMC Evol Biol 2011; 11:64. [PMID: 21388552 PMCID: PMC3065422 DOI: 10.1186/1471-2148-11-64] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2010] [Accepted: 03/09/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND CADM is a statistical test used to estimate the level of Congruence Among Distance Matrices. It has been shown in previous studies to have a correct rate of type I error and good power when applied to dissimilarity matrices and to ultrametric distance matrices. Contrary to most other tests of incongruence used in phylogenetic analysis, the null hypothesis of the CADM test assumes complete incongruence of the phylogenetic trees instead of congruence. In this study, we performed computer simulations to assess the type I error rate and power of the test. It was applied to additive distance matrices representing phylogenies and to genetic distance matrices obtained from nucleotide sequences of different lengths that were simulated on randomly generated trees of varying sizes, and under different evolutionary conditions. RESULTS Our results showed that the test has an accurate type I error rate and good power. As expected, power increased with the number of objects (i.e., taxa), the number of partially or completely congruent matrices and the level of congruence among distance matrices. CONCLUSIONS Based on our results, we suggest that CADM is an excellent candidate to test for congruence and, when present, to estimate its level in phylogenomic studies where numerous genes are analysed simultaneously.
Collapse
Affiliation(s)
- Véronique Campbell
- Département de Sciences biologiques, Université de Montréal, C.P. 6128, Succ. Centre-ville, Montréal, Québec, H3C 3J7, Canada
| | - Pierre Legendre
- Département de Sciences biologiques, Université de Montréal, C.P. 6128, Succ. Centre-ville, Montréal, Québec, H3C 3J7, Canada
| | - François-Joseph Lapointe
- Département de Sciences biologiques, Université de Montréal, C.P. 6128, Succ. Centre-ville, Montréal, Québec, H3C 3J7, Canada
| |
Collapse
|
20
|
Bapteste E, O'Malley MA, Beiko RG, Ereshefsky M, Gogarten JP, Franklin-Hall L, Lapointe FJ, Dupré J, Dagan T, Boucher Y, Martin W. Prokaryotic evolution and the tree of life are two different things. Biol Direct 2009; 4:34. [PMID: 19788731 PMCID: PMC2761302 DOI: 10.1186/1745-6150-4-34] [Citation(s) in RCA: 128] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2009] [Accepted: 09/29/2009] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND The concept of a tree of life is prevalent in the evolutionary literature. It stems from attempting to obtain a grand unified natural system that reflects a recurrent process of species and lineage splittings for all forms of life. Traditionally, the discipline of systematics operates in a similar hierarchy of bifurcating (sometimes multifurcating) categories. The assumption of a universal tree of life hinges upon the process of evolution being tree-like throughout all forms of life and all of biological time. In multicellular eukaryotes, the molecular mechanisms and species-level population genetics of variation do indeed mainly cause a tree-like structure over time. In prokaryotes, they do not. Prokaryotic evolution and the tree of life are two different things, and we need to treat them as such, rather than extrapolating from macroscopic life to prokaryotes. In the following we will consider this circumstance from philosophical, scientific, and epistemological perspectives, surmising that phylogeny opted for a single model as a holdover from the Modern Synthesis of evolution. RESULTS It was far easier to envision and defend the concept of a universal tree of life before we had data from genomes. But the belief that prokaryotes are related by such a tree has now become stronger than the data to support it. The monistic concept of a single universal tree of life appears, in the face of genome data, increasingly obsolete. This traditional model to describe evolution is no longer the most scientifically productive position to hold, because of the plurality of evolutionary patterns and mechanisms involved. Forcing a single bifurcating scheme onto prokaryotic evolution disregards the non-tree-like nature of natural variation among prokaryotes and accounts for only a minority of observations from genomes. CONCLUSION Prokaryotic evolution and the tree of life are two different things. Hence we will briefly set out alternative models to the tree of life to study their evolution. Ultimately, the plurality of evolutionary patterns and mechanisms involved, such as the discontinuity of the process of evolution across the prokaryote-eukaryote divide, summons forth a pluralistic approach to studying evolution. REVIEWERS This article was reviewed by Ford Doolittle, John Logsdon and Nicolas Galtier.
Collapse
|
21
|
Cardona G, Llabrés M, Rosselló F, Valiente G. Metrics for phylogenetic networks I: generalizations of the Robinson-Foulds metric. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2009; 6:46-61. [PMID: 19179698 DOI: 10.1109/tcbb.2008.70] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
The assessment of phylogenetic network reconstruction methods requires the ability to compare phylogenetic networks. This is the first in a series of papers devoted to the analysis and comparison of metrics for tree-child time consistent phylogenetic networks on the same set of taxa. In this paper, we study three metrics that have already been introduced in the literature: the Robinson-Foulds distance, the tripartitions distance and the mu-distance. They generalize to networks the classical Robinson-Foulds or partition distance for phylogenetic trees. We analyze the behavior of these metrics by studying their least and largest values and when they achieve them. As a by-product of this study, we obtain tight bounds on the size of a tree-child time consistent phylogenetic network.
Collapse
Affiliation(s)
- Gabriel Cardona
- University of the Balearic Islands, Palma de Mallorca, Spain.
| | | | | | | |
Collapse
|
22
|
Whitfield JB, Cameron SA, Huson DH, Steel MA. Filtered Z-Closure Supernetworks for Extracting and Visualizing Recurrent Signal from Incongruent Gene Trees. Syst Biol 2008; 57:939-47. [DOI: 10.1080/10635150802552849] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022] Open
Affiliation(s)
- James B. Whitfield
- Department of Entomology, University of Illinois Urbana, Illinois 61801, USA; E-mail: (J.B.W.); (S.A.C.)
| | - Sydney A. Cameron
- Department of Entomology, University of Illinois Urbana, Illinois 61801, USA; E-mail: (J.B.W.); (S.A.C.)
| | - Daniel H. Huson
- Center for Bioinformatics, Tübingen University 72075 Tübingen, Germany; E-mail:
| | - Mike A. Steel
- Allan Wilson Centre, University of Canterbury Christchurch, New Zealand; E-mail:
| |
Collapse
|
23
|
Grünewald S, Huber KT, Wu Q. Two novel closure rules for constructing phylogenetic super-networks. Bull Math Biol 2008; 70:1906-24. [PMID: 18665426 DOI: 10.1007/s11538-008-9331-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2006] [Accepted: 04/23/2008] [Indexed: 10/21/2022]
Abstract
A contemporary and fundamental problem faced by many evolutionary biologists is how to puzzle together a collection Weierstrass p of partial trees (leaf-labeled trees whose leaves are bijectively labeled by species or, more generally, taxa, each supported by, e.g., a gene) into an overall parental structure that displays all trees in Weierstrass p. This already difficult problem is complicated by the fact that the trees in Weierstrass p regularly support conflicting phylogenetic relationships and are not on the same but only overlapping taxa sets. A desirable requirement on the sought after parental structure, therefore, is that it can accommodate the observed conflicts. Phylogenetic networks are a popular tool capable of doing precisely this. However, not much is known about how to construct such networks from partial trees, a notable exception being the Z-closure super-network approach, which is based on the Z-closure rule, and the Q-imputation approach. Although attractive approaches, they both suffer from the fact that the generated networks tend to be multidimensional making it necessary to apply some kind of filter to reduce their complexity.To avoid having to resort to a filter, we follow a different line of attack in this paper and develop closure rules for generating circular phylogenetic networks which have the attractive property that they can be represented in the plane. In particular, we introduce the novel Y-(closure) rule and show that this rule on its own or in combination with one of Meacham's closure rules (which we call the M-rule) has some very desirable theoretical properties. In addition, we present a case study based on Rivera et al. "ring of life" to explore the reconstructive power of the M- and Y-rule and also reanalyze an Arabidopsis thaliana data set.
Collapse
Affiliation(s)
- S Grünewald
- Department of Combinatorics and Geometry, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| | | | | |
Collapse
|
24
|
Holland BR, Benthin S, Lockhart PJ, Moulton V, Huber KT. Using supernetworks to distinguish hybridization from lineage-sorting. BMC Evol Biol 2008; 8:202. [PMID: 18625077 PMCID: PMC2500029 DOI: 10.1186/1471-2148-8-202] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2007] [Accepted: 07/14/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A simple and widely used approach for detecting hybridization in phylogenies is to reconstruct gene trees from independent gene loci, and to look for gene tree incongruence. However, this approach may be confounded by factors such as poor taxon-sampling and/or incomplete lineage-sorting. RESULTS Using coalescent simulations, we investigated the potential of supernetwork methods to differentiate between gene tree incongruence arising from taxon sampling and incomplete lineage-sorting as opposed to hybridization. For few hybridization events, a large number of independent loci, and well-sampled taxa across these loci, we found that it was possible to distinguish incomplete lineage-sorting from hybridization using the filtered Z-closure and Q-imputation supernetwork methods. Moreover, we found that the choice of supernetwork method was less important than the choice of filtering, and that count-based filtering was the most effective filtering technique. CONCLUSION Filtered supernetworks provide a tool for detecting and identifying hybridization events in phylogenies, a tool that should become increasingly useful in light of current genome sequencing initiatives and the ease with which large numbers of independent gene loci can be determined using new generation sequencing technologies.
Collapse
Affiliation(s)
- Barbara R Holland
- Allan Wilson Centre, Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand
| | - Steffi Benthin
- Allan Wilson Centre, Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand
| | - Peter J Lockhart
- Allan Wilson Centre, Institute of Molecular BioSciences, Massey University, Palmerston North, New Zealand
| | - Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich, UK
| | - Katharina T Huber
- School of Computing Sciences, University of East Anglia, Norwich, UK
| |
Collapse
|
25
|
Levasseur A, Pontarotti P, Poch O, Thompson JD. Strategies for reliable exploitation of evolutionary concepts in high throughput biology. Evol Bioinform Online 2008; 4:121-37. [PMID: 19204813 PMCID: PMC2614184 DOI: 10.4137/ebo.s597] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022] Open
Abstract
The recent availability of the complete genome sequences of a large number of model organisms, together with the immense amount of data being produced by the new high-throughput technologies, means that we can now begin comparative analyses to understand the mechanisms involved in the evolution of the genome and their consequences in the study of biological systems. Phylogenetic approaches provide a unique conceptual framework for performing comparative analyses of all this data, for propagating information between different systems and for predicting or inferring new knowledge. As a result, phylogeny-based inference systems are now playing an increasingly important role in most areas of high throughput genomics, including studies of promoters (phylogenetic footprinting), interactomes (based on the presence and degree of conservation of interacting proteins), and in comparisons of transcriptomes or proteomes (phylogenetic proximity and co-regulation/co-expression). Here we review the recent developments aimed at making automatic, reliable phylogeny-based inference feasible in large-scale projects. We also discuss how evolutionary concepts and phylogeny-based inference strategies are now being exploited in order to understand the evolution and function of biological systems. Such advances will be fundamental for the success of the emerging disciplines of systems biology and synthetic biology, and will have wide-reaching effects in applied fields such as biotechnology, medicine and pharmacology.
Collapse
Affiliation(s)
- Anthony Levasseur
- Phylogenomics Laboratory, EA 3781 Evolution Biologique, Université de Provence, 13331 Marseille, France
| | | | | | | |
Collapse
|
26
|
Woolley SM, Posada D, Crandall KA. A comparison of phylogenetic network methods using computer simulation. PLoS One 2008; 3:e1913. [PMID: 18398452 PMCID: PMC2275308 DOI: 10.1371/journal.pone.0001913] [Citation(s) in RCA: 93] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2007] [Accepted: 02/24/2008] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND We present a series of simulation studies that explore the relative performance of several phylogenetic network approaches (statistical parsimony, split decomposition, union of maximum parsimony trees, neighbor-net, simulated history recombination upper bound, median-joining, reduced median joining and minimum spanning network) compared to standard tree approaches, (neighbor-joining and maximum parsimony) in the presence and absence of recombination. PRINCIPAL FINDINGS In the absence of recombination, all methods recovered the correct topology and branch lengths nearly all of the time when the substitution rate was low, except for minimum spanning networks, which did considerably worse. At a higher substitution rate, maximum parsimony and union of maximum parsimony trees were the most accurate. With recombination, the ability to infer the correct topology was halved for all methods and no method could accurately estimate branch lengths. CONCLUSIONS Our results highlight the need for more accurate phylogenetic network methods and the importance of detecting and accounting for recombination in phylogenetic studies. Furthermore, we provide useful information for choosing a network algorithm and a framework in which to evaluate improvements to existing methods and novel algorithms developed in the future.
Collapse
Affiliation(s)
- Steven M Woolley
- Computational Biology Program, Washington University School of Medicine, St. Louis, Missouri, United States of America.
| | | | | |
Collapse
|
27
|
Whitfield JB, Kjer KM. Ancient rapid radiations of insects: challenges for phylogenetic analysis. ANNUAL REVIEW OF ENTOMOLOGY 2008; 53:449-72. [PMID: 17877448 DOI: 10.1146/annurev.ento.53.103106.093304] [Citation(s) in RCA: 117] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/17/2023]
Abstract
Phylogenies of major groups of insects based on both morphological and molecular data have sometimes been contentious, often lacking the data to distinguish between alternative views of relationships. This paucity of data is often due to real biological and historical causes, such as shortness of time spans between divergences for evolution to occur and long time spans after divergences for subsequent evolutionary changes to obscure the earlier ones. Another reason for difficulty in resolving some of the relationships using molecular data is the limited spectrum of genes so far developed for phylogeny estimation. For this latter issue, there is cause for current optimism owing to rapid increases in our knowledge of comparative genomics. At least some historical patterns of divergence may, however, continue to defy our attempts to completely reconstruct them with confidence, at least using current strategies.
Collapse
Affiliation(s)
- James B Whitfield
- Department of Entomology, University of Illinois, Urbana, IL 61821, USA.
| | | |
Collapse
|