1
|
Fleischauer M, Böcker S. BCD Beam Search: considering suboptimal partial solutions in Bad Clade Deletion supertrees. PeerJ 2018; 6:e4987. [PMID: 29900080 PMCID: PMC5995099 DOI: 10.7717/peerj.4987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2018] [Accepted: 05/26/2018] [Indexed: 11/20/2022] Open
Abstract
Supertree methods enable the reconstruction of large phylogenies. The supertree problem can be formalized in different ways in order to cope with contradictory information in the input. Some supertree methods are based on encoding the input trees in a matrix; other methods try to find minimum cuts in some graph. Recently, we introduced Bad Clade Deletion (BCD) supertrees which combines the graph-based computation of minimum cuts with optimizing a global objective function on the matrix representation of the input trees. The BCD supertree method has guaranteed polynomial running time and is very swift in practice. The quality of reconstructed supertrees was superior to matrix representation with parsimony (MRP) and usually on par with SuperFine for simulated data; but particularly for biological data, quality of BCD supertrees could not keep up with SuperFine supertrees. Here, we present a beam search extension for the BCD algorithm that keeps alive a constant number of partial solutions in each top-down iteration phase. The guaranteed worst-case running time of the new algorithm is still polynomial in the size of the input. We present an exact and a randomized subroutine to generate suboptimal partial solutions. Both beam search approaches consistently improve supertree quality on all evaluated datasets when keeping 25 suboptimal solutions alive. Supertree quality of the BCD Beam Search algorithm is on par with MRP and SuperFine even for biological data. This is the best performance of a polynomial-time supertree algorithm reported so far.
Collapse
Affiliation(s)
| | - Sebastian Böcker
- Chair for Bioinformatics, Friedrich-Schiller-University, Jena, Germany
| |
Collapse
|
2
|
Fleischauer M, Böcker S. Collecting reliable clades using the Greedy Strict Consensus Merger. PeerJ 2016; 4:e2172. [PMID: 27375971 PMCID: PMC4928488 DOI: 10.7717/peerj.2172] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2015] [Accepted: 06/03/2016] [Indexed: 12/02/2022] Open
Abstract
Supertree methods combine a set of phylogenetic trees into a single supertree. Similar to supermatrix methods, these methods provide a way to reconstruct larger parts of the Tree of Life, potentially evading the computational complexity of phylogenetic inference methods such as maximum likelihood. The supertree problem can be formalized in different ways, to cope with contradictory information in the input. Many supertree methods have been developed. Some of them solve NP-hard optimization problems like the well-known Matrix Representation with Parsimony, while others have polynomial worst-case running time but work in a greedy fashion (FlipCut). Both can profit from a set of clades that are already known to be part of the supertree. The Superfine approach shows how the Greedy Strict Consensus Merger (GSCM) can be used as preprocessing to find these clades. We introduce different scoring functions for the GSCM, a randomization, as well as a combination thereof to improve the GSCM to find more clades. This helps, in turn, to improve the resolution of the GSCM supertree. We find this modifications to increase the number of true positive clades by 18% compared to the currently used Overlap scoring.
Collapse
Affiliation(s)
- Markus Fleischauer
- Lehrstuhl für Bioinformatik, Friedrich-Schiller Universität , Jena , Thüringen , Germany
| | - Sebastian Böcker
- Lehrstuhl für Bioinformatik, Friedrich-Schiller Universität , Jena , Thüringen , Germany
| |
Collapse
|
3
|
Davis RB, Nicholson DB, Saunders ELR, Mayhew PJ. Fossil gaps inferred from phylogenies alter the apparent nature of diversification in dragonflies and their relatives. BMC Evol Biol 2011; 11:252. [PMID: 21917167 PMCID: PMC3179963 DOI: 10.1186/1471-2148-11-252] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2011] [Accepted: 09/14/2011] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND The fossil record has suggested that clade growth may differ in marine and terrestrial taxa, supporting equilibrial models in the former and expansionist models in the latter. However, incomplete sampling may bias findings based on fossil data alone. To attempt to correct for such bias, we assemble phylogenetic supertrees on one of the oldest clades of insects, the Odonatoidea (dragonflies, damselflies and their extinct relatives), using MRP and MRC. We use the trees to determine when, and in what clades, changes in taxonomic richness have occurred. We then test whether equilibrial or expansionist models are supported by fossil data alone, and whether findings differ when phylogenetic information is used to infer gaps in the fossil record. RESULTS There is broad agreement in family-level relationships between both supertrees, though with some uncertainty along the backbone of the tree regarding dragonflies (Anisoptera). "Anisozygoptera" are shown to be paraphyletic when fossil information is taken into account. In both trees, decreases in net diversification are associated with species-poor extant families (Neopetaliidae, Hemiphlebiidae), and an upshift is associated with Calopterygidae + Polythoridae. When ghost ranges are inferred from the fossil record, many families are shown to have much earlier origination dates. In a phylogenetic context, the number of family-level lineages is shown to be up to twice as high as the fossil record alone suggests through the Cretaceous and Cenozoic, and a logistic increase in richness is detected in contrast to an exponential increase indicated by fossils alone. CONCLUSIONS Our analysis supports the notion that taxa, which appear to have diversified exponentially using fossil data, may in fact have diversified more logistically. This in turn suggests that one of the major apparent differences between the marine and terrestrial fossil record may simply be an artifact of incomplete sampling. Our results also support previous notions that adult colouration plays an important role in odonate radiation, and that Anisozygoptera should be grouped in a single inclusive taxon with Anisoptera, separate from Zygoptera.
Collapse
Affiliation(s)
- Robert B Davis
- Department of Biology, University of York, York, YO10 5YW, UK
- Department of Zoology, University of Tartu, Vanemuise 46, EE-51014 Tartu, Estonia
| | - David B Nicholson
- Department of Biology, University of York, York, YO10 5YW, UK
- Department of Palaeontology, The Natural History Museum, Cromwell Road, London, SW7 5BD, UK
- National Museums of Scotland, Department of Natural Sciences, Edinburgh, Midlothian, EH1 1JF, UK
| | | | - Peter J Mayhew
- Department of Biology, University of York, York, YO10 5YW, UK
| |
Collapse
|
4
|
Kupczok A. Split-based computation of majority-rule supertrees. BMC Evol Biol 2011; 11:205. [PMID: 21752249 PMCID: PMC3169514 DOI: 10.1186/1471-2148-11-205] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2010] [Accepted: 07/13/2011] [Indexed: 12/02/2022] Open
Abstract
Background Supertree methods combine overlapping input trees into a larger supertree. Here, I consider split-based supertree methods that first extract the split information of the input trees and subsequently combine this split information into a phylogeny. Well known split-based supertree methods are matrix representation with parsimony and matrix representation with compatibility. Combining input trees on the same taxon set, as in the consensus setting, is a well-studied task and it is thus desirable to generalize consensus methods to supertree methods. Results Here, three variants of majority-rule (MR) supertrees that generalize majority-rule consensus trees are investigated. I provide simple formulas for computing the respective score for bifurcating input- and supertrees. These score computations, together with a heuristic tree search minmizing the scores, were implemented in the python program PluMiST (Plus- and Minus SuperTrees) available from http://www.cibiv.at/software/plumist. The different MR methods were tested by simulation and on real data sets. The search heuristic was successful in combining compatible input trees. When combining incompatible input trees, especially one variant, MR(-) supertrees, performed well. Conclusions The presented framework allows for an efficient score computation of three majority-rule supertree variants and input trees. I combined the score computation with a heuristic search over the supertree space. The implementation was tested by simulation and on real data sets and showed promising results. Especially the MR(-) variant seems to be a reasonable score for supertree reconstruction. Generalizing these computations to multifurcating trees is an open problem, which may be tackled using this framework.
Collapse
Affiliation(s)
- Anne Kupczok
- Center for Integrative Bioinformatics Vienna, Max F, Perutz Laboratories, University of Vienna, Medical University of Vienna, University of Veterinary Medicine Vienna, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria.
| |
Collapse
|
5
|
Kupczok A. Consequences of different null models on the tree shape bias of supertree methods. Syst Biol 2011; 60:218-25. [PMID: 21252387 DOI: 10.1093/sysbio/syq086] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Affiliation(s)
- Anne Kupczok
- Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, University of Veterinary Medicine Vienna, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria.
| |
Collapse
|
6
|
Kupczok A, Schmidt HA, von Haeseler A. Accuracy of phylogeny reconstruction methods combining overlapping gene data sets. Algorithms Mol Biol 2010; 5:37. [PMID: 21134245 PMCID: PMC3022592 DOI: 10.1186/1748-7188-5-37] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2010] [Accepted: 12/06/2010] [Indexed: 11/17/2022] Open
Abstract
Background The availability of many gene alignments with overlapping taxon sets raises the question of which strategy is the best to infer species phylogenies from multiple gene information. Methods and programs abound that use the gene alignment in different ways to reconstruct the species tree. In particular, different methods combine the original data at different points along the way from the underlying sequences to the final tree. Accordingly, they are classified into superalignment, supertree and medium-level approaches. Here, we present a simulation study to compare different methods from each of these three approaches. Results We observe that superalignment methods usually outperform the other approaches over a wide range of parameters including sparse data and gene-specific evolutionary parameters. In the presence of high incongruency among gene trees, however, other combination methods show better performance than the superalignment approach. Surprisingly, some supertree and medium-level methods exhibit, on average, worse results than a single gene phylogeny with complete taxon information. Conclusions For some methods, using the reconstructed gene tree as an estimation of the species tree is superior to the combination of incomplete information. Superalignment usually performs best since it is less susceptible to stochastic error. Supertree methods can outperform superalignment in the presence of gene-tree conflict.
Collapse
|
7
|
Davis RB, Baldauf SL, Mayhew PJ. Many hexapod groups originated earlier and withstood extinction events better than previously realized: inferences from supertrees. Proc Biol Sci 2010; 277:1597-606. [PMID: 20129983 PMCID: PMC2871844 DOI: 10.1098/rspb.2009.2299] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2009] [Accepted: 01/11/2010] [Indexed: 11/12/2022] Open
Abstract
Comprising over half of all described species, the hexapods are central to understanding the evolution of global biodiversity. Direct fossil evidence suggests that new hexapod orders continued to originate from the Jurassic onwards, and diversity is presently higher than ever. Previous studies also suggest that several shifts in net diversification rate have occurred at higher taxonomic levels. However, their inferred timing is phylogeny dependent. We re-examine these issues using the supertree approach to provide, to our knowledge, the first composite estimates of hexapod order-level phylogeny. The Purvis matrix representation with parsimony method provides the most optimal supertree, but alternative methods are considered. Inferring ghost ranges shows richness of terminal lineages in the order-level phylogeny to peak just before the end-Permian extinction, rather than the present day, indicating that at least 11 more lineages survived this extinction than implied by fossils alone. The major upshift in diversification is associated with the origin of wings/wing folding and for the first time, to our knowledge, significant downshifts are shown associated with the origin of species-poor taxa (e.g. Neuropterida, Zoraptera). Polyneopteran phylogeny, especially the position of Zoraptera, remains important resolve because this influences findings regarding shifts in diversification. Our study shows how combining fossil with phylogenetic information can improve macroevolutionary inferences.
Collapse
Affiliation(s)
- Robert B Davis
- Department of Biology, University of York, Heslington, York, UK.
| | | | | |
Collapse
|
8
|
Swenson MS, Barbançon F, Warnow T, Linder CR. A simulation study comparing supertree and combined analysis methods using SMIDGen. Algorithms Mol Biol 2010; 5:8. [PMID: 20047664 PMCID: PMC2837663 DOI: 10.1186/1748-7188-5-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2009] [Accepted: 01/04/2010] [Indexed: 11/16/2022] Open
Abstract
Background Supertree methods comprise one approach to reconstructing large molecular phylogenies given multi-marker datasets: trees are estimated on each marker and then combined into a tree (the "supertree") on the entire set of taxa. Supertrees can be constructed using various algorithmic techniques, with the most common being matrix representation with parsimony (MRP). When the data allow, the competing approach is a combined analysis (also known as a "supermatrix" or "total evidence" approach) whereby the different sequence data matrices for each of the different subsets of taxa are concatenated into a single supermatrix, and a tree is estimated on that supermatrix. Results In this paper, we describe an extensive simulation study we performed comparing two supertree methods, MRP and weighted MRP, to combined analysis methods on large model trees. A key contribution of this study is our novel simulation methodology (Super-Method Input Data Generator, or SMIDGen) that better reflects biological processes and the practices of systematists than earlier simulations. We show that combined analysis based upon maximum likelihood outperforms MRP and weighted MRP, giving especially big improvements when the largest subtree does not contain most of the taxa. Conclusions This study demonstrates that MRP and weighted MRP produce distinctly less accurate trees than combined analyses for a given base method (maximum parsimony or maximum likelihood). Since there are situations in which combined analyses are not feasible, there is a clear need for better supertree methods. The source tree and combined datasets used in this study can be used to test other supertree and combined analysis methods.
Collapse
|
9
|
Davis RB, Baldauf SL, Mayhew PJ. Eusociality and the success of the termites: insights from a supertree of dictyopteran families. J Evol Biol 2009; 22:1750-61. [PMID: 19549138 DOI: 10.1111/j.1420-9101.2009.01789.x] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]
Abstract
Sociality in insects may negatively impact on species richness. We tested whether termites have experienced shifts in diversification rates through time. Supertree methods were used to synthesize family-level relationships within termites, cockroaches and mantids. A deep positive shift in diversification rate is found within termites, but not in the cockroaches from which they evolved. The shift is responsible for most of their extant species richness suggesting that eusociality is not necessarily detrimental to species richness, and may sometimes have a positive effect. Mechanistic studies of speciation and extinction in eusocial insects are advocated.
Collapse
Affiliation(s)
- R B Davis
- Department of Biology, University of York, York YO105YW, UK.
| | | | | |
Collapse
|
10
|
|
11
|
Abstract
We analyze a maximum likelihood approach for combining phylogenetic trees into a larger "supertree." This is based on a simple exponential model of phylogenetic error, which ensures that ML supertrees have a simple combinatorial description (as a median tree, minimizing a weighted sum of distances to the input trees). We show that this approach to ML supertree reconstruction is statistically consistent (it converges on the true species supertree as more input trees are combined), in contrast to the widely used MRP method, which we show can be statistically inconsistent under the exponential error model. We also show that this statistical consistency extends to an ML approach for constructing species supertrees from gene trees. In this setting, incomplete lineage sorting (due to coalescence rates of homologous genes being lower than speciation rates) has been shown to lead to gene trees that are frequently different from species trees, and this can confound efforts to reconstruct the species phylogeny correctly.
Collapse
Affiliation(s)
- Mike Steel
- Allan Wilson Centre for Molecular Ecology and Evolution, Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand.
| | | |
Collapse
|
12
|
Ranwez V, Berry V, Criscuolo A, Fabre PH, Guillemot S, Scornavacca C, Douzery EJP. PhySIC: A Veto Supertree Method with Desirable Properties. Syst Biol 2007; 56:798-817. [PMID: 17918032 DOI: 10.1080/10635150701639754] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022] Open
Affiliation(s)
- Vincent Ranwez
- Institut des Sciences de l'Evolution (ISEM, UMR 5554 CNRS), Université Montpellier II Place E. Bataillon, CC 064, 34095, Montpellier, Cedex 5, France E-mail:
| | - Vincent Berry
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM,UMR 5506, CNRS), Université Montpellier II 161 rue Ada, 34392, Montpellier, Cedex 5, France
| | - Alexis Criscuolo
- Institut des Sciences de l'Evolution (ISEM, UMR 5554 CNRS), Université Montpellier II Place E. Bataillon, CC 064, 34095, Montpellier, Cedex 5, France E-mail:
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM,UMR 5506, CNRS), Université Montpellier II 161 rue Ada, 34392, Montpellier, Cedex 5, France
| | - Pierre-Henri Fabre
- Institut des Sciences de l'Evolution (ISEM, UMR 5554 CNRS), Université Montpellier II Place E. Bataillon, CC 064, 34095, Montpellier, Cedex 5, France E-mail:
| | - Sylvain Guillemot
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM,UMR 5506, CNRS), Université Montpellier II 161 rue Ada, 34392, Montpellier, Cedex 5, France
| | - Celine Scornavacca
- Institut des Sciences de l'Evolution (ISEM, UMR 5554 CNRS), Université Montpellier II Place E. Bataillon, CC 064, 34095, Montpellier, Cedex 5, France E-mail:
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (LIRMM,UMR 5506, CNRS), Université Montpellier II 161 rue Ada, 34392, Montpellier, Cedex 5, France
| | - Emmanuel J. P. Douzery
- Institut des Sciences de l'Evolution (ISEM, UMR 5554 CNRS), Université Montpellier II Place E. Bataillon, CC 064, 34095, Montpellier, Cedex 5, France E-mail:
| |
Collapse
|
13
|
Wilkinson M, Cotton JA, Lapointe FJ, Pisani D. Properties of supertree methods in the consensus setting. Syst Biol 2007; 56:330-7. [PMID: 17464887 DOI: 10.1080/10635150701245370] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022] Open
Affiliation(s)
- Mark Wilkinson
- Department of Zoology, The Natural History Museum, London, SW7 5BD, UK.
| | | | | | | |
Collapse
|
14
|
Goloboff PA. Minority rule supertrees? MRP, Compatibility, and Minimum Flip may display the least frequent groups. Cladistics 2005. [DOI: 10.1111/j.1096-0031.2005.00064.x] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022] Open
|
15
|
Abstract
Supertrees result from combining many smaller, overlapping phylogenetic trees into a single, more comprehensive tree. As such, supertree construction is probably as old as the field of systematics itself, and remains our only way of visualizing the Tree of Life as a whole. Over the past decade, supertree construction has gained a more formal, objective footing, and has become an area of active theoretical and practical research. Here, I review the history of the supertree approach, focusing mainly on its current implementation. The supertrees of today represent some of the largest, complete phylogenies available for many groups, but are not without their critics. I conclude by arguing that the ever-growing molecular revolution will result in supertree construction taking on a new role and implementation in the future for analyzing large DNA sequence matrices as part of a divide-and-conquer phylogenetic approach.
Collapse
Affiliation(s)
- Olaf R P Bininda-Emonds
- Lehrstuhl für Tierzucht, Technical University of Munich, D-85354 Freising-Weihenstephan, Germany.
| |
Collapse
|
16
|
|
17
|
|
18
|
Burleigh JG, Eulenstein O, Fernández-Baca D, Sanderson MJ. MRF Supertrees. COMPUTATIONAL BIOLOGY 2004. [DOI: 10.1007/978-1-4020-2330-9_4] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]
|
19
|
|
20
|
|
21
|
|
22
|
|