1. Huang YX, Xing ZP, Zhang H, Xu ZB, Tao LL, Hu HY, Kitching IJ, Wang X. Characterization of the Complete Mitochondrial Genome of Eight Diurnal Hawkmoths (Lepidoptera: Sphingidae): New Insights into the Origin and Evolution of Diurnalism in Sphingids. INSECTS 2022; 13:887. [PMID: 36292835 PMCID: PMC9604448 DOI: 10.3390/insects13100887] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/04/2022] [Revised: 09/20/2022] [Accepted: 09/27/2022] [Indexed: 06/16/2023]
Abstract
In this study, the mitochondrial genomes of 22 species from three subfamilies in the Sphingidae were sequenced, assembled, and annotated. Eight diurnal hawkmoths were included, of which six were newly sequenced (Hemaris radians, Macroglossum bombylans, M. fritzei, M. pyrrhosticta, Neogurelca himachala, and Sataspes xylocoparis) and two were previously published (Cephonodes hylas and Macroglossum stellatarum). The mitochondrial genomes of these eight diurnal hawkmoths were comparatively analyzed in terms of sequence length, nucleotide composition, relative synonymous codon usage, non-synonymous/synonymous substitution ratio, gene spacing, and repeat sequences. The mitogenomes of the eight species, ranging in length from 15,201 to 15,461 bp, encode the complete set of 37 genes usually found in animal mitogenomes. The base composition of the mitochondrial genomes showed A+T bias. The most commonly used codons were UUA (Leu), AUU (Ile), UUU (Phe), AUA (Met), and AAU (Asn), whereas GCG (Ala) and CCG (Pro) were rarely used. A phylogenetic tree of Sphingidae was constructed based on both maximum likelihood and Bayesian methods. We verified the monophyly of the four current subfamilies of Sphingidae, all of which had high support. In addition, we performed divergence time estimation and ancestral character reconstruction analyses. Diurnal behavior in hawkmoths originated 29.19 million years ago (Mya). It may have been influenced by the combination of herbaceous flourishing, which occurred 26-28 Mya, the uplift of the Tibetan Plateau, and the large-scale evolution of bats in the Oligocene to Pre-Miocene. Moreover, diurnalism in hawkmoths had multiple independent origins in Sphingidae.
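Relative synonymous codon usage (RSCU) is conventionally computed as a codon's observed count divided by the count expected if all synonymous codons for the same amino acid were used equally. The sketch below illustrates that ratio only. The `FAMILIES` table is an assumed, illustrative subset of the invertebrate mitochondrial code (written with T instead of U), and `rscu` is a hypothetical helper, not the pipeline used in the study.

```python
from collections import Counter

# Assumption: a small subset of synonymous codon families under the
# invertebrate mitochondrial code (DNA alphabet). Codons outside these
# families are simply ignored by this sketch.
FAMILIES = {
    "Leu": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "Ile": ["ATT", "ATC"],
    "Met": ["ATA", "ATG"],
    "Phe": ["TTT", "TTC"],
    "Asn": ["AAT", "AAC"],
}


def rscu(cds, families=FAMILIES):
    """Return {codon: RSCU} for every codon in an observed family."""
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for synonyms in families.values():
        total = sum(counts[c] for c in synonyms)
        if total == 0:
            continue  # amino acid not observed; RSCU undefined
        expected = total / len(synonyms)  # count under equal usage
        for codon in synonyms:
            values[codon] = counts[codon] / expected
    return values


example = rscu("TTATTATTACTTATTATC")  # TTA TTA TTA CTT ATT ATC
print(example["TTA"], example["ATT"])
```

Values above 1 indicate a codon used more often than expected under equal usage, values below 1 indicate a codon used less often.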
Affiliation(s)
- Yi-Xin Huang
- Collaborative Innovation Center of Recovery and Reconstruction of Degraded Ecosystem in Wanjiang Basin Co-Founded by Anhui Province and Ministry of Education, School of Ecology and Environment, Anhui Normal University, Wuhu 241000, China
- Key Laboratory of the Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, No. 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Zhi-Ping Xing
- Collaborative Innovation Center of Recovery and Reconstruction of Degraded Ecosystem in Wanjiang Basin Co-Founded by Anhui Province and Ministry of Education, School of Ecology and Environment, Anhui Normal University, Wuhu 241000, China
- Hao Zhang
- Anhui Provincial Key Laboratory of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu 241000, China
- Zhen-Bang Xu
- Institute of Resource Plants, Yunnan University, Kunming 650500, China
- Li-Long Tao
- Anhui Provincial Key Laboratory of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu 241000, China
- Hao-Yuan Hu
- Collaborative Innovation Center of Recovery and Reconstruction of Degraded Ecosystem in Wanjiang Basin Co-Founded by Anhui Province and Ministry of Education, School of Ecology and Environment, Anhui Normal University, Wuhu 241000, China
- Xu Wang
- Key Laboratory of the Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, No. 1 Beichen West Road, Chaoyang District, Beijing 100101, China
- Anhui Provincial Key Laboratory of the Conservation and Exploitation of Biological Resources, College of Life Sciences, Anhui Normal University, Wuhu 241000, China
2. Tahiri N, Fichet B, Makarenkov V. Building alternative consensus trees and supertrees using k-means and Robinson and Foulds distance. Bioinformatics 2022; 38:3367-3376. [PMID: 35579343 DOI: 10.1093/bioinformatics/btac326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2022] [Revised: 04/28/2022] [Accepted: 05/10/2022] [Indexed: 11/14/2022]
Abstract
MOTIVATION Each gene has its own evolutionary history which can substantially differ from evolutionary histories of other genes. For example, some individual genes or operons can be affected by specific horizontal gene transfer or recombination events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree which may display different evolutionary patterns from the species tree that accounts for the main patterns of vertical descent. However, the output of traditional consensus tree or supertree inference methods is a unique consensus tree or supertree. RESULTS We present a new efficient method for inferring multiple alternative consensus trees and supertrees to best represent the most important evolutionary patterns of a given set of gene phylogenies. We show how an adapted version of the popular k-means clustering algorithm, based on some remarkable properties of the Robinson and Foulds distance, can be used to partition a given set of trees into one (for homogeneous data) or multiple (for heterogeneous data) cluster(s) of trees. Moreover, we adapt the popular Caliński-Harabasz, Silhouette, Ball and Hall, and Gap cluster validity indices to tree clustering with k-means. Special attention is given to the relevant but very challenging problem of inferring alternative supertrees. The use of the Euclidean property of the objective function of the method makes it faster than the existing tree clustering techniques, and thus better suited for analyzing large evolutionary datasets. AVAILABILITY AND IMPLEMENTATION Our KMeansSuperTreeClustering program along with its C++ source code is available at: https://github.com/TahiriNadia/KMeansSuperTreeClustering. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
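The Robinson and Foulds (RF) distance that underlies this clustering counts the non-trivial splits (bipartitions of the taxa) present in one tree but not the other. The following is a minimal sketch, assuming both trees share the same taxon set and are written as nested Python tuples of taxon names. The functions `splits` and `rf_distance` are hypothetical helpers for illustration and are not taken from the KMeansSuperTreeClustering source.

```python
def splits(tree):
    """Return the non-trivial splits of a tree given as nested tuples.

    Each split is stored as the side that does NOT contain a fixed
    reference taxon, so the same bipartition always gets the same key.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves(child) for child in node))

    taxa = leaves(tree)
    reference = min(taxa)

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = taxa - below if reference in below else below
        if 1 < len(side) < len(taxa) - 1:  # skip trivial splits
            found.add(side)
        return below

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of splits found in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))
```

If each tree is encoded as a 0/1 vector indexed by splits, this count equals the squared Euclidean distance between the two vectors, which is one way to see why a k-means style objective can be applied to sets of trees.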
Affiliation(s)
- Nadia Tahiri
- Département d'informatique, Université du Québec à Montréal, Montreal, QC, Canada
- Département d'informatique, Université de Sherbrooke, Sherbrooke, QC, Canada
- Bernard Fichet
- Aix-Marseille Université, Faculté de Médecine, 27 Bd. Jean Moulin, F-13385 Marseille, France
- Vladimir Makarenkov
- Département d'informatique, Université du Québec à Montréal, Montreal, QC, Canada
3. Górecki P, Markin A, Eulenstein O. Exact median-tree inference for unrooted reconciliation costs. BMC Evol Biol 2020; 20:136. [PMID: 33115401 PMCID: PMC7593691 DOI: 10.1186/s12862-020-01700-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Background Solving median tree problems under tree reconciliation costs is a classic and well-studied approach for inferring species trees from collections of discordant gene trees. These problems are NP-hard, and therefore are, in practice, typically addressed by local search heuristics. So far, however, such heuristics lack any provable correctness or precision. Further, even for small phylogenetic studies, it has been demonstrated that local search heuristics may only provide sub-optimal solutions. Obviating such heuristic uncertainties are exact dynamic programming solutions that allow solving tree reconciliation problems for smaller phylogenetic studies. Despite these promises, such exact solutions are only suitable for credibly rooted input gene trees, which constitute only a tiny fraction of the readily available gene trees. Standard gene tree inference approaches provide only unrooted gene trees and accurately rooting such trees is often difficult, if not impossible. Results Here, we describe complex dynamic programming solutions that represent the first nonnaïve exact solutions for solving the tree reconciliation problems for unrooted input gene trees. Further, we show that the asymptotic runtime of the proposed solutions does not increase when compared to the most time-efficient dynamic programming solutions for rooted input trees. Conclusions In an experimental evaluation, we demonstrate that the described solutions for unrooted gene trees are, like the solutions for rooted input gene trees, suitable for smaller phylogenetic studies. Finally, for the first time, we study the accuracy of classic local search heuristics for unrooted tree reconciliation problems.
Affiliation(s)
- Paweł Górecki
- University of Warsaw, Faculty of Mathematics, Informatics and Mechanics, Banacha 2, Warsaw, 02-097, Poland
- Alexey Markin
- Department of Computer Science, Iowa State University, Atanasoff Hall 212, Ames, 50011, USA
- Oliver Eulenstein
- Department of Computer Science, Iowa State University, Atanasoff Hall 212, Ames, 50011, USA
4. Wang X, Zhang Y, Zhang H, Qin G, Lin Q. Complete mitochondrial genomes of eight seahorses and pipefishes (Syngnathiformes: Syngnathidae): insight into the adaptive radiation of syngnathid fishes. BMC Evol Biol 2019; 19:119. [PMID: 31185889 PMCID: PMC6560779 DOI: 10.1186/s12862-019-1430-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 01/18/2019] [Accepted: 04/30/2019] [Indexed: 11/17/2022]
Abstract
Background The evolution of male pregnancy is the most distinctive characteristic of syngnathids, and their specialized life history traits make syngnathid species excellent model species for many issues in biological evolution. However, the origin of syngnathids and the evolutionary divergence time of different syngnathid species remain poorly resolved. Comprehensive phylogenetic studies of the Syngnathidae will provide critical evidence to elucidate their origin, evolution, and dispersal patterns. Results We sequenced the mitochondrial genomes of eight syngnathid species in this study, and the estimated divergence times suggested that syngnathids diverged from other teleosts approximately 48.8 Mya during the Eocene period. Selection analysis showed that many mitochondrial genes of syngnathids exhibited significantly lower Ka/Ks values than those of other teleosts. The two most frequently used codons in syngnathid fishes were different from those in other teleosts, and a greater proportion of the mitochondrial simple sequence repeats (SSRs) were distributed in non-coding sequences in syngnathids compared with other teleosts. Conclusions Our study indicated that syngnathid fishes experienced an adaptive radiation process during the early explosion of species. Syngnathid mitochondrial OXPHOS genes appear to exhibit depressed Ka/Ks ratios compared with those of other teleosts, and this may suggest that their mitogenomes have experienced strong selective constraints to eliminate deleterious mutations. Electronic supplementary material The online version of this article (10.1186/s12862-019-1430-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xin Wang
- CAS Key Laboratory of Tropical Marine Bio-Resources and Ecology, South China Sea Institute of Oceanology, Institution of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences, Guangzhou, 510301, People's Republic of China
- Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, 266237, People's Republic of China
- University of the Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
- Yanhong Zhang
- CAS Key Laboratory of Tropical Marine Bio-Resources and Ecology, South China Sea Institute of Oceanology, Institution of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences, Guangzhou, 510301, People's Republic of China
- Huixian Zhang
- CAS Key Laboratory of Tropical Marine Bio-Resources and Ecology, South China Sea Institute of Oceanology, Institution of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences, Guangzhou, 510301, People's Republic of China
- Geng Qin
- CAS Key Laboratory of Tropical Marine Bio-Resources and Ecology, South China Sea Institute of Oceanology, Institution of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences, Guangzhou, 510301, People's Republic of China
- Qiang Lin
- CAS Key Laboratory of Tropical Marine Bio-Resources and Ecology, South China Sea Institute of Oceanology, Institution of South China Sea Ecology and Environmental Engineering, Chinese Academy of Sciences, Guangzhou, 510301, People's Republic of China
- Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, 266237, People's Republic of China
- University of the Chinese Academy of Sciences, Beijing, 100049, People's Republic of China
5. Górecki P, Paszek J, Eulenstein O. Unconstrained Diameters for Deep Coalescence. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1002-1012. [PMID: 26887001 DOI: 10.1109/tcbb.2016.2520937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
The minimizing-deep-coalescence (MDC) approach infers a median (species) tree for a given set of gene trees under the deep coalescence cost. This cost accounts for the minimum number of deep coalescences needed to reconcile a gene tree with a species tree where the leaf-genes are mapped to the leaf-species through a function called leaf labeling. In order to better understand the MDC approach we investigate here the diameter of a gene tree, which is an important property of the deep coalescence cost. This diameter is the maximal deep coalescence cost for a given gene tree under all leaf labelings for each possible species tree topology. While we prove that this diameter is generally infinite, this result relies on the diameter's unrealistic assumption that species trees can be of infinite size. Providing a more practical definition, we introduce a natural extension of the gene tree diameter that constrains the species tree size by a given constant. For this new diameter, we describe an exact formula, present a complete classification of the trees yielding this diameter, derive formulas for its mean and variance, and demonstrate its ability using comparative studies.
6. Moon J, Eulenstein O. Synthesizing large-scale species trees using the strict consensus approach. J Bioinform Comput Biol 2017; 15:1740002. [PMID: 28513253 DOI: 10.1142/s0219720017400029] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Supertree problems are a standard tool for synthesizing large-scale species trees from a given collection of gene trees under some problem-specific objective. Unfortunately, these problems are typically NP-hard, and often remain so when their instances are restricted to rooted gene trees sampled from the same species. While a class of restricted supertree problems has been effectively addressed by the parameterized strict consensus approach, in practice, most gene trees are unrooted and sampled from different species. Here, we overcome this stringent limitation by describing efficient algorithms that are adopting the strict consensus approach to also handle unrestricted supertree problems. Finally, we demonstrate the performance of our algorithms in a comparative study with classic supertree heuristics using simulated and empirical data sets.
Affiliation(s)
- Jucheol Moon
- Department of Computer Science, Iowa State University Ames, Iowa 50010, USA
- Oliver Eulenstein
- Department of Computer Science, Iowa State University Ames, Iowa 50010, USA
7. McMorris FR, Powers RC. Some axiomatic limitations for consensus and supertree functions on hierarchies. J Theor Biol 2016; 404:342-347. [PMID: 27320681 DOI: 10.1016/j.jtbi.2016.06.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/14/2016] [Revised: 06/01/2016] [Accepted: 06/13/2016] [Indexed: 10/21/2022]
Abstract
Consensus trees and supertrees are regularly used in systematic biology in order to obtain a summary for the common agreement of the evolutionary relationships among a collection of phylogenetic trees (hierarchies). When every tree is defined on the same set of taxa then consensus functions are used, while if the trees are defined on different sets then supertree functions are used. For both of these situations we will consider some of the limitations that might arise from the placing of singularly reasonable and apparently innocuous conditions on the functions. Previous work is reviewed together with new material. In particular, we consider the impact of axioms requiring that the removal or addition of a tree that contains no, or no new, branching information should not affect the outcome.
Affiliation(s)
- F R McMorris
- Department of Applied Mathematics, Illinois Institute of Technology, Chicago, IL 60616, United States
- Department of Mathematics, University of Louisville, Louisville, KY 40292, United States
- Robert C Powers
- Department of Mathematics, University of Louisville, Louisville, KY 40292, United States
8. Moon J, Lin HT, Eulenstein O. Consensus properties and their large-scale applications for the gene duplication problem. J Bioinform Comput Biol 2016; 14:1642005. [PMID: 27122201 DOI: 10.1142/s0219720016420051] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022]
Abstract
Solving the gene duplication problem is a classical approach for species tree inference from gene trees that are confounded by gene duplications. This problem takes a collection of gene trees and seeks a species tree that implies the minimum number of gene duplications. Wilkinson et al. posed the conjecture that the gene duplication problem satisfies the desirable Pareto property for clusters. That is, for every instance of the problem, all clusters that are commonly present in the input gene trees of this instance, called strict consensus, will also be found in every solution to this instance. We prove that this conjecture does not generally hold. Despite this negative result we show that the gene duplication problem satisfies a weaker version of the Pareto property where the strict consensus is found in at least one solution (rather than all solutions). This weaker property contributes to our design of an efficient scalable algorithm for the gene duplication problem. We demonstrate the performance of our algorithm in analyzing large-scale empirical datasets. Finally, we utilize the algorithm to evaluate the accuracy of standard heuristics for the gene duplication problem using simulated datasets.
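The strict consensus and the two versions of the Pareto property described here can be made concrete on small rooted trees. The sketch below is illustrative only: trees are assumed to be nested tuples over a shared taxon set, and `clusters`, `strict_consensus_clusters`, and `contains_strict_consensus` are hypothetical helper names, not functions from the authors' software.

```python
def clusters(tree):
    """Non-trivial clusters (clades) of a rooted tree of nested tuples."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        found.add(below)
        return below

    found.discard(visit(tree))  # the full taxon set is trivial
    return found


def strict_consensus_clusters(gene_trees):
    """Clusters present in every input gene tree."""
    return set.intersection(*(clusters(t) for t in gene_trees))


def contains_strict_consensus(candidate, gene_trees):
    """True if the candidate tree displays every strict consensus cluster."""
    return strict_consensus_clusters(gene_trees) <= clusters(candidate)


gene_trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
]
print(strict_consensus_clusters(gene_trees))
print(contains_strict_consensus((("A", "B"), ("C", "D")), gene_trees))
```

Under the Pareto property, `contains_strict_consensus` would hold for every optimal species tree. Under the weaker version proved in this paper, it is only guaranteed to hold for at least one optimal species tree.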
Affiliation(s)
- Jucheol Moon
- Department of Computer Science, Iowa State University, 226 Atanasoff Hall, Ames, Iowa 50010, USA
- Harris T Lin
- Department of Computer Science, Iowa State University, 226 Atanasoff Hall, Ames, Iowa 50010, USA
- Oliver Eulenstein
- Department of Computer Science, Iowa State University, 226 Atanasoff Hall, Ames, Iowa 50010, USA
9. Akanni WA, Wilkinson M, Creevey CJ, Foster PG, Pisani D. Implementing and testing Bayesian and maximum-likelihood supertree methods in phylogenetics. ROYAL SOCIETY OPEN SCIENCE 2015; 2:140436. [PMID: 26361544 PMCID: PMC4555849 DOI: 10.1098/rsos.140436] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 11/10/2014] [Accepted: 07/06/2015] [Indexed: 05/14/2023]
Abstract
Since their advent, supertrees have been increasingly used in large-scale evolutionary studies requiring a phylogenetic framework and substantial efforts have been devoted to developing a wide variety of supertree methods (SMs). Recent advances in supertree theory have allowed the implementation of maximum likelihood (ML) and Bayesian SMs, based on using an exponential distribution to model incongruence between input trees and the supertree. Such approaches are expected to have advantages over commonly used non-parametric SMs, e.g. matrix representation with parsimony (MRP). We investigated new implementations of ML and Bayesian SMs and compared these with some currently available alternative approaches. Comparisons include hypothetical examples previously used to investigate biases of SMs with respect to input tree shape and size, and empirical studies based either on trees harvested from the literature or on trees inferred from phylogenomic scale data. Our results provide no evidence of size or shape biases and demonstrate that the Bayesian method is a viable alternative to MRP and other non-parametric methods. Computation of input tree likelihoods allows the adoption of standard tests of tree topologies (e.g. the approximately unbiased test). The Bayesian approach is particularly useful in providing support values for supertree clades in the form of posterior probabilities.
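An exponential model of incongruence can be sketched as follows: the probability of an input tree is taken to decay exponentially with its distance from the supertree restricted to the input tree's taxa, so the log-likelihood is, up to normalizing constants, minus a rate parameter times the summed distances. The code below is a rough illustration under stated assumptions and is not the implementation evaluated in the paper. It assumes rooted trees written as nested tuples, uses a simple cluster symmetric-difference distance, ignores normalizing constants, and introduces a hypothetical rate parameter `beta`. Every input tree's taxa are assumed to occur in the supertree.

```python
def leaf_set(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def restrict(tree, taxa):
    """Prune the tree to the given taxa, suppressing unary nodes."""
    if isinstance(tree, str):
        return tree if tree in taxa else None
    kept = [r for r in (restrict(c, taxa) for c in tree) if r is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


def clusters(tree):
    """Non-trivial clusters of a rooted tree of nested tuples."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        found.add(below)
        return below

    found.discard(visit(tree))
    return found


def unnormalized_log_likelihood(supertree, input_trees, beta=1.0):
    """-beta * sum of distances between each input tree and the
    supertree restricted to that input tree's taxa."""
    total = 0
    for tree in input_trees:
        induced = restrict(supertree, leaf_set(tree))
        total += len(clusters(tree) ^ clusters(induced))
    return -beta * total


supertree = ((("A", "B"), "C"), "D")
inputs = [(("A", "B"), "C"), (("B", "D"), "A")]
print(unnormalized_log_likelihood(supertree, inputs))
```

Comparing this score across candidate supertrees with `beta` held fixed ranks them by total incongruence with the input trees. A full likelihood treatment would also need the normalizing constants omitted here.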
Affiliation(s)
- Wasiu A. Akanni
- Department of Biology, The National University of Ireland, Maynooth, Co. Kildare, Republic of Ireland
- Department of Life Science, The Natural History Museum, London SW7 5BD, UK
- Mark Wilkinson
- Department of Life Science, The Natural History Museum, London SW7 5BD, UK
- Christopher J. Creevey
- Institute of Biological, Environmental and Rural Sciences (IBERS), Aberystwyth University, Aberystwyth, Ceredigion SY23 3FG, UK
- Peter G. Foster
- Department of Life Science, The Natural History Museum, London SW7 5BD, UK
- Davide Pisani
- School of Biological Sciences and School of Earth Sciences, University of Bristol, Life Sciences Building, 24 Tyndall Avenue, Bristol BS8 1TG, UK
- Author for correspondence: Davide Pisani e-mail:
10. Górecki P, Eulenstein O. Gene Tree Diameter for Deep Coalescence. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015; 12:155-165. [PMID: 26357086 DOI: 10.1109/tcbb.2014.2351795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
The deep coalescence cost accounts for discord caused by deep coalescence between a gene tree and a species tree. It is a major concern that the diameter of a gene tree (the tree's maximum deep coalescence cost across all species trees) depends on its topology, which can largely obfuscate phylogenetic studies. While this bias can be compensated by normalizing the deep coalescence cost using diameters, obtaining them efficiently has been posed as an open problem by Than and Rosenberg. Here, we resolve this problem by describing a linear time algorithm to compute the diameter of a gene tree. In addition, we provide a complete classification of the species trees yielding this diameter to guide phylogenetic analyses.
11. Akanni WA, Creevey CJ, Wilkinson M, Pisani D. L.U.St: a tool for approximated maximum likelihood supertree reconstruction. BMC Bioinformatics 2014; 15:183. [PMID: 24925766 PMCID: PMC4073192 DOI: 10.1186/1471-2105-15-183] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 01/30/2014] [Accepted: 06/02/2014] [Indexed: 12/29/2022]
Abstract
Background Supertrees combine disparate, partially overlapping trees to generate a synthesis that provides a high level perspective that cannot be attained from the inspection of individual phylogenies. Supertrees can be seen as meta-analytical tools that can be used to make inferences based on results of previous scientific studies. Their meta-analytical application has increased in popularity since it was realised that the power of statistical tests for the study of evolutionary trends critically depends on the use of taxon-dense phylogenies. Further to that, supertrees have found applications in phylogenomics where they are used to combine gene trees and recover species phylogenies based on genome-scale data sets. Results Here, we present the L.U.St package, a python tool for approximate maximum likelihood supertree inference and illustrate its application using a genomic data set for the placental mammals. L.U.St allows the calculation of the approximate likelihood of a supertree, given a set of input trees, performs heuristic searches to look for the supertree of highest likelihood, and performs statistical tests of two or more supertrees. To this end, L.U.St implements a winning sites test allowing ranking of a collection of a-priori selected hypotheses, given as a collection of input supertree topologies. It also outputs a file of input-tree-wise likelihood scores that can be used as input to CONSEL for calculation of standard tests of two trees (e.g. Kishino-Hasegawa, Shimodaira-Hasegawa and Approximately Unbiased tests). Conclusion This is the first fully parametric implementation of a supertree method; it has clearly understood properties and provides several advantages over currently available supertree approaches. It is easy to implement and works on any platform that has python installed. Availability: bitBucket page - https://afro-juju@bitbucket.org/afro-juju/l.u.st.git. Contact: Davide.Pisani@bristol.ac.uk.
Affiliation(s)
- Davide Pisani
- Department of Biology, The National University of Ireland, Maynooth, Maynooth, Kildare, Ireland.
12. Lin HT, Burleigh JG, Eulenstein O. Consensus properties for the deep coalescence problem and their application for scalable tree search. BMC Bioinformatics 2012; 13 Suppl 10:S12. [PMID: 22759417 PMCID: PMC3382448 DOI: 10.1186/1471-2105-13-s10-s12] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022]
Abstract
Background To infer a species phylogeny from unlinked genes, phylogenetic inference methods must confront the biological processes that create incongruence between gene trees and the species phylogeny. Intra-specific gene variation in ancestral species can result in deep coalescence, also known as incomplete lineage sorting, which creates incongruence between gene trees and the species tree. One approach to account for deep coalescence in phylogenetic analyses is the deep coalescence problem, which takes a collection of gene trees and seeks the species tree that implies the fewest deep coalescence events. Although this approach is promising for phylogenetics, the consensus properties of this problem are mostly unknown and analyses of large data sets may be computationally prohibitive. Results We prove that the deep coalescence consensus tree problem satisfies the highly desirable Pareto property for clusters (clades). That is, in all instances, each cluster that is present in all of the input gene trees, called a consensus cluster, will also be found in every optimal solution. Moreover, we introduce a new divide and conquer method for the deep coalescence problem based on the Pareto property. This method refines the strict consensus of the input gene trees, thereby, in practice, often greatly reducing the complexity of the tree search and guaranteeing that the estimated species tree will satisfy the Pareto property. Conclusions Analyses of both simulated and empirical data sets demonstrate that the divide and conquer method can greatly improve upon the speed of heuristics that do not consider the Pareto consensus property, while also guaranteeing that the proposed solution fulfills the Pareto property. The divide and conquer method extends the utility of the deep coalescence problem to data sets with enormous numbers of taxa.
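The quantity being minimized can be illustrated with one common way of counting deep coalescence events as extra lineages: for each branch of the species tree, count the maximal gene tree clades that fit inside the branch's cluster, and charge one extra lineage for each clade beyond the first. The sketch below follows that counting rule under simplifying assumptions (rooted trees as nested tuples, one sampled gene copy per species). The names `extra_lineages` and `maximal_clades_within` are hypothetical and do not come from the paper.

```python
def leaf_set(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def all_clusters(tree):
    """Leaf sets of every node, including leaves and the root."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result = []
    for child in tree:
        result.extend(all_clusters(child))
    result.append(leaf_set(tree))
    return result


def maximal_clades_within(gene_tree, cluster):
    """Number of maximal gene tree clades whose leaves all lie in cluster."""
    if leaf_set(gene_tree) <= cluster:
        return 1
    if isinstance(gene_tree, str):
        return 0
    return sum(maximal_clades_within(c, cluster) for c in gene_tree)


def extra_lineages(gene_tree, species_tree):
    """Sum of extra lineages over all non-root species tree branches."""
    root = leaf_set(species_tree)
    total = 0
    for cluster in all_clusters(species_tree):
        if cluster == root:
            continue  # no branch above the root
        lineages = maximal_clades_within(gene_tree, cluster)
        total += max(lineages - 1, 0)
    return total


species = (("a", "b"), "c")
gene = (("a", "c"), "b")
print(extra_lineages(gene, species))
```

In the example, lineages `a` and `b` both reach the top of the branch above the `(a, b)` species node without having coalesced, giving one extra lineage. A gene tree identical to the species tree gives zero.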
Affiliation(s)
- Harris T Lin
- Department of Computer Science, Iowa State University, Ames, IA, USA
13. Williams D, Fournier GP, Lapierre P, Swithers KS, Green AG, Andam CP, Gogarten JP. A rooted net of life. Biol Direct 2011; 6:45. [PMID: 21936906 PMCID: PMC3189188 DOI: 10.1186/1745-6150-6-45] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 02/07/2011] [Accepted: 09/21/2011] [Indexed: 01/29/2023]
Abstract
Phylogenetic reconstruction using DNA and protein sequences has allowed the reconstruction of evolutionary histories encompassing all life. We present and discuss a means to incorporate much of this rich narrative into a single model that acknowledges the discrete evolutionary units that constitute the organism. Briefly, this Rooted Net of Life genome phylogeny is constructed around an initial, well resolved and rooted tree scaffold inferred from a supermatrix of combined ribosomal genes. Extant sampled ribosomes form the leaves of the tree scaffold. These leaves, but not necessarily the deeper parts of the scaffold, can be considered to represent a genome or pan-genome, and to be associated with members of other gene families within that sequenced (pan)genome. Unrooted phylogenies of gene families containing four or more members are reconstructed and superimposed over the scaffold. Initially, reticulations are formed where incongruities between topologies exist. Given sufficient evidence, edges may then be differentiated as those representing vertical lines of inheritance within lineages and those representing horizontal genetic transfers or endosymbioses between lineages. Reviewers W. Ford Doolittle, Eric Bapteste and Robert Beiko.
Affiliation(s)
- David Williams
- Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT 06269-3125, USA.
14. Kupczok A. Split-based computation of majority-rule supertrees. BMC Evol Biol 2011; 11:205. [PMID: 21752249 PMCID: PMC3169514 DOI: 10.1186/1471-2148-11-205] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 09/01/2010] [Accepted: 07/13/2011] [Indexed: 12/02/2022]
Abstract
Background Supertree methods combine overlapping input trees into a larger supertree. Here, I consider split-based supertree methods that first extract the split information of the input trees and subsequently combine this split information into a phylogeny. Well known split-based supertree methods are matrix representation with parsimony and matrix representation with compatibility. Combining input trees on the same taxon set, as in the consensus setting, is a well-studied task and it is thus desirable to generalize consensus methods to supertree methods. Results Here, three variants of majority-rule (MR) supertrees that generalize majority-rule consensus trees are investigated. I provide simple formulas for computing the respective score for bifurcating input- and supertrees. These score computations, together with a heuristic tree search minimizing the scores, were implemented in the python program PluMiST (Plus- and Minus SuperTrees) available from http://www.cibiv.at/software/plumist. The different MR methods were tested by simulation and on real data sets. The search heuristic was successful in combining compatible input trees. When combining incompatible input trees, especially one variant, MR(-) supertrees, performed well. Conclusions The presented framework allows for an efficient score computation of three majority-rule supertree variants and input trees. I combined the score computation with a heuristic search over the supertree space. The implementation was tested by simulation and on real data sets and showed promising results. Especially the MR(-) variant seems to be a reasonable score for supertree reconstruction. Generalizing these computations to multifurcating trees is an open problem, which may be tackled using this framework.
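The consensus special case that these supertree variants generalize is easy to state: when all input trees share one taxon set, the majority-rule consensus keeps every grouping that occurs in more than half of the trees. The sketch below shows only that special case, using rooted clusters on nested tuples for brevity rather than the unrooted splits used in the paper. `majority_rule_clusters` is a hypothetical helper and does not reproduce the PluMiST score computation.

```python
from collections import Counter


def clusters(tree):
    """Non-trivial clusters of a rooted tree of nested tuples."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        found.add(below)
        return below

    found.discard(visit(tree))  # the full taxon set is trivial
    return found


def majority_rule_clusters(trees):
    """Clusters occurring in strictly more than half of the input trees."""
    counts = Counter(c for tree in trees for c in clusters(tree))
    return {c for c, n in counts.items() if n > len(trees) / 2}


trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "C"), "B"), "D"),
]
print(majority_rule_clusters(trees))
```

Because any two clusters that each occur in more than half of the trees must co-occur in at least one tree, the retained clusters are always mutually compatible and define a single consensus tree.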
Affiliation(s)
- Anne Kupczok
- Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, University of Veterinary Medicine Vienna, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria.
|
15
|
Kupczok A. Consequences of different null models on the tree shape bias of supertree methods. Syst Biol 2011; 60:218-25. [PMID: 21252387 DOI: 10.1093/sysbio/syq086] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Affiliation(s)
- Anne Kupczok
- Center for Integrative Bioinformatics Vienna, Max F. Perutz Laboratories, University of Vienna, Medical University of Vienna, University of Veterinary Medicine Vienna, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria.
|
16
|
Buerki S, Forest F, Salamin N, Alvarez N. Comparative performance of supertree algorithms in large data sets using the soapberry family (Sapindaceae) as a case study. Syst Biol 2010; 60:32-44. [PMID: 21068445 DOI: 10.1093/sysbio/syq057] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022]
Abstract
For the last 2 decades, supertree reconstruction has been an active field of research and has seen the development of a large number of major algorithms. Because of the growing popularity of the supertree methods, it has become necessary to evaluate the performance of these algorithms to determine which are the best options (especially with regard to the supermatrix approach that is widely used). In this study, seven of the most commonly used supertree methods are investigated by using a large empirical data set (in terms of number of taxa and molecular markers) from the worldwide flowering plant family Sapindaceae. Supertree methods were evaluated using several criteria: similarity of the supertrees with the input trees, similarity between the supertrees and the total evidence tree, level of resolution of the supertree and computational time required by the algorithm. Additional analyses were also conducted on a reduced data set to test if the performance levels were affected by the heuristic searches rather than the algorithms themselves. Based on our results, two main groups of supertree methods were identified: on one hand, the matrix representation with parsimony (MRP), MinFlip, and MinCut methods performed well according to our criteria, whereas the average consensus, split fit, and most similar supertree methods showed a poorer performance or at least did not behave the same way as the total evidence tree. Results for the super distance matrix, that is, the most recent approach tested here, were promising with at least one derived method performing as well as MRP, MinFlip, and MinCut. The output of each method was only slightly improved when applied to the reduced data set, suggesting a correct behavior of the heuristic searches and a relatively low sensitivity of the algorithms to data set sizes and missing data. 
Results also showed that the MRP analyses could reach a high level of quality even when using a simple heuristic search strategy, with the exception of MRP with Purvis coding scheme and reversible parsimony. The future of supertrees lies in the implementation of a standardized heuristic search for all methods and the increase in computing power to handle large data sets. The latter would prove to be particularly useful for promising approaches such as the maximum quartet fit method that yet requires substantial computing power.
Affiliation(s)
- Sven Buerki
- Real Jardin Botanico, Department of Biodiversity and Conservation, CSIC, Plaza de Murillo 2, 28014 Madrid, Spain.
|
17
|
Abstract
MOTIVATION Phylogenetic tree-building methods use molecular data to represent the evolutionary history of genes and taxa. A recurrent problem is to reconcile the various phylogenies built from different genomic sequences into a single one. This task is generally conducted by a two-step approach whereby a binary representation of the initial trees is first inferred and then a maximum parsimony (MP) analysis is performed on it. This binary representation uses a decomposition of all source trees that is usually based on clades, but that can also be based on triplets or quartets. The relative performances of these representations have been discussed but are difficult to assess since both are limited to relatively small datasets. RESULTS This article focuses on the triplet-based representation of source trees. We first recall how, using this representation, the parsimony analysis is related to the median tree notion. We then introduce SuperTriplets, a new algorithm that is specially designed to optimize this alternative formulation of the MP criterion. The method avoids several practical limitations of the triplet-based binary matrix representation, making it useful for dealing with large datasets. When the correct resolution of every triplet appears more often than the incorrect ones in source trees, SuperTriplets is guaranteed to reconstruct the correct phylogeny. Both simulations and a case study on mammalian phylogenomics confirm the advantages of this approach. In both cases, SuperTriplets tends to propose less resolved but more reliable supertrees than those inferred using M(atrix) Representation with Parsimony. AVAILABILITY Online and JAVA standalone versions of SuperTriplets are available at http://www.supertriplets.univ-montp2.fr/.
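The triplet decomposition mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration of how rooted triplets are read off source trees and tallied; it is not the SuperTriplets algorithm, and the nested-tuple encoding and the names `triplets`, `triplet_votes` and `majority_triplets` are assumptions introduced here.

```python
from collections import Counter
from itertools import combinations


def leaves(tree):
    """Return the leaf labels below a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Yield the leaf set of every internal node."""
    if isinstance(tree, str):
        return
    yield leaves(tree)
    for child in tree:
        yield from clusters(child)


def triplets(tree):
    """Return resolved triplets as (frozenset({a, b}), c), read as ab|c."""
    found = set()
    all_leaves = leaves(tree)
    for cluster in clusters(tree):
        outside = all_leaves - cluster
        # a and b share a clade that excludes c, so the tree displays ab|c
        for a, b in combinations(sorted(cluster), 2):
            for c in outside:
                found.add((frozenset((a, b)), c))
    return found


def triplet_votes(source_trees):
    """Count how many source trees display each resolved triplet."""
    votes = Counter()
    for tree in source_trees:
        votes.update(triplets(tree))
    return votes


def majority_triplets(source_trees):
    """Keep the resolutions that strictly outnumber both rival resolutions."""
    votes = triplet_votes(source_trees)
    winners = set()
    for (pair, out), count in votes.items():
        three = pair | {out}
        rivals = [votes[(frozenset(three - {x}), x)] for x in three if x != out]
        if all(count > rival for rival in rivals):
            winners.add((pair, out))
    return winners


# Three hypothetical source trees; two of them support AB|C.
sources = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
print(majority_triplets(sources))
```

The majority filter mirrors the condition stated in the abstract, under which the correct resolution of a triplet appears more often than the incorrect ones.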
Affiliation(s)
- Vincent Ranwez
- Université Montpellier 2, CC064, Place Eugène Bataillon, 34 095 Montpellier Cedex 05, France.
|
18
|
Campbell V, Lapointe FJ. An application of supertree methods to Mammalian mitogenomic sequences. Evol Bioinform Online 2010; 6:57-71. [PMID: 20535231 PMCID: PMC2880846 DOI: 10.4137/ebo.s4527] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/05/2022]
Abstract
Two different approaches can be used in phylogenomics: combined or separate analysis. In the first approach, different datasets are combined in a concatenated supermatrix. In the second, datasets are analyzed separately and the phylogenetic trees are then combined in a supertree. The supertree method is an interesting alternative to avoid missing data, since datasets that are analyzed separately do not need to represent identical taxa. However, the supertree approach and the corresponding consensus methods have been highly criticized for not providing valid phylogenetic hypotheses. In this study, the congruence of trees estimated by consensus and supertree approaches was compared to model trees obtained from a combined analysis of complete mitochondrial sequences of 102 species representing 93 mammal families. The consensus methods produced poorly resolved consensus trees and did not perform well, except for the majority rule consensus with compatible groupings. The weighted supertree and matrix representation with parsimony methods performed equally well and were highly congruent with the model trees. The most similar supertree method was the least congruent with the model trees. We conclude that some of the methods tested are worth considering in a phylogenomic context.
Affiliation(s)
- Véronique Campbell
- Université de Montréal, Département de Sciences Biologiques, C.P. 6128, Succ. Centre-ville, Montréal, Québec, H3C 3J7, Canada
- François-Joseph Lapointe
- Université de Montréal, Département de Sciences Biologiques, C.P. 6128, Succ. Centre-ville, Montréal, Québec, H3C 3J7, Canada
|
19
|
Bansal MS, Burleigh JG, Eulenstein O, Fernández-Baca D. Robinson-Foulds supertrees. Algorithms Mol Biol 2010; 5:18. [PMID: 20181274 PMCID: PMC2846952 DOI: 10.1186/1748-7188-5-18] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.7] [Received: 06/27/2009] [Accepted: 02/24/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND Supertree methods synthesize collections of small phylogenetic trees with incomplete taxon overlap into comprehensive trees, or supertrees, that include all taxa found in the input trees. Supertree methods based on the well established Robinson-Foulds (RF) distance have the potential to build supertrees that retain much information from the input trees. Specifically, the RF supertree problem seeks a binary supertree that minimizes the sum of the RF distances from the supertree to the input trees. Thus, an RF supertree is a supertree that is consistent with the largest number of clusters (or clades) from the input trees. RESULTS We introduce efficient, local search based, hill-climbing heuristics for the intrinsically hard RF supertree problem on rooted trees. These heuristics use novel non-trivial algorithms for the SPR and TBR local search problems which improve on the time complexity of the best known (naïve) solutions by a factor of Theta(n) and Theta(n^2) respectively (where n is the number of taxa, or leaves, in the supertree). We use an implementation of our new algorithms to examine the performance of the RF supertree method and compare it to matrix representation with parsimony (MRP) and the triplet supertree method using four supertree data sets. Not only did our RF heuristic provide fast estimates of RF supertrees in all data sets, but the RF supertrees also retained more of the information from the input trees (based on the RF distance) than the other supertree methods. CONCLUSIONS Our heuristics for the RF supertree problem, based on our new local search algorithms, make it possible for the first time to estimate large supertrees by directly optimizing the RF distance from rooted input trees to the supertrees. This provides a new and fast method to build accurate supertrees. RF supertrees may also be useful for estimating majority-rule(-) supertrees, which are a generalization of majority-rule consensus trees.
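The objective described in this abstract, the sum of RF distances from a candidate supertree to the input trees, can be evaluated with a short Python sketch. It covers only the scoring of one candidate, assuming that the supertree is pruned to each input tree's taxa before comparison; it does not reproduce the authors' SPR or TBR search heuristics, and the nested-tuple encoding and the names `restrict`, `rf_distance` and `rf_score` are hypothetical.

```python
def leaves(tree):
    """Return the leaf labels below a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Return the set of leaf sets of all internal nodes."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clusters(child)
    return found


def restrict(tree, keep):
    """Prune the tree to the leaves in `keep`, suppressing unary nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kept = [sub for sub in (restrict(child, keep) for child in tree)
            if sub is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    return tuple(kept)


def rf_distance(tree_a, tree_b):
    """Rooted RF distance: clusters found in exactly one of two trees."""
    return len(clusters(tree_a) ^ clusters(tree_b))


def rf_score(supertree, input_trees):
    """Sum of RF distances between each input tree and the pruned supertree."""
    return sum(rf_distance(restrict(supertree, leaves(t)), t)
               for t in input_trees)


# Hypothetical candidate supertree and two input trees.
candidate = ((("A", "B"), "C"), ("D", "E"))
inputs = [(("A", "B"), "C"), (("A", "D"), "E")]
print(rf_score(candidate, inputs))
```

A search heuristic would call a function like `rf_score` on neighbouring candidates and keep the one with the lowest total.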
Affiliation(s)
- Mukul S Bansal
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- J Gordon Burleigh
- Department of Biology, University of Florida, Gainesville, FL 32611, USA
- Oliver Eulenstein
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
|
20
|
Dong J, Fernández-Baca D, McMorris FR. Constructing majority-rule supertrees. Algorithms Mol Biol 2010; 5:2. [PMID: 20047658 PMCID: PMC2826330 DOI: 10.1186/1748-7188-5-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 08/11/2009] [Accepted: 01/04/2010] [Indexed: 11/30/2022]
Abstract
BACKGROUND Supertree methods combine the phylogenetic information from multiple partially-overlapping trees into a larger phylogenetic tree called a supertree. Several supertree construction methods have been proposed to date, but most of these are not designed with any specific properties in mind. Recently, Cotton and Wilkinson proposed extensions of the majority-rule consensus tree method to the supertree setting that inherit many of the appealing properties of the former. RESULTS We study a variant of one of Cotton and Wilkinson's methods, called majority-rule (+) supertrees. After proving that a key underlying problem for constructing majority-rule (+) supertrees is NP-hard, we develop a polynomial-size exact integer linear programming formulation of the problem. We then present a data reduction heuristic that identifies smaller subproblems that can be solved independently. While this technique is not guaranteed to produce optimal solutions, it can achieve substantial problem-size reduction. Finally, we report on a computational study of our approach on various real data sets, including the 121-taxon, 7-tree Seabirds data set of Kennedy and Page. CONCLUSIONS The results indicate that our exact method is computationally feasible for moderately large inputs. For larger inputs, our data reduction heuristic makes it feasible to tackle problems that are well beyond the range of the basic integer programming approach. Comparisons between the results obtained by our heuristic and exact solutions indicate that the heuristic produces good answers. Our results also suggest that the majority-rule (+) approach, in both its basic form and with data reduction, yields biologically meaningful phylogenies.
Affiliation(s)
- Jianrong Dong
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- FR McMorris
- Department of Applied Mathematics, Illinois Institute of Technology, Chicago, IL 60616, USA
|
21
|
Gaubert P, Denys G, Oberdorff T. Genus-level supertree of Cyprinidae (Actinopterygii: Cypriniformes), partitioned qualitative clade support and test of macro-evolutionary scenarios. Biol Rev Camb Philos Soc 2009; 84:653-89. [DOI: 10.1111/j.1469-185x.2009.00091.x] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/28/2022]
|
22
|
Bapteste E, O'Malley MA, Beiko RG, Ereshefsky M, Gogarten JP, Franklin-Hall L, Lapointe FJ, Dupré J, Dagan T, Boucher Y, Martin W. Prokaryotic evolution and the tree of life are two different things. Biol Direct 2009; 4:34. [PMID: 19788731 PMCID: PMC2761302 DOI: 10.1186/1745-6150-4-34] [Citation(s) in RCA: 128] [Impact Index Per Article: 8.5] [Received: 07/16/2009] [Accepted: 09/29/2009] [Indexed: 01/04/2023]
Abstract
BACKGROUND The concept of a tree of life is prevalent in the evolutionary literature. It stems from attempting to obtain a grand unified natural system that reflects a recurrent process of species and lineage splittings for all forms of life. Traditionally, the discipline of systematics operates in a similar hierarchy of bifurcating (sometimes multifurcating) categories. The assumption of a universal tree of life hinges upon the process of evolution being tree-like throughout all forms of life and all of biological time. In multicellular eukaryotes, the molecular mechanisms and species-level population genetics of variation do indeed mainly cause a tree-like structure over time. In prokaryotes, they do not. Prokaryotic evolution and the tree of life are two different things, and we need to treat them as such, rather than extrapolating from macroscopic life to prokaryotes. In the following we will consider this circumstance from philosophical, scientific, and epistemological perspectives, surmising that phylogeny opted for a single model as a holdover from the Modern Synthesis of evolution. RESULTS It was far easier to envision and defend the concept of a universal tree of life before we had data from genomes. But the belief that prokaryotes are related by such a tree has now become stronger than the data to support it. The monistic concept of a single universal tree of life appears, in the face of genome data, increasingly obsolete. This traditional model to describe evolution is no longer the most scientifically productive position to hold, because of the plurality of evolutionary patterns and mechanisms involved. Forcing a single bifurcating scheme onto prokaryotic evolution disregards the non-tree-like nature of natural variation among prokaryotes and accounts for only a minority of observations from genomes. CONCLUSION Prokaryotic evolution and the tree of life are two different things. 
Hence we will briefly set out alternative models to the tree of life to study their evolution. Ultimately, the plurality of evolutionary patterns and mechanisms involved, such as the discontinuity of the process of evolution across the prokaryote-eukaryote divide, summons forth a pluralistic approach to studying evolution. REVIEWERS This article was reviewed by Ford Doolittle, John Logsdon and Nicolas Galtier.
|
23
|
|
24
|
Simon S, Strauss S, von Haeseler A, Hadrys H. A phylogenomic approach to resolve the basal pterygote divergence. Mol Biol Evol 2009; 26:2719-30. [PMID: 19713325 DOI: 10.1093/molbev/msp191] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Indexed: 12/29/2022]
Abstract
One of the most fascinating Bauplan transitions in the animal kingdom was the invention of insect wings, a change that also contributed to the success and enormous diversity of this animal group. However, the origin of insect flight and the relationships of basal winged insect orders are still controversial. Three hypotheses have been proposed to explain the phylogeny of winged insects: 1) the traditional Palaeoptera hypothesis (Ephemeroptera + Odonata, Neoptera), 2) the Metapterygota hypothesis (Ephemeroptera, Odonata + Neoptera), and 3) the Chiastomyaria hypothesis (Odonata, Ephemeroptera + Neoptera). Neither phylogenetic analyses of single genes nor even multiple marker systems (e.g., molecular markers + morphological characters) have yet been able to conclusively resolve basal pterygote divergences. A possible explanation for the lack of resolution is that the divergences took place in the mid-Devonian within a short period of time and attempts to solve this problem have been confounded by the major challenge of finding molecular markers to accurately track these short ancient internodes. Although phylogenomic data are available for Neoptera and some wingless (apterygote) orders, they are lacking for the crucial Odonata and Ephemeroptera orders. We adopt a multigene approach including data from two new expressed sequence tag projects-from the orders Ephemeroptera (Baetis sp.) and Odonata (Ischnura elegans)-to evaluate the potential of phylogenomic analyses in clarifying this unresolved issue. We analyzed two data sets that differed in represented taxa, genes, and overall sequence lengths: maxspe (15 taxa, 125 genes, and 31,643 amino acid positions) and maxgen (8 taxa, 150 genes, and 42,541 amino acid positions). Maximum likelihood and Bayesian inference analyses both place the Odonata at the base of the winged insects. Furthermore, statistical hypotheses testing rejected both the Palaeoptera and the Metapterygota hypotheses. 
The comprehensive molecular data set developed here provides conclusive support for odonates as the most basal winged insect order (Chiastomyaria hypothesis). Data quality assessment indicates that proteins involved in cellular processes and signaling harbor the most informative phylogenetic signal.
Affiliation(s)
- Sabrina Simon
- Institute of Ecology & Evolution, Stiftung Tieraerztliche Hochschule Hannover, Hannover, Germany.
|
25
|
Baker WJ, Savolainen V, Asmussen-Lange CB, Chase MW, Dransfield J, Forest F, Harley MM, Uhl NW, Wilkinson M. Complete Generic-Level Phylogenetic Analyses of Palms (Arecaceae) with Comparisons of Supertree and Supermatrix Approaches. Syst Biol 2009; 58:240-56. [PMID: 20525581 DOI: 10.1093/sysbio/syp021] [Citation(s) in RCA: 152] [Impact Index Per Article: 10.1] [Indexed: 11/13/2022]
Affiliation(s)
- Vincent Savolainen
- Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AB, UK
- Imperial College London, Silwood Park Campus, Buckhurst Road, Ascot, Berkshire SL5 7PY, UK
- Conny B. Asmussen-Lange
- Department of Ecology, University of Copenhagen, Rolighedsvej 21, DK-1958 Frederiksberg C, Denmark
- Mark W. Chase
- Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AB, UK
- Félix Forest
- Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AB, UK
- Natalie W. Uhl
- Department of Plant Biology, Cornell University, 412 Mann Library Building, Ithaca, NY 14853, USA
- Mark Wilkinson
- Department of Zoology, Natural History Museum, Cromwell Road, London SW7 5BD, UK
|
26
|
Supertrees join the mainstream of phylogenetics. Trends Ecol Evol 2009; 24:1-3. [DOI: 10.1016/j.tree.2008.08.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 06/12/2008] [Revised: 08/12/2008] [Accepted: 08/26/2008] [Indexed: 11/20/2022]
|
27
|
Explosions and hot spots in supertree methods. J Theor Biol 2008; 253:345-8. [PMID: 18472112 DOI: 10.1016/j.jtbi.2008.03.024] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 11/15/2007] [Revised: 03/18/2008] [Accepted: 03/25/2008] [Indexed: 11/23/2022]
Abstract
In phylogenetic systematics a problem of great practical and theoretical interest is to construct one or more large phylogenies (evolutionary trees), i.e., supertrees, from a given set of small phylogenies with overlapping sets of leaf labels. Although the methods being used to solve this problem are usually given plausible biological or theoretical justifications, occasionally it is possible to see that the result of a supertree method (SM) is explosive, and therefore logically meaningless, in the sense that it has been inferred from logical propositions that are contradictory. This paper presents the basic ideas and issues of how explosions affect the inference of rooted trees by SMs. We define the relevant concepts, give examples, and show how sometimes it is possible to identify hot spots in the input from which an SM may make explosive inferences that cannot be logically justified.
|
28
|
Abstract
Most supertree methods proposed to date are essentially ad hoc, rather than designed with particular properties in mind. Although the supertree problem remains difficult, one promising avenue is to develop from better understood consensus methods to the more general supertree setting. Here, we generalize the widely used majority-rule consensus method to the supertree setting. The majority-rule consensus tree is the strict consensus of the median trees under the symmetric-difference metric, so we can generalize the consensus method by generalizing this metric to trees with differing leaf sets. There are two different natural generalizations, based on pruning or grafting leaves to produce comparable trees, and these two generalizations produce two different, but related, majority-rule supertree methods.
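The starting point of this generalization, the ordinary majority-rule consensus for trees on the same leaf set, can be sketched as follows. This minimal Python example only collects the clusters found in more than half of the input trees; it does not implement either of the supertree generalizations described in the abstract, and the nested-tuple encoding and the name `majority_clusters` are assumptions.

```python
from collections import Counter


def leaves(tree):
    """Return the leaf labels below a nested-tuple tree (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Return the set of leaf sets of all internal nodes."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clusters(child)
    return found


def majority_clusters(trees):
    """Return the clusters present in strictly more than half of the trees."""
    counts = Counter()
    for tree in trees:
        counts.update(clusters(tree))
    return {cluster for cluster, n in counts.items() if n > len(trees) / 2}


# Three hypothetical rooted trees on the same four taxa.
profile = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "C"), "B"), "D"),
]
for cluster in sorted(majority_clusters(profile), key=len):
    print(sorted(cluster))
```

Because any two clusters that each occur in more than half of the trees must co-occur in at least one tree, the retained clusters are mutually compatible and define a single consensus tree, here (((A,B),C),D).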
Affiliation(s)
- James A Cotton
- Department of Zoology, The Natural History Museum, London SW7 5BD, UK.
|
29
|
Pisani D, Cotton JA, McInerney JO. Supertrees Disentangle the Chimerical Origin of Eukaryotic Genomes. Mol Biol Evol 2007; 24:1752-60. [PMID: 17504772 DOI: 10.1093/molbev/msm095] [Citation(s) in RCA: 146] [Impact Index Per Article: 8.6] [Indexed: 11/15/2022]
Abstract
Eukaryotes are traditionally considered to be one of the three natural divisions of the tree of life and the sister group of the Archaebacteria. However, eukaryotic genomes are replete with genes of eubacterial ancestry, and more than 20 mutually incompatible hypotheses have been proposed to account for eukaryote origins. Here we test the predictions of these hypotheses using a novel supertree-based phylogenetic signal-stripping method, and recover supertrees of life based on phylogenies for up to 5,741 single gene families distributed across 185 genomes. Using our signal-stripping method, we show that there are three distinct phylogenetic signals in eukaryotic genomes. In order of strength, these link eukaryotes with the Cyanobacteria, the Proteobacteria, and the Thermoplasmatales, an archaebacterial (euryarchaeotes) group. These signals correspond to distinct symbiotic partners involved in eukaryote evolution: plastids, mitochondria, and the elusive host lineage. According to our whole-genome data, eukaryotes are hardly the sister group of the Archaebacteria, because up to 83% of eukaryotic genes with a prokaryotic homolog have eubacterial, not archaebacterial, origins. The results reject all but two of the current hypotheses for the origin of eukaryotes: those assuming a sulfur-dependent or hydrogen-dependent syntrophy for the origin of mitochondria.
Affiliation(s)
- Davide Pisani
- Department of Biology, The National University of Ireland, Maynooth, Maynooth, County Kildare, Ireland, UK
|