1
|
Murakami Y, van Iersel L, Janssen R, Jones M, Moulton V. Reconstructing Tree-Child Networks from Reticulate-Edge-Deleted Subnetworks. Bull Math Biol 2019; 81:3823-3863. [PMID: 31297691 PMCID: PMC6764941 DOI: 10.1007/s11538-019-00641-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/05/2019] [Accepted: 07/03/2019] [Indexed: 01/16/2023]
Abstract
Network reconstruction lies at the heart of phylogenetic research. Two well-studied classes of phylogenetic networks include tree-child networks and level-k networks. In a tree-child network, every non-leaf node has a child that is a tree node or a leaf. In a level-k network, the maximum number of reticulations contained in a biconnected component is k. Here, we show that level-k tree-child networks are encoded by their reticulate-edge-deleted subnetworks, which are subnetworks obtained by deleting a single reticulation edge, if [Formula: see text]. Following this, we provide a polynomial-time algorithm for uniquely reconstructing such networks from their reticulate-edge-deleted subnetworks. Moreover, we show that this can even be done when considering subnetworks obtained by deleting one reticulation edge from each biconnected component with k reticulations.
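As a minimal sketch of the tree-child condition stated in this abstract, the Python function below checks it on a network given as a list of directed edges. The name `is_tree_child`, the edge-list representation, and the two example networks are illustrative assumptions, not taken from the paper; a reticulation is taken here to be any node with in-degree of at least two.

```python
from collections import defaultdict


def is_tree_child(edges):
    """Return True if every non-leaf node has a child that is not a reticulation.

    `edges` is an iterable of (parent, child) pairs describing a rooted
    directed acyclic network (hypothetical input format).
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    for node in nodes:
        kids = children[node]
        # A leaf has no children and is skipped; a non-leaf node fails the
        # condition when all of its children are reticulations.
        if kids and all(indegree[kid] >= 2 for kid in kids):
            return False
    return True


# Hypothetical example: one reticulation h whose parents each keep a leaf child.
tree_child_edges = [
    ("r", "a"), ("r", "b"),
    ("a", "h"), ("a", "x"),
    ("b", "h"), ("b", "y"),
    ("h", "z"),
]

# Hypothetical counterexample: both children of a (and of b) are reticulations.
not_tree_child_edges = [
    ("r", "a"), ("r", "b"),
    ("a", "h1"), ("a", "h2"),
    ("b", "h1"), ("b", "h2"),
    ("h1", "x"), ("h2", "y"),
]

print(is_tree_child(tree_child_edges))
print(is_tree_child(not_tree_child_edges))
```

The check only tests the local tree-child condition; it does not verify that the input is acyclic or compute the level of the network.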
Affiliation(s)
- Yukihiro Murakami
- Delft Institute of Applied Mathematics, Delft University of Technology, Van Mourik Broekmanweg 6, 2628 XE Delft, The Netherlands
- Leo van Iersel
- Delft Institute of Applied Mathematics, Delft University of Technology, Van Mourik Broekmanweg 6, 2628 XE Delft, The Netherlands
- Remie Janssen
- Delft Institute of Applied Mathematics, Delft University of Technology, Van Mourik Broekmanweg 6, 2628 XE Delft, The Netherlands
- Mark Jones
- Delft Institute of Applied Mathematics, Delft University of Technology, Van Mourik Broekmanweg 6, 2628 XE Delft, The Netherlands
- Vincent Moulton
- School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ UK
|
2
|
Cunningham CW, Zhu H, Hillis DM. Best-Fit Maximum-Likelihood Models for Phylogenetic Inference: Empirical Tests with Known Phylogenies. Evolution 2017; 52:978-987. [DOI: 10.1111/j.1558-5646.1998.tb01827.x] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 09/22/1997] [Accepted: 04/16/1998] [Indexed: 12/01/2022]
Affiliation(s)
- H. Zhu
- Zoology Department, Duke University, Durham, North Carolina 27708
- D. M. Hillis
- Department of Zoology and Institute of Cellular and Molecular Biology, University of Texas, Austin, Texas 78712
|
3
|
|
4
|
Sumner JG, Jarvis PD, Holland BR. A tensorial approach to the inversion of group-based phylogenetic models. BMC Evol Biol 2014; 14:236. [PMID: 25472897 PMCID: PMC4268818 DOI: 10.1186/s12862-014-0236-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/15/2014] [Accepted: 11/06/2014] [Indexed: 11/16/2022] Open
Abstract
Background: Hadamard conjugation is part of the standard mathematical armoury in the analysis of molecular phylogenetic methods. For group-based models, the approach provides a one-to-one correspondence between the so-called “edge length” and “sequence” spectrum on a phylogenetic tree. The Hadamard conjugation has been used in diverse phylogenetic applications not only for inference but also as an important conceptual tool for thinking about molecular data leading to generalizations beyond strictly tree-like evolutionary modelling. Results: For general group-based models of phylogenetic branching processes, we reformulate the problem of constructing a one-to-one correspondence between pattern probabilities and edge parameters. This takes a classic result previously shown through use of Fourier analysis and presents it in the language of tensors and group representation theory. This derivation makes it clear why the inversion is possible, because, under their usual definition, group-based models are defined for abelian groups only. Conclusion: We provide an inversion of group-based phylogenetic models that can be implemented using matrix multiplication between rectangular matrices indexed by ordered-partitions of varying sizes. Our approach provides additional context for the construction of phylogenetic probability distributions on network structures, and highlights the potential limitations of restricting to group-based models in this setting.
Affiliation(s)
- Jeremy G Sumner
- School of Physical Sciences, University of Tasmania, Hobart TAS 7001, Australia.
|
5
|
|
6
|
Ramadugu C, Pfeil BE, Keremane ML, Lee RF, Maureira-Butler IJ, Roose ML. A six nuclear gene phylogeny of Citrus (Rutaceae) taking into account hybridization and lineage sorting. PLoS One 2013; 8:e68410. [PMID: 23874615 PMCID: PMC3713030 DOI: 10.1371/journal.pone.0068410] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 11/03/2012] [Accepted: 05/29/2013] [Indexed: 12/14/2022] Open
Abstract
Background Genus Citrus (Rutaceae) comprises many important cultivated species that generally hybridize easily. Phylogenetic study of a group showing extensive hybridization is challenging. Since the genus Citrus has diverged recently (4–12 Ma), incomplete lineage sorting of ancestral polymorphisms is also likely to cause discrepancies among genes in phylogenetic inferences. Incongruence of gene trees is observed and it is essential to unravel the processes that cause inconsistencies in order to understand the phylogenetic relationships among the species. Methodology and Principal Findings (1) We generated phylogenetic trees using haplotype sequences of six low copy nuclear genes. (2) Published simple sequence repeat data were re-analyzed to study population structure and the results were compared with the phylogenetic trees constructed using sequence data and coalescence simulations. (3) To distinguish between hybridization and incomplete lineage sorting, we developed and utilized a coalescence simulation approach. In other studies, species trees have been inferred despite the possibility of hybridization having occurred and used to generate null distributions of the effect of lineage sorting alone (by coalescent simulation). Since this is problematic, we instead generate these distributions directly from observed gene trees. Of the six trees generated, we used the most resolved three to detect hybrids. We found that 11 of 33 samples appear to be affected by historical hybridization. Analysis of the remaining three genes supported the conclusions from the hybrid detection test. Conclusions We have identified or confirmed probable hybrid origins for several Citrus cultivars using three different approaches–gene phylogenies, population structure analysis and coalescence simulation. Hybridization and incomplete lineage sorting were identified primarily based on differences among gene phylogenies with reference to null expectations via coalescence simulations. 
We conclude that identifying hybridization as a frequent cause of incongruence among gene trees is critical to correctly infer the phylogeny among species of Citrus.
Affiliation(s)
- Chandrika Ramadugu
- Department of Botany and Plant Sciences, University of California Riverside, Riverside, California, United States of America
- Bernard E. Pfeil
- Commonwealth Scientific and Industrial Research Organisation Plant Industry, Canberra, Australian Capital Territory, Australia
- DBES, Gothenburg University, Gothenburg, Sweden
- Manjunath L. Keremane
- United States Department of Agriculture–Agricultural Research Service National Clonal Germplasm Repository for Citrus and Dates, Riverside, California, United States of America
- Richard F. Lee
- United States Department of Agriculture–Agricultural Research Service National Clonal Germplasm Repository for Citrus and Dates, Riverside, California, United States of America
- Ivan J. Maureira-Butler
- Agriaquaculture Nutritional Genomic Center, Centro de Genómica Nutricional Agroacuícola, Genomics and Bioinformatics Unit, Temuco, Chile
- Mikeal L. Roose
- Department of Botany and Plant Sciences, University of California Riverside, Riverside, California, United States of America
|
7
|
Klaere S, Liebscher V. An algebraic analysis of the two state Markov model on tripod trees. Math Biosci 2012; 237:38-48. [PMID: 22430560 DOI: 10.1016/j.mbs.2012.03.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/18/2010] [Revised: 02/22/2012] [Accepted: 03/02/2012] [Indexed: 11/15/2022]
Abstract
Methods of phylogenetic inference use more and more complex models to generate trees from data. However, even simple models and their implications are not fully understood. Here, we investigate the two-state Markov model on a tripod tree, inferring conditions under which a given set of observations gives rise to such a model. This type of investigation has been undertaken before by several scientists from different fields of research. In contrast to other work we fully analyse the model, presenting conditions under which one can infer a model from the observation or at least get support for the tree-shaped interdependence of the leaves considered. We also present all conditions under which the results can be extended from tripod trees to quartet trees, a step necessary to reconstruct at least a topology. Apart from finding conditions under which such an extension works we discuss example cases for which such an extension does not work.
Affiliation(s)
- Steffen Klaere
- Department of Statistics and School of Biological Sciences, The University of Auckland, Auckland, New Zealand.
|
8
|
The Algebra of the General Markov Model on Phylogenetic Trees and Networks. Bull Math Biol 2011; 74:858-80. [DOI: 10.1007/s11538-011-9691-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 12/29/2010] [Accepted: 08/09/2011] [Indexed: 10/17/2022]
|
9
|
Boc A, Makarenkov V. Towards an accurate identification of mosaic genes and partial horizontal gene transfers. Nucleic Acids Res 2011; 39:e144. [PMID: 21917854 PMCID: PMC3241670 DOI: 10.1093/nar/gkr735] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022] Open
Abstract
Many bacteria and viruses adapt to varying environmental conditions through the acquisition of mosaic genes. A mosaic gene is composed of alternating sequence polymorphisms either belonging to the host original allele or derived from the integrated donor DNA. Often, the integrated sequence contains a selectable genetic marker (e.g. marker allowing for antibiotic resistance). An effective identification of mosaic genes and detection of corresponding partial horizontal gene transfers (HGTs) are among the most important challenges posed by evolutionary biology. We developed a method for detecting partial HGT events and related intragenic recombination giving rise to the formation of mosaic genes. A bootstrap procedure incorporated in our method is used to assess the support of each predicted partial gene transfer. The proposed method can be also applied to confirm or discard complete (i.e. traditional) horizontal gene transfers detected by any HGT inferring method. While working on a full-genome scale, the new method can be used to assess the level of mosaicism in the considered genomes as well as the rates of complete and partial HGT underlying their evolution.
Affiliation(s)
- Alix Boc
- Département d'Informatique, Université du Québec à Montréal, CP 8888, Succursale Centre Ville, Montreal, QC, Canada H3C 3P8
|
10
|
Boc A, Philippe H, Makarenkov V. Inferring and validating horizontal gene transfer events using bipartition dissimilarity. Syst Biol 2010; 59:195-211. [PMID: 20525630 DOI: 10.1093/sysbio/syp103] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.4] [Indexed: 11/13/2022] Open
Abstract
Horizontal gene transfer (HGT) is one of the main mechanisms driving the evolution of microorganisms. Its accurate identification is one of the major challenges posed by reticulate evolution. In this article, we describe a new polynomial-time algorithm for inferring HGT events and compare 3 existing and 1 new tree comparison indices in the context of HGT identification. The proposed algorithm can rely on different optimization criteria, including least squares (LS), Robinson and Foulds (RF) distance, quartet distance (QD), and bipartition dissimilarity (BD), when searching for an optimal scenario of subtree prune and regraft (SPR) moves needed to transform the given species tree into the given gene tree. As the simulation results suggest, the algorithmic strategy based on BD, introduced in this article, generally provides better results than those based on LS, RF, and QD. The BD-based algorithm also proved to be more accurate and faster than a well-known polynomial time heuristic RIATA-HGT. Moreover, the HGT recovery results yielded by BD were generally equivalent to those provided by the exponential-time algorithm LatTrans, but a clear gain in running time was obtained using the new algorithm. Finally, a statistical framework for assessing the reliability of obtained HGTs by bootstrap analysis is also presented.
Affiliation(s)
- Alix Boc
- Département d'informatique, Université du Québec à Montréal, C.P. 8888, Succ. Centre-ville, Montréal, Québec, Canada.
|
11
|
Snir S, Tuller T. The NET-HMM approach: phylogenetic network inference by combining maximum likelihood and Hidden Markov Models. J Bioinform Comput Biol 2009; 7:625-44. [PMID: 19634195 DOI: 10.1142/s021972000900428x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 05/16/2008] [Revised: 12/05/2008] [Accepted: 12/06/2008] [Indexed: 11/18/2022]
Abstract
Horizontal gene transfer (HGT) is the event of transferring genetic material from one lineage in the evolutionary tree to a different lineage. HGT plays a major role in bacterial genome diversification and is a significant mechanism by which bacteria develop resistance to antibiotics. Although the prevailing assumption is of complete HGT, cases of partial HGT (which are also named chimeric HGT) where only part of a gene is horizontally transferred, have also been reported, albeit less frequently. In this work we suggest a new probabilistic model, the NET-HMM, for analyzing and modeling phylogenetic networks. This new model captures the biologically realistic assumption that neighboring sites of DNA or amino acid sequences are not independent, which increases the accuracy of the inference. The model describes the phylogenetic network as a Hidden Markov Model (HMM), where each hidden state is related to one of the network's trees. One of the advantages of the NET-HMM is its ability to infer partial HGT as well as complete HGT. We describe the properties of the NET-HMM, devise efficient algorithms for solving a set of problems related to it, and implement them in software. We also provide a novel complementary significance test for evaluating the fitness of a model (NET-HMM) to a given dataset. Using NET-HMM, we are able to answer interesting biological questions, such as inferring the length of partial HGT's and the affected nucleotides in the genomic sequences, as well as inferring the exact location of HGT events along the tree branches. These advantages are demonstrated through the analysis of synthetical inputs and three different biological inputs.
Affiliation(s)
- Sagi Snir
- Department of Evolutionary and Environmental Biology and Institute of Evolution, University of Haifa, Israel.
|
12
|
Wägele JW, Letsch H, Klussmann-Kolb A, Mayer C, Misof B, Wägele H. Phylogenetic support values are not necessarily informative: the case of the Serialia hypothesis (a mollusk phylogeny). Front Zool 2009; 6:12. [PMID: 19555513 PMCID: PMC2710323 DOI: 10.1186/1742-9994-6-12] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Received: 04/02/2009] [Accepted: 06/26/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Molecular phylogenies are being published increasingly and many biologists rely on the most recent topologies. However, different phylogenetic trees often contain conflicting results and contradict significant background data. Not knowing how reliable traditional knowledge is, a crucial question concerns the quality of newly produced molecular data. The information content of DNA alignments is rarely discussed, as quality statements are mostly restricted to the statistical support of clades. Here we present a case study of a recently published mollusk phylogeny that contains surprising groupings, based on five genes and 108 species, and we apply new or rarely used tools for the analysis of the information content of alignments and for the filtering of noise (masking of random-like alignment regions, split decomposition, phylogenetic networks, quartet mapping). RESULTS: The data are very fragmentary and contain contaminations. We show that signal-like patterns in the data set are conflicting and partly not distinct and that the reported strong support for a "rather surprising result" (monoplacophorans and chitons form a monophylum Serialia) does not exist at the level of primary homologies. Split-decomposition, quartet mapping and neighbornet analyses reveal conflicting nucleotide patterns and lack of distinct phylogenetic signal for the deeper phylogeny of mollusks. CONCLUSION: Even though currently a majority of molecular phylogenies are being justified with reference to the 'statistical' support of clades in tree topologies, this confidence seems to be unfounded. Contradictions between phylogenies based on different analyses are already a strong indication of unnoticed pitfalls. The use of tree-independent tools for exploratory analyses of data quality is highly recommended. Concerning the new mollusk phylogeny, more convincing evidence is needed.
Affiliation(s)
- J Wolfgang Wägele
- Zoologisches Forschungsmuseum Alexander Koenig, Adenauerallee 160, 53313 Bonn, Germany
- Harald Letsch
- Zoologisches Forschungsmuseum Alexander Koenig, Adenauerallee 160, 53313 Bonn, Germany
- Annette Klussmann-Kolb
- J. W. Goethe University, Institute for Ecology, Evolution and Diversity, Siesmayerstrasse 70, D – 60054 Frankfurt am Main, Germany
- Christoph Mayer
- Ruhr-University Bochum, Faculty of Biology, Universitätsstr., 44370 Bochum, Germany
- Bernhard Misof
- Zoologisches Forschungsmuseum Alexander Koenig, Adenauerallee 160, 53313 Bonn, Germany
- Heike Wägele
- Zoologisches Forschungsmuseum Alexander Koenig, Adenauerallee 160, 53313 Bonn, Germany
|
13
|
Klaere S, Gesell T, von Haeseler A. The impact of single substitutions on multiple sequence alignments. Philos Trans R Soc Lond B Biol Sci 2009; 363:4041-7. [PMID: 18852110 DOI: 10.1098/rstb.2008.0140] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022] Open
Abstract
We introduce another view of sequence evolution. Contrary to other approaches, we model the substitution process in two steps. First we assume (arbitrary) scaled branch lengths on a given phylogenetic tree. Second we allocate a Poisson distributed number of substitutions on the branches. The probability to place a mutation on a branch is proportional to its relative branch length. More importantly, the action of a single mutation on an alignment column is described by a doubly stochastic matrix, the so-called one-step mutation matrix. This matrix leads to analytical formulae for the posterior probability distribution of the number of substitutions for an alignment column.
Affiliation(s)
- Steffen Klaere
- Center for Integrative Bioinformatics Vienna, University of Vienna, Medical University Vienna, Veterinary University Vienna, Max F. Perutz Laboratories, Dr Bohrgasse 9, 1030 Wien, Austria.
|
14
|
Abstract
MOTIVATION Horizontal gene transfer (HGT) is believed to be ubiquitous among bacteria, and plays a major role in their genome diversification as well as their ability to develop resistance to antibiotics. In light of its evolutionary significance and implications for human health, developing accurate and efficient methods for detecting and reconstructing HGT is imperative. RESULTS In this article we provide a new HGT-oriented likelihood framework for many problems that involve phylogeny-based HGT detection and reconstruction. Beside the formulation of various likelihood criteria, we show that most of these problems are NP-hard, and offer heuristics for efficient and accurate reconstruction of HGT under these criteria. We implemented our heuristics and used them to analyze biological as well as synthetic data. In both cases, our criteria and heuristics exhibited very good performance with respect to identifying the correct number of HGT events as well as inferring their correct location on the species tree. AVAILABILITY Implementation of the criteria as well as heuristics and hardness proofs are available from the authors upon request. Hardness proofs can also be downloaded at http://www.cs.tau.ac.il/~tamirtul/MLNET/Supp-ML.pdf
Affiliation(s)
- Guohua Jin
- Department of Computer Science, Rice University Houston, TX, USA
|
15
|
Abstract
Exponentially accumulating genetic molecular data were supposed to bring us closer to resolving one of the most fundamental issues in biology—the reconstruction of the tree of life. This tree should encompass the evolutionary history of all living creatures on earth and trace back a few billions of years to the most ancient microbial ancestor.
Ironically, this abundance of data only blurs our traditional beliefs and seems to make this goal harder to achieve than initially thought. This is largely due to lateral gene transfer, the passage of genetic material between organisms not through lineal descent. Evolution in light of lateral transfer tangles the traditional universal tree of life, turning it into a network of relationships. Lateral transfer is a significant factor in microbial evolution and is the mechanism by which antibiotic resistance spreads among bacterial species.
In this paper we survey current methods designed to cope with lateral transfer in conjunction with vertical inheritance. We distinguish between phylogenetic-based methods and sequence-based methods and illuminate the advantages and disadvantages of each. Finally, we sketch a new statistically rigorous approach aimed at identifying lateral transfer between two genomes.
Affiliation(s)
- Sagi Snir
- Institute of Evolution, University of Haifa, 31905 Haifa, Israel and Department of Computer Science, Netanya Academic College
|
16
|
Morrison DA. Networks in phylogenetic analysis: new tools for population biology. Int J Parasitol 2005; 35:567-82. [PMID: 15826648 DOI: 10.1016/j.ijpara.2005.02.007] [Citation(s) in RCA: 107] [Impact Index Per Article: 5.6] [Received: 01/31/2005] [Revised: 02/10/2005] [Accepted: 02/10/2005] [Indexed: 11/29/2022]
Abstract
Phylogenetic analysis has changed greatly in the past decade, including the more widespread appreciation of the idea that evolutionary histories are not always tree-like, and may, thus, be best represented as reticulated networks rather than as strictly dichotomous trees. Reconstructing such histories in the absence of a bifurcating speciation process is even more difficult than the usual procedure, and a range of alternative strategies have been developed. There seem to be two basic uses for a network model of evolution: the display of real but unobservable evolutionary events (i.e. a hypothesis of the true phylogenetic history), and the display of character conflict within the data itself (i.e. a summary of the data). These two general approaches are briefly reviewed here, and the strengths and weaknesses of the different implementations are compared and contrasted. Each network methodology seems to have limitations in terms of how it responds to increasing complexity (e.g. conflict) in the data, and therefore each is likely to be more appropriate for one of the two uses than for the other. Several examples using parasitological data sets illustrate the uses of networks within the context of population biology.
Affiliation(s)
- David A Morrison
- Department of Parasitology (SWEPAR), National Veterinary Institute and Swedish University of Agricultural Sciences, 751 89 Uppsala, Sweden.
|
17
|
Abstract
This paper poses the problem of estimating and validating phylogenetic trees in statistical terms. The problem is hard enough to warrant several tacks: we reason by analogy to rounding real numbers and to dealing with ranking data. These are both cases where, as in phylogeny, the parameters of interest are not real numbers. Then we pose the problem in geometrical terms, using distances and measures on a natural space of trees. We do not solve the problems of inference on tree space, but suggest some coherent ways of tackling them.
Affiliation(s)
- Susan Holmes
- Statistics Department, Stanford University, CA 94305-4065, USA.
|
18
|
|
19
|
Huber KT, Watson EE, Hendy MD. An algorithm for constructing local regions in a phylogenetic network. Mol Phylogenet Evol 2001; 19:1-8. [PMID: 11286486 DOI: 10.1006/mpev.2000.0891] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
The groupings of taxa in a phylogenetic tree cannot represent all the conflicting signals that usually occur among site patterns in aligned homologous genetic sequences. Hence a tree-building program must compromise by reporting a subset of the patterns, using some discriminatory criterion. Thus, in the worst case, out of possibly a large number of equally good trees, only an arbitrarily chosen tree might be reported by the tree-building program as "The Tree." This tree might then be used as a basis for phylogenetic conclusions. One strategy to represent conflicting patterns in the data is to construct a network. The Buneman graph is a theoretically very attractive example of such a network. In particular, a characterization for when this network will be a tree is known. Also the Buneman graph contains each of the most parsimonious trees indicated by the data. In this paper we describe a new method for constructing the Buneman graph that can be used for a generalization of Hadamard conjugation to networks. This new method differs from previous methods by allowing us to focus on local regions of the graph without having to first construct the full graph. The construction is illustrated by an example.
Affiliation(s)
- K T Huber
- FMI, Department of Mathematics and Physics, Mid Sweden University, S-851-70 Sundsvall, Sweden
|
20
|
Abstract
Methods such as maximum parsimony (MP) are frequently criticized as being statistically unsound and not being based on any "model." On the other hand, advocates of MP claim that maximum likelihood (ML) has some fundamental problems. Here, we explore the connection between the different versions of MP and ML methods, particularly in light of recent theoretical results. We describe links between the two methods--for example, we describe how MP can be regarded as an ML method when there is no common mechanism between sites (such as might occur with morphological data and certain forms of molecular data). In the process, we clarify certain historical points of disagreement between proponents of the two methodologies, including a discussion of several forms of the ML optimality criterion. We also describe some additional results that shed light on how much needs to be assumed about underlying models of sequence evolution in order to successfully reconstruct evolutionary trees.
Affiliation(s)
- M Steel
- Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand.
|
21
|
Abstract
A method for computing the likelihood of a set of sequences assuming a phylogenetic network as an evolutionary hypothesis is presented. The approach applies directed graphical models to sequence evolution on networks and is a natural generalization of earlier work by Felsenstein on evolutionary trees, including it as a special case. The likelihood computation involves several steps. First, the phylogenetic network is rooted to form a directed acyclic graph (DAG). Then, applying standard models for nucleotide/amino acid substitution, the DAG is converted into a Bayesian network from which the joint probability distribution involving all nodes of the network can be directly read. The joint probability is explicitly dependent on branch lengths and on recombination parameters (prior probability of a parent sequence). The likelihood of the data assuming no knowledge of hidden nodes is obtained by marginalization, i.e., by summing over all combinations of unknown states. As the number of terms increases exponentially with the number of hidden nodes, a Markov chain Monte Carlo procedure (Gibbs sampling) is used to accurately approximate the likelihood by summing over the most important states only. Investigating a human T-cell lymphotropic virus (HTLV) data set and optimizing both branch lengths and recombination parameters, we find that the likelihood of a corresponding phylogenetic network outperforms a set of competing evolutionary trees. In general, except for the case of a tree, the likelihood of a network will be dependent on the choice of the root, even if a reversible model of substitution is applied. Thus, the method also provides a way in which to root a phylogenetic network by choosing a node that produces a most likely network.
Affiliation(s)
- K Strimmer
- GSF-Forschungszentrum für Umwelt und Gesundheit, MIPS, am Max-Planck-Institut für Biochemie, Martinsried, Germany
|
22
|
Strimmer K, von Haeseler A. Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc Natl Acad Sci U S A 1997; 94:6815-9. [PMID: 9192648 PMCID: PMC21241 DOI: 10.1073/pnas.94.13.6815] [Citation(s) in RCA: 660] [Impact Index Per Article: 24.4] [Indexed: 02/04/2023] Open
Abstract
We introduce a graphical method, likelihood-mapping, to visualize the phylogenetic content of a set of aligned sequences. The method is based on an analysis of the maximum likelihoods for the three fully resolved tree topologies that can be computed for four sequences. The three likelihoods are represented as one point inside an equilateral triangle. The triangle is partitioned in different regions. One region represents star-like evolution, three regions represent a well-resolved phylogeny, and three regions reflect the situation where it is difficult to distinguish between two of the three trees. The location of the likelihoods in the triangle defines the mode of sequence evolution. If n sequences are analyzed, then the likelihoods for each subset of four sequences are mapped onto the triangle. The resulting distribution of points shows whether the data are suitable for a phylogenetic reconstruction or not.
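The mapping of three likelihoods to a single point in an equilateral triangle can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: it assumes the three likelihoods are normalized to weights that sum to one and used as barycentric coordinates, and the function name `likelihood_point`, the vertex placement in `VERTICES`, and the use of log-likelihoods as input are all hypothetical choices.

```python
import math

# Hypothetical vertex placement: one corner of the triangle per tree topology.
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))


def likelihood_point(log_likelihoods):
    """Map three log-likelihoods to (x, y) inside an equilateral triangle.

    The likelihoods are normalized to weights summing to one (assumed here),
    and the weights serve as barycentric coordinates on VERTICES.
    """
    if len(log_likelihoods) != 3:
        raise ValueError("expected exactly three log-likelihoods")
    # Subtract the maximum before exponentiating to avoid underflow.
    largest = max(log_likelihoods)
    raw = [math.exp(value - largest) for value in log_likelihoods]
    total = sum(raw)
    weights = [value / total for value in raw]
    x = sum(w * vx for w, (vx, _) in zip(weights, VERTICES))
    y = sum(w * vy for w, (_, vy) in zip(weights, VERTICES))
    return x, y


# Equal support for all three topologies lands at the centroid (star-like case).
centre = likelihood_point([-120.0, -120.0, -120.0])

# Overwhelming support for the first topology lands near its corner.
corner = likelihood_point([-100.0, -900.0, -900.0])
```

For n sequences, the same function would be applied to each subset of four sequences, and the resulting points plotted together; the partition of the triangle into regions is not reproduced here.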
Affiliation(s)
- K Strimmer
- Zoologisches Institut, Universität München, P.O. Box 202136, D-80021 Munich, Germany
|
23
|
Brower AVZ, DeSalle R, Vogler A. Gene Trees, Species Trees, and Systematics: A Cladistic Perspective. Annu Rev Ecol Syst 1996; 27:423. [DOI: 10.1146/annurev.ecolsys.27.1.423] [Citation(s) in RCA: 150] [Impact Index Per Article: 5.4] [Indexed: 11/09/2022]
Affiliation(s)
- A. V. Z. Brower
- Department of Entomology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024
- Department of Entomology, The Natural History Museum, Cromwell Road, London SW7 5BD, United Kingdom
- Department of Biology, Imperial College at Silwood Park, Ascot, Berkshire, SL5 7PY, United Kingdom
- R. DeSalle
- Department of Entomology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024
- Department of Entomology, The Natural History Museum, Cromwell Road, London SW7 5BD, United Kingdom
- Department of Biology, Imperial College at Silwood Park, Ascot, Berkshire, SL5 7PY, United Kingdom
- A. Vogler
- Department of Entomology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024
- Department of Entomology, The Natural History Museum, Cromwell Road, London SW7 5BD, United Kingdom
- Department of Biology, Imperial College at Silwood Park, Ascot, Berkshire, SL5 7PY, United Kingdom
|