1
|
Katriel G, Mahanaymi U, Brezner S, Kezel N, Koutschan C, Zeilberger D, Steel M, Snir S. Gene Transfer-Based Phylogenetics: Analytical Expressions and Additivity via Birth-Death Theory. Syst Biol 2023; 72:1403-1417. [PMID: 37862116 DOI: 10.1093/sysbio/syad060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 09/01/2023] [Accepted: 10/05/2023] [Indexed: 10/22/2023] Open
Abstract
The genomic era has opened up vast opportunities in molecular systematics, one of which is deciphering the evolutionary history in fine detail. Under this mass of data, analyzing the point mutations of standard markers is often too crude and slow for fine-scale phylogenetics. Nevertheless, genome dynamics (GD) events provide alternative, often richer information. The synteny index (SI) between a pair of genomes combines gene order and gene content information, allowing the comparison of genomes of unequal gene content, together with order considerations of their common genes. Recently, genome dynamics has been modeled as a continuous-time Markov process, and gene distance in the genome as a birth-death-immigration process. Nevertheless, due to complexities arising in this setting, no precise and provably consistent estimators could be derived, resulting in heuristic solutions. Here, we extend this modeling approach by using techniques from birth-death theory to derive explicit expressions of the system's probabilistic dynamics in the form of rational functions of the model parameters. This, in turn, allows us to infer analytically accurate distances between organisms based on their SI. Subsequently, we establish additivity of this estimated evolutionary distance (a desirable property yielding phylogenetic consistency). Applying the new measure in simulation studies shows that it provides accurate results in realistic settings and even under model extensions such as gene gain/loss or over a tree structure. In the real-data realm, we applied the new formulation to a unique data structure that we constructed, the ordered orthology DB, based on a new version of the EggNOG database, to construct a tree with more than 4.5K taxa. To the best of our knowledge, this is the largest gene-order-based tree constructed and it overcomes shortcomings found in previous approaches. Constructing a GD-based tree allows us to confirm and contrast findings based on other phylogenetic approaches, as we show.
Affiliation(s)
- Guy Katriel
  - Department of Mathematics, Braude College of Engineering, Karmiel, Israel
- Udi Mahanaymi
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
- Shelly Brezner
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
- Noor Kezel
  - Department of Mathematics, University of Haifa, Haifa, Israel
- Doron Zeilberger
  - Department of Mathematics, Rutgers University, New Brunswick, NJ, USA
- Mike Steel
  - School of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
- Sagi Snir
  - Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel
|
2
|
Lutteropp S, Scornavacca C, Kozlov AM, Morel B, Stamatakis A. NetRAX: accurate and fast maximum likelihood phylogenetic network inference. Bioinformatics 2022; 38:3725-3733. [PMID: 35713506 DOI: 10.1101/2021.08.30.458194] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/16/2021] [Revised: 05/11/2022] [Accepted: 06/14/2022] [Indexed: 05/26/2023]
Abstract
MOTIVATION Phylogenetic networks can represent non-treelike evolutionary scenarios. Current, actively developed approaches for phylogenetic network inference jointly account for non-treelike evolution and incomplete lineage sorting (ILS). Unfortunately, this induces a very high computational complexity and current tools can only analyze small datasets. RESULTS We present NetRAX, a tool for maximum likelihood (ML) inference of phylogenetic networks in the absence of ILS. Our tool leverages state-of-the-art methods for efficiently computing the phylogenetic likelihood function on trees, and extends them to phylogenetic networks via the notion of 'displayed trees'. NetRAX can infer ML phylogenetic networks from partitioned multiple sequence alignments and returns the inferred networks in Extended Newick format. On simulated data, our results show a very low relative difference in Bayesian Information Criterion (BIC) score and a near-zero unrooted softwired cluster distance to the true, simulated networks. With NetRAX, a network inference on a partitioned alignment with 8000 sites, 30 taxa and 3 reticulations completes within a few minutes on a standard laptop. AVAILABILITY AND IMPLEMENTATION Our implementation is available under the GNU General Public License v3.0 at https://github.com/lutteropp/NetRAX. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sarah Lutteropp
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Céline Scornavacca
  - Institut des Sciences de l'Évolution Université de Montpellier, CNRS, IRD, EPHE Place Eugène Bataillon, 34095 Montpellier Cedex 05, France
- Alexey M Kozlov
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
- Benoit Morel
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe 76128, Germany
- Alexandros Stamatakis
  - Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg 69118, Germany
  - Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe 76128, Germany
|
3
|
Motoki MT, Linton YM, Conn JE, Ruiz-Lopez F, Wilkerson RC. Phylogenetic Network of Mitochondrial COI Gene Sequences Distinguishes 10 Taxa Within the Neotropical Albitarsis Group (Diptera: Culicidae), Confirming the Separate Species Status of Anopheles albitarsis H (Diptera: Culicidae) and Revealing a Novel Lineage, Anopheles albitarsis J. J Med Entomol 2021; 58:599-607. [PMID: 33033825 PMCID: PMC7954104 DOI: 10.1093/jme/tjaa211] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/29/2020] [Indexed: 05/14/2023]
Abstract
The Neotropical Albitarsis Group is a complex assemblage of essentially isomorphic species which currently comprises eight recognized species (five formally described: Anopheles albitarsis Lynch-Arribalzaga, An. deaneorum Rosa-Freitas, An. janconnae Wilkerson and Sallum, An. marajoara Galvao and Damasceno, An. oryzalimnetes Wilkerson and Motoki; and three molecularly assigned: An. albitarsis F, G & I) and one mitochondrial lineage (An. albitarsis H). To further explore species recognition within this important group, 658 base pairs of the mitochondrial DNA cytochrome oxidase subunit I (COI) were analyzed from 988 specimens from South America. We conducted statistical parsimony network analysis, generated estimates of haplotype, nucleotide, genetic differentiation, divergence time, and tested the effect of isolation by distance (IBD). Ten clusters were identified, which confirmed the validity of the eight previously determined species, and confirmed the specific status of the previous mitochondrial lineage An. albitarsis H. High levels of diversity were highlighted in two samples from Pará (= An. albitarsis J), which needs further exploration through additional sampling, but which may indicate another cryptic species. The highest intra-specific nucleotide diversity was observed in An. deaneorum, and the lowest in An. marajoara. Significant correlation between genetic and geographical distance was observed only in An. oryzalimnetes and An. albitarsis F. Divergence time within the Albitarsis Group was estimated at 0.58-2.25 Mya, during the Pleistocene. The COI barcode region was considered an effective marker for species recognition within the Albitarsis Group and a network approach was an analytical method to discriminate among species of this group.
Affiliation(s)
- Maysa T Motoki
  - Walter Reed Biosystematics Unit, Smithsonian Institution Museum Support Center, Suitland, MD
  - Global Health Research, Vysnova Partners Inc., Landover, MD
- Yvonne-Marie Linton
  - Walter Reed Biosystematics Unit, Smithsonian Institution Museum Support Center, Suitland, MD
  - Department of Entomology, Smithsonian Institution, National Museum of Natural History, Washington, DC
  - Walter Reed Army Institute of Research, Silver Spring, MD
- Jan E Conn
  - Griffin Laboratory, Wadsworth Center, New York State Department of Health, Albany, NY
  - School of Public Health, Department of Biomedical Sciences, State University of New York, Albany, NY
- Fredy Ruiz-Lopez
  - Walter Reed Biosystematics Unit, Smithsonian Institution Museum Support Center, Suitland, MD
  - Programa de Estudio y Control de Enfermedades Tropicales (PECET), Facultad de Medicina, Universidad de Antioquia, Medellin, Colombia
- Richard C Wilkerson
  - Walter Reed Biosystematics Unit, Smithsonian Institution Museum Support Center, Suitland, MD
  - Department of Entomology, Smithsonian Institution, National Museum of Natural History, Washington, DC
|
4
|
Murakami Y, van Iersel L, Janssen R, Jones M, Moulton V. Reconstructing Tree-Child Networks from Reticulate-Edge-Deleted Subnetworks. Bull Math Biol 2019; 81:3823-3863. [PMID: 31297691 PMCID: PMC6764941 DOI: 10.1007/s11538-019-00641-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/05/2019] [Accepted: 07/03/2019] [Indexed: 01/16/2023]
Abstract
Network reconstruction lies at the heart of phylogenetic research. Two well-studied classes of phylogenetic networks include tree-child networks and level-k networks. In a tree-child network, every non-leaf node has a child that is a tree node or a leaf. In a level-k network, the maximum number of reticulations contained in a biconnected component is k. Here, we show that level-k tree-child networks are encoded by their reticulate-edge-deleted subnetworks, which are subnetworks obtained by deleting a single reticulation edge, if [Formula: see text]. Following this, we provide a polynomial-time algorithm for uniquely reconstructing such networks from their reticulate-edge-deleted subnetworks. Moreover, we show that this can even be done when considering subnetworks obtained by deleting one reticulation edge from each biconnected component with k reticulations.
Affiliation(s)
- Yukihiro Murakami
  - Delft Institute of Applied Mathematics, Delft University of Technology, Van Mourik Broekmanweg 6, 2628 XE Delft, The Netherlands
- Leo van Iersel
  - Delft Institute of Applied Mathematics, Delft University of Technology, Van Mourik Broekmanweg 6, 2628 XE Delft, The Netherlands
- Remie Janssen
  - Delft Institute of Applied Mathematics, Delft University of Technology, Van Mourik Broekmanweg 6, 2628 XE Delft, The Netherlands
- Mark Jones
  - Delft Institute of Applied Mathematics, Delft University of Technology, Van Mourik Broekmanweg 6, 2628 XE Delft, The Netherlands
- Vincent Moulton
  - School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ UK
|
5
|
Caetano-Anollés G, Nasir A, Kim KM, Caetano-Anollés D. Rooting Phylogenies and the Tree of Life While Minimizing Ad Hoc and Auxiliary Assumptions. Evol Bioinform Online 2018; 14:1176934318805101. [PMID: 30364468 PMCID: PMC6196624 DOI: 10.1177/1176934318805101] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 01/25/2018] [Accepted: 09/05/2018] [Indexed: 12/25/2022] Open
Abstract
Phylogenetic methods unearth evolutionary history when supported by three starting points of reason: (1) the continuity axiom begs the existence of a "model" of evolutionary change, (2) the singularity axiom defines the historical ground plan (phylogeny) in which biological entities (taxa) evolve, and (3) the memory axiom demands identification of biological attributes (characters) with historical information. Axiom consequences are interlinked, making the retrodiction enterprise an endeavor of reciprocal fulfillment. In particular, establishing direction of evolutionary change (character polarization) roots phylogenies and enables testing the existence of historical memory (homology). Unfortunately, rooting phylogenies, especially the "tree of life," generally follow narratives instead of integrating empirical and theoretical knowledge of retrodictive exploration. This stems mostly from a focus on molecular sequence analysis and uncertainties about rooting methods. Here, we review available rooting criteria, highlighting the need to minimize both ad hoc and auxiliary assumptions, especially argumentative ad hocness. We show that while the outgroup comparison method has been widely adopted, the generality criterion of nesting and additive phylogenetic change embodied in Weston rule offers the most powerful rooting approach. We also propose a change of focus, from phylogenies that describe the evolution of biological systems to those that describe the evolution of parts of those systems. This weakens violation of character independence, helps formalize the generality criterion of rooting, and provides new ways to study the problem of evolution.
Affiliation(s)
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Arshan Nasir
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA
  - Department of Biosciences, COMSATS University Islamabad, Islamabad, Pakistan
- Kyung Mo Kim
  - Division of Polar Life Sciences, Korea Polar Research Institute, Incheon, Republic of Korea
- Derek Caetano-Anollés
  - Department of Evolutionary Genetics, Max-Planck-Institut für Evolutionsbiologie, Plön, Germany
|
6
|
Zhu S, Degnan JH. Displayed Trees Do Not Determine Distinguishability Under the Network Multispecies Coalescent. Syst Biol 2018; 66:283-298. [PMID: 27780899 DOI: 10.1093/sysbio/syw097] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/10/2015] [Accepted: 03/08/2016] [Indexed: 11/13/2022] Open
Abstract
Recent work in estimating species relationships from gene trees has included inferring networks assuming that past hybridization has occurred between species. Probabilistic models using the multispecies coalescent can be used in this framework for likelihood-based inference of both network topologies and parameters, including branch lengths and hybridization parameters. A difficulty for such methods is that it is not always clear whether, or to what extent, networks are identifiable, that is, whether there could be two distinct networks that lead to the same distribution of gene trees. For cases in which incomplete lineage sorting occurs in addition to hybridization, we demonstrate a new representation of the species network likelihood that expresses the probability distribution of the gene tree topologies as a linear combination of gene tree distributions given a set of species trees. This representation makes it clear that in some cases in which two distinct networks give the same distribution of gene trees when sampling one allele per species, the two networks can be distinguished theoretically when multiple individuals are sampled per species. This result means that network identifiability is not only a function of the trees displayed by the networks but also depends on allele sampling within species. We additionally give an example in which two networks that display exactly the same trees can be distinguished from their gene trees even when there is only one lineage sampled per species. [gene tree, hybridization, identifiability, maximum likelihood, species tree, phylogeny.]
Affiliation(s)
- Sha Zhu
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
- James H Degnan
  - Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM 87110, USA
|
7
|
Furcation and fusion: The phylogenetics of evolutionary novelty. Dev Biol 2017; 431:69-76. [DOI: 10.1016/j.ydbio.2017.09.015] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 03/03/2017] [Revised: 09/11/2017] [Accepted: 09/11/2017] [Indexed: 01/02/2023]
|
8
|
Wang JH, Tang CT, Chen H. An Adaptable Continuous Restricted Boltzmann Machine in VLSI for Fusing the Sensory Data of an Electronic Nose. IEEE Trans Neural Netw Learn Syst 2017; 28:961-974. [PMID: 26863678 DOI: 10.1109/tnnls.2016.2517078] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]
Abstract
An embedded system capable of fusing sensory data is demanded for many portable or implantable microsystems. The continuous restricted Boltzmann machine (CRBM) is a probabilistic neural network not only capable of classifying data reliably but also amenable to very-large-scale-integration (VLSI) implementation. Although the embedded system based on the CRBM has been demonstrated with analog VLSI, the precision required by the learning algorithm is hardly achievable with analog circuits. Therefore, this paper investigates the feasibility of realizing the CRBM as a digital embedded system for fusing the sensory data of an electronic nose (eNose). The fusion here refers to data clustering and dimensional reduction that facilitates reliable classification. The capability of the CRBM to model different types of eNose data is first examined by MATLAB simulation. Afterward, the CRBM algorithm is custom-designed as a digital embedded system within an eNose microsystem. The functionality of the embedded CRBM system is then tested and discussed. With on-chip learning ability, the CRBM-embedded eNose is able to adapt its parameters in response to new data inputs or environmental changes.
|
9
|
Solís-Lemus C, Ané C. Inferring Phylogenetic Networks with Maximum Pseudolikelihood under Incomplete Lineage Sorting. PLoS Genet 2016; 12:e1005896. [PMID: 26950302 PMCID: PMC4780787 DOI: 10.1371/journal.pgen.1005896] [Citation(s) in RCA: 244] [Impact Index Per Article: 30.5] [Received: 09/25/2015] [Accepted: 02/03/2016] [Indexed: 11/23/2022] Open
Abstract
Phylogenetic networks are necessary to represent the tree of life expanded by edges to represent events such as horizontal gene transfers, hybridizations or gene flow. Not all species follow the paradigm of vertical inheritance of their genetic material. While a great deal of research has flourished into the inference of phylogenetic trees, statistical methods to infer phylogenetic networks are still limited and under development. The main disadvantage of existing methods is a lack of scalability. Here, we present a statistical method to infer phylogenetic networks from multi-locus genetic data in a pseudolikelihood framework. Our model accounts for incomplete lineage sorting through the coalescent model, and for horizontal inheritance of genes through reticulation nodes in the network. Computation of the pseudolikelihood is fast and simple, and it avoids the burdensome calculation of the full likelihood which can be intractable with many species. Moreover, estimation at the quartet-level has the added computational benefit that it is easily parallelizable. Simulation studies comparing our method to a full likelihood approach show that our pseudolikelihood approach is much faster without compromising accuracy. We applied our method to reconstruct the evolutionary relationships among swordtails and platyfishes (Xiphophorus: Poeciliidae), which is characterized by widespread hybridizations. Phylogenetic networks display the evolutionary history of groups of individuals (species or populations) including reticulation events such as hybridization, horizontal gene transfer or migration. Here, we present a likelihood method to learn networks from molecular sequences at multiple genes. Our model accounts for several biological processes: mutations, incomplete lineage sorting of alleles in ancestral populations, and reticulations in the network. The likelihood is decomposed into 4-taxon subsets to make the analyses scale to many species and many genes. 
Our work makes it possible to learn large phylogenetic networks from large data sets, with a statistical approach and a biologically relevant model.
Affiliation(s)
- Claudia Solís-Lemus
  - Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Cécile Ané
  - Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
  - Department of Botany, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
|
10
|
Wheeler WC. Phylogenetic network analysis as a parsimony optimization problem. BMC Bioinformatics 2015; 16:296. [PMID: 26382078 PMCID: PMC4574467 DOI: 10.1186/s12859-015-0675-0] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 02/24/2015] [Accepted: 07/14/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Many problems in comparative biology are, or are thought to be, best expressed as phylogenetic "networks" as opposed to trees. In trees, vertices may have only a single parent (ancestor), while networks allow for multiple parent vertices. There are two main interpretive types of networks, "softwired" and "hardwired." The parsimony cost of hardwired networks is based on all changes over all edges, hence must be greater than or equal to the best tree cost contained ("displayed") by the network. This is in contrast to softwired, where each character follows the lowest parsimony cost tree displayed by the network, resulting in costs which are less than or equal to the best display tree. Neither situation is ideal since hard-wired networks are not generally biologically attractive (since individual heritable characters can have more than one parent) and softwired networks can be trivially optimized (containing the best tree for each character). Furthermore, given the alternate cost scenarios of trees and these two flavors of networks, hypothesis testing among these explanatory scenarios is impossible. RESULTS A network cost adjustment (penalty) is proposed to allow phylogenetic trees and soft-wired phylogenetic networks to compete equally on a parsimony optimality basis. This cost is demonstrated for several real and simulated datasets. In each case, the favored graph representation (tree or network) matched expectation or simulation scenario. CONCLUSIONS The softwired network cost regime proposed here presents a quantitative criterion for an optimality-based search procedure where trees and networks can participate in hypothesis testing simultaneously.
Affiliation(s)
- Ward C Wheeler
  - Division of Invertebrate Zoology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024-5192, USA
|
12
|
Huber KT, Linz S, Moulton V, Wu T. Spaces of phylogenetic networks from generalized nearest-neighbor interchange operations. J Math Biol 2015; 72:699-725. [PMID: 26037483 DOI: 10.1007/s00285-015-0899-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 12/05/2014] [Revised: 05/04/2015] [Indexed: 11/29/2022]
Abstract
Phylogenetic networks are a generalization of evolutionary or phylogenetic trees that are used to represent the evolution of species which have undergone reticulate evolution. In this paper we consider spaces of such networks defined by some novel local operations that we introduce for converting one phylogenetic network into another. These operations are modeled on the well-studied nearest-neighbor interchange operations on phylogenetic trees, and lead to natural generalizations of the tree spaces that have been previously associated to such operations. We present several results on spaces of some relatively simple networks, called level-1 networks, including the size of the neighborhood of a fixed network, and bounds on the diameter of the metric defined by taking the smallest number of operations required to convert one network into another. We expect that our results will be useful in the development of methods for systematically searching for optimal phylogenetic networks using, for example, likelihood and Bayesian approaches.
Affiliation(s)
- Katharina T Huber
  - School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
- Simone Linz
  - Department of Computer Science, University of Auckland, Auckland, New Zealand
- Vincent Moulton
  - School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
- Taoyang Wu
  - School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
|
13
|
Sumner JG, Jarvis PD, Holland BR. A tensorial approach to the inversion of group-based phylogenetic models. BMC Evol Biol 2014; 14:236. [PMID: 25472897 PMCID: PMC4268818 DOI: 10.1186/s12862-014-0236-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/15/2014] [Accepted: 11/06/2014] [Indexed: 11/16/2022] Open
Abstract
Background Hadamard conjugation is part of the standard mathematical armoury in the analysis of molecular phylogenetic methods. For group-based models, the approach provides a one-to-one correspondence between the so-called “edge length” and “sequence” spectrum on a phylogenetic tree. The Hadamard conjugation has been used in diverse phylogenetic applications not only for inference but also as an important conceptual tool for thinking about molecular data leading to generalizations beyond strictly tree-like evolutionary modelling. Results For general group-based models of phylogenetic branching processes, we reformulate the problem of constructing a one-to-one correspondence between pattern probabilities and edge parameters. This takes a classic result previously shown through use of Fourier analysis and presents it in the language of tensors and group representation theory. This derivation makes it clear why the inversion is possible, because, under their usual definition, group-based models are defined for abelian groups only. Conclusion We provide an inversion of group-based phylogenetic models that can be implemented using matrix multiplication between rectangular matrices indexed by ordered-partitions of varying sizes. Our approach provides additional context for the construction of phylogenetic probability distributions on network structures, and highlights the potential limitations of restricting to group-based models in this setting.
Affiliation(s)
- Jeremy G Sumner
  - School of Physical Sciences, University of Tasmania, Hobart TAS 7001, Australia
|
14
|
Wertheim JO, Leigh Brown AJ, Hepler NL, Mehta SR, Richman DD, Smith DM, Kosakovsky Pond SL. The global transmission network of HIV-1. J Infect Dis 2013; 209:304-13. [PMID: 24151309 DOI: 10.1093/infdis/jit524] [Citation(s) in RCA: 166] [Impact Index Per Article: 15.1] [Indexed: 11/12/2022] Open
Abstract
Human immunodeficiency virus type 1 (HIV-1) is pandemic, but its contemporary global transmission network has not been characterized. A better understanding of the properties and dynamics of this network is essential for surveillance, prevention, and eventual eradication of HIV. Here, we apply a simple and computationally efficient network-based approach to all publicly available HIV polymerase sequences in the global database, revealing a contemporary picture of the spread of HIV-1 within and between countries. This approach automatically recovered well-characterized transmission clusters and extended other clusters thought to be contained within a single country across international borders. In addition, previously undescribed transmission clusters were discovered. Together, these clusters represent all known modes of HIV transmission. The extent of international linkage revealed by our comprehensive approach demonstrates the need to consider the global diversity of HIV, even when describing local epidemics. Finally, the speed of this method allows for near-real-time surveillance of the pandemic's progression.
|
15
|
Saitou N, Kitano T. The PNarec method for detection of ancient recombinations through phylogenetic network analysis. Mol Phylogenet Evol 2012; 66:507-14. [PMID: 23022140 DOI: 10.1016/j.ympev.2012.09.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/24/2012] [Revised: 09/07/2012] [Accepted: 09/07/2012] [Indexed: 11/18/2022]
Abstract
Recombinations are known to disrupt bifurcating tree structure of gene genealogies. Although recently occurred recombinations are easily detectable by using conventional methods, recombinations may have occurred at any time. We devised a new method for detecting ancient recombinations through phylogenetic network analysis, and detected five ancient recombinations in gibbon ABO blood group genes [Kitano et al., 2009. Mol. Phylogenet. Evol., 51, 465-471]. We present applications of this method, now named as "PNarec", to various virus sequences as well as HLA genes.
Affiliation(s)
- Naruya Saitou
  - Division of Population Genetics, National Institute of Genetics, Mishima 411-8540, Japan
|
16
|
Asano T, Jansson J, Sadakane K, Uehara R, Valiente G. Faster computation of the Robinson–Foulds distance between phylogenetic networks. Inf Sci (N Y) 2012. [DOI: 10.1016/j.ins.2012.01.038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/26/2022]
|
17
|
Klaere S, Liebscher V. An algebraic analysis of the two state Markov model on tripod trees. Math Biosci 2012; 237:38-48. [PMID: 22430560 DOI: 10.1016/j.mbs.2012.03.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/18/2010] [Revised: 02/22/2012] [Accepted: 03/02/2012] [Indexed: 11/15/2022]
Abstract
Methods of phylogenetic inference use more and more complex models to generate trees from data. However, even simple models and their implications are not fully understood. Here, we investigate the two-state Markov model on a tripod tree, inferring conditions under which a given set of observations gives rise to such a model. This type of investigation has been undertaken before by several scientists from different fields of research. In contrast to other work we fully analyse the model, presenting conditions under which one can infer a model from the observation or at least get support for the tree-shaped interdependence of the leaves considered. We also present all conditions under which the results can be extended from tripod trees to quartet trees, a step necessary to reconstruct at least a topology. Apart from finding conditions under which such an extension works we discuss example cases for which such an extension does not work.
Affiliation(s)
- Steffen Klaere
- Department of Statistics and School of Biological Sciences, The University of Auckland, Auckland, New Zealand.
|
18
|
The Algebra of the General Markov Model on Phylogenetic Trees and Networks. Bull Math Biol 2011; 74:858-80. [DOI: 10.1007/s11538-011-9691-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 12/29/2010] [Accepted: 08/09/2011] [Indexed: 10/17/2022]
|
19
|
De Barro P, Ahmed MZ. Genetic networking of the Bemisia tabaci cryptic species complex reveals pattern of biological invasions. PLoS One 2011; 6:e25579. [PMID: 21998669 PMCID: PMC3184991 DOI: 10.1371/journal.pone.0025579] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.8] [Received: 04/14/2011] [Accepted: 09/07/2011] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND A challenge within the context of cryptic species is the delimitation of individual species within the complex. Statistical parsimony network analytics offers the opportunity to explore limits in situations where there are insufficient species-specific morphological characters to separate taxa. The results also enable us to explore the spread in taxa that have invaded globally. METHODOLOGY/PRINCIPAL FINDINGS Using a 657 bp portion of mitochondrial cytochrome oxidase 1 from 352 unique haplotypes belonging to the Bemisia tabaci cryptic species complex, the analysis revealed 28 networks plus 7 unconnected individual haplotypes. Of the networks, 24 corresponded to the putative species identified using the rule set devised by Dinsdale et al. (2010). Only two species proposed in Dinsdale et al. (2010) departed substantially from the structure suggested by the analysis. The analysis of the two invasive members of the complex, Mediterranean (MED) and Middle East - Asia Minor 1 (MEAM1), showed that in both cases only a small number of haplotypes represent the majority that have spread beyond the home range; one MEAM1 and three MED haplotypes account for >80% of the GenBank records. Israel is a possible source of the globally invasive MEAM1 whereas MED has two possible sources. The first is the eastern Mediterranean which has invaded only the USA, primarily Florida and to a lesser extent California. The second are western Mediterranean haplotypes that have spread to the USA, Asia and South America. The structure for MED supports two home range distributions, a Sub-Saharan range and a Mediterranean range. The MEAM1 network supports the Middle East - Asia Minor region. CONCLUSION/SIGNIFICANCE The network analyses show a high level of congruence with the species identified in a previous phylogenetic analysis. 
The analysis of the two globally invasive members of the complex supports the view that global invasion often involves very small portions of the available genetic diversity.
Affiliation(s)
- Paul De Barro
- CSIRO Ecosystem Sciences, Brisbane, Queensland, Australia.
|
20
|
Willson SJ. Regular networks can be uniquely constructed from their trees. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:785-796. [PMID: 20714025 DOI: 10.1109/tcbb.2010.69] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Indexed: 05/29/2023]
Abstract
A rooted acyclic digraph N with labeled leaves displays a tree T when there exists a way to select a unique parent of each hybrid vertex resulting in the tree T. Let Tr(N) denote the set of all trees displayed by the network N. In general, there may be many other networks M, such that Tr(M) = Tr(N). A network is regular if it is isomorphic with its cover digraph. If N is regular and D is a collection of trees displayed by N, this paper studies some procedures to try to reconstruct N given D. If the input is D = Tr(N), one procedure is described, which will reconstruct N. Hence, if N and M are regular networks and Tr(N) = Tr(M), it follows that N = M, proving that a regular network is uniquely determined by its displayed trees. If D is a (usually very much smaller) collection of displayed trees that satisfies certain hypotheses, modifications of the procedure will still reconstruct N given D.
Affiliation(s)
- Stephen J Willson
- Department of Mathematics, Iowa State University, Ames, IA 50011, USA.
|
21
|
Snir S, Tuller T. The NET-HMM approach: phylogenetic network inference by combining maximum likelihood and Hidden Markov Models. J Bioinform Comput Biol 2009; 7:625-44. [PMID: 19634195 DOI: 10.1142/s021972000900428x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 05/16/2008] [Revised: 12/05/2008] [Accepted: 12/06/2008] [Indexed: 11/18/2022]
Abstract
Horizontal gene transfer (HGT) is the event of transferring genetic material from one lineage in the evolutionary tree to a different lineage. HGT plays a major role in bacterial genome diversification and is a significant mechanism by which bacteria develop resistance to antibiotics. Although the prevailing assumption is of complete HGT, cases of partial HGT (which are also named chimeric HGT) where only part of a gene is horizontally transferred, have also been reported, albeit less frequently. In this work we suggest a new probabilistic model, the NET-HMM, for analyzing and modeling phylogenetic networks. This new model captures the biologically realistic assumption that neighboring sites of DNA or amino acid sequences are not independent, which increases the accuracy of the inference. The model describes the phylogenetic network as a Hidden Markov Model (HMM), where each hidden state is related to one of the network's trees. One of the advantages of the NET-HMM is its ability to infer partial HGT as well as complete HGT. We describe the properties of the NET-HMM, devise efficient algorithms for solving a set of problems related to it, and implement them in software. We also provide a novel complementary significance test for evaluating the fitness of a model (NET-HMM) to a given dataset. Using NET-HMM, we are able to answer interesting biological questions, such as inferring the length of partial HGTs and the affected nucleotides in the genomic sequences, as well as inferring the exact location of HGT events along the tree branches. These advantages are demonstrated through the analysis of synthetic inputs and three different biological inputs.
Affiliation(s)
- Sagi Snir
- Department of Evolutionary and Environmental Biology and Institute of Evolution, University of Haifa, Israel.
|
22
|
Cardona G, Rosselló F, Valiente G. Comparison of tree-child phylogenetic networks. IEEE/ACM Trans Comput Biol Bioinform 2009; 6:552-569. [PMID: 19875855 DOI: 10.1109/tcbb.2007.70270] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.1] [Indexed: 05/28/2023]
Abstract
Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of nontreelike evolutionary events, like recombination, hybridization, or lateral gene transfer. While much progress has been made to find practical algorithms for reconstructing a phylogenetic network from a set of sequences, all attempts to endorse a class of phylogenetic networks (strictly extending the class of phylogenetic trees) with a well-founded distance measure have, to the best of our knowledge and with the only exception of the bipartition distance on regular networks, failed so far. In this paper, we present and study a new meaningful class of phylogenetic networks, called tree-child phylogenetic networks, and we provide an injective representation of these networks as multisets of vectors of natural numbers, their path multiplicity vectors. We then use this representation to define a distance on this class that extends the well-known Robinson-Foulds distance for phylogenetic trees and to give an alignment method for pairs of networks in this class. Simple polynomial algorithms for reconstructing a tree-child phylogenetic network from its path multiplicity vectors, for computing the distance between two tree-child phylogenetic networks and for aligning a pair of tree-child phylogenetic networks, are provided. They have been implemented as a Perl package and a Java applet, which can be found at http://bioinfo.uib.es/~recerca/phylonetworks/mudistance/.
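As a rough illustration of the path multiplicity vectors mentioned in this abstract, the following Python sketch computes, for every node of a small rooted DAG, the number of paths to each leaf, and compares two networks by the size of the symmetric difference of the resulting multisets. This is a minimal sketch under stated assumptions, not the authors' Perl or Java implementation: the function names `mu_vectors` and `mu_distance`, the dictionary encoding of a network, and the two toy networks are all hypothetical.

```python
from collections import Counter
from functools import lru_cache


def mu_vectors(children, leaves):
    """Return the multiset of path multiplicity vectors of a rooted DAG.

    children: dict mapping each internal node to a list of its children
              (hypothetical encoding chosen for this sketch)
    leaves:   ordered list of leaf labels; fixes the vector coordinates
    """
    index = {leaf: i for i, leaf in enumerate(leaves)}

    @lru_cache(maxsize=None)
    def mu(node):
        kids = children.get(node, [])
        if not kids:
            # A leaf has exactly one (trivial) path to itself.
            vec = [0] * len(leaves)
            vec[index[node]] = 1
            return tuple(vec)
        # Paths from a node are the sum of the paths from its children.
        return tuple(sum(col) for col in zip(*(mu(k) for k in kids)))

    nodes = set(children) | {c for kids in children.values() for c in kids}
    return Counter(mu(n) for n in nodes)


def mu_distance(net_a, net_b, leaves):
    """Size of the symmetric difference of the two multisets of vectors."""
    a, b = mu_vectors(net_a, leaves), mu_vectors(net_b, leaves)
    return sum(((a - b) + (b - a)).values())


# Toy inputs (assumed for illustration): a tree ((a,b),c) and a network
# in which leaf b hangs below a hybrid node h with parents x and y.
leaves = ["a", "b", "c"]
tree = {"r": ["x", "c"], "x": ["a", "b"]}
network = {"r": ["x", "y"], "x": ["a", "h"], "y": ["h", "c"], "h": ["b"]}

print(mu_distance(tree, network, leaves))
```

Memoizing `mu` keeps the computation linear in the number of arcs, which matches the abstract's remark that the algorithms involved are simple and polynomial.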
Affiliation(s)
- Gabriel Cardona
- Department of Mathematics and Computer Science, University of the Balearic Islands, E-07122 Palma de Mallorca, Spain.
|
23
|
Bapteste E, O'Malley MA, Beiko RG, Ereshefsky M, Gogarten JP, Franklin-Hall L, Lapointe FJ, Dupré J, Dagan T, Boucher Y, Martin W. Prokaryotic evolution and the tree of life are two different things. Biol Direct 2009; 4:34. [PMID: 19788731 PMCID: PMC2761302 DOI: 10.1186/1745-6150-4-34] [Citation(s) in RCA: 128] [Impact Index Per Article: 8.5] [Received: 07/16/2009] [Accepted: 09/29/2009] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND The concept of a tree of life is prevalent in the evolutionary literature. It stems from attempting to obtain a grand unified natural system that reflects a recurrent process of species and lineage splittings for all forms of life. Traditionally, the discipline of systematics operates in a similar hierarchy of bifurcating (sometimes multifurcating) categories. The assumption of a universal tree of life hinges upon the process of evolution being tree-like throughout all forms of life and all of biological time. In multicellular eukaryotes, the molecular mechanisms and species-level population genetics of variation do indeed mainly cause a tree-like structure over time. In prokaryotes, they do not. Prokaryotic evolution and the tree of life are two different things, and we need to treat them as such, rather than extrapolating from macroscopic life to prokaryotes. In the following we will consider this circumstance from philosophical, scientific, and epistemological perspectives, surmising that phylogeny opted for a single model as a holdover from the Modern Synthesis of evolution. RESULTS It was far easier to envision and defend the concept of a universal tree of life before we had data from genomes. But the belief that prokaryotes are related by such a tree has now become stronger than the data to support it. The monistic concept of a single universal tree of life appears, in the face of genome data, increasingly obsolete. This traditional model to describe evolution is no longer the most scientifically productive position to hold, because of the plurality of evolutionary patterns and mechanisms involved. Forcing a single bifurcating scheme onto prokaryotic evolution disregards the non-tree-like nature of natural variation among prokaryotes and accounts for only a minority of observations from genomes. CONCLUSION Prokaryotic evolution and the tree of life are two different things. 
Hence we will briefly set out alternative models to the tree of life to study their evolution. Ultimately, the plurality of evolutionary patterns and mechanisms involved, such as the discontinuity of the process of evolution across the prokaryote-eukaryote divide, summons forth a pluralistic approach to studying evolution. REVIEWERS This article was reviewed by Ford Doolittle, John Logsdon and Nicolas Galtier.
|
24
|
Kubatko LS. Identifying Hybridization Events in the Presence of Coalescence via Model Selection. Syst Biol 2009; 58:478-88. [DOI: 10.1093/sysbio/syp055] [Citation(s) in RCA: 148] [Impact Index Per Article: 9.9] [Indexed: 11/13/2022] Open
Affiliation(s)
- Laura Salter Kubatko
- Department of Statistics
- Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, USA
|
25
|
Abstract
Galled trees, evolutionary networks with isolated reticulation cycles, have appeared under several slightly different definitions in the literature. In this paper, we establish the actual relationships between the main four such alternative definitions: namely, the original galled trees, level-1 networks, nested networks with nesting depth 1, and evolutionary networks with arc-disjoint reticulation cycles.
|
26
|
Meng C, Kubatko LS. Detecting hybrid speciation in the presence of incomplete lineage sorting using gene tree incongruence: A model. Theor Popul Biol 2009; 75:35-45. [DOI: 10.1016/j.tpb.2008.10.004] [Citation(s) in RCA: 176] [Impact Index Per Article: 11.7] [Received: 03/11/2008] [Revised: 09/04/2008] [Accepted: 10/14/2008] [Indexed: 11/28/2022]
|
27
|
Cardona G, Llabrés M, Rosselló F, Valiente G. A distance metric for a class of tree-sibling phylogenetic networks. Bioinformatics 2008; 24:1481-8. [PMID: 18477576 PMCID: PMC2718672 DOI: 10.1093/bioinformatics/btn231] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.8] [Indexed: 12/04/2022]
Abstract
Motivation: The presence of reticulate evolutionary events in phylogenies turns phylogenetic trees into phylogenetic networks. These events imply in particular that there may exist multiple evolutionary paths from a non-extant species to an extant one, and this multiplicity makes the comparison of phylogenetic networks much more difficult than the comparison of phylogenetic trees. In fact, all attempts to define a sound distance measure on the class of all phylogenetic networks have failed so far. Thus, the only practical solutions have been either the use of rough estimates of similarity (based on comparison of the trees embedded in the networks), or narrowing the class of phylogenetic networks to a certain class where such a distance is known and can be efficiently computed. The first approach has the problem that one may identify two networks as equivalent, when they are not; the second one has the drawback that there may not exist algorithms to reconstruct such networks from biological sequences. Results: We present in this article a distance measure on the class of semi-binary tree-sibling time consistent phylogenetic networks, which generalize tree-child time consistent phylogenetic networks, and thus also galled-trees. The practical interest of this distance measure is 2-fold: it can be computed in polynomial time by means of simple algorithms, and there also exist polynomial-time algorithms for reconstructing networks of this class from DNA sequence data. Availability: The Perl package Bio::PhyloNetwork, included in the BioPerl bundle, implements many algorithms on phylogenetic networks, including the computation of the distance presented in this article. Contact: gabriel.cardona@uib.es Supplementary information: Some counterexamples, proofs of the results not included in this article, and some computational experiments are available at Bioinformatics online.
Affiliation(s)
- Gabriel Cardona
- Department of Mathematics and Computer Science, University of the Balearic Islands, E-07122 Palma de Mallorca, Spain.
|
28
|
Reconstruction of certain phylogenetic networks from the genomes at their leaves. J Theor Biol 2008; 252:338-49. [DOI: 10.1016/j.jtbi.2008.02.015] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 10/30/2007] [Revised: 02/06/2008] [Accepted: 02/11/2008] [Indexed: 11/20/2022]
|
29
|
Cardona G, Rosselló F, Valiente G. A perl package and an alignment tool for phylogenetic networks. BMC Bioinformatics 2008; 9:175. [PMID: 18371228 PMCID: PMC2330044 DOI: 10.1186/1471-2105-9-175] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 11/20/2007] [Accepted: 03/27/2008] [Indexed: 11/12/2022] Open
Abstract
Background Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of evolutionary events acting at the population level, like recombination between genes, hybridization between lineages, and lateral gene transfer. While most phylogenetics tools implement a wide range of algorithms on phylogenetic trees, there exist only a few applications to work with phylogenetic networks, none of which are open-source libraries, and they do not allow for the comparative analysis of phylogenetic networks by computing distances between them or aligning them. Results In order to improve this situation, we have developed a Perl package that relies on the BioPerl bundle and implements many algorithms on phylogenetic networks. We have also developed a Java applet that makes use of the aforementioned Perl package and allows the user to make simple experiments with phylogenetic networks without having to develop a program or Perl script by him or herself. Conclusion The Perl package is available as part of the BioPerl bundle, and can also be downloaded. A web-based application is also available (see availability and requirements). The Perl package includes full documentation of all its features.
Affiliation(s)
- Gabriel Cardona
- Department of Mathematics and Computer Science, University of the Balearic Islands, E-07122 Palma de Mallorca, Spain.
|
30
|
Tripartitions do not always discriminate phylogenetic networks. Math Biosci 2008; 211:356-70. [DOI: 10.1016/j.mbs.2007.11.003] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 07/16/2007] [Revised: 11/09/2007] [Accepted: 11/16/2007] [Indexed: 11/23/2022]
|
31
|
Birin H, Gal-Or Z, Elias I, Tuller T. Inferring horizontal transfers in the presence of rearrangements by the minimum evolution criterion. Bioinformatics 2008; 24:826-32. [DOI: 10.1093/bioinformatics/btn024] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022] Open
|
32
|
Munshaw S, Kepler TB. An Information-Theoretic Method for the Treatment of Plural Ancestry in Phylogenetics. Mol Biol Evol 2008; 25:1199-208. [DOI: 10.1093/molbev/msn066] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022] Open
|
33
|
Rector A, Lemey P, Tachezy R, Mostmans S, Ghim SJ, Van Doorslaer K, Roelke M, Bush M, Montali RJ, Joslin J, Burk RD, Jenson AB, Sundberg JP, Shapiro B, Van Ranst M. Ancient papillomavirus-host co-speciation in Felidae. Genome Biol 2007; 8:R57. [PMID: 17430578 PMCID: PMC1896010 DOI: 10.1186/gb-2007-8-4-r57] [Citation(s) in RCA: 129] [Impact Index Per Article: 7.6] [Received: 09/09/2006] [Revised: 03/20/2007] [Accepted: 04/12/2007] [Indexed: 12/12/2022] Open
Abstract
The evolutionary rate of feline papillomaviruses is inferred from the phylogenetic analysis of their hosts, providing evidence for long-term virus-host co-speciation. Background Estimating evolutionary rates for slowly evolving viruses such as papillomaviruses (PVs) is not possible using fossil calibrations directly or sequences sampled over a time-scale of decades. An ability to correlate their divergence with a host species, however, can provide a means to estimate evolutionary rates for these viruses accurately. To determine whether such an approach is feasible, we sequenced complete feline PV genomes, previously available only for the domestic cat (Felis domesticus, FdPV1), from four additional, globally distributed feline species: Lynx rufus PV type 1, Puma concolor PV type 1, Panthera leo persica PV type 1, and Uncia uncia PV type 1. Results The feline PVs all belong to the Lambdapapillomavirus genus, and contain an unusual second noncoding region between the early and late protein region, which is only present in members of this genus. Our maximum likelihood and Bayesian phylogenetic analyses demonstrate that the evolutionary relationships between feline PVs perfectly mirror those of their feline hosts, despite a complex and dynamic phylogeographic history. By applying host species divergence times, we provide the first precise estimates for the rate of evolution for each PV gene, with an overall evolutionary rate of 1.95 × 10⁻⁸ (95% confidence interval 1.32 × 10⁻⁸ to 2.47 × 10⁻⁸) nucleotide substitutions per site per year for the viral coding genome. Conclusion Our work provides evidence for long-term virus-host co-speciation of feline PVs, indicating that viral diversity in slowly evolving viruses can be used to investigate host species evolution. These findings, however, should not be extrapolated to other viral lineages without prior confirmation of virus-host co-divergence.
Affiliation(s)
- Annabel Rector
- Laboratory of Clinical & Epidemiological Virology, Rega Institute for Medical Research, University of Leuven, Minderbroedersstraat, B3000 Leuven, Belgium
- Philippe Lemey
- Laboratory of Clinical & Epidemiological Virology, Rega Institute for Medical Research, University of Leuven, Minderbroedersstraat, B3000 Leuven, Belgium
- Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK
- Ruth Tachezy
- Department of Experimental Virology, Institute of Hematology and Blood Transfusion, U Nemocnice, 128 22 Prague, Czech Republic
- Sara Mostmans
- Laboratory of Clinical & Epidemiological Virology, Rega Institute for Medical Research, University of Leuven, Minderbroedersstraat, B3000 Leuven, Belgium
- Shin-Je Ghim
- The Brown Cancer Center, University of Louisville, South Jackson Street, Louisville, KY 40202, USA
- Koenraad Van Doorslaer
- Laboratory of Clinical & Epidemiological Virology, Rega Institute for Medical Research, University of Leuven, Minderbroedersstraat, B3000 Leuven, Belgium
- Department of Epidemiology and Social Medicine, Comprehensive Cancer Center, Albert Einstein College of Medicine, Morris Park Avenue, Bronx, NY 10461, USA
- Melody Roelke
- Basic Research Program-SAIC Frederick-National Cancer Institute, Building 560, Frederick, MD 21702-1201, USA
- Mitchell Bush
- National Zoological Park, Smithsonian Conservation and Research Center, Remount Road, Front Royal, VA 22630, USA
- Janis Joslin
- Phoenix Zoo, Galvin Parkway, Phoenix, AZ 85008, USA
- Robert D Burk
- Department of Epidemiology and Social Medicine, Comprehensive Cancer Center, Albert Einstein College of Medicine, Morris Park Avenue, Bronx, NY 10461, USA
- Alfred B Jenson
- The Brown Cancer Center, University of Louisville, South Jackson Street, Louisville, KY 40202, USA
- John P Sundberg
- The Jackson Laboratory, Main Street, Bar Harbor, MA 04609-1500, USA
- Beth Shapiro
- Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK
- Marc Van Ranst
- Laboratory of Clinical & Epidemiological Virology, Rega Institute for Medical Research, University of Leuven, Minderbroedersstraat, B3000 Leuven, Belgium
|
34
|
Abstract
MOTIVATION Horizontal gene transfer (HGT) is believed to be ubiquitous among bacteria, and plays a major role in their genome diversification as well as their ability to develop resistance to antibiotics. In light of its evolutionary significance and implications for human health, developing accurate and efficient methods for detecting and reconstructing HGT is imperative. RESULTS In this article we provide a new HGT-oriented likelihood framework for many problems that involve phylogeny-based HGT detection and reconstruction. Besides the formulation of various likelihood criteria, we show that most of these problems are NP-hard, and offer heuristics for efficient and accurate reconstruction of HGT under these criteria. We implemented our heuristics and used them to analyze biological as well as synthetic data. In both cases, our criteria and heuristics exhibited very good performance with respect to identifying the correct number of HGT events as well as inferring their correct location on the species tree. AVAILABILITY Implementation of the criteria as well as heuristics and hardness proofs are available from the authors upon request. Hardness proofs can also be downloaded at http://www.cs.tau.ac.il/~tamirtul/MLNET/Supp-ML.pdf
Affiliation(s)
- Guohua Jin
- Department of Computer Science, Rice University Houston, TX, USA
|
35
|
Abstract
Exponentially accumulating genetic molecular data were supposed to bring us closer to resolving one of the most fundamental issues in biology: the reconstruction of the tree of life. This tree should encompass the evolutionary history of all living creatures on earth and trace back a few billion years to the most ancient microbial ancestor. Ironically, this abundance of data only blurs our traditional beliefs and seems to make this goal harder to achieve than initially thought. This is largely due to lateral gene transfer, the passage of genetic material between organisms not through lineal descent. Evolution in light of lateral transfer tangles the traditional universal tree of life, turning it into a network of relationships. Lateral transfer is a significant factor in microbial evolution and is the mechanism by which antibiotic resistance spreads among bacterial species. In this paper we survey current methods designed to cope with lateral transfer in conjunction with vertical inheritance. We distinguish between phylogenetic-based methods and sequence-based methods and illuminate the advantages and disadvantages of each. Finally, we sketch a new statistically rigorous approach aimed at identifying lateral transfer between two genomes.
Affiliation(s)
- Sagi Snir
- Institute of Evolution, University of Haifa, 31905 Haifa, Israel and Department of Computer Science, Netanya Academic College
|
36
|
Morrison DA. Networks in phylogenetic analysis: new tools for population biology. Int J Parasitol 2005; 35:567-82. [PMID: 15826648 DOI: 10.1016/j.ijpara.2005.02.007] [Citation(s) in RCA: 107] [Impact Index Per Article: 5.6] [Received: 01/31/2005] [Revised: 02/10/2005] [Accepted: 02/10/2005] [Indexed: 11/29/2022]
Abstract
Phylogenetic analysis has changed greatly in the past decade, including the more widespread appreciation of the idea that evolutionary histories are not always tree-like, and may, thus, be best represented as reticulated networks rather than as strictly dichotomous trees. Reconstructing such histories in the absence of a bifurcating speciation process is even more difficult than the usual procedure, and a range of alternative strategies have been developed. There seem to be two basic uses for a network model of evolution: the display of real but unobservable evolutionary events (i.e. a hypothesis of the true phylogenetic history), and the display of character conflict within the data itself (i.e. a summary of the data). These two general approaches are briefly reviewed here, and the strengths and weaknesses of the different implementations are compared and contrasted. Each network methodology seems to have limitations in terms of how it responds to increasing complexity (e.g. conflict) in the data, and therefore each is likely to be more appropriate for one of the two uses than for the other. Several examples using parasitological data sets illustrate the uses of networks within the context of population biology.
Affiliation(s)
- David A Morrison
- Department of Parasitology (SWEPAR), National Veterinary Institute and Swedish University of Agricultural Sciences, 751 89 Uppsala, Sweden.
|
37
|
|
38
|
Moret BME, Nakhleh L, Warnow T, Linder CR, Tholse A, Padolina A, Sun J, Timme R. Phylogenetic networks: modeling, reconstructibility, and accuracy. IEEE/ACM Trans Comput Biol Bioinform 2004; 1:13-23. [PMID: 17048405 DOI: 10.1109/tcbb.2004.10] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.4] [Indexed: 05/12/2023]
Abstract
Phylogenetic networks model the evolutionary history of sets of organisms when events such as hybrid speciation and horizontal gene transfer occur. In spite of their widely acknowledged importance in evolutionary biology, phylogenetic networks have so far been studied mostly for specific data sets. We present a general definition of phylogenetic networks in terms of directed acyclic graphs (DAGs) and a set of conditions. Further, we distinguish between model networks and reconstructible ones and characterize the effect of extinction and taxon sampling on the reconstructibility of the network. Simulation studies are a standard technique for assessing the performance of phylogenetic methods. A main step in such studies entails quantifying the topological error between the model and inferred phylogenies. While many measures of tree topological accuracy have been proposed, none exist for phylogenetic networks. Previously, we proposed the first such measure, which applied only to a restricted class of networks. In this paper, we extend that measure to apply to all networks, and prove that it is a metric on the space of phylogenetic networks. Our results allow for the systematic study of existing network methods, and for the design of new accurate ones.
Affiliation(s)
- Bernard M E Moret
- Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, USA.
|
39
|
Abstract
Recombination can be a dominant force in shaping genomes and associated phenotypes. To better understand the impact of recombination on genomic evolution, we need to be able to identify recombination in aligned sequences. We review bioinformatic approaches for detecting recombination and measuring recombination rates. We also examine the impact of recombination on the reconstruction of evolutionary histories and the estimation of population genetic parameters. Finally, we review the role of recombination in the evolutionary history of bacteria, viruses, and human mitochondria. We conclude by highlighting a number of areas for future development of tools to help quantify the role of recombination in genomic evolution.
Affiliation(s)
- David Posada
- Variagenics Inc. Cambridge, Massachusetts 02139, USA.
|
40
|
Abstract
This paper poses the problem of estimating and validating phylogenetic trees in statistical terms. The problem is hard enough to warrant several tacks: we reason by analogy to rounding real numbers, and dealing with ranking data. These are both cases where, as in phylogeny, the parameters of interest are not real numbers. Then we pose the problem in geometrical terms, using distances and measures on a natural space of trees. We do not solve the problems of inference on tree space, but suggest some coherent ways of tackling them.
Affiliation(s)
- Susan Holmes
- Statistics Department, Stanford University, CA 94305-4065, USA.
|
41
|
Abstract
The problem of inferring confidence sets of gene trees is discussed without assuming that the substitution model or the branching pattern of any of the investigated trees is correct. In this case, widely used methods to compare genealogies can give highly contradicting results. Here, three methods to infer confidence sets that are robust against model misspecification are compared, including a new approach based on estimating the confidence in a specific tree using expected-likelihood weights. The power of the investigated methods is studied by analysing HIV-1 and mtDNA sequence data as well as simulated sequences. Finally, guidelines for choosing an appropriate method to compare multiple gene trees are provided.
Affiliation(s)
- Korbinian Strimmer
- Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK.
|
42
|
Abstract
Species-level phylogenies derived from molecular data provide an indirect record of the speciation events that have led to extant species. This offers enormous potential for investigating the general causes and rates of speciation within clades. To make the most of this potential, we should ideally sample all the species in a higher group, such as a genus, ensure that those species reflect evolutionary entities within the group, and rule out the effects of other processes, such as extinction, as explanations for observed patterns. We discuss recent practical and theoretical advances in this area and outline how future work should benefit from incorporating data from genealogical and phylogeographical scales.
|
43
|
|
44
|
|
45
|
Abstract
Intraspecific gene evolution cannot always be represented by a bifurcating tree. Rather, population genealogies are often multifurcated, descendant genes coexist with persistent ancestors, and recombination events produce reticulate relationships. Whereas traditional phylogenetic methods assume bifurcating trees, several networking approaches have recently been developed to estimate intraspecific genealogies that take into account these population-level phenomena.
|
46
|
Abstract
Methods such as maximum parsimony (MP) are frequently criticized as being statistically unsound and not being based on any "model." On the other hand, advocates of MP claim that maximum likelihood (ML) has some fundamental problems. Here, we explore the connection between the different versions of MP and ML methods, particularly in light of recent theoretical results. We describe links between the two methods; for example, we describe how MP can be regarded as an ML method when there is no common mechanism between sites (such as might occur with morphological data and certain forms of molecular data). In the process, we clarify certain historical points of disagreement between proponents of the two methodologies, including a discussion of several forms of the ML optimality criterion. We also describe some additional results that shed light on how much needs to be assumed about underlying models of sequence evolution in order to successfully reconstruct evolutionary trees.
Affiliation(s)
- M Steel
- Biomathematics Research Centre, University of Canterbury, Christchurch, New Zealand.
|