1
|
Wong TKF, Cherryh C, Rodrigo AG, Hahn MW, Minh BQ, Lanfear R. MAST: Phylogenetic Inference with Mixtures Across Sites and Trees. Syst Biol 2024; 73:375-391. [PMID: 38421146 PMCID: PMC11282360 DOI: 10.1093/sysbio/syae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Revised: 12/18/2023] [Accepted: 02/27/2024] [Indexed: 03/02/2024] Open
Abstract
Hundreds or thousands of loci are now routinely used in modern phylogenomic studies. Concatenation approaches to tree inference assume that there is a single topology for the entire dataset, but different loci may have different evolutionary histories due to incomplete lineage sorting (ILS), introgression, and/or horizontal gene transfer; even single loci may not be treelike due to recombination. To overcome this shortcoming, we introduce an implementation of a multi-tree mixture model that we call mixtures across sites and trees (MAST). This model extends a prior implementation by Boussau et al. (2009) by allowing users to estimate the weight of each of a set of pre-specified bifurcating trees in a single alignment. The MAST model allows each tree to have its own weight, topology, branch lengths, substitution model, nucleotide or amino acid frequencies, and model of rate heterogeneity across sites. We implemented the MAST model in a maximum-likelihood framework in the popular phylogenetic software, IQ-TREE. Simulations show that we can accurately recover the true model parameters, including branch lengths and tree weights for a given set of tree topologies, under a wide range of biologically realistic scenarios. We also show that we can use standard statistical inference approaches to reject a single-tree model when data are simulated under multiple trees (and vice versa). We applied the MAST model to multiple primate datasets and found that it can recover the signal of ILS in the Great Apes, as well as the asymmetry in minor trees caused by introgression among several macaque species. When applied to a dataset of 4 Platyrrhine species for which standard concatenated maximum likelihood (ML) and gene tree approaches disagree, we observe that MAST gives the highest weight (i.e., the largest proportion of sites) to the tree also supported by gene tree approaches. 
These results suggest that the MAST model is able to analyze a concatenated alignment using ML while avoiding some of the biases that come with assuming there is only a single tree. We discuss how the MAST model can be extended in the future.
Affiliation(s)
- Thomas K F Wong
- School of Computing, Australian National University, Canberra, ACT 2601, Australia
| | - Caitlin Cherryh
- Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
| | - Allen G Rodrigo
- School of Biological Sciences, University of Auckland, Auckland 1142, New Zealand
| | - Matthew W Hahn
- Department of Biology and Department of Computer Science, Indiana University, Bloomington, Indiana 47405, USA
| | - Bui Quang Minh
- School of Computing, Australian National University, Canberra, ACT 2601, Australia
| | - Robert Lanfear
- Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
|
2
|
Bozhüyük KAJ, Präve L, Kegler C, Schenk L, Kaiser S, Schelhas C, Shi YN, Kuttenlochner W, Schreiber M, Kandler J, Alanjary M, Mohiuddin TM, Groll M, Hochberg GKA, Bode HB. Evolution-inspired engineering of nonribosomal peptide synthetases. Science 2024; 383:eadg4320. [PMID: 38513038 DOI: 10.1126/science.adg4320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Accepted: 02/09/2024] [Indexed: 03/23/2024]
Abstract
Many clinically used drugs are derived from or inspired by bacterial natural products that often are produced through nonribosomal peptide synthetases (NRPSs), megasynthetases that activate and join individual amino acids in an assembly line fashion. In this work, we describe a detailed phylogenetic analysis of several bacterial NRPSs that led to the identification of yet undescribed recombination sites within the thiolation (T) domain that can be used for NRPS engineering. We then developed an evolution-inspired "eXchange Unit between T domains" (XUT) approach, which allows the assembly of NRPS fragments over a broad range of GC contents, protein similarities, and extender unit specificities, as demonstrated for the specific production of a proteasome inhibitor designed and assembled from five different NRPS fragments.
Affiliation(s)
- Kenan A J Bozhüyük
- Max Planck Institute for Terrestrial Microbiology, Department of Natural Products in Organismic Interactions, 35043 Marburg, Germany
- Molecular Biotechnology, Department of Biosciences, Goethe-University Frankfurt, 60438 Frankfurt, Germany
- Myria Biosciences AG, Tech Park Basel, Hochbergstrasse 60C, 4057 Basel, Switzerland
| | - Leonard Präve
- Max Planck Institute for Terrestrial Microbiology, Department of Natural Products in Organismic Interactions, 35043 Marburg, Germany
- Molecular Biotechnology, Department of Biosciences, Goethe-University Frankfurt, 60438 Frankfurt, Germany
| | - Carsten Kegler
- Max Planck Institute for Terrestrial Microbiology, Department of Natural Products in Organismic Interactions, 35043 Marburg, Germany
- Molecular Biotechnology, Department of Biosciences, Goethe-University Frankfurt, 60438 Frankfurt, Germany
| | - Leonie Schenk
- Max Planck Institute for Terrestrial Microbiology, Department of Natural Products in Organismic Interactions, 35043 Marburg, Germany
- Molecular Biotechnology, Department of Biosciences, Goethe-University Frankfurt, 60438 Frankfurt, Germany
| | - Sebastian Kaiser
- Max Planck Institute for Terrestrial Microbiology, Department of Natural Products in Organismic Interactions, 35043 Marburg, Germany
- Evolutionary Biochemistry Group, Max Planck Institute for Terrestrial Microbiology, 35043 Marburg, Germany
| | - Christian Schelhas
- Max Planck Institute for Terrestrial Microbiology, Department of Natural Products in Organismic Interactions, 35043 Marburg, Germany
| | - Yan-Ni Shi
- Molecular Biotechnology, Department of Biosciences, Goethe-University Frankfurt, 60438 Frankfurt, Germany
| | - Wolfgang Kuttenlochner
- Chair of Biochemistry, Center for Protein Assemblies, Technical University of Munich, Ernst-Otto-Fischer-Straße 8, 85748 Garching, Germany
| | - Max Schreiber
- Max Planck Institute for Terrestrial Microbiology, Department of Natural Products in Organismic Interactions, 35043 Marburg, Germany
- Molecular Biotechnology, Department of Biosciences, Goethe-University Frankfurt, 60438 Frankfurt, Germany
| | - Joshua Kandler
- Molecular Biotechnology, Department of Biosciences, Goethe-University Frankfurt, 60438 Frankfurt, Germany
| | - Mohammad Alanjary
- Bioinformatics Group, Wageningen University, Droevendaalsesteeg 1, 6708PB Wageningen, The Netherlands
| | - T M Mohiuddin
- Molecular Biotechnology, Department of Biosciences, Goethe-University Frankfurt, 60438 Frankfurt, Germany
| | - Michael Groll
- Chair of Biochemistry, Center for Protein Assemblies, Technical University of Munich, Ernst-Otto-Fischer-Straße 8, 85748 Garching, Germany
| | - Georg K A Hochberg
- Evolutionary Biochemistry Group, Max Planck Institute for Terrestrial Microbiology, 35043 Marburg, Germany
- Center for Synthetic Microbiology (SYNMIKRO), Philipps University Marburg, 35043 Marburg, Germany
- Department of Chemistry, Philipps University Marburg, 35043 Marburg, Germany
| | - Helge B Bode
- Max Planck Institute for Terrestrial Microbiology, Department of Natural Products in Organismic Interactions, 35043 Marburg, Germany
- Molecular Biotechnology, Department of Biosciences, Goethe-University Frankfurt, 60438 Frankfurt, Germany
- Center for Synthetic Microbiology (SYNMIKRO), Philipps University Marburg, 35043 Marburg, Germany
- Department of Chemistry, Philipps University Marburg, 35043 Marburg, Germany
- LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG) & Senckenberg Gesellschaft für Naturforschung, 60325 Frankfurt, Germany
|
3
|
Pluta A, Rola-Łuszczak M, Hoffmann FG, Donnik I, Petropavlovskiy M, Kuźmak J. Genetic Variability of Bovine Leukemia Virus: Evidence of Dual Infection, Recombination and Quasi-Species. Pathogens 2024; 13:178. [PMID: 38392916 PMCID: PMC10893129 DOI: 10.3390/pathogens13020178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 01/23/2024] [Accepted: 02/13/2024] [Indexed: 02/25/2024] Open
Abstract
We have characterized the intrahost genetic variation in the bovine leukemia virus (BLV) by examining 16 BLV isolates originating from the Western Siberia-Tyumen and South Ural-Chelyabinsk regions of Russia. Our research focused on determining the genetic composition of an 804 bp fragment of the BLV env gene, encoding for the entire gp51 protein. The results provide the first indication of the quasi-species genetic nature of BLV infection and its relevance for genome-level variation. Furthermore, this is the first phylogenetic evidence for the existence of a dual infection with BLV strains belonging to different genotypes within the same host: G4 and G7. We identified eight cases of recombination between these two BLV genotypes. The detection of quasi-species with cases of dual infection and recombination indicated a higher potential of BLV for genetic variability at the intra-host level than was previously considered.
Affiliation(s)
- Aneta Pluta
- Department of Biochemistry, National Veterinary Research Institute, 24-100 Puławy, Poland
| | - Marzena Rola-Łuszczak
- Department of Biochemistry, National Veterinary Research Institute, 24-100 Puławy, Poland
| | - Federico G. Hoffmann
- Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology, Mississippi State University, Starkville, MS 39762, USA
- Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Starkville, MS 39762, USA
| | - Irina Donnik
- Ural State Agrarian University, Ekaterinburg 620075, Russia
| | - Maxim Petropavlovskiy
- Ural Federal Agrarian Scientific Research Centre of the Ural Branch of the Russian Academy of Sciences, Ekaterinburg 620049, Russia
| | - Jacek Kuźmak
- Department of Biochemistry, National Veterinary Research Institute, 24-100 Puławy, Poland
|
4
|
Patané JSL, Martins J, Setubal JC. A Guide to Phylogenomic Inference. Methods Mol Biol 2024; 2802:267-345. [PMID: 38819564 DOI: 10.1007/978-1-0716-3838-5_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. Phylogenomics has significant applications in fields such as evolutionary biology, systematics, comparative genomics, and conservation genetics, providing valuable insights into the origins and relationships of species and contributing to our understanding of biological diversity and evolution. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
- Laboratório de Genética e Cardiologia Molecular, Instituto do Coração/Heart Institute Hospital das Clínicas - Faculdade de Medicina da Universidade de São Paulo São Paulo, São Paulo, SP, Brazil
| | - Joaquim Martins
- Integrative Omics group, Biorenewables National Laboratory, Brazilian Center for Research in Energy and Materials, Campinas, SP, Brazil
| | - João Carlos Setubal
- Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, SP, Brazil
|
5
|
Gluck-Thaler E, Cerutti A, Perez-Quintero AL, Butchacas J, Roman-Reyna V, Madhavan VN, Shantharaj D, Merfa MV, Pesce C, Jauneau A, Vancheva T, Lang JM, Allen C, Verdier V, Gagnevin L, Szurek B, Beckham GT, De La Fuente L, Patel HK, Sonti RV, Bragard C, Leach JE, Noël LD, Slot JC, Koebnik R, Jacobs JM. Repeated gain and loss of a single gene modulates the evolution of vascular plant pathogen lifestyles. Sci Adv 2020; 6:eabc4516. [PMID: 33188025 PMCID: PMC7673761 DOI: 10.1126/sciadv.abc4516] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 04/24/2020] [Accepted: 09/30/2020] [Indexed: 05/21/2023]
Abstract
Vascular plant pathogens travel long distances through host veins, leading to life-threatening, systemic infections. In contrast, nonvascular pathogens remain restricted to infection sites, triggering localized symptom development. The contrasting features of vascular and nonvascular diseases suggest distinct etiologies, but the basis for each remains unclear. Here, we show that the hydrolase CbsA acts as a phenotypic switch between vascular and nonvascular plant pathogenesis. cbsA was enriched in genomes of vascular phytopathogenic bacteria in the family Xanthomonadaceae and absent in most nonvascular species. CbsA expression allowed nonvascular Xanthomonas to cause vascular blight, while cbsA mutagenesis resulted in reduction of vascular or enhanced nonvascular symptom development. Phylogenetic hypothesis testing further revealed that cbsA was lost in multiple nonvascular lineages and more recently gained by some vascular subgroups, suggesting that vascular pathogenesis is ancestral. Our results overall demonstrate how the gain and loss of single loci can facilitate the evolution of complex ecological traits.
Affiliation(s)
- Emile Gluck-Thaler
- Department of Plant Pathology, The Ohio State University, Columbus, OH 43210, USA
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Aude Cerutti
- LIPM, Université de Toulouse, INRAE, CNRS, Université Paul Sabatier, Castanet-Tolosan, France
| | | | - Jules Butchacas
- Department of Plant Pathology, The Ohio State University, Columbus, OH 43210, USA
- Infectious Disease Institute, The Ohio State University, Columbus, OH 43210, USA
| | - Verónica Roman-Reyna
- Department of Plant Pathology, The Ohio State University, Columbus, OH 43210, USA
- Infectious Disease Institute, The Ohio State University, Columbus, OH 43210, USA
| | | | - Deepak Shantharaj
- Department of Entomology and Plant Pathology, Auburn University, Auburn, AL 36849, USA
| | - Marcus V Merfa
- Department of Entomology and Plant Pathology, Auburn University, Auburn, AL 36849, USA
| | - Céline Pesce
- IRD, CIRAD, Université Montpellier, IPME, Montpellier, France
- Earth & Life Institute, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
- HM Clause (Limagrain group), Davis, CA, 95618, USA
| | - Alain Jauneau
- Institut Fédératif de Recherche 3450, Plateforme Imagerie, Pôle de Biotechnologie Végétale, Castanet-Tolosan, France
| | - Taca Vancheva
- IRD, CIRAD, Université Montpellier, IPME, Montpellier, France
- Earth & Life Institute, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
| | - Jillian M Lang
- Agricultural Biology, Colorado State University, Fort Collins, CO, USA
| | - Caitilyn Allen
- Department of Plant Pathology, University of Wisconsin-Madison, Madison, WI 53706, USA
| | - Valerie Verdier
- IRD, CIRAD, Université Montpellier, IPME, Montpellier, France
| | - Lionel Gagnevin
- IRD, CIRAD, Université Montpellier, IPME, Montpellier, France
| | - Boris Szurek
- IRD, CIRAD, Université Montpellier, IPME, Montpellier, France
| | - Gregg T Beckham
- Renewable Resources and Enabling Sciences Center, National Renewable Energy Laboratory, Golden, CO 80401, USA
| | - Leonardo De La Fuente
- Department of Entomology and Plant Pathology, Auburn University, Auburn, AL 36849, USA
| | | | - Ramesh V Sonti
- CSIR-Centre for Cellular and Molecular Biology, Hyderabad 500007, India
| | - Claude Bragard
- Earth & Life Institute, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
| | - Jan E Leach
- Agricultural Biology, Colorado State University, Fort Collins, CO, USA
| | - Laurent D Noël
- LIPM, Université de Toulouse, INRAE, CNRS, Université Paul Sabatier, Castanet-Tolosan, France
| | - Jason C Slot
- Department of Plant Pathology, The Ohio State University, Columbus, OH 43210, USA
- Infectious Disease Institute, The Ohio State University, Columbus, OH 43210, USA
| | - Ralf Koebnik
- IRD, CIRAD, Université Montpellier, IPME, Montpellier, France.
| | - Jonathan M Jacobs
- Department of Plant Pathology, The Ohio State University, Columbus, OH 43210, USA.
- Infectious Disease Institute, The Ohio State University, Columbus, OH 43210, USA
|
6
|
Smith SA, Walker-Hale N, Walker JF. Intragenic Conflict in Phylogenomic Data Sets. Mol Biol Evol 2020; 37:3380-3388. [DOI: 10.1093/molbev/msaa170] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023] Open
Abstract
Most phylogenetic analyses assume that a single evolutionary history underlies one gene. However, both biological processes and errors can cause intragenic conflict. The extent to which this conflict is present in empirical data sets is not well documented, but if common, could have far-reaching implications for phylogenetic analyses. We examined several large phylogenomic data sets from diverse taxa using a fast and simple method to identify well-supported intragenic conflict. We found conflict to be highly variable between data sets, from 1% to >92% of genes investigated. We analyzed four exemplar genes in detail and analyzed simulated data under several scenarios. Our results suggest that alignment error may be one major source of conflict, but other conflicts remain unexplained and may represent biological signal or other errors. Whether as part of data analysis pipelines or to explore biological processes, analyses of within-gene phylogenetic signal should become common.
Affiliation(s)
- Stephen A Smith
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
| | | | - Joseph F Walker
- The Sainsbury Laboratory (SLCU), University of Cambridge, Cambridge, United Kingdom
|
7
|
Allman ES, Kubatko LS, Rhodes JA. Split Scores: A Tool to Quantify Phylogenetic Signal in Genome-Scale Data. Syst Biol 2017; 66:620-636. [PMID: 28123114 DOI: 10.1093/sysbio/syw103] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 08/02/2016] [Accepted: 10/28/2016] [Indexed: 11/14/2022] Open
Abstract
Detecting variation in the evolutionary process along chromosomes is increasingly important as whole-genome data become more widely available. For example, factors such as incomplete lineage sorting, horizontal gene transfer, and chromosomal inversion are expected to result in changes in the underlying gene trees along a chromosome, while changes in selective pressure and mutational rates for different genomic regions may lead to shifts in the underlying mutational process. We propose the split score as a general method for quantifying support for a particular phylogenetic relationship within a genomic data set. Because the split score is based on algebraic properties of a matrix of site pattern frequencies, it can be rapidly computed, even for data sets that are large in the number of taxa and/or in the length of the alignment, providing an advantage over other methods (e.g., maximum likelihood) that are often used to assess such support. Using simulation, we explore the properties of the split score, including its dependence on sequence length, branch length, size of a split and its ability to detect true splits in the underlying tree. Using a sliding window analysis, we show that split scores can be used to detect changes in the underlying evolutionary process for genome-scale data from primates, mosquitoes, and viruses in a computationally efficient manner. Computation of the split score has been implemented in the software package SplitSup.
Affiliation(s)
- Elizabeth S Allman
- Department of Mathematics and Statistics, PO Box 756660, University of Alaska Fairbanks, Fairbanks, AK 99775-6660, USA
| | - Laura S Kubatko
- Department of Statistics and Department of Evolution, Ecology, and Organismal Biology, The Ohio State University, Columbus, OH 43210, USA
| | - John A Rhodes
- Department of Mathematics and Statistics, PO Box 756660, University of Alaska Fairbanks, Fairbanks, AK 99775-6660, USA
|
8
|
Abstract
Phylogenomics aims at reconstructing the evolutionary histories of organisms taking into account whole genomes or large fractions of genomes. The abundance of genomic data for an enormous variety of organisms has enabled phylogenomic inference of many groups, and this has motivated the development of many computer programs implementing the associated methods. This chapter surveys phylogenetic concepts and methods aimed at both gene tree and species tree reconstruction while also addressing common pitfalls, providing references to relevant computer programs. A practical phylogenomic analysis example including bacterial genomes is presented at the end of the chapter.
Affiliation(s)
- José S L Patané
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil
| | - Joaquim Martins
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil
| | - João C Setubal
- Department of Biochemistry, Institute of Chemistry, University of São Paulo, Av. Prof. Lineu Prestes 748, São Paulo, SP, 05508-000, Brazil.
|
9
|
Abstract
This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.
Affiliation(s)
- Gergely J Szöllősi
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
| | - Eric Tannier
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
| | - Vincent Daubin
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
| | - Bastien Boussau
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
|
10
|
Höhna S, Heath TA, Boussau B, Landis MJ, Ronquist F, Huelsenbeck JP. Probabilistic graphical model representation in phylogenetics. Syst Biol 2014; 63:753-71. [PMID: 24951559 PMCID: PMC4184382 DOI: 10.1093/sysbio/syu039] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Indexed: 11/15/2022] Open
Abstract
Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution. [Computation; graphical models; inference; modularization; statistical phylogenetics; tree plate.]
Affiliation(s)
- Sebastian Höhna
- Department of Mathematics, Stockholm University, Stockholm, SE-106 91 Stockholm, Sweden; Department of Evolution and Ecology, University of California, Davis, Storer Hall, One Shields Avenue, Davis, CA 95616, USA; Department of Integrative Biology, University of California, Berkeley, CA 94720, USA; Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA; Bioinformatics and Evolutionary Genomics, Université de Lyon, Villeurbanne, France; Department of Bioinformatics and Genetics, Swedish Museum of Natural History, SE-10405 Stockholm, Sweden; and Department of Biological Science, King Abdulaziz University, Jeddah, Saudi Arabia;Department of Mathematics, Stockholm University, Stockholm, SE-106 91 Stockholm, Sweden; Department of Evolution and Ecology, University of California, Davis, Storer Hall, One Shields Avenue, Davis, CA 95616, USA; Department of Integrative Biology, University of California, Berkeley, CA 94720, USA; Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA; Bioinformatics and Evolutionary Genomics, Université de Lyon, Villeurbanne, France; Department of Bioinformatics and Genetics, Swedish Museum of Natural History, SE-10405 Stockholm, Sweden; and Department of Biological Science, King Abdulaziz University, Jeddah, Saudi Arabia;
| | - Tracy A Heath
- Department of Mathematics, Stockholm University, Stockholm, SE-106 91 Stockholm, Sweden; Department of Evolution and Ecology, University of California, Davis, Storer Hall, One Shields Avenue, Davis, CA 95616, USA; Department of Integrative Biology, University of California, Berkeley, CA 94720, USA; Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA; Bioinformatics and Evolutionary Genomics, Université de Lyon, Villeurbanne, France; Department of Bioinformatics and Genetics, Swedish Museum of Natural History, SE-10405 Stockholm, Sweden; and Department of Biological Science, King Abdulaziz University, Jeddah, Saudi Arabia;Department of Mathematics, Stockholm University, Stockholm, SE-106 91 Stockholm, Sweden; Department of Evolution and Ecology, University of California, Davis, Storer Hall, One Shields Avenue, Davis, CA 95616, USA; Department of Integrative Biology, University of California, Berkeley, CA 94720, USA; Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA; Bioinformatics and Evolutionary Genomics, Université de Lyon, Villeurbanne, France; Department of Bioinformatics and Genetics, Swedish Museum of Natural History, SE-10405 Stockholm, Sweden; and Department of Biological Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Bastien Boussau
- Department of Mathematics, Stockholm University, Stockholm, SE-106 91 Stockholm, Sweden; Department of Evolution and Ecology, University of California, Davis, Storer Hall, One Shields Avenue, Davis, CA 95616, USA; Department of Integrative Biology, University of California, Berkeley, CA 94720, USA; Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA; Bioinformatics and Evolutionary Genomics, Université de Lyon, Villeurbanne, France; Department of Bioinformatics and Genetics, Swedish Museum of Natural History, SE-10405 Stockholm, Sweden; and Department of Biological Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Michael J Landis
- Department of Mathematics, Stockholm University, Stockholm, SE-106 91 Stockholm, Sweden; Department of Evolution and Ecology, University of California, Davis, Storer Hall, One Shields Avenue, Davis, CA 95616, USA; Department of Integrative Biology, University of California, Berkeley, CA 94720, USA; Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA; Bioinformatics and Evolutionary Genomics, Université de Lyon, Villeurbanne, France; Department of Bioinformatics and Genetics, Swedish Museum of Natural History, SE-10405 Stockholm, Sweden; and Department of Biological Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Fredrik Ronquist
- Department of Mathematics, Stockholm University, Stockholm, SE-106 91 Stockholm, Sweden; Department of Evolution and Ecology, University of California, Davis, Storer Hall, One Shields Avenue, Davis, CA 95616, USA; Department of Integrative Biology, University of California, Berkeley, CA 94720, USA; Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA; Bioinformatics and Evolutionary Genomics, Université de Lyon, Villeurbanne, France; Department of Bioinformatics and Genetics, Swedish Museum of Natural History, SE-10405 Stockholm, Sweden; and Department of Biological Science, King Abdulaziz University, Jeddah, Saudi Arabia
- John P Huelsenbeck
- Department of Mathematics, Stockholm University, Stockholm, SE-106 91 Stockholm, Sweden; Department of Evolution and Ecology, University of California, Davis, Storer Hall, One Shields Avenue, Davis, CA 95616, USA; Department of Integrative Biology, University of California, Berkeley, CA 94720, USA; Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045, USA; Bioinformatics and Evolutionary Genomics, Université de Lyon, Villeurbanne, France; Department of Bioinformatics and Genetics, Swedish Museum of Natural History, SE-10405 Stockholm, Sweden; and Department of Biological Science, King Abdulaziz University, Jeddah, Saudi Arabia
11
Doyle VP, Andersen JJ, Nelson BJ, Metzker ML, Brown JM. Untangling the influences of unmodeled evolutionary processes on phylogenetic signal in a forensically important HIV-1 transmission cluster. Mol Phylogenet Evol 2014; 75:126-37. [PMID: 24589520 DOI: 10.1016/j.ympev.2014.02.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/06/2013] [Revised: 02/17/2014] [Accepted: 02/19/2014] [Indexed: 11/28/2022]
Abstract
Stochastic models of sequence evolution have been developed to reflect many biologically important processes, allowing for accurate phylogenetic reconstruction when an appropriate model is selected. However, commonly used models do not incorporate several potentially important biological processes. Spurious phylogenetic inference may result if these processes play an important role in the evolution of a dataset yet are not incorporated into assumed models. Few studies have attempted to assess the relative importance of multiple processes in producing spurious inferences. The application of phylogenetic methods to infer the source of HIV-1 transmission clusters depends upon accurate phylogenetic results, yet there are several relevant unmodeled biological processes (e.g., recombination and convergence) that may cause complications. Here, through analyses of HIV-1 env sequences from a small, forensically important transmission cluster, we tease apart the impact of these processes and present evidence suggesting that convergent evolution and high rates of insertions and deletions (causing alignment uncertainty) led to spurious phylogenetic signal with forensic relevance. Previous analyses show paraphyly of HIV-1 lineages sampled from an individual who, based on non-phylogenetic evidence, had never acted as a source of infection for others in this transmission cluster. If true, this pattern calls into question assumptions underlying phylogenetic approaches to source and recipient identification. By systematically assessing the contribution of different unmodeled processes, we demonstrate that removal of sites likely influenced by strong positive selection both reduces the alignment-wide signal supporting paraphyly of viruses sampled from this individual and eliminates support for the effects of recombination. Additionally, the removal of ambiguously aligned sites alters strongly supported relationships among viruses sampled from different individuals. 
These observations highlight the need to jointly consider multiple unmodeled evolutionary processes and motivate a phylogenomic perspective when inferring viral transmission histories.
Affiliation(s)
- Vinson P Doyle
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- John J Andersen
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Bradley J Nelson
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
- Michael L Metzker
- Human Genome Sequencing Center, Department of Molecular and Human Genetics, and Cell and Molecular Biology Program, Baylor College of Medicine, Houston, TX, USA
- Jeremy M Brown
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA
12
Chung Y, Perna NT, Ané C. Computing the joint distribution of tree shape and tree distance for gene tree inference and recombination detection. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:1263-1274. [PMID: 24384712 DOI: 10.1109/tcbb.2013.109] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
Ancestral recombination events can cause the underlying genealogy of a site to vary along the genome. We consider Bayesian models to simultaneously detect recombination breakpoints in very long sequence alignments and estimate the phylogenetic tree of each block between breakpoints. The models we consider use a dissimilarity measure between trees in their prior distribution to favor similar trees at neighboring loci. We show empirical evidence in Enterobacteria that neighboring genomic regions have similar trees. The main hurdle in using such models is the need to properly calculate the normalizing function for the prior probabilities on trees. In this work, we quantify the impact of approximating this normalizing function as done in biomc2, a hierarchical Bayesian method to detect recombination based on distance between tree topologies. We then derive an algorithm to calculate the normalizing function exactly, for a Gibbs distribution based on the Robinson-Foulds (RF) distance between gene trees at neighboring loci. At the core is the calculation of the joint distribution of the shape of a random tree and its RF distance to a fixed tree. We also propose fast approximations to the normalizing function, which are shown to be very accurate with little impact on the Bayesian inference.
13
Galardini M, Pini F, Bazzicalupo M, Biondi EG, Mengoni A. Replicon-dependent bacterial genome evolution: the case of Sinorhizobium meliloti. Genome Biol Evol 2013; 5:542-58. [PMID: 23431003 PMCID: PMC3622305 DOI: 10.1093/gbe/evt027] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Indexed: 12/19/2022] Open
Abstract
Many bacterial species, such as the alphaproteobacterium Sinorhizobium meliloti, are characterized by open pangenomes and contain multipartite genomes consisting of a chromosome and other large-sized replicons, such as chromids, megaplasmids, and plasmids. The evolutionary forces in both functional and structural aspects that shape the pangenome of species with multipartite genomes are still poorly understood. Therefore, we sequenced the genomes of 10 new S. meliloti strains, analyzed with four publicly available additional genomic sequences. Results indicated that the three main replicons present in these strains (a chromosome, a chromid, and a megaplasmid) partly show replicon-specific behaviors related to strain differentiation. In particular, the pSymB chromid was shown to be a hot spot for positively selected genes, and, unexpectedly, genes resident in the pSymB chromid were also found to be more widespread in distant taxa than those located in the other replicons. Moreover, through the exploitation of a DNA proximity network, a series of conserved “DNA backbones” were found to shape the evolution of the genome structure, with the rest of the genome experiencing rearrangements. The presented data allow depicting a scenario where the pSymB chromid has a distinctive role in intraspecies differentiation and in evolution through positive selection, whereas the pSymA megaplasmid mostly contributes to structural fluidity and to the emergence of new functions, indicating a specific evolutionary role for each replicon in the pangenome evolution.
Affiliation(s)
- Marco Galardini
- Department of Biology, University of Firenze, Firenze, Italy
14
Schumer M, Cui R, Boussau B, Walter R, Rosenthal G, Andolfatto P. An evaluation of the hybrid speciation hypothesis for Xiphophorus clemenciae based on whole genome sequences. Evolution 2013; 67:1155-68. [PMID: 23550763 PMCID: PMC3621027 DOI: 10.1111/evo.12009] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Abstract
Once thought rare in animal taxa, hybridization has been increasingly recognized as an important and common force in animal evolution. In the past decade, a number of studies have suggested that hybridization has driven speciation in some animal groups. We investigate the signature of hybridization in the genome of a putative hybrid species, Xiphophorus clemenciae, through whole genome sequencing of this species and its hypothesized progenitors. Based on analysis of this data, we find that X. clemenciae is unlikely to have been derived from admixture between its proposed parental species. However, we find significant evidence for recent gene flow between Xiphophorus species. Although we detect genetic exchange in two pairs of species analyzed, the proportion of genomic regions that can be attributed to hybrid origin is small, suggesting that strong behavioral premating isolation prevents frequent hybridization in Xiphophorus. The direction of gene flow between species is potentially consistent with a role for sexual selection in mediating hybridization.
Affiliation(s)
- Molly Schumer
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, New Jersey 08544, USA.
15
Garzón-Martínez GA, Zhu ZI, Landsman D, Barrero LS, Mariño-Ramírez L. The Physalis peruviana leaf transcriptome: assembly, annotation and gene model prediction. BMC Genomics 2012; 13:151. [PMID: 22533342 PMCID: PMC3488962 DOI: 10.1186/1471-2164-13-151] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Received: 11/23/2011] [Accepted: 04/25/2012] [Indexed: 11/16/2022] Open
Abstract
Background: Physalis peruviana commonly known as Cape gooseberry is a member of the Solanaceae family that has an increasing popularity due to its nutritional and medicinal values. A broad range of genomic tools is available for other Solanaceae, including tomato and potato. However, limited genomic resources are currently available for Cape gooseberry. Results: We report the generation of a total of 652,614 P. peruviana Expressed Sequence Tags (ESTs), using 454 GS FLX Titanium technology. ESTs, with an average length of 371 bp, were obtained from a normalized leaf cDNA library prepared using a Colombian commercial variety. De novo assembling was performed to generate a collection of 24,014 isotigs and 110,921 singletons, with an average length of 1,638 bp and 354 bp, respectively. Functional annotation was performed using NCBI’s BLAST tools and Blast2GO, which identified putative functions for 21,191 assembled sequences, including gene families involved in all the major biological processes and molecular functions as well as defense response and amino acid metabolism pathways. Gene model predictions in P. peruviana were obtained by using the genomes of Solanum lycopersicum (tomato) and Solanum tuberosum (potato). We predict 9,436 P. peruviana sequences with multiple-exon models and conserved intron positions with respect to the potato and tomato genomes. Additionally, to study species diversity we developed 5,971 SSR markers from assembled ESTs. Conclusions: We present the first comprehensive analysis of the Physalis peruviana leaf transcriptome, which will provide valuable resources for development of genetic tools in the species. Assembled transcripts with gene models could serve as potential candidates for marker discovery with a variety of applications including: functional diversity, conservation and improvement to increase productivity and fruit quality. P. peruviana was estimated to be phylogenetically branched out before the divergence of five other Solanaceae family members, S. lycopersicum, S. tuberosum, Capsicum spp, S. melongena and Petunia spp.
Affiliation(s)
- Gina A Garzón-Martínez
- Plant Molecular Genetics Laboratory, Center of Biotechnology and Bioindustry (CBB), Colombian Corporation for Agricultural Research (CORPOICA), Bogota, Colombia
16
Ané C. Detecting phylogenetic breakpoints and discordance from genome-wide alignments for species tree reconstruction. Genome Biol Evol 2011; 3:246-58. [PMID: 21362638 PMCID: PMC3070431 DOI: 10.1093/gbe/evr013] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022] Open
Abstract
With the easy acquisition of sequence data, it is now possible to obtain and align whole genomes across multiple related species or populations. In this work, I assess the performance of a statistical method to reconstruct the whole distribution of phylogenetic trees along the genome, estimate the proportion of the genome for which a given clade is true, and infer a concordance tree that summarizes the dominant vertical inheritance pattern. There are two main issues when dealing with whole-genome alignments, as opposed to multiple genes: the size of the data and the detection of recombination breakpoints. These breakpoints partition the genomic alignment into phylogenetically homogeneous loci, where sites within a given locus all share the same phylogenetic tree topology. To delimitate these loci, I describe here a method based on the minimum description length (MDL) principle, implemented with dynamic programming for computational efficiency. Simulations show that combining MDL partitioning with Bayesian concordance analysis provides an efficient and robust way to estimate both the vertical inheritance signal and the horizontal phylogenetic signal. The method performed well both in the presence of incomplete lineage sorting and in the presence of horizontal gene transfer. A high level of systematic bias was found here, highlighting the need for good individual tree building methods, which form the basis for more elaborate gene tree/species tree reconciliation methods.
Affiliation(s)
- Cécile Ané
- Departments of Statistics and Botany, University of Wisconsin-Madison, USA.