1
Veith T, Bleicker T, Eschbach-Bludau M, Brünink S, Mühlemann B, Schneider J, Beheim-Schwarzbach J, Rakotondranary SJ, Ratovonamana YR, Tsagnangara C, Ernest R, Randriantafika F, Sommer S, Stetter N, Jones TC, Drosten C, Ganzhorn JU, Corman VM. Non-structural genes of novel lemur adenoviruses reveal codivergence of virus and host. Virus Evol 2023; 9:vead024. [PMID: 37091898 PMCID: PMC10121206 DOI: 10.1093/ve/vead024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 03/06/2023] [Accepted: 03/27/2023] [Indexed: 03/29/2023] Open
Abstract
Adenoviruses (AdVs) are important human and animal pathogens and are frequently used as vectors for gene therapy and vaccine delivery. Surprisingly, there are only scant data regarding primate AdV origin and evolution, especially in the most basal primate hosts. We detect and sequence AdVs from faeces of two Madagascan lemur species. Complete genome sequence analyses define a new AdV species with a particularly large gene encoding a protein of unknown function in the early gene region 3. Unexpectedly, the new AdV species is not most similar to human or other simian AdVs but to bat adenovirus C. Genome characterisation shows signals of virus-host codivergence in non-structural genes, which show lower diversity than structural genes. Outside a lemur species mixing zone, recombination less frequently separates structural genes, as in human adenovirus C. The evolutionary history of lemur AdVs likely involves both a host switch and codivergence with the lemur hosts.
Affiliation(s)
- Talitha Veith
- Institute of Virology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Charitéplatz 1, Berlin 10117, Germany
- Tobias Bleicker
- Institute of Virology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Charitéplatz 1, Berlin 10117, Germany
- Monika Eschbach-Bludau
- Institute of Virology, University Hospital, University of Bonn, Venusberg-Campus 1, Bonn 53127, Germany
- Sebastian Brünink
- Institute of Virology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Charitéplatz 1, Berlin 10117, Germany
- Barbara Mühlemann
- Institute of Virology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Charitéplatz 1, Berlin 10117, Germany
- German Centre for Infection Research (DZIF), Partner Site Berlin, Charitéplatz 1, Berlin 10117, Germany
- Julia Schneider
- Institute of Virology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Charitéplatz 1, Berlin 10117, Germany
- German Centre for Infection Research (DZIF), Partner Site Berlin, Charitéplatz 1, Berlin 10117, Germany
- Jörn Beheim-Schwarzbach
- Institute of Virology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Charitéplatz 1, Berlin 10117, Germany
- S Jacques Rakotondranary
- Institute of Cell and Systems Biology of Animals, Universität Hamburg, Martin-Luther-King Platz 3, Hamburg 20146, Germany
- Département Biologie Animale, Faculté des Sciences, Université d’Antananarivo, P.O. Box 906, Antananarivo 101, Madagascar
- Yedidya R Ratovonamana
- Institute of Cell and Systems Biology of Animals, Universität Hamburg, Martin-Luther-King Platz 3, Hamburg 20146, Germany
- Département Biologie Animale, Faculté des Sciences, Université d’Antananarivo, P.O. Box 906, Antananarivo 101, Madagascar
- Cedric Tsagnangara
- Tropical Biodiversity and Social Enterprise SARL, Immeuble CNAPS, premier étage, Fort Dauphin 614, Madagascar
- Refaly Ernest
- Tropical Biodiversity and Social Enterprise SARL, Immeuble CNAPS, premier étage, Fort Dauphin 614, Madagascar
- Simone Sommer
- Institute of Evolutionary Ecology and Conservation Genomics, University of Ulm, Albert-Einstein Allee 11, Ulm 89069, Germany
- Nadine Stetter
- Institute of Cell and Systems Biology of Animals, Universität Hamburg, Martin-Luther-King Platz 3, Hamburg 20146, Germany
- Bernhard Nocht Institute for Tropical Medicine, Bernhard-Nocht-Straße 74, Hamburg 20359, Germany
- Terry C Jones
- Institute of Virology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Charitéplatz 1, Berlin 10117, Germany
- Centre for Pathogen Evolution, Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK
- Christian Drosten
- Institute of Virology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Charitéplatz 1, Berlin 10117, Germany
- German Centre for Infection Research (DZIF), Partner Site Berlin, Charitéplatz 1, Berlin 10117, Germany
- Jörg U Ganzhorn
- Institute of Cell and Systems Biology of Animals, Universität Hamburg, Martin-Luther-King Platz 3, Hamburg 20146, Germany
- Victor M Corman
- Institute of Virology, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Charitéplatz 1, Berlin 10117, Germany
- German Centre for Infection Research (DZIF), Partner Site Berlin, Charitéplatz 1, Berlin 10117, Germany
- Labor Berlin, Charité—Vivantes GmbH, Sylter Straße 2, Berlin 13353, Germany
2
Jacob Machado D, Scott R, Guirales S, Janies DA. Fundamental evolution of all Orthocoronavirinae including three deadly lineages descendent from Chiroptera-hosted coronaviruses: SARS-CoV, MERS-CoV and SARS-CoV-2. Cladistics 2021; 37:461-488. [PMID: 34570933 PMCID: PMC8239696 DOI: 10.1111/cla.12454] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 02/24/2021] [Indexed: 12/14/2022] Open
Abstract
The severe acute respiratory syndrome coronavirus (SARS-CoV) emerged in humans in 2002. Despite reports showing Chiroptera as the original animal reservoir of SARS-CoV, many argue that Carnivora-hosted viruses are the most likely origin. The emergence of the Middle East respiratory syndrome coronavirus (MERS-CoV) in 2012 also involves Chiroptera-hosted lineages. However, factors such as the lack of comprehensive phylogenies hamper our understanding of host shifts once MERS-CoV emerged in humans and Artiodactyla. Since 2019, the origin of SARS-CoV-2, the causative agent of coronavirus disease 2019 (COVID-19), has added to this episodic history of zoonotic transmission events. Here we introduce a phylogenetic analysis of 2006 unique and complete genomes of different lineages of Orthocoronavirinae. We used gene annotations to align orthologous sequences for total evidence analysis under the parsimony optimality criterion. Deltacoronavirus and Gammacoronavirus were set as outgroups to understand spillovers of Alphacoronavirus and Betacoronavirus among ten orders of animals. We corroborated that Chiroptera-hosted viruses are the sister group of SARS-CoV, SARS-CoV-2 and MERS-related viruses. Other zoonotic events were qualified and quantified to provide a comprehensive picture of the risk of coronavirus emergence among humans. Finally, we used a dataset of 250 SARS-CoV-2 genomes to elucidate the phylogenetic relationship between SARS-CoV-2 and Chiroptera-hosted coronaviruses.
Affiliation(s)
- Denis Jacob Machado
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9331 Robert D. Snyder Rd, Charlotte, NC 28223, USA
- Rachel Scott
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9331 Robert D. Snyder Rd, Charlotte, NC 28223, USA
- Sayal Guirales
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9331 Robert D. Snyder Rd, Charlotte, NC 28223, USA
- Daniel A. Janies
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, 9331 Robert D. Snyder Rd, Charlotte, NC 28223, USA
3
Smith SA, Walker-Hale N, Walker JF. Intragenic Conflict in Phylogenomic Data Sets. Mol Biol Evol 2020; 37:3380-3388. [DOI: 10.1093/molbev/msaa170] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023] Open
Abstract
Most phylogenetic analyses assume that a single evolutionary history underlies one gene. However, both biological processes and errors can cause intragenic conflict. The extent to which this conflict is present in empirical data sets is not well documented, but if common, could have far-reaching implications for phylogenetic analyses. We examined several large phylogenomic data sets from diverse taxa using a fast and simple method to identify well-supported intragenic conflict. We found conflict to be highly variable between data sets, from 1% to >92% of genes investigated. We analyzed four exemplar genes in detail and analyzed simulated data under several scenarios. Our results suggest that alignment error may be one major source of conflict, but other conflicts remain unexplained and may represent biological signal or other errors. Whether as part of data analysis pipelines or to explore biological processes, analyses of within-gene phylogenetic signal should become common.
Affiliation(s)
- Stephen A Smith
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI
- Joseph F Walker
- The Sainsbury Laboratory (SLCU), University of Cambridge, Cambridge, United Kingdom
4
Gurtler V, Grando D, Kumar BK, Maiti B, Karunasagar I, Karunasagar I. The Use of Recombined Ribosomal RNA Operon (rrn) Type-Specific Flanking Genes to Investigate rrn Differences Between Vibrio parahaemolyticus Environmental and Clinical Strains. Gene Reports 2016. [DOI: 10.1016/j.genrep.2016.02.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/22/2022]
5
Abstract
This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.
Affiliation(s)
- Gergely J Szöllősi
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
- Eric Tannier
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
- Vincent Daubin
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
- Bastien Boussau
- ELTE-MTA "Lendület" Biophysics Research Group, Pázmány P. stny. 1A., 1117 Budapest, Hungary; Laboratoire de Biométrie et Biologie Evolutive, Centre National de la Recherche Scientifique, Unité Mixte de Recherche 5558, Université Lyon 1, F-69622 Villeurbanne, France; Université de Lyon, F-69000 Lyon, France; and Institut National de Recherche en Informatique et en Automatique Rhône-Alpes, F-38334 Montbonnot, France
6
Grummer JA, Bryson RW, Reeder TW. Species Delimitation Using Bayes Factors: Simulations and Application to the Sceloporus scalaris Species Group (Squamata: Phrynosomatidae). Syst Biol 2013; 63:119-33. [DOI: 10.1093/sysbio/syt069] [Citation(s) in RCA: 201] [Impact Index Per Article: 18.3] [Indexed: 02/06/2023] Open
Affiliation(s)
- Jared A. Grummer
- Department of Biology, San Diego State University, San Diego, CA 92182-4614, USA and Department of Biology and Burke Museum of Natural History and Culture, University of Washington, Box 351800, Seattle, WA 98195-1800, USA
- Robert W. Bryson
- Department of Biology, San Diego State University, San Diego, CA 92182-4614, USA and Department of Biology and Burke Museum of Natural History and Culture, University of Washington, Box 351800, Seattle, WA 98195-1800, USA
- Tod W. Reeder
- Department of Biology, San Diego State University, San Diego, CA 92182-4614, USA and Department of Biology and Burke Museum of Natural History and Culture, University of Washington, Box 351800, Seattle, WA 98195-1800, USA
7
Collective phenomena and non-finite state computation in a human social system. PLoS One 2013; 8:e75818. [PMID: 24130745 PMCID: PMC3794014 DOI: 10.1371/journal.pone.0075818] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 03/03/2013] [Accepted: 08/21/2013] [Indexed: 11/29/2022] Open
Abstract
We investigate the computational structure of a paradigmatic example of distributed social interaction: that of the open-source Wikipedia community. We examine the statistical properties of its cooperative behavior, and perform model selection to determine whether this aspect of the system can be described by a finite-state process, or whether reference to an effectively unbounded resource allows for a more parsimonious description. We find strong evidence, in a majority of the most-edited pages, in favor of a collective-state model, where the probability of a “revert” action declines as the square root of the number of non-revert actions seen since the last revert. We provide evidence that the emergence of this social counter is driven by collective interaction effects, rather than properties of individual users.
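The collective-state model described in this abstract can be illustrated with a minimal sketch. The abstract states only that the revert probability declines as the square root of the number of non-revert actions since the last revert; the exact functional form used here, p0 / sqrt(k + 1), the baseline p0, and the function names are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def revert_probability(k, p0=0.5):
    """Hypothetical revert probability after k non-revert actions.

    The +1 offset (an assumption) keeps the value finite at k = 0.
    """
    return p0 / math.sqrt(k + 1)


def simulate_actions(n_actions, p0=0.5, seed=0):
    """Simulate a sequence of 'R' (revert) and 'C' (non-revert) actions.

    The counter k is the unbounded collective state: it grows with every
    non-revert action and resets to zero after a revert.
    """
    rng = random.Random(seed)
    k = 0
    actions = []
    for _ in range(n_actions):
        if rng.random() < revert_probability(k, p0):
            actions.append("R")
            k = 0
        else:
            actions.append("C")
            k += 1
    return actions
```

Because the counter k has no upper bound, a process of this kind cannot be represented exactly by a fixed number of states, which is the contrast with finite-state descriptions that the abstract draws.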
8
Chung Y, Perna NT, Ané C. Computing the joint distribution of tree shape and tree distance for gene tree inference and recombination detection. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1263-1274. [PMID: 24384712 DOI: 10.1109/tcbb.2013.109] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
Ancestral recombination events can cause the underlying genealogy of a site to vary along the genome. We consider Bayesian models to simultaneously detect recombination breakpoints in very long sequence alignments and estimate the phylogenetic tree of each block between breakpoints. The models we consider use a dissimilarity measure between trees in their prior distribution to favor similar trees at neighboring loci. We show empirical evidence in Enterobacteria that neighboring genomic regions have similar trees. The main hurdle in using such models is the need to properly calculate the normalizing function for the prior probabilities on trees. In this work, we quantify the impact of approximating this normalizing function as done in biomc2, a hierarchical Bayesian method to detect recombination based on distance between tree topologies. We then derive an algorithm to calculate the normalizing function exactly, for a Gibbs distribution based on the Robinson-Foulds (RF) distance between gene trees at neighboring loci. At the core is the calculation of the joint distribution of the shape of a random tree and its RF distance to a fixed tree. We also propose fast approximations to the normalizing function, which are shown to be very accurate with little impact on the Bayesian inference.
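As a rough illustration of the quantities named in this abstract, the sketch below computes a Robinson-Foulds-style distance between two small trees and the corresponding unnormalized Gibbs weight exp(-beta * d). It rests on assumptions: trees are rooted and written as nested tuples of leaf names, the distance is the rooted (clade-based) variant, and beta is a free parameter. It does not compute the normalizing function that the paper is concerned with, and it is not the biomc2 implementation.

```python
import math


def clades(tree):
    """Return the non-trivial clades of a rooted tree given as nested tuples.

    Example input: (("A", "B"), ("C", "D")). Each clade is a frozenset of
    leaf names; single leaves and the full leaf set are excluded.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root clade is shared by every tree
    return found


def rf_distance(tree_a, tree_b):
    """Rooted RF distance: number of clades present in exactly one tree."""
    return len(clades(tree_a) ^ clades(tree_b))


def gibbs_weight(tree_a, tree_b, beta=1.0):
    """Unnormalized prior weight favoring similar trees at neighboring loci."""
    return math.exp(-beta * rf_distance(tree_a, tree_b))
```

Turning these weights into prior probabilities requires summing exp(-beta * d) over all possible trees, which is the normalizing function the abstract refers to.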
9
Quinlivan M, Cook F, Kenna R, Callinan JJ, Cullinane A. Genetic characterization by composite sequence analysis of a new pathogenic field strain of equine infectious anemia virus from the 2006 outbreak in Ireland. J Gen Virol 2013; 94:612-622. [DOI: 10.1099/vir.0.047191-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Indexed: 12/16/2022] Open
Abstract
Equine infectious anemia virus (EIAV), the causative agent of equine infectious anaemia (EIA), possesses the least-complex genomic organization of any known extant lentivirus. Despite this relative genetic simplicity, all of the complete genomic sequences published to date are derived from just two viruses, namely the North American EIAVWYOMING (EIAVWY) and Chinese EIAVLIAONING (EIAVLIA) strains. In 2006, an outbreak of EIA occurred in Ireland, apparently as a result of the importation of contaminated horse plasma from Italy and subsequent iatrogenic transmission to foals. This EIA outbreak was characterized by cases of severe, sometimes fatal, disease. To begin to understand the molecular mechanisms underlying this pathogenic phenotype, complete proviral genomic sequences in the form of 12 overlapping PCR-generated fragments were obtained from four of the EIAV-infected animals, including two of the index cases. Sequence analysis of multiple molecular clones produced from each fragment demonstrated the extent of diversity within individual viral genes and permitted construction of consensus whole-genome sequences for each of the four viral isolates. In addition, complete env gene sequences were obtained from 11 animals with differing clinical profiles, despite exposure to a common EIAV source. Although the overall genomic organization of the Irish EIAV isolates was typical of that seen in all other strains, the European viruses possessed ≤80 % nucleotide sequence identity with either EIAVWY or EIAVLIA. Furthermore, phylogenetic analysis suggested that the Irish EIAV isolates developed independently of the North American and Chinese viruses and that they constitute a separate monophyletic group.
Affiliation(s)
- Michelle Quinlivan
- Virology Unit, Irish Equine Centre, Johnstown, Naas, Co. Kildare, Ireland
- Frank Cook
- Gluck Equine Research Centre, Department of Veterinary Science, University of Kentucky, Lexington, KY 40545, USA
- Rachel Kenna
- Virology Unit, Irish Equine Centre, Johnstown, Naas, Co. Kildare, Ireland
- John J. Callinan
- Veterinary Science Centre, University College Dublin, Belfield, Dublin 4, Ireland
- Ann Cullinane
- Virology Unit, Irish Equine Centre, Johnstown, Naas, Co. Kildare, Ireland
10
Guindon S. From trajectories to averages: an improved description of the heterogeneity of substitution rates along lineages. Syst Biol 2012; 62:22-34. [PMID: 22798331 DOI: 10.1093/sysbio/sys063] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Indexed: 11/14/2022] Open
Abstract
The accuracy and precision of species divergence date estimation from molecular data strongly depend on the models describing the variation of substitution rates along a phylogeny. These models generally assume that rates randomly fluctuate along branches from one node to the next. However, for mathematical convenience, the stochasticity of such a process is ignored when translating these rate trajectories into branch lengths. This study addresses this shortcoming. A new approach is described that explicitly considers the average substitution rates along branches as random quantities, resulting in a more realistic description of the variations of evolutionary rates along lineages. The proposed method provides more precise estimates of the rate autocorrelation parameter as well as divergence times. Also, simulation results indicate that ignoring the stochastic variation of rates along edges can lead to significant overestimation of specific node ages. Altogether, the new approach introduced in this study is a step forward to designing biologically relevant models of rate evolution that are well suited to data sets with dense taxon sampling which are likely to present rate autocorrelation. The computer programme PhyTime, part of the PhyML package and implementing the new approach, is available from http://code.google.com/p/phyml (last accessed 1 August 2012).
Affiliation(s)
- Stéphane Guindon
- Department of Statistics, University of Auckland, Auckland, 1010, New Zealand.
11
Bay RA, Bielawski JP. Recombination Detection Under Evolutionary Scenarios Relevant to Functional Divergence. J Mol Evol 2012; 73:273-86. [DOI: 10.1007/s00239-011-9473-0] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 06/22/2011] [Accepted: 11/07/2011] [Indexed: 12/01/2022]
12
Matthews LJ, Tehrani JJ, Jordan FM, Collard M, Nunn CL. Testing for divergent transmission histories among cultural characters: a study using Bayesian phylogenetic methods and Iranian tribal textile data. PLoS One 2011; 6:e14810. [PMID: 21559083 PMCID: PMC3084691 DOI: 10.1371/journal.pone.0014810] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 07/15/2010] [Accepted: 03/18/2011] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND: Archaeologists and anthropologists have long recognized that different cultural complexes may have distinct descent histories, but they have lacked analytical techniques capable of easily identifying such incongruence. Here, we show how Bayesian phylogenetic analysis can be used to identify incongruent cultural histories. We employ the approach to investigate Iranian tribal textile traditions. METHODS: We used Bayes factor comparisons in a phylogenetic framework to test two models of cultural evolution: the hierarchically integrated system hypothesis and the multiple coherent units hypothesis. In the hierarchically integrated system hypothesis, a core tradition of characters evolves through descent with modification and characters peripheral to the core are exchanged among contemporaneous populations. In the multiple coherent units hypothesis, a core tradition does not exist. Rather, there are several cultural units consisting of sets of characters that have different histories of descent. RESULTS: For the Iranian textiles, the Bayesian phylogenetic analyses supported the multiple coherent units hypothesis over the hierarchically integrated system hypothesis. Our analyses suggest that pile-weave designs represent a distinct cultural unit that has a different phylogenetic history compared to other textile characters. CONCLUSIONS: The results from the Iranian textiles are consistent with the available ethnographic evidence, which suggests that the commercial rug market has influenced pile-rug designs but not the techniques or designs incorporated in the other textiles produced by the tribes. We anticipate that Bayesian phylogenetic tests for inferring cultural units will be of great value for researchers interested in studying the evolution of cultural traits including language, behavior, and material culture.
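The Bayes factor comparison mentioned in the methods can be sketched as follows. A Bayes factor is the ratio of the marginal likelihoods of two models, so on the log scale it is a simple difference. The function names and the numeric values in the usage example are hypothetical and are not results from the paper; how the marginal likelihoods themselves are estimated is outside the scope of this sketch.

```python
def log_bayes_factor(log_ml_a, log_ml_b):
    """Natural-log Bayes factor in favor of model A over model B.

    Inputs are log marginal likelihoods, assumed to be estimated elsewhere.
    """
    return log_ml_a - log_ml_b


def two_ln_bf(log_ml_a, log_ml_b):
    """2 ln(BF), a scale on which Bayes factors are commonly reported."""
    return 2.0 * log_bayes_factor(log_ml_a, log_ml_b)


# Illustrative values only (hypothetical log marginal likelihoods).
log_ml_multiple_units = -1000.0
log_ml_integrated_system = -1010.0
support = two_ln_bf(log_ml_multiple_units, log_ml_integrated_system)
```

A positive value favors the first model and a negative value favors the second.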
Affiliation(s)
- Luke J Matthews
- Department of Human Evolutionary Biology, Harvard University, Cambridge, Massachusetts, United States of America.
13
Huelsenbeck JP, Alfaro ME, Suchard MA. Biologically inspired phylogenetic models strongly outperform the no common mechanism model. Syst Biol 2011; 60:225-32. [PMID: 21252385 PMCID: PMC3038349 DOI: 10.1093/sysbio/syq089] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 04/15/2009] [Revised: 06/29/2009] [Accepted: 09/22/2010] [Indexed: 11/13/2022] Open
Abstract
But Tuffley and Steel (1997) introduced a model called No Common Mechanism (NCM), in which characters may (but are not required to) vary their relative rates independently, both within and between branches. Because the independent variation is taken only as a possibility, not as a requirement, NCM would apply to almost any situation, and so may be accepted as realistic. This is useful because Tuffley and Steel also showed that maximum likelihood under NCM selects the same trees as does parsimony. With the realistic NCM in the background, then, most parsimonious trees have greatest power to explain available observations. (Farris, 2008)
Affiliation(s)
- John P Huelsenbeck
- Department of Integrative Biology, University of California, Berkeley, CA 94720-3140, USA.
14
Boussau B, Guéguen L, Gouy M. A mixture model and a hidden Markov model to simultaneously detect recombination breakpoints and reconstruct phylogenies. Evol Bioinform Online 2009; 5:67-79. [PMID: 19812727 PMCID: PMC2747125 DOI: 10.4137/ebo.s2242] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/17/2022] Open
Abstract
Homologous recombination is a pervasive biological process that affects sequences in all living organisms and viruses. In the presence of recombination, the evolutionary history of an alignment of homologous sequences cannot be properly depicted by a single bifurcating tree: some sites have evolved along a specific phylogenetic tree, others have followed another path. Methods available to analyse recombination in sequences usually involve an analysis of the alignment through sliding-windows, or are particularly demanding in computational resources, and are often limited to nucleotide sequences. In this article, we propose and implement a Mixture Model on trees and a phylogenetic Hidden Markov Model to reveal recombination breakpoints while searching for the various evolutionary histories that are present in an alignment known to have undergone homologous recombination. These models are sufficiently efficient to be applied to dozens of sequences on a single desktop computer, and can handle equivalently nucleotide or protein sequences. We estimate their accuracy on simulated sequences and test them on real data.
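The idea of a phylogenetic hidden Markov model for breakpoint detection can be sketched with a small Viterbi decoder. The sketch assumes that per-site log-likelihoods under each candidate tree have already been computed elsewhere, that the probability of keeping the same tree between adjacent sites is a single parameter (stay_prob), and that switches are equally likely to any other tree. These choices and all names are illustrative assumptions, not the authors' implementation.

```python
import math


def viterbi_tree_path(site_loglik, stay_prob=0.99):
    """Most probable tree index per site, plus inferred breakpoints.

    site_loglik[s][t] is the log-likelihood of site s under candidate tree t.
    At least two candidate trees are required. A breakpoint is reported at
    site index i when the decoded tree differs between sites i - 1 and i.
    """
    n_trees = len(site_loglik[0])
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_trees - 1))

    # A uniform initial distribution adds the same constant to every state,
    # so it can be omitted without changing the best path.
    score = list(site_loglik[0])
    back = []
    for row in site_loglik[1:]:
        new_score, pointers = [], []
        for j in range(n_trees):
            def step(i, j=j):
                return score[i] + (log_stay if i == j else log_switch)
            best_i = max(range(n_trees), key=step)
            pointers.append(best_i)
            new_score.append(step(best_i) + row[j])
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_trees), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    breakpoints = [i for i in range(1, len(path)) if path[i] != path[i - 1]]
    return path, breakpoints
```

For example, with five sites that favor tree 0 followed by five sites that favor tree 1, the decoder places a single breakpoint at site index 5, provided the per-site signal outweighs the switching penalty.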
Affiliation(s)
- Bastien Boussau
- Université de Lyon, université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, 43 boulevard du 11 novembre 1918, Villeurbanne F-69622, France.
15
Lemey P, Lott M, Martin DP, Moulton V. Identifying recombinants in human and primate immunodeficiency virus sequence alignments using quartet scanning. BMC Bioinformatics 2009; 10:126. [PMID: 19397803 PMCID: PMC2684544 DOI: 10.1186/1471-2105-10-126] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Received: 11/21/2008] [Accepted: 04/27/2009] [Indexed: 12/02/2022] Open
Abstract
Background: Recombination has a profound impact on the evolution of viruses, but characterizing recombination patterns in molecular sequences remains a challenging endeavor. Despite its importance in molecular evolutionary studies, identifying the sequences that exhibit such patterns has received comparatively less attention in the recombination detection framework. Here, we extend a quartet-mapping based recombination detection method to enable identification of recombinant sequences without prior specifications of either query or reference sequences. Through simulations we evaluate different recombinant identification statistics and significance tests. We compare the quartet approach with triplet-based methods that employ additional heuristic tests to identify parental and recombinant sequences. Results: Analysis of phylogenetic simulations reveals that identifying the descendants of relatively old recombination events is a challenging task for all methods available, and that quartet scanning performs relatively well compared to the triplet-based methods. The use of quartet scanning is further demonstrated by analyzing both well-established and putative HIV-1 recombinant strains. In agreement with recent findings, we provide evidence that the presumed circulating recombinant CRF02_AG is a 'pure' lineage, whereas the presumed parental lineage subtype G has a recombinant origin. We also demonstrate HIV-1 intrasubtype recombination, confirm the hybrid origin of SIV in chimpanzees and further disentangle the recombinant history of SIV lineages in a primate immunodeficiency virus data set. Conclusion: Quartet scanning makes a valuable addition to triplet-based methods for identifying recombinant sequences without prior specifications of either query or reference sequences. The new method is available in the VisRD v.3.0 package.
Collapse
Affiliation(s)
- Philippe Lemey
- Rega Institute, Katholieke Universiteit Leuven, Minderbroedersstraat 10, 3000 Leuven, Belgium.
16
McBride AJA, Cerqueira GM, Suchard MA, Moreira AN, Zuerner RL, Reis MG, Haake DA, Ko AI, Dellagostin OA. Genetic diversity of the Leptospiral immunoglobulin-like (Lig) genes in pathogenic Leptospira spp. Infect Genet Evol 2008; 9:196-205. [PMID: 19028604 DOI: 10.1016/j.meegid.2008.10.012] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Received: 08/27/2008] [Revised: 10/23/2008] [Accepted: 10/28/2008] [Indexed: 11/17/2022]
Abstract
Recent serologic, immunoprotection, and pathogenesis studies identified the Lig proteins as key virulence determinants in interactions of leptospiral pathogens with the mammalian host. We examined the sequence variation and recombination patterns of ligA, ligB, and ligC among 10 pathogenic strains from five Leptospira species. All strains had intact ligB genes, with genetic drift accounting for most of the observed ligB genetic diversity. The ligA gene was found exclusively in L. interrogans and L. kirschneri strains, and was created from ligB by a two-step partial gene duplication process. The amino-terminal domain of LigB and the LigA paralog were essentially identical (98.5 ± 0.8% mean identity) in strains with both genes. Like ligB, ligC gene variation also followed phylogenetic patterns, suggesting an early gene duplication event. However, ligC is a pseudogene in several strains, suggesting that LigC is not essential for virulence. Two ligB genes and one ligC gene had mosaic compositions, and evidence for recombination events between related Leptospira species was also found for some ligA genes. In conclusion, the results presented here indicate that Lig diversity has important ramifications for the selection of Lig polypeptides for use in diagnosis and as vaccine candidates. This sequence information will aid the identification of highly conserved regions within the Lig proteins and improve upon the performance characteristics of the Lig proteins in diagnostic assays and in subunit vaccine formulations with the potential to confer heterologous protection.
Affiliation(s)
- Alan J A McBride
- Centro de Pesquisa Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, BA, Brazil
17
Huelsenbeck JP, Ané C, Larget B, Ronquist F. A Bayesian perspective on a non-parsimonious parsimony model. Syst Biol 2008; 57:406-19. [PMID: 18570035 DOI: 10.1080/10635150802166046] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 10/21/2022] Open
Abstract
Several stochastic models of character change, when implemented in a maximum likelihood framework, are known to give a correspondence between the maximum parsimony method and the method of maximum likelihood. One such model has an independently estimated branch-length parameter for each site and each branch of the phylogenetic tree. This model, the no-common-mechanism model, has many parameters, and, in fact, the number of parameters increases as fast as the alignment is extended. We take a Bayesian approach to the no-common-mechanism model and place independent gamma prior probability distributions on the branch-length parameters. We are able to analytically integrate over the branch lengths, and this allowed us to implement an efficient Markov chain Monte Carlo method for exploring the space of phylogenetic trees. We were able to reliably estimate the posterior probabilities of clades for phylogenetic trees of up to 500 sequences. However, the Bayesian approach to the problem, at least as implemented here with an independent prior on the length of each branch, does not tame the behavior of the branch-length parameters. The integrated likelihood appears to be a simple rescaling of the parsimony score for a tree, and the marginal posterior probability distribution of the length of a branch is dependent upon how the maximum parsimony method reconstructs the characters at the interior nodes of the tree. The method we describe, however, is of potential importance in the analysis of morphological character data and also for improving the behavior of Markov chain Monte Carlo methods implemented for models in which sites share a common branch-length parameter.
Affiliation(s)
- John P Huelsenbeck
- Department of Integrative Biology, University of California, Berkeley, CA 94720-3140, USA.
18
Stevenson B, Choy HA, Pinne M, Rotondi ML, Miller MC, Demoll E, Kraiczy P, Cooley AE, Creamer TP, Suchard MA, Brissette CA, Verma A, Haake DA. Leptospira interrogans endostatin-like outer membrane proteins bind host fibronectin, laminin and regulators of complement. PLoS One 2007; 2:e1188. [PMID: 18000555 PMCID: PMC2063517 DOI: 10.1371/journal.pone.0001188] [Citation(s) in RCA: 168] [Impact Index Per Article: 9.9] [Received: 08/27/2007] [Accepted: 10/24/2007] [Indexed: 11/19/2022] Open
Abstract
The pathogenic spirochete Leptospira interrogans disseminates throughout its hosts via the bloodstream, then invades and colonizes a variety of host tissues. Infectious leptospires are resistant to killing by their hosts' alternative pathway of complement-mediated killing, and interact with various host extracellular matrix (ECM) components. The LenA outer surface protein (formerly called LfhA and Lsa24) was previously shown to bind the host ECM component laminin and the complement regulators factor H and factor H-related protein-1. We now demonstrate that infectious L. interrogans contain five additional paralogs of lenA, which we designated lenB, lenC, lenD, lenE and lenF. All six genes encode domains predicted to bear structural and functional similarities with mammalian endostatins. Sequence analyses of genes from seven infectious L. interrogans serovars indicated development of sequence diversity through recombination and intragenic duplication. LenB was found to bind human factor H, and all of the newly-described Len proteins bound laminin. In addition, LenB, LenC, LenD, LenE and LenF all exhibited affinities for fibronectin, a distinct host extracellular matrix protein. These characteristics suggest that Len proteins together facilitate invasion and colonization of host tissues, and protect against host immune responses during mammalian infection.
Affiliation(s)
- Brian Stevenson
- Department of Microbiology, Immunology, and Molecular Genetics, University of Kentucky College of Medicine, Lexington, Kentucky, United States of America.
19
Liang LJ, Weiss RE. A hierarchical semiparametric regression model for combining HIV-1 phylogenetic analyses using iterative reweighting algorithms. Biometrics 2007; 63:733-41. [PMID: 17825006 DOI: 10.1111/j.1541-0420.2007.00753.x] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 12/01/2022]
Abstract
Phylogenetic modeling is computationally challenging and most phylogeny models fit a single phylogeny to a single set of molecular sequences. Individual phylogenetic analyses are typically performed independently using publicly available software that fits a computationally intensive Bayesian model using Markov chain Monte Carlo (MCMC) simulation. We develop a Bayesian hierarchical semiparametric regression model to combine multiple phylogenetic analyses of HIV-1 nucleotide sequences and estimate parameters of interest within and across analyses. We use a mixture of Dirichlet processes as a prior for the parameters to relax inappropriate parametric assumptions and to ensure the prior distribution for the parameters is continuous. We use several reweighting algorithms for combining completed MCMC analyses to shrink parameter estimates while adjusting for data set-specific covariates. This avoids constructing a large complex model involving all the original data, which would be computationally challenging and would require rewriting the existing stand-alone software.
Affiliation(s)
- Li-Jung Liang
- Department of Biostatistics, UCLA School of Public Health, Los Angeles, California 90095-1772, USA.
20
Minin VN, Dorman KS, Fang F, Suchard MA. Phylogenetic mapping of recombination hotspots in human immunodeficiency virus via spatially smoothed change-point processes. Genetics 2006; 175:1773-85. [PMID: 17194781 PMCID: PMC1855141 DOI: 10.1534/genetics.106.066258] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 11/18/2022] Open
Abstract
We present a Bayesian framework for inferring spatial preferences of recombination from multiple putative recombinant nucleotide sequences. Phylogenetic recombination detection has been an active area of research for the last 15 years. However, attempts to summarize information from several instances of recombination have been made only recently. We propose a hierarchical model that allows for simultaneous inference of recombination breakpoint locations and spatial variation in recombination frequency. The dual multiple change-point model for phylogenetic recombination detection resides at the lowest level of our hierarchy under the umbrella of a common prior on breakpoint locations. The hierarchical prior allows for information about spatial preferences of recombination to be shared among individual data sets. To overcome the sparseness of breakpoint data, dictated by the modest number of available recombinant sequences, we a priori impose a biologically relevant correlation structure on recombination location log odds via a Gaussian Markov random field hyperprior. To examine the capabilities of our model to recover spatial variation in recombination frequency, we simulate recombination from a predefined distribution of breakpoint locations. We then proceed with the analysis of 42 human immunodeficiency virus (HIV) intersubtype gag recombinants and identify a putative recombination hotspot.
Affiliation(s)
- Vladimir N Minin
- Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, California 90095, USA
21
Fang F, Ding J, Minin VN, Suchard MA, Dorman KS. cBrother: relaxing parental tree assumptions for Bayesian recombination detection. Bioinformatics 2006; 23:507-8. [PMID: 17145740 DOI: 10.1093/bioinformatics/btl613] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022] Open
Abstract
Bayesian multiple change-point models accurately detect recombination in molecular sequence data. Previous Java-based implementations assume a fixed topology for the representative parental data. cBrother is a novel C language implementation that capitalizes on reduced computational time to relax the fixed tree assumption. We show that cBrother is 19 times faster than its predecessor and that the fixed tree assumption can influence estimates of recombination in a medically relevant dataset. Availability: cBrother can be freely downloaded from http://www.biomath.org/dormanks/ and can be compiled on Linux, Macintosh and Windows operating systems. Online documentation and a tutorial are also available at the site.
Affiliation(s)
- Fang Fang
- Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA 50011, USA
22
Kosakovsky Pond SL, Posada D, Gravenor MB, Woelk CH, Frost SDW. Automated Phylogenetic Detection of Recombination Using a Genetic Algorithm. Mol Biol Evol 2006; 23:1891-901. [PMID: 16818476 DOI: 10.1093/molbev/msl051] [Citation(s) in RCA: 699] [Impact Index Per Article: 38.8] [Indexed: 01/15/2023] Open
Abstract
The evolution of homologous sequences affected by recombination or gene conversion cannot be adequately explained by a single phylogenetic tree. Many tree-based methods for sequence analysis, for example, those used for detecting sites evolving nonneutrally, have been shown to fail if such phylogenetic incongruity is ignored. However, it may be possible to propose several phylogenies that can correctly model the evolution of nonrecombinant fragments. We propose a model-based framework that uses a genetic algorithm to search a multiple-sequence alignment for putative recombination break points, quantifies the level of support for their locations, and identifies sequences or clades involved in putative recombination events. The software implementation can be run quickly and efficiently in a distributed computing environment, and various components of the methods can be chosen for computational expediency or statistical rigor. We evaluate the performance of the new method on simulated alignments and on an array of published benchmark data sets. Finally, we demonstrate that prescreening alignments with our method allows one to analyze recombinant sequences for positive selection.
23
Holloway AK, Cannatella DC, Gerhardt HC, Hillis DM. Polyploids with Different Origins and Ancestors Form a Single Sexual Polyploid Species. Am Nat 2006; 167:E88-101. [PMID: 16670990 DOI: 10.1086/501079] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.4] [Received: 07/26/2005] [Accepted: 11/02/2005] [Indexed: 11/03/2022]
Abstract
Polyploidization is one of the few mechanisms that can produce instantaneous speciation. Multiple origins of tetraploid lineages from the same two diploid progenitors are common, but here we report the first known instance of a single tetraploid species that originated repeatedly from at least three diploid ancestors. Parallel evolution of advertisement calls in tetraploid lineages of gray tree frogs has allowed these lineages to interbreed, resulting in a single sexually interacting polyploid species despite the separate origins of polyploids from different diploids. Speciation by polyploidization in these frogs has been the source of considerable debate, but the various published hypotheses have assumed that polyploids arose through either autopolyploidy or allopolyploidy of extant diploid species. We utilized molecular markers and advertisement calls to infer the origins of tetraploid gray tree frogs. Previous hypotheses did not sufficiently account for the observed data. Instead, we found that tetraploids originated multiple times from extant diploid gray tree frogs and two other, apparently extinct, lineages of tree frogs. Tetraploid lineages then merged through interbreeding to result in a single species. Thus, polyploid species may have complex origins, especially in systems in which isolating mechanisms (such as advertisement calls) are affected directly through hybridization and polyploidy.
Affiliation(s)
- Alisha K Holloway
- Section of Integrative Biology, University of Texas, Austin, Texas 78712, USA.
24
Wilson DJ, McVean G. Estimating diversifying selection and functional constraint in the presence of recombination. Genetics 2006; 172:1411-25. [PMID: 16387887 PMCID: PMC1456295 DOI: 10.1534/genetics.105.044917] [Citation(s) in RCA: 193] [Impact Index Per Article: 10.7] [Received: 04/27/2005] [Accepted: 12/26/2005] [Indexed: 11/18/2022] Open
Abstract
Models of molecular evolution that incorporate the ratio of nonsynonymous to synonymous polymorphism (dN/dS ratio) as a parameter can be used to identify sites that are under diversifying selection or functional constraint in a sample of gene sequences. However, when there has been recombination in the evolutionary history of the sequences, reconstructing a single phylogenetic tree is not appropriate, and inference based on a single tree can give misleading results. In the presence of high levels of recombination, the identification of sites experiencing diversifying selection can suffer from a false-positive rate as high as 90%. We present a model that uses a population genetics approximation to the coalescent with recombination and use reversible-jump MCMC to perform Bayesian inference on both the dN/dS ratio and the recombination rate, allowing each to vary along the sequence. We demonstrate that the method has the power to detect variation in the dN/dS ratio and the recombination rate and does not suffer from a high false-positive rate. We use the method to analyze the porB gene of Neisseria meningitidis and verify the inferences using prior sensitivity analysis and model criticism techniques.
Affiliation(s)
- Daniel J Wilson
- Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom.
25
Minin VN, Dorman KS, Fang F, Suchard MA. Dual multiple change-point model leads to more accurate recombination detection. Bioinformatics 2005; 21:3034-42. [PMID: 15914546 DOI: 10.1093/bioinformatics/bti459] [Citation(s) in RCA: 102] [Impact Index Per Article: 5.4] [Indexed: 11/13/2022] Open
Abstract
Motivation: We introduce a dual multiple change-point (MCP) model for recombination detection among aligned nucleotide sequences. The dual MCP model is an extension of the model introduced previously by Suchard and co-workers. In the original single MCP model, one change-point process is used to model spatial phylogenetic variation. Here, we show that using two change-point processes, one for spatial variation of tree topologies and the other for spatial variation of substitution process parameters, increases recombination detection accuracy. Statistical analysis is done in a Bayesian framework using reversible jump Markov chain Monte Carlo sampling to approximate the joint posterior distribution of all model parameters. Results: We use primate mitochondrial DNA data with simulated recombination break-points at specific locations to compare the two models. We also analyze two real HIV sequences to identify recombination break-points using the dual MCP model.
Affiliation(s)
- Vladimir N Minin
- Department of Biomathematics, David Geffen School of Medicine, University of California Los Angeles, CA 90095-1766, USA
26
Pollack JD, Li Q, Pearl DK. Taxonomic utility of a phylogenetic analysis of phosphoglycerate kinase proteins of Archaea, Bacteria, and Eukaryota: Insights by Bayesian analyses. Mol Phylogenet Evol 2005; 35:420-30. [PMID: 15804412 DOI: 10.1016/j.ympev.2005.02.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 04/15/2004] [Revised: 02/04/2005] [Accepted: 02/07/2005] [Indexed: 10/25/2022]
Abstract
We studied 131 protein sequences of the essentially ubiquitous glycolytic enzyme 3-phosphoglycerate kinase (3-PGK) by Bayesian analyses in three Domains: 15 Archaea, 83 Bacteria, and 33 Eukaryota. The posterior distribution of phylogenetic trees was based on a uniform prior, the WAG model of protein evolution, Metropolis-Hastings sampling in a Markov chain Monte Carlo analysis, and a package of diagnostics to critically evaluate the validity of the analyses. The 15 Archaea separated with high posterior probability. The archaean Phyla Euryarchaeota and the apparently Euryarchaeota-derived Crenarchaeota were monophyletic. The 33 Eukaryota separated into two main groups: the non-chlorophyllous forms with coherent sub-groupings of Euglenozoa, Alveolata, Fungi, and Metazoa; and all the chlorophyllous species studied: the Plantae (Viridiplantae), chlorophyllous Stramenopiles, and the chlorophyllous Bacteria. This association supports other opinions concerning the related lineage of cyanobacteria and the Plantae. The 3-PGK sequences from 83 Bacteria in almost every instance associated by their recognized taxal group: alpha-, beta-, gamma-, epsilon-proteobacteria, Chlamydia, Actinobacteridae, and Firmicutes. Firmicutes sequences were subdivided into three apparently monophyletic groups: the anaerobic Clostridia, the spore-forming Bacillales and a group containing the Mollicutes, Lactobacillales and non-spore-forming Bacillales. The 3-PGK-gene tree assemblage was notable for its pervasive clustering in three Domains according to recognized taxonomic groupings of Class, Order, Family, and Genus. The 3-PGK enzyme or 3-PGK-like activity may have played a central role in the metabolism of the Universal Ancestor.
Affiliation(s)
- J Dennis Pollack
- Department of Molecular Virology, Immunology and Medical Genetics, The Ohio State University, 333 West 10th Avenue, Columbus, OH 43210, USA.
27
Abstract
Horizontal gene transfer (HGT) plays a critical role in evolution across all domains of life with important biological and medical implications. I propose a simple class of stochastic models to examine HGT using multiple orthologous gene alignments. The models function in a hierarchical phylogenetic framework. The top level of the hierarchy is based on a random walk process in "tree space" that allows for the development of a joint probabilistic distribution over multiple gene trees and an unknown, but estimable species tree. I consider two general forms of random walks. The first form is derived from the subtree prune and regraft (SPR) operator that mirrors the observed effects that HGT has on inferred trees. The second form is based on walks over complete graphs and offers numerically tractable solutions for an increasing number of taxa. The bottom level of the hierarchy utilizes standard phylogenetic models to reconstruct gene trees given multiple gene alignments conditional on the random walk process. I develop a well-mixing Markov chain Monte Carlo algorithm to fit the models in a Bayesian framework. I demonstrate the flexibility of these stochastic models to test competing ideas about HGT by examining the complexity hypothesis. Using 144 orthologous gene alignments from six prokaryotes previously collected and analyzed, Bayesian model selection finds support for (1) the SPR model over the alternative form, (2) the 16S rRNA reconstruction as the most likely species tree, and (3) increased HGT of operational genes compared to informational genes.
Affiliation(s)
- Marc A Suchard
- Department of Biomathematics, David Geffen School of Medicine, University of California, Los Angeles, 90095-1766, USA.
28
Posada D, Buckley TR. Model selection and model averaging in phylogenetics: advantages of Akaike information criterion and Bayesian approaches over likelihood ratio tests. Syst Biol 2005; 53:793-808. [PMID: 15545256 DOI: 10.1080/10635150490522304] [Citation(s) in RCA: 2289] [Impact Index Per Article: 120.5] [Indexed: 10/26/2022] Open
Abstract
Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (model-averaged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001).
Affiliation(s)
- David Posada
- Departamento de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidad de Vigo, Vigo 36200, Spain.
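Illustrative note on the entry above: the AIC-based model averaging mentioned in the Posada and Buckley abstract can be sketched with the standard Akaike-weight formulas (AIC = 2k - 2 lnL, with weights proportional to exp(-ΔAIC/2)). The sketch below is not taken from the paper; the function names and the toy log-likelihoods, parameter counts and per-model estimates are hypothetical and chosen only for demonstration.

```python
import math

def akaike_weights(log_likelihoods, num_params):
    """Return Akaike weights for a set of candidate models.

    log_likelihoods: maximized log-likelihood (lnL) of each model
    num_params:      number of free parameters (k) of each model
    """
    # AIC = 2k - 2 lnL for each model
    aics = [2 * k - 2 * lnl for lnl, k in zip(log_likelihoods, num_params)]
    best = min(aics)
    # Relative likelihood of each model given its AIC difference to the best
    rel = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

def model_average(estimates, weights):
    """Weighted average of a parameter estimated under every model."""
    return sum(w * e for w, e in zip(weights, estimates))

# Hypothetical toy values (not from any real data set)
toy_lnl = [-1000.0, -1000.0]   # same fit ...
toy_k = [1, 2]                 # ... but the second model has one extra parameter
weights = akaike_weights(toy_lnl, toy_k)
averaged = model_average([2.0, 4.0], weights)
```

Subtracting the smallest AIC before exponentiating leaves the weights unchanged but avoids numerical underflow when AIC values are large.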
29
Jones LR, Weber EL. Homologous recombination in bovine pestiviruses. Phylogenetic and statistic evidence. Infect Genet Evol 2004; 4:335-43. [PMID: 15374531 DOI: 10.1016/j.meegid.2004.04.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 02/25/2004] [Revised: 04/26/2004] [Accepted: 04/26/2004] [Indexed: 11/28/2022]
Abstract
Bovine pestiviruses (Bovine Viral Diarrhea Virus 1 (BVDV 1) and Bovine Viral Diarrhea Virus 2 (BVDV 2)) belong to the genus Pestivirus (Flaviviridae), which is composed of positive-stranded RNA viruses causing significant economic losses worldwide. We used phylogenetic and bootstrap analyses to systematically scan alignments of previously sequenced genomes in order to explore further the evolutionary mechanisms responsible for variation in the virus. Previously published data suggested that homologous crossover might be one of the mechanisms responsible for the genomic rearrangements observed in cytopathic (cp) strains of bovine pestiviruses. Nevertheless, homologous recombination involves not just homologous crossovers, but also replacement of a homologous region of the acceptor RNA. Furthermore, cytopathic strains represent dead paths in evolution, since they are isolated exclusively from the fatal cases of mucosal disease. Herein, we report evidence of homologous inter-genotype recombination in the genome of a non-cytopathic (ncp) strain of Bovine Viral Diarrhea Virus 1, the type species of the genus Pestivirus. We also show that intra-genotype homologous recombination might be a common phenomenon in both species of Pestivirus. This evidence demonstrates that homologous recombination contributes to the diversification of bovine pestiviruses in nature. Implications for virus evolution, taxonomy and phylogenetics are discussed.
Affiliation(s)
- Leandro Roberto Jones
- Instituto de Virología, CICVyA, Inta-Castelar, CC77 (1708) Morón, Buenos Aires, Argentina.
30
Paraskevis D, Deforche K, Lemey P, Magiorkinis G, Hatzakis A, Vandamme AM. SlidingBayes: exploring recombination using a sliding window approach based on Bayesian phylogenetic inference. Bioinformatics 2004; 21:1274-5. [PMID: 15546940 DOI: 10.1093/bioinformatics/bti139] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022] Open
Abstract
We developed a software tool (SlidingBayes) for recombination analysis based on Bayesian phylogenetic inference. SlidingBayes provides a powerful approach for detecting potential recombination, especially between highly divergent sequences and complex HIV-1 recombinants for which simpler methods like neighbor joining (NJ) may be less powerful. SlidingBayes guides Markov chain Monte Carlo (MCMC) sampling performed by MrBayes in a sliding window across the alignment (Bayesian scanning). The tool can be used for nucleotide and amino acid sequences and combines all the modeling possibilities of MrBayes with the ability to plot the posterior probability support for clustering of various combinations of taxa.
Affiliation(s)
- D Paraskevis
- Laboratory for Clinical and Epidemiological Virology, Rega Institute for Medical Research, Katholieke Universiteit Leuven, Minderbroedersstraat 10, B-3000 Leuven, Belgium.
31
Haake DA, Suchard MA, Kelley MM, Dundoo M, Alt DP, Zuerner RL. Molecular evolution and mosaicism of leptospiral outer membrane proteins involves horizontal DNA transfer. J Bacteriol 2004; 186:2818-28. [PMID: 15090524 PMCID: PMC387810 DOI: 10.1128/jb.186.9.2818-2828.2004] [Citation(s) in RCA: 98] [Impact Index Per Article: 4.9] [Indexed: 11/20/2022] Open
Abstract
Leptospires belong to a genus of parasitic bacterial spirochetes that have adapted to a broad range of mammalian hosts. Mechanisms of leptospiral molecular evolution were explored by sequence analysis of four genes shared by 38 strains belonging to the core group of pathogenic Leptospira species: L. interrogans, L. kirschneri, L. noguchii, L. borgpetersenii, L. santarosai, and L. weilii. The 16S rRNA and lipL32 genes were highly conserved, and the lipL41 and ompL1 genes were significantly more variable. Synonymous substitutions are distributed throughout the ompL1 gene, whereas nonsynonymous substitutions are clustered in four variable regions encoding surface loops. While phylogenetic trees for the 16S, lipL32, and lipL41 genes were relatively stable, 8 of 38 (20%) ompL1 sequences had mosaic compositions consistent with horizontal transfer of DNA between related bacterial species. A novel Bayesian multiple change point model was used to identify the most likely sites of recombination and to determine the phylogenetic relatedness of the segments of the mosaic ompL1 genes. Segments of the mosaic ompL1 genes encoding two of the surface-exposed loops were likely acquired by horizontal transfer from a peregrine allele of unknown ancestry. Identification of the most likely sites of recombination with the Bayesian multiple change point model, an approach which has not previously been applied to prokaryotic gene sequence analysis, serves as a model for future studies of recombination in molecular evolution of genes.
Affiliation(s)
- David A Haake
- Division of Infectious Diseases, Veterans Affairs Greater Los Angeles Healthcare System, Los Angeles, CA 90073, USA.
32
Abstract
The construction of evolutionary trees is now a standard part of exploratory sequence analysis. Bayesian methods for estimating trees have recently been proposed as a faster method of incorporating the power of complex statistical models into the process. Researchers who rely on comparative analyses need to understand the theoretical and practical motivations that underlie these new techniques, and how they differ from previous methods. The ability of the new approaches to address previously intractable questions is making phylogenetic analysis an essential tool in an increasing number of areas of genetic research.
Affiliation(s)
- Mark Holder
- Department of Ecology and Evolutionary Biology, 75 North Eagleville Road, University of Connecticut, Storrs, Connecticut 06269-3043, USA.