1
Zaharias P, Warnow T. Recent progress on methods for estimating and updating large phylogenies. Philos Trans R Soc Lond B Biol Sci 2022; 377:20210244. [PMID: 35989607 PMCID: PMC9393559 DOI: 10.1098/rstb.2021.0244] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/12/2021] [Accepted: 01/07/2022] [Indexed: 12/20/2022]
Abstract
With the increased availability of sequence data and even of fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet, the construction of these phylogenies is a complex pipeline presenting analytical and computational challenges, especially when the number of sequences is very large. In the past few years, new methods have been developed that aim to enable highly accurate phylogeny estimations on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g. incomplete lineage sorting and gene duplication and loss), and methods to add sequences into large gene trees or species trees. Here we present some of these recent advances and discuss opportunities for future improvements. This article is part of a discussion meeting issue 'Genomic population structures of microbial pathogens'.
Affiliation(s)
- Paul Zaharias
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
- Tandy Warnow
- Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA
2
Doyle JJ. Cell types as species: Exploring a metaphor. Front Plant Sci 2022; 13:868565. [PMID: 36072310 PMCID: PMC9444152 DOI: 10.3389/fpls.2022.868565] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/02/2022] [Accepted: 07/29/2022] [Indexed: 06/05/2023]
Abstract
The concept of "cell type," though fundamental to cell biology, is controversial. Cells have historically been classified into types based on morphology, physiology, or location. More recently, single cell transcriptomic studies have revealed fine-scale differences among cells with similar gross phenotypes. Transcriptomic snapshots of cells at various stages of differentiation, and of cells under different physiological conditions, have shown that in many cases variation is more continuous than discrete, raising questions about the relationship between cell type and cell state. Some researchers have rejected the notion of fixed types altogether. Throughout the history of discussions on cell type, cell biologists have compared the problem of defining cell type with the interminable and often contentious debate over the definition of arguably the most important concept in systematics and evolutionary biology, "species." In the last decades, systematics, like cell biology, has been transformed by the increasing availability of molecular data, and the fine-grained resolution of genetic relationships has generated new ideas about how that variation should be classified. There are numerous parallels between the two fields that make exploration of the "cell types as species" metaphor timely. These parallels begin with philosophy, with discussion of both cell types and species as being either individuals, groups, or something in between (e.g., homeostatic property clusters). In each field there are various different types of lineages that form trees or networks that can (and in some cases do) provide criteria for grouping. Developing and refining models for evolutionary divergence of species and for cell type differentiation are parallel goals of the two fields. The goal of this essay is to highlight such parallels with the hope of inspiring biologists in both fields to look for new solutions to similar problems outside of their own field.
3
Sheinman M, Arkhipova K, Arndt PF, Dutilh BE, Hermsen R, Massip F. Identical sequences found in distant genomes reveal frequent horizontal transfer across the bacterial domain. eLife 2021; 10:62719. [PMID: 34121661 PMCID: PMC8270642 DOI: 10.7554/elife.62719] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 09/03/2020] [Accepted: 06/13/2021] [Indexed: 12/19/2022]
Abstract
Horizontal gene transfer (HGT) is an essential force in microbial evolution. Despite detailed studies on a variety of systems, a global picture of HGT in the microbial world is still missing. Here, we exploit that HGT creates long identical DNA sequences in the genomes of distant species, which can be found efficiently using alignment-free methods. Our pairwise analysis of 93,481 bacterial genomes identified 138,273 HGT events. We developed a model to explain their statistical properties as well as estimate the transfer rate between pairs of taxa. This reveals that long-distance HGT is frequent: our results indicate that HGT between species from different phyla has occurred in at least 8% of the species. Finally, our results confirm that the function of sequences strongly impacts their transfer rate, which varies by more than three orders of magnitude between different functional categories. Overall, we provide a comprehensive view of HGT, illuminating a fundamental process driving bacterial evolution.
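The core idea in this abstract, that transfer events leave long stretches of identical DNA in otherwise distant genomes, can be illustrated with a toy exact-match search. The sketch below is only an assumption-laden illustration: the function name `shared_exact_matches`, the k-mer seeding strategy, and the example sequences are hypothetical and are not taken from the paper, whose actual alignment-free tooling is not described in the abstract.

```python
def shared_exact_matches(seq_a, seq_b, min_len):
    """Return maximal exact matches of length >= min_len as (pos_a, pos_b, length)."""
    k = min_len
    # Index every k-mer of seq_a by its start position.
    index = {}
    for i in range(len(seq_a) - k + 1):
        index.setdefault(seq_a[i:i + k], []).append(i)

    matches = set()
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], ()):
            # Skip seeds that can be extended to the left: they belong to a
            # longer match that is reported from its leftmost seed.
            if i > 0 and j > 0 and seq_a[i - 1] == seq_b[j - 1]:
                continue
            # Extend the seed to the right while the sequences agree.
            length = k
            while (i + length < len(seq_a) and j + length < len(seq_b)
                   and seq_a[i + length] == seq_b[j + length]):
                length += 1
            matches.add((i, j, length))
    return sorted(matches)


# Two made-up sequences that share the 8-base stretch "ACGTTCAG".
genome_a = "GGGGACGTTCAGCCCC"
genome_b = "TTTTACGTTCAGAAAA"
print(shared_exact_matches(genome_a, genome_b, min_len=6))
```

In a real analysis the length threshold would be chosen so that matches are too long to be explained by vertical descent alone; the value used here is arbitrary.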
Affiliation(s)
- Michael Sheinman
- Theoretical Biology and Bioinformatics, Biology Department, Utrecht University, Utrecht, Netherlands; Division of Molecular Carcinogenesis, the Netherlands Cancer Institute, Amsterdam, Netherlands
- Ksenia Arkhipova
- Theoretical Biology and Bioinformatics, Biology Department, Utrecht University, Utrecht, Netherlands
- Peter F Arndt
- Max Planck Institute for Molecular Genetics, Berlin, Germany
- Bas E Dutilh
- Theoretical Biology and Bioinformatics, Biology Department, Utrecht University, Utrecht, Netherlands
- Rutger Hermsen
- Theoretical Biology and Bioinformatics, Biology Department, Utrecht University, Utrecht, Netherlands
- Florian Massip
- Berlin Institute for Medical Systems Biology, Max Delbrück Center, Berlin, Germany; Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Evolutive UMR 5558, Villeurbanne, France
4
Avni E, Snir S. A New Phylogenomic Approach For Quantifying Horizontal Gene Transfer Trends in Prokaryotes. Sci Rep 2020; 10:12425. [PMID: 32709941 PMCID: PMC7381616 DOI: 10.1038/s41598-020-62446-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/20/2018] [Accepted: 01/27/2020] [Indexed: 11/09/2022]
Abstract
It is well established nowadays that among prokaryotes, various families of orthologous genes exhibit conflicting evolutionary history. A prime factor for this conflict is horizontal gene transfer (HGT) - the transfer of genetic material not via vertical descent. Thus, the prevalence of HGT is challenging the meaningfulness of the classical Tree of Life concept. Here we present a comprehensive study of HGT representing the entire prokaryotic world. We mainly rely on a novel analytic approach for analyzing an aggregate of gene histories, by means of the quartet plurality distribution (QPD) that we develop. Through the analysis of real and simulated data, QPD is used to reveal evidence of a barrier against HGT, separating the archaea from the bacteria and making HGT between the two domains, in general, quite rare. In contrast, bacteria's confined HGT is substantially more frequent than archaea's. Our approach also reveals that despite intensive HGT, a strong tree-like signal can be extracted, corroborating several previous works. Thus, QPD, which enables one to analytically combine information from an aggregate of gene trees, can be used for understanding patterns and rates of HGT in prokaryotes, as well as for validating or refuting models of horizontal genetic transfers and evolution in general.
Affiliation(s)
- Eliran Avni
- Department of Evolutionary Biology, University of Haifa, Haifa, 31905, Israel
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, Haifa, 31905, Israel
5
Puigbò P, Wolf YI, Koonin EV. Genome-Wide Comparative Analysis of Phylogenetic Trees: The Prokaryotic Forest of Life. Methods Mol Biol 2019; 1910:241-269. [PMID: 31278667 DOI: 10.1007/978-1-4939-9074-0_8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
Abstract
Genome-wide comparison of phylogenetic trees is becoming an increasingly common approach in evolutionary genomics, and a variety of approaches for such comparison have been developed. In this article we present several methods for comparative analysis of large numbers of phylogenetic trees. To compare phylogenetic trees taking into account the bootstrap support for each internal branch, the boot-split distance (BSD) method is introduced as an extension of the previously developed split distance (SD) method for tree comparison. The BSD method implements the straightforward idea that comparison of phylogenetic trees can be made more robust by treating tree splits differentially depending on the bootstrap support. Approaches are also introduced for detecting treelike and netlike evolutionary trends in the phylogenetic Forest of Life (FOL), i.e., the entirety of the phylogenetic trees for conserved genes of prokaryotes. The principal method employed for this purpose includes mapping quartets of species onto trees to calculate the support of each quartet topology and so to quantify the tree and net contributions to the distances between species. We describe the applications methods used to analyze the FOL and the results obtained with these methods. These results support the concept of the Tree of Life (TOL) as a central evolutionary trend in the FOL as opposed to the traditional view of the TOL as a "species tree."
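The quartet-mapping step described in this abstract (computing the support for each quartet topology across a collection of trees) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: trees are written as nested Python tuples of leaf names, the function names are hypothetical, and no bootstrap weighting of the kind used by the BSD method is applied.

```python
from collections import Counter


def clusters(tree):
    """Return (leaf_set, clusters) for a tree given as nested tuples of leaf names."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clusters = clusters(child)
        leaves |= child_leaves
        found.extend(child_clusters)
        found.append(child_leaves)  # leaf set below this child
    return leaves, found


def quartet_topology(tree, quartet):
    """Return the split (e.g. AB|CD) the tree induces on four taxa, or None."""
    q = frozenset(quartet)
    leaves, found = clusters(tree)
    if not q <= leaves:
        return None  # at least one taxon is missing from this tree
    for cluster in found:
        inside = cluster & q
        if len(inside) == 2:
            # A branch separating two quartet taxa from the other two.
            return frozenset([inside, q - inside])
    return None  # quartet is unresolved in this tree


def quartet_support(trees, quartet):
    """Fraction of informative trees supporting each topology of the quartet."""
    counts = Counter()
    for tree in trees:
        topology = quartet_topology(tree, quartet)
        if topology is not None:
            counts[topology] += 1
    total = sum(counts.values())
    return {topology: n / total for topology, n in counts.items()} if total else {}


# Three made-up gene trees on the taxa A, B, C, D.
gene_trees = [
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
    (("A", "C"), ("B", "D")),
]
for topology, share in quartet_support(gene_trees, ("A", "B", "C", "D")).items():
    print(sorted(sorted(side) for side in topology), round(share, 3))
```

When one topology dominates across many gene trees, the quartet contributes to a tree-like signal; a more even spread across the three possible topologies points toward net-like evolution.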
Affiliation(s)
- Pere Puigbò
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA; Division of Genetics and Physiology, Department of Biology, University of Turku, Turku, Finland
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
6
Abstract
BACKGROUND: Deciphering the history of life on Earth has long been regarded as one of the most central tasks in biology. In past years, widespread discordance between the evolutionary histories of different groups of orthologous genes of prokaryotes has been revealed, primarily due to horizontal gene transfers (HGTs). Nonetheless, evidence that supports a strong tree-like signal of evolution has been uncovered, despite the presence of HGT events. Therefore, a challenging task is to distill this tree-like signal from the noise induced by all sources of non-tree-like events. RESULTS: In this work we tackle this question, using real and simulated data. We first tighten a recent related theoretical result in this field. In a simulation study, we infer individual quartet topologies, and then use the inferred quartets to reconstruct simulated species trees. We demonstrate that accurate tree reconstruction is feasible despite surprisingly high rates of HGT. In a real data study, we construct phylogenies of two sets of prokaryotes, and show that our tree reconstruction scheme is comparable with (and complementary better than) other commonly used methods. CONCLUSIONS: Using a blend of theoretical and empirical investigations, our study proves the feasibility of accurate quartet-based phylogenetic reconstruction, the vast impact of HGT events notwithstanding.
Affiliation(s)
- Eliran Avni
- Department of Evolutionary Biology, University of Haifa, 199 Aba Khoushy Ave. Mount Carmel, Haifa, 3498838, Israel
- Sagi Snir
- Department of Evolutionary Biology, University of Haifa, 199 Aba Khoushy Ave. Mount Carmel, Haifa, 3498838, Israel
7
Denton JSS, Goolsby EW. Measuring inferential importance of taxa using taxon influence indices. Ecol Evol 2018; 8:4484-4494. [PMID: 29760889 PMCID: PMC5938459 DOI: 10.1002/ece3.3941] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 08/04/2017] [Revised: 01/14/2018] [Accepted: 01/31/2018] [Indexed: 11/30/2022]
Abstract
Assessing the importance of different taxa for inferring evolutionary history is a critical, but underutilized, aspect of systematics. Quantifying the importance of all taxa within a dataset provides an empirical measurement that can establish a ranking of extant taxa for ecological study and/or quantify the relative importance of newly announced or redescribed specimens to enable the disentangling of novelty and inferential influence. Here, we illustrate the use of taxon influence indices through analysis of both molecular and morphological datasets, introducing a modified Bayesian approach to the taxon influence index that accounts for model and topological uncertainty. Quantification of taxon influence using the Bayesian approach produced clear rankings for both dataset types. Bayesian taxon rankings differed from maximum likelihood (ML)‐derived rankings from a mitogenomic dataset, and the highest ranking taxa exhibited the largest interquartile range in influence estimate, suggesting variance in the estimate must be taken into account when the ranking of taxa is the feature of interest. Application of the Bayesian taxon influence index to a recent morphological analysis of the Tully Monster (Tullimonstrum) reveals that it exhibits consistently low inferential importance across two recent treatments of the taxon with alternative character codings. These results lend support to the idea that taxon influence indices may be robust to character coding and therefore effective for morphological analyses. These results underscore a need for the development of approaches to, and application of, taxon influence analyses both for the purpose of establishing robust rankings for future inquiry and for explicitly quantifying the importance of individual taxa. 
Quantifying the importance of individual taxa refocuses debates in morphological studies from questions of character choice/significance and taxon sampling to explicitly analytical techniques, and guides discussion of the context of new discoveries.
Affiliation(s)
- John S S Denton
- Department of Vertebrate Paleontology, American Museum of Natural History, New York, NY, USA
- Eric W Goolsby
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA
8
Braun DR, Chevrette MG, Acharya D, Currie CR, Rajski SR, Ritchie KB, Bugni TS. Complete Genome Sequence of Dietzia sp. Strain WMMA184, a Marine Coral-Associated Bacterium. Genome Announc 2018; 6:e01582-17. [PMID: 29437114 PMCID: PMC5794961 DOI: 10.1128/genomea.01582-17] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 12/22/2017] [Accepted: 01/02/2018] [Indexed: 01/25/2023]
Abstract
Dietzia sp. strain WMMA184 was isolated from the marine coral Montastraea faveolata as part of ongoing drug discovery efforts. Analysis of the 4.16-Mb genome provides information regarding interspecies interactions as it pertains to the regulation of secondary metabolism and natural product biosynthesis potential.
Affiliation(s)
- Doug R Braun
- Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Marc G Chevrette
- Department of Bacteriology, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Deepa Acharya
- Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Cameron R Currie
- Department of Bacteriology, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Scott R Rajski
- Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Kim B Ritchie
- The University of South Carolina-Beaufort, Beaufort, South Carolina, USA
- Tim S Bugni
- Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, Wisconsin, USA
9
McTavish EJ, Drew BT, Redelings B, Cranston KA. How and Why to Build a Unified Tree of Life. Bioessays 2017; 39. [PMID: 28980328 DOI: 10.1002/bies.201700114] [Citation(s) in RCA: 42] [Impact Index Per Article: 6.0] [Received: 06/30/2017] [Revised: 08/27/2017] [Indexed: 01/20/2023]
Abstract
Phylogenetic trees are a crucial backbone for a wide breadth of biological research spanning systematics, organismal biology, ecology, and medicine. In 2015, the Open Tree of Life project published a first draft of a comprehensive tree of life, summarizing digitally available taxonomic and phylogenetic knowledge. This paper reviews, investigates, and addresses the following questions as a follow-up to that paper, from the perspective of researchers involved in building this summary of the tree of life: Is there a tree of life and should we reconstruct it? Is available data sufficient to reconstruct the tree of life? Do we have access to phylogenetic inferences in usable form? Can we combine different phylogenetic estimates across the tree of life? And finally, what is the future of understanding the tree of life?
Affiliation(s)
- Bryan T Drew
- University of Nebraska at Kearney, Kearney, NE, 68849, USA
- Ben Redelings
- University of Kansas, Lawrence, KS, 66045, USA; Duke University, Durham, NC 27705, USA; Ronin Institute, Durham, NC 27705, USA
10
Abstract
Lateral gene transfer (LGT) profoundly shapes the evolution of bacterial lineages. LGT across disparate phylogenetic groups and genome content diversity between related organisms suggest a model of bacterial evolution that views LGT as rampant and promiscuous. It has even driven the argument that species concepts and tree-based phylogenetics cannot be applied to bacteria. Here, we show that acquisition and retention of genes through LGT are surprisingly rare in the ubiquitous and biomedically important bacterial genus Streptomyces. Using a molecular clock, we estimate that the Streptomyces bacteria are ~380 million years old, indicating that this bacterial genus is as ancient as land vertebrates. Calibrating LGT rate to this geologic time span, we find that on average only 10 genes per million years were acquired and subsequently maintained. Over that same time span, Streptomyces accumulated thousands of point mutations. By explicitly incorporating evolutionary timescale into our analyses, we provide a dramatically different view on the dynamics of LGT and its impact on bacterial evolution. IMPORTANCE: Tree-based phylogenetics and the use of species as units of diversity lie at the foundation of modern biology. In bacteria, these pillars of evolutionary theory have been called into question due to the observation of thousands of lateral gene transfer (LGT) events within and between lineages. Here, we show that acquisition and retention of genes through LGT are exceedingly rare in the bacterial genus Streptomyces, with merely one gene acquired in Streptomyces lineages every 100,000 years. These findings stand in contrast to the current assumption of rampant genetic exchange, which has become the dominant hypothesis used to explain bacterial diversity. Our results support a more nuanced understanding of genetic exchange, with LGT impacting evolution over short timescales but playing a significant role over long timescales.
Deeper understanding of LGT provides new insight into the evolutionary history of life on Earth, as the vast majority of this history is microbial.
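The two rate figures quoted in this abstract ("10 genes per million years" and "one gene ... every 100,000 years") are the same quantity in different units. A short unit conversion, using only the numbers stated in the abstract, makes this explicit:

```latex
\[
\frac{10\ \text{genes}}{10^{6}\ \text{yr}}
  \;=\; \frac{1\ \text{gene}}{10^{5}\ \text{yr}}
  \;=\; 1\ \text{gene per } 100{,}000\ \text{yr}
\]
```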
11
Draft Genome Sequence of Micromonospora sp. Strain WMMB235, a Marine Ascidian-Associated Bacterium. Genome Announc 2017; 5:5/2/e01369-16. [PMID: 28082484 PMCID: PMC5256203 DOI: 10.1128/genomea.01369-16] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/12/2023]
Abstract
Micromonospora sp. strain WMMB235 was isolated in 2011 off the coast of the Florida Keys, USA, from a marine ascidian as part of an ongoing drug discovery project. Analysis of the ~7.1-Mb genome provides insight into this strain's biosynthetic potential, means of regulation, and response to coculturing conditions.
12
Complete Genome Sequence of Rhodococcus sp. Strain WMMA185, a Marine Sponge-Associated Bacterium. Genome Announc 2016; 4:4/6/e01406-16. [PMID: 27979952 PMCID: PMC5159585 DOI: 10.1128/genomea.01406-16] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/20/2022]
Abstract
The Rhodococcus strain WMMA185 was isolated from the marine sponge Chondrilla nucula as part of ongoing drug discovery efforts. Analysis of the 4.44-Mb genome provides information regarding interspecies interactions as pertains to regulation of secondary metabolism and natural product biosynthetic potentials.
13
Domagal-Goldman SD, Wright KE, Adamala K, Arina de la Rubia L, Bond J, Dartnell LR, Goldman AD, Lynch K, Naud ME, Paulino-Lima IG, Singer K, Walther-Antonio M, Abrevaya XC, Anderson R, Arney G, Atri D, Azúa-Bustos A, Bowman JS, Brazelton WJ, Brennecka GA, Carns R, Chopra A, Colangelo-Lillis J, Crockett CJ, DeMarines J, Frank EA, Frantz C, de la Fuente E, Galante D, Glass J, Gleeson D, Glein CR, Goldblatt C, Horak R, Horodyskyj L, Kaçar B, Kereszturi A, Knowles E, Mayeur P, McGlynn S, Miguel Y, Montgomery M, Neish C, Noack L, Rugheimer S, Stüeken EE, Tamez-Hidalgo P, Imari Walker S, Wong T. The Astrobiology Primer v2.0. Astrobiology 2016; 16:561-653. [PMID: 27532777 PMCID: PMC5008114 DOI: 10.1089/ast.2015.1460] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Received: 12/23/2015] [Accepted: 06/06/2016] [Indexed: 05/09/2023]
Affiliation(s)
- Shawn D Domagal-Goldman
- 1 NASA Goddard Space Flight Center, Greenbelt, Maryland, USA
- 2 Virtual Planetary Laboratory, Seattle, Washington, USA
- Katherine E Wright
- 3 University of Colorado at Boulder, Colorado, USA
- 4 Present address: UK Space Agency, UK
- Katarzyna Adamala
- 5 Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, Minnesota, USA
- Jade Bond
- 7 Department of Physics, University of New South Wales, Sydney, Australia
- Kennda Lynch
- 10 Division of Biological Sciences, University of Montana, Missoula, Montana, USA
- Marie-Eve Naud
- 11 Institute for research on exoplanets (iREx), Université de Montréal, Montréal, Canada
- Ivan G Paulino-Lima
- 12 Universities Space Research Association, Mountain View, California, USA
- 13 Blue Marble Space Institute of Science, Seattle, Washington, USA
- Kelsi Singer
- 14 Southwest Research Institute, Boulder, Colorado, USA
- Ximena C Abrevaya
- 16 Instituto de Astronomía y Física del Espacio (IAFE), UBA-CONICET, Ciudad Autónoma de Buenos Aires, Argentina
- Rika Anderson
- 17 Department of Biology, Carleton College, Northfield, Minnesota, USA
- Giada Arney
- 18 University of Washington Astronomy Department and Astrobiology Program, Seattle, Washington, USA
- Dimitra Atri
- 13 Blue Marble Space Institute of Science, Seattle, Washington, USA
- Jeff S Bowman
- 19 Lamont-Doherty Earth Observatory, Columbia University, Palisades, New York, USA
- Regina Carns
- 22 Polar Science Center, Applied Physics Laboratory, University of Washington, Seattle, Washington, USA
- Aditya Chopra
- 23 Planetary Science Institute, Research School of Earth Sciences, Research School of Astronomy and Astrophysics, The Australian National University, Canberra, Australia
- Jesse Colangelo-Lillis
- 24 Earth and Planetary Science, McGill University, and the McGill Space Institute, Montréal, Canada
- Julia DeMarines
- 13 Blue Marble Space Institute of Science, Seattle, Washington, USA
- Carie Frantz
- 27 Department of Geosciences, Weber State University, Ogden, Utah, USA
- Eduardo de la Fuente
- 28 IAM-Departamento de Fisica, CUCEI, Universidad de Guadalajara, Guadalajara, México
- Douglas Galante
- 29 Brazilian Synchrotron Light Laboratory, Campinas, Brazil
- Jennifer Glass
- 30 School of Earth and Atmospheric Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
- Colin Goldblatt
- 33 School of Earth and Ocean Sciences, University of Victoria, Victoria, Canada
- Rachel Horak
- 34 American Society for Microbiology, Washington, DC, USA
- Betül Kaçar
- 36 Harvard University, Organismic and Evolutionary Biology, Cambridge, Massachusetts, USA
- Akos Kereszturi
- 37 Research Centre for Astronomy and Earth Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Emily Knowles
- 38 Johnson & Wales University, Denver, Colorado, USA
- Paul Mayeur
- 39 Rensselaer Polytechnic Institute, Troy, New York, USA
- Shawn McGlynn
- 40 Earth Life Science Institute, Tokyo Institute of Technology, Tokyo, Japan
- Yamila Miguel
- 41 Laboratoire Lagrange, UMR 7293, Université Nice Sophia Antipolis, CNRS, Observatoire de la Côte d'Azur, Nice, France
- Catherine Neish
- 43 Department of Earth Sciences, The University of Western Ontario, London, Canada
- Lena Noack
- 44 Royal Observatory of Belgium, Brussels, Belgium
- Sarah Rugheimer
- 45 Department of Astronomy, Harvard University, Cambridge, Massachusetts, USA
- 46 University of St. Andrews, St. Andrews, UK
- Eva E Stüeken
- 47 University of Washington, Seattle, Washington, USA
- 48 University of California, Riverside, California, USA
- Sara Imari Walker
- 13 Blue Marble Space Institute of Science, Seattle, Washington, USA
- 50 School of Earth and Space Exploration and Beyond Center for Fundamental Concepts in Science, Arizona State University, Tempe, Arizona, USA
- Teresa Wong
- 51 Department of Earth and Planetary Sciences, Washington University in St. Louis, St. Louis, Missouri, USA
14
Gupta RS. Impact of genomics on the understanding of microbial evolution and classification: the importance of Darwin's views on classification. FEMS Microbiol Rev 2016; 40:520-53. [PMID: 27279642 DOI: 10.1093/femsre/fuw011] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Accepted: 05/14/2016] [Indexed: 12/24/2022]
Abstract
Analyses of genome sequences, by some approaches, suggest that the widespread occurrence of horizontal gene transfers (HGTs) in prokaryotes disguises their evolutionary relationships and has led to questioning of the Darwinian model of evolution for prokaryotes. These inferences are critically examined in the light of comparative genome analysis, characteristic synapomorphies, phylogenetic trees and Darwin's views on examining evolutionary relationships. Genome sequences are enabling discovery of numerous molecular markers (synapomorphies) such as conserved signature indels (CSIs) and conserved signature proteins (CSPs), which are distinctive characteristics of different prokaryotic taxa. Based on these molecular markers, exhibiting a high degree of specificity and predictive ability, numerous prokaryotic taxa of different ranks, currently identified based on the 16S rRNA gene trees, can now be reliably demarcated in molecular terms. Within all studied groups, multiple CSIs and CSPs have been identified for successive nested clades providing reliable information regarding their hierarchical relationships and these inferences are not affected by HGTs. These results strongly support Darwin's views on evolution and classification and supplement the current phylogenetic framework based on 16S rRNA in important respects. The identified molecular markers provide important means for developing novel diagnostics, therapeutics and for functional studies providing important insights regarding prokaryotic taxa.
Affiliation(s)
- Radhey S Gupta
- Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, ON, Canada
15
Boto L. Evolutionary change and phylogenetic relationships in light of horizontal gene transfer. J Biosci 2016; 40:465-72. [PMID: 25963270 DOI: 10.1007/s12038-015-9514-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 10/23/2022]
Abstract
Horizontal gene transfer has, over the past 25 years, become a part of evolutionary thinking. In the present paper I discuss horizontal gene transfer (HGT) in relation to contingency, natural selection, evolutionary change speed and the Tree-of-Life endeavour, with the aim of contributing to the understanding of the role of HGT in evolutionary processes. In addition, the challenges that HGT imposes on the current view of evolution are emphasized.
Affiliation(s)
- Luis Boto
- Departamento de Biodiversidad y Biologia Evolutiva, Museo Nacional Ciencias Naturales, CSIC, C/ Jose Gutierrez Abascal 2, 28006, Madrid, Spain
16
Zamani-Dahaj SA, Okasha M, Kosakowski J, Higgs PG. Estimating the Frequency of Horizontal Gene Transfer Using Phylogenetic Models of Gene Gain and Loss. Mol Biol Evol 2016; 33:1843-57. [DOI: 10.1093/molbev/msw062] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022]
17
Orata FD, Kirchberger PC, Méheust R, Barlow EJ, Tarr CL, Boucher Y. The Dynamics of Genetic Interactions between Vibrio metoecus and Vibrio cholerae, Two Close Relatives Co-Occurring in the Environment. Genome Biol Evol 2015; 7:2941-54. [PMID: 26454015 PMCID: PMC4684700 DOI: 10.1093/gbe/evv193] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Indexed: 12/26/2022]
Abstract
Vibrio metoecus is the closest relative of Vibrio cholerae, the causative agent of the potent diarrheal disease cholera. Although the pathogenic potential of this new species is yet to be studied in depth, it has been co-isolated with V. cholerae in coastal waters and found in clinical specimens in the United States. We used these two organisms to investigate the genetic interaction between closely related species in their natural environment. The genomes of 20 V. cholerae and 4 V. metoecus strains isolated from a brackish coastal pond on the US east coast, as well as 4 clinical V. metoecus strains were sequenced and compared with reference strains. Whole genome comparison shows 86-87% average nucleotide identity (ANI) in their core genes between the two species. On the other hand, the chromosomal integron, which occupies approximately 3% of their genomes, shows higher conservation in ANI between species than any other region of their genomes. The ANI of 93-94% observed in this region is not significantly greater within than between species, meaning that it does not follow species boundaries. Vibrio metoecus does not encode toxigenic V. cholerae major virulence factors, the cholera toxin and toxin-coregulated pilus. However, some of the pathogenicity islands found in pandemic V. cholerae were either present in the common ancestor it shares with V. metoecus, or acquired by clinical and environmental V. metoecus in partial fragments. The virulence factors of V. cholerae are therefore both more ancient and more widespread than previously believed. There is high interspecies recombination in the core genome, which has been detected in 24% of the single-copy core genes, including genes involved in pathogenicity. Vibrio metoecus was six times more often the recipient of DNA from V. cholerae than it was the donor, indicating a strong bias in the direction of gene transfer in the environment.
Affiliation(s)
- Fabini D Orata
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
- Paul C Kirchberger
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
- Raphaël Méheust
- Unité Mixte de Recherche 7138, Evolution Paris-Seine, Institut de Biologie Paris-Seine, Université Pierre et Marie Curie, Paris, France
- E Jed Barlow
- Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
- Cheryl L Tarr
- Enteric Diseases Laboratory Branch, Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA
- Yan Boucher
- Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada

18
Chaudhary R, Boussau B, Burleigh JG, Fernández-Baca D. Assessing approaches for inferring species trees from multi-copy genes. Syst Biol 2014; 64:325-39. [PMID: 25540456 DOI: 10.1093/sysbio/syu128] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]
Abstract
With the availability of genomic sequence data, there is increasing interest in using genes with a possible history of duplication and loss for species tree inference. Here we assess the performance of both nonprobabilistic and probabilistic species tree inference approaches using gene duplication and loss and coalescence simulations. We evaluated the performance of gene tree parsimony (GTP) based on duplication (Only-dup), duplication and loss (Dup-loss), and deep coalescence (Deep-c) costs, the NJst distance method, the MulRF supertree method, and PHYLDOG, which jointly estimates gene trees and species tree using a hierarchical probabilistic model. We examined the effects of gene tree and species sampling, gene tree error, and duplication and loss rates on the accuracy of phylogenetic estimates. In the 10-taxon duplication and loss simulation experiments, MulRF is more accurate than the other methods when the duplication and loss rates are low, and Dup-loss is generally the most accurate when the duplication and loss rates are high. PHYLDOG performs well in 10-taxon duplication and loss simulations, but its run time is prohibitively long on larger data sets. In the larger duplication and loss simulation experiments, MulRF outperforms all other methods in experiments with at most 100 taxa; however, in the larger simulation, Dup-loss generally performs best. In all duplication and loss simulation experiments with more than 10 taxa, all methods perform better with more gene trees and fewer missing sequences, and they are all affected by gene tree error. Our results also highlight high levels of error in estimates of duplications and losses from GTP methods and demonstrate the usefulness of methods based on generic tree distances for large analyses.
Affiliation(s)
- Ruchi Chaudhary
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA; Department of Biology, University of Florida, Gainesville, FL 32611, USA; and Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne F-69622, France
- Bastien Boussau
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA; Department of Biology, University of Florida, Gainesville, FL 32611, USA; and Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne F-69622, France
- J Gordon Burleigh
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA; Department of Biology, University of Florida, Gainesville, FL 32611, USA; and Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne F-69622, France
- David Fernández-Baca
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA; Department of Biology, University of Florida, Gainesville, FL 32611, USA; and Université de Lyon, Université Lyon 1, CNRS, UMR 5558, Laboratoire de Biométrie et Biologie Evolutive, Villeurbanne F-69622, France

19
Li J, Wong CF, Wong MT, Huang H, Leung FC. Modularized evolution in archaeal methanogens phylogenetic forest. Genome Biol Evol 2014; 6:3344-59. [PMID: 25502908 PMCID: PMC4986457 DOI: 10.1093/gbe/evu259] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Accepted: 11/17/2014] [Indexed: 11/13/2022]
Abstract
Methanogens are methane-producing archaea that play a key role in the global carbon cycle. To date, the evolutionary history of methanogens and closely related nonmethanogen species remains unresolved among studies conducted on different genetic markers, a discordance attributed to horizontal gene transfers (HGTs). In an effort to decipher both congruent and conflicting evolutionary events, reconstruction of coevolved gene clusters and hierarchical structure in the archaeal methanogen phylogenetic forest, comprehensive evolution, and network analyses were performed on 3,694 gene families from 41 methanogens and 33 closely related archaea. Our results show that 1) greater than 50% of genes are in topological dissonance with others; 2) the prevalent interorder HGTs, even for core genes, in methanogen genomes led to their scrambled phylogenetic relationships; 3) most methanogenesis-related genes have experienced at least one HGT; 4) greater than 20% of the genes in methanogen genomes were transferred horizontally from other archaea, with genes involved in cell-wall synthesis and defense systems having been transferred most frequently; 5) the coevolution network contains seven statistically robust modules, wherein the central module has the highest average node strength and comprises a majority of the core genes; 6) genes of different coevolutionary modules expanded at different times and in different evolutionary lineages, constructing diversified pan-genome structures; 7) the modularized evolution is also closely related to the vertical evolution signals and the HGT rate of the genes. Overall, this study presents a modularized phylogenetic forest that describes a combination of complicated vertical and nonvertical evolutionary processes for methanogenic archaeal species.
Affiliation(s)
- Jun Li
- School of Biological Sciences, Faculty of Science, The University of Hong Kong, China
- Chi-Fat Wong
- School of Biological Sciences, Faculty of Science, The University of Hong Kong, China
- Mabel Ting Wong
- School of Biological Sciences, Faculty of Science, The University of Hong Kong, China; Present address: Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
- He Huang
- Center for Marine Environmental Studies, Ehime University, Japan
- Frederick C Leung
- School of Biological Sciences, Faculty of Science, The University of Hong Kong, China; Bioinformatics Center, Nanjing Agricultural University, People's Republic of China

20
Currie TE, Mace R. Evolution of cultural traits occurs at similar relative rates in different world regions. Proc Biol Sci 2014; 281:20141622. [PMID: 25297866 PMCID: PMC4213619 DOI: 10.1098/rspb.2014.1622] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 07/07/2014] [Accepted: 09/09/2014] [Indexed: 02/04/2023]
Abstract
A fundamental issue in understanding human diversity is whether or not there are regular patterns and processes involved in cultural change. Theoretical and mathematical models of cultural evolution have been developed and are increasingly being used and assessed in empirical analyses. Here, we test the hypothesis that the rates of change of features of human socio-cultural organization are governed by general rules. One prediction of this hypothesis is that different cultural traits will tend to evolve at similar relative rates in different world regions, despite the unique historical backgrounds of groups inhabiting these regions. We used phylogenetic comparative methods and systematic cross-cultural data to assess how different socio-cultural traits changed in (i) island southeast Asia and the Pacific, and (ii) sub-Saharan Africa. The relative rates of change in these two regions are significantly correlated. Furthermore, cultural traits that are more directly related to external environmental conditions evolve more slowly than traits related to social structures. This is consistent with the idea that a form of purifying selection is acting with greater strength on these more environmentally linked traits. These results suggest that despite contingent historical events and the role of humans as active agents in the historical process, culture does indeed evolve in ways that can be predicted from general principles.
Affiliation(s)
- Thomas E Currie
- Department of Biosciences, College of Life and Environmental Sciences, University of Exeter, Penryn Campus, Cornwall TR10 9EZ, UK; Department of Anthropology, University College London, 14 Taviton St., London WC1H 0BH, UK
- Ruth Mace
- Department of Anthropology, University College London, 14 Taviton St., London WC1H 0BH, UK

21
Puigbò P, Lobkovsky AE, Kristensen DM, Wolf YI, Koonin EV. Genomes in turmoil: quantification of genome dynamics in prokaryote supergenomes. BMC Biol 2014; 12:66. [PMID: 25141959 PMCID: PMC4166000 DOI: 10.1186/s12915-014-0066-4] [Citation(s) in RCA: 117] [Impact Index Per Article: 11.7] [Received: 05/19/2014] [Accepted: 07/31/2014] [Indexed: 01/12/2023]
Abstract
BACKGROUND: Genomes of bacteria and archaea (collectively, prokaryotes) appear to exist in incessant flux, expanding via horizontal gene transfer and gene duplication, and contracting via gene loss. However, the actual rates of genome dynamics and relative contributions of different types of event across the diversity of prokaryotes are largely unknown, as are the sizes of microbial supergenomes, i.e. pools of genes that are accessible to the given microbial species.
RESULTS: We performed a comprehensive analysis of the genome dynamics in 35 groups (34 bacterial and one archaeal) of closely related microbial genomes using a phylogenetic birth-and-death maximum likelihood model to quantify the rates of gene family gain and loss, as well as expansion and reduction. The results show that loss of gene families dominates the evolution of prokaryotes, occurring at approximately three times the rate of gain. The rates of gene family expansion and reduction are typically seven and twenty times less than the gain and loss rates, respectively. Thus, the prevailing mode of evolution in bacteria and archaea is genome contraction, which is partially compensated by the gain of new gene families via horizontal gene transfer. However, the rates of gene family gain, loss, expansion and reduction vary within wide ranges, with the most stable genomes showing rates about 25 times lower than the most dynamic genomes. For many groups, the supergenome estimated from the fraction of repetitive gene family gains includes about tenfold more gene families than the typical genome in the group although some groups appear to have vast, 'open' supergenomes.
CONCLUSIONS: Reconstruction of evolution for groups of closely related bacteria and archaea reveals an extremely rapid and highly variable flux of genes in evolving microbial genomes, demonstrates that extensive gene loss and horizontal gene transfer leading to innovation are the two dominant evolutionary processes, and yields robust estimates of the supergenome size.

22
Knöppel A, Lind PA, Lustig U, Näsvall J, Andersson DI. Minor fitness costs in an experimental model of horizontal gene transfer in bacteria. Mol Biol Evol 2014; 31:1220-7. [PMID: 24536043 DOI: 10.1093/molbev/msu076] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Indexed: 12/21/2022]
Abstract
Genes introduced by horizontal gene transfer (HGT) from other species constitute a significant portion of many bacterial genomes, and the evolutionary dynamics of HGTs are important for understanding the spread of antibiotic resistance and the emergence of new pathogenic strains of bacteria. The fitness effects of the transferred genes largely determine the fixation rates and the amount of neutral diversity of newly acquired genes in bacterial populations. Comparative analysis of bacterial genomes provides insight into what genes are commonly transferred, but direct experimental tests of the fitness constraints on HGT are scarce. Here, we address this paucity of experimental studies by introducing 98 random DNA fragments varying in size from 0.45 to 5 kb from Bacteroides, Proteus, and human intestinal phage into a defined position in the Salmonella chromosome and measuring the effects on fitness. Using highly sensitive competition assays, we found that eight inserts were deleterious with selection coefficients (s) ranging from ≈ -0.007 to -0.02 and 90 did not have significant fitness effects. When inducing transcription from a PBAD promoter located at one end of the insert, 16 transfers were deleterious and 82 were not significantly different from the control. In conclusion, a major fraction of the inserts had minor effects on fitness implying that extra DNA transferred by HGT, even though it does not confer an immediate selective advantage, could be maintained at selection-transfer balance and serve as raw material for the evolution of novel beneficial functions.
Affiliation(s)
- Anna Knöppel
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden

23
Matzke NJ, Shih PM, Kerfeld CA. Bayesian analysis of congruence of core genes in Prochlorococcus and Synechococcus and implications on horizontal gene transfer. PLoS One 2014; 9:e85103. [PMID: 24465485 PMCID: PMC3897415 DOI: 10.1371/journal.pone.0085103] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 07/30/2013] [Accepted: 11/22/2013] [Indexed: 01/28/2023]
Abstract
It is often suggested that horizontal gene transfer is so ubiquitous in microbes that the concept of a phylogenetic tree representing the pattern of vertical inheritance is oversimplified or even positively misleading. "Universal proteins" have been used to infer the organismal phylogeny, but have been criticized as being only the "tree of one percent." Currently, few options exist for those wishing to rigorously assess how well a universal protein phylogeny, based on a relative handful of well-conserved genes, represents the phylogenetic histories of hundreds of genes. Here, we address this problem by proposing a visualization method and a statistical test within a Bayesian framework. We use the genomes of marine cyanobacteria, a group thought to exhibit substantial amounts of HGT, as a test case. We take 379 orthologous gene families from 28 cyanobacteria genomes and estimate the Bayesian posterior distributions of trees - a "treecloud" - for each, as well as for a concatenated dataset based on putative "universal proteins." We then calculate the average distance between trees within and between all treeclouds on various metrics and visualize this high-dimensional space with non-metric multidimensional scaling (NMMDS). We show that the tree space is strongly clustered and that the universal protein treecloud is statistically significantly closer to the center of this tree space than any individual gene treecloud. We apply several commonly-used tests for incongruence/HGT and show that they agree HGT is rare in this dataset, but make different choices about which genes were subject to HGT. Our results show that the question of the representativeness of the "tree of one percent" is a quantitative empirical question, and that the phylogenetic central tendency is a meaningful observation even if many individual genes disagree due to the various sources of incongruence.
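As a rough illustration of the tree-to-tree distances mentioned in this abstract, the sketch below computes a Robinson-Foulds (RF) distance and averages it over pairs of trees drawn from two tree sets. This is only an assumption-laden example: the abstract says "various metrics" without naming them, so RF is a stand-in, and the nested-tuple tree encoding, the function names (`bipartitions`, `rf_distance`, `mean_distance`) and the toy trees are all hypothetical rather than taken from the paper. Both trees are assumed to share the same leaf set.

```python
from itertools import product

def bipartitions(tree):
    """Return the non-trivial splits of an unrooted tree written as nested tuples of leaf names."""
    def leaf_set(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaf_set(child) for child in node))

    all_leaves = leaf_set(tree)
    reference = min(all_leaves)  # fixed leaf used to orient every split the same way
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        if len(below) > 1 and len(other) > 1:
            # keep the side that does not contain the reference leaf
            splits.add(other if reference in below else below)
        return below

    visit(tree)
    return splits

def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of splits found in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))

def mean_distance(cloud_a, cloud_b):
    """Average RF distance over every pair made of one tree from each set."""
    pairs = list(product(cloud_a, cloud_b))
    return sum(rf_distance(a, b) for a, b in pairs) / len(pairs)

# Two hypothetical five-taxon trees standing in for posterior samples
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))                # 2
print(mean_distance([t1, t2], [t1, t2]))  # 1.0
```

Note that `mean_distance` applied to a set against itself includes each tree paired with itself (distance 0); a within-set average that excludes those self-pairs would need a small change.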
Affiliation(s)
- Nicholas J. Matzke
- Department of Integrative Biology, University of California, Berkeley, California, United States of America
- Patrick M. Shih
- Department of Plant and Microbial Biology, University of California, Berkeley, California, United States of America
- Cheryl A. Kerfeld
- Department of Plant and Microbial Biology, University of California, Berkeley, California, United States of America
- US Department of Energy-Joint Genome Institute, Walnut Creek, California, United States of America

24
Boon E, Meehan CJ, Whidden C, Wong DHJ, Langille MGI, Beiko RG. Interactions in the microbiome: communities of organisms and communities of genes. FEMS Microbiol Rev 2014; 38:90-118. [PMID: 23909933 PMCID: PMC4298764 DOI: 10.1111/1574-6976.12035] [Citation(s) in RCA: 119] [Impact Index Per Article: 11.9] [Received: 04/23/2013] [Revised: 07/02/2013] [Accepted: 07/10/2013] [Indexed: 12/17/2022]
Abstract
A central challenge in microbial community ecology is the delineation of appropriate units of biodiversity, which can be taxonomic, phylogenetic, or functional in nature. The term 'community' is applied ambiguously; in some cases, the term refers simply to a set of observed entities, while in other cases, it requires that these entities interact with one another. Microorganisms can rapidly gain and lose genes, potentially decoupling community roles from taxonomic and phylogenetic groupings. Trait-based approaches offer a useful alternative, but many traits can be defined based on gene functions, metabolic modules, and genomic properties, and the optimal set of traits to choose is often not obvious. An analysis that considers taxon assignment and traits in concert may be ideal, with the strengths of each approach offsetting the weaknesses of the other. Individual genes also merit consideration as entities in an ecological analysis, with characteristics such as diversity, turnover, and interactions modeled using genes rather than organisms as entities. We identify some promising avenues of research that are likely to yield a deeper understanding of microbial communities that shift from observation-based questions of 'Who is there?' and 'What are they doing?' to the mechanistically driven question of 'How will they respond?'
Affiliation(s)
- Eva Boon
- Department of Biology, Dalhousie University, Halifax, NS, Canada

25
Evolution of tryptophan biosynthetic pathway in microbial genomes: a comparative genetic study. Syst Synth Biol 2013; 8:59-72. [PMID: 24592292 DOI: 10.1007/s11693-013-9127-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 05/22/2013] [Revised: 10/05/2013] [Accepted: 10/08/2013] [Indexed: 10/26/2022]
Abstract
Studies of biosynthetic pathway evolution need to consider the evolution of a group of genes that code for enzymes catalysing the multiple chemical reaction steps leading to the final end product. The tryptophan biosynthetic pathway has five chemical reaction steps that are highly conserved in diverse microbial genomes, though the genes of the pathway enzymes show considerable variations in arrangement, operon structure (gene fusion and splitting) and regulation. We use a combined bioinformatic and statistical approach to address the question of whether the pathway genes from different microbial genomes, belonging to a wide range of groups, show similar evolutionary relationships within and between them. Our analyses involved detailed study of gene organization (fusion/splitting events), base composition, relative synonymous codon usage pattern of the genes, gene expressivity, amino acid usage, etc. to assess inter- and intra-genic variations, between and within the pathway genes, in a diverse group of microorganisms. We describe these genetic and genomic variations in the tryptophan pathway genes in different microorganisms to show the similarities across organisms, and compare the same genes across different organisms to find variability possibly arising from horizontal gene transfers. Such studies form the basis for moving from single gene evolution to pathway evolutionary studies that are important steps towards understanding the systems biology of intracellular pathways.

26
Prokaryotic phylogenies inferred from whole-genome sequence and annotation data. Biomed Res Int 2013; 2013:409062. [PMID: 24073404 PMCID: PMC3773407 DOI: 10.1155/2013/409062] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/15/2013] [Revised: 06/26/2013] [Accepted: 07/22/2013] [Indexed: 11/25/2022]
Abstract
Phylogenetic trees are used to represent the evolutionary relationships among groups of species. In this paper, a novel method for inferring prokaryotic phylogenies using multiple types of genomic information is proposed. The method, called CGCPhy, is based on the distance matrix of orthologous gene clusters between whole-genome pairs. CGCPhy comprises four main steps. First, orthologous genes are determined by sequence similarity, genomic function, and genomic structure information. Second, genes involved in potential HGT events are eliminated; such genes are taken to be those that are highly conserved across different species and those located on fragments with an abnormal genome barcode. Third, we calculate the distance of the orthologous gene clusters between each genome pair in terms of the number of orthologous genes in conserved clusters. Finally, the neighbor-joining method is employed to construct phylogenetic trees across different species. CGCPhy has been examined on different datasets from 617 complete single-chromosome prokaryotic genomes and achieved applicative accuracies on different species sets in agreement with Bergey's taxonomy in quartet topologies. Simulation results show that CGCPhy achieves high average accuracy and has a low standard deviation on different datasets, so it has an applicative potential for phylogenetic analysis.

27
Roch S, Snir S. Recovering the treelike trend of evolution despite extensive lateral genetic transfer: a probabilistic analysis. J Comput Biol 2013; 20:93-112. [PMID: 23383996 DOI: 10.1089/cmb.2012.0234] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022]
Abstract
Lateral gene transfer (LGT) is a common mechanism of nonvertical evolution, during which genetic material is transferred between two more or less distantly related organisms. It is particularly common in bacteria where it contributes to adaptive evolution with important medical implications. In evolutionary studies, LGT has been shown to create widespread discordance between gene trees as genomes become mosaics of gene histories. In particular, the Tree of Life has been questioned as an appropriate representation of bacterial evolutionary history. Nevertheless a common hypothesis is that prokaryotic evolution is primarily treelike, but that the underlying trend is obscured by LGT. Extensive empirical work has sought to extract a common treelike signal from conflicting gene trees. Here we give a probabilistic perspective on the problem of recovering the treelike trend despite LGT. Under a model of randomly distributed LGT, we show that the species phylogeny can be reconstructed even in the presence of surprisingly many (an almost linear number of) LGT events per gene tree. Our results, which are optimal up to logarithmic factors, are based on the analysis of a robust, computationally efficient reconstruction method and provide insight into the design of such methods. Finally, we show that our results have implications for the discovery of highways of gene sharing.
Affiliation(s)
- Sebastien Roch
- Department of Mathematics and Bioinformatics Program, University of California at Los Angeles, Los Angeles, CA, USA

28
Lasek-Nesselquist E, Gogarten JP. The effects of model choice and mitigating bias on the ribosomal tree of life. Mol Phylogenet Evol 2013; 69:17-38. [PMID: 23707703 DOI: 10.1016/j.ympev.2013.05.006] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Received: 01/25/2013] [Revised: 04/26/2013] [Accepted: 05/08/2013] [Indexed: 01/03/2023]
Abstract
Deep-level relationships within Bacteria, Archaea, and Eukarya as well as the relationships of these three domains to each other require resolution. The ribosomal machinery, universal to all cellular life, represents a protein repertoire resistant to horizontal gene transfer, which provides a largely congruent signal necessary for reconstructing a tree suitable as a backbone for life's reticulate history. Here, we generate a ribosomal tree of life from a robust taxonomic sampling of Bacteria, Archaea, and Eukarya to elucidate deep-level intra-domain and inter-domain relationships. Lack of phylogenetic information and systematic errors caused by inadequate models (that cannot account for substitution rate or compositional heterogeneities) or improper model selection compound conflicting phylogenetic signals from HGT and/or paralogy. Thus, we tested several models of varying sophistication on three different datasets, performed removal of fast-evolving or long-branched Archaea and Eukarya, and employed three different strategies to remove compositional heterogeneity to examine their effects on the topological outcome. Our results support a two-domain topology for the tree of life, where Eukarya emerges from within Archaea as sister to a Korarchaeota/Thaumarchaeota (KT) or Crenarchaeota/KT clade for all models under all or at least one of the strategies employed. Taxonomic manipulation allows single-matrix and certain mixture models to vacillate between two-domain and three-domain phylogenies. We find that models vary in their ability to resolve different areas of the tree of life, which does not necessarily correlate with model complexity. For example, both single-matrix and some mixture models recover monophyletic Crenarchaeota and Euryarchaeota archaeal phyla. In contrast, the most sophisticated model recovers a paraphyletic Euryarchaeota but detects two large clades that comprise the Bacteria, which were recovered separately but never together in the other models. Overall, models recovered consistent topologies despite dataset modifications due to the removal of compositional bias, which reflects either ineffective bias reduction or robust datasets that allow models to overcome reconstruction artifacts. We recommend a comparative approach for evolutionary models to identify model weaknesses as well as consensus relationships.

29
Kvist S, Siddall ME. Phylogenomics of Annelida revisited: a cladistic approach using genome-wide expressed sequence tag data mining and examining the effects of missing data. Cladistics 2013; 29:435-448. [DOI: 10.1111/cla.12015] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Accepted: 12/31/2012] [Indexed: 11/28/2022]

30
Segata N, Börnigen D, Morgan XC, Huttenhower C. PhyloPhlAn is a new method for improved phylogenetic and taxonomic placement of microbes. Nat Commun 2013; 4:2304. [PMID: 23942190 PMCID: PMC3760377 DOI: 10.1038/ncomms3304] [Citation(s) in RCA: 575] [Impact Index Per Article: 52.3] [Received: 02/12/2013] [Accepted: 07/12/2013] [Indexed: 12/19/2022]
Abstract
New microbial genomes are constantly being sequenced, and it is crucial to accurately determine their taxonomic identities and evolutionary relationships. Here we report PhyloPhlAn, a new method to assign microbial phylogeny and putative taxonomy using >400 proteins optimized from among 3,737 genomes. This method measures the sequence diversity of all clades, classifies genomes from deep-branching candidate divisions through closely related subspecies and improves consistency between phylogenetic and taxonomic groupings. PhyloPhlAn improved taxonomic accuracy for existing and newly sequenced genomes, detecting 157 erroneous labels, correcting 46 and placing or refining 130 new genomes. We provide examples of accurate classifications from subspecies (Sulfolobus spp.) to phyla, and of preliminary rooting of deep-branching candidate divisions, including consistent statistical support for Caldiserica (formerly candidate division OP5). PhyloPhlAn will thus be useful for both phylogenetic assessment and taxonomic quality control of newly sequenced genomes. The final phylogenies, conserved protein sequences and open-source implementation are available online.
Affiliation(s)
- Nicola Segata
- Biostatistics Department, Harvard School of Public Health, 655 Huntington Avenue, 02115, Boston, MA
- Daniela Börnigen
- Biostatistics Department, Harvard School of Public Health, 655 Huntington Avenue, 02115, Boston, MA
- Broad Institute of Harvard and MIT, 301 Binney Street, 02142 Cambridge, MA
- Xochitl C. Morgan
- Biostatistics Department, Harvard School of Public Health, 655 Huntington Avenue, 02115, Boston, MA
- Broad Institute of Harvard and MIT, 301 Binney Street, 02142 Cambridge, MA
- Curtis Huttenhower
- Biostatistics Department, Harvard School of Public Health, 655 Huntington Avenue, 02115, Boston, MA
- Broad Institute of Harvard and MIT, 301 Binney Street, 02142 Cambridge, MA

31
Chan CX, Soares MB, Bonaldo MF, Wisecaver JH, Hackett JD, Anderson DM, Erdner DL, Bhattacharya D. Analysis of Alexandrium tamarense (Dinophyceae) genes reveals the complex evolutionary history of a microbial eukaryote. J Phycol 2012; 48:1130-1142. [PMID: 23066170 PMCID: PMC3466611 DOI: 10.1111/j.1529-8817.2012.01194.x] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 05/13/2023]
Abstract
Microbial eukaryotes may extinguish much of their nuclear phylogenetic history due to endosymbiotic/horizontal gene transfer (E/HGT). We studied E/HGT in 32,110 contigs of expressed sequence tags (ESTs) from the dinoflagellate Alexandrium tamarense (Dinophyceae) using a conservative phylogenomic approach. The vast majority of predicted proteins (86.4%) in this alga are novel or dinoflagellate-specific. We searched for putative homologs of these predicted proteins against a taxonomically broadly sampled protein database that includes all currently available data from algae and protists and reconstructed a phylogeny from each of the putative homologous protein sets. Of the 2,523 resulting phylogenies, 14-17% are potentially impacted by E/HGT involving both prokaryote and eukaryote lineages, with 2-4% showing clear evidence of reticulate evolution. The complex evolutionary histories of the remaining proteins, many of which may also have been affected by E/HGT, cannot be interpreted using our approach with currently available gene data. We present empirical evidence of reticulate genome evolution that combined with inadequate or highly complex phylogenetic signal in many proteins may impede genome-wide approaches to infer the tree of microbial eukaryotes.
Affiliation(s)
- Cheong Xin Chan
  - Department of Ecology, Evolution and Natural Resources, and Institute of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ 08901, USA
- Marcelo B. Soares
  - Northwestern University, Children's Memorial Research Center, Chicago, IL 60614, USA
- Maria F. Bonaldo
  - Northwestern University, Children's Memorial Research Center, Chicago, IL 60614, USA
- Jennifer H. Wisecaver
  - Department of Ecology and Evolutionary Biology, The University of Arizona, Tucson, AZ 85721, USA
- Jeremiah D. Hackett
  - Department of Ecology and Evolutionary Biology, The University of Arizona, Tucson, AZ 85721, USA
- Deana L. Erdner
  - Marine Science Institute, University of Texas, Port Aransas, TX 78373, USA
- Debashish Bhattacharya
  - Department of Ecology, Evolution and Natural Resources, and Institute of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ 08901, USA
|
32
|
Meinel T, Krause A. Meta-analysis of general bacterial subclades in whole-genome phylogenies using tree topology profiling. Evol Bioinform Online 2012; 8:489-525. [PMID: 22915837 PMCID: PMC3422217 DOI: 10.4137/ebo.s9642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022] Open
Abstract
In the last two decades, a large number of whole-genome phylogenies have been inferred to reconstruct the Tree of Life (ToL). Underlying data models range from gene or functionality content in species to phylogenetic gene family trees and multiple sequence alignments of concatenated protein sequences. Diversity in data models together with the use of different tree reconstruction techniques, disruptive biological effects and the steadily increasing number of genomes have led to a huge diversity in published phylogenies. Comparison of those and, moreover, identification of the impact of inference properties (underlying data model, inference technique) on particular reconstructions is almost impossible. In this work, we introduce tree topology profiling as a method to compare already published whole-genome phylogenies. This method requires visual determination of the particular topology in a drawn whole-genome phylogeny for a set of particular bacterial clans. For each clan, neighborhoods to other bacteria are collected into a catalogue of generalized alternative topologies. Particular topology alternatives found for an ordered list of bacterial clans reveal a topology profile that represents the analyzed phylogeny. To simulate the inhomogeneity of published gene content phylogenies we generate a set of seven phylogenies using different inference techniques and the SYSTERS-PhyloMatrix data model. After tree topology profiling on in total 54 selected published and newly inferred phylogenies, we separate artefactual from biologically meaningful phylogenies and associate particular inference results (phylogenies) with inference background (inference techniques as well as data models). Topological relationships of particular bacterial species groups are presented. With this work we introduce tree topology profiling into the scientific field of comparative phylogenomics.
Affiliation(s)
- Thomas Meinel
  - Charité-University Medicine Berlin, Institute for Physiology, Structural Bioinformatics Group, Thielallee 71, 14195 Berlin, Germany
|
33
|
Iqbal Z, Caccamo M, Turner I, Flicek P, McVean G. De novo assembly and genotyping of variants using colored de Bruijn graphs. Nat Genet 2012; 44:226-32. [PMID: 22231483 PMCID: PMC3272472 DOI: 10.1038/ng.1028] [Citation(s) in RCA: 352] [Impact Index Per Article: 29.3] [Received: 04/08/2011] [Accepted: 11/07/2011] [Indexed: 12/24/2022]
Abstract
Detecting genetic variants that are highly divergent from a reference sequence remains a major challenge in genome sequencing. We introduce de novo assembly algorithms using colored de Bruijn graphs for detecting and genotyping simple and complex genetic variants in an individual or population. We provide an efficient software implementation, Cortex, the first de novo assembler capable of assembling multiple eukaryotic genomes simultaneously. Four applications of Cortex are presented. First, we detect and validate both simple and complex structural variations in a high-coverage human genome. Second, we identify more than 3 Mb of sequence absent from the human reference genome, in pooled low-coverage population sequence data from the 1000 Genomes Project. Third, we show how population information from ten chimpanzees enables accurate variant calls without a reference sequence. Last, we estimate classical human leukocyte antigen (HLA) genotypes at HLA-B, the most variable gene in the human genome.
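As a rough illustration of the colored de Bruijn graph idea described in this abstract, the Python sketch below labels every k-mer with the set of samples ("colors") that contain it and reports the k-mers private to one sample. It is a toy under stated assumptions, not Cortex's implementation: the function names `build_colored_dbg` and `private_kmers`, the sample names, the sequences and the choice k = 3 are all hypothetical, and reverse complements and sequencing errors are ignored.

```python
from collections import defaultdict


def build_colored_dbg(samples, k):
    """Build a toy colored de Bruijn graph (hypothetical helper).

    `samples` maps a sample name (its "color") to a list of sequences.
    Returns (node_colors, edge_colors): every k-mer and every adjacency
    between consecutive k-mers, mapped to the set of samples containing it.
    """
    node_colors = defaultdict(set)
    edge_colors = defaultdict(set)
    for color, sequences in samples.items():
        for seq in sequences:
            kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
            for kmer in kmers:
                node_colors[kmer].add(color)
            for left, right in zip(kmers, kmers[1:]):
                edge_colors[(left, right)].add(color)
    return node_colors, edge_colors


def private_kmers(node_colors, color):
    """Return the k-mers that occur in `color` and in no other sample."""
    return {kmer for kmer, colors in node_colors.items() if colors == {color}}


# Two made-up sequences that differ by a single substitution (A -> T).
samples = {"ref": ["ACGTACGA"], "alt": ["ACGTTCGA"]}
node_colors, edge_colors = build_colored_dbg(samples, k=3)
print(sorted(private_kmers(node_colors, "alt")))
```

In this toy case the k-mers private to each sample are flanked by k-mers carrying both colors, which is the kind of diverging-and-rejoining path a variant caller built on such a graph would inspect.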
Affiliation(s)
- Zamin Iqbal
  - Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
- Mario Caccamo
  - The Genome Analysis Centre, Norwich Research Park, Norwich, NR4 7UH, UK
- Isaac Turner
  - Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK
- Paul Flicek
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK
- Gil McVean
  - Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN, UK
  - Department of Statistics, 1 South Parks Road, Oxford OX1 3TG, UK
|
34
|
Puigbò P, Wolf YI, Koonin EV. Genome-wide comparative analysis of phylogenetic trees: the prokaryotic forest of life. Methods Mol Biol 2012; 856:53-79. [PMID: 22399455 PMCID: PMC3842619 DOI: 10.1007/978-1-61779-585-5_3] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 12/24/2022]
Abstract
Genome-wide comparison of phylogenetic trees is becoming an increasingly common approach in evolutionary genomics, and a variety of approaches for such comparison have been developed. In this article, we present several methods for comparative analysis of large numbers of phylogenetic trees. To compare phylogenetic trees taking into account the bootstrap support for each internal branch, the Boot-Split Distance (BSD) method is introduced as an extension of the previously developed Split Distance method for tree comparison. The BSD method implements the straightforward idea that comparison of phylogenetic trees can be made more robust by treating tree splits differentially depending on the bootstrap support. Approaches are also introduced for detecting tree-like and net-like evolutionary trends in the phylogenetic Forest of Life (FOL), i.e., the entirety of the phylogenetic trees for conserved genes of prokaryotes. The principal method employed for this purpose includes mapping quartets of species onto trees to calculate the support of each quartet topology and so to quantify the tree and net contributions to the distances between species. We describe the application of these methods to analyze the FOL and the results obtained with these methods. These results support the concept of the Tree of Life (TOL) as a central evolutionary trend in the FOL as opposed to the traditional view of the TOL as a "species tree."
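The idea of treating splits differently according to bootstrap support can be sketched as follows. The abstract does not give the BSD formula, so the weighting used here is only an illustrative assumption and should not be read as the published method. Each tree is represented as a dictionary from a bipartition (split) to its bootstrap support, and the names `canonical`, `split_distance` and `support_weighted_distance`, along with the example trees and support values, are hypothetical.

```python
def canonical(side, taxa):
    """Name a bipartition by the side that excludes the first taxon (sorted order)."""
    side = frozenset(side)
    return frozenset(taxa) - side if min(taxa) in side else side


def split_distance(tree_a, tree_b):
    """Fraction of splits, pooled over both trees, found in only one of them."""
    a, b = set(tree_a), set(tree_b)
    total = len(a) + len(b)
    return len(a ^ b) / total if total else 0.0


def support_weighted_distance(tree_a, tree_b):
    """Same comparison, but each split counts in proportion to its bootstrap
    support, so weakly supported conflicts contribute less (assumed weighting)."""
    shared = set(tree_a) & set(tree_b)
    total = sum(tree_a.values()) + sum(tree_b.values())
    if total == 0:
        return 0.0
    agreeing = sum(tree_a[s] + tree_b[s] for s in shared)
    return 1.0 - agreeing / total


taxa = ["A", "B", "C", "D", "E"]
# Internal splits of two five-taxon trees with made-up bootstrap supports.
tree_1 = {canonical("AB", taxa): 0.9, canonical("DE", taxa): 0.4}
tree_2 = {canonical("AB", taxa): 0.8, canonical("CD", taxa): 0.3}

print(split_distance(tree_1, tree_2))
print(support_weighted_distance(tree_1, tree_2))
```

Here the two trees disagree only on weakly supported splits, so the support-weighted value comes out smaller than the unweighted split distance.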
Affiliation(s)
- Pere Puigbò
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Yuri I. Wolf
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Eugene V. Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
|
35
|
Skippington E, Ragan MA. Lateral genetic transfer and the construction of genetic exchange communities. FEMS Microbiol Rev 2011; 35:707-35. [DOI: 10.1111/j.1574-6976.2010.00261.x] [Citation(s) in RCA: 123] [Impact Index Per Article: 9.5] [Indexed: 01/14/2023] Open
|
36
|
Kumar S, Filipski AJ, Battistuzzi FU, Kosakovsky Pond SL, Tamura K. Statistics and truth in phylogenomics. Mol Biol Evol 2011; 29:457-72. [PMID: 21873298 DOI: 10.1093/molbev/msr202] [Citation(s) in RCA: 164] [Impact Index Per Article: 12.6] [Indexed: 01/23/2023] Open
Abstract
Phylogenomics refers to the inference of historical relationships among species using genome-scale sequence data and to the use of phylogenetic analysis to infer protein function in multigene families. With rapidly decreasing sequencing costs, phylogenomics is becoming synonymous with evolutionary analysis of genome-scale and taxonomically densely sampled data sets. In phylogenetic inference applications, this translates into very large data sets that yield evolutionary and functional inferences with extremely small variances and high statistical confidence (P value). However, reports of highly significant P values are increasing even for contrasting phylogenetic hypotheses depending on the evolutionary model and inference method used, making it difficult to establish true relationships. We argue that the assessment of the robustness of results to biological factors, that may systematically mislead (bias) the outcomes of statistical estimation, will be a key to avoiding incorrect phylogenomic inferences. In fact, there is a need for increased emphasis on the magnitude of differences (effect sizes) in addition to the P values of the statistical test of the null hypothesis. On the other hand, the amount of sequence data available will likely always remain inadequate for some phylogenomic applications, for example, those involving episodic positive selection at individual codon positions and in specific lineages. Again, a focus on effect size and biological relevance, rather than the P value, may be warranted. Here, we present a theoretical overview and discuss practical aspects of the interplay between effect sizes, bias, and P values as it relates to the statistical inference of evolutionary truth in phylogenomics.
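The point about P values and effect sizes can be made concrete with a small calculation. The sketch below is not taken from the paper: it assumes a plain two-sample z-test with equal group sizes and unit variance, and the function name `two_sample_z_pvalue` and the effect size of 0.01 standard deviations are hypothetical choices for illustration.

```python
import math


def two_sample_z_pvalue(effect_size, n_per_group):
    """Two-sided P value for a mean difference of `effect_size` standard
    deviations between two equal-sized groups with known unit variance."""
    # The standard error of the difference in means is sqrt(2 / n).
    z = effect_size * math.sqrt(n_per_group / 2.0)
    # Two-sided tail probability of a standard normal variate.
    return math.erfc(abs(z) / math.sqrt(2.0))


# The same tiny effect, evaluated at increasing sample sizes.
for n in (100, 10_000, 1_000_000):
    print(n, two_sample_z_pvalue(0.01, n))
```

The effect size is identical in all three runs; only the amount of data changes, yet the P value moves from clearly non-significant to extremely small. This is why the abstract argues for reporting the magnitude of a difference alongside its P value.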
Affiliation(s)
- Sudhir Kumar
  - Center for Evolutionary Medicine and Informatics, Biodesign Institute, Arizona State University, Arizona, USA.
|
37
|
McInerney JO, Pisani D, Bapteste E, O'Connell MJ. The Public Goods Hypothesis for the evolution of life on Earth. Biol Direct 2011; 6:41. [PMID: 21861918 PMCID: PMC3179745 DOI: 10.1186/1745-6150-6-41] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Received: 04/17/2011] [Accepted: 08/23/2011] [Indexed: 02/01/2023] Open
Abstract
It is becoming increasingly difficult to reconcile the observed extent of horizontal gene transfers with the central metaphor of a great tree uniting all evolving entities on the planet. In this manuscript we describe the Public Goods Hypothesis and show that it is appropriate in order to describe biological evolution on the planet. According to this hypothesis, nucleotide sequences (genes, promoters, exons, etc.) are simply seen as goods, passed from organism to organism through both vertical and horizontal transfer. Public goods sequences are defined by having the properties of being largely non-excludable (no organism can be effectively prevented from accessing these sequences) and non-rival (while such a sequence is being used by one organism it is also available for use by another organism). The universal nature of genetic systems ensures that such non-excludable sequences exist and non-excludability explains why we see a myriad of genes in different combinations in sequenced genomes. There are three features of the public goods hypothesis. Firstly, segments of DNA are seen as public goods, available for all organisms to integrate into their genomes. Secondly, we expect the evolution of mechanisms for DNA sharing and of defense mechanisms against DNA intrusion in genomes. Thirdly, we expect that we do not see a global tree-like pattern. Instead, we expect local tree-like patterns to emerge from the combination of a commonage of genes and vertical inheritance of genomes by cell division. Indeed, while genes are theoretically public goods, in reality, some genes are excludable, particularly, though not only, when they have variant genetic codes or behave as coalition or club goods, available for all organisms of a coalition to integrate into their genomes, and non-rival within the club. 
We view the Tree of Life hypothesis as a regionalized instance of the Public Goods hypothesis, just as classical mechanics and Euclidean geometry are seen as regionalized instances of quantum mechanics and Riemannian geometry, respectively. We argue for this change using an axiomatic approach that shows that the Public Goods hypothesis is a better accommodation of the observed data than the Tree of Life hypothesis.
Affiliation(s)
- James O McInerney
  - Molecular Evolution and Bioinformatics Unit, Department of Biology, National University of Ireland Maynooth, County Kildare, Ireland.
|
38
|
A close relationship between primary nucleotides sequence structure and the composition of functional genes in the genome of prokaryotes. Mol Phylogenet Evol 2011; 61:650-8. [PMID: 21864693 DOI: 10.1016/j.ympev.2011.08.011] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 09/29/2010] [Revised: 05/31/2011] [Accepted: 08/05/2011] [Indexed: 11/21/2022]
Abstract
Comparative genomics is an essential tool to unravel how genomes change over evolutionary time and to gain clues on the links between functional genomics and evolution. In prokaryotes, the large, good-quality genome sequences available in public databases and the recently developed large-scale computational methods offer an unprecedented view on the ecology and evolution of microorganisms through comparative genomics. In this work, we examined the links among genome structure (i.e., the sequential distribution of nucleotides itself by detrended fluctuation analysis, DFA) and genomic diversity (i.e., gene functionality by Clusters of Orthologous Genes, COGs) in 828 fully sequenced prokaryotic genomes from 548 different bacteria and archaea species. DFA scaling exponent α indicated persistent long-range correlations (fractality) in each genome analyzed. Higher resolution power was found when considering the sequential succession of purine (AG) vs. pyrimidine (CT) bases than either keto (GT) vs. amino (AC) forms or strongly (GC) vs. weakly (AT) bonded nucleotides. Interestingly, the phyla Aquificae, Fusobacteria, Dictyoglomi, Nitrospirae, and Thermotogae were closer to archaea than to their bacterial counterparts. A strong significant correlation was found between scaling exponent α and COGs distribution, and we consistently observed that the larger α, the more heterogeneous the gene distribution within each functional category, suggesting a close relationship between primary nucleotides sequence structure and functional genes composition.
|
39
|
Phylogenomic networks. Trends Microbiol 2011; 19:483-91. [PMID: 21820313 DOI: 10.1016/j.tim.2011.07.001] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Received: 04/20/2011] [Revised: 07/04/2011] [Accepted: 07/08/2011] [Indexed: 01/15/2023]
Abstract
Phylogenomics is aimed at studying functional and evolutionary aspects of genome biology using phylogenetic analysis of whole genomes. Current approaches to genome phylogenies are commonly founded in terms of phylogenetic trees. However, several evolutionary processes are non tree-like in nature, including recombination and lateral gene transfer (LGT). Phylogenomic networks are a special type of phylogenetic network reconstructed from fully sequenced genomes. The network model, comprising genomes connected by pairwise evolutionary relations, enables the reconstruction of both vertical and LGT events. Modeling genome evolution in the form of a network enables the use of an extensive toolbox developed for network research. The structural properties of phylogenomic networks open up fundamentally new insights into genome evolution.
|
40
|
Kurt Lienau E, DeSalle R, Allard M, Brown EW, Swofford D, Rosenfeld JA, Sarkar IN, Planet PJ. The mega-matrix tree of life: using genome-scale horizontal gene transfer and sequence evolution data as information about the vertical history of life. Cladistics 2011; 27:417-427. [PMID: 34875790 DOI: 10.1111/j.1096-0031.2010.00337.x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022] Open
Abstract
Because horizontal gene transfer can confound the recovery of the largely prokaryotic tree of life (ToL), most genome-based techniques seek to eliminate horizontal signal from ToL analyses, commonly by sieving out incongruent genes and data. This approach greatly limits the number of gene families analysed to a subset thought to be representative of vertical evolutionary history. However, formalized tests have not been performed to determine whether combining the massive amounts of information available in fully sequenced genomes can recover a reasonable ToL. Consequently, we used empirically defined gene homology definitions from a previous study that delineate xenologous gene families (gene families derived from a common transfer event) to generate a massively concatenated, combined-data ToL matrix derived from 323 404 translated open reading frames arranged into 12 381 gene homologue groups coded as amino acid data and 63 336, 64 105, 65 153, 66 922 and 67 109 gene homologue groups coded as gene presence/absence data for 166 fully sequenced genomes. This whole-genome gene presence/absence and amino acid sequence ToL data matrix is composed of 4 867 184 characters (a combined data-type mega-matrix). Phylogenetic analysis of this mega-matrix yielded a fully resolved ToL that classifies all three commonly accepted domains of life as monophyletic and groups most taxa in traditionally recognized locations with high support. Most importantly, these results corroborate the existence of a common evolutionary history for these taxa present in both data types that is evident only when these data are analysed in combination. © The Willi Hennig Society 2010.
Affiliation(s)
- E Kurt Lienau
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, Central Park West at 79th St, New York, NY 10024, USA
  - Department of Biology, Graduate School of Arts and Science, New York University, 100 Washington Square East, New York, NY 10003, USA
  - Division of Microbiology, Center for Food Safety and Nutrition, Food and Drug Administration, 5100 Paint Branch Parkway, College Park, MD 20740, USA
- Rob DeSalle
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, Central Park West at 79th St, New York, NY 10024, USA
- Marc Allard
  - Division of Microbiology, Center for Food Safety and Nutrition, Food and Drug Administration, 5100 Paint Branch Parkway, College Park, MD 20740, USA
- Eric W Brown
  - Division of Microbiology, Center for Food Safety and Nutrition, Food and Drug Administration, 5100 Paint Branch Parkway, College Park, MD 20740, USA
- David Swofford
  - Duke Institute for Genomes and Science Policy, 366 BioSci, Duke University, Durham, NC 27708, USA
- Jeffrey A Rosenfeld
  - Department of Biology, Graduate School of Arts and Science, New York University, 100 Washington Square East, New York, NY 10003, USA
- Indra N Sarkar
  - Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA
- Paul J Planet
  - Sackler Institute for Comparative Genomics, American Museum of Natural History, Central Park West at 79th St, New York, NY 10024, USA
  - Department of Pediatrics, Children's Hospital of New York, Columbia University, College of Physicians and Surgeons, New York, NY 10032, USA
|
41
|
Abstract
We have developed a semi-automatic methodology to reconstruct the phylogenetic species tree in Protozoa, integrating different phylogenetic algorithms and programs, and demonstrating the utility of a supermatrix approach to construct phylogenomics-based trees using 31 universal orthologs (UO). The species tree obtained was formed by three major clades that corresponded to three groups of data: i) species containing at least 80% of UO (25/31) in the concatenated multiple alignment or supermatrix (clade C1); ii) species containing 50%–79% of UO (15–24/31) (clade C2); and iii) species containing less than 50% of UO (1–14/31) (clade C3). C1 was composed only of protozoan species, C2 of species related to Protozoa, and C3 of some species of C1 (Protozoa) and C2 (related to Protozoa). Our phylogenomics-based methodology using a supermatrix approach proved to be reliable with protozoan genome data when at least 25 UO were used, suggesting that (a) the more UO used the better, and (b) using the entire UO sequence or just a conserved block of it for the supermatrix produced similar phylogenomic trees.
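The three data groups in this abstract are defined by how many of the 31 universal orthologs (UO) a species contributes to the supermatrix. A minimal sketch of that binning is given below, using the count ranges quoted in the abstract (25 or more, 15 to 24, and 1 to 14). The function name `occupancy_group` and the example species counts are hypothetical, and the sketch covers only the grouping by UO count, not the tree reconstruction itself.

```python
TOTAL_UO = 31  # number of universal orthologs in the supermatrix


def occupancy_group(n_present):
    """Assign a species to group C1, C2 or C3 from its number of UO present."""
    if not 1 <= n_present <= TOTAL_UO:
        raise ValueError("expected between 1 and 31 universal orthologs")
    if n_present >= 25:   # at least 80% of UO (25/31)
        return "C1"
    if n_present >= 15:   # 15-24 of 31 UO
        return "C2"
    return "C3"           # 1-14 of 31 UO


# Made-up UO counts for three placeholder species.
uo_counts = {"species_1": 29, "species_2": 18, "species_3": 6}
for name, count in uo_counts.items():
    print(name, count, occupancy_group(count))
```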
|
42
|
O'Malley MA, Koonin EV. How stands the Tree of Life a century and a half after The Origin? Biol Direct 2011; 6:32. [PMID: 21714936 PMCID: PMC3158114 DOI: 10.1186/1745-6150-6-32] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Received: 03/14/2011] [Accepted: 06/30/2011] [Indexed: 12/21/2022] Open
Abstract
We examine the Tree of Life (TOL) as an evolutionary hypothesis and a heuristic. The original TOL hypothesis has failed but a new "statistical TOL hypothesis" is promising. The TOL heuristic usefully organizes data without positing fundamental evolutionary truth.
Affiliation(s)
- Maureen A O'Malley
  - Department of Philosophy, Quadrangle A14, University of Sydney, NSW 2006, Australia
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
43
|
Zhang X, Kupiec M, Gophna U, Tuller T. Analysis of coevolving gene families using mutually exclusive orthologous modules. Genome Biol Evol 2011; 3:413-23. [PMID: 21498882 PMCID: PMC5654409 DOI: 10.1093/gbe/evr030] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022] Open
Abstract
Coevolutionary networks can encapsulate information about the dynamics of presence and absence of gene families in organisms. Analysis of such networks should reveal fundamental principles underlying the evolution of cellular systems and the functionality of sets of genes. In this study, we describe a new approach for analyzing coevolutionary networks. Our method detects Mutually Exclusive Orthologous Modules (MEOMs). A MEOM is composed of two sets of gene families, each including gene families that tend to appear in the same organisms, such that the two sets tend to mutually exclude each other (if one set appears in a certain organism the second set does not). Thus, a MEOM reflects the evolutionary replacement of one set of genes by another due to reasons such as lineage/environmental specificity, incompatibility, or functional redundancy. We use our method to analyze a coevolutionary network that is based on 383 microorganisms from the three domains of life. As we demonstrate, our method is useful for detecting meaningful evolutionary clades of organisms as well as sets of proteins that interact with each other. Among our results, we report that: 1) MEOMs tend to include gene families whose cellular functions involve transport, energy production, metabolism, and translation, suggesting that changes in the metabolic environments that require adaptation to new sources of energy are central triggers of complex/pathway replacement in evolution. 2) Many MEOMs are related to outer membrane proteins; such proteins are involved in interaction with the environment and could thus be replaced as a result of adaptation. 3) MEOMs tend to separate organisms with large phylogenetic distance but they also separate organisms that live in different ecological niches. 4) Strikingly, although many MEOMs can be identified, there are far fewer cases where the two cliques in the MEOM completely mutually exclude each other, demonstrating the flexibility of protein evolution. 
5) CO dehydrogenase and thymidylate synthase and the glycine cleavage genes mutually exclude each other in archaea; this may represent an alternative route for generation of methyl donors for thymidine synthesis.
Affiliation(s)
- Xiuwei Zhang
  - Laboratory for Computational Biology and Bioinformatics, School of Computer and Communication Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
|
44
|
Burleigh JG, Bansal MS, Eulenstein O, Hartmann S, Wehe A, Vision TJ. Genome-scale phylogenetics: inferring the plant tree of life from 18,896 gene trees. Syst Biol 2011; 60:117-25. [PMID: 21186249 PMCID: PMC3038350 DOI: 10.1093/sysbio/syq072] [Citation(s) in RCA: 70] [Impact Index Per Article: 5.4] [Received: 02/25/2009] [Revised: 07/03/2009] [Accepted: 08/17/2010] [Indexed: 11/13/2022] Open
Abstract
Phylogenetic analyses using genome-scale data sets must confront incongruence among gene trees, which in plants is exacerbated by frequent gene duplications and losses. Gene tree parsimony (GTP) is a phylogenetic optimization criterion in which a species tree that minimizes the number of gene duplications induced among a set of gene trees is selected. The run time performance of previous implementations has limited its use on large-scale data sets. We used new software that incorporates recent algorithmic advances to examine the performance of GTP on a plant data set consisting of 18,896 gene trees containing 510,922 protein sequences from 136 plant taxa (giving a combined alignment length of >2.9 million characters). The relationships inferred from the GTP analysis were largely consistent with previous large-scale studies of backbone plant phylogeny and resolved some controversial nodes. The placement of taxa that were present in few gene trees generally varied the most among GTP bootstrap replicates. Excluding these taxa either before or after the GTP analysis revealed high levels of phylogenetic support across plants. The analyses supported magnoliids sister to a eudicot + monocot clade and did not support the eurosid I and II clades. This study presents a nuclear genomic perspective on the broad-scale phylogenetic relationships among plants, and it demonstrates that nuclear genes with a history of duplication and loss can be phylogenetically informative for resolving the plant tree of life.
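The gene tree parsimony criterion described above can be sketched with the standard least-common-ancestor (LCA) reconciliation: a gene-tree node is counted as a duplication when it maps to the same species-tree clade as one of its children. The code below is a small illustration under stated assumptions, not the software used in the study. Trees are nested tuples whose leaves are species names, all function names are hypothetical, and the species tree is chosen from a short list of candidates rather than by the heuristic search a real analysis would need.

```python
def leaf_set(tree):
    """Set of species names under a node; leaves are plain strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def all_clades(tree):
    """Leaf sets of every node (including leaves) of a species tree."""
    clades = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            clades.extend(all_clades(child))
    return clades


def count_duplications(gene_tree, species_tree):
    """Number of duplications implied by LCA reconciliation."""
    clades = all_clades(species_tree)
    duplications = 0

    def lca(node):
        nonlocal duplications
        species = leaf_set(node)
        # The smallest species-tree clade containing all species below the node.
        mapped = min((c for c in clades if species <= c), key=len)
        if not isinstance(node, str):
            if mapped in [lca(child) for child in node]:
                duplications += 1
        return mapped

    lca(gene_tree)
    return duplications


def gtp_cost(gene_trees, species_tree):
    """Total duplication count of a candidate species tree."""
    return sum(count_duplications(g, species_tree) for g in gene_trees)


# Toy data: three gene trees over species A, B, C and two candidate species trees.
gene_trees = [(("A", "B"), "C"), (("A", "B"), "C"), (("A", "C"), "B")]
candidates = [(("A", "B"), "C"), (("A", "C"), "B")]
best = min(candidates, key=lambda s: gtp_cost(gene_trees, s))
print(best, gtp_cost(gene_trees, best))
```

Enumerating candidates is only feasible for toy inputs; the run-time problem mentioned in the abstract arises because the space of species trees grows very quickly with the number of taxa.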
Affiliation(s)
- J Gordon Burleigh
  - Department of Biology, University of Florida, Gainesville, FL 32609, USA.
|
45
|
Vogan AA, Higgs PG. The advantages and disadvantages of horizontal gene transfer and the emergence of the first species. Biol Direct 2011; 6:1. [PMID: 21199581 PMCID: PMC3043529 DOI: 10.1186/1745-6150-6-1] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Received: 12/13/2010] [Accepted: 01/03/2011] [Indexed: 01/05/2023] Open
Abstract
Background: Horizontal Gene Transfer (HGT) is beneficial to a cell if the acquired gene confers a useful function, but is detrimental if the gene has no function, if it is incompatible with existing genes, or if it is a selfishly replicating mobile element. If the balance of these effects is beneficial on average, we would expect cells to evolve high rates of acceptance of horizontally transferred genes, whereas if it is detrimental, cells should reduce the rate of HGT as far as possible. It has been proposed that the rate of HGT was very high in the early stages of prokaryotic evolution, and hence there were no separate lineages of organisms. Only when the HGT rate began to fall would lineages begin to emerge with their own distinct sets of genes. Evolution would then become more tree-like. This phenomenon has been called the Darwinian Threshold.
Results: We study a model for genome evolution that incorporates both beneficial and detrimental effects of HGT. We show that if the rate of gene loss during genome replication is high, as was probably the case in the earliest genomes before the time of the last universal common ancestor, then a high rate of HGT is favourable. HGT leads to the rapid spread of new genes and allows the build-up of larger, fitter genomes than could be achieved by purely vertical inheritance. In contrast, if the gene loss rate is lower, as in modern prokaryotes, then HGT is, on average, unfavourable.
Conclusions: Modern cells should therefore evolve to reduce HGT if they can, although the prevalence of independently replicating mobile elements and viruses may mean that cells cannot avoid HGT in practice. In the model, natural selection leads to gradual improvement of the replication accuracy and gradual decrease in the optimal rate of HGT. By clustering genomes based on gene content, we show that there are no separate lineages of organisms when the rate of HGT is high; however, as the rate of HGT decreases, a tree-like structure emerges with well-defined lineages. The model therefore passes through a Darwinian Threshold.
Reviewers: This article was reviewed by Eugene V. Koonin, Anthony Poole and J. Peter Gogarten.
Affiliation(s)
- Aaron A Vogan
- Origins Institute, McMaster University, Hamilton, Ontario, Canada
|
46
|
Abstract
Phylogenetic trees of individual genes of prokaryotes (archaea and bacteria) generally have different topologies, largely owing to extensive horizontal gene transfer (HGT), suggesting that the Tree of Life (TOL) should be replaced by a "net of life" as the paradigm of prokaryote evolution. However, trees remain the natural representation of the histories of individual genes given the fundamentally bifurcating process of gene replication. Therefore, although no single tree can fully represent the evolution of prokaryote genomes, the complete picture of evolution will necessarily combine trees and nets. A quantitative measure of the signals of tree and net evolution is derived from an analysis of all quartets of species in all trees of the "Forest of Life" (FOL), which consists of approximately 7,000 phylogenetic trees for prokaryote genes including approximately 100 nearly universal trees (NUTs). Although diverse routes of net-like evolution collectively dominate the FOL, the pattern of tree-like evolution that reflects the consistent topologies of the NUTs is the most prominent coherent trend. We show that the contributions of tree-like and net-like evolutionary processes substantially differ across bacterial and archaeal lineages and between functional classes of genes. Evolutionary simulations indicate that the central tree-like signal cannot be realistically explained by a self-reinforcing pattern of biased HGT.
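The quartet-based idea in this abstract can be illustrated with a minimal sketch. It is not the authors' measure: the nested-tuple tree encoding, the function names, and the "fraction of trees agreeing with the most common quartet topology" score are all assumptions chosen for illustration.

```python
from collections import Counter


def clades(tree):
    """Return (leaf_set, clades) for a rooted tree given as nested tuples of leaf names."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def quartet_topology(tree, quartet):
    """Return the split {pair, pair} the tree induces on four taxa, or None.

    None is returned when the tree lacks one of the taxa or leaves the
    quartet unresolved.
    """
    q = frozenset(quartet)
    leaves, all_clades = clades(tree)
    if len(q) != 4 or not q <= leaves:
        return None
    for clade in all_clades:
        pair = clade & q
        if len(pair) == 2:
            return frozenset([pair, q - pair])
    return None


def quartet_support(trees, quartet):
    """Fraction of informative trees that share the most common quartet topology."""
    counts = Counter(
        topo
        for topo in (quartet_topology(t, quartet) for t in trees)
        if topo is not None
    )
    if not counts:
        return 0.0
    return counts.most_common(1)[0][1] / sum(counts.values())
```

Under these assumptions, a support value near 1 for most quartets would suggest a largely tree-like signal across the gene trees, while values near 1/3 would suggest that the three possible topologies are about equally common.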
Affiliation(s)
- Pere Puigbò
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
|
47
|
Bokhari SH, Janies DA. Reassortment networks for investigating the evolution of segmented viruses. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2010; 7:288-298. [PMID: 20431148 DOI: 10.1109/tcbb.2008.73] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 05/29/2023]
Abstract
Many viruses of interest, such as influenza A, have distinct segments in their genome. The evolution of these viruses involves mutation and reassortment, where segments are interchanged between viruses that coinfect a host. Phylogenetic trees can be constructed to investigate the mutation-driven evolution of individual viral segments. However, reassortment events among viral genomes are not well depicted in such bifurcating trees. We propose the concept of reassortment networks to analyze the evolution of segmented viruses. These are layered graphs in which the layers represent evolutionary stages such as a temporal series of seasons in which influenza viruses are isolated. Nodes represent viral isolates and reassortment events between pairs of isolates. Edges represent evolutionary steps, while weights on edges represent edit costs of reassortment and mutation events. Paths represent possible transformation series among viruses. The length of each path is the sum edit cost of the events required to transform one virus into another. In order to analyze $\tau$ stages of evolution of $n$ viruses with segments of maximum length $m$, we first compute the pairwise distances between all corresponding segments of all viruses in $O(m^2 n^2)$ time using dynamic programming. The reassortment network, with $O(\tau n^2)$ nodes, is then constructed using these distances. The ancestors and descendents of a specific virus can be traced via shortest paths in this network, which can be found in $O(\tau n^3)$ time.
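The pipeline described in this abstract (dynamic-programming distances between corresponding segments, then shortest paths through a layered graph) can be sketched in simplified form. This is a mutation-only illustration, not the authors' construction: the reassortment nodes are deliberately omitted, and the data layout and function names (`edit_distance`, `genome_cost`, `cheapest_lineage`) are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def genome_cost(u, v):
    """Summed edit distance over corresponding segments of two genomes."""
    return sum(edit_distance(s, t) for s, t in zip(u, v))


def cheapest_lineage(layers, target):
    """Cheapest path from any isolate in the first layer to `target` in the last.

    `layers` is a list of dicts mapping isolate name to a tuple of segment
    strings, one dict per evolutionary stage. Edges run only between
    consecutive layers (assumption: mutation-only, no reassortment nodes).
    """
    best = {name: (0, [name]) for name in layers[0]}
    for prev_layer, layer in zip(layers, layers[1:]):
        new_best = {}
        for name, genome in layer.items():
            cost, path = min(
                ((best[p][0] + genome_cost(prev_layer[p], genome), best[p][1])
                 for p in prev_layer),
                key=lambda item: item[0],
            )
            new_best[name] = (cost, path + [name])
        best = new_best
    return best[target]
```

Because edges only connect consecutive layers, the shortest path can be found by a single forward sweep over the layers rather than a general shortest-path algorithm. Adding reassortment would require extra nodes for pairs of isolates, as the abstract describes.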
Affiliation(s)
- Shahid H Bokhari
- Department of Biomedical Informatics, Ohio State University, 3190 Graves Hall, 333 W. 10th Ave. Columbus, OH 43210, USA.
|
48
|
Guillemot S, Berry V. Fixed-parameter tractability of the maximum agreement supertree problem. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2010; 7:342-353. [PMID: 20431153 DOI: 10.1109/tcbb.2008.93] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/29/2023]
Abstract
Given a set $L$ of labels and a collection of rooted trees whose leaves are bijectively labeled by some elements of $L$, the Maximum Agreement Supertree (SMAST) problem is given as follows: find a tree $T$ on a largest label set $L' \subseteq L$ that homeomorphically contains every input tree restricted to $L'$. The problem has phylogenetic applications to infer supertrees and perform tree congruence analyses. In this paper, we focus on the parameterized complexity of this NP-hard problem, considering different combinations of parameters as well as particular cases. We show that SMAST on $k$ rooted binary trees on a label set of size $n$ can be solved in $O((8n)^k)$ time, which is an improvement with respect to the previously known $O(n^{3k^2})$ time algorithm. In this case, we also give an $O((2k)^p k n^2)$ time algorithm, where $p$ is an upper bound on the number of leaves of $L$ missing in a SMAST solution. This shows that SMAST can be solved efficiently when the input trees are mostly congruent. Then, for the particular case where any triple of leaves is contained in at least one input tree, we give $O(4^p n^3)$ and $O(3.12^p + n^4)$ time algorithms, obtaining the first fixed-parameter tractable algorithms on a single parameter for this problem. We also obtain intractability results for several combinations of parameters, thus indicating that it is unlikely that fixed-parameter tractable algorithms can be found in these particular cases.
Affiliation(s)
- Sylvain Guillemot
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, Centre National de Recherche Scientifique (CNRS), University of Montpellier 2, 161 rue Ada, 34392 Montpellier, France.
|
49
|
Metagenomic insights into evolution of a heavy metal-contaminated groundwater microbial community. ISME JOURNAL 2010; 4:660-72. [PMID: 20182523 DOI: 10.1038/ismej.2009.154] [Citation(s) in RCA: 203] [Impact Index Per Article: 14.5] [Indexed: 11/08/2022]
Abstract
Understanding adaptation of biological communities to environmental change is a central issue in ecology and evolution. Metagenomic analysis of a stressed groundwater microbial community reveals that prolonged exposure to high concentrations of heavy metals, nitric acid and organic solvents (approximately 50 years) has resulted in a massive decrease in species and allelic diversity as well as a significant loss of metabolic diversity. Although the surviving microbial community possesses all metabolic pathways necessary for survival and growth in such an extreme environment, its structure is very simple, primarily composed of clonal denitrifying gamma- and beta-proteobacterial populations. The resulting community is overabundant in key genes conferring resistance to specific stresses including nitrate, heavy metals and acetone. Evolutionary analysis indicates that lateral gene transfer could have a key function in rapid response and adaptation to environmental contamination. The results presented in this study have important implications in understanding, assessing and predicting the impacts of human-induced activities on microbial communities ranging from human health to agriculture to environmental management, and their responses to environmental changes.
|
50
|
Chan CX, Beiko RG, Darling AE, Ragan MA. Lateral transfer of genes and gene fragments in prokaryotes. Genome Biol Evol 2009; 1:429-38. [PMID: 20333212 PMCID: PMC2817436 DOI: 10.1093/gbe/evp044] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.7] [Accepted: 10/31/2009] [Indexed: 01/24/2023] Open
Abstract
Lateral genetic transfer (LGT) involves the movement of genetic material from one lineage into another and its subsequent incorporation into the new host genome via genetic recombination. Studies in individual taxa have indicated lateral origins for stretches of DNA of greatly varying length, from a few nucleotides to chromosome size. Here we analyze 1,462 sets of single-copy, putatively orthologous genes from 144 fully sequenced prokaryote genomes, asking to what extent complete genes and fragments of genes have been transferred and recombined in LGT. Using a rigorous phylogenetic approach, we find evidence for LGT in at least 476 (32.6%) of these 1,462 gene sets: 286 (19.6%) clearly show one or more "observable recombination breakpoints" within the boundaries of the open reading frame, while a further 190 (13.0%) yield trees that are topologically incongruent with the reference tree but do not contain a recombination breakpoint within the open reading frame. We refer to these gene sets as observable recombination breakpoint positive (ORB(+)) and negative (ORB(-)) respectively. The latter are prima facie instances of lateral transfer of an entire gene or beyond. We observe little functional bias between ORB(+) and ORB(-) gene sets, but find that incorporation of entire genes is potentially more frequent in pathogens than in nonpathogens. As ORB(+) gene sets are about 50% more common than ORB(-) sets in our data, the transfer of gene fragments has been relatively frequent, and the frequency of LGT may have been systematically underestimated in phylogenetic studies.
Affiliation(s)
- Cheong Xin Chan
- Institute for Molecular Bioscience and ARC Centre of Excellence in Bioinformatics, The University of Queensland, Brisbane, Queensland, Australia
|